# cv-search-job — Internet Job Search Worker

Background worker. Polls the database every 30 s for pending job search sessions and processes them.

## What it does (per session)

1. Reads session from DB (`Status = Pending`)
2. Sets `Status = Processing`
3. Deserializes `ProviderConfigJson` (snapshot of provider configs taken at token-start time)
4. For each enabled provider: calls `HtmlJobSearcher` to scrape job URLs
5. Deduplicates URLs across providers, caps at `MaxJobsToMatch` (default 15)
6. Calls `cv-matcher-api POST /api/cv/match-job` for each URL (uses existing LLM scoring)
7. Saves each result as `JobSearchResultEntity`
8. Filters to `Score >= MinMatchScore` (default 15)
9. Sets `Status = Done`, saves keywords + provider snapshot to session
10. Sends ranked results email via `CvSearchEmailSender` (dual-recipient: user + `Contact:ToEmail`)
11. Attaches CV PDF from shared file storage if it exists

## Crash recovery

On every tick, sessions with `Status = Processing` AND `CreatedAt < UtcNow - 10 min` are reset to `Pending`. This handles container restarts mid-processing.

## HtmlJobSearcher — generic HTML scraper

No per-provider logic. Config-driven. For each provider:

1. Combines `provider.InitialKeywords` + CV keywords from session, URL-encodes as space-joined string
2. `GET {SearchUrlTemplate}` with keyword substitution
3. Regex-parses all `<a href="...">text</a>` tags
4. Two-stage filter:
   - Stage 1: `href` must contain `JobLinkContains`
   - Stage 2: anchor text must contain at least one CV keyword
5. Makes hrefs absolute, deduplicates, returns up to `MaxResults` URLs

## Provider config

Defined under `JobSearch:Providers` in appsettings / docker-compose env vars.
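
To make the provider fields concrete, here is a minimal sketch of how one provider entry might drive the `HtmlJobSearcher` steps described above. It is illustrative Python, not the worker's actual .NET code. The field names come from this README; the `{keywords}` placeholder token, the `jobs.example.com` URLs, the `+`-style encoding, and the case-insensitive keyword match are assumptions made for the example.

```python
import re
from urllib.parse import quote_plus, urljoin

# Hypothetical provider entry: field names follow this README, values are made up.
provider = {
    "Name": "example-board",
    "Enabled": True,
    "SearchUrlTemplate": "https://jobs.example.com/search?q={keywords}",  # token name is an assumption
    "JobLinkContains": "/job/",
    "InitialKeywords": ["remote"],
    "MaxResults": 10,
}

# Captures the href and the inner content of each <a ...>...</a> tag.
ANCHOR_RE = re.compile(r'<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def build_search_url(provider, cv_keywords):
    """Join provider keywords and CV keywords with spaces, URL-encode, substitute."""
    query = " ".join(provider["InitialKeywords"] + cv_keywords)
    return provider["SearchUrlTemplate"].replace("{keywords}", quote_plus(query))


def extract_job_urls(html, provider, cv_keywords, base_url):
    """Apply the two-stage filter, then absolutize, deduplicate, and cap."""
    keywords = [k.lower() for k in cv_keywords]
    seen, urls = set(), []
    for href, inner in ANCHOR_RE.findall(html):
        # Stage 1: href must contain JobLinkContains.
        if provider["JobLinkContains"] not in href:
            continue
        # Stage 2: anchor text (nested tags stripped) must contain a CV keyword.
        text = re.sub(r"<[^>]+>", "", inner).lower()
        if not any(k in text for k in keywords):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
        if len(urls) >= provider["MaxResults"]:
            break
    return urls
```

In this sketch a link survives only if it passes both stages, so navigation links that happen to mention a keyword (stage 1 fails) and job links for unrelated roles (stage 2 fails) are both dropped.
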

Three providers ship as defaults (all `Enabled: false`):

| Name | Notes |
|------|-------|
| `ejobs.ro` | Romanian job board; reliable HTML structure |
| `bestjobs.eu` | Romanian job board |
| `linkedin.com` | Likely to return empty results due to bot detection |

Provider config is snapshotted to `JobSearchSessionEntity.ProviderConfigJson` at session creation time (in `cv-matcher-api`), so changes to config do not affect in-flight sessions.

To enable a provider via docker-compose env var (index-based):

```
JobSearch__Providers__0__Enabled=true   # ejobs.ro
JobSearch__Providers__1__Enabled=true   # bestjobs.eu
JobSearch__Providers__2__Enabled=true   # linkedin.com
```

## Email

`CvSearchEmailSender` reads SMTP config directly from `IConfiguration` (same `Smtp:*` keys as `api`). Sends to both `toEmail` (from session) and `Contact:ToEmail` (operator copy). CV PDF attached from `{FileStorage:Path}/{cvDocumentId}.pdf` if the file exists.

## Shared volume

`../Apis/api/Files:/app/Files` — same bind mount as `api` and `cv-cleanup-job`. CV PDFs written by `api` are readable here without any API call.
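
The "attach if the file exists" rule for CV PDFs on the shared volume can be sketched as below. This is an illustrative Python fragment, not the worker's actual code; `find_cv_pdf` is a hypothetical helper name, and `storage_path` stands for the `FileStorage:Path` setting.

```python
from pathlib import Path
from typing import Optional


def find_cv_pdf(storage_path: str, cv_document_id: str) -> Optional[Path]:
    """Return {storage_path}/{cv_document_id}.pdf if it exists, else None."""
    candidate = Path(storage_path) / f"{cv_document_id}.pdf"
    return candidate if candidate.is_file() else None
```

A caller would attach the returned file when the result is not `None` and otherwise send the email without an attachment.
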

## Key settings

| Section | Env var | Notes |
|---------|---------|-------|
| `Database` | `Database__*` | Same SQL Server as other services |
| `CvMatcherApi` | `CvMatcherApi__BaseUrl`, `CvMatcherApi__InternalApiKey` | Internal call to match-job endpoint |
| `Smtp` | `Smtp__*` | Same vars as `api` |
| `Contact` | `Contact__ToEmail` | Operator copy recipient |
| `FileStorage` | `FileStorage__Path` | Must match the shared volume mount path |
| `JobSearch` | `JobSearch__Enabled`, `MinMatchScore`, `MaxJobsToMatch` | Core search limits |
| `Jobs:Tasks:0` | `Jobs__Tasks__0__Interval` | Poll interval (default `00:00:30`) |

## Logging

Follows the same scheme as `cv-cleanup-job`:

- **Console** — `[HH:mm:ss LVL] SourceContext: Message`
- **File** — `logs/cv-search-job-.log`, daily rolling, 30-day retention
- **Email** (index 2) — Errors only, wired via `Serilog__WriteTo__2__Args__*` env vars in docker-compose
- **Enrich** — `FromLogContext`, `WithMachineName`, `WithEnvironmentName`

`Serilog.Sinks.Email` is available transitively through `startup-helpers` — no extra package needed in the csproj.

## EF migrations

This project runs `CvSearchDbContext.Database.Migrate()` on startup. Migrations live in `Apis/cv-search-data/Migrations/`. To add a migration: see root CLAUDE.md.