Closes#41
- Add RequireKeywordInAnchor per-provider flag (default true); set false for
ejobs.ro and bestjobs.eu so Stage 2 anchor-text filter is skipped — their
search URL already filters by relevance server-side
- Update AI system prompts (en + ro) to extract concise job-board-friendly
keywords (role title + key tech, not abstract concepts) and candidate location
- Propagate location through JobMatchResponse -> CreateJobSearchTokenRequest ->
JobSearchTokenEntity -> JobSearchSessionEntity
- Add {location} and {location-slug} substitution in HtmlJobSearcher
- Update provider SearchUrlTemplates to include location:
ejobs.ro: /locuri-de-munca/{location-slug}?q={keywords}
bestjobs.eu: /ro/locuri-de-munca-in-{location-slug}?keywords={keywords}
linkedin.com: ?keywords={keywords}&location={location}
- Three new migrations: AddRequireKeywordInAnchorAndLocation,
ImproveKeywordsAndAddLocation, AddLocationToProviders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.
- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
(remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
npx; copy browser binaries to the final image; install Chromium system
libs (Ubuntu noble t64 variants)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Piggybacks keyword extraction onto the existing CV-to-job LLM call —
no extra API calls. The system prompt now instructs the model to return
8-12 English job-search terms (job titles, technologies, skills, domains)
in a new `keywords` field alongside the existing score/summary fields.
Keywords flow: LLM JSON → JobMatchResponse.Keywords → CreateJobSearchTokenRequest →
JobSearchTokenEntity.Keywords (stored comma-separated) → JobSearchSessionEntity.Keywords
(copied at session-creation time, no RAG call needed).
Changes:
- Add Keywords to JobMatchResponse, CreateJobSearchTokenRequest, JobSearchTokenEntity
- IJobTokenService.CreateTokenAsync now accepts IReadOnlyList<string> keywords
- JobTokenService: store keywords on token; TriggerStartAsync reads token.Keywords
instead of fetching CV text from RAG — removes IRagApiClient dependency
- Remove heuristic ExtractKeywords method
- Migration AddKeywordsToJobSearchTokens: adds Keywords column to cvSearch.JobSearchTokens
- Migration UpdateCvMatchSystemPromptKeywords: updates ai.cv-match.system-prompt seed
to include keywords in the JSON shape
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New JobProviderEntity persists provider config (name, URL template,
link filter, initial keywords, max results, display order) in the DB
instead of appsettings. Migration seeds three disabled defaults:
ejobs.ro, bestjobs.eu, and linkedin.com.
Closes#35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move IRagRepository, EfRagRepository, and VectorSerializer from rag-api/Data to rag-data/Repositories
- Add rag-api-models ProjectReference to rag-data.csproj for model type availability
- Delete rag-api/Data folder (no longer needed; all data access is now in rag-data)
- This aligns RAG with email-api and other services: all data code in the data project
Pattern: rag-api (API logic) → rag-data (repository, EF entities, migrations)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Each DbContext now explicitly configures its migration history table to use
the schema-qualified name pattern [schemaName].[_Migrations]:
- [cvMatcher].[_Migrations] for CvMatcherDbContext
- [emailApi].[_Migrations] for EmailApiDbContext
- [cvSearch].[_Migrations] for CvSearchDbContext
- [rag].[_Migrations] for RagDbContext
- [myAi].[_Migrations] for MyAiDbContext
This is done via OnConfiguring() with UseSqlServer().MigrationsHistoryTable(name, schema).
Removed incorrect rename migrations that were created due to misunderstanding
of the proper EF Core configuration approach.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>