Piggybacks keyword extraction onto the existing CV-to-job LLM call, so
no extra API calls are needed. The system prompt now instructs the model to return
8-12 English job-search terms (job titles, technologies, skills, domains)
in a new `keywords` field alongside the existing score/summary fields.
Keywords flow: LLM JSON → JobMatchResponse.Keywords → CreateJobSearchTokenRequest →
JobSearchTokenEntity.Keywords (stored comma-separated) → JobSearchSessionEntity.Keywords
(copied at session-creation time, no RAG call needed).
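
As an illustration of the keyword flow above, here is a minimal Python sketch (not the project's C# code). The helper names `parse_keywords`, `to_column`, and `from_column` are hypothetical, as are the exact JSON field names other than `keywords`. The sketch assumes keywords must not contain commas because they are stored comma-separated.

```python
import json


def parse_keywords(llm_json: str, max_terms: int = 12) -> list[str]:
    """Read the `keywords` field from the model's JSON reply (hypothetical helper)."""
    payload = json.loads(llm_json)
    raw = payload.get("keywords") or []  # tolerate a missing or null field
    seen: set[str] = set()
    cleaned: list[str] = []
    for term in raw:
        # Commas would corrupt the comma-separated storage format, so replace them.
        term = " ".join(str(term).replace(",", " ").split())
        key = term.lower()
        if term and key not in seen:  # drop blanks and case-insensitive duplicates
            seen.add(key)
            cleaned.append(term)
    return cleaned[:max_terms]


def to_column(keywords: list[str]) -> str:
    """Serialize keywords for a single text column (assumed storage shape)."""
    return ",".join(keywords)


def from_column(value: str) -> list[str]:
    """Inverse of to_column; an empty or missing column yields no keywords."""
    return [k for k in value.split(",") if k] if value else []


reply = '{"score": 82, "summary": "Good fit", "keywords": ["Backend Developer", "C#", " c# ", "ASP.NET Core", ""]}'
stored = to_column(parse_keywords(reply))
```

Because the stored string round-trips through `from_column`, the session can copy the token's value as-is without another model or RAG call.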
Changes:
- Add Keywords to JobMatchResponse, CreateJobSearchTokenRequest, JobSearchTokenEntity
- IJobTokenService.CreateTokenAsync now accepts IReadOnlyList<string> keywords
- JobTokenService: store keywords on token; TriggerStartAsync reads token.Keywords
instead of fetching CV text from RAG — removes IRagApiClient dependency
- Remove heuristic ExtractKeywords method
- Migration AddKeywordsToJobSearchTokens: adds Keywords column to cvSearch.JobSearchTokens
- Migration UpdateCvMatchSystemPromptKeywords: updates ai.cv-match.system-prompt seed
to include keywords in the JSON shape
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PDF text extraction often stores all content without newlines. The previous
line-based splitter then produced a single line longer than 200 chars, which was
filtered out, yielding no keywords. Replace it with word-level sampling of the first 2000
chars, splitting on whitespace and common delimiters, skipping phone fragments,
emails, and URLs.
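
The word-level sampling described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the actual ExtractKeywords code: the delimiter set, the regexes, the two-character minimum, and the `max_keywords` limit are all guesses; only the 2000-char sample and the skipped categories come from the description.

```python
import re

SAMPLE_CHARS = 2000  # only the start of the CV text is sampled

# Assumed delimiter set: whitespace plus a few common separators.
SPLIT_RE = re.compile(r"[\s,;|•]+")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)
# Phone fragments: tokens made only of digits and phone punctuation.
PHONE_RE = re.compile(r"^[+()\d][\d\-().]*$")


def extract_keywords(text: str, max_keywords: int = 12) -> list[str]:
    """Pick distinct word tokens from the first SAMPLE_CHARS characters."""
    keywords: list[str] = []
    seen: set[str] = set()
    for token in SPLIT_RE.split(text[:SAMPLE_CHARS]):
        token = token.strip(".:")  # trim sentence punctuation
        if len(token) < 2:
            continue
        if EMAIL_RE.match(token) or URL_RE.match(token) or PHONE_RE.match(token):
            continue
        key = token.lower()
        if key in seen:
            continue
        seen.add(key)
        keywords.append(token)
        if len(keywords) == max_keywords:
            break
    return keywords


# A CV flattened onto one line, as PDF extraction often produces.
cv_text = ("Jane Doe jane@example.com +40 721 123 456 "
           "https://example.com/cv Senior Developer; Python, SQL | Docker")
result = extract_keywords(cv_text)
```

Because the splitter never looks at line boundaries, a CV stored as one long line is handled the same way as a multi-line one.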
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
JobTokenService.CreateTokenAsync now queries cvSearch.JobProviders for any
enabled row and returns null (no token created) when the table is empty or
all providers are disabled. TriggerStartAsync snapshots enabled providers
from DB at session-start time, preserving the existing snapshot contract.
CvMatcherController guards link-building on a non-null TokenId so the
"Start a job search" CTA is omitted from match emails when no providers
are configured.
JobSearchSettings.Providers list removed; provider config now lives
exclusively in the DB. CvSearchJobTask.GetProviders falls back to an
empty list with a warning (the snapshot should always be populated from the DB).
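
The gating behaviour above can be summarized in a small Python sketch. It is illustrative only: `Provider`, `create_token`, `build_job_search_link`, `snapshot_enabled`, and the link path are hypothetical stand-ins for the C# types, and the provider list is passed in rather than queried from the DB.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    name: str
    enabled: bool


def create_token(providers: list[Provider]) -> Optional[str]:
    """Return a token id, or None when no provider is enabled."""
    if not any(p.enabled for p in providers):
        return None  # empty table or everything disabled: no token
    return secrets.token_urlsafe(16)


def build_job_search_link(base_url: str, token_id: Optional[str]) -> Optional[str]:
    """Build the CTA link only when a token exists (path is an assumption)."""
    if token_id is None:
        return None  # caller omits the "Start a job search" CTA
    return f"{base_url}/job-search/start?token={token_id}"


def snapshot_enabled(providers: list[Provider]) -> list[str]:
    """Freeze the enabled provider names at session start."""
    return [p.name for p in providers if p.enabled]


providers = [Provider("board-a", True), Provider("board-b", False)]
token = create_token(providers)
link = build_job_search_link("https://example.com", token)
```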
Closes #35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update CLAUDE.md: replace incorrect 'no XML doc on internal code' rule
with the correct convention (XML doc on all public methods and
non-trivial private/protected helpers)
- Restore /// <summary> on FileDownloadController private helpers
(HandleRangeRequest, StreamRangeAsync)
- Add full XML doc to all service contracts:
ICaptchaVerifier, IEmailSender, ICvMatcherService, IJobTextExtractor,
IJobTokenService, IDocumentClassifier, IRagService, ITextChunker,
ITextExtractor, IEmailTemplateService, ITemplateService
- Add /// <summary> and /// <inheritdoc /> to all concrete service classes
and their methods: RecaptchaVerifier, EmailApiEmailSender,
SmtpEmailDispatcher, CvMatcherService, JobTextExtractor, JobTokenService,
RagService, DocumentClassifier, TextChunker, TextExtractor,
HtmlJobSearcher, CvSearchEmailSender, CvSearchJobTask,
EmailTemplateService, DbTemplateService
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- EmailController: add class summary, full SwaggerResponse/ProducesResponseType
for 400 and 500, and Description on SwaggerOperation
- ContactController: fix terse "Failed." error message to
"Could not process subscription."
- FileDownloadController: remove redundant XML <response code> tags from
the public action doc block; convert private-method /// <summary> to //
(project convention: no XML doc on internal code)
- CvMatcherService: remove two dead commented-out blocks (old email send
and BuildEmailBody helper)
- JobTokenService: comment the phone/contact-line regex filter in
ExtractKeywords
- DocumentClassifier: comment the keyword-frequency scoring approach and
the confidence formula
- TextChunker: comment the sliding-window step (chunkSize - overlap)
- CvSearchJobTask: comment the GdprConsent = true rationale and the
BuildCvFileName sanitisation logic
- HtmlJobSearcher: comment GetLeftPart(UriPartial.Path) query-strip dedup
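
For readers unfamiliar with the sliding-window step mentioned for TextChunker, here is a minimal Python sketch. It assumes character-based chunks and hypothetical default sizes; the real TextChunker may count words or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks using a sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    # Each window starts (chunk_size - overlap) characters after the previous
    # one, so consecutive chunks share exactly `overlap` characters.
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


example = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Requiring `overlap < chunk_size` keeps the step positive, which is what guarantees the loop makes progress.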
Closes #26
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New Apis/myai-models project: MyAiDbContext (schema myAi), TemplateEntity,
ITemplateService, DbTemplateService with 10-min in-memory cache
- Seeds EN+RO variants for all user-facing templates (match email, job search
results email, HTML status pages, AI system prompt)
- Match result email now sent in user's UI language (en/ro)
- Job search results email now respects session language
- Language propagates: MatchJobRequest -> token -> session -> email
- Add Language column to JobSearchTokens and JobSearchSessions (default 'en')
- All three Dockerfiles updated to include myai-models in build context
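
The 10-minute template cache can be pictured with the Python sketch below. It is not DbTemplateService itself: the class name `TemplateCache`, the `loader` callback, the injectable clock, and the template name used in the example are assumptions made for illustration. Entries are keyed by template name and language, matching the EN+RO variants.

```python
import time
from typing import Callable, Dict, Tuple


class TemplateCache:
    """In-memory cache with a fixed time-to-live per entry."""

    def __init__(self,
                 loader: Callable[[str, str], str],
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # stands in for the DB lookup
        self._ttl = ttl_seconds    # 600 s = 10 minutes
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, name: str, language: str = "en") -> str:
        key = (name, language)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]       # still fresh: skip the loader
        body = self._loader(name, language)
        self._entries[key] = (now + self._ttl, body)
        return body


calls = []
now = [0.0]


def fake_loader(name: str, language: str) -> str:
    calls.append((name, language))
    return f"{name}:{language}"


cache = TemplateCache(fake_loader, ttl_seconds=600, clock=lambda: now[0])
first = cache.get("match-email", "ro")
second = cache.get("match-email", "ro")  # served from the cache
```

One consequence of this design is that an edited template can take up to the TTL to become visible.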
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New cv-search-models shared library: EF entities + CvSearchDbContext for cvSearch schema (JobSearchTokens, JobSearchSessions, JobSearchResults tables)
- New cv-search-job worker service: polls DB for pending sessions, scrapes job boards via configurable HTML scraping, runs LLM scoring via cv-matcher-api, emails ranked results
- cv-matcher-api: JobTokenService creates one-time tokens; JobSearchController handles link clicks and creates sessions
- api: proxies job-search start endpoint, appends job search link to match result email
- CI workflow updated to build and push myai-cv-search-job:staging image
- CLAUDE.md documentation added for all affected services
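
The one-time token behaviour can be sketched in Python as follows. This is a simplified in-memory illustration, not the JobTokenService or JobSearchController code: `TokenStore`, `redeem`, and the session dictionary shape are hypothetical, and a real implementation would need the check-and-mark step to be atomic in the database.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Token:
    cv_id: str
    used: bool = False


class TokenStore:
    """Holds one-time tokens; each can start at most one session."""

    def __init__(self) -> None:
        self._tokens: Dict[str, Token] = {}

    def create(self, cv_id: str) -> str:
        token_id = secrets.token_urlsafe(16)
        self._tokens[token_id] = Token(cv_id)
        return token_id

    def redeem(self, token_id: str) -> Optional[dict]:
        """Handle a link click: create a pending session on first use only."""
        token = self._tokens.get(token_id)
        if token is None or token.used:
            return None  # unknown or already-used token
        token.used = True
        # The worker is described as polling for pending sessions.
        return {"cv_id": token.cv_id, "status": "pending"}


store = TokenStore()
token_id = store.create("cv-123")
session = store.redeem(token_id)
repeat = store.redeem(token_id)
```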
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>