PDF text extraction often returns all content as one line, without
newlines. The previous line-based splitter then produced a single line
longer than 200 chars, which was filtered out, yielding empty keywords.
Replace it with word-level sampling of the first 2000 chars, splitting on
whitespace and common delimiters and skipping phone fragments, emails,
and URLs.
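
A minimal Python sketch of the word-level sampling described above, for
illustration only (the actual change is in the C# service). The delimiter
set, the regexes, the max_keywords limit, and the function name are
assumptions, not taken from the real code; only the 2000-char sample size
comes from this message.

```python
import re

SAMPLE_CHARS = 2000  # sample size named in the commit message

# Assumed delimiter set: whitespace plus a few common separators.
SPLIT_RE = re.compile(r"[\s,;|()•]+")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)
# A "phone fragment" here is any token made only of digits, dots and
# dashes, optionally prefixed with "+". This also drops years.
PHONE_RE = re.compile(r"^\+?\d[\d.\-]*$")


def extract_keywords(text, max_keywords=10):
    """Return up to max_keywords distinct words from the start of text."""
    keywords = []
    seen = set()
    for raw in SPLIT_RE.split(text[:SAMPLE_CHARS]):
        token = raw.strip(".:-")  # trim punctuation left on the edges
        if len(token) < 2:
            continue
        if EMAIL_RE.match(token) or URL_RE.match(token) or PHONE_RE.match(token):
            continue
        key = token.lower()  # de-duplicate case-insensitively
        if key in seen:
            continue
        seen.add(key)
        keywords.append(token)
        if len(keywords) == max_keywords:
            break
    return keywords
```

Because the sketch never depends on line breaks, a single-line PDF dump
still yields keywords.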
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
JobTokenService.CreateTokenAsync queries cvSearch.JobProviders for any
enabled row; returns null (no token created) when the table is empty or
all providers are disabled. TriggerStartAsync snapshots enabled providers
from DB at session-start time, preserving the existing snapshot contract.
CvMatcherController guards link-building on a non-null TokenId so the
"Start a job search" CTA is omitted from match emails when no providers
are configured.
JobSearchSettings.Providers list removed — provider config now lives
exclusively in the DB. CvSearchJobTask.GetProviders falls back to an
empty list with a warning (snapshot should always be populated from DB).
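
The null-token guard can be sketched in Python as follows. This is an
illustration under assumptions, not the C# implementation: the provider
rows are modelled as plain dicts with an "enabled" flag, and the token
format and the base_url value are hypothetical.

```python
import secrets
from typing import Mapping, Optional, Sequence


def create_token(providers: Sequence[Mapping[str, object]]) -> Optional[str]:
    """Return a one-time token, or None when no provider is enabled."""
    if not any(p.get("enabled") for p in providers):
        return None  # empty table or all providers disabled
    return secrets.token_urlsafe(32)  # hypothetical token format


def build_match_email(body: str, token_id: Optional[str],
                      base_url: str = "https://example.invalid/job-search/start") -> str:
    """Append the job-search CTA only when a token was created."""
    if token_id is None:
        return body  # CTA omitted: no providers configured
    return f"{body}\n\nStart a job search: {base_url}?token={token_id}"
```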
Closes #35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update CLAUDE.md: replace incorrect 'no XML doc on internal code' rule
  with the correct convention (XML doc on all public methods and
  non-trivial private/protected helpers)
- Restore /// <summary> on FileDownloadController private helpers
  (HandleRangeRequest, StreamRangeAsync)
- Add full XML doc to all service contracts:
  ICaptchaVerifier, IEmailSender, ICvMatcherService, IJobTextExtractor,
  IJobTokenService, IDocumentClassifier, IRagService, ITextChunker,
  ITextExtractor, IEmailTemplateService, ITemplateService
- Add /// <summary> and /// <inheritdoc /> to all concrete service classes
  and their methods: RecaptchaVerifier, EmailApiEmailSender,
  SmtpEmailDispatcher, CvMatcherService, JobTextExtractor, JobTokenService,
  RagService, DocumentClassifier, TextChunker, TextExtractor,
  HtmlJobSearcher, CvSearchEmailSender, CvSearchJobTask,
  EmailTemplateService, DbTemplateService
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- EmailController: add class summary, full SwaggerResponse/ProducesResponseType
  for 400 and 500, and Description on SwaggerOperation
- ContactController: replace terse "Failed." error message with
  "Could not process subscription."
- FileDownloadController: remove redundant XML <response code> tags from
  the public action doc block; convert private-method /// <summary> to //
  (project convention: no XML doc on internal code)
- CvMatcherService: remove two dead commented-out blocks (old email send
  and BuildEmailBody helper)
- JobTokenService: add a comment on the phone/contact-line regex filter in
  ExtractKeywords
- DocumentClassifier: add comments on the keyword-frequency scoring
  approach and the confidence formula
- TextChunker: add a comment on the sliding-window step
  (chunkSize - overlap)
- CvSearchJobTask: add comments on the GdprConsent = true rationale and
  the BuildCvFileName sanitisation logic
- HtmlJobSearcher: add a comment on the GetLeftPart(UriPartial.Path)
  query-strip dedup
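
For reference, the sliding-window step mentioned for TextChunker
(chunkSize - overlap) can be sketched in Python as below. This is a
character-based illustration with assumed default sizes; the real
TextChunker may split on different units and use different limits.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size that share overlap chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk starts `step` characters after the previous one, so the last
`overlap` characters of one chunk are repeated at the start of the next.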
Closes #26
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New Apis/myai-models project: MyAiDbContext (schema myAi), TemplateEntity,
  ITemplateService, DbTemplateService with 10-min in-memory cache
- Seeds EN+RO variants for all user-facing templates (match email, job search
  results email, HTML status pages, AI system prompt)
- Match result email now sent in user's UI language (en/ro)
- Job search results email now respects session language
- Language propagates: MatchJobRequest -> token -> session -> email
- Add Language column to JobSearchTokens and JobSearchSessions (default 'en')
- All three Dockerfiles updated to include myai-models in build context
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The frontend sends the active language code (currentLang()) with every match
request. CvMatcherService injects a language instruction into the system prompt
so the LLM returns summary, strengths, gaps, recommendations, and evidence in
the correct language. The match result cache (CvMatchResults) now includes
Language as part of the lookup key so Romanian and English results are stored
and retrieved independently. Existing cached rows default to 'en'.
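
A rough Python sketch of the language-aware lookup key and prompt
instruction described above. The cv_hash and job_hash fields, the class
names, and the instruction wording are hypothetical; only the Language
component of the key and the 'en' default come from this message.

```python
from dataclasses import dataclass

LANGUAGE_NAMES = {"en": "English", "ro": "Romanian"}


@dataclass(frozen=True)
class MatchCacheKey:
    cv_hash: str           # hypothetical key component
    job_hash: str          # hypothetical key component
    language: str = "en"   # rows without a language default to 'en'


class MatchResultCache:
    """In-memory stand-in for the CvMatchResults lookup."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows.get(key)

    def put(self, key, result):
        self._rows[key] = result


def with_language_instruction(system_prompt, language):
    """Append an instruction asking the LLM to answer in one language."""
    name = LANGUAGE_NAMES.get(language, LANGUAGE_NAMES["en"])
    return f"{system_prompt}\nRespond in {name}."
```

With the language in the key, the same CV and job pair can hold one
cached result per language without one overwriting the other.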
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New cv-search-models shared library: EF entities + CvSearchDbContext for cvSearch schema (JobSearchTokens, JobSearchSessions, JobSearchResults tables)
- New cv-search-job worker service: polls DB for pending sessions, scrapes job boards via configurable HTML scraping, runs LLM scoring via cv-matcher-api, emails ranked results
- cv-matcher-api: JobTokenService creates one-time tokens; JobSearchController handles link clicks and creates sessions
- api: proxies job-search start endpoint, appends job search link to match result email
- CI workflow updated to build and push myai-cv-search-job:staging image
- CLAUDE.md documentation added for all affected services
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>