myAi/Apis/cv-matcher-api/CLAUDE.md
claude 6293fa89e3
Add internet job search feature (cv-search-job)
- New cv-search-models shared library: EF entities + CvSearchDbContext for cvSearch schema (JobSearchTokens, JobSearchSessions, JobSearchResults tables)
- New cv-search-job worker service: polls DB for pending sessions, scrapes job boards via configurable HTML scraping, runs LLM scoring via cv-matcher-api, emails ranked results
- cv-matcher-api: JobTokenService creates one-time tokens; JobSearchController handles link clicks and creates sessions
- api: proxies job-search start endpoint, appends job search link to match result email
- CI workflow updated to build and push myai-cv-search-job:staging image
- CLAUDE.md documentation added for all affected services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 17:56:23 +03:00

cv-matcher-api — Internal CV Match Engine

Listens on internal port 8082. Reachable only from api and cv-search-job, which authenticate with the X-Internal-Api-Key header.

Responsibilities

  • Indexes CV PDFs into the RAG system via rag-api
  • Matches a CV against a job posting URL (scrapes job HTML, scores pair with LLM)
  • Manages job search tokens and sessions for the one-click job search feature
  • Owns two EF DbContexts: CvMatcherDbContext (schema cvMatcher) and CvSearchDbContext (schema cvSearch)
  • Runs EF migrations for both contexts on startup
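Matching a CV against a job posting URL involves reducing the scraped HTML to plain text before it is scored. The following is a minimal Python sketch of that step, not the service's actual code. The function name `html_to_text` and the default `max_chars` value are assumptions for illustration; the cap stands in for the `MaxJobTextChars` setting.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text outside skipped elements, with whitespace collapsed.
        if self._skip_depth == 0:
            cleaned = " ".join(data.split())
            if cleaned:
                self.parts.append(cleaned)


def html_to_text(html: str, max_chars: int = 8000) -> str:
    """Return the page's visible text, truncated to max_chars (assumed default)."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

Truncating after extraction keeps the LLM prompt bounded regardless of how large the scraped page is.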

Key routes

| Method | Route | Description |
|--------|-------|-------------|
| POST | /api/cv/upload | Index CV PDF into RAG |
| POST | /api/cv/match-job | Score CV against a job URL (LLM call) |
| POST | /api/cv/find-jobs | Find matching jobs from the RAG index |
| POST | /api/cv/job-search/token | Create a job search token (called by api after a match) |
| POST | /api/cv/job-search/token/{tokenId}/start | Validate token, create Pending session (called by api on link click) |
| GET | /api/health | Health check |
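As an illustration of how a caller such as api might address these routes, the sketch below builds (but does not send) a request to the session start route. The hostname in `BASE_URL`, the helper name `build_start_request`, and the empty request body are assumptions; only the route, the port, and the `X-Internal-Api-Key` name come from this document.

```python
import urllib.parse
import urllib.request

# Hypothetical internal hostname; the port is the one documented above.
BASE_URL = "http://cv-matcher-api:8082"


def build_start_request(token_id: str, api_key: str) -> urllib.request.Request:
    """Build a POST to the job-search start route with the internal API key."""
    safe_id = urllib.parse.quote(token_id, safe="")
    url = f"{BASE_URL}/api/cv/job-search/token/{safe_id}/start"
    return urllib.request.Request(
        url,
        data=b"",  # assumed: the route needs no body, the token is in the path
        headers={"X-Internal-Api-Key": api_key},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the token is URL-quoted so an unexpected character cannot alter the path.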

Core services

  • CvMatcherService — orchestrates upload + match; calls IRagApiClient and IMatcherAiClient
  • JobTextExtractor — fetches a job page URL and extracts plain text
  • JobTokenService — creates tokens; validates + starts job search sessions; extracts CV keywords using simple heuristics (first 5 meaningful non-empty lines of CV text, split into words)
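The token lifecycle handled by JobTokenService (create a token, then validate it once and open a Pending session) can be sketched in memory as follows. This is an illustrative Python model under stated assumptions, not the EF-backed implementation: the class and field names, the 7-day default expiry, and the in-memory dictionary are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class JobSearchToken:
    token_id: str
    cv_id: str
    expires_at: datetime
    used: bool = False


@dataclass
class JobSearchSession:
    token_id: str
    cv_id: str
    status: str = "Pending"


class TokenStore:
    """In-memory stand-in for the token table (assumption for illustration)."""

    def __init__(self, expiry_days: int = 7):  # 7 is an assumed default
        self._expiry = timedelta(days=expiry_days)
        self._tokens = {}

    def create_token(self, cv_id: str, now: datetime) -> JobSearchToken:
        token = JobSearchToken(str(uuid.uuid4()), cv_id, now + self._expiry)
        self._tokens[token.token_id] = token
        return token

    def start_session(self, token_id: str, now: datetime) -> Optional[JobSearchSession]:
        token = self._tokens.get(token_id)
        # Reject unknown, already-used, or expired tokens.
        if token is None or token.used or now >= token.expires_at:
            return None
        token.used = True  # one-time use: a second click yields no session
        return JobSearchSession(token.token_id, token.cv_id)
```

Marking the token as used in the same step that creates the session is what makes the emailed link one-click and non-replayable in this model.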

AI providers

Configured under Ai:Provider (OpenAI or Ollama). Both providers implement IMatcherAiClient.
Default model: gpt-4o-mini. Timeout: 90 s.
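The provider switch can be pictured as a small factory that reads the `Ai` settings and returns one of two interchangeable clients. The Python sketch below is only an analogy for the C# interface: the `score` method, the class names, and the `Model` and `TimeoutSeconds` keys are assumptions, while the provider names and the defaults (gpt-4o-mini, 90 s) are taken from this section.

```python
from dataclasses import dataclass
from typing import Protocol


class MatcherAiClient(Protocol):
    """Stand-in for IMatcherAiClient; the method name is hypothetical."""

    def score(self, cv_text: str, job_text: str) -> float: ...


@dataclass
class OpenAiClient:
    model: str
    timeout_seconds: int

    def score(self, cv_text: str, job_text: str) -> float:
        raise NotImplementedError("network call omitted in this sketch")


@dataclass
class OllamaClient:
    model: str
    timeout_seconds: int

    def score(self, cv_text: str, job_text: str) -> float:
        raise NotImplementedError("network call omitted in this sketch")


def create_client(ai_settings: dict) -> MatcherAiClient:
    """Pick a client from the Ai settings section (key names are assumed)."""
    provider = str(ai_settings.get("Provider", "OpenAI")).lower()
    model = ai_settings.get("Model", "gpt-4o-mini")
    timeout = int(ai_settings.get("TimeoutSeconds", 90))
    if provider == "openai":
        return OpenAiClient(model, timeout)
    if provider == "ollama":
        return OllamaClient(model, timeout)
    raise ValueError(f"Unknown Ai:Provider value: {ai_settings.get('Provider')!r}")
```

Failing fast on an unrecognized provider surfaces a configuration mistake at startup rather than at the first match request.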

Database contexts

Both contexts use the same SQL Server connection string (from Database:* settings).

  • CvMatcherDbContext — schema cvMatcher; migrations in cv-matcher-api assembly
  • CvSearchDbContext — schema cvSearch; migrations in cv-search-models assembly (MigrationsAssembly = "cv-search-models")

Keyword extraction (JobTokenService.ExtractKeywords)

No LLM call. Takes the first 5 non-empty lines of CV text that are:

  • Longer than 5 characters
  • Neither purely numeric nor a match for a contact-line pattern

The selected lines are split into words, stripped of punctuation, and deduplicated; up to 10 keywords are returned as a comma-separated string.
These keywords are stored in JobSearchSessionEntity.Keywords and used by cv-search-job for scraping.
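The heuristic described above can be sketched in Python as follows. This is a hedged approximation, not the code of JobTokenService.ExtractKeywords: the contact-line regex (email, URL, or phone-like text), the case-insensitive deduplication, and the function name are assumptions, since this document does not specify them.

```python
import re
import string

# Assumed contact-line patterns: email sign, URL prefix, or a phone-like digit run.
_CONTACT_RE = re.compile(r"@|https?://|www\.|\+?\d[\d\s().-]{6,}\d", re.IGNORECASE)
# Lines made only of digits, whitespace, and common numeric separators.
_NUMERIC_RE = re.compile(r"[\d\s.,/-]+")


def extract_keywords(cv_text: str, max_lines: int = 5, max_keywords: int = 10) -> str:
    """Return up to max_keywords comma-separated words from the first qualifying lines."""
    picked = []
    for raw in cv_text.splitlines():
        line = raw.strip()
        if len(line) <= 5:
            continue  # also drops empty lines
        if _NUMERIC_RE.fullmatch(line) or _CONTACT_RE.search(line):
            continue
        picked.append(line)
        if len(picked) == max_lines:
            break

    seen = set()
    keywords = []
    for line in picked:
        for word in line.split():
            token = word.strip(string.punctuation)
            key = token.lower()  # assumed: dedupe ignores case
            if not token or key in seen:
                continue
            seen.add(key)
            keywords.append(token)
            if len(keywords) == max_keywords:
                return ",".join(keywords)
    return ",".join(keywords)
```

One consequence of stripping punctuation this naively is that tokens such as `C#` or `.NET` lose their symbols, which may matter when the keywords are later used as search terms.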

Settings

| Section | Notes |
|---------|-------|
| Database | Shared SQL Server connection |
| RagApi | BaseUrl + InternalApiKey for rag-api |
| Ai | Provider, model, timeout |
| Matcher | TopK, DeepScoreTopN, MaxJobTextChars |
| JobSearch | TokenExpiryDays, providers list (stored in session JSON) |
| InternalApi | ApiKey used by UseInternalApiKeyProtection middleware |