Clone
1
CV Matcher
gelu edited this page 2026-05-22 19:05:41 +03:00

CV Matcher

The CV matcher is the core feature of myAi.ro. Users upload a CV PDF and either paste a single job URL/description or rely on the RAG index to find the best matches — and get a scored, structured analysis from an LLM with strengths, gaps, and recommendations.

Service Chain

Browser / web
  -> api (port 8080) -- captcha, rate limiting, email, CV file cache
     -> cv-matcher-api (port 8082) -- match logic, RAG orchestration, LLM scoring
        -> rag-api (port 8081) -- vector indexing and semantic search
        -> OpenAI / Ollama -- LLM scoring (gpt-4o-mini by default)

api is the only internet-facing service. All calls to cv-matcher-api and rag-api require the X-Internal-Api-Key header.

Flows

1 -- CV Upload

  1. Browser POST /api/cv-matcher/upload (multipart PDF, GDPR consent, captcha token)
  2. api verifies reCAPTCHA, forwards PDF to cv-matcher-api POST /api/cv/upload
  3. cv-matcher-api calls rag-api POST /api/rag/index to chunk and embed the PDF
  4. rag-api returns { documentId, textHash, chunks, characters, cached }
  5. api caches the PDF to {FileStorage:Path}/{documentId}.pdf for later email attachment
  6. Returns CvUploadResponse to the browser

If the same PDF was previously uploaded (same textHash), rag-api returns the cached document — no re-embedding cost.

2 -- Match CV to a Single Job

  1. Browser POST /api/cv-matcher/match-job with { cvDocumentId, jobUrl or jobDescription, email, gdprConsent, captchaToken }
  2. api verifies reCAPTCHA, forwards to cv-matcher-api POST /api/cv/match-job
  3. cv-matcher-api:
    • Fetches CV text from rag-api GET /api/rag/document/{cvDocumentId}
    • Fetches and strips HTML from jobUrl via JobTextExtractor (or uses pasted jobDescription)
    • Indexes the job text into rag-api (type = "job")
    • Runs a semantic search against the RAG index to find matching job chunks
    • Calls ScorePairAsync (LLM) to produce the structured match result
    • Caches the result in cvMatcher.CvMatchResults by (cvDocumentId, jobDocumentId) hash
  4. api (on return):
    • If email was provided, creates a job search token via IJobSearchApi.CreateTokenAsync
    • Sends match result email with CV PDF attached and job search link included
  5. Returns JobMatchResponse to the browser

3 -- Find Jobs from RAG Index

  1. Browser POST /api/cv-matcher/find-jobs with { cvDocumentId, topK }
  2. cv-matcher-api fetches CV text from rag-api
  3. Builds a CV search profile string from the CV text
  4. Calls rag-api semantic search against indexed jobs (targetDocumentTypes: ["job"])
  5. Takes top DeepScoreTopN results (default 5), runs ScorePairAsync LLM scoring on each
  6. Returns FindJobsResponse { jobs: JobMatchResponse[] }

LLM Scoring (ScorePairAsync)

Called for both match-job and find-jobs. Checks the DB cache first -- if a result exists for the same (cvId, jobId) pair it is returned immediately (no AI call).

If not cached:

  • Truncates CV text to 18 000 chars, job text to 14 000 chars
  • Takes up to 4 RAG evidence chunks (or first 4 000 chars of job text as fallback)
  • Sends system + user prompt to the configured AI provider with temperature = 0.2
  • Expects JSON response; falls back to a safe error object if parsing fails
  • Persists the raw AI chat response in cvMatcher.CvMatcherChatCache by a hash of (provider, model, temperature, systemPrompt, userPrompt)

Match Result Structure (JobMatchResponse)

Field Type Description
score int 0-100 Overall match percentage
summary string One-paragraph narrative
strengths string[] CV aspects that match well
gaps string[] Missing or weak areas
recommendations string[] Actionable advice for the candidate
evidence string[] RAG chunks that drove the score
cached bool True if returned from DB cache
jobDocumentId string? RAG document id of the indexed job
jobUrl string? Source URL of the job

JobTextExtractor

Extracts plain text from a job posting for the LLM prompt.

  • If jobDescription (pasted text) is provided it is used directly -- no HTTP call
  • Otherwise fetches jobUrl, strips <script>, <style>, and all HTML tags, decodes HTML entities, collapses whitespace
  • Truncates to MaxJobTextChars (default 60 000, minimum 4 000)
  • Throws InvalidOperationException if the extracted text is under 80 characters

User-agent sent: MyAi.ro CV Matcher/1.0. HTTP timeout: 25 seconds.

AI Providers

Configured under Ai:Provider (OpenAI or Ollama).

Setting Default Notes
Ai:Provider OpenAI Switch to Ollama for local/offline
Ai:OpenAI:ChatModel gpt-4o-mini Any OpenAI chat model
Ai:OpenAI:TimeoutSeconds 90 Per-request timeout
Ai:Ollama:BaseUrl http://host.docker.internal:11434 Local Ollama instance
Ai:Ollama:ChatModel llama3.1:8b Any Ollama chat model

Both providers use response_format: json_object (or Ollama format: "json") to guarantee parseable output. All AI responses are cached in the DB by content hash -- repeated identical prompts never hit the API twice.

Caching

Two layers of caching in cvMatcher schema:

Cache Table Key What's stored
AI responses CvMatcherChatCache SHA256 of full prompt + model Raw JSON string from LLM
Match results CvMatchResults (cvDocumentId, jobDocumentId) Full JobMatchResponse

The match result cache means re-matching the same CV against the same job URL is instant and free.

API Routes

api (public, port 8080)

Method Route Description
POST /api/cv-matcher/upload Upload CV PDF (multipart)
POST /api/cv-matcher/match-job Match CV to a job URL or pasted description
GET /api/cv-matcher/job-search/start?t= One-click job search start (token link)

Rate limited by the cvMatcher policy: 10 requests / 10 minutes per IP.

cv-matcher-api (internal, port 8082)

Method Route Description
POST /api/cv/upload Index CV PDF into RAG
POST /api/cv/match-job Score CV against a job URL or text
POST /api/cv/find-jobs Find top jobs from RAG index for a CV
POST /api/cv/job-search/token Create job search token
POST /api/cv/job-search/token/{id}/start Validate token, create Pending session
GET /api/health Health check

Settings Reference

Matcher section (cv-matcher-api)

Key Default Description
TopK 10 RAG search result count
DeepScoreTopN 5 How many RAG results get LLM deep scoring
MaxJobTextChars 60000 Max job text length sent to LLM

FileStorage section (api)

Key Default Description
Path Files Directory for cached CV PDFs (relative to app root or absolute)

Shared via bind mount with cv-cleanup-job and cv-search-job.

Match Email

Sent by api via SMTP after a successful match when email is provided.

  • Subject: MyAi.ro CV Match: {score}% -- {jobLabel}
  • Body: score, summary, strengths, gaps, recommendations
  • Attachment: cached CV PDF from {FileStorage:Path}/{documentId}.pdf
  • Footer: job search link (if token creation succeeded) — see Features/Internet-Job-Search

Sending is fire-and-forget: email failure does not affect the match result returned to the browser.

Database Schema (cvMatcher)

Managed by CvMatcherDbContext. Migrations live in Apis/cv-matcher-api/Migrations/.

dotnet ef migrations add <Name> --context CvMatcherDbContext --project Apis/cv-matcher-api