Add CV Matcher wiki page
+174
@@ -0,0 +1,174 @@
|
||||
# CV Matcher
|
||||
|
||||
The CV matcher is the core feature of myAi.ro. Users upload a CV PDF and either paste a single job URL/description or rely on the RAG index to find the best matches — and get a scored, structured analysis from an LLM with strengths, gaps, and recommendations.
|
||||
|
||||
## Service Chain
|
||||
|
||||
```
|
||||
Browser / web
|
||||
-> api (port 8080) -- captcha, rate limiting, email, CV file cache
|
||||
-> cv-matcher-api (port 8082) -- match logic, RAG orchestration, LLM scoring
|
||||
-> rag-api (port 8081) -- vector indexing and semantic search
|
||||
-> OpenAI / Ollama -- LLM scoring (gpt-4o-mini by default)
|
||||
```
|
||||
|
||||
`api` is the only internet-facing service. All calls to `cv-matcher-api` and `rag-api` require the `X-Internal-Api-Key` header.
|
||||
|
||||
## Flows
|
||||
|
||||
### 1 -- CV Upload
|
||||
|
||||
1. Browser `POST /api/cv-matcher/upload` (multipart PDF, GDPR consent, captcha token)
|
||||
2. `api` verifies reCAPTCHA, forwards PDF to `cv-matcher-api POST /api/cv/upload`
|
||||
3. `cv-matcher-api` calls `rag-api POST /api/rag/index` to chunk and embed the PDF
|
||||
4. `rag-api` returns `{ documentId, textHash, chunks, characters, cached }`
|
||||
5. `api` caches the PDF to `{FileStorage:Path}/{documentId}.pdf` for later email attachment
|
||||
6. Returns `CvUploadResponse` to the browser
|
||||
|
||||
If the same PDF was previously uploaded (same `textHash`), `rag-api` returns the cached document — no re-embedding cost.
|
||||
|
||||
### 2 -- Match CV to a Single Job
|
||||
|
||||
1. Browser `POST /api/cv-matcher/match-job` with `{ cvDocumentId, jobUrl or jobDescription, email, gdprConsent, captchaToken }`
|
||||
2. `api` verifies reCAPTCHA, forwards to `cv-matcher-api POST /api/cv/match-job`
|
||||
3. `cv-matcher-api`:
|
||||
- Fetches CV text from `rag-api GET /api/rag/document/{cvDocumentId}`
|
||||
- Fetches and strips HTML from `jobUrl` via `JobTextExtractor` (or uses pasted `jobDescription`)
|
||||
- Indexes the job text into `rag-api` (type = "job")
|
||||
- Runs a semantic search against the RAG index to find matching job chunks
|
||||
- Calls `ScorePairAsync` (LLM) to produce the structured match result
|
||||
- Caches the result in `cvMatcher.CvMatchResults` by `(cvDocumentId, jobDocumentId)` hash
|
||||
4. `api` (on return):
|
||||
- If `email` was provided, creates a job search token via `IJobSearchApi.CreateTokenAsync`
|
||||
- Sends match result email with CV PDF attached and job search link included
|
||||
5. Returns `JobMatchResponse` to the browser
|
||||
|
||||
### 3 -- Find Jobs from RAG Index
|
||||
|
||||
1. Browser `POST /api/cv-matcher/find-jobs` with `{ cvDocumentId, topK }`
|
||||
2. `cv-matcher-api` fetches CV text from `rag-api`
|
||||
3. Builds a CV search profile string from the CV text
|
||||
4. Calls `rag-api` semantic search against indexed jobs (`targetDocumentTypes: ["job"]`)
|
||||
5. Takes top `DeepScoreTopN` results (default 5), runs `ScorePairAsync` LLM scoring on each
|
||||
6. Returns `FindJobsResponse { jobs: JobMatchResponse[] }`
|
||||
|
||||
## LLM Scoring (`ScorePairAsync`)
|
||||
|
||||
Called for both match-job and find-jobs. Checks the DB cache first -- if a result exists for the same `(cvId, jobId)` pair it is returned immediately (no AI call).
|
||||
|
||||
If not cached:
|
||||
- Truncates CV text to 18 000 chars, job text to 14 000 chars
|
||||
- Takes up to 4 RAG evidence chunks (or first 4 000 chars of job text as fallback)
|
||||
- Sends `system + user` prompt to the configured AI provider with `temperature = 0.2`
|
||||
- Expects JSON response; falls back to a safe error object if parsing fails
|
||||
- Persists the raw AI chat response in `cvMatcher.CvMatcherChatCache` by a hash of `(provider, model, temperature, systemPrompt, userPrompt)`
|
||||
|
||||
### Match Result Structure (`JobMatchResponse`)
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `score` | int 0-100 | Overall match percentage |
|
||||
| `summary` | string | One-paragraph narrative |
|
||||
| `strengths` | string[] | CV aspects that match well |
|
||||
| `gaps` | string[] | Missing or weak areas |
|
||||
| `recommendations` | string[] | Actionable advice for the candidate |
|
||||
| `evidence` | string[] | RAG chunks that drove the score |
|
||||
| `cached` | bool | True if returned from DB cache |
|
||||
| `jobDocumentId` | string? | RAG document id of the indexed job |
|
||||
| `jobUrl` | string? | Source URL of the job |
|
||||
|
||||
## JobTextExtractor
|
||||
|
||||
Extracts plain text from a job posting for the LLM prompt.
|
||||
|
||||
- If `jobDescription` (pasted text) is provided it is used directly -- no HTTP call
|
||||
- Otherwise fetches `jobUrl`, strips `<script>`, `<style>`, and all HTML tags, decodes HTML entities, collapses whitespace
|
||||
- Truncates to `MaxJobTextChars` (default 60 000, minimum 4 000)
|
||||
- Throws `InvalidOperationException` if the extracted text is under 80 characters
|
||||
|
||||
User-agent sent: `MyAi.ro CV Matcher/1.0`. HTTP timeout: 25 seconds.
|
||||
|
||||
## AI Providers
|
||||
|
||||
Configured under `Ai:Provider` (`OpenAI` or `Ollama`).
|
||||
|
||||
| Setting | Default | Notes |
|
||||
|---------|---------|-------|
|
||||
| `Ai:Provider` | `OpenAI` | Switch to `Ollama` for local/offline |
|
||||
| `Ai:OpenAI:ChatModel` | `gpt-4o-mini` | Any OpenAI chat model |
|
||||
| `Ai:OpenAI:TimeoutSeconds` | `90` | Per-request timeout |
|
||||
| `Ai:Ollama:BaseUrl` | `http://host.docker.internal:11434` | Local Ollama instance |
|
||||
| `Ai:Ollama:ChatModel` | `llama3.1:8b` | Any Ollama chat model |
|
||||
|
||||
Both providers use `response_format: json_object` (or Ollama `format: "json"`) to guarantee parseable output. All AI responses are cached in the DB by content hash -- repeated identical prompts never hit the API twice.
|
||||
|
||||
## Caching
|
||||
|
||||
Two layers of caching in `cvMatcher` schema:
|
||||
|
||||
| Cache | Table | Key | What's stored |
|
||||
|-------|-------|-----|---------------|
|
||||
| AI responses | `CvMatcherChatCache` | SHA256 of full prompt + model | Raw JSON string from LLM |
|
||||
| Match results | `CvMatchResults` | `(cvDocumentId, jobDocumentId)` | Full `JobMatchResponse` |
|
||||
|
||||
The match result cache means re-matching the same CV against the same job URL is instant and free.
|
||||
|
||||
## API Routes
|
||||
|
||||
### `api` (public, port 8080)
|
||||
|
||||
| Method | Route | Description |
|
||||
|--------|-------|-------------|
|
||||
| POST | `/api/cv-matcher/upload` | Upload CV PDF (multipart) |
|
||||
| POST | `/api/cv-matcher/match-job` | Match CV to a job URL or pasted description |
|
||||
| GET | `/api/cv-matcher/job-search/start?t=` | One-click job search start (token link) |
|
||||
|
||||
Rate limited by the `cvMatcher` policy: 10 requests / 10 minutes per IP.
|
||||
|
||||
### `cv-matcher-api` (internal, port 8082)
|
||||
|
||||
| Method | Route | Description |
|
||||
|--------|-------|-------------|
|
||||
| POST | `/api/cv/upload` | Index CV PDF into RAG |
|
||||
| POST | `/api/cv/match-job` | Score CV against a job URL or text |
|
||||
| POST | `/api/cv/find-jobs` | Find top jobs from RAG index for a CV |
|
||||
| POST | `/api/cv/job-search/token` | Create job search token |
|
||||
| POST | `/api/cv/job-search/token/{id}/start` | Validate token, create Pending session |
|
||||
| GET | `/api/health` | Health check |
|
||||
|
||||
## Settings Reference
|
||||
|
||||
### `Matcher` section (`cv-matcher-api`)
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `TopK` | `10` | RAG search result count |
|
||||
| `DeepScoreTopN` | `5` | How many RAG results get LLM deep scoring |
|
||||
| `MaxJobTextChars` | `60000` | Max job text length sent to LLM |
|
||||
|
||||
### `FileStorage` section (`api`)
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `Path` | `Files` | Directory for cached CV PDFs (relative to app root or absolute) |
|
||||
|
||||
Shared via bind mount with `cv-cleanup-job` and `cv-search-job`.
|
||||
|
||||
## Match Email
|
||||
|
||||
Sent by `api` via SMTP after a successful match when `email` is provided.
|
||||
|
||||
- Subject: `MyAi.ro CV Match: {score}% -- {jobLabel}`
|
||||
- Body: score, summary, strengths, gaps, recommendations
|
||||
- Attachment: cached CV PDF from `{FileStorage:Path}/{documentId}.pdf`
|
||||
- Footer: job search link (if token creation succeeded) — see [[Features/Internet-Job-Search]]
|
||||
|
||||
Sending is fire-and-forget: email failure does not affect the match result returned to the browser.
|
||||
|
||||
## Database Schema (`cvMatcher`)
|
||||
|
||||
Managed by `CvMatcherDbContext`. Migrations live in `Apis/cv-matcher-api/Migrations/`.
|
||||
|
||||
```powershell
|
||||
dotnet ef migrations add <Name> --context CvMatcherDbContext --project Apis/cv-matcher-api
|
||||
```
|
||||
Reference in New Issue
Block a user