12 Commits

Author SHA1 Message Date
claude 473c36d65f Store match-time ClientIpAddress on cvSearch.JobSearchTokens
Captures the IP when the user submits the CV match form and stores it on
the token, giving a full audit trail: token holds the match-site IP,
session holds the email link-click IP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 19:20:02 +03:00
claude d56729de42 Add Email and ClientIpAddress audit fields to cvSearch.JobSearchSessions and JobSearchResults
Captures client IP at job-search link-click time and threads it through to the session.
Both Email and ClientIpAddress are copied from session to each result row during processing.
Closes #47

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 19:11:50 +03:00
claude a83f6f705f Remove UseHeadlessBrowser from JobProvider — all fetches now go via page-fetcher-api
page-fetcher-api always uses Playwright (networkidle by default), so the
per-provider flag that chose between headless and plain HTTP is obsolete.

- Removed from JobProviderEntity, CvSearchDbContext, JobProviderConfig, JobTokenService
- HtmlJobSearcher no longer passes WaitFor (uses page-fetcher-api default)
- EF migration drops the column from cvSearch.JobProviders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:43:42 +03:00
claude 99e5cfb76b Fix job search: location filtering, keyword quality, anchor filter bypass
Closes #41

- Add RequireKeywordInAnchor per-provider flag (default true); set false for
  ejobs.ro and bestjobs.eu so Stage 2 anchor-text filter is skipped — their
  search URL already filters by relevance server-side
- Update AI system prompts (en + ro) to extract concise job-board-friendly
  keywords (role title + key tech, not abstract concepts) and candidate location
- Propagate location through JobMatchResponse -> CreateJobSearchTokenRequest ->
  JobSearchTokenEntity -> JobSearchSessionEntity
- Add {location} and {location-slug} substitution in HtmlJobSearcher
- Update provider SearchUrlTemplates to include location:
    ejobs.ro:    /locuri-de-munca/{location-slug}?q={keywords}
    bestjobs.eu: /ro/locuri-de-munca-in-{location-slug}?keywords={keywords}
    linkedin.com: ?keywords={keywords}&location={location}
- Three new migrations: AddRequireKeywordInAnchorAndLocation,
  ImproveKeywordsAndAddLocation, AddLocationToProviders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 15:45:45 +03:00
claude e38f40732f feat(providers): add headless browser scraping via Playwright for SPA job sites
Build and Push Docker Images Staging / build (push) Successful in 5m20s
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.

- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
  and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
  is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
  (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
  plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
  content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
  npx; copy browser binaries to the final image; install Chromium system
  libs (Ubuntu noble t64 variants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:42:52 +03:00
claude 209325ace5 fix(providers): correct bestjobs.eu job link filter pattern
Individual job listings on bestjobs.eu use /loc-de-munca/{slug} URLs.
The seeded JobLinkContains value /ro/locuri-de-munca/ matched only the
category navigation links (Vanzari, Inginerie, Management...), so
zero job URLs passed the stage-1 href filter and the scraper returned
nothing. Migration updates the stored record (Id=2) to /loc-de-munca/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:16:35 +03:00
claude 9bedf57f39 fix(migrations): replace hardcoded schema strings with MigrationConstants.SchemaName
Two migration files had literal schema strings that were missed in earlier passes:
- cv-search-data AddJobSearchTables: two CreateIndex calls used "cvSearch"
- rag-data InitialRagSchema: FK principalSchema used "rag"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:46:41 +03:00
claude b78ede23cf feat(job-search): extract keywords from LLM match call instead of heuristics
Piggybacks keyword extraction onto the existing CV-to-job LLM call —
no extra API calls. The system prompt now instructs the model to return
8-12 English job-search terms (job titles, technologies, skills, domains)
in a new `keywords` field alongside the existing score/summary fields.

Keywords flow: LLM JSON → JobMatchResponse.Keywords → CreateJobSearchTokenRequest →
JobSearchTokenEntity.Keywords (stored comma-separated) → JobSearchSessionEntity.Keywords
(copied at session-creation time, no RAG call needed).

Changes:
- Add Keywords to JobMatchResponse, CreateJobSearchTokenRequest, JobSearchTokenEntity
- IJobTokenService.CreateTokenAsync now accepts IReadOnlyList<string> keywords
- JobTokenService: store keywords on token; TriggerStartAsync reads token.Keywords
  instead of fetching CV text from RAG — removes IRagApiClient dependency
- Remove heuristic ExtractKeywords method
- Migration AddKeywordsToJobSearchTokens: adds Keywords column to cvSearch.JobSearchTokens
- Migration UpdateCvMatchSystemPromptKeywords: updates ai.cv-match.system-prompt seed
  to include keywords in the JSON shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:44:13 +03:00
claude c675954f8a fix(cv-search-data): use MigrationConstants.SchemaName in AddJobProviders migration
Replace hardcoded "cvSearch" string literals with MigrationConstants.SchemaName
in the Up, InsertData, and Down methods, consistent with all other migrations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:15:44 +03:00
claude 7c09f5a871 feat(cv-search-data): add JobProviders table to cvSearch schema
New JobProviderEntity persists provider config (name, URL template,
link filter, initial keywords, max results, display order) in the DB
instead of appsettings. Migration seeds three disabled defaults:
ejobs.ro, bestjobs.eu, and linkedin.com.

Closes #35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 11:46:34 +03:00
claude 487924e345 Move RAG repository from rag-api to rag-data — consolidate data layer ownership
- Move IRagRepository, EfRagRepository, and VectorSerializer from rag-api/Data to rag-data/Repositories
- Add rag-api-models ProjectReference to rag-data.csproj for model type availability
- Delete rag-api/Data folder (no longer needed; all data access is now in rag-data)
- This aligns RAG with email-api and other services: all data code in the data project

Pattern: rag-api (API logic) → rag-data (repository, EF entities, migrations)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-29 09:38:25 +03:00
claude e95ed36647 refactor: restructure solution into -models/-data/-api project taxonomy
Phases 1-10 of the planned refactoring:

Phase 1: rename shared-models -> common
  - namespace Shared.Models -> Common throughout
  - remove stale AspNetCore.Http.Features 5.0 reference

Phase 2: create shared-data with abstract BaseEntity
  - BaseEntity: required string Id { get; init; } + DateTime CreatedAt { get; init; }

Phase 3: rename myai-models -> myai-data
  - namespace MyAi.Models -> MyAi.Data
  - MigrationsAssembly("myai-data")

Phase 4: rename cv-search-models -> cv-search-data
  - namespace CvSearch.Models -> CvSearch.Data
  - move JobSearchSettings to cv-matcher-api-models
  - JobSearch*Entity now inherits BaseEntity

Phase 5: extract rag-data from rag-api
  - new project: Apis/rag-data with RagDbContext + entities + migrations
  - RagDocumentEntity inherits BaseEntity; cache entities use CacheKey PK
  - fix duplicate AddHttpClient<RagAiClient>/AddScoped registrations in rag-api
  - MigrationsAssembly("rag-data")

Phase 6: extract cv-matcher-data from cv-matcher-api
  - new project: Apis/cv-matcher-data with CvMatcherDbContext + entities + migrations
  - CvMatchResultEntity inherits BaseEntity; CvMatcherChatCacheEntity uses CacheKey PK
  - MigrationsAssembly("cv-matcher-data")

Phase 7: create empty cv-cleanup-job-models and cv-search-job-models

Phase 8: update all 5 Dockerfiles for renamed/new projects

Phase 9: reorganise .sln virtual folders (Apis/Jobs/Models/Data/Helpers)
  - update root CLAUDE.md with new project taxonomy and migration commands
  - update cv-matcher-api/CLAUDE.md and cv-search-job/CLAUDE.md

Phase 10: add Directory.Packages.props for centralised NuGet versions
  - remove Version= from all PackageReference elements in active .csproj files

No database changes. No runtime behaviour changes.
All MigrationId strings in __EFMigrationsHistory are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 15:26:03 +03:00