Commit Graph

7 Commits

Author SHA1 Message Date
claude a83f6f705f Remove UseHeadlessBrowser from JobProvider — all fetches now go via page-fetcher-api
page-fetcher-api always uses Playwright (networkidle by default), so the
per-provider flag that chose between headless and plain HTTP is obsolete.

- Removed from JobProviderEntity, CvSearchDbContext, JobProviderConfig, JobTokenService
- HtmlJobSearcher no longer passes WaitFor (uses page-fetcher-api default)
- EF migration drops the column from cvSearch.JobProviders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:43:42 +03:00
claude e38f40732f feat(providers): add headless browser scraping via Playwright for SPA job sites
Build and Push Docker Images Staging / build (push) Successful in 5m20s
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.

- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
  and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
  is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
  (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
  plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
  content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
  npx; copy browser binaries to the final image; install Chromium system
  libs (Ubuntu noble t64 variants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:42:52 +03:00
claude b78ede23cf feat(job-search): extract keywords from LLM match call instead of heuristics
Piggybacks keyword extraction onto the existing CV-to-job LLM call —
no extra API calls. The system prompt now instructs the model to return
8-12 English job-search terms (job titles, technologies, skills, domains)
in a new `keywords` field alongside the existing score/summary fields.

Keywords flow: LLM JSON → JobMatchResponse.Keywords → CreateJobSearchTokenRequest →
JobSearchTokenEntity.Keywords (stored comma-separated) → JobSearchSessionEntity.Keywords
(copied at session-creation time, no RAG call needed).

Changes:
- Add Keywords to JobMatchResponse, CreateJobSearchTokenRequest, JobSearchTokenEntity
- IJobTokenService.CreateTokenAsync now accepts IReadOnlyList<string> keywords
- JobTokenService: store keywords on token; TriggerStartAsync reads token.Keywords
  instead of fetching CV text from RAG — removes IRagApiClient dependency
- Remove heuristic ExtractKeywords method
- Migration AddKeywordsToJobSearchTokens: adds Keywords column to cvSearch.JobSearchTokens
- Migration UpdateCvMatchSystemPromptKeywords: updates ai.cv-match.system-prompt seed
  to include keywords in the JSON shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:44:13 +03:00
claude 7c09f5a871 feat(cv-search-data): add JobProviders table to cvSearch schema
New JobProviderEntity persists provider config (name, URL template,
link filter, initial keywords, max results, display order) in the DB
instead of appsettings. Migration seeds three disabled defaults:
ejobs.ro, bestjobs.eu, and linkedin.com.

Closes #35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 11:46:34 +03:00
claude 487924e345 Move RAG repository from rag-api to rag-data — consolidate data layer ownership
- Move IRagRepository, EfRagRepository, and VectorSerializer from rag-api/Data to rag-data/Repositories
- Add rag-api-models ProjectReference to rag-data.csproj for model type availability
- Delete rag-api/Data folder (no longer needed; all data access is now in rag-data)
- This aligns RAG with email-api and other services: all data code in the data project

Pattern: rag-api (API logic) → rag-data (repository, EF entities, migrations)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-29 09:38:25 +03:00
claude 57e8cb3f4b fix: Configure EF Core migration history tables with schema-qualified names
Each DbContext now explicitly configures its migration history table to use
the schema-qualified name pattern [schemaName].[_Migrations]:
- [cvMatcher].[_Migrations] for CvMatcherDbContext
- [emailApi].[_Migrations] for EmailApiDbContext
- [cvSearch].[_Migrations] for CvSearchDbContext
- [rag].[_Migrations] for RagDbContext
- [myAi].[_Migrations] for MyAiDbContext

This is done via OnConfiguring() with UseSqlServer().MigrationsHistoryTable(name, schema).

Removed incorrect rename migrations that were created due to misunderstanding
of the proper EF Core configuration approach.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-29 08:37:23 +03:00
claude e95ed36647 refactor: restructure solution into -models/-data/-api project taxonomy
Phases 1-10 of the planned refactoring:

Phase 1: rename shared-models -> common
  - namespace Shared.Models -> Common throughout
  - remove stale AspNetCore.Http.Features 5.0 reference

Phase 2: create shared-data with abstract BaseEntity
  - BaseEntity: required string Id { get; init; } + DateTime CreatedAt { get; init; }

Phase 3: rename myai-models -> myai-data
  - namespace MyAi.Models -> MyAi.Data
  - MigrationsAssembly("myai-data")

Phase 4: rename cv-search-models -> cv-search-data
  - namespace CvSearch.Models -> CvSearch.Data
  - move JobSearchSettings to cv-matcher-api-models
  - JobSearch*Entity now inherits BaseEntity

Phase 5: extract rag-data from rag-api
  - new project: Apis/rag-data with RagDbContext + entities + migrations
  - RagDocumentEntity inherits BaseEntity; cache entities use CacheKey PK
  - fix duplicate AddHttpClient<RagAiClient>/AddScoped registrations in rag-api
  - MigrationsAssembly("rag-data")

Phase 6: extract cv-matcher-data from cv-matcher-api
  - new project: Apis/cv-matcher-data with CvMatcherDbContext + entities + migrations
  - CvMatchResultEntity inherits BaseEntity; CvMatcherChatCacheEntity uses CacheKey PK
  - MigrationsAssembly("cv-matcher-data")

Phase 7: create empty cv-cleanup-job-models and cv-search-job-models

Phase 8: update all 5 Dockerfiles for renamed/new projects

Phase 9: reorganise .sln virtual folders (Apis/Jobs/Models/Data/Helpers)
  - update root CLAUDE.md with new project taxonomy and migration commands
  - update cv-matcher-api/CLAUDE.md and cv-search-job/CLAUDE.md

Phase 10: add Directory.Packages.props for centralised NuGet versions
  - remove Version= from all PackageReference elements in active .csproj files

No database changes. No runtime behaviour changes.
All MigrationId strings in __EFMigrationsHistory are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 15:26:03 +03:00