4 Commits

Author SHA1 Message Date
claude a83f6f705f Remove UseHeadlessBrowser from JobProvider — all fetches now go via page-fetcher-api
page-fetcher-api always uses Playwright (networkidle by default), so the
per-provider flag that chose between headless and plain HTTP is obsolete.

- Removed from JobProviderEntity, CvSearchDbContext, JobProviderConfig, JobTokenService
- HtmlJobSearcher no longer passes WaitFor (uses page-fetcher-api default)
- EF migration drops the column from cvSearch.JobProviders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:43:42 +03:00
claude 99e5cfb76b Fix job search: location filtering, keyword quality, anchor filter bypass
Closes #41

- Add RequireKeywordInAnchor per-provider flag (default true); set false for
  ejobs.ro and bestjobs.eu so Stage 2 anchor-text filter is skipped — their
  search URL already filters by relevance server-side
- Update AI system prompts (en + ro) to extract concise job-board-friendly
  keywords (role title + key tech, not abstract concepts) and candidate location
- Propagate location through JobMatchResponse -> CreateJobSearchTokenRequest ->
  JobSearchTokenEntity -> JobSearchSessionEntity
- Add {location} and {location-slug} substitution in HtmlJobSearcher
- Update provider SearchUrlTemplates to include location:
    ejobs.ro:    /locuri-de-munca/{location-slug}?q={keywords}
    bestjobs.eu: /ro/locuri-de-munca-in-{location-slug}?keywords={keywords}
    linkedin.com: ?keywords={keywords}&location={location}
- Three new migrations: AddRequireKeywordInAnchorAndLocation,
  ImproveKeywordsAndAddLocation, AddLocationToProviders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 15:45:45 +03:00
claude e38f40732f feat(providers): add headless browser scraping via Playwright for SPA job sites
Build and Push Docker Images Staging / build (push) Successful in 5m20s
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.

- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
  and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
  is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
  (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
  plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
  content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
  npx; copy browser binaries to the final image; install Chromium system
  libs (Ubuntu noble t64 variants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:42:52 +03:00
claude 7c09f5a871 feat(cv-search-data): add JobProviders table to cvSearch schema
New JobProviderEntity persists provider config (name, URL template,
link filter, initial keywords, max results, display order) in the DB
instead of appsettings. Migration seeds three disabled defaults:
ejobs.ro, bestjobs.eu, and linkedin.com.

Closes #35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 11:46:34 +03:00