feat: page-fetcher-api centralised Playwright page fetcher #44

Merged
claude merged 10 commits from feature/page-fetcher-api into main 2026-06-08 15:36:44 +00:00

10 Commits

Author SHA1 Message Date
claude dcfc50ff32 Fix Docker builds: upgrade Refit to 11.0.1, add page-fetcher-api-models to Dockerfiles
- Refit 10.1.6 signing certificate was revoked; upgraded to 11.0.1 in Directory.Packages.props
- cv-matcher-api/Dockerfile and cv-search-job/Dockerfile were missing COPY steps
  for page-fetcher-api-models (added in this feature branch)

All 8 images now build cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:35:41 +03:00
claude b1ed1cb201 Rename EmailApi.Models.* namespace to Email.Models.* in email-api-models
Removes the spurious Api segment to match the pattern used by all other
models projects: CvMatcher.Models.*, Rag.Models.*, PageFetcher.Models.*.

Updated all consumers: email-api, api, cv-search-job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:06:38 +03:00
claude e1f171168e Align email-api and page-fetcher-api namespaces to Api.* convention
Fixes inconsistency where email-api used EmailApi.* and page-fetcher-api
used PageFetcherApi.*, while cv-matcher-api and rag-api use the generic
Api.* namespace. All four API projects now follow the same pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:04:03 +03:00
claude ae2bc9b902 Move SmtpSettings and PageFetcherSettings into their respective models projects
Settings classes now live in the -models project alongside DTOs and client
interfaces, eliminating the Settings/ folder from both API projects.

- SmtpSettings: email-api/Settings/ → email-api-models/Settings/ (namespace EmailApi.Models.Settings)
- PageFetcherSettings: page-fetcher-api/Settings/ → page-fetcher-api-models/Settings/ (namespace PageFetcher.Models.Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 18:00:44 +03:00
claude 30a8df431f Move PageFetcherSettings back to page-fetcher-api/Settings/, matching SmtpSettings pattern
Server-side-only settings (internal config not needed by callers) belong in
the API project itself, not in the models project. PageFetcherSettings
(DefaultWaitFor, TimeoutSeconds, MaxTextChars) mirrors SmtpSettings in
email-api/Settings/ — callers never reference these.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 17:56:21 +03:00
claude 95b0cfa0a9 Move PageFetcherSettings to page-fetcher-api-models, consistent with EmailApiSettings pattern
Settings class now lives in Apis/page-fetcher-api-models/Settings/ with
namespace PageFetcher.Models.Settings, matching how EmailApiSettings is
placed in email-api-models/Settings/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 17:54:08 +03:00
claude 20b13647de Move PageFetcherSettings to Settings/ folder, consistent with email-api pattern
Settings classes belong in Settings/ with namespace PageFetcherApi.Settings,
not Services/. Matches the SmtpSettings placement in email-api.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 17:51:51 +03:00
claude df011f2a03 Fix PageFetcherApi BaseUrl default to use Docker service name, not container name
Use http://page-fetcher-api:8080 (the Compose service key) for Docker DNS
resolution, consistent with all other internal service URLs (rag-api,
email-api, cv-matcher-api).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 17:49:00 +03:00
claude 3414c61cea Commit 2026-06-08 17:47:17 +03:00
claude 898dd09d50 feat: add page-fetcher-api — centralised Playwright page fetcher
Introduces page-fetcher-api, a new internal ASP.NET Core service that
centralises all web-page fetching through a single Playwright (headless
Chromium) browser instance. All fetches are persisted to the pageFetcher
SQL schema for auditing.

New projects:
- Apis/page-fetcher-api-models: FetchPageRequest, FetchPageResponse, IPageFetcherApiClient
- Apis/page-fetcher-data: PageFetchDbContext, PageFetchEntity, InitialSchema migration (schema: pageFetcher)
- Apis/page-fetcher-api: PlaywrightBrowserService (singleton), PageFetcherService, PageController

Changes to existing services:
- cv-matcher-api: JobTextExtractor now calls IPageFetcherApiClient instead of HttpClient
- cv-search-job: HtmlJobSearcher uses IPageFetcherApiClient (removes inline Playwright);
  CvSearchJobTask fetches individual job pages and applies keyword pre-filter before
  LLM call; passes pre-fetched JobDescription to cv-matcher-api to skip re-fetch
- common: add PageFetcherApiSettings
- docker-compose.yml, build.yml: add new service + env vars for callers

Closes #43

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 17:43:56 +03:00