feat: page-fetcher-api centralised Playwright page fetcher #44

Merged
claude merged 10 commits from feature/page-fetcher-api into main 2026-06-08 15:36:44 +00:00

Closes #43

Adds a new page-fetcher-api service built on Playwright. Every fetch is saved to the pageFetcher schema.

Changes:

  • cv-matcher-api: JobTextExtractor uses IPageFetcherApiClient
  • cv-search-job: HtmlJobSearcher uses IPageFetcherApiClient, keyword pre-filter in CvSearchJobTask
  • docker-compose, build.yml: new myai-page-fetcher-api service
claude added 1 commit 2026-06-08 14:44:45 +00:00
Introduces page-fetcher-api, a new internal ASP.NET Core service that
centralises all web-page fetching through a single Playwright (headless
Chromium) browser instance. All fetches are persisted to the pageFetcher
SQL schema for auditing.

New projects:
- Apis/page-fetcher-api-models: FetchPageRequest, FetchPageResponse, IPageFetcherApiClient
- Apis/page-fetcher-data: PageFetchDbContext, PageFetchEntity, InitialSchema migration (schema: pageFetcher)
- Apis/page-fetcher-api: PlaywrightBrowserService (singleton), PageFetcherService, PageController
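A rough, language-neutral sketch of the contract described above, written in Python only for brevity (the real projects are ASP.NET Core). The field names on the request and response, the `render` callable standing in for the Playwright browser, and the `page_fetch` table are all assumptions for illustration; the commit does not show the actual shapes. The point is that every fetch, successful or not, leaves an audit row.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class FetchPageRequest:
    url: str                         # hypothetical field
    wait_for: Optional[str] = None   # hypothetical field


@dataclass(frozen=True)
class FetchPageResponse:
    url: str        # hypothetical field
    text: str       # hypothetical field
    success: bool   # hypothetical field


class PageFetcherApiClient(Protocol):
    """What callers such as cv-matcher-api would depend on (assumed shape)."""

    def fetch_page(self, request: FetchPageRequest) -> FetchPageResponse: ...


class AuditingPageFetcher:
    """Fetches through an injected renderer and records every attempt."""

    def __init__(self, render: Callable[[str], str], db: sqlite3.Connection):
        self._render = render  # stand-in for the shared headless browser
        self._db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS page_fetch ("
            "url TEXT NOT NULL, fetched_at TEXT NOT NULL, "
            "success INTEGER NOT NULL, text_length INTEGER NOT NULL)"
        )

    def fetch_page(self, request: FetchPageRequest) -> FetchPageResponse:
        try:
            text, ok = self._render(request.url), True
        except Exception:
            # A failed fetch is still audited, with empty text.
            text, ok = "", False
        self._db.execute(
            "INSERT INTO page_fetch VALUES (?, ?, ?, ?)",
            (request.url, datetime.now(timezone.utc).isoformat(), int(ok), len(text)),
        )
        self._db.commit()
        return FetchPageResponse(url=request.url, text=text, success=ok)
```

Injecting the renderer keeps the audit logic testable without launching a browser, which is one plausible reason to hide Playwright behind a single service.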

Changes to existing services:
- cv-matcher-api: JobTextExtractor now calls IPageFetcherApiClient instead of HttpClient
- cv-search-job: HtmlJobSearcher uses IPageFetcherApiClient (removes inline Playwright);
  CvSearchJobTask fetches individual job pages and applies keyword pre-filter before
  LLM call; passes pre-fetched JobDescription to cv-matcher-api to skip re-fetch
- common: add PageFetcherApiSettings
- docker-compose.yml, build.yml: add new service + env vars for callers

Closes #43

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
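The keyword pre-filter mentioned in the commit above could look roughly like the following. This is a minimal sketch in Python rather than the project's C#, and the function names, the `(url, description)` pair shape, and the whole-word matching rule are assumptions; the commit only says that a keyword pre-filter runs before the LLM call.

```python
import re
from typing import Iterable, List, Tuple


def matches_keywords(text: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs as a whole word or phrase (case-insensitive)."""
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # ignore blank keywords instead of matching everything
        # Lookarounds rather than \b so keywords like "C#" or ".NET" still work.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def prefilter_jobs(
    jobs: Iterable[Tuple[str, str]], keywords: Iterable[str]
) -> List[Tuple[str, str]]:
    """Keep only (url, description) pairs worth sending on to the LLM matcher."""
    wanted = list(keywords)
    return [(url, desc) for url, desc in jobs if matches_keywords(desc, wanted)]
```

Jobs that survive the filter already carry their fetched description, so the matcher has no reason to fetch the page a second time.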
gelu added 1 commit 2026-06-08 14:47:22 +00:00
gelu added 1 commit 2026-06-08 14:49:06 +00:00
Use http://page-fetcher-api:8080 (the Compose service key) for Docker DNS
resolution, consistent with all other internal service URLs (rag-api,
email-api, cv-matcher-api).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 14:51:55 +00:00
Settings classes belong in Settings/ with namespace PageFetcherApi.Settings,
not Services/. Matches the SmtpSettings placement in email-api.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 14:54:12 +00:00
Settings class now lives in Apis/page-fetcher-api-models/Settings/ with
namespace PageFetcher.Models.Settings, matching how EmailApiSettings is
placed in email-api-models/Settings/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 14:56:23 +00:00
Server-side-only settings (internal config not needed by callers) belong in
the API project itself, not in the models project. PageFetcherSettings
(DefaultWaitFor, TimeoutSeconds, MaxTextChars) mirrors SmtpSettings in
email-api/Settings/ — callers never reference these.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
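For orientation, here is a sketch of how the three settings named in the commit above might be used on the server side. It is written in Python for brevity, not the project's C#. The default values, the meaning of each field, and the `clamp_text` helper are assumptions; only the three setting names come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFetcherSettings:
    default_wait_for: str = "networkidle"  # assumed default, not from the PR
    timeout_seconds: int = 30              # assumed default
    max_text_chars: int = 20_000           # assumed default

    @property
    def timeout_ms(self) -> int:
        # Playwright timeouts are expressed in milliseconds.
        return self.timeout_seconds * 1000


def clamp_text(text: str, settings: PageFetcherSettings) -> str:
    """Truncate extracted page text to the configured limit (assumed semantics)."""
    if settings.max_text_chars <= 0:
        return text  # assumption: a non-positive limit disables truncation
    return text[: settings.max_text_chars]
```

Because only the fetcher applies these values, callers can stay unaware of them, which is the argument the commit makes for keeping the class out of the shared models project.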
gelu added 1 commit 2026-06-08 15:00:46 +00:00
Settings classes now live in the -models project alongside DTOs and client
interfaces, eliminating the Settings/ folder from both API projects.

- SmtpSettings: email-api/Settings/ → email-api-models/Settings/ (namespace EmailApi.Models.Settings)
- PageFetcherSettings: page-fetcher-api/Settings/ → page-fetcher-api-models/Settings/ (namespace PageFetcher.Models.Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 15:04:06 +00:00
Fixes inconsistency where email-api used EmailApi.* and page-fetcher-api
used PageFetcherApi.*, while cv-matcher-api and rag-api use the generic
Api.* namespace. All four API projects now follow the same pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 15:06:41 +00:00
Removes the spurious Api segment to match the pattern used by all other
models projects: CvMatcher.Models.*, Rag.Models.*, PageFetcher.Models.*.

Updated all consumers: email-api, api, cv-search-job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu added 1 commit 2026-06-08 15:35:44 +00:00
- Refit 10.1.6 signing certificate was revoked; upgraded to 11.0.1 in Directory.Packages.props
- cv-matcher-api/Dockerfile and cv-search-job/Dockerfile were missing COPY steps
  for page-fetcher-api-models (added in this feature branch)

All 8 images now build cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gelu approved these changes 2026-06-08 15:36:30 +00:00
claude merged commit 61805e2fb5 into main 2026-06-08 15:36:44 +00:00

Reference: AI/myAi#44