898dd09d50
Introduces page-fetcher-api, a new internal ASP.NET Core service that centralises all web-page fetching through a single Playwright (headless Chromium) browser instance. All fetches are persisted to the pageFetcher SQL schema for auditing.

New projects:
- Apis/page-fetcher-api-models: FetchPageRequest, FetchPageResponse, IPageFetcherApiClient
- Apis/page-fetcher-data: PageFetchDbContext, PageFetchEntity, InitialSchema migration (schema: pageFetcher)
- Apis/page-fetcher-api: PlaywrightBrowserService (singleton), PageFetcherService, PageController

Changes to existing services:
- cv-matcher-api: JobTextExtractor now calls IPageFetcherApiClient instead of HttpClient
- cv-search-job: HtmlJobSearcher uses IPageFetcherApiClient (removes inline Playwright); CvSearchJobTask fetches individual job pages and applies keyword pre-filter before LLM call; passes pre-fetched JobDescription to cv-matcher-api to skip re-fetch
- common: add PageFetcherApiSettings
- docker-compose.yml, build.yml: add new service + env vars for callers

Closes #43

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
51 lines
2.2 KiB
Docker
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
ARG BUILD_CONFIGURATION=Release
WORKDIR /src
COPY Directory.Packages.props ./

COPY Apis/page-fetcher-api/page-fetcher-api.csproj Apis/page-fetcher-api/
COPY Apis/page-fetcher-data/page-fetcher-data.csproj Apis/page-fetcher-data/
COPY Apis/page-fetcher-api-models/page-fetcher-api-models.csproj Apis/page-fetcher-api-models/
COPY Apis/common/common.csproj Apis/common/
COPY Apis/shared-data/shared-data.csproj Apis/shared-data/
COPY Helpers/startup-helpers/startup-helpers.csproj Helpers/startup-helpers/
COPY Helpers/common-helpers/common-helpers.csproj Helpers/common-helpers/

RUN dotnet restore Apis/page-fetcher-api/page-fetcher-api.csproj

COPY Apis/page-fetcher-api/ Apis/page-fetcher-api/
COPY Apis/page-fetcher-data/ Apis/page-fetcher-data/
COPY Apis/page-fetcher-api-models/ Apis/page-fetcher-api-models/
COPY Apis/common/ Apis/common/
COPY Apis/shared-data/ Apis/shared-data/
COPY Helpers/startup-helpers/ Helpers/startup-helpers/
COPY Helpers/common-helpers/ Helpers/common-helpers/

RUN dotnet publish Apis/page-fetcher-api/page-fetcher-api.csproj -c $BUILD_CONFIGURATION -o /app/publish /p:UseAppHost=false

# Download Playwright Chromium browser in the build stage.
# Node.js is only needed here to run npx; it is not copied to the final image.
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
    && npx --yes playwright@1.60.0 install chromium \
    && rm -rf /var/lib/apt/lists/*

FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app

# System libraries required by Chromium. The package names use the t64 naming
# (libasound2t64) found on Ubuntu 24.04 and Debian trixie, not Debian bookworm.
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
    libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 \
    libgbm1 libasound2t64 libpango-1.0-0 libcairo2 libatspi2.0-0 \
    libwayland-client0 libx11-xcb1 libx11-6 libxcb1 libxext6 \
    && rm -rf /var/lib/apt/lists/*

# Copy the Playwright Chromium browser from the build stage
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
COPY --from=build /ms-playwright /ms-playwright

COPY --from=build /app/publish .

ENTRYPOINT ["dotnet", "page-fetcher-api.dll"]
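
Because the Chromium binaries are downloaded in the build stage while their system libraries are installed separately in the final stage, a mismatch in the apt package list only surfaces when the browser fails to launch. The sketch below is one possible way to check this ahead of time by scanning `ldd` output for unresolved libraries. It is not part of the repository: the function names `missing_libs` and `check_browsers` are hypothetical, and the binary names `chrome` and `headless_shell` are assumptions about the directory layout Playwright creates under `PLAYWRIGHT_BROWSERS_PATH`.

```shell
# Print the names of shared libraries that ldd reports as unresolved.
# Reads ldd output on stdin; unresolved entries look like
# "libfoo.so.1 => not found".
missing_libs() {
  while read -r name arrow rest; do
    if [ "$arrow" = "=>" ] && [ "$rest" = "not found" ]; then
      printf '%s\n' "$name"
    fi
  done | sort -u
}

# Run ldd on every Chromium executable under the browsers directory and
# fail if any library is unresolved. The file names searched for are an
# assumption about Playwright's install layout; adjust them if it differs.
check_browsers() {
  dir="${1:-${PLAYWRIGHT_BROWSERS_PATH:-/ms-playwright}}"
  missing=$(find "$dir" -type f \( -name chrome -o -name headless_shell \) \
    -exec ldd {} \; 2>/dev/null | missing_libs)
  if [ -n "$missing" ]; then
    printf 'Unresolved shared libraries:\n%s\n' "$missing" >&2
    return 1
  fi
  echo "No unresolved shared libraries found"
}
```

Assuming the functions are saved in a script that ends with a call to `check_browsers`, it could be run inside the built image by overriding the entrypoint with `sh`. A non-zero exit status would then point at a package missing from the `apt-get install` list in the final stage. Note that the check also passes when no matching binaries are found, so it complements rather than replaces an actual browser launch.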