Files
myAi/Jobs/cv-search-job/Dockerfile
T
claude e38f40732f
Build and Push Docker Images Staging / build (push) Successful in 5m20s
feat(providers): add headless browser scraping via Playwright for SPA job sites
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.

- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
  and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
  is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
  (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
  plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
  content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
  npx; copy browser binaries to the final image; install Chromium system
  libs (Ubuntu noble t64 variants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:42:52 +03:00

57 lines
2.4 KiB
Docker

FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
ARG BUILD_CONFIGURATION=Release
WORKDIR /src
COPY Directory.Packages.props ./
COPY Jobs/cv-search-job/cv-search-job.csproj Jobs/cv-search-job/
COPY Jobs/job-scheduler/job-scheduler.csproj Jobs/job-scheduler/
COPY Apis/cv-search-data/cv-search-data.csproj Apis/cv-search-data/
COPY Apis/cv-matcher-api-models/cv-matcher-api-models.csproj Apis/cv-matcher-api-models/
COPY Apis/email-data/email-data.csproj Apis/email-data/
COPY Apis/email-api-models/email-api-models.csproj Apis/email-api-models/
COPY Apis/common/common.csproj Apis/common/
COPY Apis/myai-data/myai-data.csproj Apis/myai-data/
COPY Apis/shared-data/shared-data.csproj Apis/shared-data/
COPY Helpers/startup-helpers/startup-helpers.csproj Helpers/startup-helpers/
RUN dotnet restore Jobs/cv-search-job/cv-search-job.csproj
COPY Jobs/cv-search-job/ Jobs/cv-search-job/
COPY Jobs/job-scheduler/ Jobs/job-scheduler/
COPY Apis/cv-search-data/ Apis/cv-search-data/
COPY Apis/cv-matcher-api-models/ Apis/cv-matcher-api-models/
COPY Apis/email-data/ Apis/email-data/
COPY Apis/email-api-models/ Apis/email-api-models/
COPY Apis/common/ Apis/common/
COPY Apis/myai-data/ Apis/myai-data/
COPY Apis/shared-data/ Apis/shared-data/
COPY Helpers/startup-helpers/ Helpers/startup-helpers/
RUN dotnet publish Jobs/cv-search-job/cv-search-job.csproj -c $BUILD_CONFIGURATION -o /app/publish /p:UseAppHost=false
# Download Playwright Chromium browser in the build stage.
# Node.js is only needed here to run npx — it is not copied to the final image.
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
&& npx --yes playwright@1.60.0 install chromium \
&& rm -rf /var/lib/apt/lists/*
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app
# System libraries required by Chromium on Debian bookworm
RUN apt-get update && apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 \
libgbm1 libasound2t64 libpango-1.0-0 libcairo2 libatspi2.0-0 \
libwayland-client0 libx11-xcb1 libx11-6 libxcb1 libxext6 \
&& rm -rf /var/lib/apt/lists/*
# Copy the Playwright Chromium browser from the build stage
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
COPY --from=build /ms-playwright /ms-playwright
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "cv-search-job.dll"]