feat(providers): add headless browser scraping via Playwright for SPA job sites
Build and Push Docker Images Staging / build (push) Successful in 5m20s
Build and Push Docker Images Staging / build (push) Successful in 5m20s
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS bundle. This change equips cv-search-job with a headless Chromium (Playwright 1.60) so it can fully render SPA pages before extracting job links. - Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig, and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag is included in the session provider-config snapshot - Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true - HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync; plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial content rather than failing outright - Dockerfile: download Playwright Chromium in the SDK build stage via npx; copy browser binaries to the final image; install Chromium system libs (Ubuntu noble t64 variants) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -66,6 +66,11 @@ namespace CvSearch.Data.Migrations
|
||||
.HasMaxLength(1024)
|
||||
.HasColumnType("nvarchar(1024)");
|
||||
|
||||
b.Property<bool>("UseHeadlessBrowser")
|
||||
.ValueGeneratedOnAdd()
|
||||
.HasColumnType("bit")
|
||||
.HasDefaultValue(false);
|
||||
|
||||
b.HasKey("Id");
|
||||
|
||||
b.ToTable("JobProviders", "cvSearch");
|
||||
|
||||
Reference in New Issue
Block a user