Commit Graph

7 Commits

Author SHA1 Message Date
claude f7d856147e Escalate provider fetch failures to Error for alert emails
HTTP and Playwright fetch failures in HtmlJobSearcher are now logged at
Error so that Serilog's email sink triggers an alert when a job provider
is unreachable. Per-URL match failures remain at Warning (expected noise).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 22:03:46 +03:00
claude e38f40732f feat(providers): add headless browser scraping via Playwright for SPA job sites
Build and Push Docker Images Staging / build (push) Successful in 5m20s
ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS
bundle. This change equips cv-search-job with a headless Chromium
(Playwright 1.60) so it can fully render SPA pages before extracting
job links.

- Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig,
  and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag
  is included in the session provider-config snapshot
- Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL
  (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true
- HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync;
  plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial
  content rather than failing outright
- Dockerfile: download Playwright Chromium in the SDK build stage via
  npx; copy browser binaries to the final image; install Chromium system
  libs (Ubuntu noble t64 variants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:42:52 +03:00
claude af3a14c7ed feat(cv-search-job): enrich diagnostics and add scan summary to results email
Build and Push Docker Images Staging / build (push) Successful in 24s
Add funnel-level logging to HtmlJobSearcher (total anchors found,
stage-1 href-filter count, stage-2 keyword-filter count) and warn
when the keyword list is empty. Log the full search URL and response
size to catch silent HTTP failures or bot-block pages.

In CvSearchJobTask, log keywords and active providers at session start,
per-provider URL counts after each scrape, and every scored URL with its
verdict (ACCEPTED / rejected) at Information level.

Add a scan summary block to the results email (both non-empty and
empty-results paths) showing the CV keywords used as chips and the
comma-separated list of providers scanned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 11:00:04 +03:00
claude 16bb195cb5 Add XML doc to all service interfaces and implementations (#26)
- Update CLAUDE.md: replace incorrect 'no XML doc on internal code' rule
  with the correct convention (XML doc on all public methods and
  non-trivial private/protected helpers)
- Restore /// <summary> on FileDownloadController private helpers
  (HandleRangeRequest, StreamRangeAsync)
- Add full XML doc to all service contracts:
  ICaptchaVerifier, IEmailSender, ICvMatcherService, IJobTextExtractor,
  IJobTokenService, IDocumentClassifier, IRagService, ITextChunker,
  ITextExtractor, IEmailTemplateService, ITemplateService
- Add /// <summary> and /// <inheritdoc /> to all concrete service classes
  and their methods: RecaptchaVerifier, EmailApiEmailSender,
  SmtpEmailDispatcher, CvMatcherService, JobTextExtractor, JobTokenService,
  RagService, DocumentClassifier, TextChunker, TextExtractor,
  HtmlJobSearcher, CvSearchEmailSender, CvSearchJobTask,
  EmailTemplateService, DbTemplateService

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 09:17:42 +03:00
claude 4ee4a59b5e Improve comments and Swagger annotations across services (#26)
- EmailController: add class summary, full SwaggerResponse/ProducesResponseType
  for 400 and 500, and Description on SwaggerOperation
- ContactController: fix terse "Failed." error message to
  "Could not process subscription."
- FileDownloadController: remove redundant XML <response code> tags from
  the public action doc block; convert private-method /// <summary> to //
  (project convention: no XML doc on internal code)
- CvMatcherService: remove two dead commented-out blocks (old email send
  and BuildEmailBody helper)
- JobTokenService: comment the phone/contact-line regex filter in
  ExtractKeywords
- DocumentClassifier: comment the keyword-frequency scoring approach and
  the confidence formula
- TextChunker: comment the sliding-window step (chunkSize - overlap)
- CvSearchJobTask: comment the GdprConsent = true rationale and the
  BuildCvFileName sanitisation logic
- HtmlJobSearcher: comment GetLeftPart(UriPartial.Path) query-strip dedup

Closes #26

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 09:07:23 +03:00
claude e95ed36647 refactor: restructure solution into -models/-data/-api project taxonomy
Phases 1-10 of the planned refactoring:

Phase 1: rename shared-models -> common
  - namespace Shared.Models -> Common throughout
  - remove stale AspNetCore.Http.Features 5.0 reference

Phase 2: create shared-data with abstract BaseEntity
  - BaseEntity: required string Id { get; init; } + DateTime CreatedAt { get; init; }

Phase 3: rename myai-models -> myai-data
  - namespace MyAi.Models -> MyAi.Data
  - MigrationsAssembly("myai-data")

Phase 4: rename cv-search-models -> cv-search-data
  - namespace CvSearch.Models -> CvSearch.Data
  - move JobSearchSettings to cv-matcher-api-models
  - JobSearch*Entity now inherits BaseEntity

Phase 5: extract rag-data from rag-api
  - new project: Apis/rag-data with RagDbContext + entities + migrations
  - RagDocumentEntity inherits BaseEntity; cache entities use CacheKey PK
  - fix duplicate AddHttpClient<RagAiClient>/AddScoped registrations in rag-api
  - MigrationsAssembly("rag-data")

Phase 6: extract cv-matcher-data from cv-matcher-api
  - new project: Apis/cv-matcher-data with CvMatcherDbContext + entities + migrations
  - CvMatchResultEntity inherits BaseEntity; CvMatcherChatCacheEntity uses CacheKey PK
  - MigrationsAssembly("cv-matcher-data")

Phase 7: create empty cv-cleanup-job-models and cv-search-job-models

Phase 8: update all 5 Dockerfiles for renamed/new projects

Phase 9: reorganise .sln virtual folders (Apis/Jobs/Models/Data/Helpers)
  - update root CLAUDE.md with new project taxonomy and migration commands
  - update cv-matcher-api/CLAUDE.md and cv-search-job/CLAUDE.md

Phase 10: add Directory.Packages.props for centralised NuGet versions
  - remove Version= from all PackageReference elements in active .csproj files

No database changes. No runtime behaviour changes.
All MigrationId strings in __EFMigrationsHistory are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 15:26:03 +03:00
claude 6293fa89e3 Add internet job search feature (cv-search-job)
Build and Push Docker Images / build (push) Failing after 1m36s
- New cv-search-models shared library: EF entities + CvSearchDbContext for cvSearch schema (JobSearchTokens, JobSearchSessions, JobSearchResults tables)
- New cv-search-job worker service: polls DB for pending sessions, scrapes job boards via configurable HTML scraping, runs LLM scoring via cv-matcher-api, emails ranked results
- cv-matcher-api: JobTokenService creates one-time tokens; JobSearchController handles link clicks and creates sessions
- api: proxies job-search start endpoint, appends job search link to match result email
- CI workflow updated to build and push myai-cv-search-job:staging image
- CLAUDE.md documentation added for all affected services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 17:56:23 +03:00