AI/myAi - myAi - Gitea: Git with a cup of tea

AI/myAi

Author	SHA1	Message	Date
claude	a83f6f705f	Remove UseHeadlessBrowser from JobProvider — all fetches now go via page-fetcher-api page-fetcher-api always uses Playwright (networkidle by default), so the per-provider flag that chose between headless and plain HTTP is obsolete. - Removed from JobProviderEntity, CvSearchDbContext, JobProviderConfig, JobTokenService - HtmlJobSearcher no longer passes WaitFor (uses page-fetcher-api default) - EF migration drops the column from cvSearch.JobProviders Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 18:43:42 +03:00
claude	898dd09d50	feat: add page-fetcher-api — centralised Playwright page fetcher Introduces page-fetcher-api, a new internal ASP.NET Core service that centralises all web-page fetching through a single Playwright (headless Chromium) browser instance. All fetches are persisted to the pageFetcher SQL schema for auditing. New projects: - Apis/page-fetcher-api-models: FetchPageRequest, FetchPageResponse, IPageFetcherApiClient - Apis/page-fetcher-data: PageFetchDbContext, PageFetchEntity, InitialSchema migration (schema: pageFetcher) - Apis/page-fetcher-api: PlaywrightBrowserService (singleton), PageFetcherService, PageController Changes to existing services: - cv-matcher-api: JobTextExtractor now calls IPageFetcherApiClient instead of HttpClient - cv-search-job: HtmlJobSearcher uses IPageFetcherApiClient (removes inline Playwright); CvSearchJobTask fetches individual job pages and applies keyword pre-filter before LLM call; passes pre-fetched JobDescription to cv-matcher-api to skip re-fetch - common: add PageFetcherApiSettings - docker-compose.yml, build.yml: add new service + env vars for callers Closes #43 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 17:43:56 +03:00
claude	1222a86eb7	Fix file:// URL bug in HtmlJobSearcher — skip non-HTTP(S) URLs After resolving relative hrefs against the base search URL, some ejobs.ro links were producing file:/// URIs (e.g. file:///user/locuri-de-munca/...). These were sent to cv-matcher-api and rejected with HTTP 400, causing 0 matches. Added a scheme guard after URI resolution to skip any URL that is not http:// or https://, preventing malformed URLs from reaching the matcher. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 17:30:45 +03:00
claude	99e5cfb76b	Fix job search: location filtering, keyword quality, anchor filter bypass Closes #41 - Add RequireKeywordInAnchor per-provider flag (default true); set false for ejobs.ro and bestjobs.eu so Stage 2 anchor-text filter is skipped — their search URL already filters by relevance server-side - Update AI system prompts (en + ro) to extract concise job-board-friendly keywords (role title + key tech, not abstract concepts) and candidate location - Propagate location through JobMatchResponse -> CreateJobSearchTokenRequest -> JobSearchTokenEntity -> JobSearchSessionEntity - Add {location} and {location-slug} substitution in HtmlJobSearcher - Update provider SearchUrlTemplates to include location: ejobs.ro: /locuri-de-munca/{location-slug}?q={keywords} bestjobs.eu: /ro/locuri-de-munca-in-{location-slug}?keywords={keywords} linkedin.com: ?keywords={keywords}&location={location} - Three new migrations: AddRequireKeywordInAnchorAndLocation, ImproveKeywordsAndAddLocation, AddLocationToProviders Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 15:45:45 +03:00
claude	f7d856147e	Escalate provider fetch failures to Error for alert emails HTTP and Playwright fetch failures in HtmlJobSearcher are now logged at Error so that Serilog's email sink triggers an alert when a job provider is unreachable. Per-URL match failures remain at Warning (expected noise). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-01 22:03:46 +03:00
claude	e38f40732f	feat(providers): add headless browser scraping via Playwright for SPA job sites Build and Push Docker Images Staging / build (push) Successful in 5m20s Details ejobs.ro migrated to a Nuxt SPA - plain HTTP GET returns only the JS bundle. This change equips cv-search-job with a headless Chromium (Playwright 1.60) so it can fully render SPA pages before extracting job links. - Add UseHeadlessBrowser flag to JobProviderEntity, JobProviderConfig, and CvSearchDbContext; map it in JobTokenService.ToConfig so the flag is included in the session provider-config snapshot - Migration: add UseHeadlessBrowser column; fix ejobs.ro search URL (remove /user/ prefix that caused 404) and set UseHeadlessBrowser=true - HtmlJobSearcher: detect flag and dispatch to FetchWithPlaywrightAsync; plain-HTTP path is unchanged; NetworkIdle timeout falls back to partial content rather than failing outright - Dockerfile: download Playwright Chromium in the SDK build stage via npx; copy browser binaries to the final image; install Chromium system libs (Ubuntu noble t64 variants) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 13:42:52 +03:00
claude	af3a14c7ed	feat(cv-search-job): enrich diagnostics and add scan summary to results email Build and Push Docker Images Staging / build (push) Successful in 24s Details Add funnel-level logging to HtmlJobSearcher (total anchors found, stage-1 href-filter count, stage-2 keyword-filter count) and warn when the keyword list is empty. Log the full search URL and response size to catch silent HTTP failures or bot-block pages. In CvSearchJobTask, log keywords and active providers at session start, per-provider URL counts after each scrape, and every scored URL with its verdict (ACCEPTED / rejected) at Information level. Add a scan summary block to the results email (both non-empty and empty-results paths) showing the CV keywords used as chips and the comma-separated list of providers scanned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 11:00:04 +03:00
claude	16bb195cb5	Add XML doc to all service interfaces and implementations (#26 ) - Update CLAUDE.md: replace incorrect 'no XML doc on internal code' rule with the correct convention (XML doc on all public methods and non-trivial private/protected helpers) - Restore /// <summary> on FileDownloadController private helpers (HandleRangeRequest, StreamRangeAsync) - Add full XML doc to all service contracts: ICaptchaVerifier, IEmailSender, ICvMatcherService, IJobTextExtractor, IJobTokenService, IDocumentClassifier, IRagService, ITextChunker, ITextExtractor, IEmailTemplateService, ITemplateService - Add /// <summary> and /// <inheritdoc /> to all concrete service classes and their methods: RecaptchaVerifier, EmailApiEmailSender, SmtpEmailDispatcher, CvMatcherService, JobTextExtractor, JobTokenService, RagService, DocumentClassifier, TextChunker, TextExtractor, HtmlJobSearcher, CvSearchEmailSender, CvSearchJobTask, EmailTemplateService, DbTemplateService Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:17:42 +03:00
claude	4ee4a59b5e	Improve comments and Swagger annotations across services (#26 ) - EmailController: add class summary, full SwaggerResponse/ProducesResponseType for 400 and 500, and Description on SwaggerOperation - ContactController: fix terse "Failed." error message to "Could not process subscription." - FileDownloadController: remove redundant XML <response code> tags from the public action doc block; convert private-method /// <summary> to // (project convention: no XML doc on internal code) - CvMatcherService: remove two dead commented-out blocks (old email send and BuildEmailBody helper) - JobTokenService: comment the phone/contact-line regex filter in ExtractKeywords - DocumentClassifier: comment the keyword-frequency scoring approach and the confidence formula - TextChunker: comment the sliding-window step (chunkSize - overlap) - CvSearchJobTask: comment the GdprConsent = true rationale and the BuildCvFileName sanitisation logic - HtmlJobSearcher: comment GetLeftPart(UriPartial.Path) query-strip dedup Closes #26 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:07:23 +03:00
claude	e95ed36647	refactor: restructure solution into -models/-data/-api project taxonomy Phases 1-10 of the planned refactoring: Phase 1: rename shared-models -> common - namespace Shared.Models -> Common throughout - remove stale AspNetCore.Http.Features 5.0 reference Phase 2: create shared-data with abstract BaseEntity - BaseEntity: required string Id { get; init; } + DateTime CreatedAt { get; init; } Phase 3: rename myai-models -> myai-data - namespace MyAi.Models -> MyAi.Data - MigrationsAssembly("myai-data") Phase 4: rename cv-search-models -> cv-search-data - namespace CvSearch.Models -> CvSearch.Data - move JobSearchSettings to cv-matcher-api-models - JobSearch*Entity now inherits BaseEntity Phase 5: extract rag-data from rag-api - new project: Apis/rag-data with RagDbContext + entities + migrations - RagDocumentEntity inherits BaseEntity; cache entities use CacheKey PK - fix duplicate AddHttpClient<RagAiClient>/AddScoped registrations in rag-api - MigrationsAssembly("rag-data") Phase 6: extract cv-matcher-data from cv-matcher-api - new project: Apis/cv-matcher-data with CvMatcherDbContext + entities + migrations - CvMatchResultEntity inherits BaseEntity; CvMatcherChatCacheEntity uses CacheKey PK - MigrationsAssembly("cv-matcher-data") Phase 7: create empty cv-cleanup-job-models and cv-search-job-models Phase 8: update all 5 Dockerfiles for renamed/new projects Phase 9: reorganise .sln virtual folders (Apis/Jobs/Models/Data/Helpers) - update root CLAUDE.md with new project taxonomy and migration commands - update cv-matcher-api/CLAUDE.md and cv-search-job/CLAUDE.md Phase 10: add Directory.Packages.props for centralised NuGet versions - remove Version= from all PackageReference elements in active .csproj files No database changes. No runtime behaviour changes. All MigrationId strings in __EFMigrationsHistory are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 15:26:03 +03:00
claude	6293fa89e3	Add internet job search feature (cv-search-job) Build and Push Docker Images / build (push) Failing after 1m36s Details - New cv-search-models shared library: EF entities + CvSearchDbContext for cvSearch schema (JobSearchTokens, JobSearchSessions, JobSearchResults tables) - New cv-search-job worker service: polls DB for pending sessions, scrapes job boards via configurable HTML scraping, runs LLM scoring via cv-matcher-api, emails ranked results - cv-matcher-api: JobTokenService creates one-time tokens; JobSearchController handles link clicks and creates sessions - api: proxies job-search start endpoint, appends job search link to match result email - CI workflow updated to build and push myai-cv-search-job:staging image - CLAUDE.md documentation added for all affected services Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 17:56:23 +03:00

11 Commits