Fix file:// URL bug in HtmlJobSearcher — skip non-HTTP(S) URLs

After resolving relative hrefs against the base search URL, some ejobs.ro
links were producing file:/// URIs (e.g. file:///user/locuri-de-munca/...).
These were sent to cv-matcher-api and rejected with HTTP 400, causing 0 matches.

Added a scheme guard after URI resolution to skip any URL that is not
http:// or https://, preventing malformed URLs from reaching the matcher.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 16:57:52 +03:00
parent c89df975bd
commit 2e9069cbdb
@@ -117,6 +117,10 @@ public sealed class HtmlJobSearcher
                 continue;
             }
+            // Skip non-HTTP(S) URLs (e.g. file:// or javascript: that can appear in scraped HTML)
+            if (absoluteUri.Scheme != Uri.UriSchemeHttp && absoluteUri.Scheme != Uri.UriSchemeHttps)
+                continue;
             var url = absoluteUri.GetLeftPart(UriPartial.Path);
             if (seen.Add(url))
                 results.Add(url);