Fix file:// URL bug in HtmlJobSearcher: skip non-HTTP(S) URLs

After relative hrefs were resolved against the base search URL, some ejobs.ro
links came out as file:/// URIs (e.g. file:///user/locuri-de-munca/...).
These were sent to cv-matcher-api, which rejected them with HTTP 400,
resulting in 0 matches.

Added a scheme guard after URI resolution that skips any URL whose scheme
is not http or https, so malformed URLs no longer reach the matcher.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 16:57:52 +03:00
parent c89df975bd
commit 1222a86eb7
@@ -117,6 +117,10 @@ public sealed class HtmlJobSearcher
continue;
}
// Skip non-HTTP(S) URLs (e.g. file:// or javascript: that can appear in scraped HTML)
if (absoluteUri.Scheme != Uri.UriSchemeHttp && absoluteUri.Scheme != Uri.UriSchemeHttps)
continue;
var url = absoluteUri.GetLeftPart(UriPartial.Path);
if (seen.Add(url))
results.Add(url);
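
For readers who want to try the guard in isolation, here is a minimal standalone sketch of the same idea. It is not the actual HtmlJobSearcher code: the `LinkFilter` class, the `CollectHttpLinks` method, and the example.com base URL are hypothetical names chosen for illustration, and the real class may resolve hrefs differently before the guard runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of HtmlJobSearcher.
public static class LinkFilter
{
    // Resolves each href against baseUri and returns the distinct
    // http/https URLs, with query string and fragment stripped.
    public static List<string> CollectHttpLinks(Uri baseUri, IEnumerable<string> hrefs)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var results = new List<string>();

        foreach (var href in hrefs)
        {
            // Skip hrefs that cannot be resolved to a URI at all.
            if (!Uri.TryCreate(baseUri, href, out var absoluteUri))
                continue;

            // Scheme guard: drop file://, javascript:, mailto:, and so on.
            if (absoluteUri.Scheme != Uri.UriSchemeHttp && absoluteUri.Scheme != Uri.UriSchemeHttps)
                continue;

            // Keep scheme, authority, and path only, then de-duplicate.
            var url = absoluteUri.GetLeftPart(UriPartial.Path);
            if (seen.Add(url))
                results.Add(url);
        }

        return results;
    }
}
```

Called with an assumed base of `https://example.com/jobs/` and the hrefs `/user/a?x=1`, `file:///user/a`, and `https://example.com/user/a#frag`, the sketch should return a single entry, `https://example.com/user/a`: the file URI is dropped by the guard and the remaining two collapse to the same path.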