Fix file:// URL bug in HtmlJobSearcher — skip non-HTTP(S) URLs
Build and Push Docker Images Staging / build (push) Successful in 35s
After resolving relative hrefs against the base search URL, some ejobs.ro links were producing file:/// URIs (e.g. file:///user/locuri-de-munca/...). These were sent to cv-matcher-api and rejected with HTTP 400, causing 0 matches.

Added a scheme guard after URI resolution to skip any URL that is not http:// or https://, preventing malformed URLs from reaching the matcher.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -117,6 +117,10 @@ public sealed class HtmlJobSearcher
                 continue;
             }

+            // Skip non-HTTP(S) URLs (e.g. file:// or javascript: that can appear in scraped HTML)
+            if (absoluteUri.Scheme != Uri.UriSchemeHttp && absoluteUri.Scheme != Uri.UriSchemeHttps)
+                continue;
+
             var url = absoluteUri.GetLeftPart(UriPartial.Path);
             if (seen.Add(url))
                 results.Add(url);
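For reference, the guard can be exercised in isolation with a minimal sketch like the one below. `LinkFilter` and `TryGetHttpUrl` are hypothetical names introduced only for illustration and are not part of `HtmlJobSearcher`; the example hrefs and base URL are made up. The sketch assumes hrefs are resolved with `Uri.TryCreate(Uri, string, out Uri)`, which may differ from how the surrounding method (not shown in the diff) actually resolves them.

```csharp
using System;

// Hypothetical helper for illustration only; not part of HtmlJobSearcher.
public static class LinkFilter
{
    // Resolves href against baseUri and returns true only when the result
    // uses the http or https scheme. The returned url has its query string
    // and fragment stripped, mirroring GetLeftPart(UriPartial.Path) in the diff.
    public static bool TryGetHttpUrl(Uri baseUri, string href, out string url)
    {
        url = string.Empty;

        // Assumption: resolution goes through Uri.TryCreate(baseUri, href, ...).
        if (!Uri.TryCreate(baseUri, href, out var absoluteUri))
            return false;

        // Same scheme guard as the commit: reject file:, javascript:, mailto:, etc.
        if (absoluteUri.Scheme != Uri.UriSchemeHttp && absoluteUri.Scheme != Uri.UriSchemeHttps)
            return false;

        url = absoluteUri.GetLeftPart(UriPartial.Path);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up base URL and hrefs, chosen to cover one accepted and two rejected cases.
        var baseUri = new Uri("https://example.com/search/");
        string[] hrefs =
        {
            "jobs/1?x=2",
            "file:///user/locuri-de-munca/x",
            "javascript:void(0)",
        };

        foreach (var href in hrefs)
        {
            var accepted = LinkFilter.TryGetHttpUrl(baseUri, href, out var url);
            Console.WriteLine($"{href} -> accepted={accepted} url={url}");
        }
    }
}
```

Comparing against `Uri.UriSchemeHttp` and `Uri.UriSchemeHttps` works as an allowlist, so any scheme that shows up in scraped HTML later is skipped without further changes.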