a467fac35d
PDF text extraction often produces all content as a single run of text with no newlines. The previous line-based splitter then yielded one line longer than 200 chars, which was filtered out, leaving the keywords empty.

Replace it with word-level sampling of the first 2000 chars: split on whitespace and common delimiters, and skip phone fragments, emails, and URLs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
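
The word-level sampling described in this commit could look roughly like the sketch below. It is not the committed code. The function name `extract_keywords`, the delimiter set, the phone/URL patterns, the minimum word length, and the `max_keywords` cap are all assumptions for illustration; only the 2000-char window and the skip rules come from the message above.

```python
import re

SAMPLE_CHARS = 2000  # from the commit message: sample the first 2000 chars

# Assumed delimiter set: whitespace plus a few common separators.
# "/", ":", "." and "@" are deliberately left out so that URLs and
# emails stay in one piece and can be skipped as whole tokens.
SPLIT_RE = re.compile(r"[\s,;|•]+")
URL_RE = re.compile(r"(?i)^(https?://|www\.)")
# Assumed heuristic: a token made only of digits and phone punctuation.
PHONE_RE = re.compile(r"[+\d().\-]+")


def extract_keywords(text, max_keywords=10):
    """Sample keywords word by word, independent of newlines."""
    keywords = []
    seen = set()
    for token in SPLIT_RE.split(text[:SAMPLE_CHARS]):
        if not token:
            continue
        # Skip URLs, emails, and phone fragments before any cleanup.
        if URL_RE.match(token) or "@" in token or PHONE_RE.fullmatch(token):
            continue
        word = token.strip(".:!?\"'()[]")
        key = word.lower()
        # Drop very short words and case-insensitive duplicates.
        if len(word) < 3 or key in seen:
            continue
        seen.add(key)
        keywords.append(word)
        if len(keywords) >= max_keywords:
            break
    return keywords
```

Because the text is never split on newlines, a PDF extracted as one 5000-char line still yields keywords, which is the failure case the old splitter hit.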