fix: address 7 CodeQL high-severity alerts#3
Merged
Merged
Conversation
Clean sweep of the security scan. Each fix is defensive — none of these were exploitable in deepdive's context (search snippets aren't HTML-rendered; URLs are not attacker-controlled sizes) — but silencing CodeQL matters for the paste-able health signal, and the new implementations are genuinely better. Alerts fixed: - #4, #5, #6, #7 js/polynomial-redos — `/\/+$/` and `/#.*$/` end-anchored patterns replaced with non-regex string walks in a new src/url-util.ts (trimTrailingSlashes, stripHashFragment, dedupeKey). Provably linear-time, with a pathological-input test that asserts sub-100ms runtime on 100k trailing slashes. - #3 js/incomplete-url-substring-sanitization — in the DDG redirect unwrap, `u.hostname.endsWith("duckduckgo.com")` would also match `evil-duckduckgo.com`. Tightened to `hostname === "duckduckgo.com" || hostname.endsWith(".duckduckgo.com")`. - #2 js/double-escaping — decodeHtmlEntities was chaining 8 sequential `.replace()` calls, so `&#39;` would double-decode: first pass produces `'`, second pass expands to `'`. Replaced with a single-pass tokenizer that resolves each `&...;` exactly once, so `&#39;` now correctly stays as the literal `'`. Also added support for `'` and `&#xHEX;` forms while I was in there. - #1 js/incomplete-multi-character-sanitization — stripTags matched `/<[^>]+>/g`, which leaves malformed partials like `<scrip` (no closing `>`) in the output. Now does a two-step strip: well-formed tags replaced with a space, then any remaining `<` is also replaced with a space, so no tag-opener character can ever leak downstream. Test coverage added: - test/url-util.test.mjs — 10 assertions including a linear-time assert - test/html-sanitize.test.mjs — 10 assertions; the `<scrip` partial and the double-decode regression are both pinned. 116 tests total, all pass (up from 96).
3 tasks
askalf
added a commit
that referenced
this pull request
Apr 23, 2026
) CodeQL flagged `baseUrl.replace(/\/+$/, "")` in doctor.ts as polynomial- ReDoS after PR #4 landed doctor.ts. Same class as the original 7 fixed in #3; the regex is actually benign (no nested repetition) but using the shared trimTrailingSlashes helper from url-util.ts keeps the pattern consistent and CodeQL's query clean. Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Clean sweep of the 7 open CodeQL alerts. None were exploitable in context (search snippets aren't HTML-rendered, URLs aren't attacker-controlled sizes) but silencing CodeQL matters for the paste-able health signal, and the new implementations are genuinely better.
Alert-by-alert
js/polynomial-redosllm.ts,search.ts,searxng.ts/\/+$/and/#.*$/replaced with non-regex string walks in a newsrc/url-util.ts(trimTrailingSlashes,stripHashFragment,dedupeKey). Provably linear-time.js/incomplete-url-substring-sanitizationsearch/duckduckgo.tshostname.endsWith('duckduckgo.com')would matchevil-duckduckgo.com. Tightened to=== 'duckduckgo.com' || endsWith('.duckduckgo.com').js/double-escapingsearch/duckduckgo.ts.replace()calls meant&#39;would double-decode to'. Replaced with a single-pass tokenizer — each&...;resolved exactly once.js/incomplete-multi-character-sanitizationsearch/duckduckgo.tsstripTagsleft malformed partials like<scrip(no closing>) intact. Now strips well-formed tags, then drops any remaining<so no tag-opener can leak.Test plan
npm run build— clean under strict: truenpm test— 116 pass, 0 fail (up from 96)test/url-util.test.mjs— 10 assertions including a linear-time assertion on 100k-char pathological input (sub-100ms)test/html-sanitize.test.mjs— 10 assertions; the<scrippartial and the&#39;double-decode regressions are both pinned