Update search engine requests and result parsing #17

Open
fccview wants to merge 2 commits into main from develop

@fccview fccview commented Jun 27, 2026

Collaborator

Summary by CodeRabbit

  • New Features
    • Wikipedia now builds result links using the selected language.
    • Safer-search behavior is applied more consistently across Bing and DuckDuckGo (including Images and News) using cookie-based state.
  • Bug Fixes
    • Improved Ecosia result parsing and URL validation.
    • Startpage now detects CAPTCHA/anti-bot pages and stops early with a clearer error.
    • Reddit time filters now reliably fall back to “all” for unsupported values.
    • Improved resilience when warmed requests are blocked (more reliable fallback behavior).
  • Chores
    • Updated bundled component versions.

coderabbitai Bot commented Jun 27, 2026

Walkthrough

The PR updates Bing, DuckDuckGo, and DuckDuckGo Images requests to send safe-search state through query parameters and cookies, adds a language-derived Wikipedia host, expands Ecosia result parsing, adds User-Agent headers to several engines, adds CAPTCHA detection to Startpage, tightens Reddit time mapping, updates lolcat-4play cache and warmup keying, changes warmed fetch fallback behavior, and bumps engine versions in package.json.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is concise and accurately reflects the main changes to search engine requests and result handling across the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot changed the title from "@coderabbitai" to "Update search engine requests and result parsing" on Jun 27, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@engines/bing-videos/index.js`:
- Around line 60-61: Normalize safeSearch in the Bing video request builder
before using it, so invalid or persisted values do not diverge between the query
string and cookie. Update the logic in the request path that uses
this.safeSearch and adltCookie to first map safeSearch to the same
strict/moderate/off values used by the other Bing engines, then use that
normalized value consistently for both adlt and SRCHHPGUSR in the BingVideos
implementation.

In `@engines/duckduckgo-images/index.js`:
- Around line 103-114: The _region() helper in
engines/duckduckgo-images/index.js is generating invalid DuckDuckGo region
values like en-en when context.lang is English. Update _region() to special-case
English or map context.lang through a supported DDG country-language pair before
_headers() uses it for the l parameter and Cookie, while keeping the existing
fallback behavior for missing context.lang.

In `@engines/duckduckgo-news/index.js`:
- Around line 61-68: The safe-search mapping in the duckduckgo-news engine is
collapsing off and moderate to the same value, so users choosing off still send
the moderate setting. Update the mapping used in the request builder so the
equivalent of safeSearch handling in this engine distinguishes off from
moderate, and apply the same correction anywhere the `p` cookie/query parameter
is assembled in the affected block(s). Use the `safeMap`, `safeSearch`, and
request header/query construction logic to ensure `off` no longer resolves to
the moderate value.

In `@engines/duckduckgo/index.js`:
- Line 5: Add the explicit off safe-search state to SAFE_SEARCH_MAP so
DuckDuckGo requests can propagate the default setting correctly. Update the
mapping used by the safe-search resolution in the engine’s main request-building
flow so safeSearch = "off" resolves to a valid value instead of undefined,
ensuring the kp parameter and Cookie are set consistently in the logic around
the request builder and related safe-search handling paths.

In `@engines/ecosia/index.js`:
- Around line 75-76: The hostname check in the Ecosia URL handling is too loose
because `endsWith("ecosia.org")` also matches unrelated domains. Update the
guard in the `new URL(href)` / `parsed.hostname` logic to verify the registrable
domain boundary for Ecosia instead of a simple suffix match, so only true
`ecosia.org` hosts are skipped.

In `@engines/hacker-news/index.js`:
- Around line 53-56: The Hacker News request headers are using
context?.userAgent?.() with nullish coalescing, so an empty string still gets
sent as User-Agent instead of falling back. Update the header selection in the
request builder to treat blank results from context.userAgent() as missing, and
use the default UA when the returned value is empty; keep the change localized
to the headers block in the HN fetch path.

In `@engines/internet-archive/index.js`:
- Around line 38-42: The dynamic User-Agent fallback in the Internet Archive
fetch setup should use a truthy check instead of preserving empty strings.
Update the headers construction in the doFetch call inside the internet-archive
handler so the context?.userAgent?.() value only wins when it is non-empty, and
otherwise the default Mozilla-style string is used.

In `@engines/nasa-images/index.js`:
- Line 31: The NASA images request header currently uses `context?.userAgent?.()
?? "Mozilla/5.0 (compatible; degoog/1.0)"`, which still passes through an empty
string from `context.userAgent()`. Update the UA fallback logic in
`engines/nasa-images/index.js` so the default value is used when
`context.userAgent()` returns a blank string as well as when it is nullish,
keeping the fallback behavior in the same header construction site.

In `@engines/openverse/index.js`:
- Line 50: The Openverse header builder is allowing an empty User-Agent through
because `context.userAgent()` is only guarded with `??`, so `""` bypasses the
fallback. Update the logic in `engines/openverse/index.js` to normalize blank
values before constructing the headers, ensuring the `User-Agent` falls back to
the default when `context.userAgent()` returns an empty string.

In `@engines/the-guardian/index.js`:
- Around line 94-97: The User-Agent fallback in the request headers is not
applied when context.userAgent() returns an empty string because the current
nullish check treats it as valid. Update the header construction in the Guardian
request code to use a truthy fallback so empty UA strings also resolve to the
default Mozilla-compatible value, and keep the logic centered around the
existing context.userAgent() call in the headers block.

In `@engines/wikipedia/index.js`:
- Around line 6-12: The host selection in `_wikiHost` is too restrictive because
`LANG_RE` only accepts 2–3 letter codes, so valid Wikipedia identifiers like
`simple` or locale-tagged values such as `pt-BR` fall back to English. Update
the language normalization logic in `engines/wikipedia/index.js` to map
`context.lang` to a supported Wikipedia host code using an allowlist or explicit
normalization step, and make `_wikiHost` use that normalized value so these
inputs resolve to the correct Wikipedia domain instead of `en.wikipedia.org`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b9c369de-a493-48a7-b18a-3716349bc72f

📥 Commits

Reviewing files that changed from the base of the PR and between 04bfd13 and 267d88d.

📒 Files selected for processing (16)
  • engines/bing-images/index.js
  • engines/bing-videos/index.js
  • engines/bing/index.js
  • engines/duckduckgo-images/index.js
  • engines/duckduckgo-news/index.js
  • engines/duckduckgo/index.js
  • engines/ecosia/index.js
  • engines/hacker-news/index.js
  • engines/internet-archive/index.js
  • engines/nasa-images/index.js
  • engines/openverse/index.js
  • engines/reddit/index.js
  • engines/startpage/index.js
  • engines/the-guardian/index.js
  • engines/wikipedia/index.js
  • package.json

Comment on lines 60 to +61
if (this.safeSearch !== "off") url += `&adlt=${this.safeSearch}`;
const adltCookie = { strict: "STRICT", moderate: "DEMOTE", off: "OFF" }[this.safeSearch] ?? "OFF";

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Normalize safeSearch before building the Bing request.

configure() accepts any string, but this branch sends this.safeSearch straight into adlt while the cookie path falls back unknown values to OFF. That makes the two signals diverge for invalid persisted/programmatic values (adlt=foo + SRCHHPGUSR=ADLT=OFF), unlike engines/bing/index.js and engines/bing-images/index.js, which normalize first.

Proposed fix
-    if (this.safeSearch !== "off") url += `&adlt=${this.safeSearch}`;
-    const adltCookie = { strict: "STRICT", moderate: "DEMOTE", off: "OFF" }[this.safeSearch] ?? "OFF";
+    const adlt =
+      this.safeSearch === "strict" || this.safeSearch === "moderate"
+        ? this.safeSearch
+        : "off";
+    if (adlt !== "off") url += `&adlt=${adlt}`;
+    const adltCookie = { strict: "STRICT", moderate: "DEMOTE", off: "OFF" }[adlt] ?? "OFF";
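
The same idea can be sketched as a standalone helper. This is only an illustration: the function name `buildBingSafeSearch` and its return shape are assumptions, not code from the PR, and the cookie string follows the `SRCHHPGUSR=ADLT=...` form quoted in the comment above (the real cookie may carry other fields). The value is normalized once, and both the `adlt` query suffix and the cookie are derived from that single normalized value, so they cannot diverge.

```javascript
// Hypothetical helper for illustration; name and return shape are assumptions.
const ADLT_COOKIE = { strict: "STRICT", moderate: "DEMOTE", off: "OFF" };

function buildBingSafeSearch(safeSearch) {
  // Anything other than "strict" or "moderate" is treated as "off".
  const adlt =
    safeSearch === "strict" || safeSearch === "moderate" ? safeSearch : "off";
  return {
    // Mirrors the PR: the query parameter is only appended when not "off".
    query: adlt !== "off" ? `&adlt=${adlt}` : "",
    cookie: `SRCHHPGUSR=ADLT=${ADLT_COOKIE[adlt]}`,
  };
}
```

With this shape, an invalid persisted value such as "foo" yields an empty query suffix and an OFF cookie, rather than `adlt=foo` paired with `ADLT=OFF`.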
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/bing-videos/index.js` around lines 60 - 61, Normalize safeSearch in
the Bing video request builder before using it, so invalid or persisted values
do not diverge between the query string and cookie. Update the logic in the
request path that uses this.safeSearch and adltCookie to first map safeSearch to
the same strict/moderate/off values used by the other Bing engines, then use
that normalized value consistently for both adlt and SRCHHPGUSR in the
BingVideos implementation.

Comment on lines +103 to 114
_region(context) {
  return context?.lang ? `${context.lang}-${context.lang}` : "us-en";
}

_headers(context, safe, region) {
  return {
    "User-Agent": context?.userAgent?.() ?? FALLBACK_UA,
    Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": context?.buildAcceptLanguage?.() ?? "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    Cookie: `p=${safe}; ah=${region}; l=${region}`,
  };

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify how context.lang is populated and whether a region/locale helper already exists.
rg -n -C3 '\bcontext\?\.lang\b|\bcontext\.lang\b|buildAcceptLanguage|kl|region' .

Repository: degoog-org/official-extensions

Length of output: 28283


🏁 Script executed:

#!/bin/bash
set -euo pipefail

ast-grep outline engines/duckduckgo-images/index.js --view expanded
printf '\n--- duckduckgo-news ---\n'
ast-grep outline engines/duckduckgo-news/index.js --view expanded
printf '\n--- related lang/region helpers ---\n'
rg -n -C2 '\b_buildKl\b|\bcontext\.lang\b|\blang === "en"\b|\bus-en\b|\ben-en\b|\bkl\b|\bl=\b' engines/duckduckgo*.js autocomplete/*.js engines/*/*.js

Repository: degoog-org/official-extensions

Length of output: 4453


🌐 Web query:

DuckDuckGo images region parameter kl values en-en us-en documentation accepted region codes

💡 Result:

DuckDuckGo uses the kl (keyboard language/region) URL parameter to specify the region and language settings for search queries, including image searches [1][2][3]. For standard web searches, you append the kl parameter to the URL, for example: https://duckduckgo.com/?q=search&kl=us-en [1][2].

For image searches, while the parameter is technically kl in the broader search system [3], some third-party API implementations or wrappers may refer to this setting as locale [4] or include it within their own library's region argument [5].

The kl value follows a specific format: a two-letter country code followed by a hyphen and a two-letter language code (e.g., us-en for United States English) [1][6][7]. The setting wt-wt is commonly used to represent "no region" (worldwide) [3][8][7].

Accepted Region Codes

DuckDuckGo supports a wide range of regional codes. Below is a selection of commonly used codes [7]:

  • us-en: United States (English)
  • uk-en: United Kingdom (English)
  • ca-en: Canada (English)
  • ca-fr: Canada (French)
  • fr-fr: France (French)
  • de-de: Germany (German)
  • es-es: Spain (Spanish)
  • jp-jp: Japan (Japanese)
  • wt-wt: No region/Worldwide

A comprehensive list of supported region codes can be found in technical documentation repositories, such as those maintained by search API providers [6][8][7].

Note that while the kl parameter is the standard way to set regions in the URL, some automated scraping tools may require additional parameters like vqd (Validation Query Digest) to successfully query DuckDuckGo's internal endpoints (e.g., i.js for images) [3][9].

Citations:


Avoid emitting en-en for DuckDuckGo Images. DuckDuckGo region values are country-language pairs like us-en; _region() currently turns context.lang === "en" into en-en and sends it through both l and the cookie. Special-case English or map context.lang to a supported DDG region first.
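
One possible shape for that mapping is sketched below. Everything here is an assumption for illustration: the helper name `ddgRegion`, the lookup table, and the choice of `wt-wt` for unmapped languages are not from the PR. The table only uses country-language pairs that appear in the query result above, and the `us-en` fallback for a missing `context.lang` matches the existing `_region()` behavior.

```javascript
// Hypothetical lookup; only pairs listed in the query result above are used.
const DDG_REGION_BY_LANG = new Map([
  ["en", "us-en"],
  ["fr", "fr-fr"],
  ["de", "de-de"],
  ["es", "es-es"],
]);
const DEFAULT_REGION = "us-en"; // existing fallback when no language is set
const NO_REGION = "wt-wt"; // "no region" per the query result (assumed choice)

function ddgRegion(lang) {
  if (!lang) return DEFAULT_REGION;
  // Reduce tags such as "en-GB" to the primary subtag before the lookup.
  const primary = String(lang).toLowerCase().split("-")[0];
  return DDG_REGION_BY_LANG.get(primary) ?? NO_REGION;
}
```

Whether an unmapped language should resolve to `wt-wt` or to `us-en` is a judgment call for the maintainers; the point of the sketch is only that `en` never becomes `en-en`.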

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/duckduckgo-images/index.js` around lines 103 - 114, The _region()
helper in engines/duckduckgo-images/index.js is generating invalid DuckDuckGo
region values like en-en when context.lang is English. Update _region() to
special-case English or map context.lang through a supported DDG
country-language pair before _headers() uses it for the l parameter and Cookie,
while keeping the existing fallback behavior for missing context.lang.

Comment on lines +61 to +68
const safeMap = { off: "-1", moderate: "-1", strict: "1" };
const safe = safeMap[this.safeSearch] ?? "-1";
const headers = {
"User-Agent": ua,
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": acceptLang,
"Accept-Encoding": "gzip, deflate, br",
Cookie: `p=${safe}`,

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t map off to the moderate safe-search value.

Line 61 makes off and moderate both resolve to "-1", so users selecting Safe Search off still send the moderate value in both Cookie: p=... and the p query parameter. This also diverges from the images engine’s SAFE_OFF = "-2" / SAFE_MODERATE = "-1" mapping.

Proposed fix
-    const safeMap = { off: "-1", moderate: "-1", strict: "1" };
+    const safeMap = { off: "-2", moderate: "-1", strict: "1" };
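
As a rough sketch of the corrected lookup, assuming hypothetical helper names (`resolveSafe`, `safeCookie`) that are not in the PR: the values follow the proposed fix and the images engine constants mentioned above, and unknown values keep the existing `"-1"` fallback.

```javascript
// Values follow the proposed fix; helper names are assumptions for illustration.
const SAFE_MAP = new Map([
  ["off", "-2"],
  ["moderate", "-1"],
  ["strict", "1"],
]);

function resolveSafe(safeSearch, fallback = "-1") {
  // Unknown or missing settings keep the engine's current "-1" fallback.
  return SAFE_MAP.get(safeSearch) ?? fallback;
}

function safeCookie(safeSearch) {
  return `p=${resolveSafe(safeSearch)}`;
}
```

Using one resolver for both the cookie and the `p` query parameter keeps the two signals identical for every input.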

Also applies to: 81-88

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/duckduckgo-news/index.js` around lines 61 - 68, The safe-search
mapping in the duckduckgo-news engine is collapsing off and moderate to the same
value, so users choosing off still send the moderate setting. Update the mapping
used in the request builder so the equivalent of safeSearch handling in this
engine distinguishes off from moderate, and apply the same correction anywhere
the `p` cookie/query parameter is assembled in the affected block(s). Use the
`safeMap`, `safeSearch`, and request header/query construction logic to ensure
`off` no longer resolves to the moderate value.

const FALLBACK_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36";

-const SAFE_SEARCH_MAP = { moderate: "-2", strict: "1" };
+const SAFE_SEARCH_MAP = { moderate: "-1", strict: "1" };

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Send the explicit off safe-search state too.

safeSearch = "off" is a supported option, but it is missing from SAFE_SEARCH_MAP, so Line 46 resolves safe to undefined and Lines 50/69 omit both kp and Cookie. Add the off mapping so the default setting is actually propagated.

Proposed fix
-const SAFE_SEARCH_MAP = { moderate: "-1", strict: "1" };
+const SAFE_SEARCH_MAP = { off: "-2", moderate: "-1", strict: "1" };
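
The effect of the missing key can be illustrated with a small, hypothetical request builder. The function name, the return shape, and the `kp` cookie name are assumptions made for this sketch (the engine's lines 46-50 and 69 are not shown in this thread); only the map values come from the diff above.

```javascript
// Hypothetical builder; name, return shape, and cookie name are assumptions.
function buildDdgRequest(query, safeSearch, safeMap) {
  const safe = safeMap.get(safeSearch); // undefined when the key is missing
  const params = new URLSearchParams({ q: query });
  const headers = {};
  if (safe !== undefined) {
    // Both signals are set together, or not at all.
    params.set("kp", safe);
    headers.Cookie = `kp=${safe}`;
  }
  return { query: params.toString(), headers };
}

// Map as it stands in the PR, and with the proposed "off" entry added.
const withoutOff = new Map([["moderate", "-1"], ["strict", "1"]]);
const withOff = new Map([["off", "-2"], ["moderate", "-1"], ["strict", "1"]]);
```

Calling `buildDdgRequest("cats", "off", withoutOff)` produces no `kp` parameter and no cookie, while the same call with `withOff` carries the explicit off state in both places.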

Also applies to: 46-50, 69-69

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/duckduckgo/index.js` at line 5, Add the explicit off safe-search
state to SAFE_SEARCH_MAP so DuckDuckGo requests can propagate the default
setting correctly. Update the mapping used by the safe-search resolution in the
engine’s main request-building flow so safeSearch = "off" resolves to a valid
value instead of undefined, ensuring the kp parameter and Cookie are set
consistently in the logic around the request builder and related safe-search
handling paths.

Comment thread engines/ecosia/index.js
Comment on lines 75 to +76
 const parsed = new URL(href);
-if (parsed.hostname === "www.ecosia.org") return;
+if (parsed.hostname.endsWith("ecosia.org")) return;

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Tighten the Ecosia hostname guard.

endsWith("ecosia.org") also matches unrelated domains like notecosia.org, so this can drop legitimate search results. Match the registrable domain boundary instead.

Suggested fix
-          if (parsed.hostname.endsWith("ecosia.org")) return;
+          if (parsed.hostname === "ecosia.org" || parsed.hostname.endsWith(".ecosia.org")) return;
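
The difference between the two checks is easy to verify in isolation. The helper name `isEcosiaHost` below is an assumption for illustration; the condition itself is the one from the suggested fix.

```javascript
// Hypothetical helper wrapping the condition from the suggested fix.
function isEcosiaHost(hostname) {
  const host = hostname.toLowerCase();
  // Match the apex domain or any subdomain, but not look-alike suffixes.
  return host === "ecosia.org" || host.endsWith(".ecosia.org");
}

// Hostnames are taken from parsed URLs, as in the engine code.
const samples = [
  "https://www.ecosia.org/search?q=test",
  "https://ecosia.org/",
  "https://notecosia.org/article",
  "https://ecosia.org.example.com/",
].map((href) => new URL(href).hostname);

const skipped = samples.filter(isEcosiaHost);
```

Here `skipped` contains only `www.ecosia.org` and `ecosia.org`; a plain `endsWith("ecosia.org")` would also have dropped `notecosia.org`.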
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/ecosia/index.js` around lines 75 - 76, The hostname check in the
Ecosia URL handling is too loose because `endsWith("ecosia.org")` also matches
unrelated domains. Update the guard in the `new URL(href)` / `parsed.hostname`
logic to verify the registrable domain boundary for Ecosia instead of a simple
suffix match, so only true `ecosia.org` hosts are skipped.

Comment on lines 38 to 42
     const response = await doFetch(url, {
       headers: {
-        "User-Agent": "Mozilla/5.0 (compatible; degoog/1.0)",
+        "User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
         Accept: "application/json",
       },

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Use a truthy fallback for the dynamic UA.

Line 40 preserves "" because it uses ??, so a missing upstream UA still results in an empty User-Agent header instead of the fallback string.

Suggested fix
-        "User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
+        "User-Agent": context?.userAgent?.() || "Mozilla/5.0 (compatible; degoog/1.0)",
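
For reference, `??` only falls back on `null` and `undefined`, while `||` falls back on any falsy value, including `""`. A shared helper could make that explicit across the engines that use this pattern. The sketch below is an assumption, not PR code: the name `resolveUserAgent` is hypothetical, and treating whitespace-only strings as missing goes one step beyond the suggested `||` fix.

```javascript
const DEFAULT_UA = "Mozilla/5.0 (compatible; degoog/1.0)";

// Hypothetical helper; `context.userAgent` is the optional callback used in the PR.
function resolveUserAgent(context) {
  const ua = context?.userAgent?.();
  // Treat null, undefined, "" and whitespace-only strings as missing.
  return typeof ua === "string" && ua.trim() !== "" ? ua : DEFAULT_UA;
}

// The operator difference the review comment relies on:
const viaNullish = "" ?? DEFAULT_UA; // stays ""
const viaOr = "" || DEFAULT_UA; // becomes DEFAULT_UA
```

A call such as `resolveUserAgent({ userAgent: () => "" })` returns the default string, and a missing `context` does the same.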
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/internet-archive/index.js` around lines 38 - 42, The dynamic
User-Agent fallback in the Internet Archive fetch setup should use a truthy
check instead of preserving empty strings. Update the headers construction in
the doFetch call inside the internet-archive handler so the
context?.userAgent?.() value only wins when it is non-empty, and otherwise the
default Mozilla-style string is used.

headers: {
Accept: "application/json",
"Accept-Language": context?.buildAcceptLanguage?.() ?? "en,en-US;q=0.9",
"User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Blank upstream UAs still bypass the fallback here.

Line 31 uses ??, so context.userAgent() returning "" sends an empty header value instead of the default UA.

Suggested fix
-          "User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
+          "User-Agent": context?.userAgent?.() || "Mozilla/5.0 (compatible; degoog/1.0)",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/nasa-images/index.js` at line 31, The NASA images request header
currently uses `context?.userAgent?.() ?? "Mozilla/5.0 (compatible;
degoog/1.0)"`, which still passes through an empty string from
`context.userAgent()`. Update the UA fallback logic in
`engines/nasa-images/index.js` so the default value is used when
`context.userAgent()` returns a blank string as well as when it is nullish,
keeping the fallback behavior in the same header construction site.

headers: {
Accept: "application/json",
"Accept-Language": context?.buildAcceptLanguage?.() ?? "en,en-US;q=0.9",
"User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Normalize empty UA values before building headers.

Line 50 uses ??, which means "" from context.userAgent() skips the fallback and leaves the request with a blank User-Agent.

Suggested fix
-          "User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
+          "User-Agent": context?.userAgent?.() || "Mozilla/5.0 (compatible; degoog/1.0)",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/openverse/index.js` at line 50, The Openverse header builder is
allowing an empty User-Agent through because `context.userAgent()` is only
guarded with `??`, so `""` bypasses the fallback. Update the logic in
`engines/openverse/index.js` to normalize blank values before constructing the
headers, ensuring the `User-Agent` falls back to the default when
`context.userAgent()` returns an empty string.

Comment on lines +94 to +97
headers: {
Accept: "application/json",
"User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
},

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

The fallback does not cover empty UA strings.

Line 96 uses ??, so an empty string from context.userAgent() is treated as valid and the request still goes out without the intended fallback UA.

Suggested fix
-          "User-Agent": context?.userAgent?.() ?? "Mozilla/5.0 (compatible; degoog/1.0)",
+          "User-Agent": context?.userAgent?.() || "Mozilla/5.0 (compatible; degoog/1.0)",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/the-guardian/index.js` around lines 94 - 97, The User-Agent fallback
in the request headers is not applied when context.userAgent() returns an empty
string because the current nullish check treats it as valid. Update the header
construction in the Guardian request code to use a truthy fallback so empty UA
strings also resolve to the default Mozilla-compatible value, and keep the logic
centered around the existing context.userAgent() call in the headers block.

Comment on lines +6 to +12
const LANG_RE = /^[a-z]{2,3}$/;

const _wikiHost = (lang) => {
  if (lang && LANG_RE.test(lang)) return `${lang}.wikipedia.org`;
  return "en.wikipedia.org";
};

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the repository contract for context.lang and look for locale-style values.
rg -n -C2 --type=js '\bcontext\??\.lang\b' .
rg -n -C2 --type=js '\blang\b' engines

Repository: degoog-org/official-extensions

Length of output: 22016


LANG_RE excludes valid Wikipedia language hosts.

simple and locale-tagged inputs like pt-BR fall back to en.wikipedia.org, so language-aware routing is lost for those cases. Normalize context.lang to a Wikipedia host code or use an allowlist instead of this regex.
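
A minimal sketch of such a normalization step follows. The helper name `wikiHost` and the `EXTRA_WIKI_HOSTS` allowlist are assumptions for illustration, and the allowlist contains only `simple` because that is the one example named above. Note that the regex still accepts any 2-3 letter code without checking that a Wikipedia edition exists for it; a full allowlist would be stricter.

```javascript
const LANG_RE = /^[a-z]{2,3}$/;
// Hypothetical allowlist of host codes that are not 2-3 letter codes.
const EXTRA_WIKI_HOSTS = new Set(["simple"]);

function wikiHost(lang) {
  if (typeof lang !== "string" || lang === "") return "en.wikipedia.org";
  const lower = lang.toLowerCase();
  if (EXTRA_WIKI_HOSTS.has(lower)) return `${lower}.wikipedia.org`;
  // Reduce locale tags such as "pt-BR" or "pt_BR" to the primary subtag "pt".
  const primary = lower.split(/[-_]/)[0];
  return LANG_RE.test(primary) ? `${primary}.wikipedia.org` : "en.wikipedia.org";
}
```

With this shape, `pt-BR` resolves to `pt.wikipedia.org` and `simple` to `simple.wikipedia.org`, while missing or malformed values keep the English fallback.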

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engines/wikipedia/index.js` around lines 6 - 12, The host selection in
`_wikiHost` is too restrictive because `LANG_RE` only accepts 2–3 letter codes,
so valid Wikipedia identifiers like `simple` or locale-tagged values such as
`pt-BR` fall back to English. Update the language normalization logic in
`engines/wikipedia/index.js` to map `context.lang` to a supported Wikipedia host
code using an allowlist or explicit normalization step, and make `_wikiHost` use
that normalized value so these inputs resolve to the correct Wikipedia domain
instead of `en.wikipedia.org`.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@transports/lolcat-4play/index.js`:
- Around line 394-420: The `_loadCookieJar` path is preferring the shared
`_cookieCache` over the fresher in-memory jar, which can let stale session state
override recent writes from `_persistCookieJar`. Update `_loadCookieJar` to
check `_cookieJarTextSessions` first for the `memKey` and return that value when
present, falling back to `_cookieCache` only if no in-memory jar exists. Keep
the existing error handling and use the same `cookieJarKeyFor`, `_memKey`, and
`_cacheKey` symbols so the live cookie state always wins within the process.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b20920d4-315d-44fb-a79c-9a076350aeef

📥 Commits

Reviewing files that changed from the base of the PR and between 267d88d and eb3f45c.

📒 Files selected for processing (2)
  • package.json
  • transports/lolcat-4play/index.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • package.json

Comment on lines 394 to +420
   async _loadCookieJar(origin, containerId) {
-    const key = cookieJarKeyFor(origin, this._containerConfigKey || "default");
+    const memKey = cookieJarKeyFor(origin, this._memKey(containerId));
+    const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
     if (this._cookieCache) {
       try {
-        const cached = await this._cookieCache.get(key);
+        const cached = await this._cookieCache.get(cacheKey);
         if (cached) return cached;
       } catch (error) {
         console.warn(
           `[lolcat-4play] failed to read cookie jar for ${origin}: ${error?.message || error}`,
         );
       }
     }
-    return this._cookieJarTextSessions.get(key) || null;
+    return this._cookieJarTextSessions.get(memKey) || null;
   }
 
   async _loadSessionFromCache(origin, containerId) {
-    const key = cookieJarKeyFor(origin, this._containerConfigKey || "default");
-    const warmupKey = warmupKeyFor(origin, this._containerConfigKey || "default");
+    const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
+    const warmupKey = warmupKeyFor(origin, this._memKey(containerId));
     if (this._browserHeaderSessions.has(warmupKey)) {
       return;
     }
 
     if (this._cookieCache) {
       try {
-        const cachedCookies = await this._cookieCache.get(key);
-        const cachedHeadersJson = await this._cookieCache.get(key + ":headers");
+        const cachedCookies = await this._cookieCache.get(cacheKey);
+        const cachedHeadersJson = await this._cookieCache.get(cacheKey + ":headers");

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Prefer the live in-memory cookie jar before shared cache.

`_persistCookieJar` updates `_cookieJarTextSessions` immediately, but shared cache writes are async and can fail. Since `_loadCookieJar` reads `_cookieCache` first, a stale cached jar can override fresher in-process cookies and keep warmed curl requests using old session state.

Proposed fix
   async _loadCookieJar(origin, containerId) {
     const memKey = cookieJarKeyFor(origin, this._memKey(containerId));
     const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
+    const inMemory = this._cookieJarTextSessions.get(memKey);
+    if (typeof inMemory === "string") return inMemory;
+
     if (this._cookieCache) {
       try {
         const cached = await this._cookieCache.get(cacheKey);
-        if (cached) return cached;
+        if (cached) {
+          this._cookieJarTextSessions.set(memKey, cached);
+          return cached;
+        }
       } catch (error) {
         console.warn(
           `[lolcat-4play] failed to read cookie jar for ${origin}: ${error?.message || error}`,
         );
       }
     }
-    return this._cookieJarTextSessions.get(memKey) || null;
+    return null;
   }
 
   async _loadSessionFromCache(origin, containerId) {
     const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
+    const memKey = cookieJarKeyFor(origin, this._memKey(containerId));
     const warmupKey = warmupKeyFor(origin, this._memKey(containerId));
     if (this._browserHeaderSessions.has(warmupKey)) {
       return;
@@
           };
           this._browserHeaderSessions.set(warmupKey, session);
+          this._cookieJarTextSessions.set(memKey, cachedCookies);
           this._setWarmupState(origin, containerId, { warmedAt: Date.now() });
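
The read order in the proposed fix can be exercised in isolation. The sketch below is a simplified stand-in, not the transport itself: `CookieJarStore`, `staleCache`, and `demo` are hypothetical names, and a single key is used where the real code distinguishes `memKey` from `cacheKey`. It assumes only that the shared cache exposes async `get` and `set` methods.

```javascript
// Hypothetical stand-in for the transport; only the read order mirrors the review.
class CookieJarStore {
  constructor(sharedCache) {
    this.memory = new Map();        // plays the role of _cookieJarTextSessions
    this.sharedCache = sharedCache; // plays the role of _cookieCache (may be null)
  }

  async persist(key, jarText) {
    this.memory.set(key, jarText);  // synchronous, always succeeds
    try {
      await this.sharedCache?.set(key, jarText);
    } catch {
      // Shared write failed; the in-memory copy is still the freshest state.
    }
  }

  async load(key) {
    const inMemory = this.memory.get(key);
    if (typeof inMemory === "string") return inMemory; // live state wins
    try {
      const cached = await this.sharedCache?.get(key);
      if (cached) {
        this.memory.set(key, cached); // hydrate memory for later reads
        return cached;
      }
    } catch (error) {
      console.warn(`failed to read cookie jar: ${error?.message || error}`);
    }
    return null;
  }
}

// A shared cache that holds a stale jar and rejects every write (assumption for the demo).
const staleCache = {
  async get() { return "session=old"; },
  async set() { throw new Error("cache unavailable"); },
};

async function demo() {
  const store = new CookieJarStore(staleCache);
  await store.persist("https://example.com", "session=new");
  return store.load("https://example.com"); // resolves to "session=new"
}
```

With the cache-first order from the current code, the same sequence would return `session=old`, which is the stale-session case the comment describes.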
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  async _loadCookieJar(origin, containerId) {
    const memKey = cookieJarKeyFor(origin, this._memKey(containerId));
    const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
    const inMemory = this._cookieJarTextSessions.get(memKey);
    if (typeof inMemory === "string") return inMemory;

    if (this._cookieCache) {
      try {
        const cached = await this._cookieCache.get(cacheKey);
        if (cached) {
          this._cookieJarTextSessions.set(memKey, cached);
          return cached;
        }
      } catch (error) {
        console.warn(
          `[lolcat-4play] failed to read cookie jar for ${origin}: ${error?.message || error}`,
        );
      }
    }
    return null;
  }

  async _loadSessionFromCache(origin, containerId) {
    const cacheKey = cookieJarKeyFor(origin, this._cacheKey());
    const memKey = cookieJarKeyFor(origin, this._memKey(containerId));
    const warmupKey = warmupKeyFor(origin, this._memKey(containerId));
    if (this._browserHeaderSessions.has(warmupKey)) {
      return;
    }

    if (this._cookieCache) {
      try {
        const cachedCookies = await this._cookieCache.get(cacheKey);
        const cachedHeadersJson = await this._cookieCache.get(cacheKey + ":headers");