feat: add CRW as unified alternative to Tavily, Serper, and Firecrawl#3066
us wants to merge 2 commits into kortix-ai:main from
CRW (fastcrw.com) provides web search, image search, and web scraping through a single API, replacing three separate providers. All existing providers remain fully functional as fallbacks; nothing is removed.
API layer:
- Add CRW_API_KEY and CRW_API_URL to config schema
- Register CRW as a tool provider in the provider registry
- Add CRW proxy service with 3-mode auth (managed/passthrough-billing/passthrough)
- Add CRW route in proxy router
Sandbox env injection:
- Conditionally inject CRW_API_URL into sandboxes only when CRW is configured
- Write empty string when CRW is disabled to clear stale values from pool sandboxes
- Add CRW injection to all providers: pool, local-docker, daytona, justavps
Sandbox tools (web_search, image_search, scrape_webpage):
- Add resolveAuth() with priority: direct CRW key → router proxy CRW → legacy provider
- Never send raw CRW_API_KEY to router proxy (isRouterProxy guard)
- Add automatic fallback to legacy providers when CRW returns errors
- Scrape: retry individual failed URLs in batch mode, not just all-or-nothing
- Search: preserve Tavily answer/images fields for mobile UI
Runtime (getEnv):
- Make s6 env dir authoritative: empty file means key was explicitly cleared, preventing fallthrough to stale process.env values
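The resolveAuth() priority listed above (direct CRW key → router proxy CRW → legacy provider) could look roughly like the sketch below. This is not the code from the PR: the Auth shape, the ASSUMED_CRW_DIRECT_URL placeholder, and passing isRouterProxy in as a predicate are all assumptions made for illustration. Only the env variable names and the ordering come from the commit message.

```typescript
type Env = Record<string, string | undefined>;

type Auth =
  | { provider: "crw"; mode: "direct"; baseUrl: string; token: string }
  | { provider: "crw"; mode: "proxy"; baseUrl: string; token: string }
  | { provider: "legacy" };

// Hypothetical placeholder: the real CRW endpoint is not given in this PR text.
const ASSUMED_CRW_DIRECT_URL = "https://crw.example.invalid";

function resolveAuth(env: Env, isRouterProxy: (url: string) => boolean): Auth {
  const crwKey = env["CRW_API_KEY"];
  const crwUrl = env["CRW_API_URL"];
  const kortixToken = env["KORTIX_TOKEN"];

  // 1. Direct key: talk to CRW itself. If CRW_API_URL points at the router
  //    proxy, ignore it so the raw key is never sent to the proxy.
  if (crwKey) {
    const baseUrl =
      crwUrl && !isRouterProxy(crwUrl) ? crwUrl : ASSUMED_CRW_DIRECT_URL;
    return { provider: "crw", mode: "direct", baseUrl, token: crwKey };
  }

  // 2. Router proxy: authenticate with the sandbox token, not a CRW key.
  if (crwUrl && kortixToken) {
    return { provider: "crw", mode: "proxy", baseUrl: crwUrl, token: kortixToken };
  }

  // 3. Nothing CRW-related is configured: use Tavily / Serper / Firecrawl.
  return { provider: "legacy" };
}
```

Taking the proxy check as a parameter keeps the sketch neutral about how the real isRouterProxy guard recognises a proxy URL, which the commit message does not say.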
Pull request overview
Adds CRW (fastcrw.com) as a unified web search / image search / scraping provider, with router-proxy support and legacy-provider fallbacks retained.
Changes:
- Added CRW configuration (API key + URL), provider registry entry, proxy service definition, and proxy response header sanitization.
- Updated sandbox env injection to conditionally inject/clear CRW_API_URL across pool + platform providers; updated sandbox tools to be CRW-first with fallbacks.
- Adjusted tool/runtime env resolution so an empty s6 env file is authoritative (prevents falling through to stale process.env).
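As an illustration of the inject/clear behaviour described in the list above, here is a minimal sketch. The BackendConfig shape and the crwSandboxEnv name are hypothetical, and it assumes the value injected into sandboxes is the router-proxy URL for CRW; the PR text only states that CRW_API_URL is set when CRW is configured and written as an empty string otherwise.

```typescript
// Hypothetical config shape; field names are not taken from the PR.
interface BackendConfig {
  crwApiKey?: string;  // backend-side CRW key, stays on the backend
  crwProxyUrl: string; // assumed: URL sandboxes use to reach CRW via the router
}

// Returns the env entries to write into a sandbox. The key is always
// present: an empty string overwrites a stale URL left in a pooled sandbox.
function crwSandboxEnv(config: BackendConfig): Record<string, string> {
  const enabled = Boolean(config.crwApiKey && config.crwApiKey.trim() !== "");
  return { CRW_API_URL: enabled ? config.crwProxyUrl : "" };
}
```

Always emitting the key, rather than omitting it when CRW is disabled, is what lets a reused pool sandbox drop a value injected during an earlier configuration.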
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/setup-env.sh | Generates CRW env entries (and adds Anthropic key) in generated apps/api/.env. |
| core/kortix-master/opencode/tools/web_search.ts | Switches web search tool to CRW-first with Tavily fallback and unified output formatting. |
| core/kortix-master/opencode/tools/image_search.ts | Switches image search tool to CRW-first with Serper fallback; removes prior Replicate enrichment path. |
| core/kortix-master/opencode/tools/scrape_webpage.ts | Switches scraping tool to CRW-first with per-URL Firecrawl fallback retries. |
| core/kortix-master/opencode/tools/lib/get-env.ts | Makes s6 env dir authoritative even for empty values (prevents stale fallthrough). |
| core/docker/.env.example | Documents CRW_API_KEY and optional CRW_API_URL. |
| apps/api/src/config.ts | Adds CRW env schema + config exports; adds proxy_crw tool pricing. |
| apps/api/src/providers/registry.ts | Registers CRW as a tool provider for settings/UI visibility. |
| apps/api/src/router/config/proxy-services.ts | Adds CRW proxy service config and allowed routes; keeps legacy providers. |
| apps/api/src/router/routes/proxy.ts | Strips transport/content-encoding headers on proxy responses to avoid double-decompression issues. |
| apps/api/src/router/services/tavily.ts | Prefers CRW for web search with Tavily fallback; adds CRW search implementation. |
| apps/api/src/router/services/serper.ts | Prefers CRW for image search with Serper fallback; preserves safe-search behavior via Serper when explicitly off. |
| apps/api/src/pool/env-injector.ts | Conditionally injects or clears CRW_API_URL into pool sandboxes based on backend CRW key presence. |
| apps/api/src/platform/providers/local-docker.ts | Conditionally injects/clears CRW_API_URL and includes it in sync lists/env generation. |
| apps/api/src/platform/providers/justavps.ts | Conditionally injects CRW_API_URL into JustAVPS sandboxes. |
| apps/api/src/platform/providers/daytona.ts | Conditionally injects CRW_API_URL into Daytona sandboxes. |
| apps/api/src/tests/unit-proxy-services.test.ts | Updates proxy service registry expectations to include CRW (and existing LLM services). |
| apps/api/.env.example | Documents CRW_API_KEY / CRW_API_URL in the API env example. |
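The get-env.ts row above ("s6 env dir authoritative even for empty values") can be sketched as follows. This is an assumption-laden illustration, not the file's actual contents: the readS6File callback stands in for reading one file per variable from the s6 env directory, and returning undefined for a cleared key is a guess at the intended contract.

```typescript
// Stand-in for "read <s6 env dir>/<name>": returns the file contents,
// or undefined when no such file exists. Hypothetical signature.
type S6Reader = (name: string) => string | undefined;

function getEnv(
  name: string,
  readS6File: S6Reader,
  processEnv: Record<string, string | undefined>,
): string | undefined {
  const fromS6 = readS6File(name);

  // A file exists, so the s6 dir decides, even if the file is empty.
  if (fromS6 !== undefined) {
    const value = fromS6.trim();
    // Empty file: the key was explicitly cleared. Do NOT fall through.
    return value === "" ? undefined : value;
  }

  // No file at all: only now is process.env consulted.
  return processEnv[name];
}
```

The distinction that matters is between "file missing" (fall back to process.env) and "file present but empty" (treated as cleared), which is what stops a tool from picking up an old CRW_API_URL.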
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 81c35f8. Configure here.
      }
      return result;
    }
    return await searchTavily(q, auth, maxResults, topic, isAdvanced);
CRW network errors bypass fallback to legacy providers
Medium Severity
When searchCrw or searchImagesCrw throws an exception (e.g., DNS failure, connection refused), the outer catch (e) returns an error without attempting the fallback provider. The fallback logic only triggers when the CRW function returns an error object (HTTP-level errors like 503), not when it throws (network-level errors). In contrast, scrape_webpage.ts handles this correctly — scrapeOne catches exceptions internally and always returns a result, so the aggregate fallback retries all failure modes. This inconsistency means web search and image search won't fall back to Tavily/Serper on network errors, even when those providers are available.
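One possible way to close the gap described above is to convert a thrown exception into the same error shape that HTTP-level failures already produce, so both paths reach the fallback. The sketch below uses hypothetical SearchResult and SearchFn types and a generic searchWithFallback helper; it is not the fix applied in the PR, only an illustration of the pattern scrapeOne is said to follow.

```typescript
type SearchResult =
  | { ok: true; results: string[] }
  | { ok: false; error: string };

type SearchFn = (query: string) => Promise<SearchResult>;

function toMessage(e: unknown): string {
  return e instanceof Error ? e.message : String(e);
}

async function searchWithFallback(
  query: string,
  primary: SearchFn,   // e.g. a CRW search call
  fallback?: SearchFn, // e.g. Tavily or Serper, if configured
): Promise<SearchResult> {
  let first: SearchResult;
  try {
    first = await primary(query);
  } catch (e) {
    // Network-level failure (DNS, connection refused): normalise it so it
    // is handled exactly like an HTTP-level error object.
    first = { ok: false, error: toMessage(e) };
  }

  if (first.ok || !fallback) return first;

  try {
    return await fallback(query);
  } catch (e) {
    return { ok: false, error: toMessage(e) };
  }
}
```

With this shape the outer catch in the tool no longer needs to know which provider failed or how.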
- image_search: restore Replicate/Moondream2 enrichment (enrich arg, describeImage, enrichImages) that was accidentally dropped
- scrape_webpage: remove unreachable duplicate crwKey branch in resolveAuth


Summary
Add CRW (web search, image search, scraping) as a single-API alternative to three separate providers (Tavily, Serper, Firecrawl). All existing providers remain fully functional as fallbacks — nothing is removed or broken.
Why CRW?
What changed
API layer (apps/api/)
- config.ts: Added CRW_API_KEY and CRW_API_URL to the config schema
- providers/registry.ts: Registered CRW as a tool provider (visible in settings UI)
- router/config/proxy-services.ts: Added CRW proxy service definition with 3-mode auth
- router/routes/proxy.ts: Added CRW route to the proxy router
Sandbox env injection (apps/api/src/platform/, apps/api/src/pool/)
- Conditionally injects CRW_API_URL into sandboxes only when CRW_API_KEY is configured on the backend
Sandbox tools (core/kortix-master/opencode/tools/)
- web_search.ts: CRW-first with Tavily fallback; preserves answer and images fields for mobile UI
- image_search.ts: CRW-first with Serper fallback; safe_search routing (Serper when explicit off)
- scrape_webpage.ts: CRW-first with Firecrawl fallback; per-URL retry in batch mode (not all-or-nothing)
resolveAuth() pattern:
- CRW_API_KEY → hit CRW directly (never sends raw key to router proxy)
- CRW_API_URL → use KORTIX_TOKEN via proxy
Runtime fix (core/kortix-master/opencode/tools/lib/get-env.ts)
- Makes the s6 env dir authoritative: an empty file means the key was explicitly cleared, which prevents falling through to stale process.env values. This is critical for the conditional CRW injection: when CRW is disabled, the empty s6 file must stop the tool from picking up an old URL.
Config / setup
- .env.example files updated with CRW_API_KEY / CRW_API_URL
- scripts/setup-env.sh includes CRW in generated env files
How it works (architecture)
Zero-risk integration
- If CRW_API_KEY is not set, everything works exactly as before
Test plan
- … /v1/providers response
- … CRW_API_URL visible in Secrets Manager UI
Note
Medium Risk
Introduces a new proxied provider and updates sandbox tool request routing/env injection, which could affect production tool traffic and billing if misconfigured. Includes a proxy response header handling change that impacts all proxied upstreams.
Overview
Adds CRW as a unified tool provider for web search, image search, and scraping, wired through the router proxy with allowlisted endpoints, API key injection, and a new proxy_crw billing price.
Updates sandbox provisioning/env injection (daytona/justavps/local-docker/pool) to only set CRW_API_URL when CRW_API_KEY is configured and to clear stale values when disabled, and registers CRW in the provider registry and env examples/setup.
Switches sandbox OpenCode tools (web_search, image_search, scrape_webpage) to prefer CRW with legacy-provider fallbacks, and adjusts getEnv() so an empty s6 env file is treated as an explicit clear to prevent falling back to stale process.env.
Fixes the proxy router to strip transport headers (content-encoding, content-length, transfer-encoding) from upstream responses to avoid decompression-related client errors.
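The proxy header fix mentioned in this overview could be approximated as below. The sketch works on a plain header map and uses a hypothetical sanitizeUpstreamHeaders name, so it does not reflect whichever HTTP framework or header type the router actually uses. The likely motivation, stated here as an assumption, is that the proxy's HTTP client has already decoded the upstream body, so forwarding the original encoding and length headers would describe bytes the client never receives.

```typescript
// Header names taken from the PR summary; compared case-insensitively.
const STRIPPED_HEADERS = new Set([
  "content-encoding",
  "content-length",
  "transfer-encoding",
]);

// Returns a copy of the upstream headers without the transport-level ones.
function sanitizeUpstreamHeaders(
  upstream: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(upstream)) {
    if (!STRIPPED_HEADERS.has(name.toLowerCase())) {
      out[name] = value;
    }
  }
  return out;
}
```

Because the filter is applied to every proxied upstream and not only CRW, a change like this affects all providers behind the router, which matches the risk note in the summary.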