feat: add CRW as unified alternative to Tavily, Serper, and Firecrawl #3066

Open
us wants to merge 2 commits into kortix-ai:main from us:feat/crw-provider-integration
Conversation

us commented Apr 13, 2026

Summary

Add CRW (web search, image search, scraping) as a single-API alternative to three separate providers (Tavily, Serper, Firecrawl). All existing providers remain fully functional as fallbacks — nothing is removed or broken.

Why CRW?

| Capability | Before (3 providers) | After (1 provider + 3 fallbacks) |
|---|---|---|
| Web search | Tavily | CRW → Tavily fallback |
| Image search | Serper | CRW → Serper fallback |
| Web scraping | Firecrawl | CRW → Firecrawl fallback |
| API keys needed | 3 separate keys | 1 key (or 0 if using router proxy) |

What changed

API layer (apps/api/)

  • config.ts — Added CRW_API_KEY and CRW_API_URL to the config schema
  • providers/registry.ts — Registered CRW as a tool provider (visible in settings UI)
  • router/config/proxy-services.ts — Added CRW proxy service definition with 3-mode auth
  • router/routes/proxy.ts — Added CRW route to the proxy router

Sandbox env injection (apps/api/src/platform/, apps/api/src/pool/)

  • Conditionally inject CRW_API_URL into sandboxes only when CRW_API_KEY is configured on the backend
  • Write empty string when CRW is disabled → clears stale values from pool sandboxes
  • Applied consistently across all 4 providers: pool (env-injector), local-docker, daytona, justavps
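
The conditional injection above can be sketched roughly as follows. This is only an illustration: the helper name `buildCrwEnv`, the `BackendConfig` shape, and the `proxyUrl` parameter are assumptions, not code from this PR.

```typescript
// Hypothetical helper: decides which CRW_API_URL value a sandbox receives.
interface BackendConfig {
  CRW_API_KEY?: string; // backend-side key; it is not injected into the sandbox here
}

function buildCrwEnv(config: BackendConfig, proxyUrl: string): Record<string, string> {
  // With a backend key present, point the sandbox at the router proxy.
  // Without one, write an empty string so a stale value in a pooled sandbox is cleared.
  return { CRW_API_URL: config.CRW_API_KEY ? proxyUrl : "" };
}
```

Writing the key with an empty value, rather than omitting it, is what lets a previously provisioned pool sandbox drop an old URL.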

Sandbox tools (core/kortix-master/opencode/tools/)

  • web_search.ts — CRW-first with Tavily fallback; preserves answer and images fields for mobile UI
  • image_search.ts — CRW-first with Serper fallback; requests with safe_search explicitly turned off are routed to Serper
  • scrape_webpage.ts — CRW-first with Firecrawl fallback; per-URL retry in batch mode (not all-or-nothing)
  • All tools share the same resolveAuth() pattern:
    1. Direct CRW_API_KEY → hit CRW directly (never sends raw key to router proxy)
    2. Router proxy CRW_API_URL → use KORTIX_TOKEN via proxy
    3. Legacy provider (Tavily/Serper/Firecrawl) as fallback
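
The three-step priority above could look roughly like this. It is a sketch under assumptions: the `Auth` shape, the `legacyKeyName` parameter, and the way `isRouterProxy` detects a proxy URL (by the `/v1/router/` path segment) are hypothetical; only the ordering and the rule that the raw key never goes to the proxy come from the description.

```typescript
// Hypothetical shapes; the actual resolveAuth() in the PR may differ.
type Auth =
  | { mode: "crw-direct"; baseUrl: string; token: string }
  | { mode: "crw-proxy"; baseUrl: string; token: string }
  | { mode: "legacy"; token: string }
  | { mode: "none" };

type Env = Record<string, string | undefined>;

// The architecture diagram names this URL as the upstream the proxy forwards to.
const CRW_DIRECT_URL = "https://fastcrw.com/api";

// Assumption: a router proxy URL is recognizable by its /v1/router/ path segment.
function isRouterProxy(url: string): boolean {
  return url.includes("/v1/router/");
}

function resolveAuth(env: Env, legacyKeyName: string): Auth {
  const crwKey = env.CRW_API_KEY;
  const crwUrl = env.CRW_API_URL;
  const kortixToken = env.KORTIX_TOKEN;

  // 1. Direct key: always talk to CRW itself so the raw key never reaches the proxy.
  if (crwKey) {
    const baseUrl = crwUrl && !isRouterProxy(crwUrl) ? crwUrl : CRW_DIRECT_URL;
    return { mode: "crw-direct", baseUrl, token: crwKey };
  }
  // 2. Router proxy: authenticate with KORTIX_TOKEN; the backend holds the CRW key.
  if (crwUrl && isRouterProxy(crwUrl) && kortixToken) {
    return { mode: "crw-proxy", baseUrl: crwUrl, token: kortixToken };
  }
  // 3. Legacy provider key (Tavily, Serper, or Firecrawl, depending on the tool).
  const legacyKey = env[legacyKeyName];
  return legacyKey ? { mode: "legacy", token: legacyKey } : { mode: "none" };
}
```

Passing the environment in as a plain object keeps the function easy to test; a real tool would presumably read these values through getEnv().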

Runtime fix (core/kortix-master/opencode/tools/lib/get-env.ts)

  • s6 env dir is now authoritative: an empty file means the key was explicitly cleared, preventing fallthrough to stale process.env values. This is critical for the conditional CRW injection — when CRW is disabled, the empty s6 file must stop the tool from picking up an old URL.
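
A minimal sketch of the lookup order, assuming each source is modeled as a function that returns the stored string (possibly empty) or `undefined` when the key is absent. The `EnvSource` type and the injected-source signature are hypothetical; the real getEnv() reads from disk and process.env directly.

```typescript
// A source returns the stored string (possibly "") or undefined when the key is absent.
type EnvSource = (key: string) => string | undefined;

function getEnv(
  key: string,
  s6Dir: EnvSource,
  processEnv: EnvSource,
  dotEnvFile: EnvSource,
): string | undefined {
  const fromS6 = s6Dir(key);
  if (fromS6 !== undefined) {
    // The s6 file exists, so it is authoritative. An empty file means the key
    // was explicitly cleared: stop here instead of falling through.
    return fromS6 === "" ? undefined : fromS6;
  }
  // No s6 file at all: fall back to process.env, then the .env file.
  return processEnv(key) || dotEnvFile(key) || undefined;
}
```

The important distinction is between "file missing" (keep looking) and "file present but empty" (treat as unset), which a simple truthiness check on the s6 value would blur.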

Config / setup

  • .env.example files updated with CRW_API_KEY / CRW_API_URL
  • scripts/setup-env.sh includes CRW in generated env files

How it works (architecture)

┌─────────────────────────────────────────────┐
│  Sandbox (Docker container)                 │
│                                             │
│  web_search.ts ──► resolveAuth()            │
│    1. CRW_API_KEY set? → direct to CRW      │
│    2. CRW_API_URL is proxy? → via proxy     │
│    3. else → legacy Tavily                  │
│    4. CRW fails? → fallback to Tavily       │
│                                             │
│  getEnv() reads from:                       │
│    s6 env dir (authoritative, always fresh) │
│    → process.env → .env file                │
└──────────────┬──────────────────────────────┘
               │ KORTIX_TOKEN
               ▼
┌──────────────────────────────────────────────┐
│  API Router Proxy (/v1/router/crw/*)         │
│  - Validates KORTIX_TOKEN                    │
│  - Injects CRW_API_KEY from backend config   │
│  - Forwards to https://fastcrw.com/api       │
└──────────────────────────────────────────────┘
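
The proxy box in the diagram can be read as a pure request transformation. The sketch below is an assumption-heavy illustration: the `ProxyRequest` and `ProxyResult` types, the `isValidToken` callback, and the use of a `Bearer` authorization header toward CRW are all hypothetical. Only the route prefix, the upstream URL, and the 401 response for a missing token come from this PR's description and test plan.

```typescript
interface ProxyRequest {
  path: string;                    // e.g. "/v1/router/crw/..."
  headers: Record<string, string>; // header names assumed to be lowercased already
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

type ProxyResult = { ok: true; upstream: UpstreamRequest } | { ok: false; status: 401 };

const PROXY_PREFIX = "/v1/router/crw";
const CRW_UPSTREAM = "https://fastcrw.com/api";

function toUpstream(
  req: ProxyRequest,
  isValidToken: (token: string) => boolean, // stands in for KORTIX_TOKEN validation
  crwApiKey: string,                        // backend-side CRW_API_KEY
): ProxyResult {
  const auth = req.headers["authorization"] ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : "";
  if (!token || !isValidToken(token)) return { ok: false, status: 401 };

  // Drop the router prefix and replace the sandbox token with the backend key.
  const suffix = req.path.startsWith(PROXY_PREFIX) ? req.path.slice(PROXY_PREFIX.length) : req.path;
  return {
    ok: true,
    upstream: {
      url: CRW_UPSTREAM + suffix,
      headers: { ...req.headers, authorization: `Bearer ${crwApiKey}` },
    },
  };
}
```

In this shape the sandbox only ever holds KORTIX_TOKEN; the CRW key is attached on the backend side of the proxy.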

Zero-risk integration

  • No existing behavior changed: If CRW_API_KEY is not set, everything works exactly as before
  • Automatic fallback: If CRW returns an error (503, timeout, etc.), tools automatically retry with the legacy provider
  • Stale env cleanup: Disabling CRW writes an empty s6 file, preventing sandbox tools from using an old proxy URL
  • Pool sandbox safe: env-injector conditionally injects CRW_API_URL; clears it when CRW is not configured

Test plan

  • Unit tests pass (27/27)
  • API health check OK
  • CRW appears in /v1/providers response
  • CRW proxy route responds (204 OPTIONS, 401 unauthorized without token)
  • CRW proxy search works end-to-end (200, 672ms, reaches fastcrw.com)
  • CRW_API_URL visible in Secrets Manager UI
  • Web UI login, dashboard, sessions, settings all work normally
  • Legacy providers (Tavily, Serper, Firecrawl) still available as fallbacks

Note

Medium Risk
Introduces a new proxied provider and updates sandbox tool request routing/env injection, which could affect production tool traffic and billing if misconfigured. Includes a proxy response header handling change that impacts all proxied upstreams.

Overview
Adds CRW as a unified tool provider for web search, image search, and scraping, wired through the router proxy with allowlisted endpoints, API key injection, and a new proxy_crw billing price.

Updates sandbox provisioning/env injection (daytona/justavps/local-docker/pool) to only set CRW_API_URL when CRW_API_KEY is configured and to clear stale values when disabled, and registers CRW in the provider registry and env examples/setup.

Switches sandbox OpenCode tools (web_search, image_search, scrape_webpage) to prefer CRW with legacy-provider fallbacks, and adjusts getEnv() so an empty s6 env file is treated as an explicit clear to prevent falling back to stale process.env.

Fixes the proxy router to strip transport headers (content-encoding, content-length, transfer-encoding) from upstream responses to avoid decompression-related client errors.
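
The header handling described above can be illustrated with a small filter. This is a sketch, not the PR's code: the function name and the plain-object header representation are assumptions. The likely motivation is that an HTTP client such as `fetch` has already decoded the upstream body, so forwarding the original `content-encoding` and `content-length` would describe a body the client never receives.

```typescript
// Headers that describe the upstream transport, not the body the proxy re-sends.
const TRANSPORT_HEADERS = new Set(["content-encoding", "content-length", "transfer-encoding"]);

function stripTransportHeaders(upstream: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(upstream)) {
    // Header names are case-insensitive, so compare in lowercase.
    if (!TRANSPORT_HEADERS.has(name.toLowerCase())) out[name] = value;
  }
  return out;
}
```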

Reviewed by Cursor Bugbot for commit 81c35f8.

CRW (fastcrw.com) provides web search, image search, and web scraping
through a single API, replacing three separate providers. All existing
providers remain fully functional as fallbacks — nothing is removed.

API layer:
- Add CRW_API_KEY and CRW_API_URL to config schema
- Register CRW as a tool provider in the provider registry
- Add CRW proxy service with 3-mode auth (managed/passthrough-billing/passthrough)
- Add CRW route in proxy router

Sandbox env injection:
- Conditionally inject CRW_API_URL into sandboxes only when CRW is configured
- Write empty string when CRW is disabled to clear stale values from pool sandboxes
- Add CRW injection to all providers: pool, local-docker, daytona, justavps

Sandbox tools (web_search, image_search, scrape_webpage):
- Add resolveAuth() with priority: direct CRW key → router proxy CRW → legacy provider
- Never send raw CRW_API_KEY to router proxy (isRouterProxy guard)
- Add automatic fallback to legacy providers when CRW returns errors
- Scrape: retry individual failed URLs in batch mode, not just all-or-nothing
- Search: preserve Tavily answer/images fields for mobile UI
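
The per-URL retry in the Scrape item above might be structured like this. The `ScrapeResult` and `Scraper` types and the `scrapeBatch` name are hypothetical; the sketch only shows the idea of retrying individual failures instead of the whole batch.

```typescript
type ScrapeResult = { url: string; ok: boolean; content?: string; error?: string };
type Scraper = (url: string) => Promise<ScrapeResult>;

// Wrap a scraper so thrown errors become ordinary failed results.
function safe(fn: Scraper): Scraper {
  return async (url) => {
    try {
      return await fn(url);
    } catch (e) {
      return { url, ok: false, error: e instanceof Error ? e.message : String(e) };
    }
  };
}

async function scrapeBatch(
  urls: string[],
  primary: Scraper,         // e.g. a CRW-backed scraper
  fallback: Scraper | null, // e.g. a Firecrawl-backed scraper, if configured
): Promise<ScrapeResult[]> {
  const first = await Promise.all(urls.map(safe(primary)));
  if (!fallback) return first;
  // Retry only the URLs that failed; successful results are kept as they are.
  const retry = safe(fallback);
  return Promise.all(first.map((r) => (r.ok ? r : retry(r.url))));
}
```

Results stay in the same order as the input URLs, so callers can match them up by index.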

Runtime (getEnv):
- Make s6 env dir authoritative: empty file means key was explicitly cleared,
  preventing fallthrough to stale process.env values
Copilot AI review requested due to automatic review settings April 13, 2026 20:30

vercel bot commented Apr 13, 2026

@us is attempting to deploy a commit to the Kortix AI Corp Team on Vercel.

A member of the Team first needs to authorize it.

Copilot AI left a comment

Pull request overview

Adds CRW (fastcrw.com) as a unified web search / image search / scraping provider, with router-proxy support and legacy-provider fallbacks retained.

Changes:

  • Added CRW configuration (API key + URL), provider registry entry, proxy service definition, and proxy response header sanitization.
  • Updated sandbox env injection to conditionally inject/clear CRW_API_URL across pool + platform providers; updated sandbox tools to be CRW-first with fallbacks.
  • Adjusted tool/runtime env resolution so an empty s6 env file is authoritative (prevents falling through to stale process.env).

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| scripts/setup-env.sh | Generates CRW env entries (and adds Anthropic key) in generated apps/api/.env. |
| core/kortix-master/opencode/tools/web_search.ts | Switches web search tool to CRW-first with Tavily fallback and unified output formatting. |
| core/kortix-master/opencode/tools/image_search.ts | Switches image search tool to CRW-first with Serper fallback; removes prior Replicate enrichment path. |
| core/kortix-master/opencode/tools/scrape_webpage.ts | Switches scraping tool to CRW-first with per-URL Firecrawl fallback retries. |
| core/kortix-master/opencode/tools/lib/get-env.ts | Makes s6 env dir authoritative even for empty values (prevents stale fallthrough). |
| core/docker/.env.example | Documents CRW_API_KEY and optional CRW_API_URL. |
| apps/api/src/config.ts | Adds CRW env schema + config exports; adds proxy_crw tool pricing. |
| apps/api/src/providers/registry.ts | Registers CRW as a tool provider for settings/UI visibility. |
| apps/api/src/router/config/proxy-services.ts | Adds CRW proxy service config and allowed routes; keeps legacy providers. |
| apps/api/src/router/routes/proxy.ts | Strips transport/content-encoding headers on proxy responses to avoid double-decompression issues. |
| apps/api/src/router/services/tavily.ts | Prefers CRW for web search with Tavily fallback; adds CRW search implementation. |
| apps/api/src/router/services/serper.ts | Prefers CRW for image search with Serper fallback; preserves safe-search behavior via Serper when explicitly off. |
| apps/api/src/pool/env-injector.ts | Conditionally injects or clears CRW_API_URL into pool sandboxes based on backend CRW key presence. |
| apps/api/src/platform/providers/local-docker.ts | Conditionally injects/clears CRW_API_URL and includes it in sync lists/env generation. |
| apps/api/src/platform/providers/justavps.ts | Conditionally injects CRW_API_URL into JustAVPS sandboxes. |
| apps/api/src/platform/providers/daytona.ts | Conditionally injects CRW_API_URL into Daytona sandboxes. |
| apps/api/src/tests/unit-proxy-services.test.ts | Updates proxy service registry expectations to include CRW (and existing LLM services). |
| apps/api/.env.example | Documents CRW_API_KEY / CRW_API_URL in the API env example. |

Comment thread core/kortix-master/opencode/tools/scrape_webpage.ts Outdated
Comment thread core/kortix-master/opencode/tools/image_search.ts
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

}
return result;
}
return await searchTavily(q, auth, maxResults, topic, isAdvanced);
CRW network errors bypass fallback to legacy providers

Medium Severity

When searchCrw or searchImagesCrw throws an exception (e.g., DNS failure, connection refused), the outer catch (e) returns an error without attempting the fallback provider. The fallback logic only triggers when the CRW function returns an error object (HTTP-level errors like 503), not when it throws (network-level errors). In contrast, scrape_webpage.ts handles this correctly — scrapeOne catches exceptions internally and always returns a result, so the aggregate fallback retries all failure modes. This inconsistency means web search and image search won't fall back to Tavily/Serper on network errors, even when those providers are available.
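
One way to close the gap described in this finding is to convert thrown exceptions into the same error shape that HTTP-level failures already use, before deciding whether to fall back. The sketch below is hypothetical: `SearchResult` and `withFallback` are illustrative names, not functions from the PR.

```typescript
type SearchResult<T> = { ok: true; data: T } | { ok: false; error: string };
type Search<T> = () => Promise<SearchResult<T>>;

function toErrorResult<T>(e: unknown): SearchResult<T> {
  return { ok: false, error: e instanceof Error ? e.message : String(e) };
}

async function withFallback<T>(primary: Search<T>, fallback: Search<T> | null): Promise<SearchResult<T>> {
  let first: SearchResult<T>;
  try {
    first = await primary();
  } catch (e) {
    // Network-level failures (DNS, connection refused) arrive as exceptions;
    // treat them like any other failed result so the fallback still runs.
    first = toErrorResult<T>(e);
  }
  if (first.ok || !fallback) return first;
  try {
    return await fallback();
  } catch (e) {
    return toErrorResult<T>(e);
  }
}
```

With this shape, a caller would pass the CRW search as `primary` and the Tavily or Serper search as `fallback`, and both error objects and thrown errors take the same path.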

Comment thread core/kortix-master/opencode/tools/scrape_webpage.ts Outdated
- image_search: restore Replicate/Moondream2 enrichment (enrich arg,
  describeImage, enrichImages) that was accidentally dropped
- scrape_webpage: remove unreachable duplicate crwKey branch in resolveAuth