feat(anthropic): native passthrough for /v1/messages #2712
Anthropic /v1/messages was served by round-tripping through the internal OpenAI /v1/chat/completions pipeline, which collapsed native Anthropic content into a plain text block. Web search responses lost their server_tool_use / web_search_tool_result blocks and inline citations, and the response shape didn't match Anthropic's Messages API.

When the resolved provider is Anthropic-native (anthropic, vertex-anthropic), return the raw upstream Anthropic response verbatim (both non-streaming JSON and streaming SSE) while still running cost/billing/logging. /v1/messages signals this to the chat pipeline via an un-forgeable per-process nonce header so a direct /v1/chat/completions request cannot trigger it.

- chat.ts: forward raw Anthropic SSE events verbatim (skipping the OpenAI transform and finish/[DONE] synthesis) and attach the raw upstream body for non-streaming; billing still runs via the existing cost path.
- anthropic.ts: return the raw body verbatim (non-streaming) and detect a native stream by peeking the first event, forwarding it raw (streaming); falls back to the existing reconstruction for non-Anthropic providers.
- Cache key namespaces passthrough requests; passthrough streams aren't cached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
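The nonce gating described in the commit message could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the header name matches the one mentioned in the tests, but the constant names, the 32-byte nonce size, and the constant-time comparison are hypothetical choices.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Header name as mentioned in the PR's tests; constant names are hypothetical.
const PASSTHROUGH_HEADER = "x-native-anthropic-passthrough";

// Generated once per process and never derived from client input, so a
// client cannot guess the value needed to enable passthrough.
const PASSTHROUGH_NONCE = randomBytes(32).toString("hex");

/** Headers the /v1/messages handler would attach to its internal call. */
function passthroughHeaders(): Record<string, string> {
  return { [PASSTHROUGH_HEADER]: PASSTHROUGH_NONCE };
}

/** True only when the header carries this process's nonce. */
function isPassthroughRequested(headerValue: string | undefined): boolean {
  if (!headerValue) return false;
  const given = Buffer.from(headerValue);
  const expected = Buffer.from(PASSTHROUGH_NONCE);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

/** Provider predicate, using the two provider ids named in the PR. */
function isAnthropicNativeProvider(provider: string): boolean {
  return provider === "anthropic" || provider === "vertex-anthropic";
}
```

A request that arrives at /v1/chat/completions with the header set to any other value (for example `"true"`) is treated as an ordinary chat-completions request.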
Walkthrough
Adds a nonce-secured "native Anthropic passthrough" mode to the gateway. A new primitives module defines the shared header, nonce, field key, and provider predicate.
Changes
Native Anthropic Passthrough
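As a rough sketch of how a gated response field might carry the raw upstream body between the two handlers: the constant name appears in the sequence diagram, but its value and all three helper functions here are hypothetical.

```typescript
// Hypothetical value; the real field key is not shown in the PR text.
const NATIVE_ANTHROPIC_PASSTHROUGH_FIELD = "__nativeAnthropicPassthrough";

type ChatResponse = Record<string, unknown>;

/** chat.ts side: attach the raw upstream body next to the transformed response. */
function attachPassthrough(transformed: ChatResponse, rawUpstream: unknown): ChatResponse {
  return { ...transformed, [NATIVE_ANTHROPIC_PASSTHROUGH_FIELD]: rawUpstream };
}

/** anthropic.ts side: return the raw body if present, else undefined (fall back). */
function extractPassthrough(inner: ChatResponse): unknown {
  return NATIVE_ANTHROPIC_PASSTHROUGH_FIELD in inner
    ? inner[NATIVE_ANTHROPIC_PASSTHROUGH_FIELD]
    : undefined;
}

/** Any other response path: drop the private field so it never reaches a client. */
function stripPassthrough(inner: ChatResponse): ChatResponse {
  const copy = { ...inner };
  delete copy[NATIVE_ANTHROPIC_PASSTHROUGH_FIELD];
  return copy;
}
```

Keeping the transformed response alongside the raw body means the existing billing and logging code can keep reading the OpenAI-shaped fields it already expects.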
Sequence Diagram(s)
sequenceDiagram
participant Client
participant anthropic.ts
participant chat.ts
participant UpstreamAnthropic
Client->>anthropic.ts: POST /v1/messages (stream or JSON)
anthropic.ts->>chat.ts: POST /v1/chat/completions + nonce header
chat.ts->>chat.ts: validate nonce → nativeAnthropicPassthroughRequested=true
chat.ts->>UpstreamAnthropic: forward request
UpstreamAnthropic-->>chat.ts: raw Anthropic SSE / JSON
alt Streaming response
chat.ts->>chat.ts: set nativeAnthropicPassthroughStreaming=true, skip cache/transform
chat.ts-->>anthropic.ts: raw SSE stream
anthropic.ts->>anthropic.ts: buffer chunks, detect message_start
anthropic.ts-->>Client: pipe raw SSE verbatim
else Non-streaming response
chat.ts->>chat.ts: attach raw body to NATIVE_ANTHROPIC_PASSTHROUGH_FIELD
chat.ts-->>anthropic.ts: transformed response + passthrough field
anthropic.ts->>anthropic.ts: detect passthrough field
anthropic.ts-->>Client: return raw Anthropic JSON verbatim
end
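The "buffer chunks, detect message_start" step in the diagram can be sketched as a small classifier over the peeked text. This is a minimal illustration: the OpenAI-side conditions are the ones quoted in the review thread, while the function name, the return type, and the reading that those conditions mean "not native" are assumptions.

```typescript
type StreamKind = "anthropic-native" | "openai" | "undecided";

// Peek limit taken from the quoted condition `buffer.length > 65536`.
const MAX_PEEK_CHARS = 65536;

/** Classify the start of an SSE stream from the text buffered so far. */
function classifyPeekedStream(buffer: string): StreamKind {
  // A native Anthropic stream opens with an `event: message_start` line.
  if (/^event:\s*message_start\b/m.test(buffer)) return "anthropic-native";
  if (
    /"object"\s*:\s*"chat\.completion(\.chunk)?"/.test(buffer) ||
    buffer.includes("data: [DONE]") ||
    buffer.length > MAX_PEEK_CHARS
  ) {
    // OpenAI-shaped chunk, terminator, or peek limit reached: reconstruct.
    return "openai";
  }
  // Not enough data yet; the caller should read another chunk and retry.
  return "undecided";
}
```

Deciding from the stream contents rather than from response headers matches the PR's note that this approach avoids response-header timing issues.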
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e38f2ac623
if (
  cachingEnabled &&
  streamingCacheKey &&
  !nativeAnthropicPassthroughStreaming
) {
Skip saving native passthrough stream caches
When native passthrough is active this condition prevents every streamed event from being appended to streamingChunks, but the later cache-save path still writes a completed streaming cache entry whenever caching is enabled. In a project with response caching enabled, the first streamed Anthropic-native /v1/messages request stores chunks: [], and the next identical request hits that completed cache and replays no events; the save path also needs to be skipped for native passthrough streams.
  /"object"\s*:\s*"chat\.completion(\.chunk)?"/.test(buffer) ||
  buffer.includes("data: [DONE]") ||
  buffer.length > 65536
) {
  classified = true;
Parse the peeked OpenAI buffer before reading again
For non-native streams this peek can classify after reading an OpenAI SSE chunk into buffer, but the reconstruction loop below calls reader.read() before it ever splits the existing buffer. If the inner stream ends after that first read (common for small/error streams, or fallback to a non-Anthropic provider), the already-buffered events are skipped and /v1/messages returns an empty/truncated Anthropic stream; process the buffered data before waiting for another chunk.
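One way to avoid skipping the peeked data is to split whatever is already buffered before awaiting the next chunk. The generator below is a sketch of that ordering, not the gateway's reconstruction loop: the function name is hypothetical, and it assumes events are separated by a blank line (`\n\n`) without handling `\r\n`.

```typescript
/**
 * Yields complete SSE events (the text between blank lines), starting with
 * whatever the peek phase already buffered.
 */
async function* sseEvents(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  peeked: string,
): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let buffer = peeked;
  while (true) {
    // Drain events we already hold BEFORE waiting for more data, so a
    // stream that ends right after the peek still emits its events.
    let sep = buffer.indexOf("\n\n");
    while (sep !== -1) {
      yield buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      sep = buffer.indexOf("\n\n");
    }
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
  }
  // Flush a trailing event that was not terminated by a blank line.
  if (buffer.trim() !== "") yield buffer;
}
```

With this ordering, a short or error stream whose only chunk was consumed during classification still produces its events, because the first `reader.read()` happens only after the peeked buffer has been drained.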
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)
8267-8291: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Move Anthropic web-search accounting before the null-transform skip.
Native-only events are forwarded before transformation, but if `transformStreamingToOpenai(...)` returns `null`, the `continue` skips the `web_search_tool_result` counter below. That can under-bill streaming native passthrough web search even though the event was sent to the client.
🐛 Proposed fix
 if (nativeAnthropicPassthroughStreaming) {
   await writeSSEAndCache({
     event: typeof data?.type === "string" ? data.type : undefined,
     data: JSON.stringify(data),
     id: String(eventId++),
   });
 }
+
+const isAnthropicStreamingProvider =
+  usedProvider === "anthropic" ||
+  usedProvider === "vertex-anthropic";
+if (
+  isAnthropicStreamingProvider &&
+  data.type === "content_block_start" &&
+  data.content_block?.type === "web_search_tool_result"
+) {
+  webSearchCount++;
+}
 // Transform streaming responses to OpenAI format for all providers
 const transformedData = transformStreamingToOpenai(

-if (
-  usedProvider === "anthropic" ||
-  usedProvider === "vertex-anthropic"
-) {
-  // For Anthropic, count web_search_tool_result blocks
-  if (
-    data.type === "content_block_start" &&
-    data.content_block?.type === "web_search_tool_result"
-  ) {
-    webSearchCount++;
-  }
-} else if (isGoogleCompatibleProvider(usedProvider)) {
+if (isGoogleCompatibleProvider(usedProvider)) {
Also applies to: 8627-8640
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/chat.ts` around lines 8267 - 8291, The issue is that when transformStreamingToOpenai() returns null, the continue statement skips the web-search accounting code that appears later, causing native Anthropic passthrough web search events to be under-billed despite being sent to the client. Move the web-search tool result counter logic (related to web_search_tool_result accounting) to execute before the null-transform check, ensuring the accounting happens regardless of whether the transformation returns null. This fix needs to be applied at two locations in the same file: the primary location around line 8267-8291 and the secondary location around line 8627-8640.
📒 Files selected for processing (4)
- apps/gateway/src/anthropic/anthropic.ts
- apps/gateway/src/api.spec.ts
- apps/gateway/src/chat/chat.ts
- apps/gateway/src/chat/native-passthrough.ts
// Capture for streaming cache if enabled. Native-passthrough
// streams are not cached: the cached-replay path re-injects an
// OpenAI metadata chunk that would corrupt a native Anthropic
// stream, and web-search responses are dynamic anyway.
if (
  cachingEnabled &&
  streamingCacheKey &&
  !nativeAnthropicPassthroughStreaming
) {
Don’t persist empty native-passthrough stream cache entries.
writeSSEAndCache intentionally skips collecting chunks for native Anthropic passthrough, but the final save guard still writes a completed cache entry. A later cache hit would replay zero chunks for the same /v1/messages stream.
🐛 Proposed fix
if (
cachingEnabled &&
streamingCacheKey &&
+ !nativeAnthropicPassthroughStreaming &&
!canceled &&
finishReason &&
!streamingError
) {
Also applies to: 9927-9933
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/gateway/src/chat/chat.ts` around lines 5913 - 5921, The cache
persistence logic is writing empty entries for native Anthropic passthrough
streams even though no chunks are collected during streaming. Locate the final
save/write guard that persists the cache entry (likely in the streaming
completion handler or after the streaming loop) and add a condition to skip
saving when nativeAnthropicPassthroughStreaming is true or when the chunks
collection is empty, preventing cache hits from replaying zero chunks for the
same /v1/messages stream.
Problem
Follow-up to #2702. The native Anthropic `/v1/messages` endpoint serves requests by translating them into the internal OpenAI `/v1/chat/completions` pipeline and converting the response back to Anthropic shape. That round-trip is lossy: for web search it collapses the native content into a single `text` block, dropping the `server_tool_use` query block, the `web_search_tool_result` block (with `encrypted_content` / `page_age`), and the inline `citations` (`web_search_result_location` with `cited_text` / `encrypted_index`). The response shape didn't match Anthropic's Messages API.
Reported comparison: https://gist.github.com/mcowger/4b5ac55db15999fb38045fe9700039ef
Fix — native passthrough
When the resolved provider is Anthropic-native (`anthropic`, `vertex-anthropic`), the upstream response is already in the exact shape the client wants, so the gateway returns it verbatim (both non-streaming JSON and streaming SSE) instead of reconstructing it. Billing/logging still run unchanged (web search is still counted and billed).
`/v1/messages` signals this to the chat pipeline via an un-forgeable per-process nonce header, so a direct `/v1/chat/completions` request can't trigger passthrough. Non-Anthropic providers (e.g. fallback to bedrock/OpenAI) keep the existing OpenAI→Anthropic reconstruction.
How it works
- `native-passthrough.ts` (new): the nonce/header, the gated response field, and the `isAnthropicNativeProvider` check.
- `chat.ts`:
  - Streaming: forward raw Anthropic SSE events verbatim (so native-only `server_tool_use` / `web_search_tool_result` events that map to no OpenAI chunk are preserved), skip the transformed write, and skip the OpenAI finish/`[DONE]` synthesis. Web-search counting and cost still run via the existing `calculateCosts` path.
  - Non-streaming: attach the raw upstream body to the response.
  - The cache key is namespaced for `/v1/messages`-origin requests so passthrough and non-passthrough responses never collide; passthrough streams aren't cached (avoids the replay metadata-chunk injection corrupting a native stream).
- `anthropic.ts`:
  - Non-streaming: return the raw body verbatim.
  - Streaming: peek the first event; if it is native Anthropic (`event: message_start`) it forwards the bytes raw; otherwise it falls back to the existing reconstruction (robust against fallback to a non-Anthropic provider, with no response-header timing issues).
This also improves fidelity of `thinking` / `tool_use` blocks for all Anthropic-native `/v1/messages` responses, not just web search.
Tests
New deterministic specs in `apps/gateway/src/api.spec.ts`:
- Non-streaming: `content` deep-equals the upstream Anthropic blocks (incl. `encrypted_content`, `page_age`, `encrypted_index`, citations); the private field never leaks; web search is billed (`webSearchCost > 0`).
- Streaming: raw Anthropic events are forwarded (`message_start`, `server_tool_use` / `web_search_tool_result` `content_block_start`, `message_stop`) with no `chat.completion.chunk` / `[DONE]`.
- A client-supplied `x-native-anthropic-passthrough` header on `/v1/chat/completions` does not trigger passthrough.
Manual verification (live gateway, real Anthropic key)
- `server_tool_use` + `web_search_tool_result` (with `encrypted_content` / `page_age`) + text with `web_search_result_location` citations + `usage.server_tool_use.web_search_cost = 0.0095` ($0.01 × 1 search × 0.95 discount).
- `pnpm build`, `pnpm format`, and the new + existing `/v1/messages` tests pass.
🤖 Generated with Claude Code
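The 0.0095 figure in the manual verification follows from multiplying the search count, the per-search price, and the discount multiplier. A tiny sketch of that arithmetic, with the two constants taken from the figures quoted above and the function name being hypothetical (the gateway's actual `calculateCosts` logic is not shown here):

```typescript
// Assumed inputs, copied from the manual-verification figures; the gateway's
// real pricing tables and discount handling are not shown in this PR.
const PRICE_PER_SEARCH_USD = 0.01;
const DISCOUNT_MULTIPLIER = 0.95;

/** Cost in USD for a number of billed web searches. */
function webSearchCostUsd(
  searchCount: number,
  pricePerSearch: number = PRICE_PER_SEARCH_USD,
  discountMultiplier: number = DISCOUNT_MULTIPLIER,
): number {
  return searchCount * pricePerSearch * discountMultiplier;
}

// One search at $0.01 with a 0.95 multiplier comes to roughly $0.0095.
console.log(webSearchCostUsd(1).toFixed(4));
```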