
feat(anthropic): native passthrough for /v1/messages #2712

Open
steebchen wants to merge 1 commit into main from fix/anthropic-native-messages-passthrough



steebchen (Member) commented Jun 16, 2026

Problem

Follow-up to #2702. The native Anthropic /v1/messages endpoint serves requests by translating them into the internal OpenAI /v1/chat/completions pipeline and converting the response back to Anthropic shape. That round-trip is lossy: for web search it collapses the native content into a single text block, dropping the server_tool_use query block, the web_search_tool_result block (with encrypted_content/page_age), and the inline citations (web_search_result_location with cited_text/encrypted_index). The response shape didn't match Anthropic's Messages API.

Reported comparison: https://gist.github.com/mcowger/4b5ac55db15999fb38045fe9700039ef

Fix — native passthrough

When the resolved provider is Anthropic-native (anthropic, vertex-anthropic), the upstream response is already in the exact shape the client wants, so the gateway returns it verbatim — both non-streaming JSON and streaming SSE — instead of reconstructing it. Billing/logging still run unchanged (web search is still counted and billed).

/v1/messages signals this to the chat pipeline with a per-process nonce header that outside callers cannot forge, so a direct /v1/chat/completions request can't trigger passthrough. Non-Anthropic providers (e.g. fallback to bedrock/OpenAI) keep the existing OpenAI→Anthropic reconstruction.
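Here is a minimal TypeScript sketch of what the primitives in native-passthrough.ts might look like. The exported names come from the PR. The header value is taken from the anti-forgery test. Generating the nonce with randomUUID() and comparing it with timingSafeEqual are assumptions, and isPassthroughRequested is a hypothetical helper, not the PR's code.

```typescript
import { Buffer } from "node:buffer";
import { randomUUID, timingSafeEqual } from "node:crypto";

// Header name taken from the PR's anti-forgery test.
export const NATIVE_ANTHROPIC_PASSTHROUGH_HEADER = "x-native-anthropic-passthrough";

// Assumption: one random value per process. It is only ever sent on the
// gateway's internal request, so outside callers cannot know it.
export const nativeAnthropicPassthroughNonce: string = randomUUID();

// True only for providers whose upstream body is already Anthropic-shaped.
export function isAnthropicNativeProvider(provider: string | undefined): boolean {
  return provider === "anthropic" || provider === "vertex-anthropic";
}

// Hypothetical helper: accept the header only if it carries this process's
// nonce. Lengths are compared first because timingSafeEqual throws when the
// buffers differ in length.
export function isPassthroughRequested(
  headerValue: string | null | undefined,
  nonce: string = nativeAnthropicPassthroughNonce,
): boolean {
  if (!headerValue) return false;
  const given = Buffer.from(headerValue, "utf8");
  const expected = Buffer.from(nonce, "utf8");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Keying on a per-process secret, rather than on whether the header is present, is what makes forging impractical. A client can send any header it likes, but it cannot guess a random value that only the gateway itself ever transmits.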

How it works

  • native-passthrough.ts (new): the nonce/header, the gated response field, and the isAnthropicNativeProvider check.
  • chat.ts:
    • Non-streaming: attach the raw upstream Anthropic body under a private gated field on the response (only when the nonce is present and the provider is Anthropic-native).
    • Streaming: forward the raw Anthropic SSE events verbatim (before the lossy OpenAI transform, so server_tool_use/web_search_tool_result events that map to no OpenAI chunk are preserved), skip the transformed write, and skip the OpenAI finish/[DONE] synthesis. Web-search counting and cost still run via the existing calculateCosts path.
    • Cache key namespaces /v1/messages-origin requests so passthrough and non-passthrough responses never collide; passthrough streams aren't cached (avoids the replay metadata-chunk injection corrupting a native stream).
  • anthropic.ts:
    • Sets the nonce header on the internal request.
    • Non-streaming: returns the gated raw body verbatim.
    • Streaming: peeks the first SSE event — if it's a native Anthropic stream (event: message_start) it forwards the bytes raw; otherwise it falls back to the existing reconstruction (robust against fallback to a non-Anthropic provider, with no response-header timing issues).
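The streaming peek can be sketched as a pure classifier over the text buffered so far. The sketch assumes the OpenAI markers and the 65536-length cap quoted in the Codex review of this PR, plus the first-event rule described above. classifyPeekedStream and StreamKind are hypothetical names. The cap here counts string characters, not bytes.

```typescript
export type StreamKind = "native-anthropic" | "reconstruct" | "undecided";

// Cap quoted in review; counted in UTF-16 code units here, not bytes.
const MAX_PEEK_CHARS = 65536;
const OPENAI_OBJECT_RE = /"object"\s*:\s*"chat\.completion(\.chunk)?"/;

export function classifyPeekedStream(buffer: string): StreamKind {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const end = normalized.indexOf("\n\n");
  if (end !== -1) {
    // The first event is complete: native only if it is message_start.
    const firstEvent = normalized.slice(0, end);
    return /^event:[ \t]*message_start[ \t]*$/m.test(firstEvent)
      ? "native-anthropic"
      : "reconstruct";
  }
  // First event still incomplete: give up early on obvious OpenAI output
  // or once the buffer grows past the cap.
  if (
    OPENAI_OBJECT_RE.test(buffer) ||
    buffer.includes("data: [DONE]") ||
    buffer.length > MAX_PEEK_CHARS
  ) {
    return "reconstruct";
  }
  return "undecided";
}
```

The caller keeps reading while the result is "undecided". It then either pipes the buffered bytes plus the rest of the stream verbatim, or hands the same buffer to the reconstruction path.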

This also improves the fidelity of thinking/tool_use blocks for all Anthropic-native /v1/messages responses, not just web search.

Tests

New deterministic specs in apps/gateway/src/api.spec.ts:

  • Non-streaming verbatim: the response content deep-equals the upstream Anthropic blocks (incl. encrypted_content, page_age, encrypted_index, citations); the private field never leaks; web search is billed (webSearchCost > 0).
  • Streaming verbatim: native SSE events are forwarded (message_start, server_tool_use/web_search_tool_result content_block_start, message_stop) with no chat.completion.chunk/[DONE].
  • Security: a forged x-native-anthropic-passthrough header on /v1/chat/completions does not trigger passthrough.

Manual verification (live gateway, real Anthropic key)

  • Non-streaming gist payload → response contains server_tool_use + web_search_tool_result (with encrypted_content/page_age) + text with web_search_result_location citations + usage.server_tool_use.
  • Streaming → native event sequence with no OpenAI leakage.
  • Both requests were billed web_search_cost = 0.0095 ($0.01 × 1 search × 0.95 discount).
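The billing figure can be reproduced as plain arithmetic. This sketch assumes a flat per-search price with the discount applied as a multiplier, exactly as the formula above states. webSearchCost is a hypothetical helper, not the gateway's calculateCosts.

```typescript
// Assumed inputs from the manual-verification note: $0.01 per search and a
// 0.95 multiplier for the discount. Not the gateway's actual pricing code.
export function webSearchCost(
  searches: number,
  pricePerSearch = 0.01,
  discountMultiplier = 0.95,
): number {
  return pricePerSearch * searches * discountMultiplier;
}

// One search, as in both manual runs. The raw float may differ from 0.0095
// in its last digits, so it is formatted (or compared with a tolerance).
const cost = webSearchCost(1);
console.log(cost.toFixed(4)); // "0.0095"
```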

pnpm build, pnpm format, and the new + existing /v1/messages tests pass.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Native Anthropic responses, including web search results, citations, and thinking blocks, are now returned verbatim without transformation.
    • Streaming and non-streaming requests preserve encrypted content and special fields.
  • Tests

    • Added test coverage for native Anthropic passthrough responses.

Anthropic /v1/messages was served by round-tripping through the internal
OpenAI /v1/chat/completions pipeline, which collapsed native Anthropic
content into a plain text block. Web search responses lost their
server_tool_use / web_search_tool_result blocks and inline citations, and
the response shape didn't match Anthropic's Messages API.

When the resolved provider is Anthropic-native (anthropic, vertex-anthropic),
return the raw upstream Anthropic response verbatim — both non-streaming JSON
and streaming SSE — while still running cost/billing/logging. /v1/messages
signals this to the chat pipeline via an un-forgeable per-process nonce header
so a direct /v1/chat/completions request cannot trigger it.

- chat.ts: forward raw Anthropic SSE events verbatim (skipping the OpenAI
  transform and finish/[DONE] synthesis) and attach the raw upstream body for
  non-streaming; billing still runs via the existing cost path.
- anthropic.ts: return the raw body verbatim (non-streaming) and detect a
  native stream by peeking the first event, forwarding it raw (streaming);
  falls back to the existing reconstruction for non-Anthropic providers.
- Cache key namespaces passthrough requests; passthrough streams aren't cached.
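The non-streaming hand-off between the two files can be sketched as a pair of helpers. The PR names the NATIVE_ANTHROPIC_PASSTHROUGH_FIELD constant but not its value, so the string below is hypothetical. attachNativeBody and extractNativeBody are also hypothetical. A missing field tells the caller to fall back to reconstruction.

```typescript
// Hypothetical value: the PR names this constant but not its contents.
const NATIVE_ANTHROPIC_PASSTHROUGH_FIELD = "__nativeAnthropicResponse";

export type ChatResponse = Record<string, unknown>;

// chat.ts side (sketch): attach the raw upstream body only when the nonce
// check passed AND the provider is Anthropic-native. Does not mutate input.
export function attachNativeBody(
  openaiResponse: ChatResponse,
  rawUpstream: unknown,
  passthroughRequested: boolean,
  provider: string,
): ChatResponse {
  const native = provider === "anthropic" || provider === "vertex-anthropic";
  if (!passthroughRequested || !native) return openaiResponse;
  return { ...openaiResponse, [NATIVE_ANTHROPIC_PASSTHROUGH_FIELD]: rawUpstream };
}

// anthropic.ts side (sketch): return the raw body if present; undefined
// means "use the existing OpenAI-to-Anthropic reconstruction".
export function extractNativeBody(chatResponse: ChatResponse): unknown {
  return chatResponse[NATIVE_ANTHROPIC_PASSTHROUGH_FIELD];
}
```

The field is attached only when the nonce check passed, so a plain /v1/chat/completions caller never receives it. anthropic.ts returns only the raw body, so the field never reaches /v1/messages clients either.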

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot (Contributor) commented Jun 16, 2026


Walkthrough

Adds a nonce-secured "native Anthropic passthrough" mode to the gateway. A new primitives module defines the shared header, nonce, field key, and provider predicate. chat.ts detects the nonce, attaches raw upstream Anthropic responses, and bypasses OpenAI transformation. anthropic.ts classifies incoming streams and returns raw Anthropic JSON or SSE verbatim. Tests cover non-streaming, streaming, and anti-forgery cases.

Changes

Native Anthropic Passthrough

Layer / File(s) Summary
Passthrough primitives module
apps/gateway/src/chat/native-passthrough.ts
New module exports NATIVE_ANTHROPIC_PASSTHROUGH_HEADER, nativeAnthropicPassthroughNonce, NATIVE_ANTHROPIC_PASSTHROUGH_FIELD, and isAnthropicNativeProvider (true only for "anthropic" and "vertex-anthropic").
chat.ts: nonce detection, streaming forwarding, non-streaming field attachment
apps/gateway/src/chat/chat.ts
Imports primitives; reads the nonce from the request header to set nativeAnthropicPassthroughRequested; records flag in cache metadata; sets nativeAnthropicPassthroughStreaming per-attempt; skips cache capture, OpenAI chunk transformation, and synthesized finish/usage chunks when active; forwards raw SSE events verbatim via writeSSEAndCache; attaches raw upstream body under NATIVE_ANTHROPIC_PASSTHROUGH_FIELD for non-streaming responses.
anthropic.ts: nonce injection, stream classification, verbatim forwarding
apps/gateway/src/anthropic/anthropic.ts
Imports passthrough constants; injects the nonce header into the internal /v1/chat/completions request; buffers early SSE chunks to detect message_start signatures and pipes the raw stream verbatim when Anthropic-native is detected; reads NATIVE_ANTHROPIC_PASSTHROUGH_FIELD from non-streaming responses and returns the object directly as JSON.
Tests: non-streaming, streaming, and anti-forgery
apps/gateway/src/api.spec.ts
Adds a spyAnthropicResponse helper and fixtures; tests that non-streaming web search content blocks are returned verbatim with billing applied and the passthrough field not exposed; that streaming SSE events are forwarded raw without OpenAI-style output or [DONE]; and that a forged x-native-anthropic-passthrough header on /v1/chat/completions does not trigger passthrough.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant anthropic.ts
  participant chat.ts
  participant UpstreamAnthropic

  Client->>anthropic.ts: POST /v1/messages (stream or JSON)
  anthropic.ts->>chat.ts: POST /v1/chat/completions + nonce header
  chat.ts->>chat.ts: validate nonce → nativeAnthropicPassthroughRequested=true
  chat.ts->>UpstreamAnthropic: forward request
  UpstreamAnthropic-->>chat.ts: raw Anthropic SSE / JSON

  alt Streaming response
    chat.ts->>chat.ts: set nativeAnthropicPassthroughStreaming=true, skip cache/transform
    chat.ts-->>anthropic.ts: raw SSE stream
    anthropic.ts->>anthropic.ts: buffer chunks, detect message_start
    anthropic.ts-->>Client: pipe raw SSE verbatim
  else Non-streaming response
    chat.ts->>chat.ts: attach raw body to NATIVE_ANTHROPIC_PASSTHROUGH_FIELD
    chat.ts-->>anthropic.ts: transformed response + passthrough field
    anthropic.ts->>anthropic.ts: detect passthrough field
    anthropic.ts-->>Client: return raw Anthropic JSON verbatim
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • theopenco/llmgateway#2271: Modifies the same streaming SSE handling path in apps/gateway/src/anthropic/anthropic.ts, adding event: error SSE translation—directly adjacent to the new native passthrough stream detection logic.
  • theopenco/llmgateway#2380: Also modifies Anthropic /v1/messages streaming SSE in apps/gateway/src/anthropic/anthropic.ts, fixing message_delta/usage emission in the same streaming code path that this PR now gates behind the passthrough check.

Suggested reviewers

  • smakosh
  • proxysoul
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'feat(anthropic): native passthrough for /v1/messages' clearly summarizes the main feature added: native passthrough support for Anthropic's /v1/messages endpoint, which is directly supported by all file changes shown.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e38f2ac623


Comment on lines +5917 to +5921
if (
  cachingEnabled &&
  streamingCacheKey &&
  !nativeAnthropicPassthroughStreaming
) {


P2: Skip saving native passthrough stream caches

When native passthrough is active this condition prevents every streamed event from being appended to streamingChunks, but the later cache-save path still writes a completed streaming cache entry whenever caching is enabled. In a project with response caching enabled, the first streamed Anthropic-native /v1/messages request stores chunks: [], and the next identical request hits that completed cache and replays no events; the save path also needs to be skipped for native passthrough streams.
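One way to express the fix is a single predicate for the final save guard. It combines the condition from CodeRabbit's proposed fix for the same lines with CodeRabbit's suggestion to also skip empty chunk lists. StreamCacheState, chunkCount, and shouldSaveStreamingCache are hypothetical names, not fields in chat.ts.

```typescript
// Hypothetical shape collecting the values the save guard reads.
export interface StreamCacheState {
  cachingEnabled: boolean;
  streamingCacheKey: string | undefined;
  nativeAnthropicPassthroughStreaming: boolean;
  canceled: boolean;
  finishReason: string | null;
  streamingError: unknown;
  chunkCount: number;
}

// Save only when the stream finished cleanly, was not a native passthrough,
// and actually captured chunks, so a later hit never replays an empty stream.
export function shouldSaveStreamingCache(s: StreamCacheState): boolean {
  return (
    s.cachingEnabled &&
    Boolean(s.streamingCacheKey) &&
    !s.nativeAnthropicPassthroughStreaming &&
    !s.canceled &&
    Boolean(s.finishReason) &&
    !s.streamingError &&
    s.chunkCount > 0
  );
}
```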


Comment on lines +766 to +770
/"object"\s*:\s*"chat\.completion(\.chunk)?"/.test(buffer) ||
buffer.includes("data: [DONE]") ||
buffer.length > 65536
) {
classified = true;


P2: Parse the peeked OpenAI buffer before reading again

For non-native streams this peek can classify after reading an OpenAI SSE chunk into buffer, but the reconstruction loop below calls reader.read() before it ever splits the existing buffer. If the inner stream ends after that first read (common for small/error streams, or fallback to a non-Anthropic provider), the already-buffered events are skipped and /v1/messages returns an empty/truncated Anthropic stream; process the buffered data before waiting for another chunk.
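Here is a sketch of the suggested ordering, assuming SSE events delimited by blank lines. The loop drains whatever the peek already buffered before awaiting reader.read(), so a stream that ends right after the peek still yields its events. drainEvents and readEvents are hypothetical names. Only \n and \r\n line endings are handled; the SSE spec also allows a bare \r.

```typescript
// Splits complete SSE events (terminated by a blank line) off the front of
// the buffer; the incomplete tail is returned for the next read.
export function drainEvents(buffer: string): { events: string[]; rest: string } {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const parts = normalized.split("\n\n");
  const rest = parts.pop() ?? "";
  return { events: parts.filter((p) => p.trim() !== ""), rest };
}

// Yields raw event blocks, starting with those already in the peeked buffer.
export async function* readEvents(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  peeked: string,
): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let buffer = peeked;
  while (true) {
    // Drain first: events captured during the peek are never skipped.
    const { events, rest } = drainEvents(buffer);
    for (const event of events) yield event;
    buffer = rest;
    const { done, value } = await reader.read();
    if (done) {
      buffer += decoder.decode();
      break;
    }
    buffer += decoder.decode(value, { stream: true });
  }
  // Flush a final event that was not followed by a blank line.
  const { events: tail } = drainEvents(buffer + "\n\n");
  for (const event of tail) yield event;
}
```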


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)

8267-8291: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move Anthropic web-search accounting before the null-transform skip.

Native-only events are forwarded before transformation, but if transformStreamingToOpenai(...) returns null, the continue statement skips the web_search_tool_result counter below. That can under-bill web search in native passthrough streams even though the event was sent to the client.

🐛 Proposed fix
 								if (nativeAnthropicPassthroughStreaming) {
 									await writeSSEAndCache({
 										event:
 											typeof data?.type === "string" ? data.type : undefined,
 										data: JSON.stringify(data),
 										id: String(eventId++),
 									});
 								}
+
+								const isAnthropicStreamingProvider =
+									usedProvider === "anthropic" ||
+									usedProvider === "vertex-anthropic";
+								if (
+									isAnthropicStreamingProvider &&
+									data.type === "content_block_start" &&
+									data.content_block?.type === "web_search_tool_result"
+								) {
+									webSearchCount++;
+								}
 
 								// Transform streaming responses to OpenAI format for all providers
 								const transformedData = transformStreamingToOpenai(
-								if (
-									usedProvider === "anthropic" ||
-									usedProvider === "vertex-anthropic"
-								) {
-									// For Anthropic, count web_search_tool_result blocks
-									if (
-										data.type === "content_block_start" &&
-										data.content_block?.type === "web_search_tool_result"
-									) {
-										webSearchCount++;
-									}
-								} else if (isGoogleCompatibleProvider(usedProvider)) {
+								if (isGoogleCompatibleProvider(usedProvider)) {

Also applies to: 8627-8640
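The ordering CodeRabbit proposes can be sketched in isolation: forward, then count, then transform. With that order, a null transform result can no longer skip the counter. AnthropicEvent, processEvents, and the forward/transform callbacks are hypothetical stand-ins for the code in chat.ts.

```typescript
export type AnthropicEvent = {
  type?: string;
  content_block?: { type?: string };
};

export function processEvents(
  events: AnthropicEvent[],
  provider: string,
  passthrough: boolean,
  forward: (event: AnthropicEvent) => void,
  transform: (event: AnthropicEvent) => unknown,
): number {
  const isAnthropic = provider === "anthropic" || provider === "vertex-anthropic";
  let webSearchCount = 0;
  for (const data of events) {
    // 1. Native passthrough: forward the raw event first.
    if (passthrough) forward(data);
    // 2. Count web searches before any early exit.
    if (
      isAnthropic &&
      data.type === "content_block_start" &&
      data.content_block?.type === "web_search_tool_result"
    ) {
      webSearchCount++;
    }
    // 3. Transform; a null result now skips only the OpenAI write.
    const transformed = transform(data);
    if (transformed === null) continue;
    // The real pipeline would write the transformed chunk here (omitted).
  }
  return webSearchCount;
}
```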

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/gateway/src/chat/chat.ts` around lines 8267 - 8291, The issue is that
when transformStreamingToOpenai() returns null, the continue statement skips the
web-search accounting code that appears later, causing native Anthropic
passthrough web search events to be under-billed despite being sent to the
client. Move the web-search tool result counter logic (related to
web_search_tool_result accounting) to execute before the null-transform check,
ensuring the accounting happens regardless of whether the transformation returns
null. This fix needs to be applied at two locations in the same file: the
primary location around line 8267-8291 and the secondary location around line
8627-8640.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/gateway/src/chat/chat.ts`:
- Around line 5913-5921: The cache persistence logic is writing empty entries
for native Anthropic passthrough streams even though no chunks are collected
during streaming. Locate the final save/write guard that persists the cache
entry (likely in the streaming completion handler or after the streaming loop)
and add a condition to skip saving when nativeAnthropicPassthroughStreaming is
true or when the chunks collection is empty, preventing cache hits from
replaying zero chunks for the same /v1/messages stream.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dedda435-e3df-4ad9-b367-8503caebd10e

📥 Commits

Reviewing files that changed from the base of the PR and between 6f8c753 and e38f2ac.

📒 Files selected for processing (4)
  • apps/gateway/src/anthropic/anthropic.ts
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/native-passthrough.ts

Comment on lines +5913 to +5921
// Capture for streaming cache if enabled. Native-passthrough
// streams are not cached: the cached-replay path re-injects an
// OpenAI metadata chunk that would corrupt a native Anthropic
// stream, and web-search responses are dynamic anyway.
if (
  cachingEnabled &&
  streamingCacheKey &&
  !nativeAnthropicPassthroughStreaming
) {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t persist empty native-passthrough stream cache entries.

writeSSEAndCache intentionally skips collecting chunks for native Anthropic passthrough, but the final save guard still writes a completed cache entry. A later cache hit would replay zero chunks for the same /v1/messages stream.

🐛 Proposed fix
 					if (
 						cachingEnabled &&
 						streamingCacheKey &&
+						!nativeAnthropicPassthroughStreaming &&
 						!canceled &&
 						finishReason &&
 						!streamingError
 					) {

Also applies to: 9927-9933
