feat(reasoning): bucket max_tokens to effort on adaptive Opus#2753
feat(reasoning): bucket max_tokens to effort on adaptive Opus#2753steebchen wants to merge 3 commits into
Conversation
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds a ChangesAdaptive thinking effort resolution
Sequence DiagramsequenceDiagram
participant RequestBuilder
participant resolveAdaptiveEffort
participant Anthropic as Anthropic Provider
participant Bedrock as AWS Bedrock Provider
RequestBuilder->>resolveAdaptiveEffort: reasoning_effort, reasoning_max_tokens, effort
resolveAdaptiveEffort->>resolveAdaptiveEffort: Check effort > reasoning_effort > reasoning_max_tokens
resolveAdaptiveEffort->>RequestBuilder: AdaptiveEffort (low|medium|high|xhigh|max|undefined)
RequestBuilder->>Anthropic: output_config.effort
RequestBuilder->>Bedrock: output_config.effort
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b0d3e89e3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| The `reasoning.max_tokens` parameter is supported by: | ||
|
|
||
| - **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5 | ||
| - **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5, Claude Opus 4.6, Claude Opus 4.7, Claude Opus 4.8 |
There was a problem hiding this comment.
Remove adaptive Opus from max_tokens support
For Claude Opus 4.6/4.7/4.8 requests that include reasoning.max_tokens, the gateway still rejects named/root models unless some provider mapping has reasoningMaxTokens === true (apps/gateway/src/chat/tools/validate-model-capabilities.ts), and the Opus 4.6/4.7/4.8 Anthropic mappings only set reasoningMode: "adaptive" without that flag. In practice anthropic/claude-opus-4-6 with reasoning.max_tokens returns the documented 400 rather than being accepted, so listing these models here (and saying below that the value is accepted) sends users toward a failing request path.
Useful? React with 👍 / 👎.
Adaptive Claude Opus models (4.6+) reject an explicit thinking budget, so a bare reasoning.max_tokens was previously accepted but dropped. Map it onto an adaptive effort level (<2k low, <8k medium, <24k high, else xhigh) so the requested budget still influences depth while keeping budget-based clients (e.g. Claude Code) working. Factor the shared effort resolution into one helper and document adaptive thinking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f9e017202
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const adaptiveEffort = resolveAdaptiveEffort( | ||
| undefined, | ||
| reasoning_effort, | ||
| reasoning_max_tokens, | ||
| ); |
There was a problem hiding this comment.
Route adaptive max-token requests past validation
Fresh evidence in this patch is the new reasoning_max_tokens bucketing call here, but gateway traffic still cannot reach it for claude-opus-4-6/4-7/4-8: validateModelCapabilities() only accepts reasoning.max_tokens when a mapping has reasoningMaxTokens === true, and the adaptive Opus mappings only declare reasoningMode: "adaptive"; auto-routing has the same provider.reasoningMaxTokens !== true filter. As a result requests like model: "anthropic/claude-opus-4-8" with reasoning.max_tokens are rejected/filtered before prepareRequestBody can translate the budget into output_config.effort.
Useful? React with 👍 / 👎.
The shared effort switch fell through 'none' to a default of "high", which would force strong reasoning if 'none' ever reached it (it's normalized away upstream for Anthropic/Bedrock today). Handle 'none' explicitly as no effort and drop the misleading catch-all; the switch is now exhaustive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc81872b46
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (adaptiveEffort !== undefined) { | ||
| requestBody.output_config ??= {}; | ||
| requestBody.output_config.effort = adaptiveEffort; |
There was a problem hiding this comment.
Preserve bucketed effort with JSON schema output
When an adaptive Anthropic request includes both reasoning.max_tokens and response_format: { type: "json_schema" }, this new bucketed output_config.effort is assigned here but is later replaced wholesale by the JSON-schema output_config block, so the requested budget no longer influences adaptive depth even though the PR’s main behavior change is to avoid silently dropping it. This affects structured-output calls to Opus 4.6+ that rely on reasoning.max_tokens; the later merge should keep the existing effort while adding format.
Useful? React with 👍 / 👎.
Summary
Originating from a support question: a user reported reasoning "no longer activates" for
claude-opus-4-6and that the docs don't list it as a reasoning model. Reasoning is working correctly for Opus 4.6 — verified live against both Anthropic and AWS Bedrock. The confusion stems from adaptive thinking behavior plus stale docs. This PR improves thereasoning.max_tokenshandling for adaptive models and rewrites the docs.Behavior change: bucket
reasoning.max_tokens→ effort on adaptive modelsAdaptive Claude Opus models (4.6+) use Anthropic's adaptive thinking and reject an explicit
budget_tokens. Previously a barereasoning.max_tokenswas accepted but the budget was silently dropped (the model just ran adaptive at default depth).We considered returning a 4xx, but that would break budget-based clients — notably Claude Code, whose
/v1/messagesthinking:{type:"enabled",budget_tokens:N}is mapped toreasoning.max_tokensprecisely so it survives the Anthropic→OpenAI translation. Instead, we now bucket the requested budget into an adaptive effort level so it still influences depth:reasoning.max_tokens< 2000low< 8000medium< 24000high≥ 24000xhighPrecedence is unchanged: explicit
effort→reasoning_effort/reasoning.effort→ bucketedmax_tokens. The shared resolution logic is factored into oneresolveAdaptiveEfforthelper, replacing the two duplicatedmapEffortclosures in the Anthropic and Bedrock branches.Docs (
features/reasoning.mdx)reasoning.max_tokensbudget (it's bucketed to effort) and that effort is a hint — the model may dynamically reason briefly (or start immediately) even athigheffort. This is expected, not a disabled-reasoning bug.Verification
prepare-request-body.adaptive.spec.ts— 20/20 pass (new cases cover each budget→effort bucket for Opus 4.6/4.7/4.8, and explicit effort winning over a budget). Fullprepare-request-bodysuite: 118/118.chat-reasoning.e2e.tswithTEST_MODELS="anthropic/claude-opus-4-6,aws-bedrock/claude-opus-4-6"— 4/4 pass. Anthropic-direct returns reasoning (reasoning_tokens: 69), Bedrock returns the reasoning summary; all upstream calls 200.pnpm format+ docs build pass.🤖 Generated with Claude Code
Summary by CodeRabbit
reasoning.max_tokensguidance with a new “Adaptive Thinking (Claude Opus)” section, clarifying effort-based depth control.reasoning_effortor, when absent, from bucketedreasoning_max_tokens.reasoning_effortoverridesreasoning_max_tokens, andreasoning_effort: "none"disables thinking.reasoning_max_tokensis set, plus regression tests for override and disable behavior.