feat(reasoning): bucket max_tokens to effort on adaptive Opus #2753

Open
steebchen wants to merge 3 commits into main from opus-4-6-reasoning-regression

Conversation

@steebchen steebchen commented Jun 19, 2026

Member

Summary

Originating from a support question: a user reported reasoning "no longer activates" for claude-opus-4-6 and that the docs don't list it as a reasoning model. Reasoning is working correctly for Opus 4.6 — verified live against both Anthropic and AWS Bedrock. The confusion stems from adaptive thinking behavior plus stale docs. This PR improves the reasoning.max_tokens handling for adaptive models and rewrites the docs.

Behavior change: bucket reasoning.max_tokens → effort on adaptive models

Adaptive Claude Opus models (4.6+) use Anthropic's adaptive thinking and reject an explicit budget_tokens. Previously a bare reasoning.max_tokens was accepted but the budget was silently dropped (the model just ran adaptive at default depth).

We considered returning a 4xx, but that would break budget-based clients — notably Claude Code, whose /v1/messages thinking:{type:"enabled",budget_tokens:N} is mapped to reasoning.max_tokens precisely so it survives the Anthropic→OpenAI translation. Instead, we now bucket the requested budget into an adaptive effort level so it still influences depth:

| reasoning.max_tokens | adaptive effort |
| --- | --- |
| < 2000 | low |
| < 8000 | medium |
| < 24000 | high |
| ≥ 24000 | xhigh |

Precedence is unchanged: explicit effort → reasoning_effort/reasoning.effort → bucketed max_tokens. The shared resolution logic is factored into a single resolveAdaptiveEffort helper, replacing the two duplicated mapEffort closures in the Anthropic and Bedrock branches.
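
A minimal sketch of how the bucketing and precedence described above could fit together. The argument order follows the `resolveAdaptiveEffort(undefined, reasoning_effort, reasoning_max_tokens)` call quoted in the review comments; the `ReasoningEffort` union, the one-to-one effort mapping, and the `bucketBudget` name are assumptions made for illustration, not the code in `prepare-request-body.ts`.

```typescript
// Effort levels named in the PR walkthrough.
type AdaptiveEffort = "low" | "medium" | "high" | "xhigh" | "max";

// Assumption: the unified reasoning_effort accepts the same levels plus "none".
type ReasoningEffort = AdaptiveEffort | "none";

// Bucket a requested thinking budget using the thresholds from the table above.
function bucketBudget(maxTokens: number): AdaptiveEffort {
  if (maxTokens < 2000) return "low";
  if (maxTokens < 8000) return "medium";
  if (maxTokens < 24000) return "high";
  return "xhigh";
}

// Precedence: explicit effort, then reasoning_effort, then bucketed max_tokens.
function resolveAdaptiveEffort(
  effort: AdaptiveEffort | undefined,
  reasoningEffort: ReasoningEffort | undefined,
  reasoningMaxTokens: number | undefined,
): AdaptiveEffort | undefined {
  if (effort !== undefined) return effort;
  if (reasoningEffort !== undefined) {
    // "none" means no effort is sent; it still wins over a budget.
    return reasoningEffort === "none" ? undefined : reasoningEffort;
  }
  if (reasoningMaxTokens !== undefined) return bucketBudget(reasoningMaxTokens);
  return undefined;
}

// A bare budget of 10000 tokens falls in the "< 24000" bucket.
const fromBudget = resolveAdaptiveEffort(undefined, undefined, 10000); // "high"
// An explicit reasoning_effort overrides the budget.
const fromEffort = resolveAdaptiveEffort(undefined, "low", 50000); // "low"
```

Returning `undefined` when nothing applies lets the caller skip `output_config.effort` entirely, so the model runs adaptive thinking at its default depth.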

Docs (features/reasoning.mdx)

  • Stop enumerating a hardcoded "supported models" list (goes stale); point to the models page reasoning filter instead.
  • Add a dedicated Adaptive Thinking (Claude Opus) section explaining that Opus 4.6+ don't honor an exact reasoning.max_tokens budget (it's bucketed to effort) and that effort is a hint — the model may dynamically reason briefly (or start immediately) even at high effort. This is expected, not a disabled-reasoning bug.

Verification

  • Unit prepare-request-body.adaptive.spec.ts — 20/20 pass (new cases cover each budget→effort bucket for Opus 4.6/4.7/4.8, and explicit effort winning over a budget). Full prepare-request-body suite: 118/118.
  • Live e2e chat-reasoning.e2e.ts with TEST_MODELS="anthropic/claude-opus-4-6,aws-bedrock/claude-opus-4-6" — 4/4 pass. Anthropic-direct returns reasoning (reasoning_tokens: 69), Bedrock returns the reasoning summary; all upstream calls 200.
  • pnpm format + docs build pass.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Expanded the reasoning docs to cover additional Anthropic Claude reasoning-enabled models.
    • Updated reasoning.max_tokens guidance with a new “Adaptive Thinking (Claude Opus)” section, clarifying effort-based depth control.
  • Improvements
    • Enhanced adaptive-thinking request building to derive depth/effort from reasoning_effort or, when absent, from bucketed reasoning_max_tokens.
    • Ensures reasoning_effort overrides reasoning_max_tokens, and reasoning_effort: "none" disables thinking.
  • Tests
    • Added coverage for adaptive models where only reasoning_max_tokens is set, plus regression tests for override and disable behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c87160a-df3d-4017-895a-ac73a7d232a8

📥 Commits

Reviewing files that changed from the base of the PR and between 9f9e017 and bc81872.

📒 Files selected for processing (2)
  • packages/actions/src/prepare-request-body.adaptive.spec.ts
  • packages/actions/src/prepare-request-body.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/actions/src/prepare-request-body.adaptive.spec.ts
  • packages/actions/src/prepare-request-body.ts

Walkthrough

Adds a resolveAdaptiveEffort() helper function that converts reasoning_effort and reasoning_max_tokens into Anthropic adaptive-thinking effort levels, applies it to both Anthropic and AWS Bedrock request builders, and expands reasoning documentation with updated model support and adaptive thinking guidance.

Changes

Adaptive thinking effort resolution

| Layer / File(s) | Summary |
| --- | --- |
| Reasoning documentation expansion (apps/docs/content/features/reasoning.mdx) | Broadens "Reasoning-Enabled Models" list from single Claude 3.7 Sonnet to general Claude Sonnet and Claude Opus. Updates reasoning.max_tokens supported models section with guidance that newer Claude Opus models use adaptive thinking and map the budget to effort levels instead. Adds new "Adaptive Thinking (Claude Opus)" section describing adaptive behavior, effort control via reasoning_effort / reasoning.effort, and backward-compatibility mapping of reasoning.max_tokens. |
| Adaptive effort resolution helper (packages/actions/src/prepare-request-body.ts) | Introduces AdaptiveEffort type and resolveAdaptiveEffort() helper that derives adaptive-thinking effort (low, medium, high, xhigh, max) by checking explicit effort parameter first, then reasoning_effort mapping, then reasoning_max_tokens bucket thresholds, returning undefined if none apply. |
| Adaptive thinking request construction and tests (packages/actions/src/prepare-request-body.adaptive.spec.ts, packages/actions/src/prepare-request-body.ts) | Adds test cases validating that reasoning_max_tokens is converted to output_config.effort levels and that reasoning_effort takes precedence. Extends reasoning_effort type to include "none" for disabling thinking. Updates Anthropic adaptive-thinking path to use resolveAdaptiveEffort() when effort is not explicitly set. Updates AWS Bedrock adaptive-thinking path to use resolveAdaptiveEffort() instead of local mapping logic. |

Sequence Diagram

sequenceDiagram
  participant RequestBuilder
  participant resolveAdaptiveEffort
  participant Anthropic as Anthropic Provider
  participant Bedrock as AWS Bedrock Provider
  RequestBuilder->>resolveAdaptiveEffort: reasoning_effort, reasoning_max_tokens, effort
  resolveAdaptiveEffort->>resolveAdaptiveEffort: Check effort > reasoning_effort > reasoning_max_tokens
  resolveAdaptiveEffort->>RequestBuilder: AdaptiveEffort (low|medium|high|xhigh|max|undefined)
  RequestBuilder->>Anthropic: output_config.effort
  RequestBuilder->>Bedrock: output_config.effort

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • theopenco/llmgateway#2034: Both PRs modify packages/actions/src/prepare-request-body.ts to change Bedrock Anthropic "adaptive thinking" request assembly—specifically how reasoning_effort/reasoning_max_tokens are mapped into output_config.effort for thinking.type = "adaptive".
  • theopenco/llmgateway#2550: Both PRs modify packages/actions/src/prepare-request-body.ts and its adaptive-thinking request construction (plus prepare-request-body.adaptive.spec.ts) to map Anthropic Opus adaptive inputs—especially reasoning_max_tokens/reasoning_effort into output_config.effort and thinking: { type: "adaptive" }.
  • theopenco/llmgateway#2558: Main PR extends Anthropic adaptive request building by deriving thinking.output_config.effort from reasoning_max_tokens/reasoning_effort, while the retrieved PR adds the inverse gateway mapping that translates Anthropic thinking/output_config.effort into unified reasoning.effort/reasoning.max_tokens—both directly cover the same Anthropic reasoning-effort fields, just on opposite request-building/parsing paths.

Suggested reviewers

  • smakosh
  • proxysoul
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly identifies the main change: introducing bucketing of reasoning.max_tokens to adaptive effort levels for Claude Opus models, which is the core behavioral addition. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b0d3e89e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The `reasoning.max_tokens` parameter is supported by:

- **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5
- **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5, Claude Opus 4.6, Claude Opus 4.7, Claude Opus 4.8


P2 Badge Remove adaptive Opus from max_tokens support

For Claude Opus 4.6/4.7/4.8 requests that include reasoning.max_tokens, the gateway still rejects named/root models unless some provider mapping has reasoningMaxTokens === true (apps/gateway/src/chat/tools/validate-model-capabilities.ts), and the Opus 4.6/4.7/4.8 Anthropic mappings only set reasoningMode: "adaptive" without that flag. In practice anthropic/claude-opus-4-6 with reasoning.max_tokens returns the documented 400 rather than being accepted, so listing these models here (and saying below that the value is accepted) sends users toward a failing request path.


Adaptive Claude Opus models (4.6+) reject an explicit thinking budget, so a
bare reasoning.max_tokens was previously accepted but dropped. Map it onto an
adaptive effort level (<2k low, <8k medium, <24k high, else xhigh) so the
requested budget still influences depth while keeping budget-based clients
(e.g. Claude Code) working. Factor the shared effort resolution into one
helper and document adaptive thinking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@steebchen steebchen changed the title from "docs(reasoning): document Opus 4.6/4.7/4.8 adaptive reasoning" to "feat(reasoning): bucket max_tokens to effort on adaptive Opus" Jun 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f9e017202


Comment on lines +1974 to +1978
const adaptiveEffort = resolveAdaptiveEffort(
  undefined,
  reasoning_effort,
  reasoning_max_tokens,
);


P2 Badge Route adaptive max-token requests past validation

Fresh evidence in this patch is the new reasoning_max_tokens bucketing call here, but gateway traffic still cannot reach it for claude-opus-4-6/4-7/4-8: validateModelCapabilities() only accepts reasoning.max_tokens when a mapping has reasoningMaxTokens === true, and the adaptive Opus mappings only declare reasoningMode: "adaptive"; auto-routing has the same provider.reasoningMaxTokens !== true filter. As a result requests like model: "anthropic/claude-opus-4-8" with reasoning.max_tokens are rejected/filtered before prepareRequestBody can translate the budget into output_config.effort.
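
One possible way to let adaptive models through the capability check, sketched under assumptions. Only the `reasoningMaxTokens` and `reasoningMode: "adaptive"` field names come from the comment above; the `ProviderMapping` shape, the `providerId` field, and the `acceptsReasoningMaxTokens` function are hypothetical and do not reflect the actual `validateModelCapabilities()` code.

```typescript
// Hypothetical mapping shape; only the two reasoning fields are named in the review.
interface ProviderMapping {
  providerId: string;
  reasoningMaxTokens?: boolean;
  reasoningMode?: "adaptive";
}

// Accept reasoning.max_tokens when any mapping either honors an exact budget
// or runs adaptive thinking (where the budget can be bucketed to an effort).
function acceptsReasoningMaxTokens(
  mappings: readonly ProviderMapping[],
): boolean {
  return mappings.some(
    (m) => m.reasoningMaxTokens === true || m.reasoningMode === "adaptive",
  );
}

// Illustrative data, not the repository's real model definitions.
const adaptiveOnly: ProviderMapping[] = [
  { providerId: "anthropic", reasoningMode: "adaptive" },
];
const budgetBased: ProviderMapping[] = [
  { providerId: "anthropic", reasoningMaxTokens: true },
];
const noReasoning: ProviderMapping[] = [{ providerId: "example" }];
```

The same predicate would need to replace the `provider.reasoningMaxTokens !== true` filter in auto-routing, otherwise adaptive providers would still be filtered out before the request body is built.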


The shared effort switch fell through 'none' to a default of "high", which
would force strong reasoning if 'none' ever reached it (it's normalized away
upstream for Anthropic/Bedrock today). Handle 'none' explicitly as no effort
and drop the misleading catch-all; the switch is now exhaustive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
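
The exhaustive switch this commit describes might look roughly like the sketch below. The `ReasoningEffort` union and the `mapEffort` name are assumptions for illustration; the point is the pattern: `"none"` is handled explicitly, and a `never` assignment in the default branch makes the compiler flag any level that is added later without a case.

```typescript
// Assumed set of unified effort values; the real union may differ.
type ReasoningEffort = "none" | "low" | "medium" | "high" | "xhigh" | "max";
type AdaptiveEffort = Exclude<ReasoningEffort, "none">;

function mapEffort(value: ReasoningEffort): AdaptiveEffort | undefined {
  switch (value) {
    case "none":
      // No effort is sent, rather than falling through to a "high" default.
      return undefined;
    case "low":
    case "medium":
    case "high":
    case "xhigh":
    case "max":
      return value;
    default: {
      // Compile-time exhaustiveness check: unreachable if every case is handled.
      const unreachable: never = value;
      return unreachable;
    }
  }
}
```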

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc81872b46


Comment on lines +1983 to +1985
if (adaptiveEffort !== undefined) {
  requestBody.output_config ??= {};
  requestBody.output_config.effort = adaptiveEffort;


P2 Badge Preserve bucketed effort with JSON schema output

When an adaptive Anthropic request includes both reasoning.max_tokens and response_format: { type: "json_schema" }, this new bucketed output_config.effort is assigned here but is later replaced wholesale by the JSON-schema output_config block, so the requested budget no longer influences adaptive depth even though the PR’s main behavior change is to avoid silently dropping it. This affects structured-output calls to Opus 4.6+ that rely on reasoning.max_tokens; the later merge should keep the existing effort while adding format.
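
A sketch of the merge the comment asks for, assuming the JSON-schema block is attached as `output_config.format`. The `RequestBody`, `OutputConfig`, and `JsonSchemaFormat` types and the `withJsonSchemaFormat` helper are hypothetical; the real request builder may structure this differently.

```typescript
// Hypothetical shapes for illustration only.
interface JsonSchemaFormat {
  type: "json_schema";
  schema: Record<string, unknown>;
}

interface OutputConfig {
  effort?: string;
  format?: JsonSchemaFormat;
}

interface RequestBody {
  output_config?: OutputConfig;
}

// Add the structured-output format while keeping any effort set earlier,
// instead of replacing output_config wholesale.
function withJsonSchemaFormat(
  body: RequestBody,
  schema: Record<string, unknown>,
): RequestBody {
  return {
    ...body,
    output_config: {
      ...body.output_config, // spreading undefined is a no-op
      format: { type: "json_schema", schema },
    },
  };
}

const merged = withJsonSchemaFormat(
  { output_config: { effort: "high" } },
  { type: "object" },
);
// merged.output_config keeps effort "high" and gains the json_schema format.
```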

