fix(BetaToolRunner): cap max_tokens in compaction summary request #958
Open
Zelys-DFKH wants to merge 1 commit into anthropics:main from
The compaction summary is a non-streaming messages.create() call. When the tool runner's max_tokens is large (e.g. 64000), the SDK's pre-flight timeout check throws before the request is made:

"Streaming is required for operations that may take longer than 10 minutes"

The formula is expectedTime = (60min * max_tokens) / 128000, so the error fires for max_tokens > ~21333 (or > 8192 on Opus models), well within the range of typical tool-runner configurations.

A compaction summary only needs a few thousand tokens. Cap max_tokens at the new COMPACTION_SUMMARY_MAX_TOKENS constant (4096) so the request always stays within the non-streaming allowed range for every model.

Fixes anthropics#863
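To make the threshold arithmetic concrete, here is a minimal sketch of the check as described in the commit message. It is not the SDK's actual code: the function names and the optional per-model limit parameter are hypothetical, and only the formula and the 10-minute ceiling come from the text above.

```typescript
// Hypothetical re-implementation of the pre-flight check described above.
// Names are illustrative; they are not the SDK's internals.
const TEN_MINUTES_SECONDS = 10 * 60;

// expectedTime = (60min * max_tokens) / 128000, expressed in seconds.
function expectedSeconds(maxTokens: number): number {
  return (60 * 60 * maxTokens) / 128_000;
}

// Returns true when a non-streaming request would be rejected.
// `modelLimit` stands in for a per-model cap (e.g. 8192 for Opus models).
function requiresStreaming(maxTokens: number, modelLimit?: number): boolean {
  if (modelLimit !== undefined && maxTokens > modelLimit) {
    return true;
  }
  return expectedSeconds(maxTokens) > TEN_MINUTES_SECONDS;
}

console.log(requiresStreaming(64_000));     // true: 1800 s expected
console.log(requiresStreaming(21_333));     // false: just under 600 s
console.log(requiresStreaming(21_334));     // true: just over 600 s
console.log(requiresStreaming(4096, 8192)); // false: within both limits
```

Under these assumptions, 4096 tokens maps to an expected time of about 115 seconds, well inside the 10-minute window.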
Problem
#checkAndCompact() passes this.#state.params.max_tokens directly to a non-streaming
messages.create() call. The SDK's pre-flight timeout check throws "Streaming is required" when
max_tokens exceeds ~21,333 (or ~8,192 for Opus models). Tool runners routinely use values well above that threshold,
so compaction fails before the request goes out.
Fix
Cap the compaction request at a new
COMPACTION_SUMMARY_MAX_TOKENS = 4096 constant (exported from
CompactionControl.ts). Summaries don't need more than a few thousand tokens, and 4096 stays within the non-streaming limit for
every model.
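A minimal sketch of the capping step, assuming the cap is applied with Math.min so that a runner configured below 4096 keeps its own smaller value. The constant name and value come from this PR; the helper function is hypothetical and only illustrates the idea, not the actual diff.

```typescript
// Constant name and value as stated in this PR.
const COMPACTION_SUMMARY_MAX_TOKENS = 4096;

// Hypothetical helper: picks the max_tokens to send with the
// compaction summary request, never exceeding the cap.
function compactionMaxTokens(runnerMaxTokens: number): number {
  return Math.min(runnerMaxTokens, COMPACTION_SUMMARY_MAX_TOKENS);
}

console.log(compactionMaxTokens(64_000)); // 4096
console.log(compactionMaxTokens(1024));   // 1024
```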
Fixes #863