fix(BetaToolRunner): cap max_tokens in compaction summary request#958

Open
Zelys-DFKH wants to merge 1 commit into anthropics:main from Zelys-DFKH:fix/compaction-max-tokens-streaming-error
Conversation

@Zelys-DFKH

Problem

#checkAndCompact() passes this.#state.params.max_tokens directly to a
non-streaming messages.create() call. The SDK's pre-flight timeout check
throws "Streaming is required" when max_tokens exceeds ~21,333 (or ~8,192
for Opus models). Tool runners routinely configure max_tokens well above that
threshold, so compaction fails before the request is ever sent.

Fix

Cap the compaction request at a new COMPACTION_SUMMARY_MAX_TOKENS = 4096
constant (exported from CompactionControl.ts). Summaries don't need more
than a few thousand tokens, and 4096 stays within the non-streaming limit for
every model.
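
A minimal sketch of the capping logic described above. The constant name matches the PR; the helper function and its signature are illustrative, not the actual code in CompactionControl.ts:

```typescript
// Exported constant, per the PR: the hard cap for compaction summary requests.
export const COMPACTION_SUMMARY_MAX_TOKENS = 4096;

// Hypothetical helper: clamp the tool runner's configured max_tokens before
// reusing it in the non-streaming compaction summary messages.create() call.
export function compactionMaxTokens(runnerMaxTokens: number): number {
  return Math.min(runnerMaxTokens, COMPACTION_SUMMARY_MAX_TOKENS);
}
```

For example, a runner configured with max_tokens of 64000 would send 4096 on the compaction request, while a runner already below the cap keeps its own value.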

Fixes #863

The compaction summary is a non-streaming messages.create() call. When the
tool runner's max_tokens is large (e.g. 64000), the SDK's pre-flight timeout
check throws before the request is made:

  "Streaming is required for operations that may take longer than 10 minutes"

The formula is expectedTime = (60min * max_tokens) / 128000, so the error
fires for max_tokens > ~21333 (or > 8192 on Opus models), well within the
range of typical tool-runner configurations.
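
The threshold can be sketched directly from the formula above. This is an illustrative re-derivation of the SDK's pre-flight check as described in this PR, not the SDK's actual code; the 128000 divisor is the non-Opus constant quoted here (Opus models use a lower throughput constant, which is why their cutoff is ~8192):

```typescript
// Non-streaming requests must be expected to finish within 10 minutes.
const MAX_NON_STREAMING_MINUTES = 10;

// expectedTime (minutes) = (60 * max_tokens) / 128000, per the formula above.
function expectedMinutes(maxTokens: number, tokensDivisor = 128000): number {
  return (60 * maxTokens) / tokensDivisor;
}

// The pre-flight check throws "Streaming is required ..." when this is true.
function requiresStreaming(maxTokens: number): boolean {
  return expectedMinutes(maxTokens) > MAX_NON_STREAMING_MINUTES;
}
```

Plugging in the numbers: max_tokens = 21333 gives ~9.9998 minutes (allowed), 21334 tips past 10 minutes, and a typical tool-runner value of 64000 gives 30 minutes, so the check fires.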

A compaction summary only needs a few thousand tokens. Cap max_tokens at the
new COMPACTION_SUMMARY_MAX_TOKENS constant (4096) so the request always stays
within the non-streaming allowed range for every model.

Fixes anthropics#863


Development

Successfully merging this pull request may close this issue: Issues using compactionControl (#863)
