fix(llm): map max_completion_tokens to max_output_tokens for Responses API #438

Merged
pancacake merged 1 commit into HKUDS:dev from truffle-dev:fix/responses-api-max-completion-tokens-mapping on May 3, 2026

@truffle-dev

Contributor

What

When the Responses API path is selected for newer OpenAI models (gpt-5.x, o1, o3, o4) and `max_completion_tokens` is passed through as an extra kwarg, the OpenAI SDK raises `TypeError` from `responses.create()` before any HTTP request leaves the client. Closes #437.

Why

`get_token_limit_kwargs(model, n)` returns `{"max_completion_tokens": n}` for newer models (the Chat Completions name). Both `OpenAICompatProvider` and `AzureOpenAIProvider` route those kwargs through `client.responses.create()` when `_should_use_responses_api()` matches, but the Responses API only accepts `max_output_tokens`. The SDK rejects the unknown kwarg with `TypeError` inside the client, and `_should_fallback_from_responses_error()` only catches HTTP errors that carry a `status_code`, so the call is never retried via Chat Completions.
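
The failure mode can be reproduced without the SDK. The sketch below is illustrative only: `fake_responses_create` and `should_fallback` are hypothetical stand-ins, not the OpenAI client or this project's `_should_fallback_from_responses_error()`, and the real fallback check may inspect more than the presence of `status_code`.

```python
from __future__ import annotations


def fake_responses_create(
    *, model: str, input: str, max_output_tokens: int | None = None
) -> dict:
    """Stand-in for a keyword-only SDK method; NOT the real OpenAI client."""
    return {"model": model, "max_output_tokens": max_output_tokens}


def should_fallback(exc: Exception) -> bool:
    """Hypothetical check: only errors carrying an HTTP status_code qualify."""
    return getattr(exc, "status_code", None) is not None


def call_with_chat_kwargs() -> str:
    # Chat Completions naming, as described for get_token_limit_kwargs().
    chat_style_kwargs = {"max_completion_tokens": 256}
    try:
        fake_responses_create(model="gpt-5.5", input="hi", **chat_style_kwargs)
    except Exception as exc:
        # Python raises TypeError for the unknown keyword before any request
        # would be sent, and a TypeError has no status_code attribute.
        if should_fallback(exc):
            return "retry via Chat Completions"
        return f"unhandled {type(exc).__name__}"
    return "ok"


result = call_with_chat_kwargs()
```

Here `result` is `"unhandled TypeError"`: the exception never looks like an HTTP error, so a status-code-based fallback is skipped.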

The reporter in #437 hit this with `LLM_BINDING = openai` and `LLM_MODEL = gpt-5.5` on v1.3.5, which routes through `OpenAICompatProvider`.

How

Add `adapt_chat_kwargs_to_responses()` to `provider_core/openai_responses/converters.py` and use it at all four merge sites: `chat` and `chat_stream` on both `OpenAICompatProvider` (lines 698, 781) and `AzureOpenAIProvider` (lines 126, 154).

The helper:

  • preserves the existing `None`-drop semantics of the previous dict-comprehension merge,
  • maps `max_completion_tokens` → `max_output_tokens`,
  • only applies the alias when `max_output_tokens` is not explicitly set by the caller,
  • does not mutate the input.
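
A minimal sketch of a helper with the four properties listed above. This is not the merged implementation: the signature, the constant names, and the choice to discard the alias when both names are supplied are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Mapping

# Names taken from the PR description; the constants themselves are hypothetical.
_CHAT_ALIAS = "max_completion_tokens"
_RESPONSES_NAME = "max_output_tokens"


def adapt_chat_kwargs_to_responses(
    kwargs: Mapping[str, Any] | None,
) -> dict[str, Any]:
    """Return a new dict of kwargs suitable for a Responses API call."""
    # Build a fresh dict so the caller's mapping is never mutated,
    # dropping None values as the earlier dict-comprehension merge did.
    adapted = {k: v for k, v in (kwargs or {}).items() if v is not None}

    # Always remove the Chat Completions name, since the Responses API
    # does not accept it.
    alias_value = adapted.pop(_CHAT_ALIAS, None)

    # Apply the alias only when the caller has not set the Responses name.
    # Assumption: an explicit max_output_tokens wins and the alias is discarded.
    if alias_value is not None and _RESPONSES_NAME not in adapted:
        adapted[_RESPONSES_NAME] = alias_value
    return adapted


source = {"max_completion_tokens": 512, "temperature": None}
adapted = adapt_chat_kwargs_to_responses(source)
```

With this sketch, `adapted` contains only `max_output_tokens`, and `source` is left unchanged.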

Tests

`tests/services/llm/test_openai_responses_converters.py` covers seven cases: passthrough, None-drop, the rename, None for the alias, explicit-name precedence, empty input, and input non-mutation. `pytest tests/services/llm/test_openai_responses_converters.py -v` is green; the rest of `tests/services/llm/` and `tests/services/config/test_llm_probe_config.py` pass with no new failures from this branch (75/76; the one unrelated failure is a missing `data/user/settings/agents.yaml` fixture).

Note on #390

There is an existing open PR (#390, "[codex] Normalize Azure OpenAI max token aliases") that addresses a related problem at the factory layer for Azure only by translating to `max_tokens`. The reproduction in #437 uses `LLM_BINDING = openai`, which goes through `OpenAICompatProvider` and is outside that PR's scope. Happy to defer if a different shape is preferred.

…s API

When the Responses API path is selected for newer OpenAI models
(gpt-5.x, o1, o3, o4) and an extra kwarg of `max_completion_tokens`
flows in (typically from `get_token_limit_kwargs(model, n)`), the
OpenAI SDK raises `TypeError` from `responses.create()` before any
HTTP request leaves the client.

`_should_fallback_from_responses_error()` only catches HTTP errors
that carry a `status_code`, so the call is never retried via Chat
Completions; the user just sees a stack trace. The bug exists on
both OpenAICompatProvider and AzureOpenAIProvider, in the chat and
chat_stream paths (4 sites total).

Add `adapt_chat_kwargs_to_responses()` that translates the alias to
`max_output_tokens` and is used at all four merge sites. The helper:
- preserves the existing `None`-drop semantics of the previous
  dict-comprehension merge,
- only applies the alias when `max_output_tokens` is not explicitly
  set by the caller,
- does not mutate the input.

Closes HKUDS#437.
@pancacake

Collaborator

Thanks for your contribution!

@pancacake pancacake merged commit abe8020 into HKUDS:dev May 3, 2026
7 checks passed