Fix/495 ollama thinking#506

Merged
dosco merged 2 commits into ax-llm:main from YizukiAme:fix/495-ollama-thinking
Apr 10, 2026

Conversation

@YizukiAme
Contributor

Closes #495

Changes

New: chatRespProcessor / chatStreamRespProcessor callbacks

Added two optional callbacks to AxAIOpenAIBaseArgs (following the existing chatReqUpdater pattern), allowing providers to post-process responses without subclassing AxAIOpenAIImpl.
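A minimal sketch of what this callback pattern looks like in use. The names AxAIOpenAIBaseArgs, chatRespProcessor, and chatStreamRespProcessor come from the PR description; the response shape and exact signatures below are illustrative assumptions, not the library's actual types.

```typescript
// Assumed minimal response shape for illustration only.
interface ChatResponse {
  content: string;
  thought?: string;
}

// Sketch of the relevant slice of AxAIOpenAIBaseArgs (hypothetical shape).
interface AxAIOpenAIBaseArgsSketch {
  // Post-process a complete (non-streaming) response.
  chatRespProcessor?: (resp: ChatResponse) => ChatResponse;
  // Post-process each streamed chunk; an implementation may keep
  // state across calls to handle tags split between chunks.
  chatStreamRespProcessor?: (chunk: ChatResponse) => ChatResponse;
}

// A provider can inject behavior without subclassing AxAIOpenAIImpl:
const args: AxAIOpenAIBaseArgsSketch = {
  chatRespProcessor: (resp) => ({ ...resp, content: resp.content.trim() }),
};
```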

Ollama think parameter support

  • thinkingTokenBudget: 'none' → sends think: false (disables thinking, reduces latency)
  • Any other budget value → sends think: true
  • hasThinkingBudget and hasShowThoughts are both set to true in the model's feature flags
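The budget-to-flag mapping above reduces to a small helper. This is a sketch: only the 'none' value is taken from the PR; the other budget values and the helper name toOllamaThink are assumptions for illustration.

```typescript
// Assumed set of budget values; only 'none' is confirmed by the PR.
type ThinkingTokenBudget = 'none' | 'minimal' | 'low' | 'medium' | 'high';

// Hypothetical helper: map a thinkingTokenBudget to Ollama's `think` flag.
function toOllamaThink(budget?: ThinkingTokenBudget): boolean | undefined {
  if (budget === undefined) return undefined; // leave provider default
  return budget !== 'none'; // 'none' → think: false; any other budget → think: true
}
```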

<think> tag extraction

When Ollama returns thinking content inline as <think>...</think>, it is now extracted into the thought field and removed from content:

  • Non-streaming: regex extraction after full response
  • Streaming: stateful chunk-by-chunk routing via processThinkStreamChunk()
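The two strategies above can be sketched as follows. The function names extractThinkTags and processThinkStreamChunk appear in the PR, but these bodies are illustrative re-implementations, not the merged code; the streaming sketch assumes each tag arrives whole within one chunk, whereas a real implementation must also buffer tags split across chunk boundaries.

```typescript
// Non-streaming: regex extraction after the full response has arrived.
function extractThinkTags(content: string): { content: string; thought?: string } {
  const m = content.match(/<think>([\s\S]*?)<\/think>/);
  if (!m) return { content }; // no-tag pass-through case
  return {
    thought: m[1].trim(),
    content: content.replace(m[0], '').trim(),
  };
}

// Streaming: route each chunk into `thought` or `content` depending on
// whether we are currently inside a <think> block; state persists across calls.
interface ThinkStreamState {
  inThink: boolean;
}

function processThinkStreamChunk(
  chunk: string,
  state: ThinkStreamState
): { content: string; thought: string } {
  let content = '';
  let thought = '';
  let rest = chunk;
  while (rest.length > 0) {
    if (state.inThink) {
      const end = rest.indexOf('</think>');
      if (end === -1) {
        thought += rest; // still inside the think block
        rest = '';
      } else {
        thought += rest.slice(0, end);
        rest = rest.slice(end + '</think>'.length);
        state.inThink = false;
      }
    } else {
      const start = rest.indexOf('<think>');
      if (start === -1) {
        content += rest; // plain content
        rest = '';
      } else {
        content += rest.slice(0, start);
        rest = rest.slice(start + '<think>'.length);
        state.inThink = true;
      }
    }
  }
  return { content, thought };
}
```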

Testing

  • test:unit: 1854 passed
  • test:type-check: clean
  • test:lint: clean

When think: true is passed to Ollama, thinking models (Qwen3 etc.)
return thought content inline as <think>...</think> in the response
body. This commit:

- Adds chatRespProcessor / chatStreamRespProcessor callbacks to
  AxAIOpenAIBaseArgs so providers can post-process responses without
  subclassing AxAIOpenAIImpl
- Implements extractThinkTags() for non-streaming responses
- Implements processThinkStreamChunk() with stateful streaming support
- Sets hasShowThoughts: true for Ollama
- Adds tests for both tag extraction and the no-tag pass-through case
@dosco merged commit b1fcdf3 into ax-llm:main on Apr 10, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Feature: Support think parameter for Ollama thinking models (Qwen 3/3.5, etc.)
