
Fix token count for ollama cloud #1566

Closed
ForceConstant wants to merge 1 commit into mnfst:main from ForceConstant:fix/ollama-streaming-token-usage

Conversation


@ForceConstant ForceConstant commented Apr 14, 2026

Related to #1502

  1. Added `include_usage` to `stream_options`.
  2. Capture usage when the stream is done.
  3. Added test cases.

Summary by cubic

Fixes incorrect token usage reporting for streaming responses from ollama and ollama-cloud. Always requests usage in streams and captures final usage even if the SSE stream ends without a trailing newline.

  • Bug Fixes
    • Force stream_options.include_usage: true for ollama and ollama-cloud in sanitizeOpenAiBody.
    • Capture usage from the leftover SSE buffer at stream end (handles missing trailing newline) in pipeStream.
    • Added tests for converter injection and end-of-stream usage parsing.

Written for commit 6ec0456. Summary will update on new commits.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/backend/src/routing/proxy/stream-writer.ts">

<violation number="1" location="packages/backend/src/routing/proxy/stream-writer.ts:151">
P2: End-of-stream passthrough parsing only strips `data: ` and can miss usage when SSE lines are `data:<json>` without a space.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

```ts
const payload = passthroughBuffer
  .trim()
  .split('\n')
  .map((line) => (line.startsWith('data: ') ? line.slice(6) : line))
```

@cubic-dev-ai cubic-dev-ai bot Apr 14, 2026


P2: End-of-stream passthrough parsing only strips `data: ` and can miss usage when SSE lines are `data:<json>` without a space.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/backend/src/routing/proxy/stream-writer.ts, line 151:

<comment>End-of-stream passthrough parsing only strips `data: ` and can miss usage when SSE lines are `data:<json>` without a space.</comment>

<file context>
```diff
@@ -144,6 +144,30 @@ export async function pipeStream(
+      const payload = passthroughBuffer
+        .trim()
+        .split('\n')
+        .map((line) => (line.startsWith('data: ') ? line.slice(6) : line))
+        .join('\n')
+        .trim();
```
</file context>
Suggested change:

```diff
-        .map((line) => (line.startsWith('data: ') ? line.slice(6) : line))
+        .map((line) => (line.startsWith('data:') ? line.slice(5).trimStart() : line))
```
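As a standalone illustration of why the suggested change matters: the SSE spec allows the colon after a field name to be followed by zero or one space, so a tolerant end-of-stream flush has to accept both `data: {...}` and `data:{...}`. The sketch below is hypothetical (the function name and `Usage` shape are illustrative, not the actual `stream-writer.ts` internals):

```typescript
// Illustrative shape of the usage object emitted by OpenAI-style APIs.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Parse any usage object left in the passthrough buffer at stream end,
// tolerating both "data: {...}" and "data:{...}" SSE framing, a missing
// trailing newline, and the terminal "[DONE]" sentinel.
function extractUsageFromBuffer(passthroughBuffer: string): Usage | undefined {
  for (const rawLine of passthroughBuffer.trim().split('\n')) {
    const line = rawLine.trim();
    // Per the SSE spec, the colon may or may not be followed by a space.
    if (!line.startsWith('data:')) continue;
    const payload = line.slice(5).trimStart();
    if (payload === '[DONE]') continue;
    try {
      const parsed = JSON.parse(payload);
      if (parsed.usage) return parsed.usage as Usage;
    } catch {
      // Incomplete JSON fragment at stream end; ignore it.
    }
  }
  return undefined;
}
```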

@SebConejo
Member

Hey, thanks for working on this! I opened #1567 at the same time, which covers the same bugs with a broader fix.

The main difference: your PR injects stream_options for Ollama, but Ollama already sends usage by default. The real gap is OpenAI and OpenRouter: they only include usage in streams when `stream_options.include_usage: true` is in the request. My PR targets those instead, and it merges with existing options rather than overwriting them.

The passthrough buffer flush is nearly identical, great catch on that.

-> Closing in favor of #1567. Thanks again
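The merge-rather-than-overwrite point above could be sketched as follows. This is a hypothetical illustration, not the code from either PR: `withForcedUsage`, the body shape, and the provider set are all assumed for the example (the source only states that OpenAI and OpenRouter require the opt-in).

```typescript
// Assumed shape of an OpenAI-style chat request body.
interface OpenAiStreamBody {
  stream?: boolean;
  stream_options?: { include_usage?: boolean; [key: string]: unknown };
  [key: string]: unknown;
}

// Providers that only report usage in streams when explicitly asked
// (per the discussion above; the exact keys are an assumption here).
const USAGE_OPT_IN_PROVIDERS = new Set(['openai', 'openrouter']);

function withForcedUsage(body: OpenAiStreamBody, provider: string): OpenAiStreamBody {
  if (!body.stream || !USAGE_OPT_IN_PROVIDERS.has(provider)) return body;
  return {
    ...body,
    // Spread the caller's existing options first so include_usage wins
    // while any other caller-set fields survive the merge.
    stream_options: { ...body.stream_options, include_usage: true },
  };
}
```

Spreading `body.stream_options` before setting `include_usage` is what distinguishes a merge from the overwrite the comment warns about: a caller-supplied `stream_options` field is preserved instead of being dropped.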

@SebConejo SebConejo closed this Apr 14, 2026
