
[Bug] Output Token Count Is Zero for Streaming LLM Responses #91

Description

@anshul23102

Summary

When the application uses streaming mode, TokenFirewall records outputTokens: 0 because it never reads the tokens in the streamed response chunks. Every streamed request is therefore logged with its input tokens only, which significantly undercounts its actual cost.

Current Behavior

For SSE streaming responses, inputTokens is recorded correctly because it is computed from the request body. outputTokens, however, is always 0 because the middleware does not process the streamed response body.

Expected Behavior

TokenFirewall should track output tokens for streaming responses: it should read the usage chunk at the end of the stream when the provider sends one, and otherwise count tokens in the accumulated response text.

Proposed Fix

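```typescript
// Consumes a copy of the streamed body: the caller should pass one branch of
// response.body.tee() here and forward the other branch to the client unchanged.
// parseSSEStream, recordUsage, RequestMetadata and ProviderAdapter are assumed
// to be existing helpers/types in the codebase.
async function handleStreamingResponse(
  response: ReadableStream<Uint8Array>,
  metadata: RequestMetadata,
  adapter: ProviderAdapter
): Promise<void> {
  let outputText = '';
  for await (const chunk of parseSSEStream(response)) {
    // OpenAI-style chunks carry text in choices[0].delta.content, not chunk.delta.
    outputText += chunk.choices?.[0]?.delta?.content ?? '';
    // Prefer the provider-reported count when a usage chunk is present.
    if (chunk.usage) {
      await recordUsage(metadata, { outputTokens: chunk.usage.completion_tokens });
      return;
    }
  }
  // No usage chunk arrived: fall back to counting tokens in the accumulated text.
  const outputTokens = await adapter.countTokens(outputText, metadata.model);
  await recordUsage(metadata, { outputTokens });
}
```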
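The proposed fix calls a parseSSEStream helper that the issue does not define. Below is a minimal sketch of one, assuming OpenAI-style chat completion chunks (text in choices[0].delta.content, usage on the final chunk), "\n" line endings, and one JSON object per data: line. The ChatCompletionChunk type and the parser itself are hypothetical, not existing TokenFirewall code.

```typescript
// Hypothetical subset of an OpenAI-style chat completion chunk: only the fields
// the token accounting reads are declared.
interface ChatCompletionChunk {
  choices?: Array<{ delta?: { content?: string | null } }>;
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

// Minimal SSE parser: yields the parsed JSON of each `data:` line and stops at
// the `[DONE]` sentinel. Assumes "\n" line endings and one JSON object per data
// line; `event:`, `id:` and comment lines are ignored.
async function* parseSSEStream(
  stream: ReadableStream<Uint8Array>
): AsyncGenerator<ChatCompletionChunk> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      // Events end with a blank line; keep any incomplete event in the buffer
      // until the next network chunk completes it.
      let boundary = buffer.indexOf('\n\n');
      while (boundary !== -1) {
        const event = buffer.slice(0, boundary);
        buffer = buffer.slice(boundary + 2);
        for (const line of event.split('\n')) {
          if (!line.startsWith('data:')) continue;
          const data = line.slice('data:'.length).trim();
          if (data === '[DONE]') return;
          yield JSON.parse(data) as ChatCompletionChunk;
        }
        boundary = buffer.indexOf('\n\n');
      }
    }
  } finally {
    // Runs on normal completion, on [DONE], and when the consumer stops early.
    reader.releaseLock();
  }
}
```

Buffering until a blank line matters because a single network read can end in the middle of an event; parsing each read independently would feed partial JSON to JSON.parse.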

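For OpenAI, pass stream_options: { include_usage: true } in the request. OpenAI then streams one extra chunk just before data: [DONE]; its usage field holds the token counts for the whole request and its choices array is empty. This avoids the need for manual token counting.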
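If clients do not set this option themselves, a proxy could add it to outgoing streaming requests. A minimal sketch, assuming the middleware can rewrite the JSON request body before forwarding it; ChatRequestBody and withStreamUsage are hypothetical names.

```typescript
// Hypothetical subset of an OpenAI chat completions request body; other
// fields (temperature, tools, ...) pass through via the index signature.
interface ChatRequestBody {
  model: string;
  messages: Array<{ role: string; content: string }>;
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
  [key: string]: unknown;
}

// Returns a copy of the body that asks OpenAI to append a usage chunk to the
// stream. stream_options is only meant to be set together with stream: true,
// so non-streaming bodies are returned as-is.
function withStreamUsage(body: ChatRequestBody): ChatRequestBody {
  if (!body.stream) return body;
  return {
    ...body,
    // Keep any stream_options the client already set; force include_usage on.
    stream_options: { ...body.stream_options, include_usage: true },
  };
}
```

One trade-off: the client also receives the extra usage chunk, whose choices array is empty, so a client that indexes choices[0] unconditionally may not expect it.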

Acceptance Criteria

  • Streaming responses record correct outputTokens in the usage log.
  • When the provider includes usage in the final SSE chunk, that value is used directly.
  • When no usage chunk is present, token count is estimated from accumulated output text.
  • Streaming functionality and pass-through to the client are not disrupted.
  • A test using a mock SSE stream verifies that output tokens are correctly recorded.
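The last two criteria can be checked together by teeing a mock SSE stream: one branch stands in for the client, the other for the accounting path. A minimal sketch, assuming Node 18+ (global ReadableStream, TextEncoder and TextDecoder); mockSSEStream, readAll, usageFromSSE and simulateStreamingRequest are hypothetical names, and usageFromSSE is a simplified stand-in for the middleware's real handler, not TokenFirewall code.

```typescript
// Emits the given SSE text fragments as a byte stream, standing in for a
// provider response in tests.
function mockSSEStream(fragments: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const fragment of fragments) controller.enqueue(encoder.encode(fragment));
      controller.close();
    },
  });
}

// Reads a byte stream to the end and returns its contents as text.
async function readAll(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode();
}

// Simplified stand-in for the accounting step: returns completion_tokens from
// the usage chunk, or null when the stream carries no usage chunk.
function usageFromSSE(text: string): number | null {
  for (const line of text.split('\n')) {
    if (!line.startsWith('data:')) continue;
    const data = line.slice('data:'.length).trim();
    if (data === '[DONE]') break;
    const chunk = JSON.parse(data) as { usage?: { completion_tokens: number } | null };
    if (chunk.usage) return chunk.usage.completion_tokens;
  }
  return null;
}

// Tees the provider stream: one branch goes to the client untouched, the other
// is read for token accounting. Both branches are consumed concurrently.
async function simulateStreamingRequest(
  fragments: string[]
): Promise<{ clientText: string; outputTokens: number | null }> {
  const [toClient, toCounter] = mockSSEStream(fragments).tee();
  const [clientText, counterText] = await Promise.all([
    readAll(toClient),
    readAll(toCounter),
  ]);
  return { clientText, outputTokens: usageFromSSE(counterText) };
}
```

A test would then assert that clientText equals the concatenated fragments (pass-through is intact) and that outputTokens matches the usage chunk. In the real suite, usageFromSSE would be replaced by the middleware's handler and recordUsage by a fake that captures its arguments.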

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests