[bot] Gemini streaming aggregation loses thought: true annotation, merging thinking and answer text #126

@braintrust-bot

Summary

The Gemini (genai) streaming response aggregator in postprocessStreamingResults() captures the text content of thought: true parts but discards the thought boolean annotation, merging thinking text with answer text into a single indistinguishable string. The non-streaming path correctly preserves thought annotations because it passes the full response through as-is.

This is distinct from #93 (which covers entirely non-text part types like functionCall being dropped). Here, the text IS captured but its thought: true metadata is lost, making it impossible to distinguish model reasoning from the final answer in streaming traces.

What is missing

1. Streaming aggregation discards the thought boolean on text parts

In trace/contrib/genai/generatecontent.go (lines 246–253), the streaming chunk iterator only extracts the text field from parts:

for _, p := range parts {
    part, ok := p.(map[string]any)
    if !ok {
        continue
    }
    if text, ok := part["text"].(string); ok {
        textParts = append(textParts, text)
    }
}

Parts with thought: true have their text extracted but the thought annotation is ignored. All text — thought and answer — is concatenated together (line 264):

"parts": []any{
    map[string]any{
        "text": strings.Join(textParts, ""),
    },
},

The final aggregated output is always a single text part with no thought field, making it impossible to distinguish model reasoning from the answer.
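One way the aggregation could keep the distinction is sketched below, assuming streamed parts arrive as the same map[string]any values shown in the loop above. The names textRun and aggregateTextParts are hypothetical and not part of the SDK; the idea is simply to start a new output part whenever the thought flag changes instead of joining all text into one string.

```go
package main

import "strings"

// textRun is a hypothetical accumulator for consecutive text parts that
// share the same thought flag.
type textRun struct {
	thought bool
	text    strings.Builder
}

// aggregateTextParts merges streamed text parts into runs. A new run starts
// whenever the thought flag flips, so reasoning text and answer text end up
// in separate output parts. Non-text parts are skipped, as in the original loop.
func aggregateTextParts(parts []any) []any {
	var runs []*textRun
	for _, p := range parts {
		part, ok := p.(map[string]any)
		if !ok {
			continue
		}
		text, ok := part["text"].(string)
		if !ok {
			continue
		}
		// A missing or non-boolean "thought" field is treated as false.
		thought, _ := part["thought"].(bool)
		if n := len(runs); n == 0 || runs[n-1].thought != thought {
			runs = append(runs, &textRun{thought: thought})
		}
		runs[len(runs)-1].text.WriteString(text)
	}

	out := make([]any, 0, len(runs))
	for _, r := range runs {
		m := map[string]any{"text": r.text.String()}
		if r.thought {
			// Only emit the field when true, matching the cassette format.
			m["thought"] = true
		}
		out = append(out, m)
	}
	return out
}
```

With this shape, a stream of two thought chunks followed by two answer chunks would aggregate to two parts: one with thought: true and one without.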

2. Non-streaming path preserves thought annotations

handleResponse() (line 310) sets the entire raw response as braintrust.output_json, preserving the thought: true field on thinking parts:

if err := internal.SetJSONAttr(span, "braintrust.output_json", raw); err != nil {
    return err
}

3. VCR cassettes confirm the thought: true response format

The existing cassette TestGenerateContentWithThinking.yaml confirms Gemini returns thought parts as text parts with a thought: true boolean:

{
    "text": "Let me analyze the sequence...",
    "thought": true
},
{
    "text": "The formula for the nth term is n(n+1)."
}

4. No streaming test for thinking

The TestGenerateContentWithThinking test only covers the non-streaming path. There is no TestStreamingGenerateContentWithThinking test, so this gap is untested.
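A streaming test would need some way to assert that thought and answer text are separated in the recorded output. The helper below is a sketch of such a check, assuming the aggregated braintrust.output_json follows the Gemini GenerateContentResponse layout (candidates, content, parts). The name splitThoughts and the struct types are hypothetical and are not existing test utilities in this repo.

```go
package main

import "encoding/json"

// part mirrors the two fields of a Gemini text part that matter here.
type part struct {
	Text    string `json:"text"`
	Thought bool   `json:"thought"`
}

// response covers only the slice of GenerateContentResponse needed to
// reach the parts of each candidate.
type response struct {
	Candidates []struct {
		Content struct {
			Parts []part `json:"parts"`
		} `json:"content"`
	} `json:"candidates"`
}

// splitThoughts decodes an output JSON payload and returns the text of
// thought parts and of answer parts separately. Parts without text are skipped.
func splitThoughts(outputJSON []byte) (thoughts, answers []string, err error) {
	var r response
	if err := json.Unmarshal(outputJSON, &r); err != nil {
		return nil, nil, err
	}
	for _, c := range r.Candidates {
		for _, p := range c.Content.Parts {
			if p.Text == "" {
				continue
			}
			if p.Thought {
				thoughts = append(thoughts, p.Text)
			} else {
				answers = append(answers, p.Text)
			}
		}
	}
	return thoughts, answers, nil
}
```

A hypothetical TestStreamingGenerateContentWithThinking could call this on the span's output and fail if the thoughts slice is empty while reasoning tokens are reported.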

5. Comparable integrations preserve thinking in streaming

  • The Anthropic streaming aggregator handles thinking_delta as a distinct content block type with type: "thinking" (lines 269–276 of messages.go)
  • The Bedrock ConverseStream tracer handles ReasoningContentBlockDelta as a separate "reasoning" block type (lines 171–176 of stream.go)
  • Both preserve the distinction between thinking/reasoning content and answer content in streaming traces

6. Impact

  • Streaming traces with thinking enabled show a single blob of text with no way to tell which part is reasoning vs answer
  • Token metrics correctly capture completion_reasoning_tokens (from thoughtsTokenCount), so users see reasoning tokens charged but can't see the corresponding reasoning text separately
  • The thinkingConfig request parameter IS correctly captured in metadata, creating an inconsistency: the trace shows thinking was requested, reports reasoning tokens, but doesn't show the thinking content distinctly

Braintrust docs status

Braintrust docs do not specifically address Gemini thinking/thought tracing. The Gemini integration in this SDK correctly captures thinkingConfig in request metadata and thoughtsTokenCount in metrics, so the streaming output gap is an internal consistency issue rather than a missing feature. Status: supported (thinking is a standard generative execution feature).

Upstream sources

  • Gemini thinking docs: https://ai.google.dev/gemini-api/docs/thinking — documents thought: true on parts in both streaming and non-streaming responses
  • Gemini API Part object: text parts can include a thought boolean field
  • streamGenerateContent uses the same GenerateContentResponse structure as generateContent
  • Gemini 2.5 Pro and Flash models support thinking with ThinkingConfig

Braintrust docs sources

Local repo files inspected

  • trace/contrib/genai/generatecontent.go — postprocessStreamingResults() (lines 204–298): only captures text field, ignores thought boolean; final output is a single text part with no thought annotation
  • trace/contrib/genai/generatecontent.go — handleResponse() (line 310): non-streaming path preserves all part fields including thought
  • trace/contrib/genai/generatecontent.go — request metadata (line 78): correctly captures thinkingConfig
  • trace/contrib/genai/generatecontent.go — parseUsageTokens() (line 360): correctly maps thoughtsTokenCount to completion_reasoning_tokens
  • trace/contrib/genai/tracegenai_test.go — TestGenerateContentWithThinking (line 158): only tests non-streaming
  • trace/contrib/genai/testdata/cassettes/TestGenerateContentWithThinking.yaml — confirms thought: true in response parts
  • trace/contrib/anthropic/messages.go — reference: handles thinking_delta as distinct block type in streaming
  • trace/contrib/bedrockruntime/stream.go — reference: handles reasoning blocks distinctly in streaming
