Summary
The Gemini (genai) streaming response aggregator in postprocessStreamingResults() captures the text content of thought: true parts but discards the thought boolean annotation, merging thinking text with answer text into a single indistinguishable string. The non-streaming path correctly preserves thought annotations because it passes the full response through as-is.
This is distinct from #93 (which covers entirely non-text part types like functionCall being dropped). Here, the text IS captured but its thought: true metadata is lost, making it impossible to distinguish model reasoning from the final answer in streaming traces.
What is missing
1. Streaming aggregation discards the thought boolean on text parts
In trace/contrib/genai/generatecontent.go (lines 246–253), the streaming chunk iterator only extracts the text field from parts:
for _, p := range parts {
	part, ok := p.(map[string]any)
	if !ok {
		continue
	}
	if text, ok := part["text"].(string); ok {
		textParts = append(textParts, text)
	}
}
Parts with thought: true have their text extracted, but the thought annotation is ignored. All text, thought and answer alike, is then concatenated into one string (line 264):
"parts": []any{
	map[string]any{
		"text": strings.Join(textParts, ""),
	},
},
The final aggregated output is always a single text part with no thought field, making it impossible to distinguish model reasoning from the answer.
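One possible shape for a fix is sketched below. It is only an illustration, not the repository's code: mergeTextParts is a hypothetical helper name, and the sketch assumes that consecutive streamed text parts sharing the same thought flag should be merged, with a new output part started whenever the flag changes.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeTextParts is a hypothetical helper (it does not exist in the repo).
// It concatenates consecutive text parts that share the same thought flag
// and starts a new output part whenever the flag changes, so reasoning and
// answer text stay separate in the aggregated output.
func mergeTextParts(parts []any) []any {
	var merged []any
	var buf strings.Builder
	curThought, open := false, false

	// flush emits the run accumulated so far, if any.
	flush := func() {
		if !open {
			return
		}
		out := map[string]any{"text": buf.String()}
		if curThought {
			out["thought"] = true
		}
		merged = append(merged, out)
		buf.Reset()
		open = false
	}

	for _, p := range parts {
		part, ok := p.(map[string]any)
		if !ok {
			continue
		}
		text, ok := part["text"].(string)
		if !ok {
			continue // non-text parts are out of scope for this sketch
		}
		thought, _ := part["thought"].(bool) // a missing field means false
		if open && thought != curThought {
			flush()
		}
		curThought, open = thought, true
		buf.WriteString(text)
	}
	flush()
	return merged
}

func main() {
	parts := []any{
		map[string]any{"text": "Let me analyze ", "thought": true},
		map[string]any{"text": "the sequence...", "thought": true},
		map[string]any{"text": "The formula for the nth term "},
		map[string]any{"text": "is n(n+1)."},
	}
	for _, p := range mergeTextParts(parts) {
		fmt.Println(p)
	}
}
```

Under these assumptions the aggregated output would contain one thought: true part followed by one plain text part, which matches the shape the non-streaming path already records.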
2. Non-streaming path preserves thought annotations
handleResponse() (line 310) sets the entire raw response as braintrust.output_json, preserving the thought: true field on thinking parts:
if err := internal.SetJSONAttr(span, "braintrust.output_json", raw); err != nil {
	return err
}
3. VCR cassettes confirm the thought: true response format
The existing cassette TestGenerateContentWithThinking.yaml confirms Gemini returns thought parts as text parts with a thought: true boolean:
{
  "text": "Let me analyze the sequence...",
  "thought": true
},
{
  "text": "The formula for the nth term is n(n+1)."
}
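To show why the thought field matters to a trace consumer, here is a minimal sketch that decodes parts in this format and separates reasoning text from answer text. The part struct and splitThoughts are hypothetical names introduced for illustration, and the surrounding JSON array brackets are an assumption, since the excerpt above shows only the two objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// part mirrors only the two fields shown in the cassette excerpt.
type part struct {
	Text    string `json:"text"`
	Thought bool   `json:"thought,omitempty"`
}

// splitThoughts is a hypothetical helper: it decodes a JSON array of parts
// and returns the reasoning text and the answer text separately.
func splitThoughts(raw []byte) (thoughts, answer string, err error) {
	var parts []part
	if err = json.Unmarshal(raw, &parts); err != nil {
		return "", "", err
	}
	for _, p := range parts {
		if p.Thought {
			thoughts += p.Text
		} else {
			answer += p.Text
		}
	}
	return thoughts, answer, nil
}

func main() {
	// Assumed wrapper: the two objects from the excerpt placed in an array.
	raw := []byte(`[
		{"text": "Let me analyze the sequence...", "thought": true},
		{"text": "The formula for the nth term is n(n+1)."}
	]`)
	thoughts, answer, err := splitThoughts(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("thoughts: %q\nanswer: %q\n", thoughts, answer)
}
```

This separation is only possible while the thought boolean survives; once the streaming aggregator joins everything into one text part, a consumer like this has nothing to branch on.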
4. No streaming test for thinking
The TestGenerateContentWithThinking test only covers the non-streaming path. There is no TestStreamingGenerateContentWithThinking test, so this gap is untested.
5. Comparable integrations preserve thinking in streaming
- The Anthropic streaming aggregator handles thinking_delta as a distinct content block type with type: "thinking" (lines 269–276 of messages.go)
- The Bedrock ConverseStream tracer handles ReasoningContentBlockDelta as a separate "reasoning" block type (lines 171–176 of stream.go)
- Both preserve the distinction between thinking/reasoning content and answer content in streaming traces
6. Impact
- Streaming traces with thinking enabled show a single blob of text with no way to tell which part is reasoning vs answer
- Token metrics correctly capture completion_reasoning_tokens (from thoughtsTokenCount), so users see reasoning tokens charged but can't see the corresponding reasoning text separately
- The thinkingConfig request parameter IS correctly captured in metadata, creating an inconsistency: the trace shows thinking was requested, reports reasoning tokens, but doesn't show the thinking content distinctly
Braintrust docs status
Braintrust docs do not specifically address Gemini thinking/thought tracing. The Gemini integration in this SDK correctly captures thinkingConfig in request metadata and thoughtsTokenCount in metrics, so the streaming output gap is an internal consistency issue. Status: supported (thinking is a standard generative execution feature).
Upstream sources
- Gemini thinking docs: https://ai.google.dev/gemini-api/docs/thinking (documents thought: true on parts in both streaming and non-streaming responses)
- Gemini API Part object: text parts can include a thought boolean field
- streamGenerateContent uses the same GenerateContentResponse structure as generateContent
- Gemini 2.5 Pro and Flash models support thinking with ThinkingConfig
Braintrust docs sources
Local repo files inspected
- trace/contrib/genai/generatecontent.go, postprocessStreamingResults() (lines 204–298): only captures text field, ignores thought boolean; final output is a single text part with no thought annotation
- trace/contrib/genai/generatecontent.go, handleResponse() (line 310): non-streaming path preserves all part fields including thought
- trace/contrib/genai/generatecontent.go, request metadata (line 78): correctly captures thinkingConfig
- trace/contrib/genai/generatecontent.go, parseUsageTokens() (line 360): correctly maps thoughtsTokenCount to completion_reasoning_tokens
- trace/contrib/genai/tracegenai_test.go, TestGenerateContentWithThinking (line 158): only tests non-streaming
- trace/contrib/genai/testdata/cassettes/TestGenerateContentWithThinking.yaml: confirms thought: true in response parts
- trace/contrib/anthropic/messages.go (reference): handles thinking_delta as distinct block type in streaming
- trace/contrib/bedrockruntime/stream.go (reference): handles reasoning blocks distinctly in streaming