[bot] Gemini streaming aggregation loses `thought: true` annotation, merging thinking and answer text

## Summary

The Gemini (`genai`) streaming response aggregator in `postprocessStreamingResults()` captures the text content of `thought: true` parts but discards the `thought` boolean annotation, merging thinking text with answer text into a single indistinguishable string. The non-streaming path correctly preserves thought annotations because it passes the full response through as-is.

This is distinct from #93, which covers non-text part types such as `functionCall` being dropped entirely. Here, the text is captured but its `thought: true` metadata is lost, so streaming traces cannot distinguish model reasoning from the final answer.

## What is missing

### 1. Streaming aggregation discards the `thought` boolean on text parts

In `trace/contrib/genai/generatecontent.go` (lines 246–253), the streaming chunk iterator only extracts the `text` field from parts:

```go
for _, p := range parts {
    part, ok := p.(map[string]any)
    if !ok {
        continue
    }
    if text, ok := part["text"].(string); ok {
        textParts = append(textParts, text)
    }
}
```

Parts with `thought: true` have their text extracted but the `thought` annotation is ignored. All text — thought and answer — is concatenated together (line 264):

```go
"parts": []any{
    map[string]any{
        "text": strings.Join(textParts, ""),
    },
},
```

The final aggregated output is always a single text part with no `thought` field, making it impossible to distinguish model reasoning from the answer.
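
One possible direction, shown only as a minimal sketch: instead of joining all text into one string, the aggregator could start a new output part whenever the `thought` flag flips between consecutive chunks. The helper name `aggregateTextParts` is hypothetical and not part of the SDK, and the sketch assumes parts arrive as `map[string]any` values, as in the loop above.

```go
package main

import (
	"fmt"
	"strings"
)

// aggregateTextParts is a hypothetical helper (not in the SDK). It merges
// streamed text parts into runs and starts a new run whenever the `thought`
// flag changes, so reasoning and answer text end up in separate parts.
func aggregateTextParts(parts []any) []any {
	var out []any
	var buf strings.Builder
	started, curThought := false, false

	// flush emits the current run (if any) as one output part.
	flush := func() {
		if !started {
			return
		}
		part := map[string]any{"text": buf.String()}
		if curThought {
			part["thought"] = true
		}
		out = append(out, part)
		buf.Reset()
	}

	for _, p := range parts {
		part, ok := p.(map[string]any)
		if !ok {
			continue
		}
		text, ok := part["text"].(string)
		if !ok {
			continue
		}
		// A missing or non-bool `thought` field is treated as false.
		thought, _ := part["thought"].(bool)
		if started && thought != curThought {
			flush()
		}
		started, curThought = true, thought
		buf.WriteString(text)
	}
	flush()
	return out
}

func main() {
	chunks := []any{
		map[string]any{"text": "Let me analyze ", "thought": true},
		map[string]any{"text": "the sequence...", "thought": true},
		map[string]any{"text": "The formula is n(n+1)."},
	}
	for _, p := range aggregateTextParts(chunks) {
		fmt.Println(p)
	}
}
```

With this shape, the aggregated output would carry one part with `thought: true` followed by one plain text part, matching the structure the non-streaming path already records.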

### 2. Non-streaming path preserves thought annotations

`handleResponse()` (line 310) sets the entire raw response as `braintrust.output_json`, preserving the `thought: true` field on thinking parts:

```go
if err := internal.SetJSONAttr(span, "braintrust.output_json", raw); err != nil {
    return err
}
```

### 3. VCR cassettes confirm the `thought: true` response format

The existing cassette `TestGenerateContentWithThinking.yaml` confirms Gemini returns thought parts as text parts with a `thought: true` boolean:

```json
{
    "text": "Let me analyze the sequence...",
    "thought": true
},
{
    "text": "The formula for the nth term is n(n+1)."
}
```
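
To illustrate why the flag matters to trace consumers, here is a small sketch that decodes parts in this format and separates reasoning from answer text. The `part` struct and `splitThoughts` function are illustrative names, not SDK types, and the input is assumed to be a JSON array of parts containing only the two fields shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// part mirrors only the two fields discussed here; it is an illustrative
// type, not the SDK's or the Gemini client's Part definition.
type part struct {
	Text    string `json:"text"`
	Thought bool   `json:"thought,omitempty"`
}

// splitThoughts decodes a JSON array of parts and returns the concatenated
// reasoning text and answer text separately.
func splitThoughts(raw []byte) (reasoning, answer string, err error) {
	var parts []part
	if err := json.Unmarshal(raw, &parts); err != nil {
		return "", "", err
	}
	var r, a strings.Builder
	for _, p := range parts {
		if p.Thought {
			r.WriteString(p.Text)
		} else {
			a.WriteString(p.Text)
		}
	}
	return r.String(), a.String(), nil
}

func main() {
	raw := []byte(`[
		{"text": "Let me analyze the sequence...", "thought": true},
		{"text": "The formula for the nth term is n(n+1)."}
	]`)
	reasoning, answer, err := splitThoughts(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("reasoning: %q\nanswer: %q\n", reasoning, answer)
}
```

If the `thought` field is dropped during aggregation, a consumer like this has no way to recover the split, which is the gap described in this issue.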

### 4. No streaming test for thinking

The `TestGenerateContentWithThinking` test only covers the non-streaming path. There is no `TestStreamingGenerateContentWithThinking` test, so this gap is untested.

### 5. Comparable integrations preserve thinking in streaming

- The **Anthropic** streaming aggregator handles `thinking_delta` as a distinct content block type with `type: "thinking"` (lines 269–276 of `messages.go`)
- The **Bedrock ConverseStream** tracer handles `ReasoningContentBlockDelta` as a separate `"reasoning"` block type (lines 171–176 of `stream.go`)
- Both preserve the distinction between thinking/reasoning content and answer content in streaming traces

### 6. Impact

- Streaming traces with thinking enabled show a single blob of text with no way to tell which part is reasoning vs answer
- Token metrics correctly capture `completion_reasoning_tokens` (from `thoughtsTokenCount`), so users see reasoning tokens charged but can't see the corresponding reasoning text separately
- The `thinkingConfig` request parameter IS correctly captured in metadata, creating an inconsistency: the trace shows thinking was requested, reports reasoning tokens, but doesn't show the thinking content distinctly

## Braintrust docs status

Braintrust docs do not specifically address Gemini thinking/thought tracing. The Gemini integration in this SDK correctly captures `thinkingConfig` in request metadata and `thoughtsTokenCount` in metrics. Status: **supported** (thinking is a standard generative execution feature; the streaming gap is an internal consistency issue).

## Upstream sources

- Gemini thinking docs: https://ai.google.dev/gemini-api/docs/thinking — documents `thought: true` on parts in both streaming and non-streaming responses
- Gemini API `Part` object: text parts can include a `thought` boolean field
- `streamGenerateContent` uses the same `GenerateContentResponse` structure as `generateContent`
- Gemini 2.5 Pro and Flash models support thinking with `ThinkingConfig`

## Braintrust docs sources

- https://www.braintrust.dev/docs/integrations/ai-providers/gemini (Gemini integration overview)
- https://www.braintrust.dev/docs/instrument/trace-llm-calls (general LLM tracing)

## Local repo files inspected

- `trace/contrib/genai/generatecontent.go` — `postprocessStreamingResults()` (lines 204–298): only captures `text` field, ignores `thought` boolean; final output is a single text part with no `thought` annotation
- `trace/contrib/genai/generatecontent.go` — `handleResponse()` (line 310): non-streaming path preserves all part fields including `thought`
- `trace/contrib/genai/generatecontent.go` — request metadata (line 78): correctly captures `thinkingConfig`
- `trace/contrib/genai/generatecontent.go` — `parseUsageTokens()` (line 360): correctly maps `thoughtsTokenCount` to `completion_reasoning_tokens`
- `trace/contrib/genai/tracegenai_test.go` — `TestGenerateContentWithThinking` (line 158): only tests non-streaming
- `trace/contrib/genai/testdata/cassettes/TestGenerateContentWithThinking.yaml` — confirms `thought: true` in response parts
- `trace/contrib/anthropic/messages.go` — reference: handles `thinking_delta` as distinct block type in streaming
- `trace/contrib/bedrockruntime/stream.go` — reference: handles reasoning blocks distinctly in streaming
