wrapClaudeAgentSDK: sub-agent anthropic.messages.create span gets the ROOT conversation as `input` (wrong)

## Summary

When a Claude Agent SDK run dispatches a sub-agent (`Task`/`Agent` tool), the `anthropic.messages.create` span created **inside the sub-agent** is logged with the **wrong `input`**: it contains the root conversation (the root user prompt + the orchestrator's assistant turn that dispatched the sub-agent) instead of the sub-agent's own sidechain prompt (`"You are subagent ALPHA ..."`).

The span's `output` is correct (the sub-agent's actual turn), so within a single span **`input` and `output` belong to different conversations**. With multiple sub-agents dispatched in parallel it's also racy: their inputs interleave.

- `braintrust@3.17.0` (latest), `@anthropic-ai/claude-agent-sdk@0.3.162`

## Repro

See `repro.mjs` (single sub-agent is enough). After running, open the trace and inspect the sub-agent's child `anthropic.messages.create` span — its `Input` is the root prompt, not the sub-agent's prompt.

Ground truth for comparison: the SDK itself records the sub-agent's real input in its sidechain transcript (`~/.claude/projects/<slug>/<sessionId>/subagents/agent-<id>.jsonl`, first line, `isSidechain:true`) — it correctly shows `"You are subagent ALPHA ..."`. So the SDK is fine; only the span reconstruction is wrong.

## Root cause

In `dist/instrumentation/index.js` (plugin: `claude-agent-sdk-plugin`):

`buildLLMInput()` reconstructs every LLM span's input as `[...capturedPromptMessages, ...conversationHistory]`:

```js
function buildLLMInput(prompt, conversationHistory, capturedPromptMessages) {
  const promptMessages = [];
  if (typeof prompt === "string") promptMessages.push({ content: prompt, role: "user" });
  else if (capturedPromptMessages?.length) {
    for (const msg of capturedPromptMessages) {
      const role = msg.message?.role, content = msg.message?.content;
      if (role && content !== void 0) promptMessages.push({ content, role });
    }
  }
  return [...promptMessages, ...conversationHistory].length ? [...promptMessages, ...conversationHistory] : void 0;
}
```

Both inputs to this function are **single, run-global buffers — not keyed by `parentToolUseId`/sidechain**:

1. `capturedPromptMessages` is captured once, from the top-level `params.prompt` stream in the `start` handler. It is always the **root** prompt.
2. `conversationHistory` is `state.finalResults`, a single array on `state`. `finalizeCurrentMessageGroup` pushes **every** finalized group's final message into it regardless of parent:

```js
async function finalizeCurrentMessageGroup(state) {
  const parentToolUseId = state.currentMessages[0]?.parent_tool_use_id ?? null;
  const parentKey = llmParentKey(parentToolUseId);
  // ...parent span is correctly resolved per sub-agent...
  const llmSpanResult = await createLLMSpanForMessages(
    state.currentMessages,
    state.originalPrompt,
    state.finalResults,          // <-- shared across ALL parents (root + every sub-agent)
    state.options,
    state.currentMessageStartTime,
    state.capturedPromptMessages, // <-- always the ROOT prompt
    parentSpan,
    existingLlmSpan,
  );
  if (llmSpanResult?.finalMessage) state.finalResults.push(llmSpanResult.finalMessage); // appended globally
  // ...
}
```

So while the span **parent** is correctly scoped by `parentToolUseId` (nesting is right), the **input content** is always rebuilt from the root prompt + the global running conversation. The sub-agent's own sidechain prompt (`parent_tool_use_id != null`, `isSidechain:true`) is never used as that span's input. `output` is built from `state.currentMessages` (the actual current group), which is why output is right and input is wrong.

## Suggested fix

Key the reconstructed input by `parentToolUseId`/sidechain instead of using run-global buffers:

- Track a separate `conversationHistory` (and the originating prompt) **per `parentKey = llmParentKey(parentToolUseId)`**, e.g. `Map<parentKey, Message[]>`, appending `finalMessage` only to its own parent's bucket.
- For a sub-agent's first group, seed the input from the sub-agent's own first sidechain `user` message (the SDK already tags these with `parent_tool_use_id` and `isSidechain`) rather than from `capturedPromptMessages` (which is the root prompt).

This mirrors how the parent-span resolution already buckets by `parentToolUseId` (`activeLlmSpansByParentToolUse`, `latestLlmParentBySubAgentToolUse`); the input/history reconstruction just needs the same per-parent scoping.

## Related

Issue #1655 (tool spans nesting under the wrong parent for the Claude Agent SDK JS plugin) touches the same plugin and parallel-subagent parentage, but is a distinct problem (span *parent* vs. span *input content*).

---

*Created by Claude, reviewed by @alberduris. The "Summary"/"Root cause" were verified against the dist source and the SDK's sidechain transcript (ground truth). The "Suggested fix" is Claude's own and was **not** verified — treat it as a hint, possibly slop; take with a grain of salt.*


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wrapClaudeAgentSDK: sub-agent anthropic.messages.create span gets the ROOT conversation as `input` (wrong) #2103

Summary

Repro

Root cause

Suggested fix

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

wrapClaudeAgentSDK: sub-agent anthropic.messages.create span gets the ROOT conversation as input (wrong) #2103

Description

Summary

Repro

Root cause

Suggested fix

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

wrapClaudeAgentSDK: sub-agent anthropic.messages.create span gets the ROOT conversation as `input` (wrong) #2103