Summary
When a Claude Agent SDK run dispatches a sub-agent (Task/Agent tool), the anthropic.messages.create span created inside the sub-agent is logged with the wrong input: it contains the root conversation (the root user prompt + the orchestrator's assistant turn that dispatched the sub-agent) instead of the sub-agent's own sidechain prompt ("You are subagent ALPHA ...").
The span's output is correct (the sub-agent's actual turn), so within a single span input and output belong to different conversations. With multiple sub-agents dispatched in parallel it's also racy: their inputs interleave.
Versions: braintrust@3.17.0 (latest), @anthropic-ai/claude-agent-sdk@0.3.162
Repro
See repro.mjs (single sub-agent is enough). After running, open the trace and inspect the sub-agent's child anthropic.messages.create span — its Input is the root prompt, not the sub-agent's prompt.
Ground truth for comparison: the SDK itself records the sub-agent's real input in its sidechain transcript (~/.claude/projects/<slug>/<sessionId>/subagents/agent-<id>.jsonl, first line, isSidechain:true) — it correctly shows "You are subagent ALPHA ...". So the SDK is fine; only the span reconstruction is wrong.
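To make the ground-truth comparison repeatable, a small helper along these lines could pull the first record out of the sidechain transcript. This is only a sketch: firstTranscriptRecord and the sample data are hypothetical, and apart from isSidechain the record layout (a message object with role and content) is an assumption, not something confirmed from the SDK.

```javascript
// Return the first non-empty JSONL record from a transcript's text.
function firstTranscriptRecord(jsonlText) {
  const firstLine = jsonlText.split("\n").find((line) => line.trim() !== "");
  if (firstLine === undefined) throw new Error("transcript is empty");
  return JSON.parse(firstLine);
}

// Hypothetical sample standing in for agent-<id>.jsonl.
// Only isSidechain is taken from the report; the `message` shape is assumed.
const sampleTranscript = [
  JSON.stringify({ isSidechain: true, message: { role: "user", content: "You are subagent ALPHA ..." } }),
  JSON.stringify({ isSidechain: true, message: { role: "assistant", content: "..." } }),
].join("\n");

const firstRecord = firstTranscriptRecord(sampleTranscript);
console.log(firstRecord.isSidechain, firstRecord.message.content);
```

Against a real run, the text passed in would be the contents of the transcript file (for example read with fs.readFileSync(path, "utf8")), and the printed content is what the sub-agent span's input should match.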
Root cause
In dist/instrumentation/index.js (plugin: claude-agent-sdk-plugin):
buildLLMInput() reconstructs every LLM span's input as [...capturedPromptMessages, ...conversationHistory]:
    function buildLLMInput(prompt, conversationHistory, capturedPromptMessages) {
      const promptMessages = [];
      if (typeof prompt === "string") promptMessages.push({ content: prompt, role: "user" });
      else if (capturedPromptMessages?.length) {
        for (const msg of capturedPromptMessages) {
          const role = msg.message?.role, content = msg.message?.content;
          if (role && content !== void 0) promptMessages.push({ content, role });
        }
      }
      return [...promptMessages, ...conversationHistory].length ? [...promptMessages, ...conversationHistory] : void 0;
    }
Both inputs to this function are single, run-global buffers — not keyed by parentToolUseId/sidechain:
- capturedPromptMessages is captured once, from the top-level params.prompt stream in the start handler. It is always the root prompt.
- conversationHistory is state.finalResults, a single array on state. finalizeCurrentMessageGroup pushes every finalized group's final message into it regardless of parent:
    async function finalizeCurrentMessageGroup(state) {
      const parentToolUseId = state.currentMessages[0]?.parent_tool_use_id ?? null;
      const parentKey = llmParentKey(parentToolUseId);
      // ...parent span is correctly resolved per sub-agent...
      const llmSpanResult = await createLLMSpanForMessages(
        state.currentMessages,
        state.originalPrompt,
        state.finalResults, // <-- shared across ALL parents (root + every sub-agent)
        state.options,
        state.currentMessageStartTime,
        state.capturedPromptMessages, // <-- always the ROOT prompt
        parentSpan,
        existingLlmSpan,
      );
      if (llmSpanResult?.finalMessage) state.finalResults.push(llmSpanResult.finalMessage); // appended globally
      // ...
    }
So while the span parent is correctly scoped by parentToolUseId (nesting is right), the input content is always rebuilt from the root prompt + the global running conversation. The sub-agent's own sidechain prompt (parent_tool_use_id != null, isSidechain:true) is never used as that span's input. output is built from state.currentMessages (the actual current group), which is why output is right and input is wrong.
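The effect can be reproduced in isolation with a simplified model of the two run-global buffers. This is an illustrative sketch, not the plugin's code: buildInput, the groups array, and the "toolu_alpha" id are hypothetical stand-ins for buildLLMInput, the finalized message groups, and a real tool-use id.

```javascript
// Simplified stand-in for the plugin's input reconstruction (not the real function).
function buildInput(capturedPromptMessages, conversationHistory) {
  return [...capturedPromptMessages, ...conversationHistory];
}

// Run-global buffers, mirroring state.capturedPromptMessages and state.finalResults.
const state = {
  capturedPromptMessages: [{ role: "user", content: "ROOT PROMPT" }],
  finalResults: [],
};

// Hypothetical finalized groups in arrival order: the orchestrator's dispatch
// turn, then the sub-agent's own turn (parentToolUseId is non-null for it).
const groups = [
  { parentToolUseId: null, finalMessage: { role: "assistant", content: "dispatching ALPHA" } },
  { parentToolUseId: "toolu_alpha", finalMessage: { role: "assistant", content: "ALPHA result" } },
];

const loggedSpans = groups.map((group) => {
  // Input ignores group.parentToolUseId entirely: same buffers for every parent.
  const input = buildInput(state.capturedPromptMessages, state.finalResults);
  state.finalResults.push(group.finalMessage); // appended globally
  return { parent: group.parentToolUseId, input, output: group.finalMessage };
});

console.log(JSON.stringify(loggedSpans[1], null, 2));
```

In this model the sub-agent's span gets the root prompt plus the orchestrator's dispatch turn as input while its output is the sub-agent's own turn, which matches the mismatch seen in the trace.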
Suggested fix
Key the reconstructed input by parentToolUseId/sidechain instead of using run-global buffers:
- Track a separate conversationHistory (and the originating prompt) per parentKey = llmParentKey(parentToolUseId), e.g. Map<parentKey, Message[]>, appending finalMessage only to its own parent's bucket.
- For a sub-agent's first group, seed the input from the sub-agent's own first sidechain user message (the SDK already tags these with parent_tool_use_id and isSidechain) rather than from capturedPromptMessages (which is the root prompt).
This mirrors how the parent-span resolution already buckets by parentToolUseId (activeLlmSpansByParentToolUse, latestLlmParentBySubAgentToolUse); the input/history reconstruction just needs the same per-parent scoping.
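A minimal sketch of the per-parent scoping described above, under the same caveat that the suggested fix is unverified. Every name here (createScopedHistory, ROOT_KEY, seed, inputFor, append, the "toolu_*" ids, and the BETA sub-agent) is hypothetical and does not exist in the plugin; the sketch only shows the bucketing idea.

```javascript
// Hypothetical key for the root conversation (parentToolUseId === null).
const ROOT_KEY = "__root__";

function createScopedHistory(rootPromptMessages) {
  // One history bucket per parent; the root bucket starts with the root prompt.
  const buckets = new Map([[ROOT_KEY, [...rootPromptMessages]]]);
  const bucketFor = (parentToolUseId) => {
    const key = parentToolUseId ?? ROOT_KEY;
    if (!buckets.has(key)) buckets.set(key, []);
    return buckets.get(key);
  };
  return {
    // Seed a sub-agent's bucket once with its own first sidechain user message.
    seed(parentToolUseId, sidechainUserMessage) {
      const bucket = bucketFor(parentToolUseId);
      if (bucket.length === 0) bucket.push(sidechainUserMessage);
    },
    // Snapshot of the input for the next LLM span under this parent.
    inputFor(parentToolUseId) {
      return [...bucketFor(parentToolUseId)];
    },
    // Append a finalized message only to its own parent's bucket.
    append(parentToolUseId, finalMessage) {
      bucketFor(parentToolUseId).push(finalMessage);
    },
  };
}

const history = createScopedHistory([{ role: "user", content: "ROOT PROMPT" }]);
history.append(null, { role: "assistant", content: "dispatching ALPHA and BETA" });
history.seed("toolu_alpha", { role: "user", content: "You are subagent ALPHA ..." });
history.seed("toolu_beta", { role: "user", content: "You are subagent BETA ..." });

const alphaInput = history.inputFor("toolu_alpha");
history.append("toolu_alpha", { role: "assistant", content: "ALPHA result" });
const betaInput = history.inputFor("toolu_beta"); // unaffected by ALPHA's turn

console.log(alphaInput, betaInput);
```

Because each parent has its own bucket, parallel sub-agents cannot interleave their inputs regardless of the order in which their groups are finalized.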
Related
Issue #1655 (tool spans nesting under the wrong parent for the Claude Agent SDK JS plugin) touches the same plugin and parallel-subagent parentage, but is a distinct problem (span parent vs. span input content).
Created by Claude, reviewed by Alber (@alberduris). The "Summary"/"Root cause" were verified against the dist source and the SDK's sidechain transcript (ground truth). The "Suggested fix" is Claude's own and was not verified — treat it as a hint, possibly slop; take with a grain of salt.