Skip to content

wrapClaudeAgentSDK: sub-agent anthropic.messages.create span gets the ROOT conversation as input (wrong) #2103

@alberduris

Description

@alberduris

Summary

When a Claude Agent SDK run dispatches a sub-agent (Task/Agent tool), the anthropic.messages.create span created inside the sub-agent is logged with the wrong input: it contains the root conversation (the root user prompt + the orchestrator's assistant turn that dispatched the sub-agent) instead of the sub-agent's own sidechain prompt ("You are subagent ALPHA ...").

The span's output is correct (the sub-agent's actual turn), so within a single span input and output belong to different conversations. With multiple sub-agents dispatched in parallel it's also racy: their inputs interleave.

  • braintrust@3.17.0 (latest), @anthropic-ai/claude-agent-sdk@0.3.162

Repro

See repro.mjs (single sub-agent is enough). After running, open the trace and inspect the sub-agent's child anthropic.messages.create span — its Input is the root prompt, not the sub-agent's prompt.

Ground truth for comparison: the SDK itself records the sub-agent's real input in its sidechain transcript (~/.claude/projects/<slug>/<sessionId>/subagents/agent-<id>.jsonl, first line, isSidechain:true) — it correctly shows "You are subagent ALPHA ...". So the SDK is fine; only the span reconstruction is wrong.

Root cause

In dist/instrumentation/index.js (plugin: claude-agent-sdk-plugin):

buildLLMInput() reconstructs every LLM span's input as [...capturedPromptMessages, ...conversationHistory]:

function buildLLMInput(prompt, conversationHistory, capturedPromptMessages) {
  const promptMessages = [];
  if (typeof prompt === "string") promptMessages.push({ content: prompt, role: "user" });
  else if (capturedPromptMessages?.length) {
    for (const msg of capturedPromptMessages) {
      const role = msg.message?.role, content = msg.message?.content;
      if (role && content !== void 0) promptMessages.push({ content, role });
    }
  }
  return [...promptMessages, ...conversationHistory].length ? [...promptMessages, ...conversationHistory] : void 0;
}

Both inputs to this function are single, run-global buffers — not keyed by parentToolUseId/sidechain:

  1. capturedPromptMessages is captured once, from the top-level params.prompt stream in the start handler. It is always the root prompt.
  2. conversationHistory is state.finalResults, a single array on state. finalizeCurrentMessageGroup pushes every finalized group's final message into it regardless of parent:
async function finalizeCurrentMessageGroup(state) {
  const parentToolUseId = state.currentMessages[0]?.parent_tool_use_id ?? null;
  const parentKey = llmParentKey(parentToolUseId);
  // ...parent span is correctly resolved per sub-agent...
  const llmSpanResult = await createLLMSpanForMessages(
    state.currentMessages,
    state.originalPrompt,
    state.finalResults,          // <-- shared across ALL parents (root + every sub-agent)
    state.options,
    state.currentMessageStartTime,
    state.capturedPromptMessages, // <-- always the ROOT prompt
    parentSpan,
    existingLlmSpan,
  );
  if (llmSpanResult?.finalMessage) state.finalResults.push(llmSpanResult.finalMessage); // appended globally
  // ...
}

So while the span parent is correctly scoped by parentToolUseId (nesting is right), the input content is always rebuilt from the root prompt + the global running conversation. The sub-agent's own sidechain prompt (parent_tool_use_id != null, isSidechain:true) is never used as that span's input. output is built from state.currentMessages (the actual current group), which is why output is right and input is wrong.

Suggested fix

Key the reconstructed input by parentToolUseId/sidechain instead of using run-global buffers:

  • Track a separate conversationHistory (and the originating prompt) per parentKey = llmParentKey(parentToolUseId), e.g. Map<parentKey, Message[]>, appending finalMessage only to its own parent's bucket.
  • For a sub-agent's first group, seed the input from the sub-agent's own first sidechain user message (the SDK already tags these with parent_tool_use_id and isSidechain) rather than from capturedPromptMessages (which is the root prompt).

This mirrors how the parent-span resolution already buckets by parentToolUseId (activeLlmSpansByParentToolUse, latestLlmParentBySubAgentToolUse); the input/history reconstruction just needs the same per-parent scoping.

Related

Issue #1655 (tool spans nesting under the wrong parent for the Claude Agent SDK JS plugin) touches the same plugin and parallel-subagent parentage, but is a distinct problem (span parent vs. span input content).


Created by Claude, reviewed by Alber (@alberduris). The "Summary"/"Root cause" were verified against the dist source and the SDK's sidechain transcript (ground truth). The "Suggested fix" is Claude's own and was not verified — treat it as a hint, possibly slop; take with a grain of salt.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions