[bot] Anthropic Messages API: usage.output_tokens_details.thinking_tokens not captured in span metrics #175

@braintrust-bot

Summary

Anthropic SDK v1.44.0 added usage.output_tokens_details to Messages API responses. This nested object contains thinking_tokens — the number of output tokens consumed by extended thinking/reasoning. The Braintrust Ruby SDK does not capture this field. Users who enable extended thinking via anthropic.messages.create(thinking: {...}, ...) or via the beta messages API have no visibility into their thinking token consumption.

This is distinct from issue #164 (RubyLLM extended thinking), which covers the ruby_llm gem. This issue covers the direct anthropic gem instrumentation.

What is missing

The Anthropic Messages API now returns:

"usage": {
  "input_tokens": 2095,
  "output_tokens": 503,
  "cache_creation_input_tokens": 2051,
  "cache_read_input_tokens": 2051,
  "output_tokens_details": {
    "thinking_tokens": 312
  }
}

output_tokens_details.thinking_tokens is the count of tokens the model generated as internal reasoning (always ≤ output_tokens). Capturing it allows users to:

  • Attribute cost to extended thinking vs. standard output
  • Diagnose cases where reasoning dominates total output tokens
  • Compare thinking token spend across requests
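As a rough illustration of the first two points, the split between thinking and visible output can be derived directly from the usage payload shown above. This is a minimal sketch, not part of the SDK: `thinking_breakdown` is a hypothetical helper name, and it assumes the usage object has already been converted to a plain Hash with string keys.

```ruby
# Hypothetical helper (not part of the Braintrust SDK): splits output tokens
# into thinking tokens and visible output tokens.
# Assumes `usage` is a plain Hash with string keys, as in the JSON above.
def thinking_breakdown(usage)
  output = usage["output_tokens"].to_i
  details = usage["output_tokens_details"] || {}
  thinking = details["thinking_tokens"].to_i

  {
    thinking_tokens: thinking,
    visible_output_tokens: output - thinking,
    # Guard against division by zero when no output tokens are reported.
    thinking_share: output.zero? ? 0.0 : thinking.to_f / output
  }
end

usage = {
  "input_tokens" => 2095,
  "output_tokens" => 503,
  "output_tokens_details" => { "thinking_tokens" => 312 }
}

breakdown = thinking_breakdown(usage)
puts breakdown[:thinking_tokens]        # 312
puts breakdown[:visible_output_tokens]  # 191
```

With the example payload, 312 of the 503 output tokens are reasoning, so thinking accounts for roughly 62% of output spend.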

Why it is dropped today

Common.parse_usage_tokens in lib/braintrust/contrib/anthropic/instrumentation/common.rb iterates over the top-level usage hash and skips any value that is not Numeric:

usage_hash.each do |key, value|
  next unless value.is_a?(Numeric)   # ← skips output_tokens_details (a Hash)
  ...
end

output_tokens_details maps to {thinking_tokens: 312}, which fails the Numeric check and is silently dropped. No field in the existing field_map covers it.

The same gap applies to both the stable Messages API instrumentation (messages.rb) and the beta Messages API instrumentation (beta_messages.rb), since both delegate to Common.parse_usage_tokens.
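One possible shape for a fix is to special-case the nested hash before the Numeric guard. The sketch below is standalone and only illustrates the idea: `parse_usage_tokens_sketch`, the contents of `FIELD_MAP`, and the metric name `completion_reasoning_tokens` are assumptions made for this example, not the actual code or naming in common.rb.

```ruby
# Assumed field map for illustration; the real 4-field map in common.rb
# may use different metric names.
FIELD_MAP = {
  "input_tokens" => "prompt_tokens",
  "output_tokens" => "completion_tokens",
  "cache_creation_input_tokens" => "prompt_cache_creation_tokens",
  "cache_read_input_tokens" => "prompt_cached_tokens"
}.freeze

# Hypothetical metric name for thinking tokens.
THINKING_METRIC = "completion_reasoning_tokens"

# Sketch of a parser that keeps the Numeric guard for top-level fields but
# reads thinking_tokens out of the nested output_tokens_details hash first.
# Assumes `usage` is a Hash (string or symbol keys).
def parse_usage_tokens_sketch(usage)
  return {} unless usage.is_a?(Hash)

  metrics = {}
  usage.transform_keys(&:to_s).each do |key, value|
    if key == "output_tokens_details" && value.is_a?(Hash)
      thinking = value.transform_keys(&:to_s)["thinking_tokens"]
      metrics[THINKING_METRIC] = thinking.to_i if thinking.is_a?(Numeric)
      next
    end

    next unless value.is_a?(Numeric) # other nested objects are still skipped

    metrics[FIELD_MAP.fetch(key, key)] = value.to_i
  end
  metrics
end

metrics = parse_usage_tokens_sketch(
  input_tokens: 2095,
  output_tokens: 503,
  output_tokens_details: { thinking_tokens: 312 }
)
puts metrics[THINKING_METRIC] # 312
```

Because both messages.rb and beta_messages.rb go through the same parser, a change along these lines in one place would cover the stable and beta paths alike. If the SDK hands back response model objects rather than plain Hashes, a conversion step would be needed before this logic applies.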

Braintrust docs status

not_found: the Braintrust Anthropic integration docs at https://www.braintrust.dev/docs/providers/anthropic list prompt_tokens, completion_tokens, and cache metrics as captured, but do not mention thinking tokens or output_tokens_details.

Upstream sources

Local files inspected

  • lib/braintrust/contrib/anthropic/instrumentation/common.rb: parse_usage_tokens method (lines 14–48) has a 4-field field_map; its Numeric guard silently drops nested objects like output_tokens_details
  • lib/braintrust/contrib/anthropic/instrumentation/messages.rb: set_metrics (line 132) calls Common.parse_usage_tokens; also captures streaming output via finalize_stream_span
  • lib/braintrust/contrib/anthropic/instrumentation/beta_messages.rb: same parse_usage_tokens call pattern
