Summary
Anthropic shipped a beta Advisor tool that lets Claude consult a secondary
model mid-conversation. The advisor runs its own inference with a model you
specify in the tool definition, and returns its analysis to the primary model.
I'd like to understand how this behaves through better-ccflare before relying on
it, since it doesn't fit the normal "one request in, one response out" shape the
strategies assume.
Tool definition / beta header for reference:
{ "type": "advisor_20260301", "name": "advisor", "model": "claude-sonnet-4-6" }
Beta header: advisor-tool-2026-03-01 (SDK sets it automatically on
client.beta.messages.create()).
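For anyone reproducing this without the SDK, here is a minimal sketch of a request that enables the tool. The tool object and beta flag are the ones quoted above. The proxy base URL, the `buildAdvisorRequest` helper, the `max_tokens` value, and the omission of auth headers are assumptions for illustration, not better-ccflare behavior.

```typescript
// Sketch only: prepares a Messages API request with the advisor tool enabled.
// Nothing is sent here; auth headers are deliberately left out (assumption:
// the proxy supplies account credentials).

interface AdvisorTool {
  type: "advisor_20260301";
  name: "advisor";
  model: string; // model the advisor runs its own inference with
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const ADVISOR_BETA = "advisor-tool-2026-03-01";

function buildAdvisorRequest(
  proxyBaseUrl: string, // hypothetical: wherever the proxy is listening
  primaryModel: string, // model for the parent request
  advisorModel: string, // model named in the tool definition
  prompt: string,
): PreparedRequest {
  const advisor: AdvisorTool = {
    type: "advisor_20260301",
    name: "advisor",
    model: advisorModel,
  };
  return {
    url: `${proxyBaseUrl}/v1/messages`,
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
      // The SDK sets this automatically; a raw request has to set it itself.
      "anthropic-beta": ADVISOR_BETA,
    },
    body: JSON.stringify({
      model: primaryModel,
      max_tokens: 1024, // arbitrary value for the sketch
      tools: [advisor],
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The result can be handed to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })` to send one advisor-enabled request through the proxy and capture what comes back.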
Why it matters for a proxy
The advisor appears to be a server-side tool — its secondary call likely
runs inside Anthropic's request loop rather than coming back out as a separate
HTTP request through the proxy. If that's true, the proxy sees a single inbound
request, but the response carries additional token usage and potentially a
different model than the one requested. That has implications for account
selection, usage accounting, and analytics.
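One way to check this from the proxy side is to compare each response against its request. The sketch below flags a model mismatch and lists any `usage` keys beyond the ones it already counts, so that wherever advisor usage lands it becomes visible. `summarizeResponse` and the "known keys" set are hypothetical, and nothing here assumes a particular field name for advisor tokens. A mismatch is only a hint, since the API can return a more specific model ID than the alias that was requested.

```typescript
// Usage keys this sketch already accounts for. The list is not exhaustive;
// any other key is surfaced for inspection rather than silently ignored.
const KNOWN_USAGE_KEYS = new Set<string>([
  "input_tokens",
  "output_tokens",
  "cache_creation_input_tokens",
  "cache_read_input_tokens",
]);

interface MessageResponseLike {
  model?: string;
  usage?: Record<string, unknown>;
}

interface ResponseSummary {
  modelMismatch: boolean;
  inputTokens: number;
  outputTokens: number;
  unknownUsageKeys: string[];
}

function summarizeResponse(
  requestedModel: string,
  response: MessageResponseLike,
): ResponseSummary {
  const usage = response.usage ?? {};
  // Treat missing or non-numeric values as zero instead of throwing.
  const num = (v: unknown): number => (typeof v === "number" ? v : 0);
  return {
    modelMismatch:
      response.model !== undefined && response.model !== requestedModel,
    inputTokens: num(usage["input_tokens"]),
    outputTokens: num(usage["output_tokens"]),
    unknownUsageKeys: Object.keys(usage)
      .filter((key) => !KNOWN_USAGE_KEYS.has(key))
      .sort(),
  };
}
```

Logging `unknownUsageKeys` for a handful of advisor-enabled requests should show whether the existing accounting is missing anything.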
Questions to investigate
- Does the advisor's secondary call come back through the proxy, or is it entirely server-side (invisible to ccflare)?
- Does the advisor-tool-2026-03-01 beta header pass through cleanly, or is anything stripping/normalizing beta headers?
- Does usage accounting capture the advisor's token usage, or does it undercount? Where does advisor usage show up in the response usage block?
- How does this interact with session / leastUsed / session-affinity (e.g. does the advisor's model differing from the parent request confuse any model-aware logic)?
- Does it "just work" because it's one request? (Confirm, don't assume.)
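The beta header question could be answered with a small comparison between the headers the proxy receives and the headers it forwards upstream. This is a sketch under assumptions: `betaFlags` and `droppedBetaFlags` are hypothetical helpers, headers are modelled as plain objects, and how the outbound headers get captured depends on better-ccflare internals that are not assumed here.

```typescript
// Parse the comma-separated anthropic-beta header into individual flags.
function betaFlags(headers: Record<string, string>): string[] {
  // HTTP header names are case-insensitive, so match on the lowercased name.
  const entry = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "anthropic-beta",
  );
  if (entry === undefined) return [];
  return entry[1]
    .split(",")
    .map((flag) => flag.trim())
    .filter((flag) => flag.length > 0);
}

// Return every beta flag present on the inbound request but absent from the
// request the proxy sends upstream. An empty array means nothing was dropped.
function droppedBetaFlags(
  inbound: Record<string, string>,
  outbound: Record<string, string>,
): string[] {
  const forwarded = new Set(betaFlags(outbound));
  return betaFlags(inbound).filter((flag) => !forwarded.has(flag));
}
```

If `droppedBetaFlags` ever returns `advisor-tool-2026-03-01`, something between the client and Anthropic is stripping or rewriting the header.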
Out of scope
Not proposing a new routing strategy — this is about correctly handling requests
that use the Advisor tool under the existing strategies.