Feat/dynamic tool routing#1096
Conversation
|
@rab1prasad is attempting to deploy a commit to the Sachin Sharma's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (7)
📝 WalkthroughWalkthroughAdds pre-call tool routing to ChangesPre-call Tool Routing
Sequence Diagram(s)sequenceDiagram
participant Client
participant NeuroLink_stream as NeuroLink.stream()
participant applyToolRoutingExclusions
participant resolveToolRoutingExclusions
participant RouterLLM as Router LLM (generateFn)
Client->>NeuroLink_stream: stream(options, prompt)
NeuroLink_stream->>applyToolRoutingExclusions: options, userQuery
applyToolRoutingExclusions->>applyToolRoutingExclusions: buildToolRoutingCatalog(servers, registeredToolNames)
applyToolRoutingExclusions->>applyToolRoutingExclusions: fetchRecentRoutingHistory(options)
applyToolRoutingExclusions->>applyToolRoutingExclusions: buildRoutingQueryFromHistory(history, userQuery)
applyToolRoutingExclusions->>resolveToolRoutingExclusions: catalog, alwaysInclude, query, routerModel, timeout, generateFn
resolveToolRoutingExclusions->>RouterLLM: router prompt with JSON server catalog
RouterLLM-->>resolveToolRoutingExclusions: { servers: ["serverA"] }
resolveToolRoutingExclusions->>resolveToolRoutingExclusions: parse + validate + filter hallucinated ids
resolveToolRoutingExclusions-->>applyToolRoutingExclusions: excludedToolNames[]
applyToolRoutingExclusions->>NeuroLink_stream: options.excludeTools += excludedToolNames
NeuroLink_stream->>Client: stream response (with routing applied)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
src/lib/core/constants.tsParsing error: Unable to parse the specified 'tsconfig' file. Ensure it's correct and has valid syntax. error TS5012: Cannot read file '/.svelte-kit/tsconfig.json': ENOENT: no such file or directory, open '/.svelte-kit/tsconfig.json'. src/lib/core/toolRouting.tsParsing error: Unable to parse the specified 'tsconfig' file. Ensure it's correct and has valid syntax. error TS5012: Cannot read file '/.svelte-kit/tsconfig.json': ENOENT: no such file or directory, open '/.svelte-kit/tsconfig.json'. src/lib/neurolink.tsParsing error: Unable to parse the specified 'tsconfig' file. Ensure it's correct and has valid syntax. error TS5012: Cannot read file '/.svelte-kit/tsconfig.json': ENOENT: no such file or directory, open '/.svelte-kit/tsconfig.json'.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/core/toolRouting.ts`:
- Around line 49-62: The buildToolRoutingCatalog function uses startsWith to
match tools to server IDs, but this approach causes ambiguity when server IDs
have overlapping prefixes (e.g., "analytics" and "analytics_checkout" both match
"analytics_checkout_*" tools). Fix this by implementing a longest matching
prefix strategy instead of simple prefix matching: iterate through each tool in
registeredToolNames, find the server whose ID is the longest matching prefix for
that tool (accounting for the underscore separator), and assign the tool
exclusively to that server. This ensures each tool is owned by exactly one
server and prevents tools from being incorrectly excluded or assigned.
In `@src/lib/neurolink.ts`:
- Around line 1213-1215: The constructor assigns config.toolRouting directly by
reference to this.toolRoutingConfig, which allows subsequent mutations by
setToolRoutingServers() to affect the original config object. If the same config
is reused across multiple NeuroLink instances, mutations in one instance will
leak into others. Create a defensive clone of config.toolRouting before storing
it in this.toolRoutingConfig to prevent unintended cross-instance contamination.
This fix should also be applied at the other location where toolRouting config
is assigned (as noted in the comment).
- Around line 7748-7763: The `generateFn` callback in the
`resolveToolRoutingExclusions` call re-enters the public `generate()` method,
which has a `finally` block that resets the `_disableToolCacheForCurrentRequest`
flag to `false`. This causes the stream-scoped cache setting to be lost before
the main tool execution path runs. Save the current value of
`_disableToolCacheForCurrentRequest` before invoking the router call via
`generateFn`, and restore it afterward to preserve the original cache state.
Alternatively, implement an internal/silent generation path that avoids
re-entering the public `generate()` method and emitting public events for the
router turn.
- Around line 7742-7743: The routing catalog is being built from only custom
tools via getCustomTools().keys(), but this excludes external or in-memory MCP
server tools. To fix this, modify the registeredToolNames variable to include
all available MCP tool names from all sources - both custom tools and the tools
available from the servers collection. Instead of relying solely on
getCustomTools(), extract the complete set of tool names from the servers
parameter (which is passed to buildToolRoutingCatalog) to ensure the routing
catalog includes all MCP tools and the feature works for all tool registrations.
In `@src/lib/telemetry/attributes.ts`:
- Around line 164-166: The spanJsonAttribute function at Line 164 slices the
serialized string to maxChars and then appends a truncation suffix, which causes
the returned string to exceed the declared maxChars hard cap. To fix this,
calculate the length of the truncation suffix message and reduce the slice point
by that amount so the final concatenated result (slice plus suffix) does not
exceed maxChars. This ensures the hard-ceiling guarantee is actually enforced
and respects strict downstream attribute-size limits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e91b1d7a-2300-4b62-a558-321f86649563
📒 Files selected for processing (15)
package.jsonsrc/lib/core/constants.tssrc/lib/core/toolRouting.tssrc/lib/neurolink.tssrc/lib/providers/googleVertex.tssrc/lib/services/server/ai/observability/instrumentation.tssrc/lib/telemetry/attributes.tssrc/lib/telemetry/index.tssrc/lib/types/config.tssrc/lib/types/index.tssrc/lib/types/toolRouting.tssrc/lib/utils/anthropicTraceSanitizer.tssrc/lib/utils/mcpErrorText.tstest/continuous-test-suite-vertex-langfuse-spans.tstest/toolRouting.test.ts
9e29708 to
743fe33
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/core/toolRouting.ts`:
- Around line 236-249: The router call in the generateFn invocation currently
delegates timeout enforcement to the callee by passing the timeout parameter,
which can allow the operation to hang if the callee ignores it. Remove the
timeout property from the generateFn options object and instead wrap the entire
awaited generateFn call with the withTimeout utility, passing timeoutMs as the
second argument. This ensures timeoutMs is enforced as a true hard ceiling
locally rather than relying on the callee to respect it.
- Around line 254-262: The logger.warn call in the ToolRouting validation block
(around the rawResponsePreview parameter) is logging raw router output that may
contain sensitive user query or history content. Remove the rawResponsePreview
line entirely from the logger.warn call, or if router output logging is needed
for debugging, replace it with non-sensitive metadata instead. If raw output
logging is necessary, consider using transformParamsForLogging() to safely strip
secrets before logging as per the coding guidelines.
- Around line 112-126: The early return statement when priorTurns.length === 0
bypasses the maxChars constraint by returning currentQuery directly. Apply the
same character limiting logic used for the full transcript to the currentQuery
in the early return case: check if currentQuery.length exceeds maxChars and if
so, slice it to return only the last maxChars characters, otherwise return
currentQuery unchanged. This ensures the helper respects the caller-provided
character ceiling even when there is no prior conversation history.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 142c120e-5de6-40c1-adc1-77804373d263
📒 Files selected for processing (8)
src/lib/core/constants.tssrc/lib/core/toolRouting.tssrc/lib/neurolink.tssrc/lib/telemetry/attributes.tssrc/lib/types/config.tssrc/lib/types/index.tssrc/lib/types/toolRouting.tstest/toolRouting.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- src/lib/types/index.ts
- src/lib/telemetry/attributes.ts
- src/lib/types/toolRouting.ts
- src/lib/types/config.ts
- src/lib/neurolink.ts
743fe33 to
c8bada7
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/lib/telemetry/attributes.ts (1)
165-167:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTruncation count and hard-cap can still be wrong in edge cases.
On Line 165-Line 167, the suffix count is computed from
serialized.length - maxChars, but oncekeepLengthis reduced, the real truncated count is larger. Also, when the suffix itself is longer thanmaxChars, the returned value can still exceed the cap.Suggested fix
if (serialized.length > maxChars) { - const truncationSuffix = `...[truncated ${serialized.length - maxChars} chars]`; - const keepLength = Math.max(0, maxChars - truncationSuffix.length); - return `${serialized.slice(0, keepLength)}${truncationSuffix}`; + const buildSuffix = (truncatedChars: number) => + `...[truncated ${truncatedChars} chars]`; + + // First pass to estimate room for prefix. + let suffix = buildSuffix(serialized.length - maxChars); + if (suffix.length >= maxChars) { + return suffix.slice(0, maxChars); + } + + let keepLength = maxChars - suffix.length; + // Recompute so suffix count reflects actual truncation after keepLength is known. + const truncatedChars = serialized.length - keepLength; + suffix = buildSuffix(truncatedChars); + + if (suffix.length >= maxChars) { + return suffix.slice(0, maxChars); + } + keepLength = maxChars - suffix.length; + return `${serialized.slice(0, keepLength)}${suffix}`; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/telemetry/attributes.ts` around lines 165 - 167, The truncation logic in the serialized value truncation has two issues: the truncation count reported in the suffix is calculated as serialized.length - maxChars, but this doesn't account for the space consumed by the suffix itself, resulting in an inaccurate count. Additionally, when the truncationSuffix is itself longer than maxChars, the final returned string can exceed the maxChars limit. Fix this by calculating the actual truncation count as serialized.length - keepLength (which reflects the real number of characters removed) and enforce a hard cap on the final result by slicing the concatenated string to ensure it never exceeds maxChars in length.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/neurolink.ts`:
- Around line 7817-7823: The code unconditionally returns an empty array
whenever context.sessionId is missing, but stream execution can accept
caller-provided conversationMessages for inline multi-turn conversations without
needing a sessionId. Modify the condition to check not only for the presence of
sessionId but also for the availability of inline conversationMessages in the
context or options. Only return the empty array if both sessionId is absent AND
no conversationMessages are provided, allowing inline callers to proceed with
their conversation history without requiring server-side session memory.
- Around line 7748-7795: The setup calls to getCustomTools() and
buildToolRoutingCatalog() can throw exceptions before the resolver's fail-open
boundary, and the router's generate() call does not receive the abort signal,
allowing aborted streams to wait for the routing timeout. Wrap the
getCustomTools() and buildToolRoutingCatalog() calls in a try-catch block that
returns early (fail-open) on error, pass options.abortSignal to the generateFn
call within resolveToolRoutingExclusions to respect stream cancellations
immediately, and add an abort signal check after the resolver completes to stop
routing setup if the stream was aborted during the router call.
---
Duplicate comments:
In `@src/lib/telemetry/attributes.ts`:
- Around line 165-167: The truncation logic in the serialized value truncation
has two issues: the truncation count reported in the suffix is calculated as
serialized.length - maxChars, but this doesn't account for the space consumed by
the suffix itself, resulting in an inaccurate count. Additionally, when the
truncationSuffix is itself longer than maxChars, the final returned string can
exceed the maxChars limit. Fix this by calculating the actual truncation count
as serialized.length - keepLength (which reflects the real number of characters
removed) and enforce a hard cap on the final result by slicing the concatenated
string to ensure it never exceeds maxChars in length.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 101cfa2d-3963-4f5e-a9cf-e46501e76b10
📒 Files selected for processing (8)
src/lib/core/constants.tssrc/lib/core/toolRouting.tssrc/lib/neurolink.tssrc/lib/telemetry/attributes.tssrc/lib/types/config.tssrc/lib/types/index.tssrc/lib/types/toolRouting.tstest/toolRouting.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- src/lib/core/constants.ts
- src/lib/types/index.ts
- test/toolRouting.test.ts
- src/lib/types/toolRouting.ts
- src/lib/core/toolRouting.ts
Add an optional pre-call tool-routing layer enabled through the constructor's `toolRouting` config. Once per stream() turn a cheap router LLM is asked which tool servers the user's query needs; the unpicked servers' registered tool names are appended to options.excludeTools before the main model call. The server catalog can be supplied or replaced after construction via setToolRoutingServers() for hosts that register tools per session. - alwaysIncludeServerIds servers are never offered to the router and never excluded; the router prompt prefix is overridable via routerPromptPrefix (defaults to DEFAULT_ROUTER_PROMPT_PREFIX), and the router LLM provider/model/region/temperature/timeout are configurable. - Fails open: missing query, <=1 routable server, parse/validation failure, empty/hallucinated pick, or any error yields no exclusions, so all tools stay available — byte-identical to routing disabled. The resolver never throws. - Untrusted query text is bounded (MAX_ROUTER_QUERY_CHARS) and framed as data, not instructions. - The router call runs inside the stream span's OTel context so its observation nests under the turn's trace rather than starting a separate one.
c8bada7 to
f162f86
Compare
|
@coderabbitai please review this pr |
|
I'll review the changes across all the new and modified files shortly. ✅ Action performedReview finished.
|
|
🎉 This PR is included in version 9.72.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
plane ticket: https://plane.breezehq.dev/breeze/browse/BZ-3640/
dev proof:
Pull Request
Description
What does this PR do?
Adds an optional pre-call tool-routing layer to NeuroLink. When enabled via
the constructor's
toolRoutingconfig, a cheap router LLM runs once perstream()turn, decides which tool servers the user's query actually needs, andappends the unpicked servers' tools to
options.excludeToolsbefore the mainmodel call. The result is fewer tool schemas sent to the chat model per turn
(lower token cost, better selection accuracy) with zero behavior change when the
feature is off.
Related Issues
Relates to BZ-3640 (dynamic per-turn tool filtering for chat)
Type of Change
Motivation and Context
Why is this change needed? What problem does it solve?
Hosts (e.g. Lighthouse chat) register large MCP tool sets — 100+ tools across
many servers — and the full schema set is re-sent to the model on every step of
the agentic loop. Tool-schema payload dominates prompt tokens, and selection
accuracy degrades well past ~30–50 tools.
This PR moves the routing decision into the SDK so any consumer gets it through
config alone, instead of re-implementing a router per host. A separate, cheap
router model classifies the query against a server catalog (id + description)
and narrows the offered tool set for that turn.
The mechanism is a deliberate denylist (append to
excludeTools): therouter only knows the declared catalog — a strict subset of the real tool set —
so built-in direct tools, always-include servers, and any tools outside the
catalog are never touched. Everything fails open, so a router problem can only
ever degrade to "all tools offered" (today's behavior), never to a broken turn.
Changes Made
What specific changes were made?
src/lib/core/toolRouting.ts—buildToolRoutingCatalog()(groupsregistered tool names by
${serverId}_prefix) andresolveToolRoutingExclusions()(runs the router LLM, validates JSON outputwith zod, returns the tool names to exclude). Includes
buildRouterPrompt(),parseRouterJson(),DEFAULT_ROUTER_PROMPT_PREFIX, and aMAX_ROUTER_QUERY_CHARSbound on the interpolated query.src/lib/types/toolRouting.ts—ToolRoutingConfig,ToolRoutingServerDescriptor,ToolRoutingModelConfig,ToolRoutingCatalogEntry,ToolRoutingResolutionParams.src/lib/neurolink.ts—toolRoutingconstructor wiring, theapplyToolRoutingExclusions()stream-path hook, andsetToolRoutingServers()to supply/replace the catalog after construction (for hosts that register
tools per session). The router call runs inside the
neurolink.streamspan'sOTel context so its observation nests under the turn's trace.
src/lib/types/config.ts— addstoolRouting?: ToolRoutingConfigtoNeurolinkConstructorConfig.src/lib/core/constants.ts—DEFAULT_TOOL_ROUTING_TIMEOUT_MS(15s).src/lib/types/index.ts— exports the new types.test/toolRouting.test.ts— unit tests for catalog building, set-math,every fail-open path, and the stream-hook wiring.
Configurable surface:
enabled,servers/setToolRoutingServers(),alwaysIncludeServerIds(never routed, never excluded),timeoutMs,routerModel(provider/model/region/temperature), androuterPromptPrefix(overrides the instruction header; defaults to
DEFAULT_ROUTER_PROMPT_PREFIX).Breaking Changes
Does this PR introduce breaking changes?
With no
toolRoutingconfig,toolRoutingConfigstaysundefinedand the hookreturns at its first guard before doing any work — byte-identical to prior
behavior.
disableToolsand an empty catalog also short-circuit it.Testing
How has this been tested?
gemini-3.1-flash-lite-preview; main chatclaude-sonnet-4-5)Test Coverage
test/toolRouting.test.ts— 13 tests passing. Covers: prefix grouping; drop ofzero-tool servers; exclude only unpicked routable servers; always-include never
offered/excluded; markdown-fenced output parsing; hallucinated-id filtering; and
fail-open on missing query, ≤1 routable server, non-JSON, schema-invalid,
empty/hallucinated pick, and router throw. Plus the private stream-hook appends
to
excludeToolscorrectly.Manual Testing Steps
toolRouting: { enabled: true, alwaysIncludeServerIds: [...], routerModel: {...} }.registerTools(...), thensetToolRoutingServers([{ id, description }, ...]).stream()turn with a domain-specific query (e.g. "find surcharge").{"servers":[...]}output) appears nested in the turn trace, before the main model call.
subset + always-include set.
open (all tools offered, turn proceeds normally).
Code Quality
Documentation
Commit Message Format
type(scope): descriptionfeat(tool-routing): pre-call per-turn tool routing via configDependencies
(Uses the already-present
zodfor router-output validation.)Performance Impact
Does this change affect performance?
When enabled, the offered tool set for a turn shrinks from the full catalog to
the routed subset + always-include servers, cutting per-step tool-schema tokens.
Cost: one extra cheap router LLM call before the first token (bounded by
timeoutMs, default 15s). Example observed turn — query "find surcharge":router selected
["surcharge-server"]for ~$0.00035 ongemini-flash-lite, andthe main model was offered the narrowed set instead of all 100+ tools.
When disabled (no config), there is no added work or latency.
Security Considerations
Are there any security implications?
Security review needed (low risk; described below)
The user query is interpolated into the router prompt. It is treated as
untrusted: framed explicitly as "data to classify, not instructions," length
is bounded by
MAX_ROUTER_QUERY_CHARS, and the blast radius of any injectionis bounded — the router can only pick from the declared catalog
(
routableServerIds.has(...)filters hallucinated ids), so the worst case is"keep more already-registered tools," never tool escalation outside the
catalog.
Router output is parsed defensively (fence-strip + JSON-object extraction) and
zod-validated; any failure fails open.
Deployment Notes
Inert until a consumer opts in via the
toolRoutingconstructor config.Reviewer Checklist
Additional Notes
executeStreamRequestso its span nests under the turn'sneurolink.streamtrace. A consequence is that on a model-access-denied fallback retry the
router runs again for that attempt. This is accepted: fallback is rare and the
router is the cheap flash-lite model with a hard timeout. Exclusions do not
compound across attempts —
applyToolRoutingExclusionsreassignsoptions.excludeToolsto a new array on a per-attempt clone, andstream()isolates options from the caller at entry.
neurolink.toolRoutingspan is omitted on this branch; therouter call is still visible as its own generation within the turn trace.
Pre-submission Checklist
pnpm test(pnpm exec vitest run test/toolRouting.test.ts)pnpm buildpnpm run validate:alland all checks passSummary by CodeRabbit
setToolRoutingServers()to supply/replace the routing server catalog after construction.toolRoutingsetting (including timeout and optional router model/prompt overrides); router failures now fail open (tools are not restricted).