feat(agent): harden agent loop against unknown tools and tool failures #4525
Open
benym wants to merge 10 commits into alibaba:main from
…prevent infinite loops
…t tool call loops
…r mode without tool execution
…or UnknownToolFinalAnswerInterceptor to extend it
…efactor error handling in tool responses
…d guard hook configuration options
…n and custom messages
PTAL, thanks @yuluo-yx @chickenlj @robocanic
Describe what this PR does / why we need it
Current Situation:
com.alibaba.cloud.ai.graph.agent.node.AgentToolNode#executeToolCallWithInterceptors. This forces ReActAgent to terminate during long-running tasks because of a single erroneous tool call. Any extra handling requires developers to catch the exception in their own outer code, which is awkward to write and means ReActAgent does not work out of the box for long-running tasks.
ToolRetryInterceptor, which directly attempts to retry a single tool call.

Based on the above issues, this PR adds two new tool error handling mechanisms, Unknown Tool Guard and Tool Execution Failure Guard, to address two typical error scenarios that ReActAgent may encounter during the tool invocation phase:
Unknown Tool: The model hallucinates a tool and requests one that does not exist, so the agent falls back to throwing an exception and is forced to terminate.
Tool Execution Failure: An exception or timeout occurs during tool execution, and the agent's repeated retries fail to recover.
Both guards use a unified two-phase escalation strategy. First, the model is allowed to self-repair and retry: by default it is given the failed tools and the currently available tools, with one retry attempt. If failures continue consecutively, the system switches to final-answer mode (tools are disabled and the model is forced to answer the user directly), which ends the error loop gracefully instead of simply throwing an exception.
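The two-phase strategy can be illustrated with a minimal sketch. The class name, the Phase enum, and the reset-on-success behavior below are assumptions made for illustration and are not the PR's actual classes; only the "count exceeds maxSelfRepairRetries" threshold comes from the description.

```java
// Hypothetical sketch of the two-phase escalation; not the PR's implementation.
final class TwoPhaseGuardSketch {

    enum Phase { NORMAL, SELF_REPAIR, FINAL_ANSWER }

    private final int maxSelfRepairRetries;
    private int consecutiveFailures = 0;

    TwoPhaseGuardSketch(int maxSelfRepairRetries) {
        this.maxSelfRepairRetries = maxSelfRepairRetries;
    }

    /** Called once per tool round; returns the phase for the next model call. */
    Phase onToolRound(boolean allToolCallsFailed) {
        if (!allToolCallsFailed) {
            // Assumption: a successful round resets the consecutive-failure streak.
            consecutiveFailures = 0;
            return Phase.NORMAL;
        }
        consecutiveFailures++;
        // Escalate only when the count exceeds the allowed self-repair retries.
        return consecutiveFailures > maxSelfRepairRetries
                ? Phase.FINAL_ANSWER   // tools disabled, model must answer directly
                : Phase.SELF_REPAIR;   // model sees failed and available tools, then retries
    }

    public static void main(String[] args) {
        TwoPhaseGuardSketch guard = new TwoPhaseGuardSketch(1); // one self-repair retry
        System.out.println(guard.onToolRound(true)); // SELF_REPAIR
        System.out.println(guard.onToolRound(true)); // FINAL_ANSWER
    }
}
```

With the default of one retry, the first failed round leaves the model a chance to correct itself and the second consecutive failure switches to final-answer mode.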
Major Changes:
Added AbstractToolCallGuardHook abstract base class: A unified guard hook template that implements core logic such as two-phase escalation (self-repair → final-answer), failure counting, and synthetic instruction injection.
Added UnknownToolGuardHook: Detects unknown tool requests and provides a list of registered tools to help the model self-correct.
Added ToolExecutionFailureGuardHook: Detects tool execution failures (runtime exception / timeout) and collaborates with ToolRetryInterceptor.
Added AbstractFinalAnswerInterceptor and two implementations (UnknownToolFinalAnswerInterceptor, ToolExecutionFailureFinalAnswerInterceptor): Strips the model's tool access in the final-answer round.
Added ToolCallGuardConstants: Unified cross-module metadata key constants.
Enhanced AgentToolNode: Enriches error metadata generation (error type, failure type, tool names, allFailed flag) to support failure detection by the guard hooks.
Enhanced ToolRetryInterceptor: Supports exponential backoff retries, jitter, and retry exhaustion metadata flags, cascading with guard hooks.
Enhanced ReactAgent and Builder: Automatically register default guard hooks and provide fine-grained disabling options (disableDefaultUnknownToolGuard(), disableDefaultToolExecutionFailureGuard(), disableDefaultGuards()).
Added complete unit test and integration test coverage.
Does this pull request fix one issue?
#4337
#4337 (comment)
Describe how you did it
Overall Architecture Design:
Core Implementation Details:
AbstractToolCallGuardHook (Two-Phase Escalation Template):
beforeModel: Reads the guard-specific failure flag from the RunnableConfig metadata, increments the consecutive failure counter, and injects a synthesized AgentInstructionMessage when the count exceeds maxSelfRepairRetries, instructing the model to answer directly.
afterModel: Verifies whether the model follows the final-answer instruction. If a tool call is still issued, it is replaced with a fallback answer message and execution jumps to the end node.
State is isolated through the RunnableConfig context key, ensuring thread safety.

UnknownToolGuardHook:
Reads the allToolCallsUnknown flag set by AgentToolNode.
Includes the requested tool name and a list of available tools in the final-answer instruction to help the model self-correct.
Priority HIGHEST_PRECEDENCE + 100 (higher than the execution failure guard).
Registered by default (even without tool configuration, as the model can hallucinate tools).
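As a rough illustration of how an instruction could carry both the requested tool names and the available tools, here is a minimal sketch. The class, the method name, and the instruction wording are all hypothetical; the PR's real text is produced by its own template methods.

```java
import java.util.List;

// Hypothetical sketch: wording and names are illustrative, not taken from the PR.
final class UnknownToolInstructionSketch {

    /** Builds an instruction naming the unknown tools and listing the registered ones. */
    static String buildInstruction(List<String> requested, List<String> available) {
        String availableText = available.isEmpty() ? "(none)" : String.join(", ", available);
        return "The requested tool(s) do not exist: " + String.join(", ", requested) + ". "
                + "Available tools: " + availableText + ". "
                + "Do not call any tool; answer the user directly.";
    }

    public static void main(String[] args) {
        // A misspelled tool name, as a hallucinating model might produce.
        System.out.println(buildInstruction(
                List.of("get_wether"),
                List.of("get_weather", "search")));
    }
}
```

Listing the registered tools next to the bad name gives the model enough context to notice a near-miss such as a misspelling.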
ToolExecutionFailureGuardHook:
Reads the allToolCallsFailed flag and failure type (runtime_exception / timeout).
Collaborates with ToolRetryInterceptor: triggers the guard only after retries are exhausted.
Priority HIGHEST_PRECEDENCE + 110.
Registered only when the Agent has tools configured.
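The priority values can be read as follows, assuming the hooks follow the same convention as Spring's Ordered interface, where HIGHEST_PRECEDENCE is Integer.MIN_VALUE and a smaller value runs first. The Hook record and the sorting helper are hypothetical stand-ins used only to show the resulting order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of order-based hook sorting; not the PR's registration code.
final class GuardOrderSketch {

    // Assumption: mirrors Spring's Ordered.HIGHEST_PRECEDENCE, which is Integer.MIN_VALUE.
    static final int HIGHEST_PRECEDENCE = Integer.MIN_VALUE;

    /** Hypothetical stand-in for a hook with a name and an order value. */
    record Hook(String name, int order) {}

    /** Returns the hooks in execution order: smaller order value first. */
    static List<Hook> inExecutionOrder(List<Hook> hooks) {
        List<Hook> sorted = new ArrayList<>(hooks);
        sorted.sort(Comparator.comparingInt(Hook::order));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hook> hooks = List.of(
                new Hook("ToolExecutionFailureGuardHook", HIGHEST_PRECEDENCE + 110),
                new Hook("UnknownToolGuardHook", HIGHEST_PRECEDENCE + 100));
        for (Hook hook : inExecutionOrder(hooks)) {
            System.out.println(hook.name());
        }
    }
}
```

Under that assumption, the +100 offset places the unknown tool guard ahead of the execution failure guard at +110.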
AgentToolNode Enhancements:
Generates structured error metadata (errorType, failureType, requestedToolNames, availableToolNames) for each failed tool call.
Adds a unified allToolCallsErrored flag, supporting mixed failure scenarios.
Uses AtomicReferenceArray + CAS in parallel execution mode to prevent race conditions between timeout and completion.

ToolRetryInterceptor Enhancements:
Supports exponential backoff (initialDelay × backoffFactor^retryNumber), with an upper limit of maxDelay.
Supports jitter (±25% random offset).
Marks retryAttempts and retryExhausted in the response metadata after retries are exhausted, for downstream guard hooks to read.

Builder Enhancements:
disableDefaultUnknownToolGuard(): Disables the unknown tool guard.
disableDefaultToolExecutionFailureGuard(): Disables the execution failure guard.
disableDefaultGuards(): Disables both.
Guard hooks support custom maxSelfRepairRetries, customFinalAnswerInstruction, and customFallbackAnswerMessage.
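The backoff formula and jitter described for ToolRetryInterceptor can be sketched as below. The method names, the 0-based retryNumber, and applying jitter after the maxDelay cap are assumptions; the description only states the formula, the cap, and the ±25% range.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of capped exponential backoff with jitter; not the PR's code.
final class BackoffSketch {

    /** initialDelay × backoffFactor^retryNumber, capped at maxDelay (retryNumber assumed 0-based). */
    static long baseDelayMs(long initialDelayMs, double backoffFactor, int retryNumber, long maxDelayMs) {
        double raw = initialDelayMs * Math.pow(backoffFactor, retryNumber);
        return (long) Math.min(raw, (double) maxDelayMs);
    }

    /** Applies a random offset in [-25%, +25%) to the base delay. */
    static long withJitter(long baseMs) {
        double offset = ThreadLocalRandom.current().nextDouble(-0.25, 0.25);
        return Math.max(0L, Math.round(baseMs * (1.0 + offset)));
    }

    public static void main(String[] args) {
        for (int retry = 0; retry < 5; retry++) {
            long base = baseDelayMs(1_000, 2.0, retry, 10_000);
            System.out.println("retry " + retry + ": base=" + base + "ms, jittered=" + withJitter(base) + "ms");
        }
    }
}
```

With an initial delay of 1000 ms, a factor of 2.0, and a 10000 ms cap, the base delays are 1000, 2000, 4000, 8000, and 10000 ms; the jittered values vary per run.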
Describe how to verify it
Core Test Classes:
ReactAgentUnknownToolRecoveryTest: Verifies the Agent's self-repair and final-answer degradation in the Unknown Tool scenario.
ReactAgentToolExecutionFailureRecoveryTest: Verifies the recovery process in the Tool Execution Failure scenario.
UnknownToolGuardHookTest: Guard hook unit test (failure counting, instruction injection, fallback answer).
ToolExecutionFailureGuardHookTest: Execution failure guard unit test.
UnknownToolFinalAnswerInterceptorTest: Final-answer interceptor test (tool stripping verification).
ToolExecutionFailureFinalAnswerInterceptorTest: Execution failure interceptor test.
ToolRetryInterceptorStructuredFailureTest: Retry interceptor structured failure test (metadata tagging verification).

Verification Key Points:
Unknown Tool scenario: Model requests a non-existent tool → self-repair retry → threshold exceeded → final-answer downgrade → returns a meaningful answer to the user
Execution Failure scenario: Tool execution throws an exception → ToolRetryInterceptor automatically retries → retries exhausted → guard hook escalates to final-answer → returns a fallback answer
The two guards can coexist and work independently
The Builder's disableDefaultGuards() correctly disables the default guards
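The first half of the Execution Failure scenario (tool throws, retries run, exhaustion is recorded as metadata instead of an exception) can be sketched in isolation. Everything here is a hypothetical stand-in: the response is a plain Map, only the retryAttempts and retryExhausted keys come from the description, and backoff delays are left out to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch of retry exhaustion metadata; not the PR's ToolRetryInterceptor.
final class RetryExhaustionSketch {

    /** Calls the tool up to 1 + maxRetries times; on exhaustion returns metadata instead of throwing. */
    static Map<String, Object> callWithRetries(Callable<String> tool, int maxRetries) {
        Map<String, Object> response = new LinkedHashMap<>();
        int retries = 0;
        while (true) {
            try {
                response.put("result", tool.call());
                return response;
            } catch (Exception e) {
                if (retries >= maxRetries) {
                    // Flags a downstream guard hook could read to decide on escalation.
                    response.put("error", String.valueOf(e.getMessage()));
                    response.put("retryAttempts", retries);
                    response.put("retryExhausted", true);
                    return response;
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> response = callWithRetries(() -> {
            throw new IllegalStateException("tool backend unavailable");
        }, 2);
        System.out.println(response);
    }
}
```

A test along the lines of ToolRetryInterceptorStructuredFailureTest would then assert on these flags rather than expect an exception to propagate.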
Special notes for reviews
Backward Compatibility: Both guard hooks are automatically registered on ReactAgent as default behavior, protecting existing code without modification. They can be explicitly disabled via the Builder methods.
Design Decisions:
UnknownToolGuardHook is always registered (even if the Agent has no tools configured), as the model might hallucinate a tool call.
ToolExecutionFailureGuardHook is only registered if the Agent has tools configured.
Guard hook priority: Unknown Tool (+100) > Execution Failure (+110), ensuring unknown tools are handled first.
Immutable Message Pattern: All synthetic instructions are injected using AgentInstructionMessage, without modifying the original message list; AbstractFinalAnswerInterceptor uses ModelRequest.builder() to create a new copy of the request instead of modifying the original object.
Extensibility of AbstractToolCallGuardHook: As an abstract template class, it can be extended to support more types of tool call error handling (such as insufficient permissions or rate limiting) by implementing template methods such as buildFinalAnswerInstruction() and buildFallbackAnswerMessage().
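The immutable message pattern can be illustrated with a small copy-on-write sketch. The Request record and both helper methods are hypothetical stand-ins for the real ModelRequest and its builder, whose exact API is not shown in this description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the copy-instead-of-mutate idea; not the PR's ModelRequest.
final class ImmutableRequestSketch {

    /** Stand-in for a model request; the compact constructor takes unmodifiable copies. */
    record Request(List<String> messages, List<String> toolNames) {
        Request {
            messages = List.copyOf(messages);
            toolNames = List.copyOf(toolNames);
        }
    }

    /** Returns a new request with the instruction appended; the original is untouched. */
    static Request withInstruction(Request original, String instruction) {
        List<String> messages = new ArrayList<>(original.messages());
        messages.add(instruction);
        return new Request(messages, original.toolNames());
    }

    /** Returns a new request with no tools, as a final-answer round would need. */
    static Request withoutTools(Request original) {
        return new Request(original.messages(), List.of());
    }

    public static void main(String[] args) {
        Request original = new Request(List.of("user: what is the weather?"), List.of("get_weather"));
        Request finalAnswer = withoutTools(withInstruction(original, "Answer directly without tools."));
        System.out.println(original);
        System.out.println(finalAnswer);
    }
}
```

Because every step returns a fresh object, other hooks and interceptors that still hold the original request never observe the stripped tool list or the injected instruction.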