Feature/critic agent by zbinxp · Pull Request #467 · HKUDS/DeepTutor

zbinxp · 2026-05-10T04:01:02Z

Description

Core Feature: CriticAgent

A new CriticAgent runs after the ReAct solve phase and before WriterAgent to audit citations and verify evidence quality.

New files:

deeptutor/agents/solve/agents/critic_agent.py — main agent (~670 lines)
deeptutor/core/content_filter.py — harmful content filter for web sources (~314 lines)
deeptutor/agents/solve/pipeline_errors.py — QuestionNeedsClarification error
deeptutor/agents/solve/prompts/{en,zh}/critic_agent.yaml — agent prompts
New tests:

tests/agents/solve/test_critic_agent.py (20 tests)
tests/core/test_content_filter.py (449 lines)
Source Validation Pipeline

URL deduplication (scratchpad.py):

Two-strategy dedup: URL normalization (same-host fast path) + fuzzy title match (cross-host, via RapidFuzz token_set_ratio at 80% threshold)
Normalized key: arXiv ID extraction, OpenReview ID extraction
Alternate URLs stored separately per source group
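A normalized paper key of the kind listed above could be derived from the URL alone. The sketch below is a minimal illustration, not the PR's code. The `paper_key` and `_on_domain` names are hypothetical, and the sketch covers only new-style arXiv IDs and OpenReview `id=` query parameters. It also drops arXiv version suffixes (v1, v2) on purpose. The PR does not say whether its key does the same.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN), optionally followed by a version suffix.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(?:v\d+)?")


def _on_domain(host: str, domain: str) -> bool:
    # Exact domain or a subdomain of it (avoids matching e.g. "notarxiv.org").
    return host == domain or host.endswith("." + domain)


def paper_key(url: str) -> Optional[str]:
    """Hypothetical helper: host-independent key such as 'arxiv:2401.12345', or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if _on_domain(host, "arxiv.org"):
        match = _ARXIV_ID.search(parsed.path)
        if match:
            # Version is dropped in this sketch so /abs/...v1 and /pdf/...v2 share a key.
            return f"arxiv:{match.group(1)}"
    elif _on_domain(host, "openreview.net"):
        ids = parse_qs(parsed.query).get("id")
        if ids:
            # /forum?id=X and /pdf?id=X refer to the same submission.
            return f"openreview:{ids[0]}"
    return None
```

URLs that match neither rule return `None`. A caller would then fall back to another strategy, such as the fuzzy title match.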
CriticAgent source validation (critic_agent.py):

VisitUrlTool logic inlined into _fetch_url() method — no longer exposed as standalone tool
Parallel group validation: primary URL first, alternates on failure
Content safety filtering via ContentFilter
Dead/unsafe sources removed from scratchpad entries in-place
Claim verification via embedding similarity (llama_index) + token-based fallback
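"Parallel group validation: primary URL first, alternates on failure" fits a two-level pattern. Independent source groups run concurrently with `asyncio.gather`, and the URLs inside each group are tried in order. This is a minimal sketch under stated assumptions. `fetch_ok(url)` is a hypothetical coroutine that stands in for `_fetch_url()` and returns True for a live, safe page. `validate_group` and `validate_all` are illustrative names.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in for CriticAgent._fetch_url(): True if the page is alive and safe.
FetchFn = Callable[[str], Awaitable[bool]]


async def validate_group(urls: list[str], fetch_ok: FetchFn) -> Optional[str]:
    """Try the primary URL first, then each alternate; return the first that passes."""
    for url in urls:
        if await fetch_ok(url):
            return url
    return None  # every variant was dead or unsafe


async def validate_all(
    groups: dict[str, list[str]], fetch_ok: FetchFn
) -> dict[str, Optional[str]]:
    # Groups are independent, so they run concurrently; URLs within a group run sequentially,
    # which means alternates are only fetched when the primary fails.
    results = await asyncio.gather(
        *(validate_group(urls, fetch_ok) for urls in groups.values())
    )
    return dict(zip(groups.keys(), results))
```

Running alternates one at a time keeps the number of requests low when the primary URL is healthy, which is the common case. The cost is extra latency only for groups whose primary URL fails.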
Question Quality Gate

New flow (main_solver.py, deep_solve.py):

Pre-plan quality check that evaluates question completeness
Returns QuestionNeedsClarification with specific issues if question is ambiguous/incomplete
Triggers user clarification prompt in the UI
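The quality gate's contract is to raise before planning instead of letting the pipeline guess, and it could be expressed as below. In the PR the completeness check is done by an LLM. Here a trivial word-count and attachment heuristic stands in for it. The `issues` attribute and the `find_issues` and `check_question` helpers are assumptions, not the PR's API.

```python
class QuestionNeedsClarification(Exception):
    """Raised before planning when the question cannot be answered as asked."""

    def __init__(self, issues: list[str]):
        super().__init__("Question needs clarification: " + "; ".join(issues))
        self.issues = issues  # specific problems to show the user


# Hypothetical heuristic: words suggesting the question depends on an attachment.
_ATTACHMENT_WORDS = {"figure", "diagram", "image", "picture", "graph"}


def find_issues(question: str, has_image: bool = False) -> list[str]:
    """Cheap stand-in for the LLM completeness check described in the PR."""
    words = question.lower().replace("?", " ").split()
    issues = []
    if len(words) < 3 and not has_image:
        issues.append("question is too short to tell what is being asked")
    if _ATTACHMENT_WORDS & set(words) and not has_image:
        issues.append("question refers to a figure or image that was not attached")
    return issues


def check_question(question: str, has_image: bool = False) -> None:
    issues = find_issues(question, has_image)
    if issues:
        raise QuestionNeedsClarification(issues)
```

A caller such as DeepSolveCapability would catch the exception and pass `exc.issues` to the frontend, where they would populate the clarification prompt.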

Related Issues

Closes #...
Related to [Feature Request]: Content pulled in by References needs compliance filtering #375

Module(s) Affected

Checklist

I have read and followed the contribution guidelines.
My code follows the project's coding standards.
I have run pre-commit run --all-files and fixed any issues.
I have added relevant tests for my changes.
I have updated the documentation (if necessary).
My changes do not introduce any new security vulnerabilities.

Additional Notes


…verification CriticAgent runs after the ReAct solve phase and before writing to validate citations, detect hallucinated URLs, identify missing evidence, and call tools to fill gaps. It uses an iterative audit loop (up to 3 rounds). VisitUrlTool fetches one or more URLs in parallel (via asyncio.gather), verifies that each is alive (HTTP 2xx), and checks the cited claim with embedding-based semantic verification using LlamaIndex Settings.embed_model, falling back to token matching. It supports newline-separated URL strings for batch verification.
Additional changes:
- SolveToolRuntime: added audit control action and batch URL handling
- Scratchpad: invalid_source_ids tracking, get_valid_sources(), find_source_id_by_url()
- New test suites for critic_agent (18 tests), scratchpad (18 tests), visit_url (15 tests)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
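Embedding-based claim verification with a token-based fallback can be sketched as follows. The sketch assumes an optional `embed` callable that maps text to a vector. One example would be a wrapper around LlamaIndex's `Settings.embed_model.get_text_embedding`. The 0.5 cosine threshold and the 0.3 claim-token coverage threshold are illustrative values, not taken from the PR.

```python
import math
import re
from typing import Callable, Optional, Sequence

# Assumed signature: text -> embedding vector (e.g. wrapping an embedding model).
Embed = Callable[[str], Sequence[float]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def verify_claim(
    claim: str,
    page_text: str,
    embed: Optional[Embed] = None,
    embed_threshold: float = 0.5,
    token_threshold: float = 0.3,
) -> bool:
    """Return True if page_text appears to support claim."""
    if embed is not None:
        try:
            return _cosine(embed(claim), embed(page_text)) >= embed_threshold
        except Exception:
            pass  # embedding backend unavailable or failed: use the token heuristic
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    # Fraction of the claim's tokens that also appear on the page.
    coverage = len(claim_tokens & _tokens(page_text)) / len(claim_tokens)
    return coverage >= token_threshold
```

The fallback measures claim coverage, not Jaccard similarity. A long page should not be penalized just because it contains many words the claim does not.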

Pre-plan question quality check intercepts incomplete or ambiguous questions before the pipeline runs. The LLM evaluates whether the question is sufficiently complete (including image content); if not, it raises QuestionNeedsClarification, which the frontend displays as an amber clarification banner. Also fixes MiniMax vision support: binding minimax_anthropic was missing from PROVIDER_CAPABILITIES, causing images to be stripped before the question quality gate could evaluate them.
Additional fixes:
- Logger.info() calls in base_agent.py: switch from %-format to str concat to match the custom Logger signature
- DeepSolveCapability: catch and surface clarification error to frontend
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ce verification CriticAgent was calling visit_url without the claim parameter, so claim verification always returned None. Now _validate_all_sources extracts the Entry.observation text that originally cited a source and passes it as the claim argument to VisitUrlTool.execute(), enabling embedding-based relevance verification. Also refactored _validate_all_sources to instantiate VisitUrlTool directly instead of routing through the tool-runtime abstraction. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Deduplication now groups web sources by normalized paper key (arxiv ID, openreview id, ICLR hash, etc.) instead of exact URL, so all variants of the same paper collapse to one entry. Alternate URLs are preserved in `alternate_urls` so the critic agent can still validate each variant individually. format_sources_markdown picks the best canonical URL via _best_url priority (arxiv.org/abs > html > pdf > others). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
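The `_best_url` priority (arxiv.org/abs > html > pdf > others) reduces to a ranked `min()`. This is a sketch that assumes simple substring and suffix tests. The PR's actual classification rules are not shown, and `_rank` and `best_url` are illustrative names.

```python
def _rank(url: str) -> int:
    """Lower is better: arxiv.org/abs, then HTML pages, then PDFs, then anything else."""
    lowered = url.lower()
    if "arxiv.org/abs/" in lowered:
        return 0
    if lowered.endswith((".html", ".htm")) or "/html/" in lowered:
        return 1
    if lowered.endswith(".pdf") or "/pdf/" in lowered:
        return 2
    return 3


def best_url(primary: str, alternate_urls: list[str]) -> str:
    # min() returns the first of several equally ranked items, so ties favor the primary URL.
    return min([primary, *alternate_urls], key=_rank)
```

The abstract page is listed first because it links to every other format and is the most stable citation target.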

…ation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two-tier content safety system for CriticAgent URL validation:
- Primary: CJK-aware regex patterns (self-harm, extremist, hate speech, graphic violence)
- Secondary: LLM classifier (triggered only when pattern confidence < 0.9)
- Educational content (suicide prevention, mental health) preserved
- CSAM domain-level blocking via URL checks
Files:
- deeptutor/core/content_filter.py: ContentFilter + LLMClassifier
- deeptutor/core/tool_protocol.py: add safety fields to ToolResult
- deeptutor/agents/solve/agents/critic_agent.py: integrate filter in _validate_all_sources
- deeptutor/agents/solve/prompts/zh/critic_agent.yaml: Chinese locale
- tests/core/test_content_filter.py: 50 test cases
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
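The two-tier flow runs cheap patterns first and consults the LLM classifier only when a pattern match has confidence below 0.9. It can be sketched as follows. The rules, confidence numbers, `Verdict` type and `llm_is_safe` callable are all placeholders. The real `ContentFilter` and `LLMClassifier` interfaces are not described in the PR, and CSAM domain blocking is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    safe: bool
    category: Optional[str]
    confidence: float
    tier: str  # "pattern" or "llm"


# Placeholder rules: (category, pattern, confidence assigned to a match).
# Patterns mix English and Chinese; re.IGNORECASE only affects the Latin parts.
_RULES = [
    ("self_harm", re.compile(r"\bself[- ]harm (?:methods?|instructions)\b|自残方法", re.I), 0.95),
    ("violence", re.compile(r"graphic (?:gore|violence)|血腥暴力", re.I), 0.6),
]
# Educational context (e.g. suicide prevention) that should not be blocked.
_EDUCATIONAL = re.compile(r"prevention|hotline|mental health|自杀预防|心理健康", re.I)


def classify(text: str, llm_is_safe: Callable[[str], bool], threshold: float = 0.9) -> Verdict:
    if _EDUCATIONAL.search(text):
        return Verdict(True, None, 1.0, "pattern")
    for category, pattern, confidence in _RULES:
        if pattern.search(text):
            if confidence >= threshold:
                return Verdict(False, category, confidence, "pattern")
            # Low-confidence match: only now pay for the slower LLM classifier.
            safe = llm_is_safe(text)
            return Verdict(safe, None if safe else category, confidence, "llm")
    return Verdict(True, None, 1.0, "pattern")
```

Most pages match no rule and never reach the LLM. Classifier calls are therefore spent only on ambiguous matches.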

…tle match Add `_is_duplicate()` and `_longest_common_substring()` to detect same-paper sources even when titles differ (e.g., "[Quick Review]", "ICML", arXiv ID variants). Exact URL match also qualifies as duplicate. Skip adding duplicates in `_add_source_to_scratchpad()`. tests: 17 new tests for LCS, fuzzy dedup logic, and add_source_to_scratchpad. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
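The standard way to implement a `_longest_common_substring()`-style helper is the O(len(a)·len(b)) dynamic program below, which keeps only one previous row. The PR does not say whether it uses this exact formulation or how the result becomes a duplicate decision. The `lcs_ratio` score here is therefore a hypothetical rule.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by a and b (classic DP)."""
    best_len, best_end = 0, 0
    # prev[j]: length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def lcs_ratio(a: str, b: str) -> float:
    # Hypothetical score: shared run length relative to the shorter (lowercased) title.
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    return len(longest_common_substring(a, b)) / shorter if shorter else 0.0
```

Normalizing by the shorter title means a prefix like "[Quick Review]" on one side does not lower the score.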

Replace URL-pattern-based dedup with fuzzy title matching via RapidFuzz, which handles arbitrary URL variants of the same paper without needing a domain allowlist. The token_set_ratio (threshold 80) correctly collapses e.g. arxiv.org/abs and huggingface.co/papers variants of the same paper. CriticAgent._add_source_to_scratchpad: uses exact URL match + RapidFuzz title dedup instead of _normalize_paper_key. Scratchpad._build_sources_list: uses RapidFuzz title dedup instead of URL-key grouping. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
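RapidFuzz's `fuzz.token_set_ratio` compares the tokens two strings share against each side's leftover tokens. This makes it insensitive to word order and to extra prefixes like "[Quick Review]". For illustration without the RapidFuzz dependency, the sketch below mimics that idea with the standard library's `difflib.SequenceMatcher.ratio()`. Its scores approximate RapidFuzz's but are not identical. The lowercasing and punctuation-stripping tokenizer is a choice made for this sketch. `is_duplicate_title` and its threshold of 80 mirror the PR description, not its code.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio(s1: str, s2: str) -> float:
    """difflib approximation of RapidFuzz's fuzz.token_set_ratio (0-100)."""
    t1 = set(re.findall(r"\w+", s1.lower()))
    t2 = set(re.findall(r"\w+", s2.lower()))
    if not t1 or not t2:
        return 0.0
    common = " ".join(sorted(t1 & t2))
    # Shared tokens followed by each side's leftover tokens.
    side1 = " ".join([common, *sorted(t1 - t2)]).strip()
    side2 = " ".join([common, *sorted(t2 - t1)]).strip()
    # If one token set contains the other, a side equals `common` and the score is 100.
    return max(_ratio(common, side1), _ratio(common, side2), _ratio(side1, side2))


def is_duplicate_title(a: str, b: str, threshold: float = 80.0) -> bool:
    return token_set_ratio(a, b) >= threshold
```

With RapidFuzz installed, the same check is `fuzz.token_set_ratio(a, b) >= 80` after `from rapidfuzz import fuzz`.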

…onfig
- VisitUrlTool.execute: remove multi-URL batch support (newline-split, list input, asyncio.gather, batch report formatting, _uses_batch flag)
- Hoist isinstance check in critic_agent validation loop (5 checks → 1 per iteration)
- Wire idle_timeout from settings.retry.idle_timeout through LLMConfig → GenerationSettings → all providers (replacing hardcoded 90s)
- Add debug logging to complete() and stream() in factory.py
- Remove auto-adding visit_url to enabled tools in main_solver.py
- Add "url" to _ACTION_INPUT_PARAM_CANDIDATES in tool_runtime.py
- Fix claim_verified unbound variable on TimeoutError in VisitUrlTool
- Fix pre-existing test assertion: accept "timeout" or "timed out"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
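The "claim_verified unbound variable on TimeoutError" fix follows a familiar pattern. The result variable is bound before the `try`, so every exit path sees a defined value. A generic sketch follows. The `verify_with_timeout` name and the 5-second default are hypothetical.

```python
import asyncio
from typing import Awaitable, Optional


async def verify_with_timeout(verify: Awaitable[bool], timeout: float = 5.0) -> Optional[bool]:
    # Bound before the try: if wait_for times out, we still return a defined value
    # (None meaning "could not verify") instead of raising UnboundLocalError.
    claim_verified: Optional[bool] = None
    try:
        claim_verified = await asyncio.wait_for(verify, timeout)
    except asyncio.TimeoutError:
        pass
    return claim_verified
```

Returning `None` keeps a timeout separate from a negative result (`False`), so a slow server is not mistaken for a page that contradicts the claim.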

- Simplify validation result handling in process()
- Improve source deduplication using fuzzy title matching
- Remove dead/unsafe sources from scratchpad entries in-place
- Add harmful content filter for external web sources
- Use action_input (search query) as claim instead of observation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
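Removing dead or unsafe sources "in-place" matters when other objects hold a reference to the same list. A minimal sketch follows. It assumes scratchpad entries expose a mutable `sources` list of dicts with a `url` key. The `Entry` shape and `prune_sources` name are hypothetical, not the PR's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical shape of a scratchpad entry; the PR's real fields may differ.
    action_input: str
    sources: list[dict] = field(default_factory=list)


def prune_sources(entries: list[Entry], bad_urls: set[str]) -> int:
    """Drop sources whose URL is in bad_urls; return how many were removed."""
    removed = 0
    for entry in entries:
        kept = [s for s in entry.sources if s.get("url") not in bad_urls]
        removed += len(entry.sources) - len(kept)
        # Slice assignment mutates the existing list object, so every holder of a
        # reference to entry.sources sees the pruned contents.
        entry.sources[:] = kept
    return removed
```

Plain reassignment (`entry.sources = kept`) would leave any earlier reference pointing at the stale list that still contains the dead sources.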

- Replace get_logger imports with stdlib logging.getLogger in critic_agent and builtin/tools (get_logger never existed in deeptutor.logging)
- Fix self.logger.display_manager -> self.display_manager in main_solver (logger is a logging.Logger, not MainSolver)
- Filter answer_now_context from config_overrides before passing to math_animator request validation
- Rename tests/book/test_context.py to test_book_context.py to avoid module collision with tests/logging/test_context.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ferences
- Add _fetch_url() method to CriticAgent with full URL fetching logic (HTTP fetch, HTML parsing, claim verification via embeddings, safety filter)
- Remove VisitUrlTool from BUILTIN_TOOL_TYPES, TOOL_ALIASES, and __all__ (no longer exposed as a standalone tool)
- Update critic_agent tests to mock _fetch_url instead of VisitUrlTool
- Fix _ensure_references regex to match the Chinese header ## 参考文献 ("References")
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
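The `_ensure_references` fix depends on a heading regex that recognizes both the English and the Chinese reference headers. A sketch of such a pattern follows. It assumes level-2 Markdown headings named `References` or `参考文献`. The PR's exact pattern is not shown, and `has_references_section` and `ensure_references` are illustrative names.

```python
import re

# Level-2 Markdown heading named "References" (any case) or "参考文献".
# [ \t] instead of \s keeps the match on a single line.
_REFERENCES_HEADER = re.compile(
    r"^##[ \t]+(?:references|参考文献)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def has_references_section(markdown: str) -> bool:
    return _REFERENCES_HEADER.search(markdown) is not None


def ensure_references(markdown: str, sources_md: str, header: str = "## References") -> str:
    """Append a references section only if the answer does not already have one."""
    if has_references_section(markdown) or not sources_md.strip():
        return markdown
    return f"{markdown.rstrip()}\n\n{header}\n\n{sources_md.strip()}\n"
```

If the pattern does not know the Chinese header, an answer that already ends in `## 参考文献` gets a second, duplicate references section appended.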

Reverts changes that introduced idle_timeout into:
- LLMRetryConfig (settings.py)
- LLMConfig (config.py)
- GenerationSettings (provider_core/base.py)
- Provider factory and individual providers
The idle_timeout settings are not ready for production use yet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion
- Rewrite _normalize_url_for_dedup docstring to make clear it is a fast path for same-host duplicates only, NOT a general paper deduplication mechanism
- Update _build_sources_list docstring to explain the two-strategy approach: URL normalization (same-host) + fuzzy title match (cross-host)
- Remove misleading comment about stripping version suffixes
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
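A same-host fast path like the one the `_normalize_url_for_dedup` docstring describes only has to erase cosmetic differences between URLs. The sketch below lowercases the host, drops a leading `www.`, drops the scheme and fragment, and trims trailing slashes. These rules are assumed, not the PR's. The sketch deliberately never equates different hosts and leaves that case to the fuzzy title match.

```python
from urllib.parse import urlsplit


def normalize_url_for_dedup(url: str) -> str:
    """Same-host fast path: collapse cosmetic URL variants, never different hosts."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Paths are case-sensitive, so only trailing slashes are normalized.
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    # Scheme and fragment are dropped: http/https and #section variants collide.
    return f"{host}{path}{query}"
```

Two URLs are treated as the same source when their normalized forms are equal. Anything subtler, such as an arXiv page and a Hugging Face paper page, falls through to the title comparison.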

zbinxp and others added 22 commits May 6, 2026 09:30

feat: retry with backoff on HTTP 429 in VisitUrlTool._fetch_one

20e3b77

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat: retry with backoff on HTTP 429 in VisitUrlTool._fetch_one

6c5cb98

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix: use entry.action_input (search query) as claim instead of observ…

bc14f60

…ation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

add .claude folder to ignore file

e07f58b

remove duplicated imports

e6c2a8f

simplify changes

c99de12

simplify tool changes

a948eca

simplify changes

acb5cdc

simplify changes

866f044
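The two "retry with backoff on HTTP 429" commits describe a common pattern. A stdlib-only sketch follows. `fetch` is a hypothetical coroutine that returns `(status, retry_after_seconds_or_None, body)`. The attempt count, base delay and full-jitter strategy are illustrative choices, not values from the PR.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, Tuple

# (HTTP status, Retry-After in seconds if the server sent one, body)
Response = Tuple[int, Optional[float], str]


async def fetch_with_backoff(
    fetch: Callable[[str], Awaitable[Response]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> Response:
    """Retry only on HTTP 429; any other status is returned immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = await fetch(url)
        status, retry_after, _ = response
        if status != 429 or attempt == max_attempts - 1:
            return response  # success, non-retryable error, or out of attempts
        if retry_after is not None:
            delay = retry_after  # the server said how long to wait
        else:
            # Exponential backoff with full jitter: 0..1s, 0..2s, 0..4s, ... (base_delay=1)
            delay = random.uniform(0, base_delay * 2 ** attempt)
        await asyncio.sleep(delay)
    raise AssertionError("unreachable: the last attempt always returns")
```

Jitter matters here because source groups are validated in parallel. Without it, every request rate-limited by the same host would retry at the same instant.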

zbinxp closed this May 10, 2026

zbinxp deleted the feature/critic-agent branch May 11, 2026 01:46

zbinxp wants to merge 22 commits into HKUDS:dev from zbinxp:feature/critic-agent
