Feature/critic agent #467
Closed
zbinxp wants to merge 22 commits into
…verification
CriticAgent runs after the ReAct solve phase and before writing to validate citations, detect hallucinated URLs, identify missing evidence, and call tools to fill gaps. It uses an iterative audit loop (up to 3 rounds).
VisitUrlTool fetches one or more URLs in parallel (via asyncio.gather) and verifies each is alive (HTTP 2xx) with embedding-based semantic claim verification using LlamaIndex Settings.embed_model, with token-based fallback. Supports newline-separated URL strings for batch verification.
Additional changes:
- SolveToolRuntime: added audit control action and batch URL handling
- Scratchpad: invalid_source_ids tracking, get_valid_sources(), find_source_id_by_url()
- New test suites for critic_agent (18 tests), scratchpad (18 tests), visit_url (15 tests)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-plan question quality check intercepts incomplete or ambiguous questions before the pipeline runs. The LLM evaluates whether the question is sufficiently complete (including image content); if not, the check raises QuestionNeedsClarification, which the frontend displays as an amber clarification banner.
Also fixes MiniMax vision support: the binding minimax_anthropic was missing from PROVIDER_CAPABILITIES, causing images to be stripped before the question quality gate could evaluate them.
Additional fixes:
- Logger.info() calls in base_agent.py: switch from %-format to str concat to match custom Logger signature
- DeepSolveCapability: catch and surface clarification error to frontend
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ce verification CriticAgent was calling visit_url without the claim parameter, so claim verification always returned None. Now _validate_all_sources extracts the Entry.observation text that originally cited a source and passes it as the claim argument to VisitUrlTool.execute(), enabling embedding-based relevance verification. Also refactored _validate_all_sources to instantiate VisitUrlTool directly instead of routing through the tool-runtime abstraction. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
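The claim check described above (embedding similarity with a token-based fallback) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `verify_claim`, `token_overlap`, the `embed` callable (standing in for whatever embedding model is configured), and the 0.5 threshold are all assumptions.

```python
import math
import re
from typing import Callable, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def token_overlap(claim: str, page_text: str) -> float:
    """Fraction of the claim's word tokens that also appear in the page."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    page_tokens = set(re.findall(r"\w+", page_text.lower()))
    return len(claim_tokens & page_tokens) / len(claim_tokens) if claim_tokens else 0.0


def verify_claim(
    claim: str,
    page_text: str,
    embed: Optional[Callable[[str], Sequence[float]]] = None,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the PR
) -> bool:
    """Prefer embedding similarity; fall back to token overlap if embedding fails."""
    if embed is not None:
        try:
            return cosine(embed(claim), embed(page_text)) >= threshold
        except Exception:
            pass  # embedding backend unavailable: use the token-based fallback
    return token_overlap(claim, page_text) >= threshold
```

Passing `embed=None` exercises only the fallback path, which may be convenient in tests that should not load an embedding model.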
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deduplication now groups web sources by normalized paper key (arxiv ID, openreview id, ICLR hash, etc.) instead of exact URL, so all variants of the same paper collapse to one entry. Alternate URLs are preserved in `alternate_urls` so the critic agent can still validate each variant individually. format_sources_markdown picks the best canonical URL via _best_url priority (arxiv.org/abs > html > pdf > others). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
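A rough sketch of the canonical-URL priority (arxiv.org/abs > html > pdf > others) might look like the following. The function names and the path heuristics used to recognise "html" and "pdf" variants are assumptions made for illustration; the arXiv identifier is a placeholder.

```python
from typing import Iterable
from urllib.parse import urlparse


def url_priority(url: str) -> int:
    """Lower is better: arxiv.org/abs, then HTML pages, then PDFs, then the rest."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if host.endswith("arxiv.org") and path.startswith("/abs/"):
        return 0
    if path.endswith((".html", ".htm")) or "/html/" in path:
        return 1
    if path.endswith(".pdf") or "/pdf/" in path:
        return 2
    return 3


def best_url(urls: Iterable[str]) -> str:
    """Pick the highest-priority variant; ties keep the first URL seen."""
    return min(urls, key=url_priority)


variants = [
    "https://arxiv.org/pdf/0000.00000",   # placeholder identifier
    "https://arxiv.org/html/0000.00000",
    "https://arxiv.org/abs/0000.00000",
]
canonical = best_url(variants)
```

Here `canonical` is the `/abs/` variant, while the other two would remain available as alternates.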
…ation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two-tier content safety system for CriticAgent URL validation:
- Primary: CJK-aware regex patterns (self-harm, extremist, hate speech, graphic violence)
- Secondary: LLM classifier (triggered only when pattern confidence < 0.9)
- Educational content (suicide prevention, mental health) preserved
- CSAM domain-level blocking via URL checks
Files:
- deeptutor/core/content_filter.py: ContentFilter + LLMClassifier
- deeptutor/core/tool_protocol.py: add safety fields to ToolResult
- deeptutor/agents/solve/agents/critic_agent.py: integrate filter in _validate_all_sources
- deeptutor/agents/solve/prompts/zh/critic_agent.yaml: Chinese locale
- tests/core/test_content_filter.py: 50 test cases
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
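The two-tier decision could be sketched as below. This is an illustration under stated assumptions, not the actual ContentFilter: the patterns are neutral placeholders, the per-pattern confidences are invented, `classify` and `SafetyVerdict` are hypothetical names, and the LLM tier is represented by a plain callable. It also assumes the LLM is consulted only for an ambiguous pattern hit, which the commit message does not spell out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SafetyVerdict:
    blocked: bool
    confidence: float
    decided_by: str  # "pattern" or "llm"


# Placeholder patterns with hand-assigned confidences (hypothetical values).
PATTERNS = [
    (re.compile(r"placeholder-severe-phrase", re.IGNORECASE), 0.95),
    (re.compile(r"placeholder-ambiguous-word", re.IGNORECASE), 0.6),
]
# Educational material is let through; checking it first is a simplification.
EDUCATIONAL = re.compile(r"suicide prevention|mental health", re.IGNORECASE)


def classify(
    text: str,
    llm_is_harmful: Optional[Callable[[str], bool]] = None,
    threshold: float = 0.9,
) -> SafetyVerdict:
    if EDUCATIONAL.search(text):
        return SafetyVerdict(False, 1.0, "pattern")
    confidence = max((c for p, c in PATTERNS if p.search(text)), default=0.0)
    if confidence >= threshold:
        # Tier 1 is confident enough to block without calling the LLM.
        return SafetyVerdict(True, confidence, "pattern")
    if confidence > 0.0 and llm_is_harmful is not None:
        # Tier 2: an ambiguous pattern hit is deferred to the classifier.
        return SafetyVerdict(llm_is_harmful(text), confidence, "llm")
    return SafetyVerdict(False, confidence, "pattern")
```

Keeping the LLM behind a callable means the pattern tier can be unit-tested without any model access.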
…tle match Add `_is_duplicate()` and `_longest_common_substring()` to detect same-paper sources even when titles differ (e.g., "[Quick Review]", "ICML", arXiv ID variants). Exact URL match also qualifies as duplicate. Skip adding duplicates in `_add_source_to_scratchpad()`. tests: 17 new tests for LCS, fuzzy dedup logic, and add_source_to_scratchpad. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
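For readers unfamiliar with the technique, a longest-common-substring helper can be written with a standard dynamic-programming table, as in this sketch. It is not the PR's `_longest_common_substring()`; the `lcs_coverage` helper and the sample titles are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of characters that appears contiguously in both strings."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match within `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return a[best_end - best_len:best_end]


def lcs_coverage(a: str, b: str) -> float:
    """Share of the shorter title covered by the common substring (hypothetical metric)."""
    shorter = min(len(a), len(b))
    return len(longest_common_substring(a, b)) / shorter if shorter else 0.0


common = longest_common_substring(
    "[Quick Review] Scaling Widgets",
    "Scaling Widgets (ICML)",
)
```

With these two sample titles, `common` is `"Scaling Widgets"`.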
Replace URL-pattern-based dedup with fuzzy title matching via RapidFuzz, which handles arbitrary URL variants of the same paper without needing a domain allowlist. The token_set_ratio (threshold 80) correctly collapses e.g. arxiv.org/abs and huggingface.co/papers variants of the same paper. CriticAgent._add_source_to_scratchpad: uses exact URL match + RapidFuzz title dedup instead of _normalize_paper_key. Scratchpad._build_sources_list: uses RapidFuzz title dedup instead of URL-key grouping. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…onfig
- VisitUrlTool.execute: remove multi-URL batch support (newline-split, list input, asyncio.gather, batch report formatting, _uses_batch flag)
- Hoist isinstance check in critic_agent validation loop (5 checks → 1/iter)
- Wire idle_timeout from settings.retry.idle_timeout through LLMConfig → GenerationSettings → all providers (replacing hardcoded 90s)
- Add debug logging to complete() and stream() in factory.py
- Remove auto-adding visit_url to enabled tools in main_solver.py
- Add "url" to _ACTION_INPUT_PARAM_CANDIDATES in tool_runtime.py
- Fix claim_verified unbound variable on TimeoutError in VisitUrlTool
- Fix pre-existing test assertion: accept "timeout" or "timed out"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Simplify validation result handling in process()
- Improve source deduplication using fuzzy title matching
- Remove dead/unsafe sources from scratchpad entries in-place
- Add harmful content filter for external web sources
- Use action_input (search query) as claim instead of observation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replace get_logger imports with stdlib logging.getLogger in critic_agent and builtin/tools (get_logger never existed in deeptutor.logging)
- Fix self.logger.display_manager -> self.display_manager in main_solver (logger is a logging.Logger, not MainSolver)
- Filter answer_now_context from config_overrides before passing to math_animator request validation
- Rename tests/book/test_context.py to test_book_context.py to avoid module collision with tests/logging/test_context.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ferences
- Add _fetch_url() method to CriticAgent with full URL fetching logic (HTTP fetch, HTML parsing, claim verification via embeddings, safety filter)
- Remove VisitUrlTool from BUILTIN_TOOL_TYPES, TOOL_ALIASES, and __all__ (no longer exposed as a standalone tool)
- Update critic_agent tests to mock _fetch_url instead of VisitUrlTool
- Fix _ensure_references regex to match Chinese header ## 参考文献
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
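As an illustration of the header fix, a pattern that accepts both an English and a Chinese references heading could look like the sketch below. The actual regex in _ensure_references is not shown in this PR text, so the pattern and the `has_references_section` name are assumptions.

```python
import re

# Matches a Markdown heading such as "## References" or "## 参考文献" on its own line.
REFERENCES_HEADER = re.compile(
    r"^#{1,6}\s*(?:References|参考文献)\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def has_references_section(markdown: str) -> bool:
    """True if the document already contains a references heading in either language."""
    return REFERENCES_HEADER.search(markdown) is not None
```

A caller would append a generated references section only when this returns False.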
Reverts changes that introduced idle_timeout into:
- LLMRetryConfig (settings.py)
- LLMConfig (config.py)
- GenerationSettings (provider_core/base.py)
- Provider factory and individual providers
The feature introduced idle_timeout settings that are not ready for production use yet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tion
- Rewrite _normalize_url_for_dedup docstring to make clear it is a fast-path for same-host duplicates only, NOT a general paper deduplication mechanism
- Update _build_sources_list docstring to explain the two-strategy approach: URL normalization (same-host) + fuzzy title match (cross-host)
- Remove misleading comment about stripping version suffixes
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
Core Feature: CriticAgent
A new CriticAgent runs after the ReAct solve phase and before WriterAgent to audit citations and verify evidence quality.
New files:
deeptutor/agents/solve/agents/critic_agent.py — main agent (~670 lines)
deeptutor/core/content_filter.py — harmful content filter for web sources (~314 lines)
deeptutor/agents/solve/pipeline_errors.py — QuestionNeedsClarification error
deeptutor/agents/solve/prompts/{en,zh}/critic_agent.yaml — agent prompts
New tests:
tests/agents/solve/test_critic_agent.py (20 tests)
tests/core/test_content_filter.py (449 lines)
Source Validation Pipeline
URL deduplication (scratchpad.py):
Two-strategy dedup: URL normalization (same-host fast path) + fuzzy title match (cross-host, via RapidFuzz token_set_ratio at 80% threshold)
Normalized key: arXiv ID extraction, OpenReview ID extraction
Alternate URLs stored separately per source group
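The cross-host title strategy can be sketched as follows. To keep the example dependency-free it swaps in a stdlib approximation of a token-set similarity built on `difflib.SequenceMatcher` rather than calling RapidFuzz, so scores will not match RapidFuzz's exactly. The function names, the source dict layout, and the sample titles and URLs are assumptions for illustration; only the 80% threshold and the `alternate_urls` field come from the PR text.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def _ratio(a: str, b: str) -> float:
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_similarity(a: str, b: str) -> float:
    """Order-insensitive similarity on word sets, scored 0 to 100 (stdlib stand-in)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    common = " ".join(sorted(tokens_a & tokens_b))
    full_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    full_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(_ratio(common, full_a), _ratio(common, full_b), _ratio(full_a, full_b))


def dedup_sources(sources: List[Dict], threshold: float = 80.0) -> List[Dict]:
    """Group sources whose titles match; later URLs become alternates of the first."""
    groups: List[Dict] = []
    for src in sources:
        for group in groups:
            if src["url"] == group["url"] or src["url"] in group["alternate_urls"]:
                break  # exact URL already recorded
            if token_set_similarity(src["title"], group["title"]) >= threshold:
                group["alternate_urls"].append(src["url"])
                break
        else:
            groups.append({"title": src["title"], "url": src["url"], "alternate_urls": []})
    return groups


sample = [
    {"title": "A Study of Widget Scaling Laws", "url": "https://arxiv.org/abs/0000.00000"},
    {"title": "[Quick Review] A Study of Widget Scaling Laws",
     "url": "https://huggingface.co/papers/0000.00000"},
    {"title": "Deep Gadget Networks", "url": "https://example.org/gadgets"},
]
merged = dedup_sources(sample)
```

In this sample the first two entries collapse into one group with the second URL stored as an alternate, and the third stays separate.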
CriticAgent source validation (critic_agent.py):
VisitUrlTool logic inlined into _fetch_url() method — no longer exposed as standalone tool
Parallel group validation: primary URL first, alternates on failure
Content safety filtering via ContentFilter
Dead/unsafe sources removed from scratchpad entries in-place
Claim verification via embedding similarity (llama_index) + token-based fallback
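The "primary URL first, alternates on failure" flow, run in parallel across source groups, might be sketched like this. It is a minimal illustration: `validate_group`, `validate_all`, the group dict layout, and the stub `fake_fetch` are hypothetical, and the real _fetch_url() does considerably more (HTML parsing, claim verification, safety filtering).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Fetch = Callable[[str], Awaitable[bool]]


async def validate_group(primary: str, alternates: List[str], fetch: Fetch) -> Optional[str]:
    """Try the primary URL, then each alternate in order; return the first live one."""
    for url in [primary, *alternates]:
        try:
            if await fetch(url):
                return url
        except Exception:
            continue  # treat a fetch error like a dead link and try the next URL
    return None


async def validate_all(groups: List[Dict], fetch: Fetch) -> Dict[str, Optional[str]]:
    """Validate all groups concurrently; map each group id to its live URL or None."""
    results = await asyncio.gather(
        *(validate_group(g["url"], g.get("alternate_urls", []), fetch) for g in groups)
    )
    return {g["id"]: live for g, live in zip(groups, results)}


async def fake_fetch(url: str) -> bool:
    """Stub fetcher for demonstration: any URL containing 'dead' is considered down."""
    await asyncio.sleep(0)
    return "dead" not in url
```

Groups that map to `None` are the candidates for removal from the scratchpad.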
Question Quality Gate
New flow (main_solver.py, deep_solve.py):
Pre-plan quality check that evaluates question completeness
Returns QuestionNeedsClarification with specific issues if question is ambiguous/incomplete
Triggers user clarification prompt in the UI
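A rough sketch of the gate follows. The real QuestionNeedsClarification lives in pipeline_errors.py and its fields are not shown here, so the `issues` attribute, the `quality_gate` function, and the stub evaluator (standing in for the LLM check) are assumptions.

```python
from typing import Callable, List


class QuestionNeedsClarification(Exception):
    """Raised before planning when the question is judged incomplete or ambiguous."""

    def __init__(self, issues: List[str]):
        super().__init__("Question needs clarification: " + "; ".join(issues))
        self.issues = issues


def quality_gate(question: str, evaluate: Callable[[str], List[str]]) -> str:
    """Run the pre-plan check; return the question unchanged if it passes."""
    issues = evaluate(question)
    if issues:
        raise QuestionNeedsClarification(issues)
    return question


def stub_evaluate(question: str) -> List[str]:
    """Stand-in for the LLM evaluation: flags empty or very short questions."""
    if len(question.split()) < 3:
        return ["Question is too short to determine what is being asked."]
    return []
```

A caller would catch the exception at the capability layer and forward `issues` to the UI so the clarification prompt can be shown.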