A Multi-Agent System for Cross-Checking Phishing URLs.
Cross-Check is a phishing detection framework powered by Large Language Models (LLMs). Built with Google's Agent Development Kit (ADK) and Mesop, it implements a "debate" mechanism in which multiple specialized AI agents analyze a website from different perspectives before reaching a consensus on its legitimacy.
Traditional phishing detection often relies on a single point of analysis. Cross-Check aims to reduce the risk of AI hallucinations and improve accuracy by using a Multi-Agent Debate Framework.
Instead of asking one model "Is this phishing?", Cross-Check convenes a panel of experts:
- URL Analyst: Examines domain patterns, typosquatting, and TLDs.
- HTML Structure Analyst: Inspects code for hidden elements, obfuscated scripts, and form exploits.
- Content Semantic Analyst: Analyzes visible text for urgency, social engineering, and manipulative language.
- Brand Impersonation Analyst: Detects mismatches between brand identity and the actual URL/content.
These agents debate their findings under the supervision of a Moderator, and a Judge delivers the final verdict.
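The URL Analyst is an LLM agent, so it does not apply a fixed rule. To make "typosquatting" concrete, the sketch below computes one deterministic signal of that kind: the edit distance between a domain's main label and a list of brand names. `KNOWN_BRANDS`, `typosquat_suspects`, and the distance threshold are hypothetical illustrations, not Cross-Check code.

```python
"""Illustrative typosquatting signal (hypothetical; not part of Cross-Check)."""
from urllib.parse import urlsplit


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical brand list; a real system would use a much larger curated set.
KNOWN_BRANDS = ["paypal", "google", "microsoft", "amazon"]


def typosquat_suspects(url: str, max_distance: int = 2) -> list[str]:
    """Return brands that are close to, but not equal to, the URL's main label."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Naive: take the label before the TLD (ignores suffixes like .co.uk).
    label = labels[-2] if len(labels) >= 2 else host
    return [b for b in KNOWN_BRANDS if 0 < levenshtein(label, b) <= max_distance]


print(typosquat_suspects("https://paypa1.com/login"))  # ['paypal']
print(typosquat_suspects("https://www.paypal.com/"))   # []
```

An exact match (distance 0) is deliberately excluded, since the genuine domain should not flag itself.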
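See Cross-Check in action (demo videos are in docs/):
- Legitimate URL (docs/legitimate.mov) – Analysis of a safe website
- Phishing URL (docs/phishing.mov) – Detection of a phishing attempt
- Invalid URL (docs/invalid.mov) – Handling of invalid URLs
- Rate Limit (docs/rate-limit.mov) – Graceful handling of API limits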
Prerequisites:
- Python 3.13+
- uv (for fast Python package management)
- API keys for your LLM provider (e.g., Groq, OpenAI)

Installation:
- Clone the repository:
  git clone https://github.com/vgnshwar/cross-check.git
  cd cross-check
- Install dependencies:
  make install
- Environment setup: Rename the example environment file and add your API keys.
  mv .env.example .env
  # Edit .env and add your GROQ_API_KEY or other relevant model keys
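To confirm the key is in place before launching, a small stdlib-only check like the one below may help. It is a hypothetical helper, not part of the repository (the project may load `.env` differently, for example via python-dotenv), and it only understands simple `KEY=VALUE` lines.

```python
"""Hypothetical pre-flight check for .env (not part of the Cross-Check codebase)."""
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and # comments, strip quotes."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):  # tolerate shell-style "export KEY=..."
            key = key[len("export "):].strip()
        env[key] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env(path.read_text()) if path.exists() else {}
    missing = missing_keys(env, ["GROQ_API_KEY"])
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Inline comments after a value (`KEY=abc # note`) are not handled; keep such comments on their own lines.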
You can run the application using the provided Makefile or via Docker.
Using Make: To see all available commands (including tests, evaluation, and the dev server), run:
make help
To start the web UI immediately:
make serve
Using Docker:
docker build -t cross-check .
docker run -p 7860:7860 -e GROQ_API_KEY=$GROQ_API_KEY cross-check

Key features:
- Google ADK Integration: Scalable and modular agent orchestration.
- Mesop UI: A clean, Python-native web interface.
- Model Agnostic: Uses LiteLLM to route requests to models like Llama 3, GPT-4, or Gemini.
- Debate Capability: Implements multi-round reasoning to reduce false positives.
- Robust Evaluation: Integrated Pytest suite for benchmarking and unit testing.
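On the "Model Agnostic" point: LiteLLM picks the backend from a provider prefix on the model string (for example `groq/...` or `gemini/...`), with bare names such as `gpt-4` treated as OpenAI models. The toy helper below only mimics that naming convention to show why switching models can be a one-line configuration change (presumably in `engine/config.yaml`, which holds agent models). `split_model_id` is hypothetical and does not call LiteLLM.

```python
"""Toy illustration of LiteLLM-style 'provider/model' names (hypothetical helper)."""


def split_model_id(model_id: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split 'provider/model' into (provider, model); bare names use a default.

    Only the first slash separates the provider, so model names that contain
    slashes themselves (e.g. 'openrouter/meta-llama/...') stay intact.
    """
    provider, sep, model = model_id.partition("/")
    if not sep:
        return default_provider, model_id
    return provider, model


for mid in ["groq/llama-3.3-70b-versatile", "gemini/gemini-1.5-pro", "gpt-4"]:
    print(mid, "->", split_model_id(mid))
```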
Cross-Check runs a sequential pipeline, built on Google ADK's SequentialAgent, with a debate loop at its core. The pipeline simulates a panel of cybersecurity experts debating the legitimacy of a website.
The system processes a request in three distinct stages:
Agent: UrlPreProcessor – Before any AI analysis occurs, this custom Python agent performs deterministic pre-processing:
- Validation: Verifies the URL format and reachability.
- Extraction: Scrapes the target website, cleaning the raw HTML and extracting visible text.
- Context Injection: Places the sanitized data into the session state, ensuring all subsequent agents analyze the exact same snapshot of the site.
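The three pre-processing steps can be sketched with the standard library alone. This is a hypothetical approximation (the real logic lives in `engine/utils.py` and may differ): `validate_url` checks the format, `fetch_html` would cover reachability, and `preprocess` writes one shared snapshot into a plain dict standing in for the ADK session state. The state keys are assumed names. The raw HTML is kept as-is here because the HTML analyst needs the scripts; how the project actually cleans HTML is not shown.

```python
"""Sketch of deterministic pre-processing (hypothetical; see engine/utils.py for the real code)."""
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def validate_url(url: str) -> str:
    """Reject anything that is not an absolute http(s) URL with a host."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Invalid URL: {url!r}")
    return parts.geturl()


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page; network errors propagate and can be reported as 'unreachable'."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script>, <style>, or <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def preprocess(url: str, html: str, state: dict) -> dict:
    """Store a single snapshot so every analyst sees identical input.

    In practice `html` would come from fetch_html(url); it is a parameter here
    so the sketch runs offline. Key names are hypothetical.
    """
    state["url"] = validate_url(url)
    state["html"] = html
    state["visible_text"] = extract_visible_text(html)
    return state


page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>Sign in</h1><script>var x = 1;</script>"
        "<p>Verify your account now</p></body></html>")
state = preprocess("https://example.com/login", page, {})
print(state["visible_text"])  # Sign in Verify your account now
```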
Agent: LoopAgent – wraps a ParallelAgent and the Moderator
This is the core reasoning engine. Instead of a single pass, the system enters an iterative cycle:
- Parallel Analysis: Four specialist agents (UrlAnalyst, HtmlAnalyst, ContentAnalyst, BrandAnalyst) analyze the website simultaneously. Each focuses solely on its domain (e.g., the URL analyst looks for typosquatting, while the HTML analyst looks for obfuscated scripts).
- Moderator Review: The ModeratorAgent aggregates the specialists' outputs and evaluates whether a consensus exists.
- Dynamic Flow:
  - If the team agrees, the Moderator calls the exit_loop tool to break the cycle.
  - If there is disagreement (e.g., the URL looks fine but the content is suspicious), the Moderator triggers another round, forcing agents to re-evaluate based on peer feedback.
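The control flow of this loop can be simulated in plain Python. The sketch below is not ADK code: each analyst is a stand-in function, a thread pool plays the role of the ParallelAgent, and unanimity stands in for the Moderator's (LLM-based) consensus judgment. In ADK, an `exit_loop` tool typically ends a LoopAgent by setting `tool_context.actions.escalate = True`; here that is modeled as a `break`. `run_debate` and both analysts are hypothetical.

```python
"""Plain-Python simulation of the debate loop's control flow (hypothetical; not ADK code)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# An analyst receives the shared snapshot and all previous rounds, returns a label.
Analyst = Callable[[dict, list[dict]], str]


def run_debate(analysts: dict[str, Analyst], snapshot: dict,
               max_iterations: int = 3) -> list[dict]:
    """Run parallel rounds until every analyst agrees or the cap is reached.

    Returns the transcript (one {name: verdict} dict per round) for the judge.
    """
    transcript: list[dict] = []
    for _ in range(max_iterations):
        # Parallel analysis: same snapshot, plus peer feedback from earlier rounds.
        with ThreadPoolExecutor(max_workers=len(analysts)) as pool:
            futures = {name: pool.submit(fn, snapshot, transcript)
                       for name, fn in analysts.items()}
            round_result = {name: f.result() for name, f in futures.items()}
        transcript.append(round_result)
        # Moderator review: a unanimous round is treated as consensus (exit_loop).
        if len(set(round_result.values())) == 1:
            break
    return transcript


def url_analyst(snapshot: dict, transcript: list[dict]) -> str:
    # The URL looks fine at first, but this analyst reconsiders once a peer flags it.
    if transcript and "PHISHING" in transcript[-1].values():
        return "PHISHING"
    return "LEGITIMATE"


def content_analyst(snapshot: dict, transcript: list[dict]) -> str:
    text = snapshot["visible_text"].lower()
    return "PHISHING" if "verify your account" in text else "LEGITIMATE"


rounds = run_debate({"UrlAnalyst": url_analyst, "ContentAnalyst": content_analyst},
                    {"visible_text": "Verify your account now"})
print(len(rounds), rounds[-1])
```

Round one disagrees, so a second round runs; the URL analyst updates on peer feedback and the loop exits on consensus. If agreement never arrives, the loop stops at `max_iterations` and the judge receives the full transcript either way.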
Agent: JudgementAgent – Once the debate concludes (either via consensus or reaching the maximum iteration limit), the Judge reviews the entire conversation history. It weighs the final arguments from all specialists and delivers the authoritative PHISHING or LEGITIMATE verdict.
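The project keeps Pydantic output schemas in `engine/schemas.py`. As a hedged sketch of what a structured judge verdict might look like, the block below uses a stdlib dataclass in place of Pydantic so it runs without dependencies; the field names (`verdict`, `confidence`, `rationale`) are assumptions. With Pydantic v2, a `BaseModel` with a `Literal["PHISHING", "LEGITIMATE"]` field and `model_validate_json` would play the same role.

```python
"""Hypothetical structured verdict for the judge (stdlib stand-in for a Pydantic schema)."""
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeVerdict:
    verdict: str       # must be "PHISHING" or "LEGITIMATE"
    confidence: float  # self-reported confidence in [0, 1]
    rationale: str     # short summary of the decisive arguments

    def __post_init__(self) -> None:
        if self.verdict not in {"PHISHING", "LEGITIMATE"}:
            raise ValueError(f"unexpected verdict: {self.verdict!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    @classmethod
    def from_json(cls, text: str) -> "JudgeVerdict":
        """Parse the judge's raw JSON reply.

        Unknown or missing fields raise TypeError; invalid values raise ValueError.
        """
        return cls(**json.loads(text))


reply = '{"verdict": "PHISHING", "confidence": 0.9, "rationale": "Brand/URL mismatch"}'
print(JudgeVerdict.from_json(reply))
```

Validating the reply this strictly means a malformed LLM answer fails loudly instead of being rendered as a verdict in the UI.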
Project structure:

cross-check/
├── .env.example # API key template file
├── .github/
│ └── workflows/
│ └── tests.yml # CI test automation workflow
├── .gitignore
├── .python-version
├── .vscode/
│ └── launch.json # VS Code debugger config
├── CITATION.cff # Academic citation metadata
├── Dockerfile # Container build instructions
├── LICENSE
├── Makefile # Project command shortcuts
├── README.md
├── app/
│ ├── config.py # Debug and logging settings
│ ├── events.py # UI event handlers
│ ├── main.py # Mesop app entry point
│ ├── state.py # UI state management
│ └── styles.py # Component styling rules
├── engine/
│ ├── __init__.py
│ ├── agent.py # Multi-agent pipeline definition
│ ├── config.yaml # Agent prompts and models
│ ├── interface.py # Runner and streaming API
│ ├── schemas.py # Pydantic output schemas
│ └── utils.py # URL fetching and parsing
├── docs/
│ ├── invalid.mov # Invalid URL demo video
│ ├── legitimate.mov # Legitimate site demo video
│ ├── phishing.mov # Phishing detection demo video
│ ├── rate-limit.mov # Rate limit demo video
│ └── workflow.svg # Architecture diagram
├── eval/
│ ├── data/
│ │ ├── legitimate.evalset.json # Legitimate eval dataset
│ │ ├── phishing.evalset.json # Phishing eval dataset
│ │ └── test_config.json # Evaluation config
│ └── test_eval.py # Agent evaluation tests
├── pyproject.toml
├── tests/
│ ├── test_agents.py # Agent unit tests
│ └── test_utils.py # Utility function tests
└── uv.lock
Unit tests run automatically on every push via GitHub Actions (see .github/workflows/tests.yml).
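Because integration tests need LLM API keys, they cannot run in a CI job that has none. One common pattern for this is to skip them when the key is absent. The sketch below uses the stdlib `unittest.skipUnless` decorator, which pytest also honors; the test class and its body are hypothetical, and the project's own tests may gate this differently (for example with `pytest.mark.skipif`).

```python
"""Hypothetical gating of LLM-backed integration tests on API key presence."""
import os
import unittest

HAS_KEY = bool(os.environ.get("GROQ_API_KEY"))


@unittest.skipUnless(HAS_KEY, "GROQ_API_KEY not set; skipping LLM integration test")
class PipelineIntegrationTest(unittest.TestCase):
    """Would exercise the real pipeline; every test here is skipped without a key."""

    def test_known_phishing_url(self):
        # Placeholder: a real test would run the pipeline and assert on the verdict.
        self.assertTrue(HAS_KEY)
```

Run it with `pytest` or `python -m unittest`; without the key the test is reported as skipped rather than failed.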
Viewing Coverage Reports from GitHub:
- Go to Actions → click on a workflow run.
- Download the coverage-report artifact.
- Extract it and serve it locally:
  cd coverage-report
  python -m http.server 8000
- Open http://localhost:8000/index.html in your browser.
Full Coverage (including integration tests):
Integration tests require LLM API keys. Run locally with:
make coverage

Reference:
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah
arXiv:2506.15656 [cs.CR]
This project is licensed under the MIT License - see the LICENSE file for details.