Cross-Check

A Multi-Agent System for Cross-Checking Phishing URLs.

Cross-Check is a phishing detection framework powered by Large Language Models (LLMs). Built using Google's Agent Development Kit (ADK) and Mesop, it implements a "debate" mechanism in which multiple specialized AI agents analyze a website from different perspectives before a final verdict on its legitimacy is reached.

Overview

Traditional phishing detection often relies on single-point analysis, where one model or check makes the call alone. Cross-Check aims to reduce the risk of AI hallucinations and improve accuracy by employing a Multi-Agent Debate Framework.

Instead of asking one model "Is this phishing?", Cross-Check convenes a panel of experts:

URL Analyst: Examines domain patterns, typosquatting, and TLDs.
HTML Structure Analyst: Inspects code for hidden elements, obfuscated scripts, and form exploits.
Content Semantic Analyst: Analyzes visible text for urgency, social engineering, and manipulative language.
Brand Impersonation Analyst: Detects mismatches between brand identity and the actual URL/content.

These agents debate their findings under the supervision of a Moderator, and a Judge delivers the final verdict.
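To make "typosquatting" concrete: the URL Analyst is an LLM agent, but the pattern it looks for can be illustrated with a simple deterministic heuristic. The sketch below is not Cross-Check's implementation; it swaps in a string-similarity check (Python's `difflib.SequenceMatcher`) against a hypothetical list of known brand domains. The function name `lookalike_of`, the `KNOWN_DOMAINS` list, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlparse

# Hypothetical reference list; a real check would need a far larger one.
KNOWN_DOMAINS = ["paypal.com", "google.com", "microsoft.com"]


def lookalike_of(url: str, threshold: float = 0.8) -> Optional[str]:
    """Return the known domain that `url` appears to imitate, or None."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    # The genuine domain or one of its subdomains is not a lookalike.
    if any(host == d or host.endswith("." + d) for d in KNOWN_DOMAINS):
        return None
    # Otherwise, flag the closest known domain if it is similar enough.
    best = max(KNOWN_DOMAINS, key=lambda d: SequenceMatcher(None, host, d).ratio())
    return best if SequenceMatcher(None, host, best).ratio() >= threshold else None


if __name__ == "__main__":
    print(lookalike_of("https://paypa1.com/login"))  # paypal.com
    print(lookalike_of("https://www.paypal.com"))    # None
```

A host that nearly matches a known domain (such as `paypa1.com`) is flagged, while the genuine domain and its subdomains pass. Simple similarity scores miss many lookalikes (homoglyphs, added words such as `paypal-login.com`), so treat this only as an illustration of the concept.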

Demo

See Cross-Check in action:

Legitimate URL – Analysis of a safe website
Phishing URL – Detection of a phishing attempt
Invalid URL – Handling of invalid URLs
Rate Limit – Graceful handling of API limits

🚀 Getting Started

Prerequisites

Python 3.13+
uv (for fast Python package management)
API keys for your LLM provider (e.g., Groq, OpenAI)

Installation

Clone the repository:

```
git clone https://github.com/vgnshwar/cross-check.git
cd cross-check
```

Install dependencies:
```
make install
```

Environment Setup: Rename the example environment file and add your API keys.

```
mv .env.example .env
# Edit .env and add your GROQ_API_KEY or relevant model keys
```

Running the Application

You can run the application using the provided Makefile or via Docker.

Using Make: To see all available commands (including tests, evaluation, and the dev server), run:

```
make help
```

To start the web UI immediately:

```
make serve
```

Using Docker:

```
docker build -t cross-check .
docker run -p 7860:7860 -e GROQ_API_KEY=$GROQ_API_KEY cross-check
```

✨ Features

Google ADK Integration: Scalable and modular agent orchestration.
Mesop UI: A clean, Python-native web interface.
Model Agnostic: Uses LiteLLM to route requests to models like Llama 3, GPT-4, or Gemini.
Debate Capability: Implements multi-round reasoning to reduce false positives.
Robust Evaluation: Integrated Pytest suite for benchmarking and unit testing.

🏗️ Architecture & Workflow

The system uses a sequential pipeline whose central stage is an iterative debate loop.

🤖 The Agentic Pipeline

Cross-Check is built as a SequentialAgent pipeline using Google ADK. The pipeline simulates a panel of cybersecurity experts debating the legitimacy of a website.

The system processes a request in three distinct stages:

1. Ingestion & Preprocessing

Agent: UrlPreProcessor – Before any AI analysis occurs, this custom Python agent executes deterministic validation:

Validation: Verifies the URL format and reachability.
Extraction: Scrapes the target website, cleaning the raw HTML and extracting visible text.
Context Injection: Places the sanitized data into the session state, ensuring all subsequent agents analyze the exact same snapshot of the site.
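The three preprocessing steps above can be approximated with the Python standard library alone. This is a minimal sketch, not the project's `UrlPreProcessor`: it assumes the HTML has already been downloaded (the network fetch and reachability check are omitted), and the names `preprocess`, `VisibleTextExtractor`, `is_valid_url`, and the state keys `url`, `html`, and `visible_text` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose contents a browser never renders as visible text.
_HIDDEN_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collects rendered text nodes, skipping anything inside hidden tags."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in _HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def is_valid_url(url: str) -> bool:
    """Format check only: requires an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def preprocess(url: str, raw_html: str, state: dict) -> dict:
    """Validate the URL, extract visible text, and store one shared snapshot."""
    if not is_valid_url(url):
        raise ValueError(f"Invalid URL: {url!r}")
    extractor = VisibleTextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    # Every downstream agent reads these same keys, so all see one snapshot.
    state["url"] = url
    state["html"] = raw_html
    state["visible_text"] = " ".join(extractor.chunks)
    return state


if __name__ == "__main__":
    page = "<body><p>Verify your account</p><script>steal()</script></body>"
    state = preprocess("https://example.com/login", page, {})
    print(state["visible_text"])  # Verify your account
```

Because every specialist reads from the same `state` dictionary, they all analyze one identical snapshot of the page, which is the purpose of the context-injection step.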

2. The Debate Loop

Agent: LoopAgent (containing a ParallelAgent and the Moderator) – This is the core reasoning engine. Instead of a single pass, the system enters an iterative cycle:

Parallel Analysis: Four specialist agents (UrlAnalyst, HtmlAnalyst, ContentAnalyst, BrandAnalyst) analyze the website simultaneously. Each focuses solely on its domain (e.g., the URL analyst looks for typosquatting, while the HTML analyst looks for obfuscated scripts).
Moderator Review: The ModeratorAgent aggregates the specialists' outputs and evaluates whether a consensus exists.
Dynamic Flow:
- If the team agrees, the Moderator calls the exit_loop tool to break the cycle.
- If there is disagreement (e.g., URL looks fine but Content is suspicious), the Moderator triggers another round, forcing agents to re-evaluate based on peer feedback.
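The control flow of the debate loop can be sketched in plain `asyncio`. This is an illustration under stated assumptions, not ADK code: the analysts are stub coroutines, and the Moderator's LLM judgment is replaced by a simple unanimity check whose `break` stands in for the `exit_loop` tool call. `Opinion`, `debate`, `has_consensus`, `make_analyst`, and the default of three iterations are hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Opinion:
    analyst: str
    verdict: str  # "phishing" or "legitimate"


# A specialist receives the shared snapshot plus the previous round's opinions.
Analyst = Callable[[dict, list[Opinion]], Awaitable[Opinion]]


def has_consensus(opinions: list[Opinion]) -> bool:
    """Stand-in for the Moderator: consensus means every verdict is identical."""
    return len({o.verdict for o in opinions}) == 1


async def debate(state: dict, analysts: list[Analyst], max_iterations: int = 3) -> list[list[Opinion]]:
    history: list[list[Opinion]] = []
    feedback: list[Opinion] = []
    for _ in range(max_iterations):
        # Parallel analysis: all specialists run concurrently on the same state.
        round_opinions = list(await asyncio.gather(*(a(state, feedback) for a in analysts)))
        history.append(round_opinions)
        if has_consensus(round_opinions):
            break  # plays the role of the Moderator calling exit_loop
        feedback = round_opinions  # disagreement: peer opinions feed the next round
    return history


def make_analyst(name: str, first: str, revised: str) -> Analyst:
    """Hypothetical stub: answers `first` in round one, `revised` once it sees feedback."""
    async def analyst(state: dict, feedback: list[Opinion]) -> Opinion:
        return Opinion(name, revised if feedback else first)
    return analyst


if __name__ == "__main__":
    panel = [
        make_analyst("UrlAnalyst", "legitimate", "phishing"),
        make_analyst("HtmlAnalyst", "phishing", "phishing"),
        make_analyst("ContentAnalyst", "phishing", "phishing"),
        make_analyst("BrandAnalyst", "phishing", "phishing"),
    ]
    rounds = asyncio.run(debate({"url": "https://example.com"}, panel))
    print(len(rounds), rounds[-1][0].verdict)  # 2 phishing
```

In the usage example the URL analyst initially disagrees, so a second round runs with the first round's opinions as peer feedback; once all four agree, the loop exits early instead of running to `max_iterations`. If agreement never comes, the loop stops at the iteration limit and the full history is still available for the final judgment.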

3. Final Judgment

Agent: JudgementAgent – Once the debate concludes (either via consensus or reaching the maximum iteration limit), the Judge reviews the entire conversation history. It weighs the final arguments from all specialists and delivers the authoritative PHISHING or LEGITIMATE verdict.
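Since the Judge must return exactly one of two labels, it helps to parse its answer into a structured type rather than free text. The repository keeps Pydantic output schemas in `engine/schemas.py`; the standard-library sketch below uses a dataclass and an `Enum` as a stand-in, and its class and field names (`Judgement`, `Verdict`, `reasoning`) are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    """The only two labels the Judge may return."""

    PHISHING = "PHISHING"
    LEGITIMATE = "LEGITIMATE"


@dataclass(frozen=True)
class Judgement:
    """Hypothetical structured verdict; field names are illustrative."""

    verdict: Verdict
    reasoning: str

    @classmethod
    def from_json(cls, raw: str) -> "Judgement":
        """Parse a model reply; raises ValueError on malformed JSON or an unknown label."""
        data = json.loads(raw)
        return cls(
            verdict=Verdict(str(data["verdict"]).strip().upper()),
            reasoning=str(data.get("reasoning", "")),
        )


if __name__ == "__main__":
    reply = '{"verdict": "phishing", "reasoning": "Brand mismatch"}'
    print(Judgement.from_json(reply).verdict.value)  # PHISHING
```

Rejecting anything outside the two allowed labels means a malformed model reply surfaces as an error instead of silently becoming a verdict.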

📁 Project Structure

```
cross-check/
├── .env.example                     # API key template file
├── .github/
│   └── workflows/
│       └── tests.yml                # CI test automation workflow
├── .gitignore
├── .python-version
├── .vscode/
│   └── launch.json                  # VS Code debugger config
├── CITATION.cff                     # Academic citation metadata
├── Dockerfile                       # Container build instructions
├── LICENSE
├── Makefile                         # Project command shortcuts
├── README.md
├── app/
│   ├── config.py                    # Debug and logging settings
│   ├── events.py                    # UI event handlers
│   ├── main.py                      # Mesop app entry point
│   ├── state.py                     # UI state management
│   └── styles.py                    # Component styling rules
├── engine/
│   ├── __init__.py
│   ├── agent.py                     # Multi-agent pipeline definition
│   ├── config.yaml                  # Agent prompts and models
│   ├── interface.py                 # Runner and streaming API
│   ├── schemas.py                   # Pydantic output schemas
│   └── utils.py                     # URL fetching and parsing
├── docs/
│   ├── invalid.mov                  # Invalid URL demo video
│   ├── legitimate.mov               # Legitimate site demo video
│   ├── phishing.mov                 # Phishing detection demo video
│   ├── rate-limit.mov               # Rate limit demo video
│   └── workflow.svg                 # Architecture diagram
├── eval/
│   ├── data/
│   │   ├── legitimate.evalset.json  # Legitimate eval dataset
│   │   ├── phishing.evalset.json    # Phishing eval dataset
│   │   └── test_config.json         # Evaluation config
│   └── test_eval.py                 # Agent evaluation tests
├── pyproject.toml
├── tests/
│   ├── test_agents.py               # Agent unit tests
│   └── test_utils.py                # Utility function tests
└── uv.lock
```

🧪 Testing

Unit tests run automatically on every push via GitHub Actions. View the workflow status badge at the top of this README.

Viewing Coverage Reports from GitHub:

Go to Actions → click on a workflow run
Download the coverage-report artifact

Extract and serve locally:

```
cd coverage-report
python -m http.server 8000
```

Open http://localhost:8000/index.html in your browser

Full Coverage (including integration tests):

Integration tests require LLM API keys. Run locally with:

```
make coverage
```

📚 Reference

PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah
arXiv:2506.15656 [cs.CR]

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
