Where LLMs Debate Your Code
Multiple LLMs review your code in parallel, debate conflicting opinions, then a head agent delivers the final verdict. Different models catch different bugs — consensus filters the noise.
npm i -g codeagora
agora init
git diff | agora review

agora init auto-detects your API keys and CLI tools, then generates a config.
| Provider | Type | Cost |
|---|---|---|
| Groq | API | Free |
| Anthropic | API | Paid |
| Claude Code | CLI | Subscription |
| Gemini CLI | CLI | Free |
| Codex CLI | CLI | Subscription |
Full provider list (24+ API, 12 CLI) ->
git diff | agora review
Pre    --- Semantic Diff Classification
       --- TypeScript Diagnostics
       --- Change Impact Analysis
        |
L1     --- Reviewer A (security) --+
       --- Reviewer B (logic)    --+-- parallel specialist reviews
       --- Reviewer C (general)  --+
        |
Filter --- Hallucination Check (file/line validation)
       --- Self-contradiction Filter
       --- Evidence Dedup
        |
L2     --- Adversarial Discussion (supporters must disprove)
       --- Static analysis evidence in debate
        |
L3     --- Head Agent --> ACCEPT / REJECT / NEEDS_HUMAN
        |
Output --- Triage: N must-fix / N verify / N ignore
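The L1 stage above is conceptually a parallel map over the enabled reviewers. Here is a minimal TypeScript sketch of that fan-out; the Finding shape, Reviewer type, and runL1 name are illustrative assumptions, not CodeAgora internals.

```typescript
// Illustrative sketch of the L1 fan-out: every enabled reviewer sees
// the same diff concurrently, and their findings are merged for the
// filter stage. Names and shapes are assumptions for illustration.
interface Finding {
  file: string;
  line: number;
  severity: "CRITICAL" | "WARNING" | "INFO";
  message: string;
}

type Reviewer = (diff: string) => Promise<Finding[]>;

async function runL1(diff: string, reviewers: Reviewer[]): Promise<Finding[]> {
  // Promise.all runs the specialist reviews in parallel.
  const perReviewer = await Promise.all(reviewers.map((r) => r(diff)));
  return perReviewer.flat();
}
```

The merged list then feeds the filter stage, which validates file/line references and deduplicates evidence before the debate begins.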
Real-time web UI for monitoring reviews, browsing sessions, and managing configuration.
agora dashboard # Start on http://localhost:6274
agora dashboard -p 8080   # Custom port

Features:
- 9 pages — Dashboard, Sessions, Models, Costs, Discussions, Config, Pipeline, Compare, Review Detail
- Live pipeline — WebSocket-powered real-time stage progression and discussion updates
- Model intelligence — Leaderboard, quality trends, selection frequency charts
- httpOnly cookie auth — Secure token exchange via POST /api/auth
- Server-side pagination — Filterable by status, search, date range
The dashboard token is printed on startup and persisted to .ca/dashboard-token.
Terminal UI for running reviews without leaving the terminal.
agora tui

8 screens: Review Setup, Pipeline Progress, Results, Diff Viewer, Debate, Config, Model Selector, Provider Status. Navigate with arrow keys, Enter to select, q to quit.
9-tool MCP server for AI IDE integration.
// claude_desktop_config.json or .cursor/mcp.json
{
  "mcpServers": {
    "codeagora": {
      "command": "npx",
      "args": ["-y", "@codeagora/mcp"]
    }
  }
}

Tools: review_diff, review_pr, review_staged, session_list, session_detail, explain_session, config_get, config_set, health_check.
agora notify 2026-03-27/001   # Send notification for a past session

Supported channels:
- Discord — Real-time thread updates + summary (webhook URL in config)
- Slack — Summary notification (webhook URL in config)
- Generic webhook — HMAC-SHA256 signed payloads over HTTPS
Configure in .ca/config.json under notifications.
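On the receiving side, a generic-webhook consumer should verify the HMAC-SHA256 signature before trusting a payload. Below is a minimal Node sketch; the signature header name and hex encoding are assumptions, so check your webhook configuration for the actual scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an HMAC-SHA256 signed webhook body against a shared secret.
// Assumes the sender transmits a hex-encoded digest of the raw body
// (e.g. in a header such as x-codeagora-signature -- an assumption).
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Constant-time comparison avoids leaking how many leading signature bytes matched, which is why timingSafeEqual is preferred over ===.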
All extensions are optional — install only what you need.
| Package | Install | What it does |
|---|---|---|
| @codeagora/web | npm i -g @codeagora/web | Web dashboard — 9-page SPA with real-time pipeline monitoring, session history, model leaderboard, cost tracking |
| @codeagora/tui | npm i -g @codeagora/tui | Interactive terminal UI — run reviews, browse sessions, edit config, watch debates in real-time |
| @codeagora/mcp | npm i -g @codeagora/mcp | MCP server (9 tools) — integrates with Claude Code, Cursor, and any MCP-compatible IDE |
| @codeagora/notifications | npm i -g @codeagora/notifications | Webhooks — Discord (real-time threads + summary), Slack (summary), generic (HMAC-SHA256 signed) |
Each extension works standalone or together. The core codeagora CLI includes everything needed for command-line reviews and GitHub Actions.
Add CodeAgora to any repo in 3 steps:
1. Create .ca/config.json (or run agora init):
{
  "mode": "pragmatic",
  "reviewers": [
    { "id": "r1", "model": "llama-3.3-70b-versatile", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 },
    { "id": "r2", "model": "qwen/qwen3-32b", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 },
    { "id": "r3", "model": "meta-llama/llama-4-scout-17b-16e-instruct", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 }
  ]
}

2. Add the workflow (.github/workflows/codeagora-review.yml):
name: CodeAgora Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write
  statuses: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: justn-hyeok/CodeAgora@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}

3. Add GROQ_API_KEY to your repo's Settings > Secrets > Actions.
Every PR gets inline review comments, a summary verdict, and a commit status check. Add the review:skip label to any PR to bypass the review.
| Doc | Content |
|---|---|
| CLI Reference | All commands and options |
| Configuration | Config file guide |
| Providers | Full provider list with tiers |
| Architecture | Pipeline design and project structure |
| Extensions | Web, TUI, MCP, Notifications |
| Troubleshooting | Common errors and fixes, exit codes |
| FAQ | Frequently asked questions |
pnpm install && pnpm build
pnpm test # 3386 tests
pnpm test:coverage # with coverage report
pnpm typecheck
pnpm cli review path/to/diff.patch

Golden-bug fixtures under benchmarks/golden-bugs/ drive the false-negative measurement framework (see #472).
Score pre-computed results (fast, no API calls):
pnpm bench:fn -- --validate-only # schema-check fixtures
pnpm bench:fn -- --results path/to/results-dir # score against pre-computed review output
pnpm bench:fn -- --results path/to/results-dir --json   # CI-friendly JSON report

Run the live pipeline against every fixture (produces the results dir above):
export OPENROUTER_API_KEY=...
pnpm bench:fn:run -- --results ./bench-out
pnpm bench:fn -- --results ./bench-out

The driver uses benchmarks/.ca/config.json — a lean 3-reviewer OpenRouter setup. A full run over the 4 seed fixtures costs roughly $0.04–$0.10 depending on discussion rounds. Add --fixtures id1,id2 to restrict, or --skip-head to skip the L3 verdict stage.
Two fixture kinds live side by side:
- Recall cases (expectedFindings non-empty) — review must surface each listed bug. Misses count as FN.
- FP regression cases (expectedFindings is []) — review must report nothing. Any finding is a regression.
Current seed fixtures: 3 recall cases (off-by-one, null-deref, SQL injection) + 1 FP regression (PR #490 moderator regex). See benchmarks/golden-bugs/README.md for fixture format.
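The scoring rule for the two fixture kinds can be sketched in a few lines of TypeScript. Everything except the expectedFindings field name is an illustrative assumption; in particular, matching expected bugs by substring is a simplification of whatever the real scorer does.

```typescript
// Illustrative scoring for one golden-bug fixture. Recall cases
// (expectedFindings non-empty) count unmatched expectations as false
// negatives; FP-regression cases (expectedFindings === []) count every
// reported finding as a regression. Shapes are assumptions.
interface Fixture {
  id: string;
  expectedFindings: string[];
}

interface Score {
  falseNegatives: string[];
  regressions: number;
}

function scoreFixture(fixture: Fixture, reported: string[]): Score {
  if (fixture.expectedFindings.length === 0) {
    // FP-regression case: any finding at all is a regression.
    return { falseNegatives: [], regressions: reported.length };
  }
  // Recall case: each expected bug must be surfaced by some finding.
  const misses = fixture.expectedFindings.filter(
    (expected) => !reported.some((r) => r.includes(expected))
  );
  return { falseNegatives: misses, regressions: 0 };
}
```

Keeping both kinds in one scorer is what lets a single benchmark run report recall and FP-regression status together, as in the results table below.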
Three live runs with the default 3-reviewer OpenRouter config (#24666562754, #24667305646, #24667897271):
| Metric | Mean | Min | Max |
|---|---|---|---|
| recall@3 | 100.0% | 100.0% | 100.0% |
| recall@5 | 100.0% | 100.0% | 100.0% |
| recall@10 | 100.0% | 100.0% | 100.0% |
| FPs per fp-regression fixture | 2.3 | 2 | 3 |
| fp-regression triggered | 3/3 runs | | |
Recall stable — all three recall cases (off-by-one, null-deref, SQL injection) caught in top-3 on every run.
FP regression triggered on every run — but the content of the phantom findings shifts between runs: CRITICAL×3 about unhandled JSON.parse on run 1, WARNING×2 about regex DoS + input size on run 2, WARNING + CRITICAL about unbounded string + missing type import on run 3. Each individual claim is a plausible-sounding, code-level assertion that the review would make against a real diff, which is exactly why the current calibration stack does not filter them. This confirms the "high-confidence corroborated FP" blind spot documented in project_calibration_stack.md. This fixture is the regression gate for future calibration work (see #468).
MIT