docs: update README + website copy for multi-source benchmarks

arimxyer · claude · arimxyer · commit 5f469b52b24d · 2026-06-11T22:13:06.000-04:00
README and the landing page still advertised "~400 entries from
Artificial Analysis". They now credit all 4 sources (Artificial
Analysis, Epoch AI, Arena, LLM Stats), and the website's
BENCHMARK_COUNT is derived from the four v2 data files (sum of model
rows, ~1,062) instead of the frozen legacy benchmarks.json lane.
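
A minimal standalone sketch of the new count, assuming (as the diff
does) that each v2 file has a top-level `models` array. The function
name `countBenchmarkRows` and the injected `readFile` parameter are
hypothetical stand-ins for the site's `readRepoFile` helper, so the
logic can run against in-memory fixtures:

```typescript
// Hypothetical standalone version of the BENCHMARK_COUNT reduction in
// website/src/data/site.ts. `readFile` stands in for readRepoFile.
type SourceId = "aa" | "epoch" | "arena" | "llmstats";

const BENCHMARK_SOURCE_IDS: readonly SourceId[] = ["aa", "epoch", "arena", "llmstats"];

interface SourceFile {
  // Assumed shape: each v2 file carries a top-level `models` array.
  models?: unknown[];
}

function countBenchmarkRows(readFile: (path: string) => string): number {
  return BENCHMARK_SOURCE_IDS.reduce((total, id) => {
    const sourceFile = JSON.parse(readFile(`data/v2/${id}.json`)) as SourceFile;
    // A file without a `models` array contributes zero rows instead of throwing.
    return total + (sourceFile.models?.length ?? 0);
  }, 0);
}

// Usage with fixtures (row counts are illustrative, not the real data).
const fixtures: Record<string, string> = {
  "data/v2/aa.json": JSON.stringify({ models: [{}, {}, {}] }),
  "data/v2/epoch.json": JSON.stringify({ models: [{}, {}] }),
  "data/v2/arena.json": JSON.stringify({ models: [{}] }),
  "data/v2/llmstats.json": JSON.stringify({}),
};

const total = countBenchmarkRows((path) => {
  const text = fixtures[path];
  if (text === undefined) throw new Error(`missing fixture: ${path}`);
  return text;
});
console.log(total); // 6
```

Because the count sums rows, a model listed by several sources is
counted once per source, which is why the figure reads as "entries"
rather than unique models.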

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -14,7 +14,7 @@ TUI and CLI for browsing AI models, benchmarks, coding agents, and provider stat
 ## Highlights
 
 - **~4,000+ models** across 85+ providers from [models.dev](https://models.dev) — filter by capability, price, context, and provider category
-- **~400 benchmark entries** from [Artificial Analysis](https://artificialanalysis.ai) — compare models head-to-head with scatter plots and radar charts
+- **~1,000 benchmark entries** across 4 data sources — [Artificial Analysis](https://artificialanalysis.ai), [Epoch AI](https://epoch.ai), [Arena](https://arena.ai), and [LLM Stats](https://llm-stats.com) — compare models head-to-head with scatter plots and radar charts
 - **11+ coding agents** tracked with version detection, changelogs, and GitHub integration
 - **22 provider statuses** monitored live across 7 status page platforms
 
@@ -86,7 +86,7 @@ Curated catalog of 12+ agents with automatic version detection, GitHub release t
 
 ![Benchmarks tab](public/assets/benchmark-screenshot.png)
 
-~400 entries with quality indexes, speed, and pricing. Compare mode with head-to-head tables, scatter plots, and radar charts. Filter by creator, region, type, reasoning, and open/closed source.
+~1,000 entries across 4 switchable data sources (Artificial Analysis, Epoch AI, Arena, LLM Stats) with quality indexes, Elo ratings, speed, and pricing. Compare mode with head-to-head tables, scatter plots, and radar charts. Choose visible metric columns, cycle a field-average/peer-average/rank comparator in the detail panel, and refresh any source in-app. Filter by creator, region, type, reasoning, and open/closed source.
 
 [Benchmarks wiki page](https://github.com/reyamira/models/wiki/Benchmarks) &#8226; CLI: `models benchmarks list`, `models benchmarks show`
 
@@ -118,7 +118,7 @@ Full documentation lives in the [wiki](https://github.com/reyamira/models/wiki):
 ## Data Sources
 
 - **Models**: [models.dev](https://models.dev) by [SST](https://github.com/sst/models.dev)
-- **Benchmarks**: [Artificial Analysis](https://artificialanalysis.ai)
+- **Benchmarks**: [Artificial Analysis](https://artificialanalysis.ai), [Epoch AI](https://epoch.ai) (CC-BY), [Arena](https://arena.ai), [LLM Stats](https://llm-stats.com)
 - **Agents**: Curated catalog in [`data/agents.json`](data/agents.json) — contributions welcome!
 - **Status**: Official provider status pages ([Statuspage](https://www.atlassian.com/software/statuspage), [BetterStack](https://betterstack.com), [Instatus](https://instatus.com), [incident.io](https://incident.io), and more)
 
diff --git a/website/src/components/Features.astro b/website/src/components/Features.astro
@@ -85,7 +85,8 @@ const tabTriggerClass =
             >03. BENCHMARKS</span
           >
           <p class="font-mono text-xs text-slate-400">
-            H2H tables, scatter plots, radar charts. ~{DISPLAY.benchmarks} entries.
+            H2H tables, scatter plots, radar charts. {DISPLAY.benchmarks} entries
+            from 4 sources.
           </p>
           <div
             class="tab-progress absolute bottom-0 left-0 h-0.5 w-0 bg-(--neon-cyan)"
@@ -194,7 +195,8 @@ const tabTriggerClass =
             class="mb-4 max-w-lg text-base leading-tight font-light text-white md:text-xl"
           >
             Head-to-head comparison with scatter plots, radar charts, and
-            quality indexes from Artificial Analysis.
+            quality indexes from four switchable sources — Artificial Analysis,
+            Epoch AI, Arena, and LLM Stats.
           </p>
           <ul class="space-y-1 font-mono text-xs text-(--neon-cyan) uppercase">
             <li>+ Compare models side by side</li>
diff --git a/website/src/data/site.ts b/website/src/data/site.ts
@@ -30,10 +30,15 @@ export const CRATES_URL = `https://crates.io/crates/${CRATE_NAME}`;
 
 // --- Local data files ---
 
-const benchmarksJson = JSON.parse(readRepoFile("data/benchmarks.json"));
-export const BENCHMARK_COUNT = Array.isArray(benchmarksJson)
-  ? benchmarksJson.length
-  : Object.keys(benchmarksJson).length;
+// Sum of model rows across the four v2 benchmark sources (aa, epoch, arena,
+// llmstats). The legacy data/benchmarks.json is the frozen single-source lane
+// for released binaries — not representative of the tab anymore.
+const BENCHMARK_SOURCE_IDS = ["aa", "epoch", "arena", "llmstats"] as const;
+export const BENCHMARK_SOURCE_COUNT = BENCHMARK_SOURCE_IDS.length;
+export const BENCHMARK_COUNT = BENCHMARK_SOURCE_IDS.reduce((total, id) => {
+  const sourceFile = JSON.parse(readRepoFile(`data/v2/${id}.json`));
+  return total + (sourceFile.models?.length ?? 0);
+}, 0);
 
 const agentsJson = JSON.parse(readRepoFile("data/agents.json"));
 const agents = agentsJson.agents ?? {};