
Commit 5f469b5

arimxyer and claude committed
docs: update README + website copy for multi-source benchmarks
README and the landing page still advertised "~400 entries from Artificial Analysis". Both now credit 4 sources (AA, Epoch AI, Arena, LLM Stats), and the website's BENCHMARK_COUNT is derived from the four v2 data files (sum of model rows, ~1,062) instead of the frozen legacy benchmarks.json lane.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent c3705f4 commit 5f469b5

3 files changed

Lines changed: 16 additions & 9 deletions


README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -14,7 +14,7 @@ TUI and CLI for browsing AI models, benchmarks, coding agents, and provider stat
 ## Highlights
 
 - **~4,000+ models** across 85+ providers from [models.dev](https://models.dev) — filter by capability, price, context, and provider category
-- **~400 benchmark entries** from [Artificial Analysis](https://artificialanalysis.ai) — compare models head-to-head with scatter plots and radar charts
+- **~1,000 benchmark entries** across 4 data sources — [Artificial Analysis](https://artificialanalysis.ai), [Epoch AI](https://epoch.ai), [Arena](https://arena.ai), and [LLM Stats](https://llm-stats.com) — compare models head-to-head with scatter plots and radar charts
 - **11+ coding agents** tracked with version detection, changelogs, and GitHub integration
 - **22 provider statuses** monitored live across 7 status page platforms
 
@@ -86,7 +86,7 @@ Curated catalog of 12+ agents with automatic version detection, GitHub release t
 
 ![Benchmarks tab](public/assets/benchmark-screenshot.png)
 
-~400 entries with quality indexes, speed, and pricing. Compare mode with head-to-head tables, scatter plots, and radar charts. Filter by creator, region, type, reasoning, and open/closed source.
+~1,000 entries across 4 switchable data sources (Artificial Analysis, Epoch AI, Arena, LLM Stats) with quality indexes, Elo ratings, speed, and pricing. Compare mode with head-to-head tables, scatter plots, and radar charts. Choose visible metric columns, cycle a field-average/peer-average/rank comparator in the detail panel, and refresh any source in-app. Filter by creator, region, type, reasoning, and open/closed source.
 
 [Benchmarks wiki page](https://github.com/reyamira/models/wiki/Benchmarks) &#8226; CLI: `models benchmarks list`, `models benchmarks show`
 
@@ -118,7 +118,7 @@ Full documentation lives in the [wiki](https://github.com/reyamira/models/wiki):
 ## Data Sources
 
 - **Models**: [models.dev](https://models.dev) by [SST](https://github.com/sst/models.dev)
-- **Benchmarks**: [Artificial Analysis](https://artificialanalysis.ai)
+- **Benchmarks**: [Artificial Analysis](https://artificialanalysis.ai), [Epoch AI](https://epoch.ai) (CC-BY), [Arena](https://arena.ai), [LLM Stats](https://llm-stats.com)
 - **Agents**: Curated catalog in [`data/agents.json`](data/agents.json) — contributions welcome!
 - **Status**: Official provider status pages ([Statuspage](https://www.atlassian.com/software/statuspage), [BetterStack](https://betterstack.com), [Instatus](https://instatus.com), [incident.io](https://incident.io), and more)
 
```
website/src/components/Features.astro

Lines changed: 4 additions & 2 deletions
```diff
@@ -85,7 +85,8 @@ const tabTriggerClass =
 >03. BENCHMARKS</span
 >
 <p class="font-mono text-xs text-slate-400">
-H2H tables, scatter plots, radar charts. ~{DISPLAY.benchmarks} entries.
+H2H tables, scatter plots, radar charts. {DISPLAY.benchmarks} entries
+from 4 sources.
 </p>
 <div
 class="tab-progress absolute bottom-0 left-0 h-0.5 w-0 bg-(--neon-cyan)"
@@ -194,7 +195,8 @@ const tabTriggerClass =
 class="mb-4 max-w-lg text-base leading-tight font-light text-white md:text-xl"
 >
 Head-to-head comparison with scatter plots, radar charts, and
-quality indexes from Artificial Analysis.
+quality indexes from four switchable sources — Artificial Analysis,
+Epoch AI, Arena, and LLM Stats.
 </p>
 <ul class="space-y-1 font-mono text-xs text-(--neon-cyan) uppercase">
 <li>+ Compare models side by side</li>
```
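The landing-page copy in this file spells out "4 sources" and the four source names by hand, while `site.ts` in this commit also exports `BENCHMARK_SOURCE_COUNT`. One way to keep such copy in sync with the source list is to derive the phrase from the source IDs. A minimal sketch, assuming a hypothetical ID-to-label map (`BENCHMARK_SOURCE_LABELS`) and a hypothetical `joinWithAnd` helper, neither of which exists in the repository:

```typescript
// Hypothetical display names keyed by the v2 source IDs used in site.ts.
const BENCHMARK_SOURCE_LABELS = {
  aa: "Artificial Analysis",
  epoch: "Epoch AI",
  arena: "Arena",
  llmstats: "LLM Stats",
} as const;

type SourceId = keyof typeof BENCHMARK_SOURCE_LABELS;

// Same order as BENCHMARK_SOURCE_IDS in site.ts.
const SOURCE_IDS: readonly SourceId[] = ["aa", "epoch", "arena", "llmstats"];

// Hypothetical helper: join names as an English list with a serial comma,
// e.g. "A and B" for two items, "A, B, and C" for three or more.
function joinWithAnd(items: readonly string[]): string {
  if (items.length <= 1) return items.join("");
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

const sourceNames = joinWithAnd(SOURCE_IDS.map((id) => BENCHMARK_SOURCE_LABELS[id]));
const sourceCount = SOURCE_IDS.length;

console.log(`${sourceCount} sources: ${sourceNames}`);
// prints "4 sources: Artificial Analysis, Epoch AI, Arena, and LLM Stats"
```

`Intl.ListFormat` could replace the hand-written join in runtimes that support it; the manual version keeps the sketch free of locale data.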

website/src/data/site.ts

Lines changed: 9 additions & 4 deletions
```diff
@@ -30,10 +30,15 @@ export const CRATES_URL = `https://crates.io/crates/${CRATE_NAME}`;
 
 // --- Local data files ---
 
-const benchmarksJson = JSON.parse(readRepoFile("data/benchmarks.json"));
-export const BENCHMARK_COUNT = Array.isArray(benchmarksJson)
-  ? benchmarksJson.length
-  : Object.keys(benchmarksJson).length;
+// Sum of model rows across the four v2 benchmark sources (aa, epoch, arena,
+// llmstats). The legacy data/benchmarks.json is the frozen single-source lane
+// for released binaries — not representative of the tab anymore.
+const BENCHMARK_SOURCE_IDS = ["aa", "epoch", "arena", "llmstats"] as const;
+export const BENCHMARK_SOURCE_COUNT = BENCHMARK_SOURCE_IDS.length;
+export const BENCHMARK_COUNT = BENCHMARK_SOURCE_IDS.reduce((total, id) => {
+  const sourceFile = JSON.parse(readRepoFile(`data/v2/${id}.json`));
+  return total + (sourceFile.models?.length ?? 0);
+}, 0);
 
 const agentsJson = JSON.parse(readRepoFile("data/agents.json"));
 const agents = agentsJson.agents ?? {};
```
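The new `BENCHMARK_COUNT` is a plain reduction: for each source ID, parse `data/v2/<id>.json` and add the length of its `models` array, counting a missing array as zero. A minimal sketch of that reduction follows, assuming each file parses to an object with an optional `models` array. It takes already-parsed objects instead of calling `readRepoFile`, so it runs without the repository; `BenchmarkSourceFile` and `countModelRows` are hypothetical names, not part of `site.ts`.

```typescript
// Hypothetical shape of a parsed data/v2/<id>.json file. Only the optional
// `models` array is assumed; any other fields are ignored for counting.
interface BenchmarkSourceFile {
  models?: unknown[];
}

const BENCHMARK_SOURCE_IDS = ["aa", "epoch", "arena", "llmstats"] as const;
type BenchmarkSourceId = (typeof BENCHMARK_SOURCE_IDS)[number];

// Hypothetical helper: sums model rows across all four sources, treating a
// missing `models` array as zero rows (same `?.length ?? 0` fallback as site.ts).
function countModelRows(
  files: Record<BenchmarkSourceId, BenchmarkSourceFile>,
): number {
  return BENCHMARK_SOURCE_IDS.reduce(
    (total, id) => total + (files[id].models?.length ?? 0),
    0,
  );
}

// Illustrative in-memory data, not real benchmark rows.
const sample: Record<BenchmarkSourceId, BenchmarkSourceFile> = {
  aa: { models: [{}, {}, {}] },
  epoch: { models: [{}, {}] },
  arena: { models: [] },
  llmstats: {},
};

console.log(countModelRows(sample)); // 5
```

Keeping the ID list as a `const` tuple and typing the input as a `Record` over those IDs means that, in this sketch, adding a fifth ID without a matching file object fails at compile time rather than silently contributing zero rows.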
