bench: v0.23.0 validation — 5/6 PASS, 6/6 completed#91

Merged: askalf merged 1 commit into master from chore/validation-scoreboard, Jun 12, 2026
askalf (Owner) commented Jun 12, 2026

Same six questions that scored 1/6 under last night's DDG block and
4/6 yesterday morning. With the keyword ladder (#89) and dario 4.8.66
upstream, every run completes, with citation support of 0.68-1.00 and
cost of $0.04-$0.16/query. The one FAIL (recent) completed with a
fully cited answer but kept 1 source against the 3-source gate:
--since=180d thins the pool by design, so this is a gate calibration
question, not a product gap.
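
To make the gate calibration question concrete, here is a minimal sketch of how a source-count gate could turn a completed run into a FAIL. The names `RunResult`, `passes_gate`, and `MIN_SOURCES` are hypothetical and are not taken from the benchmark code. Only the 3-source threshold and the 1 source kept by the "recent" run come from the description above, and the real gate may check more than this.

```python
from dataclasses import dataclass

# Assumption: the gate requires at least 3 kept sources, per the
# "3-source gate" in the PR description. The constant name is hypothetical.
MIN_SOURCES = 3


@dataclass
class RunResult:
    """Hypothetical record of one benchmark question's outcome."""
    name: str
    completed: bool
    sources_kept: int


def passes_gate(run: RunResult, min_sources: int = MIN_SOURCES) -> bool:
    """A run passes only if it completed and kept enough sources."""
    return run.completed and run.sources_kept >= min_sources


# The "recent" question: it completed but kept 1 source.
recent = RunResult(name="recent", completed=True, sources_kept=1)

print(passes_gate(recent))                 # fails the 3-source gate
print(passes_gate(recent, min_sources=1))  # passes a relaxed gate
```

Under this reading, the run fails only because of where the threshold sits, which is why the result looks like a calibration question for date-restricted queries rather than a missing answer.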

@askalf askalf merged commit 19a7fb1 into master Jun 12, 2026
5 checks passed
@askalf askalf deleted the chore/validation-scoreboard branch June 12, 2026 10:40