Add scoreboard freshness check that opens an issue on stale data #123
Open
strimo378 wants to merge 3 commits into
Adds a daily workflow that fetches the published scoreboard page (index.html), reads each backend's last update date from the embedded data and opens a tracking issue when any backend is older than 3 days.

To avoid issue spam a single labelled issue is reused: it is created on the first stale detection, refreshed silently while it stays stale, and closed automatically once all backends are up to date again.

- scripts/check_scoreboard_freshness.py: parser + freshness evaluation, writes results to GITHUB_OUTPUT (configurable --max-age-days, default 3)
- .github/workflows/scoreboard-freshness.yml: daily schedule + manual dispatch, manages the tracking issue via actions/github-script
- .gitignore: whitelist the new script (scripts/* is ignored by default)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EeeD61kc68kKLbHuSmahUc
Signed-off-by: Timo Stripf <timo.stripf@emmtrix.com>
Addresses zizmor code-scanning findings: actions must be pinned to a full commit hash per the repo's blanket policy.

- actions/checkout -> de0fac2 (v6.0.2, matches existing workflows)
- actions/setup-python -> a309ff8 (v6.2.0)
- actions/github-script -> f28e40c (v7.1.0)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Timo Stripf <timo.stripf@emmtrix.com>
cbf0bcc to 4abd383
Pull request overview
This PR adds a lightweight monitoring layer that checks the published backend scoreboard for freshness and automatically manages a single tracking issue when data becomes stale, improving visibility without changing the benchmark pipeline itself.
Changes:
- Add `scripts/check_scoreboard_freshness.py` to download the published `index.html`, parse embedded scoreboard JSON, and compute per-backend staleness with CI-friendly outputs.
- Add a scheduled GitHub Actions workflow to run the check daily and open/update/close a tracking issue.
- Update `.gitignore` to ensure the new script is committed (despite `scripts/*` being ignored by default).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/check_scoreboard_freshness.py | New Python script that fetches/parses the published scoreboard and emits staleness outputs + issue-friendly Markdown. |
| .github/workflows/scoreboard-freshness.yml | New scheduled workflow that runs the script and manages the tracking issue. |
| .gitignore | Un-ignores the new monitoring script so it’s included in the repo. |
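The "staleness outputs" in the table above are step outputs written to the file that GitHub Actions exposes through the `GITHUB_OUTPUT` environment variable. A minimal sketch of that mechanism follows; it is not the script's actual code. The function name `write_outputs` and the output keys `stale` and `issue_body` are assumptions made for illustration, while the `name=value` and `name<<DELIMITER` line formats are the ones GitHub Actions documents for this file.

```python
import uuid


def write_outputs(path: str, stale: bool, issue_body: str) -> None:
    """Append step outputs to the file GitHub Actions exposes as GITHUB_OUTPUT.

    The keys `stale` and `issue_body` are hypothetical names for this sketch.
    """
    # A random delimiter avoids clashes with text inside the multi-line body.
    delimiter = f"EOF_{uuid.uuid4().hex}"
    with open(path, "a", encoding="utf-8") as fh:
        # Single-line output: name=value
        fh.write(f"stale={'true' if stale else 'false'}\n")
        # Multi-line output: name<<DELIMITER, the value, then DELIMITER alone
        fh.write(f"issue_body<<{delimiter}\n{issue_body}\n{delimiter}\n")
```

In a workflow step the path would come from `os.environ["GITHUB_OUTPUT"]`; a later step could then read the values as `steps.<id>.outputs.stale`, assuming the step has an `id`.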
Comment on lines +48 to +60

    const label = 'scoreboard-stale';
    const title = '⚠️ Scoreboard data is out of date';
    const stale = process.env.STALE === 'true';
    const footer = `\n\n---\n_Automated by [Scoreboard freshness check](${process.env.RUN_URL})._`;
    const body = (process.env.ISSUE_BODY || '').trim() + footer;

    const existing = await github.paginate(github.rest.issues.listForRepo, {
      owner: context.repo.owner,
      repo: context.repo.repo,
      state: 'open',
      labels: label,
      per_page: 100,
    });
    const issue = existing.find(i => i.title === title) || existing[0];
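The single-issue lifecycle this PR describes (created on the first stale detection, refreshed while stale, closed on recovery) amounts to a small decision table. The sketch below expresses it in Python purely for illustration; the workflow itself implements the logic in JavaScript via actions/github-script, and the function name `decide_action` and the returned action strings are hypothetical.

```python
from typing import Optional


def decide_action(stale: bool, open_issue: Optional[int]) -> str:
    """Return what to do with the single tracking issue.

    `open_issue` is the number of the already-open labelled issue, or None
    when no such issue exists.
    """
    if stale and open_issue is None:
        return "create"  # first stale detection
    if stale:
        return "update"  # still stale: refresh the body, open nothing new
    if open_issue is not None:
        return "close"   # recovered: all backends are up to date again
    return "noop"        # fresh and nothing open
```

Keeping the decision separate from the API calls makes the four cases easy to check without touching GitHub.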
Comment on lines +50 to +54

    """Extract and parse the embedded scoreboard database JSON."""
    match = DATABASE_RE.search(html)
    if not match:
        raise ValueError("Could not find the 'database' payload in the page")
    return json.loads(match.group(1))
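For comparison, here is a sketch of a different extraction technique, not the one used in the PR: instead of capturing the whole JSON value with a regular expression, it only locates the assignment and lets `json.JSONDecoder.raw_decode` read one JSON value from that position. The pattern `ASSIGNMENT_RE`, the function name `extract_database`, and the assumption that the page assigns the payload as `database = {...}` are all hypothetical; the definition of `DATABASE_RE` is not shown in this excerpt.

```python
import json
import re

# Assumption: the page assigns the payload as `database = {...}` in a script.
ASSIGNMENT_RE = re.compile(r"\bdatabase\s*=\s*")


def extract_database(html: str) -> dict:
    """Locate the `database = ` assignment and decode the JSON value after it."""
    match = ASSIGNMENT_RE.search(html)
    if not match:
        raise ValueError("Could not find the 'database' payload in the page")
    # raw_decode parses exactly one JSON value starting at the given index and
    # ignores whatever follows it, so nested braces or a trailing `;` do not
    # have to be matched by the regular expression.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value
```

The trade-off is that the regular expression stays trivial, at the cost of depending on the payload being strict JSON rather than a JavaScript object literal.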

Problem
The published scoreboard at https://onnx.ai/backend-scoreboard/index.html shows stale "Latest Update" dates. Root cause: the daily `Run benchmarks` workflow has been failing for weeks (the two `onnx-tf` Docker builds break because `tabulate>=0.10.0` is not installable on the Python 3.8 image). Because the deploy job `collect_and_deploy` depends via `needs:` on all backend jobs (without `if: always()`), it is skipped on every failure → no new results land on the `benchmark-results` branch → the website re-renders the same old state every day. On top of that, nobody gets notified, since GitHub only emails cron-workflow failures to a single "actor" account.

What this PR does (pure visibility layer, does not touch the benchmark pipeline)
Adds standalone monitoring that checks the actual outcome — the published page — instead of internal CI state.
Files (3):
- `scripts/check_scoreboard_freshness.py`: downloads `index.html`, reads the last shown date per backend from the embedded `database` JSON (`trend[-1].date`, format `MM/DD/YYYY HH:MM:SS`) and flags any backend older than 3 days as stale. Threshold configurable via `--max-age-days`. Results go to `GITHUB_OUTPUT`.
- `.github/workflows/scoreboard-freshness.yml`: runs daily (09:00 UTC) + manual dispatch. Runs the script and, via `actions/github-script`, manages exactly one reusable issue (label `scoreboard-stale`): created on first staleness, silently refreshed while stale, auto-closed on recovery → no issue spam.
- `.gitignore`: un-ignores the new script (`scripts/*` is otherwise ignored).

Status: Tested locally (stale + fresh + error paths).
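The freshness rule described for the script can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function name `find_stale` is hypothetical, the `database` payload is assumed to map each backend name to an object with a `trend` list whose entries carry a `date` string, and the timestamps are assumed to be naive values in the same timezone as `now`.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y %H:%M:%S"  # MM/DD/YYYY HH:MM:SS


def find_stale(database: dict, now: datetime, max_age_days: int = 3) -> dict:
    """Map each stale backend name to the age of its last entry in whole days."""
    stale = {}
    for backend, entry in database.items():
        # The last element of `trend` holds the most recently shown date.
        last = datetime.strptime(entry["trend"][-1]["date"], DATE_FORMAT)
        age = now - last
        # Strictly older than the threshold counts as stale.
        if age > timedelta(days=max_age_days):
            stale[backend] = age.days
    return stale
```

Passing `now` in as a parameter rather than calling `datetime.now()` inside the function keeps the stale and fresh paths easy to exercise with fixed dates.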
Note
This is intentionally only the reporting layer. The underlying root cause (`onnx-tf`/`tabulate` build + missing `if: always()` on the deploy job) is not fixed here; that would be a separate, optional follow-up.