Add scoreboard freshness check that opens an issue on stale data #123

Open
strimo378 wants to merge 3 commits into main from scoreboard-freshness-check

@strimo378

Collaborator

Problem

The published scoreboard at https://onnx.ai/backend-scoreboard/index.html shows stale "Latest Update" dates. Root cause: the daily Run benchmarks workflow has been failing for weeks, because the two onnx-tf Docker builds break (tabulate>=0.10.0 is not installable on the Python 3.8 image). The deploy job collect_and_deploy depends via needs: on all backend jobs without if: always(), so it is skipped on every failure. As a result, no new results land on the benchmark-results branch, and the website re-renders the same old state every day. On top of that, nobody gets notified, since GitHub emails cron-workflow failures only to a single "actor" account.

What this PR does (pure visibility layer — does not touch the benchmark pipeline)

Adds standalone monitoring that checks the actual outcome — the published page — instead of internal CI state.

Files (3):

  1. scripts/check_scoreboard_freshness.py — downloads index.html, reads the last shown date per backend from the embedded database JSON (trend[-1].date, format MM/DD/YYYY HH:MM:SS) and flags any backend older than 3 days as stale. Threshold configurable via --max-age-days. Results go to GITHUB_OUTPUT.
  2. .github/workflows/scoreboard-freshness.yml — runs daily (09:00 UTC) + manual dispatch. Runs the script and, via actions/github-script, manages exactly one reusable issue (label scoreboard-stale): created on first staleness, silently refreshed while stale, auto-closed on recovery → no issue spam.
  3. .gitignore — un-ignores the new script (scripts/* is otherwise ignored).
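
The freshness rule described in the file list can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: it assumes the embedded database is a mapping from backend name to an object with a `trend` list, and that the timestamps are in UTC. The function names `last_update` and `find_stale` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# MM/DD/YYYY HH:MM:SS, the date format named in the PR description.
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def last_update(backend: dict) -> datetime:
    """Return the timestamp of the newest trend entry (assumed to be UTC)."""
    raw = backend["trend"][-1]["date"]
    return datetime.strptime(raw, DATE_FORMAT).replace(tzinfo=timezone.utc)


def find_stale(database: dict, now: datetime, max_age_days: int = 3) -> dict:
    """Map each stale backend name to its age in whole days."""
    limit = timedelta(days=max_age_days)
    stale = {}
    for name, backend in database.items():
        age = now - last_update(backend)
        if age > limit:
            stale[name] = age.days
    return stale


# Made-up sample data, shaped according to the assumptions above.
sample = {
    "onnxruntime": {"trend": [{"date": "06/16/2026 02:00:00"}]},
    "onnx-tf": {"trend": [{"date": "05/01/2026 02:00:00"}]},
}
now = datetime(2026, 6, 17, 9, 0, 0, tzinfo=timezone.utc)
print(find_stale(sample, now))  # {'onnx-tf': 47}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the stale and fresh paths easy to test locally.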

Status: Tested locally (stale + fresh + error paths).

Note

This is intentionally only the reporting layer. The underlying root cause (onnx-tf/tabulate build + missing if: always() on the deploy job) is not fixed here — that would be a separate, optional follow-up.

Comment thread .github/workflows/scoreboard-freshness.yml Fixed
Comment thread .github/workflows/scoreboard-freshness.yml Fixed
Comment thread .github/workflows/scoreboard-freshness.yml Fixed
strimo378 and others added 2 commits June 17, 2026 13:37
Adds a daily workflow that fetches the published scoreboard page
(index.html), reads each backend's last update date from the embedded
data and opens a tracking issue when any backend is older than 3 days.

To avoid issue spam a single labelled issue is reused: it is created on
the first stale detection, refreshed silently while it stays stale, and
closed automatically once all backends are up to date again.

- scripts/check_scoreboard_freshness.py: parser + freshness evaluation,
  writes results to GITHUB_OUTPUT (configurable --max-age-days, default 3)
- .github/workflows/scoreboard-freshness.yml: daily schedule + manual
  dispatch, manages the tracking issue via actions/github-script
- .gitignore: whitelist the new script (scripts/* is ignored by default)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EeeD61kc68kKLbHuSmahUc
Signed-off-by: Timo Stripf <timo.stripf@emmtrix.com>
Addresses zizmor code-scanning findings: actions must be pinned to a
full commit hash per the repo's blanket policy.

- actions/checkout      -> de0fac2 (v6.0.2, matches existing workflows)
- actions/setup-python  -> a309ff8 (v6.2.0)
- actions/github-script -> f28e40c (v7.1.0)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Timo Stripf <timo.stripf@emmtrix.com>
@strimo378 strimo378 force-pushed the scoreboard-freshness-check branch from cbf0bcc to 4abd383 Compare June 17, 2026 11:37
@sonarqubecloud


Quality Gate failed

Failed conditions
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Copilot AI left a comment

Pull request overview

This PR adds a lightweight monitoring layer that checks the published backend scoreboard for freshness and automatically manages a single tracking issue when data becomes stale, improving visibility without changing the benchmark pipeline itself.

Changes:

  • Add scripts/check_scoreboard_freshness.py to download the published index.html, parse embedded scoreboard JSON, and compute per-backend staleness with CI-friendly outputs.
  • Add a scheduled GitHub Actions workflow to run the check daily and open/update/close a tracking issue.
  • Update .gitignore to ensure the new script is committed (despite scripts/* being ignored by default).
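
The open/update/close handling of the single tracking issue can be modeled as a small decision function. This is only a sketch of the behaviour described in this PR, written in Python for illustration; the actual workflow does this in JavaScript via actions/github-script, and the name `decide_action` and its return values are hypothetical.

```python
from typing import Optional


def decide_action(stale: bool, open_issue: Optional[int]) -> str:
    """Pick one action for the single tracking issue.

    open_issue is the number of the already-open labelled issue, or None
    if no such issue exists.
    """
    if stale and open_issue is None:
        return "create"   # first stale detection
    if stale:
        return "update"   # still stale: refresh the existing issue body
    if open_issue is not None:
        return "close"    # recovered: close the tracking issue
    return "noop"         # fresh and nothing open


for stale, issue in [(True, None), (True, 7), (False, 7), (False, None)]:
    print(stale, issue, decide_action(stale, issue))
```

Because at most one issue is ever created per stale period, repeated daily runs do not produce issue spam.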

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/check_scoreboard_freshness.py | New Python script that fetches/parses the published scoreboard and emits staleness outputs + issue-friendly Markdown. |
| .github/workflows/scoreboard-freshness.yml | New scheduled workflow that runs the script and manages the tracking issue. |
| .gitignore | Un-ignores the new monitoring script so it’s included in the repo. |
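
For readers unfamiliar with how a script hands results to later workflow steps, here is a minimal sketch of writing step outputs to the file named by the GITHUB_OUTPUT environment variable. It uses the documented `name=value` form for single-line values and the `name<<DELIMITER` form for multi-line values. The helper name `write_output` and the output names used below are assumptions, not taken from this PR.

```python
import uuid


def write_output(name: str, value: str, path: str) -> None:
    """Append one step output to the file at `path` in GitHub Actions format."""
    with open(path, "a", encoding="utf-8") as fh:
        if "\n" in value:
            # Multi-line values need a delimiter that cannot occur in the value.
            delimiter = f"ghadelimiter_{uuid.uuid4().hex}"
            fh.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")
        else:
            fh.write(f"{name}={value}\n")
```

In a workflow run this might be called as `write_output("stale", "true", os.environ["GITHUB_OUTPUT"])`, with the Markdown issue body written the same way as a multi-line value.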

Comment on lines +48 to +60
const label = 'scoreboard-stale';
const title = '⚠️ Scoreboard data is out of date';
const stale = process.env.STALE === 'true';
const footer = `\n\n---\n_Automated by [Scoreboard freshness check](${process.env.RUN_URL})._`;
const body = (process.env.ISSUE_BODY || '').trim() + footer;

const existing = await github.paginate(github.rest.issues.listForRepo, {
  owner: context.repo.owner,
  repo: context.repo.repo,
  state: 'open',
  labels: label,
  per_page: 100,
});
const issue = existing.find(i => i.title === title) || existing[0];
Comment on lines +50 to +54
    """Extract and parse the embedded scoreboard database JSON."""
    match = DATABASE_RE.search(html)
    if not match:
        raise ValueError("Could not find the 'database' payload in the page")
    return json.loads(match.group(1))
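
A self-contained version of this extraction step might look like the sketch below. The real DATABASE_RE is not shown in this thread, so the pattern here is a guess that assumes the page assigns the payload as `database = {...};` at the end of a line. Wrapping the JSON error in a ValueError with a clearer message is likewise a design choice of this sketch, not something the PR is known to do.

```python
import json
import re

# Hypothetical pattern: assumes "database = {...};" ending a line in the page.
DATABASE_RE = re.compile(
    r"database\s*=\s*(\{.*?\})\s*;\s*$", re.DOTALL | re.MULTILINE
)


def extract_database(html: str) -> dict:
    """Extract and parse the embedded scoreboard database JSON."""
    match = DATABASE_RE.search(html)
    if not match:
        raise ValueError("Could not find the 'database' payload in the page")
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        # Report a malformed payload with a message that names its source.
        raise ValueError(f"Embedded 'database' payload is not valid JSON: {exc}") from exc


# Made-up page fragment matching the assumed layout.
page = '<script>\nvar database = {"onnx-tf": {"trend": [{"date": "05/01/2026 02:00:00"}]}};\n</script>'
print(extract_database(page)["onnx-tf"]["trend"][-1]["date"])  # 05/01/2026 02:00:00
```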