ci: split Project Tests into one parallel job per exemplar (~45min → ~slowest-project) #18
Merged
Replace the single sequential 9-exemplar test-project job (sum ~45 min on ubuntu) with a project-axis matrix: each public exemplar runs in its own parallel ubuntu job at py3.10 (floor) + py3.12 (ceiling) = 18 jobs. Wall-clock becomes the slowest single project (~active_inference) instead of the sum, and a failure now names the exact project immediately. Each job runs 'scripts/01_run_tests.py --project <name> --project-only --include-slow' and enforces that project's OWN 90% floor (authoritative per CLAUDE.md), which also removes the old code_project/fep_lean conftest plugin collision (every project is isolated in its own job). macOS breadth stays in test-infra. Codecov upload (py3.12) merges per-project coverage by flag. Updated .github CI-structure docs (AGENTS/README + workflows/AGENTS/README) to match the new test-infra (ubuntu x3 + macOS 3.12) and test-project layouts. Validated locally: 'scripts/01_run_tests.py --project templates/template_newspaper --project-only --include-slow' = 48/48, 94.1% (>=90%).
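The arithmetic behind the matrix can be sketched as follows. This is a minimal illustration, not the workflow itself: the `exemplar_N` names are hypothetical placeholders (the commit names only some of the nine exemplars), and `expand_matrix` and `wall_clock` are helper names made up for this sketch.

```python
from itertools import product

# Hypothetical placeholder names; the real exemplar list lives in the workflow.
PROJECTS = [f"exemplar_{i}" for i in range(1, 10)]
PYTHONS = ["3.10", "3.12"]  # floor and ceiling, as stated in the commit


def expand_matrix(projects, pythons):
    """Return one job per (project, python) pair, the way a CI matrix expands."""
    return [{"project": p, "python": v} for p, v in product(projects, pythons)]


def wall_clock(durations_min, parallel):
    """Parallel jobs finish with the slowest one; sequential jobs add up."""
    return max(durations_min) if parallel else sum(durations_min)


jobs = expand_matrix(PROJECTS, PYTHONS)
print(len(jobs))  # 9 projects x 2 Python versions
```

With per-job durations in hand, `wall_clock(durations, parallel=True)` gives the slowest-project figure and `parallel=False` gives the old sequential sum.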
The job-split worked (every project PASSED its own 90% gate, e.g. newspaper 48/48 @ 94%), but the trailing 'uv run coverage xml -o coverage-project.xml' ran at repo root where there is no data — scripts/01_run_tests.py --project runs pytest with cwd=<project> and writes coverage into the project dir. CI's coverage exits 1 on 'No data to report', so set -e failed all 18 jobs despite green tests. The per-project --cov-fail-under=90 is enforced inside 01_run_tests.py; remove the redundant repo-root xml and point Codecov at the project's own coverage_project.json (best-effort, fail_ci_if_error: false).
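A best-effort lookup of the per-project report might look like the sketch below. It assumes the report sits directly in the project directory under the name coverage_project.json; the exact location inside the project is an assumption, and `find_project_coverage` is a hypothetical helper, not part of scripts/01_run_tests.py.

```python
from pathlib import Path
from typing import Optional


def find_project_coverage(
    repo_root: Path,
    project: str,
    filename: str = "coverage_project.json",
) -> Optional[Path]:
    """Return the project's coverage report path, or None if it is absent.

    Assumption: the report is written directly into the project directory,
    because the tests run with cwd=<project>.
    """
    candidate = repo_root / project / filename
    return candidate if candidate.is_file() else None


# Best-effort: a missing report is reported, never raised.
report = find_project_coverage(Path("."), "templates/template_newspaper")
print(report if report else "no coverage report found; skipping upload")
```

Returning None instead of raising mirrors the best-effort intent: a missing report should not turn a green test run red.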
Stacked on #17 (base will retarget to main once #17 merges).
What
Replace the single sequential 9-exemplar test-project job (sum ~45 min on ubuntu) with a project-axis matrix: each public exemplar runs in its own parallel ubuntu job at py3.10 (floor) + py3.12 (ceiling) = 18 jobs.

Why
- Each job runs 'scripts/01_run_tests.py --project <name> --project-only --include-slow', enforcing that project's own 90% floor (per CLAUDE.md), which is stronger than the old 75% combined-union gate.
- Removes the old template_code_project/fep_lean conftest plugin-name collision (every project is isolated in its own job).
- macOS breadth stays in test-infra; Codecov upload (py3.12) merges per-project coverage by flag.

Validation
- Locally: 'scripts/01_run_tests.py --project templates/template_newspaper --project-only --include-slow' → 48/48, 94.1% (≥90%).
- YAML valid; matrix expands to 18 jobs.
- .github CI-structure docs (AGENTS/README + workflows/AGENTS/README) updated to match the new test-infra (ubuntu×3 + macOS 3.12) and test-project layouts.

This PR is the structural follow-up flagged in #17; its own CI run validates the new matrix before merge.
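To see why a per-project 90% floor is stricter than a 75% combined gate, consider the sketch below. The project names and line counts are hypothetical, chosen only to show the effect, and the combined figure is approximated as total covered lines over total lines, which may differ from how the old gate computed its union.

```python
def percent(covered, total):
    """Coverage as a percentage of lines."""
    return 100.0 * covered / total


def combined_gate(results, floor=75.0):
    """Pass if coverage pooled across all projects meets the floor."""
    covered = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return percent(covered, total) >= floor


def per_project_failures(results, floor=90.0):
    """Return the names of projects whose own coverage is below the floor."""
    return sorted(
        name for name, (c, t) in results.items() if percent(c, t) < floor
    )


# Hypothetical (covered, total) line counts, not measured data.
results = {"project_a": (950, 1000), "project_b": (60, 100)}

print(combined_gate(results))         # the pooled figure hides project_b
print(per_project_failures(results))  # the per-project gate names it
```

A large, well-covered project can carry a small, poorly covered one past a pooled threshold; checking each project against its own floor removes that masking and names the offender directly.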