fix(mutation): measure per-file to correct Stryker misreporting #90
Stryker's vitest runner zeroes out src/main.ts (and under-counts other files) when all 8 mutated files run in a single pass. The suite's resetModules + doMock + dynamic-import style serves stale, un-mutated modules to the covering tests at scale, so main.ts scored 0% in the full run despite being ~86% covered in isolation. The headline dropped to ~42% and failed the 65% gate, even though the true score is ~71%.

coverageAnalysis "all" did not help and maxTestRunnerReuse 1 crashed esbuild, so this measures each mutated file in its own Stryker run (where attribution is accurate) and aggregates the per-file JSON reports, gating on the combined score.

- add scripts/run-batched-mutation.mjs (per-file run + aggregate + gate)
- test:mutation now runs the batched runner; test:mutation:single keeps the raw `stryker run` for ad-hoc use
- per-file reports are saved under reports/mutation/by-file for the CI artifact, plus an aggregate.json summary

Aggregate is now 70.80% (>= 65). Genuinely lower files remain as future work: download.ts 59%, purge.ts 57%, upload.ts 64%.
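As a rough illustration of the "aggregate" step, the sketch below pools mutant counts from several parsed per-file JSON reports and computes one combined score. It assumes the reports follow the mutation-testing report schema (a `files` map whose entries carry a `mutants` array with a `status` field) and uses the usual definition of the score as detected (killed plus timeout) over detected plus undetected (survived plus no coverage). The function names `countReport` and `aggregateScore` are hypothetical and are not taken from scripts/run-batched-mutation.mjs.

```javascript
// Statuses that count toward the mutation score (assumed from the report schema).
const DETECTED = new Set(['Killed', 'Timeout']);
const UNDETECTED = new Set(['Survived', 'NoCoverage']);

// Count detected and undetected mutants in one parsed JSON report.
function countReport(report) {
  let detected = 0;
  let undetected = 0;
  for (const file of Object.values(report.files ?? {})) {
    for (const mutant of file.mutants ?? []) {
      if (DETECTED.has(mutant.status)) detected += 1;
      else if (UNDETECTED.has(mutant.status)) undetected += 1;
      // Any other status (e.g. Ignored, CompileError) is left out of the score.
    }
  }
  return { detected, undetected };
}

// Pool the counts of every per-file report and derive one combined score.
function aggregateScore(reports) {
  let detected = 0;
  let undetected = 0;
  for (const report of reports) {
    const counts = countReport(report);
    detected += counts.detected;
    undetected += counts.undetected;
  }
  const valid = detected + undetected;
  // With no valid mutants there is nothing to score, so the result is NaN.
  const score = valid === 0 ? Number.NaN : (detected / valid) * 100;
  return { detected, undetected, valid, score };
}
```

Pooling raw counts weights each file by its number of mutants; averaging per-file percentages would give a different number, so whichever the real runner does should be checked against the script itself.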
|
Build artifact for this PR: |
Pull request overview
This PR adjusts the project’s mutation-testing workflow to work around a Stryker+Vitest measurement issue by running mutation testing per file and aggregating results, ensuring the mutation gate reflects accurate attribution for this codebase’s heavy module-mocking patterns.
Changes:
- Add a new batched mutation runner script that runs Stryker once per mutated file, aggregates JSON reports, and enforces the configured break threshold.
- Update `test:mutation` to use the batched runner while preserving a `test:mutation:single` escape hatch for ad-hoc raw Stryker runs.
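To make "runs Stryker once per mutated file" concrete, here is a minimal sketch that turns a list of mutate targets into one command per file. It assumes Stryker is invoked through `pnpm exec` with the CLI's `--mutate` option; the helper names `pnpmCommand` and `planRuns` are hypothetical, and the actual script may build its invocations differently.

```javascript
// Pick the pnpm executable name for the given platform string
// (pass process.platform in real use).
function pnpmCommand(platform) {
  return platform === 'win32' ? 'pnpm.cmd' : 'pnpm';
}

// Build one Stryker invocation per mutated file so that each run
// only has to attribute test results to a single file.
function planRuns(mutateFiles, platform) {
  return mutateFiles.map((file) => ({
    file,
    command: pnpmCommand(platform),
    args: ['exec', 'stryker', 'run', '--mutate', file],
  }));
}
```

Each planned entry could then be handed to something like `child_process.spawnSync(command, args, { stdio: 'inherit' })`, with the per-file JSON report collected after every run.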
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/run-batched-mutation.mjs | New per-file Stryker runner that aggregates per-file JSON results and gates on the combined score. |
| package.json | Switch test:mutation to the batched runner and add test:mutation:single. |
Targeted tests for the three files that were genuinely below threshold when measured per-file (true scores, not the earlier full-run artifact):

- purge.ts 56.8% -> 87.8%: bucket-wide purge + confirmation warning, dry-run preview, non-slash prefix normalization, failed-version warning.
- upload.ts 64.2% -> 66.1%: progress label + fileId/sha1 summary line.
- download.ts 59.1% -> 67.0%: prefix group/progress/wrote logging, destination resolution (existing dir / trailing slash), SSE-C round-trip, Windows reserved-name rejection, file/dir path collision, and the win32 rename-overwrite retry (fs + platform mocked).
Expand the Stryker mutate list to every src file (was 8 of 24) so mutation testing reaches the whole action surface, and lift the newly-covered laggards:

- head.ts 30% -> 90%, hide.ts 37.5% -> 100%, unhide.ts 57% -> 93%: assert group labels, info lines, and the requireSource error wording.
- progress.ts 57% -> 77%: pin percent/parts/total-suffix formatting, throttle skip vs first/final emit, and MB/s math (fake timers).
- fs.ts 67% -> 100%: drop a redundant explicit `return undefined` whose only surviving mutant was equivalent (implicit undefined return).

Add 'startgroup'/'endgroup' to the cspell allowlist; dist/ rebuilt.
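The fs.ts note about an equivalent mutant can be illustrated with a small, made-up example. The two functions below are hypothetical and are not the project's fs.ts; they only show why a mutant that removes a trailing `return undefined` cannot be killed by any test, since a JavaScript function that falls off its end also returns `undefined`.

```javascript
// Hypothetical lookup with an explicit trailing return.
function findExplicit(items, wanted) {
  for (const item of items) {
    if (item === wanted) return item;
  }
  return undefined; // deleting this statement does not change behavior
}

// Same lookup relying on the implicit undefined return.
function findImplicit(items, wanted) {
  for (const item of items) {
    if (item === wanted) return item;
  }
  // Falling off the end of the function yields undefined.
}
```

Because both versions are observably identical, removing the explicit return takes the unkillable mutant out of the denominator instead of leaving a permanent survivor in the score.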
The Stryker mutate list now spans every src file (was 8 of 24), so the gate reflects the whole action. With the laggard files lifted, the real aggregate is comfortably above 70, so raise thresholds.break 65 -> 70 (high 75 -> 80) to lock in the floor and prevent regressions.
Pull request overview
Copilot reviewed 19 out of 21 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
.github/workflows/full-lockfile-audit.yml:41
- This job requests `issues: write` at the job level even on `pull_request` runs. Since the PR build checks out and executes repo code (including `scripts/full-lockfile-audit.mjs`), granting write permissions on PR runs increases the blast radius if a PR ever manages to execute unintended code. Consider splitting into two jobs (PR: `contents: read` only; non-PR: `issues: write` + tracking-issue steps) so PR runs never receive `issues: write`.
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read # check out the repository and read the committed lockfile
      issues: write # open or update default-branch tracking issues
Summary
This PR fixes the mutation gate by running Stryker once per configured mutation target, then aggregating the per-file JSON reports into a single score. The original 41.98% result was a Stryker Vitest runner measurement artifact caused by full-suite module isolation behavior, not the actual aggregate mutation score.
Implemented Fixes
- `scripts/run-batched-mutation.mjs` and the tested runner library that powers `pnpm test:mutation`.
- `pnpm test:mutation:single` as the raw `stryker run` command for local diagnosis.
- `thresholds.break: null` by skipping the aggregate break gate when Stryker disables it.
- `reports/mutation/by-file/` before each run and writes `reports/mutation/aggregate.json` for artifacts.
- `pnpm.cmd` on Windows and realpath-based entrypoint detection for robust CLI execution.
- `dist/` output.
- `startgroup` cspell allowlist entry.
- `DEVELOPMENT.md` and `.github/workflows/README.md` for the batched runner contract, report layout, scope, scripts, and threshold.

Scope and Baseline
The mutation scope expansion is intentional.
`stryker.conf.json` now lists 24 action-owned `src` targets, covering support modules in `src/*.ts` and command implementations in `src/commands/*.ts`.

- `src/commands/*.ts` targets
- `src/inputs.ts`
- `src/main.ts`
- `src/sse.ts`

The aggregate break threshold remains 65%.
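The gate behavior described in this summary, including the `thresholds.break: null` case, could look roughly like the sketch below. The function name `evaluateGate` and the shape of its return value are assumptions for illustration only; the real runner's logic lives in its own library.

```javascript
// Decide whether the aggregate score passes the configured break threshold.
// A null (or missing) threshold means the gate is disabled and never fails.
function evaluateGate(score, breakThreshold) {
  if (breakThreshold === null || breakThreshold === undefined) {
    return { enforced: false, passed: true };
  }
  // In this sketch a NaN score (no valid mutants) fails an enforced gate,
  // because NaN >= threshold is false.
  return { enforced: true, passed: score >= breakThreshold };
}
```

A caller might set `process.exitCode = 1` when `passed` is false, so CI fails without cutting off any report writing that still has to happen.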
Verification
`lint`, `typecheck`, `test`, `build`, `cspell`, and coverage hooks pass.