chore(tests): regenerate stale automated test catalog on main #1804

Closed
markmhendrickson wants to merge 1 commit into main from fix/test-catalog-stale-main

@markmhendrickson

Owner

Problem

main's docs/testing/automated_test_catalog.md has been stale since #1797, which merged tests/integration/sandbox_seed_token_bypass.test.ts without regenerating the catalog. The baseline CI job runs validate:test-catalog --check, so every PR opened against main fails baseline on this check — including fixture-only or docs-only PRs that touch no tests.

Confirmed by running npm run validate:test-catalog on a clean origin/main checkout: ❌ stale.

Fix

Regenerated the catalog (npm run generate:test-catalog). The drift is exactly:

  • tests/integration/sandbox_seed_token_bypass.test.ts added to the integration file list
  • repo-wide total 501→502, backend 467→468, integration count 141→142

Generator output is deterministic (stable across repeated runs); validate:test-catalog passes after the change.
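As a rough illustration of why a deterministic generator plus an exact comparison behaves this way, here is a minimal sketch of a check-mode comparison. `renderCatalog` and `isCatalogStale` are hypothetical stand-ins, not the repository's actual generator or script; the real catalog format is far richer than this toy output.

```typescript
// Hypothetical stand-in for the real generator: renders a catalog from a file list.
function renderCatalog(testFiles: string[]): string {
  const sorted = [...testFiles].sort();
  const lines = [`Total test files: ${sorted.length}`, "", ...sorted.map((f) => `- ${f}`)];
  return lines.join("\n") + "\n";
}

// Check mode (assumed behavior): the committed text must equal fresh generator
// output exactly; any difference at all is reported as stale.
function isCatalogStale(committed: string, testFiles: string[]): boolean {
  return committed !== renderCatalog(testFiles);
}

const filesBefore = ["tests/integration/a.test.ts"];
const filesAfter = [...filesBefore, "tests/integration/sandbox_seed_token_bypass.test.ts"];

// The catalog committed before the new test file landed no longer matches.
const staleAfterNewTest = isCatalogStale(renderCatalog(filesBefore), filesAfter);

// Committing the raw regenerated output clears the check.
const staleAfterRegen = isCatalogStale(renderCatalog(filesAfter), filesAfter);

// A reformatted copy (here the list markers are changed) still reads as stale,
// even though it lists the same files.
const reformatted = renderCatalog(filesAfter).replace(/^- /gm, "* ");
const staleAfterReformat = isCatalogStale(reformatted, filesAfter);

console.log({ staleAfterNewTest, staleAfterRegen, staleAfterReformat });
```

Under these assumptions, the only way to pass the check is to commit exactly what the generator emits, which is consistent with the commit message's note about not running a formatter over the file.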

Impact

Unblocks baseline for all open PRs once they pick this up (rebase / merge main). Standalone and low-risk — a single generated-doc file.

🤖 Generated with Claude Code

main's catalog has been stale since #1797 merged tests/integration/
sandbox_seed_token_bypass.test.ts without regenerating it. Every subsequent
PR's `baseline` job fails `validate:test-catalog --check` as a result.

Committed as RAW generator output (NOT prettier-formatted): CI's
`validate:test-catalog --check` runs the raw generator and compares, and
`format:check` only globs src/**, so the catalog must match generator output
exactly. Prettier-formatting it is what made prior regenerations still read
as "stale".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@neotoma-agent

Collaborator

review:pm
🤖 Pavo — Ateles swarm, pm lens panelist
APPROVE

Scope & Acceptance Verification

  • Change intent matches scope: Regenerate stale test catalog to unblock CI blocking all PRs on main.
  • No scope creep: Single-file change; purely mechanical regeneration of deterministic output from npm run generate:test-catalog. No feature additions, no refactoring, no policy changes.
  • User-visible behavior matches issue: Unblocks baseline CI job that was blocking all PRs, including non-test-related changes. This restores correctness to a now-stale validation gate.
  • Low risk: Catalog is auto-generated; diff reflects only the new test file added in PR #1797 (fix(sandbox): exempt internal seeding from the write rate limit). No hand-authored content or manual edits that could introduce errors.

PM Lens Findings

[NON-BLOCKING] Maintenance hygiene: Consider adding a pre-commit hook or CI gate to regenerate the catalog automatically when test files change, so this stale-state gap does not reoccur. This is a follow-up suggestion, not a blocker for this PR.
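One possible shape for such a gate, sketched under assumptions: the helper below flags a change set that touches test files but not the catalog. The `.test.ts` suffix rule and the function names are hypothetical; only the catalog path comes from the PR description. It is a heuristic, since edits inside an existing test file may not change the catalog at all.

```typescript
// Catalog path as given in the PR description.
const CATALOG_PATH = "docs/testing/automated_test_catalog.md";

// Assumption: test files live under "tests/" and end in ".test.ts".
function isTestFile(path: string): boolean {
  return path.startsWith("tests/") && path.endsWith(".test.ts");
}

// Returns true when the staged paths touch tests but leave the catalog untouched,
// which is the situation that left main stale after #1797.
function catalogLikelyStale(stagedPaths: string[]): boolean {
  const touchesTests = stagedPaths.some(isTestFile);
  const touchesCatalog = stagedPaths.includes(CATALOG_PATH);
  return touchesTests && !touchesCatalog;
}

const testOnly = catalogLikelyStale(["tests/integration/sandbox_seed_token_bypass.test.ts"]);
const testWithCatalog = catalogLikelyStale([
  "tests/integration/sandbox_seed_token_bypass.test.ts",
  CATALOG_PATH,
]);
const docsOnly = catalogLikelyStale(["README.md"]);

console.log({ testOnly, testWithCatalog, docsOnly });
```

A hook built on this would warn the author at commit time instead of leaving the failure to surface on unrelated PRs later.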


Verdict: Change unblocks a critical validation gate that was incorrectly blocking all PRs on main. Scope is tight, intent is clear, and risk is minimal. Ready to merge.

@markmhendrickson force-pushed the fix/test-catalog-stale-main branch from 6f632d0 to f30f510 on June 25, 2026 12:29
@neotoma-agent

Collaborator

review:pm
🤖 Pavo — Ateles swarm, pm lens panelist

APPROVE

Scope alignment

  • Problem: main's test catalog (docs/testing/automated_test_catalog.md) has been stale since PR #1797 (fix(sandbox): exempt internal seeding from the write rate limit) merged a new integration test without regenerating the catalog. This causes baseline CI to fail on validate:test-catalog --check for all open PRs.
  • Scope: Regenerate the catalog via npm run generate:test-catalog to reflect the current repo state (new test: sandbox_seed_token_bypass.test.ts).
  • In-scope: Single generated-file update; deterministic, generator-driven.
  • Out-of-scope: No code behavior changes, test logic changes, or fixture modifications.

Acceptance criteria met

  • Catalog regenerated with npm run generate:test-catalog
  • Diff is isolated: only catalog file updated
  • Drift is minimal and correct: +1 integration test (501→502 total, 141→142 integration), matching the addition in PR #1797 (fix(sandbox): exempt internal seeding from the write rate limit)
  • Generator output is committed raw (not prettier-formatted) — critical for CI's validate:test-catalog --check to pass
  • No format violations: catalog outside format:check glob (src/**)

No user-visible behavior impact

  • Unblocks CI for open PRs once they rebase/merge main
  • No product or developer-facing changes
  • Documentation-only fix

[NON-BLOCKING] Procedural note: Commit message notes why raw generator output was used (CI expects exact match). This is correct; marking as non-blocking since it's educational and the choice is sound per the message body.


Verdict: Change matches scoped intent, no scope creep, acceptance criteria satisfied. Ready to merge.

@neotoma-agent

Collaborator

review:ux
🤖 Accipiter — Ateles swarm, ux lens panelist
APPROVE

Surface & Discoverability

  • Commands are clearly named: generate:test-catalog, validate:test-catalog
  • Both are documented in the catalog's "Testing requirements" and "Primary validation commands" sections
  • CI lane integration is explicit (baseline lane runs validate:test-catalog)

Error Messages

  • Stale detection (generator, --check mode): ❌ Automated test catalog is stale. Run npm run generate:test-catalog. — tells the user exactly what action to take
  • Success message: ✅ Wrote automated test catalog to <path> — confirms the write and its location
  • Passing validation: ✅ Automated test catalog is up to date. — clear signal

Documentation Quality

  • Flow diagram in catalog (Mermaid) walks the happy path: Change Tests → Generate → Review → Validate
  • Examples section shows realistic scenarios (add file, rename file, add CI lane)
  • Agent instructions include an explicit validation checklist for downstream usage
  • Maintenance section explicitly forbids hand-edits and directs to the generator

Preventive DX

Commit message is transparent about a subtle but important detail: generator output must be raw (no prettier-formatting) so that validate:test-catalog --check deterministically compares against fresh generation. This prevents a repeat of prior regenerations that still read as "stale" — developer experience risk mitigated.

Operational Impact

This PR unblocks all downstream PRs currently failing CI because main's catalog drifted since #1797. The stale catalog is a blocker for every PR, even fixture-only and docs-only changes — this fixes a significant friction point.

Acceptance Checklist

  • Commands are discoverable and well-named
  • Error messages are actionable (not just "stale", but the exact command)
  • Documentation is thorough (flow, examples, agent instructions, validation checklist)
  • No hand-edit surface exposed (generator is single source of truth)
  • Determinism note prevents repeat of prior stale-regen cycles

📎 Neotoma: neotoma issue #1804

@neotoma-agent

Collaborator

review:ux
🤖 Accipiter — Ateles swarm, ux lens panelist
COMMENT

UX surface examined: agent/developer experience of the test catalog regeneration workflow (npm run generate:test-catalog / npm run validate:test-catalog).

Findings

[NON-BLOCKING] Error message clarity: When validate:test-catalog --check fails on main due to stale inventory, the error message is:

❌ Automated test catalog is stale. Run `npm run generate:test-catalog`.

Current friction: Developers making non-test changes (docs-only, fixture-only, or unrelated features) do not expect a catalog validation failure. The word "stale" does not immediately signal "test files were added/removed/moved upstream."

Recommendation: No blocking issue; the docs (lines 310–313 in the generated file) and error message together provide enough context. This is low friction. If refined in a follow-up, consider:

  • "Test catalog is stale. A test file was added or removed. Run npm run generate:test-catalog."
  • Or: "Automated test catalog out of sync with repository. Run npm run generate:test-catalog to regenerate."

Neither is necessary; the current UX is functional and clear to anyone familiar with the repo.

Positive Signals

  • Deterministic generation: No manual intervention or human choices in the generator (script carries all metadata)
  • Clear success/failure paths: ✅/❌ messages at lines 390 and 393 are unambiguous
  • Documented workflow: Embedded agent instructions (lines 363–379) tell developers when and why to load this doc
  • Maintenance barrier: "Do not hand-edit" warning (line 323) protects the catalog from divergence

Acceptance

  • Catalog regenerated after test file changes
  • No hand-editing of suite inventory entries
  • Validation enforced in CI (baseline lane)
  • Generated file diffs match the schema of the catalog

✅ Signed off. No UX blockers. The catalog maintenance surface is transparent, discoverable, and maintains a single source of truth.


📎 Neotoma: none (chore: generated file sync only)

@neotoma-agent

Collaborator

review:qa
🤖 Phoenicurus — Ateles swarm, qa lens panelist
COMMENT

Summary

This PR regenerates the automated test catalog to reflect the new integration test added in PR #1797. The change itself is non-functional (documentation only) and the underlying test coverage is appropriate.

Test Coverage Assessment

New Test Added (PR #1797)

The test validates the security contract for the sandbox write-rate-limit bypass token:

Coverage:

  • ✅ Happy path: Requests with the valid seed token bypass rate limiting entirely
  • ✅ Error path: Requests without the token are rate-limited normally (3/min cap, 429 on overflow)
  • ✅ Wrong-token rejection: Invalid token does not bypass the limiter
  • ✅ Timing-safe comparison: Uses timingSafeEqual to prevent timing-based token inference

Edge cases covered:

  • Token presence/absence (header missing)
  • Invalid token (wrong value)
  • Length-safe comparison (buffer length checked before constant-time compare)
  • Burst load (sends 10 requests in sequence)
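For readers unfamiliar with the length-safe, constant-time pattern mentioned above, here is a minimal sketch using Node's `crypto.timingSafeEqual`. The `tokensMatch` helper and the sample token values are hypothetical and are not taken from the code in #1797.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares a presented token against the expected one.
function tokensMatch(presented: string | undefined, expected: string): boolean {
  // A missing header never matches.
  if (presented === undefined) return false;

  const encoder = new TextEncoder();
  const a = encoder.encode(presented);
  const b = encoder.encode(expected);

  // timingSafeEqual throws when the inputs differ in byte length,
  // so the length is checked before the constant-time compare.
  if (a.byteLength !== b.byteLength) return false;

  return timingSafeEqual(a, b);
}

const validToken = tokensMatch("seed-secret", "seed-secret");
const wrongToken = tokensMatch("seed-secreX", "seed-secret");
const shortToken = tokensMatch("seed", "seed-secret");
const missingToken = tokensMatch(undefined, "seed-secret");

console.log({ validToken, wrongToken, shortToken, missingToken });
```

The early length check does reveal whether the lengths match, which is usually considered acceptable because the comparison of the contents themselves stays constant-time.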

Catalog Regeneration (This PR)

  • ✅ Reflects accurate count: 502 total test files (was 501)
  • ✅ Accurate integration test count: 142 (was 141)
  • ✅ New test correctly listed in alphabetical order

CI Status

All critical test lanes pass:

  • ✅ agentic_evals: PASS
  • ✅ eval_combined: PASS
  • ✅ eval_scenarios: PASS
  • ✅ security_gates: PASS
  • ✅ frontend: PASS
  • ✅ python_sdk: PASS

The catalog file format matches generator output (raw, not Prettier-formatted), which is required for validate:test-catalog --check to succeed on subsequent PRs.

Functional Surface

No functional surface — this is a documentation-only commit that regenerates a generated artifact. The actual test coverage is from PR #1797, which is already merged and validated.

Sign-off:
No functional surface — no eval required per repository guidelines. The underlying test in #1797 provides comprehensive coverage for the sandbox token bypass security contract.

@neotoma-agent

Collaborator

review:qa
🤖 Phoenicurus — Ateles swarm, qa lens panelist

APPROVE

Summary

Catalog regeneration triggered by test file added in #1797 without prior catalog sync.

QA Assessment

Functional surface: None — chore / documentation maintenance per test_catalog_maintenance_rules.md.

Existing test coverage (from #1797):

  • ✅ Happy path: seed token requests bypass rate limit (10 reqs, all 200)
  • ✅ Negative case: requests without token rate-limited (3/10 pass, 7/10 are 429)
  • ✅ Wrong token rejected: invalid token does not bypass (≥1 429 responses)
  • ✅ Security: uses timingSafeEqual for constant-time token comparison
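The behavior these cases describe can be sketched with a toy fixed-window limiter. Everything here is an assumption for illustration: the `WindowLimiter` class, the fixed-window strategy, and the client IDs are hypothetical, and the real limiter in the repository may work differently. Only the 3-per-minute cap and the 10-request burst come from the review.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per window, per client.
class WindowLimiter {
  private limit: number;
  private windowMs: number;
  private counts = new Map<string, { windowStart: number; used: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns the HTTP status the request would receive.
  handle(clientId: string, nowMs: number, bypass: boolean): 200 | 429 {
    // Requests carrying a valid seed token skip the counter entirely.
    if (bypass) return 200;

    let entry = this.counts.get(clientId);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // Start a new window for this client.
      entry = { windowStart: nowMs, used: 0 };
      this.counts.set(clientId, entry);
    }
    if (entry.used >= this.limit) return 429;
    entry.used += 1;
    return 200;
  }
}

// Sends 10 requests in quick succession and records the statuses.
function burst(limiter: WindowLimiter, clientId: string, bypass: boolean): number[] {
  const statuses: number[] = [];
  for (let i = 0; i < 10; i++) {
    statuses.push(limiter.handle(clientId, 1_000 + i, bypass));
  }
  return statuses;
}

const withoutToken = burst(new WindowLimiter(3, 60_000), "client-a", false);
const withToken = burst(new WindowLimiter(3, 60_000), "client-b", true);

console.log(withoutToken.filter((s) => s === 429).length, withToken.filter((s) => s === 200).length);
```

With a cap of 3 per window, the unauthenticated burst yields three 200s followed by seven 429s, while the bypassed burst yields ten 200s, matching the counts listed above.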

Catalog accuracy:

  • ✅ File counts incremented: 501→502 total, 467→468 Vitest, 141→142 integration
  • ✅ sandbox_seed_token_bypass.test.ts listed in alphabetically correct position
  • ✅ Generated as raw output (not prettier-formatted) per generator contract

No additional test coverage required for catalog-only changes per project standards.

Verdict

No functional surface — no eval required.

Catalog regeneration is accurate and correctly reflects the test file added in #1797.

@neotoma-agent

Collaborator

🤖 Vanellus — Ateles swarm, PR steward
APPROVE

Aggregated Review Verdict

All review lenses have signed off with no blocking findings.

Per-Lens Summary

| Lens | Reviewer | Verdict | Blockers | Notes |
| --- | --- | --- | --- | --- |
| PM | Pavo | APPROVE | 0 | Change unblocks critical validation gate blocking all PRs on main. Scope tight, intent clear, risk minimal. |
| UX | Accipiter | APPROVE | 0 | No UX blockers. Catalog maintenance surface transparent, discoverable, maintains single source of truth. |
| QA | Phoenicurus | APPROVE | 0 | No functional surface. Catalog regeneration accurate; reflects test file from PR #1797. |

CI Status

✅ Required checks all passing:

  • baseline: SUCCESS
  • security_gates: SUCCESS
  • python_sdk: SUCCESS
  • frontend: SUCCESS
  • agentic_evals: SUCCESS
  • eval_combined: SUCCESS
  • eval_scenarios: SUCCESS

⏳ Advisory checks (non-blocking):

  • preview: IN_PROGRESS
  • site_export: IN_PROGRESS

Non-Blocking Observations

  • [NON-BLOCKING] PM: Consider adding pre-commit hook or CI gate to regenerate catalog automatically when test files change (follow-up suggestion, not a blocker for this PR).
  • [NON-BLOCKING] UX: Error message wording "stale" could be refined in a follow-up to better signal test file changes; current messaging is functional and clear to familiar developers.

Merge Readiness

✅ All blocking criteria met:

  • All required branch-protection checks: SUCCESS
  • PM gate: signed_off
  • UX gate: signed_off
  • QA gate: signed_off
  • No [BLOCKING] findings across review panel
  • Deterministic, low-risk change (generated artifact only)

READY TO MERGE. All lenses approved; no blockers; required CI passing. Awaiting operator merge instruction per autonomy guardrail.


📎 Neotoma: Vanellus PR steward aggregation for markmhendrickson/neotoma#1804

@neotoma-agent

Collaborator

🤖 Vanellus — Ateles swarm, PR steward
APPROVE

Review Panel Roll-up

PM Lens (Pavo)

APPROVE — Scope and intent aligned. Regeneration of stale test catalog is a mechanical, low-risk fix. Single generated-file update; no scope creep. Unblocks critical CI gate that was incorrectly blocking all open PRs on main.

UX Lens (Accipiter)

APPROVE — Developer experience surface transparent and discoverable. Error messages are actionable (not just "stale", but the exact regeneration command). Documentation flow, examples, and validation checklist are thorough. No UX blockers.

QA Lens (Phoenicurus)

APPROVE — No functional surface (documentation/chore only). Underlying test coverage from PR #1797 is comprehensive (happy path, error paths, edge cases, timing-safe comparison). Catalog regeneration is accurate: 501→502 total tests, 141→142 integration tests, correctly reflects new sandbox_seed_token_bypass.test.ts file.

CI Status

  • baseline: ✅ SUCCESS
  • security_gates: ✅ SUCCESS
  • All required checks: ✅ PASS

Verdict

  • Blocking findings: 0
  • Non-blocking findings: 0 (procedural notes only, no action items)
  • Gate inheritance: (parent issue not linked in PR body; cannot verify pm/ux/arch pre-impl gates)
  • Recommendation: ✅ Ready to merge

Note: Merge is operator-gated per Vanellus protocol. All review lenses clear; awaiting operator confirmation to merge.

@markmhendrickson

Owner Author

Closing as redundant: main's automated test catalog is now up to date (validate:test-catalog passes on current origin/main), so the standalone regeneration this PR carried is no longer needed. The catalog-staleness root cause (raw generator output vs prettier) is handled per-PR in the bundles/sandbox PRs.

3 participants