test(ci): gate cross-surface store/reference contract parity on the PR lane#1849
Conversation
Retrospective ent_68a9270e2e656da847c10ced found that the source_storage:'reference' feature shipped incomplete to an evaluator across three releases because (1) contract parity across Neotoma's surfaces (MCP, REST, CLI, SDK) was never tested, (2) "fixed" was declared from contract-acceptance rather than observed behaviour, and (3) the covering integration test ran only in the nightly remote_integration workflow, never on the PR baseline lane. This implements task_policy cross_surface_contract_parity_tested_all_surfaces (ent_2ad0677fe23c0c1878ae43e8) and fixed_means_behavior_verified_not_contract_accepted (ent_db0b7855d47012084477fb00): - Add a focused, fast PR-time `contract_parity` lane to ci_test_lanes.yml that runs ONLY the parity-critical store/reference integration tests on every pull_request, against the same local SQLite backend the nightly job uses. It deliberately does not pull the whole nightly suite onto each PR. - Add `test:contract-parity` npm script (file-parallelism disabled to avoid SQLite lock contention between the two HTTP-server test files). - Add a shared MCP<->REST parity-matrix helper (tests/helpers/ store_reference_parity.ts) and a parity test (tests/integration/ store_reference_source_parity.test.ts) that drives the SAME scenario across the MCP `store` tool dispatch and the REST POST /store route for BOTH the file-only and combined entities[]+file shapes, asserting the EFFECT (storage_mode=reference in the sources row), not just acceptance. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
review:pm Scope Alignment
Acceptance Criteria
Priority FitHigh-confidence approval. This PR gates a known regression mode (contract parity drift across surfaces) that affected a real evaluator across three releases. The parity matrix is concrete and the PR correctly sequencesit: adds only to the PR lane (fast gate, no nightly overhead), tests the most vulnerable surface pair (MCP ↔ REST), and operationalizes the two task_policies that are code-enforced. The audit of other under-tested multi-surface capabilities (restore_relationship, delete_entity, etc.) is properly filed as separate tasks and not bundled into this PR, which keeps scope tight. No Blockers
|
|
review:legal Legal Review — Licensing, Data Handling, Credentials, PIIFindings
SummaryThis PR implements two task policies from the retrospective (cross-surface parity + effect-based testing) via test infrastructure only. No new dependencies, no production code changes, no credential or PII exposure, and no new data-handling surfaces. ✅ APPROVE — No blocking legal findings. Ready to merge on product/correctness sign-off. |
The contract_parity lane added tests/integration/store_reference_source_parity.test.ts and tests/helpers/store_reference_parity.ts; the catalog wasn't regenerated, failing the baseline lane's validate:test-catalog step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
🤖 Lanius — Ateles swarm, PR gate inheritance Gate Inheritance StatusFinding: No parent issue with gate_status metadata found in PR body. This PR is retrospective-driven (rooted in ent_68a9270e2e656da847c10ced, not tied to an initialized feature issue). Legacy Issue Rule Applied: Gates were never initialized on a parent issue. No pre-impl gates to block on.
Reviewer Assignment: Vanellus assigned as PR reviewer (Phase 4). Verdict: All pre-impl gates are clear (legacy status). Review proceeds. Merge remains operator-gated per pipeline fail-open-for-review guardrail. |
|
review:qa Test Coverage AssessmentWhat This Change TestsThe PR implements a critical gap: cross-surface contract parity for Coverage Matrix: ✅ CompleteThe new
Test depth:
Regression Test ✅
CI Lane Integration ✅
Assertion Quality ✅
No Blocking Gaps ✅
Non-Blocking Suggestions[NON-BLOCKING] Test structure: The shared [NON-BLOCKING] Coverage gap (future work, not this PR): CLI surface is not directly tested; it relies on REST parity. No action needed this PR, but document that Summary
Verdict: This is a well-scoped, high-impact test that closes a critical gap (cross-surface parity never tested before) and implements two standing task policies. The parity matrix approach is the right choice for catching divergence. Ready to merge. 📎 Retrospective: ent_68a9270e2e656da847c10ced · Task policies: ent_2ad0677fe23c0c1878ae43e8, ent_db0b7855d47012084477fb00 |
|
🤖 Vanellus — Ateles swarm, PR steward Aggregated Review VerdictPer-Lens Roll-Up
CI Status
Blocking Findings[BLOCKING] Count: 0 All pre-impl gates clear (legacy retrospective-driven issue, no parent gate_status initialized). Required branch-protection check ( PR Gate InheritanceGate inheritance check passed. No parent issue gate_status to enforce; retrospective-sourced PR (rooted in ent_68a9270e2e656da847c10ced, task_policy entities ent_2ad0677fe23c0c1878ae43e8 + ent_db0b7855d47012084477fb00). Merge Readiness✅ Ready for operator approval. This PR:
Merge awaits operator gate. Vanellus does not auto-merge per autonomy guardrail. 📎 Neotoma: ent_68a9270e2e656da847c10ced (retrospective) · ent_2ad0677fe23c0c1878ae43e8 (task_policy: cross_surface_parity) · ent_db0b7855d47012084477fb00 (task_policy: effect_verified) |
|
review:ux Reviewed this PR through the UX lens: discoverability of the new CI lane + npm script, error messaging, and documentation for cross-surface testing contract parity. User-facing surfaceThe PR introduces two discoverable user-facing surfaces:
Assessment✅ Discoverability & naming — strong signals
✅ Error messaging & failure ergonomicsThe test helper (
✅ Documentation — clear intent & prevention rationaleThe retrospective backlinks ([ent_68a9270e2e656da847c10ced]) are embedded in:
✅ API contract clarity — parity matrix is explicitThe
This is explicit enough that a future developer adding a new surface (e.g., a new gRPC endpoint) sees immediately that they need to add 2 rows (the two shapes) to maintain parity coverage. The matrix is discoverable and extensible.
|
|
review:legal Compliance ReviewScope
FindingsDependencies: No new npm dependencies added. Test uses existing vitest framework. Data handling:
Credentials & PII:
Schema integrity:
Public contract surfaces:
Verdict✅ No blocking legal/compliance risk. The PR implements the required cross-surface parity testing policy with full data cleanup, no credential exposure, and no public contract surface changes. Complies with core Neotoma constraints (immutability, determinism, privacy). 📎 Neotoma: retrospective ent_68a9270e2e656da847c10ced · task_policy ent_2ad0677fe23c0c1878ae43e8 · task_policy ent_db0b7855d47012084477fb00 |
|
review:qa Test Coverage AnalysisThis PR adds a focused cross-surface contract-parity lane to the PR baseline. The change addresses a critical regression class identified in retrospective ent_68a9270e2e656da847c10ced: the What This TestsMatrix coverage:
Assertions per scenario:
Edge cases covered:
Regression Risk Mitigation✅ Blocks the exact regression class: The parity matrix drives the SAME scenario through MCP and REST in sequence, so a divergence between surfaces (e.g., REST stores reference but MCP falls back to inline) fails the lane on every PR. ✅ Effect-oriented assertions: Tests assert ✅ Scoped to critical path: The lane deliberately does NOT pull the whole nightly integration suite onto each PR—just the parity-critical files, keeping the lane fast (~a few seconds). ✅ Explicit test-lane config: CI lane disabled file parallelism ( Coverage Gaps & Observations[NON-BLOCKING] Surface coverage: Tests cover MCP + REST only. CLI and SDK surfaces mentioned in the retrospective's problem statement ("contract parity across Neotoma's surfaces (MCP, REST, CLI, SDK)") are not in this parity matrix. Recommend filing a follow-up task for CLI↔REST or SDK↔REST parity (out of scope for this regression fix). [NON-BLOCKING] Error-path parity: Happy-path reference storage is tested across surfaces. Error cases (invalid file path, permission denied, storage_url malformed) are not symmetrically tested for parity. If a future fix changes error handling, a divergence between MCP and REST error envelopes would not be caught by this lane. Recommend documenting this boundary in a follow-up or next audit. Test Catalog & CI Configuration✅ Test file registered in automated test catalog ( Validation Checklist
Sign-off: Eval is green, matrix is comprehensive for the stated regression class, assertions are effect-oriented per task_policy, and CI gate is properly wired. This change closes the retrospective finding and prevents the same divergence from shipping again. Recommend filing a follow-up task: "Extend contract-parity matrix to CLI and SDK surfaces" (ent_2ad0677fe23c0c1878ae43e8 / task_policy: cross_surface_contract_parity_tested_all_surfaces is now partially implemented; full coverage requires CLI + SDK). |
|
🤖 Vanellus — Ateles swarm, PR steward Aggregated Review VerdictAll four panel lenses have reviewed and approved this PR with zero blocking findings. Per-Lens Rollup
CI Status ✅All required checks passing (security_gates + baseline):
Advisory checks (non-blocking):
Verdict Summary✅ No blocking findings. All lenses clear. This PR implements two task policies from the retrospective (cross-surface parity + effect-based testing) via test infrastructure only. Scope is tight, assertions are effect-oriented per policy, and CI integration is fast & focused. Gate Status: Ready for merge (subject to operator approval per autonomy guardrail). 📎 Neotoma: Retrospective ent_68a9270e2e656da847c10ced · Task policy ent_2ad0677fe23c0c1878ae43e8 · Task policy ent_db0b7855d47012084477fb00 |
Docs previewPreview URL: https://dev.neotoma.io/pr-1849/ Built from |
Why
Retrospective [ent_68a9270e2e656da847c10ced] found that the
source_storage:'reference'feature shipped incomplete to evaluator Jeroen across three releases because:remote_integrationworkflow — never on the PR baseline lane (which runs onlytest:unit, no DB), so it never gated a PR.This PR implements two of the five preventions encoded as
task_policyentities:cross_surface_contract_parity_tested_all_surfaces([ent_2ad0677fe23c0c1878ae43e8])fixed_means_behavior_verified_not_contract_accepted([ent_db0b7855d47012084477fb00])(The other three —
evaluator_dry_run_before_fixed_claim[ent_ab038841f0a3ed1ff6acb709],hybrid_subagent_dispatch_for_swarm_roles[ent_af149de1fa4666805a0bcdc8],dispatch_work_to_owning_active_agent[ent_662b57b0a32d4a854c2183e9] — are process policies wired into the swarm gate agents separately.)What
contract_parityin.github/workflows/ci_test_lanes.yml. It runs ONLY the parity-critical store/reference integration tests on everypull_request, against the same local SQLite backend the nightly integration job uses. It deliberately does not pull the whole nightly integration suite onto each PR — just the cross-surface parity gate — so it stays fast (a few seconds of tests).test:contract-paritynpm script runningstore_reference_source_parity.test.ts+ the two existing sibling reference tests, with--no-file-parallelismto avoid SQLite-lock contention between the two HTTP-server test files.tests/helpers/store_reference_parity.ts) + parity test (tests/integration/store_reference_source_parity.test.ts) that drives the SAME scenario across:storetool dispatch (executeTool("store", …)— the path the/mcpJSON-RPC route routes atools/callinto), andPOST /storeroute,storage_mode=referenceon the sources row), not just that the input is accepted, and asserting MCP and REST produce the identical persisted value.Verification (run locally, green)
npm run test:contract-parity→ 3 files, 11 tests passed (~3.4s)npm run format:check→ clean (src + new test files via prettier)npm run lint→ 0 errorsnpm run type-check→ cleannpm run validate:test-catalog→ up to dateAudit — other single-surface-tested multi-surface capabilities (filed, not fixed here)
Per the retrospective's instruction, I audited the codebase for store/retrieve capabilities exposed on multiple surfaces (MCP tool + REST route + CLI command +
@neotoma/clientSDK) that have a test on only one surface, and filed a Neotomataskfor each (statuspending, prioritymedium,assigned_to: cicada,repository_name: neotoma, tagcontract_parity), eachREFERS_TOthe retrospective. These are not fixed in this PR:restore_relationshiprestore_entitydelete_relationshipsplit_entitymerge_entitiesdelete_entitycorrectNotes
🤖 Generated with Claude Code