Design ethos: modular, intelligent, functional, logged, tested, and documented. Real methods only; never mocks or fakes. Every release ships with green tests and accurate docs.
This file tracks live work after the v3.4.0 release (latest published release:
v3.4.0, tagged 2026-06-12). Historical release detail belongs in
CHANGELOG.md; generated counts and project rosters belong in
docs/_generated/COUNTS.md and
docs/_generated/active_projects.md.
Refreshed on 2026-06-13 on branch codex/template-exemplar-forkability
after the forkability and verifier-first roadmap pass. Re-run the
commands in the Source column before copying any number into prose; live
counts belong in docs/_generated/COUNTS.md, not
hard-coded here.
| Gate or surface | Current value | Source |
|---|---|---|
| Package version | 3.4.0 |
pyproject.toml#[project].version |
| Latest published release | v3.4.0 (tagged + GitHub release published 2026-06-12; CHANGELOG [3.4.0] body) |
gh release list, git tag |
| Public source scope | infrastructure plus nine public exemplar src/ trees |
uv run python -m infrastructure.project.public_scope source-paths |
| Public exemplars | template_active_inference, template_autoresearch_project, template_autoscientists, template_code_project, template_newspaper, template_prose_project, template_sia, template_template, template_textbook |
docs/_generated/active_projects.md |
| Canonical generated facts | Importable infrastructure packages, infrastructure Python-module count, project-scope + publishing test collections, and per-exemplar coverage — all live-derived; do not hard-code here | docs/_generated/COUNTS.md (regenerate with uv run python scripts/generate_counts.py --write) |
| Open GitHub PRs | 0 open | /opt/homebrew/bin/gh-axi pr list |
| Docs lint status | links-only, consistency-only, and doc-pairs all clean (re-verified 2026-06-13) | uv run python scripts/lint_docs.py --links-only --quiet --json, --consistency-only, --doc-pairs-only |
| Mermaid lint status | clean with chunked batch rendering under the default total budget | uv run python scripts/lint_docs.py --mermaid-only --quiet --json |
| Release verification baseline | drift --strict, COUNTS/skills/exports gates, tracked-project/generated-artifact guards, docs-lint (links/consistency), reporting+evidence+repro suites (877), and the LLM suite (1244, live Ollama) all green at the v3.4.0 commit |
v3.4.0 release + local command history |
Keep this section short. Details live in release notes or archived audits.
-
AutoResearch reviewer-boundary closeout (2026-06-13,
[Unreleased]): closedAR-REPORT-ERGONOMICS-1,AR-BENCHMARK-ERGONOMICS-1, andAR-SOURCE-LEDGER-2. The loop now emitsoutput/data/autoresearch_evidence_overview.json,output/reports/autoresearch_evidence_overview.md, andoutput/data/benchmark_boundary.json; the source-ledger contract validates ledger citekeys against BibTeX and manuscript citations, rejects invalid tiers/future dates, and prints offline age/tier summaries. Verified withuv run pytest projects/templates/template_autoresearch_project/tests/ -q(224 passed). -
Template forkability + verifier roadmap pass (2026-06-13,
[Unreleased]): closedTODO-REBASE-2(live PR count is 0; active backlog restored instead of_No open items_),REGRESSION-PIN-1(the first realtemplate_code_projectregression pins now collect and pass with a mutation negative control), andAI-VIZ-SPLIT-1(src/visualizations/figures.pysplit into focused semantic and simulation modules while preserving the figure facade).AI-SEMANTIC-FIXPOINT-1is now closed: the shared fixed-point orchestrator drives manuscript variables, sheaf semantic outputs, and contract settlement; the selected fixed-point cluster passed 21 tests on 2026-06-13. -
Generated-report web design (2026-06-12,
[Unreleased]): modernized the base HTML report/dashboard template (infrastructure/reporting/html_templates.py) with CSS design tokens, dark mode (prefers-color-scheme), WCAG-AA status contrast, fluid type, and a mobile breakpoint — template contract + deterministic output preserved (7 template + 875 reporting tests green). -
Scoped-improvement sweep (2026-06-12,
[Unreleased]): closed all three freshly-scoped items.WEBDESIGN-EXTEND-1— extractedhtml_templates.shared_css()as the single design-token source; the pipeline report, interactive dashboard, and web renderer all now anchor to--brand-1+ aprefers-color-schemeblock (108 tests).LINKCHECK-PERF-1—link_audit_coreprunes excluded/gitignored dirs before descending + single-pass file reads (~15–28× faster; 303 link tests + a timed regression).TEXLIVE-2026-BEAMER-1—latex_utils.compile_latextolerates the benign\reserved@akernel warning when a valid PDF results (the beamer test now passes on TeX Live 2026; genuine failures still raise). -
Backlog closeout + comprehensive review (2026-06-12): closed
REPRO-VERIFY-1(the repro bundle now fails closed on declared-but-absent outputs — a declared output absent at build rebases onto the project tree andverifyreturnsok=False; pinned bytests/infra_tests/publishing/test_repro_bundle_absent_output.py) andEVIDENCE-CLAIM-1(the claim-ledger candidate list gained anoutput/data/*claims*.jsonglob so thetemplate_autoresearch_projectexemplar'sautoresearch_claims.jsonis ingested as claim nodes +supportsedges; pinned bytest_ingest_claims_autoresearch_ledger_yields_nodes_and_supportsandtest_build_evidence_graph_ingests_real_autoresearch_ledger). Shipped alongside a large multi-pass review (dead-code removal,run.configmatrix runner,SOURCE_DATE_EPOCHdeterminism, thecanonical_facts.md→COUNTS.mdgenerator, exemplar-support-tier tagging, methods-plan gate, validation↔ autoresearch decoupling, and an Ollama test-model fix). Full detail inCHANGELOG.md. -
Backlog sweep + RedTeam hardening (2026-06-08): closed 6 backlog items —
PIPELINE-INCR-FLAG-1(--incrementalCLI flag on both entrypoints),DASHBOARD-CLI-1(release-readiness dashboard discoverable + subprocess test),REPRO-MULTI-1(--all-publicmulti-exemplar repro bundles),PROSE-GATE-WIRE-1(opt-in report-only prose gate in Stage 6, disabled = byte-identical),SIA-HARNESS-2(fixture/live separation tests + docs), andLOG-SEP-CENTRAL-1(33 banner literals routed through width constants + lint). VerifiedTODO-REBASE-1,ARCH-CONFTEST-1, andAI-GATE-CACHE-1already satisfied. A verifier-first RedTeam pass then fixed a real SIA fail-closed hole (max_generations<1returned a vacuous empty run) and closed five green-wash test gaps. Commits33e5ca71,c8381d9a. One out-of-scope finding logged asREPRO-VERIFY-1.RELEASE-TAG-1deferred (needs a clean tree + sign-off). -
v3.3.1release train (2026-06-07): completed Pandoc DOCX output (embed figures + resolve cross-references ininfrastructure/rendering/pipeline.py), ran a deep per-exemplar quality pass across the eight tracked public templates, completed and cross-linked sidecar publication metadata for all nine public exemplars, and reconciled the generated project-scope collection count to 216. Full detail inCHANGELOG.md. -
v3.3.0release train (2026-06-07): closed the three Medium backlog items (READFILE-SAFE-1, CI-MATRIX-DYNAMIC-1, LOG-CLEAN-1) and all five Major items (EVIDENCE-GRAPH-1, REPRO-BUNDLE-1, DASHBOARD-1, PLUGIN-STAGES-1, INCREMENTAL-PIPELINE-1) — the two pipeline-core features are opt-in/default-off. The shipped artifacts were re-verified on disk and under test on 2026-06-08 (169 dedicated tests pass; no stubs). Full detail inCHANGELOG.md. The genuine residual next-increments of these features are now tracked as new backlog items below (EVIDENCE-CLAIM-1, PIPELINE-INCR-FLAG-1, DASHBOARD-CLI-1, REPRO-MULTI-1, PROSE-GATE-WIRE-1). -
Reference-existence + AI-writing infrastructure (2026-06-06): new
infrastructure/reference/verification/deterministic anti-hallucination gate (Crossref→OpenAlex/arXiv resolver, SQLite cache, offline-first) andinfrastructure/validation/content/ai_writing.pyAI-writing fingerprint detector (validation.cli prose-quality); wired into thedocs/promptsworkflows. Clean-room distillation of academic-research-skills ideas (CC-BY-NC-4.0); no code vendored. -
Infra test parallelization (2026-06-06):
pytest-xdist+-n autoon the CItest-infrajob cut each leg from ~892s to ~585s; suite verified parallel-safe; default-vdropped fromaddopts. -
Docs accuracy sweep (2026-06-06): audited
docs/+ everyinfrastructure/*/{SKILL,README,AGENTS}.md; corrected examples that cited methods/params/CLI flags/test paths that did not exist (every fixed command re-verified to resolve). -
v3.2.0release train (2026-06-04): DOCX/EPUB renderers + format toggles (FMT-BUNDLE-1), Active Inference validation spine v2 (AI-SPINE-V2), infrastructure coverage gaps rebaselined (COVERAGE-REBASE-1), GitHub supply-chain hardening (GH-PIN-1, GH-ACTIONLINT-1, GH-AUTOMERGE-1), and the XML parser policy (DEP-DEFUSEDXML-1). SeeCHANGELOG.md. -
v3.1.0release train: SIA joined the public exemplar scope; Active Inference gained the first validation-spine tracks; docs signposting moved toprojects/templates/...; public coverage orchestration was hardened. SeeCHANGELOG.md. -
Thermo-nuclear infra/docs audit (2026-06-08): Waves A–E closed — doc contract, READFILE-SAFE-1 CLI, AGENTS v3.3 completeness, evidence-graph SUPPORTS/registry bridge, autoresearch validation split. Approve — see
docs/audit/archived/thermo-nuclear-code-quality-2026-06-08.md. -
Thermo-nuclear remediation waves: 2026-05-29 and 2026-05-30 blockers and branch deltas are closed. See
docs/audit/archived/thermo-nuclear-code-quality-2026-05-29.mdanddocs/audit/archived/thermo-nuclear-code-quality-2026-05-30.md.
- Problem:
template_active_inferencegate tests still have one heavy explicit roadmap/sheaf write path; clean-state fixed-point and gate setup now avoid redundant regeneration, but the full project suite still needs a durations-driven pass. - Why it matters: slow gates discourage local verification and make future semantic changes harder to review.
- Smallest next step: complete the remaining project-local
MEDIUM-TEST-PERF-1split with cheap source-only negative controls plus one end-to-end refresh characterization. - Acceptance check: focused mutation tests still catch the known failure
classes,
--durations=20shows fewer redundant artifact refreshes, and the manuscript gate still passes. - Out of scope: weakening coverage, skipping pymdp-backed evidence, or dropping the end-to-end refresh characterization.
As of 2026-06-12 there are no known divergences: pyproject.toml,
CHANGELOG.md, and the published tag all agree at 3.4.0 (the prior
"v3.3.1 bumped but not released" gap was resolved by cutting v3.4.0, which
folds the dated [3.3.1] entry and all post-3.3.1 work into one published
release). The docs-lint (links/consistency/doc-pairs), drift, and canonical-facts
gates are clean.
If a new drift appears between CHANGELOG.md, TO-DO.md,
generated facts, or .github/workflows/ci.yml, fix forward and record the
current source of truth here instead of rewriting shipped changelog entries.
- Backlog IDs are stable. Use them in commit messages when closing related work; do not silently reuse retired IDs for new work.
- ID scheme:
<AREA>-<SLUG>-<N>, whereAREAnames the surface —GH(GitHub workflows/CI),ARCH(architecture/test rules),LOG(logging),AI(Active Inference exemplar),SIA(SIA exemplar),DEP(dependencies),FMT(rendering formats),PIPELINE/DASHBOARD/REPRO/PROSE/EVIDENCE(shipped-capability follow-ups),COVERAGE/READFILE/CI-MATRIX(named gates/patterns), andTODO(backlog hygiene itself).Nis a monotonic counter within that area+slug. - Every active item must include a problem, why it matters, smallest next step, acceptance check, and out-of-scope boundary.
- Completion requires evidence from a command, file diff, or generated artifact; do not check off items from intention. Before retiring an item, confirm its artifact exists on disk (and under test) this session — never from a commit message alone.
- Re-baseline measured facts instead of copying old numbers from this file.
- Keep private or rotating project names out of public docs; link to
docs/_generated/active_projects.mdfor public scope. - Preserve the no-mocks testing policy, project coverage floors, and generated artifact guard when closing backlog items.
CHANGELOG.md- historical release notesdocs/development/roadmap.md- release direction and long-horizon themesdocs/development/coverage-gaps.md- measured infrastructure coverage gaps.github/AGENTS.md- CI gates and local parity commands