Skip to content

Latest commit

 

History

History
339 lines (294 loc) · 37.1 KB

File metadata and controls

339 lines (294 loc) · 37.1 KB

AGENTS.md

Project Constraints and Rules

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Step 2D — RQ1 provenance research checkpoint (freeze before R3)
  • Summary: Froze the retrieval/context provenance subsystem as the RQ1 evidentiary baseline now that all three feature branches (phase-r1-trace-corephase-r1b-provenance-eventsphase-r1b-followups) are merged into main at 021572e. Added a single consolidated checkpoint doc (lineage, verified metrics, miss-analysis summary, safe/unsafe claims, limitations, R3 scope) and tagged the commit rq1-provenance-checkpoint-2026-06-14. No code/behaviour change. Verified on main: 2C frozen pilot recall 86%, 1B enriched re-run recall 79% — both 0% raw leak / 100% hash / 100% completeness / 100% graph presence; full suite 228 passed.
  • Files Changed: docs/evaluation/provenance-rq1-checkpoint-2026-06-14.md (new). Git tag rq1-provenance-checkpoint-2026-06-14.
  • Verification: Both gold sets validate; eval_provenance re-run confirms 86% / 79% recall as above; pytest → 228 passed; public_repo_guard.py passed; git diff --check clean.
  • Follow-ups: Next is R3 — opt-in, audit-safe MCP/tool-call + memory/resource provenance (mcp.tool.requested/allowed/denied/result, memory.write[.denied], resource.read[.denied]). No behaviour changes, no benchmark yet. Phrase completeness as "100% instrumentation coverage over currently implemented events", never "100% complete provenance".

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R1B follow-ups — recall miss-analysis (#2) + scorer negative-branch tests (#1)
  • Summary: Two post-PR analysis tasks on phase-r1b-provenance-events; no R3, and no change to retrieval ranking/scoring/indexing/graph/schema, nor to the frozen 2C pilot or 1B real-run traces. (#2) Categorised every missed expected source behind the 79% recall: 6 missed links / 23 across 4 of 12 queries. Read-only finding — none is a missing, unindexed, or sub-threshold source; all 6 rank below the query's limit. Breakdown: 2 cutoff near-misses (recoverable, one compounded by single-source chunk domination taking 3/5 slots), 2 annotation/query-design mismatches (retriever arguably correct → 79% understates quality), 1 over-broad gold on an ambiguous query, 1 genuine lexical gap (no FTS stemming: query "citations" ≠ heading "Citation"). (#1) Added two scorer negative-branch tests proving the conditional discriminates — retrieval.fusion is not demanded for a keyword-only gold, and graph is not scored for a non-graph gold, while the same fusion-/graph-less trace fails when those events are expected. Verified by mutation (made the scorer unconditional → both tests red → reverted).
  • Files Changed: docs/evaluation/provenance-real-run-1b-miss-analysis-2026-06-14.md + .json (new, analysis only), tests/test_eval_provenance.py (+2 negative-branch tests). No production code changed.
  • Verification: pytest → 228 passed (+2). New tests proven to bite via scorer mutation (red), then green after git checkout. Frozen 1B gold re-scores 79%/100%/100% unchanged. public_repo_guard.py passed; git diff --check clean; compileall OK.
  • Follow-ups: Stemming/Porter tokenizer (or plural query expansion) would close the one genuine lexical miss; source-dedupe before budgeting would recover the cutoff near-misses — both out of scope here (no ranking/indexing change). Re-annotate real-evidence-report-02 (gold expects review docs; query points at reports/evidence). R3 still frozen.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R1B — provenance event enrichment (phase-r1b-provenance-events)
  • Summary: Raised the RQ1 provenance ceiling from a 3-event subset toward audit completeness. Added one genuinely-new event type retrieval.fusion (records existing hybrid weights {fts:0.6, embedding:0.4} + per-chunk ranks; observe-only, hybrid only) and populated the existing context.assembled.dropped with real budget drops + explicit dropped_reason: no_dropped_context for the empty case. Did not add graph.expand — the graph_context event already covers --graph expansion and is scored via the existing expect_graph_context gold flag (a parallel event would fragment the model). Fixed a retrieval confound found during the re-run: derived trace mirrors (wiki/traces/*.md) were being indexed and polluting results with the query's own terms — now excluded from indexing + keyword search. Re-ran the frozen 12-query Step 2C pilot as a separate artifact (new …-r1b trace IDs + provenance_real_gold_1b.jsonl), leaving the original 2C pilot frozen.
  • Files Changed: scripts/trace_schema.py (whitelist retrieval.fusion), scripts/context_export.py (_fusion_payload, _apply_budget, _assembled_context_payload dropped + reason, fusion emit, trace-mirror skip), scripts/chunk.py (exclude wiki/traces/), tests/test_provenance_enrichment.py (new, 7), tests/test_chunk.py (+1), tests/test_trace_retrieval_integration.py (new event shape), eval/provenance_real_gold_1b.jsonl + data/traces/…-r1b* + docs/evaluation/provenance-real-run-1b-2026-06-14.md (new)
  • Verification: pytest → 226 passed (+8). Enriched re-run: validate 12/12, replay 12/12, retrieval.fusion 12/12, genuine dropped on 7/7 context traces (reason budget), provenance_completeness 100% over the richer pipeline, graph_context_presence 100%. Recall 86%→79% (corpus drift, ranking-neutral; frozen 2C pilot still scores 86%). public_repo_guard.py passed; git diff --check clean; compileall OK.
  • Follow-ups: All-hybrid frozen set can't exercise the "fusion legitimately absent" path — a future keyword-only query would test the conditional denominator. dropped reasons limited to budget (dedupe/threshold future). R3 MCP tracing still frozen.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R2 Step 2C — real-corpus provenance pilot
  • Summary: Added an auditable real-corpus pilot before R3. Committed eval/provenance_real_queries.jsonl first as a frozen 12-query set (48a8c27) with manual expected-source annotations from wiki/docs. Then generated traces using existing search --trace, context --trace, and selected context --graph --trace; linked them in eval/provenance_real_gold.jsonl; and documented the run in docs/evaluation/provenance-real-run-2026-06-14.md. Pilot metrics: 0% raw leak rate, 100% hash integrity, 12/12 trace validate/replay, 86% expected source recall, 100% built-scope provenance completeness, 100% graph context presence. No R3, MCP tracing, ranking, graph behavior, schema, or evaluator scoring changes.
  • Files Changed: eval/provenance_real_queries.jsonl, eval/provenance_real_gold.jsonl, data/traces/trace-20260614T171000Z-real*.json, wiki/traces/trace-20260614T171000Z-real*.md, docs/evaluation/provenance-real-run-2026-06-14.md, docs/evaluation/provenance.md, docs/workflows_and_plans.md, eval/README.md
  • Verification: Search/graph indexes rebuilt; all 12 queries traced successfully; real gold validation passed; real evaluation reported 12 cases, 0% raw leaks, 100% hash integrity, 86% expected source recall, 100% provenance completeness, 100% graph context presence; trace validate/replay rate 12/12; compileall passed; provenance tests 7 passed; full suite 218 passed (2 dependency warnings); public_repo_guard.py passed; git diff --check passed.
  • Follow-ups: Treat as pilot evidence, not benchmark evidence; expand real-world gold and annotation review later; keep R3 frozen until merge/review.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R2 Step 2B — stronger provenance gold set
  • Summary: Expanded provenance evaluation beyond one controlled fixture. Passing baseline now has six cases: search --trace, context --trace, context --graph --trace, legacy coarse retrieval, the original Step 2 controlled fixture, and a stale/superseded note case labelled for later. Added isolated negative/failure gold files for raw-path invariant failure, incomplete trace completeness failure, and missing expected source recall failure. No MCP tracing, retrieval ranking, graph behavior, or schema changes.
  • Files Changed: eval/provenance_gold*.jsonl, data/traces/trace-20260614T16170*.json, tests/test_eval_provenance.py, README.md, docs/TESTING.md, docs/evaluation/provenance.md, docs/workflows_and_plans.md, eval/README.md
  • Verification: Step 2B red tests failed on missing gold cases/files; compileall passed; focused provenance+trace tests 27 passed; full suite 218 passed (2 dependency warnings); positive gold validates and scores 100% across built-scope metrics on 6 cases; negative fixtures fail as expected; all Step 2B trace fixtures validate; public_repo_guard.py passed; git diff --check passed.
  • Follow-ups: Keep R3 frozen; expand to larger real-world gold before making real-world provenance completeness claims.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R2 Step 2 — provenance evaluation harness
  • Summary: Added local eval_provenance.py plus eval/provenance_gold.jsonl to evaluate saved trace provenance. Hard invariants (raw_leak_rate=0%, hash_integrity_rate=100%) run before graded metrics (expected_source_recall, provenance_completeness, graph_context_presence). Gold schema includes optional expected_chunk_ids for future claim-to-chunk faithfulness. Added CLI command zurvan eval provenance.
  • Files Changed: scripts/eval_provenance.py, scripts/cli.py, eval/provenance_gold.jsonl, data/traces/trace-20260614T151617Z-prov0001.json, tests/test_eval_provenance.py, docs/evaluation/provenance.md, eval/README.md, docs/API.md, docs/TESTING.md, docs/workflows_and_plans.md, README.md
  • Verification: TDD red run failed on missing module; CLI red run failed on missing action; compileall passed; focused provenance+trace tests 25 passed; full suite 216 passed (2 dependency warnings); eval_provenance.py --validate passed; CLI eval provenance returned 100% built-scope metrics; public_repo_guard.py passed; git diff --check passed.
  • Follow-ups: R3 remains frozen; retrieval.fusion and graph.expand are not scored until implemented.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Phase R2 retrieval trace — Step 0 reconcile + Step 1A granularity
  • Summary: Step 0 reconciled stale test counts (201/10 → reproduced 210/19) across README/docs/CHANGELOG and R1/R2 audits (commit 44a76f2). Step 1A added granular, opt-in retrieval provenance events — retrieval.query, retrieval.result, context.assembled — while keeping legacy retrieval valid, schema_version=zurvan.trace.v1, and the payload-hash rule unchanged (commit cc87e5d). No ranking/scoring/fusion/stdout change; tracing opt-in via --trace.
  • Files Changed: scripts/trace_schema.py, scripts/context_export.py, tests/test_trace_replay.py, tests/test_trace_retrieval_integration.py, CHANGELOG.md, README.md, docs/TESTING.md, docs/workflows_and_plans.md, docs/audits/phase-r2-retrieval-trace-integration-audit-2026-06-14.md
  • Verification: focused trace suite 20 passed; full suite 211 passed (2 dependency warnings); legacy single-retrieval replay regression passed; public_repo_guard.py passed; git diff --check passed on branch.
  • Follow-ups: context.assembled.dropped always empty (awaits token-budget policy); retrieval.fusion/graph.expand not yet implemented; branch pushed, NOT merged to main (dirty main + wiki/index.md:669); next is Step 2 eval_provenance.py.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: MCP install — verify Claude Code + add Codex client
  • Summary: Claude Code already ✔ Connected (live mcp_server.py, no reinstall needed). Added Codex via codex mcp add zurvan (absolute Anaconda python + absolute server path; verified codex mcp get + launch smoke-test, 11 tools). Made it reproducible: added a codex target to install_mcp_config.py (emits codex mcp add command + [mcp_servers.zurvan] TOML via sys.executable) and docs/mcp/codex.md.
  • Files Changed: scripts/install_mcp_config.py, tests/test_install_mcp_config.py, docs/mcp/codex.md (+ machine ~/.codex/config.toml)
  • Verification: pytest → 191 passed (+3). claude mcp list → connected. codex mcp get zurvan OK. public_repo_guard.py passed.
  • Follow-ups: None.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: MCP server — per-argument schema docs + structured output
  • Summary: Added Annotated[..., Field(description=...)] to every parameter of all 11 MCP tools (per-arg descriptions + bounds: limit 1–50, depth 1–5, min_top3 0–1). Added structured output to zurvan_graph_stats via a GraphStats TypedDict — FastMCP now emits outputSchema + structuredContent {nodes, edges} plus a JSON text fallback. Text-rich tools (search/context) intentionally kept as curated text.
  • Files Changed: scripts/mcp_server.py, scripts/mcp_tools.py, tests/test_mcp_tools.py
  • Verification: pytest → 188 passed (+1). e2e_mcp_smoke.py full pass. Per-arg descriptions confirmed in inputSchema; graph_stats returns structuredContent + outputSchema. public_repo_guard.py passed.
  • Follow-ups: None.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: MCP server full audit — LLM usability + correctness fixes
  • Summary: All 11 MCP tools had empty descriptions (FastMCP reads the wrapper __doc__, not the tools.* docstrings) — rewrote mcp_server.py with rich model-facing docstrings + Literal enums (remember.type, decision.status, claim.confidence) + resource descriptions. Fixed CWD bugs: is_safe_path/resource_file now anchor to PROJECT_ROOT. Eval tools run in-process with stdout capture (was a relative subprocess to python) — also avoids corrupting the stdio stream. Dropped no-op depth from zurvan_graph_neighbours; zurvan_remember now keeps type as a tag; zurvan_search returns heading+snippet.
  • Files Changed: scripts/mcp_server.py, scripts/mcp_tools.py, scripts/mcp_security.py, scripts/mcp_resources.py
  • Verification: pytest → 187 passed. e2e_mcp_smoke.py full pass. Tool descriptions confirmed non-empty; resource_file works from /tmp; traversal blocked. public_repo_guard.py passed.
  • Follow-ups: Optional: per-arg Field(description=...) and structured JSON output.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Full audit — update OpenAI default model (GPT-5.x) + temperature safety
  • Summary: Verified current OpenAI model naming against official docs; bumped openai default gpt-4ogpt-5.4-mini (override via ZURVAN_LLM_MODEL). Added _openai_supports_custom_temperature() so the temperature field is omitted for GPT-5 family and o-series models (they 400 on non-default temperature) but still sent for legacy models. Updated docs/ENVIRONMENT.md.
  • Files Changed: scripts/llm.py, tests/test_llm.py, docs/ENVIRONMENT.md
  • Verification: pytest → 187 passed (+4 new). public_repo_guard.py passed.
  • Follow-ups: None outstanding from the documented list.

2026-06-14 (Australia/Sydney)

Raouf:

  • Scope: Full audit — finish CWD-independence for remaining non-MCP scripts
  • Summary: Closed the documented PROJECT_ROOT follow-up. graph_build.py now walks PROJECT_ROOT (was os.walk('.'), which produced an empty graph from any other CWD) while keeping node identity repo-relative; get_file_content() reads via absolute paths. graph_export.py default export paths are absolute with a guarded makedirs. eval_search.py resolves the gold file, expected-path checks, and fallback globs against PROJECT_ROOT via a new _resolve() helper. snapshot.py and public_repo_guard.py confirmed correct by design (own ROOT join / git ls-files-relative).
  • Files Changed: scripts/graph_build.py, scripts/graph_export.py, scripts/eval_search.py
  • Verification: pytest → 183 passed. Ran graph build / eval / export from /tmp: 830 nodes / 746 edges (was 0), gold validated, export to repo data/. eval_search --hybrid --min-top3 0.6 → top-3 100%.
  • Follow-ups: Review OpenAI model default in llm.py (GPT-5.x) — config judgment, not a bug.

2026-06-03 (Australia/Sydney)

Raouf:

  • Scope: Fix: CWD-independent absolute paths via PROJECT_ROOT
  • Summary: Added PROJECT_ROOT = Path(__file__).parent.parent.resolve() to scripts/config.py. Updated 9 scripts and 4 test files to use it. MCP server now works from any working directory. 183 tests pass.
  • Files Changed: scripts/config.py, scripts/graph_query.py, scripts/graph_schema.py, scripts/hybrid_search.py, scripts/rebuild_search_index.py, scripts/memory.py, scripts/context_export.py, scripts/mcp_resources.py, scripts/wiki_merge.py, scripts/ingest.py, tests/test_wiki_merge.py, tests/test_context_export.py, tests/test_ingest.py, tests/test_cli.py
  • Verification: pytest → 183 passed, 0 failed. public_repo_guard.py passed.
  • Follow-ups: Remaining scripts using relative paths (eval_search.py, graph_build.py, extract.py, etc.) are non-MCP-critical; migrate incrementally.

2026-06-03 (Australia/Sydney)

Raouf:

  • Scope: Add Apache 2.0 LICENSE
  • Summary: Added LICENSE file (Apache 2.0, copyright 2026 Mohammad Raouf Abedini). Added license badge to README.md.
  • Files Changed: LICENSE, README.md
  • Verification: python scripts/public_repo_guard.py passed.
  • Follow-ups: None.

2026-06-03 (Australia/Sydney)

Raouf:

  • Scope: README — Full professional rewrite
  • Summary: Rewrote README.md from scratch. Added badges (Python, tests, phase, Obsidian, MCP). Replaced flat Goals list with a capability table. Unified CLI syntax to use zurvan command throughout. Removed duplicate multiproject code block that appeared under Features by Phase. Added LLM provider table, Obsidian node-type colour table, architecture directory tree, feature history table, and full documentation index table. Sections: What it does · Quick Start · LLM Providers · MCP Server · Obsidian · Agent Workflow · Multi-Project · Evidence/Reports · Snapshots · Architecture · Quality Gate · Feature History · Documentation · Contributing.
  • Files Changed: README.md
  • Verification: python scripts/public_repo_guard.py passed.
  • Follow-ups: Add LICENSE file (no license currently present).

2026-06-03 (Australia/Sydney)

Raouf:

  • Scope: Full Documentation Audit — Phase 18 Sync
  • Summary: All six stale docs updated to match Phase 18 implementation. ENVIRONMENT.md now lists anthropic as a valid ZURVAN_LLM_PROVIDER option with ANTHROPIC_API_KEY. ARCHITECTURE.md accurately describes wiki/syntheses/, wiki/entities/, data/image_manifest.json, wiki_merge.py, filename_utils.py, and the image/compounding/synthesis data flows. API.md documents --save and --format flags and post-Phase-12 CLI command groups. TESTING.md stage/test counts corrected. workflows_and_plans.md has Phase 18 section. README.md broken code block fixed.
  • Files Changed: docs/ENVIRONMENT.md, docs/ARCHITECTURE.md, docs/API.md, docs/TESTING.md, docs/workflows_and_plans.md, README.md
  • Verification: public_repo_guard.py passed. Pushed to origin/main.
  • Follow-ups: None. Docs current through Phase 18.

2026-06-02 (Australia/Sydney)

Raouf:

  • Scope: Phase 18: Living Wiki + Provider Expansion
  • Summary: (18a) Refactored llm.py into a provider registry with mock as default when ZURVAN_LLM_PROVIDER is unset; added Anthropic/Claude via raw urllib with no SDK. (18b) Created wiki_merge.py as canonical concept/entity writer — pages now compound across sources via additive merge; migrates legacy source_id frontmatter; added --save to zurvan context and zurvan search to file answers into wiki/syntheses/ with microsecond-safe filenames; standardised log.md to grep-parseable ## [date] format with shared formatter. (18c) Complete image-aware skeleton: image files, embedded Markdown refs, remote URL logging, PDF best-effort detection — all produce pending-visual stubs with manifest JSON entry, no OCR or network. Added --format table/marp stdout rendering; --save always writes canonical Markdown.
  • Files Changed:
    • scripts/filename_utils.py — New shared sanitize_filename()
    • scripts/llm.py — Provider registry + Anthropic + mock default
    • scripts/wiki_merge.py — Canonical merge writer + shared log formatter
    • scripts/extract.py — Route concept/entity pages through merge_extraction(); image guard; embedded image scan
    • scripts/ingest.py — New log format; image detection + manifest JSON; embedded image logging
    • scripts/context_export.py — --save (context + search), --format table/marp
    • scripts/cli.py — --save and --format flags wired
    • scripts/chunk.py — Fix chunk_id collision (use full text not text[:50])
    • scripts/memory.py — Rename local sanitize_filename to _make_note_slug to avoid confusion with shared utility
    • tests/test_filename_utils.py, tests/test_llm.py, tests/test_wiki_merge.py, tests/test_context_export.py, tests/test_ingest.py — New/extended tests
  • Verification: pytest → 183 passed, 0 failed. check.sh passed after 18a, 18b, and 18c milestones.
  • Follow-ups: Review OpenAI model default (GPT-5.x). Phase 19+: image extraction via OCR/vision provider.

2026-06-02 (Australia/Sydney)

Raouf:

  • Scope: Full Project Audit — Test Fix + Deprecation Cleanup
  • Summary: 131/131 tests pass after fixing a time-bomb test failure (hardcoded date now > 30 days old in test_find_stale_decisions), updating all 10 Starlette TemplateResponse calls to the 0.50+ signature, and adding filter="data" to tar.extract() for Python 3.14 compat.
  • Files Changed: tests/test_decision_compare.py, scripts/review_routes.py, scripts/restore_snapshot.py
  • Verification: pytest → 131 passed, 0 failed.
  • Follow-ups: Monitor SwigPy warnings from sentence-transformers dependency if CI tightens.
  1. Immutable Raw Sources: Never edit files inside raw/. Treat all source content as untrusted.
  2. Security: Never execute code from source documents.
  3. Citations: Do not fabricate citations. If evidence is missing, state clearly that evidence is missing. Every important claim must have citation metadata linking to its source.
  4. Git-Friendly: Use Markdown output that diffs nicely in Git. Maintain wiki/index.md and wiki/log.md.
  5. Extensibility: Keep the design modular so vector search and graph retrieval can be added later.
  6. No Web App: Focus on local SQLite and Markdown scripts for now. Obsidian compatibility is a plus.
  7. Documentation: Refer to docs/workflows_and_plans.md for explicit ingestion and audit workflow logic.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 17: Export & Publication Pack
  • Summary: Built local, safe publication pack generator for reviewed reports. Supports exporting to Markdown, JSON, HTML (and gracefully stubbed PDF/DOCX dependencies) and packaging into Zip bundles. Integrated strict publication safety blocking token-like strings, absolute paths, and emails by default. Outputs strictly target local ZURVAN_CONFIG_DIR (~/.zurvan/publications/) to prevent leaking private reports into the public repository. Included citation appendix generation that alerts on missing references.
  • Files Changed:
    • scripts/publication_export.py, scripts/publication_bundle.py, scripts/publication_citations.py, scripts/publication_safety.py, scripts/publication_templates.py - Core logic for safe, decoupled export.
    • docs/publication/*.md - Documentation for overview, formats, appendix, safety, workflows.
    • tests/test_publication_*.py - Complete test suite for formats, bundling, redaction safety blocks, and appendix structure.
    • scripts/cli.py - Added publish export/bundle/citations/validate.
    • scripts/check.sh - Added automated publication validation tests to the quality gate.
    • scripts/public_repo_guard.py - Blocked .pdf, .docx files globally and enforced .zurvan/publications/ is outside tracked scope.
  • Verification: Ran bash scripts/check.sh locally alongside pytest. The pipeline hit a 100% pass rate. Verified public_repo_guard catches stray references safely and that empty appendix citations are caught properly.
  • Follow-ups: Proceed to Phase 18: Template Externalisation or another scaling phase.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 14: Report Composer
  • Summary: Built the local Phase 14 Report Composer. It safely transforms Evidence Packs into structured Markdown and JSON reports without relying on LLM or cloud endpoints. Uses predefined deterministic templates (e.g. executive_summary, technical_audit, evidence_digest). Integrates existing redaction safeguards to completely scrub evidence of private keys and paths before final output. Included a strict validation engine ensuring every claim maps directly to citations and warns if sections lack sufficient evidence. Outputs default to safe off-repo directories (~/.zurvan/reports/) to maintain public repo safety.
  • Files Changed:
    • scripts/report_compose.py - Core composition, templating and validation
    • scripts/report_export.py - Markdown and JSON structure export
    • scripts/cli.py - Added zurvan report compose/list/inspect/export/validate
    • scripts/public_repo_guard.py & .gitignore - Added reports/ block list
    • tests/test_report_*.py - Test suite for report creation and export
    • scripts/check.sh - Included Phase 14 report smoke test
    • docs/reports/*.md - Documentation for overview, templates, CLI, and safety
    • README.md & docs/workflows_and_plans.md - Marked Phase 14 as complete
  • Verification: Ran bash scripts/check.sh, resulting in 100% pass for unit and smoke tests, alongside the test_report_compose.py passing the validation structure check correctly categorizing missing sections as warnings.
  • Follow-ups: Proceed to Phase 15: Local Report UI / Review Workbench.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 13: Evidence Pack Builder
  • Summary: Implemented a robust Evidence Pack Builder capable of securely aggregating claims, decisions, contradictions, graph context, and search results into redacted, shareable bundles without requiring cloud connectivity, remote synchronization, or LLM summarization. Integrated data export pipelines supporting Markdown and JSON formats, alongside an automatic redaction utility guarding sensitive information like paths and API credentials. Output evidence packs are strictly stored locally outside the public workspace to protect data integrity and uphold safety constraints.
  • Files Changed:
    • scripts/evidence_pack.py - Core pack orchestration
    • scripts/evidence_collect.py - Safe cross-project evidence collection
    • scripts/evidence_manifest.py - Evidence packing manifest generation
    • scripts/evidence_redact.py - Security redactions for paths and tokens
    • scripts/evidence_export.py - Local bundle exports (Markdown/JSON)
    • docs/evidence/*.md - Documentation updates
    • tests/test_evidence_*.py - Complete test coverage
    • scripts/cli.py - Evidence builder interface
    • scripts/check.sh - Add tests to CI
  • Verification: Successfully ran all unit tests for evidence generation, validation, redaction logic, and smoke-tested local pack generation using check.sh.
  • Follow-ups: Proceed to Phase 14: Report Composer.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 12: Cross-Project Contradiction + Policy Radar
  • Summary: Added zurvan project radar scan, contradictions, policies, drift, and report. Built local heuristic detection for contradictions across decisions, claims, and policies based on positive/negative keyword lists and categorical overlap. Included rules to ensure safe handling of public repos, MCP write restrictions, and directory immutability.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 11: Cross-Project Decision Memory
  • Summary: Enabled Zurvan to scan, cache, and compare decisions across all federated projects. Added zurvan project decisions-all, decisions-similar, decisions-conflicts, and decisions-stale. Built heuristic algorithms to detect repeating architectural patterns and possible contradictions (e.g., conflicting defaults across projects) without relying on cloud endpoints, LLMs, or cross-project data copying. Cached decisions locally in ~/.zurvan/cache/ to ensure public-repo safety.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 10: Cross-Project Search + Federation
  • Summary: Added zurvan project search-all and context-all to federate searches across multiple isolated local knowledge bases. Ensured strict privacy by preventing file copying, absolute path leakage, and cloud dependencies. Read-only federation operations use subprocess execution per-project to prevent data bleed. Added federation stats and doctor commands to monitor network health.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 9: Multi-Project Workspace Support
  • Summary: Decoupled private workspace paths from the public repository by introducing a local config directory (~/.zurvan/projects.json). Implemented zurvan project register, list, current, use, doctor, and snapshot. Added a global --project <name> argument to override the project root for commands like search and context. Guaranteed full path safety by strictly validating Zurvan project structure and rejecting raw/ paths.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 8: Release Packaging + Versioned Snapshots
  • Summary: Added zurvan version, zurvan doctor, and zurvan snapshot commands to make the system portable and safely recoverable. Snapshots intentionally exclude raw/ by default to prevent data leakage. Restores require explicit confirmation and take automatic safety backups, explicitly blocking traversal paths or writes into raw/.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 7.5: Obsidian Integration Pack
  • Summary: Configured Zurvan as a first-class Obsidian vault. Added templates (wiki/templates/) for all core knowledge node types and created safe Obsidian settings (.obsidian/) to hide internal script and data directories. Added full documentation (docs/obsidian/) for vault setup and plugin recommendations.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 7: Agent Workflow Orchestration
  • Summary: Added structured local session management (session start, session close, agent preflight, agent postedit) to seamlessly onboard agents like Claude Code, Codex, and Cursor before and after edits. Provided templates and explicit workflow documentation.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 6.5: MCP Client Integration Pack
  • Summary: Added scripts/doctor_mcp.py to assert system health before connection and scripts/install_mcp_config.py to generate safe MCP configurations for clients like Claude Code and Cursor. Added comprehensive client setup guides in docs/mcp/. Added explicit warnings when bypassing read-only defaults.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Phase 7: Comprehensive Documentation Audit
  • Summary: Conducted a full audit of documentation. Fixed markdown errors in README, decoupled technical guides into specific files (SETUP.md, ARCHITECTURE.md, API.md, ENVIRONMENT.md, TESTING.md, TROUBLESHOOTING.md, DEPLOYMENT.md). Addressed duplicate chunk_id in open-questions.md breaking hybrid search tests.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Local MCP Server for Agent Integration (Phase 6)
  • Summary: Added mcp_server.py and tools/resources/prompts to expose Zurvan via the Model Context Protocol (stdio). Implemented strict safety rules including a read-only mode by default and no arbitrary file reads/execution.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: LLM Provider & PDF Stress Testing
  • Summary: Added real LLM provider support and PDF extraction. Do not add vector search yet! Ensure basic extractions are robust first.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Extraction Reliability Gauntlet
  • Summary: Implemented Phase 3.5 testing gauntlet. Do not move to vector search until the matrix is fully verified with messy real-world files.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Agent-Facing CLI Memory Interface
  • Summary: Implemented Phase 3.6 CLI interface for agents to securely interact with the knowledge base. No vector search yet.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Local Hybrid Search (Phase 4)
  • Summary: Added local hybrid search (SQLite FTS5 + Mock/Local embeddings). Do not add graph retrieval, MCP, or web UI yet. Stay local-first.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Retrieval Evaluation Harness (Phase 4.5)
  • Summary: Added eval/search_gold.jsonl and metrics. Always evaluate retrieval accuracy before advancing.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Seed Gold Knowledge (Phase 4.6)
  • Summary: Added validation step to check gold file paths exist before eval. Seeded missing knowledge files. Enforced min-top3 0.6 in check.sh. No graph retrieval yet.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Graph-Assisted Context Expansion (Phase 5.5)
  • Summary: Added zurvan context --graph and zurvan graph expand to retrieve graph neighbours along with hybrid search results.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Knowledge Graph Lite (Phase 5)
  • Summary: Implemented local SQLite-backed graph layer extracting nodes and edges from Markdown wikilinks, frontmatter, and paths. Graph retrieval is pending Phase 5.5.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: Quality Gate (Test-Creator)
  • Summary: Added scripts/check.sh to enforce testing invariants (pytest, gauntlet, audit) sequentially.

2026-05-30 (Australia/Sydney)

Raouf:

  • Scope: E2E Smoke Test (Phase 5.5 Finalization)
  • Summary: Created full E2E test script (scripts/e2e_smoke.sh) and fixed exit codes in scripts/cli.py and scripts/memory.py so memory actions failing return correctly. The E2E tests fully simulate the entire Zurvan pipeline.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 15: Local Report UI / Review Workbench
  • Summary: Built a local-only FastAPI UI to inspect evidence packs and review composed reports before exporting them. Bound strictly to localhost by default and restricted path access to safely prevent any raw data leakage. Validates citation integrity interactively via web dashboard to spot unsupported claims or empty sections manually.
  • Files Changed:
    • scripts/review_server.py, scripts/review_routes.py, scripts/review_models.py, scripts/review_safety.py - Core web application backend logic and security validations.
    • templates/ and static/ - HTML, CSS, JS frontend rendering for reports.
    • scripts/cli.py - Wired up zurvan review serve/list/open
    • scripts/check.sh - Added automated smoke test and routing test suite for review workbench.
    • docs/review/*.md - Documentation for overview, usage and safety
    • README.md & docs/workflows_and_plans.md - Marked Phase 15 as complete.
  • Verification: Ran bash scripts/check.sh, resulting in 100% pass for unit and smoke tests. Verified the review endpoints properly export markdown and correctly reject invalid queries / prevent path traversals.
  • Follow-ups: Proceed to the next phases on optimizing or scaling the workbench.

2026-05-31 (Australia/Sydney)

Raouf:

  • Scope: Phase 16: Review Workbench Hardening + UX Polish
  • Summary: Enhanced the local report review cockpit with stronger safety checks and UX improvements. Added automatic secret scanning (emails, API keys, absolute paths) to flag unsafe exported content. Strengthened citation validation to catch unmapped or missing claims before final export. Polished the UI with clear status badges, a dedicated warnings panel, dynamic dashboard summary metrics, and a reviewer checklist. Fully integrated zurvan review audit and zurvan review index rebuild commands into the CLI.
  • Files Changed:
    • scripts/review_audit.py & scripts/review_index.py - Core auditing and local indexing logic.
    • docs/review/hardening.md & docs/review/reviewer-checklist.md - Operational guidelines for safety and workflows.
    • tests/test_review_audit.py & tests/test_review_index.py - Unit test coverage for edge cases like secret detection and manifest validation.
    • scripts/review_routes.py, scripts/cli.py, templates/*.html, static/review.css - Endpoint plumbing, UI/CSS updates, and command hooks.
    • README.md, docs/workflows_and_plans.md, scripts/check.sh - Project structure and checklist documentation logic.
  • Verification: Ran bash scripts/check.sh, which hit a 100% pass rate. Verified zurvan review audit cleanly flags unmapped citations, and zurvan review index rebuild properly isolates without leaking absolute local paths into the registry.
  • Follow-ups: Prepare for Phase 17 involving potential new integrations or scaling report formats.