Raouf:
- Scope: Step 2D — RQ1 provenance research checkpoint (freeze before R3)
- Summary: Froze the retrieval/context provenance subsystem as the RQ1 evidentiary baseline now that all three feature branches (`phase-r1-trace-core` → `phase-r1b-provenance-events` → `phase-r1b-followups`) are merged into `main` at `021572e`. Added a single consolidated checkpoint doc (lineage, verified metrics, miss-analysis summary, safe/unsafe claims, limitations, R3 scope) and tagged the commit `rq1-provenance-checkpoint-2026-06-14`. No code/behaviour change. Verified on `main`: 2C frozen pilot recall 86%, 1B enriched re-run recall 79%; both 0% raw leak / 100% hash / 100% completeness / 100% graph presence; full suite 228 passed.
- Files Changed: `docs/evaluation/provenance-rq1-checkpoint-2026-06-14.md` (new). Git tag `rq1-provenance-checkpoint-2026-06-14`.
- Verification: Both gold sets validate; `eval_provenance` re-run confirms 86% / 79% recall as above; `pytest` → 228 passed; `public_repo_guard.py` passed; `git diff --check` clean.
- Follow-ups: Next is R3: opt-in, audit-safe MCP/tool-call + memory/resource provenance (`mcp.tool.requested/allowed/denied/result`, `memory.write[.denied]`, `resource.read[.denied]`). No behaviour changes, no benchmark yet. Phrase completeness as "100% instrumentation coverage over currently implemented events", never "100% complete provenance".
Raouf:
- Scope: Phase R1B follow-ups — recall miss-analysis (#2) + scorer negative-branch tests (#1)
- Summary: Two post-PR analysis tasks on `phase-r1b-provenance-events`; no R3, and no change to retrieval ranking/scoring/indexing/graph/schema, nor to the frozen 2C pilot or 1B real-run traces. (#2) Categorised every missed expected source behind the 79% recall: 6 missed links / 23 across 4 of 12 queries. Read-only finding: none is a missing, unindexed, or sub-threshold source; all 6 rank below the query's `limit`. Breakdown: 2 cutoff near-misses (recoverable, one compounded by single-source chunk domination taking 3/5 slots), 2 annotation/query-design mismatches (retriever arguably correct → 79% understates quality), 1 over-broad gold on an ambiguous query, 1 genuine lexical gap (no FTS stemming: query "citations" ≠ heading "Citation"). (#1) Added two scorer negative-branch tests proving the conditional discriminates: `retrieval.fusion` is not demanded for a keyword-only gold, and graph is not scored for a non-graph gold, while the same fusion-/graph-less trace fails when those events are expected. Verified by mutation (made the scorer unconditional → both tests red → reverted).
- Files Changed: `docs/evaluation/provenance-real-run-1b-miss-analysis-2026-06-14.md` + `.json` (new, analysis only), `tests/test_eval_provenance.py` (+2 negative-branch tests). No production code changed.
- Verification: `pytest` → 228 passed (+2). New tests proven to bite via scorer mutation (red), then green after `git checkout`. Frozen 1B gold re-scores 79%/100%/100% unchanged. `public_repo_guard.py` passed; `git diff --check` clean; `compileall` OK.
- Follow-ups: Stemming/Porter tokenizer (or plural query expansion) would close the one genuine lexical miss; source-dedupe before budgeting would recover the cutoff near-misses. Both are out of scope here (no ranking/indexing change). Re-annotate `real-evidence-report-02` (gold expects review docs; query points at reports/evidence). R3 still frozen.
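
Illustrative sketch of the conditional scoring that the two negative-branch tests pin down. This is not the code in `eval_provenance.py`: the function names, the `retrieval_mode` gold field, and the base event set are assumptions made for the example; only `retrieval.fusion`, `graph_context`, and `expect_graph_context` come from the entries in this log.

```python
# Hypothetical sketch: the set of required events depends on the gold case,
# so a keyword-only or non-graph gold is not penalised for absent events.
BASE_EVENTS = {"retrieval.query", "retrieval.result", "context.assembled"}


def required_events(gold: dict) -> set:
    """Return the events a trace must contain for this gold case."""
    required = set(BASE_EVENTS)
    if gold.get("retrieval_mode") == "hybrid":  # assumed field name
        required.add("retrieval.fusion")
    if gold.get("expect_graph_context"):
        required.add("graph_context")
    return required


def completeness(gold: dict, trace_events: set) -> float:
    """Fraction of required events that are present in the trace."""
    required = required_events(gold)
    return len(required & trace_events) / len(required)
```

With this shape, the same fusion-less trace scores 1.0 against a keyword-only gold and below 1.0 against a hybrid gold, which is the discrimination the mutation check relies on.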
Raouf:
- Scope: Phase R1B - provenance event enrichment (`phase-r1b-provenance-events`)
- Summary: Raised the RQ1 provenance ceiling from a 3-event subset toward audit completeness. Added one genuinely new event type, `retrieval.fusion` (records existing hybrid weights `{fts:0.6, embedding:0.4}` + per-chunk ranks; observe-only, hybrid only), and populated the existing `context.assembled.dropped` with real `budget` drops + explicit `dropped_reason: no_dropped_context` for the empty case. Did not add `graph.expand`: the `graph_context` event already covers `--graph` expansion and is scored via the existing `expect_graph_context` gold flag (a parallel event would fragment the model). Fixed a retrieval confound found during the re-run: derived trace mirrors (`wiki/traces/*.md`) were being indexed and polluting results with the query's own terms; they are now excluded from indexing + keyword search. Re-ran the frozen 12-query Step 2C pilot as a separate artifact (new `…-r1b` trace IDs + `provenance_real_gold_1b.jsonl`), leaving the original 2C pilot frozen.
- Files Changed: `scripts/trace_schema.py` (whitelist `retrieval.fusion`), `scripts/context_export.py` (`_fusion_payload`, `_apply_budget`, `_assembled_context_payload` dropped + reason, fusion emit, trace-mirror skip), `scripts/chunk.py` (exclude `wiki/traces/`), `tests/test_provenance_enrichment.py` (new, 7), `tests/test_chunk.py` (+1), `tests/test_trace_retrieval_integration.py` (new event shape), `eval/provenance_real_gold_1b.jsonl` + `data/traces/…-r1b*` + `docs/evaluation/provenance-real-run-1b-2026-06-14.md` (new)
- Verification: `pytest` → 226 passed (+8). Enriched re-run: validate 12/12, replay 12/12, `retrieval.fusion` 12/12, genuine `dropped` on 7/7 context traces (reason `budget`), `provenance_completeness` 100% over the richer pipeline, `graph_context_presence` 100%. Recall 86%→79% (corpus drift, ranking-neutral; frozen 2C pilot still scores 86%). `public_repo_guard.py` passed; `git diff --check` clean; `compileall` OK.
- Follow-ups: All-hybrid frozen set can't exercise the "fusion legitimately absent" path; a future keyword-only query would test the conditional denominator. `dropped` reasons limited to `budget` (dedupe/threshold future). R3 MCP tracing still frozen.
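
Illustrative sketch of the two payload shapes described above. The function names and the `chunks`, `kept`, `chunk_id`, `fts_rank`, and `embedding_rank` fields are hypothetical and do not mirror `_fusion_payload` or `_apply_budget`; only the weights, the `dropped` list, the `budget` reason, and `dropped_reason: no_dropped_context` are taken from this entry.

```python
FUSION_WEIGHTS = {"fts": 0.6, "embedding": 0.4}  # weights quoted in the entry


def fusion_payload(fts_ranked: list, emb_ranked: list) -> dict:
    """Record per-chunk ranks from both retrievers without re-ranking."""
    fts_rank = {cid: i + 1 for i, cid in enumerate(fts_ranked)}
    emb_rank = {cid: i + 1 for i, cid in enumerate(emb_ranked)}
    return {
        "weights": dict(FUSION_WEIGHTS),
        "chunks": [
            {
                "chunk_id": cid,
                "fts_rank": fts_rank.get(cid),        # None if FTS missed it
                "embedding_rank": emb_rank.get(cid),  # None if embeddings missed it
            }
            for cid in sorted(set(fts_rank) | set(emb_rank))
        ],
    }


def apply_budget(chunk_ids: list, limit: int) -> dict:
    """Keep the first `limit` chunks and record the rest as budget drops."""
    payload = {
        "kept": chunk_ids[:limit],
        "dropped": [{"chunk_id": c, "reason": "budget"} for c in chunk_ids[limit:]],
    }
    if not payload["dropped"]:
        # Explicit marker so an empty list is distinguishable from "not recorded".
        payload["dropped_reason"] = "no_dropped_context"
    return payload
```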
Raouf:
- Scope: Phase R2 Step 2C — real-corpus provenance pilot
- Summary: Added an auditable real-corpus pilot before R3. Committed `eval/provenance_real_queries.jsonl` first as a frozen 12-query set (48a8c27) with manual expected-source annotations from wiki/docs. Then generated traces using existing `search --trace`, `context --trace`, and selected `context --graph --trace`; linked them in `eval/provenance_real_gold.jsonl`; and documented the run in `docs/evaluation/provenance-real-run-2026-06-14.md`. Pilot metrics: 0% raw leak rate, 100% hash integrity, 12/12 trace validate/replay, 86% expected source recall, 100% built-scope provenance completeness, 100% graph context presence. No R3, MCP tracing, ranking, graph behavior, schema, or evaluator scoring changes.
- Files Changed: `eval/provenance_real_queries.jsonl`, `eval/provenance_real_gold.jsonl`, `data/traces/trace-20260614T171000Z-real*.json`, `wiki/traces/trace-20260614T171000Z-real*.md`, `docs/evaluation/provenance-real-run-2026-06-14.md`, `docs/evaluation/provenance.md`, `docs/workflows_and_plans.md`, `eval/README.md`
- Verification: Search/graph indexes rebuilt; all 12 queries traced successfully; real gold validation passed; real evaluation reported 12 cases, 0% raw leaks, 100% hash integrity, 86% expected source recall, 100% provenance completeness, 100% graph context presence; trace validate/replay rate 12/12; compileall passed; provenance tests `7 passed`; full suite `218 passed` (2 dependency warnings); `public_repo_guard.py` passed; `git diff --check` passed.
- Follow-ups: Treat as pilot evidence, not benchmark evidence; expand real-world gold and annotation review later; keep R3 frozen until merge/review.
Raouf:
- Scope: Phase R2 Step 2B — stronger provenance gold set
- Summary: Expanded provenance evaluation beyond one controlled fixture. Passing baseline now has six cases: `search --trace`, `context --trace`, `context --graph --trace`, legacy coarse `retrieval`, the original Step 2 controlled fixture, and a stale/superseded note case labelled for later. Added isolated negative/failure gold files for raw-path invariant failure, incomplete trace completeness failure, and missing expected source recall failure. No MCP tracing, retrieval ranking, graph behavior, or schema changes.
- Files Changed: `eval/provenance_gold*.jsonl`, `data/traces/trace-20260614T16170*.json`, `tests/test_eval_provenance.py`, `README.md`, `docs/TESTING.md`, `docs/evaluation/provenance.md`, `docs/workflows_and_plans.md`, `eval/README.md`
- Verification: Step 2B red tests failed on missing gold cases/files; compileall passed; focused provenance+trace tests `27 passed`; full suite `218 passed` (2 dependency warnings); positive gold validates and scores 100% across built-scope metrics on 6 cases; negative fixtures fail as expected; all Step 2B trace fixtures validate; `public_repo_guard.py` passed; `git diff --check` passed.
- Follow-ups: Keep R3 frozen; expand to larger real-world gold before making real-world provenance completeness claims.
Raouf:
- Scope: Phase R2 Step 2 — provenance evaluation harness
- Summary: Added local `eval_provenance.py` plus `eval/provenance_gold.jsonl` to evaluate saved trace provenance. Hard invariants (`raw_leak_rate=0%`, `hash_integrity_rate=100%`) run before graded metrics (`expected_source_recall`, `provenance_completeness`, `graph_context_presence`). Gold schema includes optional `expected_chunk_ids` for future claim-to-chunk faithfulness. Added CLI command `zurvan eval provenance`.
- Files Changed: `scripts/eval_provenance.py`, `scripts/cli.py`, `eval/provenance_gold.jsonl`, `data/traces/trace-20260614T151617Z-prov0001.json`, `tests/test_eval_provenance.py`, `docs/evaluation/provenance.md`, `eval/README.md`, `docs/API.md`, `docs/TESTING.md`, `docs/workflows_and_plans.md`, `README.md`
- Verification: TDD red run failed on missing module; CLI red run failed on missing action; compileall passed; focused provenance+trace tests `25 passed`; full suite `216 passed` (2 dependency warnings); `eval_provenance.py --validate` passed; CLI `eval provenance` returned 100% built-scope metrics; `public_repo_guard.py` passed; `git diff --check` passed.
- Follow-ups: R3 remains frozen; `retrieval.fusion` and `graph.expand` are not scored until implemented.
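
Illustrative sketch of the "hard invariants before graded metrics" ordering. The case fields (`raw_leak`, `hash_ok`, `expected_sources`, `retrieved_sources`), the `status` values, and the micro-averaged recall are assumptions for the example, not the schema or scoring of `eval_provenance.py`; the metric names are the ones listed in this entry.

```python
def evaluate(cases: list) -> dict:
    """Check hard invariants first; grade recall only if they hold."""
    if not cases:
        raise ValueError("at least one case is required")
    total = len(cases)
    report = {
        "raw_leak_rate": sum(1 for c in cases if c["raw_leak"]) / total,
        "hash_integrity_rate": sum(1 for c in cases if c["hash_ok"]) / total,
    }
    # Hard invariants: any leak or hash mismatch stops the run before grading.
    if report["raw_leak_rate"] != 0.0 or report["hash_integrity_rate"] != 1.0:
        report["status"] = "invariant_failed"
        return report
    expected = sum(len(c["expected_sources"]) for c in cases)
    found = sum(
        len(set(c["expected_sources"]) & set(c["retrieved_sources"]))
        for c in cases
    )
    report["status"] = "ok"
    report["expected_source_recall"] = found / expected if expected else 1.0
    return report
```

Returning early keeps a graded score from being reported next to a violated invariant.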
Raouf:
- Scope: Phase R2 retrieval trace — Step 0 reconcile + Step 1A granularity
- Summary: Step 0 reconciled stale test counts (`201/10` → reproduced `210/19`) across README/docs/CHANGELOG and R1/R2 audits (commit `44a76f2`). Step 1A added granular, opt-in retrieval provenance events (`retrieval.query`, `retrieval.result`, `context.assembled`) while keeping legacy `retrieval` valid, `schema_version=zurvan.trace.v1`, and the payload-hash rule unchanged (commit `cc87e5d`). No ranking/scoring/fusion/stdout change; tracing opt-in via `--trace`.
- Files Changed: `scripts/trace_schema.py`, `scripts/context_export.py`, `tests/test_trace_replay.py`, `tests/test_trace_retrieval_integration.py`, `CHANGELOG.md`, `README.md`, `docs/TESTING.md`, `docs/workflows_and_plans.md`, `docs/audits/phase-r2-retrieval-trace-integration-audit-2026-06-14.md`
- Verification: focused trace suite `20 passed`; full suite `211 passed` (2 dependency warnings); legacy single-`retrieval` replay regression passed; `public_repo_guard.py` passed; `git diff --check` passed on branch.
- Follow-ups: `context.assembled.dropped` always empty (awaits token-budget policy); `retrieval.fusion`/`graph.expand` not yet implemented; branch pushed, NOT merged to `main` (dirty `main` + `wiki/index.md:669`); next is Step 2 `eval_provenance.py`.
Raouf:
- Scope: MCP install — verify Claude Code + add Codex client
- Summary: Claude Code already `✔ Connected` (live `mcp_server.py`, no reinstall needed). Added Codex via `codex mcp add zurvan` (absolute Anaconda python + absolute server path; verified `codex mcp get` + launch smoke-test, 11 tools). Made it reproducible: added a `codex` target to `install_mcp_config.py` (emits `codex mcp add` command + `[mcp_servers.zurvan]` TOML via `sys.executable`) and `docs/mcp/codex.md`.
- Files Changed: `scripts/install_mcp_config.py`, `tests/test_install_mcp_config.py`, `docs/mcp/codex.md` (+ machine `~/.codex/config.toml`)
- Verification: `pytest` → 191 passed (+3). `claude mcp list` → connected. `codex mcp get zurvan` OK. `public_repo_guard.py` passed.
- Follow-ups: None.
Raouf:
- Scope: MCP server — per-argument schema docs + structured output
- Summary: Added `Annotated[..., Field(description=...)]` to every parameter of all 11 MCP tools (per-arg descriptions + bounds: `limit` 1–50, `depth` 1–5, `min_top3` 0–1). Added structured output to `zurvan_graph_stats` via a `GraphStats` TypedDict; FastMCP now emits `outputSchema` + `structuredContent` `{nodes, edges}` plus a JSON text fallback. Text-rich tools (search/context) intentionally kept as curated text.
- Files Changed: `scripts/mcp_server.py`, `scripts/mcp_tools.py`, `tests/test_mcp_tools.py`
- Verification: `pytest` → 188 passed (+1). `e2e_mcp_smoke.py` full pass. Per-arg descriptions confirmed in inputSchema; graph_stats returns structuredContent + outputSchema. `public_repo_guard.py` passed.
- Follow-ups: None.
Raouf:
- Scope: MCP server full audit — LLM usability + correctness fixes
- Summary: All 11 MCP tools had empty descriptions (FastMCP reads the wrapper `__doc__`, not the `tools.*` docstrings), so rewrote `mcp_server.py` with rich model-facing docstrings + `Literal` enums (`remember.type`, `decision.status`, `claim.confidence`) + resource descriptions. Fixed CWD bugs: `is_safe_path`/`resource_file` now anchor to `PROJECT_ROOT`. Eval tools run in-process with stdout capture (was a relative `subprocess` to `python`), which also avoids corrupting the stdio stream. Dropped no-op `depth` from `zurvan_graph_neighbours`; `zurvan_remember` now keeps `type` as a tag; `zurvan_search` returns heading+snippet.
- Files Changed: `scripts/mcp_server.py`, `scripts/mcp_tools.py`, `scripts/mcp_security.py`, `scripts/mcp_resources.py`
- Verification: `pytest` → 187 passed. `e2e_mcp_smoke.py` full pass. Tool descriptions confirmed non-empty; `resource_file` works from `/tmp`; traversal blocked. `public_repo_guard.py` passed.
- Follow-ups: Optional: per-arg `Field(description=...)` and structured JSON output.
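
Illustrative sketch of a path check anchored to the project root instead of the current working directory. The signature is an assumption and is not the `is_safe_path` in `scripts/mcp_security.py`; it only shows why anchoring makes the result CWD-independent and how traversal gets rejected.

```python
from pathlib import Path


def is_safe_path(candidate: str, root: Path) -> bool:
    """True only if `candidate`, resolved against `root`, stays inside `root`."""
    root = root.resolve()
    # Joining with root (not the CWD) gives the same answer from any directory;
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

An absolute `candidate` replaces `root` in the join, so it is rejected unless it already points inside the root.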
Raouf:
- Scope: Full audit — update OpenAI default model (GPT-5.x) + temperature safety
- Summary: Verified current OpenAI model naming against official docs; bumped openai default `gpt-4o` → `gpt-5.4-mini` (override via `ZURVAN_LLM_MODEL`). Added `_openai_supports_custom_temperature()` so the `temperature` field is omitted for GPT-5 family and o-series models (they 400 on non-default temperature) but still sent for legacy models. Updated `docs/ENVIRONMENT.md`.
- Files Changed: `scripts/llm.py`, `tests/test_llm.py`, `docs/ENVIRONMENT.md`
- Verification: `pytest` → 187 passed (+4 new). `public_repo_guard.py` passed.
- Follow-ups: None outstanding from the documented list.
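
Illustrative sketch of the temperature guard. The name-prefix heuristic and the `build_payload` helper are assumptions for the example; the actual matching rule in `_openai_supports_custom_temperature()` is not recorded in this log.

```python
import re


def supports_custom_temperature(model: str) -> bool:
    """Assumed heuristic: GPT-5 family and o-series names reject custom values."""
    name = model.lower()
    if name.startswith("gpt-5"):
        return False
    # o-series names are assumed to start with "o" plus a digit, e.g. "o3-mini".
    return re.match(r"o\d", name) is None


def build_payload(model: str, messages: list, temperature: float) -> dict:
    """Omit `temperature` entirely when the model would reject it."""
    payload = {"model": model, "messages": messages}
    if supports_custom_temperature(model):
        payload["temperature"] = temperature
    return payload
```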
Raouf:
- Scope: Full audit — finish CWD-independence for remaining non-MCP scripts
- Summary: Closed the documented PROJECT_ROOT follow-up. `graph_build.py` now walks `PROJECT_ROOT` (was `os.walk('.')`, which produced an empty graph from any other CWD) while keeping node identity repo-relative; `get_file_content()` reads via absolute paths. `graph_export.py` default export paths are absolute with a guarded `makedirs`. `eval_search.py` resolves the gold file, expected-path checks, and fallback globs against `PROJECT_ROOT` via a new `_resolve()` helper. `snapshot.py` and `public_repo_guard.py` confirmed correct by design (own `ROOT` join / `git ls-files`-relative).
- Files Changed: `scripts/graph_build.py`, `scripts/graph_export.py`, `scripts/eval_search.py`
- Verification: `pytest` → 183 passed. Ran graph build / eval / export from `/tmp`: 830 nodes / 746 edges (was 0), gold validated, export to repo `data/`. `eval_search --hybrid --min-top3 0.6` → top-3 100%.
- Follow-ups: Review OpenAI model default in `llm.py` (GPT-5.x); config judgment, not a bug.
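
Illustrative sketch of the two ideas in this entry: resolving relative paths against a fixed root, and walking the root while keeping node identity repo-relative. Both helpers are hypothetical and take the root as a parameter; they are not the `_resolve()` in `eval_search.py` or the walker in `graph_build.py`.

```python
from pathlib import Path


def resolve(path: str, root: Path) -> Path:
    """Leave absolute paths alone; anchor relative ones to `root`, not the CWD."""
    p = Path(path)
    return p if p.is_absolute() else root / p


def markdown_node_ids(root: Path) -> list:
    """Walk `root` by absolute path but report repo-relative POSIX identifiers."""
    root = root.resolve()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
```

Keeping identifiers relative means a graph built from `/tmp` and one built from the repo root contain the same node names.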
Raouf:
- Scope: Fix: CWD-independent absolute paths via PROJECT_ROOT
- Summary: Added `PROJECT_ROOT = Path(__file__).parent.parent.resolve()` to `scripts/config.py`. Updated 9 scripts and 4 test files to use it. MCP server now works from any working directory. 183 tests pass.
- Files Changed: `scripts/config.py`, `scripts/graph_query.py`, `scripts/graph_schema.py`, `scripts/hybrid_search.py`, `scripts/rebuild_search_index.py`, `scripts/memory.py`, `scripts/context_export.py`, `scripts/mcp_resources.py`, `scripts/wiki_merge.py`, `scripts/ingest.py`, `tests/test_wiki_merge.py`, `tests/test_context_export.py`, `tests/test_ingest.py`, `tests/test_cli.py`
- Verification: `pytest` → 183 passed, 0 failed. `public_repo_guard.py` passed.
- Follow-ups: Remaining scripts using relative paths (`eval_search.py`, `graph_build.py`, `extract.py`, etc.) are non-MCP-critical; migrate incrementally.
Raouf:
- Scope: Add Apache 2.0 LICENSE
- Summary: Added LICENSE file (Apache 2.0, copyright 2026 Mohammad Raouf Abedini). Added license badge to README.md.
- Files Changed: `LICENSE`, `README.md`
- Verification: `python scripts/public_repo_guard.py` passed.
- Follow-ups: None.
Raouf:
- Scope: README — Full professional rewrite
- Summary: Rewrote README.md from scratch. Added badges (Python, tests, phase, Obsidian, MCP). Replaced flat Goals list with a capability table. Unified CLI syntax to use `zurvan` command throughout. Removed duplicate multiproject code block that appeared under Features by Phase. Added LLM provider table, Obsidian node-type colour table, architecture directory tree, feature history table, and full documentation index table. Sections: What it does · Quick Start · LLM Providers · MCP Server · Obsidian · Agent Workflow · Multi-Project · Evidence/Reports · Snapshots · Architecture · Quality Gate · Feature History · Documentation · Contributing.
- Files Changed: `README.md`
- Verification: `python scripts/public_repo_guard.py` passed.
- Follow-ups: Add LICENSE file (no license currently present).
Raouf:
- Scope: Full Documentation Audit — Phase 18 Sync
- Summary: All six stale docs updated to match Phase 18 implementation. ENVIRONMENT.md now lists anthropic as a valid ZURVAN_LLM_PROVIDER option with ANTHROPIC_API_KEY. ARCHITECTURE.md accurately describes wiki/syntheses/, wiki/entities/, data/image_manifest.json, wiki_merge.py, filename_utils.py, and the image/compounding/synthesis data flows. API.md documents --save and --format flags and post-Phase-12 CLI command groups. TESTING.md stage/test counts corrected. workflows_and_plans.md has Phase 18 section. README.md broken code block fixed.
- Files Changed: `docs/ENVIRONMENT.md`, `docs/ARCHITECTURE.md`, `docs/API.md`, `docs/TESTING.md`, `docs/workflows_and_plans.md`, `README.md`
- Verification: `public_repo_guard.py` passed. Pushed to origin/main.
- Follow-ups: None. Docs current through Phase 18.
Raouf:
- Scope: Phase 18: Living Wiki + Provider Expansion
- Summary: (18a) Refactored llm.py into a provider registry with mock as default when ZURVAN_LLM_PROVIDER is unset; added Anthropic/Claude via raw urllib with no SDK. (18b) Created wiki_merge.py as canonical concept/entity writer — pages now compound across sources via additive merge; migrates legacy source_id frontmatter; added --save to zurvan context and zurvan search to file answers into wiki/syntheses/ with microsecond-safe filenames; standardised log.md to grep-parseable ## [date] format with shared formatter. (18c) Complete image-aware skeleton: image files, embedded Markdown refs, remote URL logging, PDF best-effort detection — all produce pending-visual stubs with manifest JSON entry, no OCR or network. Added --format table/marp stdout rendering; --save always writes canonical Markdown.
- Files Changed:
  - `scripts/filename_utils.py` - New shared sanitize_filename()
  - `scripts/llm.py` - Provider registry + Anthropic + mock default
  - `scripts/wiki_merge.py` - Canonical merge writer + shared log formatter
  - `scripts/extract.py` - Route concept/entity pages through merge_extraction(); image guard; embedded image scan
  - `scripts/ingest.py` - New log format; image detection + manifest JSON; embedded image logging
  - `scripts/context_export.py` - --save (context + search), --format table/marp
  - `scripts/cli.py` - --save and --format flags wired
  - `scripts/chunk.py` - Fix chunk_id collision (use full text not text[:50])
  - `scripts/memory.py` - Rename local sanitize_filename to _make_note_slug to avoid confusion with shared utility
  - `tests/test_filename_utils.py`, `tests/test_llm.py`, `tests/test_wiki_merge.py`, `tests/test_context_export.py`, `tests/test_ingest.py` - New/extended tests
- Verification: pytest → 183 passed, 0 failed. check.sh passed after 18a, 18b, and 18c milestones.
- Follow-ups: Review OpenAI model default (GPT-5.x). Phase 19+: image extraction via OCR/vision provider.
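
Illustrative sketch of the chunk_id collision noted for `scripts/chunk.py`. Both functions, the SHA-256 choice, and the 16-character truncation are assumptions for the example; the log only records that the fix hashes the full text instead of `text[:50]`.

```python
import hashlib


def chunk_id_prefix(source: str, text: str) -> str:
    """Old-style id: only the first 50 characters of the chunk are hashed."""
    return hashlib.sha256(f"{source}\n{text[:50]}".encode("utf-8")).hexdigest()[:16]


def chunk_id_full(source: str, text: str) -> str:
    """Fixed id: the whole chunk text contributes to the hash."""
    return hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()[:16]


# Two chunks that share a long boilerplate opening collide under the old scheme.
shared = "This section documents the evaluation procedure for "
a, b = shared + "search.", shared + "provenance."
assert chunk_id_prefix("doc.md", a) == chunk_id_prefix("doc.md", b)
assert chunk_id_full("doc.md", a) != chunk_id_full("doc.md", b)
```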
Raouf:
- Scope: Full Project Audit — Test Fix + Deprecation Cleanup
- Summary: 131/131 tests pass after fixing a time-bomb test failure (hardcoded date now > 30 days old in `test_find_stale_decisions`), updating all 10 Starlette `TemplateResponse` calls to the 0.50+ signature, and adding `filter="data"` to `tar.extract()` for Python 3.14 compat.
- Files Changed: `tests/test_decision_compare.py`, `scripts/review_routes.py`, `scripts/restore_snapshot.py`
- Verification: `pytest` → 131 passed, 0 failed.
- Follow-ups: Monitor SwigPy warnings from sentence-transformers dependency if CI tightens.
- Immutable Raw Sources: Never edit files inside `raw/`. Treat all source content as untrusted.
- Security: Never execute code from source documents.
- Citations: Do not fabricate citations. If evidence is missing, state clearly that evidence is missing. Every important claim must have citation metadata linking to its source.
- Git-Friendly: Use Markdown output that diffs nicely in Git. Maintain `wiki/index.md` and `wiki/log.md`.
- Extensibility: Keep the design modular so vector search and graph retrieval can be added later.
- No Web App: Focus on local SQLite and Markdown scripts for now. Obsidian compatibility is a plus.
- Documentation: Refer to `docs/workflows_and_plans.md` for explicit ingestion and audit workflow logic.
Raouf:
- Scope: Phase 17: Export & Publication Pack
- Summary: Built local, safe publication pack generator for reviewed reports. Supports exporting to Markdown, JSON, HTML (and gracefully stubbed PDF/DOCX dependencies) and packaging into Zip bundles. Integrated strict publication safety blocking token-like strings, absolute paths, and emails by default. Outputs strictly target local ZURVAN_CONFIG_DIR (`~/.zurvan/publications/`) to prevent leaking private reports into the public repository. Included citation appendix generation that alerts on missing references.
- Files Changed:
  - `scripts/publication_export.py`, `scripts/publication_bundle.py`, `scripts/publication_citations.py`, `scripts/publication_safety.py`, `scripts/publication_templates.py` - Core logic for safe, decoupled export.
  - `docs/publication/*.md` - Documentation for overview, formats, appendix, safety, workflows.
  - `tests/test_publication_*.py` - Complete test suite for formats, bundling, redaction safety blocks, and appendix structure.
  - `scripts/cli.py` - Added `publish export/bundle/citations/validate`.
  - `scripts/check.sh` - Added automated publication validation tests to the quality gate.
  - `scripts/public_repo_guard.py` - Blocked `.pdf`, `.docx` files globally and enforced `.zurvan/publications/` is outside tracked scope.
- Verification: Ran `bash scripts/check.sh` locally alongside `pytest`. The pipeline hit a 100% pass rate. Verified `public_repo_guard` catches stray references safely and that empty appendix citations are caught properly.
- Follow-ups: Proceed to Phase 18: Template Externalisation or another scaling phase.
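
Illustrative sketch of a publication safety scan for the three categories named above (token-like strings, absolute paths, emails). The regular expressions and the `scan` helper are assumptions for the example and are deliberately narrow; they are not the rules in `scripts/publication_safety.py`.

```python
import re

# Hypothetical patterns; a real scanner would need broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "absolute_path": re.compile(r"(?<![\w.])/(?:Users|home)/\S+|\b[A-Za-z]:\\\S+"),
    "token": re.compile(r"\b(?:sk|ghp)[-_][A-Za-z0-9_-]{16,}\b"),
}


def scan(text: str) -> list:
    """Return the sorted names of every pattern that matches `text`."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

An export step could refuse to write output whenever `scan()` returns a non-empty list, which matches the "block by default" behaviour described in the summary.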
Raouf:
- Scope: Phase 14: Report Composer
- Summary: Built the local Phase 14 Report Composer. It safely transforms Evidence Packs into structured Markdown and JSON reports without relying on LLM or cloud endpoints. Uses predefined deterministic templates (e.g. executive_summary, technical_audit, evidence_digest). Integrates existing redaction safeguards to completely scrub evidence of private keys and paths before final output. Included a strict validation engine ensuring every claim maps directly to citations and warns if sections lack sufficient evidence. Outputs default to safe off-repo directories (`~/.zurvan/reports/`) to maintain public repo safety.
- Files Changed:
  - `scripts/report_compose.py` - Core composition, templating and validation
  - `scripts/report_export.py` - Markdown and JSON structure export
  - `scripts/cli.py` - Added `zurvan report compose/list/inspect/export/validate`
  - `scripts/public_repo_guard.py` & `.gitignore` - Added `reports/` block list
  - `tests/test_report_*.py` - Test suite for report creation and export
  - `scripts/check.sh` - Included Phase 14 report smoke test
  - `docs/reports/*.md` - Documentation for overview, templates, CLI, and safety
  - `README.md` & `docs/workflows_and_plans.md` - Marked Phase 14 as complete
- Verification: Ran `bash scripts/check.sh`, resulting in 100% pass for unit and smoke tests; `test_report_compose.py` passes the validation structure check and correctly categorizes missing sections as warnings.
- Follow-ups: Proceed to Phase 15: Local Report UI / Review Workbench.
Raouf:
- Scope: Phase 13: Evidence Pack Builder
- Summary: Implemented a robust Evidence Pack Builder capable of securely aggregating claims, decisions, contradictions, graph context, and search results into redacted, shareable bundles without requiring cloud connectivity, remote synchronization, or LLM summarization. Integrated data export pipelines supporting Markdown and JSON formats, alongside an automatic redaction utility guarding sensitive information like paths and API credentials. Output evidence packs are strictly stored locally outside the public workspace to protect data integrity and uphold safety constraints.
- Files Changed:
  - `scripts/evidence_pack.py` - Core pack orchestration
  - `scripts/evidence_collect.py` - Safe cross-project evidence collection
  - `scripts/evidence_manifest.py` - Evidence packing manifest generation
  - `scripts/evidence_redact.py` - Security redactions for paths and tokens
  - `scripts/evidence_export.py` - Local bundle exports (Markdown/JSON)
  - `docs/evidence/*.md` - Documentation updates
  - `tests/test_evidence_*.py` - Complete test coverage
  - `scripts/cli.py` - Evidence builder interface
  - `scripts/check.sh` - Add tests to CI
- Verification: Successfully ran all unit tests for evidence generation, validation, redaction logic, and smoke-tested local pack generation using `check.sh`.
- Follow-ups: Proceed to Phase 14: Report Composer.
Raouf:
- Scope: Phase 12: Cross-Project Contradiction + Policy Radar
- Summary: Added `zurvan project radar scan`, `contradictions`, `policies`, `drift`, and `report`. Built local heuristic detection for contradictions across decisions, claims, and policies based on positive/negative keyword lists and categorical overlap. Included rules to ensure safe handling of public repos, MCP write restrictions, and directory immutability.
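
Illustrative sketch of a keyword-polarity contradiction heuristic of the kind described above. The keyword lists, the `category` and `text` record fields, and both function names are invented for the example and do not reflect the radar's actual rules.

```python
import re

# Hypothetical keyword lists.
POSITIVE = {"always", "must", "enable", "allow", "require"}
NEGATIVE = {"never", "disable", "forbid", "avoid", "block"}


def polarity(text: str) -> int:
    """+1 if positive keywords dominate, -1 if negative ones do, else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    return (pos > neg) - (pos < neg)


def possible_contradiction(a: dict, b: dict) -> bool:
    """Flag records in the same category whose polarities point opposite ways."""
    return a["category"] == b["category"] and polarity(a["text"]) * polarity(b["text"]) < 0
```

A heuristic like this only nominates candidates for review; mixed-polarity text such as "never enable" scores 0 here and is not flagged.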
Raouf:
- Scope: Phase 11: Cross-Project Decision Memory
- Summary: Enabled Zurvan to scan, cache, and compare decisions across all federated projects. Added `zurvan project decisions-all`, `decisions-similar`, `decisions-conflicts`, and `decisions-stale`. Built heuristic algorithms to detect repeating architectural patterns and possible contradictions (e.g., conflicting defaults across projects) without relying on cloud endpoints, LLMs, or cross-project data copying. Cached decisions locally in `~/.zurvan/cache/` to ensure public-repo safety.
Raouf:
- Scope: Phase 10: Cross-Project Search + Federation
- Summary: Added `zurvan project search-all` and `context-all` to federate searches across multiple isolated local knowledge bases. Ensured strict privacy by preventing file copying, absolute path leakage, and cloud dependencies. Read-only federation operations use subprocess execution per-project to prevent data bleed. Added `federation stats` and `doctor` commands to monitor network health.
Raouf:
- Scope: Phase 9: Multi-Project Workspace Support
- Summary: Decoupled private workspace paths from the public repository by introducing a local config directory (`~/.zurvan/projects.json`). Implemented `zurvan project register`, `list`, `current`, `use`, `doctor`, and `snapshot`. Added a global `--project <name>` argument to override the project root for commands like `search` and `context`. Guaranteed full path safety by strictly validating Zurvan project structure and rejecting `raw/` paths.
Raouf:
- Scope: Phase 8: Release Packaging + Versioned Snapshots
- Summary: Added `zurvan version`, `zurvan doctor`, and `zurvan snapshot` commands to make the system portable and safely recoverable. Snapshots intentionally exclude `raw/` by default to prevent data leakage. Restores require explicit confirmation and take automatic safety backups, explicitly blocking traversal paths or writes into `raw/`.
Raouf:
- Scope: Phase 7.5: Obsidian Integration Pack
- Summary: Configured Zurvan as a first-class Obsidian vault. Added templates (`wiki/templates/`) for all core knowledge node types and created safe Obsidian settings (`.obsidian/`) to hide internal script and data directories. Added full documentation (`docs/obsidian/`) for vault setup and plugin recommendations.
Raouf:
- Scope: Phase 7: Agent Workflow Orchestration
- Summary: Added structured local session management (`session start`, `session close`, `agent preflight`, `agent postedit`) to seamlessly onboard agents like Claude Code, Codex, and Cursor before and after edits. Provided templates and explicit workflow documentation.
Raouf:
- Scope: Phase 6.5: MCP Client Integration Pack
- Summary: Added `scripts/doctor_mcp.py` to assert system health before connection and `scripts/install_mcp_config.py` to generate safe MCP configurations for clients like Claude Code and Cursor. Added comprehensive client setup guides in `docs/mcp/`. Added explicit warnings when bypassing read-only defaults.
Raouf:
- Scope: Phase 7: Comprehensive Documentation Audit
- Summary: Conducted a full audit of documentation. Fixed markdown errors in README and decoupled technical guides into specific files (`SETUP.md`, `ARCHITECTURE.md`, `API.md`, `ENVIRONMENT.md`, `TESTING.md`, `TROUBLESHOOTING.md`, `DEPLOYMENT.md`). Fixed a duplicate chunk_id in `open-questions.md` that was breaking hybrid search tests.
Raouf:
- Scope: Local MCP Server for Agent Integration (Phase 6)
- Summary: Added `mcp_server.py` and tools/resources/prompts to expose Zurvan via the Model Context Protocol (stdio). Implemented strict safety rules including a read-only mode by default and no arbitrary file reads/execution.
Raouf:
- Scope: LLM Provider & PDF Stress Testing
- Summary: Added real LLM provider support and PDF extraction. Do not add vector search yet! Ensure basic extractions are robust first.
Raouf:
- Scope: Extraction Reliability Gauntlet
- Summary: Implemented Phase 3.5 testing gauntlet. Do not move to vector search until the matrix is fully verified with messy real-world files.
Raouf:
- Scope: Agent-Facing CLI Memory Interface
- Summary: Implemented Phase 3.6 CLI interface for agents to securely interact with the knowledge base. No vector search yet.
Raouf:
- Scope: Local Hybrid Search (Phase 4)
- Summary: Added local hybrid search (SQLite FTS5 + Mock/Local embeddings). Do not add graph retrieval, MCP, or web UI yet. Stay local-first.
Raouf:
- Scope: Retrieval Evaluation Harness (Phase 4.5)
- Summary: Added `eval/search_gold.jsonl` and metrics. Always evaluate retrieval accuracy before advancing.
Raouf:
- Scope: Seed Gold Knowledge (Phase 4.6)
- Summary: Added validation step to check gold file paths exist before eval. Seeded missing knowledge files. Enforced `min-top3 0.6` in `check.sh`. No graph retrieval yet.
Raouf:
- Scope: Graph-Assisted Context Expansion (Phase 5.5)
- Summary: Added `zurvan context --graph` and `zurvan graph expand` to retrieve graph neighbours along with hybrid search results.
Raouf:
- Scope: Knowledge Graph Lite (Phase 5)
- Summary: Implemented local SQLite-backed graph layer extracting nodes and edges from Markdown wikilinks, frontmatter, and paths. Graph retrieval is pending Phase 5.5.
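
Illustrative sketch of extracting graph edges from Markdown wikilinks. The regular expression, the `links_to` edge label, and the tuple shape are assumptions for the example; the graph layer's actual schema is not described in this log.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_edges(source: str, markdown: str) -> list:
    """Return (source, target, relation) tuples for every wikilink in the text."""
    return [
        (source, match.group(1).strip(), "links_to")
        for match in WIKILINK.finditer(markdown)
    ]
```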
Raouf:
- Scope: Quality Gate (Test-Creator)
- Summary: Added `scripts/check.sh` to enforce testing invariants (pytest, gauntlet, audit) sequentially.
Raouf:
- Scope: E2E Smoke Test (Phase 5.5 Finalization)
- Summary: Created full E2E test script (`scripts/e2e_smoke.sh`) and fixed exit codes in `scripts/cli.py` and `scripts/memory.py` so that failing memory actions return the correct exit code. The E2E tests simulate the entire Zurvan pipeline.
Raouf:
- Scope: Phase 15: Local Report UI / Review Workbench
- Summary: Built a local-only FastAPI UI to inspect evidence packs and review composed reports before exporting them. Bound strictly to localhost by default and restricted path access to safely prevent any raw data leakage. Validates citation integrity interactively via web dashboard to spot unsupported claims or empty sections manually.
- Files Changed:
  - `scripts/review_server.py`, `scripts/review_routes.py`, `scripts/review_models.py`, `scripts/review_safety.py` - Core web application backend logic and security validations.
  - `templates/` and `static/` - HTML, CSS, JS frontend rendering for reports.
  - `scripts/cli.py` - Wired up `zurvan review serve/list/open`
  - `scripts/check.sh` - Added automated smoke test and routing test suite for review workbench.
  - `docs/review/*.md` - Documentation for overview, usage and safety
  - `README.md` & `docs/workflows_and_plans.md` - Marked Phase 15 as complete.
- Verification: Ran `bash scripts/check.sh`, resulting in 100% pass for unit and smoke tests. Verified the review endpoints properly export markdown and correctly reject invalid queries / prevent path traversals.
- Follow-ups: Proceed to the next phases on optimizing or scaling the workbench.
Raouf:
- Scope: Phase 16: Review Workbench Hardening + UX Polish
- Summary: Enhanced the local report review cockpit with stronger safety checks and UX improvements. Added automatic secret scanning (emails, API keys, absolute paths) to flag unsafe exported content. Strengthened citation validation to catch unmapped or missing claims before final export. Polished the UI with clear status badges, a dedicated warnings panel, dynamic dashboard summary metrics, and a reviewer checklist. Fully integrated `zurvan review audit` and `zurvan review index rebuild` commands into the CLI.
- Files Changed:
  - `scripts/review_audit.py` & `scripts/review_index.py` - Core auditing and local indexing logic.
  - `docs/review/hardening.md` & `docs/review/reviewer-checklist.md` - Operational guidelines for safety and workflows.
  - `tests/test_review_audit.py` & `tests/test_review_index.py` - Unit test coverage for edge cases like secret detection and manifest validation.
  - `scripts/review_routes.py`, `scripts/cli.py`, `templates/*.html`, `static/review.css` - Endpoint plumbing, UI/CSS updates, and command hooks.
  - `README.md`, `docs/workflows_and_plans.md`, `scripts/check.sh` - Project structure and checklist documentation logic.
- Verification: Ran `bash scripts/check.sh`, which hit a 100% pass rate. Verified `zurvan review audit` cleanly flags unmapped citations, and `zurvan review index rebuild` properly isolates without leaking absolute local paths into the registry.
- Follow-ups: Prepare for Phase 17 involving potential new integrations or scaling report formats.