fix: prevent cross-workspace fact leaks in chat citations #5
Closed
outbounder wants to merge 2 commits into main from
Conversation
Prevent chat responses from leaking cross-workspace fact content by validating every cited fact against the active workspace before returning it to the UI. Made-with: Cursor
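The validation described above might look something like the following minimal TypeScript sketch. The `Fact` shape and the `filterCitedFacts` helper are hypothetical stand-ins inferred from the PR text, not the actual implementation:

```typescript
// Hypothetical fact shape, inferred from the PR description.
interface Fact {
  id: string;
  workspace_id: string;
  content: string;
}

// Resolve cited facts, keeping only those that belong to the active
// workspace; foreign-workspace facts (and failed lookups) are dropped
// before anything is returned to the UI.
function filterCitedFacts(
  facts: Array<Fact | null>,
  activeWorkspaceId: string,
): Fact[] {
  return facts.filter(
    (fact): fact is Fact =>
      fact !== null && fact.workspace_id === activeWorkspaceId,
  );
}
```

Dropping foreign facts silently (rather than erroring) keeps the chat response usable while still guaranteeing nothing from another workspace reaches the client.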
Ignore `.env.production` in git to prevent accidental commits of production database credentials from local development environments. Made-with: Cursor
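A minimal sketch of the ignore rule (the exact `.gitignore` layout is an assumption):

```gitignore
# Never commit production credentials from local environments.
.env.production
```

Note that if the file was already tracked, it must also be removed from the index with `git rm --cached .env.production`; a `.gitignore` entry alone does not untrack it.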
altras added a commit that referenced this pull request on Mar 30, 2026
…dress blog critique

## Major Additions

### 1. MS MARCO Passage Ranking Benchmark
- bench_msmarco.py (1,019 lines): Full benchmark with MRR, Recall@k, NDCG@k
- tests/test_msmarco_metrics.py (537 lines): 34 comprehensive unit tests
- demos/demo_msmarco.py (324 lines): Interactive demo
- docs/MSMARCO_USAGE.md + MSMARCO_QUICKREF.md: Complete documentation
- examples/example_msmarco_usage.sh: 8 usage examples

### 2. Statistical Analysis Framework
- statistical_analysis.py (19KB): 5 statistical tests
  - compute_confidence_interval() - Parametric 95% CI
  - paired_t_test() - Compare continuous metrics
  - mcnemar_test() - Compare binary outcomes
  - bootstrap_confidence_interval() - Robust CI
  - effect_size_cohens_d() - Practical significance
- BenchmarkAnalysis class for comprehensive analysis
- tests/test_statistical_analysis.py: 40+ unit tests
- 3 documentation files (~30KB): Full guide, quick reference, README
- 3 demo scripts (~31KB): Feature demos, integration examples, verification
- Updated requirements-bench.txt with scipy>=1.11.0

### 3. HotpotQA Scale-Up to 500+ Questions
- Enhanced bench_hotpotqa.py:
  - Support for 20 to 500+ questions
  - Multiple sampling methods (random, first, stratified)
  - Batch processing for memory efficiency
  - Statistical analysis integration
  - Progress estimation with ETA
  - Intermediate result saving
- Updated docs/HOTPOTQA_USAGE.md with performance estimates
- docs/STATISTICAL_ANALYSIS_GUIDE.md: Statistical interpretation
- QUICK_REFERENCE.md: One-page command reference
- test_enhancements.py: Verification script
- examples/: run_statistical_benchmark.sh, cross_validation.sh

## Blog Post Critique Response

### 4. Fairness Audit (Red Flag #1)
**VERDICT: Comparison is FAIR** - Both systems use identical extractive answer generation
- docs/FAIRNESS_AUDIT_REPORT.md (11.4 KB): Detailed analysis
- docs/FAIRNESS_FIX_PROPOSAL.md (20.6 KB): Architectural improvements
- docs/FAIRNESS_AUDIT_SUMMARY.md (4.4 KB): TL;DR

### 5. Revised Blog Post (Red Flags #2-10)
- docs/BLOG_POST_REVISED.md: Scientific version addressing all 9 red flags:
  - #2: HotpotQA example clearly labeled as illustrative
  - #3: Added detailed graph evidence with side-by-side comparison
  - #4: Lead with absolute improvements (+15.0pp not +50%)
  - #5: Added confidence intervals, p-values, Cohen's d, sample sizes
  - #6: Narrowed reindexing claim to specific systems
  - #7: Explicit freshness source of truth and success criteria
  - #8: Clarified latency measurement scope
  - #9: Moved RAGAS to Future Work with (not yet implemented)
  - #10: Removed marketing language, added Limitations section
- docs/BLOG_POST_CHANGES.md: Side-by-side audit trail

### 6. Comprehensive Methodology Documentation
- docs/METHODOLOGY.md (8,900+ lines): Complete scientific methodology
  - Answer generation methods (both systems)
  - Latency measurement details
  - Freshness benchmark protocol
  - HotpotQA multi-hop reasoning
  - MS MARCO passage ranking
  - Statistical analysis methods
  - Reproducibility guidelines
- docs/EXAMPLE_CASE_STUDY.md (1,200+ lines): Worked example
- docs/LIMITATIONS.md (1,600+ lines): Honest limitations, threats to validity
- docs/FAQ.md (1,500+ lines): 20+ questions with detailed answers
- docs/README.md: Documentation index

## Summary
- ~3,000 lines: MS MARCO benchmark (3rd dataset)
- ~95KB: Statistical analysis framework
- ~13,200 lines: Methodology documentation
- Enhanced HotpotQA to support 500+ questions
- All 10 blog post red flags addressed
- Production-ready, scientifically rigorous benchmark suite

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
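The commit's `bootstrap_confidence_interval()` helper itself is not shown here; purely as an illustration of the technique it names, a percentile-bootstrap confidence interval for a sample mean can be sketched as follows (the function shape, defaults, and injectable RNG are assumptions, not the project's API):

```typescript
// Percentile-bootstrap CI for the mean: resample with replacement,
// compute the mean of each resample, and take the alpha/2 and
// 1 - alpha/2 quantiles of the resulting distribution.
function bootstrapCI(
  samples: number[],
  iterations = 10_000,
  alpha = 0.05,
  rng: () => number = Math.random,
): [number, number] {
  const means: number[] = [];
  for (let i = 0; i < iterations; i++) {
    let sum = 0;
    for (let j = 0; j < samples.length; j++) {
      // Draw one observation uniformly at random, with replacement.
      sum += samples[Math.floor(rng() * samples.length)];
    }
    means.push(sum / samples.length);
  }
  means.sort((a, b) => a - b);
  const lo = means[Math.floor((alpha / 2) * iterations)];
  const hi = means[Math.floor((1 - alpha / 2) * iterations)];
  return [lo, hi];
}
```

Unlike the parametric 95% CI, the bootstrap makes no normality assumption, which is why benchmark suites often report both.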
Force-pushed from 5c34f7b to 76c5869
Member

Fixed in the benchmarking suite PR — commit 583a501 adds workspace isolation to REST API endpoints.
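Commit 583a501 is not shown in this thread; as an illustration only, a workspace-isolation guard for a REST handler might look like the sketch below (the request shape and helper name are hypothetical):

```typescript
// Hypothetical request shape: the authenticated workspace comes from
// the session, never from client-supplied input.
interface AuthedRequest {
  sessionWorkspaceId: string;
  params: { workspaceId: string };
}

// A REST handler would check this first and respond 403 on mismatch,
// so a client can never read another workspace's resources by
// changing the :workspaceId path parameter.
function isWorkspaceAllowed(req: AuthedRequest): boolean {
  return req.params.workspaceId === req.sessionWorkspaceId;
}
```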
Summary

- Prevents `usedFactsIDs` resolution from leaking cross-workspace fact content through `Fact.findById` lookups.

Test plan

- `chat.sendMessage` now filters cited facts by `fact.workspace_id === ctx.workspaceId` (`chat.ts`, `docs/SPEC.md`).
- `/chat`: verify cited facts appear for the current workspace and do not appear for foreign workspace IDs.

Made with Cursor