chore: prepare repository for open-source release #6
Merged
Force-pushed from ea43e06 to 9f51631
- Rewrite README with logo, badges, quick start, and MCP integration docs
- Add governance files: CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, CHANGELOG
- Add GitHub issue/PR templates and CI workflow (typecheck, lint, test)
- Sanitize credentials from ADR doc (workspace ID, user ID, API key)
- Remove specific ngrok domain from README
- Update .gitignore: benchmark runs, env files, keys, swarm data
- Remove debug console.logs from Fact, FactRelation, editor components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 9f51631 to cffe126
altras added a commit that referenced this pull request on Mar 30, 2026
…dress blog critique

## Major Additions

### 1. MS MARCO Passage Ranking Benchmark
- bench_msmarco.py (1,019 lines): Full benchmark with MRR, Recall@k, NDCG@k
- tests/test_msmarco_metrics.py (537 lines): 34 comprehensive unit tests
- demos/demo_msmarco.py (324 lines): Interactive demo
- docs/MSMARCO_USAGE.md + MSMARCO_QUICKREF.md: Complete documentation
- examples/example_msmarco_usage.sh: 8 usage examples

### 2. Statistical Analysis Framework
- statistical_analysis.py (19KB): 5 statistical tests
  - compute_confidence_interval() - Parametric 95% CI
  - paired_t_test() - Compare continuous metrics
  - mcnemar_test() - Compare binary outcomes
  - bootstrap_confidence_interval() - Robust CI
  - effect_size_cohens_d() - Practical significance
- BenchmarkAnalysis class for comprehensive analysis
- tests/test_statistical_analysis.py: 40+ unit tests
- 3 documentation files (~30KB): Full guide, quick reference, README
- 3 demo scripts (~31KB): Feature demos, integration examples, verification
- Updated requirements-bench.txt with scipy>=1.11.0

### 3. HotpotQA Scale-Up to 500+ Questions
- Enhanced bench_hotpotqa.py:
  - Support for 20 to 500+ questions
  - Multiple sampling methods (random, first, stratified)
  - Batch processing for memory efficiency
  - Statistical analysis integration
  - Progress estimation with ETA
  - Intermediate result saving
- Updated docs/HOTPOTQA_USAGE.md with performance estimates
- docs/STATISTICAL_ANALYSIS_GUIDE.md: Statistical interpretation
- QUICK_REFERENCE.md: One-page command reference
- test_enhancements.py: Verification script
- examples/: run_statistical_benchmark.sh, cross_validation.sh

## Blog Post Critique Response

### 4. Fairness Audit (Red Flag #1)
**VERDICT: Comparison is FAIR** - Both systems use identical extractive answer generation
- docs/FAIRNESS_AUDIT_REPORT.md (11.4 KB): Detailed analysis
- docs/FAIRNESS_FIX_PROPOSAL.md (20.6 KB): Architectural improvements
- docs/FAIRNESS_AUDIT_SUMMARY.md (4.4 KB): TL;DR

### 5. Revised Blog Post (Red Flags #2-10)
- docs/BLOG_POST_REVISED.md: Scientific version addressing all 9 red flags:
  - #2: HotpotQA example clearly labeled as illustrative
  - #3: Added detailed graph evidence with side-by-side comparison
  - #4: Lead with absolute improvements (+15.0pp, not +50%)
  - #5: Added confidence intervals, p-values, Cohen's d, sample sizes
  - #6: Narrowed reindexing claim to specific systems
  - #7: Explicit freshness source of truth and success criteria
  - #8: Clarified latency measurement scope
  - #9: Moved RAGAS to Future Work (not yet implemented)
  - #10: Removed marketing language, added Limitations section
- docs/BLOG_POST_CHANGES.md: Side-by-side audit trail

### 6. Comprehensive Methodology Documentation
- docs/METHODOLOGY.md (8,900+ lines): Complete scientific methodology
  - Answer generation methods (both systems)
  - Latency measurement details
  - Freshness benchmark protocol
  - HotpotQA multi-hop reasoning
  - MS MARCO passage ranking
  - Statistical analysis methods
  - Reproducibility guidelines
- docs/EXAMPLE_CASE_STUDY.md (1,200+ lines): Worked example
- docs/LIMITATIONS.md (1,600+ lines): Honest limitations, threats to validity
- docs/FAQ.md (1,500+ lines): 20+ questions with detailed answers
- docs/README.md: Documentation index

## Summary
- ~3,000 lines: MS MARCO benchmark (3rd dataset)
- ~95KB: Statistical analysis framework
- ~13,200 lines: Methodology documentation
- Enhanced HotpotQA to support 500+ questions
- All 10 blog post red flags addressed
- Production-ready, scientifically rigorous benchmark suite

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
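The statistical helpers listed in the commit message above can be sketched as follows. The function names mirror those in statistical_analysis.py, but the signatures and internals here are illustrative assumptions, not the actual implementation (scipy>=1.11.0 is already a declared dependency in requirements-bench.txt):

```python
# Illustrative sketch of three of the five statistical tests described above.
import numpy as np
from scipy import stats


def paired_t_test(a, b):
    """Compare per-query continuous metrics (e.g. F1) from two systems."""
    t_stat, p_value = stats.ttest_rel(a, b)
    return t_stat, p_value


def effect_size_cohens_d(a, b):
    """Cohen's d for paired samples: mean difference / SD of differences."""
    diff = np.asarray(a) - np.asarray(b)
    return diff.mean() / diff.std(ddof=1)


def bootstrap_confidence_interval(x, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean; robust to non-normal metrics."""
    rng = np.random.default_rng(seed)
    resampled = rng.choice(np.asarray(x), size=(n_boot, len(x)), replace=True)
    means = resampled.mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])
```

This is the kind of pipeline the revised blog post relies on for red flag #5: report the absolute improvement together with a bootstrap CI, a paired p-value, and Cohen's d rather than a bare percentage.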
Summary
Prepares the repository for public open-source release with governance, documentation, CI, and secret sanitization.
Pre-public checklist (after merge, before making repo public)
- Rewrite git history to purge committed secrets (`bfg --delete-files .env.local`)

Test plan
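The history rewrite in the checklist above typically follows the standard BFG workflow, sketched below. The repository URL is a placeholder, and this assumes the BFG jar is available locally; run it on a fresh mirror clone, not a working copy:

```shell
# Mirror-clone the repo so all refs are rewritten (URL is a placeholder).
git clone --mirror git@github.com:example/repo.git
# Delete every version of .env.local from history.
java -jar bfg.jar --delete-files .env.local repo.git
cd repo.git
# Expire old reflog entries and garbage-collect the purged blobs.
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# Push the rewritten refs (a mirror clone pushes all refs by default).
git push
```

Note that any previously leaked credentials should still be rotated: history rewriting removes the strings from the repo, but clones and forks made before the rewrite retain them.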
- `npm run typecheck` passes
- `npm run lint` passes
- `git grep -i "bench_\|74be80db\|17ac0fa1\|boa-driving"` (verify sanitized strings are gone)

🤖 Generated with Claude Code