Fully open-source autonomous scientific research capabilities for Claude Code.
NOTE: This project has NO official ties to Anthropic or Claude! It is a completely independent, open-source project.
Claude Code Scientist transforms Claude Code into a semi-autonomous, self-improving research system. It provides:
- Research Director logic via CLAUDE.md
- Specialized subagents for literature review, synthesis, peer review, experiments
- Skills for orchestrating multi-step research workflows
- Provenance tracking ensuring every claim has a source
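As a rough sketch of what a provenance-complete claim looks like (the field names and check below are illustrative, not the project's actual schema):

```python
# Illustrative only: field names and the DOI are hypothetical examples,
# not this project's actual data schema.
claim = {
    "text": "Method X outperforms baseline Y on dataset Z.",
    "doi": "10.1234/example.2024.001",
    "quote": "X achieved 12% higher accuracy than Y on Z.",
    "page": 7,
}

def has_full_provenance(claim: dict) -> bool:
    """A claim is citable only if DOI, quote, and page are all present."""
    return all(claim.get(k) for k in ("doi", "quote", "page"))

print(has_full_provenance(claim))  # True
```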
- Python 3.9+ with pip
- Claude Code CLI installed and authenticated
- ~2GB disk space for models and caches
- Claude Code subscription: Required - Pro or Max (run /login inside Claude Code)
git clone https://github.com/rhowardstone/Claude-Code-Scientist.git
cd Claude-Code-Scientist
# Install Python dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_sm

# Install to another project's .claude/ directory
./install.sh /path/to/your/project
# Or install globally to ~/.claude/
./install.sh --global

./session.sh new "Your research goal here"

or even more simply:

./session.sh new

That's it! This creates a session and launches Claude Code, which automatically:
- Decomposes the goal into Research Questions
- Searches the literature across multiple databases (OpenAlex, PubMed, Semantic Scholar)
- Extracts evidence with full provenance (DOI + quote + page)
- Synthesizes findings into a LaTeX paper
- Runs peer review with three specialized reviewers
- Iterates until unanimous acceptance
Each research project gets its own isolated session:
./session.sh new "goal" # Create new session
./session.sh list # List all sessions
./session.sh resume <id> # Resume a session
./session.sh current # Check active session

Sessions store all artifacts in workspace/sessions/session_<id>/.
A completed session produces:
workspace/sessions/session_abc123/
├── synthesis/
│ ├── paper.tex # LaTeX paper with citations
│ ├── paper.pdf # Compiled PDF
│ └── references.bib # Bibliography with DOIs
├── literature/
│ └── preread_papers.json # Discovered papers with abstracts
├── peer_review/
│ ├── methodology_review.json
│ ├── statistics_review.json
│ └── impact_review.json
├── experiments/ # If experiments were run
│ ├── results.json
│ └── figures/
└── world_model.json # Research state
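For scripting over finished sessions, here is a minimal sketch (using only the directory layout shown above; this is not a utility the project ships) that locates each session's compiled paper:

```python
from pathlib import Path

def compiled_papers(root: str = "workspace/sessions"):
    """Yield (session_id, pdf_path) for every session that produced a paper."""
    for session in sorted(Path(root).glob("session_*")):
        pdf = session / "synthesis" / "paper.pdf"
        if pdf.exists():
            yield session.name, pdf

for sid, pdf in compiled_papers():
    print(sid, pdf)
```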
After completing research sessions, you can run CORTEX to analyze what went well and what could be improved:
./session.sh cortex # Launch CORTEX session

Then run /cortex to start the self-improvement cycle. CORTEX traces the narrative flow of prior sessions, diagnoses issues, and generates fixes. It's how this system improves itself.
Claude Code Scientist
│
├── CLAUDE.md (Research Director prompt)
│
├── .claude/
│ ├── agents/ # Specialized subagent configs (7)
│ │ ├── lit-scout.md
│ │ ├── synthesizer.md
│ │ ├── reviewer-*.md # 3 reviewers
│ │ ├── experimentalist.md
│ │ └── tool-acquirer.md
│ │
│ ├── skills/ # Orchestration workflows (24)
│ │ ├── literature-search/
│ │ ├── peer-review/
│ │ ├── goal-decomposition/
│ │ ├── synthesizer/
│ │ └── ...
│ │
│ ├── hooks/ # Validation automation (8)
│ │ ├── validate-claims.py
│ │ ├── validate-doi.py
│ │ └── verify-provenance.py
│ │
│ └── rules/ # Conventions (3)
│ ├── provenance-tracking.md
│ ├── world-model.md
│ └── workflow.md
│
├── craig/ # Python utilities (137 files)
│ ├── world_model.py # Research state management
│ ├── doi_fetcher.py # DOI validation
│ ├── latex_compiler.py # Paper compilation
│ ├── literature/ # Database clients
│ │ ├── openalex_client.py
│ │ ├── pubmed_client.py
│ │ └── semantic_scholar_client.py
│ ├── pipeline/ # Phase implementations
│ └── experiment_harness_templates/ # Experiment scaffolding
│
└── mcp-servers/literature/ # Literature search MCP
- Triple-search strategy (keywords, semantic, "googling the question")
- Multi-database support (OpenAlex, PubMed, Semantic Scholar)
- Citation graph expansion (forward + reverse)
- Relevance filtering with provenance
- Rigorous claim extraction with DOI + quote + page
- Confidence scoring with justification
- Conflict detection across papers
- Gap identification for experimental follow-up
- Academic paper generation (LaTeX)
- Proper citations with DOIs
- Narrative flow (not database dump)
- Separation of direct vs analogical evidence
- Three reviewers: methodology, statistics, impact
- Actionable feedback with specific locations
- Revision cycles until unanimous acceptance
- External validation (Codex) when available
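The revision loop behind "unanimous acceptance" can be sketched roughly like this (the reviewer names come from the list above; the "accept" verdict string and the run_reviews/revise callables are hypothetical stand-ins, not the project's actual interfaces):

```python
# Sketch of the revise-until-unanimous loop; verdict values and the
# run_reviews/revise callables are assumptions for illustration.
REVIEWERS = ("methodology", "statistics", "impact")

def unanimous(reviews: dict) -> bool:
    """Acceptance requires every reviewer to accept -- not a majority vote."""
    return all(reviews[r] == "accept" for r in REVIEWERS)

def review_cycle(draft, run_reviews, revise, max_rounds=5):
    """Revise the draft until all three reviewers accept (or rounds run out)."""
    for _ in range(max_rounds):
        reviews = run_reviews(draft)
        if unanimous(reviews):
            return draft
        draft = revise(draft, reviews)
    return draft
```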
- Phased design → implementation → validation → execution
- Real data only (no mock/simulated)
- Timing validation before full runs
- Incremental saves and checkpointing
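Incremental saving typically follows a write-then-rename pattern so an interrupted run never leaves a half-written checkpoint; a generic sketch (not the project's actual implementation):

```python
import json
import os
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    """Write to a temp file, then atomically rename over the target,
    so a crash mid-write cannot corrupt the existing checkpoint."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic within a single filesystem
```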
- Provenance is everything - Every claim needs DOI + quote + page
- No simulation trap - Run actual tools, not simulations
- Writing code ≠ running code - Always execute to verify
- Honesty over completion - Missing evidence > false evidence
- Unanimous peer review - Not majority vote
- RAM: 8GB (32GB+ recommended)
- Storage: 10GB for caches and session data
- CPU: Any modern multi-core
- GPU: Not required (CPU embeddings work fine)
- RAM: 32GB+ (genomics/scRNA-seq data can be large)
- Storage: 50GB+ for paper PDFs and knowledge graphs
- GPU: NVIDIA with CUDA for faster embeddings (optional)
- Python packages: ~500MB
- Embedding models: ~400MB (downloaded on first use)
- spaCy models: ~50MB
- FAISS: ~10MB
- Claude Code subscription: Required - Pro or Max (run /login inside Claude Code)
- Literature search: Free (OpenAlex, PubMed, Semantic Scholar are free APIs)
- PDF access: Free (uses open access sources only)
Create .mcp.json to add literature databases:
{
"mcpServers": {
"openalex": {
"type": "stdio",
"command": "python",
"args": ["mcp-servers/literature/server.py"]
}
}
}

Create .claude/skills/my-skill/SKILL.md:
---
name: my-skill
description: What it does and when to use it
user-invocable: true
---
# Skill instructions here

Create .claude/agents/my-agent.md:
---
name: my-agent
description: Specialized agent description
model: sonnet
---
# Agent instructions here

Research artifacts are stored in workspace/:
workspace/
├── world_model.json # Research state
├── literature/ # Search results, papers
├── synthesis/ # Paper drafts
├── peer_review/ # Review feedback
└── experiments/ # Experimental artifacts
The craig/ directory contains 137 Python files providing:
- world_model.py - Research state management (papers, claims, RQs)
- doi_fetcher.py - DOI validation and metadata retrieval
- latex_compiler.py - LaTeX paper compilation
- conflict_detector.py - Detect contradictions across sources
- data_provenance.py - Track evidence chains
- openalex_client.py - 200M+ open access works
- pubmed_client.py - Biomedical literature
- semantic_scholar_client.py - CS/AI papers with embeddings
- citation_expander.py - Forward/reverse citation traversal
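To give a flavor of what these utilities do, here is a hedged sketch (the simplified DOI pattern is adapted from Crossref's published guidance; the OpenAlex /works?search= endpoint and the mailto "polite pool" parameter are from the public OpenAlex API; function names are illustrative, not the module's real API):

```python
import re
from urllib.parse import urlencode

# Simplified from Crossref's recommended DOI pattern; a syntax check only.
# Real validation (as in doi_fetcher.py) would also resolve the DOI via doi.org.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$", re.IGNORECASE)

def looks_like_doi(doi: str) -> bool:
    return bool(DOI_RE.match(doi.strip()))

def openalex_search_url(query: str, per_page: int = 25, mailto: str = "") -> str:
    """Build a works-search URL for the OpenAlex REST API (no key required).
    Supplying a mailto= address opts you into the 'polite pool' rate limits."""
    params = {"search": query, "per-page": per_page}
    if mailto:
        params["mailto"] = mailto
    return "https://api.openalex.org/works?" + urlencode(params)

print(looks_like_doi("10.1038/nbt.3820"))  # True
print(openalex_search_url("batch effect correction", mailto="you@example.org"))
```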
The craig/experiment_harness_templates/ provides scaffolding for experiments:
- run.sh - Master experiment runner
- steps/ - Modular experiment phases
- lib/ - Utilities for checkpointing, scaling, and validation
For compiling the generated papers to PDF:
apt install texlive-latex-base texlive-latex-extra # Debian/Ubuntu
# or: brew install --cask mactex # macOS

For faster embeddings during knowledge graph ingestion:
# Replace faiss-cpu with faiss-gpu
pip uninstall faiss-cpu
pip install faiss-gpu
# Use CUDA for embeddings
python -m craig.literature.knowledge_graph.ingest --device cuda --batch-size 128

If you see hooks running twice (e.g., PostToolUse:Write hook succeeded appearing 6 times instead of 3), you have hooks configured in both:
- Global: ~/.claude/settings.json
- Local: .claude/settings.json
Claude Code merges both, so they stack. Solutions:
- Remove global hooks if you only use this project
- Remove local hooks if you prefer global configuration
- Accept duplicates - they're harmless, just verbose
Messages like NCBI_API_KEY not set are informational. The pipeline works without API keys but may hit rate limits. To add keys:
cp .env.example .env
# Edit .env with your keys

Get keys at:
- NCBI: https://www.ncbi.nlm.nih.gov/account/settings/
- OpenAlex: https://openalex.org/users/me
- Semantic Scholar: https://www.semanticscholar.org/product/api
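The .env pattern above is easy to reproduce; a minimal loader sketch (the project may well use python-dotenv instead, and only NCBI_API_KEY is a variable name confirmed by the message above):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines; already-set env vars win.
    A sketch only -- the project may use python-dotenv instead."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # keys are optional; without them you may just hit rate limits

load_dotenv()
print("NCBI key set:", bool(os.environ.get("NCBI_API_KEY")))
```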
This means the paper couldn't be downloaded from any open access source, which is normal for paywalled papers. The pipeline continues with abstract-only data.
We welcome contributions! The most valuable way to contribute is to:
- Run CORTEX between your research sessions
- Submit the resulting improvements as pull requests
# After a research session, run:
./session.sh cortex
# Then in Claude:
/cortex

CORTEX analyzes past sessions, diagnoses issues, and generates fixes. Submit the improvements back!
See CONTRIBUTING.md for full guidelines.
MIT