michailmitsakis/notion-second-brain

🧠 notion-second-brain

A fully local RAG agent over your Notion workspace: no cloud APIs, no subscriptions, no data leaving your machine.

A local-first "second brain" agent. It ingests Notion exports (and, if needed, live Notion pages), converts PDFs and images into markdown, indexes everything into Qdrant, and serves RAG queries via a CLI (without memory) or a Streamlit GUI (with memory within and between runs). All inference runs locally through Ollama. Designed to run well on native Windows.

Built as a complete local RAG stack: hybrid dense+sparse retrieval, cross-encoder reranking, sentence-aware chunking (with atomic code/table handling), file-based persistent memory, an anchored-rubric evaluation harness, and (optionally) Phoenix observability, all wired together with pydantic-ai. Optimised for a single 12 GB VRAM / 32 GB RAM system.

The problem

I use Notion heavily for knowledge management, collecting and organizing my thoughts, notes, and research. As my notes have grown, it has become increasingly beneficial to summarize key sections or pages. While Notion includes a native AI package, I wanted to avoid it due to security, privacy and cost concerns. So I built this instead.


✨ Features

  • 🔍 Hybrid retrieval: dense (nomic-embed-text) + sparse (BM25) with RRF fusion
  • 🔁 Cross-encoder reranking: BAAI/bge-reranker-v2-m3 for precision on top of recall
  • ✂️ Sentence-aware chunking: atomic handling of code blocks and tables (v4)
  • 🧠 Persistent memory: file-based per-session and long-term memory injected at runtime
  • 📊 Evaluation harness: hand-rolled + pydantic_evals with 4-criterion anchored rubric
  • 🔭 Phoenix observability: optional OTel tracing via Arize Phoenix
  • 🏠 Fully local: Ollama inference, Qdrant vector store, zero cloud dependencies
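As an illustration of the RRF fusion named in the first feature, here is a minimal standalone sketch. It is not the repo's code (fusion may well be delegated to Qdrant); the function name `rrf_fuse`, the constant `k = 60`, and the chunk IDs are assumptions chosen for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["chunk-a", "chunk-b", "chunk-c"]   # hypothetical dense ranking
sparse_hits = ["chunk-c", "chunk-a", "chunk-d"]  # hypothetical BM25 ranking
fused = rrf_fuse([dense_hits, sparse_hits])
print([chunk_id for chunk_id, _ in fused])
```

Because RRF only looks at ranks, the dense and sparse scores never need to be normalised against each other, which is what makes it a convenient fusion rule for hybrid retrieval.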

🚀 Quickstart (assuming existing md data)

docker compose up -d
python -m venv .venv && .venv\Scripts\activate   # Windows
# python -m venv .venv && source .venv/bin/activate  # macOS/Linux
pip install -r requirements.txt
ollama pull nomic-embed-text    # or pull any suitable embedding model
ollama pull gemma4:latest       # or pull any agentic model
python scripts/run_rag.py       # index existing data/clean/
python -m assistant.cli         # start chatting in CLI (no memory)
streamlit run assistant/app.py  # start chatting in Streamlit (with memory)

📋 Requirements

  • Python 3.12+
  • Ollama running locally
  • Docker for Qdrant + (optionally) Phoenix
  • Hardware baseline: 12 GB VRAM, 32 GB RAM (tested on Windows; cross-platform)
  • Disk: ~30 GB for Ollama models + Qdrant persistence

For lower-end hardware, adjust model size accordingly.


📁 Project layout

notion-second-brain/
├── assistant/                  # Agent runtime
│   ├── agent.py                # pydantic-ai Agent + per-call instantiation
│   ├── app.py                  # Streamlit GUI (with memory)
│   ├── cli.py                  # CLI REPL (no memory)
│   ├── memory.py               # file-based memory
│   └── tools.py                # retrieve_knowledge, fetch_notion_page
├── pipelines/
│   ├── etl/                    # Notion/files → raw markdown
│   ├── rag/                    # chunker, embeddings, reranker, indexer
│   ├── utils/
│   └── models.py
├── scripts/                    # Pipeline entry points
│   ├── run_marker.py
│   ├── run_clean_md.py
│   ├── run_etl.py
│   └── run_rag.py
├── evals/                      # Evaluation suite
│   ├── cases.py                # golden + adversarial + distribution cases
│   ├── rubrics.py              # 4 anchored 1–5 rubrics
│   ├── judges.py               # LLM-as-judge (Ollama)
│   ├── run_evals.py            # hand-rolled runner (canonical)
│   └── run_pydantic_evals.py   # pydantic_evals runner
├── extras/                     # Optional / reference
│   ├── run_deepeval.py         # DeepEval showcase (see deepeval_info.md)
│   ├── run_phoenix.py          # Phoenix OTel tracing
│   ├── deepeval_info.md        # DeepEval setup notes
│   ├── llm-eval-patterns.md    # Eval methodology reference
│   └── prompt-eval-designer.md # Rubric design protocol
├── memory/                     # Conversation memory (gitignored)
├── data/                       # All data files (gitignored)
├── images/                     # README screenshots
├── docker-compose.yml
└── requirements.txt

⚙️ Install

A single venv is enough, with one conflict: the openai package version has to be swapped when moving between RAG/agent mode and marker mode (see OpenAI conflict).

With pip

python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS/Linux
pip install -r requirements.txt

requirements.txt pulls torch+cu130 (about 2.5 GB). Expect the first install to download ~3–4 GB of wheels.

With uv (alternative)

uv venv .venv && .venv\Scripts\activate
uv pip install -r requirements.txt

The fully project-managed uv flow (with pyproject.toml + uv.lock) is theoretically cleaner, but it ran into setup issues on this stack; uv pip or plain pip is the recommended path until those are resolved.


🗒️ Notion setup

Relies on the notion-to-md-py library. You need two separate Notion integrations because they serve different code paths:

  • NOTION_TO_MD_AUTH_TOKEN: used by scripts/run_etl.py to bulk-fetch pages as markdown during ETL.
  • NOTION_ASSISTANT_AUTH_TOKEN: used by the agent's fetch_notion_page tool to live-fetch a single page on demand.

If you don't need live-fetch, skip the second integration.

Creating the integrations

  1. Go to https://www.notion.so/profile/integrations → + New integration
  2. Name it (e.g. Second Brain - Notion-to-MD) → Read content capabilities only
  3. Copy the Internal Integration Secret (ntn_...) into .env
  4. Optionally repeat for a second integration (NOTION_ASSISTANT_AUTH_TOKEN)

Granting page access

For each integration, open each page in Notion → ⋯ → Connections → add the integration. For workspace-wide access: Settings → Connections → add at workspace level.

Without this step the integration returns empty results for every page.


🔧 Configure .env

# ── Notion ───────────────────────────────────────────────────────────────────
NOTION_ASSISTANT_AUTH_TOKEN=ntn_...
NOTION_TO_MD_AUTH_TOKEN=ntn_...

# ── ETL mode ─────────────────────────────────────────────────────────────────
LOAD_MODE="notion"            # "files" or "notion"
# ETL_PAGE_NAME="AI/ML/Data Science"  # limit to one page for first-run testing

# ── Marker (PDF / image OCR) ─────────────────────────────────────────────────
MARKER_USE_LLM=false
MARKER_FORCE_OCR=true
MARKER_WORKERS=2
MARKER_DISABLE_IMAGES=false

# ── Qdrant ───────────────────────────────────────────────────────────────────
QDRANT_URL=http://localhost:32768
FORCE_REINDEX=true            # set true on first run, or after schema changes
ENABLE_RERANK=true

# ── Ollama ───────────────────────────────────────────────────────────────────
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:latest
# Use gemma4 for the agent: qwen3 has a large KV-cache offload that
# spills to CPU and tanks throughput. Qwen3 is fine as eval judge.

# ── Memory ───────────────────────────────────────────────────────────────────
ENABLE_MEMORY=true
RECENT_LOG_DAYS=2

# ── Evaluations ──────────────────────────────────────────────────────────────
JUDGE_MODEL=qwen3:8b
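
The settings above could be read into a typed object roughly as follows. This is a sketch, not the repo's loader: `Settings`, `env_bool`, and `load_settings` are hypothetical names, the defaults simply mirror the sample values above, and it assumes the .env values have already been exported into the process environment (for example by a dotenv loader).

```python
import os
from dataclasses import dataclass


def env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().strip('"').lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    ollama_base_url: str
    ollama_model: str
    judge_model: str
    enable_rerank: bool
    enable_memory: bool
    recent_log_days: int


def load_settings() -> Settings:
    """Build Settings from environment variables (defaults are assumptions)."""
    env = os.environ
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:32768"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "gemma4:latest"),
        judge_model=env.get("JUDGE_MODEL", "qwen3:8b"),
        enable_rerank=env_bool("ENABLE_RERANK", True),
        enable_memory=env_bool("ENABLE_MEMORY", True),
        recent_log_days=int(env.get("RECENT_LOG_DAYS", "2")),
    )


print(load_settings())
```

Parsing booleans and integers in one place avoids the classic pitfall where the string "false" is treated as truthy.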

Verify GPU offload

ollama ps

Expected: 100% GPU. If you see any CPU %, set OLLAMA_NUM_GPU=99 at the shell level (not just .env), or recreate the model via a Modelfile with num_gpu 99.


📁 Data layout

data/
├── raw/
│   ├── documents/<page-name>/   # raw Notion exports
│   ├── images/<page-name>/      # downloaded page images
│   └── raw_md/                  # raw Notion text exports
├── crawled/                     # (not yet used) web crawler output
├── clean/
│   ├── pdfs_md/                 # PDF → md via marker
│   ├── images_md/               # image OCR via marker
│   └── clean_md/                # raw_md → cleaned md
└── pages.txt                    # page list for live Notion fetch
memory/
├── MEMORY.md                    # long-term distilled context
└── YYYY-MM-DD.md                # daily conversation logs

🔄 Pipeline stages

1 · ETL: Notion / files → raw markdown

python scripts/run_etl.py

Fetches from Notion (LOAD_MODE=notion) or reads local files (LOAD_MODE=files). Writes to data/raw/raw_md/.

First run: set ETL_PAGE_NAME="Some Page" to test the Notion connection on a single page before pulling your whole workspace. Notion rate limits are aggressive on large workspaces.

2 · Cleaning: raw → clean markdown

python scripts/run_clean_md.py

LLM-based cleanup of data/raw/raw_md/ → data/clean/clean_md/.

3 · Marker: PDF + image → markdown

python scripts/run_marker.py

Converts PDFs and images into markdown. Independent of ETL; only run it when you have new source files.

Env vars: MARKER_STEP (pdfs / images / all), MARKER_TEST_SUBDIR (debug subset).

Alternatives that avoid the openai conflict: PyMuPDF4LLM (lightest, no OCR), MinerU, Kreuzberg, Docling.

4 · RAG indexing: clean markdown → Qdrant

python scripts/run_rag.py

Applies sentence-aware chunking → hybrid embeddings → optional reranking → Qdrant upload with RRF fusion. The collection is visible at http://localhost:32768/dashboard#/collections.
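
To make the chunking step concrete, here is a simplified sketch of sentence-aware chunking that keeps fenced code blocks atomic. It is an illustration under assumptions, not the repo's chunker: `split_blocks`, `chunk_markdown`, and the character budget are hypothetical, the sentence splitter is a plain regex, and table handling is left out for brevity.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker


def split_blocks(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into ("code", text) and ("prose", text) blocks."""
    blocks: list[tuple[str, str]] = []
    buffer: list[str] = []
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if in_code:
                # Closing fence: emit the whole code block as one unit.
                buffer.append(line)
                blocks.append(("code", "\n".join(buffer)))
                buffer, in_code = [], False
            else:
                # Opening fence: flush any prose collected so far.
                if buffer:
                    blocks.append(("prose", "\n".join(buffer)))
                buffer, in_code = [line], True
        else:
            buffer.append(line)
    if buffer:
        blocks.append(("code" if in_code else "prose", "\n".join(buffer)))
    return blocks


def chunk_markdown(markdown: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks; never split a fenced code block."""
    chunks: list[str] = []
    for kind, text in split_blocks(markdown):
        if kind == "code":
            chunks.append(text)  # atomic, even if longer than max_chars
            continue
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


doc = "First sentence. Second sentence.\n" + FENCE + "python\nx = 1\ny = 2\n" + FENCE + "\nAfter code."
for chunk in chunk_markdown(doc, max_chars=20):
    print(repr(chunk))
```

Splitting only at sentence boundaries keeps each chunk readable on its own, and treating code as atomic avoids retrieving half a snippet.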

Key env vars: FORCE_REINDEX=true (schema changes), RETRIEVAL_TOP_N=10, RERANK_TOP_K=3.
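
The interplay of RETRIEVAL_TOP_N and RERANK_TOP_K can be sketched as a two-stage selection. This is a hedged illustration: `retrieve_then_rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer merely stands in for a real cross-encoder such as the one named in the features list.

```python
import os
from typing import Callable

RETRIEVAL_TOP_N = int(os.environ.get("RETRIEVAL_TOP_N", "10"))
RERANK_TOP_K = int(os.environ.get("RERANK_TOP_K", "3"))


def retrieve_then_rerank(
    query: str,
    candidates: list[tuple[str, float]],
    score_pair: Callable[[str, str], float],
    top_n: int = RETRIEVAL_TOP_N,
    top_k: int = RERANK_TOP_K,
) -> list[str]:
    """Keep the top_n candidates by retrieval score, then the top_k by reranker score."""
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    reranked = sorted(shortlist, key=lambda c: score_pair(query, c[0]), reverse=True)
    return [text for text, _ in reranked[:top_k]]


def toy_overlap_score(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


candidates = [  # hypothetical (passage, retrieval score) pairs
    ("qdrant stores vectors", 0.9),
    ("ollama runs models locally", 0.8),
    ("notion export guide", 0.7),
    ("how ollama loads models", 0.1),
]
print(retrieve_then_rerank("ollama models", candidates, toy_overlap_score, top_n=3, top_k=1))
```

The first stage is cheap and recall-oriented; the reranker only ever sees the shortlist, so a passage that misses the top N cannot be rescued later.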

Qdrant indexing note: indexed_vectors_count: 0 with points_count > 0 is normal; HNSW indexing is deferred until data exceeds indexing_threshold. Lower the threshold in pipelines/rag/indexer.py for immediate indexing.

5 · Agent: query your second brain

CLI REPL (no memory)

python -m assistant.cli
python -m assistant.cli "What should I focus on this quarter?"

Streamlit GUI (with memory)

streamlit run assistant/app.py

Always run from the repo root.

[Screenshot: Streamlit UI]


🧠 Memory

File-based memory at memory/:

  • MEMORY.md: long-term distilled context (curated facts, preferences)
  • YYYY-MM-DD.md: daily conversation logs

Each turn is appended as:

### HH:MM
User: ...
Assistant: ...

The agent loads MEMORY.md + the most recent RECENT_LOG_DAYS daily logs into its system prompt per call, implemented via per-call Agent instantiation (pydantic-ai 1.x freezes system_prompt after construction).

Toggle: ENABLE_MEMORY=true|false.
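
The memory mechanics described above can be sketched with the standard library alone. This is an illustration, not the contents of assistant/memory.py: `append_turn` and `build_memory_context` are hypothetical names, and "most recent RECENT_LOG_DAYS daily logs" is interpreted here as the calendar days ending today, which is an assumption.

```python
import tempfile
from datetime import date, datetime, timedelta
from pathlib import Path


def append_turn(memory_dir: Path, user: str, assistant: str, now: datetime) -> Path:
    """Append one turn to the daily log named YYYY-MM-DD.md."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_path = memory_dir / f"{now:%Y-%m-%d}.md"
    entry = f"### {now:%H:%M}\nUser: {user}\nAssistant: {assistant}\n\n"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return log_path


def build_memory_context(memory_dir: Path, today: date, recent_log_days: int = 2) -> str:
    """Concatenate MEMORY.md and the recent daily logs, oldest first."""
    parts: list[str] = []
    long_term = memory_dir / "MEMORY.md"
    if long_term.exists():
        parts.append(long_term.read_text(encoding="utf-8").strip())
    for offset in range(recent_log_days - 1, -1, -1):
        log_path = memory_dir / f"{today - timedelta(days=offset):%Y-%m-%d}.md"
        if log_path.exists():
            parts.append(log_path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


# Demo in a throwaway directory so nothing touches a real memory/ folder.
with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp)
    (memory_dir / "MEMORY.md").write_text("Prefers concise answers.", encoding="utf-8")
    append_turn(memory_dir, "hi", "hello", datetime(2025, 1, 2, 9, 30))
    context = build_memory_context(memory_dir, today=date(2025, 1, 2))
print(context)
```

The returned string is what would be spliced into the system prompt of a freshly constructed agent on each call.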


📊 Evaluations

Two runners, same anchored rubric and golden test set; run either or both for cross-validation:

| Script | Framework | Purpose |
| --- | --- | --- |
| python -m evals.run_evals | Hand-rolled | Canonical. 4-criterion rubric (relevance, correctness, citation_quality, safety), LLM-as-judge via Ollama. |
| python -m evals.run_pydantic_evals | pydantic_evals | Same dataset + rubric via Evaluator + EvaluationReason. Runs all 3 tiers by default. |

Results written to evals/results/*.json. Set judge model via .env: JUDGE_MODEL=qwen3:8b.
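
As a rough picture of how four 1–5 criterion scores might be validated and gated, consider the sketch below. Only the criterion names come from the table above; `RubricScore`, the `passes` method, and both thresholds are hypothetical and do not describe the repo's actual rubric logic.

```python
from dataclasses import dataclass
from statistics import mean

CRITERIA = ("relevance", "correctness", "citation_quality", "safety")


@dataclass(frozen=True)
class RubricScore:
    scores: dict[str, int]

    def __post_init__(self) -> None:
        # Reject incomplete or out-of-range judge output early.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
        for name, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")

    @property
    def average(self) -> float:
        return mean(self.scores[name] for name in CRITERIA)

    def passes(self, min_each: int = 3, min_average: float = 4.0) -> bool:
        """Hypothetical gate: every criterion and the average must clear a floor."""
        lowest = min(self.scores[name] for name in CRITERIA)
        return lowest >= min_each and self.average >= min_average


good = RubricScore({"relevance": 5, "correctness": 4, "citation_quality": 4, "safety": 5})
weak = RubricScore({"relevance": 5, "correctness": 2, "citation_quality": 5, "safety": 5})
print(good.average, good.passes(), weak.passes())
```

A per-criterion floor stops a single bad dimension (for example correctness) from being averaged away by high scores elsewhere.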

Eval methodology

Follows the frameworks in extras/llm-eval-patterns.md and extras/prompt-eval-designer.md, moving from vibes-based to statistically anchored evaluation (pointwise rubrics, 3-tier test suites, CI gates). Built on: ai-engineering-from-scratch.


🔭 Observability: Arize Phoenix (optional)

extras/run_phoenix.py wires Phoenix OTel tracing. Every agent.run(...), LLM call, and tool call appears as a span in the Phoenix UI at http://localhost:6006.

pip install openinference-instrumentation-pydantic-ai opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-api
docker compose up -d phoenix
python -m extras.run_phoenix "your query here"

[Screenshot: Phoenix Trace UI]

Don't run Phoenix tracing and DeepEval's DeepEvalInstrumentationSettings simultaneously; both wrap the same pydantic-ai OTel hooks.


🐳 Docker Compose

docker compose up -d          # starts both Qdrant and Phoenix
docker compose up -d qdrant   # Qdrant only
docker compose up -d phoenix  # Phoenix only

docker-compose.yml:

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "32768:6333"
      - "6334:6334"
    volumes:
      - ./data/qdrant:/qdrant/storage
    restart: unless-stopped

  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"
      - "4317:4317"
    volumes:
      - ./data/phoenix:/mnt/data
    restart: unless-stopped

⚠️ OpenAI conflict (marker-pdf vs pydantic-ai)

The single biggest setup friction. pip check surfaces it as:

marker-pdf 1.10.2 requires openai<2.0.0, but you have openai 2.41.0

  • marker-pdf pins openai<2.0.0
  • pydantic-ai pulls openai>=2.0.0
  • They are incompatible in the same environment

Workaround: swap versions as needed (~10 seconds):

# Before running marker / ETL scripts
pip install "openai<2.0.0,>=1.65.2"

# Before running the agent / evals
pip install "openai>=2.0.0"

Only needed when you have new PDFs or images to OCR. Re-indexing already-cleaned markdown (run_rag.py) doesn't touch marker and doesn't need the swap.
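
A quick way to see which mode the current environment is in is to inspect the installed openai version. The sketch below uses only `importlib.metadata`; `mode_for_openai` and `current_mode` are hypothetical helper names, and the check looks at the major version only (it does not verify the >=1.65.2 lower bound).

```python
from importlib.metadata import PackageNotFoundError, version


def mode_for_openai(installed: str) -> str:
    """Return "marker" for openai 1.x and "agent" for openai 2.x or newer."""
    major = int(installed.split(".")[0])
    return "marker" if major < 2 else "agent"


def current_mode() -> str:
    """Report the mode implied by the openai package in this environment."""
    try:
        return mode_for_openai(version("openai"))
    except PackageNotFoundError:
        return "openai not installed"


print(current_mode())
```

Running this before a long marker or eval job is cheaper than discovering the mismatch from an import error halfway through.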

Decision tree

New PDFs / images to OCR?           → marker mode  (pip install "openai<2", run_marker.py)
Re-index existing cleaned markdown? → agent mode   (pip install "openai>=2", run_rag.py)
Chat / develop / run evals?         → agent mode

🛠️ Troubleshooting

| Symptom | Fix |
| --- | --- |
| Notion returns nothing | Pages not shared with integration: open page → ⋯ → Connections → add integration |
| Notion rate limits / 429s | Use ETL_PAGE_NAME="Single Page" for first-run validation |
| GPU offload (CPU %) | Set OLLAMA_NUM_GPU=99 at shell level, or recreate model via Modelfile |
| pydantic-ai 404 against Ollama | Use OllamaProvider(base_url="http://localhost:11434/v1"); the ollama:<model> shorthand routes to the wrong path |
| Streamlit import errors | Always cd to repo root before running |
| Eval JSON missing | Default output: evals/results/*.json; override with --output PATH |
| Memory not used | Check ENABLE_MEMORY=true; memory/MEMORY.md is auto-created on first run |
| Stale chunks after schema change | Set FORCE_REINDEX=true to drop and rebuild the Qdrant collection |

License

MIT; see LICENSE. This repo is meant to be used, adapted and improved upon based on individual user needs and system capabilities.
