MemSearchDemo.mp4
Automatic persistent memory for Claude Code. No commands to learn, no manual saving — just install the plugin and Claude remembers what you worked on across sessions.
Built on Claude Code's native Hooks, Skills, and CLI — no MCP servers, no sidecar services. Everything runs locally as shell scripts, a skill definition, and a Python CLI.
Default embedding changed to ONNX bge-m3 int8 — the plugin now runs entirely locally with no API key and no GPU required. Quality is comparable to OpenAI text-embedding-3-small (only ~1% lower on our benchmark). Existing users who want to switch: run memsearch config set embedding.provider onnx && memsearch index --force to re-index. See the evaluation README for detailed benchmark results and rationale.
graph LR
subgraph "memsearch (Python library)"
LIB[Core: chunker, embeddings,<br/>vector store, scanner]
end
subgraph "memsearch CLI"
CLI["CLI commands:<br/>search · index · watch<br/>expand · transcript · config"]
end
subgraph "plugins/claude-code (Claude Code Plugin)"
HOOKS["Shell hooks:<br/>SessionStart · UserPromptSubmit<br/>Stop · SessionEnd"]
SKILL["Skill:<br/>memory-recall (context: fork)"]
end
LIB --> CLI
CLI --> HOOKS
CLI --> SKILL
HOOKS -->|"runs inside"| CC[Claude Code]
SKILL -->|"subagent"| CC
style LIB fill:#dce8f5,stroke:#4a86c8,color:#1a2744
style CLI fill:#fae3d0,stroke:#d08040,color:#1a2744
style HOOKS fill:#d5f0d6,stroke:#4a9e4e,color:#1a2744
style CC fill:#e8d5f5,stroke:#9b59b6,color:#1a2744
The memsearch Python library provides the core engine (chunking, embedding, vector storage, search). The memsearch CLI wraps the library into shell-friendly commands. The Claude Code Plugin ties those CLI commands to Claude Code's hook lifecycle and skill system — hooks handle session management and memory capture, while the memory-recall skill handles intelligent retrieval in a forked subagent context.
sequenceDiagram
participant You
participant Claude as Claude Code
rect rgb(255, 230, 230)
note right of You: Without plugin
You->>Claude: Monday: "Add Redis caching with 5min TTL"
Claude->>You: ✅ Done — implements caching
note over Claude: Session ends. Context is gone.
You->>Claude: Wednesday: "The /orders endpoint is slow"
Claude->>You: ❌ Suggests solutions from scratch<br/>(forgot about the Redis cache from Monday)
end
rect rgb(220, 245, 220)
note right of You: With plugin
You->>Claude: Monday: "Add Redis caching with 5min TTL"
Claude->>You: ✅ Done — implements caching
note over Claude: Plugin auto-summarizes → memory/2026-02-10.md
You->>Claude: Wednesday: "The /orders endpoint is slow"
note over Claude: Plugin injects: "Added Redis caching<br/>middleware with 5min TTL..."
Claude->>You: ✅ "We already have Redis caching —<br/>let me add the /orders endpoint to it"
end
- Picking up where you left off. You debugged an auth issue yesterday but didn't finish. Today Claude remembers the root cause, which files you touched, and what you tried — no re-explaining needed.
- Recalling past decisions. "Why did we switch from JWT to session cookies?" Claude can trace back to the original conversation where the trade-offs were discussed, thanks to the 3-layer progressive disclosure.
- Long-running projects. Over days or weeks of development, architectural context accumulates automatically. Claude stays aware of your codebase conventions, past refactors, and resolved issues without you having to maintain a manual changelog.
# 1. In Claude Code, add the marketplace and install the plugin
/plugin marketplace add zilliztech/memsearch
/plugin install memsearch
# 2. Restart Claude Code to activate the plugin (exit and reopen)
# 3. Have a conversation, then exit. Check your memories:
cat .memsearch/memory/$(date +%Y-%m-%d).md
# 4. Start a new session — Claude automatically remembers!Note: The plugin defaults to the ONNX bge-m3 embedding model — no API key required, runs locally on CPU. This model was selected through a comprehensive benchmark of 12+ models on bilingual memory retrieval. If memsearch is not already installed, the plugin will install
memsearch[onnx]automatically viauvxon first run. To use a different embedding provider (e.g. OpenAI), set it withmemsearch config set embedding.provider openaiand export the required API key.
The plugin hooks into 4 Claude Code lifecycle events and provides a memory-recall skill. A singleton memsearch watch process runs in the background, keeping the vector index in sync with markdown files as they change. (Milvus Lite falls back to one-time indexing at session start.)
stateDiagram-v2
[*] --> SessionStart
SessionStart --> WatchRunning: start memsearch watch
SessionStart --> InjectRecent: load recent memories (cold start)
state WatchRunning {
[*] --> Watching
Watching --> Reindex: file changed
Reindex --> Watching: done
}
InjectRecent --> Prompting
state Prompting {
[*] --> UserInput
UserInput --> Hint: UserPromptSubmit hook
Hint --> ClaudeProcesses: "[memsearch] Memory available"
ClaudeProcesses --> MemoryRecall: needs context?
MemoryRecall --> Subagent: memory-recall skill [fork]
Subagent --> ClaudeResponds: curated summary
ClaudeProcesses --> ClaudeResponds: no memory needed
ClaudeResponds --> UserInput: next turn
ClaudeResponds --> Summary: Stop hook (async, non-blocking)
Summary --> WriteMD: append to YYYY-MM-DD.md
}
Prompting --> SessionEnd: user exits
SessionEnd --> StopWatch: stop memsearch watch
StopWatch --> [*]
| Hook | Type | Async | Timeout | What It Does |
|---|---|---|---|---|
| SessionStart | command | no | 10s | Start memsearch watch singleton, write session heading to today's .md, inject recent daily logs as cold-start context via additionalContext, display config status (provider/model/milvus) in systemMessage |
| UserPromptSubmit | command | no | 15s | Lightweight hint: returns systemMessage "[memsearch] Memory available" (skip if < 10 chars). No search — recall is handled by the memory-recall skill |
| Stop | command | yes | 120s | Extract last turn from transcript with parse-transcript.sh, call claude -p --model haiku to summarize (third-person notes), append summary with session/turn anchors to daily .md |
| SessionEnd | command | no | 10s | Stop the memsearch watch background process (cleanup) |
Fires once when a Claude Code session begins. This hook:
- Reads config and checks API key. Runs
memsearch config getto read the configured embedding provider, model, and Milvus URI. Checks whether the required API key is set for the provider (OPENAI_API_KEY,GOOGLE_API_KEY,VOYAGE_API_KEY,JINA_API_KEY,MISTRAL_API_KEY;onnx,ollama, andlocalneed no key). If missing, shows an error insystemMessageand exits early. - Starts the watcher. Launches
memsearch watch .memsearch/memory/as a singleton background process (PID file lock prevents duplicates). The watcher monitors markdown files and auto-re-indexes on changes with a 1500ms debounce. Milvus Lite falls back to a one-timememsearch indexat session start. - Writes a session heading. Appends
## Session HH:MMto today's memory file (.memsearch/memory/YYYY-MM-DD.md), creating the file if it does not exist. - Injects cold-start context. Reads the last 30 lines from the 2 most recent daily logs and returns them as
additionalContext. This gives Claude awareness of recent sessions, which helps it decide when to invoke the memory-recall skill. - Checks for updates. Queries PyPI (2s timeout) and compares with the installed version. If a newer version is available, appends an
UPDATEhint to the status line. - Displays config status. Every exit path returns a
systemMessageshowing the active configuration, e.g.[memsearch v0.1.10] embedding: openai/text-embedding-3-small | milvus: ~/.memsearch/milvus.db(with| UPDATE: v0.1.12 availablewhen outdated).
Fires on every user prompt before Claude processes it. This hook:
- Extracts the prompt from the hook input JSON.
- Skips short prompts (under 10 characters) — greetings and single words don't need memory hints.
- Returns a lightweight hint. Outputs
systemMessage: "[memsearch] Memory available"— a visible one-liner that keeps Claude aware of the memory system without performing any search.
The actual memory retrieval is handled by the memory-recall skill, which Claude invokes automatically when it judges the user's question needs historical context.
Fires after Claude finishes each response. Runs asynchronously so it does not block the user. This hook:
- Guards against recursion. Checks
stop_hook_activeto prevent infinite loops (since the hook itself callsclaude -p). - Validates the transcript. Skips if the transcript file is missing or has fewer than 3 lines.
- Extracts the last turn. Calls
parse-transcript.sh(Python3 inline, nojqdependency), which finds the last real user message and extracts everything from there to EOF. Skips progress,file-history-snapshot, system, and thinking blocks. Formats output with clear role labels ([Human],[Claude Code],[Claude Code calls tool],[Tool output]/[Tool error]) so the summarizer treats the content as a third-party transcript. Tool results are truncated to 1000 characters (configurable viaMEMSEARCH_MAX_RESULT_CHARS). - Summarizes with Haiku. Pipes the parsed turn to
CLAUDECODE= claude -p --model haiku --no-session-persistencewith an external-observer system prompt requesting 2-6 third-person bullet points in the same language as the user's message. - Appends to daily log. Writes a
### HH:MMsub-heading with an HTML comment anchor containing session ID, turn UUID, and transcript path. Then runsmemsearch indexto ensure immediate indexing.
Fires when the user exits Claude Code. Calls stop_watch to kill the memsearch watch process and clean up the PID file, including a sweep for any orphaned processes.
Memory retrieval uses a three-layer progressive disclosure model, all handled autonomously by the memory-recall skill running in a forked subagent context. Claude invokes the skill when it judges the user's question needs historical context — no manual intervention required.
graph TD
SKILL["memory-recall skill<br/>(context: fork subagent)"]
SKILL --> L1["L1: Search<br/>(memsearch search)"]
L1 --> L2["L2: Expand<br/>(memsearch expand)"]
L2 --> L3["L3: Transcript drill-down<br/>(memsearch transcript)"]
L3 --> RETURN["Curated summary<br/>→ main agent"]
style SKILL fill:#dce8f5,stroke:#4a86c8,color:#1a2744
style L1 fill:#dce8f5,stroke:#4a86c8,color:#1a2744
style L2 fill:#fae3d0,stroke:#d08040,color:#1a2744
style L3 fill:#f5d5d5,stroke:#c04040,color:#1a2744
style RETURN fill:#d5f0d6,stroke:#4a9e4e,color:#1a2744
When Claude detects that a user's question could benefit from past context, it automatically invokes the memory-recall skill. The skill runs in a forked subagent context (context: fork), meaning it has its own context window and does not pollute the main conversation. The subagent:
- Searches for relevant memories using
memsearch search - Evaluates which results are truly relevant (skips noise)
- Expands promising results with
memsearch expandto get full markdown sections - Drills into transcripts when needed with
memsearch transcript - Returns a curated summary to the main agent
The main agent only sees the final summary — all intermediate search results, raw expand output, and transcript parsing happen inside the subagent.
Users can manually invoke the skill:
/memory-recall what did we discuss about the auth refactor?
Or just ask naturally — Claude auto-invokes the skill when it senses the question needs history:
We refactored the auth module last week, what was the approach?
The subagent runs memsearch search to find relevant chunks from the indexed memory files.
For promising search results, the subagent runs memsearch expand to retrieve the full markdown section surrounding a chunk:
$ memsearch expand 7a3f9b21e4c08d56Example output:
Source: .memsearch/memory/2026-02-10.md (lines 12-32)
Heading: 09:15
Session: abc123de-f456-7890-abcd-ef1234567890
Turn: def456ab-cdef-1234-5678-90abcdef1234
Transcript: /home/user/.claude/projects/.../abc123de...7890.jsonl
### 08:50
<!-- session:abc123de... turn:aaa11122... transcript:/.../abc123de...7890.jsonl -->
- Set up project scaffolding for the new API service
- Configured FastAPI with uvicorn, added health check endpoint
- Connected to PostgreSQL via SQLAlchemy async engine
### 09:15
<!-- session:abc123de... turn:def456ab... transcript:/.../abc123de...7890.jsonl -->
- Added Redis caching middleware to API with 5-minute TTL
- Used redis-py async client with connection pooling (max 10 connections)
- Cache key format: `api:v1:{endpoint}:{hash(params)}`
- Added cache hit/miss Prometheus counters for monitoring
- Wrote integration tests with fakeredis
When Claude needs the original conversation verbatim — exact code snippets, error messages, or tool outputs — it drills into the JSONL transcript.
List all turns in a session:
$ memsearch transcript /path/to/session.jsonlAll turns (73):
6d6210b7-b84 08:50:14 Set up the project scaffolding for... [12 tools]
3075ee94-0f6 09:05:22 Can you add a health check endpoint?
8e45ce0d-9a0 09:15:03 Add a Redis caching layer to the API... [8 tools]
53f5cac3-6d9 09:32:41 The cache TTL should be configurable... [3 tools]
c708b40c-8f8 09:45:18 Let's add Prometheus metrics for cache... [10 tools]
Drill into a specific turn with surrounding context:
$ memsearch transcript /path/to/session.jsonl --turn 8e45ce0d --context 1Showing 2 turns around 8e45ce0d:
>>> [09:05:22] 3075ee94
Can you add a health check endpoint?
**Assistant**: Sure, I'll add a `/health` endpoint that checks the database
connection and returns the service version.
>>> [09:15:03] 8e45ce0d
Add a Redis caching layer to the API with a 5-minute TTL.
**Assistant**: I'll add Redis caching middleware. Let me first check
your current dependencies and middleware setup.
[Read] requirements.txt
[Read] src/middleware/__init__.py
[Write] src/middleware/cache.py
[Edit] src/main.py — added cache middleware to app
Each memory summary includes an HTML comment anchor that links the chunk back to its source session, enabling the L2-to-L3 drill-down:
### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->
- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinload
- Decided to use Prometheus counters for cache hit/miss metricsThe anchor contains three fields:
| Field | Description |
|---|---|
session |
Claude Code session ID (also the JSONL filename without extension) |
turn |
UUID of the last user turn in the session |
transcript |
Absolute path to the JSONL transcript file |
Claude extracts these fields from memsearch expand --json-output and uses them to call memsearch transcript for L3 access.
All memories live in .memsearch/memory/ inside your project directory:
your-project/
├── .memsearch/
│ ├── .watch.pid <-- singleton watcher PID file
│ └── memory/
│ ├── 2026-02-07.md <-- daily memory log
│ ├── 2026-02-08.md
│ └── 2026-02-09.md <-- today's session summaries
└── ... (your project files)
Each file contains session summaries in plain markdown:
## Session 14:30
### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->
- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinload
- Decided to use Prometheus counters for cache hit/miss metrics
## Session 17:45
### 17:45
<!-- session:mno456pqr turn:stu012vwx transcript:/home/user/.claude/projects/.../mno456pqr.jsonl -->
- Debugged React hydration mismatch caused by Date.now() during SSR
- Added comprehensive test suite for the caching middlewareMarkdown is the source of truth. The Milvus vector index is a derived cache that can be rebuilt at any time with memsearch index .memsearch/memory/.
claude-mem is another memory solution for Claude Code. Here is a detailed comparison:
| Aspect | memsearch | claude-mem |
|---|---|---|
| Architecture | 4 shell hooks + 1 skill + 1 watch process | 5 JS hooks + 1 skill + MCP tools + Express worker service (port 37777) + React viewer |
| Integration | Native hooks + skill + CLI — no MCP, no sidecar service | Hooks + skill + MCP tools + HTTP worker service |
| Memory recall | Skill in forked subagent — memory-recall runs in context: fork, intermediate results stay isolated from main context |
Skill + MCP hybrid — mem-search skill for auto-recall, plus 5 MCP tools (search, timeline, get_observations, save_memory, ...) for explicit access |
| Progressive disclosure | 3-layer in subagent: search → expand → transcript, all in forked context — only curated summary reaches main conversation | 3-layer: mem-search skill for auto-recall; MCP tools for explicit drill-down |
| Session capture | 1 async claude -p --model haiku call at session end |
AI observation compression on every tool use (PostToolUse hook) + session summary |
| Vector backend | Milvus — hybrid search (dense + BM25 + RRF), scales from embedded to distributed cluster | ChromaDB — dense only; SQLite FTS5 for keyword search (separate, not fused) |
| Embedding model | Pluggable: OpenAI, Google, Voyage, Jina, Mistral, Ollama, local, ONNX (default: bge-m3 int8) | Fixed: all-MiniLM-L6-v2 (384-dim, WASM backend) |
| Storage format | Transparent .md files — human-readable, git-friendly |
SQLite database + ChromaDB binary |
| Data portability | Copy .memsearch/memory/*.md and rebuild index |
Export from SQLite + ChromaDB |
| Runtime dependency | Python (memsearch CLI) + claude CLI |
Node.js / Bun + Express worker service |
| Context window cost | No MCP tool definitions; skill runs in forked context — only curated summary enters main context | MCP tool definitions permanently loaded + each MCP tool call/result consumes main context |
Both projects use hooks for session lifecycle and skills for memory recall. The architectural divergence is in how retrieval interacts with the main context window.
memsearch runs memory recall in a forked subagent (context: fork). The memory-recall skill gets its own isolated context window — all search, expand, and transcript operations happen there. Only the curated summary is returned to the main conversation. This means: (1) intermediate search results never pollute the main context, (2) multi-step retrieval is autonomous, and (3) no MCP tool definitions consume context tokens.
claude-mem combines a mem-search skill with MCP tools (search, timeline, get_observations, save_memory). The MCP tools give Claude explicit control over memory access in the main conversation, at the cost of tool definitions permanently consuming context tokens. The PostToolUse hook also records every tool call as an observation, providing richer per-action granularity but incurring more API calls.
The other key difference is storage philosophy: memsearch treats markdown files as the source of truth (human-readable, git-friendly, rebuildable), while claude-mem uses SQLite + ChromaDB (opaque but structured, with richer queryable metadata).
Claude Code has built-in memory features: CLAUDE.md files and auto-memory (the /memory command). Here is why memsearch provides a stronger solution:
| Aspect | Claude Native Memory | memsearch |
|---|---|---|
| Storage | Single CLAUDE.md file (or per-project) |
Unlimited daily .md files with full history |
| Recall mechanism | File is loaded at session start (no search) | Skill-based semantic search — Claude auto-invokes when context is needed |
| Granularity | One monolithic file, manually edited | Per-session bullet points, automatically generated |
| Search | None — Claude reads the whole file or nothing | Hybrid semantic search (dense + BM25) returning top-k relevant chunks |
| History depth | Limited to what fits in one file | Unlimited — every session is logged, every entry is searchable |
| Automatic capture | /memory command requires manual intervention |
Fully automatic — hooks capture every session |
| Progressive disclosure | None — entire file is loaded into context | 3-layer model (L1 auto-inject, L2 expand, L3 transcript) minimizes context usage |
| Deduplication | Manual — user must avoid adding duplicates | SHA-256 content hashing prevents duplicate embeddings |
| Portability | Tied to Claude Code's internal format | Standard markdown files, usable with any tool |
CLAUDE.md is a blunt instrument: it loads the entire file into context at session start, regardless of relevance. As the file grows, it wastes context window on irrelevant information and eventually hits size limits. There is no search — Claude cannot selectively recall a specific decision from three weeks ago.
memsearch solves this with skill-based semantic search and progressive disclosure. When Claude judges that historical context would help, it auto-invokes the memory-recall skill, which runs in a forked subagent and autonomously searches, expands, and curates relevant memories. History can grow indefinitely without degrading performance, because the vector index handles the filtering. And the three-layer model (search → expand → transcript) runs entirely in the subagent, keeping the main context window clean.
plugins/claude-code/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest (name, version, description)
├── hooks/
│ ├── hooks.json # Hook definitions (4 lifecycle hooks)
│ ├── common.sh # Shared setup: env, PATH, memsearch detection, watch management
│ ├── session-start.sh # Start watch + write session heading + inject cold-start context
│ ├── user-prompt-submit.sh # Lightweight systemMessage hint ("[memsearch] Memory available")
│ ├── stop.sh # Extract last turn → haiku summary (third-person) → append to daily .md
│ ├── parse-transcript.sh # Extract last turn from JSONL, format with role labels (Python3, no jq)
│ └── session-end.sh # Stop watch process (cleanup)
└── skills/
└── memory-recall/
└── SKILL.md # Memory retrieval skill (context: fork subagent)
The plugin is built entirely on the memsearch CLI — every hook is a shell script calling memsearch subcommands:
| Command | Used By | What It Does |
|---|---|---|
search <query> |
memory-recall skill | Semantic search over indexed memories (--top-k for result count, --json-output for JSON) |
watch <paths> |
SessionStart hook | Background watcher that auto-indexes on file changes (1500ms debounce) |
index <paths> |
Manual / rebuild | One-shot index of markdown files (--force to re-index all) |
expand <chunk_hash> |
memory-recall skill (L2) | Show full markdown section around a chunk, with anchor metadata |
transcript <jsonl> |
memory-recall skill (L3) | Parse Claude Code JSONL transcript into readable conversation turns |
config init |
Quick Start | Interactive config wizard for first-time setup |
stats |
Manual | Show index statistics (collection size, chunk count) |
reset |
Manual | Drop all indexed data (requires --yes to confirm) |
For the full CLI reference, see the CLI Reference docs.
For contributors or if you want to modify the plugin locally:
git clone https://github.com/zilliztech/memsearch.git
cd memsearch && uv sync
claude --plugin-dir ./plugins/claude-codeThe plugin provides several observability mechanisms, from always-on status lines to opt-in debug logging. Work from the top down — most issues are resolved by the first two sections.
| Mechanism | Always On? | What You See | Best For |
|---|---|---|---|
| SessionStart status line | Yes | [memsearch v0.1.11] embedding: openai/... | milvus: ... |
Config errors, version checks |
| Debug mode | No | Full hook JSON in ~/.claude/logs/ |
Hook execution, additionalContext |
| CLI diagnostic commands | Manual | Config, index stats, search results | Config verification, search testing |
| Watch process | Yes (background) | PID file at .memsearch/.watch.pid |
Index sync issues |
| Skill execution | Yes (in UI) | Skill invocation + Bash tool calls | Memory recall debugging |
| Memory files | Yes | .memsearch/memory/YYYY-MM-DD.md |
Stop hook, summary quality |
Every session starts with a status line in systemMessage. This is the first thing to check when something seems wrong.
Hooks communicate with Claude Code by returning JSON. Two key fields:
systemMessage— A visible info line shown in the terminal, like a status bar.additionalContext— Invisible to the user; injected into Claude's context silently. Only appears in debug logs (claude --debug).
Here is what a session looks like with the plugin installed:
✻
|
▟█▙ Claude Code v2.x.x
▐▛███▜▌ Model · Plan
▝▜█████▛▘ ~/my-project
▘▘ ▝▝
⎿ SessionStart:startup says: [memsearch v0.1.11] ← systemMessage
embedding: openai/text-embedding-3-small | milvus: (SessionStart hook)
~/.memsearch/milvus.db
❯ How does the caching layer work?
⎿ UserPromptSubmit says: [memsearch] Memory available ← systemMessage
(UserPromptSubmit hook)
✶ Thinking…
The SessionStart hook also loads the 2 most recent daily logs as additionalContext — Claude reads this silently to decide when to invoke the memory-recall skill, but you won't see it in the terminal.
Normal:
[memsearch v0.1.11] embedding: openai/text-embedding-3-small | milvus: ~/.memsearch/milvus.db
API key missing:
[memsearch v0.1.11] embedding: openai/text-embedding-3-small | milvus: ~/.memsearch/milvus.db | ERROR: OPENAI_API_KEY not set — memory search disabled
Update available:
[memsearch v0.1.11] embedding: openai/text-embedding-3-small | milvus: ~/.memsearch/milvus.db | UPDATE: v0.1.12 available — run: pip install --upgrade 'memsearch[onnx]'
The plugin checks for the required API key at session start. If missing, memory recording still writes .md files, but semantic search and indexing are disabled.
| Provider | Required environment variable |
|---|---|
onnx (plugin default) |
None (local, CPU) |
openai (Python API default) |
OPENAI_API_KEY |
google |
GOOGLE_API_KEY |
voyage |
VOYAGE_API_KEY |
jina |
JINA_API_KEY |
mistral |
MISTRAL_API_KEY |
ollama |
None (local) |
local |
None (local) |
Fix: export the key for your configured provider:
# The plugin defaults to onnx (no key needed). If you use OpenAI:
export OPENAI_API_KEY="sk-..."
memsearch config set embedding.provider openaiTo make it permanent, add the export to your ~/.bashrc, ~/.zshrc, or equivalent.
The plugin checks PyPI at session start (2s timeout) and shows this hint when a newer version exists. The hint now includes the exact upgrade command, auto-detected from your installation method:
UPDATE: v0.1.15 available — run: pip install --upgrade 'memsearch[onnx]'
UPDATE: v0.1.15 available — run: uv tool upgrade 'memsearch[onnx]'
| Install method | Upgrade command shown |
|---|---|
pip install memsearch[onnx] |
pip install --upgrade 'memsearch[onnx]' |
uv tool install memsearch[onnx] |
uv tool upgrade 'memsearch[onnx]' |
uvx (auto) |
uvx --upgrade --from 'memsearch[onnx]' memsearch --version |
Note:
uvxusers get automatic upgrades — the plugin runsuvx --upgradeon every bootstrap with the[onnx]extra. TheUPDATEhint primarily helpspip/uv toolusers who have no automatic update mechanism.
Claude Code's --debug flag enables verbose logging for all hooks.
Start Claude Code with debug logging:
claude --debugLog location: ~/.claude/logs/ (timestamped files)
What to look for in the logs:
# See all hook outputs (additionalContext, systemMessage, etc.)
grep -A 5 'hook' ~/.claude/logs/*.log
# Check SessionStart output specifically
grep -A 10 'SessionStart' ~/.claude/logs/*.log
# See what additionalContext was injected
grep 'additionalContext' ~/.claude/logs/*.logEach hook outputs JSON to stdout. In debug mode, you can see the raw JSON — useful for verifying that additionalContext (cold-start memories) and systemMessage (status line) are being returned correctly.
These commands work outside of Claude Code — run them directly in your terminal.
Verify resolved configuration:
memsearch config list --resolvedShows the effective config after merging all layers (defaults → ~/.memsearch/config.toml → .memsearch.toml → env vars). Check that embedding.provider, embedding.model, and milvus.uri are what you expect.
Check index health:
memsearch statsShows collection name, chunk count, and embedding dimensions. If the count is 0 or unexpectedly low, re-index:
memsearch index .memsearch/memory/ --forceTest search manually:
memsearch search "your query here" --top-k 5If this returns no results but stats shows chunks exist, the issue is likely with embeddings (wrong API key, different model than what was used for indexing).
Expand a specific chunk:
memsearch expand <chunk_hash>Retrieves the full markdown section surrounding a chunk, including session anchors. Useful for verifying that the L2 expand layer works.
Trace back to original conversation:
memsearch transcript /path/to/session.jsonl
memsearch transcript /path/to/session.jsonl --turn <uuid> --context 3Lists all turns or drills into a specific turn. The transcript path is embedded in session anchors (the <!-- session:... transcript:... --> HTML comments in memory files).
The memsearch watch singleton runs in the background, auto-re-indexing when memory files change.
PID file location: .memsearch/.watch.pid
Check if it's running:
cat .memsearch/.watch.pid && kill -0 $(cat .memsearch/.watch.pid) 2>/dev/null && echo "running" || echo "not running"Restart manually:
# Kill existing watch (if any) and start fresh
kill $(cat .memsearch/.watch.pid) 2>/dev/null; rm -f .memsearch/.watch.pid
memsearch watch .memsearch/memory/ &
echo $! > .memsearch/.watch.pidSweep for orphaned processes:
pgrep -f "memsearch watch" && echo "found orphans" || echo "clean"The watch process is started by SessionStart and stopped by SessionEnd. If Claude Code crashes or is killed with SIGKILL, the SessionEnd hook won't fire and the process may become orphaned. The next SessionStart always stops any existing watch before starting a new one.
Note: Milvus Lite does not support concurrent access, so the plugin falls back to one-time indexing at session start instead of a persistent watcher.
Want real-time indexing? Switch to Zilliz Cloud — no Docker, no ops, free tier available. Just set your URI and token:
memsearch config set milvus.uri "https://in03-xxx.api.gcp-us-west1.zillizcloud.com" memsearch config set milvus.token "your-api-key"The next Claude Code session will automatically use real-time
watchindexing. See the backend comparison for details.
When Claude decides past context is needed, it invokes the memory-recall skill. You can observe the three progressive disclosure layers in the Claude Code UI:
╭─ memory-recall ─╮
│ │
│ ● Searching for relevant memories... │
│ │
│ $ memsearch search "redis caching" --top-k 5 │
│ → 3 results found │
│ │
│ $ memsearch expand 7a3f9b21e4c08d56 │
│ → Full section from 2026-02-10.md │
│ │
│ Summary: Found relevant context about Redis caching... │
│ │
╰──────────────────────────────────────────────────────────╯
The skill runs in a forked subagent (context: fork), so its intermediate work does not pollute your main conversation context.
Force a skill invocation for debugging:
/memory-recall <your query>
This manually triggers the skill, bypassing Claude's judgment about whether memory is needed.
Skill not triggering automatically? Possible reasons:
- Claude judged that the question doesn't need historical context — this is by design
- The
UserPromptSubmithint ([memsearch] Memory available) didn't fire — check that the prompt is ≥ 10 characters memsearchis not installed or not in PATH — theUserPromptSubmithook returns{}whenMEMSEARCH_CMDis empty
All memories are stored as plain markdown in .memsearch/memory/.
Directory location: .memsearch/memory/ (project-scoped)
File format: One file per day, named YYYY-MM-DD.md:
## Session 14:30
### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->
- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinloadVerify the Stop hook is working:
# Check if today's file exists and has content
cat .memsearch/memory/$(date +%Y-%m-%d).md
# Check if recent sessions have summaries (not just headings)
tail -20 .memsearch/memory/$(date +%Y-%m-%d).mdIf you see ## Session HH:MM headings but no ### HH:MM sub-headings with bullet points underneath, the Stop hook is not completing successfully. Common causes:
claudeCLI not found — the Stop hook callsclaude -p --model haikuto summarize- API key missing — the Stop hook skips summarization when the embedding provider key is not set
- Transcript too short — sessions with fewer than 3 JSONL lines are skipped
| Symptom | Check | Section |
|---|---|---|
| "ERROR: <KEY> not set" in status line | Export the required API key for your provider | §1 |
| "UPDATE: v0.x.x available" in status line | Upgrade memsearch | §1 |
| First session hangs or memory search unavailable | The ONNX model (~558 MB) is downloading in the background. See §7 | §7 |
| Search returns no results | Run memsearch stats and memsearch search manually |
§3 |
| New memories not being indexed | Check watch process is running | §4 |
| Claude never invokes memory recall | Try /memory-recall <query> manually |
§5 |
| Session summaries missing from memory files | Check claude CLI is available and API key is set |
§6 |
The plugin defaults to the ONNX bge-m3 int8 embedding model, which runs locally on CPU with no API key required. On the very first session, this model (~558 MB) needs to be downloaded from HuggingFace Hub. The download runs in the background during session start, and during this time memory search may be temporarily unavailable.
Symptoms:
- First session appears to hang after sending a prompt (the background download is blocking Milvus Lite)
[memsearch] Memory availablehint appears but memory recall returns no resultsmemsearch searchormemsearch indexcommands hang on first run
Pre-download the model manually:
# This triggers the model download without starting a Claude session
uvx --from 'memsearch[onnx]' memsearch search --provider onnx "warmup" 2>/dev/null || trueIf the download is slow or stuck:
HuggingFace Hub may be slow or inaccessible from certain networks. Set the HF_ENDPOINT environment variable to use a mirror:
export HF_ENDPOINT=https://hf-mirror.com
uvx --from 'memsearch[onnx]' memsearch search --provider onnx "warmup" 2>/dev/null || trueTo make this permanent, add export HF_ENDPOINT=https://hf-mirror.com to your ~/.bashrc or ~/.zshrc.
After the first download: The model is cached locally at ~/.cache/huggingface/hub/ and all subsequent sessions load it instantly from disk with no network access required.