Skip to content

Releases: Sardor-M/lumen

v0.3.0: Graph connectivity + managed sync daemon

29 May 08:55
fb2ab9d

Choose a tag to compare

v0.3.0: Graph connectivity + managed sync daemon

Two additive, backward-compatible features. The compiler now keeps the knowledge graph connected across sources instead of forming per-source islands, and cross-device sync gets a first-class managed daemon so you no longer hand-edit launchd/systemd templates.

npm install -g lumen-kb@0.3.0

Connect the graph across sources (#52)

Previously each compile pass coined slugs in isolation and cross-source edges formed only by coincidence - the graph fragmented into disconnected subgraphs. Now:

  • Known-concepts hint — before compiling a source, the most-mentioned active concepts (top 80, retired excluded) are inlined into the LLM prompt as a "reuse these slugs verbatim" block, so the same idea gets the same slug across sources. Prompt overhead stays under ~2k tokens.
  • Global edge resolver — every edge endpoint resolves against the whole brain: exact in-pass match → alias-aware DB match → fuzzy slug match (same Levenshtein threshold as dedup, with a length pre-filter). Edges pointing at concepts from prior sources are no longer silently dropped.
  • edges_dropped is now reported alongside edges_created in the compilation result.

Managed sync daemon (Tier 6, #26 / #28 / #29)

A first-class daemon replaces the manual launchd/systemd templates from 0.2.0:

lumen sync daemon install      # detect platform, generate + load the unit
lumen sync daemon status       # installed? managed vs. manual? PID alive?
lumen sync daemon uninstall    # bootout/disable + clean up
  • Adaptive cadence — Active (~30s) when there's pending push work or recent pulls returned rows; Idle (~300s) after a few empty pulls.
  • Push debounce — bursts of journal writes (e.g. during a long lumen compile) coalesce into a single push; pull stays on schedule.
  • Manual-shape detection--replace-manual detects the PR #27 templates (StartInterval / Type=oneshot), unloads them, and installs the managed daemon in their place.

Upgrade

npm install -g lumen-kb@0.3.0

# If you copied the manual sync templates from 0.2.0:
lumen sync daemon install --replace-manual

Fully backward compatible — no schema migration, no config changes required.

Full changelog: https://github.com/Sardor-M/lumen/blob/main/CHANGELOG.md

v0.2.0 — Tier 5 cross-device sync

07 May 14:22

Choose a tag to compare

v0.2.0 — Tier 5 cross-device sync

End-to-end-encrypted multi-device sync. Five additive sub-tiers (5a–5e) shipping as a single coherent feature. Local-first remains the default — sync is opt-in, every payload is sealed with X25519 + XChaCha20-Poly1305 on-device, and the relay holds opaque ciphertext only.

npm install -g lumen-kb@0.2.0

What you can do now

# Device A:
lumen sync init --relay https://lumen-relay.<your-account>.workers.dev
lumen sync enable
lumen sync show-key --reveal       # share securely

# Device B:
lumen sync import-key "<base64>" --relay <same URL>
lumen sync enable
lumen sync run                     # push → pull → apply

Concepts, compiled-truth updates, +1/−1 feedback, retirements, and trajectories all sync across your laptops. New devices coming online a year later replay the entire history from one master key.

What's in the box

  • Append-only sync journal (sync_journal, schema v15) — every concept-touching mutation atomically journals alongside its entity write
  • X25519 + XChaCha20-Poly1305 encryption envelope — pure-TS, audited noble-suite deps, no native bindings
  • Relay HTTP client + driver with retry/backoff + circuit-breaker after 5 consecutive failures
  • Reference Cloudflare Worker relay (apps/relay/) — D1-backed, ~600 LOC, deployable in three wrangler commands, zero-knowledge by construction
  • Per-op apply rules (schema v16) — applyConceptCreate, applyTrajectory, applyFeedback, applyTruthUpdate, applyRetire — last-write-wins on truth_update with concept_truth_history audit trail
  • lumen sync CLIinit / enable/disable / push/pull/apply/run / status / reset-error / show-key/import-key/forget-key
  • 23 MCP tools (was 19) — brain_feedback, retire_skill, capture_trajectory, replay_skill join the existing surface

Tests

  • 887 passing, 0 regressions (was 740 in 0.1.4)
  • 132 new sync tests across 7 files (journal, crypto, keyring, relay-client, driver, apply, e2e)
  • 25 relay tests via @cloudflare/vitest-pool-workers (real workerd + miniflare D1/KV, no mocks)
  • 25 edge-case tests covering LWW chains, multi-device feedback/retire, mixed-op same-slug batches, orchestrator robustness, scope-aware apply

Schema migrations

Both purely additive — no rewrite of existing rows, existing tables and prior tests untouched.

  • v15sync_state singleton + sync_journal append-only log
  • v16concept_truth_history (LWW audit) + concept_feedback.sync_id partial UNIQUE INDEX

Honest limitations

  • What syncs: concepts, feedback, truth updates, retirements, trajectories
  • What doesn't: lumen add source content, embeddings, chunks — local-first stays local-first
  • No realtime: sync runs on demand via lumen sync run; a daemon is Tier 6
  • LWW only: no vector clocks / CRDTs in v1; flat-timestamp last-write-wins is honest enough at our scale
  • Multi-device key share: currently show-key --reveal + import-key <base64>; QR / BIP39 / age-file is Tier 6
  • Tombstone propagation: DELETE removes the relay blob and tombstones the relay row, but other devices learn about the deletion only via their own DELETE (Tier 6)

Migration

lumen upgrades automatically — schema v16 applies on first DB open after upgrade. No action required unless you want to enable sync (opt-in):

npm install -g lumen-kb@0.2.0
lumen sync init --relay <url>          # only if you want sync

Existing concept_feedback rows keep sync_id = NULL (the partial unique index doesn't apply to them).

Acknowledgements

Crypto primitives via @noble/ciphers and @noble/curves — pure-TS, audited, no native deps.

v0.1.4 — code/dataset/image ingest + Obsidian connector

23 Apr 08:16

Choose a tag to compare

What's in v0.1.4

  • Code repositories (source_type: code) — lumen add <github-url> shallow-clones into tmpdir and cleans up;
    / .next / target, truncates files over 50 KB, caps at 800 files per repo. Regex signature extraction for
    JS/TS/Python/Go/Rust/Java/C/C++/Ruby/C#. README.md / CONTRIBUTING.md / docs/ ordered before source sections.
  • Datasets (source_type: dataset) — lumen add ./data.csv or lumen add https://huggingface.co/datasets/<id>.
    Native CSV / TSV / JSONL parsing with delimiter auto-detection, quoted-field + doubled-quote escapes; HuggingFace datasets
    fetched via /api/datasets/ plus /raw/main/README.md for the card. Schema table (column, inferred type, null count) +
    20-row preview rendered as markdown. Colocated README.md / dataset-card.md auto-inlined.
  • Images (source_type: image) — lumen add screenshot.png. SHA-256 of bytes into metadata, MIME inferred from
    extension. Optional OCR shells out to a local tesseract binary when on PATH; --no-ocr stores metadata only. Missing
    binary produces a brew install hint instead of failing. Supports .png / .jpg / .webp / .gif / .bmp / .tiff.
  • Obsidian connector (ConnectorType: obsidian) — lumen watch add obsidian ~/vault. Parses Web Clipper YAML
    frontmatter, promotes the source: URL into the ExtractionResult so re-clippings dedup by URL. Skips .obsidian and
    other dotfiles. Flat keys, inline arrays, block-list arrays all supported.
  • Auto-detection in detectSourceType: GitHub repo URL → code, HuggingFace dataset URL → dataset, .csv / .tsv
    / .jsonl / .ndjsondataset, image extensions → image, directory containing .gitcode.
  • New CLI flags: --as-dataset (force dataset handling for ambiguous text), --no-ocr (skip OCR on images).
    lib/add.ts AddInput object form accepts an options field threading IngestOptions through the programmatic path.
  • 77 new tests across tests/code.test.ts (20), tests/dataset.test.ts (20), tests/image.test.ts (12, one
    conditionally skipped when tesseract is absent), tests/obsidian.test.ts (19). Full suite: 536 passing, 2 skipped.

Changed

  • SourceType union widened to 'url' | 'pdf' | 'youtube' | 'arxiv' | 'file' | 'folder' | 'code' | 'dataset' | 'image'.
    No schema migration — existing source_type TEXT + metadata TEXT columns accept the new shapes.
  • ConnectorType union widened with 'obsidian'.
  • ingestInput(input, options?) — optional IngestOptions second argument, backwards-compatible.

Fixes

  • parseJsonl: SCHEMA_SAMPLE_ROWS cap now counts only successfully-parsed object rows (previously incremented on
    malformed lines too).
  • dataset tests: global.fetch stubs switched to vi.stubGlobal + vi.unstubAllGlobals so they don't leak across test
    files.
  • obsidian connector: parseTarget and pull reject subdir paths that escape the vault via .. or absolute paths.
  • add command: --type flag is now actually threaded to the extractor as forcedType.
  • add command: action body wrapped in top-level try/catch for consistent error handling.

Deferred

Tracked in docs/docs-temp/INGEST-EXPANSION-PLAN.md:

  • Tree-sitter-based code parsing — avoids native-build dependency for now.
  • Claude Vision caption pass on compile for images — metadata.caption reserved as placeholder.
  • Native Parquet support — errors with a duckdb conversion hint instead of parsing.
  • In-house browser extension — Obsidian path validates the flow first.

Install

npm install -g lumen-kb

PRs

  • #10 — code/dataset/image ingest formats + Obsidian connector + v0.1.4 CHANGELOG

Full changelog: https://github.com/Sardor-M/Lumen/blob/main/CHANGELOG.md

v0.1.3

22 Apr 01:49

Choose a tag to compare

Landing page, shared packages, benchmarks runner, CI and lint fixes.

What's in v0.1.3

  • Landing page app with interactive graph visualization, knowledge model demo, and agent wiring walkthrough
  • Shared packages: @lumen/ui (useClipboard), @lumen/eslint-config, @lumen/tsconfig, @lumen/brand
  • Benchmarks runner with 6 test suites: search quality, search latency, graph ops, ingest, adversarial, MCP contract
  • npm/GitHub badges in README, npm install -g lumen-kb as primary install
  • Auto-review diff limit raised to 4000 lines

Fixes

  • useClipboard: await writeText Promise before setting copied state
  • @eslint/eslintrc declared as explicit dependency
  • bench script routes through workspace tsx
  • @claude workflow: allowed_tools moved into claude_args as --allowedTools
  • Landing JSX comment lint errors fixed
  • .gitignore iCloud pattern escaped

Install

npm install -g lumen-kb

PRs

  • #8 — npm badges, landing lint fixes, benchmark plan
  • #7 — CHANGELOG, CONTRIBUTING, SECURITY, .env.example, knip

Full changelog: https://github.com/Sardor-M/Lumen/blob/main/CHANGELOG.md

v0.1.2 — First published release

21 Apr 01:41
f147d62

Choose a tag to compare

Local-first knowledge compiler. Ingest articles, papers, PDFs, YouTube transcripts into a searchable knowledge graph — then wire it into your AI assistant.

Install

npm install -g lumen-kb
lumen init
echo 'ANTHROPIC_API_KEY=sk-ant-...' > ~/.lumen/.env

Then in Claude Code:

lumen install claude

What's in v0.1.2

Ingestion — URL, PDF, YouTube (Innertube captions), arXiv, local files and folders. SHA-256 dedup. Structural chunking (Markdown/HTML/plain text, merge <50t, split >1000t).

Search — 3-signal hybrid: BM25 (FTS5 Porter stemmed) + TF-IDF (in-memory cosine) + Vector ANN (sqlite-vec). Fused via Reciprocal Rank Fusion (k=60). Token budget flag: lumen search "query" -b 4000.

Compilation — LLM extracts concepts + weighted edges with compiled truth + timeline per concept. Parallel: lumen compile -c 5. Model override: --model claude-haiku-4-5-20251001. Prompt caching cuts cost 60-80%.

Enrichment — Concepts auto-escalate: Tier 3 (stub) → Tier 2 (enriched, 3+ mentions) → Tier 1 (full, 6+ mentions). lumen enrich --status.

Graph — PageRank, BFS shortest path, N-hop neighborhood, label propagation communities. lumen graph pagerank, lumen graph path <a> <b>.

Agent wiringlumen install claude generates 5 files: CLAUDE.md (brain-first protocol), .mcp.json, skill, PreToolUse hook, Stop hook. Agent checks KB before web search, cites [Source: title], captures ideas after every response.

MCP server — 19 tools via stdio: search, brain_ops, compile, capture, add, concept, path, neighbors, god_nodes, communities, pagerank, query, status, profile, session_summary, add_link, backlinks, links, community.

Streaminglumen ask streams tokens to stdout as they arrive.

Tests — 28 tests: compress pipeline (13), graph engine + PageRank + communities (15), delta stubs (10 todos).

Web UI — Next.js 15 dashboard with search, concept browser, graph visualization.

PRs

  • #3 — Signal capture + tiered enrichment (schema v9)
  • #4 — Streaming responses + always-on brain wiring
  • #5 — Parallel compile, brain-first CLAUDE.md, CLI fixes
  • #6 — MCP compile tool, tests, npm publish, MIT license
  • #7 — CHANGELOG, CONTRIBUTING, SECURITY, .env.example, knip

What's next (v0.2.0)

  • Vector embeddings wired into default search path
  • Delta module: smart recompilation of only changed sources
  • Browser extension for capturing highlights and URLs