perf: ruflo memory store needs batch/pipe mode — CLI startup overhead makes bulk import impractical #141

@dmoellenbeck

Summary

ruflo memory store spawns a full Node.js process per invocation (~5-10s startup). For bulk operations (e.g., importing 76 documentation files into HNSW-indexed memory), this means ~10+ minutes for what should be a 30-second operation.

Request: Add a batch import mode — either ruflo memory import-dir or ruflo memory store --batch reading from stdin/file.

Environment

  • ruflo: v3.5.48 (via npx @claude-flow/cli@latest)
  • agentic-flow: v3.0.0-alpha.1
  • agentdb: v1.3.9 (project-local), v3.0.0-alpha.10 (npx-bundled)
  • Node.js: v22.x
  • Platform: macOS Darwin 25.0.0 (Apple Silicon)

Reproduction

# Single store works but is slow (~5-10s per call)
time ruflo memory store --namespace adr --key test --value "hello world"
# real    0m7.234s  ← 7 seconds for 11 bytes

# Attempting to import 76 files:
for FILE in docs/adr/*.md docs/ddd/*.md docs/specification/*.md; do
  KEY=$(basename "$FILE" .md)
  NS=$(basename "$(dirname "$FILE")")
  ruflo memory store --namespace "$NS" --key "$KEY" --value "$(cat "$FILE")"
done
# Result: ETIMEDOUT errors, ~10+ minutes, most files fail

Root Cause Analysis

Each ruflo memory store invocation:

  1. Node.js process startup (~1s)
  2. Package resolution — loads @claude-flow/cli, agentic-flow, agentdb (~2s)
  3. AgentDB runtime patch attempt: looks for process.cwd()/node_modules/agentdb/dist/controllers/index.js and logs a warning when it finds the wrong distribution (~0.5s)
  4. SQLite database open — opens/creates the memory.db file (~0.5s)
  5. HNSW embedding generation — computes 384-dim vector for the value (~1-2s)
  6. Actual store operation — SQLite INSERT (~10ms)
  7. Process exit (~0.1s)

Steps 1-4 (about 4s combined) are pure overhead, repeated on every CLI call. The store itself (step 6) takes about 10ms.
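As a rough check, the figures above can be tallied to show how much time goes to repeated startup. This is a back-of-the-envelope sketch that uses only the estimates and the single timing quoted in this issue; it is not a new measurement.

```javascript
// Per-call overhead estimates for steps 1-4, in seconds (taken from the list above).
const overheadSteps = {
  nodeStartup: 1,
  packageResolution: 2,
  agentdbPatchAttempt: 0.5,
  sqliteOpen: 0.5,
};

const overheadPerCall = Object.values(overheadSteps).reduce((a, b) => a + b, 0);

const measuredPerCall = 7.234; // the `time` result from the reproduction
const fileCount = 76;

// Wall time if every file pays the full measured per-call cost.
const totalSeconds = measuredPerCall * fileCount;
// Portion of that spent re-doing steps 1-4.
const repeatedOverheadSeconds = overheadPerCall * fileCount;
// Overhead avoided if steps 1-4 ran once instead of once per file.
const savedByOneProcess = overheadPerCall * (fileCount - 1);

console.log({
  overheadPerCall,
  totalMinutes: (totalSeconds / 60).toFixed(1),
  repeatedOverheadSeconds,
  savedByOneProcess,
});
```

With these numbers, about 300 seconds of the run is startup work that a single long-lived process would pay only once.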

Secondary Issue: cwd()-based agentdb Resolution

The agentdb-runtime-patch.js (line 87) searches for agentdb starting at process.cwd():

const possiblePaths = [
  join(process.cwd(), 'node_modules', 'agentdb'),  // ← finds project-local copy
  // ...
];

When the project has its own agentdb installation (e.g., as a transitive dependency), the patch finds that copy first, but it is a different distribution (browser/WASM) that lacks dist/controllers/. This produces the misleading warning:

[AgentDB Patch] Controller index not found: /project/node_modules/agentdb/dist/controllers/index.js

The patch should prefer the npx-bundled copy (which has the correct Node.js distribution) over the project-local copy. Related: #80, #111, #132.

Proposed Solutions

Option A: ruflo memory import-dir (preferred)

A single-process command that imports all files from a directory tree:

ruflo memory import-dir --source docs/ --namespace-from-dir
# Maps: docs/adr/*.md → namespace "adr", docs/ddd/*.md → namespace "ddd"
# One process, one SQLite connection, batch HNSW indexing
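A minimal sketch of how import-dir could map files to namespaces and keys inside one process. The function names and the `store` callback are hypothetical; `store` stands in for whatever in-process API ruflo would expose and is not an existing ruflo function.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Walk `sourceDir` recursively and yield one record per .md file.
// namespace = name of the file's parent directory, key = file name without .md,
// mirroring the shell loop in the reproduction. A file directly inside
// `sourceDir` would get the name of `sourceDir` itself as its namespace.
function* collectEntries(sourceDir) {
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    const fullPath = path.join(sourceDir, entry.name);
    if (entry.isDirectory()) {
      yield* collectEntries(fullPath);
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      yield {
        namespace: path.basename(path.dirname(fullPath)),
        key: path.basename(entry.name, '.md'),
        value: fs.readFileSync(fullPath, 'utf8'),
      };
    }
  }
}

// Hypothetical single-process import: every record goes through the same
// `store` callback, so startup and database open would happen only once.
async function importDir(sourceDir, store) {
  let count = 0;
  for (const record of collectEntries(sourceDir)) {
    await store(record);
    count += 1;
  }
  return count;
}
```

Reading the file contents directly also sidesteps passing large markdown through a shell argument.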

Option B: ruflo memory store --batch (stdin JSONL)

Read multiple store operations from stdin:

cat <<EOF | ruflo memory store --batch
{"namespace":"adr","key":"ADR-001","value":"# ADR-001..."}
{"namespace":"adr","key":"ADR-002","value":"# ADR-002..."}
EOF
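On the receiving side, a --batch mode might read stdin line by line and validate each record before storing it. The sketch below assumes the three-field record shape shown above; `parseRecord`, `storeBatch`, and the `store` callback are illustrative names, not part of ruflo.

```javascript
const readline = require('node:readline');

// Parse one JSONL line into a store record; returns null for blank lines.
function parseRecord(line, lineNumber) {
  const trimmed = line.trim();
  if (trimmed === '') return null;
  let record;
  try {
    record = JSON.parse(trimmed);
  } catch (err) {
    throw new Error(`line ${lineNumber}: invalid JSON (${err.message})`);
  }
  for (const field of ['namespace', 'key', 'value']) {
    if (typeof record[field] !== 'string') {
      throw new Error(`line ${lineNumber}: missing string field "${field}"`);
    }
  }
  return record;
}

// Read records from a stream (e.g. process.stdin) and pass each one to
// `store`, a hypothetical in-process store function.
async function storeBatch(input, store) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  let lineNumber = 0;
  let stored = 0;
  for await (const line of lines) {
    lineNumber += 1;
    const record = parseRecord(line, lineNumber);
    if (record === null) continue;
    await store(record);
    stored += 1;
  }
  return stored;
}
```

A CLI entry point would call something like `storeBatch(process.stdin, store)`. Reporting the line number on failure makes it easier to retry a partially imported batch.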

Option C: ruflo memory store --file

Read value from file instead of --value argument (also avoids shell escaping issues with large markdown content containing backticks, quotes, etc.):

ruflo memory store --namespace adr --key ADR-001 --file docs/adr/ADR-001.md

Option D: Long-running daemon mode

Keep a ruflo memory daemon process alive and send operations via IPC/socket:

ruflo memory daemon &
ruflo memory store --via-daemon --namespace adr --key ADR-001 --value "..."
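One possible shape for the daemon side, assuming newline-delimited JSON over a local socket. The wire format, `handleRequest`, `startDaemon`, and the `store` callback are all assumptions made for illustration; nothing here reflects an existing ruflo protocol.

```javascript
const net = require('node:net');
const readline = require('node:readline');

// Handle one request line and return the response line.
// `store` is a hypothetical in-process store function that the daemon keeps warm.
async function handleRequest(line, store) {
  try {
    const { namespace, key, value } = JSON.parse(line);
    await store({ namespace, key, value });
    return JSON.stringify({ ok: true, key });
  } catch (err) {
    return JSON.stringify({ ok: false, error: err.message });
  }
}

// Accept newline-delimited JSON requests on `socketPath` and answer each
// with one JSON line. Responses can arrive out of order if `store` calls
// take different amounts of time; a real protocol would add request ids.
function startDaemon(socketPath, store) {
  const server = net.createServer((socket) => {
    const lines = readline.createInterface({ input: socket, crlfDelay: Infinity });
    lines.on('line', async (line) => {
      socket.write((await handleRequest(line, store)) + '\n');
    });
  });
  server.listen(socketPath);
  return server;
}
```

Compared with Options A-C, this keeps the existing one-call-per-store CLI shape, at the cost of managing a background process and its socket.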

Fix for cwd() resolution

In agentdb-runtime-patch.js, prioritize the npx/CLI-bundled agentdb over project-local:

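import { existsSync } from 'node:fs';
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';

function findAgentDBPath() {
  const here = dirname(fileURLToPath(import.meta.url));
  const possiblePaths = [
    // Prefer the bundled copy that ships with ruflo/agentic-flow
    join(here, '..', '..', 'node_modules', 'agentdb'),
    join(here, '..', '..', '..', 'agentdb'),
    // Then fall back to project-local
    join(process.cwd(), 'node_modules', 'agentdb'),
    // ...
  ];
  // Assumed selection step (not in the original snippet): take the first
  // candidate that actually contains the Node.js controllers entry point.
  return possiblePaths.find((p) => existsSync(join(p, 'dist', 'controllers', 'index.js')));
}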

Impact

This blocks the /prd2build --import=from-files workflow in Turbo-Flow, which needs to index 76+ documentation files into HNSW memory for semantic search. Currently users must either:

  • Wait 10+ minutes (with most files timing out)
  • Skip memory indexing entirely and use file-based access only

Workaround

We wrote a batch import script (scripts/ruflo-batch-import.mjs) that reuses a single ruflo binary path, but it still spawns a process per file. The real fix needs to happen inside ruflo to keep a single process/connection alive across multiple store operations.
