MCP server fails on multi-repo cwd + 3 related bugs (ambiguity, path-as-name, case-sensitivity, unsafe consolidate) #283

## Summary

The Engram MCP server (`engram mcp`) returns ambiguity errors and has created garbage projects when its cwd contains multiple `.git` children, and several related issues make the experience worse. After a few months of use on Windows, my user store had accumulated **9 garbage projects with absolute paths as project names** plus several unintended duplicates, and every `mem_save` from the MCP now returns `ambiguous_project`.

This issue bundles 4 related bugs found while debugging on `engram 1.14.12` (Windows 11, MCP integrated with Claude Code).

---

## Bug 1 — `mem_save` fails with `ambiguous_project` when cwd has multiple `.git` children

**Severity**: High — blocks `mem_save` from MCP entirely whenever the user's home (or any parent directory) contains multiple repos.

**Repro**:
1. Have multiple git repos directly under `$HOME` (very common on Windows: e.g. `~/engram`, `~/agent-teams-lite`, `~/dotfiles`).
2. Start Engram MCP server with default args (no `--project`).
3. Call `mem_save` with title + content (no project param — schema doesn't accept one anyway).

**Expected**: save succeeds, project resolved to `general` (or last-used, or `--project` default).

**Actual**:
```json
{
  "available_projects": ["agent-teams-lite", "dotfiles", "engram", ...],
  "error_code": "ambiguous_project",
  "hint": "Use mem_current_project to inspect detection results, or cd into one of the listed repositories.",
  "message": "Cannot determine project: ambiguous project: multiple git repos found in cwd"
}
```

**Why it's blocking**: when Engram MCP is launched by Claude Code / OpenCode, the server's cwd is fixed (typically the user's home directory). The user cannot `cd` into a specific repo from inside the agent — the MCP process inherits the parent agent's cwd. So the suggested workaround in `hint` is impossible to apply.

The `mem_save` tool schema also doesn't accept `project` as an argument, so the caller cannot work around the ambiguity by naming the project explicitly.

**Suggested fix**: when ambiguity is detected, fall back to (in order):
1. The `--project NAME` flag if provided to the MCP server.
2. The most recently used project (the one with the latest `last_activity_at`).
3. `general`.

Don't error out — the server has the info to make a sensible default.
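
A minimal sketch of the fallback order above, written in Python for brevity. It is not Engram's actual resolver: `resolve_project`, `ProjectInfo`, and the argument names are hypothetical, and it assumes the server already knows the detected repos, the optional `--project` default, and each known project's `last_activity_at`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

DEFAULT_PROJECT = "general"


@dataclass(frozen=True)
class ProjectInfo:
    """Hypothetical view of a stored project (name + last activity)."""
    name: str
    last_activity_at: datetime


def resolve_project(
    detected: Sequence[str],
    cli_default: Optional[str],
    known: Sequence[ProjectInfo],
) -> str:
    """Pick a project without erroring when detection is ambiguous."""
    # Exactly one repo detected: nothing ambiguous, use it.
    if len(detected) == 1:
        return detected[0]
    # 1. An explicit --project default wins over any guess.
    if cli_default:
        return cli_default
    # 2. Otherwise fall back to the most recently active project.
    if known:
        return max(known, key=lambda p: p.last_activity_at).name
    # 3. Last resort.
    return DEFAULT_PROJECT
```

With this shape, the multi-repo home directory case resolves to the `--project` default when one is given and never reaches an error path.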

---

## Bug 2 — When auto-detection fails, projects get created with the absolute path as the project name

**Severity**: Medium — produces garbage data over time. Workaround = manual SQL cleanup.

After several months of usage I had **9 garbage project entries** in my Engram store, all with Windows-style absolute paths as the project name and **0 observations** but lots of empty session rows. Sample shape (paths anonymized):

| Project name pattern | obs | sessions |
|---|---|---|
| `C:\Users\<user>` | 0 | 296 |
| `C:\Users\<user>\.opencode` | 0 | 53 |
| `C:\<workspace-A>` | 0 | 50 |
| `C:\Users\<user>\source\repos\<repo>` | 0 | 11 |
| `D:\<workspace-B>\<sub>\<sub>` | 0 | 17 |
| ...and 4 more similar | 0 | varied |

These were created in earlier 1.14.x versions where the MCP fell back to "use cwd as the project name" when detection failed. Newer versions error out instead (Bug 1), but the garbage from old versions remains.

**Cleanup query** (worked safely — 434 empty sessions deleted, 0 observations lost):
```sql
DELETE FROM sessions WHERE project LIKE 'C:\\%' AND project IN (
  SELECT project FROM sessions GROUP BY project HAVING COUNT(*) > 0
  AND project NOT IN (SELECT DISTINCT project FROM observations WHERE deleted_at IS NULL)
);
```

**Suggested fix**: validate project names. Reject any name that:
- Contains `\` or `/` (filesystem separators) — these are paths, not project names
- Is an absolute path pattern (`^[A-Za-z]:` on Windows, `^/` on Unix)

Fall back to `general` instead of using the path verbatim.
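
The validation could look roughly like the sketch below (Python for illustration only; `looks_like_path` and `sanitize_project_name` are hypothetical names, not existing Engram functions). It applies the two rules above and falls back to `general`.

```python
import re

DEFAULT_PROJECT = "general"

# Windows drive prefix such as "C:" or "d:".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def looks_like_path(name: str) -> bool:
    """True if the name contains a path separator or starts with a drive letter."""
    return "\\" in name or "/" in name or bool(_DRIVE_PREFIX.match(name))


def sanitize_project_name(name: str) -> str:
    """Return a usable project name, falling back to the default for paths."""
    name = name.strip()
    if not name or looks_like_path(name):
        return DEFAULT_PROJECT
    return name
```

For example, `sanitize_project_name(r"C:\Users\me")` yields `general`, while `sanitize_project_name("E3")` is returned unchanged. The Unix `^/` rule is already covered by the separator check.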

Optionally, add an `engram doctor cleanup-paths` command that detects and deletes these entries in one shot.
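
A rough sketch of what such a cleanup pass might do, using Python's `sqlite3`. The only schema it assumes is what the cleanup query above already uses (`sessions.project`, `observations.project`, `observations.deleted_at`); the function name `cleanup_path_projects` is hypothetical.

```python
import re
import sqlite3

# A drive-letter prefix, or any path separator anywhere in the name.
_PATH_LIKE = re.compile(r"^[A-Za-z]:|[\\/]")


def cleanup_path_projects(conn: sqlite3.Connection) -> int:
    """Delete sessions of path-named projects that have no live observations.

    Returns the number of session rows deleted.
    """
    # Projects that still own at least one non-deleted observation are kept.
    live = {
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT project FROM observations WHERE deleted_at IS NULL"
        )
    }
    candidates = [
        row[0]
        for row in conn.execute("SELECT DISTINCT project FROM sessions")
        if row[0] is not None and _PATH_LIKE.search(row[0]) and row[0] not in live
    ]
    deleted = 0
    with conn:  # commits on success, rolls back if a delete fails
        for project in candidates:
            cur = conn.execute("DELETE FROM sessions WHERE project = ?", (project,))
            deleted += cur.rowcount
    return deleted
```

Filtering in application code rather than with `LIKE` avoids backslash-escaping differences between shells and SQL string literals, and covers every drive letter, not just `C:`.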

---

## Bug 3 — `mem_search` filter `project: "E3"` is lowercased internally and not found

**Severity**: Medium — case sensitivity inconsistency.

**Repro**:
1. Have a project saved with name `E3` (uppercase).
2. Call `mem_search(query: "...", project: "E3")` from MCP.

**Expected**: returns observations under project `E3`.

**Actual**:
```json
{
  "available_projects": [..., "E3", ...],
  "error_code": "unknown_project",
  "hint": "Use one of the available_projects values, or omit project to auto-detect.",
  "message": "Project \"e3\" not found in store"
}
```

The error message shows the input was lowercased to `"e3"` before lookup, even though `available_projects` lists `E3` (uppercase) in the same response.

**Suggested fix**: either do a case-insensitive lookup against the `available_projects` set, or preserve the input's case as-is. Current behavior is inconsistent: the server preserves case in storage but lowercases the input filter.

Querying SQLite directly confirms where the mismatch comes from: `SELECT * FROM observations WHERE project = 'e3'` returns nothing (case-sensitive collation), because the stored name is `E3`. The filter that arrives via MCP is lowercased before the lookup, so this is specifically an input normalization issue at the MCP boundary, not a storage problem.
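
One way to do the case-insensitive lookup, sketched in Python (the helper name `match_project` is hypothetical, and this is not how Engram resolves the filter today). It tries an exact match first and only accepts a case-insensitive match when it is unambiguous.

```python
from typing import Iterable, Optional


def match_project(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the stored project name matching `requested`, or None."""
    names = list(available)
    # Exact match first, so "E3" and "e3" can coexist as separate projects.
    if requested in names:
        return requested
    folded = requested.casefold()
    candidates = [n for n in names if n.casefold() == folded]
    # Accept a case-insensitive match only when exactly one name qualifies.
    return candidates[0] if len(candidates) == 1 else None
```

With this, a filter of `"e3"` or `"E3"` both resolve to the stored `E3`, and the value passed on to the query is the stored spelling rather than the lowercased input.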

---

## Bug 4 — `engram projects consolidate --all` fuzzy-matching groups unrelated projects

**Severity**: High — runs the risk of catastrophic data merge.

**Repro**:
```bash
engram projects consolidate --all --dry-run
```

**Expected**: groups of projects with truly similar names (typos, casing variants, kebab vs snake case).

**Actual**: returns 4 groups, but **Group 2 contains 46 completely unrelated projects** with no shared substring or naming pattern, and proposes merging all of them into `general`. Sample shape (project names anonymized — what matters is the count and that they are clearly distinct projects):

```
Group 2:
    [1]  <internal-tool-A>                47 obs
    [2]  <absolute-path-B>                 0 obs
    [10] <product-C>                       1 obs
    [11] <product-C-variant>               1 obs
    [12] <presentation-D>                 13 obs
    [14] <mcp-E>                           4 obs
    [16] <plugin-F>                        6 obs
    [17] <daemon-G>                        1 obs
    [22] <training-H>                     10 obs
    [23] <runtime-I>                      12 obs
    [24] <plugin-J>                       90 obs
  → [27] general                         262 obs
    [29] <industrial-K>                   46 obs
    [35] <agent-L>                       119 obs
    [38] <user-home>                     229 obs
    [40] <opencode-plugin-M>              69 obs
    ...46 entries total — every one of these is a distinct project...
  Suggested canonical: "general"
  [dry-run] Would merge into "general"
```

These are **fully distinct projects**, most with their own observation history (up to 229 obs each in the sample above). A user who runs `engram projects consolidate --all` without `--dry-run`, trusting the algorithm, would lose project separation across **most of their store**.

**Suggested fix**: tighten the similarity threshold significantly. A reasonable heuristic is to group two projects only if at least one of these holds for their canonical (lowercased, dehyphenated) forms:
- Levenshtein distance ≤ 3, OR
- One name contains the other, and the shorter is more than 50% of the longer's length, OR
- They share a prefix of ≥ 4 characters.
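
To make the heuristic concrete, here is a small Python sketch of the three rules. All names (`normalize`, `levenshtein`, `should_group`) and the default thresholds are illustrative assumptions taken from the list above, not Engram's current implementation.

```python
import re


def normalize(name: str) -> str:
    """Canonical form: lowercased, with hyphens, underscores and spaces removed."""
    return re.sub(r"[-_\s]+", "", name.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def should_group(a: str, b: str, max_distance: int = 3, min_prefix: int = 4) -> bool:
    """True if two project names are similar enough to propose merging."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return False
    # Rule 1: small edit distance on the canonical form.
    if levenshtein(na, nb) <= max_distance:
        return True
    # Rule 2: containment, with the shorter name over 50% of the longer.
    shorter, longer = sorted((na, nb), key=len)
    if shorter in longer and len(shorter) * 2 > len(longer):
        return True
    # Rule 3: shared canonical prefix of at least `min_prefix` characters.
    return common_prefix_len(na, nb) >= min_prefix
```

Under these rules `course-branding` and `course-presentation` still group (shared prefix `course`), while names like `engram` and `dotfiles` do not. An absolute distance of 3 is still permissive for very short names such as `E3`, so a length-relative bound may be worth adding.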

The current algorithm seems to use something much looser (maybe just "ends with vowel-consonant" or "shares 1+ token") which is far too permissive.

For reference, **Group 1** in the same report is correct (`course-branding` ↔ `course-presentation`), so the algorithm has the right shape, just calibrated way too loose for Group 2.

**Suggested additional safeguard**: add an interactive `--confirm-each` flag that requires explicit y/N per group. The current `--dry-run` is good but a user who skips dry-run could lose data instantly.
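
A sketch of the per-group confirmation, in Python for illustration. `confirm_groups` and the shape of `groups` (canonical name mapped to member names) are hypothetical; the point is that anything other than an explicit `y`/`yes` skips the merge.

```python
from typing import Callable, Dict, List


def confirm_groups(
    groups: Dict[str, List[str]],
    ask: Callable[[str], str] = input,
) -> Dict[str, List[str]]:
    """Return only the groups the user explicitly approved (default is No)."""
    approved: Dict[str, List[str]] = {}
    for canonical, members in groups.items():
        prompt = f'Merge {len(members)} projects into "{canonical}"? [y/N] '
        answer = ask(prompt).strip().lower()
        # Empty input or anything unexpected counts as "no".
        if answer in ("y", "yes"):
            approved[canonical] = members
    return approved
```

Passing `ask` as a parameter keeps the prompt testable and lets a non-interactive caller supply its own policy.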

---

## Environment

- OS: Windows 11 Pro 10.0.22631
- Engram: 1.14.12 (also reproduced on 1.14.4)
- Shell: Git Bash + PowerShell 7
- Integration: Claude Code (MCP via `~/.claude/mcp/engram.json`)
- Multi-repo home directory: 3+ git repos directly under `$HOME` (and growing)

## Impact summary

These 4 bugs together produce a degrading user experience over time:
1. Bug 1 prevents normal `mem_save` from MCP → users do SQL workarounds → data ends up inconsistently saved
2. Bug 2 silently accumulates garbage entries that pollute the project list
3. Bug 3 surfaces as confusing "Project not found" when the project clearly exists
4. Bug 4 turns a "cleanup" tool into a data-loss footgun

I'd be glad to provide more diagnostics or test fixes — happy to PR the validation logic for Bug 2 if you'd like.

Thanks for the great tool 🙏

