Skip to content

fix: resolve multi-repo cwd ambiguity + 3 related bugs (Closes #283)#284

Open
Maicololiveras wants to merge 1 commit intoGentleman-Programming:mainfrom
Maicololiveras:fix/multi-repo-cwd-and-related-bugs
Open

fix: resolve multi-repo cwd ambiguity + 3 related bugs (Closes #283)#284
Maicololiveras wants to merge 1 commit intoGentleman-Programming:mainfrom
Maicololiveras:fix/multi-repo-cwd-and-related-bugs

Conversation

@Maicololiveras
Copy link
Copy Markdown

Summary

Fixes the 4 related bugs reported in #283:

  1. mem_save blocks with ambiguous_project when cwd has multiple .git children (very common when MCP server starts from $HOME).
  2. Path-like project names can slip into the store (legacy C:\Users\… artifacts).
  3. mem_search with project="E3" reports "Project 'e3' not found" even though the store has a row under E3 (case-sensitivity inconsistency for legacy data).
  4. engram projects consolidate --all groups dozens of unrelated projects into a single component because every project transitively shares $HOME as a "directory fingerprint" → unsafe merge proposed.

Closes #283

Changes by bug

Bug #1mem_save ambiguity fallback

  • New flag engram mcp --default-project=NAME configures a project to fall back to when cwd-based detection returns ErrAmbiguousProject.
  • New package-level setter mcp.SetDefaultProjectFallback(name) in internal/mcp/mcp.go. The fallback is opt-in: empty string preserves the historical "fail fast" contract.
  • When the fallback fires, the response envelope keeps the original available_projects and adds Source: "default_fallback" plus a Warning so callers can see what happened.
  • Help text updated.

Files: internal/mcp/mcp.go, cmd/engram/main.go (flag parsing + help text).

Bug #2 — Reject path-like project names

  • NormalizeProject now detects filesystem paths (Windows drive paths, UNC, Unix absolute, anything with \ or /) and reduces them to the basename with a warning.
  • Pure helper sanitizePathLike extracted for testability.
  • Edge case: a drive root (C:\) yields empty basename → recursion bottoms out at "unknown" instead of leaving an empty project name.

Files: internal/store/store.go.

Bug #3 — Case-insensitive project resolution

  • New Store.ResolveProjectName(input) method: tries case-sensitive match first (fast path), falls back to COLLATE NOCASE and returns the canonical stored casing. Distinct casings (e.g. both E3 and e3 in store) remain individually addressable — exact-case match wins.
  • resolveReadProject in internal/mcp/mcp.go now uses ResolveProjectName instead of ProjectExists, with two attempts: normalized form first, then the raw user override (in case the user passed exact stored casing that survives normalization).
  • The returned DetectionResult.Project is the canonical stored name, not the normalized input — so downstream queries match exact rows.

Files: internal/store/store.go, internal/mcp/mcp.go.

Bug #4 — Tighten consolidate similarity grouping

  • groupSimilarProjects is now a thin wrapper over groupSimilarProjectsWithMode(projects, strictSimilarity bool).
  • New constant maxSharedProjectsForDirMatch = 3: when a directory is touched by more than 3 distinct projects, it is treated as too noisy to be a fingerprint and is skipped in the union-find. This is what eliminates the catastrophic transitive cluster where every project sharing $HOME got merged into one component.
  • New flag engram projects consolidate --strict-similarity: disables shared-directory union entirely. Only name-similarity edges are formed. Recommended for stores where the user knows many projects share noisy parent paths.
  • Help text updated.

Files: cmd/engram/main.go.

Tests added

  • TestNormalizeProject_PathLike (8 cases — Windows drive paths, UNC, Unix absolute, relative paths with separators, drive root edge case, plain-name passthrough).
  • TestResolveProjectName_CaseInsensitive (legacy E3 row inserted directly via SQL, covers exact match, lowercase fallback, unknown, empty).
  • TestResolveProjectName_PrefersExactCaseWhenBothExist (asserts addressability of distinct casings).
  • TestResolveWriteProject_AmbiguousFallback (fallback to general when --default-project is set, asserts envelope fields).
  • TestSetDefaultProjectFallback_NormalizesAndTrims (confirms input normalization).
  • TestGroupSimilarProjects_NoisyDirectoryNotMerged (5 unrelated projects sharing $HOME → 0 groups).
  • TestGroupSimilarProjects_LegitimateRenameStillMerges (2 projects with unique shared dir → 1 group, canonical chosen by obs count).
  • TestGroupSimilarProjectsWithMode_StrictDisablesDirUnion (strict mode skips dir union even for legitimate renames).

TestResolveWriteProject_AmbiguousError was updated to explicitly clear the fallback so it continues to assert the historical error path.

Backward compatibility

  • No breaking changes. All new behavior is opt-in:
    • --default-project is empty by default → ambiguous cwd still errors as before.
    • --strict-similarity is opt-in → consolidate behavior unchanged for users who do not pass it.
    • Path sanitization in NormalizeProject only activates when the input contains path separators — names without separators (the overwhelming majority of inputs) are unchanged.
    • ResolveProjectName is a new method; existing ProjectExists is untouched. The MCP layer now prefers the new method but case-sensitive matches still work identically.

Test plan

  • go build ./... — clean.
  • go test ./internal/project/ ./internal/store/ ./internal/mcp/ ./cmd/engram/ -count=1 — all passing locally (existing + new tests).
  • Maintainer to confirm CI passes.

Notes

🙏 Thanks for the great tool.

Fixes 4 related issues that compound for users with multiple git repos
in $HOME or large project stores accumulated over months. All fixes are
opt-in or strictly additive — no breaking changes.

Bug Gentleman-Programming#1 — mem_save ambiguity fallback (--default-project flag)
  resolveWriteProject now consults a configurable fallback when cwd-based
  detection returns ErrAmbiguousProject. Operators opt in by launching
  the MCP server with `engram mcp --default-project=NAME`. Empty default
  preserves the historical fail-fast contract. The fallback response
  carries Source="default_fallback" + Warning so callers can see what
  happened.

Bug Gentleman-Programming#2 — Reject path-like project names
  NormalizeProject now detects filesystem paths (Windows drive paths,
  UNC, Unix absolute, anything with separators) and reduces them to the
  basename with a warning. A new helper sanitizePathLike isolates the
  detection for testability. This prevents future "C:\Users\foo"
  artifacts from polluting the project list when detection fails.

Bug Gentleman-Programming#3 — Case-insensitive project resolution
  New Store.ResolveProjectName(input) tries case-sensitive first, falls
  back to COLLATE NOCASE, and returns the canonical stored casing. The
  MCP layer uses it from resolveReadProject so a search filter like
  project=\"E3\" finds legacy data stored as \"E3\" (previously reported
  as not found because the store normalized the input to \"e3\" before
  the case-sensitive lookup). Distinct casings remain individually
  addressable — exact-case match always wins.

Bug Gentleman-Programming#4 — Tighten consolidate similarity grouping
  groupSimilarProjects now skips directories shared by more than 3
  distinct projects (parameterized by maxSharedProjectsForDirMatch).
  This eliminates the catastrophic transitive cluster where every
  project sharing \$HOME got merged into a single component (in one
  user's store: 46 unrelated projects proposed for merge into
  \"general\"). Adds new flag --strict-similarity that disables
  shared-directory union entirely for fully name-based grouping.

Tests
  - 8 cases for NormalizeProject_PathLike (Windows/UNC/Unix/relative).
  - ResolveProjectName covers exact match, case-insensitive fallback,
    distinct-casing addressability, unknown, and empty.
  - resolveWriteProject_AmbiguousFallback + SetDefaultProjectFallback
    normalization.
  - groupSimilarProjects regression (noisy dir not merged) +
    legitimate-rename-still-merges + strict-mode skips dir union.

Closes Gentleman-Programming#283
@Maicololiveras
Copy link
Copy Markdown
Author

CI environment note for the maintainer reviewing this PR:

Confirmed locally that cmd/engram test suite has a pre-existing 10-minute hang on Windows that is independent of this PR. Verified by:

  1. Checking out origin/main (commit 5f3329f, no PR changes applied) and running go test ./cmd/engram/ -timeout=120s -count=1.
  2. Test suite hung at the timeout. Stack trace shows goroutines blocked on chan receive for >1 minute, all rooted at cmd/engram/main.go:734cmdServe.func1() (line 735):
goroutine 1261 [chan receive, 1 minutes]:
github.com/Gentleman-Programming/engram/cmd/engram.cmdServe.func1()
	C:/Users/maicolj/engram/cmd/engram/main.go:735 +0x2f
created by github.com/Gentleman-Programming/engram/cmd/engram.cmdServe in goroutine 1268
	C:/Users/maicolj/engram/cmd/engram/main.go:734 +0x436

The test packages exercised by this PR (internal/project, internal/store, internal/mcp) all pass cleanly:

ok  	github.com/Gentleman-Programming/engram/internal/project	5.459s
ok  	github.com/Gentleman-Programming/engram/internal/mcp	19.327s

(internal/store reports two pre-existing Windows TempDir/SQLite-locking flakes unrelated to this PR's changes — both functions create their own temp DBs so it's likely the same Windows file-locking issue around test cleanup.)

The cmd/engram tests added by this PR (TestGroupSimilarProjects_*) pass when run in isolation:

go test ./cmd/engram/ -run "TestGroupSimilarProjects" -count=1 -v
=== RUN   TestGroupSimilarProjects_NoisyDirectoryNotMerged
--- PASS: TestGroupSimilarProjects_NoisyDirectoryNotMerged (0.00s)
=== RUN   TestGroupSimilarProjects_LegitimateRenameStillMerges
--- PASS: TestGroupSimilarProjects_LegitimateRenameStillMerges (0.00s)
=== RUN   TestGroupSimilarProjectsWithMode_StrictDisablesDirUnion
--- PASS: TestGroupSimilarProjectsWithMode_StrictDisablesDirUnion (0.00s)

CI on Linux likely won't hit the same hang — but flagging it in case it surfaces and the reviewer wonders if the PR introduced it. Happy to dig into the cmdServe goroutine leak as a follow-up issue if it'd be useful.

Thanks again 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP server fails on multi-repo cwd + 3 related bugs (ambiguity, path-as-name, case-sensitivity, unsafe consolidate)

1 participant