feat: add Milvus storage backend (Lite / self-hosted / Zilliz Cloud) #721
zc277584121 wants to merge 2 commits into doobidoo:main
Conversation
Introduces MilvusMemoryStorage, a pluggable backend that uses pymilvus's
MilvusClient API. One configuration supports three deployment targets:
* Milvus Lite (default): single local file, zero external dependencies
* Self-hosted Milvus server: http(s) endpoint
* Zilliz Cloud: managed endpoint with a token
Schema and semantics:
* Primary key is the memory's content hash, so exact-hash dedup is free
and cleanup_duplicates() is a no-op.
* Tags are stored as ",tag1,tag2," with leading and trailing commas and
  matched with "tags like '%,tag,%'", so matches are exact per tag, never
  substring matches.
* AUTOINDEX + COSINE metric work identically on Lite, server, and Cloud.
* Semantic deduplication honours the same MCP_SEMANTIC_DEDUP_* env vars
as sqlite_vec, so the service layer sees consistent behaviour.
* Hard delete (no tombstone table); soft-delete/sync scenarios should
continue to use the hybrid backend.
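The comma-wrapped tag encoding above can be sketched as two small pure functions. The helper names here are illustrative, not necessarily the PR's actual identifiers:

```python
from typing import List


def tags_to_string(tags: List[str]) -> str:
    """Encode tags with leading and trailing commas so every tag is
    delimited on both sides: ["a", "b"] -> ",a,b,"."""
    return f",{','.join(tags)}," if tags else ""


def tag_filter_expr(tag: str) -> str:
    """Build a Milvus scalar filter that matches the whole tag only:
    'py' cannot match ',python,' because both commas must be present."""
    return f"tags like '%,{tag},%'"
```

Because every stored tag is wrapped in commas on both sides, the `%,tag,%` pattern cannot match a tag that merely contains the search term as a substring.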
Wiring:
* storage/factory.py routes backend == "milvus" to the new class and
reads MCP_MILVUS_URI / MCP_MILVUS_TOKEN / MCP_MILVUS_COLLECTION_NAME.
* config.py exposes those three variables and extends SUPPORTED_BACKENDS.
The MCP_ prefix is deliberate — pymilvus reserves MILVUS_URI and
validates it at import time, so a Lite file path in the unprefixed
env var would error before our code runs.
* pyproject.toml ships a new "milvus" optional-dependency group that
pulls in pymilvus>=2.5.0, milvus-lite>=2.4.10, and pins setuptools<81
on Python 3.13+ (milvus-lite still imports pkg_resources).
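The factory wiring can be sketched as a small env-reading helper. The env-var names come from the PR; the function name and the defaults shown (Lite file path, collection name) are illustrative assumptions:

```python
import os

# Backend names as listed in the PR; "milvus" is the new entry.
SUPPORTED_BACKENDS = ("sqlite_vec", "cloudflare", "hybrid", "milvus")


def milvus_config_from_env() -> dict:
    """Read the three MCP_-prefixed variables the factory passes to the
    Milvus backend. The MCP_ prefix avoids pymilvus's reserved MILVUS_URI,
    which pymilvus validates at import time."""
    return {
        "uri": os.environ.get("MCP_MILVUS_URI", "./milvus.db"),
        "token": os.environ.get("MCP_MILVUS_TOKEN", ""),
        "collection_name": os.environ.get("MCP_MILVUS_COLLECTION_NAME", "memories"),
    }
```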
Tests:
* tests/storage/test_milvus.py (23 tests) covers initialize, store,
retrieve, batch, exact-tag match, AND/OR tag search, delete paths,
update, stats, and semantic dedup. Uses Milvus Lite with a shared DB
file + per-test collection to avoid spawning a daemon per test.
* Full suite: 1611 passed (baseline 1588 + 23 new).
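The shared-file, per-test-collection setup depends on generating a fresh, Milvus-legal collection name for each test. A minimal sketch, with a hypothetical helper name rather than the PR's actual fixture code:

```python
import uuid


def unique_collection_name(prefix: str = "test_memories") -> str:
    """Return a fresh collection name per test. Milvus collection names
    must start with a letter or underscore and contain only word
    characters, so use the dash-free hex form of a UUID."""
    return f"{prefix}_{uuid.uuid4().hex}"
```

Each test then creates its own collection inside the one shared Lite `.db` file, so a single embedded daemon serves the whole run.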
Docs:
* docs/milvus-backend.md — per-target deployment guide, env vars,
schema, limitations, troubleshooting.
* docs/guides/STORAGE_BACKENDS.md — adds Milvus to the comparison
table, selection flowchart, and config reference.
* README.md / .env.example mention the new backend option.
Code Review
This pull request introduces a Milvus storage backend to the MCP Memory Service, supporting Milvus Lite, self-hosted Milvus, and Zilliz Cloud. The implementation includes a new MilvusMemoryStorage class, updated configuration handling, and comprehensive unit tests. Feedback focuses on improving the scalability and correctness of the Milvus implementation, specifically addressing issues with server-side filtering for substring searches, fixing broken pagination and sorting in query methods, ensuring consistent timestamp metadata, and optimizing fetch limits during vector searches.
```python
async def get_by_exact_content(self, content: str) -> List[Memory]:
    """Case-insensitive substring match on content.

    Matches sqlite_vec's semantics (which uses ``LIKE '%...%' COLLATE NOCASE``).
    Milvus ``like`` is case-sensitive, so we scan candidates and filter in
    Python. For expected collection sizes (thousands to low millions) this
    is acceptable; production users with huge collections should prefer
    full-text search on a dedicated field instead.
    """
    if not self._ensure_initialized():
        return []

    memories = await self.get_all_memories()
    needle = content.lower()
    return [m for m in memories if needle in (m.content or "").lower()]
```
This method performs a full scan of the collection (up to _MILVUS_MAX_LIMIT) and filters in Python, which has significant drawbacks:
- Incompleteness: It only searches the first 16,384 records. In a collection with millions of memories, matches beyond this limit will never be found.
- Performance/Memory Risk: Fetching thousands of records into memory as Python objects to perform a substring search is inefficient and poses an OOM (Out of Memory) risk for large collections.
Consider using the Milvus like operator for server-side filtering to narrow down candidates. Even if case-insensitivity requires a final Python check, the number of records fetched would be drastically reduced. For true case-insensitive scalability, consider storing a normalized lowercase field or using a Milvus analyzer (available for TEXT fields in 2.4+).
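A mirrored lowercase column, the approach the follow-up commit describes, sidesteps the case-sensitivity problem because the server-side `like` runs on already-normalized text. A sketch of the expression-building side only; helper names are illustrative and the escaping is deliberately simplified:

```python
def escape_like(term: str) -> str:
    """Naive escaping of Milvus ``like`` wildcards; real code would also
    handle quotes and backslashes in user input."""
    return term.replace("%", r"\%").replace("_", r"\_")


def content_filter_expr(needle: str) -> str:
    """Build a server-side filter against a mirrored ``content_lower``
    field, normalizing the needle so matching is case-insensitive."""
    return f'content_lower like "%{escape_like(needle.lower())}%"'
```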
```python
async def _query_memories(
    self,
    filter_expr: str,
    limit: int,
    offset: int = 0,
    sort_desc_key: Optional[str] = None,
) -> List[Memory]:
    """Run a scalar ``query`` and return Memory objects, sorted client-side."""
    limit = max(1, min(limit, _MILVUS_MAX_LIMIT))
    # Milvus has a limit on offset (~16384), but total rows < offset+limit still works.
    fetch_limit = min(offset + limit, _MILVUS_MAX_LIMIT)

    try:
        rows = await self._call_client(
            "query",
            collection_name=self.collection_name,
            filter=filter_expr or "",
            output_fields=list(self._OUTPUT_FIELDS),
            limit=fetch_limit,
        )
    except Exception as exc:  # noqa: BLE001
        logger.warning("Milvus query failed (filter=%r): %s", filter_expr, exc)
        return []

    memories: List[Memory] = []
    for row in rows:
        mem = self._entity_to_memory(row)
        if mem is not None:
            memories.append(mem)

    if sort_desc_key:
        memories.sort(key=lambda m: getattr(m, sort_desc_key) or 0.0, reverse=True)

    if offset:
        memories = memories[offset:]
    return memories[:limit]
```
The current implementation of _query_memories has several issues that affect correctness and scalability:
- Incorrect Sorting: Sorting is performed in Python after fetching a subset of records. Without an order_by clause in the Milvus query, the database returns an arbitrary set of records. This means get_recent_memories will not reliably return the most recent entries.
- Broken Pagination: Slicing is done in Python on a result set capped by _MILVUS_MAX_LIMIT. If offset + limit exceeds this limit, the function will return fewer records than requested or even an empty list (e.g., if offset >= 16384), even if more records exist.
- Efficiency: Milvus supports server-side sorting and pagination via order_by, limit, and offset. Using these parameters directly in the query call is significantly more efficient and correct.
Note: While Milvus has a default max_query_limit (typically 16384), using the offset parameter allows paginating through the entire collection.
```python
async def _query_memories(
    self,
    filter_expr: str,
    limit: int,
    offset: int = 0,
    sort_desc_key: Optional[str] = None,
) -> List[Memory]:
    """Run a scalar query and return Memory objects, sorted server-side."""
    query_params = {
        "collection_name": self.collection_name,
        "filter": filter_expr or "",
        "output_fields": list(self._OUTPUT_FIELDS),
        "limit": limit,
        "offset": offset,
    }
    if sort_desc_key:
        query_params["order_by"] = [f"{sort_desc_key} desc"]
    try:
        rows = await self._call_client("query", **query_params)
    except Exception as exc:  # noqa: BLE001
        logger.warning("Milvus query failed (filter=%r): %s", filter_expr, exc)
        return []
    memories: List[Memory] = []
    for row in rows:
        mem = self._entity_to_memory(row)
        if mem is not None:
            memories.append(mem)
    return memories
```

```python
def _memory_to_entity(self, memory: Memory, embedding: List[float]) -> Dict[str, Any]:
    return {
        "id": memory.content_hash,
        "vector": embedding,
        "content": memory.content or "",
        "tags": _tags_to_string(memory.tags),
        "memory_type": memory.memory_type or "",
        "metadata": json.dumps(memory.metadata) if memory.metadata else "{}",
        "created_at": float(memory.created_at) if memory.created_at is not None else time.time(),
        "updated_at": float(memory.updated_at) if memory.updated_at is not None else time.time(),
        "created_at_iso": memory.created_at_iso or "",
        "updated_at_iso": memory.updated_at_iso or "",
    }
```
There is an inconsistency in timestamp handling when fields are missing from the Memory object. If memory.created_at is None, it is initialized with time.time(), but created_at_iso is stored as an empty string. This results in records having a valid float timestamp but an empty ISO string, which may break consumers expecting consistent temporal metadata. Both fields should be populated using the same timestamp value. A similar pattern should be applied to _build_update_entity.
```python
def _memory_to_entity(self, memory: Memory, embedding: List[float]) -> Dict[str, Any]:
    now = time.time()
    created_at = float(memory.created_at) if memory.created_at is not None else now
    updated_at = float(memory.updated_at) if memory.updated_at is not None else now

    def to_iso(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat().replace("+00:00", "Z")

    return {
        "id": memory.content_hash,
        "vector": embedding,
        "content": memory.content or "",
        "tags": _tags_to_string(memory.tags),
        "memory_type": memory.memory_type or "",
        "metadata": json.dumps(memory.metadata) if memory.metadata else "{}",
        "created_at": created_at,
        "updated_at": updated_at,
        "created_at_iso": memory.created_at_iso or to_iso(created_at),
        "updated_at_iso": memory.updated_at_iso or to_iso(updated_at),
    }
```

```python
def _retrieve_fetch_limit(n_results: int, tag_filter: str) -> int:
    # Over-fetch when a tag filter is active — Milvus applies filters during
    # search but effective selectivity can vary.
    base = n_results * 3 if tag_filter else n_results
    return max(1, min(base, _MILVUS_MAX_LIMIT))
```
Over-fetching by a factor of 3 when a tag filter is active is unnecessary in Milvus. Milvus performs integrated filtering during the vector search process, ensuring that the limit requested is satisfied by matching entities (if they exist). Unlike systems that perform post-filtering, Milvus's search with a filter expression is efficient and exact. Reducing fetch_n to n_results would reduce network overhead and database load.
Suggested change:

```diff
-def _retrieve_fetch_limit(n_results: int, tag_filter: str) -> int:
-    # Over-fetch when a tag filter is active — Milvus applies filters during
-    # search but effective selectivity can vary.
-    base = n_results * 3 if tag_filter else n_results
-    return max(1, min(base, _MILVUS_MAX_LIMIT))
+@staticmethod
+def _retrieve_fetch_limit(n_results: int, tag_filter: str) -> int:
+    # Milvus applies filters during search, so over-fetching is not required.
+    return max(1, min(n_results, _MILVUS_MAX_LIMIT))
```
Review findings — Gemini cross-check + additions

Read the four Gemini comments and verified each against Gemini #1 —
…ery ordering, timestamp consistency, bounded embedding cache
- Add a mirrored ``content_lower`` VARCHAR field on the collection schema
and populate it on every insert/upsert. ``get_by_exact_content`` now
pushes a case-insensitive ``like`` filter down to Milvus instead of
fetching every row into Python. Pre-existing collections without the
field fall back to a client-side scan with a startup warning.
- Replace the plain ``client.query(limit=...)`` path in ``_query_memories``
with ``query_iterator`` so sorting and pagination apply to the full
matching set rather than to a random 16384-row window. Fixes silent
truncation on any collection larger than the per-RPC cap.
- Derive ``created_at`` / ``created_at_iso`` (and their ``updated_at``
counterparts) from a single ``time.time()`` reading in
``_memory_to_entity`` and ``_build_update_entity`` so the epoch and ISO
columns never reference different moments when a field is missing.
- Replace the module-level ``Dict[int, List[float]]`` embedding cache with
a bounded LRU keyed by ``f"{model_name}::{text}"``. Eliminates the
silent-wrong-embedding risk from 64-bit ``hash(text)`` collisions and
caps size at 1024 entries to prevent unbounded growth in long-lived
processes.
- Document the 3× over-fetch on tag-filtered ANN retrieval: HNSW filtered
ANN can drop candidates after the filter, so we over-fetch to guarantee
the caller receives ``n_results`` hits.
Tests: 1627 passed (7 new), 0 failed. All functions ≤ B complexity.
Thanks for the thorough review! Pushed fixes for all four findings as a new commit.

Also added the one-liner explaining the 3× over-fetch on tag-filtered search — HNSW filtered ANN can eliminate candidates after the filter, so we over-fetch to guarantee `n_results` hits.

Validation:
Follow-up review — ready to merge with two asks

Verified commit

Complexity still A-B across the board (avg A), 1627 tests green (7 new regression tests for ordering, cross-page consistency, cache eviction, legacy fallback). Approving on code quality.

Before merge, two asks that are about long-term sustainability rather than code correctness:

1. CI integration test against Docker Milvus

The 39 unit tests exercise Milvus Lite only. Self-hosted Milvus is the recommended mode for MCP deployments per the docs, but it's the one path with zero automated coverage today. Could you add a CI job that spins up
Happy to help wire this into

2. Maintenance commitment

If yes, I'll add a
Neither blocks merge on code grounds; both are about making sure this lands in a sustainable way. Let me know your take on both and we can move this forward.
Summary
Adds a new `milvus` storage backend to the Strategy Pattern under `storage/`, alongside `sqlite-vec`, `cloudflare`, and `hybrid`. Implemented against the existing `MemoryStorage` abstract interface. Supports three deployment modes from the same code path — Milvus Lite (local `.db` file, zero deps, good for scripts and tests), self-hosted Milvus via Docker (recommended for MCP servers and long-lived services), and Zilliz Cloud (managed, at-scale).

What's in this PR
* `src/mcp_memory_service/storage/milvus.py` (new) — `MilvusMemoryStorage` built on `pymilvus.MilvusClient`. `AUTOINDEX` + `COSINE`, dynamic fields for tags/metadata, one `MilvusClient` per storage instance, `asyncio.to_thread` to run the sync pymilvus client from async handlers. All functions graded ≤ B (cyclomatic complexity ≤ 8).
* `storage/factory.py`, `server_impl.py` (eager + lazy init paths), `cli/utils.py`, `web/api/configuration.py`, `utils/db_utils.py` — four dispatch points in the codebase all now recognize `milvus`.
* New config: `MCP_MILVUS_URI`, `MCP_MILVUS_TOKEN`, `MCP_MILVUS_COLLECTION_NAME`. The `MCP_` prefix is intentional; pymilvus reserves unprefixed `MILVUS_URI` and validates it at import time.
* `pyproject.toml` adds `[project.optional-dependencies].milvus = [pymilvus, milvus-lite, <conditional setuptools<81 on Py 3.13>]`. No impact on default installs.
* Reconnect helper gated on `uri.endswith(".db")`; remote backends never take the reconnect path. See upstream milvus-lite#334 for context.
* Tests in `tests/storage/test_milvus.py`. Full suite: 1620 passed.
* Docs: `docs/milvus-backend.md` (new dedicated guide), `docs/guides/STORAGE_BACKENDS.md` extended with a Milvus column and decision flowchart, `README.md` and `.env.example` both reference the new backend with appropriate caveats.

Which URI to use
| Deployment | URI |
| --- | --- |
| Milvus Lite | `./milvus.db` |
| Self-hosted (`milvusdb/milvus` Docker image) | `http://host:19530` |
| Zilliz Cloud | `https://<cluster>.zillizcloud.com` + token |

Verification
* `pytest -q tests/ --timeout=180` → 1620 passed, 0 failed (baseline 1588 + 32 Milvus).
* Live run against a local server (`uri=http://localhost:19530`): health, 3 stores, semantic search (Milvus memory ranked #1 for "vector database"), tag filter (exactly 1 match for "user-preference"), `total_memories=3` — all green.
* Lite idle-disconnect check: `store` → sleep 75 s → `store` — both succeed, `retrieve` returns 2 hits, thanks to the reconnect helper.

Migration / compatibility
* `sqlite_vec`, `hybrid`, `cloudflare` are untouched.
* Opt-in only: set `MCP_MEMORY_STORAGE_BACKEND=milvus` and install the `[milvus]` optional-dependency group.
* `MilvusMemoryStorage` identifies itself, so `memory_health` and the web dashboard report correct backend identity.
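The mode selection in the URI table above hinges on URI shape, and the PR's reconnect helper keys on `uri.endswith(".db")`. A small illustrative classifier (a hypothetical helper, not the PR's code; the Zilliz hostname check is an assumption based on the table):

```python
def deployment_mode(uri: str) -> str:
    """Classify a Milvus URI: a local file path ending in .db means
    Milvus Lite; http(s) endpoints are remote servers, with Zilliz
    Cloud distinguished by its managed hostname."""
    if uri.endswith(".db"):
        return "lite"
    if uri.startswith("https://") and "zillizcloud.com" in uri:
        return "zilliz-cloud"
    if uri.startswith(("http://", "https://")):
        return "self-hosted"
    raise ValueError(f"unrecognized Milvus URI: {uri!r}")
```

Only the `"lite"` branch should ever take the idle-reconnect path; remote backends keep a live server connection.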