
Milvus Lite daemon becomes unreachable after ~60s of idle on a long-lived MilvusClient #334

Description

@zc277584121

Summary

With Milvus Lite 2.5.1, a long-lived MilvusClient stops working after roughly 60 seconds of idle time. The client object is never recreated or closed, and everything runs in a single synchronous Python process, yet the next RPC after the idle gap raises:

MilvusException: (code=2, message=Fail connecting to server on unix:/tmp/tmpXXX_<db>.sock, illegal connection params or server unavailable)

Shorter idle periods (≤30s) work fine. The Unix socket file still exists on disk, but connections to it fail, which suggests that the milvus-lite daemon subprocess has exited while the Python side still holds its MilvusClient.
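One way to confirm the "socket file exists, but nobody is listening" state independently of pymilvus is to try a raw Unix-domain connect against the path from the error message. The helper below is a hypothetical sketch (the name `probe_unix_socket` is not part of pymilvus or milvus-lite); it only uses the standard library and strips the `unix:` prefix shown in the error. A result of `(True, False)` matches the symptom described above.

```python
import os
import socket


def probe_unix_socket(path: str, timeout_s: float = 1.0) -> tuple[bool, bool]:
    """Hypothetical diagnostic: return (file_exists, accepts_connections).

    A stale socket file (present on disk, but no process listening) shows up
    as (True, False).
    """
    if path.startswith("unix:"):
        path = path[len("unix:"):]  # accept the URI form used in the error message
    if not os.path.exists(path):
        return False, False
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.connect(path)
        except OSError:
            # Typically ECONNREFUSED: the file is there but the listener is gone.
            return True, False
    return True, True
```

Running it against the `.sock` path printed in the exception right after the failure would distinguish a dead daemon from, for example, a permissions problem (which raises a different `OSError` but would also report `(True, False)`, so the errno is worth logging in practice).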

This hurts any long-lived Python process that talks to Milvus Lite infrequently — MCP servers, long-running web backends with low QPS, interactive Jupyter notebooks with multi-minute gaps between cells, scheduled jobs, REPL-style tools, etc.

Minimal reproduction (no threads, no asyncio, no MCP)

import os, time
from pymilvus import MilvusClient, DataType

db = "/tmp/lite_repro.db"
if os.path.exists(db):
    os.unlink(db)

client = MilvusClient(uri=db)

schema = client.create_schema(auto_id=False)
schema.add_field(field_name="id", datatype=DataType.VARCHAR, is_primary=True, max_length=64)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=4)
idx = client.prepare_index_params()
idx.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="COSINE")
client.create_collection(collection_name="probe", schema=schema, index_params=idx)

client.insert("probe", [{"id": "a", "vector": [0.1, 0.2, 0.3, 0.4]}])
print("first insert OK")

time.sleep(60)   # <-- key step: stay idle for 60 s (30 s or less still works)

client.insert("probe", [{"id": "b", "vector": [0.5, 0.6, 0.7, 0.8]}])
# MilvusException: Fail connecting to server on unix:/tmp/tmp1ns3wqia_lite_repro.db.sock,
# illegal connection params or server unavailable

Idle-duration matrix (same script, only time.sleep(N) changes)

| Idle N (s) | Second insert |
|---|---|
| 5 | ✅ OK |
| 15 | ✅ OK |
| 30 | ✅ OK |
| 60 | ❌ FAIL (server unavailable) |

In this environment the threshold lies somewhere between 30s and 60s. The same failure occurs with fresh ./milvus.db files, so it does not appear to be caused by file locking or stale state.
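Since each data point costs up to a minute of wall-clock time, a bisection between the last passing and first failing idle durations narrows the threshold with few runs. The sketch below assumes a hypothetical callable `survives_idle(seconds)` that runs the reproduction script above with `time.sleep(seconds)` in a fresh process and returns whether the second insert succeeded. It also assumes the behavior is monotonic (if N seconds of idle survive, any shorter idle does too).

```python
from typing import Callable


def find_idle_threshold(
    survives_idle: Callable[[float], bool],
    ok_s: float = 30.0,
    fail_s: float = 60.0,
    resolution_s: float = 1.0,
) -> tuple[float, float]:
    """Bisect [ok_s, fail_s] until it is at most resolution_s wide.

    ok_s must be an idle duration known to survive and fail_s one known to fail
    (30 s and 60 s in the matrix above). Returns (longest idle that survived,
    shortest idle that failed).
    """
    while fail_s - ok_s > resolution_s:
        mid = (ok_s + fail_s) / 2
        if survives_idle(mid):
            ok_s = mid  # threshold is above mid
        else:
            fail_s = mid  # threshold is at or below mid
    return ok_s, fail_s
```

Going from a 30 s window to 1 s resolution takes five probes, so the whole sweep should finish in a few minutes.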

Environment

  • OS: Ubuntu 22.04 (Linux 5.15.0-174-generic, x86_64)
  • Python: 3.12.9
  • pymilvus: 2.6.8
  • milvus-lite: 2.5.1 (latest stable on PyPI as of 2026-04-16)

What I'd like to understand / propose

  1. Is this intentional? i.e. does milvus-lite deliberately shut down the daemon subprocess after an idle timeout to free resources?
  2. If intentional, is there a supported way to keep the daemon alive, or a documented recommendation for long-lived clients (e.g. "periodically call list_collections() as a keepalive" or "reconstruct MilvusClient on code=2 errors")?
  3. If not intentional, this looks like a real regression for any long-lived, low-QPS user of Milvus Lite. It would be great to have either a fix (client-side auto-reconnect on "server unavailable" / closed channel) or explicit documentation that this usage pattern is unsupported.
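The "reconstruct MilvusClient on code=2 errors" workaround from item 2 can be sketched as a small wrapper that rebuilds the client once and retries when a call fails as "server unavailable". Everything here is hypothetical: `ReconnectingClient` is not a pymilvus API, and it is written against an injected factory and error predicate so the logic stays independent of pymilvus. It is not confirmed that constructing a new `MilvusClient(uri=db)` actually relaunches the exited daemon; that is the assumption this workaround depends on.

```python
from typing import Callable, Generic, TypeVar

C = TypeVar("C")
R = TypeVar("R")


class ReconnectingClient(Generic[C]):
    """Hypothetical wrapper: rebuild the client once when a call fails as 'unavailable'."""

    def __init__(self, factory: Callable[[], C], is_unavailable: Callable[[Exception], bool]):
        self._factory = factory
        self._is_unavailable = is_unavailable
        self.client = factory()
        self.reconnects = 0

    def call(self, fn: Callable[[C], R]) -> R:
        try:
            return fn(self.client)
        except Exception as exc:
            if not self._is_unavailable(exc):
                raise  # unrelated errors propagate unchanged
            # Assume the daemon is gone: build a fresh client and retry exactly once.
            # The old client is not closed here, since closing it may itself fail.
            self.client = self._factory()
            self.reconnects += 1
            return fn(self.client)
```

With pymilvus it would be wired up as `rc = ReconnectingClient(lambda: MilvusClient(uri=db), lambda e: getattr(e, "code", None) == 2)` followed by `rc.call(lambda c: c.insert("probe", rows))`. Retrying a write is only safe if the first attempt never reached the server, which should hold for a connect failure like the one above, but that is also an assumption.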
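The other workaround from item 2, "periodically call list_collections() as a keepalive", could look like the background thread below. This is a sketch, not a documented recommendation: `Keepalive` is a hypothetical name, and it is an assumption that any RPC resets milvus-lite's idle timer (the matrix above only shows that the daemon is still up after 30 s). The 20 s default was picked to stay under the 30 s idle that still worked here. Unlike the minimal reproduction, this deliberately adds a thread that shares the client with the main code.

```python
import threading
from typing import Callable


class Keepalive:
    """Hypothetical keepalive: call `ping` every `interval_s` seconds until stopped."""

    def __init__(self, ping: Callable[[], object], interval_s: float = 20.0):
        self._ping = ping
        self._interval_s = interval_s
        self._stop = threading.Event()
        # Daemon thread, so it never blocks interpreter shutdown.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.pings = 0
        self.errors = 0

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval_s):
            try:
                self._ping()
                self.pings += 1
            except Exception:
                self.errors += 1  # keep pinging; the caller can inspect `errors`

    def start(self) -> "Keepalive":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Usage would be `ka = Keepalive(client.list_collections, interval_s=20).start()` after creating the client, and `ka.stop()` on shutdown. gRPC channels are generally safe to share across threads, but whether this keeps the milvus-lite daemon alive is exactly what item 2 asks the maintainers to confirm.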

Related issues I found, none of which covers this specific pattern: #88, #152, #195, #216, #263, #264.

Happy to contribute a fix or docs patch once the intended behavior is confirmed.
