Support restarting pegaflow-server while vLLM keeps running #273

Description

@xiaguan

Problem

PegaFlow should support restarting pegaflow-server while the vLLM process continues running.

Today the vLLM connector creates an EngineRpcClient during connector initialization, opens a long-lived session stream, and registers CUDA KV cache contexts against the server. If the server restarts, the server-side in-memory state is lost and the existing client/session/registration state may no longer be valid.

Relevant current code paths:

  • python/src/lib.rs eagerly connects EngineRpcClient, reuses a tonic client/channel, and holds the Session stream in start_session_watcher.
  • python/pegaflow/connector/__init__.py starts the session watcher once during scheduler connector initialization.
  • python/pegaflow/connector/state_manager.py can mark the service unavailable and health-check it, but its current scope is mainly scheduler query fallback.
  • python/pegaflow/connector/worker.py registers CUDA KV cache tensors and performs load/save RPCs using the existing client.

Requested feature

Make the connector recover automatically after a pegaflow-server restart without requiring a vLLM restart.

Required behavior:

  • Detect server disconnects and failed RPCs, and treat both as service unavailability.
  • Reconnect or recreate the underlying engine client/channel after the server becomes healthy again.
  • Re-open the scheduler Session stream after reconnecting.
  • Re-register worker KV cache contexts after reconnecting, because the restarted server loses its CUDA IPC registry and engine state.
  • During downtime, degrade predictably:
    • scheduler cache queries should behave as misses;
    • load/save paths should fail without hanging;
    • connector state should stay internally consistent.
  • After recovery, new requests should be able to query, save, and load through PegaFlow again while the same vLLM process keeps running.
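
One possible shape for the behavior listed above, as a minimal sketch rather than a proposed implementation. Everything here is hypothetical: the `Engine` protocol, `RecoveringConnector`, `ServiceUnavailable`, and the method names are stand-ins for illustration and do not reflect the real `EngineRpcClient` API. The sketch assumes a blocking client whose failures surface as `ConnectionError` or `TimeoutError`.

```python
import time
from typing import Callable, Optional, Protocol


class ServiceUnavailable(RuntimeError):
    """Raised by load paths while the server is down (hypothetical name)."""


class Engine(Protocol):
    # Hypothetical minimal client surface; the real client may differ.
    def health(self) -> bool: ...
    def open_session(self) -> None: ...
    def register(self, context: str) -> None: ...
    def query(self, key: str) -> int: ...
    def load(self, key: str) -> bytes: ...


class RecoveringConnector:
    def __init__(self, connect: Callable[[], Engine],
                 retry_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._retry_interval_s = retry_interval_s
        self._clock = clock
        self._contexts: list[str] = []   # replayed after every reconnect
        self._client: Optional[Engine] = None
        self._next_retry = 0.0
        self.generation = 0              # bumped on each successful (re)connect

    def _mark_unavailable(self) -> None:
        # Drop the stale client and rate-limit reconnect attempts.
        self._client = None
        self._next_retry = self._clock() + self._retry_interval_s

    def _ready(self) -> Optional[Engine]:
        if self._client is not None:
            return self._client
        if self._clock() < self._next_retry:
            return None
        try:
            client = self._connect()          # fresh client/channel
            if not client.health():
                raise ConnectionError("server not healthy")
            client.open_session()             # re-open the session stream
            for context in self._contexts:    # re-register KV cache contexts
                client.register(context)
        except (ConnectionError, TimeoutError):
            self._mark_unavailable()
            return None
        self._client = client
        self.generation += 1
        return client

    def register(self, context: str) -> None:
        # Remember the context first so a later reconnect can replay it.
        self._contexts.append(context)
        if self._client is not None:
            try:
                self._client.register(context)
            except (ConnectionError, TimeoutError):
                self._mark_unavailable()

    def query(self, key: str) -> int:
        # Scheduler path: downtime behaves as a cache miss (0 hits).
        client = self._ready()
        if client is None:
            return 0
        try:
            return client.query(key)
        except (ConnectionError, TimeoutError):
            self._mark_unavailable()
            return 0

    def load(self, key: str) -> bytes:
        # Worker path: fail immediately instead of waiting on a dead server.
        client = self._ready()
        if client is None:
            raise ServiceUnavailable("pegaflow-server is unavailable")
        try:
            return client.load(key)
        except (ConnectionError, TimeoutError) as exc:
            self._mark_unavailable()
            raise ServiceUnavailable("pegaflow-server is unavailable") from exc
```

The `generation` counter is one way to let callers notice that a reconnect happened and that any server-side handles from an earlier generation are stale. A real implementation would also need per-RPC deadlines so that a half-open connection cannot block a load or save call.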

Expected behavior

  • Operators can restart or roll pegaflow-server without restarting vLLM.
  • Old in-memory cache contents may be lost unless backed by durable storage, but the connector should recover to a correct empty-cache or rebuilt-cache state.
  • Recovery should be observable through logs and metrics.
  • Add tests that simulate server unavailability and recovery across scheduler and worker paths.
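
To make the observability and testing points concrete, here is a small sketch of a transition tracker with a matching unit test. The names `AvailabilityTracker`, `report`, and the logger name are hypothetical and not taken from the PegaFlow codebase; plain counters stand in for whatever metrics backend the project uses.

```python
import logging
import unittest

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("pegaflow.connector.sketch")


class AvailabilityTracker:
    """Logs and counts up/down transitions so recovery is observable."""

    def __init__(self) -> None:
        self.available = True
        self.outages = 0
        self.recoveries = 0

    def report(self, ok: bool) -> None:
        if ok == self.available:
            return  # no state change, so nothing to log or count
        self.available = ok
        if ok:
            self.recoveries += 1
            logger.info("pegaflow-server recovered (recoveries=%d)",
                        self.recoveries)
        else:
            self.outages += 1
            logger.warning("pegaflow-server unavailable (outages=%d)",
                           self.outages)


class RecoveryObservabilityTest(unittest.TestCase):
    def test_outage_then_recovery_is_logged_and_counted(self):
        tracker = AvailabilityTracker()
        with self.assertLogs(logger, level="INFO") as captured:
            tracker.report(False)  # first failed RPC: outage begins
            tracker.report(False)  # repeated failure: no new log line
            tracker.report(True)   # health check passes: recovery
        self.assertEqual((tracker.outages, tracker.recoveries), (1, 1))
        self.assertEqual(len(captured.records), 2)
        self.assertTrue(tracker.available)
```

Logging only on transitions keeps a long outage from flooding the log with one line per failed RPC. Tests for the scheduler and worker paths could follow the same pattern, with a fake server that is toggled down and up between calls.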

Labels: enhancement