
[Enhancement] Add liveness and readiness probes to cleaner service #194

@NotYuSheng

Description

The cleaner service, which runs as a background Redis pub/sub listener, lacks liveness and readiness probes. Without them, Kubernetes cannot tell whether the service is working or has entered a non-functional state (e.g., thread deadlock, lost Redis connectivity). Cleanup can then fail silently, letting stale data accumulate in MinIO storage and the ChromaDB vector database.

The cleaner service currently:

  • Runs as a background daemon with a Redis keyspace event listener
  • Has no HTTP endpoints or health check mechanisms
  • Can enter a deadlocked state without Kubernetes detecting it
  • Is critical for preventing storage bloat and maintaining system health

Acceptance Criteria

  1. The task is complete when the cleaner service has functioning liveness and readiness probes.
  2. Additional requirement: Kubernetes can detect and restart cleaner pods that become unresponsive.
  3. Additional requirement: Health checks verify Redis connectivity and thread health.
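
One possible shape for the check in criterion 3, as a minimal sketch only. `HealthState`, `watch`, `beat`, and the `ping` callable are hypothetical names, not taken from cleaner/main.py or cleaner/utils/cleaner.py. The sketch assumes the listener loop can call `beat()` on every iteration and that `ping` wraps something like the Redis client's `ping()`.

```python
import threading
import time
from typing import Callable, Optional


class HealthState:
    """Hypothetical helper: tracks listener-thread health and Redis connectivity."""

    def __init__(self, ping: Callable[[], bool], max_heartbeat_age: float = 30.0):
        self._ping = ping                  # assumed to wrap e.g. redis_client.ping
        self._max_age = max_heartbeat_age  # seconds before a silent loop counts as stuck
        self._lock = threading.Lock()
        self._last_beat: Optional[float] = None
        self._thread: Optional[threading.Thread] = None

    def watch(self, thread: threading.Thread) -> None:
        """Register the pub/sub listener thread to monitor."""
        self._thread = thread

    def beat(self) -> None:
        """Called by the listener loop on each iteration to prove it is not stuck."""
        with self._lock:
            self._last_beat = time.monotonic()

    def is_alive(self) -> bool:
        """Liveness: the thread is running and its heartbeat is recent."""
        thread = self._thread
        if thread is None or not thread.is_alive():
            return False
        with self._lock:
            last = self._last_beat
        return last is not None and (time.monotonic() - last) <= self._max_age

    def is_ready(self) -> bool:
        """Readiness: alive, and Redis answers a ping without raising."""
        if not self.is_alive():
            return False
        try:
            return bool(self._ping())
        except Exception:
            return False
```

A heartbeat only helps if the loop wakes up when no events arrive. If the listener currently blocks indefinitely waiting for messages, it would likely need to poll with a timeout instead so that `beat()` runs regularly; whether that matches the existing listener code is not confirmed here.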

Notes

  • Implementation options:
    • Add minimal HTTP health server with /health and /ready endpoints
    • Implement file-based health checks with exec probes
    • Track Redis pub/sub thread health and connectivity status
  • Current deployment template: helm/cleaner/templates/deployment.yaml
  • Service implementation: cleaner/main.py and cleaner/utils/cleaner.py
  • Estimated effort: 1-2 days of development work
  • Priority: Medium (not urgent unless experiencing cleanup failures)
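
The first implementation option (a minimal HTTP health server) could look roughly like the sketch below, using only the Python standard library. `make_health_server`, `start_in_background`, and port 8080 are assumptions for illustration; the two callables stand in for whatever liveness and readiness checks the cleaner ends up with.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_health_server(is_alive: Callable[[], bool],
                       is_ready: Callable[[], bool],
                       host: str = "0.0.0.0",
                       port: int = 8080) -> ThreadingHTTPServer:
    """Build a server answering /health (liveness) and /ready (readiness)."""
    checks = {"/health": is_alive, "/ready": is_ready}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            check = checks.get(self.path)
            if check is None:
                self.send_error(404)
                return
            try:
                ok = bool(check())
            except Exception:
                ok = False  # a failing check is reported as unhealthy, not a crash
            body = b"ok\n" if ok else b"unhealthy\n"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging from frequent probe hits

    return ThreadingHTTPServer((host, port), Handler)


def start_in_background(server: ThreadingHTTPServer) -> threading.Thread:
    """Serve on a daemon thread so the health server never blocks shutdown."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Kubernetes HTTP probes treat any status from 200 to 399 as success, so returning 503 on a failed check is enough to trigger a restart (liveness) or remove the pod from the ready set (readiness). The deployment template would then need matching `livenessProbe` and `readinessProbe` entries pointing at these paths and the chosen port.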

Metadata

Assignees

No one assigned

    Labels

    enhancement: Technical improvements, infra, refactoring

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
