Fix: re-probe unhealthy ColPali endpoints after cooldown (#398)
Previously, once an endpoint failed it was discarded from
healthy_endpoints and was only re-added if ALL endpoints failed
at the same time. A single transient OOM on one of N endpoints
could therefore silently cut sustained ingestion throughput by
roughly 1/N (halving it with two endpoints) until the worker
process restarted.
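A minimal sketch of the pre-fix behavior, assuming a simple set-based pool. Only `healthy_endpoints` comes from the message above; `LegacyEndpointPool`, `mark_unhealthy`, and `available` are hypothetical names used for illustration, not the project's actual code.

```python
class LegacyEndpointPool:
    """Hypothetical sketch of the old behavior: a failed endpoint is dropped
    from healthy_endpoints and only restored once every endpoint has failed."""

    def __init__(self, endpoints: list[str]) -> None:
        self.all_endpoints = list(endpoints)
        self.healthy_endpoints = set(endpoints)

    def mark_unhealthy(self, endpoint: str) -> None:
        self.healthy_endpoints.discard(endpoint)
        # The only recovery path: reset to the full list when nothing is left.
        if not self.healthy_endpoints:
            self.healthy_endpoints = set(self.all_endpoints)

    def available(self) -> list[str]:
        return sorted(self.healthy_endpoints)
```

With two endpoints, one failure on "a" leaves only "b" in rotation for the life of the process, even if "a" recovers a second later; "a" comes back only if "b" also fails.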
Now each endpoint is timestamped when it is marked unhealthy and
is re-included after a 60s cooldown. If it is still failing, the
existing failure path marks it unhealthy again, so error
semantics are unchanged.
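A minimal sketch of the cooldown approach, assuming endpoints are identified by strings and health is tracked in-process. `EndpointPool`, `unhealthy_since`, `mark_unhealthy`, `available`, and the injectable `clock` are hypothetical names; the 60s value is from the message. The fallback to the full list when every endpoint is cooling down is an assumption that mirrors the old all-failed reset.

```python
import time
from typing import Callable, Iterable

COOLDOWN_S = 60.0  # cooldown duration stated in the commit message


class EndpointPool:
    """Hypothetical sketch: unhealthy endpoints are timestamped and become
    eligible again once the cooldown has elapsed."""

    def __init__(
        self,
        endpoints: Iterable[str],
        cooldown_s: float = COOLDOWN_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.all_endpoints = list(endpoints)
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests can control time
        self.unhealthy_since: dict[str, float] = {}

    def mark_unhealthy(self, endpoint: str) -> None:
        # Re-marking refreshes the timestamp, so an endpoint that is still
        # failing after being re-probed starts a fresh cooldown.
        self.unhealthy_since[endpoint] = self.clock()

    def available(self) -> list[str]:
        now = self.clock()
        # Cooldown elapsed: clear the marker so the endpoint is re-probed.
        for ep, since in list(self.unhealthy_since.items()):
            if now - since >= self.cooldown_s:
                del self.unhealthy_since[ep]
        healthy = [ep for ep in self.all_endpoints if ep not in self.unhealthy_since]
        # Assumption: if everything is cooling down, try all endpoints anyway.
        return healthy or list(self.all_endpoints)
```

Usage: after `pool.mark_unhealthy("b")`, `pool.available()` omits "b" for 60 seconds and then includes it again; if the next request to "b" fails, the caller marks it unhealthy once more and the cycle repeats.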
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>