Commit 8723547
Handle ZCONNECTIONLOSS with exponential backoff
Summary:
Zeus ZCONNECTIONLOSS errors are used for load shedding (see
https://fb.workplace.com/groups/zeus.users/permalink/31336632699291946/).
Currently these fall through to `InternalError::Other`, which uses quadratic
backoff starting at 100ms, so the worker crashes after 5 consecutive failures
in ~3 seconds. This is too aggressive and doesn't give Zeus enough breathing
room to recover.
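The ~3 second figure can be checked with a small sketch. It assumes the quadratic delay is `base * attempt^2` and that the worker sleeps four times before crashing on the fifth failure; the function names here are hypothetical and not taken from the Mononoke source.

```rust
use std::time::Duration;

/// Assumed quadratic backoff: the delay before retry `attempt` (1-based)
/// is `base * attempt^2`. This formula is an assumption, not the real code.
fn quadratic_backoff(base: Duration, attempt: u32) -> Duration {
    base * attempt.pow(2)
}

/// Total time spent sleeping across `retries` retries.
fn total_wait(base: Duration, retries: u32) -> Duration {
    (1..=retries).map(|n| quadratic_backoff(base, n)).sum()
}

fn main() {
    let base = Duration::from_millis(100);
    // Four sleeps (100ms, 400ms, 900ms, 1600ms), then the fifth failure crashes.
    for attempt in 1..=4 {
        println!("retry {}: {:?}", attempt, quadratic_backoff(base, attempt));
    }
    println!("total: {} ms", total_wait(base, 4).as_millis());
}
```

Under these assumptions the four delays sum to 3000ms, which matches the ~3 seconds quoted above.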
This diff adds a dedicated `TransientZeusError` variant to `InternalError`
that matches `ZCONNECTIONLOSS` and applies exponential backoff starting at
500ms (500ms, 1s, 2s, 4s, then crash). This extends the total retry window
from ~3s to ~7.5s before the worker crashes.
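A minimal sketch of the shape of this change follows. Only the names `InternalError`, `Other`, `TransientZeusError`, and `ZCONNECTIONLOSS` come from this commit message; the string-based matching, the `classify` and `retry_delay` helpers, and the retry limit are assumptions made for illustration and may differ from the real implementation.

```rust
use std::time::Duration;

// Simplified, hypothetical stand-in for the real `InternalError` type.
#[derive(Debug, PartialEq)]
enum InternalError {
    TransientZeusError(String),
    Other(String),
}

// Assumed limit: four retries, so the fifth consecutive failure crashes.
const MAX_RETRIES: u32 = 4;

/// Assumed classification: treat any message mentioning ZCONNECTIONLOSS
/// as a transient Zeus error, and everything else as `Other`.
fn classify(message: &str) -> InternalError {
    if message.contains("ZCONNECTIONLOSS") {
        InternalError::TransientZeusError(message.to_string())
    } else {
        InternalError::Other(message.to_string())
    }
}

/// Delay before retry `attempt` (1-based), or `None` once retries are exhausted.
fn retry_delay(err: &InternalError, attempt: u32) -> Option<Duration> {
    if attempt == 0 || attempt > MAX_RETRIES {
        return None;
    }
    let delay = match err {
        // Exponential: 500ms, 1s, 2s, 4s.
        InternalError::TransientZeusError(_) => {
            Duration::from_millis(500) * 2u32.pow(attempt - 1)
        }
        // Quadratic, assumed to be 100ms * attempt^2.
        InternalError::Other(_) => Duration::from_millis(100) * attempt.pow(2),
    };
    Some(delay)
}

fn main() {
    let err = classify("zeus: ZCONNECTIONLOSS");
    let total: Duration = (1..=MAX_RETRIES)
        .filter_map(|n| retry_delay(&err, n))
        .sum();
    println!("{:?}: {} ms of backoff before giving up", err, total.as_millis());
}
```

With these assumed parameters the transient path sleeps for 7500ms in total, consistent with the ~7.5s window described above.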
This is a mitigation while we request more Zelos capacity.
Alert: https://fburl.com/onedetection/yrvpdgcc
Reviewed By: YousefSalama
Differential Revision: D99992330
fbshipit-source-id: 28193c9f6744579b89d38e38c59c961c9050513e
Parent: 5b4eb52
1 file changed: +6 -0, under eden/mononoke/repo_attributes/repo_derivation_queues/src
Diff (line contents not captured): 2 lines added at new lines 34-35 and 4 lines added at new lines 60-63.