Commit 01eff19
committed
Skip stale worker heartbeats from orphaned worker directories
The supervisor builds the heartbeat batch it sends to Nimbus from every
worker directory present on disk, with no check on whether the worker is
still alive. If a worker directory is ever left behind (a worker that died
before the supervisor finished cleanup), its frozen heartbeat keeps being
reported until the next supervisor restart -- the only time orphaned
directories are reclaimed. Nimbus then repeatedly reads the (often deleted)
topology conf, flooding its log with "Exception when getting heartbeat
timeout" / NotAliveException.
Filter out heartbeats older than supervisor.worker.timeout.secs in
ReportWorkerHeartbeats: such a worker is already considered dead by the
same timeout the slot uses, so it should not be reported. A live worker
always refreshes its heartbeat well within the timeout, so this never
drops a valid heartbeat.1 parent 85a5cbc commit 01eff19
2 files changed
Lines changed: 114 additions & 1 deletion
File tree
- storm-server/src
- main/java/org/apache/storm/daemon/supervisor/timer
- test/java/org/apache/storm/daemon/supervisor/timer
Lines changed: 18 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
| |||
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| 27 | + | |
| 28 | + | |
26 | 29 | | |
27 | 30 | | |
28 | 31 | | |
| |||
36 | 39 | | |
37 | 40 | | |
38 | 41 | | |
| 42 | + | |
39 | 43 | | |
40 | 44 | | |
41 | 45 | | |
42 | 46 | | |
| 47 | + | |
43 | 48 | | |
44 | 49 | | |
45 | 50 | | |
| |||
59 | 64 | | |
60 | 65 | | |
61 | 66 | | |
62 | | - | |
| 67 | + | |
63 | 68 | | |
64 | 69 | | |
65 | 70 | | |
| |||
70 | 75 | | |
71 | 76 | | |
72 | 77 | | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
73 | 90 | | |
74 | 91 | | |
75 | 92 | | |
| |||
Lines changed: 96 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
0 commit comments