Commit 85b863a
fix: Synchronously refresh publication and clear cache if failed (#2490)
Follow-up to #2487
In that PR I fixed an issue that caused the publication manager to
exit, which sent Electric into a restart loop.
Two further issues compounded that problem:
1. I had previously made the recovery of shapes in the publication
manager asynchronous, using `cast` instead of `call`. This meant that
Electric went ahead with initialisation and marked itself healthy
without actually having recovered. Had it waited for the recovery to
finish and fail, the container would never have been marked healthy
and would never have been deployed.
2. If a recovery fails, for whatever reason, it may enter an infinite
recovery loop, since some of the persisted state we are trying to
recover from is likely what causes the failure. We should be able to
recover from that, because we consider the storage/cache disposable.
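The first issue is easier to see with a loose Python analogue. This is not Electric's Elixir code, and every name in it is hypothetical. A fire-and-forget submission (standing in for `cast`) returns before recovery fails, so start-up reports healthy. A blocking call (standing in for `call`) surfaces the failure before health is reported.

In Elixir, a failed `call` would typically crash the caller rather than return a value. The boolean here only stands in for "start-up does or does not reach the healthy state".

```python
# Loose Python analogue of the `cast` vs `call` start-up difference.
# Not Electric's Elixir code; every name here is hypothetical.
from concurrent.futures import ThreadPoolExecutor


class RecoveryError(Exception):
    """Raised when persisted shapes cannot be restored."""


def recover_shapes() -> None:
    # Hypothetical recovery that fails because persisted state is bad.
    raise RecoveryError("could not restore persisted shapes")


def start_fire_and_forget(pool: ThreadPoolExecutor) -> bool:
    # Like a `cast`: schedule recovery and return immediately.
    # The failure is captured in the returned future, which nobody awaits.
    pool.submit(recover_shapes)
    return True  # start-up continues and the service reports healthy


def start_blocking() -> bool:
    # Like a `call`: wait for recovery before reporting health.
    try:
        recover_shapes()
    except RecoveryError:
        return False  # never reported healthy, so never deployed
    return True


with ThreadPoolExecutor(max_workers=1) as pool:
    print("fire-and-forget healthy:", start_fire_and_forget(pool))  # True
print("blocking healthy:", start_blocking())  # False
```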
To solve these issues, I implemented the following:
1. Convert the calls back to synchronous calls, slightly increasing the
timeout for `ShapeCache` initialisation. Avoiding a long initialisation
was the original reason for making them asynchronous, since recovery
can take a while.
a. I'm not entirely sure whether this is a bit of an anti-pattern. We
could place the initialisation in a `{:continue, :restore_shapes}` after
`init`, but my concern is that if the rest of Electric carries on and
initialises everything else before the restore is done, we might have
a problem.
2. Use `try..catch..after` blocks to ensure cleanup occurs if any
recovery operation fails. The old, problematic state is then cleaned up,
and Electric still manages to start after the supervisor restarts the
process.
a. It's a bit of a brutal approach. I've tested it by tweaking the
publication manager to fail every now and then, and it works really
nicely: it erases the stored shapes, restarts, and carries on as
normal.
b. We could theoretically place this at a higher level, such as the
`start_link` of the `ShapeCache`, so that we always clear the cache if
initialisation fails. However, I wanted to keep the scope tight and
limited to things we have observed during real operation.
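The two fixes can be sketched together in Python. This is a hypothetical analogue, not Electric's implementation. Recovery is awaited with a bounded timeout (standing in for the synchronous call with a larger `ShapeCache` timeout). If restoring fails, the disposable store is wiped before re-raising, so the next supervised start begins from a clean slate. `ShapeStore`, `restore_shape`, `supervise` and `RECOVERY_TIMEOUT_S` are invented for illustration. Python's `try`/`except` with a bare `raise` plays the role of the commit's `try..catch..after`, and exactly where Electric performs its cleanup is not visible here.

```python
# Hypothetical Python analogue of the fix: restore shapes synchronously
# with a bounded wait, and wipe the disposable store if restoring fails so
# that the next (supervised) start begins from a clean slate.
from concurrent.futures import ThreadPoolExecutor

RECOVERY_TIMEOUT_S = 5.0  # hypothetical stand-in for the larger ShapeCache timeout


class ShapeStore:
    """In-memory stand-in for persisted, disposable shape storage."""

    def __init__(self, shapes):
        self.shapes = list(shapes)

    def clear(self):
        self.shapes.clear()


def restore_shape(shape):
    # Hypothetical: one bad persisted entry makes restoration fail.
    if shape == "corrupt":
        raise ValueError(f"cannot restore {shape!r}")


def recover(store):
    try:
        for shape in store.shapes:
            restore_shape(shape)
    except Exception:
        store.clear()  # cleanup runs only on the failure path
        raise          # crash, so the supervisor restarts with empty storage


def start(store, pool):
    # Block until recovery finishes (bounded by a timeout) before reporting
    # healthy, akin to a synchronous `call` rather than a `cast`.
    pool.submit(recover, store).result(timeout=RECOVERY_TIMEOUT_S)
    return "healthy"


def supervise(store, max_restarts=3):
    # Minimal supervisor stand-in: restart start-up after a crash.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(1, max_restarts + 1):
            try:
                return attempt, start(store, pool)
            except ValueError:
                continue
    raise RuntimeError("gave up after repeated failures")


store = ShapeStore(["a", "corrupt", "b"])
print(supervise(store), store.shapes)  # (2, 'healthy') []
```

The first attempt fails on the corrupt entry and clears the store. The restart then has nothing to restore and reaches the healthy state. This trade-off only makes sense because the stored shapes are treated as a disposable cache.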
I can spend time adding an integration test with a failing
`PublicationManager`, to simulate this situation in the test suite
rather than ad hoc.
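The proposed test could take roughly the following shape, shown as a Python `unittest` sketch rather than an ExUnit test. `FlakyPublicationManager`, `start_service` and `start_with_restarts` are hypothetical names, not Electric APIs. A fake publication manager fails a configurable number of times. The test then checks that start-up eventually succeeds and that the persisted shapes were discarded.

```python
# Hypothetical test sketch: a publication manager that fails on demand,
# used to check that start-up recovers by discarding persisted shapes.
# None of these names come from Electric.
import unittest


class FlakyPublicationManager:
    """Fails the first `failures` times a shape is added, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.added = []

    def add_shape(self, shape):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("injected publication failure")
        self.added.append(shape)


def start_service(stored_shapes, manager):
    """Restore shapes synchronously; on failure wipe storage and re-raise."""
    try:
        for shape in stored_shapes:
            manager.add_shape(shape)
    except RuntimeError:
        stored_shapes.clear()  # persisted cache is disposable
        raise


def start_with_restarts(stored_shapes, manager, max_restarts=3):
    """Minimal supervisor stand-in: retry start-up after a crash."""
    for attempt in range(1, max_restarts + 1):
        try:
            start_service(stored_shapes, manager)
            return attempt
        except RuntimeError:
            pass
    raise RuntimeError("too many restarts")


class RecoveryTest(unittest.TestCase):
    def test_failed_recovery_clears_storage_and_restarts(self):
        stored = ["shape-1", "shape-2"]
        manager = FlakyPublicationManager(failures=1)
        self.assertEqual(start_with_restarts(stored, manager), 2)
        self.assertEqual(stored, [])  # old, problematic state is gone
        self.assertEqual(manager.added, [])

    def test_healthy_recovery_keeps_shapes(self):
        stored = ["shape-1"]
        manager = FlakyPublicationManager(failures=0)
        self.assertEqual(start_with_restarts(stored, manager), 1)
        self.assertEqual(manager.added, ["shape-1"])
```

Saved as a module, this runs under `python -m unittest`.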
Tagging @balegas as well, since we discussed this change.
1 parent 077aad2
File tree
5 files changed, +91 -28 lines changed
- .changeset
- packages
- sync-service
- lib/electric
- replication
- test/electric
- typescript-client/test