fix(coordinator): guarantee setup event is set to prevent deadlock on failed initialization#669
fix(coordinator): guarantee setup event is set to prevent deadlock on failed initialization#669firstof9 wants to merge 3 commits into
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #669 +/- ##
==========================================
+ Coverage 84.14% 92.23% +8.09%
==========================================
Files 10 41 +31
Lines 801 4817 +4016
Branches 0 30 +30
==========================================
+ Hits 674 4443 +3769
- Misses 127 374 +247
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
tykeal
left a comment
There was a problem hiding this comment.
Review: fix(coordinator): guarantee setup event is set to prevent deadlock
Correctly root-causes #668. _async_setup only set _initial_setup_done_event on its last line (coordinator.py:158-167); if any prior step raised, the event was never set, and every mutating method waits on it. On delete, async_remove_entry → delete_lock_by_config_entry_id awaits the never-set event and hangs, blocking teardown until the watchdog SIGKILLs Core — matching both repro paths in #668. The try/finally fix is right, and the delete_coordinator data is None guard is a necessary secondary fix (it would TypeError on len(None) right after the deadlock is lifted). CI is green.
Non-blocking items:
-
[SUGGESTION] After a failed
_async_setup, the exception propagates frominitial_setup()toasync_setup_entry:141as a raw exception, so HA errors the entry without retry, while the coordinator stays inhass.data[DOMAIN][COORDINATOR]withkmlocks == {}and the event set. A later reconfigure/reload seesCOORDINATORpresent (__init__.py:138), reuses it, and skipsinitial_setup()— locks never reload, integration is silently non-functional until a full HA restart. Wrap theinitial_setup()failure asConfigEntryNotReadyso HA retries and re-runs setup. Not required to close #668. -
[SUGGESTION]
delete_coordinator(__init__.py:303): the newdata is Nonebranch is untested. Codecov's "100% modified lines" is line-level, not branch-level. Add a test assertingdelete_coordinatorremoves the coordinator whencoordinator.data is Noneand no other entries remain. -
[SUGGESTION]
test_async_setup_successasserts call counts but not ordering. The invariant #668 depends on is that the event is set after the setup steps and_verify_lock_configurationruns after the event on success. Add an ordering assertion. -
[NITPICK] Add a one-line comment on the
finallydocumenting why the event is set unconditionally, so a future refactor doesn't reintroduce the deadlock. -
[NITPICK]
test_async_setup_exception_sets_eventonly raises from_async_load_data(first step). A parametrized variant raising from_rebuild_lock_relationships/_setup_timerswould protect thefinallyguarantee for mid-block failures.
No blockers. Approve-with-nits once the above are considered.
Summary
This PR resolves #668.
When the Keymaster coordinator fails to set up (due to connection failures or missing dependencies during Home Assistant startup), the
_initial_setup_done_eventis never set. Consequently, when a user tries to delete or reconfigure the integration entry afterwards, any downstream calls waiting on this event (await self._initial_setup_done_event.wait()) hang indefinitely. This blocks the config entry unloading task, causing Home Assistant Core to hang and eventually be terminated by the watchdog (SIGKILL / Exit 256).Proposed change
Wrapping the core setup in a
try...finallyblock ensures that_initial_setup_done_eventis always set even if setup fails, allowing entry cleanup/deletion or reload to proceed normally.Type of change
Additional information