
fix(coordinator): guarantee setup event is set to prevent deadlock on failed initialization#669

Open: firstof9 wants to merge 3 commits into FutureTense:main from firstof9:fix-setup-deadlock-668

Conversation

firstof9 (Collaborator) commented Jul 3, 2026

Summary

This PR resolves #668.

When the Keymaster coordinator fails to set up (due to connection failures or missing dependencies during Home Assistant startup), the _initial_setup_done_event is never set. Consequently, when a user tries to delete or reconfigure the integration entry afterwards, any downstream calls waiting on this event (await self._initial_setup_done_event.wait()) hang indefinitely. This blocks the config entry unloading task, causing Home Assistant Core to hang and eventually be terminated by the watchdog (SIGKILL / Exit 256).
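The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the integration's code: the `Coordinator` class, the `fail` parameter, and `delete_lock` are hypothetical stand-ins, and the timeout exists only so the demo terminates instead of hanging.

```python
import asyncio


class Coordinator:
    """Hypothetical stand-in for the Keymaster coordinator."""

    def __init__(self) -> None:
        self._initial_setup_done_event = asyncio.Event()

    async def _async_setup(self, fail: bool) -> None:
        if fail:
            # Simulated startup failure: we leave before the event is set.
            raise ConnectionError("simulated connection failure")
        self._initial_setup_done_event.set()

    async def delete_lock(self) -> str:
        # Mutating methods wait for setup to finish before touching state.
        await self._initial_setup_done_event.wait()
        return "deleted"


async def demo() -> str:
    coordinator = Coordinator()
    try:
        await coordinator._async_setup(fail=True)
    except ConnectionError:
        pass  # setup failed, and the event is still unset
    try:
        # Without the timeout this await would never return.
        return await asyncio.wait_for(coordinator.delete_lock(), timeout=0.1)
    except asyncio.TimeoutError:
        return "hung"


result = asyncio.run(demo())
```

In the real integration there is no timeout around the wait, so the unload task blocks until the watchdog intervenes.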

Proposed change

Wrapping the core setup in a try...finally block ensures that _initial_setup_done_event is always set even if setup fails, allowing entry cleanup/deletion or reload to proceed normally.
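A minimal sketch of the pattern, assuming a simplified coordinator: the `_load` step and `delete_lock` method are hypothetical and only illustrate that the event is set on both the success and failure paths while the original exception still propagates.

```python
import asyncio


class Coordinator:
    """Hypothetical stand-in for the Keymaster coordinator."""

    def __init__(self) -> None:
        self._initial_setup_done_event = asyncio.Event()

    async def _load(self) -> None:
        # Simulated setup step that fails during startup.
        raise ConnectionError("simulated connection failure")

    async def _async_setup(self) -> None:
        try:
            await self._load()
        finally:
            # Set unconditionally so waiters are released even when setup
            # fails; the exception from the try block still propagates.
            self._initial_setup_done_event.set()

    async def delete_lock(self) -> str:
        await self._initial_setup_done_event.wait()
        return "deleted"


async def demo() -> tuple[bool, str]:
    coordinator = Coordinator()
    try:
        await coordinator._async_setup()
    except ConnectionError:
        failed = True
    else:
        failed = False
    # The wait now returns immediately instead of hanging.
    outcome = await asyncio.wait_for(coordinator.delete_lock(), timeout=0.1)
    return failed, outcome


setup_failed, outcome = asyncio.run(demo())
```

Note that `finally` does not swallow the error: the caller still sees the setup failure and can decide how to handle it.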

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

firstof9 requested a review from tykeal on July 3, 2026 23:56
github-actions bot added the bugfix (Fixes a bug) label on Jul 3, 2026
codecov-commenter commented Jul 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.23%. Comparing base (cdb4922) to head (e4b8c3a).
⚠️ Report is 186 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #669      +/-   ##
==========================================
+ Coverage   84.14%   92.23%   +8.09%     
==========================================
  Files          10       41      +31     
  Lines         801     4817    +4016     
  Branches        0       30      +30     
==========================================
+ Hits          674     4443    +3769     
- Misses        127      374     +247     
Flag Coverage Δ
python 92.09% <100.00%> (?)


firstof9 marked this pull request as ready for review on July 4, 2026 01:36

tykeal (Collaborator) left a comment

Review: fix(coordinator): guarantee setup event is set to prevent deadlock

Correctly root-causes #668. _async_setup only set _initial_setup_done_event on its last line (coordinator.py:158-167); if any prior step raised, the event was never set, and every mutating method waits on it. On delete, async_remove_entry → delete_lock_by_config_entry_id awaits the never-set event and hangs, blocking teardown until the watchdog SIGKILLs Core, which matches both repro paths in #668. The try/finally fix is right, and the data is None guard in delete_coordinator is a necessary secondary fix (without it, delete_coordinator would raise TypeError on len(None) right after the deadlock is lifted). CI is green.
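To illustrate the secondary fix, here is a small sketch of the kind of guard involved. The helper name `no_locks_remaining` is hypothetical and the real `delete_coordinator` logic may be structured differently; the point is only that `None` must be checked before `len()` is called.

```python
from typing import Optional


def no_locks_remaining(data: Optional[dict]) -> bool:
    """Hypothetical helper: treat missing coordinator data as 'no locks'."""
    # Checking for None first avoids calling len() on None.
    return data is None or len(data) == 0


def unguarded_check_raises() -> bool:
    """Show that the unguarded len() call fails when data is None."""
    try:
        len(None)  # type: ignore[arg-type]
    except TypeError:
        return True
    return False
```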

Non-blocking items:

  • [SUGGESTION] After a failed _async_setup, the exception propagates from initial_setup() to async_setup_entry:141 as a raw exception, so HA errors the entry without retry, while the coordinator stays in hass.data[DOMAIN][COORDINATOR] with kmlocks == {} and the event set. A later reconfigure/reload sees COORDINATOR present (__init__.py:138), reuses it, and skips initial_setup() — locks never reload, integration is silently non-functional until a full HA restart. Wrap the initial_setup() failure as ConfigEntryNotReady so HA retries and re-runs setup. Not required to close #668.

  • [SUGGESTION] delete_coordinator (__init__.py:303): the new data is None branch is untested. Codecov's "100% modified lines" is line-level, not branch-level. Add a test asserting delete_coordinator removes the coordinator when coordinator.data is None and no other entries remain.

  • [SUGGESTION] test_async_setup_success asserts call counts but not ordering. The invariant #668 depends on is that the event is set after the setup steps and _verify_lock_configuration runs after the event on success. Add an ordering assertion.

  • [NITPICK] Add a one-line comment on the finally documenting why the event is set unconditionally, so a future refactor doesn't reintroduce the deadlock.

  • [NITPICK] test_async_setup_exception_sets_event only raises from _async_load_data (first step). A parametrized variant raising from _rebuild_lock_relationships / _setup_timers would protect the finally guarantee for mid-block failures.
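For the first suggestion above, a rough sketch of the wrapping could look like the following. It is heavily simplified: the real `async_setup_entry` takes `hass` and a config entry, `FailingCoordinator` is hypothetical, and the local `ConfigEntryNotReady` class is a stand-in for `homeassistant.exceptions.ConfigEntryNotReady` so that the snippet runs without Home Assistant installed.

```python
import asyncio


class ConfigEntryNotReady(Exception):
    """Local stand-in for homeassistant.exceptions.ConfigEntryNotReady."""


class FailingCoordinator:
    """Hypothetical coordinator whose initial setup always fails."""

    async def initial_setup(self) -> None:
        raise ConnectionError("simulated startup failure")


async def async_setup_entry(coordinator: FailingCoordinator) -> bool:
    """Simplified entry setup; the real signature differs."""
    try:
        await coordinator.initial_setup()
    except Exception as err:
        # Re-raising as ConfigEntryNotReady signals that setup should be
        # retried later rather than marking the entry as permanently failed.
        raise ConfigEntryNotReady(f"Keymaster setup failed: {err}") from err
    return True


async def demo() -> str:
    try:
        await async_setup_entry(FailingCoordinator())
    except ConfigEntryNotReady as err:
        # The original error is preserved as the cause for logging.
        return type(err.__cause__).__name__
    return "ok"


cause = asyncio.run(demo())
```

A complete fix would probably also need to make sure a retried setup does not reuse the stale coordinator and skip `initial_setup()`, which this sketch does not cover.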

No blockers. Approve-with-nits once the above are considered.
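The two test-related items above (ordering on success, failures in later steps) could be covered along these lines. This is a sketch using plain asyncio rather than the project's pytest fixtures: `RecordingCoordinator` is hypothetical, and the relative order of the three setup steps is an assumption. Only the step names and the "event before `_verify_lock_configuration`" invariant come from the review.

```python
import asyncio
from typing import Optional

# Step names are taken from the review; their relative order is assumed.
STEPS = ("_async_load_data", "_rebuild_lock_relationships", "_setup_timers")


class RecordingCoordinator:
    """Hypothetical stand-in whose steps only record that they ran."""

    def __init__(self, fail_at: Optional[str] = None) -> None:
        self._initial_setup_done_event = asyncio.Event()
        self._fail_at = fail_at
        self.calls: list[str] = []

    async def _step(self, name: str) -> None:
        self.calls.append(name)
        if name == self._fail_at:
            raise RuntimeError(f"{name} failed")

    async def _async_setup(self) -> None:
        try:
            for name in STEPS:
                await self._step(name)
        finally:
            # Record when the event is set so ordering can be asserted.
            self.calls.append("event_set")
            self._initial_setup_done_event.set()
        # Only reached on success, after the event has been set.
        await self._step("_verify_lock_configuration")


async def run_case(fail_at: Optional[str] = None) -> tuple[bool, list[str]]:
    coordinator = RecordingCoordinator(fail_at)
    try:
        await coordinator._async_setup()
    except RuntimeError:
        pass
    return coordinator._initial_setup_done_event.is_set(), coordinator.calls


# One failure case per step, equivalent to a parametrized test.
event_set_by_failure = {name: asyncio.run(run_case(name))[0] for name in STEPS}
_, success_order = asyncio.run(run_case())
```

In the real test suite the same checks would likely be expressed with `pytest.mark.parametrize` and mocks attached to a shared parent so that call order can be compared directly.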



Development

Successfully merging this pull request may close these issues.

ISSUE: Home Assistant Core crashes when reconfiguring or deleting Keymaster integration entry

3 participants