Handle add-on filesystem errors gracefully and reduce Sentry noise by agners · Pull Request #6707 · home-assistant/supervisor

agners · 2026-04-07T18:16:08Z

Proposed change

Handle OSError (e.g. errno 74 / EBADMSG) in add-on metadata reads (long_description, refresh_path_cache) gracefully instead of letting them bubble up as unhandled exceptions. A new translatable AddonFileReadError is raised after calling check_oserror() to mark the system unhealthy, giving API consumers a proper error response.
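The read path described above could look roughly like the following. This is a minimal sketch, not the code in this PR: only the names `AddonFileReadError` and `check_oserror()` come from the description, while `ResolutionStub`, `read_long_description`, the `check_oserror` signature and the unhealthy reason string are assumptions made for illustration.

```python
import errno
from pathlib import Path
from typing import Optional


class AddonFileReadError(Exception):
    """Stand-in for the translatable API error named in this PR."""


class ResolutionStub:
    """Hypothetical stand-in for the resolution system."""

    def __init__(self) -> None:
        self.unhealthy = set()

    def check_oserror(self, err: OSError) -> None:
        # Assumption: EBADMSG (errno 74) points at filesystem corruption,
        # so the system is marked unhealthy.
        if err.errno == errno.EBADMSG:
            self.unhealthy.add("oserror_bad_message")


def read_long_description(path: Path, resolution: ResolutionStub) -> Optional[str]:
    """Read add-on documentation, turning OSError into an API-level error."""
    try:
        if not path.exists():
            return None
        return path.read_text(encoding="utf-8")
    except OSError as err:
        # Mark the system unhealthy first, then raise an error that the
        # API layer can present to the caller.
        resolution.check_oserror(err)
        raise AddonFileReadError(f"Cannot read {path.name}: {err}") from err
```

With this shape, both the API path and the startup path see the same exception type, and the unhealthy marker is set before the exception reaches either caller.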

Additionally, in core.py setup(), skip Sentry reporting when the resolution system has already handled the error (detected by checking if a new unhealthy reason was added during task execution). This avoids flooding Sentry with filesystem corruption errors that aren't actionable for developers -- the user is already notified via the resolution system. The log level is also lowered from critical (which triggers Sentry via LoggingIntegration) to error without stack trace in that case.
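The detection step can be sketched as a before/after comparison of the unhealthy reasons. The following is an illustration under assumptions, not the actual `core.py` code: `run_setup_tasks`, its parameters and the log messages are hypothetical, and the unhealthy reasons are modelled as a plain set of strings.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


async def run_setup_tasks(tasks, unhealthy, capture_exception) -> None:
    """Run setup tasks; skip Sentry capture for errors already flagged unhealthy.

    tasks: iterable of zero-argument coroutine functions (hypothetical).
    unhealthy: mutable set of unhealthy reasons kept by the resolution system.
    capture_exception: callable that reports an exception (e.g. to Sentry).
    """
    for task in tasks:
        before = set(unhealthy)  # snapshot before the task runs
        try:
            await task()
        except Exception as err:  # last-resort handler, deliberately broad
            if unhealthy - before:
                # A new unhealthy reason appeared during the task, so the
                # user has already been notified: log at error level without
                # a stack trace and do not report.
                _LOGGER.error("Error during setup (already reported): %s", err)
            else:
                _LOGGER.critical("Fatal error during setup: %s", err, exc_info=True)
                capture_exception(err)
```

Comparing snapshots means setup() does not need a list of "handled" exception types; the trade-off is that any unrelated unhealthy reason added while the task runs also suppresses the report.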

SUPERVISOR-BC6 alone accounts for 548K Sentry events from a single user with a corrupt filesystem.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New feature (which adds functionality to the supervisor)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes SUPERVISOR-BC6, SUPERVISOR-BZJ
This PR is related to issue:
Link to documentation pull request:
Link to cli pull request:
Link to client library pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Ruff (ruff format supervisor tests)
Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Documentation added/updated for developers.home-assistant.io
CLI updated (if necessary)
Client library updated (if necessary)

Add AddonFileReadError for add-on metadata read failures (long_description, refresh_path_cache) caused by filesystem errors like EBADMSG (errno 74). The new exception calls check_oserror() to mark the system unhealthy via the resolution system, then raises a translatable API error so callers get a proper error response instead of an unhandled OSError. Fixes SUPERVISOR-BC6 (548K events from the API path) and SUPERVISOR-BZJ (from the startup/load path).

In core.py setup(), skip reporting exceptions to Sentry when the error has already been handled by the resolution system. This is detected by checking if a new unhealthy reason was added during the task execution (e.g. via check_oserror). In that case the user is already notified, so we log at error level (no stack trace) instead of critical (which would also send to Sentry via the LoggingIntegration) and skip the explicit capture_exception call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

agners · 2026-04-07T18:19:31Z

This PR got a bit out of hand: I simply wanted to introduce an API error for those read cases. But since the same code is also used during setup(), I also had to come up with a solution for that code path.

I was considering simply handling the new AddonFileReadError exception in setup(), but this didn't seem to scale well (we'd have to keep a list of all the exceptions which we consider handled during setup()) 🤔.

The solution I chose now is also a bit hacky, so not sure if it's really better.

Simply considering all HassioError exceptions as "handled" doesn't work: we have quite a few Sentry reports where we may need to inform the user in one form or another:

● OK, here's the consolidated list of Sentry issues coming from core.py:setup():

  From _adjust_system_datetime:

  ┌─────────────────┬───────────────────────────────────────────────────────────────────┬────────┬────────────┐
  │      Issue      │                               Error                               │ Events │   Status   │
  ├─────────────────┼───────────────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-92R  │ DBusFatalError: Automatic time synchronization is enabled         │ 14,285 │ unresolved │
  ├─────────────────┼───────────────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-W4W  │ DBusFatalError: Failed to set time zone: Permission denied        │ 333    │ unresolved │
  ├─────────────────┼───────────────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-15YG │ DBusInterfaceMethodError: Invalid or not installed time zone      │ 77     │ unresolved │
  ├─────────────────┼───────────────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-1GF9 │ DBusSystemdNoSuchUnit: Unit systemd-timesyncd.service not loaded. │ 165    │ unresolved │
  ├─────────────────┼───────────────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-WD6  │ HostNotSupportedError: No timedate D-Bus connection available     │ 2,003  │ unresolved │
  └─────────────────┴───────────────────────────────────────────────────────────────────┴────────┴────────────┘

  From OSManager.load → data_disk:

  ┌────────────────┬────────────────────────────────────────────┬────────┬────────────┐
  │     Issue      │                   Error                    │ Events │   Status   │
  ├────────────────┼────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-S5W │ DBusNotConnectedError (via wrap_dbus)      │ 1,081  │ unresolved │
  ├────────────────┼────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-CPN │ DBusNotConnectedError (via connected_dbus) │ 18,155 │ unresolved │
  └────────────────┴────────────────────────────────────────────┴────────┴────────────┘

  From AddonManager.load → asyncio.gather:

  ┌──────────────────────────────┬──────────────────────────────────────────────────────────┬────────┬────────────┐
  │            Issue             │                          Error                           │ Events │   Status   │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-BZJ               │ OSError: [Errno 74] Bad message (icon.png .exists())     │ 57     │ resolved   │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-VAX               │ OSError: [Errno 74] Bad message (DOCS.md .exists())      │ 12     │ resolved   │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────┼────────┼────────────┤
  │ SUPERVISOR-1BA8              │ OSError: [Errno 74] Bad message (translations .exists()) │ 1      │ resolved   │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────┼────────┼────────────┤
  │ Multiple JobException issues │ Docker attach/install failures during addon.load()       │ varies │ unresolved │
  └──────────────────────────────┴──────────────────────────────────────────────────────────┴────────┴────────────┘

  From StoreManager.load → store/data.py:

  ┌──────────────────────────────┬─────────────────────────────────────────────────────────────┬────────┬────────┐
  │            Issue             │                            Error                            │ Events │ Status │
  ├──────────────────────────────┼─────────────────────────────────────────────────────────────┼────────┼────────┤
  │ SUPERVISOR-9X5               │ TypeError: expected string or bytes-like object, got 'dict' │ recent │ -      │
  └──────────────────────────────┴─────────────────────────────────────────────────────────────┴────────┴────────┘

Thoughts?

mdegat01 · 2026-04-08T21:14:27Z

I mean, my first thought reading the list is that we probably shouldn't be relying on setup to report the HassioError-type exceptions we want to know about. Take the errors from _adjust_system_datetime: setup is not the only thing that calls it. If those errors are something we (the Supervisor dev team) and the user need to know about, then they should be logged and reported from that method. Otherwise, when they occur while making changes on a running Supervisor from the API, neither of us will be properly informed.

So yeah, my take is that using setup for reporting should be a last resort. If HassioErrors are not being handled and reported properly in the places where they are raised, then let's fix that. setup should just make sure the ones that weren't handled (non-HassioError exceptions) get some last-resort logging and capturing.
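As a rough sketch of that split (an illustration only, with hypothetical names: `HassioError` here is a local stand-in class, and `setup_last_resort` and its parameters are not Supervisor APIs), setup would capture only exceptions that are not HassioError and assume the rest were reported where they were raised:

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


class HassioError(Exception):
    """Local stand-in for Supervisor's base exception type."""


async def setup_last_resort(tasks, capture_exception) -> None:
    """Run setup tasks; capture only exceptions nobody else could have handled."""
    for task in tasks:
        try:
            await task()
        except HassioError as err:
            # Assumption: the raising code already logged and reported this
            # error, so setup only notes that the step failed.
            _LOGGER.error("Setup step failed: %s", err)
        except Exception as err:  # unexpected, last-resort capture
            _LOGGER.critical("Fatal error during setup: %s", err, exc_info=True)
            capture_exception(err)
```

This only works if every HassioError really is logged and reported at its source, which is the precondition the comment above asks to fix first.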

We could also require each of these load methods to be jobs with the annotation; then even that handling can be dropped, since the Job decorator already takes care of it.

agners requested a review from mdegat01 April 7, 2026 18:16

agners added the bugfix A bug fix label Apr 7, 2026

home-assistant bot added the cla-signed label Apr 7, 2026

agners wants to merge 1 commit into main from improve-add-on-file-system-error-message
