Handle add-on filesystem errors gracefully and reduce Sentry noise#6707
Add AddonFileReadError for add-on metadata read failures (long_description, refresh_path_cache) caused by filesystem errors like EBADMSG (errno 74). The new exception calls check_oserror() to mark the system unhealthy via the resolution system, then raises a translatable API error so callers get a proper error response instead of an unhandled OSError. Fixes SUPERVISOR-BC6 (548K events from the API path) and SUPERVISOR-BZJ (from the startup/load path).

In core.py setup(), skip reporting exceptions to Sentry when the error has already been handled by the resolution system. This is detected by checking whether a new unhealthy reason was added during task execution (e.g. via check_oserror). In that case the user is already notified, so we log at error level (no stack trace) instead of critical (which would also send to Sentry via the LoggingIntegration) and skip the explicit capture_exception call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR got a bit out of hand: I simply wanted to introduce an API error for those read cases. But since the same code is also used during I was considering simply handling the new The solution I chose now is also a bit hacky, so not sure if it's really better. Simply consider all Thoughts?
I mean my first thought reading the list is we probably shouldn't be relying on So yea my take would be using setup for reporting should be a last resort. If We could also require each of these load methods be jobs with the annotation and then even that handling can be dropped since the
Proposed change
Handle `OSError` (e.g. errno 74 / EBADMSG) in add-on metadata reads (`long_description`, `refresh_path_cache`) gracefully instead of letting them bubble up as unhandled exceptions. A new translatable `AddonFileReadError` is raised after calling `check_oserror()` to mark the system unhealthy, giving API consumers a proper error response.

Additionally, in `core.py` `setup()`, skip Sentry reporting when the resolution system has already handled the error (detected by checking if a new unhealthy reason was added during task execution). This avoids flooding Sentry with filesystem corruption errors that aren't actionable for developers -- the user is already notified via the resolution system. The log level is also lowered from `critical` (which triggers Sentry via `LoggingIntegration`) to `error` without stack trace in that case.

SUPERVISOR-BC6 alone accounts for 548K Sentry events from a single user with a corrupt filesystem.
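As a rough illustration of the Sentry-skipping check in `setup()`, the sketch below snapshots the unhealthy reasons before a task runs and compares afterwards. The `run_setup_task()` helper, the `Resolution` stand-in, and the injected `capture_exception` callable are all hypothetical names for this example, not the actual `core.py` code.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

_LOGGER = logging.getLogger(__name__)


class Resolution:
    """Hypothetical stand-in holding the list of unhealthy reasons."""

    def __init__(self) -> None:
        self.unhealthy: list[str] = []


async def run_setup_task(
    task: Callable[[], Awaitable[None]],
    resolution: Resolution,
    capture_exception: Callable[[BaseException], object],
) -> None:
    """Run one setup task and report failures only if nothing handled them."""
    # Snapshot the reasons so we can tell whether the task added a new one.
    before = set(resolution.unhealthy)
    try:
        await task()
    except Exception as err:
        if set(resolution.unhealthy) - before:
            # Already handled by the resolution system: the user is notified,
            # so log at error level without a stack trace and skip Sentry.
            _LOGGER.error("Setup task failed: %s", err)
            return
        # Unhandled: log at critical with the traceback and report explicitly.
        _LOGGER.critical("Fatal error during setup", exc_info=True)
        capture_exception(err)
```

Comparing against a snapshot rather than checking whether the system is unhealthy at all means a reason that was already present before the task started does not suppress reporting of an unrelated failure.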