Let write-side OSError propagate during backup creation#6704
Conversation
Securetar's create_tar context manager uses a two-phase header write: on enter it writes a placeholder tar header (size unknown), and on exit _finalize_tar_entry seeks back to rewrite the header with the actual size. If an OSError (e.g. ENOSPC) occurs mid-write, the inner tar entry is left with truncated data and a placeholder header. Continuing to write more entries on top of this produces a structurally invalid tar file that cannot be restored. Previously, _folder_save wrapped OSError as BackupError, which store_folders then caught and swallowed — allowing the backup to continue writing to an already corrupt outer tar. Similarly, _create_finalize silently swallowed OSError when writing backup.json, and the finally block in create() could raise a secondary OSError from _close_outer_tarfile that replaced the original exception. Securetar already distinguishes read vs write errors: read-side errors (e.g. permission denied on a source file) are wrapped as AddFileError (non-fatal, skip the file), while write-side OSError propagates as-is. With this change, write-side OSError is wrapped as BackupFatalError (a BackupError subclass) instead of plain BackupError. This ensures: - store_folders/store_addons do not swallow it (they only catch BackupError, and re-raise BackupFatalError explicitly). - The job decorator handles it as a HassioError (no extra Sentry event). Letting OSError bubble up raw would cause the job decorator to treat it as an unhandled exception, capturing it to Sentry and wrapping it as JobException — producing more Sentry noise, not less. - _do_backup catches it via `except BackupError` and deletes the incomplete backup file. This is the correct behavior since the tar is structurally corrupt and not restorable. Changes: - Add BackupFatalError exception for write-side I/O errors. - In create(), use except/else instead of finally so that finalization is skipped when an error already occurred during yield. This prevents a secondary exception from _close_outer_tarfile replacing the original error. - In _create_finalize, raise BackupFatalError on OSError instead of swallowing it. - In _folder_save, wrap OSError as BackupFatalError (not BackupError). - In store_folders, re-raise BackupFatalError instead of swallowing. - In store_supervisor_config, wrap OSError as BackupFatalError. Fixes SUPERVISOR-B53 Fixes SUPERVISOR-1FAJ Fixes SUPERVISOR-BJ4 Fixes SUPERVISOR-18KS Fixes SUPERVISOR-1HE6 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
64c52c7 to
3c796ad
Compare
| # Close may fail (e.g. ENOSPC writing end-of-archive | ||
| # markers), but tarfile's finally ensures the file handle | ||
| # is released regardless. The file is unlinked by the caller. | ||
| with suppress(Exception): |
There was a problem hiding this comment.
The fact that this doesn't cause a linter error requiring an ignore comment seems like something we need to fix 😆
| try: | ||
| yield | ||
| finally: | ||
| except Exception: |
There was a problem hiding this comment.
Wait we definitely have a too-broad-except rule or something like that, I've had to disable it before. Is something wrong with our linter? This should require a disable rule comment to pass ci...
There was a problem hiding this comment.
Hm, just checked, so it seems that when the pattern is:
except Exception:
...
raiseThe rule doesn't apply. Which kinda make sense.
mdegat01
left a comment
There was a problem hiding this comment.
Makes sense, good catch. Nitpick on the exception name but looks fine.
supervisor/exceptions.py
Outdated
| """Raise if the backup file already exists.""" | ||
|
|
||
|
|
||
| class BackupFatalError(BackupError): |
There was a problem hiding this comment.
Maybe BackupFatalIOError? The name seems a bit generic compared to the docstring, might get accidentally re-used for unrelated things in the future.
Proposed change
When a write-side
OSError(e.g. ENOSPC, host down) occurs during backup creation, the outer tar file is left in a structurally corrupt state. Securetar'screate_tarcontext manager uses a two-phase header write: it writes a placeholder header on enter and seeks back to rewrite it with the actual size on exit. If anOSErroroccurs mid-write, the inner tar entry has truncated data and a stale placeholder header.Previously,
_folder_savewrappedOSErrorasBackupError, whichstore_foldersthen caught and swallowed -- allowing the backup to continue writing to an already corrupt tar. Similarly,_create_finalizesilently swallowedOSErrorwhen writingbackup.json, and thefinallyblock increate()could raise a secondaryOSErrorfrom_close_outer_tarfilethat replaced the original exception. This secondary exception is what was captured in Sentry as SUPERVISOR-B53 (18k events, 470 users), SUPERVISOR-1FAJ (5.5k events, 595 users), SUPERVISOR-BJ4 (3.7k events), SUPERVISOR-18KS, and SUPERVISOR-1HE6.This PR introduces
BackupFatalError(aBackupErrorsubclass) for write-side I/O errors. Using a dedicated subclass rather than letting rawOSErrorpropagate avoids the job decorator treating it as an unhandled exception (which would send extra events to Sentry and wrap it asJobException). SinceBackupFatalErroris aHassioError, the job decorator handles it cleanly, and_do_backupcatches it viaexcept BackupErrorto delete the incomplete backup file.Changes:
BackupFatalErrorexception for write-side I/O errorscreate(), use except/else instead of finally so finalization is skipped on error. On the error path, close the tar suppressing errors) to release the file handle; the file is unlinked by the caller_create_finalize, raiseBackupFatalErroronOSErrorinstead of swallowing it_folder_saveandstore_supervisor_config, wrapOSErrorasBackupFatalErrorinstead ofBackupErrorstore_foldersandstore_addons, re-raiseBackupFatalErrorinstead of swallowing it like regularBackupErrorType of change
Additional information
Checklist
ruff format supervisor tests)If API endpoints or add-on configuration are added/changed: