Let write-side OSError propagate during backup creation #6704

Merged
agners merged 2 commits into main from improve-backup-error-handling
Apr 8, 2026
@agners
Member

@agners agners commented Apr 7, 2026

Proposed change

When a write-side OSError (e.g. ENOSPC, host down) occurs during backup creation, the outer tar file is left in a structurally corrupt state. Securetar's create_tar context manager uses a two-phase header write: it writes a placeholder header on enter and seeks back to rewrite it with the actual size on exit. If an OSError occurs mid-write, the inner tar entry has truncated data and a stale placeholder header.
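
The failure mode can be reproduced with a toy format. The sketch below is not securetar's code: `write_entry`, `read_size`, and the 8-byte size header are hypothetical stand-ins for a tar header, chosen only to show what a two-phase header write leaves behind when the body raises.

```python
import io
import struct
from contextlib import contextmanager

# Hypothetical 8-byte size field; the real tar header is larger.
PLACEHOLDER = struct.pack(">Q", 0)


@contextmanager
def write_entry(out):
    """Toy two-phase entry: placeholder on enter, real size on exit."""
    header_pos = out.tell()
    out.write(PLACEHOLDER)
    data_start = out.tell()
    yield out
    # Only reached when the body did not raise.
    end = out.tell()
    out.seek(header_pos)
    out.write(struct.pack(">Q", end - data_start))
    out.seek(end)


def read_size(buf, header_pos=0):
    """Return the size recorded in the toy header."""
    raw = buf.getvalue()[header_pos:header_pos + 8]
    return struct.unpack(">Q", raw)[0]


ok = io.BytesIO()
with write_entry(ok) as f:
    f.write(b"hello")

broken = io.BytesIO()
try:
    with write_entry(broken) as f:
        f.write(b"hel")
        raise OSError(28, "No space left on device")  # simulated ENOSPC
except OSError:
    pass
```

In `broken`, the header still holds the placeholder size while three bytes of truncated data follow it, so anything appended afterwards lands behind a header that no longer describes its entry.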

Previously, _folder_save wrapped OSError as BackupError, which store_folders then caught and swallowed -- allowing the backup to continue writing to an already corrupt tar. Similarly, _create_finalize silently swallowed OSError when writing backup.json, and the finally block in create() could raise a secondary OSError from _close_outer_tarfile that replaced the original exception. This secondary exception is what was captured in Sentry as SUPERVISOR-B53 (18k events, 470 users), SUPERVISOR-1FAJ (5.5k events, 595 users), SUPERVISOR-BJ4 (3.7k events), SUPERVISOR-18KS, and SUPERVISOR-1HE6.
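
The exception replacement is plain Python behavior and can be shown in isolation. This is a minimal sketch, not the Supervisor code: `create_with_finally` and `close_outer_tarfile` are simplified stand-ins with hypothetical bodies.

```python
from contextlib import contextmanager


def close_outer_tarfile():
    """Stand-in for a close that fails, e.g. on a full disk."""
    raise OSError(28, "No space left on device (while closing)")


@contextmanager
def create_with_finally():
    try:
        yield
    finally:
        # Runs on the error path too, so its failure wins.
        close_outer_tarfile()


def run():
    try:
        with create_with_finally():
            raise OSError(5, "Input/output error (original failure)")
    except OSError as err:
        return err


caught = run()
```

Here `caught` is the error raised while closing; the original write failure is only reachable through `caught.__context__`, which is why the reported events pointed at the close rather than at the first failing write.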

This PR introduces BackupFatalError (a BackupError subclass) for write-side I/O errors. Using a dedicated subclass rather than letting raw OSError propagate avoids the job decorator treating it as an unhandled exception (which would send extra events to Sentry and wrap it as JobException). Since BackupFatalError is a HassioError, the job decorator handles it cleanly, and _do_backup catches it via except BackupError to delete the incomplete backup file.
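
As a rough illustration of why the subclass is enough, the sketch below uses a simplified hierarchy and hypothetical function bodies (`write_chunk`, `folder_save`, and `do_backup` here are stand-ins, not the Supervisor implementations):

```python
class HassioError(Exception):
    """Stand-in for the Supervisor base exception."""


class BackupError(HassioError):
    """Stand-in for the regular, recoverable backup error."""


class BackupFatalError(BackupError):
    """Write-side I/O error: the backup cannot be completed."""


def write_chunk():
    raise OSError(28, "No space left on device")  # simulated ENOSPC


def folder_save():
    try:
        write_chunk()
    except OSError as err:
        # Wrap instead of letting the raw OSError escape.
        raise BackupFatalError(f"Can't write backup: {err}") from err


def do_backup():
    try:
        folder_save()
    except BackupError as err:
        # The incomplete backup file would be deleted here.
        return type(err).__name__
    return "ok"


result = do_backup()
```

An existing `except BackupError` handler catches the new exception without modification, and the original `OSError` stays attached as `__cause__`.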

Changes:

  • Add BackupFatalError exception for write-side I/O errors
  • In create(), use except/else instead of finally so finalization is skipped on error. On the error path, close the tar (suppressing errors) to release the file handle; the file is unlinked by the caller
  • In _create_finalize, raise BackupFatalError on OSError instead of swallowing it
  • In _folder_save and store_supervisor_config, wrap OSError as BackupFatalError instead of BackupError
  • In store_folders and store_addons, re-raise BackupFatalError instead of swallowing it like regular BackupError
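
The resulting control flow can be sketched roughly as follows. The names mirror the PR, but the bodies are hypothetical and heavily simplified; `events`, `close_tar`, `finalize`, and `save` exist only to make the flow observable.

```python
from contextlib import contextmanager, suppress


class BackupError(Exception):
    """Stand-in for the regular, recoverable backup error."""


class BackupFatalError(BackupError):
    """Stand-in for the new write-side error."""


events = []


def close_tar():
    events.append("close")
    raise OSError(28, "No space left on device")  # simulated close failure


def finalize():
    events.append("finalize")


@contextmanager
def create():
    try:
        yield
    except Exception:
        # Error path: release the handle, ignore close errors,
        # and keep the original exception.
        with suppress(Exception):
            close_tar()
        raise
    else:
        finalize()


def store_folders(folders, save):
    for folder in folders:
        try:
            save(folder)
        except BackupFatalError:
            raise  # the archive is corrupt, stop writing
        except BackupError:
            events.append(f"skipped {folder}")  # non-fatal, continue


def save(folder):
    if folder == "media":
        raise BackupError("unreadable")
    if folder == "share":
        raise BackupFatalError("disk full")
    events.append(f"saved {folder}")


try:
    with create():
        store_folders(["ssl", "media", "share", "addons"], save)
except BackupFatalError as err:
    outcome = err
```

The regular `BackupError` for `media` is skipped, the fatal error for `share` stops the loop before `addons`, finalization never runs, and the failing close does not replace the original exception.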

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

@agners agners added the bugfix A bug fix label Apr 7, 2026
@agners agners requested a review from mdegat01 April 7, 2026 13:57
Securetar's create_tar context manager uses a two-phase header write:
on enter it writes a placeholder tar header (size unknown), and on exit
_finalize_tar_entry seeks back to rewrite the header with the actual
size. If an OSError (e.g. ENOSPC) occurs mid-write, the inner tar entry
is left with truncated data and a placeholder header. Continuing to
write more entries on top of this produces a structurally invalid tar
file that cannot be restored.

Previously, _folder_save wrapped OSError as BackupError, which
store_folders then caught and swallowed — allowing the backup to
continue writing to an already corrupt outer tar. Similarly,
_create_finalize silently swallowed OSError when writing backup.json,
and the finally block in create() could raise a secondary OSError from
_close_outer_tarfile that replaced the original exception.

Securetar already distinguishes read vs write errors: read-side errors
(e.g. permission denied on a source file) are wrapped as AddFileError
(non-fatal, skip the file), while write-side OSError propagates as-is.
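
A minimal sketch of that split, assuming a simplified add-file helper. The
AddFileError class below is a local stand-in, not imported from securetar, and
add_file, classify, FullDisk and unreadable are hypothetical:

```python
import io


class AddFileError(Exception):
    """Local stand-in for securetar's read-side error."""


class FullDisk(io.BytesIO):
    """Archive stream whose writes always fail."""

    def write(self, data):
        raise OSError(28, "No space left on device")


def add_file(out, read_source):
    try:
        data = read_source()
    except OSError as err:
        # Read-side failure: only this file is affected.
        raise AddFileError(str(err)) from err
    out.write(data)  # write-side OSError propagates as-is


def unreadable():
    raise PermissionError(13, "Permission denied")


def classify(out, read_source):
    try:
        add_file(out, read_source)
    except AddFileError:
        return "skip file"
    except OSError:
        return "abort backup"
    return "ok"


results = [
    classify(io.BytesIO(), lambda: b"data"),
    classify(io.BytesIO(), unreadable),
    classify(FullDisk(), lambda: b"data"),
]
```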

With this change, write-side OSError is wrapped as BackupFatalError
(a BackupError subclass) instead of plain BackupError. This ensures:
- store_folders/store_addons do not swallow it (they only catch
  BackupError, and re-raise BackupFatalError explicitly).
- The job decorator handles it as a HassioError (no extra Sentry
  event). Letting OSError bubble up raw would cause the job decorator
  to treat it as an unhandled exception, capturing it to Sentry and
  wrapping it as JobException — producing more Sentry noise, not less.
- _do_backup catches it via `except BackupError` and deletes the
  incomplete backup file. This is the correct behavior since the tar
  is structurally corrupt and not restorable.

Changes:
- Add BackupFatalError exception for write-side I/O errors.
- In create(), use except/else instead of finally so that finalization
  is skipped when an error already occurred during yield. This prevents
  a secondary exception from _close_outer_tarfile replacing the
  original error.
- In _create_finalize, raise BackupFatalError on OSError instead of
  swallowing it.
- In _folder_save, wrap OSError as BackupFatalError (not BackupError).
- In store_folders, re-raise BackupFatalError instead of swallowing.
- In store_supervisor_config, wrap OSError as BackupFatalError.

Fixes SUPERVISOR-B53
Fixes SUPERVISOR-1FAJ
Fixes SUPERVISOR-BJ4
Fixes SUPERVISOR-18KS
Fixes SUPERVISOR-1HE6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@agners agners force-pushed the improve-backup-error-handling branch from 64c52c7 to 3c796ad Compare April 7, 2026 14:00
# Close may fail (e.g. ENOSPC writing end-of-archive
# markers), but tarfile's finally ensures the file handle
# is released regardless. The file is unlinked by the caller.
with suppress(Exception):
Contributor

The fact that this doesn't cause a linter error requiring an ignore comment seems like something we need to fix 😆

     try:
         yield
    -finally:
    +except Exception:
Contributor

Wait we definitely have a too-broad-except rule or something like that, I've had to disable it before. Is something wrong with our linter? This should require a disable rule comment to pass ci...

Member Author

Hm, just checked, so it seems that when the pattern is:

    except Exception:
        ...
        raise

The rule doesn't apply. Which kinda makes sense.

Contributor

@mdegat01 mdegat01 left a comment

Makes sense, good catch. Nitpick on the exception name but looks fine.

"""Raise if the backup file already exists."""


class BackupFatalError(BackupError):
Contributor

Maybe BackupFatalIOError? The name seems a bit generic compared to the docstring, might get accidentally re-used for unrelated things in the future.

@agners agners requested a review from mdegat01 April 7, 2026 17:04
@agners agners merged commit 1fcfede into main Apr 8, 2026
21 checks passed
@agners agners deleted the improve-backup-error-handling branch April 8, 2026 14:45
@github-actions github-actions bot locked and limited conversation to collaborators Apr 10, 2026

2 participants