barman-cli-cloud 3.19.x: backup.info double-write fails against retention-protected destinations (regression from 3.18.0) #1195

@rtumaykin

Searched first

I searched the issue tracker for retention, WORM, object lock, backup.info, immutability, RetentionPolicy, and BAR-1235 and could not find an existing report for this regression on the upload path. The closest adjacent items I found:

If a duplicate already exists and I missed it, please point me at it and feel free to close this one.

Summary

Since 3.19.0, every cloud backup writes backup.info to the destination twice: once right after _start_backup() with status=STARTED, and again in the finally: block with status=DONE (or FAILED). On any object store that enforces a retention / immutability window on objects, the second PUT is an intentional overwrite of an object that is still within that window, so the request is rejected and every backup ends in FAILED.

3.18.0 wrote backup.info exactly once and works fine against the same destinations.

The double-write lives in the provider-agnostic base class (CloudBackup.coordinate_backup in barman/cloud.py), so the regression applies to all providers (S3, GCS, Azure). We observed it directly on GCS. By source inspection, the same code path is hit on S3 with Object Lock or a bucket default retention, and on Azure Blob with a time-based immutability policy or legal hold. See the Provider impact section below.

Affected versions

  • Confirmed broken: barman-cli-cloud 3.19.1-1.pgdg22.04+1
  • Confirmed working: barman-cli-cloud 3.18.0-2.pgdg22.04+2
  • Likely also broken: 3.19.0 (the START-write commit landed for that release)
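
For anyone who wants to gate automation on this boundary, here is a minimal sketch of a version check. It assumes the package version string starts with the upstream MAJOR.MINOR.PATCH (no Debian epoch) and that every release from 3.19.0 onward is affected until a fix ships. The function names are hypothetical and not part of barman.

```python
import re

# First release assumed to contain the early backup.info upload.
FIRST_AFFECTED = (3, 19, 0)


def upstream_version(package_version):
    """Extract (major, minor, patch) from a string like '3.19.1-1.pgdg22.04+1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognised version: {package_version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(package_version):
    """Return True when the upstream version is at or above FIRST_AFFECTED."""
    return upstream_version(package_version) >= FIRST_AFFECTED


print(is_affected("3.19.1-1.pgdg22.04+1"))  # True
print(is_affected("3.18.0-2.pgdg22.04+2"))  # False
```

There is deliberately no upper bound, since no fixed release is known at the time of writing.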

Environment of the observed failure

  • Ubuntu 22.04.5 LTS, Python 3.10.12
  • Cloud provider: --cloud-provider=google-cloud-storage
  • Bucket: standard GCS bucket with a bucket-retention policy of 90 days (retentionPeriod: 7776000)
  • Authentication: GCP service account JSON via GOOGLE_APPLICATION_CREDENTIALS

Reproducer (GCS — what we directly observed)

# any GCS bucket with an active retention policy
gcloud storage buckets update gs://repro-bucket --retention-period=1d

GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json \
  barman-cloud-backup \
    --cloud-provider=google-cloud-storage \
    --lz4 -J 4 -S 5GB --immediate-checkpoint \
    -n test-$(date -u +%Y%m%dT%H%M%S) \
    gs://repro-bucket/barman/ <server-name>

The basebackup itself completes; the final write of backup.info in the finally: block fails:

ERROR: 403 PUT https://storage.googleapis.com/upload/storage/v1/b/<bucket>/o
  ...base/<backup-id>/backup.info...
  {
    "code": 403,
    "message": "Object '<bucket>/...base/<backup-id>/backup.info' is subject
                to bucket's retention policy or object retention and cannot
                be deleted or overwritten until <retention-end>",
    "errors": [{
      "message": "...",
      "domain": "global",
      "reason": "retentionPolicyNotMet"
    }]
  }
ERROR: Backup failed uploading backup.info file (...)
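
To tell this failure apart from an ordinary permission problem when scanning logs, the JSON body can be checked for the retentionPolicyNotMet reason. This is a rough sketch with a hypothetical helper name. It accepts the fields either at the top level, as in the excerpt above, or nested under an "error" key, which is an assumption about how the raw API response may be wrapped.

```python
import json


def is_retention_rejection(error_body):
    """Return True if a JSON error body reports a 403 retention-policy rejection."""
    try:
        payload = json.loads(error_body)
    except (TypeError, ValueError):
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: the fields may be wrapped in an "error" object or sit at top level.
    error = payload.get("error", payload)
    if not isinstance(error, dict):
        return False
    errors = error.get("errors")
    if not isinstance(errors, list):
        return False
    return error.get("code") == 403 and any(
        isinstance(item, dict) and item.get("reason") == "retentionPolicyNotMet"
        for item in errors
    )


sample = json.dumps({
    "code": 403,
    "message": "...",
    "errors": [{"message": "...", "domain": "global",
                "reason": "retentionPolicyNotMet"}],
})
print(is_retention_rejection(sample))  # True
```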

Expected reproducers on other providers

We haven't run these ourselves, but the same code path should produce the same shape of failure on any object-retention configuration:

  • S3 with Object Lock (Compliance mode, or Governance mode without x-amz-bypass-governance-retention: true), with either a bucket default Object Lock retention setting or per-object retention. Expected response: 403 AccessDenied with reason indicating a retention violation on the second PUT of backup.info.
  • Azure Blob with a time-based immutability policy or a legal hold on the container, where upload_blob(..., overwrite=True) is rejected. Azure's response is typically 409 BlobImmutableDueToPolicy / 409 BlobHasImmutabilityPolicy.

Root cause

Introduced in commit d88d385f ("Upload backup.info marked as STARTED at the start of a cloud backup", referencing BAR-1235) and finalized for the 3.19.0 release. The relevant block in barman/cloud.py (CloudBackup.coordinate_backup):

self._start_backup()

# Mark as STARTED and upload backup.info so the backup is visible
# in the catalog immediately.  The finally block will overwrite
# this with the final status (DONE or FAILED).
self.backup_info.set_attribute("status", BackupInfo.STARTED)
self._upload_backup_info()    # ← 1st PUT

This is paired with the existing call in the finally: block:

finally:
    ...
    self._upload_backup_info()    # ← 2nd PUT, intentional overwrite

coordinate_backup is provider-agnostic (defined on CloudBackup), so this affects every backend.
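
The interaction can be modelled without any cloud account. The sketch below is a toy simulation, not barman code: RetentionBucket and run_backup are hypothetical names, the bucket treats every existing object as still under retention, and the control flow only loosely mirrors coordinate_backup.

```python
class RetentionError(Exception):
    """Raised when a write would replace an object still under retention."""


class RetentionBucket:
    """Toy in-memory stand-in for a bucket with an active retention policy."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        # Every existing object is treated as still inside its retention window.
        if key in self.objects:
            raise RetentionError(f"{key} cannot be overwritten yet")
        self.objects[key] = data


def run_backup(bucket, early_upload):
    """Mimic the upload sequence and return the status the run ends with."""
    status = "STARTED"
    try:
        if early_upload:
            bucket.put("backup.info", status)  # 1st PUT (3.19.x behaviour)
        bucket.put("data.tar", "...")  # base backup data
        status = "DONE"
    finally:
        try:
            bucket.put("backup.info", status)  # final PUT
        except RetentionError:
            status = "FAILED"
    return status


print(run_backup(RetentionBucket(), early_upload=False))  # DONE
print(run_backup(RetentionBucket(), early_upload=True))   # FAILED
```

With early_upload=True the stored backup.info stays at STARTED, which matches what we see in the bucket after a failed run.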

Provider impact (per source inspection)

In none of the providers does _upload_backup_info() go through a path that would honor an "if-not-exists" or precondition contract, because coordinate_backup() doesn't pass any such hint:

  • GCS (barman/cloud_providers/google_cloud_storage.py): upload_fileobj issues a plain blob.upload_from_file(fileobj) with no preconditions. fail_if_exists raises NotImplementedError. A # TODO: implement a mechanism to avoid overrides comment is already in the code.
  • Azure (barman/cloud_providers/azure_blob_storage.py): upload_fileobj calls container_client.upload_blob(..., overwrite=True) — overwrite is explicit. fail_if_exists raises NotImplementedError.
  • S3 (barman/cloud_providers/aws_s3.py): fail_if_exists IS implemented (delegates to _put_object), but coordinate_backup() doesn't pass it, so the default path is still an unconditioned s3.meta.client.upload_fileobj(...) and Object Lock will reject the second write.

Notably, the same 3.19.0 release introduced aws_check_object_lock (BAR-1113) to make the delete path WORM-aware on S3. The new STARTED write from BAR-1235 opens the equivalent gap on the write path, and it applies to all three providers, not just S3.
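
For reference, this is roughly what an "if-not-exists" precondition looks like on the GCS side. The sketch assumes a blob object that behaves like google.cloud.storage.Blob, whose upload_from_file accepts if_generation_match. FakeBlob and PreconditionFailed are hypothetical stand-ins so the snippet runs without the client library, and upload_if_absent is not an existing barman function. A precondition like this only makes the overwrite explicit; it does not by itself resolve the double-write.

```python
import io


class PreconditionFailed(Exception):
    """Stand-in for the precondition error a real client would raise."""


class FakeBlob:
    """Hypothetical in-memory blob that mimics the if_generation_match=0 check."""

    def __init__(self):
        self.data = None

    def upload_from_file(self, fileobj, if_generation_match=None):
        # Generation 0 means "only write if no live object exists".
        if if_generation_match == 0 and self.data is not None:
            raise PreconditionFailed("object already exists")
        self.data = fileobj.read()


def upload_if_absent(blob, fileobj):
    """Upload only when nothing exists yet at the blob's name."""
    blob.upload_from_file(fileobj, if_generation_match=0)


blob = FakeBlob()
upload_if_absent(blob, io.BytesIO(b"status=STARTED"))
try:
    upload_if_absent(blob, io.BytesIO(b"status=DONE"))
except PreconditionFailed as exc:
    print("second write refused:", exc)
```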

Impact

Any user running barman-cli-cloud ≥ 3.19.0 against a destination with any active object-retention / immutability protection — bucket-wide or per-object — will see every backup report FAILED after the basebackup data is uploaded. The data side of the backup is on the object store, but no backup.info is finalized, so the catalog never marks the backup DONE, and barman-cloud-restore and friends won't see the backup as restorable.

Workaround we're using

Pinning barman-cli-cloud and python3-barman at 3.18.0-2.pgdg22.04+2 via apt-mark hold until a fix lands. Posting in case it helps anyone else running into the same on GCS, S3 with Object Lock, or Azure Blob with immutability policies.


Thanks for your time looking at this. Happy to provide additional diagnostics, gather logs at a different verbosity, or run targeted repros — just say what would be most useful.