barman-cli-cloud 3.19.x: backup.info double-write fails against retention-protected destinations (regression from 3.18.0)
Searched first
I searched the issue tracker for retention, WORM, object lock, backup.info, immutability, RetentionPolicy, and BAR-1235 and could not find an existing report for this regression on the upload path. The closest adjacent items I found:
If a duplicate already exists and I missed it, please point me at it and feel free to close this one.
Summary
Since 3.19.0, every cloud backup writes backup.info to the destination twice: once right after _start_backup() with status=STARTED, and again in the finally: block with status=DONE (or FAILED). On any object store that enforces a retention / immutability window, the second PUT is an intentional overwrite of an object still inside that window, so the store rejects the request and every backup ends in FAILED.
3.18.0 wrote backup.info exactly once and works fine against the same destinations.
The double-write lives in the provider-agnostic base class (CloudBackup.coordinate_backup in barman/cloud.py), so the regression applies to all providers (S3, GCS, Azure). We observed it directly on GCS; by source inspection, the same code path is hit on S3 with Object Lock or a bucket default retention, and on Azure Blob with a time-based immutability policy or legal hold. See the Provider impact section below.
Affected versions
- Confirmed broken: barman-cli-cloud 3.19.1-1.pgdg22.04+1
- Confirmed working: barman-cli-cloud 3.18.0-2.pgdg22.04+2
- Likely also broken: 3.19.0 (the commit that added the STARTED write landed in that release)
Environment of the observed failure
- Ubuntu 22.04.5 LTS, Python 3.10.12
- Cloud provider: --cloud-provider=google-cloud-storage
- Bucket: standard GCS bucket with a bucket-retention policy of 90 days (retentionPeriod: 7776000)
- Authentication: GCP service account JSON via GOOGLE_APPLICATION_CREDENTIALS
Reproducer (GCS — what we directly observed)
# any GCS bucket with an active retention policy
gcloud storage buckets update gs://repro-bucket --retention-period=1d
GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json \
barman-cloud-backup \
--cloud-provider=google-cloud-storage \
--lz4 -J 4 -S 5GB --immediate-checkpoint \
-n test-$(date -u +%Y%m%dT%H%M%S) \
gs://repro-bucket/barman/ <server-name>
The basebackup itself completes; the final write of backup.info in the finally: block fails:
ERROR: 403 PUT https://storage.googleapis.com/upload/storage/v1/b/<bucket>/o
...base/<backup-id>/backup.info...
{
"code": 403,
"message": "Object '<bucket>/...base/<backup-id>/backup.info' is subject
to bucket's retention policy or object retention and cannot
be deleted or overwritten until <retention-end>",
"errors": [{
"message": "...",
"domain": "global",
"reason": "retentionPolicyNotMet"
}]
}
ERROR: Backup failed uploading backup.info file (...)
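If it helps with triage, a small check like the one below can tell a retention rejection apart from other 403s when scanning logs. It is only a sketch: it assumes the error body carries the fields shown above (code and errors[].reason), is_retention_rejection is a hypothetical helper that exists in neither barman nor the google-cloud-storage client, and accepting a top-level "error" wrapper around those fields is an assumption on my part.

```python
import json

# Reason string taken from the GCS error body quoted above.
RETENTION_REASON = "retentionPolicyNotMet"


def is_retention_rejection(error_body):
    """Return True if a JSON error body looks like a retention-policy 403.

    Hypothetical helper for log triage; not part of barman.
    """
    try:
        payload = json.loads(error_body)
    except (TypeError, ValueError):
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: the fields may sit at the top level (as quoted above)
    # or inside a top-level "error" object. Accept both shapes.
    error = payload.get("error", payload)
    if not isinstance(error, dict) or error.get("code") != 403:
        return False
    return any(
        isinstance(item, dict) and item.get("reason") == RETENTION_REASON
        for item in (error.get("errors") or [])
    )
```

Any other 403 (for example a plain permissions problem) returns False, which keeps this failure separate from credential issues.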
Expected reproducers on other providers
We haven't run these ourselves, but the same code path should produce the same shape of failure on any object-retention configuration:
- S3 with Object Lock (Compliance mode, or Governance mode without x-amz-bypass-governance-retention: true), with either a bucket default Object Lock retention setting or per-object retention. Expected response: 403 AccessDenied with a reason indicating a retention violation on the second PUT of backup.info.
- Azure Blob with a time-based immutability policy or a legal hold on the container, where upload_blob(..., overwrite=True) is rejected. Azure's response is typically 409 BlobImmutableDueToPolicy / 409 BlobHasImmutabilityPolicy.
Root cause
Introduced in commit d88d385f ("Upload backup.info marked as STARTED at the start of a cloud backup", referencing BAR-1235) and finalized for the 3.19.0 release. The relevant block in barman/cloud.py (CloudBackup.coordinate_backup):
self._start_backup()
# Mark as STARTED and upload backup.info so the backup is visible
# in the catalog immediately. The finally block will overwrite
# this with the final status (DONE or FAILED).
self.backup_info.set_attribute("status", BackupInfo.STARTED)
self._upload_backup_info() # ← 1st PUT
paired with the existing call in the finally: block:
finally:
    ...
    self._upload_backup_info() # ← 2nd PUT, intentional overwrite
coordinate_backup is provider-agnostic (defined on CloudBackup), so this affects every backend.
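For anyone who wants to see the failure shape without a real bucket, here is a minimal in-memory sketch. RetentionStore, RetentionError, and run_backup are hypothetical names made up for illustration; none of them exist in barman or in any provider SDK, and the store models only one rule: no overwrite inside the retention window.

```python
import time


class RetentionError(Exception):
    """Hypothetical stand-in for a provider's retention rejection."""


class RetentionStore:
    """Toy object store that refuses overwrites inside a retention window."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.objects = {}  # key -> (payload, created_at)

    def put(self, key, payload):
        now = time.time()
        if key in self.objects:
            _, created_at = self.objects[key]
            if now - created_at < self.retention_seconds:
                raise RetentionError(
                    f"{key} is under retention and cannot be overwritten"
                )
        self.objects[key] = (payload, now)


def run_backup(store, backup_id, writes_started_marker):
    """Mimic the single-write (3.18.0) and double-write (3.19.x) flows."""
    key = f"base/{backup_id}/backup.info"
    status = "DONE"
    try:
        if writes_started_marker:
            store.put(key, "status=STARTED")  # 1st PUT, 3.19.x only
        # ... the basebackup data upload would happen here ...
    finally:
        try:
            store.put(key, f"status={status}")  # final PUT
        except RetentionError:
            status = "FAILED"
    return status


store = RetentionStore(retention_seconds=86400)
single = run_backup(store, "b1", writes_started_marker=False)
double = run_backup(store, "b2", writes_started_marker=True)
print(single, double)
```

In the double-write case the object left behind still says status=STARTED, which matches what we see on the bucket after a failed run.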
Provider impact (per source inspection)
None of the providers call _upload_backup_info() through a path that would honor an "if-not-exists" or precondition contract, because coordinate_backup() doesn't pass any such hint:
- GCS (barman/cloud_providers/google_cloud_storage.py): upload_fileobj issues a plain blob.upload_from_file(fileobj) with no preconditions. fail_if_exists raises NotImplementedError. A # TODO: implement a mechanism to avoid overrides comment is already in the code.
- Azure (barman/cloud_providers/azure_blob_storage.py): upload_fileobj calls container_client.upload_blob(..., overwrite=True) — overwrite is explicit. fail_if_exists raises NotImplementedError.
- S3 (barman/cloud_providers/aws_s3.py): fail_if_exists IS implemented (delegates to _put_object), but coordinate_backup() doesn't pass it, so the default path is still an unconditioned s3.meta.client.upload_fileobj(...) and Object Lock will reject the second write.
Notably, the same 3.19.0 release introduced aws_check_object_lock (BAR-1113) to make the delete path WORM-aware on S3. The new STARTED write from BAR-1235 is the equivalent gap on the write path, and it applies to all three providers, not just S3.
Impact
Any user running barman-cli-cloud ≥ 3.19.0 against a destination with any active object-retention / immutability protection — bucket-wide or per-object — will see every backup report FAILED after the basebackup data is uploaded. The data side of the backup is on the object store, but no backup.info is finalized, so the catalog never marks the backup DONE, and barman-cloud-restore and friends won't see the backup as restorable.
Workaround we're using
We are pinning barman-cli-cloud and python3-barman at 3.18.0-2.pgdg22.04+2 via apt-mark hold until a fix lands. Posting in case it helps anyone else running into the same failure on GCS, S3 with Object Lock, or Azure Blob with immutability policies.
Thanks for your time looking at this. Happy to provide additional diagnostics, gather logs at a different verbosity, or run targeted repros — just say what would be most useful.