Story Form
As the Cloud Storage Archive plugin
I want to register a temp archive's block range with ApplicationStateFacility (ASF) only after doCompleteMultipartUpload has succeeded and the companion .meta object has been written
So that a failed or aborted multipart upload never leaves ghost stored-block ranges in ASF that prevent backfill from re-delivering the missing blocks and closing the temp-archive coverage gap
Technical Notes
Background
TempArchiveUploadTask sends PersistedNotification(success=true) and calls addStoredBlockRange after each successfully flushed S3 part, before the multipart upload is complete. If a later part or completeMultipartUpload fails, abortQuietly removes all uploaded data from S3, but the optimistic ASF registrations are not retracted.
The gap this leaves in tempArchiveTracker cannot be closed in-process: StartupRecoveryTask only rebuilds the tracker from .meta files, and .meta is written only after a successful doCompleteMultipartUpload, so an aborted upload leaves nothing for recovery to find. The only path to eventual consolidation is for backfill to re-deliver the missing blocks into a fresh TempArchiveUploadTask. But backfill skips blocks that ASF already reports as stored, so the ghost ranges block re-delivery and the group stays unconsolidated until the node restarts and ASF state is rebuilt from scratch.
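The failure mode can be reproduced in a minimal sketch. Everything here is a hypothetical stand-in rather than the plugin's code: Range stands in for LongRange, StoredRanges for the stored-range view that ASF exposes, and blocksToBackfill for the backfill filter, which is assumed to skip any block reported as stored.

```java
import java.util.ArrayList;
import java.util.List;

class GhostRangeSketch {
    /** Inclusive block range; hypothetical stand-in for LongRange. */
    static final class Range {
        final long start;
        final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
        boolean contains(long block) { return block >= start && block <= end; }
    }

    /** Hypothetical stand-in for the stored-range view that ASF exposes. */
    static final class StoredRanges {
        private final List<Range> ranges = new ArrayList<>();
        void addStoredBlockRange(Range r) { ranges.add(r); }
        boolean isStored(long block) {
            for (Range r : ranges) {
                if (r.contains(block)) return true;
            }
            return false;
        }
    }

    /** Assumed backfill rule: re-deliver only blocks not reported as stored. */
    static List<Long> blocksToBackfill(StoredRanges asf, long first, long last) {
        List<Long> missing = new ArrayList<>();
        for (long b = first; b <= last; b++) {
            if (!asf.isStored(b)) missing.add(b);
        }
        return missing;
    }

    public static void main(String[] args) {
        StoredRanges asf = new StoredRanges();
        // Two parts flush and each one registers its range optimistically.
        asf.addStoredBlockRange(new Range(100, 149));
        asf.addStoredBlockRange(new Range(150, 199));
        // The third part fails and the upload is aborted: S3 now holds nothing
        // for blocks 100-299, but the two registrations above are still in place.
        List<Long> redelivered = blocksToBackfill(asf, 100, 299);
        System.out.println("first re-delivered block: " + redelivered.get(0));
        System.out.println("re-delivered count: " + redelivered.size());
    }
}
```

In this run only blocks 200-299 are re-delivered. Blocks 100-199 are reported as stored even though no object holds them, which is the coverage gap described above.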
Notes
- Remove the per-part addStoredBlockRange calls inside the while loop and the final-buffer block in TempArchiveUploadTask.
- Add a single addStoredBlockRange(new LongRange(firstBlock, lastBlock)) call after doCompleteMultipartUpload and doUploadTextFile (.meta) both succeed, matching the point at which the block range becomes durable and discoverable by StartupRecoveryTask.
- The PersistedNotification(success=true) sends can similarly be deferred to a single notification at the same point, or kept per-part as an optimistic hint to downstream consumers (same trade-off as BlockUploadTask).
- On the failure path, only the PersistedNotification(success=false) for the last-attempted range needs to remain; no ASF cleanup is required because addStoredBlockRange was never called.
- Update TempArchiveUploadTaskTest to assert that no addStoredBlockRange call is made before doCompleteMultipartUpload returns successfully, and that a single call covering [firstBlock, lastBlock] is made on success.
- In StartupRecoveryTask.recoverFromCompletedObjects(), instead of only finding the last key (findLastKey), enumerate all tar objects under the configured prefix and collect every group's block range.
- In completeRecoveryIfReady(), replace the single addStoredBlockRange(new LongRange(0, nextBlockToQueue - 1)) call with per-archive calls, one addStoredBlockRange per entry in completedRanges.
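The upload-side change in the notes above might look roughly like the following sketch. It is not the actual TempArchiveUploadTask: Uploader, RecordingAsf, Range, upload, and failingAtPart are hypothetical names introduced for illustration, and the real method signatures are assumed rather than known. The point it shows is that registration happens once, after both the archive and its .meta exist, so the failure path has nothing to retract.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredRegistrationSketch {
    /** Inclusive block range; hypothetical stand-in for LongRange. */
    static final class Range {
        final long start;
        final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
    }

    /** Hypothetical slice of the storage operations the task performs. */
    interface Uploader {
        void uploadPart(int partNumber) throws Exception;
        void doCompleteMultipartUpload() throws Exception;
        void doUploadTextFile() throws Exception; // writes the companion .meta
        void abortQuietly();
    }

    /** Hypothetical slice of ASF that records every registration it receives. */
    static final class RecordingAsf {
        final List<Range> ranges = new ArrayList<>();
        void addStoredBlockRange(Range r) { ranges.add(r); }
    }

    /** Uploads all parts, then registers [firstBlock, lastBlock] once. Returns true on success. */
    static boolean upload(Uploader s3, RecordingAsf asf, int parts, long firstBlock, long lastBlock) {
        try {
            for (int p = 1; p <= parts; p++) {
                s3.uploadPart(p); // no per-part addStoredBlockRange
            }
            s3.doCompleteMultipartUpload();
            s3.doUploadTextFile();
            // Archive and .meta both exist: the range is durable and recoverable.
            asf.addStoredBlockRange(new Range(firstBlock, lastBlock));
            return true;
        } catch (Exception e) {
            s3.abortQuietly();
            // Nothing was registered, so no ASF cleanup is needed. The real task
            // would send PersistedNotification(success=false) at this point.
            return false;
        }
    }

    /** Test double whose uploadPart fails at the given part number (0 means never). */
    static Uploader failingAtPart(final int failAt) {
        return new Uploader() {
            public void uploadPart(int n) throws Exception {
                if (n == failAt) throw new Exception("part " + n + " failed");
            }
            public void doCompleteMultipartUpload() {}
            public void doUploadTextFile() {}
            public void abortQuietly() {}
        };
    }

    public static void main(String[] args) {
        RecordingAsf ok = new RecordingAsf();
        upload(failingAtPart(0), ok, 3, 100, 299);
        System.out.println("registrations after success: " + ok.ranges.size());

        RecordingAsf failed = new RecordingAsf();
        upload(failingAtPart(3), failed, 3, 100, 299);
        System.out.println("registrations after failure: " + failed.ranges.size());
    }
}
```

The two cases in main mirror the assertions proposed for TempArchiveUploadTaskTest: one registration covering [firstBlock, lastBlock] on success, and none when a part fails.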