Defer ASF stored-block registration in TempArchiveUploadTask until multipart upload completes #3041

@ivannov

Description

Story Form

As the Cloud Storage Archive plugin
I want to register a temp archive's block range with ApplicationStateFacility (ASF) only after doCompleteMultipartUpload and the companion .meta object have both been written successfully
So that a failed or aborted multipart upload never leaves ghost stored-block ranges in ASF that prevent backfill from re-delivering the missing blocks and closing the temp-archive coverage gap

Technical Notes

Background

TempArchiveUploadTask sends PersistedNotification(success=true) and calls addStoredBlockRange after each successfully flushed S3 part, before the multipart upload is complete. If a later part upload or doCompleteMultipartUpload then fails, abortQuietly removes all uploaded data from S3, but the optimistic ASF registrations are not retracted.

The gap this leaves in tempArchiveTracker cannot be closed in-process: StartupRecoveryTask rebuilds the tracker only from .meta files, and .meta is written only after a successful doCompleteMultipartUpload, so an aborted upload leaves nothing for recovery to find. The only path to eventual consolidation is for backfill to re-deliver the missing blocks into a fresh TempArchiveUploadTask. But backfill skips blocks that ASF already reports as stored, so the ghost ranges block re-delivery, and the group stays unconsolidated until the node restarts and ASF state is rebuilt from scratch.
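The failure mode described above can be reproduced in a toy model. The sketch below is illustrative only: GhostRangeSketch, FakeAsf, Range, isStored and blocksBackfillWouldFetch are hypothetical stand-ins invented for this example, not the real Block Node classes or the real ASF API. It assumes backfill decides what to fetch purely by asking ASF whether a block is stored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ghost-range problem; every type here is a hypothetical stand-in. */
final class GhostRangeSketch {

    /** Inclusive block range, loosely modeled on the LongRange named in the issue. */
    record Range(long start, long end) {
        boolean contains(long block) {
            return block >= start && block <= end;
        }
    }

    /** Minimal stand-in for the stored-block view that ASF exposes. */
    static final class FakeAsf {
        private final List<Range> stored = new ArrayList<>();

        void addStoredBlockRange(Range range) {
            stored.add(range);
        }

        boolean isStored(long block) {
            return stored.stream().anyMatch(r -> r.contains(block));
        }
    }

    /** Blocks in [first, last] that backfill would re-deliver, i.e. those ASF does not report as stored. */
    static List<Long> blocksBackfillWouldFetch(FakeAsf asf, long first, long last) {
        List<Long> missing = new ArrayList<>();
        for (long block = first; block <= last; block++) {
            if (!asf.isStored(block)) {
                missing.add(block);
            }
        }
        return missing;
    }

    /** Current behaviour: register after every part; a later failure aborts without retracting. */
    static FakeAsf uploadWithPerPartRegistration(List<Range> parts, int failingPartIndex) {
        FakeAsf asf = new FakeAsf();
        for (int i = 0; i < parts.size(); i++) {
            if (i == failingPartIndex) {
                // abortQuietly would delete the uploaded parts from S3 here; ASF is left untouched.
                return asf;
            }
            asf.addStoredBlockRange(parts.get(i));
        }
        return asf;
    }

    public static void main(String[] args) {
        List<Range> parts = List.of(new Range(0, 9), new Range(10, 19), new Range(20, 29));
        FakeAsf asf = uploadWithPerPartRegistration(parts, 2);
        // Only the blocks of the failed part are requested, although nothing is left in S3.
        System.out.println(blocksBackfillWouldFetch(asf, 0, 29));
    }
}
```

In this model the third part fails, so the whole upload is aborted, yet blocks 0 through 19 are still reported as stored and would never be requested again.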

Notes

  • Remove the per-part addStoredBlockRange calls inside the while loop and the final-buffer block in TempArchiveUploadTask.
  • Add a single addStoredBlockRange(new LongRange(firstBlock, lastBlock)) call after doCompleteMultipartUpload and doUploadTextFile (.meta) both succeed. This is the point at which the block range becomes durable and discoverable by StartupRecoveryTask.
  • The PersistedNotification(success=true) sends can similarly be deferred to a single notification at the same point, or kept per-part as an optimistic hint to downstream consumers (same trade-off as BlockUploadTask).
  • On the failure path, only the PersistedNotification(success=false) for the last-attempted range needs to remain; no ASF cleanup is required because addStoredBlockRange was never called.
  • Update TempArchiveUploadTaskTest to assert that no addStoredBlockRange call is made before doCompleteMultipartUpload returns successfully, and that a single call covering [firstBlock, lastBlock] is made on success.
  • In StartupRecoveryTask.recoverFromCompletedObjects(), instead of only finding the last key (findLastKey), enumerate all tar objects under the configured prefix and collect every group's block range.
  • In completeRecoveryIfReady(), replace the single addStoredBlockRange(new LongRange(0, nextBlockToQueue - 1)) call with per-archive calls, one addStoredBlockRange per entry in completedRanges.
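The notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual TempArchiveUploadTask or StartupRecoveryTask code: the Uploader interface, RecordingAsf, Range, upload, registerCompletedRanges and uploaderFailingAt are hypothetical names invented for this example, and the method signatures only echo the identifiers used in this issue. PersistedNotification handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of deferred ASF registration; all types are hypothetical stand-ins. */
final class DeferredRegistrationSketch {

    /** Inclusive block range, standing in for LongRange. */
    record Range(long start, long end) {}

    /** Hypothetical seam over the S3 calls; names echo the issue, signatures are assumed. */
    interface Uploader {
        void uploadPart(Range part) throws Exception;
        void doCompleteMultipartUpload() throws Exception;
        void doUploadTextFile() throws Exception; // writes the companion .meta object
        void abortQuietly();
    }

    /** Records every addStoredBlockRange call so a test can assert on count and timing. */
    static final class RecordingAsf {
        final List<Range> calls = new ArrayList<>();

        void addStoredBlockRange(Range range) {
            calls.add(range);
        }
    }

    /** Uploads all parts and registers [firstBlock, lastBlock] once, only after .meta is written. */
    static boolean upload(List<Range> parts, Uploader s3, RecordingAsf asf) {
        long firstBlock = parts.get(0).start();
        long lastBlock = parts.get(parts.size() - 1).end();
        try {
            for (Range part : parts) {
                s3.uploadPart(part); // no per-part ASF registration
            }
            s3.doCompleteMultipartUpload();
            s3.doUploadTextFile();
        } catch (Exception e) {
            s3.abortQuietly();
            return false; // nothing was registered, so there is nothing to retract
        }
        asf.addStoredBlockRange(new Range(firstBlock, lastBlock));
        return true;
    }

    /** Recovery side: one addStoredBlockRange per completed archive instead of a single [0, n - 1]. */
    static void registerCompletedRanges(List<Range> completedRanges, RecordingAsf asf) {
        for (Range range : completedRanges) {
            asf.addStoredBlockRange(range);
        }
    }

    /** Test double that throws on the named step: "part", "complete", "meta", or anything else for none. */
    static Uploader uploaderFailingAt(String failingStep) {
        return new Uploader() {
            private void maybeFail(String step) throws Exception {
                if (step.equals(failingStep)) {
                    throw new Exception("simulated failure in " + step);
                }
            }

            @Override public void uploadPart(Range part) throws Exception { maybeFail("part"); }
            @Override public void doCompleteMultipartUpload() throws Exception { maybeFail("complete"); }
            @Override public void doUploadTextFile() throws Exception { maybeFail("meta"); }
            @Override public void abortQuietly() { /* would delete uploaded parts from S3 */ }
        };
    }

    public static void main(String[] args) {
        List<Range> parts = List.of(new Range(100, 149), new Range(150, 199));
        RecordingAsf asf = new RecordingAsf();
        boolean ok = upload(parts, uploaderFailingAt("complete"), asf);
        System.out.println("success=" + ok + ", ASF calls=" + asf.calls.size());
    }
}
```

Registering per archive during recovery keeps any gap between archives visible to backfill, which a single range starting at block 0 would hide. A test along the lines proposed for TempArchiveUploadTaskTest could use a recording double like RecordingAsf to check both the failure case (no calls) and the success case (exactly one call covering the full range).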

Metadata

Labels

Block Node: Issues/PR related to the Block Node.
S3 Cloud Archive: Issues related to the S3 Cloud Archive functionality.

Projects

Status
🏃 Sprint Backlog
