feat(executor): report artifact metadata before upload completes #16104
Open
SimbaKingjoe wants to merge 1 commit into
Generate artifact keys (S3/GCS/etc) before uploading, then report outputs with full metadata to the controller immediately. Actual artifact uploads run concurrently via goroutines with WaitGroup synchronization.

- Extract key generation from saveArtifactFromFile into generateArtifactKey
- Add StagedArtifact type pairing artifact metadata with local file paths
- Add GenerateArtifactOutputs to stage files and generate keys without uploading
- Add SaveArtifactsAsync to upload staged artifacts concurrently
- Update wait container flow: GenerateArtifactOutputs → ReportOutputs → SaveArtifactsAsync
- Add unit tests for generateArtifactKey, GenerateArtifactOutputs, SaveArtifactsAsync

Refs argoproj#16091

Signed-off-by: daixin1204 <daixin1204@gmail.com>
Summary
Report artifact metadata (S3 keys, types) to the controller before the actual upload completes. The wait container generates artifact keys from the deterministic archive location, reports outputs immediately, then uploads files concurrently via goroutines. This allows the controller to access artifact metadata earlier, and makes uploads non-blocking within the wait container.
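The three phases described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `stagedArtifact`, `generateKey`, `uploadAll`, and `errSimulated` are hypothetical names, the key layout (prefix plus file name) is an assumption, and the upload is a stub function. Error aggregation uses `errors.Join`, which assumes Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"sync"
)

// errSimulated is a hypothetical failure used only by the stub upload below.
var errSimulated = errors.New("simulated upload failure")

// stagedArtifact is a hypothetical stand-in for the PR's StagedArtifact:
// it pairs the key that gets reported with the local file still to be uploaded.
type stagedArtifact struct {
	Name      string
	Key       string
	LocalPath string
}

// generateKey derives the object key from a known prefix and the file name,
// so the key is available before any bytes are uploaded.
// The prefix/file layout is an assumption made for this sketch.
func generateKey(prefix, fileName string) string {
	return path.Join(prefix, fileName)
}

// uploadAll runs upload for every staged artifact in its own goroutine,
// waits for all of them, and returns the failures joined into one error
// (nil when every upload succeeded or the list is empty).
func uploadAll(staged []stagedArtifact, upload func(stagedArtifact) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex // guards errs
		errs []error
	)
	for _, s := range staged {
		wg.Add(1)
		go func(s stagedArtifact) {
			defer wg.Done()
			if err := upload(s); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("artifact %s: %w", s.Name, err))
				mu.Unlock()
			}
		}(s)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	staged := []stagedArtifact{
		{Name: "model", LocalPath: "/tmp/model.tgz"},
		{Name: "cache", LocalPath: "/tmp/cache.tgz"},
	}

	// Phase 1: compute keys without uploading anything.
	for i := range staged {
		staged[i].Key = generateKey("my-workflow/my-pod", path.Base(staged[i].LocalPath))
	}

	// Phase 2: report metadata (here, just print it).
	for _, s := range staged {
		fmt.Printf("reported %s -> %s\n", s.Name, s.Key)
	}

	// Phase 3: upload concurrently; the stub fails for one artifact.
	err := uploadAll(staged, func(s stagedArtifact) error {
		if s.Name == "cache" {
			return errSimulated
		}
		return nil
	})
	fmt.Println("upload result:", err)
}
```

Collecting errors under a mutex and joining them after `wg.Wait()` means one failed upload does not hide the others, which is one plausible reading of "error aggregation" in this PR.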
Fixes
#16091
Motivation
Today, `SaveArtifacts` uploads all output artifacts synchronously before `ReportOutputs`. For large artifacts (build caches, ML models), this blocks the wait container and prevents the controller from seeing any output metadata until every byte is uploaded. For downstream DAG/Steps tasks that do NOT consume these artifacts, this is unnecessary delay.

This PR lays the foundation for full async artifact uploads.
Modifications
- `workflow/executor/executor.go` (+111/-13):
  - `generateArtifactKey()`: new method that computes the artifact's S3/GCS key from the archive location, without uploading
  - `StagedArtifact`: new struct pairing an artifact with its staged local file path
  - `GenerateArtifactOutputs()`: stages artifact files and populates keys/types, returns `[]StagedArtifact`
  - `SaveArtifactsAsync()`: uploads staged artifacts concurrently via goroutines with `sync.WaitGroup` + error aggregation
  - `saveArtifactFromFile()`: refactored to call `generateArtifactKey` instead of inlining key generation
- `cmd/argoexec/commands/wait.go` (+21/-1):
  - `GenerateArtifactOutputs` → `ReportOutputs` → `SaveArtifactsAsync`
- `workflow/executor/executor_test.go` (+47):
  - `TestGenerateArtifactKey`: verifies key generation and `HasKey()` idempotency
  - `TestGenerateArtifactOutputs_EmptyOutputs`: empty outputs return nil
  - `TestSaveArtifactsAsync_EmptyList`: nil/empty input returns nil

Verification
Behavioral Impact
- `FinalizeOutput` (which sets `report-outputs-completed=true`) is still deferred and runs after all uploads complete. Downstream DAG dependencies still wait correctly.
- `taskResultReconciliation` copies outputs to NodeStatus regardless of completion flag (`taskresult.go:124-143`), so artifact keys are visible in UI/CLI as soon as `ReportOutputs` completes.
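The ordering guarantee in the first bullet relies on Go's `defer` semantics. As a rough illustration only, the sketch below records the order in which the phases run. `runWait` and the phase labels are hypothetical and do not call into the real executor; the deferred closure merely stands in for `FinalizeOutput`.

```go
package main

import "fmt"

// runWait records the order of the phases as plain labels.
// The deferred closure stands in for FinalizeOutput: it is registered first
// but runs last, after the (simulated) uploads have returned.
func runWait() (events []string) {
	defer func() {
		events = append(events, "finalize: report-outputs-completed=true")
	}()

	events = append(events, "generate keys")
	events = append(events, "report outputs")   // metadata is visible from here on
	events = append(events, "upload artifacts") // blocks until all uploads finish
	return events
}

func main() {
	for i, e := range runWait() {
		fmt.Printf("%d. %s\n", i+1, e)
	}
}
```

Because `events` is a named result, the deferred append is visible to the caller, so the recorded sequence ends with the finalize step even though it was declared before the other phases.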