Problem
The upload_artifact safe-output handler emits a malformed CreateArtifact request to the GitHub Actions artifact API. The API returns HTTP 400, which fails the entire safe_outputs job even though the agent and all other safe-outputs (e.g. create_discussion) succeed.
Affected workflows & runs
Smoke Copilot — run 27451309410 (confirmed). Agent succeeded (19 turns, 873k tokens); create_discussion succeeded (discussion #38988); only upload_artifact failed and marked the whole job failed.
Smoke Copilot family failed repeatedly in the last 6h: runs 27451309410, 27447675349, 27444195593, 27443692618, 27439104838. Smoke Copilot is at ~95% failure (19/20 recent runs).
upload_artifact is also reported as "tool not available" in AOAI variants; confirm whether the same handler path is involved.
Evidence
✗ Message 1 (upload_artifact) failed: ERR_VALIDATION: upload_artifact: failed to upload
artifact "gh-aw": artifact twirp CreateArtifact failed (400):
{"code":"malformed","msg":"the json request could not be decoded"}
Probable root cause
The upload_artifact handler constructs an invalid JSON body for the artifact-service CreateArtifact (twirp) endpoint, so the server rejects it with 400 malformed: the json request could not be decoded. Because a single failed safe-output message fails the whole safe_outputs job, a non-critical artifact upload takes down the run.
Proposed remediation
Inspect the upload_artifact request-body construction; validate it against the current artifact-service CreateArtifact schema (field names/types, encoding).
Add request-payload validation + a clear error before the API call.
Treat a failed optional upload_artifact as non-fatal to the safe_outputs job (or make it configurable), so one bad upload does not fail otherwise-successful runs.
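The validation step proposed above could look roughly like the following. This is a minimal sketch, not the handler's actual code: the field names and types in `REQUIRED_FIELDS`, the `PayloadError` class, and the `encode_create_artifact_body` helper are all assumptions made for illustration, and the real CreateArtifact schema has to be checked against the artifact service before any of them are relied on.

```python
import json

# Hypothetical field set for illustration only. The real CreateArtifact
# schema must be confirmed against the artifact service.
REQUIRED_FIELDS = {
    "workflow_run_backend_id": str,
    "workflow_job_run_backend_id": str,
    "name": str,
    "version": int,
}


class PayloadError(ValueError):
    """Raised before any network call when the request body is unusable."""


def encode_create_artifact_body(fields: dict) -> bytes:
    """Validate `fields` and return a UTF-8 JSON body, or raise PayloadError."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in fields:
            problems.append(f"missing field {key!r}")
            continue
        value = fields[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"field {key!r} should be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    if problems:
        raise PayloadError(
            "upload_artifact: invalid CreateArtifact body: " + "; ".join(problems)
        )
    try:
        # allow_nan=False rejects NaN/Infinity, which are not valid JSON.
        return json.dumps(fields, allow_nan=False).encode("utf-8")
    except (TypeError, ValueError) as exc:
        raise PayloadError(
            f"upload_artifact: body is not JSON-serializable: {exc}"
        ) from exc
```

Failing here gives a local error that names the offending field, instead of the opaque "the json request could not be decoded" returned by the server.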
Success criteria
Smoke Copilot safe_outputs job passes; upload_artifact produces a well-formed CreateArtifact request (no 400 malformed).
A failing optional artifact upload no longer fails the entire safe_outputs job.
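One way to meet the second criterion is to separate optional from required message kinds when the job result is computed. The sketch below is illustrative only: `MessageResult`, `DEFAULT_OPTIONAL`, and `summarize` are hypothetical names, and the assumption that the set of optional kinds comes from workflow configuration is not confirmed by this issue.

```python
from dataclasses import dataclass


@dataclass
class MessageResult:
    kind: str        # e.g. "upload_artifact", "create_discussion"
    ok: bool
    error: str = ""


# Assumption: the set of optional kinds would come from workflow config.
DEFAULT_OPTIONAL = frozenset({"upload_artifact"})


def summarize(results, optional=DEFAULT_OPTIONAL):
    """Return (job_ok, warnings, errors) for a batch of safe-output results.

    Failures of optional kinds are downgraded to warnings; any failure of a
    required kind still fails the job.
    """
    warnings, errors = [], []
    for result in results:
        if result.ok:
            continue
        line = f"{result.kind}: {result.error}"
        if result.kind in optional:
            warnings.append(line)
        else:
            errors.append(line)
    return (not errors, warnings, errors)
```

With run 27451309410 as the model case (upload_artifact failed, create_discussion succeeded), `summarize` would report the job as passing with one warning. Passing `optional=frozenset()` restores the current fail-on-any-error behavior.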
Context
Filed by [aw] Failure Investigator (6h) — analyzed run 27452598516. Parent: Workflow Health Manager Issue Group #29109.
Related to #29109
Still recurring — 5 fresh upload_artifact→safe_outputs failures in the 2026-06-13 19:13Z 6h window (agent step succeeded, only Process Safe Outputs failed): Smoke Copilot §27471858644, Smoke Codex §27471836485, Smoke Claude §27471836462, Design Decision Gate §27471832454, PR Sous Chef §27471203716. The failing job sets upload_artifact_count / upload_artifact_slot_0_tmp_id outputs, confirming the same handler path. No fix landed yet — keeping open.
Signature shift: the upload_artifact CreateArtifact 400 did NOT reproduce in the 2026-06-14 01:38Z 6h window; the safe_outputs job now fails on a different handler.
Fresh audit of the Smoke safe-outputs failures this window (Smoke Claude §27480363196, Smoke Claude §27478319746, Smoke Copilot §27481382799):
The upload_artifact safe-output handler and the actions/upload-artifact step both succeeded in every run (artifacts finalized; upload_artifact_count / upload_artifact_slot_0_tmp_id outputs set). The CreateArtifact (400) malformed signature was not observed.
The safe_outputs job still fails (agent job = success), but on a distinct handler permission/context signature:
add_comment → HTTP 403 "Resource not accessible by integration" (default token lacks scope; in 27480363196 it targeted non-existent issue #335 ("Welcome to Agentic Workflows!") → 404 → discussion-comment fallback → 403).
add_labels / remove_labels → "No issue/PR number available" (missing issue/PR target context on the scheduled run) [27481382799].
Action: confirm whether the upload_artifact 400 fix has landed; if so, this issue is resolvable. The remaining safe_outputs failures are a different root cause (handler permission scope + missing-target context on scheduled runs) and warrant separate tracking rather than living under the 400 signature. Not closing this issue yet: the 400 was confirmed live ~3h before this window, and only 1–2 runs were deep-audited here.
Filed by [aw] Failure Investigator (6h), analyzed run https://github.com/github/gh-aw/actions/runs/27484878458.
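The "No issue/PR number available" case on scheduled runs could be handled by resolving the target up front and skipping when there is none. The sketch below assumes the handler has access to the triggering event payload. `resolve_target_number` and `plan_label_update` are hypothetical helpers, not existing gh-aw functions; the only external fact used is that GitHub issue and pull request event payloads carry `issue.number` or `pull_request.number`, while a `schedule` event carries neither.

```python
from typing import Optional


def resolve_target_number(payload: dict) -> Optional[int]:
    """Return the issue/PR number carried by the triggering event, if any."""
    for key in ("issue", "pull_request"):
        item = payload.get(key)
        if isinstance(item, dict) and isinstance(item.get("number"), int):
            return item["number"]
    return None


def plan_label_update(event_name: str, payload: dict, labels) -> dict:
    """Decide whether a label update can run, without calling any API."""
    number = resolve_target_number(payload)
    if number is None:
        # Scheduled runs have no issue/PR context: skip instead of failing.
        return {
            "action": "skip",
            "reason": f"no issue/PR target on {event_name!r} event",
        }
    return {"action": "add_labels", "number": number, "labels": list(labels)}
```

Whether such a skip should count as success or as a warning for the safe_outputs job is a separate policy decision.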