Title: Mission artifact corruption can trigger replay/resume churn because scheduling trusts mutable features.json
Summary
We hit an expensive mission-state integrity failure in Droid / Factory missions.
This was not a product-code regression.
It was a harness/state issue: a late artifact-writing failure left features.json malformed or inconsistent, and resume logic depended on that mutable file heavily enough to create a risk of replay and broad resume.
The practical result was that a small late-stage artifact failure created unnecessary orchestration churn, token waste, and operator confusion near the end of a mostly completed mission.
What happened
Mission shape:
- large multi-step coding mission
- orchestrator + workers + validators
- mission artifacts included features.json and validation-state.json
Observed sequence:
- core product work was already largely complete
- a late proof/follow-up/finalization phase was running
- an artifact-writing step failed
- malformed JSON / broken artifact write
- later we also saw synthesis-writing failures
- features.json ended up malformed or inconsistent
- resume/start logic trusted features.json strongly enough that replay/broad-resume behavior followed
- this created unnecessary re-planning / re-validation risk instead of a narrow repair-only path
Important clarification
This was not a true runtime completed -> pending regression for already completed product work.
In our case:
- some follow-up validation state had genuinely failed or not completed yet
- later manual artifact/state repair attempted to finalize the mission state
- then malformed/inconsistent features.json made scheduling unreliable
So the problem was not “successful product work vanished”.
The problem was:
- mutable mission summary state
- artifact corruption
- scheduler trusting that corrupted summary strongly enough to cause replay/resume churn
Why this is costly
A late artifact error should be cheap to recover from.
Instead, it can currently cause:
- replay risk for already completed work
- broad milestone resume behavior
- repeated re-reading / re-planning / re-validation
- large token waste
- lower operator trust in resume semantics
This is especially painful near the end of long-running missions.
Verified behavior from our incident
- validation-state.json preserved behavioral truth more reliably than features.json
- features.json was the effective scheduling truth for resume/start behavior
- malformed/inconsistent features.json directly interfered with scheduling
- broad replay/resume risk came from artifact state, not from a real product regression
Requested changes
1. Make mission artifact writes transactional
For files like:
- features.json
- validation-state.json
- synthesis artifacts that affect scheduling
Please use:
- write temp file
- parse temp file
- schema-validate temp file
- atomically replace original only on success
If validation fails, keep the old file untouched.
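A minimal sketch of that write path in Python, to make the request concrete. The function names and the schema check are hypothetical; we do not know the real features.json schema or how the harness writes artifacts today, so the validator here only assumes an object with a "features" list of entries carrying "id" and "status".

```python
import json
import os
import tempfile


def validate_features(doc):
    """Hypothetical schema check; the real features.json schema is an assumption."""
    if not isinstance(doc, dict) or not isinstance(doc.get("features"), list):
        raise ValueError("expected an object with a 'features' list")
    for feature in doc["features"]:
        if not isinstance(feature, dict) or "id" not in feature or "status" not in feature:
            raise ValueError("each feature needs 'id' and 'status'")


def write_artifact_atomically(path, text, validate):
    """Replace `path` with `text` only if it parses as JSON and passes `validate`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Re-read what actually landed on disk, then parse and schema-validate it.
        with open(tmp_path, encoding="utf-8") as tmp:
            validate(json.load(tmp))
        os.replace(tmp_path, path)  # atomic replacement of the original
    except Exception:
        # Any failure leaves the previous artifact untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, a truncated or malformed write raises an error and the last good file stays in place, so the scheduler never sees a half-written artifact.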
2. Do not use mutable features.json as sole scheduling truth
Please add a more durable execution truth, such as:
- append-only event log
- durable runner state
- immutable feature lifecycle history
Examples of durable facts:
- feature started
- feature completed
- feature failed
- validator completed
- validator failed
- worker session ids
- milestone transitions
features.json should be a projection/summary, not the only source of scheduling truth.
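One way to picture this, as a sketch only: an append-only JSON Lines log holds the durable facts, and the features.json content is rebuilt from it on demand. The event names, the "feature" field, and the file layout are illustrative assumptions, not the harness's actual format.

```python
import json

# Hypothetical mapping from event types to the summary status they imply.
STATUS_FOR_EVENT = {
    "feature_started": "in_progress",
    "feature_completed": "completed",
    "feature_failed": "failed",
}


def append_event(log_path, event):
    """Append one JSON event per line; existing lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def project_features(log_path):
    """Rebuild the feature summary by replaying the log from the start."""
    status = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue
            event = json.loads(line)
            new_status = STATUS_FOR_EVENT.get(event["type"])
            if new_status is None:
                continue  # validator/milestone events do not change feature status here
            # Completion is sticky: later events never demote a completed feature.
            if status.get(event["feature"]) != "completed":
                status[event["feature"]] = new_status
    return {"features": [{"id": fid, "status": s} for fid, s in sorted(status.items())]}
```

If features.json is damaged, it can then be regenerated from the log instead of being trusted or hand-repaired.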
3. Enforce monotonic state transitions
A successfully completed runtime step should not be silently treated as pending again just because a mutable artifact was damaged or rewritten.
At minimum:
- completed -> pending should require explicit repair mode / reason
- corruption should not implicitly broaden pending work
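A small sketch of such a guard, assuming four status names (pending, in_progress, completed, failed) that we made up for illustration; the real state set may differ.

```python
class IllegalTransition(Exception):
    """Raised when a status change would silently undo recorded progress."""


# Hypothetical forward-only transition table.
ALLOWED = {
    "pending": {"in_progress"},
    "in_progress": {"completed", "failed"},
    "failed": {"pending", "in_progress"},
    "completed": set(),
}


def transition(current, new, repair_reason=None):
    """Return the new status, or raise if the change is not permitted."""
    if new == current:
        return current
    if new in ALLOWED[current]:
        return new
    # Reopening completed work is only possible with an explicit, recorded reason.
    if current == "completed" and new == "pending" and repair_reason:
        return new
    raise IllegalTransition(f"{current} -> {new} requires explicit repair mode")
```

A rewritten or damaged summary file that tries to move a feature from completed back to pending would then raise instead of quietly widening the pending set.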
4. Fail closed on artifact corruption
If features.json is malformed or inconsistent:
- do not broaden replay
- do not recompute pending work from partial state
- do not milestone-resume broadly
Instead:
- enter repair mode
- require artifact repair first
- then resume only the truly pending step
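As an illustration of fail-closed behavior, here is a hedged sketch of a resume planner. `ResumePlan` and `plan_resume` are hypothetical names, and the file shape (a "features" list with "id" and "status") is an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResumePlan:
    mode: str                       # "resume" or "repair"
    pending: list = field(default_factory=list)
    reason: str = ""


def plan_resume(features_path):
    """Compute pending work, or fall back to repair mode with nothing scheduled."""
    try:
        with open(features_path, encoding="utf-8") as fh:
            doc = json.load(fh)
        pending = [f["id"] for f in doc["features"] if f["status"] != "completed"]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # Malformed or inconsistent artifact: schedule nothing and ask for repair.
        return ResumePlan(mode="repair", reason=f"{type(exc).__name__}: {exc}")
    return ResumePlan(mode="resume", pending=pending)
```

The key property is that a parse or shape error yields an empty pending list plus a repair request, never a recomputed (and possibly broader) set of work.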
5. Distinguish implementation failures from finalization/synthesis failures
A late failure while writing synthesis/finalization artifacts should not be treated like unfinished product implementation.
Please distinguish:
- implementation/product step failures
- finalization/bookkeeping/synthesis failures
Finalization failures should trigger:
- repair-in-place
- not replay of completed product work
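To make the distinction concrete, a tiny sketch in which each failure carries the phase it occurred in and recovery is chosen from that phase. The phase names and action strings are assumptions for illustration only.

```python
from enum import Enum


class Phase(Enum):
    IMPLEMENTATION = "implementation"   # product/code steps
    FINALIZATION = "finalization"       # bookkeeping, synthesis, summary writes


def recovery_action(phase):
    """Pick a recovery path based on where the failure happened."""
    if phase is Phase.FINALIZATION:
        # Only the artifact is redone; completed product work is left alone.
        return "repair_in_place"
    return "retry_failed_step"
```

Under this scheme a failed synthesis write maps to `repair_in_place` and never reaches the code path that reschedules product work.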
6. Cross-check resume state against stronger evidence
Before resuming from mission artifacts, cross-check:
- durable execution history
- worker handoffs
- validator handoffs
- validation-state.json
- current features.json
If they disagree:
- stop
- enter repair mode
- do not schedule blindly from the mutable summary file
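A sketch of what that cross-check could look like, assuming each evidence source can be reduced to a mapping of feature id to status. The source names and the reduction step are hypothetical.

```python
def cross_check(sources):
    """Return {feature_id: {source_name: status}} for every disagreement.

    `sources` maps a source name (e.g. "event_log", "validation_state",
    "features_json") to a dict of feature id -> status.
    """
    conflicts = {}
    all_ids = set().union(*(statuses.keys() for statuses in sources.values()))
    for fid in sorted(all_ids):
        seen = {name: statuses[fid] for name, statuses in sources.items() if fid in statuses}
        if len(set(seen.values())) > 1:
            conflicts[fid] = seen
    return conflicts


def decide_resume(sources):
    """Stop and request repair when the evidence sources disagree."""
    conflicts = cross_check(sources)
    if conflicts:
        return "repair", conflicts
    return "resume", {}
```

For example, if the event log records a feature as completed while features.json says pending, `decide_resume` returns "repair" along with the conflicting entries, giving the operator a narrow target to fix.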
Minimum acceptance criteria
A single malformed mission artifact should not cause replay of completed work.
Specifically:
- completed runtime work remains recoverable even if summary files are damaged
- artifact corruption blocks scheduling instead of widening it
- finalization/synthesis failures do not reopen completed product work
- resume can recover from handoffs / durable state rather than only from features.json
Additional note
We also observed that operators currently need to be extremely explicit with the orchestrator to avoid broad resume behavior after artifact issues.
That is workable as a temporary mitigation, but the core issue appears to be harness/state integrity rather than prompt quality.
If helpful
I can provide a more structured incident report with:
- timeline
- verified facts
- root cause
- contributing factors
- corrective actions