
[codex] Ignore stale terminal scheduler heartbeats #5240

Draft
serrrfirat wants to merge 1 commit into main from codex/scheduler-stale-heartbeat

@serrrfirat

Collaborator

Summary

  • classify a scheduler heartbeat InvalidTransition { from: terminal, to: Running } error as a stale terminal observation instead of recording it as scheduler_heartbeat_failed
  • keep every other heartbeat error, including timeouts, on the existing terminal-failure path
  • add a caller-level regression test to the scheduler contract suite and a crate-local guardrail note
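A minimal sketch of the classification the summary describes, under stated assumptions: TurnStatus, HeartbeatError, classify_heartbeat, and the HeartbeatOutcome variant names (Renewed, AlreadyTerminal, Failed) are hypothetical stand-ins modeled on the reviewer's description of HeartbeatOutcome, not the actual ironclaw_host_runtime code. Only a rejected transition from a terminal status to Running is treated as stale; everything else keeps its failure classification.

```rust
// Hypothetical stand-ins; the real ironclaw_turns / ironclaw_host_runtime
// types and names may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnStatus {
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl TurnStatus {
    /// Terminal statuses never transition again.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TurnStatus::Completed | TurnStatus::Failed | TurnStatus::Cancelled
        )
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HeartbeatError {
    InvalidTransition { from: TurnStatus, to: TurnStatus },
    Timeout,
    Other(String),
}

/// Assumed outcome shape: renewed, already terminal, or a real failure.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HeartbeatOutcome {
    Renewed,
    AlreadyTerminal(TurnStatus),
    Failed(HeartbeatError),
}

pub fn classify_heartbeat(result: Result<(), HeartbeatError>) -> HeartbeatOutcome {
    match result {
        Ok(()) => HeartbeatOutcome::Renewed,
        // The run already finished: the heartbeat observed stale state.
        Err(HeartbeatError::InvalidTransition { from, to: TurnStatus::Running })
            if from.is_terminal() =>
        {
            HeartbeatOutcome::AlreadyTerminal(from)
        }
        // Timeouts and every other error stay on the failure path.
        Err(err) => HeartbeatOutcome::Failed(err),
    }
}
```

The match guard keeps the carve-out narrow: an InvalidTransition whose target is not Running, or whose source is still Running, is still reported as a failure.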

Fixes #5239.

Evidence

Railway deployment 3e28add8-824b-42c1-bea4-d2533fdddc48 at commit 44f063d97fcf204d78cde198c262c9881a704bb3 logged this sequence for run b0628cba-5564-4542-ad77-0f506f465173:

2026-06-25T12:19:23.725856Z DEBUG ironclaw_host_runtime::turn_scheduler: turn run scheduler heartbeat failed error=invalid turn transition from Completed to Running
2026-06-25T12:19:23.757905Z DEBUG ironclaw_host_runtime::turn_scheduler: turn run scheduler terminal failure transition failed error=invalid turn transition from Completed to Failed
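Both log lines are consistent with a store that rejects every transition out of a terminal status. The sketch below reproduces the two error messages under that assumption; TurnStatus, TurnRecord, and InvalidTransition are hypothetical, simplified stand-ins rather than the ironclaw_turns API.

```rust
use std::fmt;

// Hypothetical, simplified stand-ins for the ironclaw_turns status model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnStatus {
    Running,
    Completed,
    Failed,
}

impl TurnStatus {
    /// Completed and Failed are final in this sketch.
    pub fn is_terminal(self) -> bool {
        !matches!(self, TurnStatus::Running)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: TurnStatus,
    pub to: TurnStatus,
}

impl fmt::Display for InvalidTransition {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same wording as the error field in the deployment logs.
        write!(f, "invalid turn transition from {:?} to {:?}", self.from, self.to)
    }
}

pub struct TurnRecord {
    pub status: TurnStatus,
}

impl TurnRecord {
    /// Rejects any transition out of a terminal status. A non-terminal run
    /// accepts every target in this simplified model (Running to Running
    /// stands in for a heartbeat renewal).
    pub fn transition(&mut self, to: TurnStatus) -> Result<(), InvalidTransition> {
        if self.status.is_terminal() {
            return Err(InvalidTransition { from: self.status, to });
        }
        self.status = to;
        Ok(())
    }
}
```

In this model the second log line follows from the first being treated as a runner failure: the scheduler then attempts Completed to Failed, which the store rejects for the same reason.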

The regression test was added before the runtime patch and failed on unpatched origin/main with:

stale heartbeat after Completed must not be recorded as scheduler_heartbeat_failed

Validation

  • cargo test -p ironclaw_host_runtime scheduler_does_not_fail_completed_run_for_stale_terminal_heartbeat --test turn_scheduler_contract -- --nocapture (fails before the fix, passes after it)
  • cargo test -p ironclaw_host_runtime scheduler_records_failure_when_heartbeat_call_times_out --test turn_scheduler_contract -- --nocapture
  • cargo fmt --check
  • cargo test -p ironclaw_host_runtime
  • cargo clippy -p ironclaw_host_runtime --all-targets -- -D warnings

Notes

This intentionally does not relax ironclaw_turns transition rules. The store should continue rejecting invalid terminal transitions; the scheduler should avoid misclassifying a stale terminal heartbeat as a new runner failure.
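The split described above can be sketched on the scheduler side: the store's rules are untouched, and only the scheduler's reaction to an already-terminal outcome changes. SchedulerAction, next_action, and the "scheduler_heartbeat_failed: ..." reason format are hypothetical names chosen for illustration; HeartbeatOutcome is a simplified stand-in for the enum the reviewer describes.

```rust
// Hypothetical stand-ins; the real scheduler types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnStatus {
    Completed,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HeartbeatOutcome {
    Renewed,
    AlreadyTerminal(TurnStatus),
    Failed(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchedulerAction {
    /// Lease renewed: heartbeat again on the next tick.
    KeepRunning,
    /// The run already finished: stop heartbeating and leave its status alone.
    StopWithoutTransition,
    /// A real failure: attempt the terminal-failure transition with this reason.
    RecordFailure(String),
}

pub fn next_action(outcome: HeartbeatOutcome) -> SchedulerAction {
    match outcome {
        HeartbeatOutcome::Renewed => SchedulerAction::KeepRunning,
        // No failure transition is attempted, so the store never sees a
        // terminal-to-Failed request for a run that already completed.
        HeartbeatOutcome::AlreadyTerminal(_) => SchedulerAction::StopWithoutTransition,
        HeartbeatOutcome::Failed(detail) => {
            SchedulerAction::RecordFailure(format!("scheduler_heartbeat_failed: {detail}"))
        }
    }
}
```

Keeping this decision in the scheduler means the store can stay strict: it still rejects the transition, and the caller simply stops treating that rejection as a new runner failure.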

@coderabbitai

coderabbitai Bot commented Jun 25, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 078c81d7-f652-49b7-aea5-e8f01ea5f258

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot added scope: docs Documentation size: M 50-199 changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels Jun 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the turn scheduler to treat heartbeat errors against already-terminal runs as stale scheduler observations rather than new executor failures. It introduces a HeartbeatOutcome enum to distinguish between renewed, already-terminal, and failed heartbeats, preventing scheduler races from incorrectly marking completed runs as failed. A comprehensive integration test has been added to verify this behavior. There are no review comments, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@railway-app

railway-app Bot commented Jun 25, 2026


🚅 Deployed to the ironclaw-pr-5240 environment in ironclaw-ci-preview

Service: ironclaw · Status: ✅ Success · Updated (UTC): Jun 25, 2026 at 12:55 pm


Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules scope: docs Documentation size: M 50-199 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scheduler treats stale terminal heartbeat as runner failure

1 participant