[cuebot] Release procs on frame-complete for scheduler-managed shows#2403
Conversation
After AcademySoftwareFoundation#2323, the frame-complete proc-reuse path in FrameCompleteHandler still rebooked procs for scheduler-managed shows. That races the Rust scheduler, stranding pk_frame=NULL procs that hold reserved cores, so hosts report too few idle cores. Gate the reuse path on showDao.isSchedulerManaged: for non-local procs on scheduler-managed shows, unbookProc immediately instead of queueing DispatchNextFrame. This frees the cores and avoids double-dispatching a frame the scheduler owns. Add FrameCompleteHandlerTests coverage for both the scheduler-managed (proc released) and control (proc reused) cases.
|
Important Review skippedThis PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback. ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughFrame complete handler now exits early for scheduler-managed shows: when a frame is WAITING or SUCCEEDED and the proc is not locally dispatched, it unbooks the proc and returns immediately, skipping stranded-core handling and re-dispatch/next-frame scheduling. Tests validate release vs reuse behavior. ChangesScheduler-Managed Show Proc Release
Sequence Diagram(s)sequenceDiagram
participant FrameCompleteHandler
participant ShowDao
participant ProcDao
participant DispatchSupport
FrameCompleteHandler->>ShowDao: isSchedulerManaged(showId)
alt scheduler-managed == true
FrameCompleteHandler->>ProcDao: unbookProc(proc, "scheduler-managed show, releasing proc")
FrameCompleteHandler-->>DispatchSupport: return (stop further processing)
else scheduler-managed == false
FrameCompleteHandler->>DispatchSupport: continue stranded-core / re-dispatch flow
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cuebot/src/test/java/com/imageworks/spcue/test/dispatcher/FrameCompleteHandlerTests.java (2)
564-566: ⚡ Quick winMake scheduler-managed state explicit for both test branches.
The helper currently depends on fixture defaults for the
falsecase. Set the show flag unconditionally so the test setup is deterministic.Suggested change
- if (schedulerManaged) { - showDao.updateSchedulerManaged(showDao.getShowDetail(proc.getShowId()), true); - } + showDao.updateSchedulerManaged( + showDao.getShowDetail(proc.getShowId()), schedulerManaged);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cuebot/src/test/java/com/imageworks/spcue/test/dispatcher/FrameCompleteHandlerTests.java` around lines 564 - 566, Replace the conditional that only updates scheduler-managed when true with an unconditional call to showDao.updateSchedulerManaged using the current schedulerManaged variable so the test explicitly sets the flag for both branches; locate the existing if (schedulerManaged) block around showDao.updateSchedulerManaged(showDao.getShowDetail(proc.getShowId()), true) and change it to always call showDao.updateSchedulerManaged(showDao.getShowDetail(proc.getShowId()), schedulerManaged) to make the setup deterministic.
590-604: ⚡ Quick winAdd coverage for the
WAITINGscheduler-managed branch.The new handler gate applies to
WAITINGandSUCCEEDED, but these tests currently exercise onlySUCCEEDED. Add aWAITINGvariant to lock in both sides of the condition.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cuebot/src/test/java/com/imageworks/spcue/test/dispatcher/FrameCompleteHandlerTests.java` around lines 590 - 604, The tests only exercise the SUCCEEDED branch of the handler gate; add coverage for the WAITING scheduler-managed branch by creating new test variants that mirror testSchedulerManagedShowReleasesProc and testNonSchedulerManagedShowReusesProc but drive the handler with a WAITING terminal state instead of SUCCEEDED. Use the existing helper procSurvivesReuseAfterComplete (or create a small wrapper) to invoke the same logic for WAITING, and add assertions matching the scheduler-managed and non-scheduler-managed expectations (assertFalse for scheduler-managed, assertTrue for non-scheduler-managed) to lock both sides of the condition; reference the existing test methods testSchedulerManagedShowReleasesProc and testNonSchedulerManagedShowReusesProc and the WAITING/SUCCEEDED gate names when locating where to add the new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@cuebot/src/test/java/com/imageworks/spcue/test/dispatcher/FrameCompleteHandlerTests.java`:
- Around line 564-566: Replace the conditional that only updates
scheduler-managed when true with an unconditional call to
showDao.updateSchedulerManaged using the current schedulerManaged variable so
the test explicitly sets the flag for both branches; locate the existing if
(schedulerManaged) block around
showDao.updateSchedulerManaged(showDao.getShowDetail(proc.getShowId()), true)
and change it to always call
showDao.updateSchedulerManaged(showDao.getShowDetail(proc.getShowId()),
schedulerManaged) to make the setup deterministic.
- Around line 590-604: The tests only exercise the SUCCEEDED branch of the
handler gate; add coverage for the WAITING scheduler-managed branch by creating
new test variants that mirror testSchedulerManagedShowReleasesProc and
testNonSchedulerManagedShowReusesProc but drive the handler with a WAITING
terminal state instead of SUCCEEDED. Use the existing helper
procSurvivesReuseAfterComplete (or create a small wrapper) to invoke the same
logic for WAITING, and add assertions matching the scheduler-managed and
non-scheduler-managed expectations (assertFalse for scheduler-managed,
assertTrue for non-scheduler-managed) to lock both sides of the condition;
reference the existing test methods testSchedulerManagedShowReleasesProc and
testNonSchedulerManagedShowReusesProc and the WAITING/SUCCEEDED gate names when
locating where to add the new tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 06bf8f04-6a14-439c-8283-6087d6c9f3e0
📒 Files selected for processing (2)
cuebot/src/main/java/com/imageworks/spcue/dispatcher/FrameCompleteHandler.javacuebot/src/test/java/com/imageworks/spcue/test/dispatcher/FrameCompleteHandlerTests.java
The first version wrote b_scheduler_managed via showDao.updateSchedulerManaged, which also primes ShowDaoJdbc's in-memory cache. That cache is not rolled back with the transaction, so it leaked scheduler-managed behavior into other tests (e.g. RedirectManagerTests). The control test was also nondeterministic: the depend-job reuse path can unbook the proc downstream regardless of the show flag. Stub isSchedulerManaged on a spy instead (no DB write, no cache mutation) and assert the branch via the unique unbookProc reason string, restoring the spied beans afterward. Deterministic and contamination-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ests The DispatchSupport/ShowDao beans are Spring JDK dynamic proxies (final classes), so Mockito.spy() fails with "Cannot mock/spy class $Proxy". Use mock(Interface.class, delegatesTo(realBean)) instead: it forwards all calls to the real bean except the stubbed isSchedulerManaged, while still recording invocations for verify(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Without a host on the FrameCompleteReport, report.getHost().getFreeMem() returns 0, which trips the < 512MB low-memory unbook in handlePostFrameCompleteOperations. The proc is then released at line ~454 and the method returns before reaching the WAITING/SUCCEEDED proc-reuse branch that holds the scheduler-managed gate, so the gate was never exercised. Attach an UP host with 8GB free so the flow reaches line 529. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The gate fires for FrameState.WAITING || SUCCEEDED but the tests only drove SUCCEEDED. Parameterize executeProcReuseGate on the terminal FrameState and add WAITING variants for both the scheduler-managed (release) and control (no release) cases, locking both sides of the condition. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
57ab286
into
AcademySoftwareFoundation:master
After #2323, the frame-complete proc-reuse path in FrameCompleteHandler still rebooked procs for scheduler-managed shows. That races the Rust scheduler, stranding pk_frame=NULL procs that hold reserved cores, so hosts report too few idle cores.
Gate the reuse path on showDao.isSchedulerManaged: for non-local procs on scheduler-managed shows, unbookProc immediately instead of queueing DispatchNextFrame. This frees the cores and avoids double-dispatching a frame the scheduler owns.
Add FrameCompleteHandlerTests coverage for both the scheduler-managed (proc released) and control (proc reused) cases.
LLM usage disclosure
Claude Opus was used to investigate and propose a fix for this issue
Summary by CodeRabbit
Bug Fixes
Tests