Skip to content

feat: Add completing status#423

Open
L-Xiafeng wants to merge 2 commits intomasterfrom
dev/completing
Open

feat: Add completing status#423
L-Xiafeng wants to merge 2 commits intomasterfrom
dev/completing

Conversation

@L-Xiafeng
Copy link
Copy Markdown
Collaborator

@L-Xiafeng L-Xiafeng commented Apr 4, 2026

Summary by CodeRabbit

  • New Features

    • Step status updates can include a final job status for clearer termination reporting.
    • Supervisor initialization can enable a SLURM-compatible environment option.
    • Runtime now exposes a "completing" node list to make step lifecycle phases more visible.
    • Terminate-step requests no longer support marking steps as "orphaned".
  • Refactor

    • Removed the legacy orphaned-step termination endpoint.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 81878eaa-8188-46a4-a30a-64cc779a955b

📥 Commits

Reviewing files that changed from the base of the PR and between 02c0ba0 and 578b902.

📒 Files selected for processing (3)
  • protos/Crane.proto
  • protos/PublicDefs.proto
  • protos/Supervisor.proto
🚧 Files skipped from review as they are similar to previous changes (2)
  • protos/Supervisor.proto
  • protos/Crane.proto

📝 Walkthrough

Walkthrough

Added and removed protobuf declarations: StepStatusChangeRequest.final_status and RuntimeAttrOfStep.completing_nodes were added; InitSupervisorRequest.enable_slurm_compatible_env was added; TerminateStepRequest.mark_orphaned was reserved/removed; Craned.TerminateOrphanedStep and its request/reply messages were removed.

Changes

Cohort / File(s) Summary
Craned protos
protos/Crane.proto
Added optional JobStatus final_status = 8 to StepStatusChangeRequest. Removed rpc TerminateOrphanedStep(...) from service Craned and deleted TerminateOrphanedStepRequest / TerminateOrphanedStepReply.
Supervisor protos
protos/Supervisor.proto
Added bool enable_slurm_compatible_env = 31 to InitSupervisorRequest. Removed mark_orphaned from TerminateStepRequest by reserving tag 1; retained terminated_by_user = 2.
PublicDefs runtime attrs
protos/PublicDefs.proto
Added repeated string completing_nodes = 19 to RuntimeAttrOfStep.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • RileyWen
  • Nativu5

Poem

🐰 I nibble proto lines at night,

New final flags gleam in the light,
One RPC hops off the trail,
Completing nodes leave a trail,
I thump my foot — deploy takes flight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add completing status' is directly related to the main change: adding a 'completing_nodes' field and associated completing status support across the protobuf definitions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev/completing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
protos/Supervisor.proto (1)

110-110: Remove trailing semicolon after message closing brace for consistency.

The semicolon after } on line 110 is valid proto3 syntax but inconsistent with other message definitions in this file (e.g., BindFsConfig, SubIdConfig, DnsConfig on lines 54, 61, 69).

Suggested fix
-  };
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@protos/Supervisor.proto` at line 110, Remove the trailing semicolon
immediately after the closing brace of the message definition (the stray "};")
so the message ends with "}" to match the style used by BindFsConfig,
SubIdConfig, DnsConfig; locate the message whose closing brace is followed by a
semicolon and delete that semicolon to make the file consistent with other proto
message definitions.
protos/Crane.proto (1)

28-37: Consider adding a comment to clarify the distinction between new_status and final_status.

The message now has two JobStatus fields: new_status (field 4) and final_status (field 8). The semantic difference between these fields isn't immediately clear from the proto definition alone. Based on the PR title "Add completing status," final_status likely represents the terminal state after the step completes, while new_status represents the current transition.

A brief comment would improve maintainability for future contributors.

Suggested documentation
   string reason = 6;
   google.protobuf.Timestamp timestamp = 7;
+  // The terminal status of the step, if it has reached a final state.
+  // Distinct from new_status which represents the current state transition.
   optional JobStatus final_status = 8;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@protos/Crane.proto` around lines 28 - 37, The proto message
StepStatusChangeRequest defines two JobStatus fields (new_status and
final_status) whose roles are ambiguous; add a concise comment above the fields
in the StepStatusChangeRequest message explaining that new_status is the
immediate status transition for the step (the in-flight/current update) while
final_status (optional) denotes the terminal/completing state after the step
finishes, and indicate when final_status is expected to be set (e.g., only for
completion events) so future readers and code generators understand their
semantic difference.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@protos/Crane.proto`:
- Around line 28-37: The proto message StepStatusChangeRequest defines two
JobStatus fields (new_status and final_status) whose roles are ambiguous; add a
concise comment above the fields in the StepStatusChangeRequest message
explaining that new_status is the immediate status transition for the step (the
in-flight/current update) while final_status (optional) denotes the
terminal/completing state after the step finishes, and indicate when
final_status is expected to be set (e.g., only for completion events) so future
readers and code generators understand their semantic difference.

In `@protos/Supervisor.proto`:
- Line 110: Remove the trailing semicolon immediately after the closing brace of
the message definition (the stray "};") so the message ends with "}" to match
the style used by BindFsConfig, SubIdConfig, DnsConfig; locate the message whose
closing brace is followed by a semicolon and delete that semicolon to make the
file consistent with other proto message definitions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43ff6cae-fd33-47b2-807b-d15b07f5e89f

📥 Commits

Reviewing files that changed from the base of the PR and between 6a63477 and ebc18b4.

📒 Files selected for processing (2)
  • protos/Crane.proto
  • protos/Supervisor.proto

@L-Xiafeng L-Xiafeng marked this pull request as draft April 4, 2026 06:34
@L-Xiafeng L-Xiafeng added the enhancement New feature or request label Apr 8, 2026
@L-Xiafeng L-Xiafeng marked this pull request as ready for review April 8, 2026 15:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
protos/Supervisor.proto (1)

153-154: Reserve the removed field name, not only its tag.

Line 153 reserves field number 1, but the deleted name can still be reused accidentally later. In protobuf, reserve both the tag and the field name for safer long-term schema evolution.

Suggested protobuf compatibility tweak
 message TerminateStepRequest {
   reserved 1;  // was mark_orphaned
+  reserved "mark_orphaned";
   bool terminated_by_user = 2;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@protos/Supervisor.proto` around lines 153 - 154, The proto currently only
reserves the numeric tag (reserved 1) but not the removed field name, so add a
reservation for the original field name (mark_orphaned) in Supervisor.proto
alongside the existing numeric reservation; update the reserved statement to
include both the tag and the name (e.g., reserve 1 and "mark_orphaned") so the
removed name cannot be re-used later and leave the new field terminated_by_user
= 2 unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@protos/Crane.proto`:
- Line 36: The proto uses proto3 optional (the field "optional JobStatus
final_status = 8" in Crane.proto) which requires protoc >= 3.12; update the repo
to enforce that by either documenting a minimum protoc version in the README
(e.g., "protoc >= 3.12") and |or| adding a build/CI validation that runs "protoc
--version" and fails if the version is older than 3.12; modify the Makefile or
CI script that invokes protoc so it calls the validation step before codegen to
ensure reproducible builds.

---

Nitpick comments:
In `@protos/Supervisor.proto`:
- Around line 153-154: The proto currently only reserves the numeric tag
(reserved 1) but not the removed field name, so add a reservation for the
original field name (mark_orphaned) in Supervisor.proto alongside the existing
numeric reservation; update the reserved statement to include both the tag and
the name (e.g., reserve 1 and "mark_orphaned") so the removed name cannot be
re-used later and leave the new field terminated_by_user = 2 unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5fa3ca93-2916-4d6f-b0a4-9f23de1cfb9e

📥 Commits

Reviewing files that changed from the base of the PR and between 5425516 and 39d32d7.

📒 Files selected for processing (3)
  • protos/Crane.proto
  • protos/PublicDefs.proto
  • protos/Supervisor.proto
🚧 Files skipped from review as they are similar to previous changes (1)
  • protos/PublicDefs.proto

Comment thread protos/Crane.proto
uint32 exit_code = 5;
string reason = 6;
google.protobuf.Timestamp timestamp = 7;
optional JobStatus final_status = 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only check: locate protobuf/protoc version pins in common build/config files.
fd -HI 'go\.mod|go\.sum|package\.json|pnpm-lock\.yaml|yarn\.lock|buf\.ya?ml|Dockerfile|Makefile|CMakeLists\.txt' \
| xargs -r rg -n 'protoc|protobuf|google\.golang\.org/protobuf|github\.com/golang/protobuf|buf'

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 1521


🏁 Script executed:

# First, let's verify the actual content of Crane.proto at line 36
sed -n '30,42p' protos/Crane.proto

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 323


🏁 Script executed:

# Search for CI configuration files that might specify protoc versions
fd -HI '\.github|\.gitlab-ci|\.circleci|\.travis|azure-pipelines' -t f | head -20

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


🏁 Script executed:

# Look for Dockerfile or build scripts that might pin protoc
fd -HI 'Dockerfile|\.dockerfile|build\.sh|install\.sh' -t f

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


🏁 Script executed:

# Check if there's a buf.lock or other version lock files
find . -name 'buf.lock' -o -name 'protoc-version' -o -name '.protoc-version' 2>/dev/null

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


🏁 Script executed:

# Check README or documentation for protoc version requirements
find . -name 'README*' -o -name 'BUILD*' -o -name 'INSTALL*' | xargs cat 2>/dev/null | grep -i 'protoc\|protobuf' -A 2 -B 2

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


🏁 Script executed:

# Search for go:generate directives that might document protoc usage
rg 'go:generate.*protoc' -A 2

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


🏁 Script executed:

# Check if there's a .protoc-version or similar version file
rg 'syntax.*proto3' protos/ | head -5

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 232


🏁 Script executed:

# Look for any setup or installation scripts
fd -HI 'setup|install|bootstrap|requirements' -t f | xargs rg -l 'protoc' 2>/dev/null

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 188


🏁 Script executed:

# Check grpc.go for any protoc version information
cat -n internal/util/grpc.go | head -40

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 1480


🏁 Script executed:

# Check cmd.go for proto generation details
cat -n internal/ccon/cmd.go | grep -i 'proto\|buf\|grpc' -B 3 -A 3

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 1114


🏁 Script executed:

# Check the actual Makefile protoc invocation more carefully
sed -n '105,115p' Makefile

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 627


🏁 Script executed:

# Check if go.mod has any build directives or module info about protoc
head -30 go.mod

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 977


🏁 Script executed:

# Search for any documentation or comments about protobuf version requirements
rg 'protobuf|protoc' --type markdown --type txt --type rst 2>/dev/null | head -20

Repository: PKUHPC/CraneSched-FrontEnd

Length of output: 52


Document minimum protoc version or add version validation to the build.

Proto3 optional (used on line 36) is supported by all pinned Go protobuf libraries (v1.36.5, v1.5.4, v1.3.2), which are modern versions from 2021–2024. However, the Makefile invokes protoc without an explicit version constraint. While proto3 optional has been widely supported since protoc 3.12 (May 2020), for reproducibility and to prevent potential issues from older system-installed protoc versions, add a documented minimum protoc version requirement or CI validation to ensure protoc ≥ 3.12 is used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@protos/Crane.proto` at line 36, The proto uses proto3 optional (the field
"optional JobStatus final_status = 8" in Crane.proto) which requires protoc >=
3.12; update the repo to enforce that by either documenting a minimum protoc
version in the README (e.g., "protoc >= 3.12") and |or| adding a build/CI
validation that runs "protoc --version" and fails if the version is older than
3.12; modify the Makefile or CI script that invokes protoc so it calls the
validation step before codegen to ensure reproducible builds.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
protos/Crane.proto (1)

36-36: Document when final_status is valid.

This adds an important API invariant, but the proto contract does not say when final_status must be populated or which values are allowed. Please add a field comment clarifying its relationship to new_status—for example, whether it is only set while reporting a COMPLETING transition and must be a terminal status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@protos/Crane.proto` at line 36, Add a proto field comment for optional
JobStatus final_status = 8 that documents when it must be populated and its
allowed values: state that final_status is only set during a status transition
where new_status indicates the job is COMMITTING/COMPLETING (or other defined
"finishing" transition), that final_status must be a terminal JobStatus (e.g.,
SUCCEEDED, FAILED, CANCELLED) and that it represents the intended terminal state
after the transition; also clarify its relationship to new_status (new_status is
the immediate transition state, final_status is the eventual terminal state) and
whether it may be omitted for non-terminal transitions or when unknown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@protos/Crane.proto`:
- Line 36: Add a proto field comment for optional JobStatus final_status = 8
that documents when it must be populated and its allowed values: state that
final_status is only set during a status transition where new_status indicates
the job is COMMITTING/COMPLETING (or other defined "finishing" transition), that
final_status must be a terminal JobStatus (e.g., SUCCEEDED, FAILED, CANCELLED)
and that it represents the intended terminal state after the transition; also
clarify its relationship to new_status (new_status is the immediate transition
state, final_status is the eventual terminal state) and whether it may be
omitted for non-terminal transitions or when unknown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90db677d-dee7-4c68-980f-065a20589d4c

📥 Commits

Reviewing files that changed from the base of the PR and between 39d32d7 and 02c0ba0.

📒 Files selected for processing (3)
  • protos/Crane.proto
  • protos/PublicDefs.proto
  • protos/Supervisor.proto
🚧 Files skipped from review as they are similar to previous changes (2)
  • protos/PublicDefs.proto
  • protos/Supervisor.proto

Copilot AI review requested due to automatic review settings April 16, 2026 05:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the gRPC/protobuf contracts to expose a clearer “completing” lifecycle phase, allow step status updates to optionally include a final job status, add a SLURM-compatible environment toggle during supervisor init, and remove the orphaned-step termination API surface.

Changes:

  • Add enable_slurm_compatible_env to InitSupervisorRequest and remove mark_orphaned from TerminateStepRequest.
  • Expose completing_nodes in RuntimeAttrOfStep.
  • Add final_status to StepStatusChangeRequest and remove the legacy TerminateOrphanedStep* messages/RPC.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
protos/Supervisor.proto Adds SLURM env toggle; removes orphaned-step marking via reserved field.
protos/PublicDefs.proto Adds completing_nodes to step runtime attributes to reflect lifecycle phase visibility.
protos/Crane.proto Extends step status change payload; removes orphaned-step termination RPC/messages.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread protos/Crane.proto
Comment on lines 148 to 157

message TerminateStepsRequest {
map<uint32, JobStepIds> job_step_ids_map = 1;
}

message TerminateStepsReply {
bool ok = 1;
string reason = 2;
}

Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the TerminateOrphanedStepRequest/Reply message types is a breaking change for any consumers that still compile against this proto (including mixed-version deployments). If backwards compatibility/rolling upgrades are a requirement, consider keeping these messages (and the RPC) marked as deprecated for at least one release, or providing an explicit compatibility plan/version gate.

Copilot uses AI. Check for mistakes.
Comment thread protos/Crane.proto
@@ -1423,7 +1416,6 @@
If the job is a batch job, just kill it.
*/
rpc TerminateSteps(TerminateStepsRequest) returns (TerminateStepsReply);
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping the TerminateOrphanedStep RPC from the Craned service changes the gRPC surface area and can break mixed-version clusters (older clients calling this will start receiving UNIMPLEMENTED). If rolling upgrades are supported, consider leaving the RPC in place as deprecated and returning a clear error until all components are upgraded.

Suggested change
rpc TerminateSteps(TerminateStepsRequest) returns (TerminateStepsReply);
rpc TerminateSteps(TerminateStepsRequest) returns (TerminateStepsReply);
/*
Deprecated compatibility RPC retained to avoid breaking mixed-version clusters
during rolling upgrades. Implementations should return a clear deprecation error
until all components are upgraded.
*/
rpc TerminateOrphanedStep(google.protobuf.Empty) returns (google.protobuf.Empty);

Copilot uses AI. Check for mistakes.
Comment thread protos/Supervisor.proto

message TerminateStepRequest {
bool mark_orphaned = 1;
reserved 1; // was mark_orphaned
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When removing a proto field, reserving only the tag number still allows the old field name to be reintroduced with a different tag in the future, which can create confusing JSON/proto mappings. Consider also reserving the field name (e.g., reserving "mark_orphaned") alongside tag 1 to fully prevent reuse.

Suggested change
reserved 1; // was mark_orphaned
reserved 1; // was mark_orphaned
reserved "mark_orphaned";

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants