Skip to content

Fix kubernetes container state#7214

Closed
moelhoussein wants to merge 6 commits into
apache:masterfrom
moelhoussein:fix-kubernetes-container-state
Closed

Fix kubernetes container state#7214
moelhoussein wants to merge 6 commits into
apache:masterfrom
moelhoussein:fix-kubernetes-container-state

Conversation

@moelhoussein

Copy link
Copy Markdown

Why are the changes needed?

This PR fixes #7195 where if kyuubi.kubernetes.application.state.source is CONTAINER and kubernetes fails to pull the image, or the image name is not valid, or any failure ocurs, kyuubi marks the application as pending, forever.

How was this patch tested?

  • Added unit tests in KubernetesApplicationOperationSuite:
    To run the targeted suite:
    ./build/mvn -pl kyuubi-server -DskipITs -Dtest=org.apache.kyuubi.engine.KubernetesApplicationOperationSuite test

Was this patch authored or co-authored using generative AI tooling?

No

MElHoussein added 3 commits September 22, 2025 21:32
…AILED; handle empty reason as PENDING; unify constant as PENDING_WAITING_REASONS; add buildSparkAppUrl overload; restore POD IP URL test; add tests for failure waiting reasons; revert .idea/vcs.xml

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where Kubernetes applications in waiting state due to container failures (like image pull errors) were incorrectly marked as PENDING forever instead of FAILED. The fix introduces proper state mapping based on container waiting reasons.

  • Adds logic to distinguish between valid pending reasons and failure reasons in container waiting states
  • Updates container state mapping to mark applications as FAILED when appropriate error conditions occur
  • Includes comprehensive unit tests for the new container state logic

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
KubernetesApplicationOperation.scala Implements proper container state mapping logic with failure detection
KubernetesApplicationOperationSuite.scala Adds unit tests covering pending and failure waiting reasons

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@turboFei turboFei left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could you fix the code style with dev/reformat?

@codecov-commenter

codecov-commenter commented Sep 28, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (2431435) to head (4f83667).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
...kyuubi/engine/KubernetesApplicationOperation.scala 0.00% 3 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7214   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         695     696    +1     
  Lines       43479   43546   +67     
  Branches     5882    5892   +10     
======================================
- Misses      43479   43546   +67     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@moelhoussein

Copy link
Copy Markdown
Author

LGTM, could you fix the code style with dev/reformat?

@turboFei done.

This reverts commit e869a46.
@turboFei turboFei closed this in b5d7f58 Sep 29, 2025
turboFei added a commit that referenced this pull request Sep 29, 2025
### Why are the changes needed?

This PR fixes #7195 where if `kyuubi.kubernetes.application.state.source` is `CONTAINER` and kubernetes fails to pull the image, or the image name is not valid, or any failure ocurs, kyuubi marks the application as pending, forever.

### How was this patch tested?

- Added unit tests in KubernetesApplicationOperationSuite:
To run the targeted suite:
`./build/mvn -pl kyuubi-server -DskipITs -Dtest=org.apache.kyuubi.engine.KubernetesApplicationOperationSuite test`

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #7214 from moelhoussein/fix-kubernetes-container-state.

Closes #7214

4f83667 [Wang, Fei] Revert "reformatted"
e869a46 [MElHoussein] reformatted
8ede9aa [Wang, Fei] code style
d3b9cd2 [MElHoussein] revert unrelated test/url overload changes; keep only container waiting-state logic change
7d346fc [MElHoussein] engine(k8s): treat only specific waiting reasons as PENDING; others FAILED; handle empty reason as PENDING; unify constant as PENDING_WAITING_REASONS; add buildSparkAppUrl overload; restore POD IP URL test; add tests for failure waiting reasons; revert .idea/vcs.xml
ca94d64 [MElHoussein] Fixing container state

Lead-authored-by: MElHoussein <MElHoussein@geico.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
(cherry picked from commit b5d7f58)
Signed-off-by: Wang, Fei <fwang12@ebay.com>
@turboFei

turboFei commented Sep 29, 2025

Copy link
Copy Markdown
Member

thanks, merged to master(1.11.0) and branch-1.10(1.10.3)

@turboFei turboFei added this to the v1.10.3 milestone Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Kyuubi POD pending forever when kyuubi.kubernetes.application.state.source=Container and imageName is not valid.

4 participants