Fix duplicate containerStatuses in Kubernetes app diagnose error output #7486

Open
turboFei wants to merge 1 commit into apache:master from turboFei:k8s_status

@turboFei turboFei commented Jun 2, 2026

Member

Why are the changes needed?

When a Spark-on-Kubernetes application fails, getApplicationStateAndErrorFromPod builds a diagnostic JSON map containing both PodStatus (the full Kubernetes PodStatus object) and ContainerStatus (the specific driver container's status extracted from pod.getStatus.getContainerStatuses).

The problem is that PodStatus already embeds containerStatuses[] internally. For a single-container pod (the typical Spark driver), PodStatus.containerStatuses[0] and the top-level ContainerStatus are identical, producing duplicated data in the error output.

Before — same container error appears twice:

{
  "PodStatus": {
    "conditions": [...],
    "containerStatuses": [{ "name": "spark-kubernetes-driver", "state": { "terminated": { "exitCode": 1, "message": "..." } } }]
  },
  "ContainerStatus":    { "name": "spark-kubernetes-driver", "state": { "terminated": { "exitCode": 1, "message": "..." } } }
}

After — clean separation of pod-level and container-level info:

{
  "PodStatus": { "conditions": [...], "phase": "Failed", "hostIP": "...", "podIP": "..." },
  "ContainerStatus": { "name": "spark-kubernetes-driver", "state": { "terminated": { "exitCode": 1, "message": "..." } } }
}
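
The before/after shapes above can be modeled with plain dictionaries. This is only a sketch, not the Kyuubi code: `build_diagnostics` and its parameters are hypothetical names, and copying the pod status before trimming it is an assumption of this example rather than something the patch is known to do. It illustrates the two branches: when a driver container status is available, `containerStatuses` is dropped from the pod status; otherwise the full pod status is emitted.

```python
import copy
import json


def build_diagnostics(pod_status, container_status=None):
    """Build the diagnostic map (hypothetical helper, for illustration only)."""
    if container_status is None:
        # POD-mode fallback: no ContainerStatus sibling, keep the full PodStatus.
        return {"PodStatus": pod_status}
    # Assumption of this sketch: trim a copy so the caller's object is untouched.
    trimmed = copy.deepcopy(pod_status)
    trimmed.pop("containerStatuses", None)
    return {"PodStatus": trimmed, "ContainerStatus": container_status}


driver = {
    "name": "spark-kubernetes-driver",
    "state": {"terminated": {"exitCode": 1, "message": "..."}},
}
pod = {"phase": "Failed", "conditions": [], "containerStatuses": [driver]}

with_container = build_diagnostics(pod, driver)  # trimmed PodStatus
fallback = build_diagnostics(pod)                # full PodStatus
print(json.dumps(with_container, indent=2))
```

In this model the container error appears exactly once in `with_container`, while `fallback` still carries it inside `PodStatus.containerStatuses`.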

How was this patch tested?

Manual verification with a failing Spark-on-Kubernetes job. The diagnostic JSON no longer contains duplicate container status data. PodStatus retains pod-level fields (conditions, phase, hostIP, podIP, qosClass) and ContainerStatus retains the full driver container state with exit code and error message.

The POD-mode fallback path (no ContainerStatus sibling) is unaffected — it continues to emit the full PodStatus including containerStatuses.
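
A manual check like the one described could be scripted roughly as follows. This is a sketch under the assumption that the diagnostic output is available as a JSON string with the `PodStatus` and `ContainerStatus` keys shown earlier; `has_duplicate_container_status` is a hypothetical helper, not part of Kyuubi.

```python
import json


def has_duplicate_container_status(diagnostics_json):
    """Return True if ContainerStatus also appears in PodStatus.containerStatuses."""
    diag = json.loads(diagnostics_json)
    container = diag.get("ContainerStatus")
    if container is None:
        # POD-mode fallback: there is no sibling to duplicate.
        return False
    embedded = (diag.get("PodStatus") or {}).get("containerStatuses") or []
    return container in embedded


driver = {"name": "spark-kubernetes-driver",
          "state": {"terminated": {"exitCode": 1, "message": "..."}}}

# Sample payloads mirroring the before/after shapes in the description.
before = json.dumps({"PodStatus": {"containerStatuses": [driver]},
                     "ContainerStatus": driver})
after = json.dumps({"PodStatus": {"phase": "Failed"},
                    "ContainerStatus": driver})
pod_mode = json.dumps({"PodStatus": {"containerStatuses": [driver]}})

print(has_duplicate_container_status(before))
print(has_duplicate_container_status(after))
print(has_duplicate_container_status(pod_mode))
```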

Related: https://github.corp.ebay.com/hadoop/kyuubi/pull/868

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-sonnet-4-6

…p diagnose error output

When building the application error diagnostic JSON, PodStatus is
serialized as the full Kubernetes PodStatus object which already
embeds containerStatuses[] internally. ContainerStatus is then
separately included as a sibling key — for a single-container pod
(e.g. Spark driver), these two fields contain identical data.

Fix: clear containerStatuses on the PodStatus object before
serializing, in the branch where ContainerStatus is separately
present. The POD-mode fallback (no ContainerStatus sibling) is
unaffected and retains the full PodStatus.

Before:
  PodStatus.containerStatuses[0] == ContainerStatus  (duplicated)

After:
  PodStatus contains only pod-level info (conditions, phase, IPs)
  ContainerStatus contains the full driver container state
@turboFei turboFei changed the title [KYUUBI #7441] Fix duplicate containerStatuses in Kubernetes app diagnose error output Fix duplicate containerStatuses in Kubernetes app diagnose error output Jun 2, 2026
@turboFei turboFei changed the title Fix duplicate containerStatuses in Kubernetes app diagnose error output [KYUUBI #7486] Fix duplicate containerStatuses in Kubernetes app diagnose error output Jun 2, 2026
@turboFei turboFei changed the title [KYUUBI #7486] Fix duplicate containerStatuses in Kubernetes app diagnose error output Fix duplicate containerStatuses in Kubernetes app diagnose error output Jun 2, 2026
@turboFei turboFei requested a review from pan3793 June 6, 2026 05:22