Pods evicted/drained are reported as terminated.reason=OOMKilled with exitCode 143 (no memory pressure) #18040

@adamantal

Description

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.33.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

$ kubectl version
Client Version: v1.35.0
Kustomize Version: v5.7.1
Server Version: v1.32.10

3. What cloud provider are you using?

aws

4. What commands did you run? What is the simplest way to reproduce this issue?

Reproduction scenario (simplest observed so far):

  1. Create a Kubernetes cluster on AWS using kOps with

    • Kubernetes v1.32.x (kube-proxy v1.32.10),
    • nodeTerminationHandler enabled in the kOps cluster spec (using aws-node-termination-handler),
    • Standard kubelet resource reservations (no extreme overcommit).
  2. Run a workload pod on a node, e.g.:

    • Namespace: zookeeper
    • Pod: cluster-2
    • Container: zookeeper
    • Container resources: resources.requests.memory: 512Mi, no memory limit set.
    • The pod runs for a long time (in our case since 2026-02-11T10:11:23Z).
  3. Trigger a node drain / pod eviction not caused by memory pressure, for example:

    • By draining the node (kubectl drain <node> --force --ignore-daemonsets --delete-emptydir-data), or
    • By causing the node to be terminated by AWS (e.g. via aws-node-termination-handler / ASG scale-in / maintenance),
    • Or via a rolling update that cordons & drains the node.
  4. Observe the pod status after the eviction/termination:

    • kubectl get pod -n zookeeper cluster-2 -o yaml
    • And/or watch application logs and metrics around the time of termination.

We have reproduced this multiple times by evicting pods on kOps-managed nodes in this way.
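The inspection in step 4 can be scripted; a minimal sketch, assuming the output of kubectl get pod -n zookeeper cluster-2 -o json has been captured as a string (the sample below mirrors the status observed in this report):

```python
import json

def terminated_states(pod: dict) -> list[dict]:
    """Collect the terminated state of every container in a pod,
    as reported in pod.status.containerStatuses."""
    out = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term:
            out.append({"container": cs["name"],
                        "reason": term.get("reason"),
                        "exitCode": term.get("exitCode")})
    return out

# Sample shaped like the pod status described in this issue
pod = json.loads("""
{
  "status": {
    "phase": "Failed",
    "containerStatuses": [
      {"name": "zookeeper",
       "state": {"terminated": {"reason": "OOMKilled", "exitCode": 143}}}
    ]
  }
}
""")
print(terminated_states(pod))
# → [{'container': 'zookeeper', 'reason': 'OOMKilled', 'exitCode': 143}]
```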

5. What happened after the commands executed?

For an example pod (zookeeper/cluster-2):

  • The pod is eventually marked as phase: Failed.

  • The container (zookeeper) is reported in pod.status.containerStatuses as:

    • state.terminated.reason = "OOMKilled"
    • state.terminated.exitCode = 143
    • state.terminated.startedAt = 2026-02-11T10:11:23Z
    • state.terminated.finishedAt = 2026-03-06T13:34:26Z
  • There is no corresponding memory pressure or OOM indication:

    • Prometheus (kubelet / cAdvisor) shows:
      • container_memory_working_set_bytes{namespace="zookeeper",pod="cluster-2",container="zookeeper"}
        is stable at ~350–370 MiB before termination, well below the 512Mi request.
      • container_oom_events_total{namespace="zookeeper",pod="cluster-2",container="zookeeper"} = 0
        at and after the termination time.
    • Kubernetes Events in Loki for this pod show:
      • reason=Unhealthy liveness probe failures (Zookeeper liveness script exit 1),
      • No events indicating OOM or memory pressure.
  • An observability component (Robusta) logs the pod status it receives from the API server and confirms that
    pod.status.containerStatuses[*].state.terminated.reason is "OOMKilled" with exitCode=143 for the
    terminating container at the time the alert is raised.
    So effectively, pods that are drained/evicted without any memory pressure are reported as if they
    were OOMKilled, even though the exit code 143 (SIGTERM) indicates an ordinary graceful termination.
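The reported combination is internally inconsistent on its own terms: by the usual 128+signal convention, exit code 143 means the process received SIGTERM (graceful shutdown), while a genuine kernel OOM kill delivers SIGKILL and therefore exit code 137. A small sketch of that decoding:

```python
import signal

def describe_exit(exit_code: int) -> str:
    """Decode a container exit code: codes above 128 mean the process
    died from signal (exit_code - 128), per the usual shell convention."""
    if exit_code > 128:
        return f"killed by {signal.Signals(exit_code - 128).name}"
    return f"exited with status {exit_code}"

def oomkilled_is_consistent(reason: str, exit_code: int) -> bool:
    """A genuine kernel OOM kill delivers SIGKILL, i.e. exit code 137.
    OOMKilled together with 143 (SIGTERM) is self-contradictory."""
    return reason != "OOMKilled" or exit_code == 137

print(describe_exit(143))  # → killed by SIGTERM
print(describe_exit(137))  # → killed by SIGKILL
print(oomkilled_is_consistent("OOMKilled", 143))  # → False
```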

6. What did you expect to happen?

  • For non-OOM terminations such as node drains or manual pod deletions we would expect:
    • container.state.terminated.reason not to be "OOMKilled" (e.g. "Error", "Completed", or other appropriate reasons),
    • Or the pod-level status.reason to indicate Evicted or NodeShutdown while container termination reason
      remains consistent with the signal / exit code (143 for SIGTERM, 0 for clean exit, etc.).
  • Specifically, we do not expect the OOMKilled reason for pods terminated due to node drains or
    liveness failures without any evidence of memory pressure. This misclassification is problematic
    because downstream controllers/alerting (including ours) rely on
    containerStatuses[*].state.terminated.reason to distinguish true OOM events from expected evictions.
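Until the reason is reported consistently, downstream consumers can cross-check it against corroborating evidence. A hedged sketch of such a guard (should_alert_oom is a hypothetical helper; the oom_events argument stands in for a container_oom_events_total sample around the termination time):

```python
def should_alert_oom(term: dict, oom_events: int) -> bool:
    """Workaround filter for alerting pipelines (hypothetical helper):
    treat terminated.reason == "OOMKilled" as a real OOM only when it is
    corroborated by exit code 137 (SIGKILL) or a non-zero OOM-event count."""
    if term.get("reason") != "OOMKilled":
        return False
    return term.get("exitCode") == 137 or oom_events > 0

# The zookeeper/cluster-2 case from this report: reason says OOMKilled,
# but exit code 143 and zero OOM events -> suppress the alert.
print(should_alert_oom({"reason": "OOMKilled", "exitCode": 143}, 0))  # → False
print(should_alert_oom({"reason": "OOMKilled", "exitCode": 137}, 1))  # → True
```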

7. Please provide your cluster manifest.

# REDACTED / simplified example
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: prod
spec:
  cloudProvider: aws
  kubernetesVersion: 1.32.10
  api:
    loadBalancer:
      type: Public
  nodeTerminationHandler:
    enabled: true
    cpuRequest: 200m
    enableRebalanceMonitoring: true
    enableSQSTerminationDraining: true
    managedASGTag: "aws-node-termination-handler/managed"
    prometheusEnable: true
  kubelet:
    # Typical reservations; nothing exotic
    kubeReserved:
      cpu: "1"
      memory: "2Gi"
      ephemeral-storage: "1Gi"
    systemReserved:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "1Gi"
    enforceNodeAllocatable: "pods,system-reserved,kube-reserved"
  # ... other standard kOps cluster config (subnets, IAM, etc.) ...

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

n.a.

9. Anything else we need to know?

Any guidance is appreciated 🙏
