Description
What problem are you trying to solve?
We're running Bottlerocket nodes with multiple EBS volumes that are aggregated via RAID0 (using a bootstrap container). Karpenter's ephemeralStorage() function returns only a single block device size on nodeclaims, not reflecting the actual storage available to pods after our userdata aggregates the volumes.
Code reference: karpenter-provider-aws/pkg/providers/instancetype/types.go, lines 358 to 392 at 10c4ef3:
```go
func ephemeralStorage(info ec2types.InstanceTypeInfo, amiFamily amifamily.AMIFamily, blockDeviceMappings []*v1.BlockDeviceMapping, instanceStorePolicy *v1.InstanceStorePolicy) *resource.Quantity {
	// If local store disks have been configured for node ephemeral-storage, use the total size of the disks.
	if lo.FromPtr(instanceStorePolicy) == v1.InstanceStorePolicyRAID0 {
		if info.InstanceStorageInfo != nil && info.InstanceStorageInfo.TotalSizeInGB != nil {
			return resources.Quantity(fmt.Sprintf("%dG", *info.InstanceStorageInfo.TotalSizeInGB))
		}
	}
	if len(blockDeviceMappings) != 0 {
		// First check if there's a root volume configured in blockDeviceMappings.
		if blockDeviceMapping, ok := lo.Find(blockDeviceMappings, func(bdm *v1.BlockDeviceMapping) bool {
			return bdm.RootVolume
		}); ok && blockDeviceMapping.EBS.VolumeSize != nil {
			return blockDeviceMapping.EBS.VolumeSize
		}
		switch amiFamily.(type) {
		case *amifamily.Custom:
			// We can't know if a custom AMI is going to have a volume size.
			volumeSize := blockDeviceMappings[len(blockDeviceMappings)-1].EBS.VolumeSize
			return lo.Ternary(volumeSize != nil, volumeSize, amifamily.DefaultEBS.VolumeSize)
		default:
			// If a block device mapping exists in the provider for the root volume, use the volume size specified in the provider. If not, use the default
			if blockDeviceMapping, ok := lo.Find(blockDeviceMappings, func(bdm *v1.BlockDeviceMapping) bool {
				return *bdm.DeviceName == *amiFamily.EphemeralBlockDevice()
			}); ok && blockDeviceMapping.EBS.VolumeSize != nil {
				return blockDeviceMapping.EBS.VolumeSize
			}
		}
	}
	// Return the ephemeralBlockDevice size if defined in ami
	if ephemeralBlockDevice, ok := lo.Find(amiFamily.DefaultBlockDeviceMappings(), func(item *v1.BlockDeviceMapping) bool {
		return *amiFamily.EphemeralBlockDevice() == *item.DeviceName
	}); ok {
		return ephemeralBlockDevice.EBS.VolumeSize
	}
	return amifamily.DefaultEBS.VolumeSize
}
```
What the Bootstrap Container Does:
- Detects all non-root EBS volumes: /dev/xvdb (18Gi), /dev/xvde (20Gi), /dev/xvdf (20Gi), /dev/xvdg (20Gi)
- Creates a RAID0 array from all 4 volumes
- Mounts the combined storage (78Gi total) at /data
- Bind-mounts /var/lib/kubelet and /var/lib/containerd onto the RAID array
The Gap:
- Karpenter calculates for the nodeclaim: 18Gi (only /dev/xvdb, the EphemeralBlockDevice)
- Actual available storage on the node: 78Gi (RAID0 of all 4 non-root volumes)

Bin-packing decisions are based on incorrect capacity, causing premature scaling.
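To make the gap concrete, here is a stdlib-only Go sketch of the lookup behavior described above. It is a simplification for illustration: the real function uses `resource.Quantity` and samber/lo, and the `mapping` type and `ephemeralStorageGi` helper are stand-ins invented for this example. The device names and sizes come from the example EC2NodeClass.

```go
package main

import "fmt"

// mapping is a simplified stand-in for v1.BlockDeviceMapping.
type mapping struct {
	deviceName string
	sizeGi     int64
	rootVolume bool
}

// ephemeralStorageGi mirrors the relevant branch of ephemeralStorage(): with no
// mapping flagged RootVolume and a non-custom AMI family, it returns the size of
// the mapping whose device name matches the AMI family's ephemeral block device.
func ephemeralStorageGi(mappings []mapping, ephemeralDevice string) int64 {
	// First check if there's a root volume configured in the mappings.
	for _, m := range mappings {
		if m.rootVolume {
			return m.sizeGi
		}
	}
	// Otherwise fall back to the ephemeral block device mapping.
	for _, m := range mappings {
		if m.deviceName == ephemeralDevice {
			return m.sizeGi
		}
	}
	return 20 // stand-in for amifamily.DefaultEBS.VolumeSize
}

func main() {
	mappings := []mapping{
		{"/dev/xvda", 2, false},  // root OS volume (rootVolume flag not set in the spec)
		{"/dev/xvdb", 18, false}, // Bottlerocket's ephemeral block device
		{"/dev/xvde", 20, false},
		{"/dev/xvdf", 20, false},
		{"/dev/xvdg", 20, false},
	}
	// Bottlerocket's EphemeralBlockDevice() is /dev/xvdb.
	fmt.Println(ephemeralStorageGi(mappings, "/dev/xvdb")) // 18, not the 78 actually available
}
```

Only /dev/xvdb's 18Gi ever reaches the nodeclaim; the other three volumes are invisible to the capacity calculation.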
Example EC2NodeClass configuration (Bottlerocket):
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
spec:
  amiFamily: Bottlerocket
  blockDeviceMappings:
    - deviceName: /dev/xvda # Root volume: 2Gi (OS only)
      ebs:
        volumeSize: 2Gi
        volumeType: gp3
        encrypted: true
    - deviceName: /dev/xvdb # EphemeralBlockDevice: 18Gi
      ebs:
        volumeSize: 18Gi
        volumeType: gp3
        encrypted: true
    - deviceName: /dev/xvde # Additional: 20Gi
      ebs:
        volumeSize: 20Gi
        volumeType: gp3
        encrypted: true
    - deviceName: /dev/xvdf # Additional: 20Gi
      ebs:
        volumeSize: 20Gi
        volumeType: gp3
        encrypted: true
    - deviceName: /dev/xvdg # Additional: 20Gi
      ebs:
        volumeSize: 20Gi
        volumeType: gp3
        encrypted: true
```
Question:
How should we handle scenarios where userdata/bootstrap scripts aggregate multiple volumes?
- Annotation-based override, e.g. `karpenter.sh/ephemeral-storage-override: "78Gi"`
- New field in EC2NodeClass, e.g. `spec.ephemeralStorageCapacity: 78Gi`
- Label-based volume selection: mark which volumes contribute to ephemeral storage
- Sum all non-root volumes
What's the recommended approach for this use case?
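For illustration, the last option (summing all non-root volumes) could be sketched as below. This is a hypothetical approach, not Karpenter's current behavior; `sumNonRootGi` and the `mapping` type are assumptions made for the example, standing in for `v1.BlockDeviceMapping` and `resource.Quantity` arithmetic.

```go
package main

import "fmt"

// mapping is a simplified stand-in for v1.BlockDeviceMapping.
type mapping struct {
	deviceName string
	sizeGi     int64
}

// sumNonRootGi totals every block device mapping except the root device,
// matching what a RAID0 bootstrap over the non-root volumes actually
// exposes to kubelet and containerd.
func sumNonRootGi(mappings []mapping, rootDevice string) int64 {
	var total int64
	for _, m := range mappings {
		if m.deviceName == rootDevice {
			continue // the OS volume does not back /var/lib/kubelet
		}
		total += m.sizeGi
	}
	return total
}

func main() {
	mappings := []mapping{
		{"/dev/xvda", 2}, // root
		{"/dev/xvdb", 18},
		{"/dev/xvde", 20},
		{"/dev/xvdf", 20},
		{"/dev/xvdg", 20},
	}
	fmt.Println(sumNonRootGi(mappings, "/dev/xvda")) // 78
}
```

This would match the example configuration (18 + 20 + 20 + 20 = 78Gi), but only when userdata really does aggregate every non-root volume, which is why an explicit override or opt-in may be safer.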
How important is this feature to you?
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment