Is your feature request related to a problem? Please describe.
Yes. The EFS CSI driver currently bundles efs-utils.conf in the container image with no way to override specific settings without replacing the entire configuration file. This creates a maintenance burden when operators need to tune specific parameters for production reliability.
We encountered a race condition on Bottlerocket nodes where TLS certificate rotation (every 60 minutes via tls_cert_renewal_interval_min = 60) kills stunnel (or efs-proxy) during mass pod evictions. When this coincides with 60+ pods being terminated, the watchdog's health check interval of 5 minutes (stunnel_health_check_interval_min = 5) is too slow to detect and restart the dead stunnel process. This causes mount operations to fill all --max-inflight-mount-calls slots with stuck goroutines, deadlocking the node.
Describe the solution you'd like in detail
Add support for overriding specific efs-utils.conf settings via environment variables or Helm values, so operators can tune parameters without owning the entire config file.
Option 1: Environment Variables
Option 2: Helm Values
The driver's entrypoint would merge these overrides with the bundled efs-utils.conf, preserving upstream defaults for settings not explicitly overridden. This allows critical production tuning while automatically receiving updates from new efs-utils versions (new regions, security patches, feature flags).
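A minimal sketch of how such a merge could work, assuming the overrides arrive as environment variables. The `EFS_UTILS_CONF_` prefix, the `SECTION__OPTION` naming scheme, and the `apply_overrides` helper are all hypothetical and not part of the driver or efs-utils today. The sample config is a simplified stand-in for the bundled file.

```python
import configparser
import io

# Hypothetical prefix: this issue does not define variable names.
ENV_PREFIX = "EFS_UTILS_CONF_"


def apply_overrides(bundled_conf: str, environ: dict) -> str:
    """Return bundled_conf with env-var overrides applied on top.

    Assumed naming scheme: EFS_UTILS_CONF_<SECTION>__<OPTION>=value, where
    underscores in <SECTION> map to hyphens (MOUNT_WATCHDOG -> mount-watchdog).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(bundled_conf)

    for name, value in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        section, sep, option = name[len(ENV_PREFIX):].partition("__")
        if not sep or not section or not option:
            continue  # not in SECTION__OPTION form, ignore it
        section = section.lower().replace("_", "-")
        option = option.lower()
        if not parser.has_section(section):
            parser.add_section(section)
        # Only the named option changes; every other bundled value is kept.
        parser.set(section, option, value)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Simplified stand-in for the bundled config file.
bundled = """\
[mount-watchdog]
stunnel_health_check_interval_min = 5
tls_cert_renewal_interval_min = 60
"""

merged = apply_overrides(
    bundled,
    {"EFS_UTILS_CONF_MOUNT_WATCHDOG__STUNNEL_HEALTH_CHECK_INTERVAL_MIN": "1"},
)
print(merged)
```

Because the merge starts from the bundled file on every container start, settings that are not overridden keep tracking upstream defaults. Two limitations of this sketch: `configparser` drops comments when it rewrites the file, and the naming scheme cannot address section names that contain dots.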
Describe alternatives you've considered
Own the entire efs-utils.conf via ConfigMap - This works but means we miss automatic updates when new AWS regions are added, security patches are applied, or new configuration options are introduced in efs-utils updates.
Init container to copy and modify config - Similar to the ConfigMap alternative above; it still requires maintaining a full copy of the config.
Fork and maintain a custom image - Too much overhead for what should be a simple configuration override.
All alternatives require tracking upstream changes manually and rebasing our config file with each efs-utils release.
Additional context
Our specific fix sets stunnel_health_check_interval_min = 1 (down from the default of 5) so that a dead stunnel process is detected within about 60 seconds instead of 5 minutes.
Combined with --force-unmount-after-timeout=true and --max-inflight-mount-calls=30, this reduces the chance that the node deadlocks during TLS cert rotation.
This issue is more prevalent on Bottlerocket (read-only root FS) than AL2, making configuration flexibility even more important