Add support for configuring efs-utils.conf settings via environment variables or Helm values

**Is your feature request related to a problem? Please describe.**

Yes. The EFS CSI driver currently bundles efs-utils.conf in the container image with no way to override specific settings without replacing the entire configuration file. This creates a maintenance burden when operators need to tune specific parameters for production reliability.

We encountered a race condition on Bottlerocket nodes where TLS certificate rotation (every 60 minutes via `tls_cert_renewal_interval_min = 60`) kills stunnel (or `efs-proxy`) during mass pod evictions. When this coincides with 60+ pods being terminated, the watchdog's health check interval of 5 minutes (`stunnel_health_check_interval_min = 5`) is too slow to detect and restart the dead stunnel process. This causes mount operations to fill all `--max-inflight-mount-calls` slots with stuck goroutines, deadlocking the node.

**Describe the solution you'd like in detail**

Add support for overriding specific efs-utils.conf settings via environment variables or Helm values, so operators can tune parameters without owning the entire config file.

Option 1: Environment Variables

Option 2: Helm Values

The driver's entrypoint would merge these overrides with the bundled efs-utils.conf, preserving upstream defaults for settings not explicitly overridden. This allows critical production tuning while automatically receiving updates from new efs-utils versions (new regions, security patches, feature flags).
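To illustrate the merge behavior described above, here is a minimal sketch in Python, not the driver's actual implementation. It assumes a hypothetical `EFS_UTILS_CONF_OVERRIDES` environment variable holding `section/key=value` entries, and it assumes the two watchdog settings live in a `[mount-watchdog]` section. The bundled config shown is a trimmed stand-in, not the real file.

```python
import configparser
import io

# Trimmed stand-in for the bundled efs-utils.conf (section names are assumptions).
BUNDLED_CONF = """\
[mount]
stunnel_debug_enabled = false

[mount-watchdog]
tls_cert_renewal_interval_min = 60
stunnel_health_check_interval_min = 5
"""


def parse_overrides(raw):
    """Parse 'section/key=value' entries separated by commas or newlines.

    The format is hypothetical; it is only one way an env var could carry overrides.
    """
    overrides = {}
    for entry in raw.replace("\n", ",").split(","):
        entry = entry.strip()
        if not entry:
            continue
        target, eq, value = entry.partition("=")
        section, slash, key = target.strip().partition("/")
        if not eq or not slash or not section or not key.strip():
            raise ValueError(f"malformed override: {entry!r}")
        overrides[(section, key.strip())] = value.strip()
    return overrides


def merge_conf(bundled_text, overrides):
    """Apply overrides on top of the bundled config, keeping every other default."""
    # interpolation=None so values containing '%' are stored literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(bundled_text)
    for (section, key), value in overrides.items():
        # DEFAULT always exists implicitly and cannot be added as a section.
        if section != parser.default_section and not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)  # note: comments from the bundled file are not preserved
    return out.getvalue()
```

An entrypoint could then call `merge_conf(bundled_text, parse_overrides(os.environ.get("EFS_UTILS_CONF_OVERRIDES", "")))` and write the result to the config path before starting the driver. A Helm value could render into the same variable, so both options would share one merge path.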

**Describe alternatives you've considered**

Own the entire efs-utils.conf via ConfigMap - This works but means we miss automatic updates when new AWS regions are added, security patches are applied, or new configuration options are introduced in efs-utils updates.

Init container to copy and modify config - Similar to the ConfigMap approach above, this requires maintaining a full copy of the config.

Fork and maintain a custom image - Too much overhead for what should be a simple configuration override.

All alternatives require tracking upstream changes manually and rebasing our config file with each efs-utils release.

**Additional context**

- Our specific fix includes `stunnel_health_check_interval_min = 1` (down from the default of 5) to detect a dead stunnel process within 60 seconds instead of 5 minutes
- Combined with `--force-unmount-after-timeout=true` and `--max-inflight-mount-calls=30`, this reduces the chance the node will deadlock during TLS cert rotation
- This issue is more prevalent on Bottlerocket (read-only root FS) than AL2, making configuration flexibility even more important
- Related to #1821 about efs-utils.conf not updating during Helm upgrades


Add support for configuring efs-utils.conf settings via environment variables or Helm values #1833

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add support for configuring efs-utils.conf settings via environment variables or Helm values #1833

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions