Is your feature request related to a problem? Please describe.
When using Kafka as the EventBus, the sensor consumer batches messages with a hardcoded 1-second timeout (Batch(100, 1*time.Second, ...)). This introduces significant latency between an event being published and a trigger firing — up to 1 second per internal topic hop (event → trigger → action), resulting in 3-5 seconds of total latency even for a single event at low volume.
In our environment (3-broker Strimzi cluster across 3 GCP zones, mTLS, min.insync.replicas=2), we observed ~4.0 seconds between eventsource.publish and sensor.trigger through distributed tracing. The batch timer accounted for ~2.5 seconds of that total, with the remaining time spent on Kafka transactional commits.
This latency is acceptable for high-throughput batch workloads but problematic for latency-sensitive use cases like webhook-driven workflows, real-time notifications, and interactive event-driven pipelines.
Describe the solution you'd like
Add a configurable consumerBatchMaxWait field to the Kafka EventBus spec that controls the maximum time the sensor consumer waits to fill a batch before processing. The field should:
- Accept a Go duration string (e.g., 1s, 500ms, 100ms) or 0 to disable batching entirely
- Default to 1s (preserving current behavior when not set)
- When set to 0, process messages individually in real-time without batching
- Be configurable at the EventBus level (default for all sensors) and overridable per Sensor via an eventBusConsumerBatchMaxWait field in the Sensor spec
Example EventBus configuration:
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
spec:
  kafka:
    url: kafka:9092
    consumerBatchMaxWait: "100ms"
Example Sensor-level override:
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: latency-sensitive-sensor
spec:
  eventBusConsumerBatchMaxWait: "0"
Describe alternatives you've considered
- Hardcode a lower timeout (e.g., 100ms): This would improve latency for everyone but reduces throughput optimization for high-volume use cases. A configurable approach lets users choose the right trade-off.
- Remove batching entirely: While this provides the best latency, it eliminates the transaction amortization benefit. Making it opt-in via "0" is safer.
- Use JetStream instead of Kafka: JetStream already processes messages one at a time (no batching), but switching event buses is not always feasible for teams that depend on Kafka's ecosystem, horizontal scaling, and exactly-once semantics.
Additional context
Benchmarks from a 3-broker Strimzi Kafka cluster (Kafka 4.1.1, 3 GCP zones, mTLS, min.insync.replicas=2, transaction.state.log.replication.factor=3):
| Configuration | Event-to-trigger latency |
| --- | --- |
| Default (1s batch) | ~3500ms |
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.