Configurable consumer batch timeout for Kafka EventBus sensors

**Is your feature request related to a problem? Please describe.**

When using Kafka as the EventBus, the sensor consumer batches messages with a hardcoded 1-second timeout (`Batch(100, 1*time.Second, ...)`). This adds latency between an event being published and a trigger firing: up to 1 second per internal topic hop (event → trigger → action), which adds up to 3-5 seconds of total latency even for a single event at low volume.

In our environment (3-broker Strimzi cluster across 3 GCP zones, mTLS, `min.insync.replicas=2`), we observed ~4.0 seconds between `eventsource.publish` and `sensor.trigger` through distributed tracing. The batch timer accounted for ~2.5 seconds of that total, with the remaining time spent on Kafka transactional commits.

This latency is acceptable for high-throughput batch workloads but problematic for latency-sensitive use cases like webhook-driven workflows, real-time notifications, and interactive event-driven pipelines.

**Describe the solution you'd like**

Add a configurable `consumerBatchMaxWait` field to the Kafka EventBus spec that controls the maximum time the sensor consumer waits to fill a batch before processing. The field should:

1. Accept a Go duration string (e.g., `1s`, `500ms`, `100ms`) or `0` to disable batching entirely
2. Default to `1s` (preserving current behavior when not set)
3. When set to `0`, process messages individually in real-time without batching
4. Be configurable at the EventBus level (default for all sensors) and overridable per Sensor via an `eventBusConsumerBatchMaxWait` field in the Sensor spec

Example EventBus configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
spec:
  kafka:
    url: kafka:9092
    consumerBatchMaxWait: "100ms"
```

Example Sensor-level override:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: latency-sensitive-sensor
spec:
  eventBusConsumerBatchMaxWait: "0"
```

**Describe alternatives you've considered**

- **Hardcode a lower timeout (e.g., 100ms):** This would improve latency for everyone but reduces throughput optimization for high-volume use cases. A configurable approach lets users choose the right trade-off.
- **Remove batching entirely:** While this provides the best latency, it removes the benefit of amortizing each Kafka transaction over many messages. Making it opt-in via `"0"` is safer.
- **Use JetStream instead of Kafka:** JetStream already processes messages one at a time (no batching), but switching event buses is not always feasible for teams that depend on Kafka's ecosystem, horizontal scaling, and exactly-once semantics.

**Additional context**

Benchmarks from a 3-broker Strimzi Kafka cluster (Kafka 4.1.1, 3 GCP zones, mTLS, `min.insync.replicas=2`, `transaction.state.log.replication.factor=3`):

| Configuration | Event-to-trigger latency |
|---|---|
| Default (`1s` batch) | ~3500ms |

<img width="2559" height="513" alt="Image" src="https://github.com/user-attachments/assets/7bb0183a-8a44-4ece-abde-5657305125a9" />

---

**Message from the maintainers**:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Configurable consumer batch timeout for Kafka EventBus sensors #3984
