[da-vinci][server] Add per-poll record count metric (poll_result_size) by sixpluszero · Pull Request #2697 · linkedin/venice

sixpluszero · 2026-04-06T16:37:30Z

Problem Statement

The existing batch_processing_request_size metric is recorded per mini-batch after splitting in the leader AA/WC path. This makes it impossible to distinguish genuinely small polls from tail batches of larger polls, obscuring the true per-poll record count.

Additionally, poll size was only tracked in the batch processing path (produceToStoreBufferServiceOrKafkaInBatch), meaning any ingestion that goes through the non-batch path (e.g., non-AA/WC stores, VT consumption, or when parallel processing is disabled) had no poll size visibility at all.

Solution

Add a new poll_result_size metric (Tehuti + OTel) recorded before mini-batch splitting to capture the true per-poll record count per partition.
Record this metric in both the batch path (produceToStoreBufferServiceOrKafkaInBatch) and the non-batch path (produceToStoreBufferServiceOrKafka), so poll sizes are captured regardless of whether AA/WC parallel processing is enabled.
Only records that pass shouldProcessRecord are counted, consistent across both paths.

Code changes

StoreIngestionTask.java: Add poll_result_size recording in both produceToStoreBufferServiceOrKafka (non-batch) and produceToStoreBufferServiceOrKafkaInBatch (batch) paths.
IngestionStats.java: Add pollResultSizeSensor (Tehuti).
AggVersionedIngestionStats.java: Add recordPollResultSize delegation to both Tehuti and OTel stats.
IngestionOtelStats.java / IngestionOtelMetricEntity.java: Add POLL_RESULT_SIZE OTel metric entity and recording method.
NoOpIngestionOtelStats.java: Add no-op override.
Added new code behind a config. No new config — metric is always emitted.
Introduced new log lines. No new log lines.

Concurrency-Specific Checks

Code has no race conditions or thread safety issues. The metric recording is a simple counter increment on an existing stats object within the single-threaded consumer callback path.
No new synchronization needed.
No blocking calls introduced.
No new collections introduced.
No new multi-threaded code introduced.

How was this PR tested?

New unit tests added.
- testPollResultSizeRecordedInNonBatchPath — verifies metric is recorded with correct count (3 records) in the non-batch path.
- testPollResultSizeNotRecordedWhenAllRecordsFiltered — verifies metric is NOT recorded when all records are filtered by shouldProcessRecord.
- testPollResultSizeRecordedInBatchPath — verifies metric is recorded with correct count (5 records) in the AA/WC batch path.
New integration tests added.
Modified or extended existing tests.
Verified backward compatibility (if applicable). No breaking changes — only additive metric.

Does this PR introduce any user-facing or breaking changes?

No. You can skip the rest of this section.
Yes. Clearly explain the behavior change and its impact.

Add a poll_result_size metric recorded before mini-batch splitting in the leader AA/WC path to capture the true per-poll record count per partition. The existing batch_processing_request_size is recorded per mini-batch after splitting, making it impossible to distinguish small polls from tail batches of larger polls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…unit tests The poll_result_size metric was only recorded in the AA/WC batch processing path (produceToStoreBufferServiceOrKafkaInBatch). This adds the same metric to the non-batch path (produceToStoreBufferServiceOrKafka) so that poll sizes are captured regardless of whether parallel processing is enabled. Also adds three unit tests covering both paths and the filtered-records edge case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a new per-poll metric (poll_result_size) to improve visibility into the true number of records returned by each consumer poll (per partition), independent of mini-batch splitting and regardless of whether ingestion runs in the batch (AA/WC parallel) or non-batch path.

Changes:

Record poll_result_size in StoreIngestionTask for both non-batch and batch ingestion flows.
Add Tehuti plumbing in IngestionStats and aggregation/delegation in AggVersionedIngestionStats.
Add an OTel metric entity + recording API for poll_result_size, including a no-op implementation, and unit tests validating behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
clients/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java	Records per-poll record count in both non-batch and batch ingestion paths.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/IngestionStats.java	Adds Tehuti sensor and recording method for `poll_result_size`.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/AggVersionedIngestionStats.java	Delegates poll-size recording to Tehuti (versioned/total) and OTel.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ingestion/IngestionOtelMetricEntity.java	Introduces OTel metric entity definition for poll-size.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ingestion/IngestionOtelStats.java	Wires the new OTel metric into the stats implementation and exposes a record method.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ingestion/NoOpIngestionOtelStats.java	Adds no-op override for poll-size recording.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTaskTest.java	Adds unit tests verifying metric emission (non-batch, batch) and suppression when all records are filtered.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-06T16:41:32Z

...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java

+    int totalRecordsInPoll = 0;
+    for (List<DefaultPubSubMessage> batch: batches) {
+      totalRecordsInPoll += batch.size();
+    }
+    if (totalRecordsInPoll > 0) {
+      versionedIngestionStats
+          .recordPollResultSize(storeName, versionNumber, totalRecordsInPoll, beforeProcessingBatchRecordsTimestampMs);
+    }


In the batch path, totalRecordsInPoll is recomputed by iterating over the already-split batches, which adds an extra pass and contradicts the intent of recording the poll size before mini-batch splitting. Consider tracking the count while building batches (or incrementing a counter alongside ongoingBatch.add(record)) so the metric is computed directly from the pre-split poll result without the additional loop.

Copilot · 2026-04-06T16:41:32Z

...nci-client/src/main/java/com/linkedin/davinci/stats/ingestion/IngestionOtelMetricEntity.java


+  POLL_RESULT_SIZE(
+      "ingestion.poll_result.size", MetricType.MIN_MAX_COUNT_SUM_AGGREGATIONS, MetricUnit.NUMBER,
+      "Total records per poll per partition before mini-batch splitting (leader AA/WC path)",


The OTel metric description says this is specific to the "leader AA/WC path", but poll_result_size is recorded in both the batch (AA/WC) and non-batch ingestion paths. Please update the description to reflect the broader coverage (and ideally mention that it counts only records that pass shouldProcessRecord).

Suggested change

"Total records per poll per partition before mini-batch splitting (leader AA/WC path)",

"Total records per poll per partition before mini-batch splitting, counting only records that pass shouldProcessRecord",

- Fix NPE in testConsumedBytesSinceLastSyncTracking by adding versionedIngestionStats, storeName, and versionNumber to mock setup - Add POLL_RESULT_SIZE to IngestionOtelMetricEntityTest expected defs - Update ServerMetricEntityTest count from 139 to 140 - Eliminate extra loop in batch path by counting during batch building - Update OTel metric description to reflect both batch and non-batch path coverage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sixpluszero and others added 2 commits March 25, 2026 00:50

Copilot AI review requested due to automatic review settings April 6, 2026 16:37

Copilot started reviewing on behalf of sixpluszero April 6, 2026 16:38 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[da-vinci][server] Add per-poll record count metric (poll_result_size)#2697

[da-vinci][server] Add per-poll record count metric (poll_result_size)#2697
sixpluszero wants to merge 3 commits intolinkedin:mainfrom
sixpluszero:metric/poll-size

sixpluszero commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	"Total records per poll per partition before mini-batch splitting (leader AA/WC path)",
	"Total records per poll per partition before mini-batch splitting, counting only records that pass shouldProcessRecord",

Conversation

sixpluszero commented Apr 6, 2026

Problem Statement

Solution

Code changes

Concurrency-Specific Checks

How was this PR tested?

Does this PR introduce any user-facing or breaking changes?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants