[producer] Add partial update, delete, and batch file mode to ProducerTool#2698
[producer] Add partial update, delete, and batch file mode to ProducerTool#2698sixpluszero wants to merge 8 commits intolinkedin:mainfrom
Conversation
…rTool - Add -u/--update flag for partial updates using UpdateBuilder, supporting field replacement and collection operators ($addToList, $removeFromList, $addToMap, $removeFromMap) - Add -d/--delete flag to delete records without requiring a value - Add -f/--file flag for batch mode reading JSON Lines files with per-record operation support (put/update/delete) - Cache schemas upfront in batch mode to avoid per-record router lookups Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds new CLI capabilities to ProducerTool to support partial updates, explicit deletes, and batch ingestion from a JSON Lines file.
Changes:
- Adds
--update(-u) to perform partial updates viaVeniceProducer.asyncUpdate()andUpdateBuilder. - Adds
--delete(-d) to delete by key without requiring-v "null". - Adds
--file(-f) JSONL batch mode and caches schemas up front to reduce router calls.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
- Reject --file combined with --update (was silently ignored) - Reject --delete combined with --update (was silently ignored) - Use Files.newBufferedReader with UTF-8 charset instead of FileReader - Add RECORD type check before partial update field iteration - Add ARRAY/MAP type checks before collection operations - Remove unnecessary @SuppressWarnings("unchecked") on writeFromFile Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
- Use Option.getOpt() strings for cmd.getOptionValue/hasOption calls for consistency with rest of codebase - Update FILE_OPTION description to list all mutually exclusive flags - Use Locale.ROOT for toLowerCase() to avoid locale-dependent behavior - Reuse a single static ObjectMapper instead of creating per invocation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
- Use ObjectMapperFactory.getInstance() instead of new ObjectMapper()
for consistent Jackson config across Venice codebase
- Use schemaFetcher.getLatestValueSchema() instead of max schema ID
to correctly honor superset schema in both single and batch mode
- Handle JSON null values in partial updates: fieldValue.isNull() now
calls setNewFieldValue(fieldName, null) instead of trying to parse
the string "null". Also handle nulls in array/map element parsing
- Add ProducerToolTest with 14 unit tests covering:
- Plain field replacement (string, int, boolean, double)
- Null field value handling (single null, mixed null/non-null)
- Collection operators (addToList, removeFromList, addToMap, removeFromMap)
- Error cases (non-record schema, unknown field, invalid JSON, empty
object, wrong schema type for collection operators)
- Primitive type adaptation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
clients/venice-producer/src/test/java/com/linkedin/venice/producer/online/ProducerToolTest.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
Previous tests only verified buildUpdateFunctionWithSchema helper in isolation. New tests now cover the actual features: - writeFromFile batch mode: put/delete/update operations, mixed ops, default operation is put, empty line skipping, stop-on-first-error - writeFromFile error cases: missing key, missing value, invalid JSON, unknown operation - Null value handling in partial updates (the Copilot review concern) - Collection operators and error cases (wrong schema type) Uses mocked OnlineVeniceProducer and RouterBasedStoreSchemaFetcher with real temp JSONL files to test the full writeFromFile code path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ches
- Replace Java assert keyword with TestNG Assert.assertTrue in all
tests so checks are effective without -ea JVM flag
- Reorder branches in writeToStore so isUpdate is checked before
value.equals("null"), preventing -u -v null from silently deleting
instead of attempting a partial update
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Outdated
Show resolved
Hide resolved
clients/venice-producer/src/main/java/com/linkedin/venice/producer/online/ProducerTool.java
Show resolved
Hide resolved
asText() strips JSON quotes, which breaks the Avro JSON decoder for ENUM, FIXED, and BYTES schemas (e.g., enum "RED" becomes RED without quotes). Introduce jsonNodeToString() helper that only uses asText() when the target schema is STRING, and toString() for all other types. Applied to all 5 locations: writeFromFile key/value, update field values, parseJsonArray elements, and parseJsonMap values. Added tests: - jsonNodeToString: STRING schema strips quotes, ENUM preserves quotes, numeric nodes use toString - adaptDataToSchema with quoted enum value - buildUpdateFunctionWithSchema with an ENUM field verifies the adapted value is a GenericData.EnumSymbol Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change float test value from 3.14 to 1.25 to avoid SpotBugs flagging it as a rough approximation of Math.PI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem Statement
The
ProducerToolCLI currently only supports full-valueputand delete-by-null-value operations, one record at a time. This has several limitations:-v "null".Solution
Three new CLI options are added to
ProducerTool:1. Partial update (
-u/--update)Uses
VeniceProducer.asyncUpdate()withUpdateBuilderto modify individual fields. Supports:{"fieldName": "newValue"}→setNewFieldValue{"listField": {"$addToList": [...]}}→setElementsToAddToListField{"listField": {"$removeFromList": [...]}}→setElementsToRemoveFromListField{"mapField": {"$addToMap": {...}}}→setEntriesToAddToMapField{"mapField": {"$removeFromMap": [...]}}→setKeysToRemoveFromMapField2. Explicit delete (
-d/--delete)Deletes a record by key without requiring
-v. The existing-v "null"convention still works for backward compatibility.3. Batch file mode (
-f/--file)Reads a JSON Lines file where each line specifies
key,value, and optionaloperation(put/update/delete, default: put). Example:Performance: Schemas are fetched once before processing and cached, avoiding per-record router round-trips. (
RouterBasedStoreSchemaFetcherhas no internal caching.)Error handling: Stops on first error, reports how many records succeeded before the failure.
Code changes
Concurrency-Specific Checks
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
-k+-vfor put,-v "null"for delete) works unchanged.Does this PR introduce any user-facing or breaking changes?
New CLI options added (all additive, no breaking changes):
-u/--update: Perform partial update instead of full put-d/--delete: Delete a record by key (no-vneeded)-f/--file: Batch mode from a JSON Lines fileMinor change:
-kand-vare no longer unconditionally required — they are validated contextually (required unless-for-dis used). This does not break any existing valid invocations.