Skip to content

[producer] Add partial update, delete, and batch file mode to ProducerTool#2698

Open
sixpluszero wants to merge 8 commits intolinkedin:mainfrom
sixpluszero:feature/produce-update
Open

[producer] Add partial update, delete, and batch file mode to ProducerTool#2698
sixpluszero wants to merge 8 commits intolinkedin:mainfrom
sixpluszero:feature/produce-update

Conversation

@sixpluszero
Copy link
Copy Markdown
Contributor

Problem Statement

The ProducerTool CLI currently only supports full-value put and delete-by-null-value operations, one record at a time. This has several limitations:

  • No partial update support: Users cannot update individual fields of a record without replacing the entire value. This is especially painful for records with many fields where only one needs to change.
  • No explicit delete flag: Deleting a record requires the unintuitive convention of passing -v "null".
  • No batch mode: Writing multiple records requires repeated CLI invocations, each incurring connection setup and schema lookup overhead.

Solution

Three new CLI options are added to ProducerTool:

1. Partial update (-u/--update)

Uses VeniceProducer.asyncUpdate() with UpdateBuilder to modify individual fields. Supports:

  • Field replacement: {"fieldName": "newValue"}setNewFieldValue
  • List append: {"listField": {"$addToList": [...]}}setElementsToAddToListField
  • List remove: {"listField": {"$removeFromList": [...]}}setElementsToRemoveFromListField
  • Map add: {"mapField": {"$addToMap": {...}}}setEntriesToAddToMapField
  • Map remove: {"mapField": {"$removeFromMap": [...]}}setKeysToRemoveFromMapField

2. Explicit delete (-d/--delete)

Deletes a record by key without requiring -v. The existing -v "null" convention still works for backward compatibility.

3. Batch file mode (-f/--file)

Reads a JSON Lines file where each line specifies key, value, and optional operation (put/update/delete, default: put). Example:

{"key": "k1", "value": {"name": "Alice"}, "operation": "put"}
{"key": "k2", "value": {"tags": {"$addToList": ["new"]}}, "operation": "update"}
{"key": "k3", "value": "null", "operation": "delete"}

Performance: Schemas are fetched once before processing and cached, avoiding per-record router round-trips. (RouterBasedStoreSchemaFetcher has no internal caching.)

Error handling: Stops on first error, reports how many records succeeded before the failure.

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
    • No new configs. Features are controlled by CLI flags only.
  • Introduced new log lines.

Concurrency-Specific Checks

  • Code has no race conditions or thread safety issues.
    • All new code runs single-threaded in the CLI main method.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
    • N/A — single-threaded CLI tool.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
    • N/A — no shared mutable state.
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).
    • Existing CLI usage (-k + -v for put, -v "null" for delete) works unchanged.
    • All existing unit tests pass.

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

New CLI options added (all additive, no breaking changes):

  • -u/--update: Perform partial update instead of full put
  • -d/--delete: Delete a record by key (no -v needed)
  • -f/--file: Batch mode from a JSON Lines file

Minor change: -k and -v are no longer unconditionally required — they are validated contextually (required unless -f or -d is used). This does not break any existing valid invocations.

…rTool

- Add -u/--update flag for partial updates using UpdateBuilder, supporting
  field replacement and collection operators ($addToList, $removeFromList,
  $addToMap, $removeFromMap)
- Add -d/--delete flag to delete records without requiring a value
- Add -f/--file flag for batch mode reading JSON Lines files with per-record
  operation support (put/update/delete)
- Cache schemas upfront in batch mode to avoid per-record router lookups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 6, 2026 16:37
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new CLI capabilities to ProducerTool to support partial updates, explicit deletes, and batch ingestion from a JSON Lines file.

Changes:

  • Adds --update (-u) to perform partial updates via VeniceProducer.asyncUpdate() and UpdateBuilder.
  • Adds --delete (-d) to delete by key without requiring -v "null".
  • Adds --file (-f) JSONL batch mode and caches schemas up front to reduce router calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Reject --file combined with --update (was silently ignored)
- Reject --delete combined with --update (was silently ignored)
- Use Files.newBufferedReader with UTF-8 charset instead of FileReader
- Add RECORD type check before partial update field iteration
- Add ARRAY/MAP type checks before collection operations
- Remove unnecessary @SuppressWarnings("unchecked") on writeFromFile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Use Option.getOpt() strings for cmd.getOptionValue/hasOption calls
  for consistency with rest of codebase
- Update FILE_OPTION description to list all mutually exclusive flags
- Use Locale.ROOT for toLowerCase() to avoid locale-dependent behavior
- Reuse a single static ObjectMapper instead of creating per invocation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Use ObjectMapperFactory.getInstance() instead of new ObjectMapper()
  for consistent Jackson config across Venice codebase
- Use schemaFetcher.getLatestValueSchema() instead of max schema ID
  to correctly honor superset schema in both single and batch mode
- Handle JSON null values in partial updates: fieldValue.isNull() now
  calls setNewFieldValue(fieldName, null) instead of trying to parse
  the string "null". Also handle nulls in array/map element parsing
- Add ProducerToolTest with 14 unit tests covering:
  - Plain field replacement (string, int, boolean, double)
  - Null field value handling (single null, mixed null/non-null)
  - Collection operators (addToList, removeFromList, addToMap, removeFromMap)
  - Error cases (non-record schema, unknown field, invalid JSON, empty
    object, wrong schema type for collection operators)
  - Primitive type adaptation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sixpluszero and others added 2 commits April 8, 2026 15:33
Previous tests only verified buildUpdateFunctionWithSchema helper in
isolation. New tests now cover the actual features:

- writeFromFile batch mode: put/delete/update operations, mixed ops,
  default operation is put, empty line skipping, stop-on-first-error
- writeFromFile error cases: missing key, missing value, invalid JSON,
  unknown operation
- Null value handling in partial updates (the Copilot review concern)
- Collection operators and error cases (wrong schema type)

Uses mocked OnlineVeniceProducer and RouterBasedStoreSchemaFetcher
with real temp JSONL files to test the full writeFromFile code path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ches

- Replace Java assert keyword with TestNG Assert.assertTrue in all
  tests so checks are effective without -ea JVM flag
- Reorder branches in writeToStore so isUpdate is checked before
  value.equals("null"), preventing -u -v null from silently deleting
  instead of attempting a partial update

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 22:39
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

asText() strips JSON quotes, which breaks the Avro JSON decoder for
ENUM, FIXED, and BYTES schemas (e.g., enum "RED" becomes RED without
quotes). Introduce jsonNodeToString() helper that only uses asText()
when the target schema is STRING, and toString() for all other types.

Applied to all 5 locations: writeFromFile key/value, update field
values, parseJsonArray elements, and parseJsonMap values.

Added tests:
- jsonNodeToString: STRING schema strips quotes, ENUM preserves quotes,
  numeric nodes use toString
- adaptDataToSchema with quoted enum value
- buildUpdateFunctionWithSchema with an ENUM field verifies the adapted
  value is a GenericData.EnumSymbol

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change float test value from 3.14 to 1.25 to avoid SpotBugs flagging
it as a rough approximation of Math.PI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants