Skip to content

Fix AOF persistence and WATCH for collection-emptying RMW ops [dev]#1677

Open
badrishc wants to merge 3 commits intodevfrom
badrishc/fix-aof-object-mutation-persistence
Open

Fix AOF persistence and WATCH for collection-emptying RMW ops [dev]#1677
badrishc wants to merge 3 commits intodevfrom
badrishc/fix-aof-object-mutation-persistence

Conversation

@badrishc
Copy link
Copy Markdown
Collaborator

@badrishc badrishc commented Apr 7, 2026

Bug 1: Collection-emptying RMW operations (LPOP, ZREM, HDEL, SREM that empty a collection) were not logged to AOF. InPlaceUpdaterWorker and PostCopyUpdater returned false before setting NeedAofLog when HasRemoveKey was true, so PostRMWOperation never wrote the AOF entry. Data reappeared after restart.

Bug 2: InPlaceUpdaterWorker's HasRemoveKey path did not call IncrementVersion on the watch version map, so WATCH/MULTI/EXEC transactions did not detect the key modification and incorrectly committed.

Bug 3: AofProcessor.RecoverReplay disposed respServerSession in its finally block, but in multi-DB recovery it ran once per database, causing double-dispose and NullReferenceException in GarnetLatencyMetricsSession.Return(). Moved dispose to AofProcessor.Dispose() which runs exactly once.

Fixes #1675

Copilot AI review requested due to automatic review settings April 7, 2026 17:24
@badrishc badrishc requested a review from vazois April 7, 2026 17:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes correctness issues around object-store RMW operations that empty collections, ensuring they are persisted to AOF and correctly tracked by WATCH, and addresses multi-DB AOF recovery double-dispose behavior.

Changes:

  • Ensure collection-emptying object RMW mutations (e.g., LPOP/ZREM/HDEL/SREM that delete the key) set NeedAofLog so AOF replay preserves deletions.
  • Increment WATCH version on the “remove key” path for in-place object updates so WATCH/MULTI/EXEC detects key modifications.
  • Move AOF replay session disposal to AofProcessor.Dispose() to avoid double-dispose during multi-DB recovery; add regression tests for these scenarios.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/Garnet.test/TransactionTests.cs Adds a WATCH regression test for list-emptying LPOP within a watched transaction flow.
test/Garnet.test/RespAofTests.cs Adds AOF recovery tests covering object-store RMW deletes/partial deletes across lists, sets, hashes, and sorted sets.
test/Garnet.test/MultiDatabaseTests.cs Adds multi-DB AOF recovery regression test for collection-emptying mutations across DBs.
libs/server/Storage/Functions/ObjectStore/RMWMethods.cs Fixes AOF logging + WATCH version increments when RMW operations remove the key.
libs/server/AOF/AofProcessor.cs Centralizes disposal in AofProcessor.Dispose() to avoid double-dispose across multi-DB recovery passes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@badrishc badrishc changed the title Fix AOF persistence and WATCH for collection-emptying RMW ops Fix AOF persistence and WATCH for collection-emptying RMW ops [dev] Apr 7, 2026
badrishc and others added 2 commits April 7, 2026 12:22
Bug 1: Collection-emptying RMW operations (LPOP, ZREM, HDEL, SREM that
empty a collection) were not logged to AOF. InPlaceUpdaterWorker and
PostCopyUpdater returned false before setting NeedAofLog when
HasRemoveKey was true, so PostRMWOperation never wrote the AOF entry.
Data reappeared after restart.

Bug 2: InPlaceUpdaterWorker's HasRemoveKey path did not call
IncrementVersion on the watch version map, so WATCH/MULTI/EXEC
transactions did not detect the key modification and incorrectly
committed.

Bug 3: AofProcessor.RecoverReplay disposed respServerSession in its
finally block, but in multi-DB recovery it ran once per database,
causing double-dispose and NullReferenceException in
GarnetLatencyMetricsSession.Return(). Moved dispose to
AofProcessor.Dispose() which runs exactly once.

Fixes #1675

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@badrishc badrishc force-pushed the badrishc/fix-aof-object-mutation-persistence branch from 2403184 to 12b88a8 Compare April 7, 2026 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants