Skip to content

Initial work on transactions for chunkstore#776

Open
bmoscon wants to merge 1 commit into
masterfrom
chunkstore-transactions
Open

Initial work on transactions for chunkstore#776
bmoscon wants to merge 1 commit into
masterfrom
chunkstore-transactions

Conversation

@bmoscon

@bmoscon bmoscon commented May 19, 2019

Copy link
Copy Markdown
Collaborator
  • Please do not merge, this is a work in progress - I do want to get some feedback now as I finish out the testing of this change
  1. On every write, creates a transaction document. When write completes, removes the transaction document. Reads and writes check for this, and if its present raise an error.

  2. Supplies a recovery function that can remove the data from a failed transaction. Chunkstore has no way of knowing if a write is in progress or if its has failed, so it cannot recover automatically.

Still testing the recovery method and adding unit tests

@shashank88

Copy link
Copy Markdown
Contributor

Is my understanding correct that this happens as you are basically writing the chunk (data + metadata) sequentially due to bulk_write not being atomic and any interrupts causes a bad intermediate state due to committed metadata?

  • Can having better SIGINT or even a try / finally help with this to start with?
  • Also personally I find using a context manager cleaner than explicit the start/end if possible.
  • Can you add the cleanup as part of some generic _fsck like op that fixes all of these cases?

I haven't spent much time with chunkstore so apologies in advance for stupid questions

@bmoscon

bmoscon commented May 20, 2019

Copy link
Copy Markdown
Collaborator Author

yes, chunkstore creates the chunks, and then writes them one at a time in a bulk operation, so any interruption of that write causes a corrupted state. A power outage or sigkill would be a case where a finally or the like would not be sufficient. There is also the case where some sort of multiprocessing code might try and read/write the same chunk, which would also cause issues. I can certainly change the code to use a context manager, but want to make sure the code works as expected before that (this is WIP). Also, trying to auto recover/fix is not really possible due to the nature of the issue. You can definitely hit an "invalid read" or write scenario where you wouldnt want to try and fix the issue (i.e. a concurrent reader and writer).

@scoriiu

scoriiu commented Oct 9, 2019

Copy link
Copy Markdown

@bmoscon Any updates on this?

@harryd31

Copy link
Copy Markdown

@bmoscon We would love to use this in production in our application however this issue is holding us back... Any update on when we can expect a fix please?

@bmoscon

bmoscon commented Oct 18, 2019

Copy link
Copy Markdown
Collaborator Author

@harryd31 fix is here, give it a try, let me know if it works or doesnt work

@harryd31

Copy link
Copy Markdown

@bmoscon Thank you, I will give it a try and let you know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants