Retry transient read errors during input (VCF, binary) streaming in Glimpse phase #300
Open
mmorgantaylor wants to merge 7 commits into
Collaborator
Thanks for this. As I understand it, this extends the retry mechanism to VCF files. It should be easy to check and add to main. Is this ready to merge?
Author
@srubinacci I'm testing it now. This didn't actually resolve the issue; we think we also need to add error handling and retries to the haplotype_set input (in caller_header.h and caller_initialise.cpp).
3 tasks
Author
@srubinacci is there any additional testing you'd like to do (or us to do) for this?
mmorgantaylor
commented
Jun 24, 2026
Fix for input and reference panel streaming, using #212 as a model.
Goal: Make GLIMPSE2_phase resilient to transient read failures when inputs are streamed/localized from a cloud filesystem, by retrying with exponential backoff instead of crashing on the first error, while still failing fast on errors that retrying can't fix.
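For readers unfamiliar with the pattern, here is a minimal sketch of retry with exponential backoff that still fails fast on non-retryable errors. It is not the code in this PR: `TransientReadError`, `PermanentReadError`, and `retry_with_backoff` are hypothetical names invented for illustration, and how GLIMPSE actually classifies read errors is not shown here.

```cpp
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical error types (not from GLIMPSE): a transient error is worth
// retrying, a permanent one (for example a malformed file) is not.
struct TransientReadError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct PermanentReadError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: call `op` until it succeeds. After each transient
// failure, sleep and then double the delay (base, 2*base, 4*base, ...).
// Once `max_attempts` calls have failed, the last transient error is rethrown.
// `op` is always called at least once.
template <typename F>
auto retry_with_backoff(F&& op, int max_attempts,
                        std::chrono::milliseconds base_delay)
    -> decltype(op()) {
    auto delay = base_delay;
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const TransientReadError&) {
            if (attempt >= max_attempts) throw;  // out of retries
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
        // PermanentReadError (and anything else) is not caught here,
        // so it propagates on the first occurrence: fail fast.
    }
}
```

A caller would wrap a single read step, for example `retry_with_backoff([&] { return read_next_record(); }, 5, std::chrono::milliseconds(100))`, where `read_next_record` is again a placeholder. Keeping the retryable/non-retryable decision in the exception type means the retry loop itself needs no knowledge of the file format.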
Two areas in phase changed:
In ligate and concordance, we've added explicit error messages to avoid silent failures, but adding retries would be more complex than I'd like to include in this PR.
Testing:
I ran the same GlimpsePhase task on one reference panel chunk and VCF input, 5000 times in parallel, and checked the CRC values for the inputs in the checkpointing file.
I also confirmed that the output of phase is identical before and after these changes.