
Retry transient read errors during input (VCF, binary) streaming in Glimpse phase #300

Open
mmorgantaylor wants to merge 7 commits into odelaneau:master from mmorgantaylor:tsps-792-glimpse-phase-stream-vcf


mmorgantaylor commented Jun 22, 2026

Fix for input and reference panel streaming, using #212 as a model.

Goal: make GLIMPSE2_phase resilient to transient read failures when inputs are streamed or localized from a cloud filesystem, by retrying with exponential backoff instead of crashing on the first error, while still failing fast on errors that retrying can't fix.
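The retry policy described in the goal can be sketched as a generic wrapper. This is a minimal illustration, not the PR's `retry_io.h`: `retry_with_backoff`, `TransientIOError`, and `PermanentIOError` are hypothetical names, and it assumes errors have already been classified as retryable or not by the caller.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical error classes: the PR does not describe how GLIMPSE classifies errors.
struct TransientIOError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct PermanentIOError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls op() up to max_attempts times. Only TransientIOError is retried, after a
// delay that doubles each time (base_delay, 2*base_delay, 4*base_delay, ...).
// Any other exception, e.g. PermanentIOError, propagates on the first attempt.
template <typename Op>
auto retry_with_backoff(Op&& op, int max_attempts, std::chrono::milliseconds base_delay)
    -> decltype(op()) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = base_delay;
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const TransientIOError& e) {
            if (attempt >= max_attempts) {
                // Rethrow as a plain runtime_error so outer layers do not retry again.
                throw std::runtime_error("giving up after " + std::to_string(attempt) +
                                         " attempts: " + e.what());
            }
            std::this_thread::sleep_for(delay);
            delay *= 2;  // exponential backoff
        }
    }
}
```

Catching only the transient type is what gives the "fail fast" half of the goal: a permanent error skips the retry loop entirely. A production version would likely also cap the delay and add jitter so thousands of parallel tasks do not retry in lockstep.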

Two areas of phase changed:

  1. Target GL streaming — phase/src/io/genotype_reader.cpp / .h
  2. Binary reference panel localization — phase/src/caller/caller_initialise.cpp / caller_header.h
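For the binary reference panel, one simple pattern is to make the whole file the unit of retry: reopen it and read from byte 0 on each attempt, and accept the bytes only when the count matches the size the filesystem reports. The sketch below assumes that whole-file approach and a hypothetical `read_binary_with_retry` helper; the PR does not say whether caller_initialise.cpp reads the panel this way. Note that a size check only catches truncated reads, not bytes that arrive corrupted at the right length.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: read a whole binary file into memory, retrying the full
// read from byte 0 when fewer bytes arrive than the file size reported.
std::vector<char> read_binary_with_retry(const std::string& path, int max_attempts,
                                         std::chrono::milliseconds base_delay) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = base_delay;
    for (int attempt = 1;; ++attempt) {
        // Throws std::filesystem::filesystem_error if the file is missing;
        // this is not caught, so a wrong path fails fast.
        const std::uintmax_t expected = std::filesystem::file_size(path);
        std::vector<char> buf(static_cast<std::size_t>(expected));
        std::ifstream in(path, std::ios::binary);
        in.read(buf.data(), static_cast<std::streamsize>(expected));
        // A short read is treated as transient: reopen and start over.
        if (in.gcount() == static_cast<std::streamsize>(expected)) return buf;
        if (attempt >= max_attempts)
            throw std::runtime_error("short read on " + path + " after " +
                                     std::to_string(attempt) + " attempts");
        std::this_thread::sleep_for(delay);
        delay *= 2;  // exponential backoff
    }
}
```

Restarting from byte 0 avoids having to reason about partially consumed buffers inside a deserializer, at the cost of re-reading data that had already arrived.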

In ligate and concordance, we've added explicit error messages to avoid silent failures; adding retries there would be more complex than I'd like to include in this PR.
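The ligate/concordance change is about making failures loud rather than retrying them. A minimal sketch of the idea, assuming a hypothetical record reader `next_record` with an illustrative return-code convention (0 = record read, -1 = clean end of file, anything else = read error): the loop separates a clean end of input from an error and throws a message naming the file and position, instead of quietly returning a truncated result.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical: next_record() returns 0 when a record was read, -1 at a clean
// end of file, and any other value on a read error (illustrative convention).
// Throwing on error keeps a mid-file failure from looking like a shorter input.
long count_records_or_throw(const std::function<int()>& next_record,
                            const std::string& filename) {
    long n = 0;
    for (;;) {
        const int ret = next_record();
        if (ret == 0) { ++n; continue; }
        if (ret == -1) return n;  // normal end of input
        throw std::runtime_error("Error reading " + filename + " after " +
                                 std::to_string(n) + " records (code " +
                                 std::to_string(ret) + ")");
    }
}
```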

Testing:
I ran the same GlimpsePhase task on one reference panel chunk and one VCF input 5000 times in parallel, and checked the CRC values recorded for the inputs in the checkpointing file.

  • As a positive control, I ran this before applying the changes in this PR, and 9/5000 tasks had differing checksums.
  • After these changes, 0/5000 tasks had differing checksums, indicating the issue is resolved.
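The test above compares each run's input checksums against a reference. The PR does not say which CRC variant the checkpointing file records; this sketch assumes standard CRC-32 (IEEE, reflected polynomial 0xEDB88320, the variant zlib's `crc32()` computes), and `crc32_ieee` / `count_mismatches` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320). Slow but dependency-free.
std::uint32_t crc32_ieee(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int k = 0; k < 8; ++k)
            // Shift right; XOR in the polynomial only when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Counts runs whose recorded checksum differs from the reference run's.
std::size_t count_mismatches(const std::vector<std::uint32_t>& run_crcs,
                             std::uint32_t reference) {
    std::size_t n = 0;
    for (std::uint32_t c : run_crcs)
        if (c != reference) ++n;
    return n;
}
```

In these terms, the positive control corresponds to `count_mismatches` returning 9 over 5000 runs, and the run with this PR applied to 0.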

I also confirmed that the output of phase is identical before and after these changes.

srubinacci (Collaborator) commented

Thanks for this. As I understand it, this extends the retry mechanism to VCF files. It should be easy to check and add to main. Is this ready to merge?

mmorgantaylor (Author) commented Jun 23, 2026

> Is this ready to merge?

@srubinacci I'm testing it now. This change on its own didn't resolve the issue; we think we also need to add error handling and retries to the haplotype_set input (in caller_header.h and caller_initialise.cpp).
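One subtlety when retrying a load like haplotype_set's is that a failed attempt may already have filled part of the object, so a naive retry could append data twice. A minimal sketch of one way to avoid that, assuming a hypothetical `HaplotypeSetSketch` stand-in and `load_with_retry` helper (neither is GLIMPSE code): each attempt loads into a fresh object, which is swapped into place only after a complete load.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical error class, as in a retry helper; not GLIMPSE's actual type.
struct TransientIOError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for haplotype_set; only models "accumulated state".
struct HaplotypeSetSketch {
    std::vector<std::string> haplotypes;
};

// Each attempt deserializes into a fresh object; dest is replaced only after a
// complete load, so a failed attempt cannot leave half-appended data behind.
template <typename Loader>
void load_with_retry(HaplotypeSetSketch& dest, Loader load, int max_attempts,
                     std::chrono::milliseconds base_delay) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = base_delay;
    for (int attempt = 1;; ++attempt) {
        HaplotypeSetSketch fresh;
        try {
            load(fresh);
            std::swap(dest, fresh);
            return;
        } catch (const TransientIOError&) {
            if (attempt >= max_attempts) throw;  // dest is left untouched
            std::this_thread::sleep_for(delay);
            delay *= 2;  // exponential backoff
        }
    }
}
```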

mmorgantaylor changed the title from "Fix VCF streaming" to "Fix input streaming" Jun 23, 2026
mmorgantaylor marked this pull request as ready for review June 23, 2026 21:42
mmorgantaylor (Author) commented

@srubinacci is there any additional testing you'd like to do (or us to do) for this?

mmorgantaylor changed the title from "Fix input streaming" to "Retry transient read errors during input (VCF, binary) streaming" Jun 24, 2026
Review comment thread on phase/src/io/retry_io.h (outdated)
mmorgantaylor changed the title from "Retry transient read errors during input (VCF, binary) streaming" to "Retry transient read errors during input (VCF, binary) streaming in Glimpse phase" Jun 24, 2026

2 participants