fix: propagate failures in multi-accession prefetch #70

Merged: noamteyssier merged 4 commits into main from fix/multi-accession-prefetch-error-propagation on May 29, 2026

@nick-youngblut nick-youngblut commented May 29, 2026

Contributor

Summary

Multi-accession xsra prefetch could exit 0 even when some accessions failed to resolve or download, because failures in that path were only logged. Since prefetch is typically an upstream step in a batch workflow, a zero exit code after a partial failure lets downstream steps run on missing/partial inputs.

Changes

  • Fail loudly on partial failure (src/prefetch/mod.rs): the multi-accession path now accumulates every failure and returns a single summary error naming all failed accessions, while preserving the existing best-effort, concurrent behavior (every accession is still attempted). Covered failure points: URL resolution, missing GCP project ID, unsupported provider, and HTTPS/GCP download errors. Each download future now carries its accession so failures are attributable by name.
  • Stop dropping join errors in identify_urls: task JoinErrors were previously logged and discarded along with their accession. Since join_all preserves order, results are zipped against the input accessions so a join failure becomes a reportable Err entry instead of vanishing.
  • Regression test: prefetch_multi_fails_when_any_url_resolution_fails runs fully offline (via the test query_entrez stub) and asserts the command errors and the message names all failed accessions, not just the first.
  • Repo housekeeping: added CLAUDE.md (build/test/architecture guidance) and expanded .gitignore.

The single-accession path already propagated errors correctly and is unchanged.
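
The accumulate-then-fail behavior described in the changes above can be illustrated with a minimal sketch. This is not the code from src/prefetch/mod.rs: it assumes plain `std::thread` tasks in place of the async futures and `join_all` the PR mentions, and `fetch_one` and `prefetch_all` are hypothetical names invented for the illustration. The point it shows is that every accession is attempted, join results are zipped against the input order so a failed join stays attributable, and a single summary error names every failure.

```rust
use std::thread;

/// Hypothetical stand-in for resolving and downloading one accession.
/// Here any accession starting with "BAD" fails; real logic would differ.
fn fetch_one(accession: &str) -> Result<(), String> {
    if accession.starts_with("BAD") {
        Err("URL resolution failed".to_string())
    } else {
        Ok(())
    }
}

/// Attempt every accession, then fail once with a summary naming all failures.
fn prefetch_all(accessions: &[String]) -> Result<(), String> {
    // Start every task before inspecting any result, so one failure
    // never prevents the remaining accessions from being attempted.
    let handles: Vec<thread::JoinHandle<Result<(), String>>> = accessions
        .iter()
        .cloned()
        .map(|acc| thread::spawn(move || fetch_one(&acc)))
        .collect();

    // Handles are in input order, so zipping them with the accessions keeps
    // each outcome attributable by name, even when the join itself fails.
    let mut failures: Vec<(String, String)> = Vec::new();
    for (acc, handle) in accessions.iter().zip(handles) {
        match handle.join() {
            Ok(Ok(())) => {}
            Ok(Err(reason)) => failures.push((acc.clone(), reason)),
            // A join error (the task panicked) becomes a reportable failure
            // instead of being logged and dropped.
            Err(_) => failures.push((acc.clone(), "task panicked".to_string())),
        }
    }

    if failures.is_empty() {
        return Ok(());
    }
    let entries: Vec<String> = failures
        .iter()
        .map(|(acc, reason)| format!("{acc}: {reason}"))
        .collect();
    Err(format!(
        "{} of {} accessions failed: {}",
        failures.len(),
        accessions.len(),
        entries.join("; ")
    ))
}
```

Under these assumptions, calling `prefetch_all` with `SRR1`, `BAD1`, and `BAD2` returns an `Err` whose message names both `BAD1` and `BAD2`, which is the property the regression test in this PR is described as asserting.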

- Expanded .gitignore to include additional file types and directories.
- Introduced CLAUDE.md to provide comprehensive guidance on using the `xsra` CLI, including commands, architecture, and conventions.
- Enhanced error handling in `prefetch` module to report per-accession failures.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the .gitignore configuration, adds a comprehensive CLAUDE.md guide for development, and significantly improves error handling in the prefetch module. Specifically, it ensures that failures during concurrent URL resolution and downloads (both HTTPS and GCP) are properly attributed to their respective accessions and accumulated, allowing the command to fail loudly at the end with a detailed summary. The review feedback suggests a minor performance optimization to pre-allocate the processed_results vector with the known capacity of accessions to avoid unnecessary re-allocations.

Comment thread src/prefetch/mod.rs Outdated
- Changed initialization of processed_results to use Vec::with_capacity for improved performance by preallocating memory based on the length of accessions.
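
As a rough illustration of the preallocation suggested in the review, the sketch below builds a result vector whose capacity is reserved up front from the number of accessions. The function name `collect_results`, the tuple layout, and the placeholder URL are assumptions made for the example; only `processed_results` and `Vec::with_capacity` come from the discussion above.

```rust
/// Hypothetical helper: pair each accession with a placeholder result.
fn collect_results(accessions: &[&str]) -> Vec<(String, Result<String, String>)> {
    // The final length is known in advance, so reserving it once avoids
    // repeated reallocation as entries are pushed.
    let mut processed_results = Vec::with_capacity(accessions.len());
    for acc in accessions {
        // Placeholder value; a real implementation would store the resolved URL
        // or the error for this accession.
        let url = format!("https://example.invalid/{acc}");
        processed_results.push(((*acc).to_string(), Ok(url)));
    }
    processed_results
}
```

The gain is small for short accession lists, which matches the review's framing of it as a minor optimization.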
@noamteyssier noamteyssier merged commit 46ffd0b into main May 29, 2026
9 checks passed
@noamteyssier noamteyssier deleted the fix/multi-accession-prefetch-error-propagation branch May 29, 2026 17:05
@noamteyssier

Collaborator

nice add - new version with addition just published to crates
