Issue Running ProHap on All-of-Us Dataset #10

Hi @vasicek58,

I am a collaborator of @bschilder, who has opened a couple of issues here before. I'm currently running ProHap on the All-of-Us dataset (414,000 individuals) and have run into a puzzling issue: the pipeline exits with status code 1 and provides minimal error information.

Here’s the relevant error message from the log (from /home/jupyter/workspaces/gpmaps2/ProHap/Snakefile_chr17, line 277):

CalledProcessError:
Command 'source /home/jupyter/miniconda3/bin/activate '/home/jupyter/workspaces/gpmaps2/ProHap/.snakemake/conda/8176726ab2db4e11e088d31f55112ddf_'; set -euo pipefail; mkdir -p tmp/transcript_vcf_haplo; mkdir -p log; mkdir -p results; python3 src/prohap.py -i "data/vcf/phased_0000014706/chr17_phased_filtered.vcf.gz" -db "data/gtf/Homo_sapiens.GRCh38.113.chr_patch_hapl_scaff_chr17.db" -transcripts "data/transcripts_selected.csv" -cdna "data/fasta/total_cdnas_113.fa" -s "data/AoU_meta.tsv" -chr 17 -min_hap_freq 0 -min_hap_count 0 -acc_prefix enshap_17 -id_prefix haplo_chr17 -require_start 1 -ignore_UTR 1 -skip_start_lost 1 -x_par1_to 2781479 -x_par2_from 155701383 -threads 50 -log "log/prohap_chr17.log" -tmp_dir "tmp/transcript_vcf_haplo" -output_csv "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.tsv.gz" -output_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.fa" -output_cdna_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_cdna_chr17.fa"'
returned non-zero exit status 1.

Interestingly, a .tsv file is successfully generated in the tmp/transcript_vcf_haplo directory before the error occurs, so the failure seems to happen right after that step.
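One way to get more detail than the CalledProcessError shows is to run the failing command directly and keep the end of its stderr, where a Python traceback would normally appear. The following is only a minimal sketch: `run_and_capture` is a hypothetical helper, not part of ProHap or Snakemake, and the demo command is a stand-in that fails on purpose rather than the real `python3 src/prohap.py ...` call.

```python
import subprocess
import sys


def run_and_capture(cmd, tail_lines=20):
    """Run cmd (a list of arguments) and return (exit_code, last stderr lines).

    Hypothetical helper for debugging; not part of ProHap or Snakemake.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    stderr_tail = proc.stderr.splitlines()[-tail_lines:]
    return proc.returncode, stderr_tail


if __name__ == "__main__":
    # Stand-in command that writes to stderr and exits with status 1.
    # Replace it with the argument list of the real command to debug.
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('boom\\n'); sys.exit(1)",
    ]
    code, tail = run_and_capture(demo_cmd)
    print("exit status:", code)
    print("\n".join(tail))
```

The file passed via `-log` in the command above (`log/prohap_chr17.log`) may also hold the underlying error, so it is worth checking alongside the captured stderr.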

I’m wondering if you have any insight into potential causes. One thought was that the issue could stem from a mismatch in the expected file extension or format (e.g., .csv vs. .tsv). However, I didn’t see anywhere in the Snakefile where the output format is explicitly specified or could be misconfigured.
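A quick way to test the extension/format idea is to compare each file's extension with the delimiter actually used in its header line. This is a rough sketch under stated assumptions: `guess_delimiter` is a hypothetical helper, it only looks at the first line, and it says nothing about which delimiter ProHap itself expects for a given argument.

```python
import gzip


def guess_delimiter(path):
    """Guess whether the header line of a text file is tab- or comma-separated.

    Hypothetical helper: returns "\t", ",", or None if neither character occurs.
    Files ending in .gz are opened with gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline()
    tabs, commas = header.count("\t"), header.count(",")
    if tabs == 0 and commas == 0:
        return None
    return "\t" if tabs >= commas else ","
```

For example, calling it on `data/transcripts_selected.csv` and `data/AoU_meta.tsv` (the paths from the command above) would show whether their contents match their extensions.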

Would appreciate any guidance or suggestions you might have!

Best regards
