
Issue Running ProHap on All-of-Us Dataset #10


@mengysun

Hi @vasicek58 ,

I am a collaborator of @bschilder, who opened a couple of issues earlier. I'm running ProHap on the All-of-Us dataset (414,000 individuals) and have hit a puzzling issue: the pipeline exits with status code 1 and provides minimal error information.

Here’s the relevant error message from the log (from /home/jupyter/workspaces/gpmaps2/ProHap/Snakefile_chr17, line 277):

CalledProcessError:
Command 'source /home/jupyter/miniconda3/bin/activate '/home/jupyter/workspaces/gpmaps2/ProHap/.snakemake/conda/8176726ab2db4e11e088d31f55112ddf_'; set -euo pipefail; mkdir -p tmp/transcript_vcf_haplo; mkdir -p log; mkdir -p results; python3 src/prohap.py -i "data/vcf/phased_0000014706/chr17_phased_filtered.vcf.gz" -db "data/gtf/Homo_sapiens.GRCh38.113.chr_patch_hapl_scaff_chr17.db" -transcripts "data/transcripts_selected.csv" -cdna "data/fasta/total_cdnas_113.fa" -s "data/AoU_meta.tsv" -chr 17 -min_hap_freq 0 -min_hap_count 0 -acc_prefix enshap_17 -id_prefix haplo_chr17 -require_start 1 -ignore_UTR 1 -skip_start_lost 1 -x_par1_to 2781479 -x_par2_from 155701383 -threads 50 -log "log/prohap_chr17.log" -tmp_dir "tmp/transcript_vcf_haplo" -output_csv "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.tsv.gz" -output_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.fa" -output_cdna_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_cdna_chr17.fa"'
returned non-zero exit status 1.
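
Since the Snakemake wrapper only reports the exit status, one way to surface the underlying Python traceback would be to re-run the failing command directly and capture its stderr. The sketch below is a minimal, hypothetical helper (`run_and_capture` is not part of ProHap or Snakemake). The stand-in command only simulates a failure and would need to be replaced with the actual `python3 src/prohap.py ...` argument list from the log above. The file passed via `-log` (log/prohap_chr17.log) may also contain more detail.

```python
import subprocess
import sys


def run_and_capture(cmd, tail_lines=20):
    """Run cmd (a list of arguments) and return (exit_code, last stderr lines).

    Hypothetical helper for debugging; not part of ProHap or Snakemake.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Keep only the end of stderr, where a Python traceback usually finishes.
    stderr_tail = "\n".join(proc.stderr.splitlines()[-tail_lines:])
    return proc.returncode, stderr_tail


if __name__ == "__main__":
    # Stand-in for the failing call: exits non-zero and writes to stderr.
    # Replace with the real argument list, e.g. ["python3", "src/prohap.py", ...].
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit('simulated failure')"]
    code, err = run_and_capture(demo_cmd)
    print("exit status:", code)
    print("stderr tail:", err)
```

Running the real command inside the activated conda environment from the log would keep the dependencies identical to those of the Snakemake run.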

Interestingly, a .tsv file is successfully generated in the tmp/transcript_vcf_haplo directory before the error occurs, so the failure seems to happen right after that step.

I’m wondering if you have any insight into potential causes. One thought was that the issue could stem from a mismatch in the expected file extension or format (e.g., .csv vs. .tsv). However, I didn’t see anywhere in the Snakemake file where the output format is explicitly specified or could be misconfigured.
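
To test the .csv vs. .tsv hypothesis directly, the delimiter actually used in a file can be checked from its header line. This is only a rough sketch: `guess_delimiter` is a hypothetical helper, the path in the usage comment is a placeholder rather than the real file name, and the heuristic simply compares tab and comma counts in the first line.

```python
import gzip


def guess_delimiter(path):
    """Return "\t" or "," depending on which is more frequent in the header line.

    Hypothetical helper; handles plain-text and gzip-compressed files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline()
    # More tabs than commas suggests a tab-separated file; otherwise assume commas.
    return "\t" if header.count("\t") > header.count(",") else ","


# Example usage (placeholder path, to be replaced with the generated file):
# print(repr(guess_delimiter("tmp/transcript_vcf_haplo/<generated_file>.tsv")))
```

The same check could be run on data/transcripts_selected.csv and data/AoU_meta.tsv to confirm that each input matches the format implied by its extension.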

I would appreciate any guidance or suggestions you might have!

Best regards
