Hi @vasicek58,
I am a collaborator of @bschilder, who has opened a couple of issues here before. I'm currently running ProHap on the All-of-Us dataset (414,000 individuals) and have run into a puzzling issue: the pipeline exits with status code 1 and provides minimal error information.
Here’s the relevant error message from the log (from /home/jupyter/workspaces/gpmaps2/ProHap/Snakefile_chr17, line 277):
CalledProcessError:
Command 'source /home/jupyter/miniconda3/bin/activate '/home/jupyter/workspaces/gpmaps2/ProHap/.snakemake/conda/8176726ab2db4e11e088d31f55112ddf_'; set -euo pipefail; mkdir -p tmp/transcript_vcf_haplo; mkdir -p log; mkdir -p results; python3 src/prohap.py -i "data/vcf/phased_0000014706/chr17_phased_filtered.vcf.gz" -db "data/gtf/Homo_sapiens.GRCh38.113.chr_patch_hapl_scaff_chr17.db" -transcripts "data/transcripts_selected.csv" -cdna "data/fasta/total_cdnas_113.fa" -s "data/AoU_meta.tsv" -chr 17 -min_hap_freq 0 -min_hap_count 0 -acc_prefix enshap_17 -id_prefix haplo_chr17 -require_start 1 -ignore_UTR 1 -skip_start_lost 1 -x_par1_to 2781479 -x_par2_from 155701383 -threads 50 -log "log/prohap_chr17.log" -tmp_dir "tmp/transcript_vcf_haplo" -output_csv "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.tsv.gz" -output_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_chr17.fa" -output_cdna_fasta "results/All_of_Us_haplotypes_0000014706_tmp/haplo_cdna_chr17.fa"'
returned non-zero exit status 1.
Interestingly, a .tsv file is successfully generated in the tmp/transcript_vcf_haplo directory before the error occurs, so the failure seems to happen right after that step.
Do you have any insight into potential causes? One thought I had was that the issue could stem from a mismatch between the expected and actual file extension or format (e.g., .csv vs .tsv). However, I didn’t see anywhere in the Snakemake file where the output format is explicitly specified or could be misconfigured.
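One way to test the extension/format idea would be to check which delimiter the intermediate file actually uses. This is only a minimal sketch: `guess_delimiter` is a hypothetical helper (not part of ProHap), and it assumes the file is plain or gzip-compressed text with a header line.

```python
import gzip
from pathlib import Path


def guess_delimiter(path, sample_chars=65536):
    """Guess whether a text table is tab- or comma-delimited.

    Hypothetical helper for illustration only; it inspects just the
    first (header) line and returns "\t", ",", or None if neither
    character appears there.
    """
    path = Path(path)
    # Use gzip for .gz files, the built-in open otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        sample = handle.read(sample_chars)

    lines = sample.splitlines()
    header = lines[0] if lines else ""
    tabs, commas = header.count("\t"), header.count(",")
    if tabs == 0 and commas == 0:
        return None
    return "\t" if tabs >= commas else ","
```

Calling `guess_delimiter(...)` on the file written to tmp/transcript_vcf_haplo would show whether its contents match its .tsv extension; it would not by itself explain the non-zero exit status.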
Would appreciate any guidance or suggestions you might have!
Best regards