
not enough memory for gatk3_join #15

@alextidd

Description

Hi!
I have run scan2 on 91 BAM files (median coverage 18X), with the genome split into 100,000 chunks via the --regions-file. Here is my command:

scan2 run \
  --joblimit 800 \
  --cluster \
    "bsub -q basement -M {resources.mem_mb} -R'span[hosts=1] select[mem>{resources.mem_mb}] rusage[mem={resources.mem_mb}]' \
      -n {threads} -o %logdir/%J.out -e %logdir/%J.err"

...and I got this error:

[Mon Nov  3 16:32:18 2025]
Error in rule gatk_scatter:
    jobid: 53403
    input: plate3_wellA2_dna_run49882.bam, [...], PD63118b_lo0001.sample.dupmarked.bam
    output: gatk/hc_raw.mmq1_chunk24572.vcf, gatk/hc_raw.mmq1_chunk24572.vcf.idx
    shell:
        gatk3 -Xmx3500M -Xms3500M    -T HaplotypeCaller    -R data/scan2/GRCh37/genome.fa    --dontUseSoftClippedBases -l INFO    --dbsnp reference/dbsnp/GRCh37/common_all_20180423.vcf    -rf BadCigar     -mmq 1    -I plate3_wellA2_dna_run49882.bam [...] -I PD63118b_lo0001.sample.dupmarked.bam    -L 16:46400001-46500000    -o gatk/hc_raw.mmq1_chunk24572.vcf
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Job <983205> is submitted to queue <basement>.

Error executing rule gatk_scatter on cluster (jobid: 53403, external: Job <983205> is submitted to queue <basement>., jobscript: PD63118/.snakemake/tmp.j0wo9x64/snakejob.gatk_scatter.53403.sh). For error details see the cluster log and the log files of the involved rule(s).
[Mon Nov  3 16:32:18 2025]
Error in rule gatk_scatter:
    jobid: 53402
    input: plate3_wellA2_dna_run49882.bam, [...], PD63118b_lo0001.sample.dupmarked.bam
    output: gatk/hc_raw.mmq1_chunk24571.vcf, gatk/hc_raw.mmq1_chunk24571.vcf.idx
    shell:
        gatk3 -Xmx3500M -Xms3500M    -T HaplotypeCaller    -R /nfs/casm/team268im/at31/projects/hashimoto_thyroiditis/data/scan2/GRCh37/genome.fa    --dontUseSoftClippedBases -l INFO    --dbsnp /nfs/casm/team268im/at31/reference/dbsnp/GRCh37/common_all_20180423.vcf    -rf BadCigar     -mmq 1    -I plate3_wellA2_dna_run49882.bam [...] -I PD63118b_lo0001.sample.dupmarked.bam    -L 16:46300001-46400000    -o gatk/hc_raw.mmq1_chunk24571.vcf
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Job <983255> is submitted to queue <basement>.

Error executing rule gatk_scatter on cluster (jobid: 53402, external: Job <983255> is submitted to queue <basement>., jobscript: /lustre/scratch125/casm/teams/team268/at31/projects/hashimoto_thyroiditis/out/resolveome/scan2/PD63118/.snakemake/tmp.j0wo9x64/snakejob.gatk_scatter.53402.sh). For error details see the cluster log and the log files of the involved rule(s).
Submitted job 9591 with external jobid 'Job <983594> is submitted to queue <basement>.'.

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2025-11-03T112035.866671.snakemake.log

The actual error in 983205.err is an out-of-memory error:

##### ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. 
You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java.

The job allocates 3.5 GB of heap to GATK (gatk3 -Xmx3500M -Xms3500M), which seems to be insufficient for 91 BAMs. I see that the memory allocation scales with the number of BAMs in the Sentieon implementation, but it doesn't in the GATK3 implementation, and I can't see how to increase the memory allocation for this rule. Do you have any idea how I can prevent this from failing?
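To illustrate the kind of scaling I mean, a per-BAM memory formula could be sketched as a Snakemake-style callable resource. The names (gatk_mem_mb, BASE_MB, MB_PER_BAM) and the per-BAM increment are purely illustrative assumptions on my part, not SCAN2's actual Sentieon settings:

```python
# Sketch: scale the Java heap with the number of input BAMs, in the same
# spirit as the Sentieon rule. All names and constants below are assumed
# for illustration; they are not SCAN2's actual configuration.
BASE_MB = 3500      # the fixed heap the GATK3 rule currently uses
MB_PER_BAM = 150    # assumed extra headroom per input BAM

def gatk_mem_mb(n_bams: int) -> int:
    """Return a heap size (in MB) that grows with the number of BAMs."""
    return BASE_MB + MB_PER_BAM * n_bams

# For 91 BAMs this gives ~17 GB rather than the fixed 3.5 GB:
print(gatk_mem_mb(91))  # 17150
```

In a Snakemake rule this could then be wired up as something like `resources: mem_mb=lambda wc, input: gatk_mem_mb(len(input))`, so the bsub -M/-R requests and the -Xmx flag grow together.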
Thanks so much!
