Hi!
I'm folding a variety of homomeric proteins and running into OOM errors on the larger folds. I originally tried to fold them all in one session, but the larger folds OOMed, which made me think the larger buckets couldn't be compiled alongside the smaller ones. I now run each bucket separately so that only one bucket size is folded per session, but the 4608 bucket still OOMs. I'm running on an H100 with 80 GB of GPU memory, so I'm a little confused why.
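For context, here is how I understand the bucketing behavior (my own sketch, not AF3's actual `pipeline.py` code, assuming the bucket is simply the smallest configured size that fits the token count):

```python
# Sketch of my understanding of AF3 bucket selection: pick the smallest
# configured bucket that fits the input and pad up to it. Each distinct
# bucket size triggers its own XLA compilation for that padded shape.
BUCKETS = [256, 512, 768, 1024, 1280, 1536, 2048, 2560,
           3072, 3584, 4096, 4608, 5120]

def pick_bucket(num_tokens, buckets=BUCKETS):
    for b in sorted(buckets):
        if b >= num_tokens:
            return b, b - num_tokens  # (bucket size, padded tokens)
    raise ValueError(f"{num_tokens} tokens exceeds the largest bucket")

# Matches the log further down: 4184 tokens -> bucket 4608, 424 padded tokens.
print(pick_bucket(4184))  # (4608, 424)
```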
nvidia-smi output:
Fri Dec 19 20:27:14 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.274.02 Driver Version: 535.274.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:33:00.0 Off | 0 |
| N/A 31C P0 113W / 700W | 1957MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
docker command:
(
set -x
docker run --rm \
--volume "$(pwd)/$bin_dir":/root/af_input \
--volume "$(pwd)/$bin_output_dir":/root/af_output \
--volume "$HOME/AF3_model":/root/models \
--volume "$HOME/AF3_db/sharded_databases":/root/public_databases \
--gpus all \
-e XLA_PYTHON_CLIENT_PREALLOCATE=true \
-e TF_FORCE_UNIFIED_MEMORY=true \
$AF_IMAGE python run_alphafold.py \
--input_dir=/root/af_input \
--model_dir=/root/models \
--output_dir=/root/af_output \
--run_data_pipeline=false \
--buckets=256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120
) 2>&1 | tee "${bin_output_dir}/${INPUT_BASENAME}_bin_${padded_num}.log"
log file from docker run:
+ docker run --rm --volume /home/ubuntu/remaining4/bin_02:/root/af_input --volume /home/ubuntu/remaining4_bin_02_output:/root/af_output --volume /home/ubuntu/AF3_model:/root/models --volume /home/ubuntu/AF3_db/sharded_databases:/root/public_databases --gpus all -e XLA_PYTHON_CLIENT_PREALLOCATE=true -e TF_FORCE_UNIFIED_MEMORY=true alphafold3 python run_alphafold.py --input_dir=/root/af_input --model_dir=/root/models --output_dir=/root/af_output --run_data_pipeline=false --buckets=256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120
I1219 18:12:08.387334 133477095690560 xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I1219 18:12:08.388193 133477095690560 xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
I1219 18:12:18.181271 133477095690560 pipeline.py:173] processing WP_147157570.1_copies_4, random_seed=1
I1219 18:12:18.784216 133477095690560 pipeline.py:266] Calculating bucket size for input with 4184 tokens.
I1219 18:12:18.784405 133477095690560 pipeline.py:272] Got bucket size 4608 for input with 4184 tokens, resulting in 424 padded tokens.
2025-12-19 18:13:21.515373: W external/xla/xla/service/hlo_rematerialization.cc:3005] Can't reduce memory use below 72.33GiB (77668778489 bytes) by rematerialization; only reduced to 82.68GiB (88779527564 bytes), down from 82.68GiB (88779527564 bytes) originally
2025-12-19 18:13:38.860537: W external/xla/xla/tsl/framework/bfc_allocator.cc:497] Allocator (GPU_0_bfc) ran out of memory trying to allocate 81.01GiB (rounded to 86984403200)requested by op
2025-12-19 18:13:38.861254: W external/xla/xla/tsl/framework/bfc_allocator.cc:508] ****________________________________________________________________________________________________
E1219 18:13:38.861577 1 pjrt_stream_executor_client.cc:3084] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 86984403160 bytes.
Running AlphaFold 3. Please note that standard AlphaFold 3 model parameters are
only available under terms of use provided at
https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md.
If you do not agree to these terms and are using AlphaFold 3 derived model
parameters, cancel execution of AlphaFold 3 inference with CTRL-C, and do not
use the model parameters.
Found local devices: [CudaDevice(id=0)], using device 0: cuda:0
Building model from scratch...
Checking that model parameters can be loaded...
Running fold job WP_147157570.1_copies_4...
Output will be written in /root/af_output/WP_147157570.1_copies_4
Skipping data pipeline...
Writing model input JSON to /root/af_output/WP_147157570.1_copies_4/WP_147157570.1_copies_4_data.json
Predicting 3D structure for WP_147157570.1_copies_4 with 1 seed(s)...
Featurising data with 1 seed(s)...
Featurising data with seed 1.
Featurising data with seed 1 took 48.11 seconds.
Featurising data with 1 seed(s) took 53.23 seconds.
Running model inference and extracting output structure samples with 1 seed(s)...
Running model inference with seed 1...
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 981, in <module>
    app.run(main)
  File "/alphafold3_venv/lib/python3.12/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/alphafold3_venv/lib/python3.12/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/app/alphafold/run_alphafold.py", line 963, in main
    process_fold_input(
  File "/app/alphafold/run_alphafold.py", line 797, in process_fold_input
    all_inference_results = predict_structure(
                            ^^^^^^^^^^^^^^^^^^
  File "/app/alphafold/run_alphafold.py", line 543, in predict_structure
    result = model_runner.run_inference(example, rng_key)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/alphafold/run_alphafold.py", line 438, in run_inference
    result = self._model(rng_key, featurised_example)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 86984403160 bytes.
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
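A quick sanity check on the numbers in that log (plain unit conversion, nothing AF3-specific): the single failed allocation is larger than the card's entire memory as reported by nvidia-smi, so no amount of freeing other on-device buffers could satisfy it.

```python
# Convert the figures from the OOM error and nvidia-smi into GiB.
requested_bytes = 86_984_403_160   # from the RESOURCE_EXHAUSTED error
gpu_total_mib = 81_559             # "81559MiB" from nvidia-smi

requested_gib = requested_bytes / 2**30
gpu_total_gib = gpu_total_mib / 1024

print(f"requested: {requested_gib:.2f} GiB")  # 81.01 GiB
print(f"GPU total: {gpu_total_gib:.2f} GiB")  # 79.65 GiB
print(requested_gib > gpu_total_gib)          # True
```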
When I update the command to enable unified-memory spillover, as below:
(
set -x
docker run --rm \
--volume "$(pwd)/$bin_dir":/root/af_input \
--volume "$(pwd)/$bin_output_dir":/root/af_output \
--volume "$HOME/AF3_model":/root/models \
--volume "$HOME/AF3_db/sharded_databases":/root/public_databases \
--gpus all \
-e XLA_PYTHON_CLIENT_PREALLOCATE=true \
-e TF_FORCE_UNIFIED_MEMORY=true \
-e XLA_CLIENT_MEM_FRACTION=3.2 \
$AF_IMAGE python run_alphafold.py \
--input_dir=/root/af_input \
--model_dir=/root/models \
--output_dir=/root/af_output \
--run_data_pipeline=false \
--buckets=256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120
) 2>&1 | tee "${bin_output_dir}/${INPUT_BASENAME}_bin_${padded_num}.log"
it is able to run, but the spillover cost is substantial: one 4608-bucket fold takes ~35 minutes, whereas the performance docs list ~24 minutes even for a 5120-bucket fold.
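If I read the performance docs correctly, XLA_CLIENT_MEM_FRACTION is a fraction of device memory, so a value above 1.0 combined with TF_FORCE_UNIFIED_MEMORY=true lets XLA oversubscribe into host RAM. A back-of-envelope estimate of the resulting pool (my interpretation, not verified against the XLA source):

```python
# Hypothetical estimate of the addressable pool with unified memory and
# XLA_CLIENT_MEM_FRACTION=3.2 on this card.
gpu_gib = 81_559 / 1024   # from nvidia-smi, ~79.65 GiB
mem_fraction = 3.2        # value suggested by the AF3 performance docs
pool_gib = gpu_gib * mem_fraction
print(f"addressable pool: {pool_gib:.0f} GiB")  # ~255 GiB; anything
# beyond the ~80 GiB of HBM spills to much slower host memory, which
# would explain the slowdown I'm seeing on the 4608 fold.
```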
I'm not entirely sure what's wrong here; any help would be much appreciated!