Merged
Changes from 12 commits
6 changes: 4 additions & 2 deletions models/rfd3/docs/examples/demo.json
Original file line number Diff line number Diff line change
@@ -11,7 +11,8 @@
       "A156": "NZ,CE,CD",
       "ACT": "OXT",
       "NAI": ""
-    }
+    },
+    "allow_ligand_on_existing_chain": true
   },
   "partial_diffusion": {
     "input": "../input_pdbs/7v11.pdb",
@@ -23,7 +24,8 @@
       "A431": "TIP",
       "A572": "BKBN",
       "A573": "BKBN"
-    }
+    },
+    "allow_ligand_on_existing_chain": true
   },
   "dsDNA_basic": {
     "input": "../input_pdbs/1bna.pdb",
3 changes: 2 additions & 1 deletion models/rfd3/src/rfd3/inference/input_parsing.py
@@ -1354,7 +1354,8 @@ def accumulate_components(
         np.zeros(token.shape[0], dtype=int),
     )
 else:
-    n = int(component)
+    ## foundry components update sends P for protein tokens
+    n = int(component[:-1])
     # ... Skip if none or unindexed
     if n == 0 or unindexed_components_started:
         res_id += n
3 changes: 2 additions & 1 deletion models/rfd3/src/rfd3/inference/legacy_input_parsing.py
@@ -541,7 +541,8 @@ def accumulate_components(
         np.ones(atom_array_insert.shape[0], dtype=int),
     )
 else:
-    n = int(component)
+    ## foundry components update sends P for protein tokens
+    n = int(component[:-1])
     if n == 0 or unindexed_components_started:
         res_id += n
         continue
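The two hunks above strip the last character unconditionally, which relies on every non-indexed component carrying a one-letter polymer suffix (the comment mentions `P` for protein tokens). A minimal sketch of a more defensive parse follows. It assumes the suffix is a single trailing letter and that bare integers may still reach this branch; the helper name `split_component_count` is hypothetical and not part of the repository.

```python
from typing import Optional, Tuple


def split_component_count(component: str) -> Tuple[int, Optional[str]]:
    """Split a component such as '12P' into (12, 'P').

    Hypothetical helper: assumes at most one trailing letter marks the
    polymer type and that a bare integer string such as '12' is still valid.
    """
    component = component.strip()
    if component and component[-1].isalpha():
        # Trailing letter present: everything before it is the count.
        return int(component[:-1]), component[-1]
    # No suffix: the whole string is the count.
    return int(component), None


n, polymer = split_component_count("12P")
```

With the unconditional `component[:-1]`, a bare `"12"` would be parsed as `1`, so a check like this only matters if unsuffixed components can still occur.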
Binary file added models/rfd3na/.assets/dna.png
Binary file added models/rfd3na/.assets/enzyme.png
Binary file added models/rfd3na/.assets/ppi.png
Binary file added models/rfd3na/.assets/sm.png
Binary file added models/rfd3na/.assets/symm.png
Binary file added models/rfd3na/.assets/trajectory.png
3 changes: 3 additions & 0 deletions models/rfd3na/.gitignore
@@ -0,0 +1,3 @@
tests/test_data
**.dat
.env
193 changes: 193 additions & 0 deletions models/rfd3na/README.md
@@ -0,0 +1,193 @@
# De novo Design of Biopolymers with Atomic Functional Sites using RFdiffusion3

RFdiffusion3NA (RFD3NA) is an expanded version of RFdiffusion3 that can design multipolymer structures (including protein-DNA-RNA) under complex constraints.

This repository contains both the training and inference code, and
both are described in more detail below.


<p align="center">
<img src="docs/.assets/rfd3na_rnasep.jpg" alt="All-atom design with RFD3NA">
</p>

## Getting Started
1. Install RFdiffusion3NA.
If you have already installed all the models and **are not** interested in hydrogen bond conditioning, skip to [Running Inference](#running-inference). <br><br>
If you have already installed all the models and **are** interested in hydrogen bond conditioning, skip to [Hydrogen Bond Conditioning](#hydrogen-bond-conditioning). <br><br>
If you would like to install all of the foundry models (recommended), see the [foundry README](../../README.md) for instructions. <br><br>
If you would like to install only RFD3NA:
```bash
pip install "rc-foundry[rfd3na]"  # quoted so shells such as zsh do not glob the brackets
```

2. Download the checkpoint to your desired checkpoint location.
```bash
foundry install rfd3na --checkpoint-dir <path/to/ckpt/dir>
```
This sets `FOUNDRY_CHECKPOINT_DIRS`, so that foundry subsequently looks for checkpoints in that directory (alongside the default `~/.foundry/checkpoints` location), allowing you to run inference without supplying the checkpoint path. The `--checkpoint-dir` argument is optional and defaults to `~/.foundry/checkpoints` if unset.
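
As an illustration of the lookup described above, here is a minimal sketch of how a search over several checkpoint directories could work. It is not foundry's actual implementation: the function names are hypothetical, and both the use of the platform path separator inside `FOUNDRY_CHECKPOINT_DIRS` and the "user directories first, default last" order are assumptions.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

DEFAULT_DIR = Path.home() / ".foundry" / "checkpoints"


def checkpoint_search_dirs(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return the directories to search, user-configured ones first.

    Assumption: FOUNDRY_CHECKPOINT_DIRS holds one or more directories
    joined by os.pathsep (':' on Linux/macOS).
    """
    env = os.environ if env is None else env
    raw = env.get("FOUNDRY_CHECKPOINT_DIRS", "")
    dirs = [Path(p).expanduser() for p in raw.split(os.pathsep) if p]
    if DEFAULT_DIR not in dirs:
        dirs.append(DEFAULT_DIR)  # the default location is always searched
    return dirs


def find_checkpoint(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing file called `name`, or None if absent."""
    for directory in checkpoint_search_dirs(env):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_checkpoint("rfd3na-1190.ckpt")` would then return the path of the recommended checkpoint if it exists in any searched directory.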

Recommended checkpoint (default): https://files.ipd.uw.edu/pub/rfdiffusion3na/rfd3na-1190.ckpt

Preprint Figure 2 checkpoint: https://files.ipd.uw.edu/pub/rfdiffusion3na/rfd3na-890.ckpt

### Hydrogen Bond Conditioning
If you would like to use hydrogen bond conditioning in your designs,
you need to install [HBPLUS](https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/). This is **not** installed by default:

3. Download HBPLUS from here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/download.html (available for free)
4. Follow the installation instructions here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/install.html
5. Update `HBPLUS_PATH` in `foundry/.env` file with the path to your `hbplus` executable.

## Running Inference

Below is a quick inference example to run to test that your setup
is working correctly.

To run inference (with foundry installed in your environment, or RFD3 & Foundry src in PYTHONPATH):
```bash
rfd3na design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3na/docs/examples/atom23_design.json skip_existing=False dump_trajectories=True prevalidate_inputs=True read_sequence_from_sequence_head=False
```

`read_sequence_from_sequence_head=False` is the recommended global setting for RFD3NA.

The input specification follows the same concepts as in RFD3.

The main modification is that you can now append an `R` or `D` suffix to a contig part to request RNA or DNA generation. For example, `10-10,20-20R,30-30D` would generate a protein chain of length 10, an RNA chain of length 20 and a DNA chain of length 30.
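
To make the suffix convention concrete, the sketch below parses such a string into length ranges and polymer types. It is only an illustration of the notation, not the parser used by RFD3NA: `ContigPart` and `parse_generated_contig` are hypothetical names, and the sketch handles only generated-length parts (no indexed segments such as `A1-10`).

```python
from typing import List, NamedTuple


class ContigPart(NamedTuple):
    min_len: int
    max_len: int
    polymer: str  # "protein", "rna" or "dna"


# Suffix letters as described in the README; no suffix means protein.
_SUFFIX_TO_POLYMER = {"R": "rna", "D": "dna"}


def parse_generated_contig(contig: str) -> List[ContigPart]:
    """Parse e.g. '10-10,20-20R,30-30D' into typed length ranges."""
    parts = []
    for raw in contig.split(","):
        token = raw.strip()
        polymer = "protein"
        if token and token[-1] in _SUFFIX_TO_POLYMER:
            polymer = _SUFFIX_TO_POLYMER[token[-1]]
            token = token[:-1]
        low, sep, high = token.partition("-")
        min_len = int(low)
        # A bare number such as '10' is treated as a fixed length.
        max_len = int(high) if sep else min_len
        parts.append(ContigPart(min_len, max_len, polymer))
    return parts


parts = parse_generated_contig("10-10,20-20R,30-30D")
```

Here `parts` holds three entries: a protein part of length 10, an RNA part of length 20 and a DNA part of length 30.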

See the RFD3 [external documentation](https://rosettacommons.github.io/foundry/models/rfd3/index.html#general) for more details on how to specify your design constraints and the output directory (`out_dir`) where you want to store the files RFD3NA generates.

A few optional (but useful!) settings are included in the command above:
- `dump_trajectories`: Dumps trajectory structures, which can be useful for debugging your setup or making cool gifs. However, trajectory files are large, so this setting is False by default.
- `prevalidate_inputs`: Checks that your inputs are valid before running inference. Helpful if your JSON/YAML has a number of different configs you want to debug or double-check before loading the checkpoints.
- `skip_existing`: Skips any calculation whose output files already exist in the same place with the same name. If you are testing your setup multiple times, set `skip_existing=False` (as above) so that RFD3NA actually runs each time.

## Further example JSONs for different applications of RFD3NA
Additional examples are available. If you have cloned the
repository, matching `.json` files are in `foundry/models/rfd3/docs/examples`
and can be run directly, as in the previous example.

<div align="center">
<h2>
<a href="./docs/examples/atom23_design.md"><sub></sub>Multipolymer design examples</a>
</h2>
<img src="docs/.assets/atom23.gif" height="500">
</div>


In the examples, the paths to the input files are specified assuming
that you are running the examples from the `foundry/models/rfd3/docs/examples`
directory. If you would like to run RFD3NA from a different location,
you will need to change the path in the `.json` file(s) before running.

There are various interesting ways to use RFD3NA, as it is trained on a large array of different tasks for both proteins and nucleic acids.
For example, you can fix the sequence but not the structure (prediction-type task), fix the backbone and unfix the sequence (MPNN-type inverse folding), or unfix only the sidechains (PLACER/ChemNet-style):

<p align="center">
<img src="docs/.assets/conditioning.png" alt="Conditioning options for RFD3NA">
</p>

For full details on how to specify inputs, see the [input specification documentation](./docs/input.md). You can also see `foundry/models/rfd3/configs/inference_engine/rfdiffusion3.yaml` for even more options.
The `BKBN` and `TIP` shorthands do not apply to nucleic acids, but the same functionality is available: specify the corresponding atom names explicitly.


## Training and Fine-Tuning

We make available to the community not only the weights to run RFdiffusion3NA but also the complete training code, easily extendable to additional use cases. Any AtomWorks-compatible dataset (and thus, any collection of structure files) can be readily incorporated and used for training or fine-tuning.

### Dataset Configuration

#### PDB Training

To train on the PDB:

1. Set up PDB and CCD mirrors as described in the [AtomWorks documentation](https://rosettacommons.github.io/atomworks/latest/mirrors.html)
2. Update the [path configs](/models/rfd3na/configs/paths/) to point to the correct base directories for the metadata parquets
3. Set the `PDB_MIRROR` and `CCD_PATH` variables in your `.env` file

#### Custom Datasets

RFdiffusion3NA supports arbitrary datasets of structure files for training and fine-tuning via AtomWorks. See the [AtomWorks dataset documentation](https://rosettacommons.github.io/atomworks/latest/auto_examples/dataset_exploration.html) for details on creating custom datasets.

### Running Training

After setting up Hydra configs, launch a training run:
```bash
uv run python models/rfd3na/src/rfd3na/train.py experiment=rfd3na ckpt_path=<path/to/ckpt>
```

Supplying `ckpt_path=null` (default) will start with fresh weights.
See the [path configs](/models/rfd3na/configs/paths/) to customize data input and log output directories.

### Logging Configuration

Training runs support logging via [Weights & Biases](https://wandb.ai/). To enable wandb logging:

```bash
uv run python models/rfd3na/src/rfd3na/train.py experiment=rfd3na logger=wandb
```

To run training without wandb (default):
```bash
uv run python models/rfd3na/src/rfd3na/train.py experiment=rfd3na logger=csv
```

### Install HBPLUS for training with hydrogen bond conditioning:

1. Download hbplus from here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/download.html (available for free)
2. Follow the installation instructions here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/install.html
3. Update `HBPLUS_PATH` in `foundry/.env` file with the path to your `hbplus` executable.

## Distributed Training
For distributed training, you can use a command such as the following (we use Lightning Fabric to handle DDP):
```
EFFECTIVE_BATCH_SIZE=16
DEVICES_PER_NODE=  # INSERT NUMBER OF DEVICES PER NODE
NNODES=  # INSERT NUMBER OF NODES (should match $SLURM_NNODES)
GRAD_ACCUM_STEPS=$((EFFECTIVE_BATCH_SIZE / (DEVICES_PER_NODE * NNODES)))
srun --kill-on-bad-exit uv run python models/rfd3na/src/rfd3na/train.py \
    experiment=pretrain \
    trainer.devices_per_node=$DEVICES_PER_NODE \
    trainer.num_nodes=$SLURM_NNODES \
    trainer.grad_accum_steps=$GRAD_ACCUM_STEPS
```
Notably, Fabric must receive both `devices_per_node` and the number of nodes (`num_nodes`) you're training on.
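
The gradient accumulation arithmetic in the command above can be sketched in Python as follows. The shell expression uses integer division, which silently rounds down when the effective batch size is not a multiple of the device count, so this sketch adds an explicit check. The function name is hypothetical, and the per-device batch size of 1 is an assumption taken from the `batch_size: 1` dataloader default.

```python
def grad_accum_steps(
    effective_batch_size: int,
    devices_per_node: int,
    num_nodes: int,
    per_device_batch_size: int = 1,  # assumption: matches dataloader batch_size
) -> int:
    """Number of accumulation steps needed to reach the effective batch size."""
    samples_per_step = devices_per_node * num_nodes * per_device_batch_size
    if samples_per_step <= 0:
        raise ValueError("devices, nodes and batch size must all be positive")
    steps, remainder = divmod(effective_batch_size, samples_per_step)
    if steps == 0 or remainder != 0:
        raise ValueError(
            f"effective batch size {effective_batch_size} is not a positive "
            f"multiple of {samples_per_step} samples per optimizer step"
        )
    return steps


# Two nodes with four devices each, effective batch size 16.
steps = grad_accum_steps(16, devices_per_node=4, num_nodes=2)
```

With these values each forward/backward pass covers 8 samples, so `steps` is 2.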

**Dataset Paths:** See the paths [configs](/models/rfd3na/configs/paths/) to customize the paths where data is read from and where logs are written. There is also a wandb config that can be enabled if you want to log training through wandb.

**Hydra configs and experiments:** In the example above, the `experiment` argument is a Hydra-native argument. For RFD3NA, Hydra looks for config overrides in `/models/rfd3na/configs/experiment/<experiment-name>.yaml` and applies them on top of the base configs.
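
"Applying overrides on top of the base configs" can be pictured as a recursive dictionary merge. The sketch below is a simplification in plain Python, not Hydra's or OmegaConf's actual merge logic (which also handles interpolation, defaults lists and type checks). The example values are taken from `train_logging.yaml` and `design_callbacks.yaml` in this change set; `deep_merge` is a hypothetical helper.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` values win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Base values as in train_logging.yaml.
base = {
    "log_af3_training_losses_callback": {
        "log_every_n": 10,
        "log_full_batch_losses": True,
    },
    "log_learning_rate_callback": {"log_every_n": 10},
}
# Overrides as in design_callbacks.yaml.
override = {
    "log_learning_rate_callback": {"log_every_n": 25},
    "log_af3_training_losses_callback": {
        "log_full_batch_losses": False,
        "log_every_n": 25,
    },
}
merged = deep_merge(base, override)
```

Keys that the override does not mention keep their base values, which is why an experiment file only needs to list what it changes.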

**Conditioning during training:** RFD3NA is trained on a multitude of conditioning tasks by randomly 'creating problems' for the model to solve during training. For example, a random training example gets a random set of tokens designated as 'motif tokens'; these are then subset by which specific atoms should be fixed, and further by which information (say, sequence, coordinates or sequence index) should be fixed. This is pretty complicated to evaluate, and how it was put together was more of an art than a science. There's likely still room for
further optimization!

`models/rfd3na/configs/datasets/design_base_rfd3na.yaml` holds the configs shared by all datasets under `global_transform_args`. The dials that control the conditioning described above live under `training_conditions`, where you can find, for example, `tipatom` (a preset conditioning sampler that more frequently fixes few tokens with few atoms) alongside other conditions.
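
The 'creating problems' idea can be illustrated with a toy sampler. This is a loose sketch, not the code in `rfd3na.transforms.training_conditions`: the function names are hypothetical, the default values are copied from `conditions/island.yaml`, and treating each probability as a single per-example coin flip is a simplifying assumption.

```python
import random
from typing import Dict, List, Union


def sample_islands(
    n_tokens: int,
    rng: random.Random,
    island_len_min: int = 1,
    island_len_max: int = 12,
    n_islands_min: int = 2,
    n_islands_max: int = 5,
) -> List[bool]:
    """Mark random contiguous 'islands' of tokens as motif tokens."""
    mask = [False] * n_tokens
    for _ in range(rng.randint(n_islands_min, n_islands_max)):
        # Clamp the island length so it always fits into the example.
        length = min(rng.randint(island_len_min, island_len_max), n_tokens)
        start = rng.randint(0, n_tokens - length)
        for i in range(start, start + length):
            mask[i] = True  # islands may overlap in this toy version
    return mask


def sample_condition(
    n_tokens: int,
    rng: random.Random,
    p_fix_motif_sequence: float = 0.2,
    p_fix_motif_coordinates: float = 0.8,
    p_unindex_motif_tokens: float = 0.5,
) -> Dict[str, Union[List[bool], bool]]:
    """Draw one random conditioning 'problem' for a training example."""
    return {
        "is_motif_token": sample_islands(n_tokens, rng),
        "fix_sequence": rng.random() < p_fix_motif_sequence,
        "fix_coordinates": rng.random() < p_fix_motif_coordinates,
        "unindex": rng.random() < p_unindex_motif_tokens,
    }


condition = sample_condition(n_tokens=50, rng=random.Random(0))
```

Under this toy model, a very large `island_len_min`, as in `conditions/sequence_design.yaml`, makes a single island cover every token, which turns the task into sequence design over the whole structure.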

**Training with WandB:** We strongly recommend tracking your runs via wandb. To use it, simply have your WANDB_API_KEY set and use the wandb logger. For more details see [here](https://wandb.ai/site/)

# Appendix

## Install HBPLUS for hydrogen bond conditioning:
One of the examples shows how to incorporate hydrogen bond conditioning
into your designs. To make use of this feature, you will need to
additionally complete the following steps:

1. Download hbplus from here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/download.html (available for free)
2. Follow the installation instructions here: https://www.ebi.ac.uk/thornton-srv/software/HBPLUS/install.html
3. Update `HBPLUS_PATH` in `foundry/.env` file with the path to your `hbplus` executable.

## Citation

If you use this code or data in your work, please consider citing:

```bibtex
@article {butcher2025_rfdiffusion3,
author = {Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
title = {De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
elocation-id = {2025.09.18.676967},
year = {2025},
doi = {10.1101/2025.09.18.676967},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2025/11/19/2025.09.18.676967},
eprint = {https://www.biorxiv.org/content/early/2025/11/19/2025.09.18.676967.full.pdf},
journal = {bioRxiv}
}
```
Empty file.
11 changes: 11 additions & 0 deletions models/rfd3na/configs/callbacks/design_callbacks.yaml
@@ -0,0 +1,11 @@
defaults:
- train_logging
- metrics_logging
- _self_

log_learning_rate_callback:
log_every_n: 25 # default 10

log_af3_training_losses_callback:
log_full_batch_losses: False
log_every_n: 25 # default 10
20 changes: 20 additions & 0 deletions models/rfd3na/configs/callbacks/metrics_logging.yaml
@@ -0,0 +1,20 @@
store_validation_metrics_in_df_callback:
_target_: foundry.callbacks.metrics_logging.StoreValidationMetricsInDFCallback
save_dir: ${paths.output_dir}/val_metrics
metrics_to_save: "all"

dump_validation_structures_callback:
_target_: rfd3na.trainer.dump_validation_structures.DumpValidationStructuresCallback
save_dir: ${paths.output_dir}/val_structures
dump_predictions: True
dump_prediction_metadata_json: True
dump_trajectories: False
dump_denoised_trajectories_only: False

one_model_per_file: True
dump_every_n: 4
align_trajectories: False
verbose: False

log_design_validation_metrics_callback:
_target_: rfd3na.callbacks.LogDesignValidationMetricsCallback
24 changes: 24 additions & 0 deletions models/rfd3na/configs/callbacks/train_logging.yaml
@@ -0,0 +1,24 @@
log_af3_training_losses_callback:
_target_: foundry.callbacks.train_logging.LogAF3TrainingLossesCallback
log_every_n: 10
log_full_batch_losses: true

log_learning_rate_callback:
_target_: foundry.callbacks.train_logging.LogLearningRateCallback
log_every_n: 10

log_model_parameters_callback:
_target_: foundry.callbacks.train_logging.LogModelParametersCallback

log_dataset_sampling_ratios_callback:
_target_: foundry.callbacks.train_logging.LogDatasetSamplingRatiosCallback

# Optional health logging
# activations_tracking_callback:
# _target_: foundry.callbacks.health_logging.ActivationsGradientsWeightsTracker
# log_freq: 100
# keep_cache: True # --> WARNING: Do not run this in a production run, this will lead to a memory leak! Meant for debugging.
# activations_tracking_callback:
# _target_: foundry.callbacks.health_logging.ActivationsGradientsWeightsTracker
# log_freq: 100
# keep_cache: True # --> WARNING: Do not run this in a production run, this will lead to a memory leak! Meant for debugging.
15 changes: 15 additions & 0 deletions models/rfd3na/configs/dataloader/default.yaml
@@ -0,0 +1,15 @@
train:
dataloader_params:
# These parameters will be unpacked as kwargs for the DataLoader
batch_size: 1
num_workers: 2
prefetch_factor: 3
n_fallback_retries: 4

val:
dataloader_params:
# These parameters will be unpacked as kwargs for the DataLoader
batch_size: 1
num_workers: 2
prefetch_factor: 3
n_fallback_retries: 0 # Disable fallback retries for validation
11 changes: 11 additions & 0 deletions models/rfd3na/configs/dataloader/fast.yaml
@@ -0,0 +1,11 @@
defaults:
- default

train:
dataloader_params:
num_workers: 2
prefetch_factor: 6
val:
dataloader_params:
num_workers: 2
prefetch_factor: 6
3 changes: 3 additions & 0 deletions models/rfd3na/configs/datasets/conditions/dna_condition.yaml
@@ -0,0 +1,3 @@
_target_: rfd3na.transforms.training_conditions.SubtypeCondition
frequency: 1.0
subtype: ["is_dna", "is_rna"]
28 changes: 28 additions & 0 deletions models/rfd3na/configs/datasets/conditions/island.yaml
@@ -0,0 +1,28 @@

_target_: rfd3na.transforms.training_conditions.IslandCondition
frequency: 1.0
name: island

# Island sampling (`is_motif_token` assignment)
island_sampling_kwargs:
island_len_min: 1
island_len_max: 12 # Rec 25, kept lower because unindexed motifs get sampled too and create more tokens.
n_islands_min: 2
n_islands_max: 5

# Subgraph / within-token sampling (`is_motif_atom` assignment)
p_diffuse_motif_sidechains: 0.80 # 80% probability of diffusing sidechains
p_diffuse_subgraph_atoms: 0.0 # 0% probability of sampling subgraph atoms (defaults to fully fixed)
subgraph_sampling_kwargs: # see tipatom
residue_p_seed_furthest_from_o: null
residue_n_bond_expectation: null
residue_p_fix_all: null
hetatom_n_bond_expectation: null
hetatom_p_fix_all: null

# Sets `is_motif_atom_with_fixed_seq`
p_fix_motif_sequence: 0.2 # probability that sequence is fixed for all motifs during training
# Sets `is_motif_atom_with_fixed_coord`
p_fix_motif_coordinates: 0.8 # Of the atoms that are sampled, should their coordinates be fixed?
# Sets `is_motif_atom_with_unindexed`
p_unindex_motif_tokens: 0.5 # probability of unindexing all motif atoms
2 changes: 2 additions & 0 deletions models/rfd3na/configs/datasets/conditions/ppi.yaml
@@ -0,0 +1,2 @@
_target_: rfd3na.transforms.training_conditions.PPICondition
frequency: 1.0
17 changes: 17 additions & 0 deletions models/rfd3na/configs/datasets/conditions/sequence_design.yaml
@@ -0,0 +1,17 @@
defaults:
- island
- _self_

frequency: 1.0
name: sequence_design

island_sampling_kwargs:
island_len_min: 99999
island_len_max: 999999999

p_diffuse_motif_sidechains: 1.0
p_unindex_motif_tokens: 0.0

# For ChemNet-style sampling < 1.0
p_fix_motif_coordinates: 0.8
p_fix_motif_sequence: 0.1