Kdidi/ligand clean by kierandidi · Pull Request #361 · uw-ipd/tmol

kierandidi · 2026-04-12T23:47:31Z

Ligand Preparation Pipeline

Adds automated preparation of non-standard residues (ligands, modified amino acids, cofactors) for loading into tmol's PoseStack. Given a CIF/PDB structure with unknown residues, the pipeline detects them, builds RDKit molecules directly from atom arrays, protonates at physiological pH via Dimorphite-DL, assigns Rosetta-compatible atom types via OpenBabel, and registers the resulting residue types in the ChemicalDatabase -- so they load transparently alongside protein residues.
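To illustrate the detection step, here is a minimal sketch of filtering out residues whose names are not in a known standard set. It is not the code in `detect.py`: the real `detect_nonstandard_residues()` also classifies hits via the Biotite CCD, which this sketch does not attempt. `find_nonstandard`, `STANDARD_RESIDUES`, and the tuple-based atom records are hypothetical stand-ins for the per-atom annotations of an AtomArray.

```python
# Hypothetical "standard" set: the 20 canonical amino acids plus water.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HOH",
})


def find_nonstandard(atoms):
    """Group atom names of non-standard residues by (chain_id, res_id, res_name).

    `atoms` is an iterable of (chain_id, res_id, res_name, atom_name) tuples,
    an assumed simplification of an AtomArray's annotations.
    """
    groups = {}
    for chain_id, res_id, res_name, atom_name in atoms:
        if res_name in STANDARD_RESIDUES:
            continue  # standard residue or water: nothing to prepare
        groups.setdefault((chain_id, res_id, res_name), []).append(atom_name)
    return groups


atoms = [
    ("A", 1, "ALA", "CA"),
    ("A", 200, "I4B", "C1"),
    ("A", 200, "I4B", "C2"),
    ("A", 301, "HOH", "O"),
]
print(find_nonstandard(atoms))
```

Grouping by chain, residue number, and residue name keeps multiple copies of the same ligand separate, so each copy can later be mapped to the same prepared residue type.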

Pipeline

Biotite AtomArray (CIF/PDB)
│
├─ detect_nonstandard_residues()     # filter unknowns, classify via Biotite CCD
│
├─ ligand_atom_array_to_rdkit_mol()  # build RDKit Mol directly from AtomArray
│                                    # (preserves 3D coords, infers bonds if needed)
│
├─ protonate_ligand_mol()            # Dimorphite-DL on RDKit Mol at target pH
│                                    # (no SMILES roundtrip, operates on Mol directly)
│
├─ compute_mmff94_charges()          # RDKit MMFF94 partial charges
│
├─ rdkit_to_obmol()                  # thin MolBlock conversion for atom typing
├─ assign_tmol_atom_types()          # Rosetta generic_potential types (OpenBabel)
├─ build_residue_type()              # RawResidueType with atoms, bonds, icoors
│
├─ register_ligand()                 # add to ChemicalDatabase + CanonicalOrdering
│
└─ pose_stack_from_biotite()         # normal PoseStack loading
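For intuition about why the `protonate_ligand_mol()` step takes a target pH: the dominant protonation state of a single ionizable group follows from the Henderson-Hasselbalch relation. The sketch below is not Dimorphite-DL's algorithm and is not part of this PR; `fraction_deprotonated` is a hypothetical helper, and the pKa values are approximate textbook numbers.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an acid HA present as A- at the given pH.

    From Henderson-Hasselbalch: [A-]/[HA] = 10 ** (ph - pka).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate pKa values, for illustration only.
carboxylic_acid_pka = 4.76  # acetic acid
amine_conjugate_acid_pka = 10.6  # protonated primary amine, roughly

ph = 7.4
print("carboxylate fraction:", fraction_deprotonated(carboxylic_acid_pka, ph))
print("ammonium fraction:", 1.0 - fraction_deprotonated(amine_conjugate_acid_pka, ph))
```

At pH 7.4 both fractions are close to 1, which is why a carboxylic acid is typically modeled as a carboxylate and a primary amine as an ammonium at physiological pH.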

Changes

New module: `tmol/ligand/`

| File | Purpose |
| --- | --- |
| `detect.py` | Detect non-standard residues, classify as ligand vs modified AA via Biotite CCD |
| `smiles.py` | SMILES perception from AtomArray + Dimorphite-DL protonation (direct Mol, no SMILES roundtrip) |
| `mol3d.py` | MMFF94 partial charges via RDKit, thin OBMol conversion for atom typing |
| `atom_typing.py` | Rosetta atom type assignment, ported from `molfile_to_params.py` / `mol2genparams.py` |
| `residue_builder.py` | Build `RawResidueType` with atoms, bonds, internal coordinates, non-polymer properties |
| `registry.py` | Register new residue types into ChemicalDatabase, rebuild CanonicalOrdering, caching |
| `graph_match.py` | Graph isomorphism for CIF-to-generated atom name mapping |
| `params_io.py` | Read/write classic Rosetta `.params` files for interop |
| `dimorphite_dl.py` | Vendored Dimorphite-DL with `protonate_mol_variants()` (direct Mol-in/Mol-out) |
| `__init__.py` | Top-level `prepare_ligands()` and `prepare_single_ligand()` API |
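As an illustration of what "graph isomorphism for CIF-to-generated atom name mapping" means, here is a minimal backtracking sketch that matches two small molecular graphs by element and connectivity. It is not the implementation in `graph_match.py`; `match_atoms` and `_adjacency` are hypothetical names, and a real matcher would likely also handle bond orders, hydrogens, and symmetry-equivalent solutions.

```python
def _adjacency(n, bonds):
    """Build a list of neighbor sets from undirected (i, j) bond pairs."""
    adj = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def match_atoms(elems_a, bonds_a, elems_b, bonds_b):
    """Return a dict mapping atom indices in A to indices in B, or None.

    Atoms match only if elements and degrees agree, and every bond (or
    non-bond) to an already-mapped atom is preserved.
    """
    if len(elems_a) != len(elems_b):
        return None
    adj_a = _adjacency(len(elems_a), bonds_a)
    adj_b = _adjacency(len(elems_b), bonds_b)

    def extend(mapping, used):
        i = len(mapping)  # next atom of A to place
        if i == len(elems_a):
            return dict(enumerate(mapping))
        for j in range(len(elems_b)):
            if j in used or elems_a[i] != elems_b[j]:
                continue
            if len(adj_a[i]) != len(adj_b[j]):
                continue
            # Bonds to previously mapped atoms must agree in both graphs.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in range(i)):
                result = extend(mapping + [j], used | {j})
                if result is not None:
                    return result
        return None  # dead end: backtrack

    return extend([], frozenset())


# Heavy atoms of ethanol, listed in two different orders.
cif_names, cif_elems, cif_bonds = ["C1", "C2", "O1"], ["C", "C", "O"], [(0, 1), (1, 2)]
gen_names, gen_elems, gen_bonds = ["O1", "C2", "C1"], ["O", "C", "C"], [(0, 1), (1, 2)]

index_map = match_atoms(cif_elems, cif_bonds, gen_elems, gen_bonds)
name_map = {cif_names[i]: gen_names[j] for i, j in index_map.items()}
print(name_map)
```

The sketch returns the first valid mapping it finds; for symmetric molecules several mappings are equally valid, so a production matcher would need a tie-breaking rule.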

Database extensions

- Added Br (35) and I (53) to `element_types` in `chemical.yaml`
- Added 60+ Rosetta generic_potential atom types (CD, CR, Nim, Ohx, PG3, FR, ClR, BrR, IR, etc.) to `chemical.yaml` with donor/acceptor/hybridization properties
- Added corresponding LJLK parameters for all new atom types in `ljlk.yaml`
- Added HBond parameters for ligand donor/acceptor types in `hbond.yaml`
- Added `genbonded.yaml` scoring parameters

Integration with biotite loading

- Added `prepare_ligands=True` and `ligand_ph` parameters to `pose_stack_from_biotite()`
- Modified `canonical_form_from_biotite()` to accept a custom CanonicalOrdering
- Added defensive guards to the Dunbrack chi sampler and kinforest builder for non-polymer residues

Packer/rotamer changes

- `single_residue_kinforest.py`: Minimal stub kinforest for non-Dunbrack-eligible residues (ligands, non-alpha polymers)
- `dunbrack_chi_sampler.py`: Skip Dunbrack rotamer sampling for ligands
- `build_missing_sidechains.py`: Skip sidechain rebuilding for ligand residues

Usage

# Automatic: ligands are prepared while loading via pose_stack_from_biotite
# (atom_array is a Biotite AtomArray; device is the torch device to build on)
pose_stack = pose_stack_from_biotite(atom_array, device, prepare_ligands=True)

# Manual -- for more control
from tmol.ligand import prepare_ligands
param_db, canonical_ordering = prepare_ligands(atom_array, ph=7.4)

# Standalone modules
from tmol.ligand.detect import detect_nonstandard_residues
from tmol.ligand.smiles import ligand_atom_array_to_rdkit_mol, protonate_ligand_mol

Tests

3 real CIF test fixtures: 184l (I4B, small drug), 155c (HEM, large cofactor), 1a25 (PSE, partial occupancy)
24 tests in test_ligand_pipeline.py covering:
- Detection (no ligands in UBQ, I4B in 184l, HEM in 155c, PSE with partial occupancy)
- Full pipeline (I4B, HEM, PSE, ATP)
- Caching (dedup, pH-dependent keys, defensive copies)
- Scoring integration (electrostatics, cartbonded, hbond, LJLK)
- Atom typing correctness (aliphatic carbons, improper exclusion)
- PoseStack round-trip (build + score on GPU)
- Dunbrack escaping for ligand residues
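The caching behavior exercised by the tests (dedup, pH-dependent keys, defensive copies) can be sketched as follows. This is an assumption-laden illustration, not the cache in `registry.py`: `LigandCache`, its `(res_name, rounded pH)` key, and the dict-based residue type are all hypothetical.

```python
import copy


class LigandCache:
    """Cache prepared residue types, keyed by residue name and pH."""

    def __init__(self, ph_decimals=2):
        self._store = {}
        self._ph_decimals = ph_decimals
        self.builds = 0  # how many times the expensive build actually ran

    def get_or_build(self, res_name, ph, build):
        # pH is part of the key: the same ligand at a different pH may
        # have a different protonation state and thus a different type.
        key = (res_name, round(ph, self._ph_decimals))
        if key not in self._store:
            self._store[key] = build(res_name, ph)
            self.builds += 1
        # Defensive copy: callers may mutate the result without
        # corrupting the cached entry.
        return copy.deepcopy(self._store[key])


def fake_build(res_name, ph):
    """Stand-in for the real preparation pipeline."""
    return {"name": res_name, "ph": ph, "atoms": ["C1", "C2"]}


cache = LigandCache()
first = cache.get_or_build("I4B", 7.4, fake_build)
first["atoms"].append("corrupted")  # mutate the returned copy
second = cache.get_or_build("I4B", 7.4, fake_build)  # cache hit, clean copy
third = cache.get_or_build("I4B", 5.0, fake_build)  # different pH, new build
print(cache.builds, second["atoms"])
```

Returning deep copies trades some speed for safety; if residue types are treated as immutable downstream, the copy could be dropped.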

Dependencies

- `biotite>=1.2` (structure I/O)
- `openbabel-wheel==3.1.1.22` (atom typing only -- Rosetta AtomTypeClassifier port)
- `rdkit` (protonation, MMFF94 charges, SMILES perception)
- Dimorphite-DL vendored in-tree (no external dependency)

kierandidi added 8 commits April 12, 2026 16:35

refactored ligand logic

de788e6

format

fff2a5b

biotite dep

2cfd62a

fix h assignments

5832905

fix lazy imports and make tests pretty

d346c76

update ci with apptainer

6c9d2b6

update ci

0b67e24

fix formatting

2314fc3

kierandidi wants to merge 8 commits into master from kdidi/ligand_clean
