Kdidi/ligand clean #361

Open
kierandidi wants to merge 8 commits into master from kdidi/ligand_clean

Collaborator

@kierandidi kierandidi commented Apr 12, 2026

Ligand Preparation Pipeline

Adds automated preparation of non-standard residues (ligands, modified amino acids, cofactors) for loading into tmol's PoseStack. Given a CIF/PDB structure with unknown residues, the pipeline detects them, builds RDKit molecules directly from the atom arrays, protonates them at physiological pH via Dimorphite-DL, computes MMFF94 partial charges with RDKit, assigns Rosetta-compatible atom types via OpenBabel, and registers the resulting residue types in the ChemicalDatabase. The new residue types then load transparently alongside protein residues.

Pipeline

Biotite AtomArray (CIF/PDB)
│
├─ detect_nonstandard_residues()     # filter unknowns, classify via Biotite CCD
│
├─ ligand_atom_array_to_rdkit_mol()  # build RDKit Mol directly from AtomArray
│                                    # (preserves 3D coords, infers bonds if needed)
│
├─ protonate_ligand_mol()            # Dimorphite-DL on RDKit Mol at target pH
│                                    # (no SMILES roundtrip, operates on Mol directly)
│
├─ compute_mmff94_charges()          # RDKit MMFF94 partial charges
│
├─ rdkit_to_obmol()                  # thin MolBlock conversion for atom typing
├─ assign_tmol_atom_types()          # Rosetta generic_potential types (OpenBabel)
├─ build_residue_type()              # RawResidueType with atoms, bonds, icoors
│
├─ register_ligand()                 # add to ChemicalDatabase + CanonicalOrdering
│
└─ pose_stack_from_biotite()         # normal PoseStack loading
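
To make the first stage concrete, here is a minimal sketch of the "filter unknowns" idea in plain Python. It is not the code in detect.py: the PR classifies residues via the Biotite CCD, whereas this sketch assumes a hard-coded set of the 20 canonical amino acids, a hypothetical water skip list, and plain tuples standing in for a Biotite AtomArray. The name find_nonstandard is made up for illustration.

```python
# Assumption for this sketch: "standard" means the 20 canonical amino acids.
# The PR's detect.py uses the Biotite CCD for classification instead.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})
# Hypothetical skip list: waters are not treated as ligands here.
SKIPPED_RESIDUES = frozenset({"HOH"})


def find_nonstandard(atoms):
    """Group atoms of unknown residues by (chain_id, res_id, res_name).

    `atoms` is an iterable of (chain_id, res_id, res_name, atom_name)
    tuples, a plain-Python stand-in for a Biotite AtomArray.
    """
    groups = {}
    for chain_id, res_id, res_name, atom_name in atoms:
        if res_name in STANDARD_RESIDUES or res_name in SKIPPED_RESIDUES:
            continue
        groups.setdefault((chain_id, res_id, res_name), []).append(atom_name)
    return groups


atoms = [
    ("A", 1, "MET", "N"),
    ("A", 1, "MET", "CA"),
    ("A", 400, "I4B", "C1"),
    ("A", 400, "I4B", "C2"),
    ("A", 501, "HOH", "O"),
]
ligands = find_nonstandard(atoms)
print(ligands)
```

Each group returned this way corresponds to one residue instance that the later stages (RDKit Mol construction, protonation, typing) would process.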

Changes

New module: tmol/ligand/

File                Purpose
detect.py           Detect non-standard residues, classify as ligand vs modified AA via Biotite CCD
smiles.py           SMILES perception from AtomArray + Dimorphite-DL protonation (direct Mol, no SMILES roundtrip)
mol3d.py            MMFF94 partial charges via RDKit, thin OBMol conversion for atom typing
atom_typing.py      Rosetta atom type assignment, ported from molfile_to_params.py / mol2genparams.py
residue_builder.py  Build RawResidueType with atoms, bonds, internal coordinates, non-polymer properties
registry.py         Register new residue types into ChemicalDatabase, rebuild CanonicalOrdering, caching
graph_match.py      Graph isomorphism for CIF-to-generated atom name mapping
params_io.py        Read/write classic Rosetta .params files for interop
dimorphite_dl.py    Vendored Dimorphite-DL with protonate_mol_variants() (direct Mol-in/Mol-out)
__init__.py         Top-level prepare_ligands() and prepare_single_ligand() API
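
For readers unfamiliar with the graph-matching step, the sketch below shows one way to map CIF atom names onto generated atom names by element-labelled graph isomorphism. It is an illustrative backtracking search written for this description, not the algorithm in graph_match.py; the function name match_atoms and the dict/pair input format are assumptions.

```python
def match_atoms(elems_a, bonds_a, elems_b, bonds_b):
    """Return a dict mapping atom names of graph A onto graph B, or None.

    elems_*: dict of atom name -> element symbol
    bonds_*: iterable of (atom name, atom name) pairs, undirected
    """
    if len(elems_a) != len(elems_b):
        return None

    def adjacency(elems, bonds):
        adj = {name: set() for name in elems}
        for u, v in bonds:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    adj_a = adjacency(elems_a, bonds_a)
    adj_b = adjacency(elems_b, bonds_b)
    # Place high-degree atoms first so mismatches are found early.
    order = sorted(elems_a, key=lambda name: -len(adj_a[name]))

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        a = order[len(mapping)]
        for b in elems_b:
            if b in used or elems_a[a] != elems_b[b]:
                continue
            if len(adj_a[a]) != len(adj_b[b]):
                continue
            # Bonds to already-mapped atoms must agree in both graphs.
            if all((mapping[n] in adj_b[b]) == (n in adj_a[a]) for n in mapping):
                mapping[a] = b
                used.add(b)
                result = extend(mapping, used)
                if result is not None:
                    return result
                del mapping[a]
                used.discard(b)
        return None

    return extend({}, set())


# Ethanol heavy atoms: CIF-style names vs. hypothetical generated names.
cif_elems = {"C1": "C", "C2": "C", "O1": "O"}
cif_bonds = [("C1", "C2"), ("C2", "O1")]
gen_elems = {"a0": "O", "a1": "C", "a2": "C"}
gen_bonds = [("a0", "a1"), ("a1", "a2")]
name_map = match_atoms(cif_elems, cif_bonds, gen_elems, gen_bonds)
print(name_map)
```

For symmetric molecules several mappings are valid and this sketch returns the first one it finds; a production matcher would typically also compare bond orders or use geometry to break ties.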

Database extensions

  • Added Br (35) and I (53) to element_types in chemical.yaml
  • Added 60+ Rosetta generic_potential atom types (CD, CR, Nim, Ohx, PG3, FR, ClR, BrR, IR, etc.) to chemical.yaml with donor/acceptor/hybridization properties
  • Added corresponding LJLK parameters for all new atom types in ljlk.yaml
  • Added HBond parameters for ligand donor/acceptor types in hbond.yaml
  • Added genbonded.yaml scoring parameters

Integration with biotite loading

  • Added prepare_ligands=True and ligand_ph parameters to pose_stack_from_biotite()
  • Modified canonical_form_from_biotite() to accept a custom CanonicalOrdering
  • Defensive guards in Dunbrack chi sampler and kinforest builder for non-polymer residues

Packer/rotamer changes

  • single_residue_kinforest.py: Minimal stub kinforest for non-Dunbrack-eligible residues (ligands, non-alpha polymers)
  • dunbrack_chi_sampler.py: Skip Dunbrack rotamer sampling for ligands
  • build_missing_sidechains.py: Skip sidechain rebuilding for ligand residues

Usage

# Automatic -- just works with pose_stack_from_biotite
pose_stack = pose_stack_from_biotite(atom_array, device, prepare_ligands=True)

# Manual -- for more control
from tmol.ligand import prepare_ligands
param_db, canonical_ordering = prepare_ligands(atom_array, ph=7.4)

# Standalone modules
from tmol.ligand.detect import detect_nonstandard_residues
from tmol.ligand.smiles import ligand_atom_array_to_rdkit_mol, protonate_ligand_mol

Tests

  • 3 real CIF test fixtures: 184l (I4B, small drug), 155c (HEM, large cofactor), 1a25 (PSE, partial occupancy)
  • 24 tests in test_ligand_pipeline.py covering:
    • Detection (no ligands in UBQ, I4B in 184l, HEM in 155c, PSE with partial occupancy)
    • Full pipeline (I4B, HEM, PSE, ATP)
    • Caching (dedup, pH-dependent keys, defensive copies)
    • Scoring integration (electrostatics, cartbonded, hbond, LJLK)
    • Atom typing correctness (aliphatic carbons, improper exclusion)
    • PoseStack round-trip (build + score on GPU)
    • Dunbrack sampling skipped for ligand residues
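
The caching behaviours exercised by the tests (dedup, pH-dependent keys, defensive copies) can be illustrated with a small standalone sketch. This is not the code in registry.py: the class name LigandCache, the (residue name, pH) key, and the fake_build stand-in are all assumptions made for illustration.

```python
import copy


class LigandCache:
    """Memoize prepared residue types keyed on (residue name, pH)."""

    def __init__(self, build_fn):
        # Hypothetical builder: (res_name, ph) -> prepared residue type.
        self._build_fn = build_fn
        self._store = {}
        self.builds = 0

    def get(self, res_name, ph=7.4):
        # The pH is part of the key, so the same ligand prepared at a
        # different pH gets its own entry.
        key = (res_name, round(ph, 2))
        if key not in self._store:
            self._store[key] = self._build_fn(res_name, ph)
            self.builds += 1
        # Defensive copy: callers cannot mutate the cached object.
        return copy.deepcopy(self._store[key])


def fake_build(res_name, ph):
    # Stand-in for the real preparation pipeline.
    return {"name": res_name, "ph": ph, "atoms": ["C1", "C2"]}


cache = LigandCache(fake_build)
first = cache.get("I4B", ph=7.4)
first["atoms"].append("XX")          # mutate the returned copy
second = cache.get("I4B", ph=7.4)    # cache hit, unaffected by the mutation
third = cache.get("I4B", ph=5.0)     # different pH, separate entry
print(cache.builds, second["atoms"], third["ph"])
```

Returning a deep copy costs some time on every hit, but it keeps one caller's edits from leaking into another caller's residue type.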

Dependencies

  • biotite>=1.2 (structure I/O)
  • openbabel-wheel==3.1.1.22 (atom typing only -- Rosetta AtomTypeClassifier port)
  • rdkit (protonation, MMFF94 charges, SMILES perception)
  • Dimorphite-DL vendored in-tree (no external dependency)
