Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 10 additions & 23 deletions docs/using/advanced.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,43 +2,30 @@

## Charge assignment logging

SMIRNOFF force fields support several different partial charge assignment methods. These are applied, in the following order

1. Look for preset charges from the `charge_from_molecules` argument
1. Look for chemical environment matches within the `<LibraryCharges>` section
1. Look for chemical environment matches within the `<ChargeIncrementModel>` section
1. Try to run AM1-BCC according to the `<ToolkitAM1BCC>` section or some variant

If a molecule gets charges from one method, attempts to match charges for later methods are skipped. Note that preset charges override the force field and are not checked for consistency; any charges provided to the `charge_from_molecules` argument technically modify the force field. For more on how SMIRNOFF defines this behavior, see [this issue](https://github.com/openforcefield/standards/issues/68) and linked discussions.

After all (mass-bearing) atoms have partial charges assigned, virtual sites are given charges by transferring charge from the atoms they are associated with ("orientation" atoms) according to the parameters of the force field. (The value of these parameters can be 0.0.)

Given this complexity, it may be useful to track how each atom actually got charges assigned. Interchange has opt-in logging to track this behavior. This uses the [standard library `logging` module](https://docs.python.org/3/library/logging.html) at the `INFO` level. The easiest way to get started is by adding something like `logging.basicConfig(level=logging.INFO)` to the beginning of a script or program. For example, this script:
SMIRNOFF force fields support several different partial charge assignment methods and [employ a hierarchical scheme](smirnoff-charge-assignment-hierarchy) to determine which is used on each molecule. Given this complexity, it may be useful to track how each molecule's atoms actually had their charges assigned. (Note that, except for some complexity with `<ChargeIncrementModel>`, all atoms in a given molecule are assigned charges via the same method when using SMIRNOFF force fields.) Interchange has opt-in logging to track this behavior. This uses the [standard library `logging` module](https://docs.python.org/3/library/logging.html) at the `INFO` level. The easiest way to get started is by adding something like `logging.basicConfig(level=logging.INFO)` to the beginning of a script or program. For example, this script:

```python
import logging

from openff.toolkit import ForceField, Molecule
from openff.toolkit import ForceField, Molecule, Topology

logging.basicConfig(level=logging.INFO)

ForceField("openff-2.2.0.offxml").create_interchange(
Molecule.from_smiles("CCO").to_topology()
Topology.from_molecules(
[
Molecule.from_smiles("CCO"),
Molecule.from_smiles("O"),
],
)
)
```

will produce output including something like

```shell
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 4
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 5
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 6
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 7
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 8
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to molecule with InChI InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges, applied to molecule with InChI InChI=1S/H2O/h1H2
```

This functionality is only available with SMIRNOFF force fields.
2 changes: 2 additions & 0 deletions docs/using/edges.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@ You may also wish to make the vdW cut-off distance longer. This is typically acc

## Quirks of charge assignment

(smirnoff-charge-assignment-hierarchy)=

### Charge assignment hierarchy

Interchange, following the [SMIRNOFF specification](https://openforcefield.github.io/standards/standards/smirnoff/#partial-charge-and-electrostatics-models), assigns charges to (heavy) atoms with the following priority:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -115,29 +115,40 @@
NAGL_KEY = "openff-gnn-am1bcc-1.0.0.pt"


def map_methods_to_atom_indices(caplog: pytest.LogCaptureFixture) -> dict[str, list[int]]:
def map_methods_to_inchi(caplog: pytest.LogCaptureFixture) -> dict[str, list[int]]:
"""
Map partial charge assignment methods to (sorted) atom indices.
Map partial charge assignment methods to (sorted) molecule InChI.
"""
info = defaultdict(list)

for record in caplog.records:
# skip logged warnings from upstreams/other packages
if record.name.startswith("openff.interchange"):
# This is a specific one-off logging event not relevant to this function
if "Could not generate InChI for molecule with Hill formula" in record.msg:
continue
assert record.levelname == "INFO", "Only INFO logs are expected."
else:
# skip logged warnings from upstreams/other packages
continue

message = record.msg

if message.startswith("Charge section LibraryCharges"):
info["library"].append(int(message.split("atom index ")[-1]))
_, molecule = message.split(", ")

info["library"].append(molecule.split("InChI ")[-1])
Comment thread
mattwthompson marked this conversation as resolved.

elif message.startswith("Charge section ToolkitAM1BCC"):
info[message.split(", ")[1].split(" ")[-1]].append(int(message.split("atom index ")[-1]))
_, method, molecule = message.split(", ")

assert method.split()[-1] == AM1BCC_KEY, f"Expected method {AM1BCC_KEY} but got {method.split()[-1]}"

info[method.split()[-1]].append(molecule.split("InChI ")[-1])

elif message.startswith("Charge section NAGLCharges"):
info["NAGLChargesHandler"].append(int(message.split("atom index ")[-1]))
_, _, molecule = message.split(", ")

info["NAGL"].append(molecule.split("InChI ")[-1])

# without also pulling the virtual site - particle mapping (which is different for each engine)
# it's hard to store more information than the orientation atoms that are affected by each
Expand All @@ -151,12 +162,13 @@ def map_methods_to_atom_indices(caplog: pytest.LogCaptureFixture) -> dict[str, l
info["orientation"].append(atom)

elif message.startswith("Preset charges"):
info["preset"].append(int(message.split("atom index")[-1]))
info["preset"].append(message.split("InChI ")[-1])

elif message.startswith("Charge section ChargeIncrementModel"):
if "using charge method" in message:
info[f"chargeincrements_{message.split(',')[1].split(' ')[-1]}"].append(
int(message.split("atom index ")[-1]),
_, method, molecule = message.split(", ")
info[f"chargeincrements_{method.split()[-1]}"].append(
molecule.split("InChI ")[-1],
)

elif "applying charge increment" in message:
Expand Down Expand Up @@ -285,23 +297,27 @@ def test_case0(caplog, sage_no_nagl, ligand):
with caplog.at_level(logging.INFO):
sage_no_nagl.create_interchange(ligand.to_topology())

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are ethanol, getting AM1-BCC
assert info[AM1BCC_KEY] == [*range(0, 9)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]


def test_case1(caplog, sage_no_nagl, ligand_and_water_and_ions):
with caplog.at_level(logging.INFO):
sage_no_nagl.create_interchange(ligand_and_water_and_ions)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are ethanol, getting AM1-BCC
assert info[AM1BCC_KEY] == [*range(0, 9)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

# atoms 9 through 21 are water + ions, getting library charges
assert info["library"] == [*range(9, 22)]
assert info["library"] == [
"InChI=1S/ClH/h1H/p-1",
"InChI=1S/H2O/h1H2",
"InChI=1S/Na/q+1",
]


def test_case2(caplog, sage_no_nagl, ligand, solvent):
Expand All @@ -310,10 +326,10 @@ def test_case2(caplog, sage_no_nagl, ligand, solvent):
with caplog.at_level(logging.INFO):
sage_no_nagl.create_interchange(topology)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# everything should get AM1-BCC charges
assert info[AM1BCC_KEY] == [*range(0, topology.n_atoms)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "InChI=1S/CH5N/c1-2/h2H2,1H3"]


def test_case3(caplog, sage_no_nagl, ligand_and_water_and_ions, solvent):
Expand All @@ -327,14 +343,18 @@ def test_case3(caplog, sage_no_nagl, ligand_and_water_and_ions, solvent):
ligand_and_water_and_ions,
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are ethanol, getting AM1-BCC,
# and also solvent molecules (starting at index 22)
assert info[AM1BCC_KEY] == [*range(0, 9), *range(22, 22 + 3 * 7)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "InChI=1S/CH5N/c1-2/h2H2,1H3"]

# atoms 9 through 21 are water + ions, getting library charges
assert info["library"] == [*range(9, 22)]
assert info["library"] == [
"InChI=1S/ClH/h1H/p-1",
"InChI=1S/H2O/h1H2",
"InChI=1S/Na/q+1",
]


@pytest.mark.slow
Expand All @@ -356,23 +376,29 @@ def test_cases4_5(caplog, ligand_and_water_and_ions, preset_on_protein):
else:
ff.create_interchange(complex)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

assert info[AM1BCC_KEY] == [*range(complex.molecule(0).n_atoms, complex.molecule(0).n_atoms + 9)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

if preset_on_protein:
# protein gets preset charges
assert info["preset"] == [*range(0, complex.molecule(0).n_atoms)]
assert info["preset"] == [
"InChI=1S/C9H17N3O3/c1-5(8(14)10-4)12-9(15)6(2)11-7(3)13/h5-6H,1-4H3,(H,10,14)(H,11,13)(H,12,15)/t5-,6-/m0/s1",
]

# everything after the protein and ligand should get library charges
assert info["library"] == [
*range(complex.molecule(0).n_atoms + 9, complex.n_atoms),
"InChI=1S/ClH/h1H/p-1",
"InChI=1S/H2O/h1H2",
"InChI=1S/Na/q+1",
]
else:
# the protein and everything after the ligand should get library charges
assert info["library"] == [
*range(0, complex.molecule(0).n_atoms),
*range(complex.molecule(0).n_atoms + 9, complex.n_atoms),
"InChI=1S/C9H17N3O3/c1-5(8(14)10-4)12-9(15)6(2)11-7(3)13/h5-6H,1-4H3,(H,10,14)(H,11,13)(H,12,15)/t5-,6-/m0/s1",
"InChI=1S/ClH/h1H/p-1",
"InChI=1S/H2O/h1H2",
"InChI=1S/Na/q+1",
]


Expand All @@ -384,13 +410,13 @@ def test_case6(caplog, ligand, water):
with caplog.at_level(logging.INFO):
force_field.create_interchange(topology)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are ethanol, getting AM1-BCC charges
assert info[AM1BCC_KEY] == [*range(0, 9)]
assert info[AM1BCC_KEY] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

# atoms 9 through 17 are water atoms, getting library charges
assert info["library"] == [*range(9, 18)]
assert info["library"] == ["InChI=1S/H2O/h1H2"]

# particles 18 through 20 are water virtual sites, but the current logging strategy
# makes it hard to match these up (and the particle indices are different OpenMM/GROMACS/etc)
Expand All @@ -409,13 +435,17 @@ def test_case7(caplog, sage_no_nagl, ligand_and_water_and_ions):
charge_from_molecules=[ligand_and_water_and_ions.molecule(0)],
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are ethanol, getting preset charges
assert info["preset"] == [*range(0, 9)]
assert info["preset"] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

# atoms 9 through 21 are water + ions, getting library charges
assert info["library"] == [*range(9, 22)]
assert info["library"] == [
"InChI=1S/ClH/h1H/p-1",
"InChI=1S/H2O/h1H2",
"InChI=1S/Na/q+1",
]


def test_case8(caplog, sage_no_nagl, water_and_ions):
Expand All @@ -427,13 +457,13 @@ def test_case8(caplog, sage_no_nagl, water_and_ions):
charge_from_molecules=[water_and_ions.molecule(0)],
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 8 are water, getting preset charges
assert info["preset"] == [*range(0, 9)]
assert info["preset"] == ["InChI=1S/H2O/h1H2"]

# atoms 9 through 12 are ions, getting library charges
assert info["library"] == [*range(9, 13)]
assert info["library"] == ["InChI=1S/ClH/h1H/p-1", "InChI=1S/Na/q+1"]


def test_case9(caplog, sage_with_bond_charge):
Expand All @@ -444,10 +474,10 @@ def test_case9(caplog, sage_with_bond_charge):
).to_topology(),
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 5 are ligand, getting AM1-BCC charges
assert info[AM1BCC_KEY] == [*range(0, 5)]
assert info[AM1BCC_KEY] == ["InChI=1S/CH3Cl/c1-2/h1H3"]

# atoms 0 and 1 are the orientation atoms of the sigma hole virtual site
assert info["orientation"] == [0, 1]
Expand All @@ -467,10 +497,10 @@ def test_case10(caplog, sage_with_nagl_chargeincrements, ligand):
ligand.to_topology(),
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# atoms 0 through 5 are ligand, getting NAGL charges
assert info[f"chargeincrements_{NAGL_KEY}"] == [*range(0, 9)]
# atoms 0 through 8 are ligand, getting NAGL charges
assert info[f"chargeincrements_{NAGL_KEY}"] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

# TODO: These are logged symmetrically (i.e. hydrogens are listed)
# even though the charges appear to be correct, assert should
Expand All @@ -489,11 +519,11 @@ def test_case11(caplog, sage, ligand):
with caplog.at_level(logging.INFO):
sage.create_interchange(ligand.to_topology())

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# Should log NAGL charges for all atoms
assert "NAGLChargesHandler" in info
assert info["NAGLChargesHandler"] == [*range(0, ligand.n_atoms)]
assert "NAGL" in info
assert info["NAGL"] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]


def test_case12(caplog, sage, water):
Expand All @@ -505,11 +535,11 @@ def test_case12(caplog, sage, water):
with caplog.at_level(logging.INFO):
sage.create_interchange(topology)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# Water should get library charges, not NAGL
assert info["library"] == [*range(0, 6)] # 2 water molecules, 3 atoms each
assert "NAGLChargesHandler" not in info
assert info["library"] == ["InChI=1S/H2O/h1H2"] # 2 water molecules
assert "NAGL" not in info


def test_case13(caplog, sage, ligand, water):
Expand All @@ -521,13 +551,13 @@ def test_case13(caplog, sage, ligand, water):
with caplog.at_level(logging.INFO):
sage.create_interchange(topology)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# Ligand should get NAGL charges
assert info["NAGLChargesHandler"] == [*range(0, ligand.n_atoms)]
assert info["NAGL"] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]

# Water should get library charges
assert info["library"] == [*range(ligand.n_atoms, ligand.n_atoms + water.n_atoms)]
assert info["library"] == ["InChI=1S/H2O/h1H2"]


def test_case14(caplog, sage, ligand):
Expand All @@ -542,8 +572,28 @@ def test_case14(caplog, sage, ligand):
charge_from_molecules=[ligand],
)

info = map_methods_to_atom_indices(caplog)
info = map_methods_to_inchi(caplog)

# Should use preset charges, not NAGL
assert info["preset"] == [*range(0, ligand.n_atoms)]
assert "NAGLChargesHandler" not in info
assert info["preset"] == ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]
assert "NAGL" not in info


def test_inchi_fallback(caplog, sage):
"""Test that molecules that fail InChI generation are still logged in some way."""
from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper

# TODO: Might be a toolkit bug that needs to be worked around
with caplog.at_level(logging.INFO):
sage.create_interchange(
Molecule.from_smiles(342 * "C").to_topology(),
)
Comment thread
mattwthompson marked this conversation as resolved.

info = map_methods_to_inchi(caplog)

if OpenEyeToolkitWrapper.is_available():
assert info["NAGL"] == ["UNKNOWN_INCHI"]
else:
# RDKit can generate an InChI for this molecule, but it's very long
assert len(info["NAGL"]) == 1
assert info["NAGL"][0].startswith("InChI=1S/C342H686/c1-3-5")
Loading