Skip to content

Log charge provenance once per unique molecule#1488

Open
mattwthompson wants to merge 6 commits intomainfrom
charge-logging-large-molecules
Open

Log charge provenance once per unique molecule#1488
mattwthompson wants to merge 6 commits intomainfrom
charge-logging-large-molecules

Conversation

@mattwthompson
Copy link
Copy Markdown
Member

@mattwthompson mattwthompson commented Apr 17, 2026

Closes #1484
Closes #1483

(I found this was easier to make these two changes in one go, but I can refactor it into separate changes if desired)

Description

With these changes:

  • The charge method used to apply partial charges to a given molecule is only logged once
    • Previously this was per-atom, and therefore repetitive. Now it's just once per molecule
  • The charge method used to apply partial charges to a duplicates of a unique molecule is only logged once
    • Previously, this was duplicated for each duplicate molecule, so e.g. 1000 waters would be 3000 lines in the log
  • InChI is used to identify the moleule
    • SMILES for all molecules won't work, because of proteins
    • Another solution could be to use SMILES for small molecules like water and ions but InChI for molecules above some threshold size
    • Molecule.to_inchi failures are handled with a wide net
  • Logging logic is not entered unless it's turned to logging.INFO

Here is an example of it in use, adopting from others' code

In [1]: import logging
   ...: logging.basicConfig(level=logging.INFO)
   ...:
   ...: from openff.toolkit import ForceField, Molecule
   ...: from openff.interchange import Interchange
   ...: from openff.interchange.components._packmol import pack_box
   ...: from openff.units import unit
   ...:
   ...: mols = [Molecule.from_smiles(smiles) for smiles in ['CCO', 'O']]
   ...: for mol in mols:
   ...:     mol.generate_conformers(n_conformers=1)
   ...:
   ...: topology = pack_box(
   ...:     molecules=mols,
   ...:     number_of_copies=[3, 2],
   ...:     target_density=0.1 * unit.gram / unit.milliliter,
   ...: )
   ...:
   ...: ff = ForceField("openff-2.2.1.offxml")
   ...: interchange = Interchange.from_smirnoff(force_field=ff, topology=topology)
WARNING:pint.util:Redefining '[electric_potential]' (<class 'pint.delegates.txt_defparser.plain.DerivedDimensionDefinition'>)
/Users/mattthompson/software/openff-interchange/openff/interchange/components/interchange.py:1119: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
  if isinstance(obj, functools._lru_cache_wrapper) and obj.__module__.startswith("openff.interchange"):
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to molecule with InChI InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to molecule with InChI InChI=1S/H2O/h1H2

I have not done it yet, but, in principle, a large solvated protein-ligand system may have its entire charge provenance logged with as few as 5 lines i the log

Checklist

  • Update tests
  • Lint
  • Update docstrings

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.05%. Comparing base (569e43b) to head (278fc30).

Files with missing lines Patch % Lines
openff/interchange/smirnoff/_nonbonded.py 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1488      +/-   ##
==========================================
- Coverage   94.08%   94.05%   -0.03%     
==========================================
  Files          73       73              
  Lines        6049     6058       +9     
==========================================
+ Hits         5691     5698       +7     
- Misses        358      360       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +1033 to +1044
elif type(key) is ChargeModelTopologyKey:
logger.info(
"Charge section ChargeIncrementModel, using charge method "
f"{key.partial_charge_method}, "
f"applied to molecule with InChI {inchi}",
)

elif type(key) is ChargeIncrementTopologyKey:
# here is where the individual increments could be logged at creation time, but they're
# also logged in _get_charges
pass

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a subtle behavior change.

Previously, there was logging for the charge method applied to the whole molecule (partial_charge_method in the SMIRNOFF data and API) and then also each individual charge increment. Each of these are done on a "per-atom" view, if repetitive, same as how other charge methods were logged before these changes.

Taking a "per-molecule" view works nicely for other methods (no need to log $N_{atoms}$ lines for every atom in molecule that's given AM1-BCC charges) and works fine for ChargeIncrementModelHandler.partial_charge_method but doesn't work for each individual charge increment (i.e. BCC). There's probably a way to refactor all of this to cater to that detail, but I don't think it's trivial. Given that ChargeIncrementModel is not actually used in production, and there aren't plans to do so, I'm inclined to not put that effort into this change.

This comment was marked as resolved.

@mattwthompson mattwthompson marked this pull request as ready for review April 21, 2026 23:02
key: ChargeModelTopologyKey | SingleAtomChargeTopologyKey | LibraryChargeTopologyKey,
):
try:
inchi = unique_molecule.to_inchi()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(blocking) inchi will scale with the size of the molecule, and so it will have the same problems as SMILES for proteins. inchikey is constant length and I think is the better thing to use here.

Suggested change
inchi = unique_molecule.to_inchi()
inchikey = unique_molecule.to_inchikey()


if type(key) is LibraryChargeTopologyKey:
logger.info(
f"Charge section LibraryCharges, applied to molecule with InChI {inchi}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) I think it'd be good for each message to also include the Hill formula for the molecule, since users will be able to read what they need from that 90% of the time, whereas inchi/inchikey are harder to interpret quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve charge provenance logging with large molecule(s) Improve charge provenance logging with copies of molecules

3 participants