Log charge provenance once per unique molecule #1488
mattwthompson wants to merge 6 commits into main
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1488 +/- ##
==========================================
- Coverage 94.08% 94.05% -0.03%
==========================================
Files 73 73
Lines 6049 6058 +9
==========================================
+ Hits 5691 5698 +7
- Misses 358 360 +2
elif type(key) is ChargeModelTopologyKey:
    logger.info(
        "Charge section ChargeIncrementModel, using charge method "
        f"{key.partial_charge_method}, "
        f"applied to molecule with InChI {inchi}",
    )

elif type(key) is ChargeIncrementTopologyKey:
    # here is where the individual increments could be logged at creation time, but they're
    # also logged in _get_charges
    pass
This is a subtle behavior change.
Previously, there was logging for the charge method applied to the whole molecule (partial_charge_method in the SMIRNOFF data and API) and then also for each individual charge increment. Each of these is logged from a "per-atom" view, which is repetitive but matches how other charge methods were logged before these changes.
Taking a "per-molecule" view works nicely for other methods (no need to log ChargeIncrementModelHandler.partial_charge_method) but doesn't work for each individual charge increment (i.e. BCC). There's probably a way to refactor all of this to cater to that detail, but I don't think it's trivial. Given that ChargeIncrementModel is not actually used in production, and there are no plans to do so, I'm inclined not to put that effort into this change.
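To make the per-atom versus per-molecule distinction concrete, here is a minimal standard-library sketch. The `atoms` list, the molecule labels, the `"am1bcc"` method string, and both function names are hypothetical stand-ins for illustration; none of this is Interchange's actual logging code.

```python
import logging
from collections import Counter

logger = logging.getLogger("charge_provenance_sketch")


def log_per_atom(atoms, method):
    """Emit one record per atom and return how many records were emitted."""
    for molecule_id, atom_index in atoms:
        logger.info(
            "Charge method %s applied to atom %d of molecule %s",
            method, atom_index, molecule_id,
        )
    return len(atoms)


def log_per_molecule(atoms, method):
    """Emit one record per unique molecule and return how many were emitted."""
    atoms_per_molecule = Counter(molecule_id for molecule_id, _ in atoms)
    for molecule_id, n_atoms in atoms_per_molecule.items():
        logger.info(
            "Charge method %s applied to molecule %s (%d atoms in total)",
            method, molecule_id, n_atoms,
        )
    return len(atoms_per_molecule)


# Hypothetical system: 1000 copies of a 3-atom molecule, all chemically identical.
atoms = [("H2O", atom_index) for _ in range(1000) for atom_index in range(3)]
per_atom_lines = log_per_atom(atoms, "am1bcc")          # 3000 records
per_molecule_lines = log_per_molecule(atoms, "am1bcc")  # 1 record
```

The per-molecule view collapses the log, but it has no natural place for information that differs atom by atom, which is the difficulty with individual charge increments described above.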
    key: ChargeModelTopologyKey | SingleAtomChargeTopologyKey | LibraryChargeTopologyKey,
):
    try:
        inchi = unique_molecule.to_inchi()
(blocking) An InChI string grows with the size of the molecule, so it will have the same problems as SMILES for proteins. An InChIKey has constant length (27 characters) and I think is the better thing to use here.
- inchi = unique_molecule.to_inchi()
+ inchikey = unique_molecule.to_inchikey()
if type(key) is LibraryChargeTopologyKey:
    logger.info(
        f"Charge section LibraryCharges, applied to molecule with InChI {inchi}",
(not blocking) I think it'd be good for each message to also include the Hill formula for the molecule, since users will be able to read what they need from that 90% of the time, whereas inchi/inchikey are harder to interpret quickly.
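As a rough illustration of what such a message could look like, here is a self-contained sketch. The `hill_formula` and `provenance_message` helpers are hypothetical and written from scratch for this example; in practice the formula would presumably come from the toolkit's molecule object rather than from a hand-rolled function.

```python
from collections import Counter


def hill_formula(elements):
    """Build a Hill-system formula from a list of element symbols.

    With carbon present: C first, then H, then the rest alphabetically.
    Without carbon: every element alphabetically.
    """
    counts = Counter(elements)
    if "C" in counts:
        others = sorted(symbol for symbol in counts if symbol not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)
    return "".join(
        f"{symbol}{counts[symbol] if counts[symbol] > 1 else ''}" for symbol in order
    )


def provenance_message(section, formula, inchikey):
    """Format one log line carrying both a readable and a unique identifier."""
    return (
        f"Charge section {section}, applied to molecule with "
        f"Hill formula {formula} and InChIKey {inchikey}"
    )


water = hill_formula(["H", "H", "O"])
ethanol = hill_formula(["C", "C", "O"] + ["H"] * 6)
message = provenance_message("LibraryCharges", water, "XLYOFNOQVPJJNP-UHFFFAOYSA-N")
```

The Hill formula is not unique (isomers share it), so it complements the InChIKey rather than replacing it.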
Closes #1484
Closes #1483
(I found it was easier to make these two changes in one go, but I can refactor it into separate changes if desired)
Description
With these changes:
Molecule.to_inchi failures are handled with a wide net
logging.INFO
Here is an example of it in use, adapted from others' code
I have not done it yet, but in principle a large solvated protein-ligand system could have its entire charge provenance logged with as few as 5 lines in the log
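For readers unfamiliar with the "wide net" idea, a minimal sketch of the pattern follows. `FakeMolecule`, `safe_identifier`, and the fallback string are hypothetical and exist only so the example runs without the toolkit installed; the actual handling in this PR may differ.

```python
import logging
from typing import Optional

logger = logging.getLogger("charge_provenance_sketch")


class FakeMolecule:
    """Hypothetical stand-in for a molecule whose identifier generation may fail."""

    def __init__(self, inchikey: Optional[str]):
        self._inchikey = inchikey

    def to_inchikey(self) -> str:
        if self._inchikey is None:
            raise RuntimeError("identifier generation failed")
        return self._inchikey


def safe_identifier(molecule) -> str:
    """Return an identifier for logging, never letting a failure propagate."""
    try:
        return molecule.to_inchikey()
    except Exception as error:  # deliberately broad: logging must not break charge assignment
        logger.debug("Could not generate an identifier: %s", error)
        return "(identifier unavailable)"


good = safe_identifier(FakeMolecule("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
bad = safe_identifier(FakeMolecule(None))
```

Catching `Exception` broadly is usually discouraged, but it is a defensible choice here because the identifier is only used in a log message and should never cause parametrization itself to fail.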
Checklist