Calculator refactor, analytic tests, uv + pyproject.toml by SamTov · Pull Request #619 · zincware/MDSuite

SamTov · 2026-05-21T16:12:09Z

Summary

Three thematic commits — see each for the full detail:

refactor(calculators) — calculators are now standalone configuration objects, applied to experiments via experiment.run(calc) / project.run(calc, parallel=True). The @call decorator is deleted. All five tfp.stats.auto_correlation call sites migrated to a JIT'd JAX-vmap helper; tensorflow_probability removed from the dep tree. self.args no longer mutated in place — derived state goes to self.resolved_*. Bug fixes for 8 latent issues the old snapshot tests had been hiding (axis-0-vs-axis-1 ACF, single-window result reporting, empty-minibatch broadcasting, np.prod(form_factors) overflow, etc.).
test — every active calculator now has an integration test that validates against a closed-form theoretical answer or a strict deterministic reference, not a frozen DataHub snapshot. 32 calculator tests (13 against closed-form analytic results; the rest use scipy.quad of synthetic g(r), shape checks, or consistency checks) + a 5-test standalone-API contract suite (including a parallel-vs-serial equivalence test). Disabled _test_*.py / __test_*.py snapshot tests deleted. zinchub removed from the calculator test path; the 7 remaining zinchub-dependent tests use pytest.importorskip so they skip cleanly when the package isn't installed.
chore — setup.py + requirements.txt + dev-requirements.txt collapsed into a single PEP 621 pyproject.toml with test / docs / dev extras. TF pinned to >=2.10,<2.20. CI workflow rewritten around uv with separate jobs for tests (with coverage --fail-under=70), ruff lint, mypy baseline (informational), notebook tests, and docs. Merge-conflict pre-commit hook re-enabled. README install instructions updated to Python 3.10 + uv. Logger collision in project.py fixed (handler dedup + [project=<name>] log prefix). NernstEinsteinIonicConductivity deleted — it had been broken for an unknown period (calls update_user_args and _update_properties_file, neither of which exist anywhere); rewrite path documented in the roadmap.

Test plan

All 33 calculator integration tests pass against the new analytic / synthetic-data references
All 73 unit tests pass
Parallel-vs-serial equivalence test confirms project.run(calc, parallel=True) matches the serial path bit-for-bit
JAX migration verified numerically equivalent to the TFP autocorrelation it replaces (max relative diff ~1.3e-5, floating-point only)
mdsuite imports cleanly from the new pyproject.toml
Performance: calculator integration test suite 368s (TF/tfp) → 264s (JAX + JIT + cleanup), ~28% faster

🤖 Generated with Claude Code

…tion Calculators are now standalone configuration objects, constructed without an experiment and applied via experiment.run(calc) / project.run(calc). Replaces the legacy ``@call``-decorator pattern (deleted) where the calculator was reconstructed inside the loop with an experiment baked in and ``self.args`` got mutated mid-run. Key changes: * Delete the ``@call`` decorator from mdsuite/calculators/calculator.py. Replace with an explicit ``Calculator.run(experiment)`` method that resets per-run state, runs ``_setup()`` (an override hook), does the DB cache lookup, runs the analysis, persists, and returns the ``db.Computation``. * Move user kwargs from each calculator's ``__call__`` to ``__init__``. Strip ``experiment``/``experiments`` from ``__init__``; the experiment is attached transiently for the duration of ``run()``. * Make ``RunComputation`` callable: ``experiment.run(calc)`` returns a single ``db.Computation``, ``project.run(calc)`` returns a ``{name: Computation}`` dict. Property-style shims keep the legacy ``experiment.run.<Calculator>(**kwargs)`` API working. * Add ``parallel=False`` kwarg on the dispatch ``__call__``. When true, copies the calculator with ``copy.deepcopy`` per worker and runs in a ``ThreadPoolExecutor``. Threads (not processes) avoid the Experiment-with-SQLAlchemy-session pickling problem and still scale across cores because JAX/TF/NumPy release the GIL. * Stop mutating ``self.args.tau_values`` / ``self.args.data_range`` in ``_handle_tau_values()``. Write to ``self.resolved_tau_values`` / ``self.resolved_data_range`` instead. Update all downstream call sites (tf.gather, prefactor formulas, array allocations) to read the resolved attributes. Removes a workaround that required re-running ``_setup()`` after persistence to rewind the mutation. * Move RDF and ADF ``check_input`` "fill experiment-dependent defaults" out of ``check_input`` into ``_setup`` so ``self.args`` is fully resolved *before* the DB cache lookup. * Export every calculator class at the ``mdsuite`` top level so users can write ``mds.GreenKuboThermalConductivity(...)`` directly. Performance — replace ``tfp.stats.auto_correlation`` everywhere: * Add ``auto_correlation`` helper in calculator_helper_methods.py: nested ``jax.vmap`` over Cartesian components and particles, with the inner kernel ``jnp.correlate(a, a, mode='full')`` wrapped in ``jax.jit`` for kernel reuse. Applies a ``T / (T - k)`` rescaling to match the unbiased per-lag convention all GK calculators rely on for the transport integrals. * Swap all 5 ``tfp.stats.auto_correlation`` call sites (GK thermal / viscosity / viscosity flux / ionic conductivity / self-diffusion) to the new helper. Calculator-bug fixes surfaced by the new analytic tests in the next commit (these were latent in the snapshot-test era): * GK thermal / viscosity / viscosity-flux: was passing axis=0 to ``tfp.stats.auto_correlation`` (the singleton particles axis) instead of axis=1 (time). Fixed. * GK thermal / viscosity / viscosity-flux: result reporting was returning ``result[0]`` / ``result[1]`` (first two single-window estimates) instead of mean/SEM across windows, and the values were numpy scalars that failed JSON serialisation. Now returns Python floats wrapped in lists via ``np.mean`` / SEM aggregation. * GK ionic conductivity: ``tf.squeeze(axis=0)`` crashed on empty minibatches. Replaced with ``tf.reduce_sum`` over the particles axis. * E-H ionic / thermal / Kinaci: ``ensemble[:, 0, :]`` dropped the time axis and broke broadcasting on empty minibatches. Now uses ``ensemble[:, 0:1, :]`` with reduce_sum over particles. * Structure factor: ``np.prod(form_factors)`` collapsed two length-N arrays into a single scalar and overflowed to inf for typical Na form factors. Replaced with the proper element-wise Faber-Ziman mean. * Structure factor: ``queue_data(subjects=pair)`` iterated as characters (the pair was a bare string). Fixed to ``pair.split('_')``. * CN / KBI / PMF / structure factor: ``isinstance(_, Computation)`` check rejected duck-typed RDF stand-ins. Relaxed to ``hasattr(_, 'data_dict')`` so the synthetic-RDF tests can inject reference data. Removed: * ``tensorflow_probability`` from the dep tree. * Various dead code in deprecated calculator-database helpers (``_get_rdf_data``, ``_load_rdf_from_file``) — they only raised ``DeprecationWarning``. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Every active calculator now has an integration test that validates against a closed-form theoretical answer (or a strict deterministic reference derived from a synthetic input), not a frozen snapshot downloaded from the DataHub. 32 tests total covering every active calculator + a 5-test API-contract suite. 13 of the calculator tests validate against closed-form analytic answers; the rest use scipy.quad of synthetic g(r) (CN) or shape / consistency checks (ADF, distinct diffusion ×2). Pattern by calculator family: * Transport coefficients (GK self-diffusion, Einstein diffusion, GK thermal conductivity, GK viscosity, GK viscosity flux, GK ionic conductivity, E-H ionic conductivity, E-H thermal conductivity, E-H Kinaci): feed an Ornstein-Uhlenbeck or Wiener process with prescribed variance and relaxation time as the relevant flux / dipole / moment, then check that the recovered transport coefficient matches the analytic formula (σ²τ / (V kB T²) for κ_GK, D_M / (V kB T) for σ_EH, 6 D t for MSDs, etc.). * Structural (RDF, ADF): feed uniform-random positions in a periodic box (ideal gas), then assert g(r) = 1 / isotropic angle distribution with sin(θ) Jacobian. * RDF-derived (KBI, PMF, CN, structure factor): inject a synthetic ``SyntheticRDF`` duck-typed object via the new ``rdf_data=`` argument (the calculator's ``isinstance(Computation)`` check was relaxed to a ``hasattr('data_dict')`` check). Then check against the closed-form integrals (G(r) = 0 for ideal gas KBI, G(∞) = -4π r_excl³/3 for hard shell, w(r) = -kT ln g(r) for PMF, ρ * shell volume for CN with a Gaussian-peak g(r), S(q) = 1 for ideal-gas structure factor). * Distinct diffusion (Einstein + GK): two species with independent random walks / velocities; assert |D_AB| < 0.5 D_AA (cross-correlation vanishes for independent processes). * Standalone API contract: constructable without an experiment, no state bleed between sequential runs, new ``experiment.run(calc)`` and legacy ``experiment.run.X(**kw)`` shim produce identical numerical results, ``project.run(calc, parallel=True)`` produces identical results to the serial path. zinchub infrastructure scrapped from the calculator test path: * Remove all six disabled snapshot tests (``_test_*.py``, ``__test_structure_factor.py``) — they relied on JSON snapshots in a remote DataHub repo and one had a hard-coded Windows path. * Replace the active calculator zinchub tests with the analytic versions described above. * Add a small shared helper at ``CI/integration_tests/calculators/_synthetic_signals.py``: OU process generator, Wiener process generator, ``SyntheticRDF`` duck-typed Computation stand-in, and a ``make_experiment_with_species`` utility that spins up a minimal project so RDF-consumer calculators have valid species metadata to read off ``experiment.species``. * Mark the remaining 7 zinchub-dependent tests (functional water / molten-salt studies, LAMMPS file-reader unit tests, two transformation tests) with ``pytest.importorskip('zinchub')`` so they skip cleanly when the package isn't installed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Packaging modernisation: * Consolidate setup.py + requirements.txt + dev-requirements.txt into a single PEP 621 pyproject.toml. ``[project]`` block holds all metadata, ``[project.dependencies]`` holds runtime deps, ``[project.optional-dependencies]`` is split into ``test``, ``docs``, and ``dev`` extras. * Pin TensorFlow to a tested range: ``tensorflow>=2.10,<2.20`` (was ``>=2.5`` with no upper bound). * Remove ``tensorflow_probability[tf]`` — no longer used after the JAX migration. * Delete the now-redundant setup.py, requirements.txt, dev-requirements.txt. CI workflow rewritten around uv: * Install via ``uv pip install --system -e ".[test]"`` (or ".[docs]" / ".[dev]" per job). * Five jobs: tests (with ``coverage --fail-under=70``), ruff lint over mdsuite/, mypy baseline (permissive, informational), notebook tests, docs build. * Linting and type-checking are no longer skippable via ``--no-verify``. Other small fixes: * README install instructions updated to Python 3.10 + uv-first workflow (was Python 3.8, which has been EOL since 2024-10). * Re-enable the merge-conflict marker pre-commit hook (had been commented out with no explanation). * Logger collision fix in project.py: ``attach_file_logger`` now tracks attached ``FileHandler`` instances on a class-level dict and detaches stale ones when a new ``Project`` is created. Log format prefixes each line with ``[project=<name>]`` so the active project is obvious from the log alone. * Delete the ``NernstEinsteinIonicConductivity`` calculator. It had been broken for an indeterminate period — calls ``self.update_user_args(...)`` and ``self._update_properties_file(...)``, neither of which exist anywhere in the codebase. Its test was already disabled. The roadmap documents the rewrite path step by step (use the duck-typed RDF/Computation pattern + a synthetic 2-species test). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two related cleanups following up on the standalone-instance + JAX migration: JAX MSD helper: * Add ``msd_from_reference`` in ``calculator_helper_methods.py`` that mirrors the ``auto_correlation`` design: nested ``jax.vmap`` (inner Cartesian-components, outer particles) wrapped in ``jax.jit`` for kernel reuse. Computes ``msd[k] = sum_p sum_d (ds[p, tau_idx[k], d] - ds[p, 0, d])**2`` in one shot. * Replace the ``tf.math.squared_difference(tf.gather(...), ensemble[:, 0:1, :])`` + ``tf.reduce_sum(..., axis=2)`` + ``tf.reduce_sum(..., axis=0)`` chain in all four MSD-based calculators: Einstein diffusion, Einstein-Helfand ionic conductivity, Einstein-Helfand thermal conductivity, Einstein-Helfand Kinaci. * Replace the last ``tf.gather`` in GK ionic conductivity with plain numpy fancy indexing on the converted ensemble. The auto-correlation helper is already JAX. This leaves TF in the calculators only where it's actually doing something tightly coupled to the data-loading pipeline: ``tf.float64`` dtype constants (the data manager hands ensembles back as tf tensors), the neighbour-list code in ADF / SDF / RDF (heavy ``tf.scatter_nd`` / ``tf.linalg.set_diag`` ops), and the ``tf.Tensor`` type hints on ``ensemble_operation`` (accurate — the input is a tf tensor). Simpler _setup methods: For seven calculators whose argument resolution does NOT actually depend on the experiment (E-H ionic / thermal / Kinaci, GK ionic / thermal / viscosity / viscosity-flux), build ``self.args`` directly in ``__init__`` instead of in ``_setup``. ``_setup`` is now restricted to the genuinely experiment-dependent bits: ``self.time = self._handle_tau_values()`` (reads ``experiment.time_step``, ``sample_rate``) and per-run array allocations. This removes the ``self._user_data_range``, ``self._user_tau_values``, ``self._user_correlation_time``, ``self._user_fit_range`` / ``self._user_integration_range`` shadow attributes that only existed to defer args construction to ``_setup``. ``self.args`` becomes a true configuration object — locked in at construction, immutable across runs, identical cache keys. ``_setup`` stays in place for the calculators where it really does need the experiment: RDF and ADF (defaults like ``cutoff = experiment.box_array[0]/2 - 0.1``), GK / Einstein self-diffusion + distinct diffusion (``species`` default from ``experiment.species``), RDF-consumer calculators (CN / KBI / PMF / SF, which call ``self.experiment.run.RadialDistributionFunction(...)`` when no ``rdf_data`` is passed in). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

SamTov and others added 4 commits May 21, 2026 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calculator refactor, analytic tests, uv + pyproject.toml#619

Calculator refactor, analytic tests, uv + pyproject.toml#619
SamTov wants to merge 4 commits into
mainfrom
restructure

SamTov commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SamTov commented May 21, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant