[PERF] improve MPM solver speed.#2720

Open
Kashu7100 wants to merge 3 commits into Genesis-Embodied-AI:main from Kashu7100:feat-mpm

Conversation

@Kashu7100
Collaborator

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING).
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Kashu7100 and others added 3 commits April 18, 2026 18:01
- Drop the unused [f+1] grid frame; grid is only indexed at [f] in p2g/g2p.
- Fuse compute_F_tmp + svd on the forward pass to keep F_tmp in registers;
  keep them separate on the autodiff path so the backward composition is
  unchanged.
- Rate-limit _is_state_valid to every 10 substeps so the NaN check no longer
  forces a GPU->CPU sync every substep.
- Add benches/mpm_rigid_bench.py: Franka-squeezes-elastic-cube harness that
  reports per-step/substep wall-clock, peak GPU memory, and a mean-position
  fingerprint so variants can be compared against a baseline run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MPM materials now expose `needs_svd`. For Elastic(neohooken) and
non-viscous Liquid, neither the F-update nor the stress uses U/V/S, so
the SVD kernel is pure waste. When no registered material needs SVD,
the solver dispatches to a new `compute_F_tmp_only` kernel and p2g
reads J from `F_tmp.determinant()` via a qd.static branch (det(F_tmp)
== det(S) for qd.svd's proper-rotation U/V).

Also split grid reset: the non-differentiable forward path uses
`reset_grid`, which skips the grad-buffer writes that `reset_grid_and_grad`
still performs on the autodiff path. Halves reset DRAM traffic for
forward-only runs.
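The J-from-determinant shortcut rests on a standard SVD identity: for F = U diag(S) Vᵀ with proper-rotation U and V (det = +1), det(F) = det(diag(S)). A small NumPy check of that identity is below; note NumPy's `svd` does not enforce proper rotations, so the helper flips signs to mimic that convention, and all names here are illustrative.

```python
import numpy as np

def proper_svd(F):
    """SVD with signs flipped so that det(U) = det(V) = +1 (signed S)."""
    U, S, Vt = np.linalg.svd(F)
    if np.linalg.det(U) < 0:
        U[:, -1] *= -1
        S[-1] *= -1  # compensate so U @ diag(S) @ Vt is unchanged
    if np.linalg.det(Vt) < 0:
        Vt[-1, :] *= -1
        S[-1] *= -1
    return U, S, Vt

rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # deformation gradient near identity
U, S, Vt = proper_svd(F)

# det(F) = det(U) * prod(S) * det(Vt) = prod(S) when U, V are proper rotations,
# so J can be read from F directly and the SVD skipped when nothing else needs it.
assert np.isclose(np.linalg.det(F), np.prod(S))
```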

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):
  baseline (3 runs): 9.77 / 9.71 / 9.24 ms/step, mean 9.58
  plan 1   (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67
Within run-to-run variance; corotation still requires SVD on this
scene, so the SVD-skip path isn't exercised. Fingerprint identical
across runs (0.641477, 0.001549, 0.064053). Verified neohooken /
non-viscous liquid scenes now take the SVD-free path end-to-end.

Harness: `benches/mpm_rigid_bench.py` now prepends the repo root to
sys.path so the editable checkout wins over any site-packages
`genesis` namespace stub left behind by a prior wheel install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dense reset_grid zeros all ~30K cells per batch every substep even
when particles only touch a few percent of them. Replace it on the
forward path with a per-slot dirty list:

- init_grid_fields now allocates grid_dirty_list
  (substeps_local, max_dirty_cells, B) and grid_dirty_count
  (substeps_local, B), sized from n_particles * 27 (each particle
  scatters to a 3^3 neighbourhood, so n_particles*27 is the exact
  upper bound on distinct cells touched per batch).
- p2g's grid-scatter now captures the prior mass via atomic_add;
  the unique thread that sees prev_mass == 0 appends the flat cell
  index into grid_dirty_list[f, :, :]. Guarded by qd.static on
  `_sparse_reset_enabled = not requires_grad` so the autodiff
  composition of p2g.grad is untouched.
- sparse_reset_grid(f) zeros only the cells in
  grid_dirty_list[f, :, count], then clears the counter in the
  same kernel. Fields are zero-initialized, so the first pass
  through each slot is a correct no-op (grid already zero).
- substep_pre_coupling forward path now calls sparse_reset_grid
  instead of reset_grid; the diff path still calls
  reset_grid_and_grad unchanged.
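The dirty-list mechanics above can be sketched serially in NumPy. This is a single-batch, flat-grid model with illustrative names; the real kernels are batched Taichi-style GPU kernels where `atomic_add` returning the prior value is what makes exactly one scattering thread the "first writer" per cell.

```python
import numpy as np

n_cells = 64  # stand-in for the ~30K-cell grid
grid_mass = np.zeros(n_cells)
# Capacity would be n_particles * 27 in the solver (3^3 neighbourhood bound);
# here we just size it to the grid.
grid_dirty_list = np.zeros(n_cells, dtype=np.int64)
grid_dirty_count = 0

def p2g_scatter(cell, mass):
    """Scatter mass into a cell; the first writer records the cell as dirty."""
    global grid_dirty_count
    prev_mass = grid_mass[cell]      # atomic_add returns the prior value on GPU
    grid_mass[cell] = prev_mass + mass
    if prev_mass == 0.0:             # exactly one thread sees zero per cell
        grid_dirty_list[grid_dirty_count] = cell
        grid_dirty_count += 1

def sparse_reset_grid():
    """Zero only the touched cells, then clear the counter."""
    global grid_dirty_count
    for i in range(grid_dirty_count):
        grid_mass[grid_dirty_list[i]] = 0.0
    grid_dirty_count = 0

# One substep: four scatters touching three distinct cells.
for cell, m in [(5, 1.0), (5, 2.0), (17, 0.5), (40, 0.25)]:
    p2g_scatter(cell, m)
assert grid_dirty_count == 3         # cell 5 is appended only once
sparse_reset_grid()
assert not grid_mass.any()           # reset touched 3 cells, not all 64
```

Because the fields start zeroed, the first pass through each substep slot finds nothing to clear, which is the correct no-op the commit message describes.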

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):
  plan 1 (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67
  plan 2 (3 runs): 9.11 / 9.00 / 9.06 ms/step, mean 9.05
~6.5% speedup, fingerprint identical (0.641477, 0.001549, 0.064053).
For larger grids or sparser particle occupancy the relative win
scales with grid_size / (n_particles * 27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Comment thread pyproject.toml
"fast_simplification>=0.1.12",
# Surface reconstruction library for particle data from SPH simulations
"pysplashsurf==0.14.*",
"torch>=2.11.0",
Collaborator Author


remove this

@@ -0,0 +1,251 @@
"""
Collaborator Author


remove this file


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c117c2c89


Comment thread pyproject.toml
"fast_simplification>=0.1.12",
# Surface reconstruction library for particle data from SPH simulations
"pysplashsurf==0.14.*",
"torch>=2.11.0",

P1 Badge Remove hard Torch dependency from base install

Adding torch>=2.11.0 to core project.dependencies makes uv sync resolve Torch from the default index before users can choose the platform-specific wheel, which conflicts with the documented setup flow (README.md installs Torch separately via CUDA/CPU/MPS indexes) and can fail or pull an incompatible backend build on GPU/Metal environments. This change can block environment setup for supported platforms, so Torch should remain an explicit post-sync install (or be moved to a backend-specific extra) rather than a mandatory base dependency.


@github-actions

🔴 Benchmark Regression Detected ➡️ Report
