[PERF] improve MPM solver speed. #2720
Kashu7100 wants to merge 3 commits into Genesis-Embodied-AI:main from
Conversation
- Drop the unused [f+1] grid frame; the grid is only indexed at [f] in p2g/g2p.
- Fuse compute_F_tmp + svd on the forward pass to keep F_tmp in registers; keep them separate on the autodiff path so the backward composition is unchanged.
- Rate-limit _is_state_valid to every 10 substeps so the NaN check no longer forces a GPU->CPU sync every substep.
- Add benches/mpm_rigid_bench.py: a Franka-squeezes-elastic-cube harness that reports per-step/substep wall-clock, peak GPU memory, and a mean-position fingerprint so variants can be compared against a baseline run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
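The rate-limited validity check can be sketched in plain Python. This is a hypothetical illustration, not the solver's actual code: `run_substeps`, `advance`, and `CHECK_EVERY` are made-up names, and in the real solver the reduction inside `_is_state_valid` runs on-device, so every call forces a GPU->CPU sync.

```python
import numpy as np

CHECK_EVERY = 10  # matches the every-10-substeps cadence from the commit message

def _is_state_valid(positions: np.ndarray) -> bool:
    # In the real solver, reading this reduction back is what forces the
    # device->host sync; here it is just a NumPy all-finite check.
    return bool(np.isfinite(positions).all())

def run_substeps(positions: np.ndarray, n_substeps: int, advance) -> None:
    for i in range(n_substeps):
        advance(positions)
        # Pay the sync cost only once every CHECK_EVERY substeps. A NaN is
        # still caught, just up to CHECK_EVERY - 1 substeps later than before.
        if i % CHECK_EVERY == 0 and not _is_state_valid(positions):
            raise FloatingPointError(f"invalid state at substep {i}")
```

The trade-off is detection latency: a NaN introduced just after a check propagates for up to nine more substeps before the next check sees it, which is acceptable because the run is aborted either way.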
MPM materials now expose `needs_svd`. For Elastic(neohooken) and non-viscous Liquid, neither the F-update nor the stress uses U/V/S, so the SVD kernel is pure waste. When no registered material needs SVD, the solver dispatches to a new `compute_F_tmp_only` kernel and p2g reads J from `F_tmp.determinant()` via a qd.static branch (det(F_tmp) == det(S) for qd.svd's proper-rotation U/V).

Also split grid reset: the non-differentiable forward path uses `reset_grid`, which skips the grad-buffer writes that `reset_grid_and_grad` still performs on the autodiff path. This halves reset DRAM traffic for forward-only runs.

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):

- baseline (3 runs): 9.77 / 9.71 / 9.24 ms/step, mean 9.58
- plan 1 (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67

Within run-to-run variance; corotation still requires SVD on this scene, so the SVD-skip path isn't exercised. The fingerprint is identical across runs (0.641477, 0.001549, 0.064053). Verified that neohooken / non-viscous liquid scenes now take the SVD-free path end-to-end.

Harness: `benches/mpm_rigid_bench.py` now prepends the repo root to sys.path so the editable checkout wins over any site-packages `genesis` namespace stub left behind by a prior wheel install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
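The det(F_tmp) == det(S) identity that the qd.static branch relies on can be checked with a small NumPy sketch. `proper_svd` below is an illustrative stand-in for qd.svd's sign convention (it is not the solver's code): U and V are forced to be proper rotations (det = +1), with any reflection sign folded into the singular values, so det(F) = det(U) * prod(s) * det(Vt) = prod(s).

```python
import numpy as np

def proper_svd(F: np.ndarray):
    """SVD with proper-rotation U/V; reflection signs are folded into s."""
    U, s, Vt = np.linalg.svd(F)
    if np.linalg.det(U) < 0:
        U[:, -1] *= -1.0
        s[-1] *= -1.0
    if np.linalg.det(Vt) < 0:
        Vt[-1, :] *= -1.0
        s[-1] *= -1.0
    return U, s, Vt

rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
U, s, Vt = proper_svd(F)
# J can be read off F directly instead of running the SVD at all:
assert np.isclose(np.linalg.det(F), np.prod(s))
```

Flipping the last column of U (or row of Vt) together with the sign of s[-1] leaves U @ diag(s) @ Vt unchanged, so the factorization still reconstructs F; this is why skipping the SVD and taking `F_tmp.determinant()` gives the same J.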
Dense reset_grid zeros all ~30K cells per batch every substep even when particles only touch a few percent of them. Replace it on the forward path with a per-slot dirty list:

- init_grid_fields now allocates grid_dirty_list (substeps_local, max_dirty_cells, B) and grid_dirty_count (substeps_local, B), sized from n_particles * 27 (each particle scatters to a 3^3 neighbourhood, so n_particles * 27 is the exact upper bound on distinct cells touched per batch).
- p2g's grid-scatter now captures the prior mass via atomic_add; the unique thread that sees prev_mass == 0 appends the flat cell index into grid_dirty_list[f, :, :]. Guarded by qd.static on `_sparse_reset_enabled = not requires_grad` so the autodiff composition of p2g.grad is untouched.
- sparse_reset_grid(f) zeros only the cells in grid_dirty_list[f, :, count], then clears the counter in the same kernel. Fields are zero-initialized, so the first pass through each slot is a correct no-op (the grid is already zero).
- substep_pre_coupling's forward path now calls sparse_reset_grid instead of reset_grid; the diff path still calls reset_grid_and_grad unchanged.

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):

- plan 1 (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67
- plan 2 (3 runs): 9.11 / 9.00 / 9.06 ms/step, mean 9.05

~6.5% speedup, fingerprint identical (0.641477, 0.001549, 0.064053). For larger grids or sparser particle occupancy the relative win scales with grid_size / (n_particles * 27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| "fast_simplification>=0.1.12", | ||
| # Surface reconstruction library for particle data from SPH simulations | ||
| "pysplashsurf==0.14.*", | ||
| "torch>=2.11.0", |
@@ -0,0 +1,251 @@
"""
remove this file
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c117c2c89
| "fast_simplification>=0.1.12", | ||
| # Surface reconstruction library for particle data from SPH simulations | ||
| "pysplashsurf==0.14.*", | ||
| "torch>=2.11.0", |
Remove hard Torch dependency from base install
Adding torch>=2.11.0 to core project.dependencies makes uv sync resolve Torch from the default index before users can choose the platform-specific wheel, which conflicts with the documented setup flow (README.md installs Torch separately via CUDA/CPU/MPS indexes) and can fail or pull an incompatible backend build on GPU/Metal environments. This change can block environment setup for supported platforms, so Torch should remain an explicit post-sync install (or be moved to a backend-specific extra) rather than a mandatory base dependency.
🔴 Benchmark Regression Detected ➡️ Report
Description
Related Issue
Resolves Genesis-Embodied-AI/Genesis#
Motivation and Context
How Has This Been / Can This Be Tested?
Screenshots (if appropriate):
Checklist:
Submitting Code Changes section of the CONTRIBUTING document.