[PERF] Share static-geometry raycast BVH across envs (N trees, not B)#2914
Open
Kashu7100 wants to merge 2 commits into
The raycast BVH (Raycaster / DepthCamera) is allocated per env (n_batches=n_envs): nodes, AABBs, morton codes and radix-sort scratch are all replicated across envs. For a high-poly static terrain this is the dominant GPU-memory cost: it OOMs at a few thousand envs even though the trees are identical (or fall into a handful of distinct variants) and the cast already reads batch 0 when it detects the trees match (Genesis-Embodied-AI#2867).

Add RigidOptions.shared_static_raycast_bvh (default False). When True, the static (fully-fixed) collision BVH is allocated per *distinct geometry* rather than per env: envs that share geometry share one tree.

- homogeneous scene -> 1 shared tree
- N variants across n_envs -> N trees (N << n_envs, e.g. a terrain curriculum: distinct geometries are read from the per-env active-geom ranges Genesis already stores for heterogeneous envs)

Mechanism: two small routing arrays unify all three layouts, env_bvh_idx[n_env] (which tree each env casts against) and batch_repr_env[n_batches] (which env's geometry builds each tree). The cast kernels index env_bvh_idx; update_aabbs builds each tree slot from its representative env. The per-env path is env_bvh_idx=arange (unchanged); the runtime shared-across-envs detection (Genesis-Embodied-AI#2867) becomes env_bvh_idx=0.

Opt-in rather than auto-detected because env-identity is a runtime property (a per-env set_pos on a fixed body diverges geometry after build); the flag is a caller guarantee that the static geometry stays grouped as built. Default False keeps per-env trees and the runtime detection unchanged.
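The two routing arrays can be illustrated with a minimal NumPy sketch. It is not the PR's implementation: `build_bvh_routing` and the `(start, end)` signature tuples are hypothetical stand-ins for the per-env active-geom ranges, and the grouping simply assigns tree indices in first-seen order.

```python
import numpy as np


def build_bvh_routing(signatures):
    """Group envs by a hashable geometry signature (hypothetical helper).

    Returns (env_bvh_idx, batch_repr_env):
      env_bvh_idx[i_env]  -> index of the tree that env i_env casts against
      batch_repr_env[i_b] -> env whose geometry is used to build tree i_b
    """
    tree_of_signature = {}  # signature -> tree index, in first-seen order
    env_bvh_idx = []
    batch_repr_env = []
    for i_env, sig in enumerate(signatures):
        if sig not in tree_of_signature:
            # First env with this geometry becomes the tree's representative.
            tree_of_signature[sig] = len(batch_repr_env)
            batch_repr_env.append(i_env)
        env_bvh_idx.append(tree_of_signature[sig])
    return (
        np.asarray(env_bvh_idx, dtype=np.int32),
        np.asarray(batch_repr_env, dtype=np.int32),
    )


# Assumed signatures: 3 terrain variants spread over 12 envs.
grouped_sigs = [(0, 1)] * 4 + [(1, 2)] * 4 + [(2, 3)] * 4
env_bvh_idx, batch_repr_env = build_bvh_routing(grouped_sigs)

# Homogeneous scene: every env has the same signature, so one tree.
shared_idx, shared_repr = build_bvh_routing([(0, 1)] * 12)

# Per-env layout: every env is its own group, so the routing is the identity.
per_env_idx, per_env_repr = build_bvh_routing(list(range(12)))
```

In this sketch the number of trees to allocate is `len(batch_repr_env)`, which is 3 for the grouped case, 1 for the homogeneous case, and `n_envs` for the per-env case.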
Benchmark: raycast DepthCamera over static eden_dwbp terrain (RTX 3080), total GPU memory:

  single terrain, 64x36:
     256 env    775 -> 154 MB  (5.0x)
    1024 env   2884 -> 366 MB  (7.9x)
    4096 env    OOM -> runs    (was CUDA_ERROR_OUT_OF_MEMORY)
  4 terrains across 1024 env:
               7300 -> 530 MB  (13.8x)

Depth is bit-identical to the per-env path (homogeneous and N-variant); cast speed is unchanged (it already read its tree; the build is one-time).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kernel_update_verts_and_aabbs gained a required batch_repr_env parameter, but the interactive viewer caller (RaycasterViewerPlugin -> Raycaster) was left unchanged, so any scene that opens the viewer raycaster would fail with a missing-argument error before the BVH could build.

The viewer BVH is always per-env (n_batches=n_envs), so pass the identity mapping arange(n_envs) for batch_repr_env, reproducing the pre-change per-env semantics (i_env == i_b) exactly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
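Why the identity mapping preserves the old behavior can be shown with a small NumPy stand-in. This is only a sketch: `gather_tree_sources` is a hypothetical helper, not the kernel's real signature, and the vertex array is made-up data.

```python
import numpy as np


def gather_tree_sources(per_env_verts, batch_repr_env):
    """Pick, for each tree slot i_b, the data of env batch_repr_env[i_b]."""
    return per_env_verts[batch_repr_env]


n_envs = 4
# Assumed per-env data: one row of 3 values per env.
per_env_verts = np.arange(n_envs * 3, dtype=np.float32).reshape(n_envs, 3)

# Identity mapping: tree slot i_b is built from env i_b (i_env == i_b).
identity_trees = gather_tree_sources(per_env_verts, np.arange(n_envs))

# Fully shared mapping: a single tree slot built from env 0.
shared_trees = gather_tree_sources(per_env_verts, np.array([0]))
```

With `arange(n_envs)` the gathered sources equal the per-env input unchanged, which is the pre-change semantics the viewer path relies on.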
Summary
The raycast BVH (`Raycaster` / `DepthCamera`) is allocated per env: `nodes`, `aabbs`, `morton_codes`, and the radix-sort scratch are all `shape=(n_envs, …)`. For a high-poly static terrain this per-env replication is the dominant GPU-memory cost and OOMs at a few thousand envs, even though the per-env trees are identical (or fall into a handful of distinct variants), and the cast already reads batch 0 once #2867's runtime check finds them identical (that saves cast work, not memory).

This adds `RigidOptions.shared_static_raycast_bvh` (default `False`). When `True`, the static (fully-fixed) collision BVH is allocated per distinct geometry rather than per env, so envs that share geometry share one tree:

- homogeneous scene → 1 shared tree
- N variants across `n_envs` → N trees (N ≪ `n_envs`, e.g. a terrain curriculum)

Mechanism
Two routing arrays unify all three layouts: `env_bvh_idx[n_env]` (which tree each env casts against: `arange` for per-env, `0` for shared, the group id for grouped) and `batch_repr_env[n_batches]` (which env's geometry builds each tree). The cast kernels index `env_bvh_idx`; `update_aabbs` builds each tree slot from its representative env. Groups are derived from the per-env active-geom-range signature in the batched `links_info` (the data Genesis already populates for heterogeneous envs), so N variants yield N groups automatically. The per-env path is `env_bvh_idx=arange` (unchanged); #2867's runtime shared detection becomes `env_bvh_idx=0`.

Why opt-in
Env-identity is a runtime property: a per-env `set_pos` on a fixed body diverges geometry after build (exercised by `test_lidar_bvh_parallel_env`), and detecting it requires the per-env trees we're eliminating. The flag is a caller guarantee that the static geometry stays as built. Default `False` preserves the current behavior and the runtime detection exactly.

Benchmark
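Raycast `DepthCamera` over static eden_dwbp terrain (RTX 3080), total GPU memory (`mem_get_info`; the BVH lives in Quadrants fields invisible to `torch.cuda.max_memory_allocated`):

| Scene | Envs | Per-env trees | Shared trees | Reduction |
| --- | --- | --- | --- | --- |
| single terrain, 64x36 | 256 | 775 MB | 154 MB | 5.0x |
| single terrain, 64x36 | 1024 | 2884 MB | 366 MB | 7.9x |
| single terrain, 64x36 | 4096 | OOM (`CUDA_ERROR_OUT_OF_MEMORY`) | runs | n/a |
| 4 terrains | 1024 | 7300 MB | 530 MB | 13.8x |

Depth is bit-identical to the per-env path (homogeneous and N-variant); cast speed is unchanged (each env already read its tree; the build is one-time).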
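The reported reduction factors can be reproduced from the memory totals, and the two single-terrain points give a rough feel for the 4096-env case. The sketch below assumes memory grows linearly with the number of envs, which the benchmark does not state; `extrapolate_mb` is a hypothetical helper.

```python
def extrapolate_mb(n0, mb0, n1, mb1, n_target):
    """Linearly extrapolate total GPU memory (MB) to n_target envs.

    Assumption: memory is linear in the env count between and beyond
    the two measured points (n0, mb0) and (n1, mb1).
    """
    slope = (mb1 - mb0) / (n1 - n0)  # MB per additional env
    return mb1 + slope * (n_target - n1)


# Reported single-terrain totals in MB, keyed by env count.
per_env_mb = {256: 775, 1024: 2884}
shared_mb = {256: 154, 1024: 366}

# Reduction factor = per-env total / shared total, rounded as in the report.
reductions = {n: round(per_env_mb[n] / shared_mb[n], 1) for n in per_env_mb}
grouped_reduction = round(7300 / 530, 1)  # 4 terrains across 1024 envs

# Extrapolated totals at 4096 envs under the linearity assumption.
per_env_4096 = extrapolate_mb(256, 775, 1024, 2884, 4096)
shared_4096 = extrapolate_mb(256, 154, 1024, 366, 4096)
```

Under that assumption the per-env layout would need roughly 11.3 GB at 4096 envs against roughly 1.2 GB for the shared layout, which is consistent with the reported out-of-memory failure but is an estimate, not a measurement.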
Tests
- `test_raycaster_shared_static_bvh`: homogeneous → 1 tree, shared, distances identical across envs.
- `test_raycaster_grouped_static_bvh`: 3 variants × 12 envs → 3 trees, `env_bvh_idx == [0,0,0,0,1,1,1,1,2,2,2,2]`, grouped distances bit-identical to the per-env reference.
- `set_pos` divergence is covered by `test_lidar_bvh_parallel_env`; the heterogeneous fall-back by `test_raycaster_heterogeneous_object`.

Full raycaster/lidar suite passes.

Notes
`maybe_static` is solver-wide, so the flag has no effect there yet; it composes with a static/dynamic BVH split to extend the benefit to those scenes.

🤖 Generated with Claude Code