[PERF] Share static-geometry raycast BVH across envs (N trees, not B)#2914

Open

Kashu7100 wants to merge 2 commits into Genesis-Embodied-AI:main from Kashu7100:kashu/raycast-shared-static-bvh

Kashu7100 (Collaborator) commented Jun 8, 2026

Summary

The raycast BVH (Raycaster / DepthCamera) is allocated per env: nodes, aabbs, morton_codes, and the radix-sort scratch all have shape=(n_envs, …). For a high-poly static terrain, this per-env replication is the dominant GPU-memory cost and OOMs at a few thousand envs, even though the per-env trees are identical (or fall into a handful of distinct variants). The cast already reads only batch 0 once #2867's runtime check finds the trees identical, but that saves cast work, not memory.

This adds RigidOptions.shared_static_raycast_bvh (default False). When True, the static (fully-fixed) collision BVH is allocated per distinct geometry rather than per env, so envs that share geometry share one tree:

  • homogeneous scene → 1 shared tree
  • N variants across n_envs (N ≪ n_envs, e.g. a terrain curriculum) → N trees
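To make the scaling concrete, here is a rough, hypothetical sizing model: BVH memory is (number of trees) × (bytes per tree), and only the first factor changes between layouts. The per-tree breakdown assumes a Karras-style binary radix tree (F leaves, F - 1 internal nodes) and made-up per-element byte counts; BvhSizing, n_trees, bvh_bytes and every size below are illustrative assumptions, not Genesis's actual field layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BvhSizing:
    """Hypothetical per-tree sizing; every byte count is an assumption."""

    n_faces: int                   # one leaf per triangle (assumed)
    node_bytes: int = 32           # assumed: child/parent links + padding
    aabb_bytes: int = 24           # assumed: 6 x float32 per node AABB
    morton_bytes: int = 8          # assumed: 64-bit Morton code per leaf
    sort_scratch_copies: int = 2   # assumed: radix-sort ping-pong buffers

    def bytes_per_tree(self) -> int:
        n_nodes = 2 * self.n_faces - 1  # F leaves + (F - 1) internal nodes
        node_mem = n_nodes * (self.node_bytes + self.aabb_bytes)
        morton_mem = self.n_faces * self.morton_bytes
        scratch_mem = self.sort_scratch_copies * morton_mem
        return node_mem + morton_mem + scratch_mem


def n_trees(layout: str, n_envs: int, n_variants: int = 1) -> int:
    """Number of BVH trees allocated under each layout."""
    if layout == "per_env":
        return n_envs        # current default: one tree per env
    if layout == "shared":
        return 1             # homogeneous scene
    if layout == "grouped":
        return n_variants    # one tree per distinct geometry
    raise ValueError(f"unknown layout: {layout!r}")


def bvh_bytes(sizing: BvhSizing, layout: str, n_envs: int, n_variants: int = 1) -> int:
    """Total BVH bytes: tree count times per-tree size."""
    return n_trees(layout, n_envs, n_variants) * sizing.bytes_per_tree()
```

Under this model the per-env/shared BVH ratio is exactly n_envs. The model ignores every buffer that does not scale with tree count (depth images, ray buffers, and so on), so it overstates the end-to-end reduction.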

Mechanism

Two routing arrays unify all three layouts: env_bvh_idx[n_env] (which tree each env casts against: arange for per-env, all zeros for shared, the group id for grouped) and batch_repr_env[n_batches] (which env's geometry builds each tree). The cast kernels index env_bvh_idx; update_aabbs builds each tree slot from its representative env. Groups are derived from the per-env active-geom-range signature in batched links_info (the data Genesis already populates for heterogeneous envs), so N variants yield N groups automatically. The per-env path is env_bvh_idx=arange (unchanged), and #2867's runtime shared detection becomes env_bvh_idx=0.
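A minimal pure-Python sketch of how the two routing arrays could be derived. It assumes each env's static geometry can be summarized by a hashable signature (here a tuple standing in for the per-env active-geom range) and that group ids are assigned in first-seen order; build_bvh_routing and the layout strings are hypothetical names, not Genesis API.

```python
from typing import Hashable, List, Sequence, Tuple


def build_bvh_routing(
    env_signatures: Sequence[Hashable], layout: str
) -> Tuple[List[int], List[int]]:
    """Return (env_bvh_idx, batch_repr_env) for one of three layouts.

    env_bvh_idx[i_env]  : tree that env i_env casts against
    batch_repr_env[i_b] : env whose geometry builds tree i_b
    """
    n_envs = len(env_signatures)
    if layout == "per_env":
        # One tree per env; tree i is built from env i (identity routing).
        return list(range(n_envs)), list(range(n_envs))
    if layout == "shared":
        # A single tree, built from env 0, read by every env.
        return [0] * n_envs, [0]
    if layout == "grouped":
        group_of = {}  # signature -> group id, in first-seen order
        env_bvh_idx: List[int] = []
        batch_repr_env: List[int] = []
        for i_env, sig in enumerate(env_signatures):
            if sig not in group_of:
                group_of[sig] = len(batch_repr_env)
                batch_repr_env.append(i_env)  # first env with this geometry is the representative
            env_bvh_idx.append(group_of[sig])
        return env_bvh_idx, batch_repr_env
    raise ValueError(f"unknown layout: {layout!r}")


# Usage: 3 geometry variants laid out contiguously across 12 envs.
# Each signature stands in for an env's active-geom range (hypothetical values).
sigs = [(0, 10)] * 4 + [(10, 20)] * 4 + [(20, 30)] * 4
env_bvh_idx, batch_repr_env = build_bvh_routing(sigs, "grouped")
print(env_bvh_idx)     # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(batch_repr_env)  # [0, 4, 8]
```

First-seen ordering makes group ids deterministic and, for contiguous variants, yields the same env_bvh_idx pattern the grouped test expects.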

Why opt-in

Env identity is a runtime property: a per-env set_pos on a fixed body makes the geometry diverge after build (exercised by test_lidar_bvh_parallel_env), and detecting that requires the very per-env trees this change eliminates. The flag is therefore a caller guarantee that the static geometry stays as built. Default False preserves the current behavior and the runtime detection exactly.
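To illustrate what the caller guarantee protects against, the hypothetical check below flags envs whose current geometry no longer matches the geometry of the tree they are routed to. It needs a fresh per-env description of the geometry (current_sigs), which is the per-env information the shared layout no longer maintains. Signatures are toy (mesh id, position) tuples, and stale_envs is not a Genesis function.

```python
from typing import Hashable, List, Sequence


def stale_envs(
    build_sigs: Sequence[Hashable],
    current_sigs: Sequence[Hashable],
    env_bvh_idx: Sequence[int],
    batch_repr_env: Sequence[int],
) -> List[int]:
    """Envs whose current geometry differs from the geometry their tree was built from."""
    stale = []
    for i_env, sig in enumerate(current_sigs):
        built_from = build_sigs[batch_repr_env[env_bvh_idx[i_env]]]
        if sig != built_from:
            stale.append(i_env)
    return stale


# 8 envs share one terrain tree built from env 0 (shared layout).
terrain = ("terrain", (0.0, 0.0, 0.0))  # (mesh id, fixed-body position)
build_sigs = [terrain] * 8
env_bvh_idx, batch_repr_env = [0] * 8, [0]

current_sigs = list(build_sigs)
current_sigs[5] = ("terrain", (0.0, 0.0, 0.5))  # per-env set_pos after build
print(stale_envs(build_sigs, current_sigs, env_bvh_idx, batch_repr_env))  # [5]
```

Env 5 would keep casting against the un-moved terrain without any error, which is why the flag stays opt-in.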

Benchmark

Raycast DepthCamera over static eden_dwbp terrain (RTX 3080). Memory is total GPU memory from mem_get_info, because the BVH lives in Quadrants fields that torch.cuda.max_memory_allocated does not see:

scene                     envs   per-env (off)   shared/grouped (on)   reduction
1 terrain (8087 faces)     256          775 MB                154 MB        5.0×
1 terrain                 1024         2884 MB                366 MB        7.9×
1 terrain                 4096             OOM                  runs         n/a
4 terrains                1024         7300 MB                530 MB       13.8×

Depth is bit-identical to the per-env path (homogeneous and N-variant), and cast speed is unchanged (each env already read a single tree; the build is one-time).
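The reduction column is the ratio of the two memory totals, and the two single-terrain rows also give the slope of total memory per added env. The slope covers everything that scales with env count, not only the BVH, so reading it as "BVH cost per env" is an approximation. The helper names are illustrative.

```python
def reduction(per_env_mb: float, shared_mb: float) -> float:
    """Memory reduction factor: per-env total / shared-or-grouped total."""
    return per_env_mb / shared_mb


def marginal_mb_per_env(mb_lo: float, mb_hi: float, envs_lo: int, envs_hi: int) -> float:
    """Slope of total GPU memory between two env counts (MB per added env)."""
    return (mb_hi - mb_lo) / (envs_hi - envs_lo)


# Totals in MB, copied from the benchmark table.
print(f"{reduction(775, 154):.1f}x")    # 5.0x   (1 terrain, 256 envs)
print(f"{reduction(2884, 366):.1f}x")   # 7.9x   (1 terrain, 1024 envs)
print(f"{reduction(7300, 530):.1f}x")   # 13.8x  (4 terrains, 1024 envs)

# Single-terrain slope from 256 to 1024 envs.
print(f"{marginal_mb_per_env(775, 2884, 256, 1024):.2f} MB/env (off)")  # 2.75 MB/env (off)
print(f"{marginal_mb_per_env(154, 366, 256, 1024):.2f} MB/env (on)")    # 0.28 MB/env (on)
```

With the flag off, each added env costs roughly ten times more memory than with it on, which is consistent with the per-env tree dominating.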

Tests

  • test_raycaster_shared_static_bvh: homogeneous → 1 tree, shared, distances identical across envs.
  • test_raycaster_grouped_static_bvh: 3 variants across 12 envs → 3 trees, env_bvh_idx == [0,0,0,0,1,1,1,1,2,2,2,2], grouped distances bit-identical to the per-env reference.
  • The default path and per-env set_pos divergence are covered by test_lidar_bvh_parallel_env, and the heterogeneous fall-back by test_raycaster_heterogeneous_object. The full raycaster/lidar suite passes.
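The property the grouped test checks can be sketched in plain Python: if each group's tree is built from a representative env with identical geometry, routing through env_bvh_idx must reproduce the per-env distances exactly. Surfaces are toy height tuples and cast_down is a stand-in for the real BVH traversal; group_routing and cast_down are hypothetical names, not Genesis API.

```python
from typing import Hashable, List, Sequence, Tuple


def group_routing(sigs: Sequence[Hashable]) -> Tuple[List[int], List[int]]:
    """First-seen grouping: returns (env_bvh_idx, batch_repr_env)."""
    group_of, env_bvh_idx, batch_repr_env = {}, [], []
    for i_env, sig in enumerate(sigs):
        if sig not in group_of:
            group_of[sig] = len(batch_repr_env)
            batch_repr_env.append(i_env)
        env_bvh_idx.append(group_of[sig])
    return env_bvh_idx, batch_repr_env


def cast_down(surface_heights: Sequence[float], origin_z: float) -> float:
    """Toy traversal: distance from a downward ray to the nearest surface below it."""
    hits = [origin_z - h for h in surface_heights if h <= origin_z]
    return min(hits) if hits else float("inf")


# 3 terrain variants across 12 envs (4 envs each); heights are made up.
variants = [(0.0, 0.5), (0.2, 1.0), (0.1,)]
env_geom = [variants[i_env // 4] for i_env in range(12)]

# Per-env reference: each env casts against its own geometry.
reference = [cast_down(env_geom[i_env], 2.0) for i_env in range(12)]

# Grouped: build one "tree" per group from its representative env, then route.
env_bvh_idx, batch_repr_env = group_routing(env_geom)
trees = [env_geom[i_env] for i_env in batch_repr_env]
grouped = [cast_down(trees[env_bvh_idx[i_env]], 2.0) for i_env in range(12)]

assert len(trees) == 3
assert grouped == reference  # exact float equality, not approx
```

Exact equality is the appropriate check because both paths run the same arithmetic on the same inputs; only the tree lookup differs.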

Notes

  • Composes with the per-sensor distances-only mode ([PERF] Add distances-only mode to raycaster (return_points=False) #2908); the two are orthogonal memory levers.
  • For a solver that mixes static terrain with a moving entity (e.g. a robot), maybe_static is solver-wide, so the flag has no effect there yet; combined with a static/dynamic BVH split, it would extend the benefit to those scenes.

🤖 Generated with Claude Code

Kashu7100 force-pushed the kashu/raycast-shared-static-bvh branch from 27c3160 to 89a1d7e June 8, 2026 10:23
Kashu7100 changed the title from [PERF] Optional shared static-geometry raycast BVH (n_batches=1) to [PERF] Share static-geometry raycast BVH across envs (N trees, not B) Jun 8, 2026
Kashu7100 force-pushed the kashu/raycast-shared-static-bvh branch from 89a1d7e to 9304fb8 June 9, 2026 06:56
The raycast BVH (Raycaster / DepthCamera) is allocated per env
(n_batches=n_envs): nodes, AABBs, morton codes and radix-sort scratch are
all replicated across envs. For a high-poly static terrain this is the
dominant GPU-memory cost — it OOMs at a few thousand envs even though the
trees are identical (or fall into a handful of distinct variants) and the
cast already reads batch 0 when it detects the trees match (Genesis-Embodied-AI#2867).

Add RigidOptions.shared_static_raycast_bvh (default False). When True, the
static (fully-fixed) collision BVH is allocated per *distinct geometry*
rather than per env: envs that share geometry share one tree.
  - homogeneous scene            -> 1 shared tree
  - N variants across n_envs      -> N trees   (N << n_envs, e.g. a terrain
    curriculum: distinct geometries are read from the per-env active-geom
    ranges Genesis already stores for heterogeneous envs)

Mechanism: two small routing arrays unify all three layouts: env_bvh_idx[n_env]
(which tree each env casts against) and batch_repr_env[n_batches]
(which env's geometry builds each tree). The cast kernels index
env_bvh_idx; update_aabbs builds each tree slot from its representative env.
The per-env path is env_bvh_idx=arange (unchanged); the runtime
shared-across-envs detection (Genesis-Embodied-AI#2867) becomes env_bvh_idx=0.

Opt-in rather than auto-detected because env-identity is a runtime property
(a per-env set_pos on a fixed body diverges geometry after build); the flag
is a caller guarantee the static geometry stays grouped as built. Default
False keeps per-env trees and the runtime detection unchanged.

Benchmark — raycast DepthCamera over static eden_dwbp terrain (RTX 3080),
total GPU memory:
  single terrain, 64x36:   256 env  775 ->  154 MB (5.0x)
                          1024 env 2884 ->  366 MB (7.9x)
                          4096 env  OOM -> runs (was CUDA_ERROR_OUT_OF_MEMORY)
  4 terrains across 1024 env:     7300 ->  530 MB (13.8x)
Depth is bit-identical to the per-env path (homogeneous and N-variant);
cast speed is unchanged (it already read its tree; the build is one-time).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
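The commit message above notes that the cast kernels only index env_bvh_idx, so the three layouts differ on the cast side only in that array's contents. A minimal sketch of the lookup, with tuples of surface heights standing in for trees; cast_batch and nearest_below are hypothetical helpers, not the Quadrants kernels.

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")
Ray = TypeVar("Ray")


def cast_batch(
    trees: Sequence[Tree],
    env_bvh_idx: Sequence[int],
    rays_per_env: Sequence[Sequence[Ray]],
    cast: Callable[[Tree, Ray], float],
) -> List[List[float]]:
    """Cast every env's rays against the tree selected by env_bvh_idx."""
    out = []
    for i_env, rays in enumerate(rays_per_env):
        tree = trees[env_bvh_idx[i_env]]  # the only layout-dependent lookup
        out.append([cast(tree, ray) for ray in rays])
    return out


def nearest_below(heights: Sequence[float], origin_z: float) -> float:
    # Toy traversal: distance from a downward ray to the closest surface under it.
    return min(origin_z - h for h in heights if h <= origin_z)


floor = (0.0, 0.3)
rays = [[1.0, 2.0]] * 4  # 4 homogeneous envs, two ray origins each

per_env = cast_batch([floor] * 4, list(range(4)), rays, nearest_below)  # env_bvh_idx = arange
shared = cast_batch([floor], [0] * 4, rays, nearest_below)              # env_bvh_idx = 0
assert per_env == shared
```

Keeping the layout decision in a single index array means the cast code path is identical for per-env, shared, and grouped trees.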
Kashu7100 force-pushed the kashu/raycast-shared-static-bvh branch from 9304fb8 to a530b77 June 10, 2026 09:18
Kashu7100 marked this pull request as ready for review June 11, 2026 11:22

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a530b7776c

Comment thread genesis/utils/raycast_qd.py
kernel_update_verts_and_aabbs gained a required batch_repr_env parameter,
but the interactive viewer caller (RaycasterViewerPlugin -> Raycaster) was
left unchanged, so any scene that opens the viewer raycaster would fail with
a missing-argument error before the BVH could build.

The viewer BVH is always per-env (n_batches=n_envs), so pass the identity
mapping arange(n_envs) for batch_repr_env, reproducing the pre-change per-env
semantics (i_env == i_b) exactly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
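The viewer fix relies on the identity mapping being a no-op for the build step: with batch_repr_env = arange(n_envs), every tree slot i_b is built from env i_b. A pure-Python sketch of that build-side indexing; build_tree_slots is hypothetical, list(range(...)) stands in for the arange mapping, and sorted stands in for the real per-slot BVH build.

```python
from typing import Callable, List, Sequence, TypeVar

Geom = TypeVar("Geom")
Tree = TypeVar("Tree")


def build_tree_slots(
    env_geometry: Sequence[Geom],
    batch_repr_env: Sequence[int],
    build: Callable[[Geom], Tree],
) -> List[Tree]:
    """Tree slot i_b is built from the geometry of env batch_repr_env[i_b]."""
    return [build(env_geometry[i_env]) for i_env in batch_repr_env]


# Heterogeneous toy geometry: the viewer's per-env BVH assumes no sharing.
env_geometry = [(0.5, 0.0), (0.2,), (0.9, 0.1)]
identity = list(range(len(env_geometry)))  # the arange(n_envs) mapping

slots = build_tree_slots(env_geometry, identity, build=sorted)
# With the identity mapping, slot i_b is built from env i_b (i_env == i_b),
# which is exactly the pre-change per-env build.
assert slots == [sorted(g) for g in env_geometry]
```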
github-actions commented:

⚠️ Abnormal Benchmark Result Detected ➡️ Report
