
[PERF] Add distances-only mode to raycaster (return_points=False) #2908

Open
Kashu7100 wants to merge 1 commit into Genesis-Embodied-AI:main from Kashu7100:kashu/raycast-distances-only

@Kashu7100 · Collaborator

Summary

A Raycaster/DepthCamera always stores both the per-ray xyz hit point and the hit distance in its output cache (H*W*(3+1) floats per env), even though depth-image consumers read only .distances. For a depth camera that means ¾ of the cache (and the per-ray write bandwidth) is spent on points nobody reads.
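The "¾ of the cache" figure follows from counting floats per ray. Below is a back-of-envelope sketch, assuming one float per component and one ray per pixel. The helper `cache_floats_per_env` is hypothetical and is not the sensor's allocation code.

```python
def cache_floats_per_env(height: int, width: int, return_points: bool) -> int:
    """Floats one env's output cache holds: xyz + distance per ray, or distance only."""
    num_rays = height * width          # one ray per depth pixel (assumption)
    per_ray = 4 if return_points else 1  # 3 point components + 1 distance
    return num_rays * per_ray


for h, w in [(36, 64), (96, 128)]:
    on = cache_floats_per_env(h, w, True)
    off = cache_floats_per_env(h, w, False)
    # Fraction of the points-on cache that a depth-only consumer never reads.
    print(f"{w}x{h}: {on} -> {off} floats/env, unread fraction {1 - off / on:.2f}")
# 64x36: 9216 -> 2304 floats/env, unread fraction 0.75
# 128x96: 49152 -> 12288 floats/env, unread fraction 0.75
```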

This PR adds return_points: bool = True to the raycaster options. With return_points=False the sensor skips computing and storing the hit points:

  • the output cache shrinks ~4× (just the H*W distances), and
  • the cast kernel stops doing the per-ray point transform + 3 writes.

read().points is None when disabled; .distances is bit-identical to the points-on result.
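For downstream code, the contract above means that depth consumers can keep reading `.distances` unchanged, while anything that needs hit points should check for `None`. The following stdlib-only sketch uses a stand-in `RaycasterData` that mirrors only the field names given in this PR. The helpers `to_depth_image` and `require_points` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class RaycasterData(NamedTuple):
    # Stand-in mirroring the field names described in the PR, not the real class.
    points: Optional[Sequence[float]]
    distances: Sequence[float]


def to_depth_image(data: RaycasterData, height: int, width: int) -> list:
    """Reshape flat per-ray distances into `height` rows of `width` values.

    Only .distances is read, so this behaves the same with return_points on or off.
    """
    flat = list(data.distances)
    if len(flat) != height * width:
        raise ValueError(f"expected {height * width} distances, got {len(flat)}")
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def require_points(data: RaycasterData) -> Sequence[float]:
    """Fail with a clear message instead of an AttributeError on None further down."""
    if data.points is None:
        raise RuntimeError("hit points unavailable: sensor built with return_points=False")
    return data.points
```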

Implementation

  • return_points flag on Raycaster options (inherited by DepthCamera).
  • _get_return_format drops the (*shape, 3) points field when disabled, so the shared cache is allocated 4× smaller automatically.
  • The cast kernels (kernel_cast_rays, kernel_cast_rays_visual) gate point writes per sensor via sensor_return_points. They locate the distance block via a per-sensor sensor_point_region (= num_rays*3 with points, 0 without) instead of a hardcoded num_rays*3 offset. This lets a points sensor and a distances-only sensor share one BVH cast.
  • Per-sensor cache offsets are now a cumulative sum of the per-sensor cache sizes. Previously the offset was cache_size*(idx+1), which assumed every sensor had the same cache size. With the cumulative sum, sensors of differing cache sizes pack correctly.
  • _get_formatted_data re-wraps the single distances field back into RaycasterData(points=None, distances=...) so the public NamedTuple contract is unchanged.
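The offset bookkeeping in the list above can be illustrated with host-side arithmetic only. This sketch assumes that each sensor's block is laid out as [points (optional)][distances], with the distance block starting `point_region` floats into the block, as the description of sensor_point_region implies. It also ignores the per-env dimension. The `layout` and `distance_index` helpers are hypothetical and do not reproduce the kernel code.

```python
from itertools import accumulate
from typing import List, Tuple


def layout(num_rays: List[int], return_points: List[bool]) -> Tuple[List[int], List[int], List[int]]:
    """Return (cache_offset, point_region, cache_size) for each sensor in a shared flat cache."""
    # Points take 3 floats per ray when enabled and nothing when disabled.
    point_region = [n * 3 if rp else 0 for n, rp in zip(num_rays, return_points)]
    # Each sensor stores its (optional) point block followed by one distance per ray.
    cache_size = [pr + n for pr, n in zip(point_region, num_rays)]
    # Exclusive prefix sum: sensor i starts where sensors 0..i-1 end.
    offsets = list(accumulate(cache_size, initial=0))[:-1]
    return offsets, point_region, cache_size


def distance_index(offsets: List[int], point_region: List[int], sensor: int, ray: int) -> int:
    # Skip this sensor's point block (possibly empty) to reach its distances.
    return offsets[sensor] + point_region[sensor] + ray


offsets, point_region, sizes = layout([4, 4, 4], [True, False, True])
print(offsets, point_region, sizes, sum(sizes))  # [0, 16, 20] [12, 0, 12] [4 ... see below
```

In this example, the middle sensor is distances-only. With a uniform 16-float stride, which is the kind of assumption the old formula encoded, sensor 2 would start at 32 and its 16 floats would run past the 36-float buffer. The prefix sum places it at 20 instead.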

Benchmark

Static depth scene (RTX 3080, plane + fixed boxes). Perception cost is measured as the marginal scene.step() time, going from return_points=True → False:

| envs | res | VRAM | step time |
|---|---|---|---|
| 4096 | 64×36 | 1153 → 289 MB (4.0×) | 9.3 → 5.8 ms (1.6×) |
| 1024 | 128×96 | 1538 → 385 MB (4.0×) | 19.9 → 7.7 ms (2.6×) |
| 4096 | 128×96 | 6147 → 1538 MB (4.0×) | 77.5 → 30.1 ms (2.6×) |

The speedup grows with rays × envs because the cast was bandwidth-bound on writing the unused xyz.
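To make the rays × envs trend concrete, the following arithmetic-only sketch recomputes the table's ratios from its raw numbers. It also adds the total ray count per step, assuming one ray per pixel (envs × width × height). The `summarize` helper is purely illustrative.

```python
# (envs, width, height, vram_on_mb, vram_off_mb, step_on_ms, step_off_ms), copied from the table
ROWS = [
    (4096, 64, 36, 1153, 289, 9.3, 5.8),
    (1024, 128, 96, 1538, 385, 19.9, 7.7),
    (4096, 128, 96, 6147, 1538, 77.5, 30.1),
]


def summarize(rows):
    """Per row: total rays per step, VRAM ratio, and step-time ratio (both rounded to 0.1)."""
    out = []
    for envs, w, h, v_on, v_off, t_on, t_off in rows:
        out.append((envs * w * h, round(v_on / v_off, 1), round(t_on / t_off, 1)))
    return out


for rays, vram_x, step_x in summarize(ROWS):
    print(f"{rays / 1e6:5.1f}M rays: VRAM {vram_x}x, step {step_x}x")
```

The recomputed ratios match the table's parentheticals. The step-time ratio is 1.6× at about 9.4M rays and 2.6× at both about 12.6M and about 50.3M rays.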

Tests

Adds test_raycaster_return_points_false (n_envs ∈ {0, 2}). The test asserts that points is None, and that distances are finite and bit-identical to those of a points-on sensor in the same scene. It also checks that the points-on sensor stays self-consistent (‖hit_point‖ == distance) while sharing the cache with a distances-only sensor, which exercises the cumulative-offset packing. The existing test_raycaster_hits, test_raycaster_against_visual, test_lidar_cache_offset_parallel_env, test_lidar_bvh_parallel_env, and test_shared_context still pass.
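The two properties the new test checks can be stated precisely with plain lists. This is a stdlib-only sketch, not the actual test. It assumes that hit points are expressed relative to the ray origin, which is presumably the case with return_world_frame=False; world-frame points would need the ray start subtracted first. "Bit-identical" is checked on raw IEEE-754 bytes rather than with a tolerance. Both helper names are hypothetical.

```python
import math
import struct
from typing import Sequence, Tuple


def self_consistent(points: Sequence[Tuple[float, float, float]],
                    distances: Sequence[float], rel_tol: float = 1e-5) -> bool:
    """True if each hit point's norm matches its distance (origin-relative points assumed)."""
    if len(points) != len(distances):
        return False
    return all(math.isclose(math.hypot(*p), d, rel_tol=rel_tol)
               for p, d in zip(points, distances))


def bit_identical(a: Sequence[float], b: Sequence[float]) -> bool:
    """Compare raw bit patterns, so e.g. 0.0 vs -0.0 counts as a difference."""
    if len(a) != len(b):
        return False
    fmt = f"<{len(a)}d"  # little-endian float64 per element
    return struct.pack(fmt, *a) == struct.pack(fmt, *b)
```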

Draft: opened for review of the API name (return_points) and the cache-offset change before finalizing.

🤖 Generated with Claude Code

@github-actions (bot) commented Jun 7, 2026

🔴 Benchmark Regression Detected ➡️ Report

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c39d6c7f


max_range: PositiveFloat = 20.0
no_hit_value: float | None = None
return_world_frame: StrictBool = False
return_points: StrictBool = True
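The diff uses pydantic constrained types (PositiveFloat, StrictBool). The following stdlib-only sketch shows roughly what those constraints mean for the new field. It is a hypothetical dataclass stand-in, not the Genesis options class, and it approximates pydantic's validation rather than reproducing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaycasterOptionsSketch:
    # Hypothetical stand-in; defaults copied from the diff above.
    max_range: float = 20.0
    no_hit_value: Optional[float] = None
    return_world_frame: bool = False
    return_points: bool = True

    def __post_init__(self):
        # PositiveFloat: strictly greater than zero.
        if self.max_range <= 0:
            raise ValueError("max_range must be positive")
        for name in ("return_world_frame", "return_points"):
            # StrictBool: only True/False, no coercion of 0/1 or strings.
            if type(getattr(self, name)) is not bool:
                raise TypeError(f"{name} must be a bool")
```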


P2: Guard debug drawing when points are disabled

When return_points=False is combined with the existing draw_debug=True option, read().points is now None, but RaycasterSensor._draw_debug still unconditionally calls data.points.reshape((-1, 3)). Any distances-only Raycaster/DepthCamera with debug drawing enabled will therefore crash during the viewer/debug draw path instead of drawing only ray starts or rejecting the option combination.
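One of the two fixes the review suggests is to draw only ray starts when points are disabled. Here is a minimal sketch of that guard using a hypothetical `draw_debug` function and a stand-in `RaycasterData`. The real `_draw_debug` and its drawing calls are not reproduced here.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class RaycasterData(NamedTuple):
    # Stand-in for the sensor output; points is None when return_points=False.
    points: Optional[Sequence[Vec3]]
    distances: Sequence[float]


def draw_debug(
    data: RaycasterData,
    ray_starts: Sequence[Vec3],
    draw_spheres: Callable[[List[Vec3]], None],
) -> None:
    """Hypothetical guarded debug-draw path.

    The reported crash comes from calling data.points.reshape(...) on None;
    here the None case falls back to drawing the ray origins only.
    """
    if data.points is None:
        # Distances-only sensor: no hit points exist, so mark only ray starts.
        draw_spheres(list(ray_starts))
        return
    draw_spheres(list(data.points))
```

The other option the review mentions, rejecting draw_debug=True together with return_points=False during option validation, would move this check to construction time instead.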


A Raycaster/DepthCamera stores both the per-ray xyz hit point and the hit
distance in its output cache (H*W*(3+1) floats/env), but depth-image
consumers read only `.distances`. Add `return_points: bool = True`; when
False the sensor skips computing and storing the hit points, shrinking the
output cache ~4x and cutting the per-ray write bandwidth.

The cast kernels gate the point writes per-sensor (`sensor_return_points`)
and locate the distance block via a per-sensor point-region size
(`sensor_point_region`, 0 when points are off), so a points sensor and a
distances-only sensor can share one BVH cast. Per-sensor cache offsets are
now a cumulative sum so sensors of differing cache sizes pack correctly.

`read().points` is None when disabled; `.distances` is bit-identical to the
points-on result.

Static-scene depth benchmark (RTX 3080, plane + fixed boxes):
  4096 env / 64x36 :  VRAM 1153 -> 289 MB (4.0x),  step 9.3  -> 5.8  ms (1.6x)
  1024 env / 128x96:  VRAM 1538 -> 385 MB (4.0x),  step 19.9 -> 7.7  ms (2.6x)
  4096 env / 128x96:  VRAM 6147 -> 1538 MB (4.0x), step 77.5 -> 30.1 ms (2.6x)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Kashu7100 force-pushed the kashu/raycast-distances-only branch from 99c39d6 to 0742ce3 on June 10, 2026 09:18