[PERF] Add distances-only mode to raycaster (return_points=False)#2908
Kashu7100 wants to merge 1 commit into
🔴 Benchmark Regression Detected ➡️ Report
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99c39d6c7f
```python
max_range: PositiveFloat = 20.0
no_hit_value: float | None = None
return_world_frame: StrictBool = False
return_points: StrictBool = True
```
Guard debug drawing when points are disabled
When `return_points=False` is combined with the existing `draw_debug=True` option, `read().points` is now `None`, but `RaycasterSensor._draw_debug` still unconditionally calls `data.points.reshape((-1, 3))`. Any distances-only `Raycaster`/`DepthCamera` with debug drawing enabled will therefore crash in the viewer/debug draw path instead of drawing only ray starts or rejecting the option combination.
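A minimal sketch of the two fixes the review suggests, in plain Python. `RaycasterData` here is a simplified stand-in for the sensor's NamedTuple (real fields are tensors), and `draw_debug_hits`, `draw_marker`, and `validate_options` are hypothetical names, not the actual `RaycasterSensor._draw_debug` code.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

Vec3 = Tuple[float, float, float]


class RaycasterData(NamedTuple):
    # Simplified stand-in for the sensor output; the real fields hold tensors.
    points: Optional[List[Vec3]]
    distances: List[float]


def draw_debug_hits(data: RaycasterData, draw_marker: Callable[[Vec3], None]) -> int:
    """Draw one marker per hit point and return how many were drawn.

    A distances-only sensor (points is None) has no hit positions, so the
    hit markers are skipped instead of dereferencing None.
    """
    if data.points is None:
        return 0
    for p in data.points:
        draw_marker(p)
    return len(data.points)


def validate_options(return_points: bool, draw_debug: bool) -> None:
    """Alternative fix: reject the option combination up front."""
    if draw_debug and not return_points:
        raise ValueError("draw_debug=True requires return_points=True")
```

Skipping only the hit markers keeps debug drawing usable (ray starts can still be drawn), while rejecting the combination is simpler but removes that option for distances-only sensors.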
A Raycaster/DepthCamera stores both the per-ray xyz hit point and the hit distance in its output cache (H*W*(3+1) floats/env), but depth-image consumers read only `.distances`.

Add `return_points: bool = True`; when False the sensor skips computing and storing the hit points, shrinking the output cache ~4x and cutting the per-ray write bandwidth. The cast kernels gate the point writes per-sensor (`sensor_return_points`) and locate the distance block via a per-sensor point-region size (`sensor_point_region`, 0 when points are off), so a points sensor and a distances-only sensor can share one BVH cast. Per-sensor cache offsets are now a cumulative sum so sensors of differing cache sizes pack correctly. `read().points` is None when disabled; `.distances` is bit-identical to the points-on result.

Static-scene depth benchmark (RTX 3080, plane + fixed boxes):
- 4096 env / 64x36: VRAM 1153 -> 289 MB (4.0x), step 9.3 -> 5.8 ms
- 1024 env / 128x96: VRAM 1538 -> 385 MB (4.0x), step 19.9 -> 7.7 ms (2.6x)
- 4096 env / 128x96: VRAM 6147 -> 1538 MB (4.0x), step 77.5 -> 30.1 ms (2.6x)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
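The packing described in the commit message can be sketched for a single env as follows. This assumes each sensor's slice is laid out as a ray-major xyz point block followed by a distance block; `sensor_layout` and `write_ray` are hypothetical helpers, and the real kernels run on the GPU and also index by environment.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def sensor_layout(
    num_rays: Sequence[int], return_points: Sequence[bool]
) -> Tuple[List[int], List[int], int]:
    """Return per-sensor start offsets, point-region sizes, and total cache size (one env)."""
    # Point region is num_rays*3 with points and 0 without (mirrors `sensor_point_region`).
    point_regions = [n * 3 if rp else 0 for n, rp in zip(num_rays, return_points)]
    sizes = [pr + n for pr, n in zip(point_regions, num_rays)]
    # Cumulative sum: each sensor starts where the previous one ends,
    # so sensors of different sizes pack without gaps or overlap.
    starts = [0, *accumulate(sizes)][:-1]
    return starts, point_regions, sum(sizes)


def write_ray(
    cache: List[float],
    start: int,
    point_region: int,
    return_points: bool,
    ray: int,
    hit_point: Optional[Tuple[float, float, float]],
    distance: float,
) -> None:
    """Per-ray write, gated on the sensor's return_points flag."""
    if return_points:
        assert hit_point is not None
        base = start + 3 * ray  # xyz of this ray inside the point block
        cache[base:base + 3] = list(hit_point)
    # The distance block starts right after the (possibly empty) point region.
    cache[start + point_region + ray] = distance


# Usage: a points sensor and a distances-only sensor (2 rays each) sharing one cache.
starts, regions, total = sensor_layout([2, 2], [True, False])  # [0, 8], [6, 0], 10
cache = [0.0] * total
write_ray(cache, starts[0], regions[0], True, 1, (0.0, 3.0, 4.0), 5.0)
write_ray(cache, starts[1], regions[1], False, 0, None, 2.5)
# cache == [0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 0.0, 5.0, 2.5, 0.0]
```

Because the distance index depends only on `start + point_region`, the same write path serves both sensor kinds, which is what allows them to share a single cast.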
99c39d6 to 0742ce3
Summary
A `Raycaster`/`DepthCamera` always stores both the per-ray xyz hit point and the hit distance in its output cache (`H*W*(3+1)` floats per env), even though depth-image consumers read only `.distances`. For a depth camera that means ¾ of the cache (and of the per-ray write bandwidth) is spent on points nobody reads.

This PR adds `return_points: bool = True` to the raycaster options. With `return_points=False` the sensor skips computing and storing the hit points, so the output cache shrinks ~4× (it holds only the `H*W` distances). `read().points` is `None` when disabled; `.distances` is bit-identical to the points-on result.

Implementation
- `return_points` flag on `Raycaster` options (inherited by `DepthCamera`).
- `_get_return_format` drops the `(*shape, 3)` points field when disabled, so the shared cache is allocated 4× smaller automatically.
- The cast kernels (`kernel_cast_rays`, `kernel_cast_rays_visual`) gate point writes per-sensor via `sensor_return_points`, and locate the distance block via a per-sensor `sensor_point_region` (= `num_rays*3` with points, `0` without) instead of a hardcoded `*3`. This lets a points sensor and a distances-only sensor share one BVH cast.
- Per-sensor cache offsets are now a cumulative sum (replacing `cache_size*(idx+1)`, which assumed a uniform per-sensor cache size) so sensors of differing cache sizes pack correctly.
- `_get_formatted_data` re-wraps the single distances field back into `RaycasterData(points=None, distances=...)` so the public NamedTuple contract is unchanged.

Benchmark
Static depth scene (RTX 3080, plane + fixed boxes), with perception cost measured as the marginal `scene.step()` time; `return_points=True` → `False`:

The speed win grows with rays×envs because the cast was bandwidth-bound writing the unused xyz.
Tests
Adds `test_raycaster_return_points_false` (`n_envs ∈ {0, 2}`): asserts `points is None`, distances finite, distances bit-identical to a points-on sensor in the same scene, and the points-on sensor stays self-consistent (‖hit_point‖ == distance) while sharing the cache with a distances-only sensor (exercises the cumulative-offset packing). Existing `test_raycaster_hits`, `test_raycaster_against_visual`, `test_lidar_cache_offset_parallel_env`, `test_lidar_bvh_parallel_env`, and `test_shared_context` still pass.

Draft: opened for review of the API name (`return_points`) and the cache-offset change before finalizing.

🤖 Generated with Claude Code