[FEATURE] Add nvidia-smi fallback for GPU detection in cloud environments by vlordier · Pull Request #2694 · Genesis-Embodied-AI/Genesis

vlordier · 2026-04-11T07:20:48Z

Summary

- Adds a fallback to nvidia-smi commands when the /proc/driver/nvidia/gpus/ interface is unavailable
- Supports cloud GPU instances and containers that lack the /proc interface
- Maintains backward compatibility: existing behavior is unchanged when the /proc interface is available

Changes

tests/conftest.py:
- Added nvidia-smi fallback in _get_gpu_indices() using --list-gpus to count GPUs
- Added nvidia-smi fallback in _torch_get_gpu_idx() using --query-gpu=uuid to match GPU UUID
- Both fallbacks gracefully degrade to single-GPU mode when nvidia-smi is also unavailable
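
The UUID-matching fallback listed above can be sketched as follows. This is a minimal illustration, not the code in tests/conftest.py: the helper names `parse_gpu_uuids`, `gpu_index_for_uuid`, and `query_gpu_uuids` are hypothetical, and the sketch assumes that `nvidia-smi --query-gpu=uuid --format=csv,noheader` prints one UUID per line in device-index order.

```python
import subprocess
from typing import List, Optional


def parse_gpu_uuids(query_output: str) -> List[str]:
    """Split the UUID query output into one UUID per GPU, skipping blank lines."""
    return [line.strip() for line in query_output.splitlines() if line.strip()]


def gpu_index_for_uuid(uuids: List[str], target_uuid: str) -> Optional[int]:
    """Return the position of target_uuid in uuids, or None if it is absent.

    The comparison ignores case and an optional "GPU-" prefix, because the
    caller may hold the UUID with or without it (an assumption of this sketch).
    """
    def normalize(value: str) -> str:
        value = value.strip().lower()
        return value[4:] if value.startswith("gpu-") else value

    wanted = normalize(target_uuid)
    for idx, uuid in enumerate(uuids):
        if normalize(uuid) == wanted:
            return idx
    return None


def query_gpu_uuids() -> List[str]:
    """Ask nvidia-smi for GPU UUIDs; return [] if the tool is missing or fails."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_gpu_uuids(proc.stdout)
```

A caller that gets `None` (or an empty UUID list) would fall through to the single-GPU assumption described in the last bullet.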

Usage

This change is transparent to users. Genesis will automatically detect GPUs using, in order:

1. /proc/driver/nvidia/gpus/ (primary method, unchanged)
2. nvidia-smi commands (fallback for cloud/container environments)
3. Single GPU assumption (final fallback)
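
The three-step detection order above can be sketched as a single function. This is an illustrative sketch under stated assumptions rather than the PR's implementation: `detect_gpu_count` and `count_gpus_from_list_output` are hypothetical names, and the parser assumes `nvidia-smi --list-gpus` prints one line per device beginning with `GPU `.

```python
import os
import subprocess

PROC_GPU_DIR = "/proc/driver/nvidia/gpus"


def count_gpus_from_list_output(output: str) -> int:
    """Count the 'GPU N: ...' lines printed by `nvidia-smi --list-gpus`."""
    return sum(1 for line in output.splitlines() if line.strip().startswith("GPU "))


def detect_gpu_count(proc_dir: str = PROC_GPU_DIR) -> int:
    """Return the number of GPUs, trying each detection method in turn."""
    # 1. Primary method: one sub-directory per GPU under the NVIDIA /proc interface.
    try:
        n_gpus = len(os.listdir(proc_dir))
        if n_gpus > 0:
            return n_gpus
    except OSError:
        pass  # Interface not mounted (common in cloud instances and containers).

    # 2. Fallback: ask nvidia-smi to list the devices.
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, timeout=10, check=True,
        )
        n_gpus = count_gpus_from_list_output(proc.stdout)
        if n_gpus > 0:
            return n_gpus
    except (OSError, subprocess.SubprocessError):
        pass  # nvidia-smi missing, timed out, or exited with an error.

    # 3. Final fallback: assume a single GPU.
    return 1
```

Keeping the output parser separate from the subprocess call makes the fallback testable on machines that have no GPU at all.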

Benefit

- Enables Genesis to run on cloud GPU providers that don't mount the NVIDIA proc interface
- Allows containers with only /dev/nvidia* access to utilize multi-GPU detection
- Prevents unnecessary warnings and fallbacks to single-GPU mode in supported environments

Closes #2683

- Add history_length parameter to ContactForce sensor options
- Override read() to return historical force readings from ring buffer
- Update return format to include history dimension
- Extend ring buffer size to accommodate history

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Copilot

Pull request overview

This PR primarily targets improved GPU detection in Linux cloud/container environments by falling back to nvidia-smi when the NVIDIA /proc interface is unavailable, but it also introduces additional sensor and terrain-related API changes.

Changes:

- Add nvidia-smi fallbacks for multi-GPU detection and UUID-based GPU indexing in test setup.
- Add history_length to ContactForce sensor options and adjust sensor buffering logic.
- Add terrain sampling helpers (get_height_at, get_normal_at) to RigidEntity.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/conftest.py | Adds `nvidia-smi` fallback paths for GPU enumeration and UUID matching when `/proc/driver/nvidia/gpus/` is missing. |
| genesis/options/sensors/options.py | Introduces `history_length` option for `ContactForce` sensor and documents intended history behavior. |
| genesis/engine/sensors/sensor_manager.py | Updates buffer length calculation to account for sensors that declare `history_length`. |
| genesis/engine/sensors/contact_force.py | Implements `history_length` behavior in `ContactForceSensor.read()` and adjusts debug drawing data source. |
| genesis/engine/entities/rigid_entity/rigid_entity.py | Adds public terrain query helpers for height and surface normal interpolation. |

Copilot · 2026-04-11T07:25:04Z

+        return self.read(envs_idx)
+


read_ground_truth() now delegates to read(), which returns measured (noisy/quantized/clipped/delayed) values when history_length == 1. This breaks the contract of read_ground_truth() and will fail existing tests that expect ground-truth force (e.g., tests/test_sensors.py asserts ground truth is 0.0 before contact). Implement a separate ground-truth path that reads from SensorManager.get_cloned_from_cache(..., is_ground_truth=True) (and, if supporting history, a ground-truth history buffer).

Suggested change

-        return self.read(envs_idx)
+        envs_idx = self._sanitize_envs_idx(envs_idx)
+        history_length = self._options.history_length
+        if history_length == 1:
+            return self._get_formatted_data(
+                self._manager.get_cloned_from_cache(self, is_ground_truth=True), envs_idx
+            )
+        buffered_data = self._manager._buffered_data[gs.tc_float]
+        cache_slice = slice(self._cache_idx, self._cache_idx + 3)
+        n_envs = self._manager._sim.n_envs
+        history_data = []
+        for i in range(history_length):
+            hist = buffered_data.at(i, envs_idx, cache_slice)
+            if n_envs == 0:
+                hist = hist.reshape(3)
+            else:
+                hist = hist.reshape(n_envs, 3)
+            history_data.append(hist)
+        result = torch.stack(history_data, dim=1)
+        return result.squeeze(1) if n_envs == 0 else result

Copilot · 2026-04-11T07:25:05Z

+            if n_envs == 0:
+                hist = hist.reshape(3)
+            else:
+                hist = hist.reshape(n_envs, 3)


For history_length > 1, read() pulls from self._manager._buffered_data[gs.tc_float], but this ring buffer is populated with shared_ground_truth_cache in _update_shared_cache() before delay/noise/clipping/quantization are applied. That means the returned history bypasses the documented measured-sensor processing (and can disagree with read() for history_length == 1). Consider storing the processed cache into a dedicated history ring buffer, or moving the ring-buffer set() to after processing so the buffer contains measured values.

Suggested change

-                hist = hist.reshape(n_envs, 3)
+                hist = hist.reshape(n_envs, 3)
+                hist = self._get_formatted_data(hist, envs_idx)

Copilot · 2026-04-11T07:25:05Z

+        history_data = []
+        for i in range(history_length):
+            hist = buffered_data.at(i, envs_idx, cache_slice)
+            if n_envs == 0:
+                hist = hist.reshape(3)
+            else:
+                hist = hist.reshape(n_envs, 3)


The history path reshapes each hist using n_envs = self._manager._sim.n_envs, but buffered_data.at(i, envs_idx, ...) returns data for len(envs_idx) environments (which may be a subset). Using n_envs will produce incorrect shapes or a reshape error when envs_idx is not all environments. Reshape based on len(envs_idx) (and keep the non-batched n_envs == 0 behavior consistent with Sensor._get_formatted_data).

Suggested change

-        history_data = []
-        for i in range(history_length):
-            hist = buffered_data.at(i, envs_idx, cache_slice)
-            if n_envs == 0:
-                hist = hist.reshape(3)
-            else:
-                hist = hist.reshape(n_envs, 3)
+        selected_n_envs = len(envs_idx) if n_envs != 0 else 0
+        history_data = []
+        for i in range(history_length):
+            hist = buffered_data.at(i, envs_idx, cache_slice)
+            if n_envs == 0:
+                hist = hist.reshape(3)
+            else:
+                hist = hist.reshape(selected_n_envs, 3)

Copilot · 2026-04-11T07:25:05Z

+        result = torch.stack(history_data, dim=1)
+        return result.squeeze(1) if n_envs == 0 else result


In the non-batched case (sim.n_envs == 0) and history_length > 1, torch.stack(history_data, dim=1) stacks 1D tensors of shape (3,) along dim=1, producing shape (3, history_length) (transposed vs the docstring expectation). If you want (history_length, 3) for the single-env case, stack along dim=0 (or ensure tensors are shaped (1,3) before stacking).
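
The shape arithmetic behind this comment can be checked without a tensor library. Stacking inserts a new axis of size `n_items` at position `dim` of the per-item shape. The helper below is a hypothetical, pure-Python sketch of that rule and is not part of the PR.

```python
from typing import Tuple


def stacked_shape(item_shape: Tuple[int, ...], n_items: int, dim: int) -> Tuple[int, ...]:
    """Shape obtained by stacking n_items tensors of item_shape along a new axis at dim."""
    if not 0 <= dim <= len(item_shape):
        raise ValueError(f"dim must be in [0, {len(item_shape)}], got {dim}")
    # The new axis is inserted at position dim; existing axes keep their order.
    return item_shape[:dim] + (n_items,) + item_shape[dim:]


history_length = 4

# Non-batched case: each reading has shape (3,).
transposed = stacked_shape((3,), history_length, dim=1)  # (3, history_length)
expected = stacked_shape((3,), history_length, dim=0)    # (history_length, 3)

# Batched case with 2 environments: each reading has shape (2, 3).
batched = stacked_shape((2, 3), history_length, dim=1)   # (2, history_length, 3)
```

With `dim=1` the non-batched result has the force components on the leading axis, which is why the comment recommends `dim=0` for that case.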

Suggested change

-        result = torch.stack(history_data, dim=1)
-        return result.squeeze(1) if n_envs == 0 else result
+        stack_dim = 0 if n_envs == 0 else 1
+        result = torch.stack(history_data, dim=stack_dim)
+        return result

Copilot · 2026-04-11T07:25:05Z

        quat = self._link.get_quat(env_idx).reshape((4,))

-        force = self.read(env_idx).reshape((3,))
+        force = self._manager.get_cloned_from_cache(self, is_ground_truth=False)[0, :3].reshape((3,))


_draw_debug() now always reads get_cloned_from_cache(...)[0, :3], which hard-codes environment 0 and ignores env_idx = context.rendered_envs_idx[0]. If rendered_envs_idx[0] != 0 (or if you later change debug rendering to other envs), the debug arrow will visualize the wrong environment’s force. Index the cloned cache with the correct env index (or reuse self.read(env_idx) while ensuring history-length handling doesn’t break debug).

Suggested change

-        force = self._manager.get_cloned_from_cache(self, is_ground_truth=False)[0, :3].reshape((3,))
+        cache = self._manager.get_cloned_from_cache(self, is_ground_truth=False)
+        force = cache[env_idx, :3].reshape((3,)) if env_idx is not None else cache[:3].reshape((3,))

Copilot · 2026-04-11T07:25:06Z

        The maximum output absolute force per each axis. Values above this will be clipped. Default is infinity.
+    history_length : int, optional
+        The number of historical force readings to store and return. Default is 1 (current value only).
+        When > 1, the sensor returns a history buffer of shape (history_length, 3) per environment.


The docstring says the history buffer is of shape (history_length, 3) per environment, but the sensor system generally returns batched tensors shaped (n_envs, ...) when n_envs > 0. Please clarify the expected return shape for history_length > 1 (e.g., (n_envs, history_length, 3) for batched, (history_length, 3) for single-env) and ensure the implementation matches it.

Suggested change

-        When > 1, the sensor returns a history buffer of shape (history_length, 3) per environment.
+        When > 1, the sensor returns a history buffer with shape (history_length, 3) for a single environment,
+        or (n_envs, history_length, 3) when returned in batched form across multiple environments.

Copilot · 2026-04-11T07:25:06Z

+    @gs.assert_built
+    def get_height_at(self, x: float, y: float) -> float:
+        """
+        Get terrain height at world position (x, y).
+
+        Uses bilinear interpolation from the height field.
+
+        Parameters


This PR introduces additional public API/features beyond the stated GPU-detection fallback (e.g., new RigidEntity.get_height_at() / get_normal_at() terrain helpers). Either update the PR description/title to reflect these additions or split them into a separate PR to keep review scope focused and reduce merge risk.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2fbb32004

chatgpt-codex-connector · 2026-04-11T07:25:46Z

+        """
+        Read the ground truth sensor data (without noise).
+        """
+        return self.read(envs_idx)


Return cached ground truth in read_ground_truth

read_ground_truth() now directly calls read(), which uses the non-ground-truth path when history_length == 1. That means sensors configured with noise/bias/delay/clipping/quantization can return processed values from read_ground_truth() instead of true contact forces, violating the sensor API contract and breaking existing expectations for ground-truth reads.

chatgpt-codex-connector · 2026-04-11T07:25:46Z

+            if n_envs == 0:
+                hist = hist.reshape(3)
+            else:
+                hist = hist.reshape(n_envs, 3)


Reshape history using requested env count

When history_length > 1, each history slice is reshaped with global n_envs rather than len(envs_idx). If callers request a subset via read(envs_idx=...), buffered_data.at() returns data sized to that subset, so reshape(n_envs, 3) can raise a runtime error (or produce invalid shape assumptions) in parallel scenes.

chatgpt-codex-connector · 2026-04-11T07:25:46Z

+        envs_idx = self._sanitize_envs_idx(envs_idx)
+        history_length = self._options.history_length
+
+        buffered_data = self._manager._buffered_data[gs.tc_float]


Read history from processed cache, not raw buffer

The history branch reads from _buffered_data, but _update_shared_cache() writes shared_ground_truth_cache into that ring buffer before applying delay, noise, clipping, and quantization. So read() returns raw undelayed forces whenever history_length > 1, which silently changes semantics versus the history_length == 1 path and contradicts the method contract (“with noise applied if applicable”).
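
One way to keep history consistent with single-sample reads is to push values into the history buffer only after processing. The sketch below illustrates that ordering with a plain `collections.deque`; the class `HistorySensorSketch` and its methods are hypothetical and much simpler than Genesis's sensor manager, and clipping stands in for the full delay/noise/clipping/quantization pipeline.

```python
from collections import deque
from typing import List, Tuple

Force = Tuple[float, float, float]


class HistorySensorSketch:
    """Toy sensor that stores processed (here: clipped) readings in a ring buffer."""

    def __init__(self, history_length: int, max_force: float) -> None:
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self._history = deque(maxlen=history_length)
        self._max_force = max_force

    def _process(self, raw: Force) -> Force:
        # Stand-in for the measured-sensor pipeline: clip each axis.
        return tuple(max(-self._max_force, min(self._max_force, v)) for v in raw)

    def update(self, raw: Force) -> None:
        # Process first, then store, so the history holds measured values.
        self._history.append(self._process(raw))

    def read(self) -> Force:
        """Most recent measured value."""
        return self._history[-1]

    def read_history(self) -> List[Force]:
        """All stored measured values, oldest first."""
        return list(self._history)


sensor = HistorySensorSketch(history_length=2, max_force=10.0)
sensor.update((50.0, 0.0, -50.0))
sensor.update((1.0, 2.0, 3.0))
sensor.update((4.0, 5.0, 6.0))
```

Because `update()` stores the processed value, the newest history entry always equals `read()`, which is the invariant the comment says the PR's history branch breaks.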

chatgpt-codex-connector · 2026-04-11T07:25:46Z

+        x_idx = x / h_scale
+        y_idx = y / h_scale


Transform world query into terrain-local coordinates

get_height_at() is documented to accept world (x, y) but directly converts those values to grid indices without accounting for the terrain entity pose. Since _load_terrain allows non-zero morph.pos/morph.quat, translated or rotated terrains will return incorrect heights from this method (and get_normal_at() has the same frame mismatch).
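
The frame issue can be illustrated with a small stand-alone sketch. It assumes a simplified setup that is not the Genesis API: the terrain pose is a translation plus a yaw angle about z (instead of a full quaternion), the height field is a nested list indexed as `hf[ix][iy]`, and `h_scale` is the horizontal cell size. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def world_to_local_xy(x: float, y: float, pos: Sequence[float], yaw: float) -> Tuple[float, float]:
    """Undo the terrain pose: subtract the translation, then rotate by -yaw."""
    dx, dy = x - pos[0], y - pos[1]
    c, s = math.cos(yaw), math.sin(yaw)
    return c * dx + s * dy, -s * dx + c * dy


def bilinear_height(hf: List[List[float]], x_local: float, y_local: float, h_scale: float) -> float:
    """Bilinearly interpolate hf (at least 2x2) at a terrain-local position."""
    n_x, n_y = len(hf), len(hf[0])
    # Convert to fractional grid indices and clamp to the grid bounds.
    fx = min(max(x_local / h_scale, 0.0), n_x - 1)
    fy = min(max(y_local / h_scale, 0.0), n_y - 1)
    i0, j0 = min(int(fx), n_x - 2), min(int(fy), n_y - 2)
    tx, ty = fx - i0, fy - j0
    return (
        hf[i0][j0] * (1 - tx) * (1 - ty)
        + hf[i0 + 1][j0] * tx * (1 - ty)
        + hf[i0][j0 + 1] * (1 - tx) * ty
        + hf[i0 + 1][j0 + 1] * tx * ty
    )


def height_at_world(hf, x, y, pos, yaw, h_scale):
    """World-frame height: transform the query, interpolate, add the terrain z-offset."""
    x_local, y_local = world_to_local_xy(x, y, pos, yaw)
    return pos[2] + bilinear_height(hf, x_local, y_local, h_scale)


# A 2x2 ramp that rises by 1.0 along the terrain's local x axis.
ramp = [[0.0, 0.0], [1.0, 1.0]]
h_plain = height_at_world(ramp, 10.5, 20.0, pos=(10.0, 20.0, 2.0), yaw=0.0, h_scale=1.0)
h_rotated = height_at_world(ramp, 10.0, 20.5, pos=(10.0, 20.0, 2.0), yaw=math.pi / 2, h_scale=1.0)
```

Both queries land halfway up the ramp in the terrain's local frame, so they return the same height even though the world coordinates differ. Dividing the raw world `x` and `y` by `h_scale` would not give that result for a translated or rotated terrain.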

- Fix reshape crash in read() for batched env subsets: use n_query_envs instead of global n_envs when reshaping history slices
- Restore read_ground_truth() contract: return noise-free ground truth cache instead of delegating to read() which returns processed data
- Fix _draw_debug() env index: use cache[env_idx] instead of hardcoded cache[0] for correct environment in multi-env scenes

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

- Adds entity.get_height_at(x, y) to query terrain height at world position using bilinear interpolation from height field
- Adds entity.get_normal_at(x, y) to compute surface normal at world position from height field gradient
- Both methods handle boundary conditions gracefully

Closes Genesis-Embodied-AI#2094

…se transform

- Fix coordinate indexing: hf[y,x] -> hf[x,y] since heightfield is stored as [row, col] where row corresponds to x
- Add pose transformation: convert world coords to terrain local frame using inv_transform_by_trans_quat(terrain_pos, terrain_quat)
- Transform normals back to world frame with transform_by_quat
- Height now includes terrain z-offset (terrain_pos[2])

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

…ents

- Add fallback to nvidia-smi in _get_gpu_indices() when /proc/driver/nvidia/gpus/ is unavailable
- Add fallback to nvidia-smi in _torch_get_gpu_idx() when /proc interface is missing
- Handles cloud GPU instances and containers lacking /proc interface
- Graceful degradation to single-GPU mode when both methods fail

Closes Genesis-Embodied-AI#2683

duburcqa · 2026-04-11T19:46:51Z

Superseded by #2680. Closing.

vlordier added 4 commits April 10, 2026 23:16

Fix history read shapes and stack dimension

3dcf064

Remove test file from PR

745a438

Fix debug arrow to use current force instead of history

7783c00

Copilot AI review requested due to automatic review settings April 11, 2026 07:20

vlordier requested review from YilingQiao and duburcqa as code owners April 11, 2026 07:20

claude bot reviewed Apr 11, 2026

Copilot started reviewing on behalf of vlordier April 11, 2026 07:21

Copilot AI reviewed Apr 11, 2026

chatgpt-codex-connector bot reviewed Apr 11, 2026

vlordier and others added 5 commits April 11, 2026 15:00

Remove test file

d05edc6

vlordier force-pushed the feature/nvidia-smi-fallback branch from fb6b2ae to ccf9c5e April 11, 2026 13:28

duburcqa closed this Apr 11, 2026

3 participants