[MISC] Fix flaky CI: isolate offscreen GL contexts, stabilize render/physics tests.#2951

Merged
duburcqa merged 3 commits into Genesis-Embodied-AI:main from duburcqa:fix_flaky_ci
Jun 15, 2026

@duburcqa

Collaborator

Description

Fixes flaky CI failures on software renderers and GPU runners. Three independent changes:

  • Offscreen GL context isolation (bug fix). Each offscreen scene owns its own GL context, but the platform's current-context pointer is process/thread-global. Tearing down a context used to force that context current and then unconditionally release the global pointer. When cyclic GC collected a stale scene mid-render, the in-flight render therefore lost its context and raised "Attempt to retrieve context when no valid context". OffscreenRenderer.delete() no longer forces its context current, and EGLPlatform.delete_context() releases the current context only when that context is its own. The bug only surfaces on EGL (Linux/GPU).
  • Render snapshot tests. test_deformable_uv_textures disables shadows and shrinks the ground plane to stay within the frustum on the Apple Software Renderer (Rasterizer only; the RayTracer keeps its scene and snapshot). The pixel comparison is factored into a shared assert_pixel_match helper.
  • Physics tolerances. Relaxed resting-velocity tolerance in test_convexify (0.5 -> 0.6) and test_nonconvex_concentric_contact (0.06 -> 0.07).
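
The context-isolation item above can be illustrated with a toy model. This is a minimal sketch, not the Genesis or pyrender code: FakePlatform, its _current attribute, and both delete methods are hypothetical stand-ins for an EGL platform object and the thread-global current-context pointer.

```python
class FakePlatform:
    """Toy stand-in for an offscreen GL platform (hypothetical, not the real EGLPlatform)."""

    # Models the process/thread-global "current context" pointer.
    _current = None

    def __init__(self, name):
        self.name = name

    def make_current(self):
        FakePlatform._current = self

    def delete_context_unguarded(self):
        # Old behavior: force this context current, then clear the global pointer.
        self.make_current()
        FakePlatform._current = None

    def delete_context_guarded(self):
        # Fixed behavior: release the global pointer only if it points at this context.
        if FakePlatform._current is self:
            FakePlatform._current = None


def a_survives_teardown_of_b(delete):
    """Return True if A is still current after B is torn down while A renders."""
    a, b = FakePlatform("A"), FakePlatform("B")
    a.make_current()  # scene A is mid-render
    delete(b)         # scene B is collected and torn down
    return FakePlatform._current is a


print(a_survives_teardown_of_b(FakePlatform.delete_context_unguarded))  # False
print(a_survives_teardown_of_b(FakePlatform.delete_context_guarded))    # True
```

In the unguarded case the global pointer ends up empty, which is the situation in which a real renderer would fail to retrieve a valid context.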

How Has This Been Tested?

New regression test test_offscreen_context_isolation forces the mid-render teardown: it fails on main with the exact error and passes with the fix (verified on Linux EGL via Mesa llvmpipe and locally on macOS). Render and rigid-physics suites pass locally.

Checklist:

  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I have added tests to cover my changes.

duburcqa requested a review from YilingQiao as a code owner June 15, 2026 20:44
duburcqa changed the title from "[BUG FIX] Fix flaky CI: isolate offscreen GL contexts, stabilize render/physics tests" to "[MISC] Fix flaky CI: isolate offscreen GL contexts, stabilize render/physics tests" Jun 15, 2026
duburcqa changed the title from "[MISC] Fix flaky CI: isolate offscreen GL contexts, stabilize render/physics tests" to "[MISC] Fix flaky CI: isolate offscreen GL contexts, stabilize render/physics tests." Jun 15, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d959c4da4

Comment thread tests/conftest.py
from PIL import Image
from syrupy.extensions.image import PNGImageSnapshotExtension

from .utils import IMG_BLUR_KERNEL_SIZE, IMG_NUM_ERR_THR, IMG_STD_ERR_THR, assert_pixel_match

P1: Move the utils import after test environment setup

This top-level import now loads tests/utils.py, which imports both mujoco and genesis, before this file sets MUJOCO_GL and PYGLET_HEADLESS below. On headless Linux/EGL workers that means MuJoCo/pyglet/Genesis can be initialized with the wrong GL backend during pytest collection, before the headless settings at lines 65-75 take effect. Please keep the pixel constants/helper import below the environment setup or split the helper into a module that does not import Genesis/MuJoCo.
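
One way to follow this suggestion is to apply the environment settings first and defer the helper import until after them. The sketch below is only an illustration: configure_headless_env is a hypothetical helper name, and the values "egl" and "1" are assumptions, since the values actually set in tests/conftest.py are not shown in this thread.

```python
import os


def configure_headless_env(env=None):
    """Set GL-related variables before any GL-dependent module is imported."""
    env = os.environ if env is None else env
    # Assumed values; setdefault keeps anything the CI job already exported.
    env.setdefault("MUJOCO_GL", "egl")
    env.setdefault("PYGLET_HEADLESS", "1")
    return env


configure_headless_env()

# Only now import modules that pull in MuJoCo or Genesis, for example:
# from .utils import IMG_BLUR_KERNEL_SIZE, IMG_NUM_ERR_THR, IMG_STD_ERR_THR, assert_pixel_match
```

Keeping the import below the setup call means pytest collection cannot initialize a GL backend before the headless settings exist.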

# process/thread-global, so making it current here would clobber the context another renderer may be
# using (e.g. while it is mid-render, when this renderer is being torn down by garbage collection).
# 'delete_context' makes itself current only when the platform requires it.
self._platform.delete_context()

P1: Stop teardown from making stale EGL contexts current

This change only stops OffscreenRenderer.delete() itself from forcing its context current, but the normal scene teardown path still calls self._renderer.make_current(); self._renderer.delete() in Rasterizer.destroy(). In the mid-render GC/scene_b.destroy() scenario modeled by the new test, scene B therefore still steals the thread's current EGL context and then EGLPlatform.delete_context() uncurrents it, leaving scene A with no current context. Please remove or guard the caller-side make_current() as well, otherwise the production path and the newly required regression test still hit the original failure.
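
The concern in this comment can be reproduced with a small model. This is a hedged sketch under stated assumptions: Ctx and both destroy functions are hypothetical, and whether Rasterizer.destroy() really follows the first pattern is the reviewer's claim, not something verified here.

```python
class Ctx:
    """Toy GL context; 'current' models the thread-global current-context pointer."""

    current = None

    def make_current(self):
        Ctx.current = self

    def delete(self):
        # Guarded delete: release the pointer only if it refers to this context.
        if Ctx.current is self:
            Ctx.current = None


def destroy_forcing_current(ctx):
    # Caller-side pattern described in the comment: the guard inside delete()
    # no longer helps, because the caller has already stolen the pointer.
    ctx.make_current()
    ctx.delete()


def destroy_without_forcing(ctx):
    # Suggested alternative: let delete() decide whether anything must be released.
    ctx.delete()


def current_after(destroy):
    """Tear down context B while A is current and report what is current afterwards."""
    a, b = Ctx(), Ctx()
    a.make_current()
    destroy(b)
    return "A" if Ctx.current is a else None


print(current_after(destroy_forcing_current))  # None
print(current_after(destroy_without_forcing))  # A
```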

@duburcqa duburcqa merged commit 289af1a into Genesis-Embodied-AI:main Jun 15, 2026
9 of 20 checks passed