Skip to content

Add ROCm test suite (companion to #4720)#4824

Open
danielhanchen wants to merge 1 commit intomainfrom
feature/rocm-tests
Open

Add ROCm test suite (companion to #4720)#4824
danielhanchen wants to merge 1 commit intomainfrom
feature/rocm-tests

Conversation

@danielhanchen
Copy link
Copy Markdown
Contributor

Summary

  • 95 Python tests (tests/studio/install/test_rocm_support.py) and 23 shell tests (tests/sh/test_get_torch_index_url.sh) covering AMD ROCm detection, torch index URL selection, hardware flags, prebuilt asset selection, and install pathway logic
  • All tests use mocks -- no AMD hardware required
  • Split out from Add AMD ROCm/HIP support across installer and hardware detection #4720 to keep the main implementation PR focused

Test plan

  • python -m pytest tests/studio/install/test_rocm_support.py -v
  • bash tests/sh/test_get_torch_index_url.sh

95 Python tests and 23 shell tests covering ROCm detection,
torch index URL selection, hardware flags, prebuilt asset selection,
and install pathway logic. All tests use mocks -- no AMD hardware required.

Companion to #4720 (AMD ROCm/HIP support).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53e782f7b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

sys.modules[_STACK_SPEC.name] = stack_mod
_STACK_SPEC.loader.exec_module(stack_mod)

_detect_rocm_version = stack_mod._detect_rocm_version
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Defer binding ROCm symbols until existence is verified

These module-level assignments unconditionally access private ROCm helpers from studio/install_python_stack.py; on the parent revision (8981e6c) those attributes are absent, so pytest aborts during collection with AttributeError before any test can run. This makes the commit non-runnable unless the companion implementation lands first, so the test module should guard missing symbols (for example via getattr/hasattr + pytest.skip) or be merged together with the implementation.

Useful? React with 👍 / 👎.

# 9) ROCm 6.3 (no nvidia-smi) -> rocm6.3
_dir=$(make_mock_amd_smi "6.3")
_result=$(run_func "$_dir")
assert_eq "ROCm 6.3 -> rocm6.3" "https://download.pytorch.org/whl/rocm6.3" "$_result"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid ROCm URL assertions before install.sh supports ROCm

This new expectation (and the related ROCm cases below it) currently fails deterministically against the base install.sh logic, which returns the CPU index whenever nvidia-smi is missing; running bash tests/sh/test_get_torch_index_url.sh yields multiple ROCm failures and exits non-zero. As a result, merging this test-only change ahead of the matching installer implementation will break CI.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive testing for AMD ROCm support, including shell script tests for torch index URL resolution and a new Python test suite covering various installation pathways. The feedback identifies a critical issue where the new tests reference a missing amd.py file, which will likely cause CI failures. Additionally, there are recommendations to avoid global state leakage by using proper mocking for sys.modules and to replace broad, silent exception handlers with more descriptive error reporting in test setup blocks.

Comment on lines +1073 to +1076
def test_amd_py_exists(self):
"""amd.py should exist in the hardware directory."""
amd_path = PACKAGE_ROOT / "studio" / "backend" / "utils" / "hardware" / "amd.py"
assert amd_path.exists()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test asserts the existence of amd.py, which is not included in this pull request. While this PR is a companion to #4720, including tests that are guaranteed to fail against the current target branch will break CI. Consider either including the necessary file changes in this PR or marking these tests as expected to fail/skipped until the implementation is merged.

Comment on lines +1016 to +1020
sys.modules["structlog"] = MagicMock()
sys.modules["loggers"] = MagicMock()
sys.modules["loggers"].get_logger = MagicMock(return_value = MagicMock())
sys.modules["utils"] = MagicMock()
sys.modules["utils.hardware"] = MagicMock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Modifying sys.modules directly within a test method causes global state leakage that can affect other tests in the suite or subsequent test files. It is recommended to use unittest.mock.patch.dict to ensure that these modifications are scoped to the test and automatically reverted upon completion. This pattern is repeated in several other tests in this file (e.g., lines 1093, 1131, 1166, 1186).

Comment on lines +1022 to +1025
try:
_worker_spec.loader.exec_module(worker_mod)
except Exception:
pytest.skip("Could not load worker module in test environment")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This broad except Exception block silently skips the test on any failure during module execution. This violates the general rule against broad, silent exception handlers and makes it difficult to distinguish between environment issues and actual bugs in the code under test. Please catch more specific exceptions or at least include the exception details in the skip message.

Suggested change
try:
_worker_spec.loader.exec_module(worker_mod)
except Exception:
pytest.skip("Could not load worker module in test environment")
try:
_worker_spec.loader.exec_module(worker_mod)
except Exception as e:
pytest.skip(f"Could not load worker module in test environment: {e}")
References
  1. Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant