Conversation
95 Python tests and 23 shell tests covering ROCm detection, torch index URL selection, hardware flags, prebuilt asset selection, and install pathway logic. All tests use mocks -- no AMD hardware required. Companion to #4720 (AMD ROCm/HIP support).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53e782f7b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sys.modules[_STACK_SPEC.name] = stack_mod | ||
| _STACK_SPEC.loader.exec_module(stack_mod) | ||
|
|
||
| _detect_rocm_version = stack_mod._detect_rocm_version |
There was a problem hiding this comment.
Defer binding ROCm symbols until existence is verified
These module-level assignments unconditionally access private ROCm helpers from studio/install_python_stack.py; on the parent revision (8981e6c) those attributes are absent, so pytest aborts during collection with AttributeError before any test can run. This makes the commit non-runnable unless the companion implementation lands first, so the test module should guard missing symbols (for example via getattr/hasattr + pytest.skip) or be merged together with the implementation.
Useful? React with 👍 / 👎.
| # 9) ROCm 6.3 (no nvidia-smi) -> rocm6.3 | ||
| _dir=$(make_mock_amd_smi "6.3") | ||
| _result=$(run_func "$_dir") | ||
| assert_eq "ROCm 6.3 -> rocm6.3" "https://download.pytorch.org/whl/rocm6.3" "$_result" |
There was a problem hiding this comment.
Avoid ROCm URL assertions before install.sh supports ROCm
This new expectation (and the related ROCm cases below it) currently fails deterministically against the base install.sh logic, which returns the CPU index whenever nvidia-smi is missing; running bash tests/sh/test_get_torch_index_url.sh yields multiple ROCm failures and exits non-zero. As a result, merging this test-only change ahead of the matching installer implementation will break CI.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive testing for AMD ROCm support, including shell script tests for torch index URL resolution and a new Python test suite covering various installation pathways. The feedback identifies a critical issue where the new tests reference a missing amd.py file, which will likely cause CI failures. Additionally, there are recommendations to avoid global state leakage by using proper mocking for sys.modules and to replace broad, silent exception handlers with more descriptive error reporting in test setup blocks.
| def test_amd_py_exists(self): | ||
| """amd.py should exist in the hardware directory.""" | ||
| amd_path = PACKAGE_ROOT / "studio" / "backend" / "utils" / "hardware" / "amd.py" | ||
| assert amd_path.exists() |
There was a problem hiding this comment.
This test asserts the existence of amd.py, which is not included in this pull request. While this PR is a companion to #4720, including tests that are guaranteed to fail against the current target branch will break CI. Consider either including the necessary file changes in this PR or marking these tests as expected to fail/skipped until the implementation is merged.
| sys.modules["structlog"] = MagicMock() | ||
| sys.modules["loggers"] = MagicMock() | ||
| sys.modules["loggers"].get_logger = MagicMock(return_value = MagicMock()) | ||
| sys.modules["utils"] = MagicMock() | ||
| sys.modules["utils.hardware"] = MagicMock() |
There was a problem hiding this comment.
Modifying sys.modules directly within a test method causes global state leakage that can affect other tests in the suite or subsequent test files. It is recommended to use unittest.mock.patch.dict to ensure that these modifications are scoped to the test and automatically reverted upon completion. This pattern is repeated in several other tests in this file (e.g., lines 1093, 1131, 1166, 1186).
| try: | ||
| _worker_spec.loader.exec_module(worker_mod) | ||
| except Exception: | ||
| pytest.skip("Could not load worker module in test environment") |
There was a problem hiding this comment.
This broad except Exception block silently skips the test on any failure during module execution. This violates the general rule against broad, silent exception handlers and makes it difficult to distinguish between environment issues and actual bugs in the code under test. Please catch more specific exceptions or at least include the exception details in the skip message.
| try: | |
| _worker_spec.loader.exec_module(worker_mod) | |
| except Exception: | |
| pytest.skip("Could not load worker module in test environment") | |
| try: | |
| _worker_spec.loader.exec_module(worker_mod) | |
| except Exception as e: | |
| pytest.skip(f"Could not load worker module in test environment: {e}") |
References
- Avoid using broad, silent exception handlers like
except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.
Summary
tests/studio/install/test_rocm_support.py) and 23 shell tests (tests/sh/test_get_torch_index_url.sh) covering AMD ROCm detection, torch index URL selection, hardware flags, prebuilt asset selection, and install pathway logicTest plan
python -m pytest tests/studio/install/test_rocm_support.py -vbash tests/sh/test_get_torch_index_url.sh