ROCm CI support to Torchforge #754

@akashveramd

We are in the process of adding ROCm CI support to torchforge. We recently added ROCm CI support to torchtitan, where CUDA and ROCm share common GitHub workflows, a Dockerfile, and a build script. This shared approach has been very effective in minimizing code divergence and maintenance overhead. There are a few topics we would like to discuss and get feedback on:

  1. In torchtitan, the CI is Docker-based, and the Conda environment setup is performed inside the container. For reference:
    Dockerfile: https://github.com/pytorch/torchtitan/blob/main/.ci/docker/ubuntu/Dockerfile
    Build script: https://github.com/pytorch/torchtitan/blob/main/.ci/docker/build.sh

    Given this setup, we would like to understand whether a similar Docker-based approach would be desirable for torchforge as well.

  2. There appears to be some divergence in how torchforge is installed:
    The torchforge README mentions installation via ./scripts/install.sh.
    The integration_test.yaml workflow, on the other hand, installs torchforge using uv and pyproject.toml (for CUDA CI).

    For ROCm CI, we would not use uv for now, since ROCm does not have reliable wheels or pip installs for several dependencies (e.g., Monarch, vLLM). As a result, these components are currently built from source in the install_rocm.sh script.

    Given these constraints, we would like to discuss the path forward for torchforge installation, with the goal of having a common workflow file for CUDA and ROCm.
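
As a rough illustration of the shared-workflow idea in point 2, a single workflow step could call one small dispatcher that picks the install path per backend. This is only a sketch under assumptions: the `GPU_BACKEND` environment variable, the `install_command` helper, and the `uv sync` invocation are hypothetical and are not taken from the existing integration_test.yaml; only the install_rocm.sh script name comes from the discussion above, and its location in the repository is not assumed here.

```python
import os
import subprocess

# Hypothetical mapping from GPU backend to an install command.
# "uv sync" is an assumed stand-in for the uv-based CUDA install;
# "install_rocm.sh" is the ROCm from-source script named in this issue
# (its directory is left unspecified on purpose).
INSTALL_COMMANDS = {
    "cuda": ["uv", "sync"],
    "rocm": ["bash", "install_rocm.sh"],
}


def install_command(backend: str) -> list[str]:
    """Return the install command for a backend name (case-insensitive)."""
    key = backend.strip().lower()
    if key not in INSTALL_COMMANDS:
        supported = ", ".join(sorted(INSTALL_COMMANDS))
        raise ValueError(f"unsupported backend {backend!r}; expected one of: {supported}")
    # Return a copy so callers cannot mutate the shared table.
    return list(INSTALL_COMMANDS[key])


if __name__ == "__main__":
    # GPU_BACKEND is a hypothetical variable that a CI matrix could set.
    backend = os.environ.get("GPU_BACKEND", "cuda")
    subprocess.run(install_command(backend), check=True)
```

With something along these lines, the workflow file itself would stay identical for CUDA and ROCm, and only the value passed in by the CI matrix would differ.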
