[RFC] Fine-tuning on Intel hardware in the PyTorch ecosystem — where to invest? #773

@DamianSzwichtenberg

Motivation

We (Intel) want to enable fine-tuning on Intel GPU hardware within the PyTorch ecosystem. We started with torchforge and have been contributing device-agnostic improvements. Now we need to understand where to focus next.

Work so far

| # | What | Status | Description |
|---|---|---|---|
| #749 | Make SFT hardware-agnostic | Merged | Replaced `torch.cuda.*` with `torch.accelerator.*` and introduced `DeviceProxy` for device counting and environment-variable mapping across backends. Updated tests accordingly. |
| #760 | XPU: add install script and docs | Open (awaiting review) | Adds `scripts/install_xpu.sh`, following the `install_rocm.sh` pattern. |
| #759 | SFT checkpoint bug | Open | Bug found during testing; reported upstream. |
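For readers unfamiliar with #749, the device-agnostic pattern it describes (querying `torch.accelerator` instead of `torch.cuda`, plus mapping device visibility to a per-backend environment variable) can be sketched roughly as follows. This is a minimal illustration, not torchforge's `DeviceProxy` API: the helper names `detect_device_type`, `accelerator_device_count`, `visible_devices_env_var`, and `set_visible_devices` are hypothetical. It assumes PyTorch 2.6 or later for `torch.accelerator` and assumes `CUDA_VISIBLE_DEVICES` (NVIDIA) and `ZE_AFFINITY_MASK` (Intel GPUs via Level Zero) as the visibility variables. It degrades to `"cpu"` / `0` if torch is missing or too old.

```python
import os
from typing import MutableMapping, Optional, Sequence

# Hypothetical table (assumption, not torchforge's mapping): env var that
# restricts which devices a process sees, keyed by torch device type.
_VISIBLE_DEVICES_ENV = {
    "cuda": "CUDA_VISIBLE_DEVICES",  # NVIDIA; ROCm builds of PyTorch also report "cuda"
    "xpu": "ZE_AFFINITY_MASK",       # Intel GPUs via the Level Zero runtime
}


def _accelerator_module():
    """Return torch.accelerator if torch is installed and new enough, else None."""
    try:
        import torch
    except ImportError:
        return None
    return getattr(torch, "accelerator", None)  # added in PyTorch 2.6


def detect_device_type() -> str:
    """Current accelerator type ("cuda", "xpu", ...) or "cpu" if none is usable."""
    acc = _accelerator_module()
    if acc is None or not acc.is_available():
        return "cpu"
    return acc.current_accelerator().type


def accelerator_device_count() -> int:
    """Number of accelerator devices visible to this process (0 if none)."""
    acc = _accelerator_module()
    if acc is None or not acc.is_available():
        return 0
    return acc.device_count()


def visible_devices_env_var(device_type: str) -> str:
    """Name of the env var controlling device visibility for `device_type`."""
    try:
        return _VISIBLE_DEVICES_ENV[device_type]
    except KeyError:
        raise ValueError(f"no visible-devices env var known for {device_type!r}") from None


def set_visible_devices(
    device_type: str,
    indices: Sequence[int],
    env: Optional[MutableMapping[str, str]] = None,
) -> None:
    """Write a comma-separated device list into the backend's env var."""
    target = os.environ if env is None else env
    target[visible_devices_env_var(device_type)] = ",".join(str(i) for i in indices)
```

In a launcher, something like `set_visible_devices(detect_device_type(), [0, 1], env=child_env)` would be called before spawning workers, since these variables are generally read when the device runtime initializes. On a CPU-only machine `detect_device_type()` returns `"cpu"`, for which the sketch deliberately raises `ValueError` rather than guessing a variable.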

Next goal: GRPO — but where?

Our next target is enabling the GRPO workflow on Intel hardware. GRPO has a deeper stack than SFT — it relies on Monarch actors, TorchStore (RDMA-based weight sync), and vLLM. Some of these have CUDA-specific paths.

The question is where this work should land. We've noticed that torchforge activity has slowed down, while torchtitan has added its own RL support via experiments/rl — an alternative GRPO implementation using the same core dependencies (Monarch, TorchStore, vLLM) but directly within torchtitan.

We'd like to understand: is torchforge still the right place to invest, or should we shift our fine-tuning enablement efforts toward torchtitan?

Any guidance from maintainers would be greatly appreciated.

/cc @felipemello1
