[RFC] Fine-tuning on Intel hardware in the PyTorch ecosystem — where to invest?

## Motivation

We (Intel) want to enable fine-tuning on Intel GPU hardware within the PyTorch ecosystem. We started with torchforge and have been contributing device-agnostic improvements. Now we need to understand where to focus next.

## Work completed

| What | Status | Description |
|------|--------|-------------|
| [#749 — Make SFT hardware-agnostic](https://github.com/meta-pytorch/torchforge/pull/749) | **Merged** | Replaced `torch.cuda.*` with `torch.accelerator.*`, introduced `DeviceProxy` for device counting and env var mapping across backends. Updated tests accordingly. |
| [#760 — XPU: add install script and docs](https://github.com/meta-pytorch/torchforge/pull/760) | **Open** (awaiting review) | Adds `scripts/install_xpu.sh` following the `install_rocm.sh` pattern. |
| [#759 — SFT checkpoint bug](https://github.com/meta-pytorch/torchforge/issues/759) | **Open** | Bug found during testing, reported upstream. |
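For readers unfamiliar with the pattern in #749, the following is a minimal sketch of device-agnostic backend detection and environment-variable mapping. It is not torchforge's actual `DeviceProxy`: the names `VISIBLE_DEVICES_ENV`, `detect_backend`, and `visible_devices_env` are hypothetical, and the choice of `ZE_AFFINITY_MASK` for XPU is an assumption based on the Level Zero runtime rather than something confirmed by the PR.

```python
import os

# Hypothetical mapping from accelerator type to the environment variable
# that restricts which devices a process can see. The XPU entry assumes
# the Level Zero runtime; the real DeviceProxy may use something else.
VISIBLE_DEVICES_ENV = {
    "cuda": "CUDA_VISIBLE_DEVICES",
    "xpu": "ZE_AFFINITY_MASK",
}


def detect_backend() -> str:
    """Return the active accelerator type, or "cpu" when none is usable."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.accelerator is only present in recent PyTorch releases.
    accel = getattr(torch, "accelerator", None)
    if accel is None or not accel.is_available():
        return "cpu"
    return accel.current_accelerator().type


def visible_devices_env(backend: str, device_ids: list[int]) -> dict[str, str]:
    """Build the env var assignment that exposes only `device_ids`."""
    env_var = VISIBLE_DEVICES_ENV.get(backend)
    if env_var is None:
        # Unknown backend or CPU: nothing to restrict.
        return {}
    return {env_var: ",".join(str(i) for i in device_ids)}


if __name__ == "__main__":
    backend = detect_backend()
    # Copy the parent environment and add the device restriction for a child process.
    child_env = {**os.environ, **visible_devices_env(backend, [0, 1])}
    print(backend, visible_devices_env(backend, [0, 1]))
```

Keeping the backend lookup in one place means callers never branch on `torch.cuda` or `torch.xpu` directly, which is the property the GRPO stack would also need.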

## Next goal: GRPO — but where?

Our next target is enabling the GRPO workflow on Intel hardware. GRPO has a deeper stack than SFT — it relies on Monarch actors, TorchStore (RDMA-based weight sync), and vLLM. Some of these have CUDA-specific paths.

The question is **where this work should land**. We've noticed that torchforge activity has slowed down, while torchtitan has added its own RL support via [`experiments/rl`](https://github.com/pytorch/torchtitan/tree/main/torchtitan/experiments/rl) — an alternative GRPO implementation using the same core dependencies (Monarch, TorchStore, vLLM) but directly within torchtitan.

**We'd like to understand: is torchforge still the right place to invest, or should we shift our fine-tuning enablement efforts toward torchtitan?**

Any guidance from maintainers would be greatly appreciated.

/cc @felipemello1 
