
fix(examples): Avoid torch dataset formatter#7388

Open
WilliamLindskog wants to merge 3 commits into main from fix/examples-torch-format

WilliamLindskog (Member) commented Jun 16, 2026

What changed

  • Remove Dataset.with_format("torch") from the PyTorch example paths that combine Hugging Face Datasets with torchvision transforms.
  • Add a shared lazy tensor transform for the Whisper example so encoded data and targets are still returned as PyTorch tensors without using the global torch formatter.

Why

This is a follow-up to #7330. The E2E blocker was fixed there, but a few examples still used with_format("torch"), which can trigger Hugging Face Datasets' torch formatter import path and the unsupported torchvision.io.VideoReader import under some dependency combinations.
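To see which side of this a given environment falls on, one option is to probe for the name directly. The helper below is a hypothetical diagnostic, not part of this PR, and uses only the standard library. It reports only whether the name resolves, not whether `VideoReader` is actually usable.

```python
import importlib


def probe_attribute(module_name, attribute):
    """Hypothetical diagnostic.

    Returns True or False if the module imports, depending on whether it
    exposes `attribute`, and None if the module is missing or fails to import.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as a broken installation.
        return None
    return hasattr(module, attribute)


if __name__ == "__main__":
    # None: torchvision is missing or fails to import.
    # False: torchvision.io imports but has no VideoReader name.
    # True: the name resolves (which does not prove it is supported).
    print(probe_attribute("torchvision.io", "VideoReader"))
```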

This PR is intentionally examples-only and does not change package minimum versions.

Validation

  • ruff, black --check, and py_compile on touched example Python files
  • git diff --check

Copilot AI review requested due to automatic review settings June 16, 2026 00:33

Copilot AI (Contributor) left a comment

Pull request overview

This PR removes usage of Hugging Face Datasets’ global Torch formatter (Dataset.with_format("torch")) from several PyTorch-facing examples and docs to avoid triggering the fragile torchvision.io.VideoReader import path under certain dependency combinations, while preserving tensor outputs via lazy transforms where needed.

Changes:

  • Replaced with_format("torch") with with_transform(...) in CIFAR10-based PyTorch examples (quickstart + custom-mods).
  • Introduced a shared lazy tensor conversion transform for the Whisper example and applied it across centralized/client/server evaluation paths.
  • Removed with_format("torch") from the FedDebug label-noise dataset path and updated the PyTorch guide to stop recommending the map(...).with_format("torch") pattern.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| examples/whisper-federated-finetuning/whisper_example/server_app.py | Uses the new lazy torch transform for central evaluation datasets instead of with_format("torch"). |
| examples/whisper-federated-finetuning/whisper_example/dataset.py | Adds with_torch_transform helper (lazy tensor conversion for encoded data/targets). |
| examples/whisper-federated-finetuning/whisper_example/client_app.py | Applies lazy torch transform to training partition after sampler construction; removes with_format("torch"). |
| examples/whisper-federated-finetuning/centralized.py | Switches centralized train/val/test datasets to the lazy torch transform helper. |
| examples/quickstart-pytorch/pytorchexample/task.py | Removes global torch formatter from centralized CIFAR10 DataLoader path. |
| examples/custom-mods/custom_mods/task.py | Removes global torch formatter from centralized CIFAR10 DataLoader path. |
| datasets/docs/source/how-to-use-with-pytorch.rst | Removes documentation recommending map(...).with_format("torch"). |
| baselines/feddebug/feddebug/dataset.py | Drops with_format("torch") from label-noise mapping path. |

github-actions bot added the Maintainer label ("Used to determine what PRs (mainly) come from Flower maintainers.") Jun 16, 2026
WilliamLindskog marked this pull request as ready for review June 16, 2026 03:16
WilliamLindskog force-pushed the fix/examples-torch-format branch from b0cc222 to 18ac2ff June 16, 2026 12:10

WilliamLindskog (Member, Author) commented

Addressing the scope/dependency concern explicitly:

  • I narrowed this PR to examples-only. It now only touches the PyTorch/custom-mods/Whisper example files; the docs and FedDebug baseline changes were removed.
  • I do not think this needs a minimum torch/torchvision version bump. The PR removes Dataset.with_format("torch"); it does not rely on a newer torch/torchvision API. The replacement paths use existing example transforms and, for Whisper, torch.as_tensor to preserve tensor output for data/targets.
  • The motivation is to avoid the Hugging Face Datasets torch formatter import path that can hit torchvision.io.VideoReader under some dependency combinations, while keeping the example behavior equivalent.

Labels

Maintainer: Used to determine what PRs (mainly) come from Flower maintainers.

2 participants