fix(examples): Avoid torch dataset formatter #7388
Open
WilliamLindskog wants to merge 3 commits into
Pull request overview
This PR removes usage of Hugging Face Datasets’ global Torch formatter (Dataset.with_format("torch")) from several PyTorch-facing examples and docs to avoid triggering the fragile torchvision.io.VideoReader import path under certain dependency combinations, while preserving tensor outputs via lazy transforms where needed.
Changes:
- Replaced `with_format("torch")` with `with_transform(...)` in CIFAR10-based PyTorch examples (quickstart + custom-mods).
- Introduced a shared lazy tensor conversion transform for the Whisper example and applied it across centralized/client/server evaluation paths.
- Removed `with_format("torch")` from the FedDebug label-noise dataset path and updated the PyTorch guide to stop recommending the `map(...).with_format("torch")` pattern.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| examples/whisper-federated-finetuning/whisper_example/server_app.py | Uses the new lazy torch transform for central evaluation datasets instead of with_format("torch"). |
| examples/whisper-federated-finetuning/whisper_example/dataset.py | Adds with_torch_transform helper (lazy tensor conversion for encoded data/targets). |
| examples/whisper-federated-finetuning/whisper_example/client_app.py | Applies lazy torch transform to training partition after sampler construction; removes with_format("torch"). |
| examples/whisper-federated-finetuning/centralized.py | Switches centralized train/val/test datasets to the lazy torch transform helper. |
| examples/quickstart-pytorch/pytorchexample/task.py | Removes global torch formatter from centralized CIFAR10 DataLoader path. |
| examples/custom-mods/custom_mods/task.py | Removes global torch formatter from centralized CIFAR10 DataLoader path. |
| datasets/docs/source/how-to-use-with-pytorch.rst | Removes documentation recommending map(...).with_format("torch"). |
| baselines/feddebug/feddebug/dataset.py | Drops with_format("torch") from label-noise mapping path. |
Addressing the scope/dependency concern explicitly:
What changed
- Removes `Dataset.with_format("torch")` from the PyTorch example paths that combine Hugging Face Datasets with torchvision transforms.
- Where tensors are needed, lazy transforms ensure `data` and `targets` are still returned as PyTorch tensors without using the global torch formatter.
Why
This is a follow-up to #7330. The E2E blocker was fixed there, but a few examples still used `with_format("torch")`. Under some dependency combinations, this can trigger the Hugging Face Datasets torch formatter import path and the unsupported `torchvision.io.VideoReader` import.
This PR is intentionally examples-only and does not change package minimum versions.
Validation
- `ruff`, `black --check`, and `py_compile` on touched example Python files
- `git diff --check`
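The `py_compile` part of this check can be reproduced with the standard library alone. `compile_check` is a hypothetical helper. It takes the touched file paths explicitly (for example, the output of `git diff --name-only`) rather than discovering them. `ruff` and `black --check` remain separate CLI steps and are not covered here.

```python
import py_compile
from pathlib import Path


def compile_check(paths):
    """Byte-compile each Python file; map failing paths to their error message."""
    failures = {}
    for path in map(Path, paths):
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return failures
```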