[Bug] audio dataset preview and finetuning doesn't work due to torchcodec version mismatch

Environment: latest version of Unsloth Studio, installed via curl (`install.sh` at commit https://github.com/unslothai/unsloth/commit/1d8160376e169d13c386b7ef4bc1fdc8f855de68):
- WSL Ubuntu 24.04
- Python 3.13
- CUDA 13.0

### Problem

1. Dataset preview fails for most audio datasets on Hugging Face

    For example, loading `facebook/multilingual_librispeech` gives an error:
    <img width="1005" height="312" alt="Image" src="https://github.com/user-attachments/assets/35854ddc-7bf3-48a2-ac31-7f7c350f7a99" />
    
    Inspecting the logs (truncated), we see that loading the `.parquet` file containing audio data fails:
    ```json
    {"timestamp": "2026-04-07T15:24:29.204249Z", "level": "info", "event": "Tier 1: loading single file dutch/1_hours-00000-of-00001.parquet"}
    {"timestamp": "2026-04-07T15:24:32.835977Z", "level": "warning", "event": "Tier 1 (single-file) failed: Could not load libtorchcodec. Likely causes: 1. FFmpeg is not properly installed in your environment. We support versions 4, 5, 6, 7, and 8, and we attempt to load libtorchcodec for each of those versions. Errors for versions not installed on your system are expected; only the error for your installed FFmpeg version is relevant. On Windows, ensure you've installed the \"full-shared\" version which ships DLLs. 2. The PyTorch version (2.10.0+cu130) is not compatible with this version of TorchCodec. Refer to the version compatibility table: https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec. 3. Another runtime dependency; see exceptions below. The following exceptions were raised as we tried to load libtorchcodec: [start of libtorchcodec loading traceback]..."}
    ```
    
    To reproduce, load any audio dataset that is stored as `.parquet` files with embedded audio bytes.

2. Training also fails, due to the same error

    Because the single-file load above fails, Studio falls back to loading the full dataset (Tier 2 in the logs), but this also fails with the same error when we try to start training:
    ```json
    {"timestamp": "2026-04-07T15:24:32.930110Z", "level": "info", "event": "Tier 2: falling back to full streaming load_dataset"}
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 71825.40it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 88846.69it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 92098.17it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 87419.28it/s]
    {"timestamp": "2026-04-07T15:24:37.588855Z", "level": "error", "event": "Error checking dataset format: Could not load libtorchcodec..."}
    ```
    
    This error commonly occurs when `ffmpeg` isn't installed, or when the versions of `torch`, `torchaudio`, and `torchcodec` don't match.
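
To tell the two causes apart without going through the Studio UI, a rough diagnostic can be run inside the Studio virtual environment. This is only a sketch: `diagnose` is a hypothetical helper, and it assumes that `import torchcodec` is what raises the "Could not load libtorchcodec" error. Finding the `ffmpeg` executable on `PATH` is only a proxy, since torchcodec needs the FFmpeg shared libraries rather than the CLI.

```python
import shutil


def diagnose() -> dict:
    """Collect two hints: is ffmpeg on PATH, and does torchcodec import cleanly?"""
    report = {"ffmpeg_on_path": shutil.which("ffmpeg") is not None}
    try:
        # Assumption: a broken install fails here, at import time.
        import torchcodec  # noqa: F401

        report["torchcodec_import"] = "ok"
    except Exception as exc:
        # Keep only the first line; the full traceback text is very long.
        report["torchcodec_import"] = f"{type(exc).__name__}: {exc}".splitlines()[0]
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `ffmpeg_on_path` is true but the import still fails, a version mismatch is the more likely cause.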
    
### Solution

To confirm that a version mismatch (rather than a missing `ffmpeg`) is the cause, we can inspect the installed versions of `torch`, `torchaudio`, and `torchcodec`:
```bash
cd ~/.unsloth/studio
source ./unsloth_studio/bin/activate
uv pip show torch torchaudio torchcodec

# Using Python 3.13.12 environment at: unsloth_studio
Name: torch
Version: 2.10.0+cu130
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires: cuda-bindings, filelock, fsspec, jinja2, networkx, nvidia-cublas, nvidia-cuda-cupti, nvidia-cuda-nvrtc, nvidia-cuda-runtime, nvidia-cudnn-cu13, nvidia-cufft, nvidia-cufile, nvidia-curand, nvidia-cusolver, nvidia-cusparse, nvidia-cusparselt-cu13, nvidia-nccl-cu13, nvidia-nvjitlink, nvidia-nvshmem-cu13, nvidia-nvtx, setuptools, sympy, triton, typing-extensions
Required-by: accelerate, bitsandbytes, cut-cross-entropy, descript-audio-codec, descript-audiotools, julius, openai-whisper, peft, sentence-transformers, snac, timm, torch-c-dlpack-ext, torch-stoi, torchvision, transformers-cfg, unsloth, unsloth-zoo, xformers
---
Name: torchaudio
Version: 2.11.0+cu130
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires:
Required-by: descript-audio-codec, descript-audiotools, torch-stoi
---
Name: torchcodec
Version: 0.11.0
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires:
Required-by:
```

Indeed, there is a version mismatch: `torchaudio` (2.11.0) and `torchcodec` (0.11.0) are one release ahead of `torch` (2.10.0). Based on the compatibility table (https://github.com/meta-pytorch/torchcodec), `torch==2.10.0+cu130` requires `torchaudio==2.10.0+cu130` and `torchcodec==0.10.0`.

Therefore, the fix is to downgrade `torchcodec` and `torchaudio`:
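
The same comparison can be scripted. The sketch below is illustrative only: `find_mismatches` and `TORCHCODEC_FOR_TORCH` are hypothetical names, the mapping contains just the one pair reported in this issue (it should be extended from the compatibility table), and it assumes `torchaudio` is expected to share its major.minor version with `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# torch major.minor -> torchcodec major.minor.
# Assumption: only the pair from this issue is listed; extend as needed.
TORCHCODEC_FOR_TORCH = {"2.10": "0.10"}


def major_minor(v: str) -> str:
    """Reduce '2.10.0+cu130' to '2.10' (drop the local tag and patch level)."""
    public = v.split("+", 1)[0]
    return ".".join(public.split(".")[:2])


def find_mismatches(versions: dict) -> list:
    """Return human-readable problems for a {package: version} mapping."""
    problems = []
    torch_mm = major_minor(versions["torch"])
    audio = versions.get("torchaudio")
    if audio is not None and major_minor(audio) != torch_mm:
        problems.append(f"torchaudio {audio} does not match torch {versions['torch']}")
    expected = TORCHCODEC_FOR_TORCH.get(torch_mm)
    codec = versions.get("torchcodec")
    if expected is not None and codec is not None and major_minor(codec) != expected:
        problems.append(f"torchcodec {codec} found, expected {expected}.x for torch {torch_mm}")
    return problems


def installed_versions(names=("torch", "torchaudio", "torchcodec")) -> dict:
    """Look up installed versions, skipping packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    current = installed_versions()
    if "torch" not in current:
        print("torch is not installed in this environment")
    else:
        for line in find_mismatches(current) or ["no mismatch detected"]:
            print(line)
```

Run against the versions shown above, `find_mismatches` reports both `torchaudio` and `torchcodec`.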
```bash
uv pip install torchcodec==0.10.0

# Using Python 3.13.12 environment at: unsloth_studio
Resolved 1 package in 75ms
Uninstalled 1 package in 10ms
Installed 1 package in 23ms
 - torchcodec==0.11.0
 + torchcodec==0.10.0

uv pip install torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130
# Using Python 3.13.12 environment at: unsloth_studio
Resolved 29 packages in 4.52s
Uninstalled 1 package in 13ms
Installed 1 package in 25ms
 - torchaudio==2.11.0+cu130
 + torchaudio==2.10.0+cu130
```

After restarting Unsloth Studio, audio dataset preview works as intended:

<img width="1014" height="665" alt="Image" src="https://github.com/user-attachments/assets/28c187bb-984e-4aac-807a-49b3d6240d3f" />

Training works as well :)

<img width="915" height="370" alt="Image" src="https://github.com/user-attachments/assets/e0f39f7f-ab45-4c2e-a0f6-f438d659d889" />
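
For a check that does not depend on the Studio UI, a small decode smoke test can be run in the same environment. This is a sketch under assumptions: `write_test_tone` and `try_decode` are hypothetical helpers, and it assumes `torchcodec.decoders.AudioDecoder` accepts a file path and exposes `get_all_samples()` in the installed version. Any failure, including an API difference, is reported as a message rather than raised.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path: str, sample_rate: int = 16000, seconds: float = 0.5) -> int:
    """Write a mono 16-bit 440 Hz sine wave; return the number of frames."""
    n = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440.0 * i / sample_rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return n


def try_decode(path: str):
    """Return (True, summary) if torchcodec decodes the file, else (False, error)."""
    try:
        from torchcodec.decoders import AudioDecoder

        samples = AudioDecoder(path).get_all_samples()
        return True, f"decoded shape {tuple(samples.data.shape)} at {samples.sample_rate} Hz"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "tone.wav")
        write_test_tone(wav_path)
        ok, message = try_decode(wav_path)
        print("OK" if ok else "FAILED", "-", message)
```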
