[Bug] audio dataset preview and finetuning doesn't work due to torchcodec version mismatch #4900

@sleepydirt

Environment: the latest version of Unsloth Studio, installed via curl (install.sh commit 1d81603), on:

  • WSL Ubuntu 24.04
  • Python 3.13
  • CUDA 13.0

Problem

  1. Dataset preview fails for most audio datasets on Hugging Face

    For example, loading facebook/multilingual_librispeech gives an error.

    Inspecting the logs (truncated), we see that loading the .parquet file containing audio data fails:

    {"timestamp": "2026-04-07T15:24:29.204249Z", "level": "info", "event": "Tier 1: loading single file dutch/1_hours-00000-of-00001.parquet"}
    {"timestamp": "2026-04-07T15:24:32.835977Z", "level": "warning", "event": "Tier 1 (single-file) failed: Could not load libtorchcodec. Likely causes: 1. FFmpeg is not properly installed in your environment. We support versions 4, 5, 6, 7, and 8, and we attempt to load libtorchcodec for each of those versions. Errors for versions not installed on your system are expected; only the error for your installed FFmpeg version is relevant. On Windows, ensure you've installed the \"full-shared\" version which ships DLLs. 2. The PyTorch version (2.10.0+cu130) is not compatible with this version of TorchCodec. Refer to the version compatibility table: https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec. 3. Another runtime dependency; see exceptions below. The following exceptions were raised as we tried to load libtorchcodec: [start of libtorchcodec loading traceback]..."}

    To reproduce, load any audio dataset stored as .parquet files with embedded audio bytes.

  2. Training also fails, due to the same error

    When the single-file load in point 1 fails, the fallback is to load the full dataset, but this also fails with the same error when we try to start training:

    {"timestamp": "2026-04-07T15:24:32.930110Z", "level": "info", "event": "Tier 2: falling back to full streaming load_dataset"}
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 71825.40it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 88846.69it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 92098.17it/s]
    Resolving data files: 100%|██████████████████████████████████████████████████████| 48/48 [00:00<00:00, 87419.28it/s]
    {"timestamp": "2026-04-07T15:24:37.588855Z", "level": "error", "event": "Error checking dataset format: Could not load libtorchcodec..."}

    This error commonly occurs when ffmpeg isn't installed or when the torch, torchaudio, and torchcodec versions are mismatched.
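A quick way to tell the two causes apart is to probe the environment directly. The following is a minimal sketch, not part of Unsloth Studio: `probe_audio_stack` is a hypothetical helper name, and checking for the `ffmpeg` executable on PATH is only a rough proxy, since torchcodec loads the FFmpeg shared libraries rather than the CLI binary. Run it inside the Studio virtual environment.

```python
import shutil


def probe_audio_stack() -> dict:
    """Report whether the ffmpeg CLI is on PATH and whether torchcodec imports cleanly.

    Hypothetical helper for illustration; not an Unsloth or torchcodec API.
    """
    report = {"ffmpeg_on_path": shutil.which("ffmpeg") is not None}
    try:
        # Importing the decoders module forces libtorchcodec to load, which is
        # the step that fails in the logs above.
        from torchcodec.decoders import AudioDecoder  # noqa: F401

        report["torchcodec_ok"] = True
        report["torchcodec_error"] = None
    except Exception as exc:
        # ImportError if torchcodec is missing; RuntimeError if the shared
        # library cannot be loaded (for example, on a torch version mismatch).
        report["torchcodec_ok"] = False
        report["torchcodec_error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in probe_audio_stack().items():
        print(f"{key}: {value}")
```

If `ffmpeg_on_path` is true and `torchcodec_ok` is false, a version mismatch is the more likely cause.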

Solution

This is a common issue caused either by a missing ffmpeg installation or by a version mismatch between torch, torchaudio, and torchcodec. To confirm that it is a version mismatch, we can inspect the versions of these three libraries:

cd ~/.unsloth/studio
source ./unsloth_studio/bin/activate
uv pip show torch torchaudio torchcodec

# Using Python 3.13.12 environment at: unsloth_studio
Name: torch
Version: 2.10.0+cu130
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires: cuda-bindings, filelock, fsspec, jinja2, networkx, nvidia-cublas, nvidia-cuda-cupti, nvidia-cuda-nvrtc, nvidia-cuda-runtime, nvidia-cudnn-cu13, nvidia-cufft, nvidia-cufile, nvidia-curand, nvidia-cusolver, nvidia-cusparse, nvidia-cusparselt-cu13, nvidia-nccl-cu13, nvidia-nvjitlink, nvidia-nvshmem-cu13, nvidia-nvtx, setuptools, sympy, triton, typing-extensions
Required-by: accelerate, bitsandbytes, cut-cross-entropy, descript-audio-codec, descript-audiotools, julius, openai-whisper, peft, sentence-transformers, snac, timm, torch-c-dlpack-ext, torch-stoi, torchvision, transformers-cfg, unsloth, unsloth-zoo, xformers
---
Name: torchaudio
Version: 2.11.0+cu130
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires:
Required-by: descript-audio-codec, descript-audiotools, torch-stoi
---
Name: torchcodec
Version: 0.11.0
Location: /home/sleepydirt/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages
Requires:
Required-by:

Indeed, there is a version mismatch: based on the compatibility table (https://github.com/meta-pytorch/torchcodec), torch==2.10.0+cu130 requires torchaudio==2.10.0+cu130 and torchcodec==0.10.0, but torchaudio 2.11.0+cu130 and torchcodec 0.11.0 are installed.
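The comparison can also be scripted. This is a minimal sketch under stated assumptions: the helper names (`major_minor`, `find_mismatches`, `installed_versions`) are hypothetical, and the `EXPECTED_FOR_TORCH` table contains only the pairing stated in this issue. Other rows should be taken from the torchcodec compatibility table, not inferred.

```python
from importlib import metadata

# Only the pairing stated in this issue is listed (assumption: extend this
# from the torchcodec compatibility table rather than guessing other rows).
EXPECTED_FOR_TORCH = {
    "2.10": {"torchaudio": "2.10", "torchcodec": "0.10"},
}


def major_minor(version: str) -> str:
    """Reduce '2.10.0+cu130' to '2.10' (drops the local tag and patch number)."""
    public = version.split("+", 1)[0]
    return ".".join(public.split(".")[:2])


def find_mismatches(installed: dict) -> list:
    """Return human-readable mismatches for a {package: version} mapping."""
    torch_version = installed.get("torch")
    if torch_version is None:
        return ["torch is not installed"]
    expected = EXPECTED_FOR_TORCH.get(major_minor(torch_version))
    if expected is None:
        return [f"no known pairing for torch {torch_version}"]
    problems = []
    for package, want in expected.items():
        have = installed.get(package)
        if have is None or major_minor(have) != want:
            problems.append(f"{package}: installed {have}, expected {want}.x")
    return problems


def installed_versions(packages=("torch", "torchaudio", "torchcodec")) -> dict:
    """Read versions from the active environment; missing packages are skipped."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


# Versions reported by `uv pip show` in this issue.
reported = {
    "torch": "2.10.0+cu130",
    "torchaudio": "2.11.0+cu130",
    "torchcodec": "0.11.0",
}
for problem in find_mismatches(reported):
    print(problem)
```

Inside the Studio virtual environment, `find_mismatches(installed_versions())` runs the same check against the packages actually installed.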

Therefore, the fix is to downgrade torchcodec and torchaudio to the versions that match the installed torch:

uv pip install torchcodec==0.10.0

# Using Python 3.13.12 environment at: unsloth_studio
Resolved 1 package in 75ms
Uninstalled 1 package in 10ms
Installed 1 package in 23ms
 - torchcodec==0.11.0
 + torchcodec==0.10.0

uv pip install torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130

# Using Python 3.13.12 environment at: unsloth_studio
Resolved 29 packages in 4.52s
Uninstalled 1 package in 13ms
Installed 1 package in 25ms
 - torchaudio==2.11.0+cu130
 + torchaudio==2.10.0+cu130

After restarting Unsloth Studio, audio dataset preview works as intended.

Training works as well :)
