Optimize forward seeks by NicolasHug · Pull Request #1489 · meta-pytorch/torchcodec

NicolasHug · 2026-06-16T17:45:14Z

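This PR implements the heuristic described in #1488, which speeds up sparse sampling when decoding with multiple FFmpeg threads. In the benchmarks below, the 16-thread runs are roughly 1.6x to 2.5x faster than on main, while single-thread timings are essentially unchanged (within about 1%).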

On main:

=== 10s clip ===
  scenario num_frames=32:
    threads=1     897.97 ±   1.46 ms
    threads=8     318.66 ±  11.46 ms
    threads=16    405.97 ±  43.35 ms
  scenario fps=2:
    threads=1     823.16 ±   4.05 ms
    threads=8     297.88 ±  15.41 ms
    threads=16    335.72 ±  38.15 ms

=== 30s clip ===
  scenario num_frames=32:
    threads=1    2093.25 ±  12.27 ms
    threads=8     747.90 ±  36.04 ms
    threads=16    888.11 ±  70.06 ms
  scenario fps=2:
    threads=1    3072.13 ±  27.38 ms
    threads=8     958.11 ±  53.55 ms
    threads=16   1017.69 ± 100.46 ms

This PR:

=== 10s clip ===
  scenario num_frames=32:
    threads=1     900.39 ±  19.13 ms
    threads=8     256.81 ±  14.87 ms
    threads=16    162.67 ±  12.00 ms
  scenario fps=2:
    threads=1     829.18 ±   8.31 ms
    threads=8     246.38 ±  17.08 ms
    threads=16    212.49 ±  22.90 ms

=== 30s clip ===
  scenario num_frames=32:
    threads=1    2071.08 ±  14.91 ms
    threads=8     705.73 ±  39.94 ms
    threads=16    564.64 ±  39.22 ms
  scenario fps=2:
    threads=1    3050.82 ±  19.00 ms
    threads=8     789.66 ±  43.08 ms
    threads=16    502.18 ±  26.40 ms
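
To make the comparison easier to read, here is a small sketch that turns the medians above into speedup ratios (main divided by this PR). The numbers are copied from the two tables; the `TIMINGS_MS` and `speedups` names are made up for this illustration and are not part of the PR.

```python
# Median timings in ms, copied from the tables above.
# Key: (clip, scenario, threads) -> (main, this PR).
TIMINGS_MS = {
    ("10s", "num_frames=32", 1): (897.97, 900.39),
    ("10s", "num_frames=32", 8): (318.66, 256.81),
    ("10s", "num_frames=32", 16): (405.97, 162.67),
    ("10s", "fps=2", 1): (823.16, 829.18),
    ("10s", "fps=2", 8): (297.88, 246.38),
    ("10s", "fps=2", 16): (335.72, 212.49),
    ("30s", "num_frames=32", 1): (2093.25, 2071.08),
    ("30s", "num_frames=32", 8): (747.90, 705.73),
    ("30s", "num_frames=32", 16): (888.11, 564.64),
    ("30s", "fps=2", 1): (3072.13, 3050.82),
    ("30s", "fps=2", 8): (958.11, 789.66),
    ("30s", "fps=2", 16): (1017.69, 502.18),
}


def speedups(timings):
    """Map each configuration to main_ms / pr_ms (above 1 means the PR is faster)."""
    return {key: main_ms / pr_ms for key, (main_ms, pr_ms) in timings.items()}


if __name__ == "__main__":
    for (clip, scenario, threads), ratio in speedups(TIMINGS_MS).items():
        print(f"{clip} clip  {scenario:<13}  threads={threads:<2}  {ratio:.2f}x")
```

These ratios compare medians only. The standard deviations reported above are not negligible at 8 and 16 threads, so small differences should be read with that in mind.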

benchmark code:

"""Tiny torchcodec-only benchmark to compare branches (e.g. main vs the
forward-seek heuristic from issue #1488).

Same clip + scenarios as bench_opencv.py, but self-contained. Run it on each
branch and eyeball:

    git checkout main         && <rebuild> && python bench_heuristic.py
    git checkout <my-branch>  && <rebuild> && python bench_heuristic.py

Thread count matters: the heuristic threshold is has_b_frames + thread_count - 1,
so the no-seek win grows with more threads on sparse sampling (fps=2).
"""

import hashlib
import math
import statistics
import subprocess
import tempfile
import time
from pathlib import Path

import numpy as np

from torchcodec.decoders import VideoDecoder

SRC_FPS = 30
SRC_GOP = 30
DURATIONS_S = [10, 30]
THREADS = [1, 8, 16]
ITERS = 20

# (label, kwargs) sampling scenarios to exercise.
SCENARIOS = [
    ("num_frames=32", {"num_frames": 32}),
    ("fps=2", {"fps": 2}),
]

CACHE_DIR = Path(tempfile.gettempdir()) / "vllm_video_bench"


def get_or_make_video(duration_s):
    """Return a cached animated 720p clip, generating it with ffmpeg if needed."""
    spec = [
        "-f", "lavfi",
        "-i", f"mandelbrot=s=1280x720:rate={SRC_FPS}",
        "-t", str(duration_s),
        "-g", str(SRC_GOP),
        "-pix_fmt", "yuv420p",
    ]
    digest = hashlib.sha256(" ".join(spec).encode()).hexdigest()[:16]
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"clip_{duration_s}s_{digest}.mp4"
    if not path.exists():
        print(f"generating {path} ...")
        tmp = path.with_suffix(".tmp.mp4")
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", *spec, str(tmp)], check=True
        )
        tmp.rename(path)
    return str(path)


def compute_frame_idx(total_frames, duration, num_frames=-1, fps=-1):
    """Uniformly sample frame indices (copied from vLLM: multimodal/video.py)."""
    n = total_frames
    if num_frames > 0:
        n = min(num_frames, total_frames)
    if fps > 0:
        n = min(n, math.floor(duration * fps))
    n = max(1, n)
    if n == total_frames:
        return list(range(n))
    return np.linspace(0, total_frames - 1, n, dtype=int).tolist()


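def torchcodec_decode(path, threads, num_frames=-1, fps=-1):
    """Decode the uniformly sampled frames of `path` using `threads` FFmpeg threads."""
    # A fresh decoder is built on every call, so its construction is part of the timing.
    decoder = VideoDecoder(path, num_ffmpeg_threads=threads)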
    frame_idx = compute_frame_idx(
        decoder._num_frames, decoder.metadata.duration_seconds, num_frames, fps
    )
    return decoder.get_frames_at(frame_idx).data


def bench(path, kwargs, threads):
    """Return (median, stdev) of the wall-clock time in ms over ITERS timed runs."""
    torchcodec_decode(path, threads, **kwargs)  # warm-up
    times = []
    for _ in range(ITERS):
        start = time.perf_counter()
        torchcodec_decode(path, threads, **kwargs)
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times), (statistics.stdev(times) if len(times) > 1 else 0.0)


def main():
    rev = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    print(f"git HEAD: {rev}  (iters={ITERS})")

    for duration in DURATIONS_S:
        path = get_or_make_video(duration)
        print(f"\n=== {duration}s clip ===")
        for label, kwargs in SCENARIOS:
            print(f"  scenario {label}:")
            for threads in THREADS:
                median_ms, std_ms = bench(path, kwargs, threads)
                print(f"    threads={threads:<2}  {median_ms:8.2f} ± {std_ms:6.2f} ms")


if __name__ == "__main__":
    main()
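
The docstring above says the heuristic threshold is has_b_frames + thread_count - 1. As a rough aid for reading the results, the sketch below prints that threshold next to the average spacing between sampled frames in each benchmark scenario. It rests on several assumptions: the clips are taken to hold exactly duration × 30 frames, `ASSUMED_HAS_B_FRAMES = 2` is a hypothetical value (the real one depends on the encoded stream), and the `mean_gap` and `threshold` helpers are made up for this illustration. How the PR actually applies the threshold is described in #1488 and is not reproduced here.

```python
import math

SRC_FPS = 30  # same source frame rate as the benchmark
ASSUMED_HAS_B_FRAMES = 2  # hypothetical value; depends on the encoded stream


def mean_gap(duration_s, num_frames=-1, fps=-1):
    """Average distance, in frames, between consecutive sampled indices.

    Mirrors the sample-count logic of compute_frame_idx, but assumes the clip
    holds exactly duration_s * SRC_FPS frames instead of asking the decoder.
    """
    total = duration_s * SRC_FPS
    n = total
    if num_frames > 0:
        n = min(num_frames, total)
    if fps > 0:
        n = min(n, math.floor(duration_s * fps))
    n = max(1, n)
    return (total - 1) / (n - 1) if n > 1 else 0.0


def threshold(thread_count, has_b_frames=ASSUMED_HAS_B_FRAMES):
    """Threshold formula quoted in the benchmark docstring."""
    return has_b_frames + thread_count - 1


if __name__ == "__main__":
    for threads in (1, 8, 16):
        print(f"threads={threads:<2}  threshold={threshold(threads)}")
    for duration in (10, 30):
        for label, kwargs in (("num_frames=32", {"num_frames": 32}), ("fps=2", {"fps": 2})):
            print(f"{duration}s clip  {label:<13}  mean gap = {mean_gap(duration, **kwargs):.1f} frames")
```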

pytorch-bot · 2026-06-16T17:45:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1489

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Cancelled Jobs, 2 Unrelated Failures

As of commit cb7e137 with merge base 8bbce65:

NEW FAILURES - The following jobs have failed:

Linux CUDA aarch64 / Build and Upload wheel / upload / upload-wheel-py3_10-cuda-aarch6412_6-aarch64 (gh)
Unable to download artifact(s): Artifact not found for name: meta-pytorch_torchcodec__3.10_cu126_aarch64
Linux CUDA aarch64 / Build and Upload wheel / upload / upload-wheel-py3_10-cuda-aarch6413_0-aarch64 (gh)
Unable to download artifact(s): Artifact not found for name: meta-pytorch_torchcodec__3.10_cu130_aarch64

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Linux CUDA aarch64 / Build and Upload wheel / build-wheel-py3_10-cuda-aarch6413_2-aarch64 (gh) (trunk failure)
Linux CUDA aarch64 / Build and Upload wheel / upload / upload-wheel-py3_10-cuda-aarch6413_2-aarch64 (gh) (trunk failure)
Unable to download artifact(s): Artifact not found for name: meta-pytorch_torchcodec__3.10_cu132_aarch64

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Optimize forward seeks

50d3129

meta-cla Bot added the CLA Signed label on Jun 16, 2026

Merge branch 'main' of github.com:meta-pytorch/torchcodec into seek_forward

cb7e137

NicolasHug wants to merge 2 commits into meta-pytorch:main from NicolasHug:seek_forward
