Optimize forward seeks #1489

Open
NicolasHug wants to merge 2 commits into meta-pytorch:main from NicolasHug:seek_forward

@NicolasHug

Contributor

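This PR implements the heuristic described in #1488. In the benchmarks below it speeds up sparse sampling by roughly 1.6x to 2.5x at 16 threads and by about 1.1x to 1.2x at 8 threads; single-threaded timings are essentially unchanged.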

On main:

=== 10s clip ===
  scenario num_frames=32:
    threads=1     897.97 ±   1.46 ms
    threads=8     318.66 ±  11.46 ms
    threads=16    405.97 ±  43.35 ms
  scenario fps=2:
    threads=1     823.16 ±   4.05 ms
    threads=8     297.88 ±  15.41 ms
    threads=16    335.72 ±  38.15 ms

=== 30s clip ===
  scenario num_frames=32:
    threads=1    2093.25 ±  12.27 ms
    threads=8     747.90 ±  36.04 ms
    threads=16    888.11 ±  70.06 ms
  scenario fps=2:
    threads=1    3072.13 ±  27.38 ms
    threads=8     958.11 ±  53.55 ms
    threads=16   1017.69 ± 100.46 ms

This PR:

=== 10s clip ===
  scenario num_frames=32:
    threads=1     900.39 ±  19.13 ms
    threads=8     256.81 ±  14.87 ms
    threads=16    162.67 ±  12.00 ms
  scenario fps=2:
    threads=1     829.18 ±   8.31 ms
    threads=8     246.38 ±  17.08 ms
    threads=16    212.49 ±  22.90 ms

=== 30s clip ===
  scenario num_frames=32:
    threads=1    2071.08 ±  14.91 ms
    threads=8     705.73 ±  39.94 ms
    threads=16    564.64 ±  39.22 ms
  scenario fps=2:
    threads=1    3050.82 ±  19.00 ms
    threads=8     789.66 ±  43.08 ms
    threads=16    502.18 ±  26.40 ms
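
To make the two listings easier to compare, here is a small sketch that computes the per-configuration speedup (median on main divided by median on this PR). The numbers are copied from the output above; the dictionary layout and the `speedup` helper are just illustrative names, not part of the PR.

```python
# Median timings in ms, copied from the two result listings above.
# Key: (clip, scenario, threads) -> (median on main, median on this PR)
MEDIANS_MS = {
    ("10s", "num_frames=32", 1): (897.97, 900.39),
    ("10s", "num_frames=32", 8): (318.66, 256.81),
    ("10s", "num_frames=32", 16): (405.97, 162.67),
    ("10s", "fps=2", 1): (823.16, 829.18),
    ("10s", "fps=2", 8): (297.88, 246.38),
    ("10s", "fps=2", 16): (335.72, 212.49),
    ("30s", "num_frames=32", 1): (2093.25, 2071.08),
    ("30s", "num_frames=32", 8): (747.90, 705.73),
    ("30s", "num_frames=32", 16): (888.11, 564.64),
    ("30s", "fps=2", 1): (3072.13, 3050.82),
    ("30s", "fps=2", 8): (958.11, 789.66),
    ("30s", "fps=2", 16): (1017.69, 502.18),
}


def speedup(main_ms, pr_ms):
    """Ratio > 1 means this PR is faster than main."""
    return main_ms / pr_ms


speedups = {key: speedup(*pair) for key, pair in MEDIANS_MS.items()}

for (clip, scenario, threads), ratio in speedups.items():
    print(f"{clip:>3}  {scenario:<14} threads={threads:<2}  {ratio:5.2f}x")
```

The ratios ignore the reported standard deviations, so small differences (in particular the single-threaded rows) should not be over-interpreted.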

benchmark code:

"""Tiny torchcodec-only benchmark to compare branches (e.g. main vs the
forward-seek heuristic from issue #1488).

Same clip + scenarios as bench_opencv.py, but self-contained. Run it on each
branch and eyeball:

    git checkout main         && <rebuild> && python bench_heuristic.py
    git checkout <my-branch>  && <rebuild> && python bench_heuristic.py

Thread count matters: the heuristic threshold is has_b_frames + thread_count - 1,
so the no-seek win grows with more threads on sparse sampling (fps=2).
"""

import hashlib
import math
import statistics
import subprocess
import tempfile
import time
from pathlib import Path

import numpy as np

from torchcodec.decoders import VideoDecoder

SRC_FPS = 30
SRC_GOP = 30
DURATIONS_S = [10, 30]
THREADS = [1, 8, 16]
ITERS = 20

# (label, kwargs) sampling scenarios to exercise.
SCENARIOS = [
    ("num_frames=32", {"num_frames": 32}),
    ("fps=2", {"fps": 2}),
]

CACHE_DIR = Path(tempfile.gettempdir()) / "vllm_video_bench"


def get_or_make_video(duration_s):
    """Return a cached animated 720p clip, generating it with ffmpeg if needed."""
    spec = [
        "-f", "lavfi",
        "-i", f"mandelbrot=s=1280x720:rate={SRC_FPS}",
        "-t", str(duration_s),
        "-g", str(SRC_GOP),
        "-pix_fmt", "yuv420p",
    ]
    digest = hashlib.sha256(" ".join(spec).encode()).hexdigest()[:16]
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"clip_{duration_s}s_{digest}.mp4"
    if not path.exists():
        print(f"generating {path} ...")
        tmp = path.with_suffix(".tmp.mp4")
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", *spec, str(tmp)], check=True
        )
        tmp.rename(path)
    return str(path)


def compute_frame_idx(total_frames, duration, num_frames=-1, fps=-1):
    """Uniformly sample frame indices (copied from vLLM: multimodal/video.py)."""
    n = total_frames
    if num_frames > 0:
        n = min(num_frames, total_frames)
    if fps > 0:
        n = min(n, math.floor(duration * fps))
    n = max(1, n)
    if n == total_frames:
        return list(range(n))
    return np.linspace(0, total_frames - 1, n, dtype=int).tolist()


def torchcodec_decode(path, threads, num_frames=-1, fps=-1):
    decoder = VideoDecoder(path, num_ffmpeg_threads=threads)
    frame_idx = compute_frame_idx(
        decoder._num_frames, decoder.metadata.duration_seconds, num_frames, fps
    )
    return decoder.get_frames_at(frame_idx).data


def bench(path, kwargs, threads):
    torchcodec_decode(path, threads, **kwargs)  # warm-up
    times = []
    for _ in range(ITERS):
        start = time.perf_counter()
        torchcodec_decode(path, threads, **kwargs)
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times), (statistics.stdev(times) if len(times) > 1 else 0.0)


def main():
    rev = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    print(f"git HEAD: {rev}  (iters={ITERS})")

    for duration in DURATIONS_S:
        path = get_or_make_video(duration)
        print(f"\n=== {duration}s clip ===")
        for label, kwargs in SCENARIOS:
            print(f"  scenario {label}:")
            for threads in THREADS:
                median_ms, std_ms = bench(path, kwargs, threads)
                print(f"    threads={threads:<2}  {median_ms:8.2f} ± {std_ms:6.2f} ms")


if __name__ == "__main__":
    main()
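
For intuition about the thread-count remark in the docstring, the sketch below prints the average gap between sampled frames for each scenario next to the threshold `has_b_frames + thread_count - 1`. It is only an illustration: `num_samples`, `mean_stride`, and `threshold` are hypothetical helpers, the value `HAS_B_FRAMES = 2` is an assumption about the generated clips (it is not reported in this PR), and the clips are assumed to contain exactly `duration * SRC_FPS` frames. How the PR actually compares a seek distance against the threshold is described in #1488 and is not reproduced here.

```python
import math

SRC_FPS = 30  # same source frame rate as the benchmark clips


def num_samples(total_frames, duration_s, num_frames=-1, fps=-1):
    """Number of sampled frames, mirroring compute_frame_idx in the script."""
    n = total_frames
    if num_frames > 0:
        n = min(num_frames, total_frames)
    if fps > 0:
        n = min(n, math.floor(duration_s * fps))
    return max(1, n)


def mean_stride(total_frames, n):
    """Average gap, in source frames, between consecutive sampled indices."""
    if n <= 1:
        return 0.0
    return (total_frames - 1) / (n - 1)


def threshold(has_b_frames, thread_count):
    """Threshold formula quoted in the benchmark docstring."""
    return has_b_frames + thread_count - 1


# ASSUMPTION: has_b_frames for the generated clips is not given in the PR.
HAS_B_FRAMES = 2

for duration_s in (10, 30):
    total = duration_s * SRC_FPS  # ASSUMPTION: exact frame count
    for label, kwargs in (("num_frames=32", {"num_frames": 32}), ("fps=2", {"fps": 2})):
        n = num_samples(total, duration_s, **kwargs)
        print(f"{duration_s}s {label:<14} samples={n:<3} mean stride={mean_stride(total, n):.1f}")

for threads in (1, 8, 16):
    print(f"threads={threads:<2} threshold={threshold(HAS_B_FRAMES, threads)}")
```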

@pytorch-bot

pytorch-bot Bot commented Jun 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1489

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Cancelled Jobs, 2 Unrelated Failures

As of commit cb7e137 with merge base 8bbce65:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 16, 2026