Add VibeVoice-1.5B perf benchmark and wire to nightly #5343

Draft
saiarthiraguram wants to merge 1 commit into main from sai_arthi_raguram/vibevoice_nightly

saiarthiraguram commented Jun 23, 2026

Contributor

Ticket

Link to Github Issue

Problem description

VibeVoice-1.5B (microsoft/VibeVoice-1.5B) was brought up single-device on n150, but it has no perf benchmark and is not part of the nightly performance pipeline, so its throughput and PCC are not tracked over time.

What's changed

Adds a perf benchmark for VibeVoice-1.5B and wires it into the nightly perf matrix.

  • tests/benchmark/test_encoders.py::test_vibevoice — runs the model through the generic single-forward + PCC encoder harness. VibeVoice's bringup forward reduces to the Qwen2.5 LM backbone producing logits (speech_tensors=None; the semantic connector is exercised but unused), and the loader wraps the model to return a bare logits tensor, so it fits the existing harness without changes. Config: bf16, batch 1, seq len 32, loop count 32, optimization level 1, trace disabled.
  • .github/workflows/perf-bench-matrix.json: adds a vibevoice entry pinned to runs-on: n150-perf (the verified bringup arch). This matrix is filtered and executed by the nightly pipeline (schedule-nightly.yml → perf-benchmark → call-filtered-perf-tests.yml).
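
For readers unfamiliar with the PCC half of the "single-forward + PCC" check, here is a minimal sketch of a Pearson correlation coefficient between reference logits and device logits. It is not the harness's implementation: the function name `pearson_cc` and the sample values are hypothetical, and it works on flat Python sequences so it runs without any framework, whereas the real harness presumably compares tensors.

```python
import math
from typing import Sequence


def pearson_cc(golden: Sequence[float], candidate: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length flat sequences.

    Hypothetical helper for illustration only; not the harness's code.
    """
    if len(golden) != len(candidate) or len(golden) < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mean_g = math.fsum(golden) / len(golden)
    mean_c = math.fsum(candidate) / len(candidate)

    # Deviations from the mean for each sequence.
    dev_g = [g - mean_g for g in golden]
    dev_c = [c - mean_c for c in candidate]

    covariance = math.fsum(a * b for a, b in zip(dev_g, dev_c))
    norm = math.sqrt(
        math.fsum(a * a for a in dev_g) * math.fsum(b * b for b in dev_c)
    )
    if norm == 0.0:
        raise ValueError("PCC is undefined when either input is constant")
    return covariance / norm


# Made-up values standing in for flattened CPU and device logits.
cpu_logits = [0.1, -1.2, 3.4, 0.7]
device_logits = [0.11, -1.18, 3.35, 0.72]
score = pearson_cc(cpu_logits, device_logits)
print(f"PCC={score:.6f}")
```

A value close to 1.0 means the device output tracks the reference closely, which is how the PCC=0.992910 figure reported in this PR should be read.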

Impact: VibeVoice-1.5B throughput and PCC are now tracked in the nightly benchmark report on n150.

Dependency: the benchmark imports the VibeVoice loader from third_party.tt_forge_models.vibevoice. It only runs green once the tt-forge-models loader PR (branch sai_arthi_raguram/vibe_voice) lands and the submodule is uplifted in tt-xla. Land + uplift that first.
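
Until the loader lands, one way to check locally whether the dependency is present is to probe for the module before running the benchmark. The sketch below is an assumption about how such a check could look, not part of this PR: `module_available` is a hypothetical helper, and only the module path `third_party.tt_forge_models.vibevoice` comes from the description above.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located without importing it fully.

    Hypothetical helper. find_spec imports parent packages, so a missing
    parent raises ModuleNotFoundError (an ImportError) instead of returning None.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


LOADER_MODULE = "third_party.tt_forge_models.vibevoice"

if module_available(LOADER_MODULE):
    print(f"{LOADER_MODULE} found: the benchmark can import its loader")
else:
    print(f"{LOADER_MODULE} missing: land the loader PR and uplift the submodule first")
```

Inside a test file, `pytest.importorskip` offers a similar guard that skips rather than fails when the module is missing. Whether skipping is desirable in the nightly run is a separate design decision for this PR.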

Checklist

  • New/Existing tests provide coverage for changes — tests/benchmark/test_encoders.py::test_vibevoice verified on n150: PCC=0.992910, 1 passed in 197s.

Logs

benchmark_vibevoice_n150.log
bringup_steps.txt

Add a single-forward + PCC benchmark for VibeVoice-1.5B
(microsoft/VibeVoice-1.5B) using the generic encoder benchmark harness.
The model's bringup forward reduces to the Qwen2.5 LM backbone producing
logits (speech_tensors=None; semantic connector exercised but unused), and
the loader wraps it to return a bare logits tensor, so it runs cleanly
through the existing harness.

Wire it into the nightly perf pipeline via perf-bench-matrix.json, pinned to
n150-perf (the verified bringup arch). Trace disabled, optimization level 1.

Verified on n150: PCC=0.992910, 1 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>