Add VibeVoice-1.5B perf benchmark and wire to nightly #5343
Draft
saiarthiraguram wants to merge 1 commit into
Add a single-forward + PCC benchmark for VibeVoice-1.5B (microsoft/VibeVoice-1.5B) using the generic encoder benchmark harness. The model's bringup forward reduces to the Qwen2.5 LM backbone producing logits (speech_tensors=None; semantic connector exercised but unused), and the loader wraps it to return a bare logits tensor, so it runs cleanly through the existing harness.

Wire it into the nightly perf pipeline via perf-bench-matrix.json, pinned to n150-perf (the verified bringup arch). Trace disabled, optimization level 1.

Verified on n150: PCC=0.992910, 1 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ticket
Link to Github Issue
Problem description
VibeVoice-1.5B (`microsoft/VibeVoice-1.5B`) was brought up single-device on n150, but it has no perf benchmark and is not part of the nightly performance pipeline, so its throughput and PCC are not tracked over time.
What's changed
Adds a perf benchmark for VibeVoice-1.5B and wires it into the nightly perf matrix.
- `tests/benchmark/test_encoders.py::test_vibevoice`: runs the model through the generic single-forward + PCC encoder harness. VibeVoice's bringup forward reduces to the Qwen2.5 LM backbone producing logits (`speech_tensors=None`; the semantic connector is exercised but unused), and the loader wraps the model to return a bare logits tensor, so it fits the existing harness without changes. Config: bf16, batch 1, seq len 32, loop count 32, optimization level 1, trace disabled.
- `.github/workflows/perf-bench-matrix.json`: adds a `vibevoice` entry pinned to `runs-on: n150-perf` (the verified bringup arch). This matrix is filtered and executed by the nightly pipeline (`schedule-nightly.yml` → `perf-benchmark` → `call-filtered-perf-tests.yml`).

Impact: VibeVoice-1.5B throughput and PCC are now tracked in the nightly benchmark report on n150.
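For reviewers unfamiliar with the pattern, here is a minimal, dependency-free sketch of the two ideas the benchmark relies on: wrapping a model so its forward returns only the logits, and scoring a device result against a golden result with the Pearson correlation coefficient (PCC). It is not the code in this PR. `LogitsOnly`, `fake_model`, and the `REQUIRED_PCC = 0.99` threshold are hypothetical names and values chosen for illustration; the real loader lives in tt-forge-models and the real comparison is done by the encoder harness, which operates on tensors rather than Python lists.

```python
import math
from types import SimpleNamespace


class LogitsOnly:
    """Hypothetical wrapper: call the wrapped model and return only `.logits`."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        # Drop every other field of the structured output.
        return self.model(*args, **kwargs).logits


def pcc(golden, actual):
    """Pearson correlation coefficient of two equal-length number sequences."""
    if len(golden) != len(actual) or len(golden) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_g = sum(golden) / len(golden)
    mean_a = sum(actual) / len(actual)
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(golden, actual))
    var_g = sum((g - mean_g) ** 2 for g in golden)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_g == 0 or var_a == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_g * var_a)


def fake_model(token_ids):
    """Stand-in for an LM forward that returns a structured output."""
    logits = [0.5 * t + 1.0 for t in token_ids]
    return SimpleNamespace(logits=logits, extra="unused")


wrapped = LogitsOnly(fake_model)
golden = wrapped([1, 2, 3, 4])

# Simulate the small numeric drift a reduced-precision device run might show.
drift = [0.01, -0.02, 0.015, -0.005]
device = [g + d for g, d in zip(golden, drift)]

REQUIRED_PCC = 0.99  # assumed threshold, not taken from the harness
score = pcc(golden, device)
assert score >= REQUIRED_PCC, f"PCC {score:.6f} below {REQUIRED_PCC}"
```

Returning a bare logits value means a harness that expects a single output can compare it directly, without special-casing a structured output type.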
Dependency: the benchmark imports the VibeVoice loader from `third_party.tt_forge_models.vibevoice`. It only runs green once the tt-forge-models loader PR (branch `sai_arthi_raguram/vibe_voice`) lands and the submodule is uplifted in tt-xla. Land and uplift that PR first.
Checklist
`tests/benchmark/test_encoders.py::test_vibevoice` verified on n150: PCC=0.992910, `1 passed in 197s`.
Logs
benchmark_vibevoice_n150.log
bringup_steps.txt