
Transformers.js v4 Parakeet TDT Demo


Live demo: https://ysdede.github.io/tdt-webgpu-demo/

This project is a React + Vite web app for automatic speech recognition with NeMo Conformer TDT models using Transformers.js v4.

Features

  • Run Parakeet-style TDT ASR models in the browser, for example ysdede/parakeet-tdt-0.6b-v2-onnx-tfjs4.
  • Choose encoder backend and dtype settings (WebGPU or WASM).
  • Switch between HF pipeline mode and direct model.transcribe() mode with mode-specific controls.
  • Compare browser audio prep paths, including a deterministic custom JS path with linear parity resampling and optional higher-quality SRC modes.
  • Transcribe sample or uploaded audio.
  • Inspect transcript, timestamps, confidence data, raw JSON output, and separated run/audio/model metrics.
  • Run a Node.js CLI test script for quick non-UI checks.
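The deterministic custom JS audio prep path with linear parity resampling mentioned above can be illustrated with a plain linear-interpolation resampler. This is a minimal sketch of the technique only; the function name and exact behavior are assumptions, not the app's actual implementation:

```javascript
// Sketch of a linear-interpolation resampler (hypothetical helper, not the
// app's actual code). Converts a Float32Array from one sample rate to
// another, e.g. 48 kHz browser audio down to the 16 kHz most ASR models expect.
function linearResample(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);
  const ratio = fromRate / toRate;
  const outLength = Math.round(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;          // fractional position in the input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Linear interpolation between the two nearest input samples.
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```

Linear interpolation is deterministic across browsers, which is why a custom JS path like this is useful for reproducible comparisons against the browser's built-in (and implementation-dependent) resampling.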

Requirements

  • Node.js 18 or newer.
  • A modern browser. WebGPU (Chrome or Edge) is recommended for best encoder performance.
  • Optional local development setup for transformers.js as a sibling folder at ../transformers.js.

Install

git clone <this-repo-url>
cd transformers-v4-parakeet-demo
npm install

Run

NPM mode (default)

Uses @huggingface/transformers@next from npm:

npm run dev

Then open the URL shown by Vite (typically http://localhost:5173).

Local source mode

Use this when you want to test local transformers.js changes without publishing a package.

  1. Keep both repositories as siblings:
    • .../transformers.js/
    • .../transformers-v4-parakeet-demo/
  2. Build transformers from the transformers.js root:
cd path/to/transformers.js
pnpm --filter @huggingface/transformers run build
  3. Start the demo in local mode:
cd path/to/transformers-v4-parakeet-demo
npm run dev:local

dev:local sets TRANSFORMERS_LOCAL=true and aliases @huggingface/transformers to ../transformers.js/packages/transformers/dist/transformers.web.js.
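
The alias mechanism can be sketched in vite.config.js roughly as follows (an illustrative config fragment; the demo's actual config may differ):

```javascript
// vite.config.js — sketch of the TRANSFORMERS_LOCAL alias wiring
// (illustrative only; not necessarily the demo's exact configuration).
import { defineConfig } from 'vite';

const useLocal = process.env.TRANSFORMERS_LOCAL === 'true';

export default defineConfig({
  resolve: {
    alias: useLocal
      ? {
          // Resolve the npm package name to the sibling local build instead.
          '@huggingface/transformers':
            '../transformers.js/packages/transformers/dist/transformers.web.js',
        }
      : {},
  },
});
```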

Production Build

npm run build
npm run preview

GitHub Pages Deployment

Deployment is handled by .github/workflows/deploy-pages.yml.

The workflow:

  1. Checks out this demo repository.
  2. Checks out transformers.js into a sibling directory.
  3. Builds @huggingface/transformers from source with pnpm.
  4. Builds this demo with TRANSFORMERS_LOCAL=true.
  5. Publishes dist/ to GitHub Pages.

Repository settings:

  • Enable Pages and select GitHub Actions as the source.
  • Optional repository variable TRANSFORMERS_REPO (default ysdede/transformers.js).
  • Optional repository variable TRANSFORMERS_REPO_REF (default v4-nemo-conformer-tdt-main-r3).
  • Optional secret TRANSFORMERS_REPO_TOKEN if transformers.js is private.

Notes:

  • GitHub Actions can only build commits that are pushed to GitHub.
  • workflow_dispatch supports a transformers_ref input for one-off branch/tag/SHA overrides.
  • The checked-in workflow defaults to your fork branch v4-nemo-conformer-tdt-main-r3.
  • The Vite base path is set automatically for both project pages and user pages.
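
The base path rule from the last note can be sketched as a small helper (assumed logic, not the demo's exact code):

```javascript
// Sketch of the GitHub Pages base-path rule (hypothetical, for illustration):
// project pages are served from /<repo>/, user pages (<user>.github.io) from /.
function pagesBase(repoFullName) {
  const repo = (repoFullName || '').split('/')[1] || '';
  const isUserPage = repo.endsWith('.github.io');
  return repo && !isUserPage ? `/${repo}/` : '/';
}
```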

Hugging Face Spaces Sync

Sync is handled by .github/workflows/sync-hf-space.yml.

The workflow:

  1. Exports an HF-safe copy of the app to hf_export/.
  2. Removes GitHub/local-only files and COI service worker wiring.
  3. Writes HF-specific README.md, vite.config.js, and package.json.
  4. Pushes the result to https://huggingface.co/spaces/ysdede/tdt-webgpu-demo.

Repository settings:

  • Add secret HF_TOKEN with write access to ysdede/tdt-webgpu-demo.

Node CLI Test

Run a quick transcription from the terminal:

npm run test:node -- --model ysdede/parakeet-tdt-0.6b-v2-onnx-tfjs4 --audio <path-to-wav-file> --encoder-device webgpu

By default, this script loads the local transformers build from ../transformers.js/packages/transformers/dist/transformers.node.mjs. Use --npm to use the installed npm package instead. Node CLI input must be WAV (.wav).

| Option | Description |
| --- | --- |
| --model <id-or-path> | Model ID or local model path |
| --audio <wav-path> | WAV file path |
| --encoder-device <webgpu\|cpu> | Encoder device (cpu is safer for Node) |
| --encoder-dtype, --decoder-dtype | Examples: fp16, int8, fp32 |
| --timestamps | Request word-level timestamps |
| --loop <n> | Repeat transcription n times |
| --npm | Use @huggingface/transformers from node_modules |
| --local-module <path> | Path to a local transformers Node build |
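
A minimal parser for flags in this style can be sketched as follows (a hypothetical helper, not the test script's actual implementation):

```javascript
// Hypothetical sketch of parsing `--flag value` pairs and boolean `--flag`
// switches in the style used by the CLI above (not the script's actual code).
function parseArgs(argv, boolFlags = new Set(['--timestamps', '--npm'])) {
  const opts = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('--')) continue;
    if (boolFlags.has(arg)) {
      opts[arg.slice(2)] = true;      // boolean switch, no value token
    } else {
      opts[arg.slice(2)] = argv[++i]; // flag that consumes the next token
    }
  }
  return opts;
}
```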

Included Sample

Sample audio file used by the UI: public/assets/Harvard-L2-1.ogg.

Additional Notes

UI Notes

  • Model configuration supports load mode, model ID, device, dtype, and WASM thread tuning.
  • Transcription options include explicit inference mode selection, direct NeMo API flags, pipeline timestamp settings, and audio prep controls.
  • Metrics are split by mode: pipeline shows wall-clock run timing plus audio prep, while direct mode shows audio prep plus direct model internals.
  • The transcribe workspace is organized into three columns with transcript and API contract visible at the same time.
  • Settings and theme preferences are persisted in localStorage.
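
The localStorage persistence mentioned above can be sketched with a small helper (hypothetical names, not the app's actual code; the storage object is injectable so the same logic works outside the browser):

```javascript
// Hypothetical settings-persistence helper (illustrative, not the app's code).
// `storage` is any object with getItem/setItem — window.localStorage in the
// browser, or a plain in-memory stand-in for testing in Node.
function createSettingsStore(key, storage) {
  return {
    load(defaults = {}) {
      try {
        const raw = storage.getItem(key);
        // Merge saved values over defaults so new settings get sane fallbacks.
        return raw ? { ...defaults, ...JSON.parse(raw) } : { ...defaults };
      } catch {
        return { ...defaults }; // corrupted JSON falls back to defaults
      }
    },
    save(settings) {
      storage.setItem(key, JSON.stringify(settings));
    },
  };
}
```

Merging over defaults on load means adding a new setting in a later version does not break users who saved preferences under the old schema.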

License

See the repository license.
