Live demo: https://ysdede.github.io/tdt-webgpu-demo/
This project is a React + Vite web app for automatic speech recognition with Nemo Conformer TDT models using Transformers.js v4.
- Run Parakeet-style TDT ASR models in the browser, for example `ysdede/parakeet-tdt-0.6b-v2-onnx-tfjs4`.
- Choose encoder backend and dtype settings (WebGPU or WASM).
- Switch between HF pipeline mode and direct `model.transcribe()` mode with mode-specific controls.
- Compare browser audio prep paths, including a deterministic custom JS path with linear parity resampling and optional higher-quality SRC modes.
- Transcribe sample or uploaded audio.
- Inspect transcript, timestamps, confidence data, raw JSON output, and separated run/audio/model metrics.
- Run a Node.js CLI test script for quick non-UI checks.
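The linear resampling mentioned in the audio prep feature above can be sketched as a plain JS function. This is an illustrative simplification, not the app's actual code; `resampleLinear` is a hypothetical name:

```javascript
// Minimal sketch of deterministic linear-interpolation resampling,
// similar in spirit to a custom JS audio prep path.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Linear blend between the two nearest source samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Downsample a 48 kHz ramp to 16 kHz: every 3rd sample survives exactly.
const ramp = Float32Array.from({ length: 12 }, (_, i) => i);
console.log(Array.from(resampleLinear(ramp, 48000, 16000))); // [0, 3, 6, 9]
```

Because the output depends only on the input samples and rates, the same buffer always yields the same result, which is what makes such a path deterministic across browsers.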
Prerequisites:

- Node.js 18 or newer.
- A modern browser. WebGPU (Chrome or Edge) is recommended for best encoder performance.
- Optional: a local checkout of `transformers.js` as a sibling folder at `../transformers.js` for local development.
Install and run:

```bash
git clone <this-repo-url>
cd transformers-v4-parakeet-demo
npm install
```

This uses `@huggingface/transformers@next` from npm. Start the dev server:

```bash
npm run dev
```

Then open the URL shown by Vite (typically http://localhost:5173).
Use this when you want to test local transformers.js changes without publishing a package.
- Keep both repositories as siblings:

  ```text
  .../transformers.js/
  .../transformers-v4-parakeet-demo/
  ```

- Build transformers from the `transformers.js` root:

  ```bash
  cd path/to/transformers.js
  pnpm --filter @huggingface/transformers run build
  ```

- Start the demo in local mode:

  ```bash
  cd path/to/transformers-v4-parakeet-demo
  npm run dev:local
  ```

`dev:local` sets `TRANSFORMERS_LOCAL=true` and aliases `@huggingface/transformers` to `../transformers.js/packages/transformers/dist/transformers.web.js`.
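The aliasing that `dev:local` performs can be sketched in a Vite config like this. The shape below is illustrative, not the repository's exact `vite.config.js`:

```javascript
// vite.config.js — illustrative sketch of TRANSFORMERS_LOCAL aliasing,
// not the repository's actual config.
import { defineConfig } from 'vite';
import { fileURLToPath } from 'node:url';

const useLocal = process.env.TRANSFORMERS_LOCAL === 'true';
const localBuild = fileURLToPath(
  new URL(
    '../transformers.js/packages/transformers/dist/transformers.web.js',
    import.meta.url
  )
);

export default defineConfig({
  resolve: {
    // When TRANSFORMERS_LOCAL=true, resolve the package to the sibling build
    // instead of the copy installed in node_modules.
    alias: useLocal ? { '@huggingface/transformers': localBuild } : {},
  },
});
```

Aliasing at the bundler level means application code keeps importing `@huggingface/transformers` unchanged in both modes.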
Production build and preview:

```bash
npm run build
npm run preview
```

Deployment is handled by .github/workflows/deploy-pages.yml.
The workflow:
- Checks out this demo repository.
- Checks out `transformers.js` into a sibling directory.
- Builds `@huggingface/transformers` from source with `pnpm`.
- Builds this demo with `TRANSFORMERS_LOCAL=true`.
- Publishes `dist/` to GitHub Pages.
Repository settings:
- Enable Pages and select GitHub Actions as the source.
- Optional repository variable `TRANSFORMERS_REPO` (default `ysdede/transformers.js`).
- Optional repository variable `TRANSFORMERS_REPO_REF` (default `v4-nemo-conformer-tdt-main-r3`).
- Optional secret `TRANSFORMERS_REPO_TOKEN` if `transformers.js` is private.
Notes:
- GitHub Actions can only build commits that are pushed to GitHub.
- `workflow_dispatch` supports a `transformers_ref` input for one-off branch/tag/SHA overrides.
- The checked-in workflow defaults to your fork branch `v4-nemo-conformer-tdt-main-r3`.
- The Vite base path is set automatically for both project pages and user pages.
Sync is handled by .github/workflows/sync-hf-space.yml.
The workflow:
- Exports an HF-safe copy of the app to `hf_export/`.
- Removes GitHub/local-only files and COI service worker wiring.
- Writes HF-specific `README.md`, `vite.config.js`, and `package.json`.
- Pushes the result to https://huggingface.co/spaces/ysdede/tdt-webgpu-demo.
Repository settings:
- Add secret `HF_TOKEN` with write access to `ysdede/tdt-webgpu-demo`.
Run a quick transcription from the terminal:
```bash
npm run test:node -- --model ysdede/parakeet-tdt-0.6b-v2-onnx-tfjs4 --audio <path-to-wav-file> --encoder-device webgpu
```

By default, this script loads the local transformers build from `../transformers.js/packages/transformers/dist/transformers.node.mjs`.
Use `--npm` to use the installed npm package instead.
Node CLI input must be WAV (.wav).
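Since the CLI accepts only WAV input, a minimal validity check on the file header might look like the following. This is an illustrative sketch assuming a canonical 44-byte RIFF/WAVE header; `readWavInfo` is a hypothetical helper, not the script's actual code:

```javascript
// Minimal sketch: verify a buffer starts with a RIFF/WAVE header and read
// basic format info. Assumes the canonical 44-byte header layout where the
// fmt chunk immediately follows the RIFF header.
function readWavInfo(buf) {
  if (buf.length < 44) throw new Error('Too short to be a WAV file');
  const riff = buf.toString('ascii', 0, 4);
  const wave = buf.toString('ascii', 8, 12);
  if (riff !== 'RIFF' || wave !== 'WAVE') throw new Error('Not a WAV file');
  return {
    numChannels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}
```

Real-world WAV files can carry extra chunks before `fmt `, so a robust parser would walk the chunk list instead of relying on fixed offsets.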
| Option | Description |
|---|---|
| `--model <id-or-path>` | Model ID or local model path |
| `--audio <wav-path>` | WAV file path |
| `--encoder-device <webgpu\|cpu>` | Encoder device (cpu is safer for Node) |
| `--encoder-dtype`, `--decoder-dtype` | Examples: fp16, int8, fp32 |
| `--timestamps` | Request word-level timestamps |
| `--loop <n>` | Repeat transcription n times |
| `--npm` | Use @huggingface/transformers from node_modules |
| `--local-module <path>` | Path to a local transformers node build |
Sample audio file used by the UI: public/assets/Harvard-L2-1.ogg.
- Model configuration supports load mode, model ID, device, dtype, and WASM thread tuning.
- Transcription options include explicit inference mode selection, direct Nemo API flags, pipeline timestamp settings, and audio prep controls.
- Metrics are split by mode: pipeline shows wall-clock run timing plus audio prep, while direct mode shows audio prep plus direct model internals.
- The transcribe workspace is organized into three columns with transcript and API contract visible at the same time.
- Settings and theme preferences are persisted in `localStorage`.
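Persisting preferences in `localStorage` can be sketched as below. The key name and helper functions are hypothetical, not the app's actual code; the functions take the store as a parameter so the sketch also runs outside the browser:

```javascript
// Illustrative sketch of settings persistence against a localStorage-like
// store (an object exposing getItem/setItem).
const PREFS_KEY = 'tdt-demo-prefs'; // hypothetical key name

function savePrefs(store, prefs) {
  store.setItem(PREFS_KEY, JSON.stringify(prefs));
}

function loadPrefs(store, defaults) {
  const raw = store.getItem(PREFS_KEY);
  if (raw == null) return { ...defaults };
  try {
    // Merge over defaults so settings added in later versions still
    // receive their default values.
    return { ...defaults, ...JSON.parse(raw) };
  } catch {
    return { ...defaults }; // corrupted entry: fall back to defaults
  }
}
```

In the browser, `window.localStorage` would be passed as `store`; merging over defaults keeps saved settings forward-compatible as options are added.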
See the repository license.