feat(misc): voice-chat-widget — raw-WS + Vercel SDK versions #53

Merged: abhishekmishragithub merged 1 commit into main from misc/voice-chat-widget on May 29, 2026

Conversation

abhishekmishragithub (Collaborator) commented on May 29, 2026

Summary

Two sibling cookbook examples under misc/ that show the same live voice-chat UX two ways. Same UI, same UX, same SMALLEST_API_KEY for all three services (Pulse STT, Electron LLM, Lightning v3.1 TTS) — different data-plumbing layer.

| Folder | Data layer |
| --- | --- |
| misc/voice-chat-widget/ | Raw WebSocket + hand-rolled SSE proxy. Useful for seeing what's happening under the hood. |
| misc/voice-chat-widget-with-vercel-sdk/ | smallestai-vercel-provider + ai + @ai-sdk/openai-compatible. STT goes browser-direct, no STT proxy. Cleanest path for teams already on the Vercel AI SDK. |

Each folder has its own README. The raw-WS folder has the ITN deep-dive (itn_normalize, finalize_on_words=false, eou_timeout_ms, close_stream — the agentic pattern from Smallest's docs). The Vercel-SDK folder has a side-by-side param mapping (snake_case strings ↔ camelCase booleans) so anyone porting between the two only has to do mechanical renames.
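The mechanical rename mentioned above can be sketched as a small helper. This is a hypothetical illustration, not code from either folder or from the SDK: it converts camelCase option names with boolean or numeric values into the snake_case string pairs used in the raw-WS query string. The camelCase names in the usage note (`itnNormalize`, `finalizeOnWords`, `eouTimeoutMs`) are assumptions derived by applying the naming rule to the raw parameter names; the SDK's actual option names should be checked against the folder's README.

```typescript
type OptionValue = string | number | boolean;

// Convert one camelCase key to snake_case, e.g. "eouTimeoutMs" -> "eou_timeout_ms".
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Map camelCase options (booleans, numbers, strings) to snake_case string pairs.
function toRawQueryParams(options: Record<string, OptionValue>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(options)) {
    out[toSnakeCase(key)] = String(value);
  }
  return out;
}

// Serialize the pairs into a URL query string, preserving insertion order.
function toQueryString(params: Record<string, string>): string {
  return Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}
```

For example, `toQueryString(toRawQueryParams({ itnNormalize: true, finalizeOnWords: false, eouTimeoutMs: 1000 }))` yields `itn_normalize=true&finalize_on_words=false&eou_timeout_ms=1000`.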

What's in each folder

  • app/page.tsx — chat UI with push-to-talk mic + live partials in the input + sentence-boundary flush from LLM stream → TTS
  • lib/usePulseSTT.ts — STT hook (raw WS / SDK)
  • lib/useLightningTTS.ts — Lightning TTS streaming over WS (same in both — SDK doesn't yet wrap streaming TTS)
  • app/api/chat/route.ts — LLM proxy (hand-rolled SSE / Vercel streamText)
  • proxy.mjs — WebSocket bridge (/stt+/tts in raw-WS, /tts-only in SDK version)
  • README.md
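The sentence-boundary flush from LLM stream to TTS listed for app/page.tsx can be sketched roughly as follows. This is a minimal illustration under assumptions, not the SentenceFlusher from the PR: the class name `SentenceBuffer`, the boundary rule (`.`, `!` or `?` followed by whitespace) and the callback shape are all hypothetical.

```typescript
// Buffers streamed LLM tokens and emits complete sentences as soon as a
// boundary is seen, so TTS can start before the full reply has arrived.
class SentenceBuffer {
  private pending = "";
  private readonly onSentence: (sentence: string) => void;

  constructor(onSentence: (sentence: string) => void) {
    this.onSentence = onSentence;
  }

  // Append one streamed token and emit every sentence that is now complete.
  push(token: string): void {
    this.pending += token;
    // Assumed boundary rule: terminal punctuation followed by whitespace.
    const boundary = /[.!?]\s+/g;
    let consumed = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + 1; // keep the punctuation mark
      const sentence = this.pending.slice(consumed, end).trim();
      if (sentence) this.onSentence(sentence);
      consumed = match.index + match[0].length;
    }
    this.pending = this.pending.slice(consumed);
  }

  // Emit whatever is left once the LLM stream has ended.
  flush(): void {
    const rest = this.pending.trim();
    if (rest) this.onSentence(rest);
    this.pending = "";
  }
}
```

Because the rule requires whitespace after the punctuation, a decimal such as `3.14` is not split, but abbreviations such as `Dr. Smith` would be; a real implementation may need a smarter rule.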

Test plan

  • npm install && npm run dev boots Next + proxy cleanly in both folders
  • Hold the mic, say "the total is five hundred and twenty five dollars" → bubble shows $525
  • Hold the mic, say a phone number → bubble shows hyphenated 910-555-1234
  • Bot reply streams text into the bubble and audio plays in parallel
  • ▶ replay re-streams TTS for any past message
  • Vercel-SDK folder: SDK emits its security warning about auth: 'query' in the proxy logs — that's by design

Note for reviewers

  • tsconfig.tsbuildinfo (TS incremental cache) is now in .gitignore for both folders; was accidentally committed once and removed.
  • .env.local never committed; .env.example is the only env file in either folder.

End-to-end Next.js example combining all three Smallest products in
parallel: Pulse STT (live transcription with ITN), Electron LLM (OpenAI-
compatible chat-completions streaming), and Lightning v3.1 TTS (streaming
audio over WebSocket). One SMALLEST_API_KEY powers everything.

The README is a deep-dive on the agentic ITN config — finalize_on_words,
eou_timeout_ms, close_stream — that customers most commonly get wrong, with
worked examples (currency, phone numbers, dates, emails, decimals).

vercel (bot) commented on May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| smallest-showcase | Ready | Preview, Comment | May 29, 2026 10:00am |

@abhishekmishragithub abhishekmishragithub merged commit 4c2a5ef into main May 29, 2026
2 checks passed
@entelligence-ai-pr-reviews

EntelligenceAI PR Summary

Adds a complete misc/voice-chat-widget Next.js 14 example project integrating Smallest AI Pulse STT, Electron LLM, and Lightning TTS into a real-time voice chat UI.

  • proxy.mjs: Node.js WebSocket proxy injecting SMALLEST_API_KEY for STT and TTS endpoints, with bidirectional forwarding and RFC 6455 close code handling
  • app/api/chat/route.ts: Edge Runtime SSE proxy to Electron LLM, re-streaming tokens via ReadableStream
  • app/api/key/route.ts: Demo-only key exposure endpoint with production warning
  • lib/usePulseSTT.ts: React hook for real-time STT with AudioWorklet-based 48kHz→16kHz downsampling and ITN-safe teardown
  • lib/useLightningTTS.ts: React hook for TTS WebSocket management with PCM16 decoding, Web Audio scheduling, and word-level timestamp support
  • app/page.tsx: Full UI with 4-state status machine, SentenceFlusher TTS chunking, karaoke highlighting, push-to-talk controls, and echo mode
  • app/globals.css: Dark-themed CSS design system with animated status pill, waveform visualizer, and word-highlight styles
  • Project config files (.env.example, .gitignore, package.json, tsconfig.json, next.config.js, README.md) complete the standalone demo setup
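The audio conversions named in the summary (48 kHz to 16 kHz downsampling for STT, PCM16 decoding for TTS) can be sketched in isolation. This is a hedged illustration, not the AudioWorklet code from the PR: the function names are hypothetical, and averaging groups of samples is only one simple way to downsample by an integer factor.

```typescript
// Average each group of `factor` input samples into one output sample.
// Averaging acts as a crude low-pass filter; a production resampler would
// normally use a proper anti-aliasing filter instead.
function downsampleByAveraging(input: Float32Array, factor: number): Float32Array {
  const outLength = Math.floor(input.length / factor);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) sum += input[i * factor + j];
    out[i] = sum / factor;
  }
  return out;
}

// Convert float samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Convert signed 16-bit PCM back to float samples in [-1, 1) for Web Audio playback.
function pcm16ToFloat(input: Int16Array): Float32Array {
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) out[i] = input[i] / 0x8000;
  return out;
}
```

Going from 48 kHz to 16 kHz corresponds to `factor = 3`, so the microphone path would be `floatToPcm16(downsampleByAveraging(frame, 3))`.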

Confidence Score: 2/5 - Changes Needed

Not safe to merge — this PR introduces a real-time voice chat widget with solid architectural ambition, but ships with multiple Medium-severity correctness and race-condition bugs that would produce observable failures in normal usage. The unguarded req.json() in app/api/chat/route.ts will throw unhandled SyntaxError on any malformed request, useLightningTTS.ts fires onEnd on WebSocket close rather than audio playback completion causing the UI to show idle while audio is still playing, and usePulseSTT.ts has two compounding race conditions — a stale-state double-start guard that permits concurrent WebSocket and AudioContext creation, and a WebSocket that gets opened before getUserMedia() resolves leaving an orphaned connection on permission denial. These are not edge cases; they are triggered by normal user interactions like interrupting speech or denying microphone access.

Key Findings:

  • In useLightningTTS.ts, onEnd is called inside ws.onclose rather than after audio playback completes, meaning the parent component in app/page.tsx will transition back to idle state while audio is still being rendered — this directly breaks the intended UX state machine for the voice chat flow.
  • The speak() interruption path in useLightningTTS.ts closes the old WebSocket but does not null out or guard wsRef.current before the old onclose fires, so the stale onclose handler will corrupt state (call onEnd, clear refs) for the newly initiated utterance — a classic ref-capture race condition.
  • In usePulseSTT.ts, start() is async and the in-progress guard reads recording state synchronously; because setRecording(true) is deferred (React batching), rapid double-calls bypass the guard, creating duplicate WebSocket connections and AudioContext instances that are never cleaned up.
  • The WebSocket in usePulseSTT.ts is instantiated and stored in wsRef.current before the await getUserMedia() call resolves, meaning a microphone permission denial leaves an open, unreferenced WebSocket connection with no cleanup path — a resource leak that could also interfere with subsequent legitimate start() calls.

Files requiring special attention

  • misc/voice-chat-widget/lib/usePulseSTT.ts
  • misc/voice-chat-widget/lib/useLightningTTS.ts
  • misc/voice-chat-widget/app/api/chat/route.ts
  • misc/voice-chat-widget/app/page.tsx

}

export async function POST(req: Request) {
const { message } = await req.json();

MAJOR CORRECTNESS Unguarded req.json() throws on malformed or empty body

req.json() throws a SyntaxError when the body is empty or not valid JSON, and there is no try/catch, so the Edge runtime surfaces an unhandled rejection instead of a clean error response.

Suggested change
- const { message } = await req.json();
+ let message: string | undefined;
+ try {
+   ({ message } = await req.json());
+ } catch {
+   return new Response("Invalid JSON body", { status: 400 });
+ }
+ if (!message) {
+   return new Response("Missing 'message' field", { status: 400 });
+ }
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In misc/voice-chat-widget/app/api/chat/route.ts, line 62, `const { message } = await req.json();` is not wrapped in a try/catch. Replace it with a try/catch block that returns a 400 response on parse failure, and add a check that `message` is a non-empty string before proceeding. Example:

  let message: string | undefined;
  try {
    ({ message } = await req.json());
  } catch {
    return new Response("Invalid JSON body", { status: 400 });
  }
  if (!message) {
    return new Response("Missing 'message' field", { status: 400 });
  }

Insert this before the existing `const key = process.env.SMALLEST_API_KEY;` check.
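The same validation can also be expressed as a pure function, which makes the two failure modes easy to unit-test without constructing a `Request`. This is a sketch under assumptions: `parseChatBody` and `ParseResult` are hypothetical names, and the route would pass in the raw body text (for example from `await req.text()`) and turn a failed result into a 400 response.

```typescript
type ParseResult =
  | { ok: true; message: string }
  | { ok: false; status: number; error: string };

// Parse the raw request body and validate the `message` field.
function parseChatBody(rawBody: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody); // throws SyntaxError on empty or malformed input
  } catch {
    return { ok: false, status: 400, error: "Invalid JSON body" };
  }
  // Valid JSON can still be null, a number, or an object without `message`.
  const message = (parsed as { message?: unknown } | null)?.message;
  if (typeof message !== "string" || message.trim() === "") {
    return { ok: false, status: 400, error: "Missing 'message' field" };
  }
  return { ok: true, message };
}
```

Checking `typeof message !== "string"` is slightly stricter than `!message`, since it also rejects bodies such as `{"message": 42}`.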

Comment on lines +124 to +130
onEnd: () => {
  setStatus("idle");
  setMessages((prev) =>
    prev.map((m) =>
      m.id !== messageId
        ? m
        : { ...m, words: m.words.map((w) => ({ ...w, spoken: true, current: false })) }

MAJOR CORRECTNESS onEnd fires on WS close, not audio complete — status goes idle while audio plays

useLightningTTS.ts fires onEnd inside ws.onclose, which triggers as soon as the WebSocket closes (all data received), not when the AudioContext finishes playing the buffered PCM. page.tsx calls setStatus('idle') in onEnd, so the status pill returns to 'idle' and words are marked 'spoken' while the AudioContext is still playing the last several seconds of scheduled audio.

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `useLightningTTS.ts`, instead of calling `onEnd?.()` directly in `ws.onclose`, schedule it on the AudioContext timeline: after the final chunk is received and the WS is about to close, compute `delay = Math.max(0, nextStartRef.current - ctx.currentTime)` and call `setTimeout(onEnd, delay * 1000)`. This defers `onEnd` until the AudioContext has actually finished playing all buffered audio, so `page.tsx`'s `setStatus('idle')` and 'spoken' word marking align with real playback completion.
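The delay computation in that prompt is small enough to isolate. The helper below is a sketch with a hypothetical name; `nextStartSec` stands for the value the hook keeps in `nextStartRef.current` (the AudioContext time at which the last scheduled chunk ends) and `currentTimeSec` for `ctx.currentTime`.

```typescript
// Milliseconds of already-scheduled audio that still has to play.
// Returns 0 when playback has already caught up with the schedule.
function msUntilPlaybackEnds(nextStartSec: number, currentTimeSec: number): number {
  return Math.max(0, nextStartSec - currentTimeSec) * 1000;
}
```

In the hook, the close handler would then call something like `setTimeout(() => onEnd?.(), msUntilPlaybackEnds(nextStartRef.current, ctx.currentTime))`. Timers can drift relative to the audio clock, so listening for the `ended` event on the last scheduled source node may be a more precise alternative.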

Comment on lines +145 to +149
ws.onclose = () => {
  onEnd?.();
  setSpeaking(false);
  wsRef.current = null;
};

MAJOR RACE Old WS onclose corrupts new utterance's state when speak() interrupts in-flight TTS

When speak() is called while a WS is active, the old socket is closed and wsRef.current is immediately overwritten with the new WS (line 96). When the old socket's onclose fires asynchronously, it sets setSpeaking(false) and wsRef.current = null, clobbering the new utterance's state. The replay button in page.tsx:368 is a direct trigger: clicking it while audio plays leaves speaking=false and a null wsRef, so a subsequent stop() call silently does nothing.

Suggested change
- ws.onclose = () => {
-   onEnd?.();
-   setSpeaking(false);
-   wsRef.current = null;
- };
+ ws.onclose = () => {
+   if (wsRef.current === ws) {
+     onEnd?.();
+     setSpeaking(false);
+     wsRef.current = null;
+   }
+ };
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/useLightningTTS.ts`, lines 145-149, the `ws.onclose` handler unconditionally calls `setSpeaking(false)` and sets `wsRef.current = null`. This is wrong when `speak()` is called while a previous WebSocket is still open: the old socket's `onclose` fires after `wsRef.current` has already been replaced with the new WebSocket, so it clobbers the new utterance's state.

Fix: guard the handler so it only updates state if `wsRef.current` still refers to THIS socket:

```ts
ws.onclose = () => {
  if (wsRef.current === ws) {
    onEnd?.();
    setSpeaking(false);
    wsRef.current = null;
  }
};
```

This ensures that when `speak()` interrupts an in-flight TTS (e.g., replay button clicked during active streaming), the stale `onclose` from the old socket does not flip `speaking` back to false or null out the new active WebSocket reference.

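The interruption scenario can be reproduced without a browser by standing in a plain object for the WebSocket. Everything below is a hypothetical model (`FakeSocket`, `createSpeaker`), intended only to show why the identity check matters when a second `speak()` replaces the first socket before its `onclose` runs.

```typescript
// Stand-in for a WebSocket: only the onclose hook matters for this race.
interface FakeSocket {
  id: number;
  onclose: (() => void) | null;
}

function createSpeaker() {
  const wsRef: { current: FakeSocket | null } = { current: null };
  let speaking = false;
  let nextId = 1;

  function speak(): FakeSocket {
    const ws: FakeSocket = { id: nextId++, onclose: null };
    ws.onclose = () => {
      // A stale socket must not reset state owned by a newer utterance.
      if (wsRef.current !== ws) return;
      speaking = false;
      wsRef.current = null;
    };
    wsRef.current = ws; // replaces any previous socket
    speaking = true;
    return ws;
  }

  return { speak, wsRef, isSpeaking: () => speaking };
}
```

Calling `speak()` twice and then firing the first socket's `onclose` leaves `isSpeaking()` true and `wsRef.current` pointing at the second socket; only the second socket's own `onclose` resets the state.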
}, []);

const start = useCallback(async () => {
if (recording) return;

MAJOR RACE Stale-state double-start guard allows concurrent WS and AudioContext creation

start() is async and setRecording(true) is only reached at line 158 after mic acquisition and worklet setup. Two rapid calls both see recording=false, pass the guard, open two WebSockets and two AudioContexts; the second overwrites wsRef.current at line 87 without closing the first, permanently orphaning it.

Suggested change
- if (recording) return;
+ const startingRef = useRef(false);
+ const start = useCallback(async () => {
+   if (recording || startingRef.current) return;
+   startingRef.current = true;
+   try {
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/usePulseSTT.ts`, the `start()` function at line 57 guards double-invocation with `if (recording) return`, but `recording` is React state that is only set to `true` at line 158 after several async operations. Two rapid calls both pass the guard and create duplicate WebSockets and AudioContexts; only the second's WebSocket survives in `wsRef.current`, leaking the first. Fix: add a `useRef<boolean>(false)` ref (e.g., `startingRef`) that is set to `true` immediately on entry and back to `false` after setup completes (or on error). Change the guard to `if (recording || startingRef.current) return;` and wrap the async body in try/finally to reset `startingRef.current = false`.
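Outside React, the same idea reduces to a synchronous in-flight flag around an async setup function. The sketch below is a simplified model with a hypothetical name (`createGuardedStart`); in the hook, the flag would live in a `useRef` and the guard would additionally check `recording`.

```typescript
// Wrap an async setup so overlapping calls are rejected while one is in flight.
// Resolves to true if this call ran the setup, false if it was skipped.
function createGuardedStart(setup: () => Promise<void>): () => Promise<boolean> {
  let starting = false; // set synchronously, before the first await
  return async () => {
    if (starting) return false;
    starting = true;
    try {
      await setup();
      return true;
    } finally {
      starting = false; // reset on success and on error
    }
  };
}
```

The flag works where React state does not because it is written synchronously on entry, so a second call made before the first `await` resolves already sees it set.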

Comment on lines +85 to +87
const ws = new WebSocket(url);
ws.binaryType = "arraybuffer";
wsRef.current = ws;

MAJOR RELIABILITY WebSocket opened before getUserMedia — orphaned on permission denial

The WebSocket is created and stored in wsRef.current before getUserMedia() is awaited. If getUserMedia() throws (permission denied), wsRef.current holds an open socket. The next start() call overwrites wsRef.current at the same line without closing the previous socket, permanently leaking the upstream connection.

Suggested change
- const ws = new WebSocket(url);
- ws.binaryType = "arraybuffer";
- wsRef.current = ws;
+ // Get mic first — if permission is denied, no WS is opened.
+ const stream = await navigator.mediaDevices.getUserMedia({
+   audio: { echoCancellation: true, noiseSuppression: true, channelCount: 1 },
+ });
+ streamRef.current = stream;
+ const base = proxyUrl || `ws://${location.hostname}:3031/stt`;
+ const qs = new URLSearchParams({
+   language,
+   encoding: "linear16",
+   sample_rate: "16000",
+   itn_normalize: "true",
+   finalize_on_words: "false",
+   eou_timeout_ms: "1000",
+ });
+ const url = `${base}?${qs}`;
+ const ws = new WebSocket(url);
+ ws.binaryType = "arraybuffer";
+ wsRef.current = ws;
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/usePulseSTT.ts`, the `start()` function opens a WebSocket at line 85 (`const ws = new WebSocket(url)`) before awaiting `navigator.mediaDevices.getUserMedia()` at line 113. If `getUserMedia()` throws, the open WebSocket is stored in `wsRef.current` and leaked when `start()` is retried (the retry overwrites `wsRef.current` without closing the old socket). Fix: move the `getUserMedia()` call and the stream/AudioContext setup (lines 113–119) to come BEFORE the WebSocket construction (lines 84–87), so that if mic permission is denied, no WebSocket is ever opened.
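The ordering argument can be checked with injected dependencies instead of real browser APIs. In this sketch, `startCapture`, `acquireMic` and `openSocket` are hypothetical names standing in for the hook's `start()`, `navigator.mediaDevices.getUserMedia()` and `new WebSocket(url)`.

```typescript
interface CaptureDeps<Stream, Socket> {
  acquireMic: () => Promise<Stream>; // rejects when permission is denied
  openSocket: () => Socket;
}

// Acquire the microphone first; the socket is only opened once that succeeds,
// so a permission denial cannot leave an orphaned connection behind.
async function startCapture<Stream, Socket>(
  deps: CaptureDeps<Stream, Socket>
): Promise<{ stream: Stream; socket: Socket }> {
  const stream = await deps.acquireMic();
  const socket = deps.openSocket();
  return { stream, socket };
}
```

If `acquireMic` rejects, the rejection propagates to the caller and `openSocket` is never invoked. Setup steps that can fail after the socket is open would still need their own cleanup, for example closing the socket in a `catch` block.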

@abhishekmishragithub changed the title from "feat(misc): voice-chat-widget — live STT + Electron + Lightning TTS" to "feat(misc): voice-chat-widget — raw-WS + Vercel SDK versions" on May 29, 2026

abhishekmishragithub added a commit that referenced this pull request on May 29, 2026:
- New folder misc/voice-chat-widget-with-vercel-sdk/ — same UX as the raw-WS
  sibling shipped in #53, rebuilt on smallestai-vercel-provider + ai +
  @ai-sdk/openai-compatible. STT goes browser-direct via auth: 'query',
  LLM via streamText, streaming TTS still raw-WS (SDK doesn't wrap it yet).
- Raw-WS README: appended ITN gotcha #8 — spoken 'and' inside dollar
  amounts ('five hundred and twenty five dollars') breaks the cardinal
  entity and produces '500 and 25 dollars', not '$525'. Workaround:
  drop the 'and'.
- Both .gitignore files: exclude *.tsbuildinfo.