feat(misc): voice-chat-widget — raw-WS + Vercel SDK versions by abhishekmishragithub · Pull Request #53 · smallest-inc/cookbook

abhishekmishragithub · 2026-05-29T10:00:21Z

Summary

Two sibling cookbook examples under misc/ that show the same live voice-chat UX two ways. Same UI, same UX, same SMALLEST_API_KEY for all three services (Pulse STT, Electron LLM, Lightning v3.1 TTS) — different data-plumbing layer.

| Folder | Data layer |
|---|---|
| `misc/voice-chat-widget/` | Raw WebSocket + hand-rolled SSE proxy. Useful for seeing what's happening under the hood. |
| `misc/voice-chat-widget-with-vercel-sdk/` | `smallestai-vercel-provider` + `ai` + `@ai-sdk/openai-compatible`. STT connects directly from the browser (no STT proxy). Cleanest path for teams already on the Vercel AI SDK. |

Each folder has its own README. The raw-WS folder has the ITN deep-dive (itn_normalize, finalize_on_words=false, eou_timeout_ms, close_stream — the agentic pattern from Smallest's docs). The Vercel-SDK folder has a side-by-side param mapping (snake_case strings ↔ camelCase booleans) so anyone porting between the two only has to do mechanical renames.
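For anyone porting between the two folders, here is a minimal sketch of how the raw-WS version could assemble its STT URL with the snake_case ITN parameters. The parameter names and values mirror the `usePulseSTT.ts` snippet quoted in the review comments on this PR. `buildSttUrl` and `SttOptions` are hypothetical names. `close_stream` is left out because its wire format isn't shown here.

```typescript
// Hypothetical helper: assembles the Pulse STT WebSocket URL used by the raw-WS hook.
// Query parameter names/values are copied from the usePulseSTT.ts snippet in this PR.
interface SttOptions {
  base: string;          // e.g. the local proxy, "ws://localhost:3031/stt"
  language: string;      // e.g. "en"
  eouTimeoutMs?: number; // sent as eou_timeout_ms; this example uses 1000
}

function buildSttUrl({ base, language, eouTimeoutMs = 1000 }: SttOptions): string {
  const qs = new URLSearchParams({
    language,
    encoding: "linear16",       // 16-bit PCM frames
    sample_rate: "16000",       // mic audio is downsampled to 16 kHz before sending
    itn_normalize: "true",      // inverse text normalization, e.g. spoken amounts -> "$525"
    finalize_on_words: "false", // value used by this example (see the README's ITN section)
    eou_timeout_ms: String(eouTimeoutMs),
  });
  return `${base}?${qs.toString()}`;
}
```

The Vercel-SDK folder expresses the same settings as camelCase booleans. Its README has the exact mapping, so the SDK option names are intentionally not guessed here.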

What's in each folder

app/page.tsx — chat UI with push-to-talk mic + live partials in the input + sentence-boundary flush from LLM stream → TTS
lib/usePulseSTT.ts — STT hook (raw WS / SDK)
lib/useLightningTTS.ts — Lightning TTS streaming over WS (same in both — SDK doesn't yet wrap streaming TTS)
app/api/chat/route.ts — LLM proxy (hand-rolled SSE / Vercel streamText)
proxy.mjs — WebSocket bridge (/stt+/tts in raw-WS, /tts-only in SDK version)
README.md
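The sentence-boundary flush in `app/page.tsx` might look roughly like the sketch below. This is an assumption about the approach, not the example's actual `SentenceFlusher`. Streamed LLM tokens are buffered, and each sentence is handed to TTS once it ends in `.`, `!` or `?` followed by whitespace. That way audio can start before the full reply has arrived.

```typescript
// Hypothetical sentence-boundary flusher: buffers streamed tokens and emits
// complete sentences (terminal punctuation followed by whitespace) to a callback.
class SentenceFlusher {
  private buffer = "";
  private readonly onSentence: (sentence: string) => void;

  constructor(onSentence: (sentence: string) => void) {
    this.onSentence = onSentence;
  }

  // Append one streamed token and emit every sentence it completes.
  push(token: string): void {
    this.buffer += token;
    const boundary = /[.!?]+\s+/g;
    let lastEnd = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(lastEnd, end).trim();
      if (sentence) this.onSentence(sentence);
      lastEnd = end;
    }
    // Keep only the unfinished tail for the next token.
    this.buffer = this.buffer.slice(lastEnd);
  }

  // Emit whatever is left when the LLM stream ends (it may lack final punctuation).
  flush(): void {
    const rest = this.buffer.trim();
    if (rest) this.onSentence(rest);
    this.buffer = "";
  }
}
```

A regex boundary like this also splits on abbreviations such as "Dr. Smith". A production flusher would need a smarter rule. Also note that a sentence is only released once the whitespace after its punctuation arrives.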

Test plan

npm install && npm run dev boots Next + proxy cleanly in both folders
Hold the mic, say "the total is five hundred twenty five dollars" (no "and"; see ITN gotcha #8) → bubble shows $525
Hold the mic, say a phone number → bubble shows hyphenated 910-555-1234
Bot reply streams text into the bubble and audio plays in parallel
▶ replay re-streams TTS for any past message
Vercel-SDK folder: SDK emits its security warning about auth: 'query' in the proxy logs — that's by design

Note for reviewers

tsconfig.tsbuildinfo (TS incremental cache) is now in .gitignore for both folders; was accidentally committed once and removed.
.env.local never committed; .env.example is the only env file in either folder.

End-to-end Next.js example combining all three Smallest products in parallel: Pulse STT (live transcription with ITN), Electron LLM (OpenAI-compatible chat-completions streaming), and Lightning v3.1 TTS (streaming audio over WebSocket). One SMALLEST_API_KEY powers everything. The README is a deep-dive on the agentic ITN config (finalize_on_words, eou_timeout_ms, close_stream) that customers most commonly get wrong, with worked examples (currency, phone numbers, dates, emails, decimals).
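Because Electron LLM is described as OpenAI-compatible, a hand-rolled SSE proxy has to pull text deltas out of `data:` lines. The sketch below assumes the usual OpenAI chat-completions stream shape: `choices[0].delta.content`, terminated by `data: [DONE]`. `ChatCompletionSseParser` is a hypothetical name. Partial lines are carried over between network chunks, because a JSON payload can be split across reads.

```typescript
// Hypothetical incremental parser for an OpenAI-style chat-completions SSE stream.
class ChatCompletionSseParser {
  private pending = "";
  done = false;

  // Feed one decoded text chunk; returns the content deltas it completed.
  feed(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? ""; // trailing partial line waits for the next chunk
    const deltas: string[] = [];
    for (const raw of lines) {
      const line = raw.trim(); // also strips a trailing "\r"
      if (!line.startsWith("data:")) continue; // skip blank lines, comments, event: lines
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        break;
      }
      try {
        const json = JSON.parse(payload);
        const content = json?.choices?.[0]?.delta?.content;
        if (typeof content === "string" && content.length > 0) deltas.push(content);
      } catch {
        // Ignore a malformed line rather than aborting the whole stream.
      }
    }
    return deltas;
  }
}
```

In a proxy, each returned delta could be re-emitted to the browser (or pushed into a sentence flusher) as it arrives.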

vercel · 2026-05-29T10:00:27Z

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| smallest-showcase | Ready | Preview, Comment | May 29, 2026 10:00am |

entelligence-ai-pr-reviews · 2026-05-29T10:13:34Z

EntelligenceAI PR Summary

Adds a complete misc/voice-chat-widget Next.js 14 example project integrating Smallest AI Pulse STT, Electron LLM, and Lightning TTS into a real-time voice chat UI.

proxy.mjs: Node.js WebSocket proxy injecting SMALLEST_API_KEY for STT and TTS endpoints, with bidirectional forwarding and RFC 6455 close code handling
app/api/chat/route.ts: Edge Runtime SSE proxy to Electron LLM, re-streaming tokens via ReadableStream
app/api/key/route.ts: Demo-only key exposure endpoint with production warning
lib/usePulseSTT.ts: React hook for real-time STT with AudioWorklet-based 48kHz→16kHz downsampling and ITN-safe teardown
lib/useLightningTTS.ts: React hook for TTS WebSocket management with PCM16 decoding, Web Audio scheduling, and word-level timestamp support
app/page.tsx: Full UI with 4-state status machine, SentenceFlusher TTS chunking, karaoke highlighting, push-to-talk controls, and echo mode
app/globals.css: Dark-themed CSS design system with animated status pill, waveform visualizer, and word-highlight styles
Project config files (.env.example, .gitignore, package.json, tsconfig.json, next.config.js, README.md) complete the standalone demo setup
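The summary mentions 48kHz→16kHz downsampling on the STT side and PCM16 decoding on the TTS side. Below is a minimal sketch of both conversions. It assumes an integer rate ratio and little-endian 16-bit samples (the `linear16` encoding the STT URL requests). Averaging each group of three samples is a crude low-pass filter, not necessarily what the example's AudioWorklet does, and the function names are hypothetical.

```typescript
// Decimate Float32 mic audio to 16 kHz by averaging each group of `ratio` samples,
// then encode as little-endian PCM16 (linear16).
function downsampleTo16kPcm16(input: Float32Array, inputRate = 48000): ArrayBuffer {
  const ratio = inputRate / 16000;
  if (!Number.isInteger(ratio)) throw new Error("sketch only handles integer ratios");
  const outLength = Math.floor(input.length / ratio); // leftover samples are dropped
  const out = new ArrayBuffer(outLength * 2);
  const view = new DataView(out);
  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    for (let j = 0; j < ratio; j++) sum += input[i * ratio + j];
    const s = Math.max(-1, Math.min(1, sum / ratio));           // clamp to [-1, 1]
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return out;
}

// Inverse, as used on the TTS side: PCM16 little-endian bytes -> Float32 samples
// suitable for copying into an AudioBuffer channel.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff;
  }
  return out;
}
```

Inside an AudioWorklet, 128-frame render quanta don't divide evenly by 3. A real implementation would therefore carry the leftover samples into the next block instead of dropping them.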

Confidence Score: 2/5 - Changes Needed

Not safe to merge. This PR introduces a real-time voice chat widget with solid architectural ambition, but it ships with multiple Medium-severity correctness and race-condition bugs that would produce observable failures in normal usage. The unguarded req.json() in app/api/chat/route.ts throws an unhandled SyntaxError on any malformed request body. useLightningTTS.ts fires onEnd on WebSocket close rather than on playback completion, so the UI shows idle while audio is still playing. usePulseSTT.ts has two compounding race conditions: a stale-state double-start guard that permits concurrent WebSocket and AudioContext creation, and a WebSocket that is opened before getUserMedia() resolves, leaving an orphaned connection on permission denial. These are not edge cases; ordinary interactions such as interrupting speech or denying microphone access trigger them.

Key Findings:

In useLightningTTS.ts, onEnd is called inside ws.onclose rather than after audio playback completes, meaning the parent component in app/page.tsx will transition back to idle state while audio is still being rendered — this directly breaks the intended UX state machine for the voice chat flow.
The speak() interruption path in useLightningTTS.ts closes the old WebSocket but does not null out or guard wsRef.current before the old onclose fires, so the stale onclose handler will corrupt state (call onEnd, clear refs) for the newly initiated utterance — a classic ref-capture race condition.
In usePulseSTT.ts, start() is async and the in-progress guard reads recording state synchronously; because setRecording(true) is deferred (React batching), rapid double-calls bypass the guard, creating duplicate WebSocket connections and AudioContext instances that are never cleaned up.
The WebSocket in usePulseSTT.ts is instantiated and stored in wsRef.current before the await getUserMedia() call resolves, meaning a microphone permission denial leaves an open, unreferenced WebSocket connection with no cleanup path — a resource leak that could also interfere with subsequent legitimate start() calls.

Files requiring special attention

misc/voice-chat-widget/lib/usePulseSTT.ts
misc/voice-chat-widget/lib/useLightningTTS.ts
misc/voice-chat-widget/app/api/chat/route.ts
misc/voice-chat-widget/app/page.tsx

entelligence-ai-pr-reviews · 2026-05-29T10:13:36Z

```diff
+}
+
+export async function POST(req: Request) {
+  const { message } = await req.json();
```


Unguarded req.json() throws on malformed or empty body

req.json() throws a SyntaxError when the body is empty or not valid JSON, and there is no try/catch, so the Edge runtime surfaces an unhandled rejection instead of a clean error response.

Suggested change

```diff
-  const { message } = await req.json();
+  let message: string | undefined;
+  try {
+    ({ message } = await req.json());
+  } catch {
+    return new Response("Invalid JSON body", { status: 400 });
+  }
+  if (!message) {
+    return new Response("Missing 'message' field", { status: 400 });
+  }
```

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In misc/voice-chat-widget/app/api/chat/route.ts, line 62, `const { message } = await req.json();` is not wrapped in a try/catch. Replace it with a try/catch block that returns a 400 response on parse failure, and add a check that `message` is a non-empty string before proceeding. Example: let message: string | undefined; try { ({ message } = await req.json()); } catch { return new Response("Invalid JSON body", { status: 400 }); } if (!message) { return new Response("Missing 'message' field", { status: 400 }); } Insert this before the existing `const key = process.env.SMALLEST_API_KEY;` check.
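The suggestion above only checks truthiness, so a body like `{"message": 42}` would still get through. The sketch below also enforces the "non-empty string" requirement from the prompt. Parsing is split into a pure function so it can be tested without a server. `parseChatBody` and `ChatBody` are hypothetical names.

```typescript
// Result of validating the chat request body: either a usable message or a 400.
type ChatBody =
  | { ok: true; message: string }
  | { ok: false; status: 400; error: string };

// Hypothetical helper: validate the raw request text before calling the LLM.
function parseChatBody(text: string): ChatBody {
  let body: unknown;
  try {
    body = JSON.parse(text); // throws on empty or malformed input
  } catch {
    return { ok: false, status: 400, error: "Invalid JSON body" };
  }
  const message =
    typeof body === "object" && body !== null
      ? (body as { message?: unknown }).message
      : undefined;
  // Unlike a bare `if (!message)`, this also rejects non-strings such as 42.
  if (typeof message !== "string" || message.trim() === "") {
    return { ok: false, status: 400, error: "Missing 'message' field" };
  }
  return { ok: true, message };
}
```

In the route, pass `await req.text()` to `parseChatBody`. When `ok` is false, return `new Response(result.error, { status: result.status })` before touching `SMALLEST_API_KEY` or the upstream stream.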

entelligence-ai-pr-reviews · 2026-05-29T10:13:36Z

```diff
+        onEnd: () => {
+          setStatus("idle");
+          setMessages((prev) =>
+            prev.map((m) =>
+              m.id !== messageId
+                ? m
+                : { ...m, words: m.words.map((w) => ({ ...w, spoken: true, current: false })) }
```


onEnd fires on WS close, not audio complete — status goes idle while audio plays

useLightningTTS.ts fires onEnd inside ws.onclose, which triggers as soon as the WebSocket closes (all data received), not when the AudioContext finishes playing the buffered PCM. page.tsx calls setStatus('idle') in onEnd, so the status pill returns to 'idle' and words are marked 'spoken' while the AudioContext is still playing the last several seconds of scheduled audio.

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `useLightningTTS.ts`, instead of calling `onEnd?.()` directly in `ws.onclose`, schedule it on the AudioContext timeline: after the final chunk is received and the WS is about to close, compute `delay = Math.max(0, nextStartRef.current - ctx.currentTime)` and call `setTimeout(onEnd, delay * 1000)`. This defers `onEnd` until the AudioContext has actually finished playing all buffered audio, so `page.tsx`'s `setStatus('idle')` and 'spoken' word marking align with real playback completion.
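The deferral described above can be sketched with a small scheduler that tracks where the next chunk should start on the AudioContext clock. In this sketch, `now` stands in for `ctx.currentTime` (in seconds), and the returned start times would be passed to `AudioBufferSourceNode.start()`. `PlaybackClock` and `onSocketClosed` are hypothetical names, and the real hook's `nextStartRef` bookkeeping may differ.

```typescript
// Hypothetical gapless scheduler on an AudioContext-style clock (seconds).
class PlaybackClock {
  private nextStart = 0;

  // When should a chunk of `durationSec` start? Right after the previous chunk,
  // or immediately if playback has already drained (an underrun).
  schedule(durationSec: number, now: number): number {
    const startAt = Math.max(this.nextStart, now);
    this.nextStart = startAt + durationSec;
    return startAt;
  }

  // Milliseconds until everything scheduled so far has finished playing.
  remainingMs(now: number): number {
    return Math.max(0, this.nextStart - now) * 1000;
  }
}

// On socket close: defer onEnd until the scheduled audio has actually played out.
function onSocketClosed(
  clock: PlaybackClock,
  now: number,
  onEnd: () => void,
): ReturnType<typeof setTimeout> {
  return setTimeout(onEnd, clock.remainingMs(now));
}
```

If `speak()` can interrupt playback, the returned timer should also be cleared so that a stale `onEnd` doesn't fire for the new utterance.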

entelligence-ai-pr-reviews · 2026-05-29T10:13:36Z

```diff
+    ws.onclose = () => {
+      onEnd?.();
+      setSpeaking(false);
+      wsRef.current = null;
+    };
```


Old WS onclose corrupts new utterance's state when speak() interrupts in-flight TTS

When speak() is called while a WS is active, the old socket is closed and wsRef.current is immediately overwritten with the new WS (line 96). When the old socket's onclose fires asynchronously, it sets setSpeaking(false) and wsRef.current = null, clobbering the new utterance's state. The replay button in page.tsx:368 is a direct trigger: clicking it while audio plays leaves speaking=false and a null wsRef, so a subsequent stop() call silently does nothing.

Suggested change

```diff
     ws.onclose = () => {
-      onEnd?.();
-      setSpeaking(false);
-      wsRef.current = null;
+      if (wsRef.current === ws) {
+        onEnd?.();
+        setSpeaking(false);
+        wsRef.current = null;
+      }
     };
```

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/useLightningTTS.ts`, lines 145-149, the `ws.onclose` handler unconditionally calls `setSpeaking(false)` and sets `wsRef.current = null`. This is wrong when `speak()` is called while a previous WebSocket is still open: the old socket's `onclose` fires after `wsRef.current` has already been replaced with the new WebSocket, so it clobbers the new utterance's state. Fix: guard the handler so it only updates state if `wsRef.current` still refers to THIS socket: `ws.onclose = () => { if (wsRef.current === ws) { onEnd?.(); setSpeaking(false); wsRef.current = null; } };`

This ensures that when speak() interrupts an in-flight TTS (e.g., replay button clicked during active streaming), the stale onclose from the old socket does not flip speaking back to false or null out the new active WebSocket reference.
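The identity check can be demonstrated without a browser. In this sketch, `FakeSocket` stands in for `WebSocket`: its close event fires asynchronously via `queueMicrotask`, modeling how real close events arrive after `close()` returns. A plain `{ current }` object stands in for `useRef()`. All names are illustrative.

```typescript
// Stand-in for WebSocket: close() schedules the close event instead of firing it inline.
class FakeSocket {
  onclose: (() => void) | null = null;
  close(): void {
    queueMicrotask(() => this.onclose?.());
  }
}

// Stand-ins for useRef() and the `speaking` state.
const wsRef: { current: FakeSocket | null } = { current: null };
let speaking = false;

function speak(): FakeSocket {
  wsRef.current?.close(); // interrupt the previous utterance, if any
  const ws = new FakeSocket();
  ws.onclose = () => {
    if (wsRef.current !== ws) return; // stale socket: leave the new utterance alone
    speaking = false;
    wsRef.current = null;
  };
  wsRef.current = ws;
  speaking = true;
  return ws;
}
```

Removing the `wsRef.current !== ws` check reproduces the bug: the first socket's late `onclose` sets `speaking` back to false while the second utterance is active.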


entelligence-ai-pr-reviews · 2026-05-29T10:13:36Z

```diff
+  }, []);
+
+  const start = useCallback(async () => {
+    if (recording) return;
```


Stale-state double-start guard allows concurrent WS and AudioContext creation

start() is async and setRecording(true) is only reached at line 158 after mic acquisition and worklet setup. Two rapid calls both see recording=false, pass the guard, open two WebSockets and two AudioContexts; the second overwrites wsRef.current at line 87 without closing the first, permanently orphaning it.

Suggested change

```diff
-    if (recording) return;
+  const startingRef = useRef(false);
+  const start = useCallback(async () => {
+    if (recording || startingRef.current) return;
+    startingRef.current = true;
+    try {
```

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/usePulseSTT.ts`, the `start()` function at line 57 guards double-invocation with `if (recording) return`, but `recording` is React state that is only set to `true` at line 158 after several async operations. Two rapid calls both pass the guard and create duplicate WebSockets and AudioContexts; only the second's WebSocket survives in `wsRef.current`, leaking the first. Fix: add a `useRef<boolean>(false)` ref (e.g., `startingRef`) that is set to `true` immediately on entry and back to `false` after setup completes (or on error). Change the guard to `if (recording || startingRef.current) return;` and wrap the async body in try/finally to reset `startingRef.current = false`.
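The key point is that a ref is updated synchronously, while `setRecording(true)` only becomes visible to `start()` on a later render. Below is a minimal sketch of the same idea as a reusable wrapper, where a closure variable plays the role of `startingRef`. `guardAsync` is a hypothetical helper, not part of the example.

```typescript
// Wrap an async setup function so that overlapping calls are ignored while one
// is in flight. In the React hook, `inFlight` would live in useRef(false).
function guardAsync<T>(fn: () => Promise<T>): () => Promise<T | undefined> {
  let inFlight = false;
  return async () => {
    if (inFlight) return undefined; // a second call during setup is a no-op
    inFlight = true;                // set before the first await, so it is visible immediately
    try {
      return await fn();
    } finally {
      inFlight = false;             // cleared on success and on error (e.g. mic denied)
    }
  };
}
```

An `async` function runs synchronously up to its first `await`, so the flag is already set by the time a second click arrives. The `finally` also clears it after a permission denial, which lets the user retry.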

entelligence-ai-pr-reviews · 2026-05-29T10:13:36Z

```diff
+    const ws = new WebSocket(url);
+    ws.binaryType = "arraybuffer";
+    wsRef.current = ws;
```


WebSocket opened before getUserMedia — orphaned on permission denial

The WebSocket is created and stored in wsRef.current before getUserMedia() is awaited. If getUserMedia() throws (permission denied), wsRef.current holds an open socket. The next start() call overwrites wsRef.current at the same line without closing the previous socket, permanently leaking the upstream connection.

Suggested change

```diff
-    const ws = new WebSocket(url);
-    ws.binaryType = "arraybuffer";
-    wsRef.current = ws;
+    // Get mic first — if permission is denied, no WS is opened.
+    const stream = await navigator.mediaDevices.getUserMedia({
+      audio: { echoCancellation: true, noiseSuppression: true, channelCount: 1 },
+    });
+    streamRef.current = stream;
+    const base = proxyUrl || `ws://${location.hostname}:3031/stt`;
+    const qs = new URLSearchParams({
+      language,
+      encoding: "linear16",
+      sample_rate: "16000",
+      itn_normalize: "true",
+      finalize_on_words: "false",
+      eou_timeout_ms: "1000",
+    });
+    const url = `${base}?${qs}`;
+    const ws = new WebSocket(url);
+    ws.binaryType = "arraybuffer";
+    wsRef.current = ws;
```

Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `misc/voice-chat-widget/lib/usePulseSTT.ts`, the `start()` function opens a WebSocket at line 85 (`const ws = new WebSocket(url)`) before awaiting `navigator.mediaDevices.getUserMedia()` at line 113. If `getUserMedia()` throws, the open WebSocket is stored in `wsRef.current` and leaked when `start()` is retried (the retry overwrites `wsRef.current` without closing the old socket). Fix: move the `getUserMedia()` call and the stream/AudioContext setup (lines 113–119) to come BEFORE the WebSocket construction (lines 84–87), so that if mic permission is denied, no WebSocket is ever opened.
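Below is a sketch of the reordering with the browser APIs injected, so the ordering guarantee can be checked outside a browser. `MicStream` and `Socket` are minimal stand-ins; in the hook, `stop()` would mean stopping every track from `stream.getTracks()`. The extra `try`/`catch` goes slightly beyond what the review suggests: it releases the mic if the socket constructor throws, as `WebSocket` does on an invalid URL.

```typescript
// Minimal stand-ins for MediaStream and WebSocket.
interface MicStream { stop(): void }
interface Socket { close(): void }

// Hypothetical helper: acquire the mic first, and only then open the socket.
async function startSession(
  getMic: () => Promise<MicStream>,
  openSocket: () => Socket,
): Promise<{ mic: MicStream; socket: Socket }> {
  const mic = await getMic(); // rejects on permission denial, before any socket exists
  try {
    return { mic, socket: openSocket() };
  } catch (err) {
    mic.stop();               // don't leave the mic live if the socket can't be created
    throw err;
  }
}
```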

- New folder misc/voice-chat-widget-with-vercel-sdk/: same UX as the raw-WS sibling shipped in #53, rebuilt on smallestai-vercel-provider + ai + @ai-sdk/openai-compatible. STT goes browser-direct via auth: 'query', LLM via streamText, and streaming TTS is still raw-WS (the SDK doesn't wrap it yet).
- Raw-WS README: appended ITN gotcha #8. A spoken 'and' inside dollar amounts ('five hundred and twenty five dollars') breaks the cardinal entity and produces '500 and 25 dollars', not '$525'. Workaround: drop the 'and'.
- Both .gitignore files: exclude *.tsbuildinfo.

abhishekmishragithub merged commit 4c2a5ef into main May 29, 2026
2 checks passed

entelligence-ai-pr-reviews Bot reviewed May 29, 2026


abhishekmishragithub changed the title ~~feat(misc): voice-chat-widget — live STT + Electron + Lightning TTS~~ feat(misc): voice-chat-widget — raw-WS + Vercel SDK versions May 29, 2026

This was referenced May 29, 2026

feat(misc): voice-chat-widget-with-vercel-sdk + ITN "and" gotcha #54

Closed

feat(misc): voice-chat-widget-with-vercel-sdk + ITN "and" gotcha #55

Merged

abhishekmishragithub merged 1 commit into main from misc/voice-chat-widget
