
feat(stt): interactive VAD events demo on the feature page #272

Open
abhishekmishragithub wants to merge 4 commits into feat/vad-events-pulse-stt from feat/vad-events-interactive-demo

@abhishekmishragithub

Collaborator

Stacks on top of #271. Adds an interactive React component to the VAD events page that lets readers play sample audio and see the captured server message stream in sync.

What ships

| File | Purpose |
|---|---|
| `fern/components/vad-events-demo/VadEventsDemo.tsx` | React component. SVG waveform + native audio playback + scrolling event log. ~16 KB. |
| `fern/components/vad-events-demo/assets/{clean,multi-turn,no-tail}.mp3` | Three audio samples. ~210 KB total. |
| `fern/components/vad-events-demo/fixtures/{clean,multi-turn,no-tail}.json` | Captured live event streams + precomputed amplitude arrays. ~20 KB total. |
| `scripts/spec-live-tests/prep_vad_demo_fixtures.py` | Reproducer. Downloads source audio, builds three variants with ffmpeg, runs each through the live Pulse STT WebSocket, writes fixtures. |
| `fern/products/waves/.../vad-events.mdx` + mirror | Adds the import and a component block under a "Try it" section. |

Three fixtures, three behaviors

| Fixture | speech_started | speech_ended | Transcripts | Illustrates |
|---|---|---|---|---|
| clean | 1 | 1 | 7 | Normal single-utterance case. |
| multi-turn | 2 | 2 | 13 | Per-voiced-region behavior across two turns. |
| no-tail | 1 | 0 | 6 | The "speech_ended requires trailing silence" caveat from Notes. |

Every event payload in the fixtures comes directly from the production endpoint. The component does not connect to a WebSocket at runtime; everything is static.
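
As a rough illustration of how the per-fixture counts above could be checked, here is a minimal sketch that tallies events by type. The `type` and `t` field names and the sample values are assumptions for illustration, not the actual fixture schema or recorded data.

```typescript
// Hypothetical event shape: the real fixtures may use different field names.
interface DemoEvent {
  type: string; // e.g. "speech_started", "speech_ended", "transcription"
  t: number; // seconds from the start of the audio
}

// Tally how many events of each type a fixture contains.
function countByType(events: DemoEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.type] = (counts[e.type] ?? 0) + 1;
  }
  return counts;
}

// Made-up stand-in for a no-tail style stream: speech starts but never ends.
const noTailLike: DemoEvent[] = [
  { type: "speech_started", t: 0.4 },
  { type: "transcription", t: 1.1 },
  { type: "transcription", t: 2.0 },
];

const tally = countByType(noTailLike);
console.log(tally["speech_started"] ?? 0, tally["speech_ended"] ?? 0);
```

A check like this could run after fixture regeneration to flag a capture whose event counts drift from the documented behavior.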

Behavior

  • Sample picker at the top switches between fixtures. Each chip shows a title and a one-line subtitle.
  • Waveform is rendered as 200 SVG bars from a precomputed peak-amplitude array. Vertical dashed markers sit at the timestamps of every `speech_started` and `speech_ended` event.
  • Play button drives a native `<audio>` element. A live playhead moves across the waveform.
  • Event log scrolls in sync. The row whose timestamp matches the playhead highlights and auto-scrolls into view. Rows the playhead has already reached show at full opacity; later rows are dimmed.
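
The waveform math described above can be sketched roughly as follows. This is an illustration only: the real amplitude arrays are precomputed by the Python prep script, and the function names, the sample range of -1..1, and the viewBox width are assumptions.

```typescript
// Reduce raw samples (assumed to lie in -1..1) to `bars` peak amplitudes,
// one per equal-width bucket. The demo uses 200 bars.
function peakAmplitudes(samples: number[], bars: number): number[] {
  const peaks: number[] = [];
  for (let i = 0; i < bars; i++) {
    const start = Math.floor((i * samples.length) / bars);
    const end = Math.floor(((i + 1) * samples.length) / bars);
    let peak = 0;
    for (const s of samples.slice(start, end)) {
      peak = Math.max(peak, Math.abs(s));
    }
    peaks.push(peak);
  }
  return peaks;
}

// Map an event timestamp (seconds) to an x coordinate in the SVG viewBox,
// clamping to the audio duration so markers never land outside the wave.
function markerX(t: number, duration: number, width: number): number {
  const clamped = Math.min(Math.max(t, 0), duration);
  return (clamped / duration) * width;
}

console.log(peakAmplitudes([0.1, -0.5, 0.2, 0.9], 2)); // [0.5, 0.9]
console.log(markerX(2, 8, 200)); // 50
```

The same `markerX` mapping would serve both the dashed event markers and the moving playhead, which keeps the two visually aligned.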

Why this design

Two ideas, one demo:

  1. Timing: the waveform + markers make the acoustic-boundary concept tactile. The reader hears the sample audio and sees where the events fire on the wave.
  2. Wire shape: the event log shows the actual JSON each customer would receive, with the same discriminator field they'd switch on in their code.

The Mermaid sequence diagram below the demo is the formal contract; the interactive demo is the hands-on layer above it.
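
For readers who want to see what "switching on the discriminator" might look like in client code, here is a minimal sketch. The field names `type`, `timestamp`, `transcript`, and `is_final` are assumptions for illustration; only the event names `speech_started` and `speech_ended` come from this PR, so check the documented payloads before relying on the shapes.

```typescript
// Assumed message shapes: a discriminated union keyed on a `type` field.
type ServerMessage =
  | { type: "speech_started"; timestamp: number }
  | { type: "speech_ended"; timestamp: number }
  | { type: "transcription"; transcript: string; is_final: boolean };

// TypeScript narrows `msg` inside each case, so only the fields that
// exist on that variant are accessible.
function describe(msg: ServerMessage): string {
  switch (msg.type) {
    case "speech_started":
      return `speech started at ${msg.timestamp}s`;
    case "speech_ended":
      return `speech ended at ${msg.timestamp}s`;
    case "transcription":
      return `${msg.is_final ? "final" : "partial"}: ${msg.transcript}`;
  }
}

console.log(describe({ type: "speech_started", timestamp: 0.032 }));
```

Modelling the stream as a union means a new event type added later shows up as a compile error in every `switch` that does not handle it.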

Verification

  • `fern check`: 0 errors.
  • Live fixture regeneration verified: all three variants captured with expected event counts. Replayable via `SMALLEST_API_KEY=... python3 scripts/spec-live-tests/prep_vad_demo_fixtures.py`.
  • v4-mirror diff: empty.
  • llms.txt regenerated; in sync.

Test plan

  • `fern check` green.
  • Render check on Vercel preview: confirm the component renders, the play button plays audio, markers display at the right positions, event log highlights in sync.
  • Confirm preview works on both desktop and mobile (the waveform SVG uses `preserveAspectRatio="none"` so it should scale; the fixture chips wrap on narrow screens).

Adds a React component that plays one of three pre-recorded audio samples
and renders the captured server message stream in sync with playback.
Three fixtures show the three behaviors documented on the page:

- clean       : 1 speech_started + 1 speech_ended (normal case)
- multi-turn  : 2 of each (per-voiced-region behavior)
- no-tail     : speech_started fires; speech_ended does not (caveat)

Each fixture is captured live from the production WebSocket; the JSON
files are the actual recorded message stream. No client-side WebSocket
connection at runtime; everything ships as static assets.

## Files

- fern/components/vad-events-demo/VadEventsDemo.tsx
- fern/components/vad-events-demo/assets/{clean,multi-turn,no-tail}.mp3
- fern/components/vad-events-demo/fixtures/{clean,multi-turn,no-tail}.json
- scripts/spec-live-tests/prep_vad_demo_fixtures.py
    Regenerates audio + fixtures by running each variant through the
    live Pulse STT endpoint. Requires ffmpeg + SMALLEST_API_KEY.
- fern/products/waves/.../vad-events.mdx (+ versions mirror)
    Imports the component, renders it under a "Try it" section above
    the existing Mermaid sequence diagram.

## Sizes

| File | Size |
|---|---|
| VadEventsDemo.tsx | 16 KB |
| clean.mp3 | 60 KB |
| multi-turn.mp3 | 108 KB |
| no-tail.mp3 | 48 KB |
| 3 x fixture.json | ~20 KB |
| total | ~252 KB |

## Verification

- fern check: 0 errors.
- Live regeneration of fixtures verified: ran prep script against prod,
  all three variants captured with the expected event counts.

abhishekmishragithub force-pushed the feat/vad-events-interactive-demo branch from bb3f3de to 38b854f on June 23, 2026 10:10
…t root

Component was failing to render with 'Could not resolve ./fixtures/*.json'
and './assets/*.mp3'. Root cause: Fern's MDX bundler only resolves
.tsx/.ts/.js/.mdx imports from custom React components. Relative JSON
and binary asset imports are not supported.

Changes:
- Convert fixtures from JSON to typed TS modules (clean.ts etc.) that
  export a Fixture object. Component imports those instead.
- Add a shared types.ts so the fixture and component agree on shape.
- Move MP3s to fern/docs/assets/vad-events/ (Fern's static asset root).
  Each fixture's `audio` field now points at /assets/vad-events/<name>.mp3
  which is the served URL on the rendered site.
- Update prep script accordingly so future regens emit the TS format +
  write MP3s to the new asset location.
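
A typed fixture module along these lines might look like the sketch below. The exact fields in the real `types.ts` are not shown in this PR, so every field name here except `audio` is an assumption, and the amplitude and event values are placeholders rather than recorded data.

```typescript
// types.ts (hypothetical shape shared by the component and the fixtures)
interface TimelineEvent {
  t: number; // seconds from the start of the audio
  payload: Record<string, unknown>; // the server message as captured
}

interface Fixture {
  title: string; // shown on the sample picker chip
  subtitle: string; // one-line description under the title
  audio: string; // URL the player loads
  amplitudes: number[]; // precomputed peak amplitudes for the waveform
  events: TimelineEvent[];
}

// Build the served URL for a sample name, following the path in this commit.
function audioUrlFor(name: string): string {
  return `/assets/vad-events/${name}.mp3`;
}

// clean.ts (in a real module this object would be exported)
const clean: Fixture = {
  title: "Clean",
  subtitle: "Single utterance with trailing silence",
  audio: audioUrlFor("clean"),
  amplitudes: [0.1, 0.6, 0.3], // placeholder values
  events: [{ t: 0, payload: { type: "speech_started" } }], // placeholder
};

console.log(clean.audio); // /assets/vad-events/clean.mp3
```

Because the fixture is a plain `.ts` module, the bundler resolves it like any other import and the compiler checks it against the shared interface.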
…of page

Two fixes from review feedback on the rendered preview.

1. Play/pause did nothing because the audio src pointed at
   /assets/vad-events/<name>.mp3, which 404s on docs.smallest.ai.
   Fern does not serve fern/docs/assets/ at runtime — that path is for
   MDX compile-time references only (e.g. <img src="../../docs/assets/...">).
   Custom React components running at runtime have no equivalent. Fix:
   embed each MP3 as a base64 data URL directly in the fixture TS module.
   Self-contained, no external CDN, no CSP issue.

2. Move the demo from the top of the page to the bottom after Notes, so
   readers see the param + payload + behavior caveats before the
   interactive walkthrough.

Fixture sizes after embedding:
- clean.ts: 80 KB
- multi-turn.ts: 152 KB
- no-tail.ts: 64 KB
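
The embedding step and its size cost can be sketched as follows. The actual prep script is Python; this TypeScript version is only an illustration, the function names are hypothetical, and the `audio/mpeg` MIME type is an assumption about how the data URL is labeled.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 encoding of raw bytes with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET.charAt((triple >> 18) & 63);
    out += ALPHABET.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Wrap MP3 bytes in a data URL that an audio element can use as its src.
function mp3DataUrl(bytes: Uint8Array): string {
  return `data:audio/mpeg;base64,${toBase64(bytes)}`;
}

// Base64 emits 4 characters for every 3 input bytes.
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4;
}

console.log(toBase64(new Uint8Array([77, 97, 110]))); // TWFu
console.log(base64Length(60 * 1024) / 1024); // 80
```

The 4/3 growth roughly accounts for the fixture sizes listed above: a 60 KB MP3 encodes to 80 KB and a 48 KB MP3 to 64 KB, with the event data adding a little on top.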
…ix sequence ordering

Three improvements after reviewing the rendered demo.

1. **Maverick voice via Lightning v3.1 Pro** for the audio samples. More
   natural than the cookbook test audio and showcases real product output.
   Dialogue is conversational with fillers ('um', 'so', 'yeah') so the
   samples represent real call audio, not test reads.

2. **Real word timestamps** on transcripts. The Pulse WS does not put a
   top-level timestamp on transcription messages, only on speech_started /
   speech_ended. Previous fixtures synthesized transcript times by spreading
   them evenly across the audio duration. Now the prep script enables
   word_timestamps=true and uses words[-1].end for each transcript.

3. **Sort by timestamp** so the demo timeline is monotonic. With real word
   timestamps, partials emitted before silence had earlier times than the
   speech_ended that follows them in wire order — the highlight picker
   walked off the end of the array. Sorting the events by their assigned
   timeline timestamp puts them in playback order.
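
Points 2 and 3 above can be sketched together. The real prep script is Python; this TypeScript version is only illustrative, and the field names `type`, `timestamp`, and `words[].end` are assumed from the description rather than taken from the script.

```typescript
interface Word {
  word: string;
  start: number;
  end: number;
}

// Assumed wire shape: VAD events carry `timestamp`, transcripts carry `words`.
interface WireMessage {
  type: string;
  timestamp?: number;
  words?: Word[];
}

interface TimelineEvent {
  t: number;
  message: WireMessage;
}

// Use the top-level timestamp when present; otherwise fall back to the end
// time of the last word (the equivalent of words[-1].end in Python).
function timelineTime(msg: WireMessage): number {
  if (typeof msg.timestamp === "number") return msg.timestamp;
  const words = msg.words ?? [];
  const last = words[words.length - 1];
  return last ? last.end : 0;
}

// Assign each message a timeline time, then sort so playback order is monotonic.
function toTimeline(wire: WireMessage[]): TimelineEvent[] {
  return wire
    .map((message) => ({ t: timelineTime(message), message }))
    .sort((a, b) => a.t - b.t);
}

// Made-up wire order in which a transcript arrives after speech_ended.
const timeline = toTimeline([
  { type: "speech_started", timestamp: 0.1 },
  { type: "speech_ended", timestamp: 2.0 },
  { type: "transcription", words: [{ word: "hello", start: 1.0, end: 1.5 }] },
]);
console.log(timeline.map((e) => e.t)); // [0.1, 1.5, 2]
```

Sorting at fixture-generation time means the component can treat the event array as a timeline rather than as a wire log.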

Component picker also fixed defensively: scans the full array instead of
stopping at the first out-of-order timestamp.
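
A defensive picker of this kind might look like the sketch below. The function name and signature are hypothetical; the component's actual implementation is not shown in this PR.

```typescript
// Return the index of the latest event at or before the playhead, or -1
// if the playhead has not reached any event yet. The scan covers the whole
// array, so an out-of-order entry cannot cut the search short.
function activeIndex(times: number[], playhead: number): number {
  let best = -1;
  let bestTime = -Infinity;
  times.forEach((t, i) => {
    if (t <= playhead && t >= bestTime) {
      best = i;
      bestTime = t;
    }
  });
  return best;
}

// With an unsorted array, an early-exit scan would stop at index 1
// (2.0 > 1.6) and report index 0; the full scan finds index 2.
console.log(activeIndex([0.1, 2.0, 1.5], 1.6)); // 2
```

Using `>=` when comparing against the best time so far means that, among events sharing a timestamp, the later row wins, which matches a log that appends rows in order.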

Sequences captured live from prod:
- clean      : speech_started @0.032s, speech_ended @5.36s, 6 transcripts
- multi-turn : 2 speech_started + 2 speech_ended pairs, 6 transcripts
- no-tail    : speech_started @0.384s, no speech_ended (caveat), 3 transcripts

Fixture sizes: 80 KB / 84 KB / 28 KB (base64-embedded MP3 + envelope + events).