feat(stt): interactive VAD events demo on the feature page #272
Open
abhishekmishragithub wants to merge 4 commits into
Adds a React component that plays one of three pre-recorded audio samples
and renders the captured server message stream in sync with playback.
Three fixtures show the three behaviors documented on the page:
- clean : 1 speech_started + 1 speech_ended (normal case)
- multi-turn : 2 of each (per-voiced-region behavior)
- no-tail : speech_started fires; speech_ended does not (caveat)
Each fixture was captured live from the production WebSocket; the JSON
files are the actual recorded message stream. There is no client-side WebSocket
connection at runtime; everything ships as static assets.
## Files
- fern/components/vad-events-demo/VadEventsDemo.tsx
- fern/components/vad-events-demo/assets/{clean,multi-turn,no-tail}.mp3
- fern/components/vad-events-demo/fixtures/{clean,multi-turn,no-tail}.json
- scripts/spec-live-tests/prep_vad_demo_fixtures.py
Regenerates audio + fixtures by running each variant through the
live Pulse STT endpoint. Requires ffmpeg + SMALLEST_API_KEY.
- fern/products/waves/.../vad-events.mdx (+ versions mirror)
Imports the component, renders it under a "Try it" section above
the existing Mermaid sequence diagram.
## Sizes
| File | Size |
|---|---|
| VadEventsDemo.tsx | 16 KB |
| clean.mp3 | 60 KB |
| multi-turn.mp3 | 108 KB |
| no-tail.mp3 | 48 KB |
| 3 x fixture.json | ~20 KB |
| total | ~252 KB |
## Verification
- fern check: 0 errors.
- Live regeneration of fixtures verified: ran the prep script against prod;
all three variants were captured with the expected event counts.
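The expected-event-count check from the verification step can be sketched as below. This is a minimal sketch, not the prep script's actual logic: the message shape (a `type` string field) and the helper names `countVadEvents` and `matchesExpected` are hypothetical. Only the expected counts per variant come from this PR.

```typescript
// Hypothetical minimal message shape: only the `type` field is used here.
interface ServerMessage {
  type: string;
  timestamp?: number;
}

interface VadCounts {
  started: number;
  ended: number;
}

// Expected counts per variant, as stated in this PR.
const EXPECTED_COUNTS: { [variant: string]: VadCounts } = {
  "clean": { started: 1, ended: 1 },
  "multi-turn": { started: 2, ended: 2 },
  "no-tail": { started: 1, ended: 0 }, // speech_ended never fires (the caveat)
};

// Count speech_started / speech_ended messages; everything else is ignored.
function countVadEvents(messages: ServerMessage[]): VadCounts {
  const counts: VadCounts = { started: 0, ended: 0 };
  for (const m of messages) {
    if (m.type === "speech_started") counts.started++;
    else if (m.type === "speech_ended") counts.ended++;
  }
  return counts;
}

// True when a captured stream has exactly the expected VAD event counts.
function matchesExpected(variant: string, messages: ServerMessage[]): boolean {
  const expected = EXPECTED_COUNTS[variant];
  if (expected === undefined) {
    throw new Error(`unknown variant: ${variant}`);
  }
  const actual = countVadEvents(messages);
  return actual.started === expected.started && actual.ended === expected.ended;
}
```

Counting per type, rather than checking the exact sequence, keeps the check stable when the number of transcripts varies between captures.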
bb3f3de to 38b854f
…t root
Component was failing to render with 'Could not resolve ./fixtures/*.json' and './assets/*.mp3'.
Root cause: Fern's MDX bundler only resolves .tsx/.ts/.js/.mdx imports from custom React components; relative JSON and binary asset imports are not supported.
Changes:
- Convert fixtures from JSON to typed TS modules (clean.ts etc.) that export a Fixture object. The component imports those instead.
- Add a shared types.ts so the fixtures and the component agree on the shape.
- Move MP3s to fern/docs/assets/vad-events/ (Fern's static asset root). Each fixture's `audio` field now points at /assets/vad-events/<name>.mp3, which is the served URL on the rendered site.
- Update the prep script so future regenerations emit the TS format and write MP3s to the new asset location.
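A minimal sketch of what the shared types and one converted fixture module might look like. The field names (`name`, `audio`, `events`, `type`, `t`) and the `clean` export name are assumptions, not the actual contents of types.ts or clean.ts. The audio path follows the /assets/vad-events/<name>.mp3 convention from this commit, and the two event times are the clean-variant values listed later in this PR. Both files are shown in one block for brevity.

```typescript
// ---- types.ts (hypothetical shape shared by fixtures and the component) ----
export interface FixtureEvent {
  type: string; // wire message type, e.g. "speech_started"
  t: number;    // seconds from the start of the audio
}

export interface Fixture {
  name: string;   // variant name: "clean", "multi-turn", or "no-tail"
  audio: string;  // URL the <audio> element plays
  events: FixtureEvent[];
}

// ---- clean.ts (abbreviated; the real module also carries transcripts) ----
// As a separate file, this module would begin with:
//   import type { Fixture } from "./types";
export const clean: Fixture = {
  name: "clean",
  audio: "/assets/vad-events/clean.mp3",
  events: [
    { type: "speech_started", t: 0.032 },
    { type: "speech_ended", t: 5.36 },
  ],
};

// ---- VadEventsDemo.tsx then imports a TS module instead of JSON ----
//   import { clean } from "./fixtures/clean";
```

Because the fixture is an ordinary `.ts` module, the bundler resolves it like any other TypeScript import, and the `Fixture` annotation catches shape drift at compile time.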
…of page
Two fixes from review feedback on the rendered preview.
1. Play/pause did nothing because the audio src pointed at /assets/vad-events/<name>.mp3, which 404s on docs.smallest.ai. Fern does not serve fern/docs/assets/ at runtime; that path is only for MDX compile-time references (e.g. <img src="../../docs/assets/...">), and custom React components running at runtime have no equivalent.
Fix: embed each MP3 as a base64 data URL directly in the fixture TS module. Self-contained, no external CDN, no CSP issue.
2. Move the demo from the top of the page to the bottom, after Notes, so readers see the parameter, payload, and behavior caveats before the interactive walkthrough.
Fixture sizes after embedding:
- clean.ts: 80 KB
- multi-turn.ts: 152 KB
- no-tail.ts: 64 KB
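The fixture sizes after embedding are consistent with base64's 4/3 expansion: every 3 input bytes become 4 ASCII characters, so an n-byte MP3 encodes to 4·⌈n/3⌉ characters plus the data-URL prefix. A small sketch of that arithmetic, assuming the reported KB figures are multiples of 1024 bytes and that the data URL uses the standard `audio/mpeg` MIME type (the helper names are hypothetical):

```typescript
// Base64 turns each 3-byte group into 4 characters, padding the last group
// with '=', so n input bytes encode to exactly 4 * ceil(n / 3) characters.
// The output is ASCII, so characters equal bytes in the fixture source file.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Prefix of a data URL carrying base64-encoded MP3 audio.
const MP3_DATA_URL_PREFIX = "data:audio/mpeg;base64,";

// Total characters of the data URL string embedded in a fixture module.
function mp3DataUrlLength(byteLength: number): number {
  return MP3_DATA_URL_PREFIX.length + base64Length(byteLength);
}

// MP3 sizes reported in this PR, assuming 1 KB = 1024 bytes.
const reportedMp3Kb: Array<[string, number]> = [
  ["clean", 60],
  ["multi-turn", 108],
  ["no-tail", 48],
];

for (const [name, kb] of reportedMp3Kb) {
  const embeddedKb = mp3DataUrlLength(kb * 1024) / 1024;
  console.log(`${name}: ${kb} KB MP3 -> ~${embeddedKb.toFixed(1)} KB as a data URL`);
}
```

Under those assumptions, 60 KB and 48 KB of MP3 come out at roughly 80 KB and 64 KB, matching clean.ts and no-tail.ts. 108 KB comes out at roughly 144 KB, which would leave about 8 KB of multi-turn.ts for the envelope and events.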
…ix sequence ordering
Three improvements after reviewing the rendered demo.
1. **Maverick voice via Lightning v3.1 Pro** for the audio samples. More
natural than the cookbook test audio and showcases real product output.
Dialogue is conversational with fillers ('um', 'so', 'yeah') so the
samples represent real call audio, not test reads.
2. **Real word timestamps** on transcripts. The Pulse WS does not put a
top-level timestamp on transcription messages, only on speech_started /
speech_ended. Previous fixtures synthesized transcript times by spreading
them evenly across the audio duration. Now the prep script enables
word_timestamps=true and uses words[-1].end for each transcript.
3. **Sort by timestamp** so the demo timeline is monotonic. With real word
timestamps, partials emitted before silence had earlier times than the
speech_ended that follows them in wire order, so the timeline was no longer
monotonic and the highlight picker walked off the end of the array. Sorting
the events by their assigned timeline timestamp puts them in playback order.
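The timestamp assignment and sort described in items 2 and 3 might look roughly like this. It is a sketch under stated assumptions: the message field names (`type`, `timestamp`, `words`) and the choice to drop messages with no usable time are hypothetical. Only the rule itself (VAD events keep their own timestamp, transcripts use the end of their last word) comes from this PR.

```typescript
// Hypothetical wire shapes; only the fields used below are modeled.
interface Word {
  word: string;
  start: number;
  end: number;
}

interface WireMessage {
  type: string;
  timestamp?: number; // set on speech_started / speech_ended
  words?: Word[];     // set on transcripts when word_timestamps=true
}

interface TimelineEvent {
  msg: WireMessage;
  t: number;         // position on the playback timeline, in seconds
  wireIndex: number; // original arrival order, kept as a tiebreaker
}

// VAD events carry their own timestamp; transcripts use the end of their
// last word (words[-1].end in the Python prep script).
function timelineTime(msg: WireMessage): number | undefined {
  if (typeof msg.timestamp === "number") return msg.timestamp;
  if (msg.words !== undefined && msg.words.length > 0) {
    return msg.words[msg.words.length - 1].end;
  }
  return undefined;
}

// Assign times, drop messages without one (an assumption), and sort into
// playback order. Equal times keep their wire order.
function toTimeline(messages: WireMessage[]): TimelineEvent[] {
  const events: TimelineEvent[] = [];
  messages.forEach((msg, wireIndex) => {
    const t = timelineTime(msg);
    if (t !== undefined) events.push({ msg, t, wireIndex });
  });
  return events.sort((a, b) => a.t - b.t || a.wireIndex - b.wireIndex);
}
```

The explicit `wireIndex` tiebreaker keeps the result deterministic even if two events land on the same timestamp.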
Component picker also fixed defensively: scans the full array instead of
stopping at the first out-of-order timestamp.
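A defensive picker that scans the full array, as described above, can be sketched as follows. `activeEventIndex` and the `{ t: number }` event shape are hypothetical names for illustration; the component's actual picker may differ.

```typescript
interface TimedEvent {
  t: number; // seconds on the playback timeline
}

// Return the index of the latest event at or before `currentTime`, or -1 if
// playback has not reached any event yet. The loop visits every element
// instead of breaking at the first t > currentTime, so an out-of-order entry
// cannot cut the search short. Ties go to the later index.
function activeEventIndex(events: TimedEvent[], currentTime: number): number {
  let best = -1;
  for (let i = 0; i < events.length; i++) {
    const t = events[i].t;
    if (t <= currentTime && (best === -1 || t >= events[best].t)) {
      best = i;
    }
  }
  return best;
}
```

In a React component this would presumably be called from the audio element's `timeupdate` handler with `audio.currentTime`. The linear scan costs O(n) per update, which is negligible for fixtures of this size (a handful of VAD events and transcripts).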
Sequences captured live from prod:
- clean : speech_started @0.032s, speech_ended @5.36s, 6 transcripts
- multi-turn : 2 speech_started + 2 speech_ended pairs, 6 transcripts
- no-tail : speech_started @0.384s, no speech_ended (caveat), 3 transcripts
Fixture sizes (clean / multi-turn / no-tail): 80 KB / 84 KB / 28 KB (base64-embedded MP3 + envelope + events).
Stacks on top of #271. Adds an interactive React component to the VAD events page that lets readers play sample audio and see the captured server message stream in sync.
## What ships
## Three fixtures, three behaviors
Every event payload in the fixtures comes directly from the production endpoint. The component does not connect to a WebSocket at runtime; everything is static.
## Behavior
## Why this design
Two ideas, one demo:
The Mermaid sequence diagram below the demo is the formal contract; the interactive demo is the hands-on layer above it.
## Verification
## Test plan