feat(atoms): realtime agent — register-call + wire-truth corrections + Python guide#274
Draft
abhishekmishragithub wants to merge 2 commits into
…rections + Python guide

Verified end-to-end against a fresh agent created from the sp-medical-centre-receptionist-in template via POST /agent/from-template.

Spec changes (AsyncAPI agent-ws.yaml):
- Add `transcript.delta` message (cumulative streaming text per role). Wire emits dot notation; the JS SDK normalizes to underscore client-side. Raw WS clients (Python, mobile) must listen for the dot form.
- Update `role` enum on `transcript` and `transcript.delta` to `user | assistant`; pipecat emits `assistant` for the agent.
- Make audio format explicit on both `input_audio_buffer.append` and `output_audio.delta`: PCM 16-bit signed little-endian, mono, at the sample rate echoed in `session.created`.
- Add jitter-buffer guidance on `output_audio.delta` (200-300 ms pre-buffer).

Spec changes (OpenAPI openapi.yaml):
- Add POST /conversation/register-call. Issues a short-lived `wct_` token (30 s TTL, single-use) for browser/client WebSocket auth. Live on platform main since 2026-06-21 (atoms-platform PR #2870). Body accepts `agent_id` (required), `mode` (webcall|chat, default webcall), and `variables` (string|number|boolean values for prompt templating).

New page:
- Agent WebSocket (Python) at /atoms/developer-guide/integrate/realtime-agent-python. Mirror of the JS Web SDK guide for headless Python clients. Self-contained working script: register-call → wct_ token → WS, plus mic/speaker callbacks, base64 PCM16 handling, transcript.delta + transcript event handling, 250 ms jitter buffer. No numpy dependency; sounddevice CFFI buffers support direct slice assignment.

Closes #50, #53, #125.
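For reviewers who want a quick picture of the wire handling described in this commit, here is a minimal sketch of a raw-WS event handler. It is not the script from the new page. The JSON field names `type`, `role`, `text`, and `audio` are assumptions for illustration (check agent-ws.yaml for the real payload shape), and `JitterBuffer` and `handle_event` are hypothetical helpers. The sample rate would come from `session.created`; 24000 is used only as an example value.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # PCM 16-bit signed little-endian, mono


class JitterBuffer:
    """Hold decoded PCM until a pre-buffer threshold is reached (hypothetical helper)."""

    def __init__(self, sample_rate: int, prebuffer_ms: int = 250):
        # Number of bytes that correspond to prebuffer_ms of audio.
        self.threshold = sample_rate * BYTES_PER_SAMPLE * prebuffer_ms // 1000
        self.pending = bytearray()
        self.started = False

    def push(self, pcm: bytes) -> None:
        self.pending.extend(pcm)
        if len(self.pending) >= self.threshold:
            self.started = True

    def pull(self, nbytes: int) -> bytes:
        """Return exactly nbytes for a speaker callback, padding with silence."""
        if not self.started:
            return bytes(nbytes)  # still pre-buffering: emit silence, keep audio
        chunk = bytes(self.pending[:nbytes])
        del self.pending[:nbytes]
        return chunk + bytes(nbytes - len(chunk))


def handle_event(raw: str, transcripts: dict, audio: JitterBuffer) -> None:
    """Dispatch one wire message. Field names are assumed, not taken from the spec."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "transcript.delta":  # dot form, as emitted on the wire
        # Deltas are cumulative per role, so replace rather than append.
        transcripts[event["role"]] = event["text"]
    elif kind == "output_audio.delta":
        audio.push(base64.b64decode(event["audio"]))
```

In a real client, `pull` would be called from the audio output callback with the number of bytes the device asks for, and `handle_event` from the WebSocket receive loop.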
🌿 Preview your docs: https://smallest-ai-preview-docs-atoms-realtime-agent-cookbook-flow.docs.buildwithfern.com
…rity note

- New realtime-agent-token.sh: shell helper that fetches a wct_ token from ATOMS_API_KEY + ATOMS_AGENT_ID. Useful when testing the JS playground, Framer voice components, or one-off HTML files without writing a server.
- Document webcall vs chat mode on the same WebSocket (mode query param).
- Add explicit security warning against hard-coding sk_ keys in browser/Framer/no-code embeds; always issue wct_ tokens server-side.
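As a companion to the shell helper, the server-side token flow could look roughly like the following Python sketch. The host names are placeholders, the `Authorization: Bearer` header is an assumption about how the API key is presented, and `build_register_call_request`, `build_ws_url`, and `fetch_token` are hypothetical names. The body fields, the `access_token` response field, and the `token` and `mode` query parameters are the ones named in this PR.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.example.invalid"    # placeholder, not the real API host
WS_BASE = "wss://ws.example.invalid/agent"  # placeholder, not the real WS URL


def build_register_call_request(api_key, agent_id, mode="webcall",
                                variables=None, api_base=API_BASE):
    """Build the POST /conversation/register-call request (server-side only)."""
    if mode not in ("webcall", "chat"):
        raise ValueError("mode must be 'webcall' or 'chat'")
    body = {"agent_id": agent_id, "mode": mode}
    if variables:
        body["variables"] = variables  # string, number, or boolean values
    return urllib.request.Request(
        f"{api_base}/conversation/register-call",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_ws_url(token, mode="webcall", ws_base=WS_BASE):
    """Attach the short-lived wct_ token and the mode as query parameters."""
    query = urllib.parse.urlencode({"token": token, "mode": mode})
    return f"{ws_base}?{query}"


def fetch_token(request, timeout=10):
    """Send the request and return access_token (30 s TTL, single-use)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

Keeping the `sk_` key inside `build_register_call_request` on the server means the browser or embed only ever sees the `wct_` token, which expires quickly and cannot be reused.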
Why
Three independent gaps in the Realtime Agent WebSocket docs surfaced when an FDE tried to build a Python client:
- `transcript.delta` (DOT): the current spec only documents `transcript` (final, rarely fires on prod webcall). The JS SDK normalizes dot→underscore client-side; raw WS clients (Python, mobile) miss it entirely.
- `role` enum wrong. Spec says `user | agent`; wire emits `user | assistant` (pipecat owns the role string; the platform's typed enum is stale).
- `POST /conversation/register-call` missing from spec. Endpoint is live on platform main since 2026-06-21 (atoms-platform PR #2870), referenced by the JS cookbook, but not in our OpenAPI.

What
AsyncAPI (`agent-ws.yaml`)
- Add `transcript.delta` message: cumulative streaming text per role.
- Update `transcript` and `transcript.delta` role enum to `user | assistant`.
- Audio format and jitter-buffer guidance on `output_audio.delta`.

OpenAPI (`openapi.yaml`)
- Add `POST /conversation/register-call` as GA. Body: `agent_id`, optional `mode` (`webcall|chat`), optional `variables`. Returns `access_token` (`wct_` prefix, 30 s TTL, single-use) + `expires_in` + `sample_rate`.

New page
- `/atoms/developer-guide/integrate/realtime-agent-python`: full working Python client mirroring the JS Web SDK guide. Uses the two-step flow (POST /register-call → wct_ → WS ?token=). Includes jitter-buffer rationale, event-handling table, common-errors table. No numpy dependency.

Verification
Created a fresh test agent from the `sp-medical-centre-receptionist-in` template. Ran the Python client from the new docs page end-to-end against prod:
- `wct_` token, `expires_in=30`, `sample_rate=24000`
- `?token=wct_…` accepted, `session.created` fired
- `output_audio.delta` decoded cleanly, zero buffer underruns with the 250 ms prebuffer
- `transcript.delta` events captured with `role: "assistant"`, displayed correctly

`fern check`: 0 errors.

Closes