feat(atoms): realtime agent — register-call + wire-truth corrections + Python guide#274

Draft
abhishekmishragithub wants to merge 2 commits into main from docs/atoms-realtime-agent-cookbook-flow

Conversation

@abhishekmishragithub

Collaborator

Why

Three independent gaps in the Realtime Agent WebSocket docs surfaced when an FDE tried to build a Python client:

  1. Wire event name wrong in spec. Streaming text fires as transcript.delta (dot-separated), but the current spec documents only transcript (the final event, which rarely fires on prod webcall). The JS SDK normalizes dot to underscore client-side, so raw WS clients (Python, mobile) that follow the spec miss streaming text entirely.
  2. role enum wrong. Spec says user | agent; the wire emits user | assistant (pipecat owns the role string, and the platform's typed enum is stale).
  3. POST /conversation/register-call missing from spec. The endpoint has been live on platform main since 2026-06-21 (atoms-platform PR #2870) and is referenced by the JS cookbook, but it is not in our OpenAPI.
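
For reviewers who want to see what gaps 1 and 2 mean for a raw client, here is a minimal sketch of a transcript-event parser. The JSON field names `type` and `text` are assumptions made for illustration (only `role` and the event names come from this PR), and `parse_transcript_event` is a hypothetical helper, not part of any SDK.

```python
import json

# Wire names per this PR; the underscore alias is the spelling the JS SDK exposes.
ALIASES = {"transcript_delta": "transcript.delta"}
TRANSCRIPT_EVENTS = {"transcript.delta", "transcript"}
WIRE_ROLES = {"user", "assistant"}  # the wire never sends "agent"


def parse_transcript_event(raw: str):
    """Return (event_type, role, text) for transcript events, else None.

    Assumption: the event name is under "type" and the text under "text".
    """
    event = json.loads(raw)
    event_type = event.get("type", "")
    event_type = ALIASES.get(event_type, event_type)
    if event_type not in TRANSCRIPT_EVENTS:
        return None  # some other event (audio, session, ...)
    role = event.get("role")
    if role not in WIRE_ROLES:
        return None  # e.g. the stale "agent" value from the old enum
    return event_type, role, event.get("text", "")
```

A client written against the old spec would have matched only `transcript` with role `agent`, which is why it displayed nothing on prod webcalls.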

What

AsyncAPI (agent-ws.yaml)

  • Add transcript.delta message: cumulative streaming text per role.
  • Update transcript and transcript.delta role enum to user | assistant.
  • Add explicit "PCM 16-bit signed little-endian, mono" to audio messages.
  • Add 200-300 ms jitter-buffer guidance on output_audio.delta.
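
To make the audio-format and jitter-buffer bullets concrete, the following is a rough sketch of a pre-buffer for base64-encoded PCM16 mono chunks. `JitterBuffer` is a hypothetical class written for this description; the re-prebuffer-on-underrun policy is one possible design choice, not something the spec mandates.

```python
import base64
from collections import deque

BYTES_PER_SAMPLE = 2  # PCM 16-bit signed little-endian, mono


class JitterBuffer:
    """Hold back playback until roughly prebuffer_ms of audio has arrived."""

    def __init__(self, sample_rate: int, prebuffer_ms: int = 250):
        self.threshold = sample_rate * BYTES_PER_SAMPLE * prebuffer_ms // 1000
        self.chunks = deque()
        self.buffered = 0      # bytes currently queued
        self.started = False   # True once the prebuffer threshold was reached

    def push(self, b64_audio: str) -> None:
        """Decode one base64 audio payload and queue the raw PCM bytes."""
        pcm = base64.b64decode(b64_audio)
        self.chunks.append(pcm)
        self.buffered += len(pcm)
        if self.buffered >= self.threshold:
            self.started = True

    def pull(self, nbytes: int) -> bytes:
        """Return exactly nbytes, padded with silence if audio is not ready."""
        out = bytearray()
        if self.started:
            while self.chunks and len(out) < nbytes:
                chunk = self.chunks.popleft()
                take = nbytes - len(out)
                out += chunk[:take]
                if len(chunk) > take:
                    self.chunks.appendleft(chunk[take:])
            self.buffered -= len(out)
            if self.buffered == 0:
                self.started = False  # drained: prebuffer again before resuming
        out += b"\x00" * (nbytes - len(out))
        return bytes(out)
```

At the 24000 Hz rate seen in verification, a 250 ms pre-buffer works out to 24000 × 2 × 0.25 = 12000 bytes.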

OpenAPI (openapi.yaml)

  • Add POST /conversation/register-call as GA. Body: agent_id, optional mode (webcall|chat), optional variables. Returns access_token (wct_ prefix, 30 s TTL, single-use) + expires_in + sample_rate.
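
As an illustration of the request and response shape described above, here is a sketch using only the standard library. The base URL is a placeholder, the `Authorization: Bearer` header is an assumption about how the API key is sent, and the response is assumed to carry the three fields at the top level; both function names are hypothetical.

```python
import json
import urllib.request


def build_register_call_request(base_url, api_key, agent_id,
                                mode="webcall", variables=None):
    """Build (but do not send) the POST /conversation/register-call request."""
    body = {"agent_id": agent_id, "mode": mode}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/conversation/register-call",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer auth
            "Content-Type": "application/json",
        },
    )


def parse_register_call_response(payload: bytes):
    """Return (access_token, expires_in, sample_rate) from the JSON body."""
    data = json.loads(payload)
    token = data["access_token"]
    if not token.startswith("wct_"):
        raise ValueError("expected a wct_ token")
    return token, data["expires_in"], data["sample_rate"]
```

Sending would then be `urllib.request.urlopen(request)` from a trusted server, with the response body passed to `parse_register_call_response`.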

New page

  • /atoms/developer-guide/integrate/realtime-agent-python: a full working Python client mirroring the JS Web SDK guide. Uses the two-step flow (POST /conversation/register-call → wct_ token → WS ?token=). Includes jitter-buffer rationale, an event-handling table, and a common-errors table. No numpy dependency.
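
The second half of the two-step flow can be sketched as a small guard around the token, since it is single-use and expires after 30 s. `WebcallToken` is a hypothetical helper and the WebSocket base URL is a placeholder supplied by the caller; only the `token` query parameter and the `wct_` prefix come from this PR.

```python
import time
from urllib.parse import urlencode


class WebcallToken:
    """Track the lifetime of a wct_ token so it is used once and in time."""

    def __init__(self, access_token: str, expires_in: float, now: float = None):
        if not access_token.startswith("wct_"):
            raise ValueError("expected a wct_ token")
        self.value = access_token
        issued = time.monotonic() if now is None else now
        self.deadline = issued + expires_in
        self.used = False

    def ws_url(self, ws_base: str, now: float = None) -> str:
        """Return ws_base with ?token=... appended, consuming the token."""
        now = time.monotonic() if now is None else now
        if self.used:
            raise RuntimeError("token is single-use; register a new call")
        if now >= self.deadline:
            raise RuntimeError("token expired; register a new call")
        self.used = True
        return f"{ws_base}?{urlencode({'token': self.value})}"
```

The `now` parameter exists only so the expiry logic can be exercised without sleeping; real callers would omit it.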

Verification

Created a fresh test agent from the sp-medical-centre-receptionist-in template. Ran the Python client from the new docs page end-to-end against prod:

  • POST /conversation/register-call returned 201 with a valid wct_ token, expires_in=30, sample_rate=24000
  • WS ?token=wct_… accepted, session.created fired
  • Agent fired opening turn, ~1 MB of output_audio.delta decoded cleanly, zero buffer underruns with the 250 ms prebuffer
  • transcript.delta events captured with role: "assistant", displayed correctly

fern check: 0 errors.

Closes

…rections + Python guide

Verified end-to-end against a fresh agent created from the
sp-medical-centre-receptionist-in template via POST /agent/from-template.

Spec changes (AsyncAPI agent-ws.yaml):
- Add `transcript.delta` message (cumulative streaming text per role). Wire
  emits dot notation; the JS SDK normalizes to underscore client-side. Raw
  WS clients (Python, mobile) must listen for the dot form.
- Update `role` enum on `transcript` and `transcript.delta` to
  `user | assistant` — pipecat emits `assistant` for the agent.
- Make audio format explicit on both `input_audio_buffer.append` and
  `output_audio.delta` — PCM 16-bit signed little-endian, mono, at the
  sample rate echoed in `session.created`.
- Add jitter-buffer guidance on `output_audio.delta` (200-300 ms pre-buffer).
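
Because `transcript.delta` carries cumulative text rather than increments, a client should replace its in-progress line instead of appending to it. A minimal sketch of that bookkeeping follows; `LiveTranscript` is a hypothetical class, and treating a non-prefix update as the start of a new turn is an assumption, useful mainly because the final `transcript` event rarely fires on prod webcall.

```python
class LiveTranscript:
    """Keep the latest cumulative text per role plus finished lines."""

    def __init__(self):
        self.current = {}  # role -> cumulative text of the in-progress turn
        self.lines = []    # finished (role, text) pairs

    def on_delta(self, role: str, cumulative_text: str) -> str:
        """Store the cumulative text and return only the newly added part."""
        previous = self.current.get(role, "")
        self.current[role] = cumulative_text
        if cumulative_text.startswith(previous):
            return cumulative_text[len(previous):]
        # Text no longer extends the previous value: assume a new turn began.
        return cumulative_text

    def on_final(self, role: str, text: str) -> None:
        """Close the in-progress turn for this role with the final text."""
        self.current.pop(role, None)
        self.lines.append((role, text))
```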

Spec changes (OpenAPI openapi.yaml):
- Add POST /conversation/register-call. Issues a short-lived `wct_` token
  (30 s TTL, single-use) for browser/client WebSocket auth. Live on
  platform main since 2026-06-21 (atoms-platform PR #2870). Body accepts
  `agent_id` (required), `mode` (webcall|chat, default webcall), and
  `variables` (string|number|boolean values for prompt templating).
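
The body constraints listed above could be checked client-side before the request is sent. This is a sketch only: `validate_register_call_body` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
ALLOWED_MODES = ("webcall", "chat")


def validate_register_call_body(body: dict) -> dict:
    """Return a normalized copy of the body, or raise on an invalid field."""
    agent_id = body.get("agent_id")
    if not isinstance(agent_id, str) or not agent_id:
        raise ValueError("agent_id is required")

    mode = body.get("mode", "webcall")  # default per the spec change
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")

    normalized = {"agent_id": agent_id, "mode": mode}
    variables = body.get("variables")
    if variables is not None:
        for name, value in variables.items():
            # bool is a subclass of int, so it is covered by the tuple below.
            if not isinstance(value, (str, int, float, bool)):
                raise TypeError(f"variable {name!r} must be string, number, or boolean")
        normalized["variables"] = dict(variables)
    return normalized
```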

New page:
- Agent WebSocket (Python) at /atoms/developer-guide/integrate/realtime-agent-python.
  Mirror of the JS Web SDK guide for headless Python clients. Self-contained
  working script: register-call → wct_ token → WS, plus mic/speaker callbacks,
  base64 PCM16 handling, transcript.delta + transcript event handling, 250 ms
  jitter buffer. No numpy dependency — sounddevice CFFI buffers support
  direct slice assignment.
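
The "no numpy" point can be illustrated with a speaker callback that fills the output buffer by slice assignment. This sketch is not the script from the new page: `make_speaker_callback` and its `pull` argument are hypothetical, and the callback is written against any writable byte buffer so it can be tried without audio hardware.

```python
def make_speaker_callback(pull):
    """Build an output callback; pull(nbytes) returns up to nbytes of PCM16."""

    def callback(outdata, frames, time_info, status):
        nbytes = len(outdata)
        data = pull(nbytes)[:nbytes]
        # Same-length slice assignment copies bytes without numpy.
        outdata[:len(data)] = data
        # Pad whatever is left with silence so stale audio is never replayed.
        outdata[len(data):] = b"\x00" * (nbytes - len(data))

    return callback
```

The four-argument signature matches what `sounddevice.RawOutputStream` passes to its `callback`, so a stream opened with `dtype="int16"` and `channels=1` should be able to use it directly.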

Closes #50, #53, #125.
@github-actions


…rity note

- New realtime-agent-token.sh: shell helper that fetches a wct_ token from
  ATOMS_API_KEY + ATOMS_AGENT_ID. Useful when testing the JS playground,
  Framer voice components, or one-off HTML files without writing a server.
- Document webcall vs chat mode on the same WebSocket (mode query param).
- Add explicit security warning against hard-coding sk_ keys in
  browser/Framer/no-code embeds — always issue wct_ tokens server-side.