feat(voice): widget voice-as-chat — stream mode through the chat brain#845

Merged
swaroopvarma1 merged 1 commit into release from feat/widget-voice-as-chat-backend
Jun 18, 2026
Conversation

swaroopvarma1 (Collaborator) commented Jun 18, 2026

What

Re-architects Buddy Assist widget voice from a dual-brain design (a separate voice FlowManager+LLM kept in sync with the chat ChatAgent) to stream mode (ExecutionMode.DAILY_STREAM — STT in / TTS out, no LLM in the pipeline) driven by the existing chat ChatAgent. Voice becomes pure audio I/O around one brain, so cart_id, content-block history, agent_state, carousels, and HITL are all inherited from chat for free.

Telephony (Twilio/Plivo/Exotel) is untouched — it never uses DAILY_STREAM and runs no chat session.

Key changes

  • chat/turn_core.py (new) — channel-agnostic run_chat_turn (+ approval continuation) factored out of the chat HTTP handler so the voice subprocess drives the same brain.
  • chat/voice_bridge.py (new) — WidgetVoiceBridge taps on_user_turn_stopped → run_chat_turn → adapts the SSE stream into TTSSpeakFrame + RTVI events; holds the per-session Redis lock per turn; barge-in cancels the in-flight turn.
  • HITL over voice rides the chat approval path: the gated call surfaces an inline, persistent approval card (same kind:'approval' message as chat); the prompt is spoken audio-only so the card text isn't duplicated.
  • Forward function-call-started/function-call-completed RTVI events so the widget can show a "thinking / executing <tool>" state on the voice orb.
  • Carousel-click injection (ui-action) during voice.
  • Deletes the dual-brain sync layer (prepare_resume_node, voice drain, ui-blocks-for-voice, agent_state seeding).
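
To illustrate the SSE-to-TTS adaptation described above, here is a minimal sketch of sentence chunking over streamed token deltas. It is not the bridge's actual code: `chunk_sentences` and the boundary regex are hypothetical, and it assumes a sentence ends at `.`, `!` or `?` followed by whitespace.

```python
import re
from typing import Iterable, Iterator

# Assumed sentence boundary: terminal punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as token deltas arrive, then flush the tail."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    for sentence in chunk_sentences(["Hello the", "re. How are", " you? Fine"]):
        print(sentence)
```

Each yielded sentence would correspond to one spoken frame, so audio can start before the full reply has been generated.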

Deferred (out of this PR)

  • Generative UI output over RTVI for non-widget Daily agent-mode voice (VoiceUiStreamProcessor) is deferred. The processor module is kept but not plugged into the pipeline; the wiring was removed from agent/__init__.py, agent/pipeline.py, agent/flow.py. Widget voice (stream mode) doesn't use it — the chat ChatAgent emits/persists ui_op itself. See docs/widget/VOICE_GENERATIVE_UI_TODO.md for the re-wiring checklist. (coerce_ui_action_text / click-to-talk is unrelated and stays.)

Testing

  • uv run pyrefly check — 0 errors
  • uv run black --check / isort / autoflake — clean
  • JWT_SECRET_KEY=test JWT_ALGORITHM=HS256 uv run pytest tests/ -q → 553 passed, 1 xfailed
  • Manual E2E on the local harness: voice↔chat parity (cart inheritance, carousels), inline HITL approval (appears mid-call, persists with resolved badge), barge-in stops TTS mid-sentence.

Pairs with the loom frontend PR (SDK + widget).

🤖 Generated with Claude Code

coderabbitai Bot commented Jun 18, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e193bdaa-51d9-40be-8be5-f5b6987b515b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Implements widget voice-as-chat Architecture v2: a new WidgetVoiceBridge connects stream-mode Daily voice sessions to the existing ChatAgent via an extracted turn_core module, replacing the v1 resume-seed/drain path. A VoiceUiStreamProcessor strips <ui_stream> markers from LLM text and emits RTVI ui-op events; claim_tool_approval provides atomic HITL decision claiming; AccumulatingSpeechTimeoutStrategy is removed in favor of the Pipecat 1.1.0 native strategy.
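
The "atomic HITL decision claiming" idea can be sketched as a first-writer-wins claim. This is only an illustration under stated assumptions: `ClaimSketch`, `ApprovalStoreSketch` and the `"claimed"` outcome label are hypothetical, and an in-memory dict stands in for the approvals table (a real store would need a conditional write to stay atomic across processes).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClaimSketch:
    outcome: str         # "claimed" (hypothetical label) or "already_decided"
    winning_status: str  # the decision that is actually stored


class ApprovalStoreSketch:
    """In-memory stand-in for the approvals table (assumption)."""

    def __init__(self) -> None:
        self._decisions: Dict[str, str] = {}

    def claim(self, tool_call_id: str, approved: bool) -> ClaimSketch:
        wanted = "approved" if approved else "denied"
        existing = self._decisions.get(tool_call_id)
        if existing is not None:
            # Someone decided first: report their decision, do not overwrite it.
            return ClaimSketch("already_decided", existing)
        self._decisions[tool_call_id] = wanted
        return ClaimSketch("claimed", wanted)


if __name__ == "__main__":
    store = ApprovalStoreSketch()
    print(store.claim("tc1", approved=False))
    print(store.claim("tc1", approved=True))
```

The second caller learns the winning status instead of silently flipping the decision, which is what lets a voice click and an HTTP click race safely.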

Changes

Widget Voice-as-Chat v2: WidgetVoiceBridge + Generative Voice UI

| Layer | File(s) | Summary |
|---|---|---|
| VoiceUiStreamProcessor and coerce_ui_action_text | `app/ai/voice/agents/breeze_buddy/processors/voice_ui_stream.py`, `app/ai/voice/agents/breeze_buddy/processors/__init__.py` | New VoiceUiStreamProcessor buffers LLM text frames, extracts `<ui_stream>` op lines via UiStreamExtractor, strips markers from prose forwarded to TTS, emits validated RTVI ui-op events via an async callback, and resets dedup state per response. coerce_ui_action_text validates/trims client ui-action msg payloads. Registered in the package `__all__`. |
| Shared chat brain: turn_core + claim_tool_approval | `app/ai/voice/agents/breeze_buddy/chat/turn_core.py`, `app/ai/voice/agents/breeze_buddy/chat/approvals.py` | turn_core.py is a new channel-agnostic module exporting run_chat_turn, run_chat_approval_turn, run_chat_approval_continuation, build_render_template_vars, and resolve_llm_configuration, encapsulating history replay, dangling tool repair, agent state loading, and ChatAgent streaming. approvals.py adds ApprovalClaim dataclass and claim_tool_approval with atomic decide_tool_approval + race-fallback re-read + sibling tracking. |
| WidgetVoiceBridge: SSE→TTS/RTVI adaptation | `app/ai/voice/agents/breeze_buddy/chat/voice_bridge.py` | New WidgetVoiceBridge manages a single in-flight turn task with generation counters for barge-in cancellation, serializes session writes via a Redis lock (skipping with RTVI error when busy), adapts chat SSE events into sentence-chunked TTSSpeakFrame audio and RTVI events (ui-op, HITL approval request/resolution, turn-end, error), speaks a filler phrase before slow tool calls, and gates greeting on first attachment. |
| Agent state, pipeline wiring, and flow config | `app/ai/voice/agents/breeze_buddy/agent/__init__.py`, `app/ai/voice/agents/breeze_buddy/agent/pipeline.py`, `app/ai/voice/agents/breeze_buddy/agent/flow.py`, `app/ai/voice/agents/breeze_buddy/template/interruption.py`, `app/ai/voice/agents/breeze_buddy/template/session_state.py` | Agent adds _voice_ui_allowlist and _voice_bridge state, wires WidgetVoiceBridge to user aggregator events, routes RTVI approval/ui-action messages to bridge or agent path, and adds _resolve_voice_ui_allowlist. build_pipeline gains ui_emit/ui_allowlist params with conditional VoiceUiStreamProcessor splice; create_pipeline_task adds ignored_rtvi_sources. build_flow_config gains ui_allowlist; prepare_resume_node is removed. AccumulatingSpeechTimeoutStrategy is replaced by SpeechTimeoutUserTurnStopStrategy. |
| Chat HTTP handler delegation to turn_core | `app/api/routers/breeze_buddy/chat/handlers.py`, `app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py` | /message handler delegates to run_chat_turn; /approve handler uses claim_tool_approval and delegates resume to run_chat_approval_continuation; inline history replay, ChatAgent construction, and approval row logic are removed. end_conversation replaces the drain pathway with voice_bridge.aclose() then flip_chat_session_to_chat. |
| Widget session management, execution mode, and DB cleanup | `app/api/routers/breeze_buddy/widget/handlers.py`, `app/database/accessor/breeze_buddy/...`, `app/database/queries/breeze_buddy/...`, `app/schemas/breeze_buddy/chat.py` | Widget handlers add _template_voice_enabled gating, set ExecutionMode.DAILY_STREAM for lead creation/reset, simplify voice_connect_handler seed construction, and add guarded flip_chat_session_to_chat in voice_end_handler. reset_widget_voice_lead gains execution_mode param. drain_voice_into_chat_session accessor and query are deleted. voice_enabled: bool added to CreateWidgetSessionResponse and WidgetSessionStateResponse. |
| Tests and documentation | `tests/test_turn_core.py`, `tests/test_voice_bridge.py`, `tests/test_voice_ui_stream.py`, `tests/test_turn_stop_strategy.py`, `docs/DAILY_RTVI_EVENTS.md`, `docs/widget/VOICE_AS_CHAT.md` | Tests for turn_core (missing session/template, supersede ordering, approval outcomes), VoiceUiStreamProcessor (marker stripping, allowlist gating, known-id reset per response), WidgetVoiceBridge (sentence aggregation, filler, barge-in, lock semantics, HITL cards), and SpeechTimeoutUserTurnStopStrategy (regression confirming AccumulatingSpeechTimeoutStrategy removal). DAILY_RTVI_EVENTS.md documents ui-op/ui-action events; VOICE_AS_CHAT.md defines Architecture v2. |
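
The "single in-flight turn task with generation counters" pattern might look roughly like the sketch below. It borrows the names `handle_user_turn`, `cancel_inflight` and `_inflight` from this PR, but `TurnRunnerSketch`, its fields and the `demo` helper are hypothetical, and a plain list stands in for the TTS output.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class TurnRunnerSketch:
    def __init__(self) -> None:
        self.generation = 0
        self.spoken: List[str] = []
        self.first_spoken = asyncio.Event()
        self._inflight: Optional[asyncio.Task] = None

    async def cancel_inflight(self) -> None:
        # Bumping the counter marks everything from older turns as stale.
        self.generation += 1
        task, self._inflight = self._inflight, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass

    async def handle_user_turn(self, sentences: AsyncIterator[str]) -> None:
        await self.cancel_inflight()  # barge-in: the newest turn wins
        self._inflight = asyncio.create_task(
            self._drive(self.generation, sentences)
        )

    async def _drive(self, my_generation: int, sentences: AsyncIterator[str]) -> None:
        async for sentence in sentences:
            if my_generation != self.generation:
                return  # a newer turn started, so drop the tail
            self.spoken.append(sentence)
            self.first_spoken.set()


async def demo() -> List[str]:
    runner = TurnRunnerSketch()
    gate = asyncio.Event()  # never set, so the first turn stalls mid-reply

    async def interrupted_turn() -> AsyncIterator[str]:
        yield "First sentence."
        await gate.wait()
        yield "Tail that must be dropped."

    async def next_turn() -> AsyncIterator[str]:
        yield "Second turn."

    await runner.handle_user_turn(interrupted_turn())
    await runner.first_spoken.wait()
    await runner.handle_user_turn(next_turn())
    await runner._inflight
    return runner.spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Task cancellation alone stops most stale output; the generation check additionally covers frames that are produced between the cancel request and the task actually unwinding.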

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Widget as Widget Client
  participant VoiceBot as Daily Voice Bot (Agent)
  participant WidgetVoiceBridge
  participant TurnCore as turn_core (ChatAgent)
  participant Redis as Redis Lock
  participant DB as Database

  Widget->>VoiceBot: user speech (STT finalized)
  VoiceBot->>WidgetVoiceBridge: handle_user_turn(transcript)
  WidgetVoiceBridge->>WidgetVoiceBridge: cancel_inflight(), bump generation
  WidgetVoiceBridge->>Redis: acquire(session_lock)
  Redis-->>WidgetVoiceBridge: acquired
  WidgetVoiceBridge->>TurnCore: run_chat_turn(session_id, content)
  TurnCore->>DB: load session, template, history, agent_state
  TurnCore->>TurnCore: ChatAgent.run_turn(history, agent_state)
  TurnCore-->>WidgetVoiceBridge: SSEEvent stream (assistant_token, ui_op, turn_end)
  WidgetVoiceBridge->>VoiceBot: TTSSpeakFrame(sentence)
  WidgetVoiceBridge->>Widget: RTVI emit("ui-op", op)
  WidgetVoiceBridge->>Widget: RTVI emit("turn-end", status)
  WidgetVoiceBridge->>Redis: release(session_lock)

  Widget->>VoiceBot: RTVI "ui-action" (carousel click)
  VoiceBot->>WidgetVoiceBridge: handle_user_turn(coerced_text)
  note over WidgetVoiceBridge,TurnCore: same turn flow as above
```
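
The acquire/release steps in the diagram follow the usual "set if absent, with expiry" lock shape (what Redis offers through `SET key value NX EX ttl`). The sketch below is an in-memory stand-in so it runs without a server; `SessionLockSketch`, `lock_key` and the owner tokens are hypothetical, while the key format and the 180-second TTL are taken from the review further down this thread.

```python
import time
from typing import Callable, Dict, Tuple


def lock_key(session_id: str) -> str:
    # Key format quoted in the review: chat:session:{id}:lock
    return f"chat:session:{session_id}:lock"


class SessionLockSketch:
    """In-memory stand-in for a set-if-absent lock with a TTL (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._held: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expiry)
        self._clock = clock

    def acquire(self, key: str, token: str, ttl_seconds: float) -> bool:
        holder = self._held.get(key)
        if holder is not None and holder[1] > self._clock():
            return False  # busy: caller should skip the turn, not wait
        self._held[key] = (token, self._clock() + ttl_seconds)
        return True

    def release(self, key: str, token: str) -> bool:
        holder = self._held.get(key)
        if holder is None or holder[0] != token:
            return False  # never release a lock owned by someone else
        del self._held[key]
        return True


if __name__ == "__main__":
    lock = SessionLockSketch()
    key = lock_key("s1")
    print(lock.acquire(key, "voice", 180))  # True
    print(lock.acquire(key, "http", 180))   # False while the voice turn holds it
    print(lock.release(key, "voice"))       # True
```

Because both the voice bridge and the HTTP handlers would use the same key, whichever channel gets there first owns the session until its turn ends or the TTL lapses.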

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • juspay/clairvoyance#778: Introduced prepare_resume_node and widget-mode resume/seed logic in agent/__init__.py and flow.py that this PR removes in favor of the WidgetVoiceBridge stream path.
  • juspay/clairvoyance#824: Established ApprovalManager wiring and on_client_message decision validation that this PR extends with the _voice_bridge.handle_approval_decision(...) branch for stream widget voice.
  • juspay/clairvoyance#835: Added agent state persistence and rehydration during CHAT↔VOICE resume that this PR supersedes by removing the resume seed/carry-forward pattern entirely.

Suggested reviewers

  • Tara-ag

Poem

🐰 Hop, hop! The drain is gone, no seeds to stow,
The bridge now streams the chat brain's SSE flow.
ui_stream markers stripped, so TTS won't say JSON—
The filler says "Just a sec!" and then the turn rolls on.
One chat brain rules them all, from widget voice to chat,
Generation counters guard the barge-in, just like that! 🎙️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.40% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main architectural change: widget voice now operates through the chat brain in stream mode, which is the primary objective of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
app/ai/voice/agents/breeze_buddy/chat/turn_core.py (1)

366-372: 💤 Low value

Consider sorting __all__ alphabetically.

Ruff flags this as unsorted. While not a functional issue, sorting aids readability and reduces merge conflicts.

♻️ Suggested sort
 __all__ = [
-    "run_chat_turn",
-    "run_chat_approval_turn",
+    "build_render_template_vars",
+    "resolve_llm_configuration",
     "run_chat_approval_continuation",
-    "build_render_template_vars",
-    "resolve_llm_configuration",
+    "run_chat_approval_turn",
+    "run_chat_turn",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/chat/turn_core.py` around lines 366 - 372,
The __all__ list in turn_core.py is not sorted alphabetically, which Ruff flags
as unsorted. Rearrange the items in the __all__ list (containing
"run_chat_turn", "run_chat_approval_turn", "run_chat_approval_continuation",
"build_render_template_vars", and "resolve_llm_configuration") in alphabetical
order to comply with Ruff's linting requirements and improve readability.
app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py (1)

21-21: 💤 Low value

Add noqa comment for consistency with the drain exception handler.

The exception handler at line 154 intentionally catches a blind Exception for best-effort behavior (matching line 141's pattern), but is missing the # noqa: BLE001 comment for consistency and to silence the static analysis warning.

Suggested fix
-            except Exception as flip_err:
+            except Exception as flip_err:  # noqa: BLE001

Also applies to: 120-159

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py` at
line 21, Add the `# noqa: BLE001` comment to the bare exception handler(s) in
the end_conversation.py file to suppress the static analysis warning about
catching a blind Exception. Locate the exception handler(s) that catch generic
Exception (particularly around line 154) and add the noqa comment at the end of
the except clause line to maintain consistency with other intentional bare
exception catches in the file that already have this annotation for best-effort
error handling behavior.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/widget/VOICE_AS_CHAT.md`:
- Around line 259-264: The fenced code block containing the conditional logic
that checks voiceLive and voiceSession is missing a language identifier, which
causes markdownlint MD040 violations. Add the language identifier "typescript"
to the opening fence of this code block by changing the opening triple backticks
from ``` to ```typescript to properly mark the code language.

In `@tests/test_turn_core.py`:
- Around line 186-196: The test function `test_approval_turn_already_decided`
only validates the first event and is missing an assertion for the terminal
`turn_end` event that should close the turn. Add an additional assertion after
the existing assertions to verify that the last event in the events list has
event type equal to "turn_end" to ensure the turn properly completes and prevent
regressions that drop this terminal event.

In `@tests/test_voice_bridge.py`:
- Around line 244-263: The test function test_barge_in_cancel_drops_tail uses a
fixed asyncio.sleep(0.02) call to coordinate timing between when the first
sentence is spoken and when the barge-in should occur, which makes the test
unreliable on slower CI runners. Replace the fixed sleep with a deterministic
synchronization mechanism by introducing an asyncio.Event that signals when the
first sentence has been fully flushed and spoken, then await that event instead
of using the hardcoded sleep duration. This ensures the test waits for the
actual condition to be met rather than guessing at timing.

---

Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/chat/turn_core.py`:
- Around line 366-372: The __all__ list in turn_core.py is not sorted
alphabetically, which Ruff flags as unsorted. Rearrange the items in the __all__
list (containing "run_chat_turn", "run_chat_approval_turn",
"run_chat_approval_continuation", "build_render_template_vars", and
"resolve_llm_configuration") in alphabetical order to comply with Ruff's linting
requirements and improve readability.

In `@app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py`:
- Line 21: Add the `# noqa: BLE001` comment to the bare exception handler(s) in
the end_conversation.py file to suppress the static analysis warning about
catching a blind Exception. Locate the exception handler(s) that catch generic
Exception (particularly around line 154) and add the noqa comment at the end of
the except clause line to maintain consistency with other intentional bare
exception catches in the file that already have this annotation for best-effort
error handling behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 46690370-eb85-4293-984e-790248b7bae3

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7f891 and baa991c.

📒 Files selected for processing (24)
  • app/ai/voice/agents/breeze_buddy/agent/__init__.py
  • app/ai/voice/agents/breeze_buddy/agent/flow.py
  • app/ai/voice/agents/breeze_buddy/agent/pipeline.py
  • app/ai/voice/agents/breeze_buddy/chat/approvals.py
  • app/ai/voice/agents/breeze_buddy/chat/turn_core.py
  • app/ai/voice/agents/breeze_buddy/chat/voice_bridge.py
  • app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py
  • app/ai/voice/agents/breeze_buddy/processors/__init__.py
  • app/ai/voice/agents/breeze_buddy/processors/voice_ui_stream.py
  • app/ai/voice/agents/breeze_buddy/template/interruption.py
  • app/ai/voice/agents/breeze_buddy/template/session_state.py
  • app/api/routers/breeze_buddy/chat/handlers.py
  • app/api/routers/breeze_buddy/widget/handlers.py
  • app/database/accessor/breeze_buddy/chat_session.py
  • app/database/accessor/breeze_buddy/lead_call_tracker.py
  • app/database/queries/breeze_buddy/chat_session.py
  • app/database/queries/breeze_buddy/lead_call_tracker.py
  • app/schemas/breeze_buddy/chat.py
  • docs/DAILY_RTVI_EVENTS.md
  • docs/widget/VOICE_AS_CHAT.md
  • tests/test_turn_core.py
  • tests/test_turn_stop_strategy.py
  • tests/test_voice_bridge.py
  • tests/test_voice_ui_stream.py
💤 Files with no reviewable changes (2)
  • app/database/accessor/breeze_buddy/chat_session.py
  • app/database/queries/breeze_buddy/chat_session.py

Comment on lines +259 to +264
```
if (voiceLive && voiceSession) {
store.appendUserBubble(action.display ?? action.msg); // optimistic (no server echo)
store.sendUserAction({type:'to_assistant', msg: action.msg, display: action.display});
} else { send(cleaned, bubble); } // chat path unchanged
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

This block is missing a language tag and trips markdownlint MD040.

✅ Suggested lint fix
-```
+```typescript
 if (voiceLive && voiceSession) {
   store.appendUserBubble(action.display ?? action.msg);   // optimistic (no server echo)
   store.sendUserAction({type:'to_assistant', msg: action.msg, display: action.display});
 } else { send(cleaned, bubble); }                           // chat path unchanged

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 259-259: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/widget/VOICE_AS_CHAT.md` around lines 259 - 264, The fenced code block
containing the conditional logic that checks voiceLive and voiceSession is
missing a language identifier, which causes markdownlint MD040 violations. Add
the language identifier "typescript" to the opening fence of this code block by
changing the opening triple backticks from ``` to ```typescript to properly mark
the code language.

Source: Linters/SAST tools

Comment thread tests/test_turn_core.py
Comment on lines +186 to +196
async def test_approval_turn_already_decided(monkeypatch):
    async def _claim(session_id, tool_call_id, approved, reason):
        return ApprovalClaim(outcome="already_decided", winning_status="denied")

    monkeypatch.setattr(tc, "claim_tool_approval", _claim)
    events = await _collect(
        tc.run_chat_approval_turn(session_id="s", tool_call_id="tc1", approved=False)
    )
    assert events[0].event == "function_approval_resolved"
    assert events[0].data["status"] == "denied"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert terminal turn_end in the already_decided approval test.

This case currently validates only the first event, so a regression that drops the terminal completion event would still pass.

✅ Suggested test hardening
 async def test_approval_turn_already_decided(monkeypatch):
@@
     events = await _collect(
         tc.run_chat_approval_turn(session_id="s", tool_call_id="tc1", approved=False)
     )
-    assert events[0].event == "function_approval_resolved"
+    assert [e.event for e in events] == ["function_approval_resolved", "turn_end"]
     assert events[0].data["status"] == "denied"
+    assert events[1].data["session_status"] == "ACTIVE"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_turn_core.py` around lines 186 - 196, The test function
`test_approval_turn_already_decided` only validates the first event and is
missing an assertion for the terminal `turn_end` event that should close the
turn. Add an additional assertion after the existing assertions to verify that
the last event in the events list has event type equal to "turn_end" to ensure
the turn properly completes and prevent regressions that drop this terminal
event.

Comment on lines +244 to +263
async def test_barge_in_cancel_drops_tail(monkeypatch):
    gate = asyncio.Event()

    async def _gen(*, session_id, user_content, llm=None, context_placement=None):
        yield SSEEvent(
            "assistant_token", {"delta": "This is the first sentence here. "}
        )
        await gate.wait()  # block until the test releases (it won't — cancelled)
        yield SSEEvent("assistant_token", {"delta": "Tail that must be dropped."})
        yield SSEEvent("turn_end", {"session_status": "ACTIVE"})

    monkeypatch.setattr(vb, "run_chat_turn", _gen)
    bridge, task, _ = _make_bridge()
    await bridge.handle_user_turn("hi")
    inflight = bridge._inflight
    assert inflight is not None
    # Let the first sentence flush, then barge in.
    await asyncio.sleep(0.02)
    assert _spoken(task) == ["This is the first sentence here."]
    await bridge.cancel_inflight()

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid fixed sleep in barge-in test to prevent flakiness.

Using asyncio.sleep(0.02) makes this test timing-sensitive across slower CI runners.

✅ Deterministic synchronization approach
 async def test_barge_in_cancel_drops_tail(monkeypatch):
     gate = asyncio.Event()
+    first_chunk_seen = asyncio.Event()
@@
     async def _gen(*, session_id, user_content, llm=None, context_placement=None):
         yield SSEEvent(
             "assistant_token", {"delta": "This is the first sentence here. "}
         )
+        first_chunk_seen.set()
         await gate.wait()  # block until the test releases (it won't — cancelled)
         yield SSEEvent("assistant_token", {"delta": "Tail that must be dropped."})
         yield SSEEvent("turn_end", {"session_status": "ACTIVE"})
@@
-    await asyncio.sleep(0.02)
+    await asyncio.wait_for(first_chunk_seen.wait(), timeout=1.0)
     assert _spoken(task) == ["This is the first sentence here."]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_voice_bridge.py` around lines 244 - 263, The test function
test_barge_in_cancel_drops_tail uses a fixed asyncio.sleep(0.02) call to
coordinate timing between when the first sentence is spoken and when the
barge-in should occur, which makes the test unreliable on slower CI runners.
Replace the fixed sleep with a deterministic synchronization mechanism by
introducing an asyncio.Event that signals when the first sentence has been fully
flushed and spoken, then await that event instead of using the hardcoded sleep
duration. This ensures the test waits for the actual condition to be met rather
than guessing at timing.

@swaroopvarma1 swaroopvarma1 force-pushed the feat/widget-voice-as-chat-backend branch from baa991c to ad7e4b7 Compare June 18, 2026 12:19

Tara-ag (Collaborator) left a comment

Review Summary: PR #845 — Widget Voice-as-Chat (Stream Mode)

Overview

This PR re-architects widget voice from a dual-brain design to stream mode (ExecutionMode.DAILY_STREAM), where voice becomes pure audio I/O around the existing chat ChatAgent. Key achievements:

  • New Core Modules:

    • chat/turn_core.py — Channel-agnostic run_chat_turn factored out of HTTP handler
    • chat/voice_bridge.py — WidgetVoiceBridge adapting SSE stream to TTS + RTVI events
    • processors/voice_ui_stream.py — Generative UI over RTVI (deferred but kept)
  • Test Coverage: 4 new test files (1,082 lines) covering turn core, voice bridge, turn stop strategy, and voice UI stream

Security & Safety Analysis

| Area | Status | Notes |
|---|---|---|
| SQL Injection | ✅ Safe | All queries use $1, $2... positional placeholders via run_parameterized_query() |
| Migration Integrity | ✅ Safe | No existing migration files modified; no new migrations needed (uses existing schema) |
| Secrets | ✅ Clean | No hardcoded credentials; secrets flow through KMS-encrypted DB |
| Auth/Authorization | ✅ Consistent | Voice bridge uses same Redis lock key as HTTP paths (chat:session:{id}:lock) |
| Input Validation | ✅ Present | coerce_ui_action_text() validates/truncates ui-action messages |

Key Implementation Highlights

  1. SQL Safety Verified:

    • app/database/queries/breeze_buddy/chat_session.py — All queries use parameterized placeholders
    • No f-string or %-formatting in SQL construction
    • JSONB values passed as bound parameters ($N::jsonb)
  2. Race Condition Prevention:

    • Per-session Redis lock (_SESSION_LOCK_TTL_SECONDS = 180) serializes writers
    • Lock acquired in _drive() before DB operations, released in finally with asyncio.shield()
    • Generation counter prevents tail frame leakage on barge-in cancel
  3. Error Handling:

    • CancelledError uncancel pattern (Python 3.11 safe)
    • Lock release shielded from cancellation
    • Turn errors emit RTVI error events rather than crashing the pipeline
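
As a rough illustration of "lock release shielded from cancellation", the sketch below runs cleanup under `asyncio.shield` inside a `finally` block. `run_turn_with_lock`, `demo` and the callables passed in are hypothetical; the real `_drive()` is not shown in this thread, so treat this as the general pattern rather than its implementation.

```python
import asyncio
from typing import Awaitable, Callable, Set

# Strong references so a shielded cleanup task is not garbage-collected early.
_cleanup_tasks: Set[asyncio.Future] = set()


async def run_turn_with_lock(
    work: Callable[[], Awaitable[None]],
    release: Callable[[], Awaitable[None]],
) -> None:
    try:
        await work()
    finally:
        cleanup = asyncio.ensure_future(release())
        _cleanup_tasks.add(cleanup)
        cleanup.add_done_callback(_cleanup_tasks.discard)
        # If this task is cancelled again here, release() still runs to the end.
        await asyncio.shield(cleanup)


async def demo() -> bool:
    released = asyncio.Event()
    cleanup_started = asyncio.Event()

    async def work() -> None:
        await asyncio.Event().wait()  # a turn that never finishes on its own

    async def release() -> None:
        cleanup_started.set()
        await asyncio.sleep(0.01)  # stands in for a network round-trip
        released.set()

    task = asyncio.create_task(run_turn_with_lock(work, release))
    await asyncio.sleep(0)         # let the turn start
    task.cancel()                  # barge-in cancels the turn
    await cleanup_started.wait()
    task.cancel()                  # a second cancel lands during cleanup
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.wait_for(released.wait(), timeout=1.0)
    return released.is_set()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the shield, the second cancellation would interrupt `release()` halfway, and the lock would stay held until its TTL expired.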

Existing Comments Acknowledged

  • CodeRabbit's 3 minor suggestions (markdown lint, test assertion, deterministic test sync) are non-blocking and can be addressed in follow-up

Approval

No blocking issues found. The implementation follows project conventions for:

  • SQL parameterization (asyncpg $N style)
  • Layered architecture (queries → accessor → decoder)
  • Redis-backed distributed locking
  • Fail-open degradation with proper logging

Approved for merge.

@swaroopvarma1 swaroopvarma1 force-pushed the feat/widget-voice-as-chat-backend branch 3 times, most recently from 8ad4700 to b97e865 Compare June 18, 2026 15:26
Re-architect Buddy Assist widget voice from a dual-brain design (a separate
voice FlowManager+LLM kept in sync with the chat ChatAgent) to stream mode
(ExecutionMode.DAILY_STREAM: STT in / TTS out, no LLM in the pipeline) driven
by the existing chat ChatAgent. Voice becomes pure audio I/O around one brain,
so cart_id, content-block history, agent_state, carousels and HITL are all
inherited from chat for free. Telephony (Twilio/Plivo/Exotel) is untouched.

- chat/turn_core.py: channel-agnostic run_chat_turn (+ approval continuation)
  factored out of the chat HTTP handler so the voice subprocess drives the
  same brain.
- chat/voice_bridge.py: WidgetVoiceBridge taps on_user_turn_stopped ->
  run_chat_turn -> adapts the SSE stream into TTSSpeakFrame + RTVI events;
  holds the per-session Redis lock per turn; barge-in cancels the in-flight
  turn.
- HITL over voice rides the chat approval path: the gated call surfaces an
  inline, persistent approval card (same kind:'approval' message as chat) and
  the prompt is spoken audio-only so the card text isn't duplicated.
- Forward function-call-started/completed RTVI events so the widget can show a
  "thinking / executing <tool>" state on the voice orb (the bridge previously
  consumed function_call_started only for the TTS filler).
- Carousel-click injection (ui-action) during voice. Generative-UI *output*
  over RTVI for non-widget Daily agent-mode is DEFERRED — VoiceUiStreamProcessor
  is kept but not plugged into the agent pipeline; see
  docs/widget/VOICE_GENERATIVE_UI_TODO.md.
- Delete the dual-brain sync layer (prepare_resume_node, voice drain,
  ui-blocks-for-voice, agent_state seeding).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@swaroopvarma1 swaroopvarma1 force-pushed the feat/widget-voice-as-chat-backend branch from b97e865 to 63c1810 Compare June 18, 2026 15:29
@swaroopvarma1 swaroopvarma1 merged commit 3990db9 into release Jun 18, 2026
7 of 8 checks passed