feat(voice): widget voice-as-chat — stream mode through the chat brain#845
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughImplements widget voice-as-chat Architecture v2: a new ChangesWidget Voice-as-Chat v2: WidgetVoiceBridge + Generative Voice UI
Sequence Diagram(s)sequenceDiagram
participant Widget as Widget Client
participant VoiceBot as Daily Voice Bot (Agent)
participant WidgetVoiceBridge
participant TurnCore as turn_core (ChatAgent)
participant Redis as Redis Lock
participant DB as Database
Widget->>VoiceBot: user speech (STT finalized)
VoiceBot->>WidgetVoiceBridge: handle_user_turn(transcript)
WidgetVoiceBridge->>WidgetVoiceBridge: cancel_inflight(), bump generation
WidgetVoiceBridge->>Redis: acquire(session_lock)
Redis-->>WidgetVoiceBridge: acquired
WidgetVoiceBridge->>TurnCore: run_chat_turn(session_id, content)
TurnCore->>DB: load session, template, history, agent_state
TurnCore->>TurnCore: ChatAgent.run_turn(history, agent_state)
TurnCore-->>WidgetVoiceBridge: SSEEvent stream (assistant_token, ui_op, turn_end)
WidgetVoiceBridge->>VoiceBot: TTSSpeakFrame(sentence)
WidgetVoiceBridge->>Widget: RTVI emit("ui-op", op)
WidgetVoiceBridge->>Widget: RTVI emit("turn-end", status)
WidgetVoiceBridge->>Redis: release(session_lock)
Widget->>VoiceBot: RTVI "ui-action" (carousel click)
VoiceBot->>WidgetVoiceBridge: handle_user_turn(coerced_text)
note over WidgetVoiceBridge,TurnCore: same turn flow as above
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
app/ai/voice/agents/breeze_buddy/chat/turn_core.py (1)
366-372: 💤 Low valueConsider sorting
__all__alphabetically.Ruff flags this as unsorted. While not a functional issue, sorting aids readability and reduces merge conflicts.
♻️ Suggested sort
__all__ = [ - "run_chat_turn", - "run_chat_approval_turn", + "build_render_template_vars", + "resolve_llm_configuration", "run_chat_approval_continuation", - "build_render_template_vars", - "resolve_llm_configuration", + "run_chat_approval_turn", + "run_chat_turn", ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/chat/turn_core.py` around lines 366 - 372, The __all__ list in turn_core.py is not sorted alphabetically, which Ruff flags as unsorted. Rearrange the items in the __all__ list (containing "run_chat_turn", "run_chat_approval_turn", "run_chat_approval_continuation", "build_render_template_vars", and "resolve_llm_configuration") in alphabetical order to comply with Ruff's linting requirements and improve readability.app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py (1)
21-21: 💤 Low valueAdd
noqacomment for consistency with the drain exception handler.The exception handler at line 154 intentionally catches a blind
Exceptionfor best-effort behavior (matching line 141's pattern), but is missing the# noqa: BLE001comment for consistency and to silence the static analysis warning.Suggested fix
- except Exception as flip_err: + except Exception as flip_err: # noqa: BLE001Also applies to: 120-159
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py` at line 21, Add the `# noqa: BLE001` comment to the bare exception handler(s) in the end_conversation.py file to suppress the static analysis warning about catching a blind Exception. Locate the exception handler(s) that catch generic Exception (particularly around line 154) and add the noqa comment at the end of the except clause line to maintain consistency with other intentional bare exception catches in the file that already have this annotation for best-effort error handling behavior.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/widget/VOICE_AS_CHAT.md`:
- Around line 259-264: The fenced code block containing the conditional logic
that checks voiceLive and voiceSession is missing a language identifier, which
causes markdownlint MD040 violations. Add the language identifier "typescript"
to the opening fence of this code block by changing the opening triple backticks
from ``` to ```typescript to properly mark the code language.
In `@tests/test_turn_core.py`:
- Around line 186-196: The test function `test_approval_turn_already_decided`
only validates the first event and is missing an assertion for the terminal
`turn_end` event that should close the turn. Add an additional assertion after
the existing assertions to verify that the last event in the events list has
event type equal to "turn_end" to ensure the turn properly completes and prevent
regressions that drop this terminal event.
In `@tests/test_voice_bridge.py`:
- Around line 244-263: The test function test_barge_in_cancel_drops_tail uses a
fixed asyncio.sleep(0.02) call to coordinate timing between when the first
sentence is spoken and when the barge-in should occur, which makes the test
unreliable on slower CI runners. Replace the fixed sleep with a deterministic
synchronization mechanism by introducing an asyncio.Event that signals when the
first sentence has been fully flushed and spoken, then await that event instead
of using the hardcoded sleep duration. This ensures the test waits for the
actual condition to be met rather than guessing at timing.
---
Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/chat/turn_core.py`:
- Around line 366-372: The __all__ list in turn_core.py is not sorted
alphabetically, which Ruff flags as unsorted. Rearrange the items in the __all__
list (containing "run_chat_turn", "run_chat_approval_turn",
"run_chat_approval_continuation", "build_render_template_vars", and
"resolve_llm_configuration") in alphabetical order to comply with Ruff's linting
requirements and improve readability.
In `@app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.py`:
- Line 21: Add the `# noqa: BLE001` comment to the bare exception handler(s) in
the end_conversation.py file to suppress the static analysis warning about
catching a blind Exception. Locate the exception handler(s) that catch generic
Exception (particularly around line 154) and add the noqa comment at the end of
the except clause line to maintain consistency with other intentional bare
exception catches in the file that already have this annotation for best-effort
error handling behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 46690370-eb85-4293-984e-790248b7bae3
📒 Files selected for processing (24)
app/ai/voice/agents/breeze_buddy/agent/__init__.pyapp/ai/voice/agents/breeze_buddy/agent/flow.pyapp/ai/voice/agents/breeze_buddy/agent/pipeline.pyapp/ai/voice/agents/breeze_buddy/chat/approvals.pyapp/ai/voice/agents/breeze_buddy/chat/turn_core.pyapp/ai/voice/agents/breeze_buddy/chat/voice_bridge.pyapp/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.pyapp/ai/voice/agents/breeze_buddy/processors/__init__.pyapp/ai/voice/agents/breeze_buddy/processors/voice_ui_stream.pyapp/ai/voice/agents/breeze_buddy/template/interruption.pyapp/ai/voice/agents/breeze_buddy/template/session_state.pyapp/api/routers/breeze_buddy/chat/handlers.pyapp/api/routers/breeze_buddy/widget/handlers.pyapp/database/accessor/breeze_buddy/chat_session.pyapp/database/accessor/breeze_buddy/lead_call_tracker.pyapp/database/queries/breeze_buddy/chat_session.pyapp/database/queries/breeze_buddy/lead_call_tracker.pyapp/schemas/breeze_buddy/chat.pydocs/DAILY_RTVI_EVENTS.mddocs/widget/VOICE_AS_CHAT.mdtests/test_turn_core.pytests/test_turn_stop_strategy.pytests/test_voice_bridge.pytests/test_voice_ui_stream.py
💤 Files with no reviewable changes (2)
- app/database/accessor/breeze_buddy/chat_session.py
- app/database/queries/breeze_buddy/chat_session.py
| ``` | ||
| if (voiceLive && voiceSession) { | ||
| store.appendUserBubble(action.display ?? action.msg); // optimistic (no server echo) | ||
| store.sendUserAction({type:'to_assistant', msg: action.msg, display: action.display}); | ||
| } else { send(cleaned, bubble); } // chat path unchanged | ||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the fenced code block.
This block is missing a language tag and trips markdownlint MD040.
✅ Suggested lint fix
-```
+```typescript
if (voiceLive && voiceSession) {
store.appendUserBubble(action.display ?? action.msg); // optimistic (no server echo)
store.sendUserAction({type:'to_assistant', msg: action.msg, display: action.display});
} else { send(cleaned, bubble); } // chat path unchanged</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/VOICE_AS_CHAT.md` around lines 259 - 264, The fenced code block
containing the conditional logic that checks voiceLive and voiceSession is
missing a language identifier, which causes markdownlint MD040 violations. Add
the language identifier "typescript" to the opening fence of this code block by
changing the opening triple backticks from ``` to ```typescript to properly mark
the code language.
Source: Linters/SAST tools
| async def test_approval_turn_already_decided(monkeypatch): | ||
| async def _claim(session_id, tool_call_id, approved, reason): | ||
| return ApprovalClaim(outcome="already_decided", winning_status="denied") | ||
|
|
||
| monkeypatch.setattr(tc, "claim_tool_approval", _claim) | ||
| events = await _collect( | ||
| tc.run_chat_approval_turn(session_id="s", tool_call_id="tc1", approved=False) | ||
| ) | ||
| assert events[0].event == "function_approval_resolved" | ||
| assert events[0].data["status"] == "denied" | ||
|
|
There was a problem hiding this comment.
Assert terminal turn_end in the already_decided approval test.
This case currently validates only the first event, so a regression that drops the terminal completion event would still pass.
✅ Suggested test hardening
async def test_approval_turn_already_decided(monkeypatch):
@@
events = await _collect(
tc.run_chat_approval_turn(session_id="s", tool_call_id="tc1", approved=False)
)
- assert events[0].event == "function_approval_resolved"
+ assert [e.event for e in events] == ["function_approval_resolved", "turn_end"]
assert events[0].data["status"] == "denied"
+ assert events[1].data["session_status"] == "ACTIVE"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_turn_core.py` around lines 186 - 196, The test function
`test_approval_turn_already_decided` only validates the first event and is
missing an assertion for the terminal `turn_end` event that should close the
turn. Add an additional assertion after the existing assertions to verify that
the last event in the events list has event type equal to "turn_end" to ensure
the turn properly completes and prevent regressions that drop this terminal
event.
| async def test_barge_in_cancel_drops_tail(monkeypatch): | ||
| gate = asyncio.Event() | ||
|
|
||
| async def _gen(*, session_id, user_content, llm=None, context_placement=None): | ||
| yield SSEEvent( | ||
| "assistant_token", {"delta": "This is the first sentence here. "} | ||
| ) | ||
| await gate.wait() # block until the test releases (it won't — cancelled) | ||
| yield SSEEvent("assistant_token", {"delta": "Tail that must be dropped."}) | ||
| yield SSEEvent("turn_end", {"session_status": "ACTIVE"}) | ||
|
|
||
| monkeypatch.setattr(vb, "run_chat_turn", _gen) | ||
| bridge, task, _ = _make_bridge() | ||
| await bridge.handle_user_turn("hi") | ||
| inflight = bridge._inflight | ||
| assert inflight is not None | ||
| # Let the first sentence flush, then barge in. | ||
| await asyncio.sleep(0.02) | ||
| assert _spoken(task) == ["This is the first sentence here."] | ||
| await bridge.cancel_inflight() |
There was a problem hiding this comment.
Avoid fixed sleep in barge-in test to prevent flakiness.
Using asyncio.sleep(0.02) makes this test timing-sensitive across slower CI runners.
✅ Deterministic synchronization approach
async def test_barge_in_cancel_drops_tail(monkeypatch):
gate = asyncio.Event()
+ first_chunk_seen = asyncio.Event()
@@
async def _gen(*, session_id, user_content, llm=None, context_placement=None):
yield SSEEvent(
"assistant_token", {"delta": "This is the first sentence here. "}
)
+ first_chunk_seen.set()
await gate.wait() # block until the test releases (it won't — cancelled)
yield SSEEvent("assistant_token", {"delta": "Tail that must be dropped."})
yield SSEEvent("turn_end", {"session_status": "ACTIVE"})
@@
- await asyncio.sleep(0.02)
+ await asyncio.wait_for(first_chunk_seen.wait(), timeout=1.0)
assert _spoken(task) == ["This is the first sentence here."]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_voice_bridge.py` around lines 244 - 263, The test function
test_barge_in_cancel_drops_tail uses a fixed asyncio.sleep(0.02) call to
coordinate timing between when the first sentence is spoken and when the
barge-in should occur, which makes the test unreliable on slower CI runners.
Replace the fixed sleep with a deterministic synchronization mechanism by
introducing an asyncio.Event that signals when the first sentence has been fully
flushed and spoken, then await that event instead of using the hardcoded sleep
duration. This ensures the test waits for the actual condition to be met rather
than guessing at timing.
baa991c to
ad7e4b7
Compare
Tara-ag
left a comment
There was a problem hiding this comment.
Review Summary: PR #845 — Widget Voice-as-Chat (Stream Mode)
Overview
This PR re-architects widget voice from a dual-brain design to stream mode (ExecutionMode.DAILY_STREAM), where voice becomes pure audio I/O around the existing chat ChatAgent. Key achievements:
-
New Core Modules:
chat/turn_core.py— Channel-agnosticrun_chat_turnfactored out of HTTP handlerchat/voice_bridge.py—WidgetVoiceBridgeadapting SSE stream to TTS + RTVI eventsprocessors/voice_ui_stream.py— Generative UI over RTVI (deferred but kept)
-
Test Coverage: 4 new test files (1,082 lines) covering turn core, voice bridge, turn stop strategy, and voice UI stream
Security & Safety Analysis
| Area | Status | Notes |
|---|---|---|
| SQL Injection | ✅ Safe | All queries use $1, $2... positional placeholders via run_parameterized_query() |
| Migration Integrity | ✅ Safe | No existing migration files modified; no new migrations needed (uses existing schema) |
| Secrets | ✅ Clean | No hardcoded credentials; secrets flow through KMS-encrypted DB |
| Auth/Authorization | ✅ Consistent | Voice bridge uses same Redis lock key as HTTP paths (chat:session:{id}:lock) |
| Input Validation | ✅ Present | coerce_ui_action_text() validates/truncates ui-action messages |
Key Implementation Highlights
-
SQL Safety Verified:
app/database/queries/breeze_buddy/chat_session.py— All queries use parameterized placeholders- No f-string or %-formatting in SQL construction
- JSONB values passed as bound parameters (
$N::jsonb)
-
Race Condition Prevention:
- Per-session Redis lock (
_SESSION_LOCK_TTL_SECONDS = 180) serializes writers - Lock acquired in
_drive()before DB operations, released infinallywithasyncio.shield() - Generation counter prevents tail frame leakage on barge-in cancel
- Per-session Redis lock (
-
Error Handling:
CancelledErroruncancel pattern (Python 3.11 safe)- Lock release shielded from cancellation
- Turn errors emit RTVI error events rather than crashing the pipeline
Existing Comments Acknowledged
- CodeRabbit's 3 minor suggestions (markdown lint, test assertion, deterministic test sync) are non-blocking and can be addressed in follow-up
Approval
No blocking issues found. The implementation follows project conventions for:
- SQL parameterization (asyncpg
$Nstyle) - Layered architecture (queries → accessor → decoder)
- Redis-backed distributed locking
- Fail-open degradation with proper logging
Approved for merge.
8ad4700 to
b97e865
Compare
Re-architect Buddy Assist widget voice from a dual-brain design (a separate voice FlowManager+LLM kept in sync with the chat ChatAgent) to stream mode (ExecutionMode.DAILY_STREAM: STT in / TTS out, no LLM in the pipeline) driven by the existing chat ChatAgent. Voice becomes pure audio I/O around one brain, so cart_id, content-block history, agent_state, carousels and HITL are all inherited from chat for free. Telephony (Twilio/Plivo/Exotel) is untouched. - chat/turn_core.py: channel-agnostic run_chat_turn (+ approval continuation) factored out of the chat HTTP handler so the voice subprocess drives the same brain. - chat/voice_bridge.py: WidgetVoiceBridge taps on_user_turn_stopped -> run_chat_turn -> adapts the SSE stream into TTSSpeakFrame + RTVI events; holds the per-session Redis lock per turn; barge-in cancels the in-flight turn. - HITL over voice rides the chat approval path: the gated call surfaces an inline, persistent approval card (same kind:'approval' message as chat) and the prompt is spoken audio-only so the card text isn't duplicated. - Forward function-call-started/completed RTVI events so the widget can show a "thinking / executing <tool>" state on the voice orb (the bridge previously consumed function_call_started only for the TTS filler). - Carousel-click injection (ui-action) during voice. Generative-UI *output* over RTVI for non-widget Daily agent-mode is DEFERRED — VoiceUiStreamProcessor is kept but not plugged into the agent pipeline; see docs/widget/VOICE_GENERATIVE_UI_TODO.md. - Delete the dual-brain sync layer (prepare_resume_node, voice drain, ui-blocks-for-voice, agent_state seeding). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b97e865 to
63c1810
Compare
What
Re-architects Buddy Assist widget voice from a dual-brain design (a separate voice FlowManager+LLM kept in sync with the chat
ChatAgent) to stream mode (ExecutionMode.DAILY_STREAM— STT in / TTS out, no LLM in the pipeline) driven by the existing chatChatAgent. Voice becomes pure audio I/O around one brain, socart_id, content-block history,agent_state, carousels, and HITL are all inherited from chat for free.Telephony (Twilio/Plivo/Exotel) is untouched — it never uses
DAILY_STREAMand runs no chat session.Key changes
chat/turn_core.py(new) — channel-agnosticrun_chat_turn(+ approval continuation) factored out of the chat HTTP handler so the voice subprocess drives the same brain.chat/voice_bridge.py(new) —WidgetVoiceBridgetapson_user_turn_stopped→run_chat_turn→ adapts the SSE stream intoTTSSpeakFrame+ RTVI events; holds the per-session Redis lock per turn; barge-in cancels the in-flight turn.kind:'approval'message as chat); the prompt is spoken audio-only so the card text isn't duplicated.function-call-started/function-call-completedRTVI events so the widget can show a "thinking / executing " state on the voice orb.ui-action) during voice.prepare_resume_node, voice drain, ui-blocks-for-voice,agent_stateseeding).Deferred (out of this PR)
VoiceUiStreamProcessor) is deferred. The processor module is kept but not plugged into the pipeline; the wiring was removed fromagent/__init__.py,agent/pipeline.py,agent/flow.py. Widget voice (stream mode) doesn't use it — the chatChatAgentemits/persistsui_opitself. Seedocs/widget/VOICE_GENERATIVE_UI_TODO.mdfor the re-wiring checklist. (coerce_ui_action_text/ click-to-talk is unrelated and stays.)Testing
uv run pyrefly check— 0 errorsuv run black --check/ isort / autoflake — cleanJWT_SECRET_KEY=test JWT_ALGORITHM=HS256 uv run pytest tests/ -q— 553 passed, 1 xfailedPairs with the loom frontend PR (SDK + widget).
🤖 Generated with Claude Code