feat(widget): add push-to-talk STT transcription endpoint #847
amreetkhuntia wants to merge 1 commit into
Add a one-shot speech-to-text path for the chat widget's push-to-talk
button: POST /widget/session/{id}/transcribe accepts a short audio clip
and returns the transcript for the user to review before sending.
- New app/ai/voice/stt/transcribe.py: direct provider-routed REST
transcription (OpenAI Whisper / Deepgram / Sarvam), with Whisper
fallback for streaming-only providers (Soniox/Google) and transient
failures. Provider is chosen from the template's stt_configuration.
- Widget route + handler with ownership, per-merchant IP rate limiting
(transcribe bucket), empty/oversize guards, and CORS preflight.
- WidgetTranscribeResponse schema; WIDGET_STT_MAX_AUDIO_BYTES as a
Redis-backed dynamic config dial.
- Docs: CHAT_MODE.md endpoint entry.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
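The routing and fallback behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the contents of app/ai/voice/stt/transcribe.py: it is synchronous for brevity, the `registry` of provider callables is hypothetical, and routing an unknown provider name to Whisper is an assumption rather than something the PR states.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class TranscriptionError(Exception):
    """Raised when no provider could produce a transcript."""


@dataclass
class Transcription:
    text: str
    provider: str


# Hypothetical provider callable: (audio_bytes, content_type) -> transcript text.
ProviderFn = Callable[[bytes, str], str]

# Per the commit message, these providers are streaming-only, so a one-shot
# clip is routed to Whisper instead.
STREAMING_ONLY = {"soniox", "google"}
FALLBACK = "openai"


def transcribe_audio(
    data: bytes,
    content_type: str,
    provider: str,
    registry: Dict[str, ProviderFn],
) -> Transcription:
    chosen = provider.lower()
    # Assumption: names missing from the registry are treated like
    # streaming-only providers and sent to the fallback.
    if chosen in STREAMING_ONLY or chosen not in registry:
        chosen = FALLBACK

    # Try the chosen provider first, then Whisper if it was not already chosen.
    order = [chosen] if chosen == FALLBACK else [chosen, FALLBACK]
    last_error: Optional[Exception] = None
    for name in order:
        try:
            return Transcription(text=registry[name](data, content_type), provider=name)
        except Exception as exc:  # a real implementation would catch only transient errors
            last_error = exc
    raise TranscriptionError(f"all providers failed for {provider!r}") from last_error
```

Returning the provider that actually produced the text, rather than the one requested, matches the `Transcription(text, provider)` shape shown in the sequence diagram and lets the caller see when the fallback was used.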
Walkthrough
A new push-to-talk transcription endpoint (…
Changes: Widget Push-to-Talk Transcription
Sequence Diagram(s)
sequenceDiagram
participant Client
participant WidgetRouter as Widget Router
participant Handler as transcribe_widget_audio_handler
participant STT as transcribe_audio
participant Provider as OpenAI/Deepgram/Sarvam
Client->>WidgetRouter: POST /widget/session/{id}/transcribe (multipart audio)
WidgetRouter->>Handler: audio, session context
Handler->>Handler: enforce widget active, rate bucket, session ownership, non-ENDED
Handler->>Handler: read bytes, enforce WIDGET_STT_MAX_AUDIO_BYTES
Handler->>STT: transcribe_audio(bytes, content_type, provider)
STT->>Provider: POST audio (REST)
Provider-->>STT: transcript text
STT-->>Handler: Transcription(text, provider)
Handler-->>Client: WidgetTranscribeResponse{text, provider}
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/api/routers/breeze_buddy/widget/handlers.py`:
- Around line 414-419: In the exception handler for TranscriptionError in the
widget transcribe function, the HTTPException being raised needs to preserve the
exception chain context. Add `from e` to the raise statement for the
HTTPException so that the original TranscriptionError exception context is
retained in the traceback, enabling better debugging of the provider failure.
- Around line 384-395: The current implementation calls await audio.read() which
loads the entire audio payload into memory before enforcing the size limit,
creating a potential memory spike for oversized uploads. Instead of reading all
data at once and then checking its length, modify the code to read the audio
data in chunks and enforce the max_bytes limit (obtained from await
WIDGET_STT_MAX_AUDIO_BYTES()) during the read operation itself, raising the
HTTPException with status HTTP_413_REQUEST_ENTITY_TOO_LARGE immediately when the
cumulative chunk size exceeds the configured limit, rather than after all data
has been loaded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e87218ae-a051-49bc-b0ef-aa5a8c7bad21
📒 Files selected for processing (7)
- app/ai/voice/stt/__init__.py
- app/ai/voice/stt/transcribe.py
- app/api/routers/breeze_buddy/widget/__init__.py
- app/api/routers/breeze_buddy/widget/handlers.py
- app/core/config/dynamic.py
- app/schemas/breeze_buddy/chat.py
- docs/CHAT_MODE.md
    data = await audio.read()
    if not data:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Empty audio upload",
        )
    max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
    if len(data) > max_bytes:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"Audio exceeds the {max_bytes}-byte limit",
        )
Enforce upload size while reading to prevent unbounded memory spikes.
await audio.read() loads the entire payload before the size gate, so oversized uploads can consume arbitrary worker memory before returning 413. Read in chunks and fail immediately once the configured limit is crossed.
Proposed fix
-    data = await audio.read()
-    if not data:
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail="Empty audio upload",
-        )
     max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
-    if len(data) > max_bytes:
-        raise HTTPException(
-            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
-            detail=f"Audio exceeds the {max_bytes}-byte limit",
-        )
+    chunk_size = 1024 * 1024
+    buf = bytearray()
+    while chunk := await audio.read(chunk_size):
+        buf.extend(chunk)
+        if len(buf) > max_bytes:
+            raise HTTPException(
+                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+                detail=f"Audio exceeds the {max_bytes}-byte limit",
+            )
+    if not buf:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Empty audio upload",
+        )
+    data = bytes(buf)
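To check the bounded-read idea in isolation, here is a small self-contained sketch. `read_capped`, `PayloadTooLarge`, and `FakeUpload` are hypothetical names introduced for illustration; the only assumption about the real upload object is that it exposes `async read(size)`, as the `audio.read(chunk_size)` call in the proposed fix implies.

```python
import asyncio
import io


class PayloadTooLarge(Exception):
    """Stand-in for the HTTP 413 response raised by the handler."""


async def read_capped(upload, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read `upload` in chunks and stop as soon as the running total exceeds `max_bytes`."""
    buf = bytearray()
    while chunk := await upload.read(chunk_size):
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise PayloadTooLarge(f"Audio exceeds the {max_bytes}-byte limit")
    return bytes(buf)


class FakeUpload:
    """Hypothetical test double exposing the same async read(size) shape."""

    def __init__(self, payload: bytes) -> None:
        self._stream = io.BytesIO(payload)
        self.bytes_read = 0  # lets a test confirm that reading stopped early

    async def read(self, size: int = -1) -> bytes:
        chunk = self._stream.read(size)
        self.bytes_read += len(chunk)
        return chunk


if __name__ == "__main__":
    oversized = FakeUpload(b"a" * 1000)
    try:
        asyncio.run(read_capped(oversized, max_bytes=10, chunk_size=4))
    except PayloadTooLarge as exc:
        print(exc, "after reading", oversized.bytes_read, "bytes")
```

With this shape, at most `max_bytes` plus one chunk is ever buffered, so a smaller chunk size tightens the bound at the cost of more read calls. The empty-upload check from the proposed fix would still be applied to the returned bytes.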
    except TranscriptionError as e:
        logger.warning("widget transcribe failed (session={}): {}", session_id, e)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail="Transcription failed",
        )
Preserve traceback context when remapping to HTTP 502.
Chain the HTTPException with from e so provider failure context is retained.
Proposed fix
     except TranscriptionError as e:
         logger.warning("widget transcribe failed (session={}): {}", session_id, e)
         raise HTTPException(
             status_code=status.HTTP_502_BAD_GATEWAY,
             detail="Transcription failed",
-        )
+        ) from e
🧰 Tools
🪛 Ruff (0.15.17)
[warning] 416-419: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
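The difference B904 points at can be seen in a short standalone sketch. `GatewayError` is a hypothetical stand-in for the `HTTPException` raised by the handler. Note that Python still records the original error in `__context__` without `from e`; what `from e` adds is an explicit `__cause__`, which changes the traceback wording from "During handling of the above exception, another exception occurred" to "The above exception was the direct cause of the following exception".

```python
class TranscriptionError(Exception):
    """Stand-in for the provider failure raised by the STT layer."""


class GatewayError(Exception):
    """Hypothetical stand-in for HTTPException(status_code=502)."""


def remap(chain: bool) -> None:
    try:
        raise TranscriptionError("provider timed out")
    except TranscriptionError as e:
        if chain:
            # Explicit chaining: sets __cause__ and marks the remap as intentional.
            raise GatewayError("Transcription failed") from e
        # Implicit chaining: only __context__ is set, and the traceback reads
        # as if the handler itself had failed.
        raise GatewayError("Transcription failed")


def caught(chain: bool) -> GatewayError:
    """Run remap() and return the exception it raises, for inspection."""
    try:
        remap(chain)
    except GatewayError as exc:
        return exc
    raise AssertionError("remap() should always raise")


if __name__ == "__main__":
    for chain in (True, False):
        exc = caught(chain)
        print(f"chain={chain}: cause={exc.__cause__!r}, context={exc.__context__!r}")
```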
Source: Linters/SAST tools