feat(widget): add push-to-talk STT transcription endpoint #847

Open
amreetkhuntia wants to merge 1 commit into juspay:release from amreetkhuntia:feat/widget-stt-push-to-talk

amreetkhuntia (Contributor) commented Jun 19, 2026

Add a one-shot speech-to-text path for the chat widget's push-to-talk button: POST /widget/session/{id}/transcribe accepts a short audio clip and returns the transcript for the user to review before sending.

  • New app/ai/voice/stt/transcribe.py: direct provider-routed REST transcription (OpenAI Whisper / Deepgram / Sarvam), with Whisper fallback for streaming-only providers (Soniox/Google) and transient failures. Provider is chosen from the template's stt_configuration.
  • Widget route + handler with ownership, per-merchant IP rate limiting (transcribe bucket), empty/oversize guards, and CORS preflight.
  • WidgetTranscribeResponse schema; WIDGET_STT_MAX_AUDIO_BYTES as a Redis-backed dynamic config dial.
  • Docs: CHAT_MODE.md endpoint entry.
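
For reviewers who want a mental model of the routing described above, here is a minimal sketch, not the code in this PR. It assumes a registry of REST-capable provider callables keyed by name; `transcribe_with_fallback`, `rest_providers`, and the `fallback` parameter are hypothetical names. It also falls back on any exception, whereas the PR describes fallback only for streaming-only providers and transient failures.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional


class TranscriptionError(Exception):
    """Raised when no provider could produce a transcript."""


@dataclass
class Transcription:
    text: str
    provider: str


# Hypothetical provider signature: (audio bytes, content type) -> transcript.
ProviderFn = Callable[[bytes, str], Awaitable[str]]


async def transcribe_with_fallback(
    data: bytes,
    content_type: str,
    provider: str,
    rest_providers: Dict[str, ProviderFn],
    fallback: str = "openai",
) -> Transcription:
    # Streaming-only providers are simply absent from the REST registry,
    # so they skip straight to the fallback.
    order: List[str] = [provider] if provider in rest_providers else []
    if fallback not in order:
        order.append(fallback)

    last_error: Optional[Exception] = None
    for name in order:
        try:
            text = await rest_providers[name](data, content_type)
            return Transcription(text=text, provider=name)
        except Exception as exc:  # simplification: any failure triggers fallback
            last_error = exc
    raise TranscriptionError(f"all providers failed: {last_error}") from last_error
```

The returned `provider` field names whichever backend actually produced the text, which mirrors the `provider` field of the response schema listed above.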

Summary by CodeRabbit

  • New Features

    • Added push-to-talk audio transcription capability to widgets with multi-provider support and intelligent fallback handling.
    • Introduced configurable audio upload size limits for transcription requests.
  • Documentation

    • Updated documentation with new transcription endpoint details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 19, 2026 06:34
coderabbitai Bot commented Jun 19, 2026

Walkthrough

A new push-to-talk transcription endpoint (POST /widget/session/{id}/transcribe) is added to the widget API. A new transcribe.py STT module implements multi-provider routing (OpenAI, Deepgram, Sarvam) with Whisper fallback. A WidgetTranscribeResponse schema, a Redis-backed WIDGET_STT_MAX_AUDIO_BYTES dynamic config, a handler with rate limiting and session validation, and router wiring complete the feature.

Changes

Widget Push-to-Talk Transcription

| Layer | File(s) | Summary |
|---|---|---|
| STT transcription module (providers, language, orchestration) | `app/ai/voice/stt/transcribe.py`, `app/ai/voice/stt/__init__.py` | New `transcribe.py` defines constants, language-code normalization helpers (`_short_lang`, `_sarvam_lang`), three async provider helpers (`_openai`, `_deepgram`, `_sarvam`), and the `transcribe_audio` orchestrator with provider routing and Whisper fallback. `__init__.py` re-exports `Transcription`, `TranscriptionError`, and `transcribe_audio`. |
| Response schema and dynamic audio size config | `app/schemas/breeze_buddy/chat.py`, `app/core/config/dynamic.py` | `WidgetTranscribeResponse(text, provider)` Pydantic model added and exported. `WIDGET_STT_MAX_AUDIO_BYTES()` async config function reads a Redis key (default 10 MiB). |
| Widget handler: validation, rate limit, transcription call | `app/api/routers/breeze_buddy/widget/handlers.py` | `transcribe_widget_audio_handler` enforces active widget config, applies a dedicated transcribe rate bucket, validates session ownership and non-ENDED status, enforces audio size via `WIDGET_STT_MAX_AUDIO_BYTES()`, selects provider from template config, calls `transcribe_audio`, maps `TranscriptionError` to HTTP 502, and returns `WidgetTranscribeResponse`. |
| Widget router wiring and docs | `app/api/routers/breeze_buddy/widget/__init__.py`, `docs/CHAT_MODE.md` | Registers OPTIONS and POST `/session/{session_id}/transcribe` routes on the widget router. Documentation table updated with the new route. |
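
As an illustration of the "Redis key with a 10 MiB default" behavior summarized above, the following sketch shows how a raw config value might be turned into a byte limit. It is an assumption about shape, not the code in `dynamic.py`: `parse_max_audio_bytes` is a hypothetical helper, and treating missing, malformed, or non-positive values as "use the default" is a design choice made here for the example.

```python
from typing import Optional, Union

# 10 MiB, the default stated in the summary above.
DEFAULT_MAX_AUDIO_BYTES = 10 * 1024 * 1024


def parse_max_audio_bytes(raw: Optional[Union[bytes, str]]) -> int:
    """Turn a raw config value (as a key-value store might return it) into a limit."""
    if raw is None:
        # Key not set: fall back to the default.
        return DEFAULT_MAX_AUDIO_BYTES
    try:
        if isinstance(raw, bytes):
            raw = raw.decode()
        value = int(raw)
    except (TypeError, ValueError):
        # Malformed value: ignore it rather than fail the request path.
        return DEFAULT_MAX_AUDIO_BYTES
    # A zero or negative limit would reject every upload, so treat it as unset.
    return value if value > 0 else DEFAULT_MAX_AUDIO_BYTES
```

Falling back silently keeps a bad config write from taking the endpoint down, at the cost of hiding the mistake; logging the rejected value would be a reasonable addition.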

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant WidgetRouter as Widget Router
    participant Handler as transcribe_widget_audio_handler
    participant STT as transcribe_audio
    participant Provider as OpenAI/Deepgram/Sarvam

    Client->>WidgetRouter: POST /widget/session/{id}/transcribe (multipart audio)
    WidgetRouter->>Handler: audio, session context
    Handler->>Handler: enforce widget active, rate bucket, session ownership, non-ENDED
    Handler->>Handler: read bytes, enforce WIDGET_STT_MAX_AUDIO_BYTES
    Handler->>STT: transcribe_audio(bytes, content_type, provider)
    STT->>Provider: POST audio (REST)
    Provider-->>STT: transcript text
    STT-->>Handler: Transcription(text, provider)
    Handler-->>Client: WidgetTranscribeResponse{text, provider}

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Hop hop, the rabbit listens close,
Audio bytes arrive by post!
Three providers vie for the crown,
Whisper catches if one falls down.
Rate buckets guard the session gate,
Push-to-talk is finally great! 🎙️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main change: adding a push-to-talk STT transcription endpoint to the widget, which is the primary objective of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/api/routers/breeze_buddy/widget/handlers.py`:
- Around line 414-419: In the exception handler for TranscriptionError in the
widget transcribe function, the HTTPException being raised needs to preserve the
exception chain context. Add `from e` to the raise statement for the
HTTPException so that the original TranscriptionError exception context is
retained in the traceback, enabling better debugging of the provider failure.
- Around line 384-395: The current implementation calls await audio.read() which
loads the entire audio payload into memory before enforcing the size limit,
creating a potential memory spike for oversized uploads. Instead of reading all
data at once and then checking its length, modify the code to read the audio
data in chunks and enforce the max_bytes limit (obtained from await
WIDGET_STT_MAX_AUDIO_BYTES()) during the read operation itself, raising the
HTTPException with status HTTP_413_REQUEST_ENTITY_TOO_LARGE immediately when the
cumulative chunk size exceeds the configured limit, rather than after all data
has been loaded.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e87218ae-a051-49bc-b0ef-aa5a8c7bad21

📥 Commits

Reviewing files that changed from the base of the PR and between 39fdbef and 99dc54d.

📒 Files selected for processing (7)
  • app/ai/voice/stt/__init__.py
  • app/ai/voice/stt/transcribe.py
  • app/api/routers/breeze_buddy/widget/__init__.py
  • app/api/routers/breeze_buddy/widget/handlers.py
  • app/core/config/dynamic.py
  • app/schemas/breeze_buddy/chat.py
  • docs/CHAT_MODE.md

Comment on lines +384 to +395
    data = await audio.read()
    if not data:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Empty audio upload",
        )
    max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
    if len(data) > max_bytes:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"Audio exceeds the {max_bytes}-byte limit",
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce upload size while reading to prevent unbounded memory spikes.

await audio.read() loads the entire payload before the size gate, so oversized uploads can consume arbitrary worker memory before returning 413. Read in chunks and fail immediately once the configured limit is crossed.

Proposed fix
-    data = await audio.read()
-    if not data:
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail="Empty audio upload",
-        )
     max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
-    if len(data) > max_bytes:
-        raise HTTPException(
-            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
-            detail=f"Audio exceeds the {max_bytes}-byte limit",
-        )
+    chunk_size = 1024 * 1024
+    buf = bytearray()
+    while chunk := await audio.read(chunk_size):
+        buf.extend(chunk)
+        if len(buf) > max_bytes:
+            raise HTTPException(
+                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+                detail=f"Audio exceeds the {max_bytes}-byte limit",
+            )
+    if not buf:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Empty audio upload",
+        )
+    data = bytes(buf)
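
To try the chunked-read idea outside the handler, here is a self-contained sketch. It assumes only an async `read(n)` callable that returns `b""` at end of stream; `read_bounded`, `PayloadTooLarge`, and `make_reader` are hypothetical names for illustration, and the exception stands in for the HTTP 413 response.

```python
import asyncio
import io
from typing import Awaitable, Callable


class PayloadTooLarge(Exception):
    """Stand-in for the HTTP 413 raised in the handler."""


async def read_bounded(
    read: Callable[[int], Awaitable[bytes]],
    max_bytes: int,
    chunk_size: int = 64 * 1024,
) -> bytes:
    buf = bytearray()
    # Stop as soon as the running total crosses the limit, so at most
    # max_bytes + chunk_size bytes are ever buffered here.
    while chunk := await read(chunk_size):
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise PayloadTooLarge(f"upload exceeds the {max_bytes}-byte limit")
    return bytes(buf)


def make_reader(payload: bytes) -> Callable[[int], Awaitable[bytes]]:
    # In-memory stand-in for an upload object exposing an async read(n).
    stream = io.BytesIO(payload)

    async def read(n: int) -> bytes:
        return stream.read(n)

    return read
```

A payload exactly at the limit is accepted, and one byte over is rejected after at most one extra chunk has been buffered. The empty-upload check from the proposed fix would still run on the returned bytes.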
Comment on lines +414 to +419
    except TranscriptionError as e:
        logger.warning("widget transcribe failed (session={}): {}", session_id, e)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail="Transcription failed",
        )

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve traceback context when remapping to HTTP 502.

Chain the HTTPException with from e so provider failure context is retained.

Proposed fix
     except TranscriptionError as e:
         logger.warning("widget transcribe failed (session={}): {}", session_id, e)
         raise HTTPException(
             status_code=status.HTTP_502_BAD_GATEWAY,
             detail="Transcription failed",
-        )
+        ) from e
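
What `from e` changes can be checked in isolation. The sketch below uses two plain exception classes as stand-ins (`GatewayError` is hypothetical and takes the place of the `HTTPException`); it relies only on standard Python exception-chaining semantics.

```python
class TranscriptionError(Exception):
    """Stand-in for the provider failure."""


class GatewayError(Exception):
    """Stand-in for the HTTPException raised with status 502."""


def remap(chain: bool) -> GatewayError:
    """Raise and catch the remapped error, returning it for inspection."""
    try:
        try:
            raise TranscriptionError("provider timed out")
        except TranscriptionError as e:
            if chain:
                # Explicit chaining: sets __cause__ on the new exception.
                raise GatewayError("Transcription failed") from e
            # Implicit chaining: only __context__ is set.
            raise GatewayError("Transcription failed")
    except GatewayError as err:
        return err


chained = remap(chain=True)
unchained = remap(chain=False)
```

With `from e`, `chained.__cause__` is the original `TranscriptionError` and the traceback reads "The above exception was the direct cause of the following exception". Without it, `unchained.__cause__` is `None` and the original is reachable only through `__context__`, which the traceback presents as an error that occurred during handling. That ambiguity is what Ruff's B904 flags.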
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 416-419: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Source: Linters/SAST tools
