feat(widget): add push-to-talk STT transcription endpoint by amreetkhuntia · Pull Request #847 · juspay/clairvoyance

amreetkhuntia · 2026-06-19T06:34:50Z

Add a one-shot speech-to-text path for the chat widget's push-to-talk button: POST /widget/session/{id}/transcribe accepts a short audio clip and returns the transcript for the user to review before sending.

- New app/ai/voice/stt/transcribe.py: direct provider-routed REST transcription (OpenAI Whisper / Deepgram / Sarvam), with Whisper fallback for streaming-only providers (Soniox/Google) and transient failures. Provider is chosen from the template's stt_configuration.
- Widget route + handler with ownership, per-merchant IP rate limiting (transcribe bucket), empty/oversize guards, and CORS preflight.
- WidgetTranscribeResponse schema; WIDGET_STT_MAX_AUDIO_BYTES as a Redis-backed dynamic config dial.
- Docs: CHAT_MODE.md endpoint entry.
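A minimal sketch of the routing-with-fallback described in the first bullet, assuming provider clients are injected as async callables keyed by name. `ProviderFn`, `STREAMING_ONLY`, `FALLBACK`, and the `providers` parameter are hypothetical; `Transcription` and `TranscriptionError` are stand-ins for the PR's classes, and this is not the PR's actual `transcribe_audio` signature. Treating every primary failure as transient is a simplification.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class Transcription:
    """Stand-in for the PR's Transcription result (text plus provider used)."""
    text: str
    provider: str


class TranscriptionError(Exception):
    """Stand-in for the PR's TranscriptionError: no provider produced text."""


# Hypothetical provider signature: (audio bytes, content type) -> transcript.
ProviderFn = Callable[[bytes, str], Awaitable[str]]

# Providers assumed to offer streaming STT only, with no one-shot REST call.
STREAMING_ONLY = {"soniox", "google"}
FALLBACK = "openai"  # Whisper


async def transcribe_audio(
    data: bytes,
    content_type: str,
    provider: str,
    providers: Dict[str, ProviderFn],
) -> Transcription:
    """Try the configured REST provider first, then fall back to Whisper."""
    name = provider.lower()
    primary = providers.get(name)
    # Streaming-only or unknown providers go straight to the fallback.
    if name != FALLBACK and name not in STREAMING_ONLY and primary is not None:
        try:
            return Transcription(await primary(data, content_type), name)
        except Exception:
            # Simplification: every primary failure is treated as transient.
            pass
    try:
        text = await providers[FALLBACK](data, content_type)
    except Exception as e:
        raise TranscriptionError(f"Whisper fallback failed: {e!r}") from e
    return Transcription(text, FALLBACK)
```

Injecting the provider callables keeps the routing logic testable without network access.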

Summary by CodeRabbit

New Features
- Added push-to-talk audio transcription capability to widgets with multi-provider support and intelligent fallback handling.
- Introduced configurable audio upload size limits for transcription requests.
Documentation
- Updated documentation with new transcription endpoint details.
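The configurable size limit is described as a Redis-backed dial with a 10 MiB default. A minimal sketch of such a reader, assuming an async key-value client with a `get` method (as `redis.asyncio` clients provide); the key name and the `widget_stt_max_audio_bytes` helper are hypothetical, and the PR's real `WIDGET_STT_MAX_AUDIO_BYTES()` takes no arguments.

```python
import asyncio
from typing import Optional, Protocol, Union

# 10 MiB default, as stated in the PR walkthrough.
DEFAULT_MAX_AUDIO_BYTES = 10 * 1024 * 1024
# Hypothetical Redis key name; the PR does not show the real one.
MAX_AUDIO_BYTES_KEY = "WIDGET_STT_MAX_AUDIO_BYTES"


class AsyncKV(Protocol):
    """Minimal async key-value interface (e.g. a redis.asyncio client)."""

    async def get(self, key: str) -> Optional[Union[bytes, str]]: ...


async def widget_stt_max_audio_bytes(kv: AsyncKV) -> int:
    """Read the upload cap from the store, falling back to the default."""
    raw = await kv.get(MAX_AUDIO_BYTES_KEY)
    if raw is None:
        return DEFAULT_MAX_AUDIO_BYTES
    try:
        value = int(raw)  # int() accepts both str and bytes
    except (TypeError, ValueError):
        return DEFAULT_MAX_AUDIO_BYTES
    # Reject zero or negative values rather than blocking every upload.
    return value if value > 0 else DEFAULT_MAX_AUDIO_BYTES
```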

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-19T06:35:05Z

Walkthrough

A new push-to-talk transcription endpoint (POST /widget/session/{id}/transcribe) is added to the widget API. A new transcribe.py STT module implements multi-provider routing (OpenAI, Deepgram, Sarvam) with Whisper fallback. A WidgetTranscribeResponse schema, a Redis-backed WIDGET_STT_MAX_AUDIO_BYTES dynamic config, a handler with rate limiting and session validation, and router wiring complete the feature.

Changes

Widget Push-to-Talk Transcription

Layer / File(s)	Summary
STT transcription module (providers, language, orchestration) `app/ai/voice/stt/transcribe.py`, `app/ai/voice/stt/__init__.py`	New `transcribe.py` defines constants, language-code normalization helpers (`_short_lang`, `_sarvam_lang`), three async provider helpers (`_openai`, `_deepgram`, `_sarvam`), and the `transcribe_audio` orchestrator with provider routing and Whisper fallback. `__init__.py` re-exports `Transcription`, `TranscriptionError`, and `transcribe_audio`.
Response schema and dynamic audio size config `app/schemas/breeze_buddy/chat.py`, `app/core/config/dynamic.py`	`WidgetTranscribeResponse(text, provider)` Pydantic model added and exported. `WIDGET_STT_MAX_AUDIO_BYTES()` async config function reads a Redis key (default 10 MiB).
Widget handler: validation, rate limit, transcription call `app/api/routers/breeze_buddy/widget/handlers.py`	`transcribe_widget_audio_handler` enforces active widget config, applies a dedicated `transcribe` rate bucket, validates session ownership and non-`ENDED` status, enforces audio size via `WIDGET_STT_MAX_AUDIO_BYTES()`, selects provider from template config, calls `transcribe_audio`, maps `TranscriptionError` to HTTP 502, and returns `WidgetTranscribeResponse`.
Widget router wiring and docs `app/api/routers/breeze_buddy/widget/__init__.py`, `docs/CHAT_MODE.md`	Registers `OPTIONS` and `POST /session/{session_id}/transcribe` routes on the widget router. Documentation table updated with the new route.
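The walkthrough names two language helpers, `_short_lang` and `_sarvam_lang`, without showing them. A guess at what such normalization might look like, assuming Whisper/Deepgram take a primary subtag (`en`) and Sarvam takes an Indian-locale tag (`hi-IN`). The function names, the supported-language set, and returning `None` for unsupported languages are all assumptions, not the PR's code.

```python
from typing import Optional

# Assumed subset of languages a Sarvam-style API accepts as "<lang>-IN".
SARVAM_SUPPORTED = {"hi", "bn", "ta", "te", "kn", "ml", "mr", "gu", "pa", "en"}


def short_lang(code: Optional[str]) -> Optional[str]:
    """Reduce a tag like 'en-US' or 'hi_IN' to its primary subtag ('en', 'hi')."""
    if not code:
        return None
    return code.replace("_", "-").split("-", 1)[0].lower() or None


def sarvam_lang(code: Optional[str]) -> Optional[str]:
    """Map a tag to '<lang>-IN', or None so the caller can omit the parameter."""
    base = short_lang(code)
    if base in SARVAM_SUPPORTED:
        return f"{base}-IN"
    return None
```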

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant WidgetRouter as Widget Router
    participant Handler as transcribe_widget_audio_handler
    participant STT as transcribe_audio
    participant Provider as OpenAI/Deepgram/Sarvam

    Client->>WidgetRouter: POST /widget/session/{id}/transcribe (multipart audio)
    WidgetRouter->>Handler: audio, session context
    Handler->>Handler: enforce widget active, rate bucket, session ownership, non-ENDED
    Handler->>Handler: read bytes, enforce WIDGET_STT_MAX_AUDIO_BYTES
    Handler->>STT: transcribe_audio(bytes, content_type, provider)
    STT->>Provider: POST audio (REST)
    Provider-->>STT: transcript text
    STT-->>Handler: Transcription(text, provider)
    Handler-->>Client: WidgetTranscribeResponse{text, provider}

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Hop hop, the rabbit listens close,
Audio bytes arrive by post!
Three providers vie for the crown,
Whisper catches if one falls down.
Rate buckets guard the session gate,
Push-to-talk is finally great! 🎙️

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely summarizes the main change: adding a push-to-talk STT transcription endpoint to the widget, which is the primary objective of the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.



Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/api/routers/breeze_buddy/widget/handlers.py`:
- Around line 414-419: In the exception handler for TranscriptionError in the
widget transcribe function, the HTTPException being raised needs to preserve the
exception chain context. Add `from e` to the raise statement for the
HTTPException so that the original TranscriptionError exception context is
retained in the traceback, enabling better debugging of the provider failure.
- Around line 384-395: The current implementation calls await audio.read() which
loads the entire audio payload into memory before enforcing the size limit,
creating a potential memory spike for oversized uploads. Instead of reading all
data at once and then checking its length, modify the code to read the audio
data in chunks and enforce the max_bytes limit (obtained from await
WIDGET_STT_MAX_AUDIO_BYTES()) during the read operation itself, raising the
HTTPException with status HTTP_413_REQUEST_ENTITY_TOO_LARGE immediately when the
cumulative chunk size exceeds the configured limit, rather than after all data
has been loaded.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e87218ae-a051-49bc-b0ef-aa5a8c7bad21

📥 Commits

Reviewing files that changed from the base of the PR and between 39fdbef and 99dc54d.

📒 Files selected for processing (7)

app/ai/voice/stt/__init__.py
app/ai/voice/stt/transcribe.py
app/api/routers/breeze_buddy/widget/__init__.py
app/api/routers/breeze_buddy/widget/handlers.py
app/core/config/dynamic.py
app/schemas/breeze_buddy/chat.py
docs/CHAT_MODE.md

coderabbitai · 2026-06-19T06:39:53Z

+    data = await audio.read()
+    if not data:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Empty audio upload",
+        )
+    max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
+    if len(data) > max_bytes:
+        raise HTTPException(
+            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+            detail=f"Audio exceeds the {max_bytes}-byte limit",
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce upload size while reading to prevent unbounded memory spikes.

await audio.read() loads the entire payload before the size gate, so oversized uploads can consume arbitrary worker memory before returning 413. Read in chunks and fail immediately once the configured limit is crossed.

Proposed fix

-    data = await audio.read()
-    if not data:
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail="Empty audio upload",
-        )
     max_bytes = await WIDGET_STT_MAX_AUDIO_BYTES()
-    if len(data) > max_bytes:
-        raise HTTPException(
-            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
-            detail=f"Audio exceeds the {max_bytes}-byte limit",
-        )
+    chunk_size = 1024 * 1024
+    buf = bytearray()
+    while chunk := await audio.read(chunk_size):
+        buf.extend(chunk)
+        if len(buf) > max_bytes:
+            raise HTTPException(
+                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+                detail=f"Audio exceeds the {max_bytes}-byte limit",
+            )
+    if not buf:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Empty audio upload",
+        )
+    data = bytes(buf)
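A framework-free sketch of the same bounded read, so the loop can be checked with a fake reader. In the handler, `read` would be Starlette's `UploadFile.read(size)` and the two hypothetical exceptions would map to the 413 and 400 responses. One variation from the proposed fix: each request is capped at one byte past the remaining budget, so the buffer never exceeds `max_bytes + 1` (the proposed loop can overshoot by up to a chunk). Note that Starlette's multipart parser generally spools the upload to a temporary file before the handler runs, so this bounds the handler's in-memory copy, not the bytes received; capping the latter needs a request-size limit at the proxy or ASGI layer.

```python
import asyncio
from typing import Awaitable, Callable


class PayloadTooLarge(Exception):
    """Hypothetical stand-in for the 413 HTTPException."""


class EmptyPayload(Exception):
    """Hypothetical stand-in for the 400 HTTPException."""


async def read_bounded(
    read: Callable[[int], Awaitable[bytes]],
    max_bytes: int,
    chunk_size: int = 1024 * 1024,
) -> bytes:
    """Read until EOF, failing as soon as more than max_bytes have arrived."""
    buf = bytearray()
    while True:
        # Never ask for more than one byte past the remaining budget.
        want = min(chunk_size, max_bytes - len(buf) + 1)
        chunk = await read(want)
        if not chunk:
            break  # EOF
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise PayloadTooLarge(f"Audio exceeds the {max_bytes}-byte limit")
    if not buf:
        raise EmptyPayload("Empty audio upload")
    return bytes(buf)
```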


coderabbitai · 2026-06-19T06:39:53Z

+    except TranscriptionError as e:
+        logger.warning("widget transcribe failed (session={}): {}", session_id, e)
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            detail="Transcription failed",
+        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve traceback context when remapping to HTTP 502.

Chain the HTTPException with from e so provider failure context is retained.

Proposed fix

     except TranscriptionError as e:
         logger.warning("widget transcribe failed (session={}): {}", session_id, e)
         raise HTTPException(
             status_code=status.HTTP_502_BAD_GATEWAY,
             detail="Transcription failed",
-        )
+        ) from e
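Ruff's B904 accepts either `raise ... from err` or `raise ... from None`. A small standalone illustration of the difference, with a hypothetical `GatewayError` standing in for FastAPI's `HTTPException` so it runs without FastAPI. Without any `from`, Python still records the provider error as implicit `__context__`, but the traceback reads "During handling of the above exception, another exception occurred", which looks like a bug in the handler; `from e` sets `__cause__` and prints "The above exception was the direct cause". Either way the client only sees the `detail` string.

```python
class TranscriptionError(Exception):
    """Stand-in for the provider failure raised by transcribe_audio."""


class GatewayError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException(status_code=502)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def remap_chained() -> None:
    try:
        raise TranscriptionError("deepgram returned 503")
    except TranscriptionError as e:
        # Explicit chaining: the provider error becomes __cause__.
        raise GatewayError(502, "Transcription failed") from e


def remap_suppressed() -> None:
    try:
        raise TranscriptionError("deepgram returned 503")
    except TranscriptionError:
        # `from None` hides the provider error from the printed traceback.
        raise GatewayError(502, "Transcription failed") from None
```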


🧰 Tools

🪛 Ruff (0.15.17)

[warning] 416-419: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


Source: Linters/SAST tools

Copilot AI review requested due to automatic review settings June 19, 2026 06:34

Copilot started reviewing on behalf of amreetkhuntia June 19, 2026 06:35

Copilot AI reviewed Jun 19, 2026

coderabbitai Bot reviewed Jun 19, 2026


amreetkhuntia wants to merge 1 commit into juspay:release from amreetkhuntia:feat/widget-stt-push-to-talk
