fix(#34542): use model audio_type instead of hardcoded mp3 in Azure TTS by agenthaulk · Pull Request #2831 · langgenius/dify-official-plugins

agenthaulk · 2026-04-04T16:28:07Z

Fix

Replace hardcoded "mp3" response format with dynamic audio_type from model configuration, and add proper non-streaming TTS path using pydub for correct audio combining.

Root Cause

_tts_invoke_streaming and _process_sentence hardcoded response_format="mp3", ignoring the audio type configured per model in credentials.

Changes

tts.py — full rewrite for consistency with the OpenAI plugin pattern:

_tts_invoke_streaming: uses audio_type (from _get_model_audio_type) instead of hardcoded "mp3" as response_format
_tts_invoke (new): non-streaming path that collects sentence audio in parallel, then combines with pydub.AudioSegment for correct output regardless of format (handles wav/flac header issues)
_process_sentence: now passes response_format=audio_type to the API
No _STREAMABLE_FORMATS whitelist needed — streaming uses whatever format the user configured; non-streaming uses pydub for proper merging
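
The "collect sentence audio in parallel" step of the non-streaming path could look roughly like the sketch below. This is an illustration, not the PR's code: `synthesize_in_order`, `fake_synthesize`, and the `max_workers` default are hypothetical names and values, and the fake synthesizer stands in for the real API call. It uses `ThreadPoolExecutor.map`, which yields results in input order even when requests finish out of order, so the segments can be combined afterwards without re-sorting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def synthesize_in_order(
    sentences: List[str],
    synthesize: Callable[[str], bytes],
    max_workers: int = 3,  # assumed default, not taken from the PR
) -> List[bytes]:
    """Run one synthesis call per sentence in parallel, keeping input order."""
    if not sentences:
        return []
    workers = min(max_workers, len(sentences))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map returns results in the order of `sentences`,
        # regardless of which request completes first.
        return list(executor.map(synthesize, sentences))


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for the TTS request; returns placeholder bytes."""
    return sentence.upper().encode("utf-8")


segments = synthesize_in_order(["one.", "two.", "three."], fake_synthesize)
```

Each element of `segments` would then be decoded and merged in a separate step (per the description above, with `pydub.AudioSegment`).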

pyproject.toml — added pydub~=0.25.1 dependency (same version as the OpenAI plugin)

manifest.yaml — version bump

tests/test_tts.py — new test suite covering:

Streaming path uses configured audio_type
Streaming with long text passes correct format to all parallel requests
_process_sentence forwards audio_type as response_format
_tts_invoke combines segments via pydub
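
A test of the "forwards audio_type as response_format" kind can be sketched with `unittest.mock`. The snippet below is an assumption-laden illustration rather than the contents of tests/test_tts.py: `process_sentence` is a hypothetical free-function stand-in for the plugin's `_process_sentence`, and the model and voice values are placeholders.

```python
from unittest.mock import MagicMock


def process_sentence(client, model: str, voice: str, sentence: str, audio_type: str) -> bytes:
    """Hypothetical stand-in for _process_sentence: one TTS request per sentence."""
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=sentence,
        response_format=audio_type,  # the configured format, not a hardcoded "mp3"
    )
    return response.read()


def test_process_sentence_forwards_audio_type() -> None:
    # The mocked client records the keyword arguments it was called with.
    client = MagicMock()
    client.audio.speech.create.return_value.read.return_value = b"fake-audio"

    audio = process_sentence(client, "tts-1", "alloy", "Hello.", audio_type="wav")

    assert audio == b"fake-audio"
    assert client.audio.speech.create.call_args.kwargs["response_format"] == "wav"


test_process_sentence_forwards_audio_type()
```

Asserting on the recorded keyword arguments keeps the test independent of any network access or real credentials.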

Alignment with OpenAI plugin

This implementation mirrors models/openai/models/tts/tts.py:

Same pydub-based combining for non-streaming
Streaming path uses the configured audio format
Same dependency version (pydub~=0.25.1)

Scope

Minimal, focused on the reported issue. No behavioral changes beyond fixing the hardcoded format.

gemini-code-assist

Code Review

This pull request replaces the hardcoded 'mp3' response format with a dynamic lookup in the Azure OpenAI TTS implementation. A review comment identifies a potential issue where header-based audio formats (such as wav or flac) may result in corrupted files when concatenated in the streaming path, as the current logic assumes a format that supports simple byte concatenation like mp3.

gemini-code-assist · 2026-04-04T16:29:38Z

models/azure_openai/models/tts/tts.py

                        client.audio.speech.with_streaming_response.create,
                        model=model,
-                        response_format="mp3",
+                        response_format=self._get_model_audio_type(model, credentials),


When response_format is set to a format that includes a header (such as wav, flac, or opus), simply concatenating the raw bytes from multiple requests in the long-text path will result in a corrupted audio file containing multiple headers. This implementation worked previously because mp3 was hardcoded and supports simple concatenation. Consider adding logic to handle header-based formats or restricting the allowed formats for the multi-sentence streaming path.
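
The header problem described in this comment can be reproduced with the standard library alone. The sketch below uses Python's `wave` module as a stand-in for pydub (the PR itself names `pydub.AudioSegment`); `make_wav`, `frame_count`, and `merge_wavs` are hypothetical helpers written for this illustration. It builds two short WAV clips, concatenates their bytes naively, and compares that with a merge that re-reads the frames and writes a single header.

```python
import io
import wave
from typing import List


def make_wav(n_frames: int, sample_rate: int = 8000) -> bytes:
    """Build a mono 16-bit WAV clip of silence, header included."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def frame_count(data: bytes) -> int:
    """Number of frames a WAV reader reports for these bytes."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnframes()


def merge_wavs(clips: List[bytes]) -> bytes:
    """Decode each clip and write all frames under one header."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        for i, clip in enumerate(clips):
            with wave.open(io.BytesIO(clip), "rb") as r:
                if i == 0:
                    # Assumes every clip shares the first clip's parameters.
                    w.setparams(r.getparams())
                w.writeframes(r.readframes(r.getnframes()))
    return out.getvalue()


clip_a, clip_b = make_wav(100), make_wav(100)
naive = clip_a + clip_b               # two RIFF headers back to back
merged = merge_wavs([clip_a, clip_b])

naive_frames = frame_count(naive)     # reader stops at the first data chunk
merged_frames = frame_count(merged)   # covers both clips
```

In the naive result the reader only sees the first clip's data chunk, so the second clip is either ignored or played as noise depending on the player. Decoding and re-encoding, as the pydub-based path is described as doing, avoids this at the cost of buffering the full audio.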

dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 4, 2026

agenthaulk had a problem deploying to models/azure_openai April 4, 2026 16:28 — with GitHub Actions Failure

dosubot bot added the bug Something isn't working label Apr 4, 2026

gemini-code-assist bot reviewed Apr 4, 2026

View reviewed changes

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 5, 2026

agenthaulk had a problem deploying to models/azure_openai April 5, 2026 02:37 — with GitHub Actions Failure

agenthaulk had a problem deploying to models/azure_openai April 5, 2026 02:44 — with GitHub Actions Failure

agenthaulk temporarily deployed to models/azure_openai April 5, 2026 03:41 — with GitHub Actions Inactive

agenthaulk force-pushed the fix/34542-azure-tts-response-format branch from 41f18b3 to 980783a Compare April 8, 2026 17:18

agenthaulk had a problem deploying to models/azure_openai April 8, 2026 17:19 — with GitHub Actions Failure

agenthaulk added a commit to agenthaulk/dify-official-plugins that referenced this pull request Apr 9, 2026

bump azure_openai version to 0.0.50 for PR langgenius#2831

47b6148

agenthaulk had a problem deploying to models/azure_openai April 9, 2026 05:27 — with GitHub Actions Failure

agenthaulk had a problem deploying to models/azure_openai April 9, 2026 05:39 — with GitHub Actions Failure

agenthaulk closed this Apr 10, 2026

agenthaulk force-pushed the fix/34542-azure-tts-response-format branch from 619ff60 to 21d3a7e Compare April 10, 2026 00:33

dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 10, 2026

agenthaulk wants to merge 0 commits into langgenius:main from agenthaulk:fix/34542-azure-tts-response-format
