
fix(#34542): use model audio_type instead of hardcoded mp3 in Azure TTS#2831

Closed
agenthaulk wants to merge 0 commits into langgenius:main from agenthaulk:fix/34542-azure-tts-response-format

Conversation


agenthaulk commented Apr 4, 2026

Fix

Replace the hardcoded "mp3" response format with the audio_type from the model configuration, and add a proper non-streaming TTS path that uses pydub to combine audio segments correctly.

Root Cause

_tts_invoke_streaming and _process_sentence hardcoded response_format="mp3", ignoring the audio type configured per model in credentials.
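A minimal sketch of the fix, assuming the per-model audio type is available as a plain string such as "mp3" or "wav". The names `resolve_audio_type`, `build_speech_request`, `DEFAULT_AUDIO_TYPE`, and the `audio_type` credential key are hypothetical stand-ins for the plugin's `_get_model_audio_type` and the keyword arguments it passes to the speech API; this is not the plugin's actual code.

```python
from typing import Any

# Hypothetical fallback, used only when no audio type is configured.
DEFAULT_AUDIO_TYPE = "mp3"


def resolve_audio_type(credentials: dict[str, Any]) -> str:
    """Stand-in for _get_model_audio_type: return the configured format.

    Assumes the format is stored under a hypothetical "audio_type" key.
    """
    return str(credentials.get("audio_type") or DEFAULT_AUDIO_TYPE)


def build_speech_request(model: str, voice: str, text: str,
                         credentials: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for an OpenAI-style speech-creation call.

    Before the fix, response_format was the literal "mp3"; here it
    follows the configured audio type instead.
    """
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": resolve_audio_type(credentials),
    }


print(build_speech_request("tts-1", "alloy", "Hello", {"audio_type": "wav"})["response_format"])  # wav
```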

Changes

tts.py — full rewrite for consistency with the OpenAI plugin pattern:

  • _tts_invoke_streaming: uses audio_type (from _get_model_audio_type) instead of hardcoded "mp3" as response_format
  • _tts_invoke (new): non-streaming path that collects sentence audio in parallel, then combines the segments with pydub.AudioSegment so the output is valid regardless of format (this handles the wav/flac header issue)
  • _process_sentence: now passes response_format=audio_type to the API
  • No _STREAMABLE_FORMATS whitelist needed — streaming uses whatever format the user configured; non-streaming uses pydub for proper merging
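The parallel collection step of the non-streaming path can be sketched with the standard library, assuming a per-sentence synthesis callable. `synthesize_all` and its parameters are hypothetical illustrations, not the PR's code; the point is that every request gets the same configured format and the results come back in sentence order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def synthesize_all(sentences: Sequence[str],
                   synthesize: Callable[[str, str], bytes],
                   audio_type: str,
                   max_workers: int = 4) -> list[bytes]:
    """Synthesize every sentence in parallel, keeping the original order.

    `synthesize(sentence, audio_type)` is a hypothetical stand-in for a
    per-sentence API call such as _process_sentence; each call receives
    the same configured audio_type.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order even if later
        # sentences finish first, so segments line up with the text.
        return list(pool.map(lambda s: synthesize(s, audio_type), sentences))
```

Keeping the segments in order matters because they are merged afterwards; in the PR that merge is done with pydub.AudioSegment rather than by joining raw bytes.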

pyproject.toml — added pydub~=0.25.1 dependency (same version as the OpenAI plugin)

manifest.yaml — version bump

tests/test_tts.py — new test suite covering:

  • Streaming path uses configured audio_type
  • Streaming with long text passes correct format to all parallel requests
  • _process_sentence forwards audio_type as response_format
  • _tts_invoke combines segments via pydub
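The "_process_sentence forwards audio_type" case can be sketched with `unittest.mock`, assuming the sentence call goes through an OpenAI-style `client.audio.speech.create(...)` whose response exposes `.read()`. `TTSStub` and the `audio_type` credential key are hypothetical, not the plugin's real class or schema.

```python
import unittest
from unittest import mock


class TTSStub:
    """Hypothetical stand-in for the Azure TTS model class under test."""

    def __init__(self, client):
        self.client = client

    def _get_model_audio_type(self, model, credentials):
        # Assumption: the configured format is stored under "audio_type".
        return credentials.get("audio_type", "mp3")

    def _process_sentence(self, sentence, model, voice, credentials):
        response = self.client.audio.speech.create(
            model=model,
            voice=voice,
            input=sentence.strip(),
            response_format=self._get_model_audio_type(model, credentials),
        )
        return response.read()


class ProcessSentenceTest(unittest.TestCase):
    def test_forwards_configured_audio_type(self):
        client = mock.MagicMock()
        client.audio.speech.create.return_value.read.return_value = b"audio"
        tts = TTSStub(client)

        result = tts._process_sentence("Hello.", "tts", "alloy", {"audio_type": "wav"})

        self.assertEqual(result, b"audio")
        # The format sent to the API must match the configured audio type.
        kwargs = client.audio.speech.create.call_args.kwargs
        self.assertEqual(kwargs["response_format"], "wav")
```

Run it with `python -m unittest` against the file that contains it.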

Alignment with OpenAI plugin

This implementation mirrors models/openai/models/tts/tts.py:

  • Same pydub-based combining for non-streaming
  • Streaming path uses the configured audio format
  • Same dependency version (pydub~=0.25.1)

Scope

Minimal, focused on the reported issue. No behavioral changes beyond fixing the hardcoded format.

dosubot bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) on Apr 4, 2026
agenthaulk had a problem deploying to models/azure_openai on April 4, 2026 16:28 with GitHub Actions (Failure)
dosubot bot added the bug label (Something isn't working) on Apr 4, 2026
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request replaces the hardcoded 'mp3' response format with a dynamic lookup in the Azure OpenAI TTS implementation. A review comment identifies a potential issue where header-based audio formats (such as wav or flac) may result in corrupted files when concatenated in the streaming path, as the current logic assumes a format that supports simple byte concatenation like mp3.

     client.audio.speech.with_streaming_response.create,
     model=model,
-    response_format="mp3",
+    response_format=self._get_model_audio_type(model, credentials),

Priority: medium

When response_format is set to a format that includes a header (such as wav, flac, or opus), simply concatenating the raw bytes from multiple requests in the long-text path will result in a corrupted audio file containing multiple headers. This implementation worked previously because mp3 was hardcoded and supports simple concatenation. Consider adding logic to handle header-based formats or restricting the allowed formats for the multi-sentence streaming path.
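To make the reviewer's point concrete, here is a stdlib-only sketch using Python's `wave` module (not the PR's pydub code). Joining two WAV responses byte for byte leaves the first header in charge, so a reader sees only the first segment; decoding the frames and writing a single new header keeps them all. The segment sizes and the 8 kHz mono 16-bit format are arbitrary assumptions.

```python
import io
import wave


def make_wav(n_frames: int, rate: int = 8000) -> bytes:
    """Build a silent mono 16-bit WAV blob with n_frames frames."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def frame_count(blob: bytes) -> int:
    """Frame count declared by the first header in a WAV blob."""
    with wave.open(io.BytesIO(blob), "rb") as r:
        return r.getnframes()


def merge_wavs(blobs: list[bytes]) -> bytes:
    """Merge WAV segments by decoding frames and writing one new header.

    Assumes every segment shares channels, sample width, and frame rate.
    """
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        for i, blob in enumerate(blobs):
            with wave.open(io.BytesIO(blob), "rb") as r:
                if i == 0:
                    w.setparams(r.getparams())
                # The writer patches the header sizes as more frames arrive.
                w.writeframes(r.readframes(r.getnframes()))
    return out.getvalue()


segments = [make_wav(800), make_wav(1200)]
naive = b"".join(segments)     # two RIFF headers back to back
merged = merge_wavs(segments)  # one header covering all frames

print(frame_count(naive))   # 800: the second segment is invisible to the reader
print(frame_count(merged))  # 2000
```

pydub's AudioSegment performs the same decode-then-re-encode step for other formats too (relying on ffmpeg for compressed ones), which is why the PR uses it on the non-streaming path.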

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 5, 2026
@agenthaulk agenthaulk had a problem deploying to models/azure_openai April 5, 2026 02:37 — with GitHub Actions Failure
@agenthaulk agenthaulk had a problem deploying to models/azure_openai April 5, 2026 02:44 — with GitHub Actions Failure
@agenthaulk agenthaulk temporarily deployed to models/azure_openai April 5, 2026 03:41 — with GitHub Actions Inactive
@agenthaulk agenthaulk force-pushed the fix/34542-azure-tts-response-format branch from 41f18b3 to 980783a Compare April 8, 2026 17:18
@agenthaulk agenthaulk had a problem deploying to models/azure_openai April 8, 2026 17:19 — with GitHub Actions Failure
agenthaulk added a commit to agenthaulk/dify-official-plugins that referenced this pull request Apr 9, 2026
@agenthaulk agenthaulk had a problem deploying to models/azure_openai April 9, 2026 05:27 — with GitHub Actions Failure
@agenthaulk agenthaulk had a problem deploying to models/azure_openai April 9, 2026 05:39 — with GitHub Actions Failure
@agenthaulk agenthaulk closed this Apr 10, 2026
@agenthaulk agenthaulk force-pushed the fix/34542-azure-tts-response-format branch from 619ff60 to 21d3a7e Compare April 10, 2026 00:33
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 10, 2026

Labels

bug (Something isn't working), size:XS (This PR changes 0-9 lines, ignoring generated files.)


1 participant