enhance: add mimo-v2.5-tts provider #19

Open
Rocke1001feller wants to merge 1 commit into ConardLi:main from Rocke1001feller:enhance/mimo-v2.5-tts

@Rocke1001feller

What

Add MiMo-V2.5-TTS as a built-in TTS provider (tts-providers/mimo.sh), alongside the existing minimax and openai providers.

Why

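  1. Free for a limited time: MiMo-V2.5-TTS is currently free to use (Token Plan), which makes it a suitable speech-synthesis backend for low-budget projects.
  2. Excellent Chinese voice-over quality: 9 preset voices (冰糖 / 茉莉 / 苏打 / 白桦 / Mia / Chloe / Milo / Dean / mimo_default), with strong results in Chinese-language scenarios.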

What changed

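  • Added tts-providers/mimo.sh: a curl implementation based on MiMo's OpenAI-compatible protocol; the API returns base64 WAV, which ffmpeg converts to mp3.
  • Updated tts-providers/README.md: built-in provider table goes from 2 to 3 entries, plus a new mimo usage example.
  • Updated references/AUDIO.md: added a mimo row to the built-in provider list.
  • Updated SKILL.md / README.md / README.zh-CN.md: synced the provider count from 2 to 3.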

Usage

export MIMO_API_KEY=tp-... # get one at https://platform.xiaomimimo.com
PRESENTATION_TTS=mimo npm run synthesize-audio

Copilot AI review requested due to automatic review settings June 6, 2026 02:30

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new MiMo TTS provider to the web-video-presentation skill and updates documentation to reflect the additional built-in provider.

Changes:

  • Introduce mimo.sh provider (curl + jq) that calls MiMo-V2.5-TTS and converts returned base64 WAV to mp3.
  • Update provider documentation and references to list MiMo as a built-in option.
  • Add usage instructions for MiMo (env vars, voices, base URL clusters).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| skills/web-video-presentation/templates/scripts/tts-providers/mimo.sh | New MiMo-V2.5-TTS provider implementation (request, decode, convert). |
| skills/web-video-presentation/templates/scripts/tts-providers/README.md | Document MiMo as an additional built-in provider + add usage section. |
| skills/web-video-presentation/references/AUDIO.md | Update “built-in providers” list to include MiMo. |
| skills/web-video-presentation/SKILL.md | Update skill description + internal docs to mention 3 built-in providers. |
| skills/web-video-presentation/README.zh-CN.md | Update top-level Chinese README to mention MiMo provider. |
| skills/web-video-presentation/README.md | Update top-level English README to mention MiMo provider. |

# Write response (JSON with base64 audio) to a temp file, then decode.
# MiMo API returns: {choices:[{message:{audio:{data:"<base64>"}}}]}
local tmp
tmp=$(mktemp -t mimo.XXXXXX.json)

# Decode the base64 WAV from JSON → write raw WAV → convert to mp3
local raw_wav
raw_wav=$(mktemp -t mimo.XXXXXX.wav)
jq -r '.choices[0].message.audio.data' "$tmp" | base64 -d > "$raw_wav"
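One case this decode step leaves open is a response that carries no audio: `jq -r` prints the literal string `null` for a missing field, and `base64 -d` will decode those four characters into three bytes without complaint, so the failure only surfaces later in ffmpeg. A minimal sketch of a guard, assuming a hypothetical helper `decode_wav` (not part of the PR) that reads the base64 text on stdin and checks for the `RIFF` bytes that open a WAV file:

```shell
# Hypothetical helper, not part of the PR: decode base64 from stdin into the
# file named by $1 and fail early if the result does not look like a WAV file.
decode_wav() {
  local out="$1" b64
  b64=$(cat)

  # jq -r prints the string "null" when the requested field is absent.
  if [ -z "$b64" ] || [ "$b64" = "null" ]; then
    echo "decode_wav: response contains no audio data" >&2
    return 1
  fi

  # Same decode flag as mimo.sh; some platforms spell it differently.
  if ! printf '%s' "$b64" | base64 -d > "$out" 2>/dev/null; then
    echo "decode_wav: base64 decode failed" >&2
    return 1
  fi

  # A RIFF/WAVE file starts with the four ASCII bytes "RIFF".
  if [ "$(head -c 4 "$out")" != "RIFF" ]; then
    echo "decode_wav: decoded data is not a WAV file" >&2
    return 1
  fi
}
```

Under these assumptions the existing line could become `jq -r '.choices[0].message.audio.data' "$tmp" | decode_wav "$raw_wav" || return 1`.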
Comment on lines +94 to +97
curl -fsS -o "$tmp" -X POST "$base/chat/completions" \
-H "api-key: $MIMO_API_KEY" \
-H "Content-Type: application/json" \
-d "$payload" 2>/dev/null
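In this call `-S` asks curl to print an error message even in silent mode, while the trailing `2>/dev/null` discards that message, so a failed request leaves the caller with only an exit code. A sketch of one alternative, assuming a hypothetical wrapper `run_reporting` (not in the PR) that keeps stderr quiet on success and replays it on failure:

```shell
# Hypothetical wrapper, not part of the PR: run a command with stderr captured,
# and show the captured output only when the command fails.
run_reporting() {
  local errlog code=0
  errlog=$(mktemp) || return 1

  # Capture stderr in a temp file instead of discarding it.
  "$@" 2>"$errlog" || code=$?

  if [ "$code" -ne 0 ]; then
    echo "command failed (exit $code): $1" >&2
    cat "$errlog" >&2
  fi

  rm -f "$errlog"
  return "$code"
}
```

With this helper the request would be written as `run_reporting curl -fsS -o "$tmp" -X POST "$base/chat/completions" ...`, dropping the `2>/dev/null`.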
Comment on lines +105 to +109
ffmpeg -y -i "$raw_wav" -codec:a libmp3lame -qscale:a 2 "$out" >/dev/null 2>&1
local code=$?

rm -f "$tmp" "$raw_wav"
return $code
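The cleanup above runs only if control reaches the `rm -f` line; whether earlier steps in `mimo.sh` can return before it is not visible in this excerpt. A sketch of a pattern that removes the temp files on every exit path, assuming a hypothetical wrapper `with_temp_files` (not in the PR) that runs a callback inside a subshell guarded by an `EXIT` trap:

```shell
# Hypothetical wrapper, not part of the PR: create two temp files, append their
# paths as the last two arguments of the given command, and always delete them.
with_temp_files() {
  (
    tmp="" raw=""
    # The EXIT trap fires when this subshell ends, whatever the exit status.
    trap 'rm -f "$tmp" "$raw"' EXIT
    tmp=$(mktemp) || exit 1
    raw=$(mktemp) || exit 1
    # The callback's exit status becomes the status of the whole wrapper.
    "$@" "$tmp" "$raw"
  )
}
```

The callback receives the JSON and WAV paths as its final two arguments, so the request, decode, and ffmpeg steps can each return early without leaving files behind.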
Comment on lines +80 to +81
# MiMo uses the Chat Completions format: text goes in role=assistant,
# voice/style goes in role=user. For preset voices, audio.voice is set.
--arg t "$text" \
--arg v "$voice" \
--arg m "$model" \
'{model:$m, messages:[{role:"user", content:""}, {role:"assistant", content:$t}], audio:{format:"wav", voice:$v}}')
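| `openai.sh` | OpenAI Audio Speech API | `OPENAI_API_KEY` env var | curl-based; most agents already have a key |
| `mimo.sh` | MiMo-V2.5-TTS API | `MIMO_API_KEY` env var | curl-based; 9 preset voices; strong for Chinese voice-over |

Only these two are built in; we don't make further technology choices for you. Code snippets for other backends are below,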
Comment on lines +66 to 69
│ └── tts-providers/ # one .sh per provider (3 built in)
│ ├── README.md # three-function contract + 5 ready-made code snippets (11labs / edge-tts / say / azure / gcloud)
│ ├── minimax.sh # default provider, uses mmx-cli
│ └── openai.sh # built-in OpenAI TTS (curl + OPENAI_API_KEY)