fix(tongyi): avoid gevent busy loop in STT #3261

Draft
Zenine wants to merge 1 commit into langgenius:main from Zenine:codex/tongyi-stt-subprocess-default

Zenine (Contributor) commented Jun 8, 2026

Summary

  • patch DashScope's async-to-sync websocket bridge in Tongyi STT when running under gevent monkey patching
  • keep TONGYI_STT_SUBPROCESS=1 as an explicit fallback path, instead of making subprocess execution the default
  • document the STT runtime behavior and bump the Tongyi plugin version to 0.2.1
  • constrain dashscope to the verified 1.25.x range because this fix intentionally shims an SDK internal helper

Why

Dify's plugin runtime imports dify_plugin, which globally applies gevent.monkey.patch_all(sys=True). DashScope Recognition.call() uses websocket duplex streaming and internally converts an async generator into a synchronous iterator through dashscope.common.utils.iter_over_async().

That bridge creates a thread and queue. Under gevent-patched threading/queue behavior, the Tongyi plugin process can remain in a busy event-loop state after STT requests, causing sustained high CPU in self-hosted plugin_daemon deployments.
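For readers unfamiliar with the pattern, the following is a minimal sketch of an async-to-sync bridge built on a thread and a queue. It is not DashScope's actual `iter_over_async()` code; the name `iter_over_async_sketch` and the tagged-tuple protocol are assumptions made for illustration only.

```python
import asyncio
import queue
import threading
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


def iter_over_async_sketch(ait: AsyncIterator[T]) -> Iterator[T]:
    """Yield items from an async iterator inside synchronous code."""
    results: queue.Queue = queue.Queue()

    async def drain() -> None:
        # Runs on the worker thread's own event loop.
        try:
            async for item in ait:
                results.put(("item", item))
        except Exception as exc:
            # Forward failures to the consuming thread.
            results.put(("error", exc))
        finally:
            # Always signal completion so the consumer stops waiting.
            results.put(("done", None))

    worker = threading.Thread(target=lambda: asyncio.run(drain()), daemon=True)
    worker.start()

    while True:
        kind, value = results.get()  # blocks until the worker produces something
        if kind == "item":
            yield value
        elif kind == "error":
            raise value
        else:
            break
    worker.join()
```

In a bridge of this shape, `threading.Thread` and `queue.Queue` are looked up from the standard modules at call time, so whatever gevent has patched into those modules is what the bridge ends up using.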

This change keeps the normal in-process Recognition path, but swaps DashScope's async bridge to use native threading.Thread and native queue.Queue when gevent has patched threading. The existing subprocess worker remains available by explicitly setting TONGYI_STT_SUBPROCESS=1.
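As a rough illustration of the explicit opt-in, the check could look like the sketch below. The helper name `use_subprocess_worker` is hypothetical, and treating only the literal value `1` as enabling is an assumption based on the value named in this description; the PR's actual parsing may accept other spellings.

```python
import os
from typing import Mapping, Optional


def use_subprocess_worker(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the subprocess fallback is explicitly requested."""
    env = os.environ if env is None else env
    # Assumption: only the literal "1" opts in; an unset variable keeps
    # the in-process Recognition path.
    return env.get("TONGYI_STT_SUBPROCESS", "").strip() == "1"
```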

The shim is guarded and process-local:

  • no-op when gevent is unavailable
  • no-op when threading has not been monkey-patched
  • logs a warning and falls back to the old behavior if DashScope internals are incompatible
  • applies once per plugin process
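The guard conditions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `apply_bridge_shim` and its `install` callback are hypothetical names, and the step that actually swaps DashScope's helper is left to the caller. The only gevent call used is `gevent.monkey.is_module_patched`.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)

_lock = threading.Lock()
_applied = False  # process-local: set once the shim has been installed


def gevent_patched_threading() -> bool:
    """True only when gevent is importable and has patched threading."""
    try:
        from gevent import monkey
    except ImportError:
        return False  # gevent unavailable: nothing to do
    return bool(monkey.is_module_patched("threading"))


def apply_bridge_shim(
    install: Callable[[], None],
    is_patched: Callable[[], bool] = gevent_patched_threading,
) -> bool:
    """Run install() at most once; return True if the shim is active."""
    global _applied
    with _lock:
        if _applied:
            return True  # already applied in this process
        if not is_patched():
            return False  # no-op without gevent-patched threading
        try:
            install()  # hypothetical: replaces the SDK's async bridge
        except (AttributeError, ImportError, TypeError) as exc:
            # Incompatible SDK internals: warn and keep the old behavior.
            logger.warning("STT bridge shim not applied: %s", exc)
            return False
        _applied = True
        return True
```

Passing `is_patched` as a parameter keeps the guard testable without importing gevent.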

No upstream DashScope issue has been filed yet from this PR; this is currently handled as a Tongyi plugin compatibility shim for Dify's gevent-based plugin runtime.

Field validation

Tested on a self-hosted BsoftRAG/Dify deployment on a Hygon/Haiguang server.

Setup:

  1. Removed TONGYI_STT_SUBPROCESS and TONGYI_STT_RECOGNITION_TIMEOUT from the plugin daemon runtime.
  2. Copied this PR's models/speech2text/speech2text.py into the installed Tongyi plugin venv.
  3. Restarted the Tongyi plugin process.
  4. Ran 3 waves of 30 concurrent /v1/audio-to-text requests against the chat-audio-test app.
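The load script used for step 4 is not part of this PR. A minimal sketch of the wave pattern, assuming a caller-supplied `send` function that performs one request and returns its HTTP status code, might look like this (`run_waves` and `send` are hypothetical names):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_waves(
    send: Callable[[], int], waves: int = 3, concurrency: int = 30
) -> List[List[int]]:
    """Fire `concurrency` requests at once, `waves` times, one wave after another."""
    results: List[List[int]] = []
    for _ in range(waves):
        # Each wave gets its own pool; leaving the with-block waits for
        # every request in the wave to finish before the next wave starts.
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            futures = [pool.submit(send) for _ in range(concurrency)]
            results.append([future.result() for future in futures])
    return results
```

CPU sampling between waves would be done separately, for example by reading the plugin_daemon process's usage at fixed delays after each wave completes.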

Observed result:

  • STT requests returned HTTP 200.
  • plugin_daemon had transient CPU during active STT processing.
  • After each wave, plugin_daemon returned to low CPU instead of staying at sustained single-core usage.
  • Observed post-wave CPU samples:
    • wave 1: 0.19% after 5s, 0.24% after 15s, 0.33% after 30s
    • wave 2: 0.47% after 5s, 0.31% after 15s, 0.27% after 30s
    • wave 3: 0.30% after 5s, 0.54% after 15s, 0.32% after 30s
  • The server was restored to its prior TONGYI_STT_SUBPROCESS=1 / TONGYI_STT_RECOGNITION_TIMEOUT=120 configuration after the experiment.

Local validation

Passed:

cd models/tongyi
uv run --python 3.12 pytest tests -k 'not test_llm_invoke'

Result: 17 passed, 75 deselected.

Passed:

uvx black models/tongyi/models/speech2text/speech2text.py models/tongyi/tests/test_speech2text.py -C -l 100
uvx ruff check models/tongyi/models/speech2text/speech2text.py models/tongyi/tests/test_speech2text.py

Result: All checks passed.

Not run successfully:

python3 -m pytest -q

The global pre-commit hook on the local machine runs the repository-wide pytest suite under the host's Python 3.14 environment, where plugin dependencies such as dify_plugin, dashscope, google, numpy, and tiktoken are not installed. The run fails during test collection, before it exercises this Tongyi change.

Also not run locally: tests/test_llm_call.py, because it requires a real DASHSCOPE_API_KEY.

Zenine temporarily deployed to models/tongyi on June 8, 2026 03:56 with GitHub Actions (inactive)

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the speech-to-text subprocess execution logic to be enabled by default, disabling it only when the TONGYI_STT_SUBPROCESS environment variable is set to a false-like value. The tests have been updated to accommodate this default behavior. Feedback on the tests recommends avoiding clearing the entire environment using patch.dict(os.environ, {}, clear=True) as it can cause unexpected failures, and suggests selectively filtering out only the target environment variable instead.

Comment thread on models/tongyi/tests/test_speech2text.py (outdated)

  audio = MagicMock(frame_rate=16000)
- with patch.dict(os.environ, {"TONGYI_STT_SUBPROCESS": "1"}):
+ with patch.dict(os.environ, {}, clear=True):

medium

Using clear=True with an empty dictionary in patch.dict(os.environ, {}, clear=True) completely clears all environment variables (including critical ones like PATH, PYTHONPATH, TMPDIR, etc.) for the duration of the block. This can cause unexpected failures in tests, subprocesses, or test runners in certain environments.

Instead, you should preserve the existing environment variables and only exclude or remove the TONGYI_STT_SUBPROCESS variable.

Suggested change
- with patch.dict(os.environ, {}, clear=True):
+ with patch.dict(os.environ, {k: v for k, v in os.environ.items() if k != "TONGYI_STT_SUBPROCESS"}, clear=True):
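Another way to get the same effect, sketched here as an assumption rather than as part of the review, is to let `patch.dict` snapshot the environment and remove only the one variable inside the block. `patch.dict` restores the original mapping on exit, including keys that were deleted. The helper name `without_env` is hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator
from unittest.mock import patch


@contextmanager
def without_env(name: str) -> Iterator[None]:
    """Temporarily unset one environment variable, leaving the rest intact."""
    # patch.dict with no values snapshots os.environ and restores it on exit.
    with patch.dict(os.environ):
        os.environ.pop(name, None)
        yield
```

Used as `with without_env("TONGYI_STT_SUBPROCESS"):`, this keeps PATH and the other variables untouched for the duration of the test.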

Zenine force-pushed the codex/tongyi-stt-subprocess-default branch from b2acd0c to fd8426f on June 8, 2026 04:08
Zenine changed the title from "fix(tongyi): enable STT subprocess by default" to "fix(tongyi): avoid gevent busy loop in STT" on Jun 8, 2026
Zenine temporarily deployed to models/tongyi on June 8, 2026 04:09 with GitHub Actions (inactive)
Zenine force-pushed the codex/tongyi-stt-subprocess-default branch from fd8426f to 4708347 on June 8, 2026 05:44
Zenine deployed to models/tongyi on June 8, 2026 05:45 with GitHub Actions (active)