Fix: Claude Code VS Code extension fails to parse responses when every chunk includes usage data#3670
zhaohuiweixiao wants to merge 3 commits into higress-group:main from
Conversation
This change is not very appropriate. When some large model services process streaming requests containing … Streaming response example (https://api.moonshot.cn/v1/chat/completions + moonshot-v1-8k):
…y chunk includes usage data Signed-off-by: zhaohuihui <zhaohuihui_yewu@cmss.chinamobile.com>
Force-pushed from 64dfd17 to 6f1c73c
Updated: a message_stop event is now sent when [DONE] is received.
Could there be scenarios where the server does not return [DONE]? We need to analyze the code further and verify with tests.
Do you mean we should test whether the mainstream implementations listed in that document all return [DONE]?

- HuggingFace TGI: sends `[DONE]\n` (with a trailing newline), which could cause parsing problems, but this has been fixed.
- FastChat: follows the OpenAI spec and supports streaming output.
- Ollama: does not send `[DONE]`; instead, it marks the final JSON chunk of the stream with `"done": true`.
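The three end-of-stream conventions above can be checked with a single helper. This is a minimal sketch for discussion, not the plugin's actual code; `isStreamEnd` and its field coverage are illustrative:

```go
package main

import (
	"encoding/json"
	"strings"
)

// isStreamEnd reports whether an SSE data payload marks the end of a
// streaming completion. It covers the three conventions discussed above:
//   - the OpenAI-style "[DONE]" sentinel (with or without a trailing newline)
//   - the Ollama-style `"done": true` field in the final JSON chunk
//   - a non-empty finish_reason on any choice
func isStreamEnd(payload string) bool {
	trimmed := strings.TrimSpace(payload)
	if trimmed == "[DONE]" {
		return true
	}
	var chunk struct {
		Done    bool `json:"done"`
		Choices []struct {
			FinishReason *string `json:"finish_reason"`
		} `json:"choices"`
	}
	if err := json.Unmarshal([]byte(trimmed), &chunk); err != nil {
		return false
	}
	if chunk.Done {
		return true
	}
	for _, c := range chunk.Choices {
		if c.FinishReason != nil && *c.FinishReason != "" {
			return true
		}
	}
	return false
}
```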
Ⅰ. Describe what this PR did
Fix bug: Claude Code VS Code extension fails to parse responses when every chunk includes usage data.

In the ai-proxy plugin, the logic that converts the OpenAI protocol to the Claude protocol treats any chunk containing usage as the end of the stream. Therefore, when the model attaches usage to every chunk, the plugin prematurely emits a message_stop event, causing parsing failures in the Claude Code VS Code extension.
Model output example:

```
data: {"id":"019d3d99245a5dd32971f17e72e2e4e3","object":"chat.completion.chunk","created":1774854940,"model":"Minimax-M2.5","choices":[{"index":0,"delta":{"role":"assistant","content":""}}],"system_fingerprint":"","usage":{"prompt_tokens":40,"completion_tokens":0,"total_tokens":40,"prompt_tokens_details":{},"completion_tokens_details":{}}}
data: {"id":"019d3d99245a5dd32971f17e72e2e4e3","object":"chat.completion.chunk","created":1774854940,"model":"Minimax-M2.5","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":"用户"}}],"system_fingerprint":"","usage":{"prompt_tokens":40,"completion_tokens":1,"total_tokens":41,"prompt_tokens_details":{},"completion_tokens_details":{"reasoning_tokens":1}}}
data: {"id":"019d3d99245a5dd32971f17e72e2e4e3","object":"chat.completion.chunk","created":1774854940,"model":"Minimax-M2.5","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":"用"}}],"system_fingerprint":"","usage":{"prompt_tokens":40,"completion_tokens":2,"total_tokens":42,"prompt_tokens_details":{},"completion_tokens_details":{"reasoning_tokens":2}}}
```
ai-proxy output:

Ⅱ. Does this pull request fix one issue?
Yes.
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Verify that a chunk is treated as the end of the stream only when it contains a finish_reason, rather than whenever it contains usage data.
Ⅴ. Special notes for reviews
Ⅵ. AI Coding Tool Usage Checklist (if applicable)
Please check all applicable items:
For new standalone features (e.g., new wasm plugin or golang-filter plugin):
- `design/` directory in the plugin folder

For regular updates/changes (not new plugins):
- AI Coding Prompts (for regular updates)
- AI Coding Summary