Fix Responses API background=true + stream=true panic (#1945) #2068

Open
glaziermag wants to merge 1 commit into EricLBuehler:master from glaziermag:fix-1945-responses-streaming

Conversation

@glaziermag
Contributor

Fixes #1945
(An alternative to the closed PR #1989, which aggressively disables streams.)

Description

This fixes a hard panic in the background worker when the /v1/responses endpoint is called with both background=true and stream=true.

The endpoint spawned a background Tokio worker that polled its async mpsc receiver with a single-shot match bg_rx.recv().await. With stream=true, the underlying text pipeline correctly emitted Response::Chunk(...) payloads. Because the receiver was not polled in a loop, the worker treated the first stream chunk as its only response and aborted immediately with Unexpected response type.

This PR wraps the receive in a while let loop that skips stream chunks in the background and still records the terminal Done, ModelError, or ValidationError state.
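For reviewers who want the control-flow change in isolation, here is a minimal sketch of the two receive patterns. It is not the code from this diff: the Response and Outcome enums, the function names, and the "validation_error" label are hypothetical stand-ins, and std::sync::mpsc replaces the Tokio channel so the snippet builds without dependencies.

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical stand-in for the engine's response type (assumption:
// the real enum carries richer payloads and more variants).
#[derive(Debug)]
enum Response {
    Chunk(String),
    Done(String),
    ModelError(String),
    ValidationError(String),
}

// Hypothetical summary of what the background worker would store.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(String),
    Failed { kind: &'static str, message: String },
}

// Helper for the sketch: queue messages, then close the channel by
// dropping the sender when this function returns.
fn feed(msgs: Vec<Response>) -> Receiver<Response> {
    let (tx, rx) = channel();
    for m in msgs {
        tx.send(m).unwrap();
    }
    rx
}

// Old pattern: a single recv. A leading Chunk hits the fallback arm.
fn single_shot(rx: &Receiver<Response>) -> Outcome {
    match rx.recv() {
        Ok(Response::Done(text)) => Outcome::Completed(text),
        Ok(Response::ModelError(m)) => Outcome::Failed { kind: "model_error", message: m },
        Ok(Response::ValidationError(m)) => Outcome::Failed { kind: "validation_error", message: m },
        _ => Outcome::Failed {
            kind: "unknown_error",
            message: "Unexpected response type".to_string(),
        },
    }
}

// New pattern: loop, skip chunks, return on the first terminal message.
fn drain_until_terminal(rx: &Receiver<Response>) -> Outcome {
    while let Ok(msg) = rx.recv() {
        match msg {
            Response::Chunk(_) => continue, // ignored in background mode
            Response::Done(text) => return Outcome::Completed(text),
            Response::ModelError(m) => {
                return Outcome::Failed { kind: "model_error", message: m }
            }
            Response::ValidationError(m) => {
                return Outcome::Failed { kind: "validation_error", message: m }
            }
        }
    }
    // The sender was dropped without ever sending a terminal message.
    Outcome::Failed {
        kind: "unknown_error",
        message: "Channel closed before a terminal response".to_string(),
    }
}
```

Feeding both functions the sequence Chunk, Chunk, Done shows the difference: single_shot fails on the first chunk, while drain_until_terminal reaches Done. How the real worker handles a channel that closes early is not shown in this description, so the fallback at the end of the loop is an assumption.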

Before vs. After Ground Truth

Before Logs (Failing Baseline)

$ curl -s -X POST http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{ "model": "hf-internal-testing/tiny-random-LlamaForCausalLM", "background": true, "stream": true, "input": "Hello" }'

# [GET /v1/responses/... 5 seconds later]
$ curl -s -X GET "http://localhost:1234/v1/responses/resp_10d61cee-e68f-4963-b049-048ea57caed8"
{"id":"resp_10d61cee-e68f-4963-b049-048ea57caed8","object":"response","created_at":1775517077,"model":"hf-internal-testing/tiny-random-LlamaForCausalLM","status":"failed","output":[],"error":{"type":"unknown_error","message":"Unexpected response type"}}

After Logs (Graceful Handling)

To show that the background worker now skips chunks and still reports the terminal state, I intentionally triggered an internal dimension-bounds error in the test sandbox. The response below reports it as a model_error instead of Unexpected response type.

$ curl -s -X POST http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{ "model": "hf-internal-testing/tiny-random-LlamaForCausalLM", "background": true, "stream": true, "input": "Hello" }'

# [GET /v1/responses/... 5 seconds later]
$ curl -s -X GET "http://localhost:1234/v1/responses/resp_c9d1fff5-ded1-4b78-b25b-b462706b3895"
{"id":"resp_c9d1fff5-ded1-4b78-b25b-b462706b3895","object":"response","created_at":1775517461,"model":"hf-internal-testing/tiny-random-LlamaForCausalLM","status":"failed","output":[],"error":{"type":"model_error","message":"narrow invalid args start + len > dim_len: [2048, 2], dim: 0, start: 2048, len:1"}}

@glaziermag force-pushed the fix-1945-responses-streaming branch from 1bf58be to 5cd2631 on April 6, 2026 23:47
@glaziermag marked this pull request as ready for review on April 6, 2026 23:50
Development

Successfully merging this pull request may close these issues.

Responses API: background=true + stream=true fails with 'Unexpected response type'