OpenAI realtime whisper hears speech but final transcript comes too late, so agent never replies

When using the OpenAI realtime transcriber (gpt-realtime-whisper) on Twilio calls, the system gets interim transcript chunks quickly, but the final transcript comes very late, sometimes 20–35 seconds later.

Because the backend only sends the user message to the LLM after the final transcript arrives, the assistant never replies during the actual call. Instead, it may play “Hey, are you still there?” or hang up for inactivity before the final transcript is processed.
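
To make the failure mode concrete, here is a minimal sketch of the gating logic described above, plus one possible fallback. It assumes interim chunks arrive as text deltas. `TurnGate`, its methods, and the 1.5 second quiet period are hypothetical names and values for illustration only; this is not the project's actual code.

```python
from typing import Optional


class TurnGate:
    """Decides when transcript text is handed to the LLM (hypothetical helper)."""

    def __init__(self, fallback_after_s: float = 1.5) -> None:
        # Assumed quiet period after the last interim chunk before we stop
        # waiting for the final transcript.
        self.fallback_after_s = fallback_after_s
        self._interim = ""
        self._last_interim_at: Optional[float] = None
        self._fallback_fired = False

    def on_interim(self, delta: str, now: float) -> None:
        # Accumulate interim deltas and remember when the last one arrived.
        self._interim += delta
        self._last_interim_at = now

    def on_final(self, text: str) -> Optional[str]:
        # If the fallback already sent this turn, drop the late final
        # so the LLM does not receive the same utterance twice.
        if self._fallback_fired:
            self._fallback_fired = False
            return None
        # Normal path: the final transcript is what goes to the LLM.
        self._interim = ""
        self._last_interim_at = None
        return text

    def poll(self, now: float) -> Optional[str]:
        # Called periodically. Returns buffered interim text once the
        # transcriber has been quiet for fallback_after_s seconds.
        if self._last_interim_at is None or not self._interim.strip():
            return None
        if now - self._last_interim_at < self.fallback_after_s:
            return None
        text = self._interim.strip()
        self._interim = ""
        self._last_interim_at = None
        self._fallback_fired = True
        return text
```

With only the `on_final` path, a final transcript that is 20–35 seconds late means nothing reaches the LLM during the call. The `poll` path is one way a backend could start the turn from interim text instead; whether that trade-off in transcript accuracy is acceptable is a design decision for the maintainers.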

What I observed:

- OpenAI transcriber connects successfully
- Interim transcript tokens appear quickly
- No LLM turn starts
- The final transcript arrives far too late
- The call can end before a reply happens

Example symptom:

The user says things like:

- “Hey Brooke, what’s up?”
- “What’s going on, Brooke?”
- “Hey Brooke, can you hear me?”

But the agent never answers in time.
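
For anyone trying to reproduce this, the delay can be quantified from logged transcriber events. The sketch below assumes each event has been reduced to a `(timestamp_seconds, kind)` pair where `kind` is `"interim"` or `"final"`; that shape and the function name `final_lag_seconds` are assumptions for illustration, not the project's logging format.

```python
from typing import Iterable, Optional, Tuple

# (timestamp in seconds, "interim" or "final"); hypothetical event shape
Event = Tuple[float, str]


def final_lag_seconds(events: Iterable[Event]) -> Optional[float]:
    """Seconds between the last interim chunk and the first final transcript.

    Returns None if no final transcript follows an interim chunk.
    """
    last_interim: Optional[float] = None
    for ts, kind in events:
        if kind == "interim":
            last_interim = ts
        elif kind == "final" and last_interim is not None:
            return ts - last_interim
    return None


# Made-up timestamps, for illustration only.
sample = [(0.5, "interim"), (1.0, "interim"), (26.0, "final")]
print(final_lag_seconds(sample))
```

A lag in the 20–35 second range on real call logs would match the behavior described in this report.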




Environment: git clone of the latest upstream master branch

OpenAI realtime whisper hears speech but final transcript comes too late, so agent never replies #711
