OpenAI realtime whisper hears speech but final transcript comes too late, so agent never replies #711

Description

@abdulrahmanmajid

When using the OpenAI realtime transcriber (gpt-realtime-whisper) on Twilio calls, the system gets interim transcript chunks quickly, but the final transcript comes very late, sometimes 20–35 seconds later.

Because the backend only sends the user message to the LLM after the final transcript arrives, the assistant never replies during the actual call. Instead, it may play “Hey, are you still there?” or hang up for inactivity before the final transcript is processed.
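The gating described above, plus one possible fallback, can be sketched as follows. This is a minimal illustration, not the project's actual code: `TurnBuffer`, `on_interim`, `on_final`, `tick`, and the 1.5 second threshold are all hypothetical names and values. It assumes interim chunks are text deltas that can be concatenated, that timestamps are passed in explicitly, and that the transcriber eventually sends exactly one final transcript per turn, in order.

```python
from typing import Callable, List, Optional


class TurnBuffer:
    """Hypothetical helper: commits a user turn on the final transcript,
    or on a silence timeout if the final transcript is late."""

    def __init__(self, on_commit: Callable[[str], None],
                 fallback_after_s: float = 1.5) -> None:
        self.on_commit = on_commit                # e.g. "send user message to the LLM"
        self.fallback_after_s = fallback_after_s  # assumed threshold, tune per deployment
        self._parts: List[str] = []
        self._last_interim_at: Optional[float] = None
        self._stale_finals = 0                    # finals still owed for fallback-committed turns

    def on_interim(self, text: str, now: float) -> None:
        # Accumulate interim deltas and remember when the last one arrived.
        self._parts.append(text)
        self._last_interim_at = now

    def on_final(self, text: str) -> None:
        if self._stale_finals > 0:
            # This turn was already committed by the fallback; drop the late final.
            self._stale_finals -= 1
            return
        self._reset()
        self.on_commit(text)

    def tick(self, now: float) -> None:
        # Call periodically; commits buffered interim text after enough silence.
        if self._last_interim_at is None:
            return
        if now - self._last_interim_at >= self.fallback_after_s:
            text = "".join(self._parts).strip()
            self._reset()
            if text:
                self._stale_finals += 1
                self.on_commit(text)

    def _reset(self) -> None:
        self._parts = []
        self._last_interim_at = None


# Usage: the final transcript is late, so the fallback commits the turn first.
sent: List[str] = []
buf = TurnBuffer(sent.append, fallback_after_s=1.5)
buf.on_interim("Hey Brooke, ", now=0.0)
buf.on_interim("what's up?", now=0.4)
buf.tick(now=1.0)   # only 0.6 s of silence, nothing committed yet
buf.tick(now=2.0)   # 1.6 s of silence, interim text is committed
buf.on_final("Hey Brooke, what's up?")  # late final is ignored
```

With a fallback like this, the LLM turn would start shortly after the caller stops speaking instead of waiting 20–35 seconds for the final transcript. The trade-off is that the committed text comes from interim chunks, which may be less accurate than the final transcript.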

What I observed:

OpenAI transcriber connects successfully
interim transcript tokens appear quickly
no LLM turn starts
final transcript arrives way too late
call can end before reply happens

Example symptom: the user says things like:

“Hey Brooke, what’s up?”
“What’s going on, Brooke?”
“Hey Brooke, can you hear me?”

But the agent never answers in time.

Environment: git clone of the latest upstream master branch
