When using the OpenAI realtime transcriber (gpt-realtime-whisper) on Twilio calls, the system gets interim transcript chunks quickly, but the final transcript comes very late, sometimes 20–35 seconds later.
Because the backend only sends the user message to the LLM after the final transcript arrives, the assistant never replies during the actual call. Instead, it may play “Hey, are you still there?” or hang up for inactivity before the final transcript is processed.
What I observed:
The OpenAI transcriber connects successfully
Interim transcript tokens appear quickly
No LLM turn starts while only interim tokens are arriving
The final transcript arrives far too late (sometimes 20–35 seconds later)
The call can end before any reply is generated
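To make the timing problem concrete, here is a minimal, self-contained sketch of the gating behavior described above. It is not the project's actual code: `TurnGate`, `fallback_s`, `poll`, and the timestamps are hypothetical names and values chosen for illustration. With no fallback, the user message is only sent when the final transcript shows up. With a fallback, a stable interim transcript is promoted after a short silence, which is one possible workaround.

```python
from typing import List, Optional, Tuple


class TurnGate:
    """Hypothetical gate deciding when a user utterance is sent to the LLM."""

    def __init__(self, fallback_s: Optional[float] = None) -> None:
        # fallback_s=None models the reported behavior: wait for the final only.
        self.fallback_s = fallback_s
        self.interim = ""
        self.last_interim_at: Optional[float] = None
        self.flushed_by_fallback = False
        self.sent: List[Tuple[float, str]] = []  # (time, text) sent to the LLM

    def on_interim(self, text: str, now: float) -> None:
        # Interim chunks only update the buffer; they never start an LLM turn.
        self.interim = text
        self.last_interim_at = now

    def on_final(self, text: str, now: float) -> None:
        if self.flushed_by_fallback:
            # The fallback already sent this utterance; ignore the late final.
            # (A real implementation would need to match finals to utterances.)
            self.flushed_by_fallback = False
            return
        self._send(text, now)

    def poll(self, now: float) -> None:
        # Assumed to be called periodically, e.g. from the audio loop.
        if self.fallback_s is None or self.last_interim_at is None:
            return
        if now - self.last_interim_at >= self.fallback_s:
            self.flushed_by_fallback = True
            self._send(self.interim, now)

    def _send(self, text: str, now: float) -> None:
        self.sent.append((now, text))
        self.interim = ""
        self.last_interim_at = None


def simulate(gate: TurnGate) -> List[Tuple[float, str]]:
    # Illustrative timeline in seconds: fast interims, then a very late final.
    gate.on_interim("Hey Brooke,", now=0.4)
    gate.on_interim("Hey Brooke, can you hear me?", now=1.2)
    for t in (2.0, 3.0, 10.0):
        gate.poll(now=t)
    gate.on_final("Hey Brooke, can you hear me?", now=25.0)
    return gate.sent
```

In this model, `simulate(TurnGate())` sends the message only at t=25.0, long after an idle check would have fired, while `simulate(TurnGate(fallback_s=1.5))` sends it at the first poll after 1.5 seconds without new interim text (t=3.0 here). The 1.5 second threshold is an arbitrary assumption, not a tuned value.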
Example symptom:
User says things like:
“Hey Brooke, what’s up?”
“What’s going on, Brooke?”
“Hey Brooke, can you hear me?”
But the agent never answers in time.
Environment (git clone of the latest upstream master branch)