tl;dr:
Claude wrote this (and, spoiler, it wrote the accompanying PR too).
I'm open to other solutions, but it does seem like AudioProducer.flush() would be clutch to interrupt voice agents.
Summary
moq.AudioProducer.write() (Python bindings, moq-rs 0.2.17) is fire-and-forget: there is no way to drop already-written audio from the encoder, wire, or jitter buffer without closing the whole track. For real-time interactive use cases (voice agents, where the user interrupts the bot mid-utterance), this means we can stop generating audio, but the already-buffered audio still finishes playing on the consumer side.
Repro / context
Voice-agent pipeline:
broadcast = moq.BroadcastProducer()
audio = broadcast.publish_audio(
    "bot-audio",
    moq.AudioEncoderInput(format=moq.AudioFormat.S16, sample_rate=24000, channels=1),
    moq.AudioEncoderOutput(codec=moq.AudioCodec.OPUS, frame_duration_ms=20, ...),
)

# Bot starts speaking; many `write()`s queue up in the encoder/wire/browser.
for chunk in tts_stream():
    audio.write(moq.AudioFrame(timestamp_us=0, data=chunk))

# User interrupts. We stop generating new audio. But there's no API to
# drop the in-flight buffer, so the user keeps hearing the bot for
# ~hundreds of ms more.
Attempted workaround: calling audio.finish() and then broadcast.publish_audio("bot-audio", ...) again raises "Error processing frame: duplicate". The broadcast won't accept republishing under the same track name.
Today we work around this by pacing write() calls against a wall-clock virtual timer in Python so the in-flight buffer never exceeds ~20 ms. It works, but it's a backpressure mechanism reinvented in userland.
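For reference, a minimal sketch of what that pacing workaround looks like. This is not our exact code: the Pacer class, its method names, and the injectable clock/sleep parameters are made up for illustration, and it touches no moq API at all.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Hypothetical helper: keeps written-but-unplayed audio within a budget."""

    def __init__(
        self,
        budget_s: float = 0.020,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.budget_s = budget_s
        self._clock = clock
        self._sleep = sleep
        # Virtual wall-clock time at which everything written so far
        # would have finished playing in real time.
        self._playhead_end: Optional[float] = None

    def wait_for_room(self, chunk_s: float) -> float:
        """Sleep until a chunk of `chunk_s` seconds fits; return seconds slept."""
        now = self._clock()
        if self._playhead_end is None or self._playhead_end < now:
            # First write, or we fell behind real time: restart the timeline.
            self._playhead_end = now
        lead = self._playhead_end - now  # audio written but not yet "played"
        # Wait until lead + chunk fits in the budget, but never longer than
        # it takes the buffer to drain (a chunk larger than the budget is
        # written as soon as the buffer is empty).
        delay = max(0.0, min(lead, lead + chunk_s - self.budget_s))
        if delay > 0:
            self._sleep(delay)
        self._playhead_end += chunk_s
        return delay

    def reset(self) -> None:
        """Forget the timeline, e.g. after the user interrupts the bot."""
        self._playhead_end = None
```

In the loop above this would mean calling pacer.wait_for_room(len(chunk) / (2 * 24000)) before each audio.write(...), since S16 mono at 24 kHz is 2 bytes per sample, and pacer.reset() on interruption. The catch is that it only bounds what the Python side has handed over; it cannot see or drop anything already inside the encoder or on the wire.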
Requested API
Add AudioProducer.flush() / AudioProducer.cancel() to drop any unencoded and unsent frames without closing the track. On the consumer side this would be observable as a brief skip in playback.
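To make the requested semantics concrete, here is a toy model in plain Python. It is only a sketch of the behavior being asked for, not moq code: PendingAudio, next_to_send(), and the integer return value of flush() are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional


class PendingAudio:
    """Toy stand-in for a producer's queue of frames not yet sent."""

    def __init__(self) -> None:
        self._pending: Deque[bytes] = deque()
        self.closed = False

    def write(self, frame: bytes) -> None:
        """Queue a frame for sending; fails only if the track was closed."""
        if self.closed:
            raise RuntimeError("track is closed")
        self._pending.append(frame)

    def next_to_send(self) -> Optional[bytes]:
        """Pop the oldest pending frame, or None if nothing is queued."""
        return self._pending.popleft() if self._pending else None

    def flush(self) -> int:
        """Drop everything still pending; the track itself stays open."""
        dropped = len(self._pending)
        self._pending.clear()
        return dropped
```

The point of the model is the last method: after flush(), nothing queued earlier is ever sent, but write() keeps working on the same track, so the bot can start its next utterance without republishing.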
Why this matters
This is the main blocker for moq-rs-based real-time conversational AI (voice agents). Without it, interruption latency is roughly the sum of pacing buffer + WebTransport send window + browser jitter buffer, and the only one the bot can control is the first. We're keeping our pacing budget at 20 ms, which makes the pipeline very sensitive to jitter.