
[fal.ai/livepeer-staging] TrickleSegmentWriteError mid-session — segment write fails to orch-staging-1 while session is active #912

@livepeer-tessa

Summary

The livepeer_gateway.trickle_publisher is throwing TrickleSegmentWriteError during active sessions (not on teardown), failing to POST trickle segments to the orchestrator. This is distinct from #846 which covers 404 errors on session teardown. Here the stream is confirmed live (hundreds of prior segments succeeded), and then a segment POST fails mid-session.

cc @mjh1 @emranemran

Error Logs (Grafana Loki, 2026-04-10 ~20:18–20:21 UTC)

2026-04-10 20:18:47,802 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583 error=Trickle POST exception ...

2026-04-10 20:20:52,724 - livepeer_gateway.trickle_publisher - WARNING - Trickle POST retrying same segment url=.../d21a61c3-4-out/584 (no request body consumed)

2026-04-10 20:21:52,932 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=.../d21a61c3-4-out/584 error=...

Stack Trace

livepeer_gateway.trickle_publisher.TrickleSegmentWriteError: Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/media_publish.py", line 856, in _stream_pipe_to_trickle
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 574, in write
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 224, in _run_post

Context

  • Session: d21a61c3 on orch-staging-1.daydream.monster:8935
  • Frequency: 2 segment failures (seq 583 and 584) in same session
  • Session stats at time of error: 583 segments started, 582 completed, 1 failed — stream was active ~25 minutes (elapsed_s=1490)
  • App: github_f1lhgmk5v76a0ev1w0u378by-scope-livepeer-staging
  • Pattern: segment 583 POST failed → segment 584 was retried with the 'no request body consumed' warning → the retry also failed → both segments end in TrickleSegmentWriteError

Probable Cause

The orchestrator dropped the connection or rejected the write mid-session (not EOF/404). The 'no request body consumed' warning on retry suggests the orch-side HTTP server may have closed the connection before reading the body — possibly a timeout, transient network issue, or orch-staging-1 hiccup.
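To illustrate the suspected failure mode, here is a minimal, hypothetical reproduction (not the actual gateway code): a local server accepts the connection and closes it before reading the request body, which is what would make the client see a mid-stream write failure with no body consumed. The URL path and body size are made up for the demo.

```python
import http.client
import socket
import threading

def close_early_server(sock):
    """Accept one connection and close it without reading the request body,
    mimicking an HTTP server that drops the connection mid-write."""
    conn, _ = sock.accept()
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=close_early_server, args=(srv,), daemon=True).start()

err = None
conn = http.client.HTTPConnection("127.0.0.1", port)
try:
    # A large body forces the write to fail mid-stream, like a trickle segment
    conn.request("POST", "/ai/trickle/demo/583", body=b"x" * (1 << 20))
    conn.getresponse()
except (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected) as e:
    err = e
    print(f"segment POST failed: {type(e).__name__}")
```

Depending on platform and buffer sizes, the failure surfaces either during the body send (`BrokenPipeError`/`ConnectionResetError`) or when reading the response (`RemoteDisconnected`), which is consistent with the client never knowing how much of the body the server consumed.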

Related Issues

  • #846 — 404 errors on session teardown (distinct failure mode: this issue is mid-session with the stream confirmed live)

Suggested Fix

  • Add explicit distinction between mid-session write failures vs teardown-404s in error handling
  • Consider retry logic with backoff for TrickleSegmentWriteError (currently appears to do one retry then fail the segment)
  • Add alerting/metric when segments_failed > 0 in MediaPublishStats
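As a sketch of the second bullet, here is one possible shape for jittered exponential backoff around a segment POST. All names here (`post_with_backoff`, `post_fn`, the parameters) are hypothetical, not the publisher's actual API; the real code currently appears to retry once and then fail the segment.

```python
import random
import time

class TrickleSegmentWriteError(Exception):
    """Stand-in for livepeer_gateway.trickle_publisher.TrickleSegmentWriteError."""

def post_with_backoff(post_fn, seq, attempts=4, base=0.5, cap=8.0):
    """Retry a segment POST with jittered exponential backoff.

    post_fn(seq) should attempt the POST and raise TrickleSegmentWriteError
    on failure. Delays grow as base * 2**attempt, capped at `cap`, with
    0.5-1.0x jitter to avoid synchronized retries across sessions.
    """
    for attempt in range(attempts):
        try:
            return post_fn(seq)
        except TrickleSegmentWriteError:
            if attempt == attempts - 1:
                raise  # exhausted retries: fail the segment
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            time.sleep(delay)
```

A cap around one segment duration would keep retries from stalling the live stream; past that point it is probably better to fail the segment and let the session-level error handling decide whether to swap orchestrators.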

Metadata

Assignees: none
Labels: bug