
[fal.ai/livepeer-staging] TrickleSegmentWriteError mid-session — segment write fails to orch-staging-1 while session is active #912

@livepeer-tessa

Summary

The livepeer_gateway.trickle_publisher is throwing TrickleSegmentWriteError during active sessions (not on teardown), failing to POST trickle segments to the orchestrator. This is distinct from #846 which covers 404 errors on session teardown. Here the stream is confirmed live (hundreds of prior segments succeeded), and then a segment POST fails mid-session.

cc @mjh1 @emranemran

Error Logs (Grafana Loki, 2026-04-10 ~20:18–20:21 UTC)

2026-04-10 20:18:47,802 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583 error=Trickle POST exception ...

2026-04-10 20:20:52,724 - livepeer_gateway.trickle_publisher - WARNING - Trickle POST retrying same segment url=.../d21a61c3-4-out/584 (no request body consumed)

2026-04-10 20:21:52,932 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=.../d21a61c3-4-out/584 error=...

Stack Trace

livepeer_gateway.trickle_publisher.TrickleSegmentWriteError: Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/media_publish.py", line 856, in _stream_pipe_to_trickle
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 574, in write
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 224, in _run_post

Context

  • Session: d21a61c3 on orch-staging-1.daydream.monster:8935
  • Frequency: 2 segment failures (seq 583 and 584) in same session
  • Session stats at time of error: 583 segments started, 582 completed, 1 failed — stream was active ~25 minutes (elapsed_s=1490)
  • App: github_f1lhgmk5v76a0ev1w0u378by-scope-livepeer-staging
  • Pattern: segment 583 POST failed → segment 584 was retried with the 'no request body consumed' warning → the retry also failed → both segments end in TrickleSegmentWriteError

Probable Cause

The orchestrator dropped the connection or rejected the write mid-session (not EOF/404). The 'no request body consumed' warning on retry suggests the orch-side HTTP server may have closed the connection before reading the body — possibly a timeout, transient network issue, or orch-staging-1 hiccup.
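To illustrate the suspected failure mode, here is a minimal, hypothetical reproduction (not the actual gateway code): a local server accepts the connection and closes it before reading the request body, which is what would make the client see a mid-stream write failure with no body consumed. The URL path and body size are made up for the demo.

```python
import http.client
import socket
import threading

def close_early_server(sock):
    """Accept one connection and close it without reading the request body,
    mimicking an HTTP server that drops the connection mid-write."""
    conn, _ = sock.accept()
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=close_early_server, args=(srv,), daemon=True).start()

err = None
conn = http.client.HTTPConnection("127.0.0.1", port)
try:
    # A large body forces the write to fail mid-stream, like a trickle segment
    conn.request("POST", "/ai/trickle/demo/583", body=b"x" * (1 << 20))
    conn.getresponse()
except (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected) as e:
    err = e
    print(f"segment POST failed: {type(e).__name__}")
```

Depending on platform and buffer sizes, the failure surfaces either during the body send (`BrokenPipeError`/`ConnectionResetError`) or when reading the response (`RemoteDisconnected`), which is consistent with the client never knowing how much of the body the server consumed.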

Related Issues

  • #846 — 404 errors on session teardown (distinct failure mode: this issue is mid-session with the stream confirmed live)

Suggested Fix

  • Add explicit distinction between mid-session write failures vs teardown-404s in error handling
  • Consider retry logic with backoff for TrickleSegmentWriteError (currently appears to do one retry then fail the segment)
  • Add alerting/metric when segments_failed > 0 in MediaPublishStats
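As a sketch of the second bullet, here is one possible shape for jittered exponential backoff around a segment POST. All names here (`post_with_backoff`, `post_fn`, the parameters) are hypothetical, not the publisher's actual API; the real code currently appears to retry once and then fail the segment.

```python
import random
import time

class TrickleSegmentWriteError(Exception):
    """Stand-in for livepeer_gateway.trickle_publisher.TrickleSegmentWriteError."""

def post_with_backoff(post_fn, seq, attempts=4, base=0.5, cap=8.0):
    """Retry a segment POST with jittered exponential backoff.

    post_fn(seq) should attempt the POST and raise TrickleSegmentWriteError
    on failure. Delays grow as base * 2**attempt, capped at `cap`, with
    0.5-1.0x jitter to avoid synchronized retries across sessions.
    """
    for attempt in range(attempts):
        try:
            return post_fn(seq)
        except TrickleSegmentWriteError:
            if attempt == attempts - 1:
                raise  # exhausted retries: fail the segment
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            time.sleep(delay)
```

A cap around one segment duration would keep retries from stalling the live stream; past that point it is probably better to fail the segment and let the session-level error handling decide whether to swap orchestrators.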

Metadata

Assignees: none
Labels: bug