`experimental-image-proxy`: hangs indefinitely when CDN stalls mid-blob-download (FinishPipe deadlock)

In Fedora CoreOS we've been running into an odd bug in our upgrade tests in our pipeline. I filed an issue for that over in https://github.com/coreos/fedora-coreos-tracker/issues/2149 (it's a good read if you have time).

Following on from that I had AI analyze to see if the bug was fixed in newer code and it thinks it isn't and needs to be fixed. I generated this AI bug report which I think summarizes the problem well:

## Summary

`skopeo experimental-image-proxy` hangs permanently when an HTTP connection stalls mid-transfer during a blob download. The proxy's `FinishPipe` handler blocks on `wg.Wait()` while holding the handler mutex, which deadlocks the entire proxy — no further requests can be processed, and the calling application (rpm-ostree, bootc, etc.) hangs indefinitely.

This was observed during an automated Fedora CoreOS upgrade via zincati/rpm-ostree. The Akamai CDN serving a Quay.io blob stopped sending data at 76% completion while keeping the TCP connection alive. Because there is no idle timeout anywhere in the pipeline (`containers/image` HTTP client, skopeo proxy goroutine, or the proxy protocol), the entire system deadlocked permanently.

## Environment

- **OS**: Fedora CoreOS (cosa-devsh VM, QEMU NAT networking)
- **Package**: `rpm-ostree-2025.6-6.fc42.x86_64`
- **skopeo**: bundled with rpm-ostree (protocol version `0.2.6`)
- **Image**: `quay.io/fedora/fedora-coreos@sha256:c6a73015376ade9c5ae50e8cca5356e542a51edb8df4a05590abfaf268edeaa9`
- **Triggered by**: `zincati.service` (automatic update agent)
- **Proxy invocation**: `setpriv --no-new-privs --init-groups --reuid rpm-ostree --bounding-set -all --pdeathsig TERM -- skopeo experimental-image-proxy --no-creds`

## Symptoms

With `CONTAINERS_IMAGE_PROXY_DEBUG=1`, the process prints the following and then hangs forever:

```
DEBUG new_with_config: Spawned skopeo pid=3978 config=ImageProxyConfig { authfile: None, auth_data: None, auth_anonymous: true, ... }
DEBUG new_with_config: Remote protocol version: 0.2.6 ...
Fetching ostree-remote-image:fedora:docker://quay.io/fedora/fedora-coreos@sha256:c6a730...
DEBUG open_image: opening image ...
DEBUG finish_pipe: closing pipe ... pipeid=7
DEBUG finish_pipe: closing pipe ... pipeid=8
layers already present: 36; layers needed: 15 (245.6 MB)
DEBUG Fetching base
DEBUG get_layer_info: Getting layer info ...
DEBUG fetching sha256:a283e9be7391dc4a3d858e321574277c621faad113a8f6b36e0511cd05352448
DEBUG get_blob: fetching blob ... size=2759859
DEBUG layer progress: [0/15] Fetching ostree chunk a283e9be7391dc4a3d8 (2.8 MB)
DEBUG finish_pipe: closing pipe ... pipeid=8
    ← hangs here forever, no further output
```

## Root Cause Analysis

Three issues combine to produce a permanent deadlock:

### 1. CDN stalls mid-transfer (trigger)

The Akamai CDN edge server (`23.61.255.227`, serving `cdn01.quay.io`) stopped sending data after delivering 76% (2,124,175 of 2,759,859 bytes) of a blob. The TCP connection remained alive — keepalive ACKs continued flowing — but no application data was sent. `ss -tnpi` output during the hang:

```
ESTAB  0  0  10.0.2.15:50856  23.61.255.227:443  users:(("skopeo",pid=3978,fd=6))
  bytes_received:2124175
  lastsnd:1341635   ← last data sent ~22 minutes ago
  lastrcv:1341446   ← last data received ~22 minutes ago
  lastack:12438     ← TCP keepalive ACKs still flowing (~12 sec ago)
  rcv_wnd:2636640   ← skopeo advertising 2.5 MB receive window (not a flow control issue)
```

### 2. No idle timeout on HTTP body reads (`containers/image`)

The `containers/image` library's HTTP transport ([`pkg/tlsclientconfig/tlsclientconfig.go:88-101`](https://github.com/containers/image/blob/main/pkg/tlsclientconfig/tlsclientconfig.go#L88-L101)) sets TCP-level timeouts but has **no HTTP body read timeout**:

```go
func NewTransport() *http.Transport {
    direct := &net.Dialer{
        Timeout:   30 * time.Second,  // TCP connect only
        KeepAlive: 30 * time.Second,  // TCP keepalive interval
    }
    tr := &http.Transport{
        Proxy:               http.ProxyFromEnvironment,
        DialContext:         direct.DialContext,
        TLSHandshakeTimeout: 10 * time.Second,
        IdleConnTimeout:     90 * time.Second,  // idle conn pool, NOT active reads
        MaxIdleConns:        100,
        // No ResponseHeaderTimeout
        // No overall Timeout on http.Client
        // No body-read idle timeout
    }
    return tr
}
```

The `bodyReader` wrapper ([`docker/body_reader.go:139`](https://github.com/containers/image/blob/main/docker/body_reader.go#L139)) tracks `lastSuccessTime` (line 147) but **only uses it for debug logging** (line 224-226), never to enforce a deadline. Its reconnection logic only handles hard failures (`ECONNRESET`, `ErrUnexpectedEOF`), not stalls where the connection stays open but no data arrives.

### 3. `FinishPipe` holds mutex during `wg.Wait()` — deadlocks entire proxy (skopeo/container-libs)

In the proxy handler ([`common/pkg/json-proxy/handler.go`](https://github.com/containers/container-libs/blob/main/common/pkg/json-proxy/handler.go)):

**`GetBlob`** (line 427) spawns a goroutine that copies the blob from the HTTP response body to a pipe:
```go
func (h *handler) GetBlob(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()
    defer h.lock.Unlock()
    // ...
    go func() {
        defer blobr.Close()
        defer f.wg.Done()
        // ...
        n, err := io.Copy(f.w, tr)  // line 472 — blocks if HTTP body stalls
        // ...
    }()
    // returns pipeid + fd to client
}
```

The goroutine at line 466 does **not use any context** — it does a raw `io.Copy` with no cancellation mechanism. If the HTTP body read stalls, this goroutine blocks forever.

**`FinishPipe`** (line 664) waits for the goroutine to complete **while holding the handler mutex**:
```go
func (h *handler) FinishPipe(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()         // line 665
    defer h.lock.Unlock() // line 666
    // ...
    f.wg.Wait()           // line 682 — blocks forever if goroutine is stuck
    // ...
}
```

Since `h.lock` is held, **all** other handler methods (`GetBlob`, `GetManifest`, `OpenImage`, etc.) are blocked. The proxy cannot process any requests — including a potential shutdown/cleanup — until the stalled goroutine completes, which it never will because the CDN has stopped sending data.

## Deadlock Diagram

```
rpm-ostree (pid 3980)                 skopeo proxy (pid 3986)              Akamai CDN

  sendto(29, FinishPipe[8]) ────────►  recvfrom(3, FinishPipe[8])
                                       h.lock.Lock()  ← acquired
                                       f.wg.Wait()    ← BLOCKED
  recvmsg(29) ← BLOCKED                  │
    waiting for response                  │  goroutine still running:
    that never comes                      │  io.Copy(pipe, httpBody)
                                          │     │
                                          │     └──► Read(fd=6) ──────────► CDN stalled
                                          │          EAGAIN                  no data
                                          │          epoll: no events        keepalive OK
                                          │          goroutine parked        connection alive
                                          │                                  but silent
                                          ▼
                                       wg never Done()
                                       lock never released
                                       proxy fully deadlocked
```

## Protocol Message Trace (from `strace -f`)

The image proxy protocol uses a `SOCK_SEQPACKET` Unix socket. JSON requests are sent from rpm-ostree, and JSON responses (with optional file descriptors via `SCM_RIGHTS`) come back from skopeo. The strace shows 9 requests sent but only 8 responses received:

| # | Request | Response | Status |
|---|---------|----------|--------|
| 1 | *version handshake* | `"0.2.6"` | OK |
| 2 | `OpenImage(docker://quay.io/...)` | `value: 1` | OK |
| 3 | `GetManifest(1)` | `sha256:c6a7...`, `pipeid=7`, fd passed | OK |
| 4 | `FinishPipe(7)` | `null` | OK |
| 5 | *get config/metadata* | `null`, `pipeid=8`, fd passed | OK |
| 6 | `FinishPipe(8)` | `null` | OK |
| 7 | `GetLayerInfo(1)` | 8178 bytes of layer JSON | OK |
| 8 | `GetBlob(1, "sha256:a283e9...", 2759859)` | `value: 2759859`, `pipeid=8`, fd passed | OK |
| 9 | `FinishPipe(8)` | **NO RESPONSE** | **DEADLOCK** |

Evidence from strace — skopeo received all 9 requests on fd 3:

```
[pid  3978] recvfrom(3, "{\"method\":\"OpenImage\",\"args\":[\"docker://quay.io/...\"]}", ...)
[pid  3984] recvfrom(3, "{\"method\":\"GetManifest\",\"args\":[1]}", ...)
[pid  3984] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[7]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[8]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"GetLayerInfo\",\"args\":[1]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"GetBlob\",\"args\":[1,\"sha256:a283e9...\",2759859]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[8]}", ...)   ← received, never responded
```

But only 8 `sendmsg(3, ...)` responses were sent — the 9th (for `FinishPipe(8)`) was never sent because the handler is stuck in `wg.Wait()`.

Evidence from strace — rpm-ostree blocked waiting for the response:

```
[pid  3980] sendto(29, '{"method":"FinishPipe","args":[8]}', ...) = 34
[pid  3980] recvmsg(29, ...    ← BLOCKED, waiting for response that never comes
```

Evidence from strace — skopeo's blob streaming goroutine stalled:

```
[pid  3986] write(8, ..., 4264) = 4264    # last chunk written to pipe (fd 8)
[pid  3986] read(6, ...) = -1 EAGAIN      # no more data from CDN TLS connection (fd 6)
[pid  3986] epoll_pwait(4, [], ...) = 0    # no events on any fd
[pid  3986] futex(FUTEX_WAIT, ...)         # goroutine parked by Go runtime
```

### Download progress at stall

| Metric | Value |
|--------|-------|
| Expected blob size | 2,759,859 bytes |
| Bytes received from CDN (`ss bytes_received`) | 2,124,175 bytes |
| Bytes written to pipe (summed from strace `write(8,...)`) | ~2,107,617 bytes |
| Completion | **76.4%** |

## Why This Only Affects the Proxy Path

This deadlock does **not** occur with direct `podman pull` or `skopeo copy` because:

1. **Direct pulls** use the caller's context (which can be canceled via ctrl-C, systemd timeout, etc.). When the context is canceled, Go's `net/http` closes the underlying TCP connection, and the `Read()` returns an error. The stall is recoverable.

2. **The proxy path** passes `context.Background()` to the handler — no deadline, no cancellation. The `io.Copy` goroutine has no context at all. The only thing that can unblock it is the TCP connection actually dying (via TCP keepalive, which can take hours with the OS defaults).

3. **The mutex in `FinishPipe`** amplifies the impact: a stalled blob download doesn't just block that one operation — it deadlocks the entire proxy, preventing any other request from being processed including potential cleanup/shutdown.

## Suggested Fixes

### Fix 1 (proxy — high priority): Don't hold mutex during `wg.Wait()`

In `handler.go`, `FinishPipe` should release the mutex before waiting:

```go
func (h *handler) FinishPipe(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()
    // ... validate pipeid, get f ...
    h.lock.Unlock()  // release before blocking

    f.wg.Wait()      // now other requests can proceed while we wait

    h.lock.Lock()    // re-acquire to clean up
    delete(h.activePipes, pipeid)
    h.lock.Unlock()

    f.w.Close()
    return ret, f.err
}
```

### Fix 2 (proxy — high priority): Wire context through to the `io.Copy` goroutine

The `GetBlob` goroutine should respect context cancellation:

```go
go func() {
    defer blobr.Close()
    defer f.wg.Done()
    verifier := d.Verifier()
    tr := io.TeeReader(blobr, verifier)

    // Use a context-aware copy
    done := ctx.Done()
    buf := make([]byte, 32*1024)
    for {
        select {
        case <-done:
            f.err = ctx.Err()
            return
        default:
        }
        n, err := tr.Read(buf)
        if n > 0 {
            if _, werr := f.w.Write(buf[:n]); werr != nil {
                f.err = werr
                return
            }
        }
        if err != nil {
            if err != io.EOF {
                f.err = err
            }
            return
        }
    }
}()
```

### Fix 3 (`containers/image` — medium priority): Add idle timeout to `bodyReader`

The `bodyReader` in `docker/body_reader.go` already tracks `lastSuccessTime`. It should use this to enforce an idle timeout, e.g.:

```go
func (br *bodyReader) Read(p []byte) (int, error) {
    // Set a per-read deadline on the underlying connection
    if br.idleTimeout > 0 && !br.lastSuccessTime.IsZero() {
        if time.Since(br.lastSuccessTime) > br.idleTimeout {
            return 0, fmt.Errorf("read idle timeout after %v with no data", br.idleTimeout)
        }
    }
    // ... existing Read logic ...
}
```

Or better, wrap the response body with a deadline-enforcing reader at the `http.Transport` level.

This would fix the issue for all consumers of `containers/image`, not just the proxy.

## Workaround

Kill the skopeo process to unblock rpm-ostree:

```bash
sudo kill $(pgrep -f 'skopeo experimental-image-proxy')
```

rpm-ostree/zincati will detect the child exit and can retry the upgrade.

## Related

- `containers/image` issue [#1145](https://github.com/containers/image/issues/1145) — RFE for per-layer retry on blob download failures
- `containers/image` PR [#1469](https://github.com/containers/image/pull/1469) — proposed reducing TCP KeepAlive from 30s to 15s for faster dead connection detection
- `containers/image` [`docker/body_reader.go`](https://github.com/containers/image/blob/main/docker/body_reader.go) — reconnection logic handles `ECONNRESET`/`ErrUnexpectedEOF` but not stalls
- `containers/container-libs` [`common/pkg/json-proxy/handler.go`](https://github.com/containers/container-libs/blob/main/common/pkg/json-proxy/handler.go) — proxy handler with the mutex/wg.Wait deadlock


#	Request	Response	Status
1	version handshake	`"0.2.6"`	OK
2	`OpenImage(docker://quay.io/...)`	`value: 1`	OK
3	`GetManifest(1)`	`sha256:c6a7...`, `pipeid=7`, fd passed	OK
4	`FinishPipe(7)`	`null`	OK
5	get config/metadata	`null`, `pipeid=8`, fd passed	OK
6	`FinishPipe(8)`	`null`	OK
7	`GetLayerInfo(1)`	8178 bytes of layer JSON	OK
8	`GetBlob(1, "sha256:a283e9...", 2759859)`	`value: 2759859`, `pipeid=8`, fd passed	OK
9	`FinishPipe(8)`	NO RESPONSE	DEADLOCK

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`experimental-image-proxy`: hangs indefinitely when CDN stalls mid-blob-download (FinishPipe deadlock) #2878

Summary

Environment

Symptoms

Root Cause Analysis

1. CDN stalls mid-transfer (trigger)

2. No idle timeout on HTTP body reads (`containers/image`)

3. `FinishPipe` holds mutex during `wg.Wait()` — deadlocks entire proxy (skopeo/container-libs)

Deadlock Diagram

Protocol Message Trace (from `strace -f`)

Download progress at stall

Why This Only Affects the Proxy Path

Suggested Fixes

Fix 1 (proxy — high priority): Don't hold mutex during `wg.Wait()`

Fix 2 (proxy — high priority): Wire context through to the `io.Copy` goroutine

Fix 3 (`containers/image` — medium priority): Add idle timeout to `bodyReader`

Workaround

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Metric	Value
Expected blob size	2,759,859 bytes
Bytes received from CDN (`ss bytes_received`)	2,124,175 bytes
Bytes written to pipe (summed from strace `write(8,...)`)	~2,107,617 bytes
Completion	76.4%

experimental-image-proxy: hangs indefinitely when CDN stalls mid-blob-download (FinishPipe deadlock) #2878

Description

Summary

Environment

Symptoms

Root Cause Analysis

1. CDN stalls mid-transfer (trigger)

2. No idle timeout on HTTP body reads (containers/image)

3. FinishPipe holds mutex during wg.Wait() — deadlocks entire proxy (skopeo/container-libs)

Deadlock Diagram

Protocol Message Trace (from strace -f)

Download progress at stall

Why This Only Affects the Proxy Path

Suggested Fixes

Fix 1 (proxy — high priority): Don't hold mutex during wg.Wait()

Fix 2 (proxy — high priority): Wire context through to the io.Copy goroutine

Fix 3 (containers/image — medium priority): Add idle timeout to bodyReader

Workaround

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`experimental-image-proxy`: hangs indefinitely when CDN stalls mid-blob-download (FinishPipe deadlock) #2878

2. No idle timeout on HTTP body reads (`containers/image`)

3. `FinishPipe` holds mutex during `wg.Wait()` — deadlocks entire proxy (skopeo/container-libs)

Protocol Message Trace (from `strace -f`)

Fix 1 (proxy — high priority): Don't hold mutex during `wg.Wait()`

Fix 2 (proxy — high priority): Wire context through to the `io.Copy` goroutine

Fix 3 (`containers/image` — medium priority): Add idle timeout to `bodyReader`