Skip to content

experimental-image-proxy: hangs indefinitely when CDN stalls mid-blob-download (FinishPipe deadlock) #2878

@dustymabe

Description

@dustymabe

In Fedora CoreOS we've been running into an odd bug in our upgrade tests in our pipeline. I filed an issue for that over in coreos/fedora-coreos-tracker#2149 (it's a good read if you have time).

Following on from that I had AI analyze to see if the bug was fixed in newer code and it thinks it isn't and needs to be fixed. I generated this AI bug report which I think summarizes the problem well:

Summary

skopeo experimental-image-proxy hangs permanently when an HTTP connection stalls mid-transfer during a blob download. The proxy's FinishPipe handler blocks on wg.Wait() while holding the handler mutex, which deadlocks the entire proxy — no further requests can be processed, and the calling application (rpm-ostree, bootc, etc.) hangs indefinitely.

This was observed during an automated Fedora CoreOS upgrade via zincati/rpm-ostree. The Akamai CDN serving a Quay.io blob stopped sending data at 76% completion while keeping the TCP connection alive. Because there is no idle timeout anywhere in the pipeline (containers/image HTTP client, skopeo proxy goroutine, or the proxy protocol), the entire system deadlocked permanently.

Environment

  • OS: Fedora CoreOS (cosa-devsh VM, QEMU NAT networking)
  • Package: rpm-ostree-2025.6-6.fc42.x86_64
  • skopeo: bundled with rpm-ostree (protocol version 0.2.6)
  • Image: quay.io/fedora/fedora-coreos@sha256:c6a73015376ade9c5ae50e8cca5356e542a51edb8df4a05590abfaf268edeaa9
  • Triggered by: zincati.service (automatic update agent)
  • Proxy invocation: setpriv --no-new-privs --init-groups --reuid rpm-ostree --bounding-set -all --pdeathsig TERM -- skopeo experimental-image-proxy --no-creds

Symptoms

With CONTAINERS_IMAGE_PROXY_DEBUG=1, the process prints the following and then hangs forever:

DEBUG new_with_config: Spawned skopeo pid=3978 config=ImageProxyConfig { authfile: None, auth_data: None, auth_anonymous: true, ... }
DEBUG new_with_config: Remote protocol version: 0.2.6 ...
Fetching ostree-remote-image:fedora:docker://quay.io/fedora/fedora-coreos@sha256:c6a730...
DEBUG open_image: opening image ...
DEBUG finish_pipe: closing pipe ... pipeid=7
DEBUG finish_pipe: closing pipe ... pipeid=8
layers already present: 36; layers needed: 15 (245.6 MB)
DEBUG Fetching base
DEBUG get_layer_info: Getting layer info ...
DEBUG fetching sha256:a283e9be7391dc4a3d858e321574277c621faad113a8f6b36e0511cd05352448
DEBUG get_blob: fetching blob ... size=2759859
DEBUG layer progress: [0/15] Fetching ostree chunk a283e9be7391dc4a3d8 (2.8 MB)
DEBUG finish_pipe: closing pipe ... pipeid=8
    ← hangs here forever, no further output

Root Cause Analysis

Three issues combine to produce a permanent deadlock:

1. CDN stalls mid-transfer (trigger)

The Akamai CDN edge server (23.61.255.227, serving cdn01.quay.io) stopped sending data after delivering 76% (2,124,175 of 2,759,859 bytes) of a blob. The TCP connection remained alive — keepalive ACKs continued flowing — but no application data was sent. ss -tnpi output during the hang:

ESTAB  0  0  10.0.2.15:50856  23.61.255.227:443  users:(("skopeo",pid=3978,fd=6))
  bytes_received:2124175
  lastsnd:1341635   ← last data sent ~22 minutes ago
  lastrcv:1341446   ← last data received ~22 minutes ago
  lastack:12438     ← TCP keepalive ACKs still flowing (~12 sec ago)
  rcv_wnd:2636640   ← skopeo advertising 2.5 MB receive window (not a flow control issue)

2. No idle timeout on HTTP body reads (containers/image)

The containers/image library's HTTP transport (pkg/tlsclientconfig/tlsclientconfig.go:88-101) sets TCP-level timeouts but has no HTTP body read timeout:

func NewTransport() *http.Transport {
    direct := &net.Dialer{
        Timeout:   30 * time.Second,  // TCP connect only
        KeepAlive: 30 * time.Second,  // TCP keepalive interval
    }
    tr := &http.Transport{
        Proxy:               http.ProxyFromEnvironment,
        DialContext:         direct.DialContext,
        TLSHandshakeTimeout: 10 * time.Second,
        IdleConnTimeout:     90 * time.Second,  // idle conn pool, NOT active reads
        MaxIdleConns:        100,
        // No ResponseHeaderTimeout
        // No overall Timeout on http.Client
        // No body-read idle timeout
    }
    return tr
}

The bodyReader wrapper (docker/body_reader.go:139) tracks lastSuccessTime (line 147) but only uses it for debug logging (line 224-226), never to enforce a deadline. Its reconnection logic only handles hard failures (ECONNRESET, ErrUnexpectedEOF), not stalls where the connection stays open but no data arrives.

3. FinishPipe holds mutex during wg.Wait() — deadlocks entire proxy (skopeo/container-libs)

In the proxy handler (common/pkg/json-proxy/handler.go):

GetBlob (line 427) spawns a goroutine that copies the blob from the HTTP response body to a pipe:

func (h *handler) GetBlob(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()
    defer h.lock.Unlock()
    // ...
    go func() {
        defer blobr.Close()
        defer f.wg.Done()
        // ...
        n, err := io.Copy(f.w, tr)  // line 472 — blocks if HTTP body stalls
        // ...
    }()
    // returns pipeid + fd to client
}

The goroutine at line 466 does not use any context — it does a raw io.Copy with no cancellation mechanism. If the HTTP body read stalls, this goroutine blocks forever.

FinishPipe (line 664) waits for the goroutine to complete while holding the handler mutex:

func (h *handler) FinishPipe(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()         // line 665
    defer h.lock.Unlock() // line 666
    // ...
    f.wg.Wait()           // line 682 — blocks forever if goroutine is stuck
    // ...
}

Since h.lock is held, all other handler methods (GetBlob, GetManifest, OpenImage, etc.) are blocked. The proxy cannot process any requests — including a potential shutdown/cleanup — until the stalled goroutine completes, which it never will because the CDN has stopped sending data.

Deadlock Diagram

rpm-ostree (pid 3980)                 skopeo proxy (pid 3986)              Akamai CDN

  sendto(29, FinishPipe[8]) ────────►  recvfrom(3, FinishPipe[8])
                                       h.lock.Lock()  ← acquired
                                       f.wg.Wait()    ← BLOCKED
  recvmsg(29) ← BLOCKED                  │
    waiting for response                  │  goroutine still running:
    that never comes                      │  io.Copy(pipe, httpBody)
                                          │     │
                                          │     └──► Read(fd=6) ──────────► CDN stalled
                                          │          EAGAIN                  no data
                                          │          epoll: no events        keepalive OK
                                          │          goroutine parked        connection alive
                                          │                                  but silent
                                          ▼
                                       wg never Done()
                                       lock never released
                                       proxy fully deadlocked

Protocol Message Trace (from strace -f)

The image proxy protocol uses a SOCK_SEQPACKET Unix socket. JSON requests are sent from rpm-ostree, and JSON responses (with optional file descriptors via SCM_RIGHTS) come back from skopeo. The strace shows 9 requests sent but only 8 responses received:

# Request Response Status
1 version handshake "0.2.6" OK
2 OpenImage(docker://quay.io/...) value: 1 OK
3 GetManifest(1) sha256:c6a7..., pipeid=7, fd passed OK
4 FinishPipe(7) null OK
5 get config/metadata null, pipeid=8, fd passed OK
6 FinishPipe(8) null OK
7 GetLayerInfo(1) 8178 bytes of layer JSON OK
8 GetBlob(1, "sha256:a283e9...", 2759859) value: 2759859, pipeid=8, fd passed OK
9 FinishPipe(8) NO RESPONSE DEADLOCK

Evidence from strace — skopeo received all 9 requests on fd 3:

[pid  3978] recvfrom(3, "{\"method\":\"OpenImage\",\"args\":[\"docker://quay.io/...\"]}", ...)
[pid  3984] recvfrom(3, "{\"method\":\"GetManifest\",\"args\":[1]}", ...)
[pid  3984] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[7]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[8]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"GetLayerInfo\",\"args\":[1]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"GetBlob\",\"args\":[1,\"sha256:a283e9...\",2759859]}", ...)
[pid  3986] recvfrom(3, "{\"method\":\"FinishPipe\",\"args\":[8]}", ...)   ← received, never responded

But only 8 sendmsg(3, ...) responses were sent — the 9th (for FinishPipe(8)) was never sent because the handler is stuck in wg.Wait().

Evidence from strace — rpm-ostree blocked waiting for the response:

[pid  3980] sendto(29, '{"method":"FinishPipe","args":[8]}', ...) = 34
[pid  3980] recvmsg(29, ...    ← BLOCKED, waiting for response that never comes

Evidence from strace — skopeo's blob streaming goroutine stalled:

[pid  3986] write(8, ..., 4264) = 4264    # last chunk written to pipe (fd 8)
[pid  3986] read(6, ...) = -1 EAGAIN      # no more data from CDN TLS connection (fd 6)
[pid  3986] epoll_pwait(4, [], ...) = 0    # no events on any fd
[pid  3986] futex(FUTEX_WAIT, ...)         # goroutine parked by Go runtime

Download progress at stall

Metric Value
Expected blob size 2,759,859 bytes
Bytes received from CDN (ss bytes_received) 2,124,175 bytes
Bytes written to pipe (summed from strace write(8,...)) ~2,107,617 bytes
Completion 76.4%

Why This Only Affects the Proxy Path

This deadlock does not occur with direct podman pull or skopeo copy because:

  1. Direct pulls use the caller's context (which can be canceled via ctrl-C, systemd timeout, etc.). When the context is canceled, Go's net/http closes the underlying TCP connection, and the Read() returns an error. The stall is recoverable.

  2. The proxy path passes context.Background() to the handler — no deadline, no cancellation. The io.Copy goroutine has no context at all. The only thing that can unblock it is the TCP connection actually dying (via TCP keepalive, which can take hours with the OS defaults).

  3. The mutex in FinishPipe amplifies the impact: a stalled blob download doesn't just block that one operation — it deadlocks the entire proxy, preventing any other request from being processed including potential cleanup/shutdown.

Suggested Fixes

Fix 1 (proxy — high priority): Don't hold mutex during wg.Wait()

In handler.go, FinishPipe should release the mutex before waiting:

func (h *handler) FinishPipe(ctx context.Context, args []any) (replyBuf, error) {
    h.lock.Lock()
    // ... validate pipeid, get f ...
    h.lock.Unlock()  // release before blocking

    f.wg.Wait()      // now other requests can proceed while we wait

    h.lock.Lock()    // re-acquire to clean up
    delete(h.activePipes, pipeid)
    h.lock.Unlock()

    f.w.Close()
    return ret, f.err
}

Fix 2 (proxy — high priority): Wire context through to the io.Copy goroutine

The GetBlob goroutine should respect context cancellation:

go func() {
    defer blobr.Close()
    defer f.wg.Done()
    verifier := d.Verifier()
    tr := io.TeeReader(blobr, verifier)

    // Use a context-aware copy
    done := ctx.Done()
    buf := make([]byte, 32*1024)
    for {
        select {
        case <-done:
            f.err = ctx.Err()
            return
        default:
        }
        n, err := tr.Read(buf)
        if n > 0 {
            if _, werr := f.w.Write(buf[:n]); werr != nil {
                f.err = werr
                return
            }
        }
        if err != nil {
            if err != io.EOF {
                f.err = err
            }
            return
        }
    }
}()

Fix 3 (containers/image — medium priority): Add idle timeout to bodyReader

The bodyReader in docker/body_reader.go already tracks lastSuccessTime. It should use this to enforce an idle timeout, e.g.:

func (br *bodyReader) Read(p []byte) (int, error) {
    // Set a per-read deadline on the underlying connection
    if br.idleTimeout > 0 && !br.lastSuccessTime.IsZero() {
        if time.Since(br.lastSuccessTime) > br.idleTimeout {
            return 0, fmt.Errorf("read idle timeout after %v with no data", br.idleTimeout)
        }
    }
    // ... existing Read logic ...
}

Or better, wrap the response body with a deadline-enforcing reader at the http.Transport level.

This would fix the issue for all consumers of containers/image, not just the proxy.

Workaround

Kill the skopeo process to unblock rpm-ostree:

sudo kill $(pgrep -f 'skopeo experimental-image-proxy')

rpm-ostree/zincati will detect the child exit and can retry the upgrade.

Related

  • containers/image issue #1145 — RFE for per-layer retry on blob download failures
  • containers/image PR #1469 — proposed reducing TCP KeepAlive from 30s to 15s for faster dead connection detection
  • containers/image docker/body_reader.go — reconnection logic handles ECONNRESET/ErrUnexpectedEOF but not stalls
  • containers/container-libs common/pkg/json-proxy/handler.go — proxy handler with the mutex/wg.Wait deadlock

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions