Skip to content

feat(sync): [DC-216839]: use HTTP/2 with automatic HTTP/1.1 fallback on framing errors#4114

Open
rchincha wants to merge 25 commits into
project-zot:mainfrom
rchincha:mrudnoru-pr-http2-fallback
Open

feat(sync): [DC-216839]: use HTTP/2 with automatic HTTP/1.1 fallback on framing errors#4114
rchincha wants to merge 25 commits into
project-zot:mainfrom
rchincha:mrudnoru-pr-http2-fallback

Conversation

@rchincha

@rchincha rchincha commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

Which issue does this PR fix:

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

Automation added to e2e:

Will this break upgrades or downgrades?

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a sync-specific HTTP transport that prefers HTTP/2 but automatically retries over HTTP/1.1 when upstreams emit HTTP/2 framing/protocol errors (e.g., the “malformed HTTP response” class of failures), and documents/validates the behavior via tests.

Changes:

  • Introduces http2FallbackTransport and wires it into sync’s regclient HTTP client.
  • Adds unit coverage for framing-error classification + retry/sticky-host behavior, plus a new blackbox smoke test and CI inclusion.
  • Updates documentation and module dependencies to support the new HTTP/2 transport logic.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/ports.json Allocates port ranges for the new blackbox test.
test/blackbox/sync_http2_fallback.bats New smoke test for sync behavior with the new transport (happy-path).
test/blackbox/ci.sh Adds the new blackbox test to CI execution list.
pkg/extensions/sync/sync_test.go Makes TLS sync test more robust by retrying until the manifest is available.
pkg/extensions/sync/service.go Switches sync regclient HTTP client transport to the new HTTP/2-with-fallback transport.
pkg/extensions/sync/http2_fallback.go Implements HTTP/2-first RoundTripper with sticky HTTP/1.1 fallback on framing errors.
pkg/extensions/sync/http2_fallback_internal_test.go Adds focused unit tests for fallback, stickiness, and real-transport boundary behavior.
go.mod Promotes golang.org/x/net to a direct dependency (for http2).
examples/README.md Documents the new HTTP/2 transport and automatic fallback behavior for operators.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/blackbox/sync_http2_fallback.bats
Comment thread test/blackbox/sync_http2_fallback.bats
@rchincha rchincha force-pushed the mrudnoru-pr-http2-fallback branch from 177100e to 491ab77 Compare June 6, 2026 02:27
@codecov

codecov Bot commented Jun 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.00559% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.44%. Comparing base (273b153) to head (ca5e796).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pkg/extensions/sync/http2_fallback.go 83.12% 16 Missing and 11 partials ⚠️
pkg/extensions/sync/utils.go 61.11% 2 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4114      +/-   ##
==========================================
- Coverage   91.55%   91.44%   -0.11%     
==========================================
  Files         200      201       +1     
  Lines       29306    29479     +173     
==========================================
+ Hits        26830    26958     +128     
- Misses       1594     1625      +31     
- Partials      882      896      +14     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment thread test/blackbox/sync_http2_fallback.bats
Comment thread test/blackbox/sync_http2_fallback.bats
Comment thread pkg/extensions/sync/http2_fallback.go
Comment thread pkg/extensions/sync/http2_fallback.go
Comment thread pkg/extensions/sync/http2_fallback_internal_test.go Outdated
mrudnoru and others added 21 commits June 9, 2026 21:42
…on framing errors

Default to HTTP/2 for upstream sync; on the first INTERNAL_ERROR /
malformed framing response (Mode B), automatically retry with HTTP/1.1
and remember the choice for subsequent calls. Includes blob streaming
path. Replaces the prior unconditional "force HTTP/1.1" workaround so
we keep HTTP/2 perf for healthy upstreams.

Squashed from c02c0d8a, 26d13489, 0e04f5e4.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
Move http2FallbackTransport, isHTTP2FramingError, and the transport
construction out of service.go into pkg/extensions/sync/http2_fallback.go.
service.go now calls a single newHTTP2FallbackTransport() helper.

Same anti-pattern rchincha flagged on project-zot#4063 ("too much going in
pkg/api/routes.go wrt storage etc. We should move all that code out to
pkg/api/blobstream.go"). Pure refactor — no behavior change.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
…rrors

Replace substring matches on err.Error() with errors.As against
http2.StreamError and *http2.GoAwayError; keep the "malformed HTTP
response" substring check with a comment linking the Go stdlib gap
(no typed error is exposed for that path).

Add unit tests in http2_fallback_test.go covering the classifier and
the RoundTrip primary/fallback paths, including the body-rewind branch.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
Extract the shared DefaultTransport.Clone + ResponseHeaderTimeout setup
into clonedTransport so the primary and HTTP/1.1 fallback transports
are built from the same code path. The fallback now only differs by
disabling HTTP/2 negotiation, which keeps the divergence between the
two paths visible at a glance and prevents the timeout setting from
silently drifting between them.

Restore the rationale comments around the timeout configuration and the
purpose of the fallback overrides.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
After a host framed-errors once, route every subsequent request for
that host straight to the HTTP/1.1 fallback for 15 minutes. The
upstream LB that triggers the fallback (Docker Hub, CDNs sitting in
front of a registry) keeps the misbehavior across many TCP
connections, so paying the HTTP/2 attempt + retry round-trip on every
sync request adds latency without changing the outcome.

State lives in a sync.Map keyed by URL host; entries expire on read so
no background reaper is needed. A pluggable clock keeps the TTL test
deterministic. After the TTL the host gets another HTTP/2 attempt, so
a temporary upstream issue does not permanently downgrade the host.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
…back

Add an operator-facing section under examples/README.md > Sync that
explains the new default (HTTP/2 for upstream sync), the framing-error
fallback path, and the sticky-per-host behavior with its 15-minute TTL.

Include upgrade notes so operators coming from the previous
HTTP/1.1-only sync transport know no configuration change is required
and what log line indicates a host has been downgraded.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
Adds test/blackbox/sync_http2_fallback.bats covering the on-demand sync
happy path through the new HTTP/2 fallback transport (the fallback path
itself is unit tested in pkg/extensions/sync/http2_fallback_test.go).

Allocates port range 11540-11549 (upstream) and 11560-11569 (downstream)
in test/ports.json and registers the test in test/blackbox/ci.sh#L19.

Addresses rchincha's "Pls also consider adding a BATS test under
test/blackbox" pattern from project-zot#4063.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
…ureTransport

The cloned http.DefaultTransport used Go's bundled http2, whose error
types differ from golang.org/x/net/http2, so errors.As never matched and
typed-error fallback never fired. ConfigureTransport rewires the primary
to the external package. Replaces the bats smoke test with a real-server
integration test and corrects the upgrade note.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
The transport refactor dropped the upstream comments explaining the
DefaultTransport clone, ResponseHeaderTimeout, and SyncTimeout. Restore
them verbatim where the code now lives: clone/header rationale in
clonedTransport, the SyncTimeout note above the http.Client.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
…o newClient

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
- rename file to *_internal_test.go so testpackage allows internal package
- replace dynamic errors.New with package-level sentinels (err113)
- switch to http.NewRequestWithContext using t.Context() (noctx)
- close stub response bodies via defer (bodyclose)
- modernize for-loop to range-over-int (modernize)
- rename short var tr to transport (varnamelen)
- wrap long log message in http2_fallback.go (lll)

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
…ack retries

On an HTTP/2 framing error the fallback path now closes any non-nil
response body from the primary attempt before retrying, so a connection
is not leaked when a RoundTripper returns both a response and an error.

The retry also no longer replays a request body it cannot rewind: a real
body (not http.NoBody) with no GetBody returns the primary error instead
of sending a possibly-truncated payload over HTTP/1.1.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
Adds back test/blackbox/sync_http2_fallback.bats covering the on-demand
sync happy path through the HTTP/2 fallback transport, wires it into
test/blackbox/ci.sh, and reserves port ranges 11540-11549 (upstream) and
11560-11569 (downstream) in test/ports.json. Adds jq to the test
prerequisites since the catalog assertions use it.

Signed-off-by: Marga Rudno-Rudzińska <marga.rudno@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
…zation in TestConfigReloader and TestTLS

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
… readability

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
… stability

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
… and handle non-existence errors

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
rchincha added 3 commits June 9, 2026 21:43
… TestTLS for certificate verification

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
@rchincha rchincha force-pushed the mrudnoru-pr-http2-fallback branch from 695120b to ca5e796 Compare June 10, 2026 05:59
}

hostDir := filepath.Join(certDir, host)
files, err := os.ReadDir(hostDir)
continue
}

certPEM, err := os.ReadFile(filepath.Join(hostDir, file.Name())) //nolint:gosec // hostDir is derived from configured cert directories

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comment on lines +1 to +5
# Smoke test for the HTTP/2 → HTTP/1.1 fallback transport used by sync.
# The fallback only kicks in when the upstream emits malformed HTTP/2 framing
# (Docker Hub's LB does this occasionally); a normal zot upstream speaks HTTP/2
# cleanly and the primary path is exercised. This test verifies the new
# transport does not break the happy path. Fallback-specific behavior is unit
Comment on lines +122 to +126
run curl http://127.0.0.1:${zot_upstream_port}/v2/_catalog
[ "$status" -eq 0 ]
[ $(echo "${lines[-1]}" | jq '.repositories[]') = '"golang"' ]

run curl http://127.0.0.1:${zot_downstream_port}/v2/golang/manifests/1.20
Comment on lines +129 to +131
run curl http://127.0.0.1:${zot_downstream_port}/v2/_catalog
[ "$status" -eq 0 ]
[ $(echo "${lines[-1]}" | jq '.repositories[]') = '"golang"' ]
Comment thread examples/README.md

### Sync's HTTP/2 transport with HTTP/1.1 fallback

Sync requests now use HTTP/2 by default. Some upstream load balancers (notably Docker Hub's) occasionally send raw HTTP/2 SETTINGS frames on a connection that Go's `net/http` opened as HTTP/1.1, surfacing as `malformed HTTP response` or HTTP/2 `INTERNAL_ERROR` / `PROTOCOL_ERROR`. The sync transport catches those framing errors at the RoundTrip layer and transparently retries the request over HTTP/1.1, so neither the regclient pipeline nor the on-demand caller sees the failure.
@@ -0,0 +1,132 @@
# Smoke test for the HTTP/2 → HTTP/1.1 fallback transport used by sync.
# The fallback only kicks in when the upstream emits malformed HTTP/2 framing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync_http2_fallback.bats is largely redundant with existing CI tests for sync. It doesn’t test fallback, and the smoke path it does run is already exercised by sync.bats (and friends).

Can we drop this?

t.stickyHost.Store(host, t.now().Add(t.stickyTTL))
}

func isHTTP2FramingError(err error) bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches all StreamError / GoAwayError codes, not just protocol/framing mismatches (e.g. REFUSED_STREAM, ENHANCE_YOUR_CALM). For Docker Hub that’s probably fine, but it could mask genuine HTTP/2 issues on other registries by silently downgrading to HTTP/1.1.

I suggest this is refactored in 3 separate functions

func shouldRetryOnHTTP11Fallback(err error) bool {
    return isHTTP2SessionError(err) || isHTTP1HTTP2Mismatch(err)
}

func isHTTP2SessionError(err error) bool { /* StreamError, GoAwayError */ }

func isHTTP1HTTP2Mismatch(err error) bool {
    return strings.Contains(err.Error(), "malformed HTTP response")
}

@andaaron

Copy link
Copy Markdown
Contributor

@mrudnoru Please review this. I see you left some comments on #4075

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants