fix: add GitHubApiClient with GITHUB_TOKEN auth and retry on 403/429 by JeffreyDallas · Pull Request #4712 · hiero-ledger/solo

JeffreyDallas · 2026-06-17T18:23:00Z

Description

This pull request changes the following:

Adds GitHubApiClient with GITHUB_TOKEN auth and automatic retry on 403/429 responses (fixes GitHub API rate-limit failures in CI)
Increases NodesStarted event wait timeout from 10 → 30 minutes (env-var overridable via NODES_STARTED_EVENT_TIMEOUT_MINUTES) to fix relay timeout in one-shot deploy
Increases MirrorNodeDeployed event wait timeout from 10 → 10 minutes (env-var overridable via MIRROR_NODE_DEPLOYED_EVENT_TIMEOUT_MINUTES)
Increases mirror pinger pod readiness timeout from 15 → 30 minutes (MIRROR_NODE_PINGER_PODS_READY_MAX_ATTEMPTS 450 → 900 × 2 s) to fix Windows/WSL2 slow-runner failures
Increases node-add-local and separate-node-add test timeouts to fix intermittent CI timeout failures (e2e add local test intermittently failure due to time out #4715)
Fixes one-shot-local-build example: skip helm dependency build when chart tarballs are already cached, avoiding Docker Hub unauthenticated 429 rate-limit errors (one shot local built test failed intermittently #4721)

Related Issues

Pull request (PR) checklist

This PR added tests (unit, integration, and/or end-to-end)
This PR updated documentation
This PR added no TODOs or commented out code
This PR has no breaking changes
Any technical debt has been documented as a separate issue and linked to this PR
Any package.json changes have been explained to and approved by a repository manager
All related issues have been linked to this PR
All changes in this PR are included in the description
When this PR merges the commits will be squashed and the title will be used as the commit message, the 'commit message guidelines' below have been followed

Testing

This PR added unit tests
This PR added integration/end-to-end tests
These changes required manual testing that is documented below
Anything not tested is documented

The following manual testing was done:

Unit tests for GitHubApiClient (9 tests covering auth header injection, 403/429 retry with backoff, 404 passthrough, non-auth requests)
CI run on fix/github-api-rate-limit-retry branch

The following was not tested:

Windows pinger 30-minute timeout (relied on CI)
one-shot-local-build helm cache skip (cache-miss path still contacts Docker Hub on first run)

Intermittent HTTP 403 errors from the GitHub API on Windows runners were caused by two compounding issues: 1. EdgeVersionFetcher never included an Authorization header, leaving all five component-version lookups unauthenticated (60 req/hour shared per IP on GitHub-hosted runners). 2. No retry logic existed for transient 403/429 rate-limit responses, so a single bad response permanently failed the dependency-check step. Introduce GitHubApiClient, a static utility class that: - Adds a Bearer token from GITHUB_TOKEN when present (raises limit to 5 000 req/hour for authenticated requests). - Retries on HTTP 403 and 429 with exponential backoff (up to 3 attempts), honouring the Retry-After and X-RateLimit-Reset response headers. Wire GitHubApiClient.get() into EdgeVersionFetcher and all four dependency managers (crane, gvproxy, podman, vfkit), replacing ~15 lines of duplicated header-building + fetch code in each. Add 9 unit tests for GitHubApiClient covering auth, retry, Retry-After header parsing, exhaustion after max retries, no-retry on non-rate-limit errors, and network-failure wrapping. Fixes #4711 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

trunk-io · 2026-06-17T18:23:04Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

github-actions · 2026-06-17T18:30:20Z

Unit Test Results - Linux

38 tests +2 38 ✅ +2 0s ⏱️ ±0s
17 suites ±0 0 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 8359896. ± Comparison against base commit de23795.

♻️ This comment has been updated with latest results.

github-actions · 2026-06-17T18:32:14Z

Unit Test Results - Windows

1 files ±0 334 suites +2 9s ⏱️ ±0s
1 049 tests +9 1 049 ✅ +9 0 💤 ±0 0 ❌ ±0
1 053 runs +9 1 053 ✅ +9 0 💤 ±0 0 ❌ ±0

Results for commit 8359896. ± Comparison against base commit de23795.

♻️ This comment has been updated with latest results.

On Windows runners using WSL2, the mirror-node pinger pod takes longer to become ready than the default 300×2 s = 10 minutes because it must: 1. pass its own startup probe (/tmp/alive file), 2. connect to the Mirror REST API (http://mirror-1-restjava:80), 3. submit a transaction through the consensus network, and 4. verify that transaction was ingested by the mirror importer. All of these steps are slower under WSL2 due to network-layer indirection, and the generic PODS_READY_MAX_ATTEMPTS budget (shared with every other pod check) is often exhausted just as the pinger is about to go Ready. Add MIRROR_NODE_PINGER_PODS_READY_MAX_ATTEMPTS (default 450) and MIRROR_NODE_PINGER_PODS_READY_DELAY (default 2 000 ms), giving pinger checks a 15-minute budget — consistent with relay and block-node — while leaving every other mirror-node pod check at the existing 10-minute limit. Both constants are overridable via environment variables. Observed failure: https://github.com/hiero-ledger/solo/actions/runs/27705538841/job/81956787144?pr=4703 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

The relay's withWaitCondition for NodesStarted was set to 10 minutes, but the node start sequence emits NodesStarted only after the full chain completes (including waitForTss), which can take 15-25+ minutes on slow or busy runners. Both timeouts are now env-var overridable constants: NODES_STARTED_EVENT_TIMEOUT_MINUTES (default 30) and MIRROR_NODE_DEPLOYED_EVENT_TIMEOUT_MINUTES (default 10). Fixes #4714 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

github-actions · 2026-06-17T20:14:55Z

E2E Test Report

10 files ±0 94 suites ±0 1h 25m 8s ⏱️ + 1m 50s
301 tests ±0 301 ✅ ±0 0 💤 ±0 0 ❌ ±0
320 runs ±0 320 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 242d1cc. ± Comparison against base commit de23795.

♻️ This comment has been updated with latest results.

The describe block in separate-node-add.test.ts had a 3-minute default timeout that matched the intermittent failure reported in #4715 (Mocha applies the describe-level timeout when the describe callback is async). The "should add a new node to the network successfully" test runs three sequential prepare/submit/execute commands that can take 10-15+ minutes on slow CI runners, but only had a 12-minute timeout. - describe block default: 3 min → 20 min - "should add a new node" test: 12 min → 20 min - outer describe in node-add-local: 3 min → 30 min Fixes #4715 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

Windows/WSL2 runners need more time for the mirror pinger pod to become ready during concurrent one-shot deploys. Previous 15-min limit (450 attempts × 2000ms) was insufficient; bumped to 900 attempts = 30 min. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

… 429 When helm dependency build runs for OCI chart repos (registry-1.docker.io/ bitnamicharts), it contacts Docker Hub for manifest verification even when the tarball already exists in charts/, triggering unauthenticated rate-limit 429 errors. Fix: only run helm dependency build on cache miss; on cache hit, restore tarballs and skip the build entirely. The cache key is the version.ts hash, so the same key guarantees identical chart versions and tarballs. Fixes #4721 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

jan-milenkov · 2026-06-18T09:42:58Z

+import {SoloErrors} from './errors/solo-errors.js';
+
+const RETRY_MAX_ATTEMPTS: number = 3;
+const RETRY_BASE_DELAY_MS: number = 1000;


use Duration with to toMilis(), also consider better placements for those constants

updated. changed to private constants

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

…lBackOff On Windows/WSL2 runners, the solo-shared-resources postgresql pod fails with ImagePullBackOff because bitnami/postgresql:latest is not pre-cached into the Kind cluster. This causes a cascade: PostgreSQL never starts → mirror REST health check fails → mirror pinger can never become ready. Adding the image to solo-cache-images-target.yaml ensures it is pre-loaded before the deploy, avoiding Docker Hub unauthenticated rate-limit 429s. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

…record stall Without a block node, CN defaults to FILE_AND_GRPC writerMode which fills the gRPC buffer (maxBlocks=5) and stalls record file production after ~20s. Mirror importer falls behind, pinger can never confirm transactions, and the one-shot deploy times out. Fix profile-manager to explicitly set FILE_ONLY when no block nodes are in the deployment state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

… node FILE_ONLY is not a valid BlockStreamWriterMode enum value in CN v0.74; the correct value is FILE. Using FILE_ONLY caused CN to fail to become ACTIVE across all E2E tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

Reverts 242d1cc and f9e70fd. The writerMode changes are not needed for the Windows pinger fix and are being removed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

JeffreyDallas requested a review from a team as a code owner June 17, 2026 18:23

JeffreyDallas requested review from Ivo-Yankov and olsentorfinn June 17, 2026 18:23

JeffreyDallas self-assigned this Jun 17, 2026

JeffreyDallas added P1-💎 Current Milestone & Goals PR: Needs Team Approval A pull request that needs review from a team member. labels Jun 17, 2026

JeffreyDallas and others added 2 commits June 17, 2026 14:09

JeffreyDallas and others added 3 commits June 17, 2026 15:21

JeffreyDallas requested a review from a team as a code owner June 18, 2026 02:32

jan-milenkov reviewed Jun 18, 2026

View reviewed changes

jan-milenkov added the PR: Unresolved Comments A pull request where there are comments and they need to be resolved. label Jun 18, 2026

refactor

0fc9e9a

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

JeffreyDallas removed the PR: Unresolved Comments A pull request where there are comments and they need to be resolved. label Jun 18, 2026

JeffreyDallas and others added 5 commits June 18, 2026 10:00

save

9a4c5cd

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add GitHubApiClient with GITHUB_TOKEN auth and retry on 403/429#4712

fix: add GitHubApiClient with GITHUB_TOKEN auth and retry on 403/429#4712
JeffreyDallas wants to merge 12 commits into
mainfrom
fix/github-api-rate-limit-retry

JeffreyDallas commented Jun 17, 2026 •

edited

Loading

Uh oh!

trunk-io Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

jan-milenkov Jun 18, 2026

Uh oh!

JeffreyDallas Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JeffreyDallas commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Pull request (PR) checklist

Testing

Uh oh!

trunk-io Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Unit Test Results - Linux

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Unit Test Results - Windows

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Report

Uh oh!

jan-milenkov Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

JeffreyDallas Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JeffreyDallas commented Jun 17, 2026 •

edited

Loading

github-actions Bot commented Jun 17, 2026 •

edited

Loading

github-actions Bot commented Jun 17, 2026 •

edited

Loading

github-actions Bot commented Jun 17, 2026 •

edited

Loading