Summary
Please refer to this comment for the overall implementation.
This PR implements the L2 level of the cross-repository CI relay described in [RFC] Cross-Repository CI Relay for PyTorch Out-of-Tree Backends. For the previous L1 implementation, please refer to this PR.
The current implementation focuses on the first two levels defined in the RFC:
L2: downstream repos can send their CI results to PyTorch and display them in PyTorch HUD.
Higher-level behaviors for L3 and L4 are intentionally left for follow-up work.
Architecture
The relay is split into two AWS Lambda functions:
webhook lambda function (Updated): opened/reopened/synchronized/closed actions, forwards repository_dispatch events to downstream repos
callback lambda function (Added): queue time and execute time for evolution to L3 repo
Changes
.
.github/
├── workflows/
│   └── _lambda-do-release-runners.yml  # Updates the Lambda release workflow to include cross-repo-ci-relay packaging/release
│
└── actions/
    └── cross-repo-ci-relay-callback/
        └── action.yml                  # Composite action used by downstream workflows to report status back to the relay/result endpoint

aws/lambda/cross_repo_ci_relay/
├── tests/                  # Unit tests for allowlist/config/webhook/result/redis behavior
├── README.md               # Project overview, local development, callback flow, and result-side validation steps
├── Makefile                # Top-level local developer entrypoint for test / deploy / clean
├── local_server.py         # FastAPI wrapper for local end-to-end testing of both webhook and result endpoints
├── requirements.txt        # Python dependencies required by the relay Lambdas
│
├── utils/
│   ├── allowlist.py        # Loads, parses, and queries the downstream allowlist by rollout level
│   ├── config.py           # Shared runtime config loading and cached get_config() helper
│   ├── gh_helper.py        # GitHub App, repository_dispatch, and GitHub file access helpers
│   ├── hud.py              # HUD write helpers for downstream result reporting
│   ├── jwt_helper.py       # Helpers for minting/verifying relay callback tokens
│   ├── redis_helper.py     # Redis helpers for allowlist cache, OOT state, and timing data
│   └── misc.py             # Shared TypedDict definitions and HTTPException
│
├── webhook/
│   ├── Makefile            # Build/package/deploy commands for the webhook Lambda
│   ├── lambda_function.py  # Webhook Lambda entrypoint: verifies GitHub webhook requests and routes events
│   └── event_handler.py    # Handles PR/push events, resolves allowlist targets, and dispatches to downstream repos
│
└── callback/
    ├── Makefile            # Build/package/deploy commands for the result Lambda
    ├── lambda_function.py  # Result Lambda entrypoint: verifies callback token and GitHub OIDC token
    └── callback_handler.py # Validates callback payloads, checks L2+ eligibility, stores state, and writes to HUD
Usage
See README.md for more details.
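The webhook Lambda entrypoint is described as verifying GitHub webhook requests before routing events. As a rough illustration of what that check usually involves, here is a minimal sketch of validating GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw request body). The function name `verify_github_signature` and its parameters are hypothetical and are not taken from this PR's `lambda_function.py`.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(
    secret: bytes, body: bytes, signature_header: Optional[str]
) -> bool:
    """Return True if the X-Hub-Signature-256 value matches the raw body.

    Hypothetical helper: the real entrypoint may be structured differently.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison so the mismatch position is not leaked via timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The check has to run on the raw request bytes, before any JSON parsing, because re-serializing the payload can change the bytes the signature was computed over.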
Verification
We verified the following scenarios on our AWS Lambda instance:
Terraform configuration
pytorch/ci-infra#415
Unit Tests
Security
Callback payload carries full upstream webhook data back to HUD: `action.yml` builds the callback body by mutating `github.event.client_payload` (which contains the entire original webhook payload: PR metadata, commits, author info) and adding `status`/`conclusion`/`workflow_name`/`workflow_url` on top. This full blob is forwarded verbatim by `hud.py` to HUD with no relay-side filtering. HUD receives both the relay-trusted `verified_repo` and an unvalidated body; if HUD trusts self-reported fields inside the body over `verified_repo`, a manipulated dispatch payload could tamper with HUD records.
Lambda callback URL is public and hardcoded: The endpoint is hardcoded in `action.yml` and exposed in a public action, making it trivially discoverable. OIDC verification blocks unauthorized HUD writes, but the endpoint has no rate limiting; request flooding can cause Lambda concurrency exhaustion or Redis connection saturation.
Only OIDC is used for verification: The callback lambda relies solely on GitHub OIDC token verification for authentication, without additional application-level secrets or signatures. If an attacker compromises a downstream repo's GitHub Actions permissions, they could forge authenticated requests to the callback endpoint. OIDC also has its own failure modes (e.g., token expiration, potential misconfigurations) that could lead to unauthorized access if not carefully managed.
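To make the first concern concrete, the sketch below shows one way a consumer of the callback body could pin the repository identity to the relay-verified value and ignore self-reported fields. It is an illustration only: `build_hud_record`, `ALLOWED_BODY_FIELDS`, and the `repo` key are hypothetical names, and nothing here reflects how HUD or `hud.py` actually processes the payload.

```python
from typing import Any, Dict

# Hypothetical allowlist: only the four fields the PR says action.yml adds
# on top of client_payload are copied through.
ALLOWED_BODY_FIELDS = ("status", "conclusion", "workflow_name", "workflow_url")


def build_hud_record(verified_repo: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a record whose repo identity comes only from the verified value."""
    record = {key: body[key] for key in ALLOWED_BODY_FIELDS if key in body}
    # Any repo-like field inside the unvalidated body is never consulted;
    # the relay-verified repository is the only source of identity.
    record["repo"] = verified_repo
    return record
```

With this shape, a manipulated dispatch payload can change what a downstream repo reports about its own run, but it cannot attribute that report to a different repository.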
HUD Interaction
Design Principle: Transparent Relay & Decoupling
The Relay Server acts as a lightweight data passthrough layer. It does not define or parse specific CI data formats; instead, it offloads data interpretation and validation to the HUD. This ensures complete decoupling between the relay infrastructure and business-specific data.
Security & Risk Mitigation
The relay uses OIDC authentication to guarantee the authenticity of the data source (Verified Repo). Its core responsibility is to ensure the data originates from the claimed repository, while security filtering and content compliance are enforced at the HUD level.
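As a sketch of what "ensuring the data originates from the claimed repository" can look like, the snippet below checks the claims of a GitHub Actions OIDC token after its signature has already been verified against GitHub's published keys (that step is assumed and not shown). The issuer URL and the `repository` claim are standard for GitHub OIDC tokens; the function name, the audience value, and the allowlist parameter are hypothetical and not taken from this PR.

```python
from typing import Any, List, Mapping, Set

GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def verified_repo_from_claims(
    claims: Mapping[str, Any], expected_audience: str, allowed_repos: Set[str]
) -> str:
    """Return the repository named in already signature-verified OIDC claims.

    Raises PermissionError if the issuer, audience, or repository is unexpected.
    """
    if claims.get("iss") != GITHUB_OIDC_ISSUER:
        raise PermissionError("unexpected issuer")

    # The "aud" claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences: List[Any]
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []
    if expected_audience not in audiences:
        raise PermissionError("unexpected audience")

    repo = claims.get("repository")
    if not isinstance(repo, str) or repo not in allowed_repos:
        raise PermissionError("repository is not allowlisted")
    return repo
```

The returned value plays the role of the Verified Repo: it is derived from the token alone, never from the request body.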