Background
During the L1 implementation (#7847), we ran into some confusion about the Cross-Repository CI Relay that the RFC may not specify clearly. I am therefore opening this issue to discuss the detailed design of L1 through L4 and to guide the upcoming L2-L4 implementations. Different opinions from the community are very welcome.
L1
%%{init: {"theme": "base"}}%%
sequenceDiagram
participant U as UpStream Repo
participant W as webhook_handler
participant R as Redis
participant S as result_handler
participant D as DownStream Repo
U->>W: PR/Push event trigger
W->>R: Get Allowlist
W->>D: Passthrough Payload
L1 is the most basic event forwarding layer. Its core goal is to allow the upstream PyTorch repository to safely forward PR and push events to onboarded downstream repositories, without writing any status back to the upstream side.
The flow in the diagram is straightforward:
An upstream PR event (opened/reopened/synchronize/closed) or push event triggers webhook_handler through the GitHub App.
webhook_handler reads the Allowlist, which determines which downstream repositories should receive the event, from a remote source and caches it in Redis to avoid repeated remote fetches.
The payload is then passed through directly to the downstream repositories.
For the concrete L1 implementation, see #7847.
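The Allowlist lookup in step 2 is a cache-aside read: use the Redis copy if present, otherwise fetch from the remote source and cache it. The Python sketch below illustrates this under assumptions that are not part of the design above: the Allowlist is modeled as a mapping from downstream repository to onboarding level, the Redis key name and cache lifetime are hypothetical, and `InMemoryCache` is a stand-in that mimics the `get` / `set(..., ex=...)` shape of a redis-py client so the example runs without a server. The actual payload passthrough is omitted.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple

ALLOWLIST_KEY = "crcr:allowlist"  # hypothetical Redis key name
ALLOWLIST_TTL_SECONDS = 600       # hypothetical cache lifetime

# PR actions that L1 forwards, as listed above (GitHub names the push-to-PR action "synchronize").
FORWARDED_PR_ACTIONS = {"opened", "reopened", "synchronize", "closed"}


class InMemoryCache:
    """Stand-in for a redis-py client: same get/set(ex=...) shape, no server needed."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # behave as if the TTL had fired
            return None
        return value

    def set(self, key: str, value: str, ex: Optional[int] = None) -> None:
        expires_at = time.monotonic() + ex if ex else None
        self._data[key] = (value, expires_at)


def get_allowlist(cache, fetch_remote: Callable[[], Dict[str, int]]) -> Dict[str, int]:
    """Cache-aside read of the Allowlist (repo -> onboarding level, hypothetical format)."""
    cached = cache.get(ALLOWLIST_KEY)
    if cached is not None:
        return json.loads(cached)  # json.loads also accepts the bytes redis-py returns by default
    allowlist = fetch_remote()
    cache.set(ALLOWLIST_KEY, json.dumps(allowlist), ex=ALLOWLIST_TTL_SECONDS)
    return allowlist


def select_downstreams(event: str, payload: dict, allowlist: Dict[str, int]) -> List[str]:
    """Return the downstream repos that should receive this event (any level >= 1)."""
    if event not in ("pull_request", "push"):
        return []
    if event == "pull_request" and payload.get("action") not in FORWARDED_PR_ACTIONS:
        return []
    return sorted(repo for repo, level in allowlist.items() if level >= 1)
```

Because the remote fetch only happens on a cache miss, a burst of PR events costs one remote read per cache lifetime rather than one per event.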
L2
%%{init: {"theme": "base"}}%%
sequenceDiagram
participant U as UpStream Repo
participant W as webhook_handler
participant R as Redis
participant S as result_handler
participant D as DownStream Repo
participant H as HUD
U->>W: PR/Push event trigger
W->>R: Get Allowlist
W->>D: Passthrough Payload
rect rgb(240, 240, 240)
Note over S, D: Creating In Progress in HUD
D->>S: In progress call
S->>R: Get Allowlist
S->>H: Show in progress on HUD
end
rect rgb(240, 240, 240)
Note over S, D: Updating Status in HUD
D->>S: Completed workflow run call
S->>R: Get Allowlist
S->>H: Show completed on HUD
end
L2 builds on L1 by reporting results back to HUD. The first half of the diagram is the same as L1: the upstream event enters webhook_handler, the Allowlist is read, and the downstream repository is triggered. The new parts are the two gray sections in the second half:
When the downstream workflow run starts, it sends an In progress callback to result_handler.
DownStream Repo itself sends the callback request to result_handler from the first job of the workflow.
After authenticating the request, result_handler reads the Allowlist to verify that this DownStream Repo is onboarded at L2 or above.
If the request is valid, the workflow run information is written to HUD and marked as in progress.
When the workflow run finishes, it sends a Completed callback.
DownStream Repo itself sends the callback request to result_handler from the last job of the workflow.
After authenticating the request, result_handler reads the Allowlist to verify that this DownStream Repo is onboarded at L2 or above.
If the request is valid, the workflow run information is written to HUD and marked as completed.
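The steps above do not say how callbacks are authenticated. The sketch below assumes, purely for illustration, an HMAC-SHA256 signature over the request body with a shared secret, and a hypothetical JSON callback body with `repository`, `run_id`, `status`, and `conclusion` fields. It shows the order of checks result_handler performs (authenticate, then check the onboarding level, then validate the state) before anything is written to HUD; the HUD write itself is left out.

```python
import hashlib
import hmac
import json
from typing import Dict


class CallbackRejected(Exception):
    """Raised when a downstream callback must not be written to HUD."""


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    # Hypothetical scheme: hex HMAC-SHA256 of the raw body; compare in constant time.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_callback(body: bytes, signature: str, secret: bytes,
                    allowlist: Dict[str, int], min_level: int = 2) -> dict:
    """Validate an in-progress/completed callback and return the record destined for HUD."""
    if not verify_signature(secret, body, signature):
        raise CallbackRejected("bad signature")
    event = json.loads(body)
    repo = event["repository"]
    if allowlist.get(repo, 0) < min_level:
        raise CallbackRejected(f"{repo} is not onboarded at L{min_level} or above")
    status = event["status"]
    if status not in ("in_progress", "completed"):
        raise CallbackRejected(f"unexpected status {status!r}")
    return {
        "repository": repo,
        "run_id": event["run_id"],
        "status": status,
        # Only completed runs carry a conclusion (e.g. "success" or "failure").
        "conclusion": event.get("conclusion") if status == "completed" else None,
    }
```

Checking the signature before parsing the body keeps unauthenticated requests from reaching the Allowlist lookup at all.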
L3
%%{init: {"theme": "base"}}%%
sequenceDiagram
participant U as UpStream Repo
participant W as webhook_handler
participant R as Redis
participant RH as result_handler
participant D as DownStream Repo
participant H as HUD
U->>W: PR/Push event trigger
W->>R: Get Allowlist
W->>D: Passthrough payload
rect rgb(240, 240, 240)
Note over R, D: Scenario 1: label add before workflow run create
U->>W: PR label add
W->>R: Cache PR label info
end
D->>RH: In progress workflow run call
RH->>R: Get Allowlist<br>Cache workflow run info
RH->>H: Show in progress workflow run on HUD
rect rgb(240, 240, 240)
Note over R, D: Scenario 1
RH->>R: Find PR label info record
RH->>U: Create PR in_progress check run
end
rect rgb(240, 240, 240)
Note over R, D: Scenario 2: label add during workflow run execute
U->>W: PR label add
W->>R: Find workflow run info
W->>U: Create PR in_progress check run
end
D->>RH: Completed workflow run call
RH->>R: Get Allowlist<br>Update workflow run info
RH->>H: Show completed workflow run on HUD
rect rgb(240, 240, 240)
Note over R, D: Scenario 1 & 2
RH->>U: Update PR completed check run
end
rect rgb(240, 240, 240)
Note over R, D: Scenario 3: label add after run complete
U->>W: PR label add
W->>R: Find workflow run info
W->>U: Create PR completed check run
end
L3 keeps the HUD display capability of L2 and adds on-demand upstream PR check runs. Consistent with the label_only design in the RFC, this layer does not attach downstream results to every PR by default. Instead, the status of the corresponding backend is shown as a non-blocking upstream check only after a label is explicitly added to the PR. The key difficulty in L3 is that the label event and the downstream workflow run status can arrive in either order. Therefore, both pieces of information are temporarily cached in Redis, and the check run is created or updated as soon as both are available.
L3 covers the following three scenarios:
Scenario 1: the label arrives before the workflow run starts.
webhook_handler first caches the label information in Redis.
The downstream workflow run starts and calls back to result_handler. After finding the matching label record in the cache, result_handler immediately creates an in_progress check run on the upstream PR.
After the workflow run completes and DownStream Repo sends the completed callback to result_handler, result_handler updates both the workflow run status in Redis and the check run status on the PR.
Scenario 2: the label arrives while the workflow run is already executing.
When result_handler receives the in progress callback from DownStream Repo, it first caches the workflow run information.
After the user adds a label to the PR, webhook_handler looks up the cached workflow run record and backfills an in_progress check run.
After the workflow run completes and DownStream Repo sends the completed callback to result_handler, result_handler updates both the workflow run status in Redis and the check run status on the PR.
Scenario 3: the label arrives after the downstream workflow run has already completed.
In this case, webhook_handler directly creates a completed check run based on the workflow run result already stored in Redis, without re-triggering execution.
If the record for that workflow run has already been removed from Redis, the check run will not be created.
Note:
The Redis cache TTL is tentatively set to 3 hours to align with the workflow integration requirements, so Redis data will not grow indefinitely.
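The three scenarios amount to a small rendezvous: whichever of the label and the workflow run arrives second triggers the check run, and the completed callback updates it if a label exists. This Python sketch is illustrative only. `TTLStore` stands in for Redis `SET key value EX ttl` / `GET key`, the `label:` / `run:` key scheme keyed by downstream repository and PR head SHA is hypothetical, and check run API calls are recorded in a list instead of being sent to GitHub.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

CACHE_TTL_SECONDS = 3 * 60 * 60  # the tentative 3-hour TTL from the note above


class TTLStore:
    """In-memory stand-in for Redis SET ... EX / GET with lazy expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ex: int) -> None:
        self._data[key] = (value, self._clock() + ex)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if Redis had evicted it
            return None
        return value


class L3Relay:
    """Hypothetical rendezvous between label events and workflow run callbacks."""

    def __init__(self, store: TTLStore) -> None:
        self.store = store
        # (action, key, status) tuples recorded instead of calling the GitHub Checks API.
        self.check_runs: List[Tuple[str, str, str]] = []

    # webhook_handler side
    def on_label_added(self, repo: str, head_sha: str) -> None:
        key = f"{repo}:{head_sha}"
        self.store.set(f"label:{key}", True, ex=CACHE_TTL_SECONDS)
        run = self.store.get(f"run:{key}")
        if run is None:
            return  # Scenario 1 (or the run record expired): wait for the callback
        # Scenario 2 (still in progress) or Scenario 3 (already completed)
        self.check_runs.append(("create", key, run["status"]))

    # result_handler side
    def on_run_in_progress(self, repo: str, head_sha: str, run_id: int) -> None:
        key = f"{repo}:{head_sha}"
        self.store.set(f"run:{key}", {"run_id": run_id, "status": "in_progress"},
                       ex=CACHE_TTL_SECONDS)
        if self.store.get(f"label:{key}"):
            self.check_runs.append(("create", key, "in_progress"))  # Scenario 1

    def on_run_completed(self, repo: str, head_sha: str, run_id: int, conclusion: str) -> None:
        key = f"{repo}:{head_sha}"
        self.store.set(f"run:{key}",
                       {"run_id": run_id, "status": "completed", "conclusion": conclusion},
                       ex=CACHE_TTL_SECONDS)
        if self.store.get(f"label:{key}"):
            self.check_runs.append(("update", key, "completed"))  # Scenarios 1 and 2
```

Always caching the label, even when a run record already exists, is what lets the completed callback in Scenario 2 find it and update the check run.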
L4
L4 follows the same flow as L3 Scenario 1. The difference is that L3 requires a label to create the check run, whereas L4 creates it by default without any label.
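To summarize how the layers stack, the sketch below maps an onboarding level to what the relay does for a PR event. The `ReportingPlan` type and the integer encoding of levels are illustrative assumptions, and only PR events are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportingPlan:
    forward_events: bool    # L1: pass the upstream payload through to the downstream repo
    report_to_hud: bool     # L2: show in progress / completed runs on HUD
    create_check_run: bool  # L3 (label required) or L4 (default): upstream check run


def reporting_plan(level: int, pr_has_label: bool) -> ReportingPlan:
    """Hypothetical mapping from an Allowlist level (0 = not onboarded) to relay behavior."""
    if level < 1:
        return ReportingPlan(False, False, False)
    return ReportingPlan(
        forward_events=True,
        report_to_hud=level >= 2,
        create_check_run=level >= 4 or (level == 3 and pr_has_label),
    )
```

Each level is a strict superset of the one below it; the only input that changes behavior within a level is the label at L3.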
Check run
%%{init: {"theme": "base"}}%%
sequenceDiagram
participant U as UpStream Repo
participant W as webhook_handler
participant R as Redis
participant RH as result_handler
participant D as DownStream Repo
participant H as HUD
U->>W: Re-run workflow run trigger
W->>R: Get Allowlist
W->>D: Re-run dispatch
D->>RH: In progress workflow run call
RH->>R: Get Allowlist
RH->>U: Update in progress check run
RH->>H: Update in progress workflow run in HUD
D->>RH: Completed workflow run call
RH->>R: Get Allowlist
RH->>U: Update completed check run
RH->>H: Update completed workflow run on HUD
This diagram focuses on the re-run scenario:
When a re-run of the workflow run is requested from a check run on the upstream side, webhook_handler first reads the Allowlist and then dispatches the re-run request to the downstream repository.
While the downstream repository re-runs the workflow run, it reports the in progress and completed states to result_handler, which synchronizes them back to the upstream check run and HUD, almost exactly as in L2 and L3.
Note:
When a check run is created, the workflow run's run_id is stored in the check run's external_id field. When a re-run is requested, the corresponding workflow run is found by reading external_id from the check run in the webhook payload.
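The external_id round trip can be sketched as follows. The check run fields (`name`, `head_sha`, `status`, `conclusion`, `external_id`) and the `rerequested` action of the `check_run` webhook come from the GitHub Checks API. The mapping from check run name to downstream repository is a hypothetical lookup, and `/repos/{owner}/{repo}/actions/runs/{run_id}/rerun` is one existing GitHub endpoint the re-run dispatch could use, not necessarily the one the relay adopts. Since a GitHub re-run keeps the same run_id and increments run_attempt, storing run_id alone is enough to locate the run again.

```python
from typing import Dict, Optional, Tuple


def build_check_run_body(name: str, head_sha: str, run_id: int, status: str,
                         conclusion: Optional[str] = None) -> dict:
    """Request body for creating a check run on the upstream repo."""
    body = {
        "name": name,
        "head_sha": head_sha,
        "status": status,
        "external_id": str(run_id),  # downstream workflow run id, read back on re-run
    }
    if status == "completed":
        if conclusion is None:
            raise ValueError("a completed check run needs a conclusion")
        body["conclusion"] = conclusion
    return body


def parse_rerun_request(payload: dict,
                        name_to_repo: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Resolve a check_run 'rerequested' webhook payload to (downstream_repo, run_id)."""
    if payload.get("action") != "rerequested":
        return None
    check_run = payload.get("check_run", {})
    repo = name_to_repo.get(check_run.get("name", ""))  # hypothetical name -> repo lookup
    external_id = check_run.get("external_id") or ""
    if repo is None or not external_id.isdigit():
        return None  # not one of ours, or no usable run_id stored
    return repo, int(external_id)


def rerun_path(repo: str, run_id: int) -> str:
    """One possible downstream call: GitHub's 're-run a workflow' REST endpoint."""
    return f"/repos/{repo}/actions/runs/{run_id}/rerun"
```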
cc @fffrog @KarhouTam