ci: add native GitHub Actions pipeline + nSpect/scan GitLab trigger#1841
ci: add native GitHub Actions pipeline + nSpect/scan GitLab trigger#1841nv-nmailhot wants to merge 5 commits into
Conversation
Add the nixl GitHub Actions CI that replaces the GitLab build pipeline (runs on the velonix AWS runners): - version: compute release/dev version (pyproject on release/** builds) - build: matrix of manylinux wheel builds (cu12/cu13, x86/arm) + the runtime container, images pushed to ECR, dist/ extracted as artifacts - upload-x86-wheels / upload-arm-wheels / upload-crates: release-only uploads to internal Artifactory (JFrog CLI / cargo), manual-approval environment - trigger-gitlab-nspect: on release/** builds, triggers the nixl-ci GitLab pipeline that runs nSpect registration + wheel scans (NSPECT-WO64-8O3P) Follow-on to #1832 (INFINIA wheel build). Pushing the built container to the Artifactory docker registry is a separate follow-up PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a GitHub Actions CI workflow, actionlint runner-label config, and a manylinux liburing build adjustment. The workflow computes a version, builds and pushes matrix Docker images, uploads wheel artifacts, publishes release wheels and crates, and triggers a GitLab pipeline on release-branch pushes. ChangesGitHub Actions CI migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 34-37: The workflow input security_scan is defined in the dispatch
inputs but never consumed, so it has no effect. Update the ci.yml workflow so
the security_scan value is actually wired into trigger-gitlab-nspect or a
relevant job condition/GitLab variable mapping, using the existing security_scan
input symbol consistently; if it is not intended to control anything, remove the
input instead.
- Around line 198-206: The wheel upload jobs are missing the manual release
approval gate, so they can publish on release branch pushes without the intended
protection. Update the wheel upload workflows in the upload-x86-wheels and the
other wheel upload job blocks so they use the same release environment gating as
upload-crates, ensuring Artifactory uploads only proceed after approval. Keep
the existing release_build/ref conditions, but add the release environment
protection consistently to each wheel publish job.
- Around line 25-27: The workflow currently triggers on tag pushes but the
release gating logic in the CI jobs still treats those runs as dev builds, so
tagged releases skip the release-only path. Update the workflow logic in the CI
configuration so tag refs are handled consistently with release events by either
removing the tag trigger from the push section or making RELEASE_BUILD and the
related versioning/job conditions recognize refs/tags/v* in the same way as
release branches. Use the existing CI workflow conditions and release/version
computation blocks to align the behavior across all affected jobs.
- Line 21: The CI workflow is currently allowing pull_request runs to execute
publishing jobs on IRSA-enabled self-hosted runners, which exposes registry and
infrastructure access to PR-controlled code. Update the workflow logic in the CI
job definitions and publishing steps so that jobs like the ones tied to the
current workflow, Docker/ECR publish, and release actions only run on trusted
events such as push/tag/release, or route pull request validation to isolated
non-publishing runners. Use the existing job names and publish-related steps in
the workflow to gate or split the execution paths.
- Line 68: The workflow still uses mutable action tags, so update every
remaining uses entry in the CI workflow to an immutable full commit SHA instead
of refs like actions/checkout@v4 or setup-node@v2. Locate the affected steps in
the ci workflow and replace each action version pin with the corresponding
commit hash so the workflow is locked to exact revisions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: f2a9c564-591d-48ee-9549-198125506d42
📒 Files selected for processing (2)
.github/actionlint.yaml.github/workflows/ci.yml
| outputs: | ||
| version: ${{ steps.compute.outputs.version }} | ||
| steps: | ||
| - uses: actions/checkout@v4 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check whether the referenced actions are tag-pinned or already SHA-pinned in other workflow files.
rg -n 'uses:\s*[^@]+@v[0-9]+|uses:\s*[^@]+@[0-9a-f]{7,40}' .github/workflows -SRepository: ai-dynamo/nixl
Length of output: 1603
Pin the workflow actions to immutable SHAs. The @v4/@v2 refs in .github/workflows/ci.yml are mutable; update the remaining uses: entries in this file to full commit SHAs.
🧰 Tools
🪛 zizmor (1.26.1)
[error] 68-68: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci.yml at line 68, The workflow still uses mutable action
tags, so update every remaining uses entry in the CI workflow to an immutable
full commit SHA instead of refs like actions/checkout@v4 or setup-node@v2.
Locate the affected steps in the ci workflow and replace each action version pin
with the corresponding commit hash so the workflow is locked to exact revisions.
Source: Linters/SAST tools
liburing 2.14's examples (zcrx.c) include linux/udmabuf.h, which isn't in the manylinux_2_28 (EL8) kernel headers, so a full `make` fails the build. Build and install only the library (make -C src) — that's all the wheel/INFINIA link needs. Fixes the manylinux build regression from the 2.6->2.14 bump in #1832. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the push: tags: ['v*'] trigger: release is gated on release/** branches, so tag pushes only produced a dev-version build that skipped the release jobs. - Gate version/build with a same-repo guard so untrusted fork-PR code does not run on the self-hosted IRSA-backed builders (which have ECR write + publish). Push, workflow_dispatch, and same-repo (collaborator) PRs still run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold the sw-dynamo container push into this PR so #1841 carries the full set: native build + nSpect/scan trigger, the liburing manylinux fix, the CodeRabbit fixes, and the release-gated runtime-container push to artifactory.nvidia.com/sw-dynamo-nixl-docker-local (reusing ARTIFACTORY_URL + ARTIFACTORY_PYPI_TOKEN, docker login as nmailhot). Push path verified green via the reachability test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
207-208: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftPut the runtime container publish behind the release approval gate.
This Artifactory push runs inside
build, which is not protected by thereleaseenvironment in the provided context. Split this into a release-only publish job withenvironment: release, or otherwise apply the same approval gate beforedocker push.Also applies to: 236-238
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 207 - 208, The runtime container publish step is currently running in the build flow without the protected release approval gate. Move the Artifactory push logic out of the existing build job into a separate release-only publish job in the workflow, and set that job to use the release environment so approval is required before any docker push occurs. Use the existing “Push runtime container to Artifactory” step and the related publish steps around the referenced build/push flow as the anchors when splitting the job.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 21-22: The runtime-container publish path is still tied to a
personal Artifactory user and a PyPI-scoped token, so update the Docker
login/push steps in the CI workflow to use Docker-specific service-account
secrets instead. Replace the hard-coded nmailhot username and the reused
ARTIFACTORY_PYPI_TOKEN with dedicated Docker credentials referenced by the
publish job, and make sure the affected runtime-container and related push steps
use those new secret names consistently.
- Around line 234-238: The Docker publish step leaves Artifactory credentials
behind after docker login, so update the CI step to always clean up the registry
auth once pushing is done. In the workflow block that computes AF_HOST and
AF_IMAGE and then runs docker login/tag/push, add a trap or equivalent cleanup
so docker logout is executed regardless of success or failure, ensuring
self-hosted runners do not retain credentials.
- Around line 216-232: The publish tag is recomputing UCX_SHA from the floating
UCX_REF instead of reusing the exact UCX commit used during the build. Update
the CI workflow so the build step captures the resolved UCX commit and passes it
through as an output/env value, then have the publish/tagging logic use that
stored commit when constructing TAG and related properties. Use the existing
UCX_REF, UCX_SHA, and tag assembly block as the anchor for this change.
---
Duplicate comments:
In @.github/workflows/ci.yml:
- Around line 207-208: The runtime container publish step is currently running
in the build flow without the protected release approval gate. Move the
Artifactory push logic out of the existing build job into a separate
release-only publish job in the workflow, and set that job to use the release
environment so approval is required before any docker push occurs. Use the
existing “Push runtime container to Artifactory” step and the related publish
steps around the referenced build/push flow as the anchors when splitting the
job.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: c84197eb-32f3-4c5b-9caa-797fb92bb195
📒 Files selected for processing (1)
.github/workflows/ci.yml
| # (The runtime-container push reuses ARTIFACTORY_URL + ARTIFACTORY_PYPI_TOKEN — same | ||
| # sw-dynamo project as the wheel upload — with docker login as user 'nmailhot'.) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Use Docker-scoped service-account credentials.
The runtime Docker publish path hard-codes nmailhot and reuses ARTIFACTORY_PYPI_TOKEN. Move both username and token to Docker-specific service-account secrets to avoid personal credential coupling and over-broad token scope.
Suggested change
- ARTIFACTORY_TOKEN: ${{ secrets.ARTIFACTORY_PYPI_TOKEN }}
- AF_USERNAME: nmailhot
+ ARTIFACTORY_TOKEN: ${{ secrets.ARTIFACTORY_DOCKER_TOKEN }}
+ AF_USERNAME: ${{ secrets.ARTIFACTORY_DOCKER_USERNAME }}Also applies to: 210-214
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci.yml around lines 21 - 22, The runtime-container publish
path is still tied to a personal Artifactory user and a PyPI-scoped token, so
update the Docker login/push steps in the CI workflow to use Docker-specific
service-account secrets instead. Replace the hard-coded nmailhot username and
the reused ARTIFACTORY_PYPI_TOKEN with dedicated Docker credentials referenced
by the publish job, and make sure the affected runtime-container and related
push steps use those new secret names consistently.
| UCX_REF: v1.21.x | ||
| BASE_IMAGE_TAG: ${{ matrix.base_image_tag }} | ||
| ARCH: ${{ matrix.arch }} | ||
| RELEASE_VERSION: ${{ needs.version.outputs.version }} | ||
| run: | | ||
| set -e | ||
| # 8-char nixl + UCX commit SHAs (matches the Jenkins tag scheme). | ||
| NIXL_SHA="$(git rev-parse --short=8 HEAD)" | ||
| if [[ "$UCX_REF" =~ ^[a-f0-9]{8,40}$ ]]; then | ||
| UCX_SHA="${UCX_REF:0:8}" | ||
| else | ||
| UCX_SHA="$(git ls-remote https://github.com/openucx/ucx.git "$UCX_REF" | head -n1 | cut -c1-8)" | ||
| fi | ||
| [ -n "$UCX_SHA" ] || { echo "ERROR: could not resolve UCX_REF=$UCX_REF"; exit 1; } | ||
| # Suffix = release version (mirrors the Jenkins TAG_SUFFIX, e.g. -v1.0.1-rc2). | ||
| SUFFIX="${RELEASE_VERSION:+-v${RELEASE_VERSION}}" | ||
| TAG="${BASE_IMAGE_TAG}-nixl-${NIXL_SHA}-ucx-${UCX_SHA}-${ARCH}${SUFFIX}" |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check whether the runtime build consumes UCX_REF/UCX_SHA or only the publish step does.
rg -n -C3 'UCX_REF|UCX_SHA|UCX_VERSION|openucx|ucx' .github/workflows contribRepository: ai-dynamo/nixl
Length of output: 20243
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the CI workflow around the build and publish steps.
sed -n '160,250p' .github/workflows/ci.yml
printf '\n--- Dockerfile.manylinux UCX section ---\n'
sed -n '445,460p' contrib/Dockerfile.manylinux
printf '\n--- Dockerfile UCX section ---\n'
sed -n '176,190p' contrib/DockerfileRepository: ai-dynamo/nixl
Length of output: 6463
Use the UCX commit from the build for the publish tag
UCX_SHA is resolved from the floating v1.21.x ref during publish, while the image was already built earlier from whatever commit that ref pointed to at build time. Pass the resolved commit through the build and reuse it here, otherwise the ucx-<sha> tag/properties can describe a different UCX revision than the image contains.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci.yml around lines 216 - 232, The publish tag is
recomputing UCX_SHA from the floating UCX_REF instead of reusing the exact UCX
commit used during the build. Update the CI workflow so the build step captures
the resolved UCX commit and passes it through as an output/env value, then have
the publish/tagging logic use that stored commit when constructing TAG and
related properties. Use the existing UCX_REF, UCX_SHA, and tag assembly block as
the anchor for this change.
| AF_HOST="$(printf '%s' "$ARTIFACTORY_URL" | sed -E 's#^https?://##; s#/.*##')" | ||
| AF_IMAGE="${AF_HOST}/${AF_REPO}/nixl:${TAG}" | ||
| echo "$ARTIFACTORY_TOKEN" | docker login "$AF_HOST" -u "$AF_USERNAME" --password-stdin | ||
| docker tag "$IMAGE_NAME" "$AF_IMAGE" | ||
| docker push "$AF_IMAGE" |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Clean up Docker registry credentials after the push.
docker login writes the Artifactory token to Docker config, but the step never logs out. Add a trap so self-hosted runners do not retain publish credentials after success or failure.
Suggested change
AF_HOST="$(printf '%s' "$ARTIFACTORY_URL" | sed -E 's#^https?://##; s#/.*##')"
AF_IMAGE="${AF_HOST}/${AF_REPO}/nixl:${TAG}"
+ trap 'docker logout "$AF_HOST" >/dev/null 2>&1 || true' EXIT
echo "$ARTIFACTORY_TOKEN" | docker login "$AF_HOST" -u "$AF_USERNAME" --password-stdin
docker tag "$IMAGE_NAME" "$AF_IMAGE"
docker push "$AF_IMAGE"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AF_HOST="$(printf '%s' "$ARTIFACTORY_URL" | sed -E 's#^https?://##; s#/.*##')" | |
| AF_IMAGE="${AF_HOST}/${AF_REPO}/nixl:${TAG}" | |
| echo "$ARTIFACTORY_TOKEN" | docker login "$AF_HOST" -u "$AF_USERNAME" --password-stdin | |
| docker tag "$IMAGE_NAME" "$AF_IMAGE" | |
| docker push "$AF_IMAGE" | |
| AF_HOST="$(printf '%s' "$ARTIFACTORY_URL" | sed -E 's#^https?://##; s#/.*##')" | |
| AF_IMAGE="${AF_HOST}/${AF_REPO}/nixl:${TAG}" | |
| trap 'docker logout "$AF_HOST" >/dev/null 2>&1 || true' EXIT | |
| echo "$ARTIFACTORY_TOKEN" | docker login "$AF_HOST" -u "$AF_USERNAME" --password-stdin | |
| docker tag "$IMAGE_NAME" "$AF_IMAGE" | |
| docker push "$AF_IMAGE" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci.yml around lines 234 - 238, The Docker publish step
leaves Artifactory credentials behind after docker login, so update the CI step
to always clean up the registry auth once pushing is done. In the workflow block
that computes AF_HOST and AF_IMAGE and then runs docker login/tag/push, add a
trap or equivalent cleanup so docker logout is executed regardless of success or
failure, ensuring self-hosted runners do not retain credentials.
…scan) - Add the manual-approval 'release' environment to upload-x86-wheels and upload-arm-wheels (previously only upload-crates/trigger had it, so wheel uploads published to Artifactory without the approval gate). - Wire the security_scan workflow_dispatch input into the GitLab trigger's ENABLE_WHEEL_SCAN variable (was hardcoded true / input was dead). Defaults true on release/** pushes; on workflow_dispatch it honors the input. (SHA-pinning of actions intentionally not changed — consistent with the prior decision on #1803.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the nixl GitHub Actions CI that replaces the GitLab build pipeline (runs on the velonix AWS runners):
Follow-on to #1832 (INFINIA wheel build). Pushing the built container to the Artifactory docker registry is a separate follow-up PR.
What?
Describe what this PR is doing.
Why?
Justification for the PR. If there is an existing issue/bug, please reference it. For
bug fixes, the 'Why?' and 'What?' can be merged into a single item.
How?
It is optional, but for complex PRs, please provide information about the design,
architecture, approach, etc.
Summary by CodeRabbit
liburingbuild/install to compile and install only the needed library components for the environment.