ci(docker-new): split base-cuda layer and restructure CI pipelines#7025
Draft
Restructures the docker-new image graph and CI topology:

- Add `base-cuda-runtime` / `base-cuda-devel` stages as a dedicated CUDA base, and rebase `universe-cuda` off them instead of `universe`.
- Drop `universe-runtime-dependencies`; fold it into `universe`.
- Split the `rmw` and `nvidia` ansible roles into standalone playbooks so Dockerfiles bind-mount only what they need.
- Switch every BuildKit `RUN` mount to named IDs (apt/ccache/pip/pipx, `ROS_DISTRO`-scoped where relevant).
- Split per-distro CI into per-arch jobs; extract multi-arch manifest stitching into `docker-manifest-new.yaml`.
- Persist BuildKit mount caches across runs via `actions/cache` + `buildkit-cache-dance`, with a size guard and lineage pruning.
- Add `docker-new/examples/` (basic cpu/dri/nvidia + awsim + planning simulator demos).

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
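The manifest-stitching step can be pictured as a single `docker buildx imagetools create` call that combines already-pushed per-arch images into one multi-arch tag. This minimal Python sketch only builds the argv. The per-arch tag convention `<tag>-<arch>`, the function name `imagetools_create_argv`, and the image names in the usage are hypothetical, not taken from `docker-manifest-new.yaml`:

```python
import shlex


def imagetools_create_argv(image: str, tag: str, arches: list[str]) -> list[str]:
    """Build argv for `docker buildx imagetools create`.

    Hypothetical convention: each per-arch job already pushed
    `<image>:<tag>-<arch>`; the manifest job only combines them into
    a multi-arch manifest list at `<image>:<tag>`.
    """
    if not arches:
        raise ValueError("need at least one architecture")
    target = f"{image}:{tag}"
    sources = [f"{image}:{tag}-{arch}" for arch in arches]
    return ["docker", "buildx", "imagetools", "create", "--tag", target, *sources]


if __name__ == "__main__":
    # Placeholder image name; prints the command instead of running it.
    argv = imagetools_create_argv("ghcr.io/example/autoware", "universe-humble", ["amd64", "arm64"])
    print(shlex.join(argv))
```

`imagetools create` works on manifests that are already in the registry, so no image is rebuilt. That is what allows the per-arch jobs to run independently and a separate workflow to assemble the result.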
Force-pushed from `e757246` to `92c674e`.
- Switch the `buildcache-new` tag to its own scope, `health-check-<build-type>-humble-<platform>-<ref>` (instead of the `ci-universe` tags, which are no longer populated), with current-ref + main fallback reads and a current-ref write, mirroring the pattern in `docker-build-new.yaml`.
- Add a read-only `buildkit-cache-dance` injection fed by the `buildkit-mounts-ci-universe-humble-<platform>-*` tarballs that the main pipeline saves, so health-check reuses the same apt / ccache / pip / pipx mount state without writing back.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
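A rough sketch of the cache scoping above, computing buildx cache values in the `type=registry,ref=...` form for one matrix entry. Assumptions: `<ref>` is a short branch name such as `main`, the hypothetical `sanitize_ref` helper stands in for whatever the workflow does to make it tag-safe, and the registry image in the usage is a placeholder.

```python
import re

DISTRO = "humble"  # health-check in this PR is scoped to humble


def sanitize_ref(ref: str) -> str:
    """Hypothetical helper: make a git ref usable in a Docker tag.

    Tags may only contain [A-Za-z0-9_.-], so e.g. 'feat/x' becomes 'feat-x'.
    """
    return re.sub(r"[^A-Za-z0-9_.-]", "-", ref)


def cache_tag(build_type: str, platform: str, ref: str) -> str:
    return f"health-check-{build_type}-{DISTRO}-{platform}-{sanitize_ref(ref)}"


def cache_args(image: str, build_type: str, platform: str, ref: str) -> tuple[list[str], str]:
    """Return (cache_from, cache_to) for one matrix entry.

    Reads try the current ref first, then fall back to main;
    the only write goes to the current ref's tag.
    """
    current = f"type=registry,ref={image}:{cache_tag(build_type, platform, ref)}"
    main = f"type=registry,ref={image}:{cache_tag(build_type, platform, 'main')}"
    cache_from = [current] if current == main else [current, main]
    return cache_from, current


if __name__ == "__main__":
    cache_from, cache_to = cache_args("ghcr.io/example/buildcache-new", "main", "amd64", "feat/split")
    print("cache-from:", cache_from)
    print("cache-to:", cache_to)
```

On `main` the two reads collapse into one, so pushes to `main` refresh exactly the tag that PR builds fall back to.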
- Put `setup-universe` behind the same `run:health-check` label that the docker-new health-check workflow uses. `setup-universe` runs the `setup-dev-env.sh` script on a self-hosted runner, so triggering it on every PR is expensive; a single label now toggles both heavyweight pre-merge checks together.
- Switch health-check from `make-sure-label-is-present.yaml@v1` to `require-label.yaml@v1`, matching the sub-repo build-and-test workflows and the new `setup-universe` gate.

Also drops an unused `load: false` in the bake step (`false` is the default).

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
xmfcx added a commit to autowarefoundation/autoware-spell-check-dict that referenced this pull request on Apr 17, 2026:

Words flagged by spell-check in autowarefoundation/autoware#7025:

- argjson (jq `--argjson` flag)
- containerimage (docker buildx output type)
- imagetools (docker buildx subcommand)
- llvmpipe (mesa software renderer)
- radeonsi (mesa AMD Gallium driver)
- redownloading (compound word)
- tzdata (ubuntu timezone package)

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
Transient TCP timeouts to archive.ubuntu.com have been failing the docker build during ansible's `apt install` step. Configure libapt to retry 5 times with 30 s HTTP(S) timeouts via `/etc/apt/apt.conf.d/99-retries` in the base image, inherited by every downstream stage.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
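For reference, the retry file could look like the output of this sketch. The option names are standard `apt.conf(5)` keys, but the exact contents of the PR's `99-retries` are not shown here, so treat the values (taken from the message above) and the hypothetical helper `write_retry_conf` as illustrative. It writes under a caller-supplied root so it can be exercised without touching the host's apt config.

```python
import tempfile
from pathlib import Path

# Standard apt.conf(5) keys; values follow the commit message (5 retries, 30 s).
RETRY_CONF = (
    'Acquire::Retries "5";\n'
    'Acquire::http::Timeout "30";\n'
    'Acquire::https::Timeout "30";\n'
)


def write_retry_conf(root: Path) -> Path:
    """Write <root>/etc/apt/apt.conf.d/99-retries and return its path.

    Inside the image, root would be Path("/"); any stage built FROM this
    base sees the file, which is how downstream stages inherit it.
    """
    conf_dir = root / "etc" / "apt" / "apt.conf.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / "99-retries"
    target.write_text(RETRY_CONF)
    return target


if __name__ == "__main__":
    # Exercise the helper in a throwaway directory instead of the real /etc.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_retry_conf(Path(tmp))
        print(path.read_text(), end="")
```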
Add the Azure-hosted Canonical mirror as the primary apt source, keeping archive.ubuntu.com as a built-in failsafe. apt tries mirrors in order per URI, so Azure serves the fast happy path (especially for GHA runners on Azure) and archive.ubuntu.com takes over if Azure has a blip.

Handles both the classic `/etc/apt/sources.list` (jammy / humble base) and the deb822 `/etc/apt/sources.list.d/ubuntu.sources` (noble / jazzy base).

Strictly more reliable than the previous single-mirror config: both mirrors would have to be unreachable for the build to fail, instead of just archive.ubuntu.com. Combined with `Acquire::Retries`, this covers both steady-state mirror outages and transient TCP flakes.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
Mirror the reliability hardening from `docker-new/base.Dockerfile` into the setup-universe job, which runs in a bare ubuntu:22.04 / ubuntu:24.04 container. It is applied before the first apt call, so the initial `apt-get update` already benefits:

- `Acquire::Retries "5"` with 30 s HTTP(S) timeouts to survive transient TCP flakes against archive.ubuntu.com.
- Azure mirror (azure.archive.ubuntu.com) as the primary apt source, with archive.ubuntu.com as the failsafe. Handles both the classic `/etc/apt/sources.list` (jammy) and the deb822 `/etc/apt/sources.list.d/ubuntu.sources` (noble) formats.

Uses only tools already present in the base image (`echo`, `sed`), so no package installs need to succeed before the mirror config is in place.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
The workflow's docker-bake step reads from a `:...-main` registry cache ref as a fallback, which is populated only when the workflow runs on `main`. Previously the workflow fired only on PR/schedule/dispatch, so the main cache never got written on merges, starving PR builds of warm layers.

- Add a `push` trigger on `main`.
- Skip `require-label` for non-PR events (push/schedule/dispatch).
- Use `always() && (... || github.event_name != 'pull_request')` on docker-build so it runs even when require-label was skipped.
- Simplify per-step gates from `schedule || workflow_dispatch` to `!= 'pull_request'`, which naturally covers push too.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
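The new gating can be checked as a small truth table. This Python model is a sketch: it assumes the elided `...` operand tests that `require-label` succeeded, reuses the `run:health-check` label name from the earlier commit, and its function names are hypothetical.

```python
LABEL = "run:health-check"


def require_label_result(event_name: str, labels: set[str]) -> str:
    """Model of the require-label job's result: skipped outside PRs."""
    if event_name != "pull_request":
        return "skipped"
    return "success" if LABEL in labels else "failure"


def docker_build_runs(event_name: str, require_label: str) -> bool:
    """Model of `always() && (<elided> || github.event_name != 'pull_request')`.

    Assumption: the elided operand means "require-label succeeded".
    always() lifts the default requirement that every `needs` job
    succeeded, so a skipped require-label no longer skips this job;
    the parenthesised expression then decides on its own.
    """
    return require_label == "success" or event_name != "pull_request"


if __name__ == "__main__":
    for event in ("pull_request", "push", "schedule", "workflow_dispatch"):
        for labels in (set(), {LABEL}):
            result = require_label_result(event, labels)
            print(event, sorted(labels), result, docker_build_runs(event, result))
```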
Disentangle the three-mode gating in a single file by splitting it into dedicated workflows per trigger and a shared reusable job.

- `health-check-reusable.yaml`: `workflow_call` with the matrix docker build job. Step-level `if:`s are reduced to the genuinely orthogonal matrix filters (`build-type != 'main'`, `platform == 'arm64'`, `build-type == 'nightly'`). No label or paths logic.
- `health-check-pr.yaml`: `pull_request` trigger with a `paths:` filter at the trigger level (replacing the per-step `changed-files` gate). Uses the `require-label` reusable workflow, which exits 1 on a missing label, so a watched-path PR without `run:health-check` turns red instead of silently skipping. Calls the reusable workflow on success.
- `health-check.yaml`: trimmed to `push: main`, `schedule`, and `workflow_dispatch`. No label check, no paths filter; directly calls the reusable workflow. Its purpose is to keep populating the shared `:health-check-*-main` BuildKit registry cache for PRs to read.

Net: ~10 repeated `if:` conditionals removed, the `changed-files` step removed, and each trigger mode expressed in its own file.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
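A rough model of how a single `pull_request` event resolves under `health-check-pr.yaml`. The watched globs are hypothetical (the real `paths:` list is not shown), and `fnmatch` only approximates GitHub's path-filter globbing, so this illustrates the three outcomes rather than reimplementing the filter:

```python
from fnmatch import fnmatchcase

LABEL = "run:health-check"
# Hypothetical watched globs: the real `paths:` list is not shown here.
WATCHED = ("docker-new/*", ".github/workflows/health-check*")


def pr_outcome(changed_files: list[str], labels: set[str], watched=WATCHED) -> str:
    """Rough model of health-check-pr.yaml for one pull_request event.

    fnmatch's `*` also matches '/', unlike GitHub's `paths:` globs,
    which is close enough for this illustration.
    """
    if not any(fnmatchcase(f, g) for f in changed_files for g in watched):
        return "not-triggered"    # trigger-level paths filter: no run at all
    if LABEL not in labels:
        return "failed"           # require-label exits 1: the check turns red
    return "reusable-called"      # health-check-reusable.yaml runs the matrix


if __name__ == "__main__":
    print(pr_outcome(["README.md"], set()))
    print(pr_outcome(["docker-new/base.Dockerfile"], set()))
    print(pr_outcome(["docker-new/base.Dockerfile"], {LABEL}))
```

The middle case is the point of the split: a relevant PR without the label now fails visibly instead of passing with every step skipped.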
Force-pushed from `0700e43` to `9af1585`.
Previously the classic `/etc/apt/sources.list` (jammy) was handled by duplicating `deb` lines with azure.archive.ubuntu.com first and archive.ubuntu.com second. apt treats parallel `deb` lines as independent sources and fetches InRelease/Packages from both, wasting ~35 MB per update rather than falling back only when needed.

Switch both the Dockerfile and the setup-universe workflow to apt's `mirror+file://` transport:

- Write `/etc/apt/ubuntu-mirrors.list` with Azure first, archive second.
- Replace each `http://archive.ubuntu.com/ubuntu` reference (classic or deb822) with `mirror+file:///etc/apt/ubuntu-mirrors.list`.

apt reads the list and tries URLs in order, using the first that responds, so we get true first-win fallback in both sources formats. security.ubuntu.com is left untouched (separate host, not mirrored on Azure).

The file-existence guard uses `if [ -f "$f" ]; then ...; fi` rather than `[ -f "$f" ] && sed ...`: under `sh -e` (the default for GHA `run:` steps) the `&&` chain short-circuits and returns 1 when the file doesn't exist, tripping errexit. Only one of the two sources formats is present on any given Ubuntu version, so the non-matching iteration would otherwise kill the step.

Verified locally by baking both the humble (jammy / classic sources.list) and jazzy (noble / deb822) base images end to end; the `apt-get install` inside the ansible rmw role succeeded via `mirror+file://` on both.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>

style(pre-commit): autofix
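A Python sketch of the same steps, operating on a caller-supplied root directory rather than `/`. The mirror list contents and the two sources paths come from the message above; the helper names are hypothetical, and consuming a trailing `/` after `/ubuntu` is a choice of this sketch (the PR's `sed` expression is not shown). The `is_file()` check plays the role of the `if [ -f "$f" ]` guard, since only one of the two files exists on a given release.

```python
import re
import tempfile
from pathlib import Path

MIRROR_URI = "mirror+file:///etc/apt/ubuntu-mirrors.list"  # path inside the image
# Azure first, archive second (plain list, no priorities at this stage).
MIRROR_LIST = "http://azure.archive.ubuntu.com/ubuntu/\nhttp://archive.ubuntu.com/ubuntu/\n"
# Classic (jammy) and deb822 (noble) locations; only one exists per release.
SOURCES = ("etc/apt/sources.list", "etc/apt/sources.list.d/ubuntu.sources")


def rewrite_sources(text: str) -> str:
    """Swap archive.ubuntu.com for the mirror list; security.ubuntu.com is untouched."""
    return re.sub(r"http://archive\.ubuntu\.com/ubuntu/?", MIRROR_URI, text)


def apply_mirror(root: Path) -> list[Path]:
    """Write the mirror list and rewrite whichever sources file exists under root."""
    (root / "etc" / "apt").mkdir(parents=True, exist_ok=True)
    (root / "etc" / "apt" / "ubuntu-mirrors.list").write_text(MIRROR_LIST)
    changed = []
    for rel in SOURCES:
        f = root / rel
        if not f.is_file():  # same role as `if [ -f "$f" ]; then ...; fi`
            continue
        f.write_text(rewrite_sources(f.read_text()))
        changed.append(f)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "etc" / "apt").mkdir(parents=True)
        (root / "etc" / "apt" / "sources.list").write_text(
            "deb http://archive.ubuntu.com/ubuntu/ jammy main restricted\n"
            "deb http://security.ubuntu.com/ubuntu/ jammy-security main\n"
        )
        for path in apply_mirror(root):
            print(path.read_text(), end="")
```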
Force-pushed from `b4f16f0` to `bf848fa`.
`apt-transport-mirror` treats unannotated peer URLs in a mirrorlist as equal and spreads each `apt-get update` request across them for load balancing. Combined with the `mirror+file://` switch in bf848fa, this means one `InRelease` can come from azure while the matching `Packages.gz` comes from archive.ubuntu.com (or vice versa), and when the two hosts are mid-sync apt fails with:

`File has unexpected size (4263777 != 4263737). Mirror sync in progress?`

This was observed on an earlier PR run at step:15 line 881: archive served an `InRelease` that disagreed with the `Packages.gz` its mirror had just rotated. `Acquire::Retries` doesn't help here because the bad file is consistently returned until the mirror finishes its push.

Annotate the mirrorlist with `priority:` (lower = preferred) so every request goes to azure first and archive is touched only when azure fails. This eliminates the cross-host version-skew race entirely; the only remaining case is a mid-push on azure alone, which is rare and typically covered by `Acquire::Retries`.

Secondary benefit: on GHA runners (which live inside Azure's network) azure.archive.ubuntu.com is an order of magnitude faster than the public archive.ubuntu.com, so pinning azure as primary is also a throughput win. "Load balancing" over an unevenly fast pair was a pessimization, not a speedup.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
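A sketch of the annotated mirrorlist and the try-order it is meant to produce. The TAB-separated `key:value` metadata follows the format described in the `apt-transport-mirror` man page; `try_order` is a simplified, hypothetical model that sorts strictly by `priority:` (lower first) and does not reproduce apt's handling of ties or unannotated lines.

```python
MIRRORS = [
    ("http://azure.archive.ubuntu.com/ubuntu/", 1),  # preferred
    ("http://archive.ubuntu.com/ubuntu/", 2),        # failsafe
]


def render_mirrorlist(mirrors: list[tuple[str, int]]) -> str:
    """One URI per line, followed by TAB-separated key:value metadata."""
    return "".join(f"{uri}\tpriority:{prio}\n" for uri, prio in mirrors)


def try_order(mirrorlist: str) -> list[str]:
    """Simplified model: URIs sorted by priority, lower first.

    Requires every line to carry a priority; ties keep file order
    (sorted() is stable), which is NOT how apt treats equal entries.
    """
    entries = []
    for line in mirrorlist.splitlines():
        if not line.strip():
            continue
        uri, *meta = line.split("\t")
        fields = dict(item.split(":", 1) for item in meta)
        entries.append((int(fields["priority"]), uri))
    return [uri for _, uri in sorted(entries, key=lambda e: e[0])]


if __name__ == "__main__":
    text = render_mirrorlist(MIRRORS)
    print(text, end="")
    print(try_order(text))
```

Because the order comes from the annotation rather than file position, azure stays first even if the lines are written in a different order.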
This was referenced Apr 17, 2026
This PR is being split into smaller, focused PRs for easier review. It stays open in draft while the stack is prepared; it will be closed once every successor lands on `main`.

Split stack
Each PR is stacked on top of the previous one. Review bottom-to-top. Branches are already pushed; PRs are opened one at a time so each merges before the next.
`feat/split-06-docker-new-examples`

Original combined description preserved in the commit body of `feat/split` at `135bd8256`.