ci: add missing LTS VM kernels#1588
Conversation
✅ Deploy Preview for aya-rs-docs ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify project configuration. |
tamird
left a comment
There was a problem hiding this comment.
Please tidy the PR subject and description.
@tamird reviewed 3 files and all commit messages, and made 6 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on swananan).
.github/scripts/download_kernel_images.sh line 28 at r1 (raw file):
# tips (5.15.208 and 6.6.141 as of 2026-05-29). Use the newest archived # Debian cloud packages we can find for each LTS line instead: 5.15.15 for # 5.15, and 6.6.15 for 6.6.
we need some programmatic way to refresh this information, this comment is not enough
Code quote:
# Debian Snapshot does not have cloud packages for the current upstream LTS
# tips (5.15.208 and 6.6.141 as of 2026-05-29). Use the newest archived
# Debian cloud packages we can find for each LTS line instead: 5.15.15 for
# 5.15, and 6.6.15 for 6.6..github/scripts/download_kernel_images.sh line 31 at r1 (raw file):
case "$version:$architecture" in 5.15:amd64 | 5.15:arm64) archive_timestamp=20220130T155220Z
what is the maintenance model for this logic? how will it break when these things move and how do we fix it?
also how come only 5.15 and 6.6 are here?
.github/workflows/ci.yml line 254 at r1 (raw file):
- ubuntu-24.04-arm download-kernel-images: - arm64 5.10
should this list gain 5.15 and 6.6 as well?
xtask/src/run.rs line 31 at r2 (raw file):
impl GitHubLogGroup { fn for_vm_integration_tests(kernel: impl std::fmt::Display) -> Option<Self> { if !matches!(env::var("GITHUB_ACTIONS").as_deref(), Ok("true")) {
i think you can avoid utf8 stuff here
env::var_os("GITHUB_ACTIONS")
also i'd keep the logic of this to printing ::group:: and ::endgroup:: and let the caller control everything after ::group::. you can probably use format_args in the caller and take fmt::Arguments as the parameter here.
xtask/src/run.rs line 481 at r2 (raw file):
OsStr::from_bytes(base) }; // Fold each kernel's integration test output in GitHub Actions.
please add a newline before this
bdf65e2 to
07a88eb
Compare
swananan
left a comment
There was a problem hiding this comment.
@swananan made 5 comments.
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on tamird).
.github/scripts/download_kernel_images.sh line 28 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
we need some programmatic way to refresh this information, this comment is not enough
Done.
.github/scripts/download_kernel_images.sh line 31 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
what is the maintenance model for this logic? how will it break when these things move and how do we fix it?
also how come only 5.15 and 6.6 are here?
Agreed, this logic did feel a bit brittle, so I switched it to use the Snapshot API and construct the download URLs from API metadata.
5.15 and 6.6 are the only LTS lines missing from the current Debian pool.
.github/workflows/ci.yml line 254 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
should this list gain 5.15 and 6.6 as well?
I was worried about increasing CI time, so I initially avoided adding these to the arm64 list. I have added them now so we can see how much impact they have on CI runtime.
xtask/src/run.rs line 31 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
i think you can avoid utf8 stuff here
env::var_os("GITHUB_ACTIONS")
also i'd keep the logic of this to printing
::group::and::endgroup::and let the caller control everything after::group::. you can probably use format_args in the caller and take fmt::Arguments as the parameter here.
Thanks for the details, done.
xtask/src/run.rs line 481 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
please add a newline before this
Done.
|
The new arm64 job didn't add too much to CI time. |
|
CI failed on arm64 5.15, will check this tomorrow. |
tamird
left a comment
There was a problem hiding this comment.
@tamird reviewed 4 files and all commit messages, made 4 comments, and resolved 5 discussions.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions (waiting on swananan).
.github/workflows/ci.yml line 301 at r4 (raw file):
sudo apt -y install \ autopoint \ jq \
plz add to the comment above
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
#!/usr/bin/env bash
this is getting to be quite a lot of code. is it time for this to be written in python?
Brewfile line 7 at r4 (raw file):
# which is needed for proper handling of `--etag-{compare,save}`. brew "curl" brew "jq"
needs a comment plz
xtask/src/run.rs line 31 at r4 (raw file):
impl GitHubLogGroup { fn new(title: Arguments<'_>) -> Option<Self> { if env::var_os("GITHUB_ACTIONS").as_deref() != Some(OsStr::new("true")) {
env::var_os("...").is_some_and(|s| s == "true")
4ad0e26 to
c589c57
Compare
swananan
left a comment
There was a problem hiding this comment.
5.15 and 6.6 from Debian Snapshot are not current LTS patchlevels, and I hit two related test failures on those kernels. I think we can use Ubuntu Mainline for 5.15 and 6.6 instead, while keeping the remaining kernels on the Debian pool:
https://kernel.ubuntu.com/mainline/
That should give us much closer coverage of the current LTS patchlevels, but it would require some xtask changes to support Ubuntu's package layout. I also checked cilium/ebpf's CI matrix ( https://github.com/cilium/ebpf/blob/main/.github/workflows/ci.yml), which includes mainline and stable kernels as well, but I am not sure whether we need to add those here.
Does this approach work for you? I can try this in this PR.
@swananan made 5 comments.
Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on tamird).
Brewfile line 7 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
needs a comment plz
Rewrote script in Python, so jq is no longer required.
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
this is getting to be quite a lot of code. is it time for this to be written in python?
I have rewritten the script in Python, making it only dependent on native Python libs. And I also lint this script with ruff:
ruff check .github/scripts/download_kernel_images.py
ruff format --check .github/scripts/download_kernel_images.py
.github/workflows/ci.yml line 301 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
plz add to the comment above
Done.
xtask/src/run.rs line 31 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
env::var_os("...").is_some_and(|s| s == "true")
Done. Used is_none_or instead
tamird
left a comment
There was a problem hiding this comment.
@tamird reviewed all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on swananan).
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
Previously, swananan (swananan) wrote…
I have rewritten the script in Python, making it only dependent on native Python libs. And I also lint this script with ruff:
ruff check .github/scripts/download_kernel_images.py
ruff format --check .github/scripts/download_kernel_images.py
Thinking about this again - should we just write it in rust and make it part of xtask? I think it used to be, and I don't remember why it no longer is but the answer would be in the git history.
tamird
left a comment
There was a problem hiding this comment.
Why not use mainline for all kernels?
@tamird reviewed 2 files, made 1 comment, and resolved 2 discussions.
Reviewable status: 2 of 10 files reviewed, 1 unresolved discussion (waiting on swananan).
c2b4ed0 to
3f76aa7
Compare
swananan
left a comment
There was a problem hiding this comment.
Sure, I also removed the Debian .deb support to keep the code cleaner.
I checked the CI runtime after switching to Ubuntu Mainline. The arm64 lane increased from about 26 mins to about 35 mins, with arm64 6.18 taking the longest. This is likely because the Ubuntu packages include more kernel modules/configuration than the previous Debian cloud kernels.
One possible mitigation is to skip the debug test cases on the arm64 6.12 and 6.18 jobs, which should reduce the runtime.
@swananan made 2 comments.
Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on tamird).
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Thinking about this again - should we just write it in rust and make it part of xtask? I think it used to be, and I don't remember why it no longer is but the answer would be in the git history.
Done.
tamird
left a comment
There was a problem hiding this comment.
@tamird partially reviewed 17 files and made 7 comments.
Reviewable status: 13 of 19 files reviewed, 7 unresolved discussions (waiting on swananan).
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
Previously, swananan (swananan) wrote…
Done.
Curious that this rewrite is not mentioned in any commit message? Should it be in its own commit?
xtask/src/http.rs line 16 at r14 (raw file):
const USER_AGENT: &str = "aya-xtask"; const REQUEST_PHASE_TIMEOUT: Duration = Duration::from_secs(30); const REQUEST_GLOBAL_TIMEOUT: Duration = Duration::from_secs(15 * 60);
move these into new? only used there and the comments there are really describing these constants
Code quote:
const REQUEST_PHASE_TIMEOUT: Duration = Duration::from_secs(30);
const REQUEST_GLOBAL_TIMEOUT: Duration = Duration::from_secs(15 * 60);xtask/src/http.rs line 144 at r14 (raw file):
dest_path.with_file_name(format!( "{}.tmp-{}", file_name.to_string_lossy(),
please no lossy conversions, this can use OsString directly to avoid this
xtask/src/http.rs line 149 at r14 (raw file):
}; { let mut tmp = File::create(&tmp_path)
i think you want to wrap this in a buffered writer
xtask/src/http.rs line 154 at r14 (raw file):
io::copy(&mut body, &mut tmp) .with_context(|| format!("failed to download {url} to {}", tmp_path.display()))?; tmp.flush()
i think you don't need this because close will do it.
xtask/src/http.rs line 168 at r14 (raw file):
.with_context(|| format!("failed to write {}", etag_path.display()))?; } else if etag_path .try_exists()
this is classic TOCTOU, just remove and ignore the not exist error
xtask/src/ubuntu_mainline.rs line 216 at r14 (raw file):
let mut archives = Vec::new(); let mut keep = BTreeSet::new();
why BTree and not HashSet?
3f76aa7 to
74912f6
Compare
swananan
left a comment
There was a problem hiding this comment.
@swananan made 7 comments.
Reviewable status: 8 of 19 files reviewed, 7 unresolved discussions (waiting on tamird).
.github/scripts/download_kernel_images.sh line 1 at r4 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Curious that this rewrite is not mentioned in any commit message? Should it be in its own commit?
Sure, I expanded the commit message to mention this.
I did not split it into a separate commit because I want each commit to pass CI on its own. Since this also removes the Debian .deb path, keeping the script removal together with the Ubuntu Mainline xtask resolver felt cleaner.
xtask/src/http.rs line 16 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
move these into
new? only used there and the comments there are really describing these constants
Done.
xtask/src/http.rs line 144 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
please no lossy conversions, this can use OsString directly to avoid this
Done.
xtask/src/http.rs line 149 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
i think you want to wrap this in a buffered writer
Done.
xtask/src/http.rs line 154 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
i think you don't need this because close will do it.
I prefer to keep the flush, since drop does not report flush errors,
xtask/src/http.rs line 168 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
this is classic TOCTOU, just remove and ignore the not exist error
Done. Thanks for the pointer.
xtask/src/ubuntu_mainline.rs line 216 at r14 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
why BTree and not HashSet?
Done. BTreeSet doesn't provide any benefit here, switched to HashSet.
tamird
left a comment
There was a problem hiding this comment.
@tamird partially reviewed 11 files, made 2 comments, and resolved 7 discussions.
Reviewable status: 16 of 19 files reviewed, 2 unresolved discussions (waiting on swananan).
xtask/src/ubuntu_mainline.rs line 62 at r17 (raw file):
let mut links = Vec::new(); let mut rest = html; while let Some(index) = rest.find("href") {
does this want to use trim_prefix?
but also: does this want to use a proper xml parser?
xtask/src/ubuntu_mainline.rs line 299 at r17 (raw file):
}; data_tar_entries += 1; let data_archive = String::from_utf8_lossy(&data_archive);
still lossy
bda2252 to
f95fd6e
Compare
Fold each kernel's integration test output under a GitHub Actions log group. This keeps multi-kernel jobs easier to scan without splitting the lane.
Teach depmod and modprobe to read .ko.zst modules, and build the VM test distro with zstd support. This keeps recent Ubuntu Mainline module packages compressed in the initramfs while still allowing module autoloading inside the VM.
The VM runner no longer uses GitHub API authentication when fetching gen_init_cpio.c. Remove the stale option and stop passing the workflow token from CI.
Treat the final modules.alias field as the module name. This avoids aborting quiet modprobe calls when Ubuntu Mainline arm64 6.18 modules produce aliases that contain spaces, which otherwise prevents qdisc module autoloading.
Wait for the stderr reader thread before reporting a non-zero QEMU exit. The pipes are already drained by reader threads, so wait on the child directly and let stderr lines reach CI logs before returning the error.
Switch VM integration jobs to let xtask resolve and download kernel versions from Ubuntu Mainline generic packages. Ubuntu Mainline has better LTS line coverage than the current Debian pool, including 5.15 and 6.6. Remove the Debian download helper and make the VM runner Ubuntu Mainline only. Require --kernel-arch, accept version arguments instead of .deb paths, and keep only the linux-image-unsigned plus linux-modules layout.
Add the missing 5.15 and 6.6 LTS lines to VM coverage. Refresh the 6.17 lane to 6.18 as well, since 6.18 is an LTS kernel line.
f95fd6e to
87169e0
Compare
swananan
left a comment
There was a problem hiding this comment.
Update: CI hit a new Ubuntu Mainline state where v5.10.257 and v5.10.258 are listed, but their amd64/arm64 artifacts are still missing.
I updated the resolver to select the newest complete build for the requested kernel line and architecture, instead of blindly using the newest release directory. The fallback is bounded to the latest 10 candidates, so we tolerate temporarily incomplete Mainline builds without silently degrading forever if an LTS line becomes unavailable.
I also took a quick look at #1591. It looks like Aya can rely on Bazel to run against different kernels, which seems like a good long-term solution.
@swananan made 3 comments.
Reviewable status: 4 of 19 files reviewed, 2 unresolved discussions (waiting on tamird).
xtask/src/ubuntu_mainline.rs line 62 at r17 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
does this want to use trim_prefix?
but also: does this want to use a proper xml parser?
I tightened the HTML parsing used to extract URLs.
An xml parser is not appropriate here, since the page is HTML rather than xml.
xtask/src/ubuntu_mainline.rs line 299 at r17 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
still lossy
Done.
Ideally we would run against every active upstream LTS kernel, but I could only find usable Debian Snapshot cloud kernel packages for the 5.15 and 6.6 lines. These are not the latest upstream LTS patch releases, but they still improve coverage across older supported kernel lines.
I did not add the new kernels to the arm64 lane for now, to avoid increasing PR CI time while GitHub Actions has been unstable recently.
Added/updated tests?
We strongly encourage you to add a test for your changes.
Checklist
cargo +nightly fmt.You can find failing lints with
cargo xtask clippy.cargo test.cargo xtask public-api --bless.(Optional) What GIF best describes this PR or how it makes you feel?
This change is