x86: fix extra_bi_size init value #1696

Open
terryzbai wants to merge 1 commit into seL4:master from au-ts:fix-x86-bootinfo-size

Conversation

@terryzbai

extraLen is wrong on x86 due to non-zero init value of extra_bi_size.

@lsf37

lsf37 commented Jun 24, 2026

Member

I'd need a bit more explanation to figure out what you're trying to do and if that is correct.

@midnightveil

Contributor

The rust-seL4 bootinfo work iterates over 0..extraLen in the bootinfo (which excludes the padding).
However, because of the extra sizeof(BootInfo) in this size, that range includes the header of the padding block. The code then tries to read the padding data, and because Rust bounds-checks slices (here 0..extraLen), it panics on x86. (I suppose it was only tested on ARM, because the only extra block there was the DTB before seL4/rust-sel4#355.)

@terryzbai

Author

ARM's bootinfo excludes the padding tag:

word_t extra_bi_size = 0;

I am not sure if the padding tag on x86 is intentionally used as an end tag. In my opinion, the len of the padding tag header should be 0 to prevent anyone from reading beyond extra_bi_size if this is the case.

@lsf37

lsf37 commented Jun 24, 2026

Member

Ok, that sounds reasonable. We need to check what people are currently doing with this field and if changing it would break things.

Which existing consumers of bootinfo exist? The C capDL loader comes to mind, that does have an x64 mode and must be dealing with this field. Anything else? Maybe in the other seL4 libraries?

@terryzbai

Author

What I know is:

@midnightveil

Contributor

Anything else? Maybe in the other seL4 libraries?

Yes, they use them (it's necessary for x86 for ACPI stuff).

@Indanz

Indanz commented Jun 24, 2026

Contributor

I agree that extraLen is misleading because it includes the padding header, but not the padding data. However, whether that's wrong or just weird is a matter of interpretation.

Adding sizeof(seL4_BootInfoHeader) to extra_bi_size is needed to be sure that there is space for the padding header. If you remove that, the space is no longer guaranteed.

So I suppose the question is why we have a SEL4_BOOTINFO_HEADER_PADDING at all and why extraLen doesn't match the actual size of the allocated memory.

@Indanz

Indanz commented Jun 24, 2026

Contributor

ARM's bootinfo excludes the padding tag:

word_t extra_bi_size = 0;

I don't see how it can be sure that that memory is safe to access: it is not guaranteed that extra_bi_size - extra_bi_offset > sizeof(header), or that it's even word-aligned. The DTB start address is 8-byte aligned, but I can't find anything in the standard about total size. It just says "The strings block has no alignment constraints and may appear at any offset from the beginning of the devicetree blob." To me the Arm code seems wrong.

I am not sure if the padding tag on x86 is intentionally used as an end tag.

I think it is, for code that parses the block and stops at SEL4_BOOTINFO_HEADER_PADDING, instead of looking at extraLen.

In my opinion, the len of the padding tag header should be 0 to prevent anyone from reading beyond extra_bi_size if this is the case.

But then it doesn't document the actual padding any more... The information is correct, it just conflicts with extraLen.

@Indanz

Indanz commented Jun 24, 2026

Contributor

By far the easiest solution would be to just remove the end SEL4_BOOTINFO_HEADER_PADDING altogether on all archs.

The original idea seems that you can add them to fill holes to required alignments, but the header itself also has alignment requirements, so how is that supposed to work? With that in mind, they can't be used as end markers either, as they can be anywhere, so it should be safe to remove the end one.

@midnightveil

Contributor

By far the easiest solution would be to just remove the end SEL4_BOOTINFO_HEADER_PADDING altogether on all archs.

I think this would make this PR consistent with the removal of the header length.

As you noted, yes, I suppose that was there for the padding because the padding never increments the size itself.

@terryzbai

Author

To get this PR merged or this issue resolved, can I please know what you want me to do? The bootinfo passing support in rust-sel4 and Microkit needs this to be fixed.

@lsf37

lsf37 commented Jul 2, 2026

Member

So, to summarise the discussion and to make sure I understand the boot info layout (please correct me if I'm wrong):

On x86, there always is an explicit padding block with padding header after the rest of the extra_bi_region blocks, so that extra_bi_region.end = start + BIT(extra_bi_size_bits). I don't see anything wrong with that. It's different to what Arm and RISCV do (they just end after the last content block, no trailing padding), but it's perfectly legitimate.

The current extra_bi_size = sizeof(seL4_BootInfoHeader) accounts for that padding block. As far as I can see it does so correctly. Changing it to extra_bi_size = 0 would be a bug.

I can't see any specific reason that this last padding block has to exist, so we could do what Indan is suggesting, which is to remove the trailing padding header on x86. If we do that, then extra_bi_size = 0 becomes correct. The commit that introduced the padding on x86 is 256c30a, and it doesn't look like it was needed there either. Possibly it was just there to neatly fill to the reported power of 2 size.

That all said, as far as I can see, there is no bug in the kernel. If there is one, the bug is in Rust, because the kernel does exactly what is advertised in the manual. We can still make the behaviour consistent between x86 and Arm/RISC-V, but we should probably look at what is going wrong on the Rust side first.

@midnightveil

midnightveil commented Jul 2, 2026

Contributor

That all said, as far as I can see, there is no bug in the kernel. If there is one, the bug is in Rust, because the kernel does exactly what is advertised in the manual. We can still make the behaviour consistent between x86 and Arm/RISC-V, but we should probably look at what is going wrong on the Rust side first.

The Rust side is unhappy because it makes a slice (a fat pointer with size+address) that covers [bootinfo, bootinfo + extraLen).

When you iterate over the bootinfo looking for an ID with the Iterator trait:

                return Some(BootInfoExtra {
                    id,
                    content_with_header: &self.bootinfo.extra_slice()
                        [content_with_header_start..content_with_header_end],
                });

I don't see anything wrong with that. It's different to what Arm and RISCV do (they just end after the last content block, no trailing padding), but it's perfectly legitimate.

Because the padding tag has a length that goes up to the page-aligned end (what is actually mapped), this results in an out-of-bounds crash from Rust, as that is outside extra_slice, which goes up to extraLen.

The other alternative here I suppose would be to make extraLen be rounded up to the page size.

@lsf37

lsf37 commented Jul 2, 2026

Member

Because the padding tag has a length that goes up to the page-aligned end (what is actually mapped), this results in an out-of-bounds crash from Rust, as that is outside extra_slice, which goes up to extraLen.

But the padding tag length is as specified in the manual, no? So doesn't this mean this code is just wrong?

@lsf37

lsf37 commented Jul 2, 2026

Member

To be more precise: what stops any other block at some point going up to the page aligned end or any other padding block occurring at some point in the boot info?

@lsf37

lsf37 commented Jul 2, 2026

Member

Ok, now I think I understand it, and I take back what I said. There is a bug, and the kernel is not doing what it says in the manual.

The problem (as I understand it) is that extra_bi_size, and therefore extraLen, do not actually report the full size of that padding block. They only include sizeof(seL4_BootInfoHeader) in extraLen, which is just the header part, but not the rest of the padding. And Rust is right to panic, because it finds the padding object, and the length of that padding object reaches outside extraLen. The page boundary is not really the problem; it just happens to align that way.
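
To make the mismatch concrete, here is a minimal sketch in C of a strict bounds check, similar in spirit to what the Rust slice check enforces. The type `bi_header_t` and the function `bi_check_strict` are hypothetical stand-ins, not libsel4 or rust-sel4 API; the layout (an id word followed by a len word, with len counting the header) and the padding id of 0 are assumptions taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for seL4_BootInfoHeader (assumed layout, for illustration only). */
typedef struct {
    uint64_t id;  /* block type; 0 stands in for SEL4_BOOTINFO_HEADER_PADDING */
    uint64_t len; /* block length in bytes, header included */
} bi_header_t;

/* Hypothetical strict check: returns 0 if every block fits inside extra_len
 * bytes, -1 if a header or a block's stated length reaches past extra_len. */
int bi_check_strict(const unsigned char *extra, size_t extra_len)
{
    size_t off = 0;

    assert(extra != NULL);
    while (off < extra_len) {
        bi_header_t h;

        if (extra_len - off < sizeof(h)) {
            return -1; /* not even room for a header */
        }
        memcpy(&h, extra + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > extra_len - off) {
            return -1; /* block claims more bytes than extra_len covers */
        }
        off += (size_t)h.len;
    }
    return 0;
}
```

With one 32-byte content block followed by a padding header whose len runs to the end of a 4096-byte region, this check fails when extra_len counts only the padding header (32 + 16 bytes) and passes when extra_len is either 32 (no padding block) or 4096 (padding fully reported).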

So the fix is either to correct that reporting or to drop the padding block, which also corrects the reporting. And if we ever do add another padding block it just needs to correctly increase extra_bi_size.

I think I also can see why this never crashed in C: because it's seeing that it is a padding block, and increasing the current pointer by the length of the padding block goes > extraLen, which means it is finished.
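
A rough sketch of that C-style walk, under the same assumptions as the thread (an id word followed by a len word, with len counting the header). `bi_header_t` and `bi_find_lenient` are hypothetical names for illustration, not the actual capDL loader or seL4_libs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for seL4_BootInfoHeader (assumed layout, for illustration only). */
typedef struct {
    uint64_t id;  /* block type */
    uint64_t len; /* block length in bytes, header included */
} bi_header_t;

/* Hypothetical lookup: returns the offset of the first block with id `want`,
 * or -1 if the walk leaves extra_len without finding it. */
long bi_find_lenient(const unsigned char *extra, size_t extra_len, uint64_t want)
{
    size_t off = 0;

    assert(extra != NULL);
    /* Only headers are read, and only while a whole header fits in extra_len. */
    while (off < extra_len && extra_len - off >= sizeof(bi_header_t)) {
        bi_header_t h;

        memcpy(&h, extra + off, sizeof(h));
        if (h.id == want) {
            return (long)off;
        }
        if (h.len < sizeof(h) || h.len > SIZE_MAX - off) {
            break; /* malformed length: avoid looping forever or wrapping */
        }
        /* May jump past extra_len; the loop condition then ends the walk. */
        off += (size_t)h.len;
    }
    return -1;
}
```

Because the walk never touches a block body, an oversized trailing padding block simply pushes the cursor past extra_len and the search ends with "not found" instead of faulting.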

Alright, so with that, I vote for removing the trailing padding block on x86 and setting initial extra_bi_size = 0.

As far as I can see in the C capDL loader and in seL4_libs, this wouldn't change anything; the block is of course ignored anyway. Boot info becomes smaller, but I don't think the padding caused it to cross a page boundary, so even the number of allocated pages etc. should be the same.

'extra_bi_size' is meant to report the full size of all the bootinfo
blocks, but the original implementation excluded the body of the
trailing padding block. In addition, the page boundary alignment issue
that the trailing padding block was trying to solve is not really a
problem.

Therefore, this fix removes the trailing padding block and initialises
`extra_bi_size` to 0, as other architectures do.

Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
@terryzbai terryzbai force-pushed the fix-x86-bootinfo-size branch from ae2fe76 to a51980d on July 2, 2026 04:43
@terryzbai

Author

This might not be related to this PR. Can I please know what the point of this chunk is on ARM and RISCV? It seems never to be reached, because extra_bi_size is always equal to extra_bi_offset.

@Indanz

Indanz commented Jul 2, 2026

Contributor

This might not be related to this PR. Can I please know what the point of this chunk is on ARM and RISCV? It seems never to be reached, because extra_bi_size is always equal to extra_bi_offset.

That's the code I thought was buggy, but if it always matches, it's fine and should be removed. (For some reason I thought it got rounded up.) Mind that RISCV has exactly the same code.

    if (extra_bi_size > extra_bi_offset) {
        /* provide a chunk for any leftover padding in the extended boot info */
        header.id = SEL4_BOOTINFO_HEADER_PADDING;
        header.len = (extra_bi_size - extra_bi_offset);
        *(seL4_BootInfoHeader *)(rootserver.extra_bi + extra_bi_offset) = header;
    }

@lsf37

lsf37 commented Jul 2, 2026

Member

This might not be related to this PR. Can I please know what the point of this chunk is on ARM and RISCV? It seems never to be reached, because extra_bi_size is always equal to extra_bi_offset.

That's the code I thought was buggy, but if it always matches, it's fine and should be removed. (For some reason I thought it got rounded up.) Mind that RISCV has exactly the same code.

    if (extra_bi_size > extra_bi_offset) {
        /* provide a chunk for any leftover padding in the extended boot info */
        header.id = SEL4_BOOTINFO_HEADER_PADDING;
        header.len = (extra_bi_size - extra_bi_offset);
        *(seL4_BootInfoHeader *)(rootserver.extra_bi + extra_bi_offset) = header;
    }

RISCV was very likely just copy-pasted. I can try to validate that this afternoon, but I think you're right, we should remove all of these (if they never get emitted anyway).

@lsf37

lsf37 commented Jul 3, 2026

Member

I guess it's there as a guard against getting things wrong, but if that is its purpose, it should be an assert instead.

@lsf37

lsf37 commented Jul 3, 2026

Member

Ok, confirmed that RISC-V is copy/paste from ARM (commit 5fc7346), and the ARM padding bit was introduced in commit 6a31a57. Even there the condition was never true. The comment and commit message do not provide much info for intention, but it's either a guard against having gotten the size/offset wrong or it was trying to do something like x86 did before, but got it wrong. In either case, we should replace them with an assert.
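
As a sketch of what "replace them with an assert" could look like, the following self-contained C example mimics the two-pass pattern (first compute the size, then write the blocks) and asserts that the write offset ends exactly at the computed size. All names here (`bi_header_t`, `bi_block_size`, `bi_write_block`, `bi_fill_example`) are hypothetical and only illustrate the idea; this is not the kernel's boot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for seL4_BootInfoHeader (assumed layout, for illustration only). */
typedef struct {
    uint64_t id;  /* block type */
    uint64_t len; /* block length in bytes, header included */
} bi_header_t;

/* Pass 1 (hypothetical): size of one block, header included. */
static size_t bi_block_size(size_t body_len)
{
    return sizeof(bi_header_t) + body_len;
}

/* Pass 2 (hypothetical): write one block at `offset`, return the new offset. */
static size_t bi_write_block(unsigned char *region, size_t offset, uint64_t id,
                             const void *body, size_t body_len)
{
    bi_header_t h = { id, bi_block_size(body_len) };

    memcpy(region + offset, &h, sizeof(h));
    memcpy(region + offset + sizeof(h), body, body_len);
    return offset + (size_t)h.len;
}

/* Fills `region` with a single example block and returns the bytes used. */
size_t bi_fill_example(unsigned char *region, size_t region_len)
{
    static const unsigned char body[24] = {0};
    size_t extra_bi_size = 0; /* starts at 0: no trailing padding block */
    size_t extra_bi_offset = 0;

    assert(region != NULL);
    extra_bi_size += bi_block_size(sizeof(body));
    assert(extra_bi_size <= region_len);

    extra_bi_offset = bi_write_block(region, extra_bi_offset, 1, body, sizeof(body));

    /* Instead of emitting a padding header for leftover space, state the
     * invariant that sizing and writing agree. */
    assert(extra_bi_offset == extra_bi_size);
    return extra_bi_offset;
}
```

If a future block is sized in one pass but not written in the other (or the reverse), the final assert fails in a debug build rather than silently producing a padding block.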

4 participants