Add cnode tag#539
Conversation
This adds a smaller CNode as PD's cspace and moves the original Microkit resource CNode to the root's first slot, as this makes the CSpace management more flexible. The changes are: - size of root CNode: 64 - size of Microkit CNode: 1024 -> 512 - guard size of root CNode: 0 - guard size of Microkit CNode: seL4_WordBits - 6 - 9 This also fixes the capability creation for shared VSpace and CSpace. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
This adds rules for QEMU tests on x86 and enables the 'capdl_spec.json' file generation for analysis. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
This checks if people try to place extra caps in slot 0, which is reserved for Microkit CNode, and adds a test case for it. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
This adds 'cnode' tag in SDF, providing a way for users to define custom-sized CNodes to be shared among PDs. One typical use case is to pass capabilities from one PD to another. Custom CNodes can be mapped into PDs via 'cap_cspace' with target 'cnode_name' specified, and other CapMapType fails with 'cnode_name' specified. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
This adds a few test cases for checking if invalid cnode tags or cspace mapping fails as expected. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
| named_object.name.as_ref().unwrap() | ||
| ); | ||
| }; | ||
| if let Some(src_pd) = cap_map.pd { |
There was a problem hiding this comment.
I think this is assignment within if-check, that's extremely confusing.
I would either make temp vars outside the if-checks, or don't bother with them. cap_map.pd and cap_map.cnode are short and clear enough.
There was a problem hiding this comment.
It's if-let destructuring, which is the canonical way to pattern match this in Rust.
However, I would actually agree with you regarding "source" on the CapMap type here internally: https://github.com/seL4/microkit/pull/539/changes#r3491145821
we can use a rust tagged enum and make this clearer.
The way I structured this code I was intended that the source-kind checks would be actually located with the CapMapType.
i.e. we could represent this data structure as
CapMapKind {
Tcb { source: enum {
PdSource(id),
CNodeSource(id)
}
}
or something like that, and then you just do this and you can't construct a CapMap data type without representing our internal invariants.
You could also do the separate source field but then you gather an additional _ => unreachable!("not valid here") case in your matching.
There was a problem hiding this comment.
right, this looks more rust-y.
| capdl_util_make_cnode_cap(pd_src_shadow_cspace.cspace, 0, guard_size) | ||
| } | ||
| }; | ||
| } else if let Some(src_cnode) = cap_map.cnode { |
There was a problem hiding this comment.
It makes no logical sense that this would be an else-if instead of an if, as you're checking different fields. From a higher level it of course makes perfect sense, as they are mutually exclusive.
However, as you already wrap them in an Option, would it make more sense in Rust to have one shared src field of different types and do a switch on type instead?
| let pd_guard_size = kernel_config.cap_address_bits - cnode.size_bits as u64 - PD_ROOT_CAP_BITS as u64; | ||
| let cnode_cap_self_ref = capdl_util_make_cnode_cap(cnode_obj_id, 0, pd_guard_size.try_into().unwrap()); |
There was a problem hiding this comment.
Below it does it without the weird try_into/unwrap combo. Maybe use as u8 above like is done below?
Perhaps add something like an capdl_util_make_sub_cnode_cap() function and add that to capdl util? Then you can call that everywhere and keep this code in one place.
Edit: I realised that capdl util probably doesn't know about kernel_config.cap_address_bits and PD_ROOT_CAP_BITS, so it can't be there and must be a less generic, more local make_sub_cnode_cap().
| let pd_guard_size = kernel_config.cap_address_bits - *size_bits as u64 - PD_ROOT_CAP_BITS as u64; | ||
| let cnode_cap = capdl_util_make_cnode_cap(*cnode_obj_id, 0, pd_guard_size.try_into().unwrap()); |
There was a problem hiding this comment.
Maybe replace with a capdl_util_make_sub_cnode_cap() call here too.
However, I'm not sure you want to do this unconditionally, because the user may want to manage a 2-level sub-CSpace themselves. Gobbling all remaining bits up as guard would make this impossible.
I agree that this should be the default behaviour, but I think that it should also be possible for users to specify the guard size (and perhaps value) manually.
So the the guard size would be cap_address_bits - PD_ROOT_CAP_BITS - size_bits - user_guard_size, where the latter is default 0 and the result must be >= 0.
If adding capdl_util_make_sub_cnode_cap(), you would call it with size_bits + user_guard_size instead of just size_bits.
There was a problem hiding this comment.
The consideration here was for CNode cap resolution, which strictly needs to consume all the cap_address_bits. That's also why I placed a self-ref cap at slot 0 for every sub-CNode.
Do you have better ideas about this?
There was a problem hiding this comment.
I think that it should also be possible for users to specify the guard size (and perhaps value) manually.
I remember my original proposal now: Ideally the user can specify the CSpace size in bits directly, instead of specifying the guard bits. Then the user managed those bits and they always have a fixed, known size. The default would then be cspace size bits equal to the cnode size bits. Specifying the guard size would result in a subspace that depends on PD_ROOT_CAP_BITS, and the user would still have no clue what space they are managing.
The consideration here was for CNode cap resolution, which strictly needs to consume all the cap_address_bits. That's also why I placed a self-ref cap at slot 0 for every sub-CNode.
Okay, as you know we have two different caps CNode caps here: The self-ref cap and the sub-CSpace cap.
By placing the self-ref cap in slot 0, you force full cap_address_bits alignment of the sub-CSpace, meaning it's forced to a 1-level sub-CSpace. Maybe you can work around this by adding a guard size that consumes all remaining bits for the self-ref cap, but I don't think so, I think the kernel check happens at the wrong moment for that. Worth a try though.
So the solution is to not reserve slot 0 for the self-ref cap, but to put that one in Microkit managed space.
This means you have two values to convey to the user: The self-ref cap address and the start CPtr of the user sub-CSpace.
This is fair bit more complicated than what you have now though, so leaving this for a future implementation is fine.
There was a problem hiding this comment.
Maybe you can work around this by adding a guard size that consumes all remaining bits for the self-ref cap, but I don't think so, I think the kernel check happens at the wrong moment for that. Worth a try though.
The kernel is not happy with it.
I remember my original proposal now: Ideally the user can specify the CSpace size in bits directly, instead of specifying the guard bits. Then the user managed those bits and they always have a fixed, known size. The default would then be cspace size bits equal to the cnode size bits. Specifying the guard size would result in a subspace that depends on
PD_ROOT_CAP_BITS, and the user would still have no clue what space they are managing.
Sorry, I don't quite understand the point here. What do you mean by "cspace size"? why manually specifying "guard size" results a subspace that depends on PD_ROOT_CAP_BITS? Can you please clarify a bit more?
So the solution is to not reserve slot 0 for the self-ref cap, but to put that one in Microkit managed space.
This means you have two values to convey to the user: The self-ref cap address and the start CPtr of the user sub-CSpace.
yes, this sounds better. The user might also want to know the depth (guard size bits + cnode size bits) of a sub-CNode for invocation, and maybe more.
There was a problem hiding this comment.
The kernel is not happy with it.
That was my impression from reading the code too, but good to double check.
Sorry, I don't quite understand the point here. What do you mean by "cspace size"? why manually specifying "guard size" results a subspace that depends on PD_ROOT_CAP_BITS? Can you please clarify a bit more?
When a user wants to manage a multi-level sub-CSpace, they need to know how many bits are remaining to be resolved. So let them specify that directly, instead of letting them specify the guard size of the CNode cap. That is, let the user configure it LSB first, with the top bits being used PD_ROOT_CAP_BITS and padded with guard.
Example: User wants 32 bits of CSpace to manage, and they want a (optional) guard value of 0xabcd. Say PD_ROOT_CAP_BITS is also 16. The CSpace then looks like: 0xMM000000abcdUUUU, where M is Microkit managed and U is user managed. Say the user top level CNode is 8 bits. To get this they would specify cspace_size=32, cspace_guard=0xabcd, cnode_slot_size_bits=8.
The tool would calculate the zero padding between the guard and PD_ROOT_CAP_BITS and check if it fits.
If the user has to specify the guard size, they need to do that calculation themselves somehow, and they would need to know how big PD_ROOT_CAP_BITS is. But it would only add complexity for the user with no gain.
There was a problem hiding this comment.
so in this example, the user manages 0xabcdUUUU (32 bits). 0xabcd takes 16 bits and the user top level CNode takes 8 bits, so there are still 8 bits spare for the user to manage. Do I understand it correctly?
The guard value 0xabcd seems to be a part of user managed bits as well, but how do we know how many bits the guard value takes from user-managed bits? e.g., cspace_guard=0x0abc.
Are the padded zeros also part of the user top level CNode's guard?
| return Err(value_error( | ||
| xml_sdf, | ||
| node, | ||
| format!("invalid paramter 'cnode_name' for target CapMapType"), |
| return Err(value_error( | ||
| xml_sdf, | ||
| node, | ||
| format!("Either 'pd' and 'cnode_name' should be specified"), |
There was a problem hiding this comment.
See, that's why I prefer a generic src instead of one attribute per source type. What you're doing here doesn't scale very well, and it has unclear semantics by design, making things harder for the user instead of easier.
There was a problem hiding this comment.
The issue with a generic "source" is that you run into namespace issues, and have to add meaning to stting prefixes or as we do here, new attributes.
If you have better ideas how to handle this I'm all ears but scaling is one rea why we had the syntax be "cap_cmode" as there must eventually be a limit on where cnodes are made. (same for other cap types).
That this all currently goes through the same code path right now for every cap type was a simplification.
There was a problem hiding this comment.
What namespace issues? Could you give an example?
For cap management, you basically need to gather the information you need to make the right seL4_CNode_Mint call. The seL4 API is inconsistent and weird, so it has parameters that don't apply to all cap types. That's just how seL4 is, that isn't a namespace issue.
If you work at this level, it's fine to match the seL4 API more, inconsistencies and all. It's fine if you make it easier for the user and provide not just "data", but more explicit "badge", "guard_size" and "guard_value" attributes. Yes, that means you have an attribute "badge" that's usually unused. And you can still add an error or warning if it's used on a cap type that doesn't use it if you want. But as using those on caps that don't use them doesn't make much sense, the added value of doing this is minimal, especially for an advanced feature like this.
There was a problem hiding this comment.
What namespace issues? Could you give an example?
Naming a source as a CPtr isn't very helpful. One wishes to name an appropriate microkit object; for instance: the CSpace/VSpace/Tcb/SC of a PD, or here, the name of a CNode (this is what I meant by namespace, as that would require a global lookup of object name). Or maybe a PD's frames. Or page tables.
There was a problem hiding this comment.
If you have cross-references of any kind, you want unique names, because then you can use those as unique ids in your XML. So yes, you want a global lookup of object names, I don't see why that would be a problem.
There was a problem hiding this comment.
Ah, I forgot: A PD has only one name and multiple caps. I understand what you mean now.
|
|
||
| impl CNode { | ||
| fn from_xml(xml_sdf: &XmlSystemDescription, node: &roxmltree::Node) -> Result<CNode, String> { | ||
| check_attributes(xml_sdf, node, &["name", "type", "size_bits"])?; |
There was a problem hiding this comment.
I'm wondering if it makes more sense to specify this as "slot count" instead of "size bits", because then the declaration wouldn't depend on e.g. how big a cnode slot is. (which tbf is consistent across all 64 bit arches but).
Unless that's how this value is already used, in which case "slot_count_bits" is more precise.
(Is it worth using bits where we can't represent illegal non-pow-2 values or should we just use pow-of-2 numbers and have to catch the non-powrr-of-2 error case. I don't know).
There was a problem hiding this comment.
For Microkit tool, bits makes the effort less to do the checks and calculation, e.g., guard size.
For users, I don't know.
There was a problem hiding this comment.
I assumed that size_bits was the slot count and not the memory size of the CNode. I agree that slot_count_bits would make this explicit.
For users I think both would be fine, but size in slots instead of bits is easier to understand. But if size in bits is used elsewhere already, then it's better to be consistent.
| return Err(value_error( | ||
| xml_sdf, | ||
| node, | ||
| format!("Either 'pd' and 'cnode_name' should be specified"), |
There was a problem hiding this comment.
"or" not and here btw
|
On 2026-07-01 13:23, Terry Bai wrote:
@terryzbai commented on this pull request.
so in this example, the user manages 0xabcdUUUU (32 bits). 0xabcd
takes 16 bits and the user top level CNode takes 8 bits, so there are
still 8 bits spare for the user to manage. Do I understand it
correctly?
Sorry, I started writing the example for 16 user managed bits, but
partially changed it to 32 buts. So the example I gave is actually for
16 bit user managed, not 32.
But yes, the spare bits would be the cspace size bits minus the cnode
size.
The guard value 0xabcd seems to be a part of user managed bits as well
No it's not, that was a mistake in my example.
Are the padded zeros also part of the user top level CNode's guard?
No, same as the guard, it's excluded.
|
|
Thinking about it, I don't think Microkit needs to support user specified guards for individual cnodes. If people want that, they can specify a bigger cspace size and use their own guard for the second level user managed cnode. Non-zero guards are mostly useful for the last cnode, they're not really needed at higher levels. Main thing you want is that you don't overflow into the adjacent cnode, e.g. in case of an off-by-one error, but that already happens automatically for higher levels and usually also for the lowest one. The other use case for guards is as some kind of type safety so that random integers have a low chance of being a valid cap. But Microkit can do that itself already, once there is a guard it doesn't really matter what value it has. This could be unconditionally or a global config option or something. |
Have you talked this over with anyone from the verification side of Microkit? Because this kills component separation. In terms of access control, there would be no reason to bother making two separate PDs if they share a CNode that both sides have write access to. In terms of rights, they'd be equivalent to one larger PD. If everyone is aware of this, and you have some plan on how that fits with your verification story, that's fine. Just pointing out that the standard high-level reasoning why things generally work and are separate will no longer apply. Unless this is just a temporary experiment, such large fundamental changes should really be described properly in RFCs with design, reasons, considered alternatives, implications etc. Putting Microkit under the foundation and then not following its procedures is not Ok. What the TSC accepted was a framework for static architectures with isolation, and this PR breaks that. |
|
You might also be interested in cap sharing PR, motivated by a user space schwduler, and remaining (device) untyped passing: #436, which we wanted to discuss further because it would be new capDL spec features. Terry was going to message you directly about that. For context, this is part of work to make x86 platforms useable with PCIe/ACPI, that is, because on x86 PCI devices move location, and so does RAM, and we need to parse ACPI for locations of various interrupts, we can't do everything in the microkit specification at build time, as this might not even work repeatedly on the same machine. Other options considered here was making this part of the capDL initialiser (seems quite difficult, as it would involve it having an AML interpreter as well), or building it into the microkit as part of the "system" (maybe worth it). The intent of this PR is to make it possible for the ACPI driver to only provide the necessary untypeds covering the PCIe memory window to the PCIe driver. The PCIe driver, would then it turn have the rights to only the IRQ parts of device driver's CNodes. These tasks are intended to be considered part of "system init", and not active after that phase. They would also be trusted. Yes, we are aware this would have affects on verification story, but from our systems discussion we don't know how we would avoid it when the hardware is itself, dynamic. Another feature is "template PDs" where we need to give some rights to a controller that can perform runtime reloading of PDs (also including access to certain CSpaces to add/remove notification rights for containers, but only from a subset of a statically provided maximum). Gernot calls this a "static architecture" but not necessarily "static system", IIRC. I'd also note that these are, strictly speaking, optional. I had been vaguely considering whether it was worth marking these features as experimental for the next release: it would be nice to be exempt from worrying about breaking changes for something we don't know if it is the best solution. But we want something. The intent is to only use this for ACPI/PCIe. As for RFCs, I was not under the impression that Microkit changes needed these officially, as we never did these with Ivan. If that is correct we have a large amount of other changes made without RFCs :) If that is not the case we should discuss further. |
Thanks for the detailed explanation. Now I understand your point. This makes sense to me if we allows the user to manage more than one level of CNodes. But before extending flexibility of the user CSpace management, I might need to understand the implications on verification story first as suggested by @lsf37 in comments. |
In ACPI/PCIe driver work, the shared CNodes are used for passing capabilities in one-way.
We can specify the permissions on CNodes if it helps on verification story. |
It's not the task's root CNode that is being shared, it's an empty sub-CNode which will be managed by a task itself. If that's not safe to do, then seL4 has a design problem.
This PR does not break that, it just makes it possible to also do (limited) dynamic things at runtime. If whatever proof framework is used to proof the user space components can't deal with this, then this can't be used for such systems. I don't see why that would be a problem. |
CNodes don't have permissions. I think the seL4 design is a bit awkward to safely share sub-cnode caps, permissions could help with that. |
Agreed -- we actually removed CNode permissions at some point (very early), and I think that was a mistake. It did speed up cap lookup, though, because the model at the time was that a read-only CNode cap would diminish the rights of the caps in the CNode. We should just have removed the latter part. I don't even think it'd be much verification work to add that part back. If you're up for writing an RFC, we could see what everyone else thinks of this. |
Depends on what you define as safe. If you share authority you are clearly not separate any more. That's just a direct consequence, not something you can design around in a cap system. With a good permissions system for this (which we might not have), you might be able to get to the stage where only one is a supervisor of the other, i.e. PD A can change caps in PD B, but not the other way around. Still kills isolation, but less so. If you just share a CNode that both can write to, then you don't even have that. But there are plenty other options. Caps can be granted via IPC. A CNode could be made and filled in one PD, and a single CNode cap could then be sent to the other PD. Maybe the grant right of the endpoint cap is then removed. I'm not saying that is the best option, but there are loads of options in the design space for this, and I'm not seeing a principled design exploration here, just a PR.
Yes, exactly. That is not the agreed scope of Microkit. Changing it does need a much wider RFC discussion than just a PR. Clearly the verification implications haven't been discussed and thought through. If we're designing something now, we should be designing it in such a way that works with the plans verification has for dynamic systems. There are multiple projects around that are trying to deal with that, and I'm not seeing any input from them here (because how would they even know this exists).
That would be perfectly fine with me, and that's one of the key things the design must make easy to determine and lock down. I think it'd work with the proposed change, but it'd probably also work with a lot of other points in the design space. To be clear, I'm not saying we shouldn't add a feature like this. I do agree that something is likely necessary, and maybe this design is the one in the end. I'm saying a pretty wide group of stakeholders should be involved in this discussion. |
|
OK, point taken. Let's at the very least delay this until after the next release. @terryzbai What might be a good idea for the moment is we'll maintain an au-ts microkit fork for a while with several features we want until we can figure out the best solution, in the interest of getting our stuff working, and we can use that in our repositories until we've settled on the way. Unless it would be OK to maintain an additional "experimental" branch in this repository? (@lsf37) |
Yes, an "experimental" branch (or multiple) is perfectly fine. There is no problem with prototyping and trying things out. For the design perspective, a lot of the arguments so far on the PR were "component X will do/provide something, which component Y needs to access". This is one consideration and of course has to work, but the arguably more important design questions are adversarial, i.e. "what do we still know and what can we still guarantee when X or Y misbehave accidentally/intentionally/etc" . The main verification side questions are "can we still verify X and Y independently from each other", "how much do we need to reason about kernel mechanisms when we verify X/Y". With the standard isolation principle, you can verify X and Y as separate programs, with some luck even pretending that they are sequential with only specific communication points, and you need to know nothing about underlying seL4 mechanisms. If you share CNodes, all of that goes away. You basically have to rebuild the seL4 integrity theorem as you go along verifying X and Y (reasoning that "yes, they do share authority, but no, they are not abusing it, the memory is still separate, and it will stay separate if I verify the rest of X and Y, and I am not making any kernel calls which change that", which means the Microkit abstraction is now broken). Before this, X and Y simply were not able to make such calls (and the seL4 integrity theory gives you that they cannot), now they can and you have to show they do not. It'd be much nicer to find a design where that is either never broken (not sure if that is possible), or only broken for very small periods of time/short regions of code, and otherwise kernel-enforcement comes back into effect. Or you may decide that none of that matters, because those kinds of components will always be fully verified anyway and you have the kernel model that carries that to automated user-level proofs. You'd still want to be sure that people don't start to use that all over the place just for convenience and by default are starting to build systems that are hard to reason about. |
This PR adds CNode tag to SDF, providing a way for users to define custom CNodes and share among PDs. One of the use cases is to pass capabilities from one PD to another, e.g., ACPI driver passes CNode containing memory window untypeds to PCIe driver so it can dynamically allocate and map memory regions for device drivers.
This PR is based on switch-to-two-level-cnodes PR as it should be getting merged soon.