feat: bbolt persistent pod device configs#115
feat: bbolt persistent pod device configs#115rbtr wants to merge 4 commits intokubernetes-sigs:mainfrom
Conversation
|
Hi @rbtr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test I wonder if it will be better organized if we define a new model for Store or Storage that implements the interface, so we can use any technology and keeps the code better organized. There is also the need to synchronize the information, during a dranet restart the state may have changed, so we need to be able to guarantee the cluster , node and db state are in sync. |
I think, besides the names, this is already set up in this direction. It's behind an interface; new buckets can be created under the same bolt and once there is a need for expanded storage interfaces it can be refactored and expanded accordingly. I can explore that refactor now if you would prefer but I think it's mostly rearranging/renaming and not a functional change here.
Added an init state synchronization, lmk what you think |
|
/assign |
gauravkghildiyal
left a comment
There was a problem hiding this comment.
Thanks for the patience @rbtr! This is looking really cool
Haven't reviewed the tests yet, but I thought I'll share this before. I was thinking about the PodConfigStore setup and instead of the current "either-or" approach where we choose between an in-memory or a Bolt implementation at startup, what if we just have one in-memory store that handles all the logic and just uses a persistence backend (Bolt being an implementation of that persistence backend interface)?
Something like: driver -> PodConfigStore -> BoltStore
There's a couple of reasons why I'm thinking of structuring things that way:
-
GetPodConfig is in the hot path of all NRI hooks. This function is a bottleneck for
RunPodSandbox,CreateContainer, andStopPodSandbox, which are currently invoked for each and every pod on the node, regardless of whether it uses DRA or not. In the currentBoltPodConfigStoreimplementation, we are forced to fetch from the DB and perform data unmarshalling for each of these operations. Having a local cache (likePodConfigStore) would allow us to serve these requests from RAM instantly. -
As things are implemented right now, we are forced to implement ephemeral state like
NRIActivityinto the entireBoltPodConfigStore. This mixing of concerns smells like it needs some better structuring. If we were to havePodConfigStoreconsume a persistent storage likeBoltStore, we could avoid implementing this in the persistence layer entirely and keep the ephemeral state strictly in the top-level store.
Also, it looks like Kubelet does something similar for DRA in claiminfo.go. Basically keeping the state in RAM and using the disk as a backup so it survives restarts.
In the proposed implementation, I imagine:
- Most of the reads in this case will be served directly from memory.
- The interface between
PodConfigStore -> BoltStorewill be quite minimal
pkg/driver/pod_device_config_bolt.go
Outdated
| if err != nil { | ||
| return err | ||
| } | ||
| data, err := json.Marshal(config) |
There was a problem hiding this comment.
I'll recommend us to have another bucket (or a key-value pair if that serves better) to group the deviceConfigs, rather than storing it directly within the podUID bucket.
So instead of pod_configs => <POD_UID> => <ListOfDevices>, we instead of pod_configs => <POD_UID> => device_configs => <ListOfDevices>
Rationale being that it's very likely we'll need a Pod level config as well which is not specific to a device. (Think of something like NRIActivity, due to which the existing PodConfigStore is also structured in equivalent manner)
The DRANET driver maintains a podConfigStore that tracks per-pod network and device configurations. This state is populated during the DRA NodePrepareResource phase and consumed by NRI hooks (RunPodSandbox, CreateContainer) to inject devices and apply network config. Because the store was purely in-memory, a daemon restart between these two phases caused the NRI hooks to silently skip device injection — pods would run without their intended network configuration. Replace the in-memory-only store with a bbolt-backed persistent store: - Extract a podConfigStorer interface from the existing PodConfigStore, with methods for SetDeviceConfig, GetDeviceConfig, GetPodConfig, DeletePod, DeleteClaim, UpdateLastNRIActivity, and GetPodNRIActivities. - Add BoltPodConfigStore, a new implementation backed by bbolt that persists DeviceConfig entries as JSON in a "pod_configs" bucket keyed by "podUID/deviceName". LastNRIActivity remains ephemeral in memory since it is only used for graceful shutdown coordination. - Add json struct tags to DeviceConfig, RDMAConfig, and LinuxDevice to enable serialization. The apis.NetworkConfig types already had them. - Wire persistence into the driver lifecycle: WithDBPath option selects the bolt store in Start(), and Stop() closes the database. - Configure the database path as /var/run/dranet/pod_configs.db and add a corresponding hostPath volume (DirectoryOrCreate) in install.yaml. - The in-memory PodConfigStore remains available and is used by default in tests and when no DB path is configured. Fixes: kubernetes-sigs#89 Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
8e31ffe to
2f93b2d
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rbtr The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Refactor the pod config persistence to use a layered architecture following the kubelet DRA checkpoint pattern (pkg/kubelet/cm/dra/state): - Define a Checkpointer interface as the minimal persistence contract (GetOrCreate, Store, DeletePod, Close). The in-memory PodConfigStore is the source of truth; the Checkpointer is a write-through backend. - Refactor BoltPodConfigStore into an unexported boltCheckpointer that implements only the Checkpointer interface. Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
2f93b2d to
92be948
Compare
|
@gauravkghildiyal thanks for this feedback. I agree with your suggestions here and have refactored the proposal to be more of a "write-through cache" pattern. It's basically modeled on that Kubelet precedent; even though bolt is probably fast enough to serve reads directly I think it's a better architecture this way. LMK what you think |
Replace the in-memory-only store with a bbolt-backed persistent store using a layered write-through cache architecture where the in-memory PodConfigStore remains the source of truth for all reads, with an optional Checkpointer interface as a write-through persistence backend. This follows the kubelet DRA checkpoint pattern (pkg/kubelet/cm/dra/state).
Fixes: #89