
feat: bbolt persistent pod device configs#115

Open
rbtr wants to merge 4 commits into kubernetes-sigs:main from rbtr:feat/bbolt-persistence

Conversation

@rbtr
Contributor

@rbtr rbtr commented Mar 27, 2026

Replace the in-memory-only store with a bbolt-backed persistent store, using a layered write-through cache architecture: the in-memory PodConfigStore remains the source of truth for all reads, and an optional Checkpointer interface serves as a write-through persistence backend. This follows the kubelet DRA checkpoint pattern (pkg/kubelet/cm/dra/state).

  • Define a Checkpointer interface (GetOrCreate, Store, DeletePod, Close) as the persistence contract. Add a boltCheckpointer implementation.
  • Modify PodConfigStore to accept an optional Checkpointer. Writes persist to the checkpointer before updating memory; if persistence fails, memory is unchanged. Deletes proceed with memory removal regardless — stale checkpoint entries are harmless and cleaned up by Synchronize on the next startup.
  • Reads (GetPodConfig, GetDeviceConfig) are served from memory in the NRI hot path.
  • Ephemeral state (LastNRIActivity) stays entirely in-memory in the PodConfigStore and never reaches the persistence layer.
  • Wire persistence into the driver lifecycle: WithDBPath option configures the bolt store in Start(), and Close() shuts down the database.
  • Configure the database path as /var/run/dranet/dranet.db and add a corresponding hostPath volume (DirectoryOrCreate) in install.yaml. Set --db-path to empty to disable persistence.

Fixes: #89

@k8s-ci-robot k8s-ci-robot requested a review from aojea March 27, 2026 22:22
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 27, 2026
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 27, 2026
@k8s-ci-robot
Contributor

Hi @rbtr. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 27, 2026
@aojea
Contributor

aojea commented Mar 28, 2026

/ok-to-test

I wonder if it would be better organized if we define a new Store or Storage model that implements the interface, so we can use any technology and keep the code better organized.

There is also the need to synchronize the information: during a dranet restart the state may have changed, so we need to be able to guarantee that the cluster, node, and db state are in sync.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 28, 2026
@rbtr
Contributor Author

rbtr commented Mar 30, 2026

I wonder if it would be better organized if we define a new Store or Storage model that implements the interface, so we can use any technology and keep the code better organized.

I think, besides the names, this is already set up in this direction. It's behind an interface; new buckets can be created under the same bolt and once there is a need for expanded storage interfaces it can be refactored and expanded accordingly. I can explore that refactor now if you would prefer but I think it's mostly rearranging/renaming and not a functional change here.

There is also the need to synchronize the information: during a dranet restart the state may have changed, so we need to be able to guarantee that the cluster, node, and db state are in sync.

Added an init state synchronization; lmk what you think.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2026
@gauravkghildiyal
Member

/assign

@gauravkghildiyal gauravkghildiyal left a comment


Thanks for the patience @rbtr! This is looking really cool

Haven't reviewed the tests yet, but I thought I'd share this first. I was thinking about the PodConfigStore setup: instead of the current "either-or" approach, where we choose between an in-memory or a Bolt implementation at startup, what if we have one in-memory store that handles all the logic and just uses a persistence backend (Bolt being an implementation of that persistence-backend interface)?

Something like: driver -> PodConfigStore -> BoltStore

There are a couple of reasons why I'm thinking of structuring things that way:

  1. GetPodConfig is in the hot path of all NRI hooks. This function is a bottleneck for RunPodSandbox, CreateContainer, and StopPodSandbox, which are currently invoked for each and every pod on the node, regardless of whether it uses DRA or not. In the current BoltPodConfigStore implementation, we are forced to fetch from the DB and perform data unmarshalling for each of these operations. Having a local cache (like PodConfigStore) would allow us to serve these requests from RAM instantly.

  2. As things are implemented right now, we are forced to implement ephemeral state like NRIActivity into the entire BoltPodConfigStore. This mixing of concerns smells like it needs some better structuring. If we were to have PodConfigStore consume a persistent storage like BoltStore, we could avoid implementing this in the persistence layer entirely and keep the ephemeral state strictly in the top-level store.

Also, it looks like Kubelet does something similar for DRA in claiminfo.go. Basically keeping the state in RAM and using the disk as a backup so it survives restarts.

In the proposed implementation, I imagine:

  • Most of the reads in this case will be served directly from memory.
  • The interface between PodConfigStore -> BoltStore will be quite minimal

if err != nil {
	return err
}
data, err := json.Marshal(config)

I'll recommend us to have another bucket (or a key-value pair if that serves better) to group the deviceConfigs, rather than storing it directly within the podUID bucket.

So instead of pod_configs => <POD_UID> => <ListOfDevices>, we'd have pod_configs => <POD_UID> => device_configs => <ListOfDevices>.

Rationale being that it's very likely we'll need a pod-level config as well, which is not specific to a device. (Think of something like NRIActivity, which is why the existing PodConfigStore is also structured in an equivalent manner.)
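Under that suggestion, the on-disk bucket layout might look like the following sketch (the pod-level key is hypothetical, standing in for whatever non-device config is added later):

```
pod_configs                          (top-level bucket)
└── <POD_UID>                        (per-pod bucket)
    ├── device_configs               (nested bucket)
    │   ├── <deviceName> → JSON(DeviceConfig)
    │   └── ...
    └── <hypothetical pod-level key> → JSON(pod-level config)
```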

rbtr added 3 commits April 7, 2026 14:26
The DRANET driver maintains a podConfigStore that tracks per-pod network
and device configurations. This state is populated during the DRA
NodePrepareResource phase and consumed by NRI hooks (RunPodSandbox,
CreateContainer) to inject devices and apply network config. Because the
store was purely in-memory, a daemon restart between these two phases
caused the NRI hooks to silently skip device injection — pods would run
without their intended network configuration.

Replace the in-memory-only store with a bbolt-backed persistent store:

- Extract a podConfigStorer interface from the existing PodConfigStore,
  with methods for SetDeviceConfig, GetDeviceConfig, GetPodConfig,
  DeletePod, DeleteClaim, UpdateLastNRIActivity, and
  GetPodNRIActivities.

- Add BoltPodConfigStore, a new implementation backed by bbolt that
  persists DeviceConfig entries as JSON in a "pod_configs" bucket keyed
  by "podUID/deviceName". LastNRIActivity remains ephemeral in memory
  since it is only used for graceful shutdown coordination.

- Add json struct tags to DeviceConfig, RDMAConfig, and LinuxDevice to
  enable serialization. The apis.NetworkConfig types already had them.

- Wire persistence into the driver lifecycle: WithDBPath option selects
  the bolt store in Start(), and Stop() closes the database.

- Configure the database path as /var/run/dranet/pod_configs.db and add
  a corresponding hostPath volume (DirectoryOrCreate) in install.yaml.

- The in-memory PodConfigStore remains available and is used by default
  in tests and when no DB path is configured.

Fixes: kubernetes-sigs#89

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
@rbtr rbtr force-pushed the feat/bbolt-persistence branch from 8e31ffe to 2f93b2d Compare April 7, 2026 16:00
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 7, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rbtr
Once this PR has been reviewed and has the lgtm label, please ask for approval from gauravkghildiyal. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Refactor the pod config persistence to use a layered architecture
following the kubelet DRA checkpoint pattern (pkg/kubelet/cm/dra/state):

- Define a Checkpointer interface as the minimal persistence contract
  (GetOrCreate, Store, DeletePod, Close). The in-memory PodConfigStore
  is the source of truth; the Checkpointer is a write-through backend.

- Refactor BoltPodConfigStore into an unexported boltCheckpointer that
  implements only the Checkpointer interface.

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
@rbtr rbtr force-pushed the feat/bbolt-persistence branch from 2f93b2d to 92be948 Compare April 7, 2026 16:34
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 7, 2026
@rbtr
Contributor Author

rbtr commented Apr 7, 2026

@gauravkghildiyal thanks for this feedback. I agree with your suggestions here and have refactored the proposal into more of a "write-through cache" pattern. It's basically modeled on that kubelet precedent; even though bolt is probably fast enough to serve reads directly, I think it's a better architecture this way. LMK what you think.



Development

Successfully merging this pull request may close these issues.

[Reliability] In-memory state loss during DRANET restarts leads to pod configuration bypass
