
fix: pod deletion race condition #3

Draft
cgetzen wants to merge 30 commits into main from fix/pod-deletion-race-condition


@cgetzen (Collaborator) commented Feb 23, 2026

Summary

Breaking Changes

Testing Notes

Additional Context

vivian-hafener and others added 29 commits February 13, 2026 12:10
Changelog: added - Documented use of Kyverno policies for Slurm-bridge

With exclusive allocations, all of the node's resources are reported as
allocated to the Slurm job, so the pod emits claims for all resources.
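
The exclusive-allocation behavior described above can be sketched as follows. This is a hypothetical illustration, not slurm-bridge code: `claimQuantities` and the plain `map[string]int64` quantities are assumptions standing in for the real Kubernetes resource types and claim logic.

```go
package main

import "fmt"

// claimQuantities is a hypothetical helper (not from slurm-bridge) that
// decides which resource quantities a pod should claim once Slurm has
// allocated it a node. With an exclusive allocation, Slurm reports every
// resource on the node as belonging to the job, so the pod claims the
// node's full allocatable amount instead of only what it requested.
func claimQuantities(requested, allocatable map[string]int64, exclusive bool) map[string]int64 {
	source := requested
	if exclusive {
		source = allocatable
	}
	// Copy so callers can mutate the result without touching the inputs.
	out := make(map[string]int64, len(source))
	for name, qty := range source {
		out[name] = qty
	}
	return out
}

func main() {
	allocatable := map[string]int64{"cpu": 64, "nvidia.com/gpu": 8}
	requested := map[string]int64{"cpu": 4, "nvidia.com/gpu": 1}

	fmt.Println(claimQuantities(requested, allocatable, false))
	fmt.Println(claimQuantities(requested, allocatable, true))
}
```

Returning a copy rather than the input map keeps the sketch free of aliasing surprises if the caller later edits the claim.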

This follows Kubernetes conventions a little more closely.

High level changes:

* Move logic into patchPodExtendedResourceClaimStatus()
* Move logic into bindClaim()
* Rename createResourceClaim() to preBindExtendedResources()
* Refactor generateRequestMappings() into createRequestsAndMappings()
* Use established ResourceClaim.GenerateName schema
* Use established ContainerExtendedResourceRequest name schema

Upgrade all modules. Kubernetes v1.35.x is needed for DRA.

This module models Slurm nodes from Kubernetes nodes: the ResourceSlices
of each Kubernetes node are parsed to assemble the representative Slurm
node. As with DRA for GPUs, the Slurm allocation for CPUs is propagated
through an extended resource claim.

NOTES:

* We need to use a number of magic values that the example CPU DRA
  driver does not expose.
* There are driver-specific details to prepare the pod for CPU pinning.

Changelog: Added - Added support for DRA CPU driver `dra-driver-cpu`.
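
A minimal sketch of how a Slurm node could be assembled from a Kubernetes node's ResourceSlices. `Device`, `ResourceSlice`, `SlurmNode`, and `slurmNodeFromSlices` are simplified, hypothetical stand-ins for the `resource.k8s.io` types and the module's real parsing; treating each published device as one CPU, and passing the driver name in as a parameter, are assumptions.

```go
package main

import "fmt"

// Device and ResourceSlice are simplified, hypothetical stand-ins for the
// Kubernetes resource.k8s.io ResourceSlice API; only the fields this sketch
// needs are modeled.
type Device struct {
	Name string
}

type ResourceSlice struct {
	NodeName string
	Driver   string
	Devices  []Device
}

// SlurmNode is a hypothetical summary of the Slurm node derived from a
// Kubernetes node.
type SlurmNode struct {
	Name string
	CPUs int
}

// slurmNodeFromSlices sums the devices published by the CPU driver for one
// node, assuming each device represents one schedulable CPU. Slices for
// other nodes or other drivers (for example a GPU driver) are ignored.
func slurmNodeFromSlices(node, cpuDriver string, slices []ResourceSlice) SlurmNode {
	n := SlurmNode{Name: node}
	for _, s := range slices {
		if s.NodeName != node || s.Driver != cpuDriver {
			continue
		}
		n.CPUs += len(s.Devices)
	}
	return n
}

func main() {
	slices := []ResourceSlice{
		{NodeName: "kind-worker", Driver: "cpu.example.com", Devices: []Device{{"cpu0"}, {"cpu1"}}},
		{NodeName: "kind-worker", Driver: "cpu.example.com", Devices: []Device{{"cpu2"}, {"cpu3"}}},
		{NodeName: "kind-worker", Driver: "gpu.example.com", Devices: []Device{{"gpu0"}}},
		{NodeName: "kind-worker2", Driver: "cpu.example.com", Devices: []Device{{"cpu0"}}},
	}
	fmt.Printf("%+v\n", slurmNodeFromSlices("kind-worker", "cpu.example.com", slices))
}
```

A node's devices may be split across several slices, which is why the sketch accumulates over all matching slices instead of taking the first one.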

Set controller.extraConfMap.ReconfigFlags to KeepPartInfo during the
deployment of the Slurm Helm Chart in order to keep the Slurm-bridge
partition throughout reconfigures.
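
Assuming the chart renders each `controller.extraConfMap` entry as a `Key=Value` line in `slurm.conf` (an assumption, not confirmed by this PR), the setting above would end up as `ReconfigFlags=KeepPartInfo`. The hypothetical `renderExtraConf` helper below only illustrates that mapping; it is not the chart's template logic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderExtraConf is a hypothetical illustration of how a map such as
// controller.extraConfMap could be flattened into slurm.conf "Key=Value"
// lines. Keys are sorted so the output is deterministic.
func renderExtraConf(extra map[string]string) string {
	keys := make([]string, 0, len(extra))
	for k := range extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, extra[k])
	}
	return b.String()
}

func main() {
	// KeepPartInfo makes slurmctld keep partition information changed at
	// runtime (here, the Slurm-bridge partition) across reconfigures.
	fmt.Print(renderExtraConf(map[string]string{"ReconfigFlags": "KeepPartInfo"}))
}
```

Sorting the keys matters because Go map iteration order is randomized; without it the rendered config would differ between runs.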

Add DRA CPU Support

See merge request SchedMD/slinky-dev/slurm-bridge!110

feat: use KeepPartInfo in hack/kind.sh to prevent partition deletion

See merge request SchedMD/slinky-dev/slurm-bridge!118

- Run the examples only with a demo script
- Rename Makefile target "demo-dra" to "install-dra"
- Also remove lws from the examples that run, so the demo does not hang while they spin
- Update the name of the Makefile target for keys
- Update lws to 0.8.x
- Do not deploy DRA items as part of demo-examples; that is a separate target
- Add demo-dra target that installs just the DRA demo items
- Add a special demo script that is better suited to a demo screen
- Do not run watch script if on Mac
- Remove unused code in Makefile
- Fix memory block in kind script
- Add demo targets for YAMLs only

Demo config

See merge request SchedMD/slinky-dev/slurm-bridge!119

docs: document use of Kyverno policies with Slurm-bridge

See merge request SchedMD/slinky-dev/slurm-bridge!115

Changelog: Fixed - Mutating webhook unsets nodeName, if pre-defined, to ensure that the pod still goes through scheduling.
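
Pods that already have `spec.nodeName` set bypass the Kubernetes scheduler, which is why the webhook clears it. A mutating admission webhook commonly expresses such a change as an RFC 6902 JSON Patch; the sketch below builds that patch using only the standard library. `unsetNodeNamePatch` and the trimmed `pod` type are hypothetical and are not the webhook's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation, the patch format that
// Kubernetes mutating admission webhooks return.
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// podSpec and pod model only the field this sketch inspects.
type podSpec struct {
	NodeName string `json:"nodeName,omitempty"`
}

type pod struct {
	Spec podSpec `json:"spec"`
}

// unsetNodeNamePatch returns a JSON Patch that removes spec.nodeName when
// it is pre-set, and an empty patch otherwise. A "remove" op on a missing
// path would fail, so the field is only removed if present.
func unsetNodeNamePatch(raw []byte) ([]byte, error) {
	var p pod
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	ops := []patchOp{}
	if p.Spec.NodeName != "" {
		ops = append(ops, patchOp{Op: "remove", Path: "/spec/nodeName"})
	}
	return json.Marshal(ops)
}

func main() {
	patch, err := unsetNodeNamePatch([]byte(`{"spec":{"nodeName":"node-1"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Guarding the "remove" op matters: RFC 6902 requires the target location to exist, so an unconditional remove would make the patch fail for pods without a pre-set node.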

fix: Mutating webhook should unset spec.nodeName

See merge request SchedMD/slinky-dev/slurm-bridge!120

@cgetzen force-pushed the fix/pod-deletion-race-condition branch from 66ec7fc to 11ec360 on March 9, 2026 01:46
@cgetzen force-pushed the fix/pod-deletion-race-condition branch from 11ec360 to 47b0f88 on March 9, 2026 02:10

Labels

None yet

Projects

None yet

4 participants