
Fix GPU attach/detach handling and CM response processing #44

Open
NekoHK wants to merge 14 commits into CoHDI:main from NekoHK:github-sync-ado-dev


NekoHK (Collaborator) commented Jun 15, 2026

Background

GPU attach/detach processing needed improvements in driver detection, GPU load checks, kubelet plugin restart behavior, and the handling of empty device responses.

CM response handling also needed to correctly process machine lookup results and removal responses.

Changes

  • added missing unit tests for upstream syncer and garbage collection
  • added CRO MT improvement tests
  • refactored GPU driver type detection
  • added fallback detection for the NVIDIA kernel module
  • made the GPU operator namespace configurable by environment variable
  • replaced kubelet plugin daemonset restart with pod restart
  • improved GPU load check and drain handling
  • handled GPU detach cases where no devices are found
  • updated CM machine lookup and response handling

NekoHK added 11 commits June 15, 2026 20:33
Signed-off-by: Ko Kai <ko.kai@jp.fujitsu.com>
NekoHK force-pushed the github-sync-ado-dev branch from ba02cdc to 2ec315c on June 15, 2026 11:34

mgazz (Collaborator) left a comment

Thank you for the submission. The PR looks good; I have just a couple of suggestions to improve readability.

Comment thread: internal/controller/composableresource_controller.go
Comment thread: internal/cdi/fti/cm/client.go (outdated)
Comment thread: internal/cdi/fti/cm/client.go (outdated)
Comment thread: internal/cdi/fti/cm/client.go (outdated)
NekoHK force-pushed the github-sync-ado-dev branch 2 times, most recently from e6fd6a8 to 2ec315c on June 16, 2026 07:51
NekoHK force-pushed the github-sync-ado-dev branch from 3881139 to f1ce30e on June 22, 2026 08:47


4 participants