BMC static IP address support #757

Open
mrgalaxy-source wants to merge 1 commit into NVIDIA:main from mrgalaxy-source:mrgalaxy/static-ip

@mrgalaxy-source mrgalaxy-source commented Mar 31, 2026

This PR makes a number of changes that allow a developer to point BMM at an "arbitrary" BMC (not the tenant, just the BMC right now) of a host or compute node in isolation that has a static IP address or has already been booted with a DHCP address not controlled by BMM. This allows us to test the BMM discovery and ingestion process for BMCs without needing to have a working DHCP Relay or PXE-configuration.

We can then look at all the discovered inventory information of the BMC and do things like power control while exploring the forge command line and API to learn about how things work.

This also requires the following settings to work:

allow_zero_dpu_hosts = true

Also, comment out the machine-a-tron BMC proxy setting in the configmap:
bmc_proxy = "https://machine-a-tron-bmc-mock.forge-system.svc.cluster.local:1266"

Once that's done, you can deploy a static-ip-managed endpoint like this:

forge-admin-cli expected-machine add \
    --bmc-mac-address 4C:BB:47:25:BA:C7 \
    --chassis-serial-number 1234567890 \
    --bmc-username <your-username> \
    --bmc-password <your-password> \
    --ip-address x.x.x.x

Core Feature Implementation:
• ✅ Added ip_address field to ExpectedMachine and ExpectedPowerShelf
• ✅ Created database migrations for both expected_machines.ip_address and machine_interfaces.is_static_ip
• ✅ Added IpTypeStaticBmcIp enum variant to classify static BMC IPs
• ✅ Updated IP finder logic to check both BmcIp and MachineAddresses finders
• ✅ Web UI displays "Static BMC IP" label
• ✅ CLI supports --ip-address flag for add/patch operations
• ✅ Proper IP validation rejects malformed addresses
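
As a rough illustration of the IP validation bullet above, here is a minimal sketch that leans on the standard library's `IpAddr` parser. The helper name `parse_bmc_ip` and the error text are assumptions made for this example; the PR's actual validation code is not shown in this conversation.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not taken from the PR): parse a user-supplied BMC IP
/// string and reject anything that is not a well-formed IPv4 or IPv6 address.
fn parse_bmc_ip(input: &str) -> Result<IpAddr, String> {
    input
        .trim()
        .parse::<IpAddr>()
        .map_err(|e| format!("invalid BMC IP address {input:?}: {e}"))
}
```

A placeholder such as `x.x.x.x` or an out-of-range octet such as `256.1.1.1` comes back as `Err`, while `192.0.2.10` and `2001:db8::1` both parse.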

Test Infrastructure:
• ✅ Created NodePort service for stable postgres access
• ✅ Implemented WITH (FORCE) for database cleanup
• ✅ Fixed connection pooling and template database isolation

Test Coverage (6 comprehensive tests):
1. Add ExpectedMachine with static IP
2. Update to add static IP to existing machine
3. Update to change static IP
4. Reject invalid IP addresses
5. IP finder correctly classifies static BMC IPs
6. Site Explorer handles power shelf static IPs

@mrgalaxy-source mrgalaxy-source requested a review from a team as a code owner March 31, 2026 15:22

copy-pr-bot bot commented Mar 31, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Matthias247
Contributor

some high level thoughts from me:

  1. the extra functionality should just add the extra machine interfaces. Everything else (e.g. machine creation) can go the current path, after those interfaces are "explored"
  2. there are probably various ways to specify these extra interfaces. Either the way it's done in this PR, or in other ways (new API, new config file entries, etc). I don't have a strong opinion
  3. might need some extra handling for the command which deletes machine-interfaces. Or maybe not, because they would just be recreated in the next site-explorer iteration (or whatever tool manages the creation)
  4. we should have unit-tests for adding these interfaces

@mrgalaxy-source mrgalaxy-source force-pushed the mrgalaxy/static-ip branch 2 times, most recently from 7ff94c5 to ca75e9f Compare April 2, 2026 20:01
@mrgalaxy-source mrgalaxy-source changed the title from "Static IP address support for development purposes" to "BMC static IP address support" Apr 2, 2026
@spydaNVIDIA
Contributor

How do we handle a situation where an IP is specified in the expected machines table but the BMC actually DHCPs to NICo (and we assign it some other IP)?

Contributor

@chet chet left a comment

Hey @mrgalaxy-source! Let's hold off on this for a sec. Between the external BMC IP support and another feature request that came in for doing static DHCP reservations for managed networks (both of which apparently are high priority), I spent the day putting together the underlying plumbing for both.

...and I think the plumbing is going to change the approach of this a bit -- it should be a lot more straightforward in a sense -- at least that's the goal of the plumbing.

I definitely do apologize for the overlap, because I know you put work into it, and I love seeing stuff like this coming in.

I'll get back to you shortly. I think you're tied up in another project anyway, so I don't think you're blocked on me.

Have a good weekend. 🙏

@chet
Contributor

chet commented Apr 6, 2026

Hey @mrgalaxy-source! Check out #817 -- this should be effectively all you need to pattern match for expected machine support, plus support for update --bmc-ip-address. If you look at the latest HEAD, you'll also see power shelf and nvswitch support, e.g.:

    if let Some(bmc_ip) = power_shelf.bmc_ip_address {
        update_preallocated_machine_interface(&mut txn, power_shelf.bmc_mac_address, bmc_ip)
            .await?;
    }
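
To make the shape of that snippet concrete, here is a minimal, synchronous sketch of the same "only touch the interface when a BMC IP was supplied" pattern. Everything in it is an assumption for illustration: `InterfaceTable`, `ExpectedDevice`, and `apply_static_bmc_ip` are hypothetical stand-ins, and the in-memory map replaces the real database transaction and async call.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical in-memory stand-in for the machine-interface table,
/// keyed by BMC MAC address. The real code works against a DB transaction.
#[derive(Default)]
struct InterfaceTable {
    preallocated: HashMap<String, IpAddr>,
}

impl InterfaceTable {
    /// Record (or overwrite) the static IP reserved for a BMC MAC address.
    /// MACs are upper-cased so lookups are case-insensitive.
    fn update_preallocated(&mut self, bmc_mac: &str, bmc_ip: IpAddr) {
        self.preallocated.insert(bmc_mac.to_ascii_uppercase(), bmc_ip);
    }
}

/// Hypothetical expected-device record with only the two fields used here.
struct ExpectedDevice {
    bmc_mac_address: String,
    bmc_ip_address: Option<IpAddr>,
}

/// Write an interface entry only when a BMC IP was supplied.
/// Returns true if an entry was written, false if the device had no IP.
fn apply_static_bmc_ip(table: &mut InterfaceTable, device: &ExpectedDevice) -> bool {
    if let Some(bmc_ip) = device.bmc_ip_address {
        table.update_preallocated(&device.bmc_mac_address, bmc_ip);
        true
    } else {
        false
    }
}
```

Devices without a `bmc_ip_address` leave the table untouched, which in this sketch models leaving them on the existing DHCP-driven path.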

And again, sorry for squashing over this, but we had both #644 and #790 pop up, and it reached a point where this was going to be a deeper change to more generically support a few different use cases.

This PR makes a number of changes that allow a developer to point BMM at an "arbitrary" BMC (not the tenant, just the BMC right now) of a host or compute node in isolation that has a static IP address or has already been booted with a DHCP address not controlled by BMM. This allows us to test the BMM discovery and ingestion process without needing to have a working DHCP Relay or PXE-configuration.

We can then look at all the discovered inventory information of the BMC and do things like power control while exploring the forge command line and API to learn about how things work.

This also requires the following settings to work:

allow_zero_dpu_hosts = true

Also, comment out the machine-a-tron BMC proxy setting in the configmap:
bmc_proxy = "https://machine-a-tron-bmc-mock.forge-system.svc.cluster.local:1266"

Once that's done, you can deploy a static-ip-managed endpoint like this:

forge-admin-cli expected-machine add \
    --bmc-mac-address 4C:BB:47:25:BA:C7 \
    --chassis-serial-number 1234567890 \
    --bmc-username <your-username> \
    --bmc-password <your-password> \
    --bmc-ip-address x.x.x.x

Core Feature Implementation:
  • ✅ Added ip_address field to ExpectedMachine and ExpectedPowerShelf
  • ✅ Created database migrations for both expected_machines.ip_address and machine_interfaces.is_static_ip
  • ✅ Added IpTypeStaticBmcIp enum variant to classify static BMC IPs
  • ✅ Updated IP finder logic to check both BmcIp and MachineAddresses finders
  • ✅ Web UI displays "Static BMC IP" label
  • ✅ CLI supports --ip-address flag for add/patch operations
  • ✅ Proper IP validation rejects malformed addresses

  Test Infrastructure:
  • ✅ Created NodePort service for stable postgres access
  • ✅ Implemented WITH (FORCE) for database cleanup
  • ✅ Fixed connection pooling and template database isolation
  • ✅ Created self-contained run-tests-k8s.sh script

  Test Coverage (6 comprehensive tests):
  1. Add ExpectedMachine with static IP
  2. Update to add static IP to existing machine
  3. Update to change static IP
  4. Reject invalid IP addresses
  5. IP finder correctly classifies static BMC IPs
  6. Site Explorer handles power shelf static IPs

Signed-off-by: Michael Galaxy <mrgalaxy@nvidia.com>

fix: use unique migration version for machine_health_history_rename
refactor: align expected machine static BMC with main preallocate flow
- Wire add/update/batch handlers to preallocate_machine_interface /
  update_preallocated_machine_interface like switches/power shelves
- Classify IpTypeStaticBmcIp via AllocationType::Static or static-assignments segment
- Extend machine_interface_address find_by_address with allocation_type
- Remove explore_machines_from_static_ip config (interfaces come from handlers)
- JSON import accepts legacy ip_address key via serde alias
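
The classification rule in the refactor notes above could look roughly like the sketch below. The enum layouts, the `Dhcp` variant, the `classify_bmc_ip` function, and the exact segment-name comparison are assumptions for illustration; only `AllocationType::Static`, the "static-assignments" segment, and the static-BMC-IP classification itself are named in the commit message.

```rust
/// Assumed shape of the allocation type; only `Static` is named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AllocationType {
    Dhcp,
    Static,
}

/// Assumed, reduced IP classification with just the two BMC cases.
#[derive(Debug, PartialEq, Eq)]
enum IpType {
    BmcIp,
    StaticBmcIp,
}

/// Hypothetical name of the segment that holds statically assigned addresses.
const STATIC_ASSIGNMENTS_SEGMENT: &str = "static-assignments";

/// Classify a BMC address as static if it was allocated statically
/// or if it lives in the static-assignments segment.
fn classify_bmc_ip(allocation: AllocationType, segment_name: &str) -> IpType {
    if allocation == AllocationType::Static || segment_name == STATIC_ASSIGNMENTS_SEGMENT {
        IpType::StaticBmcIp
    } else {
        IpType::BmcIp
    }
}
```

Either condition alone is enough to yield `StaticBmcIp`; a DHCP allocation in any other segment is classified as a plain `BmcIp`.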