Add architecture-aware NIM bootstrap by nil16 · Pull Request #2 · openhackathons-org/AI-Powered-Drug-Discovery-Bootcamp

nil16 · 2026-05-18T06:59:06Z

Summary

- adds a turnkey bootstrap flow for architecture-aware NIM startup
- uses local MolMIM on amd64 with hosted MolMIM fallback, and hosted MolMIM on ARM
- updates challenge notebooks and docs for the local Boltz2 plus hosted MolMIM path

Validation

- bash -n scripts/openhackathon_services.sh scripts/bootstrap_bootcamp.sh scripts/check_nim_health.sh
- python3 -m py_compile cdk_oracle/config.py cdk_oracle/nim_client.py cdk_oracle/pipeline.py
- python3 -m json.tool challenge/03_Hands-On_CDK_Inhibitor_Design.ipynb
- python3 -m json.tool mini-hands-on/03_Hands-On_CDK_Inhibitor_Design.ipynb
- GB200 remote smoke test with hosted MolMIM and local Boltz2 health checks
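Two of the static checks above can be bundled into one reusable step: byte-compiling the Python modules (as `python3 -m py_compile` does) and parsing the notebooks as JSON (as `python3 -m json.tool` does). The sketch below is a minimal illustration that uses only the standard library. The function name `validate_sources` is hypothetical and does not come from the PR. The sketch does not cover the `bash -n` check or the GB200 smoke test.

```python
import json
import py_compile
from pathlib import Path


def validate_sources(py_files, notebook_files):
    """Return a list of (path, error) pairs; an empty list means every file passed.

    Hypothetical helper: byte-compiles each Python file (like `python3 -m py_compile`)
    and parses each notebook as JSON (like `python3 -m json.tool`).
    """
    failures = []
    for path in py_files:
        try:
            # doraise=True raises PyCompileError on syntax errors instead of printing them.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((str(path), exc.msg))
        except OSError as exc:  # e.g. the file does not exist
            failures.append((str(path), str(exc)))
    for path in notebook_files:
        try:
            with open(Path(path), encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, json.JSONDecodeError) as exc:
            failures.append((str(path), str(exc)))
    return failures
```

Call it with the module and notebook paths listed above. A non-empty return value means at least one file would also fail the corresponding CLI check.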

MolMIM has no aarch64 NIM image, and BioNeMo Framework v1 supports only amd64 and predates Blackwell, so GB200/GB300 systems could not run MolMIM locally. Without the local /hidden and /decode endpoints, CMA-ES-guided optimization was blocked. Add molmim_arm/: a pure-PyTorch MolMIM (it loads the official molmim_70m_24_3 checkpoint with no NeMo/Megatron/apex/TE dependency) wrapped in a FastAPI service that mirrors the MolMIM NIM REST surface, plus run_molmim_arm.sh to build and run it. Wire --molmim local-arm through the service scripts, and fix a CUDA_VISIBLE_DEVICES remapping bug in run_nim_docker.sh when launching with --gpus device=N (N>0).
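The PR says little about the remapping bug beyond the sentence above. This sketch assumes one common form of it. With the NVIDIA container runtime, `docker run --gpus device=N` exposes only host GPU N, and CUDA inside the container numbers the exposed GPUs from 0. Passing `CUDA_VISIBLE_DEVICES=N` into the container therefore hides the only GPU whenever N > 0. The helper name `in_container_cuda_visible_devices` is hypothetical and is not taken from run_nim_docker.sh.

```python
from typing import Optional


def in_container_cuda_visible_devices(gpus_spec: str) -> Optional[str]:
    """Map a Docker --gpus spec to the CUDA_VISIBLE_DEVICES value to use inside the container.

    Hypothetical helper. Assumes the NVIDIA container runtime, which exposes only
    the selected host GPUs and renumbers them 0..k-1 inside the container.
    """
    # Docker accepts quoted lists such as '"device=1,2"'; drop the extra quotes.
    spec = gpus_spec.strip().strip('"').strip("'")
    if spec == "all":
        return None  # every GPU is visible; leave CUDA_VISIBLE_DEVICES unset
    if not spec.startswith("device="):
        raise ValueError(f"unsupported --gpus spec: {gpus_spec!r}")
    host_ids = [d for d in spec[len("device="):].split(",") if d]
    if not host_ids:
        raise ValueError(f"no devices listed in {gpus_spec!r}")
    # Host GPU ids are irrelevant inside the container: use ordinals 0..k-1.
    return ",".join(str(i) for i in range(len(host_ids)))
```

For example, `device=3` maps to `"0"` and `device=1,2` maps to `"0,1"`. The host ID belongs only in the `--gpus` flag, never in the in-container environment.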

Boltz-2 now requires sampling_steps >= 10 ("Sampling steps must be at least 10, got 5"); bump the readiness smoke test from 5 to 10. Add nest_asyncio, which the notebooks need to run the async Boltz-2 affinity client inside a Jupyter kernel (otherwise the affinity cells raise "This event loop is already running").
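Why the notebooks need help: a Jupyter kernel already has an event loop running. Synchronous wrappers around an async client therefore fail there. `asyncio.run()` refuses to start, and `loop.run_until_complete()` raises the "This event loop is already running" error quoted above. The PR's route is to add nest_asyncio and call `nest_asyncio.apply()`. The sketch below is a different, stdlib-only technique shown for comparison, not what the PR ships. It runs the coroutine on a fresh loop in a worker thread. The helper name `run_sync` is hypothetical.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run `coro` to completion from synchronous code, even inside a Jupyter kernel.

    Hypothetical stdlib-only alternative to nest_asyncio (not the PR's approach).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread (plain script): asyncio.run() works.
        return asyncio.run(coro)
    # A loop is already running (e.g. the kernel's), so asyncio.run() would raise.
    # Run the coroutine on a new loop in a worker thread and block until it finishes.
    # Note: this blocks the caller's loop while waiting.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

One caveat favors nest_asyncio in notebooks. Objects bound to the kernel's loop, such as an HTTP session created in an earlier cell, cannot be used from the worker thread's loop.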

Fix two pre-existing bugs in the CMA-ES tutorials (tutorials/04 and mini-hands-on/04): an unterminated f-string from a split print(), and a fragile optimization loop that dropped invalid/duplicate decodes and could pass fewer than mu solutions to optimizer.tell() (stochastic crash). The loop now asks the full population, penalizes invalid/duplicate molecules, and tells the actual asked solutions. Update notebook text across the setup and CDK notebooks so ARM (GB200/GB300) users know to run local MolMIM via --molmim local-arm for /hidden, /decode, and CMA-ES.
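The fixed loop keeps the population intact. Every solution returned by `ask()` gets exactly one cost, and the same list of solutions goes back to `tell()`. The sketch below illustrates that pattern under several assumptions: an ask/tell optimizer that minimizes (as pycma's `cma.CMAEvolutionStrategy` does), a `decode` function that returns a SMILES string or `None` for an invalid decode, and a higher-is-better `score`. The names `evaluate_generation`, `optimize`, and `INVALID_PENALTY` are hypothetical, and here "duplicate" means repeated within one generation. The tutorials' actual code may differ.

```python
from typing import Callable, List, Optional, Sequence

INVALID_PENALTY = 1e6  # assumed: large cost, since the optimizer minimizes


def evaluate_generation(
    solutions: Sequence,
    decode: Callable[[object], Optional[str]],
    score: Callable[[str], float],
    penalty: float = INVALID_PENALTY,
) -> List[float]:
    """Return one cost per asked solution, in the same order.

    Invalid decodes and within-generation duplicates get `penalty` instead of
    being dropped, so len(costs) == len(solutions).
    """
    seen = set()
    costs = []
    for x in solutions:
        smiles = decode(x)
        if smiles is None or smiles in seen:
            costs.append(penalty)
            continue
        seen.add(smiles)
        costs.append(-score(smiles))  # negate: higher score is better, optimizer minimizes
    return costs


def optimize(es, decode, score, generations: int) -> None:
    for _ in range(generations):
        solutions = es.ask()  # always the full population
        costs = evaluate_generation(solutions, decode, score)
        es.tell(solutions, costs)  # tell exactly the solutions that were asked
```

Penalizing instead of filtering means `tell()` never sees fewer than mu solutions, no matter how many decodes are invalid.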

Update README, HOW_TO_GET_STARTED, deployment, singularity, and the mini and cdk_oracle READMEs to describe the pure-PyTorch ARM MolMIM NIM as the way to get /hidden, /decode, and CMA-ES on GB200/GB300 (in addition to hosted MolMIM for generation only). Note the nest_asyncio dependency and refresh stale "ARM must use hosted MolMIM" guidance.

ARM-native MolMIM NIM (local-arm) + Boltz-2/notebook fixes + docs

The HPC path is Apptainer/Singularity (Boltz-2 already runs there), but the ARM MolMIM local-arm service was Docker-only, so it failed on Apptainer-only clusters. Add molmim_arm/molmim_arm.def and an Apptainer build/run path in run_molmim_arm.sh (apptainer build from the PyTorch base + apptainer run --nv with weights mounted at /models), selected via OPENHACKATHON_CONTAINER_RUNTIME. Cap mksquashfs processors (--mksquashfs-args) to avoid segfaults on many-core ARM (Grace). Make local-arm the auto-mode default on ARM whenever a container runtime (Docker or Apptainer/Singularity) is available, with hosted fallback.
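The auto-mode rule above can be written as a small decision function. This Python version is only an illustration; the real logic lives in the bash service scripts. It assumes several things the PR does not spell out: the accepted values of OPENHACKATHON_CONTAINER_RUNTIME (`docker`, `apptainer`, `singularity`), the probe order when nothing is forced, and the mode names `local` and `hosted` (only `local-arm` appears in the PR). The function names are hypothetical. The sketch covers only the initial choice. The hosted fallback after a failed local start is not modeled.

```python
import os
import platform
import shutil
from typing import Callable, Mapping, Optional

# Assumed runtime names, probed in this order when nothing is forced.
KNOWN_RUNTIMES = ("docker", "apptainer", "singularity")


def detect_runtime(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the container runtime to use, or None if none is usable."""
    forced = env.get("OPENHACKATHON_CONTAINER_RUNTIME", "").strip().lower()
    if forced:
        if forced not in KNOWN_RUNTIMES:
            raise ValueError(f"unknown container runtime: {forced!r}")
        # Respect an explicit choice, but only if its binary is on PATH.
        return forced if which(forced) else None
    for name in KNOWN_RUNTIMES:
        if which(name):
            return name
    return None


def auto_molmim_mode(machine: str, runtime: Optional[str]) -> str:
    """Pick the MolMIM mode for auto mode; 'hosted' is the fallback."""
    if runtime is None:
        return "hosted"  # no way to run a local container
    arch = machine.lower()
    if arch in ("aarch64", "arm64"):
        return "local-arm"  # pure-PyTorch MolMIM service (GB200/GB300)
    if arch in ("x86_64", "amd64"):
        return "local"  # assumed name for the official amd64 MolMIM NIM
    return "hosted"


if __name__ == "__main__":
    print(auto_molmim_mode(platform.machine(), detect_runtime()))
```

Both `env` and `which` are injectable, so the rule can be checked without Docker or Apptainer installed.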

Update README, HOW_TO_GET_STARTED, deployment, singularity, and mini README to state that local-arm builds/runs via Docker or Apptainer/Singularity and is the auto-mode default on ARM (hosted MolMIM is the fallback), replacing the earlier "Apptainer-only ARM falls back to hosted" guidance.

…ainer ARM MolMIM NIM on Apptainer/Singularity + default to local-arm on ARM

nil16 added 19 commits May 13, 2026 22:58

Modernize BioNeMo bootcamp deployment

d94fb52

Simplify Apptainer service launch

7eb649d

Ignore generated service files

f30185d

Harden Apptainer NIM launcher

0b3d928

Handle Apptainer port and GPU isolation limits

793c9bc

Set GPU visibility per NIM service launch

a2f5f56

Preserve endpoints for running NIM services

cd82928

Use generated NIM endpoint environment

26d632b

Document validated Apptainer workflow

fdfac9f

Support updated Boltz2 pocket constraints

e1bc506

Add challenge notebook demo mode

ed16e2e

Refresh notebooks for local NIM workflow

05a296a

Add DLI mini hands-on track

e29f372

Normalize bootcamp branding and notices

3d50e04

Polish notebook content and licensing

160cc66

Add getting started guide

c95e2f7

Add Colossus SSH bootstrap playbook

9a2cd93

Support base64 key in Colossus bootstrap playbook

a5c75c6

Add architecture-aware NIM bootstrap

d223584

nil16 changed the title from "[codex] Add architecture-aware NIM bootstrap" to "Add architecture-aware NIM bootstrap" on May 18, 2026

nil16 and others added 9 commits May 23, 2026 04:30

Handle hosted NIM fallbacks for ARM bootcamp

074f651

Merge pull request #3 from openhackathons-org/bionemo-arm-molmim-nim

2e13e16

ARM-native MolMIM NIM (local-arm) + Boltz-2/notebook fixes + docs

Merge pull request #4 from openhackathons-org/bionemo-arm-molmim-appt…

fe83453

…ainer ARM MolMIM NIM on Apptainer/Singularity + default to local-arm on ARM

nil16 wants to merge 28 commits into main from bionemo-singularity-neel
