[TEST-ONLY] Enable NUMA modules on arc-staging with A100s #741

Draft
georgehong wants to merge 2 commits into gh/georgehong/8/base from gh/georgehong/8/head

georgehong (Contributor) commented Jun 11, 2026

Stack from ghstack (oldest at bottom):

Temporary test configuration — DO NOT MERGE.

  • Add nfd + numa-scheduler to arc-staging modules
  • Remove nfd + numa-scheduler from prod clusters
  • Enable p4d (A100) in us-west-1 (remove exclude_regions)
  • Broaden NFD + taint-remover nodeSelector to p4d
  • Broaden STARTUP_TAINTS applies_when to include p4d
  • Cap A100 runners: 1-GPU=4, 2-GPU=2, 4-GPU=2, 8-GPU=0
  • Set scheduler_name: numa-scheduler on 4-GPU A100 def
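
As a rough sanity check on the A100 caps in the list above, the sketch below totals the GPUs the capped runners could request at once and compares that with 8-GPU p4d.24xlarge nodes. The cap values are taken from the list; the dictionary layout and the function name are illustrative assumptions, not the repository's runner-def schema.

```python
import math

# max_runners caps from the PR description, keyed by GPUs per runner.
A100_CAPS = {1: 4, 2: 2, 4: 2, 8: 0}
GPUS_PER_P4D = 8  # a p4d.24xlarge exposes 8 A100 GPUs


def peak_gpu_demand(caps):
    """Total GPUs requested if every capped runner type is at its limit."""
    return sum(gpus * count for gpus, count in caps.items())


demand = peak_gpu_demand(A100_CAPS)
# Lower bound only: real placement can need more nodes if pods fragment.
min_nodes = math.ceil(demand / GPUS_PER_P4D)
print(f"peak GPU demand: {demand}, minimum p4d nodes: {min_nodes}")
```

Under these caps the staging test should need about two p4d nodes at most when pods pack perfectly, which keeps the experiment small.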

Revert with: git revert HEAD

  just deploy-module arc-staging nodepools
  just deploy-module arc-staging nfd
  just deploy-module arc-staging numa-scheduler
  just deploy-module arc-staging arc-runners

Cleanup:

  # Delete the NFD and numa-scheduler Helm releases + namespaces
  kubectl --context pytorch-arc-staging delete namespace nfd
  kubectl --context pytorch-arc-staging delete namespace numa-scheduler

  # Remove the NRT CRD (installed by NFD chart)
  kubectl --context pytorch-arc-staging delete crd noderesourcetopologies.topology.node.k8s.io

  # Remove the taint-remover ClusterRole/ClusterRoleBinding (cluster-scoped, not deleted with namespace)
  kubectl --context pytorch-arc-staging delete clusterrole nfd-taint-remover
  kubectl --context pytorch-arc-staging delete clusterrolebinding nfd-taint-remover

  # Redeploy nodepools to remove the nfd-topology startup taint from p4d NodePools
  git checkout -- modules/nodepools/scripts/python/generate_nodepools.py modules/nodepools/defs/p4d.yaml
  just deploy-module arc-staging nodepools

  # Redeploy arc-runners to restore original A100 runner defs (no max_runners caps)
  git checkout -- modules/arc-runners/defs/
  just deploy-module arc-staging arc-runners

[ghstack-poisoned]
github-actions (bot) commented Jun 11, 2026

Capacity report

commit 25647cb3 · run log

✅ simulate-cluster
Installed 1 package in 2ms
Monte Carlo Cluster Simulation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Seed: 42  |  MAPE threshold: 15%  |  Runners: 43  |  DaemonSets: 16
Peak target runner types: 30 (mapped from 38 old labels)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cluster Simulation Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Skipped labels (1):
  l-arm64g2-6-32: no runner def

Nodes by instance type:

  Instance Type          Nodes  vCPU Used vCPU Total   Mem Used  Mem Total   GPU
  ──────────────────────────────────────────────────────────────────────────────
  c7a.48xlarge             261   44794.2c   49871.9c  87800.8Gi  90312.1Gi     -
  c7i.metal-24xl            37    3415.8c    3526.8c   6197.9Gi   6231.7Gi     -
  g4dn.12xlarge            162    7341.8c    7669.1c  27946.6Gi  28119.5Gi 648/648
  g4dn.8xlarge              89    2609.5c    2792.8c  10280.4Gi  10347.8Gi 89/89
  g4dn.metal                87    8205.8c    8284.1c  29972.3Gi  30082.3Gi 696/696
  g5.12xlarge               49    2220.7c    2319.7c   8208.0Gi   8240.5Gi 196/196
  g5.48xlarge               41    7762.1c    7830.2c  28884.9Gi  28912.6Gi 328/328
  g5.8xlarge               603   17680.0c   18922.1c  68446.4Gi  68969.3Gi 603/603
  g6.12xlarge               24    1087.7c    1136.2c   4140.2Gi   4165.9Gi 96/96
  g6.8xlarge               377   11053.6c   11830.3c  42793.2Gi  43120.1Gi 377/377
  m6i.32xlarge              26    3258.3c    3308.2c  12051.3Gi  12077.9Gi     -
  m7g.8xlarge               61     995.5c    1920.3c   3813.1Gi   6992.2Gi     -
  m7g.metal                 30    1869.6c    1902.0c   6795.3Gi   6830.4Gi     -
  m7i.48xlarge              48    8192.0c    9171.8c  32363.6Gi  33658.7Gi     -
  m8g.48xlarge               7    1093.4c    1337.6c   4188.2Gi   4908.6Gi     -
  r7a.48xlarge             137   21506.4c   26178.0c 170772.3Gi 193392.5Gi     -
  r7g.16xlarge             122    7481.0c    7734.8c  56548.2Gi  56673.4Gi     -

Deployment accuracy:

  Total deployed: 6208 / 7294 target
  Weighted MAPE: 15.0%

  Runner                              Deployed   Target     Diff
  ───────────────────────────────────────────────────────────────
  l-arm64g3-16-62                           61       76      -15
  l-arm64g3-61-463                         122      153      -31
  l-arm64g4-16-62                           67       76       -9
  l-barm64g3-62-226                         30       39       -9
  l-bx86iamx-92-167                         37       45       -8
  l-bx86iavx512-94-344-t4-8                 87       91       -4
  l-x86aavx2-189-704-a10g-8                 41       42       -1
  l-x86aavx2-29-113-a10g                   603      695      -92
  l-x86aavx2-29-113-l4                     377      422      -45
  l-x86aavx2-45-167-a10g-4                  49       80      -31
  l-x86aavx2-45-172-l4-4                    24       29       -5
  l-x86aavx512-125-463                      26       24       +2
  l-x86iamx-32-128                         130      174      -44
  l-x86iamx-8-32                           354      384      -30
  l-x86iavx2-40-160                         22       30       -8
  l-x86iavx2-8-32                           19       18       +1
  l-x86iavx512-16-128                       68       89      -21
  l-x86iavx512-16-32                      1146     1384     -238
  l-x86iavx512-2-4                          12       15       -3
  l-x86iavx512-29-115-t4                    89      104      -15
  l-x86iavx512-32-256                       13       12       +1
  l-x86iavx512-37-68                        48       65      -17
  l-x86iavx512-45-172-t4-4                 162      183      -21
  l-x86iavx512-46-85                       151      189      -38
  l-x86iavx512-48-384                      366      417      -51
  l-x86iavx512-8-16                       2054     2400     -346
  l-x86iavx512-8-64                         26       28       -2
  l-x86iavx512-94-192                        2        2       +0
  l-x86iavx512-94-768                       22       28       -6
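
The summary figures in the deployment accuracy table can be re-derived from its rows. The sketch below assumes the weighted MAPE is the sum of absolute differences divided by the sum of targets. That definition is an assumption that happens to reproduce the reported 15.0%; it is not taken from the simulator's source.

```python
# (runner, deployed, target) rows copied from the deployment accuracy table.
ROWS = [
    ("l-arm64g3-16-62", 61, 76), ("l-arm64g3-61-463", 122, 153),
    ("l-arm64g4-16-62", 67, 76), ("l-barm64g3-62-226", 30, 39),
    ("l-bx86iamx-92-167", 37, 45), ("l-bx86iavx512-94-344-t4-8", 87, 91),
    ("l-x86aavx2-189-704-a10g-8", 41, 42), ("l-x86aavx2-29-113-a10g", 603, 695),
    ("l-x86aavx2-29-113-l4", 377, 422), ("l-x86aavx2-45-167-a10g-4", 49, 80),
    ("l-x86aavx2-45-172-l4-4", 24, 29), ("l-x86aavx512-125-463", 26, 24),
    ("l-x86iamx-32-128", 130, 174), ("l-x86iamx-8-32", 354, 384),
    ("l-x86iavx2-40-160", 22, 30), ("l-x86iavx2-8-32", 19, 18),
    ("l-x86iavx512-16-128", 68, 89), ("l-x86iavx512-16-32", 1146, 1384),
    ("l-x86iavx512-2-4", 12, 15), ("l-x86iavx512-29-115-t4", 89, 104),
    ("l-x86iavx512-32-256", 13, 12), ("l-x86iavx512-37-68", 48, 65),
    ("l-x86iavx512-45-172-t4-4", 162, 183), ("l-x86iavx512-46-85", 151, 189),
    ("l-x86iavx512-48-384", 366, 417), ("l-x86iavx512-8-16", 2054, 2400),
    ("l-x86iavx512-8-64", 26, 28), ("l-x86iavx512-94-192", 2, 2),
    ("l-x86iavx512-94-768", 22, 28),
]

total_deployed = sum(deployed for _, deployed, _ in ROWS)
total_target = sum(target for _, _, target in ROWS)

# Assumed definition: absolute error summed over runners, weighted by target size.
abs_error = sum(abs(deployed - target) for _, deployed, target in ROWS)
weighted_mape = abs_error / total_target

print(f"Total deployed: {total_deployed} / {total_target} target")
print(f"Weighted MAPE: {weighted_mape * 100:.1f}%")
```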

Cluster-wide utilization:

  vCPU:    90.8%  (150568 / 165736 cores)
  Memory:  95.0%  (601203 / 633036 GiB)
  GPU:    100.0%  (3033 / 3033 GPUs across 1432 nodes)

  Total nodes: 2161
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ analyze-utilization
Installed 1 package in 2ms
Node Utilization Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Runner def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-h100/defs
NodePool def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-h100/defs
Utilization threshold: 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7a.48xlarge
  Total: 192 vCPU, 384Gi advertised (355.2Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 346.0Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-32: 16320m CPU, 32.5Gi RAM (job: 16c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-2-4: 2320m CPU, 4.5Gi RAM (job: 2c+4.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-37-68: 37320m CPU, 68.5Gi RAM (job: 37c+68.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-46-85: 46320m CPU, 85.5Gi RAM (job: 46c+85.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-192: 94320m CPU, 189.5Gi RAM (job: 94c+189.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-32: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  94.0% (325.1Gi / 346.0Gi) waste: 20.9Gi
      Bottleneck: MEM
    l-x86iavx512-2-4: 76 pods
      CPU:  92.3% (176320m / 191080m) waste: 14760m (14.8 cores)
      MEM:  99.1% (342.7Gi / 346.0Gi) waste: 3.3Gi
      Bottleneck: MEM
    l-x86iavx512-37-68: 5 pods
      CPU:  97.7% (186600m / 191080m) waste: 4480m (4.5 cores)
      MEM:  99.0% (342.5Gi / 346.0Gi) waste: 3.5Gi
      Bottleneck: CPU
    l-x86iavx512-46-85: 4 pods
      CPU:  97.0% (185280m / 191080m) waste: 5800m (5.8 cores)
      MEM:  98.8% (342.0Gi / 346.0Gi) waste: 4.0Gi
      Bottleneck: CPU
    l-x86iavx512-8-16: 20 pods
      CPU:  87.1% (166400m / 191080m) waste: 24680m (24.7 cores)
      MEM:  95.4% (330.2Gi / 346.0Gi) waste: 15.8Gi
      Bottleneck: MEM
    l-x86iavx512-94-192: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  54.8% (189.5Gi / 346.0Gi) waste: 156.5Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 236

    Top 5 most efficient:
      #1 [5xl-x86iavx512-37-68]
         CPU:  97.7%  MEM:  99.0%  waste: 4.5c + 3.5Gi
      #2 [1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85, 2xl-x86iavx512-8-16]
         CPU:  97.5%  MEM:  99.9%  waste: 4.8c + 466Mi
      #3 [12xl-x86iavx512-2-4, 3xl-x86iavx512-37-68, 1xl-x86iavx512-46-85]
         CPU:  97.4%  MEM:  99.7%  waste: 5.0c + 888Mi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.7%  waste: 5.2c + 988Mi
      #5 [8xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.4%  waste: 5.2c + 1.9Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-2-4, 9xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.8%  MEM:  99.0%  waste: 19.6c + 3.4Gi
      #2 [2xl-x86iavx512-16-32, 12xl-x86iavx512-2-4, 2xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.7%  MEM:  98.7%  waste: 19.6c + 4.4Gi
      #3 [4xl-x86iavx512-16-32, 5xl-x86iavx512-2-4, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 7xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #5 [2xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 5xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.4%  MEM:  98.7%  waste: 20.2c + 4.4Gi
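
The allocatable and homogeneous packing figures for c7a.48xlarge above appear to follow simple arithmetic: subtract the kubelet reservation and DaemonSet overhead from the node, then divide by the per-pod request (job plus hooks). The sketch below reproduces those numbers under that reading. The function names and the tie-breaking rule (report CPU unless memory fits strictly fewer pods) are assumptions inferred from the report, not the analyzer's actual code, and GPUs are ignored.

```python
MI_PER_GI = 1024


def allocatable(total_vcpu, actual_mem_gi, kubelet_cpu_m, kubelet_mem_gi,
                ds_cpu_m, ds_mem_mi):
    """Return (CPU millicores, memory MiB) left for runner pods."""
    cpu_m = total_vcpu * 1000 - kubelet_cpu_m - ds_cpu_m
    mem_mi = (actual_mem_gi - kubelet_mem_gi) * MI_PER_GI - ds_mem_mi
    return cpu_m, mem_mi


def homogeneous_fit(alloc_cpu_m, alloc_mem_mi, job_cpu, job_mem_gi,
                    hook_cpu_m=320, hook_mem_mi=522):
    """Pods of one runner type that fit, plus the assumed bottleneck label."""
    pod_cpu_m = job_cpu * 1000 + hook_cpu_m
    pod_mem_mi = job_mem_gi * MI_PER_GI + hook_mem_mi
    by_cpu = int(alloc_cpu_m // pod_cpu_m)
    by_mem = int(alloc_mem_mi // pod_mem_mi)
    # Assumed tie-break: memory is named only when it is strictly tighter.
    bottleneck = "MEM" if by_mem < by_cpu else "CPU"
    return min(by_cpu, by_mem), bottleneck


# c7a.48xlarge values from the report: 192 vCPU, 355.2Gi actual memory.
cpu_m, mem_mi = allocatable(192, 355.2, 550, 8.3, 370, 934)

# Job sizes (cores, GiB) as printed in "Runners targeting this node".
JOBS = {
    "l-x86iavx512-16-32": (16, 32),
    "l-x86iavx512-2-4": (2, 4),
    "l-x86iavx512-37-68": (37, 68),
    "l-x86iavx512-46-85": (46, 85),
    "l-x86iavx512-8-16": (8, 16),
    "l-x86iavx512-94-192": (94, 189),
}
fits = {name: homogeneous_fit(cpu_m, mem_mi, c, m) for name, (c, m) in JOBS.items()}

print(f"allocatable: {cpu_m}m CPU, {mem_mi / MI_PER_GI:.1f}Gi RAM")
for name, (pods, bottleneck) in fits.items():
    print(f"{name}: {pods} pods, bottleneck {bottleneck}")
```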

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.12xlarge
  Total: 48 vCPU, 96Gi advertised (88.8Gi actual)
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 47440m CPU (47.4 cores), 85.0Gi RAM

  Runners targeting this node:
    - l-x86iamx-14-27: 14320m CPU, 27.5Gi RAM (job: 14c+27.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-41: 22320m CPU, 41.5Gi RAM (job: 22c+41.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-46-84: 46320m CPU, 84.5Gi RAM (job: 46c+84.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-14-27: 3 pods
      CPU:  90.6% (42960m / 47440m) waste: 4480m (4.5 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU
    l-x86iamx-22-41: 2 pods
      CPU:  94.1% (44640m / 47440m) waste: 2800m (2.8 cores)
      MEM:  97.6% (83.0Gi / 85.0Gi) waste: 2.0Gi
      Bottleneck: CPU
    l-x86iamx-46-84: 1 pods
      CPU:  97.6% (46320m / 47440m) waste: 1120m (1.1 cores)
      MEM:  99.4% (84.5Gi / 85.0Gi) waste: 530Mi
      Bottleneck: CPU
    l-x86iamx-8-16: 5 pods
      CPU:  87.7% (41600m / 47440m) waste: 5840m (5.8 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 8

    Top 5 most efficient:
      #1 [1xl-x86iamx-46-84]
         CPU:  97.6%  MEM:  99.4%  waste: 1.1c + 530Mi
      #2 [2xl-x86iamx-22-41]
         CPU:  94.1%  MEM:  97.6%  waste: 2.8c + 2.0Gi
      #3 [3xl-x86iamx-14-27]
         CPU:  90.6%  MEM:  97.1%  waste: 4.5c + 2.5Gi
      #4 [5xl-x86iamx-8-16]
         CPU:  87.7%  MEM:  97.1%  waste: 5.8c + 2.5Gi
      #5 [1xl-x86iamx-14-27, 3xl-x86iamx-8-16]
         CPU:  82.8%  MEM:  90.6%  waste: 8.2c + 8.0Gi

    Bottom 3 least efficient (money on the table):
      #1 [1xl-x86iamx-22-41, 2xl-x86iamx-8-16]
         CPU:  82.1%  MEM:  87.7%  waste: 8.5c + 10.5Gi
      #2 [2xl-x86iamx-14-27, 1xl-x86iamx-8-16]
         CPU:  77.9%  MEM:  84.1%  waste: 10.5c + 13.5Gi
      #3 [1xl-x86iamx-14-27, 1xl-x86iamx-22-41]
         CPU:  77.2%  MEM:  81.2%  waste: 10.8c + 16.0Gi
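
A "maximal" combo, as the report words it, is a mix of pods that fits on the node while leaving no room for one more pod of any type. The brute-force sketch below enumerates such mixes for c7i.12xlarge and arrives at the same count of 8. It takes the allocatable memory as the rounded 85.0Gi printed above and considers CPU and memory only; the function name and data layout are illustrative assumptions, not the analyzer's implementation.

```python
from itertools import product

ALLOC_CPU_M = 47440          # allocatable CPU in millicores (from the report)
ALLOC_MEM_MI = 85 * 1024     # allocatable memory, rounded 85.0Gi in MiB

# Per-pod requests: (CPU millicores, memory MiB) = job + hooks (320m, 522Mi).
PODS = {
    "l-x86iamx-14-27": (14320, 27 * 1024 + 522),
    "l-x86iamx-22-41": (22320, 41 * 1024 + 522),
    "l-x86iamx-46-84": (46320, 84 * 1024 + 522),
    "l-x86iamx-8-16": (8320, 16 * 1024 + 522),
}


def maximal_combos(alloc_cpu_m, alloc_mem_mi, pods):
    """List every pod mix that fits and cannot accept one more pod."""
    names = list(pods)

    def fits(counts):
        cpu = sum(n * pods[name][0] for n, name in zip(counts, names))
        mem = sum(n * pods[name][1] for n, name in zip(counts, names))
        return cpu <= alloc_cpu_m and mem <= alloc_mem_mi

    # No mix can hold more of a type than a node filled with that type alone.
    limits = [min(alloc_cpu_m // pods[n][0], alloc_mem_mi // pods[n][1])
              for n in names]

    found = []
    for counts in product(*(range(limit + 1) for limit in limits)):
        if not fits(counts):
            continue
        # Maximal: bumping any single count by one no longer fits.
        grown = (counts[:i] + (counts[i] + 1,) + counts[i + 1:]
                 for i in range(len(names)))
        if not any(fits(g) for g in grown):
            found.append({n: c for n, c in zip(names, counts) if c})
    return found


combos = maximal_combos(ALLOC_CPU_M, ALLOC_MEM_MI, PODS)
print(f"Total maximal combos: {len(combos)}")
for combo in combos:
    print(", ".join(f"{count}x{name}" for name, count in combo.items()))
```

Exhaustive enumeration is fine for four runner types; a node with many small runner types (c7a.48xlarge reports 236 combos) would likely want pruning instead.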

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.metal-24xl
  Total: 96 vCPU, 192Gi advertised (177.6Gi actual)
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 95320m CPU (95.3 cores), 168.4Gi RAM

  Runners targeting this node:
    - l-bx86iamx-92-167: 92320m CPU, 167.5Gi RAM (job: 92c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-92-167: 1 pods
      CPU:  96.9% (92320m / 95320m) waste: 3000m (3.0 cores)
      MEM:  99.5% (167.5Gi / 168.4Gi) waste: 936Mi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-bx86iamx-92-167]
         CPU:  96.9%  MEM:  99.5%  waste: 3.0c + 936Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 47340m CPU (47.3 cores), 173.6Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86iavx512-45-172-t4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-45-172-t4-4: 1 pods
      CPU:  95.7% (45320m / 47340m) waste: 2020m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.6Gi) waste: 1.1Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-45-172-t4-4]
         CPU:  95.7%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 993Mi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 31380m CPU (31.4 cores), 116.3Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86iavx512-29-115-t4: 29320m CPU, 115.5Gi RAM, 1 GPU (job: 29c+115.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-29-115-t4: 1 pods
      CPU:  93.4% (29320m / 31380m) waste: 2060m (2.1 cores)
      MEM:  99.3% (115.5Gi / 116.3Gi) waste: 776Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-29-115-t4]
         CPU:  93.4%  MEM:  99.3%  GPU: 100.0%  waste: 2.1c + 776Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.metal
  Total: 96 vCPU, 384Gi advertised (355.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 95220m CPU (95.2 cores), 345.8Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-94-344-t4-8: 94320m CPU, 344.5Gi RAM, 8 GPU (job: 94c+344.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-94-344-t4-8: 1 pods
      CPU:  99.1% (94320m / 95220m) waste: 900m (0.9 cores)
      MEM:  99.6% (344.5Gi / 345.8Gi) waste: 1.3Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-bx86iavx512-94-344-t4-8]
         CPU:  99.1%  MEM:  99.6%  GPU: 100.0%  waste: 0.9c + 1.3Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 8.3Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 47340m CPU (47.3 cores), 168.2Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-167-a10g-4: 45320m CPU, 167.5Gi RAM, 4 GPU (job: 45c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-167-a10g-4: 1 pods
      CPU:  95.7% (45320m / 47340m) waste: 2020m (2.0 cores)
      MEM:  99.6% (167.5Gi / 168.2Gi) waste: 680Mi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-167-a10g-4]
         CPU:  95.7%  MEM:  99.6%  GPU: 100.0%  waste: 2.0c + 680Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 4.1Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 190980m CPU (191.0 cores), 705.2Gi RAM, 8 GPU

  Runners targeting this node:
    - l-x86aavx2-189-704-a10g-8: 189320m CPU, 704.5Gi RAM, 8 GPU (job: 189c+704.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-189-704-a10g-8: 1 pods
      CPU:  99.1% (189320m / 190980m) waste: 1660m (1.7 cores)
      MEM:  99.9% (704.5Gi / 705.2Gi) waste: 691Mi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-189-704-a10g-8]
         CPU:  99.1%  MEM:  99.9%  GPU: 100.0%  waste: 1.7c + 691Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 31380m CPU (31.4 cores), 114.4Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-a10g: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-a10g: 1 pods
      CPU:  93.4% (29320m / 31380m) waste: 2060m (2.1 cores)
      MEM:  99.2% (113.5Gi / 114.4Gi) waste: 888Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-a10g]
         CPU:  93.4%  MEM:  99.2%  GPU: 100.0%  waste: 2.1c + 888Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 47340m CPU (47.3 cores), 173.6Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-172-l4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-172-l4-4: 1 pods
      CPU:  95.7% (45320m / 47340m) waste: 2020m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.6Gi) waste: 1.1Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-172-l4-4]
         CPU:  95.7%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 31380m CPU (31.4 cores), 114.4Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-l4: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-l4: 1 pods
      CPU:  93.4% (29320m / 31380m) waste: 2060m (2.1 cores)
      MEM:  99.2% (113.5Gi / 114.4Gi) waste: 888Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-l4]
         CPU:  93.4%  MEM:  99.2%  GPU: 100.0%  waste: 2.1c + 888Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m6i.32xlarge
  Total: 128 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 390m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 127240m CPU (127.2 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-x86aavx512-125-463: 125320m CPU, 463.5Gi RAM (job: 125c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx512-125-463: 1 pods
      CPU:  98.5% (125320m / 127240m) waste: 1920m (1.9 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx512-125-463]
         CPU:  98.5%  MEM:  99.8%  waste: 1.9c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual)
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 31480m CPU (31.5 cores), 114.6Gi RAM

  Runners targeting this node:
    - l-arm64g3-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-16-62: 1 pods
      CPU:  51.8% (16320m / 31480m) waste: 15160m (15.2 cores)
      MEM:  54.5% (62.5Gi / 114.6Gi) waste: 52.1Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-16-62]
         CPU:  51.8%  MEM:  54.5%  waste: 15.2c + 52.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.metal
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g3-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g3-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g3-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7i.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-x86iamx-32-128: 32320m CPU, 128.5Gi RAM (job: 32c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-40-160: 40320m CPU, 160.5Gi RAM (job: 40c+160.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-32-128: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  91.6% (642.5Gi / 701.2Gi) waste: 58.7Gi
      Bottleneck: CPU
    l-x86iamx-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM
    l-x86iavx2-40-160: 4 pods
      CPU:  84.4% (161280m / 191080m) waste: 29800m (29.8 cores)
      MEM:  91.6% (642.0Gi / 701.2Gi) waste: 59.2Gi
      Bottleneck: CPU
    l-x86iavx2-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 131

    Top 5 most efficient:
      #1 [1xl-x86iamx-32-128, 17xl-x86iamx-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #2 [1xl-x86iamx-32-128, 16xl-x86iamx-8-32, 1xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #3 [1xl-x86iamx-32-128, 15xl-x86iamx-8-32, 2xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #4 [1xl-x86iamx-32-128, 14xl-x86iamx-8-32, 3xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #5 [1xl-x86iamx-32-128, 13xl-x86iamx-8-32, 4xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iamx-32-128, 1xl-x86iamx-8-32, 3xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #2 [1xl-x86iamx-32-128, 3xl-x86iavx2-40-160, 2xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #3 [4xl-x86iamx-32-128, 1xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #4 [1xl-x86iamx-8-32, 4xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #5 [4xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.16xlarge
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g4-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g4-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g4-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)
    - rel-l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU
    rel-l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 12

    Top 5 most efficient:
      #1 [11xl-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [10xl-arm64g4-16-62, 1xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [9xl-arm64g4-16-62, 2xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [8xl-arm64g4-16-62, 3xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [7xl-arm64g4-16-62, 4xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi

    Bottom 5 least efficient (money on the table):
      #1 [4xl-arm64g4-16-62, 7xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [3xl-arm64g4-16-62, 8xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [2xl-arm64g4-16-62, 9xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [1xl-arm64g4-16-62, 10xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [11xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p4d.24xlarge
  Total: 96 vCPU, 1152Gi advertised (1065.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 3.0Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 95220m CPU (95.2 cores), 1061.0Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-88-1000-a100-8: 88320m CPU, 1000.5Gi RAM, 8 GPU (job: 88c+1000.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-11-125-a100: 11320m CPU, 125.5Gi RAM, 1 GPU (job: 11c+125.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-22-250-a100-2: 22320m CPU, 250.5Gi RAM, 2 GPU (job: 22c+250.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-44-500-a100-4: 44320m CPU, 500.5Gi RAM, 4 GPU (job: 44c+500.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-88-1000-a100-8: 1 pods
      CPU:  92.8% (88320m / 95220m) waste: 6900m (6.9 cores)
      MEM:  94.3% (1000.5Gi / 1061.0Gi) waste: 60.5Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-11-125-a100: 8 pods
      CPU:  95.1% (90560m / 95220m) waste: 4660m (4.7 cores)
      MEM:  94.6% (1004.1Gi / 1061.0Gi) waste: 56.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-22-250-a100-2: 4 pods
      CPU:  93.8% (89280m / 95220m) waste: 5940m (5.9 cores)
      MEM:  94.4% (1002.0Gi / 1061.0Gi) waste: 59.0Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-44-500-a100-4: 2 pods
      CPU:  93.1% (88640m / 95220m) waste: 6580m (6.6 cores)
      MEM:  94.3% (1001.0Gi / 1061.0Gi) waste: 60.0Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iavx512-11-125-a100]
         CPU:  95.1%  MEM:  94.6%  GPU: 100.0%  waste: 4.7c + 56.9Gi
      #2 [6xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2]
         CPU:  94.8%  MEM:  94.6%  GPU: 100.0%  waste: 5.0c + 57.4Gi
      #3 [4xl-x86iavx512-11-125-a100, 2xl-x86iavx512-22-250-a100-2]
         CPU:  94.4%  MEM:  94.5%  GPU: 100.0%  waste: 5.3c + 57.9Gi
      #4 [4xl-x86iavx512-11-125-a100, 1xl-x86iavx512-44-500-a100-4]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.5Gi
      #5 [2xl-x86iavx512-11-125-a100, 3xl-x86iavx512-22-250-a100-2]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.5Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 59.0Gi
      #2 [4xl-x86iavx512-22-250-a100-2]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 59.0Gi
      #3 [2xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.4%  MEM:  94.4%  GPU: 100.0%  waste: 6.3c + 59.5Gi
      #4 [2xl-x86iavx512-44-500-a100-4]
         CPU:  93.1%  MEM:  94.3%  GPU: 100.0%  waste: 6.6c + 60.0Gi
      #5 [1xl-bx86iavx512-88-1000-a100-8]
         CPU:  92.8%  MEM:  94.3%  GPU: 100.0%  waste: 6.9c + 60.5Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p5.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 190980m CPU (191.0 cores), 1890.8Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-h100-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-h100: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-h100-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-h100-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-h100-8: 1 pods
      CPU:  92.3% (176320m / 190980m) waste: 14660m (14.7 cores)
      MEM:  95.2% (1800.5Gi / 1890.8Gi) waste: 90.3Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-h100: 8 pods
      CPU:  93.5% (178560m / 190980m) waste: 12420m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.8Gi) waste: 86.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-h100-2: 4 pods
      CPU:  92.8% (177280m / 190980m) waste: 13700m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.8Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-h100-4: 2 pods
      CPU:  92.5% (176640m / 190980m) waste: 14340m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.8Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-h100]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.7Gi
      #2 [6xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2]
         CPU:  93.3%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.2Gi
      #3 [4xl-x86iamx-22-225-h100, 2xl-x86iamx-44-450-h100-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.1c + 87.7Gi
      #4 [4xl-x86iamx-22-225-h100, 1xl-x86iamx-88-900-h100-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.4c + 88.2Gi
      #5 [2xl-x86iamx-22-225-h100, 3xl-x86iamx-44-450-h100-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.4c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.8%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-h100-2]
         CPU:  92.8%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-h100-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-h100-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.7c + 90.3Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p6-b200.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 470m CPU, 1.2Gi RAM
  Allocatable for runners: 190980m CPU (191.0 cores), 1890.8Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-b200-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-b200: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-b200-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-b200-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-b200-8: 1 pods
      CPU:  92.3% (176320m / 190980m) waste: 14660m (14.7 cores)
      MEM:  95.2% (1800.5Gi / 1890.8Gi) waste: 90.3Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-b200: 8 pods
      CPU:  93.5% (178560m / 190980m) waste: 12420m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.8Gi) waste: 86.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-b200-2: 4 pods
      CPU:  92.8% (177280m / 190980m) waste: 13700m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.8Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-b200-4: 2 pods
      CPU:  92.5% (176640m / 190980m) waste: 14340m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.8Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-b200]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.7Gi
      #2 [6xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2]
         CPU:  93.3%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.2Gi
      #3 [4xl-x86iamx-22-225-b200, 2xl-x86iamx-44-450-b200-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.1c + 87.7Gi
      #4 [4xl-x86iamx-22-225-b200, 1xl-x86iamx-88-900-b200-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.4c + 88.2Gi
      #5 [2xl-x86iamx-22-225-b200, 3xl-x86iamx-44-450-b200-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.4c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.8%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-b200-2]
         CPU:  92.8%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-b200-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-b200-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.7c + 90.3Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7a.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-32-256: 32320m CPU, 256.5Gi RAM (job: 32c+256.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-48-384: 48320m CPU, 384.5Gi RAM (job: 48c+384.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-768: 94320m CPU, 740.5Gi RAM (job: 94c+740.0Gi, hooks: 320m+522Mi)
    - rel-l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iavx512-32-256: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  90.9% (1282.5Gi / 1411.6Gi) waste: 129.1Gi
      Bottleneck: CPU
    l-x86iavx512-48-384: 3 pods
      CPU:  75.9% (144960m / 191080m) waste: 46120m (46.1 cores)
      MEM:  81.7% (1153.5Gi / 1411.6Gi) waste: 258.1Gi
      Bottleneck: CPU
    l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM
    l-x86iavx512-94-768: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  52.5% (740.5Gi / 1411.6Gi) waste: 671.1Gi
      Bottleneck: MEM
    rel-l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 572

    Top 5 most efficient:
      #1 [5xl-x86iavx512-16-128, 2xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #2 [4xl-x86iavx512-16-128, 2xl-x86iavx512-32-256, 1xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #3 [3xl-x86iavx512-16-128, 4xl-x86iavx512-32-256]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #4 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 2xl-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #5 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #2 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xl-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #3 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 2xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #4 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 2xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #5 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7g.16xlarge
  Total: 64 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-arm64g3-61-463: 61320m CPU, 463.5Gi RAM (job: 61c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-61-463: 1 pods
      CPU:  96.7% (61320m / 63400m) waste: 2080m (2.1 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-61-463]
         CPU:  96.7%  MEM:  99.8%  waste: 2.1c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7i.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iamx-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iamx-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [1xl-x86iamx-16-128, 19xl-x86iamx-8-64]
         CPU:  91.3%  MEM:  95.9%  waste: 16.7c + 57.4Gi
      #2 [2xl-x86iamx-16-128, 17xl-x86iamx-8-64]
         CPU:  91.1%  MEM:  95.9%  waste: 17.0c + 57.9Gi
      #3 [3xl-x86iamx-16-128, 15xl-x86iamx-8-64]
         CPU:  90.9%  MEM:  95.9%  waste: 17.3c + 58.4Gi
      #4 [4xl-x86iamx-16-128, 13xl-x86iamx-8-64]
         CPU:  90.8%  MEM:  95.8%  waste: 17.6c + 59.0Gi
      #5 [5xl-x86iamx-16-128, 11xl-x86iamx-8-64]
         CPU:  90.6%  MEM:  95.8%  waste: 18.0c + 59.5Gi

    Bottom 5 least efficient (money on the table):
      #1 [6xl-x86iamx-16-128, 9xl-x86iamx-8-64]
         CPU:  90.4%  MEM:  95.8%  waste: 18.3c + 60.0Gi
      #2 [7xl-x86iamx-16-128, 7xl-x86iamx-8-64]
         CPU:  90.3%  MEM:  95.7%  waste: 18.6c + 60.5Gi
      #3 [8xl-x86iamx-16-128, 5xl-x86iamx-8-64]
         CPU:  90.1%  MEM:  95.7%  waste: 18.9c + 61.0Gi
      #4 [9xl-x86iamx-16-128, 3xl-x86iamx-8-64]
         CPU:  89.9%  MEM:  95.6%  waste: 19.2c + 61.5Gi
      #5 [10xl-x86iamx-16-128, 1xl-x86iamx-8-64]
         CPU:  89.8%  MEM:  95.6%  waste: 19.6c + 62.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: t4g.2xlarge
  Total: 8 vCPU, 32Gi advertised (29.6Gi actual)
  Kubelet reserved: 90m CPU, 993Mi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 7540m CPU (7.5 cores), 27.7Gi RAM

  Runners targeting this node:
    - l-arm64g2-6-25: 6320m CPU, 25.5Gi RAM (job: 6c+25.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g2-6-25: 1 pods
      CPU:  83.8% (6320m / 7540m) waste: 1220m (1.2 cores)
      MEM:  92.0% (25.5Gi / 27.7Gi) waste: 2.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g2-6-25]
         CPU:  83.8%  MEM:  92.0%  waste: 1.2c + 2.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Found 13 runner type(s) with homogeneous utilization below 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unused resource headroom per node (homogeneous packing only):

  Node Type                 Min CPU    Max CPU    Min MEM    Max MEM
  ────────────────────────────────────────────────────────────────
  c7a.48xlarge              4480m     96760m      3.3Gi    156.5Gi
  c7i.12xlarge              1120m      5840m      530Mi      2.5Gi
  c7i.metal-24xl            3000m      3000m      936Mi      936Mi
  g4dn.12xlarge             2020m      2020m      1.1Gi      1.1Gi
  g4dn.8xlarge              2060m      2060m      776Mi      776Mi
  g4dn.metal                 900m       900m      1.3Gi      1.3Gi
  g5.12xlarge               2020m      2020m      680Mi      680Mi
  g5.48xlarge               1660m      1660m      691Mi      691Mi
  g5.8xlarge                2060m      2060m      888Mi      888Mi
  g6.12xlarge               2020m      2020m      1.1Gi      1.1Gi
  g6.8xlarge                2060m      2060m      888Mi      888Mi
  m6i.32xlarge              1920m      1920m      1.0Gi      1.0Gi
  m7g.8xlarge              15160m     15160m     52.1Gi     52.1Gi
  m7g.metal                 1080m      1080m      1.2Gi      1.2Gi
  m7i.48xlarge             16360m     29800m     18.5Gi     59.2Gi
  m8g.16xlarge              1080m      1080m      1.2Gi      1.2Gi
  m8g.48xlarge             11560m     11560m     13.6Gi     13.6Gi
  p4d.24xlarge              4660m      6900m     56.9Gi     60.5Gi
  p5.48xlarge              12420m     14660m     86.7Gi     90.3Gi
  p6-b200.48xlarge         12420m     14660m     86.7Gi     90.3Gi
  r7a.48xlarge             16360m     96760m     56.9Gi    671.1Gi
  r7g.16xlarge              2080m      2080m      1.0Gi      1.0Gi
  r7i.48xlarge             16360m     27880m     56.9Gi    126.5Gi
  t4g.2xlarge               1220m      1220m      2.2Gi      2.2Gi
  ────────────────────────────────────────────────────────────────
  WORST CASE                 900m     96760m      530Mi    671.1Gi

  The tightest node has only 900m CPU and 530Mi RAM free.
  Any new DaemonSet must fit within these limits or runners will fail to schedule.
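
The headroom figures above can be checked by hand. The following is a minimal sketch, not the analyzer's actual code: `Resources`, `homogeneous_packing`, and `daemonset_fits` are hypothetical names, the m8g.48xlarge memory value is converted from the rounded `701.2Gi` in the report, GPUs are not modeled, and reporting ties as `CPU` is an assumption inferred from the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_m: int   # CPU in millicores
    mem_mi: int  # memory in MiB


def homogeneous_packing(allocatable: Resources, runner: Resources) -> dict:
    """Fill a node with a single runner type and report the leftover."""
    by_cpu = allocatable.cpu_m // runner.cpu_m
    by_mem = allocatable.mem_mi // runner.mem_mi
    pods = min(by_cpu, by_mem)
    return {
        "pods": pods,
        # Assumption: a tie is reported as CPU, as the report appears to do.
        "bottleneck": "CPU" if by_cpu <= by_mem else "MEM",
        "cpu_waste_m": allocatable.cpu_m - pods * runner.cpu_m,
        "mem_waste_mi": allocatable.mem_mi - pods * runner.mem_mi,
    }


def daemonset_fits(allocatable: Resources, runner: Resources,
                   daemonset: Resources) -> bool:
    """True if reserving room for the DaemonSet keeps the same pod count."""
    before = homogeneous_packing(allocatable, runner)["pods"]
    reduced = Resources(allocatable.cpu_m - daemonset.cpu_m,
                        allocatable.mem_mi - daemonset.mem_mi)
    return homogeneous_packing(reduced, runner)["pods"] == before


# m8g.48xlarge: 192000m - 550m (kubelet) - 370m (DaemonSets) = 191080m.
# Memory is approximated from the rounded 701.2Gi shown in the report.
m8g_48xlarge = Resources(cpu_m=192_000 - 550 - 370, mem_mi=int(701.2 * 1024))

# l-arm64g4-16-62: job 16c + 62Gi, hooks 320m + 522Mi.
l_arm64g4_16_62 = Resources(cpu_m=16_000 + 320, mem_mi=62 * 1024 + 522)

result = homogeneous_packing(m8g_48xlarge, l_arm64g4_16_62)
print(result)

# Hypothetical DaemonSet requests, chosen only to show both outcomes.
small_ds_ok = daemonset_fits(m8g_48xlarge, l_arm64g4_16_62, Resources(500, 256))
large_ds_ok = daemonset_fits(m8g_48xlarge, l_arm64g4_16_62, Resources(12_000, 256))
print(small_ds_ok, large_ds_ok)
```

With these inputs the sketch reproduces the 11 pods and 11560m CPU waste reported for m8g.48xlarge. A DaemonSet request larger than a node's waste costs that node a runner pod, which is why the tightest node (900m CPU, 530Mi RAM) sets the limit for the whole fleet.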

@github-actions

github-actions Bot commented Jun 11, 2026

tofu plan — arc-cbr-production

✅ Plan succeeded · commit 25647cb3 · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
data.aws_availability_zones.available: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.eks.data.aws_caller_identity.current: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-03542e74755fc105b]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-0d3a71569b2f687be]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-021ee6c9f1d20b71a]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-09b15a770e0c6d552]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-0cc3dadec18bbb3f3]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-0aede78edc69cf695]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-0cde9a6463901f1e1]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0f2b00a9ac31df215]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-063bee447616351f9]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-06b7b88826199a232]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-067d535102a61d1a8]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-02825435a2786b3d8]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-0bd9bf54bd6010323]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-0cead990d60ce181e]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0e67c0a8cd8c990da]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-06a980076e99cda81]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-07cfdb2fd5dc07459]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-0113c95dbdec2f879]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-086a011b3c26c0dd7]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-0de33181548ac2e5a]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-034d5e1f5a2fcb795]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0deb818bbf18764de]
data.terraform_remote_state.base: Read complete after 1s
aws_security_group.efs: Refreshing state... [id=sg-0979eb5e3d9d3db9f]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role-20260518023249903900000003]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role-20260518023249929400000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role-20260518023249955700000005]
aws_efs_mount_target.pypi_cache["subnet-0577a02acde719bff"]: Refreshing state... [id=fsmt-07d7b111b9cd6684e]
aws_efs_mount_target.pypi_cache["subnet-0709abbcafa23aec0"]: Refreshing state... [id=fsmt-08cd5108febbacef9]
aws_efs_mount_target.pypi_cache["subnet-0992f582e9bf2836e"]: Refreshing state... [id=fsmt-03523586bb4ff0c46]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit a477c919 · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
data.aws_availability_zones.available: Reading...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
data.aws_availability_zones.available: Read complete after 0s [id=us-west-1]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_eip.nat_secondary["us-west-1c-4"]: Refreshing state... [id=eipalloc-0dfaa16c61333ceb3]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_eip.nat_secondary["us-west-1a-6"]: Refreshing state... [id=eipalloc-08763a35db0a26caa]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_eip.nat_secondary["us-west-1a-3"]: Refreshing state... [id=eipalloc-05a2bad636af56f4d]
module.vpc.aws_eip.nat_secondary["us-west-1a-1"]: Refreshing state... [id=eipalloc-012ac413772344fea]
module.vpc.aws_eip.nat_secondary["us-west-1c-6"]: Refreshing state... [id=eipalloc-0cf91a032d10f4ec5]
module.vpc.aws_eip.nat_secondary["us-west-1a-5"]: Refreshing state... [id=eipalloc-059986f686b188dc2]
module.vpc.aws_eip.nat_secondary["us-west-1a-4"]: Refreshing state... [id=eipalloc-0dfae88698dce850e]
module.vpc.aws_eip.nat_secondary["us-west-1c-5"]: Refreshing state... [id=eipalloc-0635efedc10ee5f66]
module.vpc.aws_eip.nat_secondary["us-west-1c-1"]: Refreshing state... [id=eipalloc-0bd09c7f2dcaa0a46]
module.vpc.aws_eip.nat_secondary["us-west-1c-2"]: Refreshing state... [id=eipalloc-0f2e15b6a36b52fac]
module.vpc.aws_eip.nat_secondary["us-west-1c-3"]: Refreshing state... [id=eipalloc-09f89978685e7f3c7]
module.vpc.aws_eip.nat_secondary["us-west-1a-2"]: Refreshing state... [id=eipalloc-0647e169131be5893]
module.vpc.aws_eip.nat_secondary["us-west-1a-0"]: Refreshing state... [id=eipalloc-0e3ca79e34012a238]
module.vpc.aws_eip.nat_secondary["us-west-1c-0"]: Refreshing state... [id=eipalloc-0d565f5bf077b05cf]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 2s
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 11, 2026

tofu plan — meta-prod-aws-ue1

✅ Plan succeeded · commit 25647cb3 · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=meta-prod-aws-ue1-cluster-role]
module.eks.aws_iam_role.node: Refreshing state... [id=meta-prod-aws-ue1-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-046818728dce02486]
module.eks.data.aws_caller_identity.current: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=9274017b-776a-41bd-9f11-d118a1174159]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNGRUDTXPT]
data.aws_availability_zones.available: Read complete after 1s [id=us-east-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/meta-prod-aws-ue1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=meta-prod-aws-ue1-node-role:meta-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0cf3d9cf37ee998b6]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-0ce44cb6446f3c1b6]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-02ce11d6646870431]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0348c5058db524cd2]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0beb5fc44f0ee165f]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-033772b4490df1b41]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0eafd792589fbb363]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-09fa171393c3a7cfb]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-00c2e2605c4dea199]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-0c8291ee817240e1f]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-0c8a6faed0a97479d]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-04fe645562f597aaa]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-0af54aa2e5f40dfa4]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-0cb5208c5f775baf6]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-0f922f499d32f1368]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-00c5df9f3b60f353d]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-05844040c7248f44f]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0aba12aa23c11d20c]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0d095305019486ae6]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-0d078dc6f07628714]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0bcfe1f98793e1b12]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-0d22d3aa0667a1070]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0f0b720f4cca62ec7]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-025ef0e1813277c67]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-080ec4e265ebdc5ad]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-02e84a51a14c9cbda]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-08c7bd3306cf687ca]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0f922406e02ecba1d]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-01f89a7c130d2a810]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-0bda13d7b70c00c00]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-078f44b58c8b48ade]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-07bfd0f170c3b3406]
module.eks.aws_eks_cluster.this: Refreshing state... [id=meta-prod-aws-ue1]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-05da47c4ed26ae390]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-05e7e66e960593972]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0616491b7baeab47f]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=meta-prod-aws-ue1:kube-proxy]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=meta-prod-aws-ue1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-043779597e3b5a7fd]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0cff785d8001fc914]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-025de56c0aac8d3f0]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-09414719983019b49]
module.eks.aws_eks_node_group.base: Refreshing state... [id=meta-prod-aws-ue1:meta-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/6C84A48E1BF23A027C1E78912A368743]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-09287d705ce4a88bc]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-05d5b7a41aa6323ed]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0c665948be8d0282e]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3022997555]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=meta-prod-aws-ue1:coredns]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-02a8683fa7258f295]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-09dca398d838d4247]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0306281246323bd27]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-016f4a0d209f3e4a9,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-02ce11d6646870431"]: Refreshing state... [id=subnet-02ce11d6646870431,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0348c5058db524cd2"]: Refreshing state... [id=subnet-0348c5058db524cd2,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c,karpenter.sh/discovery]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-karpenter-controller]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller-20260528200455768400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
data.terraform_remote_state.base: Read complete after 0s
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-023e57b36ec1cd426]
aws_iam_role.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_security_group.efs: Refreshing state... [id=sg-0bc06caa62214c9b7]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role-20260528201106257700000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role-20260528201106192600000004]
aws_efs_mount_target.pypi_cache["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=fsmt-0ffaedc58eceb7749]
aws_efs_mount_target.pypi_cache["subnet-02ce11d6646870431"]: Refreshing state... [id=fsmt-06a05c001541338d2]
aws_efs_mount_target.pypi_cache["subnet-0348c5058db524cd2"]: Refreshing state... [id=fsmt-0500c573cafe66133]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role-20260528201106116400000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.
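
Each plan comment above follows the same pattern per module: a `━━━ PLAN: … ━━━` header, the refresh lines, and then either a change summary or the `No changes.` message. Below is a minimal sketch, assuming the plan output has been saved as plain text, of collapsing such a log to one line per module. The function name `summarize_plan` and the `"review"` label are hypothetical and not part of the ci-infra tooling. The header and no-op strings are copied from the logs above.

```python
import re

# Header format copied from the plan output above, e.g.
# "━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━"
HEADER = re.compile(r"^━━━ PLAN: (?P<unit>.+?) \((?P<cluster>[^)]+)\) ━━━$")
NO_CHANGES = "No changes. Your infrastructure matches the configuration."


def summarize_plan(log_text):
    """Return [(cluster, unit, status)] for each PLAN section in the log.

    status is "no changes" when the section contains OpenTofu's no-op
    message, otherwise "review" (hypothetical label meaning a human
    should read that section).
    """
    results = []
    current = None  # [cluster, unit, status] for the section being read
    for line in log_text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            if current:
                results.append(tuple(current))
            current = [match["cluster"], match["unit"], "review"]
        elif current and stripped == NO_CHANGES:
            current[2] = "no changes"
    if current:
        results.append(tuple(current))
    return results


# Abbreviated sample; the second section is cut short on purpose.
sample = """\
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
data.aws_availability_zones.available: Reading...

No changes. Your infrastructure matches the configuration.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
"""

for cluster, unit, status in summarize_plan(sample):
    print(f"{cluster:20} {unit:20} {status}")
```

A section without the no-op message is only flagged for review. The sketch does not try to parse add/change/destroy counts, since no such output appears in these logs.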

[ghstack-poisoned]
georgehong added a commit that referenced this pull request Jun 12, 2026
Temporary test configuration — DO NOT MERGE.

- Add nfd + numa-scheduler to arc-staging modules
- Remove nfd + numa-scheduler from prod clusters
- Enable p4d (A100) in us-west-1 (remove exclude_regions)
- Broaden NFD + taint-remover nodeSelector to p4d
- Broaden STARTUP_TAINTS applies_when to include p4d
- Cap A100 runners: 1-GPU=2, 2-GPU=2, 4-GPU=2, 8-GPU=1
- Set scheduler_name: numa-scheduler on 4-GPU A100 def
- Add cleanup-arc-staging.sh for teardown

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git reset --hard HEAD~1

ghstack-source-id: 1deedcc
Pull-Request: #741
@georgehong deployed to osdc-staging June 12, 2026 01:18 with GitHub Actions (Active)