Releases · aws/sagemaker-hyperpod-cli

08 Apr 23:19

mollyheamazon

v3.7.1

7974959

v3.7.1 Latest

New Instance Type Support

Add g7e instance types to HyperPod helm chart values (nvidia/EFA device plugins) (#380)
Add g7e instance types to Python constants and CLI (#385, #390)
Add g7e instance types to health-monitoring-agent node affinity (#381)
Add B300 MIG profiles to GPU operator ConfigMap (#396)
Add MIG profile support for ml.p6-b300.48xlarge (Blackwell Ultra) (#398)

Inference Operator

CRD updates: BYO certificate, RequestLimitsConfig, Custom Kubernetes support (#402)
Bump hyperpod-inference-operator subchart to v2.1.0 with image tag v3.1 (#402)

Enhancements

Support AWS_REGION env var, cluster context fallback, centralize boto3 client creation (#395)
Handle pagination in cluster stack listing (#394)
Require --instance-type when specifying accelerator resources (#393)
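
The pagination fix (#394) concerns listings that span more than one page of results. A minimal sketch of the general pattern follows, assuming the listing is backed by CloudFormation's `list_stacks`, which returns `StackSummaries` plus a `NextToken` while more results remain. The helper `list_all_stacks` and the stub client are hypothetical and are not taken from the CLI's source.

```python
from typing import Any, Dict, Iterator, List, Optional


def list_all_stacks(client: Any) -> Iterator[Dict[str, Any]]:
    """Yield every stack summary, following NextToken until it is absent."""
    token: Optional[str] = None
    while True:
        # Only pass NextToken once the service has handed one back.
        kwargs = {"NextToken": token} if token else {}
        page = client.list_stacks(**kwargs)
        yield from page.get("StackSummaries", [])
        token = page.get("NextToken")
        if not token:
            break


class StubCloudFormation:
    """Hypothetical stand-in for a boto3 CloudFormation client (two pages)."""

    def __init__(self) -> None:
        self._pages = {
            None: {
                "StackSummaries": [{"StackName": "a"}, {"StackName": "b"}],
                "NextToken": "t1",
            },
            "t1": {"StackSummaries": [{"StackName": "c"}]},
        }

    def list_stacks(self, NextToken: Optional[str] = None) -> Dict[str, Any]:
        return self._pages[NextToken]


stack_names: List[str] = [
    summary["StackName"] for summary in list_all_stacks(StubCloudFormation())
]
print(stack_names)
```

With a real boto3 client, `client.get_paginator("list_stacks")` gives the same behavior without a hand-written loop.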

Bug Fixes

Fix EFA field naming in PyTorch job template v1.1: efa_interfaces -> efa, efa_interfaces_limit -> efa_limit (#392)
Fix deep health check nodeSelector label to sagemaker.amazonaws.com/deep-health-check-status: Passed (#386)
Remove non-EFA instance types from EFA device plugin nodeAffinity to prevent CrashLoopBackOff (#389)
Add missing instance types and fix EFA/memory resource specs (#385)
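
For configurations written against the old field names, the rename in #392 could be applied along these lines. This is only a sketch: the two key mappings come from the release note, while the helper `migrate_efa_fields`, the flat dictionary shape, and the sample values are assumptions for illustration.

```python
from typing import Any, Dict

# Old-to-new field names, as stated in the release note for template v1.1.
EFA_FIELD_RENAMES = {
    "efa_interfaces": "efa",
    "efa_interfaces_limit": "efa_limit",
}


def migrate_efa_fields(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with old EFA keys renamed (hypothetical helper)."""
    migrated = dict(config)
    for old, new in EFA_FIELD_RENAMES.items():
        if old in migrated:
            value = migrated.pop(old)
            # Keep an explicitly set new-style value if both are present.
            migrated.setdefault(new, value)
    return migrated


old_config = {"job_name": "demo", "efa_interfaces": 4, "efa_interfaces_limit": 4}
new_config = migrate_efa_fields(old_config)
print(new_config)
```

The input dictionary is left unmodified, so the old and new forms can be compared before anything is written back.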

Health Monitoring Agent

Release Health Monitoring Agent 1.0.1434.0_1.0.388.0 (#388)

02 Mar 23:49

mujtaba1747

v3.7.0

49baa69

v3.7.0

v3.7.0 (2026-03-02)

Space CLI

Added list all functionality and documentation updates
Disabled traceback for cleaner error output

Inference Operator

Inference Operator AddOn with NodeAffinity support and version 3.0 update
Updated hyperpod-inference-operator to version 2.0.0 in HyperPodHelmChart
Added AddOn migration script and README

Enhancements

Monitoring & Observability

Emit metrics for CLI commands

Testing & Validation

Added unit tests for inference CRDs
Added CRD format check for inference

Dependencies & Versions

Updated GPU operator container toolkit version
Updated aws-efa-k8s-device-plugin version to 0.5.20

Configuration

Instance types CRD changes

Bug Fixes

Fixed syntax error in inferenceendpointconfigs by removing tab

27 Jan 22:36

mollyheamazon

v3.6.0

2a46ebd

v3.6.0

Features

Add EFA support in manifest for training jobs (#345)
Add end-to-end example documentation (#350)
Add 4 new HyperPod GA regions (ca-central-1, ap-southeast-3, ap-southeast-4, eu-south-2) (#360)

Enhancements

Update documentation for elastic training arguments (#343)
Upgrade Inference Operator helm chart (#346)
Update MIG config for GPU operator (#358)
Release Health Monitoring Agent 1.0.1249.0_1.0.359.0 with enhanced Nvidia timeout analysis and bug fixes (#361)

Bug Fixes

Fix canary test failures for GPU quota allocation integration tests (#356)
Fix region fallback logic for health-monitoring-agent image URIs (#360)
Remove command flag from init pytorch job integration test (#351)
Skip expensive integration tests to improve CI performance (#355)

03 Dec 18:07

mollyheamazon

v3.5.0

c64811d

Elastic Training Support for ReInvent Keynote 3

Adds new command-line arguments to the HyperPodTrainingOperator to support elastic training capabilities
- --elastic-replica-increment-step, --max-node-count, --elastic-graceful-shutdown-timeout-in-seconds, --elastic-scaling-timeout-in-seconds, --elastic-scale-up-snooze-time-in-seconds, --elastic-replica-discrete-values
Enables dynamic scaling of compute resources during training operations
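
To illustrate how these flags might be supplied together, the sketch below mirrors the flag names with `argparse`. Only the flag names come from the release note. The integer value types, the comma-separated format for the discrete values, and the sample values are assumptions, and this is not the CLI's actual parser.

```python
import argparse
from typing import List


def parse_int_list(text: str) -> List[int]:
    """Parse a comma-separated list of integers (the format is an assumption)."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser that mirrors the elastic training flags."""
    parser = argparse.ArgumentParser(prog="elastic-flags-sketch")
    parser.add_argument("--elastic-replica-increment-step", type=int)
    parser.add_argument("--max-node-count", type=int)
    parser.add_argument("--elastic-graceful-shutdown-timeout-in-seconds", type=int)
    parser.add_argument("--elastic-scaling-timeout-in-seconds", type=int)
    parser.add_argument("--elastic-scale-up-snooze-time-in-seconds", type=int)
    parser.add_argument("--elastic-replica-discrete-values", type=parse_int_list)
    return parser


args = build_parser().parse_args(
    ["--max-node-count", "8", "--elastic-replica-discrete-values", "2,4,8"]
)
print(args.max_node_count, args.elastic_replica_discrete_values)
```

Flags that are not passed default to `None` here, which leaves any default behavior to whatever consumes the parsed values.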

02 Dec 12:36

jam-jee

v2.2.0

9774c58

Hyperpod CLI V2 with Nova recipe support

21 Nov 09:11

mohamedzeidan2021

3.4.0

0eba08f

Parker CLI, Fractional GPU Feature

Added hp-devspace command set for ML dev environments
- New commands: create, list, get, update, delete, geturl for dev space management
- Support for namespace-based auth and resource isolation
- Added auth and get-config commands to check permissions and view default settings

Users can request partial GPU resources using MIG profiles instead of full GPUs
- Added --accelerator-partition-type, --accelerator-partition-count, --accelerator-partition-limit
- New list-accelerator-partition-type command to view available GPU partitions for instance types

30 Oct 20:47

mollyheamazon

v3.3.1

7233490

v3.3.1

Features

Describe cluster command
- Users can run hyp describe cluster to get more information about HyperPod clusters
Jinja template handling logic for inference and training
- Users can modify the Jinja template through the init experience for inference and training to add parameters supported by the CRD, allowing further CLI customization
Cluster creation template versioning
- Users can choose the CloudFormation template version through the cluster creation experience
KVCache and intelligent routing for HyperPod Inference
- The supported InferenceEndpointConfig CRD version is updated to v1
- KVCache and intelligent routing support is added in template version 1.1

24 Sep 19:51

rsareddy0329

v3.3.0

315f7ec

init experience Launch

Features

Init Experience
- Init, Validate, and Create JumpStart endpoint, Custom endpoint, and PyTorch Training Job with local configuration
Cluster management
- Bug fixes for cluster creation

10 Sep 19:23

papriwal

v3.2.2

162fb79

Bug fixes

Features

Fix for production canary failures caused by bad training job template.
New version for Health Monitoring Agent (1.0.790.0_1.0.266.0) with minor improvements and bug fixes.

28 Aug 00:14

rsareddy0329

v3.2.1

5a346e8

Bug Fixes

Bug Fixes in cluster creation
