Releases: NVIDIA/cloudai
v1.4.beta14
What's Changed
- Small housekeeping updates by @amaslenn in #663
- nemo recipes refactor by @malay-nagda in #633
New Contributors
- @malay-nagda made their first contribution in #633
Full Changelog: v1.4.beta13...v1.4.beta14
v1.4.beta13
What's Changed
- Configure reports via scenario config by @amaslenn in #661
- Handle CancelledError gracefully during job cleanup by @TaekyungHeo in #662
Full Changelog: v1.4.beta12...v1.4.beta13
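For readers unfamiliar with the CancelledError item above, here is a minimal sketch of the general asyncio pattern for tolerating cancellation during cleanup. It is not the code from #662; `run_job`, `cleanup_jobs`, and the job names are hypothetical and exist only for illustration.

```python
import asyncio


async def run_job(name: str) -> None:
    """Hypothetical stand-in for a long-running job."""
    await asyncio.sleep(3600)


async def cleanup_jobs(tasks: list[asyncio.Task]) -> list[str]:
    """Cancel outstanding tasks and wait for them without re-raising CancelledError."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each CancelledError as a result
    # instead of letting it propagate out of the cleanup routine.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    cancelled = []
    for task, result in zip(tasks, results):
        if isinstance(result, asyncio.CancelledError):
            cancelled.append(task.get_name())
    return cancelled


async def main() -> list[str]:
    tasks = [asyncio.create_task(run_job(n), name=n) for n in ("job-a", "job-b")]
    await asyncio.sleep(0)  # yield once so the tasks actually start
    return await cleanup_jobs(tasks)


cancelled_jobs = asyncio.run(main())
print(cancelled_jobs)
```

Collecting the exceptions rather than catching them one by one keeps the cleanup path free of tracebacks while still recording which jobs were cancelled.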
v1.4.beta12
What's Changed
- Comparison report for NCCL workloads by @amaslenn in #656
- Support explicit node assignment for prefill and decode workers by @TaekyungHeo in #647
Full Changelog: v1.4.beta11...v1.4.beta12
v1.4.beta11
What's Changed
- Support for DeepSeekR1 model with SGLang / AI Dynamo by @TaekyungHeo in #641
- Support mounting any JSON files for --dynamo-deepep-config by @TaekyungHeo in #650
- Set tp-size and dp-size from args if provided, else use total_gpus by @TaekyungHeo in #649
- Add environment validation to startup sequence by @TaekyungHeo in #651
- Follow-up for PR641 (Support for DeepSeekR1 model with SGLang / AI Dynamo) by @TaekyungHeo in #653
- Reorder the functions in ai_dynamo.sh for improved maintainability by @TaekyungHeo in #654
- Refactor GPU count to use _gpus_per_node in vllm and env validation by @TaekyungHeo in #657
- Mount huggingface_home_container_path unconditionally by @TaekyungHeo in #655
- Refactor nodelist validation to check DYNAMO_NODELIST only if both args empty by @TaekyungHeo in #658
Full Changelog: v1.4.beta10...v1.4.beta11
v1.4.beta10
What's Changed
- Preserve installables' state during apply_params_set() by @amaslenn in #643
- Control which env vars dumped for per-rand evaluation by @amaslenn in #642
- Align extra_env_vars definition in test and scenario by @amaslenn in #644
- Update USER_GUIDE.md by @TaekyungHeo in #646
- Add latency metric reporting for NCCL by @amaslenn in #645
Full Changelog: v1.4.beta9...v1.4.beta10
v1.4.beta9
What's Changed
- Updates for SlurmContainer workload by @amaslenn in #638
- Handle missing tests gracefully by adding MissingTestError to avoid backtrace by @TaekyungHeo in #640
- Clean up src/cloudai/workloads/ai_dynamo/ai_dynamo.sh by @TaekyungHeo in #639
Full Changelog: v1.4.beta8...v1.4.beta9
v1.4.beta8
What's Changed
- Add multi-worker-per-node GPU slicing support with dynamic allocation by @TaekyungHeo in #636
- Log mapping between AI Dynamo nodes and roles by @TaekyungHeo in #617
Full Changelog: v1.4.beta7...v1.4.beta8
v1.4.beta7
What's Changed
- Handle multi-section CSV format in AI Dynamo report generation by @TaekyungHeo in #620
Full Changelog: v1.4.beta6...v1.4.beta7
v1.4.beta6
What's Changed
- Improve prepare_output_dir error handling for permissions and read-only fs (continued) by @TaekyungHeo in #631
- Add GitRepo support to KubernetesInstaller with install/uninstall logic by @TaekyungHeo in #634
- Use a shell script as the entry point for AI Dynamo by @TaekyungHeo in #615
Full Changelog: v1.4.beta5...v1.4.beta6
v1.4.beta5
What's Changed
- Improve prepare_output_dir error handling for permissions and read-only fs by @TaekyungHeo in #629
Full Changelog: v1.4.beta4...v1.4.beta5