Releases: NVIDIA/cloudai
v1.3.beta29
What's Changed
- Handle commas in env var values for NemoLauncher by @amaslenn in #591
- Create CmdGenStrategy per usage by @amaslenn in #596
- Require docker image for NCCL tests to be explicitly set in config by @amaslenn in #597
- Rely on member test run object instead of args by @amaslenn in #598
Full Changelog: v1.3.beta28...v1.3.beta29
v1.3.beta28
What's Changed
- Avoid confusing post_test/pre_test folder structure by @amaslenn in #592
- Remove default_cmd_args field from TestTemplateStrategy by @amaslenn in #594
- Add AI Dynamo by @TaekyungHeo in #519
- Enable NCCL w/ K8S SPCx by @TaekyungHeo in #579
Full Changelog: v1.3.beta27...v1.3.beta28
v1.3.beta27
What's Changed
- Silently skip NIXL summary generation if no NIXL tests by @amaslenn in #587
- Llama31_405b by @srivatsankrishnan in #582
- Merge JobIdRetrieval functionality into respective runners by @amaslenn in #588
- Re-work job status fetching by @amaslenn in #589
- Update UCC configs by @amaslenn in #590
Full Changelog: v1.3.beta26...v1.3.beta27
v1.3.beta26
What's Changed
- Update regex to correctly extract full GPU type names including suffixes and variants by @TaekyungHeo in #578
- Fix missing k8s import by using lazy.k8s in MPIJob delete call by @TaekyungHeo in #580
- Align method with BaseRunner by renaming to on_job_completion and removing async by @TaekyungHeo in #581
- Add DockerImage support to Kubernetes installer methods by @TaekyungHeo in #583
- Match json_gen_strategy implementation to command_gen_strategy by @TaekyungHeo in #585
- Fix nodes allocation from the same group by @amaslenn in #586
- Guard on_job_submit with null check for _command_gen_strategy access by @TaekyungHeo in #584
Full Changelog: v1.3.beta25...v1.3.beta26
v1.3.beta25
What's Changed
- Add BashCmd workload by @amaslenn in #570
- Correctly load and save tdef as part of TestRunDetails by @amaslenn in #574
- Make NIXL work in single-sbatch mode by @amaslenn in #575
- Re-work Slurm node status update by @amaslenn in #577
- Add NIXL summary report by @amaslenn in #576
Full Changelog: v1.3.beta24...v1.3.beta25
v1.3.beta24
v1.3.beta23
What's Changed
- Add configurable reward functions to CloudAIGym by @TaekyungHeo in #566
Full Changelog: v1.3.beta22...v1.3.beta23
v1.3.beta22
v1.3.beta21
What's Changed
- Fix path to jinja template by @amaslenn in #562
- Generate reports for DSE jobs by @TaekyungHeo in #563
- Do not use --copies for venv creation by @amaslenn in #565
Full Changelog: v1.3.beta20...v1.3.beta21
v1.3.beta20
What's Changed
- Migrate to modern datetime interface by @emmanuel-ferdman in #561
- Add single sbatch runner for Slurm systems by @amaslenn in #555
- Fix DeepSeekR1 inference report by @TaekyungHeo in #560
- Rework imports by @amaslenn in #559
New Contributors
- @emmanuel-ferdman made their first contribution in #561
Full Changelog: v1.3.beta19...v1.3.beta20