Comprehensive deployment guides and automation scripts for Microsoft Azure High Performance Computing (HPC) and Artificial Intelligence (AI) solutions
Personal Notes Disclaimer
This document contains personal notes created by Ricardo S Jacomini - Azure HPC & AI SEE.
This is NOT an official Microsoft document and represents personal insights and experiences.
This folder contains resources to deploy an HPC Pack Cluster in Microsoft Azure.
The deployment sets up:
- An HPC Pack Cluster tailored for Windows workloads.
- A single head node configuration.
- A new Active Directory Domain as part of the deployment process.
This setup is ideal for high-performance computing scenarios that require Windows-based infrastructure and centralized domain management.
- Azure subscription
- Appropriate permissions to deploy resources and create Active Directory domains
Follow the instructions in the deployment scripts or templates provided in this folder to initiate the setup.
HPC-PACK - Main deployment folder containing all necessary scripts and templates
This repository provides comprehensive Bicep templates and automation scripts for deploying Azure Managed Lustre File Systems (AMLFS) with different security and feature configurations.
| Feature | Basic Version | Managed Identity Version |
|---|---|---|
| Use Case | Testing & Development | Production Workloads |
| Authentication | User credentials | Managed Identity |
| RBAC Setup | Manual | Automatic |
| HSM Support | No | Yes (Blob storage integration) |
| Storage Container | No | Yes (private container) |
| Security Level | Basic NSG rules | Enhanced + Lustre ports |
| Credential Management | Manual rotation | Automatic |
| Audit Trail | Limited | Full via managed identity |
| Complexity | Simple | Advanced |
| Setup Time | ~5 minutes | ~10 minutes |
Complete Documentation: README-basic.md
Perfect for: Development, Testing, Quick Prototyping
# Quick start - Basic version
.\AMLFS\scripts\Test-AMLFSZones.ps1 -ResourceGroup "aml-rsj" -Location "eastus"
What you get:
- Clean, minimal Bicep template
- Automated zone testing
- Basic network security
- 8 TiB AMLFS Premium-250
- Simple deployment process
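As a rough sizing aid for the 8 TiB Premium-250 file system listed above, the expected aggregate throughput can be estimated with simple arithmetic. This is a minimal sketch, assuming the number in the SKU name (250) is the throughput in MB/s provisioned per TiB of capacity; the function name `amlfs_throughput_mbps` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Estimate aggregate throughput for an AMLFS file system.
# Assumption: the trailing number in the SKU name (250 here) is the
# throughput in MB/s provisioned per TiB of capacity.

# Usage: amlfs_throughput_mbps SIZE_TIB MBPS_PER_TIB
amlfs_throughput_mbps() {
  size_tib="$1"
  mbps_per_tib="$2"
  # Integer arithmetic only; both inputs are whole numbers.
  echo $((size_tib * mbps_per_tib))
}

# 8 TiB at the Premium-250 tier
amlfs_throughput_mbps 8 250   # prints 2000
```

Under that assumption, the basic deployment would offer about 2000 MB/s in total, which is worth comparing against the combined network bandwidth of the client VMs.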
Files:
- AMLFS/templates/infra-basic.bicep - Minimal template
- AMLFS/scripts/Test-AMLFSZones.ps1 - Zone testing script
- AMLFS/README-basic.md - Complete documentation
Complete Documentation: README-managed-identity.md
Perfect for: Production, Enterprise, Security-First Deployments
# Quick start - Managed identity version
.\AMLFS\scripts\Test-AMLFSZones-ManagedIdentity.ps1 -ResourceGroup "aml-rsj-managed-identity" -Location "eastus"
What you get:
- User-Assigned Managed Identity
- Automatic RBAC role assignments
- HSM (Hierarchical Storage Management)
- Private blob container
- Enhanced security rules
- Production-ready configuration
Files:
- AMLFS/templates/infra-managed-identity.bicep - Full-featured template
- AMLFS/scripts/Test-AMLFSZones-ManagedIdentity.ps1 - Advanced testing script
- AMLFS/README-managed-identity.md - Complete documentation
- You're testing or developing AMLFS solutions
- You need quick deployment with minimal configuration
- You're learning AMLFS concepts
- You want minimal resource overhead
- You prefer manual control over security settings
Go to Basic Documentation
- You're deploying for production workloads
- You need enterprise-grade security
- You require audit trails and compliance
- You want HSM data tiering capabilities
- You prefer automated credential management
- You're working in multi-tenant environments
Go to Managed Identity Documentation
Both versions require:
- Azure CLI installed and configured
- PowerShell 5.1+ (for Windows automation scripts)
- Azure login: az login
- Proper permissions: Contributor role (Basic) or User Access Administrator (Managed Identity)
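The prerequisites above can be checked from a shell before running either deployment script. The following is a minimal sketch, not part of the repository: the function names (`have_cmd`, `provider_state`, `check_prereqs`) are hypothetical, and it assumes a POSIX shell with the Azure CLI on the PATH. It only covers the CLI, the login, and the Microsoft.StorageCache provider registration; it does not verify role assignments.

```shell
#!/bin/sh
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the registration state of a resource provider (needs an active login).
provider_state() {
  az provider show --namespace "$1" --query registrationState -o tsv
}

# Check CLI presence, login, and provider registration, in that order.
check_prereqs() {
  have_cmd az || { echo "Azure CLI not found" >&2; return 1; }
  az account show >/dev/null 2>&1 || {
    echo "Not logged in; run: az login" >&2
    return 1
  }
  state=$(provider_state Microsoft.StorageCache)
  [ "$state" = "Registered" ] || {
    echo "Microsoft.StorageCache is '$state'; run:" >&2
    echo "  az provider register --namespace Microsoft.StorageCache" >&2
    return 1
  }
  echo "Prerequisites look OK"
}
```

After sourcing the file, `check_prereqs` stops at the first failed check and prints a hint on how to fix it.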
Before using either version, run these commands to verify readiness:
# Check AMLFS quota and availability
az rest --method GET --url "https://management.azure.com/subscriptions/$(az account show --query id -o tsv)/providers/Microsoft.StorageCache/locations/eastus/usages?api-version=2023-05-01"
# Verify StorageCache provider is registered
az provider list --query "[?namespace=='Microsoft.StorageCache'].{Namespace:namespace, State:registrationState}" -o table
# Test and deploy basic version
.\AMLFS\scripts\Test-AMLFSZones.ps1 -ResourceGroup "aml-rsj" -Location "eastus"
Full Basic Guide: README-basic.md
# Test and deploy managed identity version
$resourceGroup = "amlfs-managed-identity-$(Get-Date -Format 'yyyyMMdd-HHmm')"
.\AMLFS\scripts\Test-AMLFSZones-ManagedIdentity.ps1 -ResourceGroup $resourceGroup -Location "eastus"
Full Managed Identity Guide: README-managed-identity.md
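The timestamped resource group name used above keeps each managed identity deployment in a fresh group. If you prefer to create the group yourself from a POSIX shell, the same naming pattern can be reproduced with `date`. This is only a sketch under that assumption; `make_rg_name` and `create_fresh_rg` are hypothetical helpers, not scripts shipped in this repository.

```shell
#!/bin/sh
# Usage: make_rg_name PREFIX
# Prints PREFIX-YYYYMMDD-HHMM using local time, mirroring the
# PowerShell format string 'yyyyMMdd-HHmm'.
make_rg_name() {
  printf '%s-%s\n' "$1" "$(date +%Y%m%d-%H%M)"
}

# Usage: create_fresh_rg PREFIX LOCATION
# Creates the resource group and prints its name on success.
# Requires the Azure CLI and an active login.
create_fresh_rg() {
  rg=$(make_rg_name "$1")
  az group create --name "$rg" --location "$2" --output none && echo "$rg"
}

# Show the name that would be used right now.
make_rg_name amlfs-managed-identity
```

Because the name only has minute resolution, two runs within the same minute produce the same group name.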
AMLFS Deployment Repository
├── README.md                                 # This overview file
├── AMLFS/README-basic.md                     # Basic version documentation
├── AMLFS/README-managed-identity.md          # Managed identity version documentation
├── AMLFS/templates/                          # Infrastructure templates
│   ├── infra-basic.bicep                     # Basic Bicep template
│   ├── infra-managed-identity.bicep          # Managed identity Bicep template
│   ├── infra.bicep                           # Legacy template
│   └── infra-managed-identity.json           # Parameters file
├── AMLFS/scripts/                            # Automation scripts
│   ├── Test-AMLFSZones.ps1                   # Basic version zone testing
│   ├── Test-AMLFSZones-ManagedIdentity.ps1   # Managed identity zone testing
│   ├── next-steps.ps1                        # Post-deployment automation
│   ├── create-vm.ps1                         # VM creation with fallbacks
│   ├── Check-ManagedIdentityPermissions.ps1  # Permission validation
│   ├── kernel-downgrade.sh                   # Lustre client installation
│   └── Various utility scripts
└── AMLFS/pictures/                           # Documentation images
    └── diagram.png                           # Architecture diagram
- README-basic.md - Complete guide for basic AMLFS deployment
- README-managed-identity.md - Complete guide for managed identity deployment
Both solutions have been tested and validated:
- Zone Testing: All zones (1, 2, 3) verified available in East US
- Deployment Automation: Scripts working correctly
- Template Validation: Bicep templates compile successfully
- Documentation: Complete setup and troubleshooting guides
- Managed Identity Deployment: Successfully deployed with fresh resource group pattern
- HSM Post-Deployment: Process documented for production workloads
- Basic Version Issues: See the Troubleshooting section of README-basic.md
- Managed Identity Issues: See the Troubleshooting section of README-managed-identity.md
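- Common Problems: BCP081 warnings (Bicep has no type definitions for the resource type or API version, so it cannot validate properties before deployment) are expected and safe to ignore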
- Capacity Issues: Use the automated zone testing to find available zones
Both templates provide:
- Automated zone availability testing - No more guesswork on capacity
- Flexible deployment options - Choose your zone (1, 2, or 3)
- Complete documentation - Step-by-step guides with examples
- Production-ready code - Tested and validated templates
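To illustrate the idea behind automated zone testing, the sketch below tries zones 1, 2, and 3 in order and reports the first one that a probe accepts. It is not the logic of Test-AMLFSZones.ps1, which may work differently. The parameter name `availabilityZone` and the `RESOURCE_GROUP` variable are assumptions, and template validation alone may not reveal capacity shortages that only appear at deployment time.

```shell
#!/bin/sh
# Usage: first_available_zone PROBE
# Calls PROBE with zone 1, 2, then 3 and prints the first zone for
# which PROBE succeeds. Returns non-zero if every zone is rejected.
first_available_zone() {
  probe="$1"
  for zone in 1 2 3; do
    if "$probe" "$zone"; then
      echo "$zone"
      return 0
    fi
  done
  return 1
}

# Example probe: validate the basic template against one zone.
# Assumptions: RESOURCE_GROUP is set, and the template exposes a
# parameter named availabilityZone (hypothetical name).
probe_with_az() {
  az deployment group validate \
    --resource-group "$RESOURCE_GROUP" \
    --template-file AMLFS/templates/infra-basic.bicep \
    --parameters availabilityZone="$1" >/dev/null 2>&1
}
```

With the Azure CLI logged in, `first_available_zone probe_with_az` prints the chosen zone. Passing the probe as an argument keeps the loop testable with a stub function.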
You'll want to use Lustre when your workloads demand extreme I/O performance, massive parallelism, and low-latency access to large datasets, especially in high-performance computing (HPC) and AI scenarios.
| Scenario | Why Lustre Excels |
|---|---|
| MPI-based HPC workloads | Parallel file access with RDMA support |
| AI/ML training | Fast access to large datasets, especially with GPUs |
| Genomics & Bioinformatics | Handles millions of small files with high throughput |
| Seismic & CFD simulations | Sustains multi-GB/s reads/writes across compute nodes |
| Financial modeling | Low-latency access for time-sensitive calculations |
| Video rendering & processing | High bandwidth for large media files |
Key Insight: Lustre is designed to keep up with your compute, not slow it down. It's used in supercomputers like Frontier and Fugaku, and powers many of the world's top 100 HPC clusters.
Since you're already working with MPI, RDMA, and AMLFS:
- Use Lustre (via AMLFS) when you need parallel I/O across many nodes
- Stick with Managed Disks or NFS for simpler, single-node workloads
- Consider Blob integration with AMLFS to tier cold data cost-effectively
- Azure Managed Lustre File System Documentation - Official Azure AMLFS documentation
- HPC Pack Documentation - Microsoft HPC Pack official guide
- Azure Bicep Documentation - Infrastructure as Code with Bicep
- Azure RBAC Documentation - Role-Based Access Control guide
- Lustre.org - Official Lustre project website
- Lustre Operations Manual - Comprehensive Lustre administration guide
- OpenSFS Foundation - Open Scalable File Systems community
- Top500 Supercomputers - List of world's fastest supercomputers
- Exascale Computing Project - US Department of Energy HPC initiative
- OpenMPI Documentation - Message Passing Interface implementation
- Azure Well-Architected Framework - Design principles for Azure solutions
- Azure HPC Architecture - HPC patterns and practices
- Azure Storage Performance Guide - Storage optimization strategies
- Azure HPC Tech Community - Microsoft Tech Community for HPC
- Stack Overflow - Azure HPC - Community Q&A for Azure HPC
- GitHub - Azure HPC Examples - Official Azure HPC samples and templates
Start with the version that matches your needs, and you'll have AMLFS running in minutes!