ricardojacomini/Microsoft-HPC

Microsoft HPC and AI - Deployment Solutions and Technical Documentation

Comprehensive deployment guides and automation scripts for Microsoft Azure High Performance Computing (HPC) and Artificial Intelligence (AI) solutions

📝 Personal Notes Disclaimer
This document contains personal notes created by Ricardo S Jacomini - Azure HPC & AI SEE.
This is NOT an official Microsoft document and represents personal insights and experiences.

HPC-PACK Deployment

This folder contains resources to deploy an HPC Pack Cluster in Microsoft Azure.

Overview

The deployment sets up:

  • An HPC Pack Cluster tailored for Windows workloads.
  • A single head node configuration.
  • A new Active Directory Domain as part of the deployment process.

This setup is ideal for high-performance computing scenarios that require Windows-based infrastructure and centralized domain management.

Prerequisites

  • Azure subscription
  • Appropriate permissions to deploy resources and create Active Directory domains

Deployment

Follow the instructions in the deployment scripts or templates provided in this folder to initiate the setup.

Quick Start

📁 HPC-PACK - Main deployment folder containing all necessary scripts and templates

Azure Managed Lustre File System (AMLFS) Deployment

This repository provides comprehensive Bicep templates and automation scripts for deploying Azure Managed Lustre File Systems (AMLFS) with different security and feature configurations.

🚀 Choose Your Deployment Version

📋 Quick Comparison

| Feature | Basic Version | Managed Identity Version |
|---|---|---|
| 🎯 Use Case | Testing & Development | Production Workloads |
| 🔐 Authentication | User credentials | Managed Identity |
| 🛡️ RBAC Setup | Manual | Automatic |
| 📦 HSM Support | ❌ | ✅ Blob storage integration |
| 🗄️ Storage Container | ❌ | ✅ Private container |
| 🔒 Security Level | Basic NSG rules | Enhanced + Lustre ports |
| 🔄 Credential Management | Manual rotation | Automatic |
| 📊 Audit Trail | Limited | Full via managed identity |
| ⚙️ Complexity | Simple | Advanced |
| ⏱️ Setup Time | ~5 minutes | ~10 minutes |

📁 Available Solutions

🟦 Option 1: Basic Version

📖 Complete Documentation: README-basic.md

Perfect for: Development, Testing, Quick Prototyping

# Quick start - Basic version
.\AMLFS\scripts\Test-AMLFSZones.ps1 -ResourceGroup "aml-rsj" -Location "eastus"

📋 What you get:

  • ✅ Clean, minimal Bicep template
  • ✅ Automated zone testing
  • ✅ Basic network security
  • ✅ 8 TiB AMLFS Premium-250
  • ✅ Simple deployment process

📁 Files:

  • AMLFS/templates/infra-basic.bicep - Minimal template
  • AMLFS/scripts/Test-AMLFSZones.ps1 - Zone testing script
  • AMLFS/README-basic.md - Complete documentation

🟩 Option 2: Managed Identity Version

📖 Complete Documentation: README-managed-identity.md

Perfect for: Production, Enterprise, Security-First Deployments

# Quick start - Managed identity version
.\AMLFS\scripts\Test-AMLFSZones-ManagedIdentity.ps1 -ResourceGroup "aml-rsj-managed-identity" -Location "eastus"

🏆 What you get:

  • ✅ User-Assigned Managed Identity
  • ✅ Automatic RBAC role assignments
  • ✅ HSM (Hierarchical Storage Management)
  • ✅ Private blob container
  • ✅ Enhanced security rules
  • ✅ Production-ready configuration

📁 Files:

  • AMLFS/templates/infra-managed-identity.bicep - Full-featured template
  • AMLFS/scripts/Test-AMLFSZones-ManagedIdentity.ps1 - Advanced testing script
  • AMLFS/README-managed-identity.md - Complete documentation

🎯 Decision Guide

Choose Basic Version if:

  • 🧪 You're testing or developing AMLFS solutions
  • ⚡ You need quick deployment with minimal configuration
  • 🎓 You're learning AMLFS concepts
  • 💰 You want minimal resource overhead
  • 🔧 You prefer manual control over security settings

👉 Go to Basic Documentation

Choose Managed Identity Version if:

  • 🏢 You're deploying for production workloads
  • 🔐 You need enterprise-grade security
  • 📊 You require audit trails and compliance
  • 🗂️ You want HSM data tiering capabilities
  • 🤖 You prefer automated credential management
  • 👥 You're working in multi-tenant environments

👉 Go to Managed Identity Documentation


📚 Common Prerequisites

Both versions require:

  1. Azure CLI installed and configured
  2. PowerShell 5.1+ (for Windows automation scripts)
  3. Azure Login: az login
  4. Proper permissions: Contributor role (Basic); the Managed Identity version additionally needs User Access Administrator (or Owner) so that the template can create RBAC role assignments
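
The prerequisites above can be checked from a shell before running either script. This is a minimal sketch and not part of the repository: the function names `have_cmd` and `check_prereqs` are hypothetical, and it assumes a bash-like shell rather than PowerShell. It relies only on `az account show`, which fails when there is no active login session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints what is missing and returns non-zero if anything is.
check_prereqs() {
  local missing=0
  if ! have_cmd az; then
    echo "Azure CLI (az) not found on PATH" >&2
    missing=1
  elif ! az account show >/dev/null 2>&1; then
    echo "No active Azure session; run: az login" >&2
    missing=1
  fi
  # The repository scripts are .ps1 files, so some PowerShell host is needed.
  if ! have_cmd pwsh && ! have_cmd powershell; then
    echo "PowerShell not found (needed for the .ps1 scripts)" >&2
    missing=1
  fi
  return "$missing"
}
```

A typical call would be `check_prereqs && echo "Ready to deploy"`. Role assignments are not verified here; that would need a separate permissions check.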

🔍 Pre-Deployment Checks

Before using either version, run these commands to verify readiness:

# Check AMLFS quota and availability
az rest --method GET --url "https://management.azure.com/subscriptions/$(az account show --query id -o tsv)/providers/Microsoft.StorageCache/locations/eastus/usages?api-version=2023-05-01"

# Verify StorageCache provider is registered
az provider list --query "[?namespace=='Microsoft.StorageCache'].{Namespace:namespace, State:registrationState}" -o table
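
If the provider turns out not to be registered, it can be registered before deploying. The sketch below is one way to do that, assuming a bash-like shell; the function names `needs_registration` and `ensure_storagecache_registered` are hypothetical and do not come from the repository. It uses `az provider show` and `az provider register`, and treats any state other than `Registered` as needing registration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given state means registration is still needed.
needs_registration() {
  [ "$1" != "Registered" ]
}

# Hypothetical helper: query the provider state and register only when required.
ensure_storagecache_registered() {
  local state
  state="$(az provider show --namespace Microsoft.StorageCache \
            --query registrationState -o tsv)"
  if needs_registration "$state"; then
    echo "Microsoft.StorageCache state is '$state'; registering..."
    # --wait blocks until the registration has completed.
    az provider register --namespace Microsoft.StorageCache --wait
  else
    echo "Microsoft.StorageCache is already registered."
  fi
}
```

Calling `ensure_storagecache_registered` after `az login` runs the check against the currently selected subscription.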

🚀 Quick Start Commands

Basic Version:

# Test and deploy basic version
.\AMLFS\scripts\Test-AMLFSZones.ps1 -ResourceGroup "aml-rsj" -Location "eastus"

📖 Full Basic Guide →

Managed Identity Version:

# Test and deploy managed identity version
$resourceGroup = "amlfs-managed-identity-$(Get-Date -Format 'yyyyMMdd-HHmm')"
.\AMLFS\scripts\Test-AMLFSZones-ManagedIdentity.ps1 -ResourceGroup $resourceGroup -Location "eastus"

📖 Full Managed Identity Guide →
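
The managed identity quick start uses a fresh, timestamped resource group name for each run. For readers working from bash instead of PowerShell, the same naming pattern can be sketched as follows. The helpers `make_rg_name` and `valid_rg_name` are hypothetical, and the validation is a conservative ASCII-only approximation of Azure's resource group naming rules (1 to 90 characters, no trailing period), not an official check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<prefix>-YYYYMMDD-HHMM", mirroring the Get-Date format above.
make_rg_name() {
  local prefix="${1:-amlfs-managed-identity}"
  printf '%s-%s\n' "$prefix" "$(date +%Y%m%d-%H%M)"
}

# Hypothetical helper: conservative check of a resource group name.
valid_rg_name() {
  local name="$1" leftover
  # Length must be between 1 and 90 characters.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 90 ] || return 1
  # A trailing period is not allowed.
  case "$name" in
    *.) return 1 ;;
  esac
  # Delete every allowed character; anything left over is disallowed.
  leftover="$(printf '%s' "$name" | tr -d 'A-Za-z0-9_.()-')"
  [ -z "$leftover" ]
}
```

For example, `rg="$(make_rg_name)"; valid_rg_name "$rg" && az group create --name "$rg" --location eastus` would create the group only when the generated name passes the check.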

🛠️ Repository Structure

📂 AMLFS Deployment Repository
├── 📄 README.md                           # This overview file
├── 📄 AMLFS/README-basic.md               # Basic version documentation
├── 📄 AMLFS/README-managed-identity.md    # Managed identity version documentation
├── 📁 AMLFS/templates/                    # Infrastructure templates
│   ├── 🧩 infra-basic.bicep                   # Basic Bicep template
│   ├── 🧩 infra-managed-identity.bicep        # Managed identity Bicep template
│   ├── 🧩 infra.bicep                         # Legacy template
│   └── infra-managed-identity.json            # Parameters file
├── 📁 AMLFS/scripts/                      # Automation scripts
│   ├── 🤖 Test-AMLFSZones.ps1                # Basic version zone testing
│   ├── 🤖 Test-AMLFSZones-ManagedIdentity.ps1 # Managed identity zone testing
│   ├── 🤖 next-steps.ps1                     # Post-deployment automation
│   ├── 🤖 create-vm.ps1                      # VM creation with fallbacks
│   ├── 🤖 Check-ManagedIdentityPermissions.ps1 # Permission validation
│   ├── 🧩 kernel-downgrade.sh                # Lustre client installation
│   └── 📋 Various utility scripts
└── 📁 AMLFS/pictures/                     # Documentation images
    └── 🖼️ diagram.png                        # Architecture diagram

✅ Verified Status

Both solutions have been tested and validated:

  • ✅ Zone Testing: All zones (1, 2, 3) verified available in East US
  • ✅ Deployment Automation: Scripts working correctly
  • ✅ Template Validation: Bicep templates compile successfully
  • ✅ Documentation: Complete setup and troubleshooting guides
  • ✅ Managed Identity Deployment: Successfully deployed with fresh resource group pattern
  • ✅ HSM Post-Deployment: Process documented for production workloads

🆘 Getting Help

  • Basic Version Issues: See README-basic.md → Troubleshooting section
  • Managed Identity Issues: See README-managed-identity.md → Troubleshooting section
  • Common Problems: BCP081 warnings (Bicep reporting that it has no type definitions for a resource type or API version) are expected and safe to ignore
  • Capacity Issues: Use the automated zone testing scripts to find zones with available capacity

🎉 Success Stories

Both templates provide:

  • Automated zone availability testing - No more guesswork on capacity
  • Flexible deployment options - Choose your zone (1, 2, or 3)
  • Complete documentation - Step-by-step guides with examples
  • Production-ready code - Tested and validated templates

🎯 When to Choose Lustre for Your Workloads

You'll want to use Lustre when your workloads demand extreme I/O performance, massive parallelism, and low-latency access to large datasets, especially in high-performance computing (HPC) and AI scenarios.

🚀 Ideal Use Cases for Lustre

| Scenario | Why Lustre Excels |
|---|---|
| MPI-based HPC workloads | Parallel file access with RDMA support |
| AI/ML training | Fast access to large datasets, especially with GPUs |
| Genomics & Bioinformatics | Handles millions of small files with high throughput |
| Seismic & CFD simulations | Sustains multi-GB/s reads/writes across compute nodes |
| Financial modeling | Low-latency access for time-sensitive calculations |
| Video rendering & processing | High bandwidth for large media files |

💡 Key Insight: Lustre is designed to keep up with your compute, not slow it down. It's used in supercomputers like Frontier and Fugaku, and powers many of the world's top 100 HPC clusters.

🧠 Lustre's Decision Matrix

Since you're already working with MPI, RDMA, and AMLFS:

  • ✅ Use Lustre (via AMLFS) when you need parallel I/O across many nodes
  • 📂 Stick with Managed Disks or NFS for simpler, single-node workloads
  • 💰 Consider Blob integration with AMLFS to tier cold data cost-effectively


🌟 Start with the version that matches your needs, and you'll have AMLFS running in minutes!
