feat: add opensource vLLM deployment pathway with container apps and gpu #184

Open
mishraomp wants to merge 7 commits into main from try/gemma4

@mishraomp mishraomp commented Apr 11, 2026

AI Hub Infra Changes

Summary: 2 to add, 20 to change, 0 to destroy (across 4 stack(s))

Terraform will perform the following actions:

  # module.foundry_project["ai-hub-admin"].azapi_resource.rai_policy["gpt-5.1-chat"] will be updated in-place
  ~ resource "azapi_resource" "rai_policy" {
      ~ body                      = {
          ~ properties = {
              ~ contentFilters = [
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                ]
                # (2 unchanged attributes hidden)
            }
        }
        id                        = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.CognitiveServices/accounts/ai-services-hub-test-foundry/raiPolicies/ai-hub-admin-gpt-5-1-chat-filter"
        name                      = "ai-hub-admin-gpt-5-1-chat-filter"
      ~ output                    = {} -> (known after apply)
        # (7 unchanged attributes hidden)
    }

  # module.foundry_project["gcpe-media-monitoring"].azapi_resource.rai_policy["gpt-4.1"] will be updated in-place
  ~ resource "azapi_resource" "rai_policy" {
      ~ body                      = {
          ~ properties = {
              ~ contentFilters = [
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                ]
                # (2 unchanged attributes hidden)
            }
        }
        id                        = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.CognitiveServices/accounts/ai-services-hub-test-foundry/raiPolicies/gcpe-media-monitoring-gpt-4-1-filter"
        name                      = "gcpe-media-monitoring-gpt-4-1-filter"
      ~ output                    = {} -> (known after apply)
        # (7 unchanged attributes hidden)
    }

  # module.foundry_project["gcpe-media-monitoring"].azapi_resource.rai_policy["gpt-4.1-mini"] will be updated in-place
  ~ resource "azapi_resource" "rai_policy" {
      ~ body                      = {
          ~ properties = {
              ~ contentFilters = [
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                ]
                # (2 unchanged attributes hidden)
            }
        }
        id                        = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.CognitiveServices/accounts/ai-services-hub-test-foundry/raiPolicies/gcpe-media-monitoring-gpt-4-1-mini-filter"
        name                      = "gcpe-media-monitoring-gpt-4-1-mini-filter"
      ~ output                    = {} -> (known after apply)
        # (7 unchanged attributes hidden)
    }

  # module.foundry_project["gcpe-media-monitoring"].azapi_resource.rai_policy["gpt-4.1-nano"] will be updated in-place
  ~ resource "azapi_resource" "rai_policy" {
      ~ body                      = {
          ~ properties = {
              ~ contentFilters = [
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"
                        # (4 unchanged attributes hidden)
                    },
                ]
                # (2 unchanged attributes hidden)
            }
        }
        id                        = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.CognitiveServices/accounts/ai-services-hub-test-foundry/raiPolicies/gcpe-media-monitoring-gpt-4-1-nano-filter"
        name                      = "gcpe-media-monitoring-gpt-4-1-nano-filter"
      ~ output                    = {} -> (known after apply)
        # (7 unchanged attributes hidden)
    }

  # module.foundry_project["gcpe-media-monitoring"].azapi_resource.rai_policy["gpt-4o"] will be updated in-place
  ~ resource "azapi_resource" "rai_policy" {
      ~ body                      = {
          ~ properties = {
              ~ contentFilters = [
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "selfharm" -> "hate"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Hate" -> "selfharm"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Sexual" -> "sexual"
                        # (4 unchanged attributes hidden)
                    },
                  ~ {
                      ~ name              = "Violence" -> "violence"

Updated by CI — plan against test environment (run #358) at 2026-04-13 03:22:30 UTC.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot commented Apr 12, 2026

Portal Preview

Preview URL: https://pr184-ai-hub-onboarding.azurewebsites.net
App Service: pr184-ai-hub-onboarding
Mode: Dev auto-login (no Keycloak)

Deployed by run #358 — destroyed automatically on PR close.

@mishraomp mishraomp self-assigned this Apr 12, 2026
@mishraomp added the enhancement (New feature or request) and apim (Issues or Pull Requests related Azure APIM) labels Apr 12, 2026
Implements azureml_registry as a second model source option in the
vllm-service module, complementing the default huggingface source.

Changes:
- modules/vllm-service/variables.tf: add model_source enum var and
  azureml_registry nullable object var; update descriptions
- modules/vllm-service/main.tf:
  - new locals: use_azureml_registry_source, azureml_download_parent,
    azureml_model_root, vllm_model_arg, azureml_init_image
  - azurerm_user_assigned_identity.azureml_downloader (count-conditional)
    created before Container App to avoid RBAC propagation race
  - null_resource.build_azureml_init_image builds Dockerfile.azureml-init
    into module ACR; filemd5 triggers on Dockerfile + script changes
  - dynamic identity block on Container App for UA identity
  - vllm_model_arg replaces hardcoded var.model_id in args; adds
    --served-model-name for azureml_registry source to keep API name stable
  - HF secret, HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE, HF_TOKEN env gated on
    model_source == huggingface
  - dynamic init_container block: downloads model via azureml-init image,
    mounts model-cache volume, passes UA client_id for managed identity login
  - check block validates azureml_registry != null when source is azureml_registry
- modules/vllm-service/Dockerfile.azureml-init: pins azure-cli:2.67.0,
  pre-bakes az ml extension, CMDs azureml-init.sh
- modules/vllm-service/azureml-init.sh: idempotent download with
  .download-complete marker, atomic move, config.json validation
- modules/vllm-service/outputs.tf: add azureml_downloader_principal_id;
  update descriptions
- stacks/vllm/locals.tf: add use_azureml_source local
- stacks/vllm/main.tf: thread model_source + azureml_registry from
  vllm_config; add azurerm_role_assignment.azureml_registry_user
- stacks/vllm/README.md: document AzureML Registry Source section
- model-deployments.md: document azureml_registry source option with
  registration steps, format requirements, and source comparison table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
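
As an illustration of the idempotent download described for azureml-init.sh in the commit above (marker file, atomic move, config.json validation), here is a minimal sketch. It is not the script from this PR: `fetch_model` is a hypothetical stand-in for the real registry download, and `ensure_model`, the staging directory name, and the printed status words are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only, not the PR's azureml-init.sh.
set -eu

# Hypothetical downloader: writes model files into the directory given as $1.
# The real script would call the AzureML registry download here instead.
fetch_model() {
  printf '{}\n' > "$1/config.json"
}

# Download into a staging directory, validate it, then move it into place.
# $1 is the existing parent directory, $2 the model directory name.
ensure_model() {
  local parent="$1"
  local name="$2"
  local final="$parent/$name"
  local marker="$final/.download-complete"
  local staging

  # Idempotency: an earlier successful run left the marker behind.
  if [ -f "$marker" ]; then
    echo "skip"
    return 0
  fi

  # Stage next to the target so the final mv stays on one filesystem.
  staging="$(mktemp -d "$parent/.staging.XXXXXX")"
  fetch_model "$staging"

  # Validation: refuse to publish a download that lacks a non-empty config.json.
  if [ ! -s "$staging/config.json" ]; then
    rm -rf "$staging"
    echo "invalid" >&2
    return 1
  fi

  # Drop any partial earlier attempt, publish via rename, then write the marker last.
  rm -rf "$final"
  mv "$staging" "$final"
  touch "$marker"
  echo "downloaded"
}
```

Calling `ensure_model /some/parent my-model` twice should download once and then report `skip`. Writing the marker last means an interrupted run is retried from scratch on the next start; whether the `mv` itself is atomic depends on the filesystem backing the cache volume.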
RBAC race (MAJOR): move azurerm_role_assignment.azureml_registry_user from
stacks/vllm into modules/vllm-service so it can be placed in the Container
App depends_on chain. Previously the init container could start before the
AzureML Registry User assignment propagated. The Container App now explicitly
waits on the role assignment resource.

Remove use_azureml_source local from stacks/vllm/locals.tf — no longer
referenced after role assignment moved to module.

NSG bypass (MAJOR): replace the broad AllowVnetInbound-* dynamic rules
(all address spaces) on the vllm-aca NSG with three targeted rules:
  - AllowAzureLoadBalancerInbound (priority 200): ACA health probes
  - AllowPeSubnetInbound (priority 210): APIM backend calls via PE NIC
  - AllowApimSubnetInbound (priority 220): direct APIM calls before PE DNS

Broad VNet inbound is removed; only APIM (via PE and direct) and platform
health probes can reach the vLLM Container App.

Update stacks/vllm/README.md RBAC wiring section and
.github/skills/network/references/REFERENCE.md NSG table accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mishraomp and others added 2 commits April 12, 2026 18:20
…, and align network docs

- Add vLLM Open-Source Models section to services.html with nav card,
  model table, and cold-start warning
- Add vllm-service module and vllm stack sections to terraform-reference.html
  (also added missing pii-redaction stack)
- Add vllm to Phase 3 deployment table in iac-coder SKILL.md
- Align network SKILL.md with vllm-aca-subnet: update output contract,
  doc sync rules, code locations, known subnets, env allocation tables
- Add vllm-aca-subnet to network variables.tf validation block
- Add vLLM subnet-allocation-{env}.json entry to workflows.html
- Rebuild published docs via build.sh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…offline mode

Add a HuggingFace download init container that pre-downloads the model to
the persistent Azure Files cache before vLLM starts. A .download-complete
marker skips re-download on subsequent restarts. The main container then
runs with HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1, guaranteeing zero
network calls at runtime regardless of model source.

Changes:
- offline_mode default changed from false to true (module + stack fallback)
- HF init container added (conditional on model_source=huggingface + offline_mode)
- HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE now set for all sources when offline
- HF token secret created whenever token is provided (init container needs it)
- AzureML init validates tokenizer assets (tokenizer_config.json + tokenizer)
- huggingface-cli fallback to python snapshot_download if CLI unavailable
- Updated stacks/vllm/README.md with Model Caching section
- Updated services.html and terraform-reference.html docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
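
The `huggingface-cli` fallback mentioned in the commit above could be sketched as follows. This is an assumed shape, not the init container's actual command: the function names `pick_downloader` and `download_model` are hypothetical, and the sketch presumes `python3` with the `huggingface_hub` package is present whenever the CLI is not.

```shell
# Sketch only: choose a downloader, preferring the CLI when it is on PATH.
pick_downloader() {
  if command -v huggingface-cli >/dev/null 2>&1; then
    echo "cli"
  else
    echo "python"
  fi
}

# Download model $1 into directory $2 with whichever tool is available.
download_model() {
  local model_id="$1"
  local target="$2"
  case "$(pick_downloader)" in
    cli)
      huggingface-cli download "$model_id" --local-dir "$target"
      ;;
    python)
      # Fallback: call snapshot_download from the huggingface_hub package.
      python3 -c 'import sys
from huggingface_hub import snapshot_download
snapshot_download(repo_id=sys.argv[1], local_dir=sys.argv[2])' "$model_id" "$target"
      ;;
  esac
}
```

Once the cache is populated, the main container can run with `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1`, as the commit describes, so that vLLM reads only from the cached files.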
…tate guard, tenant-info

- Fix gpu_memory_utilization error message to match exclusive upper bound (< 1)
- Add completion_tokens assertion to SSE streaming integration test
- Add check block warning when tenant enables vLLM but stack not deployed
- Gate vllm_models in tenant-info response on vllm_enabled flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
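
To make the exclusive upper bound on gpu_memory_utilization concrete, here is a small shell sketch of the same range check. The PR implements this as a Terraform variable validation; the function name `check_gpu_memory_utilization` is hypothetical, and the lower bound (greater than 0) is an assumption, since the commit only states the upper bound (< 1).

```shell
# Sketch only: accept values strictly between 0 and 1.
# Assumption: the lower bound is exclusive 0; the commit only confirms "< 1".
check_gpu_memory_utilization() {
  # awk does the floating-point comparison; non-numeric input evaluates to 0.
  if awk -v v="$1" 'BEGIN { exit !(v + 0 > 0 && v + 0 < 1) }'; then
    echo "ok"
  else
    echo "invalid"
    return 1
  fi
}
```

With this check, `0.9` is accepted while `1` and `1.0` are rejected, which is the behaviour an error message reading "< 1" should describe.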
Labels

apim Issues or Pull Requests related Azure APIM enhancement New feature or request
