feat: add opensource vLLM deployment pathway with container apps and gpu #184
Open
Portal Preview
Deployed by run #358; destroyed automatically on PR close.
Implements azureml_registry as a second model source option in the
vllm-service module, complementing the default huggingface source.
Changes:
- modules/vllm-service/variables.tf: add model_source enum var and
azureml_registry nullable object var; update descriptions
- modules/vllm-service/main.tf:
- new locals: use_azureml_registry_source, azureml_download_parent,
azureml_model_root, vllm_model_arg, azureml_init_image
- azurerm_user_assigned_identity.azureml_downloader (count-conditional)
created before Container App to avoid RBAC propagation race
- null_resource.build_azureml_init_image builds Dockerfile.azureml-init
into module ACR; filemd5 triggers on Dockerfile + script changes
- dynamic identity block on Container App for UA identity
- vllm_model_arg replaces hardcoded var.model_id in args; adds
--served-model-name for azureml_registry source to keep API name stable
- HF secret, HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE, HF_TOKEN env gated on
model_source == huggingface
- dynamic init_container block: downloads model via azureml-init image,
mounts model-cache volume, passes UA client_id for managed identity login
- check block validates azureml_registry != null when source is azureml_registry
- modules/vllm-service/Dockerfile.azureml-init: pins azure-cli:2.67.0,
  pre-bakes the az ml extension, and sets azureml-init.sh as the CMD
- modules/vllm-service/azureml-init.sh: idempotent download with
.download-complete marker, atomic move, config.json validation
- modules/vllm-service/outputs.tf: add azureml_downloader_principal_id;
update descriptions
- stacks/vllm/locals.tf: add use_azureml_source local
- stacks/vllm/main.tf: thread model_source + azureml_registry from
vllm_config; add azurerm_role_assignment.azureml_registry_user
- stacks/vllm/README.md: document AzureML Registry Source section
- model-deployments.md: document azureml_registry source option with
registration steps, format requirements, and source comparison table
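A minimal sketch of the two new variables described above, assuming an object shape of registry_name/model_name/model_version (the exact attribute names and the defaults are not spelled out in this PR and are illustrative):

```hcl
# Sketch only: the enum values come from this PR; the object's
# attribute names and both defaults are assumptions.
variable "model_source" {
  description = "Where the vLLM model is pulled from."
  type        = string
  default     = "huggingface"

  validation {
    condition     = contains(["huggingface", "azureml_registry"], var.model_source)
    error_message = "model_source must be \"huggingface\" or \"azureml_registry\"."
  }
}

variable "azureml_registry" {
  description = "AzureML registry model reference; required when model_source is \"azureml_registry\"."
  type = object({
    registry_name = string
    model_name    = string
    model_version = string
  })
  default = null
}
```

With default = null the object variable is nullable, which is what lets the check block described above enforce "non-null when source is azureml_registry".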
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
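The idempotent download flow in azureml-init.sh (marker file, staging directory, config.json validation, atomic move) can be sketched as a small shell function. The real script wraps `az ml model download`; here the download command is parameterized and the function name is hypothetical, so the pattern itself is runnable:

```shell
# Sketch of the idempotency pattern, not the actual script.
# Usage: ensure_model /path/to/model <download-cmd ...>
# The download command receives the staging dir as its last argument.
ensure_model() {
  ROOT="$1"; shift                    # target model directory
  MARKER="$ROOT/.download-complete"
  STAGING="$ROOT.staging"

  if [ -f "$MARKER" ]; then          # marker present: skip re-download
    echo "cached"
    return 0
  fi

  rm -rf "$STAGING"
  mkdir -p "$STAGING"
  "$@" "$STAGING"                     # e.g. az ml model download ...

  # Refuse to publish a tree that lacks config.json.
  if [ ! -f "$STAGING/config.json" ]; then
    echo "config.json missing" >&2
    return 1
  fi

  rm -rf "$ROOT"
  mv "$STAGING" "$ROOT"               # single rename: never a half-copied tree
  touch "$MARKER"
  echo "downloaded"
}
```

Because the marker is written only after the rename succeeds, a crash mid-download leaves no marker and the next start retries cleanly.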
RBAC race (MAJOR): move azurerm_role_assignment.azureml_registry_user from
stacks/vllm into modules/vllm-service so it can be placed in the Container
App depends_on chain. Previously the init container could start before the
AzureML Registry User assignment propagated; the Container App now
explicitly waits on the role assignment resource. Remove the
use_azureml_source local from stacks/vllm/locals.tf; it is no longer
referenced after the role assignment moved into the module.

NSG bypass (MAJOR): replace the broad AllowVnetInbound-* dynamic rules
(all address spaces) on the vllm-aca NSG with three targeted rules:
- AllowAzureLoadBalancerInbound (priority 200): ACA health probes
- AllowPeSubnetInbound (priority 210): APIM backend calls via the PE NIC
- AllowApimSubnetInbound (priority 220): direct APIM calls before PE DNS
Broad VNet inbound is removed; only APIM (via PE and direct) and platform
health probes can reach the vLLM Container App.

Update the stacks/vllm/README.md RBAC wiring section and the
.github/skills/network/references/REFERENCE.md NSG table accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
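One of the three targeted rules could look roughly like this; the NSG resource name and resource group variable are assumptions, and the other two rules would differ only in name, priority, and source_address_prefix (the PE and APIM subnet CIDRs respectively):

```hcl
# Sketch: rule name and priority come from the commit above;
# everything else is illustrative.
resource "azurerm_network_security_rule" "allow_lb_inbound" {
  name                        = "AllowAzureLoadBalancerInbound"
  priority                    = 200
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "AzureLoadBalancer"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.vllm_aca.name
}
```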
…, and align network docs
- Add vLLM Open-Source Models section to services.html with nav card,
model table, and cold-start warning
- Add vllm-service module and vllm stack sections to terraform-reference.html
(also added missing pii-redaction stack)
- Add vllm to Phase 3 deployment table in iac-coder SKILL.md
- Align network SKILL.md with vllm-aca-subnet: update output contract,
doc sync rules, code locations, known subnets, env allocation tables
- Add vllm-aca-subnet to network variables.tf validation block
- Add vLLM subnet-allocation-{env}.json entry to workflows.html
- Rebuild published docs via build.sh
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…offline mode

Add a HuggingFace download init container that pre-downloads the model to
the persistent Azure Files cache before vLLM starts. A .download-complete
marker skips re-download on subsequent restarts. The main container then
runs with HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1, guaranteeing zero
network calls at runtime regardless of model source.

Changes:
- offline_mode default changed from false to true (module + stack fallback)
- HF init container added (conditional on model_source=huggingface + offline_mode)
- HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE now set for all sources when offline
- HF token secret created whenever a token is provided (the init container needs it)
- AzureML init validates tokenizer assets (tokenizer_config.json + tokenizer)
- huggingface-cli falls back to python snapshot_download if the CLI is unavailable
- Updated stacks/vllm/README.md with a Model Caching section
- Updated services.html and terraform-reference.html docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
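The "offline env vars for all sources" gating could be sketched with a dynamic block inside the Container App's container definition; the variable name and iterator are illustrative:

```hcl
# Sketch, placed inside the azurerm_container_app container block:
# when offline_mode is on, both offline flags are set to "1"
# regardless of model_source.
dynamic "env" {
  for_each = var.offline_mode ? ["HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE"] : []
  content {
    name  = env.value
    value = "1"
  }
}
```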
…tate guard, tenant-info

- Fix the gpu_memory_utilization error message to match the exclusive upper bound (< 1)
- Add a completion_tokens assertion to the SSE streaming integration test
- Add a check block warning when a tenant enables vLLM but the stack is not deployed
- Gate vllm_models in the tenant-info response on the vllm_enabled flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
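The corrected message reads as an exclusive upper limit; a sketch of the matching validation (the description and default value are assumptions, the variable name comes from the commit above):

```hcl
variable "gpu_memory_utilization" {
  description = "Fraction of GPU memory vLLM may use."
  type        = number
  default     = 0.9

  validation {
    condition     = var.gpu_memory_utilization > 0 && var.gpu_memory_utilization < 1
    error_message = "gpu_memory_utilization must be greater than 0 and less than 1."
  }
}
```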
AI Hub Infra Changes
Summary: 2 to add, 20 to change, 0 to destroy (across 4 stack(s))
Updated by CI: plan against testenvironment (run #358) at 2026-04-13 03:22:30 UTC.