NVIDIA · antonslutskyms · Apr 13, 2026
diff --git a/cloud-service-providers/azure/workshops/aks-openclaw/README.md b/cloud-service-providers/azure/workshops/aks-openclaw/README.md
@@ -0,0 +1,200 @@
+# AKS OpenClaw + Microsoft Foundry
+
+This workshop folder contains Kubernetes manifests and a small container entrypoint helper used to run **[OpenClaw](https://github.com/openclaw/openclaw)** on **Azure Kubernetes Service (AKS)** with the gateway wired to **Microsoft Foundry** (Azure AI Foundry) as the model provider.
+
+In plain terms: you get a single **OpenClaw gateway pod** on your cluster that serves the Control UI and agent traffic on **port 18789**, while LLM calls go to your **Foundry project endpoint** using the correct Azure **Responses** API shape (`azure-openai-responses`), not the generic OpenAI Responses mode that Azure often rejects.
+
+---
+
+## What this codebase does
+
+### Components
+
+| Artifact | Role |
+|----------|------|
+| [`openclaw-k8s.yaml`](./openclaw-k8s.yaml) | Core stack: `ConfigMap` (`openclaw.json`), `ConfigMap` (`openclaw-foundry-endpoint` → `OPENCLAW_FOUNDRY_BASE_URL`), `Secret` (credentials), `Pod` (`openclaw`), `ClusterIP` `Service` on 18789. |
+| [`openclaw-ingress.yaml`](./openclaw-ingress.yaml) | Optional **`LoadBalancer`** `Service` (`openclaw-http`) mapping **80 → 18789** for a public Azure LB in front of the same pod selector. |
+| [`docker-entrypoint.sh`](./docker-entrypoint.sh) | Optional wrapper used when building a custom image: runs `openclaw doctor --fix`, optionally skips onboard when `OPENCLAW_SKIP_ONBOARD=1`, and passes **`--port`** / **`--token`** to `openclaw gateway run` when env vars are set. The manifest in this folder uses the upstream **`ghcr.io/openclaw/openclaw`** image; wire this script in your own `Dockerfile` if you need the same behavior in a custom build. |
+
+### Runtime flow
+
+1. **Init container** (`busybox`) copies `openclaw.json` from the `ConfigMap` into an **`emptyDir`** mounted at `/home/node/.openclaw`. This avoids mounting the ConfigMap file directly as `subPath`, which can block OpenClaw from renaming/updating its config (EBUSY on some setups).
+
+2. **Main container** runs OpenClaw with:
+   - **`OPENCLAW_SKIP_ONBOARD=1`** so the gateway starts from the pre-provisioned config instead of interactive `openclaw onboard` / `openclaw setup`.
+   - **`OPENCLAW_CONFIG_PATH=/home/node/.openclaw/openclaw.json`** pointing at the copied file.
+   - **`OPENCLAW_GATEWAY_BIND=lan`** (and `gateway.bind: "lan"` in JSON) so the process listens on **0.0.0.0**. If it bound only to loopback, traffic forwarded by **Kubernetes Services and LoadBalancers** would get **connection refused**, because kube-proxy delivers it to the pod IP, not to `127.0.0.1`.
+
+3. **Microsoft Foundry** is configured under `models.providers.microsoft-foundry` in `openclaw.json`:
+   - **`baseUrl`** comes from env **`OPENCLAW_FOUNDRY_BASE_URL`**, injected from the **`openclaw-foundry-endpoint`** `ConfigMap` (set your resource and project names there, or override the map with `kubectl create configmap … --from-literal=…`).
+   - **`api`: `azure-openai-responses`** — required for Azure; generic `openai-responses` can produce payloads Azure rejects (400 schema / empty item type). Use a **recent OpenClaw** (e.g. **2026.4.x** as noted in the manifest comments).
+   - **`apiKey`** / headers use **`OPENCLAW_MODEL_API_KEY`** from the `Secret`.
+   - Default agent model is **`OPENCLAW_AGENT_PRIMARY_MODEL`**, e.g. `microsoft-foundry/gpt-5.3-chat`, aligned with the provider block and `agents.defaults.models`.
+
+4. **Gateway auth**: `gateway.auth.token` is expanded from **`OPENCLAW_GATEWAY_TOKEN`** in the `Secret`. The Control UI must use the **same** token (or a URL with `?token=...`). If the token is missing or mismatched, you will see **`token_missing` / `token_mismatch`** in logs. Prefer generating a stable random value (e.g. `openssl rand -hex 32`) and storing it only in the Secret (or creating the Secret out-of-band and **removing** inline `Secret` from YAML before commit).
+
+5. **Control UI over HTTP**: the sample `openclaw.json` sets `controlUi.allowInsecureAuth`, `dangerouslyDisableDeviceAuth`, and permissive `allowedOrigins` for **plain HTTP** (e.g. public LB without TLS). **This is appropriate for workshops only.** For production, terminate **TLS** (Ingress, Application Gateway, etc.) and set **`allowedOrigins`** to your real **`https://`** origin instead.
+
+### Namespace
+
+All resources use the **`nemoclaw`** namespace so **Pod labels**, **Services**, and **Endpoints** stay consistent. A Service selector only matches pods in its own namespace, so if the pod and Service are in different namespaces, the LoadBalancer shows **no endpoints** even when the pod is running.
+
+---
+
+## Prerequisites
+
+- An **AKS** cluster (or any Kubernetes cluster on Azure with a working `LoadBalancer` implementation, if you use `openclaw-ingress.yaml`).
+- **`kubectl`** configured for the correct context.
+- **Network** from the cluster to your Foundry endpoint (`*.services.ai.azure.com` or your configured host).
+- **OpenClaw image**: default is `ghcr.io/openclaw/openclaw` with `imagePullPolicy: IfNotPresent`. For reproducible workshops, pin a tag (e.g. `2026.4.8`) in the `Pod` spec after you validate it.
+- **Foundry**: a project with a deployed model whose **id** matches the manifest (e.g. `gpt-5.3-chat`). If your model id differs, update both the model id in the manifest and `OPENCLAW_AGENT_PRIMARY_MODEL` to match.
+
+---
+
+## One-time setup
+
+### 1. Namespace
+
+```bash
+kubectl create namespace nemoclaw --dry-run=client -o yaml | kubectl apply -f -
+```
+
+### 2. Secrets (recommended: out-of-band)
+
+Avoid re-applying placeholder secrets from Git. Create the Secret once:
+
+```bash
+GW=$(openssl rand -hex 32)
+kubectl create secret generic openclaw-credentials -n nemoclaw \
+  --from-literal=OPENCLAW_MODEL_API_KEY='YOUR_FOUNDRY_KEY' \
+  --from-literal=OPENCLAW_GATEWAY_TOKEN="$GW"
+```
+
+If the Secret already exists, delete it first:
+
+```bash
+kubectl delete secret openclaw-credentials -n nemoclaw --ignore-not-found
+# then re-run create secret as above
+```
+
+**Before** applying [`openclaw-k8s.yaml`](./openclaw-k8s.yaml), either:
+
+- Remove the `Secret` object from the file and keep only `ConfigMap` + `Pod` + `Service`, **or**
+- Replace placeholders with real values and **never commit** real keys.
+
+### 3. Foundry endpoint (`OPENCLAW_FOUNDRY_BASE_URL`)
+
+The pod reads **`OPENCLAW_FOUNDRY_BASE_URL`** from the **`openclaw-foundry-endpoint`** `ConfigMap` (same value is referenced in `openclaw.json` as `${OPENCLAW_FOUNDRY_BASE_URL}`).
+
+**Option A — edit YAML:** In [`openclaw-k8s.yaml`](./openclaw-k8s.yaml), find `kind: ConfigMap` / `name: openclaw-foundry-endpoint` and replace `YOUR_FOUNDRY_RESOURCE_NAME` and `YOUR_FOUNDRY_PROJECT_NAME` in the URL.
+
+**Option B — CLI (good for CI):**
+
+```bash
+kubectl create configmap openclaw-foundry-endpoint -n nemoclaw --dry-run=client -o yaml \
+  --from-literal=OPENCLAW_FOUNDRY_BASE_URL='https://<resource>.services.ai.azure.com/api/projects/<project>/openai/v1' \
+  | kubectl apply -f -
+```
+
+### 4. Agent primary model
+
+In the **`Pod`** env, **`OPENCLAW_AGENT_PRIMARY_MODEL`** must have the form `microsoft-foundry/<modelId>`, where `<modelId>` matches `models[].id` under `microsoft-foundry` in the `openclaw.json` held by the `openclaw-config` `ConfigMap`. If you change the model id, update both the env var and the `ConfigMap` JSON (the `agents.defaults.models` keys must stay aligned).
+
+---
+
+## Deploy / run
+
+From this directory (`aks-openclaw`), after secrets and env edits are correct:
+
+### Apply core manifest (ConfigMap, Pod, Service)
+
+```bash
+kubectl delete pod openclaw -n nemoclaw --ignore-not-found && kubectl apply -f ./openclaw-k8s.yaml -n nemoclaw
+```
+
+Notes:
+
+- Resources in the YAML already declare **`metadata.namespace: nemoclaw`**. The **`-n nemoclaw`** flag is harmless and matches the workshop convention; it also sets the default namespace for any future objects you add without an explicit namespace.
+- Deleting the **Pod** forces a fresh pod (new `emptyDir`, re-run init container) while leaving the Service and ConfigMap in place.
+
+### Optional public LoadBalancer (HTTP)
+
+After the pod is ready:
+
+```bash
+kubectl apply -f ./openclaw-ingress.yaml -n nemoclaw
+```
+
+Check that endpoints are populated:
+
+```bash
+kubectl get endpoints openclaw-http -n nemoclaw -o wide
+kubectl get pods -n nemoclaw -l app=openclaw
+```
+
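+If **Endpoints** are empty, verify the **namespace**, the **`app: openclaw`** label, and that the pod is **Ready**. Also check **`OPENCLAW_GATEWAY_BIND=lan`**: a loopback bind causes refused connections and, if a readiness probe is configured, can also keep the pod out of the Endpoints.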
+
+### Tear down (optional)
+
+```bash
+kubectl delete -f ./openclaw-ingress.yaml -n nemoclaw --ignore-not-found
+kubectl delete -f ./openclaw-k8s.yaml -n nemoclaw --ignore-not-found
+kubectl delete secret openclaw-credentials -n nemoclaw --ignore-not-found
+```
+
+---
+
+## Access
+
+### Port-forward (simplest)
+
+```bash
+kubectl port-forward pod/openclaw 18789:18789 -n nemoclaw
+```
+
+Then open the Control UI / gateway at **`http://127.0.0.1:18789`** (exact path depends on OpenClaw version).
+
+### LoadBalancer service
+
+After `openclaw-ingress.yaml`:
+
+```bash
+kubectl get svc openclaw-http -n nemoclaw
+```
+
+Use the **EXTERNAL-IP** on port **80** (mapped to gateway **18789**).
+
+If you use the dashboard without `?token=...`, paste the same **`OPENCLAW_GATEWAY_TOKEN`** you stored in the Secret into Control UI settings.
+
+---
+
+## Custom image (optional)
+
+If you build an image that uses [`docker-entrypoint.sh`](./docker-entrypoint.sh), set the image **`ENTRYPOINT`** to this script and pass `openclaw gateway run` as the command; the script only appends `--port` / `--token` when the command is exactly those three words. Set the same env vars as in the `Pod` spec. The workshop manifest does **not** require a custom image unless you want this entrypoint behavior in the published image itself.
+
+Example build/push (adjust registry/tag):
+
+```bash
+docker build -t <registry>/openclaw:2026.4.8 .
+docker push <registry>/openclaw:2026.4.8
+```
+
+Then set `spec.containers[0].image` in [`openclaw-k8s.yaml`](./openclaw-k8s.yaml) to your image.
+
+---
+
+## Troubleshooting
+
+| Symptom | Things to check |
+|--------|------------------|
+| **401** from provider | Foundry key in **`OPENCLAW_MODEL_API_KEY`**, and **`OPENCLAW_AGENT_PRIMARY_MODEL`** uses **`microsoft-foundry/...`**, not `default/...`. |
+| **400** / schema errors from Azure | Confirm **`api`: `azure-openai-responses`** in the provider block; upgrade OpenClaw if needed. |
+| **LoadBalancer has no endpoints** | Pod in **`nemoclaw`**, label **`app: openclaw`**, gateway **`bind`** / **`OPENCLAW_GATEWAY_BIND`** is **`lan`**. |
+| **`token_missing` / `token_mismatch`** | Stable **`OPENCLAW_GATEWAY_TOKEN`** in Secret matches UI / URL token; avoid letting OpenClaw auto-generate a different token on disk. |
+| **Config / rename errors** | Init container + **`emptyDir`** pattern must stay; do not mount ConfigMap `subPath` directly onto the live config path OpenClaw mutates. |
+
+---
+
+## Security reminder
+
+Treat **`OPENCLAW_MODEL_API_KEY`** and **`OPENCLAW_GATEWAY_TOKEN`** as secrets. Prefer **`kubectl create secret`** or a secret manager integration, and **rotate** keys that may have been committed or shared.
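
The endpoint and model-reference conventions in the README (setup steps 3 and 4) can be checked before any manifest is applied. The following is a minimal sketch, not part of this PR: the helper names `foundry_base_url` and `is_foundry_model_ref` are hypothetical, and the URL shape is simply the one shown in the README's Option B.

```shell
#!/bin/sh
# Hypothetical preflight helpers; these names do not exist in the workshop files.

# Assemble the Foundry base URL in the shape used by the README's Option B.
foundry_base_url() {
  resource="${1:-}"
  project="${2:-}"
  if [ -z "$resource" ] || [ -z "$project" ]; then
    echo "usage: foundry_base_url <resource> <project>" >&2
    return 1
  fi
  printf 'https://%s.services.ai.azure.com/api/projects/%s/openai/v1\n' \
    "$resource" "$project"
}

# Succeed only for values of the form microsoft-foundry/<modelId>,
# where <modelId> is at least one character long.
is_foundry_model_ref() {
  case "${1:-}" in
    microsoft-foundry/?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

The output of `foundry_base_url` could be passed to the `--from-literal=OPENCLAW_FOUNDRY_BASE_URL=...` argument from Option B, and `is_foundry_model_ref "$OPENCLAW_AGENT_PRIMARY_MODEL"` could gate a deploy script. Neither helper checks that the resource, project, or model actually exists in Foundry.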
diff --git a/cloud-service-providers/azure/workshops/aks-openclaw/docker-entrypoint.sh b/cloud-service-providers/azure/workshops/aks-openclaw/docker-entrypoint.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+# Run onboarding (or non-interactive setup) before the gateway when appropriate.
+#
+# OPENCLAW_SKIP_ONBOARD=1     — skip onboard/setup; go straight to the command
+# OPENCLAW_CONFIG_PATH        — if set, used to detect existing config (see OpenClaw CLI)
+# OPENCLAW_GATEWAY_PORT       — if set, passed as `openclaw gateway run --port …`
+# OPENCLAW_GATEWAY_TOKEN      — if set, passed as `openclaw gateway run --token …`
+
+set -eu
+
+CONFIG="${OPENCLAW_CONFIG_PATH:-${HOME}/.openclaw/openclaw.json}"
+
+openclaw doctor --fix >/dev/null 2>&1 || true
+
+wants_gateway=false
+if [ "$#" -ge 3 ] && [ "$1" = "openclaw" ] && [ "$2" = "gateway" ] && [ "$3" = "run" ]; then
+  wants_gateway=true
+fi
+
+if [ "${OPENCLAW_SKIP_ONBOARD:-0}" != "1" ] && [ "$wants_gateway" = true ]; then
+  if [ ! -f "$CONFIG" ]; then
+    if [ -t 0 ]; then
+      openclaw onboard
+    else
+      printf '%s\n' "openclaw: no config at ${CONFIG}; running openclaw setup (non-interactive)." \
+        "For full interactive onboarding, use: docker run -it ..." >&2
+      openclaw setup
+    fi
+  fi
+fi
+
+# Optional gateway listen/auth flags (e.g. Kubernetes + port-forward).
+# Only applies when the command is exactly `openclaw gateway run` with no extra args.
+if [ "$wants_gateway" = true ] && [ "$#" -eq 3 ]; then
+  if [ -n "${OPENCLAW_GATEWAY_PORT:-}" ] || [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ]; then
+    set -- openclaw gateway run
+    [ -n "${OPENCLAW_GATEWAY_PORT:-}" ] && set -- "$@" --port "$OPENCLAW_GATEWAY_PORT"
+    [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ] && set -- "$@" --token "$OPENCLAW_GATEWAY_TOKEN"
+  fi
+fi
+
+exec "$@"
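
The flag-appending branch at the end of `docker-entrypoint.sh` can be exercised without the `openclaw` binary. The sketch below copies that logic into a hypothetical function, `build_gateway_cmd`, which prints the final argument list instead of calling `exec`. It is an illustration of the behavior under the same environment variables, not a replacement for the script.

```shell
#!/bin/sh
# Hypothetical test helper mirroring the last block of docker-entrypoint.sh.
build_gateway_cmd() {
  # Flags are appended only when the command is exactly `openclaw gateway run`.
  if [ "$#" -eq 3 ] && [ "$1" = "openclaw" ] && [ "$2" = "gateway" ] && [ "$3" = "run" ]; then
    if [ -n "${OPENCLAW_GATEWAY_PORT:-}" ]; then
      set -- "$@" --port "$OPENCLAW_GATEWAY_PORT"
    fi
    if [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ]; then
      set -- "$@" --token "$OPENCLAW_GATEWAY_TOKEN"
    fi
  fi
  # Print the resulting argv, one argument per line, rather than exec-ing it.
  printf '%s\n' "$@"
}
```

Calling `build_gateway_cmd openclaw gateway run` with `OPENCLAW_GATEWAY_PORT=18789` set prints the three command words followed by `--port` and `18789`. If any extra argument is present, the command is printed unchanged, which matches the "no extra args" rule in the script's comment.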
diff --git a/cloud-service-providers/azure/workshops/aks-openclaw/openclaw-ingress.yaml b/cloud-service-providers/azure/workshops/aks-openclaw/openclaw-ingress.yaml
@@ -0,0 +1,34 @@
+# Public LoadBalancer for OpenClaw (port 18789 on pods → port 80 on LB).
+#
+# Prerequisites (same namespace as Pod + ClusterIP Service — default here: nemoclaw):
+#   - Pod `openclaw` labels include app=openclaw
+#   - Gateway must bind LAN: OPENCLAW_GATEWAY_BIND=lan (see openclaw-k8s.yaml). With a loopback bind, connections through the LB are refused.
+#
+# Debug empty LB:
+#   kubectl get endpoints openclaw-http -n nemoclaw -o wide
+#   kubectl get pods -n nemoclaw -l app=openclaw
+# If Endpoints shows no addresses, fix namespace/selector or pod Ready state.
+#
+# Apply:
+#   kubectl apply -f openclaw-ingress.yaml
+
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: openclaw-http
+  namespace: nemoclaw
+  labels:
+    app: openclaw
+  # annotations:
+  #   service.beta.kubernetes.io/azure-load-balancer-resource-group: "<MC_...>"
+  #   service.beta.kubernetes.io/azure-pip-name: "<pip-name>"
+spec:
+  type: LoadBalancer
+  selector:
+    app: openclaw
+  ports:
+    - name: http
+      port: 80
+      targetPort: 18789
+      protocol: TCP
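
After this Service is applied, Azure can take a while to assign the public IP, so `EXTERNAL-IP` may show `<pending>` at first. The following is a rough sketch of a polling helper, assuming `kubectl` is on the `PATH` and pointed at the right context. The function name `wait_for_lb_ip`, the default of 30 attempts, and the 5-second interval are illustrative choices, not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper: poll until the LoadBalancer Service reports an external IP.
wait_for_lb_ip() {
  svc="${1:-openclaw-http}"   # Service name from openclaw-ingress.yaml
  ns="${2:-nemoclaw}"         # workshop namespace
  attempts="${3:-30}"         # illustrative default
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # jsonpath yields an empty string while the IP is still pending.
    ip=$(kubectl get svc "$svc" -n "$ns" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "no external IP for $svc in $ns after $attempts attempts" >&2
  return 1
}
```

Once an address is printed, the gateway should be reachable at `http://<that-ip>/` on port 80. An assigned IP only shows that the load balancer exists; the Endpoints checks in the comments above are still needed to confirm it has a backend.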