Gap-7: Functions Diagnose (B → A) — Trigger + Durable troubleshooting#1641
Pull request overview
Adds new Azure Functions troubleshooting references focused on trigger-specific diagnostics and Durable Functions failure modes, supporting Gap-7 “Functions Diagnose (B → A)” coverage and closing #1615.
Changes:
- Added trigger-specific troubleshooting guide (HTTP, Timer, Queue, Blob, Service Bus, Cosmos DB) with KQL and CLI diagnostics.
- Added Durable Functions troubleshooting guide for stuck orchestrations, non-determinism, task hub conflicts, and retry/timer patterns.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| plugin/skills/azure-diagnostics/references/functions/triggers.md | New trigger-focused troubleshooting matrix with KQL/CLI steps per trigger type |
| plugin/skills/azure-diagnostics/references/functions/durable.md | New Durable Functions troubleshooting sections for common operational failure scenarios |
| | Symptom | Cause | Fix | | ||
| |---------|-------|-----| | ||
| | 5xx on every request | Function runtime crash | Check `host.json` for misconfig; review App Insights `exceptions` table | | ||
| | 401 Unauthorized | Auth level mismatch | Verify `authLevel` in function.json matches request (function key vs anonymous) | | ||
| | 404 Not Found | Route prefix misconfigured | Check `host.json` `extensions.http.routePrefix` — default is `api` | |
Markdown tables are using a double leading pipe (||), which renders as an empty first column and can break table formatting in many Markdown renderers. Use single leading/trailing pipes for each row (e.g., | Symptom | Cause | Fix |).
| | 401 Unauthorized | Auth level mismatch | Verify `authLevel` in function.json matches request (function key vs anonymous) | | ||
| | 404 Not Found | Route prefix misconfigured | Check `host.json` `extensions.http.routePrefix` — default is `api` | | ||
| | CORS blocked | Missing allowed origins | `az functionapp cors add -n APP -g RG --allowed-origins "https://DOMAIN"` | | ||
| | 408 / timeout | Execution exceeds limit | Consumption plan: 5 min max. Increase `functionTimeout` in host.json or switch to Premium | |
The guidance for HTTP timeouts on Consumption is misleading: HTTP requests can time out due to front-end/gateway idle limits even if functionTimeout is increased, and the Consumption functionTimeout limit is not generally described as a fixed 5 minutes in all configurations. Consider updating this row to distinguish (1) function execution timeout (functionTimeout) vs (2) HTTP client/gateway timeouts, and recommend async patterns (e.g., 202 + status endpoint / Durable HTTP APIs) when long-running work is required.
| | 408 / timeout | Execution exceeds limit | Consumption plan: 5 min max. Increase `functionTimeout` in host.json or switch to Premium | | |
| | 408 / timeout | Long-running execution or HTTP client/gateway idle timeout | Check `functionTimeout` in host.json (execution timeout) and client/front-end idle limits. For long-running work use async patterns (202 + status endpoint or Durable HTTP APIs), or move to Premium for longer sync executions. | |
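Seen from the caller's side, the async pattern suggested above (202 plus a status endpoint) can be sketched in shell as a polling loop. This is a minimal illustration, not part of the PR: the function name `poll_until_done`, the `POLL_INTERVAL` and `MAX_ATTEMPTS` variables, and the `STATUS_URL` placeholder in the usage comment are assumptions. The terminal states match Durable Functions runtime status values.

```shell
# poll_until_done CMD [ARGS...]
# Runs CMD repeatedly until it prints a terminal status, then echoes that status.
# Returns 1 if no terminal status is seen within MAX_ATTEMPTS tries.
poll_until_done() {
  attempts=0
  max=${MAX_ATTEMPTS:-30}
  while [ "$attempts" -lt "$max" ]; do
    status=$("$@")
    case "$status" in
      Completed|Failed|Terminated|Canceled)
        printf '%s\n' "$status"
        return 0
        ;;
    esac
    attempts=$((attempts + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}

# Hypothetical usage (STATUS_URL is the status URL returned with the 202 response):
#   get_status() { curl -fsS "$STATUS_URL" | jq -r '.runtimeStatus'; }
#   poll_until_done get_status
```

The client never holds an HTTP connection open for the full duration of the work, so gateway idle limits no longer apply to the long-running operation itself.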
| | Timer fires twice | Multiple instances running | Set `"isSingleton": true` in function.json or use `host.json` singleton lock | | ||
| | Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan | | ||
| | Timer drift after deploy | Missed schedule catch-up | Set `UseMonitor: true` (default) — runtime tracks missed executions in storage | |
The suggested Timer trigger settings look incorrect/inconsistent with how Functions config is typically expressed: isSingleton is not a common/standard function.json setting for Timer triggers, and UseMonitor: true is shown in a non-JSON form and with casing that may not match configuration/attribute usage (useMonitor in JSON vs UseMonitor in C# attribute). Recommend updating this row to reference the supported configuration mechanism(s) (e.g., host.json singleton settings / language-specific singleton attribute) and show the correct JSON property name/shape for the timer monitor setting.
| | Timer fires twice | Multiple instances running | Set `"isSingleton": true` in function.json or use `host.json` singleton lock | | |
| | Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan | | |
| | Timer drift after deploy | Missed schedule catch-up | Set `UseMonitor: true` (default) — runtime tracks missed executions in storage | | |
| | Timer fires twice | Multiple instances running | Configure singleton behavior via `host.json` (singleton settings) or language-specific singleton/lock attributes so only one instance runs the timer | | |
| | Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan | | |
| | Timer drift after deploy | Missed schedule catch-up | Ensure `"useMonitor": true` in the timer trigger binding in `function.json` (default) — runtime tracks missed executions in storage | |
| # Peek dead-letter messages | ||
| az servicebus queue authorization-rule list -n QUEUE --namespace-name NS -g RG |
This command does not 'peek dead-letter messages'—it lists authorization rules for the queue. Replace with an approach that actually inspects DLQ message content (e.g., Service Bus Explorer / SDK-based peek/receive, or document that Azure CLI doesn’t directly support peeking DLQ messages and provide an alternative).
| # Peek dead-letter messages | |
| az servicebus queue authorization-rule list -n QUEUE --namespace-name NS -g RG | |
| # View dead-letter message contents | |
| # Note: Azure CLI does not currently support peeking DLQ message bodies directly. | |
| # Use one of the following instead: | |
| # - Service Bus Explorer in the Azure portal (Service Bus namespace -> Queues -> <QUEUE> -> Dead-letter) | |
| # - An Azure Service Bus SDK (for example, .NET, Java, Python, or JavaScript) with a receiver scoped to: | |
| # "<QUEUE>/$DeadLetterQueue" | |
| # and using a "peek" or "receive" operation to inspect messages. |
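One practical detail when scripting against the dead-letter path above: `$DeadLetterQueue` is a literal suffix, so it must not sit inside double quotes in a shell, where it would expand to an empty variable. The sketch below shows a hypothetical helper, `dlq_path`, that builds the path safely. The `az servicebus queue show` query in the comment is offered as a way to get the dead-letter count (not message bodies), with `QUEUE`, `NS`, and `RG` as placeholders.

```shell
# Build the dead-letter sub-queue path for a queue name.
# The single-quoted format string keeps the shell from expanding
# $DeadLetterQueue as a variable.
dlq_path() {
  printf '%s/$DeadLetterQueue\n' "$1"
}

# Dead-letter message count (count only, not content) from the CLI:
#   az servicebus queue show -n QUEUE --namespace-name NS -g RG \
#     --query "countDetails.deadLetterMessageCount"

dlq_path "orders"
```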
| # Purge completed/terminated instances older than 7 days | ||
| func durable purge-history --connection-string-setting AzureWebJobsStorage \ | ||
| --task-hub-name TASKHUB --created-before "$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)" |
The date -v-7d flag is BSD/macOS-specific and will fail on most Linux CI/containers. Consider either (a) documenting separate commands for macOS vs Linux, or (b) using a more portable approach (e.g., provide both date -u -d '7 days ago' ... and BSD variant).
| # Purge completed/terminated instances older than 7 days | |
| func durable purge-history --connection-string-setting AzureWebJobsStorage \ | |
| --task-hub-name TASKHUB --created-before "$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)" | |
| # Purge completed/terminated instances older than 7 days (portable for Linux/macOS) | |
| # Compute cutoff timestamp: try GNU date first, then BSD/macOS date | |
| CUTOFF=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null) || CUTOFF=$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ) | |
| func durable purge-history --connection-string-setting AzureWebJobsStorage \ | |
| --task-hub-name TASKHUB --created-before "$CUTOFF" |
| # Check current task hub name | ||
| az functionapp config appsettings list -n APP -g RG \ | ||
| --query "[?name=='AzureWebJobsStorage']" -o table | ||
| # Inspect host.json for hub name | ||
| az webapp config show -n APP -g RG --query "siteConfig.appSettings" |
These diagnostics don’t actually identify the Durable task hub name: querying AzureWebJobsStorage only returns the storage setting, and az webapp config show ... appSettings surfaces app settings (not host.json). Update to query the actual hub name configuration source (e.g., the durable hub setting in app settings if set, or instruct checking host.json in the repo/deployed content via Kudu/zip deploy artifacts) and make the commands match what they claim to retrieve.
| # Check current task hub name | |
| az functionapp config appsettings list -n APP -g RG \ | |
| --query "[?name=='AzureWebJobsStorage']" -o table | |
| # Inspect host.json for hub name | |
| az webapp config show -n APP -g RG --query "siteConfig.appSettings" | |
| # Check if a task hub name override is set via app settings | |
| az functionapp config appsettings list -n APP -g RG \ | |
| --query "[?name=='AzureFunctionsJobHost__extensions__durableTask__hubName']" -o table | |
| # If no override is set, the hub name comes from host.json: | |
| # - Inspect host.json in your source repo, or | |
| # - Use Kudu (Advanced Tools) / zip-deployed content to view host.json in the deployed app |
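The app setting name in the suggestion above follows the general host.json override convention: prefix the setting path with `AzureFunctionsJobHost__` and replace each dot with a double underscore. A small sketch of that mapping, using a hypothetical helper named `host_json_override`:

```shell
# Convert a dotted host.json path into the app setting name that overrides it.
host_json_override() {
  printf 'AzureFunctionsJobHost__%s\n' "$(printf '%s' "$1" | sed 's/\./__/g')"
}

host_json_override "extensions.durableTask.hubName"
```

The same helper can produce the override name for other host.json values when checking whether an app setting shadows the deployed file.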
| --query "[?starts_with(name, 'TASKHUB')]" --output table | ||
| ``` | ||
| > 💡 **Tip:** Task hub names map to storage tables. The tables are named `<HubName>History` and `<HubName>Instances`. |
This tip is incomplete for the Azure Storage provider: Durable typically uses multiple tables/queues beyond just History and Instances (e.g., control/work-item/partition-related artifacts, depending on version/provider). Suggest revising to say these are examples and list the full/common set (or point to the official Durable storage schema reference) to avoid misleading operators troubleshooting missing/corrupted artifacts.
| > 💡 **Tip:** Task hub names map to storage tables. The tables are named `<HubName>History` and `<HubName>Instances`. | |
| > 💡 **Tip:** For the Azure Storage provider, a task hub name is used as a prefix for multiple tables and queues. Common examples include the `<HubName>History` and `<HubName>Instances` tables, as well as control and work-item queues (for example, `<HubName>-control` and `<HubName>-workitems`). See the Durable Functions storage provider documentation for the full, up-to-date schema: https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-storage-providers#azure-storage |
| |-------------|---------------|---------------------| | ||
| | `DateTime.Now` / `DateTime.UtcNow` | Different value on replay | `context.CurrentUtcDateTime` | | ||
| | `Guid.NewGuid()` | Different GUID on replay | `context.NewGuid()` | | ||
| | `Thread.Sleep` / `Task.Delay` | Blocks orchestrator thread | `context.CreateTimer()` | |
Task.Delay does not block a thread in the usual sense; the real issue is that orchestrators must be deterministic and use durable timers so the wait is replay-safe and persisted. Consider rewording the 'Why It Breaks' cell to focus on determinism/replay/persistence rather than thread blocking (which is accurate for Thread.Sleep but misleading for Task.Delay).
| | `Thread.Sleep` / `Task.Delay` | Blocks orchestrator thread | `context.CreateTimer()` | | |
| | `Thread.Sleep` / `Task.Delay` | Not replay-safe; uses non-durable timers so waits aren't persisted or deterministic | `context.CreateTimer()` | |
# 🔍 Token Analysis Report
fatal: path 'plugin/skills/azure-diagnostics/references/functions/durable.md' exists on disk, but not in 'origin/main'
📊 Token Change Report
Comparing
Summary
Changed Files
📊 Token Limit Check Report
Checked: 526 files
| File | Tokens | Limit | Over By |
|---|---|---|---|
| .github/skills/analyze-test-run/SKILL.md | 2471 | 500 | +1971 |
| .github/skills/file-test-bug/SKILL.md | 628 | 500 | +128 |
| .github/skills/sensei/README.md | 3531 | 2000 | +1531 |
| .github/skills/sensei/SKILL.md | 2382 | 500 | +1882 |
| .github/skills/sensei/references/EXAMPLES.md | 3707 | 2000 | +1707 |
| .github/skills/sensei/references/LOOP.md | 4181 | 2000 | +2181 |
| .github/skills/sensei/references/SCORING.md | 3927 | 2000 | +1927 |
| .github/skills/skill-authoring/SKILL.md | 817 | 500 | +317 |
| plugin/skills/appinsights-instrumentation/SKILL.md | 908 | 500 | +408 |
| plugin/skills/azure-ai/SKILL.md | 817 | 500 | +317 |
| plugin/skills/azure-aigateway/SKILL.md | 1258 | 500 | +758 |
| plugin/skills/azure-aigateway/references/policies.md | 2342 | 2000 | +342 |
| plugin/skills/azure-cloud-migrate/references/services/functions/lambda-to-functions.md | 2600 | 2000 | +600 |
| plugin/skills/azure-cloud-migrate/references/services/functions/runtimes/javascript.md | 2181 | 2000 | +181 |
| plugin/skills/azure-compliance/SKILL.md | 1185 | 500 | +685 |
| plugin/skills/azure-compute/SKILL.md | 755 | 500 | +255 |
| plugin/skills/azure-compute/workflows/vm-recommender/vm-recommender.md | 2393 | 2000 | +393 |
| plugin/skills/azure-compute/workflows/vm-troubleshooter/references/cannot-connect-to-vm.md | 7308 | 2000 | +5308 |
| plugin/skills/azure-cost-optimization/SKILL.md | 3900 | 500 | +3400 |
| plugin/skills/azure-deploy/SKILL.md | 1562 | 500 | +1062 |
| plugin/skills/azure-diagnostics/SKILL.md | 1132 | 500 | +632 |
| plugin/skills/azure-diagnostics/aks-troubleshooting/networking.md | 2147 | 2000 | +147 |
| plugin/skills/azure-diagnostics/aks-troubleshooting/node-issues.md | 2003 | 2000 | +3 |
| plugin/skills/azure-enterprise-infra-planner/SKILL.md | 991 | 500 | +491 |
| plugin/skills/azure-enterprise-infra-planner/references/constraints/compute-apps.md | 2022 | 2000 | +22 |
| plugin/skills/azure-hosted-copilot-sdk/SKILL.md | 608 | 500 | +108 |
| plugin/skills/azure-kubernetes/SKILL.md | 2266 | 500 | +1766 |
| plugin/skills/azure-kusto/SKILL.md | 2149 | 500 | +1649 |
| plugin/skills/azure-messaging/SKILL.md | 967 | 500 | +467 |
| plugin/skills/azure-prepare/SKILL.md | 2607 | 500 | +2107 |
| plugin/skills/azure-prepare/references/aspire.md | 2991 | 2000 | +991 |
| plugin/skills/azure-prepare/references/plan-template.md | 2559 | 2000 | +559 |
| plugin/skills/azure-prepare/references/recipes/azd/terraform.md | 3012 | 2000 | +1012 |
| plugin/skills/azure-prepare/references/resources-limits-quotas.md | 3322 | 2000 | +1322 |
| plugin/skills/azure-prepare/references/security.md | 2092 | 2000 | +92 |
| plugin/skills/azure-prepare/references/services/functions/bicep.md | 3065 | 2000 | +1065 |
| plugin/skills/azure-prepare/references/services/functions/templates/SPEC-composable-templates.md | 6187 | 2000 | +4187 |
| plugin/skills/azure-prepare/references/services/functions/templates/recipes/composition.md | 4649 | 2000 | +2649 |
| plugin/skills/azure-prepare/references/services/functions/terraform.md | 3358 | 2000 | +1358 |
| plugin/skills/azure-quotas/SKILL.md | 3445 | 500 | +2945 |
| plugin/skills/azure-quotas/references/commands.md | 2644 | 2000 | +644 |
| plugin/skills/azure-resource-lookup/SKILL.md | 1279 | 500 | +779 |
| plugin/skills/azure-resource-visualizer/SKILL.md | 2054 | 500 | +1554 |
| plugin/skills/azure-storage/SKILL.md | 1180 | 500 | +680 |
| plugin/skills/azure-upgrade/SKILL.md | 1001 | 500 | +501 |
| plugin/skills/azure-upgrade/references/services/functions/automation.md | 3463 | 2000 | +1463 |
| plugin/skills/azure-upgrade/references/services/functions/consumption-to-flex.md | 2773 | 2000 | +773 |
| plugin/skills/azure-validate/SKILL.md | 906 | 500 | +406 |
| plugin/skills/entra-app-registration/SKILL.md | 2068 | 500 | +1568 |
| plugin/skills/entra-app-registration/references/api-permissions.md | 2545 | 2000 | +545 |
| plugin/skills/entra-app-registration/references/cli-commands.md | 2211 | 2000 | +211 |
| plugin/skills/entra-app-registration/references/console-app-example.md | 2752 | 2000 | +752 |
| plugin/skills/entra-app-registration/references/oauth-flows.md | 2375 | 2000 | +375 |
| plugin/skills/microsoft-foundry/SKILL.md | 2870 | 500 | +2370 |
| plugin/skills/microsoft-foundry/foundry-agent/create/create.md | 3016 | 2000 | +1016 |
| plugin/skills/microsoft-foundry/foundry-agent/deploy/deploy.md | 5511 | 2000 | +3511 |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md | 2342 | 2000 | +342 |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md | 4268 | 2000 | +2268 |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | 2547 | 2000 | +547 |
| plugin/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md | 2701 | 2000 | +701 |
| plugin/skills/microsoft-foundry/models/deploy-model/SKILL.md | 1640 | 500 | +1140 |
| plugin/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md | 1739 | 500 | +1239 |
| plugin/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md | 2235 | 500 | +1735 |
| plugin/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md | 3335 | 2000 | +1335 |
| plugin/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md | 1226 | 500 | +726 |
| plugin/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md | 5534 | 2000 | +3534 |
| plugin/skills/microsoft-foundry/quota/quota.md | 2129 | 2000 | +129 |
| plugin/skills/microsoft-foundry/quota/references/capacity-planning.md | 2029 | 2000 | +29 |
| plugin/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md | 2162 | 2000 | +162 |

Consider moving content to `references/` subdirectories.
Automated token analysis. See skill authoring guidelines for best practices.
Closes #1615 | Parent: #1608
2 files: triggers.md (HTTP, Timer, Queue, Blob, Service Bus, Cosmos trigger troubleshooting), durable.md (stuck orchestrations, non-determinism, task hub conflicts).