Gap-7: Functions Diagnose (B → A) — Trigger + Durable troubleshooting#1641

Open
paulyuk wants to merge 1 commit into main from pass-equity-gap-7

paulyuk commented Apr 1, 2026

Closes #1615 | Parent: #1608

2 files: triggers.md (HTTP, Timer, Queue, Blob, Service Bus, Cosmos trigger troubleshooting), durable.md (stuck orchestrations, non-determinism, task hub conflicts).

Starting assessment — domain experts should review.

Closes #1615 | Parent: #1608
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 1, 2026 17:37

Copilot AI left a comment

Pull request overview

Adds new Azure Functions troubleshooting references focused on trigger-specific diagnostics and Durable Functions failure modes, supporting Gap-7 “Functions Diagnose (B → A)” coverage and closing #1615.

Changes:

  • Added trigger-specific troubleshooting guide (HTTP, Timer, Queue, Blob, Service Bus, Cosmos DB) with KQL and CLI diagnostics.
  • Added Durable Functions troubleshooting guide for stuck orchestrations, non-determinism, task hub conflicts, and retry/timer patterns.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| plugin/skills/azure-diagnostics/references/functions/triggers.md | New trigger-focused troubleshooting matrix with KQL/CLI steps per trigger type |
| plugin/skills/azure-diagnostics/references/functions/durable.md | New Durable Functions troubleshooting sections for common operational failure scenarios |

Comment on lines +5 to +9
| Symptom | Cause | Fix |
|---------|-------|-----|
| 5xx on every request | Function runtime crash | Check `host.json` for misconfig; review App Insights `exceptions` table |
| 401 Unauthorized | Auth level mismatch | Verify `authLevel` in function.json matches request (function key vs anonymous) |
| 404 Not Found | Route prefix misconfigured | Check `host.json` `extensions.http.routePrefix` — default is `api` |

Copilot AI Apr 1, 2026

Markdown tables are using a double leading pipe (||), which renders as an empty first column and can break table formatting in many Markdown renderers. Use single leading/trailing pipes for each row (e.g., | Symptom | Cause | Fix |).

Copilot uses AI. Check for mistakes.
| 401 Unauthorized | Auth level mismatch | Verify `authLevel` in function.json matches request (function key vs anonymous) |
| 404 Not Found | Route prefix misconfigured | Check `host.json` `extensions.http.routePrefix` — default is `api` |
| CORS blocked | Missing allowed origins | `az functionapp cors add -n APP -g RG --allowed-origins "https://DOMAIN"` |
| 408 / timeout | Execution exceeds limit | Consumption plan: 5 min max. Increase `functionTimeout` in host.json or switch to Premium |

Copilot AI Apr 1, 2026

The guidance for HTTP timeouts on Consumption is misleading on two counts. First, the Consumption `functionTimeout` is not a fixed 5 minutes: 5 minutes is the default, and it can be raised to a maximum of 10 minutes. Second, HTTP requests can time out at the front end regardless of `functionTimeout`, because an HTTP-triggered function has roughly 230 seconds to respond before the Azure Load Balancer idle timeout closes the connection. Consider updating this row to distinguish (1) the function execution timeout (`functionTimeout`) from (2) HTTP client/gateway timeouts, and to recommend async patterns (e.g., 202 + status endpoint or Durable HTTP APIs) when long-running work is required.

Suggested change
| 408 / timeout | Execution exceeds limit | Consumption plan: 5 min max. Increase `functionTimeout` in host.json or switch to Premium |
| 408 / timeout | Long-running execution or HTTP client/gateway idle timeout | Check `functionTimeout` in host.json (execution timeout) and client/front-end idle limits. For long-running work use async patterns (202 + status endpoint or Durable HTTP APIs), or move to Premium for longer sync executions. |

Comment on lines +38 to +40
| Timer fires twice | Multiple instances running | Set `"isSingleton": true` in function.json or use `host.json` singleton lock |
| Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan |
| Timer drift after deploy | Missed schedule catch-up | Set `UseMonitor: true` (default) — runtime tracks missed executions in storage |

Copilot AI Apr 1, 2026

The suggested Timer trigger settings look incorrect/inconsistent with how Functions config is typically expressed: isSingleton is not a common/standard function.json setting for Timer triggers, and UseMonitor: true is shown in a non-JSON form and with casing that may not match configuration/attribute usage (useMonitor in JSON vs UseMonitor in C# attribute). Recommend updating this row to reference the supported configuration mechanism(s) (e.g., host.json singleton settings / language-specific singleton attribute) and show the correct JSON property name/shape for the timer monitor setting.

Suggested change
| Timer fires twice | Multiple instances running | Set `"isSingleton": true` in function.json or use `host.json` singleton lock |
| Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan |
| Timer drift after deploy | Missed schedule catch-up | Set `UseMonitor: true` (default) — runtime tracks missed executions in storage |
| Timer fires twice | Multiple instances running | Configure singleton behavior via `host.json` (singleton settings) or language-specific singleton/lock attributes so only one instance runs the timer |
| Missed timer execution | App was stopped / scaled to zero | Enable Always On (`az functionapp config set -n APP -g RG --always-on true`) — requires App Service plan |
| Timer drift after deploy | Missed schedule catch-up | Ensure `"useMonitor": true` in the timer trigger binding in `function.json` (default) — runtime tracks missed executions in storage |

Comment on lines +134 to +135
# Peek dead-letter messages
az servicebus queue authorization-rule list -n QUEUE --namespace-name NS -g RG

Copilot AI Apr 1, 2026

This command does not 'peek dead-letter messages'—it lists authorization rules for the queue. Replace with an approach that actually inspects DLQ message content (e.g., Service Bus Explorer / SDK-based peek/receive, or document that Azure CLI doesn’t directly support peeking DLQ messages and provide an alternative).

Suggested change
# Peek dead-letter messages
az servicebus queue authorization-rule list -n QUEUE --namespace-name NS -g RG
# View dead-letter message contents
# Note: Azure CLI does not currently support peeking DLQ message bodies directly.
# Use one of the following instead:
# - Service Bus Explorer in the Azure portal (Service Bus namespace -> Queues -> <QUEUE> -> Dead-letter)
# - An Azure Service Bus SDK (for example, .NET, Java, Python, or JavaScript) with a receiver scoped to:
# "<QUEUE>/$DeadLetterQueue"
# and using a "peek" or "receive" operation to inspect messages.
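
As one possible shape for the SDK-based alternative, here is a minimal Python sketch that peeks (without removing) dead-lettered messages using the `azure-servicebus` package. It assumes that package is installed and that the caller supplies a connection string and queue name. `dead_letter_path` and `peek_dead_letters` are illustrative names, not part of the PR.

```python
from typing import Dict, List


def dead_letter_path(queue_name: str) -> str:
    """Entity path of a queue's dead-letter sub-queue, as quoted in the suggestion."""
    return f"{queue_name}/$DeadLetterQueue"


def peek_dead_letters(conn_str: str, queue_name: str, max_count: int = 10) -> List[Dict]:
    """Peek at dead-lettered messages; peeking does not lock or remove them."""
    # Imported lazily so the helper above works without the SDK installed.
    from azure.servicebus import ServiceBusClient, ServiceBusSubQueue

    with ServiceBusClient.from_connection_string(conn_str) as client:
        # sub_queue selects the dead-letter sub-queue of the named queue.
        receiver = client.get_queue_receiver(
            queue_name=queue_name,
            sub_queue=ServiceBusSubQueue.DEAD_LETTER,
        )
        with receiver:
            messages = receiver.peek_messages(max_message_count=max_count)
            return [
                {
                    "message_id": m.message_id,
                    "reason": m.dead_letter_reason,
                    "description": m.dead_letter_error_description,
                    "body": str(m),
                }
                for m in messages
            ]
```

The dead-letter reason and description are usually the quickest signal for why a trigger gave up on a message (for example, exceeding the maximum delivery count).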

Comment on lines +45 to +47
# Purge completed/terminated instances older than 7 days
func durable purge-history --connection-string-setting AzureWebJobsStorage \
  --task-hub-name TASKHUB --created-before "$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)"

Copilot AI Apr 1, 2026

The date -v-7d flag is BSD/macOS-specific and will fail on most Linux CI/containers. Consider either (a) documenting separate commands for macOS vs Linux, or (b) using a more portable approach (e.g., provide both date -u -d '7 days ago' ... and BSD variant).

Suggested change
# Purge completed/terminated instances older than 7 days
func durable purge-history --connection-string-setting AzureWebJobsStorage \
  --task-hub-name TASKHUB --created-before "$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)"
# Purge completed/terminated instances older than 7 days (portable for Linux/macOS)
# Compute cutoff timestamp: try GNU date first, then BSD/macOS date
if CUTOFF=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null); then
  :
else
  CUTOFF=$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
fi
func durable purge-history --connection-string-setting AzureWebJobsStorage \
  --task-hub-name TASKHUB --created-before "$CUTOFF"
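
Another way to sidestep the GNU/BSD `date` difference is to compute the cutoff in a language whose date arithmetic behaves the same on every platform. The sketch below is one such option in Python, using only the standard library. `purge_cutoff` is a hypothetical helper name, and the output format matches the `%Y-%m-%dT%H:%M:%SZ` pattern used in the commands above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_cutoff(days: int = 7, now: Optional[datetime] = None) -> str:
    """Return the UTC timestamp `days` ago, formatted for --created-before."""
    # `now` is injectable so the function can be checked against a fixed clock.
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (current - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Printing `purge_cutoff()` from a small script and capturing the output in `CUTOFF` would replace the `if`/`else` branch, at the cost of requiring a Python interpreter on the machine.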

Comment on lines +93 to +98
# Check current task hub name
az functionapp config appsettings list -n APP -g RG \
  --query "[?name=='AzureWebJobsStorage']" -o table

# Inspect host.json for hub name
az webapp config show -n APP -g RG --query "siteConfig.appSettings"

Copilot AI Apr 1, 2026

These diagnostics don’t actually identify the Durable task hub name: querying AzureWebJobsStorage only returns the storage setting, and az webapp config show ... appSettings surfaces app settings (not host.json). Update to query the actual hub name configuration source (e.g., the durable hub setting in app settings if set, or instruct checking host.json in the repo/deployed content via Kudu/zip deploy artifacts) and make the commands match what they claim to retrieve.

Suggested change
# Check current task hub name
az functionapp config appsettings list -n APP -g RG \
  --query "[?name=='AzureWebJobsStorage']" -o table
# Inspect host.json for hub name
az webapp config show -n APP -g RG --query "siteConfig.appSettings"
# Check if a task hub name override is set via app settings
az functionapp config appsettings list -n APP -g RG \
  --query "[?name=='AzureFunctionsJobHost__extensions__durableTask__hubName']" -o table
# If no override is set, the hub name comes from host.json:
# - Inspect host.json in your source repo, or
# - Use Kudu (Advanced Tools) / zip-deployed content to view host.json in the deployed app
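
The lookup order described in the suggestion (app-setting override first, then `host.json`) can be expressed as a small helper. This is a sketch under assumptions: `resolve_hub_name` is a hypothetical function, the app settings are passed in as a plain mapping (for example, parsed from the CLI's JSON output), and `hubName` is read from `extensions.durableTask` in a version 2.0 `host.json`.

```python
import json
from typing import Mapping, Optional

# App setting that overrides the host.json value, as named in the suggestion.
OVERRIDE_SETTING = "AzureFunctionsJobHost__extensions__durableTask__hubName"


def resolve_hub_name(host_json_text: str, app_settings: Mapping[str, str]) -> Optional[str]:
    """Return the configured task hub name, or None if neither source sets one."""
    override = app_settings.get(OVERRIDE_SETTING)
    if override:
        return override
    host = json.loads(host_json_text)
    # Missing sections are treated as "not configured" rather than as errors.
    return host.get("extensions", {}).get("durableTask", {}).get("hubName")
```

A `None` result means the name is not set explicitly in either place, so the runtime's default applies. A value wrapped in percent signs (for example `%MyTaskHub%`) is an app-setting reference that this sketch returns unresolved.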

--query "[?starts_with(name, 'TASKHUB')]" --output table
```

> 💡 **Tip:** Task hub names map to storage tables. The tables are named `<HubName>History` and `<HubName>Instances`.

Copilot AI Apr 1, 2026

This tip is incomplete for the Azure Storage provider: Durable typically uses multiple tables/queues beyond just History and Instances (e.g., control/work-item/partition-related artifacts, depending on version/provider). Suggest revising to say these are examples and list the full/common set (or point to the official Durable storage schema reference) to avoid misleading operators troubleshooting missing/corrupted artifacts.

Suggested change
> 💡 **Tip:** Task hub names map to storage tables. The tables are named `<HubName>History` and `<HubName>Instances`.
> 💡 **Tip:** For the Azure Storage provider, a task hub name is used as a prefix for multiple tables and queues. Common examples include the `<HubName>History` and `<HubName>Instances` tables, as well as control and work-item queues (for example, `<HubName>-control` and `<HubName>-workitems`). See the Durable Functions storage provider documentation for the full, up-to-date schema: https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-storage-providers#azure-storage

|-------------|---------------|---------------------|
| `DateTime.Now` / `DateTime.UtcNow` | Different value on replay | `context.CurrentUtcDateTime` |
| `Guid.NewGuid()` | Different GUID on replay | `context.NewGuid()` |
| `Thread.Sleep` / `Task.Delay` | Blocks orchestrator thread | `context.CreateTimer()` |

Copilot AI Apr 1, 2026

Task.Delay does not block a thread in the usual sense; the real issue is that orchestrators must be deterministic and use durable timers so the wait is replay-safe and persisted. Consider rewording the 'Why It Breaks' cell to focus on determinism/replay/persistence rather than thread blocking (which is accurate for Thread.Sleep but misleading for Task.Delay).

Suggested change
| `Thread.Sleep` / `Task.Delay` | Blocks orchestrator thread | `context.CreateTimer()` |
| `Thread.Sleep` / `Task.Delay` | Not replay-safe; uses non-durable timers so waits aren't persisted or deterministic | `context.CreateTimer()` |
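
To make the replay argument concrete, here is a toy Python model of why context-provided values are safe where ambient ones are not. It is not the Durable Functions API: `ReplayContext`, `new_guid`, and the way the GUID is derived (a name-based UUID over instance ID, orchestration time, and a counter) are assumptions chosen only to show deterministic behavior across replays.

```python
import uuid
from datetime import datetime, timezone
from typing import List


class ReplayContext:
    """Toy stand-in for an orchestration context (hypothetical, for illustration)."""

    def __init__(self, instance_id: str, current_utc_datetime: datetime) -> None:
        self.instance_id = instance_id
        # Taken from recorded history, so it is identical on every replay.
        self.current_utc_datetime = current_utc_datetime
        self._guid_counter = 0

    def new_guid(self) -> uuid.UUID:
        # Name-based UUID: the same inputs always produce the same value.
        name = f"{self.instance_id}_{self.current_utc_datetime.isoformat()}_{self._guid_counter}"
        self._guid_counter += 1
        return uuid.uuid5(uuid.NAMESPACE_URL, name)


def orchestrate(context: ReplayContext) -> List[str]:
    """Orchestrator body that only uses values supplied by the context."""
    return [str(context.new_guid()), str(context.new_guid())]


def run_once() -> List[str]:
    """Simulate one execution (or replay) of the same orchestration instance."""
    started = datetime(2026, 4, 1, 17, 37, tzinfo=timezone.utc)
    return orchestrate(ReplayContext("instance-1", started))
```

Calling `run_once()` twice yields the same list both times, whereas two calls to `uuid.uuid4()` or `datetime.now()` would differ. That difference is exactly what breaks when history recorded on the first execution no longer matches the values produced on replay.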

github-actions bot commented Apr 1, 2026

🔍 Token Analysis Report

@github-copilot-for-azure/scripts@1.0.0 tokens
node --import tsx src/tokens/cli.ts compare --base origin/main --head HEAD --markdown

fatal: path 'plugin/skills/azure-diagnostics/references/functions/durable.md' exists on disk, but not in 'origin/main'
fatal: path 'plugin/skills/azure-diagnostics/references/functions/triggers.md' exists on disk, but not in 'origin/main'

📊 Token Change Report

Comparing origin/main → HEAD

Summary

| Metric | Value |
|--------|-------|
| 📈 Total Change | +3,319 tokens (0%) |
| Before | 0 tokens |
| After | 3,319 tokens |
| Files Changed | 2 |

Changed Files

| File | Before | After | Change |
|------|--------|-------|--------|
| plugin/skills/azure-diagnostics/references/functions/triggers.md | - | 1,743 | +1743 |
| plugin/skills/azure-diagnostics/references/functions/durable.md | - | 1,576 | +1576 |

@github-copilot-for-azure/scripts@1.0.0 tokens
node --import tsx src/tokens/cli.ts check --markdown

📊 Token Limit Check Report

Checked: 526 files
Exceeded: 69 files

⚠️ Files Exceeding Token Limits

| File | Tokens | Limit | Over By |
|------|--------|-------|---------|
| .github/skills/analyze-test-run/SKILL.md | 2471 | 500 | +1971 |
| .github/skills/file-test-bug/SKILL.md | 628 | 500 | +128 |
| .github/skills/sensei/README.md | 3531 | 2000 | +1531 |
| .github/skills/sensei/SKILL.md | 2382 | 500 | +1882 |
| .github/skills/sensei/references/EXAMPLES.md | 3707 | 2000 | +1707 |
| .github/skills/sensei/references/LOOP.md | 4181 | 2000 | +2181 |
| .github/skills/sensei/references/SCORING.md | 3927 | 2000 | +1927 |
| .github/skills/skill-authoring/SKILL.md | 817 | 500 | +317 |
| plugin/skills/appinsights-instrumentation/SKILL.md | 908 | 500 | +408 |
| plugin/skills/azure-ai/SKILL.md | 817 | 500 | +317 |
| plugin/skills/azure-aigateway/SKILL.md | 1258 | 500 | +758 |
| plugin/skills/azure-aigateway/references/policies.md | 2342 | 2000 | +342 |
| plugin/skills/azure-cloud-migrate/references/services/functions/lambda-to-functions.md | 2600 | 2000 | +600 |
| plugin/skills/azure-cloud-migrate/references/services/functions/runtimes/javascript.md | 2181 | 2000 | +181 |
| plugin/skills/azure-compliance/SKILL.md | 1185 | 500 | +685 |
| plugin/skills/azure-compute/SKILL.md | 755 | 500 | +255 |
| plugin/skills/azure-compute/workflows/vm-recommender/vm-recommender.md | 2393 | 2000 | +393 |
| plugin/skills/azure-compute/workflows/vm-troubleshooter/references/cannot-connect-to-vm.md | 7308 | 2000 | +5308 |
| plugin/skills/azure-cost-optimization/SKILL.md | 3900 | 500 | +3400 |
| plugin/skills/azure-deploy/SKILL.md | 1562 | 500 | +1062 |
| plugin/skills/azure-diagnostics/SKILL.md | 1132 | 500 | +632 |
| plugin/skills/azure-diagnostics/aks-troubleshooting/networking.md | 2147 | 2000 | +147 |
| plugin/skills/azure-diagnostics/aks-troubleshooting/node-issues.md | 2003 | 2000 | +3 |
| plugin/skills/azure-enterprise-infra-planner/SKILL.md | 991 | 500 | +491 |
| plugin/skills/azure-enterprise-infra-planner/references/constraints/compute-apps.md | 2022 | 2000 | +22 |
| plugin/skills/azure-hosted-copilot-sdk/SKILL.md | 608 | 500 | +108 |
| plugin/skills/azure-kubernetes/SKILL.md | 2266 | 500 | +1766 |
| plugin/skills/azure-kusto/SKILL.md | 2149 | 500 | +1649 |
| plugin/skills/azure-messaging/SKILL.md | 967 | 500 | +467 |
| plugin/skills/azure-prepare/SKILL.md | 2607 | 500 | +2107 |
| plugin/skills/azure-prepare/references/aspire.md | 2991 | 2000 | +991 |
| plugin/skills/azure-prepare/references/plan-template.md | 2559 | 2000 | +559 |
| plugin/skills/azure-prepare/references/recipes/azd/terraform.md | 3012 | 2000 | +1012 |
| plugin/skills/azure-prepare/references/resources-limits-quotas.md | 3322 | 2000 | +1322 |
| plugin/skills/azure-prepare/references/security.md | 2092 | 2000 | +92 |
| plugin/skills/azure-prepare/references/services/functions/bicep.md | 3065 | 2000 | +1065 |
| plugin/skills/azure-prepare/references/services/functions/templates/SPEC-composable-templates.md | 6187 | 2000 | +4187 |
| plugin/skills/azure-prepare/references/services/functions/templates/recipes/composition.md | 4649 | 2000 | +2649 |
| plugin/skills/azure-prepare/references/services/functions/terraform.md | 3358 | 2000 | +1358 |
| plugin/skills/azure-quotas/SKILL.md | 3445 | 500 | +2945 |
| plugin/skills/azure-quotas/references/commands.md | 2644 | 2000 | +644 |
| plugin/skills/azure-resource-lookup/SKILL.md | 1279 | 500 | +779 |
| plugin/skills/azure-resource-visualizer/SKILL.md | 2054 | 500 | +1554 |
| plugin/skills/azure-storage/SKILL.md | 1180 | 500 | +680 |
| plugin/skills/azure-upgrade/SKILL.md | 1001 | 500 | +501 |
| plugin/skills/azure-upgrade/references/services/functions/automation.md | 3463 | 2000 | +1463 |
| plugin/skills/azure-upgrade/references/services/functions/consumption-to-flex.md | 2773 | 2000 | +773 |
| plugin/skills/azure-validate/SKILL.md | 906 | 500 | +406 |
| plugin/skills/entra-app-registration/SKILL.md | 2068 | 500 | +1568 |
| plugin/skills/entra-app-registration/references/api-permissions.md | 2545 | 2000 | +545 |
| plugin/skills/entra-app-registration/references/cli-commands.md | 2211 | 2000 | +211 |
| plugin/skills/entra-app-registration/references/console-app-example.md | 2752 | 2000 | +752 |
| plugin/skills/entra-app-registration/references/oauth-flows.md | 2375 | 2000 | +375 |
| plugin/skills/microsoft-foundry/SKILL.md | 2870 | 500 | +2370 |
| plugin/skills/microsoft-foundry/foundry-agent/create/create.md | 3016 | 2000 | +1016 |
| plugin/skills/microsoft-foundry/foundry-agent/deploy/deploy.md | 5511 | 2000 | +3511 |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md | 2342 | 2000 | +342 |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md | 4268 | 2000 | +2268 |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | 2547 | 2000 | +547 |
| plugin/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md | 2701 | 2000 | +701 |
| plugin/skills/microsoft-foundry/models/deploy-model/SKILL.md | 1640 | 500 | +1140 |
| plugin/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md | 1739 | 500 | +1239 |
| plugin/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md | 2235 | 500 | +1735 |
| plugin/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md | 3335 | 2000 | +1335 |
| plugin/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md | 1226 | 500 | +726 |
| plugin/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md | 5534 | 2000 | +3534 |
| plugin/skills/microsoft-foundry/quota/quota.md | 2129 | 2000 | +129 |
| plugin/skills/microsoft-foundry/quota/references/capacity-planning.md | 2029 | 2000 | +29 |
| plugin/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md | 2162 | 2000 | +162 |

Consider moving content to references/ subdirectories.


Automated token analysis. See skill authoring guidelines for best practices.

Development

Successfully merging this pull request may close these issues.

Gap-7: Functions Diagnose (B → A) — Trigger/binding/Durable troubleshooting + tests

2 participants