Skip to content

fix(azservicebus): set minimum 60s server-timeout for management operations#26499

Open
EldertGrootenboer wants to merge 2 commits intoAzure:mainfrom
EldertGrootenboer:fix/peek-messages-server-timeout-26421
Open

fix(azservicebus): set minimum 60s server-timeout for management operations#26499
EldertGrootenboer wants to merge 2 commits intoAzure:mainfrom
EldertGrootenboer:fix/peek-messages-server-timeout-26421

Conversation

@EldertGrootenboer
Copy link
Copy Markdown

Problem

Receiver.PeekMessages() consistently times out with context deadline exceeded on specific queues (#26421), while Receiver.ReceiveMessages() on the same queue succeeds quickly.

Root cause: The Go SDK passes the user's context deadline directly as the server-timeout AMQP application property on management RPC operations. The .NET SDK sends a fixed 60,000 ms and Java sends 59,000 ms. With typical user contexts of 10-20 seconds, the service doesn't have enough time to complete the partition scatter-gather that PeekMessages triggers on partitioned entities.

The service-side PeekRequestAsyncResult iterates all partitions sequentially, forwarding each peek to the partition's broker via an internal RPC link. Each hop uses RemainingTime() as its deadline, so slow partitions cascade into timeout exhaustion. ReceiveMessages bypasses this entirely by using a direct data link to a single partition.

Fix

Added a defaultServerTimeout constant (60 seconds) and serverTimeoutMillis(ctx) helper that returns max(ctx_remaining, 60s). Applied to all management operation enforcement points:

  • mgmt.goPeekMessages, ScheduleMessages, CancelScheduledMessages (explicit)
  • rpc.go — RPC fallback covering ReceiveDeferred, RenewLocks, RenewSessionLock, GetSessionState, SetSessionState, SettleOnMgmtLink

The user's context still governs client-side cancellation (the select on ctx.Done() in rpc.go is unchanged). The fix only affects the server-timeout value sent to the service, giving it sufficient headroom for partition scatter-gather.

Design decisions

  • 60s floor (not fixed): Unlike .NET's fixed 60s, Go uses max(remaining, 60s). This respects users who set longer timeouts (e.g., 120s) while establishing a floor that matches cross-SDK parity.
  • All management ops (not just PeekMessages): All management operations go through the same RPC path and could be affected by partition scatter-gather. Scoping to PeekMessages only would leave others vulnerable.
  • Dual-key guard in rpc.go: ScheduleMessages/CancelScheduledMessages use the vendor-prefixed key com.microsoft:server-timeout while PeekMessages and the RPC fallback use server-timeout. The guard now checks both keys to avoid sending a duplicate property.
  • No-deadline contexts now send explicit 60s: Previously, no server-timeout was sent when ctx had no deadline. Now 60000 is always sent, matching .NET/Java behavior.

Testing

6 table-driven unit tests for serverTimeoutMillis covering: no deadline, short deadline (10s floor), very short (1s floor), long deadline (120s respected), exact boundary (60s), and expired context. All pass.

Full module suite: 10/10 test packages pass, go vet clean, gofmt clean.

Cross-SDK Parity

SDK server-timeout Source
.NET Fixed 60,000 ms ServiceBusRetryOptions.cs, AmqpRequestMessage.cs
Java Fixed 59,000 ms ServiceBusConstants.java, MessageUtils.java
Go (before) User's ctx deadline
Go (after) max(ctx_remaining, 60,000) ms mgmt.go:serverTimeoutMillis()

Fixes #26421

…ations (Azure#26421)

- Add defaultServerTimeout (60s) constant and serverTimeoutMillis() helper
  that returns max(ctx remaining, 60s), matching .NET/Java SDK defaults
- Apply to PeekMessages, ScheduleMessages, CancelScheduledMessages (mgmt.go)
  and RPC fallback for all other management operations (rpc.go)
- Add dual-key guard in rpc.go to avoid duplicate server-timeout when
  operations use the vendor-prefixed com.microsoft:server-timeout key
- Add 6 unit tests for serverTimeoutMillis covering all edge cases
- Previously the user's context deadline was passed directly as server-timeout,
  causing premature timeouts on partitioned entities where the service needs
  up to 60s for partition scatter-gather

Fixes Azure#26421
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts how server-timeout is computed and applied for Service Bus RPC-based operations to reduce premature timeouts (notably Receiver.PeekMessages() on partitioned entities), aligning behavior more closely with other language SDK defaults.

Changes:

  • Introduces a defaultServerTimeout (60s) and serverTimeoutMillis(ctx) helper that uses max(ctx_remaining, 60s) (or 60s if no deadline).
  • Applies the computed server-timeout to management operations (explicitly in mgmt.go and as an RPC fallback in rpc.go, with a dual-key guard).
  • Adds unit tests covering serverTimeoutMillis behavior across deadline/no-deadline scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
sdk/messaging/azservicebus/internal/rpc.go Ensures RPC requests include a server-timeout (unless already set), with a guard against duplicate vendor-prefixed keys.
sdk/messaging/azservicebus/internal/mgmt.go Adds the 60s floor helper and applies it to management calls like Peek/Schedule/Cancel.
sdk/messaging/azservicebus/internal/mgmt_test.go Adds table-driven unit tests for serverTimeoutMillis.
sdk/messaging/azservicebus/CHANGELOG.md Documents the behavioral change for management operations’ server-timeout.

…comment wording

- Scope rpc.go server-timeout fallback to management operations only
  (com.microsoft: prefix), excluding CBS put-token requests
- Update comment wording per review: 'aligns with' instead of 'matches'
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Service Bus Queue PeekMessages always timeouts in specific queues

3 participants