fix(lorawan): prevent permanent WOULD_BLOCK when duty-cycle backoff_time is zero by hallard · Pull Request #15545 · ARMmbed/mbed-os

hallard · 2026-03-06T09:36:14Z

Summary

In LoRaMac::schedule_tx(), when set_next_channel() returns LORAWAN_STATUS_DUTYCYCLE_RESTRICTED with backoff_time == 0, the original code skipped both starting the backoff timer and setting _can_cancel_tx. However, the caller (process_scheduling_state in LoRaWANStack.cpp) still set tx_ongoing = true and transitioned to DEVICE_STATE_SENDING.

This creates an unrecoverable stuck state:

No timer fires → on_backoff_timer_expiry() never called → handle_scheduling_failure() never called → reset_ongoing_tx() never called
tx_ongoing stays true permanently
All subsequent lorawan.send() calls return LORAWAN_STATUS_WOULD_BLOCK (-1001) forever
stop_sending() cannot recover the state either, because _can_cancel_tx is false, making clear_tx_pipe() return LORAWAN_STATUS_BUSY
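
The chain above can be reproduced with a toy model. This is a minimal sketch, not mbed-os code: `StuckMacModel`, its members, and the `Status` enum are hypothetical stand-ins that only mirror the flags named in this description.

```cpp
#include <cstdint>

// Hypothetical stand-in for the status codes named in the PR text.
enum class Status { Ok, WouldBlock, Busy };

// Toy model of the pre-fix behaviour; not the real LoRaMac class.
struct StuckMacModel {
    bool tx_ongoing = false;
    bool can_cancel_tx = false;
    bool timer_armed = false;

    // Pre-fix duty-cycle-restricted branch: nothing happens when backoff is 0.
    Status schedule_tx_restricted(uint32_t backoff_ms)
    {
        if (backoff_ms != 0) {
            can_cancel_tx = true;
            timer_armed = true;
        }
        return Status::Ok;  // reported even when no timer was armed
    }

    // Caller side: marks the TX as ongoing whenever scheduling reports Ok.
    Status send(uint32_t backoff_ms)
    {
        if (tx_ongoing) {
            return Status::WouldBlock;
        }
        if (schedule_tx_restricted(backoff_ms) == Status::Ok) {
            tx_ongoing = true;
        }
        return Status::Ok;
    }

    // Cancellation is refused unless the TX was marked cancellable.
    Status stop_sending()
    {
        if (!can_cancel_tx) {
            return Status::Busy;
        }
        tx_ongoing = false;
        can_cancel_tx = false;
        timer_armed = false;
        return Status::Ok;
    }
};
```

In this model, one `send(0)` leaves `tx_ongoing` set with no timer armed, after which every `send()` reports `WouldBlock` and `stop_sending()` reports `Busy`.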

The backoff_time == 0 case can occur due to sub-millisecond rounding when the remaining duty-cycle time is nearly zero at the moment set_next_channel() runs.
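
The exact computation that yields zero is not shown in this PR. As a hedged illustration only, assume the remaining off-time is tracked at a finer resolution and converted to whole milliseconds by integer division; the helper name `remaining_ms` and the microsecond unit are assumptions, not taken from mbed-os.

```cpp
#include <cstdint>

// Hypothetical helper: converts a remaining duty-cycle off-time, assumed here
// to be tracked in microseconds, to whole milliseconds.
// Integer division truncates, so any value under 1000 us becomes 0 ms.
inline uint32_t remaining_ms(uint32_t remaining_us)
{
    return remaining_us / 1000U;
}
```

Under this assumption, a channel that is still restricted for a few hundred microseconds would report a backoff of 0 ms while still being flagged as duty-cycle restricted.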

Fix

Enforce a minimum backoff of 1ms so the timer always fires, giving on_backoff_timer_expiry() a path to retry or invoke handle_scheduling_failure() to clean up the state.

// Before
case LORAWAN_STATUS_DUTYCYCLE_RESTRICTED:
    if (backoff_time != 0) {
        tr_debug("DC enforced: Transmitting in %lu ms", backoff_time);
        _can_cancel_tx = true;
        ...
        _lora_time.start(_params.timers.backoff_timer, backoff_time);
    }
    return LORAWAN_STATUS_OK;  // returns OK even when no timer was started!

// After
case LORAWAN_STATUS_DUTYCYCLE_RESTRICTED:
    if (backoff_time == 0) {
        backoff_time = 1;  // ensure timer always fires
    }
    tr_debug("DC enforced: Transmitting in %lu ms", backoff_time);
    _can_cancel_tx = true;
    ...
    _lora_time.start(_params.timers.backoff_timer, backoff_time);
    return LORAWAN_STATUS_OK;
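
The clamp in the "After" branch could equally be written as a small helper. This is a sketch of that alternative, not what the PR does (the PR inlines the check); `MIN_BACKOFF_MS` and `effective_backoff` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constant: the 1 ms floor chosen in this PR.
constexpr uint32_t MIN_BACKOFF_MS = 1;

// Returns the delay to arm the backoff timer with; never zero, so the
// timer is always started and its expiry handler always runs.
inline uint32_t effective_backoff(uint32_t backoff_ms)
{
    return std::max(backoff_ms, MIN_BACKOFF_MS);
}
```

Non-zero values pass through unchanged, so normal duty-cycle behaviour is unaffected.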

Test plan

Verify normal duty-cycle restricted behaviour (non-zero backoff_time) is unchanged
Verify that a simulated backoff_time == 0 / DUTYCYCLE_RESTRICTED case no longer leaves tx_ongoing stuck
Run existing LoRaWAN unit tests: connectivity/lorawan/tests/
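
The first two checks could be simulated along these lines before touching hardware. This is a toy harness under stated assumptions, not part of the existing test suite: `FixedMacModel` and its members are hypothetical, and timer expiry is reduced to clearing the TX state, standing in for either a retry or `handle_scheduling_failure()`.

```cpp
#include <cstdint>

// Toy model of the post-fix behaviour with a fake one-shot timer.
struct FixedMacModel {
    bool timer_armed = false;
    uint32_t armed_delay_ms = 0;
    bool tx_ongoing = false;
    bool can_cancel_tx = false;

    // Post-fix duty-cycle-restricted branch plus the caller's bookkeeping.
    void schedule_restricted(uint32_t backoff_ms)
    {
        if (backoff_ms == 0) {
            backoff_ms = 1;  // ensure the timer always fires
        }
        can_cancel_tx = true;
        timer_armed = true;
        armed_delay_ms = backoff_ms;
        tx_ongoing = true;  // set by the caller in the real stack
    }

    // Simulates the backoff timer firing. Returns false if nothing was armed.
    bool fire_backoff_timer()
    {
        if (!timer_armed) {
            return false;
        }
        timer_armed = false;
        tx_ongoing = false;      // stands in for retry or failure cleanup
        can_cancel_tx = false;
        return true;
    }
};
```

With a zero backoff the model arms the timer for 1 ms and `tx_ongoing` is cleared once it fires; with a non-zero backoff the requested delay is used as-is.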

🤖 Generated with Claude Code

…ime is zero

In schedule_tx(), when set_next_channel() returns DUTYCYCLE_RESTRICTED with backoff_time == 0 (which can occur due to sub-millisecond rounding when remaining duty-cycle time is nearly expired), the original code did not start the backoff timer and did not set _can_cancel_tx. However, the caller (process_scheduling_state) still set tx_ongoing=true and transitioned to DEVICE_STATE_SENDING.

With no timer scheduled to retry, on_backoff_timer_expiry() never fires, handle_scheduling_failure() is never called, and reset_ongoing_tx() is never reached. The MAC is permanently stuck with tx_ongoing=true, causing all subsequent lorawan.send() calls to return LORAWAN_STATUS_WOULD_BLOCK (-1001) forever. Additionally, stop_sending() cannot recover the state because _can_cancel_tx is false, making clear_tx_pipe() return BUSY.

Fix: enforce a minimum backoff of 1ms so the timer always fires regardless of how small the computed remaining time is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bug 1 - LoRaMac::disconnect() does not clear tx_ongoing: All timers (backoff, RX windows, ACK timeout) are stopped in disconnect(), which prevents the state machine from ever calling reset_ongoing_tx(). If a TX was in-flight at disconnect time, tx_ongoing remains true. After reconnect, _lw_session.active becomes true again but tx_ongoing is still true, so every subsequent lorawan.send() returns LORAWAN_STATUS_WOULD_BLOCK (-1001) permanently. Fix: call reset_ongoing_tx(true) at end of disconnect().

Bug 2 - QoS nb_trans retry leaves tx_ongoing stuck on re-send failure: When the network server configures nb_trans > LORAWAN_DEFAULT_QOS, post_process_tx_no_reception() queues a new state_controller(SCHEDULING) call via _queue->call() and returns early, leaving tx_ongoing=true from the first TX. If the queued scheduling fires but send_ongoing_tx() fails with a direct error (e.g. LORAWAN_STATUS_NO_CHANNEL_FOUND), process_scheduling_state silently ignores the failure because the _queue->call() return value is discarded, tx_ongoing stays true, and there is no path to reset_ongoing_tx(). Fix: in process_scheduling_state(), detect the case where send_ongoing_tx() failed while tx_ongoing was already true and explicitly clean up the state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
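
Both bugs in this second commit follow the same pattern: `tx_ongoing` is left set with nothing remaining that could clear it. The sketch below models each one with toy structs. It is not mbed-os code; `SessionModel`, `RetryModel`, and the `apply_fix` switches are hypothetical and exist only to contrast the behaviour with and without the described fixes.

```cpp
// Bug 1: disconnect() stops all timers but leaves tx_ongoing set.
struct SessionModel {
    bool active = false;
    bool tx_ongoing = false;
    bool timers_running = false;

    void connect() { active = true; }

    // Returns false where the real stack would return WOULD_BLOCK.
    bool send()
    {
        if (!active || tx_ongoing) {
            return false;
        }
        tx_ongoing = true;
        timers_running = true;
        return true;
    }

    void disconnect(bool apply_fix)
    {
        timers_running = false;  // nothing left that could clear tx_ongoing
        active = false;
        if (apply_fix) {
            tx_ongoing = false;  // stands in for reset_ongoing_tx(true)
        }
    }
};

// Bug 2: a queued re-send fails while tx_ongoing is still set from the
// first transmission.
struct RetryModel {
    bool tx_ongoing = true;  // left set by the first TX

    void process_scheduling(bool resend_ok, bool apply_fix)
    {
        const bool was_ongoing = tx_ongoing;
        if (!resend_ok && was_ongoing && apply_fix) {
            tx_ongoing = false;  // explicit cleanup on re-send failure
        }
    }
};
```

Without the fix, `SessionModel` rejects every `send()` after a reconnect and `RetryModel` keeps `tx_ongoing` set after a failed re-send; with the fix, both return to an idle state.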

hallard and others added 2 commits March 6, 2026 10:35

hallard wants to merge 2 commits into ARMmbed:master from hallard:lorawan/fix_backoff
