fix(lorawan): prevent permanent WOULD_BLOCK when duty-cycle backoff_time is zero #15545
Open
hallard wants to merge 2 commits into ARMmbed:master from
…ime is zero

In schedule_tx(), when set_next_channel() returns DUTYCYCLE_RESTRICTED with backoff_time == 0 (which can occur due to sub-millisecond rounding when the remaining duty-cycle time is nearly expired), the original code did not start the backoff timer and did not set _can_cancel_tx. However, the caller (process_scheduling_state) still set tx_ongoing=true and transitioned to DEVICE_STATE_SENDING.

With no timer scheduled to retry, on_backoff_timer_expiry() never fires, handle_scheduling_failure() is never called, and reset_ongoing_tx() is never reached. The MAC is permanently stuck with tx_ongoing=true, causing all subsequent lorawan.send() calls to return LORAWAN_STATUS_WOULD_BLOCK (-1001) forever. Additionally, stop_sending() cannot recover the state because _can_cancel_tx is false, making clear_tx_pipe() return BUSY.

Fix: enforce a minimum backoff of 1 ms so the timer always fires regardless of how small the computed remaining time is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
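The failure mode and the clamp described in this commit message can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `MacModel`, `ChannelStatus`, the `apply_fix` switch, and the boolean return of `clear_tx_pipe()` are hypothetical stand-ins invented for illustration, not the actual Mbed OS `LoRaMac` code or signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the status returned by set_next_channel().
enum class ChannelStatus { OK, DUTYCYCLE_RESTRICTED };

// Simplified host-side model of the MAC state discussed above (not Mbed OS code).
struct MacModel {
    bool timer_armed = false;    // backoff timer has been scheduled
    bool can_cancel_tx = false;  // models _can_cancel_tx
    std::uint32_t timer_ms = 0;  // delay the timer was armed with

    // Models the DUTYCYCLE_RESTRICTED branch of schedule_tx().
    // apply_fix is an illustration-only switch between old and new behaviour.
    void schedule_tx(ChannelStatus status, std::uint32_t backoff_ms, bool apply_fix) {
        if (status != ChannelStatus::DUTYCYCLE_RESTRICTED) {
            return;  // nothing to back off from in this model
        }
        if (apply_fix) {
            // Enforce a 1 ms minimum so the timer is always armed.
            backoff_ms = std::max<std::uint32_t>(backoff_ms, 1);
        }
        if (backoff_ms != 0) {
            timer_armed = true;
            can_cancel_tx = true;
            timer_ms = backoff_ms;
        }
        // With backoff_ms == 0 and no fix, neither flag is set: the stuck state.
    }

    // Models clear_tx_pipe(): cancelling only works while can_cancel_tx is set.
    // Returning false stands for the BUSY outcome described above.
    bool clear_tx_pipe() {
        if (!can_cancel_tx) {
            return false;
        }
        timer_armed = false;
        can_cancel_tx = false;
        return true;
    }
};
```

Calling `schedule_tx(ChannelStatus::DUTYCYCLE_RESTRICTED, 0, false)` on a fresh model leaves the timer unarmed and `clear_tx_pipe()` refusing to cancel, while the same call with `apply_fix` set to `true` arms a 1 ms timer. Non-zero backoff values pass through the clamp unchanged.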
Bug 1 - LoRaMac::disconnect() does not clear tx_ongoing:

All timers (backoff, RX windows, ACK timeout) are stopped in disconnect(), which prevents the state machine from ever calling reset_ongoing_tx(). If a TX was in-flight at disconnect time, tx_ongoing remains true. After reconnect, _lw_session.active becomes true again but tx_ongoing is still true, so every subsequent lorawan.send() returns LORAWAN_STATUS_WOULD_BLOCK (-1001) permanently.

Fix: call reset_ongoing_tx(true) at end of disconnect().

Bug 2 - QoS nb_trans retry leaves tx_ongoing stuck on re-send failure:

When the network server configures nb_trans > LORAWAN_DEFAULT_QOS, post_process_tx_no_reception() queues a new state_controller(SCHEDULING) call via _queue->call() and returns early, leaving tx_ongoing=true from the first TX. If the queued scheduling fires but send_ongoing_tx() fails with a direct error (e.g. LORAWAN_STATUS_NO_CHANNEL_FOUND), process_scheduling_state silently ignores the failure because the _queue->call() return value is discarded, tx_ongoing stays true, and there is no path to reset_ongoing_tx().

Fix: in process_scheduling_state(), detect the case where send_ongoing_tx() failed while tx_ongoing was already true and explicitly clean up the state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
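Both fixes in this commit come down to giving tx_ongoing a guaranteed reset path. The sketch below models that idea in isolation. It is a sketch only: `StackModel`, `SendStatus`, and `can_send()` are hypothetical names, the real `reset_ongoing_tx(true)` takes an argument that this model omits, and the actual patch may structure the cleanup differently.

```cpp
#include <cassert>

// Hypothetical stand-in for the result of send_ongoing_tx().
enum class SendStatus { OK, NO_CHANNEL_FOUND };

// Simplified model of the stack state; not the Mbed OS LoRaWANStack class.
struct StackModel {
    bool session_active = false;  // models _lw_session.active
    bool tx_ongoing = false;      // models tx_ongoing

    void reset_ongoing_tx() { tx_ongoing = false; }

    // Bug 1: disconnect stops every timer, so no later callback can clear
    // the flag. Clearing it here is the only remaining opportunity.
    void disconnect() {
        session_active = false;
        reset_ongoing_tx();
    }

    // Bug 2: when scheduling runs as a retry (tx_ongoing already true) and
    // the send fails directly, nobody is waiting on the result, so the
    // state has to be cleaned up on the spot.
    SendStatus process_scheduling_state(SendStatus send_result) {
        const bool was_retry = tx_ongoing;
        if (send_result == SendStatus::OK) {
            tx_ongoing = true;
            return SendStatus::OK;
        }
        if (was_retry) {
            reset_ongoing_tx();
        }
        return send_result;
    }

    // A send is accepted only with an active session and no TX in flight;
    // otherwise the real stack would report WOULD_BLOCK.
    bool can_send() const { return session_active && !tx_ongoing; }
};
```

In this model, a disconnect during an in-flight TX followed by a reconnect leaves `can_send()` returning true, and a failed retry clears `tx_ongoing` instead of leaving it set.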
Summary
In LoRaMac::schedule_tx(), when set_next_channel() returns LORAWAN_STATUS_DUTYCYCLE_RESTRICTED with backoff_time == 0, the original code skipped both starting the backoff timer and setting _can_cancel_tx. However, the caller (process_scheduling_state in LoRaWANStack.cpp) still set tx_ongoing = true and transitioned to DEVICE_STATE_SENDING.

This creates an unrecoverable stuck state:

- on_backoff_timer_expiry() never called → handle_scheduling_failure() never called → reset_ongoing_tx() never called
- tx_ongoing stays true permanently
- lorawan.send() calls return LORAWAN_STATUS_WOULD_BLOCK (-1001) forever
- stop_sending() cannot recover the state either, because _can_cancel_tx is false, making clear_tx_pipe() return LORAWAN_STATUS_BUSY

The backoff_time == 0 case can occur due to sub-millisecond rounding when the remaining duty-cycle time is nearly zero at the moment set_next_channel() runs.

Fix

Enforce a minimum backoff of 1 ms so the timer always fires, giving on_backoff_timer_expiry() a path to retry or invoke handle_scheduling_failure() to clean up the state.

Test plan

- … backoff_time) is unchanged
- backoff_time == 0 / DUTYCYCLE_RESTRICTED case no longer leaves tx_ongoing stuck
- … connectivity/lorawan/tests/

🤖 Generated with Claude Code