WIP: fix(connect): re-resolve context after dealer disconnect (silent resume) #1713

Closed
tobsch wants to merge 1 commit into librespot-org:dev from tobsch:fix/reresolve-context-after-disconnect

Conversation

tobsch commented May 22, 2026

WIP / lightly tested — opening for discussion on the right approach. Repro is reliable on my setup but I haven't run the full test matrix.

Problem

On a Spotify Connect device, pause → (idle) → resume can leave the device active in the Spotify app but silent. The log shows:

WARN  librespot_connect::state::context  couldn't load context info because: context is not available. type: Default

repeated, with no audio. Playback only recovers after the connect host is restarted.

Root cause

Spirc::handle_disconnect() calls self.context_resolver.clear(). The dealer connection gets dropped while paused — Spotify routinely drops idle Connect dealer connections, and this is much more frequent on accounts running several simultaneous Connect devices (multi-room). That triggers handle_disconnect, wiping the resolver queue.

On resume, set_active_context(ContextType::Default) calls get_context(), which returns StateError::NoContext because the context data is gone and nothing re-fetches it. The resume silently fails.

The transfer path (handle_transfer) has a partial fallback ("continuing transfer in an unknown state"), but a plain resume on the existing context does not re-resolve.

Change

Capture the current context_uri before clearing the resolver, then re-enqueue it as a ResolveContext::from_uri(.., ContextAction::Replace) so the context is re-resolved on the next activation.

Re-enqueuing is harmless for intentional disconnects: it only makes the context resolvable again; it does not start playback or re-acquire active status.
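
The change described above can be sketched in isolation. This is a minimal model, not librespot code: `PatchedModel`, its fields, and the plain `String` queue entries are hypothetical stand-ins for the player state and `context_resolver`, and the real `ResolveContext::from_uri(.., ContextAction::Replace)` call is reduced to pushing the URI back onto the queue.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the state touched by `handle_disconnect`.
/// None of these types are librespot's real definitions.
#[derive(Default)]
struct PatchedModel {
    /// Mirrors `player.context_uri`; an empty string means "no context".
    context_uri: String,
    /// Mirrors the pending entries held by `context_resolver`.
    resolver_queue: VecDeque<String>,
}

impl PatchedModel {
    /// Disconnect handling with the proposed change: remember the URI,
    /// clear the queue, then put a resolve for that URI back.
    fn handle_disconnect(&mut self) {
        // Capture before clearing so the URI is not lost with the queue.
        let captured = self.context_uri.clone();
        self.resolver_queue.clear();
        // Only re-enqueue when there is something to resolve.
        if !captured.is_empty() {
            self.resolver_queue.push_back(captured);
        }
    }
}
```

In this model a single disconnect leaves exactly one pending resolve for the captured URI, and nothing else happens: no playback state is touched, which is the sense in which the re-enqueue is meant to be harmless.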

Open questions for reviewers

  1. Is handle_disconnect the right seam, or should the re-resolve happen on reconnect in handle_connection_id_update (gated on context_uri non-empty && get_context().is_err())? The reconnect side is more precise but has several early-return branches.
  2. Should this be limited to non-intentional disconnects (transient dealer drop) vs. the explicit Disconnect command / shutdown? Currently it fires for all; the wasted resolve on intentional disconnect is negligible but not zero.

Testing

Reproduced on a multi-zone setup (several Connect devices on one account) where pausing for a minute then resuming reliably produced the silent-resume + context is not available spam. With the patch, resume re-resolves and plays. Have not added unit coverage yet — pointers welcome on where Connect state transitions are tested.

🤖 Generated with Claude Code

tobsch commented May 23, 2026

Corrected root-cause analysis (my initial patch is insufficient)

After testing the handle_disconnect patch against a real reproduction, it does not fix the issue. Deeper tracing shows why, and clarifies the actual mechanism:

The context is wiped in ConnectState::became_inactive() → reset_context(ResetContext::Completely), which sets self.context = None and calls player.context_uri.clear().

became_inactive is reached on a transient dealer drop: when paused, Spotify drops the idle dealer; on reconnect a ClusterUpdate arrives where active_device_id != our device, so handle_cluster_update calls handle_disconnect() (which my patch hooks). However, the dealer failures arrive in a burst (7 near-simultaneous librespot_core::dealer "peer closed connection" messages in my logs). Sequence:

  1. disconnect #1 → handle_disconnect: capture context_uri (still set), re-enqueue ResolveContext, then became_inactive clears context_uri
  2. disconnect #2 → handle_disconnect: context_resolver.clear() wipes the re-enqueued resolve from step 1, and context_uri is now empty so it can't re-add

So the burst defeats a handle_disconnect-local capture. On resume, set_active_context → get_context → NoContext ("context is not available") and the device stays silent.
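
The two-step sequence can be replayed with a small model. Everything here is an assumption made for illustration: `BurstModel`, its fields, and `pending_after_burst` are hypothetical, and `became_inactive` is reduced to clearing the URI.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the patched disconnect path; not librespot's types.
#[derive(Default)]
struct BurstModel {
    context_uri: String,
    resolver_queue: VecDeque<String>,
}

impl BurstModel {
    /// Reduced `became_inactive`: only the URI wipe matters for this race.
    fn became_inactive(&mut self) {
        self.context_uri.clear();
    }

    /// Patched disconnect: capture, clear, re-enqueue, then go inactive.
    fn handle_disconnect(&mut self) {
        let captured = self.context_uri.clone();
        self.resolver_queue.clear();
        if !captured.is_empty() {
            self.resolver_queue.push_back(captured);
        }
        self.became_inactive();
    }
}

/// Runs `n` back-to-back disconnects and reports how many resolves survive.
fn pending_after_burst(n: usize) -> usize {
    let mut model = BurstModel {
        context_uri: "spotify:playlist:example".to_string(),
        ..Default::default()
    };
    for _ in 0..n {
        model.handle_disconnect();
    }
    model.resolver_queue.len()
}
```

With one disconnect the re-enqueued resolve survives; with two or more, the second call clears it and finds an empty URI, so nothing is left to resolve on resume.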

A correct fix needs a recovery context URI that survives the wipe, e.g. stash the URI into a persistent ConnectState field inside reset_context(Completely) before clearing it, and re-resolve from that field on re-activation when get_context() is empty. Will push an updated commit along these lines; feedback on the preferred seam very welcome, since the Connect state machine has a lot of subtlety here.
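
A rough sketch of the stash-and-recover idea, under the assumption that the stash is an `Option<String>` that is only overwritten by a non-empty URI. `RecoveryState` and its methods are hypothetical names for illustration and do not reflect the actual `ConnectState` layout.

```rust
/// Hypothetical slice of connect state; not librespot's `ConnectState`.
#[derive(Default)]
struct RecoveryState {
    /// Mirrors `player.context_uri`; empty means "no context".
    context_uri: String,
    /// Last non-empty URI seen at reset time, kept across the wipe.
    recovery_context_uri: Option<String>,
}

impl RecoveryState {
    /// Reduced complete reset: stash a non-empty URI, then clear it.
    /// An empty URI never overwrites an existing stash.
    fn reset_completely(&mut self) {
        if !self.context_uri.is_empty() {
            self.recovery_context_uri = Some(self.context_uri.clone());
        }
        self.context_uri.clear();
    }

    /// Hands out the stashed URI at most once, so the same context is
    /// not retried repeatedly on later activations.
    fn take_recovery_context_uri(&mut self) -> Option<String> {
        self.recovery_context_uri.take()
    }
}
```

Because the stash is only written when the URI is non-empty, repeated resets in a burst leave it intact, and `Option::take` gives the take-once behaviour on re-activation.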

When the device becomes inactive (e.g. Spotify drops the idle dealer
during a pause and a ClusterUpdate marks another device active),
`became_inactive` -> `reset_context(ResetContext::Completely)` wipes both
`self.context` and `player.context_uri`. On resume the device
re-activates but `set_active_context` -> `get_context` returns
`NoContext` ("context is not available") and playback never starts —
the device shows active in the app but stays silent until the connect
host is restarted.

Preserve the last non-empty context uri across the reset in a new
`ConnectState::recovery_context_uri` field, and on re-activation, if no
context is loaded, re-resolve it (`ResolveContext::from_uri`,
`ContextAction::Replace`). The value is taken (cleared) on use so we
don't repeatedly retry the same context.

Doing the recovery at activation (rather than at disconnect) avoids the
burst-of-disconnects race where each `handle_disconnect` clears the
resolver: re-activation happens once, after the dealer has settled.

WIP: reproduced on a multi-zone setup where pause→(idle dealer drop)→
resume reliably produced the silent-resume + `context is not available`
spam; with this patch resume re-resolves and plays. Feedback welcome on
whether `handle_activate` is the right seam vs. the transfer/load paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tobsch force-pushed the fix/reresolve-context-after-disconnect branch from 3a57d20 to 1430e2a on May 23, 2026 05:57
tobsch commented May 24, 2026

Retested this branch (commit 1430e2a) on real hardware (RPi 5, lox-audioserver beta.12 with stock JS — i.e. no other session/connect-host churn from local patches) against the exact repro: play a Spotify Connect zone → pause long enough for Spotify to drop the idle dealer (~1+ min) → resume.

Result: does not fix the symptom. On resume I still get the burst of:

[librespot_connect::state::context] couldn't load context info because: context is not available. type: Default
[librespot_connect::spirc] failed filling up next_track during stopping: Invalid state { context is not available. type: Default }

and playback plays the small buffered bit then stops (device shows active in the app, silent).

Likely why: the recovery hinges on stashing player.context_uri inside reset_context(Completely) before it's cleared — but in this path context_uri is frequently already empty by the time reset_context runs (it gets cleared earlier in the became_inactive / cluster-update flow), so recovery_context_uri is never populated and handle_activate has nothing to re-resolve. The take_recovery_context_uri() / re-resolve-once logic looks correct; the problem is purely that there's nothing to recover by the time we reach the reset.
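
The ordering problem can be illustrated by replaying the steps in sequence. This is only a sketch: the `Step` enum and `stash_after` are hypothetical, and `ClearUri` stands for whichever earlier part of the became_inactive / cluster-update flow empties `context_uri` first.

```rust
/// Hypothetical events that touch the context URI, in the order they run.
#[derive(Clone, Copy)]
enum Step {
    /// Some earlier code path empties the URI before the reset runs.
    ClearUri,
    /// The complete reset: stash the URI if non-empty, then clear it.
    ResetCompletely,
}

/// Replays `steps` against `initial_uri` and returns the resulting stash.
fn stash_after(steps: &[Step], initial_uri: &str) -> Option<String> {
    let mut uri = initial_uri.to_string();
    let mut stash = None;
    for step in steps {
        match step {
            Step::ClearUri => uri.clear(),
            Step::ResetCompletely => {
                if !uri.is_empty() {
                    stash = Some(uri.clone());
                }
                uri.clear();
            }
        }
    }
    stash
}
```

If the reset runs first the stash is populated; if any clear precedes it, the stash stays `None` and the activation path has nothing to re-resolve, which matches the behaviour observed on hardware.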

So handle_activate may be the right place to trigger recovery, but the URI needs to be captured earlier (or recovered from a different source — e.g. the last transfer/load state, or the player's pre-inactive context) rather than at reset_context. Keeping this as draft until that seam is sorted out. Happy to test a revised approach on the same setup.

tobsch commented Jun 10, 2026

Closing this — after testing on real hardware it does not fix the symptom. The approach (stash context_uri in reset_context before the clear, re-resolve in handle_activate) is on the wrong seam: by the time reset_context(Completely) runs, context_uri is already empty, so there is nothing to stash or recover.

The real failure is in Spirc::handle_connection_id_update: when a device reconnects to the dealer as the still-active device but with a cleared/empty local context (e.g. after a prior handle_disconnect → context_resolver.clear(), or an externally-recreated session), the !active_device_id.is_empty() || !same_session gate returns early without re-hydrating context. The app's next play/stop then drives handle_stop → reset_playback_to_position(None) → "context is not available" churn and stale-track playback. The fix belongs there: re-hydrate from cluster.player_state (which still carries the context_uri) on reconnect-as-active. Will open a fresh PR targeting that seam once validated on hardware.
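
One possible shape for that reconnect-side decision, as a sketch only. `ClusterSnapshot` and `context_to_rehydrate` are hypothetical and do not reproduce the real gate in `handle_connection_id_update`; the assumption is simply that the cluster's player state exposes an active device id and a context URI.

```rust
/// Hypothetical view of the cluster fields mentioned above.
struct ClusterSnapshot<'a> {
    active_device_id: &'a str,
    /// Stands in for the `context_uri` carried by `cluster.player_state`.
    player_context_uri: &'a str,
}

/// Returns the URI to re-resolve on reconnect, or `None` when no
/// re-hydration is needed or possible.
fn context_to_rehydrate<'a>(
    cluster: &ClusterSnapshot<'a>,
    our_device_id: &str,
    local_context_uri: &str,
) -> Option<&'a str> {
    // Only the device the cluster still considers active should recover.
    let still_active = cluster.active_device_id == our_device_id;
    // A non-empty local URI means the context was not lost.
    let local_missing = local_context_uri.is_empty();
    if still_active && local_missing && !cluster.player_context_uri.is_empty() {
        Some(cluster.player_context_uri)
    } else {
        None
    }
}
```

Keeping the decision in a pure function like this would make the reconnect-as-active case straightforward to unit test, separately from the early-return branches of the real handler.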

tobsch closed this Jun 10, 2026