fix(worker): cooperative shutdown via recv_timeout + atexit/R_unload (#103) #199

Open
CGMossa wants to merge 1 commit into main from fix/issue-103-worker-shutdown

Conversation

@CGMossa
Collaborator

@CGMossa CGMossa commented Apr 17, 2026

Closes #103.

Problem

The worker thread (`worker-thread` feature) blocks forever on `mpsc::recv()`. The static `JOB_TX: OnceLock<SyncSender>` is never dropped, and `OnceLock::take()` requires `&mut self`, which a shared static cannot provide, so `recv` never sees `Disconnected`. On process exit the worker stays pinned in a blocked syscall. This is harmless on Linux/macOS, where `_exit` reaps threads, but it is a latent hang risk on Windows under `system2` pipe capture.
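
The disconnect behavior described above can be reproduced in isolation. The following is a minimal sketch, not the crate's code: `DEMO_TX`, `probe_parked`, and `probe_dropped` are hypothetical names, and the job payload is simplified to `u32`. It uses a bounded `recv_timeout` only so the demo terminates; a plain `recv()` in the parked case would block indefinitely.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, SyncSender};
use std::sync::OnceLock;
use std::time::Duration;

// Hypothetical stand-in for the PR's `JOB_TX`; the payload is simplified to `u32`.
static DEMO_TX: OnceLock<SyncSender<u32>> = OnceLock::new();

/// Parks the only sender in `holder`, then reports why a bounded receive fails.
fn probe_parked(holder: &OnceLock<SyncSender<u32>>) -> RecvTimeoutError {
    let (tx, rx) = mpsc::sync_channel::<u32>(0);
    assert!(holder.set(tx).is_ok(), "holder must start empty");
    // The sender is still alive inside `holder`, so the channel stays connected
    // and the receiver can only time out.
    rx.recv_timeout(Duration::from_millis(50)).unwrap_err()
}

/// Drops the only sender first, so the receiver observes a disconnect.
fn probe_dropped() -> RecvTimeoutError {
    let (tx, rx) = mpsc::sync_channel::<u32>(0);
    drop(tx);
    rx.recv_timeout(Duration::from_millis(50)).unwrap_err()
}

fn main() {
    println!("sender parked in a static: {:?}", probe_parked(&DEMO_TX));
    println!("sender dropped:            {:?}", probe_dropped());
}
```

With the sender parked, the receiver reports `Timeout`; only when the sender is dropped does it report `Disconnected`. A worker waiting on a sender held in a static therefore needs some other stop signal.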

Fix (defense-in-depth)

  • Replace the blocking `while let Ok(job) = job_rx.recv()` loop with `recv_timeout(250 ms)` that polls a new `WORKER_SHOULD_STOP: AtomicBool`. Idle workers wake at most 4x/sec; in-flight jobs dispatch immediately.
  • Add `miniextendr_runtime_shutdown()` — public, idempotent, a no-op without the `worker-thread` feature.
  • `miniextendr_runtime_init` registers a libc `atexit` hook calling `miniextendr_runtime_shutdown`. atexit can be flaky on Windows but is additive — never hurts.
  • `miniextendr_init!` now also generates `R_unload_` which R calls on `detach(unload=TRUE)` / `dyn.unload()`.
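
The polling loop and shutdown flag in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Job`, `worker_loop`, `request_shutdown`, and `run_demo` are hypothetical names, the flag is passed in through an `Arc` instead of living in a static, and the demo joins the worker only so it can report how many jobs ran (the PR deliberately does not join).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical job type for illustration only.
type Job = Box<dyn FnOnce() + Send + 'static>;

// Polling interval taken from the PR description.
const POLL_INTERVAL: Duration = Duration::from_millis(250);

/// Runs jobs until `stop` is set or every sender is gone.
/// Returns the number of jobs executed.
fn worker_loop(job_rx: Receiver<Job>, stop: &AtomicBool) -> usize {
    let mut executed = 0;
    while !stop.load(Ordering::Acquire) {
        match job_rx.recv_timeout(POLL_INTERVAL) {
            // A job arrived: run it right away.
            Ok(job) => {
                job();
                executed += 1;
            }
            // Nothing arrived within the interval: loop back and re-check the flag.
            Err(RecvTimeoutError::Timeout) => continue,
            // All senders dropped: nothing more can arrive.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    executed
}

/// Idempotent: storing `true` twice has the same effect as storing it once.
fn request_shutdown(stop: &AtomicBool) {
    stop.store(true, Ordering::Release);
}

/// Spawns a worker, sends `n_jobs` no-op jobs, then stops it via the flag alone.
fn run_demo(n_jobs: usize) -> usize {
    let stop = Arc::new(AtomicBool::new(false));
    let (job_tx, job_rx) = mpsc::sync_channel::<Job>(0);
    let worker_stop = Arc::clone(&stop);
    let worker = thread::spawn(move || worker_loop(job_rx, &worker_stop));

    for _ in 0..n_jobs {
        // Capacity 0: `send` returns only once the worker has taken the job.
        job_tx.send(Box::new(|| {})).expect("worker is alive");
    }

    request_shutdown(&stop);
    request_shutdown(&stop); // a second call changes nothing

    // `job_tx` is still alive here, so only the flag can end the loop.
    let executed = worker.join().expect("worker did not panic");
    drop(job_tx);
    executed
}

fn main() {
    println!("jobs executed before shutdown: {}", run_demo(3));
}
```

Because the sender outlives the worker in `run_demo`, the loop can only end through the flag, which mirrors the situation with a never-dropped static sender.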

Test plan

  • `just clippy`
  • `cargo clippy --workspace --all-targets --locked -- -D warnings` (clippy_default)
  • `cargo clippy --workspace --all-targets --locked --features -- -D warnings` (clippy_all)
  • `just fmt`
  • `just rcmdinstall` — builds & installs clean on worker-thread build
  • `just devtools-document` — `R_unload_miniextendr` doesn't surface as an R-level symbol, so NAMESPACE / wrappers unchanged from current
  • `just vendor` — `rpkg/inst/vendor.tar.xz` refreshed
  • `cargo test -p miniextendr-api --features worker-thread run_on_worker_reentry` — existing re-entry guard still green

Notes

  • The `recv_timeout` polling interval (250 ms) is the only new tuning knob. It was chosen so that unloading a package adds a barely noticeable delay (~250 ms worst case) while an idle worker wakes only four times per second instead of busy-polling the atomic flag.
  • We deliberately do NOT track a `JoinHandle` or call `thread::join`: the worker exits within ≤250 ms of the shutdown signal and the OS reaps the thread on process exit. Tracking a JoinHandle would add refcount machinery for marginal benefit.
  • Capacity-0 rendezvous semantics on the job channel are unchanged, so in-flight job behavior is identical to before.
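
For readers unfamiliar with the capacity-0 rendezvous semantics mentioned in the last note, here is a small standalone sketch. It is independent of the crate; `rendezvous_demo` is a hypothetical name. It shows that a zero-capacity `sync_channel` never buffers a value: a non-blocking send fails when no receiver is waiting, and a blocking send completes only by handing the value directly to a receiver.

```rust
use std::sync::mpsc::{self, TrySendError};
use std::thread;

/// Returns (whether `try_send` reported `Full` with no receiver waiting,
///          the value handed over by the blocking `send`).
fn rendezvous_demo() -> (bool, u32) {
    let (tx, rx) = mpsc::sync_channel::<u32>(0);

    // No receiver is waiting and there is no buffer, so the value is handed back.
    let full_without_receiver = matches!(tx.try_send(1), Err(TrySendError::Full(1)));

    // A blocking `send` completes only when a receiver takes the value.
    let receiver = thread::spawn(move || rx.recv().expect("sender is alive"));
    tx.send(2).expect("receiver is alive");
    let received = receiver.join().expect("receiver did not panic");

    (full_without_receiver, received)
}

fn main() {
    let (full, received) = rendezvous_demo();
    println!("try_send without receiver was Full: {full}");
    println!("value handed over by send: {received}");
}
```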

Generated with Claude Code

@CGMossa
Collaborator Author

CGMossa commented Apr 17, 2026

We should consider depending on the Rust crates `ctor`/`dtor` instead of relying on atexit, although the current approach is perfectly acceptable as is.

…103)

The worker thread was blocked on `mpsc::recv()` forever: the static
`JOB_TX: OnceLock<SyncSender<AnyJob>>` was never dropped, so `recv`
never saw `Disconnected`, and `OnceLock` has no `take()` to let us
signal otherwise. On process exit the worker stayed parked in a
blocked syscall — harmless on Linux/macOS because `_exit` terminates
threads, but a latent hang risk on Windows under `system2` pipe
capture.

Fix, defense-in-depth:

- Replace the blocking `while let Ok(job) = job_rx.recv()` loop with a
  `recv_timeout(250ms)` loop that polls a new
  `WORKER_SHOULD_STOP: AtomicBool`. Idle workers wake at most 4 times
  per second; queued jobs dispatch immediately (timeout only fires
  when nothing is in flight).
- Add `miniextendr_runtime_shutdown()` — public, idempotent — that
  flips the flag. Without the `worker-thread` feature it's a no-op.
- Call sites:
  - `miniextendr_runtime_init` registers a libc `atexit` hook, so
    graceful R session shutdown (`q()`) wakes the worker even when
    `.onUnload` doesn't fire. atexit can be flaky on Windows but is
    additive — it only helps, never hurts.
  - `miniextendr_init!` now also generates `R_unload_<pkg>` which R
    calls on `detach(unload=TRUE)` / `dyn.unload()`, calling
    `miniextendr_runtime_shutdown()`.

Re-entry guard and all existing worker tests still green. Capacity-0
rendezvous semantics on the job channel are unchanged, so in-flight
behavior is identical to before.

Keeps OS thread cleanup implicit: we don't track a `JoinHandle` or
`thread::join` — on process exit the OS reaps the thread; on package
unload the thread exits within ≤250 ms of the shutdown signal.

Closes #103.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CGMossa CGMossa force-pushed the fix/issue-103-worker-shutdown branch from e118f20 to 911bc44 Compare April 17, 2026 11:00

Development

Successfully merging this pull request may close these issues.

Worker thread blocks on recv() — no clean shutdown on process exit
