Skip to content

Remove //base threading & scheduling dependency #13

@domfarolino

Description

@domfarolino

As described in the Mage documentation, specifically in https://github.com/domfarolino/mage#threading--task-scheduling, Mage has a strong dependency on the //base threading & scheduling library which was written with Mage in mind. We have a goal of decoupling this library from //base for this reason:

  • For Mage to be an unopinionated, drop-in IPC utility for any application that wants seamless IPC, it must integrate with applications that utilize arbitrary threading & scheduling providers (that meet a certain baseline API criteria). We don't want to mandate that Mage embedders must use the //base library for all of their scheduling — the integration cost would be too high, and that's generally a bad design goal.

Technically we don't have to remove full reliance on the //base library; if Mage continues to depend on //base for random utilities like CHECK(), that wouldn't prevent the project from being useful and moving forward — it would just be unnecessary. So specifically, the goal is to remove Mage's reliance on the threading & scheduling components of //base (which of course is the primary thing that library is responsible for, but still being precise about this is important).

Decoupling Mage from //base requires making Mage generically thread-safe. Today, Mage is only partially thread-safe. Specifically:

  1. ✅ Some parts are completely thread-safe, like:
  2. 🟠 Some parts are thread-safe only by virtue of the way we use //base; we assert that some methods are only ever called on certain well-named threads, so we don't require locks in some places, because the assertions ensure some data structures are only ever handled on threads that are implicitly safe
    • CreateMessagePipesAndGetEndpoints() is an example of this, where we modify Node::local_endpoints_ without locking access to it, because we only ever add to it on the main thread. Even this might not be fully true actually, but that is the intent, which is simply insufficient once these sorts of operations can truly be called from arbitrary threads (within what the documented Mage API allows for)
  3. ❌ Some parts can today be called on arbitrary threads, but literally aren't protected by any lock or anything. These would need to be fixed even if we kept the //base dependency; these are just bugs plain and simple
    • Node::pending_invitations_ map: insertions are made on the UI thread when an invitation is sent, and removals/look-ups happen on the IO thread when acceptance is received. For a single invitation this is "safe" because receipt of an invitation can only ever follow the sending of an invitation (which includes the insertion which must be finished at that point). But when sending many invitations, some acceptances can be received while others are still being sent. This leads to thread-unsafe mutations of the map
    • Channel's use of write() is not locked based on socket. This means multiple messages being sent in parallel to the same socket (bound for different endpoints, perhaps) can lead to corrupted/interleaving messages

In order to decouple Mage from the threading & scheduling infrastructure that //base provides, we have to fix the 🟠 and ❌ cases — all of them, not just the examples listed above — and at the same time document which methods of the Mage public API are thread-safe, vs. which group of APIs must be called on the same thread (if any of these exist).

Metadata

Metadata

Assignees

No one assigned

    Labels

    threading//base threading infra

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions