Fix race condition in XNNPACK weights cache during concurrent init()

billmguo · facebook-github-bot · commit 4cfd4e562885 · 2026-05-19T15:06:15.000-07:00
Summary:
D105123995 replaced the compile-time `#ifdef ENABLE_XNNPACK_WEIGHTS_CACHE` gate with a runtime `bool use_weight_cache` flag. However, the `weights_cache_mutex_` lock in `XNNPACKBackend::init()` only covered the `initialize_for_runtime()` call — the subsequent `compileModel()` (which calls `load_unpacked_data()`, `xnn_create_runtime_v4()`, and `finalize_for_runtime()`) ran unlocked against the shared `XNNWeightsCache`.

When two XNNPACK methods load concurrently (e.g., CRIA loading multiple ExecuTorch methods on separate IO threads), the second thread's `initialize_for_runtime()` resets `is_finalized_` to `false` and overwrites `named_data_map_` while the first thread is still inside `compileModel()`. This causes:
- `delete_packed_data()` to fail with "cache is not finalized"
- `load_unpacked_data()` to fail because `named_data_map_` was overwritten
- `compileModel()` to fail with error `0x24` (decimal 36)
- Warmup/prefill to fail with ExecuTorch runtime error 36 (the same `0x24` code, reported in decimal)

The fix extends the lock scope to cover the entire init-compile-finalize sequence, matching the pattern already used by `execute()` and `destroy()`.
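
For reviewers unfamiliar with the deferred-lock idiom, here is a minimal standalone sketch of the pattern described above. It is an illustration only, not the ExecuTorch code: `SharedCache`, `g_cache`, `g_cache_mutex`, `init_method()`, and `run_concurrent_inits()` are hypothetical names, and the cache is reduced to a flag and a pointer. A `std::unique_lock` is constructed with `std::defer_lock`, acquired only on the weights-cache path, and then held until the function returns, so the initialize, load, and finalize steps run as one critical section.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared weights cache (not the real XNNWeightsCache).
class SharedCache {
 public:
  void initialize_for_runtime(const std::string* data_map) {
    is_finalized_ = false;       // every init resets the finalized flag
    named_data_map_ = data_map;  // and replaces the data map
  }
  // Fails if another init replaced the data map in the meantime.
  bool load_unpacked_data(const std::string* expected) const {
    return named_data_map_ == expected;
  }
  void finalize_for_runtime() { is_finalized_ = true; }
  bool is_finalized() const { return is_finalized_; }

 private:
  bool is_finalized_ = false;
  const std::string* named_data_map_ = nullptr;
};

SharedCache g_cache;       // shared by all methods (assumption for this sketch)
std::mutex g_cache_mutex;  // guards every access to g_cache

// Returns true if the whole init-compile-finalize sequence saw a consistent cache.
bool init_method(bool use_weight_cache, const std::string& data_map) {
  // Constructed unlocked; only the cache path pays for the mutex.
  std::unique_lock<std::mutex> lock(g_cache_mutex, std::defer_lock);
  if (!use_weight_cache) {
    return true;  // nothing shared is touched; the lock is never acquired
  }
  lock.lock();
  assert(lock.owns_lock());
  g_cache.initialize_for_runtime(&data_map);
  std::this_thread::yield();  // widen the window a racing init would exploit
  const bool loaded = g_cache.load_unpacked_data(&data_map);
  g_cache.finalize_for_runtime();
  return loaded && g_cache.is_finalized();
  // The unique_lock destructor releases the mutex here, after finalize.
}

// Runs n concurrent inits and returns how many observed an inconsistent cache.
int run_concurrent_inits(int n) {
  std::atomic<int> failures{0};
  std::vector<std::string> maps(n, "weights");
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&failures, &maps, i] {
      if (!init_method(true, maps[i])) {
        failures.fetch_add(1);
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return failures.load();
}
```

A `std::lock_guard` declared inside the `if` block would release the mutex at the closing brace, which is the narrower scope this change removes. Declaring the `std::unique_lock` before the branch keeps the lock conditional while tying its lifetime to the enclosing scope.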

This diff was authored with Claude.

Differential Revision: D105753995
diff --git a/backends/xnnpack/runtime/XNNPACKBackend.cpp b/backends/xnnpack/runtime/XNNPACKBackend.cpp
@@ -91,8 +91,13 @@ class XnnpackBackend final
     auto workspace = workspace_result.get();
 
     bool use_weight_cache = options_.resolve_weight_cache(context);
+    // Hold the lock for the entire init-compile-finalize sequence to prevent
+    // concurrent inits from resetting is_finalized_ or overwriting
+    // named_data_map_ while compileModel is using the shared weights cache.
+    std::unique_lock<std::mutex> lock_weights_cache(
+        weights_cache_mutex_, std::defer_lock);
     if (use_weight_cache) {
-      const std::lock_guard<std::mutex> lock_weight_cache(weights_cache_mutex_);
+      lock_weights_cache.lock();
       weights_cache_->initialize_for_runtime(
           context.get_runtime_allocator(), named_data_map);
     }