Potential deadlock in CavityManager::migrate() without color-based scheduling

Hi,


I may have found a potential deadlock issue while integrating my own edge-split kernel with RXMesh dynamic topology.

In my setup, the kernel can hang inside `cavity.prologue()`, more specifically in `CavityManager::migrate()->lock_neighbour_patches()` during edge split.

From debugging, it seems that multiple adjacent patches may enter `migrate()->lock_neighbour_patches()` in the same round without an effective color-based gating step on the path I am using. Since each block has already acquired the lock for its own patch, and then tries to acquire multiple neighboring patch locks according to `PatchStash` slot order, the lock acquisition order is not globally consistent. This appears to create a circular-wait scenario, and in practice the kernel hangs before the split synchronization point.

My understanding is that the issue is not in the spin lock itself, but in the scheduling/rounding discipline:
only patches of the current color should be allowed to enter `CavityManager` in that round, and the host-side loop should drive the round with something like: `num_inner_iter % rx.get_num_colors()`


In cavity_manager.cuh:
```cpp
struct CavityColorIteration
{
    uint32_t value;
};
```
Then restore the color-based filter [here](https://github.com/owensgroup/RXMesh/blob/725a935ad2dba0875be536ab53f0c06706dbe957/include/rxmesh/cavity_manager_impl.cuh#L62)
```cpp
uint32_t color = INVALID32;
if (s_patch_id != INVALID32 && iteration.value != INVALID32) {
    color = m_context.m_patches_info[s_patch_id].color;
    if (color != iteration.value) {
        push(s_patch_id);
        s_patch_id = INVALID32;
    }
}
```
In my local experiment, I introduced an explicit color-round argument to the `CavityManager` construction path used by my split kernel, so that the existing color filter could actually be applied on that path.
```cpp
template <uint32_t blockThreads, CavityOp cop>
device forceinline CavityManager<blockThreads, cop>::CavityManager(
cooperative_groups::thread_block& block,
Context& context,
ShmemAllocator& shrd_alloc,
bool preserve_cavity,
CavityColorIteration iteration,
bool allow_touching_cavities,
uint32_t current_p)
```

and the process is like:
```cpp
        while (!rx->is_queue_empty()) {
            if (++inner_iter > kMaxInnerIter) {
                break;
            }
            const uint32_t color_iteration = inner_iter % num_colors;
            printf("[pass %d] before split launch\n", inner_iter);
            if (use_provenance) {
                face_subdivide_split_with_provenance<float, blockThreads>
                    <<<lb_split.blocks,
                       lb_split.num_threads,
                       lb_split.smem_bytes_dyn>>>(
                        rx->get_context(),
                        *coords,
                        *v_boundary,
                        *edge_status,
                        *edge_flip,
                        *persistent_id,
                        *parents,
                        color_iteration,
                        debug_stage_limit,
                        debug_prologue_stage_limit,
                        debug_migrate_stage_limit);
            } else {
                face_subdivide_split<float, blockThreads>
                    <<<lb_split.blocks,
                       lb_split.num_threads,
                       lb_split.smem_bytes_dyn>>>(
                        rx->get_context(),
                        *coords,
                        *v_boundary,
                        *edge_status,
                        *edge_flip,
                        color_iteration,
                        debug_stage_limit,
                        debug_prologue_stage_limit,
                        debug_migrate_stage_limit);
            }
            CUDA_ERROR(cudaGetLastError());
            printf("[pass %d] after split launch\n", inner_iter);
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after split sync\n", inner_iter);

            printf("[pass %d] before cleanup1\n", inner_iter);
            rx->cleanup();

            printf("[pass %d] after cleanup1 call\n", inner_iter);

            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after cleanup1 sync\n", inner_iter);

            printf("[pass %d] before slice_patches\n", inner_iter);
            if (use_provenance) {
                rx->slice_patches(
                    *coords,
                    *v_boundary,
                    *edge_status,
                    *edge_flip,
                    *persistent_id,
                    *parents);
            } else {
                rx->slice_patches(*coords, *v_boundary, *edge_status, *edge_flip);
            }
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after slice_patches\n", inner_iter);
            printf("[pass %d] before cleanup2\n", inner_iter);
            rx->cleanup();
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after cleanup2\n", inner_iter);
        }
```


Please let me know if my understanding is incorrect. Thank you for your time and for maintaining RXMesh.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Potential deadlock in CavityManager::migrate() without color-based scheduling #74

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Potential deadlock in CavityManager::migrate() without color-based scheduling #74

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions