Skip to content

Potential deadlock in CavityManager::migrate() without color-based scheduling #74

Description

@will-zzy

Hi,

I may have found a potential deadlock issue while integrating my own edge-split kernel with RXMesh dynamic topology.

In my setup, the kernel can hang inside cavity.prologue(), more specifically in CavityManager::migrate()->lock_neighbour_patches() during edge split.

From debugging, it seems that multiple adjacent patches may enter migrate()->lock_neighbour_patches() in the same round without an effective color-based gating step on the path I am using. Since each block has already acquired the lock for its own patch, and then tries to acquire multiple neighboring patch locks according to PatchStash slot order, the lock acquisition order is not globally consistent. This appears to create a circular-wait scenario, and in practice the kernel hangs before the split synchronization point.

My understanding is that the issue is not in the spin lock itself, but in the scheduling/rounding discipline:
only patches of the current color should be allowed to enter CavityManager in that round, and the host-side loop should drive the round with something like: num_inner_iter % rx.get_num_colors()

In cavity_manager.cuh:

struct CavityColorIteration
{
    uint32_t value;
};

Then restore the color-based filter here

uint32_t color = INVALID32;
if (s_patch_id != INVALID32 && iteration.value != INVALID32) {
    color = m_context.m_patches_info[s_patch_id].color;
    if (color != iteration.value) {
        push(s_patch_id);
        s_patch_id = INVALID32;
    }
}

In my local experiment, I introduced an explicit color-round argument to the CavityManager construction path used by my split kernel, so that the existing color filter could actually be applied on that path.

template <uint32_t blockThreads, CavityOp cop>
device forceinline CavityManager<blockThreads, cop>::CavityManager(
cooperative_groups::thread_block& block,
Context& context,
ShmemAllocator& shrd_alloc,
bool preserve_cavity,
CavityColorIteration iteration,
bool allow_touching_cavities,
uint32_t current_p)

and the process is like:

        while (!rx->is_queue_empty()) {
            if (++inner_iter > kMaxInnerIter) {
                break;
            }
            const uint32_t color_iteration = inner_iter % num_colors;
            printf("[pass %d] before split launch\n", inner_iter);
            if (use_provenance) {
                face_subdivide_split_with_provenance<float, blockThreads>
                    <<<lb_split.blocks,
                       lb_split.num_threads,
                       lb_split.smem_bytes_dyn>>>(
                        rx->get_context(),
                        *coords,
                        *v_boundary,
                        *edge_status,
                        *edge_flip,
                        *persistent_id,
                        *parents,
                        color_iteration,
                        debug_stage_limit,
                        debug_prologue_stage_limit,
                        debug_migrate_stage_limit);
            } else {
                face_subdivide_split<float, blockThreads>
                    <<<lb_split.blocks,
                       lb_split.num_threads,
                       lb_split.smem_bytes_dyn>>>(
                        rx->get_context(),
                        *coords,
                        *v_boundary,
                        *edge_status,
                        *edge_flip,
                        color_iteration,
                        debug_stage_limit,
                        debug_prologue_stage_limit,
                        debug_migrate_stage_limit);
            }
            CUDA_ERROR(cudaGetLastError());
            printf("[pass %d] after split launch\n", inner_iter);
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after split sync\n", inner_iter);

            printf("[pass %d] before cleanup1\n", inner_iter);
            rx->cleanup();

            printf("[pass %d] after cleanup1 call\n", inner_iter);

            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after cleanup1 sync\n", inner_iter);

            printf("[pass %d] before slice_patches\n", inner_iter);
            if (use_provenance) {
                rx->slice_patches(
                    *coords,
                    *v_boundary,
                    *edge_status,
                    *edge_flip,
                    *persistent_id,
                    *parents);
            } else {
                rx->slice_patches(*coords, *v_boundary, *edge_status, *edge_flip);
            }
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after slice_patches\n", inner_iter);
            printf("[pass %d] before cleanup2\n", inner_iter);
            rx->cleanup();
            CUDA_ERROR(cudaDeviceSynchronize());
            printf("[pass %d] after cleanup2\n", inner_iter);
        }

Please let me know if my understanding is incorrect. Thank you for your time and for maintaining RXMesh.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions