Hi,
I may have found a potential deadlock issue while integrating my own edge-split kernel with RXMesh dynamic topology.
In my setup, the kernel can hang inside cavity.prologue(), more specifically in CavityManager::migrate()->lock_neighbour_patches() during edge split.
From debugging, it seems that multiple adjacent patches may enter migrate()->lock_neighbour_patches() in the same round without an effective color-based gating step on the path I am using. Since each block has already acquired the lock for its own patch, and then tries to acquire multiple neighboring patch locks according to PatchStash slot order, the lock acquisition order is not globally consistent. This appears to create a circular-wait scenario, and in practice the kernel hangs before the split synchronization point.
My understanding is that the issue is not in the spin lock itself, but in the scheduling/rounding discipline:
only patches of the current color should be allowed to enter CavityManager in that round, and the host-side loop should drive the round with something like: num_inner_iter % rx.get_num_colors()
In cavity_manager.cuh:
struct CavityColorIteration
{
uint32_t value;
};
Then restore the color-based filter here
uint32_t color = INVALID32;
if (s_patch_id != INVALID32 && iteration.value != INVALID32) {
color = m_context.m_patches_info[s_patch_id].color;
if (color != iteration.value) {
push(s_patch_id);
s_patch_id = INVALID32;
}
}
In my local experiment, I introduced an explicit color-round argument to the CavityManager construction path used by my split kernel, so that the existing color filter could actually be applied on that path.
template <uint32_t blockThreads, CavityOp cop>
device forceinline CavityManager<blockThreads, cop>::CavityManager(
cooperative_groups::thread_block& block,
Context& context,
ShmemAllocator& shrd_alloc,
bool preserve_cavity,
CavityColorIteration iteration,
bool allow_touching_cavities,
uint32_t current_p)
and the process is like:
while (!rx->is_queue_empty()) {
if (++inner_iter > kMaxInnerIter) {
break;
}
const uint32_t color_iteration = inner_iter % num_colors;
printf("[pass %d] before split launch\n", inner_iter);
if (use_provenance) {
face_subdivide_split_with_provenance<float, blockThreads>
<<<lb_split.blocks,
lb_split.num_threads,
lb_split.smem_bytes_dyn>>>(
rx->get_context(),
*coords,
*v_boundary,
*edge_status,
*edge_flip,
*persistent_id,
*parents,
color_iteration,
debug_stage_limit,
debug_prologue_stage_limit,
debug_migrate_stage_limit);
} else {
face_subdivide_split<float, blockThreads>
<<<lb_split.blocks,
lb_split.num_threads,
lb_split.smem_bytes_dyn>>>(
rx->get_context(),
*coords,
*v_boundary,
*edge_status,
*edge_flip,
color_iteration,
debug_stage_limit,
debug_prologue_stage_limit,
debug_migrate_stage_limit);
}
CUDA_ERROR(cudaGetLastError());
printf("[pass %d] after split launch\n", inner_iter);
CUDA_ERROR(cudaDeviceSynchronize());
printf("[pass %d] after split sync\n", inner_iter);
printf("[pass %d] before cleanup1\n", inner_iter);
rx->cleanup();
printf("[pass %d] after cleanup1 call\n", inner_iter);
CUDA_ERROR(cudaDeviceSynchronize());
printf("[pass %d] after cleanup1 sync\n", inner_iter);
printf("[pass %d] before slice_patches\n", inner_iter);
if (use_provenance) {
rx->slice_patches(
*coords,
*v_boundary,
*edge_status,
*edge_flip,
*persistent_id,
*parents);
} else {
rx->slice_patches(*coords, *v_boundary, *edge_status, *edge_flip);
}
CUDA_ERROR(cudaDeviceSynchronize());
printf("[pass %d] after slice_patches\n", inner_iter);
printf("[pass %d] before cleanup2\n", inner_iter);
rx->cleanup();
CUDA_ERROR(cudaDeviceSynchronize());
printf("[pass %d] after cleanup2\n", inner_iter);
}
Please let me know if my understanding is incorrect. Thank you for your time and for maintaining RXMesh.
Hi,
I may have found a potential deadlock issue while integrating my own edge-split kernel with RXMesh dynamic topology.
In my setup, the kernel can hang inside
cavity.prologue(), more specifically inCavityManager::migrate()->lock_neighbour_patches()during edge split.From debugging, it seems that multiple adjacent patches may enter
migrate()->lock_neighbour_patches()in the same round without an effective color-based gating step on the path I am using. Since each block has already acquired the lock for its own patch, and then tries to acquire multiple neighboring patch locks according toPatchStashslot order, the lock acquisition order is not globally consistent. This appears to create a circular-wait scenario, and in practice the kernel hangs before the split synchronization point.My understanding is that the issue is not in the spin lock itself, but in the scheduling/rounding discipline:
only patches of the current color should be allowed to enter
CavityManagerin that round, and the host-side loop should drive the round with something like:num_inner_iter % rx.get_num_colors()In cavity_manager.cuh:
Then restore the color-based filter here
In my local experiment, I introduced an explicit color-round argument to the
CavityManagerconstruction path used by my split kernel, so that the existing color filter could actually be applied on that path.and the process is like:
Please let me know if my understanding is incorrect. Thank you for your time and for maintaining RXMesh.