refactor: optimize VTK conversion#167
Conversation
There was a problem hiding this comment.
Code Review
This pull request replaces the loop-based ibits function with a bitwise implementation and optimizes connectivity generation by precomputing a node-to-global-block mapping to avoid repeated function calls. Review feedback identifies a bug in the bitwise logic for 32-bit shifts and suggests a more efficient single-pass approach for the precomputation logic to improve performance on large datasets.
Benchmark Results (Julia v1)Time benchmarks
Memory benchmarks
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #167 +/- ##
==========================================
- Coverage 83.05% 82.86% -0.20%
==========================================
Files 21 21
Lines 4155 4166 +11
==========================================
+ Hits 3451 3452 +1
- Misses 704 714 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds VTK benchmarks and optimizes the getConnectivity function by replacing repeated calls to nodeToGlobalBlock with a pre-calculated mapping. The ibits function was also refactored to use bitwise operations. Feedback suggests optimizing the nBlock_P calculation to avoid quadratic complexity relative to processors and nodes, making the ibits mask generic for 64-bit integers, and using in-place sorting to minimize heap allocations.
Description: Optimize VTK IO Performance and Connectivity Generation
This PR refactors the VTK connectivity generation logic in
src/vtk.jlto address performance bottlenecks related to tree traversal and global block indexing. The changes significantly reduce heap allocations and improve execution speed, particularly for large AMR datasets.Key Changes
Bitwise
ibitsOptimization:digits()-based bit extraction with efficient bitwise shifts:(i >> pos) & ((Int32(1) << len) - 1).nodeToGlobalBlockwith a precomputed mapping arraynodeToGlobalBlock_I.getConnectivitythat maps tree nodes to their global block IDs in constant time.Refactored
getConnectivityLoop:@inboundsand@viewwhere appropriate.Performance Improvements
Benchmarks performed on
3d_mhd_amrdataset (136 leaf blocks):find_grid_block(Median Time)getConnectivity(Median Time)Note: For production datasets with 1,000+ blocks, the $O(1)$ indexing provides an order-of-magnitude speedup as it avoids repeated $O(N)$ tree scans for every ghost cell.
Validation
test/tests_io.jlpass.c6c5a65a...) with the baseline regression data.