refactor: optimize VTK conversion #167

Merged
henry2004y merged 6 commits into master from refactor-vtk
May 18, 2026

@henry2004y

Owner

Description: Optimize VTK IO Performance and Connectivity Generation

This PR refactors the VTK connectivity generation logic in src/vtk.jl to address performance bottlenecks related to tree traversal and global block indexing. The changes significantly reduce heap allocations and improve execution speed, particularly for large AMR datasets.

Key Changes

  1. Bitwise ibits Optimization:

    • Replaced the allocation-heavy digits()-based bit extraction with efficient bitwise shifts: (i >> pos) & ((Int32(1) << len) - 1).
    • This eliminates thousands of small array allocations during Morton tree traversal.
  2. $O(1)$ Global Block Lookups:

    • Replaced the original $O(N)$ linear search in nodeToGlobalBlock with a precomputed mapping array nodeToGlobalBlock_I.
    • Introduced a precomputation phase local to getConnectivity that builds this mapping once, so each later node-to-global-block lookup is a constant-time array index.
  3. Refactored getConnectivity Loop:

    • Optimized the dual-round connectivity generation to use @inbounds and @view where appropriate.
    • Removed redundant tree searches during ghost-cell mapping.
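
To illustrate the first change, here is a minimal sketch comparing a digits()-based bit extractor with the shift-and-mask expression quoted above. The names ibits_digits and ibits_bitwise are hypothetical, and the digits()-based version is only a guess at what an allocation-heavy implementation could look like, not the code that was removed from src/vtk.jl.

```julia
# Hypothetical digits()-based extractor: allocates a Vector of bits per call.
function ibits_digits(i::Integer, pos::Integer, len::Integer)
    bits = digits(i, base = 2, pad = pos + len)   # least-significant bit first
    return sum(bits[pos + k + 1] << k for k in 0:(len - 1))
end

# Shift-and-mask form quoted in the PR description; pure integer arithmetic.
ibits_bitwise(i::Int32, pos::Integer, len::Integer) =
    (i >> pos) & ((Int32(1) << len) - 1)

# Both forms agree on small non-negative inputs.
for i in Int32[0, 1, 5, 0b101101, 1023], pos in 0:5, len in 1:4
    @assert ibits_digits(i, pos, len) == ibits_bitwise(i, pos, len)
end
```

The sketch only checks non-negative inputs and short bit fields; behavior near the full 32-bit width is a separate question.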

Performance Improvements

Benchmarks were run on the 3d_mhd_amr dataset (136 leaf blocks):

| Metric | Original | Optimized | Improvement |
|---|---|---|---|
| find_grid_block (Median Time) | 353 ns | 245 ns | ~30% faster |
| getConnectivity (Median Time) | 496 µs | 320 µs | ~35% faster |
| Total Allocations | ~50k | ~38k | -24% |
| Heap Memory | 2.62 MiB | 1.78 MiB | -32% |

Note: For production datasets with 1,000+ blocks, the $O(1)$ indexing is expected to provide an order-of-magnitude speedup, since it avoids repeating an $O(N)$ tree scan for every ghost cell.
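
The difference between a per-query scan and a precomputed table can be sketched as follows. Everything here is a hypothetical stand-in: the `used` flag vector, `global_block_scan`, and `precompute_global_blocks` are illustrative names, and the real tree layout in src/vtk.jl is likely different.

```julia
# Hypothetical stand-in for the tree: used[n] is true when node n is a leaf
# block that appears in the output. Not the actual data layout.
used = Bool[true, false, true, true, false, true]

# O(N) per query: count the used nodes up to and including `node`.
global_block_scan(used, node) = count(@view used[1:node])

# One O(N) pass with a running counter; every later query is a plain index.
function precompute_global_blocks(used)
    table = zeros(Int32, length(used))
    n = Int32(0)
    for (node, flag) in enumerate(used)
        if flag
            n += Int32(1)
            table[node] = n
        end
    end
    return table
end

table = precompute_global_blocks(used)
for node in eachindex(used)
    used[node] && @assert table[node] == global_block_scan(used, node)
end
```

With Q queries over N nodes, the scan costs on the order of Q·N operations while the table costs N + Q, which is where the gap for large block counts would come from.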

Validation

  • Regression Testing: All tests in test/tests_io.jl pass.
  • Output Parity: Verified that generated VTK connectivity maintains exact SHA256 hash parity (c6c5a65a...) with the baseline regression data.
  • AMR Level Jumps: Confirmed that coordinate mapping for same-level and coarser neighbors correctly handles T-junctions in the VTK unstructured grid output.
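
A hash-parity check of this kind could look like the sketch below. How the hash was actually computed for this PR is not stated, so hashing the raw bytes of the connectivity array is an assumption, and `connectivity_hash` and the sample arrays are made up for illustration.

```julia
using SHA  # Julia standard library

# Hash the raw bytes of an integer connectivity array (assumed approach).
connectivity_hash(conn::AbstractArray{<:Integer}) =
    bytes2hex(sha256(collect(reinterpret(UInt8, vec(conn)))))

baseline  = Int32[0 1 2 3; 1 4 5 2]     # made-up cell-to-point indices
candidate = copy(baseline)

# Identical arrays give identical digests.
@assert connectivity_hash(candidate) == connectivity_hash(baseline)

# Changing a single index changes the digest.
candidate[1, 1] = 7
@assert connectivity_hash(candidate) != connectivity_hash(baseline)
```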

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the loop-based ibits function with a bitwise implementation and optimizes connectivity generation by precomputing a node-to-global-block mapping to avoid repeated function calls. Review feedback identifies a bug in the bitwise logic for 32-bit shifts and suggests a more efficient single-pass approach for the precomputation logic to improve performance on large datasets.

Comment thread src/vtk.jl
Comment thread src/vtk.jl
@github-actions

github-actions Bot commented May 16, 2026

Benchmark Results (Julia v1)

Time benchmarks

| Benchmark | master | f8aa842... | master / f8aa842... |
|---|---|---|---|
| amrex/load | 25.6 ± 0.61 μs | 25.4 ± 0.54 μs | 1.01 ± 0.032 |
| amrex/phase_space_3d | 6.3 ± 0.85 ms | 6.24 ± 1.4 ms | 1.01 ± 0.27 |
| amrex/select_region | 0.251 ± 0.017 ms | 0.251 ± 0.015 ms | 1 ± 0.09 |
| amrex/select_region_from_files | 0.852 ± 0.15 ms | 0.866 ± 0.16 ms | 0.984 ± 0.26 |
| read/ASCII | 0.69 ± 0.013 ms | 0.663 ± 0.015 ms | 1.04 ± 0.03 |
| read/Anisotropy | 7.63 ± 1.5 μs | 9.13 ± 1.6 μs | 0.836 ± 0.22 |
| read/Current density | 10.1 ± 0.45 μs | 9.56 ± 0.41 μs | 1.06 ± 0.066 |
| read/Current density 3D | 10.5 ± 0.33 μs | 10.5 ± 0.33 μs | 1 ± 0.045 |
| read/Current density 3D Jx | 5.3 ± 0.11 μs | 5.32 ± 0.11 μs | 0.996 ± 0.029 |
| read/Current density Jz | 8.19 ± 0.13 μs | 7.53 ± 0.08 μs | 1.09 ± 0.021 |
| read/Cutdir | 2.25 ± 0.18 μs | 2.26 ± 0.2 μs | 0.991 ± 0.12 |
| read/Cutdir subset | 3.1 ± 0.24 μs | 3.11 ± 0.25 μs | 1 ± 0.11 |
| read/Extract Bmag | 0.521 ± 0.11 μs | 0.491 ± 0.1 μs | 1.06 ± 0.31 |
| read/HDF5 | 0.112 ± 0.0038 ms | 0.112 ± 0.0037 ms | 0.999 ± 0.047 |
| read/HDF5 extract | 14.1 ± 0.37 μs | 14.1 ± 0.34 μs | 1 ± 0.036 |
| read/Interp2d | 1.07 ± 0.16 μs | 0.791 ± 0.19 μs | 1.35 ± 0.38 |
| read/Load binary structured | 0.0474 ± 0.026 ms | 0.0467 ± 0.0065 ms | 1.02 ± 0.58 |
| time_to_load | 1.26 ± 0.003 s | 1.25 ± 0.00087 s | 1 ± 0.0025 |

Memory benchmarks

| Benchmark | master | f8aa842... | master / f8aa842... |
|---|---|---|---|
| amrex/load | 0.212 k allocs: 9.67 kB | 0.212 k allocs: 9.67 kB | 1 |
| amrex/phase_space_3d | 0.059 k allocs: 18.3 MB | 0.059 k allocs: 18.3 MB | 1 |
| amrex/select_region | 3 allocs: 1.07 MB | 3 allocs: 1.07 MB | 1 |
| amrex/select_region_from_files | 0.05 k allocs: 7.48 MB | 0.05 k allocs: 7.48 MB | 1 |
| read/ASCII | 0.395 k allocs: 0.109 MB | 0.395 k allocs: 0.109 MB | 1 |
| read/Anisotropy | 3 allocs: 4.84 kB | 3 allocs: 4.84 kB | 1 |
| read/Current density | 10 allocs: 12.5 kB | 10 allocs: 12.5 kB | 1 |
| read/Current density 3D | 10 allocs: 6.65 kB | 10 allocs: 6.65 kB | 1 |
| read/Current density 3D Jx | 4 allocs: 2.23 kB | 4 allocs: 2.23 kB | 1 |
| read/Current density Jz | 4 allocs: 4.18 kB | 4 allocs: 4.18 kB | 1 |
| read/Cutdir | 0.137 k allocs: 5.98 kB | 0.137 k allocs: 5.98 kB | 1 |
| read/Cutdir subset | 0.159 k allocs: 8.89 kB | 0.159 k allocs: 8.89 kB | 1 |
| read/Extract Bmag | 3 allocs: 4.09 kB | 3 allocs: 4.09 kB | 1 |
| read/HDF5 | 0.1 k allocs: 3.67 kB | 0.1 k allocs: 3.67 kB | 1 |
| read/HDF5 extract | 17 allocs: 4.55 kB | 17 allocs: 4.55 kB | 1 |
| read/Interp2d | 12 allocs: 4.45 kB | 12 allocs: 4.45 kB | 1 |
| read/Load binary structured | 0.144 k allocs: 0.0743 MB | 0.144 k allocs: 0.0743 MB | 1 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |

@codecov

codecov Bot commented May 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.86%. Comparing base (1095c9d) to head (f8aa842).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   83.05%   82.86%   -0.20%     
==========================================
  Files          21       21              
  Lines        4155     4166      +11     
==========================================
+ Hits         3451     3452       +1     
- Misses        704      714      +10     

@henry2004y

Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds VTK benchmarks and optimizes the getConnectivity function by replacing repeated calls to nodeToGlobalBlock with a pre-calculated mapping. The ibits function was also refactored to use bitwise operations. Feedback suggests optimizing the nBlock_P calculation to avoid quadratic complexity relative to processors and nodes, making the ibits mask generic for 64-bit integers, and using in-place sorting to minimize heap allocations.
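
Two of these suggestions can be sketched in a few lines. This is not the code that was merged: `ibits_generic` is a hypothetical name showing one way to build the mask in the integer type of the input, and the `sort!` line shows only the general in-place pattern, not where it applies in src/vtk.jl.

```julia
# Type-generic variant: the mask is built in the integer type of `i`, so the
# same method covers Int32 and Int64 inputs.
ibits_generic(i::T, pos::Integer, len::Integer) where {T<:Integer} =
    (i >>> pos) & ((one(T) << len) - one(T))

@assert ibits_generic(Int32(0b101101), 2, 3) === Int32(3)
@assert ibits_generic((Int64(1) << 40) | Int64(3), 0, 2) === Int64(3)
@assert ibits_generic(Int64(0x1f) << 35, 35, 5) === Int64(31)

# sort! reorders the existing array and returns it, while sort allocates a copy.
ids = Int32[3, 1, 2]
@assert sort!(ids) === ids
```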

Comment thread src/vtk.jl
Comment thread src/vtk.jl Outdated
Comment thread src/vtk.jl Outdated
@henry2004y henry2004y merged commit 5c0eba6 into master May 18, 2026
8 checks passed
@henry2004y henry2004y deleted the refactor-vtk branch May 18, 2026 19:27