refactor: nibble-pack relstatus arrays to halve their memory footprint#125
Merged
Merged
Conversation
Each relstatus row stored one SDC_STS_* value per byte. Since valid
values are 1–6 (easily fitting in 4 bits), two values can be packed
per byte. sdcRAAllocRow now allocates (row - col0 + 2) / 2 bytes instead
of row - col0 + 1 bytes, and initialises each byte with two SDC_STS_SKIP
values packed into its nibbles. All reads and writes use new explicitly
static inline helpers: nibble_set and nibble_get.
Tested on the NZ national geodetic adjustment (national-geodetic-adjustment
repo, combo branch, commit 3cab048, 122k-station network via nga.snp):
ngah (NZGD2000) ngav (NZVD2016)
Golden (old): 20m21s 6.54 GB 30m23s 8.09 GB
single-cholesky: 21m01s 3.96 GB 32m13s 5.51 GB
nibble-pack: 19m52s 3.38 GB 29m02s 4.78 GB
Memory reduction is ~0.6 GB for both runs. Run times are within system
noise (range observed across multiple runs: ngah ~19–21 min, ngav ~29–32 min).
Output is identical to the Golden baseline for both runs.
The reduction appears similar for ngah and ngav because peak relstatus memory
is determined by the single largest order's relstatus footprint — this same
memory is reused across all orders, so cumulative allocation across orders
does not contribute to peak. The remaining ~1.4 GB gap between ngah and
ngav peaks is not attributable to relstatus.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
|
Looks good to me... you weren't tempted to pack into two bytes per value (efficiency consideration?) |
ccrook
approved these changes
Jun 23, 2026
Contributor
Author
This PR packs two values per byte. Are you asking why I didn't go for e.g. 3-bit packing and get 8 values per 3 bytes? If so, it's obviously more complicated and I thought it would be nice to allow room for some expansion of values in future (just in case) :) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Each relstatus row stored one SDC_STS_* value per byte. Since valid values are 1–6 (easily fitting in 4 bits), two values can be packed per byte. sdcRAAllocRow now allocates (row - col0 + 2) / 2 bytes instead of row - col0 + 1 bytes, and initialises each byte with two SDC_STS_SKIP values packed into its nibbles. All reads and writes use new explicitly static inline helpers: nibble_set and nibble_get.
Tested on the NZ national geodetic adjustment (national-geodetic-adjustment repo, combo branch, commit 3cab048, 122k-station network via nga.snp):
Memory reduction is ~0.6 GB for both runs. Run times are within system noise (range observed across multiple runs: ngah ~19–21 min, ngav ~29–32 min). Output is identical to the Golden baseline for both runs.
The reduction appears similar for ngah and ngav because peak relstatus memory is determined by the single largest order's relstatus footprint — this same memory is reused across all orders, so cumulative allocation across orders does not contribute to peak. The remaining ~1.4 GB gap between ngah and ngav peaks is not attributable to relstatus.
GSR-955