fix: mitigate dense bltreq matrix allocation causing OOM for large networks #123
Merged
…tworks

create_bltmatrix() sets nsparse=0. blt_requested_size() reads nsparse=0 and sets all r->req=0 as a side effect, destroying the values accumulated by the pair-loop. copy_bltmatrix() then calls copy_bltmatrix_bandwidth(), which sets nsparse=nrow, followed by alloc_bltrow_arrays(), which allocates i+1-r->req elements per row: full width, since r->req=0. For the NZ network (122,317 stations, 137,801-row matrix) this produces a ~52 GB allocation.

Fix: call blt_set_sparse_rows(bltreq, nrow) immediately after creation. This one-line fix sets nsparse=nrow so blt_requested_size() does not fire the destructive branch and the pair-loop's req values are preserved.

The resulting row widths are still correct. The pair-loop calls blt_nonzero_element() for each station pair that needs a covariance element, which only ever decreases req (widens a row), so requests accumulate correctly into bltreq regardless of nsparse. copy_bltmatrix() in relacc_calc_requested_covar() then seeds bltreq with bltdec's bandwidth via copy_bltmatrix_bandwidth() before allocating, so the final blt is allocated at exactly UNION(pair-loop requests, bltdec bandwidth), neither full-width nor diagonal-only.

A follow-up refactor (refactor/snapspec-single-cholesky-creation) will avoid holding the Cholesky factor (bltdec) in memory alongside the inverted covariance matrix (blt), reducing peak memory further.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
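The failure mode described in this commit can be illustrated with a toy model. This is only a sketch under stated assumptions, not the project's code: `toy_row`, `toy_blt`, and the `toy_*` functions are hypothetical stand-ins that model just the `nsparse` and `req` fields named above, and the initial state `req = i` (diagonal only) is assumed for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real structures; only the fields named in
   the commit message (nsparse, req) are modelled. */
typedef struct { int req; } toy_row;   /* first requested column of the row */
typedef struct { int nrow; int nsparse; toy_row *rows; } toy_blt;

/* Request element (i, j) of the lower triangle. req only ever decreases,
   which widens the row. */
static void toy_nonzero_element(toy_blt *m, int i, int j)
{
    assert(j >= 0 && j <= i && i < m->nrow);
    if (j < m->rows[i].req) m->rows[i].req = j;
}

/* Models the described side effect: when nsparse == 0, every req is zeroed
   before counting, so each row is counted at full width (i + 1). */
static long toy_requested_size(toy_blt *m)
{
    long total = 0;
    if (m->nsparse == 0)
        for (int i = 0; i < m->nrow; i++) m->rows[i].req = 0;
    for (int i = 0; i < m->nrow; i++)
        total += i + 1 - m->rows[i].req;
    return total;
}

/* Build a 6-row request matrix, request two off-diagonal elements, and
   return the counted size, with or without the fix applied. */
static long toy_demo(int apply_fix)
{
    toy_row rows[6];
    toy_blt m = { 6, 0, rows };                        /* nsparse = 0 at creation */
    for (int i = 0; i < m.nrow; i++) rows[i].req = i;  /* assumed: diagonal only */
    if (apply_fix) m.nsparse = m.nrow;  /* stands in for blt_set_sparse_rows(bltreq, nrow) */
    toy_nonzero_element(&m, 3, 1);      /* row 3 now spans columns 1..3 */
    toy_nonzero_element(&m, 5, 2);      /* row 5 now spans columns 2..5 */
    return toy_requested_size(&m);
}
```

In this toy, `toy_demo(0)` counts every row at full width, while `toy_demo(1)` keeps the two requested rows at their requested widths and the rest at one element each. The row counts and requested pairs are illustrative only.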
The "Matrix size" log line previously showed 55 elements (100.00% full) because blt_requested_size() had a destructive side effect: with nsparse=0 (the default from create_bltmatrix), it zeroed all req values before counting, making every row appear full-width. The fix in this branch (blt_set_sparse_rows(bltreq, nrow) immediately after create_bltmatrix) sets nsparse=nrow, so blt_requested_size() no longer fires the zeroing branch. The logged size now reflects the actual pair-loop bandwidth: 16 elements (29.09% full) for these test cases. The station order assignments and all other output are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
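The two logged figures are consistent with a full lower triangle of n(n+1)/2 elements. The sketch below shows that arithmetic; the helper names are hypothetical, and the 10-row size is only inferred from 55 = 10 × 11 / 2, not stated in the commit.

```c
#include <assert.h>

/* Number of elements in a full lower triangle (diagonal included). */
static long lower_triangle_size(long nrow)
{
    assert(nrow >= 0);
    return nrow * (nrow + 1) / 2;
}

/* Requested elements as a percentage of the full lower triangle. */
static double percent_full(long requested, long nrow)
{
    long full = lower_triangle_size(nrow);
    assert(full > 0 && requested >= 0 && requested <= full);
    return 100.0 * (double)requested / (double)full;
}
```

Under that assumption, `lower_triangle_size(10)` gives the 55 elements reported before the fix, and `percent_full(16, 10)` gives roughly 29.09, matching the new log line.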