[MISC] Add register-only tiled cholesky, and incremental H patching, for performance. #2659
Open
hughperkins wants to merge 67 commits into Genesis-Embodied-AI:main from hughperkins:hp/incremental-hessian-fuse-chol-tiles-api
Commits
f9244a4 (hughperkins) Fix H patching: separate nt_L for Cholesky factor, nt_H keeps Hessian
2e4778b (hughperkins) Switch to per-env adaptive H patching decision
70f9374 (hughperkins) cholesky: register-resident blocked factorization with 16x16 shuffle …
01f3cbe (hughperkins) cholesky: add identity padding for non-multiple-of-16 DOF counts
d0aca95 (hughperkins) cholesky: fix padding - inline bounds checks, remove array-passing qd…
ee9335b (hughperkins) bump quadrants to 0.6.0b1, fix gpu_graph -> graph rename
b6ba7fe (hughperkins) tile16 Cholesky writes L to nt_L, reads H from nt_H
8348d06 (hughperkins) fused Cholesky+Solve: L in shared memory, +21% dex_hand
fcf3f8e (hughperkins) revert batch Cholesky/solve/backward to use nt_H in-place
348a822 (hughperkins) fix fused kernel: load/store all n_dofs elements, not just 16
52927a0 (hughperkins) eliminate nt_L: single tensor nt_H for all paths
2030a00 (hughperkins) fix H patching: restrict to fused/tiled path, force full rebuild on f…
016390c (hughperkins) fix test_cholesky_tiling: compare qacc instead of nt_H
37cd1c5 (hughperkins) gate use_full_hessian check behind qd.static template parameter
9ebf712 (hughperkins) fix test_cholesky_tiling: compare qpos over 10 steps
80f6d45 (hughperkins) rename tile BLAS primitives: _tile_syr_sub_16, _tile_ger_sub_16
bac01c7 (hughperkins) add instrumented bp6 benchmark to diagnose performance regression
9c8c986 (hughperkins) Migrate tiled Cholesky to quadrants Tile16 API
0b9a67c (hughperkins) remove superfluous diagnostic benchmark scripts
f00edd7 (hughperkins) pre-commit
1de2b6c (hughperkins) restore and update func_cholesky_factor_direct_tiled docstring
36be574 (hughperkins) extract _butterfly_reduce_16 helper for subgroup shuffle reduction
459c235 (hughperkins) restore nt_H re-purpose comment in ConstraintState
aeda794 (hughperkins) use make_tile16(gs.qd_float) for precision-correct Tile16 registers
7b45735 (hughperkins) 6.0b4
0356dce (hughperkins) Merge branch 'hp/incremental-hessian-fuse-chol-tiles-api' of github.c…
eabefd9 (hughperkins) migrate solver to new Tile16 API: slice syntax, outer, cholesky_, eye…
f2225dc (hughperkins) Replace _CHOL_TILE global with Tile16.SIZE
61a70e7 (hughperkins) Rename Tile16 to Tile16x16
1230f65 (hughperkins) Use Tile16x16.zeros() instead of Tile16x16()
735cff4 (hughperkins) Use Tile16x16.eye() and Tile16x16.zeros() factories in solver
aa32f7d (hughperkins) 8
5fcf106 (hughperkins) Merge remote-tracking branch 'origin/main' into hp/incremental-hessia…
d5cb5f1 (hughperkins) Fix Tile16x16 scoping: declare tiles before if/else blocks
716ee91 (hughperkins) test_cholesky_tiling: compare Mgrad with iterations=1 instead of qpos…
e591827 (hughperkins) Update tile stores to SharedArray to use new slice syntax
c840edb (hughperkins) test_cholesky_tiling: compare Mgrad with iterations=1 instead of qpos…
9c1690d (hughperkins) bump quadrants dependency to 0.6.0b9
2234bd0 (hughperkins) Refactor butterfly reduction to use qd.static loop
fcafdb5 (hughperkins) test_cholesky_tiling: require Mgrad norm > 5.0 instead of > 0.0
6f90d8a (hughperkins) Shorten fused cholesky+solve docstring
60f2af9 (hughperkins) update tolerance
f6df0a5 (hughperkins) add link to script
3583394 (hughperkins) 11
e1be020 (hughperkins) Merge remote-tracking branch 'origin/main' into hp/incremental-hessia…
9561149 (hughperkins) move analysis to pr
5654141 (hughperkins) Rewrap comments in changed lines to 120-char line width
ef1c3e2 (hughperkins) Clarify _func_patch_hessian_delta docstring
7bfed26 (hughperkins) Add block-level comments to Cholesky factorization functions
4c01929 (hughperkins) Restore +1 shared memory padding on L_sh to avoid bank conflicts
02ff753 (hughperkins) Explain sequential column-block dependency in Cholesky factorization …
a4943cf (hughperkins) Fix Cholesky comment: off-diagonal rows are sequential, not parallel
ae548f4 (hughperkins) Fix import order: group bare imports before from-imports
5fe6d2e (hughperkins) Use qd.types.Tile16x16(dtype=...) instead of make_tile16x16 import
cc4d140 (hughperkins) Use VecSliceProxy syntax for column-vector loads in Cholesky kernels
56a8aa6 (hughperkins) Clarify comment: triangular solve uses scalar rows, not tiles
cc978ea (hughperkins) 12
a1ec2fb (hughperkins) Merge remote-tracking branch 'origin/main' into hp/incremental-hessia…
4b49f2a (hughperkins) Merge remote-tracking branch 'origin/main' into hp/incremental-hessia…
7cb0e3b (hughperkins) fix: use qd.simt.Tile16x16 proxy API instead of removed qd.types.Tile…
17ec942 (hughperkins) fix: use _eye_() instead of eye_() - method is private in Tile16x16
d803919 (hughperkins) fix: use qd.simt.Tile16x16 directly in kernels to avoid purity violation
6b90dc5 (hughperkins) fix: inline tile size constant 16 to avoid purity-checker rejection
7714b96 (hughperkins) Merge remote-tracking branch 'origin/main' into hp/incremental-hessia…
e661aa3 (hughperkins) 00~v0.6.3b301~
86c3233 (hughperkins) 0.6.3
adbb55c (hughperkins) precommit