Fix `isblasmatrix` for GPUArrays by kshyatt · Pull Request #59 · QuantumKitHub/Strided.jl

kshyatt · 2026-05-13T10:12:41Z

This will help us use the new support for generic GPUArray strided views in a way that bypasses some really awful ambiguity warnings.

github-actions · 2026-05-13T10:13:48Z

Your PR no longer requires formatting changes. Thank you for your contribution!

lkdvos

I don't think this fully fixes the issue. GPU arrays should never reach that code path, since it is guarded by an `isblasmatrix` call that checks `pointer(A) isa Ptr`.

I think this really needs a proper rewrite that dispatches to a `gemm` function, which can then determine the proper driver.
Note also that the current fallback uses the Strided machinery to write out the kernel manually, which is equivalent to what the `generic_matmatmul!` function does anyway.

kshyatt · 2026-05-13T11:58:24Z

`pointer(A) isa Ptr` is true for ROCArrays, so GPUArrays definitely reach this code path; that is where it was erroring.

kshyatt · 2026-05-13T12:48:59Z

> This doesn't fully fix the issue I think

It seems to have unblocked the v1 AMD stuff on TO 🤷 . But if someone wants to make a higher performance version, go ahead. The rocBLAS gemm doesn't work well if the stride in the first dimension isn't 1, I think.

codecov · 2026-05-13T12:58:16Z

Codecov Report

❌ Patch coverage is 32.14286% with 19 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ext/StridedAMDGPUExt.jl | 0.00% | 9 Missing ⚠️ |
| ext/StridedcuBLASExt.jl | 0.00% | 9 Missing ⚠️ |
| ext/StridedGPUArraysExt.jl | 83.33% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/linalg.jl | `70.10% <100.00%> (+47.99%)` | ⬆️ |
| ext/StridedGPUArraysExt.jl | `49.25% <83.33%> (+3.35%)` | ⬆️ |
| ext/StridedAMDGPUExt.jl | `0.00% <0.00%> (ø)` | |
| ext/StridedcuBLASExt.jl | `0.00% <0.00%> (ø)` | |

... and 1 file with indirect coverage changes

lkdvos · 2026-05-13T13:53:42Z

Would it then not make more sense to fix the isblasmatrix implementation? That at least treats CUDA and AMD equally then.

kshyatt · 2026-05-13T13:54:56Z

So I think the CUDA one doesn't ever touch this because the result of pointer(A) there is a CuPtr, NOT a Ptr 🙃 . Not sure about Metal?
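
The pointer-type distinction discussed here can be illustrated without a GPU. The sketch below is a toy stand-in: `FakeDevicePtr`, `FakeDeviceMatrix`, and `host_pointer_guard` are hypothetical names, not part of Strided.jl, CUDA.jl, or AMDGPU.jl. It only shows how a guard of the form `pointer(A) isa Ptr` lets through any array whose `pointer` returns a plain `Ptr` and excludes one that returns a separate device-pointer type.

```julia
# Hypothetical stand-in for a device pointer type that is NOT a subtype of Ptr.
struct FakeDevicePtr{T}
    addr::UInt
end

# Hypothetical array type whose `pointer` returns the device-style pointer.
struct FakeDeviceMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::FakeDeviceMatrix) = size(A.data)
Base.getindex(A::FakeDeviceMatrix, i::Int, j::Int) = A.data[i, j]
Base.strides(A::FakeDeviceMatrix) = strides(A.data)
Base.pointer(A::FakeDeviceMatrix{T}) where {T} =
    FakeDevicePtr{T}(UInt(pointer(A.data)))

# The kind of guard described in the thread (simplified, assumed form).
host_pointer_guard(A) = pointer(A) isa Ptr

on_host = host_pointer_guard(rand(2, 2))                      # plain Array: passes the guard
on_device = host_pointer_guard(FakeDeviceMatrix(rand(2, 2)))  # device-style pointer: excluded
```

An array type whose `pointer` returns a plain `Ptr`, as reported above for ROCArrays, would behave like the first case and pass the guard.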

lkdvos · 2026-05-13T17:54:19Z

Yes, exactly, I think my argument is to either:
A. Make AMD also not touch this so it is treated the same as the other backends
B. Make every backend dispatch to `gemm` where possible and otherwise fall back to the Strided implementation, similar to how the current CPU version works.
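
Option B can be sketched on the CPU with the standard library alone. This is a minimal illustration and not the code in `src/linalg.jl`: `is_gemm_compatible`, `fallback_mul!`, and `mul_sketch!` are hypothetical names, and the compatibility test (BLAS element type, unit stride in the first dimension) is an assumed simplification of whatever `isblasmatrix` actually checks.

```julia
using LinearAlgebra

# Assumed criterion: BLAS element type and column-major layout with unit first stride.
is_gemm_compatible(A::StridedMatrix{T}) where {T<:LinearAlgebra.BlasFloat} =
    stride(A, 1) == 1 && stride(A, 2) >= max(1, size(A, 1))
is_gemm_compatible(::AbstractMatrix) = false

# Generic fallback: explicit triple loop computing C = α*A*B + β*C.
function fallback_mul!(C, A, B, α, β)
    for j in axes(C, 2), i in axes(C, 1)
        acc = zero(eltype(C))
        for k in axes(A, 2)
            acc += A[i, k] * B[k, j]
        end
        C[i, j] = α * acc + β * C[i, j]
    end
    return C
end

# Dispatch to gemm when all three operands qualify, otherwise use the fallback.
function mul_sketch!(C, A, B, α=true, β=false)
    same_eltype = eltype(A) == eltype(B) == eltype(C)
    if same_eltype && is_gemm_compatible(C) && is_gemm_compatible(A) && is_gemm_compatible(B)
        T = eltype(C)
        return BLAS.gemm!('N', 'N', T(α), A, B, T(β), C)
    else
        return fallback_mul!(C, A, B, α, β)
    end
end

A = rand(4, 3); B = rand(3, 5)
C_blas = mul_sketch!(zeros(4, 5), A, B)          # takes the gemm branch

Abig = rand(8, 3)
Av = view(Abig, 1:2:8, :)                        # stride(Av, 1) == 2, not gemm compatible
C_fallback = mul_sketch!(zeros(4, 5), Av, B)     # takes the fallback branch
```

The strided view in the second call has a non-unit first stride, which is the situation mentioned earlier in the thread as problematic for the rocBLAS `gemm`.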

kshyatt · 2026-06-02T20:36:01Z

Finally got back to this, looks like it should be ok now?

kshyatt · 2026-06-04T14:28:09Z

After a Zulip discussion, we decided to try inserting a new blas_mul! function we can override where appropriate for the GPU arrays, allowing us to pass to the efficient vendor libraries where we can. I've added extensions for this.
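
The idea of an overridable multiplication hook can be sketched as follows. This is not the PR's implementation: `blas_mul_sketch!`, `ToyDeviceMatrix`, and `VENDOR_CALLS` are hypothetical names, and the "vendor" method here simply calls host `mul!` on the wrapped data so that the example runs without a GPU. In the PR, such overrides presumably live in the package extensions listed in the coverage report.

```julia
using LinearAlgebra

# Default method: host arrays go through LinearAlgebra's 5-argument mul!.
blas_mul_sketch!(C, A, B, α, β) = mul!(C, A, B, α, β)

# Toy stand-in for a GPU array type owned by a backend package.
struct ToyDeviceMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::ToyDeviceMatrix) = size(A.data)
Base.getindex(A::ToyDeviceMatrix, i::Int, j::Int) = A.data[i, j]

# Counter used only to show which method was selected.
const VENDOR_CALLS = Ref(0)

# Override a backend extension could provide; a real one would call the vendor library.
function blas_mul_sketch!(C::ToyDeviceMatrix, A::ToyDeviceMatrix, B::ToyDeviceMatrix, α, β)
    VENDOR_CALLS[] += 1
    mul!(C.data, A.data, B.data, α, β)
    return C
end

Ah = rand(3, 3); Bh = rand(3, 3)
Ch = blas_mul_sketch!(zeros(3, 3), Ah, Bh, 1.0, 0.0)    # default host method

Ad = ToyDeviceMatrix(Ah); Bd = ToyDeviceMatrix(Bh)
Cd = blas_mul_sketch!(ToyDeviceMatrix(zeros(3, 3)), Ad, Bd, 1.0, 0.0)  # backend override
```

Because selection happens through ordinary method dispatch, the core package does not need to know about any backend type; each extension only adds a method for its own array type.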

lkdvos

Only have some comments about code coverage, otherwise looks very clean!

kshyatt · 2026-06-05T18:33:25Z

Added some more tests to trigger the fallback to the GPUArrays kernels

lkdvos · 2026-06-06T03:17:24Z

Can we just exhaustively do all the mul cases that trigger the BLAS path too? I don't think we're using the adjoint path right now for example

lkdvos · 2026-06-07T18:06:50Z

I apologize for this very inefficient back-and-forth. I'll be at my desk again next week, so things should hopefully go better then. I think the non-BLAS code paths are indeed checked now, but I was mostly talking about exhaustively testing the various BLAS paths with the different combinations of transposed, conjugated, etc., since that seems like the more brittle part (basically replacing the randperm tests with one test per permutation).
I should find some time to do this tomorrow if it's not clear what I mean. I apologize for dragging this on.

kshyatt · 2026-06-07T18:08:13Z

It's cool, no worries. I can also test a BLAS one (with all 1 strides) for each f1, f2 combo?

lkdvos · 2026-06-07T18:19:52Z

Yeah, I think I was mostly worried about the transpose and adjoint versions, where `getblasmatrix` would generate non-trivial 'T' or 'C' characters, etc.
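
A test that enumerates every wrapper combination, rather than sampling them, could look roughly like the sketch below. It uses plain `LinearAlgebra` arrays instead of Strided views, and `check_wrapper_combinations` is a hypothetical helper, so it only illustrates the exhaustive-loop structure and not the actual contents of `test/gpu.jl`. The choice of `identity`, `transpose`, and `adjoint` as the wrappers is an assumption about what `f1` and `f2` range over.

```julia
using LinearAlgebra

# Run mul! for each (f1, f2) pair and compare against a reference product
# computed from fully materialized operands.
function check_wrapper_combinations(::Type{T}, n::Int=4) where {T}
    wrappers = (identity, transpose, adjoint)
    A = rand(T, n, n)
    B = rand(T, n, n)
    results = Tuple{Function,Function,Bool}[]
    for f1 in wrappers, f2 in wrappers
        C = zeros(T, n, n)
        mul!(C, f1(A), f2(B))                        # lazy wrappers select the BLAS flags
        reference = Array(f1(A)) * Array(f2(B))      # plain dense product
        push!(results, (f1, f2, C ≈ reference))
    end
    return results
end

results = check_wrapper_combinations(ComplexF64)
all_ok = all(r -> r[3], results)
```

Using a complex element type matters here, since `transpose` and `adjoint` coincide for real matrices and a real-only test would not distinguish the two paths.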

kshyatt · 2026-06-07T18:21:09Z

I'll try to get to it tonight or tomorrow :)

kshyatt · 2026-06-08T07:06:58Z

Good call as I ended up finding and fixing a bug

lkdvos

Thanks for adding the tests!
Overall looks good to me, left a small note about the gemm call for AMD, but otherwise good to go!

lkdvos · 2026-06-08T13:02:16Z

+    A2a = Base.unsafe_wrap(ROCMatrix{T}, pointer(A2), size(A2))
+    B2a = Base.unsafe_wrap(ROCMatrix{T}, pointer(B2), size(B2))
+    C2a = Base.unsafe_wrap(ROCMatrix{T}, pointer(C2), size(C2))


I just checked the actual blas wrapper implementation, do you think it would be worth it to simply call that directly, so we can actually deal with the cases where the stride is not equal to the size? (https://github.com/JuliaGPU/AMDGPU.jl/blob/f49923a4c13b06325ff32696952b34d5ec73998f/src/blas/wrappers.jl#L547-L566)

Alternatively, we could also keep that for MatrixAlgebraKit and future implementations, I'm happy to already get this in as well.

kshyatt · 2026-06-08

I considered doing that, but then we would have to manage the library handle etc. ourselves, which I'd prefer not to do. I'd say merge for now and we can revisit if needed.

Jutho · 2026-06-09

So does this silently fail for matrices that do not have stride(A, 2) == size(A, 1)?

Would it have been an option to do something like

A2a = view(Base.unsafe_wrap(ROCMatrix{T}, pointer(A2), (stride(A2, 2), size(A2, 2))), 1:size(A2, 1), :)

Oh I see this is at least checked in a specialized isblasmatrix version.
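
The concern about wrapping a pointer with `size(A)` when the matrix has a padded leading dimension can be reproduced on the CPU. The sketch below uses `unsafe_wrap(Array, ...)` as a host analogue of the `ROCMatrix` call in the diff and takes the leading dimension to be `stride(A, 2)`, as is usual for column-major storage. The variable names are illustrative only.

```julia
# 6×4 buffer; the matrix of interest is its first 4 rows,
# so size(A, 1) == 4 but the leading dimension stride(A, 2) == 6.
buf = collect(reshape(1.0:24.0, 6, 4))
A = view(buf, 1:4, :)

GC.@preserve buf begin
    # Wrapping with size(A) assumes densely packed columns: reads the wrong elements.
    naive = unsafe_wrap(Array, pointer(A), size(A))
    naive_matches = naive == A

    # Wrapping with the leading dimension, then viewing the valid rows, recovers A.
    padded = unsafe_wrap(Array, pointer(A), (stride(A, 2), size(A, 2)))
    fixed = view(padded, 1:size(A, 1), :)
    fixed_matches = fixed == A
end
```

This is why a check that restricts the pointer-wrapping path to matrices with `stride(A, 2) == size(A, 1)` is needed if the arrays are wrapped with `size(A)` alone.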

kshyatt requested a review from lkdvos May 13, 2026 10:12

lkdvos reviewed May 13, 2026

Use a pass-through for gemm

8b37605

kshyatt force-pushed the ksh/gemm branch from 7531b22 to 8b37605 Compare June 2, 2026 19:35

kshyatt changed the title ~~Use a pass-through for gemm~~ Fix isblasmatrix for GPUArrays Jun 2, 2026

Add tests and remove extraneous ones

e7f3075

Katharine Hyatt added 3 commits June 3, 2026 10:35

Fixes

69d1828

Formatter

0e908bf

Insert blas_mul layer to shim to vendor libraries

d6edc88

Katharine Hyatt and others added 2 commits June 4, 2026 16:31

Formatter

78d6b17

Actually use cuBLAS

45fda97

lkdvos reviewed Jun 5, 2026

Comment thread test/gpu.jl

Comment thread test/gpu.jl

Test some non-BLAS matrices for GPU

7cfc9f2

kshyatt added 2 commits June 6, 2026 01:41

Cover more blas paths

bf8978e

Formatter

6f07ee7

More tests and a fix

353c4cb

lkdvos approved these changes Jun 8, 2026

lkdvos merged commit a3183b2 into main Jun 8, 2026
9 of 13 checks passed

lkdvos deleted the ksh/gemm branch June 8, 2026 14:16

Jutho commented on the review thread Jun 9, 2026