Commit bbebfcd

[rocm-libraries] ROCm/rocm-libraries#7518 (commit 2d260f3)
[hiptensor] Add hipTensor support on gfx1250

## Motivation

Add hipTensor support on gfx1250.

## Technical Details

NA

## Test Plan

NA

## Test Result

NA

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

AIHIPTENS-2
1 parent e9b264f commit bbebfcd

9 files changed

Lines changed: 48 additions & 19 deletions

CHANGELOG.md

Lines changed: 8 additions & 2 deletions
@@ -2,6 +2,12 @@
 
 Full documentation for hipTensor is available at [rocm.docs.amd.com/projects/hiptensor](https://rocm.docs.amd.com/projects/hipTensor/en/latest/index.html).
 
+## Since last release ROCm 7.13
+
+### Added
+
+* Added support for new GPU target gfx1250.
+
 ## Since last release ROCm 7.12
 
 ### Added
@@ -11,13 +17,13 @@ Full documentation for hipTensor is available at [rocm.docs.amd.com/projects/hip
 * gfx11: gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, gfx1153.
 * gfx12: gfx1200, gfx1201.
 * Added unary element-wise operators to contraction, including the new `BilinearUnary` class, dedicated instances, samples, and tests.
-* Added Dockerfiles (prebuilt and full build) and documentation to streamline hipTensor build environment setup.
+* Added Dockerfiles (prebuilt and full build) and documentation to streamline hipTensor build environment setup.
 * Added the `CREATE_TEST_APP_LOCAL_DEPLOY` CMake option to stage required ROCm DLLs on Windows, and updated the Windows build documentation accordingly.
 
 ### Changed
 * Replaced numeric UID-based actor-critic kernel lookup with platform-stable string-based kernel name comparison to enable cross-platform compatibility.
 * Adopted FNV-1a string hashing in place of `std::hash` to ensure plan cache files are portable across platforms.
-* Switched to ROCm-provided CMake install functions (`rocm_export_targets`) for consistency with other ROCm libraries.
+* Switched to ROCm-provided CMake install functions (`rocm_export_targets`) for consistency with other ROCm libraries.
 * Adapted hipTensor to CK namespace changes for `host_tensor` functions.
 * Cleaned up `rtest` script formatting and removed invalid run commands from `rtest.xml`.
 

README.md

Lines changed: 3 additions & 3 deletions
@@ -9,11 +9,11 @@ Welcome! hiptensor is AMD's C++ library for accelerating tensor primitives using
 
 hipTensor currently supports the following AMDGPU architectures:
 
-* CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx942, gfx950 as 'gfx9'.
-* RDNA class GPU featuring matrix core support: gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, gfx1153, gfx1200 and gfx1201.
+* CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx942, gfx950 as 'gfx9'
+* RDNA class GPU featuring matrix core support: gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, gfx1153, gfx1200, gfx1201 and gfx1250.
 
 > [!NOTE]
-> Double precision FP64 datatype support requires gfx90a, gfx942 or gfx950.
+> Double precision FP64 datatype support requires gfx90a, gfx942, gfx950 or gfx1250.
 
 Dependencies:
 

cmake/Functions/hiptensorSupportedArchitectures.cmake

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ set(SUPPORTED_ARCHITECTURES
     gfx11-generic
     gfx1200
     gfx1201
+    gfx1250
     gfx12-generic
 )
 

docs/api-reference/api-reference.rst

Lines changed: 3 additions & 2 deletions
@@ -38,10 +38,11 @@ List of supported RDNA architectures:
 * gfx1153
 * gfx1200
 * gfx1201
+* gfx1250
 
 .. note::
     gfx11 = gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, gfx1153
-    gfx12 = gfx1200, gfx1201
+    gfx12 = gfx1200, gfx1201, gfx1250
 
 .. _hiptensor-supported-data-types:
 
@@ -82,7 +83,7 @@ Data Types **<Ti / To / Tc>** = <Input type / Output Type / Compute Type>, where
 | +------------------------------+ | |
 | | f32 / f32 / bf16 | | 4m4n4k (Rank8) |
 | +------------------------------+---------------------+ |
-| | f16 / f16 / f32 | gfx9 | 5m5n5k (Rank10) |
+| | f16 / f16 / f32 | gfx9 gfx1250 | 5m5n5k (Rank10) |
 | +------------------------------+ | |
 | | bf16 / bf16 / f32 | | 6m6n6k (Rank12) |
 | +------------------------------+ | |

docs/install/installation.rst

Lines changed: 2 additions & 2 deletions
@@ -73,7 +73,7 @@ including the gfx908, gfx90a, gfx942, and gfx950 GPUs (collectively labeled as g
 
 Additionally, hipTensor is supported on AMD RDNA GPUs:
 - gfx11-generic: gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152 and gfx1153.
-- gfx12-generic; gfx1200 and gfx1201.
+- gfx12-generic: gfx1200, gfx1201 and gfx1250.
 
 .. note::
 
@@ -180,7 +180,7 @@ Here are the available options to build the hipTensor library, with or without c
       - **Default value**
     * - ``GPU_TARGETS``
       - Build the code for specific GPU target(s)
-      - ``gfx908``; ``gfx90a``; ``gfx942``; ``gfx950``; ``gfx1100``; ``gfx1101``; ``gfx1102``; ``gfx1103``; ``gfx1150``; ``gfx1151``; ``gfx1152``; ``gfx1153``; ``gfx11-generic``; ``gfx1200``; ``gfx1201``; ``gfx12-generic``
+      - ``gfx908``; ``gfx90a``; ``gfx942``; ``gfx950``; ``gfx1100``; ``gfx1101``; ``gfx1102``; ``gfx1103``; ``gfx1150``; ``gfx1151``; ``gfx1152``; ``gfx1153``; ``gfx11-generic``; ``gfx1200``; ``gfx1201``; ``gfx1250``; ``gfx12-generic``
     * - ``HIPTENSOR_BUILD_TESTS``
       - Build the tests
       - ``ON``

library/include/hiptensor/internal/config.hpp

Lines changed: 6 additions & 0 deletions
@@ -45,6 +45,7 @@ namespace hiptensor
 /// HIPTENSOR_ARCH_GFX1153
 /// HIPTENSOR_ARCH_GFX1200
 /// HIPTENSOR_ARCH_GFX1201
+/// HIPTENSOR_ARCH_GFX1250
 #if defined(__gfx908__)
 #define HIPTENSOR_ARCH_GFX908 __gfx908__
 #elif defined(__gfx90a__)
@@ -73,6 +74,8 @@ namespace hiptensor
 #define HIPTENSOR_ARCH_GFX1200 __gfx1200__
 #elif defined(__gfx1201__)
 #define HIPTENSOR_ARCH_GFX1201 __gfx1201__
+#elif defined(__gfx1250__)
+#define HIPTENSOR_ARCH_GFX1250 __gfx1250__
 #else
 #define HIPTENSOR_ARCH_HOST 1
 #endif
@@ -119,6 +122,9 @@ namespace hiptensor
 #if !defined(HIPTENSOR_ARCH_GFX1201)
 #define HIPTENSOR_ARCH_GFX1201 0
 #endif
+#if !defined(HIPTENSOR_ARCH_GFX1250)
+#define HIPTENSOR_ARCH_GFX1250 0
+#endif
 #if !defined(HIPTENSOR_ARCH_HOST)
 #define HIPTENSOR_ARCH_HOST 0
 #endif
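
A minimal sketch of the macro pattern in config.hpp, reduced to the new target plus the host fallback. Every `HIPTENSOR_ARCH_*` macro ends up defined (as 0 when that target is not being compiled), so code can test it with a plain `#if` or in a constant expression. The helpers `compiledForGfx1250` and `compiledForHost` are hypothetical names used only for illustration; they are not part of hipTensor.

```cpp
// Step 1: map the compiler-provided target macro to a library macro,
// falling back to "host" when no known device target is being compiled.
#if defined(__gfx1250__)
#define HIPTENSOR_ARCH_GFX1250 __gfx1250__
#else
#define HIPTENSOR_ARCH_HOST 1
#endif

// Step 2: give every macro that was not set above a default of 0,
// so each one is always defined and safe to use in #if.
#if !defined(HIPTENSOR_ARCH_GFX1250)
#define HIPTENSOR_ARCH_GFX1250 0
#endif
#if !defined(HIPTENSOR_ARCH_HOST)
#define HIPTENSOR_ARCH_HOST 0
#endif

// Hypothetical helpers (not hipTensor API) that expose the result
// of the macro selection as compile-time constants.
constexpr bool compiledForGfx1250()
{
    return HIPTENSOR_ARCH_GFX1250 != 0;
}

constexpr bool compiledForHost()
{
    return HIPTENSOR_ARCH_HOST != 0;
}
```

With an ordinary host compiler `__gfx1250__` is not defined, so `compiledForHost()` is true and `compiledForGfx1250()` is false; in a device compilation pass for gfx1250 the results would be reversed.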

library/src/hip_device.cpp

Lines changed: 10 additions & 3 deletions
@@ -102,6 +102,10 @@ namespace hiptensor
         {
             mGcnArch = hipGcnArch_t::GFX1201;
         }
+        else if(deviceName.find("gfx1250") != std::string::npos)
+        {
+            mGcnArch = hipGcnArch_t::GFX1250;
+        }
 
         switch(mProps.warpSize)
         {
@@ -135,7 +139,8 @@ namespace hiptensor
     {
         return (mGcnArch == HipDevice::hipGcnArch_t::GFX90A
                 || mGcnArch == HipDevice::hipGcnArch_t::GFX942
-                || mGcnArch == HipDevice::hipGcnArch_t::GFX950);
+                || mGcnArch == HipDevice::hipGcnArch_t::GFX950
+                || mGcnArch == HipDevice::hipGcnArch_t::GFX1250);
     }
 
     bool HipDevice::matrixCoreSupport(hiptensorComputeDescriptor_t typeCompute) const
@@ -152,7 +157,8 @@ namespace hiptensor
             return (mGcnArch == HipDevice::hipGcnArch_t::GFX908
                     || mGcnArch == HipDevice::hipGcnArch_t::GFX90A
                     || mGcnArch == HipDevice::hipGcnArch_t::GFX942
-                    || mGcnArch == HipDevice::hipGcnArch_t::GFX950);
+                    || mGcnArch == HipDevice::hipGcnArch_t::GFX950
+                    || mGcnArch == HipDevice::hipGcnArch_t::GFX1250);
         case HIPTENSOR_COMPUTE_DESC_16F:
         case HIPTENSOR_COMPUTE_DESC_16BF:
             return (mGcnArch == HipDevice::hipGcnArch_t::GFX908
@@ -168,7 +174,8 @@ namespace hiptensor
                     || mGcnArch == HipDevice::hipGcnArch_t::GFX1152
                     || mGcnArch == HipDevice::hipGcnArch_t::GFX1153
                     || mGcnArch == HipDevice::hipGcnArch_t::GFX1200
-                    || mGcnArch == HipDevice::hipGcnArch_t::GFX1201);
+                    || mGcnArch == HipDevice::hipGcnArch_t::GFX1201
+                    || mGcnArch == HipDevice::hipGcnArch_t::GFX1250);
         default:
             return false;
         }
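
The detection logic in hip_device.cpp can be sketched in isolation as follows: a device name is matched by substring, mapped to an architecture tag, and the tag is then checked against a capability list. This is a simplified illustration, not the library's code. The `Arch` enum, `archFromName`, and `supportsF64` are hypothetical names, only three targets are modeled, and it is an assumption that the device name may carry extra suffixes after the `gfx` identifier, which is why a substring search is used instead of an exact comparison.

```cpp
#include <string_view>

// Hypothetical, reduced stand-in for HipDevice::hipGcnArch_t.
enum class Arch
{
    Unsupported,
    GFX950,
    GFX1201,
    GFX1250
};

// Map a device name to an architecture tag by substring match,
// mirroring the else-if chain in the diff (only three targets shown).
constexpr Arch archFromName(std::string_view deviceName)
{
    if(deviceName.find("gfx950") != std::string_view::npos)
    {
        return Arch::GFX950;
    }
    else if(deviceName.find("gfx1201") != std::string_view::npos)
    {
        return Arch::GFX1201;
    }
    else if(deviceName.find("gfx1250") != std::string_view::npos)
    {
        return Arch::GFX1250;
    }
    return Arch::Unsupported;
}

// FP64 capability check for the modeled targets. Per the README change,
// gfx1250 joins the CDNA targets here while gfx1201 does not.
constexpr bool supportsF64(Arch arch)
{
    return arch == Arch::GFX950 || arch == Arch::GFX1250;
}
```

For example, `supportsF64(archFromName("gfx1250"))` evaluates to true, while `supportsF64(archFromName("gfx1201"))` evaluates to false.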

library/src/include/hip_device.hpp

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ namespace hiptensor
             GFX1153 = 0x1153,
             GFX1200 = 0x1200,
             GFX1201 = 0x1201,
+            GFX1250 = 0x1250,
             UNSUPPORTED_ARCH = 0x0,
         };
 

test/utils.hpp

Lines changed: 14 additions & 7 deletions
@@ -135,7 +135,8 @@ inline bool isF16Supported()
         || (deviceName.find("gfx1152") != std::string::npos)
         || (deviceName.find("gfx1153") != std::string::npos)
         || (deviceName.find("gfx1200") != std::string::npos)
-        || (deviceName.find("gfx1201") != std::string::npos);
+        || (deviceName.find("gfx1201") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF32Supported()
@@ -161,7 +162,8 @@ inline bool isF32Supported()
         || (deviceName.find("gfx1152") != std::string::npos)
         || (deviceName.find("gfx1153") != std::string::npos)
         || (deviceName.find("gfx1200") != std::string::npos)
-        || (deviceName.find("gfx1201") != std::string::npos);
+        || (deviceName.find("gfx1201") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF64Supported()
@@ -176,7 +178,8 @@ inline bool isF64Supported()
 
     return (deviceName.find("gfx90a") != std::string::npos)
         || (deviceName.find("gfx942") != std::string::npos)
-        || (deviceName.find("gfx950") != std::string::npos);
+        || (deviceName.find("gfx950") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF16F16MatrixCoreSupported()
@@ -198,7 +201,8 @@ inline bool isF16F16MatrixCoreSupported()
         || (deviceName.find("gfx1152") != std::string::npos)
         || (deviceName.find("gfx1153") != std::string::npos)
         || (deviceName.find("gfx1200") != std::string::npos)
-        || (deviceName.find("gfx1201") != std::string::npos);
+        || (deviceName.find("gfx1201") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF32F16MatrixCoreSupported()
@@ -224,7 +228,8 @@ inline bool isF32F16MatrixCoreSupported()
         || (deviceName.find("gfx1152") != std::string::npos)
         || (deviceName.find("gfx1153") != std::string::npos)
         || (deviceName.find("gfx1200") != std::string::npos)
-        || (deviceName.find("gfx1201") != std::string::npos);
+        || (deviceName.find("gfx1201") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF32F32MatrixCoreSupported()
@@ -240,7 +245,8 @@ inline bool isF32F32MatrixCoreSupported()
     return (deviceName.find("gfx908") != std::string::npos)
         || (deviceName.find("gfx90a") != std::string::npos)
         || (deviceName.find("gfx942") != std::string::npos)
-        || (deviceName.find("gfx950") != std::string::npos);
+        || (deviceName.find("gfx950") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isF16F32MatrixCoreSupported()
@@ -275,7 +281,8 @@ inline bool isF64F32MatrixCoreSupported()
 
     return (deviceName.find("gfx90a") != std::string::npos)
         || (deviceName.find("gfx942") != std::string::npos)
-        || (deviceName.find("gfx950") != std::string::npos);
+        || (deviceName.find("gfx950") != std::string::npos)
+        || (deviceName.find("gfx1250") != std::string::npos);
 }
 
 inline bool isDataType16Bits(hiptensorDataType_t dataType)
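
Each helper in test/utils.hpp repeats the same substring test against a different list of targets. As a hedged illustration of what one of these checks computes, the sketch below expresses the FP64 list from this commit as data and runs a single matching loop over it. The names `matchesAnyTarget`, `kF64Targets`, and `isF64SupportedOn` are hypothetical; this is not how the test utilities are written, and the real helpers take no device-name argument.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Return true if deviceName contains any of the listed target names.
template <std::size_t N>
constexpr bool matchesAnyTarget(std::string_view                        deviceName,
                                const std::array<std::string_view, N>& targets)
{
    for(std::string_view target : targets)
    {
        if(deviceName.find(target) != std::string_view::npos)
        {
            return true;
        }
    }
    return false;
}

// FP64-capable targets after this commit (gfx1250 is the new entry).
inline constexpr std::array<std::string_view, 4> kF64Targets{
    "gfx90a", "gfx942", "gfx950", "gfx1250"};

// Hypothetical counterpart of isF64Supported() that takes the name explicitly.
constexpr bool isF64SupportedOn(std::string_view deviceName)
{
    return matchesAnyTarget(deviceName, kF64Targets);
}
```

Under this sketch, `isF64SupportedOn("gfx1250")` is true and `isF64SupportedOn("gfx1201")` is false, matching the README note that FP64 requires gfx90a, gfx942, gfx950 or gfx1250.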
