- KleidiAI delivers optimized Arm® micro-kernels (packing + matmul) for AI/ML frameworks that already manage scheduling, threading, and memory.
- `kai/` – Common headers plus micro-kernel families grouped by operator (for example `ukernels/matmul`). Each family follows naming rules documented in per-directory READMEs and implements the standard interfaces in `ukernels/matmul/*.h`.
- `test/` – GoogleTest unit suites covering micro-kernel correctness and API guarantees.
- `benchmark/` – CMake targets that time kernels across Arm variants.
- `examples/` – Minimal builds that exercise the library as an external dependency and serve as smoke tests.
- `docs/` – Task-focused guides (packing/matmul intros, indirect matmul walkthroughs, framework integration examples, external patches).
- `docker/` – Containers used in CI.
KleidiAI uses two build systems: CMake and Bazel.
- Native Arm build (default): `cmake -S . -B build && cmake --build build`
  - Test with `build/kleidiai_test`
- Bazel build and test: `bazelisk test //test:kleidiai_test`
The testing pipeline is described in `.gitlab-ci.yml`, which uses the
container described in `docker/Dockerfile`. The container uses FVP, which enables
testing on different hardware configurations.
- Micro-kernel variants are contained in `.S`, `.c` and `.h` files, named `kai_<op>_<variant>_<tech>_<feature>`, and only depend on `kai_common.h`.
- Typical flow: pack LHS → pack RHS → invoke matmul kernel.
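The typical flow can be pictured with a plain-C sketch. Nothing here is KleidiAI API: `demo_pack_lhs`, `demo_pack_rhs`, and `demo_matmul_packed` are hypothetical names, and the "packing" is just a copy and a transpose standing in for whatever layout a real micro-kernel expects.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for an LHS packing step: copies the row-major
// m x k matrix unchanged (a real packing routine would reorder it).
void demo_pack_lhs(size_t m, size_t k, const float* lhs, float* lhs_packed) {
    for (size_t i = 0; i < m * k; ++i) {
        lhs_packed[i] = lhs[i];
    }
}

// Hypothetical stand-in for an RHS packing step: transposes the row-major
// k x n matrix to n x k so each output column reads contiguous memory.
void demo_pack_rhs(size_t k, size_t n, const float* rhs, float* rhs_packed) {
    for (size_t row = 0; row < k; ++row) {
        for (size_t col = 0; col < n; ++col) {
            rhs_packed[col * k + row] = rhs[row * n + col];
        }
    }
}

// Hypothetical matmul over the packed buffers: dst (m x n) = lhs * rhs.
void demo_matmul_packed(size_t m, size_t n, size_t k, const float* lhs_packed,
                        const float* rhs_packed, float* dst) {
    assert(lhs_packed != NULL && rhs_packed != NULL && dst != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += lhs_packed[i * k + p] * rhs_packed[j * k + p];
            }
            dst[i * n + j] = acc;
        }
    }
}
```

A caller would run both packing steps first and then pass the packed buffers, not the original matrices, to the matmul step.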
Use the items here as a checklist when reviewing or making changes.
- Remove commented-out code before committing.
- Add new source files to all relevant build scripts (CMake and Bazel).
- Use `TODO` comments sparingly and explain the follow-up.
- Write clear, correctly formatted documentation for every function:

      /// Brief one-line description.
      ///
      /// Optional longer description if needed
      ///
      /// @param[in/out] param_name Description of parameter
      ///
      /// @return Description of return value, if any.

- Use the term micro-kernel rather than kernel/ukernel/function.
- Guard CPU-feature-dependent code with ACLE feature test macros (noting that `__ARM_FEATURE_SME` is often an exception).
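As an illustration of the last two points, here is a minimal sketch of a documented function whose accelerated path is guarded by a feature test macro. The function name `demo_dot_s8` is hypothetical and not part of the library; the scalar loop is an assumption added so the sketch also builds on targets without the dot-product extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/// Computes the dot product of two int8 vectors.
///
/// @param[in] a   First input vector.
/// @param[in] b   Second input vector.
/// @param[in] len Number of elements in each vector.
///
/// @return The accumulated 32-bit dot product.
int32_t demo_dot_s8(const int8_t* a, const int8_t* b, size_t len) {
    assert(a != NULL && b != NULL);
    int32_t sum = 0;
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    // Only compiled when the compiler advertises the dot-product extension.
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= len; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vaddvq_s32(acc);
#endif
    // Portable tail, and the full fallback when the extension is unavailable.
    for (; i < len; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Because the intrinsics sit entirely inside the `#if` block, the file still compiles when the feature macro is not defined.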
- Describe the change; one-line messages are reserved for trivial edits.
- Include `Signed-off-by: <Contact>` in every commit.
- Document all micro-kernel additions, optimizations, and bug fixes in `CHANGELOG.md`.
- Micro-kernel bundles (`.c`, `.h`, `.S`) live under `kai/`; pack kernels go in `pack`, matmul kernels in operator-specific directories with interface headers like `kai_imatmul_clamp_f16_f16p_f16p_interface.h`.
- Follow the established naming convention or explicitly extend it.
- Provide matching interface headers for matmul micro-kernels.
- Add unit tests for every new kernel; an end-to-end pack + matmul test is acceptable if per-kernel tests are impractical.
- Register new matmul-style kernels with the benchmark suite.
- Conform to AAPCS64; save `d8`–`d15` and `x19`–`x28` if modified.
- Emit exactly one `ret` and avoid calling other functions (`bl`), except approved SME support routines with proper `LR`/`FP` preservation and `__ARM_FEATURE_SME` guards.
- Ensure the implementation matches the advertised behavior (e.g., clamp functionality).
- Preserve public symbol names unless the change and changelog clearly state otherwise; parameter renames for consistency are fine.
- Maintain kernel functionality unless explicitly changing it.
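One way to check advertised clamp behavior during review is to compare the output against a trivial reference. The sketch below assumes f32 output; both `demo_clamp_f32` and `demo_output_is_clamped` are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical reference clamp: limits value to the range [lo, hi].
float demo_clamp_f32(float value, float lo, float hi) {
    assert(lo <= hi);
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}

// Returns true when every element of dst already lies inside [lo, hi],
// which is what a micro-kernel advertising clamp support should guarantee.
bool demo_output_is_clamped(const float* dst, size_t len, float lo, float hi) {
    for (size_t i = 0; i < len; ++i) {
        if (demo_clamp_f32(dst[i], lo, hi) != dst[i]) {
            return false;
        }
    }
    return true;
}
```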
- Keep file listings sorted.
- File lists in `BUILD.bazel` are named `<TECH>[_<FEAT>]*_KERNELS[_ASM]`.
- File lists in `CMakeLists.txt` are named `KLEIDIAI_FILES_<TECH>[_<FEAT>]*[_ASM]`.
- Kernels should be added to the list that matches the required technology and features.
  - Kernels that use inline assembly should be added to a list without the `_ASM` suffix. However, this limits which compilers are supported, so avoid it if possible.
  - Kernels that don't use inline assembly should be added to `KLEIDIAI_FILES_<TYPE>_ASM`. This is typically what we want to do.
- Never skip tests unless unavoidable.
- Cache expensive reference data generation (notably for matmul) using utilities like `common/cache.hpp`.
- Test code must use `KAI_ASSUME_ALWAYS(expr)` and `KAI_ASSERT_ALWAYS(expr)` so that the checks are not optimized away in release builds.
- Tests should have deterministic behavior; the seed should only be randomized if explicitly requested by a runtime parameter.
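The seeding rule could look roughly like the following sketch. The `--seed=<n>` and `--seed=random` flags, the default value, and the function names are all assumptions made for illustration; they are not options of the real test binary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// Fixed default so that repeated runs generate identical test data.
#define DEMO_DEFAULT_SEED 42u

// Picks the seed for a test run. The flags parsed here are hypothetical.
uint32_t demo_pick_seed(int argc, char** argv) {
    uint32_t seed = DEMO_DEFAULT_SEED;
    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "--seed=random") == 0) {
            seed = (uint32_t)time(NULL);  // randomized only on explicit request
        } else if (strncmp(argv[i], "--seed=", 7) == 0) {
            seed = (uint32_t)strtoul(argv[i] + 7, NULL, 10);
        }
    }
    // xorshift32 needs a non-zero state, so fall back to the default.
    return seed != 0 ? seed : DEMO_DEFAULT_SEED;
}

// Small xorshift32 generator: the same seed always yields the same sequence.
uint32_t demo_next_u32(uint32_t* state) {
    assert(state != NULL && *state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

Logging the chosen seed at the start of a run would make a randomized failure reproducible by passing the same value back in.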
- Confirm `KAI_UNUSED(symbol)` values are actually unused.
- Enforce required parameter values via `KAI_ASSUME(param == value)`.
- Use `KAI_ASSERT(<assumption>)` for other invariants.
- At the top of `.c` files, assert required architecture extensions, for example:

      #if (!defined(__aarch64__) || !defined(__ARM_FEATURE_DOTPROD)) && \
          !defined(_M_ARM64)
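A rough sketch of how the three macros might appear together in one function. The `#ifndef` definitions are local stand-ins so that the example compiles without the library; the real macros come from the library's headers and may be defined differently. `demo_row_offset` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Local stand-ins only; not the library's definitions.
#ifndef KAI_ASSUME
#define KAI_ASSUME(cond) assert(cond)
#endif
#ifndef KAI_ASSERT
#define KAI_ASSERT(cond) assert(cond)
#endif
#ifndef KAI_UNUSED
#define KAI_UNUSED(x) ((void)(x))
#endif

// Hypothetical helper: byte offset of row m_idx in a buffer of f32 rows.
// This sketch only supports a block height of 1, so mr is checked with
// KAI_ASSUME, and sr is marked unused because nothing below reads it.
size_t demo_row_offset(size_t m_idx, size_t k, size_t mr, size_t sr) {
    KAI_ASSUME(mr == 1);  // required parameter value
    KAI_ASSERT(k > 0);    // other invariant
    KAI_UNUSED(sr);       // genuinely unused in this function
    return m_idx * k * sizeof(float);
}
```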