InhabitancyCocoon/triton_learning
Triton self learning

environment

triton 3.4.0
torch 2.8.0+cu128
cuda 12.8
RTX 5090 (compute capability 12.0)
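To confirm a machine matches the environment above, a quick check can be run (this assumes torch is installed; uncomment the triton line if triton is available):

```python
import torch

print(torch.__version__)    # expect 2.8.0+cu128
print(torch.version.cuda)   # expect 12.8
# import triton; print(triton.__version__)  # expect 3.4.0
```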

Note

  • There are some known accuracy problems with the matmul and linear kernels.
  • The RTX 5090 doesn't support certain features, such as blockwise scaled matmul.
  • The RTX 5090 may run into OOM errors; try reducing the problem size.
  • Check out the Triton tutorials at the tag matching your installed version; don't use the main branch.
  • Some performance reports look suspicious; verify correctness before trusting them.
  • Mind the argument order of torch.testing.assert_close(): the first parameter is the actual (computed) value, the second is the expected (reference) value.
  • Be careful about the dtypes used for the input, computation, accumulation and output.
  • Be careful about alignment and edge cases.
  • Keep reference links in the code; they will help you in the future.
  • Use ls ~/.triton/cache to inspect the kernel compilation cache.
  • Triton kernels are most useful for highly fused and customized operations.
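A small sketch of the assert_close and dtype notes above (the tensors and tolerances here are illustrative, not taken from this repo's kernels):

```python
import torch

# Reference result computed in float32 (stand-in for e.g. torch.matmul).
expected = torch.full((4, 4), 2.0, dtype=torch.float32)

# Pretend kernel output: same values plus a small floating-point error,
# as you might get from a lower-precision accumulation path.
actual = expected + 1e-6

# Argument order is (actual, expected); loosen rtol/atol for low-precision
# accumulation instead of letting overly tight tolerances fail spuriously.
torch.testing.assert_close(actual, expected, rtol=1e-4, atol=1e-4)
```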

set up

  • linux bash shell settings (\W displays only the current directory)
export PS1="\u@\h:\W> "
export PS1="\u:\W> "

  • query the device capability
import torch
print(torch.cuda.get_device_capability())  # (12, 0) on RTX 5090
  • on AutoDL instances, run source /etc/network_turbo to enable GitHub access (see the AutoDL docs)

  • debug

export TRITON_INTERPRET=1   # run kernels in interpreter mode (CPU, debuggable)
unset TRITON_INTERPRET      # back to compiled GPU execution
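The same toggle can be done from Python, with one caveat: the flag must be set before triton is first imported, or it has no effect. A minimal sketch (the triton import is left commented out so the snippet stands alone):

```python
import os

# TRITON_INTERPRET must be set before the first `import triton`;
# interpreter mode runs kernels on the CPU, so print() and pdb
# work inside the kernel body.
os.environ["TRITON_INTERPRET"] = "1"

# import triton  # import only after the flag is set
```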

link

triton_tutorial
