patrick-toulme/justabyte

Just a Byte

Hi! I'm Patrick Toulme, a compiler and performance engineer. I write Just a Byte — a blog about AI compilers, silicon, and systems.

This repo contains companion code, IR dumps, and reproduction scripts for the blog posts.

Website LinkedIn X

Posts

Post 1: From JAX to VLIW: Tracing a Computation Through the TPU Compiler Stack
Traces 8 lines of JAX through the full TPU compiler pipeline, from HLO through optimization passes to 250 VLIW bundles across five fused kernels.

Post 2: When XLA Isn't Enough: From Pallas to VLIW with Splash Attention on TPU
Explores the limits of XLA's automatic optimization for attention and how Pallas custom kernels achieve 6x fewer VLIW bundles and 37x less HBM traffic.

Post 3: CuTile on Blackwell: NVIDIA's Compiler Moat Is Already Built
Traces a Mixture of Experts kernel through NVIDIA's CuTile stack: 86 lines of Python compiled into 1,900 lines of optimized PTX with tcgen05 instructions.

Post 4: Frontier Pretraining Infrastructure Is Already Open Source: GPT-OSS on TPU with MaxText
Shows how MaxText and XLA compress 11,207 HLO instructions into 887 fused kernels, and argues that frontier training infrastructure is already available in open source.

About

Code snippets and reproductions from JustAByte
