Triton Kernel Lifecycle
You'll be running into this term a lot, so let's pin it down.
What is a Kernel? 💡
A kernel is a small, highly-optimized program that runs on the GPU. It defines the operations that every thread in a grid will execute in parallel. Kernels are the building blocks of GPU computation.
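To make this concrete, here is a minimal sketch of what a Triton kernel looks like: a vector-add kernel. The name add_kernel and the pointer/size parameter names are illustrative choices for this example, not part of any fixed API.

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the final, partially filled block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```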
Lifecycle
Here’s what happens when you write and run a Triton kernel:
- Definition: You define your kernel in Python using @triton.jit (like the sketch above).
- JIT Compilation: Triton compiles your Python kernel into GPU-specific code (PTX on NVIDIA/CUDA hardware, which the driver then lowers to native machine code). This happens on the first launch, and the result is cached for later calls.
- Kernel Launch: The compiled kernel is launched on the GPU with a grid of thread blocks that splits up the workload (a full launch sketch follows at the end of this section).
- Execution on GPU: The GPU's Streaming Multiprocessors (SMs) execute the kernel, with threads running in parallel using SIMT (Single Instruction, Multiple Threads).
- Results: Computation results are written back to GPU global memory, where the CPU can read them back.
In short: Python → PTX → native GPU code → blazing-fast execution. 🚀
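Putting the lifecycle together, here is a sketch of launching the add_kernel defined above (assuming a CUDA device and PyTorch for the tensors): the first call triggers JIT compilation, the grid decides how many program instances run, and the results land in out in GPU global memory, readable from the CPU.

```python
import torch
import triton

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)            # results are written to GPU global memory
    n_elements = out.numel()
    # Kernel Launch: one program instance per BLOCK_SIZE elements.
    grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device='cuda')
y = torch.rand(4096, device='cuda')
z = add(x, y)                            # first call: JIT compile + launch; later calls reuse the cache
print(torch.allclose(z, x + y))          # Results: read back and checked on the CPU
```

Note that the grid is a callable: Triton passes the kernel's meta-parameters (here BLOCK_SIZE) into it, so the launch shape adapts automatically if you change the block size.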