Triton Kernel Lifecycle

You'll run into the term "kernel" constantly from here on, so let's define it first.

What is a Kernel? 💡
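A kernel is a small, highly optimized function that runs on the GPU. It defines the operations that many threads execute in parallel, each on a different piece of the data. In Triton, you write the kernel from the point of view of a single program instance that processes a whole block of elements, and the launch grid decides how many instances run. Kernels are the building blocks of GPU computation.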
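To make the "same code, many instances" idea concrete, here is a plain-Python sketch that needs no GPU. The names scale_kernel and launch are hypothetical and are not part of Triton; a real GPU runs the instances in parallel, whereas this loop runs them one after another.

```python
def scale_kernel(pid, data, out, factor, block_size):
    """Body executed by one program instance; pid selects its slice."""
    start = pid * block_size
    # The last block may be partial, so stop at the end of the data.
    end = min(start + block_size, len(data))
    for i in range(start, end):
        out[i] = data[i] * factor


def launch(kernel, grid_size, *args):
    """Emulate a launch by calling the kernel once per program ID."""
    for pid in range(grid_size):
        kernel(pid, *args)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)
block_size = 2
# Ceiling division: enough instances to cover every element.
grid_size = (len(data) + block_size - 1) // block_size

launch(scale_kernel, grid_size, data, out, 2.0, block_size)
print(out)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Every instance runs identical code; only pid differs, and that is what steers each one to its own slice of the data.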


Lifecycle

Here’s what happens when you write and run a Triton kernel:

  1. Definition:

    • You write your kernel as a Python function and decorate it with @triton.jit. Nothing is compiled yet at this point.
  2. JIT Compilation:

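    • The first time the kernel is launched with a given set of argument types and compile-time constants, Triton compiles it. The Python source is lowered to Triton IR, then LLVM IR, then PTX (on NVIDIA GPUs), which is finally assembled into GPU machine code. The result is cached, so later launches with the same signature skip this step.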
  3. Kernel Launch:

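    • The compiled kernel is launched on the GPU with the kernel[grid](...) syntax. The grid sets how many program instances run, and each instance (executed by one thread block) handles its own slice of the workload.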
  4. Execution on GPU:

    • The GPU’s Streaming Multiprocessors (SMs) execute the kernel, with threads running in parallel using SIMT (Single Instruction, Multiple Threads).
  5. Results:

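    • Results are written to GPU global memory, typically into an output tensor you passed to the kernel. They stay on the GPU until you explicitly copy them to the CPU (for example, with .cpu() in PyTorch).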
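The five steps above can be sketched with a vector addition. This is a minimal illustration, not a tuned implementation: it assumes PyTorch and Triton are installed and that a CUDA GPU is available, and the names add_kernel, add, and the block size of 1024 are choices made for this example.

```python
import torch
import triton
import triton.language as tl


# Step 1 (Definition): decorating the function does not compile it yet.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)  # which program instance is this?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the partial last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Step 5 (Results): write into GPU global memory.
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x, y):
    out = torch.empty_like(x)
    n_elements = out.numel()
    # One program instance per BLOCK_SIZE elements.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    # Steps 2-4: the first call triggers JIT compilation, then the
    # kernel is launched over the grid and executed on the GPU.
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if torch.cuda.is_available():
    x = torch.rand(10_000, device="cuda")
    y = torch.rand(10_000, device="cuda")
    out = add(x, y)  # result lives in GPU memory
    out_cpu = out.cpu()  # explicit copy back to the CPU
    print(torch.allclose(out, x + y))
```

Calling add a second time with tensors of the same dtype reuses the cached binary, so only the first call pays the compilation cost.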

In short: Python → Triton IR → LLVM IR → PTX → GPU machine code → blazing-fast GPU execution. 🚀