GPU Software Terminology
Terms you'll encounter when writing GPU programs.
Grids
A grid is the collection of all thread blocks launched for a single kernel; it is the top level of the GPU's thread hierarchy and the coarsest unit of parallelism in GPU programming.
- Grids allow you to scale computation to handle large datasets by mapping multiple blocks to the GPU.
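As a sketch of how a grid is sized (the kernel name `myKernel` and the block size of 256 are assumptions, not anything this document specifies), a 1-D launch typically uses a ceiling division so the grid covers every element even when the element count is not a multiple of the block size:

```cuda
// Hypothetical kernel, defined elsewhere.
__global__ void myKernel(float *data, int n);

void launch(float *d_data, int n) {
    int threadsPerBlock = 256;
    // Ceiling division: enough blocks in the grid to cover all n elements,
    // even when n is not a multiple of the block size.
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, n);
}
```

The kernel itself then guards against the out-of-range threads in the last, partially used block.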
Blocks
A block is a group of threads that execute together and share resources like shared memory. Blocks are the second level of abstraction, below grids.
- Each block is scheduled onto a single Streaming Multiprocessor (SM) and stays there for its lifetime; one SM may run several blocks concurrently.
- Threads in a block can communicate using shared memory.
Threads
The smallest unit of execution in a GPU program. Each thread executes the same kernel code independently but works on a different piece of data.
- A thread has access to its own registers.
- Threads in a block can collaborate via shared memory.
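The points above can be sketched in a kernel: every thread runs the same body, but the built-in variables `blockIdx`, `blockDim`, and `threadIdx` give each thread a unique global index, and local variables like `i` live in that thread's private registers.

```cuda
// Sketch: same code, different data. Each thread squares exactly one element.
__global__ void square(const float *in, float *out, int n) {
    // `i` is held in a register private to this thread
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard against threads past the end
        out[i] = in[i] * in[i];
}
```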
Warps
A group of 32 threads that execute instructions in lockstep (32 is the warp size on current NVIDIA GPUs). Warps are the unit that a Streaming Multiprocessor (SM) schedules and executes.
- Within a block, threads are divided into warps.
- All threads in a warp execute the same instruction at a time, so divergence (threads of one warp taking different execution paths) reduces efficiency: the paths are executed one after another with the non-participating threads masked off.
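A sketch of the divergence point above (both kernels are illustrative, not from this document): in `divergent`, even and odd lanes of each warp take different branches, so the warp runs both branches serially; in `uniform`, the condition is constant across a warp, so every warp takes a single path.

```cuda
// Divergent: lanes within one warp disagree on the branch.
__global__ void divergent(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 2 == 0) x[i] *= 2.0f;   // even lanes take this path...
    else            x[i] += 1.0f;   // ...then odd lanes take this one
}

// Uniform: i / 32 is the same for all 32 lanes of a warp,
// so the whole warp agrees and no serialization occurs.
__global__ void uniform(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((i / 32) % 2 == 0) x[i] *= 2.0f;
    else                   x[i] += 1.0f;
}
```

Both kernels compute a valid result; only the divergent one pays the serialization cost.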
Single Instruction, Multiple Threads (SIMT)
The execution model of GPUs, where one instruction is issued to a group of threads (a warp) at a time, each thread applying it to its own data.
- SIMT is how GPUs achieve massive parallelism efficiently.
Thread Hierarchy Recap
- Grid → Blocks: The overall workload is divided into blocks.
- Block → Threads: Each block contains many threads.
- Threads → Warps: Threads within a block are executed in groups of 32 (warps).
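The whole hierarchy can be seen in one small program (a standard vector-add sketch, not code from this document; the block size of 256 is an assumed choice): the host launches a grid of blocks, each block contains 256 threads, and the hardware transparently groups those threads into warps of 32.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread of the grid adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;                    // one million elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);             // unified memory, visible to host and device
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;                                 // threads per block
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // blocks per grid
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();                  // wait for the grid to finish

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```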