GPU Software Terminology

Terms you'll encounter when writing GPU programs.

Grids

A grid is a collection of thread blocks used to organize the GPU workload. It represents the highest level of parallelism in GPU programming.

  • Grids let you scale a computation to large datasets: the workload is split into many blocks, which the GPU schedules across its hardware.
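
A minimal CUDA sketch of how a grid is sized to cover a large array. The kernel name `scale` and the launch parameters are illustrative, not from any particular library:

```cuda
#include <cuda_runtime.h>

// Each thread scales one element; the grid as a whole covers all n elements.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // guard: the last block may overhang n
}

int main(void) {
    int n = 1 << 20;                 // 1M elements
    int threadsPerBlock = 256;
    // Round up so the grid of blocks covers every element.
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

The rounding-up division is the standard idiom: with 1M elements and 256 threads per block, the grid holds 4096 blocks.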

Blocks

A block is a group of threads that execute together and share resources like shared memory. Blocks are the second level of abstraction, below grids.

  • Each block runs on a single Streaming Multiprocessor (SM).
  • Threads in a block can communicate using shared memory.
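
A sketch of block-level cooperation through shared memory: each block computes a partial sum of its slice of the input. The kernel name and the fixed block size of 256 are assumptions for the example:

```cuda
// One partial sum per block, built cooperatively in shared memory.
// Assumes the kernel is launched with exactly 256 threads per block.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float tile[256];              // visible to all threads in this block
    int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                          // wait until every thread has written

    // Tree reduction: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];  // thread 0 publishes the block's sum
}
```

Note that `__syncthreads()` only synchronizes threads within one block; there is no equivalent barrier across blocks in a single kernel launch.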

Threads

A thread is the smallest unit of execution in a GPU program. Each thread executes the same kernel code independently but works on a different piece of data.

  • A thread has access to its own registers.
  • Threads in a block can collaborate via shared memory.
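
A small sketch of the per-thread view: each thread derives a unique global index from its block and thread coordinates and touches exactly one element. The kernel name is illustrative:

```cuda
__global__ void addOne(int *data, int n) {
    // Unique global index for this thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i];   // local variable: typically lives in this thread's registers
        data[i] = v + 1;   // each thread updates its own element only
    }
}
```

Local scalar variables like `v` are private to the thread and are normally held in registers, which is why register pressure per thread affects how many threads an SM can keep resident.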

Warps

A group of 32 threads that execute instructions in lockstep. Warps are the unit of scheduling and execution on a Streaming Multiprocessor (SM).

  • Within a block, threads are divided into warps.
  • All threads in a warp execute the same instruction, so divergence (e.g., different execution paths) can reduce efficiency.
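
A sketch illustrating warp divergence. The branch below splits each warp in half, so the hardware executes the two paths one after the other; a branch that is uniform across the warp does not pay this cost. Kernel and variable names are illustrative:

```cuda
__global__ void divergent(int *data) {
    int lane = threadIdx.x % 32;   // this thread's position within its warp

    // Divergent: lanes 0-15 take one path, lanes 16-31 the other.
    // The warp serializes the two paths, roughly halving throughput here.
    if (lane < 16) data[threadIdx.x] = 1;
    else           data[threadIdx.x] = 2;

    // By contrast, a condition that is the same for all 32 lanes,
    // e.g. `if (blockIdx.x == 0) { ... }`, does not diverge.
}
```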

Single Instruction, Multiple Threads (SIMT)

The execution model of GPUs, where a single instruction is applied to multiple threads in parallel.

  • SIMT is how GPUs achieve massive parallelism efficiently.
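
Vector addition is the canonical SIMT example: the add below is one instruction, but at any moment a whole warp of threads executes it in parallel, each on its own pair of elements. The kernel name is illustrative:

```cuda
__global__ void vecAdd(const float *x, const float *y, float *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];   // one instruction, applied across many threads
}
```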

Thread Hierarchy Recap

  1. Grid → Blocks: The overall workload is divided into blocks.
  2. Block → Threads: Each block contains many threads.
  3. Threads → Warps: Within a block, threads are grouped into warps of 32 for execution.
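
The whole hierarchy can be made explicit with a 2-D launch. This is a sketch; `fill` and `clearImage` are illustrative names:

```cuda
// Each thread clears one pixel of a w-by-h image.
__global__ void fill(float *img, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) img[y * w + x] = 0.0f;
}

void clearImage(float *d_img, int w, int h) {
    // Grid → blocks → threads: each 16x16 block holds 256 threads,
    // which the SM executes as 8 warps of 32.
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    fill<<<grid, block>>>(d_img, w, h);
}
```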