GPU Software Terminology

Terms you'll encounter when writing GPU programs.

Grids

A grid is a collection of thread blocks used to organize the GPU workload. It represents the highest level of parallelism in GPU programming.

  • Grids let you scale a computation to large datasets: the workload is split into many blocks, which the GPU schedules across its hardware.
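
A minimal CUDA sketch of how a grid is sized to cover a large array. The kernel name `scale` and the launch parameters are illustrative, not from any particular library:

```cuda
#include <cuda_runtime.h>

// Each thread scales one element; the grid as a whole covers all n elements.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // guard: the last block may overhang n
}

int main(void) {
    int n = 1 << 20;                 // 1M elements
    int threadsPerBlock = 256;
    // Round up so the grid of blocks covers every element.
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

The rounding-up division is the standard idiom: with 1M elements and 256 threads per block, the grid holds 4096 blocks.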

Blocks

A block is a group of threads that execute together and share resources like shared memory. Blocks are the second level of abstraction, below grids.

  • Each block runs on a single Streaming Multiprocessor (SM).
  • Threads in a block can communicate using shared memory.
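
A sketch of block-level cooperation through shared memory: each block computes a partial sum of its slice of the input. The kernel name and the fixed block size of 256 are assumptions for the example:

```cuda
// One partial sum per block, built cooperatively in shared memory.
// Assumes the kernel is launched with exactly 256 threads per block.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float tile[256];              // visible to all threads in this block
    int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                          // wait until every thread has written

    // Tree reduction: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];  // thread 0 publishes the block's sum
}
```

Note that `__syncthreads()` only synchronizes threads within one block; there is no equivalent barrier across blocks in a single kernel launch.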

Threads

A thread is the smallest unit of execution in a GPU program. Each thread executes the same kernel code independently but works on a different piece of data.

  • A thread has access to its own registers.
  • Threads in a block can collaborate via shared memory.
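
A small sketch of the per-thread view: each thread derives a unique global index from its block and thread coordinates and touches exactly one element. The kernel name is illustrative:

```cuda
__global__ void addOne(int *data, int n) {
    // Unique global index for this thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i];   // local variable: typically lives in this thread's registers
        data[i] = v + 1;   // each thread updates its own element only
    }
}
```

Local scalar variables like `v` are private to the thread and are normally held in registers, which is why register pressure per thread affects how many threads an SM can keep resident.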

Warps

A group of 32 threads that execute instructions in lockstep. Warps are the unit of scheduling and execution on a Streaming Multiprocessor (SM).

  • Within a block, threads are divided into warps.
  • All threads in a warp execute the same instruction, so divergence (e.g., different execution paths) can reduce efficiency.
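
A sketch illustrating warp divergence. The branch below splits each warp in half, so the hardware executes the two paths one after the other; a branch that is uniform across the warp does not pay this cost. Kernel and variable names are illustrative:

```cuda
__global__ void divergent(int *data) {
    int lane = threadIdx.x % 32;   // this thread's position within its warp

    // Divergent: lanes 0-15 take one path, lanes 16-31 the other.
    // The warp serializes the two paths, roughly halving throughput here.
    if (lane < 16) data[threadIdx.x] = 1;
    else           data[threadIdx.x] = 2;

    // By contrast, a condition that is the same for all 32 lanes,
    // e.g. `if (blockIdx.x == 0) { ... }`, does not diverge.
}
```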

Single Instruction, Multiple Threads (SIMT)

The execution model of GPUs, where a single instruction is applied to multiple threads in parallel.

  • SIMT is how GPUs achieve massive parallelism efficiently.
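
Vector addition is the canonical SIMT example: the add below is one instruction, but at any moment a whole warp of threads executes it in parallel, each on its own pair of elements. The kernel name is illustrative:

```cuda
__global__ void vecAdd(const float *x, const float *y, float *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];   // one instruction, applied across many threads
}
```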

Thread Hierarchy Recap

  1. Grid → Blocks: The overall workload is divided into blocks.
  2. Block → Threads: Each block contains many threads.
  3. Threads → Warps: Within a block, threads are grouped into warps of 32 for execution.
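
The whole hierarchy can be made explicit with a 2-D launch. This is a sketch; `fill` and `clearImage` are illustrative names:

```cuda
// Each thread clears one pixel of a w-by-h image.
__global__ void fill(float *img, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) img[y * w + x] = 0.0f;
}

void clearImage(float *d_img, int w, int h) {
    // Grid → blocks → threads: each 16x16 block holds 256 threads,
    // which the SM executes as 8 warps of 32.
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    fill<<<grid, block>>>(d_img, w, h);
}
```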