GPU Hardware Terminology

The most visited page here, simply because you'll keep coming back to refresh your memory on these terms.

Streaming Multiprocessors (SMs)

The core computational units of a GPU. Each SM handles multiple warps (groups of 32 threads) and executes them in parallel.
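
A minimal sketch of how a block's threads decompose into warps; the kernel name and arrays are hypothetical, and warpSize is CUDA's built-in constant (32 on current hardware):

```cuda
__global__ void warpDemo(const float *data, float *out, int n)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int warp = threadIdx.x / warpSize;   // which warp of the block this thread belongs to
    int lane = threadIdx.x % warpSize;   // position within that warp (0..31)

    if (tid < n)
        out[tid] = data[tid] * 2.0f;     // the 32 lanes of a warp issue this together
    (void)warp; (void)lane;              // computed only to illustrate the decomposition
}
```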


CUDA Cores

The individual processing units within an SM. CUDA cores perform arithmetic and logical operations in parallel, enabling massive computational throughput.
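
The canonical illustration is one thread per array element, so thousands of additions run at once across the CUDA cores; a minimal sketch with hypothetical names:

```cuda
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n)
        c[i] = a[i] + b[i];  // each addition executes on a CUDA core
}

// launch enough 256-thread blocks to cover n elements:
// vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
```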


SFU (Special Function Unit)

A hardware unit within an SM that efficiently evaluates transcendental operations such as trigonometric, exponential, and logarithmic functions.
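
CUDA exposes the SFU paths through fast-math intrinsics such as __sinf and __expf (also selected automatically by nvcc's -use_fast_math); they trade some precision for a short hardware path. A sketch:

```cuda
__global__ void sfuDemo(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __expf(__sinf(x[i]));  // fast SFU paths, lower precision than sinf/expf
}
```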


LSU (Load/Store Unit)

Handles memory traffic between registers and the memory hierarchy: loading data from global, shared, or local memory into registers, and storing results back.
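
A sketch with the individual operations the LSU services marked in comments (names are hypothetical; assumes 256-thread blocks):

```cuda
__global__ void lsuDemo(const float *in, float *out)
{
    __shared__ float s[256];
    float v = in[threadIdx.x];                // LSU: global load into a register
    s[threadIdx.x] = v;                       // LSU: shared memory store
    __syncthreads();
    out[threadIdx.x] = s[255 - threadIdx.x];  // LSU: shared load, then global store
}
```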


Warp Scheduler

A hardware scheduler within an SM that issues instructions from warps that are ready to run. Each cycle it picks among the resident warps, skipping those stalled on memory or other dependencies, which hides latency and keeps the execution units busy.
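
There is no direct API for the scheduler itself, but occupancy — how many warps are resident per SM — determines how many candidates each scheduler can pick from. A sketch using the real runtime call cudaOccupancyMaxActiveBlocksPerMultiprocessor (kernel name is hypothetical):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleKernel(float *x, float s) { x[threadIdx.x] *= s; }

int main()
{
    // More resident blocks (hence warps) per SM means the warp schedulers
    // have more ready warps to issue from while others stall on memory.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, scaleKernel, 256, 0);
    printf("resident blocks per SM at 256 threads/block: %d\n", blocksPerSM);
    return 0;
}
```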


Registers

The fastest memory available, allocated per thread. Registers store temporary data used during computation.
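
Scalar locals in a kernel normally live in registers; you can check a kernel's register footprint with nvcc --ptxas-options=-v at compile time, or at run time via cudaFuncGetAttributes. A sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // i, a, and temporaries live in registers
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, saxpy);
    printf("registers per thread: %d\n", attr.numRegs);
    return 0;
}
```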


Shared Memory

Fast, on-chip memory shared among threads within the same block. Ideal for data reuse and reducing global memory access latency.
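
The classic pattern: each thread stages one element into a shared tile, the block synchronizes, then everyone works out of the fast on-chip copy. A minimal block-sum sketch, assuming a power-of-two block size of 256:

```cuda
__global__ void blockSum(const float *in, float *blockSums, int n)
{
    __shared__ float tile[256];                   // visible to all threads in this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // one global load per thread
    __syncthreads();                              // wait until the tile is complete

    // tree reduction entirely in shared memory: no further global traffic
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];          // one global store per block
}
```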


L1 Cache

A small, fast cache within an SM that stores frequently accessed data to reduce latency when accessing global memory.
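
On many architectures L1 and shared memory share the same on-chip storage, and the runtime accepts a per-kernel hint about the split (a hint only; newer architectures may manage it automatically). A sketch using the real call cudaFuncSetCacheConfig, with a hypothetical kernel:

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float *x) { x[threadIdx.x] *= 2.0f; }

int main()
{
    // ask for a larger L1 (and smaller shared memory) for this kernel
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
    return 0;
}
```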


GPU RAM

The main memory of the GPU (often called global memory). It provides high capacity but has higher latency compared to on-chip memory like shared memory and caches.
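
cudaMalloc allocates in global memory and cudaMemcpy moves data across the host–device boundary; everything a kernel reads through a plain device pointer lives here. A minimal sketch (error handling omitted):

```cuda
#include <cuda_runtime.h>

int main()
{
    const int n = 1 << 20;
    float *h_a = new float[n];              // host (CPU) memory
    float *d_a = nullptr;                   // will point into global memory (GPU RAM)

    cudaMalloc(&d_a, n * sizeof(float));    // allocate in global memory
    cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
    // ... launch kernels that read and write d_a ...
    cudaMemcpy(h_a, d_a, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    delete[] h_a;
    return 0;
}
```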


Memory Mapping

Registers → Thread
Shared Memory → Thread Block
L1 Cache → Streaming Multiprocessor (SM)
Global Memory → GPU
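
The same mapping annotated in one kernel (hypothetical names; assumes a 256-thread block):

```cuda
__global__ void scopes(const float *in, float *out, float scale)
{
    float x = in[threadIdx.x] * scale;   // register: private to this one thread

    __shared__ float tile[256];          // shared memory: visible to this thread block
    tile[threadIdx.x] = x;
    __syncthreads();

    // global memory: visible to every block on the GPU; repeated accesses
    // may be served from the SM's L1 cache
    out[threadIdx.x] = tile[255 - threadIdx.x];
}
```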