Definition

The use of a Graphics Processing Unit (GPU) to speed up computations, particularly highly parallelizable workloads such as the matrix multiplications at the core of machine learning and LLM inference. Because a GPU executes thousands of such operations concurrently, this typically yields much faster processing than running on the CPU alone.

Why it matters (in Poovi’s context)

Crucial for achieving usable performance with LLMs; the video investigates which hardware configurations successfully leverage GPU acceleration.

Key properties or components

  • VRAM requirements
  • CUDA (Nvidia)
  • ROCm (AMD)
  • Driver compatibility
  • Model size limitations
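The interplay between VRAM requirements and model size limitations above can be sketched with a back-of-the-envelope calculation: weight memory is roughly parameter count times bytes per parameter, plus overhead for the KV cache, activations, and the framework itself. The function and the 1.2× overhead factor below are illustrative assumptions, not measured values.

```python
def estimate_vram_gib(num_params_billions: float,
                      bytes_per_param: float,
                      overhead_factor: float = 1.2) -> float:
    """Rough GiB of VRAM for model weights plus a fudge factor.

    A sketch only: real usage also depends on context length,
    batch size, and the inference framework.
    """
    weight_bytes = num_params_billions * 1e9 * bytes_per_param
    return weight_bytes * overhead_factor / (1024 ** 3)

# A 7B-parameter model at 4-bit quantization (0.5 bytes per parameter)
# fits comfortably in an 8 GiB card under these assumptions:
print(round(estimate_vram_gib(7, 0.5), 1))   # → 3.9

# The same model in FP16 (2 bytes per parameter) needs far more:
print(round(estimate_vram_gib(7, 2.0), 1))   # → 15.6
```

This is why quantization is the usual lever for fitting larger models onto consumer GPUs: halving bytes per parameter roughly halves the VRAM needed for weights.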

Contradictions or debates

None.

Sources