
# HSA foundations

The Heterogeneous System Architecture (HSA) represents a paradigm shift in heterogeneous computing, aiming to seamlessly integrate CPUs, GPUs, and other accelerators.

The HSA Foundation was founded in June 2012 with the following scope:

- Develop a new platform for heterogeneous systems
- Unite accelerators architecturally
- Focus initially on GPU compute, then expand to other accelerators

Goals:

- Enable power-efficient performance
- Improve programmability of heterogeneous processors
- Increase portability of code across processors and platforms
- Increase pervasiveness of heterogeneous solutions

## Key Features

HSA is built on three key pillars:

- hUMA
- hQ
- HSAIL

### hUMA (Heterogeneous Unified Memory Architecture)

- Unified coherent memory enables data sharing across all processors
- Pointers can be passed between processors and dereferenced on either side
- No explicit data transfer: values move on demand
- Pageable virtual addresses for GPUs, so GPU memory capacity is no longer a constraint
- CPU and GPU share a unified virtual address space

### hQ (Heterogeneous Queuing)

- User-mode queuing for low-latency dispatch
- Architected Queuing Language (AQL) defines a standard packet format, so any agent can enqueue tasks
- Single compute dispatch path for all hardware
- No driver translation, direct access to hardware
- Any agent (CPU or GPU) can dispatch to a queue
- GPU self-enqueue enables patterns such as recursion and tree traversal

### HSAIL (HSA Intermediate Language)

- Portable "virtual ISA" for vendor-independent compilation and distribution
- Low-level IR, close to machine ISA level
- Generated by high-level compilers (LLVM, GCC, Java VM, etc.)
- Compiled to the target ISA by a vendor-specific "finalizer"

## Advantages over Legacy GPU Compute

- Eliminates multiple memory pools and address spaces
- No explicit data copying across PCIe
- Lower dispatch overhead
- No need for a large amount of GPU compute to amortize copy overhead
- Removes GPU memory capacity limitations
- Eliminates dual-source development
- More accessible to non-expert programmers
- Enables natural expression of nested parallelism
- Removes synchronization and communication overhead with the host
- Exposes finer granularities of parallelism to the scheduler and load balancer
- Supports task preemption and context switching on all computing resources (including GPUs)

## AMD Carrizo

As one of the founding members of the HSA Foundation in 2012, **AMD** was instrumental in driving the initiative forward.

AMD Carrizo, launched in 2015, was one of the first processors to fully implement the HSA 1.0 specifications. It was part of AMD's APU (Accelerated Processing Unit) lineup, which combined CPU and GPU cores on a single chip. Carrizo represented a significant step forward in realizing the HSA vision, offering:

1. Full support for the heterogeneous unified memory architecture (hUMA)
2. Hardware-based GPU scheduling
3. User-mode queuing
4. Shared virtual memory between CPU and GPU

While HSA as a specific standard has not seen the adoption initially hoped for, many of its core ideas have influenced the broader industry: AMD's APU designs, Apple's M1/M2 chips, NVIDIA's unified memory in CUDA, and Intel's integrated GPUs all incorporate many HSA principles.
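The hQ ideas listed earlier (a single queue that any agent can write to, and GPU self-enqueue for tree traversal) can be illustrated with a small toy model. This is only a sketch in plain Python: `UserModeQueue`, `Node`, and `traverse` are hypothetical names invented for this example and are not part of the HSA runtime API, the "GPU" is simulated by an ordinary loop, and no real AQL packet layout is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Binary tree node used as the example workload."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class UserModeQueue:
    """Toy ring buffer standing in for a user-mode queue (hypothetical, not HSA API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets = [None] * capacity  # ring buffer visible to every agent
        self.write_index = 0              # next slot a producer fills
        self.read_index = 0               # next slot the consumer processes

    def enqueue(self, agent: str, node: Node) -> None:
        """Any agent writes a packet straight into the ring; no driver call is modelled."""
        if self.write_index - self.read_index >= self.capacity:
            raise RuntimeError("queue full")
        self.packets[self.write_index % self.capacity] = (agent, node)
        self.write_index += 1

    def dequeue(self):
        """Return the oldest pending packet, or None when the queue is empty."""
        if self.read_index == self.write_index:
            return None
        packet = self.packets[self.read_index % self.capacity]
        self.read_index += 1
        return packet


def traverse(root: Node, capacity: int = 8):
    """Sum a tree: the host enqueues only the root, the simulated GPU enqueues the rest."""
    queue = UserModeQueue(capacity)
    queue.enqueue("cpu", root)  # the only dispatch issued by the host
    total = 0
    producers = []              # records which agent enqueued each processed packet
    while (packet := queue.dequeue()) is not None:
        agent, node = packet
        producers.append(agent)
        total += node.value
        # Self-enqueue: the running "kernel" submits follow-up work itself,
        # without a round trip to the host.
        for child in (node.left, node.right):
            if child is not None:
                queue.enqueue("gpu", child)
    return total, producers
```

Calling `traverse` on a five-node tree shows one packet submitted by the "cpu" agent and four by the "gpu" agent. Under these assumptions, that is the point of self-enqueue: the host starts the traversal once, and the remaining work is generated where it is consumed.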