INTRODUCTION TO HETEROGENEOUS SYSTEM ARCHITECTURE
Presenter: BingRu Wu
Outline
◻ Introduction
◻ Goal
◻ Concept
◻ Memory Model
◻ System Components
Introduction
◻ HSA: Heterogeneous System Architecture
◻ Promising future:
  ◻ ARM processor vendors
  ◻ GPU vendors: AMD, Imagination Technologies
◻ Fully utilize computation resources
◻ By supporting HSA, our system may connect to a major application base
Goal of HSA
◻ Remove programmability barrier
  ◻ Memory space barrier
  ◻ Access latency among devices
◻ Backward compatible
  ◻ Utilize existing programming models
Concept of HSA
Abstract
◻ Two kinds of compute unit
  ◻ LCU: Latency Compute Unit (e.g. CPU)
  ◻ TCU: Throughput Compute Unit (e.g. GPU)
◻ Merged memory space
Memory Management (1/2)
◻ Shared page table
  ◻ Memory is shared by all devices
  ◻ No more host-to-device copies, or vice versa
  ◻ Supports pointer-based data structures (e.g. linked lists)
◻ Page faulting
  ◻ Virtual memory space for all devices
  ◻ e.g. the GPU can now use memory as if it had the whole memory space
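To illustrate why a shared page table matters for pointer-based structures, here is a toy Python model (not HSA code). Memory is modeled as a dictionary from address to node, and the names `build_list`, `device_sum`, and `copy_to_device` are hypothetical. In a shared address space the device follows the host's pointers directly. In a separate device memory the copied nodes land at new addresses, so the pointers stored inside them no longer resolve.

```python
# Toy model: a "memory" is a dict mapping address -> (value, next_address).
# All names here are illustrative, not part of any HSA API.

NODE_SIZE = 16  # assumed size of one list node, in bytes


def build_list(memory, base, values):
    """Store a linked list in `memory` starting at `base`; return the head address."""
    addr = base
    for i, value in enumerate(values):
        nxt = addr + NODE_SIZE if i < len(values) - 1 else None
        memory[addr] = (value, nxt)
        addr += NODE_SIZE
    return base if values else None


def device_sum(memory, head):
    """A 'device kernel' that walks the list by following pointers."""
    total, addr = 0, head
    while addr is not None:
        value, addr = memory[addr]  # KeyError if the pointer is not mapped
        total += value
    return total


def copy_to_device(host_memory, device_base):
    """Legacy model: copy nodes into a separate device memory.
    Node addresses change, but the pointers stored inside are left untouched."""
    device_memory = {}
    for offset, addr in enumerate(sorted(host_memory)):
        device_memory[device_base + NODE_SIZE * offset] = host_memory[addr]
    return device_memory


host = {}
head = build_list(host, 0x1000, [1, 2, 3])

# Shared address space: the device uses the very same memory and pointers.
shared_result = device_sum(host, head)

# Separate address space: the head is relocated, but embedded pointers are stale.
device = copy_to_device(host, 0x8000)
try:
    device_sum(device, 0x8000)
    copy_result = "ok"
except KeyError:
    copy_result = "stale pointer"
```

In the copy-based model the programmer would have to rewrite every pointer (or flatten the structure) before offloading, which is the cost the shared page table is meant to remove.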
Memory Management (2/2)
◻ Coherent memory regions
  ◻ The memory is coherent
  ◻ Shared among all devices (CUs)
◻ Unified address space
  ◻ Memory type separated by address
  ◻ Private / local / global memory decided by memory region
  ◻ No special instruction is required
User-Level Command Queue
◻ Queues for communication
  ◻ User to device
  ◻ Device to device
◻ HSA runtime handles the queue
  ◻ Allocation & destruction
  ◻ Queues are created per application
  ◻ Vendor-dependent implementation
◻ Direct access to devices
  ◻ No OS syscall
  ◻ No task management
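The queue mechanism can be pictured as a ring buffer that the application fills and the device drains. The sketch below is a toy Python model under that assumption; the class name `UserModeQueue` and the fields `write_index`, `read_index`, and `doorbell` are illustrative choices, not the HSA runtime API. The point is that enqueueing is an ordinary memory write followed by a doorbell update, with no system call on the path.

```python
class UserModeQueue:
    """Toy ring buffer standing in for a user-level command queue.
    All field names are illustrative assumptions."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.write_index = 0  # advanced by the producer (application)
        self.read_index = 0   # advanced by the consumer (device)
        self.doorbell = -1    # last index the producer announced to the device

    def enqueue(self, packet):
        """Producer side: plain memory writes, no system call involved."""
        if self.write_index - self.read_index == self.size:
            return False  # queue full, the caller retries later
        self.slots[self.write_index % self.size] = packet
        self.doorbell = self.write_index  # "ring the doorbell"
        self.write_index += 1
        return True

    def process(self):
        """Consumer side: the device drains every packet up to the doorbell."""
        results = []
        while self.read_index <= self.doorbell:
            kernel, args = self.slots[self.read_index % self.size]
            results.append(kernel(*args))
            self.read_index += 1
        return results


queue = UserModeQueue(size=4)
queue.enqueue((lambda a, b: a + b, (2, 3)))
queue.enqueue((lambda s: s.upper(), ("hsa",)))
outputs = queue.process()
```

A packet here is just a `(callable, arguments)` pair; a real implementation would use a fixed binary packet format and atomic index updates, which this sketch leaves out.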
Hardware Scheduler (1/3)
◻ No real scheduling on TCU (GPU)
  ◻ Task scheduling
  ◻ Task preemption
◻ Current implementation
  ◻ Execute without lock:
    ◻ All threads execute
    ◻ Multiple concurrent tasks may produce erroneous results
Hardware Scheduler (2/3)
◻ Current implementation
  ◻ Execute with lock:
    ◻ A code exception may leave the resource locked
    ◻ Long-running tasks prevent others from executing
    ◻ We may fail to finish critical jobs
Hardware Scheduler (3/3)
HSA runtime guarantees:
◻ Bounded execution time
  ◻ Any process ceases in reasonable time
◻ Fast switch among applications
  ◻ Use hardware to save time
◻ Application level parallelism
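The difference between lock-style execution and preemptive scheduling can be shown with a small simulation. This is a sketch only: round-robin with a fixed time slice is an assumption made for illustration (the slides do not say which policy the hardware uses), and context-switch cost is ignored. The function names are hypothetical.

```python
from collections import deque


def run_to_completion(tasks):
    """Lock-style execution: each task holds the device until it finishes.
    `tasks` is a list of (name, work_units); returns [(name, finish_time)]."""
    clock, finished = 0, []
    for name, work in tasks:
        clock += work
        finished.append((name, clock))
    return finished


def run_with_preemption(tasks, time_slice):
    """Round-robin execution: a task is preempted after `time_slice` units."""
    ready = deque(tasks)
    clock, finished = 0, []
    while ready:
        name, remaining = ready.popleft()
        ran = min(time_slice, remaining)
        clock += ran
        remaining -= ran
        if remaining == 0:
            finished.append((name, clock))
        else:
            ready.append((name, remaining))  # preempted: back of the queue
    return finished


tasks = [("long_job", 100), ("critical_job", 5)]
locked = dict(run_to_completion(tasks))
preempted = dict(run_with_preemption(tasks, time_slice=10))
```

With these numbers the critical job finishes at time 105 under lock-style execution but at time 15 with preemption, while the long job completes at 105 either way.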
HSAIL (1/2)
◻ HSA Intermediate Language
  ◻ The language for TCU
  ◻ Similar to “PTX” code
  ◻ No graphics-specific instructions
  ◻ Further translated to HW ISA (by Finalizer)
◻ The abstract platform is similar to OpenCL
  ◻ Work item (thread)
  ◻ Work group (block)
  ◻ NDRange (grid)
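The work item / work group / NDRange hierarchy can be sketched in a few lines of Python. The example assumes a one-dimensional NDRange whose size is a multiple of the group size; `enumerate_work_items` and `vector_add_kernel` are hypothetical names, and the work items run sequentially here rather than in parallel.

```python
def enumerate_work_items(grid_size, group_size):
    """Yield (group_id, local_id, global_id) for a 1-D NDRange.
    Assumes grid_size is a multiple of group_size, for simplicity."""
    if grid_size % group_size != 0:
        raise ValueError("grid_size must be a multiple of group_size")
    for group_id in range(grid_size // group_size):
        for local_id in range(group_size):
            yield group_id, local_id, group_id * group_size + local_id


def vector_add_kernel(global_id, a, b, out):
    """The per-work-item body: each work item handles one element."""
    out[global_id] = a[global_id] + b[global_id]


a = [1, 2, 3, 4, 5, 6]
b = [10, 20, 30, 40, 50, 60]
out = [0] * len(a)

# Launch: one work item per element, grouped into work groups of two.
for _group_id, _local_id, global_id in enumerate_work_items(grid_size=6, group_size=2):
    vector_add_kernel(global_id, a, b, out)
```

The global ID is derived from the group ID and the local ID, which is the same relationship a CUDA programmer knows as block index times block size plus thread index.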
HSAIL (2/2)
Memory Model
◻ All types of memory use the same address space
◻ Memory access behavior
  ◻ Not all regions are accessible by all devices
    ◻ OS kernel should not be accessible
    ◻ Mapping to a region in kernel is still possible
  ◻ Accessing an identical address may give different values
    ◻ Work item private memory
    ◻ Work group local memory
    ◻ Accessing another work item's / group's memory is not valid
Virtual Memory Address
◻ Global
  ◻ The memory shared by all LCUs & TCUs
  ◻ Accessible from any work item / group
◻ Group
  ◻ The memory shared by all work items in the same group
◻ Private
  ◻ The memory visible only to a single work item
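The visibility rules for global, group, and private memory explain how the same address can yield different values for different work items. The toy model below makes that concrete. `SegmentedMemory` is a hypothetical class, and the segment is passed as an explicit argument purely for readability; it is not meant to describe how a real device decodes addresses.

```python
class SegmentedMemory:
    """Toy model of global / group / private visibility (illustrative only)."""

    def __init__(self):
        self.cells = {}

    @staticmethod
    def _key(segment, group_id, local_id, address):
        # The lookup key includes only the IDs that scope the segment.
        if segment == "global":
            return ("global", address)
        if segment == "group":
            return ("group", group_id, address)
        if segment == "private":
            return ("private", group_id, local_id, address)
        raise ValueError(f"unknown segment: {segment}")

    def store(self, segment, group_id, local_id, address, value):
        self.cells[self._key(segment, group_id, local_id, address)] = value

    def load(self, segment, group_id, local_id, address):
        # Returns None when nothing is visible at that address for this work item.
        return self.cells.get(self._key(segment, group_id, local_id, address))


mem = SegmentedMemory()
mem.store("private", 0, 0, 0x10, "item (0,0)")
mem.store("private", 0, 1, 0x10, "item (0,1)")
mem.store("group", 0, 0, 0x10, "group 0")
mem.store("global", 0, 0, 0x10, "everyone")

# Same address 0x10, different values depending on who asks and in which segment.
private_a = mem.load("private", 0, 0, 0x10)
private_b = mem.load("private", 0, 1, 0x10)
group_same = mem.load("group", 0, 1, 0x10)
group_other = mem.load("group", 1, 0, 0x10)
global_any = mem.load("global", 1, 3, 0x10)
```

A work item in group 1 sees nothing at the group address written by group 0, which mirrors the rule that accessing another group's memory is not valid.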
Memory Region
◻ Kernarg
  ◻ The memory for kernel arguments
  ◻ A kernel is the code fragment we ask a device to run
◻ Readonly
  ◻ Read-only type of global memory
◻ Spill
  ◻ Memory for register spill
◻ Arg
  ◻ Memory for function call arguments
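As a rough picture of what a kernarg region holds, the sketch below packs the arguments of an imaginary kernel into a byte buffer with Python's standard `struct` module. The argument list (a 32-bit count, a 32-bit float, and a 64-bit output address) and the little-endian layout are assumptions made for the example; the real layout is determined by the kernel's signature and the toolchain.

```python
import struct

# Hypothetical kernel signature: (uint32 n, float32 scale, uint64 out_address).
# "<" selects little-endian with no implicit padding; these three fields
# happen to be naturally aligned at offsets 0, 4, and 8.
KERNARG_FORMAT = "<IfQ"


def pack_kernarg(n, scale, out_address):
    """Serialize the kernel arguments into one contiguous buffer."""
    return struct.pack(KERNARG_FORMAT, n, scale, out_address)


def unpack_kernarg(buffer):
    """What the device side would read back from the kernarg region."""
    return struct.unpack(KERNARG_FORMAT, buffer)


kernarg = pack_kernarg(1024, 0.5, 0x7F0000001000)
decoded = unpack_kernarg(kernarg)
```

Because the address space is unified, the 64-bit output address in the buffer can be an ordinary host pointer rather than a device-specific handle.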
Memory Region
Memory Consistency
◻ LCU
  ◻ LCU maintains its own consistency
  ◻ Shares global memory
◻ Work item
  ◻ Memory operations to the same address by a single work item are in order
  ◻ Memory operations to different addresses may be reordered
  ◻ Other than that, nothing is guaranteed
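The work-item ordering rule can be written as a small checker: given the operations of one work item in program order, an observed order is acceptable only if operations on the same address keep their relative order. This is a sketch of that single rule as stated on the slide; it ignores synchronization operations, and `is_allowed_order` is a hypothetical helper.

```python
from itertools import permutations


def is_allowed_order(program, observed):
    """program: list of (op, address) in program order for ONE work item.
    observed: a permutation of indices into `program`.
    Allowed if operations to the same address keep their program order."""
    last_seen = {}  # address -> highest program index observed so far
    for idx in observed:
        _op, address = program[idx]
        if address in last_seen and last_seen[address] > idx:
            return False  # a later same-address operation was seen first
        last_seen[address] = idx
    return True


program = [("store", "A"), ("store", "B"), ("load", "A")]

# Enumerate every possible observed order and keep the legal ones.
allowed = [
    order
    for order in permutations(range(len(program)))
    if is_allowed_order(program, order)
]
```

For this three-operation program, three of the six possible orders are legal: the store to B may move freely, but the load of A can never pass the earlier store to A.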
System Components
HSA System
Compilation
◻ Frontend
  ◻ LLVM IR
  ◻ No data dependency
◻ Backend
  ◻ Convert IR to HSAIL
  ◻ Optimization happens here
◻ Binary format
  ◻ ELF format
  ◻ Embedded container for HSAIL (BRIG)
Runtime
◻ HSA runtime
  ◻ Issue tasks to device protocol
◻ Device
  ◻ Convert HSAIL to ISA with Finalizer
HSAIL Program Features
◻ Backward Compatible
  ◻ A system without HSA support should still run the executable
◻ Function Invocation
  ◻ LCU functions may call LCU ones
  ◻ TCU functions may call TCU ones with Finalizer support
  ◻ LCU to TCU / TCU to LCU calls are supported by using a queue
◻ C++ compatible
Conclusion
◻ HSA is an open and standard layer between software / hardware
◻ The cardinal feature of HSA is the unified virtual memory space
◻ HSA does not replace current programming frameworks, and no new language is required
Reference
◻ Heterogeneous System Architecture: A Technical Review
◻ HSA Programmer’s Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for Heterogeneous Systems