ISCA Final Presentation - Applications
HSA APPLICATIONS
WEN-MEI HWU, PROFESSOR, UNIVERSITY OF ILLINOIS
WITH J.P. BORDES AND JUAN GOMEZ
USE CASES SHOWING HSA ADVANTAGE

| Programming Technique | Use Case | Description | HSA Advantage |
|---|---|---|---|
| Pointer-based Data Structures | Binary tree searches | GPU performs parallel searches in a CPU-created binary tree. | CPU and GPU have access to the entire unified coherent memory. GPU can access existing data structures containing pointers. |
| Platform Atomics | Work-Group Dynamic Task Management | GPU directly operates on a task pool managed by the CPU, for algorithms with dynamic computation loads. | CPU and GPU can synchronize using Platform Atomics. Higher performance through parallel operations, reducing the need for data copying and reconciling. |
| Platform Atomics | Binary tree updates | CPU and GPU operate simultaneously on the tree, both doing modifications. | (same as above) |
| Large Data Sets | Hierarchical data searches | Applications include object recognition, collision detection, global illumination and BVH (bounding volume hierarchy). | CPU and GPU have access to the entire unified coherent memory. GPU can operate on huge models in place, reducing copy and kernel launch overhead. |
| CPU Callbacks | Middleware user-callbacks | GPU processes work-items, some of which require a call to a CPU function to fetch new data. | GPU can invoke CPU functions from within a GPU kernel. Simpler programming that does not require “split kernels”. Higher performance through parallel operations. |

© Copyright 2014 HSA Foundation. All Rights Reserved
UNIFIED COHERENT MEMORY FOR POINTER-BASED DATA STRUCTURES

UNIFIED COHERENT MEMORY: MORE EFFICIENT POINTER DATA STRUCTURES
Legacy
[Figure: the pointer-based TREE (nodes with L and R links) and a RESULT BUFFER sit in SYSTEM MEMORY. The GPU KERNEL works on a FLAT TREE and a separate RESULT BUFFER in GPU MEMORY.]
HSA and full OpenCL 2.0
[Figure: there is no GPU MEMORY copy. The GPU KERNEL reads the pointer-based TREE and writes the RESULT BUFFER directly in SYSTEM MEMORY.]
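Below is a minimal C++ sketch of the HSA-style search. It assumes ordinary CPU threads stand in for GPU work-items and that one process address space stands in for unified coherent memory. The "kernel" follows the CPU-built left/right pointers directly and writes into a shared result buffer, so no flat copy of the tree is ever built. `Node`, `insert`, `contains` and `parallel_search` are hypothetical names and are not part of any HSA or OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Ordinary pointer-based binary search tree node, built by the "CPU" code.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// CPU side: link a new node into the tree; `storage` owns every node so addresses stay stable.
Node* insert(Node* root, int key, std::vector<std::unique_ptr<Node>>& storage) {
    storage.push_back(std::make_unique<Node>(key));
    Node* fresh = storage.back().get();
    if (root == nullptr) return fresh;
    for (Node* cur = root;;) {
        Node*& child = key < cur->key ? cur->left : cur->right;
        if (child == nullptr) { child = fresh; return root; }
        cur = child;
    }
}

// Work-item body: follow the left/right pointers exactly as the CPU left them.
bool contains(const Node* node, int key) {
    while (node != nullptr) {
        if (key == node->key) return true;
        node = key < node->key ? node->left : node->right;
    }
    return false;
}

// "Kernel launch": worker w handles queries w, w + workers, ... and writes only its own
// slots of the shared result buffer. The tree is never copied or flattened.
std::vector<char> parallel_search(const Node* root, const std::vector<int>& queries,
                                  unsigned workers) {
    std::vector<char> results(queries.size(), 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t i = w; i < queries.size(); i += workers)
                results[i] = contains(root, queries[i]) ? 1 : 0;
        });
    }
    for (std::thread& t : pool) t.join();
    return results;
}
```

The tree is read-only during the search, and each worker writes disjoint result slots, so this sketch needs no atomics.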
POINTER DATA STRUCTURES - CODE COMPLEXITY
[Figure: side-by-side source code of the HSA and Legacy implementations]
POINTER DATA STRUCTURES - PERFORMANCE
[Chart: Binary Tree Search. Search rate (nodes/ms, axis from 0 to 60,000) versus tree size (# nodes: 1M, 5M, 10M, 25M) for CPU (1 core), CPU (4 core), Legacy APU and HSA APU.]
Measured in AMD labs Jan 1-3 on the system shown in the backup slide.
PLATFORM ATOMICS FOR DYNAMIC TASK MANAGEMENT

PLATFORM ATOMICS: ENABLING MORE EFFICIENT DYNAMIC TASK MANAGEMENT
Legacy*
[Figure: a TASKS POOL with QUEUE 1 and QUEUE 2 is kept in SYSTEM MEMORY and mirrored in GPU MEMORY. Each queue has NUM. WRITTEN TASKS and NUM. CONSUMED TASKS counters, and GPU WORK-GROUPs 1 to 4 take tasks from the GPU MEMORY copy. Animation steps:
(1) 4 tasks are written to QUEUE 1 in SYSTEM MEMORY (NUM. WRITTEN = 4).
(2) An asynchronous transfer brings the GPU MEMORY copy up to date (NUM. WRITTEN = 4).
(3) Work-groups consume the tasks with atomic adds on the NUM. CONSUMED counter in GPU MEMORY (1, 2, 3, 4).
(4) The consumed count (4) is reflected back to SYSTEM MEMORY via zero-copy.]
*Chen et al., Dynamic load balancing on single- and multi-GPU systems, IPDPS 2010
PLATFORM ATOMICS: ENABLING MORE EFFICIENT DYNAMIC TASK MANAGEMENT
HSA and full OpenCL 2.0
[Figure: the TASKS POOL (QUEUE 1 and QUEUE 2, with NUM. WRITTEN TASKS and NUM. CONSUMED TASKS counters) lives in HOST COHERENT MEMORY, and GPU WORK-GROUPs 1 to 4 use it directly. Animation steps:
(1) 4 tasks are copied into QUEUE 1 with memcpy (NUM. WRITTEN = 4).
(2) Work-groups consume the tasks with platform atomic adds on NUM. CONSUMED (1, 2, 3, 4).
No copy of the pool in GPU MEMORY is needed.]
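The HSA variant can be sketched in C++ with `std::atomic` counters. This assumes threads stand in for GPU work-groups and ordinary memory stands in for host coherent memory.

- A single producer copies tasks into the pool and then publishes NUM. WRITTEN with a release store.
- Consumers claim tasks by advancing NUM. CONSUMED.
- The slides use an atomic add. This sketch swaps in a compare-and-swap loop so that a consumer never claims a slot that has not been written yet.

`TaskPool`, `enqueue`, `try_claim` and `run_workers` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr int kCapacity = 1024;  // assumed large enough; overflow is not handled

// One task queue shared by the CPU producer and the "GPU" consumers. On an HSA system it
// would sit in host coherent memory; here it is ordinary process memory.
struct TaskPool {
    std::array<int, kCapacity> tasks{};  // task payloads (plain ints in this sketch)
    std::atomic<int> num_written{0};     // NUM. WRITTEN TASKS
    std::atomic<int> num_consumed{0};    // NUM. CONSUMED TASKS
};

// CPU side (single producer): copy a batch into the pool, then publish it.
void enqueue(TaskPool& pool, const std::vector<int>& batch) {
    int base = pool.num_written.load(std::memory_order_relaxed);
    for (std::size_t i = 0; i < batch.size(); ++i) pool.tasks[base + i] = batch[i];
    // Release store: a consumer that sees the new count also sees the payloads above.
    pool.num_written.store(base + static_cast<int>(batch.size()), std::memory_order_release);
}

// Consumer side: claim the next unconsumed task, if one has been published.
bool try_claim(TaskPool& pool, int& task) {
    int c = pool.num_consumed.load(std::memory_order_relaxed);
    while (c < pool.num_written.load(std::memory_order_acquire)) {
        if (pool.num_consumed.compare_exchange_weak(c, c + 1, std::memory_order_acq_rel)) {
            task = pool.tasks[c];
            return true;
        }
        // On failure, c was reloaded with the current counter value; retry.
    }
    return false;
}

// Runs `workers` consumer threads until `expected` tasks have been claimed and
// returns the sum of the claimed task values.
long run_workers(TaskPool& pool, int expected, unsigned workers) {
    std::atomic<long> sum{0};
    std::vector<std::thread> group;
    for (unsigned w = 0; w < workers; ++w) {
        group.emplace_back([&] {
            int task = 0;
            while (pool.num_consumed.load(std::memory_order_acquire) < expected) {
                if (try_claim(pool, task))
                    sum.fetch_add(task, std::memory_order_relaxed);  // "process" the task
                else
                    std::this_thread::yield();  // nothing published yet
            }
        });
    }
    for (std::thread& t : group) t.join();
    return sum.load();
}
```

Consider one producer inserting two batches of four tasks while four consumer threads run. Every task is claimed exactly once, so the returned sum equals the sum of all task values.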
PLATFORM ATOMICS - CODE COMPLEXITY
[Figure: side-by-side source code of the HSA and Legacy implementations]
HSA: host enqueue function, 20 lines of code
Legacy: host enqueue function, 102 lines of code
PLATFORM ATOMICS - PERFORMANCE
[Chart: execution time (ms, axis from 0 to 700) of the Legacy and HSA implementations for 64, 128, 256 and 512 tasks per insertion, with tasks pool sizes of 4096 and 16384.]
PLATFORM ATOMICS FOR CPU/GPU COLLABORATION

PLATFORM ATOMICS: ENABLING EFFICIENT GPU/CPU COLLABORATION
Legacy
Only the GPU can work on the input array; concurrent processing is not possible.
[Figure: a TREE and an INPUT BUFFER, processed only by the GPU KERNEL.]
PLATFORM ATOMICS
HSA and full OpenCL 2.0
Both the CPU and the GPU operate on the same data structure concurrently.
[Figure: CPU 0, CPU 1 and the GPU KERNEL all take work from the INPUT BUFFER and update the same TREE.]
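Here is a minimal C++ sketch of concurrent tree updates. It assumes ordinary threads stand in for CPU 0, CPU 1 and the GPU kernel, and that the tree is insert-only. Each thread takes a slice of the input buffer and links new nodes with a compare-and-swap on the child pointer. That is one standard lock-free technique, not necessarily the one used in the presented code. `Node`, `insert`, `in_order` and `concurrent_build` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Insert-only BST whose child links are atomics, so several agents can add nodes at once.
struct Node {
    int key;
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
    explicit Node(int k) : key(k) {}
};

// Links `fresh` below `root` (root must be non-null); duplicates are ignored.
// Returns true if the node was linked in.
bool insert(Node* root, Node* fresh) {
    Node* cur = root;
    for (;;) {
        if (fresh->key == cur->key) return false;
        std::atomic<Node*>& link = fresh->key < cur->key ? cur->left : cur->right;
        Node* next = link.load(std::memory_order_acquire);
        if (next == nullptr) {
            // Try to hang the new node here. If another thread won the race,
            // `next` receives the winner and we keep descending from it.
            if (link.compare_exchange_strong(next, fresh, std::memory_order_acq_rel)) return true;
        }
        cur = next;
    }
}

// In-order walk, used after all inserting threads have joined.
void in_order(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    in_order(n->left.load(), out);
    out.push_back(n->key);
    in_order(n->right.load(), out);
}

// Splits the input buffer into `threads` strided slices and inserts them concurrently.
// `nodes` keeps ownership of every node, including duplicates that were not linked.
void concurrent_build(Node* root, std::vector<std::unique_ptr<Node>>& nodes, unsigned threads) {
    std::vector<std::thread> ts;
    for (unsigned t = 0; t < threads; ++t) {
        ts.emplace_back([&, t] {
            for (std::size_t i = t; i < nodes.size(); i += threads) insert(root, nodes[i].get());
        });
    }
    for (std::thread& th : ts) th.join();
}
```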
UNIFIED COHERENT MEMORY FOR LARGE DATA SETS

PROCESSING LARGE DATA SETS
The CPU creates a large data structure in system memory. Computations using the data are offloaded to the GPU.
[Figure: a large 3D spatial data structure with five levels (Level 1 to Level 5) in SYSTEM MEMORY, next to the GPU.]
The HSA and Legacy methods are compared next.
LEGACY ACCESS USING GPU MEMORY
Legacy
GPU memory is smaller, so the data has to be copied and processed in chunks.
[Figure: SYSTEM MEMORY and a separate, smaller GPU MEMORY attached to the GPU.]
LEGACY ACCESS TO LARGE STRUCTURES
[Figure: the large 3D spatial data structure (Level 1 to Level 5) in SYSTEM MEMORY; the GPU can only work from GPU MEMORY.]
COPY ONE CHUNK AT A TIME / PROCESS ONE CHUNK AT A TIME
Legacy
[Figure animation: SYSTEM MEMORY holds Levels 1 to 5; the GPU works only on what has been copied into GPU MEMORY.
1. Copy the top 2 levels of the hierarchy to GPU MEMORY. The FIRST KERNEL processes them until it needs nodes that are not resident ("?").
2. Copy the bottom 3 levels of one branch of the hierarchy. The SECOND KERNEL processes that branch.
3. Copy the bottom 3 levels of a different branch of the hierarchy, and so on up to the Nth KERNEL.]
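The legacy chunking scheme can be sketched in C++ under these assumptions:

- The hierarchy is a complete binary tree of five levels stored in heap order (node `i` has children `2i+1` and `2i+2`).
- A `std::vector` stands in for the smaller GPU memory.
- Each "kernel" simply sums the values of the nodes resident in that vector.

`Stats`, `sum_kernel` and `legacy_chunked` are hypothetical names. A real hierarchical search would copy only the branches it actually needs.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical five-level complete binary tree in heap order: node i has children 2i+1, 2i+2.
constexpr int kLevels = 5;
constexpr int kNodes = (1 << kLevels) - 1;  // 31 nodes in "system memory"

struct Stats {
    long sum = 0;                  // result accumulated over all kernels
    int kernel_launches = 0;       // one launch per chunk
    std::size_t nodes_copied = 0;  // host-to-"GPU memory" traffic, in nodes
};

// Stand-in for a GPU kernel: it can only see the chunk resident in "GPU memory".
long sum_kernel(const std::vector<int>& device_chunk) {
    return std::accumulate(device_chunk.begin(), device_chunk.end(), 0L);
}

Stats legacy_chunked(const std::vector<int>& tree) {
    Stats s;
    std::vector<int> device;  // the smaller "GPU memory", refilled for every chunk

    // Chunk 1: copy the top 2 levels (nodes 0..2) and launch the first kernel.
    device.assign(tree.begin(), tree.begin() + 3);
    s.nodes_copied += device.size();
    s.sum += sum_kernel(device);
    ++s.kernel_launches;

    // Chunks 2..N: for each level-2 node, copy the bottom 3 levels of its branch.
    for (int branch : {1, 2}) {
        device.clear();
        std::vector<int> frontier{2 * branch + 1, 2 * branch + 2};  // its level-3 children
        for (int level = 3; level <= kLevels; ++level) {
            std::vector<int> next;
            for (int i : frontier) {
                device.push_back(tree[i]);
                next.push_back(2 * i + 1);
                next.push_back(2 * i + 2);
            }
            frontier = next;
        }
        s.nodes_copied += device.size();
        s.sum += sum_kernel(device);
        ++s.kernel_launches;
    }
    return s;
}
```

For a tree holding the values 1 to 31, this takes three kernel launches (one for the top two levels, one per branch). It copies all 31 nodes to produce the sum 496.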
LARGE SPATIAL DATA STRUCTURE
HSA and full OpenCL 2.0
[Figure: the large 3D spatial data structure (Level 1 to Level 5) stays in SYSTEM MEMORY, where the GPU KERNEL accesses it.]
GPU CAN TRAVERSE ENTIRE HIERARCHY
HSA
[Figure animation: the KERNEL on the GPU descends from Level 1 to Level 5 directly in SYSTEM MEMORY, with no chunk copies.]
![Page 77: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/77.jpg)
CALLBACKS
![Page 78: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/78.jpg)
CALLBACKS
• Parallel processing algorithm with branches
• A seldom-taken branch requires new data from the CPU
• On legacy systems, the algorithm must be split:
  1. Process Kernel 1 on the GPU
  2. Check for CPU callbacks and, if there are any, process them on the CPU
  3. Process Kernel 2 on the GPU
• Example algorithm from image processing:
  1. Perform a filter
  2. Calculate the average LUMA in each tile
  3. Compare LUMA against a threshold and call a CPU callback if it is exceeded (rare)
  4. Perform special processing on tiles with callbacks
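The split structure described above can be sketched in Python under stated assumptions: each tile is a list of RGB pixels with components in [0, 1], Luma uses ITU-R BT.601 weights (the slides do not name a formula), the threshold of 0.8 is hypothetical, and plain loops stand in for GPU launches. `kernel_1`, `kernel_2` and `legacy_split` are illustrative names only.

```python
from __future__ import annotations

from typing import Callable

THRESHOLD = 0.8  # hypothetical Luma threshold (RGB components in [0, 1])


def tile_luma(tile: list[tuple[float, float, float]]) -> float:
    """Average the tile's RGB, then take Luma of that average (BT.601 weights assumed)."""
    n = len(tile)
    r = sum(p[0] for p in tile) / n
    g = sum(p[1] for p in tile) / n
    b = sum(p[2] for p in tile) / n
    return 0.299 * r + 0.587 * g + 0.114 * b


def kernel_1(tiles):
    """1st launch: a work-item whose tile exceeds the threshold stops early."""
    results: list = [None] * len(tiles)
    pending: list[int] = []
    for i, tile in enumerate(tiles):  # each iteration stands in for one work-item
        luma = tile_luma(tile)
        if luma > THRESHOLD:
            pending.append(i)  # early termination: needs a CPU callback
        else:
            results[i] = luma
    return results, pending


def kernel_2(results, pending, replies):
    """2nd (continuation) launch: finish only the tiles cut short by kernel_1."""
    for i in pending:
        results[i] = replies[i]
    return results


def legacy_split(tiles, cpu_callback: Callable[[int], object]):
    launches = 0
    results, pending = kernel_1(tiles)
    launches += 1
    replies = {i: cpu_callback(i) for i in pending}  # CPU runs between the two launches
    results = kernel_2(results, pending, replies)
    launches += 1
    return results, launches
```

The rare bright tile forces a second launch even though only one work-item needed it; with `lambda i: "RED"` as the callback, the bright tile ends up as `"RED"` and the others keep their Luma.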
COMMON SITUATION IN HC
[Figure: input image and output image]
![Page 79: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/79.jpg)
CALLBACKS
[Figure (Legacy): GPU threads 0 to N run the 1st kernel over time; some threads terminate early because they need a CPU callback, the CPU services the callbacks, and a 2nd (continuation) kernel finishes the work, which results in poor GPU utilization]
![Page 80: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/80.jpg)
CALLBACKS
[Figure: input image and output image; 1 tile = 1 OpenCL work-item]
GPU
• Work-items compute the average RGB value of all the pixels in a tile
• Work-items also compute the average Luma from the average RGB
• If the average Luma > threshold, the work-group invokes a CPU CALLBACK
• In parallel with the callback, computation continues
CPU
• For selected tiles, update the average Luma value (set to RED)
GPU
• Work-items apply the Luma value to all pixels in the tile
GPU-to-CPU callbacks use Shared Virtual Memory (SVM) semaphores, implemented using platform atomic compare-and-swap (CAS).
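A minimal Python sketch of a GPU-to-CPU callback over a semaphore in shared memory, under stated assumptions: Python threads stand in for GPU work-groups and the CPU, each work-group gets its own semaphore, and because Python has no hardware CAS, `compare_and_swap` is emulated with a lock. The state values, `SvmSemaphore`, `work_group`, `cpu_service` and `run_single_kernel` are hypothetical, not the OpenCL 2.0 or HSA runtime API.

```python
from __future__ import annotations

import threading
import time
from typing import Callable

IDLE, REQUESTED, DONE = 0, 1, 2  # hypothetical semaphore states


class SvmSemaphore:
    """Stand-in for a semaphore word in shared virtual memory (CAS emulated with a lock)."""

    def __init__(self) -> None:
        self._state = IDLE
        self._lock = threading.Lock()
        self.payload: object = None  # request data to the CPU, reply data back

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._state != expected:
                return False
            self._state = new
            return True

    def load(self) -> int:
        with self._lock:
            return self._state


def work_group(tile: int, luma: float, threshold: float,
               sem: SvmSemaphore, out: list) -> None:
    """One GPU work-group: raise a callback only when Luma exceeds the threshold."""
    if luma > threshold:
        sem.payload = tile  # write the request before publishing it
        sem.compare_and_swap(IDLE, REQUESTED)
        # ... independent work could continue here while the CPU services the request ...
        while sem.load() != DONE:  # wait for the CPU's reply
            time.sleep(0)
        luma = sem.payload
    out[tile] = luma


def cpu_service(sems: list[SvmSemaphore], stop: threading.Event,
                callback: Callable[[object], object]) -> None:
    """CPU thread: poll the semaphores and answer requests while the kernel runs."""
    while not stop.is_set():
        for sem in sems:
            if sem.load() == REQUESTED:
                sem.payload = callback(sem.payload)  # write the reply first
                sem.compare_and_swap(REQUESTED, DONE)
        time.sleep(0)


def run_single_kernel(lumas: list[float], threshold: float,
                      callback: Callable[[object], object]) -> list:
    sems = [SvmSemaphore() for _ in lumas]
    out: list = [None] * len(lumas)
    stop = threading.Event()
    cpu = threading.Thread(target=cpu_service, args=(sems, stop, callback))
    cpu.start()
    groups = [threading.Thread(target=work_group, args=(i, v, threshold, sems[i], out))
              for i, v in enumerate(lumas)]
    for g in groups:
        g.start()
    for g in groups:  # one "launch": every work-group finishes inside it
        g.join()
    stop.set()
    cpu.join()
    return out
```

For example, `run_single_kernel([0.2, 0.9, 0.5], 0.8, lambda tile: "RED")` returns `[0.2, "RED", 0.5]` without a continuation kernel: only the work-group that crossed the threshold waits, and only for its own reply.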
![Page 81: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/81.jpg)
CALLBACKS
A few kernel threads need CPU callback services, but they are serviced immediately.
[Figure (HSA and full OpenCL 2.0): GPU threads 0 to N run a single kernel over time while the CPU services callbacks]
![Page 82: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/82.jpg)
SUMMARY - HSA ADVANTAGE
| Programming Technique | Use Case | Description | HSA Advantage |
|---|---|---|---|
| Pointer-based Data Structures | Binary tree searches | GPU performs parallel searches in a CPU-created binary tree. | CPU and GPU have access to the entire unified coherent memory. GPU can access existing data structures containing pointers. |
| Platform Atomics | Work-Group Dynamic Task Management | GPU directly operates on a task pool managed by the CPU, for algorithms with dynamic computation loads. | CPU and GPU can synchronize using Platform Atomics. Higher performance through parallel operations, reducing the need for data copying and reconciling. |
| Platform Atomics | Binary tree updates | CPU and GPU operate simultaneously on the tree, both making modifications. | Same as above. |
| Large Data Sets | Hierarchical data searches | Applications include object recognition, collision detection, global illumination, and BVH. | CPU and GPU have access to the entire unified coherent memory. GPU can operate on huge models in place, reducing copy and kernel launch overhead. |
| CPU Callbacks | Middleware user-callbacks | GPU processes work-items, some of which require a call to a CPU function to fetch new data. | GPU can invoke CPU functions from within a GPU kernel. Simpler programming: no "split kernels" required. Higher performance through parallel operations. |
![Page 83: ISCA Final Presentation - Applications](https://reader038.vdocuments.net/reader038/viewer/2022103115/55838d34d8b42a9e528b4a1e/html5/thumbnails/83.jpg)
QUESTIONS?