![Page 1: ACO, a new compiler backend for GCN GPUs · 2019. 10. 4. · ACO, a new compiler backend forGCN GPUs 2019-10-02 Bas Nieuwenhuizen Daniel Schürmann](https://reader036.vdocuments.net/reader036/viewer/2022071604/613f4e7ba7a58608c268d6d2/html5/thumbnails/1.jpg)
ACO, a new compiler backend for GCN GPUs
2019-10-02
Bas Nieuwenhuizen
Daniel Schürmann

Who are we?
● Bas Nieuwenhuizen
  ○ RADV maintainer since it started existing (Summer 2016)
● Daniel Schürmann
  ○ Contracted by Valve Corporation
  ○ Working on RADV since February 2018

Outline
● A new compiler backend
● High level overview
● Register Pressure Control
● Current Status
● Performance results
● Challenges & Future

A new compiler backend?
● RADV: Independent Vulkan Driver for AMD GCN GPUs
[Diagram: SPIR-V → NIR → LLVM IR → LLVM MIR → Binary, with optimizations at the NIR, LLVM IR and LLVM MIR stages. New: NIR → ACO IR → Binary, with optimizations on ACO IR.]

A new compiler backend?
1. Improve control flow handling by directly using NIR
  ○ NIR stores the structured control flow of shaders
2. Improve compile time performance

Using NIR - Control Flow
● In hardware multiple invocations get executed using SIMD
● Scalar registers sometimes needed for
  ○ Performance
  ○ Correctness

Using NIR - Control Flow
uniform sampler2D textures[64];

void main() {
    int i;
    vec4 r1, r2;

    for (i = 0;; ++i) {
        if (i == gl_LocalInvocationIndex) {
            // OK: every invocation that reaches this line has the same i
            r1 = texture(textures[i], vec2(0.0));
            break;
        }
    }
    // Not OK: after the loop, i differs between invocations
    r2 = texture(textures[i], vec2(0.0));
}
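
To see why the first access is fine and the second is not, the loop can be replayed for a small wave running in lock-step. This is a minimal sketch rather than driver code: `WAVE_SIZE`, `simulate_wave` and the returned structures are made up for illustration, and each lane's invocation index is assumed to equal its lane number.

```python
WAVE_SIZE = 8  # hypothetical; chosen small so the output is easy to read


def simulate_wave(wave_size=WAVE_SIZE):
    """Run the loop from the slide in lock-step for all lanes of one wave.

    Returns (in_loop, after_loop):
      in_loop    - list of sets, one per r1 access: the values of i held
                   by the lanes that execute that access together
      after_loop - set of the values of i held by the lanes when they
                   execute the r2 access after the loop
    """
    i = {lane: 0 for lane in range(wave_size)}  # each lane's private copy of i
    active = set(range(wave_size))              # lanes still inside the loop
    in_loop = []
    while active:
        # Lanes whose invocation index equals their i take the branch.
        taking = {lane for lane in active if i[lane] == lane}
        if taking:
            in_loop.append({i[lane] for lane in taking})
        active -= taking                        # these lanes hit `break`
        for lane in active:
            i[lane] += 1                        # ++i for the remaining lanes
    # After the loop every lane is active again, each with its own i.
    after_loop = {i[lane] for lane in range(wave_size)}
    return in_loop, after_loop


in_loop, after_loop = simulate_wave()
print(all(len(values) == 1 for values in in_loop))  # True: i is uniform at r1
print(len(after_loop))                              # 8: i is divergent at r2
```

Inside the loop a single descriptor serves all active lanes, so it can be held in scalar registers. After the loop, one texture instruction would need a different descriptor per lane.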

Using NIR - Control Flow
NIR stores structured control flow:
[Figure: a control-flow graph of blocks A to E, shown next to the equivalent NIR tree of function, loop and if nodes]
With LLVM this information was thrown away
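
As a rough illustration of what "structured" means here, the sketch below models a shader body as a tree of blocks, loops and ifs. The class names, the helper functions and the exact nesting of blocks A to E are assumptions made for this example. They are not NIR's real data structures, and the nesting is not recoverable from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    name: str


@dataclass
class If:
    then_body: List["Node"]
    else_body: List["Node"] = field(default_factory=list)


@dataclass
class Loop:
    body: List["Node"]


Node = Union[Block, If, Loop]


def describe(nodes, depth=0):
    """Return one line per node, indented by nesting depth."""
    lines = []
    pad = "  " * depth
    for node in nodes:
        if isinstance(node, Block):
            lines.append(f"{pad}block {node.name}")
        elif isinstance(node, Loop):
            lines.append(f"{pad}loop")
            lines.extend(describe(node.body, depth + 1))
        else:
            lines.append(f"{pad}if")
            lines.extend(describe(node.then_body, depth + 1))
            if node.else_body:
                lines.append(f"{pad}else")
                lines.extend(describe(node.else_body, depth + 1))
    return lines


def loop_depth(nodes, target, depth=0):
    """Loop nesting depth of block `target`, or None if it is absent.

    On the tree this is a plain walk; on a bare CFG the same question
    needs a separate loop analysis.
    """
    for node in nodes:
        if isinstance(node, Block):
            if node.name == target:
                return depth
        elif isinstance(node, Loop):
            found = loop_depth(node.body, target, depth + 1)
            if found is not None:
                return found
        else:
            for body in (node.then_body, node.else_body):
                found = loop_depth(body, target, depth)
                if found is not None:
                    return found
    return None


# Assumed shape: A; loop { B; if { C } else { D } }; E
function_body = [
    Block("A"),
    Loop([Block("B"), If([Block("C")], [Block("D")])]),
    Block("E"),
]

print("\n".join(describe(function_body)))
print(loop_depth(function_body, "C"))
```

A backend that receives the tree already knows where loops and branches begin and end. One that receives only the flattened graph has to reconstruct that.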

High level overview
● “Modern” Compiler Construction Principles
● Written in C++
● Divergence-aware instruction selection
● Logical vs linear CFG
● SSA-IR based on hardware ISA
● SSA-based Register Allocation
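
One way to picture divergence-aware instruction selection: a value that is the same for every invocation of a wave can live in a scalar register, while a value that may differ per invocation needs a vector register. The sketch below is an assumption-laden toy, not ACO's implementation. The tuple format, the opcode names and both functions are invented for illustration, and divergence caused by control flow (phis after a divergent branch) is left out.

```python
def analyze_divergence(instructions, divergent_sources):
    """Return the set of SSA values that may differ between invocations.

    instructions      - list of (dest, opcode, operands) in SSA order
    divergent_sources - opcodes whose result differs between invocations
    A value is divergent if its opcode is a divergent source or if any
    of its operands is divergent; everything else is uniform.
    """
    divergent = set()
    for dest, opcode, operands in instructions:
        if opcode in divergent_sources or any(op in divergent for op in operands):
            divergent.add(dest)
    return divergent


def register_class(dest, divergent):
    """Uniform values can use scalar registers, divergent ones cannot."""
    return "vgpr" if dest in divergent else "sgpr"


# Hypothetical straight-line shader in SSA form.
program = [
    ("base",  "load_push_constant",    []),
    ("tid",   "load_invocation_index", []),
    ("four",  "const",                 []),
    ("off",   "imul",                  ["tid", "four"]),
    ("addr",  "iadd",                  ["base", "off"]),
    ("limit", "iadd",                  ["base", "four"]),
]

divergent = analyze_divergence(program, {"load_invocation_index"})
for dest, _, _ in program:
    print(dest, register_class(dest, divergent))
```

Here `tid`, `off` and `addr` end up in vector registers, and `base`, `four` and `limit` stay scalar.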

High level overview
NIR → Instruction Selection → Value Numbering → Instruction Combining → Exec-mask Handling → Spilling → Scheduling → Register Allocation → SSA Elimination → Assembly

Register Pressure Control
Problem:
● Scheduling might increase register pressure
● High register pressure might lead to spilling
● On AMD GPUs, high register pressure lowers parallelism (occupancy)
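
The last point can be made concrete with a back-of-the-envelope occupancy estimate. The sketch below is not taken from the slides. The function name and its defaults (a 256-entry VGPR file per SIMD lane, allocation in blocks of 4, at most 10 waves per SIMD) are assumptions for a GCN-like part, and SGPR and LDS limits are ignored.

```python
def max_waves_per_simd(vgprs_per_wave, vgpr_file_size=256, granule=4, wave_limit=10):
    """Estimate how many waves fit on one SIMD for a given VGPR demand.

    All defaults are assumptions for a GCN-like SIMD, not values from
    the slides.
    """
    if vgprs_per_wave <= 0:
        return wave_limit
    # Registers are handed out in blocks of `granule`, so round up.
    allocated = -(-vgprs_per_wave // granule) * granule
    return min(wave_limit, vgpr_file_size // allocated)


for demand in (24, 32, 64, 65, 128):
    print(demand, max_waves_per_simd(demand))
```

Under these assumptions, going from 64 to 65 VGPRs drops the estimate from 4 waves to 3. This is why a scheduler that ignores register pressure can cost performance even when nothing is spilled.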

Register Pressure Control
Solution: Control the register pressure!
● Via SSA-based Spilling/RA
● And scheduling under register pressure constraints

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 3, sgprs: 2) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 2) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 2) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 2) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 2) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 4, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 4, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 3, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...
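
The `(vgprs: N, sgprs: M)` prefix on each line reads as the register demand after that instruction, that is, how many registers of each kind hold values that are still needed. A minimal sketch of how such numbers can be computed with one backward liveness pass follows. The function and the tuple format are invented for illustration, and `exec`, `scc` and `m0` are left out, so this is not ACO's actual analysis.

```python
def register_demand(block, live_out):
    """Annotate a straight-line block with the register demand after
    each instruction, using one backward liveness pass.

    block    - list of (defs, uses); defs maps a value name to
               (register kind, size in registers), uses lists value names
    live_out - values still needed after the block, same format as defs
    Returns one {"vgprs": n, "sgprs": m} dict per instruction.
    """
    sizes = dict(live_out)
    for defs, _ in block:
        sizes.update(defs)

    live = set(live_out)
    demand = [None] * len(block)
    for index in range(len(block) - 1, -1, -1):
        defs, uses = block[index]
        # Values live below the instruction, plus its own results
        # (a result occupies registers here even if it is never used).
        after = live | set(defs)
        counts = {"vgprs": 0, "sgprs": 0}
        for name in after:
            kind, size = sizes[name]
            counts[kind] += size
        demand[index] = counts
        # Step above the instruction: results are not live yet,
        # operands are.
        live = (live - set(defs)) | set(uses)
    return demand


S, V = "sgprs", "vgprs"
# Cut-down instruction sequence using value names from the listing.
block = [
    ({"%19": (S, 1), "%20": (S, 1), "%21": (V, 1), "%22": (V, 1)}, []),  # p_startpgm
    ({"%28": (S, 2)}, ["%19"]),                # p_create_vector
    ({"%29": (S, 8)}, ["%28"]),                # s_load_dwordx8
    ({"%26": (V, 1)}, ["%21", "%20"]),         # v_interp_p1_f32
    ({"%3": (V, 1)}, ["%22", "%20", "%26"]),   # v_interp_p2_f32
]
# Values that later instructions (not modelled here) still read.
live_out = {"%20": (S, 1), "%21": (V, 1), "%22": (V, 1),
            "%28": (S, 2), "%29": (S, 8), "%3": (V, 1)}

for counts in register_demand(block, live_out):
    print(counts)
```

The eight-register result of `s_load_dwordx8` dominates the scalar demand from the point where it is defined until its last use.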

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 3, sgprs: 2) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 2) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 2) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 2) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 4, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 4, sgprs: 11) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 3, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 3, sgprs: 2) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 2) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 2) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 4, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 4, sgprs: 11) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 11) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 3, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 2, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 2, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 3, sgprs: 11) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 11) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 11) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 11) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 11) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 3, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 2, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 2, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 3, sgprs: 11) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 11) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 11) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 11) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 11) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 5, sgprs: 13) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 2, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 2, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 2, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 3, sgprs: 13) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 13) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 13) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 13) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 13) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 5, sgprs: 13) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 5, sgprs: 13) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
(vgprs: 4, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
...

Register Pressure Control
BB0
(vgprs: 2, sgprs: 2) s1: %19:s[2], s1: %20:s[3], v1: %21:v[0], v1: %22:v[1], s2: %23:exec = p_startpgm
(vgprs: 2, sgprs: 3) s2: %45:exec, s1: %44:scc = s_wqm_b64 %23:exec
(vgprs: 2, sgprs: 2) p_logical_start
(vgprs: 2, sgprs: 3) s2: %28 = p_create_vector %19, 0
(vgprs: 2, sgprs: 11) s8: %29 = s_load_dwordx8 %28, 32 reorder
(vgprs: 2, sgprs: 13) s4: %31 = s_load_dwordx4 %28, 0 reorder
(vgprs: 3, sgprs: 13) v1: %26 = v_interp_p1_f32 %21, %20:m0 attr0.y
(vgprs: 3, sgprs: 13) v1: %3 = v_interp_p2_f32 %22, %20:m0, %26 attr0.y
(vgprs: 4, sgprs: 13) v1: %27 = v_interp_p1_f32 %21, %20:m0 attr0.x
(vgprs: 4, sgprs: 13) v1: %4 = v_interp_p2_f32 %22, %20:m0, %27 attr0.x
(vgprs: 4, sgprs: 13) v2: %5 = p_create_vector %4, %3
(vgprs: 4, sgprs: 13) v2: %33 = p_wqm %5
(vgprs: 3, sgprs: 1) v1: %32 = image_sample %33, %29, %31 dmask:w reorder
(vgprs: 4, sgprs: 1) v1: %34 = v_interp_p1_f32 %21, %20:m0 attr0.z
(vgprs: 4, sgprs: 1) v1: %9 = v_interp_p2_f32 %22, %20:m0, %34 attr0.z
...
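
In this listing the descriptor loads sit directly after `p_logical_start`, well ahead of the `image_sample` that consumes them, and the demand annotations show what that costs in live registers. The sketch below shows the general idea of moving an instruction upward only while a register budget is respected. It is a toy under stated assumptions, not ACO's scheduler: `max_demand`, `hoist`, the single register class and the example values are all hypothetical.

```python
def max_demand(block, sizes, live_out):
    """Highest number of live registers after any instruction in `block`.

    block is a list of (dest, operands); sizes maps each value name to
    its size in registers.
    """
    live = set(live_out)
    peak = 0
    for dest, operands in reversed(block):
        peak = max(peak, sum(sizes[v] for v in live | {dest}))
        live = (live - {dest}) | set(operands)
    return peak


def hoist(block, index, sizes, live_out, limit):
    """Move block[index] upwards as far as dependencies and `limit` allow.

    Returns a new list; the input is left untouched.
    """
    block = list(block)
    while index > 0:
        _, operands = block[index]
        prev_dest, _ = block[index - 1]
        # Stop below the instruction that produces one of our operands.
        if prev_dest in operands:
            break
        candidate = block[:]
        candidate[index - 1], candidate[index] = candidate[index], candidate[index - 1]
        # Stop if the swap would push register demand over the budget.
        if max_demand(candidate, sizes, live_out) > limit:
            break
        block = candidate
        index -= 1
    return block


# Hypothetical block: `desc` is a wide load that `out` needs at the end.
sizes = {"ptr": 2, "a": 4, "b": 4, "c": 1, "desc": 8, "out": 1}
block = [
    ("ptr", []),
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),
    ("desc", ["ptr"]),
    ("out", ["c", "desc"]),
]
live_out = {"out"}

print(max_demand(block, sizes, live_out))                        # 10
print([d for d, _ in hoist(block, 4, sizes, live_out, limit=16)])
print([d for d, _ in hoist(block, 4, sizes, live_out, limit=14)])
```

With a budget of 16 registers the load moves up to just below `ptr`. With a budget of 14 it stays where it is, because issuing it earlier would overlap its eight registers with `a` and `b`.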

Register Pressure Control
Results: LLVM -> ACO
● +9.40 % needed SGPRs
● +2.65 % needed VGPRs
● -95.91 % SGPR spilling
● -100.00 % VGPR spilling
● -5.24 % max waves
● -7.90 % code size

Current Status
● Fully working VS, FS & CS
● Supports same extensions as RADV/LLVM except for < 32-bit types
● Currently GFX8 & GFX9 only
● Same CTS pass rate as RADV/LLVM

Current Status
● ACO was announced publicly by Valve in July 2019
  ○ Resulted in 118 bug reports and tons of feedback
● Now merged into upstream Mesa
  ○ Included in 19.3 release
  ○ … but behind RADV_PERFTEST=aco
Runtime Performance
Compile Performance

Future: Making ACO the default
● Complete support for GPUs
  ○ Navi support in progress
● Finish remaining Vulkan extensions
● No date on making ACO the default yet
  ○ First have to make sure it has bug-parity

Challenges
● Wider adoption
  ○ Radeonsi?
● More performance
  ○ Alias analysis
● Testing
  ○ Game traces?
  ○ Unit tests?
  ○ Both?
End
Questions?