LCE13: GNU Toolchain - Compiler Performance
DESCRIPTION
Resource: LCE13
Name: GNU Toolchain - Compiler Performance
Date: 11-07-2013
Speaker: Matthew Gretton-Dann
Video: http://youtu.be/gSUWbe71NIs
TRANSCRIPT
GNU Toolchain
Compiler performance
Team
Christophe
Kugan
Venkat
Yvan
Zhenqiang
Achieved since LCA13
● Switched to gcc-4.8
– Lots of backports from trunk
● gcc-4.7 is now in maintenance
● Improved epilogues of leaf functions (can now use LR)
● Shrink-wrapping
● Progress on conditional compare support
● Progress on VRP (Value Range Propagation)
● Progress on divmod optimisation
● Progress on disabling loop peeling
● Address sanitizer
● Shrink-wrapping: move prologue/epilogue inside function body
● Conditional compare support: evaluate short-circuit &&/|| without branches where possible
● VRP: helps remove useless sign/zero extensions
● Divmod: ARM runtime lib contains a routine computing div & mod at the same time
    x = a / b;  // call div()
    y = a % b;  // call mod()
becomes:
    (x, y) = divmod(a, b)
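The combined operation can be tried in portable C with the standard `div()` function from `<stdlib.h>`, which returns quotient and remainder together. This is only a sketch of the idea; it is not the ARM runtime routine the slide refers to, and the helper name `split_div_mod` is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: obtain quotient and remainder in one step.
 * div() is standard C; whether it ends up as a single runtime call
 * depends on the compiler and the target. */
static void split_div_mod(int a, int b, int *quot, int *rem)
{
    assert(b != 0);        /* division by zero is undefined */
    div_t r = div(a, b);   /* one call yields both results */
    *quot = r.quot;
    *rem = r.rem;
}
```

Calling `split_div_mod(17, 5, &q, &r)` leaves 3 in `q` and 2 in `r`, the same values as `17 / 5` and `17 % 5` computed separately.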
short foo(unsigned char c)
{
    c = c & (unsigned char)0x0F;
    if (c > 7) {
        return c - 6;
    }
    return c;
}
Before:
foo:
    and    r0, r0, #15
    cmp    r0, #7
    subhi  r0, r0, #6
    uxthhi r0, r0
    sxth   r0, r0
    bx     lr

After:
foo:
    and    r0, r0, #15
    cmp    r0, #7
    subhi  r0, r0, #6
    bx     lr
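Why the extensions are redundant can be checked by brute force. After the mask, `c` lies in [0, 15], so the returned value lies in [0, 9], and a zero extension followed by a sign extension of a 16-bit value in that range changes nothing. The sketch below repeats `foo` from the slide and adds a check function, `foo_extensions_are_noops`, which is a hypothetical name used only here.

```c
#include <assert.h>

/* Same function as on the slide. */
short foo(unsigned char c)
{
    c = c & (unsigned char)0x0F;   /* c is now in [0, 15] */
    if (c > 7) {
        return c - 6;              /* result in [2, 9] */
    }
    return c;                      /* result in [0, 7] */
}

/* Hypothetical check: returns 1 if, for every possible input,
 * emulating uxth then sxth on the result leaves it unchanged. */
int foo_extensions_are_noops(void)
{
    for (int i = 0; i <= 255; i++) {
        short r = foo((unsigned char)i);
        assert(r >= 0 && r <= 9);                 /* the range VRP can derive */
        if ((short)(unsigned short)r != r) {
            return 0;
        }
    }
    return 1;
}
```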
void test(int a)
{
    if (a == 0)
        return;
    ….
}

Without shrink-wrapping:
    push {….}
    if (a == 0) goto Lx;
    …..
Lx:
    pop {…}
    return

With shrink-wrapping:
    if (a == 0) return;
    push {…}
    ….
    pop {…}
    return
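A small C function with the same shape makes the candidate for shrink-wrapping easier to see. In this sketch the early exit needs no saved registers, while the rest of the function makes a call and keeps `v` live across it, so it is the part that needs a prologue. The function names are hypothetical, and whether GCC actually shrink-wraps this code depends on the version, target and options (`-fshrink-wrap` is the controlling flag).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Hypothetical example: sorts v in place and returns its minimum,
 * or 0 for an empty array. */
int min_after_sort(int *v, size_t n)
{
    if (n == 0)
        return 0;                        /* early exit: nothing to save or restore */

    assert(v != NULL);
    qsort(v, n, sizeof v[0], cmp_int);   /* the call clobbers LR; v must survive it */
    return v[0];
}
```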
if (a == b && c == d)
becomes:
    cmp   a, b
    cmpeq c, d
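The two-instruction sequence can be modelled in C to show why it matches the short-circuit expression: the second compare only executes when the first one reported equality, otherwise the "not equal" result is kept. This is a sketch for illustration; the function names are hypothetical and the model only tracks the Z flag.

```c
#include <assert.h>

/* The source-level expression. Both operands are free of side effects,
 * so the compiler may evaluate it without a branch. */
int both_equal(int a, int b, int c, int d)
{
    return a == b && c == d;
}

/* Model of cmp / cmpeq: z stands for the Z flag. */
int both_equal_ccmp_model(int a, int b, int c, int d)
{
    int z = (a == b);      /* cmp   a, b */
    if (z) {
        z = (c == d);      /* cmpeq c, d  (skipped when Z is clear) */
    }
    return z;
}

/* Checks that the model agrees with the expression on a small grid. */
void check_ccmp_model(void)
{
    for (int a = 0; a < 3; a++)
        for (int b = 0; b < 3; b++)
            for (int c = 0; c < 3; c++)
                for (int d = 0; d < 3; d++)
                    assert(both_equal(a, b, c, d) ==
                           both_equal_ccmp_model(a, b, c, d));
}
```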
● Loop peeling: generate out-of-loop iterations so that the loop body makes aligned memory accesses for vectorization.
– Mostly useless on ARM, which supports unaligned memory accesses.
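Written by hand, alignment peeling looks roughly like the sketch below: a few scalar iterations run first until the pointer reaches a 16-byte boundary, then the main loop works on aligned groups of four, and a tail loop finishes the rest. The function name and the 16-byte boundary are assumptions made for illustration; the compiler performs the equivalent transformation internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: add k to every element of v. */
void add_scalar_peeled(int32_t *v, size_t n, int32_t k)
{
    size_t i = 0;

    /* Peeled iterations: scalar code until v + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(v + i) % 16) != 0) {
        v[i] += k;
        i++;
    }

    /* Main body: each group of four starts on a 16-byte boundary,
     * the shape a vectorizer wants for aligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        assert(((uintptr_t)(v + i) % 16) == 0);
        v[i] += k;
        v[i + 1] += k;
        v[i + 2] += k;
        v[i + 3] += k;
    }

    /* Tail: whatever is left after the last full group. */
    for (; i < n; i++) {
        v[i] += k;
    }
}
```

On a target with cheap unaligned accesses, the first loop is extra code and extra branches for little gain, which is the motivation for disabling it.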
● Address sanitizer: new GCC framework to identify NULL pointer accesses, invalid memory references, etc.
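A minimal sketch of the kind of bug this targets, assuming a build with `-fsanitize=address`: the read below has no bounds check, so an index equal to the buffer length is an invalid heap read. The function names are hypothetical, and the code is only well defined when called with `idx < len`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example. Deliberately has no bounds check. */
int read_unchecked(const int *buf, size_t idx)
{
    return buf[idx];
}

/* Allocates len ints holding 0..len-1, reads one element, frees the buffer. */
int read_from_heap(size_t len, size_t idx)
{
    assert(len > 0);
    int *buf = malloc(len * sizeof *buf);
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (int)i;
    }
    int value = read_unchecked(buf, idx);   /* idx >= len reads past the allocation */
    free(buf);
    return value;
}
```

With valid indices, `read_from_heap(8, 3)` returns 3. In an instrumented build, a call such as `read_from_heap(8, 8)` is expected to abort with an out-of-bounds report; the exact wording of the report depends on the GCC version.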
Next iteration
● Spec2k analysis
– Comparison with x86
– Looking for hot spots
– Identify and prioritise actions
● Shrink wrapping improvements
● Conditional compares
● Finalize loop peeling improvements
● Neon intrinsics improvements
● GCC trunk backports
● Compiler target hooks audit