App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Upload: qualcomm-developer-network

Post on 14-Jun-2015

Category:

Technology


DESCRIPTION

The Qualcomm® Snapdragon™ LLVM, a product of Qualcomm Technologies, Inc., is an optimizing compiler tuned for 32- and 64-bit Snapdragon processors. In this session you will learn how to use Snapdragon LLVM to build your Android app’s native code. We’ll provide guidelines on how to target your C and C++ code to exploit Snapdragon LLVM and sample code demonstrating areas of acceleration.

Learn more about Snapdragon LLVM Compiler for Android: https://developer.qualcomm.com/mobile-development/increase-app-performance/snapdragon-llvm-compiler-android

Watch this presentation on YouTube: https://www.youtube.com/watch?v=6lKOY2_Bg70

TRANSCRIPT

Page 1: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Page 2: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Using the Qualcomm® Snapdragon™ LLVM compiler to optimize apps for 32- and 64-bit

Zino Benaissa Engineer, Principal/Manager Qualcomm Innovation Center, Inc.

Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.

Page 3: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Outline

• Introduction

• Coding guidelines for performance

• LLVM optimization pragmas

• LLVM internal flags

• Summary

Page 4: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Introduction

Page 5: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Software engineering

Software applications are growing exponentially

• Software quality and security

− Many tools exist to fight bugs and scrutinize source code for security holes. The LLVM community is developing such tools:

  − Static analyzer

  − Sanitizers: address, undefined behavior

  − Loop coverage tools

• Performance

− Hardware and compilers are expected to be smart, and they are!

− But performance goals are often not met. In this case programmers are on their own:

  − Costly analysis is required

  − Ad hoc methods are used: inspection of assembly code and code rewrites

Page 6: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Compilers

Compilers are formidable tools

• They have evolved along with the hardware

− Superscalar, SIMD, multi-core, 64 bits

• A typical industrial compiler includes over a hundred optimizations

• Many powerful optimizations have been actively researched and developed to target hardware features

− Loop auto-vectorization targeting the SIMD execution unit

− Loop auto-parallelization targeting multiple cores

Programmer expectations

• Work correctly on any program

• Produce fast code

• Maximize utilization of hardware capabilities

Page 7: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Compilers

Compilers are just programs. Programmers should be aware that compilers:

• Contain thousands of bugs, like any other large piece of software

• Have optimizations with limitations

− An optimization can fail to apply to a legitimate piece of code

• May lack an “expected” optimization

− Make no assumptions about what the compiler will do

• Are systematic but typically unable to infer the critical knowledge of domain experts

Page 8: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Compilers

The good news: minor rewrites of source code often trigger optimizations

• Following simple coding guidelines can significantly increase compiler effectiveness

• The compiler knows why an optimization did not apply

− The LLVM community is actively developing an optimization reporting feature targeted for release 3.6

− The Snapdragon LLVM team is extending this feature

− An early preview of this feature is possible
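As a rough sketch of what optimization reporting looks like in practice, upstream Clang exposes remark flags (`-Rpass`, `-Rpass-missed`, `-Rpass-analysis`) that report which loops a pass transformed or skipped. Whether a given Snapdragon LLVM release accepts the same flags is an assumption here, and the file name and both functions below are hypothetical examples, not code from the presentation.

```cpp
// remarks_demo.cpp (hypothetical file name)
// With upstream Clang, vectorizer remarks can be requested like this:
//   clang++ -O3 -c remarks_demo.cpp -Rpass=loop-vectorize
//   clang++ -O3 -c remarks_demo.cpp -Rpass-missed=loop-vectorize
//   clang++ -O3 -c remarks_demo.cpp -Rpass-analysis=loop-vectorize

// A simple counted loop: a typical candidate for vectorization.
void add_one(int *A, int n) {
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

// An early exit makes the trip count data dependent, which the vectorizer
// may decline to handle; the missed-optimization remarks would report that.
int index_of_first_negative(const int *A, int n) {
  for (int i = 0; i < n; i++)
    if (A[i] < 0)
      return i;
  return -1;
}
```

The exact remark text depends on the compiler version, so none is reproduced here.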

Page 9: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Coding Guidelines for Performance

Page 10: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Sample code included in this presentation is made available subject to The Clear BSD License Copyright (c) 2014 Qualcomm Innovation Center, Inc.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of Qualcomm Innovation Center, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Page 11: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 1

void foo(int *A) {
  for (int i = 0; i < computeN(); i++)
    A[i] += 1;
}

Page 12: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 1: Make the loop trip count known

Loop:

void foo(int *A) {
  for (int i = 0; i < computeN(); i++)
    A[i] += 1;
}

computeN() needs to be evaluated on every loop iteration.

Rewrite to:

void foo(int *A) {
  int n = computeN();
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

computeN() is evaluated only once.
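To make the difference concrete, the sketch below counts how often the bound is evaluated in each form. The call counter `g_calls`, this particular body of `computeN()`, and the `count_calls` helper are hypothetical stand-ins added for the demonstration; the slides do not define them.

```cpp
// Hypothetical stand-in for the slide's computeN(): returns a fixed length
// and records how many times it was called.
static int g_calls = 0;
int computeN() {
  ++g_calls;
  return 8;
}

// Original form: the bound is re-evaluated before every iteration.
void foo_call_in_condition(int *A) {
  for (int i = 0; i < computeN(); i++)
    A[i] += 1;
}

// Rewritten form: the bound is evaluated once, so the trip count is a
// loop-invariant value the optimizer can reason about.
void foo_hoisted(int *A) {
  int n = computeN();
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

// Demonstration helper: runs one variant on a fresh buffer and returns how
// many times computeN() was called.
int count_calls(void (*variant)(int *)) {
  int data[8] = {0};
  g_calls = 0;
  variant(data);
  return g_calls;
}
```

With an 8-element bound, the first form calls `computeN()` nine times (eight passing tests plus the final failing one) and the second calls it once.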

Page 13: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 2

void foo(int *myArray, unsigned n) {
  for (unsigned i = 0; i < n; i += 2)
    myArray[i] += 1;
}

Page 14: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 2: Use signed type

Loop:

void foo(int *myArray, unsigned n) {
  for (unsigned i = 0; i < n; i += 2)
    myArray[i] += 1;
}

An unsigned type has modulo (wrap-around) semantics. Because i can wrap around (for example, when n is UINT_MAX, i += 2 never reaches n), the compiler cannot assume the loop executes a fixed number of iterations.

Rewrite to:

void foo(int *myArray, unsigned n) {
  for (int i = 0; i < n; i += 2)
    myArray[i] += 1;
}

Overflow of a signed type is undefined. The compiler assumes the loop counter never overflows.
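The wrap-around case is easier to observe with a narrow counter. The sketch below uses an 8-bit unsigned counter and an iteration cap, both of which are assumptions made for the demonstration; the same effect occurs with a 32-bit `unsigned` counter when n is UINT_MAX.

```cpp
#include <cstdint>

// Counts iterations of `for (i = 0; i < n; i += 2)` using an 8-bit unsigned
// counter, giving up after `cap` iterations. The narrow type and the cap are
// demonstration choices so that the wrap-around is cheap to observe.
int iterations_with_u8_counter(std::uint8_t n, int cap) {
  int count = 0;
  for (std::uint8_t i = 0; i < n; i += 2) {
    if (++count == cap)
      break;  // the counter wrapped, so the loop would never finish
  }
  return count;
}
```

For n = 10 the loop runs five times as expected. For n = 255 the counter goes from 254 back to 0, never reaches n, and only the cap stops the loop.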

Page 15: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 3

void foo(MyStruct *s) {
  for (int i = 0; i < s->NumElm; i++)
    s->MyArray[i] += 1;
}

Page 16: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 3: Beware of pointer aliasing

Loop:

void foo(MyStruct *s) {
  for (int i = 0; i < s->NumElm; i++)
    s->MyArray[i] += 1;
}

The programmer should not assume that the compiler will be able to hoist s->NumElm: the store to s->MyArray[i] may alias it.

Rewrite to:

void foo(MyStruct *s) {
  int n = s->NumElm;
  for (int i = 0; i < n; i++)
    s->MyArray[i] += 1;
}

The compiler knows the number of loop iterations.
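The following sketch shows why a compiler must be conservative here. It uses a hypothetical variant of the loop in which the bound is read through a pointer (`count`) rather than a struct field, so that the aliasing can be set up without undefined behavior; the function and helper names are not from the presentation.

```cpp
// Hypothetical variant of the slide's loop: the bound is read through a
// pointer, so a store to A[i] may change it.
void bump_reload(int *A, const int *count) {
  for (int i = 0; i < *count; i++)
    A[i] += 1;  // modifies *count whenever count points into A
}

// Bound copied to a local first: stores through A can no longer affect it.
void bump_hoisted(int *A, const int *count) {
  int n = *count;
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

// Demonstration helper: element 1 of the buffer doubles as the loop bound.
// Returns element 2, which is only touched if the bound grew during the loop.
int third_element_after(void (*variant)(int *, const int *)) {
  int buf[6] = {0, 2, 0, 0, 0, 0};
  variant(buf, &buf[1]);
  return buf[2];
}
```

When the bound aliases the array, the two forms produce different results, so the compiler cannot silently turn the first into the second. Writing the hoist in the source states the programmer's intent.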

Page 17: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 4

typedef struct { int **b; } S;

void foo(S *A) {
  for (int i = 0; i < 100; i++)
    A->b[i] = nullptr;
}

Page 18: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 4: Hoist complex pointer indirections

Loop:

typedef struct { int **b; } S;

void foo(S *A) {
  for (int i = 0; i < 100; i++)
    A->b[i] = nullptr;
}

A->b is evaluated on every iteration.

Rewrite to:

typedef struct { int **b; } S;

void foo(S *A) {
  int **ptr = A->b;
  for (int i = 0; i < 100; i++)
    ptr[i] = nullptr;
}

If there are more than two levels of pointer/struct indirection, hoist them outside the loop.
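A deeper chain makes the cost more visible. The sketch below applies the same rewrite to three levels of indirection; the `Scene`, `Layer`, and `Grid` types and the helper are hypothetical, chosen only to extend the slide's `S` example.

```cpp
// Hypothetical nested types, standing in for a deeper version of the slide's S.
struct Grid { int **cells; };
struct Layer { Grid *grid; };
struct Scene { Layer *layer; };

// The chain scene->layer->grid->cells may be walked again on every iteration,
// because the compiler may be unable to prove the stores leave the links intact.
void clear_cells(Scene *scene, int n) {
  for (int i = 0; i < n; i++)
    scene->layer->grid->cells[i] = nullptr;
}

// The chain is walked once; the loop body is a plain indexed store.
void clear_cells_hoisted(Scene *scene, int n) {
  int **cells = scene->layer->grid->cells;
  for (int i = 0; i < n; i++)
    cells[i] = nullptr;
}

// Demonstration helper: returns how many of 4 cells are null after `clear`.
int null_cells_after(void (*clear)(Scene *, int)) {
  int value = 7;
  int *cells[4] = {&value, &value, &value, &value};
  Grid grid = {cells};
  Layer layer = {&grid};
  Scene scene = {&layer};
  clear(&scene, 4);
  int count = 0;
  for (int i = 0; i < 4; i++)
    if (cells[i] == nullptr) count++;
  return count;
}
```

Both forms produce the same result; the hoisted one simply does not depend on the compiler proving that the pointer chain is loop invariant.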

Page 19: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 5

void foo(int *A, int *B) {
  for (int i = 0; i < 100; i++)
    A[i] += B[i];
}

Page 20: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 5: Use restrict keyword

Loop:

void foo(int *A, int *B) {
  for (int i = 0; i < 100; i++)
    A[i] += B[i];
}

The loop cannot be parallelized because the compiler has to worry about one case: A[i] pointing to B[i+1] (that is, A == B + 1), so that a store in one iteration changes the value loaded in the next.

Rewrite to:

void foo(int *__restrict A, int *__restrict B) {
  for (int i = 0; i < 100; i++)
    A[i] += B[i];
}

Tells the compiler that A and B point to separate arrays.

LLVM vectorizes this loop even without restrict. It generates run-time checks to verify that A and B do not overlap.
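The overlapping case can be reproduced directly. In the sketch below the length parameter `n` and the two helper functions are additions for the demonstration (the slide uses a fixed count of 100); `__restrict` is the compiler-extension spelling accepted by Clang and GCC in C++.

```cpp
// No restrict: must behave correctly even when A and B overlap.
void add_arrays(int *A, const int *B, int n) {
  for (int i = 0; i < n; i++)
    A[i] += B[i];
}

// __restrict promises that A and B do not overlap; calling this with
// overlapping arrays is undefined behavior.
void add_arrays_restrict(int *__restrict A, const int *__restrict B, int n) {
  for (int i = 0; i < n; i++)
    A[i] += B[i];
}

// Overlapping call: A points at B[1], so each store feeds the next load and
// the buffer becomes a running sum {1, 2, 3, 4, 5}.
int overlap_last_value() {
  int buf[5] = {1, 1, 1, 1, 1};
  add_arrays(buf + 1, buf, 4);
  return buf[4];
}

// Disjoint call: the restrict promise holds and each element is simply 1 + 1.
int disjoint_last_value() {
  int a[4] = {1, 1, 1, 1};
  int b[4] = {1, 1, 1, 1};
  add_arrays_restrict(a, b, 4);
  return a[3];
}
```

The running-sum result in the overlapping call is exactly the loop-carried dependence that blocks a straightforward vectorization, which is why the promise of no overlap matters.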

Page 21: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 6

void foo(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1) *A |= 1;
      if (i != n - 1) *A |= 2;
      if (j != 0) *A |= 4;
      if (i != 0) *A |= 8;
      A++;
    }
  }
}

Most elements of A will be set with *A | 15. The j != m - 1 and i != n - 1 tests exclude the last iteration; the j != 0 and i != 0 tests exclude the first iteration.

Page 22: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 6: Avoid complex control-flow

Loop:

void foo(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1) *A |= 1;
      if (i != n - 1) *A |= 2;
      if (j != 0) *A |= 4;
      if (i != 0) *A |= 8;
      A++;
    }
  }
}

Rewrite to:

void foo(int *A, int n, int m) {
  // Handle cases n == 1 and m == 1
  // Peel iteration when i is 0
  // Most executed loop
  for (int i = 1; i < n - 1; i++) {
    *A++ |= 11; /* iter j = 0 */
    for (int j = 1; j < m - 1; j++)
      *A++ |= 15;
    *A++ |= 14; /* iter j = m - 1 */
  }
  // Peel iteration i = n - 1
}

The first and last iterations are peeled. The remaining loop nest is the most commonly executed code.
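The slide leaves the peeled iterations and the n == 1 and m == 1 cases as comments. One possible way to fill them in is sketched below; the `or_row` helper, the function names, and the comparison function are hypothetical and are not taken from the presentation.

```cpp
#include <vector>

// Reference: the original loop nest from the slide.
void foo_reference(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1) *A |= 1;
      if (i != n - 1) *A |= 2;
      if (j != 0) *A |= 4;
      if (i != 0) *A |= 8;
      A++;
    }
  }
}

// Hypothetical helper: ORs one row of m elements and returns the advanced
// pointer. rowBits carries the bits decided by i (2 and/or 8); the first and
// last columns are peeled out of the inner loop.
static int *or_row(int *A, int m, int rowBits) {
  if (m == 1) {            // single column: neither bit 1 nor bit 4 applies
    *A++ |= rowBits;
    return A;
  }
  *A++ |= rowBits | 1;     // j = 0
  for (int j = 1; j < m - 1; j++)
    *A++ |= rowBits | 5;   // interior columns, no branches
  *A++ |= rowBits | 4;     // j = m - 1
  return A;
}

void foo_peeled(int *A, int n, int m) {
  if (n <= 0 || m <= 0) return;
  if (n == 1) {            // single row: neither bit 2 nor bit 8 applies
    or_row(A, m, 0);
    return;
  }
  A = or_row(A, m, 2);     // peeled iteration i = 0
  for (int i = 1; i < n - 1; i++)
    A = or_row(A, m, 2 | 8);  // most executed loop
  or_row(A, m, 8);         // peeled iteration i = n - 1
}

// Checks the peeled version against the reference on an n-by-m zeroed array.
bool peeled_matches_reference(int n, int m) {
  std::vector<int> expected(n * m, 0), actual(n * m, 0);
  foo_reference(expected.data(), n, m);
  foo_peeled(actual.data(), n, m);
  return expected == actual;
}
```

For an interior row the helper ORs in 11, 15, and 14, matching the constants on the slide. Checking the rewrite against the original loop, as `peeled_matches_reference` does, is a cheap guard when restructuring control flow by hand.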

Page 23: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

LLVM Optimization Pragmas

Page 24: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 7: Use pragma vectorize

Loop:

void foo(int *A, int n) {
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}

The loop has too few iterations (at most three).

Page 25: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 7: Use pragma vectorize

Loop:

void foo(int *A, int n) {
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}

The compiler often has no way to know that the trip count n % 4 is at most three. The programmer cannot assume the compiler will figure out that the loop has fewer than four iterations.

Rewrite to:

void foo(int *A, int n) {
#pragma clang loop vectorize(disable)
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}

• Beware: pragmas are often target dependent. Apply them only to the intended target

• Pragmas override command-line flags

• Pragmas will be supported in the upcoming Snapdragon LLVM release
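A portable way to write the pragma is sketched below. Guarding it with `__clang__` keeps other compilers from seeing it; that guard, the function name, and the helper are assumptions added here, not part of the presentation.

```cpp
// Increments the first n % 4 elements of A (so at most three of them).
void bump_remainder(int *A, int n) {
#if defined(__clang__)
  // Clang-specific hint; other compilers never see this line.
#pragma clang loop vectorize(disable)
#endif
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}

// Demonstration helper: sum of an 8-element zeroed buffer after the call,
// which equals the number of elements the loop touched.
int touched_elements(int n) {
  int data[8] = {0};
  bump_remainder(data, n);
  int sum = 0;
  for (int i = 0; i < 8; i++) sum += data[i];
  return sum;
}
```

The pragma changes only how the loop is compiled, not what it computes, so the function behaves identically with or without it.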

Page 26: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Guideline 7: Use pragma vectorize, Example 2

Loop:

void foo(char *A, int n) {
  n = min(14, n);
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

The compiler is unaware that there are at most 14 iterations. It will attempt to vectorize using a factor of 16 to fill the ARM/NEON registers (128 bits).

Rewrite to:

void foo(char *A, int n) {
  n = min(14, n);
#pragma clang loop vectorize_width(8)
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

The compiler will vectorize using a factor of 8. When n >= 8, vector instructions are used.
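A self-contained version of this example might look like the sketch below. It uses `std::min` in place of the slide's unqualified `min`, and the `__clang__` guard, function name, and helper are assumptions added for the demonstration.

```cpp
#include <algorithm>

// Increments at most the first 14 bytes of A.
void bump_prefix(char *A, int n) {
  n = std::min(14, n);
#if defined(__clang__)
  // Ask Clang for 8 lanes instead of the 16 it might pick for 8-bit data.
#pragma clang loop vectorize_width(8)
#endif
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

// Demonstration helper: sum of a 32-byte zeroed buffer after the call,
// which equals the number of bytes the loop touched.
int touched_bytes(int n) {
  char data[32] = {0};
  bump_prefix(data, n);
  int sum = 0;
  for (int i = 0; i < 32; i++) sum += data[i];
  return sum;
}
```

Whether a width of 8 is actually profitable depends on the target, which is the reason the presentation advises applying such pragmas only to the intended target.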

Page 27: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

LLVM Internal Flags

Page 28: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

LLVM hidden optimization flags

• The compiler uses various heuristics and optimization thresholds

− Preset depending on the optimization level

• Many optimizations are experimental and remain turned off

• Controlled by command-line compiler flags

− “clang --help-hidden” displays all available flags

• They are difficult to use

− Can significantly accelerate specific pieces of code

− Unsafe to use in general

• Typically reserved for advanced programmers and compiler developers

− In the future, compiler reporting may suggest usage of a subset of these flags

Page 29: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

Summary

• Coding guidelines can make compilers significantly more effective

− Significant speed-up

• Guidelines are only useful while the code remains readable

− Avoid obscure and complex source changes

• Use domain-expert knowledge

− LLVM-supported pragmas

• Snapdragon LLVM compiler available at Qualcomm Developer Network

Page 30: App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog

Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries, used with permission. Uplinq is a trademark of Qualcomm Incorporated, used with permission. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.

Thank you