migrating code from arm to arm64 - linux plumbers … architecture: n aarch64 is its 64-bit...
TRANSCRIPT
OutlineWriting portable code
Best practisesType promotionHow do I migrate?
ARM vs. ARM64A few definitionsComparing AArch32 and AArch64Return instructionStack pointer and zero registerNo load multiple, only pairsLDAR / STLRConditional execution exampleNEONLegacy instructions
References
2
Outline
Writing portable codeBest practisesType promotionHow do I migrate?
ARM vs. ARM64
References
3
Writing portable codeBest practises
n No assumptions about the type sizesn Magic numbersn size_t and ssize_tn printf formats (%zu, %zd, etc)n Beware of shiftsn Structure padding / alignmentn And of course: sizeof(int) != sizeof(void*)
4
Writing portable codeType promotion
n C/C++ have internal promotion rules (size and/or sign)n int + long -> longn unsigned + signed -> unsignedn If the second conversion (loss of sign) is carried out before the second (promotion
to long) then the result may be incorrect when assigned to a signed long.
n Complicated, even experienced programmers get caughtn Understand the order
5
Writing portable codeType promotion
Consider this example, in which you would expect the result -1 in a:
long a;int b;unsigned int c;
b = -2;c = 1;a = b + c;
n 32-bit: a = 0xFFFFFFFF (-1)n 64-bit: a = 0x00000000FFFFFFFF (232 − 1)
n Not what you expectn The result of the addition is converted to unsigned
before it is converted to long
Solution: cast to 64-bit before the cast to unsigned
6
Writing portable codeType promotion
long a;int b;unsigned int c;
b = -2;c = 1;a = (long) b + c;
n 32-bit: a = 0xFFFFFFFF (-1)n 64-bit: a = 0xFFFFFFFFFFFFFFFF (-1)
n Calculation is now all carried out in 64-bitarithmetic
n The conversion to signed now gives the correctresult
7
Writing portable codeHow do I migrate?
A mix of:n Recompilen Rewrite
n Better use of 64-bitn An opportunity to clean the code
8
OutlineWriting portable code
ARM vs. ARM64A few definitionsComparing AArch32 and AArch64Return instructionStack pointer and zero registerNo load multiple, only pairsLDAR / STLRConditional execution exampleNEONLegacy instructions
References9
ARM vs. ARM64A few definitions
ARMv8-A architecture:n AArch64 is its 64-bit execution state
n New A64 instruction setn AArch32 is its 32-bit execution state
n Superset of ARMv7-An Compatiblen Can run ARM®, Thumb® code
10
ARM vs. ARM64Comparing AArch32 and AArch64
n Presenting only userspacen See Rodolph Perfetta’s ”Introduction to A64” presentation
11
ARM vs. ARM64Return instruction
PC not an accessible register anymore
AArch32MOV PC, LRorPOP {PC}orBX LR
AArch64RET
12
ARM vs. ARM64Stack pointer and zero register
n Register no. 31n Zero register
n xzr or wzrn Reads as zeron A way to ignore results
n Stack pointern 16-byte aligned (configurable but Linux does it this way)n No multiple loadsn Only a few instructions will see x31 as the SP
13
ARM vs. ARM64No load multiple, only pairs
AArch32PUSH {r0, r1, r2, r3}
AArch64STP w3, w2, [sp, #-16]! // push first pair
// create space for secondSTP w1, w0, [sp, #8] // push second pair
Keep SP 16-byte aligned
14
ARM vs. ARM64LDAR / STLR
AArch32LDRSTRDMBLDRSTR
AArch64LDR ; these two accesses may be observed after the LDARSTRLDAR ; “”barrier which affects subsequent accesses onlySTR ; this access must be observed after LDAR
Similarly:LDR ; this access must be observed before STLRSTLR ; “”barrier which affects prior accesses onlyLDR ; these accesses may be observed before STLRSTR
15
ARM vs. ARM64Conditional execution example
Cint gcd(int a, int b) {
while (a != b) {if (a < b) {
a = a - b;} else {
b = b - a;}
}return a;
}
AArch32/T32gcd:
CMP r0, r1ITESUBGT r0, r0, r1SUBLT r1, r1, r0BNE gcdBX lr
AArch64gcd:
SUBS w2, w0, w1CSEL w0, w2, w0, gtCSNEG w1, w1, w2, gtBNE gcdRET
16
ARM vs. ARM64Conditional execution example
So, how do I migrate that?
Short answer:n You’re on your own, be clever
More interesting answer:n Don’t attempt direct translation
n Won’t work in a majority of casesn Even if it does, it is usually a bad idea
n Opportunity for new optimisations
17
ARM vs. ARM64NEON
n Part of the main instruction set / no longer optionaln Set the core condition flags (NZCV) rather than their own
n Easier to mix control and data flow with NEON
AArch32vadd.u16 d0, d1, d2
AArch64add v0.4h, v1.4h, v2.4h
18
ARM vs. ARM64NEON
AArch32: 16 x 128-bitregisters
AArch64: 32 x 128-bitregisters
..s3
.s2
.s1
.s0
.
d1
.
d0
.
q0
.
127
.
63
.
31
.
0..s0
.
d0
.
q0
.
127
.
63
.
31
.
0
19
ARM vs. ARM64NEON
n Intrinsics are mostly compatiblen Assembly will need a translation and/or rewrite
n Scriptsn Register aliasing can will get in the way
20
ARM vs. ARM64Legacy instructions
n SWP and SWPBn SETENDn CP15 barriersn IT, partiallyn VFP short vectorsn …n Emulated, slow, (possibly broken for some cases)
If you can avoid them, please do.
21
Outline
Writing portable code
ARM vs. ARM64
References
22
References
n Porting to ARM 64-bit, Chris Shoren ARM C Language Extensionsn ARM Architecture Reference Manueln ARMv8 Instruction Set overview
23
Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARMLimited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featuredmay be trademarks of their respective owners.
24
Q&A
25