system v application binary interface - the world's ... · intel386 architecture processor...

64
System V Application Binary Interface Intel386 Architecture Processor Supplement Version 1.1 Edited by H.J. Lu 1 , David L Kreitzer 2 , Milind Girkar 3 , Zia Ansari 4 Based on System V Application Binary Interface AMD64 Architecture Processor Supplement Edited by H.J. Lu 5 , Michael Matz 6 , Milind Girkar 7 , Jan Hubiˇ cka 8 , Andreas Jaeger 9 , Mark Mitchell 10 December 7, 2015 1 [email protected] 2 [email protected] 3 [email protected] 4 [email protected] 5 [email protected] 6 [email protected] 7 [email protected] 8 [email protected] 9 [email protected] 10 [email protected] Intel386 ABI 1.1 – December 7, 2015 – 8:57

Upload: vuongnhan

Post on 27-Jun-2018

240 views

Category:

Documents


0 download

TRANSCRIPT

System V Application Binary InterfaceIntel386 Architecture Processor Supplement

Version 1.1

Edited byH.J. Lu1, David L Kreitzer2, Milind Girkar3, Zia Ansari4

Based on

System V Application Binary Interface

AMD64 Architecture Processor Supplement

Edited by

H.J. Lu5, Michael Matz6, Milind Girkar7, Jan Hubicka8, Andreas Jaeger9, Mark Mitchell10

December 7, 2015

[email protected]@[email protected]@[email protected]@[email protected]@[email protected]

[email protected]

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Contents

1 About this Document 51.1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2 Related Information . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Low Level System Information 72.1 Machine Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 Data Representation . . . . . . . . . . . . . . . . . . . . 72.2 Function Calling Sequence . . . . . . . . . . . . . . . . . . . . . 9

2.2.1 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.2 The Stack Frame . . . . . . . . . . . . . . . . . . . . . . 102.2.3 Parameter Passing and Returning Values . . . . . . . . . . 112.2.4 Variable Argument Lists . . . . . . . . . . . . . . . . . . 16

2.3 Process Initialization . . . . . . . . . . . . . . . . . . . . . . . . 172.3.1 Initial Stack and Register State . . . . . . . . . . . . . . . 172.3.2 Thread State . . . . . . . . . . . . . . . . . . . . . . . . 202.3.3 Auxiliary Vector . . . . . . . . . . . . . . . . . . . . . . 20

2.4 DWARF Definition . . . . . . . . . . . . . . . . . . . . . . . . . 232.4.1 DWARF Release Number . . . . . . . . . . . . . . . . . 242.4.2 DWARF Register Number Mapping . . . . . . . . . . . . 24

2.5 Stack Unwind Algorithm . . . . . . . . . . . . . . . . . . . . . . 24

3 Object Files 283.1 Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.1.1 Special Sections . . . . . . . . . . . . . . . . . . . . . . 283.1.2 EH_FRAME sections . . . . . . . . . . . . . . . . . . . 28

3.2 Symbol Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.3 Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.3.1 Relocation Types . . . . . . . . . . . . . . . . . . . . . . 34

1

Intel386 ABI 1.1 – December 7, 2015 – 8:57

4 Libraries 394.1 Unwind Library Interface . . . . . . . . . . . . . . . . . . . . . . 39

4.1.1 Exception Handler Framework . . . . . . . . . . . . . . . 404.1.2 Data Structures . . . . . . . . . . . . . . . . . . . . . . . 424.1.3 Throwing an Exception . . . . . . . . . . . . . . . . . . . 454.1.4 Exception Object Management . . . . . . . . . . . . . . . 484.1.5 Context Management . . . . . . . . . . . . . . . . . . . . 484.1.6 Personality Routine . . . . . . . . . . . . . . . . . . . . . 50

5 Conventions 555.1 C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6 Intel MPX Extension 576.1 Parameter Passing and Returning of Values . . . . . . . . . . . . 57

6.1.1 Bounds Passing . . . . . . . . . . . . . . . . . . . . . . . 576.1.2 Returning of Bounds . . . . . . . . . . . . . . . . . . . . 58

A Linker Optimization 59A.1 Combine GOTPLT and GOT Slots . . . . . . . . . . . . . . . . . 59A.2 Optimize R_386_GOT32X Relocation . . . . . . . . . . . . . . . 60

2

Intel386 ABI 1.1 – December 7, 2015 – 8:57

List of Tables

2.1 Scalar Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Stack Frame with Base Pointer . . . . . . . . . . . . . . . . . . . 112.3 Register Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.4 Return Value Locations for Fundamental Data Types . . . . . . . 142.5 Parameter Passing Example . . . . . . . . . . . . . . . . . . . . . 152.6 Register Allocation for Parameter Passing Example . . . . . . . . 152.7 Stack Layout at the Call . . . . . . . . . . . . . . . . . . . . . . . 162.8 x87 Floating-Point Control Word . . . . . . . . . . . . . . . . . . 172.9 MXCSR Status Bits . . . . . . . . . . . . . . . . . . . . . . . . . 182.10 EFLAGS Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.11 Initial Process Stack . . . . . . . . . . . . . . . . . . . . . . . . . 192.12 auxv_t Type Definition . . . . . . . . . . . . . . . . . . . . . . 202.13 Auxiliary Vector Types . . . . . . . . . . . . . . . . . . . . . . . 212.14 DWARF Register Number Mapping . . . . . . . . . . . . . . . . 252.15 Pointer Encoding Specification Byte . . . . . . . . . . . . . . . . 26

3.1 Special sections . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2 Common Information Entry (CIE) . . . . . . . . . . . . . . . . . 303.3 CIE Augmentation Section Content . . . . . . . . . . . . . . . . 313.4 Frame Descriptor Entry (FDE) . . . . . . . . . . . . . . . . . . . 323.5 FDE Augmentation Section Content . . . . . . . . . . . . . . . . 333.6 Relocation Types . . . . . . . . . . . . . . . . . . . . . . . . . . 36

A.1 Call, Jmp and Mov Conversion . . . . . . . . . . . . . . . . . . . 60A.2 Test and Binop Conversion . . . . . . . . . . . . . . . . . . . . . 61

3

Intel386 ABI 1.1 – December 7, 2015 – 8:57

List of Figures

3.1 Relocatable Fields . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.1 Bound Register Usage . . . . . . . . . . . . . . . . . . . . . . . 57

A.1 Procedure Linkage Table Entry Via GOTPLT Slot . . . . . . . . . 59A.2 Procedure Linkage Table Entry Via GOT Slot . . . . . . . . . . . 60

Revision History1.1 — 2015-12-07 Add AVX-512 support. Add linker optimization to combine

GOTPLT and GOT slots. Add R_386_GOT32X relocation and linker opti-mization. Add FS/GS Base addresses to DWARF register number mapping.Add Intel MPX support.

1.0 — 2015-02-03 Reformat table of Returning Values.

0.1 — 2015-01-19 Initial release.

4

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 1

About this Document

This document is a supplement to the existing Intel386 System V Application Bi-nary Interface (ABI) document available at http://www.sco.com/developers/devspecs/abi386-4.pdf, which describes the Linux IA-32 ABI for proces-sors compatible with the Intel386 Architecture.

Intel processors released after the Pentium processors (Pentium 4, Intel Core,and later), have introduced new architecture features, particularly new registersand corresponding instructions to operate on the registers, like the MMX, IntelSSE(1-4), and Intel AVX instruction set extensions. The C/C++ programminglanguages have evolved to allow programmers to use new data types (for example,__m64, __m128, and __m256). Many compilers (including the Intel compilerand GCC) have supported these data types for some time. Other features in tools(for example, the decimal floating point types, 64-bit integers, exception handling,and so on) have also been developed since the original ABI was written.

This document describes the conventions and constraints on the implementa-tion of these new features for interoperability between various tools.

1.1 ScopeThis document describes the conventions on the new C/C++ language types (in-cluding alignment and parameter passing conventions), the relocation symbolsin the object binary, and the exception handling mechanism for Intel386 architec-ture. Some of this work has been discussed before http://groups.google.com/group/ia32-abi or http://www.akkadia.org/drepper/tls.pdf. The C++ object model that is expected to be followed is described in http:

5

Intel386 ABI 1.1 – December 7, 2015 – 8:57

//mentorembedded.github.io/cxx-abi/. In particular, this documentspecifies the information that compilers have to generate and the library routinesthat do the frame unwinding for exception handling.

1.2 Related InformationLinks to useful documents:

• System V Application Binary Interface, Intel386TM Architecture ProcessorSupplement Fourth Edition: http://www.sco.com/developers/devspecs/abi386-4.pdf

• System V Application Binary Interface, AMD64 Architecture ProcessorSupplement, Draft Version 0.99.6: http://www.x86-64.org/documentation/abi.pdf

• Discussion of Intel processor extensions: http://groups.google.com/group/ia32-abi

• ELF Handling of Thread-Local Storage: http://www.akkadia.org/drepper/tls.pdf

• Thread-Local Storage Descriptors for IA32 and AMD64/EM64T: http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt

• Itanium C++ ABI, Revised March 20, 2001: http://mentorembedded.github.io/cxx-abi/

6

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 2

Low Level System Information

This section describes the low-level system information for the Intel386 SystemV ABI.

2.1 Machine InterfaceThe Intel386 processor architecture and data representation are covered in thissection.

2.1.1 Data RepresentationWithin this specification, the term byte refers to a 8-bit object, the term twobyterefers to a 16-bit object, the term fourbyte refers to a 32-bit object, the term eight-byte refers to a 64-bit object, and the term sixteenbyte refers to a 128-bit object.1

Fundamental Types

Table 2.1 shows the correspondence between ISO C scalar types and the proces-sor scalar types. __float80, __float128, __m64, __m128, __m256 and__m512 types are optional.

1The Intel386 ABI uses the term halfword for a 16-bit object, the term word for a 32-bit object,the term doubleword for a 64-bit object. But most IA-32 processor specific documentation definea word as a 16-bit object, a doubleword as a 32-bit object, a quadword as a 64-bit object and adouble quadword as a 128-bit object.

7

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.1: Scalar TypesAlignment Intel386

Type C sizeof (bytes) Architecture_Bool† 1 1 booleanchar 1 1 signed bytesigned charunsigned char 1 1 unsigned byteshort 2 2 signed twobytesigned shortunsigned short 2 2 unsigned twobyteint 4 4 signed fourbyte

Integral signed intenum†††

unsigned int 4 4 unsigned fourbytelong 4 4 signed fourbytesigned longunsigned long 4 4 unsigned fourbytelong long 8 4 signed eightbytesigned long longunsigned long long 8 4 unsigned eightbyte

Pointer any-type * 4 4 unsigned fourbyteany-type (*)()

Floating- float 4 4 single (IEEE-754)point double 8 4 double (IEEE-754)

long double††††

__float80†† 12 4 80-bit extended (IEEE-754)long double††††

__float128†† 16 16 128-bit extended (IEEE-754)Complex _Complex float 8 4 complex single (IEEE-754)Floating- _Complex double 16 4 complex double (IEEE-754)point _Complex long double††††

_Complex __float80†† 24 4 complex 80-bit extended (IEEE-754)_Complex long double††††

_Complex __float128†† 32 16 complex 128-bit extended (IEEE-754)Decimal- _Decimal32 4 4 32bit BID (IEEE-754R)floating- _Decimal64 8 8 64bit BID (IEEE-754R)point _Decimal128 16 16 128bit BID (IEEE-754R)Packed __m64†† 8 8 MMX and 3DNow!

__m128†† 16 16 SSE and SSE-2__m256†† 32 32 AVX__m512†† 64 64 AVX-512

† This type is called bool in C++.†† These types are optional.††† C++ and some implementations of C permit enums larger than an int. The underlyingtype is bumped to an unsigned int.†††† The long double type is 64-bit, the same as the double type, on the AndroidTM

platform. More information on the AndroidTM platform is available from http://www.android.com/.

8

Intel386 ABI 1.1 – December 7, 2015 – 8:57

The 128-bit floating-point type uses a 15-bit exponent, a 113-bit mantissa (thehigh order significant bit is implicit) and an exponent bias of 16383.2

The 80-bit floating-point type uses a 15 bit exponent, a 64-bit mantissa withan explicit high order significant bit and an exponent bias of 16383.3

A null pointer (for all types) has the value zero.The type size_t is defined as unsigned int.Booleans, when stored in a memory object, are stored as single byte objects the

value of which is always 0 (false) or 1 (true). When stored in integer registers(except for passing as arguments), all 4 bytes of the register are significant; anynonzero value is considered true.

The Intel386 architecture in general does not require all data accesses to beproperly aligned. Misaligned data accesses may be slower than aligned accessesbut otherwise behave identically. The only exceptions are that __float128,_Complex __float128, _Decimal128, __m128, __m256 and __m512must always be aligned properly.

Structures and Unions

Structures and unions assume the alignment of their most strictly aligned compo-nent. Each member is assigned to the lowest available offset with the appropriatealignment. The size of any object is always a multiple of the object‘s alignment.

Structure and union objects can require padding to meet size and alignmentconstraints. The contents of any padding is undefined.

2.2 Function Calling SequenceThis section describes the standard function calling sequence, including stackframe layout, register usage, parameter passing and so on.

The standard calling sequence requirements apply only to global functions.Local functions that are not reachable from other compilation units may use dif-ferent conventions. Nevertheless, it is recommended that all functions use thestandard calling sequence when possible.

2Initial implementations of the Intel386 architecture are expected to support operations on the128-bit floating-point type only via software emulation.

3This type is the x87 double extended precision data type.

9

Intel386 ABI 1.1 – December 7, 2015 – 8:57

2.2.1 RegistersThe Intel386 architecture provides 8 general purpose 32-bit registers. In additionthe architecture provides 8 SSE registers, each 128 bits wide and 8 x87 floatingpoint registers, each 80 bits wide. Each of the x87 floating point registers may bereferred to in MMX mode as a 64-bit register. All of these registers are global toall procedures active for a given thread.

Intel AVX (Advanced Vector Extensions) provides 8 256-bit wide AVX regis-ters (%ymm0 - %ymm7). The lower 128-bits of %ymm0 - %ymm7 are aliased to therespective 128b-bit SSE registers (%xmm0 - %xmm7). Intel AVX-512 provides 8512-bit wide SIMD registers (%zmm0 - %zmm7). The lower 128-bits of %zmm0- %zmm7 are aliased to the respective 128b-bit SSE registers (%xmm0 - %xmm7).The lower 256-bits of %zmm0 - %zmm7 are aliased to the respective 256-bit AVXregisters (%ymm0 - %ymm7). For purposes of parameter passing and function re-turn, %xmmN, %ymmN and %zmmN refer to the same register. Only one of themcan be used at the same time. We use vector register to refer to either SSE, AVXor AVX-512 register. In addition, Intel AVX-512 also provides 8 vector maskregisters (%k0 - %k7), each 64-bit wide.

The CPU shall be in x87 mode upon entry to a function. Therefore, everyfunction that uses the MMX registers is required to issue an emms or femmsinstruction after using MMX registers, before returning or calling another function.4 The direction flag DF in the %EFLAGS register must be clear (set to “forward”direction) on function entry and return. Other user flags have no specified role inthe standard calling sequence and are not preserved across calls.

The control bits of the MXCSR register are callee-saved (preserved acrosscalls), while the status bits are caller-saved (not preserved). The x87 status wordregister is caller-saved, whereas the x87 control word is callee-saved.

2.2.2 The Stack FrameIn addition to registers, each function has a frame on the run-time stack. This stackgrows downwards from high addresses. Table 2.2 shows the stack organization.

The end of the input argument area shall be aligned on a 16 (32 or 64, if__m256 or __m512 is passed on stack) byte boundary. In other words, the value(%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to

4All x87 registers are caller-saved, so callees that make use of the MMX registers may use thefaster femms instruction.

10

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.2: Stack Frame with Base Pointer

Position Contents Frame4n+8(%ebp) memory argument fourbyte n

. . . Previous8(%ebp) memory argument fourbyte 04(%ebp) return address0(%ebp) previous %ebp value-4(%ebp) unspecified Current

. . .0(%esp) variable size

the function entry point. The stack pointer, %esp, always points to the end of thelatest allocated stack frame. 5

2.2.3 Parameter Passing and Returning ValuesAfter the argument values have been computed, they are placed either in registersor pushed on the stack.

Passing Parameters

Most parameters are passed on the stack. Parameters are pushed onto the stack inreverse order - the last argument in the parameter list has the highest address, thatis, it is stored farthest away from the stack pointer at the time of the call.

Padding may be needed to increase the size of each parameter to enforce align-ment according to the values in Table 2.1. There is an exception for __m64 and_Decimal64, which are treated as having an alignment of four for the purposesof parameter passing. Additional padding may be necessary to ensure that thebottom of the parameter block (closest to the stack pointer) is at an address whichis 0 mod 16, to guarantee proper alignment to the callee.

The exceptions to parameters passed on stack are as follows:

5The conventional use of %ebp as a frame pointer for the stack frame may be avoided by using%esp (the stack pointer) to index into the stack frame. This technique saves two instructions inthe prologue and epilogue and makes one additional general-purpose register (%ebp) available.

11

Intel386 ABI 1.1 – December 7, 2015 – 8:57

• The first three parameters of type __m64 are passed in %mm0, %mm1, and%mm2.

• The first three parameters of type __m128 are passed in %xmm0, %xmm1,and %xmm2.6

If parameters of type __m256 are required to be passed on the stack, the stackpointer must be aligned on a 0 mod 32 byte boundary at the time of the call.

If parameters of type __m512 are required to be passed on the stack, the stackpointer must be aligned on a 0 mod 64 byte boundary at the time of the call.

Returning Values

Table 2.4 lists the location used to return a value for each fundamental data type.Aggregate types (structs and unions) are always returned in memory.

Functions that return scalar floating-point values in registers return them on thetop of the x87 register stack, that is, %st0. It is the responsibility of the callingfunction to pop this value from the stack regardless of whether or not the valueis actually used. Failure to do so results in undefined behavior. An implicationof this requirement is that functions returning scalar floating-point values must beproperly prototyped. Again, failure to do so results in undefined behavior.

Returning Values in Memory

Some fundamental types and all aggregate types are returned in memory. Forfunctions that return a value in memory, the caller passes a pointer to the memorylocation where the called function must write the return value. This pointer ispassed to called function as an implicit first argument. The memory location mustbe properly aligned according to the rules in section 2.1.1. In addition to writ-ing the return value to the proper location, the called function is responsible forpopping the implicit pointer argument off the stack and storing it in %eax priorto returning. The calling function may choose to reference the return value via%eax after the function returns.

As an example of the register passing conventions, consider the declarationsand the function call shown in Table 2.5. The corresponding register allocation

6The SSE, AVX and AVX-512 registers share resources. Therefore, if the first __m128 pa-rameter gets assigned to %xmm0 , the first __m256/__m512 parameter after that is assigned to%ymm1/%zmm1 and not %ymm0/%zmm0.

12

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.3: Register Usage

Preserved acrossRegister Usage function calls

%eax scratch register; also used to return integer andpointer values from functions; also stores the ad-dress of a returned struct or union

No

%ebx callee-saved register; also used to hold the GOTpointer when making function calls via the PLT

Yes

%ecx scratch register No%edx scratch register; also used to return the upper

32bits of some 64bit return typesNo

%esp stack pointer Yes%ebp callee-saved register; optionally used as frame

pointerYes

%esi callee-saved register yes%edi callee-saved register yes%xmm0, %ymm0 scratch registers; also used to pass and return

__m128, __m256 parametersNo

%xmm1–%xmm2, scratch registers; also used to pass __m128, No%ymm1–%ymm2 __m256 parameters%xmm3–%xmm7, scratch registers No%ymm3–%ymm7%mm0 scratch register; also used to pass and return

__m64 parameterNo

%mm1–%mm2 used to pass __m64 parameters No%mm3–%mm7 scratch registers No%k0–%k7 scratch registers No%st0 scratch register; also used to return float,

double, long double, __float80 param-eters

No

%st1–%st7 scratch registers No%gs Reserved for system (as thread specific data reg-

ister)No

mxcsr SSE2 control and status word partialx87 SW x87 status word Nox87 CW x87 control word Yes

13

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.4: Return Value Locations for Fundamental Data TypesType C Return Value Location

_Bool %alchar The upper 24 bits of %eax are undefined. The caller must notsigned char rely on these being set in a predefined way by the calledunsigned char function.short %axsigned short The upper 16 bits of %eax are undefined. The caller must notunsigned short rely on these being set in a predefined way by the called function.int %eax

Integral signed intenumunsigned intlongsigned longunsigned longlong long %edx:%eaxsigned long long The most significant 32 bits are returned in %edx. The leastunsigned long long significant 32 bits are returned in %eax.

Pointer any-type * %eaxany-type (*)()float %st0

Floating- double %st0point long double %st0

__float80 %st0__float128 memory_Complex float %edx:%eax

The real part is returned in %eax. The imaginary part is returnedComplex in %edx.floating- _Complex double memorypoint _Complex long double memory

_Complex __float80 memory_Complex __float128 memory_Decimal32 %eax

Decimal- _Decimal64 %edx:%eaxfloating- The most significant 32 bits are returned in %edx. The leastpoint significant 32 bits are returned in %eax.

_Decimal128 memory__m64 %mm0

Packed __m128 %xmm0__m256 %ymm0__m512 %zmm0

14

Intel386 ABI 1.1 – December 7, 2015 – 8:57

is given in Table 2.6, the stack frame layout given in Table 2.7 shows the framebefore calling the function.

Table 2.5: Parameter Passing Example

typedef struct {int a, b;double d;

} structparm;structparm s;int i;__m128 v, x, y;__m256 w, z;

extern structparm func (int i, __m128 v,structparm s, __m256 w,__m128 x, __m128 y,__m256 z);

func (i, v, s, w, x, y, z);

Table 2.6: Register Allocation for Parameter Passing Example

Parameter Location before the callReturn value pointer (%esp)

i 4(%esp)v %xmm0s 8(%esp)w %ymm1x %xmm2y 32(%esp)z 64(%esp)

15

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.7: Stack Layout at the Call

Contents Lengthz 32 bytes

padding 16 bytesy 16 bytes

padding 8 bytess 16 bytesi 4 bytes

Return value pointer 4 bytes ←− %esp (32-byte aligned)

When a value of type _Bool is returned or passed in a register or on the stack,bit 0 contains the truth value and bits 1 to 7 shall be zero7.

2.2.4 Variable Argument ListsSome otherwise portable C programs depend on the argument passing scheme,implicitly assuming that all arguments are passed on the stack, and argumentsappear in increasing order on the stack. Programs that make these assumptionsnever have been portable, but they have worked on many implementations. How-ever, they do not work on the Intel386 architecture because some arguments arepassed in registers. Portable C programs must use the header file <stdarg.h>in order to handle variable argument lists.

When a function taking variable-arguments is called, all parameters are passedon the stack, including __m64, __m128 and __m256. This rule applies to bothnamed and unnamed parameters. Because parameters are passed differently de-pending on whether or not the called function takes a variable argument list, it isnecessary for such functions to be properly prototyped. Failure to do so results inundefined behavior.

7Other bits are left unspecified, hence the consumer side of those values can rely on it being 0or 1 when truncated to 8 bit.

16

Intel386 ABI 1.1 – December 7, 2015 – 8:57

2.3 Process Initialization

2.3.1 Initial Stack and Register StateSpecial Registers

The Intel386 architecture defines floating point instructions. At process startupthe two floating point units, SSE2 and x87, both have all floating-point exceptionstatus flags cleared. The status of the control words is as defined in tables 2.8 and2.9.

Table 2.8: x87 Floating-Point Control Word

Field Value NoteRC 0 Round to nearestPC 11 Double extended precisionPM 1 Precision maskedUM 1 Underflow maskedOM 1 Overflow maskedZM 1 Zero divide maskedDM 1 De-normal operand maskedIM 1 Invalid operation masked

17

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.9: MXCSR Status Bits

Field Value NoteFZ 0 Do not flush to zeroRC 0 Round to nearestPM 1 Precision maskedUM 1 Underflow maskedOM 1 Overflow maskedZM 1 Zero divide maskedDM 1 De-normal operand maskedIM 1 Invalid operation maskedDAZ 0 De-normals are not zero

The EFLAGS register contains the system flags, such as the direction flag andthe carry flag. The low 16 bits (FLAGS portion) of EFLAGS are accessible byapplication software. The state of them at process initialization is shown in ta-ble 2.10.

Table 2.10: EFLAGS Bits

Field Value NoteDF 0 Direction forwardCF 0 No carryPF 0 Even parityAF 0 No auxiliary carryZF 0 No zero resultSF 0 Unsigned resultOF 0 No overflow occurred

Stack State

This section describes the machine state that exec (BA_OS) creates for newprocesses. Various language implementations transform this initial program stateto the state required by the language standard.

18

Intel386 ABI 1.1 – December 7, 2015 – 8:57

For example, a C program begins executing at a function named main de-clared as:

extern int main ( int argc , char *argv[ ] , char* envp[ ] );

where

argc is a non-negative argument count

argv is an array of argument strings, with argv[argc] == 0

envp is an array of environment strings, terminated by a null pointer.

When main() returns its value is passed to exit() and if that has beenover-ridden and returns, _exit() (which must be immune to user interposition).

The initial state of the process stack, i.e. when _start is called is shown intable 2.11.

Table 2.11: Initial Process Stack

Purpose Start Address LengthUnspecified High AddressesInformation block, including argu-ment strings, environment strings,auxiliary information ...

varies

UnspecifiedNull auxiliary vector entry 1 fourbyteAuxiliary vector entries ... 2 fourbytes each0 fourbyteEnvironment pointers ... 1 fourbyte each0 4+4*argc+%esp fourbyteArgument pointers 4+%esp argc fourbytesArgument count %esp fourbyteUndefined Low Addresses

Argument strings, environment strings, and the auxiliary information appearin no specific order within the information block and they need not be compactlyallocated.

Only the registers listed below have specified values at process entry:

19

Intel386 ABI 1.1 – December 7, 2015 – 8:57

%ebp The content of this register is unspecified at process initialization time,but the user code should mark the deepest stack frame by setting the framepointer to zero.

%esp The stack pointer holds the address of the byte with lowest address whichis part of the stack. It is guaranteed to be 16-byte aligned at process entry.

%edx a function pointer that the application should register with atexit (BA_OS).

It is unspecified whether the data and stack segments are initially mapped withexecute permissions or not. Applications which need to execute code on the stackor data segments should take proper precautions, e.g., by calling mprotect().

2.3.2 Thread StateNew threads inherit the floating-point state of the parent thread and the state isprivate to the thread thereafter.

2.3.3 Auxiliary VectorThe auxiliary vector is an array of the following structures (ref. table 2.12), inter-preted according to the a_type member.

Table 2.12: auxv_t Type Definition

typedef struct{

int a_type;union {

long a_val;void *a_ptr;void (*a_fnc)();

} a_un;} auxv_t;

The Intel386 ABI uses the auxiliary vector types defined in table 2.13.

20

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.13: Auxiliary Vector Types

Name Value a_unAT_NULL 0 ignoredAT_IGNORE 1 ignoredAT_EXECFD 2 a_valAT_PHDR 3 a_ptrAT_PHENT 4 a_valAT_PHNUM 5 a_valAT_PAGESZ 6 a_valAT_BASE 7 a_ptrAT_FLAGS 8 a_valAT_ENTRY 9 a_ptrAT_NOTELF 10 a_valAT_UID 11 a_valAT_EUID 12 a_valAT_GID 13 a_valAT_EGID 14 a_valAT_PLATFORM 15 a_ptrAT_HWCAP 16 a_valAT_CLKTCK 17 a_valAT_SECURE 23 a_valAT_BASE_PLATFORM 24 a_ptrAT_RANDOM 25 a_ptrAT_HWCAP2 26 a_valAT_EXECFN 31 a_ptr

AT_NULL The auxiliary vector has no fixed length; instead its last entry’s a_typemember has this value.

AT_IGNORE This type indicates the entry has no meaning. The correspondingvalue of a_un is undefined.

AT_EXECFD At process creation the system may pass control to an interpreterprogram. When this happens, the system places either an entry of typeAT_EXECFD or one of type AT_PHDR in the auxiliary vector. The entry

21

Intel386 ABI 1.1 – December 7, 2015 – 8:57

for type AT_EXECFD uses the a_val member to contain a file descriptoropen to read the application program’s object file.

AT_PHDR The system may create the memory image of the application programbefore passing control to the interpreter program. When this happens, thea_ptr member of the AT_PHDR entry tells the interpreter where to findthe program header table in the memory image.

AT_PHENT The a_val member of this entry holds the size, in bytes, of oneentry in the program header table to which the AT_PHDR entry points.

AT_PHNUM The a_val member of this entry holds the number of entries inthe program header table to which the AT_PHDR entry points.

AT_PAGESZ If present, this entry’s a_val member gives the system page size,in bytes.

AT_BASE The a_ptr member of this entry holds the base address at which theinterpreter program was loaded into memory. See “Program Header” in theSystem V ABI for more information about the base address.

AT_FLAGS If present, the a_val member of this entry holds one-bit flags. Bitswith undefined semantics are set to zero.

AT_ENTRY The a_ptr member of this entry holds the entry point of the appli-cation program to which the interpreter program should transfer control.

AT_NOTELF The a_val member of this entry is non-zero if the program is inanother format than ELF.

AT_UID The a_val member of this entry holds the real user id of the process.

AT_EUID The a_val member of this entry holds the effective user id of theprocess.

AT_GID The a_val member of this entry holds the real group id of the process.

AT_EGID The a_val member of this entry holds the effective group id of theprocess.

AT_PLATFORM The a_ptr member of this entry points to a string containingthe platform name.

22

Intel386 ABI 1.1 – December 7, 2015 – 8:57

AT_HWCAP The a_val member of this entry contains an bitmask of CPUfeatures. It mask to the value returned by CPUID 1.EDX.

AT_CLKTCK The a_valmember of this entry contains the frequency at whichtimes() increments.

AT_SECURE The a_val member of this entry contains one if the program isin secure mode (for example started with suid). Otherwise zero.

AT_BASE_PLATFORM The a_ptr member of this entry points to a stringidentifying the base architecture platform (which may be different from theplatform).

AT_RANDOM The a_ptrmember of this entry points to 16 securely generatedrandom bytes.

AT_HWCAP2 The a_valmember of this entry contains the extended hardwarefeature mask. Currently it is 0, but may contain additional feature bits in thefuture.

AT_EXECFN The a_ptr member of this entry is a pointer to the file name ofthe executed program.

2.4 DWARF DefinitionThis section8 defines the Debug With Arbitrary Record Format (DWARF) debug-ging format for the Intel386 processor family. The Intel386 ABI does not define adebug format. However, all systems that do implement DWARF on Intel386 shalluse the following definitions.

DWARF is a specification developed for symbolic, source-level debugging.The debugging information format does not favor the design of any compiler ordebugger. For more information on DWARF, see DWARF Debugging FormatStandard, available at: http://www.dwarfstd.org/.

8This section is structured in a way similar to the PowerPC psABI

23

Intel386 ABI 1.1 – December 7, 2015 – 8:57

2.4.1 DWARF Release NumberThe DWARF definition requires some machine-specific definitions. The registernumber mapping needs to be specified for the Intel386 registers. In addition, start-ing with version 3 the DWARF specification requires processor-specific addressclass codes to be defined.

2.4.2 DWARF Register Number MappingTable 2.149 outlines the register number mapping for the Intel386 processor fam-ily.10

2.5 Stack Unwind AlgorithmThe stack frames are not self descriptive and where stack unwinding is desirable(such as for exception handling) additional unwind information needs to be gen-erated. The information is stored in an allocatable section .eh_frame whoseformat is identical to .debug_frame defined by the DWARF debug informa-tion standard, see DWARF Debugging Information Format, with the followingextensions:

Position independence In order to avoid load time relocations for position inde-pendent code, the FDE CIE offset pointer should be stored relative to thestart of CIE table entry. Frames using this extension of the DWARF stan-dard must set the CIE identifier tag to 1.

Outgoing arguments area delta To maintain the size of the temporarily allo-cated outgoing arguments area present on the end of the stack (when us-ing push instructions), operation GNU_ARGS_SIZE (0x2e) can be used.This operation takes a single uleb128 argument specifying the currentsize. This information is used to adjust the stack frame when jumping intothe exception handler of the function after unwinding the stack frame. Ad-ditionally the CIE Augmentation shall contain an exact specification of theencoding used. It is recommended to use a PC relative encoding wheneverpossible and adjust the size according to the code model used.

9The table defines Return Address to have a register number, even though the address is storedin 0(%esp) and not in a physical register.

10This document does not define mappings for privileged registers.

24

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.14: DWARF Register Number Mapping

Register Name Number AbbreviationGeneral Purpose Register EAX 0 %eaxGeneral Purpose Register ECX 1 %ecxGeneral Purpose Register EDX 2 %edxGeneral Purpose Register EBX 3 %ebxStack Pointer Register ESP 4 %espFrame Pointer Register EBP 5 %ebpGeneral Purpose Register ESI 6 %esiGeneral Purpose Register EDI 7 %ediReturn Address RA 8Flag Register 9 %EFLAGSReserved 10Floating Point Registers 0–7 11-18 %st0–%st7Reserved 19-20Vector Registers 0–7 21-28 %xmm0–%xmm7MMX Registers 0–7 29-36 %mm0–%mm7Media Control and Status 39 %mxcsrSegment Register ES 40 %esSegment Register CS 41 %csSegment Register SS 42 %ssSegment Register DS 43 %dsSegment Register FS 44 %fsSegment Register GS 45 %gsReserved 46-47Task Register 48 %trLDT Register 49 %ldtrReserved 50-92FS Base address 93 %fs.baseGS Base address 94 %gs.base

25

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 2.15: Pointer Encoding Specification Byte

Mask Meaning0x1 Values are stored as uleb128 or sleb128 type (according to flag 0x8)0x2 Values are stored as 2 bytes wide integers (udata2 or sdata2)0x3 Values are stored as 4 bytes wide integers (udata4 or sdata4)0x4 Values are stored as 8 bytes wide integers (udata8 or sdata8)0x8 Values are signed

0x10 Values are PC relative0x20 Values are text section relative0x30 Values are data section relative0x40 Values are relative to the start of function

CIE Augmentations: The augmentation field is formated according to the aug-mentation field formating string stored in the CIE header.

The string may contain the following characters:

z Indicates that a uleb128 is present determining the size of the augmen-tation section.

L Indicates the encoding (and thus presence) of an LSDA pointer in theFDE augmentation.The data filed consist of single byte specifying the way pointers areencoded. It is a mask of the values specified by the table 2.15.The default DWARF pointer encoding (direct 4-byte absolute pointers)is represented by value 0.

R Indicates a non-default pointer encoding for FDE code pointers. Theformating is represented by a single byte in the same way as in the ‘L’command.

P Indicates the presence and an encoding of a language personality routinein the CIE augmentation. The encoding is represented by a single bytein the same way as in the ’L’ command followed by a pointer to thepersonality function encoded by the specified encoding.

When the augmentation is present, the first command must always be ‘z’ toallow easy skipping of the information.

26

Intel386 ABI 1.1 – December 7, 2015 – 8:57

In order to simplify manipulation of the unwind tables, the runtime library pro-vide higher level API to stack unwinding mechanism, for details see section 4.1.

27

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 3

Object Files

3.1 Sections

3.1.1 Special Sections

Table 3.1: Special sections

Name Type Attributes.eh_frame SHT_PROGBITS SHF_ALLOC

.eh_frame This section holds the unwind function table. The contents are de-scribed in Section 3.1.2 of this document.

3.1.2 EH_FRAME sectionsThe call frame information needed for unwinding the stack is output into one sec-tion named .eh_frame. An .eh_frame section consists of one or more sub-sections. Each subsection contains a CIE (Common Information Entry) followedby varying number of FDEs (Frame Descriptor Entry). A FDE corresponds to anexplicit or compiler generated function in a compilation unit, all FDEs can accessthe CIE that begins their subsection for data. If the code for a function is not onecontiguous block, there will be a separate FDE for each contiguous sub-piece.

28

Intel386 ABI 1.1 – December 7, 2015 – 8:57

If an object file contains C++ template instantiations there shall be a separateCIE immediately preceding each FDE corresponding to an instantiation.

Using the preferred encoding specified below, the .eh_frame section can beentirely resolved at link time and thus can become part of the text segment.

EH_PE encoding below refers to the pointer encoding as specified in the en-hanced LSB Chapter 7 for Eh_Frame_Hdr.

29

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 3.2: Common Information Entry (CIE)

Field Length (byte) DescriptionLength 4 Length of the CIE (not including this 4-

byte field)CIE id 4 Value 0 for .eh_frame (used to distin-

guish CIEs and FDEs when scanning thesection)

Version 1 Value One (1)CIE Augmenta-tion String

string Null-terminated string with legal valuesbeing "" or ’z’ optionally followed by sin-gle occurrances of ’P’, ’L’, or ’R’ in anyorder. The presence of character(s) in thestring dictates the content of field 8, theAugmentation Section. Each character hasone or two associated operands in the AS(see table 3.3 for which ones). Operandorder depends on position in the string (’z’must be first).

Code Align Fac-tor

uleb128 To be multiplied with the "Advance Lo-cation" instructions in the Call Frame In-structions

Data Align Fac-tor

sleb128 To be multiplied with all offsets in the CallFrame Instructions

Ret Address Reg 1/uleb128 A "virtual" register representation of thereturn address. In Dwarf V2, this is a byte,otherwise it is uleb128. It is a byte in gcc3.3.x

Optional CIEAugmentationSection

varying Present if Augmentation String in Aug-mentation Section field 4 is not 0. See ta-ble 3.3 for the content.

Optional CallFrame Instruc-tions

varying

30

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 3.3: CIE Augmentation Section Content

Char Operands Length (byte) Descriptionz size uleb128 Length of the remainder of the Augmen-

tation SectionP personality_enc 1 Encoding specifier - preferred value is a

pc-relative, signed 4-bytepersonalityroutine

(encoded) Encoded pointer to personality routine(actually to the PLT entry for the per-sonality routine)

R code_enc 1 Non-default encoding for thecode-pointers (FDE membersinitial_location andaddress_range and the operand forDW_CFA_set_loc) - preferred valueis pc-relative, signed 4-byte

L lsda_enc 1 FDE augmentation bodies may containLSDA pointers. If so they are encodedas specified here - preferred value is pc-relative, signed 4-byte possibly indirectthru a GOT entry

31

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 3.4: Frame Descriptor Entry (FDE)

Field Length (byte) DescriptionLength 4 Length of the FDE (not including this 4-

byte field)CIE pointer 4 Distance from this field to the nearest pre-

ceding CIE (the value is subtracted fromthe current address). This value can neverbe zero and thus can be used to distin-guish CIE’s and FDE’s when scanning the.eh_frame section

Initial Location var Reference to the function code correspond-ing to this FDE. If ’R’ is missing fromthe CIE Augmentation String, the field isan 8-byte absolute pointer. Otherwise, thecorresponding EH_PE encoding in the CIEAugmentation Section is used to interpretthe reference

Address Range var Size of the function code corresponding tothis FDE. If ’R’ is missing from the CIEAugmentation String, the field is an 8-byteunsigned number. Otherwise, the size isdetermined by the corresponding EH_PEencoding in the CIE Augmentation Section(the value is always absolute)

Optional FDEAugmentationSection

var Present if CIE Augmentation String is non-empty. See table 3.5 for the content.

Optional CallFrame Instruc-tions

var

32

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 3.5: FDE Augmentation Section Content

Char Operands Length (byte) Descriptionz length uleb128 Length of the remainder of the Augmen-

tation SectionL LSDA var LSDA pointer, encoded in the format

specified by the corresponding operandin the CIE’s augmentation body. (onlypresent if length > 0).

The existence and size of the optional call frame instruction area must be com-puted based on the overall size and the offset reached while scanning the precedingfields of the CIE or FDE.

The overall size of a .eh_frame section is given in the ELF section header.The only way to determine the number of entries is to scan the section until theend, counting entries as they are encountered.

3.2 Symbol TableThe STT_GNU_IFUNC 1 symbol type is optional. It is the same as STT_FUNCexcept that it always points to a function or piece of executable code which takesno arguments and returns a function pointer. If an STT_GNU_IFUNC symbolis referred to by a relocation, then evaluation of that relocation is delayed untilload-time. The value used in the relocation is the function pointer returned by aninvocation of the STT_GNU_IFUNC symbol.

The purpose of the STT_GNU_IFUNC symbol type is to allow the run-time toselect between multiple versions of the implementation of a specific function. Theselection made in general will take the currently available hardware into accountand select the most appropriate version.

1It is specified in ifunc.txt at http://sites.google.com/site/x32abi/documents

33

Intel386 ABI 1.1 – December 7, 2015 – 8:57

3.3 Relocation

3.3.1 Relocation TypesFigure 3.3.1 shows the allowed relocatable fields.

Figure 3.1: Relocatable Fields

7 word8 0

15 word16 0

31 word32 0

word8 This specifies a 8-bit field occupying 1 byte.word16 This specifies a 16-bit field occupying 2 bytes with arbitrary

byte alignment. These values use the same byte order asother word values in the Intel386 architecture.

word32 This specifies a 32-bit field occupying 4 bytes with arbitrarybyte alignment. These values use the same byte order asother word values in the Intel386 architecture.

The following notations are used for specifying relocations in table 3.6:

A Represents the addend used to compute the value of the relocatable field.

B Represents the base address at which a shared object has been loaded into mem-ory during execution. Generally, a shared object is built with a 0 base virtualaddress, but the execution address will be different.

34

Intel386 ABI 1.1 – December 7, 2015 – 8:57

G Represents the offset into the global offset table at which the relocation entry’ssymbol will reside during execution.

GOT Represents the address of the global offset table.

L Represents the place (section offset or address) of the Procedure Linkage Tableentry for a symbol.

P Represents the place (section offset or address) of the storage unit being relo-cated (computed using r_offset).

S Represents the value of the symbol whose index resides in the relocation entry.

Z Represents the size of the symbol whose index resides in the relocation entry.

35

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Table 3.6: Relocation Types

Name Value Field CalculationR_386_NONE 0 none noneR_386_32 1 word32 S + AR_386_PC32 2 word32 S + A - PR_386_GOT32 3 word32 G + A - GOTR_386_PLT32 4 word32 L + A - PR_386_COPY 5 none noneR_386_GLOB_DAT 6 word32 SR_386_JUMP_SLOT 7 word32 SR_386_RELATIVE 8 word32 B + AR_386_GOTOFF 9 word32 S + A - GOTR_386_GOTPC 10 word32 GOT + A - PR_386_TLS_TPOFF 14 word32R_386_TLS_IE 15 word32R_386_TLS_GOTIE 16 word32R_386_TLS_LE 17 word32R_386_TLS_GD 18 word32R_386_TLS_LDM 19 word32R_386_16 20 word16 S + AR_386_PC16 21 word16 S + A - PR_386_8 22 word8 S + AR_386_PC8 23 word8 S + A - PR_386_TLS_GD_32 24 word32R_386_TLS_GD_PUSH 25 word32R_386_TLS_GD_CALL 26 word32R_386_TLS_GD_POP 27 word32R_386_TLS_LDM_32 28 word32R_386_TLS_LDM_PUSH 29 word32R_386_TLS_LDM_CALL 30 word32R_386_TLS_LDM_POP 31 word32R_386_TLS_LDO_32 32 word32R_386_TLS_IE_32 33 word32R_386_TLS_LE_32 34 word32R_386_TLS_DTPMOD32 35 word32R_386_TLS_DTPOFF32 36 word32R_386_TLS_TPOFF32 37 word32R_386_SIZE32 38 word32 Z + AR_386_TLS_GOTDESC 39 word32R_386_TLS_DESC_CALL 40 none noneR_386_TLS_DESC 41 word32R_386_IRELATIVE 42 word32 indirect (B + A)R_386_GOT32X 43 word32 G + A - GOT / G + A†

† Used without base register when position-independent code is disabled.

The R_386_GOT32X relocation can be used to compute the address of thesymbol’s global offset table entry without base register when position-independentcode is disabled. For name@GOT in:

36

Intel386 ABI 1.1 – December 7, 2015 – 8:57

call *name@GOT(%reg)jmp *name@GOT(%reg)mov name@GOT(%reg1), %reg2test %reg1, name@GOT(%reg2)binop name@GOT(%reg1), %reg2

as well as

call *name@GOTjmp *name@GOTmov name@GOT, %regtest %reg, name@GOTbinop name@GOT, %reg

where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions2,the R_386_GOT32X relocation should be generated, instead of the R_386_GOT32relocation.

A program or object file using R_386_8, R_386_16, R_386_PC16 orR_386_PC8 relocations is not conformant to this ABI, these relocations are onlyadded for documentation purposes. The R_386_16, and R_386_8 relocationstruncate the computed value to 16-bits and 8-bits respectively.

The relocations R_386_TLS_TPOFF, R_386_TLS_IE,R_386_TLS_GOTIE, R_386_TLS_LE, R_386_TLS_GD,R_386_TLS_LDM, R_386_TLS_GD_32, R_386_TLS_GD_PUSH,R_386_TLS_GD_CALL, R_386_TLS_GD_POP, R_386_TLS_LDM_32,R_386_TLS_LDM_PUSH, R_386_TLS_LDM_CALL,R_386_TLS_LDM_POP, R_386_TLS_LDO_32, R_386_TLS_IE_32,R_386_TLS_LE_32, R_386_TLS_DTPMOD32, R_386_TLS_DTPOFF32and R_386_TLS_TPOFF32 are listed for completeness. They are part ofthe Thread-Local Storage ABI extensions and are documented in the doc-ument called “ELF Handling for Thread-Local Storage”3. The relocationsR_386_TLS_GOTDESC, R_386_TLS_DESC_CALL and R_386_TLS_DESCare also used for Thread-Local Storage, but are not documented there as of thiswriting. A description can be found in the document “Thread-Local StorageDescriptors for IA32 and AMD64/EM64T”4.

2mov name@GOT, %eax must be encoded with opcode 0x8b, not 0xa0, to allow linkeroptimization.

3This document is currently available via http://www.akkadia.org/drepper/tls.pdf

4This document is currently available via http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt

37

Intel386 ABI 1.1 – December 7, 2015 – 8:57

R_386_IRELATIVE is similar to R_386_RELATIVE except that the valueused in this relocation is the program address returned by the function, which takesno arguments, at the address of the result of the corresponding R_386_RELATIVErelocation.

One use of the R_386_IRELATIVE relocation is to avoid name lookup forthe locally defined STT_GNU_IFUNC symbols at load-time. Support for thisrelocation is optional, but is required for the STT_GNU_IFUNC symbols.

38

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 4

Libraries

4.1 Unwind Library InterfaceThis section defines the Unwind Library interface1, expected to be provided byany Intel386 psABI-compliant system. This is the interface on which the C++ABI exception-handling facilities are built. We assume as a basis the Call FrameInformation tables described in the DWARF Debugging Information Format doc-ument.

This section is meant to specify a language-independent interface that can beused to provide higher level exception-handling facilities such as those defined byC++.

The unwind library interface consists of at least the following routines:_Unwind_RaiseException ,_Unwind_Resume ,_Unwind_DeleteException ,_Unwind_GetGR ,_Unwind_SetGR ,_Unwind_GetIP ,_Unwind_SetIP ,_Unwind_GetRegionStart ,_Unwind_GetLanguageSpecificData ,_Unwind_ForcedUnwind ,_Unwind_GetCFA

1The overall structure and the external interface is derived from the IA-64 UNIX System VABI

39

Intel386 ABI 1.1 – December 7, 2015 – 8:57

In addition, two data types are defined (_Unwind_Context and_Unwind_Exception ) to interface a calling runtime (such as the C++ run-time) and the above routine. All routines and interfaces behave as if definedextern "C". In particular, the names are not mangled. All names definedas part of this interface have a "_Unwind_" prefix.

Lastly, a language and vendor specific personality routine will be stored bythe compiler in the unwind descriptor for the stack frames requiring exceptionprocessing. The personality routine is called by the unwinder to handle language-specific tasks such as identifying the frame handling a particular exception.

4.1.1 Exception Handler FrameworkReasons for Unwinding

There are two major reasons for unwinding the stack:

• exceptions, as defined by languages that support them (such as C++)

• “forced” unwinding (such as caused by longjmp or thread termination)

The interface described here tries to keep both similar. There is a major dif-ference, however.

• In the case where an exception is thrown, the stack is unwound while theexception propagates, but it is expected that the personality routine for eachstack frame knows whether it wants to catch the exception or pass it through.This choice is thus delegated to the personality routine, which is expected toact properly for any type of exception, whether “native” or “foreign”. Someguidelines for “acting properly” are given below.

• During “forced unwinding”, on the other hand, an external agent is drivingthe unwinding. For instance, this can be the longjmp routine. This exter-nal agent, not each personality routine, knows when to stop unwinding. Thefact that a personality routine is not given a choice about whether unwindingwill proceed is indicated by the _UA_FORCE_UNWIND flag.

To accommodate these differences, two different routines are proposed._Unwind_RaiseException performs exception-style unwinding, undercontrol of the personality routines. _Unwind_ForcedUnwind , on the otherhand, performs unwinding, but gives an external agent the opportunity to intercept

40

Intel386 ABI 1.1 – December 7, 2015 – 8:57

calls to the personality routine. This is done using a proxy personality routine, thatintercepts calls to the personality routine, letting the external agent override thedefaults of the stack frame’s personality routine.

As a consequence, it is not necessary for each personality routine to knowabout any of the possible external agents that may cause an unwind. For instance,the C++ personality routine need deal only with C++ exceptions (and possiblydisguising foreign exceptions), but it does not need to know anything specificabout unwinding done on behalf of longjmp or pthreads cancellation.

The Unwind Process

The standard ABI exception handling/unwind process begins with the raising of anexception, in one of the forms mentioned above. This call specifies an exceptionobject and an exception class.

The runtime framework then starts a two-phase process:

• In the search phase, the framework repeatedly calls the personality routine,with the _UA_SEARCH_PHASE flag as described below, first for the cur-rent %eip and register state, and then unwinding a frame to a new %eipat each step, until the personality routine reports either success (a handlerfound in the queried frame) or failure (no handler) in all frames. It does notactually restore the unwound state, and the personality routine must accessthe state through the API.

• If the search phase reports a failure, e.g. because no handler was found, itwill call terminate() rather than commence phase 2.

If the search phase reports success, the framework restarts in the cleanupphase. Again, it repeatedly calls the personality routine, with the_UA_CLEANUP_PHASE flag as described below, first for the current%eip and register state, and then unwinding a frame to a new %eip ateach step, until it gets to the frame with an identified handler. At that point,it restores the register state, and control is transferred to the user landingpad code.

Each of these two phases uses both the unwind library and the personalityroutines, since the validity of a given handler and the mechanism for transferringcontrol to it are language-dependent, but the method of locating and restoringprevious stack frames is language-independent.

41

Intel386 ABI 1.1 – December 7, 2015 – 8:57

A two-phase exception-handling model is not strictly necessary to implementC++ language semantics, but it does provide some benefits. For example, the firstphase allows an exception-handling mechanism to dismiss an exception beforestack unwinding begins, which allows presumptive exception handling (correctingthe exceptional condition and resuming execution at the point where it was raised).While C++ does not support presumptive exception handling, other languages do,and the two-phase model allows C++ to coexist with those languages on the stack.

Note that even with a two-phase model, we may execute each of the two phasesmore than once for a single exception, as if the exception was being thrown morethan once. For instance, since it is not possible to determine if a given catch clausewill re-throw or not without executing it, the exception propagation effectivelystops at each catch clause, and if it needs to restart, restarts at phase 1. Thisprocess is not needed for destructors (cleanup code), so the phase 1 can safelyprocess all destructor-only frames at once and stop at the next enclosing catchclause.

For example, if the first two frames unwound contain only cleanup code, andthe third frame contains a C++ catch clause, the personality routine in phase 1,does not indicate that it found a handler for the first two frames. It must do so forthe third frame, because it is unknown how the exception will propagate out ofthis third frame, e.g. by re-throwing the exception or throwing a new one in C++.

The API specified by the Intel386 psABI for implementing this framework isdescribed in the following sections.

4.1.2 Data StructuresReason Codes

The unwind interface uses reason codes in several contexts to identify the reasonsfor failures or other actions, defined as follows:

42

Intel386 ABI 1.1 – December 7, 2015 – 8:57

typedef enum {_URC_NO_REASON = 0,_URC_FOREIGN_EXCEPTION_CAUGHT = 1,_URC_FATAL_PHASE2_ERROR = 2,_URC_FATAL_PHASE1_ERROR = 3,_URC_NORMAL_STOP = 4,_URC_END_OF_STACK = 5,_URC_HANDLER_FOUND = 6,_URC_INSTALL_CONTEXT = 7,_URC_CONTINUE_UNWIND = 8

} _Unwind_Reason_Code;The interpretations of these codes are described below.

Exception Header

The unwind interface uses a pointer to an exception header object as its repre-sentation of an exception being thrown. In general, the full representation of anexception object is language- and implementation-specific, but is prefixed by aheader understood by the unwind interface, defined as follows:

typedef void (*_Unwind_Exception_Cleanup_Fn)(_Unwind_Reason_Code reason,struct _Unwind_Exception *exc);

struct _Unwind_Exception {uint64 exception_class;_Unwind_Exception_Cleanup_Fn exception_cleanup;uint32 private_1;uint32 private_2;

};

An _Unwind_Exception object must be eightbyte aligned. The first twofields are set by user code prior to raising the exception, and the latter two shouldnever be touched except by the runtime.

The exception_class field is a language- and implementation-specificidentifier of the kind of exception. It allows a personality routine to distinguishbetween native and foreign exceptions, for example. By convention, the high 4bytes indicate the vendor (for instance GNUC), and the low 4 bytes indicate thelanguage. For the C++ ABI described in this document, the low four bytes areC++\0.

43

Intel386 ABI 1.1 – December 7, 2015 – 8:57

The exception_cleanup routine is called whenever an exception objectneeds to be destroyed by a different runtime than the runtime which created theexception object, for instance if a Java exception is caught by a C++ catch handler.In such a case, a reason code (see above) indicates why the exception object needsto be deleted:

_URC_FOREIGN_EXCEPTION_CAUGHT = 1 This indicates that a differentruntime caught this exception. Nested foreign exceptions, or re-throwing aforeign exception, result in undefined behavior.

_URC_FATAL_PHASE1_ERROR = 3 The personality routine encountered anerror during phase 1, other than the specific error codes defined.

_URC_FATAL_PHASE2_ERROR = 2 The personality routine encountered anerror during phase 2, for instance a stack corruption.

Normally, all errors should be reported during phase 1 by returning from_Unwind_RaiseException. However, landing pad code could cause stackcorruption between phase 1 and phase 2. For a C++ exception, the runtime shouldcall terminate() in that case.

The private unwinder state (private_1 and private_2) in an exceptionobject should be neither read by nor written to by personality routines or otherparts of the language-specific runtime. It is used by the specific implementationof the unwinder on the host to store internal information, for instance to rememberthe final handler frame between unwinding phases.

In addition to the above information, a typical runtime such as the C++ run-time will add language-specific information used to process the exception. Thisis expected to be a contiguous area of memory after the _Unwind_Exceptionobject, but this is not required as long as the matching personality routines knowhow to deal with it, and the exception_cleanup routine de-allocates it prop-erly.

Unwind Context

The _Unwind_Context type is an opaque type used to refer to a system-specific data structure used by the system unwinder. This context is created anddestroyed by the system, and passed to the personality routine during unwinding.

struct _Unwind_Context

44

Intel386 ABI 1.1 – December 7, 2015 – 8:57

4.1.3 Throwing an Exception_Unwind_RaiseException

_Unwind_Reason_Code _Unwind_RaiseException( struct _Unwind_Exception *exception_object );

Raise an exception, passing along the given exception object, which shouldhave its exception_class and exception_cleanup fields set. The ex-ception object has been allocated by the language-specific runtime, and has alanguage-specific format, except that it must contain an _Unwind_Exceptionstruct (see Exception Header above). _Unwind_RaiseException does notreturn, unless an error condition is found (such as no handler for the exception,bad stack format, etc.). In such a case, an _Unwind_Reason_Code value isreturned.

Possibilities are:

_URC_END_OF_STACK The unwinder encountered the end of the stack dur-ing phase 1, without finding a handler. The unwind runtime willnot have modified the stack. The C++ runtime will normally calluncaught_exception() in this case.

_URC_FATAL_PHASE1_ERROR The unwinder encountered an unexpected er-ror during phase 1, e.g. stack corruption. The unwind runtime will not havemodified the stack. The C++ runtime will normally call terminate() inthis case.

If the unwinder encounters an unexpected error during phase 2, it should re-turn _URC_FATAL_PHASE2_ERROR to its caller. In C++, this will usually be__cxa_throw, which will call terminate().

The unwind runtime will likely have modified the stack (e.g. popped framesfrom it) or register context, or landing pad code may have corrupted them. As aresult, the the caller of _Unwind_RaiseException can make no assumptionsabout the state of its stack or registers.

45

Intel386 ABI 1.1 – December 7, 2015 – 8:57

_Unwind_ForcedUnwind

typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn)(int version,_Unwind_Action actions,uint64 exceptionClass,struct _Unwind_Exception *exceptionObject,struct _Unwind_Context *context,void *stop_parameter );_Unwind_Reason_Code_Unwind_ForcedUnwind( struct _Unwind_Exception *exception_object,_Unwind_Stop_Fn stop,void *stop_parameter );

Raise an exception for forced unwinding, passing along the givenexception object, which should have its exception_class andexception_cleanup fields set. The exception object has been allo-cated by the language-specific runtime, and has a language-specific format,except that it must contain an _Unwind_Exception struct (see ExceptionHeader above).

Forced unwinding is a single-phase process (phase 2 of the normal exception-handling process). The stop and stop_parameter parameters control thetermination of the unwind process, instead of the usual personality routine query.The stop function parameter is called for each unwind frame, with the pa-rameters described for the usual personality routine below, plus an additionalstop_parameter.

When the stop function identifies the destination frame, it transfers control(according to its own, unspecified, conventions) to the user code as appropriatewithout returning, normally after calling _Unwind_DeleteException. Ifnot, it should return an _Unwind_Reason_Code value as follows:

_URC_NO_REASON This is not the destination frame. The unwind runtime willcall the frame’s personality routine with the _UA_FORCE_UNWIND and_UA_CLEANUP_PHASE flags set in actions, and then unwind to the nextframe and call the stop function again.

_URC_END_OF_STACK In order to allow _Unwind_ForcedUnwind to per-form special processing when it reaches the end of the stack, the unwindruntime will call it after the last frame is rejected, with a NULL stack pointer

46

Intel386 ABI 1.1 – December 7, 2015 – 8:57

in the context, and the stop function must catch this condition (i.e. by notic-ing the NULL stack pointer). It may return this reason code if it cannothandle end-of-stack.

_URC_FATAL_PHASE2_ERROR The stop function may return this code forother fatal conditions, e.g. stack corruption.

If the stop function returns any reason code other than _URC_NO_REASON,the stack state is indeterminate from the point of view of the caller of_Unwind_ForcedUnwind. Rather than attempt to return, therefore, the un-wind library should return _URC_FATAL_PHASE2_ERROR to its caller.

Example: longjmp_unwind()The expected implementation of longjmp_unwind() is as follows. The

setjmp() routine will have saved the state to be restored in its custom-ary place, including the frame pointer. The longjmp_unwind() routinewill call _Unwind_ForcedUnwind with a stop function that compares theframe pointer in the context record with the saved frame pointer. If equal,it will restore the setjmp() state as customary, and otherwise it will return_URC_NO_REASON or _URC_END_OF_STACK.

If a future requirement for two-phase forced unwinding were identified, an al-ternate routine could be defined to request it, and an actions parameter flag definedto support it.

_Unwind_Resume

void _Unwind_Resume(struct _Unwind_Exception *exception_object);

Resume propagation of an existing exception e.g. after executing cleanup codein a partially unwound stack. A call to this routine is inserted at the end of alanding pad that performed cleanup, but did not resume normal execution. Itcauses unwinding to proceed further.

_Unwind_Resume should not be used to implement re-throwing. To theunwinding runtime, the catch code that re-throws was a handler, and the previousunwinding session was terminated before entering it. Re-throwing is implementedby calling _Unwind_RaiseException again with the same exception object.

This is the only routine in the unwind library which is expected to be calleddirectly by generated code: it will be called at the end of a landing pad in a"landing-pad" model.

47

Intel386 ABI 1.1 – December 7, 2015 – 8:57

4.1.4 Exception Object Management_Unwind_DeleteException

void _Unwind_DeleteException(struct _Unwind_Exception *exception_object);

Deletes the given exception object. If a given runtime resumes nor-mal execution after catching a foreign exception, it will not know howto delete that exception. Such an exception will be deleted by calling_Unwind_DeleteException. This is a convenience function that calls thefunction pointed to by the exception_cleanup field of the exception header.

4.1.5 Context ManagementThese functions are used for communicating information about the unwind con-text (i.e. the unwind descriptors and the user register state) between the unwindlibrary and the personality routine and landing pad. They include routines to reador set the context record images of registers in the stack frame corresponding to agiven unwind context, and to identify the location of the current unwind descrip-tors and unwind frame.

_Unwind_GetGR

uint32 _Unwind_GetGR(struct _Unwind_Context *context, int index);

This function returns the 32-bit value of the given general register. The registeris identified by its index as given in table 2.14.

During the two phases of unwinding, no registers have a guaranteed value.

_Unwind_SetGR

void _Unwind_SetGR(struct _Unwind_Context *context,int index,uint32 new_value);

This function sets the 32-bit value of the given register, identified by its indexas for _Unwind_GetGR.

The behavior is guaranteed only if the function is called during phase 2 ofunwinding, and applied to an unwind context representing a handler frame, for

48

Intel386 ABI 1.1 – December 7, 2015 – 8:57

which the personality routine will return _URC_INSTALL_CONTEXT. In thatcase, only registers %eax and %edx should be used. These scratch registers arereserved for passing arguments between the personality routine and the landingpads.

_Unwind_GetIP

uint32 _Unwind_GetIP(struct _Unwind_Context *context);

This function returns the 32-bit value of the instruction pointer (IP).During unwinding, the value is guaranteed to be the address of the instruction

immediately following the call site in the function identified by the unwind con-text. This value may be outside of the procedure fragment for a function call thatis known to not return (such as _Unwind_Resume).

_Unwind_SetIP

void _Unwind_SetIP(struct _Unwind_Context *context,uint32 new_value);

This function sets the value of the instruction pointer (IP) for the routine iden-tified by the unwind context.

The behavior is guaranteed only when this function is called for an unwindcontext representing a handler frame, for which the personality routine will return_URC_INSTALL_CONTEXT. In this case, control will be transferred to the givenaddress, which should be the address of a landing pad.

_Unwind_GetLanguageSpecificData

uint32 _Unwind_GetLanguageSpecificData(struct _Unwind_Context *context);This routine returns the address of the language-specific data area for the cur-

rent stack frame.This routine is not strictly required: it could be accessed through

_Unwind_GetIP using the documented format of the DWARF Call Frame In-formation Tables, but since this work has been done for finding the personalityroutine in the first place, it makes sense to cache the result in the context. Wecould also pass it as an argument to the personality routine.

49

Intel386 ABI 1.1 – December 7, 2015 – 8:57

_Unwind_GetRegionStart

uint32 _Unwind_GetRegionStart(struct _Unwind_Context *context);

This routine returns the address of the beginning of the procedure or codefragment described by the current unwind descriptor block.

This information is required to access any data stored relative to the beginningof the procedure fragment. For instance, a call site table might be stored relativeto the beginning of the procedure fragment that contains the calls. During un-winding, the function returns the start of the procedure fragment containing thecall site in the current stack frame.

_Unwind_GetCFA

uint32 _Unwind_GetCFA(struct _Unwind_Context *context);

This function returns the 32-bit Canonical Frame Address which is defined asthe value of %esp at the call site in the previous frame. This value is guaranteedto be correct any time the context has been passed to a personality routine or astop function.

4.1.6 Personality Routine_Unwind_Reason_Code (*__personality_routine)

(int version,_Unwind_Action actions,uint64 exceptionClass,struct _Unwind_Exception *exceptionObject,struct _Unwind_Context *context);

The personality routine is the function in the C++ (or other language) run-time library which serves as an interface between the system unwind library andlanguage-specific exception handling semantics. It is specific to the code fragmentdescribed by an unwind info block, and it is always referenced via the pointer inthe unwind info block, and hence it has no psABI-specified name.

Parameters

The personality routine parameters are as follows:

50

Intel386 ABI 1.1 – December 7, 2015 – 8:57

version Version number of the unwinding runtime, used to detect a mis-matchbetween the unwinder conventions and the personality routine, or to providebackward compatibility. For the conventions described in this document,version will be 1.

actions Indicates what processing the personality routine is expected to per-form, as a bit mask. The possible actions are described below.

exceptionClass An 8-byte identifier specifying the type of the thrown ex-ception. By convention, the high 4 bytes indicate the vendor (for instanceGNUC), and the low 4 bytes indicate the language. For the C++ ABI de-scribed in this document, the low four bytes are C++\0. This is not a null-terminated string. Some implementations may use no null bytes.

exceptionObject The pointer to a memory location recording the necessaryinformation for processing the exception according to the semantics of agiven language (see the Exception Header section above).

context Unwinder state information for use by the personality routine. This isan opaque handle used by the personality routine in particular to access theframe’s registers (see the Unwind Context section above).

return value The return value from the personality routine indicates how furtherunwind should happen, as well as possible error conditions. See the follow-ing section.

Personality Routine Actions

The actions argument to the personality routine is a bitwise OR of one or more ofthe following constants:typedef int _Unwind_Action;const _Unwind_Action _UA_SEARCH_PHASE = 1;const _Unwind_Action _UA_CLEANUP_PHASE = 2;const _Unwind_Action _UA_HANDLER_FRAME = 4;const _Unwind_Action _UA_FORCE_UNWIND = 8;

_UA_SEARCH_PHASE Indicates that the personality routine should check if thecurrent frame contains a handler, and if so return _URC_HANDLER_FOUND,

51

Intel386 ABI 1.1 – December 7, 2015 – 8:57

or otherwise return _URC_CONTINUE_UNWIND. _UA_SEARCH_PHASEcannot be set at the same time as _UA_CLEANUP_PHASE.

_UA_CLEANUP_PHASE Indicates that the personality routine should per-form cleanup for the current frame. The personality routine canperform this cleanup itself, by calling nested procedures, and return_URC_CONTINUE_UNWIND. Alternatively, it can setup the registers (in-cluding the IP) for transferring control to a "landing pad", and return_URC_INSTALL_CONTEXT.

_UA_HANDLER_FRAME During phase 2, indicates to the personality routinethat the current frame is the one which was flagged as the handler frameduring phase 1. The personality routine is not allowed to change its mindbetween phase 1 and phase 2, i.e. it must handle the exception in this framein phase 2.

_UA_FORCE_UNWIND During phase 2, indicates that no language is allowedto "catch" the exception. This flag is set while unwinding the stack forlongjmp or during thread cancellation. User-defined code in a catch clausemay still be executed, but the catch clause must resume unwinding with acall to _Unwind_Resume when finished.

Transferring Control to a Landing Pad

If the personality routine determines that it should transfer control to a landingpad (in phase 2), it may set up registers (including IP) with suitable values forentering the landing pad (e.g. with landing pad parameters), by calling the contextmanagement routines above. It then returns _URC_INSTALL_CONTEXT.

Prior to executing code in the landing pad, the unwind library restores registersnot altered by the personality routine, using the context record, to their state in thatframe before the call that threw the exception, as follows. All registers specifiedas callee-saved by the base ABI are restored, as well as scratch registers %eax and%edx (see below). Except for those exceptions, scratch (or caller-saved) registersare not preserved, and their contents are undefined on transfer.

The landing pad can either resume normal execution (as, for instance, at theend of a C++ catch), or resume unwinding by calling _Unwind_Resume andpassing it the exceptionObject argument received by the personality routine._Unwind_Resume will never return.

52

Intel386 ABI 1.1 – December 7, 2015 – 8:57

_Unwind_Resume should be called if and only if the personality routinedid not return _Unwind_HANDLER_FOUND during phase 1. As a result, theunwinder can allocate resources (for instance memory) and keep track of them inthe exception object reserved words. It should then free these resources beforetransferring control to the last (handler) landing pad. It does not need to free theresources before entering non-handler landing-pads, since _Unwind_Resumewill ultimately be called.

The landing pad may receive arguments from the runtime, typically passedin registers set using _Unwind_SetGR by the personality routine. Fora landing pad that can call to _Unwind_Resume, one argument must bethe exceptionObject pointer, which must be preserved to be passed to_Unwind_Resume.

The landing pad may receive other arguments, for instance a switch valueindicating the type of the exception. Two scratch registers are reserved for thisuse (%eax and %edx).

Rules for Correct Inter-Language Operation

The following rules must be observed for correct operation between languagesand/or run times from different vendors:

An exception which has an unknown class must not be altered by the personal-ity routine. The semantics of foreign exception processing depend on the languageof the stack frame being unwound. This covers in particular how exceptions froma foreign language are mapped to the native language in that frame.

If a runtime resumes normal execution, and the caught exception was createdby another runtime, it should call _Unwind_DeleteException. This is trueeven if it understands the exception object format (such as would be the casebetween different C++ run times).

A runtime is not allowed to catch an exception if the _UA_FORCE_UNWINDflag was passed to the personality routine.

Example: Foreign Exceptions in C++. In C++, foreign exceptionscan be caught by a catch(...) statement. They can also becaught as if they were of a __foreign_exception class, defined in<exception>. The __foreign_exception may have subclasses, such as__java_exception and __ada_exception, if the runtime is capable ofidentifying some of the foreign languages.

The behavior is undefined in the following cases:

53

Intel386 ABI 1.1 – December 7, 2015 – 8:57

• A __foreign_exception catch argument is accessed in any way (in-cluding taking its address).

• A __foreign_exception is active at the same time as another excep-tion (either there is a nested exception while catching the foreign exception,or the foreign exception was itself nested).

• uncaught_exception(), set_terminate(), set_unexpected(),terminate(), or unexpected() is called at a time a foreign excep-tion exists (for example, calling set_terminate() during unwindingof a foreign exception).

All these cases might involve accessing C++ specific content of the thrownexception, for instance to chain active exceptions.

Otherwise, a catch block catching a foreign exception is allowed:

• to resume normal execution, thereby stopping propagation of the foreignexception and deleting it, or

• to re-throw the foreign exception. In that case, the original exception objectmust be unaltered by the C++ runtime.

A catch-all block may be executed during forced unwinding. For instance, alongjmp may execute code in a catch(...) during stack unwinding. However,if this happens, unwinding will proceed at the end of the catch-all block, whetheror not there is an explicit re-throw.

Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++run-times compatible with the common C++ ABI.

54

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 5

Conventions

1

1This chapter is used to document some features special to the Intel386 ABI. The differentsections might be moved to another place or removed completely.

55

Intel386 ABI 1.1 – December 7, 2015 – 8:57

5.1 C++For the C++ ABI we will use the IA-64 C++ ABI and instantiate it appropriately.The current draft of that ABI is available at:http://mentorembedded.github.io/cxx-abi/

56

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Chapter 6

Intel MPX Extension

Intel MPX (Memory Protection Extensions) provides 4 64-bit wide bound reg-isters (%bnd0 - %bnd3). For purpose of function return, the lower 32 bits of%bnd0 specify lower bound of function return, and the upper 32 bits specify up-per bound of function return. The upper bound is represented in one’s complementform.

6.1 Parameter Passing and Returning of Values

6.1.1 Bounds PassingIntel MPX provides ISA extensions that allow passing bounds for a pointer argu-ment that specify memory area that may be legally accessed by dereferencing thepointer. This paragraph desribes how the bounds are passed to the callee.

Figure 6.1: Bound Register Usage

Preserved acrossRegister Usage function calls

%bnd0 used to return bounds of pointer returnvalue

No

%bnd1–%bnd3 scratch registers No

57

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Several functions used in the description below are defined as follows:

BOUND_MAP_STORE(bnd, addr, ptr) This function executes Intel MPX bndstxinstruction. ptr argument is used to initialize index field of the memoryoperand of the bndstx instruction, addr is encoded in base and/or dis-placement fields of the memory operand, bnd is encoded in the registeroperand.

BOUND_MAP_LOAD(addr, ptr) This function executes Intel MPX bndldxinstruction. ptr argument is used to initialize index field of the memoryoperand of the bndldx instruction, addr is encoded in base and/or dis-placement fields of the memory operand.

The bounds associated with each pointer contained in the fourbyte are passedin a CPU defined manner by executing BOUND_MAP_STORE(bnd, addr,ptr) function, where bnd is the current bounds of the pointer argument, addris the address of the pointer argument’s stack location, ptr is the actual value ofthe pointer argument. If the fourbyte may contain parts of partially overlappingpointers, then bounds associated with the pointers are ignored and special boundsthat allow accessing all memory are passed for such pointers. The callee fetchesthe passed bounds using BOUND_MAP_LOAD(addr, ptr), where addr is thesame address passed to the corresponding BOUND_MAP_STORE in the caller, andptr is the actual value of the pointer parameter fetched by the callee from a stacklocation.

When passing arguments with bounds to functions, function prototypes mustbe provided. Otherwise, the run-time behavior is undefined.

6.1.2 Returning of BoundsThe returning of bounds is done according to the following algorithm:

1. When the value is returned in memory, on return %bnd0 must containbounds of the “hidden” first argument that has been passed in by the caller.

2. When a pointer value is returned, on return %bnd0 must contain bounds ofthe pointer value.

58

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Appendix A

Linker Optimization

This chapter describes optimizations which may be performed by linker.

A.1 Combine GOTPLT and GOT SlotsIn the small and medium models, when there are both PLT and GOT references tothe same function symbol, normally linker creates a GOTPLT slot for PLT entryand a GOT slot for GOT reference. A run-time JUMP_SLOT relocation is createdto update the GOTPLT slot and a run-time GLOB_DAT relocation is created toupdate the GOT slot. Both JUMP_SLOT and GLOB_DAT relocations apply thesame symbol value to GOTPLT and GOT slots, respectively, at run-time.

As an optimization, linker may combine GOTPLT and GOT slots into a singleGOT slot and remove the run-time JUMP_SLOT relocation. It replaces the regularPLT entry:

Figure A.1: Procedure Linkage Table Entry Via GOTPLT Slot

.PLT: jmp [GOTPLT slot]pushl relocation indexjmp .PLT0

with an GOT PLT entry with an indirect jump via the GOT slot:

59

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Figure A.2: Procedure Linkage Table Entry Via GOT Slot

.PLT: jmp [GOT slot]nop

and resolves the PLT reference to the GOT PLT entry. Indirect jmp is an 5-byteinstruction. nop can be encoded as a 3-byte instruction or a 11-byte instructionfor 8-byte or 16-byte PLT slot. A separate PLT with 8-byte slots may be used forthis optimization.

This optimization isn’t applicable to the STT_GNU_IFUNC symbols sincetheir GOTPLT slots are resolved to the selected implementation and their GOTslots are resolved to their PLT entries.

This optimization must be avoided if pointer equality is needed since the sym-bol value won’t be cleared in this case and the dynamic linker won’t update theGOT slot. Otherwise, the resulting binary will get into an infinite loop at run-time.

A.2 Optimize R_386_GOT32X RelocationThe Intel386 instruction encoding supports converting certain instructions on mem-ory operand with R_386_GOT32X relocation against symbol, foo, into a differ-ent form on immediate operand if foo is defined locally.

Convert call, jmp and mov Convert memory operand of call, jmp and movinto immediate operand.

Table A.1: Call, Jmp and Mov Conversion

Memory Operand Immediate Operandcall *foo@GOT(%reg) nop call foocall *foo@GOT(%reg) call foo nopjmp *foo@GOT(%reg) jmp foo nopmov foo@GOT(%reg1), %reg2 lea foo@GOTOFF(%reg1), %reg2

60

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Convert Test and Binop Convert memory operand of call, jmp, mov, testand binop into immediate operand, where binop is one of adc, add,and, cmp, or, sbb, sub, xor instructions, when position-independentcode is disabled.

Table A.2: Test and Binop Conversion

Memory Operand Immediate Operandcall *foo@GOT nop call foocall *foo@GOT call foo nopjmp *foo@GOT jmp foo nopmov foo@GOT, %reg lea foo, %regtest %reg, foo@GOT test $foo, %regbinop foo@GOT, %reg binop $foo, %regcall *foo@GOT(%reg) nop call foocall *foo@GOT(%reg) call foo nopjmp *foo@GOT(%reg) jmp foo nopmov foo@GOT(%reg1), %reg2 lea foo, %reg2test %reg1, name@GOT(%reg2) test $foo, %reg1binop name@GOT(%reg1), %reg2 binop $foo, %reg2

61

Intel386 ABI 1.1 – December 7, 2015 – 8:57

Index

_UA_CLEANUP_PHASE, 41_UA_FORCE_UNWIND, 40_UA_SEARCH_PHASE, 41_Unwind_Context, 40_Unwind_DeleteException, 39_Unwind_Exception, 40_Unwind_ForcedUnwind, 39, 40_Unwind_GetCFA, 39_Unwind_GetGR, 39_Unwind_GetIP, 39_Unwind_GetLanguageSpecificData, 39_Unwind_GetRegionStart, 39_Unwind_RaiseException, 39, 40_Unwind_Resume, 39_Unwind_SetGR, 39_Unwind_SetIP, 39

auxiliary vector, 20

boolean, 9byte, 7

C++, 56Call Frame Information tables, 39Convert call, jmp and mov, 60Convert Test and Binop, 61

double quadword, 7doubleword, 7DWARF Debugging Information Format, 39

62

Intel386 ABI 1.1 – December 7, 2015 – 8:57

eightbyte, 7exec, 18

fourbyte, 7

halfword, 7

longjmp, 40

Procedure Linkage Table, 35

quadword, 7

sixteenbyte, 7size_t, 9

terminate(), 41Thread-Local Storage, 37twobyte, 7

Unwind Library interface, 39

word, 7

63

Intel386 ABI 1.1 – December 7, 2015 – 8:57