Microprocessors: Frame Pointers and the Use of the -fomit-frame-pointer Switch (Feb 25th, 2002)
Post on 21-Dec-2015

General Outline
Usually a function uses a frame pointer to address its local variables and parameters.
In some limited circumstances it is possible to avoid the frame pointer and use the stack pointer instead.
The -fomit-frame-pointer switch of gcc enables this optimization. This set of slides describes its effect.

-fomit-frame-pointer
Consider this example:
    int q (int a, int b) {
        int c;
        int d;
        c = a + 4;
        d = isqrt (b);
        return c + d;
    }

Calling the function
The caller does something like:
    push second-arg (b)
    push first-arg (a)
    call q
    add  esp, 8

Stack at function entry
Stack contents (top of memory first):
    argument b
    argument a
    return point    <-- ESP

Code of q itself
The prolog:
    push ebp
    mov  ebp, esp
    sub  esp, 8

Stack after the prolog
Immediately after the sub of esp:
    second argument (b)
    first argument (a)
    return point
    old EBP         <-- EBP
    value of c
    value of d      <-- ESP

Addressing using Frame Pointer
The local variables and arguments are addressed using fixed offsets from the frame pointer (ESP is not referenced):
    A is at [EBP+8]
    B is at [EBP+12]
    C is at [EBP-4]
    D is at [EBP-8]

Code for q
Code after the prolog:
    MOV  EAX, [EBP+8]    ; A
    ADD  EAX, 4
    MOV  [EBP-4], EAX    ; C
    PUSH [EBP+12]        ; B
    CALL ISQRT
    ADD  ESP, 4
    MOV  [EBP-8], EAX    ; D
    MOV  EAX, [EBP-4]    ; C
    ADD  EAX, [EBP-8]    ; D

Optimizing use of ESP
We don’t really need to readjust ESP after a CALL, so long as we do not leave junk on the stack permanently.
The epilog will clean up the entire frame anyway.
Let’s use this to improve the code.

Code with ESP optimization
Code after the prolog:
    MOV  EAX, [EBP+8]    ; A
    ADD  EAX, 4
    MOV  [EBP-4], EAX    ; C
    PUSH [EBP+12]        ; B
    CALL ISQRT
    MOV  [EBP-8], EAX    ; D
    MOV  EAX, [EBP-4]    ; C
    ADD  EAX, [EBP-8]    ; D
We omitted the ADD after the CALL; it is not needed.

Epilog
Clean up and return:
    MOV  ESP, EBP
    POP  EBP
    RET
Or:
    LEAVE
    RET

-fomit-frame-pointer
Now we will look at the effect of omitting the frame pointer on the same example; that is, we will compile it with the -fomit-frame-pointer switch set.
    int q (int a, int b) {
        int c;
        int d;
        c = a + 4;
        d = isqrt (b);
        return c + d;
    }

Calling the function
The caller does something like:
    push second-arg (b)
    push first-arg (a)
    call q
    add  esp, 8
This is exactly the same as before: the switch affects only the called function, not the caller.

Stack at function entry
Stack contents (top of memory first):
    argument b
    argument a
    return point    <-- ESP
This is the same as before.

Code of q itself
The prolog:
    sub  esp, 8
That’s quite different: we have saved some instructions by neither saving nor setting the frame pointer.

Stack after the prolog
Immediately after the sub of esp:
    second argument (b)
    first argument (a)
    return point
    value of c
    value of d      <-- ESP

Addressing using Stack Pointer
The local variables and arguments are addressed using fixed offsets from the stack pointer:
    A is at [ESP+12]
    B is at [ESP+16]
    C is at [ESP+4]
    D is at [ESP]
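
The relationship between the EBP-relative offsets of the frame-pointer version and the ESP-relative offsets listed here can be sketched in C. This is only an illustration for the layout in these slides (4-byte slots, locals directly below the return point); the function name esp_offset and its parameters are hypothetical, not anything gcc exposes.

```c
/* Hypothetical helper: translate an EBP-relative offset (frame-pointer
 * layout) into the ESP-relative offset used once the frame pointer is
 * omitted. locals_size is the byte count the prolog subtracts from ESP.
 * Assumes the 32-bit layout shown in the slides. */
int esp_offset(int ebp_offset, int locals_size)
{
    if (ebp_offset > 0) {
        /* Argument: the 4-byte "old EBP" slot is no longer on the stack,
         * so arguments sit 4 bytes closer to the locals. */
        return ebp_offset + locals_size - 4;
    }
    /* Local variable: measured upward from ESP instead of downward
     * from EBP. */
    return ebp_offset + locals_size;
}
```

With locals_size set to 8, esp_offset(8, 8) gives 12 for A and esp_offset(-8, 8) gives 0 for D, matching the offsets on the slide.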

Code for q
Code after the prolog:
    MOV  EAX, [ESP+12]   ; A
    ADD  EAX, 4
    MOV  [ESP+4], EAX    ; C
    PUSH [ESP+16]        ; B
    CALL ISQRT
    ADD  ESP, 4
    MOV  [ESP], EAX      ; D
    MOV  EAX, [ESP+4]    ; C
    ADD  EAX, [ESP]      ; D

Epilog for -fomit-frame-pointer
We must remove the 8 bytes of local variables from the stack, so that ESP is properly set for the RET instruction:
    ADD  ESP, 8
    RET

Why not always use ESP?
Problems with debugging:
    The debugger relies on hopping back through frames using saved frame pointers (which form a linked list of frames) to do back traces etc.
If code causes ESP to move then there are difficulties:
    Push of parameters
    Dynamic arrays
    Use of alloca
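
The linked list of saved frame pointers that a debugger follows can be modeled in a few lines of C. This is a simplified sketch, not real debugger code: struct frame and backtrace_model are hypothetical names, and strings stand in for return addresses.

```c
#include <stddef.h>

/* Simplified model of what "push ebp; mov ebp, esp" leaves behind:
 * the caller's EBP is stored at [EBP] and the return point at [EBP+4]. */
struct frame {
    const struct frame *saved_ebp;  /* caller's frame; NULL at the outermost frame */
    const char *return_point;       /* stand-in for the return address */
};

/* Walk the chain from the current frame outward, the way a back trace
 * is built. Stores up to max return points in out and returns how many
 * frames were visited. */
size_t backtrace_model(const struct frame *ebp, const char **out, size_t max)
{
    size_t n = 0;
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_point;
        ebp = ebp->saved_ebp;       /* hop back one frame */
    }
    return n;
}
```

A function compiled without a frame pointer never stores such a link, which is why the walk breaks down unless the debugger has other information about the frame size.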

Pushing Parameters
Pushing parameters modifies ESP.
Sometimes this is no problem, as in our example here, since we undo the modification immediately after the call.
But suppose we had called FUNC(B,B). We could not do
    PUSH [ESP+16]
    PUSH [ESP+16]
since ESP is moved by the first PUSH.

More on ESP handling
Once again,
    PUSH [ESP+16]
    PUSH [ESP+16]
would not work, but we can keep track of the fact that ESP has moved and do
    PUSH [ESP+16]        ; Push B
    PUSH [ESP+20]        ; Push B again
and that works fine.
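
The bookkeeping involved in "keeping track of the fact that ESP has moved" can be sketched as follows. This is an illustrative model only; struct esp_tracker and its functions are hypothetical names and do not reflect how gcc is actually implemented.

```c
/* Hypothetical bookkeeping for code generated without a frame pointer:
 * remember how far ESP has moved since the prolog finished. */
struct esp_tracker {
    int pushed_bytes;   /* bytes currently pushed below the prolog's ESP */
};

/* Offset to use right now for a slot that sat at base_offset
 * immediately after the prolog. */
int current_offset(const struct esp_tracker *t, int base_offset)
{
    return base_offset + t->pushed_bytes;
}

/* Record one 4-byte PUSH. */
void note_push(struct esp_tracker *t)
{
    t->pushed_bytes += 4;
}

/* Record "ADD ESP, nbytes" (or the equivalent pops). */
void note_pop(struct esp_tracker *t, int nbytes)
{
    t->pushed_bytes -= nbytes;
}
```

For FUNC(B,B), the first push uses current_offset(&t, 16), which is 16; after note_push the same call yields 20, the offset used by the second PUSH.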

More on ESP optimization
In the case of using the frame pointer, we were able to optimize away the add to ESP after the call.
Can we still do that?
Answer: yes, but we have to keep track of the fact that there is an extra word on the stack, so ESP is 4 “off”.

Code with ESP optimization
Code after the prolog:
    MOV  EAX, [ESP+12]   ; A
    ADD  EAX, 4
    MOV  [ESP+4], EAX    ; C
    PUSH [ESP+16]        ; B
    CALL ISQRT
    MOV  [ESP+4], EAX    ; D
    MOV  EAX, [ESP+8]    ; C
    ADD  EAX, [ESP+4]    ; D
The last three references had to be modified.

Epilog for Optimized code
We also have to modify the epilog in this case, since there are now 12 bytes on the stack at exit: 8 from the local variables and 4 from the push we did.
Epilog becomes:
    ADD  ESP, 12
    RET
But no instructions were added.

Other cases of ESP moving
Dynamic arrays allocated on the local stack, whose size is not known at compile time
Explicit call to alloca
How alloca works:
    Subtract given value from ESP
    Return ESP value as pointer to new area
These cases are fatal: we MUST use a frame pointer in these cases.
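
A toy model makes it clearer why a run-time-sized adjustment of ESP is fatal for ESP-relative addressing. The sketch below treats addresses as plain integers and ignores alignment; struct model_cpu and the model_* functions are hypothetical and only mimic the two alloca steps listed above.

```c
/* Toy model of a downward-growing stack; addresses are plain integers. */
struct model_cpu {
    int ebp;   /* frame pointer: fixed for the lifetime of the frame */
    int esp;   /* stack pointer: moves with pushes and alloca */
};

/* Prolog with a frame pointer: push ebp; mov ebp, esp; sub esp, locals. */
void model_prolog(struct model_cpu *cpu, int locals)
{
    cpu->esp -= 4;          /* push ebp */
    cpu->ebp = cpu->esp;    /* mov ebp, esp */
    cpu->esp -= locals;     /* sub esp, locals */
}

/* alloca as described on the slide: subtract the size from ESP and
 * return the new ESP as the address of the fresh area. */
int model_alloca(struct model_cpu *cpu, int nbytes)
{
    cpu->esp -= nbytes;
    return cpu->esp;
}

/* Distance from ESP up to the first local, the one at [EBP-4]. */
int esp_offset_of_first_local(const struct model_cpu *cpu)
{
    return (cpu->ebp - 4) - cpu->esp;
}
```

After model_prolog with 8 bytes of locals the first local is 4 bytes above ESP; after model_alloca(&cpu, n) it is n + 4 bytes above ESP. Since n is only known at run time, no fixed ESP offset can be compiled in, while [EBP-4] stays valid throughout.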

Even better, More optimization
Let’s recall our example:
    int q (int a, int b) {
        int c;
        int d;
        c = a + 4;
        d = isqrt (b);
        return c + d;
    }
We can rewrite this to avoid the use of the local variables c and d completely, and the compiler can do the same thing.

Optimized Version
With some optimization, we can write:
    int q (int a, int b) {
        return isqrt (b) + a + 4;
    }
We are not suggesting that the user has to rewrite the code this way; we want the compiler to do it automatically.
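
To check that the two forms of q agree, both can be compiled side by side. The slides never define isqrt, so the version below is a stand-in (assumed to be the floor of the square root, returning 0 for non-positive input); the names q_original and q_optimized are likewise just labels for this sketch.

```c
/* Stand-in for the isqrt the slides call but do not define (assumption):
 * floor of the square root, 0 for non-positive input. */
int isqrt(int n)
{
    int r = 0;
    /* Widen before multiplying so (r + 1) * (r + 1) cannot overflow. */
    while (n > 0 && (long long)(r + 1) * (r + 1) <= n) {
        r++;
    }
    return r;
}

/* The function as first written, with locals c and d. */
int q_original(int a, int b)
{
    int c;
    int d;
    c = a + 4;
    d = isqrt(b);
    return c + d;
}

/* The rewritten form with no local variables. */
int q_optimized(int a, int b)
{
    return isqrt(b) + a + 4;
}
```

For example, q_original(3, 16) and q_optimized(3, 16) both return 11.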

Optimizations We Used
Commutative Optimization
    A + B = B + A
Associative Optimization
    A + (B + C) = (A + B) + C
For integer operands, these optimizations are certainly valid (well, see the fine point on the next slide).
Floating-point is another matter!

A Fine Point
The transformation of
    (A + B) + C  to  A + (B + C)
works fine in 2’s complement integer arithmetic with no overflow checking, which is what the code generated by the compiler uses.
But strictly at the C source level, B+C might overflow, so at the source level this transformation is not technically correct.
But we are really talking about compiler optimizations anyway, so this does not matter.
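
The machine-level claim can be tried out in C using unsigned arithmetic, which the language defines to wrap around and which therefore behaves like the x86 ADD instruction. Signed overflow, by contrast, is undefined behavior in C, which is the source-level caveat mentioned above. The function names below are hypothetical.

```c
#include <stdint.h>

/* Left-associated sum; uint32_t arithmetic wraps modulo 2^32. */
uint32_t sum_left(uint32_t a, uint32_t b, uint32_t c)
{
    return (a + b) + c;
}

/* Right-associated sum; b + c may wrap, yet the final result is
 * the same as the left-associated one. */
uint32_t sum_right(uint32_t a, uint32_t b, uint32_t c)
{
    return a + (b + c);
}
```

With a = 1 and b = c = 0xFFFFFFFF, the intermediate b + c wraps around, but both functions still return 0xFFFFFFFF.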

The optimized code
Still omitting the frame pointer, we now have the following modified code for the optimized function.

The prolog
(this slide intentionally blank)
No prolog code is necessary; we can use the stack exactly as it came to us:
    second argument (b)
    first argument (a)
    return point    <-- ESP
And address parameters off the unchanged ESP:
    A is at [ESP+4]
    B is at [ESP+8]

The body of the function
Code after the (empty) prolog:
    PUSH [ESP+8]         ; B
    CALL ISQRT
    ADD  EAX, [ESP+8]    ; A
    ADD  EAX, 4
Note that the reference to A was adjusted to account for the extra 4 bytes pushed on to the stack before the call to ISQRT.

The epilog
We pushed 4 extra bytes on to the stack, so we need to pop them off:
    ADD  ESP, 4
    RET
And that’s it, only 6 instructions in all.
Removing the frame pointer really helped here, since it saved 3 instructions and two memory references.

Other advantages of omitting FP
If we omit the frame pointer then we have an extra register.
For the x86, going from 6 to 7 available registers can make a real difference.
Of course we have to save and restore EBP to use it freely.
But that may well be worthwhile in a long function: anything to keep things in registers and save memory references is a GOOD THING!

Summary
Now you know what this gcc switch does.
But more importantly, if you understand what it does, you understand all about frame pointers and addressing of data in local frames.