C Programming and Assembly Language Janakiraman V – [email protected] NITK Surathkal 2 nd August 2014



C Programming and Assembly Language

Janakiraman V – jramaanv@yahoo.com

NITK Surathkal

2nd August 2014


Motivation

Do you know how all this is implemented in assembly?


Agenda

•Brief introduction to the 8086 processor architecture

•Describe commonly used assembly instructions

•Use of stack and related instructions

•Translate high-level function calls into low-level assembly language

•Get familiar with the calling conventions

•Explain how variables are passed and accessed


8086 Architecture

•ALU – Arithmetic and Logic Unit – The heart of the processor; performs the arithmetic and logical operations

•Control Unit – Decodes instructions and controls the execution flow

•Registers – Small, fast storage locations inside the processor that serve as operands for most instructions

•Flags – Most ALU operations set or clear particular flag bits after they execute


Registers

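•EAX – Accumulator; holds integer return values

•ECX – Counter for loops and REP-prefixed string instructions; also holds the this pointer in Microsoft C++ member function calls (thiscall)

•EIP – Instruction pointer. Holds the address of the next instruction to be executed

•ESP – Stack pointer. Changed implicitly by PUSH, POP, CALL and RET

•EBP – Base pointer. Used to access local variables and function parameters

(The E-prefixed 32-bit registers were introduced with the 80386; the original 8086 has their 16-bit counterparts AX, CX, IP, SP, BP and so on.)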


Registers Contd…

•EBX – A general purpose register

•ESI – Source index register for string instructions

•EDI – Destination index register for string instructions

•EFLAGS (EFL) – Flags register. Holds the individual flag bits such as Carry, Zero, Sign and Overflow

•Segment registers (CS, DS, SS, ES, FS, GS) select segments of memory; they are 16-bit registers without an E prefix

•EDX – Holds the high 32 bits of 64-bit values, e.g. the EDX:EAX product of MUL
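To make the EDX:EAX pairing concrete, here is a small C model of what a 32-bit unsigned MUL leaves in the two registers. It is only a sketch: `RegPair` and `mul32` are hypothetical names, and the C type `uint64_t` stands in for the processor's double-width result.

```c
#include <stdint.h>

/* Hypothetical model of the register pair written by a 32-bit MUL. */
typedef struct {
    uint32_t edx; /* high 32 bits of the product */
    uint32_t eax; /* low 32 bits of the product */
} RegPair;

/* Models "MOV EAX, a" followed by "MUL b": EDX:EAX = EAX * b (unsigned). */
RegPair mul32(uint32_t a, uint32_t b) {
    uint64_t product = (uint64_t)a * b;   /* widen first so no bits are lost */
    RegPair r;
    r.edx = (uint32_t)(product >> 32);    /* upper half goes to EDX */
    r.eax = (uint32_t)product;            /* lower half goes to EAX */
    return r;
}
```

For example, multiplying 0xFFFFFFFF by 2 gives EDX = 0x00000001 and EAX = 0xFFFFFFFE; for small operands such as 4 and 6, EDX is simply 0.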


Instruction Set

•Data transfer

•Arithmetic and logical

•Stack Operations

•Branching and Looping

•Function calls

•String Instructions

•Prefix to instructions


Data transfer instructions

MOV – MOV Destination, Source

» Data is always copied from RIGHT (source) to LEFT (destination).

» The source operand is unaffected.

LEA – Load effective address

» Loads the offset address of the specified operand into the destination register; memory is not read.

» Equivalent of int* y = &x;
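The MOV/LEA difference can be mirrored in C with an array element: MOV reads the value stored at an address, LEA only computes the address. In this sketch `mov_element` and `lea_element` are hypothetical helper names, and the register roles in the comments (EBX = base, ECX = index) are assumptions made for illustration.

```c
/* MOV EAX, [EBX+ECX*4]: reads the int stored at base + index * 4. */
int mov_element(const int *base, int index) {
    return base[index];
}

/* LEA EAX, [EBX+ECX*4]: computes the same address but never reads memory. */
const int *lea_element(const int *base, int index) {
    return &base[index];
}
```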


Arithmetic and Logical instructions

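•Format: OPERATION Destination, Source (the result is written to Destination)

»ADD AX, BX

»SUB AX, [BX]

»OR AX, [BX+4]

»XOR AX, AX – The usual idiom for clearing a register; its encoding is shorter than MOV AX, 0 (unlike MOV, it also updates the flags)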


Exercise 1

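Write an assembly program to evaluate the following expressions (all variables are 32-bit integers):

» EAX = x*y + a - b

» EBX = (x^y) | (a&b)

int x=4, y=6, a=3, b=2;
__asm
{
    MOV EAX, x      ; EAX = x
    MUL y           ; EDX:EAX = EAX * y (EDX is overwritten)
    ADD EAX, a      ; EAX = x*y + a
    SUB EAX, b      ; EAX = x*y + a - b
    MOV EBX, x      ; EBX = x
    XOR EBX, y      ; EBX = x ^ y
    MOV ECX, a      ; ECX = a
    AND ECX, b      ; ECX = a & b
    OR  EBX, ECX    ; EBX = (x ^ y) | (a & b)
}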
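A plain C reference computation is handy for checking what the __asm block leaves in EAX and EBX (for example, in a debugger's register window). This is a sketch: `exercise1` is a hypothetical helper, and unsigned 32-bit arithmetic is used so that overflow wraps the way ADD, SUB and the low half of MUL do.

```c
#include <stdint.h>

/* Hypothetical helper: the values the __asm block should leave in EAX and EBX. */
void exercise1(uint32_t x, uint32_t y, uint32_t a, uint32_t b,
               uint32_t *eax, uint32_t *ebx) {
    *eax = x * y + a - b;       /* low 32 bits of MUL, then ADD and SUB */
    *ebx = (x ^ y) | (a & b);   /* XOR, AND into ECX, then OR ECX into EBX */
}
```

With x=4, y=6, a=3, b=2 this gives EAX = 25 and EBX = 2.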


Branching and Looping

•JMP Addr – Loads EIP with Addr

•Conditional Jumps » Transfer control only if a condition holds

» The condition is the state of one or more flags

» ALU operations set the flags

JMP unconditionally transfers control to another location; a C goto statement is a typical example of its use.

Every conditional jump tests the state of one flag or, in some cases, of several flags. For example, JG (jump if greater, signed) is taken only when ZF = 0 and SF = OF.

Because ALU operations set or clear the flags, a conditional jump must be preceded by an instruction that sets the flags it tests: typically an ALU operation, or a comparison such as CMP, which subtracts without storing the result.
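How a comparison feeds a conditional jump can be modelled in C. The sketch below, with hypothetical names `Flags`, `cmp32`, `jnz_taken` and `jg_taken`, computes only the three flags needed here from a 32-bit subtraction and then applies the conditions for JNZ (ZF = 0) and JG (ZF = 0 and SF = OF).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags CMP a, b leaves behind.
   CMP computes a - b like SUB, discards the result and keeps only the flags. */
typedef struct {
    bool zf; /* zero flag: the result was 0 */
    bool sf; /* sign flag: copy of bit 31 of the result */
    bool of; /* overflow flag: the signed result did not fit in 32 bits */
} Flags;

Flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                 /* wraps modulo 2^32, like the ALU */
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    /* Subtraction overflows when the operands have different signs and the
       result's sign differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) & 1u;
    return f;
}

/* JNZ is taken when ZF = 0 (one flag). */
bool jnz_taken(Flags f) { return !f.zf; }

/* JG (jump if greater, signed) is taken when ZF = 0 and SF = OF (three flags). */
bool jg_taken(Flags f) { return !f.zf && (f.sf == f.of); }
```

Comparing 1 with INT32_MIN shows why JG needs OF as well as SF: 1 - INT32_MIN overflows, so SF = 1, but OF = 1 as well and the jump is still taken, since 1 is the greater value.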

Exercise 2

Write an assembly program to evaluate the expression "z = x * y" using

» Repeated addition

» MUL instruction

Write an assembly program to calculate the string length of a constant string

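Multiplication by repeated addition (assumes y >= 0).

int x = 9, y = 10, z = 0;
__asm
{
        XOR EAX, EAX    ; EAX = 0, the running sum
        MOV EBX, y      ; EBX = number of additions left
        TEST EBX, EBX   ; y == 0: skip the loop, otherwise DEC would wrap EBX
        JZ STORE
MULT:   ADD EAX, x      ; EAX += x
        DEC EBX         ; DEC sets ZF when EBX reaches 0
        JNZ MULT
STORE:  MOV z, EAX      ; z = x * y
}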
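The same loop written in C makes the role of each register explicit. This is a hypothetical mirror of the assembly (the function name `multiply_by_addition` is invented), with an explicit guard for y == 0: a DEC/JNZ loop entered with EBX = 0 would wrap around and run 2^32 times.

```c
#include <stdint.h>

/* Hypothetical C mirror of the repeated-addition loop:
   eax is the running sum, ebx the down-counter. */
uint32_t multiply_by_addition(uint32_t x, uint32_t y) {
    uint32_t eax = 0;        /* XOR EAX, EAX */
    uint32_t ebx = y;        /* MOV EBX, y   */
    if (ebx == 0)            /* guard: nothing to add when y == 0 */
        return eax;
    do {
        eax += x;            /* MULT: ADD EAX, x */
        ebx -= 1;            /* DEC EBX (sets ZF when EBX becomes 0) */
    } while (ebx != 0);      /* JNZ MULT */
    return eax;              /* MOV z, EAX */
}
```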

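String length of a constant string

const char* pChar = "Test data";
int len = 0;
__asm
{
          MOV EDI, pChar            ; EDI points at the first character
          XOR ECX, ECX              ; ECX = 0, the character count
COMPARE:  CMP BYTE PTR [EDI], 0     ; reached the terminating 0?
          JZ DONE
          INC ECX                   ; count this character
          INC EDI                   ; move to the next one
          JMP COMPARE
DONE:     MOV len, ECX
}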
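For comparison, the string-length loop reads naturally as a pointer walk in C. `string_length` is a hypothetical name; the local variables are named after the registers they stand in for.

```c
#include <stddef.h>

/* Hypothetical C mirror of the string-length loop:
   edi walks the string, ecx counts characters until the 0 terminator. */
size_t string_length(const char *p_char) {
    const char *edi = p_char;   /* MOV EDI, pChar */
    size_t ecx = 0;             /* XOR ECX, ECX   */
    while (*edi != '\0') {      /* COMPARE: CMP BYTE PTR [EDI], 0 */
        ecx++;                  /* INC ECX */
        edi++;                  /* INC EDI */
    }
    return ecx;                 /* DONE: MOV len, ECX */
}
```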


Stack Operations

PUSH: PUSH EAX

» ESP decreases by 4 (32-bit operand) or 2 (16-bit operand); single bytes cannot be pushed

» The operand is copied to the new top of stack

» Used extensively to pass parameters to functions

POP: POP EAX

» ESP increases by 4 or 2

» The value on top of the stack is copied to the destination

» Complement of PUSH

PUSH EAX: ESP is decremented by 4 and the 32-bit value of EAX is stored at the new [ESP]. Because x86 is little-endian, the least significant byte of EAX ends up at the address ESP points to and the most significant byte at ESP+3.

POP EAX: the 32-bit value at [ESP] is loaded into EAX (its least significant byte comes from the address ESP points to), and ESP is then incremented by 4.
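The byte-level behaviour of PUSH and POP can be checked with a small simulated stack. This is a model, not processor code: `Stack`, `stack_init`, `push32`, `pop32` and the 64-byte size are assumptions, and memory is a plain byte array that grows downward like the x86 stack.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64

/* Hypothetical model of a 32-bit x86 stack: a byte array plus an ESP offset.
   The stack grows toward lower offsets, as on x86. */
typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp;
} Stack;

/* Empty stack: ESP points just past the end of the array. */
void stack_init(Stack *s) { s->esp = STACK_BYTES; }

/* PUSH: decrement ESP by 4, then store the value little-endian at [ESP]. */
void push32(Stack *s, uint32_t value) {
    assert(s->esp >= 4);                                  /* model stack is full */
    s->esp -= 4;
    for (int i = 0; i < 4; i++)
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i)); /* LSB at lowest address */
}

/* POP: load the little-endian dword at [ESP], then increment ESP by 4. */
uint32_t pop32(Stack *s) {
    assert(s->esp + 4 <= STACK_BYTES);                    /* model stack is empty */
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)s->mem[s->esp + i] << (8 * i);
    s->esp += 4;
    return value;
}
```

After pushing 0x11223344, the byte at [ESP] is 0x44 and the byte at ESP+3 is 0x11, matching the little-endian layout described above.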

Exercise 3

Write an assembly program to swap two integers x and y.

Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language

Swap two integers.

int x=4, y=5;

__asm

{

PUSH x

PUSH y

POP x

POP y

}

Function to swap variables

void swap(int* pX, int* pY)

{

__asm

{

MOV EAX, pX

MOV EBX, pY

PUSH DWORD PTR [EAX]

PUSH DWORD PTR [EBX]

POP DWORD PTR [EAX]

POP DWORD PTR [EBX]

}

}

This is a very simple use of the stack, and it brings out the LIFO (last in, first out) principle.

The swap function works much like the previous example; the difference is that two MOV instructions are needed first, to load the pointers pX and pY into registers. The program then pushes the integers pointed to by pX and pY onto the stack and pops them back in reverse order to swap them. Notice that an intuitive statement such as PUSH [pX] ("push the contents pointed to by pX") cannot be used! The reason becomes clear once the access of parameters and local variables is explained.
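The LIFO swap can be sketched in C with a tiny array-based stack standing in for the processor stack. `IntStack`, `push`, `pop` and `swap_via_stack` are hypothetical names; the comments map each step to the corresponding instruction in the swap function above.

```c
#define MAX_DEPTH 8

/* Hypothetical LIFO stack of ints standing in for the processor stack. */
typedef struct {
    int slots[MAX_DEPTH];
    int top;               /* number of values currently on the stack */
} IntStack;

static void push(IntStack *s, int v) { s->slots[s->top++] = v; }
static int pop(IntStack *s) { return s->slots[--s->top]; }

void swap_via_stack(int *pX, int *pY) {
    IntStack s = { .top = 0 };
    push(&s, *pX);   /* PUSH DWORD PTR [EAX]  (EAX holds pX) */
    push(&s, *pY);   /* PUSH DWORD PTR [EBX]  (EBX holds pY) */
    *pX = pop(&s);   /* POP DWORD PTR [EAX]: last in (old *pY) comes out first */
    *pY = pop(&s);   /* POP DWORD PTR [EBX]: old *pX comes out second */
}
```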

Function calls

CALL – CALL ADDR » Used for function calls.

» Implicitly pushes the return address (the address of the instruction following the CALL, i.e. the current EIP) onto the stack.

» Loads EIP with ADDR, so execution continues there.

RET – RET n » Used to return to the calling function.

» Implicitly pops the DWORD on the top of stack (TOS) into EIP.

» The optional n is the number of bytes added to ESP after the return address is popped. Used for stack clean up.
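CALL and RET n can be traced with a simulated stack. In this sketch `Cpu`, `do_call` and `do_ret` are hypothetical, the addresses are illustrative (borrowed from the later slides), and the return address is passed in explicitly because the model has no instruction decoder to work out where the CALL ends.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical CPU state: EIP, ESP and a dword-addressed stack. */
typedef struct {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;   /* byte offset into stack; empty stack = 4 * STACK_WORDS */
    uint32_t eip;   /* address of the next instruction */
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4; c->stack[c->esp / 4] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->stack[c->esp / 4]; c->esp += 4; return v; }

/* CALL addr: push the return address (next_eip, the instruction after
   the CALL), then load EIP with addr. */
void do_call(Cpu *c, uint32_t next_eip, uint32_t addr) {
    push(c, next_eip);
    c->eip = addr;
}

/* RET n: pop the return address into EIP, then add n to ESP. */
void do_ret(Cpu *c, uint32_t n) {
    c->eip = pop(c);
    c->esp += n;
}
```

With the two PUSHes of Fn(2,4), a RET 8 brings ESP back to its value before the first PUSH; a plain RET would leave it 8 bytes lower.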


Compile the C program!!

int g_iVar = 5;

int Fn(int x, int y);   /* prototype: Fn is called before it is defined */

void main()
{
    int z=0;
    z = Fn(2,4);
    g_iVar = z;
}

int Fn(int x, int y)
{
    int z=0;
    z = x + y;
    return z;
}


C and assembly language - FAQ

•How are function calls in ‘C’ translated into assembly?

•How are parameters passed to the function?

•What does it mean to say local variables are stored on stack? Scope of local variables!

•How are global variables accessed?


C and Assembly language Contd….

•Only a few registers exist, so many parameters cannot be passed in registers

•Scope – A desirable feature: each call needs its own copy of the local variables

•Stack – Ideal for storing local variables

•ESP cannot be used to access the local variables, because it changes with every PUSH, POP, CALL and RET

•EBP is used to access them!

A function can have many local variables, but there is only ONE base pointer with which to access all of them. To achieve this, the local variables are stored contiguously on the stack and accessed with the "indirect addressing with offset" (base plus displacement) mode: if there are two local variables X and Y, X is accessed as [EBP-N] and Y as [EBP-N-4].

Function scope is a desirable feature, and that is why local variables are not accessed by fixed (direct) addresses: addressing them relative to EBP gives every call, including a recursive one, its own copy.

ESP is used implicitly by the processor for instructions such as PUSH, POP, CALL and RET, so it is not advisable to use ESP for random access to the stack.

Parameters, Local and Global variables

•Before a function is called parameters are pushed onto stack

•Parameters are accessed by [EBP +n]

•Local variables are accessed by [EBP –n]

•Integers are returned in EAX

•Global variables are accessed by direct address values
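The [EBP+n] and [EBP-n] rules can be written down as two tiny functions. This sketch assumes the usual prologue (PUSH EBP, then MOV EBP, ESP), 4-byte parameters and locals, and locals laid out one after another as in the earlier note; real compilers are free to order locals differently. `param_offset` and `local_offset` are hypothetical names.

```c
#include <stdint.h>

/* Frame layout after PUSH EBP / MOV EBP, ESP (all slots 4 bytes):
     [EBP+8], [EBP+12], ...  parameters, leftmost first
     [EBP+4]                 return address pushed by CALL
     [EBP+0]                 caller's saved EBP
     [EBP-4], [EBP-8], ...   locals, in the space reserved by SUB ESP, N */

/* Offset of the i-th parameter (0 = leftmost). Arguments are pushed right to
   left, so the leftmost one ends up closest to the return address. */
int32_t param_offset(int i) { return 8 + 4 * i; }

/* Offset of the j-th 4-byte local (0 = first), assuming consecutive layout. */
int32_t local_offset(int j) { return -4 - 4 * j; }
```

So the first parameter is at [EBP+8], the second at [EBP+12], and the first local at [EBP-4].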


Compile the C program Contd…

void main()
{
    int z=0;
        MOV z, 0
    z = Fn(2,4);
        PUSH 0x00000004
        PUSH 0x00000002
        CALL Fn
        MOV z, EAX
    g_iVar = z;
        MOV [g_iVar], EAX
}

int Fn(int x, int y)
{
    int z=0;
        MOV z, 0
    z = x + y;
        MOV EAX, x
        ADD EAX, y
        MOV z, EAX
    return z;
        RET
}


Compile the C Program Contd….

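CODE SEGMENT – Function – main()

.

int z = 0;
C100 MOV [EBP-4], 0

z = Fn(2,4);
C101 PUSH 0x00000004
C102 PUSH 0x00000002
C103 CALL C200
C104 MOV [EBP-4], EAX

g_iVar = z;
C105 MOV [g_iVar], EAX

.
.

STACK SEGMENT (higher addresses at the top; ESP moves down with each PUSH and with the CALL)

0x00000000   local var z of main()   <- EBP-4
0x00000004   pushed at C101
0x00000002   pushed at C102
C104         return address pushed by CALL   <- ESP on entry to Fn()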


Compile the C Program Contd….

CODE SEGMENT – Function – Fn()

C200 MOV EBP, ESP
C201 SUB ESP, 0x40

int z=0;
C202 MOV [EBP-4], 0

z = x + y;
C203 MOV EAX, [EBP+4]
C204 ADD EAX, [EBP+8]
C205 MOV [EBP-4], EAX

return z;
C206 ADD ESP, 0x40
C207 RET

STACK SEGMENT (higher addresses at the top)

0x00000000   local var z of main()
0x00000004   y   [EBP+8]
0x00000002   x   [EBP+4]
C104         return address   <- EBP after C200 MOV EBP, ESP
0x00000006   local var z of Fn()   [EBP-4] (0 after C202, 6 after C205)
...          local variable space (0x40 bytes)   <- ESP after C201 SUB ESP, 0x40

The base pointer, EBP, will be used to access all local variables in this function. At this point it still points to the local variables of the previous function (main), so it has to be set to point to the current function's stack.

Space has to be allocated on the stack for the local variables. This is done simply by subtracting from the stack pointer, ESP. If this is not done, any PUSH operation will overwrite the local variables. The compiler often allocates more than what is required.

The space allocated for the local variables has to be deallocated, which is achieved by adding to the stack pointer the same number of bytes that was subtracted while allocating. In this case it is 0x40 bytes.

CODE SEGMENT – Function – main()

.

int z = 0;
C100 MOV [EBP-4], 0

z = Fn(2,4);
C101 PUSH 0x00000004
C102 PUSH 0x00000002
C103 CALL C200
C104 MOV [EBP-4], EAX

g_iVar = z;
C105 MOV [g_iVar], EAX
C106 RET

STACK SEGMENT (after Fn() returns with a plain RET)

0x00000000   local var z of main()
0x00000004
0x00000002   <- ESP
C104         <- EBP (still Fn()'s base pointer)
0x00000006   [EBP-4]: Fn()'s local z, which C104 MOV [EBP-4], EAX writes instead of main()'s z

Stack corruption!!!!!
You have accessed the stack of the function "Fn()".
Your computer will now REBOOT!!!!!

The base pointer, EBP, is still pointing to the stack of the function Fn(). It has to be reset to point to the current function's stack, but clearly we no longer have the current function's EBP value! Something should have been done to save main()'s EBP before Fn() changed it. As mentioned earlier, the stack can be used as temporary storage without worrying about where exactly the data is stored. Therefore, before setting EBP in Fn() we have to PUSH the current EBP, and just before returning POP it back to restore the caller's stack state.

The stack pointer, ESP, is currently pointing to a location that holds 0x00000002. If we execute a RET at this point, it will pop 0x00000002 into EIP and resume execution at that address, which is not valid code (under a protected-mode OS such as Windows this typically raises an access violation). Like Fn(), main() also has initialization code to set up the stack and allocate space for its local variables, and before it returns it undoes everything that code did. When the OS called main(), it PUSHed the return address onto the stack. But we pushed two parameters to call Fn() and never removed them, which is what causes the stack corruption. We have to undo that too, so that the stack is not corrupted and control returns to the OS properly. This is known as STACK CLEAN UP.


Compile the C Program Contd….

CODE SEGMENT – Function – Fn()

C200 PUSH EBP
C202 MOV EBP, ESP
C203 SUB ESP, 0x40

int z=0;
C204 MOV [EBP-4], 0

z = x + y;
C205 MOV EAX, [EBP+8]
C206 ADD EAX, [EBP+12]
C207 MOV [EBP-4], EAX

return z;
C208 ADD ESP, 0x40
C209 POP EBP
C20A RET 8

STACK SEGMENT (higher addresses at the top)

0x00000000   local var z of main()
0x00000004   y   [EBP+12]
0x00000002   x   [EBP+8]
C104         return address   [EBP+4]
EBP - main() saved by C200 PUSH EBP   <- EBP after C202 MOV EBP, ESP
0x00000006   local var z of Fn()   [EBP-4]
...          local variable space (0x40 bytes)   <- ESP after C203 SUB ESP, 0x40


CODE SEGMENT – Function – main()

.

int z = 0;
C100 MOV [EBP-4], 0

z = Fn(2,4);
C101 PUSH 0x00000004
C102 PUSH 0x00000002
C103 CALL C200
C104 MOV [EBP-4], EAX

g_iVar = z;
C105 MOV [g_iVar], EAX
C106 Epilogue

STACK SEGMENT (after Fn() returns with RET 8)

0x00000006   local var z of main()   <- EBP-4 (0 before C104, 6 after); EBP is main()'s again
             <- ESP is back where it was before C101
0x00000004, 0x00000002, C104: left below ESP, no longer part of the stack


Function calls in C - Summary

Function call gets translated to CALL addr

Prologue

» Store the current EBP on stack

» Set up the stack - Initialize the EBP

» Allocate space for local variables.

Execute the function accordingly

Epilogue

» Set ESP back to its original value (deallocate the local variables)

» Set EBP back to its original value (POP EBP)

» Return with RET (or RET N)

Apart from the stack setup above, the prologue may contain additional instructions that save registers on the stack, and the epilogue then restores them. The space allocated for local variables may also be initialized to a particular value: the Microsoft compiler fills it with 0xCCCCCCCC in DEBUG builds (part of its /RTC run-time checks).

Stack clean up

•When? » As the function returns (RET N) or right after the call returns (ADD ESP, N)

•Why? » Undo the effect of pushing the parameters

•How? » RET N in the callee or ADD ESP, N in the caller


C Program Assembly Contd…

void main()

{

int z = 0;

z = Function(2, 4);

}

/*Contd……*/

Prologue

MOV [EBP-4], 0

PUSH 0x00000004

PUSH 0x00000002

CALL Function

MOV [EBP-4], EAX

Epilogue

Contd……


C Program Assembly Contd…

int Function(int a, int b)

{

int c=0;

c = a + b;

return c;

}

PUSH EBP

MOV EBP, ESP --------- Prologue

SUB ESP, N

MOV [EBP-4], 0

MOV EAX, [EBP + 8] --- Body

ADD EAX, [EBP+12]

MOV [EBP-4], EAX

ADD ESP, N

POP EBP ----------------- Epilogue

RET 8
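Putting the pieces together, the whole call can be traced in C on a simulated stack, following the instructions on these two slides one by one. Everything here is an assumption made for illustration: the `Machine` struct and helper names, the starting ESP (160) and EBP (192) byte offsets chosen to leave room for main's own locals, the 0x40-byte local area standing in for N, and the return address 0xC104 borrowed from the earlier slides.

```c
#include <stdint.h>

#define STACK_WORDS 64
#define LOCALS 0x40u  /* the N in SUB ESP, N; 0x40 as on the earlier slides */

/* Hypothetical machine state: a dword-addressed stack plus four registers.
   esp and ebp are byte offsets into mem. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eax, eip;
} Machine;

static uint32_t *at(Machine *m, uint32_t addr) { return &m->mem[addr / 4]; }
static void push(Machine *m, uint32_t v) { m->esp -= 4; *at(m, m->esp) = v; }
static uint32_t pop(Machine *m) { uint32_t v = *at(m, m->esp); m->esp += 4; return v; }

/* int Function(int a, int b): prologue, body and epilogue from the slide. */
static void run_function(Machine *m) {
    push(m, m->ebp);                 /* PUSH EBP            (prologue) */
    m->ebp = m->esp;                 /* MOV EBP, ESP */
    m->esp -= LOCALS;                /* SUB ESP, N */
    *at(m, m->ebp - 4) = 0;          /* MOV [EBP-4], 0      c = 0 (body) */
    m->eax = *at(m, m->ebp + 8);     /* MOV EAX, [EBP+8]    EAX = a */
    m->eax += *at(m, m->ebp + 12);   /* ADD EAX, [EBP+12]   EAX += b */
    *at(m, m->ebp - 4) = m->eax;     /* MOV [EBP-4], EAX    c = a + b */
    m->esp += LOCALS;                /* ADD ESP, N          (epilogue) */
    m->ebp = pop(m);                 /* POP EBP */
    m->eip = pop(m);                 /* RET 8: pop the return address... */
    m->esp += 8;                     /* ...then drop the two parameters */
}

/* main's part of z = Function(2, 4); returns z read back from [EBP-4]. */
uint32_t trace_call(Machine *m) {
    *at(m, m->ebp - 4) = 0;          /* MOV [EBP-4], 0      z = 0 */
    push(m, 4);                      /* PUSH 0x00000004     (rightmost first) */
    push(m, 2);                      /* PUSH 0x00000002 */
    push(m, 0xC104);                 /* CALL Function: push the return address */
    run_function(m);
    *at(m, m->ebp - 4) = m->eax;     /* MOV [EBP-4], EAX    z = result */
    return *at(m, m->ebp - 4);
}
```

Traced this way, EAX and main's z end up as 6, and ESP and EBP return to exactly the values they had before the call, which is what the prologue, the epilogue and RET 8 are for.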


Calling conventions

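__cdecl » Default calling convention of C functions on 32-bit x86 compilers such as Microsoft's

» Arguments are pushed right to left

» Needed for variable argument lists

» Caller cleans the stack - ADD ESP, N instruction

__stdcall » Arguments are pushed right to left

» Callee cleans the stack - RET N instruction

» Can give smaller code than __cdecl, since the clean up is done once in the callee instead of after every call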

Contd……
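Why a variable argument list needs caller clean up is easiest to see from the callee's side. A function like the one below cannot end with a fixed RET N, because each call site may pass a different number of arguments; only the caller knows how many bytes it pushed. The sketch uses the standard `<stdarg.h>` macros; `sum_ints` is a hypothetical name, and passing the count as the first argument is just this function's own convention.

```c
#include <stdarg.h>

/* Hypothetical variadic function: sums `count` int arguments.
   Different calls pass different numbers of arguments, so the clean up
   amount is known only at each call site. */
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next int argument */
    va_end(args);
    return total;
}
```

sum_ints(3, 1, 2, 3) returns 6 and sum_ints(1, 42) returns 42; the two calls pass different numbers of arguments to the same function body.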


Back to Exercise 3

Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language

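Function to swap variables (does not work: pX is itself a memory operand, the parameter at [EBP+8], so [pX] refers to the pointer, not to the integer it points to)

void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [pX]
        PUSH DWORD PTR [pY]
        POP DWORD PTR [pX]
        POP DWORD PTR [pY]
    }
}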

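Function to swap variables (works: each pointer is first loaded into a register)

void swap(int* pX, int* pY)
{
    __asm
    {
        MOV EAX, DWORD PTR [EBP+8]    ; EAX = pX (first parameter)
        MOV EBX, DWORD PTR [EBP+12]   ; EBX = pY (second parameter)
        PUSH DWORD PTR [EAX]          ; push *pX
        PUSH DWORD PTR [EBX]          ; push *pY
        POP DWORD PTR [EAX]           ; *pX = old *pY
        POP DWORD PTR [EBX]           ; *pY = old *pX
    }
}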

Function to swap variables (does not assemble)

void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [[EBP+8]]
        PUSH DWORD PTR [[EBP+12]]
        POP DWORD PTR [[EBP+8]]
        POP DWORD PTR [[EBP+12]]
    }
}

Double indirection is not a valid x86 addressing mode

Now the reason is clear: the parameters pX and pY live on the stack at [EBP+8] and [EBP+12]. Reaching the integers they point to takes two memory accesses, one to read the pointer and one to read the integer, but an x86 memory operand is formed only from registers and a constant displacement; a value read from memory cannot serve as an address within the same instruction. That is why PUSH [pX] cannot be used, and why the two MOV instructions are needed to bring the pointers into registers first.

What about C++?

struct stTest

{

int x;

int y;

};

void FnTest(stTest* pSt)

{

pSt->x = 0;

pSt->y = 1;

}

void main()

{

stTest obj;

FnTest(&obj);

}

class clsTest

{

int x;

int y;

public:

void FnTest()

{

x = 0;y=1;

}

};

void main()

{

clsTest obj;

obj.FnTest();

}


Calling convention Contd…

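thiscall – The C++ calling convention for member functions (Microsoft compiler)

» Arguments are pushed right to left, as with __cdecl and __stdcall

» The this pointer is passed in the ECX register

» The callee cleans the stack, as with __stdcall (member functions with a variable argument list use __cdecl instead)

» The compiler typically saves the this pointer to a local slot on the stack, such as [EBP-4]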


String Instructions

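•Use ESI (source) and/or EDI (destination) as implicit operands.

•After each operation ESI and EDI are automatically incremented (direction flag clear, CLD) or decremented (direction flag set, STD) by the operand size: 1, 2 or 4 bytes.

•Usually used with the prefix instructions.

•Very efficient for common loops such as copying, filling and scanning memory.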


Prefix to instructions

REP – REP MOVSB » Used to repeat an instruction unconditionally

» Implicitly decrements ECX by 1 after each execution

» Stops once ECX = 0 (if ECX is 0 to begin with, the instruction is not executed at all)

REPNE/ REPE – REPE SCASB » Used to repeat an instruction conditionally

» Implicitly decrements ECX by 1 after each execution

» Stops once ECX = 0, or when the ZERO flag is reset (REPE, "repeat while equal") or set (REPNE, "repeat while not equal")
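The REP MOVSB behaviour, including the direction flag, can be modelled in C. This is a sketch only: `StrState` and `rep_movsb` are hypothetical, ESI and EDI are offsets into a small byte array rather than real addresses, and `df` stands for the direction flag.

```c
#include <stdint.h>

#define MEM_SIZE 32

/* Hypothetical register/memory state for modelling string instructions.
   esi and edi are byte offsets into mem; df is the direction flag. */
typedef struct {
    uint8_t mem[MEM_SIZE];
    uint32_t esi, edi, ecx;
    int df;        /* 0 after CLD: offsets increase; 1 after STD: offsets decrease */
} StrState;

/* REP MOVSB: while ECX != 0, copy [ESI] to [EDI], step both, decrement ECX. */
void rep_movsb(StrState *s) {
    while (s->ecx != 0) {                 /* REP tests ECX before each iteration */
        s->mem[s->edi] = s->mem[s->esi];  /* MOVSB moves one byte */
        if (s->df == 0) { s->esi++; s->edi++; }
        else            { s->esi--; s->edi--; }
        s->ecx--;                         /* implicit decrement of ECX */
    }
}
```

Setting df = 1 and starting at the last byte lets an overlapping copy to a higher address work correctly, which is the classic reason to run a copy backwards.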


Optimized C functions

•memcpy

•strlen

•memset
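Two classic idioms show how such functions can be built from string instructions: strlen with REPNE SCASB and memset with REP STOSB. The C functions below are hypothetical models of those instruction sequences with the direction flag clear; they are not the actual library implementations, which vary by compiler and processor.

```c
#include <stdint.h>

/* strlen via XOR AL, AL / MOV ECX, -1 / REPNE SCASB / NOT ECX / DEC ECX. */
uint32_t strlen_repne_scasb(const uint8_t *edi) {
    uint8_t al = 0;                 /* XOR AL, AL: the byte to search for */
    uint32_t ecx = 0xFFFFFFFFu;     /* MOV ECX, -1: effectively no limit */
    int zf = 0;
    while (ecx != 0) {              /* REPNE stops when ECX = 0 ... */
        zf = (*edi == al);          /* SCASB: compare AL with [EDI], set ZF */
        edi++;                      /* advance EDI */
        ecx--;                      /* decrement ECX */
        if (zf) break;              /* ... or when ZF = 1 (found the 0) */
    }
    ecx = ~ecx;                     /* NOT ECX: bytes scanned, terminator included */
    return ecx - 1;                 /* DEC ECX: drop the terminator */
}

/* memset via MOV AL, value / MOV ECX, count / REP STOSB. */
void memset_rep_stosb(uint8_t *edi, uint8_t al, uint32_t ecx) {
    while (ecx != 0) {              /* REP: repeat while ECX != 0 */
        *edi++ = al;                /* STOSB: store AL at [EDI], advance EDI */
        ecx--;
    }
}
```

In the strlen idiom ECX starts at 0xFFFFFFFF and drops by one for every byte scanned, terminator included, so NOT ECX gives the number of bytes scanned and DEC ECX removes the terminator from the count.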