CS252: Systems Programming
Ninghui Li
Based on Slides by Prof. Gustavo Rodriguez-Rivera
Topic 2: Program Structure and Using GDB
What Happens From a C Source Program, to Program Execution
Building a program, i.e., generating an executable file from source code
What are the steps?
What does an executable file look like?
Loading a program
Each time you execute a program, a process is created.
In Unix/Linux, use the “ps” command to show the processes in the system.
Building a Program
The programmer writes a program hello.c.
The preprocessor expands #define, #include, #ifdef, and other preprocessor directives and generates a hello.i file.
The compiler compiles hello.i, optimizes it, and generates an assembly instruction listing hello.s.
The assembler (as) assembles hello.s and generates an object file hello.o.
The compiler (cc or gcc) by default hides all these intermediate steps. You can use compiler options to run each step independently.
Original file hello.c
#include <stdio.h>
int main(void)
{
    printf("Hello\n");
    return 0;
}
After preprocessor
gcc -E hello.c > hello.i (-E stops the compiler after running the preprocessor)
hello.i:
/* Expanded /usr/include/stdio.h */
typedef void *__va_list;
typedef struct __FILE __FILE;
typedef int ssize_t;
struct FILE {…};
extern int fprintf(FILE *, const char *, ...);
extern int fscanf(FILE *, const char *, ...);
extern int printf(const char *, ...);
/* and more */
int main(void)
{
    printf("Hello\n");
    return 0;
}
After compiler
gcc -S hello.c (-S stops the compiler after generating assembly code)
Resulting file is hello.s
Actual code depends on the system
LC0:
    .ascii "Hello\0"
    .text
    .globl _main
    .def _main; .scl 2; .type 32; .endef
_main:
    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp
    subl $16, %esp
    call ___main
    movl $LC0, (%esp)
    call _puts
    leave
    ret
After compiling & assembling
“gcc -c hello.c” generates hello.o.
The main function already has a value (its offset in the text section) in the object file hello.o.
hello.o has undefined symbols, such as the _puts function that main calls, whose location is not yet known.
The “nm” command lists the symbols in object files.
Output of “nm hello.o”
0000000000000000 b .bss    // uninitialized data
0000000000000000 d .data   // global and static vars
0000000000000000 t .text
                 U __main  // entry point of program
0000000000000000 T main    // main function defined in code
                 U puts
__main and puts are undefined in “hello.o”
They are provided by the libraries
Building a program (continued)
The linker puts together all object files as well as the object files in static libraries.
The linker also takes the definitions in shared libraries and verifies that the symbols (functions and variables) needed by the program are completely satisfied.
If a symbol is not defined in either the executable or the shared libraries, the linker gives an error.
Static libraries (.a files) are added to the executable. Shared libraries (.so files) are not added to the executable file.
Static and Shared Libraries
Shared libraries are shared across different processes.
The code of each shared library is loaded once and shared by every process in the system that uses it.
Static libraries are not shared.
Each executable linked against a static library carries its own copy, so every process has its own instance.
After linking
“gcc -o hello hello.c” generates the hello executable.
The hello.o object code is statically linked with libraries to include the code of library functions.
In linking, “static = compilation/building time”.
Sometimes not all functions’ code is included; some code is stored in shared libraries and dynamically linked.
In linking, “dynamic = loading/execution time”.
Building a Program
Programmer → Editor → hello.c → C Preprocessor → hello.i → Compiler (cc) and Optimizer → hello.s → Assembler (as) → hello.o → (static) Linker (ld) → Executable File (hello)
Other inputs to the linker: other .o files and static libraries (.a files). They add to the size of the executable.
What is a program?
A program is a file in a special format that contains all the necessary information to load an application into memory and make it run.
A program file includes:
• machine instructions
• initialized data
• a list of library dependencies
• a list of memory sections that the program will use
• a list of undefined values in the executable that will not be known until the program is loaded into memory
Executable File Formats
There are different executable file formats:
ELF – Executable and Linkable Format
It is used in most UNIX systems (Solaris, Linux).
You can use elfdump to see information in a binary file.
COFF – Common Object File Format
It is used in Windows systems (as the basis of the PE format).
a.out – Used in BSD (Berkeley Software Distribution) and early UNIX
It was very restrictive. It is not used anymore.
Note: BSD UNIX and AT&T UNIX are the predecessors of modern UNIX flavors like Solaris; Linux is a UNIX-like system in the same tradition.
Loading a Program
After one types hello in a shell, the shell creates a new process and loads the file hello.
The loader is a program that is used to run an executable file in a process.
Before the program starts running, the loader allocates space for all the sections of the executable file (text, data, bss etc)
It loads into memory the executable and shared libraries (if not loaded yet)
Loading a Program
It also writes (resolves) any values in the executable to point to the functions/variables in the shared libraries (e.g., calls to printf in hello.c).
Once the memory image is ready, the loader jumps to the _start entry point, which calls init() of all libraries and initializes static constructors. Then it calls main() and the program begins.
_start also calls exit() when main() returns.
The loader is also called the “runtime linker”.
Loading a Program
Diagram: the loader (runtime linker, /usr/lib/ld.so.1) takes the executable file and the shared libraries (.so, .dll) and produces the executable in memory.
Memory Structure of a Process
Memory of a Process
A 32-bit process sees memory as an array of bytes that goes from address 0 to 2^32-1 (0 to 4GB-1).
Memory Sections
The memory is organized into sections called “memory mappings”.
Layout from address 0 up to 2^32-1: Text, Data, Bss, Heap, Shared Libs, Stack.
Memory Sections
Each section has different permissions: read/write/execute or a combination of them.
Text – Instructions that the program runs.
Data – Initialized global variables.
Bss – Uninitialized global variables. They are initialized to zeroes.
Heap – Memory returned when calling malloc/new. It grows upwards.
Stack – It stores local variables and return addresses. It grows downwards.
Memory Sections
Dynamic libraries – They are libraries shared with other processes. Each dynamic library has its own text, data, and bss.
Each program has its own view of the memory that is independent of each other.
This is virtual memory, mapped by the OS to physical memory.
This view is called the “Address Space” of the program.
If a process modifies a byte in its own address space, it will not modify the address space of another process.
Where things are located
Program hello.c
#include <stdlib.h>

int a = 5;      // Stored in data section
int b[20];      // Stored in bss
int main() {    // Stored in text
    int x;      // Stored in stack
    int *p = (int*) malloc(sizeof(int));    // In heap
    return 0;
}
Memory Gaps
Between each memory section there may be gaps that do not have any memory mapping.
If the program tries to access a memory gap, the OS will send a SEGV signal that by default kills the program and dumps a core file.
The core file contains the values of the global and local variables at the time of the SEGV.
The core file can be used for “post mortem” debugging.
gdb program-name core
(gdb) where
Using a Debugger
What is GDB
GDB is a debugger that helps you debug your program.
The time you spend now learning gdb will save you days of debugging time.
A debugger will make a good programmer a better programmer.
Compiling a program for gdb
You need to compile with the “-g” option to be able to debug a program with gdb.
The “-g” option adds debugging information to your program
gcc -g -o hello hello.c
Running a Program with gdb
To run a program with gdb, type
gdb progname
(gdb)
Then set a breakpoint in the main function.
(gdb) break main
A breakpoint is a marker in your program that will make the program stop and return control back to gdb.
Now run your program.
(gdb) run
If your program has arguments, you can pass them after run.
Stepping Through your Program
Your program will start running and when it reaches “main()” it will stop.
(gdb)
Now you have the following commands to run your program step by step:
(gdb) step
It will run the next line of code and stop. If it is a function call, it will enter the function.
(gdb) next
It will run the next line of code and stop. If it is a function call, it will not enter the function; it will run the whole call and stop at the following line.
Example:
(gdb) step
(gdb) next
Setting breakpoints
You can set breakpoints in a program in multiple ways:
(gdb) break function
Set a breakpoint in a function. E.g.
(gdb) break main
(gdb) break line
Set a breakpoint at a line in the current file. E.g.
(gdb) break 66
It will set a breakpoint at line 66 of the current file.
(gdb) break file:line
It will set a breakpoint at a line in a specific file. E.g.
(gdb) break hello.c:78
Regaining the Control
When you type (gdb) run
the program will start running and it will stop at a break point.
If the program is running without stopping, you can regain control by typing Ctrl-C.
Where is your Program
The command
(gdb) where
will print the current function being executed and the chain of functions that are calling that function.
This is also called the backtrace.
Example:
(gdb) where
#0 main () at test_mystring.c:22
(gdb)
Printing the Value of a Variable
The command
(gdb) print var
prints the value of a variable. E.g.
(gdb) print i
$1 = 5
(gdb) print s1
$2 = 0x10740 "Hello"
(gdb) print stack[2]
$3 = 56
(gdb) print stack
$4 = {0, 0, 56, 0, 0, 0, 0, 0, 0, 0}
(gdb)
Exiting gdb
The command “quit” exits gdb.
(gdb) quit
The program is running. Exit anyway? (y or n) y
Debugging a Crashed Program
This is also called “postmortem debugging”.
It has nothing to do with CSI.
When a program crashes, it writes a core file.
bash-4.1$ ./hello
Segmentation Fault (core dumped)
bash-4.1$
The core is a file that contains a snapshot of the program at the time of the crash. That includes what function the program was running.
Debugging a Crashed Program
To run gdb on a crashed program, type
gdb program core
E.g.
bash-4.1$ gdb hello core
GNU gdb 6.6
Program terminated with signal 11, Segmentation fault.
#0 0x000106cc in main () at hello.c:11
11 *s2 = 9;
(gdb)
Now you can type where to find out where the program crashed, and print to see the value of the variables at the time of the crash.
(gdb) where
#0 0x000106cc in main () at hello.c:11
(gdb) print s2
$1 = 0x0
(gdb)
This tells you why your program crashed. Isn’t that great?
Now Try gdb in Your Own Program
Make sure that your program is compiled with the -g option.
Remember: One hour you spend learning gdb will save you days of debugging.
Faster development, less stress, better results.
Stack Buffer Overflow
Call Stack
• Also known as the execution stack, control stack, run-time stack, or machine stack
• Why do we need to use stacks in processes?
• To support function calls, and especially recursive function calls.
• What is stored on the stack?
• Function call parameters
• Local variables
• Return address
• Saved state information
Stack Frame
Stack frame layout, from high address to low address: Parameters, Return address, Saved Stack Frame Pointer, Local variables. SP points to the low end of the frame; the stack grows toward low addresses.
Code Fragment for Printing Stack Frame (from prstack.c)
int fac(int a, int p) {
    char f[8] = "       ";
    int b = 0;
    // print stack frame
    gets(f); // buffer may overflow
    if (a == 1) { b = 1; }
    else { b = a * fac(a-1,p); }
    // print stack frame again
    return b;
}
int main(int argc, char*argv[]) {
    int n;
    int r;
    if (argc == 2) {
        n = atoi(argv[1]);
        r = fac(n, 0);
    } else if (argc == 3) {
        n = atoi(argv[2]);
        r = fac(n, 1);
    }
    return 0;
}
Code Fragment for Printing Stack Frame (from prstack.c)
int fac(int a, int p) {
    char f[8] = "       "; int b = 0;
    printf("Address %p: argument int p: 0x%.8x\n", &p, p);
    printf("Address %p: argument int a: 0x%.8x\n", &a, a);
    printf("Address %p: return address : 0x%.8x\n", &a-1, *(&a-1));
    printf("Address %p: saved stack frame p: 0x%.8x\n", &a-2, *(&a-2));
    printf("Address %p: local var f[4-7] : 0x%.8x\n", (char *)(&f)+4,
           *((int *)(&f[4])));
    printf("Address %p: local var f[0-3] : 0x%.8x\n", &f, *((int *)f));
    printf("Address %p: local var int b: 0x%.8x\n", &b, b);
    printf("Address %p: gap : 0x%.8x\n", &b-1, *(&b-1));
    …
}
Printed Stack Frame
Entering function call fac(a=2), code at 0x080484a5
Address 0xff98942c: argument int p: 0x00000001
Address 0xff989428: argument int a: 0x00000002
Address 0xff989424: return address : 0x0804860e
Address 0xff989420: saved stack frame p: 0xff989440
Address 0xff98941c: local var f[4-7] : 0x00202020
Address 0xff989418: local var f[0-3] : 0x20202020
Address 0xff989414: local var int b: 0x00000000
Address 0xff989410: gap : 0x00000000
Entering function call fac(a=1), code at 0x080484a5
Address 0xff98940c: argument int p: 0x00000001
Address 0xff989408: argument int a: 0x00000001
Address 0xff989404: return address : 0x0804860e
Address 0xff989400: saved stack frame p: 0xff989420
Address 0xff9893fc: local var f[4-7] : 0x00202020
Address 0xff9893f8: local var f[0-3] : 0x20202020
Address 0xff9893f4: local var int b: 0x00000000
Address 0xff9893f0: gap : 0x00000000
Stack Frame with Overflowed Buffer
Entering function call fac(a=1), code at 0x080484a5
Address 0xffd5724c: argument int p: 0x00000001
Address 0xffd57248: argument int a: 0x00000001
Address 0xffd57244: return address : 0x0804860e
Address 0xffd57240: saved stack frame p: 0xffd57260
Address 0xffd5723c: local var f[4-7] : 0x00202020
Address 0xffd57238: local var f[0-3] : 0x20202020
Address 0xffd57234: local var int b: 0x00000000
Address 0xffd57230: gap : 0x00000000
123456789012345
Leaving function call fac(a=1), code at 0x80484a5
Address 0xffd5724c: argument int p: 0x00000001
Address 0xffd57248: argument int a: 0x00000001
Address 0xffd57244: return address : 0x00353433
Address 0xffd57240: saved stack frame p: 0x32313039
Address 0xffd5723c: local var f[4-7] : 0x38373635
Address 0xffd57238: local var f[0-3] : 0x34333231
Address 0xffd57234: local var int b: 0x00000001
Address 0xffd57230: gap : 0x00000001
Segmentation fault (core dumped)
Overflowing f to overwrite saved sfp and return address.
Input 15 bytes.
What does a function do?
fac
0x080484a5 <+0>: push %ebp    save stack frame pointer (fp)
0x080484a6 <+1>: mov %esp,%ebp    set current stack fp
0x080484a8 <+3>: sub $0x18,%esp    allocate space for local var
0x080484ab <+6>: movl $0x20202020,-0x8(%ebp)    initialize f[0-3]
0x080484b2 <+13>: movl $0x202020,-0x4(%ebp)    initialize f[4-7]
0x080484b9 <+20>: movl $0x0,-0xc(%ebp)    initialize b
0x080484c0 <+27>: mov 0xc(%ebp),%eax    load value of p to eax
0x080484c3 <+30>: test %eax,%eax    check if eax is 0
0x080484c5 <+32>: je 0x80485e8 <fac+323>    if so, skip printing frame
....
0x080485e8 <+323>: mov 0x8(%ebp),%eax    load value of a to eax
0x080485eb <+326>: cmp $0x1,%eax    check if a==1
0x080485ee <+329>: jne 0x80485f9 <fac+340>    if not, call fac
0x080485f0 <+331>: movl $0x1,-0xc(%ebp)    otherwise, assigns 1 to b
0x080485f7 <+338>: jmp 0x8048617 <fac+370>
….
0x08048609 <+356>: call 0x80484a5 <fac>
0x0804860e <+361>: mov 0x8(%ebp),%edx
0x08048611 <+364>: imul %edx,%eax
GDB commands for examining stack frames
• backtrace (bt): print all frames
• frame (f): print brief current frame info
• info frame (info f): print detailed current frame info
See http://web.mit.edu/gnu/doc/html/gdb_8.html for more
What is Buffer Overflow?
A buffer overflow, or buffer overrun, is an anomalous condition where a process attempts to store data beyond the boundaries of a fixed-length buffer.
The result is that the extra data overwrites adjacent memory locations. The overwritten data may include other buffers, variables and program flow data, and may result in erratic program behavior, a memory access exception, program termination (a crash), incorrect results or ― especially if deliberately caused by a malicious user ― a possible breach of system security.
Most common with C/C++ programs
History
Used in the Morris Internet Worm in 1988
Aleph One’s “Smashing The Stack For Fun And Profit” in Phrack Issue 49 in 1996 popularized stack buffer overflows
Still extremely common today
What are buffer overflows?
Suppose a web server contains a function:
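void func(char *str) {
    char buf[128];
    strcpy(buf, str);
    do_something(buf);
}
When the function is invoked, the stack looks like this (from the top of the stack at low addresses toward high addresses):
buf | sfp | ret-addr | str
What if *str is 136 bytes long? After strcpy, the 128 bytes of buf, the saved frame pointer (sfp), and the return address are all overwritten with bytes from *str (assuming a 4-byte sfp and a 4-byte return address):
*str | ret | str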
Basic stack exploit
Main problem: no range checking in strcpy().
Suppose *str is such that after strcpy the stack looks like this (starting from the top of the stack):
*str | ret | Code for P
Program P: exec( “/bin/sh” )
(exact shell code by Aleph One)
When func() exits, the user will be given a shell!!
Note: attack code runs in stack.
Carrying out this attack requires
Determining the position of the injected code on the stack when func() is called, so as to change the stored return address on the stack to point to it.
The location of the injected code is fixed relative to the location of the stack frame.
Program P should not contain the ‘\0’ character. This is easy to achieve.
The overflow should not crash the program before func() exits.
Summary of Stack-based Buffer Overflow
• Local variables are stored before (at lower addresses than) the stored return address.
• If overflow occurs when writing to local variable buffers (e.g., character arrays), the return address may be overwritten.
• When the current function returns, it will go to the address stored in the overwritten return address slot.
Some unsafe C lib functions
strcpy (char *dest, const char *src)
strcat (char *dest, const char *src)
gets (char *s)
scanf ( const char *format, … )
sprintf (char *str, const char *format, … )
Many others exist
Review
Steps of building a program
Static vs. shared library
Static vs. dynamic linking
Memory structure of a process:
text, data, stack, heap
Concept and structure of stack frames
Concept of buffer overflow; be able to identify code that includes a buffer overflow
Coming Attractions
• Looking in even more detail at how a program is compiled
• Topic 3: Programming in FIZ