LCU14-201: Binary Analysis Tools

C. Lyon & O. Javaid, LCU14 Burlingame

Upload: linaro

Post on 18-Nov-2014


DESCRIPTION

LCU14-201: Binary Analysis Tools

Speaker: C. Lyon & O. Javaid
Date: September 16, 2014

★ Session Summary ★
This session will be a presentation about currently available binary analysis tools, including: Sanitizers, perf (a performance counter and tracing profiling tool), record/replay (a reverse debugging facility in GDB) and prelink rootfs.

★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137726
Google Event: https://plus.google.com/u/0/events/ca2pdo9sn9r8n81l5vrbiibvcts
Video: https://www.youtube.com/watch?v=QIu601HYwSA&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-201

★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport

http://www.linaro.org
http://connect.linaro.org

TRANSCRIPT

Page 1: LCU14 201- Binary Analysis Tools

LCU14 BURLINGAME

C. Lyon & O. Javaid, LCU14

LCU14-201: Binary Analysis Tools

Page 2: LCU14 201- Binary Analysis Tools

Binary analysis tools

● debug helpers: Sanitizers
● perf
● reverse debugging

Page 3: LCU14 201- Binary Analysis Tools

Sanitizers: what are they?

● tools to help debug common programming errors
  ○ ASAN: AddressSanitizer
  ○ LSAN: LeakSanitizer
  ○ TSAN: ThreadSanitizer
  ○ MSAN: MemorySanitizer
  ○ UBSAN: UndefinedBehaviorSanitizer

Page 4: LCU14 201- Binary Analysis Tools

Sanitizers

● generate instrumented code (unlike valgrind)
● errors are printed during execution
● use run-time libraries
  ○ override memory allocation functions
  ○ detect thread race conditions
● faster than valgrind

Page 5: LCU14 201- Binary Analysis Tools

Sanitizers: ASAN

● memory error detector
● use after free
● heap/stack/global buffer overflows
● use after return
● double free/invalid free
● typical slowdown: ~2x

Page 6: LCU14 201- Binary Analysis Tools

ASAN: how to use it

● -fsanitize=address compiler option
● interaction with gdb:
  ○ set a breakpoint on __asan_report_error or AsanDie
  ○ helper to describe a memory location
● run-time flags via the ASAN_OPTIONS environment variable

Page 7: LCU14 201- Binary Analysis Tools

ASAN: example

int main(int argc, char **argv) {
  int *array = new int[100];
  delete [] array;
  return array[argc];  // Use after free
}

$ g++ -g -fsanitize=address asan.cc -o asan.exe
$ ./asan.exe
=================================================================
==21981==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x400834 bp 0x7fff631c2030 sp 0x7fff631c2028
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x400833 in main /tmp/asan.cc:4
    #1 0x3a3ae1ecdc in __libc_start_main (/lib64/libc.so.6+0x3a3ae1ecdc)
    #2 0x4006b8 (/tmp/asan.exe+0x4006b8)

0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x7fa4b8268617 in operator delete[](void*) (/lib64/libasan.so.1+0x55617)
    #1 0x4007e7 in main /tmp/asan.cc:3
    #2 0x3a3ae1ecdc in __libc_start_main (/lib64/libc.so.6+0x3a3ae1ecdc)

previously allocated by thread T0 here:
    #0 0x7fa4b82681af in operator new[](unsigned long) (/lib64/libasan.so.1+0x551af)
    #1 0x4007d0 in main /tmp/asan.cc:2
    #2 0x3a3ae1ecdc in __libc_start_main (/lib64/libc.so.6+0x3a3ae1ecdc)

SUMMARY: AddressSanitizer: heap-use-after-free /tmp/asan.cc:4 main
Shadow bytes around the buggy address:
  0x0c287fff9f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9fe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9ff0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c287fffa000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fffa010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap right redzone:    fb
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  Contiguous container OOB: fc
  ASan internal:         fe
==21981==ABORTING
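The shadow dump in the report can be cross-checked by hand. The sketch below assumes ASan's default mapping on x86_64 Linux, shadow = (addr >> 3) + 0x7fff8000. The offset is not stated on the slide and differs between targets, and the function names are made up for this illustration.

```cpp
#include <cstdint>

// Assumption: default ASan mapping on x86_64 Linux (scale 3, offset 0x7fff8000).
// Other targets use different offsets.
constexpr std::uint64_t kShadowScale = 3;
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// Address of the shadow byte describing the 8-byte granule that contains `addr`.
constexpr std::uint64_t shadow_address(std::uint64_t addr) {
  return (addr >> kShadowScale) + kShadowOffset;
}

// Number of shadow bytes needed to cover `size` application bytes.
constexpr std::uint64_t shadow_bytes_for(std::uint64_t size) {
  return (size + 7) >> kShadowScale;
}
```

Under that assumption, shadow_address(0x61400000fe44) is 0x0c287fff9fc8, which is the bracketed [fd] byte in the row marked "=>", and shadow_bytes_for(400) is 50, matching the 50 fd bytes that cover the freed 400-byte region.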

Page 8: LCU14 201- Binary Analysis Tools

Sanitizers: LSAN

● memory leak detector
● run-time ASAN option or -fsanitize=leak compiler option
● adds no extra slowdown on top of ASAN

Page 9: LCU14 201- Binary Analysis Tools

LSAN: example

#include <stdlib.h>

void *p;

int main() {
  p = malloc(7);
  p = 0;  // The memory is leaked here.
  return 0;
}

$ gcc -g -fsanitize=leak lsan.c -o lsan.exe
$ ./lsan.exe
=================================================================
==24106==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 7 byte(s) in 1 object(s) allocated from:
    #0 0x7fb12ee5c218 in malloc (/lib64/liblsan.so.0+0xb218)
    #1 0x4006a5 in main /tmp/lsan.c:6
    #2 0x3a3ae1ecdc in __libc_start_main (/lib64/libc.so.6+0x3a3ae1ecdc)

SUMMARY: LeakSanitizer: 7 byte(s) leaked in 1 allocation(s).

Page 10: LCU14 201- Binary Analysis Tools

Sanitizers: TSAN

● data race detector
● similar to helgrind
● slowdown 5-15x
● -fsanitize=thread -fPIE -pie compiler options

Page 11: LCU14 201- Binary Analysis Tools

$ g++ -g -fsanitize=thread tsan.cc -o tsan.exe -pie -fPIE

$ ./tsan.exe

foo=

==================

WARNING: ThreadSanitizer: data race (pid=24197)

Read of size 1 at 0x7d080000efd8 by thread T1:

#0 memcmp <null>:0 (libtsan.so.0+0x000000048e7d)

#1 std::string::compare(std::string const&) const <null>:0 (libstdc++.so.6+0x0000000bd9a2)

#2 std::less<std::string>::operator()(std::string const&, std::string const&) const /include/c++/4.9.0/bits/stl_function.h:367 (tsan.exe+0x0000000018e3)

#3 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_lower_bound(std::_Rb_tree_node<std::pair<std::string const, std::string> >*, std::_Rb_tree_node<std::pair<std::string const, std::string> >*, std::string const&) /include/c++/4.9.0/bits/stl_tree.h:1260 (tsan.exe+0x000000001f10)

#4 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::lower_bound(std::string const&) /include/c++/4.9.0/bits/stl_tree.h:927 (tsan.exe+0x000000001b50)

#5 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::lower_bound(std::string const&) /include/c++/4.9.0/bits/stl_map.h:902 (tsan.exe+0x00000000182f)

#6 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::operator[](std::string const&) /include/c++/4.9.0/bits/stl_map.h:496 (tsan.exe+0x0000000015fb)

#7 threadfunc(void*) /tmp/tsan.cc:10 (tsan.exe+0x000000001386)

Previous write of size 8 at 0x7d080000efd8 by main thread:

#0 operator new(unsigned long) <null>:0 (libtsan.so.0+0x0000000496e2)

#1 std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) <null>:0 (libstdc++.so.6+0x0000000bddb8)

#2 __libc_start_main <null>:0 (libc.so.6+0x003a3ae1ecdc)

Location is heap block of size 28 at 0x7d080000efc0 allocated by main thread:

#0 operator new(unsigned long) <null>:0 (libtsan.so.0+0x0000000496e2)

#1 std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) <null>:0 (libstdc++.so.6+0x0000000bddb8)

#2 __libc_start_main <null>:0 (libc.so.6+0x003a3ae1ecdc)

Thread T1 (tid=24199, running) created by main thread at:

#0 pthread_create <null>:0 (libtsan.so.0+0x000000047c13)

#1 main /tmp/tsan.cc:17 (tsan.exe+0x00000000142e)

SUMMARY: ThreadSanitizer: data race ??:0 memcmp

==================

==================

WARNING: ThreadSanitizer: data race (pid=24197)

Read of size 8 at 0x7d0c0000efe0 by thread T1:

#0 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_S_left(std::_Rb_tree_node_base*) /include/c++/4.9.0/bits/stl_tree.h:545 (tsan.exe+0x000000001e08)

#1 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_lower_bound(std::_Rb_tree_node<std::pair<std::string const, std::string> >*, std::_Rb_tree_node<std::pair<std::string const, std::string> >*, std::string const&) /include/c++/4.9.0/bits/stl_tree.h:1261 (tsan.exe+0x000000001f2b)

#2 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::lower_bound(std::string const&) /include/c++/4.9.0/bits/stl_tree.h:927 (tsan.exe+0x000000001b50)

#3 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::lower_bound(std::string const&) /include/c++/4.9.0/bits/stl_map.h:902 (tsan.exe+0x00000000182f)

#4 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::operator[](std::string const&) /include/c++/4.9.0/bits/stl_map.h:496 (tsan.exe+0x0000000015fb)

#5 threadfunc(void*) /tmp/tsan.cc:10 (tsan.exe+0x000000001386)

Previous write of size 8 at 0x7d0c0000efe0 by main thread:

#0 operator new(unsigned long) <null>:0 (libtsan.so.0+0x0000000496e2)

#1 __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > >::allocate(unsigned long, void const*) /include/c++/4.9.0/ext/new_allocator.h:104 (tsan.exe+0x0000000030e9)

#2 __gnu_cxx::__alloc_traits<std::allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > > >::allocate(std::allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > >&, unsigned long) /include/c++/4.9.0/ext/alloc_traits.h:182 (tsan.exe+0x000000003073)

#3 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_get_node() /include/c++/4.9.0/bits/stl_tree.h:385 (tsan.exe+0x000000002ec7)

#4 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_create_node(std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:395 (tsan.exe+0x000000002c98)

#5 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_insert_(std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:1142 (tsan.exe+0x000000002683)

#6 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_insert_unique_(std::_Rb_tree_const_iterator<std::pair<std::string const, std::string> >, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:1602 (tsan.exe+0x000000001cca)

#7 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::insert(std::_Rb_tree_iterator<std::pair<std::string const, std::string> >, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_map.h:683 (tsan.exe+0x000000001a0c)

#8 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::operator[](std::string const&) /include/c++/4.9.0/bits/stl_map.h:504 (tsan.exe+0x0000000016c0)

#9 main /tmp/tsan.cc:18 (tsan.exe+0x000000001464)

Location is heap block of size 48 at 0x7d0c0000efd0 allocated by main thread:

#0 operator new(unsigned long) <null>:0 (libtsan.so.0+0x0000000496e2)

#1 __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > >::allocate(unsigned long, void const*) /include/c++/4.9.0/ext/new_allocator.h:104 (tsan.exe+0x0000000030e9)

#2 __gnu_cxx::__alloc_traits<std::allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > > >::allocate(std::allocator<std::_Rb_tree_node<std::pair<std::string const, std::string> > >&, unsigned long) /include/c++/4.9.0/ext/alloc_traits.h:182 (tsan.exe+0x000000003073)

#3 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_get_node() /include/c++/4.9.0/bits/stl_tree.h:385 (tsan.exe+0x000000002ec7)

#4 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_create_node(std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:395 (tsan.exe+0x000000002c98)

#5 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_insert_(std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:1142 (tsan.exe+0x000000002683)

#6 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_insert_unique_(std::_Rb_tree_const_iterator<std::pair<std::string const, std::string> >, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_tree.h:1602 (tsan.exe+0x000000001cca)

#7 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::insert(std::_Rb_tree_iterator<std::pair<std::string const, std::string> >, std::pair<std::string const, std::string> const&) /include/c++/4.9.0/bits/stl_map.h:683 (tsan.exe+0x000000001a0c)

#8 std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::operator[](std::string const&) /include/c++/4.9.0/bits/stl_map.h:504 (tsan.exe+0x0000000016c0)

#9 main /tmp/tsan.cc:18 (tsan.exe+0x000000001464)

Thread T1 (tid=24199, running) created by main thread at:

#0 pthread_create <null>:0 (libtsan.so.0+0x000000047c13)

#1 main /tmp/tsan.cc:17 (tsan.exe+0x00000000142e)

SUMMARY: ThreadSanitizer: data race /include/c++/4.9.0/bits/stl_tree.h:545 std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_S_left(std::_Rb_tree_node_base*)

==================

ThreadSanitizer: reported 2 warnings

TSAN: example (tsan.cc, the source behind the report above)

#include <pthread.h>
#include <stdio.h>
#include <string>
#include <map>

typedef std::map<std::string, std::string> map_t;

void *threadfunc(void *p) {
  map_t& m = *(map_t*)p;
  m["foo"] = "bar";
  return 0;
}

int main() {
  map_t m;
  pthread_t t;
  pthread_create(&t, 0, threadfunc, &m);
  printf("foo=%s\n", m["foo"].c_str());
  pthread_join(t, 0);
}
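One possible way to remove the race TSAN reports is to guard every access to the shared map with a mutex. This is a minimal sketch, not taken from the talk: it swaps the slide's pthread calls for std::thread and std::mutex, and the function name guarded_lookup is hypothetical. It probably needs -pthread when building without -fsanitize=thread.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>

typedef std::map<std::string, std::string> map_t;

// Same scenario as tsan.cc, but every access to the shared map holds `mu`.
// Returns the value of m["foo"] once the writer thread has been joined.
std::string guarded_lookup() {
  map_t m;
  std::mutex mu;

  std::thread writer([&m, &mu] {
    std::lock_guard<std::mutex> lock(mu);
    m["foo"] = "bar";
  });

  {
    // The reader may still run before the writer and see an empty string,
    // but the two operator[] calls can no longer overlap.
    std::lock_guard<std::mutex> lock(mu);
    (void)m["foo"];
  }

  writer.join();
  return m["foo"];  // No lock needed: the writer has finished.
}
```

The lock removes the data race, not the ordering question: whether the reader sees "" or "bar" before the join still depends on scheduling.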

Page 12: LCU14 201- Binary Analysis Tools

Sanitizers: MSAN

● uninitialized memory reads detector
● much faster than valgrind

Page 13: LCU14 201- Binary Analysis Tools

Sanitizers: UBSAN

● undefined behavior checker
● -fsanitize=undefined compiler option

Page 14: LCU14 201- Binary Analysis Tools

UBSAN: examples

#include <stdio.h>
#include <limits.h>

int main() {
  /* shift */
  int i=1;
  int j=33;
  int k = i << j;

  /* division by 0 */
  i = 1;
  j = 0;
  k = i / j;

  /* int_min / -1 */
  i = INT_MIN;
  j = -1;
  k = i / j;

  /* null */
  int *ptr = NULL;
  i = *ptr;

  /* signed int overflow */
  i = INT_MAX;
  i++;
}

$ gcc -g -fsanitize=undefined ubsan.c -o ubsan.exe
$ ./ubsan.exe
ubsan.c:9:13: runtime error: shift exponent 33 is too large for 32-bit type 'int'
ubsan.c:15:9: runtime error: division by zero
ubsan.c:20:9: runtime error: division of -2147483648 by -1 cannot be represented in type 'int'
ubsan.c:25:5: runtime error: load of null pointer of type 'int'
ubsan.c:29:4: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
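Each of the five UBSAN reports corresponds to a precondition that can be checked before the operation. The helpers below are an illustrative sketch, not part of the talk; their names and the bool-plus-out-parameter style are assumptions made for this example.

```cpp
#include <climits>

// Each helper returns false instead of executing the undefined operation.

// Left shift: the exponent must be in [0, width), the value non-negative,
// and the result must fit in int.
bool checked_shl(int value, int exponent, int *out) {
  const int width = static_cast<int>(sizeof(int) * CHAR_BIT);
  if (exponent < 0 || exponent >= width || value < 0) return false;
  if (value > (INT_MAX >> exponent)) return false;
  *out = value << exponent;
  return true;
}

// Division: rejects division by zero and INT_MIN / -1.
bool checked_div(int a, int b, int *out) {
  if (b == 0) return false;
  if (a == INT_MIN && b == -1) return false;
  *out = a / b;
  return true;
}

// Addition: rejects signed overflow in either direction.
bool checked_add(int a, int b, int *out) {
  if (b > 0 && a > INT_MAX - b) return false;
  if (b < 0 && a < INT_MIN - b) return false;
  *out = a + b;
  return true;
}

// Load: rejects a null pointer.
bool checked_load(const int *ptr, int *out) {
  if (ptr == nullptr) return false;
  *out = *ptr;
  return true;
}
```

With these, checked_shl(1, 33, &k), checked_div(1, 0, &k), checked_div(INT_MIN, -1, &k), checked_load(NULL, &i) and checked_add(INT_MAX, 1, &i) all return false rather than triggering the behavior UBSAN flags.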

Page 15: LCU14 201- Binary Analysis Tools

Sanitizers: availability

● Developed by Google for LLVM
● Ported to GCC (on-going)
  ○ appeared in gcc-4.8 for x86_64
  ○ enablement needed target by target
● TSAN needs 64-bit pointers
  ○ won’t be available on AArch32

Page 16: LCU14 201- Binary Analysis Tools

Sanitizers: availability in GCC

          ASAN     LSAN   TSAN      UBSAN
i686      YES      NO     NO        YES
x86_64    YES      YES    YES       YES
AArch32   YES             WONT[1]   YES
AArch64   YES[2]                    YES[2]

[1] TSAN requires 64-bit pointers
[2] ASAN/UBSAN enablement patch on AArch64 submitted b/o September

MSAN is not available in GCC yet
LLVM has more options available than GCC

Page 17: LCU14 201- Binary Analysis Tools

More about Linaro Connect: connect.linaro.org
Linaro members: www.linaro.org/members
More about Linaro: www.linaro.org/about/

Page 18: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: An Introduction

● What is gdb record/replay?
  ○ Record the execution state of a program, sufficient for reproducing the execution
  ○ Store the recorded state in a core file
  ○ Replay the recorded execution state
● What is reverse debugging?
  ○ Ability to debug a program backwards
  ○ Allows you to step/continue backward in time
  ○ Allows you to set reverse breakpoints/watchpoints
  ○ Allows you to revert to an earlier execution state
● Reverse debugging with record/replay
  ○ Start recording your program during execution
  ○ Debug forward and backward during recording
  ○ Debug forward and backward with replay

Page 19: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: How It Works

● Forward vs Reverse
  ○ Forward
    ■ Operating system support for debugging: ptrace syscall (YES)
    ■ Hardware support for debugging: debug instructions, registers etc (YES)
    ■ Hardware ability to trap, halt or break (YES)
  ○ Reverse
    ■ Going back in time has its costs
    ■ Operating system ability to reverse execution (NO)
    ■ Hardware ability to go back in time (NO)
● What to do for reverse?
  ○ Best possible reproduction of past execution state
    ■ Process data: memory, registers, threads etc
    ■ OS data structures: processes, threads etc
    ■ Hardware state: timing, cache, interrupts etc
  ○ Maintain the best possible cost/benefit balance

Page 20: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: How It Works

● What?
  ○ GDB needs the ability to store machine state
  ○ GDB needs the ability to revert to a past state
● How?
  ○ After an instruction is executed
    ■ Record registers that were modified
    ■ Record memory locations that were changed
    ■ Keep record data in a memory buffer
    ■ Save to a core file if replay/reverse is needed
  ○ Revert registers and memory to step backwards
  ○ Load a saved record by loading the core file
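The record-and-revert idea can be sketched as an undo log. This is a toy model under stated assumptions, not GDB's implementation: the Machine, Change and Recorder types are hypothetical, and the log here is unbounded and kept only in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy machine state: a few registers and a small memory.
struct Machine {
  std::vector<std::uint64_t> regs = std::vector<std::uint64_t>(4, 0);
  std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(16, 0);
};

// One saved value: where it lived and what it held before the write.
struct Change {
  bool is_register;
  std::size_t index;
  std::uint64_t old_value;
};

class Recorder {
 public:
  explicit Recorder(Machine &m) : m_(m) {}

  // Opens a new log entry; call once per simulated instruction.
  void begin_instruction() { log_.emplace_back(); }

  // Saves the old register value, then performs the write.
  void write_reg(std::size_t i, std::uint64_t v) {
    if (log_.empty()) begin_instruction();
    log_.back().push_back({true, i, m_.regs.at(i)});
    m_.regs.at(i) = v;
  }

  // Saves the old memory byte, then performs the write.
  void write_mem(std::size_t addr, std::uint8_t v) {
    if (log_.empty()) begin_instruction();
    log_.back().push_back({false, addr, m_.mem.at(addr)});
    m_.mem.at(addr) = v;
  }

  // Undoes the most recent instruction; returns false if nothing is recorded.
  bool reverse_step() {
    if (log_.empty()) return false;
    const std::vector<Change> &changes = log_.back();
    // Restore newest-first so repeated writes to one location unwind correctly.
    for (auto it = changes.rbegin(); it != changes.rend(); ++it) {
      if (it->is_register) {
        m_.regs.at(it->index) = it->old_value;
      } else {
        m_.mem.at(it->index) = static_cast<std::uint8_t>(it->old_value);
      }
    }
    log_.pop_back();
    return true;
  }

 private:
  Machine &m_;
  std::vector<std::vector<Change>> log_;
};
```

Only the locations an instruction touches are saved, so the cost of recording grows with the number of writes rather than with the size of the machine state.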

Page 21: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Commands Overview

● reverse-step (rs)
● reverse-continue (rc)
● reverse-finish
● reverse-next (rn)
● reverse-nexti
● reverse-stepi
● set exec-direction (forward/reverse)
● break, watch etc

Page 22: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Eclipse CDT UI

● Configuration UI

Page 23: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Eclipse CDT UI

● Run control UI

Page 24: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Some Use-Cases

● Significant speedup over cyclic debugging

[Diagram labels: Steps, Program Running, Forward, Bug, Reverse, Reverse Debugging]

Page 25: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Some Use-Cases

● Capture notorious bugs with record/replay

[Diagram labels: Steps, Program Running, Program Re-running, No Bug Occurred, Bug, Crash, Same Bug]

Page 26: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: Limitations

● Limited record log size
● Serial/sequential execution
● CPU overhead for saving/restoring state
● Does not restore system state
● Limitations for multi-threaded programs and non-stop mode
● Not of much use for analysis of complex bugs
● Terminal/UI panic

Page 27: LCU14 201- Binary Analysis Tools

GDB Reverse Debugging: In research

● Mozilla RR
  ○ Record/Replay
  ○ Reverse debugging
  ○ Claims it's more efficient than GDB
  ○ Claims to debug complex applications like the Firefox browser
● References
  ○ http://www.gnu.org/software/gdb/news/reversible.html
  ○ http://www.codeproject.com/Articles/235287/Reverse-Debugging-using-GDB
  ○ https://sourceware.org/gdb/current/onlinedocs/gdb/Process-Record-and-Replay.html
  ○ http://rr-project.org


Page 29: LCU14 201- Binary Analysis Tools

Linux Perf Tools: An Overview

● What is perf? (Performance Counters for Linux)
  ○ Almost a superset of all tracing and profiling tools available on Linux
  ○ Integrated with the Linux kernel
  ○ Hardware + Software + Trace + More
  ○ Lightweight profiling (low overhead)
  ○ Not only for tracing and profiling the kernel
  ○ Also profiles and traces user-space applications
● How does perf do it?
  ○ Hardware: PMU (performance counters)
  ○ Perf kernel module
  ○ Perf user-space application
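The user-space side reaches the kernel and the PMU through the perf_event_open system call. The following is a minimal Linux-only sketch of that path, not the perf tool's own code; the function name count_instructions is hypothetical, and the call may fail where PMU access is restricted (for example by the perf_event_paranoid setting or inside a virtual machine).

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Counts user-space instructions retired while `work` runs on the calling
// thread. Returns -1 if the counter cannot be opened or read.
long long count_instructions(void (*work)()) {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS;
  attr.disabled = 1;        // start stopped; enabled explicitly below
  attr.exclude_kernel = 1;  // count user-space only
  attr.exclude_hv = 1;

  // glibc provides no wrapper for perf_event_open, so use syscall(2).
  // pid = 0 and cpu = -1 mean: this thread, on any CPU.
  int fd = static_cast<int>(
      syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
  if (fd < 0) return -1;

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  work();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  std::uint64_t count = 0;
  const bool ok =
      read(fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
  close(fd);
  return ok ? static_cast<long long>(count) : -1;
}
```

A caller passes any function with no arguments, and should treat -1 as "counter unavailable" rather than as an error in the measured code.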

Page 30: LCU14 201- Binary Analysis Tools

Linux Perf Tools: What perf can do for you...

● Why
  ○ is your app or kernel consuming CPU?
  ○ is your application starving for CPU?
  ○ are certain threads holding onto locks?
● Which
  ○ part of the kernel/application code is causing cache misses?
  ○ application is consuming memory?
● What
  ○ has caused a driver performance downgrade?
  ○ is the average syscall handling overhead?
  ○ CPU and memory optimizations are possible in your code?
● And a lot more...

Page 31: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Events

● Hardware Events
  ○ cycles, branches, instructions etc
  ○ cache-references, cache-misses etc
● Hardware Cache Events
  ○ L1/L2 cache loads, stores, misses etc
  ○ TLB loads, stores, misses etc
● Software Events
  ○ task-clock, page-faults, context-switches etc
● Kernel PMU Events
  ○ cpu/branch-instructions
  ○ cpu/cache-misses
● Trace Events

Page 32: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Perf coverage map

● Source: http://www.brendangregg.com/linuxperf.html

Page 33: LCU14 201- Binary Analysis Tools

Linux Perf Tools: User Interface (Commands)

● Perf installation on Ubuntu
  ○ apt-get install linux-tools
● Command-line tools under perf
  ○ record: Run a command and record its profile into perf.data
  ○ report: Read perf.data (created by perf record) and display the profile
  ○ lock: Analyze lock events
  ○ mem: Profile memory accesses
  ○ timechart: Tool to visualize total system behavior during a workload
  ○ top: System profiling tool
  ○ trace: strace-inspired tool
  ○ probe: Define new dynamic tracepoints
  ○ kmem: Tool to trace/measure kernel memory (slab) properties
● Type “perf” on the command line to get the full list

Page 34: LCU14 201- Binary Analysis Tools

Linux Perf Tools: User Interface (Graphical)

● Graphical UI
  ○ Install the Perf plug-in for Eclipse
  ○ http://www.eclipse.org/linuxtools/projectPages/perf/
  ○ http://wiki.eclipse.org/Linux_Tools_Project/PERF/User_Guide
● Source: http://wiki.eclipse.org/Linux_Tools_Project/PERF/User_Guide

Page 35: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Sampling and analysis

● perf record
  ○ perf record [options] [commandline] [arguments]
  ○ Generates an output file called perf.data
● perf report
  ○ reads perf.data
  ○ generates a concise execution profile
● perf annotate
  ○ Performs source level analysis
  ○ Binary should be compiled with debug info
● List all raw events
  ○ perf script (from perf.data by default)

Page 36: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Monitoring

● Counting events
  ○ perf stat [application] [argument]
  ○ Keeps an event count during process execution
  ○ Displays a common list of events by default
  ○ Can count specific events
  ○ Covers both user- and kernel-level code
● Real-time monitoring: perf top
  ○ “perf top” prints sampled functions in real time
  ○ Configurable but shows all CPUs by default
  ○ Shows user-level as well as kernel functions
  ○ Show system calls by process, refreshing every 2 seconds
    ■ perf top -e raw_syscalls:sys_enter -ns comm
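To try event counting on something concrete, a small workload with two access patterns is enough. The sketch below is an assumed example, not from the talk: both functions compute the same sum over a row-major matrix, one walking rows and one walking columns, so that their cache behavior can be compared with the cache-misses event.

```cpp
#include <cstddef>
#include <vector>

// Sums an n x n matrix stored row-major, walking it row by row, so that
// consecutive accesses touch adjacent memory.
long long sum_by_rows(const std::vector<int> &a, std::size_t n) {
  long long total = 0;
  for (std::size_t r = 0; r < n; ++r)
    for (std::size_t c = 0; c < n; ++c) total += a[r * n + c];
  return total;
}

// Same sum, walking column by column: each access jumps n elements ahead.
long long sum_by_columns(const std::vector<int> &a, std::size_t n) {
  long long total = 0;
  for (std::size_t c = 0; c < n; ++c)
    for (std::size_t r = 0; r < n; ++r) total += a[r * n + c];
  return total;
}
```

Wrapped in a main that fills a large matrix and calls one of the two functions, the resulting binary (here assumed to be ./a.out) can be run as perf stat -e cache-misses ./a.out. The column walk would typically be expected to show more misses, though the actual counts depend on the machine and the matrix size.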

Page 37: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Perf also supports

● Benchmarking
● Scripting
● Static Tracing
● Dynamic Tracing
● Much more..

source: http://www.brendangregg.com/perf_events

Page 38: LCU14 201- Binary Analysis Tools

Linux Perf Tools: Concluding..

● Some other tools
  ○ LTTng
  ○ SystemTap
  ○ gprof
  ○ Perfctr
  ○ OProfile
  ○ Sysprof
  ○ DTrace
● References
  ○ http://www.brendangregg.com/perf.html
  ○ https://perf.wiki.kernel.org/index.php/Tutorial
  ○ https://perf.wiki.kernel.org/index.php/Main_Page


Page 40: LCU14 201- Binary Analysis Tools

Prelink: Some background first...

● Dynamic vs Static Linking
  ○ Significantly reduced binary size
  ○ Library code shared and updated without recompiling
  ○ But run-time address calculation overhead
  ○ More libraries means higher startup time
  ○ Address binding to a fixed address: Not a good idea!!
  ○ Overhead burden increases with frequent load/unload
● Preload
  ○ Load ahead of time based on frequency of use
  ○ A daemon that runs in the background
  ○ Useful with frequently run programs
  ○ Requires constant extra space in memory
  ○ Not for apps that are not unloaded frequently
  ○ Caching may be doing the same already

Page 41: LCU14 201- Binary Analysis Tools

Prelink: What is it?

● Speeds up application load time
● By reducing dynamic linking overhead
● But only for library-dependent applications like KDE, Qt etc
● Pre-calculates dependencies
● Loads libraries at preferred addresses
● Reverts to dynamic linking if prelink fails

Page 42: LCU14 201- Binary Analysis Tools

Prelink: How does it work?

● Use with caution: it may mess your system up!
● How to set it up?
  ○ Install prelink
    ■ sudo apt-get install prelink
  ○ Configure what to prelink
    ■ edit /etc/default/prelink
    ■ Enable by changing "PRELINKING=unknown" to "PRELINKING=yes"
  ○ Start a daily update
    ■ /etc/cron.daily/prelink
  ○ Undo by
    ■ setting "PRELINKING=no" in /etc/default/prelink
    ■ running /etc/cron.daily/prelink
  ○ Run again whenever you update/install new stuff

Page 43: LCU14 201- Binary Analysis Tools

Prelink: Is it worth the effort?

● Advantages
  ○ Good for systems like infotainment systems, set-top boxes etc
  ○ Provides significant speedup on application loading time
  ○ Can undo/redo prelink
● Disadvantages
  ○ Re-link required on package upgrade
  ○ Predictable shared library locations (no ASLR)
  ○ Modifies files, which means MD5 mismatches
  ○ Hard to maintain system integrity with frequent updates/changes
● References
  ○ https://wiki.gentoo.org/wiki/Prelink
