cosmopen: dynamic reverse engineering on a budget · 1ghz pentium iii, linux kernel 2.4, gdb 5.1-1...

37
CosmOpen: Dynamic reverse engineering on a budget François Ta ïani 1 , Marc-Olivier Killijian 2 , Jean-Charles Fabre 2 1 Lancaster University, UK 2 LAAS-CNRS, France ([email protected])

Upload: others

Post on 26-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

CosmOpen: Dynamic reverse

engineering on a budget

François Taïani1, Marc-Olivier Killijian2, Jean-Charles Fabre2

1Lancaster University, UK2LAAS-CNRS, France([email protected])

Page 2: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 2

The best and safest method of philosophizingseems to be, first to inquire diligently into theproperties of things, and to establish thoseproperties by experiences and then to proceedmore slowly to hypotheses for the explanationof them.

Let us inquire diligently

Isaac Newton

Page 3: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 3

Lancaster Middleware Group

Facilitate development of new distributed systems

Abstractions?

Mechanisms?

Exotic environments: MANETs, WSN, routers, …

Page 4: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 4

Lancaster Middleware Group

Facilitate development of new distributed systems

Abstractions?

Mechanisms?

Exotic environments: MANETs, WSN, routers, …

Flip-side: study of existing middleware

Emergent structures?

Development practices?

How to ease refactoring / dissemination?

Page 5: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 5

Example: one CORBA request

one request,

2065 individual invocations,

over 50 C-functions and 140

C++ classes.

ORBacus Request

Processing

How to analyse this?

Page 6: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 6

Outline

Why reverse engineering middleware?

Our approach: software “interferometry”

Does it work? A case study

Page 7: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 7

Outline

Why reverse engineering middleware?

Our approach: software “interferometry”

Does it work? A case study

Page 8: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 8

Original goal

Control non-determinism in industry grade middleware

Observation of mutex activities

Link to request life cycle

Why?

OS

application

intergiciel(s)

syscalls, syslibs(synchronisation,

memory managment...)

middleware services(RPC, pub/sub, …)

Page 9: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 9

Original goal

Control non-determinism in industry grade middleware

Observation of mutex activities

Link to request life cycle

Why?

syscalls, syslibs(synchronisation,

memory managment...)

middleware services(RPC, pub/sub, …)

Complex & layered system

Hard to observe and analyse

P1: Large behavioural data-set

P2: Cross-layer entangling

Page 10: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 10

P1: Large behavioural data-sets

Example: Globus

Huge piece of software (3.9.x)

123,839 lines in Java (without reused libraries)

(1,908,810 lines in C/C++, including reused libraries)

Many libraries layered

XML, WSDL (Descr. Lang), WSRF (Resource Fwork)

Axis (SOAP), Xerces (XML Parsing), com.ibm.wsdl

Java: exhaustive tracing (outside the JVM libs)

client : 1,544,734 local method call (sic)

server : 6,466,652 local method calls (sic) [+time out]

Page 11: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

P1: Large behavioural data-sets

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,000

1 4 7

10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

Stack Depth

Ca

lls

org.apache.xerces

org.apache.xml

org.apache.axis

org.apache.log4j

org.apache.xpath

org.apache.commons

com.ibm.wsdl

others

(globus client, 1 creation, 4

requests, 1 destruction) Projection w.r.t.

stack depth

package (structure)

Page 12: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

P1: Large behavioural data-sets

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,000

1 4 7

10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

Stack Depth

Ca

lls

org.apache.xerces

org.apache.xml

org.apache.axis

org.apache.log4j

org.apache.xpath

org.apache.commons

com.ibm.wsdl

others

(globus client, 1 creation, 4

requests, 1 destruction)

Far too large for manual analysis

Projection w.r.t.

stack depth

package (structure)

Exhaustive tracing: high observation costs

Intractable interferences

Page 13: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 13

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

P2: Cross-layer entangling

Page 14: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 14

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

P2: Cross-layer entangling

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

?OS

dummy1 dummy2 ob

se

rva

tion

by tra

cin

g

Page 15: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 15

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

P2: Cross-layer entangling

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

?OS

dummy1 dummy2 ob

se

rva

tion

by tra

cin

g

Can we reconstruct main’s behaviour?

Page 16: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 16

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

P2: Cross-layer entangling

OS

pthread

program

Page 17: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 17

int main () {

pthread_t threadN1, threadN2 ;

pthread_create(&threadN1, NULL, dummy1, NULL) ;

pthread_create(&threadN2, NULL, dummy2, NULL) ;

pthread_join ( threadN1, NULL) ;

pthread_join ( threadN2, NULL) ;

};

P2: Cross-layer entangling

OS

pthread

program

Program behaviour not apparent

Reason: mixes several abstraction levels

Much more complex than original code

Page 18: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 18

Outline

Why reverse engineering middleware?

Our approach: software “interferometry”

Does it work? A case study

Page 19: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 19

Approach: “interferometry”

Limit observation costs: more for less

capture stack traces (more)

limit observation points to component boundaries (less)

Filter out unneeded data: “interferometry”

graph manipulation script language

reusable filters

vs.

no turbulence

but very costly

‘cheap’

but post-processing

needed

Page 20: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

20

More for less

#0 0x405c8f20 in send () from /lib/libpthread.so.0

#1 0x4015804a in omni::tcpConnection::Send ()

#2 0x4011364c in omni::giopStream::sendChunk ()

[..]

#8 0x400bdb0a in omniAsyncWorker::run ()

#9 0x405abbdf in omni_thread_wrapper ()

#10 0x405c2bf0 in pthread_start_thread ()

#11 0x405c2c6f in pthread_start_thread_event ()

Page 21: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 21

Reconstructing a call tree Problem: stack traces are ambiguous

Choice: “smallest” compatible call tree

a variant of smallest prefix tree (with timestamps)

can be characterised formally

main

launch

broadcast

marshallAndSend

send

main

launch

broadcast

marshallAndSend

send

2 identical traces

Page 22: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

Provides variables to store intermediary results

A number of graph operators

selection

put ::pthread_create* G CREATE

recursive extension

forward CREATE G

boolean algebra

remove ::pthread_create* CREATE

temporal operations fuse ::pthread_create* ::pthread_start_thread* G

Graph manipulation filters

Page 23: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 23

Revisiting pthread

fuse ::pthread_create* ::pthread_start_thread* G

Page 24: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 24

Revisiting pthread

‘Reconstructing’ data-flow from temporal succession

fuse ::pthread_create* ::pthread_start_thread* G

Page 25: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 25

Revisiting pthreadfuse ::pthread_create* ::pthread_start_thread* G

put ::pthread_create* G CREATE

forwN 1 CREATE G

remove ::pthread_create* CREATE

remove ::pthread_start_thread* CREATE

forward CREATE G

exclude CREATE G

absPatern ::pthread_create* G

absPatern ::pthread_start_thread* G

Page 26: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 26

Revisiting pthreadfuse ::pthread_create* ::pthread_start_thread* G

put ::pthread_create* G CREATE

forwN 1 CREATE G

remove ::pthread_create* CREATE

remove ::pthread_start_thread* CREATE

forward CREATE G

exclude CREATE G

absPatern ::pthread_create* G

absPatern ::pthread_start_thread* G

Generic script: applies to any pthread code

Page 27: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 27

3 C/C++ industry-grade CORBA products

ORBacus, omniORB, TAO

Set up

~ 60 breakpoints (locks, memory, threading, callbacks)

one ping-pong request

1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1

Case study

Page 28: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 28

3 C/C++ industry-grade CORBA products

ORBacus, omniORB, TAO

Set up

~ 60 breakpoints (locks, memory, threading, callbacks)

one ping-pong request

1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1

Observation overheads (Orbacus)

non-instrumented: < 1s

fully instrumented: 1h 2m 11s

lock tracing disabled during initialisation: 4m 53s

Case study

Page 29: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 29

Outline

Why reverse engineering middleware?

Our approach: software “interferometry”

Does it work? A case study

Page 30: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 30

reconstructed tree

= added value

breakpoint

activation

= cost

ratio

= speedup

ORB threads traces frames invocations invocations/traces

ORBACUS 4.1 8 658 9178 2066 3.13

OMNIORB 4 7 1828 16807 3088 1.68

TAO 1.2.1 6 512 11260 1352 2.64

Stack capture speedup

(with lock tracing disabled during initialisation)

#0 0x405c8f20 in send () from /lib/libpthread.so.0

#1 0x4015804a in omni::tcpConnection::Send ()

#2 0x4011364c in omni::giopStream::sendChunk ()

[..]

#8 0x400bdb0a in omniAsyncWorker::run ()

#9 0x405abbdf in omni_thread_wrapper ()

#10 0x405c2bf0 in pthread_start_thread ()

#11 0x405c2c6f in pthread_start_thread_event ()

Page 31: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 31

Orbacus complete graph// [remove pthread]

put ::recv'* GlobalGraph R

put ::send'* GlobalGraph R put Hello_impl::say_hello'* GlobalGraph R

put ::accept'* GlobalGraph R

backward R GlobalGraph

put ::lsf_thread_adapter'* ABS

put JTC* ABS put OCI::* ABS

put ::__libc_start_main't1-0 ABS

put ::run't1-9 ABS

put OB::ORBControl::initializeRootPOA't1-11 ABS

put OBPortableServer::POAManagerFactory_impl::create_poa_manager't1-12 POA forwN 3 POA R

add POA ABS

put OBPortableServer::POAPolicies::POAPolicies't1-42 ABS

put OB::DispatchStrategyFactory_impl::* ABS abstract ABS R

abstract ABS GlobalGraph

remove ::accept't3-882 R

absPatern ::recv't8-828 R

absPatern ::recv't8-1309 R absPatern ::recv't8-1908 R

put OB::Upcall::Upcall't8-1083 U

backward U GlobalGraph

add U R

put OB::ThreadPool::add't8-1224 TP put OB::ThreadPool::get't4-1265 TP

backward TP GlobalGraph

add TP R

put OB::GIOPServerStarterThreaded::StarterThread::* ABS2

put OB::DispatchThreadPool_impl::* ABS2 put OB::DispatchRequest_impl::* ABS2

put OB::POAOAInterface_impl::* ABS2

abstract ABS2 R

absPatern OB::GIOPServerWorker::executeRequest't8-* R

2065 individual

invocations, over 50

C-functions and 140

C++ classes.

Page 32: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 32

Result

26 individual

invocations, 4 C

functions, and

23 C++ classes.

Page 33: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 33

RequestAfterApplication

RequestBeforeApplication

class

thread creation

object creation

RequestContentionPoint

method call

Application

Deterministic replication

link OS level ↔ app level

use reflective approach

instrument MW to realise

meta-model

ref: [DSN’03] and [DSN’05]

Page 34: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 34

Conclusion

Pragmatic approach to dynamic reverse engineering

addresses complex layered software

minimises interferences by using stack traces

reconstruction and graph manipulation for analysis

Caveat

reverse engineering remains a human activity

constructed graphs provide a roadmap / guide

but must be supported by other activities

Future

better analysis of help provided (user studies)

better formalisation of consistency in graph manipulation

Page 35: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 35

(Some) references

François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen:

Dynamic reverse-engineering on a budget. Software: Practice and

Experience, to appear [contact me for most recent version]

Chen YFR, Gansner ER, and Koutsofios E. A C++ data model

supporting reachability analysis and dead code detection. SIGSOFT

Softw. Eng. Notes 1997; 22(6):414-431

Jerding DF, Stasko JT, and Ball T. Visualizing interactions in program

executions. Proc. of the 19th Int. Conf. on Software Engineering

(ICSE’97), ACM Press, 1997; 360-370

Walker RJ, Murphy GC, Freeman-Benson B, Wright D, Swanson D,

Isaak J. Visualizing Dynamic Software System Information through High-

Level Models. OOPSLA’98, ACM Press, 1998; 271-283

Systa T, Koskimies K, Mueller H. Shimba – an environment for reverse

engineering Java software systems. Software Practice and Experience

(SPE), 2001; 31(4):371-394. DOI: 10.1002/spe.386

Page 36: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 36

(Some) references (cont)

Zaidman A, Calders T, Demeyer S, Paredaens J. Applying Webmining

Techniques to Execution Traces to Support the Program Comprehension

Process. In Proc. of the Ninth European Conf. on Software Maintenance

and Reengineering (CSMR’2005). IEEE Computer Society, 2005; 134-

142. DOI: 10.1109/CSMR.2005.12

39. Wolf F, Mohr B, Dongarra J, Moore S: Efficient Pattern Search in

Large Traces through Successive Refinement. Proc. of the 2004

European Conf. on Parallel Computing (Euro-Par), Lecture Notes in

Computer Science, Springer, 2004; 3149:47-54.

Arnold DC, Ahn DH, de Supinski BR, Lee GL, Miller BP, Schulz M. Stack

Trace Analysis for Large Scale Debugging. Proc. of the IEEE Int. Parallel

& Distributed Processing Symposium (IPDPS 2007), Long Beach, CA,

2007. DOI: 10.1109/IPDPS.2007.370254

http://ftaiani.ouvaton.org/7-software/

Page 37: CosmOpen: Dynamic reverse engineering on a budget · 1GHz Pentium III, Linux kernel 2.4, gdb 5.1-1 ... François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, CosmOpen: Dynamic

F. Taiani 37

Thank you

Questions?