STATISTICALLY REGULATING PROGRAM BEHAVIOR VIA MAINSTREAM COMPUTING Mark Stephenson, Ram Rangan*, Emmanuel Yashchin, and Eric Van Hensbergen IBM Research Lab Thursday, April 29, 2010


STATISTICALLY REGULATING PROGRAM BEHAVIOR VIA MAINSTREAM COMPUTING

Mark Stephenson, Ram Rangan*, Emmanuel Yashchin, and Eric Van Hensbergen

IBM Research Lab

Thursday, April 29, 2010


OUR WORK: THE ELEVATOR PITCH

• Build anomaly detection into a deployed application

• Flag the execution of the application if it appears to be abnormal

• Give the user the ability to...

• adjust the meaning of “abnormal”

• decide how to proceed when flags are raised


MOTIVATION: ANOMALOUS EXECUTION

• Attacks on vulnerable code

• Buffer overruns, value overflow and underflow, denial of service, injection attacks, etc.

• Soft errors


MOTIVATION: ANOMALY DETECTION

[Figure: an application and its model of normal usage]

Profile application with a set of inputs to construct a model of normal behavior


MOTIVATION: ANOMALY DETECTION

• Models for anomaly detection are trained on a set of inputs (called the training set)

• Generally, training with more inputs reduces the model's false positives

• ...but it increases the model's false negatives

• Current systems don’t give the user a method for adjusting the tradeoff between the two “falses”


MAINSTREAM COMPUTING: CONCEPTUAL FIGURE

[Figure: regions labeled "95% of executions" and "99% of executions" drawn around the example workloads gmail, bing, hulu, facebook, youtube, and cnn, with a "Reject" label]


OUR SYSTEM: MAINSTREAM COMPUTING

• Allow a user to say, “Ensure that this execution conforms with 99% of the usage patterns for the application”

• System constructs a model that is statistically guaranteed to raise a flag at most 1% of the times the application is invoked

• Provide a single knob for each application

• Allow the user to select what action is taken when a flag is raised


RECOURSE

• What should we do when the execution looks abnormal?


HIGH LEVEL VIEW

• Collaborative approach

• A centralized server collects runtime profiles from clients

• Centralized server uses these runtime profiles to generate constraint sets for applications

[Figure: a central server connected to several clients]


CLIENT OPERATION

• Clients have two main tasks:

1. Ensure that server-provided constraints are not violated

2. Continually sample aspects of execution for this invocation

[Figure: the server sends the client a constraint set for pfail = 3%; the client records a runtime profile]

Constraint set (from the server, pfail = 3%):

ID    RANGE
dim   [1,3]
ix    [-1,99]
j     [0,99]

Runtime profile (recorded by the client):

ID    RANGE
dim   [2,3]
ix    [-1,52]
j     [0,19]
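The two client tasks can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the `Client` class, its `observe` method, and the dictionary layout are hypothetical names, not the prototype's runtime library. The ranges are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    # Variable ID -> (low, high) range received from the server (assumed layout).
    constraints: dict
    # Variable ID -> (low, high) range observed during this invocation.
    profile: dict = field(default_factory=dict)

    def observe(self, var_id, value):
        """Sample one value, then report whether it satisfies its constraint."""
        low, high = self.profile.get(var_id, (value, value))
        self.profile[var_id] = (min(low, value), max(high, value))
        if var_id not in self.constraints:
            return True  # unconstrained variable: nothing to violate
        c_low, c_high = self.constraints[var_id]
        return c_low <= value <= c_high


# Constraint set from the slide.
constraints = {"dim": (1, 3), "ix": (-1, 99), "j": (0, 99)}
client = Client(constraints)

# Hypothetical samples chosen so the profile matches the slide's runtime profile.
samples = [("dim", 2), ("dim", 3), ("ix", -1), ("ix", 52), ("j", 0), ("j", 19)]
results = [client.observe(var_id, value) for var_id, value in samples]
profile_snapshot = dict(client.profile)
print(profile_snapshot)  # {'dim': (2, 3), 'ix': (-1, 52), 'j': (0, 19)}

# A value outside the server's range for ix would raise a flag.
flag_raised = not client.observe("ix", 120)
print(flag_raised)  # True
```

What happens after a flag is raised is left to the user, as the earlier slides describe, so the sketch only returns a boolean.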


SERVER OPERATION

• Server aggregates runtime profiles from the clients

• Creates constraint sets with statistical bounds on failure rates

• Can probabilistically tolerate runtime profiles from rogue users

[Figure: a client sends its runtime profile to the server]

ID    RANGE
dim   [2,8]
ix    [-1,52]
j     [0,19]


SERVER OPERATION

• Separate runtime profiles into a training set and a validation set

• These sets are disjoint

[Figure: the pool of runtime profiles is split into a training set and a validation set]


SERVER OPERATION

• Create a model of nominal behavior using runtime profiles in the training set

[Figure: the ranges of two runtime profiles from the training set are combined by union (∪)]

ID    PROFILE 1   PROFILE 2   UNION
dim   [1,2]       [2,3]       [1,3]
ix    [0,99]      [-1,10]     [-1,99]
j     [1,99]      [2,11]      [1,99]
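The union step can be sketched as follows. The function name `merge_profiles` and the dictionary layout are assumptions made for illustration. The sketch takes the smallest range that encloses both inputs, which equals the set union whenever the ranges overlap, as they do in the slide's example.

```python
def merge_profiles(profiles):
    """Combine per-variable ranges from several runtime profiles.

    Each profile maps a variable ID to a (low, high) pair. The result maps
    each ID to the smallest range that covers every observed range.
    """
    merged = {}
    for profile in profiles:
        for var_id, (low, high) in profile.items():
            if var_id in merged:
                m_low, m_high = merged[var_id]
                merged[var_id] = (min(m_low, low), max(m_high, high))
            else:
                merged[var_id] = (low, high)
    return merged


# The two runtime profiles shown on the slide.
profile_1 = {"dim": (1, 2), "ix": (0, 99), "j": (1, 99)}
profile_2 = {"dim": (2, 3), "ix": (-1, 10), "j": (2, 11)}

model = merge_profiles([profile_1, profile_2])
print(model)  # {'dim': (1, 3), 'ix': (-1, 99), 'j': (1, 99)}
```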


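SERVER OPERATION

[Figure: the constraint set built from the training set is checked against a runtime profile from the validation set]

ID    CONSTRAINT SET   VALIDATION PROFILE   RESULT
dim   [1,12]           [1,12]               pass
ix    [-2,99]          [1,50]               pass
j     [0,99]           [3,91]               pass

Every range in this validation profile lies within the constraint set, so the profile passes.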


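SERVER OPERATION

[Figure: the same constraint set is checked against a second runtime profile from the validation set]

ID    CONSTRAINT SET   VALIDATION PROFILE   RESULT
dim   [1,12]           [0,11]               fail
ix    [-2,99]          [1,50]               pass
j     [0,99]           [3,91]               pass

The dim range of this validation profile falls outside the constraint set, so the profile fails.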


SERVER OPERATION

[Figure: the constraint set (dim [1,12], ix [-2,99], j [0,99]) is checked against every runtime profile in the validation set; each profile either passes or fails]

$$\hat{P}_{fail} = \frac{N_{failed}}{N_{validate}} \pm \text{(margin of error)}$$

• This is only an estimate, the accuracy of which depends on Nvalidate


SERVER OPERATION

• We can find a statistical upper bound for the failure rate by using the well-known solution to the polling problem
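The slides do not say which bound is used. One standard answer to the polling problem is Hoeffding's inequality, which gives a one-sided margin of sqrt(ln(1/δ) / (2·Nvalidate)) at confidence 1 − δ. The sketch below assumes that bound; the function name and the example counts (10 failures in 1000, 100 in 10000) are made up for illustration.

```python
import math


def failure_rate_upper_bound(n_failed, n_validate, confidence=0.95):
    """Return (estimate, upper bound) for the failure rate.

    Assumes the validation profiles are independent samples and uses the
    one-sided Hoeffding bound; the authors may use a different bound.
    """
    p_hat = n_failed / n_validate
    delta = 1.0 - confidence
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n_validate))
    return p_hat, min(1.0, p_hat + margin)


# Same observed failure rate, two validation-set sizes (illustrative numbers).
p_small, upper_small = failure_rate_upper_bound(n_failed=10, n_validate=1000)
p_large, upper_large = failure_rate_upper_bound(n_failed=100, n_validate=10000)

print(f"{p_small:.3f} <= {upper_small:.4f} with 1000 validation profiles")
print(f"{p_large:.3f} <= {upper_large:.4f} with 10000 validation profiles")
```

The margin shrinks as Nvalidate grows, which is the dependence on Nvalidate that the estimate slide points out.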


SERVER OPERATION

[Figure: for each candidate training-set size, the server builds a constraint set from the training set and estimates its failure rate on the validation set]

Ntrain   P̂fail
100      35.42
200      22.98
...      ...
3900     0.10
4000
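One way to read the table is that the server keeps the smallest training set whose failure estimate meets the user's tolerance. That reading is an assumption, as is treating the P̂fail column as a percentage; the function name `smallest_training_set` is hypothetical. Only the rows that are legible on the slide are used.

```python
def smallest_training_set(estimates, tolerance):
    """Return the smallest Ntrain whose estimated failure rate meets the tolerance.

    estimates: list of (n_train, p_fail) pairs; p_fail is assumed to be a percentage.
    Returns None when no candidate is good enough.
    """
    for n_train, p_fail in sorted(estimates):
        if p_fail <= tolerance:
            return n_train
    return None


# Rows legible on the slide; the elided rows are left out rather than guessed.
estimates = [(100, 35.42), (200, 22.98), (3900, 0.10)]

choice_loose = smallest_training_set(estimates, tolerance=25.0)
choice_tight = smallest_training_set(estimates, tolerance=1.0)
choice_none = smallest_training_set(estimates, tolerance=0.05)

print(choice_loose, choice_tight, choice_none)  # 200 3900 None
```

Preferring the smallest adequate training set fits the later point that fewer training profiles make a tainted constraint set less likely.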


PROTOTYPE IMPLEMENTATION

• Augmented GCC (version 4.2) with a mainstream computing pass

• Pass inserts calls to a runtime library that simultaneously sample execution and ensure constraints aren’t violated

• Object file constructor modified to initialize execution constraints

• Communication with server implemented as a daemon

• When user changes tolerances, daemon fetches latest constraint set from the server

• Similarly, it periodically pushes client runtime profiles back to the server


RESULTS

• Perform the following experiments

• False positive study

• Detecting exploits

• Detecting soft errors, failure oblivious execution, runtime overhead

• We simulate a user community

[A brace on the slide groups the false positive study and exploit detection as the "tradeoff"]


P̂fail: FALSE POSITIVES

• Future work to reduce failure rates:

1. Only instrument likely indicator variables

2. Use smoke detector model

[Figure: failure rate (y-axis, 0.00% to 13.00%) for bc, bzip2, compress, grep, gzip, jpeg, libpoppler, libtiff, tar, wc, and man, with x-axis ticks at 256, 512, 1024, 2048, 4096, 8192, and 16384]


DETECTING EXPLOITS

app exp 256 512 1024 2048 4096 8192 16384

bc BV 100% 100% 100% 100% NA NA NA

compress BV 0% 0% 0% 0% 0% 0% NA

grep DOS 100% 100% 100% 100% 100% 100% 100%

gzip BV 100% 50% 40% 30% 10% 0% 0%

libpoppler UPF 100% 100% 100% 100% 100% 100% 100%

libtiff OVF 100% 100% 100% 100% 100% 100% 100%

man BV 100% 100% 100% 100% 100% NA NA


RELATED WORK

• Forrest et al. The Evolution of System-call Monitoring. ACSAC 2008

• Perkins et al. Automatically Patching Errors in Deployed Software, SOSP 2009

• Demsky et al. Inference and Enforcement of Data Structure Consistency Specifications, ISSTA ’06

• Key differences:

• Allow user to trade off false positives for false negatives

• Demonstrate ability to thwart several types of attacks and soft errors

• Analytically show that we can tolerate rogue users in the community


CONCLUSIONS

• We need systems that can identify exploits in deployed code

• Allow users to specify failure rates they are willing to tolerate

• Mainstream computing can identify unanticipated and potentially malicious execution

• Buffer overruns, integer overflow, injection attacks, and DOS

• We show that it can even be used to identify soft errors


FUTURE WORK

• Only instrument the likely indicator variables

• Deploy in the “real world”

• Consider server workloads

• Improve the performance of the runtime

• Consider privacy concerns


FILTERING VOLATILE VARIABLES

• Our prototype samples nearly all variables

• Some variables (e.g., timing-based variables) are hard or impossible to constrain with our baseline strategy

• We use a machine-learning strategy for filtering them out of the constraint set

• Future work will obviate the need for this strategy


TOLERATING ROGUE USERS

• Community may have malicious users

• Our constraint set creation approach attempts to limit the number of runtime profiles in the training set

• The fewer the runtime profiles in the training set, the less likely it is that the resultant constraint set will be tainted by rogue runtime profiles
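The last bullet can be made concrete with a short calculation. Assume, for illustration only, that each runtime profile is rogue independently with probability `p_rogue` and that a single rogue profile in the training set is enough to taint the constraint set. The chance of a tainted set is then 1 − (1 − p_rogue)^Ntrain. The function name is hypothetical, and the slides do not state that this is the authors' exact model.

```python
def prob_tainted(n_train, p_rogue):
    """Probability that at least one of n_train profiles is rogue.

    Assumes profiles are rogue independently with probability p_rogue.
    """
    return 1.0 - (1.0 - p_rogue) ** n_train


# Rogue probabilities matching the Pr values in the slide's plot legend.
for p_rogue in (1e-4, 1e-5, 1e-6):
    row = [round(prob_tainted(n, p_rogue), 3) for n in (2000, 6000, 12000)]
    print(p_rogue, row)
```

Under these assumptions the probability grows with the size of the training set, which is why a smaller training set is preferred when it still meets the requested failure rate.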

[Figure: Probability of Tainted Constraint Set (y-axis, 0 to 0.7) versus Runtime Profiles in Training Set (x-axis, 0 to 12000), with one curve each for Pr=1e-04, Pr=1e-05, and Pr=1e-06]


ASPECTS OF EXECUTION: VALUE BASED

• Sample and constrain the values of program “variables”

• Variables: application-level and many IR temporaries

• What we sample and constrain:

• Data-range: e.g., [32, 36]

• Constant bits: e.g., [00100TTT]

• Population count range: e.g., [1, 2]
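The three value-based summaries can be sketched for an 8-bit variable as follows. The class name `ValueSummary`, its fields, and the two-mask encoding of constant bits are assumptions made for illustration, not the prototype's data structures. "T" marks a bit that has been seen as both 0 and 1.

```python
from dataclasses import dataclass

WIDTH = 8
MASK = (1 << WIDTH) - 1


def popcount(v):
    return bin(v).count("1")


@dataclass
class ValueSummary:
    low: int       # data range, lower end
    high: int      # data range, upper end
    ones: int      # bits that were 1 in every sample
    zeros: int     # bits that were 0 in every sample
    pop_low: int   # population count range, lower end
    pop_high: int  # population count range, upper end

    @classmethod
    def from_value(cls, v):
        return cls(v, v, v, ~v & MASK, popcount(v), popcount(v))

    def sample(self, v):
        """Widen the summary so that it also covers v."""
        self.low, self.high = min(self.low, v), max(self.high, v)
        self.ones &= v
        self.zeros &= ~v & MASK
        self.pop_low = min(self.pop_low, popcount(v))
        self.pop_high = max(self.pop_high, popcount(v))

    def check(self, v):
        """Return (range ok, constant bits ok, population count ok)."""
        range_ok = self.low <= v <= self.high
        bits_ok = (v & self.ones) == self.ones and (~v & self.zeros) == self.zeros
        pop_ok = self.pop_low <= popcount(v) <= self.pop_high
        return range_ok, bits_ok, pop_ok

    def bits(self):
        chars = []
        for i in reversed(range(WIDTH)):
            bit = 1 << i
            chars.append("1" if self.ones & bit else "0" if self.zeros & bit else "T")
        return "".join(chars)


# Sampling: values chosen to end at the data range [32, 36] from the bullets.
summary = ValueSummary.from_value(32)
summary.sample(33)
summary.sample(36)
print((summary.low, summary.high), summary.bits(), (summary.pop_low, summary.pop_high))

# Checking against a constraint of range [1,8], bits 0000TTTT, popcount [1,1].
constraint = ValueSummary(low=1, high=8, ones=0, zeros=0b11110000, pop_low=1, pop_high=1)
check_2 = constraint.check(2)
check_8 = constraint.check(8)
check_7 = constraint.check(7)
print(check_2, check_8, check_7)
```

With only 32, 33, and 36 sampled, bit 1 has never been set, so the constant-bits summary is 00100T0T. The value 7 lies inside the data range and agrees with the constant bits, but its population count of 3 is outside [1,1], which shows why the three checks are kept separately.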


ASPECTS OF EXECUTION: VALUE BASED

Sampling:

Data Range   Constant Bits   Population Count   Value
[32,32]      [00100000]      [1,1]              32 [00100000]
[32,33]      [0010000T]      [1,2]              33 [00100001]
[32,36]      [00100T0T]      [1,2]              36 [00100100]

Constraint checking:

Data Range   Constant Bits   Population Count   Value           Result
[1,8]        [0000TTTT]      [1,1]              2 [00000010]    pass, pass, pass
[1,8]        [0000TTTT]      [1,1]              8 [00001000]    pass, pass, pass
[1,8]        [0000TTTT]      [1,1]              7 [00000111]    pass, pass, fail


ASPECTS OF EXECUTION: CONTROL-FLOW BASED

• Sample and check for simple control flow invariants

• Paths: sample and check value of a branch history vector at given program points

• Calls: sample and check ID of caller in a callee’s header
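A minimal sketch of both control-flow checks follows. The slides do not say how the sampled histories or caller IDs are stored or compared, so the class `ControlFlowMonitor`, the 8-bit history length, and the use of sets of previously seen values are all assumptions made for illustration.

```python
class ControlFlowMonitor:
    def __init__(self, history_bits=8):
        self.mask = (1 << history_bits) - 1
        self.history = 0          # recent branch outcomes, newest in bit 0
        self.seen_histories = {}  # program point -> set of history values
        self.seen_callers = {}    # callee -> set of caller IDs

    def branch(self, taken):
        """Shift the newest branch outcome into the history vector."""
        self.history = ((self.history << 1) | int(taken)) & self.mask

    def sample_point(self, point):
        self.seen_histories.setdefault(point, set()).add(self.history)

    def check_point(self, point):
        return self.history in self.seen_histories.get(point, set())

    def sample_call(self, callee, caller):
        self.seen_callers.setdefault(callee, set()).add(caller)

    def check_call(self, callee, caller):
        return caller in self.seen_callers.get(callee, set())


monitor = ControlFlowMonitor()

# Sampling: taken, not taken, taken gives the history 0b101.
for taken in (True, False, True):
    monitor.branch(taken)
monitor.sample_point("loop_exit")
monitor.sample_call("parse", caller="main")

# Checking: the same path passes, a longer path and an unseen caller do not.
path_ok = monitor.check_point("loop_exit")
monitor.branch(True)
path_after_extra_branch = monitor.check_point("loop_exit")
call_ok = monitor.check_call("parse", caller="main")
call_from_unknown = monitor.check_call("parse", caller="handler")
print(path_ok, path_after_extra_branch, call_ok, call_from_unknown)
```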


Page 67: Regulating Program Behavior via Mainstream Computing · REGULATING PROGRAM BEHAVIOR VIA MAINSTREAM COMPUTING Mark Stephenson, Ram Rangan*, Emmanuel Yashchin, and Eric Van Hensbergen

OVERHEAD OF SYSTEM

Overhead (factor over -O1)

Benchmark    Full   CF    CS    VB     Selective
bc           10.8   2.5   1.0    8.6   3.2
bzip2        21.4   3.4   1.0   19.0   4.3
compress      8.5   2.2   0.9    7.5   4.4
grep          4.2   1.3   1.0    4.0   1.7
gzip         16.4   4.1   1.0   13.1   7.2
jpeg         29.8   3.1   1.0   27.7   4.2
libpoppler    9.8   0.8   1.0    9.2   0.9
libtiff      15.0   1.3   1.0   15.0   4.5
libvorbis    15.0   1.5   1.0   14.8   5.6
tar           1.1   1.0   1.0    1.1   1.0
wc            4.3   1.9   1.0    4.4   1.9

Table 2. Prototype instrumentation overhead.

variables (e.g., is variable a less than variable b?). Statistical data mining techniques are then employed to identify the predicates that are best able to predict program crashes. While these systems help software developers pinpoint probable causes for critical software failures, they cannot protect against undiscovered vulnerabilities.

Our work also leverages the software invariant detection ideas pioneered in Daikon [17] and Diduce [18], and in later hardware-based solutions [14]. While in some sense systems like Daikon and Diduce identify mainstream behavior, the goals of our systems are very different. In systems like Daikon, Diduce, and mainstream computing, a user will encounter false positives. Because mainstream computing is meant to protect deployed applications, we allow users to specify a tolerance for failure with which they will be comfortable. This aspect of mainstream computing uniquely allows it to cope with malicious users.

“Taint analyses” of various forms have been proposed and used to prevent untrusted data from affecting program execution (e.g., [8, 11, 31]). Similarly, Castro et al. propose a (necessarily conservative) approach for detecting situations where the runtime flow of data does not agree with the static data-flow graph [7].

Recently, many promising approaches for tolerating software bugs have arisen. Bouncer is a system that generates and uses filters to drop messages with malicious payloads before they can be processed by a vulnerable program [10]. Rx is a checkpointing system in which a program that encounters a failure is restored to a previous checkpoint and rerun in the context of a different environment [28]. Locasto et al. use a reactionary approach to immunizing an application community against software failures [24]. Once an application instance detects an error, it communicates information about the error to other application instances, which then use emulation for the program methods involved in the failure.

Failure oblivious computing is an approach in which failures are ignored, and values that directly led to the failure are set such that computation can resume [29]. The Die Hard system probabilistically manages memory, which greatly increases the chances that programs with memory errors will execute properly [5]. These systems often convert catastrophic errors into correct executions, or executions with relatively benign issues.

ClearView is a collaborative system that relies on “monitors” to detect buffer overruns and illegal control transfers [27]; and similar to statistical bug isolation [21], ClearView maintains likely invariants which it mines to automatically generate software patches when a monitor is triggered. Because ClearView relies on monitors to tell the system when a failure has occurred, it does not detect arbitrary failures (such as injection and denial of service attacks).

Philosophically, mainstream computing is fundamentally different from all of these works. It makes no assumptions about error types, their root causes, or their implications. It simply enforces mainstream behavior at runtime. This simplicity makes it a powerful, generic tool, enabling it to detect a wide variety of software failures (and often, long before actual data corruption happens).

A direct extension to failure oblivious computing that inspired our work is presented in [13]. Demsky et al. use the Daikon invariant system to determine invariants for the fields of critical program structures. Upon violation of an invariant, the system will failure-obliviously “repair” the error according to the invariants specified for the offending field. The major differences between our work and [13] are three-fold: 1) we allow users to bound false positive rates based on their needs; 2) we demonstrate the ability of a mainstream system to tolerate soft errors and several different kinds of attacks; and 3) we describe a methodology for automatically extracting the invariants by leveraging a community of users.

6. Future Work

While we are encouraged by our prototype’s initial success, there are still many open questions that must be answered before mainstream computing is viable. First and foremost, it remains to be seen how our prototype system would behave in a real deployment. Though we went to great lengths to simulate a real user community, our setup is limited in scope, and therefore our runtime profiles may be biased. In order to collect unbiased data, future work will consider larger scale deployments. That said, in some cases belonging to a biased community may have significant security advantages. At the extreme, future work will consider personalized servers that tailor constraint sets to an individual user.

In addition, work is already underway to broaden the types of applications we consider. In particular, future work will consider “server” applications which potentially run for weeks at a time. Such applications may require periodically submitting partial runtime profiles to the server, while simultaneously updating the client’s constraint sets.

We will also investigate smarter merging and filtering methodologies to reduce p̂ for a given training set size. While a 1% tolerance for failure would be perfectly acceptable for many users (e.g., one-per-day usage would only prompt the user once every several months), other users would find such tolerances unacceptable. In addition, some users may object that mainstream computing’s statistical guarantees are not with respect to a single user, but with respect to the collaborative community: a “unique” user in a homogeneous community may be frustrated by the mainstream computing approach.
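The "once every several months" figure can be checked with a small calculation. The sketch below assumes, as a simplification not stated in the text, that each run independently raises a false flag with probability equal to the tolerance, so the number of runs until the first prompt follows a geometric distribution. The function names are hypothetical.

```python
def expected_runs_between_prompts(tolerance):
    """Mean of a geometric distribution: average runs until the first prompt."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must be in (0, 1]")
    return 1.0 / tolerance


def prob_prompt_within(tolerance, runs):
    """Probability of at least one prompt in the given number of runs."""
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must be in [0, 1]")
    return 1.0 - (1.0 - tolerance) ** runs
```

With a 1% tolerance and one run per day, the expected gap between prompts is 100 runs, or roughly three months, and the chance of seeing at least one prompt within any 30-day stretch is about 26%.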

Finally, our prototype was not engineered to have low runtime overheads. We will explore incorporating our ideas into a dynamic code generation system. Inlining our instrumentation code and completely omitting unnecessary constraint checks would allow us to drastically reduce the overhead of the runtime system. Furthermore, we will investigate the potential of sparse sampling to reduce the overhead of collecting features. Liblit et al. show that their approach reduces the overhead of sampling to a marginal amount (< .05) for equally heavyweight instrumentation [23].³

7. Conclusion

This paper explores a novel approach to increasing security and reliability. By enforcing mainstream behavior, the level of which is user definable, we show that a mainstream computing system can effectively identify unanticipated and potentially malicious computation. To our knowledge, our system is the first that allows users to specify the failure rates that they are willing to tolerate. Higher tolerances for failure may provide more protection through stricter regulation. Our approach allows mainstream computing to effec-

³ Such an approach would destroy the property that a constraint would fail on a runtime profile iff it would have failed during the actual execution; sparse sampling would therefore require reworking our constraint generation and validation approach.


Page 68: Regulating Program Behavior via Mainstream Computing · REGULATING PROGRAM BEHAVIOR VIA MAINSTREAM COMPUTING Mark Stephenson, Ram Rangan*, Emmanuel Yashchin, and Eric Van Hensbergen

DETECTING SOFT ERRORS

[Figure: two panels, “Just Detecting” and “Failure Oblivious”. X axis: Benchmark (bc, bzip2, compress, djpeg, grep, man, poppler, tar, tiff, mean). Y axis: Frequency of Event when Bit Flipped (0% to 100%). Legend: Undetected and Failed, Undetected and Passed, Detected and Failed, Detected and Passed.]
