tolerating and correcting memory errors in c and c++
DESCRIPTION
Tolerating and Correcting Memory Errors in C and C++. Ben Zorn Microsoft Research In collaboration with: Emery Berger and Gene Novark, Umass - Amherst Karthik Pattabiraman, UIUC Vinod Grover and Ted Hart, Microsoft Research. Focus on Heap Memory Errors. - PowerPoint PPT PresentationTRANSCRIPT
Tolerating and Correcting Memory Errors in C and C++
Ben ZornMicrosoft Research
In collaboration with:Emery Berger and Gene Novark, Umass - Amherst
Karthik Pattabiraman, UIUCVinod Grover and Ted Hart, Microsoft Research
Ben Zorn, Microsoft Research 1
Tolerating and Correcting Memory Errors in C and C++
Buffer overflow
char *c = malloc(100);c[101] = ‘a’;
Dangling reference
char *p1 = malloc(100);char *p2 = p1;
free(p1);p2[0] = ‘x’;
a
Focus on Heap Memory Errors
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 2
c
0 99
p1
0 99
p2
x
Ben Zorn, Microsoft Research
Approaches to Memory Corruptions Rewrite in a safe language
Static analysis / safe subset of C or C++ SAFECode [Adve], etc.
Runtime detection, fail fast Jones & Lin, CRED [Lam], CCured [Necula], etc. Debugging possible, security advantage?
Runtime toleration Failure oblivious [Rinard] (unsound) Rx, Boundless Memory Blocks, ECC memory
DieHard / Exterminator, Samurai
3Tolerating and Correcting Memory Errors
in C and C++
Fault Tolerance and Platforms Platforms necessary in computing ecosystem
Extensible frameworks provide lattice for 3rd parties Tremendously successful business model Examples numerous: Window, iPod, browser, etc.
Platform power derives from extensibility Tension between isolation for fault tolerance,
integration for functionality Platform only as reliable as weakest plug-in Tolerating bad plug-ins necessary by design
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 4
Research Vision Increase robustness of installed code base
Potentially improve millions of lines of code Minimize effort – ideally no source mods, no
recompilation Reduce requirement to patch
Patches are expensive (detect, write, deploy) Patches may introduce new errors
Enable trading resources for robustness E.g., more memory implies higher reliability
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 5
Ben Zorn, Microsoft Research
Outline Motivation Exterminator
Collaboration with Emery Berger, Gene Novark Automatically corrects memory errors Suitable for large scale deployment
Critical Memory / Samurai Collaboration with Karthik Pattabiraman, Vinod Grover New memory semantics Source changes to explicitly identify and protect
critical data Conclusion
6Tolerating and Correcting Memory Errors
in C and C++
Tolerating and Correcting Memory Errors in C and C++
DieHard Allocator in a Nutshell Joint work with Emery Berger Existing heaps are packed
tightly to minimize space Tight packing increases likelihood
of corruption Predictable layout is easier for
attacker to exploit We randomize and
overprovision the heap Expansion factor determines how
much empty space Semantics are identical
Replication increases benefits Enables analytic reasoning
7
Normal Heap
DieHard Heap
Ben Zorn, Microsoft Research
DieHard in Practice DieHard (non-replicated)
Windows, Linux version implemented by Emery Berger Try it right now! (http://www.diehard-software.org/) Adaptive, automatically sizes heap Mechanism automatically redirects malloc calls to DieHard DLL
Application: Firefox & Mozilla Known buffer in version 1.7.3 overflow crashes browser
Experience Usable in practice – no perceived slowdown Roughly doubles memory consumption
20.3 Mbytes vs. 44.3 Mbytes with DieHard
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 8
Ben Zorn, Microsoft Research
Caveats Primary focus is on protecting heap
Techniques applicable to stack data, but requires recompilation and format changes
DieHard trades space, extra processors for memory safety Not applicable to applications with large footprint Applicability to server apps likely to increase
DieHard requires non-deterministic behavior to be made deterministic (on input, gettimeofday(), etc.)
DieHard is a brute force approach Improvements possible (efficiency, safety, coverage, etc.)
9Tolerating and Correcting Memory Errors
in C and C++
Exterminator Motivation DieHard limitations
Tolerates errors probabilistically, doesn’t fix them Memory and CPU overhead Provides no information about source of errors
“Ideal” solution addresses the limitations Program automatically detects and fixes memory errors Corrected program has no memory, CPU overhead Sources of errors are pinpointed, easier for human to fix
Exterminator = correcting allocator Joint work with Emery Berger, Gene Novark Random allocation => isolates bugs while tolerating them
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 10
Exterminator Components Architecture of Exterminator dictated by solving
specific problems How to detect heap corruptions effectively?
DieFast allocator How to isolate the cause of a heap corruption
precisely? Heap differencing algorithms
How to automatically fix buggy C code without breaking it? Correcting allocator + hot allocator patches
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 11
DieFast Allocator Randomized, over-provisioned heap
Canary = random bit pattern fixed at startup Leverage extra free space by inserting canaries
Inserting canaries Initialization – all cells have canaries On allocation – no new canaries On free – put canary in the freed object with prob. P Remember where canaries are (bitmap)
Checking canaries On allocation – check cell returned On free – check adjacent cells
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 12
100101011110
1 2
Installing and Checking Canaries
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 13
Allocate Allocate
Install canarieswith probability PCheck canary Check canary
Free
Initially, heap full of canaries
1
Heap Differencing Strategy
Run program multiple times with different randomized heaps
If detect canary corruption, dump contents of heap Identify objects across runs using allocation order
Key insight: Relation between corruption and object causing corruption is invariant across heaps Detect invariant across random heaps More heaps => higher confidence of invariant
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 14
1 2 X
Attributing Buffer Overflows
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 15
One candidate!
4 3
corrupted canary
Which object caused?
delta is constant but unknown?
12 X4 3
Run 2
Run 1
Now only 2 candidates
2 4
41 X3
Run 3
2 44
Precision increases exponentially with number of runs
Detecting Dangling Pointers (2 cases) Dangling pointer read/written (easy)
Invariant = canary in freed object X has same corruption in all runs
Dangling pointer only read (harder) Sketch of approach (paper explains details)
Only fill freed object X with canary with probability P Requires multiple trials: ≈ log2(number of callsites) Look for correlations, i.e., X filled with canary => crash Establish conditional probabilities
Have: P(callsite X filled with canary | program crashes) Need: P(crash | filled with canary), guess “prior” to compute
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 16
Correcting Allocator Group objects by allocation site Patch object groups at allocate/free time Associate patches with group
Buffer overrun => add padding to size request malloc(32) becomes malloc(32 + delta)
Dangling pointer => defer free free(p) becomes defer_free(p, delta_allocations)
Fixes preserve semantics, no new bugs created Correcting allocation may != DieFast or DieHard
Correction allocator can be space, CPU efficient “Patches” created separately, installed on-the-fly
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 17
Deploying Exterminator Exterminator can be deployed in different modes Iterative – suitable for test environment
Different random heaps, identical inputs Complements automatic methods that cause crashes
Replicated mode Suitable in a multi/many core environment Like DieHard replication, except auto-corrects, hot patches
Cumulative mode – partial or complete deployment Aggregates results across different inputs Enables automatic root cause analysis from Watson dumps Suitable for wide deployment, perfect for beta release Likely to catch many bugs not seen in testing lab
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 18
cfra
c
espr
esso
linds
ay p2c
robo
op
164.
gzip
175.
vpr
176.
gcc
181.
mcf
186.
craf
ty
197.
pars
er
252.
eon
253.
perlb
mk
254.
gap
255.
vortex
256.
bzip
2
300.
twolf
Geom
etric
mea
n
0
0.5
1
1.5
2
2.5
GNU libc Exterminator
Norm
alized
Execu
tion
Tim
e
allocation-intensive SPECint2000
DieFast Overhead
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 19
Exterminator Effectiveness Squid web cache buffer overflow
Crashes glibc 2.8.0 malloc 3 runs sufficient to isolate 6-byte overflow
Mozilla 1.7.3 buffer overflow (recall demo) Testing scenario - repeated load of buggy page
23 runs to isolate overflow Deployed scenario – bug happens in middle of
different browsing sessions 34 runs to isolate overflow
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 20
Ben Zorn, Microsoft Research
Outline Motivation Exterminator
Collaboration with Emery Berger, Gene Novark Automatically corrects memory errors Suitable for large scale deployment
Critical Memory / Samurai Collaboration with Karthik Pattabiraman, Vinod Grover New memory semantics Source changes to explicitly identify and protect
critical data Conclusion
21Tolerating and Correcting Memory Errors
in C and C++
Critical Memory Motivation C/C++ programs vulnerable to memory errors
Software errors: buffer overflows, etc. Hardware transient errors: bit flips, etc.
Increasingly a problem due to process shrinking, power
Critical memory goals: Harden programs from both SW and HW errors Allow local reasoning about memory state Allow selective, incremental hardening of apps Provide compatibility with existing libraries, apps
Ben Zorn, Microsoft Research 22
Tolerating and Correcting Memory Errors in C and C++
balance += 100;if (balance < 0) { chargeCredit();} else { x += 10; y += 10;}
Main Idea: Data-centric Robustness Critical memory
Some data is more important than other data Selectively protect that data from corruption
Examples Account data, document contents are critical
// UI data is not Game score information, player stats, critical
// rendering data structures are not
balance
Data Code
x, y
critical data
code thatreferencescritical data
Ben Zorn, Microsoft Research 23
Tolerating and Correcting Memory Errors in C and C++
Critical Memory Semantics Conceptually, critical memory is parallel and
independent of normal memory Critical memory requires special allocate/deallocate
and read/write operations critical_store (cstore) – only way to consistently update
critical memory critical_load (cload) – only way to consistently read critical
memory Critical load/store have priority over normal
load/store Normal loads still see the value of critical memory
Ben Zorn, Microsoft Research 24
Tolerating and Correcting Memory Errors in C and C++
Example
cstore balance, 100…cload balance returns 100load balance returns 100
100
100normalmemory
criticalmemory
cstore100
cstore balance, 100store balance, 10000 (applications should not do this)…load balance returns 10000 (compatibility semantics)
cload balance returns 100 (possibly triggers exception)
100
10000normalmemory
criticalmemory
cstore 100store 10000
cloadload
load
Ben Zorn, Microsoft Research 25
Tolerating and Correcting Memory Errors in C and C++
int x, y, buffer[10];critical int balance = 100;
third_party_lib(&x, &y);buffer[10] = 10000; // Bug!
// balance still == 100
if (balance < 0) { chargeCredit();} else { x += 10; y += 10;}
Critical Memory Deployment Define “critical” type specifier:
Like “const” Easy to use, minimal source
mods Allows local reasoning
External libraries, code cannot modify critical data
Tolerates memory errors Non-critical overflows cannot
corrupt critical values Alllows static analysis of program
subset Critical subset of program can be
statically checked independently Additional checking on critical
data possible
Ben Zorn, Microsoft Research 26
Tolerating and Correcting Memory Errors in C and C++
Third-party Libraries/Untrusted Code Library code does not
need to be critical memory aware If library does not mod
critical data, no changes required
If library needs to modify critical data Allow normal stores to
critical memory in library Explicitly “promote” on
return Copy-in, copy-out semantics
critical int balance = 100;…library_foo(&balance);promote balance;…__________________
// arg is not critical int *void library_foo(int *arg){ *arg = 10000; return;}
Ben Zorn, Microsoft Research 27
Tolerating and Correcting Memory Errors in C and C++
Samurai: SCM Prototype Software critical memory for heap objects
Critical objects allocated with crit_malloc, crit_free Approach
Replication – base copy + 2 shadow copies Redundant metadata
Stored with base copy, copy in hash table Checksum, size data for overflow detection
Robust allocator as foundation DieHard, unreplicated Randomizes locations of shadow copies
Ben Zorn, Microsoft Research 28
Tolerating and Correcting Memory Errors in C and C++
Samurai Implementation
cstore balance, 100…cload balance returns 100load balance returns 100
100
100basecopy
shadowcopies
cstore100
cstore balance, 100store balance, 10000…load balance returns 10000cload balance returns 100
100
cload
metadata
cs
=?
100
10000basecopy
shadowcopies
100
metadata
cs
load=?
cload
=?store 10000
Ben Zorn, Microsoft Research 29
Tolerating and Correcting Memory Errors in C and C++
Samurai Experimental Results Prototype implementation of critical memory
Fault-tolerant runtime system for C/C++ Applied to heap objects Automated Phoenix compiler pass
Identified critical data for five SPECint applications Low overheads for most applications (less than 10%)
Conducted fault-injection experiments Fault tolerance significantly improved over based code Low probability of fault-propagation from non-critical data to
critical data for most applications No new assertions or consistency checks added
Ben Zorn, Microsoft Research 30
Tolerating and Correcting Memory Errors in C and C++
Performance ResultsPerformance Overhead
1.03 1.08 1.01 1.08
2.73
0
0.5
1
1.5
2
2.5
3
vpr crafty parser rayshade gzip
Benchmark
Slo
wd
ow
n
Baseline Samurai
Ben Zorn, Microsoft Research 31
Tolerating and Correcting Memory Errors in C and C++
Fault Injection into Critical Data (vpr)Fault Injections into vpr (with Samurai)
0%
20%
40%
60%
80%
100%
1000
0
2000
0
3000
0
4000
0
5000
0
6000
0
7000
0
8000
0
9000
0
1E+
06
Fault Period (number of accesses)
Per
cen
tag
e o
f T
rial
s
Successes Failures False-Positives
Fault Injections into vpr (without Samurai )
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
1000
00
2000
00
3000
00
4000
00
5000
00
6000
00
7000
00
8000
00
9000
00
1000
000
Fault Period (number of accesses)
Per
cen
tag
e o
f T
rial
s
Successes Failures False-Positives
Ben Zorn, Microsoft Research 32
Tolerating and Correcting Memory Errors in C and C++
Fault Injection into Non-Critical DataApp Number
of TrialsControl Errors
Data Errors
Pointer Errors
Assertion Violations
Total Errors
vpr 550 (199) 0 203 (0) 1 (0) 2 (2) 203 (0)
crafty 55 (18) 12 (7) 9 (3) 4 (3) 0 25 (13)
parser 500 (380) 0 3 (1) 0 0 3 (1)
rayshade 500 (68) 0 5 (1) 0 1 (1) 5 (1)
gzip 500 (239) 0 1 (1) 2 (2) 157 (157) 3 (3)
Ben Zorn, Microsoft Research 33
Tolerating and Correcting Memory Errors in C and C++
Experience with CM for Components Hardened two libraries with critical memory
CM-STL – hardened List Class data and metadata CM-malloc – hardened allocator metadata
Effort Relatively small changes to existing code
Performance impact CM-STL in Web Server thread list => 10% slower CM-malloc in cfrac benchmark => 22% slower
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 34
Ben Zorn, Microsoft Research
Conclusion Programs written in C / C++ can execute safely
and correctly despite memory errors Research vision
Improve existing code without source modifications Reduce human generated patches required Increase reliability, security by order of magnitude
Current projects DieHard / Exterminator: automatically detect and
correct memory errors (with high probability) Critical Memory / Samurai: enable local reasoning,
allow selective hardening, compatibility ToleRace: replication to hide data races
35Tolerating and Correcting Memory Errors
in C and C++
Ben Zorn, Microsoft Research
Hardware Trends (1) Reliability Hardware transient faults are increasing
Even type-safe programs can be subverted in presence of HW errors Academic demonstrations in Java, OCaml
Soft error workshop (SELSE) conclusions Intel, AMD now more carefully measuring “Not practical to protect everything” Faults need to be handled at all levels from HW up the
software stack Measurement is difficult
How to determine soft HW error vs. software error? Early measurement papers appearing
36Tolerating and Correcting Memory Errors
in C and C++
Ben Zorn, Microsoft Research
Hardware Trends (2) Multicore DRAM prices dropping
2Gb, Dual Channel PC 6400 DDR2 800 MHz $85
Multicore CPUs Quad-core Intel Core 2 Quad, AMD
Quad-core Opteron
Eight core Intel by 2008?
Challenge: How should we use all this hardware?
37Tolerating and Correcting Memory Errors
in C and C++
Additional Information Web sites:
Ben Zorn: http://research.microsoft.com/~zorn DieHard: http://www.diehard-software.org/ Exterminator: http://www.cs.umass.edu/~gnovark/
Publications Emery D. Berger and Benjamin G. Zorn, "DieHard:
Probabilistic Memory Safety for Unsafe Languages", PLDI’06. Karthik Pattabiraman, Vinod Grover, and Benjamin G. Zorn,
"Samurai - Protecting Critical Data in Unsafe Languages", Microsoft Research, MSR-TR-2006-127, September 2006.
Gene Novark, Emery D. Berger and Benjamin G. Zorn, “Exterminator: Correcting Memory Errors with High Probability", PLDI’07.
Ben Zorn, Microsoft Research 38
Tolerating and Correcting Memory Errors in C and C++
Backup Slides
Ben Zorn, Microsoft Research 39
Tolerating and Correcting Memory Errors in C and C++
Ben Zorn, Microsoft Research
DieHard: Probabilistic Memory Safety Collaboration with Emery Berger Plug-compatible replacement for malloc/free in C lib We define “infinite heap semantics”
Programs execute as if each object allocated with unbounded memory
All frees ignored Approximating infinite heaps – 3 key ideas
Overprovisioning Randomization Replication
Allows analytic reasoning about safety
40Tolerating and Correcting Memory Errors
in C and C++
Overprovisioning, Randomization
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 41
Expand size requests by a factor of M (e.g., M=2)
1 2 3 4 5
1 2 3 4 5
Randomize object placement
12 34 5
Pr(write corrupts) = ½ ?
Pr(write corrupts) = ½ !
Replication (optional)
Ben Zorn, Microsoft Research
Tolerating and Correcting Memory Errors in C and C++ 42
Replicate process with different randomization seeds
1 234 5
P2
12 345
P3
input
Broadcast input to all replicas
Compare outputs of replicas, kill when replica disagrees
1 23 45
P1
Voter
Ben Zorn, Microsoft Research
DieHard Implementation Details Multiply allocated memory by factor of M Allocation
Segregate objects by size (log2), bitmap allocator Within size class, place objects randomly in address
space Randomly re-probe if conflicts (expansion limits probing)
Separate metadata from user data Fill objects with random values – for detecting uninit reads
Deallocation Expansion factor => frees deferred Extra checks for illegal free
43Tolerating and Correcting Memory Errors
in C and C++
Segregated size classes
- Static strategy pre-allocates size classes- Adaptive strategy grows each size class incrementally
Ben Zorn, Microsoft Research
Over-provisioned, Randomized Heap
2
H = max heap size, class i
L = max live size ≤ H/2
F = free = H-L
4 5 3 1 6
object size = 16
object size = 8
…
44Tolerating and Correcting Memory Errors
in C and C++
Ben Zorn, Microsoft Research
Randomness enables Analytic ReasoningExample: Buffer Overflows
k = # of replicas, Obj = size of overflow With no replication, Obj = 1, heap no more
than 1/8 full:Pr(Mask buffer overflow), = 87.5%
3 replicas: Pr(ibid) = 99.8%
45Tolerating and Correcting Memory Errors
in C and C++
Ben Zorn, Microsoft Research
DieHard CPU Performance (no replication) Runtime on Windows
0
0.2
0.4
0.6
0.8
1
1.2
1.4
cfrac espresso lindsay p2c roboop Geo. Mean
Norm
aliz
ed runtim
e
malloc DieHard
46Tolerating and Correcting Memory Errors in
C and C++
Ben Zorn, Microsoft Research
DieHard CPU Performance (Linux)
47Tolerating and Correcting Memory Errors
in C and C++
cfra
c
esp
ress
o
lind
say
rob
oo
p
Ge
o. M
ea
n
16
4.g
zip
17
5.v
pr
17
6.g
cc
18
1.m
cf
18
6.c
rafty
19
7.p
ars
er
25
2.e
on
25
3.p
erl
bm
k
25
4.g
ap
25
5.v
ort
ex
25
6.b
zip
2
30
0.tw
olf
Ge
o. M
ea
n
0
0.5
1
1.5
2
2.5
malloc GC DieHard (static) DieHard (adaptive)
No
rma
lize
d r
un
tim
e
alloc-intensive general-purpose
Ben Zorn, Microsoft Research
Correctness Results Tolerates high rate of synthetically injected
errors in SPEC programs Detected two previously unreported benign
bugs (197.parser and espresso) Successfully hides buffer overflow error in
Squid web cache server (v 2.3s5) But don’t take my word for it…
48Tolerating and Correcting Memory Errors
in C and C++
Experiments / Benchmarks vpr: Does place and route on FPGAs from netlist
Made routing-resource graph critical crafty: Plays a game of chess with the user
Made cache of previously-seen board positions critical gzip: Compress/Decompresses a file
Made Huffman decoding table critical parser: Checks syntactic correctness of English
sentences based on a dictionary Made the dictionary data structures critical
rayshade: Renders a scene file Made the list of objects to be rendered critical
Ben Zorn, Microsoft Research 49
Tolerating and Correcting Memory Errors in C and C++
Ben Zorn, Microsoft Research
Related Work Conservative GC (Boehm / Demers / Weiser)
Time-space tradeoff (typically >3X) Provably avoids certain errors
Safe-C compilers Jones & Kelley, Necula, Lam, Rinard, Adve, … Often built on BDW GC Up to 10X performance hit
N-version programming Replicas truly statistically independent
Address space randomization (as in Vista) Failure-oblivious computing [Rinard]
Hope that program will continue after memory error with no untoward effects
50Tolerating and Correcting Memory Errors
in C and C++