Data Structure Repair Using Goal-Directed Reasoning

Brian Demsky, Martin Rinard

Computer Science and Artificial Intelligence Laboratory

Massachusetts Institute of Technology

Problem

[Figure: Broken Data Structure, a linked structure whose nodes carry fields F, G, I, and J]

Errors
• Missing elements
• Inappropriate sharing
• Dangling references
• Out of bounds array indices
• Inconsistent values

Solution

[Figure: a Repair Algorithm transforms the Broken Data Structure into a Consistent Data Structure]

OOPSLA 2003

[Figure: Broken Bits → Model Definition Rules → Broken Abstract Model → Abstract Repair → Repaired Abstract Model → External Consistency Constraints → Repaired Bits]

ICSE 2005 (this paper)

[Figure: the same pipeline, labeled Broken Bits, Model Definition Rules, Broken Abstract Model, Abstract Repair, Repaired Abstract Model, External Consistency Constraints, Repaired Bits]

Why Eliminate External Consistency Constraints?

• Development overhead of external consistency constraints

• Possibility of errors in the external consistency constraints

• Difficulties ensuring that repaired model corresponds to a concrete data structure

How Did We Do It?

Goal-directed reasoning replaces external consistency constraints

• Start with repair that makes model consistent

• Goal: implement this repair in data structures

• Reasoning: analyze model definition rules

• Result: data structure updates that implement abstract repairs

Result

[Figure: Broken Bits → Model Definition Rules → Broken Abstract Model → Abstract Repair → Repaired Abstract Model → Automatically Generated Concrete Data Structure Updates (in place of External Consistency Constraints) → Repaired Bits]

File System Example

struct disk {
    int blockbitmap;
    entry dir[numentries];
    block block[numblocks];
}

struct entry {
    byte name[Length];
    int firstblock;
}

struct block {
    int nextblock;
    byte data[blocksize];
}

struct blockbitmap subtype block {
    int nextblock;
    bit bitmap[numblocks];
}

[Figure: example file system image with panels "Directory Entries" and "Disk Blocks"; values shown: intro, -5, 2, -1 and -1, 3, -1]

File System Model

• Sets of objects
    set Block of block : Used | Free
    set Used of block : Bitmap
• Relations between objects
    relation Next : Used, Used
    relation BlockStatus : Block, boolean

[Figure: Block is partitioned into Used and Free; Bitmap is a subset of Used; Next relates Used to Used; BlockStatus relates Block to boolean]

Model Translation

Bits translated to sets and relations in abstract model using statements of the form:
    Quantifiers, Condition => Inclusion Constraint

1. ∀ i ∈ [0..numentries-1], 0 ≤ d.dir[i].firstblock => d.block[d.dir[i].firstblock] ∈ Used
2. ∀ b ∈ Used, 0 ≤ b.nextblock => <b, d.block[b.nextblock]> ∈ Next
3. ∀ b ∈ Used, 0 ≤ b.nextblock => d.block[b.nextblock] ∈ Used
4. ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free
5. true => d.block[d.blockbitmap] ∈ Bitmap
6. ∀ j ∈ [0..numblocks-1], ∀ b ∈ Bitmap, true => <d.block[j], b.bitmap[j]> ∈ BlockStatus
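The six rules can be read as a small fixed-point computation over the disk image. The following is a minimal Python sketch under stated assumptions: the dictionary layout, the name `build_model`, the bounds check on block indices, and the concrete values in `EXAMPLE_DISK` (one reading of the example figure, with block 0's `nextblock` invented) are illustrative and are not taken from the tool itself, which compiles the rules instead of interpreting them.

```python
def build_model(disk):
    """Evaluate the six model definition rules over a disk image (sketch)."""
    n = len(disk["block"])

    def valid(i):
        # Assumed guard: an index outside d.block contributes nothing.
        return 0 <= i < n

    # Rule 5: true => d.block[d.blockbitmap] in Bitmap
    bitmap = {disk["blockbitmap"]} if valid(disk["blockbitmap"]) else set()

    # Bitmap is declared as a subset of Used, so its members are Used too.
    used = set(bitmap)

    # Rule 1: the first block of each directory entry is Used.
    for entry in disk["dir"]:
        if valid(entry["firstblock"]):
            used.add(entry["firstblock"])

    # Rules 2 and 3: follow nextblock links from Used blocks to a fixed point.
    nxt = set()
    worklist = list(used)
    while worklist:
        b = worklist.pop()
        succ = disk["block"][b]["nextblock"]
        if valid(succ):
            nxt.add((b, succ))
            if succ not in used:
                used.add(succ)
                worklist.append(succ)

    # Rule 4: every block that is not Used is Free.
    free = set(range(n)) - used

    # Rule 6: BlockStatus pairs each block with its bit in the bitmap block.
    status = set()
    for b in bitmap:
        bits = disk["block"][b].get("bitmap", [False] * n)
        for j in range(n):
            status.add((j, bits[j]))

    return {"Used": used, "Free": free, "Bitmap": bitmap,
            "Next": nxt, "BlockStatus": status}


# Assumed reading of the example image; block 0's nextblock is invented.
EXAMPLE_DISK = {
    "blockbitmap": -5,
    "dir": [{"name": "intro", "firstblock": 2}, {"name": "", "firstblock": -1}],
    "block": [{"nextblock": -1}, {"nextblock": -1},
              {"nextblock": 3}, {"nextblock": -1}],
}

model = build_model(EXAMPLE_DISK)
```

Because `blockbitmap` is -5, rule 5 contributes nothing and the Bitmap set comes out empty, which is the inconsistency the later slides repair.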

Model for File System Example

[Figure: the example file system image (Directory Entries, Disk Blocks) next to its abstract model, showing blocks 0 to 3, the Used, Free, and Bitmap sets, and the Next relation]

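Consistency Constraints in Example

|Bitmap| = 1
∀ u ∈ Used, u.BlockStatus = true
∀ f ∈ Free, f.BlockStatus = false
∀ b ∈ Used, |Next.b| ≤ 1

[Figure: abstract model of the example, showing blocks 0 to 3, the Used, Free, and Bitmap sets, and the Next relation]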

Detecting Inconsistencies

Evaluate consistency properties, find violations

|Bitmap| = 1 is violated: the Bitmap set is empty

[Figure: abstract model of the example, showing blocks 0 to 3, the Used, Free, and Bitmap sets, and the Next relation]
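A minimal Python sketch of the detection step, evaluating the four example constraints over an abstract model. The model literal, the name `check_consistency`, and the decision to count a block with no BlockStatus tuple as a violation are assumptions for illustration; the slide itself highlights only the |Bitmap| = 1 violation.

```python
def check_consistency(model):
    """Return (constraint, binding) pairs, one per violation found."""
    violations = []
    # Assumes at most one BlockStatus tuple per block.
    status = dict(model["BlockStatus"])

    # |Bitmap| = 1
    if len(model["Bitmap"]) != 1:
        violations.append(("|Bitmap| = 1", None))

    # for all u in Used, u.BlockStatus = true
    for u in sorted(model["Used"]):
        if status.get(u) is not True:
            violations.append(("u.BlockStatus = true", u))

    # for all f in Free, f.BlockStatus = false
    for f in sorted(model["Free"]):
        if status.get(f) is not False:
            violations.append(("f.BlockStatus = false", f))

    # for all b in Used, |Next.b| <= 1 (at most one predecessor)
    for b in sorted(model["Used"]):
        preds = [a for (a, c) in model["Next"] if c == b]
        if len(preds) > 1:
            violations.append(("|Next.b| <= 1", b))

    return violations


# Abstract model of the broken example: the Bitmap set is empty.
MODEL = {"Used": {2, 3}, "Free": {0, 1}, "Bitmap": set(),
         "Next": {(2, 3)}, "BlockStatus": set()}

violations = check_consistency(MODEL)
```

Each violation carries the binding of the quantified variable, which is what the repair step needs in order to know which object to fix.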

Repairing Violations of Model Consistency Properties

• Violation provides binding for quantified variables
• Convert Body to disjunctive normal form
    (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
    p1 … pn, q1 … qm are basic propositions
• Choose a conjunction to satisfy
• Repair violated basic propositions in conjunction
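The selection step above can be sketched in a few lines of Python. The slides do not say how a conjunction is chosen, so the "fewest violated propositions" cost used here is an assumed heuristic, and `choose_conjunction` and the proposition names are hypothetical.

```python
def choose_conjunction(dnf, holds):
    """Pick the conjunction with the fewest violated basic propositions.

    dnf   -- list of conjunctions, each a list of proposition names
    holds -- dict mapping each proposition name to its current truth value
    The cost function is an assumption, not something stated on the slides.
    """
    def violated(conj):
        return [p for p in conj if not holds[p]]

    best = min(dnf, key=lambda conj: len(violated(conj)))
    return best, violated(best)


# (p1 and p2) or (q1 and q2 and q3), with p2, q1, q2 currently false.
dnf = [["p1", "p2"], ["q1", "q2", "q3"]]
holds = {"p1": True, "p2": False, "q1": False, "q2": False, "q3": True}

chosen, to_repair = choose_conjunction(dnf, holds)
```

Here the first conjunction needs one repair and the second needs two, so the sketch selects the first and reports `p2` as the proposition to repair.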

Repairing Violations of Basic Propositions

• Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
    • Compute value of expression, assign relation
• Presence of required number of objects
    • |S| = C, |S| ≤ C, |S| ≥ C
    • Remove or insert objects from/to set
• Topology of region surrounding each object
    • |V.R| = C, |V.R| ≤ C, |V.R| ≥ C
    • |R.V| = C, |R.V| ≤ C, |R.V| ≥ C
    • Remove or insert tuples from/to relation
• Inclusion constraints: V in S, V1 in V2.R, <V1,V2> in R
    • Remove or add the object or tuple from/to set or relation

Repairing Inconsistencies

Repair the violation of |Bitmap| = 1 by adding a block to the Bitmap set

[Figure: abstract model of the example, showing blocks 0 to 3, the Used, Free, and Bitmap sets, and the Next relation]

Goal-Directed Reasoning Translates Abstract Repairs Into Concrete Repairs

• Abstract repairs add or remove objects (or tuples) to sets (or relations)
• Goal: find concrete data structure updates with same effect
    1) Find model definition rules that construct the relevant set or relation
    2) Basic strategy:
        For removals, appropriately falsify the guards of all these model definition rules.
        For additions, appropriately satisfy the guard of one of these model definition rules.

Goal-Directed Reasoning in Example

• Abstract Repair: add block 0 to the Bitmap set
• Model Definition Rules:
    ∀ i ∈ [0..numentries-1], 0 ≤ d.dir[i].firstblock => d.block[d.dir[i].firstblock] ∈ Used
    ∀ b ∈ Used, 0 ≤ b.nextblock => <b, d.block[b.nextblock]> ∈ Next
    ∀ b ∈ Used, 0 ≤ b.nextblock => d.block[b.nextblock] ∈ Used
    ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free
    true => d.block[d.blockbitmap] ∈ Bitmap
    ∀ j ∈ [0..numblocks-1], ∀ b ∈ Bitmap, true => <d.block[j], b.bitmap[j]> ∈ BlockStatus
• Relevant Model Definition Rule:
    true => d.block[d.blockbitmap] ∈ Bitmap
• d.block[d.blockbitmap] = block 0
• Data Structure Update:
    d.blockbitmap = index of block 0 in d.block array
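A minimal Python sketch of this one derivation: the guard of the Bitmap rule is already `true`, so the goal reduces to making `d.block[d.blockbitmap]` equal the target block, which a single field write achieves. The function names and the dictionary layout are hypothetical, and the sketch hard-codes the reasoning for this rule only; it is not a general goal-directed engine.

```python
def concrete_update_for_bitmap_add(disk, target_block):
    """Derive the field write that makes the Bitmap rule include target_block.

    Rule: true => d.block[d.blockbitmap] in Bitmap.
    The guard is trivially satisfied, so only the index expression needs solving.
    """
    # The target must be an element of the d.block array.
    if not 0 <= target_block < len(disk["block"]):
        raise ValueError("target is not an element of d.block")
    return ("blockbitmap", target_block)


def apply_update(disk, update):
    """Perform the derived write on the concrete data structure."""
    field, value = update
    disk[field] = value


# Broken image: blockbitmap holds an out-of-range index.
disk = {"blockbitmap": -5, "block": [{}, {}, {}, {}]}

update = concrete_update_for_bitmap_add(disk, 0)
apply_update(disk, update)
```

After the write, re-evaluating the rule places block 0 in the Bitmap set, which is the effect the abstract repair asked for.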

Repair in Example

[Figure: Original File System (Directory Entries and Disk Blocks; values shown: intro, -5, 2, -1 and -1, 3, -1) and Updated File System (values shown: intro, 0, 2, -1 and -1, 3, -1), with the blockbitmap field labeled; the value -5 becomes 0]

Reasoning at Compile Time

• Compile specifications into repair algorithms
• Goal-directed reasoning takes place at compile time
• Consider possibility that |Bitmap| = 0
• Abstract repair
    • Choose a block in Free set
    • Add block to Bitmap set
• Concrete repair
    • Find relevant model definition rule:
        true => d.block[d.blockbitmap] ∈ Bitmap
    • Goal-directed reasoning finds following update:
        d.blockbitmap = index of block in d.block array
    • Check that block is an element of d.block array:
        ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free

Multiple Repairs

• Some broken data structures may require multiple repairs
    • Reconstruct model
    • Reevaluate consistency constraints
    • Perform any required additional repairs
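The reconstruct, reevaluate, repair cycle can be sketched as a loop. This is a minimal Python illustration, not the generated repair algorithm: the driver `repair_until_consistent`, the `max_rounds` limit, and the toy stand-ins for the file system example (including the `chained` field listing blocks reachable from the directory) are all assumptions.

```python
def repair_until_consistent(data, build_model, find_violation, repair,
                            max_rounds=100):
    """Rebuild the model, re-check, and repair one violation per round."""
    for rounds in range(max_rounds):
        model = build_model(data)
        violation = find_violation(model)
        if violation is None:
            return rounds  # number of repairs that were needed
        repair(data, model, violation)
    raise RuntimeError("resource limit reached before consistency")


# Toy stand-ins for the file system example (all names are illustrative).
def build_model(data):
    n = len(data["bitmap"])
    bitmap = {data["blockbitmap"]} if 0 <= data["blockbitmap"] < n else set()
    used = set(data["chained"]) | bitmap
    return {"Bitmap": bitmap, "Used": used, "Free": set(range(n)) - used,
            "Status": list(data["bitmap"]) if bitmap else None}


def find_violation(model):
    if len(model["Bitmap"]) != 1:
        return ("bitmap", None)
    for j, bit in enumerate(model["Status"]):
        if bit != (j in model["Used"]):
            return ("status", j)
    return None


def repair(data, model, violation):
    kind, j = violation
    if kind == "bitmap":
        data["blockbitmap"] = min(model["Free"])  # pick a free block
    else:
        data["bitmap"][j] = j in model["Used"]    # fix one bit


data = {"blockbitmap": -5, "chained": [2, 3], "bitmap": [False] * 4}
rounds = repair_until_consistent(data, build_model, find_violation, repair)
```

In this toy run the first round repairs |Bitmap| = 1, and the recomputed model then exposes three BlockStatus violations that are fixed one per round, ending with the bitmap 1011.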

Architecture

[Figure: repair loop. Model Translation maps the Broken Bits to a Broken Abstract Model; Abstract Repair yields a Repaired Abstract Model; an Automatically Generated Concrete Repair updates the bits; the cycle repeats (. . . .) until the Repaired Bits are reached]

Model Recomputation

Re-evaluate constraints, find violations of ∀ u ∈ Used, u.BlockStatus = true and ∀ f ∈ Free, f.BlockStatus = false

Repair violations of ∀ u ∈ Used, u.BlockStatus = true and ∀ f ∈ Free, f.BlockStatus = false by modifying the BlockStatus relation

[Figure: recomputed abstract model showing blocks 0 to 3, the Used, Free, and Bitmap sets, the Next relation, and the BlockStatus relation to true and false]

Repaired File System

[Figure: repaired file system image with the blockbitmap field labeled; values shown under Directory Entries and Disk Blocks: intro, 1011, 0, 2, -1 and -1, 3, -1]

Acyclic Repair Dependences

• Questions
    • Isn’t it possible for the repair of one constraint to invalidate another constraint?
    • What about infinite repair loops?
    • What about unsatisfiable specifications?
• Answer
    • We require specifications to have no cyclic repair dependences between constraints
    • So all repair sequences terminate
    • Repair can fail only because of resource limitations

Repair Dependence Graph

[Figure: repair dependence graph with the following nodes]
1. |Bitmap| = 1
2. Add block to Bitmap
3. d.blockbitmap = indexof(bfree)
4. Satisfy Rule 6 (BlockStatus)
5. f.BlockStatus = false
6. Replace <f,true> with <f,false> in BlockStatus
7. b.bitmap[j] = false for j = indexof(f)
8. Remove <f,true> from BlockStatus by removing Bitmap

[The last build of this slide no longer shows node 8.]
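The requirement that specifications have no cyclic repair dependences amounts to a cycle check on a directed graph. Below is a minimal Python sketch using depth-first search. The node numbers follow the slide, but the edges are invented for illustration because the transcript does not preserve the real ones; `has_cycle` is a hypothetical helper, not part of the tool.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph (adjacency lists)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY  # node is on the current DFS path
        for succ in graph.get(node, ()):
            state = color.get(succ, WHITE)
            if state == GREY:
                return True  # back edge: a cycle exists
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))


# Hypothetical edges: node 8 (removing Bitmap) leads back to constraint 1.
with_node_8 = {1: [2], 2: [3, 4], 3: [], 4: [5],
               5: [6, 8], 6: [7], 7: [], 8: [1]}

# Same graph with node 8 and its edges left out.
without_node_8 = {1: [2], 2: [3, 4], 3: [], 4: [5],
                  5: [6], 6: [7], 7: []}

cyclic = has_cycle(with_node_8)
acyclic = has_cycle(without_node_8)
```

Under these assumed edges, dropping the repair action in node 8 is what makes the graph acyclic, so every repair sequence in the remaining graph terminates.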

When to Test for Consistency and Repair

• Persistent data structures
    • Repair can be independent activity, or
    • Repair when data written out or read in
• Volatile data structures in running program
    • Under programmer control
    • Transaction-based approach
        • Identify transaction start and end
        • Repair at start, end, or both
    • Failure-based approach
        • Wait until program fails
        • Repair and restart from latest safe point

Experience

• We acquired five benchmarks (written in C/C++)
    • AbiWord
    • x86 emulator
    • CTAS (air-traffic control tool)
    • Simplified Linux file system
    • Freeciv interactive game
• We developed specifications for all five
    • Little development time (days, not weeks)
    • Most of time spent figuring out Freeciv and CTAS
• Each benchmark has
    • Workload
    • Bug or fault insertion methodology
• Ran benchmarks with and without repair

AbiWord

• Open-source word processing program
• Approximately 360,000 lines of C++ code
• AbiWord represents documents using a piece table
• Consistency properties:
    • Piece table has a section fragment
    • Piece table has a paragraph fragment
    • Doubly-linked list of fragments is well formed

[Figure: AbiWord screenshot]

Results

• Workload – import (valid) Microsoft Word document that crashes AbiWord
• Bug that creates inconsistent documents with a text fragment before the section fragment
• Without repair
    • AbiWord crashes when loading the document
• With repair
    • AbiWord is able to open and successfully process the document

Parallel x86 emulator

• Parallel x86 emulator for the RAW machine
    • Multi-tile architecture
    • Emulator runs x86 binaries on RAW
• Contains L2 cache of translated x86 assembly instructions
• Maintains a constant L2 cache size
• Consistency property:
    • Computed size of the L2 cache is consistent with its actual size

Results

• Workload – gzip benchmark on x86 emulator
• Bug that (sometimes) adds the size of a cache item twice when it is inserted
• Without repair
    • Actual cache size goes to zero
    • x86 emulator crashes
• With repair
    • Actual cache size is the same as computed size
    • Program runs correctly

CTAS

• Set of air-traffic control tools
    • Traffic management
    • Arrival planning
    • Flow visualization
    • Shortcut planning
• Deployed in centers around the country (Dallas/Ft. Worth, Los Angeles, Denver, Miami, Minneapolis/St. Paul, Atlanta, Oakland)
• Approximately 1 million lines of C/C++ code

[Figure: CTAS screenshot]

Results

• Workload – recorded radar feed from DFW
• Fault insertion
    • Simulate error in flight plan processing
    • Bad airport index in flight plan data structure
• Without repair
    • System crashes – segmentation fault
• With repair
    • Aircraft has different origin or destination
    • System continues to execute
    • Anomaly eventually flushed from system

Aspects of CTAS

• Lots of independent subcomputations
    • System processes hundreds of aircraft – problem with one should not affect others
    • Multipurpose system (visualization, arrival planning, shortcuts, …) – problem in one purpose should not affect others
• Sliding time window: anomalies eventually flushed
• Rebooting ineffective – system will crash again as soon as it sees the problematic flight plan

Simplified Linux File System

[Figure: disk blocks labeled superblock, group block, inode bitmap block, block bitmap block, inode block (inode, inode, …), and directory block; example contents shown: intro, 110, 0, 1011]

Some Consistency Properties
• inode bitmap consistent with inode usage
• block bitmap consistent with block usage
• directory entries refer to valid inodes
• files contain valid blocks only
• files do not share blocks

Results

• Workload – write and verify several files
• Simulated power failure
    • Inode and block bitmap errors
    • Partially initialized directory and inode entries
• Without repair
    • Incorrect file contents because of inode and disk block sharing
• With repair
    • Bitmaps repaired preventing illegal sharing, correct file contents

Freeciv

[Figure: Terrain Grid (rows shown: PO MM, OO MP, PO MM, PP MP) with references to City Structures. Legend: O = Ocean, P = Plain, M = Mountain]

Consistency Properties
• Tiles have valid terrain values
• Cities are not in the ocean
• Each city has exactly one reference from the grid

[Figure: Freeciv screenshot]

Results

• Workload – Freeciv software plays against itself
• Fault insertion – randomly corrupt terrain values
• Without repair – program crashes (seg fault)
• With repair
    • Game runs just fine
    • But game plays out differently because of the different terrain values

Benefits of Eliminating External Consistency Constraints

• Simplifies AbiWord specification
    • Without goal-directed reasoning, need additional model constraints
• Shortens specifications
    • Linux file system and Freeciv specifications are ~14% shorter
• Removes possibility of errors in external consistency constraints
• Removes possibility of repaired model with no corresponding data structure

Related Work

• Hand-coded repair
    • Lucent 5ESS switch
    • IBM MVS operating system
• Integrity Maintenance in Databases
    • Deriving Production Rules for Constraint Maintenance (Ceri, Widom)
    • Automatic Generation of Production Rules for Integrity Maintenance (Ceri et al.)
    • Constraint analysis: A design process for specifying operations on objects (Urban et al.)
    • Consistency management with repair actions (Nentwich et al.)

Related Work

• Constraint mechanisms in programming languages
    • Kaleidoscope (Lopez)
    • Alphonse (Hoover)
• Self-stabilizing algorithms (Dijkstra)
• Log-based recovery for database systems
• Recovery-oriented computing
    • Microrecovery & Microreboot (Candea, Fox)
    • Undo framework (Brown, Patterson)
• Specification Languages
    • Alloy (Jackson)
    • UML

Conclusion

• Data structure repair is an exciting way to (potentially) improve reliability

• Specification-based approach promises to make technique more widely applicable

• Moving towards more robust, probabilistic, continuous concept of system behavior

Implementation

• Size of system: 26,200 lines
• Compiler
    • 20,400 lines of Java code
    • 2,500 lines of parser definitions
• Runtime - 3,200 lines of C code

Time to Check Consistency & Perform Repairs

Application    Time to Check Consistency (ms)    Time to Check and Repair (ms)
AbiWord        0.06                              0.55
CTAS           0.07                              0.15
Freeciv        3.62                              15.66
File system    4.22                              263.14

Lines of Code

Application     Lines of Code
AbiWord         360,000
x86 emulator    65,000
CTAS            >1 million
Freeciv         73,000
File system     700

ICSE 2005 (this paper)

• Use goal-directed reasoning to eliminate external consistency constraints
• Benefits:
    • Eliminate need for model constraints to ensure the repaired model corresponds to a data structure
    • Eliminate the possibility of errors in the external consistency constraints
    • Eliminate developer overhead of writing external consistency constraints