CarnegieMellon Scherlis
Four-A — Component Adaptation and Assurance
Bill Scherlis
Institute for Software Research
School of Computer Science, CMU
412-268-8741
With: John Tang Boyland (UWM), Aaron Greenhouse, Edwin Chan
DARPA ITSPI Meeting, 22 Feb 00, Aspen, CO
This Presentation
• Technical objectives
  – Code-level assurance
    • In development and adaptation
  – Application to specific assurance properties
    • Code safety and threading.
    • Frameworks.
• Existing practice
  – Adaptation: JDK evolution
  – Security: CERT data
• Premises and scope
• Technical approach
  – Semantics-based manipulation
    • Structural. Threads. Etc.
  – Annotation and analysis
    • Uniqueness. Effects. Etc.
  – Tool-based studies
    • Java source-level manipulation
• Accomplishments & plans
  – Schedule
  – Expected accomplishments
• Transition
  – Tool
  – Infrastructure
Four-A Technical Objectives
1. Improve source-level software assurance
  – Systematically improve code safety, tolerance, etc., using source-level analysis, annotation, transformation.
  – Improve the extent of formal assurance using analyses, annotation, transformation.
  – Provide scalable and composable approaches for a variety of code-safety properties, based on annotations.
2. Provide ongoing assurance thru evolution
  – Avoid re-verification of code safety, tolerance, and other properties as software components and systems evolve.
  – Support programmer through adaptation by formally analyzing and carrying out changes, preserving and enhancing assurance where possible.
A Simple Motivating Example: Thread Safety
Annotation. Manipulation. Analysis.
1. Thread safety and security
  • CERT vulnerability data
  • Exploitation scenario: incremental thread capture
2. Locks and code evolution
3. Technical elements
Two Documented Vulnerabilities (CERT)
Name: ibm/mknod
• Keywords: IBM, AIX, setuid, root access, race condition
• Description: Some (if not all) versions of AIX have a setuid /usr/sbin/mknod so that ordinary users may create named pipes. This is done with a mknod(2) system call followed by a chown(2) system call, which opens a race condition if the user renames the named pipe and links it to another file before the chown(2) call. So ordinary users may “steal” other users' files, and thereby gain unauthorized root access.
• Impact: local user gains root access
Name: noclobber timing window
• Keywords: noclobber; timing window; race condition
• Description: There is a race condition with respect to the shell variable noclobber in some implementations of csh/tcsh. Noclobber is supposed to prevent files from being overwritten if they exist. If the file doesn’t exist, some implementations of csh determine that fact with a stat() call. If stat returns ENOENT then the shell proceeds to write on the file. However, the file could be created between the stat and write calls, thus defeating the purpose of the noclobber variable.
• Impact: files are overwritten.
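The noclobber entry is a check-then-act race: the existence test and the write are separate steps. The following is a minimal Java sketch of the same pattern, not code from csh/tcsh or from the CERT entry. The class and method names are hypothetical. It contrasts the racy form with an atomic create that uses the standard `StandardOpenOption.CREATE_NEW` option, which fails if the file already exists.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a timing window and one way to close it.
final class NoClobberSketch {

    // Racy: another process can create the file between exists() and write().
    static boolean racyWrite(Path p, String text) throws IOException {
        if (Files.exists(p)) {
            return false;                 // refuse to overwrite
        }
        // Timing window: p may be created here by someone else.
        Files.write(p, text.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    // Atomic: CREATE_NEW performs the existence check and the creation as one step.
    static boolean atomicWrite(Path p, String text) throws IOException {
        try {
            Files.write(p, text.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;                 // existing file is left untouched
        }
    }
}
```

With `atomicWrite`, a second write to the same path is refused and the first content is preserved, regardless of how other processes are scheduled.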
An Aside: The CERT Vulnerability Taxonomy
(~ 1200 vulnerabilities)
• Assumptions wrong or changed
• Design errors
• Errors in requirements specifications
• Implementation errors
  – Basic programming practices
  – Improper use of a well understood algorithm
  – Privileged programs
  – Timing windows
  – Trusts something not designed to support trust
  – Trusts untrustworthy information
• Other problems
• User interface
Evolving MultiThreaded Code
Work in progress (Aaron Greenhouse)
Why
  – Improve code safety and robustness
  – Improve performance and flexibility
How
  – Annotations
    • Locks associated with regions (encapsulated sets of fields)
    • Assignment of locks to (final) fields or instance variables
    • Lock ordering
  – Manipulations
    • Shrink lock
    • Split/merge locks
  – Analyses
    • (multiple)
  – Tool support
EventQueue
• EventQueue
  – Sends an event to listeners on dequeue.
  – Priority levels.
• Initial code state
  – Free of race conditions
    • All methods are declared to be synchronized.
    • [NB. Deadlocks are still possible.]
• Evolution goal
  – Performance
    • Synchronization is too coarse.
    • Remove unneeded synchronization.
    • Introduce multiple locks.
  – Appropriate simultaneous access
Code fragments below illustrate the systematic refinement process.
[ Work in progress by Aaron Greenhouse ]
class EventQueue
{
  public region Listeners;
  public region Normal;
  public region Priority;

  private final unshared List listeners in Listeners { Instance in Instance };
  private final unshared List normal in Normal { Instance in Instance };
  private final unshared List high in Priority { Instance in Instance };
  private int numNormal in Normal;
  private int numHigh in Priority;

  lock this protects Instance;

  public EventQueue()
    reads nothing writes nothing
  { /* ... */ }

  public synchronized void addEQListener( final EQListener l )
    reads nothing writes Listeners
  {
    listeners.add( l );
  }

  private synchronized void fireEQEvent( final Object o )
    reads nothing writes All
  {
    final EQEvent evt = new EQEvent( this, o );
    final List copy = (List)((ArrayList)listeners).clone();
    for( int i = 0; i < copy.size(); i++ ) {
      final EQListener l = (EQListener)copy.get( i );
      l.dequeued( evt );
    }
  }

  public synchronized int getSize()
    reads Normal, Priority writes nothing
  {
    return numNormal + numHigh;
  }

  private synchronized void dispatchEvent()
    reads nothing writes All
  {
    final Object o = dequeue();
    fireEQEvent( o );
  }

  . . .
} // End of class
Shrink synchronized Blocks
Step 1: Shrink synchronized blocks.
  – Convert synchronized methods to methods with synchronized bodies (trivial).
  – Use effects analysis to exclude statements that do not affect the region associated with the lock.
    • Method signatures are not changed.
      – Call sites are not affected.
      – Other implementations of the method are not affected.
class EventQueue {
  //...

  private void fireEQEvent( final Object o )
    reads nothing writes All
  {
    //...
    List copy;
    synchronized( this ) {
      copy = (List)((ArrayList)listeners).clone();
    }
    //...
  }

  //...
  private Object dequeue()
    reads nothing writes Normal, Priority
  {
    Object o = null;
    while( o == null ) {
      if( (o = tryGetPriority()) == null ) {
        o = tryGetNormal();
      }
    }
    return o;
  }
Split the lock
Step 2: Split the lock used by EventQueue.
  – In general, replace a lock L on a region R with locks Li on subregions Ri.
  – Replace uses of L with uses of the appropriate Li.
    • Use effects analysis to determine the affected Ri.
    • May need to use multiple locks.
  – Avoid deadlock by enforcing lock ordering.
  – Changes how fields must be accessed
    • Affects: ancestor and descendant classes.
  – Why do this:
    • Improve concurrency
    • E.g., agenda queue: potential simultaneous actions
      – “Edit” separate queue elements (tasks)
      – Reorder spine
class EventQueue
{
  public region Listeners;
  public region Normal;
  public region Priority;

  private final unshared List listeners in Listeners { Instance in Instance };
  private final unshared List normal in Normal { Instance in Instance };
  private final unshared List high in Priority { Instance in Instance };
  private int numNormal in Normal;
  private int numHigh in Priority;

  lock listeners protects Listeners;
  lock normal protects Normal;
  lock high protects Priority;
  sync high before normal;

  public EventQueue()
    reads nothing writes nothing
  { /* ... */ }

  public void addEQListener( final EQListener l )
    reads nothing writes Listeners
  {
    synchronized( listeners ) {
      listeners.add( l );
    }
  }

  private void fireEQEvent( final Object o )
    reads nothing writes All
  {
    final EQEvent evt = new EQEvent( this, o );
    List copy;
    synchronized( listeners ) {
      copy = (List)((ArrayList)listeners).clone();
    }
    for( int i = 0; i < copy.size(); i++ ) {
      final EQListener l = (EQListener)copy.get( i );
      l.dequeued( evt );
    }
  }

  public int getSize()
    reads Normal, Priority writes nothing
  {
    synchronized( high ) {
      synchronized( normal ) {
        return numNormal + numHigh;
      }
    }
  }

  private Object tryGetPriority()
    reads nothing writes Priority
  {
    Object o = null;
    synchronized( high ) {
      if( numHigh > 0 ) {
        o = high.remove( 0 );
        numHigh -= 1;
      }
    }
    return o;
  }
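The fragments above use region, effects, and lock annotations that a standard Java compiler does not accept. As a rough plain-Java sketch of the same split-lock state, the version below keeps those annotations only as comments. The class name, the listener interface, and the `enqueue` method are assumptions added to make it executable. It also uses `size()` in place of the `numNormal`/`numHigh` counters and returns null on an empty queue instead of looping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type; the slides use EQListener and EQEvent instead.
interface QueueListener {
    void dequeued(Object payload);
}

final class SplitLockQueue {
    private final List<QueueListener> listeners = new ArrayList<QueueListener>(); // region Listeners
    private final List<Object> normal = new ArrayList<Object>();                  // region Normal
    private final List<Object> high = new ArrayList<Object>();                    // region Priority
    // Lock order: high before normal. Never acquire high while holding normal.

    void addListener(QueueListener l) {
        synchronized (listeners) { listeners.add(l); }
    }

    void enqueue(Object o, boolean priority) {
        if (priority) {
            synchronized (high) { high.add(o); }
        } else {
            synchronized (normal) { normal.add(o); }
        }
    }

    int getSize() {
        synchronized (high) {               // both regions are read, so both locks
            synchronized (normal) {         // are taken, in the declared order
                return normal.size() + high.size();
            }
        }
    }

    // Returns the next item (priority first), or null if both queues are empty.
    Object tryDequeue() {
        Object o = null;
        synchronized (high) {
            if (!high.isEmpty()) o = high.remove(0);
        }
        if (o == null) {
            synchronized (normal) {
                if (!normal.isEmpty()) o = normal.remove(0);
            }
        }
        if (o != null) fire(o);
        return o;
    }

    private void fire(Object o) {
        List<QueueListener> copy;
        synchronized (listeners) { copy = new ArrayList<QueueListener>(listeners); }
        for (QueueListener l : copy) l.dequeued(o);   // callbacks run with no lock held
    }
}
```

Because each list serves as the lock for its own region, a thread adding a listener no longer blocks a thread that is only touching the priority queue.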
Case study summary
• The code improvements are routine, but risky
  – Motivated for good reasons …
  – Each entails many small changes …
  – Any change, improperly executed, can create new vulnerabilities
• Much can be done with annotation and manipulation
  – Enabling ongoing assurance with tool support
• For threading
  – Manipulations: Shrink lock, Split/Merge locks, etc.
  – Annotations:
    • Locks and regions, Lock order, Lock variables, Effects, etc.
  – Analyses: Effects, etc.
• Issue: What portion of this activity is “tool feasible”?
  – Interactive tool (manipulation, analysis, annotation)
  – Programmer guidance
Four-A Hypotheses
• In evolving Java systems, semantics-based annotation and analysis techniques can provide a component-based approach to the assurance of a useful range of safety and tolerance properties.
  – Many code-safety properties can be composable on a basis of added specifications for “mechanical” properties
    • Thread-safety and race conditions
    • Array bounds, exceptions, extended type safety, null references, etc.
  – Annotations and analysis provide a mechanism
    • Effects. Unique references. Uses limitations.
    • Regions for effects, locks.
    • Cf. Extended Static Checking (ESC)
• The safety risks of complex restructuring tasks can be reduced through the use of systematic manipulations
  – Administrative structural changes
    • Boundary movement. Hierarchy restructuring.
    • Representation change.
  – Performance improvements
    • Lock shrink/split. Inlining.
  – Robustness improvements
    • Method harmonization
Four-A Hypotheses
• Manipulations can improve software with respect to safety, tolerance, and robustness properties
  – Examples
    • Introduce redundancies
    • Insert/remove audits, checks, logging
    • Insert techniques for graceful degradation
• The annotation, manipulation, and analysis techniques can be supported in Java-based tools
  – 99% Java
  – Basis for experimentation and evaluation
  – Usable and adoptable
• These techniques can be combined to better support the iterative development of intrusion tolerant systems
[ Preliminary JDK Census results ]
Four-A Premises
• Work from code level thru design toward spec
  – Why: Code as ground truth. Snapshot problem.
  – Why: Legacy code. Exploit and improve partial specs.
  – Why: Manage detail design.
• Use partial information about components in a system
  – Why: Trade secret (COTS). Security. Distributed development.
  – Cf. whole-program analysis
• Rely on encapsulation, type safety, composable props
  – Java, (modified) beans, etc.
  – Why: Scalability. Partial information. Manipulation soundness.
• Focus on administrative change in routine SWE
  – Why: Appropriate roles for programmers and tools. Adoptability.
  – Why: Tune for performance, security, robustness
Four-A Technologies
(Adaptation, Analysis, Annotation, Accounting)
• Semantics-based program manipulation
  – Source-code and design level
  – Structural manipulations
  – Run-time manipulations
  – Meta-manipulations
• Analysis and models
  – OO effects, mutability, uniqueness, aliasing, uses, . . .
• Annotation and specification
  – Mechanical properties
• Tools for assured adaptation of Java components
  – Information loss and chain of evidence
  – Use of audit data
Systematic Software Adaptation
Routine software structural evolution
  – Examples:
    • API change
    • Data representation change
    • Class hierarchy restructuring
    • Signature change
    • Introduce self-adaptation
    • Mobility
    • Encapsulation
    • Split into phases / stages
    • Cloning to produce specialized variants
    • Merging of related functions
    • Replication for robustness
    • Threading changes
Provide tool support for these operations
  – With predictable impact on functional and mechanical program properties
Assured Software Change
Structural change in practice
• Costly
  – Changes can be distributed throughout a system.
  – Complex analysis (program understanding) is required.
• Risky
  – Invariants and specifications are not present.
  – Many code elements may need to be changed.
  – Code elements may be inaccessible for analysis or change.
• Avoided
  – Why are we stuck with bad structural design decisions?
    • Decisions are made early
    • Consequences are understood late
    • They often start wrong and stay wrong
  – Why do we tolerate brittleness?
    • Code rot = persistence of abstractions beyond their time.
  – Why do commercial APIs accrete?
  – Why does ad hoc code persist?
  – Why is it so costly to navigate structural trade-offs?
    • Revise interface and component structure
    • Trade off generality and performance
Assured Software Change
Structural change in practice
• Costly
• Risky
• Avoided
• Necessary
  – Structural change enables functional change
    • Localize/encapsulate related software elements
    • Sustain compatibility with evolving APIs
    • Address performance issues
  – Structural change enables code management
    • Code rot = persistence of abstractions beyond their time.
    • Create views to support programming aspects
    • Cf. AOP. SOP. N-Dim.
  – Navigate structural trade-offs during design/evolution
    • Support iterative software processes
Example: Move Field
Move field f from class C to class A.
Checks
  – C is a descendant of A.
  – If A is an interface, f must be public static final.
  – Shadowing: A and B have no use of ancestral f.
  – Unshadowing: No f field in B (capture C’s f uses).
  – D (and other sibs) have no uses of f.
  – Initializer code can be reordered, by field type.
  – Reordering is acceptable for interleaved constructor and field code.
Actions
  – Adjust access tags
  – Handle special cases
Caveats
  – Visibility in D and other sibs
  – Visibility in C’s subs
  – Promises introduced
  – Changes in binary compatibility
[Figure: class hierarchy with classes A, B, C, D and members foo, bar, f]
Programmers can do this using drag-and-drop.
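The “unshadowing” check can be illustrated with ordinary Java field hiding. The sketch below assumes a hierarchy in which C extends B and B extends A. That reading of the slide's diagram is an assumption, and the wrapper class and method names are hypothetical. If B already declares a field f, a naive move of f from C to A silently rebinds C's uses of f to B's field.

```java
// Hypothetical before/after pair; the A <- B <- C hierarchy is assumed.
final class MoveFieldSketch {

    // Before the move: C declares f, so C's uses of f bind to C.f.
    static final class Before {
        static class A { }
        static class B extends A { int f = 1; }
        static class C extends B {
            int f = 2;
            int readF() { return f; }     // binds to C.f
        }
    }

    // After a naive move of f from C up to A, with B.f left in place:
    // B.f hides A.f, so C's unqualified f is captured by B.f.
    static final class After {
        static class A { int f = 2; }
        static class B extends A { int f = 1; }
        static class C extends B {
            int readF() { return f; }     // binds to B.f, not the moved field
        }
    }

    public static void main(String[] args) {
        System.out.println(new Before.C().readF()); // prints 2
        System.out.println(new After.C().readF());  // prints 1
    }
}
```

The program still compiles after the move, which is why the check has to be made by analysis instead of being left to the compiler.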
Example: Rename Method
Rename methods m from oldName to newName.
Checks
  – Methods called at a callsite for oldName() or newName() are unchanged
  – Bindings
    • Callsites used to dispatch to unchanged methods in override group
  – Name conflict
    • Callsites now dispatch to methods in a previously existing override group
  – Uses checks and annotations to assure binary compatibility
Actions
  – Rename methods
  – Rename proved callsites
  – Name checks/maps for dynamic sites/classes
Caveats
  – Deletion from override and overload groups for A.oldName()
  – Addition to override and overload groups for newName()
  – Promises introduced
  – Changes in binary compatibility (modulo uses annotations)
Programmers could do this with a simple gesture.
[Figure: class hierarchy with classes A, B, C, D; methods m labeled oldName() are renamed to newName()]
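The name-conflict check guards against a rename that joins an existing override group. In the hypothetical sketch below (class and method names are illustrative only), A already has a newName() method. Renaming C.oldName() to newName() makes it an override, so an untouched callsite on an A reference starts dispatching to C's body.

```java
// Hypothetical before/after pair showing a rename that changes dispatch.
final class RenameSketch {

    static final class Before {
        static class A { String newName() { return "A.newName"; } }
        static class C extends A { String oldName() { return "C.body"; } }
    }

    // After renaming C.oldName() to newName() without the name-conflict check:
    // C.newName() now overrides A.newName().
    static final class After {
        static class A { String newName() { return "A.newName"; } }
        static class C extends A {
            @Override String newName() { return "C.body"; }
        }
    }

    // The same callsite, textually unchanged by the rename.
    static String callBefore() { Before.A a = new Before.C(); return a.newName(); }
    static String callAfter()  { After.A a = new After.C();   return a.newName(); }

    public static void main(String[] args) {
        System.out.println(callBefore()); // prints A.newName
        System.out.println(callAfter());  // prints C.body
    }
}
```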
Manipulations
• Manipulations enable systematic structural change
  – Trade off generality and performance
  – Sacrifice (or introduce) abstractions
  – Reorganize component boundaries
  – Introduce or adjust run-time (later stage) manipulations
    • Managed self-adaptivity
• Manipulations are idiomatic program evolution steps
  – Precise expression of “patterns of evolution” or “refactorings”
  – Enable rapid/dynamic structural change (fluid programming)
  – Enable model-based programming (analytic views)
• Tool role
  – Programmer: Design intent, exploration of structural options
  – Tool: Mechanical details, soundness, design record
Manipulation Techniques (Examples, 1)
• Boundary movement (ISAW’98)
  – Code relocation (expression, statement, method, class)
  – Abstract/unfold (method, variable, class)
  – Clone (class, method, etc.)
• Frequency change
  – Pass separation
  – Tabulation/closure
• Data representation change (ESOP’98)
  – Shift
  – Idempotency, Projection
  – Destructive operations
• Hierarchy restructuring
  – Hoist
  – Insert
  – Split/clone
Manipulation Techniques (Examples, 2)
• Staging, specialization, splitting
  – (Partial evaluation)
  – Merging and generalization
  – Pass separation
• Thread management
  – Shrink, Split, Merge
  – Insert, Remove
• Self-adaptation
  – Meta-manipulation
  – Polyvariance and domain-tolerance
• Integrity
  – Replication
  – Redundant checks
Specifications for mechanical properties
• Manipulations require analyses
  – Example
    • Manipulation: Reorder code
    • Analyses: Effects, aliasing (may-equal and uniqueness), uses.
• At scale:
  – Development is distributed/collaborative.
  – Functional specifications (and source code) may be lacking.
  – Programs are dynamically linked, mobile, etc.
• Analyses for manipulation
  – Composable: Whole-program analyses are infeasible
  – Goal-directed: Compiler analyses are “opportunistic”
• Analyses require mechanical assertions
  – Annotations (promises) about components and their elements
Properties specified by assertions
• Mechanical properties specified (examples)
  – Read/write effects in OO systems
    • Enable reordering
    • Use aliasing and uniqueness information (ECOOP’99)
    • Region designation
  – Unique references
    • Tolerate temporary loss of uniqueness (borrowed)
  – Structure declarations
    • Precise control over uses
  – Mutability
• Promises as a currency of flexibility (ICSE’98)
  – Promises change less frequently than code
  – Tools identify potential promises
    • Programmer chooses which to offer clients
  – Programmer can request specific promises
  – Tool manages dependency and validation information
Effects Analysis for Manipulation
Manipulation example
Goal: Move statement C, so that A; B; C; becomes C; A; B;
1. Compute sets of effects for each of A; B; C;
2. Test for interference among computations: for the pairs A; C; and B; C;
Analyses
1. What are the effects for a given computation?
2. Do two (or more) given targets overlap?
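The two steps can be sketched in Java under a strong simplification: regions are modeled as flat names, whereas the analysis described in these slides also uses region nesting and aliasing information. The `EffectSet` class and its methods are hypothetical and are not part of the Four-A tool.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified effects model: each computation has a set of
// regions it reads and a set of regions it writes.
final class EffectSet {
    private final Set<String> readSet;
    private final Set<String> writeSet;

    private EffectSet(Set<String> readSet, Set<String> writeSet) {
        this.readSet = readSet;
        this.writeSet = writeSet;
    }

    // Start from the regions read; pass no arguments for "reads nothing".
    static EffectSet reads(String... regions) {
        return new EffectSet(new HashSet<String>(Arrays.asList(regions)),
                             new HashSet<String>());
    }

    // Return a copy that additionally writes the given regions.
    EffectSet writes(String... regions) {
        Set<String> w = new HashSet<String>(writeSet);
        w.addAll(Arrays.asList(regions));
        return new EffectSet(readSet, w);
    }

    // Two computations interfere if one writes a region the other reads or writes.
    boolean interferesWith(EffectSet other) {
        return overlaps(writeSet, other.readSet)
            || overlaps(writeSet, other.writeSet)
            || overlaps(readSet, other.writeSet);
    }

    private static boolean overlaps(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    // The moved statement may cross the others only if it interferes with none of them.
    static boolean canMoveAcross(EffectSet moved, EffectSet... crossed) {
        for (EffectSet e : crossed) {
            if (moved.interferesWith(e)) return false;
        }
        return true;
    }
}
```

For instance, if A writes Listeners, B reads Normal, and C writes only Priority, then `canMoveAcross(c, a, b)` holds and C can be moved ahead of A and B. If C instead read Listeners, the move would be rejected.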
Key Ideas: OO Effects (ECOOP’99)
• Source-level analysis of partial programs
  – Do not want, and may not have, the whole program
  – Use annotations on methods as surrogates for components
• Use of regions and aliases to analyze OO programs
  – Encapsulate state of objects in regions to protect programmer abstractions
  – Use aliasing information (may-equal and unique) to improve results
• Programmer-guided source-level manipulation
  – Goal-directed analysis (vs. compile-time opportunistic analysis)
Code safety: Why Unique Variables?
• Sole access to an object entails certain privileges:
  – Mutations can be performed without regard to rest of program (no other read access)
  – Invariants can be maintained without regard to rest of program (no other write access)
• Program invariants are ideally
  – Explicit (code readability)
  – Checked (code maintainability)
Uniqueness examples
• String buffer character array
  – If unique:
    • Can be coerced to immutable when final string is desired.
• Vector internal array
  – If unique:
    • Mutations of separate vectors can be reordered.
• Hashtable internal array
  – If unique:
    • One can enforce hashing invariants,
    • And can rehash without interference.
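The string-buffer case can be sketched in plain Java, with the caveat that standard Java cannot check uniqueness statically; here it holds only by convention. The class names are hypothetical. The buffer never leaks its array while mutable, so on `freeze()` it can hand the array to an immutable wrapper without copying and then drop its own reference.

```java
import java.util.Arrays;

// Immutable view over a character array. It is safe only if no one else
// holds a reference to the array, which is the uniqueness assumption.
final class FrozenText {
    private final char[] chars;
    private final int length;

    FrozenText(char[] chars, int length) {
        this.chars = chars;
        this.length = length;
    }

    int length() { return length; }

    char charAt(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
        return chars[i];
    }

    @Override public String toString() { return new String(chars, 0, length); }
}

// Hypothetical buffer whose internal array is intended to be unique.
final class UniqueCharBuffer {
    private char[] chars = new char[4];   // never exposed while the buffer is mutable
    private int length;

    UniqueCharBuffer append(char c) {
        if (chars == null) throw new IllegalStateException("buffer already frozen");
        if (length == chars.length) chars = Arrays.copyOf(chars, chars.length * 2);
        chars[length++] = c;
        return this;
    }

    // Transfers the array without copying; the buffer gives up its reference,
    // so the frozen text cannot be mutated through this buffer afterwards.
    FrozenText freeze() {
        if (chars == null) throw new IllegalStateException("buffer already frozen");
        FrozenText result = new FrozenText(chars, length);
        chars = null;
        return result;
    }
}
```

A uniqueness annotation checked by a tool would let the same transfer be assured without the runtime null checks used here.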
Information Management
The Internal Representation (IR)
• Features
  – Global name spaces
    • Entities (fluid.ir.IRNode)
    • Attributes (fluid.ir.SlotInfo)
    • Types
  – Versioning
    • Several policies
    • Possible at cell level
    • Configurations
  – Dependencies
    • Notification
    • Tracking
  – Conventional wrappers
    • Attribute patterns: navigable ordered trees, etc.
  – Collaboration support
    • Persistence
    • Fine-grained concurrency policy
    • Surrogacy
[Figure labels: Entity namespace, Attribute namespace, Cell]
The version forest
[Figure: version tree with labels: Initial version, Latest release, (abandoned), Latest snapshot, Experimental Configuration, A growing tip in the tree, Shared manipulations]
Each transition represents a manipulation.
E.g., 400,000 nodes, 10,000 versions
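One way to picture cell-level versioning over a version tree is the Java sketch below. It is an illustration only and does not reflect the actual fluid.ir implementation; the `Version` and `VersionedCell` names and their methods are hypothetical. Each version records its parent and the manipulation that produced it, and a cell's value at a version is the value assigned at the nearest ancestor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node in a version tree; each child is produced by one manipulation.
final class Version {
    final Version parent;
    final String manipulation;

    private Version(Version parent, String manipulation) {
        this.parent = parent;
        this.manipulation = manipulation;
    }

    static Version initial() { return new Version(null, "initial"); }

    // Branch a new version from this one.
    Version apply(String manipulation) { return new Version(this, manipulation); }
}

// Hypothetical versioned cell: stores only the versions where the value changed.
final class VersionedCell<T> {
    private final Map<Version, T> assigned = new HashMap<Version, T>();

    void set(Version v, T value) { assigned.put(v, value); }

    // Walk toward the root until a version with an assigned value is found.
    T get(Version v) {
        for (Version cur = v; cur != null; cur = cur.parent) {
            if (assigned.containsKey(cur)) return assigned.get(cur);
        }
        return null;   // never assigned on this branch
    }
}
```

Sibling branches share everything assigned at their common ancestors, so an abandoned experimental branch costs only the values it changed.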
[ demo ]
Four-A Schedule
• Year 1
  – Tool infrastructure
    • 99% Java, analysis, annotation, adaptation, accounting
  – Analysis algorithms (uniqueness, effects, mayEqual, etc.)
  – Demonstrate preservation of assurance properties thru change
  – Manipulations for threading
  – Case studies for thread safety and pattern
• Year 2
  – Class-level structural manipulations
  – Management of uses information
  – Exploitation of aliasing annotations to assure code safety props
  – Threading annotations and analyses
  – Design record to support assurance information
• Year 3
  – Manipulation library for improvement of code safety
    • Prevent. Detect. Tolerate.
  – Large-scale manipulation through analytic views
  – Tool-based case study based on intrusion scenarios
Recent Accomplishments
• Four-A tool prototype
  – Supports non-local manipulations
• Annotations and analyses
  – Unique. MayEqual.
• Support for evolving multi-threaded code safely
  – Manipulations (preliminary form)
  – Annotations
• Software engineering baseline
  – Evolution census: JDK changes: source code, logs
Transition
• Build on mainstream commercial technologies
  – Java, beans, etc.
• Build on existing infrastructure
  – Tool (developed by our team) for Java analysis, manipulation, engineering process, design information management.
  – Platform (UI, IM, VM, syntax) is also usable for other languages.
• Usability/adoptability a priority from the outset
  – Enable experimentation/studies without high adoption cost
  – E.g., gesture-based interface where possible
• Conduct engineering baseline analyses
  – What are the code-level vulnerabilities being exploited?
  – What kinds of changes are routinely made in commercial APIs?
  – What is the impact of those changes on code safety?