CIL: Intermediate Language and Tools for Analysis and Transformation of C programs George C.Necula Scott McPeak S.P.Rahul Westley Weimer University of California, Berkeley Proc. of Conference on Compiler Construction, 2002

CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer

University of California, Berkeley

Proc. of Conference on Compiler Construction, 2002

INDEX

Author

Questions

Overview

Introduction

Evaluation

1st AUTHOR

George C. Necula

Scott McPeak

S. P. Rahul

Westley Weimer

George C. Necula

George C. Necula, Philip Wadler: Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008 ACM 2008

Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)

François Pottier, George C. Necula: Proceedings of TLDI'07: 2007 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation, Nice, France, January 16, 2007 ACM 2007

Jeremy Condit, Matthew Harren, Zachary R. Anderson, David Gay, George C. Necula: Dependent Types for Low-Level Programming. ESOP 2007: 520-535

Bor-Yuh Evan Chang, Xavier Rival, George C. Necula: Shape Analysis with Structural Invariant Checkers. SAS 2007: 384-401

Ajay Chander, David Espinosa, Nayeem Islam, Peter Lee, George C. Necula: Enforcing resource bounds via static verification of dynamic checks. ACM Trans. Program. Lang. Syst. 29(5): (2007)

George C. Necula (continued)

Jens Knoop, George C. Necula, Wolf Zimmermann: Preface. Electr. Notes Theor. Comput. Sci. 176(3): 1-2 (2007)

Sumit Gulwani, George C. Necula: A polynomial-time algorithm for global value numbering. Sci. Comput. Program. 64(1): 97-114 (2007)

George C. Necula: Using Dependent Types to Port Type Systems to Low-Level Languages. CC 2006: 1

Feng Zhou, Jeremy Condit, Zachary R. Anderson, Ilya Bagrak, Robert Ennals, Matthew Harren, George C. Necula, Eric A. Brewer: SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. OSDI 2006: 45-60

Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, George C. Necula: XFI: Software Guards for System Address Spaces. OSDI 2006: 75-88

Bor-Yuh Evan Chang, Matthew Harren, George C. Necula: Analysis of Low-Level Code Using Cooperating Decompilers. SAS 2006: 318-335

Bor-Yuh Evan Chang, Adam J. Chlipala, George C. Necula: A Framework for Certified Program Analysis and Its Applications to Mobile-Code Safety. VMCAI 2006: 174-189

Scott McPeak

Scott McPeak, George C. Necula: Data Structure Specifications via Local Equality Axioms. CAV 2005: 476-490

George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst. 27(3): 477-526 (2005)

Scott McPeak, George C. Necula: Elkhound: A Fast, Practical GLR Parser Generator. CC 2004: 73-88

Jeremy Condit, Matthew Harren, Scott McPeak, George C. Necula, Westley Weimer: CCured in the real world. PLDI 2003: 232-244

George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228

George C. Necula, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy code. POPL 2002: 128-139

Dan Bonachea, Eugene Ingerman, Joshua Levy, Scott McPeak: An Improved Adaptive Multi-Start Approach to Finding Near-Optimal Solutions to the Euclidean TSP. GECCO 2000: 143-150

S. P. Rahul

George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228

George C. Necula, Shree Prakash Rahul: Oracle-based checking of untrusted software. POPL 2001: 142-154

Westley Weimer

Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, Claire Le Goues: A genetic programming approach to automated software repair. GECCO 2009: 947-954

Raymond P. L. Buse, Westley Weimer: The road not taken: Estimating path execution frequency statically. ICSE 2009: 144-154

Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest: Automatically finding patches using genetic programming. ICSE 2009: 364-374

Pieter Hooimeijer, Westley Weimer: A decision procedure for subset constraints over regular languages. PLDI 2009: 188-198

Tamim I. Sookoor, Timothy W. Hnat, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: Macrodebugging: global views of distributed program execution. SenSys 2009: 141-154

Claire Le Goues, Westley Weimer: Specification Mining with Few False Positives. TACAS 2009: 292-306

Westley Weimer (continued)

Nicholas Jalbert, Westley Weimer: Automated duplicate detection for bug tracking systems. DSN 2008: 52-61

Kinga Dobolyi, Westley Weimer: Changing Java's Semantics for Handling Null Pointer Exceptions. ISSRE 2008: 47-56

Raymond P. L. Buse, Westley Weimer: A metric for software readability. ISSTA 2008: 121-130

Raymond P. L. Buse, Westley Weimer: Automatic documentation inference for exceptions. ISSTA 2008: 273-282

Xiang Yin, John C. Knight, Elisabeth A. Nguyen, Westley Weimer: Formal Verification by Reverse Synthesis. SAFECOMP 2008: 305-319

Timothy W. Hnat, Tamim I. Sookoor, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: MacroLab: a vector-based macroprogramming framework for cyber-physical systems. SenSys 2008: 225-238

Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)

2nd Questions

Q1: What does the recursive structure transformation look like in CIL?

Q2: How is the integration of a CFG into the intermediate language implemented?

Q3: How do they achieve the goal of making code immune to stack-smashing attacks?

Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?

Q5: How does the merger deal with .lib and .dll?

Drawbacks in C

Phenomenon: the same syntax can have different meanings depending on context.

What about a low-level representation instead?

It has no ambiguities, but it loses structural information about types, loops, and other high-level constructs.
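
One familiar instance of "same syntax, different meaning", given as an illustrative sketch rather than an example from the slides: the subscript `v[1]` addresses different things depending on whether `v` is declared as an array or as a pointer. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* v is an array: v[1] addresses storage inside v itself. */
int second_of_array(void)
{
    int v[2] = {10, 20};
    return v[1];
}

/* v is a pointer: v[1] first reads the pointer value, then indexes
   the memory it points to. Same syntax, one extra memory read. */
int second_via_pointer(const int *v)
{
    assert(v != NULL);
    return v[1];
}
```

An analysis working on raw syntax has to consult the declaration of `v` to tell the two cases apart.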

3rd OVERVIEW

CIL

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and also higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program.

Feature

The main advantage of CIL is that it compiles all valid C programs into a few core constructs with very clean semantics.

Translating from CIL to C is fairly easy.

Q1: What does the recursive structure transformation look like in CIL?

Q2: How is the integration of a CFG into the intermediate language implemented? (After the transformation, call Cil.computeCFGInfo, which computes the CFG information for all statements, finds the successors and predecessors of each statement, and returns a list of the statements.)
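
Cil.computeCFGInfo itself is an OCaml function. The C sketch below is not CIL's code; it only illustrates the kind of information involved, assuming each statement already knows its successors and the predecessor lists must be derived from them. The `struct stmt` layout and `compute_preds` are hypothetical.

```c
#include <assert.h>

#define MAX_STMTS 8
#define MAX_EDGES 2   /* in this sketch a statement has at most two successors */

struct stmt {
    int nsuccs;
    int succs[MAX_EDGES];
    int npreds;
    int preds[MAX_STMTS * MAX_EDGES];
};

/* Fill in every predecessor list from the successor lists. */
void compute_preds(struct stmt *stmts, int n)
{
    assert(n <= MAX_STMTS);
    for (int i = 0; i < n; i++)
        stmts[i].npreds = 0;
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < stmts[i].nsuccs; k++) {
            struct stmt *target = &stmts[stmts[i].succs[k]];
            target->preds[target->npreds++] = i;
        }
    }
}
```

For an if/else that rejoins, statement 0 branches to 1 and 2, both of which flow into 3, so statement 3 ends up with predecessors 1 and 2.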

4th INTRODUCTION

Basic components

Compilation (C ---> CIL)

A whole-program merger

Representative application

BASIC COMPONENTS

Lvalue

An lvalue is expressed as a pair of a base plus an offset. The base address can be either the starting address for the storage for a variable (local or global) or any pointer expression.
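
A small C sketch of the base-plus-offset reading, using a pointer expression as the base. The struct and function names are hypothetical; the point is only that `p->vals[2]` denotes the same location as "memory at p, then field vals, then index 2".

```c
#include <assert.h>
#include <stddef.h>

struct pair { int tag; int vals[3]; };

/* The lvalue p->vals[2] as the programmer writes it. */
int *addr_direct(struct pair *p)
{
    assert(p != NULL);
    return &p->vals[2];
}

/* The same location spelled out as base plus offset:
   base   = the memory p points to,
   offset = field vals, then index 2. */
int *addr_base_offset(struct pair *p)
{
    assert(p != NULL);
    char *base = (char *)p;
    size_t offset = offsetof(struct pair, vals) + 2 * sizeof(int);
    return (int *)(base + offset);
}
```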

BASIC COMPONENTS

Expression & Instruction

Note:

Casts are inserted explicitly to make the program conform to our type system.
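
To illustrate what explicit casts look like, here is a hand-written before and after pair. It is a sketch of the idea, not actual CIL output, and the function names are hypothetical.

```c
#include <assert.h>

/* As written by the programmer: the conversions are implicit. */
double half_implicit(int n, unsigned char scale)
{
    double r = n / 2;
    return r * scale;
}

/* With every conversion spelled out as a cast. */
double half_explicit(int n, unsigned char scale)
{
    double r = (double)(n / 2);
    return r * (double)scale;
}

/* Both versions compute the same value. */
void check_casts(void)
{
    assert(half_implicit(7, 3) == half_explicit(7, 3));
}
```

With the casts visible, an analysis sees directly that the division is done on integers and only its result is converted to double.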

BASIC COMPONENTS

Statement

BASIC COMPONENTS

Types

CIL moves all type declarations to the beginning of the program and gives them global scope.

All anonymous composite types are given unique names in CIL, and every composite type has its own declaration at the top level.
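
A hand-written sketch of this step. The generated name `anon_struct_1` is made up for illustration; the names CIL actually generates may differ.

```c
#include <assert.h>

/* Before: an anonymous struct type declared inside a function body. */
int sum_before(void)
{
    struct { int x; int y; } pt = {3, 4};
    return pt.x + pt.y;
}

/* After: the type has a unique name and its own top-level declaration. */
struct anon_struct_1 { int x; int y; };

int sum_after(void)
{
    struct anon_struct_1 pt = {3, 4};
    return pt.x + pt.y;
}

/* The rewrite changes where the type lives, not what the code computes. */
void check_types(void)
{
    assert(sum_before() == sum_after());
}
```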

BASIC COMPONENTS

Attributes

It is often useful to have a mechanism for the programmer to communicate additional information to the program analysis.

The type attributes for a base type must be specified immediately following the type.

The type attributes for a pointer type must be specified immediately after the * symbol.

The attributes for a function type or for an array type can be specified using parenthesized declarators.
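
The first two placement rules can be seen with the standard qualifier `const`, used here as a stand-in for a type attribute (an assumption made for this sketch; the original slides presumably showed attribute examples that did not survive in the transcript). The function and array rule is left out because it depends on compiler-specific attribute syntax.

```c
#include <assert.h>

static int cell = 5;

/* Base type: the qualifier follows the type name, so it applies to int. */
int const limit = 8;

/* Pointer type: the qualifier follows '*', so it applies to the pointer
   itself, not to the int it points to. */
int * const fixed = &cell;

int read_through(void)
{
    assert(fixed == &cell);  /* the pointer can never be re-seated */
    *fixed = 6;              /* allowed: the pointee is a plain int */
    return *fixed + limit;
}
```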

COMPILATION

One of the most significant transformations is that expressions that contain side effects are separated into statements.
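
A hand-written before and after pair showing the idea. It is a sketch, not actual CIL output, and the temporary names are made up.

```c
#include <assert.h>

static int calls;
static int next_id(void) { return ++calls; }

/* Before: one expression with two side effects (an increment and a call). */
int before(int *a, int *i)
{
    assert(a && i);
    return a[(*i)++] + next_id();
}

/* After: each side effect is its own statement, and the expression
   that remains is free of side effects. */
int after(int *a, int *i)
{
    assert(a && i);
    int tmp_index = *i;
    *i = *i + 1;
    int tmp_call = next_id();
    return a[tmp_index] + tmp_call;
}
```

Once side effects are statements, an analysis can treat expressions as pure and only needs to track state changes at statement boundaries.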

Type specifiers are interpreted and normalized.
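
As a small illustration of why normalization helps: C allows several spellings of the same type. In the sketch below (the function name is hypothetical) the three declarations of `widen` only compile together because the compiler treats the spellings as one type.

```c
#include <assert.h>

/* Three spellings of the same function type. */
long unsigned int widen(short signed int x);
unsigned long widen(short x);

unsigned long int widen(signed short int x)
{
    assert(x >= 0);
    return (unsigned long)x;
}
```

After normalization an analysis sees a single canonical form and does not need to handle every ordering of specifiers.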

Nested structure tag definitions are pulled apart. This means that all structure tag definitions can be found by a simple scan of the globals.
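
A hand-written sketch of pulling nested tags apart. The tag names are hypothetical and use different suffixes only so that both forms can sit in one file.

```c
#include <assert.h>

/* Before: the tag inner_n is defined inside the definition of outer_n. */
struct outer_n {
    struct inner_n { int a; int b; } in;
    int c;
};

/* After: each tag has its own top-level definition. */
struct inner_f { int a; int b; };
struct outer_f {
    struct inner_f in;
    int c;
};

/* The layouts are the same size; only the shape of the declarations differs. */
void check_layout(void)
{
    assert(sizeof(struct inner_n) == sizeof(struct inner_f));
    assert(sizeof(struct outer_n) == sizeof(struct outer_f));
}
```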

Prototypes are added for those functions that are called before being defined. Furthermore, if a prototype exists but does not specify the types of the parameters, that is fixed as well.
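
A sketch of the result of the first rule, with hypothetical function names. The prototype at the top is the added line; without it, the call in `double_of_next` would refer to a function the compiler has not yet seen.

```c
#include <assert.h>

/* Added prototype, complete with parameter types. */
int plus_one(int x);

int double_of_next(int x)
{
    return plus_one(x) * 2;
}

/* The definition appears only after its first use. */
int plus_one(int x)
{
    assert(x < 1000000);  /* keep the sketch well away from overflow */
    return x + 1;
}
```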

Initializers are normalized to include specific initialization for the missing elements.
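
A minimal sketch of initializer normalization for an array, with hypothetical variable names. C already guarantees that missing elements are zero; the normalized form just writes them out.

```c
#include <assert.h>
#include <string.h>

/* Before: only the first two elements are initialized explicitly. */
int partial[4] = {1, 2};

/* After: the implicit zeros are spelled out. */
int full[4] = {1, 2, 0, 0};

/* Both arrays hold the same contents. */
void check_init(void)
{
    assert(memcmp(partial, full, sizeof partial) == 0);
}
```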

CIL will remove from the source file those type declarations, local variables and inline functions that are not used in the file. This means that your analysis does not have to see all the ugly stuff that comes from the header files.

Local variables in inner scopes are pulled to function scope (with appropriate renaming). Local scopes thus disappear. This makes it easy to find and operate on all local variables in a function.
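
A hand-written sketch of this step. The renamed variable `i_inner` is made up; the naming scheme CIL actually uses may differ.

```c
#include <assert.h>

/* Before: the inner block declares its own i, shadowing the outer one. */
int before(int n)
{
    int i = n;
    {
        int i = 100;
        n = n + i;
    }
    return n + i;
}

/* After: both variables live at function scope, and the inner one is
   renamed so the two no longer clash. */
int after(int n)
{
    int i;
    int i_inner;
    i = n;
    i_inner = 100;
    n = n + i_inner;
    return n + i;
}

/* Same result either way. */
void check_scopes(void)
{
    assert(before(5) == after(5));
}
```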

A WHOLE-PROGRAM MERGER

A tool that merges all of a program's compilation units into a single compilation unit, with proper renaming to preserve semantics. This matters because many analyses are most effective when applied to the whole program.

Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?

File-scope identifiers must be renamed properly to avoid clashes with globals and with similar identifiers in different files.
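
A sketch of what a merged file might look like when two source files each declare a file-scope variable with the same name. The file names a.c and b.c and the renamed identifier `counter_b` are hypothetical.

```c
#include <assert.h>

/* from a.c: originally "static int counter;" */
static int counter;
int bump_a(void) { return ++counter; }

/* from b.c: also "static int counter;" in the original, renamed here
   so that it stays a separate variable after merging */
static int counter_b;
int bump_b(void) { counter_b += 10; return counter_b; }

/* The two counters remain independent, as they were in separate files. */
void check_merge(void)
{
    assert(bump_a() == 1);
    assert(bump_b() == 10);
    assert(bump_a() == 2);
}
```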

Solution: Structural Equivalence vs. Name Equivalence

For each file there are two merging phases. In the first phase we merge the types and tags. Then, in the second phase, we rewrite the variable declarations and function bodies.
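
One way to picture the outcome of the first phase, as a sketch under the assumption that tags are compared structurally: identical definitions from different files collapse into one, while a same-named but different definition gets a fresh tag. The file names and the tag `point_c` are hypothetical.

```c
#include <assert.h>

/* a.c and b.c both declared: struct point { int x; int y; };
   The definitions match field by field, so one copy serves both. */
struct point { int x; int y; };

/* c.c declared a different struct point. It is not structurally equal
   to the one above, so it is given a fresh tag. */
struct point_c { double r; double theta; };

int from_a(struct point p)      { return p.x + p.y; }
int from_b(struct point p)      { return p.x * p.y; }
double from_c(struct point_c p) { assert(p.r >= 0.0); return p.r; }
```

Comparing by name alone would either wrongly unify the third definition with the first two or keep three separate copies.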

REPRESENTATIVE APP

Q3: How do they achieve the goal of making code immune to stack-smashing attacks?

CIL modifies the program to maintain a separate stack for return addresses. Even if a buffer overrun attack occurs, the correct return address is taken from this separate stack.
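
Portable C cannot read or write a real return address, so the sketch below only simulates the idea: a `struct frame` stands in for a stack frame, and a plain number stands in for the return address. All names are hypothetical, and this is not the code the CIL-based tool generates.

```c
#include <assert.h>
#include <string.h>

#define SHADOW_MAX 16

/* Separate stack that only the instrumentation touches. */
static unsigned long shadow[SHADOW_MAX];
static int shadow_top;

/* Stand-in for a stack frame: a buffer followed by the return slot. */
struct frame {
    char buf[8];
    unsigned long ret;
};

/* Copies len bytes over the frame, starting at buf. A len larger than
   sizeof buf overwrites ret, as a buffer overrun would. Returns the
   address the function would return to. */
unsigned long run_frame(unsigned long ret, const char *input, size_t len,
                        int *clobbered)
{
    struct frame f;
    assert(shadow_top < SHADOW_MAX && len <= sizeof f);
    f.ret = ret;
    shadow[shadow_top++] = ret;      /* on entry: save a private copy */
    memcpy(&f, input, len);          /* may overwrite f.ret */
    *clobbered = (f.ret != ret);     /* did the in-frame copy survive? */
    return shadow[--shadow_top];     /* on exit: trust only the shadow copy */
}
```

Even when the copy stored in the frame is overwritten, the value returned comes from the shadow stack, which the overrun never reaches.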

5th EVALUATION

CIL has been tested very extensively. It is able to process the SPECINT95 benchmarks, the Linux kernel, GIMP and other open-source projects.

CIL was tested against GCC’s c-torture testsuite and (except for the tests involving complex numbers and inner functions, which CIL does not currently implement) CIL passes most of the tests. Specifically CIL fails 23 tests out of the 904 c-torture tests that it should pass. GCC itself fails 19 tests.

Thank you!

More information at http://hal.cs.berkeley.edu/cil/