
    CS 352 Compilers: Principles and Practice

1. Introduction

2. Lexical analysis

3. LL parsing

4. LR parsing

5. JavaCC and JTB

6. Semantic analysis

7. Translation and simplification

8. Liveness analysis and register allocation

9. Activation Records

    Chapter 1: Introduction


    Things to do

    make sure you have a working mentor account

    start brushing up on Java

review Java development tools; see http://www.cs.purdue.edu/homes/palsberg/cs352/F00/index.html

    add yourself to the course mailing list by writing (on a CS computer)

    mailer add me to cs352

Copyright © 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected].


    Compilers

    What is a compiler?

a program that translates an executable program in one language into an executable program in another language

we expect the program produced by the compiler to be better, in some way, than the original

    What is an interpreter?

a program that reads an executable program and produces the results

    of running that program

    usually, this involves executing the source program in some fashion

    This course deals mainly with compilers

Many of the same issues arise in interpreters

    Motivation

    Why study compiler construction?

    Why build compilers?

    Why attend class?


    Interest

    Compiler construction is a microcosm of computer science

artificial intelligence: greedy algorithms, learning algorithms

algorithms: graph algorithms, union-find, dynamic programming

theory: DFAs for scanning, parser generators, lattice theory for analysis

systems: allocation and naming, locality, synchronization

architecture: pipeline management, hierarchy management, instruction set use

Inside a compiler, all these things come together

Isn't it a solved problem?

    Machines are constantly changing

Changes in architecture ⇒ changes in compilers

    new features pose new problems

    changing costs lead to different concerns

    old solutions need re-engineering

    Changes in compilers should prompt changes in architecture

    New languages and features


    Intrinsic Merit

    Compiler construction is challenging and fun

    interesting problems

    primary responsibility for performance (blame)

new architectures ⇒ new challenges

    real results

    extremely complex interactions

    Compilers have an impact on how computers are used

Compiler construction poses some of the most interesting problems in computing

    Experience

    You have used several compilers

    What qualities are important in a compiler?

    1. Correct code

    2. Output runs fast

    3. Compiler runs fast

    4. Compile time proportional to program size

5. Support for separate compilation

6. Good diagnostics for syntax errors

    7. Works well with the debugger

    8. Good diagnostics for flow anomalies

    9. Cross language calls

    10. Consistent, predictable optimization

Each of these shapes your feelings about the correct contents of this course

    Abstract view

[diagram: source code → compiler → machine code, with errors reported]

    Implications:

    recognize legal (and illegal) programs

    generate correct code

    manage storage of all variables and code

    agreement on format for object (or assembly) code

Big step up from assembler: higher level notations

    Traditional two pass compiler

[diagram: source code → front end → IR → back end → machine code, with errors reported]

    Implications:

    intermediate representation (IR)

    front end maps legal code into IR

    back end maps IR onto target machine

    simplify retargeting

    allows multiple front ends

multiple passes ⇒ better code

    A fallacy

[diagram: front ends for FORTRAN, C++, CLU, and Smalltalk code feeding back ends for target1, target2, and target3]

Can we build n × m compilers with n + m components?

    must encode all the knowledge in each front end

    must represent all the features in one IR

    must handle all the features in each back end

Limited success with low-level IRs

    Front end

[diagram: source code → scanner → tokens → parser → IR, with errors reported]

    Scanner:

maps characters into tokens, the basic unit of syntax

x = x + y becomes <id,x> = <id,x> + <id,y>

character string value for a token is a lexeme

typical tokens: number, id, +, -, *, /, do, end

    eliminates white space (tabs, blanks, comments)

    a key issue is speed

use specialized recognizer (as opposed to lex)

    Front end

[diagram: source code → scanner → tokens → parser → IR, with errors reported]

    Parser:

    recognize context-free syntax

    guide context-sensitive analysis

    construct IR(s)

    produce meaningful error messages

    attempt error correction

Parser generators mechanize much of the work

    Front end

    Context-free syntax is specified with a grammar

<sheep noise> ::= baa <sheep noise>
               |  baa

This grammar defines the set of noises that a sheep makes under normal circumstances

The format is called Backus-Naur form (BNF)

Formally, a grammar G = (S, N, T, P)

S is the start symbol

N is a set of non-terminal symbols

T is a set of terminal symbols

P is a set of productions or rewrite rules (P : N → (N ∪ T)*)


    Front end

Context-free syntax can be put to better use

1 <goal> ::= <expr>
2 <expr> ::= <expr> <op> <term>
3         |  <term>
4 <term> ::= number
5         |  id
6 <op>   ::= +
7         |  -

This grammar defines simple expressions with addition and subtraction over the tokens number and id

S = <goal>
T = { number, id, +, - }
N = { <goal>, <expr>, <term>, <op> }
P = { 1, 2, 3, 4, 5, 6, 7 }


    Front end

Given a grammar, valid sentences can be derived by repeated substitution.

Prod'n  Result
–       <goal>
1       <expr>
2       <expr> <op> <term>
5       <expr> <op> y
7       <expr> - y
2       <expr> <op> <term> - y
4       <expr> <op> 2 - y
6       <expr> + 2 - y
3       <term> + 2 - y
5       x + 2 - y

To recognize a valid sentence in some CFG, we reverse this process and build up a parse


    Front end

A parse can be represented by a tree, called a parse or syntax tree

[parse tree for x + 2 - y, with interior nodes <goal>, <expr>, <op>, and <term>, and leaves x, +, 2, -, y]

    Front end

    So, compilers often use an abstract syntax tree

[abstract syntax tree for x + 2 - y: a - node whose children are a + node (over x and 2) and y]

    Back end

[diagram: IR → instruction selection → register allocation → machine code, with errors reported]

    Responsibilities

    translate IR into target machine code

    choose instructions for each IR operation

    decide what to keep in registers at each point

    ensure conformance with system interfaces

Automation has been less successful here

    Back end

[diagram: IR → instruction selection → register allocation → machine code, with errors reported]

    Instruction selection:

    produce compact, fast code

    use available addressing modes

    pattern matching problem

ad hoc techniques

tree pattern matching

string pattern matching

dynamic programming


    Back end

[diagram: IR → instruction selection → register allocation → machine code, with errors reported]

    Register Allocation:

    have value in a register when used

    limited resources

    changes instruction choices

    can move loads and stores

    optimal allocation is difficult

Modern allocators often use an analogy to graph coloring

    Traditional three pass compiler

[diagram: source code → front end → IR → middle end → IR → back end → machine code, with errors reported]

    Code Improvement

    analyzes and changes IR

    goal is to reduce runtime

    must preserve values


    The Tiger compiler

[diagram: the phases of the Tiger compiler (Passes 1-10): Lex, Parse with Parsing Actions, Semantic Analysis, Frame Layout, Translate, Canonicalize, Instruction Selection, Control Flow Analysis, Data Flow Analysis, Register Allocation, Code Emission, Assembler, Linker; connected by the intermediate forms Source Program, Tokens, Reductions, Abstract Syntax, IR Trees, Assem, Flow Graph, Interference Graph, Register Assignment, Assembly Language, Relocatable Object Code, Machine Language, with Environments and Frame tables used along the way]

    The Tiger compiler phases

Lex: Break source file into individual words, or tokens

Parse: Analyse the phrase structure of program

Parsing Actions: Build a piece of abstract syntax tree for each phrase

Semantic Analysis: Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase

Frame Layout: Place variables, function parameters, etc., into activation records (stack frames) in a machine-dependent way

Translate: Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target machine

Canonicalize: Hoist side effects out of expressions, and clean up conditional branches, for convenience of later phases

Instruction Selection: Group IR-tree nodes into clumps that correspond to actions of target-machine instructions

Control Flow Analysis: Analyse sequence of instructions into control flow graph showing all possible flows of control the program might follow when it runs

Data Flow Analysis: Gather information about flow of data through variables of program; e.g., liveness analysis calculates places where each variable holds a still-needed (live) value

Register Allocation: Choose registers for variables and temporary values; variables not simultaneously live can share same register

Code Emission: Replace temporary names in each machine instruction with registers

A straight-line programming language

    A straight-line programming language (no loops or conditionals):

Stm → Stm ; Stm           (CompoundStm)
Stm → id := Exp           (AssignStm)
Stm → print ( ExpList )   (PrintStm)
Exp → id                  (IdExp)
Exp → num                 (NumExp)
Exp → Exp Binop Exp       (OpExp)
Exp → ( Stm , Exp )       (EseqExp)
ExpList → Exp , ExpList   (PairExpList)
ExpList → Exp             (LastExpList)
Binop → +                 (Plus)
Binop → -                 (Minus)
Binop → *                 (Times)
Binop → /                 (Div)

e.g., a := 5 + 3; b := (print(a, a - 1), 10 * a); print(b)

prints:

8 7
80


    Tree representation

a := 5 + 3; b := (print(a, a - 1), 10 * a); print(b)

[tree: a CompoundStm whose left child is AssignStm(a, OpExp(NumExp 5, Plus, NumExp 3)) and whose right child is CompoundStm(AssignStm(b, EseqExp(PrintStm(PairExpList(IdExp a, LastExpList(OpExp(IdExp a, Minus, NumExp 1)))), OpExp(NumExp 10, Times, IdExp a))), PrintStm(LastExpList(IdExp b)))]

    This is a convenient internal representation for a compiler to use.


    Java classes for trees

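The code listing for this slide was lost in extraction. The classes follow the design in Appel's Modern Compiler Implementation in Java; a representative sketch, one class per grammar constructor:

    public abstract class Stm {}
    public class CompoundStm extends Stm {
        public Stm stm1, stm2;
        public CompoundStm(Stm s1, Stm s2) { stm1 = s1; stm2 = s2; }
    }
    public class AssignStm extends Stm {
        public String id; public Exp exp;
        public AssignStm(String i, Exp e) { id = i; exp = e; }
    }
    public class PrintStm extends Stm {
        public ExpList exps;
        public PrintStm(ExpList e) { exps = e; }
    }
    public abstract class Exp {}
    public class IdExp extends Exp {
        public String id;
        public IdExp(String i) { id = i; }
    }
    public class NumExp extends Exp {
        public int num;
        public NumExp(int n) { num = n; }
    }
    public class OpExp extends Exp {
        public static final int Plus = 1, Minus = 2, Times = 3, Div = 4;
        public Exp left, right; public int oper;
        public OpExp(Exp l, int o, Exp r) { left = l; oper = o; right = r; }
    }
    public class EseqExp extends Exp {
        public Stm stm; public Exp exp;
        public EseqExp(Stm s, Exp e) { stm = s; exp = e; }
    }
    public abstract class ExpList {}
    public class PairExpList extends ExpList {
        public Exp head; public ExpList tail;
        public PairExpList(Exp h, ExpList t) { head = h; tail = t; }
    }
    public class LastExpList extends ExpList {
        public Exp head;
        public LastExpList(Exp h) { head = h; }
    }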


    Chapter 2: Lexical Analysis


    Scanner


[diagram: source code → scanner → tokens → parser → IR, with errors reported]

maps characters into tokens, the basic unit of syntax

x = x + y becomes <id,x> = <id,x> + <id,y>

character string value for a token is a lexeme

typical tokens: number, id, +, -, *, /, do, end

eliminates white space (tabs, blanks, comments)

a key issue is speed; use specialized recognizer (as opposed to lex)


    Specifying patterns


A scanner must recognize various parts of the language's syntax

Some parts are easy:

white space
  <ws> ::= <ws> ' '
        |  <ws> '\t'
        |  ' '
        |  '\t'

keywords and operators
  specified as literal patterns: do, end

comments
  opening and closing delimiters: /* ... */


    Specifying patterns


A scanner must recognize various parts of the language's syntax

    Other parts are much harder:

identifiers
  alphabetic followed by k alphanumerics (_, $, &, ...)

numbers
  integers: 0 or digit from 1-9 followed by digits from 0-9
  decimals: integer . digits from 0-9
  reals: (integer or decimal) E (+ or -) digits from 0-9
  complex: ( real , real )

We need a powerful notation to specify these patterns

    Operations on languages


Operation                   Definition
union of L and M            L ∪ M = { s | s ∈ L or s ∈ M }
concatenation of L and M    LM = { st | s ∈ L and t ∈ M }
Kleene closure of L         L* = ∪(i≥0) L^i
positive closure of L       L+ = ∪(i≥1) L^i

    Regular expressions


    Patterns are often specified as regular languages

Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars

Regular expressions (over an alphabet Σ):

1. ε is a RE denoting the set {ε}
2. if a ∈ Σ, then a is a RE denoting {a}
3. if r and s are REs, denoting L(r) and L(s), then:
     (r) is a RE denoting L(r)
     (r) | (s) is a RE denoting L(r) ∪ L(s)
     (r)(s) is a RE denoting L(r)L(s)
     (r)* is a RE denoting (L(r))*

If we adopt a precedence for operators, the extra parentheses can go away. We assume closure, then concatenation, then alternation as the order of precedence.

    Examples


identifier
  letter → (a | b | c | ... | z | A | B | C | ... | Z)
  digit  → (0 | 1 | 2 | ... | 9)
  id     → letter (letter | digit)*

numbers
  integer → (+ | - | ε)(0 | (1 | 2 | 3 | ... | 9) digit*)
  decimal → integer . digit*
  real    → (integer | decimal) E (+ | - | ε) digit*
  complex → ( real , real )

Numbers can get much more complicated

Most programming language tokens can be described with REs

We can use REs to build scanners automatically
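For instance, the id and integer patterns above transcribe directly into Java's regex notation (a small illustrative sketch, not part of the original notes):

    import java.util.regex.Pattern;

    public class TokenPatterns {
        // id → letter (letter | digit)*
        static final Pattern ID = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
        // integer → (+ | - | ε)(0 | (1..9) digit*)
        static final Pattern INTEGER = Pattern.compile("[+-]?(0|[1-9][0-9]*)");

        public static void main(String[] args) {
            System.out.println(ID.matcher("x1").matches());        // true
            System.out.println(INTEGER.matcher("-42").matches());  // true
            System.out.println(INTEGER.matcher("042").matches());  // false: leading zero
        }
    }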

    Algebraic properties of REs


Axiom                       Description
r | s = s | r               | is commutative
r | (s | t) = (r | s) | t   | is associative
(rs)t = r(st)               concatenation is associative
r(s | t) = rs | rt          concatenation distributes over |
(s | t)r = sr | tr
εr = r,  rε = r             ε is the identity for concatenation
r* = (r | ε)*               relation between * and ε
(r*)* = r*                  * is idempotent

    Examples


Let Σ = {a, b}

1. a | b denotes {a, b}

2. (a | b)(a | b) denotes {aa, ab, ba, bb}
   i.e., (a | b)(a | b) = aa | ab | ba | bb

3. a* denotes {ε, a, aa, aaa, ...}

4. (a | b)* denotes the set of all strings of a's and b's (including ε)
   i.e., (a | b)* = (a* b*)*

5. a | a*b denotes {a, b, ab, aab, aaab, aaaab, ...}

    Recognizers


From a regular expression we can construct a deterministic finite automaton (DFA)

Recognizer for identifier:

[DFA: state 0 on letter → state 1; state 1 on letter or digit → state 1; state 1 on other → state 2 (accept); state 0 on digit or other → state 3 (error)]

identifier
  letter → (a | b | c | ... | z | A | B | C | ... | Z)
  digit  → (0 | 1 | 2 | ... | 9)
  id     → letter (letter | digit)*


    Code for the recognizer

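The code on this slide was lost in extraction. The table-driven recognizer it presents (cf. the two tables on the next slide) looks roughly like this Java sketch; the method names and the padding convention are illustrative assumptions:

    public class IdRecognizer {
        // character classes: 0 = letter, 1 = digit, 2 = other
        static int charClass(char c) {
            if (Character.isLetter(c)) return 0;
            if (Character.isDigit(c))  return 1;
            return 2;
        }
        // nextState[class][state]; states: 0 start, 1 in-identifier,
        // 2 accept, 3 error (unused entries encoded as -1)
        static final int[][] nextState = {
            { 1, 1, -1, -1 },   // letter
            { 3, 1, -1, -1 },   // digit
            { 3, 2, -1, -1 },   // other
        };

        /** Returns the identifier lexeme at the front of s, or null. */
        static String scan(String s) {
            int state = 0, i = 0;
            StringBuilder lexeme = new StringBuilder();
            while (state == 0 || state == 1) {
                char c = i < s.length() ? s.charAt(i) : ' ';   // pad with 'other'
                state = nextState[charClass(c)][state];
                if (state == 1) { lexeme.append(c); i++; }
            }
            return state == 2 ? lexeme.toString() : null;      // 2 = accept
        }

        public static void main(String[] args) {
            System.out.println(scan("x1 = y"));   // prints x1
            System.out.println(scan("9abc"));     // prints null
        }
    }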

    Tables for the recognizer


    Two tables control the recognizer

char:   a..z    A..Z    0..9   other
value:  letter  letter  digit  other

class    0  1  2  3
letter   1  1  –  –
digit    3  1  –  –
other    3  2  –  –

To change languages, we can just change tables

    Automatic construction


Scanner generators automatically construct code from regular expression-like descriptions

construct a DFA

use state minimization techniques

emit code for the scanner (table-driven or direct code)

A key issue in automation is an interface to the parser

lex is a scanner generator supplied with UNIX

emits C code for scanner

provides macro definitions for each token (used in the parser)


    Grammars for regular languages


Can we place a restriction on the form of a grammar to ensure that it describes a regular language?

Provable fact:

For any RE r, there is a grammar g such that L(r) = L(g).

The grammars that generate regular sets are called regular grammars

Definition:

In a regular grammar, all productions have one of two forms:

1. A → aA
2. A → a

where A is any non-terminal and a is any terminal symbol

These are also called type 3 grammars (Chomsky)

    More regular languages


Example: the set of strings containing an even number of zeros and an even number of ones

[four-state DFA: start/accept state s0; 1s toggle s0 ↔ s1 and s2 ↔ s3; 0s toggle s0 ↔ s2 and s1 ↔ s3]

The RE is (00 | 11)* ((01 | 10)(00 | 11)* (01 | 10)(00 | 11)*)*

    More regular expressions


What about the RE (a | b)* abb ?

[NFA: s0 on a or b → s0; s0 on a → s1; s1 on b → s2; s2 on b → s3 (accept)]

State s0 has multiple transitions on a!

⇒ nondeterministic finite automaton

      a           b
s0    {s0, s1}    {s0}
s1    –           {s2}
s2    –           {s3}

    Finite automata


A non-deterministic finite automaton (NFA) consists of:

1. a set of states S = { s0, ..., sn }

2. a set of input symbols Σ (the alphabet)

3. a transition function move mapping state-symbol pairs to sets of states

4. a distinguished start state s0

5. a set of distinguished accepting or final states F

A Deterministic Finite Automaton (DFA) is a special case of an NFA:

1. no state has an ε-transition, and

2. for each state s and input symbol a, there is at most one edge labelled a leaving s.

A DFA accepts x iff. there exists a unique path through the transition graph from s0 to an accepting state such that the labels along the edges spell x.

    DFAs and NFAs are equivalent


    1. DFAs are clearly a subset of NFAs

2. Any NFA can be converted into a DFA, by simulating sets of simultaneous states:

each DFA state corresponds to a set of NFA states

possible exponential blowup


    NFA to DFA using the subset construction: example 1


[the NFA for (a | b)* abb, as above]

            a           b
{s0}        {s0, s1}    {s0}
{s0, s1}    {s0, s1}    {s0, s2}
{s0, s2}    {s0, s1}    {s0, s3}
{s0, s3}    {s0, s1}    {s0}

[resulting DFA over states {s0}, {s0,s1}, {s0,s2}, {s0,s3}: every a edge leads to {s0,s1}; b edges step {s0,s1} → {s0,s2} → {s0,s3} (accept) and otherwise return toward {s0}]

    Constructing a DFA from a regular expression


RE → NFA w/ ε moves → DFA → minimized DFA

RE → NFA w/ ε moves: build NFA for each term; connect them with ε moves

NFA w/ ε moves to DFA: construct the simulation, i.e., the subset construction

DFA → minimized DFA: merge compatible states

DFA → RE: construct R^k_ij = R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj ∪ R^(k-1)_ij

    RE to NFA


[Thompson's construction: N(ε) and N(a) are two-state NFAs with a single ε or a edge; N(AB) chains N(A) into N(B); N(A | B) adds a new start and final state with ε edges around N(A) and N(B); N(A*) wraps N(A) with ε edges that allow it to be repeated or bypassed]

    RE to NFA: example


(a | b)* abb

[NFA for a | b, states 1-6: ε edges from 1 to 2 and 4; 2 goes to 3 on a, 4 goes to 5 on b; 3 and 5 join at 6 by ε]

[NFA for (a | b)*, states 0-7: ε edges from 0 to 1 and 7; 6 loops back to 1 and exits to 7 by ε]

[NFA for abb, states 7-10: 7 to 8 on a, 8 to 9 on b, 9 to 10 on b; 10 accepting]

[concatenating these gives the NFA for (a | b)* abb with states 0-10]

    NFA to DFA: the subset construction


Input: NFA N. Output: a DFA D with states Dstates and transitions Dtrans such that L(D) = L(N)

Method: let s be a state in N and T be a set of states, and use the following operations:

Operation      Definition
ε-closure(s)   set of NFA states reachable from NFA state s on ε-transitions alone
ε-closure(T)   set of NFA states reachable from some NFA state s in T on ε-transitions alone
move(T, a)     set of NFA states to which there is a transition on input symbol a from some NFA state s in T

add state T = ε-closure(s0) unmarked to Dstates
while ∃ unmarked state T in Dstates
    mark T
    for each input symbol a
        U = ε-closure(move(T, a))
        if U ∉ Dstates then add U to Dstates unmarked
        Dtrans[T, a] = U
    endfor
endwhile

ε-closure(s0) is the start state of D
A state of D is accepting if it contains at least one accepting state in N
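The algorithm translates almost line for line into code. A minimal Java sketch, assuming an NFA encoded as transition maps over integer states (the names Nfa, move, eClosure, and subsetConstruction are mine, not from the notes):

    import java.util.*;

    class Nfa {
        int start;
        Set<Integer> accepting = new HashSet<>();
        // edges.get(s).get(c) = states reachable from s on symbol c; '\0' encodes an ε edge
        Map<Integer, Map<Character, Set<Integer>>> edges = new HashMap<>();

        // move(T, a): NFA states reachable from some state in T on input a
        Set<Integer> move(Set<Integer> T, char a) {
            Set<Integer> out = new HashSet<>();
            for (int s : T)
                out.addAll(edges.getOrDefault(s, Map.of()).getOrDefault(a, Set.of()));
            return out;
        }

        // ε-closure(T): NFA states reachable from T on ε-transitions alone
        Set<Integer> eClosure(Set<Integer> T) {
            Deque<Integer> work = new ArrayDeque<>(T);
            Set<Integer> closure = new HashSet<>(T);
            while (!work.isEmpty())
                for (int t : edges.getOrDefault(work.pop(), Map.of()).getOrDefault('\0', Set.of()))
                    if (closure.add(t)) work.push(t);
            return closure;
        }

        // Each DFA state is the set of NFA states it simulates; a DFA state
        // is accepting if it contains an accepting NFA state.
        Map<Set<Integer>, Map<Character, Set<Integer>>> subsetConstruction(Set<Character> alphabet) {
            Map<Set<Integer>, Map<Character, Set<Integer>>> dtrans = new HashMap<>();
            Deque<Set<Integer>> unmarked = new ArrayDeque<>();
            Set<Integer> d0 = eClosure(Set.of(start));       // start state of D
            dtrans.put(d0, new HashMap<>());
            unmarked.push(d0);
            while (!unmarked.isEmpty()) {                    // while ∃ unmarked T
                Set<Integer> T = unmarked.pop();             // mark T
                for (char a : alphabet) {
                    Set<Integer> U = eClosure(move(T, a));
                    if (!dtrans.containsKey(U)) {            // U not yet in Dstates
                        dtrans.put(U, new HashMap<>());
                        unmarked.push(U);
                    }
                    dtrans.get(T).put(a, U);                 // Dtrans[T, a] = U
                }
            }
            return dtrans;
        }
    }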

    NFA to DFA using subset construction: example 2


[the NFA for (a | b)* abb from the previous slide, states 0-10]

A = {0, 1, 2, 4, 7}          D = {1, 2, 4, 5, 6, 7, 9}
B = {1, 2, 3, 4, 6, 7, 8}    E = {1, 2, 4, 5, 6, 7, 10}
C = {1, 2, 4, 5, 6, 7}

    a  b
A   B  C
B   B  D
C   B  C
D   B  E
E   B  C

    Limits of regular languages

    Not all languages are regular


    One cannot construct DFAs to recognize these languages:

L = { p^k q^k }

L = { wcw^r | w ∈ Σ* }

    Note: neither of these is a regular expression!

    (DFAs cannot count!)

But, this is a little subtle. One can construct DFAs for:

alternating 0s and 1s: (ε | 1)(01)*(ε | 0)

sets of pairs of 0s and 1s: (01 | 10)+


    So what is hard?

    Language features that can cause problems:


reserved words
  PL/I had no reserved words

significant blanks
  FORTRAN and Algol68 ignore blanks

string constants
  special characters in strings: newline, tab, quote, comment delimiters

finite closures
  some languages limit identifier lengths; adds states to count length
  FORTRAN 66: 6 characters

These can be swept under the rug in the language design


    How bad can it get?



    Chapter 3: LL Parsing


    The role of the parser


[diagram: source code → scanner → tokens → parser → IR, with errors reported]

    Parser

    performs context-free syntax analysis

    guides context-sensitive analysis

constructs an intermediate representation

produces meaningful error messages

    attempts error correction

For the next few weeks, we will look at parser construction


    Syntax analysis


    Context-free syntax is specified with a context-free grammar.

Formally, a CFG G is a 4-tuple (Vt, Vn, S, P), where:

Vt is the set of terminal symbols in the grammar. For our purposes, Vt is the set of tokens returned by the scanner.

Vn, the nonterminals, is a set of syntactic variables that denote sets of (sub)strings occurring in the language. These are used to impose a structure on the grammar.

S is a distinguished nonterminal (S ∈ Vn) denoting the entire set of strings in L(G). This is sometimes called a goal symbol.

P is a finite set of productions specifying how terminals and non-terminals can be combined to form strings in the language. Each production must have a single non-terminal on its left hand side.

The set V = Vt ∪ Vn is called the vocabulary of G


    Notation and terminology


a, b, c, ... ∈ Vt
A, B, C, ... ∈ Vn
U, V, W, ... ∈ V
α, β, γ, ... ∈ V*
u, v, w, ... ∈ Vt*

If A → γ then αAβ ⇒ αγβ is a single-step derivation using A → γ

Similarly, ⇒* and ⇒+ denote derivations of ≥ 0 and ≥ 1 steps

If S ⇒* β then β is said to be a sentential form of G

L(G) = { w ∈ Vt* | S ⇒+ w }; w ∈ L(G) is called a sentence of G

Note, L(G) = { β ∈ V* | S ⇒* β } ∩ Vt*

    Syntax analysis

    Grammars are often written in Backus-Naur form (BNF).


Example:

1 <goal> ::= <expr>
2 <expr> ::= <expr> <op> <expr>
3         |  num
4         |  id
5 <op>   ::= +
6         |  -
7         |  *
8         |  /

This describes simple expressions over numbers and identifiers.

In a BNF for a grammar, we represent

1. non-terminals with angle brackets or capital letters
2. terminals with typewriter font or underline
3. productions as in the example


    Scanning vs. parsing

    Where do we draw the line?


term ::= [a-zA-Z] ([a-zA-Z] | [0-9])*
      |  0 | [1-9][0-9]*
op   ::= + | - | * | /
expr ::= (term op)* term

Regular expressions are used to classify:

identifiers, numbers, keywords

REs are more concise and simpler for tokens than a grammar

more efficient scanners can be built from REs (DFAs) than grammars

Context-free grammars are used to count:

brackets: (), begin ... end, if ... then ... else

imparting structure: expressions

Syntactic analysis is complicated enough: grammar for C has around 200 productions. Factoring out lexical analysis as a separate phase makes compiler more manageable.


    Derivations


    We can view the productions of a CFG as rewriting rules.

Using our example CFG:

<goal> ⇒ <expr>
       ⇒ <expr> <op> <expr>
       ⇒ <id,x> <op> <expr>
       ⇒ <id,x> + <expr>
       ⇒ <id,x> + <expr> <op> <expr>
       ⇒ <id,x> + <num,2> <op> <expr>
       ⇒ <id,x> + <num,2> * <expr>
       ⇒ <id,x> + <num,2> * <id,y>

We have derived the sentence x + 2 * y.
We denote this: <goal> ⇒* id + num * id.

Such a sequence of rewrites is a derivation or a parse.

The process of discovering a derivation is called parsing.


    Derivations


At each step, we choose a non-terminal to replace.

    This choice can lead to different derivations.

    Two are of particular interest:

leftmost derivation: the leftmost non-terminal is replaced at each step

rightmost derivation: the rightmost non-terminal is replaced at each step

    The previous example was a leftmost derivation.


    Rightmost derivation


For the string x + 2 * y:

<goal> ⇒ <expr>
       ⇒ <expr> <op> <expr>
       ⇒ <expr> <op> <id,y>
       ⇒ <expr> * <id,y>
       ⇒ <expr> <op> <expr> * <id,y>
       ⇒ <expr> <op> <num,2> * <id,y>
       ⇒ <expr> + <num,2> * <id,y>
       ⇒ <id,x> + <num,2> * <id,y>

Again, <goal> ⇒* id + num * id.


    Precedence


[parse tree for x + 2 * y in which <expr> <op> <expr> splits at *, grouping (x + 2) * y]

Treewalk evaluation computes (x + 2) * y

the wrong answer!

Should be x + (2 * y)


    Precedence

    These two derivations point out a problem with the grammar.


It has no notion of precedence, or implied order of evaluation.

To add precedence takes additional machinery:

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3         |  <expr> - <term>
4         |  <term>
5 <term> ::= <term> * <factor>
6         |  <term> / <factor>
7         |  <factor>
8 <factor> ::= num
9           |  id

This grammar enforces a precedence on the derivation:

terms must be derived from expressions

forces the correct tree


    Precedence


Now, for the string x + 2 * y:

<goal> ⇒ <expr>
       ⇒ <expr> + <term>
       ⇒ <expr> + <term> * <factor>
       ⇒ <expr> + <term> * <id,y>
       ⇒ <expr> + <factor> * <id,y>
       ⇒ <expr> + <num,2> * <id,y>
       ⇒ <term> + <num,2> * <id,y>
       ⇒ <factor> + <num,2> * <id,y>
       ⇒ <id,x> + <num,2> * <id,y>

Again, <goal> ⇒* id + num * id, but this time, we build the desired tree.


    Precedence


[parse tree under the precedence grammar: <goal> → <expr> → <expr> + <term>; the left <expr> derives x through <term> and <factor>, and the right <term> derives <term> * <factor>, covering 2 * y]

Treewalk evaluation computes x + (2 * y)


    Ambiguity

    If a grammar has more than one derivation for a single sentential form,


    then it is ambiguous

Example:

<stmt> ::= if <expr> then <stmt>
        |  if <expr> then <stmt> else <stmt>
        |  ...

Consider deriving the sentential form:

if E1 then if E2 then S1 else S2

It has two derivations.

This ambiguity is purely grammatical.

It is a context-free ambiguity.


    Ambiguity


May be able to eliminate ambiguities by rearranging the grammar:

<stmt>      ::= <matched>
             |  <unmatched>
<matched>   ::= if <expr> then <matched> else <matched>
             |  ...
<unmatched> ::= if <expr> then <stmt>
             |  if <expr> then <matched> else <unmatched>

This generates the same language as the ambiguous grammar, but applies the common sense rule:

match each else with the closest unmatched then

This is most likely the language designer's intent.


    Ambiguity

    Ambiguity is often due to confusion in the context-free specification.


    Context-sensitive confusions can arise from overloading.

Example:

In many Algol-like languages, an expression like a(17) could be a function call or a subscripted variable reference.

Disambiguating this statement requires context:

need values of declarations

not context-free

really an issue of type

    Rather than complicate parsing, we will handle this separately.


    Parsing: the big picture


[diagram: grammar → parser generator → parser code; tokens → parser → IR]

    Our goal is a flexible parser generator system


    Top-down versus bottom-up

    Top-down parsers


    start at the root of derivation tree and fill in

    picks a production and tries to match the input

    may require backtracking

    some grammars are backtrack-free (predictive)

    Bottom-up parsers

    start at the leaves and fill in

    start in a state valid for legal first tokens

    as input is consumed, change state to encode possibilities

    (recognize valid prefixes)

    use a stack to store both state and sentential forms


    Top-down parsing

A top-down parser starts with the root of the parse tree, labelled with the start or goal symbol of the grammar.

To build a parse, it repeats the following steps until the fringe of the parse tree matches the input string

1. At a node labelled A, select a production A → α and construct the appropriate child for each symbol of α

2. When a terminal is added to the fringe that doesn't match the input string, backtrack

3. Find the next node to be expanded (must have a label in Vn)

The key is selecting the right production in step 1

should be guided by input string


    Simple expression grammar

    Recall our grammar for simple expressions:


1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3         |  <expr> - <term>
4         |  <term>
5 <term> ::= <term> * <factor>
6         |  <term> / <factor>
7         |  <factor>
8 <factor> ::= num
9           |  id

Consider the input string x - 2 * y


    Example

Another possible parse for x - 2 * y:

Prod'n  Sentential form            Input
–       <goal>                     ↑x - 2 * y
1       <expr>                     ↑x - 2 * y
2       <expr> + <term>            ↑x - 2 * y
2       <expr> + <term> + <term>   ↑x - 2 * y
2       <expr> + <term> + ...      ↑x - 2 * y
2       ...                        ↑x - 2 * y

If the parser makes the wrong choices, expansion doesn't terminate. This isn't a good property for a parser to have.

(Parsers should terminate!)


    Left-recursion

    Top-down parsers cannot handle left-recursion in a grammar


Formally, a grammar is left-recursive if

∃ A ∈ Vn such that A ⇒+ Aα for some string α

    Our simple expression grammar is left-recursive


    Eliminating left-recursion

    To remove left-recursion, we can transform the grammar


Consider the grammar fragment:

<foo> ::= <foo> α
       |  β

where α and β do not start with <foo>

We can rewrite this as:

<foo> ::= β <bar>
<bar> ::= α <bar>
       |  ε

where <bar> is a new non-terminal

    This fragment contains no left-recursion


    Example

    Our expression grammar contains two cases of left-recursion

<expr> ::= <expr> + <term>
        |  <expr> - <term>
        |  <term>
<term> ::= <term> * <factor>
        |  <term> / <factor>
        |  <factor>

Applying the transformation gives

<expr>  ::= <term> <expr'>
<expr'> ::= + <term> <expr'>
         |  - <term> <expr'>
         |  ε
<term>  ::= <factor> <term'>
<term'> ::= * <factor> <term'>
         |  / <factor> <term'>
         |  ε

With this grammar, a top-down parser will

terminate

backtrack on some inputs


    Example

    This cleaner grammar defines the same language


1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3         |  <term> - <expr>
4         |  <term>
5 <term> ::= <factor> * <term>
6         |  <factor> / <term>
7         |  <factor>
8 <factor> ::= num
9           |  id

It is

right-recursive

free of ε productions

Unfortunately, it generates different associativity

Same syntax, different meaning


    Example

    Our long-suffering expression grammar:


1  <goal>  ::= <expr>
2  <expr>  ::= <term> <expr'>
3  <expr'> ::= + <term> <expr'>
4           |  - <term> <expr'>
5           |  ε
6  <term>  ::= <factor> <term'>
7  <term'> ::= * <factor> <term'>
8           |  / <factor> <term'>
9           |  ε
10 <factor> ::= num
11           |  id

Recall, we factored out left-recursion


    How much lookahead is needed?

    We saw that top-down parsers may need to backtrack when they select the

    wrong production


Do we need arbitrary lookahead to parse CFGs?

in general, yes

use the Earley or Cocke-Younger-Kasami algorithms
(Aho, Hopcroft, and Ullman, Problem 2.34; Parsing, Translation and Compiling, Chapter 4)

Fortunately

large subclasses of CFGs can be parsed with limited lookahead

most programming language constructs can be expressed in a grammar that falls in these subclasses

Among the interesting subclasses are:

LL(1): left to right scan, left-most derivation, 1-token lookahead; and

LR(1): left to right scan, right-most derivation, 1-token lookahead


    Predictive parsing

    Basic idea:


For any two productions A → α | β, we would like a distinct way of choosing the correct production to expand.

For some RHS α ∈ G, define FIRST(α) as the set of tokens that appear first in some string derived from α. That is, for some w ∈ Vt*, w ∈ FIRST(α) iff. α ⇒* wγ.

Key property:

Whenever two productions A → α and A → β both appear in the grammar, we would like

FIRST(α) ∩ FIRST(β) = ∅

This would allow the parser to make a correct choice with a lookahead of only one symbol!

The example grammar has this property!


    Left factoring

    What if a grammar does not have this property?


Sometimes, we can transform a grammar to have this property.

For each non-terminal A find the longest prefix α common to two or more of its alternatives.

If α ≠ ε then replace all of the A productions

A → α β1 | α β2 | ... | α βn

with

A  → α A'
A' → β1 | β2 | ... | βn

where A' is a new non-terminal.

Repeat until no two alternatives for a single non-terminal have a common prefix.


    Example

    Consider a right-recursiveversion of the expression grammar:


1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3         |  <term> - <expr>
4         |  <term>
5 <term> ::= <factor> * <term>
6         |  <factor> / <term>
7         |  <factor>
8 <factor> ::= num
9           |  id

To choose between productions 2, 3, & 4, the parser must see past the num or id and look at the +, -, *, or /.

FIRST(2) ∩ FIRST(3) ∩ FIRST(4) ≠ ∅

This grammar fails the test.

Note: This grammar is right-associative.


    Example

    There are two nonterminals that must be left factored:


<expr> ::= <term> + <expr>
        |  <term> - <expr>
        |  <term>
<term> ::= <factor> * <term>
        |  <factor> / <term>
        |  <factor>

Applying the transformation gives us:

<expr>  ::= <term> <expr'>
<expr'> ::= + <expr>
         |  - <expr>
         |  ε
<term>  ::= <factor> <term'>
<term'> ::= * <term>
         |  / <term>
         |  ε


    Example

    Substituting back into the grammar yields


1  <goal>  ::= <expr>
2  <expr>  ::= <term> <expr'>
3  <expr'> ::= + <expr>
4           |  - <expr>
5           |  ε
6  <term>  ::= <factor> <term'>
7  <term'> ::= * <term>
8           |  / <term>
9           |  ε
10 <factor> ::= num
11           |  id

Now, selection requires only a single token lookahead.

Note: This grammar is still right-associative.


    Example

Prod'n  Sentential form                        Input
–       <goal>                                 ↑x - 2 * y
1       <expr>                                 ↑x - 2 * y
2       <term> <expr'>                         ↑x - 2 * y
6       <factor> <term'> <expr'>               ↑x - 2 * y
11      id <term'> <expr'>                     ↑x - 2 * y
–       id <term'> <expr'>                     x ↑- 2 * y
9       id <expr'>                             x ↑- 2 * y
4       id - <expr>                            x ↑- 2 * y
–       id - <expr>                            x - ↑2 * y
2       id - <term> <expr'>                    x - ↑2 * y
6       id - <factor> <term'> <expr'>          x - ↑2 * y
10      id - num <term'> <expr'>               x - ↑2 * y
–       id - num <term'> <expr'>               x - 2 ↑* y
7       id - num * <term> <expr'>              x - 2 ↑* y
–       id - num * <term> <expr'>              x - 2 * ↑y
6       id - num * <factor> <term'> <expr'>    x - 2 * ↑y
11      id - num * id <term'> <expr'>          x - 2 * ↑y
–       id - num * id <term'> <expr'>          x - 2 * y↑
9       id - num * id <expr'>                  x - 2 * y↑
5       id - num * id                          x - 2 * y↑

The next symbol determined each choice correctly.


    Back to left-recursion elimination

    Given a left-factored CFG, to eliminate left-recursion:


If A → A α | β | γ then replace all of the A productions with

A  → N A'
N  → β | γ
A' → α A' | ε

where N and A' are new non-terminals.

    Repeat until there are no left-recursive productions.


    Generality

    Question:


By left factoring and eliminating left-recursion, can we transform an arbitrary context-free grammar to a form where it can be predictively parsed with a single token lookahead?

Answer:

Given a context-free grammar that doesn't meet our conditions, it is undecidable whether an equivalent grammar exists that does meet our conditions.

Many context-free languages do not have such a grammar:

{ a^n 0 b^n | n ≥ 1 } ∪ { a^n 1 b^2n | n ≥ 1 }

Must look past an arbitrary number of a's to discover the 0 or the 1 and so determine the derivation.


    Recursive descent parsing

Now, we can produce a simple recursive descent parser from the (right-associative) grammar.


    Recursive descent parsing

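The parser code on these two slides was lost in extraction. A sketch of what such a parser looks like in Java, with one procedure per non-terminal of the grammar above (the Token enum, the list-based input, and the error handling are illustrative assumptions):

    import java.util.*;

    public class RDParser {
        enum Token { PLUS, MINUS, TIMES, DIV, NUM, ID, EOF }

        private final Iterator<Token> input;
        private Token tok;                                   // one-token lookahead

        RDParser(List<Token> tokens) { input = tokens.iterator(); tok = input.next(); }

        private void eat(Token expected) {
            if (tok == expected) tok = input.hasNext() ? input.next() : Token.EOF;
            else throw new RuntimeException("expected " + expected + ", found " + tok);
        }

        void goal() { expr(); eat(Token.EOF); }              // <goal> ::= <expr>
        void expr() { term(); exprPrime(); }                 // <expr> ::= <term> <expr'>
        void exprPrime() {                                   // <expr'> ::= + <expr> | - <expr> | ε
            if (tok == Token.PLUS)       { eat(Token.PLUS);  expr(); }
            else if (tok == Token.MINUS) { eat(Token.MINUS); expr(); }
            // else ε: match nothing
        }
        void term() { factor(); termPrime(); }               // <term> ::= <factor> <term'>
        void termPrime() {                                   // <term'> ::= * <term> | / <term> | ε
            if (tok == Token.TIMES)    { eat(Token.TIMES); term(); }
            else if (tok == Token.DIV) { eat(Token.DIV);   term(); }
        }
        void factor() {                                      // <factor> ::= num | id
            if (tok == Token.NUM)     eat(Token.NUM);
            else if (tok == Token.ID) eat(Token.ID);
            else throw new RuntimeException("expected num or id, found " + tok);
        }

        public static void main(String[] args) {             // parse x - 2 * y
            new RDParser(List.of(Token.ID, Token.MINUS, Token.NUM,
                                 Token.TIMES, Token.ID, Token.EOF)).goal();
        }
    }

Each procedure chooses its production by inspecting the single lookahead token, exactly the FIRST-set discipline developed below.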

    Building the tree

One of the key jobs of the parser is to build an intermediate representation of the source code.

To build an abstract syntax tree, we can simply insert code at the appropriate points:

factor actions can stack nodes id, num

op actions can stack nodes +, -, *, /

expr actions can pop 3, build and push subtree

term actions can pop 3, build and push subtree

goal action can pop and return tree


    Non-recursive predictive parsing

    Observation:

Our recursive descent parser encodes state information in its run-time stack, or call stack.

Using recursive procedure calls to implement a stack abstraction may not be particularly efficient. This suggests other implementation methods:

    explicit stack, hand-coded parser

    stack-based, table-driven parser


    Non-recursive predictive parsing

    Now, a predictive parser looks like:

[diagram: source code → scanner → tokens → table-driven parser → IR; the parser consults a stack and parsing tables]

    Rather than writing code, we build tables.

    Building tables can be automated!


    Table-driven parsers

    A parser generator system often looks like:


[diagram: grammar → parser generator → parsing tables; source code → scanner → tokens → table-driven parser (with stack and parsing tables) → IR]

    This is true for both top-down (LL) and bottom-up (LR) parsers


    Non-recursive predictive parsing

    Input: a string w and a parsing table M for G


tos ← 0
Stack[tos] ← EOF
Stack[++tos] ← Start Symbol
token ← next_token()
repeat
    X ← Stack[tos]
    if X is a terminal or EOF then
        if X = token then
            pop X
            token ← next_token()
        else error()
    else   /* X is a non-terminal */
        if M[X, token] = X → Y1 Y2 ... Yk then
            pop X
            push Yk, Yk-1, ..., Y1
        else error()
until X = EOF


    FIRST

For a string of grammar symbols α, define FIRST(α) as:

the set of terminal symbols that begin strings derived from α:
{ a ∈ Vt | α ⇒* aβ }

If α ⇒* ε then ε ∈ FIRST(α)

FIRST(α) contains the set of tokens valid in the initial position in α

To build FIRST(X):

1. If X ∈ Vt then FIRST(X) is {X}

2. If X → ε then add ε to FIRST(X).

3. If X → Y1 Y2 ... Yk:
   (a) Put FIRST(Y1) − {ε} in FIRST(X)
   (b) ∀i: 1 < i ≤ k, if ε ∈ FIRST(Y1) ∩ ... ∩ FIRST(Y(i-1)) (i.e., Y1 ... Y(i-1) ⇒* ε) then put FIRST(Yi) − {ε} in FIRST(X)
   (c) If ε ∈ FIRST(Y1) ∩ ... ∩ FIRST(Yk) then put ε in FIRST(X)

Repeat until no more additions can be made.

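The FIRST construction is a classic fixpoint computation. A compact Java sketch (the grammar encoding, with productions as string arrays and "" standing for ε, is an assumption for illustration):

    import java.util.*;

    class FirstSets {
        // A production [A, Y1, ..., Yk] encodes A → Y1 ... Yk; [A] alone encodes A → ε.
        static Map<String, Set<String>> compute(Set<String> terminals,
                                                List<String[]> productions) {
            Map<String, Set<String>> first = new HashMap<>();
            for (String t : terminals)                          // rule 1: FIRST(a) = {a}
                first.put(t, new HashSet<>(Set.of(t)));
            for (String[] p : productions)
                first.putIfAbsent(p[0], new HashSet<>());

            boolean changed = true;
            while (changed) {                                   // repeat until no additions
                changed = false;
                for (String[] p : productions) {
                    Set<String> fa = first.get(p[0]);
                    if (p.length == 1) { changed |= fa.add(""); continue; }   // rule 2
                    boolean allNullable = true;
                    for (int i = 1; i < p.length && allNullable; i++) {
                        Set<String> fy = first.computeIfAbsent(p[i], k -> new HashSet<>());
                        for (String a : fy)
                            if (!a.isEmpty()) changed |= fa.add(a);   // rules 3(a), 3(b)
                        allNullable = fy.contains("");   // proceed past nullable Yi only
                    }
                    if (allNullable) changed |= fa.add("");           // rule 3(c)
                }
            }
            return first;
        }
        // e.g. E → T E', E' → + T E' | ε, T → id gives
        // FIRST(E) = FIRST(T) = {id} and FIRST(E') = {+, ε}
    }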

    FOLLOW

For a non-terminal A, define FOLLOW(A) as

the set of terminals that can appear immediately to the right of A in some sentential form

Thus, a non-terminal's FOLLOW set specifies the tokens that can legally appear after it.

A terminal symbol has no FOLLOW set.

To build FOLLOW(A):

1. Put $ in FOLLOW(<goal>)

2. If A → αBβ:
   (a) Put FIRST(β) − {ε} in FOLLOW(B)
   (b) If β = ε (i.e., A → αB) or ε ∈ FIRST(β) (i.e., β ⇒* ε) then put FOLLOW(A) in FOLLOW(B)

Repeat until no more additions can be made


    LL(1) grammars

    Previous definition

A grammar G is LL(1) iff. for all non-terminals A, each distinct pair of productions A → β and A → γ satisfy the condition FIRST(β) ∩ FIRST(γ) = ∅.


What if A ⇒* ε?

Revised definition

A grammar G is LL(1) iff. for each set of productions A → β1 | β2 | ... | βn:

1. FIRST(β1), FIRST(β2), ..., FIRST(βn) are all pairwise disjoint

2. If βi ⇒* ε then FIRST(βj) ∩ FOLLOW(A) = ∅, ∀ 1 ≤ j ≤ n, i ≠ j.

If G is ε-free, condition 1 is sufficient.


    LL(1) grammars

    Provable facts about LL(1) grammars:

    1. No left-recursive grammar is LL(1)

2. No ambiguous grammar is LL(1)

3. Some languages have no LL(1) grammar

4. An ε-free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar.

Example

S → aS | a

is not LL(1) because FIRST(aS) = FIRST(a) = {a}

S → aS'
S' → aS' | ε

accepts the same language and is LL(1)


    LL(1) parse table construction

    Input: Grammar G

    Output: Parsing table M

Method:

1. ∀ productions A → α:
   (a) ∀ a ∈ FIRST(α), add A → α to M[A, a]
   (b) If ε ∈ FIRST(α):
       i.  ∀ b ∈ FOLLOW(A), add A → α to M[A, b]
       ii. If $ ∈ FOLLOW(A) then add A → α to M[A, $]

2. Set each undefined entry of M to error

If ∃ M[A, a] with multiple entries then grammar is not LL(1).

Note: recall a, b ∈ Vt, so a ≠ ε and b ≠ ε


    Example

    Our long-suffering expression grammar:

S  → E           T  → F T'
E  → T E'        T' → * F T'
E' → + T E'           | / F T'
    | - T E'          | ε
    | ε          F  → num
                      | id

      FIRST           FOLLOW
S     {num, id}       {$}
E     {num, id}       {$}
E'    {ε, +, -}       {$}
T     {num, id}       {+, -, $}
T'    {ε, *, /}       {+, -, $}
F     {num, id}       {+, -, *, /, $}

      num      id      +          -          *          /          $
S     S→E      S→E
E     E→TE'    E→TE'
E'                     E'→+TE'    E'→-TE'                          E'→ε
T     T→FT'    T→FT'
T'                     T'→ε       T'→ε       T'→*FT'    T'→/FT'    T'→ε
F     F→num    F→id


    A grammar that is not LL(1)

<stmt> ::= if <expr> then <stmt>
        |  if <expr> then <stmt> else <stmt>

Left-factored:

<stmt>  ::= if <expr> then <stmt> <stmt'>
<stmt'> ::= else <stmt>
         |  ε

Now, FIRST(<stmt'>) = {ε, else}

Also, FOLLOW(<stmt'>) = {else, $}

But, FIRST(<stmt'>) ∩ FOLLOW(<stmt'>) = {else} ≠ ∅

On seeing else, conflict between choosing

<stmt'> ::= else <stmt>   and   <stmt'> ::= ε

⇒ grammar is not LL(1)!

The fix:

Put priority on <stmt'> ::= else <stmt> to associate else with the closest previous then.


    Error recovery

    Key notion:

    For each non-terminal, construct a set of terminals on which the parser

    can synchronize


When an error occurs looking for A, scan until an element of SYNCH(A) is found

Building SYNCH:

1. a ∈ FOLLOW(A) ⇒ a ∈ SYNCH(A)

2. place keywords that start statements in SYNCH(A)

3. add symbols in FIRST(A) to SYNCH(A)

If we can't match a terminal on top of stack:

1. pop the terminal

2. print a message saying the terminal was inserted

3. continue the parse

(i.e., SYNCH(a) = Vt − {a})


    Chapter 4: LR Parsing


    Some definitions

    Recall

For a grammar G, with start symbol S, any string α such that S ⇒* α is called a sentential form

If α ∈ Vt*, then α is called a sentence in L(G)

Otherwise it is just a sentential form (not a sentence in L(G))

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

    A right-sentential form is a sentential form that occurs in the rightmost

    derivation of some sentence.


    Bottom-up parsing

    Goal:

    Given an input string w and a grammar G, construct a parse tree by


starting at the leaves and working to the root.

The parser repeatedly matches a right-sentential form from the language against the tree's upper frontier.

    At each match, it applies a reduction to build on the frontier:

each reduction matches an upper frontier of the partially built tree to the RHS of some production

    each reduction adds a node on top of the frontier

    The final result is a rightmost derivation, in reverse.


    Example

Consider the grammar

1  S → aABe
2  A → Abc
3    | b
4  B → d

and the input string abbcde

Prod'n  Sentential Form
–       abbcde
3       aAbcde
2       aAde
4       aABe
1       S

The trick appears to be scanning the input and finding valid sentential forms.


    Handles

    What are we trying to find?

A substring β of the tree's upper frontier that matches some production A → β, where reducing β to A is one step in the reverse of a rightmost derivation

    We call such a string a handle.

Formally:

a handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ

i.e., if S ⇒*rm αAw ⇒rm αβw then A → β in the position following α is a handle of αβw

Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols.


    Handles

[parse tree of αβw rooted at S, with the handle A → β at the frontier: A spans β, followed by the terminals w]


    Handles

    Theorem:

If G is unambiguous then every right-sentential form has a unique handle.

Proof: (by definition)

1. G is unambiguous ⇒ rightmost derivation is unique

2. ⇒ a unique production A → β applied to take γ(i-1) to γi

3. ⇒ a unique position k at which A → β is applied

4. ⇒ a unique handle A → β


    Handle-pruning

    The process to construct a bottom-up parse is called handle-pruning.

    To construct a rightmost derivation


S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γ(n-1) ⇒ γn = w

we set i to n and apply the following simple algorithm

for i = n down to 1:
1. find the handle A_i → β_i in γ_i
2. replace β_i with A_i to generate γ(i-1)

This takes 2n steps, where n is the length of the derivation


    Stack implementation

One scheme to implement a handle-pruning, bottom-up parser is called a shift-reduce parser.

    Shift-reduce parsers use a stack and an input buffer


1. initialize stack with $

2. Repeat until the top of the stack is the goal symbol and the input token is $

a) find the handle
   if we don't have a handle on top of the stack, shift an input symbol onto the stack

b) prune the handle
   if we have a handle A → β on the stack, reduce
   i)  pop |β| symbols off the stack
   ii) push A onto the stack


Example: back to x - 2 * y

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3         |  <expr> - <term>
4         |  <term>
5 <term> ::= <term> * <factor>
6         |  <term> / <factor>
7         |  <factor>
8 <factor> ::= num
9           |  id

Stack                          Input          Action
$                              id - num * id  shift
$ id                           - num * id     reduce 9
$ <factor>                     - num * id     reduce 7
$ <term>                       - num * id     reduce 4
$ <expr>                       - num * id     shift
$ <expr> -                     num * id       shift
$ <expr> - num                 * id           reduce 8
$ <expr> - <factor>            * id           reduce 7
$ <expr> - <term>              * id           shift
$ <expr> - <term> *            id             shift
$ <expr> - <term> * id                        reduce 9
$ <expr> - <term> * <factor>                  reduce 5
$ <expr> - <term>                             reduce 3
$ <expr>                                      reduce 1
$ <goal>                                      accept

1. Shift until top of stack is the right end of a handle
2. Find the left end of the handle and reduce

5 shifts + 9 reduces + 1 accept


    Shift-reduce parsing

    Shift-reduce parsers are simple to understand

A shift-reduce parser has just four canonical actions:

    1. shift next input symbol is shifted onto the top of the stack

    2. reduce right end of handle is on top of stack;

    locate left end of handle within the stack;

    pop handle off stack and push appropriate non-terminal LHS

    3. accept terminate parsing and signal success

    4. error call an error recovery routine

    The key problem: to recognize handles (not covered in this course).


LR(k) grammars

Informally, we say that a grammar G is LR(k) if, given a rightmost derivation

S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn = w

we can, for each right-sentential form in the derivation,


    1. isolate the handle of each right-sentential form, and

    2. determine the production by which to reduce

by scanning γi from left to right, going at most k symbols beyond the right end of the handle of γi.


LR(k) grammars

Formally, a grammar G is LR(k) iff.:

1. S ⇒*rm αAw ⇒rm αβw, and
2. S ⇒*rm γBx ⇒rm αβy, and
3. FIRSTk(w) = FIRSTk(y)

⇒ αAy = γBx

i.e., assume sentential forms αβw and αβy, with common prefix αβ and common k-symbol lookahead FIRSTk(y) = FIRSTk(w), such that αβw reduces to αAw and αβy reduces to γBx.

But, the common prefix means αβy also reduces to αAy, for the same result.

Thus αAy = γBx.


    Why study LR grammars?

    LR(1) grammars are often used to construct parsers.

    We call these parsers LR(1) parsers.

everyone's favorite parser


virtually all context-free programming language constructs can be expressed in an LR(1) form

LR grammars are the most general grammars parsable by a deterministic, bottom-up parser

efficient parsers can be implemented for LR(1) grammars

LR parsers detect an error as soon as possible in a left-to-right scan of the input

LR grammars describe a proper superset of the languages recognized by predictive (i.e., LL) parsers

LL(k): recognize use of a production A → β seeing first k symbols of β

LR(k): recognize occurrence of β (the handle) having seen all of what is derived from β plus k symbols of lookahead


    Left versus right recursion

    Right Recursion:

    needed for termination in predictive parsers

    requires more stack space


    right associative operators

    Left Recursion:

    works fine in bottom-up parsers

limits required stack space

left associative operators

    Rule of thumb:

    right recursion for top-down parsers

    left recursion for bottom-up parsers


    Parsing review

    Recursive descent

A hand-coded recursive descent parser directly encodes a grammar (typically an LL(1) grammar) into a series of mutually recursive procedures. It has most of the linguistic limitations of LL(1).

LL(k)

An LL(k) parser must be able to recognize the use of a production after seeing only the first k symbols of its right hand side.

LR(k)

An LR(k) parser must be able to recognize the occurrence of the right hand side of a production after having seen all that is derived from that right hand side with k symbols of lookahead.

    126


    Chapter 5: JavaCC and JTB


    The Java Compiler Compiler

    Can be thought of as Lex and Yacc for Java.

    It is based on LL(k) rather than LALR(1).


    Grammars are written in EBNF.

The Java Compiler Compiler transforms an EBNF grammar into an LL(k) parser.

    The JavaCC grammar can have embedded action code written in Java,

    just like a Yacc grammar can have embedded action code written in C.

The lookahead can be changed by writing LOOKAHEAD(...).

    The whole input is given in just one file (not two).


    The JavaCC input format


    One file:

    header

    token specifications for lexical analysis

    grammar


    The JavaCC input format

    Example of a token specification:
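The slide's code was lost in extraction; a plausible sketch of such a specification (token name assumed):

    TOKEN :
    {
      < INTEGER_LITERAL: ( ["1"-"9"] (["0"-"9"])* | "0" ) >
    }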


    Example of a production:
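Again a sketch, assuming productions Statement() and Expression() are defined elsewhere in the grammar:

    void StatementListReturn() :
    {}
    {
      ( Statement() )* "return" Expression() ";"
    }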



    Generating a parser with JavaCC
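The slide's commands were lost; a typical session might look like this (file names assumed; javacc generates the parser class named in the grammar's PARSER_BEGIN section):

    javacc Main.jj       (generates Main.java and supporting classes)
    javac Main.java      (compiles the generated parser)
    java Main < prog     (parses the input program)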


    The Visitor Pattern

For object-oriented programming, the Visitor pattern enables the definition of a new operation on an object structure without changing the classes of the objects.

    Gamma, Helm, Johnson, Vlissides:

    Design Patterns, 1995.


    Sneak Preview


    When using the Visitor pattern,

    the set of classes must be fixed in advance, and

    each class must have an accept method.


    First Approach: Instanceof and Type Casts

    The running Java example: summing an integer list.
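The class definitions were lost in extraction; a plausible reconstruction (the names Nil and Cons are assumed and reused in the sketches below):

    interface List {}

    class Nil implements List {}

    class Cons implements List {
        int head;
        List tail;
        Cons(int head, List tail) { this.head = head; this.tail = tail; }
    }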



    First Approach: Instanceof and Type Casts
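The summation code was also lost; a sketch in the instanceof style, assuming the List/Nil/Cons classes above (a fragment, to be placed in some enclosing class):

    static int sum(List l) {
        int sum = 0;
        while (l instanceof Cons) {        // test the class of the object
            sum = sum + ((Cons) l).head;   // type cast to reach the field
            l = ((Cons) l).tail;
        }
        return sum;                        // l is a Nil: end of list
    }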


Advantage: The code is written without touching the classes Nil and Cons.

Drawback: The code constantly uses type casts and instanceof to determine what class of object it is considering.


    Second Approach: Dedicated Methods

    The first approach is not object-oriented!

To access parts of an object, the classical approach is to use dedicated methods which both access and act on the subobjects.
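A sketch of the dedicated-method version (again with the assumed Nil/Cons names):

    interface List { int sum(); }

    class Nil implements List {
        public int sum() { return 0; }
    }

    class Cons implements List {
        int head;
        List tail;
        public int sum() { return head + tail.sum(); }  // access and act
    }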

We can now compute the sum of all components of a given List-object l by writing l.sum().


    Second Approach: Dedicated Methods


Advantage: The type casts and instanceof operations have disappeared, and the code can be written in a systematic way.

Disadvantage: For each new operation on List-objects, new dedicated methods have to be written, and all classes must be recompiled.


    Third Approach: The Visitor Pattern

    The Idea:

Divide the code into an object structure and a Visitor (akin to Functional Programming!)

Insert an accept method in each class. Each accept method takes a Visitor as argument.

A Visitor contains a visit method for each class (overloading!). A visit method for a class C takes an argument of type C.
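A sketch of the visitor version (once more with the assumed Nil/Cons names):

    interface List { void accept(Visitor v); }

    interface Visitor {
        void visit(Nil x);
        void visit(Cons x);
    }

    class Nil implements List {
        public void accept(Visitor v) { v.visit(this); }
    }

    class Cons implements List {
        int head;
        List tail;
        Cons(int head, List tail) { this.head = head; this.tail = tail; }
        public void accept(Visitor v) { v.visit(this); }
    }

    class SumVisitor implements Visitor {
        int sum = 0;
        public void visit(Nil x) {}
        public void visit(Cons x) {
            sum = sum + x.head;
            x.tail.accept(this);   // continue down the list
        }
    }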


    Third Approach: The Visitor Pattern

The purpose of the accept methods is to invoke the visit method in the Visitor which can handle the current object.



    Third Approach: The Visitor Pattern

The control flow goes back and forth between the visit methods in the Visitor and the accept methods in the object structure.


Notice: The visit methods describe both 1) actions, and 2) access of subobjects.
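For example, continuing the sketch above:

    List l = new Cons(5, new Cons(4, new Nil()));
    SumVisitor sv = new SumVisitor();
    l.accept(sv);                // accept/visit alternate down the list
    System.out.println(sv.sum);  // prints 9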


    Comparison

    The Visitor pattern combines the advantages of the two other approaches.

                               Frequent        Frequent
                               type casts?     recompilation?
    Instanceof and type casts  Yes             No
    Dedicated methods          No              Yes
    The Visitor pattern        No              No

    The advantage of Visitors: New methods without recompilation!

    Requirement for using Visitors: All classes must have an accept method.

    Tools that use the Visitor pattern:

JJTree (from Sun Microsystems) and the Java Tree Builder (from Purdue University), both frontends for The Java Compiler Compiler from Sun Microsystems.


    Visitors: Summary

Visitor makes adding new operations easy. Simply write a new visitor.

A visitor gathers related operations. It also separates unrelated ones.

Adding new classes to the object structure is hard. Key consideration: are you most likely to change the algorithm applied over an object structure, or are you most likely to change the classes of objects that make up the structure?

    Visitors can accumulate state.

Visitor can break encapsulation. The Visitor approach assumes that the interface of the data structure classes is powerful enough to let visitors do their job. As a result, the pattern often forces you to provide public operations that access internal state, which may compromise its encapsulation.


    The Java Tree Builder

    The Java Tree Builder (JTB) has been developed here at Purdue in my

    group.

    JTB is a frontend for The Java Compiler Compiler.


JTB supports the building of syntax trees which can be traversed using visitors.

    JTB transforms a bare JavaCC grammar into three components:

    a JavaCC grammar with embedded Java code for building a syntax

    tree;

    one class for every form of syntax tree node; and

    a default visitor which can do a depth-first traversal of a syntax tree.


    The Java Tree Builder

The produced JavaCC grammar can then be processed by the Java Compiler Compiler to give a parser which produces syntax trees.

The produced syntax trees can now be traversed by a Java program by writing subclasses of the default visitor.

    JavaCC grammar --> [JTB] --> JavaCC grammar with embedded Java code
                   --> [Java Compiler Compiler] --> Parser

    The Parser, run on a Program, builds a syntax tree with accept methods.
    JTB also emits the syntax-tree-node classes and a default visitor.


    Using JTB
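The slide's commands were lost; a typical session might look like this (grammar file name and jtb launcher assumed; jtb.out.jj is JTB's default output name):

    jtb Main.jj          (generates jtb.out.jj plus node classes and visitors)
    javacc jtb.out.jj    (generates a tree-building parser)
    javac Main.java
    java Main < prog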



    Example (simplified)

    For example, consider the Java 1.1 production
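The production shown was lost in extraction; the examples below assume it was the assignment production, roughly:

    void Assignment() :
    {}
    {
      PrimaryExpression() AssignmentOperator() Expression()
    }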


    JTB produces:
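A sketch of the transformed production (JTB's actual output differs in details):

    Assignment Assignment() :
    {
      PrimaryExpression f0;
      AssignmentOperator f1;
      Expression f2;
    }
    {
      f0 = PrimaryExpression()
      f1 = AssignmentOperator()
      f2 = Expression()
      { return new Assignment(f0, f1, f2); }
    }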

Notice that the production now returns a syntax tree represented as an Assignment object.


Example (simplified)

JTB produces a syntax-tree-node class for Assignment:
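A sketch of such a node class (field and visitor names assumed):

    public class Assignment implements Node {
        public PrimaryExpression f0;
        public AssignmentOperator f1;
        public Expression f2;

        public Assignment(PrimaryExpression n0, AssignmentOperator n1,
                          Expression n2) {
            f0 = n0; f1 = n1; f2 = n2;
        }

        public void accept(Visitor v) { v.visit(this); }
    }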


Notice the accept method; it invokes the visit method for Assignment in the default visitor.


Example (simplified)

The default visitor looks like this:
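A sketch of the relevant part (one visit method per node class):

    public class DepthFirstVisitor implements Visitor {
        // ... one method per syntax-tree-node class ...
        public void visit(Assignment n) {
            n.f0.accept(this);
            n.f1.accept(this);
            n.f2.accept(this);
        }
    }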


Notice the body of the visit method, which visits each of the three subtrees of the Assignment node.


Example (simplified)

Here is an example of a program which operates on syntax trees for Java 1.1 programs. The program prints the right-hand side of every assignment.

    The entire program is six lines:
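The program itself was lost; a sketch of what it plausibly looked like, assuming a pretty-printing visitor named TreeDumper:

    public class VPrinter extends DepthFirstVisitor {
        public void visit(Assignment n) {
            n.f2.accept(new TreeDumper());  // print the right-hand side
        }
    }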


When this visitor is passed to the root of the syntax tree, the depth-first traversal will begin, and when Assignment nodes are reached, the overriding visit method is executed.

Notice the use of the tree-dumping visitor (TreeDumper in the sketch above); it is a visitor which pretty prints Java 1.1 programs.

    JTB is bootstrapped.



    Chapter 6: Semantic Analysis


    Semantic Analysis

    The compilation process is driven by the syntactic structure of the program

    as discovered by the parser

    Semantic routines:

    interpret meaning of the program based on its syntactic structure


two purposes:

    finish analysis by deriving context-sensitive information

    begin synthesis by generating the IR or target code

associated with individual productions of a context-free grammar or subtrees of a syntax tree

    Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this workfor personal or classroom use is granted without fee provided that copies are not made or distributed forprofit or commercial advantage and that copies bear this notice and full citation on the first page. To copyotherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/orfee. Request permission to publish from [email protected].


    Context-sensitive analysis

    What context-sensitive questions might the compiler ask?

1. Is a name a scalar, an array, or a function?

2. Is a name declared before it is used?

3. Are any names declared but not used?

4. Which declaration of a name does each reference use?

5. Is an expression type-consistent?

6. Does the dimension of a reference match the declaration?

7. Where can a variable be stored? (heap, stack, ...)

8. Does a pointer reference the result of a malloc()?

9. Is a variable defined before it is used?

10. Is an array reference in bounds?

11. Does a function produce a constant value?

12. Can a function be implemented as a memo-function?

These cannot be answered with a context-free grammar

    Context-sensitive analysis

    Why is context-sensitive analysis hard?

    answers depend on values, not syntax

    questions and answers involve non-local information


    answers may involve computation

Several alternatives:

    abstract syntax tree     specify non-local computations
    (attribute grammars)     automatic evaluators

    symbol tables            central store for facts
                             express checking code

    language design          simplify language
                             avoid problems

    Symbol tables

For compile-time efficiency, compilers often use a symbol table:

    associates lexical names (symbols) with their attributes

    What items should be entered?


    variable names

    defined constants

    procedure and function names

    literal constants and strings

    source text labels

compiler-generated temporaries (we'll get there)

    Separate table for structure layouts (types) (field offsets and lengths)

    A symbol table is a compile-time structure


    Symbol table information

    What kind of information might the compiler need?

    textual name

    data type

    dimension information (for aggregates)


    declaring procedure

    lexical level of declaration

    storage class (base address)

    offset in storage

    if record, pointer to structure table

    if parameter, by-reference or by-value?

    can it be aliased? to what other names?

    number and type of arguments to functions


    Nested scopes: block-structured symbol tables

    What information is needed?

    when we ask about a name, we want the most recent declaration

    the declaration may be from the current scope or some enclosing

    scope

    innermost scope overrides declarations from outer scopes


    Key point: new declarations (usually) occur only in current scope

What operations do we need? (operation names here are illustrative)

    insert(key, value)   binds key to value
    lookup(key)          returns value bound to key
    beginScope()         remembers current state of table
    endScope()           restores table to state at most recent scope that
                         has not been ended

    May need to preserve list of locals for the debugger
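A minimal sketch of such a table in Java (a hash map of binding stacks plus an undo stack; names are illustrative, not the course API):

    import java.util.*;

    class SymbolTable {
        private final Map<String, Deque<Object>> table = new HashMap<>();
        private final Deque<String> undo = new ArrayDeque<>();
        private final Deque<Integer> marks = new ArrayDeque<>();

        void insert(String key, Object value) {      // bind key to value
            table.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
            undo.push(key);
        }

        Object lookup(String key) {                  // most recent binding wins
            Deque<Object> d = table.get(key);
            return (d == null || d.isEmpty()) ? null : d.peek();
        }

        void beginScope() { marks.push(undo.size()); }

        void endScope() {                            // pop bindings made in scope
            for (int m = marks.pop(); undo.size() > m; )
                table.get(undo.pop()).pop();
        }
    }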


    Attribute information

    Attributes are internal representation of declarations

    Symbol table associates names with attributes


    Names may have different attributes depending on their meaning:

    variables: type, procedure level, frame offset

    types: type descriptor, data size/alignment

    constants: type, value

procedures: formals (names/types), result type, block information (local decls.), frame size


    Type expressions

    Type expressions are a textual representation for types:

    1. basic types: boolean, char, integer, real, etc.

    2. type names

3. constructed types (constructors applied to type expressions):

   (a) array(I, T) denotes an array of elements of type T with index type I
       e.g., array(1..10, integer)

   (b) T1 × T2 denotes the Cartesian product of type expressions T1 and T2

   (c) records: like products, but fields have names
       e.g., record((a × integer), (b × real))

   (d) pointer(T) denotes the type pointer to object of type T

   (e) D → R denotes the type of a function mapping domain D to range R
       e.g., integer × integer → integer

Type descriptors

Type descriptors are compile-time structures representing type expressions,
e.g., char × char → pointer(integer):

              →
            /   \
           ×     pointer
          / \       |
      char   char  integer

or, as a DAG in which the two occurrences of char share a single leaf node.

    Type compatibility

    Type checking needs to determine type equivalence

    Two approaches:

    Name equivalence: each type name is a distinct type

Structural equivalence: two types are equivalent iff. they have the same structure (after substituting type expressions for type names):

    s ≡ t                          iff. s and t are the same basic types
    array(s1, s2) ≡ array(t1, t2)  iff. s1 ≡ t1 and s2 ≡ t2
    s1 × s2 ≡ t1 × t2              iff. s1 ≡ t1 and s2 ≡ t2
    pointer(s) ≡ pointer(t)        iff. s ≡ t
    s1 → s2 ≡ t1 → t2              iff. s1 ≡ t1 and s2 ≡ t2

    Type compatibility: example

    Consider:
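The declarations were lost in extraction; judging from the type graph two slides on, they were presumably along these lines:

    type link = ^cell;
    var  next : link;
         last : link;
         p    : ^cell;
         q, r : ^cell;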


Under name equivalence:

    next and last have the same type

    p, q and r have the same type

    p and next have different type

    Under structural equivalence all variables have the same type

Ada/Pascal/Modula-2/Tiger are somewhat confusing: they treat distinct type definitions as distinct types, so p has a different type from q and r.


    Type compatibility: Pascal-style name equivalence

    Build compile-time structure called a type graph:

    each constructor or basic type creates a node

each name creates a leaf (associated with the type's descriptor)


    next, last --> link = pointer --> cell
    p          --> pointer --> cell    (a pointer node of its own)
    q, r       --> pointer --> cell    (another pointer node, shared by q and r)

Type expressions are equivalent if they are represented by the same node in the graph


    Type compatibility: recursive types

    Consider:
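The declarations were lost; from the diagrams below they were presumably:

    type link = ^cell;
         cell = record
                  info : integer;
                  next : link;
                end;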

    We may want to eliminate the names from the type graph


Eliminating the name link from the type graph for the record:

    cell = record
             info --> integer
             next --> pointer --> cell


    Type compatibility: recursive types

Allowing cycles in the type graph eliminates the name cell as well:


    record
      info --> integer
      next --> pointer, whose target edge cycles back to the record node itself



    Chapter 7: Translation and Simplification


Tiger IR trees: Expressions

CONST(i)    Integer constant i

NAME(n)     Symbolic constant n [a code label]

TEMP(t)     Temporary t [one of any number of "registers"]

BINOP(o, e1, e2)
            Application of binary operator o:
                PLUS, MINUS, MUL, DIV
                AND, OR, XOR [bitwise logical]
                LSHIFT, RSHIFT [logical shifts]
                ARSHIFT [arithmetic right-shift]
            to integer operands e1 (evaluated first) and e2 (evaluated second)

MEM(e)      Contents of a word of memory starting at address e

CALL(f, e1, ..., en)
            Procedure call; expression f is evaluated before arguments e1, ..., en

ESEQ(s, e)  Expression sequence; evaluate s for side-effects, then e for result

    Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this workfor personal or classroom use is granted without fee provided that copies are not made or distributed forprofit or commercial advantage and that copies bear this notice and full citation on the first page. To copyotherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/orfee. Request permission to publish from [email protected].


Tiger IR trees: Statements

MOVE(TEMP t, e)
            Evaluate e into temporary t

MOVE(MEM(e1), e2)
            Evaluate e1 yielding address a, then e2 into the word at a

EXP(e)      Evaluate e and discard the result

JUMP(e, [l1, ..., ln])
            Transfer control to address e; l1, ..., ln are all possible values for e

CJUMP(o, e1, e2, t, f)
            Evaluate e1 then e2, yielding a and b, respectively; compare a with b
            using relational operator o:
                EQ, NE [signed and unsigned integers]
                LT, GT, LE, GE [signed]
                ULT, ULE, UGT, UGE [unsigned]
            jump to t if true, f if false

SEQ(s1, s2) Statement s1 followed by s2

LABEL(n)    Define constant value of name n as current code address; NAME(n)
            can be used as target of jumps, calls, etc.

    Kinds of expressions

    Expression kinds indicate how expression might be used

    Ex(exp) expressions that compute a value

    Nx(stm) statements: expressions that compute no value

    Cx conditionals (jump to true and false destinations)


    RelCx(op, left, right)

    IfThenElseExp expression/statement depending on use

    Conversion operators allow use of one form in context of another:

    unEx convert to tree expression that computes value of inner tree

    unNx convert to tree statement that computes inner tree but returns no

    value

unCx(t, f) convert to statement that evaluates inner tree and branches to true destination if non-zero, false destination otherwise
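A sketch of how the conversion operators can be organized in Java, in the style of Appel's Tiger framework (class and constant names assumed):

    abstract class Exp {
        abstract Tree.Exp unEx();                 // as a value
        abstract Tree.Stm unNx();                 // for effect only
        abstract Tree.Stm unCx(Label t, Label f); // as a conditional branch
    }

    class Ex extends Exp {
        Tree.Exp exp;
        Ex(Tree.Exp e) { exp = e; }
        Tree.Exp unEx() { return exp; }
        Tree.Stm unNx() { return new Tree.EXP(exp); }
        Tree.Stm unCx(Label t, Label f) {
            // branch to t if exp evaluates non-zero, else to f
            return new Tree.CJUMP(Tree.CJUMP.NE, exp, new Tree.CONST(0), t, f);
        }
    }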


Translating Tiger

Simple variables: fetch with a MEM:

    Ex(MEM(+(TEMP fp, CONST k)))

where fp is the home frame of the variable, found by following static links, and k is the offset of the variable in that level.

Tiger array variables: Tiger arrays are pointers to the array base, so fetch with a MEM like any other variable:

    Ex(MEM(+(TEMP fp, CONST k)))

Thus, for e[i]:

    Ex(MEM(+(e.unEx, ×(i.unEx, CONST w))))

where i is the index expression and w is the word size; all values are word-sized (scalar) in Tiger.

Note: must first check that the array index i is in bounds (0 ≤ i < size of e); the runtime will put the size in the word preceding the array base.

Translating Tiger

Tiger record variables: Again, records are pointers to the record base, so fetch like other variables. For e.f:

    Ex(MEM(+(e.unEx, CONST o)))

where o is the byte offset of the field f in the record.
Note: must check that the record pointer is non-nil (i.e., non-zero).

String literals: Statically allocated, so just use the string's label:

    Ex(NAME(label))

where the literal will be emitted with the assembled code.

Record creation, {f1 = e1; f2 = e2; ...; fn = en}, in the (preferably GC'd) heap: first allocate the space, then initialize it:

    Ex(ESEQ(SEQ(MOVE(TEMP r, externalCall("allocRecord", [CONST n])),
            SEQ(MOVE(MEM(TEMP r), e1.unEx),
            SEQ(...,
                MOVE(MEM(+(TEMP r, CONST (n−1)·w)), en.unEx)))),
        TEMP r))

where w is the word size.

Array creation, t[e1] of e2:

    Ex(externalCall("initArray", [e1.unEx, e2.unEx]))

    Control structures

Basic blocks:

    a sequence of straight-line code

    if one instruction executes then they all execute

    a maximal sequence of instructions without branches

    a label starts a new basic block

Overview of control structure translation:

    control flow links up the basic blocks

    ideas are simple

    implementation requires bookkeeping

    some care is needed for good code


while loops

while c do s:

1. evaluate c
2. if false jump to next statement after loop
3. if true fall into loop body
4. branch to top of loop

e.g.,

    test:
          if not(c) jump done
          s
          jump test
    done:

Nx( SEQ(SEQ(SEQ(LABEL test, c.unCx(body, done)),
        SEQ(SEQ(LABEL body, s.unNx), JUMP(NAME test))),
    LABEL done))

repeat e1 until e2: evaluate/compare/branch at bottom of loop

for loops

for i := e1 to e2 do s:

1. evaluate lower bound into index variable
2. evaluate upper bound into limit variable
3. if index > limit jump to next statement after loop
4. fall through to loop body
5. increment index
6. if index <= limit jump to top of loop body

    t1 <- e1
    t2 <- e2
    if t1 > t2 jump done
    body:
          s
          t1 <- t1 + 1
          if t1 <= t2 jump body
    done:

For break statements:

    when translating a loop, push the done label on some stack

    break simply jumps to the label on top of the stack

    when done translating the loop and its body, pop the label

Function calls

f(e1, ..., en):

    Ex(CALL(NAME label_f, [sl, e1, ..., en]))

where sl is the static link for the callee f, found by following n static links from the caller, n being the difference between the levels of the caller and the callee.

Comparisons

Translate a op b as:

    RelCx(op, a.unEx, b.unEx)

When used as a conditional, unCx(t, f) yields:

    CJUMP(op, a.unEx, b.unEx, t, f)

where t and f are labels.

When used as a value, unEx yields:

    ESEQ(SEQ(MOVE(TEMP r, CONST 1),
         SEQ(unCx(t, f),
         SEQ(LABEL f,
         SEQ(MOVE(TEMP r, CONST 0), LABEL t)))),
    TEMP r)

Conditionals

The short-circuiting Boolean operators have already been transformed into if-expressions in Tiger abstract syntax:

e.g., x < 5 & a > b turns into if x < 5 then a > b else 0

Translate if e1 then e2 else e3 into: IfThenElseExp(e1, e2, e3)

When used as a value, unEx yields:

    ESEQ(SEQ(SEQ(e1.unCx(t, f),
             SEQ(SEQ(LABEL t,
                     SEQ(MOVE(TEMP r, e2.unEx),
                         JUMP join)),
                 SEQ(LABEL f,
                     SEQ(MOVE(TEMP r, e3.unEx),
                         JUMP join)))),
         LABEL join),
    TEMP r)

As a conditional, unCx(t, f) yields:

    SEQ(e1.unCx(tt, ff),
        SEQ(SEQ(LABEL tt, e2.unCx(t, f)),
            SEQ(LABEL ff, e3.unCx(t, f))))

Conditionals: Example

Applying unCx(t, f) to if x < 5 then a > b else 0:

    SEQ(CJUMP(LT, x.unEx, CONST 5, tt, ff),
        SEQ(SEQ(LABEL tt, CJUMP(GT, a.unEx, b.unEx, t, f)),
            SEQ(LABEL ff, JUMP f)))

or more optimally:

    SEQ(CJUMP(LT, x.unEx, CONST 5, tt, f),
        SEQ(LABEL tt, CJUMP(GT, a.unEx, b.unEx, t, f)))

One-dimensional fixed arrays

An array declared with lower bound 2 (e.g., var a : ARRAY [2..5] of integer), subscripted as a[e], translates to:

    MEM(+(TEMP fp, +(CONST k − 2·w, ×(CONST w, e.unEx))))

where k is the offset of the static array from fp and w is the word size.

In Pascal, multidimensional arrays are treated as arrays of arrays, so A[i,j] is equivalent to A[i][j], and can be translated as above.

    Multidimensional arrays

    Array allocation:

    constant bounds
        allocate in static area, stack, or heap
        no run-time descriptor is needed

    dynamic arrays: bounds fixed at run-time
        allocate in stack or heap
        descriptor is needed

    dynamic arrays: bounds can change at run-time
        allocate in heap
        descriptor is needed

Multidimensional arrays

Array layout:

    Contiguous:

    1. Row major
       Rightmost subscript varies most quickly:
           A[1,1], A[1,2], ..., A[2,1], A[2,2], ...
       Used in PL/1, Algol, Pascal, C, Ada, Modula-3

    2. Column major
       Leftmost subscript varies most quickly:
           A[1,1], A[2,1], ..., A[1,2], A[2,2], ...
       Used in FORTRAN

    By vectors
       Contiguous vector of pointers to (non-contiguous) subarrays

Multi-dimensional arrays: row-major layout

number of elements in dimension j:  D_j = U_j − L_j + 1

position of A[i_1, ..., i_n]:

    (i_n − L_n)
    + (i_{n−1} − L_{n−1})·D_n
    + (i_{n−2} − L_{n−2})·D_n·D_{n−1}
    + ...
    + (i_1 − L_1)·D_n···D_2

which can be rewritten as

    variable part:  i_1·D_2···D_n + i_2·D_3···D_n + ... + i_{n−1}·D_n + i_n

    constant part:  L_1·D_2···D_n + L_2·D_3···D_n + ... + L_{n−1}·D_n + L_n

address of A[i_1, ..., i_n]:

    address(A) + ((variable part − constant part) × element size)
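As a quick illustration (numbers chosen here, not from the notes): for A : array[1..3, 1..4] with one-word (4-byte) elements, D_2 = 4; for A[2,3] the variable part is 2·4 + 3 = 11 and the constant part is 1·4 + 1 = 5, so A[2,3] lives at address(A) + (11 − 5)·4 = address(A) + 24.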

