CS 152: Programming Language Paradigms
April 2 Class Meeting
Department of Computer Science, San Jose State University
Spring 2014
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
SJSU Dept. of Computer Science, Spring 2014: April 2
CS 152: Programming Language Paradigms © R. Mak
Introduction to Compilers and Interpreters
In this class, you will use Java to write an interpreter for the Scheme language.
You will be able to execute simple Scheme programs.
A Compiler is a Translator
A compiler translates a program you’ve written
... in a high-level language (C, C++, Java, Pascal, etc.)
... into a low-level language (assembly language or machine language)
... that a computer can understand and eventually execute.
Conceptual Design (Version 1)
Parser: Controls the translation process. Repeatedly asks the scanner for the next token.
Scanner: Repeatedly reads characters from the source to construct tokens for the parser.
Token: A source language element: an identifier (name), a number, a special symbol (+ - * / = etc.), or a reserved word. A token also reads from the source.
Source: The source program.
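To make this division of labor concrete, here is a minimal Java sketch of the Source component. It assumes the whole source program is already held in a String. The class name Source, its methods currentChar() and nextChar(), and the EOF sentinel are illustrative assumptions, not the course's code. A scanner would call nextChar() each time it needs another character.

```java
// Hypothetical Source class: delivers the source program one character at a time.
class Source {
    static final char EOF = (char) 0;   // assumed sentinel meaning "no more characters"
    private final String text;
    private int pos = 0;

    Source(String text) { this.text = text; }

    // The character at the current position, or EOF past the end.
    char currentChar() {
        return pos < text.length() ? text.charAt(pos) : EOF;
    }

    // Advance one position and return the new current character.
    char nextChar() {
        if (pos < text.length()) pos++;
        return currentChar();
    }
}

class SourceDemo {
    public static void main(String[] args) {
        Source src = new Source("(+ 1 2)");
        StringBuilder sb = new StringBuilder();
        for (char c = src.currentChar(); c != Source.EOF; c = src.nextChar()) {
            sb.append('[').append(c).append(']');
        }
        System.out.println(sb);   // [(][+][ ][1][ ][2][)]
    }
}
```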
Token
A low-level element of the source language. AKA lexeme
Java language tokens:
identifiers: names of variables, types, procedures, functions, enumeration values, etc.
numbers: integer and real (floating-point)
reserved words: class interface if else for while etc.
special symbols: + - * / = < <= == >= > >>= <<= . , : ( ) [ ] { } ' "
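The following is a minimal sketch of how a scanner might label Java tokens with the categories above. It assumes each token string has already been extracted. The enum TokenType, the class JavaTokenClassifier, and the extra ERROR category are hypothetical. The reserved-word set holds only the words named on the slide.

```java
import java.util.Set;

// Token categories from the slide, plus ERROR for strings that fit none of them.
enum TokenType { IDENTIFIER, NUMBER, RESERVED_WORD, SPECIAL_SYMBOL, ERROR }

class JavaTokenClassifier {
    // Only the reserved words listed on the slide; real Java has many more.
    static final Set<String> RESERVED_WORDS =
        Set.of("class", "interface", "if", "else", "for", "while");

    // The special symbols listed on the slide.
    static final Set<String> SPECIAL_SYMBOLS = Set.of(
        "+", "-", "*", "/", "=", "<", "<=", "==", ">=", ">", ">>=", "<<=",
        ".", ",", ":", "(", ")", "[", "]", "{", "}", "'", "\"");

    // Classifies one already-extracted token string.
    static TokenType classify(String s) {
        if (s.isEmpty()) return TokenType.ERROR;
        char first = s.charAt(0);
        if (Character.isDigit(first)) {
            // integer (42) or real (3.14)
            return s.matches("[0-9]+(\\.[0-9]+)?") ? TokenType.NUMBER : TokenType.ERROR;
        }
        if (Character.isLetter(first) || first == '_' || first == '$') {
            boolean valid = s.chars().allMatch(c -> Character.isLetterOrDigit(c) || c == '_' || c == '$');
            if (!valid) return TokenType.ERROR;
            return RESERVED_WORDS.contains(s) ? TokenType.RESERVED_WORD : TokenType.IDENTIFIER;
        }
        return SPECIAL_SYMBOLS.contains(s) ? TokenType.SPECIAL_SYMBOL : TokenType.ERROR;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"while", "count", "3.14", ">>=", "@"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```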
Parser
Controls the translation process. Repeatedly asks the scanner for the next token.
Knows the syntax (“grammar”) of the source language’s statements and expressions.
Analyzes the sequence of tokens to determine what kind of statement or expression it is translating.
Verifies that what it’s seeing is syntactically correct.
Flags any syntax errors that it finds and attempts to recover from them.
What the parser does is called parsing. It parses the source program in order to translate it. AKA syntax analyzer
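The sketch below is a hedged illustration of a parser that repeatedly asks for the next token, verifies syntax, flags errors, and recovers. It checks only parenthesis structure. The class ParenParser is hypothetical, and a java.util.Iterator stands in for the scanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical parser that checks only the parenthesis structure of its input.
// The Iterator plays the scanner: each next() call "asks for the next token".
class ParenParser {
    final List<String> errors = new ArrayList<>();

    // Returns the number of complete top-level lists; syntax errors are recorded in errors.
    int parse(Iterator<String> scanner) {
        int depth = 0, lists = 0, index = 0;
        while (scanner.hasNext()) {
            String token = scanner.next();
            if (token.equals("(")) {
                depth++;
            } else if (token.equals(")")) {
                if (depth == 0) {
                    // Flag the error, then recover by ignoring the stray ")".
                    errors.add("Unexpected ) at token " + index);
                } else if (--depth == 0) {
                    lists++;   // a top-level list just closed
                }
            }
            index++;
        }
        if (depth > 0) errors.add("Missing " + depth + " closing parenthesis(es)");
        return lists;
    }

    public static void main(String[] args) {
        ParenParser p = new ParenParser();
        int n = p.parse(List.of(")", "(", "define", "x", "1", ")").iterator());
        System.out.println(n + " list(s), errors: " + p.errors);
    }
}
```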
Scanner
Whenever requested by the parser, the scanner reads characters sequentially from the source in order to construct the next token.
It knows the syntax of the source language’s tokens.
What the scanner does is called scanning. It scans the source program in order to extract tokens. AKA lexical analyzer
Conceptual Design (Version 2)
We can architect a compiler with three major parts:
Major Parts of a Compiler
Front end: Parser, Scanner, Source, Token
Intermediate tier:
Intermediate code (icode): A “predigested” form of the source code that the back end can process efficiently. Example: parse trees. AKA intermediate representation (IR)
Symbol table (symtab): Stores information about the symbols (such as the identifiers) contained in the source program.
Back end:
Code generator: Processes the icode and the symtab in order to generate the object code.
Only the front end needs to be source language-specific.
The intermediate tier and the back end can be language-independent!
What Else Can Compilers Do?
Allow you to program in a high-level language and think about your algorithms, not about machine architecture.
Provide language portability. You can run your C++ and Java programs on different machines because their compilers enforce language standards.
Can optimize and improve how your programs execute:
Optimize the object code for speed.
Optimize the object code for size.
Optimize the object code for power consumption.
What about Interpreters?
An interpreter executes a source program instead of generating object code.
It executes a source program using the intermediate code and the symbol table.
It shares many of the components of a compiler.
Instead of a code generator in the back end, it has an executor.
Conceptual Design (Version 3)
A compiler and an interpreter can both use the same front end and intermediate tier.
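Here is a minimal sketch of how one front end's output can feed either back end. It assumes a tiny arithmetic icode. ICodeNode, Backend, CodeGenerator, Executor, and the PUSH/LOAD/ADD/MUL instruction names are all hypothetical. The symbol table simply maps each variable to a value, which is a simplification; a real code generator would use the symtab for things like storage addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical icode: a binary expression tree. Interior nodes hold "+" or "*";
// leaves hold an integer literal or a variable name.
class ICodeNode {
    final String value;
    final ICodeNode left, right;   // both null for a leaf

    ICodeNode(String value, ICodeNode left, ICodeNode right) {
        this.value = value; this.left = left; this.right = right;
    }
    ICodeNode(String value) { this(value, null, null); }

    boolean isLeaf() { return left == null && right == null; }
    boolean isNumber() { return Character.isDigit(value.charAt(0)); }
}

// Both back ends consume the same icode and symtab.
interface Backend<R> {
    R process(ICodeNode icode, Map<String, Integer> symtab);
}

// Compiler back end: generates instructions for an imaginary stack machine.
class CodeGenerator implements Backend<List<String>> {
    public List<String> process(ICodeNode icode, Map<String, Integer> symtab) {
        List<String> code = new ArrayList<>();
        emit(icode, code);   // symtab is not needed by this simplified generator
        return code;
    }
    private void emit(ICodeNode n, List<String> code) {
        if (n.isLeaf()) {
            code.add((n.isNumber() ? "PUSH " : "LOAD ") + n.value);
        } else {
            emit(n.left, code);
            emit(n.right, code);
            code.add(n.value.equals("+") ? "ADD" : "MUL");
        }
    }
}

// Interpreter back end: executes the icode directly, reading variable values from the symtab.
class Executor implements Backend<Integer> {
    public Integer process(ICodeNode icode, Map<String, Integer> symtab) {
        return eval(icode, symtab);
    }
    private int eval(ICodeNode n, Map<String, Integer> symtab) {
        if (n.isLeaf()) {
            return n.isNumber() ? Integer.parseInt(n.value) : symtab.get(n.value);
        }
        int a = eval(n.left, symtab), b = eval(n.right, symtab);
        return n.value.equals("+") ? a + b : a * b;
    }
}

class BackendDemo {
    public static void main(String[] args) {
        // x * (y + 2), as a shared front end might produce it
        ICodeNode tree = new ICodeNode("*", new ICodeNode("x"),
                new ICodeNode("+", new ICodeNode("y"), new ICodeNode("2")));
        Map<String, Integer> symtab = Map.of("x", 3, "y", 4);
        System.out.println(new CodeGenerator().process(tree, symtab)); // [LOAD x, LOAD y, PUSH 2, ADD, MUL]
        System.out.println(new Executor().process(tree, symtab));      // 18
    }
}
```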
Comparing Compilers and Interpreters
A compiler generates object code, but an interpreter does not.
Executing the source program from object code can be several orders of magnitude faster than executing the program by interpreting the intermediate code and the symbol table.
But an interpreter requires less effort to get a source program to execute, which means faster turnaround time.
An interpreter maintains control of the source program’s execution.
Interpreters often come with interactive source-level debuggers that allow you to refer to source program elements, such as variable names.
AKA symbolic debugger
Compilers and Interpreters
Therefore ...
Interpreters are useful during program development.
Compilers are useful to run released programs in a production environment.
How to Scan for Pascal Tokens
Suppose the source line contains IF (index >= 10) THEN
The scanner skips over the leading blanks. The current character is I, so the next token must be a word.
The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. The scanner determines that the word is a reserved word.
How to Scan for Pascal Tokens, cont’d
The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol.
After extracting the special symbol token, the current character is i. The next token must be a word.
After extracting the word token, the current character is a blank.
How to Scan for Pascal Tokens, cont’d
Skip the blank. The current character is >.
Extract the special symbol token >=. The current character is a blank.
Skip the blank. The current character is 1, so the next token must be a number.
After extracting the number token, the current character is ).
How to Scan for Pascal Tokens, cont’d
Extract the special symbol token. The current character is a blank.
Skip the blank. The current character is T, so the next token must be a word.
Extract the word token. Determine that it’s a reserved word.
The current character is \n, so the scanner is done with this line.
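The walk-through above can be sketched in Java as follows. This is a hypothetical scanner for a tiny subset of Pascal. The class PascalScanner, its token labels, and the short reserved-word list are assumptions chosen just to handle the example line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical scanner for a tiny subset of Pascal.
class PascalScanner {
    // A few Pascal reserved words, enough for the example.
    static final Set<String> RESERVED = Set.of("IF", "THEN", "ELSE", "BEGIN", "END", "WHILE", "DO");
    // Two-character special symbols, tried before single characters.
    static final Set<String> TWO_CHAR = Set.of(">=", "<=", "<>", ":=", "..");

    static List<String> scan(String line) {
        List<String> tokens = new ArrayList<>();
        int i = 0;   // index of the current character
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {              // skip blanks between tokens
                i++;
            } else if (Character.isLetter(c)) {           // word: a letter, then letters or digits
                int start = i;
                while (i < line.length() && Character.isLetterOrDigit(line.charAt(i))) i++;
                String word = line.substring(start, i);
                // Pascal is case-insensitive, so compare reserved words in upper case.
                tokens.add((RESERVED.contains(word.toUpperCase()) ? "RESERVED " : "IDENTIFIER ") + word);
            } else if (Character.isDigit(c)) {            // number: unsigned integer only
                int start = i;
                while (i < line.length() && Character.isDigit(line.charAt(i))) i++;
                tokens.add("NUMBER " + line.substring(start, i));
            } else {                                      // special symbol, longest match first
                String two = i + 1 < line.length() ? line.substring(i, i + 2) : "";
                String sym = TWO_CHAR.contains(two) ? two : String.valueOf(c);
                tokens.add("SYMBOL " + sym);
                i += sym.length();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        scan("   IF (index >= 10) THEN").forEach(System.out::println);
    }
}
```

After each token is extracted, i already points at the first character after it, which matches the “current character” in the walk-through.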
Basic Scanning Algorithm
Skip any blanks until the current character is nonblank.
Treat each comment and the end-of-line character as a blank.
The current (nonblank) character determines what the next token is and becomes that token’s first character.
Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token.
Extracting a token consumes all the source characters that constitute the token. After extracting a token, the current character is the first character after the last character of that token.
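The next sketch, offered tentatively, covers the first step of the algorithm: it treats comments and end-of-line characters as blanks. It assumes Scheme-style comments that start with ; and run to the end of the line (block comments are not handled). The class CommentSkipper and its method are hypothetical.

```java
class CommentSkipper {
    // Returns the index of the first character at or after pos that is neither a blank
    // (space, tab, end-of-line) nor part of a ";" comment, or text.length() if none.
    static int skipBlanksAndComments(String text, int pos) {
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;                                   // blanks and end-of-line
            } else if (c == ';') {
                // A comment runs to the end of the line and is treated like a blank.
                while (pos < text.length() && text.charAt(pos) != '\n') pos++;
            } else {
                break;                                   // first character of the next token
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String src = "  ; find the derivative\n  (define deriv)";
        int start = skipBlanksAndComments(src, 0);
        System.out.println("Next token starts with: " + src.charAt(start));   // (
    }
}
```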
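Scheme Syntax Diagrams
digit: one of 0, 1, ..., 9
letter: one of A, B, ..., Z or a, b, ..., z
word: a letter, followed by any number of letters, digits, question marks (?), and hyphens (-)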
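Here is a minimal sketch of the word diagram. It assumes the reading given above: a letter followed by letters, digits, ? and -. The class SchemeWords and its methods are hypothetical. Read literally, this rule stops let* at the *, so a real scanner would also need to accept * and the other characters that Scheme allows in words.

```java
class SchemeWords {
    static boolean isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isWordChar(char c) { return isLetter(c) || isDigit(c) || c == '?' || c == '-'; }

    // Returns the end index (exclusive) of the word starting at pos,
    // or pos itself if no word starts there.
    static int scanWord(String text, int pos) {
        if (pos >= text.length() || !isLetter(text.charAt(pos))) return pos;
        int end = pos + 1;
        while (end < text.length() && isWordChar(text.charAt(end))) end++;
        return end;
    }

    public static void main(String[] args) {
        String text = "let* deriv-term";
        System.out.println(text.substring(0, scanWord(text, 0)));   // let  (* is not a word character here)
    }
}
```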
Scheme Syntax Diagrams, cont’d
unsigned integer: one or more digits
number: an unsigned integer, optionally followed by a decimal point (.) and another unsigned integer
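Here is a hedged sketch of the number diagram. It scans an unsigned integer, then accepts a decimal point only if another unsigned integer follows it. The class SchemeNumbers and the choice of Integer versus Double for the value are assumptions.

```java
class SchemeNumbers {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // unsigned integer: one or more digits. Returns the end index (exclusive).
    static int scanUnsignedInteger(String text, int pos) {
        int end = pos;
        while (end < text.length() && isDigit(text.charAt(end))) end++;
        return end;
    }

    // number: unsigned integer, optionally followed by "." and another unsigned integer.
    // Returns pos unchanged if no number starts at pos.
    static int scanNumber(String text, int pos) {
        int end = scanUnsignedInteger(text, pos);
        if (end == pos) return pos;                              // no leading digit
        if (end < text.length() && text.charAt(end) == '.') {
            int fractionEnd = scanUnsignedInteger(text, end + 1);
            if (fractionEnd > end + 1) end = fractionEnd;        // accept "." only if digits follow
        }
        return end;
    }

    // Converts a scanned number to an Integer or a Double.
    static Number value(String lexeme) {
        if (lexeme.indexOf('.') >= 0) return Double.valueOf(lexeme);
        return Integer.valueOf(lexeme);
    }

    public static void main(String[] args) {
        String text = "3.14 42)";
        int end = scanNumber(text, 0);
        System.out.println(value(text.substring(0, end)));       // 3.14
    }
}
```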
Scheme Syntax Diagrams, cont’d
character: #\ followed by any single character
string: " followed by any characters except ", then a closing "
boolean: #t or #f
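Here is a minimal sketch of recognizing these three diagrams. It assumes the scanner is positioned at a # or a " character. The class SchemeLiterals, its method, and the result labels are hypothetical. Escape sequences inside strings are not handled.

```java
class SchemeLiterals {
    // Returns "TYPE lexeme" for the literal starting at pos, or null if none starts there.
    static String scanLiteral(String text, int pos) {
        if (text.startsWith("#\\", pos) && pos + 2 < text.length()) {
            return "CHARACTER " + text.substring(pos, pos + 3);      // #\ plus any one character
        }
        if (text.startsWith("#t", pos) || text.startsWith("#f", pos)) {
            return "BOOLEAN " + text.substring(pos, pos + 2);
        }
        if (pos < text.length() && text.charAt(pos) == '"') {
            int close = text.indexOf('"', pos + 1);                  // any characters except "
            if (close < 0) return "ERROR unterminated string";
            return "STRING " + text.substring(pos, close + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {"#\\a", "#t", "\"hello, world\"", "deriv"};
        for (String s : samples) System.out.println(s + " -> " + scanLiteral(s, 0));
    }
}
```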
Scheme Syntax Diagrams, cont’d
element: a word, number, boolean, symbol, or list
list: ( followed by zero or more elements, then )
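Here is a minimal recursive-descent sketch of the element and list diagrams. It assumes the tokens have already been scanned into strings. The class ListParser is hypothetical. It represents a list as a java.util.List and any other element as its token string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent parser for the element and list diagrams.
class ListParser {
    private final List<String> tokens;
    private int pos = 0;   // index of the current token

    ListParser(List<String> tokens) { this.tokens = tokens; }

    // element: word | number | boolean | symbol | list
    Object parseElement() {
        if (pos >= tokens.size()) throw new IllegalStateException("Unexpected end of input");
        String tok = tokens.get(pos);
        if (tok.equals("(")) return parseList();
        if (tok.equals(")")) throw new IllegalStateException("Unexpected ) at token " + pos);
        pos++;
        return tok;
    }

    // list: "(" { element } ")"
    List<Object> parseList() {
        pos++;                                      // consume "("
        List<Object> items = new ArrayList<>();
        while (pos < tokens.size() && !tokens.get(pos).equals(")")) {
            items.add(parseElement());
        }
        if (pos >= tokens.size()) throw new IllegalStateException("Missing )");
        pos++;                                      // consume ")"
        return items;
    }

    public static void main(String[] args) {
        List<String> toks = List.of("(", "(", "a", "b", ")", "c", "(", "d", ")", ")");
        System.out.println(new ListParser(toks).parseElement());   // [[a, b], c, [d]]
    }
}
```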
Scheme Keywords (Partial List)
and begin cond define else if lambda let letrec let* not or quote
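Here is a small sketch of how a scanner might separate these keywords from ordinary identifiers once it has extracted a word. The class SchemeKeywords is hypothetical. Its set contains only the partial list above.

```java
import java.util.Set;

class SchemeKeywords {
    // The partial keyword list from the slide.
    static final Set<String> KEYWORDS = Set.of(
        "and", "begin", "cond", "define", "else", "if", "lambda",
        "let", "letrec", "let*", "not", "or", "quote");

    // Classifies an already-extracted word.
    static String wordType(String word) {
        return KEYWORDS.contains(word) ? "KEYWORD" : "IDENTIFIER";
    }

    public static void main(String[] args) {
        for (String w : new String[] {"define", "terminize", "lambda", "poly"}) {
            System.out.println(w + " -> " + wordType(w));
        }
    }
}
```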
Scheme Intermediate Code
Scheme programs have a simple structure. Everything is a list.
The Scheme parser can translate a list into a binary tree. The left subtree is the car of the list. The right subtree is the cdr of the list. Each leaf node contains an element of the list.
Example: (1 2 3)
Figure: binary tree for (1 2 3), with leaves 1, 2, and 3
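Here is a minimal sketch of that binary tree in Java. It assumes list items arrive as Strings or as nested java.util.Lists. The class Node and its fields car and cdr are hypothetical. A null right subtree marks the end of a list.

```java
import java.util.List;

// Hypothetical icode node. A leaf holds one list element; an interior node's
// left subtree (car) is the first item and its right subtree (cdr) is the rest.
class Node {
    final Object element;     // non-null only for leaves
    final Node car, cdr;

    Node(Object element) { this.element = element; this.car = null; this.cdr = null; }
    Node(Node car, Node cdr) { this.element = null; this.car = car; this.cdr = cdr; }

    boolean isLeaf() { return element != null; }

    // Builds the tree for a list whose items are Strings or nested Lists.
    static Node fromList(List<?> items) {
        return build(items, 0);
    }

    private static Node build(List<?> items, int i) {
        if (i == items.size()) return null;                // end of list
        Object item = items.get(i);
        Node car = (item instanceof List) ? fromList((List<?>) item) : new Node(item);
        return new Node(car, build(items, i + 1));
    }

    public static void main(String[] args) {
        Node root = fromList(List.of("1", "2", "3"));
        for (Node n = root; n != null; n = n.cdr) {        // follow the right subtrees
            System.out.println(n.car.element);             // 1, 2, 3 on separate lines
        }
    }
}
```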
Scheme Intermediate Code, cont’d
Example: ((a b) c (d))
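Do a preorder walk of the tree to recreate the list:
Visit the root.
If the left subtree is not an element node, open a set of parentheses.
Visit the left subtree. If the left subtree is a leaf, print its element.
After visiting a left subtree that is a sublist, close its parentheses.
Visit the right subtree.
Figure: binary tree for ((a b) c (d)), with leaves a, b, c, and d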
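The walk can be sketched as below. The classes Node and ListPrinter are hypothetical, and Node is repeated in minimal form so that the sketch stands alone. Besides opening parentheses before a sublist, the code closes them after visiting it and inserts blanks between elements.

```java
// Minimal hypothetical Node: a leaf holds an element; an interior node has car and cdr.
class Node {
    final Object element;   // non-null only for leaves
    final Node car, cdr;    // left and right subtrees of an interior node
    Node(Object element) { this.element = element; this.car = null; this.cdr = null; }
    Node(Node car, Node cdr) { this.element = null; this.car = car; this.cdr = cdr; }
    boolean isLeaf() { return element != null; }
}

class ListPrinter {
    // Recreates the text of the list whose tree is rooted at root (null means the empty list).
    static String print(Node root) {
        StringBuilder sb = new StringBuilder();
        sb.append('(');
        walk(root, sb);
        return sb.append(')').toString();
    }

    // Preorder walk: visit the root, then its left subtree, then its right subtree.
    private static void walk(Node n, StringBuilder sb) {
        if (n == null) return;                        // end of this list
        Node left = n.car;
        if (left != null && left.isLeaf()) {
            sb.append(left.element);                  // leaf: print its element
        } else {
            sb.append('(');                           // sublist: open parentheses,
            walk(left, sb);                           // visit it,
            sb.append(')');                           // and close them again
        }
        if (n.cdr != null) sb.append(' ');            // separator before the next element
        walk(n.cdr, sb);                              // visit the right subtree (rest of the list)
    }

    public static void main(String[] args) {
        Node ab = new Node(new Node("a"), new Node(new Node("b"), null));
        Node d = new Node(new Node("d"), null);
        Node root = new Node(ab, new Node(new Node("c"), new Node(d, null)));
        System.out.println(print(root));              // ((a b) c (d))
    }
}
```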
The Symbol Table: Basic Concepts
Purpose: To store information about certain tokens during the translation process (i.e., parsing and scanning).
What information to store? Anything that’s useful! For a symbol:
name
how it’s defined (as a variable, procedure name, etc.)
Basic operations:
Enter new information
Look up existing information
Update existing information
The Symbol Table: Conceptual Design
Each symbol table entry has:
the name of a symbol
the symbol’s attributes
At the conceptual level, we don’t worry about implementation.
The Symbol Table: Implementation
Each symbol table entry includes the name of a symbol and its attributes.
To maintain maximum flexibility, implement the attributes as a hash table:
Key: the attribute name
Value: the attribute value
The Symbol Table: Implementation, cont’d
The symbol table itself can be a hash table:
Key: the symbol name
Value: the symbol table entry for the symbol
Therefore, we have a hash table of hash tables.
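Here is a minimal sketch of this hash-table-of-hash-tables design, with the enter and look-up operations; updating is done through an entry's attribute map. The classes SymTab and SymTabEntry and their members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table entry: a name plus a hash table of attributes.
class SymTabEntry {
    final String name;
    final Map<String, Object> attributes = new HashMap<>();  // key: attribute name, value: attribute value
    SymTabEntry(String name) { this.name = name; }
}

// Hypothetical symbol table: a hash table from symbol names to entries,
// so the whole structure is a hash table of hash tables.
class SymTab {
    private final Map<String, SymTabEntry> table = new HashMap<>();

    // Enter a new symbol; if it is already present, return its existing entry.
    SymTabEntry enter(String name) {
        return table.computeIfAbsent(name, SymTabEntry::new);
    }

    // Look up an existing symbol; returns null if it was never entered.
    SymTabEntry lookup(String name) {
        return table.get(name);
    }

    public static void main(String[] args) {
        SymTab symtab = new SymTab();
        symtab.enter("deriv");
        symtab.lookup("deriv").attributes.put("kind", "procedure");   // update an attribute
        System.out.println(symtab.lookup("deriv").attributes);        // {kind=procedure}
        System.out.println(symtab.lookup("poly"));                    // null
    }
}
```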
Assignment #5
Use Java to write a parser and a scanner for Scheme.
Your scanner should recognize Scheme tokens.
Print each source input line after the scanner reads the line.
For each token, your scanner should print the token string and the token type (identifier, number, keyword, special symbol, etc.). One token per output line.
The scanner should ignore comments.
Assignment #5, cont’d
The parser should enter each identifier into a symbol table.
Don’t enter keywords. For now, the attributes of each symbol table entry can be null.
For this assignment, you’ll have only one symbol table.
Assignment #5, cont’d
Your parser should translate Scheme lists. Build the intermediate code (binary trees).
After an entire top-level list has been parsed, your backend takes over to perform the following:
Walk the binary tree and print the list with the proper parentheses. The list output does not need to be “pretty” but it must be correct. For example, you can start each sublist on a new line after an indent.
Print the contents of the symbol table in alphabetical order. Hint: Use a Java TreeMap instead of a Hashtable or HashMap for the symbol table.
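Here is a small sketch of the hint. A TreeMap keeps its keys sorted, so iterating over it visits the symbols alphabetically. The class SortedSymTab and its method sortedNames are hypothetical. Note that String ordering is case-sensitive, so uppercase names sort before lowercase ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedSymTab {
    // Enters each identifier once (attributes null, as the assignment allows)
    // and returns the names in the TreeMap's sorted iteration order.
    static List<String> sortedNames(String... identifiers) {
        Map<String, Object> symtab = new TreeMap<>();
        for (String id : identifiers) symtab.put(id, null);   // TreeMap permits null values
        return new ArrayList<>(symtab.keySet());
    }

    public static void main(String[] args) {
        for (String name : sortedNames("terminize", "poly", "deriv", "var", "poly")) {
            System.out.println(name);   // deriv, poly, terminize, var
        }
    }
}
```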
Assignment #5, cont’d
Clear the intermediate code and the symbol table before parsing the next top-level list.
To ensure that your code is properly partitioned, put your classes into three packages: frontend, intermediate, and backend.
Your test input file is on the last slide.
Email [email protected] a zip file containing:
The source directory containing all your Java source files.
A text file of the output from your program.
Subject: CS 152 Assignment #5, team-name
Assignment #5, cont’d
Due Wednesday, April 16.
Do not wait until the last minute to do this assignment!
Assignment #5, cont’d
; Find the derivative of polynomial poly with respect to variable var.
; The polynomial must be in canonical infix form.
(define deriv
  (lambda (poly var)
    (let* ((terms (terminize poly))  ; "terminize" the polynomial
           (deriv-term               ; local procedure deriv-term
             (lambda (term)
               (cond
                 ((null? term) '())
                 ((not (member? var term)) '(0))            ; deriv = 0
                 ((not (member? '^ term)) (upto var term))  ; deriv = coeff
                 (else (deriv-term-expo term var))          ; handle exponent
               )))
           (diff (map deriv-term terms)))  ; map deriv-term over the terms
      (remove-trailing-plus (polyize diff))  ; finalize the answer
    )))

; Convert an infix polynomial into a list of sublists,
; where each sublist is a term.
(define terminize
  (lambda (poly)
    (cond
      ((null? poly) '())
      (else (cons (upto '+ poly)
                  (terminize (after '+ poly)))))))
input.lisp