cse 390 b spring 2020 compiler debugging & intro to

57
CSE 390B, Spring 2020 L15: Debugging & Intro to Compiler CSE 390 B Spring 2020 Debugging & Intro to Compiler Best practices for debugging, Overview of the Compiler, Introduction to the Scanner and Parser Significant material adapted from www.nand2tetris.org. © Noam Nisan and Shimon Schocken.

Upload: others

Post on 03-Feb-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

CSE 390 B Spring 2020

Debugging & Intro to Compiler

Best practices for debugging, Overview of the Compiler, Introduction to the Scanner and Parser

Significant material adapted from www.nand2tetris.org. © Noam Nisan and Shimon Schocken.

Page 2: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

2

Page 3: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Share a quick summary of your professor meeting.

What did you take away in meeting with your professor?

3

Page 4: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

4

Page 5: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Debugging!

5

Page 6: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Source / Acknowledgements

This is a subset+adaptation of a CSE331 lecture

If you’ve taken CSE331, this is perspective seeing more than once

▪ Plus we are going to “make you do it”

▪ And it’s closely connected to metacognition

If you haven’t taken CSE331, this is a helpful sneak peek

▪ Indeed, it would help if our 100-level classes taught debugging more explicitly

Acknowledgements: CSE331 instructors, notably Michael D. Ernst, Hal Perkins, …

6

Page 7: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

A Bug’s Life

defect – mistake committed by a human

error – incorrect computation

failure – visible error: program violates its specification

Debugging starts when a failure is observed

During testing

In the field

Goal is to go from failure back to defect

7

Page 8: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Testing vs. debugging

Testing ≠ debugging▪ test: reveals existence of problem (failure)▪ debug: pinpoint location + cause of problem (defect)

See CSE331 for:▪ How to write code that has fewer bugs (so less debugging)▪ How to write code that is easier to test (so easier to reveal bugs)▪ How to make testing easier (so you do it more often)▪ How to write code that is easier to debug (so less time spent

debugging)These are all incredibly valuable engineering skills

8

Page 9: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Last (inevitable) resort: debugging

Defects happen – people are imperfect▪ Industry average (?): 10 defects per 1000 lines of code

Defects happen that are not immediately localizable▪ Found during integration testing▪ Or reported by user

Cost of an error increases by orders of magnitude during program lifecycle

step 1 – Clarify symptom (simplify input), create “minimal” teststep 2 – Find and understand causestep 3 – Fixstep 4 – Rerun all tests, old and new

9

Page 10: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The debugging process

step 1 – find small, repeatable test case that produces the failure

▪ May take effort, but helps identify the defect and gives you a regression test

▪ Do not start step 2 until you have a simple repeatable test

step 2 – narrow down location and proximate cause

▪ Loop: (a) Study the data (b) hypothesize (c) experiment

▪ Experiments often involve changing the code

▪ Do not start step 3 until you understand the cause

step 3 – fix the defect

▪ Is it a simple typo, or a design flaw?

▪ Does it occur elsewhere?

step 4 – add test case to regression suite

▪ Is this failure fixed? Are any other new failures introduced?

10

Page 11: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Debugging and the scientific method

❖ Debugging should be systematic

▪ Carefully decide what to do

• Don’t flail!

▪ Keep a record of everything that you do

▪ Don’t get sucked into fruitless avenues

❖ Use an iterative scientific process:

Formulate a hypothesis

Design an experiment

Perform an experiment

Interpret results

11

Page 12: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Example

// returns true iff sub is a substring of full// (i.e. iff there exists A,B such that full=A+sub+B)boolean contains(String full, String sub);

User bug report:It can't find the string "very happy" within:

"Fáilte, you are very welcome! Hi Seán! I am very very happy to see you all."

Poor responses:▪ Notice accented characters, panic about not knowing about Unicode,

begin unorganized web searches and inserting poorly understood library calls, …

▪ Start tracing the execution of this exampleBetter response: simplify/clarify the symptom…

12

Page 13: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Reducing absolute input size

Find a simple test case by divide-and-conquer

Pare test down:

Cannot find "very happy" within

"Fáilte, you are very welcome! Hi Seán! I am very very happy to see you all.""I am very very happy to see you all.""very very happy"

Can find "very happy" within

"very happy"Cannot find "ab" within "aab"

13

Page 14: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Reducing relative input size

Can you find two almost identical test cases where one gives the correct answer and the other does not?

Cannot find "very happy" within

"I am very very happy to see you all."

Can find "very happy" within

"I am very happy to see you all.”

14

Page 15: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

General strategy: simplify

In general: find simplest input that will provoke failure

▪ Usually not the input that revealed existence of the defect

Start with data that revealed the defect

▪ Keep paring it down (“binary search” can help)

▪ Often leads directly to an understanding of the cause

When not dealing with simple method calls:

▪ The “test input” is the set of steps that reliably trigger the failure

▪ Same basic idea

15

Page 16: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Localizing a defect

Take advantage of modularity

▪ Start with everything, take away pieces until failure goes away

▪ Start with nothing, add pieces back in until failure appears

Take advantage of modular reasoning

▪ Trace through program, viewing intermediate results

Binary search speeds up the process

▪ Error happens somewhere between first and last statement

▪ Do binary search on that ordered set of statements

16

Page 17: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Binary search on buggy code

public class MotionDetector { private boolean first = true; private Matrix prev = new Matrix();

public Point apply(Matrix current) { if (first) { prev = current; } Matrix motion = new Matrix(); getDifference(prev,current,motion); applyThreshold(motion,motion,10); labelImage(motion,motion); Hist hist = getHistogram(motion); int top = hist.getMostFrequent(); applyThreshold(motion,motion,top,top); Point result = getCentroid(motion); prev.copy(current); return result; }}

no problem yet

problem exists

Check intermediate

resultat half-way point

17

Page 18: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Binary search on buggy code

public class MotionDetector { private boolean first = true; private Matrix prev = new Matrix();

public Point apply(Matrix current) { if (first) { prev = current; } Matrix motion = new Matrix(); getDifference(prev,current,motion); applyThreshold(motion,motion,10); labelImage(motion,motion); Hist hist = getHistogram(motion); int top = hist.getMostFrequent(); applyThreshold(motion,motion,top,top); Point result = getCentroid(motion); prev.copy(current); return result; }}

no problem yet

problem exists

Check intermediate

resultat half-way point

18

Page 19: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Detecting Bugs in the Real World

Real Systems

▪ Large and complex (duh ☺)

▪ Collection of modules, written by multiple people

▪ Complex input

▪ Many external interactions

▪ Nondeterministic

Replication can be an issue

▪ Infrequent failure

▪ Instrumentation eliminates the failure

▪ No printf or debugger

Errors cross abstraction barriers

Large time lag from corruption (error) to detection (failure)

19

Page 20: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Heisenbugs

In a sequential, deterministic program, failure is repeatable

But the real world is not that nice…▪ Continuous input/environment changes▪ Timing dependencies▪ Concurrency and parallelism

Failure occurs randomly ▪ Depends on results of random-number generation▪ Hash tables behave differently when program is rerun

Bugs hard to reproduce when:▪ Use of debugger or assertions makes failure goes away

• Due to timing or assertions having side-effects▪ Only happens when under heavy load▪ Only happens once in a while

20

Page 21: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Logging Events

Log (record) events during execution as program runs (at full speed)

Examine logs to help reconstruct the past

▪ Particularly on failing runs

▪ And/or compare failing and non-failing runs

But don’t spend too much time manually reading enormous, confusing logs

21

Page 22: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

More Tricks for Hard Bugs

Rebuild system from scratch, or restart/reboot▪ Find the bug in your build system or persistent data structures

Explain the problem to a friend (or to a rubber duck)

Make sure it is a bug▪ Program may be working correctly and you don’t realize it!

Face reality▪ Debug reality (actual evidence), not what you think is true

And things we already know:

▪ Minimize input required to exercise bug (exhibit failure)▪ Add more checks to the program▪ Add more logging

22

Page 23: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Where is the defect?

The defect is not where you think it is

▪ Ask yourself where it cannot be; explain why

▪ Self-psychology: look forward to being wrong!

Look for simple easy-to-overlook mistakes first, e.g.,

▪ Reversed order of arguments

▪ Spelling of identifiers

▪ Same object vs. equal: a == b versus a.equals(b)▪ Uninitialized data/variables

▪ Deep vs. shallow copy

Make sure that you have correct source code!

▪ Check out fresh copy from repository; recompile everything

▪ Does a syntax error break the build? (it should!)

23

Page 24: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

When the going gets tough

Reconsider assumptions

▪ Debug the code, not the comments

• Ensure that comments and specs describe the code

Start documenting your system

▪ Gives a fresh angle, and highlights area of confusion

Get help

▪ We all develop blind spots

▪ Explaining the problem often helps (even to rubber duck)

Walk away

▪ Trade latency for efficiency – sleep!

▪ One good reason to start early

24

Page 25: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Key Concepts

Testing and debugging are different

▪ Testing reveals existence of failures

▪ Debugging pinpoints location of defects

Debugging should be a systematic process

▪ Use the scientific method

Understand the source of defects

▪ To find similar ones and prevent them in the future

Learn from the debugging process

▪ It’s inevitable and you have some control over how you approach the frustration

25

Page 26: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

26

Page 27: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Roadmap

27

High-Level Language

Intermediate Language(s)

Assembly Language

Machine Code

Operating System

Computer

CPUMemory

ALU

Basic Logic Gates

NAND

SOFTWARE

HARDWARE

PC

Assembler

Focus for the rest of the course

Page 28: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

SoftwareOverview

x86, x86-64ARM

RISC-VHACK

Assembly Language

Machine Code

WindowsMac

Unix/LinuxAndroidHack OS

Operating System

SOFTWAREAssembler

Java Byte CodeJack VM Code

JavaPython

C/C++Jack

High-Level Language

Intermediate Language(s)

Compiler

Compiler

(VM Translator)

Compiler

(Your project 7)

Page 29: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Compiler: Goal

29

public int fact(int n) { if (n == 0) { return 1; } else { return n * fact(n - 1); }}

High-Level Language

(fact) @R0 M=M+1 @R1 D=A @ifbranch D;JEQ

Assembly Language

Compiler

Theory Definition: a string, from the set of strings making up a language

Practical Definition: a file containing a bunch of characters

Page 30: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Compiler: Implementation

30

public int fact(int n) { if (n == 0) { return 1; } else { return n * fact(n - 1); }}

High-Level Language

(fact) @R0 M=M+1 @R1 D=A @ifbranch D;JEQ

Assembly Language

Scanner ParserType

CheckerOptimizer

Code Generator

Break string into discrete tokens:

etc.

IF (

==

ID(n)

NUM(0)

Verify the syntax tree is semantically correct

Rearrange the code to be more efficient

Convert the syntax tree to the target language

Arrange tokens into syntax tree:

+

x 10

Page 31: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

31

Page 32: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Aside: The Jack Language

● The High-Level Language we will use to program your Hack computer

● Very similar to Java -- mostly just a different set of keywords sprinkled around○ Makes compiling easier

32

function void main() { var int a, bar; let bar = 10;}

method int f(int a) { return 2;}

Jack

static void main() { int a, bar; bar = 10;}

int f(int a) { return 2;}

Java

Page 33: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner

33

Scanner

function void main() { var int a, bar; let bar=10; // init}

Jack

Token Stream

FUNCTION VOID ID(main)

LPAREN RPAREN LCURLY VAR

INT ID(a) COMMA ID(bar)

SEMICOLON

NUM(10)

LET

EQUALS

ID(bar)

SEMICOLON

RCURLY

● Reads a giant string, breaks down into tokens○ Each token has a type: what role does this token play?

■ e.g. “LCURLY” is a type representing an occurrence of “{“

○ What types do we care about? The “building blocks” of our programming language:

■ Keywords (e.g. )■ Operators (e.g. )■ Punctuation (e.g. )

FUNCTION

EQUALS

SEMICOLON COMMA

Page 34: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner

34

Scanner

function void main() { var int a, bar; let bar=10; // init}

Jack

Token Stream

FUNCTION VOID ID(main)

LPAREN RPAREN LCURLY VAR

INT ID(a) COMMA ID(bar)

SEMICOLON

NUM(10)

LET

EQUALS

ID(bar)

SEMICOLON

RCURLY

● In addition to a type, some tokens carry a value:○ Identifiers (e.g. )

○ Numbers (e.g. )

ID(a)

NUM(10)

● Scanner should present a clean token stream○ No whitespace or comments -- the rest of the compiler only

wants to consider things that change program meaning

Page 35: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

35

Scanner

function void main() { var int a, bar; let bar=10; // init}

Jack

Token Stream

FUNCTION VOID ID(main)

LPAREN RPAREN LCURLY VAR

INT ID(a) COMMA ID(bar)

SEMICOLON

NUM(10)

LET

EQUALS

ID(bar)

SEMICOLON

RCURLY

● What if we split the input program on whitespace, and match each segment to a token type? (e.g. “{“ → LCURLY)

● Tempting, but we would end up with “a,” “bar;” “bar=10;”○ Whitespace is tricky: generally want to ignore, but can’t

count on it being there!

Page 36: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

36

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ Break off a token when we complete one○ If the next char could be part of this token, accumulate it

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: ;

Page 37: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

37

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated:

SEMICOLON

Page 38: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

38

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: le

SEMICOLON

Page 39: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

39

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: let

SEMICOLON

Page 40: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

40

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: l

SEMICOLON

Page 41: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

41

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated:

SEMICOLON LET

Page 42: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

42

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: b

SEMICOLON LET

Page 43: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

43

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: ba

SEMICOLON LET

Page 44: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

44

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: bar

SEMICOLON LET

Page 45: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

45

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: =

SEMICOLON LET ID(bar)

Page 46: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: How?

46

; let bar=10;Jack

Token Stream

● Observation: many tokens have disjoint starting characters!● Keep cursor on current char

○ If the char could be part of this token, accumulate it○ If not, complete the current token

● How to distinguish built-in keywords (e.g. “let”) from identifiers (e.g. “bar”)?○ Simple: when token is done, check against list of keywords

curr

Accumulated: 1

SEMICOLON LET ID(bar)

EQUALS

Page 47: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Scanner: Why?

● Fundamentally: The compiler can’t reason about a massive string, so we need to boil it down to its meaning first○ A great place to start is grouping characters that form a “word”

● Engineering-wise: Separation of concerns○ A stream of tokens is an important abstraction for many

file-processing tasks, not just compiling○ Cleaning away whitespace and comments makes rest of

compiler simpler

47

Page 48: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

48

Page 49: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Parser

49

Token Stream

IF

Abstract Syntax Tree

IF

ASSIGN

ID(x) NUM(2)

LESSTHAN

ID(x) NUM(2)

LPAREN ID(x)

LESSTHAN NUM(2)

RPAREN LCURLY

ID(x) EQUALS

NUM(2) SEMICOLON

Parser

● Takes in the flat token stream and outputs a structured tree representation of program constructs

condition body

left rightleft right

● Result: an Abstract Syntax Tree

○ Captures the structural features of the program○ Important distinction: cares about big-picture syntax (e.g.

entire if statement) rather than nitty-gritty syntax (e.g. semicolons, parentheses, even word “if” used to write that if statement)

Page 50: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Describing a Programming Language

● Many ways to define programming languages, some formal -- we won’t cover language definition in depth○ See 341, 401, 402

● Example: Statements vs. Expressions

50

StatementsPerform an action

ExpressionsEvaluate to a result

● Assignment Statementx = y;

● If Statementif (x == 0) { x = y;}

● Operatorsx == 0;

● Variablex

● Constant24

Page 51: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Describing a Programming Language

● These broad categories lend themselves well to recursive definitions○ Easily express all possible configurations of the language

constructs

51

Symbolic Example

if (x == 0) { x = y;}

General Definition of an if Statement

if ( ) {

}

EXPRESSION

STATEMENT

STATEMENT

...

Token Stream Definition

EXPRESSION

IF LPAREN

RPAREN

LCURLY

RCURLY

STATEMENT

STATEMENT ...

Page 52: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

The Parser: How? (Preview)

52

Token Stream

IF

Abstract Syntax Tree

IF

ASSIGN

ID(x) NUM(2)

LESSTHAN

ID(x) NUM(2)

LPAREN ID(x)

LESSTHAN NUM(2)

RPAREN LCURLY

ID(x) EQUALS

NUM(2) SEMICOLON Parser

● Similar to scanner: single pass through token stream, building up as we go

● Intuition: If we see and , we’re entering an if statement and next we must see a complete expression○ Keep reading until we have a complete expression (recursively

parse that) and attach on the condition side of the

condition body

left rightleft right

IF LPAREN

IF

Page 53: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Agenda

❖ Morning Warm-up Question

❖ Debugging Best Practices

❖ Introduction to the Compiler▪ Overview

▪ The Scanner

▪ The Parser

❖ Where We’re Headed

53

Page 54: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Type Checking (Semantic Analysis)

● Given the abstract syntax tree, run checks over it to ensure that it fits within constraints of the language○ Do the types match up?

● Collect additional info for code generation, such as number/type of arguments in each function

54Abstract Syntax Tree

IF

ASSIGN

ID(x) NUM(2)

LESSTHAN

ID(x) NUM(2)

condition body

left rightleft right

Does this expression evaluate to a boolean?

Is the variable “x” defined at this point?

Page 55: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Optimization

● Code improvement: change correct code into semantically equivalent but “better” code

○ Example: if something is computed every iteration of a while loop, the compiler could yank that computation out and compute it just once before entering the loop

■ Here, “better” means faster

○ But requires caution: what if the value changes on each iteration of the loop?

■ “Semantically equivalent” means user sees same outcome

55

Page 56: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Code Generation

● One way to think of compiler is converting from string in source language to → its actual, abstract “meaning”

● Code generation is converting that “meaning” into a string in the destination language

● Plenty of engineering details○ Example: if you want a stack frame/calling conventions for

function calls, you have to implement them yourself via instructions generated by the compiler every time it sees a function call

56

Page 57: CSE 390 B Spring 2020 Compiler Debugging & Intro to

CSE 390B, Spring 2020L15: Debugging & Intro to Compiler

Where We’re Headed

● Project 7 will be assigned Thursday○ Write a miniature compiler for a subset of Jack features, direct

to Hack assembly!○ By the end of this project, you’ll be able to write (limited)

high-level code that runs all the way down on your computer!

● Next stop: Specifics of the Jack Compiler○ How Parsing works

○ Code Generation

○ Calling Conventions

Project 6 Midterm Redo & Reflection due tonight!

57