
YACCL – Yet Another Compiler Compiler Language

P. Simon Tuffs, D.Phil
simon@simontuffs.com

Presented at the Software Development Forum/Java-Sig
February 3, 2004

Copyright © 2004 P. Simon Tuffs


Outline

● What is YACCL?
● Where did it come from?
● How do you use it?
● What is it good for?
● Why Yet Another Compiler Compiler Language?


ANTLR

● This is ANTLR:


What is YACCL?

● This is YACCL.
  – Carry it under your arm for when you need it.
  – Fold it away when you're done with it.
  – It ain't fast, but it will get you there, and it doesn't use much gas.
  – Sometimes all you need is a scooter.


OK, this is a Tech Talk

● Here comes some tech:
  – YACCL is really three things which work together.
    ● A recursive descent parser class (called RDP)
    ● A bootstrap grammar factory (called YACCL)
    ● A compiler compiler (called Compiler) – I was feeling pretty creative with names that day
  – It's 100% pure Java, and engineered in JDK 1.3 so it uses inner-classes a lot.
  – Available under a BSD-style license on sourceforge at http://yaccl.sourceforge.net


Where did it come from?

● Inverse-Laziness Principle
  – I needed a simple mathematical expression parser for my day-job.
  – I did find a nice little one on the web, and used it. I wish I could remember its name...
  – I did look at JavaCC and ANTLR, but they were overkill for what I needed
    ● Plus I needed something which would build a parse-tree once, and then allow repeated evaluations with pull-mode data.
    ● Not so easy with JavaCC & ANTLR.


And the day-job ground on...

● But now my curiosity was piqued:
  – Just how did top-down parsers work?
  – Why did JavaCC and ANTLR generate such procedural looking code?
  – Was it possible to write an Object-Oriented Recursive-Descent Parser?
  – What does the last sentence mean?
● And so the journey began...


How do you use YACCL?

● Let me count the ways...
  1) Write Java Code to implement a parser
  2) Write a YACCL grammar and interpret it
  3) Compile the YACCL grammar, and run it.
● Since I didn't have an interpreter (2) or compiler (3), I had to start with (1).


Coding a Parser

● Let's consider the famous 4-function calculator (without parentheses for now).
  – Here's its grammar in YACCL (which we haven't built yet):

    sum: mul (('+'|'-') mul)*;
    mul: integer (('*'|'/') integer)*;
    integer: ['0'-'9']+;

  – Example input: 1+2*3-4/5 (slide diagram: the input labelled with "mul" and "sum")
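To make the grammar concrete before any YACCL machinery appears, here is a minimal sketch of the same three rules written as plain recursive-descent methods, one method per production. The class name `CalcRecognizer` and its helpers are hypothetical and are not part of YACCL; the sketch only recognizes input and builds nothing.

```java
// Hypothetical hand-written recognizer for the calculator grammar; not YACCL code.
final class CalcRecognizer {
    private final String input;
    private int pos = 0;

    private CalcRecognizer(String input) {
        this.input = input;
    }

    // True only if the whole text is a valid sum.
    static boolean matches(String text) {
        CalcRecognizer r = new CalcRecognizer(text);
        return r.sum() && r.pos == text.length();
    }

    // sum: mul (('+'|'-') mul)*;
    private boolean sum() {
        if (!mul()) return false;
        while (peek('+') || peek('-')) {
            pos++; // consume the operator
            if (!mul()) return false;
        }
        return true;
    }

    // mul: integer (('*'|'/') integer)*;
    private boolean mul() {
        if (!integer()) return false;
        while (peek('*') || peek('/')) {
            pos++; // consume the operator
            if (!integer()) return false;
        }
        return true;
    }

    // integer: ['0'-'9']+;
    private boolean integer() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start;
    }

    // True if the next unread character is c.
    private boolean peek(char c) {
        return pos < input.length() && input.charAt(pos) == c;
    }

    public static void main(String[] args) {
        System.out.println(matches("1+2*3-4/5")); // true
        System.out.println(matches("1+*2"));      // false
    }
}
```

Each method mirrors one production, which is the "procedural looking" shape that generated top-down parsers tend to have.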


Coding a Parser

● Here's how you write it using the RDP class:

    RDP parser = new RDP();
    Terminal sum = parser.new NonTerminal("sum");
    Terminal mul = parser.new NonTerminal("mul");
    Terminal integer = parser.new Chars('0','9').plus();
    sum.match(mul.star(parser.terminal("+").or("-").and(mul)));
    mul.match(integer.star(parser.terminal("*").or("/").and(integer)));

● 7 lines of code! And quite readable once you know the rules.
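For readers who want to see how chained calls such as star(), or() and and() can add up to a parser, here is a minimal, self-contained sketch of the general combinator idea. It is not the RDP API: the `Matcher` interface, `lit`, `chars` and `CombinatorCalc` are all hypothetical names invented for this illustration, and the semantics of the real RDP methods may differ.

```java
// Hypothetical mini combinator library; an illustration only, not YACCL's RDP class.
interface Matcher {
    // Returns the position after the match, or -1 if nothing matches at pos.
    int match(String in, int pos);

    // This matcher followed by next.
    default Matcher and(Matcher next) {
        return (in, pos) -> {
            int mid = match(in, pos);
            return mid < 0 ? -1 : next.match(in, mid);
        };
    }

    // This matcher, or other if this one fails.
    default Matcher or(Matcher other) {
        return (in, pos) -> {
            int end = match(in, pos);
            return end >= 0 ? end : other.match(in, pos);
        };
    }

    // Zero or more repetitions; stops at the first failure or empty match.
    default Matcher star() {
        return (in, pos) -> {
            int cur = pos;
            while (true) {
                int next = match(in, cur);
                if (next <= cur) return cur;
                cur = next;
            }
        };
    }

    // One or more repetitions.
    default Matcher plus() {
        return and(star());
    }

    // A single character in the range lo..hi.
    static Matcher chars(char lo, char hi) {
        return (in, pos) ->
                pos < in.length() && in.charAt(pos) >= lo && in.charAt(pos) <= hi ? pos + 1 : -1;
    }

    // A literal string.
    static Matcher lit(String s) {
        return (in, pos) -> in.startsWith(s, pos) ? pos + s.length() : -1;
    }
}

final class CombinatorCalc {
    // integer: ['0'-'9']+;
    static final Matcher INTEGER = Matcher.chars('0', '9').plus();
    // mul: integer (('*'|'/') integer)*;
    static final Matcher MUL =
            INTEGER.and(Matcher.lit("*").or(Matcher.lit("/")).and(INTEGER).star());
    // sum: mul (('+'|'-') mul)*;
    static final Matcher SUM =
            MUL.and(Matcher.lit("+").or(Matcher.lit("-")).and(MUL).star());

    // True only if the whole text is matched by SUM.
    static boolean matches(String text) {
        return SUM.match(text, 0) == text.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("1+2*3-4/5")); // true
        System.out.println(matches("1+"));        // false
    }
}
```

The grammar here is an object graph rather than generated code, which is roughly what "Object-Oriented Recursive-Descent Parser" suggests; a recursive grammar (parentheses, say) would need a forward reference, which this sketch leaves out.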


So where's the Lexer?

● This one will get me in trouble for sure, but...
  – RDP doesn't use one.
    ● It might be faster if it did
    ● But why introduce a separate lexical analyzer if you don't need one? And RDP doesn't.
    ● Define the tokenizing as part of the grammar, and let the parser figure it out.
      – Holds open the door for context sensitive parsing down the road (hey, that's not a keyword here....)


But it doesn't do anything!

● I thought you'd ask that.
  – RDP supports the notion of Action objects:

    RDP parser = new RDP();
    Action onSum = parser.new Action() {
        public void fire(Node node) {
            System.out.println("onSum(" + node.flatten() + ")");
        }
    };
    Action onMul = parser.new Action() {
        public void fire(Node node) {
            System.out.println("onMul(" + node.flatten() + ")");
        }
    };


Make it do something!

● By declaring non-terminal objects with an action object, actions will be fired when the non-terminals are matched.

    // A simple four function calculator (+,-,*,/).
    Terminal sum = parser.new NonTerminal("sum", onSum);
    Terminal mul = parser.new NonTerminal("mul", onMul);
    ...
    parser.expression("1+2*3-4/5");

● Output:

    onMul(1)
    onMul(2*3)
    onMul(4/5)
    onSum(1+2*3-4/5)
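The firing order in that output follows from when each rule finishes matching: every mul completes before the enclosing sum does. Here is a minimal sketch that reproduces the same trace with plain callbacks. It does not use the RDP, Action or Node classes; `ActionTrace` and its callback parameters are hypothetical stand-ins, and the matched text is passed as a String rather than a Node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of match-time callbacks; it does not use the RDP or Action classes.
final class ActionTrace {
    private final String in;
    private final Consumer<String> onSum;
    private final Consumer<String> onMul;
    private int pos = 0;

    private ActionTrace(String in, Consumer<String> onSum, Consumer<String> onMul) {
        this.in = in;
        this.onSum = onSum;
        this.onMul = onMul;
    }

    // Parses text and returns the callbacks in the order they fired.
    static List<String> trace(String text) {
        List<String> log = new ArrayList<>();
        ActionTrace p = new ActionTrace(text,
                s -> log.add("onSum(" + s + ")"),
                s -> log.add("onMul(" + s + ")"));
        p.sum();
        return log;
    }

    // sum: mul (('+'|'-') mul)*;
    private boolean sum() {
        int start = pos;
        if (!mul()) return false;
        while (peek("+-")) {
            pos++;
            if (!mul()) return false;
        }
        onSum.accept(in.substring(start, pos)); // fire once the rule has matched
        return true;
    }

    // mul: integer (('*'|'/') integer)*;
    private boolean mul() {
        int start = pos;
        if (!integer()) return false;
        while (peek("*/")) {
            pos++;
            if (!integer()) return false;
        }
        onMul.accept(in.substring(start, pos)); // fire once the rule has matched
        return true;
    }

    // integer: ['0'-'9']+;
    private boolean integer() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start;
    }

    // True if the next unread character is one of the given operators.
    private boolean peek(String ops) {
        return pos < in.length() && ops.indexOf(in.charAt(pos)) >= 0;
    }

    public static void main(String[] args) {
        for (String line : trace("1+2*3-4/5")) {
            System.out.println(line);
        }
    }
}
```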


So how do we evaluate?

● Well, we could do the usual stack-based evaluator
  – Add a push action routine for integer, use it to push operands onto a stack
  – Evaluate the stack during onSum and onMul
● But this feels like a Kludge
  – And it is. A calculator implemented this way has to parse its grammar for each expression evaluation.
    ● This makes it slow.
    ● What if you have expressions like “2+x*3” where 'x' is a variable, and changes on each evaluation?
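A minimal sketch of the stack-based approach, assuming a hand-written parser in place of RDP: operands are pushed as integers are matched, and each operator reduces the top two stack entries. The class `StackCalc` is hypothetical, and it reduces as soon as each operator's right operand has been parsed, a simplification of evaluating inside onSum and onMul.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stack-based calculator; parsing and evaluation happen in one pass.
final class StackCalc {
    private final String in;
    private final Deque<Double> stack = new ArrayDeque<>();
    private int pos = 0;

    private StackCalc(String in) {
        this.in = in;
    }

    // Parses and evaluates text; every call re-parses the expression from scratch.
    static double evaluate(String text) {
        StackCalc c = new StackCalc(text);
        c.sum();
        if (c.pos != text.length() || c.stack.size() != 1) {
            throw new IllegalArgumentException("Bad expression: " + text);
        }
        return c.stack.pop();
    }

    // sum: mul (('+'|'-') mul)*;
    private void sum() {
        mul();
        while (peek("+-")) {
            char op = in.charAt(pos++);
            mul();
            reduce(op);
        }
    }

    // mul: integer (('*'|'/') integer)*;
    private void mul() {
        integer();
        while (peek("*/")) {
            char op = in.charAt(pos++);
            integer();
            reduce(op);
        }
    }

    // integer: ['0'-'9']+; pushes the operand, like a "push" action routine would.
    private void integer() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("Integer expected at " + pos);
        }
        stack.push(Double.parseDouble(in.substring(start, pos)));
    }

    // Pops two operands, applies op, and pushes the result.
    private void reduce(char op) {
        double right = stack.pop();
        double left = stack.pop();
        switch (op) {
            case '+': stack.push(left + right); break;
            case '-': stack.push(left - right); break;
            case '*': stack.push(left * right); break;
            default:  stack.push(left / right); break;
        }
    }

    // True if the next unread character is one of the given operators.
    private boolean peek(String ops) {
        return pos < in.length() && ops.indexOf(in.charAt(pos)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2+3*4")); // 14.0
    }
}
```

Nothing survives between calls to evaluate(), so an expression with a changing variable would have to be re-parsed every time, which is the cost the slide complains about.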


Time for (a) Tree

● This is where the concept of a parse-tree (or AST) comes to the rescue.
  – Parser generators don't usually have built-in parse-tree generation enabled by default.
  – YACCL does. That's what Node is in the fire() method, a node in the parse tree.

    Action onMul = parser.new Action() {
        public void fire(Node node) {
            System.out.println("onMul(" + node.flatten() + ")");
        }
    };


Trees...

● Once you've parsed a token stream, you can dump the tree to see what it looks like

    dump: <'root'>
    dump: <sum>
    dump: <mul>
    dump: 1
    dump: +
    dump: <mul>
    dump: 2
    dump: *
    dump: 3
    dump: -
    dump: <mul>
    dump: 4
    dump: /
    dump: 5


Evaluate at will

● Each element in the AST is a Node.
● What's a Node?
  – It's an object which holds various information about the things which have been parsed:
    ● token – A place to hold matched tokens or reduced subtrees
    ● term – The terminal which led to creation of the node
    ● parent – The parent node (in the AST)
    ● children – The child nodes (in the AST)
    ● result – An uncommitted result object for use during eval.
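The payoff of keeping a tree is "parse once, evaluate many times". Here is a minimal sketch of that idea for an expression such as 2+x*3, assuming a hand-written parser and a much simpler node than YACCL's: the `ExprTree.Node` interface below is hypothetical and has none of the token, term, parent, children or result fields listed above. Variable names are an extension to the calculator grammar made for this example, and values are pulled through a lookup function at evaluation time.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical parse-once, evaluate-many sketch; not YACCL's Node class.
final class ExprTree {
    // A node evaluates itself by pulling variable values from the lookup.
    interface Node {
        double eval(ToDoubleFunction<String> vars);
    }

    private final String in;
    private int pos = 0;

    private ExprTree(String in) {
        this.in = in;
    }

    // Builds the tree once; the returned node can be evaluated repeatedly.
    static Node parse(String text) {
        ExprTree p = new ExprTree(text);
        Node root = p.sum();
        if (p.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return root;
    }

    // sum: mul (('+'|'-') mul)*;
    private Node sum() {
        Node left = mul();
        while (peek("+-")) {
            char op = in.charAt(pos++);
            left = binary(op, left, mul());
        }
        return left;
    }

    // mul: operand (('*'|'/') operand)*;
    private Node mul() {
        Node left = operand();
        while (peek("*/")) {
            char op = in.charAt(pos++);
            left = binary(op, left, operand());
        }
        return left;
    }

    // operand: a run of letters (a variable) or a run of digits (an integer).
    private Node operand() {
        int start = pos;
        if (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            while (pos < in.length() && Character.isLetter(in.charAt(pos))) pos++;
            String name = in.substring(start, pos);
            return v -> v.applyAsDouble(name); // value is pulled at eval time
        }
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') pos++;
        if (pos == start) {
            throw new IllegalArgumentException("Operand expected at " + pos);
        }
        double value = Double.parseDouble(in.substring(start, pos));
        return v -> value;
    }

    // Combines two subtrees under an operator node.
    private static Node binary(char op, Node l, Node r) {
        switch (op) {
            case '+': return v -> l.eval(v) + r.eval(v);
            case '-': return v -> l.eval(v) - r.eval(v);
            case '*': return v -> l.eval(v) * r.eval(v);
            default:  return v -> l.eval(v) / r.eval(v);
        }
    }

    // True if the next unread character is one of the given operators.
    private boolean peek(String ops) {
        return pos < in.length() && ops.indexOf(in.charAt(pos)) >= 0;
    }

    public static void main(String[] args) {
        Node tree = parse("2+x*3"); // parsed once
        for (int i = 0; i < 3; i++) {
            final double x = i;
            System.out.println(tree.eval(name -> x)); // 2.0, then 5.0, then 8.0
        }
    }
}
```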


Too technical?

● Probably, so on with the next fun bit.
  – I have the tools to hand-craft a parser in Java (RDP)
    ● But writing a parser for any significant grammar would be rather hard work, with all those nested inner objects and arcane syntaxes.
  – So, what language did I build first?


YACCL!


YACCL Bootstrap

● The first significant RDP parser I wrote was for the YACCL grammar, and here it is:

    // This is the bootstrap grammar. This grammar can be used to
    // create other grammars without writing code like this...

    $bootstrap.match(boot.plus($production.or($control)));
    $production.match($prod.query($action).and(":").and($or).and(";"));
    $action.match(boot.terminal("{").and($literal).and("}"));
    $or.match($and.star(boot.terminal("|").and($and)));
    $and.match(boot.plus($item.query($action).or($bang.and($lp).and($or).and($rp).action(onRp).query($action)).and($repeat)));
    $item.match(boot.and($bang.and($string.or($literal).or($char).or($range)), $repeat).or($caction));
    $repeat.match(boot.repeat.or($limit).or(boot.empty));
    $limit.match(boot.terminal("<").and($int).and(">"));

    $literal.match(boot.ident.plus());
    $prod.match(boot.ident.plus());
    $ctl.match(boot.ident.plus());


YACCL Bootstrap

● Phew!
  – And that's the last of those we need to write.

    $string.match($quote.and(boot.visible.exclude('"','"').plus()).and($quote));
    $char.match($apos.and($bs.and(boot.visible)).and($apos));
    $int.match(boot.terminal("0x").and(boot.chars('0','9').include('a','f').include('A','F').plus()).or(boot.chars('0','9').plus()));

    $crange.match($tilde.and(boot.or($char.or($int).and("-").and($char.or($int)), $string).or($char)));
    $range.match($tilde.and("[").star($crange.or($literal)).star($stop).and("]").query($rpt));
    $stop.match($bang.and($string.or($literal)));

    $control.match($ctl.and("=").and($bool.or($string).or($literal)).and(";"));
    $bool.match(boot.terminal("true").or("false"));
    $caction.match(boot.terminal("<").and($control).and(">"));

    $lineComment.match(boot.terminal("//").and(boot.visible.exclude(NL).star()).and(NL));
    $blockComment.match(boot.terminal("/*").and(boot.visible.include(boot.whitespace).stop("*/").star()).and("*/"));


YACCL Bootstrap

● Because that will parse this:

    // Dumping 'boot' productions...
    bootstrap: (production | control)+;
    production: prod (action)? ':' or ';';
    action: '{' literal '}';
    or: and ('|' and)*;
    and: ((item (action)? | ('!')? '(' or ')' {onRp} (action)?) repeat)+;
    item: (('!')? (string | literal | char | range) repeat | caction);
    repeat: (["*+?"] | limit | empty);
    limit: '<' int '>';
    literal: ['a'-'z' 'A'-'Z' '0'-'9' "_.-"]+;
    prod: ['a'-'z' 'A'-'Z' '0'-'9' "_.-"]+;
    ctl: ['a'-'z' 'A'-'Z' '0'-'9' "_.-"]+;
    string: '"' [' '-'~' ~'"' " \t\n\r\f"]+ '"';
    char: '\'' ('\\')? [' '-'~' " \t\n\r\f"] '\'';
    int: ("0x" ['0'-'9' 'a'-'f' 'A'-'F']+ | ['0'-'9']+);
    crange: ('~')? ((char | int) '-' (char | int) | string | char);
    range: ('~')? '[' (crange | literal)* (stop)* ']' (rpt)?;
    stop: ('!')? (string | literal);
    control: ctl '=' (bool | string | literal) ';';
    bool: ("true" | "false");
    caction: '<' control '>';
    linecomment: "//" [' '-'~' " \t\n\r\f" ~"\n"]* '\n';
    blockcomment: "/*" [' '-'~' " \t\n\r\f \t\n\r\f" !"*/"]* "*/";


YACCL Bootstrap

● Which is itself the output of this:

    YACCL yaccl = new YACCL();
    RDP bootstrap = yaccl.bootstrap();
    String file = "bootstrap.grammar.y";
    bootstrap.dump(new java.io.PrintWriter(new java.io.FileWriter(file)));


So now...

● So now we can interpret our grammars, instead of having to write Java code.
  – Here's the example from the ANTLR tutorial, reworked for YACCL

    NAME: ['a'-'z' 'A'-'Z']+ NEWLINE;
    NEWLINE: '\r'? '\n';


Hmmm.

● Parse failed?
  – Well no. It actually succeeded on the first line, but failed on the second line, because that's what the grammar says.
    ● That's why it's called grammar I guess.
  – You can see more, turn on some logging...


But it doesn't do anything!

● We're back to the problem of adding action routines.
  – And here is where YACCL starts to really stray from the beaten track.
  – Having worked with YACC for many years, and used it to build cross-language parsers (Perl, C, Java) from single-source grammars, I learned to hate embedding code in the grammar.


Implicit Action Routines

● So in YACCL, action-routines are implicit.
  ● All you have to do is specify an action-class
  ● When a token like NAME is matched, the action-class is introspected and if it contains a method called onNAME, that method is invoked.

    import com.simontuffs.yaccl.RDP.Node;
    import com.simontuffs.yaccl.RDP;

    public class Test2 implements RDP.Actions {
        public void onNAME(Node node) {
            System.out.println("Hi there, " + node.tokenAt(0));
        }
    }
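The lookup-by-name step can be sketched with standard Java reflection. This is an assumption about the general mechanism, not a description of how YACCL does it internally: `ImplicitActions`, `Greeter` and `fire` are hypothetical names, and the handler takes a plain String where YACCL passes a Node.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reflection-based dispatcher; a sketch of the idea, not YACCL's code.
final class ImplicitActions {
    // Example action class: any public method named on<RULE>(String) is a handler.
    public static final class Greeter {
        final List<String> log = new ArrayList<>();

        public void onNAME(String text) {
            log.add("Hi there, " + text);
        }
    }

    // Looks up "on" + rule on the action object and invokes it if it exists.
    // Returns false when the action class declares no handler for the rule.
    static boolean fire(Object actions, String rule, String text) {
        try {
            Method handler = actions.getClass().getMethod("on" + rule, String.class);
            handler.invoke(actions, text);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no handler: the match is simply not reported
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Handler for " + rule + " failed", e);
        }
    }

    public static void main(String[] args) {
        Greeter greeter = new Greeter();
        fire(greeter, "NAME", "Simon");   // handled by onNAME
        fire(greeter, "NEWLINE", "\n");   // ignored: no onNEWLINE method
        System.out.println(greeter.log);  // [Hi there, Simon]
    }
}
```

Because the grammar only names rules, the same grammar file could in principle be paired with different action classes, which fits the stated goal of keeping code out of the grammar.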


Implicit Action Routines

● The new grammar:

    NAME: ['a'-'z' 'A'-'Z']+ NEWLINE;
    NEWLINE: '\r'? '\n';

    action-class = grammars.Test2;

● And the result:


You can add actions..

● The YACCL grammar does support actions at places other than terminals
  – But again, the action routines are implicitly associated, and never appear in the grammar files.


Never say Never...

● Ah, but what if you wanted to put the action routines in the grammar file? No problem!

    NAME: ['a'-'z' 'A'-'Z']+ NEWLINE;
    NEWLINE: '\r'? '\n';

    action {
        package grammars;
        import com.simontuffs.yaccl.RDP;
        import com.simontuffs.yaccl.RDP.*;
        import java.io.*;
        public class Test3 implements Actions {
            public void onNAME(Node node) {
                System.out.println("Hi there, " + node.flatten());
            }
            // Simple runner, reads from System.in
            public static void main(String args[]) throws Exception {
                RDP parser = new Test3().parser();
                parser.setReader(new InputStreamReader(System.in));
                parser.parse();
            }
        }
    }


How?

● YACCL uses “includes” to extend itself, so it can handle the embedded action Java code.

    // A simple grammar to handle Java classes.
    action-class: "action" '{' class '}' ;
    class: package? import* class-spec '{' block '}' ;
    package: nl sp "package" ws name ';' ws;
    class-spec: "public" ws "class" ws lit (ws ("extends" | "implements") (ws name)+)? ws ;
    import: nl sp "import" ws name ".*"? ';' ws ;
    name: lit ('.' lit)* ;
    oc: '{' ;
    cc: '}' ;
    ws: [" \t\f\r\n"]+ ;
    sp: [' ']*;
    nl: ['\r']? ['\n']? ;
    block: ( (ws | '.' | name | string | char | oper | oc block cc ) )* ;
    lit: ['a'-'z' 'A'-'Z' '0'-'9' "_-$"]+;
    oper: ["+-/*&|%();[]:?!=<>,"]+;

    file: bootstrap action-class? control*;


A Compiler Compiler

● The YACCL grammar to compile things:

    include = "yaccl.y";
    include = "action.y";
    action-class = "com.simontuffs.yaccl.Compiler$Actions";

● It's not grammar inheritance, but it is powerful.
  – And it's all wrapped up in a tool called 'yaccc':


Note the Classpath Curse

● Take the warning about classloaders seriously.
  – Basically, there are two or more Test3.class files hanging about, and we got the wrong one.
  – Sure enough, we want the one under 'gen' which we generated, not the other one
    ● That's from an Eclipse build, and loaded by the bootstrap classloader


What is it good for?

● So what's it good for?
  – Simple hand-crafted parsers, perhaps written using “yaccc”.
    ● The yaccl.jar file is only 76KB in size. This includes RDP, YACCL, and Compiler classes, of which you only need RDP for runtime.
  – Adaptive grammars.
    ● Since the grammar is held in a tree, it can be modified at runtime.
    ● This is how the compiler-compiler-compiler works.


What is it good for?

● Bootstrapping other grammars
  – I wrote an XML parser, bootstrapped through an EBNF grammar.
    ● First I wrote the EBNF parser in YACCL, then used that to build an XML parser from the EBNF rules for XML.
      – Found a few bugs in the EBNF file that way
    ● The XML parser was very slow.
    ● The XML grammar is large, and uses a lot of whacky character literal definitions.
    ● YACCL doesn't run fast on those.


Why Yet Another Compiler Compiler Language?

● YACC – Yet Another Compiler Compiler
  – LALR grammars.
  – Its output can be considered a compiler (or parser).
● YACCL – Yet Another Compiler Compiler Language
  – LL() grammars
  – Its output can be a parser to parse other grammars
● YACCC – Yet Another Compiler Compiler Compiler


Why Yet Another Compiler Compiler Language?

● Why Bother?
  – Because it was fun
  – Because it was easy at the start (then it got very hard, but it was too late to stop)
● Where Now?
  – What would I like to see? I think J2SE could do with a built-in parser to complement the new java.util.regex package.
  – Maybe YACCL could be a start for that.
  – Anyone want to do a JSR?


Questions?

http://www.cs.washington.edu/homes/gyngve/julia.jpg