getting started with antlr chapter 1. domain specific languages dsls are high-level languages...

Getting Started with ANTLR Chapter 1

Upload: lora-wilson

Post on 28-Dec-2015


Getting Started with ANTLR

Chapter 1

Domain Specific Languages

• DSLs are high-level languages designed for specific tasks

• DSLs include data formats, configuration file formats, text-processing languages, …

• DSLs make their users effective in a specific domain

The Big Picture

• Translators map input sentences to output sentences

• Translators have to recognize many different sentences

• We break recognition into two similar but distinct tasks: lexical analysis and parsing

Lexical Analysis

• Lexical analysis consists of reading the input stream, character by character.

• Characters are combined and output as “tokens”

• Example input: if (x > 312){ System.out.println("Hi");}

• Tokens: if, WS, (, x, WS, >, WS, 312, ), {, WS, System, ., out, ., println, (, "Hi", ), ;, }
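To make the character-to-token step concrete, here is a minimal hand-written lexer in Java for the example statement. It is only a sketch, not what ANTLR generates: the class name ToyLexer and its rules (a letter starts an identifier, a digit run is an integer, double quotes delimit a string, any other character is a one-character token, and a whitespace run becomes a single WS token) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer: groups characters into tokens, one character at a time.
final class ToyLexer {
    // Returns each token's text; whitespace runs are reported as "WS".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                // Collapse a run of whitespace into one WS token.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                tokens.add("WS");
            } else if (Character.isLetter(c)) {
                // Identifier or keyword: a letter followed by letters or digits.
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (Character.isDigit(c)) {
                // Integer literal.
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == '"') {
                // String literal: consume up to and including the closing quote.
                i++;
                while (i < input.length() && input.charAt(i) != '"') i++;
                if (i >= input.length()) {
                    throw new IllegalArgumentException("unterminated string at index " + start);
                }
                i++;
                tokens.add(input.substring(start, i));
            } else {
                // Operators and punctuation are single-character tokens here.
                i++;
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (x > 312){ System.out.println(\"Hi\");}"));
    }
}
```

Note that a Java-style lexer splits System.out.println into three identifiers separated by dot tokens; combining them is the parser's job.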

Lexical Analysis

• Tokens carry more than the characters they represent: each token also records information such as its token type and its position (line and column) in the input

• ANTLR generates a lexical analyser (a lexer) from the grammar you provide

• We will be building grammars and having ANTLR generate the lexer code for us
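A sketch of what that extra information can look like: in this hypothetical Java version, each token records its type, its text, and the line and column where it starts. The names TokType, PosToken and PosLexer, and the tiny language of ID, INT, "=" and ";" tokens, are illustrative assumptions; ANTLR's generated lexers and Token objects carry comparable information through their own API.

```java
import java.util.ArrayList;
import java.util.List;

// Token types for a hypothetical mini-language of lines like "WIDTH = 200;".
enum TokType { ID, INT, ASSIGN, SEMI, WS }

// Besides its text, each token carries its type and start position (1-based line, 0-based column).
record PosToken(TokType type, String text, int line, int column) {}

final class PosLexer {
    static List<PosToken> lex(String input) {
        List<PosToken> out = new ArrayList<>();
        int i = 0, line = 1, col = 0;
        while (i < input.length()) {
            int start = i, startLine = line, startCol = col;
            char c = input.charAt(i);
            TokType type;
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                type = TokType.WS;
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                type = TokType.ID;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                type = TokType.INT;
            } else if (c == '=') {
                i++;
                type = TokType.ASSIGN;
            } else if (c == ';') {
                i++;
                type = TokType.SEMI;
            } else {
                throw new IllegalArgumentException(
                    "unexpected '" + c + "' at line " + line + ", column " + col);
            }
            String text = input.substring(start, i);
            // Advance the position counters past the characters just consumed.
            for (char ch : text.toCharArray()) {
                if (ch == '\n') { line++; col = 0; } else { col++; }
            }
            out.add(new PosToken(type, text, startLine, startCol));
        }
        return out;
    }

    public static void main(String[] args) {
        for (PosToken t : lex("WIDTH = 200;\nHEIGHT = 50;\n")) {
            if (t.type() != TokType.WS) System.out.println(t);
        }
    }
}
```

Position data like this is what lets a tool report an error as "line 2, column 9" instead of just "syntax error".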

Parsing

• Parsing consists of reading tokens and trying to organize them into a valid sentence in the language

• The parser can either generate output immediately as it recognizes sentences, or preserve their structure in an abstract syntax tree (AST) for further processing
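For a feel of how a parser organizes tokens, the sketch below is a hand-written recursive-descent parser for one hypothetical rule, stat : ID '=' INT ';'. It assumes the lexer has already discarded whitespace, and it builds an AST rooted at the '=' token with the ID and INT tokens as children. Tok, Node and AssignParser are illustrative names, not ANTLR classes.

```java
import java.util.List;

// Hypothetical token type; whitespace is assumed to have been dropped by the lexer.
record Tok(String type, String text) {}

// AST node that points at the token it came from rather than copying its text.
record Node(Tok token, List<Node> children) {
    Node(Tok token) { this(token, List.of()); }

    // LISP-style rendering: a leaf prints its text, an inner node prints "(root child ...)".
    String toStringTree() {
        if (children.isEmpty()) return token.text();
        StringBuilder sb = new StringBuilder("(").append(token.text());
        for (Node child : children) sb.append(' ').append(child.toStringTree());
        return sb.append(')').toString();
    }
}

// Recursive-descent parser for the single rule  stat : ID '=' INT ';' ;
final class AssignParser {
    private final List<Tok> tokens;
    private int p = 0; // index of the next unconsumed token

    AssignParser(List<Tok> tokens) { this.tokens = tokens; }

    // Consume the next token if it has the expected type; otherwise report a syntax error.
    private Tok match(String type) {
        if (p >= tokens.size() || !tokens.get(p).type().equals(type)) {
            String found = p < tokens.size() ? tokens.get(p).type() : "EOF";
            throw new IllegalStateException("expected " + type + " but found " + found);
        }
        return tokens.get(p++);
    }

    // Recognize one statement and build an AST with '=' at the root.
    Node stat() {
        Tok id = match("ID");
        Tok assign = match("=");
        Tok value = match("INT");
        match(";");
        return new Node(assign, List.of(new Node(id), new Node(value)));
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(new Tok("ID", "WIDTH"), new Tok("=", "="),
                                   new Tok("INT", "200"), new Tok(";", ";"));
        System.out.println(new AssignParser(tokens).stat().toStringTree()); // (= WIDTH 200)
    }
}
```

Returning a tree instead of printing immediately is what lets later phases, such as a tree walker or an emitter, revisit the structure.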

Translation Data Flow

[Figure: Characters → Lexer → Tokens → Parser → AST → Tree Walker → Output]
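The data flow can be sketched end to end in a few dozen lines of Java. Everything here is a hypothetical toy: the assignment language, the regular-expression lexer, the class and record names, and the "set NAME to VALUE" output format are assumptions chosen for brevity. The AST is simply a Program node holding a list of Assign nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Pipeline {
    // AST for a hypothetical language of assignments such as "WIDTH = 200;".
    record Assign(String name, int value) {}
    record Program(List<Assign> statements) {}

    private static final Pattern TOKEN = Pattern.compile("\\s+|[A-Za-z]\\w*|\\d+|[=;]");

    // Phase 1, lexer: characters -> tokens. Whitespace is recognized but dropped.
    static List<String> lex(String chars) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(chars);
        int pos = 0;
        while (pos < chars.length()) {
            m.region(pos, chars.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("unexpected character at index " + pos);
            if (!m.group().isBlank()) tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    // Phase 2, parser: tokens -> AST, following  prog : (ID '=' INT ';')* ;
    static Program parse(List<String> t) {
        List<Assign> stats = new ArrayList<>();
        for (int i = 0; i < t.size(); i += 4) {
            if (i + 3 >= t.size()
                    || !t.get(i).matches("[A-Za-z]\\w*") || !t.get(i + 1).equals("=")
                    || !t.get(i + 2).matches("\\d+") || !t.get(i + 3).equals(";")) {
                throw new IllegalArgumentException("malformed statement starting at token " + i);
            }
            stats.add(new Assign(t.get(i), Integer.parseInt(t.get(i + 2))));
        }
        return new Program(stats);
    }

    // Phase 3, tree walker: visits the AST and produces the output text.
    static String walk(Program program) {
        StringBuilder out = new StringBuilder();
        for (Assign a : program.statements()) {
            out.append("set ").append(a.name()).append(" to ").append(a.value()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(walk(parse(lex("WIDTH = 200;\nHEIGHT = 50;"))));
    }
}
```

Keeping each phase a separate function mirrors the figure: any stage can be replaced (for example, a different walker producing different output) without touching the others.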

Finally

• An emitter takes the output of the parser and produces the final output, drawing on all the computations of the previous phases

• Emitters can use templates (documents with “holes”) whose holes are filled in with computed values

• ANTLR uses the StringTemplate engine to make it easier to build emitters
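The "documents with holes" idea can be illustrated with a toy renderer. This is not the StringTemplate API: the &lt;name&gt; hole syntax, the MiniTemplate class, and the choice to throw an exception on a missing value are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy template renderer; it only mimics the idea behind a template engine.
final class MiniTemplate {
    // A hole is written <name>, where name consists of letters, digits or underscores.
    private static final Pattern HOLE = Pattern.compile("<(\\w+)>");

    static String render(String template, Map<String, String> values) {
        return HOLE.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for hole <" + match.group(1) + ">");
            }
            // quoteReplacement keeps '$' or '\' in the value from being read as group references.
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        String template = "int <name> = <value>;";
        System.out.println(render(template, Map.of("name", "WIDTH", "value", "200")));
    }
}
```

The template holds the fixed output text, and the emitter supplies only the computed values, so changing the output format means editing the template rather than the code.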

Characters, Tokens, ASTs

• Lexers consume characters from a CharStream such as ANTLRStringStream or ANTLRFileStream

• These streams assume that the entire input fits into memory and, as a result, can buffer all characters in memory

• Tokens point into the character buffer (by start and stop index) rather than creating a String object for each token
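A minimal sketch of tokens that reference a shared in-memory buffer instead of holding their own text. CharBuffer and SpanToken are hypothetical stand-ins, not ANTLR's CharStream or Token classes; the tokens are built by hand here rather than by a lexer, and the inclusive stop index is a choice made for this sketch.

```java
import java.util.List;

// The whole input is held in one in-memory buffer.
final class CharBuffer {
    private final String data;

    CharBuffer(String data) { this.data = data; }

    // Returns the characters from start to stop, both inclusive.
    String text(int start, int stop) { return data.substring(start, stop + 1); }
}

// A token stores only its type and the [start, stop] range of its characters in the
// shared buffer; a String is created only when text() is actually called.
record SpanToken(String type, CharBuffer input, int start, int stop) {
    String text() { return input.text(start, stop); }
}

final class SpanDemo {
    public static void main(String[] args) {
        CharBuffer buf = new CharBuffer("WIDTH = 200;\n");
        List<SpanToken> tokens = List.of(
            new SpanToken("ID", buf, 0, 4),
            new SpanToken("ASSIGN", buf, 6, 6),
            new SpanToken("INT", buf, 8, 10),
            new SpanToken("SEMI", buf, 11, 11));
        for (SpanToken t : tokens) System.out.println(t.type() + " -> " + t.text());
    }
}
```

Java's substring still copies characters when text() is called, so in this sketch the saving comes from tokens whose text is never requested, such as most punctuation and whitespace.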

Characters, Tokens, ASTs

[Figure: one input at three levels of representation.
Characters (CharStream): … WIDTH = 200;\n …
Tokens (Token): … ID WS = WS INT ; WS …
AST (CommonTree): an = node with children ID and INT]

Characters, Tokens, ASTs

• AST nodes point at token objects rather than copying token data into a tree node

• CommonTree is a predefined node containing a Token payload.

• The type of an ANTLR AST node is treated as Object, so there are no restrictions on tree data types
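The sketch below shows both ideas from this slide in plain Java: a tree node keeps a reference to its payload object rather than a copy, and the payload can be of any type. PayloadTree and Lexeme are hypothetical names, and a generic type parameter is used here in place of a raw Object field; ANTLR's CommonTree and its tree-building machinery are organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// A tree node holding a reference to an arbitrary payload object instead of a copy of it.
final class PayloadTree<T> {
    private final T payload;
    private final List<PayloadTree<T>> children = new ArrayList<>();

    PayloadTree(T payload) { this.payload = payload; }

    T payload() { return payload; }

    // Adds a child and returns this node so calls can be chained.
    PayloadTree<T> add(PayloadTree<T> child) {
        children.add(child);
        return this;
    }

    // LISP-style rendering using the payload's toString(): "(root child ...)".
    String toStringTree() {
        if (children.isEmpty()) return String.valueOf(payload);
        StringBuilder sb = new StringBuilder("(").append(payload);
        for (PayloadTree<T> c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

// Hypothetical token type used as a payload; toString() returns just its text.
record Lexeme(String type, String text) {
    @Override
    public String toString() { return text; }
}

final class PayloadDemo {
    public static void main(String[] args) {
        Lexeme assign = new Lexeme("ASSIGN", "=");
        PayloadTree<Lexeme> tree = new PayloadTree<>(assign)
            .add(new PayloadTree<>(new Lexeme("ID", "WIDTH")))
            .add(new PayloadTree<>(new Lexeme("INT", "200")));
        System.out.println(tree.toStringTree());
        System.out.println(tree.payload() == assign); // same object, not a copy
    }
}
```

Because the node stores a reference, any information attached to the token (type, position, text range) stays reachable from the tree without being duplicated.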

A-mazing Analogy

• The book focuses on discovering the implicit tree structure in input sentences and on generating structured text

• A maze can be thought of as a language recognizer. Imagine a maze with words written on the floor.

• Any sentence that guides you from the entrance to the exit is “valid”
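The maze analogy maps directly onto a state machine: rooms are states, each corridor carries a word, and a sentence is valid if its words lead from the entrance to the exit. This sketch assumes a deterministic maze (at most one corridor per word leaving any room); the Maze class, room names and words are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A maze viewed as a recognizer: rooms are states, corridors are labeled with words.
final class Maze {
    private final Map<String, Map<String, String>> corridors = new HashMap<>();
    private final String entrance, exit;

    Maze(String entrance, String exit) {
        this.entrance = entrance;
        this.exit = exit;
    }

    // Adds a corridor from one room to another with a word written on its floor.
    Maze corridor(String from, String word, String to) {
        corridors.computeIfAbsent(from, k -> new HashMap<>()).put(word, to);
        return this;
    }

    // Follows the sentence word by word; a missing corridor means the sentence is invalid.
    boolean accepts(List<String> sentence) {
        String room = entrance;
        for (String word : sentence) {
            room = corridors.getOrDefault(room, Map.of()).get(word);
            if (room == null) return false;
        }
        return room.equals(exit);
    }

    public static void main(String[] args) {
        Maze maze = new Maze("entrance", "exit")
            .corridor("entrance", "the", "hall")
            .corridor("hall", "cat", "room")
            .corridor("hall", "dog", "room")
            .corridor("room", "sleeps", "exit");
        System.out.println(maze.accepts(List.of("the", "cat", "sleeps")));
        System.out.println(maze.accepts(List.of("the", "sleeps")));
    }
}
```

A maze in which several corridors leaving one room share the same word would need to track every room reachable so far, as a nondeterministic recognizer does.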