


Syntax-directed Transformations of XML Streams

Stefanie Scherzinger joint work with Alfons Kemper


XML Stream Processing

1. Very long XML documents.

2. Applications need to be completely main-memory based.

3. Schema information is available.

<bib>
  <book>
    <year>1999</year>
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...

<!ELEMENT bib (book)*>
<!ELEMENT book (year,title,author,author*)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>


XML Query Languages

XPath

//book[year=2003]/title

XQuery

<books> {
  for $x in input()//book
  where $x/year=2003
  return
    <book>
      {$x/title}
      <authors> {$x/author} </authors>
    </book>
} </books>

XSLT

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <books>
      <xsl:for-each select="bib/book">
        <book>
          <xsl:copy-of select="title"/>
          <xsl:copy-of select="author"/>
        </book>
      </xsl:for-each>
    </books>
  </xsl:template>
</xsl:stylesheet>

Schema knowledge is necessary to specify a query!
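To make the XPath example concrete, here is a minimal Python sketch that evaluates a query of the same shape with the standard library's ElementTree. It is only an illustration: it loads the whole document into memory, so it is not a streaming evaluator, and the second book in the sample data is hypothetical. Note that the path can only be written because the element names book, year, and title are known in advance.

```python
import xml.etree.ElementTree as ET

# Sample data in the shape of the slide's <bib> document (shortened).
# The second book is made up for this illustration.
DOC = """
<bib>
  <book>
    <year>1999</year>
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
  </book>
  <book>
    <year>2003</year>
    <title>Hypothetical Title</title>
    <author>Some Author</author>
  </book>
</bib>
"""


def titles_for_year(xml_text, year):
    """Return the titles of all books whose <year> child equals `year`."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset compares text as strings, hence the quotes.
    path = ".//book[year='%s']/title" % year
    return [title.text for title in root.findall(path)]
```

Calling titles_for_year(DOC, 2003) selects the title of the second book only.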


TransformX Attribute Grammars

1. (Suitable) extended regular tree grammar, e.g. DTD

2. Add attribution functions (Java code)

3. Parser generator produces Java code:
   • Validates the input
   • Evaluates the attribution functions

4. Compile and execute


Extended Regular Tree Grammars

Grammar G = (Nt, T, P, bib)
Nonterminals Nt = {bib, pub, year, title, author}
Terminals T = {bib, book, article, year, title, author, PCDATA}

bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )

[Figure: a tree in L(G) with root bib, child book, and the children year, title, author, author, author]
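As a rough illustration of what membership in L(G) means, the following Python sketch checks a parsed document against the productions above. The encoding of content models as Python regular expressions over space-terminated nonterminal names is an assumption made for this sketch, as is the shortcut that every element label belongs to exactly one nonterminal (true for this grammar). It works on an in-memory tree, so it shows the language definition, not the streaming validator.

```python
import re
import xml.etree.ElementTree as ET

# nonterminal -> list of (element label, content model).
# A content model is a regex over nonterminal names, each followed by a space;
# None stands for PCDATA. This encoding is specific to this sketch.
PRODUCTIONS = {
    "bib": [("bib", r"(pub )*")],
    "pub": [("book", r"year title author (author )*"),
            ("article", r"year title author (author )*")],
    "year": [("year", None)],
    "title": [("title", None)],
    "author": [("author", None)],
}

# In this grammar every label belongs to exactly one nonterminal.
LABEL_TO_NT = {label: nt for nt, alts in PRODUCTIONS.items() for label, _ in alts}


def derives(elem, nt):
    """Return True if the element tree `elem` can be derived from nonterminal `nt`."""
    for label, model in PRODUCTIONS[nt]:
        if elem.tag != label:
            continue
        if model is None:
            return len(elem) == 0  # PCDATA: text only, no child elements
        child_nts = []
        for child in elem:
            child_nt = LABEL_TO_NT.get(child.tag)
            if child_nt is None or not derives(child, child_nt):
                return False
            child_nts.append(child_nt)
        word = "".join(name + " " for name in child_nts)
        return re.fullmatch(model, word) is not None
    return False
```

For example, derives(ET.fromstring(text), "bib") accepts a bib whose books list year, title, and at least one author in that order, and rejects a book in which title precedes year.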


Example: Task

1. Re-label root to “books”
2. Retrieve all books, but not articles
3. For each book, output
   • numerical identifier
   • title, year, and authors

input:

<bib>
  <book>
    <year>1999</year>
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...

output:

<books>
  <book>
    <id>1</id>
    <title>Data on the Web</title>
    <year>1999</year>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...
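For comparison, the task above can be written by hand against a SAX-style event interface. The Python sketch below is such a hand-written baseline, not TransformX and not code from the talk; the class and function names are invented for illustration. It buffers only the year of the current book, because the year arrives before the title in the input but must follow it in the output.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class BooksHandler(xml.sax.ContentHandler):
    """Streams <bib> input to <books> output, buffering one year at a time."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.counter = 0       # numerical identifier of the current book
        self.in_book = False   # True inside <book>, False inside <article>
        self.field = None      # child element currently being read
        self.text = []         # character data of that child
        self.year = None       # held back until the title has been written

    def startElement(self, name, attrs):
        if name == "bib":
            self.out.write("<books>")
        elif name == "book":
            self.in_book = True
            self.counter += 1
            self.out.write("<book><id>%d</id>" % self.counter)
        elif self.in_book:
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "bib":
            self.out.write("</books>")
        elif name == "book":
            self.in_book = False
            self.out.write("</book>")
        elif self.in_book and name == self.field:
            value = escape("".join(self.text))
            if name == "year":
                self.year = value
            else:
                self.out.write("<%s>%s</%s>" % (name, value, name))
                if name == "title":
                    self.out.write("<year>%s</year>" % self.year)
            self.field = None


def transform(xml_text):
    """Run the handler over `xml_text` and return the output as a string."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), BooksHandler(out))
    return out.getvalue()
```

The handler has to track by hand whether it is inside a book or an article and which child it is reading. This is the kind of context that a grammar-based specification supplies for free.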


Example: TransformX Attribute Grammar


[Figure: the grammar listing annotated with its parts: definition section, rules section, class-member section, attribution functions]


The grammar provides context information, which creates potential for optimization.


Extended Regular Tree Grammars

Grammar G = (Nt, T, P, bib)
Nonterminals Nt = {bib, pub, year, title, author}
Terminals T = {bib, book, article, year, title, author, PCDATA}

bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )

[Figure: a tree in L(G) with root bib, child book, and the children year, title, author, author, author]

Abbreviation: (pub*) = (book | article)*


TDLL(1) Grammars

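An ERTG in which the regular expression on the right-hand side of each production is one-unambiguous:

• a*.a (not one-unambiguous)
• a.a* (one-unambiguous)
• a.b* | a.c* (not one-unambiguous)
• a.(b* | c*) (one-unambiguous)

The parse tree can be constructed unambiguously with a lookahead of one token, which allows deterministic parsing with one-token lookahead.

DTDs are a dialect of TDLL(1) grammars.

[Figure: parse tree with root bib, child book, and the children year, title, author, author, author]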

Lee, Mani, Murata, 2000.
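One-unambiguity can be tested mechanically: an expression is one-unambiguous exactly when its Glushkov (position) automaton is deterministic, a characterization due to Brüggemann-Klein and Wood. The Python sketch below implements that test. The tuple encoding of expressions ("sym", "cat", "alt", "star") is an assumption of this sketch and not TransformX syntax.

```python
def is_one_unambiguous(expr):
    """Check one-unambiguity of a regular expression given as nested tuples.

    Expressions: ("sym", a), ("cat", r, s), ("alt", r, s), ("star", r).
    """
    symbols = []  # symbols[p] is the symbol at position p
    follow = {}   # follow[p]: positions that may directly follow position p

    def visit(e):
        """Return (nullable, first, last) for e and fill in `follow`."""
        kind = e[0]
        if kind == "sym":
            p = len(symbols)
            symbols.append(e[1])
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = visit(e[1])
            for p in last:
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = visit(e[1])
        n2, f2, l2 = visit(e[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # Concatenation: the last positions of the left part are followed
        # by the first positions of the right part.
        for p in l1:
            follow[p] |= f2
        first = f1 | (f2 if n1 else set())
        last = l2 | (l1 if n2 else set())
        return n1 and n2, first, last

    def deterministic(positions):
        """True if no symbol occurs at two different positions in the set."""
        seen = [symbols[p] for p in positions]
        return len(seen) == len(set(seen))

    _, first, _ = visit(expr)
    return deterministic(first) and all(deterministic(f) for f in follow.values())


A, B, C = ("sym", "a"), ("sym", "b"), ("sym", "c")
STAR_THEN_A = ("cat", ("star", A), A)                           # a*.a
A_THEN_STAR = ("cat", A, ("star", A))                           # a.a*
UNFACTORED = ("alt", ("cat", A, ("star", B)), ("cat", A, ("star", C)))
FACTORED = ("cat", A, ("alt", ("star", B), ("star", C)))
```

On the four expressions above, the test rejects a*.a and the unfactored alternative, and accepts a.a* and the factored form.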


Strong One-Unambiguity

strongly one-unambiguous

Koch, Scherzinger, 2003.


Syntax in the Abstract

Attributed TDLL(1) grammar, i.e., each production

1. is of one of the four forms:

   n ::= t(α)
   n ::= {f$[} t(α)
   n ::= t(α) {f$]}
   n ::= {f$[} t(α) {f$]}

2. if α is an attributed regular expression, then the regular expression ρ obtained from α by removing the attribution functions must be strongly one-unambiguous


Example


Parse Tree

[Figure: parse tree with root bib, child book, and the children year, title, author, author, author]


Attributed Parse Tree

[Figure: attributed parse tree with root bib, child book, and the children year, title, author, author, author]



L-attributed Grammars

[Figure: attributed parse tree with root bib, child book, and the children year, title, author, author, author]


In Practice


Class Members

accessible from within attribution functions


TransformX Attributes

transfer information between attribution functions


The TransformX Parser Generator

Translation to Java source code:

1. The validator module (generated in O(|G|^3))
   – validates the input
   – outputs attribution functions as encountered in the attributed extended parse tree

2. The evaluator module (generated in O(1))
   – evaluates attribution functions
   – stores attributes on a stack
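The division of labour between the two modules can be pictured as a two-stage pipeline. The Python sketch below is a loose illustration under strong assumptions, not the generated Java code: validation against the grammar is replaced by a simple tag-nesting check, the event format and the ACTIONS table are invented for this sketch, and the example functions assume the input contains only book elements.

```python
def validate(events, actions):
    """Stage 1: check that tags nest properly and yield the functions to run.

    `events` is an iterable of ("start" | "end", label) pairs. Real validation
    against the grammar is replaced here by a nesting check (an assumption).
    """
    open_labels = []
    for kind, label in events:
        if kind == "start":
            open_labels.append(label)
        elif not open_labels or open_labels.pop() != label:
            raise ValueError("unexpected end tag: " + label)
        func = actions.get((kind, label))
        if func is not None:
            yield func
    if open_labels:
        raise ValueError("unclosed element: " + open_labels[-1])


def evaluate(functions):
    """Stage 2: run the functions in order; they share one attribute stack."""
    stack, output = [], []
    for func in functions:
        func(stack, output)
    return output


# Hypothetical attribution functions: count the authors of each book.
def book_open(stack, output):
    stack.append(0)                # new attribute: authors seen so far


def author_open(stack, output):
    stack[-1] += 1                 # update the attribute of the enclosing book


def book_close(stack, output):
    output.append(stack.pop())     # emit the count and discard the attribute


ACTIONS = {
    ("start", "book"): book_open,
    ("start", "author"): author_open,
    ("end", "book"): book_close,
}


def element(label, children=()):
    """Build the event list for one element (helper for this sketch)."""
    events = [("start", label)]
    for child in children:
        events.extend(child)
    events.append(("end", label))
    return events
```

Because the first stage is a generator, the second stage consumes functions as they are produced, and the stack never holds more than the attributes of the currently open elements.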


Experiments

Prototype: C++ implementation, generates Java code

Experiments:
1. Validate the input
2. Output the input
3. Evaluate example

Data: Books and articles, datasets 31-122 MB

Memory consumption: 12 MB


Conclusion & Summary

• TransformX attribute grammars
  – specify many queries conveniently
  – often more convenient than SAX
  – grammar may reveal potential for optimization

• TransformX parser generator
  – little runtime overhead (validation + attributes)

• Prototype implementation


Selected Related Work

XML and Attribute Grammars

M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. “Capturing both Types and Constraints in Data Integration”. SIGMOD’03.

M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. “DTD-Directed Publishing with Attribute Translation Grammars”. VLDB’02.

C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.

F. Neven and J. Van den Bussche. “Expressiveness of Structured Document Query Languages Based on Attribute Grammars”. JACM, Jan. 2002.

S. Nishimura and K. Nakano. “XML Stream Transformer Generation Through Program Composition and Dependency Analysis”. Science of Computer Programming, 2005.

One-unambiguous Regular Languages

A. Brüggemann-Klein and D. Wood. “One-Unambiguous Regular Languages”. Information and Computation, 1998.

Strong One-unambiguity

C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.

TDLL(1) Grammars

D. Lee, M. Mani, and M. Murata. “Reasoning about XML Schema Languages using Formal Language Theory”. Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.

Lex & Yacc

J. R. Levine, T. Mason, and D. Brown. “lex & yacc”. O’Reilly, 1992.


Thank you