7 translating data to xml

39
SDPL 2004 Notes 7: XML Conversion 1 7 Translating Data to 7 Translating Data to XML XML How to translate existing data formats to XML? How to translate existing data formats to XML? (and why?) (and why?) XW (XML Wrapper) XW (XML Wrapper) "XML wrapper description language“ ( "XML wrapper description language“ ( kääreenkuvauskieli kääreenkuvauskieli ) ) developed in XRAKE project, Univ. of Kuopio, 2001–02 developed in XRAKE project, Univ. of Kuopio, 2001–02 Ek, Hakkarainen, Kilpeläinen, Kuikka, Penttinen: Ek, Hakkarainen, Kilpeläinen, Kuikka, Penttinen: Describing XML Wrappers for Information Integration. In Describing XML Wrappers for Information Integration. In Proc. of Proc. of XML Finland 2001 XML Finland 2001 , Tampere, Finland, Nov. 2001, 38– , Tampere, Finland, Nov. 2001, 38– 51. 51. Ek, Hakkarainen, Kilpeläinen, Penttinen: Declarative XML Ek, Hakkarainen, Kilpeläinen, Penttinen: Declarative XML Wrapping of Data. CS Report A/2002/2, Univ. of Kuopio. Wrapping of Data. CS Report A/2002/2, Univ. of Kuopio.

Upload: len-morales

Post on 02-Jan-2016

33 views

Category:

Documents


0 download

DESCRIPTION

7 Translating Data to XML. How to translate existing data formats to XML? (and why?) XW (XML Wrapper) "XML wrapper description language“ ( kääreenkuvauskieli ) developed in XRAKE project, Univ. of Kuopio, 2001–02 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 1

7 Translating Data to XML7 Translating Data to XML

How to translate existing data formats to XML?How to translate existing data formats to XML?– (and why?)(and why?)

XW (XML Wrapper)XW (XML Wrapper)– "XML wrapper description language“ "XML wrapper description language“

((kääreenkuvauskielikääreenkuvauskieli))– developed in XRAKE project, Univ. of Kuopio, 2001–02developed in XRAKE project, Univ. of Kuopio, 2001–02– Ek, Hakkarainen, Kilpeläinen, Kuikka, Penttinen: Describing Ek, Hakkarainen, Kilpeläinen, Kuikka, Penttinen: Describing

XML Wrappers for Information Integration. In Proc. of XML Wrappers for Information Integration. In Proc. of XML XML Finland 2001Finland 2001, Tampere, Finland, Nov. 2001, 38–51., Tampere, Finland, Nov. 2001, 38–51.

– Ek, Hakkarainen, Kilpeläinen, Penttinen: Declarative XML Ek, Hakkarainen, Kilpeläinen, Penttinen: Declarative XML Wrapping of Data. CS Report A/2002/2, Univ. of Kuopio.Wrapping of Data. CS Report A/2002/2, Univ. of Kuopio.

Page 2: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 2

XRAKE ProjectXRAKE Project

""XXML-ML-rarajapintojen japintojen kekehittäminen" hittäminen" (Developing XML-based interfaces)(Developing XML-based interfaces)

Studied definition and implementation of XML-Studied definition and implementation of XML-based interfaces, and their application inbased interfaces, and their application in– integration of heterogeneous data sourcesintegration of heterogeneous data sources– management of mass printingmanagement of mass printing– assembly and manipulation of electronic patient assembly and manipulation of electronic patient

recordsrecords

Page 3: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 3

XRAKE - SupportXRAKE - Support

National Technology Agency of Finland (TEKES) and National Technology Agency of Finland (TEKES) and seven local IT companies/organizations seven local IT companies/organizations – DEIO ISDEIO IS– Enfo GroupEnfo Group– JSOP InteractiveJSOP Interactive– Kuopio University HospitalKuopio University Hospital– MedigroupMedigroup– SysOpenSysOpen– TietoEnatorTietoEnator

Page 4: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 4

XW: MotivationXW: Motivation

XML-based protocols developed for XML-based protocols developed for e-business, medical messages, … e-business, medical messages, …

Legacy data formats need to be converted Legacy data formats need to be converted to XMLto XML– How?How?

Page 5: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 5

XML-wrappingXML-wrapping

Need ”Need ”XML-wrappersXML-wrappers” (aka ” (aka extractorsextractors))– ((käärekääre); interface/conversion program to produce an ); interface/conversion program to produce an

XML representation of source dataXML representation of source data

source1source1

source2source2

source3source3

wrapperwrapper11

XML-form-XML-form-11

wrapperwrapper22

wrapperwrapper33

XML-form-XML-form-22

Page 6: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 6

How to wrap?How to wrap?

1. With an interface integrated to source1. With an interface integrated to source– E.g. XML-interfaces of database systemsE.g. XML-interfaces of database systems– OK, OK, ifif available available

2. With an ad-hoc written translator2. With an ad-hoc written translator– E.g. JDBC+Java, or E.g. JDBC+Java, or

separator-encoded text form + Perlseparator-encoded text form + Perl– Conversion possibly efficientConversion possibly efficient– Development and maintenance tediousDevelopment and maintenance tedious

:: -(-(

Page 7: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 7

How to wrap?How to wrap?(2)(2)3. Generic source-independent 3. Generic source-independent

wrapping wrapping – requires a file/message/report requires a file/message/report

produced by the systemproduced by the system» normally availablenormally available

– development and maintenance of development and maintenance of wrappers should become easierwrappers should become easier

=> Wrapper description language XW=> Wrapper description language XW

Page 8: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 8

XW (XML Wrapper)XW (XML Wrapper)

XML-based, declarative wrapper XML-based, declarative wrapper description languagedescription language

To convert To convert – from a textual or binary sourcefrom a textual or binary source

» currently (XW 1.59) only text sources currently (XW 1.59) only text sources supportedsupported

– to an XML formto an XML form

Page 9: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 9

XW: Design principlesXW: Design principles

A concise and natural XML syntaxA concise and natural XML syntax– description of simple and typical conversion description of simple and typical conversion

tasks should be simpletasks should be simple Solving the key problem: Initial conversion Solving the key problem: Initial conversion

of a legacy data format to XMLof a legacy data format to XML– more general post-processing with more general post-processing with

XSLT/SAX/ DOMXSLT/SAX/ DOM– necessary for being able to apply XML necessary for being able to apply XML

techniquestechniques

Page 10: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 10

XW: InfluencesXW: Influences

XML NamespacesXML Namespaces– for separating XW commands and result elementsfor separating XW commands and result elements

XML SchemaXML Schema– description of alternative and repetitive structures description of alternative and repetitive structures

((CHOICECHOICE, , minoccursminoccurs, , maxoccursmaxoccurs))– data types of binary source data data types of binary source data

(string, byte, int, …)(string, byte, int, …) XSLTXSLT

– template-based description of result documentstemplate-based description of result documents– variables for storing result fragmentsvariables for storing result fragments

xmlns:xw=”http://www.cs.uku.fi/XW/2001” xmlns:xw=”http://www.cs.uku.fi/XW/2001”

Page 11: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 11

<xw:wrapper xw:sourcetype="text" <xw:wrapper xw:sourcetype="text" xmlns:xw="http://www.cs.uku.fi/XW/2001" xmlns:xw="http://www.cs.uku.fi/XW/2001"

xw:inputencoding="Cp850" … > xw:inputencoding="Cp850" … > <invoice note="XW-generated" <invoice note="XW-generated" xw:starter="\^xw:starter="\^INVOICEINVOICE"">> <identifierdata ...>Inserted result text ... <identifierdata ...>Inserted result text ... ...... </identifierdata></identifierdata> <specification <specification xw:starter="\^xw:starter="\^PHONE SPECIFICATIONPHONE SPECIFICATION"" ...> ...> ...... </specification></specification> <data <data xw:starter="\^xw:starter="\^------"" xw:maxoccurs="unbounded"xw:maxoccurs="unbounded"...>...> ...... </data></data> </invoice></invoice></xw:wrapper></xw:wrapper>

How does XW look like?How does XW look like?

Page 12: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 12

XW-architectureXW-architecture (1)(1)

AA AA x1x1x2x2

BBBBy1y1 y2y2

z1 z1 z2z2

<part-a> <part-a> <e1> <e1>x1x1</e1> </e1> <e2> <e2>x2x2</e2></e2></part-a></part-a><part-b><part-b> <line-1> <line-1> <d1> <d1>y1y1</d1> </d1> <d2> <d2>y2y2</d2></d2> </line-1> </line-1> <d3> <d3>z2z2</d3></d3></part-b></part-b>

XW-engineXW-engine

<xw:wrapper … ><xw:wrapper … > … … </xw:wrapper></xw:wrapper>

source source datadata

wrapper wrapper descriptiondescription

result result documentdocument

XSLT

post-post-processingprocessing

SAX

DOM

Page 13: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 13

XW-architecture (2)XW-architecture (2)

AA AA x1x1x2x2

BBBBy1y1 y2y2

z1 z1 z2z2XW-engineXW-engine

<xw:wrapper … ><xw:wrapper … > … … </xw:wrapper></xw:wrapper>

startElement(part-a, …) startElement(part-a, …) startElement(e1, …) startElement(e1, …) characters(”characters(”x1x1”)”)……

SAX eventsSAX events

source source datadata

wrapper wrapper descriptiondescription - to use as a - to use as a

program program componentcomponent

Page 14: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 14

XW-architecture (3)XW-architecture (3)

ap

plica

tioap

plica

tionn

XW

-engin

eX

W-e

ngin

e

SA

XS

AX

AA AA x1x1x2x2

BBBBy1y1 y2y2

z1 z1 z2z2

<xw:wrapper … ><xw:wrapper … > … … </xw:wrapper></xw:wrapper>

<part-a> <part-a> <e1> <e1>x1x1</e1> </e1> <e2> <e2>x2x2</e2></e2></part-a></part-a><part-b><part-b> <line-1> <line-1> <d1> <d1>y1y1</d1> </d1> <d2> <d2>y2y2</d2></d2> </line-1> </line-1> <d3> <d3>z2z2</d3></d3></part-b></part-b>

result result documentdocument

source source datadata

wrapper descriptionwrapper description

Page 15: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 15

Wrapper description ~ aWrapper description ~ a grammar grammar for sourcefor source Wrapping ~ Wrapping ~ parsing parsing the source datathe source data

– split data into parts split data into parts according to the according to the descriptiondescription

XW: Basic IdeasXW: Basic Ideas

– Result document = Result document = XML for the parse tree XML for the parse tree of the sourceof the source

<whole> <whole>

</whole> </whole>

<a> <a>

</a> </a>

<b> <b>

</b> </b>

<b1> <b1>

<b2> <b2>

<b3> <b3>

Page 16: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 16

XW SyntaxXW Syntax

<xw:wrapper xw:sourcetype=”text” <xw:wrapper xw:sourcetype=”text”

xmlns:xw=”http://www.cs.uku.fi/XW/2001”>xmlns:xw=”http://www.cs.uku.fi/XW/2001”> <invoice … > <invoice … > <identifierdata ...><identifierdata ...>

......

</identifierdata></identifierdata>

<specification ...> <specification ...>

......

</specification> </specification> </invoice> </invoice></xw:wrapper></xw:wrapper>

Splitting of Splitting of source content source content

into partsinto parts(-> elements)(-> elements)

Page 17: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 17

Recognition of content parts (1)Recognition of content parts (1)

by by separators separators ((erotinerotin); For example:); For example:<invoice xw:starter="\^INVOICE"…<invoice xw:starter="\^INVOICE"…

by positionby position (within surrounding part): (within surrounding part):

<invoicenumber <invoicenumber xw:position="53 64"/> xw:position="53 64"/>

(Invoice number is in positions 53..64 of the (Invoice number is in positions 53..64 of the first row of an first row of an identifierdataidentifierdata-part)-part)

<identifierdata <identifierdata xw:childterminator="\n" … >xw:childterminator="\n" … >

for for sub-sub-partsparts

Page 18: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 18

Recognition of content parts (2)Recognition of content parts (2)

In binary data by In binary data by content data typescontent data types; ; For example:For example:

<xw:wrapper xw:sourcetype="binary"...><xw:wrapper xw:sourcetype="binary"...>

<A xw:type="byte"/><A xw:type="byte"/><B xw:type="string" <B xw:type="string"

xw:stringLength="20"/>xw:stringLength="20"/><C xw:type="int"/> <C xw:type="int"/>

</xw:wrapper></xw:wrapper>– Split input to a byte, a string of 20 charactes, and Split input to a byte, a string of 20 charactes, and

an integer; (an integer; ( elements elements AA,, BB andand CC))

Page 19: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 19

Repetition:Repetition: <line xw:terminator="\n" <line xw:terminator="\n"

xw:minoccurs="2" xw:maxoccurs="2"/>xw:minoccurs="2" xw:maxoccurs="2"/>

– 2 input lines 2 input lines 2 2 lineline elements elements

Recognition of content parts (3)Recognition of content parts (3)

Alternative parts:Alternative parts:

<xw:CHOICE xw:maxoccurs=”unbounded"><xw:CHOICE xw:maxoccurs=”unbounded"> <A xw:starter=”\^aa” xw:terminator=”\n” /> <A xw:starter=”\^aa” xw:terminator=”\n” /> <B xw:starter=”\^bb” xw:terminator=”\n” /> <B xw:starter=”\^bb” xw:terminator=”\n” />

</xw:CHOICE></xw:CHOICE> – arbitrary number (at least 1) lines starting with ”arbitrary number (at least 1) lines starting with ”aaaa” ”

or ”or ”bbbb” ” elements elements AA or or BB

Page 20: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 20

XW: Modifying the structure of XW: Modifying the structure of datadata Limited modifications supported:Limited modifications supported:

– discarding parts of datadiscarding parts of data– collapsing levels of hierarchycollapsing levels of hierarchy– adding levels of hierarchyadding levels of hierarchy– generating verbatim content and generating verbatim content and

attributesattributes– re-arranging existing datare-arranging existing data

Page 21: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 21

Discarding parts of dataDiscarding parts of data

<spec xw:starter="SPEC" <spec xw:starter="SPEC"

xw:childterminator="\n">xw:childterminator="\n">

<!-- Split the ”SPEC” into rows: --><!-- Split the ”SPEC” into rows: -->

<!-- Ignore the first three rows: --><!-- Ignore the first three rows: -->

<<xw:ignorexw:ignore xw:minoccurs="3" xw:minoccurs="3" xw:maxoccurs="3" />xw:maxoccurs="3" />

. . .. . .

</spec></spec>

Input parts Input parts notnot matched matched by wrapper elements are ignoredby wrapper elements are ignored

Page 22: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 22

Collapsing hierarchyCollapsing hierarchy

<data<data xw:starter=”START”xw:starter=”START” xw:terminator=”END” xw:terminator=”END” xw:childterminator="\n”> xw:childterminator="\n”>

<!-- ’data’ is made of rows --><!-- ’data’ is made of rows -->

<xw:collapse><xw:collapse>

<date xw:position=”5 14"/><date xw:position=”5 14"/>

<sum xw:position=”16 21"/><sum xw:position=”16 21"/></xw:collapse></xw:collapse>

. . .. . .

</data></data>

Page 23: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 23

Collapsing hierarchy Collapsing hierarchy (2)(2)

– Split source data into Split source data into parts according to parts according to specified separatorsspecified separators

STARTSTART 17.8.199617.8.1996 95.5095.50

ENDEND

<data><data>

. . .. . .

</data></data>

Page 24: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 24

<data><data> <xw:collapse<xw:collapse>> </xw:collapse</xw:collapse>> . . . . . .</data></data>

Collapsing hierarchy Collapsing hierarchy (3)(3)

– split parts into sub-split parts into sub-parts, according to parts, according to sub-elementssub-elements

17.8.199617.8.1996 95.5095.50

Page 25: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 25

<data><data> <xw:collapse<xw:collapse>>

<date><date> </date></date> <sum><sum> </sum></sum> </xw</xw::collapsecollapse>> . . .. . .</data></data>

Collapsing hierarchy Collapsing hierarchy (4)(4)

17.8.199617.8.199695.5095.50

<data><data> <date><date> </date></date> <sum><sum> </sum> </sum> . . . . . .</data></data>

17.8.199617.8.199695.5095.50

Page 26: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 26

<date /><date />

Collapsing hierarchy Collapsing hierarchy (5)(5)

<date><date> </date></date>17.8.199617.8.1996

17.8.199617.8.1996

Input partInput part wrapper elementwrapper element

++ resultresult

<xw:collapse /><xw:collapse />17.8.199617.8.1996 ++

17.8.199617.8.1996default:default: discardwhitespace=”true”discardwhitespace=”true”

Page 27: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 27

Adding levels of hierarchyAdding levels of hierarchy

Example: Recognizing IP addresses in Example: Recognizing IP addresses in binary databinary data

<xw:ELEMENT xw:name=”IP-address"><xw:ELEMENT xw:name=”IP-address">

<a xw:type="byte"/> <a xw:type="byte"/> <b xw:type="byte"/> <b xw:type="byte"/>

<c xw:type="byte"/> <c xw:type="byte"/> <d xw:type="byte"/><d xw:type="byte"/>

</xw:ELEMENT></xw:ELEMENT>

Page 28: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 28

Adding levels of hierarchy Adding levels of hierarchy (2)(2)– Binary data = string Binary data = string

of bytesof bytes

<a>193</a> <a>193</a> <b>167</b> <b>167</b> <c>232</c> <c>232</c> <d>253</d><d>253</d>

193193 232232167167 253253

<IP-address> <IP-address>

<a>193</a> <a>193</a>

<b>167</b> <b>167</b> <c>232</c> <c>232</c> <d>253</d> <d>253</d> </IP-address> </IP-address>

Page 29: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 29

Adding levels of hierarchy Adding levels of hierarchy (3)(3) NB: an NB: an xw:ELEMENTxw:ELEMENT does not correspond to parts of does not correspond to parts of

input data (like ordinary result elements do):input data (like ordinary result elements do):

<!-- Wrap first two lines as INTRO: --><!-- Wrap first two lines as INTRO: --><data xw:childterminator="\n"/><data xw:childterminator="\n"/> < <xw:ELEMENTxw:ELEMENT xw:name="INTRO"> xw:name="INTRO">

<!--lines are matched by these elements:--><!--lines are matched by these elements:--> <xw:collapse /><xw:collapse /> <xw:collapse /><xw:collapse />

</</xw:ELEMENTxw:ELEMENT>> … …

</data></data>

Page 30: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 30

Rearranging contentRearranging content

Content can be rearranged by storing Content can be rearranged by storing results temporarily in variables:results temporarily in variables:

<data xw:childterminator="\n"/><data xw:childterminator="\n"/> <xw:STORE xw:name="lines"><xw:STORE xw:name="lines">

<!-- lines are matched by these elements :--><!-- lines are matched by these elements :--> <line1 /><line2 /> <line1 /><line2 />

</xw:STORE></xw:STORE> … …

<xw:COPY-OF xw:select="lines" /><xw:COPY-OF xw:select="lines" />

</data></data>

Page 31: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 31

<whole><whole> <<xw:STORExw:STORE xw:name=" xw:name="xxxx">"> <a xw:starter="<a xw:starter="AA" xw:terminator="" xw:terminator="$$"/>"/> </</xw:STORExw:STORE>> <b xw:starter="B"><b xw:starter="B"> <b1 xw:starter="1"/><b1 xw:starter="1"/> <b2 xw:starter="2"/><b2 xw:starter="2"/> <<xw:COPY-OFxw:COPY-OF xw:select=" xw:select="xxxx"/>"/> <b3 xw:starter="3"/><b3 xw:starter="3"/> </b></b></whole></whole>

<whole><whole> <b><b> <b1>one</b1><b1>one</b1> <b2>two</b2><b2>two</b2>

<b3>three</b3><b3>three</b3> </b></b></whole></whole>

AAxy..xy..zz$$??????

B..B..1one 2two1one 2two

3three3three

Rearranging result Rearranging result structuresstructures

<a>xy..<a>xy..z</a>z</a>

Page 32: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 32

<whole><whole> <<xw:STORExw:STORE xw:name=" xw:name="xxxx">"> <a xw:starter="<a xw:starter="AA" xw:terminator="" xw:terminator="$$"/>"/> </</xw:STORExw:STORE>> <b xw:starter="B"><b xw:starter="B"> <b1 xw:starter="1"/><b1 xw:starter="1"/> <b2 xw:starter="2"/><b2 xw:starter="2"/> <<xw:VALUE-OFxw:VALUE-OF xw:select=" xw:select="xxxx"/>"/> <b3 xw:starter="3"/><b3 xw:starter="3"/> </b></b></whole></whole>

<whole><whole> <b><b> <b1>one</b1><b1>one</b1> <b2>two</b2><b2>two</b2>

<b3>three</b3><b3>three</b3> </b></b></whole></whole>

AAxy..xy..zz$$??????

B..B..1one 2two1one 2two

3three3three

Rearranging result Rearranging result contentcontent

xy..xy..zz

Page 33: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 33

XW: ImplementationXW: Implementation

Prototype implemented with JavaPrototype implemented with Java Apache Xerces 2.0.1 used as a SAX parserApache Xerces 2.0.1 used as a SAX parser

– to read the wrapper description, to read the wrapper description, which is represented internally as ..which is represented internally as ..

a a wrapper tree wrapper tree ((käärintäpuukäärintäpuu))– guides the parsing of source dataguides the parsing of source data

Page 34: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 34

Wrapper TreeWrapper Tree

Wrapper tree nodeWrapper tree node– corresponds to an element of wrapper descriptioncorresponds to an element of wrapper description– used for matching parts of source dataused for matching parts of source data– includes sets includes sets SS, , BB, , TT and and F F of delimiter stringsof delimiter strings

» computed from wrapper descriptioncomputed from wrapper description» SS: element's own : element's own sstarter stringstarter strings» BB: strings that can : strings that can bbegin part of elementegin part of element

= S = S starters of subelements that can begin the starters of subelements that can begin the part of the elementpart of the element

» TT: : tterminating delimiters for the part of elementerminating delimiters for the part of element» FF: strings that can : strings that can ffollow the part of elementollow the part of element

Page 35: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 35

<xw:wrapper xw:name="Wrapper tree example"<xw:wrapper xw:name="Wrapper tree example" xw:sourcetype="text"xw:sourcetype="text" xmlns:xw="http://www.cs.uku.fi/XW/2001">xmlns:xw="http://www.cs.uku.fi/XW/2001"> <doku xw:childterminator="<doku xw:childterminator="\n\n" terminator="" terminator="$$">"> <a xw:starter="<a xw:starter="\^A\^A" xw:minoccurs="0"/>" xw:minoccurs="0"/> <b xw:starter="<b xw:starter="\^B\^B" />" /> <c xw:starter="<c xw:starter="\^C\^C"/>"/> <xw:CHOICE xw:minoccurs="0"<xw:CHOICE xw:minoccurs="0" xw:maxoccurs="unbounded">xw:maxoccurs="unbounded"> <d xw:starter="<d xw:starter="\^D\^D"/>"/> <e xw:starter="<e xw:starter="\^E\^E"/>"/> </xw:CHOICE></xw:CHOICE> </doku></doku></xw:wrapper></xw:wrapper>

dokudokuS: S: B:B:\^A \^A ,,\^B\^BT: T: $$ F:F:

bbS:S:\^B \^B B:B:\^B\^BT:T:\n\n F:F:\^C\^C

aaS:S:\^A\^A B:B:\^A\^AT:T:\n\n F:F:\^B\^B

xw:CHOICExw:CHOICES: S: B:B:\^D\^D,,\^E\^E T: T: F:F:\^D\^D,,\^E\^E, , $$

ccS:S:\^C\^C B:B:\^C\^CT:T:\n\n F:F:\^D\^D,,\^E\^E, , $$

ddS:S:\^D\^D B:B:\^D\^D T:T:\n\n F:F:\^D\^D,,\^E\^E, , $$

eeS:S:\^E\^E B:B:\^E\^E T:T:\n\n F:F:\^D\^D,,\^E\^E, , $$

AaaaAaaaBbbbBbbbCcccCcccEeeeEeeeDdddDdddDdddDddd$$

Page 36: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 36

Executing a wrapper (simplified)Executing a wrapper (simplified)

Traverse the wrapper tree; In each node:Traverse the wrapper tree; In each node:– scan input until the start of corresponding part found (= scan input until the start of corresponding part found (=

a delimiter belonging to set a delimiter belonging to set BB))– report report startElement(…)startElement(…) – EitherEither

» process child nodes recursively, or process child nodes recursively, or

» report report characters(…)characters(…) for a leaf-level element for a leaf-level element

– scan input until the end of the part (using sets scan input until the end of the part (using sets TT and and FF))– report report endElement(…)endElement(…)– if node iterative, and a string in if node iterative, and a string in BB found, reprocess found, reprocess

nodenode

Page 37: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 37

Development history and Development history and statusstatus Fall 2001: language designed from Fall 2001: language designed from

concrete examplesconcrete examples 2002: Design of implementation 2002: Design of implementation

principles, implementationprinciples, implementation– wrapping of separator-based and wrapping of separator-based and

positional text data implementedpositional text data implemented– wrapping of binary data (and few other wrapping of binary data (and few other

details) unimplemented details) unimplemented

Page 38: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 38

XW: Some possible XW: Some possible extensionsextensions Evaluation of expressionsEvaluation of expressions

– for generating computed attributes for generating computed attributes ((implemented implemented ))

– for guiding repetition (for guiding repetition (min/maxoccursmin/maxoccurs) by content ) by content values values

Namespace support for resultsNamespace support for results Describing recursive (unlimited nesting) Describing recursive (unlimited nesting)

source structuressource structures=> recognizing LL(k) languages=> recognizing LL(k) languages

(Usefulness for wrapping data formats?)(Usefulness for wrapping data formats?)

Page 39: 7  Translating Data to XML

SDPL 2004 Notes 7: XML Conversion 39

XW: SummaryXW: Summary

XW: a convenient "XML wrapper XW: a convenient "XML wrapper description language”description language”– Developed at Univ. of KuopioDeveloped at Univ. of Kuopio– for translating legacy data to XMLfor translating legacy data to XML– for declarative wrapper descriptionfor declarative wrapper description– Easier than writing ad-hoc conversion Easier than writing ad-hoc conversion

programsprograms– Working prototype implementationWorking prototype implementation