XML technologies for text encoding
Tamás Váradi varadi@nytud.hu
BTANT129 w4 2
Introduction
• Processing XML files
  – CSS: getting the picture right
  – XPath: finding our way around
  – XSLT: extracting the right info
• Encoding content the right way
  – Text Encoding Initiative
  – TEI Lite
• Tools
Benefits of XML
• makes structure and content clear
• encoding independent of display and device
• portable, platform independent
• ideal for exchange of data
• with a DTD, validation of the document is easy
Limitations of XML
• Verbose annotation increases the size of the files (sometimes hugely)
• Not a very efficient format for fast access and retrieval
Displaying XML files?
• Style sheets
  – consistent design
  – easy to change
  – one stylesheet can serve many XML documents
  – one document can use different stylesheets
Cascading Stylesheets
Elements are associated with display styles by style rules:
h1 { font-size: 3em; }
In this rule, h1 is the selector, font-size is the property and 3em is the value.
A stylesheet is a collection of style rules.
Declaring the stylesheet
<?xml-stylesheet
type = "text/css"
href = "url-of-stylesheet"
?>
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
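The stylesheet declaration is a processing instruction that sits before the root element. A minimal sketch of reading it back with Python's standard library follows; the `<cards>` document body is invented for illustration, and only `cards.css` comes from the slide.

```python
from xml.dom import Node, minidom

# Hypothetical document: only the two declarations mirror the slide,
# the <cards> content is made up for this sketch.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
<cards><card>Hello</card></cards>"""

dom = minidom.parseString(DOC)

# Processing instructions that precede the root element are children
# of the document node itself.
pis = [n for n in dom.childNodes
       if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

# Each one has a target (the name) and data (the pseudo-attributes).
targets = [(pi.target, pi.data) for pi in pis]
print(targets)
```

Note that the target name must follow `<?` directly; a space between them, as in `<? xml-stylesheet`, makes the document ill-formed.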
An example
• Load the file letter.xml into Internet Explorer
• Now load the file letter2.xml
• View source
• Open the file letter.css in Notepad
• Check that what you see corresponds to what is in the CSS file
Cascading stylesheets
• Style properties are inherited down the XML tree
• Three levels of applying styles:
  1. External stylesheets
  2. Internal style definitions
  3. Inline style settings
Limitations of CSS
• Elements are formatted in their original sequence
• No means to reorder elements
• No means to select a set of elements
More advanced techniques
• XSL: Extensible Stylesheet Language
• XSLT: XSL Transformations, the part of XSL that transforms XML documents
• XPath: a standard way to find elements in the XML hierarchy
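To give a feel for XPath, here is a small sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree` module. The letter markup is invented for illustration and is not the course file letter.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter markup, invented for this sketch.
LETTER = """<letter>
  <salutation>Dear Anna,</salutation>
  <body>
    <p n="1">Thank you for the book.</p>
    <p n="2">I will write again soon.</p>
  </body>
  <closing>Yours, Ben</closing>
</letter>"""

root = ET.fromstring(LETTER)

# ".//p" selects every <p> element anywhere below the root.
paragraphs = [p.text for p in root.findall(".//p")]

# "body/p[@n='2']" walks the hierarchy step by step and
# filters on an attribute value.
second = root.find("body/p[@n='2']").text

# Once selected, elements can be output in any order,
# which CSS alone cannot do.
reordered = [root.find("closing").text, root.find("salutation").text]

print(paragraphs)
print(second)
print(reordered)
```

ElementTree covers only part of XPath 1.0; a full XSLT processor supports the complete language.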
XSLT
• See the excellent introduction to XSLT by Sebastian Rahtz available here
Standard annotation of content
• XML is an annotation standard
• It is not designed for any particular domain
• Need for a standard way of encoding typical text genres such as books, dictionaries, letters, radio news, etc.
• => Text Encoding Initiative (TEI)