XML

Upload: lee-osborne

Post on 13-Dec-2015


Page 1: XML. Markup Languages A markup language is a formal way of annotating a document or collection of digital data using embedded encoding tags to indicate

XML


Markup Languages

• A markup language is a formal way of annotating a document or collection of digital data using embedded encoding tags to indicate the structure of the document or data file, and the contents of its data elements. This markup provides a computer with information about how to process and display marked-up documents.


What is XML?

• XML is a grammatical system for creating languages… a meta-language

• Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain

• Create a language for describing…anything


SGML

• Standard Generalized Markup Language, an ISO (International Organization for Standardization) standard, ISO 8879:1986, first used by the publishing industry for defining, specifying, and creating digital documents that can be delivered, displayed, linked, and manipulated in a system-independent manner.

• the parent of XML
• an international standard for the description of marked-up electronic text
• a metalanguage: a means of formally describing a language
• XML is a subset of SGML
• SGML is much more complex than XML


HTML

• HyperText Markup Language, an SGML-derived markup language used to create documents for World Wide Web applications. HTML emphasizes design and appearance rather than the representation of document structure and data elements.


XML

• A simplified subset of SGML that is designed specifically for use with the World Wide Web and that provides for more sophisticated data structuring and validation than HTML. XML was once widely expected to succeed HTML as the language of the Web; in practice HTML remains the language of Web pages, while XML is used mainly to structure and exchange data and documents.


XML

• What is XML?
– EXtensible Markup Language. XML is a set of rules for defining markup languages and describing data.

• Why XML?
– XML is a standard means of delivering structured data via Web applications.
– XML is extensible: both a blessing and a burden
– Authors can define their own tags and attributes, e.g. CML (Chemical Markup Language)
– You may hear someone from your IT department mention "well-formed" XML. A well-formed XML file conforms to a set of very strict syntax rules that govern XML. If a file doesn't conform to those rules, an XML parser reports an error and stops processing the file.


XML vs. HTML

• HTML tells Web browsers how to display text, images, etc.— emphasis is on display

• Unlike HTML, XML can “take database information with it”; emphasis is on structure, relationships, and ‘meaning’

• XML is a set of rules that are used to create markup languages while HTML is itself a markup language

• Use HTML to describe the appearance of a document and XML to describe the structure


XML vs. HTML

• HTML defines only the appearance of your data — it's a pure display language.

• XML describes the structure and meaning of your data. Using tags that describe the structure and meaning of your data makes it possible to reuse that data in any number of ways. For example, if you have a block of sales data, and each item in the block is clearly identified, you can load just the items that you need into a sales report and load other items into an accounting database.

• HTML is limited to a predefined set of tags that all users share.

• XML allows you to create any tag you need to describe your data and the structure of that data.
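The sales-data point above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. The tag names (sales, sale, item, quantity, unit_price, customer) and the figures are invented for this illustration; they are not from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sales data; every tag name here is an assumption.
SALES_XML = """<sales>
  <sale>
    <item>Widget</item>
    <quantity>3</quantity>
    <unit_price>4.50</unit_price>
    <customer>Acme Ltd</customer>
  </sale>
  <sale>
    <item>Gadget</item>
    <quantity>1</quantity>
    <unit_price>20.00</unit_price>
    <customer>Bolt plc</customer>
  </sale>
</sales>"""

root = ET.fromstring(SALES_XML)

# A sales report only needs the item name and the quantity sold.
report_rows = [
    (sale.findtext("item"), int(sale.findtext("quantity")))
    for sale in root.findall("sale")
]

# An accounting load only needs the customer and the amount owed.
ledger_rows = [
    (
        sale.findtext("customer"),
        int(sale.findtext("quantity")) * float(sale.findtext("unit_price")),
    )
    for sale in root.findall("sale")
]

print(report_rows)
print(ledger_rows)
```

Because each item is labelled by a descriptive tag, the two consumers pick out different fields from the same source without any change to the data.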


Classes of Documents

• Poetry
– Poem
– Stanza
– Verse
– Foot
– Caesura

• Monograph
– Title page
– Table of contents
– Chapters
– Sections
– Paragraphs
– Appendix
– Index


Types of Markup

• Procedural
– Display: font, italic, bold, etc.

• Descriptive
– Structural (document components)
– Nominal, e.g. <title>

• Referential
– Linking


Descriptive Markup

• Defines structural components of a class of documents
• Defines relationships between data elements
• Specifies frequency (repeatable, optional, mandatory)
• Establishes the sequence of elements
• Codified in a Document Type Definition (DTD) or an XML Schema
• User-friendly documentation in a tag library


Document Type Definitions/Schemas

Class of documents: Protocol

• Literary texts: TEI
• Archival inventories: EAD
• Web pages: Dublin Core
• Electronic commerce: BizTalk
• Catalog records: MARC
• Cultural objects: CDWA/VRA Core
• Hypertext documents: HTML


Separating Markup and Display

• Content: MARC record
• Presentation: ILS software
• Output: Browser


Separating Markup and Display

• Content: EAD document
• Presentation: Stylesheet file
• Output: Browser, Print


Characteristics of Encoding Standards

• Class of documents

• Identifiable set of common elements

• Codification in a standard
– MARC 21 standard
– EAD DTD and Tag Library

• Standards maintenance process


Something to remember about XML

• XML does not do anything itself. It is pure information wrapped in XML tags.

• You must use other means to send, receive or display the data

XML is used by XML technologies to create:

• Detailed description to view in a browser
• PDF for print
• Summary entry to view in a browser


XML Concepts

• Document Type Definitions/Schemas
– Define document structure

• Elements
– Informational units

• Attributes
– Modify elements

• Entities
– External files

• Style sheets
– Prescribe presentation


Elements (nouns)

• Have start tags and end tags
– <title>Moby Dick</title>

• Have formal names and tag names
– Formal name = paragraph
– Tag name (generic identifier) = <p>


Elements (nouns)

• May contain text
– PCDATA (parsed character data)
– <title>Moby Dick</title>

• May be empty
– <lb></lb> (line break)
– XML syntax = <lb/> (empty element syntax)
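As a small illustration, the two spellings of an empty element can be compared with Python's standard xml.etree.ElementTree parser. This is a sketch, not part of the original slides; lb is the line-break element used above.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<lb></lb>")   # start tag followed by end tag
short_form = ET.fromstring("<lb/>")      # empty element syntax

for element in (long_form, short_form):
    # Both forms parse to an element with no text and no children.
    print(element.tag, element.text, len(element))
```

To a parser the two forms are equivalent; the short form is simply more compact to write.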


Elements (nouns)

• May contain other elements
– Wrappers
– Nesting
– Example:

<date>
  <month>September</month>
  <day>12</day>
  <year>1958</year>
</date>
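A minimal sketch of how software sees this nesting, using Python's standard xml.etree.ElementTree module: the date wrapper becomes a parent element and month, day, and year become its children.

```python
import xml.etree.ElementTree as ET

date = ET.fromstring(
    "<date><month>September</month><day>12</day><year>1958</year></date>"
)

# Iterating over an element yields its direct children in document order.
parts = {child.tag: child.text for child in date}
print(parts)
```

The wrapper element carries no text of its own here; its meaning comes from the elements nested inside it.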


XML: elements

<tag> content </tag>

<language> English </language>


Attributes (adjectives)

• Modify the meaning of elements
– <car>Honda</car>

• Attributes of cars
– Color
– Year
– Model

• <car color="green" model="Civic" year="1996">Honda</car>


XML attributes

• Attributes are simple name/value pairs associated with an element

<tag attribute_name="attribute_value">content</tag>

<language>English</language>

<language langcode="eng">English</language>

<date normal="20040920">20 Sept 2004</date>
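The name/value pairs can be read back programmatically. The sketch below uses Python's standard xml.etree.ElementTree module on the two examples above; the attribute name script in the last line is hypothetical and is there only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

language = ET.fromstring('<language langcode="eng">English</language>')
date = ET.fromstring('<date normal="20040920">20 Sept 2004</date>')

# Element content is for people; the attribute holds a normalized form.
print(language.text, language.get("langcode"))
print(date.text, date.attrib)

# A missing attribute falls back to the supplied default.
print(language.get("script", "unknown"))
```

Note that the attribute values are written with straight quotation marks; typographic (curly) quotes are not accepted by an XML parser.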


Entities

• A set of characters referenced as a unit
– Special characters
   • Language keyboard
   • Character map: XML software
   • Character entity: &#141; &amp;
– Non-text files (images, sound files)
– External data files
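A short sketch of how a parser resolves entity and character references, using Python's standard xml.etree.ElementTree module. The sample title is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# &amp; is a predefined entity; &#233; is a numeric character reference.
title = ET.fromstring("<title>Fish &amp; Chips at the Caf&#233;</title>")

# The parser hands back the characters the references stand for.
print(title.text)

# On output, the ampersand is escaped again so the result stays well-formed.
serialized = ET.tostring(title, encoding="unicode")
print(serialized)
```

The references exist only in the serialized file; once parsed, the application works with ordinary characters.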


Style Sheets

• Separate file

• Controls presentation of data
– Text format: font, size, color
– Text layout: tabs, indents, line spacing, line breaks, tables

• Can supply default text and images


XML example

<catalog>
  <cd>
    <title>OK Computer</title>
    <artist>Radiohead</artist>
    <type>pop</type>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist>Paul Weller</artist>
    <type>pop</type>
    <year>1995</year>
  </cd>
</catalog>
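One way to process the catalog above is sketched here with Python's standard xml.etree.ElementTree module. Sorting by year is an arbitrary choice made for the illustration.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <cd><title>OK Computer</title><artist>Radiohead</artist><type>pop</type><year>1997</year></cd>
  <cd><title>Stanley Road</title><artist>Paul Weller</artist><type>pop</type><year>1995</year></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Turn each <cd> into a plain dictionary keyed by child tag name.
discs = [{field.tag: field.text for field in cd} for cd in catalog.findall("cd")]

# The <year> tag tells us which value to sort on.
discs.sort(key=lambda disc: int(disc["year"]))

for disc in discs:
    print(disc["year"], disc["artist"], disc["title"])
```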


XML must be well-formed

• a root element is required

<catalog>

…..all your tags and content…

</catalog>

• closing tags are required


XML must be well-formed (2)

• elements must be properly nested
– Correct: <physdesc><extent>10 boxes</extent></physdesc>
– Incorrect: <physdesc><extent>10 boxes</physdesc></extent>


XML must be well-formed (3)

• case matters

• attribute values must be enclosed in quotation marks, e.g. langcode="fre"

• element names must obey some basic rules, e.g. cannot start with a digit, hyphen, or full stop (an underscore is allowed), and cannot contain spaces
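The well-formedness rules from the last three slides can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<physdesc><extent>10 boxes</extent></physdesc>",
    "overlapping tags": "<physdesc><extent>10 boxes</physdesc></extent>",
    "case mismatch": "<Extent>10 boxes</extent>",
    "unquoted attribute": "<language langcode=fre>French</language>",
    "missing end tag": "<catalog><cd>OK Computer</catalog>",
    "two root elements": "<cd>one</cd><cd>two</cd>",
    "name starts with digit": "<1st>first</1st>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others breaks exactly one of the rules listed above and is rejected by the parser.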


Valid XML

• Valid XML: rules specify which elements and attributes may be used and how they may be used

• Valid XML provides consistency and facilitates the exchange of data

• Valid XML is important for displaying, processing and exchanging XML in a wider environment

• Must conform to a Document Type Definition (DTD) or Schema

• Archives: Encoded Archival Description - EAD version 1; EAD 2002
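The difference between well-formed and valid can be sketched as follows. Python's standard library does not validate against a DTD or Schema, so this example uses a hand-written rule as a stand-in for what a DTD would declare. The rule itself (every cd must contain title, artist, type, year, in that order) and the function name invalid_cds are assumptions made for the illustration, not part of EAD or any real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule, standing in for a DTD or Schema declaration:
# every <cd> must contain exactly these children, in this order.
REQUIRED_CD_CHILDREN = ["title", "artist", "type", "year"]


def invalid_cds(catalog_xml):
    """Return the 0-based positions of <cd> elements that break the rule."""
    catalog = ET.fromstring(catalog_xml)  # raises ParseError if not well-formed
    return [
        index
        for index, cd in enumerate(catalog.findall("cd"))
        if [child.tag for child in cd] != REQUIRED_CD_CHILDREN
    ]


good = (
    "<catalog><cd><title>OK Computer</title><artist>Radiohead</artist>"
    "<type>pop</type><year>1997</year></cd></catalog>"
)
# Well-formed, but <artist> is missing, so it fails the rule above.
bad = (
    "<catalog><cd><title>Stanley Road</title>"
    "<type>pop</type><year>1995</year></cd></catalog>"
)

print(invalid_cds(good))
print(invalid_cds(bad))
```

Both documents parse without error, so both are well-formed; only the first also satisfies the agreed rules. A real project would use a validating parser with the actual DTD or Schema rather than a hand-written check like this one.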


Valid XML

<ead>
  <archdesc level="fonds">
    <did>
      <repository>John Rylands University Library of Manchester</repository>
      <unitid countrycode="GB" repositorycode="0133">GB 0133 NCN</unitid>
      <unittitle>Papers of Norman Nicholson</unittitle>
      <unitdate normal="1899/1987">1899-1987</unitdate>
      <physdesc><extent>0.44 cu.m; 1,201 items</extent></physdesc>
      <langmaterial>
        <language langcode="eng">English</language>
      </langmaterial>
      <origination>Nicholson, Norman Cornthwaite, 1914-1987</origination>
      <note>Created by the John Rylands Library archivist</note>
    </did>
    ………..
  </archdesc>
</ead>


Document Type Definitions

• A Document Type Definition defines the building blocks of an XML document

• It specifies elements and attributes and defines how they can be used

• People can agree to use a common DTD for interchanging data

• You can include a DTD in your XML source file, or point to an external DTD


Schemas

• Schemas perform the same task as DTDs

• Schemas use XML syntax

• Schemas support complex data types

• Schemas are extensible

• One XML document can point to more than one schema


A simple XML document

<?xml version="1.0"?>

<note>
  <to>Rachel</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget the concert!</body>
</note>
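The same note can also be built and serialized by a program rather than typed by hand. This is a sketch using Python's standard xml.etree.ElementTree module; it produces the elements only and leaves out the XML declaration line.

```python
import xml.etree.ElementTree as ET

# Build the root element, then append one child per field.
note = ET.Element("note")
for tag, text in [
    ("to", "Rachel"),
    ("from", "John"),
    ("heading", "Reminder"),
    ("body", "Don't forget the concert!"),
]:
    ET.SubElement(note, tag).text = text

# Serialize to a string, then parse it back to confirm it is well-formed.
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)

round_trip = ET.fromstring(xml_text)
print(round_trip.findtext("body"))
```

Generating XML through a library, instead of concatenating strings, leaves matching end tags and character escaping to the library.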


HTML vs. XML (1)

• HTML is ONLY for display, typically in a Web browser
• HTML tags do not describe the content
• Data marked up in HTML cannot easily be extracted

• HTML: <h1> Papers of Peter Rowe </h1>
• XML: <title> Papers of Peter Rowe </title>

• HTML: <b> 21 May 2004 </b>
• XML: <date> 21 May 2004 </date>


HTML vs. XML (2)

• XML tags are self-describing

• XML tags can be specified by anyone

• XML is user and machine readable


Why use XML?

• Because everyone else is!

• Open international standard, published by the W3C as a Recommendation

• XML is open, licence free and platform neutral

• XML is human and machine readable

• XML documents are text documents


More reasons to use XML

• Separation of content and presentation
– With proprietary systems content is inextricably bound up with format

• XML does not determine the presentation of the data
– You can use style sheets to present XML data


..and even more reasons

• Hierarchical structure- XML documents are hierarchical in nature – with one top-level root element, and hence XML is an excellent choice for setting out hierarchical data in an easy-to-read fashion

• The ability to manipulate and customise- data can be shaped and additions made as the author wishes


and for data exchange

• XML is the main basis for defining data exchange languages

• Meaningful tags facilitate extraction – data can be manipulated as required

• Text based - highly portable


Summary

• XML is simple, flexible and great for data exchange

• XML must be well-formed, and should also be valid against a DTD or Schema when it is to be exchanged

• DTDs and Schemas provide tags, attributes and rules

• EAD is a DTD for archive descriptions


A brief detour into metadata: Two ways to designate content

In MARC: 245 04 $a The Big heat

In XML: <title>Big heat</title>

<name>value</name>


In XML the name-value pair comprises an element

An element has these parts:
– Start tag
– Element content
– End tag

<tag>content</tag>

<subject>Goldfinches</subject>


Element rules and features

• Elements can hold data

<pubPlace>Boston</pubPlace>

• Elements can hold other elements ad infinitum

<sourceDesc>
  <biblFull>
    <titleStmt>
      <title>A letter to Orestes A. Brownson</title>
      <author>Hildreth, Richard, 1807-1865.</author>
    </titleStmt>
  </biblFull>
</sourceDesc>
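Deeply nested elements can be reached by giving the path of element names from the parent down. A sketch using Python's standard xml.etree.ElementTree module on the sourceDesc example above:

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<sourceDesc><biblFull><titleStmt>"
    "<title>A letter to Orestes A. Brownson</title>"
    "<author>Hildreth, Richard, 1807-1865.</author>"
    "</titleStmt></biblFull></sourceDesc>"
)

# Walk down the hierarchy one level per path step.
title = source.findtext("biblFull/titleStmt/title")

# Or search every level for a tag name, however deep it sits.
authors = [element.text for element in source.iter("author")]

print(title)
print(authors)
```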

• Elements must be “properly” nested


A quick look at other XML components

• Attributes qualify elements

<note type="500">Caption title.</note>

• Document Type Definitions (DTDs) control the structure of XML documents

<!ELEMENT note (#PCDATA)>
<!ATTLIST note type CDATA #IMPLIED>

• XML Schemas give more control than DTDs

<xs:element ref="note" />

• Extensible Stylesheet Language Transformation (XSLT) stylesheets transform one XML document into another (or into HTML)


What does XML allow us to do?

• Structure data with a flexible and extensible set of rules

• Share data in a non-proprietary format, especially among “incompatible” systems

• Reuse data, e.g., in different presentation formats for different purposes


Namespaces

• A namespace identifies a specific set of elements

• Namespaces allow metadata terms to be unambiguously used across applications
– Defines what ‘Date’ or ‘Title’ means in a specific usage, or namespace

• Each namespace has a unique identifier associated with it


Namespaces - Example

<dc:DC xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:title>Internet Ethics</dc:title>
  <dc:creator>Duncan Langford</dc:creator>
  <dc:format>Book</dc:format>
  <dc:identifier>ISBN 0333776267</dc:identifier>
</dc:DC>
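A sketch of how a parser treats the Dublin Core record above, using Python's standard xml.etree.ElementTree module. The closing dc:DC tag is supplied here so that the sample parses, and the prefix dublin is an arbitrary name chosen to show that prefixes are only local shorthand.

```python
import xml.etree.ElementTree as ET

RECORD = """<dc:DC xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:title>Internet Ethics</dc:title>
  <dc:creator>Duncan Langford</dc:creator>
  <dc:format>Book</dc:format>
  <dc:identifier>ISBN 0333776267</dc:identifier>
</dc:DC>"""

record = ET.fromstring(RECORD)

# Internally the prefix is replaced by the namespace identifier itself.
print(record.tag)

# The prefix used for searching is mapped to the identifier by the caller.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}
title = record.findtext("dc:title", namespaces=ns)

# Any prefix works as long as it maps to the same identifier.
creator = record.findtext(
    "dublin:creator", namespaces={"dublin": "http://purl.org/dc/elements/1.1/"}
)

print(title, "/", creator)
```

What identifies the vocabulary is the namespace identifier, not the dc prefix that happens to be used in the file.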


Namespaces - Example

<s:student xmlns:s='http://www.develop.com/student'
           xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>
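The student example contains two elements called name that mean different things. The sketch below, using Python's standard xml.etree.ElementTree module, shows how the namespaces keep them apart. It assumes the root element belongs to the student namespace (prefix s), since the transcript's start tag is garbled and declares no other prefix.

```python
import xml.etree.ElementTree as ET

# Assumption: the root element uses the declared "s" prefix.
STUDENT = """<s:student xmlns:s='http://www.develop.com/student'
           xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>"""

ns = {
    "s": "http://www.develop.com/student",
    "w": "urn:schemas.develop.com:workshop",
}

student = ET.fromstring(STUDENT)

# Same local name, different namespaces, different meanings.
student_name = student.findtext("s:name", namespaces=ns)
workshop_name = student.findtext("w:name", namespaces=ns)

print("student:", student_name)
print("workshop:", workshop_name)
```

Without the namespaces, software would have no reliable way to tell which name is the person and which is the workshop.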


Purpose of Using Namespace in XML

• To group all the related elements and attributes from a single XML application together so that software can easily recognize them.

• To distinguish between elements and attributes from different vocabularies with different meanings and that happen to share the same name.

• xmlns:dc="http://purl.org/dc/elements/1.1/"