Extensible Markup Language: Modeling Data & Metadata. WACC ‘99, San Francisco, CA, 22 February 1999. Rohit Khare, UC Irvine and 4K Associates.



Extensible Markup Language

Modeling Data & Metadata

WACC ‘99, San Francisco, CA, 22 February 1999
Rohit Khare, UC Irvine and 4K Associates


XML: Modeling Data & Metadata

Computer-Supported Cooperative Work systems call for a portable information delivery format:
    XML is self-descriptive, to cross ontological boundaries
    XLinks can reduce the burden of marshaling interdependent information amongst agents
    RDF & Namespaces promise composable schema definition, for XML tags and for other metadata

Web-based CSCW applications are using XML:
    Distributed Authoring and Versioning (WebDAV)
    Interchangeable Process Models (Endeavors)


Agenda

XML: The Least You Need to Know
    The Evolution of XML (Origin)
    The Origin of (Document) Species (Evolution)
    XML 1.0, Namespaces, XLink, and XSL (Specification)
    Capturing the State of Distributed Systems (Implication)
RDF: Model, Syntax, and Schema
DCD: Document Content Descriptions
XML in WebDAV: A Tale of Two Standards
Design Kitchen

http://www.ics.uci.edu/~rohit/wacc99/


The Ascent of XML

Joint with Dan Connolly Fall 97 Web Journal


Mission Statement

XML was designed to provide an easy-to-write, easy-to-interpret, and easy-to-implement subset of SGML. It was not designed to provide a “one Markup Language fits all” DTD, or a separate DTD for every tag. It was designed so that certain groups could create their own particular markup languages that meet their needs more quickly, efficiently, and (IMO) logically. It was designed to put an end once and for all to the tag-soup wars propagated by Microsoft and Netscape.

Jim Cape, June 3, 1997, posted to comp.infosystems.www.authoring.html


World-Wide Markup Language

“The HyperText Markup Language is an SGML format.” [Tim Berners-Lee, 1991, “About HTML”]

Standard, Open Document Formats since the 1960s (GCA’s GenCode, IBM’s GML)

SGML [ISO 8879:1986] can prove the validity of any document using standard DTDs w/complex structuring

1990: Berners-Lee picked tags from a sample SGML DTD and added “killer feature”: links

World-Wide HTML tagset embodies 80/20 rule

XML allows Community-Wide Markup Languages, combining SGML’s power with HTML’s simplicity


Community-Wide Markup Languages

Community size is inversely proportional to shared context
    Millions agree that <B> means bold, but a date written 2/6/98 reflects local culture
XML decentralizes control of specialized markup languages, making it cost-effective to capture community ontologies
HTML is not unilaterally extensible (new tags have potentially ambiguous syntax, style, and semantics)
XML is a strict (but simplified) subset of SGML, offering:
    Extensibility: can define new elements, containers, attribute names
    Structure: a DTD can constrain the information model of a document
    Validation: a document can be validated to establish conformance to the structure mandated by its DTD; without a DTD, it can still be checked for well-formedness
XML includes extensible linking and style formatting also

“Node content must be left free to evolve.” [Tim Berners-Lee, 1991, “About Document Formats”]


Future Evolution of XML

Coevolution of HTML and XML
    All XML is SGML
    All HTML is SGML
    But not all HTML is XML...
    The HTML 5 effort is aiming in this direction
    XML-in-HTML offers several near-term solutions
XML profiles SGML:
    No Markup Minimization
    No Optional Features
    Technical Corrigendum harmonizes SGML86


Scorecard… 1/4

Organizations
W3C: World Wide Web Consortium
    Corporate members sponsored (and trademarked) XML™
ISO: International Organization for Standardization
    Sovereign states sponsored SGML
GCA: Graphic Communications Association
    Printing industry folks sponsored global *ML conferences


Scorecard… 2/4

XML Coordination Group
    Initiation & Oversight of XML Activity, public contact
    Chair: Jon Bosak (Sun)
XML Schema WG
    Because grammar rules aren’t enough
    Chairs: Dave Hollander (HP), C.M. Sperberg-McQueen (UIUC)
XML Information Set WG
    Abstract representation of parsed XML entities
    Chair: David Megginson (Microstar)


Scorecard… 3/4

XML Fragment WG
    How to clip out a single span of XML, in context
    Chair: Paul Grosso (ArborText)
XML Syntax WG
    Maintains the core specification
    Style sheet linking, canonicalization, profiling
    Chairs: Tim Bray (Textuality) and Joel Nava (Adobe)
Extensible Stylesheet Language (XSL) WG
    A language for transforming XML documents
    An XML vocabulary for formatting semantics


Scorecard… 4/4

Document Object Model (DOM) WG
    Scripting hooks for HTML/XML in browsers
    Lauren Wood (SoftQuad)
Cascading Style Sheets (CSS) WG
    Declarative formatting directives for HTML/XML
    Leaders: Håkon Lie, Chris Lilley, Bert Bos (W3C/Inria)
Resource Description Framework (RDF) WG
    Meta-metadata framework; knowledge representation
    Chairs: Ora Lassila (Nokia), Ralph Swick (W3C/MIT)


The Origin of (Document) Species

Presented at WWW7 (Brisbane)


The Origin of (Document) Species

The Document Ecology
Evolutionary Adaptations of:
    Syntax      SGML
    Style       CSS/XSL
    Structure   HTML
    Semantics   XML
The Fossil Record


The Document Ecology

The World Wide Web is the “universe of network-accessible information” [Tim Berners-Lee, 1996]
Openness and Content-Neutrality of Documents
    HTTP can adapt to any document format
    URL can represent links to any document format, from within many
“Natural Selection” Favors a Few Document Formats
    Preferential adoption of SGML, CSS, HTML, and now XML
    Each embodies the evolutionary strategy of parsimony
Evolution: Capture Info --> Represent Knowledge
    Can leverage Web reflexively to capture structure and semantics
    XML-based document formats represent an ecosystem of interdependent (rather than competing) document “species”


Evolution: Syntax

Issues of Concrete Representation
    Binary (machine) vs. Text (human) formats
    Mission-specific vs. Generic formats
    Context-free vs. Turing-complete formats
From Turing-Complete to Declarative

            Context-Free        Turing-Complete
            Text    Binary      Text            Binary
Specific    MIF     Dump        JavaScript      Intel x86
General     SGML    ASN.1       UNIX Scripts    COFF


Evolution: Style

Externalized Formatting over Embedded Directives
    {\keepn\par\pard\sb240\sl-264 \b1\hyppar0 Warning: do not …}
    <P CLASS="WARNING"> …
        WARNING { font : bold } ...
    <?XML-stylesheet TYPE="text/css" HREF="warning.css"?> <WARNING> …
        WARNING { font: bold } ...
Cascading to eXtensible Style Sheets
Rendering: displays, Braille, audio, ...
From Inline Formats to Style Sheets


Evolution: Structure

Anatomy of a Newspaper Article
    Logical: headline, byline, body, footer, …
    Descriptive: bold, italic, indented, …
    Declarative: title, address, keyboard-input, …
    Custom declarative: <dateline>, <byline>, ...
Automatic Information Collection
    Resource Discovery: For providing useful hints to search engines
    Classifying: For cataloguing information content and relationships
    Content Rating: For aggregation and filtering
    Knowledge Codifying, Sharing, and Exchanging: For processing
From Presentational to Declarative


Evolution: Semantics

How well does the document support the potential uses of its contents?
Scenario: To-Do List Manager
    Natural-language Scratchpad
    HTML Definition List (Datebook)
    XML <DEADLINE AT="iso-date"> Element
Composable, Networked DTDs
    Reuse by Linking
    Namespaces
From Operational to Well-defined
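The to-do scenario can be made concrete with a small sketch: once a deadline is an element with an ISO date attribute, software can sort and compare deadlines instead of guessing at prose. The sketch assumes Python's standard library; the <TODO> and <ITEM> wrapper elements and the sample dates are hypothetical, and only <DEADLINE AT="..."> comes from the slide.

```python
from datetime import date
import xml.etree.ElementTree as ET

# <TODO> and <ITEM> are hypothetical wrapper elements; the slide names only
# <DEADLINE AT="iso-date">. The dates are sample values.
TODO = """<TODO>
  <ITEM>Submit paper <DEADLINE AT="1999-03-15"/></ITEM>
  <ITEM>Finish WACC slides <DEADLINE AT="1999-02-22"/></ITEM>
</TODO>"""

def deadlines(xml_text):
    """Return (task text, deadline date) pairs, earliest deadline first."""
    pairs = []
    for item in ET.fromstring(xml_text).findall("ITEM"):
        when = date.fromisoformat(item.find("DEADLINE").get("AT"))
        pairs.append((item.text.strip(), when))
    return sorted(pairs, key=lambda pair: pair[1])

for task, when in deadlines(TODO):
    print(when.isoformat(), task)
```

The same information in a natural-language scratchpad or an HTML definition list would need ad hoc text matching to recover the date.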


Citation Encodings 1/2

Presentational Text:
    “XML, Java, and the Future of the Web”, Jon Bosak, World Wide Web Journal, 2(4):219--228, (1997)
    J. Bosak, World Wide Web Journal, “XML, Java, and the Future of the Web”, Autumn 1997, Vol. 2, No. 4, pp. 219--228.

Presentational HTML:
    <UL>XML, Java, and the Future of the Web</UL>, <I>World Wide Web Journal</I>, Jon Bosak, <B>2(4):219-228</B>, 1997.

Structural HTML:
    <CITE>XML, Java, and the Future of the Web</CITE>
    <H3>World Wide Web Journal</H3>
    <H4>Jon Bosak</H4>
    <UL> <LI> 2(4):219-228 <LI> 1997 </UL>


Citation Encodings 2/2

Customized XML:
    <BIB>
      <TITLE>XML, Java, and the Future of the Web</TITLE>
      <JOURNAL>World Wide Web Journal</JOURNAL>
      <AUTHOR> <FIRSTNAME>Jon</FIRSTNAME> <LASTNAME>Bosak</LASTNAME> </AUTHOR>
      <VOLUME>2</VOLUME> <NUMBER>4</NUMBER>
      <YEAR>1997</YEAR> <PAGES>219-228</PAGES>
    </BIB>

RDF Metadata in XML, using schemas from the Web:
    <rdf:Description
        about="http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm"
        xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
        xmlns:dc="http://purl.org/metadata/dublin_core#"
        xmlns:bib="http://library.org/bibliography-info#">
      <dc:Title>XML, Java, and the Future of the Web</dc:Title>
      <bib:Journal href="http://www.w3j.com/">World Wide Web Journal</bib:Journal>
      <dc:Creator>Jon Bosak</dc:Creator>
      <bib:Volume>2</bib:Volume>
      <bib:Number>4</bib:Number>
      <bib:Year>1997</bib:Year>
      <bib:Pages>219-228</bib:Pages>
    </rdf:Description>
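One way to see what the customized encoding buys: both presentational citation styles can be generated from the single <BIB> record, whereas going the other way requires guessing. The following is a minimal sketch using Python's standard library; the function names are illustrative, and the second style prints the year where the slide's sample shows "Autumn 1997", since the record carries no season.

```python
import xml.etree.ElementTree as ET

BIB = """<BIB><TITLE>XML, Java, and the Future of the Web</TITLE>
<JOURNAL>World Wide Web Journal</JOURNAL>
<AUTHOR><FIRSTNAME>Jon</FIRSTNAME> <LASTNAME>Bosak</LASTNAME></AUTHOR>
<VOLUME>2</VOLUME> <NUMBER>4</NUMBER>
<YEAR>1997</YEAR> <PAGES>219-228</PAGES></BIB>"""

def fields(xml_text):
    """Flatten the <BIB> record into a dict keyed by lowercase field name."""
    bib = ET.fromstring(xml_text)
    names = ["TITLE", "JOURNAL", "VOLUME", "NUMBER", "YEAR", "PAGES"]
    record = {name.lower(): bib.findtext(name) for name in names}
    record["first"] = bib.findtext("AUTHOR/FIRSTNAME")
    record["last"] = bib.findtext("AUTHOR/LASTNAME")
    return record

def title_first(r):
    """Render in the order of the slide's first presentational example."""
    return (f'"{r["title"]}", {r["first"]} {r["last"]}, {r["journal"]}, '
            f'{r["volume"]}({r["number"]}):{r["pages"]}, ({r["year"]})')

def author_first(r):
    """Render in the order of the slide's second presentational example."""
    return (f'{r["first"][0]}. {r["last"]}, {r["journal"]}, "{r["title"]}", '
            f'{r["year"]}, Vol. {r["volume"]}, No. {r["number"]}, pp. {r["pages"]}.')

record = fields(BIB)
print(title_first(record))
print(author_first(record))
```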


The Fossil Record

Document Format    Syntax                 Style                  Structure                Semantics
ASN.1              Binary                 -                      Type-Length-Value        Per-application
Text               ASCII, Unicode...      Lines                  -                        Natural Language
troff              Readable Text          Inline Directives      Sections, Pages          Typesetting
TeX                Readable Program       LaTeX                  Sections, Pages          Typesetting
PostScript         Programming Language   -                      Pages                    Drawing
Rich Text Format   Opaque Text            Extensible Directives  Characters, Paragraphs   -
HTML formatting    Readable Text          Nested Directives      Presentational           -
HTML structure     Readable Text          CSS                    Declarative              Fixed (e.g., <ADDRESS>)
XML                Readable Text          CSS, XSL               Declarative              Extensible
PICS               S-expressions          -                      Ratings                  Metadata Schema
RDF                XML Text               -                      Declarative              Metadata Schema

(- marks an empty cell)


XML: Specifications

Details, Details, Details…


XML 1.0 Syntax

Self-description with Document Type Defn (DTD):
    Structural rules of the document’s markup
    External entities, internal entities, non-XML resource entities
Non-Minimization (All Markup is Explicit)
XML Processor - customizable info structures
    Most XML users will not know they are using a DTD!
    XML-aware software works with custom XML apps without help
Document validity = conforms to DTD
Document well-formedness = structurally sound, but no DTD
    Well-formedness for delivery format, validity for authoring


XML 1.0: Markup Types

Elements - start-tags, end-tags, empty tags
    <joke>Take my XML. Please.<applause/></joke>
Attributes - name-value pairs in tags
    <warning class="emergency">
Entity References - special chars, macros, external
Comments - not passed along to application
    <!-- No dash pair before end of comment. -->
Processing Instructions - passed along to application
    <?pi-target-name pi-data?>
CDATA Sections - parser ignores markup
    <![CDATA[ *p = &q; b = (i <= 3); ]]>
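To illustrate how these markup types reach an application, here is a hedged sketch using Python's standard xml.etree.ElementTree parser. The <show> and <code> wrapper elements and the text inside <warning> are assumptions added so that the slide's fragments form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The <show> and <code> wrappers and the "Stand back" text are assumptions;
# the remaining fragments are taken from the slide.
DOC = """<show>
  <joke>Take my XML. Please.<applause/></joke>
  <warning class="emergency">Stand back</warning>
  <!-- No dash pair before end of comment. -->
  <code><![CDATA[*p = &q; b = (i <= 3);]]></code>
</show>"""

root = ET.fromstring(DOC)

joke = root.find("joke")
joke_text = joke.text                              # character data before <applause/>
has_applause = joke.find("applause") is not None   # empty-element tag
warning_class = root.find("warning").get("class")  # attribute value
code_text = root.find("code").text                 # CDATA content, delivered as plain text
child_count = len(list(root))                      # the comment is not among the children

print(joke_text, has_applause, warning_class, code_text, child_count, sep="\n")
```

With the default tree builder the comment is dropped, which matches the slide's note that comments are not passed along to the application.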


XML-1.0: DTDs 1/2

Element Declarations - name and content model
    <!ELEMENT email (from, to+, cc*, subject, body, sig?) >
    <!ELEMENT body (#PCDATA | image)* >
    <!ELEMENT image EMPTY >
    <!ELEMENT node (desc, node*)>
    <!ELEMENT desc (#PCDATA)>

Attribute Declarations - which elements may have what attributes; default and possible values
    <!ATTLIST joke
        name   ID                   #REQUIRED
        label  CDATA                #IMPLIED
        status ( funny | notfunny ) "funny" >
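A content model is essentially a regular expression over child element names. The sketch below makes that reading explicit for the email declaration; it is an illustration only, not a DTD validator (Python's ElementTree is non-validating), and the function and variable names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model from the slide: (from, to+, cc*, subject, body, sig?)
# rewritten as a regular expression over space-terminated child names.
EMAIL_MODEL = re.compile(r"from (to )+(cc )*subject body (sig )?")

def matches_email_model(xml_text):
    """Return True if the root's children follow the email content model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return EMAIL_MODEL.fullmatch(child_names) is not None

good = "<email><from/><to/><to/><subject/><body/></email>"
bad = "<email><from/><subject/><body/></email>"   # missing the required <to>

print(matches_email_model(good), matches_email_model(bad))
```

The operators map directly: a comma is sequence, + is one or more, * is zero or more, and ? is optional.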


XML-1.0: DTDs 2/2

Attribute Declarations - name, type, default
    Attribute types: CDATA, ID, IDREF/IDREFS, ENTITY/ENTITIES, NMTOKEN/NMTOKENS, a list of names
    Default values: #REQUIRED, #IMPLIED, "value", #FIXED "value"

Entity Declarations - associate name with chunk
    Internal entities: &lt; &gt; &amp; &apos; &quot;
        or locally defined: &fork; <!ENTITY fork "4K Consulting">
    External entities: <!ENTITY forkfooter SYSTEM "fork/footer.xml">
        <!ENTITY pic SYSTEM "http://4k.org/4k.gif" NDATA GIF87A>
    Parameter entities: <!ENTITY % html.ver "-//W3C//DTD HTML 4.0//EN">

Notation Declarations - external binary data
    <!NOTATION GIF87A SYSTEM "GIF">
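As a small illustration of internal entities, the sketch below declares the slide's &fork; entity in an internal DTD subset and lets the parser expand it alongside the predefined entities. The <card> element is a hypothetical wrapper; the parser is Python's standard ElementTree, which does not fetch external entities.

```python
import xml.etree.ElementTree as ET

# <card> is a hypothetical element; the fork entity is the one on the slide.
DOC = """<?xml version="1.0"?>
<!DOCTYPE card [
  <!ENTITY fork "4K Consulting">
]>
<card>&fork; &lt;info&gt; &amp; more</card>"""

card = ET.fromstring(DOC)

# Both the locally defined entity and the predefined ones are replaced
# before the application sees the text.
print(card.text)
```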


XML: Namespaces

Basic concept: element names and attribute names can be viewed as URIs
    But disagreement over the details took up most of 1997

Use the reserved xmlns: attribute to introduce a new namespace name, optionally with a prefix
    <?xml version="1.0"?>
    <!-- both namespace prefixes are available throughout -->
    <bk:book xmlns:bk='urn:loc.gov:books'
             xmlns:isbn='urn:ISBN:0-395-36341-6'>
        <bk:title>Cheaper by the Dozen</bk:title>
        <isbn:number>1568491379</isbn:number>
    </bk:book>
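The "names as URIs" idea is visible in how a namespace-aware parser reports names. A minimal sketch with Python's standard ElementTree follows; the short prefixes b and i in the lookup table are arbitrary local choices, which is the point: only the URI matters, not the prefix used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
    <bk:title>Cheaper by the Dozen</bk:title>
    <isbn:number>1568491379</isbn:number>
</bk:book>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as {namespace-uri}local-name.
print(root.tag)

# Prefixes used for lookup are local to this program.
ns = {"b": "urn:loc.gov:books", "i": "urn:ISBN:0-395-36341-6"}
title = root.find("b:title", ns).text
number = root.find("i:number", ns).text
print(title, number)
```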


XLink : ‘Real’ Hypertext Links

Start with HTML, add HyTime and TEI concepts
Simple links point to a single target resource
    <A XML-LINK="SIMPLE" HREF="http://www.w3.org/XML">
    URL schemes: ftp, http, file, mailto, telnet, nntp, ...
Links can have roles (machine-processible) and human-readable labels associated with them
Can specify default behavior of a link
    SHOW - embed in current context, replace it, or start new one
    ACTUATE - user must take action (or not) before dereferencing
Locators - # separates resource name and part id
Connectors - | to navigate to relevant element(s) only
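The locator syntax can be sketched as a simple split on the first connector character. This is only an illustration of the syntax described on the slide, not an implementation of the XLink draft; the function name and the sample part ids (intro, act1) are hypothetical.

```python
def split_locator(locator):
    """Split a locator into (resource name, connector, part id).

    The connector is '#' or '|', whichever appears first; a locator
    without a connector is returned whole with empty connector and part.
    """
    positions = [locator.find(c) for c in "#|" if c in locator]
    if not positions:
        return locator, "", ""
    cut = min(positions)
    return locator[:cut], locator[cut], locator[cut + 1:]

print(split_locator("http://www.w3.org/XML#intro"))
print(split_locator("Othello.xml|act1"))
print(split_locator("Macbeth.xml"))
```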


XLink: Advanced

Extended links - can be multidirectional, need not live in the resources they point to, link groups
    <related-term-group>Hamlet
        <related-term HREF="Othello.xml"/>
        <related-term HREF="KingLear.xml"/>
        <related-term HREF="Macbeth.xml"/>
    </related-term-group>

    <!ELEMENT related-term-group (#PCDATA | related-term)* >
    <!ATTLIST related-term-group
        XML-LINK     CDATA #FIXED "EXTENDED"
        INLINE       CDATA #FIXED "TRUE"
        CONTENT-ROLE CDATA #FIXED "RT" … >

Extended Pointers - locate resource by traversing the element tree of its containing document
    XPointers allow links without modifying the containing document
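An application reading the extended link above might collect the anchor text and every target it relates to. The following sketch does this with Python's standard ElementTree; it treats the link purely as data and does not apply the fixed attributes from the DTD, since ElementTree does not read attribute defaults from a DTD.

```python
import xml.etree.ElementTree as ET

DOC = """<related-term-group>Hamlet
<related-term HREF="Othello.xml"/>
<related-term HREF="KingLear.xml"/>
<related-term HREF="Macbeth.xml"/>
</related-term-group>"""

group = ET.fromstring(DOC)

anchor_text = group.text.strip()                                  # the inline content
targets = [t.get("HREF") for t in group.findall("related-term")]  # one locator per arm

print(anchor_text, "->", targets)
```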


XSL: “Behavior Sheets”

CSS1 - formatting rules in terms of element names, IDs
XSL - based on DSSSL (ISO/IEC 10179:1996)
    Formatting specification derived from active style sheets
    With XML doc structure, create flow objects (paragraphs, tables, ...) with characteristics (font-name, font-size, …)
    Merging, flow object tree determines document layout
    Can calculate tables-of-contents, indexes, other scripts
Scheme - core expression (math) language in XSL
Construction rules declare element style
    font-size:18pt, first-line-start-indent:20pt, quadding:left
Typically, formatting shorthand declared as functions
    (element CODE (UNDERLINED-PHRASE))


Capturing the State of Distributed Systems

July 1997 IEEE Internet Computing
Fall 1997 Web Journal


Data Archaeology

Meishi, or Business Cards
    Different shapes, sizes, scripts, demarcations
    Two sided, magnetic, photos, public keys
Airline Passenger Name Records
    “NQSS5A” means something to airline
    Must be manipulated throughout its lifetime
Need stable data format, stable grain of exchange, and common definitions


Every Bit in XML

Across Time
    Save Self-Description with Data State
    “Future Proofing” of Documents and Data
Across Space
    For Exchange as well as Storage of Data
Across Organizations
    Building Community-Specific Ontologies
    Key to Knowledge Representation


Across Time

Tensions Create Brittle Data Formats
    Inertia, Efficiency, Tools, and Context
Future-Proofing Strategies
    Machine-Readable - for parsers & generators
    Human-Readable - robustness and simplicity
    Self-Descriptive - for extraction and validation
XML as Basis to Execute Strategies
    e.g., Capturing Database Schema as DTDs


Across Space

Tradeoffs when Marshaling Data
    Distributed Systems with Centralized Data
    Need: Decentralized, Isolated Subsystems
Strategy: Defer Marshaling Decisions
    The Web’s Lesson: Link Instead
    Download Networked Resources as Needed
Leveraging the XLink Model
    SHOW and ACTUATE; XPointer identifiers


Across Cultures

Challenges to Collaboration
    Organizations Defined by their Language
    Ontological Problem: Matching Vocabulary
Strategy: Use Documents
    “Put It In Writing”
    Defining Common Terms
    Popular Ontologies Can Emerge Organically
Solution: XML-enhanced Documents
    Let a Thousand DTDs Bloom on the Web


XML Example Applications

Mathematical Markup Language    MathML
Chemical Markup Language        CML
DNA Sequences                   BSML
Channel/Site Manifests          CDF, MCF
Vector Graphics                 VML, PGML
Synchronous Multimedia          SMIL
Web Form Automation             WIDL
Form Transaction Records        XFDL
Schema Description              RDF


Some Software Tools

JSXML - Java Object Stream to XML packages
    http://www.camb.opengroup.org/~laforge/jsxml/
Lark - a non-validating XML parser in Java
    http://www.textuality.com/Lark/
MSXML - Microsoft’s XML parser in Java
    http://www.microsoft.com/xml/
NXP - a validating parser in Java
    http://www.edu.uni-klu.ac.at/~nmikula/NXP/
Many, many others
    http://www.cs.caltech.edu/~adam/local/xml.html


XML: Recommended Reading

XML Books:
    Richard Light, Presenting XML, Sams.Net, August 1997.
    Web Journal - XML issue, O’Reilly, Fall 1997
        http://www.w3j.com/xml/
XML Papers from Khare/Rifkin:
    X Marks the Spot: An Introduction to XML
    The Ascent of XML (with Dan Connolly)
    Capturing the State of Distributed Systems with XML
    The Origin of (Document) Species
        http://www.cs.caltech.edu/~adam/papers/www
XML Links Galore
    http://www.cs.caltech.edu/~adam/local/xml.html
Robin Cover’s Extensive XML Page
    http://www.sil.org/sgml/xml.html
Peter Flynn’s XML FAQ
    http://www.ucc.ie/xml/
XML-Dev Jewels
    http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html

This talk is at http://www.ics.uci.edu/~rohit/wacc99/xml


Resource Description Framework

Model, Syntax, and Schema Specifications

Rohit Khare
UC Irvine
4K Associates


Metadata about the RDF Spec

<rdf:Description about=""
    xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:dc="http://purl.org/metadata/dublin_core#"
    xmlns:ddc="http://purl.org/net/ddc#"
    dc:Title="Resource Description Framework (RDF) Model and Syntax Specification"
    dc:Description="The Resource Description Framework (RDF) is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources."
    dc:Publisher="World Wide Web Consortium"
    dc:Date="1998-08-19"
    dc:Format="text/html"
    dc:Type="technical specification"
    dc:Language="eng">
  <dc:Subject resource="http://purl.org/net/ddc/025.30285"
      ddc:Class="025.30285"
      ddc:Heading="data processing computer applications" />
  <dc:Subject resource="http://purl.org/net/ddc/025.316"
      ddc:Class="025.316"
      ddc:Heading="Machine-readable catalog record formats" />
  <dc:Subject
      ddc:Class="025.302855741"
      ddc:Heading="Applications of computer file organization and access methods" />
  <dc:Creator>
    <rdf:Bag rdf:_1="Ora Lassila" rdf:_2="Ralph Swick" />
  </dc:Creator>
</rdf:Description>


Dissecting the label 1/3

A set of statements about this object (the spec):
    <rdf:Description about=""

Introducing three vocabularies to describe it:
    Basic RDF
        xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
    Dublin Core
        xmlns:dc="http://purl.org/metadata/dublin_core#"
    Dewey Decimal Code
        xmlns:ddc="http://purl.org/net/ddc#"

(xmlns: prefixes are used to define new XML attributes and tags)


Dissecting the label 2/3

Using Dublin Core, as attributes of Description
    dc:Title="Resource Description Framework..."
    dc:Description="(RDF) is a foundation for..."
    dc:Publisher="World Wide Web Consortium"
    dc:Date="1998-08-19"
    dc:Format="text/html"
    dc:Type="technical specification"
    dc:Language="eng">

...and as an element, using an RDF bag:
    <dc:Creator>
        <rdf:Bag rdf:_1="Ora Lassila" rdf:_2="Ralph Swick" />
    </dc:Creator>


Dissecting the label 3/3

... and as a repeated Dublin Core element, with Dewey Decimal Code attributes:
    <dc:Subject resource="http://purl.org/net/ddc/025.30285"
        ddc:Class="025.30285"
        ddc:Heading="data processing computer applications"/>
    <dc:Subject resource="http://purl.org/net/ddc/025.316"
        ddc:Class="025.316"
        ddc:Heading="Machine-readable catalog record formats"/>
    <dc:Subject
        ddc:Class="025.302855741"
        ddc:Heading="Applications of computer file organization and access methods"/>

Finally, returning to the original HTML head:
    </rdf:Description>
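The dissection above can be repeated mechanically. The sketch below parses an abbreviated copy of the label with Python's standard ElementTree and separates the three vocabularies by namespace URI. It is a plain XML reading under the stated assumptions, not an RDF processor; the abbreviation of the label and all variable names are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/TR/WD-rdf-syntax#"
DC = "http://purl.org/metadata/dublin_core#"
DDC = "http://purl.org/net/ddc#"

# Abbreviated copy of the label: fewer attributes and a single Subject.
LABEL = f"""<rdf:Description about=""
    xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:ddc="{DDC}"
    dc:Publisher="World Wide Web Consortium"
    dc:Date="1998-08-19"
    dc:Format="text/html">
  <dc:Subject resource="http://purl.org/net/ddc/025.316"
      ddc:Class="025.316"
      ddc:Heading="Machine-readable catalog record formats"/>
  <dc:Creator>
    <rdf:Bag rdf:_1="Ora Lassila" rdf:_2="Ralph Swick"/>
  </dc:Creator>
</rdf:Description>"""

desc = ET.fromstring(LABEL)

# Namespaced attributes appear under "{namespace-uri}local-name" keys.
dc_prefix = "{" + DC + "}"
dublin_core = {key[len(dc_prefix):]: value
               for key, value in desc.attrib.items()
               if key.startswith(dc_prefix)}

# Repeated Dublin Core element carrying Dewey Decimal attributes.
subject_classes = [s.get("{" + DDC + "}Class")
                   for s in desc.findall(dc_prefix + "Subject")]

# Dublin Core element whose value is an RDF bag of two members.
bag = desc.find(dc_prefix + "Creator").find("{" + RDF + "}Bag")
creators = [bag.get("{" + RDF + "}_" + str(i)) for i in (1, 2)]

print(dublin_core, subject_classes, creators, sep="\n")
```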


Introduction to RDF

The label we just dissected is critical to the Web's future
    Beyond machine-readable to machine-understandable
The RDF effort unites a wide array of players
    Digital librarians, content-raters, privacy advocates, ...
    Significant industrial momentum, led by W3C
1. The Data Model
    Resources, properties, and statements
2. The Syntax
    Rendering into XML with Namespaces
3. The RDF Schema
    Using RDF to describe new vocabularies
Implications & Applications


Why metadata matters...

Automated processing of Web resources:
    Resource discovery, cataloging
    Content rating (PICS)
    Collections of pages (Sitemaps)
    Security & Privacy (P3P, DSIG)
    Intelligent software agents
Sharing data between multiple applications and organizations requires explicit definitions

XML enables processing; RDF enables understanding


The RDF Working Group

Co-chaired by Ora Lassila & Ralph Swick
Chartered in the Technology & Society Domain
First draft was published in August 1997
Represents many communities:
    Web Standardization: HTML Meta, PICS
    Library: Dublin Core, Warwick Framework
    Structured Documents: SGML, XML
    Knowledge Representation: KIF
Significant Industrial Momentum
    Ex: “What’s Related” button in Netscape Navigator...


1. The Data Model

Resources
    Any URI reference, from a fragment to a site.
Property Types
    Named type defines meaning, permitted values, and relationship to other types. (Types are also resources)
Statements
    “Resource has Property with Value” (Values can be resources or atomic XML data)


A Trivial Example

Sentence
    “Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila”
Structure
    Resource        http://www.w3.org/Home/Lassila
    Property type   Creator
    Value           "Ora Lassila"
Directed acyclic graph


An Indirect Example

To add properties to Creator, point through a (possibly anonymous) intermediate Resource.


Collection Containers

Multiple occurrences of the same PropertyType do not establish a relation between the values
    The Millers own a boat, a bike, and a TV set
    The Millers need (a car or a truck)
    (Sarah and Bob) bought a new car
RDF defines three special Resources:
    Bag          unordered values    rdf:Bag
    Sequence     ordered values      rdf:Seq
    Alternative  single value        rdf:Alt
Core RDF does not enforce ‘set’ semantics amongst values


Example: Bag

The students in course 6.001 are Amy, Tim, John, Mary, and Sue

instanceOf: has been renamed type:


Example: Alternative

The source code for X11 may be found at ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net
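The Bag and Alternative examples share one shape: an intermediate container resource with a type and numbered member properties (the rdf:_1, rdf:_2 pattern seen in the label earlier). The sketch below writes that shape as plain tuples in {property, resource, value} order. The container ids (#students, #sources), the property names students and source, and the use of the bare host names as values are all hypothetical choices for illustration.

```python
def container(kind, container_id, members):
    """Return (property, resource, value) triples describing a container.

    kind is "rdf:Bag", "rdf:Seq" or "rdf:Alt"; members are numbered from 1.
    """
    triples = [("rdf:type", container_id, kind)]   # "instanceOf" in older drafts
    for index, member in enumerate(members, start=1):
        triples.append((f"rdf:_{index}", container_id, member))
    return triples

# Bag: the students in course 6.001 (order carries no meaning).
bag = [("students", "course 6.001", "#students")]
bag += container("rdf:Bag", "#students", ["Amy", "Tim", "John", "Mary", "Sue"])

# Alternative: any one of the listed sites will do.
alt = [("source", "X11", "#sources")]
alt += container("rdf:Alt", "#sources",
                 ["ftp.x.org", "ftp.cs.purdue.edu", "ftp.eu.net"])

for triple in bag + alt:
    print(triple)
```

The single statement about the course points at the container, so the five names are related to each other as one value rather than being five independent statements.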


Reification

Making statements about statements requires a process for transforming them into Resources:
  propObj: the original referent
  propName: the original property type
  value: the original value
  type: the type of this resource

Reified statements are themselves RDF:Property
Collections are also built-in RDF types

Distributive Referents
  Referring to a resource vs. its members (aboutEach)


Example: Reification

Ralph Swick says that Ora Lassila is the creator of the resource

http://www.w3.org/Home/Lassila


Recap: A Formal Model of RDF

RDF itself is mathematically straightforward:
  Definitions
  Typing
  Reification
  Collections

... though the mapping onto XML syntax (and XML’s formal model) is less so...


Formal Model: Definitions

1. There is a set called Nodes
2. There is a subset of Nodes called PropertyTypes
3. There is a set of 3-tuples called Triples {p,r,v}, where p is a member of PropertyTypes, r is a member of Nodes, and v (called value) is either a member of Nodes or an atomic value

A triple can be read as:
  “v is the value of p for r”
  “r has a property p with a value v”
  “the p of r is v”
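As a minimal sketch of definitions 1 to 3, the sets can be written down directly in Python. The variable and function names here (`nodes`, `property_types`, `is_well_formed`, `read_aloud`) are illustrative assumptions, not part of any RDF specification or library.

```python
# Hypothetical names throughout; this only mirrors definitions 1-3 above.
nodes = {"http://www.w3.org/Home/Lassila"}
property_types = {"Creator"}
nodes |= property_types  # PropertyTypes is a subset of Nodes

# Each triple is (p, r, v): property type, resource node, value.
triples = {("Creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")}


def is_well_formed(triple):
    """p must be a PropertyType and r a Node; v may be a Node or an atomic value."""
    p, r, _v = triple
    return p in property_types and r in nodes


def read_aloud(triple):
    """Render a triple using the third reading, "the p of r is v"."""
    p, r, v = triple
    return f"the {p} of {r} is {v}"
```

Keeping v unconstrained in `is_well_formed` reflects the definition: a value is either another node or an atomic value such as a string.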


Formal Model: Typing

4. There is an element of PropertyTypes known as RDF:instanceOf.
5. Members of Triples of the form {RDF:instanceOf, r, v} imply r and v are members of Nodes.

[RDFSchema] places additional restrictions on the use of instanceOf.


Formal Model: Reification

6. There is an element of Nodes, not contained in PropertyTypes, known as RDF:Property.
7. There are three elements in PropertyTypes known as RDF:propName, RDF:propObj and RDF:value.
8. Reification of a triple {p,r,v} of Triples is: an element n of Nodes representing the reified triple; and four new elements of Triples:
  {RDF:propName, n, p}
  {RDF:propObj, n, r}
  {RDF:value, n, v}
  {RDF:instanceOf, n, [RDF:Property]}
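Rule 8 can be sketched as a small function that mints a new node and emits the four triples. The node-naming scheme (`_:stmt1`, ...) and the `attributedTo` property type are made up for illustration; only the four RDF: names come from the slide.

```python
import itertools

_ids = itertools.count(1)


def reify(triple):
    """Return (n, new_triples) for a triple (p, r, v), following rule 8."""
    p, r, v = triple
    n = f"_:stmt{next(_ids)}"  # hypothetical naming scheme for the new node
    return n, {
        ("RDF:propName", n, p),
        ("RDF:propObj", n, r),
        ("RDF:value", n, v),
        ("RDF:instanceOf", n, "RDF:Property"),
    }


statement = ("Creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")
node, reified = reify(statement)

# A statement about the statement, as in "Ralph Swick says that ...".
# "attributedTo" is an invented property type, not an RDF term.
reified.add(("attributedTo", node, "Ralph Swick"))
```

Because the reified statement is itself a node, further properties attach to it exactly as they would to any other resource.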


Formal Model: Collections

9. There are three elements of Nodes, not contained in PropertyTypes, known as RDF:Seq, RDF:Bag, and RDF:Alt.
10. There is a subset of PropertyTypes corresponding to the ordinals, called Ord. Elements of Ord are referred to as RDF:_1, RDF:_2, ...

There must always be at least one value for RDF:Alt (RDF:_1 is the default)
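One way to picture rules 9 and 10 is to expand a collection into triples, using the ordinal property types as membership properties. The function name `make_collection` and the node label `_:students` are assumptions for this sketch.

```python
KINDS = {"RDF:Seq", "RDF:Bag", "RDF:Alt"}


def make_collection(node, kind, members):
    """Expand a collection into triples: one typing triple plus RDF:_1, RDF:_2, ..."""
    if kind not in KINDS:
        raise ValueError(f"unknown collection type: {kind}")
    if kind == "RDF:Alt" and not members:
        raise ValueError("RDF:Alt needs at least one member (RDF:_1 is the default)")
    triples = [("RDF:instanceOf", node, kind)]
    for i, member in enumerate(members, start=1):
        triples.append((f"RDF:_{i}", node, member))
    return triples


# The Bag example: the students in course 6.001.
students = make_collection("_:students", "RDF:Bag", ["Amy", "Tim", "John", "Mary", "Sue"])
```

The ordinals are assigned even for a Bag; the collection type, not the numbering, tells a consumer whether order is meaningful.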


2. The Syntax

Why XML alone does not suffice
Basic RDF-in-XML Syntax
Abbreviated Forms
Distributing RDF metadata
Collected BNF Grammar
Formal mapping to XML


Why XML alone does not suffice

XML can already handle new PropertyTypes:
  <A HREF="RDF-intro">I liked <DC:Creator>Ora Lassila</DC:Creator>’s RDF introduction</A>!
  Just declare a new element or attribute!

... but “raw XML” fails in two ways:
  Interchange: Namespaces only identify new tags; DTD semantics do not provide types or composition
  Scalability: Processing generic XML requires parsing text (“the entity tax”); and the order of XML elements is considered significant, requiring the whole graph.


Basic RDF-in-XML Syntax

A Description block about a Resource
  [1] RDF ::= ['<rdf:RDF>'] description* ['</rdf:RDF>']
  [2] description ::= '<rdf:Description' idAboutAttr? '>' property* '</rdf:Description>'
  [3] idAboutAttr ::= idAttr | aboutAttr
  [4] aboutAttr ::= 'about="' URI-reference '"'
  [5] idAttr ::= 'ID="' IDsymbol '"'

contains PropertyName block or empty elements
  [6] property ::= '<' propName '>' value '</' propName '>'
                 | '<' propName resourceAttr '/>'

using fully-qualified XML Namespaces on each tag
  [7] propName ::= Qname

and allows values to be XML data, RDF, or external links
  [8] value ::= description | string
  [9] resourceAttr ::= 'resource="' URI-reference '"'


Example: Basic RDF-in-XML

“Ora Lassila is the Creator of the resource...”

<rdf:RDF>
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>

(where s: is a separately-declared namespace)
  xmlns:s="http://description.org/schema/"

So the complete, valid XML document would be:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
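To show how such a document turns back into {p,r,v} triples, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It handles only the basic syntax (productions [1] to [9]), and forming the property URI by concatenating namespace and local name is an assumption of this sketch; `basic_triples` is a hypothetical helper, not an RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def basic_triples(xml_text):
    """Extract (p, r, v) triples from basic-syntax Description blocks."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        r = desc.get("about") or desc.get("ID")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            ns, _, local = prop.tag[1:].partition("}")
            v = prop.get("resource") or (prop.text or "").strip()
            out.append((ns + local, r, v))
    return out
```

Nested Description values, containers, and the abbreviated forms are deliberately left out to keep the sketch short.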


Abbreviated Forms

XML Namespace defaulting can shorten that:

<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#">
  <Description about="http://www.w3.org/Home/Lassila">
    <Creator xmlns="http://description.org/schema/">Ora Lassila</Creator>
  </Description>
</RDF>

(but such aggressive elision is officially discouraged)

RDF itself offers 3 abbreviations:
  String values as XML attributes
  Nested descriptions as XML attributes
  instanceOf: property types as XML element names


RDF Abbreviation Rules

Non-repeated, string-valued properties can fold into attributes; Description can be empty
  <rdf:Description about="http://www.w3.org/Home/Lassila"
                   s:Creator="Ora Lassila" />

Simple Description-valued properties, too:
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator resource="http://www.w3.org/staffId/85740"
               v:Name="Ora Lassila"
               v:Email="[email protected]" />
  </rdf:Description>

PropertyTypes can be promoted to elements
  <rdf:Description about="http://www.w3.org/staffId/85740">
    <rdf:instanceOf resource="s:Person" />
  ...
  <s:Person about="http://www.w3.org/staffId/85740">


Example: Aggregates

A document with two authors specified alphabetically, a title specified in two different languages, and with two equivalent locations:

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.foo.com/cool.html">
    <dc:Creator>
      <rdf:Seq ID="CreatorsAlphabeticalBySurname">
        <rdf:li>Mary Andrew</rdf:li>
        <rdf:li>Jacky Crystal</rdf:li>
      </rdf:Seq>
    </dc:Creator>
    <dc:Identifier>
      <rdf:Bag ID="MirroredSites">
        <rdf:li rdf:resource="http://www.foo.com.au/cool.html"/>
        <rdf:li rdf:resource="http://www.foo.com.it/cool.html"/>
      </rdf:Bag>
    </dc:Identifier>
    <dc:Title>
      <rdf:Alt>
        <rdf:li xml:lang="en">The Coolest Web Page</rdf:li>
        <rdf:li xml:lang="it">Il Pagio di Web Fuba</rdf:li>
      </rdf:Alt>
    </dc:Title>
  </rdf:Description>
</rdf:RDF>


Example: PICS Labels

PICS includes a schema, statements (about pages), and metastatements (about labels)

(PICS-1.1 "http://www.gcf.org/v2.5" by "John Doe" labels on "1994.11.05T08:15-0500" until "1995.12.31T23:59-0000"

for "http://w3.org/Overview.html" ratings (suds 0.5 density 0 color/hue 1)

for "http://w3.org/Underview.html" by "Jane Doe" ratings (subject 2 density 1 color/hue 1))

[hypothetical; this is not a standards proposal]

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/1998/WD-rdf-syntax#" xmlns:pics="http://www.w3.org/TR/@@/WD-PICS-labels#" xmlns:gcf="http://www.gcf.org/v2.5">

<rdf:Description bagID="L01" about="http://w3.org/Overview.html" gcf:suds="0.5" gcf:density="0" gcf:color.hue="1"/>

<rdf:Description bagID="L02" about="http://w3.org/Underview.html" gcf:subject="2" gcf:density="1" gcf:color.hue="1"/>

<rdf:Description aboutEach="#L01" pics:by="John Doe" pics:on="1994.11.05T08:15-0500" pics:until="1995.12.31T23:59-0000"/>

<rdf:Description aboutEach="#L02" pics:by="Jane Doe" pics:on="1994.11.05T08:15-0500" pics:until="1995.12.31T23:59-0000"/>

</rdf:RDF>


Distributing RDF Metadata

The PICS effort had three goals for labels:
  Embedded in documents (primarily HTML META)
  Transmitted with documents (in HTTP headers)
  Separately, from third parties (HTTP label queries)

Similarly, RDF has embedding mechanisms
  Using <RDF> in <HEAD>, though it’s invalid HTML 4.0
    The abbreviated form prevents values from rendering
  Using the RDF Namespace in any XML document

Remote access is unspecified as yet
  ... but XLinks to metadata stores could work neatly


Collected BNF Grammar 1/2

[6.1] RDF ::= '<rdf:RDF>' obj* '</rdf:RDF>'
[6.2] obj ::= description | container
[6.3] description ::= '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '/>'
                    | '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '>' property* '</rdf:Description>'
                    | typedNode
[6.4] container ::= sequence | bag | alternative
[6.5] idAboutAttr ::= idAttr | aboutAttr | aboutEachAttr
[6.6] idAttr ::= 'ID="' IDsymbol '"'
[6.7] aboutAttr ::= 'about="' URI-reference '"'
[6.8] aboutEachAttr ::= 'aboutEach="' URI-reference '"'
[6.9] bagIdAttr ::= 'bagID="' IDsymbol '"'
[6.10] propAttr ::= propName '="' string '"' (with embedded quotes escaped)
[6.11] property ::= '<' propName idAttr? '>' value '</' propName '>'
                  | '<' propName idRefAttr? bagIdAttr? propAttr* '/>'
[6.12] typedNode ::= '<' typeName idAboutAttr? bagIdAttr? propAttr* '/>'
                   | '<' typeName idAboutAttr? bagIdAttr? propAttr* '>' property* '</' typeName '>'


Collected BNF Grammar 2/2

[6.13] propName ::= Qname
[6.14] typeName ::= Qname
[6.15] idRefAttr ::= idAttr | resourceAttr
[6.16] value ::= obj | string
[6.17] resourceAttr ::= 'resource="' URI-reference '"'
[6.18] Qname ::= [ NSname ':' ] name
[6.19] URI-reference ::= (see RFC1738, RFC1808, [URI])
[6.20] IDsymbol ::= (any legal XML name symbol)
[6.21] name ::= (any legal XML name symbol)
[6.22] NSname ::= (any legal XML namespace prefix)
[6.23] string ::= (any XML text, with "<", ">", and "&" escaped)
[6.24] sequence ::= '<rdf:Seq' idAttr? '>' member* '</rdf:Seq>'
[6.25] bag ::= '<rdf:Bag' idAttr? '>' member* '</rdf:Bag>'
[6.26] alternative ::= '<rdf:Alt' idAttr? '>' member+ '</rdf:Alt>'
[6.27] member ::= referencedItem | inlineItem
[6.28] referencedItem ::= '<rdf:li' resourceAttr '/>'
[6.29] inlineItem ::= '<rdf:li>' value '</rdf:li>'


Formal mapping to XML 1/2

Each element E in a Description block defines a {p,r,v} triple:
  p is the element name, fully qualified as a URI
  r is the about or ID attribute of the Description; or anonymous
  v is the string or node contained by E, or the resource attribute of E

The Description block defines a Bag containing the reifications of each included property, named as BagID or anonymous

The aboutEach attribute expands the process for each member r of the collection C that it names

The LI element works as above, with p assigned in XML order


Formal mapping to XML 2/2

Each attribute A on a Description tag (other than ID, about, aboutEach or bagID) defines a {p,r,v} triple:
  p is the attribute name, fully qualified as a URI
  r is the about or ID attribute of the Description; or a member of the collection in the aboutEach attribute.
  v is the (string) value of A

Each attribute A on a Property tag (other than ID, resource, or bagID) defines triples:
  Linking the node r2 (ID or resource) to the enclosing element’s resource as {p,r1,r2}
  On node r2 for each attribute A, as above
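The first rule above (attributes on a Description tag) can be sketched with the standard library as follows. This covers only the about/ID case; aboutEach expansion and attributes on property tags are omitted, and `attribute_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Attributes that identify the Description rather than state a property.
RESERVED = {"ID", "about", "aboutEach", "bagID"}

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila"
                   s:Creator="Ora Lassila" />
</rdf:RDF>"""


def attribute_triples(xml_text):
    """Map each non-reserved attribute A on a Description to a (p, r, v) triple."""
    out = []
    for desc in ET.fromstring(xml_text):
        r = desc.get("about") or desc.get("ID")
        for name, v in desc.attrib.items():
            if name in RESERVED:
                continue
            # "{namespace}local" -> "namespacelocal": the fully qualified name.
            p = name.replace("{", "").replace("}", "")
            out.append((p, r, v))
    return out
```

The result is the same triple that the element form `<s:Creator>Ora Lassila</s:Creator>` would produce, which is the point of the abbreviation.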


3. The RDF Schema

Introducing new PropertyTypes in a machine-understandable way calls for a schema language
  E.g. “a book must have at least one author”

RDFS is a loosely object-oriented solution with:
  Core Classes
  Core PropertyTypes
  Core Constraints
  Documentation Hooks
  Model & Syntax Concepts

Deployment: Dublin Core, DCDs, & Other Issues


Core Classes

RDF:Resource
  All resources (a.k.a. Nodes) are instances of this
  Roughly corresponds to Object in OO systems

RDF:PropertyType
  All elements of the set PropertyTypes are instances

RDFS:Class
  Loosely corresponds to a type or category
  No formal properties of Class itself


Core PropertyTypes

RDF:type
  Indicates a resource is a member of a class; the value must be an RDFS:Class
  A resource may be an instance of several classes

RDFS:subClassOf
  Indicates a (strict) subset/superset relationship
  Its domain and range are Class
  A class may not be a subclass of itself
    But there isn't a way to express/enforce this in RDFS
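Since RDFS itself cannot express the "no class is its own subclass" rule, an application has to check it separately. A minimal sketch, assuming the hierarchy has already been loaded into a plain dictionary (class name to list of direct superclasses); the function and class names are illustrative only.

```python
def superclasses(cls, sub_class_of):
    """Return every class reachable from cls by following subClassOf links."""
    seen, stack = set(), list(sub_class_of.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen


def violates_strictness(sub_class_of):
    """Classes that end up, directly or transitively, as subclasses of themselves."""
    return {c for c in sub_class_of if c in superclasses(c, sub_class_of)}


hierarchy = {"Student": ["Person"], "Person": ["Agent"]}  # made-up classes
broken = {"A": ["B"], "B": ["A"]}                          # a cycle
```

Here `violates_strictness(hierarchy)` is empty, while `violates_strictness(broken)` reports both A and B.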


Core Constraints

RDFS:ConstraintPropertyType
  Superclass of Range and Domain

RDFS:Range
  Specify the (at most one) class of property values
  Any value allowed if no range specified

RDFS:Domain
  Class(es) on which a propertyType may be used
  Allowed on any class if no domain specified
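The two constraints can be read as a simple check over a triple. The sketch below assumes the schema has been flattened into dictionaries (`domain`: property type to allowed classes, `range_`: property type to its single class, `types`: node to its classes); all names and the sample data are hypothetical.

```python
def check_statement(triple, types, domain, range_):
    """Return the list of violated constraints ("domain", "range") for (p, r, v)."""
    p, r, v = triple
    problems = []
    allowed = domain.get(p)
    # No domain declared: the property type may be used on any class.
    if allowed and not (types.get(r, set()) & set(allowed)):
        problems.append("domain")
    wanted = range_.get(p)
    # No range declared: any value is allowed.
    if wanted and wanted not in types.get(v, set()):
        problems.append("range")
    return problems


types = {"book1": {"Book"}, "ora": {"Person"}}
domain = {"author": ["Book"]}
range_ = {"author": "Person"}
```

With this data, `("author", "book1", "ora")` passes, the reversed statement fails both checks, and an undeclared property such as `title` is unconstrained.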


Documentation & Model

RDFS:comment
  Natural-language description of a resource

RDFS:label
  Human-readable version of a resource name

RDFS:Collection
  The superclass of Bag, Seq, and Alt

RDFS:String
  A resource corresponding to the M&S definition of string (production 15 in the BNF)


Recap: RDFS Class Hierarchy


Recap: RDFS Constraints


RDFS in RDF

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:s="http://www.w3.org/TR/WD-rdf-schema#">

<s:Class rdf:ID="Property" s:comment="A triple consisting of a property type, a node, and a value" />

<s:Class rdf:ID="PropertyType" s:comment="A name of a property, defining specific meaning for the property" />

<s:Class rdf:ID="Bag" s:comment="An unordered collection" />

<s:Class rdf:ID="Seq" s:comment="An ordered collection" />

<s:Class rdf:ID="Alt" s:comment="A collection of alternatives" />

<PropertyType ID="propName" s:comment="Identifies the property type of a property in reified form" s:domain="#Property" s:range="#PropertyType" />

<PropertyType ID="propObj" s:comment="Identifies the resource that a property describes in reified form" s:domain="#Property" />

<PropertyType ID="value" s:comment="Identifies the value of a property in reified form" />

<PropertyType ID="instanceOf" s:comment="Identifies the Class of a resource" s:range="#Class" />

</RDF>


Dublin Core on a Slide

Content
  Title
  Subject: includes keywords
  Description
  Source: metadata of predecessor
  Language
  Relation: e.g. isVersionOf, isFormatOf
  Coverage: spatial or temporal range

Intellectual Property
  Creator
  Contributor: e.g. editor, translator
  Publisher
  Rights

Instance
  Date
  Type: e.g. novel, poem, TR
  Format
  Identifier


Deployment Issues

RDF Schemas vs. “XML Schemas”
  The problems of controlled vocabulary metadata and self-describing DTDs are close enough to cause confusion
  XML-Data, by Microsoft & Co, conflates the two
  DTDs-in-XML is a work item before the rechartered XML WG

URI versioning issues
  As with DTDs, the permanence of the schema identifier is a popular red herring (as is “performance”)

Compatibility with ‘push’ product formats
  Channel Definition Format, Open Software Description,...

But products and services are shipping...


Document Content Description

Long-standing goal of XML
  DTDs expressed in XML
  “A schema language for tagsets”

Here, DCD models the DL element from HTML in RDF:

<DCD>
  <ElementDef Type="DL" Model="Elements" Content="Closed">
    <Description>A simple 'definition list' construct, which contains paired 'DT' (DL Term) and 'DD' (DL Definition) elements</Description>
    <Group Occurs="OneOrMore" RDF:Order="Seq">
      <Element>DT</Element>
      <Group Occurs="Optional"><Element>DD</Element></Group>
    </Group>
  </ElementDef>
  <ElementDef Type="DT" Model="Data" Content="Closed">
    <Description>The term being defined in a DL list item</Description>
  </ElementDef>
  <ElementDef Type="DD" Model="Mixed" Content="Open">
    <Description>A term's definition in a DL list item</Description>
    <!-- Open because lots of markup can be in a DL -->
  </ElementDef>
</DCD>


DCD: Beyond Text Processing

DCD can describe data types and constraints, too:

<DCD>
  <ElementDef Type="Booking" Model="Elements" Content="Closed">
    <Description>Describes an airline reservation</Description>
    <Group RDF:Order="Seq">
      <Element>LastName</Element> <Element>FirstInitial</Element>
      <Element>SeatRow</Element> <Element>SeatLetter</Element>
      <Element>Departure</Element> <Element>Class</Element>
    </Group>
  </ElementDef>
  <!-- example omits boring field declarations -->
  <ElementDef Type="SeatRow" Model="Data" Datatype="i1" Min="1" Max="72" />
  <ElementDef Type="SeatLetter" Model="Data" Datatype="char" Min="A" Max="K"/>
  <ElementDef Type="Class" Model="Data" Datatype="char" Default="1"/>
</DCD>

Sample airline booking record:

<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>
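To make the Min/Max constraints concrete, here is a rough sketch that checks the sample record against the SeatRow and SeatLetter bounds. It is not a DCD processor: the constraints are transcribed by hand into a dictionary, Min and Max are assumed to be inclusive, and `out_of_range` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the ElementDefs above (assumed inclusive bounds).
CONSTRAINTS = {
    "SeatRow": {"cast": int, "min": 1, "max": 72},
    "SeatLetter": {"cast": str, "min": "A", "max": "K"},
}

BOOKING = """<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>"""


def out_of_range(xml_text):
    """Return the names of child elements whose values fall outside Min..Max."""
    bad = []
    for child in ET.fromstring(xml_text):
        rule = CONSTRAINTS.get(child.tag)
        if rule is None:
            continue  # no constraint recorded for this element
        value = rule["cast"](child.text.strip())
        if not rule["min"] <= value <= rule["max"]:
            bad.append(child.tag)
    return bad
```

The sample record passes; a record with SeatRow 99 or SeatLetter Z would be reported.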


DCD: Basic Concepts 1/2

DCDs themselves have descriptive parameters
  Description, (Canonical) Namespace URI
  Open or Closed: whether documents must validate

Elements
  Content Model: Empty, Any, Data, Elements, or Mixed; Root flag
  Attribute or AttributeDef declarations
  Default & Fixed element contents
  Groups & Order (Seq or Alt); Occurs: Required, Optional, OneOrMore, or ZeroOrMore


DCD: Basic Concepts 2/2

Attributes
  Name, is Global, Required or Optional, has ID-Role

Entities
  Name & (Value, PublicID, or SystemID)

Datatypes
  XML DTD types, numbers (int, fixed, float, 1-8 bytes), booleans, times (dates & intervals), & binary data
  Min, Max & MinExclusive, MaxExclusive
  Picture constraints on string fields (per COBOL): characters, numbers, decimals, symbols


Implications & Applications

What happens when “one application's metadata is another application's data”?

Approaching Tim Berners-Lee’s “next phase of the Web”:

Reasoning Engines
  RDF is a simple frame system -- not a reasoning system (but one can be built atop it)

Automating the Web of Trust
  New generation of “Internet-scale” identification, rights management, authorization tools need signed RDF assertions for trust management


For more information...

The Specifications
  http://www.w3.org/TR/WD-rdf-syntax & WD-rdf-schema

W3C’s RDF & Metadata home pages
  http://www.w3.org/Metadata/ & /RDF/

Eric Miller’s introduction
  http://www.dlib.org/dlib/may98/miller/05miller.html
  http://purl.oclc.org/~emiller/talks/www7/tutorial

Dave Beckett’s RDF Resources
  http://www.cs.ukc.ac.uk/people/staff/djb1/research/metadata/rdf.shtml

This talk is at http://www.ics.uci.edu/~rohit/cscw98/rdf/


XML in WebDAV

or, a Tale of Two Standards

Rohit Khare (with Jim Whitehead)

UC Irvine
22 February 1999 • WACC ‘99
Originally for XML’98 Dev Day


XML in WebDAV

“Distributed Authoring and Versioning”
WebDAV Abstract Model
Why WebDAV Used XML
The WebDAV Protocol DTD
PROPFIND Example (§ 8.1.1)
Standards Process Update
Call to Arms


Distributed Authoring

Very first Web tools edited, too
  ... but didn’t have collaboration support

WebDAV founded to prevent “lost-update”
  HTTP extensions to update documents
  Metadata to describe sites and pages

Implemented with XML
  ... about 1/3 of the spec (by volume)


WebDAV Abstract Model

Resources
Properties
  create, remove, and query metadata; links
Collections
Locks
  exclusive or shared write access
Namespace Operations
  atomic copy and move commands


Why WebDAV Used XML

Used XML in headers and entity-bodies
  to add extra XML elements to existing data
  to support internationalization

Used whenever arguments:
  are unbounded length
  are shown to humans

Decentralized Extensibility
  e.g. new locktypes beyond write access


The WebDAV Protocol DTD

Protocol elements
  activelock, multistatus, link, propfind

Property elements
  creationdate, displayname, getcontentlength

The DAV: Namespace
  Properties for HTTP-header metadata, internationalization, and protocol elements


PROPFIND Example (§ 8.1.1)

Request:

PROPFIND /container/ HTTP/1.1
Host: www.foo.bar
Depth: 1
Content-Type: text/xml; charset="utf-8"
Content-Length: xxxx

<?xml version="1.0" encoding="utf-8" ?>
<D:propfind xmlns:D="DAV:">
  <D:allprop/>
</D:propfind>

Response:

HTTP/1.1 207 Multi-Status
Content-Type: text/xml; charset="utf-8"
Content-Length: xxxx

<?xml version="1.0" encoding="utf-8" ?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://www.foo.bar/container/</D:href>
    <D:propstat>
      <D:prop xmlns:R="http://www.foo.bar/boxschema/">
        <R:bigbox> <R:BoxType>Box type A</R:BoxType></R:bigbox>
        <R:author> <R:Name>Hadrian</R:Name></R:author>
        <D:creationdate> 1997-12-01T17:42:21-08:00 </D:creationdate>
        <D:displayname> Example collection </D:displayname>
        <D:resourcetype><D:collection/></D:resourcetype>
        <D:supportedlock>
          <D:lockentry>
            <D:lockscope><D:exclusive/></D:lockscope>
            <D:locktype><D:write/></D:locktype>
          </D:lockentry>
        </D:supportedlock>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
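A client consuming such a 207 Multi-Status body might walk it as in this sketch, which uses only the standard library and a shortened copy of the response above. The `summarize` helper and the trimmed sample are assumptions for illustration; no HTTP request is made.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree spelling of names in the DAV: namespace

RESPONSE = """<?xml version="1.0"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://www.foo.bar/container/</D:href>
    <D:propstat>
      <D:prop>
        <D:displayname>Example collection</D:displayname>
        <D:resourcetype><D:collection/></D:resourcetype>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""


def summarize(xml_text):
    """Map each href to (status line, list of property element names)."""
    result = {}
    for resp in ET.fromstring(xml_text).iter(DAV + "response"):
        href = resp.findtext(DAV + "href").strip()
        for propstat in resp.iter(DAV + "propstat"):
            status = propstat.findtext(DAV + "status").strip()
            names = [child.tag for child in propstat.find(DAV + "prop")]
            result[href] = (status, names)
    return result
```

Because properties are namespace-qualified elements, a client can skip vocabularies it does not recognize (such as the R: properties in the full example) without failing to parse the rest.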


Standards Process Update

Imminent IETF Proposed Standard
  Next steps: IESG Approval, Implementations
  Launching DAV Searching & Location (DASL) WG; Access Control Lists (ACL) to come

Support from several vendors
  Microsoft, Netscape prototype servers
  Workflow vendors
  More announcements on the way...


Call to Arms

WebDAV uses XML
  ... for its protocol commands
  ... for its properties/metadata

WebDAV can manage XML
  ... ideal for versioning
  ... manages source and derived editions

Protocols can be built atop XML
  ... though it was tough to coordinate maturity


For More Information...

WebDAV Working Group
  [email protected] mailing list
  Meeting @ LA IETF

Chair: E. James Whitehead
  [email protected]
  http://www.ics.uci.edu/~ejw/authoring (this talk will be available at that url)

Acknowledgments to many, many others