Extensible Markup Language
Modeling Data & Metadata
WACC ‘99, San Francisco, CA, 22 February 1999
Rohit Khare, UC Irvine / 4K Associates
WACC '99 XML: Modeling Data & Metadata 2
XML: Modeling Data & Metadata
Computer-Supported Cooperative Work systems call for a portable information delivery format
  XML is self-descriptive, to cross ontological boundaries
  XLinks can reduce the burden of marshaling interdependent information amongst agents
  RDF & Namespaces promise composable schema definition, for XML tags and for other metadata
Web-based CSCW applications are using XML:
  Distributed Authoring and Versioning (WebDAV)
  Interchangeable Process Models (Endeavors)
Agenda
XML: The Least You Need to Know
  The Evolution of XML (Origin)
  The Origin of (Document) Species (Evolution)
  XML 1.0, Namespaces, XLink, and XSL (Specification)
  Capturing the State of Distributed Systems (Implication)
RDF: Model, Syntax, and Schema
DCD: Document Content Descriptions
XML in WebDAV: A Tale of Two Standards
Design Kitchen
http://www.ics.uci.edu/~rohit/wacc99/
The Ascent of XML
Joint with Dan Connolly; Fall 1997 Web Journal
Mission Statement
XML was designed to provide an easy-to-write, easy-to-interpret, and easy-to-implement subset of SGML. It was not designed to provide a “one Markup Language fits all” DTD, or a separate DTD for every tag. It was designed so that certain groups could create their own particular markup languages that meet their needs more quickly, efficiently, and (IMO) logically. It was designed to put an end once and for all to the tag-soup wars propagated by Microsoft and Netscape.
Jim Cape, June 3, 1997, posted to comp.infosystems.www.authoring.html
World-Wide Markup Language
“The HyperText Markup Language is an SGML format.” [Tim Berners-Lee, 1991, “About HTML”]
Standard, Open Document Formats since the 1960s (GCA’s GenCode, IBM’s GML)
SGML [ISO 8879:1986] can prove the validity of any document using standard DTDs w/complex structuring
1990: Berners-Lee picked tags from a sample SGML DTD and added “killer feature”: links
World-Wide HTML tagset embodies the 80/20 rule
XML allows Community-Wide Markup Languages, combining SGML’s power with HTML’s simplicity
Community-Wide Markup Languages
Community size is inversely proportional to shared context
  Millions agree that <B> means bold, but 2/6/98 reflects local culture
XML decentralizes control of specialized markup languages, making it cost-effective to capture community ontologies
  HTML is not unilaterally extensible (new tags have potentially ambiguous syntax, style, and semantics)
XML is a strict (but simplified) subset of SGML, offering:
  Extensibility: can define new elements, containers, attribute names
  Structure: a DTD can constrain the information model of a document
  Validation: every document can be validated, which establishes conformance to the structure mandated by the DTD; well-formedness is the weaker check that needs no DTD
XML also includes extensible linking and style formatting
“Node content must be left free to evolve.” [Tim Berners-Lee, 1991, “About Document Formats”]
Future Evolution of XML
Coevolution of HTML and XML
  All XML is SGML
  All HTML is SGML
  But not all HTML is XML...
  The HTML 5 effort is aiming in this direction
  XML-in-HTML offers several near-term solutions
XML profiles SGML:
  No Markup Minimization
  No Optional Features
  Technical Corrigendum harmonizes SGML86
Scorecard… 1/4
Organizations
W3C: World Wide Web Consortium
  Corporate members sponsored (and trademarked) XML™
ISO: International Organization for Standardization
  Sovereign states sponsored SGML
GCA: Graphic Communications Association
  Printing industry folks sponsored global *ML conferences
Scorecard… 2/4
XML Coordination Group
  Initiation & Oversight of XML Activity, public contact
  Chair: Jon Bosak (Sun)
XML Schema WG
  Because grammar rules aren’t enough
  Chairs: Dave Hollander (HP), C.M. Sperberg-McQueen (UIUC)
XML Information Set WG
  Abstract representation of parsed XML entities
  Chair: David Megginson (Microstar)
Scorecard… 3/4
XML Fragment WG
  How to clip out a single span of XML, in context
  Chair: Paul Grosso (ArborText)
XML Syntax WG
  Maintains the core specification
  Style sheet linking, canonicalization, profiling
  Chairs: Tim Bray (Textuality) and Joel Nava (Adobe)
Extensible Stylesheet Language (XSL) WG
  a language for transforming XML documents
  an XML vocabulary for formatting semantics
Scorecard… 4/4
Document Object Model (DOM) WG
  Scripting hooks for HTML/XML in browsers
  Chair: Lauren Wood (SoftQuad)
Cascading Style Sheets (CSS) WG
  Declarative formatting directives for HTML/XML
  Leaders: Håkon Lie, Chris Lilley, Bert Bos (W3C/Inria)
Resource Description Framework (RDF) WG
  Meta-metadata framework; knowledge representation
  Chairs: Ora Lassila (Nokia), Ralph Swick (W3C/MIT)
The Origin of (Document) Species
Presented at WWW7 (Brisbane)
The Origin of (Document) Species
The Document Ecology
Evolutionary Adaptations of:
  Syntax     SGML
  Style      CSS/XSL
  Structure  HTML
  Semantics  XML
The Fossil Record
The Document Ecology
The World Wide Web is the “universe of network-accessible information” [Tim Berners-Lee, 1996]
Openness and Content-Neutrality of Documents
  HTTP can adapt to any document format
  URL can represent links to any document format, from within many
“Natural Selection” Favors a Few Document Formats
  Preferential adoption of SGML, CSS, HTML, and now XML
  Each embodies the evolutionary strategy of parsimony
Evolution: Capture Info --> Represent Knowledge
  Can leverage Web reflexively to capture structure and semantics
  XML-based document formats represent an ecosystem of interdependent (rather than competing) document “species”
Evolution: Syntax
Issues of Concrete Representation
  Binary (machine) vs. Text (human) formats
  Mission-specific vs. Generic formats
  Context-free vs. Turing-complete formats
From Turing-Complete to Declarative

            Context-Free        Turing-Complete
            Text     Binary     Text           Binary
  Specific  MIF      Dump       JavaScript     Intel x86
  General   SGML     ASN.1      UNIX Scripts   COFF
Evolution: Style
Externalized Formatting over Embedded Directives
  {\keepn\par\pard\sb240\sl-264 \b1\hyppar0 Warning: do not …}
  <P CLASS="WARNING"> …
    WARNING { font: bold } ...
  <?XML-stylesheet TYPE="text/css" HREF="warning.css"?> <WARNING> …
    WARNING { font: bold } ...
Cascading to eXtensible Style Sheets
Rendering: displays, Braille, audio, ...
From Inline Formats to Style Sheets
Evolution: Structure
Anatomy of a Newspaper Article
  Logical: headline, byline, body, footer, …
  Descriptive: bold, italic, indented, …
  Declarative: title, address, keyboard-input, …
  Custom declarative: <dateline>, <byline>, ...
Automatic Information Collection
  Resource Discovery: for providing useful hints to search engines
  Classifying: for cataloguing information content and relationships
  Content Rating: for aggregation and filtering
  Knowledge Codifying, Sharing, and Exchanging: for processing
From Presentational to Declarative
Evolution: Semantics
How well does the document support the potential uses of its contents?
Scenario: To-Do List Manager
  Natural-language Scratchpad
  HTML Definition List (Datebook)
  XML <DEADLINE AT="iso-date"> Element
Composable, Networked DTDs
  Reuse by Linking
  Namespaces
From Operational to Well-defined
Citation Encodings 1/2
Presentational Text:
  “XML, Java, and the Future of the Web”, Jon Bosak, World Wide Web Journal, 2(4):219--228, (1997)
  J. Bosak, World Wide Web Journal, “XML, Java, and the Future of the Web”, Autumn 1997, Vol. 2, No. 4, pp. 219--228.
Presentational HTML:
  <UL>XML, Java, and the Future of the Web</UL>, <I>World Wide Web Journal</I>, Jon Bosak, <B>2(4):219-228</B>, 1997.
Structural HTML:
  <CITE>XML, Java, and the Future of the Web</CITE>
  <H3>World Wide Web Journal</H3>
  <H4>Jon Bosak</H4>
  <UL> <LI> 2(4):219-228 <LI> 1997 </UL>
Citation Encodings 2/2
Customized XML:
  <BIB>
    <TITLE>XML, Java, and the Future of the Web</TITLE>
    <JOURNAL>World Wide Web Journal</JOURNAL>
    <AUTHOR> <FIRSTNAME>Jon</FIRSTNAME> <LASTNAME>Bosak</LASTNAME> </AUTHOR>
    <VOLUME>2</VOLUME> <NUMBER>4</NUMBER>
    <YEAR>1997</YEAR> <PAGES>219-228</PAGES>
  </BIB>
RDF Metadata in XML, using schemas from the Web:
  <rdf:Description
      about="http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm"
      xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
      xmlns:dc="http://purl.org/metadata/dublin_core#"
      xmlns:bib="http://library.org/bibliography-info#">
    <dc:Title>XML, Java, and the Future of the Web</dc:Title>
    <bib:Journal href="http://www.w3j.com/">World Wide Web Journal</bib:Journal>
    <dc:Creator>Jon Bosak</dc:Creator>
    <bib:Volume>2</bib:Volume>
    <bib:Number>4</bib:Number>
    <bib:Year>1997</bib:Year>
    <bib:Pages>219-228</bib:Pages>
  </rdf:Description>
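One way to see what the customized XML encoding buys: once the fields are tagged, any of the presentational forms on the previous slide can be generated mechanically. The sketch below uses Python's standard xml.etree.ElementTree; the function name format_citation and the output layout are illustrative assumptions, not part of the talk.

```python
import xml.etree.ElementTree as ET

# The <BIB> record from the slide, reproduced as a string.
BIB = """
<BIB>
  <TITLE>XML, Java, and the Future of the Web</TITLE>
  <JOURNAL>World Wide Web Journal</JOURNAL>
  <AUTHOR><FIRSTNAME>Jon</FIRSTNAME><LASTNAME>Bosak</LASTNAME></AUTHOR>
  <VOLUME>2</VOLUME><NUMBER>4</NUMBER>
  <YEAR>1997</YEAR><PAGES>219-228</PAGES>
</BIB>
"""

def format_citation(xml_text):
    """Render one presentational form from the structured record (hypothetical helper)."""
    bib = ET.fromstring(xml_text)
    get = bib.findtext  # look up the text of a child element by path
    author = get("AUTHOR/FIRSTNAME") + " " + get("AUTHOR/LASTNAME")
    return "{}, '{}', {}, {}({}):{}, {}.".format(
        author, get("TITLE"), get("JOURNAL"),
        get("VOLUME"), get("NUMBER"), get("PAGES"), get("YEAR"))

print(format_citation(BIB))
```

Going the other way, from presentational text back to fields, would need heuristics; that asymmetry is the point of the slide.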
The Fossil Record
Document Format   Syntax                Style                  Structure               Semantics
ASN.1             Binary                -                      Type-Length-Value       Per-application
Text              ASCII, Unicode...     -                      Lines                   Natural Language
troff             Readable Text         Inline Directives      Sections, Pages         Typesetting
TeX               Readable Program      LaTeX                  Sections, Pages         Typesetting
PostScript        Programming Language  -                      Pages                   Drawing
Rich Text Format  Opaque Text           Extensible Directives  Characters, Paragraphs  -
HTML formatting   Readable Text         Nested Directives      Presentational          -
HTML structure    Readable Text         CSS                    Declarative             Fixed (e.g., <ADDRESS>)
XML               Readable Text         CSS, XSL               Declarative             Extensible
PICS              S-expressions         -                      Ratings                 Metadata Schema
RDF               XML Text              -                      Declarative             Metadata Schema
XML: Specifications
Details, Details, Details…
XML 1.0 Syntax
Self-description with Document Type Definition (DTD):
  Structural rules of the document’s markup
  External entities, internal entities, non-XML resource entities
Non-Minimization (All Markup is Explicit)
XML Processor - customizable info structures
  Most XML users will not know they are using a DTD!
  XML-aware software works with custom XML apps without help
Document validity = conforms to DTD
  Document well-formedness = structurally sound, but no DTD
  Well-formedness for delivery format, validity for authoring
XML 1.0: Markup Types
Elements - start-tags, end-tags, empty tags
  <joke>Take my XML. Please.<applause/></joke>
Attributes - name-value pairs in tags
  <warning class="emergency">
Entity References - special chars, macros, external
Comments - not passed along to application
  <!-- No dash pair before end of comment. -->
Processing Instructions - passed along to application
  <?pi-target-name pi-data?>
CDATA Sections - parser ignores markup
  <![CDATA[ *p = &q; b = (i <= 3); ]]>
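A small sketch of how these markup types look from the application side, using Python's standard xml.etree.ElementTree. The sample document combines the slide's fragments into one element, which is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# One document exercising a comment, an element with an attribute,
# an empty element, and a CDATA section.
DOC = (
    '<!-- comments are dropped by the default parser -->'
    '<joke class="emergency">Take my XML. Please.<applause/>'
    '<![CDATA[*p = &q; b = (i <= 3);]]></joke>'
)

root = ET.fromstring(DOC)
applause = root[0]  # the empty <applause/> element

print(root.tag)        # element name
print(root.attrib)     # attributes as name-value pairs
print(root.text)       # character data before the first child
print(applause.tail)   # CDATA content arrives as plain text, markup characters intact
```

The "&" and "<" inside the CDATA section reach the application unescaped, which is the behavior the last bullet describes.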
XML-1.0: DTDs 1/2
Element Declarations - name and content model
  <!ELEMENT email (from, to+, cc*, subject, body, sig?) >
  <!ELEMENT body (#PCDATA | image)* >
  <!ELEMENT image EMPTY >
  <!ELEMENT node (desc, node*)>
  <!ELEMENT desc (#PCDATA)>
Attribute Declarations - which elements may have what attributes; default and possible values
  <!ATTLIST joke
      name    ID                    #REQUIRED
      label   CDATA                 #IMPLIED
      status  ( funny | notfunny )  "funny" >
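A content model is essentially a regular expression over child element names. The sketch below hand-translates the email model from the slide into a Python regex and checks a document's children against it. This is an illustration of the idea only, not how a validating parser is implemented; the names EMAIL_MODEL and matches_email_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (from, to+, cc*, subject, body, sig?):
# each child name is followed by one space so names cannot run together.
EMAIL_MODEL = re.compile(r"(from )(to )+(cc )*(subject )(body )(sig )?")

def matches_email_model(xml_text):
    """Return True if the root is <email> and its children fit the content model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return root.tag == "email" and EMAIL_MODEL.fullmatch(child_names) is not None

good = ("<email><from>a</from><to>b</to><to>c</to>"
        "<subject>s</subject><body>hi</body></email>")
bad = "<email><from>a</from><subject>s</subject><body>hi</body></email>"  # no <to>

print(matches_email_model(good), matches_email_model(bad))
```

Both sample documents are well-formed; only the first would be valid against the declaration.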
XML-1.0: DTDs 2/2
Attribute Declarations - name, type, default
  Attribute types: CDATA, ID, IDREF/IDREFS, ENTITY/ENTITIES, NMTOKEN/NMTOKENS, a list of names
  Default values: #REQUIRED, #IMPLIED, "value", #FIXED "value"
Entity Declarations - associate name with chunk
  Internal entities: &lt; &gt; &amp; &apos; &quot;
    or locally defined: &fork; <!ENTITY fork "4K Consulting">
  External entities: <!ENTITY forkfooter SYSTEM "fork/footer.xml">
    <!ENTITY pic SYSTEM "http://4k.org/4k.gif" NDATA GIF87A>
  Parameter entities: <!ENTITY % html.ver "-//W3C//DTD HTML 4.0//EN">
Notation Declarations - external binary data
  <!NOTATION GIF87A SYSTEM "GIF">
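To make the internal-entity bullets concrete, here is a minimal sketch with Python's standard xml.etree.ElementTree, which expands both the predefined entities and an entity declared in the document's internal DTD subset. The <note> wrapper element is an assumption added for the example.

```python
import xml.etree.ElementTree as ET

# The locally defined &fork; entity from the slide, declared in an internal subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY fork "4K Consulting">
]>
<note>&fork; &amp; friends &lt;tm&gt;</note>"""

text = ET.fromstring(DOC).text
print(text)  # entity references have been replaced by their values
```

External and unparsed (NDATA) entities are a different matter; this sketch does not fetch or resolve them.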
XML: Namespaces
Basic concept: element names and attribute names can be viewed as URIs
  But disagreement over the details took up most of 1997
Use the reserved xmlns: attribute to introduce a new namespace name, optionally with a prefix
  <?xml version="1.0"?>
  <!-- both namespace prefixes are available throughout -->
  <bk:book xmlns:bk='urn:loc.gov:books'
           xmlns:isbn='urn:ISBN:0-395-36341-6'>
    <bk:title>Cheaper by the Dozen</bk:title>
    <isbn:number>1568491379</isbn:number>
  </bk:book>
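The "names as URIs" idea is visible in how a namespace-aware parser reports the book example: the prefix disappears and each name is qualified by its namespace URI. A sketch using Python's standard xml.etree.ElementTree, which writes qualified names as {uri}local; the prefix "b" in the lookup is an arbitrary choice made by the reader of the document.

```python
import xml.etree.ElementTree as ET

DOC = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>"""

root = ET.fromstring(DOC)
tags = [root.tag] + [child.tag for child in root]
for tag in tags:
    print(tag)

# Prefixes are local shorthand: a reader may bind any prefix to the same URI.
title = root.findtext("b:title", namespaces={"b": "urn:loc.gov:books"})
print(title)
```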
XLink: ‘Real’ Hypertext Links
Start with HTML, add HyTime and TEI concepts
Simple links point to a single target resource
  <A XML-LINK="SIMPLE" HREF="http://www.w3.org/XML">
  URL schemes: ftp, http, file, mailto, telnet, nntp, ...
Links can have roles (machine-processable) and human-readable labels associated with them
Can specify default behavior of a link
  SHOW - embed in current context, replace it, or start new one
  ACTUATE - user must take action (or not) before dereferencing
Locators - # separates resource name and part id
Connectors - | to navigate to relevant element(s) only
XLink: Advanced
Extended links - can be multidirectional, need not live in the resources they point to; link groups
  <related-term-group>Hamlet
    <related-term HREF="Othello.xml"/>
    <related-term HREF="KingLear.xml"/>
    <related-term HREF="Macbeth.xml"/>
  </related-term-group>

  <!ELEMENT related-term-group (#PCDATA | related-term)* >
  <!ATTLIST related-term-group
      XML-LINK      CDATA #FIXED "EXTENDED"
      INLINE        CDATA #FIXED "TRUE"
      CONTENT-ROLE  CDATA #FIXED "RT"
      … >
Extended Pointers - locate resource by traversing the element tree of its containing document
  XPointers allow links without modifying the containing document
XSL: “Behavior Sheets”
CSS1 - formatting rules in terms of element names, IDs
XSL - based on DSSSL (ISO/IEC 10179:1996)
  Formatting specification derived from active style sheets
  With XML doc structure, create flow objects (paragraphs, tables, ...) with characteristics (font-name, font-size, …)
  Merging, flow object tree determines document layout
  Can calculate tables-of-contents, indexes, other scripts
Scheme - core expression (math) language in XSL
Construction rules declare element style
  font-size:18pt, first-line-start-indent:20pt, quadding:left
Typically, formatting shorthand declared as functions
  (element CODE (UNDERLINED-PHRASE))
Capturing the State of Distributed Systems
July 1997 IEEE Internet Computing; Fall 1997 Web Journal
Data Archaeology
Meishi, or Business Cards
  Different shapes, sizes, scripts, demarcations
  Two sided, magnetic, photos, public keys
Airline Passenger Name Records
  “NQSS5A” means something to airline
  Must be manipulated throughout its lifetime
Need stable data format, stable grain of exchange, and common definitions
Every Bit in XML
Across Time
  Save Self-Description with Data State
  “Future Proofing” of Documents and Data
Across Space
  For Exchange as well as Storage of Data
Across Organizations
  Building Community-Specific Ontologies
  Key to Knowledge Representation
Across Time
Tensions Create Brittle Data Formats
  Inertia, Efficiency, Tools, and Context
Future-Proofing Strategies
  Machine-Readable - for parsers & generators
  Human-Readable - robustness and simplicity
  Self-Descriptive - for extraction and validation
XML as Basis to Execute Strategies
  e.g., Capturing Database Schema as DTDs
Across Space
Tradeoffs when Marshaling Data
  Distributed Systems with Centralized Data
  Need: Decentralized, Isolated Subsystems
Strategy: Defer Marshaling Decisions
  The Web’s Lesson: Link Instead
  Download Networked Resources as Needed
Leveraging the XLink Model
  SHOW and ACTUATE; XPointer identifiers
Across Cultures
Challenges to Collaboration
  Organizations Defined by their Language
  Ontological Problem: Matching Vocabulary
Strategy: Use Documents
  “Put It In Writing”
  Defining Common Terms
  Popular Ontologies Can Emerge Organically
Solution: XML-enhanced Documents
  Let a Thousand DTDs Bloom on the Web
XML Example Applications
Mathematical Markup Language   MathML
Chemical Markup Language       CML
DNA Sequences                  BSML
Channel/Site Manifests         CDF, MCF
Vector Graphics                VML, PGML
Synchronous Multimedia         SMIL
Web Form Automation            WIDL
Form Transaction Records       XFDL
Schema Description             RDF
Some Software Tools
JSXML - Java Object Stream to XML packages
  http://www.camb.opengroup.org/~laforge/jsxml/
Lark - a non-validating XML parser in Java
  http://www.textuality.com/Lark/
MSXML - Microsoft’s XML parser in Java
  http://www.microsoft.com/xml/
NXP - a validating parser in Java
  http://www.edu.uni-klu.ac.at/~nmikula/NXP/
Many, many others
  http://www.cs.caltech.edu/~adam/local/xml.html
XML: Recommended Reading
XML Books:
  Richard Light, Presenting XML, Sams.Net, August 1997.
  Web Journal - XML issue, O’Reilly, Fall 1997 http://www.w3j.com/xml/
XML Papers from Khare/Rifkin:
  X Marks the Spot: An Introduction to XML
  The Ascent of XML (with Dan Connolly)
  Capturing the State of Distributed Systems with XML
  The Origin of (Document) Species
  http://www.cs.caltech.edu/~adam/papers/www
XML Links Galore http://www.cs.caltech.edu/~adam/local/xml.html
Robin Cover’s Extensive XML Page http://www.sil.org/sgml/xml.html
Peter Flynn’s XML FAQ http://www.ucc.ie/xml/
XML-Dev Jewels http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html
This talk is at http://www.ics.uci.edu/~rohit/wacc99/xml
Resource Description Framework
Model, Syntax, and Schema Specifications
Rohit Khare
UC Irvine / 4K Associates
Metadata about the RDF Spec
<rdf:Description about=""
    xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:dc="http://purl.org/metadata/dublin_core#"
    xmlns:ddc="http://purl.org/net/ddc#"
    dc:Title="Resource Description Framework (RDF) Model and Syntax Specification"
    dc:Description="The Resource Description Framework (RDF) is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources."
    dc:Publisher="World Wide Web Consortium"
    dc:Date="1998-08-19"
    dc:Format="text/html"
    dc:Type="technical specification"
    dc:Language="eng">
  <dc:Subject resource="http://purl.org/net/ddc/025.30285"
      ddc:Class="025.30285"
      ddc:Heading="data processing computer applications" />
  <dc:Subject resource="http://purl.org/net/ddc/025.316"
      ddc:Class="025.316"
      ddc:Heading="Machine-readable catalog record formats" />
  <dc:Subject ddc:Class="025.302855741"
      ddc:Heading="Applications of computer file organization and access methods" />
  <dc:Creator>
    <rdf:Bag rdf:_1="Ora Lassila" rdf:_2="Ralph Swick" />
  </dc:Creator>
</rdf:Description>
Dissecting the label 1/3
A set of statements about this object (the spec):
  <rdf:Description about=""
Introducing three vocabularies to describe it:
  Basic RDF
    xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  Dublin Core
    xmlns:dc="http://purl.org/metadata/dublin_core#"
  Dewey Decimal Code
    xmlns:ddc="http://purl.org/net/ddc#"
(xmlns: prefixes are used to define new XML attributes and tags)
Dissecting the label 2/3
Using Dublin Core, as attributes of Description
  dc:Title="Resource Description Framework..."
  dc:Description="(RDF) is a foundation for..."
  dc:Publisher="World Wide Web Consortium"
  dc:Date="1998-08-19"
  dc:Format="text/html"
  dc:Type="technical specification"
  dc:Language="eng">
...and as an element, using an RDF bag:
  <dc:Creator>
    <rdf:Bag rdf:_1="Ora Lassila" rdf:_2="Ralph Swick" />
  </dc:Creator>
Dissecting the label 3/3
... and as a repeated Dublin Core element, with Dewey Decimal Code attributes:
  <dc:Subject resource="http://purl.org/net/ddc/025.30285"
      ddc:Class="025.30285"
      ddc:Heading="data processing computer applications"/>
  <dc:Subject resource="http://purl.org/net/ddc/025.316"
      ddc:Class="025.316"
      ddc:Heading="Machine-readable catalog record formats"/>
  <dc:Subject ddc:Class="025.302855741"
      ddc:Heading="Applications of computer file organization and access methods"/>
Finally, returning to the original HTML head:
  </rdf:Description>
Introduction to RDF
The label we just dissected is critical to the Web's future
  Beyond machine-readable to machine-understandable
The RDF effort unites a wide array of players
  Digital librarians, content-raters, privacy advocates, ...
  Significant industrial momentum, led by W3C
1. The Data Model
  Resources, properties, and statements
2. The Syntax
  Rendering into XML with Namespaces
3. The RDF Schema
  Using RDF to describe new vocabularies
Implications & Applications
Why metadata matters...
Automated processing of Web resources:
  Resource discovery, cataloging
  Content rating               PICS
  Collections of pages         Sitemaps
  Security & Privacy           P3P, DSIG
  Intelligent software agents
Sharing data between multiple applications and organizations requires explicit definitions
XML enables processing; RDF enables understanding
The RDF Working Group
Co-chaired by Ora Lassila & Ralph Swick
  Chartered in the Technology & Society Domain
  First draft was published in August 1997
Represents many communities:
  Web Standardization       HTML Meta, PICS
  Library                   Dublin Core, Warwick Framework
  Structured Documents      SGML, XML
  Knowledge Representation  KIF
Significant Industrial Momentum
  Ex: “What’s Related” button in Netscape Navigator...
1. The Data Model
Resources
  Any URI reference, from a fragment to a site.
Property Types
  Named type defines meaning, permitted values, and relationship to other types. (Types are also resources)
Statements
  “Resource has Property with Value” (Values can be resources or atomic XML data)
A Trivial Example
Sentence
  “Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila”
Structure
  Resource       http://www.w3.org/Home/Lassila
  Property type  Creator
  Value          "Ora Lassila"
Directed acyclic graph
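The data model is small enough to sketch directly. Below, a statement is a (resource, property type, value) record and a description is just a collection of them. The class and function names are illustrative assumptions, not RDF API names.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One RDF statement: "Resource has Property with Value" (names are hypothetical)."""
    resource: str
    property_type: str
    value: str

statements = [
    Statement("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila"),
]

def values_of(stmts, resource, property_type):
    """All values that a resource has for one property type."""
    return [s.value for s in stmts
            if s.resource == resource and s.property_type == property_type]

creators = values_of(statements, "http://www.w3.org/Home/Lassila", "Creator")
print(creators)
```

Drawn as a graph, each statement is one labeled arc from the resource node to the value node.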
An Indirect Example
To add properties to Creator, point through a (possibly anonymous) intermediate Resource.
Collection Containers
Multiple occurrences of the same PropertyType do not establish a relation between the values
  The Millers own a boat, a bike, and a TV set
  The Millers need (a car or a truck)
  (Sarah and Bob) bought a new car
RDF defines three special Resources:
  Bag          unordered values  rdf:Bag
  Sequence     ordered values    rdf:Seq
  Alternative  single value      rdf:Alt
Core RDF does not enforce ‘set’ semantics amongst values
Example: Bag
The students in course 6.001 are Amy, Tim, John, Mary, and Sue
instanceOf: has been renamed type:
Example: Alternative
The source code for X11 may be found at ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net
Reification
Making statements about statements requires a process for transforming them into Resources
  propObj   the original referent
  propName  the original property type
  value     the original value
  type      the type of this resource
Reified statements are themselves RDF:Property
Collections are also built-in RDF types
Distributive Referents
  Referring to a resource vs. its members (aboutEach)
Example: Reification
Ralph Swick says that Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila
Recap: A Formal Model of RDF
RDF itself is mathematically straightforward:
  Definitions
  Typing
  Reification
  Collections
... though the mapping onto XML syntax (and XML’s formal model) is less so...
Formal Model: Definitions
1. There is a set called Nodes
2. There is a subset of Nodes called PropertyTypes
3. There is a set of 3-tuples called Triples {p,r,v} where p is a member of PropertyTypes, r is a member of Nodes, and v (called value) is either a member of Nodes or an atomic value
  “v is the value of p for r”
  “r has a property p with a value v”
  “the p of r is v”
Formal Model: Typing
4. There is an element of PropertyTypes known as RDF:instanceOf.
5. Members of Triples of the form {RDF:instanceOf, r, v} imply r and v are members of Nodes.
  [RDFSchema] places additional restrictions on the use of instanceOf.
Formal Model: Reification
6. There is an element of Nodes, not contained in PropertyTypes, known as RDF:Property.
7. There are three elements in PropertyTypes known as RDF:propName, RDF:propObj and RDF:value.
8. Reification of a triple {p,r,v} of Triples is: an element n of Nodes representing the reified triple; and four new elements of Triples:
  {RDF:propName, n, p}
  {RDF:propObj, n, r}
  {RDF:value, n, v}
  {RDF:instanceOf, n, [RDF:Property]}
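Rule 8 translates almost literally into code. The sketch below keeps the slide's {p, r, v} ordering and represents triples as Python tuples. The node id "genid1" and the property name "a:attributedTo" are made-up placeholders for the "Ralph Swick says..." example, not names defined by RDF.

```python
def reify(triple, n):
    """Return the four triples that describe `triple` as a resource named n (rule 8)."""
    p, r, v = triple
    return {
        ("RDF:propName", n, p),
        ("RDF:propObj", n, r),
        ("RDF:value", n, v),
        ("RDF:instanceOf", n, "RDF:Property"),
    }

# "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila"
original = ("Creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")
reified = reify(original, "genid1")

# With the statement turned into a node, a statement about it is an ordinary triple.
# "a:attributedTo" is a hypothetical property type used only for this illustration.
reified.add(("a:attributedTo", "genid1", "Ralph Swick"))

for t in sorted(reified):
    print(t)
```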
Formal Model: Collections
9. There are three elements of Nodes, not contained in PropertyTypes, known as RDF:Seq, RDF:Bag, and RDF:Alt.
10. There is a subset of PropertyTypes corresponding to the ordinals called Ord. Refer to elements of Ord as RDF:_1, RDF:_2, ...
  There must always be one value for RDF:Alt (RDF:_1 is the default)
2. The Syntax
Why XML alone does not suffice
Basic RDF-in-XML Syntax
Abbreviated Forms
Distributing RDF metadata
Collected BNF Grammar
Formal mapping to XML
Why XML alone does not suffice
XML can already handle new PropertyTypes:
  <A HREF="RDF-intro">I liked <DC:Creator>Ora Lassila</DC:Creator>’s RDF introduction</A>!
  Just declare a new element or attribute!
... but “raw XML” fails in two ways:
  Interchange: Namespaces only identify new tags; DTD semantics do not provide types or composition
  Scalability: Processing generic XML requires parsing text (“the entity tax”); and the order of XML elements is considered significant, requiring the whole graph.
Basic RDF-in-XML Syntax
A Description block about a Resource
  [1] RDF ::= ['<rdf:RDF>'] description* ['</rdf:RDF>']
  [2] description ::= '<rdf:Description' idAboutAttr? '>' property* '</rdf:Description>'
  [3] idAboutAttr ::= idAttr | aboutAttr
  [4] aboutAttr ::= 'about="' URI-reference '"'
  [5] idAttr ::= 'ID="' IDsymbol '"'
contains PropertyName block or empty elements
  [6] property ::= '<' propName '>' value '</' propName '>'
                 | '<' propName resourceAttr '/>'
using fully-qualified XML Namespaces on each tag
  [7] propName ::= Qname
and allows values to be XML data, RDF, or external links
  [8] value ::= description | string
  [9] resourceAttr ::= 'resource="' URI-reference '"'
Example: Basic RDF-in-XML
“Ora Lassila is the Creator of the resource...”
  <rdf:RDF>
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
    </rdf:Description>
  </rdf:RDF>
(where s: is a separately-declared namespace)
  xmlns:s="http://description.org/schema/"
So the complete, valid XML document would be:
  <?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
           xmlns:s="http://description.org/schema/">
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
    </rdf:Description>
  </rdf:RDF>
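Reading the basic syntax back into the data model takes only a few lines. The sketch below uses Python's standard xml.etree.ElementTree and handles only productions [1] through [9] with string or resource values, so it is a toy under those assumptions rather than a conforming RDF parser; the function name triples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""

def triples(xml_text):
    """Extract (p, r, v) triples from basic RDF-in-XML (strings and resource= only)."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        r = desc.get("about")
        for prop in desc:
            # ElementTree names are "{uri}local"; join them into one URI for p.
            p = prop.tag[1:].replace("}", "", 1)
            v = prop.get("resource") or (prop.text or "").strip()
            out.append((p, r, v))
    return out

result = triples(DOC)
print(result)
```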
Abbreviated Forms
XML Namespace defaulting can shorten that:
  <?xml version="1.0"?>
  <RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#">
    <Description about="http://www.w3.org/Home/Lassila">
      <Creator xmlns="http://description.org/schema/">Ora Lassila</Creator>
    </Description>
  </RDF>
  (but such aggressive elision is officially discouraged)
RDF itself offers 3 abbreviations:
  String values as XML attributes
  Nested descriptions as XML attributes
  instanceOf: property types as XML element names
RDF Abbreviation Rules
Non-repeated, string-valued properties can fold into attributes; Description can be empty
  <rdf:Description about="http://www.w3.org/Home/Lassila"
                   s:Creator="Ora Lassila" />
Simple Description-valued properties, too:
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator resource="http://www.w3.org/staffId/85740"
               v:Name="Ora Lassila"
               v:Email="[email protected]" />
  </rdf:Description>
PropertyTypes can be promoted to elements
  <rdf:Description about="http://www.w3.org/staffId/85740">
    <rdf:instanceOf resource="s:Person" />
    ...
  <s:Person about="http://www.w3.org/staffId/85740">
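The first abbreviation rule can be checked mechanically: the element form and the attribute form of a string-valued property should yield the same triple. A sketch under the assumption that only one Description with string values is involved; the function name description_triples is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = ('xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#" '
      'xmlns:s="http://description.org/schema/"')

LONG = ('<rdf:Description %s about="http://www.w3.org/Home/Lassila">'
        '<s:Creator>Ora Lassila</s:Creator></rdf:Description>' % NS)
SHORT = ('<rdf:Description %s about="http://www.w3.org/Home/Lassila" '
         's:Creator="Ora Lassila" />' % NS)

def description_triples(xml_text):
    """Triples from one Description, accepting properties as elements or attributes."""
    desc = ET.fromstring(xml_text)
    r = desc.get("about")
    out = set()
    for name, value in desc.attrib.items():
        if name.startswith("{"):  # namespace-qualified attribute: an abbreviated property
            out.add((name[1:].replace("}", "", 1), r, value))
    for prop in desc:             # unabbreviated form: a child element per property
        out.add((prop.tag[1:].replace("}", "", 1), r, (prop.text or "").strip()))
    return out

print(description_triples(LONG) == description_triples(SHORT))
```

The unqualified about attribute is skipped by the startswith check, so it is treated as the referent rather than as a property.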
Example: Aggregates
A document with two authors specified alphabetically, a title specified in two different languages, and two equivalent locations
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
           xmlns:dc="http://purl.org/metadata/dublin_core#">
    <rdf:Description about="http://www.foo.com/cool.html">
      <dc:Creator>
        <rdf:Seq ID="CreatorsAlphabeticalBySurname">
          <rdf:li>Mary Andrew</rdf:li>
          <rdf:li>Jacky Crystal</rdf:li>
        </rdf:Seq>
      </dc:Creator>
      <dc:Identifier>
        <rdf:Bag ID="MirroredSites">
          <rdf:li rdf:resource="http://www.foo.com.au/cool.html"/>
          <rdf:li rdf:resource="http://www.foo.com.it/cool.html"/>
        </rdf:Bag>
      </dc:Identifier>
      <dc:Title>
        <rdf:Alt>
          <rdf:li xml:lang="en">The Coolest Web Page</rdf:li>
          <rdf:li xml:lang="it">Il Pagio di Web Fuba</rdf:li>
        </rdf:Alt>
      </dc:Title>
    </rdf:Description>
  </rdf:RDF>
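In the formal model, a container is a node typed as Seq, Bag, or Alt whose members hang off the ordinal properties RDF:_1, RDF:_2, and so on. The sketch below maps one container element onto such triples; the function name, the node id "c1", and the short "RDF:" labels are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"

SEQ = ('<rdf:Seq xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#" '
       'ID="CreatorsAlphabeticalBySurname">'
       '<rdf:li>Mary Andrew</rdf:li><rdf:li>Jacky Crystal</rdf:li></rdf:Seq>')

def container_triples(xml_text, node_id):
    """Map one Seq/Bag/Alt element onto {p, r, v} triples with ordinal properties."""
    elem = ET.fromstring(xml_text)
    kind = elem.tag.split("}")[1]  # "Seq", "Bag", or "Alt"
    out = [("RDF:instanceOf", node_id, "RDF:" + kind)]
    members = elem.findall("{%s}li" % RDF_NS)
    for i, li in enumerate(members, start=1):
        # A member is either a referenced resource or inline text.
        value = li.get("{%s}resource" % RDF_NS) or (li.text or "").strip()
        out.append(("RDF:_%d" % i, node_id, value))
    return out

seq_triples = container_triples(SEQ, "c1")
for t in seq_triples:
    print(t)
```

The ordinals are assigned in document order for every container kind; whether that order carries meaning depends on whether the node is a Seq, a Bag, or an Alt.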
Example: PICS Labels
PICS includes a schema, statements (about pages), and metastatements (about labels)
  (PICS-1.1 "http://www.gcf.org/v2.5"
    by "John Doe"
    labels on "1994.11.05T08:15-0500"
      until "1995.12.31T23:59-0000"
    for "http://w3.org/Overview.html"
      ratings (suds 0.5 density 0 color/hue 1)
    for "http://w3.org/Underview.html"
      by "Jane Doe"
      ratings (subject 2 density 1 color/hue 1))
[hypothetical; this is not a standards proposal]
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/1998/WD-rdf-syntax#"
           xmlns:pics="http://www.w3.org/TR/@@/WD-PICS-labels#"
           xmlns:gcf="http://www.gcf.org/v2.5">
    <rdf:Description bagID="L01" about="http://w3.org/Overview.html"
        gcf:suds="0.5" gcf:density="0" gcf:color.hue="1"/>
    <rdf:Description bagID="L02" about="http://w3.org/Underview.html"
        gcf:subject="2" gcf:density="1" gcf:color.hue="1"/>
    <rdf:Description aboutEach="#L01" pics:by="John Doe"
        pics:on="1994.11.05T08:15-0500" pics:until="1995.12.31T23:59-0000"/>
    <rdf:Description aboutEach="#L02" pics:by="Jane Doe"
        pics:on="1994.11.05T08:15-0500" pics:until="1995.12.31T23:59-0000"/>
  </rdf:RDF>
Distributing RDF Metadata
The PICS effort had three goals for labels:
  Embedded in documents (primarily HTML META)
  Transmitted with documents (in HTTP headers)
  Separately, from third parties (HTTP label queries)
Similarly, RDF has embedding mechanisms
  Using <RDF> in <HEAD>, though it’s invalid HTML 4.0
    The abbreviated form prevents values from rendering
  Using the RDF Namespace in any XML document
Remote access is unspecified as yet
  ... but XLinks to metadata stores could work neatly
Collected BNF Grammar 1/2
[6.1] RDF ::= '<rdf:RDF>' obj* '</rdf:RDF>'
[6.2] obj ::= description | container
[6.3] description ::= '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '/>'
    | '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '>' property* '</rdf:Description>'
    | typedNode
[6.4] container ::= sequence | bag | alternative
[6.5] idAboutAttr ::= idAttr | aboutAttr | aboutEachAttr
[6.6] idAttr ::= 'ID="' IDsymbol '"'
[6.7] aboutAttr ::= 'about="' URI-reference '"'
[6.8] aboutEachAttr ::= 'aboutEach="' URI-reference '"'
[6.9] bagIdAttr ::= 'bagID="' IDsymbol '"'
[6.10] propAttr ::= propName '="' string '"' (with embedded quotes escaped)
[6.11] property ::= '<' propName idAttr? '>' value '</' propName '>'
    | '<' propName idRefAttr? bagIdAttr? propAttr* '/>'
[6.12] typedNode ::= '<' typeName idAboutAttr? bagIdAttr? propAttr* '/>'
    | '<' typeName idAboutAttr? bagIdAttr? propAttr* '>' property* '</' typeName '>'
Collected BNF Grammar 2/2
[6.13] propName ::= Qname
[6.14] typeName ::= Qname
[6.15] idRefAttr ::= idAttr | resourceAttr
[6.16] value ::= obj | string
[6.17] resourceAttr ::= 'resource="' URI-reference '"'
[6.18] Qname ::= [ NSname ':' ] name
[6.19] URI-reference ::= (see RFC1738, RFC1808, [URI])
[6.20] IDsymbol ::= (any legal XML name symbol)
[6.21] name ::= (any legal XML name symbol)
[6.22] NSname ::= (any legal XML namespace prefix)
[6.23] string ::= (any XML text, with "<", ">", and "&" escaped)
[6.24] sequence ::= '<rdf:Seq' idAttr? '>' member* '</rdf:Seq>'
[6.25] bag ::= '<rdf:Bag' idAttr? '>' member* '</rdf:Bag>'
[6.26] alternative ::= '<rdf:Alt' idAttr? '>' member+ '</rdf:Alt>'
[6.27] member ::= referencedItem | inlineItem
[6.28] referencedItem ::= '<rdf:li' resourceAttr '/>'
[6.29] inlineItem ::= '<rdf:li>' value '</rdf:li>'
Formal mapping to XML 1/2
Each element E in a Description block defines a {p,r,v} triple:
  p is the element name, fully qualified as a URI
  r is the about or ID attribute of the Description; or anonymous
  v is the string or node contained by E, or the resource attribute of E
The Description block defines a Bag containing the reifications of each included property, named by its bagID or anonymous
The aboutEach attribute repeats the process for each member r of the referenced collection C
The LI element works as above, with p assigned in XML document order
Formal mapping to XML 2/2
Each attribute A on a Description tag (other than ID, about, aboutEach or bagID) defines a {p,r,v} triple:
  p is the attribute name, fully qualified as a URI
  r is the about or ID attribute of the Description; or a member of the collection in the aboutEach attribute
  v is the (string) value of A
Each attribute A on a Property tag (other than ID, resource, or bagID) defines triples:
  Linking the node r2 (ID or resource) to the enclosing element’s resource as {p,r1,r2}
  On node r2 for each attribute A, as above
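The mapping on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the namespace URIs are the ones used in the PICS slide, and reification into Bags, aboutEach expansion, and nested nodes are left out.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written on the PICS example slide.
RDF_NS = "http://www.w3.org/TR/1998/WD-rdf-syntax#"
# Attributes that identify the node rather than state a property.
# They are unprefixed in the slides, so ElementTree reports bare names.
RESERVED = {"ID", "about", "aboutEach", "bagID"}


def qualified(tag):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns + local
    return tag


def triples(rdf_xml):
    """Yield (p, r, v) for each property of each rdf:Description block."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter("{%s}Description" % RDF_NS):
        if desc.get("aboutEach") is not None:
            continue  # collection expansion is not sketched here
        # r: the about or ID attribute; None stands in for an anonymous node.
        r = desc.get("about") or desc.get("ID")
        # Attribute form: every non-reserved attribute is a property.
        for name, value in desc.attrib.items():
            if name not in RESERVED:
                yield (qualified(name), r, value)
        # Element form: v is the resource attribute or the text content.
        for child in desc:
            v = child.get("resource") or (child.text or "").strip()
            yield (qualified(child.tag), r, v)


SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/1998/WD-rdf-syntax#"
         xmlns:gcf="http://www.gcf.org/v2.5">
  <rdf:Description about="http://w3.org/Overview.html" gcf:density="0">
    <gcf:suds>0.5</gcf:suds>
  </rdf:Description>
</rdf:RDF>
"""

for triple in triples(SAMPLE):
    print(triple)
```

Both the attribute form and the element form end up as the same kind of triple, which is the point of the abbreviated syntax.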
3. The RDF Schema
Introducing new PropertyTypes in a machine-understandable way calls for a schema language
  E.g. “a book must have at least one author”
RDFS is a loosely object-oriented solution with:
  Core Classes
  Core PropertyTypes
  Core Constraints
  Documentation Hooks
  Model & Syntax Concepts
Deployment: Dublin Core, DCDs, & Other Issues
Core Classes
RDF:Resource
  All resources (a.k.a. Nodes) are instances of this
  Roughly corresponds to Object in OO systems
RDF:PropertyType
  All elements of the set PropertyTypes are instances
RDFS:Class
  Loosely corresponds to a type or category
  No formal properties of Class itself
Core PropertyTypes
RDF:type
  Indicates a resource is a member of a class; the value must be a Class
  A resource may be an instance of several classes
RDFS:subClassOf
  Indicates a (strict) subset/superset relationship
  Its domain and range are both Class
  A class may not be a subclass of itself
    But there isn't a way to express/enforce this in RDFS
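Since the schema language cannot state the "no class is its own subclass" rule, a tool has to check it separately. A minimal sketch of such a check, assuming the subClassOf statements have already been collected into a plain dictionary (the class names and the function name are hypothetical):

```python
def find_subclass_cycle(sub_class_of):
    """Return a list of classes forming a cycle, or None if there is none.

    sub_class_of maps a class name to the list of its direct superclasses.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(cls, path):
        state[cls] = IN_PROGRESS
        path.append(cls)
        for parent in sub_class_of.get(cls, ()):
            seen = state.get(parent, UNSEEN)
            if seen == IN_PROGRESS:
                # parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if seen == UNSEEN:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[cls] = DONE
        return None

    for cls in list(sub_class_of):
        if state.get(cls, UNSEEN) == UNSEEN:
            cycle = visit(cls, [])
            if cycle:
                return cycle
    return None


# Hypothetical class names, for illustration only.
print(find_subclass_cycle({"Novel": ["Book"], "Book": ["Document"]}))
print(find_subclass_cycle({"A": ["B"], "B": ["A"]}))
```

This is an ordinary depth-first search over the subClassOf graph; it is a check applied from outside, not something RDFS itself expresses.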
Core Constraints
RDFS:ConstraintPropertyType
  Superclass of Range and Domain
RDFS:Range
  Specify the (at most one) class of property values
  Any value allowed if no range specified
RDFS:Domain
  Class(es) on which a propertyType may be used
  Allowed on any class if no domain specified
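The two constraints can be read as a simple check on each property. The sketch below assumes a hypothetical in-memory schema (the `author` property and the `Book` and `Person` classes are invented for illustration) and ignores subclass relationships:

```python
# Hypothetical schema: property name -> optional domain (set of classes)
# and optional range (a single class, since at most one is allowed).
SCHEMA = {
    "author": {"domain": {"Book"}, "range": "Person"},
}


def check_property(prop, subject_classes, value_classes, schema):
    """Return the list of constraints ('domain', 'range') that are violated."""
    rule = schema.get(prop, {})
    problems = []
    domain = rule.get("domain")  # None means usable on any class
    if domain and not (domain & subject_classes):
        problems.append("domain")
    value_range = rule.get("range")  # None means any value is allowed
    if value_range is not None and value_range not in value_classes:
        problems.append("range")
    return problems


print(check_property("author", {"Book"}, {"Person"}, SCHEMA))
print(check_property("author", {"Poem"}, {"String"}, SCHEMA))
```

A property with no entry in the schema passes both checks, matching the "allowed if not specified" defaults on the slide.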
Documentation & Model
RDFS:comment
  Natural-language description of a resource
RDFS:label
  Human-readable version of a resource name
RDFS:Collection
  The superclass of Bag, Seq, and Alt
RDFS:String
  A resource corresponding to the M&S definition of string (production 15 in the BNF)
Recap: RDFS Class Hierarchy
Recap: RDFS Constraints
RDFS in RDF
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#" xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#" xmlns:s="http://www.w3.org/TR/WD-rdf-schema#">
<s:Class rdf:ID="Property" s:comment="A triple consisting of a property type, a node, and a value" />
<s:Class rdf:ID="PropertyType" s:comment="A name of a property, defining specific meaning for the property" />
<s:Class rdf:ID="Bag" s:comment="An unordered collection" />
<s:Class rdf:ID="Seq" s:comment="An ordered collection" />
<s:Class rdf:ID="Alt" s:comment="A collection of alternatives" />
<PropertyType ID="propName" s:comment="Identifies the property type of a property in reified form" s:domain="#Property" s:range="#PropertyType" />
<PropertyType ID="propObj" s:comment="Identifies the resource that a property describes in reified form" s:domain="#Property" />
<PropertyType ID="value" s:comment="Identifies the value of a property in reified form" />
<PropertyType ID="instanceOf" s:comment="Identifies the Class of a resource" s:range="#Class" />
</RDF>
Dublin Core on a Slide
Content
  Title
  Subject: includes keywords
  Description
  Source: metadata of predecessor
  Language
  Relation: e.g. isVersionOf, isFormatOf
  Coverage: spatial or temporal range
Intellectual Property
  Creator
  Contributor: e.g. editor, translator
  Publisher
  Rights
Instance
  Date
  Type: e.g. novel, poem, TR
  Format
  Identifier
Deployment Issues
RDF Schemas vs. “XML Schemas”
  The problems of controlled vocabulary metadata and self-describing DTDs are close enough to cause confusion
  XML-Data, by Microsoft & Co, conflates the two
  DTDs-in-XML is a work item before the rechartered XML WG
URI versioning issues
  As with DTDs, the permanence of the schema identifier is a popular red herring (as is “performance”)
Compatibility with ‘push’ product formats
  Channel Definition Format, Open Software Description, ...
But products and services are shipping...
Document Content Description
Long-standing goal of XML: DTDs expressed in XML
  “A schema language for tagsets”
Here, DCD models the DL element from HTML in RDF:
<DCD>
  <ElementDef Type="DL" Model="Elements" Content="Closed">
    <Description>A simple 'definition list' construct, which contains paired 'DT' (DL Term) and 'DD' (DL Definition) elements</Description>
    <Group Occurs="OneOrMore" RDF:Order="Seq">
      <Element>DT</Element>
      <Group Occurs="Optional"><Element>DD</Element></Group>
    </Group>
  </ElementDef>
  <ElementDef Type="DT" Model="Data" Content="Closed">
    <Description>The term being defined in a DL list item</Description>
  </ElementDef>
  <ElementDef Type="DD" Model="Mixed" Content="Open">
    <Description>A term's definition in a DL list item</Description>
    <!-- Open because lots of markup can be in a DL -->
  </ElementDef>
</DCD>
DCD: Beyond Text Processing
DCD can describe data types and constraints, too:
<DCD>
  <ElementDef Type="Booking" Model="Elements" Content="Closed">
    <Description>Describes an airline reservation</Description>
    <Group RDF:Order="Seq">
      <Element>LastName</Element>
      <Element>FirstInitial</Element>
      <Element>SeatRow</Element>
      <Element>SeatLetter</Element>
      <Element>Departure</Element>
      <Element>Class</Element>
    </Group>
  </ElementDef>
  <!-- example omits boring field declarations -->
  <ElementDef Type="SeatRow" Model="Data" Datatype="i1" Min="1" Max="72" />
  <ElementDef Type="SeatLetter" Model="Data" Datatype="char" Min="A" Max="K"/>
  <ElementDef Type="Class" Model="Data" Datatype="char" Default="1"/>
</DCD>
Sample airline booking record:
<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>
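To make the Min/Max constraints concrete, here is a rough Python sketch that reads the two range declarations from the slide and checks a booking record against them. It is not a DCD processor: the function names are invented, only Min and Max are handled, and treating Datatype "i1" as an integer and everything else as text is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

DCD = """<DCD>
  <ElementDef Type="SeatRow" Model="Data" Datatype="i1" Min="1" Max="72"/>
  <ElementDef Type="SeatLetter" Model="Data" Datatype="char" Min="A" Max="K"/>
</DCD>"""

BOOKING = """<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
</Booking>"""


def load_limits(dcd_xml):
    """Collect (cast, min, max) for every ElementDef that declares both bounds."""
    limits = {}
    for definition in ET.fromstring(dcd_xml).iter("ElementDef"):
        if "Min" in definition.attrib and "Max" in definition.attrib:
            # Assumption: "i1" is an integer type; other types compare as text.
            cast = int if definition.get("Datatype") == "i1" else str
            limits[definition.get("Type")] = (
                cast,
                cast(definition.get("Min")),
                cast(definition.get("Max")),
            )
    return limits


def out_of_range(record_xml, limits):
    """Return the names of fields whose values fall outside their bounds."""
    bad = []
    for field in ET.fromstring(record_xml):
        if field.tag not in limits:
            continue
        cast, low, high = limits[field.tag]
        try:
            value = cast((field.text or "").strip())
        except ValueError:
            bad.append(field.tag)  # not even parseable as the declared type
            continue
        if not low <= value <= high:
            bad.append(field.tag)
    return bad


print(out_of_range(BOOKING, load_limits(DCD)))
```

The sample record from the slide passes; changing the row to 99 or the letter to Z would report those fields.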
DCD: Basic Concepts 1/2
DCDs themselves have descriptive parameters
  Description, (Canonical) Namespace URI
  Open or Closed: whether documents must validate
Elements
  Content Model: Empty, Any, Data, Elements, or Mixed; Root flag
  Attribute or AttributeDef declarations
  Default & Fixed element contents
  Groups & Order (Seq or Alt); Occurs: Required, Optional, OneOrMore, or ZeroOrMore
DCD: Basic Concepts 2/2
Attributes
  Name, is Global, Required or Optional, has ID-Role
Entities
  Name & (Value, PublicID, or SystemID)
Datatypes
  XML DTD types, numbers (int, fixed, float, 1-8 bytes), booleans, times (dates & intervals), & binary data
  Min, Max & MinExclusive, MaxExclusive
  Picture constraints on string fields (per COBOL): characters, numbers, decimals, symbols
Implications & Applications
What happens when “one application's metadata is another application's data”?
Approaching Tim Berners-Lee’s “next phase of the Web”:
  Reasoning Engines
    RDF is a simple frame system, not a reasoning system (but one can be built atop it)
  Automating the Web of Trust
    New generation of “Internet-scale” identification, rights management, authorization tools need signed RDF assertions for trust management
For more information...
The Specifications
  http://www.w3.org/TR/WD-rdf-syntax & WD-rdf-schema
W3C’s RDF & Metadata home pages
  http://www.w3.org/Metadata/ & /RDF/
Eric Miller’s introduction
  http://www.dlib.org/dlib/may98/miller/05miller.html
  http://purl.oclc.org/~emiller/talks/www7/tutorial
Dave Beckett’s RDF Resources
  http://www.cs.ukc.ac.uk/people/staff/djb1/research/metadata/rdf.shtml
This talk is at http://www.ics.uci.edu/~rohit/cscw98/rdf/
XML in WebDAV
or, a Tale of Two Standards
Rohit Khare (with Jim Whitehead)
UC Irvine
22 February 1999 • WACC ‘99
Originally for XML’98 Dev Day
XML in WebDAV
“Distributed Authoring and Versioning”
WebDAV Abstract Model
Why WebDAV Used XML
The WebDAV Protocol DTD
PROPFIND Example (§ 8.1.1)
Standards Process Update
Call to Arms
Distributed Authoring
Very first Web tools edited, too
  ... but didn’t have collaboration support
WebDAV founded to prevent “lost-update”
  HTTP extensions to update documents
  Metadata to describe sites and pages
Implemented with XML
  ... about 1/3 of the spec (by volume)
WebDAV Abstract Model
Resources
Properties
  create, remove, and query metadata; links
Collections
Locks
  exclusive or shared write access
Namespace Operations
  atomic copy and move commands
Why WebDAV Used XML
Used XML in headers and entity-bodies
  to add extra XML elements to existing data
  to support internationalization
Used whenever arguments:
  are unbounded length
  are shown to humans
Decentralized Extensibility
  e.g. new locktypes beyond write access
The WebDAV Protocol DTD
Protocol elements
  activelock, multistatus, link, propfind
Property elements
  creationdate, displayname, getcontentlength
The DAV: Namespace
  Properties for HTTP-header metadata, internationalization, and protocol elements
PROPFIND Example (§ 8.1.1)
PROPFIND /container/ HTTP/1.1
Host: www.foo.bar
Depth: 1
Content-Type: text/xml; charset="utf-8"
Content-Length: xxxx

<?xml version="1.0" encoding="utf-8" ?>
<D:propfind xmlns:D="DAV:">
  <D:allprop/>
</D:propfind>

HTTP/1.1 207 Multi-Status
Content-Type: text/xml; charset="utf-8"
Content-Length: xxxx

<?xml version="1.0" encoding="utf-8" ?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://www.foo.bar/container/</D:href>
    <D:propstat>
      <D:prop xmlns:R="http://www.foo.bar/boxschema/">
        <R:bigbox> <R:BoxType>Box type A</R:BoxType> </R:bigbox>
        <R:author> <R:Name>Hadrian</R:Name> </R:author>
        <D:creationdate> 1997-12-01T17:42:21-08:00 </D:creationdate>
        <D:displayname> Example collection </D:displayname>
        <D:resourcetype><D:collection/></D:resourcetype>
        <D:supportedlock>
          <D:lockentry>
            <D:lockscope><D:exclusive/></D:lockscope>
            <D:locktype><D:write/></D:locktype>
          </D:lockentry>
        </D:supportedlock>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
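A client-side view of this exchange can be sketched with Python's standard XML library. The sketch builds the allprop request body and pulls display names out of a 207 body; it does no networking, the function names are invented for this example, and the sample response is a trimmed copy of the one on the slide.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"  # the WebDAV namespace name used on the slide


def propfind_allprop_body():
    """Build the XML body of a PROPFIND request asking for all properties."""
    ET.register_namespace("D", DAV)  # serialize with the slide's D: prefix
    root = ET.Element("{%s}propfind" % DAV)
    ET.SubElement(root, "{%s}allprop" % DAV)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def displaynames(multistatus_xml):
    """Map each href in a 207 Multi-Status body to its DAV: displayname."""
    names = {}
    root = ET.fromstring(multistatus_xml)
    for response in root.findall("{DAV:}response"):
        href = response.findtext("{DAV:}href")
        name = response.findtext("{DAV:}propstat/{DAV:}prop/{DAV:}displayname")
        if href is not None and name is not None:
            names[href.strip()] = name.strip()
    return names


# Trimmed copy of the response shown above.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://www.foo.bar/container/</D:href>
    <D:propstat>
      <D:prop>
        <D:displayname> Example collection </D:displayname>
        <D:resourcetype><D:collection/></D:resourcetype>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

print(propfind_allprop_body().decode("utf-8"))
print(displaynames(SAMPLE_RESPONSE))
```

Because every element is addressed by its namespace name rather than its prefix, the same code keeps working if a server chooses a prefix other than D.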
Standards Process Update
Imminent IETF Proposed Standard
  Next steps: IESG Approval, Implementations
  Launching DAV Searching & Location (DASL) WG; Access Control Lists (ACL) to come
Support from several vendors
  Microsoft, Netscape prototype servers
  Workflow vendors
  More announcements on the way...
Call to Arms
WebDAV uses XML
  ... for its protocol commands
  ... for its properties/metadata
WebDAV can manage XML
  ... ideal for versioning
  ... manages source and derived editions
Protocols can be built atop XML
  ... though it was tough to coordinate maturity
For More Information...
WebDAV Working Group
  [email protected] mailing list
  Meeting @ LA IETF
Chair: E. James Whitehead
  [email protected]
  http://www.ics.uci.edu/~ejw/authoring (this talk will be available at that URL)
Acknowledgments to many, many others