Page 1: XML and CHAIMS Dorothea Beringer

May 1999, DRAFT

XML and CHAIMS 1

XML and CHAIMS

Dorothea Beringer

The Extensible Markup Language and its Use for CHAIMS

Main reference: W3C Recommendations for XML 1.0

Page 2: XML and CHAIMS Dorothea Beringer


Element Tags and Attributes

• Start and end element tags with PCDATA or other elements in between: <DES repname="lastname">Rochelle</DES>

• Empty element tags: <H:Goal text="So what?" />

• Attributes (name-value pairs, value is always text): <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">

For documents: tags for structure, attributes for additional information concerning structure, all text is PCDATA:

<p indent="true"><b>My text:</b> this is an explanation for my text.</p>

For protocols: large amount of freedom for putting information into tag name, attribute, or character data:

<CPAMprimitive>
  <type>INVOKE_request</type>
  <clientid>09870987sdf</clientid>
  <methodname>makeHoroscope</methodname>
</CPAMprimitive>

<lastname>Rochelle</lastname>
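The freedom described for protocols can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module (a choice made here, not something the slides prescribe) to read the same INVOKE_request once from attributes and once from child elements; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute style: the fields of the primitive are attributes of one element.
attr_style = ET.fromstring(
    '<INVOKE_request clientid="09870987sdf" methodname="makeHoroscope"/>'
)

# Element style: the same fields are child elements with text content.
elem_style = ET.fromstring(
    "<CPAMprimitive>"
    "<type>INVOKE_request</type>"
    "<clientid>09870987sdf</clientid>"
    "<methodname>makeHoroscope</methodname>"
    "</CPAMprimitive>"
)


def fields_from_attributes(elem):
    """Return the primitive type (the tag name) plus all attribute values."""
    return {"type": elem.tag, **elem.attrib}


def fields_from_children(elem):
    """Return one entry per child element, keyed by the child's tag name."""
    return {child.tag: child.text for child in elem}


from_attrs = fields_from_attributes(attr_style)
from_children = fields_from_children(elem_style)
print(from_attrs == from_children)
```

Both encodings carry the same information; which one to pick is a design decision of the protocol, not something XML itself dictates.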

Page 3: XML and CHAIMS Dorothea Beringer


Comments, CDATA, Characters

• Comments: <!-- this text is comment -->

• CDATA sections: markup inside is interpreted as character data, not as markup (e.g. in a document about XML):

<![CDATA[ text to be escaped ]]>

<![CDATA[ <name>Michelle</name> ]]>

• Characters:

» only characters from the chosen character set are allowed

» use character references for uncommon characters

» predefined entities for escaping: &amp; for &, &lt; for <, &gt; for >, &quot; for ", &apos; for '
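A minimal sketch of both escaping mechanisms, assuming Python's standard library as the tool (xml.sax.saxutils.escape and ElementTree); the sample text is invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && name == "Michelle"'

# escape() handles &, < and > by default; the quote characters are added
# through the optional mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Parsing the escaped text gives back the original characters.
doc = ET.fromstring("<note>" + escaped + "</note>")

# Inside a CDATA section the tags are plain characters, not child elements.
cdata_doc = ET.fromstring("<note><![CDATA[ <name>Michelle</name> ]]></note>")

print(escaped)
print(doc.text)
print(repr(cdata_doc.text), len(cdata_doc))
```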

Page 4: XML and CHAIMS Dorothea Beringer


Prolog

Prolog:

• XML declaration: <?xml version="1.0" encoding="UTF-8" ?>

» ISO 10646 UTF-8: default encoding

• other processing instructions: <? …… ?>

• document type declaration: <!DOCTYPE CCIS:Message SYSTEM "CCISMessage.dtd" [ ….. ]> or <!DOCTYPE mydoc>

• internal DTD as part of document type declaration

Additional DTDs:

• external DTD in document type declaration

• use a parameter entity definition as part of the internal DTD:

<!ENTITY % HoroscopeDTD SYSTEM "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >

%HoroscopeDTD;

• internal DTD is read first, so its declarations take precedence over those of the external DTD

Page 5: XML and CHAIMS Dorothea Beringer


Example of an XML Document (1)

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CCIS:Message SYSTEM "CCISMessage.dtd" [
  <!ENTITY % HoroscopeDTD PUBLIC "-//HoroscopeCompany//TEXT Standard MagicHoroscope//EN" "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >
  %HoroscopeDTD;
]>
<CCIS:Message version="0.1" xmlns="CCIS" xmlns:H="MagicHoroscope">
  <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
    <Parameters>
      <DEC repname="persdat" type="list" fullname="Personal Data for Horoscope">
        <DEC repname="name" fullname="All names of person">
          <DES repname="firstname" type="string">Michelle</DES>
          <DES repname="middlename">Andr&eacute;e</DES>
          <DES repname="lastname">Rochelle</DES>
          <DES repname="addname" fullname="additional name">Judit</DES>
          <DES repname="addname" fullname="additional name">Monique</DES>
          <DES repname="addname" fullname="additional name">Claire</DES>
        </DEC>

Page 6: XML and CHAIMS Dorothea Beringer


Example of an XML Document (2)

continued from previous slide:

        <DES repname="birthdate" fullname="Date of Birth" type="UTCdatetime" description="exact date and time of birth">1974-03-23T22:10:35Z</DES>
      </DEC>
      <DES repname="fav" fullname="Favorite Sentence">Hello, Hello!!</DES>
      <DEO repname="magic_addr" type="unknown" fullname="Magic Address" description="magic address as returned by magic address makers">
        <H:Address>
          <H:Homepage href="http://www-db.stanford.edu/~meier"/>
          <H:Picture href="ftp://ftp.pictures.com/random-picture"/>
          <H:Name>Adonso Alerta</H:Name>
          <H:MoreInfo>href="http://www-db.stanford.edu/~meier"</H:MoreInfo>
          <H:Goal text="So what?" />
        </H:Address>
      </DEO>
    </Parameters>
  </INVOKE_request>
</CCIS:Message>
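As a rough illustration of how a receiver might walk such a message, the sketch below parses a shortened copy of the example with Python's xml.etree.ElementTree and lists every simple data element (DES) together with the path of repname values leading to it. Two things are assumptions of this sketch: the DTD and the &eacute; entity are left out, and the CCIS prefix is declared explicitly (xmlns:CCIS="CCIS") because a namespace-aware parser rejects an undeclared prefix.

```python
import xml.etree.ElementTree as ET

MESSAGE = """\
<CCIS:Message version="0.1" xmlns:CCIS="CCIS" xmlns="CCIS">
  <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
    <Parameters>
      <DEC repname="persdat" type="list">
        <DEC repname="name">
          <DES repname="firstname" type="string">Michelle</DES>
          <DES repname="lastname">Rochelle</DES>
          <DES repname="addname">Judit</DES>
          <DES repname="addname">Monique</DES>
        </DEC>
        <DES repname="birthdate" type="UTCdatetime">1974-03-23T22:10:35Z</DES>
      </DEC>
    </Parameters>
  </INVOKE_request>
</CCIS:Message>
"""

NS = "{CCIS}"  # ElementTree writes qualified names as {namespace}localname


def flatten(elem, path=()):
    """Yield (path of repnames, text) for every simple element (DES)."""
    for child in elem:
        here = path + (child.get("repname"),)
        if child.tag == NS + "DES":
            yield here, child.text
        elif child.tag == NS + "DEC":
            yield from flatten(child, here)


root = ET.fromstring(MESSAGE)
params = root.find(f"{NS}INVOKE_request/{NS}Parameters")
leaves = list(flatten(params))
for path, value in leaves:
    print("/".join(path), "=", value)
```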

Page 7: XML and CHAIMS Dorothea Beringer


The DTD

DTDs restrict the structure of a document and provide default values:

» Well-formed document: adheres to XML specification

» Valid document: adheres to DTD ==> validating parsers

• Element type declarations specify the element content model:

<!ELEMENT EXTRACT_response (Parameters, Accuracies?, Error*) >
<!ELEMENT Parameters (DEC | DES | DEO)* >
<!ELEMENT DES (#PCDATA) >
<!ELEMENT DEO ANY >
<!ELEMENT MagicHoroscope:Goal EMPTY >

Content model has to be deterministic.

• Attribute list declaration:

<!ATTLIST CCIS:DES
  CCIS:repname NMTOKEN #REQUIRED
  CCIS:fullname CDATA #IMPLIED
  CCIS:type (string | integer | real | UTCdatetime) "string"
  CCIS:compliancy CDATA #FIXED "compliant to CHAIMS" >
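A content model is essentially a regular expression over the names of the child elements. The sketch below makes that concrete for (Parameters, Accuracies?, Error*) by translating the model by hand into a Python regular expression. It is only an illustration of the idea, not a validating parser (Python's standard library does not include one), and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (Parameters, Accuracies?, Error*). Each child name is
# followed by one space so that adjacent names cannot run together.
EXTRACT_RESPONSE_MODEL = re.compile(r"Parameters (Accuracies )?(Error )*")


def matches_model(elem, model):
    """Check the sequence of child element names against a content model."""
    children = "".join(child.tag + " " for child in elem)
    return model.fullmatch(children) is not None


ok = ET.fromstring(
    "<EXTRACT_response><Parameters/><Error/><Error/></EXTRACT_response>"
)
bad = ET.fromstring(
    "<EXTRACT_response><Error/><Parameters/></EXTRACT_response>"
)

print(matches_model(ok, EXTRACT_RESPONSE_MODEL))
print(matches_model(bad, EXTRACT_RESPONSE_MODEL))
```

The first document satisfies the model; the second does not, because Error appears before Parameters.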

Page 8: XML and CHAIMS Dorothea Beringer


Example: CCISMessage.dtd

<!ELEMENT CCIS:Message ( SETUP_request | SETUP_response | INVOKE_request | INVOKE_response) > <!-- not complete -->

<!ATTLIST CCIS:Message
  version CDATA "0.1"
  requestnr CDATA "" >

<!ELEMENT INVOKE_request (Parameters)? >

<!ATTLIST INVOKE_request
  clientid NMTOKEN #REQUIRED
  methodname CDATA #REQUIRED >

<!ELEMENT CCIS:Parameters (CCIS:DES | CCIS:DEC | CCIS:DEO)* >

<!ELEMENT CCIS:DEC (CCIS:DES | CCIS:DEC | CCIS:DEO)* >

<!ATTLIST CCIS:DEC
  CCIS:repname NMTOKEN #REQUIRED
  CCIS:type NMTOKEN "list"
  CCIS:fullname CDATA #IMPLIED
  CCIS:description CDATA #IMPLIED >

<!ELEMENT CCIS:DES (#PCDATA) >

Page 9: XML and CHAIMS Dorothea Beringer


IDs

Attributes of type ID and IDREF can be used for links within the same document:

<?xml version="1.0"?>
<!DOCTYPE document [
  <!ELEMENT document (#PCDATA | ref | paper)* >
  <!ELEMENT ref (#PCDATA) >
  <!ATTLIST ref ref IDREF #REQUIRED >
  <!ELEMENT paper (#PCDATA) >
  <!ATTLIST paper id ID #REQUIRED >
]>
<document>
This is text of my document that describes the paper <ref ref="Wie99">[Wiederhold99]</ref> into all possible details.
<paper id="Wie99">Gio Wiederhold; "The advantage of CLAM"; not yet published</paper>
</document>

or by using XPointers:

<document> This is text of my document that describes the paper <pointer href="#Wie99">[Wiederhold99]</pointer> into all possible details.

IDs must be unique within a document.
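The ID/IDREF mechanism can be sketched in a few lines of Python. Because ElementTree does not read attribute types from a DTD, the sketch simply assumes that the attribute named id plays the ID role and the attribute named ref plays the IDREF role, as in the example document; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """\
<document>This is text of my document that describes the paper
<ref ref="Wie99">[Wiederhold99]</ref> into all possible details.
<paper id="Wie99">Gio Wiederhold; "The advantage of CLAM"; not yet published</paper>
</document>
"""


def build_id_index(root, id_attr="id"):
    """Map each ID value to its element; IDs must be unique in a document."""
    index = {}
    for elem in root.iter():
        key = elem.get(id_attr)
        if key is None:
            continue
        if key in index:
            raise ValueError(f"duplicate ID: {key}")
        index[key] = elem
    return index


def resolve_refs(root, index, ref_attr="ref"):
    """Return (link text, target element) for every IDREF in the document."""
    resolved = []
    for elem in root.iter():
        key = elem.get(ref_attr)
        if key is not None:
            if key not in index:
                raise ValueError(f"dangling IDREF: {key}")
            resolved.append((elem.text, index[key]))
    return resolved


root = ET.fromstring(DOC)
index = build_id_index(root)
resolved = resolve_refs(root, index)
for text, target in resolved:
    print(text, "->", target.tag, target.get("id"))
```

A validating parser performs the same two checks (unique IDs, no dangling IDREFs) based on the attribute types declared in the DTD.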

Page 10: XML and CHAIMS Dorothea Beringer

May 1999, DRAFT

XML and CHAIMS 10

Entity References

• Parameter entities, only to be used within DTDs:

<!ENTITY % doctype "Proposal" >   in internal DTD statements

<!ELEMENT %doctype; ANY >   in external DTD, expands to <!ELEMENT Proposal ANY>

• General entities, to be used everywhere:

<!ENTITY su "Stanford University" >   in DTD

<!ENTITY eacute "&#xE9;" >

Palo Alto is famous for nearby &su;.   in document body

<DES repname="middlename">Andr&eacute;e</DES>

• External entity references:

<!ENTITY % HoroscopeDTD PUBLIC "-//HoroscopeCompany//TEXT Standard MagicHoroscope//EN" "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >

<!ENTITY logo SYSTEM "ftp://ftp.epfl.ch/pub/logos/epfl.gif" NDATA gif >

• Notation declaration: <!NOTATION gif SYSTEM "/u/bin/gifviewer" >
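A short sketch of general entities in action, assuming Python's xml.etree.ElementTree as the parser: entities declared in the internal DTD subset are expanded while the document is parsed. The wrapper element names note and place are invented for this example.

```python
import xml.etree.ElementTree as ET

DOC = """\
<!DOCTYPE note [
  <!ENTITY su "Stanford University">
  <!ENTITY eacute "&#xE9;">
]>
<note>
  <place>Palo Alto is famous for nearby &su;.</place>
  <DES repname="middlename">Andr&eacute;e</DES>
</note>
"""

root = ET.fromstring(DOC)
place = root.find("place").text
middlename = root.find("DES").text

print(place)
print(middlename)
```

External entities such as the HoroscopeDTD or the logo above are a different matter: whether they are fetched at all depends on the parser and its configuration.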

Page 11: XML and CHAIMS Dorothea Beringer


Namespaces, Binary Data

Defining a namespace for this and all enclosed elements:

<Message xmlns="CCIS" xmlns:U="UTCStandard">
  <INVOKE_request clientid="09870987sdf" U:date="1999-05-03" U:time="23:01:00" methodname="makeHoroscope">

expands to (a default namespace applies to element names only, so unprefixed attributes are not expanded):

<CCIS:Message>
  <CCIS:INVOKE_request clientid="09870987sdf" UTCStandard:date="1999-05-03" UTCStandard:time="23:01:00" methodname="makeHoroscope">
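How a namespace-aware parser reports these names can be checked with a small sketch, here using Python's xml.etree.ElementTree, which writes qualified names as {namespace}localname. The INVOKE_request element is closed as an empty element only to keep the sample well-formed.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<Message xmlns="CCIS" xmlns:U="UTCStandard">'
    '<INVOKE_request clientid="09870987sdf" U:date="1999-05-03" '
    'U:time="23:01:00" methodname="makeHoroscope"/>'
    "</Message>"
)
request = root[0]

print(root.tag)
print(request.tag)
print(sorted(request.attrib))
```

Element names pick up the default namespace, prefixed attributes pick up the namespace bound to their prefix, and unprefixed attributes such as clientid are reported without any namespace.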

Binary data:

• external reference:

<BIN XML-LINK="simple" HREF="www.my.com/myfile" />

• internal, yet encoded as bin-hex or uuencode:

<DES repname="bindata" type="uuencoded">begin 644 tmp
)37D@5&amp;5X="$*
end</DES>
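The receiving side of the uuencoded variant might look like the following sketch. It assumes the element content consists of a begin line, data lines, and an end line, and it uses Python's binascii.a2b_uu to decode each data line; note that the &amp; in the XML text is turned back into & by the parser before decoding.

```python
import binascii
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    '<DES repname="bindata" type="uuencoded">begin 644 tmp\n'
    ')37D@5&amp;5X="$*\n'
    "end</DES>"
)

lines = elem.text.splitlines()
header, data_lines, trailer = lines[0], lines[1:-1], lines[-1]

# Each uuencoded line starts with a length character followed by the data.
payload = b"".join(binascii.a2b_uu(line) for line in data_lines)

print(header)
print(payload)
```

The example shows why binary data inside XML needs two layers of care: a text encoding such as uuencode, plus XML escaping of any markup characters the encoding happens to produce.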

Page 12: XML and CHAIMS Dorothea Beringer


XLink (1)

An XLink element links together several resources:

» an element is a linking element if it has attribute xml:link

» locators identify resources participating in the link

» one-directional, bi-directional, multi-directional links

Reserved attributes used for making linking elements:

» xml:link: "simple", "extended", "locator", "group", "document"

» href: defines a remote locator participating in the link; consists of a URI to the remote resource and/or a connector (# or |) with an XPointer to the desired fragment of the resource

» inline: "false", "true" (default); inline means that one of the resources of the link is the local resource given by the content of the link element

» show: "embed", "replace", "new"

» actuate: "auto", "user"

» behavior: additional information on how the link should behave

Page 13: XML and CHAIMS Dorothea Beringer


XLink (2)

» title: human-readable text describing the linked resource

» role: role of the linked resource in the context of the originating resource

» content-role: role of the local resource in the context of the remote resource

» content-title: title of local resource

Naming conflict: use attribute remapping:

<mylink xml:link="simple" xml:attributes="role xlinkrole" xlinkrole="lrole"...

Simplifying the XML document: specify linking attribute values in the DTD

Different kinds of links:

- simple links

- extended links

- extended link groups (using xml:link="group", xml:link="document", attribute steps)

XLink and XPointer (not part of XML) are still working drafts (April 1999)!

Page 14: XML and CHAIMS Dorothea Beringer


XLink (3)

Simple links:

» just one remote resource, all in one element (xml:link="simple")

» normally an inline link

<Picture xml:link="simple" href="mypicture.gif" />

<BroaderTerm xml:link="simple" role="bt" content-role="nt" href="file:/u/dict/terms.dict|=building" actuate="user" show="embed" title="Broader Term"> broader term</BroaderTerm>

Extended links:

» link element (xml:link="extended") contains attributes for the whole link

» one locator element (xml:link="locator") for each remote resource

» normally an out-of-line link ==> detached from the resources that are linked together

<dictionary xml:link="extended" inline="false" role="all synonyms">
  <word xml:link="locator" role="synonym" href="#big_id"/>
  <word xml:link="locator" role="synonym" href="#id(large_id)"/>
  <word xml:link="locator" role="synonym" href="#origin().great_id"/>
</dictionary>
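To show how an application could consume such an extended link, the sketch below collects the locators of the dictionary example and splits each href at its connector (# or |) into a URI part and an XPointer part. It follows the xml:link attribute syntax of the working draft described on these slides; the function names are hypothetical, and the only parser behaviour relied on is that ElementTree reports attributes with the reserved xml prefix under the namespace http://www.w3.org/XML/1998/namespace.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

DOC = """\
<dictionary xml:link="extended" inline="false" role="all synonyms">
  <word xml:link="locator" role="synonym" href="#big_id"/>
  <word xml:link="locator" role="synonym" href="#id(large_id)"/>
  <word xml:link="locator" role="synonym" href="#origin().great_id"/>
</dictionary>
"""


def split_href(href):
    """Split a locator into (URI, XPointer) at the first '#' or '|'."""
    for position, char in enumerate(href):
        if char in "#|":
            return href[:position], href[position + 1:]
    return href, ""


def locators(link):
    """Return the (URI, XPointer) pairs of all locators in an extended link."""
    if link.get(XML_NS + "link") != "extended":
        raise ValueError("not an extended link element")
    return [
        split_href(child.get("href"))
        for child in link
        if child.get(XML_NS + "link") == "locator"
    ]


link = ET.fromstring(DOC)
parts = locators(link)
print(parts)
print(split_href("file:/u/dict/terms.dict|=building"))
```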

Page 15: XML and CHAIMS Dorothea Beringer


XPointer (1)

Usage: often part of a locator, specifies fragments of an XML document

<mylink href="myCCISmessage.xml#root().child(1,INVOKE_request).child(1,Parameters).child(1,DEC,repname,persdat).child(1,DEC,repname,name)" >

XPointer starts with a node given by one of:

» Root: root(), default, start is the root element of the document that is given by the URI part of the locator, or of the local document

» Origin: origin(), start is the containing element of the locator, no URI allowed in locator

» ID: id(myID), shortcut: myID, looks for an element with an ID attribute (specified in DTD with ID) with value myID

» HTML: html(target), looks for an element <A name="target" ….>

In that node the target fragment is found by:

» child, descendant, ancestor, preceding, following, psibling, fsibling

» plus all (all candidate elements are selected) or a number

» optionally plus the name of the element type or #element, #pi, #comment, #text, #cdata, #all, gives candidate nodes

Page 16: XML and CHAIMS Dorothea Beringer


XPointer (2)

» optionally plus pairs of attribute name and value:

* : any attribute name or any value

#IMPLIED : no value is given for this attribute

or exact string or substring

» or span(XPointer, XPointer), returns all in between

» or attr(attributename), returns just this attribute value

» or root().child(1).string(5,"hello",1,3), selects the first three characters of the fifth occurrence of the substring hello in the first child of the root element

[Figure: example document tree with cdata, comment and pi nodes, labelled with the relative location terms child, descendant, ancestor, preceding, following, psibling and fsibling]
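The child location term can be approximated in a few lines. The sketch below is a hypothetical helper written for this illustration, not an implementation of the XPointer draft: it picks the n-th child that matches an optional element type and an optional attribute name/value pair, and chains four such steps over a cut-down message.

```python
import xml.etree.ElementTree as ET


def child(elem, n, name=None, attr=None, value=None):
    """Return the n-th (1-based) child matching element type and attribute."""
    candidates = [
        c for c in elem
        if (name is None or c.tag == name)
        and (attr is None or c.get(attr) == value)
    ]
    return candidates[n - 1]


root = ET.fromstring(
    "<Message><INVOKE_request><Parameters>"
    '<DEC repname="other"/>'
    '<DEC repname="persdat"><DEC repname="name">'
    '<DES repname="firstname">Michelle</DES>'
    "</DEC></DEC>"
    "</Parameters></INVOKE_request></Message>"
)

# root().child(1,INVOKE_request).child(1,Parameters)
#       .child(1,DEC,repname,persdat).child(1,DEC,repname,name)
step1 = child(root, 1, "INVOKE_request")
step2 = child(step1, 1, "Parameters")
step3 = child(step2, 1, "DEC", "repname", "persdat")
target = child(step3, 1, "DEC", "repname", "name")

print(target.get("repname"), target[0].text)
```

Note that the number counts only the candidates that pass the filters: the DEC with repname persdat is the second DEC child of Parameters, but the first one with that attribute value.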

Page 17: XML and CHAIMS Dorothea Beringer


Styles

HTML

» tags have structural as well as some representational semantics, e.g.: <a>, <h1>

» additional formatting with CSS1 (Cascading Style Sheets)

XML

» no representational semantics at all

==> all representational information in additional documents

• CSS1

• XSL (Extensible Style Sheet Language), draft, based on DSSSL, contains construction rules for elements

» linking to the XML document by the XML processor, either by external rules (e.g. user-defined style sheet) and/or by processing instructions in the XML document (draft!):

<?xml-stylesheet href="mystyle.css" title="Compact" type="text/css"?>

Page 18: XML and CHAIMS Dorothea Beringer


XML Styles in CHAIMS

I/O-megamodules in CHAIMS:

• parameter in XML without style document: rendering (e.g. into RTF or HTML) according to default rules, based on datatypes

• parameter in XML with additional style document: rendering according to style rules in style document

Origin of style documents?

» http-link in parameter to style document on site of a megamodule provider

» …..?

Page 19: XML and CHAIMS Dorothea Beringer


Query Languages for XML

Query languages: extract certain parts from an XML document based on a filter/query

• XQL

» http://metalab.unc.edu/xql/

» example: /novel//author[@gender='male' and @size='5.4']

• XML-QL

» http://www.w3.org/TR/NOTE-xml-ql

» different syntax from XQL

» more complex, allows joins…

In CHAIMS:

• allowing queries in extract primitives (enhancing partial extraction)

• allowing queries in assignment primitives to variables and input parameters in invocation primitives

• only on structures exposed in repository, or also on opaque structures?
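XQL itself is not available in Python's standard library, but the flavour of such a filter can be sketched with the limited path syntax of xml.etree.ElementTree, which is a different (XPath-like) language swapped in here for illustration. It has no "and" operator inside a predicate, so the two conditions are written as chained predicates; the sample document is invented.

```python
import xml.etree.ElementTree as ET

novel = ET.fromstring(
    "<novel><title>A</title>"
    '<author gender="male" size="5.4">First</author>'
    "<contributors>"
    '<author gender="female" size="5.4">Second</author>'
    '<author gender="male" size="6.0">Third</author>'
    "</contributors></novel>"
)

# Counterpart of /novel//author[@gender='male' and @size='5.4'],
# evaluated relative to the <novel> root element.
matches = novel.findall(".//author[@gender='male'][@size='5.4']")
names = [author.text for author in matches]
print(names)
```

Only the first author satisfies both conditions; the other two each fail one of them.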

Page 20: XML and CHAIMS Dorothea Beringer


DOM and Parsers

DOM (Document Object Model) is a programming API for XML documents ==> closely linked to parsers!

• DOM

» http://www.w3.org/TR/REC-DOM-Level-1

» http://www.w3.org/TR/WD-DOM-Level-2

» general specification of the API

» representation of an XML document as a tree in a programming language

• Parsers

» e.g. IBM XML4J Parser for Java and others

» http://www.software.ibm.com/xml/resources/
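The slides mention a Java parser; as a language-neutral illustration of the DOM idea, the sketch below uses xml.dom.minidom from Python's standard library, which implements a subset of the DOM interfaces. The sample message is a shortened version of the earlier INVOKE_request.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">'
    '<Parameters><DES repname="lastname">Rochelle</DES></Parameters>'
    "</INVOKE_request>"
)

# The document is a tree of nodes: elements, attributes, text, ...
request = dom.documentElement
des = request.getElementsByTagName("DES")[0]
lastname = des.firstChild.data  # text node below the DES element

# The tree can also be modified and written back as XML.
des.setAttribute("type", "string")

print(request.tagName, request.getAttribute("methodname"))
print(lastname)
print(des.toxml())
```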

Page 21: XML and CHAIMS Dorothea Beringer


XML and Other Standards

• ASN.1: binary, semantics defined per application

• TeX: readable program, semantics for typesetting, directives for sections, pages, etc.

• PostScript: programming language, semantics for typesetting, directives for sections, pages, etc., page oriented

• RTF: unreadable text, semantics for presentation and structure

• HTML: readable text, presentational and declarative, semantics and limited presentation defined

• SGML: readable text, declarative, semantics and DTD defined per application, richer than XML

• XML: readable text, declarative, extensible, style described externally in CSS and XSL, semantics and DTD defined per application

• RDF: XML text, declarative, metadata schema

Page 22: XML and CHAIMS Dorothea Beringer


Limitations of XML

XML is just a syntax

» it can be used for many different things in many different ways…

» compare to the alphabet: though I can read any language using the Latin alphabet, I do not understand most of them

XML is a general syntax for marked-up text / information

» specific text formats still useful

» binary formats still useful

XML has no type system

» specify types of PCDATA with DTD and validate it with parsers?

» work-around: see example

[Figure: chart with the axes specific/general and text/binary, positioning the formats mif, xml, dump and ASN.1]

Page 23: XML and CHAIMS Dorothea Beringer


Why Use XML for CPAM?

XML is one common way to mark up text and to define the structure of the mark-up, so why not use XML instead of defining our own?

Advantages of readable text with declarative mark-up:

» human readable (in contrast to ASN.1) ==> monitoring of CPAM is straightforward

» text-based (no marshalling problems)

» extensible! no more problems when extending CPAM as long as the old version is a subset of the new version

» parsers and DTDs (can be much more error- and extension-tolerant) instead of method signatures, message paradigm!

» combinable with XSL and XQL

Advantages of mark-up that supports attributes:

» traditional RPCs: type, name and other attributes are defined in separate documents, e.g. IDL files, header files

» CHAIMS: all data elements carry with them type, name and even descriptive name information

==> use attributes in XML for this

Page 24: XML and CHAIMS Dorothea Beringer


Why Use XML for the CHAIMS Repository?

• Repository is already plain text with mark-up

• Precise and explicit DTD as part of repository

• Using (other) off-the-shelf parsers

• Combining repository with style sheet for representation (does that make sense? the repository wizard is more helpful)

• Yet the text will become lengthier

Will it really make a difference?

Page 25: XML and CHAIMS Dorothea Beringer


CPAM in XML

Part of or all the message in XML?

1 The whole message is one XML file

2 Only input and result parameters for methods are in XML

2a One XML file for each parameter

2b All parameters are in one file

Information in element types, element values, or in attributes?

• which primitive:

<EXTRACTprimitive> …. </EXTRACTprimitive>

<primitive><primitivetype>EXTRACT</primitivetype>....</primitive>

<primitive primitivetype="EXTRACT">….</primitive>

• parameter names and types
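The three placements of the primitive name can be compared from the receiver's point of view with a small sketch. The function primitive_type is hypothetical and written only for this illustration; it accepts any of the three encodings listed above.

```python
import xml.etree.ElementTree as ET


def primitive_type(elem):
    """Return the primitive name for any of the three encodings."""
    if elem.tag != "primitive" and elem.tag.endswith("primitive"):
        # Encoded in the element type: <EXTRACTprimitive>
        return elem.tag[: -len("primitive")]
    if elem.get("primitivetype") is not None:
        # Encoded in an attribute: <primitive primitivetype="EXTRACT">
        return elem.get("primitivetype")
    # Encoded in an element value: <primitivetype>EXTRACT</primitivetype>
    return elem.findtext("primitivetype")


in_tag = ET.fromstring("<EXTRACTprimitive/>")
in_value = ET.fromstring(
    "<primitive><primitivetype>EXTRACT</primitivetype></primitive>"
)
in_attribute = ET.fromstring('<primitive primitivetype="EXTRACT"/>')

kinds = [primitive_type(e) for e in (in_tag, in_value, in_attribute)]
print(kinds)
```

One practical difference: with the first encoding a DTD can give each primitive its own content model, while the other two need the application to check which children are allowed for which primitive.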

Page 26: XML and CHAIMS Dorothea Beringer


CPAM in XML: Parameters (1)

<CCIS:Message version="0.1" xmlns="CCIS" xmlns:H="MagicHoroscope">
  <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
    <Parameter repname="persdat" type="list" fullname="Personal Data">
      <DE repname="name" type="list" fullname="All names of person">
        <DE repname="firstname" type="simple" datatype="string">Michelle</DE>
        <DE repname="lastname" type="simple" datatype="string">Smith</DE>
      </DE>
      <DE repname="address" type="opaqueXML" fullname="Address of Person">
        <Street>234234 El Camino Real</Street>
        <City>Palo Alto, CA 94305</City>
        <Other what="picture" xml:link="simple" href="http://www.a.b/pict" />
      </DE>
    </Parameter>
    <Parameter>
      <DE repname="sdt" type="dateTime.iso8601tz" name="Date and Time of Submission">19990528T08:24:45+08</DE>
    </Parameter>

Page 27: XML and CHAIMS Dorothea Beringer


CPAM in XML: Parameters (2)

    <Parameter>
      <DE repname="grade" type="number">3.24</DE>
    </Parameter>
  </INVOKE_request>
</CCIS:Message>

• datatype: according to XML-Data (or subset of it)

• type: simple, list, opaque, opaqueXML, link (to file that contains one or more parameters in above syntax)

• each top-level parameter has a <Parameter> mark-up

• <DE> according to repository, data structure can be exposed down to a certain level

• at any level the opaque part of the parameter can start, if there is an opaque structure at all
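A wrapper receiving such a message might turn the exposed part of each parameter into native values and leave the opaque part untouched. The sketch below shows one way to do that in Python. Several things are assumptions of this sketch rather than part of the CPAM design: the namespace prefix is dropped for brevity, the grade element is written with type="simple" and datatype="number", and the CONVERTERS table and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGE = """\
<Message version="0.1">
  <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
    <Parameter repname="persdat" type="list" fullname="Personal Data">
      <DE repname="name" type="list" fullname="All names of person">
        <DE repname="firstname" type="simple" datatype="string">Michelle</DE>
        <DE repname="lastname" type="simple" datatype="string">Smith</DE>
      </DE>
      <DE repname="address" type="opaqueXML" fullname="Address of Person">
        <Street>234234 El Camino Real</Street>
        <City>Palo Alto, CA 94305</City>
      </DE>
    </Parameter>
    <Parameter>
      <DE repname="grade" type="simple" datatype="number">3.24</DE>
    </Parameter>
  </INVOKE_request>
</Message>
"""

# Hypothetical mapping from datatype names to Python constructors.
CONVERTERS = {"string": str, "number": float}


def to_python(de):
    """Convert the exposed structure; keep opaque parts as XML elements."""
    kind = de.get("type")
    if kind == "list":
        return {child.get("repname"): to_python(child) for child in de}
    if kind == "simple":
        return CONVERTERS[de.get("datatype", "string")](de.text)
    return de  # opaque or opaqueXML: hand the subtree on unparsed


def parameters(request):
    """Collect all top-level parameters of a request by repname."""
    result = {}
    for param in request.findall("Parameter"):
        if param.get("repname") is not None:
            # The <Parameter> element itself carries the data element.
            result[param.get("repname")] = to_python(param)
        else:
            for de in param:
                result[de.get("repname")] = to_python(de)
    return result


request = ET.fromstring(MESSAGE).find("INVOKE_request")
values = parameters(request)
print(values["persdat"]["name"], values["grade"])
```

The address subtree stays an XML element, which matches the idea that the opaque part of a parameter can start at any level.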

Page 28: XML and CHAIMS Dorothea Beringer


Parameter table in Wrapper

Each parameter is either:

» native

» XML in string form

» XML in DOM

For long data:

» empty data value plus XLink into a file; when sending data, XLink has to be replaced by actual data???

Direct dataflow:

» Keeping link in XML message? Referencing unique URL? Additional expiration flag? Until then no expiration allowed and other megamodule can get data?

» Adding to extract request: "links allowed"

Page 29: XML and CHAIMS Dorothea Beringer


Incremental Extract...

Presetting of parameters:

» only for highest level parameters

Extract/Examine:

» only highest level parameters?: easy

» or also individual DEs?: tricky, maybe using XPointers or XQL