XML and KM Powering Information and Retrieval

for the Semantic Web

Frank Cervone

Assistant University Librarian for Information Technology,

Northwestern University

[email protected]

Darlene Fichter

Data Library Coordinator,

University of Saskatchewan Library

darlene.fichter@usask.ca

Introductions

• Who are you?

• Where do you work?

• What is your experience with KM?

• What is your interest in XML?

Outline

• Semantic Web and KM

• What is XML?

• SGML & HTML - where do they fit?

• XML - Structure and Elements

• XML Applications
  – Integration of disparate content
    • News
  – Expertise profiling
  – Enterprise solutions

Semantic Web

“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Tim Berners-Lee and others

One Goal

Support elaborate, precise searches by integrating and utilizing all relevant sources of information / relationships.

Illustration from Scientific American May 1, 2001

Is XML a magical fix?

• Not likely.

• It does not magically integrate redundant data versions

• We’re unlikely to replace existing systems with a single, common, shared version of integrated data just for this reason

• But, if used correctly, XML can help

Harness the Power of Semantics

• If we wish to harness this power, then we need to
  – Understand and resolve the different words and meanings we use to refer to the same things
  – Consider ways and means of defining standard terminology and establishing agreed-upon meaning, usually through standard metadata
  – Be able to use XML messaging between applications and transformations

Pieces

XML – Codification of Knowledge

Knowledge Representation

In order for the “idea” to become a reality computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.

Why talk about the semantic web?

• Many of the “information intensive” processes of KM are facing the same challenge
  – Capture – formalize existing knowledge
  – Select and assess relevance, value ..
  – Store – in repository with schema
  – Share – distribute based on interest and work
  – Apply – retrieve, use in daily work
  – Create new knowledge

Beckman, T. Eight stage process of KM

XML & KM – What’s the connection?

• Many KM activities have nothing to do with technology

• In some KM activities, technology is a key enabler or component
  – in these cases XML is often under the hood
  – Knowing about XML means we can exploit the opportunities and see the limitations

XML Overview

• Structured data interchange
  – A common syntax for expressing structure in data

• Designed to account for “unstructured” data
  – documents

• Inherently conveys meaning/structure

• Content and display separate from structure

• Delivered via standard text files

XML in 7 bullets

• New, but not that new

• Structured data in a text file via markup

• Self-describing information

• Looks like HTML but isn't

• Verbose text, isn't meant to be read

• License-free, platform-independent and well-supported

• A family of technologies

(parts adapted from Bert Bos, http://www.si.uniovi.es/mirror/www.w3.org/XML/1999/XML-in-10-points)

Driving Forces for XML Adoption

• Internationalized media-independent electronic publishing

• Definition of platform-independent protocols for the exchange of data
  – electronic commerce
  – knowledge harvesting

• Information delivery to user agents
  – automatic processing after receipt

Benefits of Adoption

• Easier to develop software
  – handle specialized information distributed over the Web

• Processing information using lighter-weight software

• Allows greater end-user control of information display
  – style sheets

• Metadata for resource discovery

The *ML family

• SGML

• HTML

• XML

From World Wide Web Consortium note W3C Data Formats, by Tim Berners-Lee.

SGML

• Designed for documents

• Very powerful

• Very complicated

• “Well defined” = strict rules

• Rigid - not very extensible

• Inappropriate for wide-spread use

HTML

• Simple, general-purpose document markup language

• Simple hyperlinking

• Designed for collaborative authoring

• Combined authoring and viewing roles

HTML Evolution

• Started with simple document description
  – Few tags designed for structuring documents

• Quickly evolved
  – forms
  – images
  – tables
  – frames
  – fonts

HTML shortcomings

• Not easily extensible
  – HTML standards change too slowly
  – Browser-specific tags ("extensions")
  – Totally geared toward document display

• Limited data formatting
  – mathematics

• Can't mark up data in any structurally meaningful way

Why can’t HTML be used for information exchange?

• HTML markup provides no inherent method of knowing what the information is about

• Browser paradigm is too constraining

• Metadata schemes are deficient
  – Search engines return far too many hits

• Can't relate information items (pages) to one another

• One-way linking is somewhat limited

How HTML confuses content and presentation

• <h1>…<h6>

• <br>

• <p></p>

• <center>

• <table>

Example - content and presentation mixture in HTML

<HTML>
  <BODY BGCOLOR=#FFFFFF>
    <H1>005.72 M849et2001</H1>
    <I>Enterprise application integration with XML and Java</I>
    <BR>
    Upper Saddle River, NJ : Prentice Hall PTR, 2001
  </BODY>
</HTML>

But what does it mean?

XML represents structure, not presentation

<!-- The attribute names "tag" and "code" are assumed; the slide showed only their values. -->
<marc>
  <field tag="245" indicator_1="1" indicator_2="0">
    <subfield code="a">Enterprise application integration with XML and Java</subfield>
    <subfield code="c">J.P. Morganthal, with Bill la Forge</subfield>
  </field>
  <field tag="260">
    <subfield code="a">Upper Saddle River, NJ</subfield>
    <subfield code="b">Prentice Hall PTR</subfield>
    <subfield code="c">2001</subfield>
  </field>
</marc>
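
Because the record is marked up by structure rather than by appearance, a program can ask for the title or the publisher directly. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The attribute names `tag` and `code` are assumptions added so the record is well-formed XML, and the helper `marc_subfield` is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names "tag" and "code"; the slide gives only the values.
RECORD = """
<marc>
  <field tag="245" indicator_1="1" indicator_2="0">
    <subfield code="a">Enterprise application integration with XML and Java</subfield>
    <subfield code="c">J.P. Morganthal, with Bill la Forge</subfield>
  </field>
  <field tag="260">
    <subfield code="a">Upper Saddle River, NJ</subfield>
    <subfield code="b">Prentice Hall PTR</subfield>
    <subfield code="c">2001</subfield>
  </field>
</marc>
"""

def marc_subfield(root, tag, code):
    """Return the text of one subfield, or None if it is absent."""
    node = root.find(f"field[@tag='{tag}']/subfield[@code='{code}']")
    return None if node is None else node.text

root = ET.fromstring(RECORD)
title = marc_subfield(root, "245", "a")
publisher = marc_subfield(root, "260", "b")
year = marc_subfield(root, "260", "c")
print(title, "/", publisher, "/", year)
```

The same question cannot be asked of the HTML version, where the title is only "the italic text".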

XML is hierarchical

MARC
  245 title
    a  Enterprise Application Integration with XML and Java
    c  J.P. Morganthal, with Bill la Forge
  260 publisher
    a  Upper Saddle River, NJ
    b  Prentice Hall PTR
    c  2001

Nesting

<bigdoll>
  <mediumdoll>
    <littledoll>
      rosette theme <littlestdoll/>
    </littledoll>
  </mediumdoll>
</bigdoll>
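
The nesting can be made visible by walking the parsed tree and recording how deep each element sits. This is a small Python sketch, assuming the second `mediumdoll` tag on the slide is meant as the closing tag `</mediumdoll>`; the function name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOLLS = """
<bigdoll>
  <mediumdoll>
    <littledoll>
      rosette theme <littlestdoll/>
    </littledoll>
  </mediumdoll>
</bigdoll>
"""

def outline(element, depth=0):
    """Return (depth, tag) pairs in document order."""
    pairs = [(depth, element.tag)]
    for child in element:
        pairs.extend(outline(child, depth + 1))
    return pairs

nesting = outline(ET.fromstring(DOLLS))
for depth, tag in nesting:
    # Indent each tag by its depth to show the hierarchy.
    print("  " * depth + tag)
```

Each element is fully contained in its parent, which is why the walk can be written as a simple recursion.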

Elements, Attributes, and Content

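<!-- The attribute names "tag" and "code" are assumed; the slide showed only their values. -->
<field tag="245" indicator_1="1" indicator_2="0">
  <subfield code="a">Enterprise application integration with XML and Java</subfield>
  <subfield code="c">J.P. Morganthal, with Bill la Forge</subfield>
</field>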
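
To tell the three parts apart, the sketch below parses the fragment in Python and reads back the element name, its attributes, and the content of each child. The attribute names `tag` and `code` are assumptions (the slide shows only values), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# "tag" and "code" are assumed attribute names.
FIELD = """
<field tag="245" indicator_1="1" indicator_2="0">
  <subfield code="a">Enterprise application integration with XML and Java</subfield>
  <subfield code="c">J.P. Morganthal, with Bill la Forge</subfield>
</field>
"""

field = ET.fromstring(FIELD)

element_name = field.tag            # the element: "field"
attributes = dict(field.attrib)     # the attributes: name/value pairs on the start tag
contents = [(sub.get("code"), sub.text) for sub in field]  # content of each child

print(element_name, attributes)
for code, text in contents:
    print(code, "->", text)
```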

DOM – Document Object Model

• DOM – a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents

• Built into web browsers and servers
  – Used by web browsers for dynamic display capabilities
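
As a rough illustration of "access and update", the sketch below uses `xml.dom.minidom`, the small DOM implementation in Python's standard library. The sample document and the new text values are made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>Discourse Analysis</title></book>")

# Access: walk from the document to the text inside <title>.
title = doc.getElementsByTagName("title")[0]
original = title.firstChild.data

# Update content: change the data of the existing text node.
title.firstChild.data = "Discourse Analysis, 2nd ed."

# Update structure: create a new element and attach it to the root.
subtitle = doc.createElement("subtitle")
subtitle.appendChild(doc.createTextNode("An Introduction"))
doc.documentElement.appendChild(subtitle)

updated = doc.documentElement.toxml()
print(updated)
```

The same method names (`getElementsByTagName`, `createElement`, `appendChild`) appear in browser scripting, which is the point of a language-neutral interface.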

Document Type Definition (DTD)

• A set of syntax rules for creating tags

• Defines
  – What tags can be used
  – The order they should appear in
  – Which tags can be nested
  – Which tags have attributes

• Can be part of an XML document
  – Typically defined externally

DTD and Elements

<!DOCTYPE BOOK [
  <!ELEMENT BOOK (AUTHOR?, TITLE, PUBLISHER+, SUBJECT*)>
  <!ELEMENT AUTHOR (#PCDATA)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT PUBLISHER (#PCDATA)>
  <!ELEMENT SUBJECT (#PCDATA)>
]>
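
The content model for BOOK reads: an optional AUTHOR, exactly one TITLE, one or more PUBLISHER, then zero or more SUBJECT. The occurrence markers `?`, `+` and `*` behave like their regular-expression counterparts, so the idea can be sketched in Python by matching the sequence of child names against a pattern. This is only an illustration of the notation, not a DTD validator (Python's standard library does not validate against DTDs); `matches_book_model` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# (AUTHOR?, TITLE, PUBLISHER+, SUBJECT*) rewritten as a pattern over
# space-terminated child names.
BOOK_MODEL = re.compile(r"(AUTHOR )?TITLE (PUBLISHER )+(SUBJECT )*")

def matches_book_model(xml_text):
    """Check the order and count of BOOK's children against the model."""
    book = ET.fromstring(xml_text)
    if book.tag != "BOOK":
        return False
    children = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(children) is not None

ok = matches_book_model(
    "<BOOK><TITLE>t</TITLE><PUBLISHER>p</PUBLISHER></BOOK>")
full = matches_book_model(
    "<BOOK><AUTHOR>a</AUTHOR><TITLE>t</TITLE><PUBLISHER>p</PUBLISHER>"
    "<PUBLISHER>q</PUBLISHER><SUBJECT>s</SUBJECT></BOOK>")
no_publisher = matches_book_model("<BOOK><TITLE>t</TITLE></BOOK>")
wrong_order = matches_book_model(
    "<BOOK><TITLE>t</TITLE><AUTHOR>a</AUTHOR><PUBLISHER>p</PUBLISHER></BOOK>")
print(ok, full, no_publisher, wrong_order)
```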

Attributes

<!ELEMENT PERSON EMPTY>
<!ATTLIST PERSON person_id ID #REQUIRED>
<!ATTLIST PERSON sex (M | F) #IMPLIED>
<!ATTLIST PERSON status (employee | trainee) "employee">
<!ATTLIST PERSON company CDATA #FIXED "XYZ">

Schemas

• Introduces a mechanism for strong typing
  – Allows a schema to be directly imported into a database to create a table

• Standardized NULL representation

• Key representation

Well-formed and valid

• Well-formed
  – Conforms to the general rules of XML syntax, which are very rigorous
  – Example – a tag must always be ended
    • <title>Discourse Analysis</title>
    • <subtitle/>

• Valid
  – Documents that conform to the specific DTD in use
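
Well-formedness is something any XML parser checks. A minimal Python sketch, assuming only the standard library: `is_well_formed` (a hypothetical helper) simply reports whether the parser accepts the text. It says nothing about validity, because `xml.etree.ElementTree` does not check documents against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Every tag is ended, including the empty <subtitle/>.
closed = is_well_formed(
    "<book><title>Discourse Analysis</title><subtitle/></book>")
# <title> is never ended.
unclosed = is_well_formed("<book><title>Discourse Analysis</book>")
# Elements overlap instead of nesting.
overlapping = is_well_formed("<b><i>text</b></i>")

print(closed, unclosed, overlapping)
```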

XML-Link and XML Pointer

• Open set of linking elements

• Non-directional
  – arbitrary
  – non-hierarchical

• XML Pointer
  – Enables addressing any part of a text
    • A more powerful HTML “anchor” tag

• XML-Link
  – Enables attaching a behavior to a link
  – Extended links, similar to a web ring

XML-Link Example

<related-URL-group>search
  <related-URL HREF="altavista.xml"/>
  <related-URL HREF="webbrain.xml"/>
  <related-URL HREF="yahoo.xml"/>
</related-URL-group>

<!ELEMENT related-URL-group (#PCDATA | related-URL)*>
<!ATTLIST related-URL-group
  XML-Link     CDATA #FIXED "EXTENDED"
  INLINE       CDATA #FIXED "TRUE"
  CONTENT-ROLE CDATA #FIXED "RT"
>

Displaying XML information in the browser

• XML parser built in
  – Relates data stream to DTD and style sheet

• Style Sheets
  – Only method for formatting XML data for display
    • Similar to HTML CSS
  – More powerful

• XSLT
  – Processing language that allows for transformation of data presentation

XHTML

• “Next generation” HTML

• HTML that conforms to XML standards

• Will eventually support integration with other XML applications

• Device independent web-access

XHTML Example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Bare bones example</title>
  </head>
  <body>
    <p>
      <a href="http://validator.w3.org/check/referer"> validate </a>
    </p>
  </body>
</html>

HTML 4 - XHTML Major Differences

• All related to “well formedness”
  – Tag/attributes must be in lower-case
  – Elements must nest, no overlap
  – All non-empty elements must be closed
  – All empty elements must be terminated
  – Attribute values must be quoted
  – Attributes cannot be minimized
  – Scripts should be downloaded from server

XML Life Cycle

• Authoring

• Presentation

• Search and Retrieval

• Integration

The Big Picture

Just “Add Water & Stir”

XML (document or database)

XSLT style sheet

XSLT Processor (XML Parser)

Browser (XML Parser)

Authoring Tools

• Editors (getting the content in)
  – XML and XSLT Editors
    • XML Spy
    • XML Notepad
    • XMetal
    • Xeena
  – Word processors
    • WordPerfect
  – Content Management Systems

XML Spy

• Structured/document editor
  – XML
  – DTD
  – schemas (DCD, XDR, BizTalk, XSD)
  – XSLT

• Views for:
  – Structured editing (grid view, table view)
  – Document editing (WYSIWYG)

• Full Unicode support
  – MSXML3 is used by default, but can be changed

XML Notepad

• Quick and dirty editor for Windows

• Doesn't use DTD to guide editing
  – if present, however, validates it on document loading

XMetal

• Professional, full-featured XML/SGML editing tool
  – word processor-like view
  – source view
  – tag view

• SGML or XML DTDs
  – context-sensitive lists of allowed elements and attributes
  – supports CALS tables, DOM, CSS, and HTML

• Integrated browser preview for XML documents.

Xeena

• Loads DTD and provides tree-view syntax directed editing

• Aware of the DTD grammar
  – Makes only authorized elements icons sensitive
  – Ensures that all documents generated are valid according to the given DTD

WordPerfect

• Word processor with advanced support for authoring XML and SGML documents in a WYSIWYG environment

• Includes
  – Wizards
  – Automatic element insertion
  – Automatic generation of documents.

• The DTD, layout information, and mapping files are incorporated into a single WordPerfect template.

Content Management Systems

• Many CM system repositories use XML under the hood for tagging and storing information

• Or can “speak” XML – export as XML to allow integration with other applications

• Open any trade magazine and see the standard vendor names proclaim their support for XML

• To the document creator, XML is “invisible”

XML Conversion Tools

Examples:

• Logictran RTF Converter

• HTML Tidy
  – Free Windows program
  – Converts HTML to XHTML or XML

Logictran RTF Converter

• Converts Word and RTF documents to HTML, XML, SGML

• The converter allows you to create output for any DTD.

• You can generate HTML, XHTML, OEB and Docbook.

XSLT Processors

• Means of converting files between XML dialects and other formats
  – MSXML built into Internet Explorer
    • http://msdn.microsoft.com/xml
  – Xalan
    • http://xml.apache.org/xalan-j/index.html

XML Parsers

Examples

• Expat
  – Written in C (ported to other languages), used by LIBWWW, Apache, …

• XML4J
  – from alphaWorks, in Java, based on Apache Xerces, supports DOM and SAX

• Many other parsers

Servers

• Apache XML
  – xml.apache.org
  – built-in Xerces XML parser, Xalan XSLT processor

Browsers

• Internet Explorer 6
  – XML support is fairly extensive
  – Namespaces are supported
  – Supports Style sheets in CSS as well as XSLT 1.0
  – Parser is still an issue

• Netscape 6.1
  – supports HTML 4.0, XML, CSS, DOM, namespaces, simple Xlink
  – Does NOT support XSLT

• Opera
  – supports XML

XML Standards & Applications

• Many activities where XML has a role

• OASIS has an extensive list of applications
  – RSS (news headlines)
  – MathML
  – SMIL
  – DocBook

XML Standards – Multiplying Like Rabbits

• Software applications (transactions, interchange)

• Publishing

Software Applications

• Office tools and groupware

• Decision support systems

• Functional/transactional systems for HR, CRM ..

• Intelligent systems (ES, IPSS)

• User support

Publishing

• Digital rights (EBX,…)

• DocBook, e-book, TEI

• News (RSS, ICE, nift, NewsML)

• Special subject area formats (MathML, ChemML, CellML, GeneXML)

Publishing: News

• Web site news

• Syndicated news

• Headlines

• Full text

KM applications

• Integrating internal, external news, creating auto-categorization of news, adding items to the news based on new additions to the repository, user profiling

ICE

RSS

NewsML

nift

RSS (Rich Site Summary)

CRM News www.moreover.com

• Web news format

• Simple application

• Take a look at the bits and pieces

RSS – Why?

• The Need
  – Quick, easy, and consistent announcements pushed out to other sites
  – Incorporate news and other information feeds on a site

How it works

Before RSS

• No standard

• Everyone put up what was new and described it differently

• Special one-off programs to create parsers and screen scrapers

The Result

• > 1700 sites sharing news

• Many sites re-posting the headlines

• Examples:
  • myuserland.com
  • www.moreover.com
  • xmlTree - directory of content

RSS Syntax

• RSS file has two major placeholders for data: channel and items.

Channel Element

• The channel element must contain the following:

  • title or name of the channel,
  • short description of the channel,
  • link to the web site of the channel, and
  • the language in which the web site is written.

• Also, numerous optional elements can be included with the channel, such as copyright, webmaster, publication date and so on.
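
A feed producer or aggregator could check those four required elements before accepting a channel. The following Python sketch assumes the element names `title`, `description`, `link` and `language` used in RSS 0.91; the helper `missing_channel_elements` and both sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("title", "description", "link", "language")

def missing_channel_elements(rss_text):
    """List the required channel children that are absent or empty."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return list(REQUIRED)
    return [name for name in REQUIRED
            if not (channel.findtext(name) or "").strip()]

# Hypothetical sample feeds, not taken from a real site.
complete = """<rss version="0.91"><channel>
  <title>Example channel</title>
  <description>Sample headlines</description>
  <link>http://www.example.com/</link>
  <language>en-us</language>
</channel></rss>"""

incomplete = """<rss version="0.91"><channel>
  <title>Example channel</title>
  <link>http://www.example.com/</link>
</channel></rss>"""

print(missing_channel_elements(complete))
print(missing_channel_elements(incomplete))
```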

Item Element

• An RSS file can have up to 15 item elements. Item elements are used to store the headlines and are the meat of the document. Item elements have the following elements:
  • title
  • link
  • description

RSS Code

• First line contains an XML declaration:

<?xml version="1.0"?>

• The next item is the DTD identifier:

<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

RSS Statement

• Next, the rss element
  – must specify the version attribute.
  – may contain an encoding attribute
    • the default is UTF-8

<rss version="0.91" encoding="ISO_8859-1">

Channel Definition

• Contains a single channel element.
  – Title, description, link to channel’s web site, language, one or more item elements, lots of optional elements

<channel>
  <title>moreover... US politics news</title>
  <link>http://www.moreover.com</link>
  <description>US politics news - news headlines from around the web, refreshed every 15 minutes</description>
  <language>en-us</language>

Item Elements

• Up to 15 item elements

<item>
  <title>'Author Unknown' by Don Foster</title>
  <link>http://www.salon.com/books/feature/2000/10/30/pbacks/index.html</link>
  <description>Salon Nov 2 2000 6:51AM</description>
</item>
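
Putting the pieces together, a site that re-posts headlines only needs to parse the feed and pull out each item's title and link. The sketch below assembles the channel and item fragments from these slides into one feed and reads it with Python's standard library. The DOCTYPE line is left out so the parser has no external DTD to deal with, and `read_headlines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>moreover... US politics news</title>
    <link>http://www.moreover.com</link>
    <description>US politics news - news headlines from around the web, refreshed every 15 minutes</description>
    <language>en-us</language>
    <item>
      <title>'Author Unknown' by Don Foster</title>
      <link>http://www.salon.com/books/feature/2000/10/30/pbacks/index.html</link>
      <description>Salon Nov 2 2000 6:51AM</description>
    </item>
  </channel>
</rss>
"""

def read_headlines(feed_text):
    """Return the channel title and a list of (title, link) pairs."""
    channel = ET.fromstring(feed_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items

channel_title, headlines = read_headlines(FEED)
print(channel_title)
for title, link in headlines:
    print("-", title, link)
```

Because every feed uses the same element names, the same few lines work for any RSS source, which is what replaced the one-off screen scrapers.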

From Simple Documents to Complex

• Hierarchical

• Many objects and elements

• Many “namespaces”

Namespaces

• A single XML document may contain elements and attributes that are defined for and used by two or more XML-based languages without conflict or ambiguity

Example

<!-- <record> is an assumed wrapper element; namespace declarations must sit on an element. -->
<record xmlns:book="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Working Knowledge</dc:title>
  <dc:description>Overview and case studies of knowledge management</dc:description>
  <book:chapter>5. Knowledge Transfer … </book:chapter>
</record>
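
A namespace-aware parser keeps the two vocabularies apart by expanding each prefix to its URI. The Python sketch below shows this with `xml.etree.ElementTree`. It assumes the declarations are carried on a wrapper element, here called `record` (a hypothetical name), and it shortens the chapter text for the example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the URIs are the ones on the slide.
NS = {
    "book": "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# <record> is a hypothetical wrapper element that carries the declarations.
DOC = """
<record xmlns:book="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Working Knowledge</dc:title>
  <dc:description>Overview and case studies of knowledge management</dc:description>
  <book:chapter>5. Knowledge Transfer</book:chapter>
</record>
"""

root = ET.fromstring(DOC)
dc_title = root.findtext("dc:title", namespaces=NS)
chapter = root.findtext("book:chapter", namespaces=NS)

# An unprefixed lookup finds nothing: plain "title" is a different name.
plain_title = root.find("title")

# Internally the parser stores the expanded name, URI plus local part.
expanded_tag = root.find("dc:title", NS).tag
print(dc_title, "|", chapter, "|", expanded_tag)
```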

Page 77: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

OEB - Open E-Book

• In September 1999, the group published the Open E-Book 1.0 Publication Structure

• The Open E-Book standard is essentially XHTML: a clean version of HTML 4.0, along with support for CSS.

• www.openebook.org

Page 78: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

RDF - Resource Description Framework

• Framework for metadata

• Interoperability of information exchange between applications

• Applications:

– Resource discovery

– Knowledge sharing and exchange

– Content rating

– Intellectual property rights

Page 79: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

RDF Example

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description rdf:about="http://your.url"
                   dc:creator="Frank Cervone"
                   dc:title="My RDF document"
                   dc:description="Exciting RDF Stuff."
                   dc:date="2000-11-10" />
</rdf:RDF>
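A minimal sketch of how an application might read the metadata in this description as subject, property, value statements. It uses only Python's standard `xml.etree.ElementTree` rather than a dedicated RDF library, and the `describe` helper is a hypothetical name for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.0/"

RDF_DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://your.url"
      dc:creator="Frank Cervone"
      dc:title="My RDF document"
      dc:description="Exciting RDF Stuff."
      dc:date="2000-11-10" />
</rdf:RDF>"""


def describe(rdf_text):
    """Return (subject, property, value) tuples for Dublin Core attributes."""
    root = ET.fromstring(rdf_text)
    statements = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # ElementTree stores qualified names as "{namespace-uri}local-name".
        subject = desc.get(f"{{{RDF_NS}}}about")
        for name, value in desc.attrib.items():
            if name.startswith(f"{{{DC_NS}}}"):
                local = name[len(DC_NS) + 2:]  # drop the "{uri}" part
                statements.append((subject, "dc:" + local, value))
    return statements


statements = describe(RDF_DOC)
for statement in statements:
    print(statement)
```

Each tuple says one thing about one resource, which is what lets independent applications exchange and merge metadata.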

Page 80: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Emerging Standards For KM

• XTM

• OPML

• RFML

• FLBC

• ebXML

Page 81: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

XTM: Topic Maps

• Used to organize information into knowledge bases

• Topic maps are a new ISO standard for describing knowledge structures and associating them with information resources

• “GPS” for information

• http://www.topicmaps.org/xtm/index.html

“A book without an index is like a country without a map”
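The idea of a knowledge structure linked to information resources can be sketched as a small in-memory model. This is an illustration only, not XTM syntax: the `Topic` and `TopicMap` classes, the topic identifiers, and the example.org URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Occurrences: addresses of resources that are relevant to the topic.
    occurrences: list = field(default_factory=list)


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)        # topic id -> Topic
    associations: list = field(default_factory=list)  # (kind, id_a, id_b)

    def add_topic(self, topic_id, name, *occurrences):
        self.topics[topic_id] = Topic(name, list(occurrences))

    def associate(self, kind, id_a, id_b):
        self.associations.append((kind, id_a, id_b))

    def related(self, topic_id):
        """Return (kind, other topic name) for every association of a topic."""
        found = []
        for kind, id_a, id_b in self.associations:
            if id_a == topic_id:
                found.append((kind, self.topics[id_b].name))
            elif id_b == topic_id:
                found.append((kind, self.topics[id_a].name))
        return found


tm = TopicMap()
tm.add_topic("km", "Knowledge management", "http://example.org/km-overview")
tm.add_topic("xml", "XML")
tm.associate("is-supported-by", "km", "xml")

print(tm.related("km"))
print(tm.topics["km"].occurrences)
```

The map sits above the documents, like a book index: the resources stay where they are and only their addresses are recorded.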

Page 82: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

OPML

• Outline Processor Markup Language

– Outline-structured information

• Used for data that is easily browsed and edited

– Specifications

– Legal briefs

– Product plans

– Presentations

– Screenplays

– Directories
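A minimal sketch of outline-structured data written as OPML, built with Python's standard `xml.etree.ElementTree`. The `build_opml` and `_add_outlines` helpers and the sample product plan are assumptions for illustration; the sketch relies only on the basic OPML layout of a `head` with a `title` and a `body` of nested `outline` elements that carry a `text` attribute.

```python
import xml.etree.ElementTree as ET


def _add_outlines(parent, nodes):
    """Recursively append (text, children) pairs as nested <outline> elements."""
    for text, children in nodes:
        node = ET.SubElement(parent, "outline", text=text)
        _add_outlines(node, children)


def build_opml(title, nodes):
    """Build an OPML tree: <head> holds the title, <body> holds the outline."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    _add_outlines(body, nodes)
    return opml


# Hypothetical product plan expressed as nested (text, children) pairs.
PLAN = [
    ("Product plan", [
        ("Goals", []),
        ("Milestones", [("Beta release", [])]),
    ]),
]

opml = build_opml("Plan", PLAN)
print(ET.tostring(opml, encoding="unicode"))
```

Since every level uses the same `outline` element, an outliner can expand, collapse, or rearrange any branch without knowing what the content means.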

Page 83: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

RFML

• Relational-Functional Markup Language

• Used to define relationships and functions among data elements

– Tables within relational databases

– Relational views

Page 84: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

FLBC

• Formal Language for Business Communication

– Automated communication

– Conversation management

– Dialog management

– Based on speech act theory

• Formally defined message types

• Broad range of message types

• Defined in terms of intentions

• Clear delineation between message type and content
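The separation of message type from content can be sketched as follows. This is not FLBC syntax: the `Force` values, the `Message` fields, and the `route` function are hypothetical names chosen to illustrate the speech-act idea that the intention is carried apart from the thing it is about.

```python
from dataclasses import dataclass
from enum import Enum


class Force(Enum):
    """Message types, defined by the sender's intention."""
    ASSERT = "assert"
    REQUEST = "request"
    PROMISE = "promise"


@dataclass(frozen=True)
class Message:
    speaker: str
    hearer: str
    force: Force   # what the speaker intends by sending the message
    content: str   # what the message is about


def route(message):
    """Choose handling from the message type alone, never from the content."""
    if message.force is Force.REQUEST:
        return f"{message.hearer} must respond to: {message.content}"
    if message.force is Force.PROMISE:
        return f"{message.speaker} is committed to: {message.content}"
    return f"{message.hearer} records as stated: {message.content}"


order = Message("buyer", "seller", Force.REQUEST, "ship 100 units by Friday")
reply = Message("seller", "buyer", Force.PROMISE, "ship 100 units by Friday")

print(route(order))
print(route(reply))
```

The two messages share the same content and differ only in type, which is the distinction that lets software manage a conversation automatically.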

Page 85: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

XML in Use

• Portals

• Content management & syndication

• Content management: industry sector

• Integration

• Analytical/decision making

• Search and retrieval

• Visualization

Page 86: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Applications: Portals

• Portals are an obvious place for XML to be used: most integrate diverse data sources.

• Examples:

– Hummingbird’s Enterprise Portal Suite

• Allows XML-based third-party application integration for a variety of scripting languages

• Basically, “write with your own tools/platform” and exchange data with XML

– DataChannel, Sybase Enterprise Portal, Citrix XPS,

Page 87: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Content Production & Syndication

• Interwoven

– Intranet/extranet content management and authoring based on intelligent business rules, profiling, etc.

– The newest component of Interwoven’s suite of tools focuses on content distribution and uses XML.

– OpenSyndicate uses an XML repository, which allows content to be stored as objects and reused for multiple projects.

Page 88: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Open Syndicate

Page 89: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Content: Industry Specific Solutions

• Ringtail Solutions

– Suite of litigation support and KM modules for legal practitioners

Page 90: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Integration

• InfoShark

– Used to integrate data from a host of services and programs, handling from hundreds to thousands of transactions each day

– Automates data exchange between Oracle, IBM DB2, and Microsoft SQL Server for use over the Internet, intranets, and extranets

– Being used by Montgomery County for eGov services of all types

Page 91: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Analytical/Decision Making

• Spotfire

– DecisionSite 6.2 is powered by an XML-based application manager that provides tools, guides, and resources for genomics, chemistry, and manufacturing

Page 92: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Visualization

• Antarcti.ca

– Visual mapping technology provides enterprises with data search and discovery,

Page 93: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Not a Silver Bullet

“XML is not the answer to all the world’s problems—it creates new problems, that are awfully damn interesting to solve.”

Simon St. Laurent, author of XML: A Primer, on the xml-dev mailing list

Page 94: XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern

Thank you!

• Frank Cervone, Assistant University Librarian for Information Technology, Northwestern University

[email protected]

• Darlene Fichter, Data Library Coordinator, University of Saskatchewan Library

darlene.fichter@usask.ca