Strategies for subject navigation of linked Web sites using RDF topic maps

Carol Jean Godby

Devon Smith

OCLC Online Computer Library Center

Knowledge Technologies 2002 – Seattle, WA

Complex Web sites

Many institutions are struggling to solve problems with their official Web sites.

But:

The contents constantly change.

The editors can’t exercise sufficient control.

One result: an institution’s major presence on the Web is difficult to navigate.

The Semantic Web

Tim Berners-Lee’s vision:

“The current Web has documents for people, not computers. By augmenting Web pages with data designed for automated processing, users will transform the Web into the Semantic Web.”

“Computers will find the meaning of semantic data by following hyperlinks to definitions of key terms and rules for reasoning about them logically.”

The Semantic Web: An Architecture

[Layer diagram, from bottom to top:]

Unicode, URI

XML + XML namespaces + XML Schema

RDF + RDF Schema

Ontology vocabulary

Logic

Proof

Trust

Alongside the layers: Digital signature

Side labels: Self-describing documents, Data, Data, Rules

Source: Tim Berners-Lee

The promise of the Semantic Web

A common data model

Conceptual links

Limited inferences

Our demo: goals

Represent subject/topic information obtained from different sources.

Demonstrate the value of hypothetical metadata-based navigation for a collection of related Web sites:

oclc.org

Portions of w3c.org

dublincore.org

Develop and evaluate the utility of Open Source prototyping tools based on RDF.

Some common topics

[Diagram: topics found across oclc.org, w3c.org, and dublincore.org]

Topics shown: digital library, xml, dublin core, xml namespace, xml schema, metadata, xml fragment, xml stylesheet, element node, dc element syntax, library automation, classification, traditional library, library users, library network, xml profile, schema processor, uri syntax

Sources of subject/topic metadata

HTML keywords

Subject lines in email messages

An index of library/information science terms

Terms extracted automatically from text using natural-language-processing algorithms
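The first source in the list, HTML keywords, can be read with standard-library tools. The following is a minimal sketch, not the scraper used in the demo (which the slides describe as Perl); the class name, function name, and sample page are all hypothetical.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "keywords":
            for term in attr_map.get("content", "").split(","):
                term = term.strip().lower()
                # Keep the first occurrence of each term, in page order.
                if term and term not in self.keywords:
                    self.keywords.append(term)


def extract_keywords(html_text):
    """Return the de-duplicated, lowercased keyword terms of one HTML page."""
    parser = KeywordMetaParser()
    parser.feed(html_text)
    return parser.keywords


# Hypothetical page; not taken from any of the harvested sites.
SAMPLE_PAGE = """
<html><head>
<title>Example</title>
<meta name="keywords" content="Dublin Core, metadata, XML Schema, metadata">
</head><body></body></html>
"""

print(extract_keywords(SAMPLE_PAGE))
```

Lowercasing and de-duplicating are assumptions made here so that terms from different sources can be compared; the slides do not say how the demo normalizes them.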

Some term relationships

Singular/Plural: library, libraries

Acronyms:
Standard Generalized Markup Language -- SGML
Library of Congress Subject Headings -- LCSH

Coordination:
library and information science -- library science, information science
information storage and retrieval -- information storage, information retrieval

Broad/Narrow:
computational linguistics -- linguistics
classification scheme -- classification

Type-of: library -- digital library, traditional library

Related: library -- library classification scheme, library automation
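Two of these relationships, acronyms and coordination, can be approximated from the surface form of a term alone. The sketch below is an illustration under that assumption, not the method used in the demo; both function names are hypothetical, and the coordination function only proposes candidate expansions that would still need checking against a vocabulary.

```python
def acronym_of(phrase):
    """Join the initials of the capitalized words, so words like 'of' are skipped."""
    return "".join(word[0] for word in phrase.split() if word[0].isupper())


def coordination_candidates(phrase):
    """Return the plausible expansions of an 'X and Y' phrase.

    Shared final word:  'a and b c'  -> ['a c', 'b c']
    Shared first word:  'a b and c'  -> ['a b', 'a c']
    A phrase that fits both patterns yields both candidates.
    """
    words = phrase.split()
    if "and" not in words:
        return []
    i = words.index("and")
    left, right = words[:i], words[i + 1:]
    candidates = []
    if left and len(right) > 1:
        # The last word is shared by both conjuncts.
        head = right[-1]
        candidates.append([" ".join(left + [head]), " ".join(right)])
    if len(left) > 1 and right:
        # The first word is shared by both conjuncts.
        first = left[0]
        candidates.append([" ".join(left), " ".join([first] + right)])
    return candidates


print(acronym_of("Library of Congress Subject Headings"))
print(coordination_candidates("library and information science"))
print(coordination_candidates("information storage and retrieval"))
```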

An RDF encoding

<Topic rdf:about="http://purl.org/rdf/topics/classification">
  <name>classification</name>
  <related_concepts rdf:resource="http://purl.org/rdf/topics/classification_codes"/>
  <related_concepts rdf:resource="http://purl.org/rdf/topics/classification_number"/>
  <types_of rdf:resource="http://purl.org/rdf/topics/automatic_classification"/>
  <types_of rdf:resource="http://purl.org/rdf/topics/library_classification"/>
  <coordinate rdf:resource="http://purl.org/rdf/topics/resource_discovery_and_classification"/>
  <coordinate rdf:resource="http://purl.org/rdf/topics/classification_and_knowledge"/>
</Topic>
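An encoding like this one can be generated from a term and its relationships. The sketch below uses Python's standard xml.etree.ElementTree; the vocabulary namespace (TM_NS), the helper names, and the rule that spaces in a term become underscores in its URI are assumptions, since the slides show neither the namespace declarations nor the URI-minting rule. ElementTree writes the elements with a tm: prefix rather than unprefixed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TM_NS = "http://example.org/topicmap#"  # hypothetical; not shown on the slides
TOPIC_BASE = "http://purl.org/rdf/topics/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("tm", TM_NS)


def topic_uri(term):
    """Mint a topic URI from a term (assumed rule: lowercase, spaces to underscores)."""
    return TOPIC_BASE + term.strip().lower().replace(" ", "_")


def topic_element(name, relations):
    """Build one Topic element from a name and a {property: [target terms]} mapping."""
    topic = ET.Element(f"{{{TM_NS}}}Topic", {f"{{{RDF_NS}}}about": topic_uri(name)})
    ET.SubElement(topic, f"{{{TM_NS}}}name").text = name
    for prop, targets in relations.items():
        for target in targets:
            ET.SubElement(
                topic,
                f"{{{TM_NS}}}{prop}",
                {f"{{{RDF_NS}}}resource": topic_uri(target)},
            )
    return topic


classification = topic_element("classification", {
    "related_concepts": ["classification codes", "classification number"],
    "types_of": ["automatic classification", "library classification"],
    "coordinate": ["resource discovery and classification",
                   "classification and knowledge"],
})
print(ET.tostring(classification, encoding="unicode"))
```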

Connected RDF encodings

<Topic rdf:about="http://purl.org/rdf/topics/resource_discovery">
  <name>resource discovery</name>
  <broad_concepts rdf:resource="http://purl.org/rdf/topics/resource"/>
</Topic>

<Topic rdf:about="http://purl.org/rdf/topics/resource">
  <name>resource</name>
  <related_concepts rdf:resource="http://purl.org/rdf/topics/resource_discovery"/>
  <types_of rdf:resource="http://purl.org/rdf/topics/resource_description_framework"/>
  <related rdf:resource="http://purl.org/rdf/topics/web_resource"/>
</Topic>
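Because each rdf:resource points at another topic's rdf:about URI, the encodings form a graph that navigation can follow. The sketch below parses the two topics above and walks the links breadth-first. The rdf:RDF wrapper, the vocabulary namespace (TM_NS), and both function names are assumptions added so that the fragment parses as a standalone document.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TM_NS = "http://example.org/topicmap#"  # hypothetical vocabulary namespace
BASE = "http://purl.org/rdf/topics/"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{TM_NS}">
  <Topic rdf:about="{BASE}resource_discovery">
    <name>resource discovery</name>
    <broad_concepts rdf:resource="{BASE}resource"/>
  </Topic>
  <Topic rdf:about="{BASE}resource">
    <name>resource</name>
    <related_concepts rdf:resource="{BASE}resource_discovery"/>
    <types_of rdf:resource="{BASE}resource_description_framework"/>
    <related rdf:resource="{BASE}web_resource"/>
  </Topic>
</rdf:RDF>"""


def load_links(xml_text):
    """Map each topic URI to a list of (property, target URI) pairs."""
    links = {}
    root = ET.fromstring(xml_text)
    for topic in root.findall(f"{{{TM_NS}}}Topic"):
        subject = topic.get(f"{{{RDF_NS}}}about")
        for child in topic:
            target = child.get(f"{{{RDF_NS}}}resource")
            if target is not None:  # skips literal children such as <name>
                prop = child.tag.split("}", 1)[1]
                links.setdefault(subject, []).append((prop, target))
    return links


def reachable(links, start):
    """Breadth-first walk over the links, returning URIs in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _prop, target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


links = load_links(DOC)
for uri in reachable(links, BASE + "resource_discovery"):
    print(uri[len(BASE):])
```

Starting from resource discovery, the walk reaches resource through the broad_concepts link and then the topics that resource points to.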

A graphical representation of relationships

[Graph of the topics encoded above. Nodes: classification, classification codes, automatic classification, resource discovery and classification, resource discovery, resource, resource description framework, rdf. Edge labels: Coordination, Broad/Narrow, Type_of, Related, Acronym.]

The philosophy of our system

Modular

Open Source

Project Web site accessible at: topicmap.oclc.org:5000

System architecture: 1

[Flow diagram: Normalized HTML data -> Extract terms -> Filter terms -> Structure terms -> RDF graph]

Term filters: using knowledge encoded in the text

Positive contexts for terms: study of, information about, professor of, department of

information science, metadata applications, data processing, automatic classification, computational linguistics, internet resources

Negative contexts for terms: very different things, few messages, good point, interesting example, appealing idea, small extension, terse document, simple kind
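One way to read these filters is as evidence for or against a candidate phrase. The sketch below is an interpretation, not the demo's implementation: it assumes a positive context is a cue phrase immediately to the left of the candidate, and that the signal in the negative examples is their leading modifier ("very", "few", "good", and so on). The function name and the three-way score are hypothetical.

```python
# Cue phrases listed on the slide as positive contexts.
POSITIVE_CONTEXTS = ["study of", "information about", "professor of", "department of"]

# Assumption: the first word of each negative example is what marks it as a non-term.
NEGATIVE_MODIFIERS = {
    "very", "few", "good", "interesting", "appealing", "small", "terse", "simple",
}


def context_evidence(phrase, left_context=""):
    """Return +1 for positive evidence, -1 for negative evidence, 0 when neither applies."""
    words = phrase.lower().split()
    if words and words[0] in NEGATIVE_MODIFIERS:
        return -1
    left = left_context.lower().rstrip()
    if any(left.endswith(cue) for cue in POSITIVE_CONTEXTS):
        return 1
    return 0


print(context_evidence("information science", "She is a professor of"))
print(context_evidence("interesting example"))
print(context_evidence("metadata applications", "we built"))
```

A score of 0 leaves the decision to other evidence, since the slide does not say what happens to phrases that match neither list.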

System architecture: 2

Harvester (Perl)

File System (HTML)

Metadata Scraper (Perl)

File System (Normalized HTML)

Term manipulator (Java)

File System (XML/RDF)

XML/RDF Loader

Database
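The last stage, loading XML/RDF into a database, can be sketched as follows. The slides name neither the database nor its schema, so the in-memory SQLite store, the single triples table, the vocabulary namespace, and the function name are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical input file contents; the default namespace is an assumption.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="http://example.org/topicmap#">
  <Topic rdf:about="http://purl.org/rdf/topics/classification">
    <name>classification</name>
    <types_of rdf:resource="http://purl.org/rdf/topics/automatic_classification"/>
    <types_of rdf:resource="http://purl.org/rdf/topics/library_classification"/>
  </Topic>
</rdf:RDF>"""


def load_topics(conn, xml_text):
    """Insert one (subject, property, object) row per child of each topic; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (subject TEXT, property TEXT, object TEXT)"
    )
    rows = []
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF_NS}}}about")
        for child in node:
            prop = child.tag.rsplit("}", 1)[-1]
            # A link stores the target URI; a literal such as <name> stores its text.
            obj = child.get(f"{{{RDF_NS}}}resource") or (child.text or "").strip()
            rows.append((subject, prop, obj))
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_topics(conn, SAMPLE)
types = [row[0] for row in conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND property = ? ORDER BY object",
    ("http://purl.org/rdf/topics/classification", "types_of"),
)]
print(count, types)
```

A flat triples table keeps the loader independent of which properties the term manipulator emits, at the cost of pushing all structure into queries.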

Open issues

RDF knowledge in the user interface.

Encoding in RDF or XML?

The construction of knowledge ontologies.

Conclusions

The enterprise succeeds or fails on the strength of the knowledge ontology.

RDF and the XTM standard are descriptively equivalent for our work.

Sophisticated user interface design is required to exploit all of the encoded information.

For more information

Sharon Caraballo. Automatic Construction of a Hypernym-Labeled Noun Hierarchy. PhD dissertation. Brown University, 2001.

Carol Jean Godby. A Computational Study of Lexicalized Noun Phrases in English. PhD dissertation. The Ohio State University, 2002.