Rutherford Appleton Laboratory SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory Semantic Web Best Practices and Deployment

Page 1: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Rutherford Appleton Laboratory

SKOS

Ecoterm 2006

Alistair Miles

CCLRC Rutherford Appleton Laboratory

Semantic Web Best Practices and Deployment

Page 2: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

http://www.w3.org/2004/02/skos Alistair Miles, Ecoterm 2006, slide 2

Reminder: what is it?

• Simple Knowledge Organisation System

• Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)

• Subject metadata & information retrieval …
– ‘this document is about romantic love’.
– ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’.

• Application of RDF
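
As a rough illustration of subject metadata expressed as RDF, the sketch below stores statements as plain Python tuples rather than using an RDF library. The SKOS Core and Dublin Core property URIs are the published ones; the `http://example.org/` concept and document URIs and the helper `subjects_of` are made up for this example.

```python
# SKOS Core, Dublin Core and rdf:type are real URIs; EX is a made-up namespace.
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

# Each RDF statement is a (subject, predicate, object) triple.
triples = {
    (EX + "concept/love", RDF_TYPE, SKOS + "Concept"),
    (EX + "concept/love", SKOS + "prefLabel", "love"),
    (EX + "concept/romantic-love", RDF_TYPE, SKOS + "Concept"),
    (EX + "concept/romantic-love", SKOS + "prefLabel", "romantic love"),
    (EX + "concept/romantic-love", SKOS + "broader", EX + "concept/love"),
    # Subject metadata: 'this document is about romantic love'.
    (EX + "doc/1", DC + "subject", EX + "concept/romantic-love"),
}


def subjects_of(document, graph):
    """Return the preferred labels of the concepts a document is about."""
    concepts = {o for s, p, o in graph if s == document and p == DC + "subject"}
    return sorted(
        o for s, p, o in graph if s in concepts and p == SKOS + "prefLabel"
    )


print(subjects_of(EX + "doc/1", triples))
```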

Page 3: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Since Ecoterm 2005 …

• SKOS Core Guide & SKOS Core Vocabulary Specification …
– First Working Draft May 2005
– Second Working Draft October 2005

• Minor changes

• Quick Guide to Publishing a Thesaurus on the Semantic Web …
– First Working Draft May 2005

Page 4: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

What comes next … ?

• Life after SWBPD-WG … ?

• Plans for next phase of W3C Semantic Web Activity …

• New WG?

• SKOS W3C Recommendation by end 2007?

• N.B. Not yet approved!

Page 5: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

If Rec then …

• What is the scope? What is the fundamental design goal?

• First part of SKOS Rec would be requirements specification.

• Between now and Sept/Oct 2006 … define scope and requirements.

Page 6: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

What I’d like to do here …

• Talk about some of the assumptions behind SKOS.

• Sketch some ideas on how to define scope and requirements for SKOS.

• Get your [email protected]

“SKOS: Requirements for Standardization”

isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf

Page 7: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Brief history of scope …

• 2003-04: SWAD-Europe
– ISO 2788 thesauri
– “Non-standard” thesauri via extensibility e.g. GeMET
– Classification scheme (PACS)
– Multilingual thesauri
– Semantic mapping

• 2004: W3C Glossaries

• 2005: Discussion re “terminologies”

• Subject headings? Gazetteers? Folksonomies? Taxonomies?

Page 8: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Assumptions: purpose …

• Formal representation of controlled structured vocabularies intended for use in information retrieval applications.

Page 9: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Assumptions: workflow …

a) Build a vocabulary
b) Build an index
c) Retrieve
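
The three workflow steps can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the vocabulary is a dictionary of preferred labels with alternative labels, the index is an inverted mapping from term to documents, and the names `vocabulary`, `build_index` and `retrieve` are invented for the example.

```python
# Step a) Build a vocabulary: preferred label -> alternative (non-preferred) labels.
vocabulary = {
    "tuberculosis": ["TB", "consumption"],
    "radiotherapy": ["x-ray therapy"],
}


def build_index(assignments):
    """Step b) Build an index: invert document -> terms into term -> documents."""
    index = {}
    for doc_id, terms in assignments.items():
        for term in terms:
            if term not in vocabulary:
                raise ValueError(f"{term!r} is not a controlled term")
            index.setdefault(term, set()).add(doc_id)
    return index


def retrieve(query, index):
    """Step c) Retrieve: resolve an entry term to its preferred label, then look it up."""
    wanted = query.lower()
    for preferred, alternatives in vocabulary.items():
        labels = [preferred] + alternatives
        if wanted in (label.lower() for label in labels):
            return sorted(index.get(preferred, set()))
    return []


index = build_index({"doc1": ["tuberculosis", "radiotherapy"], "doc2": ["tuberculosis"]})
print(retrieve("TB", index))
```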

Page 10: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Assumptions: components …

• Vocabulary Development Application
– Something to help build a vocabulary

• Indexing Application
– Something to help build an index

• Retrieval Application
– Something to help retrieve things

• SKOS ultimately designed to support interoperation of these three “key components”.

Page 11: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Proposed scope …

• SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications.

• SKOS is required to support the interoperation of these three key components.

• I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.

Page 12: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Other components …

• Vocabulary mapping … ?

• Metadata registries … ?

• … ?

Page 13: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Component specs …

• … first discuss social and technological context, then return to component specs …

Page 14: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Context …

• What is the social and technological context in which controlled structured vocabs are used?

• Assume two basic needs…
– Locate something I already know about.
– Discover something new.

• N.B. a good location service is not necessarily a good discovery service.

– Cf. Google and del.icio.us

Page 15: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Strategies …

• Basic strategies for implementing retrieval services …

1. Statistical text analysis
2. Analysis of user behaviour
3. Index with controlled vocab

• Other strategies …
1. … KOS-assisted text analysis?

Page 16: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Cost problem …

• Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment…

• Given that other strategies are cheaper…

• Huge pressure to drive down cost and increase utility.

• Requirement for seamless integration.
– I.e. controlled vocab is seldom used in isolation, most applications will combine strategies.

Page 17: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Use case …

• Search portal …

• Use combined strategies.
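
One way a search portal might combine strategies is a weighted sum of a free-text score and a controlled-index match. The sketch below is an assumption for illustration only: the scoring functions, the 0.5 weight and the sample documents are invented, and the text score is a crude stand-in for real statistical text analysis.

```python
def text_score(query, text):
    """Crude stand-in for statistical text analysis: fraction of query words found."""
    words = query.lower().split()
    if not words:
        return 0.0
    tokens = set(text.lower().split())
    return sum(w in tokens for w in words) / len(words)


def index_score(concept, subjects):
    """Controlled-vocabulary strategy: 1.0 if the document is indexed with the concept."""
    return 1.0 if concept in subjects else 0.0


def combined_search(query, concept, documents, weight=0.5):
    """Rank documents by a weighted sum of the two strategies (the weight is arbitrary)."""
    scored = []
    for doc_id, (text, subjects) in documents.items():
        score = (weight * text_score(query, text)
                 + (1 - weight) * index_score(concept, subjects))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document identifier.
    return [doc_id for score, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


documents = {
    "doc1": ("treatment of tuberculosis by x-ray", {"tuberculosis"}),
    "doc2": ("history of sanatoria in india", {"tuberculosis"}),
    "doc3": ("tuberculosis in cattle", set()),
}
print(combined_search("tuberculosis", "tuberculosis", documents))
```

Here doc2 is found only through the controlled index and doc3 only through its text, which is the point of combining the two.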

Page 18: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Component specs …

• Important factors …

• Minimise cost.
– Decentralisation.
– Assistance.

• Maximise “utility”.
– Query expansion.
– Smart ranking.
– Maximise lifetime.

• Use the Semantic Web!
– Situation A. search across many collections, where indexers use same controlled vocab.
– Situation B. search across many collections, where indexers use different controlled vocabs.

Page 19: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Focus areas …

• Decentralisation requires different models of collaboration and change.

• Representing change a key factor to keeping a vocab applicable.

• Ranking and scoring well understood for text, less so for controlled index.

• Theory of query expansion? Field trials of query expansion?

• Strategies for providing assistance?

Page 20: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Change and collaboration

• Continuum of collaboration models: centralised <-> decentralised

• Continuum of change management models: continuous <-> discrete

• Decentralization can reduce cost of development and maintenance

• Change management can ensure continued utility – maximize ROI

• Support for declarative representation of change a requirement for SKOS.
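
To make "declarative representation of change" concrete, the sketch below records each change as data and replays the log to obtain the current vocabulary. The slides do not define such a format: the `Change` record, its fields and the three change kinds are hypothetical and stand in for whatever SKOS might eventually specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One declared change to a vocabulary (a hypothetical record format)."""
    date: str        # ISO 8601 date of the change
    kind: str        # "add", "rename" or "deprecate"
    concept: str     # concept identifier
    label: str = ""  # new preferred label, used by "add" and "rename"


def apply_changes(vocabulary, changes):
    """Replay changes in date order; return the resulting {concept: label} mapping."""
    result = dict(vocabulary)  # leave the input untouched
    for change in sorted(changes, key=lambda c: c.date):
        if change.kind in ("add", "rename"):
            result[change.concept] = change.label
        elif change.kind == "deprecate":
            result.pop(change.concept, None)
        else:
            raise ValueError(f"unknown change kind: {change.kind}")
    return result


log = [
    Change("2006-03-01", "rename", "c1", "tuberculosis"),
    Change("2005-10-01", "add", "c2", "radiotherapy"),
    Change("2006-05-01", "deprecate", "c3"),
]
before = {"c1": "consumption", "c3": "phthisis"}
print(apply_changes(before, log))
```

Because the log is plain data, it can be published and merged by several maintainers, which fits both the discrete and the continuous change-management models.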

Page 21: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Semantic Web architecture…

• Exploit Semantic Web facility to distribute and merge data.

• However, best practices for publishing data in the Semantic Web need work.

• See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).
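
The "distribute and merge" facility can be pictured with RDF graphs modelled as sets of triples, where merging is a set union. This is a simplified sketch: it ignores blank nodes, uses made-up `http://example.org/` URIs, and the helper names `merge` and `documents_about` are invented for the example.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # made-up namespace

# Graph published by the vocabulary maintainer.
vocab_graph = {(EX + "concept/tb", SKOS + "prefLabel", "tuberculosis")}
# Graphs published independently by two collections using the same vocabulary.
collection_a = {(EX + "a/doc1", DC + "subject", EX + "concept/tb")}
collection_b = {(EX + "b/doc7", DC + "subject", EX + "concept/tb")}


def merge(*graphs):
    """Merge graphs by set union (valid here because no blank nodes are used)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def documents_about(label, graph):
    """Find documents indexed with any concept carrying the given preferred label."""
    concepts = {s for s, p, o in graph if p == SKOS + "prefLabel" and o == label}
    return sorted(s for s, p, o in graph if p == DC + "subject" and o in concepts)


merged = merge(vocab_graph, collection_a, collection_b)
print(documents_about("tuberculosis", merged))
```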

Page 22: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Semantic Web architecture

Page 23: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Direct interaction …

Page 24: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Information retrieval…

• Indexing and query evaluation well understood for text content.

• Less well understood for controlled metadata.

• Query types?

• Query evaluation strategies, e.g. query expansion?

• Ranking?
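
As one possible reading of query expansion and ranking over a controlled index, the sketch below expands a query concept to its narrower concepts and ranks exact matches above matches reached through the hierarchy. The hierarchy, the index and the decay factor of 0.5 are all assumptions made for illustration, not an established method.

```python
# Hypothetical hierarchy: concept -> its directly narrower concepts.
narrower = {
    "disease": ["infectious disease"],
    "infectious disease": ["tuberculosis"],
    "tuberculosis": [],
}


def expand(concept, decay=0.5):
    """Return {concept: weight} for the concept and all its narrower concepts.

    Each step down the hierarchy multiplies the weight by `decay` (an arbitrary choice).
    """
    weights = {}
    frontier = [(concept, 1.0)]
    while frontier:
        current, weight = frontier.pop()
        if weights.get(current, 0.0) >= weight:
            continue  # already reached with an equal or better weight
        weights[current] = weight
        for child in narrower.get(current, []):
            frontier.append((child, weight * decay))
    return weights


def search(concept, index):
    """Score each document by the best-weighted expanded concept it is indexed with."""
    scores = {}
    for term, weight in expand(concept).items():
        for doc_id in index.get(term, ()):
            scores[doc_id] = max(scores.get(doc_id, 0.0), weight)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = {
    "disease": ["doc1"],
    "infectious disease": ["doc3"],
    "tuberculosis": ["doc2", "doc1"],
}
print(search("disease", index))
```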

Page 25: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Assistance for indexers …

• Provide suggestions
– Comparison of labels and annotations
– Machine learning
– Exploit lexical resources
– … ?
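
A very simple form of suggestion by label comparison is word overlap between a document's text and each concept's labels. The sketch below assumes a tiny hypothetical vocabulary and an invented `suggest` function; a real indexing assistant would need stemming, stop words and better scoring.

```python
import re

# Hypothetical vocabulary: concept identifier -> all labels (preferred and alternative).
labels = {
    "c1": ["tuberculosis", "TB"],
    "c2": ["radiotherapy", "x-ray therapy"],
    "c3": ["romantic love"],
}


def tokens(text):
    """Lowercase word tokens, keeping internal hyphens (e.g. 'x-ray')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def suggest(text, top=3):
    """Suggest concepts whose labels share words with the text, best match first."""
    text_tokens = set(tokens(text))
    scored = []
    for concept, concept_labels in labels.items():
        # Score a concept by its best label: fraction of the label's words present.
        best = max(
            sum(t in text_tokens for t in tokens(label)) / len(tokens(label))
            for label in concept_labels
        )
        if best > 0:
            scored.append((best, concept))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept for score, concept in scored[:top]]


print(suggest("The cure of tuberculosis by x-ray in India in the 1950s"))
```

The suggestions are only candidates; the indexer still decides which concepts to assign.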

Page 26: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Assistance for mappers …

• Provide suggestions …
– Analysis of labels and annotations
– Exploit lexical resources
– … ?
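
For mapping between two vocabularies, label analysis could start from plain string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the two vocabularies, the 0.8 threshold and the function `suggest_mappings` are assumptions chosen for the example, and the output is meant for review by a human mapper.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between two labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mappings(source, target, threshold=0.8):
    """Return candidate (source_id, target_id, score) tuples, best score first."""
    candidates = []
    for s_id, s_labels in source.items():
        for t_id, t_labels in target.items():
            # Compare every label pair and keep the best score for this concept pair.
            score = max(similarity(a, b) for a in s_labels for b in t_labels)
            if score >= threshold:
                candidates.append((s_id, t_id, round(score, 2)))
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


vocab_a = {"a1": ["tuberculosis", "TB"], "a2": ["radiotherapy"]}
vocab_b = {"b1": ["Tuberculosis"], "b2": ["radiation therapy"], "b3": ["love"]}

for candidate in suggest_mappings(vocab_a, vocab_b):
    print(candidate)
```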

Page 27: SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Summary

• SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies.

• Define requirements by describing information retrieval functionalities.

• Divide functionalities into:
– Presentation styles
– Query types e.g. compound queries, coordination …
– Query evaluation strategies

• Assumptions:
– Key components
– Semantic Web interaction
– Context – pressure to make vocabularies “profitable”
– …

• Issues: change, assistance, theory …