Rutherford Appleton Laboratory SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory Semantic Web Best Practices and Deployment

Rutherford Appleton Laboratory

SKOS

Ecoterm 2006

Alistair Miles
CCLRC Rutherford Appleton Laboratory

Semantic Web Best Practices and Deployment

http://www.w3.org/2004/02/skos
Alistair Miles, Ecoterm 2006, slide 2

Reminder: what is it?

• Simple Knowledge Organisation System
• Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)

• Subject metadata & information retrieval …
  – ‘this document is about romantic love’.
  – ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’.

• Application of RDF
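
To make "application of RDF" and "subject metadata" concrete, here is a minimal sketch that writes one concept and one subject-indexed document as N-Triples using only the Python standard library. The SKOS Core namespace and the properties `skos:Concept`, `skos:prefLabel`, `skos:altLabel` and `skos:broader` are real; the `http://example.org/` URIs and labels are hypothetical, and using `dc:subject` for the indexing link is our assumption.

```python
"""Sketch: one SKOS concept and one subject-indexed document as N-Triples.

Only the RDF, SKOS Core and Dublin Core namespace URIs are real; the
example.org URIs and the labels are hypothetical.
"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical namespace


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and double quotes.

    A complete serialiser would also escape newlines and other control characters.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


love = iri(EX + "concept/romantic-love")
triples = [
    (love, iri(RDF + "type"), iri(SKOS + "Concept")),
    (love, iri(SKOS + "prefLabel"), literal("Romantic love", "en")),
    (love, iri(SKOS + "altLabel"), literal("Passionate love", "en")),
    (love, iri(SKOS + "broader"), iri(EX + "concept/love")),
    # Subject metadata: 'this document is about romantic love'.
    (iri(EX + "doc/42"), iri(DC + "subject"), love),
]

print(to_ntriples(triples))
```

In practice an RDF library would handle serialisation; the hand-rolled functions only keep the sketch dependency-free.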

Since Ecoterm 2005 …

• SKOS Core Guide & SKOS Core Vocabulary Specification …
  – First Working Draft May 2005
  – Second Working Draft October 2005

• Minor changes

• Quick Guide to Publishing a Thesaurus on the Semantic Web …
  – First Working Draft May 2005

What comes next … ?

• Life after SWBPD-WG … ?
• Plans for next phase of W3C Semantic Web Activity …
• New WG?
• SKOS W3C Recommendation by end 2007?
• N.B. Not yet approved!

If Rec then …

• What is the scope? What is the fundamental design goal?

• First part of SKOS Rec would be requirements specification.

• Between now and Sept/Oct 2006 … define scope and requirements.

What I’d like to do here …

• Talk about some of the assumptions behind SKOS.

• Sketch some ideas on how to define scope and requirements for SKOS.

• Get your …

“SKOS: Requirements for Standardization”, isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf

Brief history of scope …

• 2003-04: SWAD-Europe
  – ISO 2788 thesauri
  – “Non-standard” thesauri via extensibility, e.g. GEMET
  – Classification scheme (PACS)
  – Multilingual thesauri
  – Semantic mapping

• 2004: W3C Glossaries
• 2005: Discussion re “terminologies”
• Subject headings? Gazetteers? Folksonomies? Taxonomies?

Assumptions: purpose …

• Formal representation of controlled structured vocabularies intended for use in information retrieval applications.

Assumptions: workflow …

a) Build a vocabulary
b) Build an index
c) Retrieve
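
The three workflow steps can be sketched with plain in-memory structures. This is a hypothetical illustration of the workflow, not a SKOS API: concept IDs, labels and document IDs are invented, and the index is a simple inverted map from concept to documents.

```python
"""Sketch of the three-step workflow: build a vocabulary, build an index, retrieve.

All concept identifiers, labels and document IDs are hypothetical.
"""
from collections import defaultdict

# a) Build a vocabulary: concept ID -> preferred label.
vocabulary = {
    "c:tuberculosis": "Tuberculosis",
    "c:x-ray": "X-ray",
    "c:india": "India",
}

# b) Build an index: each document is assigned concepts from the vocabulary.
assignments = {
    "doc1": {"c:tuberculosis", "c:x-ray", "c:india"},
    "doc2": {"c:tuberculosis"},
}


def build_index(assignments: dict, vocabulary: dict) -> dict:
    """Invert document->concept assignments, rejecting concepts not in the vocabulary."""
    index = defaultdict(set)
    for doc, concepts in assignments.items():
        for concept in concepts:
            if concept not in vocabulary:
                raise ValueError(f"{doc} uses unknown concept {concept}")
            index[concept].add(doc)
    return index


def retrieve(index: dict, concept: str) -> list:
    """c) Retrieve: documents indexed with the concept, sorted by ID."""
    return sorted(index.get(concept, set()))


index = build_index(assignments, vocabulary)
print(retrieve(index, "c:tuberculosis"))  # ['doc1', 'doc2']
```

Rejecting unknown concepts at indexing time is one way the vocabulary "controls" the index; a real indexing application would more likely flag them for review.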

Assumptions: components …

• Vocabulary Development Application– Something to help build a vocabulary

• Indexing Application– Something to help build an index

• Retrieval Application– Something to help retrieve things

• SKOS ultimately designed to support interoperation of these three “key components”.
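
One reading of "interoperation" is that the components exchange vocabulary data in a shared RDF serialisation rather than private formats. The sketch assumes a vocabulary development application has exported N-Triples and shows an indexing application reading the preferred labels back. The regular expression handles only the simple IRI and language-tagged literal forms used here; it is not a full N-Triples parser, and the `example.org` URIs are hypothetical.

```python
"""Sketch: an indexing application reads prefLabels from exported N-Triples.

Handles only '<iri> <iri> <iri> .' and '<iri> <iri> "literal"@lang .' lines;
literal escape sequences are not decoded. A real system would use a
complete RDF parser. URIs under example.org are hypothetical.
"""
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"@([A-Za-z-]+))\s*\.\s*$'
)

exported = """\
<http://example.org/c/tb> <http://www.w3.org/2004/02/skos/core#prefLabel> "Tuberculosis"@en .
<http://example.org/c/tb> <http://www.w3.org/2004/02/skos/core#prefLabel> "Tuberculose"@fr .
<http://example.org/c/tb> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/c/lung-disease> .
"""


def pref_labels(ntriples: str, lang: str) -> dict:
    """Map concept URI -> prefLabel in the requested language."""
    labels = {}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if not match:
            continue  # skip blank or unsupported lines
        subject, predicate, _object_iri, text, tag = match.groups()
        if predicate == SKOS_PREF_LABEL and tag == lang:
            labels[subject] = text
    return labels


print(pref_labels(exported, "en"))
```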

Proposed scope …

• SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications.

• SKOS is required to support the interoperation of these three key components.

• I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.

Other components …

• Vocabulary mapping … ?
• Metadata registries … ?
• … ?

Component specs …

• … first discuss social and technological context, then return to component specs …

Context …

• What is the social and technological context in which controlled structured vocabs are used?

• Assume two basic needs …
  – Locate something I already know about.
  – Discover something new.

• N.B. a good location service is not necessarily a good discovery service.

– Cf. Google and del.icio.us

Strategies …

• Basic strategies for implementing retrieval services …

  1. Statistical text analysis
  2. Analysis of user behaviour
  3. Index with controlled vocab

• Other strategies …
  1. … KOS-assisted text analysis?

Cost problem …

• Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment…

• Given that other strategies are cheaper…

• Huge pressure to drive down cost and increase utility.

• Requirement for seamless integration.
  – I.e. controlled vocab is seldom used in isolation; most applications will combine strategies.

Use case …

• Search portal …
• Use combined strategies.
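
A portal that combines strategies has to merge their results somehow. One simple option, shown here as an assumption rather than anything the slides prescribe, is a weighted linear combination of normalised scores from a text-analysis ranker and a controlled-vocabulary match. The weight, the scores and the document IDs are hypothetical.

```python
"""Sketch: weighted linear fusion of scores from two retrieval strategies.

Weights and scores are hypothetical; linear fusion is one common choice, not the only one.
"""


def normalise(scores: dict) -> dict:
    """Rescale scores to [0, 1] by dividing by the maximum (safe for empty or all-zero input)."""
    top = max(scores.values(), default=0.0)
    return {doc: (s / top if top > 0 else 0.0) for doc, s in scores.items()}


def fuse(text_scores: dict, vocab_scores: dict, w_text: float = 0.5) -> list:
    """Return (doc, combined score) pairs, best first; ties broken by doc ID."""
    text_n, vocab_n = normalise(text_scores), normalise(vocab_scores)
    docs = set(text_n) | set(vocab_n)
    combined = {
        d: w_text * text_n.get(d, 0.0) + (1 - w_text) * vocab_n.get(d, 0.0)
        for d in docs
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


text_scores = {"doc1": 2.0, "doc2": 4.0}   # e.g. from statistical text analysis
vocab_scores = {"doc1": 1.0, "doc3": 1.0}  # e.g. 1.0 = indexed with the query concept
print(fuse(text_scores, vocab_scores))     # [('doc1', 0.75), ('doc2', 0.5), ('doc3', 0.5)]
```

A document found by both strategies (here `doc1`) rises above documents found by only one, which is the practical benefit of combining them.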

Component specs …

• Important factors …

• Minimise cost.
  – Decentralisation.
  – Assistance.

• Maximise “utility”.
  – Query expansion.
  – Smart ranking.
  – Maximise lifetime.

• Use the Semantic Web!
  – Situation A: search across many collections, where indexers use the same controlled vocab.
  – Situation B: search across many collections, where indexers use different controlled vocabs.
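
Query expansion for the two situations can be sketched as follows: expand a query concept to all its narrower concepts (Situation A, one shared vocabulary), then translate the expanded set into a second vocabulary through equivalence mappings (Situation B). The concept IDs, hierarchy and mapping table are hypothetical; the hierarchy plays the role of `skos:narrower`, and the mappings stand in for an equivalence property such as `skos:exactMatch` in the later SKOS Reference.

```python
"""Sketch: query expansion via narrower concepts, then cross-vocabulary mapping.

Concept IDs, hierarchy and mappings are hypothetical.
"""
from collections import deque

# Vocabulary A: concept -> directly narrower concepts (cf. skos:narrower).
narrower = {
    "a:lung-disease": ["a:tuberculosis", "a:pneumonia"],
    "a:tuberculosis": ["a:pulmonary-tb"],
}

# Equivalence mappings from vocabulary A to vocabulary B.
exact_match = {
    "a:tuberculosis": "b:TB",
    "a:pneumonia": "b:Pneumonia",
}


def expand(concept: str) -> set:
    """Breadth-first closure over narrower links, including the concept itself."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for child in narrower.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def translate(concepts: set) -> set:
    """Situation B: map concepts into vocabulary B, dropping unmapped ones."""
    return {exact_match[c] for c in concepts if c in exact_match}


query = expand("a:lung-disease")  # Situation A: collections indexed with A
print(sorted(query))
print(sorted(translate(query)))   # Situation B: collections indexed with B
```

Note that `a:pulmonary-tb` has no mapping and silently drops out in Situation B; incomplete mappings are one reason cross-vocabulary search loses recall.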

Focus areas …

• Decentralisation requires different models of collaboration and change.

• Representing change is a key factor in keeping a vocab applicable.

• Ranking and scoring well understood for text, less so for controlled index.

• Theory of query expansion? Field trials of query expansion?

• Strategies for providing assistance?

Change and collaboration

• Continuum of collaboration models: centralised <-> decentralised

• Continuum of change management models: continuous <-> discrete

• Decentralisation can reduce cost of development and maintenance

• Change management can ensure continued utility – maximise ROI

• Support for declarative representation of change is a requirement for SKOS.
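
A declarative record of change lets an index built against an old vocabulary version stay usable. This sketch assumes deprecated concepts point to their replacements with a property such as Dublin Core's `dcterms:isReplacedBy`; that is one plausible choice, not something the slides or SKOS Core mandate, and the versioned concept IDs are hypothetical. Old index entries are resolved by following the replacement chain, with a guard against cycles.

```python
"""Sketch: resolve deprecated concepts through declarative replacement records.

The table stands in for statements like  <old> dcterms:isReplacedBy <new> .
Concept IDs are hypothetical.
"""

is_replaced_by = {
    "v1:consumption": "v2:tuberculosis",   # historical term retired in version 2
    "v2:tuberculosis": "v3:tuberculosis",  # concept re-issued in version 3
}


def current_concept(concept: str) -> str:
    """Follow isReplacedBy links to the latest concept, raising on cycles."""
    visited = set()
    while concept in is_replaced_by:
        if concept in visited:
            raise ValueError(f"replacement cycle at {concept}")
        visited.add(concept)
        concept = is_replaced_by[concept]
    return concept


def upgrade_index(index: dict) -> dict:
    """Rewrite each document's concepts to their current replacements."""
    return {doc: {current_concept(c) for c in cs} for doc, cs in index.items()}


old_index = {"doc1": {"v1:consumption"}, "doc2": {"v1:india"}}
print(upgrade_index(old_index))
```

Because the change is data rather than a one-off migration script, any party holding an old index can apply it, which fits the decentralised models above.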

Semantic Web architecture…

• Exploit Semantic Web facility to distribute and merge data.

• However, best practices for publishing data on the Semantic Web still need work.

• See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).
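
Merging is cheap in RDF because a graph is a set of triples: data published separately about the same URIs combines by set union. The sketch assumes two hypothetical publishers, one maintaining a vocabulary and one publishing an index that refers to its concept URIs. Blank nodes, which would need to be kept apart before a union, are deliberately left out.

```python
"""Sketch: merging RDF graphs published in different places by set union.

Graphs are sets of (subject, predicate, object) tuples; literals are kept as
N-Triples-style strings for brevity. All example.org URIs are hypothetical.
Blank nodes are not handled: a real merge must keep those from different graphs apart.
"""
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"

vocab_graph = {  # published by the vocabulary maintainer
    ("http://example.org/c/tb", SKOS + "prefLabel", '"Tuberculosis"@en'),
    ("http://example.org/c/tb", SKOS + "broader", "http://example.org/c/lung-disease"),
}
index_graph = {  # published independently by a library indexing its collection
    ("http://library.example.org/doc/7", DC + "subject", "http://example.org/c/tb"),
    ("http://example.org/c/tb", SKOS + "prefLabel", '"Tuberculosis"@en'),  # duplicate
}

merged = vocab_graph | index_graph


def subjects_of(graph: set, concept: str) -> list:
    """Documents whose dc:subject is the given concept."""
    return sorted(s for s, p, o in graph if p == DC + "subject" and o == concept)


print(len(merged))  # 3: the duplicate triple collapses
print(subjects_of(merged, "http://example.org/c/tb"))
```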

Semantic Web architecture

Direct interaction …

Information retrieval…

• Indexing and query evaluation well understood for text content.

• Less well understood for controlled metadata.

• Query types?
• Query evaluation strategies, e.g. query expansion?
• Ranking?
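
To make the ranking question concrete, here is one hypothetical scheme, offered only as an illustration: score a document by its closest index concept below the query concept in the hierarchy, decaying by a fixed factor per narrower step. The decay value, hierarchy and index are assumptions; choosing and validating such a scheme is exactly the open issue raised above.

```python
"""Sketch: rank documents by hierarchical distance from the query concept.

Score = decay ** steps, where steps counts narrower links from the query concept
to the closest matching index concept. Decay, hierarchy and index are hypothetical.
"""
from collections import deque

narrower = {
    "lung-disease": ["tuberculosis", "pneumonia"],
    "tuberculosis": ["pulmonary-tb"],
}
index = {
    "doc1": {"pulmonary-tb"},
    "doc2": {"lung-disease"},
    "doc3": {"pneumonia", "india"},
    "doc4": {"india"},
}


def distances(root: str) -> dict:
    """Breadth-first step counts from root to every concept below it (root = 0)."""
    dist, queue = {root: 0}, deque([root])
    while queue:
        node = queue.popleft()
        for child in narrower.get(node, []):
            if child not in dist:
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist


def rank(query: str, decay: float = 0.5) -> list:
    """Best-first (doc, score) list; documents with no matching concept are omitted."""
    dist = distances(query)
    scores = {}
    for doc, concepts in index.items():
        steps = [dist[c] for c in concepts if c in dist]
        if steps:
            scores[doc] = decay ** min(steps)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank("lung-disease"))  # [('doc2', 1.0), ('doc3', 0.5), ('doc1', 0.25)]
```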

Assistance for indexers …

• Provide suggestions
  – Comparison of labels and annotations
  – Machine learning
  – Exploit lexical resources
  – … ?
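
"Comparison of labels" can start as simply as tokenising a document and looking for each concept's preferred and alternative labels. The sketch suggests concepts whose labels occur as whole phrases in the text; the vocabulary is hypothetical, and the matching is deliberately naive (no stemming, no lexical resources, no machine learning).

```python
"""Sketch: suggest index concepts by matching prefLabel/altLabel phrases in text.

Vocabulary is hypothetical; matching is case-insensitive, whole-phrase only.
"""
import re

# concept -> labels (first is the preferred label, the rest are alternatives)
labels = {
    "c:tuberculosis": ["Tuberculosis", "TB", "Consumption"],
    "c:x-ray": ["X-ray", "Radiography"],
    "c:india": ["India"],
}


def tokens(text: str) -> list:
    """Lower-case word tokens; hyphenated words such as 'x-ray' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def suggest(text: str) -> list:
    """Concepts with at least one label appearing as a contiguous token sequence."""
    words = tokens(text)
    found = []
    for concept, names in labels.items():
        for name in names:
            phrase = tokens(name)
            n = len(phrase)
            if n and any(words[i:i + n] == phrase for i in range(len(words) - n + 1)):
                found.append(concept)
                break  # one matching label is enough for a suggestion
    return sorted(found)


print(suggest("Treating consumption with X-ray therapy in 1950s India"))
```

Matching the alternative label "Consumption" is what lets the indexer be offered the preferred concept even though the text never says "tuberculosis".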

Assistance for mappers …

• Provide suggestions …
  – Analysis of labels and annotations
  – Exploit lexical resources
  – … ?
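
For mappers, label analysis can be turned into candidate mappings between two vocabularies. This sketch scores concept pairs by the Jaccard similarity of their label token sets (best over all label pairs) and keeps pairs at or above a threshold. The vocabularies and the 0.5 threshold are hypothetical, and a human mapper would still confirm each candidate.

```python
"""Sketch: suggest candidate mappings between two vocabularies by label similarity.

Similarity = Jaccard overlap of lower-cased label tokens, maximised over all label pairs.
Vocabularies and threshold are hypothetical.
"""
import re

vocab_a = {"a:1": ["Lung diseases", "Pulmonary diseases"], "a:2": ["Air pollution"]}
vocab_b = {"b:x": ["Diseases of the lung"], "b:y": ["Pollution of air"], "b:z": ["Soil"]}


def token_set(label: str) -> frozenset:
    """Lower-cased alphanumeric tokens of a label."""
    return frozenset(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(s: frozenset, t: frozenset) -> float:
    """|s & t| / |s | t|, defined as 0.0 when both sets are empty."""
    return len(s & t) / len(s | t) if s | t else 0.0


def candidates(a: dict, b: dict, threshold: float = 0.5) -> list:
    """(concept_a, concept_b, score) triples at or above threshold, best first."""
    out = []
    for ca, labels_a in a.items():
        for cb, labels_b in b.items():
            score = max(
                jaccard(token_set(x), token_set(y))
                for x in labels_a
                for y in labels_b
            )
            if score >= threshold:
                out.append((ca, cb, score))
    return sorted(out, key=lambda item: (-item[2], item[0], item[1]))


print(candidates(vocab_a, vocab_b))
```

Function words such as "of" and "the" dilute the scores; a stop-word list or a lexical resource, as the slide suggests, would be an obvious next refinement.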

Summary

• SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies.

• Define requirements by describing information retrieval functionalities.

• Divide functionalities into:
  – Presentation styles
  – Query types, e.g. compound queries, coordination …
  – Query evaluation strategies

• Assumptions:
  – Key components
  – Semantic Web interaction
  – Context: pressure to make vocabularies “profitable”
  – …
  – Issues: change, assistance, theory …
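
Compound queries over a controlled index can be illustrated with post-coordination: a query combines several concepts with AND/OR, and evaluation reduces to set operations on an inverted index. The query representation and data are hypothetical; the summary lists this as a functionality to specify, not a chosen design.

```python
"""Sketch: evaluating a post-coordinated compound query over a concept index.

A query is a concept ID or a tuple ("and" | "or", subquery, ...) with at least
one subquery. Data are hypothetical.
"""

inverted = {  # concept -> documents indexed with it
    "tuberculosis": {"doc1", "doc2", "doc5"},
    "x-ray": {"doc1", "doc3"},
    "india": {"doc1", "doc2"},
    "1950s": {"doc1", "doc4"},
}


def evaluate(query) -> set:
    """Recursively evaluate a query tree to the set of matching documents."""
    if isinstance(query, str):
        return set(inverted.get(query, set()))
    op, *parts = query
    results = [evaluate(p) for p in parts]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError(f"unknown operator {op!r}")


# A coordinated approximation of 'tuberculosis by x-ray in India in the 1950s'.
query = ("and", "tuberculosis", "x-ray", "india", "1950s")
print(sorted(evaluate(query)))  # ['doc1']
```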