
Page 1: Edina cigs-21-september-2012

Will’s World: Walking Through Shakespeare

The use of Linked Data in the Shakespeare Registry Project

Muriel Mewissen – Project Manager

21 September 2012 http://willsworld.blogs.edina.ac.uk 1

Page 2: Edina cigs-21-september-2012

Outline

• Shakespeare Registry background

• British Museum SPARQL endpoint

• Conclusion


Page 3: Edina cigs-21-september-2012

Background

• JISC Discovery Programme: 10-month project, Dec 2011 to Sep 2012

• Aim: to improve discoverability and usability of online data through better access to better metadata

• Demonstrate the benefits and principles of assembling metadata: ‘aggregation as a tactic’

• Focus on Shakespeare

– Lots of data

– Cultural Olympiad & Anniversary 23rd April

• Glasgow culture hack event


Page 4: Edina cigs-21-september-2012

Shakespeare Registry

• Filters (Themes, Data types, Metadata, APIs, Sources, …)

• Useful tools, license, documentation (build, use, register, contribute, …)

Content Providers

• Register online sources

• Contribute metadata about resources

Users/Developers

• Search, identify, locate online resources using metadata aggregation

Page 5: Edina cigs-21-september-2012

Linked Data Fit

Wikipedia on Linked Data:

“linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried… using standard formats such as RDF/XML…”

Answer?

Questions

• Users/Developers: Who? What?

• Registry (self-sustainable, transferable): How? Attractive? Format? Schema? License?

• Content (rich, complex): How much? Sharing? Easy access? Format?

Page 6: Edina cigs-21-september-2012

Linked Data Provision

• Over 40 sources of online resources

– Royal Shakespeare Company, Shakespeare Birthplace Trust, Shakespeare’s Globe, Shakespeare Institute, Folger Shakespeare Library, Open Shakespeare, World Shakespeare Festival, Open Source Shakespeare, …

– British Museum, British Library, Bodleian Library, British Universities Film & Video Council, National Library of Scotland, Wellcome Images, British Library of Sounds, JISC MediaHUB, BBC, …

– National Theatre Poster, Bosak’s Play of Shakespeare in XML, The work of the Bard, Internet Shakespeare Editions, PlayShakespeare.com, Seanco Technology Shakespeare Quote Generator, ...

• Many images, some XML, one SPARQL endpoint!

British Museum: http://collection.britishmuseum.org/Sparql


Page 7: Edina cigs-21-september-2012

SPARQL Endpoint

• Service endpoint

• Web interface

• Run SPARQL queries

• Linked Data

• Structured RDF stores
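As a minimal sketch of what "run SPARQL queries" against a service endpoint involves, the snippet below builds a SPARQL Protocol GET URL that carries the query in the standard `query` parameter. The endpoint address is the one quoted in these slides and may no longer be live; the sample query and the helper name `build_request_url` are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint address as quoted in the slides; it may no longer be available.
ENDPOINT = "http://collection.britishmuseum.org/Sparql"

# Illustrative query: fetch any ten triples from the store.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL with the query percent-encoded in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(SAMPLE_QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client; which result formats the endpoint returns is not stated in the slides.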


Page 8: Edina cigs-21-september-2012

Using the British Museum SPARQL

Easy to start:

• Sample query: document ontologies

• Help: data structure, access & URIs

• Documentation: controlled terms, object names thesaurus

Search for “Shakespeare”, “William Shakespeare”

• Difficult to do keyword search

• Difficult to do multi-stage search

– Find the unique ID for an entity

– Retrieve information related to the entity

• Limited or no results

• Overload the service
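The multi-stage search described above can be sketched as two query builders: one that looks up the identifier of an entity by its label, and one that retrieves the objects linked to that identifier. This is an illustration only. It assumes the label is stored as a plain `rdfs:label` literal, which may not match the British Museum schema, and the URI passed to the second stage is a placeholder, not a real identifier.

```python
def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def find_entity_query(label: str) -> str:
    """Stage 1: find the URI(s) whose label matches exactly (assumes rdfs:label)."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity WHERE {\n"
        f"  ?entity rdfs:label {sparql_string(label)} .\n"
        "}"
    )


def related_objects_query(entity_uri: str) -> str:
    """Stage 2: retrieve every subject and predicate that points at the entity."""
    return (
        "SELECT ?object ?predicate WHERE {\n"
        f"  ?object ?predicate <{entity_uri}> .\n"
        "}"
    )


stage1 = find_entity_query("Shakespeare, William")
# Placeholder URI: in practice this comes from the stage 1 results.
stage2 = related_objects_query("http://example.org/id/person/0000")
print(stage1)
print(stage2)
```

An exact label match avoids a full text scan, which is one reason to prefer it over keyword filters on a slow endpoint, but it only works when the exact form of the label is known.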


Page 9: Edina cigs-21-september-2012

SPARQL Common Issues

Common issues:

• Lack of documentation (ontologies, identifiers)

• Lack of example queries

• Lack of identifiers

• Slow, timeouts & result size limit

• Inefficient queries (text & keyword search)
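One common way to live with timeouts and result size limits is to split a large query into pages using the standard SPARQL `LIMIT` and `OFFSET` modifiers. The sketch below is an assumption about how that could be done, not the approach used in the project; the helper name `paged_queries`, the page size, and the base query are illustrative, and the row count simply reuses the 4209 figure from the collection search for the sake of the example.

```python
from typing import Iterator


def paged_queries(base_query: str, page_size: int, max_rows: int) -> Iterator[str]:
    """Yield copies of base_query with LIMIT/OFFSET windows covering max_rows rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, max_rows, page_size):
        limit = min(page_size, max_rows - offset)
        yield f"{base_query}\nLIMIT {limit} OFFSET {offset}"


# ORDER BY keeps the row order stable, so pages do not overlap or skip rows.
BASE = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o"
pages = list(paged_queries(BASE, page_size=1000, max_rows=4209))
print(len(pages))
print(pages[-1])
```

Sorting has its own cost on a slow endpoint, so the page size is a trade-off between the number of requests and the time each one takes.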


Page 10: Edina cigs-21-september-2012

SPARQL endpoint

SPARQL is not

• Relational DB (search on given value for field)

– A query that is simple in SQL can be complex in SPARQL

• Text DB like Solr (flexible text search)

– Not suited for discovery

SPARQL provides links & context

Think about Linked Data in the right way


Page 11: Edina cigs-21-september-2012

Asking the Right Questions

• Structured data needs structured queries

• To build meaningful queries, we need to know:

– Data, structure, schema, identifiers

• Internally specified

How do we identify “William Shakespeare” and related objects before we can retrieve the relevant Linked Data?

• Need identifier for “William Shakespeare”

• URI or ID in the British Museum schema


Page 12: Edina cigs-21-september-2012

Workflow for extracting metadata

1. Collection Database Search GUI


Page 13: Edina cigs-21-september-2012

Collection Database Search

2. Select object

4209 Results!


Page 14: Edina cigs-21-september-2012


3. Extract the ID for the object

Page 15: Edina cigs-21-september-2012

4. Use the ID in the SPARQL query


Page 16: Edina cigs-21-september-2012

5. Extract the metadata

Repeat for the remaining 4208 objects!
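The metadata extraction step can be sketched as follows, assuming the endpoint returns results in the W3C SPARQL JSON results format (the slides do not say which format was used). The sample response, its URIs, and the helper name `extract_rows` are invented for illustration and are not British Museum data.

```python
import json

# Invented sample in the standard SPARQL JSON results layout (head + results.bindings).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["predicate", "value"]},
  "results": {"bindings": [
    {"predicate": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
     "value": {"type": "literal", "value": "Example print"}},
    {"predicate": {"type": "uri", "value": "http://example.org/schema/depicts"},
     "value": {"type": "uri", "value": "http://example.org/id/person/0000"}}
  ]}
}
"""


def extract_rows(response_text: str) -> list[dict[str, str]]:
    """Flatten each binding to a {variable: value} dict, skipping unbound variables."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


rows = extract_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row["predicate"], "->", row["value"])
```

Running the same function over the response for each object ID is what turns the manual "repeat" step into a loop.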


Page 17: Edina cigs-21-september-2012

Sustainable Workflow?

Workaround

• Multiple GUI searches on Shakespeare, William Shakespeare, Macbeth, Hamlet,….

• Manual steps

• Many small queries, few large queries

Feedback on blog post

Person ID for “Shakespeare, William”


Page 18: Edina cigs-21-september-2012

Internal ID


Page 19: Edina cigs-21-september-2012

400 triples


Page 20: Edina cigs-21-september-2012

Conclusions

• SPARQL best suited to link data from different informational silos, not suited to text search and discovery

• Common identifiers are essential (e.g. ISSN)

– Use of standards (ISNI), common language & ontologies

• Documentation & example queries

• Be prepared

– To use additional data sources to identify URIs

– To run many queries


Page 21: Edina cigs-21-september-2012

Thanks

• British Museum

– SPARQL endpoint is a beta version, released to generate feedback

– New version available within a few months

• Owen Stephens

• EDINA

Peter Burnhill, Jackie Clark, Catherine Fleming, Andrew Dorward, Neil Mayo, Nicola Osborne, Christine Rees, Tim Stickland
