tpdl2013 tutorial linked data for digital libraries 2013-10-22

Linked Data for Digital Libraries Uldis Bojars, Nuno Lopes, & Jodi Schneider TPDL 2013 September 22, 2013 Valletta, Malta 1

Upload: jodischneider

Post on 08-May-2015


DESCRIPTION

Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-09-22. http://tpdl2013.upatras.gr/tut-lddl.php This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data. For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.

TRANSCRIPT

Page 1: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Linked Data for Digital Libraries

Uldis Bojars, Nuno Lopes, & Jodi Schneider

TPDL 2013

September 22, 2013

Valletta, Malta

Page 2: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Nuno: Digital Repository of Ireland & DERI

Uldis: National Library of Latvia

Jodi: DERI

Page 3: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Schedule for the day

9:00 - Introduction of presenters, tutorial schedule, and learning outcomes
9:10 - Motivation and concepts of Linked Data
9:30 - Discuss: How would you envision using Linked Data in your institution?
9:45 - Lifecycle of Linked Data & Exploring Linked Data
10:10 - Case Study 1: Authority Data

10:30 – 11:00 COFFEE BREAK

11:00 - Recap
11:10 - Modelling data as Linked Data
11:30 - Case Study 2: Geographical Linked Data
11:50 - Choice of Hands-on Activities
12:25 - Conclusions

Page 4: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Hands-on Activities

11:50 – 12:25

Choice of Activities…

• Data Modelling
• Data Cleaning & Structuring
• Querying (SPARQL)

Page 5: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Please share your expertise!

• In the room
• On paper
• Online - shared folder: http://tinyurl.com/tpdl2013-ld-notes
  – PDF of the programme
  – Shared notes
  – More materials later

Page 6: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Objectives for Today

• What is Linked Data? Why use it?
• What are some examples of Linked Data in Digital Libraries?
• What are the best practices for exploring & creating Linked Data?

Page 7: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Motivation and concepts of Linked Data

Page 8: TPDL2013 tutorial linked data for digital libraries 2013-10-22

What is Linked Data?

• Using identifiers
• to enable access
• to add structure
• to link to other stuff

Page 9: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Why use Linked Data?

Page 10: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Key technology for library data!

• Representing
• Publishing
• Exchanging

Page 11: TPDL2013 tutorial linked data for digital libraries 2013-10-22

• Powerful querying
• Ability to mix/match vocabularies
• Same technology stack as everybody else
  – Findability
  – Interoperability

Page 12: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Who is using Linked Data?

Page 13: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Aggregators

Page 14: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Integrated Library Systems & OPACs

Page 15: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Thesauri

Page 16: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Repositories

Page 17: TPDL2013 tutorial linked data for digital libraries 2013-10-22

What is Linked Data (redux)?

Page 18: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Rob Styles

Page 19: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Towards RDF

Subject Predicate Object

Page 20: TPDL2013 tutorial linked data for digital libraries 2013-10-22

RDF triple

Subject Predicate Object

Page 21: TPDL2013 tutorial linked data for digital libraries 2013-10-22

RDF graph

Page 22: TPDL2013 tutorial linked data for digital libraries 2013-10-22

How Linked Data works

Reuses the existing Web infrastructure to publish your data along with your documents:

– Using URI identifiers
– and HTTP for accessing the information

Page 23: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Linked Data Principles

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/wiki/LinkedData
http://www.w3.org/DesignIssues/LinkedData

Page 24: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Data on the Web is not enough…

• We need a proper infrastructure for a real Web of Data
  – data is available on the Web
    • accessible via standard Web technologies
  – data is interlinked over the Web
  – i.e., data can be integrated over the Web
• We need Linked Data

Slide credit: Ivan Herman

Page 25: TPDL2013 tutorial linked data for digital libraries 2013-10-22

In groups of 2-3: Discuss

• How would you envision using Linked Data? What are the opportunities?

• Is your institution already using Linked Data? Planning a Linked Data project?

Page 26: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Lifecycle of Linked Data

Page 27: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Lifecycle of Linked Data

• Find
• Explore
• Transform
• Model
• Store
• Query
• Interlink
• Publish

Page 28: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Uldis Bojars, Nuno Lopes, & Jodi Schneider

Semantic Web for Digital Libraries

Exploring Linked Data (Practical Tools and Approaches)

Page 29: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Objectives

• Learn about Linked Data (LD) by looking at existing data sources

• Discover tools and approaches for exploring Linked Data

Page 30: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 31: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Exploring Linked Data

• Discovering Linked Data
• Accessing RDF data
• Making sense of the data
  – Validating RDF data
  – Converting between formats
  – Browsing Linked Data
• Querying RDF data

Page 32: TPDL2013 tutorial linked data for digital libraries 2013-10-22

RDF graph

Page 33: TPDL2013 tutorial linked data for digital libraries 2013-10-22

What RDF looks like

• RDF can be expressed in a number of formats:
  – some are good for machines; some are understandable to people
• Common formats:
  – RDF/XML – common, but difficult to read
  – N-Triples – a simple list of RDF triples
  – Turtle – human-readable, easier to understand
• Can be represented visually
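To make the "simple list of RDF triples" idea concrete, here is a minimal Python sketch that writes two statements as N-Triples. The helper names (`iri`, `literal`, `to_ntriples`) are invented for this illustration, and the binding of the `dc:` prefix to http://purl.org/dc/terms/ is an assumption; the two statements themselves come from the RDFa Distiller output for Ivan Herman's home page shown later in this deck. A real project would normally use an RDF library instead of string formatting.

```python
# Minimal sketch: serialise (subject, predicate, object) tuples as N-Triples.
# Not a complete writer: it only escapes backslashes, double quotes and newlines.

FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/terms/"  # assumption: what the "dc:" prefix stands for


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping the characters that need it."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(triples):
    """One triple per line: three terms separated by spaces, then ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


home_page = "http://www.ivan-herman.net/"
triples = [
    (iri(home_page), iri(DC + "creator"), literal("Ivan Herman")),
    (iri(home_page), iri(FOAF + "primaryTopic"), iri(home_page + "foaf#me")),
]

print(to_ntriples(triples))
```

Every line of the output is a complete statement with full IRIs, which is why N-Triples is easy for machines to process line by line, while Turtle adds prefixes and grouping to make the same data easier for people to read.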

Page 34: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Accessing RDF data

RDF data on the Web can be found as:

• Linked Data
  – follow links, request data by URI
  – returned data can be in various RDF formats
• Data dumps
  – download the data
• SPARQL endpoints
  – query Linked Data (more on that later)

Page 35: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://www.ivan-herman.net/

Page 36: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Discovering Linked Data

a) find a link in a Web page
b) have some tools alert you that Linked Data is there
  – Tabulator
  – Semantic Radar
c) explore a project you heard about, where you know LOD should be there
d) use a registry of sources: http://datahub.io/group/lodcloud
e) just ask someone

Page 37: TPDL2013 tutorial linked data for digital libraries 2013-10-22

RDF discovery example

• data at Ivan Herman’s page can be found via:
  – finding the RDF icon (with the link to FOAF file)
  – letting browser tools alert you that RDF is present
    • RDF auto-discovery
  – extracting RDFa data embedded in the page
• for other data sources RDF content negotiation might work
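RDF auto-discovery, as used by such browser tools, generally means looking in the page's HTML head for `<link>` elements that point to an RDF document. The following is a minimal sketch of that idea using only Python's standard library. The sample page, the class name `RdfLinkFinder`, and the set of media types checked are assumptions made for illustration; this is not how Tabulator or Semantic Radar are actually implemented.

```python
from html.parser import HTMLParser

# Media types treated as "RDF" in this sketch (an assumption, not an exhaustive list).
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise an RDF document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("type") in RDF_TYPES and attributes.get("href"):
            self.links.append(attributes["href"])


def find_rdf_links(html):
    """Return the list of RDF documents advertised by the given HTML text."""
    finder = RdfLinkFinder()
    finder.feed(html)
    return finder.links


# Hypothetical page, loosely modelled on a home page that links to a FOAF file.
sample_page = """
<html><head>
  <title>Hypothetical home page</title>
  <link rel="stylesheet" type="text/css" href="style.css">
  <link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf">
</head><body></body></html>
"""

print(find_rdf_links(sample_page))
```

The stylesheet link is ignored because its media type is not an RDF one; only the FOAF link is reported.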

Page 38: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Making sense of the data

• Validating RDF data
  – Ensures that data representation is correct
• Converting between formats
  – Convert to a [more] human-readable RDF format
• Browsing Linked Data
  – Browse the data without worrying about “reading” RDF

Page 39: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Validating and Converting RDF

• W3C RDF validator http://www.w3.org/RDF/Validator/

• URI debugger – “Swiss knife” of Linked Data http://linkeddata.informatik.hu-berlin.de/uridbg/

• RDFa distiller – extracts RDF embedded in web pages http://www.w3.org/2012/pyRdfa/

• Command-line tools (we’ll return to that)

Page 40: TPDL2013 tutorial linked data for digital libraries 2013-10-22

<http://www.ivan-herman.net/> a foaf:PersonalProfileDocument;
    dc:creator "Ivan Herman";
    dc:date "2009-06-17"^^xsd:date;
    dc:title "Ivan Herman’s home page";
    xhv:stylesheet <http://www.ivan-herman.net/Style/gray.css>;
    foaf:primaryTopic <http://www.ivan-herman.net/foaf#me> .

<http://twitter.com/ivan_herman> a foaf:OnlineAccount;
    foaf:accountName "ivan_herman";
    foaf:accountServiceHomepage <http://twitter.com/> .

<http://www.ivan-herman.net/cgi-bin/rss2to1.py> a rss:channel .

<http://www.ivan-herman.net/foaf#me> a dc:Agent, foaf:Person;
    rdfs:seeAlso <http://www.ivan-herman.net/AboutMe>,
        <http://www.ivan-herman.net/cgi-bin/rss2to1.py>,
        <http://www.ivan-herman.net/foaf.rdf>;
    ...

Extracted from http://www.ivan-herman.net/ using RDFa Distiller

Page 41: TPDL2013 tutorial linked data for digital libraries 2013-10-22
Page 42: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Browsing Linked Data (DBpedia): http://live.dbpedia.org/resource/Valletta

Page 43: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Command Line Tools

• wget – command line network downloader
  $ wget http://dbpedia.org/resource/Valletta
• curl – specify HTTP headers
  $ curl -L -H "Accept: text/rdf+n3" http://dbpedia.org/resource/Valletta
• Redland rapper – RDF parsing and serialisation
  $ rapper -o turtle http://dbpedia.org/resource/Valletta
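The curl call above relies on HTTP content negotiation: the `Accept` header tells the server which representation of the resource is wanted. The same request can be sketched with Python's standard library. The function names below are made up for illustration, and `text/turtle` is used as the default media type instead of the slide's `text/rdf+n3`; whether a particular server honours either type is not guaranteed. The sketch only builds the request, and the actual network call is left commented out.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri, accept="text/turtle"):
    """Prepare a GET request that asks the server for an RDF serialisation."""
    return Request(uri, headers={"Accept": accept})


def fetch_rdf(uri, accept="text/turtle", timeout=30):
    """Send the request; urlopen follows HTTP redirects, much like `curl -L`."""
    with urlopen(build_rdf_request(uri, accept), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read().decode("utf-8")


request = build_rdf_request("http://dbpedia.org/resource/Valletta")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Uncomment to perform the actual network request:
# content_type, body = fetch_rdf("http://dbpedia.org/resource/Valletta")
```

Checking the returned `Content-Type` is a quick way to see whether the server really negotiated an RDF format or fell back to HTML.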

Page 44: TPDL2013 tutorial linked data for digital libraries 2013-10-22
Page 45: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Querying Linked Data

• SPARQL Protocol and RDF Query Language
• Graph Matching
• Components of a SPARQL Query:
  – Prefix Declarations
  – Result type (SELECT, CONSTRUCT, DESCRIBE, ASK)
  – Dataset
  – Query pattern
  – Solution modifiers
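"Graph matching" means that a query is itself a small graph with variables in it, and the answers are all the ways those variables can be bound so that the pattern occurs in the data. The toy Python sketch below illustrates the idea on an in-memory set of triples. The `ex:` data and all function names are hypothetical, and the naive nested loop is for explanation only; it is not how a SPARQL engine is implemented.

```python
# A tiny in-memory "graph": a set of (subject, predicate, object) tuples.
# All ex: terms are made up for this illustration.
GRAPH = {
    ("ex:Valletta", "rdf:type", "ex:City"),
    ("ex:Valletta", "ex:country", "ex:Malta"),
    ("ex:Mdina", "rdf:type", "ex:City"),
    ("ex:Mdina", "ex:country", "ex:Malta"),
    ("ex:Malta", "rdf:type", "ex:Country"),
}


def is_variable(term):
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`; return None on failure."""
    result = dict(bindings)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable must keep the value it was bound to earlier, if any.
            if result.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return result


def query(graph, patterns):
    """Return every set of variable bindings that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?city WHERE { ?city rdf:type ex:City . ?city ex:country ex:Malta }
cities = sorted(s["?city"] for s in query(GRAPH, [
    ("?city", "rdf:type", "ex:City"),
    ("?city", "ex:country", "ex:Malta"),
]))
print(cities)
```

The two patterns share the variable `?city`, so only subjects that satisfy both are returned; `ex:Malta` is typed but is not a city, so it is excluded.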

Page 46: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Europeana SPARQL endpoint

http://europeana.ontotext.com/

Page 48: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://tinyurl.com/europeana-rights-sparql

Page 49: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Tool catalogues: many more tools

• Collection of tools from other projects
  – http://www.w3.org/2001/sw/wiki/LLDtools
  – http://www.w3.org/2001/sw/wiki/Tools
  – http://semanticweb.org/wiki/Tools
  – http://dbpedia.org/Applications

Page 50: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Interesting Projects

• LOCAH
  A stylesheet to transform UK Archives Hub EAD to RDF/XML, with examples of the process using XSLT
  http://data.archiveshub.ac.uk/ead2rdf/
• AliCAT (Archival Linked-data Cataloguing)
  Tool for editing collection-level records
  http://data.aim25.ac.uk/step-change/
• Axiell CALM
  Solution for LAM that includes Linked Data functionality, allowing archivists to tag their collections with URIs from any chosen Linked Dataset
  http://www.axiell.com/calm

Page 51: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Tools for Converting MARC records

• MariMba
  Tool to translate MARC to RDF and Linked Data
  http://mayor2.dia.fi.upm.es/oeg-upm/index.php/en/downloads/228-marimba
• marcauth-2-madsrdf
  XQuery utility to convert MARC/XML Authority records to MADS/RDF and SKOS resources
  https://github.com/kefo/marcauth-2-madsrdf

Page 52: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Tools for museum curators

• Karma (http://isi.edu/integration/karma/) was used to map the records of the Smithsonian American Art Museum to RDF and link them to the Web and the Linked Open Data Cloud. Demo: http://www.youtube.com/watch?v=kUIqTI56oeQ

Page 53: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Authority Linked Data

VIAF and Wikipedia case study

Page 54: TPDL2013 tutorial linked data for digital libraries 2013-10-22

library links

Slide credit: Jindřich Mynarz

Page 55: TPDL2013 tutorial linked data for digital libraries 2013-10-22

• Use a single, distinct name for each person, organization, …

• Name is consistently used throughout library systems

• Issues:
  – “Strings” not “things”
  – in the Linked Data world we’d just use URIs

Page 56: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://viaf.org

Page 57: TPDL2013 tutorial linked data for digital libraries 2013-10-22

VIAF

• Virtual International Authority File (viaf.org)
• Integrating authority information from a number of national libraries
  – Linked data + links to related information
• Matching authority data from multiple sources
  – using related bibliographic records to help matching

Page 58: TPDL2013 tutorial linked data for digital libraries 2013-10-22
Page 59: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Wikipedia + VIAF

• How can people discover useful information in VIAF and via VIAF?

• Linked Data eco-system – let’s explore (!)
  – Wikipedia -> VIAF -> National Library LD
• Example (Andrejs Pumpurs):
  – http://en.wikipedia.org/wiki/Andrejs_Pumpurs
  – http://viaf.org/viaf/44427367/

Page 60: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://en.wikipedia.org/wiki/Andrejs_Pumpurs

Page 61: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://viaf.org/viaf/44427367/

Page 62: TPDL2013 tutorial linked data for digital libraries 2013-10-22

VIAF

• Ontologies used:
  – FOAF, SKOS, RDA (FRBR entities and elements), Dublin Core, VIAF, UMBEL
• Related datasets:
  – National authority data:
    • Germany (d-nb.info), Sweden (LIBRIS), France (idref.fr)
  – DBpedia

Page 63: TPDL2013 tutorial linked data for digital libraries 2013-10-22

http://viaf.org/viaf/44427367/

Page 64: TPDL2013 tutorial linked data for digital libraries 2013-10-22

How did VIAF get into Wikipedia?

• VIAFbot
  – algorithmically matched by name, important dates, and selected works
• “The principal benefit of VIAFbot is the interconnected structure.” -

Page 65: TPDL2013 tutorial linked data for digital libraries 2013-10-22

One Direction

VIAF English Wiki

Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC

Page 66: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Enter VIAFBot: Wikipedia Robot

VIAF English Wiki

Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC

Page 67: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Idea: Reciprocate

VIAF English Wiki

Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC

Page 68: TPDL2013 tutorial linked data for digital libraries 2013-10-22

VIAF – summary:

– an efficient way of putting library authority data online as Linked Data

– if the organization also publishes Linked Data itself, links can be added to VIAF that point back to the organization’s LD records (which may contain richer or additional information)

Page 69: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Data Modelling

Page 70: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Publishing Data

• Naïve Transform
  – Direct Mapping of Relational Data to RDF (see RDB2RDF)

OR

• Model & Transform
  – Figure out how to represent data
  – Then transform according to the model
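The naïve route can be sketched in a few lines: every row becomes one subject, every column one predicate, and every non-NULL cell one triple. The Python below is a simplified illustration in the spirit of the W3C Direct Mapping, not a conforming implementation (for instance, all values are emitted as plain string literals). The base IRI, the table name `maps`, and the sample row are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base IRI, not from the tutorial
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def quote_literal(value):
    """Render a cell value as a plain string literal (datatypes are ignored here)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


def row_to_triples(table, primary_key, row):
    """Map one relational row to triples: one subject per row, one predicate per column."""
    table_iri = BASE + quote(table)
    subject = f"<{table_iri}/{quote(primary_key)}={quote(str(row[primary_key]))}>"
    triples = [(subject, RDF_TYPE, f"<{table_iri}>")]
    for column, value in row.items():
        if value is None:  # NULL cells produce no triple
            continue
        triples.append((subject, f"<{table_iri}#{quote(column)}>", quote_literal(value)))
    return triples


# Hypothetical row from a hypothetical "maps" table.
row = {"id": 7, "title": "Land survey", "notes": None}
for triple in row_to_triples("maps", "id", row):
    print(" ".join(triple), ".")
```

The result mirrors the database schema exactly, which is quick but ties the RDF to table and column names. The "Model & Transform" route instead maps the data onto vocabularies chosen deliberately.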

Page 71: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Model

• Describe the domain
  – What are the important concepts?
  – What are their properties?
  – What are their relations?

• Choose vocabularies

Page 72: TPDL2013 tutorial linked data for digital libraries 2013-10-22

DC TERMS RDF Vocabulary http://purl.org/dc/terms/

Page 73: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Deciding on URI patterns

• Use a domain that you control
• Use consistent patterns
• Manage change: transparent isn’t always best
• Consider what concepts are worth distinguishing

Page 74: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Example URI patterns

• Designing URI Sets for the UK Public Sector
• Defines patterns for
  – Identifier URI
  – Document URI
  – Representation URI
• Identifier example:
  http://{domain}/id/{concept}/{reference}
  http://data.archiveshub.ac.uk/id/person/ncarules/skinnerbeverley1938-1999artist
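A consistent pattern is easiest to enforce when identifiers are minted by code rather than by hand. The sketch below fills in the identifier pattern from the slide and reproduces the Archives Hub example. The `slugify` rule and the assumed source label "Skinner, Beverley, 1938-1999, artist" are guesses made for illustration; the Archives Hub's actual normalisation rules are not given in the tutorial.

```python
import re
import unicodedata

IDENTIFIER_PATTERN = "http://{domain}/id/{concept}/{reference}"


def slugify(label):
    """Lower-case, strip accents, and keep only letters, digits and hyphens."""
    ascii_text = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9-]+", "", ascii_text.lower())


def mint_identifier(domain, concept, reference):
    """Fill in the identifier URI pattern with concrete values."""
    return IDENTIFIER_PATTERN.format(domain=domain, concept=concept, reference=reference)


# Assumed name heading; "ncarules" is taken from the example URI on the slide.
uri = mint_identifier(
    "data.archiveshub.ac.uk",
    "person",
    "ncarules/" + slugify("Skinner, Beverley, 1938-1999, artist"),
)
print(uri)
```

Keeping the minting rule in one function also makes it easier to manage change later, since old and new patterns can be compared and redirected systematically.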

Page 75: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Choosing Vocabularies

• Audience & Purpose
  – e.g. search engine vs. bibliographic exchange
• Domain
  – Biomedical, geographical, …
• Granularity
• Popularity: potential for interlinking & reuse

Page 76: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Finding vocabularies & ontologies

Page 77: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Look at examples

Page 78: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Look at examples

Page 79: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Find examples: Linked Open Data Cloud

Page 81: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Ask the community

• Mailing lists
  – LOD-LAM
  – Code4Lib
  – OKFN Open-Bibliography Working Group
  – W3C Schema.org BibEx Community Group

• Domain-specific Linked Data groups & lists

Page 82: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Popularity

Page 83: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Popularity: Semantic search engines

http://sindice.com/

Page 84: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Modeling spectrum: lightweight to heavyweight

An ontology “spectrum” (in order of complexity). Source: [Lassila and McGuinness, 2001]. Image from Bojars 2009

Page 85: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Some popular vocabularies

• DC
• BIBO
• FOAF
• LODE (LinkedEvents)
• OAI-ORE
• SKOS

Page 86: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Be aware of & connect to

• Authority data
  – e.g. VIAF
• Thesauri
  – e.g. Agrovoc
• Linked Data is about Linking!

Page 87: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Modeling examples

• BIBFRAME
• British Library Data Model
• EDM
• LIBRIS
• VIAF

Page 88: TPDL2013 tutorial linked data for digital libraries 2013-10-22

VIAF

• Ontologies used:
  – FOAF, SKOS, RDA (FRBR entities and elements), Dublin Core, VIAF, UMBEL
• Related datasets:
  – National authority data:
    • Germany (d-nb.info), Sweden (LIBRIS), France (idref.fr)
  – DBpedia

Page 89: TPDL2013 tutorial linked data for digital libraries 2013-10-22

LIBRIS Modeling

Page 90: TPDL2013 tutorial linked data for digital libraries 2013-10-22

British Library Data Model http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf

Page 91: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Uldis Bojars, Nuno Lopes, & Jodi Schneider

Semantic Web for Digital Libraries

Geographical LD case study

Page 92: TPDL2013 tutorial linked data for digital libraries 2013-10-22

The NLI Longfield Map Collection

• Collections refer to Geographical Data in many forms…
• The Longfield Maps are a set of 1,570 surveys carried out in Ireland between 1770 and 1840.
• Currently catalogued in MARCXML, using data from Logainm, GeoNames and DBpedia.

Page 93: TPDL2013 tutorial linked data for digital libraries 2013-10-22

<marc:datafield tag="650" ind1="" ind2="">
  <marc:subfield code="a">Land tenure</marc:subfield>
  <marc:subfield code="z">Ireland</marc:subfield>
  <marc:subfield code="z">Rathdown (Barony)</marc:subfield>
</marc:datafield>

<marc:datafield tag="650" ind1="" ind2="">
  <marc:subfield code="a">Land use surveys</marc:subfield>
  <marc:subfield code="z">Ireland</marc:subfield>
  <marc:subfield code="z">Wicklow (County)</marc:subfield>
</marc:datafield>

Longfield Map example
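Before place names like these can be linked to Logainm or GeoNames, they have to be pulled out of the record. Here is a minimal sketch with Python's standard library that reads the 650 fields above and lists the geographic subdivisions (subfield `z`) for each topic. The wrapping `<marc:record>` element and the function name `subject_places` are additions for this illustration; the wrapper is needed only because the slide shows a fragment without its namespace declaration.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The two fields from the slide, wrapped in a record element so that the
# "marc:" prefix is declared and the fragment parses on its own.
RECORD = """
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1="" ind2="">
    <marc:subfield code="a">Land tenure</marc:subfield>
    <marc:subfield code="z">Ireland</marc:subfield>
    <marc:subfield code="z">Rathdown (Barony)</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1="" ind2="">
    <marc:subfield code="a">Land use surveys</marc:subfield>
    <marc:subfield code="z">Ireland</marc:subfield>
    <marc:subfield code="z">Wicklow (County)</marc:subfield>
  </marc:datafield>
</marc:record>
"""


def subject_places(record_xml):
    """Return (topic, [places]) pairs for every 650 field in the record."""
    root = ET.fromstring(record_xml.strip())
    results = []
    for field in root.findall("marc:datafield[@tag='650']", MARC_NS):
        topic = field.findtext("marc:subfield[@code='a']", default="", namespaces=MARC_NS)
        places = [sub.text for sub in field.findall("marc:subfield[@code='z']", MARC_NS)]
        results.append((topic, places))
    return results


for topic, places in subject_places(RECORD):
    print(topic, "->", ", ".join(places))
```

Note that the place names come out as plain strings such as "Rathdown (Barony)", with the place type embedded in the label. This is the "strings, not things" problem that linking to an authority like Logainm is meant to solve.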

Page 94: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Geographic Data Providers

• DBpedia
  – Includes latitude and longitude for geographic entities
• LinkedGeoData
  – Export of data from OpenStreetMap
  – Beyond lat/lon (areas as polygons)
• GeoNames
  – Access data as RDF (download requires subscription)
• GeoLinkedData Spain
• Ordnance Survey UK

Page 95: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Logainm.ie

• The authority list of Irish place names, validated by the Place Names Branch.
• Delivers a more detailed level than DBpedia or GeoNames.
• Unique source of Irish language place names.
• NLI is looking to integrate Logainm data into its workflow, which would allow searching for place names in Irish.

Page 96: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Geo-Vocabularies

• W3C Geo (very basic)
  – SpatialThing, latitude and longitude
• Most providers have defined their own
• NeoGeo (http://geovocab.org/doc/neogeo/)
  – Feature vs Geometry
  – Spatial Relations (is_part_of)

Page 97: TPDL2013 tutorial linked data for digital libraries 2013-10-22

NeoGeo Overview

• Classes
  – Feature (spatial:Feature)
    • A geographical feature, capable of holding spatial relations.
  – Geometry (geom:Geometry)
    • Super-class of all geometrical representations (RDF, KML, GML, WKT...).
• Connected by the geometry property (geom:geometry)

Page 98: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Relations between geometries

Properties
• connects with (spatial:C)
• overlaps (spatial:O)
• is part of (spatial:P)
• contains (spatial:Pi)
• …

Page 99: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Creating a LD Dataset

Steps:
1. Data transformation / access
   • Vocabulary assessment
2. Link Discovery
   • Evaluation of generated links
3. Deployment
   • Virtuoso OpenSource

Page 100: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Converting Logainm to RDF

• ~100,000 place names
• ~1.3M triples

[Diagram: http://data.logainm.ie/1375542 has foaf:name "Dublin" and owl:sameAs http://sws.geonames.org/2964574/]

Page 101: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Link Discovery

• Silk
  – http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
• LIMES
  – http://aksw.org/Projects/LIMES.html
• Based on specifying rules that compare pairs of entities

Page 102: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Rules to discover links to other datasets

• Rules based on:
  – Place names
  – Geographical coordinates
  – Name of the county / parent place name
  – Hierarchy of places
• # entities matched:
  – DBpedia: 1,552
  – LinkedGeoData: 6,611
  – GeoNames: 8,229

Page 103: TPDL2013 tutorial linked data for digital libraries 2013-10-22

<marc:datafield tag="650" ind1="" ind2="">
  <marc:subfield code="a">Land tenure</marc:subfield>
  <marc:subfield code="z">Ireland</marc:subfield>
  <marc:subfield code="z">Rathdown (Barony)</marc:subfield>
</marc:datafield>

<marc:datafield tag="650" ind1="" ind2="">
  <marc:subfield code="a">Land use surveys</marc:subfield>
  <marc:subfield code="z">Ireland</marc:subfield>
  <marc:subfield code="z">Wicklow (County)</marc:subfield>
</marc:datafield>

<marc:datafield tag="651" ind2="7" ind1="">
  <marc:subfield code="2">logainm.ie</marc:subfield>
  <marc:subfield code="a">Rathdown</marc:subfield>
  <marc:subfield code="0">http://data.logainm.ie/place/283</marc:subfield>
</marc:datafield>

Longfield Map example

Page 104: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Demo: Location LODer http://apps.dri.ie/locationLODer/locationLODer

Page 105: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Hands-on Activities

11:50 – 12:25

Choice of Activities…

• Data Modelling
• Data Cleaning & Structuring
• Querying (SPARQL)

Page 106: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Uldis Bojars, Nuno Lopes, & Jodi Schneider

Semantic Web for Digital Libraries

Open Refine Exercise

Page 107: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Open Refine

• Useful for batch transformation of large amounts of data
  – data cleanup (misspellings, splitting multiple-valued columns, …)
• Linking to other databases
  – Freebase
  – Any SPARQL-enabled LD
• Website: http://openrefine.org/
• RDF extension: http://refine.deri.ie/

Page 109: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Task 1 - Data Cleanup

1. Import the collection into OpenRefine
2. Get to know your data
3. Remove blank rows
4. Remove duplicate rows
5. Split cells with multiple values
6. Remove blank cells
7. Cluster values
8. Remove double category values
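In the exercise these steps are carried out interactively in OpenRefine. For readers who want to see the logic behind steps 3 to 5, here is a rough Python sketch of the same operations on a list of rows. It does not use or imitate OpenRefine's own API; the function name, the `;` separator, and the sample rows are all assumptions for illustration.

```python
def clean_rows(rows, multi_valued_column, separator=";"):
    """Drop blank and duplicate rows, then split one multi-valued column
    into one row per value."""
    seen = set()
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if not any(stripped.values()):       # step 3: remove blank rows
            continue
        fingerprint = tuple(sorted(stripped.items()))
        if fingerprint in seen:              # step 4: remove duplicate rows
            continue
        seen.add(fingerprint)
        values = [v.strip() for v in stripped[multi_valued_column].split(separator) if v.strip()]
        for value in values or [""]:         # step 5: split cells with multiple values
            cleaned.append({**stripped, multi_valued_column: value})
    return cleaned


# Hypothetical collection data.
rows = [
    {"title": "Map of Rathdown", "category": "Land tenure; Surveys"},
    {"title": "", "category": ""},
    {"title": "Map of Rathdown", "category": "Land tenure; Surveys"},
    {"title": "Map of Wicklow ", "category": "Land use surveys"},
]

for row in clean_rows(rows, "category"):
    print(row)
```

Clustering (step 7) is deliberately left out: it involves fuzzy matching of near-identical values and human review of the proposed merges, which is where an interactive tool is most useful.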

Page 110: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Task 2 - Data Reconciliation & RDF Export

1. Pick a column to reconcile
2. Pick a vocabulary to reconcile with
3. Tell OpenRefine about the vocabulary
4. Start the reconciliation process
5. Understanding the reconciliation results
6. Interpreting the new reconciliation results
7. Exporting RDF

Page 111: TPDL2013 tutorial linked data for digital libraries 2013-10-22

Uldis Bojars, Nuno Lopes, & Jodi Schneider

Semantic Web for Digital Libraries

SPARQL Hands-on Session

Page 112: TPDL2013 tutorial linked data for digital libraries 2013-10-22

SPARQL

• Query Language for RDF data
• W3C Standard
• Components of a SPARQL Query:
  – Prefix Declarations
  – Result type (SELECT, CONSTRUCT, DESCRIBE, ASK)
  – Dataset
  – Query pattern
  – Solution modifiers

Page 114: TPDL2013 tutorial linked data for digital libraries 2013-10-22

SPARQL by example – Europeana Endpoint

Endpoint: http://europeana.ontotext.com/sparql

1. SPARQL Select template
2. List of data providers having contributed content to Europeana
3. List of provided objects with their aggregators
4. 18th century Europeana objects from France
5. Write your own
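As a starting point for "write your own", the sketch below shows how a query could be sent to the endpoint from a script instead of the web form. It follows the SPARQL Protocol convention of passing the query text in a `query` parameter of an HTTP GET. The query is a generic template written for this illustration (it is not one of the exercise queries), the function name is made up, and the endpoint from the slide may no longer be online, so the code only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://europeana.ontotext.com/sparql"  # from the slide; may no longer be online

# Generic template; comments mark the query components named in the tutorial.
SELECT_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   # prefix declaration
SELECT DISTINCT ?type                                        # result type
WHERE { ?resource rdf:type ?type . }                         # query pattern
LIMIT 10                                                     # solution modifier
"""


def build_sparql_request(endpoint, query):
    """Encode the query as a GET request and ask for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, SELECT_TEMPLATE)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. No dataset clause (`FROM`) is given here, so the endpoint's default dataset would be queried.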