data-mining the semantic web

Data-mining the Semantic Web and spatially visualising the results Data Visualization for the Arts and Humanities Queen’s University Belfast 5-6 March 2015

Upload: frank-lynam

Post on 20-Jul-2015

Data-mining the Semantic Web and spatially visualising the results

Data Visualization for the Arts and Humanities

Queen’s University Belfast, 5-6 March 2015

@flynam @bilusaurus

Workshop overview

• Day 1: Data-mining

– Open Data

– Linked Data

– Linked Open Data implementation

– Semantic Web and ontologies

– Hands-on practicals

Workshop overview

• Day 2: Data visualisation

– Data visualisation concepts introduction

– Web maps and geo-tagging

– Hands-on practical

– Interpretations

– Hermeneutic circle

From the horse’s mouth

(source: www.ted.com/talks/tim_berners_lee_on_the_next_web)

Terminology

• Open Access

• Open Data

• Big Data

• The web of data

• The Semantic Web

• Linked Data

• data mining

Terminology

Asking questions of digital datasets

Terminology

Open Access

Design by Julie Beck for the Harvard University Neuroinformatics dept (source: www.juliebcreative.com/portfolio/open-data-logo/)

http://linkedarc.net/surveys/arch-datasharing

Terminology

Linked Data

The linkages between the major Linked Data datasets (source: lod-cloud.net)

Terminology

Big Data

Wordle of terms associated with Big Data activity (source: sfdata.startupweekend.org)

5 Stars of Open Data

put your data online under an open license

make it structured (e.g. as an Excel file)

use non-proprietary formats (e.g. XML and not Excel)

use URIs to identify resources

link your data to external datasets

The RDF Triple

A Triple Example

‘…the boy’s name is Tom…’

subject

predicate

object
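The sentence on this slide can be written down as one subject-predicate-object statement. A minimal Python sketch, assuming illustrative identifiers (`ex:boy` and `foaf:name` are not taken from the slides):

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# '...the boy's name is Tom...' as a single statement.
# "ex:boy" and "foaf:name" are illustrative identifiers, not from the slides.
name_triple = Triple(subject="ex:boy", predicate="foaf:name", obj="Tom")

print(name_triple.subject, name_triple.predicate, name_triple.obj)
```

Everything in an RDF dataset reduces to statements of this shape, which is why the later serialisations all carry the same three parts.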

Triple Linking

‘…Tom is short for Thomas…’

subject

predicate

object
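Linking works because the object of one statement can be the subject of the next. A minimal sketch of following such a chain, assuming hypothetical identifiers throughout (`ex:name` and `ex:shortFor` are made-up predicates):

```python
# Each statement is a (subject, predicate, object) tuple.
# All identifiers are hypothetical; "ex:shortFor" is a made-up predicate.
triples = [
    ("ex:boy", "ex:name", "ex:Tom"),
    ("ex:Tom", "ex:shortFor", "ex:Thomas"),
]


def follow(start, predicates, statements):
    """Walk from `start` along `predicates`, one hop per predicate."""
    node = start
    for predicate in predicates:
        matches = [o for s, p, o in statements if s == node and p == predicate]
        if not matches:
            return None  # the chain breaks: no statement links onward
        node = matches[0]
    return node


full_name = follow("ex:boy", ["ex:name", "ex:shortFor"], triples)
print(full_name)
```

The second statement is only reachable because `ex:Tom` appears in both triples; that shared node is the link.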

Graph data

Serialising RDF

• Turtle

• JSON

• RDF/XML

• N-Triples

RDF Turtle

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<green-goblin>
    rel:enemyOf <spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<spiderman>
    rel:enemyOf <green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

As N-Triples

<http://example.org/green-goblin> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/spiderman> .
<http://example.org/green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/green-goblin> <http://xmlns.com/foaf/0.1/name> "Green Goblin" .
<http://example.org/spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/green-goblin> .
<http://example.org/spiderman> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "\u0427\u0435\u043B\u043E\u0432\u0435\u043A-\u043F\u0430\u0443\u043A"@ru .
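Because every N-Triples line is a complete statement with full URIs, the format can be read one line at a time. The following is a rough sketch only: it assumes objects are either URIs or plain/language-tagged literals and ignores blank nodes, datatyped literals and escapes other than `\uXXXX`. The function names are illustrative.

```python
import re

# Matches one N-Triples statement whose object is a URI or a plain /
# language-tagged literal. Blank nodes and datatyped literals are not handled.
LINE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<uri>[^>]*)>|"(?P<lit>(?:[^"\\]|\\.)*)"(?:@(?P<lang>[A-Za-z-]+))?)'
    r'\s*\.\s*$'
)


def decode_escapes(text):
    """Replace backslash-u escapes (four hex digits) with their characters."""
    return re.sub(r'\\u([0-9A-Fa-f]{4})', lambda m: chr(int(m.group(1), 16)), text)


def parse_line(line):
    """Return (subject, predicate, object, lang) or None if unrecognised."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    if match.group("uri") is not None:
        return match.group("s"), match.group("p"), match.group("uri"), None
    return (match.group("s"), match.group("p"),
            decode_escapes(match.group("lit")), match.group("lang"))


example = ('<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> '
           '"\\u0427\\u0435\\u043B\\u043E\\u0432\\u0435\\u043A-'
           '\\u043F\\u0430\\u0443\\u043A"@ru .')
print(parse_line(example))
```

For real data a dedicated RDF library is the safer choice; the sketch is only meant to show how little structure the format has.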

As JSON

{
  "http://example.org/green-goblin": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [
      {"type": "uri", "value": "http://example.org/spiderman"}
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Green Goblin"}
    ]
  },
  "http://example.org/spiderman": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [
      {"type": "uri", "value": "http://example.org/green-goblin"}
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal", "value": "\u0427\u0435\u043b\u043e\u0432\u0435\u043a-\u043f\u0430\u0443\u043a", "lang": "ru"}
    ]
  }
}
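The JSON form nests the same statements as subject, then predicate, then a list of objects. A minimal sketch of flattening that nesting back into triples, using a cut-down copy of the data above (the function name `to_triples` is illustrative):

```python
import json

# A cut-down copy of the structure above: subject -> predicate -> objects.
document = json.loads("""
{
  "http://example.org/spiderman": {
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal",
       "value": "\\u0427\\u0435\\u043b\\u043e\\u0432\\u0435\\u043a-\\u043f\\u0430\\u0443\\u043a",
       "lang": "ru"}
    ]
  }
}
""")


def to_triples(rdf_json):
    """Flatten the nested subject/predicate/object structure into tuples."""
    triples = []
    for subject, predicates in rdf_json.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                triples.append((subject, predicate, obj["value"], obj.get("lang")))
    return triples


for triple in to_triples(document):
    print(triple)
```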

As RDF/XML

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:ns0="http://www.perceive.net/schemas/relationship/">

  <foaf:Person rdf:about="http://example.org/green-goblin">
    <ns0:enemyOf>
      <foaf:Person rdf:about="http://example.org/spiderman">
        <ns0:enemyOf rdf:resource="http://example.org/green-goblin"/>
        <foaf:name>Spiderman</foaf:name>
        <foaf:name xml:lang="ru">Человек-паук</foaf:name>
      </foaf:Person>
    </ns0:enemyOf>
    <foaf:name>Green Goblin</foaf:name>
  </foaf:Person>

</rdf:RDF>

Visualised as a Graph

Triplestores and Infrastructure

Practical: Making RDF

http://www.franklynam.com/blog.aspx?id=85

Q: Create RDF representations of yourself and your relationships

The Semantic Web and Ontologies

The stages of the Web (source: urenio.org)

Ontological Classes and Properties

The British Museum data mapping onto the CIDOC CRM (source: confluence.ontotext.com/display/ResearchSpace/BM+Mapping)

The CIDOC CRM basic entity types and their relationships (source: www.cidoc-crm.org/)

Vocabularies

Graph data

Minna Sundberg (source: www.sssscomic.com/comic.php?page=196)

Querying using SPARQL

SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
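A query like this is usually sent to an endpoint over HTTP, with the query text in a `query` parameter as the SPARQL Protocol describes. The sketch below only prepares such a request and does not send it; the Getty endpoint is the one listed for a later practical in these slides, and `build_request` is an illustrative name.

```python
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """
SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
"""


def build_request(endpoint, query):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("http://vocab.getty.edu/sparql", QUERY)
print(request.full_url)
# To run it for real: urllib.request.urlopen(request).read()
```

Whether a given endpoint honours the `Accept` header or expects a different parameter is something to check against that endpoint's own documentation.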

More complex SPARQL

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX letters1916: <http://letters1916.linkedarc.net/ontology/>
PREFIX letters1916data: <http://letters1916.linkedarc.net/data/>
PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?letter ?letterName ?recipientPostalAddressName ?recipientLongitude ?recipientLatitude
WHERE {
  ?letter rdf:type letters1916:Letter ;
          schema:name ?letterName ;
          letters1916:recipientLocation ?recipientPostalAddress .

  ?recipientPostalAddress schema:addressRegion ?recipientPostalAddressRegion .
  FILTER regex(?recipientPostalAddressRegion, 'Galway', 'i')
  ?recipientPostalAddress schema:name ?recipientPostalAddressName .

  ?recipientPlace schema:address ?recipientPostalAddress ;
                  schema:geo ?recipientGeoCoordinates .

  ?recipientGeoCoordinates schema:longitude ?recipientLongitude ;
                           schema:latitude ?recipientLatitude
}
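The longitude and latitude variables selected by this query are what a map needs. A minimal sketch of turning a SPARQL JSON result set into coordinate records, assuming the endpoint returns the W3C SPARQL JSON results format; the sample response, including the letter name and coordinates, is invented for illustration.

```python
import json

# Hypothetical response in the W3C SPARQL JSON results format; the letter
# name and coordinates are invented for illustration.
SAMPLE = """
{
  "head": {"vars": ["letterName", "recipientLongitude", "recipientLatitude"]},
  "results": {"bindings": [
    {"letterName": {"type": "literal", "value": "Example letter"},
     "recipientLongitude": {"type": "literal", "value": "-9.05"},
     "recipientLatitude": {"type": "literal", "value": "53.27"}}
  ]}
}
"""


def to_points(results_json):
    """Turn each result row into a (name, longitude, latitude) tuple."""
    points = []
    for row in json.loads(results_json)["results"]["bindings"]:
        points.append((
            row["letterName"]["value"],
            float(row["recipientLongitude"]["value"]),
            float(row["recipientLatitude"]["value"]),
        ))
    return points


points = to_points(SAMPLE)
print(points)
```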

Practical: Universities on DBpedia

http://www.franklynam.com/blog.aspx?id=86

Q: Get a list of all of the universities that DBpedia knows about

SKOS

@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://linkedarc.net/vocabs/vessel-jar> a skos:Concept ;
    cc:license <http://creativecommons.org/licenses/by/3.0> ;
    cc:attributionURL <http://linkedarc.net> ;
    cc:attributionName "linkedarc.net" ;
    skos:inScheme <http://linkedarc.net/vocabs> ;
    skos:prefLabel "Jar" ;
    skos:scopeNote "A jar concept. Pottery. This isn’t a great scope note." ;
    dct:publisher <http://linkedarc.net> ;
    dct:identifier <http://linkedarc.net/vocabs/vessel-jar> ;
    dct:issued "2015-02-23"^^xsd:date ;
    skos:exactMatch <http://purl.org/heritagedata/schemes/mda_obj/concepts/97609> .

SPARQL + FILTER

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en")
}

SPARQL + FILTER

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en") .
  FILTER regex(?label, "bell", "i")
}
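Two FILTERs in the same group must both hold for a row to survive. The effect can be imitated in Python over a handful of hypothetical labels; `lang_matches` is a rough stand-in for `langMatches` with a simple range such as "en", not a full implementation.

```python
import re

# Hypothetical (label, language tag) pairs standing in for rdfs:label values.
labels = [
    ("Bell tower", "en"),
    ("Church bell", "en-GB"),
    ("Glocke", "de"),
    ("Drum", "en"),
]


def lang_matches(tag, lang_range):
    """Rough equivalent of langMatches for a simple range such as "en"."""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


# Both conditions must hold, just as two FILTERs in one group are combined.
matches = [
    label for label, tag in labels
    if lang_matches(tag, "en") and re.search("bell", label, re.IGNORECASE)
]
print(matches)
```

Note that the "i" flag in the SPARQL regex is what makes "Bell tower" match the lower-case pattern.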

SPARQL + FILTER

PREFIX dct: <http://purl.org/dc/terms/>

SELECT * WHERE {
  ?s dct:dateCreated ?dateCreated .
  FILTER (?dateCreated > '1900-01-01')
}

Practical: Getty Concepts

Q: Get all of the Getty URIs that represent concepts related to amphorae

SPARQL endpoint: http://vocab.getty.edu/sparql

Practical: British Museum Sarcophagi

Q: Get the find spots of all of the sarcophagi in the British Museum collection

SPARQL endpoint: http://collection.britishmuseum.org/sparql

Geo-coding the Find Spots with Google Refine

The Google Maps API

Address String

Geo-coordinates as JSON
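The step from address string to coordinates ends with a JSON response that has to be unpacked. A minimal sketch of that unpacking, assuming the response shape of the Google Geocoding API (a `status` field and a `results` list whose items hold `geometry.location.lat` and `lng`); the sample response and its coordinates are hypothetical, and the slides do the fetching in Google Refine rather than in Python.

```python
import json

# Trimmed, hypothetical response in the shape returned by the Google
# Geocoding API; the address and coordinates are illustrative only.
SAMPLE_RESPONSE = """
{
  "results": [
    {"formatted_address": "Example find spot",
     "geometry": {"location": {"lat": 37.94, "lng": 27.34}}}
  ],
  "status": "OK"
}
"""


def extract_coordinates(response_text):
    """Return (lat, lng) from the first result, or None if nothing matched."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


coordinates = extract_coordinates(SAMPLE_RESPONSE)
print(coordinates)
```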

Export as CSV
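Outside Google Refine, the same export can be sketched with Python's standard `csv` module. The rows and column names below are hypothetical placeholders for geocoded find spots.

```python
import csv
import io

# Hypothetical geocoded rows: (find spot, latitude, longitude).
rows = [
    ("Example find spot A", 37.94, 27.34),
    ("Example find spot B", 31.2, 29.92),
]


def to_csv(records):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["find_spot", "latitude", "longitude"])
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```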

Practical: data.cso.ie

Q: Get the employment figures generated by the 2011 Irish census by region

SPARQL endpoint: http://data.cso.ie/query.html

Practical: Nomisma and Ancient Coins

Q: Get the geo-coordinates of all of the coin hoards stored in the Nomisma triplestore

SPARQL endpoint: http://nomisma.org/sparql

Additional Linked Data Resources

http://www.franklynam.com/blog.aspx?id=89

Thank you!

Martin Lemay (source: twitter.com/martinlemay)