cigs lod docext_kb_20131118


Uploaded by cigscotland on 07-Dec-2014


DESCRIPTION

Can documents be Linked Data? / Kate Byrne, School of Informatics, University of Edinburgh, CIGS LOD Workshop. Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013

TRANSCRIPT

Page 1: Cigs lod docext_kb_20131118

vision txt2rdf grounding

Can Documents be Linked Data?

Kate Byrne, School of Informatics, University of Edinburgh

CIGS LOD Workshop

18th November 2013

Page 2

1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies

Page 3

The semantic web vision

W3C RDF Concepts, 2002 draft

“RDF ... allows anyone to say anything about anything.”

Tim Berners-Lee, 2006

“The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition.”

Tim Berners-Lee, 2009

“The web as I envisaged it, we have not seen it yet.”

Page 10

Simple declarative sentences

“In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.”

hobbit − lives in − hole

hole − located in − the ground

hole − does not have − nastiness

hole − has type − hobbit hole

hobbit hole − has characteristic − comfort
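As a minimal sketch (not the author's code), the hobbit-hole facts can be held as plain (subject, predicate, object) tuples and queried by pattern. The tuple encoding and the helper names `objects_of` and `describe` are hypothetical; only the node and edge labels come from the slide.

```python
# Sketch: the hobbit-hole facts as (subject, predicate, object) tuples.
# Labels come from the slide; the tuple encoding and helpers are illustrative.
from typing import List, Tuple

Triple = Tuple[str, str, str]

FACTS: List[Triple] = [
    ("hobbit", "lives in", "hole"),
    ("hole", "located in", "the ground"),
    ("hole", "does not have", "nastiness"),
    ("hole", "has type", "hobbit hole"),
    ("hobbit hole", "has characteristic", "comfort"),
]


def objects_of(subject: str, predicate: str, facts: List[Triple]) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def describe(subject: str, facts: List[Triple]) -> List[str]:
    """Read each triple about `subject` back as a crude English clause."""
    return [f"{s} {p} {o}" for s, p, o in facts if s == subject]


if __name__ == "__main__":
    print(objects_of("hole", "located in", FACTS))  # ['the ground']
    for clause in describe("hole", FACTS):
        print(clause)
```

Reading the triples back as clauses shows how much of the sentence survives: the facts do, the style does not.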

Page 11

A lot of information is in textual form!

Page 20

Nouns and verbs

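subject − predicate − object

hobbit − lives in − hole

hole − located in − the ground

hole − does not have − nastiness

hole − has type − hobbit hole

hobbit hole − has characteristic − comfort

nouns: the nodes (subjects and objects)

verbs: the edges (predicates)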
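The noun/verb split suggests a simple naming rule when moving towards RDF: noun phrases become node IRIs and verb phrases become property IRIs. This is a sketch under stated assumptions: the `http://example.org/hobbit/` namespace, the underscore and camelCase naming conventions, and the function names are all invented here.

```python
# Sketch: nouns become node IRIs, verb phrases become property IRIs.
# EX is a hypothetical namespace invented for this example.
import re
from typing import Tuple

EX = "http://example.org/hobbit/"


def noun_to_iri(noun: str) -> str:
    """'the ground' -> EX + 'ground': drop a leading article, join words with '_'."""
    words = re.sub(r"^(the|a|an)\s+", "", noun.strip().lower()).split()
    return EX + "_".join(words)


def verb_to_iri(verb: str) -> str:
    """'lives in' -> EX + 'livesIn': camelCase the verb phrase."""
    first, *rest = verb.strip().lower().split()
    return EX + first + "".join(w.capitalize() for w in rest)


def to_iri_triple(triple: Tuple[str, str, str]) -> Tuple[str, str, str]:
    """Map a (noun, verb phrase, noun) triple onto (node, property, node) IRIs."""
    s, p, o = triple
    return noun_to_iri(s), verb_to_iri(p), noun_to_iri(o)


if __name__ == "__main__":
    print(to_iri_triple(("hole", "located in", "the ground")))
```

Note that plain RDF has no way to state a negative fact, so a property named `doesNotHave` is only a name: software will not treat it as the opposite of anything.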

Page 21

1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies

Page 22

Extracting structured knowledge from free text

fancy NLP processing and RDFisation

Page 23

Natural Language Processing pipeline

Text documents

→ Pre-processing: sentence and paragraph split; tokenise; POS tag; multi-word tokens and features

→ Named Entity Recognition: trained NER model, giving a list of NEs and classes

→ Relation Extraction: trained RE model applied to a set of NE pairs and features, giving a list of relations and classes

→ RDF translation: remove unwanted relations; generate triples; attach site ids

→ Graph of triples
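The data flow through these stages can be mimicked with a toy, rule-based stand-in. This is not the system described in the talk: a small gazetteer and a year pattern replace the trained NER model, a table of entity-class pairs replaces the trained RE model, and tokenising, POS tagging and feature extraction are left out. All names (`GAZETTEER`, `RELATION_RULES`, `pipeline`, the `excavationX` node name) are hypothetical.

```python
# Toy stand-in for the pipeline: a gazetteer and a year pattern replace the
# trained NER model, and a rule table replaces the trained RE model. Tokenising,
# POS tagging and feature extraction are omitted. All names are hypothetical.
import re
from itertools import combinations
from typing import Dict, List, Tuple

GAZETTEER: Dict[str, str] = {"sands of breckon": "SITENAME", "excavation": "EVENT"}
YEAR = re.compile(r"\b(?:1[89]|20)\d\d\b")
# (class of first entity, class of second entity) -> relation label
RELATION_RULES: Dict[Tuple[str, str], str] = {
    ("EVENT", "SITENAME"): "hasLocation",
    ("EVENT", "DATE"): "hasDate",
}


def split_sentences(text: str) -> List[str]:
    """Pre-processing: naive split after sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognise_entities(sentence: str) -> List[Tuple[str, str]]:
    """NER: (entity, class) pairs in the order they appear in the sentence."""
    found = []
    lower = sentence.lower()
    for phrase, cls in GAZETTEER.items():
        start = lower.find(phrase)  # first occurrence only, for brevity
        if start >= 0:
            found.append((start, phrase, cls))
    found.extend((m.start(), m.group(), "DATE") for m in YEAR.finditer(sentence))
    return [(text, cls) for _, text, cls in sorted(found)]


def extract_relations(entities: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """RE: label each ordered entity pair; pairs with no rule are discarded."""
    relations = []
    for (a, ca), (b, cb) in combinations(entities, 2):
        label = RELATION_RULES.get((ca, cb))
        if label:
            relations.append((a, label, b))
    return relations


def to_triples(relations: List[Tuple[str, str, str]], site_id: str) -> List[Tuple[str, str, str]]:
    """RDF translation: attach the site id and emit subject-predicate-object triples."""
    triples = set()
    for subj, label, obj in relations:
        event = subj + "X"  # hypothetical local name for the event node
        triples.add((site_id, "hasEvent", event))
        triples.add((event, label, obj))
    return sorted(triples)


def pipeline(text: str, site_id: str) -> List[Tuple[str, str, str]]:
    """Run every stage over a document and return its sorted triples."""
    relations = []
    for sentence in split_sentences(text):
        relations.extend(extract_relations(recognise_entities(sentence)))
    return to_triples(relations, site_id)


if __name__ == "__main__":
    record = ("Field survey and excavation, as a response to continual wind and marine "
              "erosion, was carried out at the Sands of Breckon between 1982 and 1983.")
    for triple in pipeline(record, "site20"):
        print(triple)
```

Discarding class pairs with no rule plays the part of "remove unwanted relations"; a trained RE model would instead score each pair from its features.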

Page 24

Named entities and relations

Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972. Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002.

site 20
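Some entity classes in records like this one have regular surface forms, so a few regular expressions can already find them. This rule-based matcher is swapped in purely for illustration; the class names and patterns are assumptions written for this excerpt, not the trained NER model used in the pipeline.

```python
# Rule-based stand-in for NER on the record above. The classes and patterns are
# hypothetical, written for this excerpt only; a trained model generalises better.
import re
from typing import Dict, List

PATTERNS: Dict[str, re.Pattern] = {
    # map-sheet style references such as "HP50NW 11.00" or "NS97SE 16"
    "MAPREF": re.compile(r"\b[A-Z]{2}\d{2}[NS][EW] \d+(?:\.\d+)?\b"),
    "DATE": re.compile(r"\b1[89]\d\d\b"),
    "SITEREF": re.compile(r"\bSite \d+\b"),
    "PERIOD": re.compile(r"\b(?:[Ee]arly )?(?:Bronze|Iron) Age\b"),
}


def find_entities(text: str) -> Dict[str, List[str]]:
    """Return every match for each entity class, in document order."""
    return {label: [m.group() for m in pattern.finditer(text)]
            for label, pattern in PATTERNS.items()}


if __name__ == "__main__":
    excerpt = ("HP50NW 11.00 was recorded as a stone settings surrounded by "
               "occupational debris (Site 22).")
    print(find_entities(excerpt))
```

Patterns like these are brittle (a reworded date or period slips through), which is one reason the pipeline relies on a trained model instead.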

Page 26

Converting text relations to RDF – 1

site 20

site20 − hasEvent − excavationX

excavationX − hasLocation − SandsOfBreckon

excavationX − hasDate − 1982

Page 27

Converting text relations to RDF – 2

site20 − hasEvent − excavationX

excavationX − hasLocation − SandsOfBreckon

excavationX − hasDate − 1982

RDF graph nodes: siteid:site20, sitetype:stone+settings20w179, event:excavation20w158, event:excavation, date:1982, sitename:sands+of+breckon, address:hp50nw+11.00, address:breckon

RDF graph properties: :hasEvent, :hasLocation, :hasPeriod, :hasClassn, rdf:type
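To make the step from informal relations to RDF concrete, the sketch below writes the site20 triples out as N-Triples using only the standard library. Every namespace IRI except `rdf:` is a hypothetical placeholder, and mapping `excavationX` to `event:excavation20w158` (and `SandsOfBreckon`, `1982` to the corresponding nodes) is an assumption, as is the `rdf:type` link to `event:excavation`.

```python
# Sketch: serialise the site20 triples as N-Triples using only the standard library.
# Every namespace IRI below except rdf: is a hypothetical placeholder; mapping
# excavationX to event:excavation20w158 (and so on) is an assumption.
from typing import Dict, List, Tuple

PREFIXES: Dict[str, str] = {
    "": "http://example.org/vocab/",
    "siteid": "http://example.org/siteid/",
    "event": "http://example.org/event/",
    "sitename": "http://example.org/sitename/",
    "date": "http://example.org/date/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

TRIPLES: List[Tuple[str, str, str]] = [
    ("siteid:site20", ":hasEvent", "event:excavation20w158"),
    ("event:excavation20w158", "rdf:type", "event:excavation"),
    ("event:excavation20w158", ":hasLocation", "sitename:sands+of+breckon"),
    ("event:excavation20w158", ":hasDate", "date:1982"),
]


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'siteid:site20' into a bracketed absolute IRI."""
    prefix, _, local = curie.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{expand(s)} {expand(p)} {expand(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Here the date is kept as a node, as in the graph above; a typed literal such as `"1982"^^<http://www.w3.org/2001/XMLSchema#gYear>` would be a common alternative.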

Page 28

1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies

Page 30

Let’s remind ourselves what’s the point of Linked Data

Archaeological site archive record:

siteid: 47919; site number: NS97SE 16; sitename: Cairnpapple; classification: Cairn, henge

description (excerpts): "A complex site on the summit of Cairnpapple Hill"; "excavated by Piggot in 1947..."

Museum database record:

objectid: X.EP 167; find spot: Cairnpapple

description: "This stone flake from the cutting edge of a ground stone axehead was found at Cairnpapple in West Lothian. The stone is from..."

RDF nodes: :Siteid#site47919, Classn/Sitetype#cairn%20+henge, Id#ns97se+16, :Loc/Sitename#cairnpapple, :Loc/Place#cairnpapple+hill, :Event#excavated47919w10, :Agent/Person#piggot, :Time/Date#1947, :Objectid#x.ep+167, :Classn/Objtype#axe+flake, :Loc/Place#west+lothian

RDF properties: :hasId, :hasClassn, :hasLocation, :hasEvent, :hasAgent, :hasPeriod, :hasFindSpot

Page 31

But linking Linked Data is actually pretty hard

A direct link means spotting an identical node in a separate graph.

How? String matching? Clues from context?
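String matching is the simplest answer to that question. The sketch below proposes links between two graphs by fuzzy-matching the labels encoded in node URIs with Python's `difflib`. The museum-side URI `:Findspot#Cairnpapple` is hypothetical, and the 0.8 threshold is an arbitrary choice for illustration.

```python
# Sketch: propose links between two graphs by fuzzy-matching the labels encoded in
# node URIs. The museum-side URI ':Findspot#Cairnpapple' is hypothetical, and the
# 0.8 threshold is an arbitrary choice for illustration.
from difflib import SequenceMatcher
from typing import List, Tuple
from urllib.parse import unquote_plus


def label_of(uri: str) -> str:
    """':Loc/Place#cairnpapple+hill' -> 'cairnpapple hill' ('+' and %20 become spaces)."""
    fragment = uri.rsplit("#", 1)[-1]
    return " ".join(unquote_plus(fragment).lower().split())


def candidate_links(left: List[str], right: List[str],
                    threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Score every cross-graph pair; keep pairs at or above the threshold, best first."""
    pairs = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, label_of(a), label_of(b)).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


SITE_ARCHIVE = [":Loc/Sitename#cairnpapple", ":Loc/Place#cairnpapple+hill", ":Agent/Person#piggot"]
MUSEUM = [":Findspot#Cairnpapple", ":Loc/Place#west+lothian", ":Classn/Objtype#axe+flake"]

if __name__ == "__main__":
    for link in candidate_links(SITE_ARCHIVE, MUSEUM):
        print(link)
```

Both site-archive nodes clear the threshold against the museum find spot: the sitename scores 1.0 and the hill about 0.81. The string score alone cannot say whether the hill and the find spot are the same thing, which is where clues from context would have to come in.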

Page 34

Using LOD cloud “Authority Nodes” as intermediaries

Grounding local URIs against "authority" nodes is the next big challenge!

Page 35

Grounding site20 against Monument Thesaurus

Local graph for site20: siteid:site20, sitetype:stone+settings20w179, event:excavation20w158, event:excavation, date:1982, sitename:sands+of+breckon, address:breckon, address:hp50nw+11.01+hp+5304+0519 (properties :hasClassn, :hasEvent, :hasLocation, :hasPeriod, rdf:type)

Monument Thesaurus concept sitetype:stone+setting:

rdfs:label "stone setting"

skos:scopeNote "An arrangement of two or more standing stones"

Neighbouring concepts (via skos:broader and skos:related): sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row

Class links shown: rdf:type, rdfs:subClassOf, with the bare node sitetype:
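One simple way to ground the local node `sitetype:stone+settings20w179` against the thesaurus concept labelled "stone setting" is to normalise the local name and compare it with concept labels. This is a sketch, not the method used in the talk: the in-memory thesaurus extract copies the labels and scope note from the slide, but which neighbour is broader and which related is assumed, as is the reading of the `20w179` suffix as a site/word-position tag.

```python
# Sketch: ground a locally minted site-type node against a thesaurus concept by
# normalising its name and comparing it with concept labels. The thesaurus extract
# copies labels and the scope note from the slide; which neighbour is "broader"
# and which "related" is assumed, as is the meaning of the "20w179" suffix.
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    label: str
    scope_note: str = ""
    broader: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


THESAURUS: Dict[str, Concept] = {
    "sitetype:stone+setting": Concept(
        "stone setting",
        "An arrangement of two or more standing stones",
        broader=["sitetype:religious+ritual+and+funerary"],
        related=["sitetype:standing+stone", "sitetype:stone+circle", "sitetype:stone+row"],
    ),
    "sitetype:standing+stone": Concept("standing stone"),
    "sitetype:stone+circle": Concept("stone circle"),
    "sitetype:stone+row": Concept("stone row"),
    "sitetype:religious+ritual+and+funerary": Concept("religious ritual and funerary"),
}

TOKEN_SUFFIX = re.compile(r"\d+w\d+$")  # assumed: a site/word-position tag such as "20w179"


def normalise(node: str) -> str:
    """'sitetype:stone+settings20w179' -> 'stone setting'."""
    name = TOKEN_SUFFIX.sub("", node.split(":", 1)[-1]).replace("+", " ").strip().lower()
    # Crude singular form: drop one trailing 's' unless the word ends in 'ss'.
    if name.endswith("s") and not name.endswith("ss"):
        name = name[:-1]
    return name


def ground(node: str) -> Optional[str]:
    """Return the thesaurus concept whose label equals the normalised node name."""
    target = normalise(node)
    for concept_id, concept in THESAURUS.items():
        if concept.label == target:
            return concept_id
    return None


if __name__ == "__main__":
    concept_id = ground("sitetype:stone+settings20w179")
    print(concept_id, THESAURUS[concept_id].broader if concept_id else None)
```

Once grounded, the local node inherits the authority's scope note and its broader and related concepts, which is what makes the authority node useful as an intermediary between graphs.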

Page 37

Grounding against various authorities/ontologies

Placename authorities: Geonames, OS gazetteer, Pleiades

Period: EH draft ontology

Monument classifications: Seneschal project

Bibliographic: LCSH, FRBR

...hundreds of LOD datasets in the cloud

Informatics projects

Edina “Unlock” service – spatial and temporal grounding

GAP projects – grounding against maps of the ancient world

Page 39

Unlock Text – find placenames and plot on map

http://unlock.edina.ac.uk/

Page 40

GapVis interface

http://nrabinowitz.github.com/gapvis/

Page 41

Questions?
