Linked Data for Digital Humanities Victor de Boer

Upload: victor-de-boer

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Madrid Linked Data for Digital Humanities

Linked Data for Digital Humanities

Victor de Boer

Page 2: Madrid Linked Data for Digital Humanities

About me

Victor de Boer
Assistant professor
Web & Media Group, Network Institute
VU University Amsterdam

Semantic Technologies, Linked Data

Cultural Heritage
Digital History

Linked Data for Development

Page 3: Madrid Linked Data for Digital Humanities

What is Linked Data

Page 4: Madrid Linked Data for Digital Humanities

http://info.cern.ch/Proposal.html

Tim Berners-Lee (The inventor of the Web)

Page 5: Madrid Linked Data for Digital Humanities

Web of Documents (WWW)
Linked Documents

Page 6: Madrid Linked Data for Digital Humanities

From text to data > increased semantics

Page 7: Madrid Linked Data for Digital Humanities

More and more structured data available online

Governments

Social web data

Medical data

Museums

Research data


Page 8: Madrid Linked Data for Digital Humanities

Web of Documents vs Web of Data
People are often not interested in documents; they are interested in things (information)

Humans are very good at reading (web) documents and distilling information

Computers are good at calculating, combining and filtering information. But they are very bad at reading documents

We need to help machines understand web data
Write it down in a way that they can understand

LINKED DATA!!

Page 9: Madrid Linked Data for Digital Humanities

Web of Documents (WWW)
Linked Documents

Page 10: Madrid Linked Data for Digital Humanities

Web of Data
Linked Data

Page 11: Madrid Linked Data for Digital Humanities

http://info.cern.ch/Proposal.html

Tim Berners-Lee (The inventor of the Web)
And the Semantic Web

Page 12: Madrid Linked Data for Digital Humanities

What is Linked Open Data?

Page 13: Madrid Linked Data for Digital Humanities

Open Data is about licenses to allow reuse

Linked Data is about technology for interoperability

Page 14: Madrid Linked Data for Digital Humanities

www.w3.org/designissues/linkeddata.html

Page 15: Madrid Linked Data for Digital Humanities

http://lod-cloud.net/

Page 16: Madrid Linked Data for Digital Humanities
Page 17: Madrid Linked Data for Digital Humanities
Page 18: Madrid Linked Data for Digital Humanities

Google knowledge graph

www.huffingtonpost.com

Page 19: Madrid Linked Data for Digital Humanities

OpenPhacts explorer

http://www.openphacts.org/

Page 20: Madrid Linked Data for Digital Humanities

How does all this work?
Data, not documents
Structured data
Graph (networked) data!
W3C Web standards stack

URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.

Page 21: Madrid Linked Data for Digital Humanities

Four rules of Linked Data

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF).

4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html

Page 22: Madrid Linked Data for Digital Humanities

Resource Description Framework (RDF)

Semantic Web standard for writing down data, information

(Subject, Relation, Object)

<Painting001, has_location, Amsterdam>

Painting001 --has_location--> Amsterdam
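
As a minimal sketch, the triple on this slide can be written down in Python and serialized as a single N-Triples line. The `http://example.org/` namespace and the `Triple` helper are assumptions made for illustration; the slide gives only the short names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: (subject, relation, object), all as full URIs."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, not from the slides


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


triple = Triple(EX + "Painting001", EX + "has_location", EX + "Amsterdam")
line = to_ntriples(triple)
print(line)
```

A set of such triples forms a graph: every subject and object is a node, every relation a labelled edge.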

Page 23: Madrid Linked Data for Digital Humanities
Page 24: Madrid Linked Data for Digital Humanities

Use HTTP URIs for Things
A Uniform Resource Identifier (URI) is a string of characters used to identify a resource

http://rijksmuseum.nl/data/schilderij1

I can go there (dereference) and then I get information about it

HTML page for humans
RDF data for machines
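
One common way to serve both views from the same URI is HTTP content negotiation: the client states in the `Accept` header which representation it wants. The sketch below only builds the two requests and does not send them. Whether the example URI from the slide resolves, and whether its server offers Turtle, are assumptions.

```python
from urllib.request import Request


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build (but do not send) a GET request for the human or machine view."""
    accept = "text/turtle" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


uri = "http://rijksmuseum.nl/data/schilderij1"  # example URI from the slide
for_humans = build_request(uri, want_rdf=False)
for_machines = build_request(uri, want_rdf=True)

print(for_humans.get_header("Accept"))
print(for_machines.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup (the dereferencing step).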

Page 25: Madrid Linked Data for Digital Humanities

Links
Link your data to other data

By establishing RDF triples that point to other people’s data
By reusing other people’s URIs
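
A small sketch of what such a link buys you: once a local name is declared `owl:sameAs` someone else's URI, facts published elsewhere can be read as facts about the local thing. The `http://example.org/` names are hypothetical, and `remote_data` is a hard-coded stand-in for data that would normally be fetched from the other publisher.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # hypothetical local namespace

local_data = {
    (EX + "Painting001", EX + "has_location", EX + "Amsterdam"),
    # The link: our own place name points at another publisher's URI.
    (EX + "Amsterdam", OWL_SAME_AS, "http://dbpedia.org/resource/Amsterdam"),
}

# Stand-in for triples published by someone else (normally fetched over HTTP).
remote_data = {
    ("http://dbpedia.org/resource/Amsterdam",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Netherlands"),
}


def same_as(uri, triples):
    """Return the URI plus every URI it is declared owl:sameAs."""
    return {uri} | {o for s, p, o in triples if s == uri and p == OWL_SAME_AS}


def describe(uri, triples):
    """Collect (relation, object) pairs known about a URI or its equivalents."""
    names = same_as(uri, triples)
    return sorted((p, o) for s, p, o in triples
                  if s in names and p != OWL_SAME_AS)


facts = describe(EX + "Amsterdam", local_data | remote_data)
print(facts)
```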

Page 27: Madrid Linked Data for Digital Humanities

Linked Data is “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” --Wikipedia

Page 28: Madrid Linked Data for Digital Humanities

Why Linked Data for E-science
Large amounts of data
Efficient analysis, data mining

Sharing data, information and knowledge between scientists

Across continents
Across disciplines

Page 29: Madrid Linked Data for Digital Humanities

But what about the humanities?

Page 30: Madrid Linked Data for Digital Humanities

Some examples

Page 31: Madrid Linked Data for Digital Humanities

MultimediaN E-Culture project (2006)

Museums have increasingly nice websites

But: most of them are driven by stand-alone collection databases

Data is isolated, both syntactically and semantically

If users can do cross-collection search, the individual collections become more valuable!

Semantic Search

Page 32: Madrid Linked Data for Digital Humanities

E-Culture data cloud

Page 33: Madrid Linked Data for Digital Humanities

Vocabulary alignment
“Easel-pieces”

RMA concept: “Schilderij” (Dutch: painting)
RMA is the thesaurus of the Rijksmuseum

AAT artefact type: “Easel Piece”, “Painting”
AAT is Getty’s Art & Architecture Thesaurus
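
A sketch of why alignment enables cross-collection search: a query for one vocabulary's concept is expanded with its aligned equivalents before the collections are searched. The identifiers (`rma:schilderij`, `aat:painting`), the use of `skos:exactMatch` for this particular pair, and the two tiny collections are all assumptions made for illustration.

```python
SKOS_EXACT_MATCH = "skos:exactMatch"

# Hypothetical alignment between the two thesauri.
alignments = [("rma:schilderij", SKOS_EXACT_MATCH, "aat:painting")]

# Hypothetical collections, each indexed with its own vocabulary.
collections = {
    "rijksmuseum": [("object-1", "rma:schilderij")],
    "other-museum": [("object-2", "aat:painting")],
}


def equivalents(concept):
    """Return the concept plus everything aligned to it (either direction)."""
    found = {concept}
    for s, p, o in alignments:
        if p == SKOS_EXACT_MATCH:
            if s == concept:
                found.add(o)
            if o == concept:
                found.add(s)
    return found


def cross_collection_search(concept):
    """Find objects in every collection tagged with an equivalent concept."""
    wanted = equivalents(concept)
    return sorted((name, obj)
                  for name, items in collections.items()
                  for obj, tag in items if tag in wanted)


hits = cross_collection_search("aat:painting")
print(hits)
```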

Page 34: Madrid Linked Data for Digital Humanities

http://e-culture.multimedian.nl/

Page 35: Madrid Linked Data for Digital Humanities

BiographyNet

Page 36: Madrid Linked Data for Digital Humanities

Linked Data for BiographyNet

Source text: “Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse…”
(English: “Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German…”)

Thorbecke

Biographical Description (source: NNBW)
Provenance Meta Data
Person Meta Data: “Thorbecke”
Biography Parts: the source text
Event: Birth, 1798

Biographical Description (source: Enrichment NLP Tool)
Person Meta Data
Event: Birth, Zwolle, 1798-01-14
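
To make the enrichment step concrete, here is a toy stand-in for the NLP tool: a single regular expression that pulls the birth date and place out of the example sentence (with the space in "werd in" restored). The project's real tool is not described on the slides; the pattern, the `extract_birth` name, and the output dictionary are assumptions.

```python
import re

MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5,
    "juni": 6, "juli": 7, "augustus": 8, "september": 9, "oktober": 10,
    "november": 11, "december": 12,
}

# Matches e.g. "in 1798 geboren op 14 januari in Zwolle".
BIRTH = re.compile(
    r"in (?P<year>\d{4}) geboren op (?P<day>\d{1,2}) "
    r"(?P<month>[a-z]+) in (?P<place>[A-Z]\w+)"
)


def extract_birth(text):
    """Return a birth event as a dict, or None if the pattern does not match."""
    m = BIRTH.search(text)
    if m is None or m.group("month") not in MONTHS:
        return None
    date = "{:04d}-{:02d}-{:02d}".format(
        int(m.group("year")), MONTHS[m.group("month")], int(m.group("day")))
    return {"type": "Birth", "date": date, "place": m.group("place")}


text = ("Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari "
        "in Zwolle en komt uit een half-Duitse")
event = extract_birth(text)
print(event)
```

Keeping the extracted event in a separate biographical description, as the slide does, means the original source and the machine-made enrichment stay distinguishable.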

Page 37: Madrid Linked Data for Digital Humanities
Page 38: Madrid Linked Data for Digital Humanities

Interface for historians (mockup)

Page 39: Madrid Linked Data for Digital Humanities

Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog (The Kingdom of the Netherlands in the Second World War)
History of German-occupied Dutch society (1940-1945)
Published 1969 - 1991 in 14 volumes, 30 parts, 18,000 pages

1. Digitization

2. Open Data

3. Enriched access with Linked Data

E-History: Verrijkt Koninkrijk

Page 40: Madrid Linked Data for Digital Humanities

Step 1: Lou de Jong’s “Het Koninkrijk” was digitized and made available in a reusable format

Step 2: Named Entity Recognition and consolidation of the back-of-the-book index provide structured vocabularies with links into the text

country, collection, doc-type, volume, chapter, section, sub-section, paragraph

Back-of-the-Book index Named Entities

Page 41: Madrid Linked Data for Digital Humanities

Verrijkt Koninkrijk

Step 3: Enrichment with Linked Data makes new ways of interaction and analysis possible

Back-of-the-Book index Named Entities

Page 42: Madrid Linked Data for Digital Humanities

niod:oai_wo2_niod_nl_rec_102045 dct:subject niod:Blitzkrieg

botb:Blitzkrieg skos:exactMatch niod:Blitzkrieg

http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
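
A sketch of how these links can be followed: starting from the back-of-the-book index term, hop over `skos:exactMatch` to the NIOD concept and collect the archive records that have it as `dct:subject`. The direction of the two triples is read off the diagram and is an assumption; the helper names are hypothetical.

```python
triples = {
    ("niod:oai_wo2_niod_nl_rec_102045", "dct:subject", "niod:Blitzkrieg"),
    ("botb:Blitzkrieg", "skos:exactMatch", "niod:Blitzkrieg"),
}


def matches(term):
    """Concepts declared an exact match of the term, in either direction."""
    found = set()
    for s, p, o in triples:
        if p == "skos:exactMatch":
            if s == term:
                found.add(o)
            elif o == term:
                found.add(s)
    return found


def records_about(term):
    """Records whose dct:subject is the term or one of its exact matches."""
    concepts = {term} | matches(term)
    return sorted(s for s, p, o in triples
                  if p == "dct:subject" and o in concepts)


records = records_about("botb:Blitzkrieg")
print(records)
```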

Page 43: Madrid Linked Data for Digital Humanities

National-Socialist: 29%
Social-Democrat: 21%
Protestant: 13%
Liberal: 12%
R-Catholic: 12%
Communist: 8%
Jewish: 5%

http://semanticweb.cs.vu.nl/verrijktkoninkrijk/

http://search.loedejongdigitaal.nl/

Page 44: Madrid Linked Data for Digital Humanities

Results are links to paragraphs

Page 45: Madrid Linked Data for Digital Humanities

Hackathon shows re-usability

Page 46: Madrid Linked Data for Digital Humanities

Dutch Ships and Sailors Project

Page 47: Madrid Linked Data for Digital Humanities

The Problem: ((Maritime) historical) data is not integrated

• Researchers’ data is “lost”
  – In different physical locations
  – In different file formats
  – In different semantic structures

• In a workshop, we identified 25+ maritime historical datasets.
  – http://dutchshipsandsailors.nl

• We do not want to force one monolithic data model for integration

Page 48: Madrid Linked Data for Digital Humanities

The solution: Linked Open Data

• Represent heterogeneous datasets with their own data models
  – In one data format (RDF)
  – Link what can be linked to integrate at project level (and beyond)
  – Keep specificity of original data

• Links to other sources: re-use knowledge

• Allow multiple levels of semantic enrichment/normalization
  – through Named Graphs
  – Provenance
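
The named-graph idea can be sketched as a store that keeps each layer of data in its own graph, so that the original value and its normalized version coexist and a query can choose which layers to trust. The graph names, the `ex:` identifiers and the ship-type values are hypothetical; `prov:wasDerivedFrom` is the PROV-O property for recording where a graph came from.

```python
# graph name -> set of (subject, relation, object) triples
store = {}


def add(graph, s, p, o):
    """Add a triple to the named graph, creating the graph if needed."""
    store.setdefault(graph, set()).add((s, p, o))


# The same ship described at two levels of enrichment (hypothetical data).
add("ex:graph/original", "ex:ship1", "ex:shipType", "schoonerschip")
add("ex:graph/normalized", "ex:ship1", "ex:shipType", "ex:Schooner")
# Provenance: the normalized graph was derived from the original one.
add("ex:graph/provenance",
    "ex:graph/normalized", "prov:wasDerivedFrom", "ex:graph/original")


def values(s, p, graphs=None):
    """Return (graph, object) pairs for s and p, optionally limited to some graphs."""
    selected = list(store) if graphs is None else graphs
    return sorted((g, o)
                  for g in selected
                  for s2, p2, o in store.get(g, set())
                  if s2 == s and p2 == p)


all_values = values("ex:ship1", "ex:shipType")
original_only = values("ex:ship1", "ex:shipType", graphs=["ex:graph/original"])
print(all_values)
print(original_only)
```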

Page 49: Madrid Linked Data for Digital Humanities

What we did

1. Model four maritime historical datasets as RDF
  – Noordelijke Monsterrollen Database [J. Leinenga]
  – Generale Zeemonsterrollen [M. van Rossum]
  – Dutch Asiatic Shipping
  – VOC Opvarenden

2. Link to each other (based on ships, ship types, ranks, geography,…)
  – Models and links evaluated by domain experts

3. Publish as Linked Open Data

4. Show how this data cloud can lead to new types of integrated research questions
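
The linking step could start from something as simple as the sketch below: normalize ship names so that spelling variants compare equal, then propose candidate links for a domain expert to accept or reject. This is not the project's actual linking method, which the slides do not describe; the records, the URIs and the choice of `skos:closeMatch` for unverified candidates are assumptions.

```python
import re
import unicodedata


def normalize(name):
    """Lower-case and strip accents and punctuation from a ship name."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", " ", ascii_name.lower()).strip()


# Hypothetical ship records from two datasets: URI -> name as recorded.
dataset_a = {"a:ship/1": "De Jonge Jacob", "a:ship/2": "Transit"}
dataset_b = {"b:ship/9": "de jonge-jacob", "b:ship/7": "Vrouw Maria"}


def candidate_links(a, b):
    """Propose links between records whose normalized names are identical."""
    index = {}
    for uri, name in b.items():
        index.setdefault(normalize(name), []).append(uri)
    return sorted((uri_a, "skos:closeMatch", uri_b)
                  for uri_a, name in a.items()
                  for uri_b in index.get(normalize(name), []))


links = candidate_links(dataset_a, dataset_b)
print(links)
```

Name equality alone will produce false matches for common ship names, which is one reason expert evaluation of the links matters.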

Page 50: Madrid Linked Data for Digital Humanities

Links to Historical Newspapers

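[HARLINGEN, 24 October.] …gestrand. Tevens is het berigt ontvangen dat het hier behoorende schoonerschip Transit, kapitein Schaap, in de Noordzee is gezonken, nadat het achterschip was weggeslagen; een ligtmatroos verloor daarbij het leven. Mede zijn hier drie vreemde schepen met meer en minder zware averij binnengeloopen.

(English: …stranded. Word has also been received that the schooner Transit of this port, Captain Schaap, has sunk in the North Sea after its stern was knocked away; an ordinary seaman lost his life. Three foreign ships have also put in here with more or less heavy damage.)

- Andrea Bravo Balado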

Page 51: Madrid Linked Data for Digital Humanities

Datasets: DAS, GZMVOC, MDB, VOCOPV Begunstigden, VOCOPV Soldijboeken, VOCOPV Opvarenden

External vocabularies: PROV, AAT, foaf

Link types: owl:sameAs, dss:hasKBLink, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link, skos:exactMatch

Page 52: Madrid Linked Data for Digital Humanities

Data analysis and visualisation

Page 53: Madrid Linked Data for Digital Humanities

http://dutchshipsandsailors.nl/data

Hands-on session tomorrow

fluxicon.com

Page 54: Madrid Linked Data for Digital Humanities

DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA

Page 55: Madrid Linked Data for Digital Humanities

DIGITAL HUMANITIES RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATIVE SEARCH

Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation

Page 56: Madrid Linked Data for Digital Humanities
Page 57: Madrid Linked Data for Digital Humanities

DATA: OPENIMAGES.EU

Open videos
Netherlands Institute for Sound and Vision
~3000, mostly news broadcasts
Descriptions

Page 58: Madrid Linked Data for Digital Humanities

DATA: DELPHER.NL

Scans of radio bulletins (hand annotated)
1937 – 1984
1.5 million, OCR’ed and NER’ed

Page 59: Madrid Linked Data for Digital Humanities

ENTITY EXTRACTION

CROWDTRUTH.ORG

ENTITY EXTRACTION

EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG

SEGMENTATION & KEYFRAMES

LINKING EVENTS AND CONCEPTS TO KEYFRAMES

Page 60: Madrid Linked Data for Digital Humanities

SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS

DIVE:MEDIA OBJECT

SEM:EVENT

SEM:PLACE

SEM:TIME

SEM:ACTOR

SKOS:CONCEPT

OA:ANNOTATION

LINKS TO EUROPEANA (MULTILINGUAL)
LINKS TO DBPEDIA
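
A sketch of how one event could be written down with the Simple Event Model: an event node typed `sem:Event` and connected to a place, an actor and a time, plus a link from the media object in which it appears. The `ex:` identifiers and the `ex:depictsEvent` relation are hypothetical (the slides do not name DIVE's own linking property); `sem:hasPlace`, `sem:hasActor` and `sem:hasTime` are SEM properties.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A SEM-style event with one place, one actor and one time."""
    uri: str
    label: str
    place: str
    actor: str
    time: str

    def triples(self):
        """Express the event as (subject, relation, object) triples."""
        return [
            (self.uri, "rdf:type", "sem:Event"),
            (self.uri, "rdfs:label", self.label),
            (self.uri, "sem:hasPlace", self.place),
            (self.uri, "sem:hasActor", self.actor),
            (self.uri, "sem:hasTime", self.time),
        ]


# Hypothetical event found in a news broadcast.
event = Event(uri="ex:event1", label="Example event",
              place="ex:place1", actor="ex:actor1", time="ex:time1")

graph = event.triples()
# Hypothetical relation tying a media object to the event it shows.
graph.append(("ex:video1", "ex:depictsEvent", event.uri))

for triple in graph:
    print(triple)
```

Browsing from a video to its events, and from an event's place or actor to other media, is then a matter of following these edges.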

Page 61: Madrid Linked Data for Digital Humanities

DIGITAL SUBMARINE UI

https://www.flickr.com/photos/benjcarson/245171885

https://www.flickr.com/photos/mibuchat/2774251415

INFINITY OF EXPLORATION

Page 63: Madrid Linked Data for Digital Humanities

Linked Data allows for new types of Humanities research

• Integrate datasets
  • Without the need to force everything into one data model
  • Retain original model and intent, reuse another day
  • New research questions (but at which level?)

• Re-use background knowledge
  • Common-sense or very specific
  • Digital hermeneutics

• Provenance fits very well

• Linked Data is the (technically) best way to publish and share your research data