Linked Data for Digital Humanities - Big Data Summer School

Linked (Open) Data for Digital Humanities

Big Data in Society 2016 Amsterdam Summer School

Victor de Boer

With input from Christophe Guéret, Serge ter Braake, Niels Ockeloen, Antske Fokkens, Dirk Roorda, Lora Aroyo, Johan Oomen, Oana Inel, Jan Wielemaker, Jeroen Entjes

Upload: victor-de-boer

Post on 09-Jan-2017



Victor de Boer

Web & Media Group, CS, Vrije Universiteit Amsterdam

Netherlands Institute for Sound and Vision

Linked Data for Cultural Heritage

Linked Data for Digital History

Linked Data for Development

Digital Humanities

Part of the researcher's effort has moved from physical archives to digital ones

Img: www.doaks.org, www.dkrz.de

Cross-researcher, -institution, -project, -domain collaborations

“Digital History”

http://armstrongdigitalhistory.org/, http://www.vcdh.virginia.edu/courses/fall07/hius401-f/, http://digitalhistory.unl.edu/essays/thomasessay.php, http://www.philipvickersfithian.com/2013/05/gender-in-stacks-on-managing-small.html

“That is great. I would love that… …but my research questions are slightly different.”

Img: Monty Python

[Figure: Aging, Data vs. Tool]

Fig: C. Guéret, based on http://redmonk.com/jgovernor/2007/04/05/why-applciations-are-like-fish-and-data-is-like0wine/

Data as end-product

Do not bake the data into the tool.

Build tools on top of the data.

Make sure others can do so as well.

Fig: C. Guéret

Linked Data

The best way to expose your data?

Machine-readable Web

Humans are very good at reading (web) documents and distilling information.

Computers are good at calculating, combining and filtering information, but they are very bad at reading documents.

We need to write down (web) data, information and knowledge in a way that machines can understand.

Web of Documents (WWW): Linked Documents

Web of Data: Linked Data

http://info.cern.ch/Proposal.html

Tim Berners-Lee (inventor of the Web) and the Semantic Web

BIG? LINKED? OPEN?

[Diagram: BIG DATA, LINKED DATA, OPEN DATA]

BIG DATA and LINKED DATA: VARIETY as one of the V's of Big Data; VOLUME and VELOCITY are challenges for LD

LINKED DATA and OPEN DATA: LINKED OPEN DATA as the way to publish and reuse datasets across the Web

www.w3.org/designissues/linkeddata.html

http://lod-cloud.net/

How does all this work?

• Data, not documents

• Structured data

• Graph (networked) data!

• W3C Web standards stack: URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.

Four rules of Linked Data

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF).

4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html

Use HTTP URIs for Things

A Uniform Resource Identifier (URI) is used as the name of a resource, for example http://rijksmuseum.nl/data/painting1

I can go there using HTTP (dereference) and then I get information about it:

• HTML page for humans

• RDF data for machines
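As an illustration of dereferencing, the sketch below builds (but does not send) two HTTP requests for the same URI: one asking for HTML and one asking for RDF. It assumes the server supports content negotiation and offers Turtle (`text/turtle`) as its RDF serialization; the URI is the illustrative one from the slide and may not resolve.

```python
from urllib.request import Request

# Example URI from the slide; illustrative only, it may not resolve.
PAINTING_URI = "http://rijksmuseum.nl/data/painting1"


def build_request(uri: str, for_machine: bool) -> Request:
    """Build (without sending) a GET request for a Linked Data URI.

    The Accept header tells the server which representation we prefer.
    Assumption: the server offers Turtle for machines and HTML for humans.
    """
    accept = "text/turtle" if for_machine else "text/html"
    return Request(uri, headers={"Accept": accept})


human_request = build_request(PAINTING_URI, for_machine=False)
machine_request = build_request(PAINTING_URI, for_machine=True)

for request in (human_request, machine_request):
    print(request.get_method(), request.full_url,
          "Accept:", request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup. The same name then serves both audiences, and only the requested format differs.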

Resource Description Framework (RDF)

Semantic Web standard for writing down data and information as triples: (Subject, Relation, Object)

<Painting001, has_location, Amsterdam>

[Diagram: Painting001 --has_location--> Amsterdam]

Triples reusing the same URIs form Graphs
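To make the step from triples to graphs concrete, here is a minimal sketch in plain Python (no RDF library). Only the first triple comes from the slide; the other two, and the short names used instead of full URIs, are made up for illustration.

```python
from collections import defaultdict

# Triples as (subject, relation, object).
triples = [
    ("Painting001", "has_location", "Amsterdam"),   # from the slide
    ("Painting001", "has_creator", "Painter042"),   # hypothetical
    ("Amsterdam", "located_in", "Netherlands"),     # hypothetical
]


def build_graph(triples):
    """Index triples by subject: each subject maps to its outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(triples)
print(sorted(reachable(graph, "Painting001")))
```

Because "Amsterdam" is the object of one triple and the subject of another, the two statements connect, and "Netherlands" becomes reachable from the painting even though no single triple says so.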

Linked Data for E-science: OpenPHACTS

But what about the humanities?

Two examples: Digital History and Media Studies

Dutch Ships and Sailors

The Problem: ((Maritime) historical) data is not integrated

KB NEWSPAPERS

Dutch-Asiatic Shipping “VOC Opvarenden”

Jur Leinenga Matthias van Rossum

Elbing voyages Archangel voyages

DIFFERENT but LINKED DATAMODELS BASED ON COMPETENCY QUESTIONS

[Diagram: example graph for one record, with gzmvoc: terms shown next to their dss: counterparts]

Classes: dss:Record / gzmvoc:Telling; dss:Ship / gzmvoc:Schip; dss:ShipType / gzmvoc:Scheepstype; dss:Record / das:Voyage

Nodes: gzmvoc:telling-1046-De_Berkel, gzmvoc:schip-1046-De_Berkel, gzmvoc:type-Ship, __bnode_1, das:voyage-1918_61

Properties: dss:has_ship / gzmvoc:schip; rdfs:label; dss:scheepsnaam / gzmvoc:scheepsnaam; dss:has_shiptype / gzmvoc:has_shiptype; gzmvoc:scheepstype; gzmvoc:aziatischeBemanning; dss:azRegistratieKop; gzmvoc:azAantalMatrozen; gzmvoc:telling; gzmvoc:heeft DAS heenreis

Literals: "1046", "Schip", "De Berkel", "21", "Moorse mattroosen"

[Diagram: ship types from two datasets linked to a shared vocabulary]

mdb:Schip1 mdb:scheepsType mdb:Kof

das:ShipX das:typeOfShip das:Kofship

Three skos:exactMatch links among mdb:Kof, das:Kofship, Aat:Kof and Aat:Platbodems
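One possible reading of the diagram above, sketched in plain Python: each dataset keeps its own ship-type term, and skos:exactMatch links to a shared concept let us query across both. Which terms the exactMatch links connect is an assumption here, since the diagram layout did not survive in the transcript.

```python
# Assumed reading of the diagram: both local ship types match Aat:Kof.
triples = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
    ("mdb:Kof", "skos:exactMatch", "Aat:Kof"),       # assumed link
    ("das:Kofship", "skos:exactMatch", "Aat:Kof"),   # assumed link
]

# Each dataset uses its own property to state the type of a ship.
TYPE_PROPERTIES = {"mdb:scheepsType", "das:typeOfShip"}


def shared_concept(local_type, triples):
    """Follow skos:exactMatch from a local type to the shared concept.

    Falls back to the local type itself when no match link exists.
    """
    for subject, predicate, obj in triples:
        if subject == local_type and predicate == "skos:exactMatch":
            return obj
    return local_type


def ships_by_concept(triples):
    """Group ships from all datasets under the shared type concept."""
    result = {}
    for subject, predicate, obj in triples:
        if predicate in TYPE_PROPERTIES:
            concept = shared_concept(obj, triples)
            result.setdefault(concept, []).append(subject)
    return result


print(ships_by_concept(triples))
```

Neither dataset had to change its own model. The links alone make "all ships of type Kof" answerable across both.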

Link to other datasets

Links to historical newspapers published by the Royal Library (KB)

- Andrea Bravo Balado

Identifying ships

Rather than irreversible normalization, we can add (sameAs) links

– Robin Ponstein

mdb:Alberdina1 mdb:Alberdina2

owl:sameAs
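The idea of linking instead of normalizing can be sketched as follows: the original records stay untouched, and identity is computed on demand from the sameAs links. This is a minimal union-find sketch, not the project's actual method; mdb:Schip1 is added here only as an example of an unlinked ship.

```python
def identity_clusters(nodes, links):
    """Group nodes into clusters of things declared identical via sameAs.

    Uses a small union-find structure. The input records are never
    modified, so removing a link undoes the merge.
    """
    parent = {node: node for node in nodes}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(node):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(cluster) for cluster in clusters.values())


ships = ["mdb:Alberdina1", "mdb:Alberdina2", "mdb:Schip1"]
same_as = [("mdb:Alberdina1", "mdb:Alberdina2")]  # owl:sameAs link from the slide

print(identity_clusters(ships, same_as))  # with the link
print(identity_clusters(ships, []))       # link removed: records separate again
```

If a researcher disagrees that the two Alberdina records describe the same ship, dropping the link restores the original view. A destructive merge of the records could not be reversed this way.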

Provenance (1)

Individual named graphs have provenance information:

• Who made it (people/software?)

• Based on what source

• Content confidence

• Prov-O vocabulary
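A minimal sketch of provenance on named graphs, in plain Python: every statement carries the name of the graph it belongs to, and provenance is stated once about the graph. prov:wasAttributedTo and prov:wasDerivedFrom are PROV-O properties; all ex: names, the ex:confidence property and the value 0.8 are hypothetical.

```python
# Quads: (subject, predicate, object, named graph).
quads = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof", "ex:graph/mdb-conversion"),
    ("mdb:Alberdina1", "owl:sameAs", "mdb:Alberdina2", "ex:graph/ship-links"),
]

# Provenance statements are ordinary triples whose subject is a graph name.
provenance = [
    ("ex:graph/mdb-conversion", "prov:wasAttributedTo", "ex:conversion-script"),
    ("ex:graph/mdb-conversion", "prov:wasDerivedFrom", "ex:source/mdb-archive"),
    ("ex:graph/ship-links", "prov:wasAttributedTo", "ex:linking-tool"),
    ("ex:graph/ship-links", "ex:confidence", "0.8"),  # hypothetical property
]


def provenance_of(triple, quads, provenance):
    """Return, per named graph containing the triple, what is known about it."""
    graphs = [g for s, p, o, g in quads if (s, p, o) == triple]
    return {
        graph: {p: o for s, p, o in provenance if s == graph}
        for graph in graphs
    }


link = ("mdb:Alberdina1", "owl:sameAs", "mdb:Alberdina2")
print(provenance_of(link, quads, provenance))
```

A user who does not trust automatically generated links can filter on the graph's provenance instead of inspecting each statement.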

hasOriginalScan

Provenance (2)

HTTP://SEMANTICWEB.CS.VU.NL/DSS

Data analysis and visualisation

DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA

MEDIA HISTORIANS AND RESEARCHERS

Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATORY SEARCH

Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation

Four data sources

OPENIMAGES.EU 300 News videos (1920s-1970s)

DELPHER Radio News Bulletins 2210 Scripts (1945-1985)

AMSTERDAM MUSEUM 3541 Objects (1950-1989)

TROPENMUSEUM ~3000 objects (20th C)

ENTITY EXTRACTION

CROWDTRUTH.ORG


EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG

SEGMENTATION & KEYFRAMES

LINKING EVENTS AND CONCEPTS TO KEYFRAMES

LINKED DATA KNOWLEDGE GRAPH

DIVE:MEDIA OBJECT SEM:EVENT

SEM:PLACE

SEM:TIME

SEM:ACTOR

SKOS:CONCEPT

OA:ANNOTATION
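Event-based browsing over such a knowledge graph can be sketched as follows: starting from a place or actor, find the events it takes part in, then the media objects linked to those events. sem:hasPlace and sem:hasActor come from the Simple Event Model; the ex:depicts property and all ex: instances are hypothetical, since the slides name only the classes.

```python
# Toy graph using the classes from the slide. Instances are invented.
triples = [
    ("ex:video1", "rdf:type", "dive:MediaObject"),
    ("ex:video1", "ex:depicts", "ex:event1"),       # hypothetical property
    ("ex:event1", "rdf:type", "sem:Event"),
    ("ex:event1", "sem:hasPlace", "ex:Amsterdam"),
    ("ex:event1", "sem:hasActor", "ex:actor1"),
    ("ex:script7", "ex:depicts", "ex:event2"),
    ("ex:event2", "sem:hasPlace", "ex:Amsterdam"),
    ("ex:photo3", "ex:depicts", "ex:event3"),
    ("ex:event3", "sem:hasPlace", "ex:Jakarta"),
]


def media_for_entity(entity, triples):
    """Media objects connected to an entity through any event involving it."""
    events = {
        s for s, p, o in triples
        if o == entity and p.startswith("sem:has")
    }
    return sorted(
        s for s, p, o in triples
        if p == "ex:depicts" and o in events
    )


def entities_for_media(media, triples):
    """Entities (places, actors) reachable from a media object via its events."""
    events = {o for s, p, o in triples if s == media and p == "ex:depicts"}
    return sorted(
        o for s, p, o in triples
        if s in events and p.startswith("sem:has")
    )


print(media_for_entity("ex:Amsterdam", triples))
print(entities_for_media("ex:video1", triples))
```

Alternating these two steps (entity to media, media to entities) gives an open-ended browsing path through the collections.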


DIGITAL SUBMARINE UI

Img: https://www.flickr.com/photos/benjcarson/245171885, https://www.flickr.com/photos/mibuchat/2774251415

INFINITY OF EXPLORATION

“DIGITAL SUBMARINE” INTERFACE

DIVEPLUS.BEELDENGELUID.NL

Linked Data allows for new types of (Humanities) research

• Graphs, not tables

• Distributed, heterogeneous data

• Integrate datasets in a flexible way

• Cross-collection, -institution, -domain

• Re-use background knowledge

• Provenance fits very well

• Linked Data is the (technically) best way to publish and share your research data

Thank you!

Victor de Boer

http://victordeboer.com [email protected]