Madrid: Linked Data for Digital Humanities

Posted on 15 July 2015


Linked Data for Digital Humanities

Victor de Boer

About me

Victor de Boer
Assistant professor
Web & Media Group, Network Institute
VU University Amsterdam

Semantic Technologies, Linked Data

Cultural Heritage
Digital History

Linked Data for Development

What is Linked Data

http://info.cern.ch/Proposal.html

Tim Berners-Lee (the inventor of the Web)

Web of Documents (WWW): Linked Documents

From text to data: increased semantics

More and more structured data available online

Governments

Social web data

Medical data

Museums

Research data

?


Web of Documents vs Web of Data
People are often not interested in documents; they are interested in things (information).

Humans are very good at reading (web) documents and distilling information.

Computers are good at calculating, combining and filtering information, but they are very bad at reading documents.

We need to help machines understand web data:
write it down in a way that they can understand.

LINKED DATA!!

Web of Documents (WWW): Linked Documents

Web of Data: Linked Data

http://info.cern.ch/Proposal.html

Tim Berners-Lee (the inventor of the Web and the Semantic Web)

What is Linked Open Data?

Open Data is about licenses to allow reuse

Linked Data is about technology for interoperability

www.w3.org/designissues/linkeddata.html

http://lod-cloud.net/

Google Knowledge Graph

www.huffingtonpost.com

Open PHACTS Explorer

http://www.openphacts.org/

How does all this work?
Data, not documents
Structured data
Graph (networked) data!
W3C Web standards stack:

URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.

Four rules of Linked Data
1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF)

4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
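Rules 2 to 4 together are what let a client discover data simply by following links. The following is a minimal Python sketch under stated assumptions: the `WEB` dictionary and all its example.org URIs are hypothetical stand-ins for real HTTP lookups, so the example runs offline. It crawls breadth-first from a seed URI and collects every triple reachable within a hop limit.

```python
from collections import deque

# Hypothetical, simulated Web: what each HTTP URI would return when looked up.
# A real client would issue an HTTP GET (e.g. asking for text/turtle) instead.
WEB = {
    "http://example.org/museum/Painting001": [
        ("http://example.org/museum/Painting001",
         "http://example.org/vocab/has_location",
         "http://example.org/geo/Amsterdam"),
    ],
    "http://example.org/geo/Amsterdam": [
        ("http://example.org/geo/Amsterdam",
         "http://example.org/vocab/in_country",
         "http://example.org/geo/Netherlands"),
    ],
}


def dereference(uri):
    """Rule 3 stand-in: look up a URI, get its triples ([] if nothing is published)."""
    return WEB.get(uri, [])


def crawl(seed, max_hops=2):
    """Rule 4 in action: follow subject/object URIs breadth-first, up to max_hops links."""
    seen, triples = {seed}, []
    queue = deque([(seed, 0)])
    while queue:
        uri, hops = queue.popleft()
        found = dereference(uri)
        triples.extend(found)
        if hops == max_hops:
            continue  # do not follow links any further from this URI
        for s, _, o in found:
            for node in (s, o):
                if node.startswith("http") and node not in seen:
                    seen.add(node)
                    queue.append((node, hops + 1))
    return triples


facts = crawl("http://example.org/museum/Painting001")
print(len(facts))  # 2
```

The hop limit and the `seen` set keep the crawl finite even when data sources link back to each other, which on the real Web of Data they routinely do.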

Resource Description Framework (RDF): the Semantic Web standard for writing down data and information

(Subject, Relation, Object)

<Painting001, has_location, Amsterdam>

(Figure: the same triple drawn as a graph, Painting001 --has_location--> Amsterdam)
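To make the triple notation concrete, here is a minimal Python sketch that stores the slide's example as a (subject, predicate, object) tuple of full URIs and writes it in the N-Triples syntax. The namespace http://example.org/ and the URIs built from it are hypothetical placeholders, not identifiers from any real dataset.

```python
# Hypothetical namespace; real data would use the publisher's own HTTP URIs.
EX = "http://example.org/"


def uri(local_name):
    """Turn a local name into a full (hypothetical) HTTP URI."""
    return EX + local_name


# The slide's triple <Painting001, has_location, Amsterdam>, written with URIs.
triples = [
    (uri("Painting001"), uri("has_location"), uri("Amsterdam")),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
# <http://example.org/Painting001> <http://example.org/has_location> <http://example.org/Amsterdam> .
```

Literal values (names, dates) would need quoting and escaping in N-Triples; the sketch handles URIs only.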

Use HTTP URIs for things
A Uniform Resource Identifier (URI) is a string of characters used to identify (name) a resource.

http://rijksmuseum.nl/data/schilderij1

I can go there (dereference) and then I get information about it

HTML page for humans
RDF data for machines
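Serving an HTML page to people and RDF to machines from the same URI is usually done with HTTP content negotiation on the `Accept` header. The following is a simplified Python sketch of the server-side choice, assuming a hypothetical server that can render `text/html`, `text/turtle` and `application/rdf+xml`. It honours q-values and `*/*` but skips the finer matching rules of the HTTP specification, and it says nothing about what rijksmuseum.nl actually serves.

```python
# Representations a (hypothetical) server can produce for one resource URI.
SUPPORTED = ("text/html", "text/turtle", "application/rdf+xml")


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so ties keep the client's own order
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(header, default="text/html"):
    """Pick HTML for browsers (humans) or RDF for machines, based on Accept."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return default
        if media_type in SUPPORTED:
            return media_type
    return default  # simplification: a strict server could answer 406 instead


print(negotiate("text/turtle, text/html;q=0.5"))               # text/turtle
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
```

A Linked Data client such as a crawler sends `Accept: text/turtle` and gets triples, while a browser's default header lands on the HTML page.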

Links: link your data to other data

By establishing RDF triples that point to other people’s data
By reusing other people’s URIs
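Reusing someone else's URI is what makes data join up without any mapping step. A minimal Python sketch: it reuses the example URI http://rijksmuseum.nl/data/schilderij1 from the slide above (whether that URI resolves today is not assumed). The vocabulary namespace, the researcher's graph and every statement in it are hypothetical.

```python
# The museum-style URI from the slide, reused by a second publisher.
PAINTING = "http://rijksmuseum.nl/data/schilderij1"
V = "http://example.org/vocab/"  # hypothetical vocabulary namespace

# Graph published by the museum (illustrative triples, not real data).
museum_graph = {
    (PAINTING, V + "has_location", "http://example.org/geo/Amsterdam"),
}
# Graph published independently by a researcher who reuses the museum's URI.
research_graph = {
    (PAINTING, V + "discussed_in", "http://example.org/articles/17"),
}

# With only URIs (no blank nodes), merging RDF graphs is a plain set union.
merged = museum_graph | research_graph


def describe(graph, subject):
    """All (predicate, object) pairs known about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}


print(len(describe(merged, PAINTING)))  # 2
```

Had the researcher minted a private identifier for the painting instead, the two graphs would stay disconnected until someone added an explicit link between the two URIs.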

Linked Data is “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” (Wikipedia)

Why Linked Data for e-Science?
Large amounts of data
Efficient analysis, data mining

Sharing data, information and knowledge between scientists

Across continents
Across disciplines

But what about the humanities?

Some examples

MultimediaN E-Culture project (2006)

Museums have increasingly nice websites

But: most of them are driven by stand-alone collection databases

Data is isolated, both syntactically and semantically

If users can do cross-collection search, the individual collections become more valuable!

Semantic Search

E-Culture data cloud

Vocabulary alignment
“Easel-pieces”

RMA concept: “Schilderij” (painting)

RMA is the thesaurus of the Rijksmuseum

AAT artefact types: “Easel Piece”, “Painting”

AAT is Getty’s Art & Architecture Thesaurus

http://e-culture.multimedian.nl/
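Once two thesauri are aligned, a query phrased in one vocabulary also finds objects indexed with the other. A minimal Python sketch of that cross-collection search: the concept URIs, collection names and object identifiers are hypothetical stand-ins for the RMA and AAT terms. Only the `skos:exactMatch` property URI is real. SKOS defines it as symmetric and transitive, which is why the code follows links in both directions and across chains.

```python
from collections import defaultdict

EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical concept URIs standing in for the RMA and AAT terms.
RMA_SCHILDERIJ = "http://example.org/rma/Schilderij"
AAT_PAINTING = "http://example.org/aat/Painting"

# The alignment: one skos:exactMatch link between the two thesauri.
alignments = [(RMA_SCHILDERIJ, EXACT_MATCH, AAT_PAINTING)]

# Each (hypothetical) collection indexes its objects with its own vocabulary.
collections = {
    "rijksmuseum": [("rma-object-1", RMA_SCHILDERIJ)],
    "other-museum": [("other-object-7", AAT_PAINTING)],
}


def equivalents(concept):
    """Concepts reachable via skos:exactMatch (symmetric and transitive in SKOS)."""
    neighbours = defaultdict(set)
    for s, p, o in alignments:
        if p == EXACT_MATCH:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, stack = {concept}, [concept]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def search(concept):
    """Cross-collection search: objects indexed with the concept or an exact match."""
    wanted = equivalents(concept)
    return sorted(obj for items in collections.values()
                  for obj, c in items if c in wanted)


print(search(AAT_PAINTING))  # ['other-object-7', 'rma-object-1']
```

Without the single alignment triple, the same search would return only `other-object-7`.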

BiographyNet

“Johan Rudolph Thorbecke was born in 1798, on 14 January, in Zwolle, and comes from a half-German…”

Linked Data for BiographyNet

(Figure: the Linked Data model for Thorbecke. A Biographical Description (source: NNBW) with Provenance Meta Data, Person Meta Data (“Thorbecke”), Biography Parts and an Event: Birth, 1798.)

(Figure: an Enrichment NLP Tool reads the text of the Biographical Description, “Johan Rudolph Thorbecke was born in 1798, on 14 January, in Zwolle, and comes from a half-German…”, and adds a Birth Event to the Person Meta Data: place Zwolle, date 1798-01-14.)
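The enrichment step turns free biographical text into a structured Birth event. BiographyNet's actual NLP tool is not shown here. The sketch below is a deliberately naive, hypothetical regular-expression version that only handles the Dutch sentence pattern of the Thorbecke example ("werd in YEAR geboren op DAY MONTH in PLACE"), just to show the input/output shape: Zwolle, 1798-01-14.

```python
import re
from datetime import date

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5, "juni": 6,
    "juli": 7, "augustus": 8, "september": 9, "oktober": 10,
    "november": 11, "december": 12,
}

# Toy pattern (hypothetical) for Dutch: "... werd in 1798 geboren op 14 januari in Zwolle ..."
BIRTH = re.compile(
    r"werd\s+in\s+(?P<year>\d{4})\s+geboren\s+op\s+(?P<day>\d{1,2})\s+"
    r"(?P<month>[a-z]+)\s+in\s+(?P<place>[A-Z][\w-]*)"
)


def extract_birth_event(text):
    """Return a Birth event dict (place, ISO date), or None if the pattern is absent."""
    m = BIRTH.search(text)
    if not m or m.group("month") not in DUTCH_MONTHS:
        return None
    born = date(int(m.group("year")), DUTCH_MONTHS[m.group("month")], int(m.group("day")))
    return {"type": "Birth", "place": m.group("place"), "date": born.isoformat()}


# The (Dutch) source sentence from the slide, truncated as in the original.
text = ("Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari "
        "in Zwolle en komt uit een half-Duitse…")
print(extract_birth_event(text))
# {'type': 'Birth', 'place': 'Zwolle', 'date': '1798-01-14'}
```

A real pipeline would use named-entity recognition and date normalization rather than one hand-written pattern, and would record the tool run as provenance of the new event.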

Interface for historians (mockup)

Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog (The Kingdom of the Netherlands in the Second World War)
History of German-occupied Dutch society (1940–1945)
Published 1969–1991 in 14 volumes, 30 parts, 18,000 pages

1. Digitization

2. Open Data

3. Enriched access with Linked Data

E-History: Verrijkt Koninkrijk

Step 1: Loe de Jong’s “Het Koninkrijk” was digitized and made available in a reusable format

Step 2: Named Entity Recognition and consolidation of the back-of-the-book index provide structured vocabularies with links into the text

Document structure: country, collection, doc-type, volume, chapter, section, sub-section, paragraph

Back-of-the-book index and named entities

Verrijkt Koninkrijk

Step 3: Enrichment with Linked Data makes new ways of interaction and analysis possible


(Figure: links from the book's index to the NIOD archive)
niod:oai_wo2_niod_nl_rec_102045 dct:subject niod:Blitzkrieg
botb:Blitzkrieg skos:exactMatch niod:Blitzkrieg
http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
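Following these links is a join over triples, which is what a SPARQL basic graph pattern expresses. A minimal Python sketch of a tiny pattern matcher (variables start with `?`), run over the two statements from the figure, kept as prefixed names. The direction written for `skos:exactMatch` is this sketch's reading of the figure; the property is symmetric, but the code only matches the direction stored. The query itself is a hypothetical example.

```python
def match(pattern, triple, binding):
    """Match one triple pattern ('?x' = variable) against a triple, extending binding."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def query(graph, patterns, binding=None):
    """Yield bindings satisfying all patterns (a SPARQL-style basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from query(graph, rest, extended)


# The two statements from the figure, kept as prefixed names.
graph = [
    ("niod:oai_wo2_niod_nl_rec_102045", "dct:subject", "niod:Blitzkrieg"),
    ("botb:Blitzkrieg", "skos:exactMatch", "niod:Blitzkrieg"),
]

# "Which NIOD records are about the concept matching the book's index term?"
records = [b["?record"] for b in query(graph, [
    ("botb:Blitzkrieg", "skos:exactMatch", "?concept"),
    ("?record", "dct:subject", "?concept"),
])]
print(records)  # ['niod:oai_wo2_niod_nl_rec_102045']
```

In practice the same question would be sent as a SPARQL query to a triple store; the nested-loop evaluation here only illustrates what the join does.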

(Chart of category shares: National-Socialist 29%, Social-Democrat 21%, Protestant 13%, Liberal 12%, Roman Catholic 12%, Communist 8%, Jewish 5%)

http://semanticweb.cs.vu.nl/verrijktkoninkrijk/

http://search.loedejongdigitaal.nl/

Results are links to paragraphs

Hackathon shows re-usability

Dutch Ships and Sailors Project

The problem: (maritime) historical data is not integrated

• Researchers’ data is “lost”
– in different physical locations
– in different file formats
– in different semantic structures

• In a workshop, we identified 25+ maritime historical datasets.
– http://dutchshipsandsailors.nl

• We do not want to force one monolithic data model for integration

The solution: Linked Open Data

• Represent heterogeneous datasets with their own data models
– in one data format (RDF)
– link what can be linked, to integrate at project level (and beyond)
– keep the specificity of the original data

• Links to other sources: reuse knowledge

• Allow multiple levels of semantic enrichment/normalization
– through Named Graphs
– provenance
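Named graphs are what let the original transcription and later normalizations live side by side. Each statement carries the name of the graph it belongs to, so a researcher can choose which layers to trust. A minimal Python sketch with quads: all graph names, record identifiers and the rank example are hypothetical ("matroos" is Dutch for sailor). `prov:wasDerivedFrom` is a real PROV-O property, used here to relate the two layers.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # named graph holding (and thus documenting the origin of) the statement


# Hypothetical graph names: source data as transcribed vs. a normalization layer.
ORIGINAL = "http://example.org/graph/original"
NORMALIZED = "http://example.org/graph/normalized-v1"
PROVENANCE = "http://example.org/graph/provenance"

store = [
    Quad("ex:record42", "ex:rank", '"matroos"', ORIGINAL),                   # as written in the source
    Quad("ex:record42", "ex:normalizedRank", "ex:rank/Sailor", NORMALIZED),  # project's interpretation
    Quad(NORMALIZED, "prov:wasDerivedFrom", ORIGINAL, PROVENANCE),           # how the layers relate
]


def view(store, graphs):
    """Triples visible when a researcher selects only the given named graphs."""
    return {(q.s, q.p, q.o) for q in store if q.g in graphs}


def provenance(store, s, p, o):
    """Names of the graphs that assert a given triple."""
    return {q.g for q in store if (q.s, q.p, q.o) == (s, p, o)}


print(len(view(store, {ORIGINAL})))              # 1
print(len(view(store, {ORIGINAL, NORMALIZED})))  # 2
```

Keeping the normalization in its own graph means it can be revised or replaced (a `normalized-v2`) without touching the source transcription.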

What we did
1. Model four maritime historical datasets as RDF
– Noordelijke Monsterrollen Database [J. Leinenga]
– Generale Zeemonsterrollen [M. van Rossum]
– Dutch Asiatic Shipping
– VOC Opvarenden
2. Link them to each other (based on ships, ship types, ranks, geography, …)
– models and links evaluated by domain experts
3. Publish as Linked Open Data
4. Show how this data cloud can lead to new types of integrated research questions

Links to Historical Newspapers

[HARLINGEN, 24 October.] …stranded. The report has also been received that the schooner Transit of this port, Captain Schaap, has sunk in the North Sea after her stern was carried away; an ordinary seaman lost his life. In addition, three foreign ships have come in here with more or less heavy damage.

- Andrea Bravo Balado

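(Figure: the Dutch Ships and Sailors data cloud)
Datasets: DAS, GZMVOC, MDB, VOCOPV Begunstigden (beneficiaries), VOCOPV Soldijboeken (pay ledgers), VOCOPV Opvarenden (people on board)
External vocabularies: PROV, AAT, foaf
Link types: owl:sameAs, dss:hasKBLink, rdfs:subClassOf / rdfs:subPropertyOf, dss:DAS link, skos:exactMatch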
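When `owl:sameAs` links say that records in different datasets denote the same ship, a common processing step is to group the co-referent URIs and pick one canonical identifier per group. A minimal Python sketch using union-find: the ship identifiers and the KB article are hypothetical. Only the `owl:sameAs` property URI is real, and the `dss:hasKBLink` predicate name is taken from the figure.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_clusters(triples):
    """Group URIs connected by owl:sameAs; the smallest URI in each group is its root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)
    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return clusters


def canonicalize(triples, clusters):
    """Rewrite every URI to its cluster root and drop the sameAs links themselves."""
    rep = {m: root for root, members in clusters.items() for m in members}
    return {(rep.get(s, s), p, rep.get(o, o)) for s, p, o in triples if p != SAME_AS}


# Hypothetical records of one ship in three datasets, plus a newspaper link.
links = [
    ("gzm:ship/77", SAME_AS, "das:ship/1"),
    ("vocopv:ship/5", SAME_AS, "gzm:ship/77"),
    ("gzm:ship/77", "dss:hasKBLink", "kb:article/3"),
]
clusters = same_as_clusters(links)
print(canonicalize(links, clusters))
# {('das:ship/1', 'dss:hasKBLink', 'kb:article/3')}
```

Because `owl:sameAs` asserts full identity, such merging is only safe for links the domain experts have checked; looser correspondences are better expressed with weaker properties such as `skos:exactMatch`, which also appear in the figure.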

Data analysis and visualisation

http://dutchshipsandsailors.nl/data

Hands-on session tomorrow


DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA

DIGITAL HUMANITIES RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATIVE SEARCH

Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation

DATA: OPENIMAGES.EU

Open videos from the Netherlands Institute for Sound and Vision
~3,000, mostly news broadcasts, with descriptions

DATA: DELPHER.NL

Scans of radio bulletins (hand-annotated)
1937–1984
1.5 million, OCR’ed and NER’ed (named-entity recognition)

ENTITY EXTRACTION

CROWDTRUTH.ORG


EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG

SEGMENTATION & KEYFRAMES

LINKING EVENTS AND CONCEPTS TO KEYFRAMES

SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS

(Figure: DIVE data model with DIVE:MEDIA OBJECT, SEM:EVENT, SEM:PLACE, SEM:TIME, SEM:ACTOR, SKOS:CONCEPT and OA:ANNOTATION)

LINKS TO EUROPEANA (MULTILINGUAL)
LINKS TO DBPEDIA
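The model above combines SEM for the "who, where, when" of an event with Open Annotation for attaching that event to a piece of media. The following is a minimal Python sketch of the resulting triples. The SEM and OA namespaces and the properties used (`sem:hasPlace`, `sem:hasTime`, `sem:hasActor`, `oa:hasBody`, `oa:hasTarget`) are real. The example.org instance URIs are hypothetical, and DIVE's own properties may differ. The video segment is addressed with a W3C Media Fragments temporal URI (`#t=start,end`, in seconds).

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/dive/"  # hypothetical namespace for instance data


@dataclass
class Event:
    uri: str
    place: Optional[str] = None
    time: Optional[str] = None
    actors: List[str] = field(default_factory=list)


def event_triples(ev):
    """Describe an event with SEM's place, time and actor properties."""
    triples = [(ev.uri, RDF_TYPE, SEM + "Event")]
    if ev.place:
        triples.append((ev.uri, SEM + "hasPlace", ev.place))
    if ev.time:
        triples.append((ev.uri, SEM + "hasTime", ev.time))
    for actor in ev.actors:
        triples.append((ev.uri, SEM + "hasActor", actor))
    return triples


def annotate_segment(annotation_uri, video_uri, start_s, end_s, ev):
    """Link a video segment (Media Fragments '#t=start,end' URI) to an event."""
    target = f"{video_uri}#t={start_s},{end_s}"
    return [
        (annotation_uri, RDF_TYPE, OA + "Annotation"),
        (annotation_uri, OA + "hasBody", ev.uri),
        (annotation_uri, OA + "hasTarget", target),
    ] + event_triples(ev)


ev = Event(EX + "event/1", place="http://dbpedia.org/resource/Amsterdam",
           time=EX + "time/1", actors=[EX + "actor/1"])
triples = annotate_segment(EX + "annotation/1", EX + "video/1", 10, 25, ev)
print(len(triples))  # 7
```

Using a DBpedia URI for the place is one way to realize the link to DBpedia; a browsing interface can then hop from the video segment to the event and on to everything else known about that place.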

DIGITAL SUBMARINE UI

Image credits:
https://www.flickr.com/photos/benjcarson/245171885
https://www.flickr.com/photos/mibuchat/2774251415

INFINITY OF EXPLORATION

Linked Data allows for new types of Humanities research

• Integrate datasets
– without the need to force everything into one data model
– retain the original model and intent; reuse another day
– new research questions (but at which level?)

• Reuse background knowledge
– common-sense or very specific
– digital hermeneutics

• Provenance fits very well

• Linked Data is the (technically) best way to publish and share your research data

Thank you!

Victor de Boer

http://victordeboer.com
v.de.boer@vu.nl
