Linked Data for Digital Humanities (Madrid)
Victor de Boer
About me
Victor de Boer, Assistant Professor
Web & Media Group, Network Institute, VU University Amsterdam
Semantic Technologies, Linked Data
Cultural Heritage, Digital History
Linked Data for Development
What is Linked Data?
http://info.cern.ch/Proposal.html
Tim Berners-Lee (The inventor of the Web)
Web of Documents (WWW): Linked Documents
From text to data: increased semantics
More and more structured data available online
Governments
Social web data
Medical data
Museums
Research data
?
Web of Documents vs. Web of Data
People are often not interested in documents; they are interested in things (information)
Humans are very good at reading (web) documents and distilling information
Computers are good at calculating, combining and filtering information, but they are very bad at reading documents
We need to help machines understand web data
Write it down in a way that they can understand
LINKED DATA!!
Web of Documents (WWW): Linked Documents
Web of Data: Linked Data
Tim Berners-Lee (inventor of the Web and of the Semantic Web)
What is Linked Open Data?
Open Data is about licenses that allow reuse
Linked Data is about technology for interoperability
www.w3.org/designissues/linkeddata.html
http://lod-cloud.net/
Google Knowledge Graph
www.huffingtonpost.com
OpenPhacts explorer
http://www.openphacts.org/
How does all this work?
Data, not documents
Structured data
Graph (networked) data!
W3C Web standards stack
URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.
Four rules of Linked Data
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF)
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
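The four rules can be illustrated with a toy, in-memory sketch of "following your nose": looking up a URI returns triples (rule 3), and any object that is itself an HTTP URI is a link worth looking up next (rule 4). The `TOY_WEB` dictionary and all `example.org` URIs are hypothetical stand-ins for real HTTP dereferencing, and for simplicity only objects (not predicates) are followed.

```python
from collections import deque

# Hypothetical stand-in for the Web of Data: looking up an HTTP URI (rule 3)
# returns a set of (subject, predicate, object) triples about that resource.
TOY_WEB = {
    "http://example.org/painting/1": {
        ("http://example.org/painting/1",
         "http://example.org/vocab/hasLocation",
         "http://example.org/place/amsterdam"),
    },
    "http://example.org/place/amsterdam": {
        ("http://example.org/place/amsterdam",
         "http://www.w3.org/2000/01/rdf-schema#label",
         "Amsterdam"),
    },
}


def look_up(uri):
    """Simulate dereferencing a URI; unknown URIs return no data."""
    return TOY_WEB.get(uri, set())


def follow_your_nose(start, max_lookups=10):
    """Breadth-first crawl: collect the triples for each URI and queue every
    object that is itself an HTTP(S) URI (rule 4: links to other URIs)."""
    seen, queue, triples, lookups = {start}, deque([start]), set(), 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in look_up(uri):
            triples.add((s, p, o))
            if o.startswith(("http://", "https://")) and o not in seen:
                seen.add(o)
                queue.append(o)
    return triples
```

Starting from the painting, the crawl also discovers the label of the place it links to, which the painting's own description did not contain.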
Resource Description Framework (RDF)
Semantic Web standard for writing down data and information
(Subject, Relation, Object)
<Painting001, has_location, Amsterdam>
As a graph: Painting001 --has_location--> Amsterdam
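A minimal sketch of the triple idea, modelling each RDF statement as a plain Python tuple and querying it with a wildcard pattern. The second painting is a hypothetical extra row; real RDF would use full URIs (and typically a library such as rdflib) instead of bare names.

```python
# Each RDF statement is a (subject, relation, object) tuple.
# "Painting001" comes from the slide; "Painting002" is a hypothetical extra row.
TRIPLES = {
    ("Painting001", "has_location", "Amsterdam"),
    ("Painting002", "has_location", "Madrid"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

For example, `match(TRIPLES, p="has_location", o="Amsterdam")` returns `[("Painting001", "has_location", "Amsterdam")]`: "which things are located in Amsterdam?"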
Use HTTP URIs for Things
A Uniform Resource Identifier (URI) is a string of characters used to identify a resource by name
http://rijksmuseum.nl/data/schilderij1
You can go there (dereference the URI) and get information about the resource
HTML page for humans
RDF data for machines
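Whether a URI yields an HTML page or RDF is commonly decided by HTTP content negotiation on the `Accept` header. The sketch below uses only Python's standard `urllib.request`; choosing `text/turtle` as the RDF format is an assumption, and whether a given server (for example the Rijksmuseum URI above) honours it depends on that server.

```python
import urllib.request


def build_request(uri, for_machine):
    """Ask for RDF (Turtle) when a program is the client, HTML for a person.
    The server decides what it actually returns."""
    accept = "text/turtle" if for_machine else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri, for_machine=True, timeout=10):
    """Dereference the URI (needs network access). urlopen follows redirects,
    such as the 303 See Other responses some Linked Data servers use."""
    with urllib.request.urlopen(build_request(uri, for_machine), timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read()
```

The same URI thus serves both audiences; only the requested representation differs.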
Links
Link your data to other data
By establishing RDF triples that point to other people’s data
By reusing other people’s URIs
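Linking to other people's URIs lets data about the same thing be combined. As a hedged sketch, assuming the links are expressed with `owl:sameAs`, the function below groups equivalent URIs with a union-find structure and rewrites all triples onto one representative URI. The two `example` datasets and their properties are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical data: two datasets describe the same person under different URIs.
DATA = {
    ("http://museum-a.example/artist/rembrandt", "http://example.org/name", "Rembrandt van Rijn"),
    ("http://dataset-b.example/person/42", "http://example.org/birthYear", "1606"),
    ("http://museum-a.example/artist/rembrandt", OWL_SAME_AS, "http://dataset-b.example/person/42"),
}


def merge_same_as(triples):
    """Map every URI to one representative of its owl:sameAs group, so facts
    from different datasets about the same thing end up on a single node."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the lexicographically smallest URI as representative
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}
```

After merging, the name from one dataset and the birth year from the other hang off the same node.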
Linked Data is “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” (Wikipedia)
Why Linked Data for E-science?
Large amounts of data
Efficient analysis, data mining
Sharing data, information and knowledge between scientists
Across continents
Across disciplines
But what about the humanities?
Some examples
MultimediaN E-Culture project (2006)
Museums have increasingly nice websites
But: most of them are driven by stand-alone collection databases
Data is isolated, both syntactically and semantically
If users can do cross-collection search, the individual collections become more valuable!
Semantic Search
E-Culture data cloud
Vocabulary alignment
“Easel-pieces”
RMA concept: “Schilderij” (Dutch for “painting”)
RMA is the thesaurus of the Rijksmuseum
AAT artefact types: “Easel Piece”, “Painting”
AAT is Getty’s Art & Architecture Thesaurus
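Cross-collection search over aligned vocabularies can be sketched as query expansion: a search for one concept also returns objects annotated with equivalent or narrower concepts from other thesauri. The concept identifiers, the collection objects, and the specific relations (`skos:exactMatch` between "Schilderij" and "Painting", `skos:broader` from "Easel Piece" to "Painting") are assumptions for illustration, not the project's actual alignment.

```python
from collections import defaultdict

# Hypothetical alignment links between concepts of two vocabularies.
ALIGNMENTS = [
    ("rma:Schilderij", "skos:exactMatch", "aat:Painting"),
    ("aat:EaselPiece", "skos:broader", "aat:Painting"),
]
# Hypothetical collection objects, each annotated with one concept.
OBJECTS = {
    "rijksmuseum:obj1": "rma:Schilderij",
    "othermuseum:obj7": "aat:EaselPiece",
    "othermuseum:obj9": "aat:Sculpture",
}


def expand(concept):
    """Collect the concept, its skos:exactMatch equivalents (both directions)
    and everything narrower than it (inverse of skos:broader), transitively."""
    exact, narrower = defaultdict(set), defaultdict(set)
    for s, p, o in ALIGNMENTS:
        if p == "skos:exactMatch":
            exact[s].add(o)
            exact[o].add(s)
        elif p == "skos:broader":
            narrower[o].add(s)
    result, todo = set(), [concept]
    while todo:
        c = todo.pop()
        if c not in result:
            result.add(c)
            todo.extend(exact[c] | narrower[c])
    return result


def search(concept):
    """Return the objects annotated with the concept or any expansion of it."""
    wanted = expand(concept)
    return sorted(obj for obj, c in OBJECTS.items() if c in wanted)
```

Searching for `aat:Painting` then finds the Rijksmuseum "Schilderij" as well as the other collection's easel piece, which a single-vocabulary search would miss.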
http://e-culture.multimedian.nl/
BiographyNet
Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse… (“Johan Rudolph Thorbecke was born in 1798, on 14 January, in Zwolle, and comes from a half-German…”)
Linked Data for BiographyNet
Diagram: Thorbecke’s Biographical Description as Linked Data
– Provenance Metadata (NNBW)
– Person Metadata (“Thorbecke”)
– Biography Parts
– Event: Birth, 1798
Enrichment NLP Tool: from the biography text (“Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle…”), it adds to the Person Metadata an Event: Birth, Zwolle, 1798-01-14
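The enrichment step turns free text into a structured event. The BiographyNet pipeline uses NLP tooling; the sketch below swaps in a single hand-written regular expression for one Dutch sentence pattern, just to show input and structured output. The pattern, the `DUTCH_MONTHS` table and the output keys are hypothetical, and it assumes the text was extracted with correct spacing ("werd in").

```python
import re

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5, "juni": 6,
    "juli": 7, "augustus": 8, "september": 9, "oktober": 10,
    "november": 11, "december": 12,
}

# Hypothetical pattern for one sentence shape:
# "... werd in <year> geboren op <day> <month> in <Place> ..."
BIRTH = re.compile(r"werd in (\d{4}) geboren op (\d{1,2}) (\w+) in ([A-Z]\w+)")


def extract_birth(text):
    """Return a Birth event with an ISO 8601 date and a place, or None."""
    m = BIRTH.search(text)
    if m is None or m.group(3).lower() not in DUTCH_MONTHS:
        return None
    year, day, month, place = m.groups()
    date = f"{int(year):04d}-{DUTCH_MONTHS[month.lower()]:02d}-{int(day):02d}"
    return {"type": "Birth", "date": date, "place": place}
```

On the Thorbecke sentence this yields `{"type": "Birth", "date": "1798-01-14", "place": "Zwolle"}`, the same values as the enriched event in the diagram; a real pipeline needs far more robust NLP than one pattern.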
Interface for historians (mockup)
Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog (The Kingdom of the Netherlands in the Second World War)
History of German-occupied Dutch society (1940–1945)
Published 1969–1991 in 14 volumes, 30 parts, 18,000 pages
1. Digitization
2. Open Data
3. Enriched access with Linked Data
E-History: Verrijkt Koninkrijk
Step 1: Loe de Jong’s “Het Koninkrijk” was digitized and made available in a reusable format
Step 2: Named Entity Recognition and consolidation of the back-of-the-book index provide structured vocabularies with links into the text
country, collection, doc-type, volume, chapter, section, sub-section, paragraph
Back-of-the-Book index Named Entities
Verrijkt Koninkrijk
Step 3: Enrichment with Linked Data makes new ways of interaction and analysis possible
Back-of-the-Book index Named Entities
Links: botb:Blitzkrieg skos:exactMatch niod:Blitzkrieg; niod:oai_wo2_niod_nl_rec_102045 dct:subject niod:Blitzkrieg
Resolver URL: http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
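The Blitzkrieg links can be used to go from a back-of-the-book index entry to NIOD archive records. Reading the diagram as `botb:Blitzkrieg skos:exactMatch niod:Blitzkrieg` and `niod:oai_wo2_niod_nl_rec_102045 dct:subject niod:Blitzkrieg` (the link directions are an assumption), the lookup is a join of two triple patterns, sketched here in plain Python.

```python
SKOS_EXACT = "skos:exactMatch"
DCT_SUBJECT = "dct:subject"

# The two links as read from the diagram (directions assumed).
TRIPLES = {
    ("botb:Blitzkrieg", SKOS_EXACT, "niod:Blitzkrieg"),
    ("niod:oai_wo2_niod_nl_rec_102045", DCT_SUBJECT, "niod:Blitzkrieg"),
}


def records_about(index_entry, triples):
    """Join two patterns, in the spirit of the SPARQL query
    SELECT ?rec WHERE { <entry> skos:exactMatch ?c . ?rec dct:subject ?c }"""
    concepts = {o for s, p, o in triples if s == index_entry and p == SKOS_EXACT}
    return sorted(s for s, p, o in triples if p == DCT_SUBJECT and o in concepts)
```

`records_about("botb:Blitzkrieg", TRIPLES)` returns `["niod:oai_wo2_niod_nl_rec_102045"]`, connecting a paragraph-level index entry to an archival record.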
Pie chart by affiliation: National-Socialist 29%, Social-Democrat 21%, Protestant 13%, Liberal 12%, Roman Catholic 12%, Communist 8%, Jewish 5%
http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
http://search.loedejongdigitaal.nl/
Results are links to paragraphs
Hackathon shows re-usability
Dutch Ships and Sailors Project
The problem: (maritime) historical data is not integrated
• Researchers’ data is “lost”
  – in different physical locations
  – in different file formats
  – in different semantic structures
• In a workshop, we identified 25+ maritime historical datasets
  – http://dutchshipsandsailors.nl
• We do not want to force a single monolithic data model for integration
The solution: Linked Open Data
• Represent heterogeneous datasets with their own data models
  – in one data format (RDF)
  – link what can be linked, to integrate at project level (and beyond)
  – keep the specificity of the original data
• Links to other sources: re-use knowledge
• Allow multiple levels of semantic enrichment/normalization
  – through Named Graphs
  – with provenance
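Named graphs let several layers of interpretation coexist: the data as transcribed from the source in one graph, a normalised layer in another, and provenance statements about the graphs themselves. This minimal sketch stores quads as named tuples; the graph names, the `dss:` properties and the `hypothetical:Skipper` concept are invented for illustration, while `prov:wasDerivedFrom` is a real W3C PROV-O property.

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph it belongs to.
Quad = namedtuple("Quad", "s p o graph")

# Hypothetical example: original transcription, a normalised layer,
# and a provenance statement linking the two graphs.
QUADS = [
    Quad("dss:record1", "dss:rank", "schipper", "graph:original"),
    Quad("dss:record1", "dss:normalisedRank", "hypothetical:Skipper", "graph:normalised"),
    Quad("graph:normalised", "prov:wasDerivedFrom", "graph:original", "graph:provenance"),
]


def view(quads, graphs):
    """Return the triples visible when only the given named graphs are used."""
    return {(q.s, q.p, q.o) for q in quads if q.graph in graphs}


def derived_from(quads, graph):
    """Return the graphs that the given graph was derived from."""
    return {q.o for q in quads
            if q.s == graph and q.p == "prov:wasDerivedFrom"}
```

A researcher who distrusts the normalisation can query only `graph:original`; the original transcription is never overwritten.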
What we did
1. Model four maritime historical datasets as RDF
  – Noordelijke Monsterrollen Database [J. Leinenga]
  – Generale Zeemonsterrollen [M. van Rossum]
  – Dutch Asiatic Shipping
  – VOC Opvarenden
2. Link them to each other (based on ships, ship types, ranks, geography, …)
  – models and links evaluated by domain experts
3. Publish as Linked Open Data
4. Show how this data cloud can lead to new types of integrated research questions
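Linking datasets "based on ships" can start with automatic candidate generation. The sketch below proposes `owl:sameAs` links between ship records whose normalised names match and whose years are close; the record layout, the matching rule, the `max_year_gap` parameter and the example records are all hypothetical, and, as above, such candidates would still need evaluation by domain experts.

```python
import unicodedata


def normalise(name):
    """Lower-case, strip accents and surrounding whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip().lower()


def link_ships(records_a, records_b, max_year_gap=0):
    """Propose owl:sameAs links between records from two datasets whose
    normalised ship names match and whose years differ by at most max_year_gap."""
    links = []
    for a in records_a:
        for b in records_b:
            if (normalise(a["name"]) == normalise(b["name"])
                    and abs(a["year"] - b["year"]) <= max_year_gap):
                links.append((a["uri"], "owl:sameAs", b["uri"]))
    return links


# Hypothetical records from two datasets.
A = [{"uri": "ex-a:ship1", "name": "De Hoop", "year": 1750},
     {"uri": "ex-a:ship2", "name": "Zeemeeuw", "year": 1760}]
B = [{"uri": "ex-b:s9", "name": "de hoop ", "year": 1750},
     {"uri": "ex-b:s10", "name": "Zeemeeuw", "year": 1790}]
```

With these records, `link_ships(A, B)` proposes only `("ex-a:ship1", "owl:sameAs", "ex-b:s9")`; the two "Zeemeeuw" records are thirty years apart and plausibly different ships. Name-plus-year matching is deliberately simple, since common ship names recur.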
Links to Historical Newspapers
[HARLINGEN, 24 October.] … stranded.
Word has also been received that the schooner Transit, belonging to this port,
Captain Schaap, has sunk in the North Sea
after its stern had been
washed away; an ordinary seaman lost
his life. In addition, three
foreign ships have come in here with more or less
serious damage.
- Andrea Bravo Balado
Diagram: the Dutch Ships and Sailors data cloud
Datasets: DAS, GZMVOC, MDB, VOCOPV Begunstigden, VOCOPV Soldijboeken, VOCOPV Opvarenden
External vocabularies: PROV, AAT, FOAF
Links: owl:sameAs, dss:hasKBLink, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link, skos:exactMatch
Data analysis and visualisation
http://dutchshipsandsailors.nl/data
Hands-on session tomorrow
Image: fluxicon.com (Hands-on Session, Process Mining)
DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA
DIGITAL HUMANITIES RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)
EXPLORATIVE SEARCH
Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation
DATA: OPENIMAGES.EU
Open videos from the Netherlands Institute for Sound and Vision
~3,000 videos, mostly news broadcasts
Descriptions
DATA: DELPHER.NL
Scans of radio bulletins (hand-annotated), 1937–1984
1.5 million, OCR’ed and processed with named entity recognition (NER)
ENTITY EXTRACTION
CROWDTRUTH.ORG
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
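CrowdTruth's premise is that disagreement between crowd workers is a signal rather than noise. As a simplified sketch loosely inspired by its unit–annotation score (not the CrowdTruth library, which also weights workers and annotations by quality), each candidate event for one media unit is scored by the cosine between its one-hot vector and the unit's aggregated annotation vector, which reduces to its count divided by the vector norm.

```python
import math
from collections import Counter


def unit_annotation_scores(worker_annotations):
    """worker_annotations: one list of chosen labels per worker, for one media unit.
    Each worker counts at most once per label; the score of a label is the cosine
    between its one-hot vector and the unit vector of counts (count / norm)."""
    counts = Counter(a for anns in worker_annotations for a in set(anns))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {a: c / norm for a, c in counts.items()}
```

For three workers annotating `[["bombing"], ["bombing"], ["speech"]]`, "bombing" scores about 0.89 and "speech" about 0.45: both are kept, with graded confidence, instead of discarding the minority label by majority vote.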
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES
SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS
CLASSES: DIVE:MEDIA OBJECT, SEM:EVENT, SEM:PLACE, SEM:TIME, SEM:ACTOR, SKOS:CONCEPT, OA:ANNOTATION
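The event-centric model can be sketched as triples: an `oa:Annotation` whose target is (a fragment of) a media object and whose body is a `sem:Event` with actors, places and times. `sem:hasActor`, `sem:hasPlace`, `sem:hasTime`, `oa:hasTarget` and `oa:hasBody` are properties from SEM and the Open Annotation vocabulary; the `ex:` URIs, the `#t=10,20` media-fragment target, and the exact way DIVE combines these terms are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A sem:Event with optional actors, places and times (all URIs)."""
    uri: str
    actors: list = field(default_factory=list)
    places: list = field(default_factory=list)
    times: list = field(default_factory=list)

    def triples(self):
        out = [(self.uri, "rdf:type", "sem:Event")]
        out += [(self.uri, "sem:hasActor", a) for a in self.actors]
        out += [(self.uri, "sem:hasPlace", p) for p in self.places]
        out += [(self.uri, "sem:hasTime", t) for t in self.times]
        return out


def annotate(annotation_uri, media_target, event):
    """An oa:Annotation linking a media object (fragment) to an event."""
    return [
        (annotation_uri, "rdf:type", "oa:Annotation"),
        (annotation_uri, "oa:hasTarget", media_target),
        (annotation_uri, "oa:hasBody", event.uri),
    ] + event.triples()
```

Because the event, not the video, carries the actor, place and time, a browser can move from one media object to others that depict the same event or share its actors.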
LINKS TO EUROPEANA (MULTILINGUAL)
LINKS TO DBPEDIA
DIGITAL SUBMARINE UI
Images: https://www.flickr.com/photos/benjcarson/245171885, https://www.flickr.com/photos/mibuchat/2774251415
INFINITY OF EXPLORATION
Linked Data allows for new types of Humanities research
• Integrate datasets
  – without the need to force everything into one data model
  – retain the original model and intent; reuse another day
  – new research questions (but at which level?)
• Reuse background knowledge
  – common-sense or very specific
  – digital hermeneutics
• Provenance fits very well
• Linked Data is the (technically) best way to publish and share your research data