Linked Open Data and the Digital Archaeological Workflow at the Swedish National Heritage Board
Marcus Smith
A little background
• Marcus Smith, Operations Officer at the Swedish National Heritage Board on Gotland, Sweden
• Government agency for heritage and the historic environment
• Preservation, use/re-use, development
• Works with linked open data, and digitising the workflow of archaeological practice
SOCH: Swedish Open Cultural Heritage
• K-samsök – ‘Cultural Cross-Search’ http://www.ksamsok.se/
• Metadata aggregator & web service for cultural heritage institutions
• Monuments, buildings, museum collections…
• ≈40 institutions
(≈25–30 million triples)
• 2.1 million artefacts
• 880 thousand photographs
• 830 thousand monuments
• 440 thousand documents
• 110 thousand historic buildings
• 40 thousand personages
• 2000 historical events
• 1500 historic maps
• ≈5 million database objects
Harvesting, Linking & Dissemination
• Object metadata harvested from the content provider using OAI-PMH
[Diagram: cultural heritage institution’s database → local SOCH adapter → OAI-PMH → SOCH]
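The harvesting step can be sketched roughly as follows. This is a minimal illustration of an OAI-PMH `ListRecords` loop, not SOCH's actual harvester: the base URL, the `oai_dc` metadata prefix, and the injected `fetch` function are assumptions made so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # Per the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iter(OAI_NS + "identifier")]
    token = root.find(f".//{OAI_NS}resumptionToken")
    next_token = token.text if token is not None and token.text else None
    return ids, next_token

def harvest(base_url, fetch):
    """Follow resumption tokens until the provider reports no more pages.

    `fetch` is any callable that takes a URL and returns the response body as text.
    """
    token, all_ids = None, []
    while True:
        url = list_records_url(base_url, resumption_token=token)
        ids, token = parse_page(fetch(url))
        all_ids.extend(ids)
        if token is None:
            return all_ids
```

Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP client, so the same loop can be exercised against canned responses or a live provider.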
• The metadata is then enriched with additional semantic links to related objects
[Diagram: a burial mound linked to a photo (depicted by), a document (described by), an artefact (found at) and the Vendel Period (dated to)]
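The enrichment described here amounts to adding subject-predicate-object statements between objects. A minimal sketch using plain tuples follows; the identifiers, the predicate names, and the direction of each link are illustrative assumptions, not SOCH's actual vocabulary.

```python
from collections import defaultdict

# Hypothetical identifiers; real SOCH objects have their own URIs.
MOUND = "ex:burial-mound-1"

TRIPLES = [
    (MOUND, "depictedBy", "ex:photo-7"),
    (MOUND, "describedBy", "ex:document-3"),
    ("ex:artefact-42", "foundAt", MOUND),
    (MOUND, "datedTo", "ex:vendel-period"),
]

def related(triples, node):
    """Group everything linked to `node`, in either direction, by predicate."""
    links = defaultdict(list)
    for s, p, o in triples:
        if s == node:
            links[p].append(o)
        elif o == node:
            # Incoming links are reported under an "inverse:" prefix.
            links["inverse:" + p].append(s)
    return dict(links)
```

Calling `related(TRIPLES, MOUND)` gathers the photo, document, artefact and period in one lookup, which is the kind of context a record page can show next to the object itself.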
• Links can be manually added (UGC)
[Diagram: a burial mound linked to a book (has topic) and a Wikipedia article (describes)]
• Available as RDF, queryable via an API
[Diagram: an application talks to SOCH over HTTP, querying with REST + CQL and receiving RDF/XML or JSON-LD]
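As a rough illustration of the REST + CQL pattern, the sketch below builds a query URL and reads a JSON-LD response. The endpoint, the `method` and `query` parameter names, the index names, and the `label` key are all placeholders chosen for the example; the real API documentation should be consulted for the actual names.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://example.org/soch/api"  # placeholder, not the real endpoint

def cql_term(index, value):
    """Format one CQL clause, quoting values that contain spaces or quotes."""
    if any(c in value for c in ' "'):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{index}={value}"

def search_url(clauses, operator="AND"):
    """Join CQL clauses with a boolean operator and URL-encode the query."""
    query = f" {operator} ".join(cql_term(i, v) for i, v in clauses)
    return API_BASE + "?" + urlencode({"method": "search", "query": query})

def labels(jsonld_text):
    """Pull (@id, label) pairs out of a JSON-LD @graph (hypothetical 'label' key)."""
    doc = json.loads(jsonld_text)
    return [(node["@id"], node.get("label")) for node in doc.get("@graph", [])]
```

For example, `search_url([("text", "gallows"), ("place", "Visby")])` produces a single GET URL whose `query` parameter carries the CQL expression `text=gallows AND place=Visby`.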
Benefits of Linking
• Linking facilitates cross-search
• Linking simplifies discovery, and clarifies context
[Screenshot: ‘The old gallows (Galgberget), Visby’ – Riksantikvarieämbetet, with callouts for object metadata, related external objects, images, and related SOCH objects]
• Linking allows unanticipated connections to appear!
’Galgberget: Memories of Wisby’ – Västergötlands Museum
SOCH as a Platform
• SOCH as a platform for development
• Kringla: a web interface http://kringla.nu/
• Mobile apps
• Mashups
• Museum portals
• Over 225 million API requests since launch in 2010
Licensing & Reuse
• Only metadata is indexed – all objects link back to a permanent URI at the source institution with their full record
• All metadata is CC0
• Metadata includes licensing information for the main record
• Of 1.8 million ‘rich’ objects, 1.2 million are CC or PD
• SOCH is the Swedish national aggregator for Europeana
The Future of SOCH
• More institutions delivering data
• Triplestore, SPARQL endpoint
• Ultimately, we’d like it if SOCH in its current form wasn’t needed – if each institution made their own data available as SPARQL-queryable RDF on the semantic web.
[Diagram: SOCH API]
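To illustrate what that distributed future might look like from a client's side, here is a minimal sketch that sends one SPARQL query to several institutional endpoints and merges the rows. The endpoints and the `ex:` vocabulary are invented for the example, and `run_query` is an injected stand-in for whatever HTTP client and result parser would be used; only the `query` request parameter comes from the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode

# Hypothetical vocabulary and endpoints, for illustration only.
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?object ?label WHERE {
  ?object ex:foundAt ?site ;
          ex:label ?label .
  ?site ex:datedTo ex:VendelPeriod .
}
"""

ENDPOINTS = ["http://museum-a.example/sparql", "http://museum-b.example/sparql"]

def request_url(endpoint, query):
    """SPARQL Protocol GET request: the query travels in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

def federated_select(endpoints, query, run_query):
    """Send one query to every endpoint and merge rows, dropping duplicates.

    `run_query` takes a request URL and returns a list of {variable: value} dicts.
    """
    seen, merged = set(), []
    for endpoint in endpoints:
        for row in run_query(request_url(endpoint, query)):
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

In this arrangement the aggregation happens at query time in the client, rather than in a central index that has to harvest each provider in advance.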
’Charles Babb parts storage’ – SDASM (flickr)
The Problem
• No central fieldwork register
• No central digital archive for archaeological data
• Digital availability of fieldwork reports is patchy
• Existing resources not linked
’silos’ – Doc Searls (flickr)
• Inefficient information transfer (digital → paper → digital)
How It Works – The Computer.
The Output Unit. (Ladybird books)
’CERN storage servers’ – skimaniac (flickr)
Goals for DAP
• Fully digitised seamless information transfer
• Digital archive for archaeological data
• Access to source data
• Semantically linked data
’Anchor Men of the Mauretania’
Tyne and Wear Archives and Museums (flickr)
’Come in We’re Open’ – jilleatsapples (flickr)
• Openly licensed, re-useable data
• National ‘events’ register
DAP so far…
• Government directive, with extra funding for five years
• LOD as a core idea; openness and transparency as core values
• Collaborative effort with the archaeological community
• DAP requires a new data infrastructure for us at RAÄ
• DAP requires a new way of working for archaeologists in Sweden:
– Technical challenges
– Licensing challenges
– Mindset challenges
• Already in place:
– SAMLA reports/PDF repository: http://samla.raa.se/
– Processes mapped
– Conceptual modeling ongoing
[Diagram: draft conceptual model. Contexts: Tangible Heritage, Temporal Context, Geographical Context, Event Context, Operative Context. Event types: archaeological event, research event, assessment event, information management event, land management event, legal event, natural event. Other entities: Actor/Role, Organisation, Legal framework, Legal status, Monument type, Period, Method, Analysis, Fieldwork, Documentation, Resolution, Development]
• Still to plan:
– protocols & formats
– data mapping
– digital archive…
• To do straight away:
– rescue fieldwork data
– start a skeleton of an events register
– …and ‘master data’ such as ontologies, thesauri/controlled vocabularies!
Structured Vocabularies
• SOCH publishes LOD…
• …but the majority of the classification metadata is still text strings, rather than URIs pointing to terms in authoritative controlled vocabularies
• We’re going to need a number of such thesauri for the data that a future DAP infrastructure will handle
• Monument types
• Legal status
• Events
• Periods
• Materials
• Built heritage
• Evidence types
• Techniques
• Artefact types
• …etc
• Extant/non-existent
• Internal/external
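Moving from text strings to URIs is essentially a lookup against a controlled vocabulary, with unmatched strings set aside for human review. A minimal sketch follows, assuming a made-up monument-type thesaurus; the URIs, labels and synonym list are hypothetical.

```python
# Hypothetical thesaurus: normalised preferred label -> concept URI.
MONUMENT_TYPES = {
    "burial mound": "http://example.org/vocab/monument-type/burial-mound",
    "gallows": "http://example.org/vocab/monument-type/gallows",
    "rune stone": "http://example.org/vocab/monument-type/rune-stone",
}

# Variant spellings that should resolve to a preferred label.
SYNONYMS = {"runestone": "rune stone", "grave mound": "burial mound"}

def normalise(label):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())

def to_uri(label, vocabulary=MONUMENT_TYPES, synonyms=SYNONYMS):
    """Return the concept URI for a free-text label, or None if unmatched."""
    key = normalise(label)
    key = synonyms.get(key, key)
    return vocabulary.get(key)

def map_records(records):
    """Split records into (mapped, needs_review) based on their 'type' string."""
    mapped, needs_review = [], []
    for rec in records:
        uri = to_uri(rec["type"])
        (mapped if uri else needs_review).append({**rec, "type_uri": uri})
    return mapped, needs_review
```

Keeping the unmatched records in a separate list, rather than guessing, leaves the decision about new or ambiguous terms with whoever maintains the thesaurus.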
Who manages what data?
• Local authorities: resolutions
• Fieldwork units: field documentation; produce reports
• National Heritage Board: national monuments register, buildings register, monument types thesaurus, etc; archive reports
• Forest Agency: forest sites
• Museums: finds
• Universities, SND: research data, analyses
• National Land Survey: geospatial data
• Law: legal terms/concepts, legal events
• We need to be able to manage the data we're responsible for
• We need to be able to connect to (fetch) data that external bodies are responsible for, and react when they change
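One simple way to "react when they change" is to fingerprint each externally managed resource on every fetch and compare it with the last value seen. The sketch below uses content hashing; the source names, URLs and the injected `fetch` callable are assumptions for illustration, and a real setup might rely on HTTP caching headers or OAI-PMH datestamps instead.

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Stable digest of a fetched resource."""
    return hashlib.sha256(payload).hexdigest()

def sync(sources, fetch, known, on_change):
    """Fetch each external source; call on_change for new or modified ones.

    sources: {name: url}
    fetch: callable taking a URL and returning the resource as bytes
    known: {name: last seen digest}, updated in place
    Returns the names that changed during this run.
    """
    changed = []
    for name, url in sources.items():
        digest = fingerprint(fetch(url))
        if known.get(name) != digest:
            known[name] = digest
            on_change(name)
            changed.append(name)
    return changed
```

The `on_change` callback is where local records that link to the external resource would be re-validated or re-indexed.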
Challenges
• We welcome suggestions and feedback – we're very much finding our way as we go!
• DAP is a massive undertaking, and we don’t want to reinvent the wheel if we can help it.
DAP
SOCH
http://www.raa.se/dap
http://www.ksamsok.se/
http://www.kringla.nu/
@carwash