Embedded URI in MARC An Essential for Linked Data
JACKIE SHIEH
jshieh [AT] gwu.edu
ORCiD: 0000-0003-3214-8846
http://orcid.org/0000-0003-3214-8846
From MARC to BIBFRAME: Linked Data on the Ground #3
GOALS
• Benefits and Challenges of Inserting URI/IRIs in MARC
• Workflows for Inserting URI/IRIs in Bibliographic Records
• Tools for Batch Processing of URI/IRIs
• Process for Batch Insertion of URI/IRIs
10/20/2016
BENEFITS & CHALLENGES
• Web-based Research
• Structured Data
• Publish Data on the Web
• Linked Documents & Linked Data
• Data Publishing
• Data Consumption
• Implementation
• Integration
TODAY’S STACK
• URI/IRI
• RDF
• HTTP
• Data Format
• MARC
• RDF/XML
• MarcNext Process
• Batch
• Single
• Tools & Vocabularies
• Auto lookup
• Manual lookup
URI SYNTAX & FORMATS
RESOURCE IDENTIFIER (URI/IRI)
• Document: http URI (HTML), for the Web Browser
• Document: http URI (RDF), for the Web Application
URI
• URIref: uniquely identifies Things
• URL: method for finding things, ex: http://, ftp://, with file extension, e.g. *.html, *.pdf, *.txt
• URN: names, alphanumerical strings, ex: doi, ISSN, OCoLC
• Dereferenceable URI: machine-processable, dereferenceable unique identifier, ex: http://id.loc.gov/[value]
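A rough sketch of how these categories could be told apart in code, using only the scheme of the identifier. The function name, the category labels, and the ftp and urn sample values are illustrative assumptions, not part of the slides; an http scheme only suggests that a URI can be dereferenced, it does not prove that a server answers.

```python
from urllib.parse import urlsplit


def classify_identifier(value: str) -> str:
    """Sort an identifier into a rough category based on its scheme alone."""
    scheme = urlsplit(value.strip()).scheme.lower()
    if scheme in ("http", "https"):
        # Candidates for dereferencing by a web client.
        return "HTTP URI"
    if scheme == "urn":
        # A name, not a location.
        return "URN"
    if scheme:
        # Any other locator scheme, e.g. ftp.
        return "URL (other scheme)"
    # No scheme at all: a bare control number or string.
    return "bare string"


for sample in (
    "http://id.loc.gov/authorities/names/n78078534",  # URI shown later in the deck
    "ftp://example.org/records.mrc",                  # hypothetical
    "urn:issn:1234-5679",                             # hypothetical
    "n78078534",
):
    print(sample, "->", classify_identifier(sample))
```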
SYNTAX: LINKED DATA URI
HTML REPRESENTATION
RDF/XML REPRESENTATION
TURTLE REPRESENTATION
JSON REPRESENTATION
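The four representations above (HTML, RDF/XML, Turtle, JSON) are typically requested from the same URI through HTTP content negotiation. The sketch below only builds the requests and does not send them; the helper name and the media-type table are assumptions for illustration, and whether a given service honours each Accept value has to be checked against that service.

```python
from urllib.request import Request

# Common media types for the representations named on the slides (assumed mapping).
ACCEPT_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "json": "application/json",
}


def build_request(uri: str, representation: str) -> Request:
    """Prepare (but do not send) a request asking for one representation of a URI."""
    return Request(uri, headers={"Accept": ACCEPT_TYPES[representation]})


req = build_request("http://id.loc.gov/authorities/names/n78078534", "rdfxml")
print(req.full_url, req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup.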
INTERNATIONALIZED URI (IRI)
URI in MARC
WHERE in MARC
• MARC Subfields
• $u - Uniform Resource Identifier
• $0 - Authority record control number or standard number
• RDF Triple & MARC Subfields
• Resource document: $u
• Resource Object: $0
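As a minimal illustration of placing a URI in $0, the sketch below appends the subfield to one field written in MarcEdit's mnemonic (*.mrk) notation. The function name and the heading "Example, Author." are hypothetical, and pairing that heading with the id.loc.gov URI shown later in the deck is for illustration only; appending $0 at the end of the field is a simplifying assumption.

```python
def add_subfield_zero(mrk_line: str, uri: str) -> str:
    """Append a $0 subfield carrying `uri` unless the field already has it."""
    if "$0" + uri in mrk_line:
        return mrk_line  # already linked; leave the field untouched
    return mrk_line.rstrip() + "$0" + uri


# Hypothetical 100 field; the backslash stands for a blank indicator in .mrk files.
field = "=100  1\\$aExample, Author."
linked = add_subfield_zero(field, "http://id.loc.gov/authorities/names/n78078534")
print(linked)
```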
MarcEdit MarcNext – Linked Identifiers
MarcNEXT: LINKED IDENTIFIER
• Data Source: Assumed LC (id.loc.gov)
• Data Formats: MARC (*.mrc) or Mnemonic (*.mrk)
• $0: URI/Resource Object
• 1xx, 7xx
• 6xx (Ind2 = 0: LC; Ind2 = 2: MeSH; if Ind2 = 7, check $2 for the vocabulary source)
• 336, 337, 338 (Content, Media, Carrier)
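The 6xx rule above can be sketched as a small lookup. This is an illustration of the logic as stated on the slide, not MarcEdit's own code; the function name and the lowercase vocabulary labels are assumptions.

```python
from typing import Optional

# Second-indicator values that name the vocabulary directly.
IND2_VOCAB = {"0": "lcsh", "2": "mesh"}


def subject_vocabulary(ind2: str, subfield_2: Optional[str] = None) -> Optional[str]:
    """Return the vocabulary to search for a 6xx field, or None if undetermined."""
    if ind2 in IND2_VOCAB:
        return IND2_VOCAB[ind2]
    if ind2 == "7":
        # Source is specified in $2; without it there is nothing to look up.
        return subfield_2.strip().lower() if subfield_2 else None
    return None


print(subject_vocabulary("0"))
print(subject_vocabulary("7", "fast"))
```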
LINKED IDENTIFIERS
GUI: 1xx/7xx
BEFORE & AFTER
GUI: 6xx
BEFORE
AFTER
GUI: 3xx
BEFORE
AFTER
GUI: VIAF
BEFORE & AFTER
GUI: OCLC WORK IDENTIFIER
EMBED WORK ID
WorldCat.ORG
GUI: ALL FIELDS
What about Vernacular Scripts?
ALTERNATE GRAPHIC REPRESENTATION (880)
MarcEdit Command Line
MarcNext PARAMETERS
• -buildlinks: Specifies the Semantic Linking algorithm
• -options: Specifies linking options to use:
Example: lcid,viaf:lc,autodetect,3xx,oclcworkid
• lcid: utilizes id.loc.gov to link 1xx/7xx data
• viaf: links 1xx/7xx data using VIAF; specify the index after the colon
• autodetect: autodetects subjects and links to known values
• 3xx: autodetects values in 3xx fields and links to known values
• oclcworkid: inserts link to oclc work id if present
BUILD LINKS via COMMAND LINE
cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,3xx,autodetect,viaf:lc
http://marcedit.reeset.net/building-linked-data-links-via-the-command-line
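For batch jobs, the command line above can be assembled in a script. The sketch below only builds the argument list from the flags shown on the slide and does not run MarcEdit; the function name and the file names are hypothetical.

```python
from typing import List, Sequence


def build_links_command(
    source: str,
    dest: str,
    options: Sequence[str],
    exe: str = "cmarcedit.exe",
) -> List[str]:
    """Assemble the argument list for a -buildlinks run (flags as shown on the slide)."""
    return [exe, "-s", source, "-d", dest, "-buildlinks", "-options", ",".join(options)]


cmd = build_links_command(
    "records.mrc",         # hypothetical source file
    "records_linked.mrc",  # hypothetical destination file
    ["oclcworkid", "lcid", "3xx", "autodetect", "viaf:lc"],
)
print(" ".join(cmd))
```

On a machine with MarcEdit installed, the list could be handed to subprocess.run(cmd, check=True); looping over a folder of files would repeat the same call per file.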
RDF URI ENDPOINTS & VALIDATION
DEFAULT VOCABULARIES (ONTOLOGIES)
• id.loc.gov
• id.nlm.nih.gov/mesh
• viaf.org
• d-nb.info
EDIT MarcNext RULES
• Locate the Rules file (linked_data_profile.xml) in the “Configs” folder
• Edit the Processing Field in RULES
• Edit the Collection definition in COLLECTIONS
http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file
SHORTCUT TO RULES
PROCESSING FIELD
<field type="bibliographic">
<tag>650</tag>
<subfields>abvxyz</subfields>
<ind2 vocab="lcsh" value="0"/>
<ind2 vocab="lcshac" value="1"/>
<ind2 vocab="mesh" value="2"/>
<ind2 vocab="none" value="7"/>
<index>2</index>
<uri>0</uri>
<special_instructions>subject</special_instructions>
</field>
http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file
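Before editing a processing field by hand, it can help to read the rule back and check the indicator-to-vocabulary mapping. The sketch below parses a copy of the 650 rule shown above with Python's standard library; the function name and the returned dictionary layout are assumptions, and the index and uri values are read as-is without interpreting them.

```python
import xml.etree.ElementTree as ET

# Copy of the 650 processing field from the slide, with plain ASCII quotes.
FIELD_XML = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 vocab="lcsh" value="0"/>
  <ind2 vocab="lcshac" value="1"/>
  <ind2 vocab="mesh" value="2"/>
  <ind2 vocab="none" value="7"/>
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def read_field_rule(xml_text: str) -> dict:
    """Return the tag, subfields, and ind2-to-vocabulary map of one <field> rule."""
    field = ET.fromstring(xml_text)
    return {
        "tag": field.findtext("tag"),
        "subfields": field.findtext("subfields"),
        "ind2_vocab": {e.get("value"): e.get("vocab") for e in field.findall("ind2")},
        "index": field.findtext("index"),  # read as-is
        "uri": field.findtext("uri"),      # read as-is
    }


rule = read_field_rule(FIELD_XML)
print(rule["tag"], rule["ind2_vocab"])
```

A parse error here is a quick signal that the file contains typographic quotes or an unclosed tag.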
COLLECTIONS DEFINITION
<collection>
<name>Getty ULAN</name>
<label>ulan</label>
<uri>http://vocab.getty.edu/sparql.json?query=select ?Subject ?Term ?Parents ?ScopeNote {%0A%0A ?Subject a skos:Concept; luc:term ' "{search_terms}" ';%0A%0A gvp:prefLabelGVP [xl:literalForm ?Term].%0A%0A optional {?Subject gvp:parentStringAbbrev ?Parents}%0A%0A optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}%0A&toc=Combination_Full-Text_and_Exact_String_Match&implicit=true&equivalent=false&_form=/queriesF</uri>
<path>results.bindings[0].Subject.value</path>
</collection>
http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file
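The path element in the collection definition (results.bindings[0].Subject.value) points at the first match in a SPARQL JSON result. The sketch below shows one way such a dotted path could be resolved against a parsed response; it is an illustration, not MarcEdit's implementation, and the function name and the sample response (including its URI) are made up.

```python
import re
from typing import Any


def resolve_path(data: Any, path: str) -> Any:
    """Walk a dotted path with optional [n] list indexes through parsed JSON."""
    current = data
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        current = current[match.group(1)]  # dictionary key
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]  # list position
    return current


# Hypothetical response in the standard SPARQL JSON results layout.
sample = {
    "results": {
        "bindings": [
            {"Subject": {"type": "uri", "value": "http://vocab.getty.edu/ulan/EXAMPLE"}}
        ]
    }
}
print(resolve_path(sample, "results.bindings[0].Subject.value"))
```

An empty bindings list raises IndexError, which a caller could treat as "no match found".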
DETERMINING LINKED DATA URI
LOCATING CANONICAL URI
VALIDATION SERVICES
• http://vm-10-100-0-3.ch.bbcarchdev.net/
• http://linkeddata.uriburner.com:8000/vapour
• http://linkeddata.uriburner.com/ode/?uri=
• https://www.w3.org/RDF/Validator/
• http://oops.linkeddata.es/
• http://rdf-translator.appspot.com
BBC RES VALIDATION
http://id.loc.gov/authorities/names/n78078534
BIBFRAME TRANSFORMATION
MARC TO BIBFRAME
MarcNext
LC BIBFRAME TRANSFORMATION
http://bibframe.org/tools/transform/start
BF RDF/XML (MarcNext)
RECAP
• RDF URI/IRI with HTTP Scheme
• Where URIs go in MARC ($0 for now)
• MarcNext Workflow for Inserting URI/IRI in MARC
• External Vocabulary Sources
• Formulate & Validate URI/IRI Manually
REFERENCES
• MarcEdit Download
• http://marcedit.reeset.net/downloads
• MarcEdit MARCNext: Linked Records Tool (2014) [video clip]
• https://www.youtube.com/embed/ifhxNT1TxVU
• The Importance of Identifiers in the New Web Environment and Using the Uniform Resource Identifier (URI) in Subfield Zero ($0): A Small Step That Is Actually a Big Step (2016) [article]
• http://dx.doi.org/10.1080/19386389.2015.1099981
THANK YOU!
Questions?
JACKIE SHIEH
jshieh [AT] gwu.edu
http://orcid.org/0000-0003-3214-8846