ALCTS Preconference, “Beyond the Looking Glass: Real World Linked Data. What Does It Take to Make It Work?”
San Francisco, CA, 26 June 2015
An OCLC Perspective on What It Takes to Make Linked Data Work
Karen Smith-Yoshimura and Jean Godby
OCLC Research
To make linked data work, we need…
Good data!
Structured, accurate, unambiguous, actionable and can be linked to other data.
The incremental value of linked data
Data consumed outside the original domain or creation context
Machine-understandable semantics
Cleaner, more normalized data
Complex data queries without pre-built indexes
Active or actionable data
Web syndication
What we want to do
Embed library option here
Ingest original script (Gujarati here) for readers who can read the original
Present information in the preferred language and script of the user
Create structured descriptions of library resources…so they can be recognized as ‘things’ in the broader Web.
Original title: Щелкунчик, Балет-феерия
Genre: ballet
Translated title: “The Nutcracker”
Composer: Peter Ilych Tchaikovsky
Choreographer: Marius Petipa
?
The Nutcracker isn’t a thing (yet)
WHAT WE HAVE NOW
Identifier: a definition
A unique, persistent and public URI associated with a digital object and resolvable globally over networks via specific protocols that is unambiguous to use, find and identify the resource.
Examples:
• id.loc.gov/authorities/names/n79104267
• isni.org/isni/0000000381493996
• viaf.org/viaf/89803084
• wikidata.org/wiki/Q9049
Why things, not strings
English text may refer to:
• Bibliothèque nationale de France
• BnF
• National Library of France
Texts in other languages may refer to:
• 法國國家圖書館
• مكتبة فرنسا الوطنية
• Εθνική Βιβλιοθήκη της Γαλλίας
• הספרייה הלאומית של צרפת
• フランス国立図書館
• Национальная библиотека Франции
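The two lists above can be read as a lookup problem: many strings, one thing. A minimal Python sketch of that idea follows. The URI is a hypothetical placeholder rather than the library's actual VIAF, ISNI, or Wikidata identifier, and the names `LABELS`, `resolve`, and `same_thing` are illustrative only.

```python
from typing import Optional

# Hypothetical identifier for the entity; a real system would use a
# VIAF, ISNI, or Wikidata URI here.
BNF = "http://example.org/entity/bnf"

# Every string variant, in any language or script, points to the same thing.
LABELS = {
    "Bibliothèque nationale de France": BNF,
    "BnF": BNF,
    "National Library of France": BNF,
    "法國國家圖書館": BNF,
    "Εθνική Βιβλιοθήκη της Γαλλίας": BNF,
    "フランス国立図書館": BNF,
    "Национальная библиотека Франции": BNF,
}


def resolve(label: str) -> Optional[str]:
    """Return the identifier for a label, or None if the string is unknown."""
    return LABELS.get(label.strip())


def same_thing(a: str, b: str) -> bool:
    """Two strings name the same thing only if both resolve to one identifier."""
    id_a, id_b = resolve(a), resolve(b)
    return id_a is not None and id_a == id_b
```

Comparing the strings themselves would treat "BnF" and "フランス国立図書館" as unrelated; comparing the identifiers they resolve to does not.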
VIAF aggregates identifiers
Wikidata disseminates identifiers
VIAF consumes Wikidata
• Resources in nearly all languages
• Contributed by more than 20,000 libraries worldwide
• More than half the database is for works not in English
Languages
English
German
French
Spanish
Chinese
Dutch
Japanese
Russian
Arabic
469 others
WorldCat today
But: Top 15 languages in WorldCat are written in non-Latin character sets
Ιλιάδα
زقاق المدق
Война и миръ
דער בעל-תשובה
紅樓夢
源氏物語
OCLC’s linked data resources
WorldCat Catalog: 15 billion triples
WorldCat Works: 5 billion RDF triples
FAST: 23 million triples
VIAF: 2 billion triples
ISNI: 10-50 million triples
DDC: 300 million triples
OUR PROCESS
From records to things: ‘Work’
From records to things: ‘Person’
Mockup
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Monkeys Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
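The mockup above can be sketched as a small data structure in Python. The mockup leaves the IsTranslationOf and HasTranslation values blank, so this sketch assumes that every translation points at the Chinese original. The `Work` class and the helper names are illustrative, not part of any OCLC model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    title: str
    language: str
    agent: str  # author for the original, translator for a translation
    date: str
    is_translation_of: Optional["Work"] = None


original = Work("西遊記", "Chinese", "吳承恩", "1592")

# Assumption: all five translations link to the Chinese original.
translations = [
    Work("Journey to the West", "English", "Anthony C. Yu", "1977", original),
    Work("Journey to the West", "English", "W. J. F. Jenner", "1982-1984", original),
    Work("Tây du ký bình khảo", "Vietnamese", "Phan Quân", "1980", original),
    Work("西遊記", "Japanese", "中野美代子", "1986", original),
    Work("Monkeys Pilgerfahrt", "German", "Georgette Boner", "1983", original),
]


def has_translation(work, candidates):
    """Inverse of is_translation_of: every candidate that translates `work`."""
    return [w for w in candidates if w.is_translation_of == work]


def translators_by_language(works):
    """Group translator names by the language of the translation."""
    groups = defaultdict(list)
    for w in works:
        groups[w.language].append(w.agent)
    return dict(groups)
```

Two points show up in the data: the two English translations share a title and can only be told apart by translator, and the Japanese translation carries the same title string as the Chinese original, so the title alone cannot identify the work.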
# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .

# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .
Markup for the Semantic Web
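To show what the markup makes possible, here is a minimal Python sketch that holds the same statements as plain (subject, predicate, object) tuples and follows the translation link between the two works. It uses no RDF library. The prefix `new:` stands in for the unnamed "[new]:" vocabulary in the markup, and the function names are illustrative.

```python
# Placeholder for the "[new]:" prefix, which the markup leaves unnamed.
NEW = "new:"

ORIGINAL = "http://worldcat.org/entity/work/id/1215997"
TRANSLATION = "http://worldcat.org/entity/work/id/145209748"

# (subject, predicate, object) triples copied from the markup.
TRIPLES = [
    (ORIGINAL, "rdf:type", "schema:CreativeWork"),
    (ORIGINAL, "schema:creator", "http://viaf.org/viaf/102266649"),
    (ORIGINAL, "schema:inLanguage", "zh"),
    (ORIGINAL, "schema:name", "靈山"),
    (TRANSLATION, "rdf:type", "schema:CreativeWork"),
    (TRANSLATION, "schema:creator", "http://viaf.org/viaf/102266649"),
    (TRANSLATION, NEW + "translator", "http://viaf.org/viaf/81663420"),
    (TRANSLATION, "schema:inLanguage", "en"),
    (TRANSLATION, "schema:name", "Soul Mountain"),
    (TRANSLATION, NEW + "translationOfWork", ORIGINAL),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def translations_of(work):
    """All works that state they are a translation of `work`."""
    return [s for s, p, o in TRIPLES
            if p == NEW + "translationOfWork" and o == work]
```

Because the creator is a shared URI rather than a name string, the original and its translation can be matched on author without comparing "Gao, Xingjian" across scripts or romanizations.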
Even the best algorithms still need manual intervention
Split off the “Murakami Haruki” with same romanization; different romanizations of same title also resulted in non-match.
These still need to be merged.
Originally 3 clusters, each for a different title but by the same author
WHAT YOU CAN DO TO MAKE THE TRANSITION EASIER!
Mockup
Language code of original
Original title entry
Uniform title
Added entry for translator, with role term
A good example
Without added entries, we must parse the 245 $c for translator in different languages
Nice! Added entries for translators, with role term
Also nice! Intermediate translation coded (Vietnamese translation from the French translation of the Danish)
Distinguish translations into the same language by translator
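The contrast drawn above, coded added entries versus parsing the 245 $c, can be sketched in Python. The record below is made up for illustration and is not copied from WorldCat. The tuple layout, function names, and the three regular expressions are assumptions of this sketch. In MARC 21, 041 $a holds the language of the text and $h the language of the original.

```python
import re

# Illustrative record. Each field is (tag, indicators, [(subfield, value), ...]).
RECORD = [
    ("041", "1 ", [("a", "eng"), ("h", "chi")]),
    ("100", "1 ", [("a", "Gao, Xingjian")]),
    ("245", "10", [("a", "Soul Mountain /"),
                   ("c", "Gao Xingjian ; translated by Mabel Lee.")]),
    ("700", "1 ", [("a", "Lee, Mabel,"), ("e", "translator."), ("4", "trl")]),
]


def subfields(record, tag, code):
    """All values of one subfield code across fields with the given tag."""
    return [v for t, _, subs in record if t == tag
            for c, v in subs if c == code]


def translators_from_added_entries(record):
    """Names from 700 fields that carry the relator code trl in $4."""
    names = []
    for tag, _, subs in record:
        if tag == "700" and ("4", "trl") in subs:
            names += [v.rstrip(" ,.") for c, v in subs if c == "a"]
    return names


# Fallback: guess from free text. Each language needs its own pattern,
# and these three cover only a few phrasings.
PATTERNS = [
    r"translated (?:from the \w+ )?by (.+)",
    r"traduit par (.+)",
    r"übersetzt von (.+)",
]


def translators_from_245c(record):
    """Best-effort extraction of a translator name from the 245 $c text."""
    found = []
    for text in subfields(record, "245", "c"):
        for pattern in PATTERNS:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                found.append(match.group(1).rstrip(" ."))
    return found
```

The coded path returns the heading form "Lee, Mabel" with one exact comparison. The free-text path returns "Mabel Lee" as transcribed, works only for phrasings the patterns anticipate, and still needs matching against an authority file.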
Free text is unreliable
Jan 2015: 20,108,253 WorldCat records with a 700 $e included for translators:
• 305,143 Tł
• 238,839 translator.
• 217,074 tr
• 179,368 Übers.
• 162,510 Traduction.
• 138,471 trad.
• 136,569 yi.
• 22,947 Trad.
30,574,365 records with 700 $4: 1,148,813 had code trl
68% of 700 fields have no $e or $4
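One way to recover from the free-text variation listed above is to normalize each $e term and map it to the single relator code trl. The Python sketch below does this for the eight variants on the slide only. The function names are illustrative, and the term list is not a complete inventory of what appears in WorldCat.

```python
import unicodedata


def _clean(term: str) -> str:
    """Normalize Unicode, drop a trailing period and whitespace, fold case."""
    text = unicodedata.normalize("NFC", term)
    return text.strip().rstrip(".").strip().casefold()


# The eight free-text variants reported on the slide.
RAW_TRANSLATOR_TERMS = [
    "Tł", "translator.", "tr", "Übers.",
    "Traduction.", "trad.", "yi.", "Trad.",
]
TRANSLATOR_TERMS = {_clean(t) for t in RAW_TRANSLATOR_TERMS}


def normalize_role(term: str):
    """Map a free-text 700 $e term to 'trl', or None if it is not recognized."""
    return "trl" if _clean(term) in TRANSLATOR_TERMS else None
```

After cleaning, "trad." and "Trad." collapse into one entry, so the eight variants reduce to seven. Short forms such as "tr" could plausibly abbreviate other roles in some catalogs, which is one reason a coded $4 is safer than a lookup table like this.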
A sound recording
Person: Yo-Yo Ma
Person: Bobby McFerrin
CreativeWork
CreativeWork
Organization
schema:performer
‘Manifestation’
‘Work’
schema:exampleOfWork
schema:contributor
The first-draft linked data model
More evidence for the model
A good example
No redundant role data
Plenty of 700 fields
Specific field semantics and easily parsed text
An obvious primary creator
Some parsing results
Organization: “Columbia Records”
schema:publisher
MusicEvent, CreativeWork: “Charles Mingus and friends”
schema:workPerformed
Person: “Charles Mingus”
schema:creator
Person: “Dizzy Gillespie”
Person: “Joe Chambers”
Person: “Bill Cosby”
schema:performer
Person: “Milt Hinton”
Person: “Charles Mingus”
drums
host
vocals
bass
bass
CreativeWork, MusicAlbum
A more expressive model
schema:encodesCreativeWork
CreativeWork, sound recording
• Use uniform titles
• Use added entries with role codes (7xx and $4)
• Use 041 for translations, including intermediate translations
• Use indicators to refine the meaning
• Use the most specific fields appropriate for a descriptive task
• Minimize the use of 500 fields
• Obey field semantics
• Avoid redundancy
If you must use free text:
• Use established conventions
• Use standardized terms
Least machine-processable
Most machine-processable
Algorithmically recoverable
Our recommendations
To make linked data work, we need…
Good data!
Structured, accurate, unambiguous, actionable and can be linked to other data.
RESOURCES
http://www.oclc.org/research/themes/data-science.html
For more information
• Godby, Carol Jean, and Ray Denenberg. 2015. Common Ground: Exploring Compatibilities Between the Linked Data Models of the Library of Congress and OCLC. Dublin, Ohio: Library of Congress and OCLC Research. http://www.oclc.org/content/dam/research/publications/2015/oclcresearch-loc-linked-data-2015.pdf
• Godby, Carol Jean, Shenghui Wang and Jeffrey K. Mixter. 2015. Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Morgan & Claypool, in press. http://www.morganclaypool.com/toc/wbe.1/1/1
• Godby, Carol Jean. “Using Schema.org for Library Resource Description,” in library linked data volume edited by Ed Jones. ALA/ALCTS, forthcoming. http://www.oclc.org/content/dam/research/publications/2015/oclcresearch-using-schema-preprint-2015.pdf
• RDA. 2015. “RDA Element Sets: Expression Properties.” http://www.rdaregistry.info/Elements/e/
• Van Malssen, Kara. 2014. BIBFRAME AV Modeling Study: Defining a Flexible Model for Description of Audiovisual Resources. http://www.loc.gov/bibframe/pdf/bibframe-avmodelingstudy-may15-2014.pdf.
Thank you!
Contact: Karen Smith-Yoshimura
Jean Godby
@KarenS-Y