The Semantic Web in Practice: A Case Study at the Metropolitan Museum of Art
DESCRIPTION
A gentle introduction to the Semantic Web, with a focus on solving practical problems in the cultural heritage domain. Discussed in the presentation are basic Semantic Web concepts, strategies for structuring unstructured data, natural language processing, and amalgamation of multiple source data stores using inferencing. These slides originally accompanied a presentation given at the 2008 Museum Computer Network conference by Koven J. Smith and Don Undeen of the Metropolitan Museum of Art, NYC.
TRANSCRIPT
THE SEMANTIC WEB IN PRACTICE
Koven J. Smith and Don Undeen
The Metropolitan Museum of Art, NYC
Large amounts of data, multiple sources
Collections Management System
Digital Asset Management System
Bibliographic Records
Word Documents
Archival Materials
Artist Letters
Publications
Didactic Text/Labels
Madame X:
depicts Virginie Amelie Avegno Gautreau, wife of Pierre Gautreau
was first shown at the Paris Salon in 1884
is a portrait
was created by John Singer Sargent
was originally titled “Portrait de Mme ***”
is related to a portrait by Gustave Courtois, who painted the same subject
is 82.5” by 43.5”
was acquired by MMA at the same time as “Elijah On the Fiery Chariot” by William Blake
The Semantic Web
An information network in which the nodes are linked at the DATA level, rather than at the PRESENTATION level.
Primary Problems, or, um, “Goals”
1. Store our unstructured content, and harvest usable data from it
2. Pull records and documents from multiple sources together into a single, query-able data store
Structured Content
Collections Management System
Object Record
Creator Record
Creator Name: John Singer Sargent
Semantic MediaWiki
Triple
SUBJECT: “Madame X”
PREDICATE (PROPERTY): acquiredConcurrentlyWith
OBJECT: “Elijah In the Fiery Chariot”
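A triple can be sketched in a few lines of Python. This is only an illustration, not how the museum stored its data: the `Triple` type and the `objects` helper are hypothetical names, and the extra facts are taken from the "Madame X" slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# The example triple from the slide.
t = Triple("Madame X", "acquiredConcurrentlyWith", "Elijah In the Fiery Chariot")

# A graph is simply a set of triples. New facts about a subject are
# added as new triples rather than as new columns in a table.
graph = {
    t,
    Triple("Madame X", "paintedBy", "John Singer Sargent"),
    Triple("Madame X", "paintedIn", "1884"),
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return sorted(
        tr.object
        for tr in graph
        if tr.subject == subject and tr.predicate == predicate
    )
```

Calling `objects(graph, "Madame X", "paintedBy")` walks the graph for one property of one node.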
How it works
The Process
Calais accepts unstructured text and uses sophisticated NLP and machine learning techniques to return intelligent metadata
“Madame X” paintedBy John Singer Sargent
“Madame X” paintedIn 1884
As a graph: “Madame X”, John Singer Sargent, and 1884 are NODES; paintedBy and paintedIn are the PROPERTIES that connect them.
INSTANCE isA CLASS
“Madame X” isA painting
John Singer Sargent isA painter
1884 isA date
painting subClassOf artwork
painter subClassOf artist
paintedBy subPropertyOf madeBy
“Madame X” isA painting
painting subClassOf artwork
INFERRED TRIPLE: “Madame X” isA(n) artwork
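The inferred triple above can be reproduced with a very small forward-chaining loop. This is a minimal sketch assuming only two RDFS-style rules (subClassOf and subPropertyOf); the function name `infer` is hypothetical, and a real triple store or reasoner would do considerably more.

```python
def infer(triples):
    """Forward-chain two RDFS-style rules until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Rule 1: x isA C, C subClassOf D  =>  x isA D
                if p == "isA" and p2 == "subClassOf" and o == s2:
                    new.add((s, "isA", o2))
                # Rule 2: x P y, P subPropertyOf Q  =>  x Q y
                if p2 == "subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


facts = {
    ("Madame X", "isA", "painting"),
    ("John Singer Sargent", "isA", "painter"),
    ("Madame X", "paintedBy", "John Singer Sargent"),
    ("painting", "subClassOf", "artwork"),
    ("painter", "subClassOf", "artist"),
    ("paintedBy", "subPropertyOf", "madeBy"),
}

closure = infer(facts)
```

None of the three new facts (an artwork, an artist, a madeBy link) was ever entered by hand; they follow from the class and property definitions.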
ONTOLOGY
[Diagram: the complete graph of instances, classes (painting, painter, date, artwork, artist), properties (paintedBy, paintedIn, madeBy), and inferred isA links]
Using Inference for Data Integration
• In previous examples, we’ve built up an ad-hoc ontology of artists and artworks, with some Class and Property definitions.
[Diagram: the artist and artwork ontology from the previous examples]
Ontologies can also be IMPORTED from other formats, into triples.
MARC
<marc_record>
  <marc_leader>00259nz a2200109n 4500</marc_leader>
  <marc_datafield tag="245">
    <marc_subfield code="a">John Singer Sargent and the fall of Madame X.</marc_subfield>
  </marc_datafield>
</marc_record>
This portion of a MARC XML record represents a book’s title.
MediaBin
<IMAGE>
  <PARAM>
    <LABEL>Object_Title</LABEL>
    <VALUE>
      <STRING>Madame X (Madame Pierre Gautreau)</STRING>
    </VALUE>
  </PARAM>
</IMAGE>
TMS
Table OBJECTS
ID  AccNo
12  16.53
Table OBJECT_TITLES
ObjectID  Title
12  “Madame X”
Table CONXREFS
ObjectID  ConstituentID
12  33
Table CONSTITUENTS
ID  Name
33  John Singer Sargent
XML Import: MARC XML
<marc_record>
  <marc_leader>00259nz a2200109n 4500</marc_leader>
  <marc_datafield tag="245">
    <marc_subfield code="a">John Singer Sargent and the fall of Madame X.</marc_subfield>
  </marc_datafield>
</marc_record>
Element names become CLASSES: marc_record, marc_leader, marc_datafield, marc_subfield
Individual elements become INSTANCES of those classes:
marc_record1 isA marc_record
marc_leader1 isA marc_leader
marc_datafield1 isA marc_datafield
marc_subfield1 isA marc_subfield
Parent elements are connected to their children via the child property:
marc_record1 child marc_leader1
marc_record1 child marc_datafield1
marc_datafield1 child marc_subfield1
Attributes become properties:
marc_datafield1 tag “245”
marc_subfield1 code “a”
Text is connected with the text property:
marc_leader1 text “00259nz a2200109n 4500”
marc_subfield1 text “John Singer Sargent and the fall of Madame X”
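The import rules above are mechanical enough to sketch with Python's standard XML parser. This is an illustrative sketch, not the tool used in the presentation: `xml_to_triples` is a hypothetical name, and numbering instances per element name (marc_record1, marc_leader1, ...) is an assumption based on the labels in the diagram.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARC_XML = """<marc_record>
  <marc_leader>00259nz a2200109n 4500</marc_leader>
  <marc_datafield tag="245">
    <marc_subfield code="a">John Singer Sargent and the fall of Madame X.</marc_subfield>
  </marc_datafield>
</marc_record>"""


def xml_to_triples(xml_text):
    """Apply the four import rules from the slide to an XML string."""
    triples = []
    counts = Counter()  # per-element-name counter used to name instances

    def visit(elem):
        counts[elem.tag] += 1
        node = f"{elem.tag}{counts[elem.tag]}"       # e.g. marc_record1
        triples.append((node, "isA", elem.tag))      # element name -> class
        for name, value in elem.attrib.items():      # attribute -> property
            triples.append((node, name, value))
        text = (elem.text or "").strip()
        if text:                                     # text -> text property
            triples.append((node, "text", text))
        for child in elem:                           # parent -> child property
            triples.append((node, "child", visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples


triples = xml_to_triples(MARC_XML)
```

The same function works unchanged on the MediaBin record, because it knows nothing about MARC: it only follows the XML structure.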
XML Import: MediaBin XML
<IMAGE>
  <PARAM>
    <LABEL>Object_Title</LABEL>
    <VALUE>
      <STRING>Madame X (Madame Pierre Gautreau)</STRING>
    </VALUE>
  </PARAM>
</IMAGE>
This portion of a MediaBin XML record denotes an image’s title.
Classes: IMAGE, PARAM, LABEL, VALUE, STRING
IMAGE1 isA IMAGE
PARAM1 isA PARAM
LABEL1 isA LABEL
VALUE1 isA VALUE
STRING1 isA STRING
IMAGE1 child PARAM1
PARAM1 child LABEL1
PARAM1 child VALUE1
VALUE1 child STRING1
LABEL1 text “Object_Title”
STRING1 text “Madame X (Madame Pierre Gautreau)”
RDB Import: TMS
Table OBJECTS
ID  AccNo
12  16.53
Table OBJECT_TITLES
ObjectID  Title
12  “Madame X”
Table CONXREFS
ObjectID  ConstituentID
12  33
Table CONSTITUENTS
ID  Name
33  John Singer Sargent
This portion of the TMS database represents the title and artist of “Madame X”.
Tools like D2RQ (free) make it possible to do this translation in real time, from the SQL database. Data does not need to be “imported.”
Tables become CLASSES: OBJECTS, OBJECT_TITLES, CONXREFS, CONSTITUENTS
Individual rows become INSTANCES:
Object12 isA OBJECTS
ObjectTitle12 isA OBJECT_TITLES
ConXRefs1233 isA CONXREFS
Constituents33 isA CONSTITUENTS
Relational keys become properties connecting INSTANCES:
ObjectTitle12 ObjectID Object12
ConXRefs1233 ObjectID Object12
ConXRefs1233 ConstituentID Constituents33
All other columns become properties:
Object12 ID “12”
Object12 AccNo “16.53”
ObjectTitle12 Title “Madame X”
Constituents33 ID “33”
Constituents33 Name “John Singer Sargent”
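The table-to-triple rules can be sketched against an in-memory SQLite copy of the four tables. This is not D2RQ and does not use its mapping language; the `PREFIX` and `FOREIGN_KEYS` dictionaries and the `table_to_triples` function are hypothetical, and building instance names from the ID columns is an assumption drawn from the labels in the diagram (Object12, ConXRefs1233, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OBJECTS (ID INTEGER, AccNo TEXT);
CREATE TABLE OBJECT_TITLES (ObjectID INTEGER, Title TEXT);
CREATE TABLE CONXREFS (ObjectID INTEGER, ConstituentID INTEGER);
CREATE TABLE CONSTITUENTS (ID INTEGER, Name TEXT);
INSERT INTO OBJECTS VALUES (12, '16.53');
INSERT INTO OBJECT_TITLES VALUES (12, 'Madame X');
INSERT INTO CONXREFS VALUES (12, 33);
INSERT INTO CONSTITUENTS VALUES (33, 'John Singer Sargent');
""")

# Hypothetical mapping: instance-name prefix for each table, and which
# columns are relational keys (column -> prefix of the row it points to).
PREFIX = {
    "OBJECTS": "Object",
    "OBJECT_TITLES": "ObjectTitle",
    "CONXREFS": "ConXRefs",
    "CONSTITUENTS": "Constituents",
}
FOREIGN_KEYS = {"ObjectID": "Object", "ConstituentID": "Constituents"}


def table_to_triples(conn, table):
    """Tables -> classes, rows -> instances, keys and columns -> properties."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        # Name the instance from the values of its ID columns.
        keys = [str(v) for c, v in zip(columns, row) if c.endswith("ID")]
        node = PREFIX[table] + "".join(keys)
        triples.append((node, "isA", table))
        for col, value in zip(columns, row):
            if col in FOREIGN_KEYS:
                # Relational key: link to another instance.
                triples.append((node, col, f"{FOREIGN_KEYS[col]}{value}"))
            else:
                # Any other column: a property holding a literal value.
                triples.append((node, col, str(value)))
    return triples


triples = [t for table in PREFIX for t in table_to_triples(conn, table)]
```

A tool that translates in real time would run this kind of mapping at query time instead of materializing the list up front.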
[Diagram: the MARC, MediaBin, and TMS graphs shown side by side, each as imported above]
Titles
1. Image of Madame X
2. Object Madame X
3. Book with Madame X as subject
Existing Triple-Based Ontologies: CIDOC
Classes: E71.Man-Made Thing, E35.Title, E12.Production Event, E39.Actor
Instances: Thing1, Title1, Event1, Actor1
Properties: P102F.has_title, P3F.has_note, P108B.was_produced_by, P11B.participated_in, P131F.is_identified_by
Literals: “Madame X”, “John Singer Sargent”
[Diagram: the MARC graph mapped onto CIDOC]
marc_record subClassOf E31.Document
marc_subfield subClassOf E35.Title
Two MARC properties are each declared subPropertyOf a CIDOC property: P102F.has_title and P3F.has_note
After inference:
marc_record1 isA E31.Document
marc_subfield1 isA E35.Title
marc_record1 P102F.has_title marc_subfield1
marc_subfield1 P3F.has_note “John Singer Sargent and the fall of Madame X”
[Diagram: the MediaBin graph mapped onto CIDOC]
IMAGE subClassOf E38.Image
STRING subClassOf E35.Title
Two MediaBin properties are each declared subPropertyOf a CIDOC property: P102F.has_title and P3F.has_note
After inference:
IMAGE1 isA E38.Image
STRING1 isA E35.Title
IMAGE1 P102F.has_title STRING1
STRING1 P3F.has_note “Madame X (Madame Pierre Gautreau)”
[Diagram: the TMS graph mapped onto CIDOC]
OBJECTS subClassOf E71.Man-Made Thing
OBJECT_TITLES subClassOf E35.Title
CONXREFS subClassOf E12.Production Event
CONSTITUENTS subClassOf E39.Actor
The TMS properties (ObjectID, ConstituentID, Title, Name) are declared subPropertyOf CIDOC properties. The inverse properties P102B.is_title_of, P108F.produced, and P11F.had_participant are linked to their counterparts with inversePropertyOf.
After inference:
Object12 isA E71.Man-Made Thing
ObjectTitle12 isA E35.Title
ConXRefs1233 isA E12.Production Event
Constituents33 isA E39.Actor
Object12 P102F.has_title ObjectTitle12
ObjectTitle12 P3F.has_note “Madame X”
Object12 P108B.was_produced_by ConXRefs1233
ConXRefs1233 P11F.had_participant Constituents33
Constituents33 P131F.is_identified_by “John Singer Sargent”
[Diagram: all three sources merged under the CIDOC classes E71.Man-Made Thing, E35.Title, E12.Production Event, E39.Actor, E38.Image, and E31.Document. Object12, IMAGE1, and marc_record1 each reach their title text through has_title and has_note.]
SELECT DISTINCT ?found ?node ?rootNode ?rootText
WHERE {
  FILTER(fn:matches(?found, 'madame x', 'i')) .
  ?node has_note ?found .
  ?node composite:hasRootNode ?rootNode .
  ?rootNode has_title ?rootTitle .
  ?rootTitle has_note ?rootText .
}
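What the query asks for can be sketched in plain Python over a handful of merged triples. This is an illustration only, not a SPARQL engine: the `hasRootNode` triples below are assumed for the example (the slides do not list them), and `search` and `objects` are hypothetical helper names.

```python
import re

# A small slice of the merged graph. The hasRootNode links are assumptions
# standing in for composite:hasRootNode in the query.
triples = {
    ("STRING1", "has_note", "Madame X (Madame Pierre Gautreau)"),
    ("STRING1", "hasRootNode", "IMAGE1"),
    ("IMAGE1", "has_title", "STRING1"),
    ("marc_subfield1", "has_note", "John Singer Sargent and the fall of Madame X"),
    ("marc_subfield1", "hasRootNode", "marc_record1"),
    ("marc_record1", "has_title", "marc_subfield1"),
    ("ObjectTitle12", "has_note", "Madame X"),
    ("ObjectTitle12", "hasRootNode", "Object12"),
    ("Object12", "has_title", "ObjectTitle12"),
}


def objects(subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def search(text):
    """Find notes matching `text` (case-insensitive) and their root titles."""
    pattern = re.compile(text, re.IGNORECASE)
    results = set()
    for node, predicate, found in triples:
        if predicate != "has_note" or not pattern.search(found):
            continue
        for root in objects(node, "hasRootNode"):
            for title in objects(root, "has_title"):
                for root_text in objects(title, "has_note"):
                    results.add((found, node, root, root_text))
    return sorted(results)


hits = search("madame x")
```

One search term returns an image, an object record, and a book, because all three sources now share the has_title and has_note properties.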
Resources - Tools
• Installing Semantic MediaWiki using Halo - http://semanticweb.org/wiki/Halo_Extension_Installation
• D2RQ (SQL to RDF tool) - http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
• TopQuadrant - http://www.topquadrant.com/ (some of the ontology modeling for this presentation was done using TopBraid Composer)
• Protégé (nice free modeling tool) - http://protege.stanford.edu/
• Sesame (RDF triple store) - http://www.openrdf.org/
• Mulgara (RDF triple store) - http://www.mulgara.org/
Resources – Further Reading
• Dean Allemang & Jim Hendler, Semantic Web for the Working Ontologist
• RDF Primer - http://www.w3.org/TR/REC-rdf-syntax/
• SPARQL - http://www.w3.org/TR/rdf-sparql-query/
• Jena (application framework) - http://jena.sourceforge.net/
Additional Resources
Semantic Museum discussion group: http://groups.google.com/group/semuse
Semantic Museum wiki: http://semuse.org
These slides: http://kovenjsmith.com/pres/mcn_2008.ppt