INFM 700: Session 4
Metadata
Jimmy Lin
The iSchool
University of Maryland
Monday, February 18, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
iSchool
Today’s Topics
What is metadata?
Taxonomies
Thesauri
Ontologies
Putting everything together
Metadata
Taxonomies
Thesauri
Ontologies
Integration
Metadata
Literally “data about data”
“a set of data that describes and gives information about other data” ― Oxford English Dictionary
In practical terms:
Metadata helps users interpret content
Metadata helps in organization, navigation, etc.
Data without Metadata…
7/1/1988 OL 950 20.3 13 0.8 -0.1 33.1 27.8 5.3 5.92
7/2/1988 OL 950 24.2 12.6 1 -0.1 27.8 23.9 3.8 4.56
7/3/1988 OL . . . . . . . . .
7/4/1988 OL 950 0.4 16.3 0.4 0.2 41 34.5 6.5 15.5
7/5/1988 OL 1005 32.9 18.9 1.4 0.3 29.8 23.7 6.1 14.23
7/6/1988 OL 1020 32.3 20.5 1.4 0.3 23.4 18.9 4.5 12.97
7/7/1988 OL 1015 36.8 24.9 1.7 0.5 18.6 15.3 3.2 13.92
7/8/1988 OL 925 42.8 25.6 2.5 0.6 23.7 19.9 3.9 15.18
7/9/1988 OL 945 23.3 27.8 0.7 0.8 27.7 23.5 4.3 12.33
7/10/1988 OL 1030 49.8 26.2 2.6 0.6 40.3 34 6.3 22.14
7/11/1988 OL 940 44.8 25.2 2.5 0.8 34 29.2 4.8 16.76
7/12/1988 OL 1010 47.6 26.9 2.6 0.7 47.3 39.6 7.7 16.13
7/13/1988 OL 945 36.5 22.6 1.9 0.6 36.7 32.6 4 15.5
7/14/1988 OL 950 19.5 18.6 0.4 0.5 302 39.1 262.9 11.07
7/15/1988 OL 955 31.7 15.7 1.5 0.4 29.7 25 4.7 9.49
7/16/1988 OL 955 23.3 14.5 1.8 0.8 23.4 20.7 2.7 8.14
7/17/1988 OL 1015 23.8 16.6 1.6 0.6 27.7 24.1 3.7 9.17
7/18/1988 OL 934 32.9 16.7 2.1 0.7 34 28.9 5.1 9.49
7/19/1988 OL 1010 29.2 20.4 1.9 0.7 26 22.3 3.7 10.44
7/20/1988 OL 952 44.8 24.8 2.1 0.8 31.7 27.5 4.2 10.75
7/21/1988 OL 1029 33.7 37.1 1.9 0.6 34.5 30.1 4.3 12.02
7/22/1988 OL 1017 34.3 32.9 2 0.7 31.4 26.2 5.1 12.65
7/23/1988 OL 1040 35.7 24.6 2 0.8 23.7 20.4 3.3 15.5
7/24/1988 OL 923 47.6 28.9 2.9 0.8 67.3 58.9 8.4 20.87
7/25/1988 OL 1030 58.3 32.6 2.9 0.7 68 59.3 8.7 22.14
7/26/1988 OL 950 49.3 29.2 3.4 0.6 86 75.1 10.9 21.19
7/27/1988 OL 1006 54.1 20.9 3.9 0.6 94 82.8 11.2 25.06
7/28/1988 OL 1010 40.5 16.5 1.7 0.3 41 34.4 6.6 6.54
7/29/1988 OL 1000 25.5 23.6 1.4 0.1 41 35.4 5.6 3.82
7/30/1988 OL 1005 47.9 17.6 0.8 0.1 18.3 15.9 2.3 4.19
7/31/1988 OL 1015 38 22.5 1.5 0.1 30 25.3 4.7 4.44
8/1/1988 OL 1018 21.2 8.8 1.1 -0.1 24.7 21.1 3.6 4.81
8/2/1988 OL 1004 38.5 22.8 2.1 0.3 54 46.8 7.2 9.8
8/3/1988 OL 1011 94 32.6 2.1 0.3 45.5 38.9 6.6 9.49
8/4/1988 OL 955 58.3 43.1 2.5 1.1 41 33.1 7.9 9.8
8/5/1988 OL 951 55.8 42.2 2.1 0.8 38 31 7 8.86
Who: authored it? Whom to contact about the data?
What: are the contents of the database?
When: was it collected? processed? finalized?
Where: was the study done?
Why: was the data collected?
How: were the data collected? processed? verified?
… can be pretty useless!
Types of Organizations
Taxonomies: anything organized in some sort of structure
Thesauri: addition of relations between terms; emergence of “concepts”
Ontologies: model of a domain; machine-readable
Increasing complexity and richness
Menagerie of Terms
Classification
Hierarchies
Directories
Controlled vocabularies
Knowledge representations
Let’s focus on significant differences.
Let’s focus on advantages/disadvantages.
Let’s focus on how each is useful.
Let’s not quibble over what exactly to call each.
Taxonomies
Organization of objects according to some principle
Familiar examples:
Linnaean taxonomy (for living organisms)
Web directories (e.g., Yahoo or ODP)
Corporate directories
Organization charts
Organizational structures discussed previously
Thesauri: Motivation
“Semantic gap” between concepts and words
Words are used to evoke concepts:
Concrete objects: MacBook Pro, iPhone
Abstract ideas: freedom, peace
[Diagram relating Words, Concepts/Ideas, and Meaning]
To name that thing…
The semantic gap: What’s the problem?
Synonymy
Polysemy
Thesauri represent attempts to better organize mappings between words and concepts
Do these present precision or recall problems?
A slight detour…
What’s a concept?
Multiple perspectives:
Literature
Philosophy
Computer science (artificial intelligence)
Cognitive science
Harder to define than you think!
What’s a chair?
What’s a bird?
Who’s a mother?
Two Attempts
First try: necessary and sufficient conditions
Second try: prototypes
Radial Categories
A category with a central prototype…
…but with many cases deviating from it in different dimensions
Example: “Mother”
Central case: A mother who is and always has been female, and who gave birth to the child, supplied her half of the child's genes, nurtured the child, is married to the father, is one generation older than the child, and is the child's legal guardian.
Other cases: biological mother, birth mother, surrogate mother, genetic mother, stepmother, adoptive mother, foster mother, unwed mother, etc.
George Lakoff. (1987) Women, Fire and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press.
Basic Level Categories
Two opposing principles in categorization:
Desire for rich structure, ability to discriminate differences
Reduction of cognitive load
Basic level: the balance point
People learn basic level categories first
Eleanor Rosch. (1977) Classification of Real-World Objects: Origins and Representation of Cognition. Johnson-Laird and Wason, eds., Thinking.
Superordinate | Basic Level | Subordinate
Furniture | Chair | Dining chair, lawn chair, armchair, etc.
Furniture | Table | Dining table, folding table, kitchen table, etc.
Relation to IA
Any organization system must be sensitive to users’ understanding of different concepts
Examples:
What’s the difference between a laptop, PDA, phone, and convergence device?
What documents should the system retrieve when “mother” is the query?
When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?
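Standard Thesaurus Structure
Broader term: Computer
Preferred term and its synonyms (variants), i.e., AKA: Notebook, Laptop
Narrower terms: Desktop Replacement, Ultraportable, Tablet PC
Links between levels are IS-A relations: Notebook IS-A Computer; each narrower term IS-A Notebook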
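The structure above can be sketched as a small in-memory thesaurus. This is a minimal Python illustration, not a standard thesaurus format: the field names (`preferred`, `variants`, `broader`, `narrower`) are hypothetical, and treating "Notebook" as the preferred term with "Laptop" as its variant is an assumption about how the diagram reads.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Entry:
    """One thesaurus entry, keyed by its preferred term (hypothetical layout)."""
    preferred: str
    variants: set[str] = field(default_factory=set)  # AKA: synonyms / variants
    broader: set[str] = field(default_factory=set)   # broader terms (IS-A parents)
    narrower: set[str] = field(default_factory=set)  # narrower terms (IS-A children)


class Thesaurus:
    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}  # preferred term -> Entry
        self.lookup: dict[str, str] = {}     # any known term, lowercased -> preferred term

    def add(self, preferred: str, variants: Iterable[str] = (),
            broader: Iterable[str] = ()) -> None:
        entry = self.entries.setdefault(preferred, Entry(preferred))
        entry.variants.update(variants)
        self.lookup[preferred.lower()] = preferred
        for v in entry.variants:
            self.lookup[v.lower()] = preferred
        for parent in broader:
            entry.broader.add(parent)
            # Keep broader/narrower links symmetric.
            self.entries.setdefault(parent, Entry(parent)).narrower.add(preferred)
            self.lookup.setdefault(parent.lower(), parent)

    def preferred_term(self, term: str) -> Optional[str]:
        """Map a preferred term or any of its variants to the preferred term."""
        return self.lookup.get(term.lower())
```

For example, after adding Computer, then Notebook (variant Laptop, broader term Computer), then the three narrower terms under Notebook, `preferred_term("laptop")` returns `"Notebook"`. Routing every variant to one preferred term is what lets the thesaurus index and search on a single controlled term.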
Other Thesaurus Concepts
Concepts vs. instances (~ metadata vs. content)
Various relations (formal names):
Synonyms
Hyponyms/Hypernyms
Meronyms/Holonyms
…
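These formal relations come in inverse pairs: if X is a hyponym of Y, then Y is a hypernym of X; if X is a meronym (part) of Y, then Y is a holonym (whole) of X. Synonymy is symmetric. A minimal Python sketch, assuming relations are stored once as (subject, relation, object) triples and inverses are derived on lookup; the relation names and example triples are illustrative only.

```python
# Inverse pairs among the formal thesaurus relations (illustrative names).
INVERSE = {
    "hyponym_of": "hypernym_of",
    "hypernym_of": "hyponym_of",
    "meronym_of": "holonym_of",
    "holonym_of": "meronym_of",
    "synonym_of": "synonym_of",  # synonymy is symmetric
}


class RelationStore:
    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        self.triples.add((subject, relation, obj))

    def related(self, term: str, relation: str) -> set[str]:
        """Terms linked from `term` by `relation`, stored directly or via the inverse."""
        inverse = INVERSE[relation]
        direct = {o for s, r, o in self.triples if s == term and r == relation}
        derived = {s for s, r, o in self.triples if o == term and r == inverse}
        return direct | derived
```

Storing only one direction and deriving the other keeps the two sides of each pair from drifting out of sync.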
Uses of Thesauri
For organization
For navigation
For indexing content
For searching
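For searching, one common use is query expansion: map the query term to its preferred term, then add its synonyms and narrower terms. This generally trades some precision for recall. A minimal Python sketch under that assumption; the `SYNONYMS` and `NARROWER` dictionaries are hypothetical toy data.

```python
# Toy thesaurus data (hypothetical): preferred term -> variants / narrower terms.
SYNONYMS = {"notebook": {"laptop"}}
NARROWER = {
    "computer": {"notebook"},
    "notebook": {"desktop replacement", "ultraportable", "tablet pc"},
}


def preferred(term: str) -> str:
    """Map a variant to its preferred term (unchanged if unknown)."""
    term = term.lower()
    for pref, variants in SYNONYMS.items():
        if term in variants:
            return pref
    return term


def expand(term: str, depth: int = 1) -> set[str]:
    """Preferred term plus synonyms, and narrower terms down to `depth` levels."""
    root = preferred(term)
    result = {root} | SYNONYMS.get(root, set())
    frontier = {root}
    for _ in range(depth):
        # Step one level down from every term reached so far.
        frontier = set().union(*(NARROWER.get(t, set()) for t in frontier))
        result |= frontier
        for t in frontier:
            result |= SYNONYMS.get(t, set())
    return result
```

With this data, `expand("laptop")` yields notebook, laptop, and the three notebook subtypes; raising `depth` broadens the query further.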
Poly-Hierarchies
Concepts can have multiple parents
Example:
Parents: Cracow (Poland : Voivodship); German death camps
Concept: Auschwitz II-Birkenau (Poland : Death Camp)
Children: Block 25 (Auschwitz II-Birkenau); Kanada (Auschwitz II-Birkenau)
From the Shoah Foundation’s thesaurus of Holocaust terms
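In a poly-hierarchy each concept maps to a set of parents rather than a single one, so a concept can have several paths to the top. A minimal Python sketch using the example above; reading Cracow and German death camps as the two parents of Auschwitz II-Birkenau is an interpretation of the slide's diagram, and the sketch assumes the hierarchy has no cycles.

```python
# Child -> set of parents (terms shortened from the example above).
PARENTS: dict[str, set[str]] = {
    "Auschwitz II-Birkenau": {"Cracow", "German death camps"},
    "Block 25": {"Auschwitz II-Birkenau"},
    "Kanada": {"Auschwitz II-Birkenau"},
}


def paths_to_root(term: str) -> list[list[str]]:
    """Every path from `term` up to a top-level concept (assumes no cycles)."""
    parents = PARENTS.get(term, set())
    if not parents:
        return [[term]]
    return [[term] + path for p in sorted(parents) for path in paths_to_root(p)]


def ancestors(term: str) -> set[str]:
    """All concepts above `term`, across every path."""
    return {t for path in paths_to_root(term) for t in path[1:]}
```

Here `paths_to_root("Block 25")` returns two paths, one through Cracow and one through German death camps, which is exactly what a single-parent tree cannot express.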
Poly-Hierarchies
What are the advantages and disadvantages?
What’s the relationship to polysemy?
Faceted Hierarchies
Alternative to single and poly-hierarchies
Basic idea:
Describe objects along multiple facets
Each facet has its associated hierarchy
Issues:
What’s a facet?
How do you navigate faceted hierarchies?
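The basic idea above can be sketched as follows: each facet is its own small hierarchy, each object takes one value per facet, and selecting a facet value matches objects tagged with that value or anything beneath it. A minimal Python sketch; the facets ("type", "price") and items are hypothetical.

```python
from typing import Optional

# Each facet is an independent hierarchy: value -> parent value (None at the top).
FACETS: dict[str, dict[str, Optional[str]]] = {
    "type": {"furniture": None, "chair": "furniture", "table": "furniture",
             "armchair": "chair", "lawn chair": "chair"},
    "price": {"any": None, "under $100": "any", "$100+": "any"},
}

# Each item is described by one value per facet.
ITEMS = {
    "item1": {"type": "armchair", "price": "$100+"},
    "item2": {"type": "lawn chair", "price": "under $100"},
    "item3": {"type": "table", "price": "under $100"},
}


def is_under(facet: str, value: str, ancestor: str) -> bool:
    """True if `value` equals `ancestor` or lies below it in the facet's hierarchy."""
    node: Optional[str] = value
    while node is not None:
        if node == ancestor:
            return True
        node = FACETS[facet][node]
    return False


def select(**constraints: str) -> set[str]:
    """Items satisfying every facet constraint, e.g. select(type="chair")."""
    return {
        item for item, values in ITEMS.items()
        if all(is_under(f, values[f], v) for f, v in constraints.items())
    }
```

Because each constraint touches only its own facet, narrowing (adding a constraint), broadening (moving a value up its hierarchy), and shifting focus (swapping a value) are all small, independent changes.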
Faceted Browsing Example
Demo: http://flamenco.berkeley.edu/demos.html
Advantages of Facets
Integrates searching and browsing
Easy to build complex queries
Easy to narrow, broaden, shift focus
Helps users avoid getting lost
Helps to prevent “categorization wars”
Ontologies
First, a philosophical discipline:
A branch of philosophy that deals with the nature and the organization of reality
What characterizes being? What is being?
More recently, a computer science perspective:
Arose out of the desire to build smarter machines
Related concepts: knowledge representation, knowledge engineering
What is an ontology?
A computational artifact:
Symbols describing relevant concepts in a domain
Explicit assumptions regarding the meaning and usage of the symbols
A formal specification of a particular domain:
Represents a shared understanding of that domain
Must be capable of manipulation by a computer
What’s in an ontology?
Symbols representing concepts, arranged according to relevant relations
Rules or constraints governing relations between concepts
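A minimal Python sketch of those two ingredients: symbols for concepts and instances connected by named relations, plus a rule constraining which concepts each relation may connect. The concept names echo the flight example on the next slide; the instance identifiers (`UA100`, `Trip42`, `N12345`) are hypothetical. RDF Schema has a similar idea in `rdfs:domain`/`rdfs:range`, but there they drive inference rather than rejecting statements, so the hard check here is a deliberate simplification.

```python
# Instances, each typed by one concept (hypothetical identifiers).
INSTANCE_OF = {"UA100": "Flight", "Trip42": "Trip", "N12345": "Airplane"}

# Relations with the (domain, range) concepts they may connect.
RELATIONS = {
    "part_of": ("Flight", "Trip"),
    "equipment": ("Flight", "Airplane"),
}

facts: set[tuple[str, str, str]] = set()


def assert_fact(subject: str, relation: str, obj: str) -> None:
    """Record a fact only if it satisfies the relation's domain/range rule."""
    domain, range_ = RELATIONS[relation]
    if INSTANCE_OF.get(subject) != domain or INSTANCE_OF.get(obj) != range_:
        raise ValueError(f"{subject} {relation} {obj} violates {domain} -> {range_}")
    facts.add((subject, relation, obj))
```

`assert_fact("UA100", "part_of", "Trip42")` is accepted, while the reversed statement is rejected because a Trip cannot be part of a Flight under this rule.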
Relationship to IA?
[Diagram: a system built from Database, Web Server, Application Server, and Network components]
Ontologies are implicitly “hidden” here!
Example domain model:
Concepts: Flight, Trip, Airplane
Relations: Part-of, Equipment
Attributes: From, To, Departure Time, Arrival Time, Origin, Destination, Type, Capacity
Rule: Arrival Time is always after Departure Time
Rule: Distance from Origin to Destination is typically > 100 miles
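The two rules on this slide can be made explicit as checks on a flight record. A minimal Python sketch, assuming times are `datetime` values and a precomputed `distance_miles` field (a hypothetical attribute; the slide does not say how distance is obtained). Because the distance rule only says "typically", it produces a warning rather than an error.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Flight:
    origin: str
    destination: str
    departure: datetime
    arrival: datetime
    distance_miles: float  # hypothetical: assumed to be precomputed elsewhere


def check(flight: Flight) -> list[str]:
    """Raise on the hard rule; return warnings for the soft one."""
    # Use timezone-aware datetimes (or all-naive ones in one zone);
    # comparing naive with aware datetimes raises TypeError.
    if flight.arrival <= flight.departure:
        raise ValueError("Arrival Time must be after Departure Time")
    warnings = []
    if flight.distance_miles <= 100:
        warnings.append("Distance from Origin to Destination is unusually short")
    return warnings
```

Separating hard constraints (exceptions) from typical-case rules (warnings) mirrors the different strength of the two statements on the slide.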
Grand Vision
[Diagram: Ontology1, Ontology2, Ontology3, … all feeding into a General Purpose Reasoning Engine]
Really, really, really smart machines!
Putting it all together…
Two-Layer Architecture: Web Server, Database, Network
Three-Layer Architecture: Web Server, Application Server, Database, Network
Example stack: Apache, MySQL, PHP
Popular Implementation
Content and metadata: SQL Database
Presentation: PHP/HTML
Encoding Hierarchies
A
  B
  C
    D
      G
      H
    E
    F
Table: Hierarchy
Child | Parent
B | A
C | A
D | C
E | C
F | C
G | D
H | D
Finding children of A: SELECT child FROM Hierarchy WHERE parent = 'A' → B, C
Finding parent of G: SELECT parent FROM Hierarchy WHERE child = 'G' → D
Finding siblings of D: find its parent, and then find that parent's children
Store in an RDBMS
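The Hierarchy table and the three lookups above can be tried with Python's built-in `sqlite3` module as a stand-in for the RDBMS (e.g., MySQL). In this sketch the siblings lookup folds the two steps into one statement with a subquery and excludes the node itself; the function names are illustrative.

```python
import sqlite3

EDGES = [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
         ("F", "C"), ("G", "D"), ("H", "D")]


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # child is the primary key: in a single hierarchy each node has one parent.
    conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT NOT NULL)")
    conn.executemany("INSERT INTO Hierarchy (child, parent) VALUES (?, ?)", EDGES)
    return conn


def children(conn: sqlite3.Connection, node: str) -> list[str]:
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,))
    return [r[0] for r in rows]


def parent(conn: sqlite3.Connection, node: str):
    row = conn.execute("SELECT parent FROM Hierarchy WHERE child = ?", (node,)).fetchone()
    return row[0] if row else None  # None for the root, A


def siblings(conn: sqlite3.Connection, node: str) -> list[str]:
    # Find the node's parent, then that parent's other children.
    rows = conn.execute(
        "SELECT child FROM Hierarchy "
        "WHERE parent = (SELECT parent FROM Hierarchy WHERE child = ?) "
        "AND child <> ? ORDER BY child",
        (node, node))
    return [r[0] for r in rows]
```

With this data, `children(conn, "A")` gives B and C, `parent(conn, "G")` gives D, and `siblings(conn, "D")` gives E and F. A poly-hierarchy would need a composite key on (child, parent) instead.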
Encoding Metadata
Table: Items
ID | Attributes … | Label
0001 | … | B
0002 | … | B
0003 | … | C
0004 | … | D
0005 | … | D
0006 | … | E
… | … | …
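Items attach to the hierarchy through their Label column, so "everything under C" means items labeled C or any descendant of C. A sketch with `sqlite3`, using a recursive common table expression (`WITH RECURSIVE`, supported by SQLite and by MySQL 8.0+) to collect the subtree; the attribute columns are omitted and the lower-case column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT NOT NULL);
    CREATE TABLE Items (id TEXT PRIMARY KEY, label TEXT NOT NULL);
""")
conn.executemany("INSERT INTO Hierarchy VALUES (?, ?)",
                 [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
                  ("F", "C"), ("G", "D"), ("H", "D")])
conn.executemany("INSERT INTO Items VALUES (?, ?)",
                 [("0001", "B"), ("0002", "B"), ("0003", "C"),
                  ("0004", "D"), ("0005", "D"), ("0006", "E")])


def items_at(node: str) -> list[str]:
    """Items labeled exactly with `node`."""
    rows = conn.execute("SELECT id FROM Items WHERE label = ? ORDER BY id", (node,))
    return [r[0] for r in rows]


def items_under(node: str) -> list[str]:
    """Items labeled with `node` or with any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(n) AS (
            SELECT ?
            UNION
            SELECT h.child FROM Hierarchy h JOIN subtree s ON h.parent = s.n
        )
        SELECT id FROM Items WHERE label IN (SELECT n FROM subtree) ORDER BY id
    """, (node,))
    return [r[0] for r in rows]
```

`items_at("D")` returns 0004 and 0005, while `items_under("C")` also picks up 0003 (at C) and 0006 (at E).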
Content Presentation
You are here: A > C > D
Contents at D
Related
- D
- E
Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
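The "You are here: A > C > D" trail can be computed from Hierarchy(child, parent) by walking up from the current node to the root. A minimal sketch with `sqlite3` and a recursive CTE; the `depth` column is introduced only to order the path from root to node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT NOT NULL)")
conn.executemany("INSERT INTO Hierarchy VALUES (?, ?)",
                 [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
                  ("F", "C"), ("G", "D"), ("H", "D")])


def breadcrumb(node: str) -> str:
    rows = conn.execute("""
        WITH RECURSIVE path(n, depth) AS (
            SELECT ?, 0
            UNION ALL
            -- Step from the current node to its parent; stops at the root.
            SELECT h.parent, p.depth + 1
            FROM Hierarchy h JOIN path p ON h.child = p.n
        )
        SELECT n FROM path ORDER BY depth DESC
    """, (node,))
    return "You are here: " + " > ".join(r[0] for r in rows)
```

`breadcrumb("D")` returns `"You are here: A > C > D"`, matching the mock-up; the recursion ends because the root A never appears as a child.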
Faceted Browsing
Matching Results
Filter by
- Facet1 (possible values)
- Facet2 (possible values)
Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
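A faceted browsing page typically shows, next to each possible value, how many of the current matching results have it. Assuming each facet is stored as its own column in Content (a hypothetical schema in which `color` and `size` stand in for Facet1 and Facet2), one `GROUP BY` per facet over the filtered rows gives those counts. A sketch with `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Content (id INTEGER PRIMARY KEY, color TEXT, size TEXT)")
conn.executemany("INSERT INTO Content (color, size) VALUES (?, ?)",
                 [("red", "S"), ("red", "M"), ("blue", "M"), ("blue", "L"), ("red", "L")])

# Whitelist of facet columns: column names cannot be bound as SQL parameters.
FACETS = ("color", "size")


def browse(filters: dict[str, str]) -> tuple[int, dict[str, dict[str, int]]]:
    """Return (number of matching results, per-facet value counts among them)."""
    for f in filters:
        if f not in FACETS:
            raise ValueError(f"unknown facet: {f}")
    where = " AND ".join(f"{f} = ?" for f in filters) or "1 = 1"
    params = tuple(filters.values())
    total = conn.execute(
        f"SELECT COUNT(*) FROM Content WHERE {where}", params).fetchone()[0]
    counts = {}
    for facet in FACETS:
        rows = conn.execute(
            f"SELECT {facet}, COUNT(*) FROM Content WHERE {where} "
            f"GROUP BY {facet} ORDER BY {facet}", params)
        counts[facet] = dict(rows.fetchall())
    return total, counts
```

With no filters all five rows match; after filtering on `color = "red"`, three remain, one for each size. Recomputing counts after every selection keeps users from clicking into empty result sets.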