
INFM 700: Session 4

Metadata

Jimmy Lin
The iSchool
University of Maryland

Monday, February 18, 2008

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details

iSchool

Today’s Topics

What is metadata?

Taxonomies

Thesauri

Ontologies

Putting everything together

Metadata

Taxonomies

Thesauri

Ontologies

Integration


Metadata

Literally “data about data”

“a set of data that describes and gives information about other data” (Oxford English Dictionary)

In practical terms:
  Metadata helps users interpret content
  Metadata helps in organization, navigation, etc.


Data without Metadata…

7/1/1988 OL 950 20.3 13 0.8 -0.1 33.1 27.8 5.3 5.92
7/2/1988 OL 950 24.2 12.6 1 -0.1 27.8 23.9 3.8 4.56
7/3/1988 OL . . . . . . . . .
7/4/1988 OL 950 0.4 16.3 0.4 0.2 41 34.5 6.5 15.5
7/5/1988 OL 1005 32.9 18.9 1.4 0.3 29.8 23.7 6.1 14.23
7/6/1988 OL 1020 32.3 20.5 1.4 0.3 23.4 18.9 4.5 12.97
7/7/1988 OL 1015 36.8 24.9 1.7 0.5 18.6 15.3 3.2 13.92
7/8/1988 OL 925 42.8 25.6 2.5 0.6 23.7 19.9 3.9 15.18
7/9/1988 OL 945 23.3 27.8 0.7 0.8 27.7 23.5 4.3 12.33
7/10/1988 OL 1030 49.8 26.2 2.6 0.6 40.3 34 6.3 22.14
7/11/1988 OL 940 44.8 25.2 2.5 0.8 34 29.2 4.8 16.76
7/12/1988 OL 1010 47.6 26.9 2.6 0.7 47.3 39.6 7.7 16.13
7/13/1988 OL 945 36.5 22.6 1.9 0.6 36.7 32.6 4 15.5
7/14/1988 OL 950 19.5 18.6 0.4 0.5 302 39.1 262.9 11.07
7/15/1988 OL 955 31.7 15.7 1.5 0.4 29.7 25 4.7 9.49
7/16/1988 OL 955 23.3 14.5 1.8 0.8 23.4 20.7 2.7 8.14
7/17/1988 OL 1015 23.8 16.6 1.6 0.6 27.7 24.1 3.7 9.17
7/18/1988 OL 934 32.9 16.7 2.1 0.7 34 28.9 5.1 9.49
7/19/1988 OL 1010 29.2 20.4 1.9 0.7 26 22.3 3.7 10.44
7/20/1988 OL 952 44.8 24.8 2.1 0.8 31.7 27.5 4.2 10.75
7/21/1988 OL 1029 33.7 37.1 1.9 0.6 34.5 30.1 4.3 12.02
7/22/1988 OL 1017 34.3 32.9 2 0.7 31.4 26.2 5.1 12.65
7/23/1988 OL 1040 35.7 24.6 2 0.8 23.7 20.4 3.3 15.5
7/24/1988 OL 923 47.6 28.9 2.9 0.8 67.3 58.9 8.4 20.87
7/25/1988 OL 1030 58.3 32.6 2.9 0.7 68 59.3 8.7 22.14
7/26/1988 OL 950 49.3 29.2 3.4 0.6 86 75.1 10.9 21.19
7/27/1988 OL 1006 54.1 20.9 3.9 0.6 94 82.8 11.2 25.06
7/28/1988 OL 1010 40.5 16.5 1.7 0.3 41 34.4 6.6 6.54
7/29/1988 OL 1000 25.5 23.6 1.4 0.1 41 35.4 5.6 3.82
7/30/1988 OL 1005 47.9 17.6 0.8 0.1 18.3 15.9 2.3 4.19
7/31/1988 OL 1015 38 22.5 1.5 0.1 30 25.3 4.7 4.44
8/1/1988 OL 1018 21.2 8.8 1.1 -0.1 24.7 21.1 3.6 4.81
8/2/1988 OL 1004 38.5 22.8 2.1 0.3 54 46.8 7.2 9.8
8/3/1988 OL 1011 94 32.6 2.1 0.3 45.5 38.9 6.6 9.49
8/4/1988 OL 955 58.3 43.1 2.5 1.1 41 33.1 7.9 9.8
8/5/1988 OL 951 55.8 42.2 2.1 0.8 38 31 7 8.86

Who: authored it? to contact about data?

What: are contents of database?

When: was it collected? processed? finalized?

Where: was the study done?

Why: was the data collected?

How: were data collected? processed? verified?

… can be pretty useless!


Early Example of Metadata


Types of Organizations

Taxonomies
  Anything organized in some sort of structure

Thesauri
  Addition of relations between terms
  Emergence of “concepts”

Ontologies
  Model of a domain
  Machine-readable

Increasing complexity and richness


Menagerie of Terms

Classification

Hierarchies

Directories

Controlled vocabularies

Knowledge representations

Let’s focus on significant differences.
Let’s focus on advantages/disadvantages.
Let’s focus on how each is useful.
Let’s not quibble over what exactly to call each.


Taxonomies

Organization of objects according to some principle

Familiar examples:
  Linnaean taxonomy (for living organisms)
  Web directories (e.g., Yahoo or ODP)
  Corporate directories
  Organization charts
  Organizational structures previously discussed


Thesauri: Motivation

“Semantic gap” between concepts and words

Words are used to evoke concepts
  Concrete objects: MacBook Pro, iPhone
  Abstract ideas: freedom, peace

[Diagram labels: Concepts, Words, Ideas, Meaning]


To name that thing…

The semantic gap: What’s the problem?
  Synonymy
  Polysemy

Thesauri represent attempts to better organize mappings between words and concepts

Do these present precision or recall problems?
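One way to make the question above concrete is a toy exact-match search. This is only a sketch: the four documents, the relevance judgments, and the `search` and `precision_recall` helpers are all made up for illustration. It suggests that synonymy tends to show up as a recall problem (relevant documents use a different word), while polysemy tends to show up as a precision problem (the same word also retrieves unrelated documents).

```python
# Toy collection: doc id -> text (all made up for illustration).
docs = {
    1: "new laptop with long battery life",
    2: "lightweight notebook computer for travel",
    3: "spiral notebook with ruled paper",
    4: "jaguar sighted in the rainforest",
}

def search(term):
    """Return ids of documents containing the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Information need: portable computers. Relevant docs (by assumption): 1 and 2.
relevant = {1, 2}

# Synonymy: "laptop" misses doc 2, which says "notebook", so recall drops.
laptop_p, laptop_r = precision_recall(search("laptop"), relevant)

# Polysemy: "notebook" also matches the paper notebook in doc 3, so precision drops.
notebook_p, notebook_r = precision_recall(search("notebook"), relevant)

print("laptop:   precision", laptop_p, "recall", laptop_r)
print("notebook: precision", notebook_p, "recall", notebook_r)
```

A thesaurus that maps both words to one concept would let the system retrieve docs 1 and 2 for either query; telling the two senses of "notebook" apart needs something more, such as indexing by concept rather than by word.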


A slight detour…

What’s a concept?

Multiple perspectives
  Literature
  Philosophy
  Computer science (artificial intelligence)
  Cognitive science

Harder to define than you think!
  What’s a chair?
  What’s a bird?
  Who’s a mother?


Two Attempts

First try: necessary and sufficient conditions

Second try: prototypes


Radial Categories

A category with a central prototype…

But has many cases deviating in different dimensions

Example: “Mother”

Central case: A mother who is and always has been female, and who gave birth to the child, supplied her half of the child's genes, nurtured the child, is married to the father, is one generation older than the child, and is the child's legal guardian.

Other cases: biological mother, birth mother, surrogate mother, genetic mother, stepmother, adoptive mother, foster mother, unwed mother, etc.

George Lakoff. (1987) Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press.

Basic Level Categories

Two opposing principles in categorization
  Desire for rich structure, ability to discriminate differences
  Reduction of cognitive load

Basic level: the balance point

People learn basic level categories first

Superordinate   Basic Level   Subordinate
Furniture       Chair         Dining chair, lawn chair, armchair, etc.
Furniture       Table         Dining table, folding table, kitchen table, etc.

Eleanor Rosch. (1977) Classification of Real-World Objects: Origins and Representation of Cognition. Johnson-Laird and Wason, eds., Thinking.


Relation to IA

Any organization system must be sensitive to users’ understanding of different concepts

Examples:
  What’s the difference between laptop, PDA, phone, and convergence device?
  What documents should the system retrieve when “mother” is the query?
  When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?


Standard Thesaurus Structure

[Diagram: the preferred term Notebook, with the synonym (variant) Laptop attached by an AKA link; an IS-A link up to the broader term Computer; and IS-A links from the narrower terms Desktop Replacement, Ultraportable, and Tablet PC]
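The structure on this slide can be sketched as a small data structure. The `Term` and `Thesaurus` classes below are hypothetical, and the choice of Notebook as the preferred term with Laptop as its variant is an assumed reading of the diagram; the point is only that every label resolves to one preferred term, and that broader and narrower links are inverses of each other.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One thesaurus entry, keyed by its preferred label."""
    preferred: str
    synonyms: set = field(default_factory=set)   # AKA / variant labels
    broader: set = field(default_factory=set)    # IS-A parents
    narrower: set = field(default_factory=set)   # IS-A children

class Thesaurus:
    def __init__(self):
        self.terms = {}    # preferred label -> Term
        self.lookup = {}   # any label (lowercased) -> preferred label

    def add(self, preferred, synonyms=(), broader=()):
        """Add a term. In this sketch, broader terms must already exist."""
        term = Term(preferred, set(synonyms), set(broader))
        self.terms[preferred] = term
        for label in (preferred, *synonyms):
            self.lookup[label.lower()] = preferred
        # Keep the inverse (narrower) links in sync with the broader links.
        for parent in broader:
            self.terms[parent].narrower.add(preferred)
        return term

    def resolve(self, label):
        """Map a preferred label or a variant to the preferred label."""
        return self.lookup.get(label.lower())

thesaurus = Thesaurus()
thesaurus.add("Computer")
thesaurus.add("Notebook", synonyms=["Laptop"], broader=["Computer"])
for name in ["Desktop Replacement", "Ultraportable", "Tablet PC"]:
    thesaurus.add(name, broader=["Notebook"])

print(thesaurus.resolve("laptop"))                      # Notebook
print(sorted(thesaurus.terms["Notebook"].narrower))
```

Indexing content under the preferred label and resolving query words through `resolve` is one way a thesaurus can address the synonymy problem.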


Other Thesaurus Concepts

Concepts vs. Instances
  ~ metadata vs. content

Various relations (formal names)
  Synonyms
  Hyponyms/Hypernyms
  Meronyms/Holonyms
  …


Uses of Thesauri

For organization

For navigation

For indexing content

For searching


Poly-Hierarchies

Concepts can have multiple parents

Example:

Cracow (Poland : Voivodship)

Auschwitz II-Birkenau (Poland : Death Camp)

Block 25 (Auschwitz II-Birkenau)

German death camps

Kanada (Auschwitz II-Birkenau)

From the Shoah Foundation’s thesaurus of Holocaust terms
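A poly-hierarchy can be sketched as a mapping from each term to a set of parents rather than a single parent. The edges below are an assumed reading of the example (the camp sits under both the region and the category, and the two sub-locations sit under the camp); the `add_edge` and `ancestors` helpers are hypothetical.

```python
from collections import defaultdict

# child -> set of parents; a poly-hierarchy allows more than one parent.
parents = defaultdict(set)

def add_edge(child, parent):
    parents[child].add(parent)

CAMP = "Auschwitz II-Birkenau (Poland : Death Camp)"

# Edges are an assumed reading of the slide's diagram.
add_edge(CAMP, "Cracow (Poland : Voivodship)")
add_edge(CAMP, "German death camps")
add_edge("Block 25 (Auschwitz II-Birkenau)", CAMP)
add_edge("Kanada (Auschwitz II-Birkenau)", CAMP)

def ancestors(term):
    """All terms reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

block_25_ancestors = ancestors("Block 25 (Auschwitz II-Birkenau)")
print(sorted(block_25_ancestors))
```

With two parents, the same term can be reached by browsing either by place or by type of site, at the cost of no longer having a single path back to the root.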


Poly-Hierarchies

What are the advantages and disadvantages?

What’s the relationship to polysemy?


Faceted Hierarchies

Alternative to single and poly-hierarchies

Basic idea:
  Describe objects along multiple facets
  Each facet has its associated hierarchy

Issues:
  What’s a facet?
  How do you navigate faceted hierarchies?
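The basic idea can be sketched in a few lines. Everything here is hypothetical: the two facets (`location`, `medium`), their small hierarchies, the three items, and the `is_under` and `select` helpers. Each item gets one value per facet, each facet keeps its own hierarchy, and a selection is a conjunction of "at or below this value" constraints.

```python
# Each facet has its own small hierarchy: value -> parent value (None at the top).
facet_hierarchies = {
    "location": {"Europe": None, "France": "Europe", "Italy": "Europe",
                 "Asia": None, "Japan": "Asia"},
    "medium": {"Image": None, "Painting": "Image", "Photograph": "Image"},
}

# Each item is described independently along every facet.
items = [
    {"id": 1, "location": "France", "medium": "Painting"},
    {"id": 2, "location": "Italy", "medium": "Photograph"},
    {"id": 3, "location": "Japan", "medium": "Painting"},
]

def is_under(facet, value, ancestor):
    """True if value equals ancestor or sits below it in the facet's hierarchy."""
    while value is not None:
        if value == ancestor:
            return True
        value = facet_hierarchies[facet][value]
    return False

def select(**constraints):
    """Ids of items matching every facet constraint (a conjunction)."""
    return [
        item["id"]
        for item in items
        if all(is_under(f, item[f], v) for f, v in constraints.items())
    ]

european = select(location="Europe")
european_paintings = select(location="Europe", medium="Painting")
all_images = select(medium="Image")
print(european, european_paintings, all_images)   # [1, 2] [1] [1, 2, 3]
```

Dropping a constraint broadens the result set and adding one narrows it, which is the navigation pattern faceted browsing interfaces build on.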


Faceted Browsing Example


Demo: http://flamenco.berkeley.edu/demos.html


Advantages of Facets

Integrates searching and browsing

Easy to build complex queries

Easy to narrow, broaden, shift focus

Helps users avoid getting lost

Helps to prevent “categorization wars”


Ontologies

First, a philosophical discipline:
  A branch of philosophy that deals with the nature and the organization of reality
  What characterizes being?
  What is being?

More recently, computer science perspective
  Arose out of desire to build smarter machines
  Related concepts: knowledge representation, knowledge engineering

What is an ontology?

A computational artifact:
  Symbols describing relevant concepts in a domain
  Explicit assumptions regarding the meaning and usage of the symbols

A formal specification of a particular domain:
  Represents shared understanding of that domain
  Must be capable of manipulation by a computer


What’s in an ontology?

Symbols representing concepts arranged according to relevant relations

Rules or constraints governing relations between concepts


Relationship to IA?

[Diagram: Database, Web Server, Application Server, Network]

Ontologies are implicitly “hidden” here!!!

[Diagram: the concepts Trip, Flight, Airplane, and Equipment, with a Part-of relation among them and the attribute slots From, To, Departure Time, Arrival Time, Origin, Destination, Type, and Capacity]

Rule: Arrival Time is always after Departure Time

Rule: Distance from Origin to Destination is typically > 100 miles
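The two rules above are the kind of domain knowledge that usually ends up buried in application code. A minimal sketch, assuming the Flight concept carries the time, origin, and destination slots and refers to an Airplane as its equipment (which slot belongs to which concept is a guess), and assuming both times are in the same time zone. The class names, the `distance_miles` field, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Airplane:
    aircraft_type: str
    capacity: int

@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: datetime   # assumed to share a time zone with arrival_time
    arrival_time: datetime
    equipment: Airplane
    distance_miles: float      # hypothetical field, needed to state the second rule

    def violations(self):
        """Hard rules ("always"): return the ones this flight breaks."""
        problems = []
        if self.arrival_time <= self.departure_time:
            problems.append("arrival time must be after departure time")
        return problems

    def warnings(self):
        """Soft rules ("typically"): return expectations this flight departs from."""
        notes = []
        if self.distance_miles <= 100:
            notes.append("distance is unusually short (typically > 100 miles)")
        return notes

# Made-up sample values for illustration.
plane = Airplane(aircraft_type="narrow-body jet", capacity=150)
good = Flight("BWI", "BOS", datetime(2008, 2, 18, 9, 0),
              datetime(2008, 2, 18, 10, 30), plane, 370.0)
bad = Flight("BWI", "BOS", datetime(2008, 2, 18, 9, 0),
             datetime(2008, 2, 18, 8, 0), plane, 370.0)

print(good.violations(), good.warnings())
print(bad.violations())
```

Separating "always" rules from "typically" rules mirrors the wording of the two rules: the first is a constraint, the second only an expectation.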


Grand Vision

[Diagram: Ontology1, Ontology2, Ontology3, and a General Purpose Reasoning Engine]

Really, really, really smart machines!


Putting it all together…

Two-Layer Architecture: Database, Web Server, Network

Three-Layer Architecture: Database, Web Server, Application Server, Network

Apache, MySQL, PHP


Popular Implementation

Content and Metadata: SQL Database

Presentation: PHP/HTML


Encoding Hierarchies

A

B C

D E F

G H

Table: Hierarchy

Child   Parent
B       A
C       A
D       C
E       C
F       C
G       D
H       D

Finding children of A:
SELECT child FROM Hierarchy WHERE parent = 'A';
Result: B, C

Finding parent of G:
SELECT parent FROM Hierarchy WHERE child = 'G';
Result: D

Finding siblings of D: find its parent, and then find that parent's children

Store in RDBMS
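The three lookups can be tried end to end with an in-memory SQLite database. This is a sketch, not the course's actual setup (which uses MySQL and PHP): the helper function names are made up, and the sibling query is one possible way to combine "find the parent, then its children" into a single self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY on child means each node has exactly one parent (a strict hierarchy).
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO Hierarchy (child, parent) VALUES (?, ?)",
    [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
     ("F", "C"), ("G", "D"), ("H", "D")],
)

def children_of(node):
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,)
    )
    return [row[0] for row in rows]

def parent_of(node):
    row = conn.execute(
        "SELECT parent FROM Hierarchy WHERE child = ?", (node,)
    ).fetchone()
    return row[0] if row else None   # the root has no row as a child

def siblings_of(node):
    # Self-join: rows sharing the node's parent, excluding the node itself.
    rows = conn.execute(
        """
        SELECT sib.child
        FROM Hierarchy AS me
        JOIN Hierarchy AS sib ON sib.parent = me.parent
        WHERE me.child = ? AND sib.child <> ?
        ORDER BY sib.child
        """,
        (node, node),
    )
    return [row[0] for row in rows]

print(children_of("A"))   # ['B', 'C']
print(parent_of("G"))     # D
print(siblings_of("D"))   # ['E', 'F']
```

Supporting a poly-hierarchy in this layout would mean dropping the primary key on `child` so that a node can appear in several rows.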


Encoding Metadata

A

B C

D E F

G H

Table: Items

ID     Attributes …   Label
0001                  B
0002                  B
0003                  C
0004                  D
0005                  D
0006                  E
…                     …


Content Presentation

A

B C

D E F

G H

You are here: A > C > D

Contents at D

Related - D - E

Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
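The "You are here" trail and the contents of the current node can both be derived from these two tables. A minimal sketch with SQLite, assuming the Content table carries a `label` column that names the hierarchy node each item is filed under (as in the Items table shown earlier); the `breadcrumb` and `contents_at` helpers are hypothetical, and the walk assumes the hierarchy has no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
conn.execute("CREATE TABLE Content (id TEXT PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO Hierarchy VALUES (?, ?)",
    [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
     ("F", "C"), ("G", "D"), ("H", "D")],
)
conn.executemany(
    "INSERT INTO Content VALUES (?, ?)",
    [("0001", "B"), ("0002", "B"), ("0003", "C"),
     ("0004", "D"), ("0005", "D"), ("0006", "E")],
)

def breadcrumb(node):
    """Walk parent links up to the root and return the path root-first."""
    path = [node]
    while True:
        row = conn.execute(
            "SELECT parent FROM Hierarchy WHERE child = ?", (path[-1],)
        ).fetchone()
        if row is None:        # reached the root: it is nobody's child
            break
        path.append(row[0])
    return " > ".join(reversed(path))

def contents_at(node):
    """Ids of the items filed directly under this node."""
    rows = conn.execute(
        "SELECT id FROM Content WHERE label = ? ORDER BY id", (node,)
    )
    return [row[0] for row in rows]

print("You are here:", breadcrumb("D"))     # You are here: A > C > D
print("Contents at D:", contents_at("D"))   # Contents at D: ['0004', '0005']
```

The breadcrumb costs one query per level, which is usually acceptable for shallow hierarchies; deeper ones might store the full path alongside each node instead.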


Faceted Browsing

Matching Results

Filter by - Facet1

(possible values)

- Facet2

(possible values)

Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
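One way the "Matching Results" and "Filter by" panels could be driven from the Content table is sketched below with SQLite. The columns `title`, `color`, and `material`, the sample rows, and the helper functions are all hypothetical; here the attribute columns themselves act as flat facets, and the "possible values" for each facet come from a GROUP BY over the current result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Content table: two attribute columns serve as facets.
conn.execute(
    "CREATE TABLE Content (id INTEGER PRIMARY KEY, title TEXT, color TEXT, material TEXT)"
)
conn.executemany(
    "INSERT INTO Content (title, color, material) VALUES (?, ?, ?)",
    [
        ("Dining chair", "brown", "wood"),
        ("Lawn chair", "white", "plastic"),
        ("Armchair", "brown", "leather"),
        ("Folding table", "white", "plastic"),
    ],
)

FACETS = ("color", "material")  # whitelist: column names never come from user input

def where_clause(selected):
    """Build a WHERE clause and parameter list from {facet: value} selections."""
    for facet in selected:
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
    if not selected:
        return "", []
    clause = " WHERE " + " AND ".join(f"{facet} = ?" for facet in selected)
    return clause, list(selected.values())

def matching_results(selected):
    clause, params = where_clause(selected)
    rows = conn.execute(f"SELECT title FROM Content{clause} ORDER BY title", params)
    return [row[0] for row in rows]

def facet_counts(facet, selected):
    """Possible values of one facet, with counts, given the current selections."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    clause, params = where_clause(selected)
    rows = conn.execute(
        f"SELECT {facet}, COUNT(*) FROM Content{clause} GROUP BY {facet} ORDER BY {facet}",
        params,
    )
    return rows.fetchall()

results = matching_results({"color": "white"})
counts = facet_counts("material", {"color": "white"})
print(results)   # ['Folding table', 'Lawn chair']
print(counts)    # [('plastic', 2)]
```

Showing counts next to each value lets users see which further filters still have results. Facets with their own hierarchies would additionally need the Hierarchy table, which this sketch leaves out.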


Today’s Topics

What is metadata?

Taxonomies

Thesauri

Ontologies

Putting everything together
