How AI Programs Collect Concepts. Ernie Davis, Concept and Categories Seminar, Sept 25, 2015


Page 1: How AI Programs Collect Concepts Ernie Davis Concept and Categories Seminar Sept 25, 2015

How AI Programs Collect Concepts

Ernie Davis

Concept and Categories Seminar

Sept 25, 2015

Outline

Part I: Case studies.

Systems: Probase, NELL, ConceptNet, WordNet, WikiNet, CYC.

Issues: What is a concept? How are concepts collected? How are taxonomic and other relations found? Polysemy & synonymy.

System features: Size, evaluation, uses.

Part II: Relation to cog. psych. concepts

What do AI people want?

Cognitive plausibility? Not much. Happy to note similarities if they come up.

Philosophical coherence? No! You can quote Plato or Quine if you want to pretend to be erudite. Otherwise, reading those guys will just leave you hopelessly confused.

Human or superhuman level AI in 20-50 years? Less than you would suppose. Mostly entertainment for journalists.

What do AI people want?

To build, today, a system that does something.

Probase

Wu et al. 2012 (Microsoft Research)

2.6 million concepts (categories). 20.7 million isA relations. 92.8% accuracy.

Concept: A noun (English) or a two- or three-word noun phrase. E.g. “country”, “city”, “renewable energy technology”, “common sleep disorder”, “meteorological phenomenon”.

Probase

Taxonomy: From Hearst patterns (Hearst 1992). “Countries such as England, France and Germany”, “Bears, lions, and other animals”.

Pitfalls: “animals other than dogs such as cats”, “companies such as IBM, Nokia, Procter and Gamble”, “Europe, Brazil, China, and other countries”, “dead animals such as dogs and cats”.
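A minimal sketch of pattern-based extraction, assuming two deliberately naive regular expressions for the "such as" and "and other" patterns. The names `SUCH_AS`, `AND_OTHER`, `split_items`, and `extract_isa` are hypothetical; this is not Probase's extractor, only an illustration of why the pitfalls arise.

```python
import re

# Naive versions of two Hearst-style patterns (illustrative assumption:
# the hypernym is the single word next to the pattern).
SUCH_AS = re.compile(r"(?P<hyper>\w+) such as (?P<items>.+)")
AND_OTHER = re.compile(r"(?P<items>.+?),? and other (?P<hyper>\w+)")


def split_items(text):
    """Split 'England, France and Germany' on commas and on 'and'."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text.strip(" ."))
    return [p for p in parts if p]


def extract_isa(sentence):
    """Return the (instance, hypernym) pairs proposed by the two patterns."""
    m = SUCH_AS.search(sentence) or AND_OTHER.search(sentence)
    if m is None:
        return []
    return [(item, m.group("hyper")) for item in split_items(m.group("items"))]


good = extract_isa("Countries such as England, France and Germany")
# Pitfall: the nearest noun is taken as the hypernym, so "cats" become "dogs".
bad_scope = extract_isa("animals other than dogs such as cats")
# Pitfall: splitting on "and" breaks a company name in two.
bad_split = extract_isa("companies such as IBM, Nokia, Procter and Gamble")
```

On the clean example the sketch proposes England, France, and Germany as instances of "Countries". On the pitfall sentences it proposes that cats are dogs and that "Procter" and "Gamble" are separate companies, which is the kind of noise a later filtering step has to remove.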

Probase

Use statistical information to rule out false concepts and taxonomic relations. E.g. there are many instances of “dog” and “cat” and only a few texts that suggest that “cat” is a subcategory of “dog”.
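The idea can be sketched as a simple support ratio. This is a stand-in for illustration, not Probase's actual probabilistic model; the function `filter_isa`, the threshold `min_ratio`, and all counts below are made up.

```python
from collections import Counter


def filter_isa(pair_counts, term_counts, min_ratio=0.01):
    """Keep (sub, super) pairs whose support is not negligible
    compared with how often the sub term is seen overall."""
    kept = {}
    for (sub, sup), n in pair_counts.items():
        ratio = n / term_counts[sub]
        if ratio >= min_ratio:
            kept[(sub, sup)] = ratio
    return kept


# Hypothetical counts: "cat" and "dog" are frequent, but very few
# extractions claim that a cat is a kind of dog.
term_counts = Counter({"cat": 10_000, "dog": 12_000})
pair_counts = Counter({
    ("cat", "animal"): 900,
    ("dog", "animal"): 1_100,
    ("cat", "dog"): 3,
})
plausible = filter_isa(pair_counts, term_counts)
```

With these numbers the pair ("cat", "dog") is dropped while both "animal" pairs survive.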

Probase: Polysemy

Example: “plants such as refineries and nuclear reactors” vs. “plants such as trees, grass, and cacti”.

Rule 1: In each text the hypernym is used in only one sense; you don’t have “plants such as refineries and trees”.

Rule 2: If two lists have substantial overlap, then the hypernyms are the same. E.g. “plants such as trees, grass, and cacti” and “plants such as grass, bushes, and trees.”

Use these and similar rules to group together word uses into meanings.
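The two rules can be sketched as a greedy merge over instance lists. The function `group_senses` and the `min_overlap` threshold are assumptions made for illustration; the slide does not say how "substantial overlap" is measured.

```python
def group_senses(instance_lists, min_overlap=2):
    """Greedily merge the instance lists seen with one hypernym into senses.

    Rule 1 lets each list be treated as a single sense. Rule 2 is applied
    as: a list joins the first sense it shares at least `min_overlap`
    items with; otherwise it starts a new sense.
    """
    senses = []
    for items in instance_lists:
        items = set(items)
        for sense in senses:
            if len(sense & items) >= min_overlap:
                sense |= items  # in-place update of the stored set
                break
        else:
            senses.append(items)
    return senses


# The three "plants such as ..." lists quoted above.
plant_lists = [
    ["refineries", "nuclear reactors"],
    ["trees", "grass", "cacti"],
    ["grass", "bushes", "trees"],
]
senses = group_senses(plant_lists)
```

Here the two botanical lists share "trees" and "grass" and merge into one sense, while the industrial list stays separate.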

NELL (Never-Ending Language Learner)

Mitchell et al. 2015 (Carnegie Mellon).

Using web mining, constructs a taxonomy of concepts, and collects facts corresponding to a fixed set of relations.

80 million beliefs (facts).

NELL: Concepts

A concept is a noun (I think). Instances are proper nouns. No attempt to address polysemy.

Features used for categorization, learned in a snowballing way.

Name form: E.g. “…burgh” is a city.

Context: E.g. “mayor of X” → X is a city.

Lists and tables: If you see a list “London, Paris, Prague” and you know that London and Paris are cities, infer that Prague is a city.
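The list heuristic, and the way it snowballs, can be sketched as follows. This covers only the "lists and tables" feature; the function `bootstrap_from_lists`, its thresholds, and the example lists beyond “London, Paris, Prague” are hypothetical and are not NELL's implementation.

```python
def bootstrap_from_lists(seeds, lists, min_known=2, max_rounds=10):
    """Grow a set of category instances from co-occurrence in lists.

    In each round, any list containing at least `min_known` already-known
    instances contributes its remaining items. New items only count as
    known from the next round on, so the set grows step by step.
    """
    known = set(seeds)
    for _ in range(max_rounds):
        added = set()
        for items in lists:
            if len(known.intersection(items)) >= min_known:
                added.update(set(items) - known)
        if not added:
            break
        known |= added
    return known


seen_lists = [
    ["London", "Paris", "Prague"],
    ["Paris", "Prague", "Vienna"],
    ["apple", "pear", "Vienna"],
]
cities = bootstrap_from_lists({"London", "Paris"}, seen_lists)
```

Prague is learned in the first round and Vienna in the second, because the second list only qualifies once Prague is known. The fruit list never reaches the threshold, though a lower `min_known` would let "apple" in, which is how snowballing can drift.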

NELL: Taxonomy

Seems to be largely accurate though lopsided (e.g. 9047 instances of “amphibian” vs. 0 instances of “poem”).

NELL: Facts

Hit and miss. Mostly unexciting. I did an experiment of collecting 110 facts from NELL.

64 were true.

16 were nearly true, e.g. “Dublin Dublin is the capital city of the country Ireland”.

14 were false, e.g. “dipodidae is an arthropod” (actually a rodent).

5 were hopelessly vague, e.g. “David is a person who died at age 10.”

11 were meaningless, e.g. “states is a state or province located in the geopolitical location field”.
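As a quick check of the tally, the counts can be turned into percentages. The variable names are illustrative, and the "lenient" figure simply counts nearly true facts as correct, which is a choice made here rather than one stated on the slide.

```python
# Counts copied from the tally above.
tally = {
    "true": 64,
    "nearly true": 16,
    "false": 14,
    "hopelessly vague": 5,
    "meaningless": 11,
}
total = sum(tally.values())  # 110, matching the number of facts collected

# Share of each label, in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in tally.items()}

strict_accuracy = shares["true"]  # 58.2
lenient_accuracy = round(100 * (tally["true"] + tally["nearly true"]) / total, 1)  # 72.7
```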

ConceptNet

• Starts with the collection of facts amassed by Open Mind Common Sense, a crowd-sourcing platform.

• “Concepts … could be noun phrases, verb phrases, adjective phrases, or clauses. ConceptNet defines concepts as the equivalence class of phrases after normalization removing function words, pronouns, and inflections.”
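A toy version of this normalization, assuming tiny hand-made word lists in place of a real stop-word list and lemmatizer. `FUNCTION_WORDS`, `PRONOUNS`, `LEMMAS`, and `normalize` are hypothetical names and are not ConceptNet code.

```python
# Tiny illustrative word lists; a real system would use far larger ones.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "my", "your", "their"}
# Hand-made lookup table standing in for a lemmatizer.
LEMMAS = {"eating": "eat", "eats": "eat", "ate": "eat", "apples": "apple"}


def normalize(phrase):
    """Map a phrase to its normalized form: lowercase, drop function
    words and pronouns, and replace inflected forms with their lemma."""
    tokens = phrase.lower().split()
    kept = [
        LEMMAS.get(t, t)
        for t in tokens
        if t not in FUNCTION_WORDS and t not in PRONOUNS
    ]
    return " ".join(kept)


# Phrases with the same normalized form fall into one equivalence class,
# i.e. one concept.
phrases = ["eating an apple", "eat the apples", "ate my apple"]
concepts = {normalize(p) for p in phrases}
```

All three phrases normalize to "eat apple", so under this sketch they are a single concept.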

Relations and Patterns

• IsA: NP is a kind of NP
• UsedFor: NP is used for VP
• CapableOf: NP can VP
• Desires: NP wants to VP
• CreatedBy: You make NP by VP
• PartOf: NP is part of NP
• HasProperty: NP is AP
• Causes: The effect of NP|VP is NP|VP
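The pairing of relations and sentence patterns can be sketched with regular expressions. This is an illustrative simplification: the slots are matched as arbitrary text rather than parsed NP, VP, or AP phrases, and `PATTERNS` and `match_relation` are hypothetical names.

```python
import re

# One regex per relation, tried in order. The very general
# "X is Y" pattern for HasProperty has to come last.
PATTERNS = [
    ("IsA", re.compile(r"^(?P<a>.+) is a kind of (?P<b>.+)$")),
    ("UsedFor", re.compile(r"^(?P<a>.+) is used for (?P<b>.+)$")),
    ("PartOf", re.compile(r"^(?P<a>.+) is part of (?P<b>.+)$")),
    ("CapableOf", re.compile(r"^(?P<a>.+) can (?P<b>.+)$")),
    ("Desires", re.compile(r"^(?P<a>.+) wants to (?P<b>.+)$")),
    ("CreatedBy", re.compile(r"^you make (?P<a>.+) by (?P<b>.+)$")),
    ("Causes", re.compile(r"^the effect of (?P<a>.+) is (?P<b>.+)$")),
    ("HasProperty", re.compile(r"^(?P<a>.+) is (?P<b>.+)$")),
]


def match_relation(sentence):
    """Return (relation, first slot, second slot) for the first pattern
    that matches, or None if no pattern matches."""
    text = sentence.strip().rstrip(".").lower()
    for name, pattern in PATTERNS:
        m = pattern.match(text)
        if m:
            return (name, m.group("a"), m.group("b"))
    return None


fact = match_relation("A knife is used for cutting bread.")
```

The ordering is the main design choice: if the HasProperty pattern were tried first, it would swallow sentences meant for IsA, UsedFor, PartOf, and Causes.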

ConceptNet

WordNet

• Hand-constructed lexicon of English (Miller 1995).
• Word senses disambiguated.
• Word senses related by synonymy, antonymy, hypernymy, hyponymy, and meronymy (part/whole).
• A concept is a synset (a collection of synonymous word senses).
• 117,000 synsets.
• WordNets exist (to some extent) for almost 50 languages.
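The synset idea can be illustrated with a toy data structure. The class `Synset`, the function `hypernym_chain`, and the three entries are hand-made for this sketch; they are not taken from WordNet's data or from any WordNet library.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A concept: a set of synonymous word senses plus hypernym links."""
    name: str
    lemmas: list
    hypernyms: list = field(default_factory=list)


def hypernym_chain(synset):
    """Follow the first hypernym link upward and collect the names."""
    chain = []
    current = synset
    while current.hypernyms:
        current = current.hypernyms[0]
        chain.append(current.name)
    return chain


# Toy hierarchy with made-up entries.
organism = Synset("organism", ["organism", "being"])
animal = Synset("animal", ["animal", "creature", "beast"], [organism])
dog = Synset("dog", ["dog", "domestic dog"], [animal])

chain = hypernym_chain(dog)
```

Here `chain` lists the synsets above "dog" in order, from the nearest hypernym to the root of the toy hierarchy.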

WikiNet

(Nastase et al. 2010)

• A concept “roughly corresponds to a Wikipedia article”. Defer to the wisdom of crowds (as mediated by the abstruse, bureaucratic, contentious process that is Wikipedia editing).
• Relations are extracted from text and info-boxes.
• Multi-lingual Wikipedia. Entities and categories are aligned across languages.
• 90+% accuracy.

Sample categories

• Members of Queen (band)
• Movies directed by Woody Allen
• Villages in Brandenburg
• Mixed Martial Arts Television Programs

490,215 categories. 3.3M concepts.

36M relations: instance 10M, subcategory 400K, spatial 4M, nationality 570K, topic 340K, genre 330K.

CYC

• Long-term project (1985-present) to encode commonsense knowledge.

• Knowledge hand-coded by knowledge engineers.

• Concepts chosen to optimize the knowledge encoding; only secondarily related to NL words (at least in principle). Very precise conceptual distinctions.

CYC Concepts

“Just about anything can be reified: a particular proposition, a type of predicate, a problem solving context, an inference mechanism, etc.”

Some concepts (from the 1990 book):

PaperboyDeliveringNewspapersAsABusinessEvent

PaperboyDeliveringNewspapersAsATravelEvent

EgyptIn1986, PhysicalEgyptIn1986, PoliticalEgyptIn1986

YoungerThanParentsConstraint

CYC

Partly open, partly proprietary. Poorly described.

• Open CYC (public): 200K concepts, 2M facts.
• Research CYC (can be licensed): 500K concepts, 5M facts.

Relation to Cog. Psych.

Where are:

Concept learning? Definitional vs. prototype vs. exemplar theories? Different kind of AI: Classification learning (supervised vs. unsupervised).

Synthetic categories? Only of theoretical interest. Simple algorithms have superhuman abilities.

Base level vs. sub/superordinate. ??

Hard Questions

• How many concepts? AI: 100,000s to millions. Cog psych: ??
• How to evaluate recall (coverage).
• What is a concept?
• Relation of concepts to Mentalese primitives.
• English vs. other languages: Does it make any difference?