How AI Programs Collect Concepts
Ernie Davis
Concept and Categories Seminar
Sept 25, 2015
Outline
Part I: Case studies.
Systems: Probase, NELL, ConceptNet, WordNet, WikiNet, CYC.
Issues: What is a concept? How are concepts collected? How are taxonomic and other relations found? Polysemy & synonymy.
System features: Size, evaluation, uses.
Part II: Relation to cog. psych. concepts
What do AI people want?
Cognitive plausibility? Not much. Happy to note similarities if they come up.
Philosophical coherence? No! You can quote Plato or Quine if you want to pretend to be erudite. Otherwise, reading those guys will just leave you hopelessly confused.
Human-level or superhuman AI in 20-50 years? Less than you would suppose. Mostly entertainment for journalists.
What do AI people want?
To build, today, a system that does something.
Probase
Wu et al. 2012 (Microsoft Research)
2.6 million concepts (categories). 20.7 million isA relations. 92.8% accuracy.
Concept: A noun (English) or a two- or three-word noun phrase. E.g. “country”, “city”, “renewable energy technology”, “common sleep disorder”, “meteorological phenomenon”.
Probase
Taxonomy: From Hearst patterns (Hearst 1992), e.g. “Countries such as England, France and Germany”, “Bears, lions, and other animals”.
Pitfalls: “animals other than dogs such as cats”, “companies such as IBM, Nokia, Procter and Gamble”, “Europe, Brazil, China, and other countries”, “dead animals such as dogs and cats”.
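A minimal sketch of pattern-based isA extraction, assuming plain regular expressions applied to single sentences; Probase's actual extractor is more elaborate. The names `SUCH_AS`, `AND_OTHER`, `split_list`, and `extract_isa` are hypothetical. The naive list splitting deliberately reproduces the pitfalls: it splits “Procter and Gamble” into two companies and reads “dogs such as cats” as a taxonomic claim.

```python
import re

# Two Hearst patterns (Hearst 1992), heavily simplified:
#   "<hypernym> such as <A>, <B> and <C>"
#   "<A>, <B>, and other <hypernym>"
# The hypernym is a single word; the list runs until a character that is not
# a word character, comma, or space (e.g. a period).
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+)")
AND_OTHER = re.compile(r"([\w ,]+?),? (?:and|or) other (\w+)")


def split_list(text):
    """Naively split 'A, B and C' into ['A', 'B', 'C'] on commas and 'and'/'or'."""
    parts = re.split(r",\s*|\s+(?:and|or)\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_isa(sentence):
    """Return (instance, hypernym) pairs matched by either pattern."""
    pairs = []
    m = SUCH_AS.search(sentence)
    if m:
        hypernym, items = m.group(1), m.group(2)
        pairs += [(item, hypernym) for item in split_list(items)]
    m = AND_OTHER.search(sentence)
    if m:
        items, hypernym = m.group(1), m.group(2)
        pairs += [(item, hypernym) for item in split_list(items)]
    return pairs
```

For example, `extract_isa("Countries such as England, France and Germany")` yields three pairs with hypernym `Countries`, while `extract_isa("animals other than dogs such as cats")` yields the false pair `("cats", "dogs")`.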
Probase
Use statistical information to rule out false concepts and taxonomic relations. E.g. there are many instances of “dog” and “cat” and only a few texts that suggest that “cat” is a subcategory of “dog”.
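One way such statistics could be applied, sketched under assumptions: count how many texts support each (instance, hypernym) pair and drop pairs with too little support, either in absolute terms or relative to everything else said about that instance. The function `filter_isa` and its thresholds `min_count` and `min_share` are hypothetical, and the counts below are invented for illustration; Probase's own plausibility scoring differs.

```python
from collections import Counter


def filter_isa(pair_counts, min_count=3, min_share=0.01):
    """Keep (instance, hypernym) pairs that have enough supporting texts.

    pair_counts maps (instance, hypernym) -> number of texts asserting it.
    A pair survives if it was seen at least `min_count` times AND accounts for
    at least `min_share` of all hypernym evidence collected for that instance.
    Both thresholds are illustrative assumptions.
    """
    totals = Counter()
    for (instance, _hypernym), n in pair_counts.items():
        totals[instance] += n
    return {
        pair: n
        for pair, n in pair_counts.items()
        if n >= min_count and n / totals[pair[0]] >= min_share
    }


# Invented counts: "cat" is usually an animal or a pet, rarely a "dog".
counts = {
    ("cat", "animal"): 500,
    ("cat", "pet"): 300,
    ("cat", "dog"): 5,  # e.g. from "animals other than dogs such as cats"
    ("dog", "animal"): 600,
}
kept = filter_isa(counts)
```

Here `("cat", "dog")` is dropped: it passes the absolute threshold, but 5 of 805 cat-hypernym mentions is under 1%.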
Probase: Polysemy
Example: “plants such as refineries and nuclear reactors” vs. “plants such as trees, grass, and cacti”.
Rule 1: In each text the hypernym has only one sense; you don’t have “plants such as refineries and trees”.
Rule 2: If two lists have substantial overlap, then the hypernyms have the same sense. E.g. “plants such as trees, grass, and cacti” and “plants such as grass, bushes, and trees.”
Use these and similar rules to group word uses into meanings.
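A minimal sketch of the two rules as a greedy grouping pass, assuming each extracted list is a set of instance strings and that “substantial overlap” means sharing at least `min_overlap` items. The function name and threshold are hypothetical, and Probase's actual clustering procedure is not reproduced here.

```python
def group_senses(lists, min_overlap=2):
    """Group the instance lists extracted for one hypernym into senses.

    Rule 1: each list comes from a single text, so it belongs to exactly one sense.
    Rule 2: lists sharing at least `min_overlap` instances belong to the same sense.
    A single greedy pass, so the result can depend on input order.
    """
    senses = []
    for items in lists:
        items = set(items)
        for sense in senses:
            if len(sense & items) >= min_overlap:
                sense |= items  # merge into the existing sense (in place)
                break
        else:
            senses.append(items)  # no sufficient overlap: start a new sense
    return senses


plant_lists = [
    ["trees", "grass", "cacti"],
    ["refineries", "nuclear reactors"],
    ["grass", "bushes", "trees"],
]
senses = group_senses(plant_lists)
```

This produces two senses of “plants”: `{"trees", "grass", "cacti", "bushes"}` and `{"refineries", "nuclear reactors"}`. A real system would also need to merge senses that only become linked later in the pass.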
NELL (Never-Ending Language Learner)
Mitchell et al. 2015 (Carnegie Mellon).
Uses web mining to construct a taxonomy of concepts and to collect facts for a fixed set of relations.
80 million beliefs (facts).
NELL: Concepts
A concept is a noun (I think). Instances are proper nouns. No attempt to address polysemy.
Features used for categorization, learned in a snowballing way:
• Name form: E.g. “…burgh” is a city.
• Context: E.g. “mayor of X” → X is a city.
• Lists and tables: If you see a list “London, Paris, Prague” and you know that London and Paris are cities, infer that Prague is a city.
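A toy sketch of how the three kinds of evidence can snowball, assuming one category (“city”), a handful of sentences, and pre-extracted lists. `bootstrap_cities`, the “most items known” list rule, and the round limit are hypothetical simplifications; NELL itself couples many learners and weights their evidence.

```python
import re


def bootstrap_cities(seeds, sentences, lists, rounds=3):
    """Grow a set of city names from seeds using name form, context, and lists."""
    cities = set(seeds)
    for _ in range(rounds):
        new = set()
        # Context: "mayor of X" suggests X is a city.
        for s in sentences:
            for m in re.finditer(r"mayor of ([A-Z]\w+)", s):
                new.add(m.group(1))
        # Lists: if at least two items, and most of the list, are known cities,
        # accept the remaining items as cities too.
        for items in lists:
            known = sum(1 for x in items if x in cities)
            if known >= 2 and known * 2 > len(items):
                new.update(items)
        # Name form: capitalized names ending in "burgh" are taken to be cities.
        for s in sentences:
            new.update(re.findall(r"\b([A-Z]\w*burgh)\b", s))
        if new <= cities:
            break  # nothing new was learned this round
        cities |= new
    return cities


result = bootstrap_cities(
    seeds={"London", "Paris"},
    sentences=["The mayor of Oslo spoke.", "We drove to Pittsburgh."],
    lists=[["London", "Paris", "Prague"], ["Prague", "Vienna", "Oslo"]],
)
```

Vienna is only found in the second round, after Prague (from the first list) and Oslo (from context) have been learned; that is the snowballing effect.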
NELL: Taxonomy
Seems to be largely accurate though lopsided (e.g. 9047 instances of “amphibian” vs. 0 instances of “poem”).
NELL: Facts
Hit and miss. Mostly unexciting. In an experiment, I collected 110 facts from NELL:
• 64 were true.
• 16 were nearly true, e.g. “Dublin Dublin is the capital city of the country Ireland”.
• 14 were false, e.g. “dipodidae is an arthropod” (actually a rodent).
• 5 were hopelessly vague, e.g. “David is a person who died at age 10.”
• 11 were meaningless, e.g. “states is a state or province located in the geopolitical location field”.
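The tally can be checked arithmetically: the five outcomes add up to 110, about 58% of the sampled facts were fully true, and about 73% were at least nearly true.

```python
# Outcomes of the 110 NELL facts checked by hand (counts from the list above).
counts = {
    "true": 64,
    "nearly true": 16,
    "false": 14,
    "hopelessly vague": 5,
    "meaningless": 11,
}
total = sum(counts.values())
percent = {k: round(100 * v / total, 1) for k, v in counts.items()}
at_least_nearly_true = round(100 * (counts["true"] + counts["nearly true"]) / total, 1)
```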
ConceptNet
• Starts with the collection of facts amassed by Open Mind Common Sense, a crowd-sourcing platform.
• “Concepts … could be noun phrases, verb phrases, adjective phrases, or clauses. ConceptNet defines concepts as the equivalence class of phrases after normalization removing function words, pronouns, and inflections.”
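A rough sketch of “equivalence class of phrases after normalization”, assuming a tiny hand-written list of function words and pronouns and a crude suffix-stripping rule in place of a real lemmatizer. `FUNCTION_WORDS`, `crude_lemma`, `normalize`, and `group_phrases` are hypothetical; ConceptNet's own normalizer is more careful (the suffix rule here would, for instance, turn “glass” into “glas”).

```python
import re

# Hypothetical, tiny list of function words and pronouns.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "for", "and", "or",
    "i", "you", "he", "she", "it", "we", "they", "my", "your", "his", "her", "their",
}


def crude_lemma(word):
    """Strip a few common English inflections (a stand-in for a lemmatizer)."""
    for suffix, repl in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + repl
    return word


def normalize(phrase):
    """Lowercase, drop function words and pronouns, strip inflections."""
    words = re.findall(r"[a-z]+", phrase.lower())
    return " ".join(crude_lemma(w) for w in words if w not in FUNCTION_WORDS)


def group_phrases(phrases):
    """Map each normalized form (the 'concept') to the phrases that share it."""
    groups = {}
    for p in phrases:
        groups.setdefault(normalize(p), []).append(p)
    return groups
```

With these rules, “Eating the apples”, “eat an apple”, and “eats apple” all normalize to `eat apple` and so count as one concept.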
Relations and Patterns
• IsA: NP is a kind of NP
• UsedFor: NP is used for VP
• CapableOf: NP can VP
• Desires: NP wants to VP
• CreatedBy: You make NP by VP
• PartOf: NP is part of NP
• HasProperty: NP is AP
• Causes: The effect of NP|VP is NP|VP
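A minimal sketch of turning template sentences into (relation, concept, concept) triples, assuming each NP/VP/AP slot can be approximated by “any words” and that the more specific templates are tried before the catch-all “NP is AP”. `PATTERNS` and `parse` are hypothetical; Open Mind Common Sense and ConceptNet used their own, richer parsing.

```python
import re

# Each relation paired with a regex version of its sentence template.
# Slots are approximated by (.+); an optional leading "a"/"an" is dropped.
PATTERNS = [
    ("IsA", r"(?:an? )?(.+) is a kind of (.+)"),
    ("UsedFor", r"(?:an? )?(.+) is used for (.+)"),
    ("CapableOf", r"(?:an? )?(.+) can (.+)"),
    ("Desires", r"(?:an? )?(.+) wants to (.+)"),
    ("CreatedBy", r"you make (.+) by (.+)"),
    ("PartOf", r"(?:an? )?(.+) is part of (.+)"),
    ("Causes", r"the effect of (.+) is (.+)"),
    ("HasProperty", r"(?:an? )?(.+) is (.+)"),  # most general, so tried last
]


def parse(sentence):
    """Return (relation, arg1, arg2) for the first matching template, else None."""
    s = sentence.lower().rstrip(".")
    for relation, pattern in PATTERNS:
        m = re.fullmatch(pattern, s)
        if m:
            return relation, m.group(1), m.group(2)
    return None
```

For example, `parse("A dog is a kind of animal.")` returns `("IsA", "dog", "animal")`. Ordering matters: “The effect of exercise is fatigue” would also fit the HasProperty template if it were tried first.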
ConceptNet
WordNet
• Hand-constructed lexicon of English (Miller 1995)
• Word senses disambiguated.
• Word senses related by synonymy, antonymy, hypernymy, hyponymy, and meronymy (part/whole).
• A concept is a synset (a collection of synonymous word senses).
• 117,000 synsets
• WordNets exist (to some extent) for almost 50 languages.
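A toy model of synsets and hypernym links, assuming a hand-built fragment loosely modeled on WordNet's two noun senses of “plant” (the glosses are paraphrased, not copied). The `Synset` class and helper functions are hypothetical; NLTK offers a real interface in `nltk.corpus.wordnet`, but this sketch avoids the corpus download.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    name: str    # NLTK-style label such as "plant.n.02", used here only as an id
    lemmas: list  # the synonymous word senses in this synset
    gloss: str
    hypernyms: list = field(default_factory=list)  # more general synsets


organism = Synset("organism.n.01", ["organism", "being"], "a living thing")
flora = Synset("plant.n.02", ["plant", "flora", "plant life"],
               "a living organism lacking the power of locomotion", [organism])
tree = Synset("tree.n.01", ["tree"], "a tall perennial woody plant", [flora])
artifact = Synset("artifact.n.01", ["artifact", "artefact"], "a man-made object")
factory = Synset("plant.n.01", ["plant", "works", "industrial plant"],
                 "buildings for carrying on industrial labor", [artifact])

SYNSETS = [organism, flora, tree, artifact, factory]


def senses(word):
    """All synsets containing the word: one entry per sense (polysemy)."""
    return [s.name for s in SYNSETS if word in s.lemmas]


def hypernym_chain(synset):
    """Follow the first hypernym link from a synset up to a root."""
    chain = [synset.name]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset.name)
    return chain
```

Polysemy appears as one word in several synsets (`senses("plant")` gives both the industrial and the botanical sense), and synonymy as several lemmas in one synset (`senses("flora")` gives only the botanical one).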
WikiNet
(Nastase et al. 2010)
• A concept “roughly corresponds to a Wikipedia article”. Defer to the wisdom of crowds (as mediated by the abstruse, bureaucratic, contentious process that is Wikipedia editing).
• Relations are extracted from text and infoboxes.
• Multilingual Wikipedia: entities and categories are aligned across languages.
• 90+% accuracy
Sample categories
• Members of Queen (band)
• Movies directed by Woody Allen
• Villages in Brandenburg
• Mixed Martial Arts Television Programs
490,215 categories. 3.3M concepts.
36M relations: instance 10M, subcategory 400K, spatial 4M, nationality 570K, topic 340K, genre 330K
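Category names like the samples above carry relations that can be read off with simple patterns and then applied to every article in the category. A minimal sketch, assuming three hypothetical name patterns; WikiNet's actual category analysis is more thorough, and a pattern as loose as “X in Y” would misfire on many real category names.

```python
import re

# Hypothetical category-name patterns; the last captured group is the
# entity that every member article is related to.
CATEGORY_PATTERNS = [
    (r"Members of (.+)", "member_of"),
    (r"(.+) directed by (.+)", "directed_by"),
    (r"(.+) in (.+)", "located_in"),
]


def category_relations(category, articles):
    """Return (article, relation, entity) triples implied by a category name."""
    for pattern, relation in CATEGORY_PATTERNS:
        m = re.fullmatch(pattern, category)
        if m:
            value = m.group(m.lastindex)  # last captured group
            return [(a, relation, value) for a in articles]
    return []  # no pattern applies, e.g. "Mixed Martial Arts Television Programs"
```

For example, `category_relations("Movies directed by Woody Allen", ["Annie Hall", "Manhattan"])` yields two `directed_by` triples pointing at `Woody Allen`.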
CYC
• Long-term project (1985-present) to encode commonsense knowledge.
• Knowledge hand-coded by knowledge engineers.
• Concepts chosen to optimize the knowledge encoding; only secondarily related to NL words (at least in principle). Very precise conceptual distinctions.
CYC Concepts
“Just about anything can be reified: a particular proposition, a type of predicate, a problem solving context, an inference mechanism, etc.”
Some concepts (from the 1990 book):
• PaperboyDeliveringNewspapersAsABusinessEvent
• PaperboyDeliveringNewspapersAsATravelEvent
• EgyptIn1986, PhysicalEgyptIn1986, PoliticalEgyptIn1986
• YoungerThanParentsConstraint
CYC
Partly open, partly proprietary. Poorly described.
• Open CYC (public): 200K concepts, 2M facts
• Research CYC (can be licensed): 500K concepts, 5M facts
Relation to Cog. Psych.
Where are:
• Concept learning? Definitional vs. prototype vs. exemplar theories? A different kind of AI: classification learning (supervised vs. unsupervised).
• Synthetic categories? Only of theoretical interest. Simple algorithms have superhuman abilities.
• Basic level vs. sub/superordinate? ??
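In classification-learning terms, the prototype and exemplar theories correspond to two familiar classifiers: nearest centroid and nearest neighbour. A minimal sketch, assuming concepts are represented by numeric feature vectors (a strong, hypothetical simplification); the function names and the data are illustrative, and the definitional theory is not modeled.

```python
import math


def prototype_classify(x, categories):
    """Prototype theory: compare x to each category's average member (centroid)."""
    def centroid(points):
        return tuple(sum(c) / len(points) for c in zip(*points))
    return min(categories, key=lambda name: math.dist(x, centroid(categories[name])))


def exemplar_classify(x, categories):
    """Exemplar theory: compare x to every stored member (1-nearest neighbour)."""
    return min(categories,
               key=lambda name: min(math.dist(x, p) for p in categories[name]))


# Invented 2-D feature vectors: category A is spread out, B is compact.
cats = {"A": [(0, 0), (10, 0)], "B": [(8, 3)]}
```

The two theories can disagree: the point `(9, 1)` is closest to the prototype of B (A's centroid `(5, 0)` is far away), but closest to the stored exemplar `(10, 0)` of A.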
Hard Questions
• How many concepts? AI: 100,000s to millions. Cog psych: ??
• How to evaluate recall (coverage)?
• What is a concept?
• Relation of concepts to Mentalese primitives.
• English vs. other languages: Does it make any difference?