introduction to natural language processing

Introduction to Natural Language Processing A.k.a., “Computational Linguistics”

Upload: aqua

Post on 24-Feb-2016


TRANSCRIPT

Page 1: Introduction  to Natural Language Processing

Introduction to Natural Language Processing

A.k.a., “Computational Linguistics”

Page 2: Introduction  to Natural Language Processing

Recall: Agents and Environment

[Diagram: agent-environment loop. Labels: Agent, Environment, sensors, percepts, actuators, actions, "?"]

Page 3: Introduction  to Natural Language Processing

Agents and Environments with NLP

[Diagram: Labels: Agent, Environment, sensors, actuators, three further Agent boxes, and "Speech, handwriting, printed text, digital text" (shown twice)]

1. What do the other agents claim to believe? (NL Understanding)

2. What do the other agents actually believe or want? (Plan recognition, game theory)

3. How can I make the other agents believe X? (Planning, NL Generation)

Page 4: Introduction  to Natural Language Processing

WHAT IS LANGUAGE?

• Definition with respect to form:

Language is a system of speech symbols. It is realized acoustically (sound waves), visually-spatially (sign language) and in written form.

• Definition with respect to function:

Language is the most important means of human communication. It is used to convey and exchange information (informative function).

• Multiplicity of languages:

We know of about 7000 languages, which is estimated to be about 1% of all the languages that ever existed.

Page 5: Introduction  to Natural Language Processing

LANGUAGE AND THE BRAIN

Page 6: Introduction  to Natural Language Processing

LANGUAGE AND THE BRAIN

Page 7: Introduction  to Natural Language Processing

THEORIES OF LANGUAGE

• Noam Chomsky claims that language is innate.

• B. F. Skinner claims that language is learned; it is basically a stimulus-response mechanism.

Page 8: Introduction  to Natural Language Processing

WHAT IS GRAMMAR?

• When we learn a language we also learn the rules that govern how language elements, such as words, are combined to produce meaningful language.

• These elements and rules constitute the Grammar of a language.

• The Grammar is “what we know”

• Grammar represents our linguistic competence.

Page 9: Introduction  to Natural Language Processing

DESCRIPTIVE vs PRESCRIPTIVE GRAMMAR

• Prescriptive: how language should be used

• Descriptive: how language is actually used

Page 10: Introduction  to Natural Language Processing

Areas of Linguistics

• phonetics - the study of speech sounds
• phonology - the study of sound systems
• morphology - the rules of word formation
• syntax - the rules of sentence formation
• semantics - the study of word meanings
• pragmatics - the study of discourse meanings
• sociolinguistics - the study of language in society
• applied linguistics - the application of the methods and results of linguistics to such areas as language teaching, national language policies, lexicography, translation, language in politics etc.

Page 11: Introduction  to Natural Language Processing

What is the meaning of ‘meaning’?

• Learning a language includes learning the “agreed upon” meanings of certain strings of sounds and,

• Learning how to combine these meaningful units into larger units which also convey meaning.

Page 12: Introduction  to Natural Language Processing

Morphemes

• A morpheme is the smallest linguistic unit that has meaning.

• A morpheme is a grammatical unit in which there is an arbitrary union of a sound and a meaning, and which cannot be further analysed (broken down into parts that have meaning).

Page 13: Introduction  to Natural Language Processing

Morphemes

• A morpheme may be represented by a single sound:
– e.g. the plural morpheme [s] in cat+s

• A morpheme may be represented by a syllable (monosyllabic):
– e.g. child+ish

Page 14: Introduction  to Natural Language Processing

Morphemes

• A morpheme may be represented by more than one syllable (polysyllabic):
– e.g. lady, water

• or three syllables:
– e.g. crocodile

• or four syllables:
– e.g. salamander

Page 15: Introduction  to Natural Language Processing

Words

• Two basic ways to form words

– Inflectional (e.g. English verbs + endings → other English verb forms)
• open + ed = opened
• open + ing = opening

– Derivational (e.g. adverbs from adjectives, nouns from adjectives)
• happy → happily (adverb from adjective)
• happy → happiness (noun from adjective)
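The two word-formation processes can be sketched as small string functions. This is only a toy illustration: the function names and the single spelling rule (final "y" becomes "i") are assumptions chosen to cover the examples on this slide, not a general morphology engine.

```python
def inflect(verb: str, ending: str) -> str:
    """Inflection: attach a grammatical ending; the word class stays the same."""
    return verb + ending


def derive(adjective: str, suffix: str) -> str:
    """Derivation: build a new word, often of a different word class."""
    # Assumed spelling rule: a final 'y' after a consonant becomes 'i'.
    if adjective.endswith("y") and adjective[-2:-1] not in "aeiou":
        adjective = adjective[:-1] + "i"
    return adjective + suffix


print(inflect("open", "ed"))    # opened
print(inflect("open", "ing"))   # opening
print(derive("happy", "ly"))    # happily
print(derive("happy", "ness"))  # happiness
```

Real English needs many more spelling rules (stop + ed = stopped, for instance), which is one reason morphological analysis is usually handled with dedicated tools.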

Page 16: Introduction  to Natural Language Processing

Syntax

The study of classes of words (nouns, verbs, etc.) and the rules that govern how the words can combine to make phrases and sentences.

Page 17: Introduction  to Natural Language Processing

Basic classes of words

• Classes of words aka parts of speech (POS)
– Nouns
– Verbs
– Adjectives
– Adverbs

• The above classes of words are open class words

• We also have closed class words, or function words
– Articles, pronouns, prepositions, particles, quantifiers, conjunctions

Page 18: Introduction  to Natural Language Processing


Basic phrases

• A word from an open class can be used to form the basis of a phrase

• The basis of a phrase is called the head

Page 19: Introduction  to Natural Language Processing

Examples of phrases

• Noun phrases
– The manager of the institute
– Her worry to pass the exams
– Several students from the English Department

• Adjective phrases
– easy to understand
– mad as a dog
– glad that he passed the exam

Page 20: Introduction  to Natural Language Processing

Examples of phrases

• Adverb phrases
– fast like the wind
– outside the building

• Verb phrases
– ate her sandwich
– went to the doctor
– believed what I told him

Page 21: Introduction  to Natural Language Processing

Grammars and parsing

• Syntactic parsing: determining the syntactic structure of a sentence

• Basic steps
– Identify sentence boundaries
– Identify the part of speech of each word
– Identify pairs of words that form phrases
– Identify pairs of phrases that form larger phrases
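The first two basic steps can be sketched in a few lines. This is a deliberately naive illustration: the lexicon is a hypothetical toy, the boundary rule (split after ., ! or ?) fails on abbreviations, and each word is given exactly one part of speech, which real text does not allow.

```python
import re

# Hypothetical toy lexicon: one part of speech per word.
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V",
           "on": "Prep", "it": "Pron", "purred": "V"}


def split_sentences(text):
    """Step 1: naive boundary detection after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag(sentence):
    """Step 2: look up each word's part of speech; unknown words get 'UNK'."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return [(w, LEXICON.get(w, "UNK")) for w in words]


sentences = split_sentences("The cat sat on the mat. It purred.")
print(sentences)  # ['The cat sat on the mat.', 'It purred.']
for s in sentences:
    print(tag(s))
```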

Page 22: Introduction  to Natural Language Processing

Context Free Grammar

• S -> NP VP
• NP -> det (adj) N
• NP -> Proper N
• NP -> N
• VP -> V
• VP -> V PP
• VP -> V NP
• VP -> V NP PP
• PP -> Prep NP
• VP -> V NP NP
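One way to see how these rules drive a parser is a small top-down (recursive descent) sketch. The grammar dictionary mirrors the rules on this slide, with the optional "(adj)" expanded into two NP rules; the lexicon, the function names, and the tuple tree format are assumptions made for the illustration. The approach only works because none of the rules is left-recursive.

```python
# Grammar rules from the slide; "(adj)" is expanded into two NP rules.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"], ["ProperN"], ["N"]],
    "VP": [["V"], ["V", "PP"], ["V", "NP"], ["V", "NP", "PP"], ["V", "NP", "NP"]],
    "PP": [["Prep", "NP"]],
}
# Hypothetical toy lexicon (not on the slide).
LEXICON = {"the": {"Det"}, "cat": {"N"}, "mat": {"N"}, "sat": {"V"}, "on": {"Prep"}}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: match a single word
        if start < len(words) and symbol in LEXICON.get(words[start], set()):
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])


def expand(symbol, rhs, words, pos, children):
    """Match the right-hand-side symbols left to right, collecting subtrees."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for subtree, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [subtree])


def parse_sentence(sentence):
    """Return every tree in which S covers the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


trees = parse_sentence("The cat sat on the mat")
print(len(trees))  # 1
print(trees[0])
```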


Page 23: Introduction  to Natural Language Processing

Parses

The cat sat on the mat

[S [NP [Det the] [N cat]] [VP [V sat] [PP [Prep on] [NP [Det the] [N mat]]]]]

Page 24: Introduction  to Natural Language Processing

Parses

Time flies like an arrow.

[S [NP [N time]] [VP [V flies] [PP [Prep like] [NP [Det an] [N arrow]]]]]

Page 25: Introduction  to Natural Language Processing

Parses

Time flies like an arrow.

[S [NP [N time] [N flies]] [VP [V like] [NP [Det an] [N arrow]]]]
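The two readings of "Time flies like an arrow" can be counted mechanically. The sketch below uses a subset of the context-free rules from the earlier slide plus one added rule, NP -> N N, which is an assumption needed for the "time flies" compound-noun reading; the lexicon is also hypothetical. It counts trees with memoised recursion over spans instead of building them.

```python
from functools import lru_cache

GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("N",), ("N", "N")],  # ("N", "N") is an added assumption
    "VP": [("V",), ("V", "PP"), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
# Hypothetical lexicon: "flies" and "like" each have two parts of speech.
LEXICON = {"time": {"N"}, "flies": {"N", "V"}, "like": {"V", "Prep"},
           "an": {"Det"}, "arrow": {"N"}}
WORDS = ("time", "flies", "like", "an", "arrow")


@lru_cache(maxsize=None)
def count(symbol, i, j):
    """Number of distinct trees in which `symbol` covers WORDS[i:j]."""
    if symbol not in GRAMMAR:  # part-of-speech symbol covers exactly one word
        return int(j == i + 1 and symbol in LEXICON[WORDS[i]])
    return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def count_seq(rhs, i, j):
    """Number of ways the symbol sequence `rhs` covers WORDS[i:j]."""
    if len(rhs) == 1:
        return count(rhs[0], i, j)
    # Try every split point k between the first symbol and the rest.
    return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
               for k in range(i + 1, j))


print(count("S", 0, len(WORDS)))  # 2
```

Both trees are grammatical under these rules; picking the intended one needs knowledge beyond syntax.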

Page 26: Introduction  to Natural Language Processing


Semantics and Pragmatics

Semantics: the study of meaning that can be determined from a sentence, phrase or word.

Pragmatics: the study of meaning, as it depends on context (speaker, situation)

Page 27: Introduction  to Natural Language Processing

Language to Logic

• John went to a book store. ∃s . bookstore(s) ∧ go(John, s)

• Every boy loves a girl. ∀b . boy(b) → ∃g . girl(g) ∧ loves(b, g)

• Who broke the vase? λx . broke(x, vase17)
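To make the logical forms concrete, they can be evaluated against a tiny model of the world. Everything in the model below (the individuals and who loves or broke what) is made up for illustration, and reading the first formula as an existential statement and the second as a universal statement containing an implication is an assumption about the slide's intent.

```python
# Hypothetical toy world; none of these facts come from the slides.
entities = {"John", "Tom", "Sam", "Ann", "Eve", "Max", "store1", "vase17"}
bookstores = {"store1"}
went_to = {("John", "store1")}
boys = {"Tom", "Sam"}
girls = {"Ann", "Eve"}
loves = {("Tom", "Ann"), ("Sam", "Eve")}
broke = {("Max", "vase17")}

# "John went to a book store": there exists s with bookstore(s) and go(John, s).
john_went = any(s in bookstores and ("John", s) in went_to for s in entities)

# "Every boy loves a girl": for all b, boy(b) implies there exists g with
# girl(g) and loves(b, g). The implication is encoded by filtering on boys.
every_boy = all(
    any(g in girls and (b, g) in loves for g in entities)
    for b in entities if b in boys
)

# "Who broke the vase?": a function from individuals to truth values;
# the answer is the set of individuals for which it returns True.
who_broke = lambda x: (x, "vase17") in broke
answers = {x for x in entities if who_broke(x)}

print(john_went, every_boy, answers)  # True True {'Max'}
```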

Page 28: Introduction  to Natural Language Processing

Headlines

• Police Begin Campaign To Run Down Jaywalkers

• Iraqi Head Seeks Arms

• Teacher Strikes Idle Kids

• Miners Refuse To Work After Death

• Juvenile Court To Try Shooting Defendant

Page 29: Introduction  to Natural Language Processing

Language Families

Page 30: Introduction  to Natural Language Processing

NLP tends to focus on:

• Syntax
– Grammars, parsers, parse trees, dependency structures

• Semantics
– Subcategorization frames, semantic classes, ontologies, formal semantics

• Pragmatics
– Pronouns, reference resolution, discourse models

Page 31: Introduction  to Natural Language Processing


Issues in NLP

• Ambiguity

• Lack of Knowledge – it’s needed for understanding, but computers don’t have it

Page 32: Introduction  to Natural Language Processing

Ambiguity

• Computational linguists are obsessed with ambiguity

• Ambiguity is a fundamental problem of computational linguistics

• Resolving ambiguity is a crucial goal

Page 33: Introduction  to Natural Language Processing

Ambiguity

• Find at least 5 meanings of this sentence:
– I made her duck

Page 34: Introduction  to Natural Language Processing

Ambiguity

• Find at least 5 meanings of this sentence:

– I made her duck
• I cooked waterfowl for her benefit (to eat)
• I cooked waterfowl belonging to her
• I created the (plaster?) duck she owns
• I caused her to quickly lower her head or body
• I waved my magic wand and turned her into undifferentiated waterfowl
• At least one other meaning that’s inappropriate for gentle company.

Page 35: Introduction  to Natural Language Processing

Ambiguity is Pervasive

• I caused her to quickly lower her head or body
– Lexical category: “duck” can be a N or V

• I cooked waterfowl belonging to her.
– Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun

• I made the (plaster) duck statue she owns
– Lexical Semantics: “make” can mean “create” or “cook”
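A quick way to see how these small local ambiguities multiply is to enumerate every combination of choices. The category labels and sense names below are assumptions taken from the wording of this slide; the count is only an upper bound, since a grammar would reject some combinations.

```python
from itertools import product

# Lexical categories per word, following the slide.
CATEGORIES = {
    "I": ["Pron"],
    "made": ["V"],
    "her": ["Poss", "Dative"],
    "duck": ["N", "V"],
}
# Lexical semantics: senses of "make" named on the slide.
MAKE_SENSES = ["create", "cook"]

sentence = ["I", "made", "her", "duck"]

# Every assignment of one category to each word.
taggings = list(product(*(CATEGORIES[w] for w in sentence)))
print(len(taggings))  # 4

# Combine each tagging with each sense of "make".
candidates = [(tags, sense) for tags in taggings for sense in MAKE_SENSES]
print(len(candidates))  # 8
```

Four words with at most two options each already give eight candidate analyses before syntax or context is consulted.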

Page 36: Introduction  to Natural Language Processing

Ambiguity is Pervasive

• Grammar: “make” can be:

– Transitive (verb has a noun direct object)
• I cooked [waterfowl belonging to her]

– Ditransitive (verb has 2 noun objects)
• I made [her] (into) [undifferentiated waterfowl]

– Action-transitive (verb has a direct object and another verb)
• I caused [her] [to move her body]

Page 37: Introduction  to Natural Language Processing

Ambiguity is Pervasive

• Phonetics!
– I mate or duck
– I’m eight or duck
– Eye maid; her duck
– Aye mate, her duck
– I maid her duck
– I’m aid her duck
– I mate her duck
– I’m ate her duck
– I’m ate or duck
– I mate or duck

Page 38: Introduction  to Natural Language Processing

Kinds of knowledge needed?

• Consider the following interaction with HAL the computer from 2001: A Space Odyssey

• Dave: Open the pod bay doors, Hal.

• HAL: I’m sorry Dave, I’m afraid I can’t do that.

Page 39: Introduction  to Natural Language Processing

Knowledge needed to build HAL?

• Speech recognition and synthesis
– Dictionaries (how words are pronounced)
– Phonetics (how to recognize/produce each sound of English)

• Natural language understanding
– Knowledge of the English words involved
• What they mean
• How they combine (what is a ‘pod bay door’?)
– Knowledge of syntactic structure
• I’m I do, Sorry that afraid Dave I’m can’t

Page 40: Introduction  to Natural Language Processing

What’s needed?

• Dialog and pragmatic knowledge
– “open the door” is a REQUEST (as opposed to a STATEMENT or information-question)
– It is polite to respond, even if you’re planning to kill someone.
– It is polite to pretend to want to be cooperative (I’m afraid I can’t…)
– What is ‘that’ in ‘I can’t do that’?

• Even a system to book airline flights needs much of this kind of knowledge

Page 41: Introduction  to Natural Language Processing

Computational models of how natural languages work

These are sometimes called Language Models or sometimes Grammars

Three main types (among many others):
1. Document models, or “topic” models
2. Sequence models: Markov models, HMMs, others
3. Context-free grammar models
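As an illustration of the second type, here is a minimal sketch of the simplest sequence model: a first-order Markov (bigram) model over words. The toy corpus, the boundary markers, and the function names are all assumptions for the example; the probabilities are plain maximum-likelihood estimates with no smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the cat ate",
    "the dog sat",
]

# Count how often each word follows each other word.
# "<s>" and "</s>" are assumed sentence-boundary markers.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1


def prob(cur, prev):
    """Maximum-likelihood estimate of P(cur | prev), with no smoothing."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][cur] / total if total else 0.0


def sentence_prob(sentence):
    """Probability of a sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= prob(cur, prev)
    return p


print(prob("cat", "the"))            # 0.5
print(sentence_prob("the cat sat"))  # 0.125
```

Any sentence containing an unseen bigram gets probability zero here, which is why practical models add smoothing.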

Page 42: Introduction  to Natural Language Processing

Computational models of how natural languages work

Most of the models I will show you are
- Probabilistic models
- Graphical models
- Generative models
In other words, they are essentially Bayes Nets.

In addition, many (but not all) are
- Latent variable models
This means that some variables in the model are not observed in data, and must be inferred. (Like the hidden states in an HMM.)
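To show what inferring over unobserved variables can look like, here is a sketch of the forward algorithm for a hidden Markov model. It computes the probability of an observed word sequence by summing over every possible sequence of hidden states. The two states and all probabilities are invented for the example.

```python
# Hypothetical two-state HMM; every number below is made up.
states = ["N", "V"]
start = {"N": 0.8, "V": 0.2}                      # P(first state)
trans = {"N": {"N": 0.3, "V": 0.7},               # P(next state | state)
         "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"time": 0.5, "flies": 0.4, "like": 0.1},  # P(word | state)
        "V": {"time": 0.1, "flies": 0.5, "like": 0.4}}


def forward(observations):
    """Return P(observations), summed over all hidden state sequences."""
    # alpha[s] = P(words so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for word in observations[1:]:
        alpha = {
            s: emit[s][word] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


print(forward(["time", "flies", "like"]))
```

The hidden states never appear in the input. The model only sees words, and the sum inside the loop accounts for every state the previous word might have had.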