probabilistic grammars

1

Christopher Manning

CS300 talk – Fall [email protected]

http://nlp.stanford.edu/~manning/

2

Research areas of interest:

NLP/CL• Statistical NLP models: Combining linguistic and

statistical sophistication• NLP and ML methods for extracting meaning

relations from webpages, medical texts, etc.• Information extraction and text mining• Lexical and structural acquisition from raw text• Using robust NLP: dialect/style, readability, … • Using pragmatics, genre, NLP in web searching• Computational lexicography and the visualization

of linguistic information

3

Models for language

• What is the motivation for statistical models for understanding language?

• From the beginning, logics and logical reasoning were invented for handling natural language understanding

• Logics have a language-like form that draws from and meshes well with natural languages

• Where are the numbers?

4

Sophisticated grammars for NL

• From NP Det Adj* N• there developedprecise andsophisticatedgrammarformalisms(such as LFG,HPSG)

5

The Problem of Ambiguity

• Any broad-coverage grammar is hugely ambiguous (often hundreds of parses for 20+ word sentences).

• Making the grammar more comprehensive only makes the ambiguity problem get worse.

• Traditional (symbolic) NLP methods don’t provide a solution.– Selectional restrictions fail because creative/

metaphorical use of language is everywhere:• I swallowed his story• The supernova swallowed up the planet

6

The problem of ambiguity close up

• “The post office will hold out discounts and service concessions as incentives.”

• 12 words. Real language. At least 83 parses.

8

Statistical NLP methods• P(to | Sarah drove)• P(time is verb | Time flies like an arrow)• P(NP Det Adj N | mother = VP[drive] )• Statistical NLP methods:

– Estimate grammar parameters by gathering counts from texts or structured analyses of texts

– Assign probabilities to various things to determine the likelihood of word sequences, sentence structure, and interpretation

9

Probabilistic Context-Free Grammars

NP Det N: 0.4NP NPposs N: 0.1NP Pronoun: 0.2NP NP PP: 0.1NP N: 0.2

NP

Det

NP

N

PP

P(subtree above) = 0.1 x 0.4 = 0.04

10

Why Probabilistic Grammars?

• The predictions about grammaticality and ambi-guity of categorical grammars are not in accord with human perceptions or engineering needs.

• Categorical grammars aren’t predictive– They don’t tell us what “sounds natural”

• Probabilistic grammars model error tolerance, online lexical acquisition, … and have been amazingly successful as an engineering tool

• They capture a lot of world knowledge for free• Relevant to linguistic change and variation, too!

11

Example: near• In Middle English, was an adjective [Maling]• But, today, is it an adjective or a preposition?

– The near side of the moon– We were near the station

• Not just a word with multiple parts of speech! There is evidence of blending:– We were nearer the bus stop than the train– He has never been nearer the center of the

financial establishment

12

Research aim

• Most current statistical models are quite simple (linguistically and also statistically)

• Aim: To combine the good features of statistical NLP methods with the sophistication of rich linguistic analyses.

13

Lexicalising a CFG

VP[looked]

V[looked]

looked

PP[inside]

P[inside] NP[box]

D[the] N[box]

the box

•A lexicalized CFG can capture probabilistic dependencies between words

14

Left-corner parsing

• The memory requirements of standard parsers do not match human linguistic processing. What humans find hardest – center embedding:– *The man that the woman the priest met

knows couldn’t help• is really the bread-and-butter of standard CFG

parsing:– (((a + b)))

• As an alternative, left-corner parsing does capture this.

15

Parsing and (stack) complexity

• She ruled that the contract between the union and company dictated that claims from both sides should be bargained over or arbitrated.

16

Tree geometry vs. stack depth

• Kim’s friend’s mother’s car smells.

• Kim thinks Sandy knows she likes green apples.

• The rat that the cat that Kim likes chased died

TD LC BU5 1 1

1 1 7

3 3 7

17

Probabilistic Left-Corner Grammars

• Use richer probabilistic conditioning– Left corner and goal category rather

than just parent• P(NP Det Adj N | Det, S)

• Allow left-to-right online parsing (whichcan hope to explain how people buildpartial interpretations online)• Easy integration with lexicalization,part-of-speech tagging models, etc.

S

NP

Det Adj N

18

Probabilistic Head-driven Grammars

• The heads of phrases are the source of the main constraining information about a sentence structure

• We work out from heads by following the dependency order of the sentence

• The crucial property is that we have always built – and have available to us for conditioning – all governing heads and all less oblique dependents of the same head

• We can also easily integrate phrase length

19

Information from the web: The problem

• When people see web pages, they understand their meaning – By and large. To the extent that they don’t,

there’s a gradual degradation• When computers see web pages, they get only

character strings and HTML tags

20

The human view

21

The intelligent agent view

<HTML> <HEAD><TITLE>Ford Motor Company - Home Page</title><META NAME="Keywords" CONTENT="cars, automobiles, trucks, SUV,

mazda, volvo, lincoln, mercury, jaguar, aston martin, ford"><META NAME="description" CONTENT="Ford Motor Company corporate

home page"><SCRIPT LANGUAGE="JavaScript1.2"> … </SCRIPT><DIV ID=trustmarkDiv> <TABLE BORDER="0" CELLPADDING=0 CELLSPACING=0 WIDTH=768> <TR><TD WIDTH=768 ALIGN=CENTER> <A HREF="default.asp?

pageid=473" onmouseover="logoOver('fordscript');rolloverText('ht0')" onmouseout="logoOut('fordscript');rolloverText('ht0')"><img border="0" src="images/homepage/fordscript.gif" ALT="Learn more about Ford Motor Company" WIDTH="521" HEIGHT="39"></A><br>

… </TD></TR></TABLE></DIV> </BODY></HTML>

22

The problem (cont.)

• We'd like computers to see meanings as well, so that computer agents could more intelligently process the web

• These desires have led to XML, RDF, agent markup languages, and a host of other proposals and technologies which attempt to impose more syntax and semantics on the web – in order to make life easier for agents.

23

Thesis

• The problem can’t and won’t be solved by mandating a universal semantics for the web

• The solution is rather agents that can ‘understand’ the human web by text and image processing

24

(1) The semantics

• Are there adequate and adequately understood methods for marking up pages with such a consistent semantics, in such a way that it would support simple reasoning by agents?

• No.

25

What are some AI people saying?

“Anyone familiar with AI must realize that the study of knowledge representation—at least as it applies to the “commensense” knowledge required for reading typical texts such as newspapers—is not going anywhere fast. This subfield of AI has become notorious for the production of countless non-monotonic logics and almost as many logics of knowledge and belief, and none of the work shows any obvious application to actual knowledge-representation problems. Indeed, the only person who has had the courage to actually try to create large knowledge bases full of commonsense knowledge, Doug Lenat …, is believed by everyone save himself to be failing in his attempt.” (Charniak 1993:xvii–xviii)

26

(2) Pragmatics not semantics

pragmatic relating to matters of fact or practical affairs often to the exclusion of intellectual or artistic matters

pragmatics linguistics concerned with the relationship of the meaning of sentences to their meaning in the environment in which they occur

• A lot of the meaning in web pages (as in any communication) derives from the context – what is referred to in the philosophy of language tradition as pragmatics

• Communication is situated

27

Pragmatics on the web

• Information supplied is incomplete – humans will interpret it– Numbers are often missing units– A “rubber band” for sale at a stationery site is a

very different item to a rubber band on a metal lathe

– A “sidelight” means something different to a glazier than to a regular person

• Humans will evaluate content using information about the site, and the style of writing– value filtering

28

(3) The world changes

• The way in which business is being done is changing at an astounding rate– or at least that’s what the ads from e

business companies scream at us• Semantic needs and usages evolve (like

languages) more rapidly than standards (cf. the Académie française)

• People use words that aren’t in the dictionary.• Their listeners understand them.

29

(4) Interoperation

Ontology: a shared formal conceptualization of a particular domain

• Meaning transfer frequently has to occur across the subcommunities that are currently designing *ML languages, and then all the problems reappear, and the current proposals don't do much to help

30

Many products cross industries

http://www.interfilm-usa.com/Polyester.htm

• Interfilm offers a complete range of SKC's Skyrol® brand polyester films for use in a wide variety of packaging and industrial processes.

• Gauges: 48 - 1400• Typical End Uses: Packaging, Electrical, Labels,

Graphic Arts, Coating and Laminating– labels: milk jugs, beer/wine, combination

forms, laminated coupons, …

31

(5) Pain but no gain

• A lot of the time people won't put in information according to standards for semantic/agent markup, even if they exist.

• Three reasons…– Laziness: Only 0.3% of sites currently use the

(simple) Dublin Core metadata standard. – Profits: Having an easily robot-crawlable site is

a recipe for turning what you sell into a commodity, and hence making little profit

– Cheats: There are people out there that will abuse any standard, if it’s profitable

32

(6) Less structure to come

• “the convergence of voice and data is creating the next key interface between people and their technology. By 2003, an estimated $450 billion worth of e-commerce transactions will be voice-commanded.*”

• Question: will these customers speak XML tags?

Intel ad, NYT, 28 Sep 2000*Data Source: Forrester Research.

33

The connection to language

Decker et al. IEEE Internet Computing (2000):

• “The Web is the first widely exploited many-to-many data-interchange medium, and it poses new requirements for any exchange format:– Universal expressive power– Syntactic interoperability– Semantic interoperability”

But human languages have all these properties, and maintain superior expressivity and interoperability through their flexibility and context dependence

34

NLP and information access

• Solution: use robust natural language processing and machine learning techniques

• NLP comes into its own when you want to do more than just standard IR.

• E.g., defined information needs over text:– “An apartment with 2 bedrooms in Menlo

Park for less than $1,500.”– “Where was there an airline accident today?”– “What proteins is this gene known to

regulate?”

35

Example of extracting textual relations: Real Estate Ads

• System starts with plain text of ads– These are hardly exactly “English”

• But an unstructured information source, close to English

– Chosen as lowest common denominator• Output: database records

– A variety of tables giving information about:• the property: bedrooms, garages, price• the real estate agency• inspection times

36

Real Estate Ads: Input

<ADNUM>2067206v1</ADNUM><DATE>March 02, 1998</DATE><ADTITLE>MADDINGTON $89,000</ADTITLE><ADTEXT>OPEN 1.00 - 1.45<BR>

U 11 / 10 BERTRAM ST<BR> NEW TO MARKET Beautiful<BR> 3 brm freestanding<BR> villa, close to shops & bus<BR>

Owner moved to Melbourne<BR> ideally suit 1st home buyer,<BR> investor & 55 and over.<BR> Brian Hazelden 0418 958 996<BR> R WHITE LEEMING 9332 3477

</ADTEXT>

37

Real Estate Ads: Output

• Output is database tables• But the general idea in slot-filler format:

SUBURB: MADDINGTONADDRESS: (11,10,BERTRAM,ST)INSPECTION: (1.00,1.45,11/Nov/98)BEDROOMS: 3TYPE: HOUSEAGENT: BRIAN HAZELDENBUS PHONE: 9332 3477MOB PHONE: 0418 958 996

[Manning & Whitelaw, U. Sydney 1998; in daily use at News Corp.]

40

One needs a little NLP

• There is no semantic coding to use• Standard IR doesn’t work:

– suburbs• the Paddington of the west• one hours drive from Sydney• real estate agent

– prices• recently sold for $x. Was $y now $z. Rent.

– bedrooms – multi-property ads

41

Text Segmentation

Real-estate ads have an hiearchical text structure!!SOUTHPORT UNIT SPECIALS$58,900 o.n.o. 2 brm close to water and shops.$114,000 "Grandview", excellent value, good returnsLJ Coleman Real EstateContact Steve 5527 0572

GLEBE 2br yd $250; 4br yd $430

COOGEE 3br yd $320; 1br $150

BALMAIN 1br $180

H.R. Licensed FEE 9516-3211

42

The End

probabilistic grammars

Documents

np np pp

np npposs n

nlpclstatistical nlp

np pronoun

statistical sophisticationnlp

nlfrom np det adj

sophisticated grammars

real language