Probabilistic Grammars

  • Christopher Manning, CS300 talk, Fall 2000, manning@cs.stanford.edu

  • Research areas of interest: NLP/CL; statistical NLP models combining linguistic and statistical sophistication; NLP and ML methods for extracting meaning relations from web pages, medical texts, etc.; information extraction and text mining; lexical and structural acquisition from raw text; using robust NLP: dialect/style, readability; using pragmatics, genre, and NLP in web searching; computational lexicography and the visualization of linguistic information.

  • Models for language. What is the motivation for statistical models for understanding language? From the beginning, logics and logical reasoning were invented for handling natural language understanding. Logics have a language-like form that draws from and meshes well with natural languages. Where are the numbers?

  • Sophisticated grammars for NL. From NP → Det Adj* N there developed precise and sophisticated grammar formalisms (such as LFG and HPSG).

  • The Problem of Ambiguity. Any broad-coverage grammar is hugely ambiguous (often hundreds of parses for 20+ word sentences). Making the grammar more comprehensive only makes the ambiguity problem worse. Traditional (symbolic) NLP methods don't provide a solution. Selectional restrictions fail because creative/metaphorical use of language is everywhere: "I swallowed his story"; "The supernova swallowed up the planet".

  • The problem of ambiguity, close up. "The post office will hold out discounts and service concessions as incentives." 12 words. Real language. At least 83 parses.

  • Statistical NLP methods. P(to | Sarah drove); P(time is a verb | Time flies like an arrow); P(NP → Det Adj N | mother = VP[drive]). Statistical NLP methods estimate grammar parameters by gathering counts from texts or structured analyses of texts, and assign probabilities to various things to determine the likelihood of word sequences, sentence structures, and interpretations.
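The "estimate grammar parameters by gathering counts" step is just relative-frequency (maximum-likelihood) estimation. A minimal sketch, assuming a toy treebank; the rule occurrences below are invented for illustration:

```python
from collections import Counter

def estimate_pcfg(rules):
    """MLE estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Invented toy "treebank": each item is one observed rule occurrence.
observed = [
    ("NP", ("Det", "N")), ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")), ("NP", ("Pronoun",)),
]
probs = estimate_pcfg(observed)
print(probs[("NP", ("Det", "N"))])  # 0.5
```

Real systems smooth these raw counts, since many valid rules never occur in a finite sample.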

  • Probabilistic Context-Free Grammars. NP → Det N: 0.4; NP → NPposs N: 0.1; NP → Pronoun: 0.2; NP → NP PP: 0.1; NP → N: 0.2. For the subtree [NP [NP Det N] PP], P(subtree) = 0.1 × 0.4 = 0.04.
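That subtree probability is just the product of the probabilities of the rules used in the tree. A minimal sketch; the two NP rule probabilities are the ones from the slide, while the words and the lexical-rule probabilities of 1.0 are invented for illustration:

```python
def tree_prob(tree, probs):
    """P(tree) = product over internal nodes of P(label -> child labels).
    A tree is (label, [children]); a leaf is a bare word string."""
    if isinstance(tree, str):
        return 1.0  # a terminal word itself contributes no rule
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = probs[(label, rhs)]
    for child in children:
        p *= tree_prob(child, probs)
    return p

probs = {
    ("NP", ("NP", "PP")): 0.1,   # from the slide
    ("NP", ("Det", "N")): 0.4,   # from the slide
    ("Det", ("the",)): 1.0,      # invented lexical rules
    ("N", ("dog",)): 1.0,
    ("PP", ("nearby",)): 1.0,
}
subtree = ("NP", [("NP", [("Det", ["the"]), ("N", ["dog"])]),
                  ("PP", ["nearby"])])
print(tree_prob(subtree, probs))  # ≈ 0.04, as on the slide
```

In practice these products are computed in log space to avoid underflow on large trees.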

  • Why Probabilistic Grammars? The predictions about grammaticality and ambiguity of categorical grammars are not in accord with human perceptions or engineering needs. Categorical grammars aren't predictive: they don't tell us what sounds natural. Probabilistic grammars model error tolerance and online lexical acquisition, and have been amazingly successful as an engineering tool. They capture a lot of world knowledge for free. Relevant to linguistic change and variation, too!

  • Example: near. In Middle English, near was an adjective [Maling]. But today, is it an adjective or a preposition? "The near side of the moon"; "We were near the station". Not just a word with multiple parts of speech! There is evidence of blending: "We were nearer the bus stop than the train"; "He has never been nearer the center of the financial establishment".

  • Research aim. Most current statistical models are quite simple (both linguistically and statistically). Aim: to combine the good features of statistical NLP methods with the sophistication of rich linguistic analyses.

  • Lexicalising a CFG. [VP[looked] [V[looked] looked] [PP[inside] [P[inside] inside] [NP[box] [D[the] the] [N[box] box]]]]. A lexicalized CFG can capture probabilistic dependencies between words.
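One way to realize this is to condition each expansion on the parent's head word as well as its category. A minimal relative-frequency sketch; the toy events and words below are invented for illustration:

```python
from collections import Counter

# Each event: (parent category, head word of parent, expansion).
events = [
    ("VP", "looked", ("V", "PP")),
    ("VP", "looked", ("V", "PP")),
    ("VP", "looked", ("V", "NP")),
    ("VP", "ate",    ("V", "NP")),
]

joint = Counter(events)
context = Counter((cat, head) for cat, head, _ in events)

def p_expansion(cat, head, rhs):
    """P(expansion | parent category, head word), by relative frequency."""
    return joint[(cat, head, rhs)] / context[(cat, head)]

print(p_expansion("VP", "looked", ("V", "PP")))  # 2/3
```

The point of the lexicalization is visible even in the toy numbers: "looked" prefers a PP complement while "ate" takes an NP, a distinction an unlexicalized PCFG cannot express.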

  • Left-corner parsing. The memory requirements of standard parsers do not match human linguistic processing. What humans find hardest, center embedding ("*The man that the woman the priest met knows couldn't help"), is really the bread-and-butter of standard CFG parsing: (((a + b))). As an alternative, left-corner parsing does capture this asymmetry.

  • Parsing and (stack) complexity. "She ruled that the contract between the union and company dictated that claims from both sides should be bargained over or arbitrated."

  • Tree geometry vs. stack depth

    Kim's friends' mothers' car smells.

    Kim thinks Sandy knows she likes green apples.

    The rat that the cat that Kim likes chased died.




  • Probabilistic Left-Corner Grammars. Use richer probabilistic conditioning: left corner and goal category rather than just the parent, e.g. P(NP → Det Adj N | Det, S). Allow left-to-right online parsing (which can hope to explain how people build partial interpretations online). Easy integration with lexicalization, part-of-speech tagging models, etc. (Slide diagram: S dominating NP → Det Adj N.)
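The richer conditioning is again a relative-frequency estimate, now over (rule, left corner, goal) events rather than (rule, parent) events. The toy counts below are invented for illustration:

```python
from collections import Counter

# Each event: (rule, left-corner category, goal category).
events = [
    ("NP -> Det Adj N", "Det", "S"),
    ("NP -> Det Adj N", "Det", "S"),
    ("NP -> Det N",     "Det", "S"),
    ("NP -> Det N",     "Det", "VP"),
]

joint = Counter(events)
cond = Counter((lc, goal) for _, lc, goal in events)

def p_rule(rule, left_corner, goal):
    """P(rule | left corner, goal): richer than P(rule | parent)."""
    return joint[(rule, left_corner, goal)] / cond[(left_corner, goal)]

print(p_rule("NP -> Det Adj N", "Det", "S"))  # 2/3
```

Note how the same rule gets different probabilities under different goals (here S vs. VP), which a plain parent-conditioned PCFG cannot distinguish.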

  • Probabilistic Head-driven Grammars. The heads of phrases are the source of the main constraining information about a sentence's structure. We work outward from heads, following the dependency order of the sentence. The crucial property is that we have always built, and have available to us for conditioning, all governing heads and all less oblique dependents of the same head. We can also easily integrate phrase length.

  • Information from the web: the problem. When people see web pages, they understand their meaning, by and large; to the extent that they don't, there's a gradual degradation. When computers see web pages, they get only character strings and HTML tags.

  • The human view

  • The intelligent agent view: "Ford Motor Company - Home Page"

  • The problem (cont.). We'd like computers to see meanings as well, so that computer agents could more intelligently process the web. These desires have led to XML, RDF, agent markup languages, and a host of other proposals and technologies which attempt to impose more syntax and semantics on the web in order to make life easier for agents.

  • Thesis. The problem can't and won't be solved by mandating a universal semantics for the web. The solution is rather agents that can understand the human web by text and image processing.

  • (1) The semantics. Are there adequate and adequately understood methods for marking up pages with such a consistent semantics, in such a way that it would support simple reasoning by agents? No.

  • What are some AI people saying? "Anyone familiar with AI must realize that the study of knowledge representation (at least as it applies to the commonsense knowledge required for reading typical texts such as newspapers) is not going anywhere fast. This subfield of AI has become notorious for the production of countless non-monotonic logics and almost as many logics of knowledge and belief, and none of the work shows any obvious application to actual knowledge-representation problems. Indeed, the only person who has had the courage to actually try to create large knowledge bases full of commonsense knowledge, Doug Lenat, is believed by everyone save himself to be failing in his attempt." (Charniak 1993: xvii-xviii)

  • (2) Pragmatics, not semantics. pragmatic: relating to matters of fact or practical affairs, often to the exclusion of intellectual or artistic matters. pragmatics: linguistics concerned with the relationship of the meaning of sentences to their meaning in the environment in which they occur. A lot of the meaning in web pages (as in any communication) derives from the context, what is referred to in the philosophy-of-language tradition as pragmatics. Communication is situated.

  • Pragmatics on the web. Information supplied is incomplete; humans will interpret it. Numbers are often missing units. A rubber band for sale at a stationery site is a very different item from a rubber band on a metal lathe. A "sidelight" means something different to a glazier than to a regular person. Humans will evaluate content using information about the site and the style of writing: value filtering.

  • (3) The world changes. The way in which business is being done is changing at an astounding rate, or at least that's what the ads from e-business companies scream at us. Semantic needs and usages evolve (like languages) more rapidly than standards (cf. the Académie française).

    People use words that aren't in the dictionary. Their listeners understand them.

  • (4) Interoperation. Ontology: a shared formal conceptualization of a particular domain. Meaning transfer frequently has to occur across the subcommunities that are currently designing *ML languages, and then all the problems reappear; the current proposals don't do much to help.

  • Many products cross industries. "... offers a complete range of SKC's Skyrol brand polyester films for use in a wide variety of packaging and industrial processes. Gauges: 48 - 1400. Typical End Uses: Packaging, Electrical, Labels, Graphic Arts, Coating and Laminating. Labels: milk jugs, beer/wine, combination forms, laminated coupons, ..."

  • (5) Pain but no gain. A lot of the time people won't put in information according to standards for semantic/agent markup, even if they exist. Three reasons: Laziness: only 0.3% of sites currently use the (simple) Dublin Core metadata standard. Profits: having an easily robot-crawlable site is a recipe for turning what you sell into a commodity, and hence making little profit. Cheats: there are people out there who will abuse any standard, if it's profitable.

  • (6) Less structure to come. "... the convergence of voice and data is creating the next key interface between people and their technology. By 2003, an estimated $450 billion worth of e-commerce transactions will be voice-commanded."*

    Question: will these customers speak XML tags?

    Intel ad, NYT, 28 Sep 2000. *Data source: Forrester Research.

  • The connection to language. Decker et al., IEEE Internet Computing (2000): the Web is the first widely exploited many-to-many data-interchange medium, and it poses new requirements for any exchange format: universal expressive power, syntactic interoperability, semantic interoperability. But human languages have all these properties, and maintain superior expressivity and interoperability.