Natural Language Processing (Xử lý ngôn ngữ tự nhiên)

Upload: dich-phong

Post on 01-Nov-2015


DESCRIPTION

Course material on natural language processing

TRANSCRIPT

  • 1 Natural Language Processing (Xử lý ngôn ngữ tự nhiên)

    Lê Thanh Hương
    Department of Information Systems
    School of Information and Communication Technology, Hanoi University of Science and Technology (ĐHBKHN)
    Email: huonglt-fit@mail.hut.edu.vn

    Course objectives
    - Understand the basic principles and approaches in NLP
    - Learn techniques and tools that can be used to build systems that understand text or converse with people
    - Get a sense of some of the open problems in NLP

    References
    - Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. The MIT Press.
    - Dan Jurafsky and James Martin. 2000. Speech and Language Processing. Prentice Hall.
    - James Allen. 1994. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.

    General information
    - Assessment
      - Midterm component: 30%
        - Attendance: 10%
        - In-class participation: 10%
        - Course project: 10%
      - Final: written exam, 70%
    - Eligibility for the final exam:
      - Absent from fewer than 30% of attendance checks
      - All midterm component scores recorded
    - Course project:
      - Write an essay or implement a program in groups (

  • 2 NLP = converting sound into meaning

    Natural language is central to human intelligence

    What is NLP?

    - Goal: understand many languages
    - More than simple string processing or keyword matching

    ELIZA

    - http://en.wikipedia.org/wiki/ELIZA
    - http://www.cs.bham.ac.uk/research/projects/cogaff/eliza/

    A conversation with Eliza (E: Eliza, H: human)

    E: how are you today?
    H: not so good
    E: why do you say not so good?
    H: i suffer from allergies?
    E: that's interesting
    H: i come out in spots
    E: i'm not sure i know what you're talking about
    H: spots on my face
    E: your face
    H: that's right
    E: is it really that right?
    H: you're a program, aren't you?
    E: does it please you to believe i am a program, am not i?
    H: it sure does
    E: i see
    H: yup
    E: a while back you talk about your face
    H: what about your face?
    E: what is it you really want to know?
    H: are you my kind of girl?
    E: perhaps in your fantasies i am your kind of girl
    H: in my fantasies you are blonde
    E: what makes you think i am blonde?
    H: i don't know
    E: don't you really know?
    H: no
    E: are you saying no just to be negative?
    H: no
    E: you are being a bit negative
    H: no i'm not
    E: how long have you been not?
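
    The exchange above can be imitated with a handful of pattern-and-template rules. The sketch below is only an illustration of that idea: the rule table, the pronoun swaps, and the names RULES, REFLECT, and respond are assumptions made for this example, not ELIZA's actual script.

```python
import re

# Hypothetical rule table (pattern, response template); not ELIZA's real script.
RULES = [
    (re.compile(r"^not (.*)$"), "why do you say not {0}?"),
    (re.compile(r"\byou are (.*)$"), "what makes you think i am {0}?"),
    (re.compile(r"^no$"), "are you saying no just to be negative?"),
]

# First/second person swaps applied to captured text before it is echoed back.
REFLECT = {"i": "you", "my": "your", "you": "i", "your": "my", "am": "are"}


def reflect(fragment):
    """Swap pronouns word by word, leaving unknown words unchanged."""
    return " ".join(REFLECT.get(word, word) for word in fragment.split())


def respond(utterance):
    """Return the response of the first matching rule, or a neutral fallback."""
    text = utterance.lower().strip(" .!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "i see"


print(respond("not so good"))
print(respond("in my fantasies you are blonde"))
```

    The program never models meaning: it only matches surface strings, which is why the dialogue above breaks down so quickly.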

    Why study NLP?

    - Study how people identify words
    - Study how people parse sentences
    - Study how people learn a language
    - Study how languages evolve

    Topics in NLP

    - Levels of analysis: syntax, semantics, discourse, pragmatics, ...
    - Subtasks: part-of-speech tagging, syntactic parsing, word sense disambiguation, discourse structure analysis, ...
    - Algorithms and methods: corpus-based, knowledge-based, ...
    - Applications: information extraction, information retrieval, machine translation, question answering, natural language understanding, ...

  • 4 Levels of analysis
    - Morphology: how words are built; the prefixes and suffixes of words
    - Syntax: the grammatical relationships and structure among words and phrases
    - Semantics: the meaning of words, phrases, and expressions
    - Discourse: relationships between ideas or sentences
    - Pragmatics: the purpose of an utterance; how language is used in communication
    - World knowledge: facts about the world, implicit knowledge

    Morphology

    English: an inflectional, polysyllabic language
    - kick, kicks, kicked, kicking
    - sit, sits, sat, sitting
    - murder, murders

    But it is not always a matter of adding or removing an ending:
    - gorge, gorgeous (gorgeous: splendid; gorge: v. to stuff oneself; n. what has been eaten, a ravine)
    - arm, army (arm: cánh tay; army: quân đội)

    Vietnamese: a non-inflectional, monosyllabic language, so word segmentation is needed

    Word segmentation
    - A sentence may have n possible segmentations, but only one of them is correct
    - Simple solution: take the longest sequence of syllables that starts at the current position and appears in the lexicon
    - Problem: overlapping words. For "Học sinh học sinh học":
      - Học sinh | học sinh | học.
      - Học sinh | học | sinh học.
    - Approach: enumerate all possible segmentations and design a way to choose the best one
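
    The "longest sequence of syllables" strategy can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: the toy lexicon, the max_len cap, and the function name longest_match are made up for this example.

```python
def longest_match(syllables, lexicon, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest run of syllables (up to max_len)
    found in the lexicon; a single syllable is always accepted as a fallback.
    """
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon (an assumption for this example).
LEXICON = {"học", "sinh", "học sinh", "sinh học"}

print(longest_match("học sinh học sinh học".split(), LEXICON))
```

    On this input the greedy choice yields "học sinh | học sinh | học" rather than the intended "học sinh | học | sinh học", which is exactly the overlap problem described above.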

    Part-of-speech tagging

    The boy threw a ball to the brown dog.

    - The/DT boy/NN threw/VBD a/DT ball/NN to/IN the/DT brown/JJ dog/NN ./.

    Tag   Meaning
    DT    determiner
    NN    noun, singular or mass
    VBD   verb, past tense
    IN    preposition
    JJ    adjective
    .     sentence-final punctuation

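  • 5 Part-of-speech tagging

    Con ngựa đá con ngựa đá. (The horse kicks the stone horse.)

    - Con ngựa/DT đá/ĐgT con ngựa/DT đá/TT.

    - Ông/ĐaT già/TT đi/Ph_từ nhanh/TT quá/trạng_từ.

    - Ông già/DT đi/ĐgT nhanh/TT quá/trạng_từ.

    (DT: noun; ĐgT: verb; TT: adjective; ĐaT: pronoun; Ph_từ: adjunct; trạng_từ: adverb)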

    Grammar: structural ambiguity (part of speech)

    Time flies like an arrow.

    - Time // flies like an arrow.   (flies: VBZ; like: comparative preposition, IN)
    - Time flies // like an arrow.   (flies: NNS; like: VBP)

    Grammar: structural ambiguity (part of speech)

    - Ông già // đi nhanh quá.
    - Ông // già đi nhanh quá.

    Grammar: structural ambiguity (attachment)

    I saw the man on the hill with a telescope.

    [Figure: alternative parse trees for this sentence (nodes S, VP, NP; leaves NP V NP PP PP), shown over three slides.]

    But grammar alone does not tell us much

    - Colorless green ideas sleep furiously. [Chomsky]
    - fire match arson hotel
    - plastic cat food can cover

    Semantics: lexical ambiguity
    - I walked to the bank ...
      - of the river.
      - to get money.
    - The bug in the room ...
      - was planted by spies.
      - flew out the window.
    - I work for John Hancock ...
      - and he is a good boss.
      - which is a good company.

  • 7 Discourse: coreference

    President John F. Kennedy was assassinated.
    The president was shot yesterday.
    Relatives said that John was a good father.
    JFK was the youngest president in history.
    His family will bury him tomorrow.
    Friends of the Massachusetts native will hold a candlelight service in Mr. Kennedy's home town.

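    Pragmatics

    What do you infer from what I say? How do you react?

    Conversational rules
    - Bạn ơi mấy giờ rồi? (Do you have the time?)
    - Anh đưa cho em lọ muối được không? (Could you pass me the salt?)

    Utterances that carry an act
    - Tôi cá với bạn 500.000 là đội Việt Nam sẽ thắng. (I bet you 500,000 that the Vietnamese team will win.)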

    World knowledge

    Mai went out for dinner. She ordered a steak. She left a tip and went home.

    - What did Mai eat for dinner?
    - Who brought the dinner to Mai?
    - Who cooked the steak?
    - Did Mai pay?

    Knowledge of language: what do we know about this sentence?
    - Words must appear in a certain order:
      a. Chó kem ăn.    b. Chó ăn kem. (The dog eats ice cream.)
    - Constituents of the sentence:
      chó = subject; ăn kem = predicate
    - Who does what to whom:
      agent (chó), action (ăn), object (kem)

  • 8 Other issues?

    - The two sentences "Mai says the dog eats ice cream" and "Mai denies that the dog eats ice cream" are logically inconsistent with each other
    - Sentences and the world: knowing whether a sentence is true or false, which may hold only in certain specific situations
    - "I drank espresso this morning, but Mai is intelligent": incoherent

    Implicit knowledge

    1. I want to solve the problem
       - I wanna solve the problem
    2. I understand these students
       - These students I understand
       - I want these students to solve the problem
       - These students I want [x] to solve the problem
       - [x] = these students

    Characteristics of language

    - Some things can be memorized:
      - Singing → Sing+ing; Bringing → bring+ing
      - Duckling → ?? Duckl+ing
      - We need to know that "duckl" is not a word
    - But we cannot memorize everything, because there is too much

    Beyond memory, what do we need?

    English plurals:
    - Toy+s -> toyz ; add z
    - Book+s -> books ; add s
    - Church+s -> churchiz ; add iz
    - Box+s -> boxiz ; add iz

    We need a system of rules to generate and handle these cases
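
    A rule system for the plural cases above can be very small. The sketch below is a rough approximation that works from spelling rather than from pronunciation; the ending lists and the name plural_suffix_sound are assumptions made for this example.

```python
# Spelled endings used as a rough proxy for the final sound of the noun.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")
VOICELESS_ENDINGS = ("p", "t", "k", "f", "th")


def plural_suffix_sound(noun):
    """Return the pronounced form of the plural suffix: 'iz', 's', or 'z'."""
    if noun.endswith(SIBILANT_ENDINGS):
        return "iz"   # church -> churchiz, box -> boxiz
    if noun.endswith(VOICELESS_ENDINGS):
        return "s"    # book -> books
    return "z"        # toy -> toyz


for noun in ("toy", "book", "church", "box"):
    print(noun, "->", plural_suffix_sound(noun))
```

    Three ordered rules cover all four slide examples, where a memorized list would need one entry per noun.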

  • 9 Analysis = mapping a surface form to its internal representation

    - What makes NLP hard: there is no one-to-one correspondence with any representation.
    - We need to know the data structures and algorithms that carry out the mapping, even though combinatorial explosion can occur at any processing stage

    Analyzing an LSAT / (former) GRE question

    - Six sculptures C, D, E, F, G, H are exhibited in rooms 1, 2, 3 of an exhibition.
      - Sculptures C and E may not be in the same room.
      - Sculptures D and G must be in the same room.
      - If sculptures E and F are in the same room, no other sculpture may be in that room.
      - At least one sculpture is exhibited in each room, and no room holds more than 3 sculptures.
    - If sculpture D is exhibited in room 3 and sculptures E and F in room 1, which of the following statements may be true?
      A. Sculpture C is in room 1
      B. Sculpture H is in room 1
      C. Sculpture G is in room 2
      D. Sculptures C and H are in the same room
      E. Sculptures G and F are in the same room

    Resolving coreference

    U: Where in Mountain View is A Bug's Life playing?
    S: A Bug's Life is playing at the Summit theater.
    U: When is it playing there?
    S: It is playing at 2pm, 5pm, and 8pm.
    U: I'd like 1 adult and 2 children for the first showing. How much does it cost?

    - Knowledge sources:
      - Domain knowledge
      - Discourse knowledge
      - World knowledge

    Why is NLP hard?

    Natural language:
    - Is ambiguous at every level
    - Is complex and fuzzy
    - Involves reasoning about the world

  • 10

    Solutions
    - What tools do we need?
      - Knowledge of language
      - Knowledge of the world
      - A way to combine these knowledge sources
    - A promising solution: probabilistic models built from data
      - P("maison" → "house") is high
      - P("L'avocat général" → "the general avocado") is low

    Recap of the tasks in NLP

    - Input: a character string
    - Output: (stem, morphological tag) pairs
    - Issues:
      - Combining the components that make up a word
      - Types of morphology (inflection, derivation, compounding)
      - Example: quotations ~ quote/V + -ation(der.V->N) + NNS.

    Syntactic parsing

    - Input: a sequence of (word/part-of-speech) pairs
    - Output: the grammatical structure of the sentence, with labeled nodes (word, part of speech, grammatical role)
    - Issues:
      - Relationships among words, parts of speech, and sentence structure
      - Use of syntactic labels (subject, predicate, object, ...)
    - Example: Tôi/ĐaT nhìn thấy/ĐgT Mai/DT → ((Tôi/ĐaT)CN ((nhìn thấy/ĐgT) (Mai/DT)OBJ)VN)C

    Semantics

    - Input: the grammatical structure of the sentence
    - Output: the semantic structure of the sentence
    - Issues:
      - Relations among entities such as subject, object, agent, effect, and others
    - Example:
      ((Học sinh/DT)CN ((học/ĐgT sinh học/DT)ĐgN)VN)C
      (Học sinh/DT)Sbj (học/ĐgT)action (sinh học/DT)Obj

  • 11

    Applications of NLP
    - Hard: speech processing, machine translation, information extraction, natural-language dialog interfaces, question answering
    - Current applications: spelling correction, text classification, ...

  • 12

    Information extraction

    [Figure: an employer's job posting form filled in from text; visible labels include "Martin Baker, a person" and "Genomics job".]

    Information extraction

    October 14, 2002, 4:00 a.m. PT

    For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

    Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

    "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

    Richard Stallman, founder of the Free Software Foundation, countered saying

    IE output:

    NAME               TITLE     ORGANIZATION
    Bill Gates         CEO       Microsoft
    Bill Veghte        VP        Microsoft
    Richard Stallman   founder   Free Soft..

    Newsinessence [Radev & al. 01]

  • 13

    Google News [02]

  • Vietnamese Word Segmentation (Tách từ tiếng Việt)

    Lê Thanh Hương
    Department of Information Systems
    School of Information and Communication Technology, Hanoi University of Science and Technology (ĐHBKHN)

    Word segmentation
    - Goal: identify the boundaries of the words in a sentence.
    - An important processing step for NLP systems, especially for isolating languages such as Chinese, Japanese, Thai, and Vietnamese.
    - In isolating languages, a word may consist of one or more syllables.
    - The core difficulty of word segmentation is resolving ambiguity in word boundaries.

    Vocabulary

    - Vietnamese is a non-inflectional language
    - Vietnamese lexicon (Vietlex): >40,000 words, of which:
      - 81.55% of syllables are words in their own right (single words)
      - 15.69% of the lexicon entries are single words
      - 70.72% are two-syllable compounds
      - 13.59% are compounds of three or more syllables
      - 1.04% are compounds of more than four syllables

    Length   #        %
    1        6,303    15.69
    2        28,416   70.72
    3        2,259    5.62
    4        2,784    6.93
    ≥ 5      419      1.04
    Total    40,181   100

    Table 1. Word length in syllables

    Vietnamese word formation rules
    - Single words (từ đơn): one syllable used as a word.
      - Examples: tôi, bác, người, cây, hoa, đi, chạy, vì, đã, à, nhỉ, nhé...
    - Compound words (từ ghép): combinations of syllables that are semantically related to each other.
      - Coordinate compounds (từ ghép đẳng lập): the components are semantically equal.
        - Examples: chợ búa, bếp núc
      - Subordinate compounds (từ ghép chính phụ): one component depends on the other. The subordinate component classifies, specializes, or adds nuance to the main component.
        - Examples: tàu hoả, đường sắt, xấu bụng, tốt mã, ngay đơ, thẳng tắp, sưng vù...

    Vietnamese word formation rules
    - Reduplicative words (từ láy): the components repeat a sound pattern, repeating and varying at the same time. A word that is simply repeated also gives a reduplicative word.
    - Word variants: regarded as temporary altered forms or "spoken" forms of a word.
      - Shortening a long word into a shorter one
        - ki-lô-gam → ki lô / kí lô
      - Temporarily breaking up the structure of a word and redistributing its components among other elements inserted from outside. Examples:
        - khổ sở → lo khổ lo sở
        - ngặt nghẽo → cười ngặt cười nghẽo
        - danh lợi + ham chuộng → ham danh chuộng lợi

  • Vietnamese word formation rules
    - Multi-word expressions (e.g. "bởi vì") are also treated as one word
    - Proper names: names of people and places are treated as one lexical unit
    - Regular patterns: numbers, times

    Approaches
    - Dictionary-based approaches
    - Statistical approaches
    - Combinations of the two

    Methods
    - Longest Matching
    - Transformation-based Learning (TBL)
    - Weighted Finite State Transducer (WFST)
    - Maximum Entropy (ME)
    - Machine learning with Hidden Markov Models (HMM)
    - Machine learning with Support Vector Machines (SVM)
    - Combinations of several of the methods above

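    Dictionary-based approach

    - Building the dictionary
      - Each entry stores the word, its part of speech, and its semantic category
      - Organized to use little memory and to make lookup convenient
      - Dictionary encoding: part of speech and semantic category are byte values, each stored as a single character.
        - Example: noun - 112 "p", - 115 "s"

    Dictionary-based approach
    - Entries are paged by the first two letters of the word, in ascending order. Within each page, the words are sorted alphabetically.

    [Figure: dictionary layout. Two-letter index keys point to pages 1, 2, ..., n of the content area; each page holds the words with that prefix, e.g. "bao", "bề ngoài", "bài tập", "xe cộ", "xe đạp".]

    Looking up words in the dictionary

    - Maximum word length? 3? 4? 5?
    - Problem: fixed expressions such as "ông chẳng bà chuộc" cannot be handled
    - Return all the compound words in the dictionary that match the beginning of the input string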

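  • Looking up words in the dictionary

    Nếu như máy nghỉ thì ta về

    Word positions: 0 1 2 3 4 5 6 7
    - This gives the following table:
      [Table not preserved in the transcript.]
    - Notation:
      - LT (conjunction), DT (noun)
      - ĐgT (verb), ĐaT (pronoun)

    Disambiguation

    - Take all candidate segmentations; if syntactic parsing of a candidate yields a correct tree, that candidate is the correct segmentation.

    Hybrid approach [2008]

    - Combines finite-state automata + regular expressions + longest matching + statistics (to resolve ambiguity)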

    Regular expressions
    - A regular expression is a pattern that is matched against a string
    - Special characters:
      - * : any sequence of characters, including none
      - x : at least one character
      - + : the parenthesized group occurs at least once
    - Examples:
      - Email: x@x(.x)+
      - dir *.txt
      - *John -> John, Ajohn, Decker John
    - Regular expressions are used especially often for:
      - Parsing
      - Validating data
      - String processing
      - Extracting data and generating reports
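
    The slide's informal patterns can be written in a concrete regex dialect. The sketch below uses Python's re module; the exact character classes chosen for "x", and the helper names is_email, ends_with_john, and is_txt, are assumptions made for this example.

```python
import re
from fnmatch import fnmatchcase

# "x@x(.x)+": one or more characters, "@", then a name with at least one
# dotted part. "x" is read here as "any run without '@', '.' or whitespace".
EMAIL = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def is_email(text):
    return EMAIL.fullmatch(text) is not None


# "*John": anything (possibly nothing) followed by "John". Case is ignored
# so that "Ajohn" from the slide also matches.
ENDS_WITH_JOHN = re.compile(r".*John", re.IGNORECASE)


def ends_with_john(text):
    return ENDS_WITH_JOHN.fullmatch(text) is not None


# "dir *.txt" uses shell wildcards rather than regular expressions.
def is_txt(filename):
    return fnmatchcase(filename, "*.txt")


print(is_email("huonglt@mail.hut.edu.vn"), ends_with_john("Decker John"), is_txt("notes.txt"))
```

    Note that the slide's "*" is shell-style shorthand; in regex syntax the same idea is written ".*".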

    Finite automata
    - The class of regular languages is recognized by abstract machines called finite automata.
      - Deterministic Finite Automata (DFA)
      - Nondeterministic Finite Automata (NFA)
      - Nondeterministic finite automata with empty transitions (ε-NFA)

    An informal introduction to finite automata

    - A basic problem for automata is deciding whether a string w belongs to a language L.
    - The input string is processed sequentially, one symbol at a time, from left to right.
    - During execution, the automaton needs to remember information about what it has already processed.

  • Example of a finite automaton

    L = {w ∈ {0, 1}* | w ends with the substring 10}.
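
    The language L above is recognized by a three-state DFA. The sketch below is one possible encoding; the state names q0, q1, q2 and the function name accepts are assumptions made for this example, since the slide's diagram is not preserved.

```python
# States remember how much of the suffix "10" has just been seen:
#   q0: nothing useful, q1: last symbol was 1, q2: last two symbols were 10.
TRANSITIONS = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q1",
}


def accepts(w, start="q0", accepting=("q2",)):
    """Run the DFA over w, one symbol at a time, left to right."""
    state = start
    for symbol in w:
        state = TRANSITIONS[(state, symbol)]
    return state in accepting


for w in ("10", "0110", "101", ""):
    print(repr(w), accepts(w))
```

    The only memory the automaton has is its current state, which is enough here because membership depends only on the last two symbols.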

    A finite automaton for English words

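    A simple segmentation method
    - Detect common patterns such as proper names, abbreviations, numbers, dates, email addresses, URLs, ... using regular expressions
    - The system picks the longest syllable sequence that starts at the current position and appears in the dictionary, choosing the segmentation with the fewest words

    Limitation: it may produce an incorrect segmentation.

    Remedy: enumerate all segmentations, and use a strategy to choose the best one.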

    Choosing a segmentation
    - Represent a phrase as a sequence of syllables s1 s2 ... sn
    - The most frequent ambiguity is three consecutive syllables s1 s2 s3 in which both s1 s2 and s2 s3 are words.
    - Represent the phrase as a linear directed graph G = (V, E), V = {v0, v1, ..., vn, vn+1}
    - If the syllables si+1, si+2, ..., sj form a word, then G has the edge (vi, vj)
    - The candidate segmentations are the shortest paths from v0 to vn+1

    Algorithm
    Algorithm 1. Build the graph for the string s1 s2 ... sn
      V ← ∅;
      for i = 0 to n + 1 do
        V ← V ∪ {vi};
      end for
      for i = 0 to n do
        for j = i to n do
          if (accept(AW, si ... sj)) then
            E ← E ∪ {(vi, vj+1)};
          end if
        end for
      end for
      return G = (V, E);

    accept(A, s): automaton A accepts the input string s
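
    The graph construction and the shortest-path enumeration can be sketched as follows. This is an illustration under stated assumptions: it uses 0-based node indices (node i sits before syllable i), replaces the automaton accept(AW, ...) with a plain set lookup, and always allows single syllables so that the graph stays connected.

```python
from collections import deque


def build_graph(syllables, lexicon):
    """Edge (i, j) means syllables[i:j] forms a word."""
    n = len(syllables)
    edges = {i: [] for i in range(n + 1)}
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Set lookup stands in for accept(AW, ...); single syllables
            # are always accepted (an assumption of this sketch).
            if j == i + 1 or " ".join(syllables[i:j]) in lexicon:
                edges[i].append(j)
    return edges


def shortest_segmentations(syllables, lexicon):
    """Return every segmentation that uses the fewest words."""
    edges = build_graph(syllables, lexicon)
    n = len(syllables)
    dist = {0: 0}                      # BFS distance from the start node
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in edges[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    results = []

    def walk(u, words):
        if u == n:
            results.append(words)
            return
        for v in edges[u]:
            if dist[v] == dist[u] + 1:  # stay on shortest paths only
                walk(v, words + [" ".join(syllables[u:v])])

    walk(0, [])
    return results


LEXICON = {"học sinh", "sinh học"}     # toy lexicon, an assumption
for seg in shortest_segmentations("học sinh học sinh học".split(), LEXICON):
    print(" | ".join(seg))
```

    Several shortest paths tie on this input, which is why a separate statistical step is needed to pick among them.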

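    Disambiguation

    - Probability of a string s = w1 w2 ... wm:
      $P(s) = \prod_{i=1}^{m} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}^{i-1})$
    - P(wi | w1^{i-1}): the probability of wi given the i-1 preceding units
    - n = 2: bigram; n = 3: trigram

    Disambiguation
    - When n = 2, estimate P(wi | wi-1) by maximum likelihood (ML):
      $P_{ML}(w_i \mid w_{i-1}) = \frac{c(w_{i-1} w_i)}{c(w_{i-1})}$
    - c(s): the number of times the string s occurs; N: the total number of words in the training set
    - When the training data is small compared with the whole data, P ~ 0
    - Use a smoothing technique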

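    Smoothing technique

    $P(w_i \mid w_{i-1}) = \lambda_1 P_{ML}(w_i \mid w_{i-1}) + \lambda_2 P_{ML}(w_i)$

    with λ1 + λ2 = 1 and λ1, λ2 ≥ 0; P_ML(wi) = c(wi)/N
    - For a test set T = {s1, s2, ..., sn}, the probability P(T) of the test set is:
      $P(T) = \prod_{i=1}^{n} P(s_i)$
    - Entropy of the text:
      $H(T) = -\frac{1}{N_T} \log_2 P(T)$
      with N_T: the number of words in T
    - Entropy is inversely related to the average probability of a segmentation for the sentences in the test text.

    Determining λ1, λ2
    - From the sample data, define C(wi-1, wi) as the number of times (wi-1, wi) occurs in the sample. We need to choose λ1, λ2 that maximize
      $L(\lambda_1, \lambda_2) = \sum_{w_{i-1}, w_i} C(w_{i-1}, w_i) \log_2 P(w_i \mid w_{i-1})$
      with λ1 + λ2 = 1 and λ1, λ2 ≥ 0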
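
    The effect of mixing bigram and unigram estimates can be seen on a tiny corpus. The sketch below fixes λ1 = 0.7 and λ2 = 0.3 by hand instead of estimating them, and approximates c(w_{i-1}) with the unigram count; the corpus and the names train and p_interp are assumptions made for this example.

```python
from collections import Counter


def train(corpus):
    """Count unigrams and bigrams over a list of segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def p_interp(prev, word, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """lam1 * P_ML(word | prev) + lam2 * P_ML(word), with lam1 + lam2 = 1."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam1 * p_bi + lam2 * p_uni


# Toy segmented corpus (an assumption for this example).
CORPUS = [["học sinh", "học", "sinh học"], ["học sinh", "đi", "học"]]
unigrams, bigrams = train(CORPUS)

print(p_interp("học sinh", "học", unigrams, bigrams))   # seen bigram
print(p_interp("đi", "sinh học", unigrams, bigrams))    # unseen bigram
```

    The unseen bigram still receives a nonzero probability from the unigram term, which is the purpose of smoothing.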

    Algorithm

    Results
    - Data: 1,264 articles from the Tuổi trẻ newspaper, 507,358 words
    - With ε = 0.03, the values converge after 4 iterations
    - Precision = number of words the system identifies correctly / total number of words the system identifies = 95%

  • Part-of-Speech Tagging (Gán nhãn từ loại)

    Lê Thanh Hương
    Department of Information Systems
    School of Information and Communication Technology, Hanoi University of Science and Technology (ĐHBKHN)

    Definition
    - Part-of-speech tagging (POS tagging): each word in a sentence is assigned its corresponding part-of-speech tag
    - Input: a segmented text + a tagset
    - Output: the most accurate tagging

    Example 1, Example 2, Example 3, Example 4, Example 5

    Tagging makes text analysis easier

    Why tag?
    - Easy to do: it can be done with many different methods
      - Methods that use context can give good results
      - Even though it should really be done through text analysis
    - Applications:
      - Text-to-speech: record - N: [reko:d], V: [riko:d]; lead - N: [led], V: [li:d]
      - Preprocessing for syntactic parsing. A parser does the tagging better, but is more expensive
      - Speech recognition, parsing, search, etc.
    - Easy to evaluate (how many tags were assigned correctly?)

    English word classes

    - Closed class (function words): a fixed number of members
      - Prepositions: on, under, over, ...
      - Particles: abroad, about, around, before, in, instead, since, without, ...
      - Articles: a, an, the
      - Conjunctions: and, or, but, that, ...
      - Pronouns: you, me, I, your, what, who, ...
      - Auxiliary verbs: can, will, may, should, ...
    - Open class: new words can be added

    Open word classes in English

    open class
      nouns
        proper nouns: IBM, Colorado
        common nouns
          count nouns: book, ticket
          mass nouns: snow, salt
      verbs
        auxiliaries
        . . .
      adjectives
        Color: red, white
        Age: old, young
        Value: good, bad
      adverbs
        Locative adverbs: home, here, downhill
        Degree adverbs: extremely, very, somewhat
        Manner adverbs: slowly, delicately
        Temporal adverbs: yesterday, Monday

    English tagsets

    - Brown corpus: 87 tags
    - Three commonly used tagsets:
      - Small: 45 tags, Penn Treebank (next slide)
      - Medium: 61 tags, British National Corpus
      - Large: 146 tags, C7

  • 7

    I know that blocks the sun.
    He always books the violin concert tickets early.
    He says that book is interesting.

    Penn Treebank example

    - The grand jury commented on a number of other topics.

    The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

    What is hard about part-of-speech tagging?

    Handling ambiguity

    Part-of-speech tagging methods

    - Probability-based: maximum probability, hidden Markov models (HMM)

      Pr(Det-N) > Pr(Det-Det)

    - Rule-based

      If ... Then ...

    Approaches

    - HMM-based: use all the available information and guess
    - Grammatical-constraint-based: do not guess, only rule out the wrong possibilities
    - Transformation-based: guess first, then possibly change the guess

    Probability-based tagging

    Given a sentence or a string of words, assign the most likely part-of-speech tag to each word in the string.

    How:

    - Hidden Markov model (HMM): choose the tag that maximizes
      P(word | tag) × P(tag | n preceding tags)

    The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

    P(jury|NN) = 1/2

  • Example: HMMs

    Do supervised learning, then perform inference to determine the tags

    HMM tagging

    - Bigram HMM formula: choose the most likely tag t_i for w_i given t_{i-1} and w_i:
      $t_i = \arg\max_j P(t_j \mid t_{i-1}, w_i)$   (1)
    - HMM simplifying assumption: the tagging problem can be solved by looking at the neighboring words and tags:
      $t_i = \arg\max_j P(t_j \mid t_{i-1}) \, P(w_i \mid t_j)$   (2)
      - P(t_j | t_{i-1}): tag sequence probability (tags that occur together)
      - P(w_i | t_j): probability of the word occurring with tag t_j

    Example

    1. Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
    2. People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

    - We cannot decide just by counting words in the corpus (and normalizing)
    - We expect a verb to follow TO more often than a noun (to race, to walk). But a noun can also follow TO (run to school)

    Suppose we have all the tags except the one for "race"

    - Look only at the preceding word (bigram):
      to/TO race/???   NN or VB?
      the/DT race/???
    - Apply (2):
      $t_i = \arg\max_j P(t_j \mid t_{i-1}) \, P(w_i \mid t_j)$
    - Choose the tag with the larger of the two probabilities:
      P(VB|TO)P(race|VB) or P(NN|TO)P(race|NN)
      where P(race|VB) is the probability that the word is "race" given that the tag is VB.

    Computing the probabilities

    Consider P(VB|TO) and P(NN|TO)
    - From the Brown corpus:
      P(NN|TO) = 0.021
      P(VB|TO) = 0.340
      P(race|NN) = 0.00041
      P(race|VB) = 0.00003
    - P(VB|TO)P(race|VB) ≈ 0.00001
    - P(NN|TO)P(race|NN) ≈ 0.0000086

    So "race" should be tagged as a verb when it follows TO
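
    The comparison above is a two-way argmax over products of a transition and an emission probability. The sketch below plugs in the four Brown-corpus values quoted in the slide; the dictionary layout and the name best_tag are assumptions made for this example.

```python
# Probabilities quoted in the slide (from the Brown corpus).
TRANSITION = {("TO", "VB"): 0.340, ("TO", "NN"): 0.021}        # P(tag | previous tag)
EMISSION = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}  # P(word | tag)


def score(prev_tag, word, tag):
    """P(tag | prev_tag) * P(word | tag), as in formula (2)."""
    return TRANSITION[(prev_tag, tag)] * EMISSION[(word, tag)]


def best_tag(prev_tag, word, candidates):
    """Pick the candidate tag with the highest score."""
    return max(candidates, key=lambda tag: score(prev_tag, word, tag))


for tag in ("VB", "NN"):
    print(tag, score("TO", "race", tag))
print(best_tag("TO", "race", ["NN", "VB"]))
```

    Although "race" is far more often a noun, the strong TO-to-VB transition outweighs the emission probabilities here.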

    Exercises
    - I know that blocks the sun.
    - He always books the violin concert tickets early.
    - He says that book is interesting.

    - I/PP know/VBP that/WDT blocks/VBZ the/DT sun/NN.
    - He/PP always/RB books/VBZ the/DT violin/NN concert/NN tickets/NNS early/RB.
    - He/PP says/VBZ that/WDT book/NN is/VBZ interesting/JJ.
    - I know that block blocks the sun.
      I/PP know/VBP that/DT block/NN blocks/NNS?VBZ? the/DT sun/NN.

  • The full model
    - We need to find the best tag sequence for the whole string
    - Given a word string W, compute the most probable tag sequence T = t1, t2, ..., tn:
      $\hat{T} = \arg\max_T P(T \mid W)$
      or, by Bayes' rule,
      $\hat{T} = \arg\max_T P(T) \, P(W \mid T)$

    Expanding with the chain rule

    P(A,B) = P(A|B)P(B) = P(B|A)P(A)

    P(A,B,C) = P(B,C|A)P(A) = P(C|A,B)P(B|A)P(A) = P(A)P(B|A)P(C|A,B)

    P(A,B,C,D,...) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)...

    $P(T) \, P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1} t_i) \, P(t_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1})$

    The first factor is the word probability; the second is conditioned on the tag history.

    Trigram assumption

    - The probability of a word depends only on its own tag:
      $P(w_i \mid w_1 t_1 \ldots t_{i-1} t_i) = P(w_i \mid t_i)$
    - The tag history is approximated by the two most recent tags (trigram: the two preceding tags + the current tag):
      $P(t_i \mid w_1 t_1 \ldots t_{i-1}) = P(t_i \mid t_{i-2} t_{i-1})$

    Substituting into the formula

    $P(T) \, P(W \mid T) = P(t_1) \, P(t_2 \mid t_1) \prod_{i=3}^{n} P(t_i \mid t_{i-2} t_{i-1}) \left[ \prod_{i=1}^{n} P(w_i \mid t_i) \right]$

    Estimating the probabilities
    - Use relative frequencies from the corpus to estimate the probabilities:
      $P(t_i \mid t_{i-2} t_{i-1}) = \frac{c(t_{i-2} t_{i-1} t_i)}{c(t_{i-2} t_{i-1})}$
      $P(w_i \mid t_i) = \frac{c(w_i, t_i)}{c(t_i)}$

    The problem

    We need to solve
      $\hat{T} = \arg\max_T P(T) \, P(W \mid T)$

    We can now compute every product P(T)P(W|T)

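  • Example

    [Figure: tag lattice for the words "the dog saw ice-cream" with candidate tags DT, NNS, VB, VBP.]

    Finding the best path?

    Find the highest-scoring path

    [Figure: the same lattice with edge scores (values such as 75, 30, 1, 60, 52). Paths are scored with the formula below.]

    $P(t_1) \, P(t_2 \mid t_1) \prod_{i=3}^{n} P(t_i \mid t_{i-2} t_{i-1}) \left[ \prod_{i=1}^{n} P(w_i \mid t_i) \right]$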

    How to find the highest-scoring path
    - Use best-first search (A*):
      1. At each step, keep the k best values. Each of these k values corresponds to one combination of tags for all the words so far
      2. When tagging the next word, recompute the probabilities. Go back to step 1
    - Pro: fast (no need to check every tag combination, only the k most promising ones)
    - Con: may not return the best result, only an acceptable one
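
    Keeping only the k best partial taggings at each step is commonly called beam search. The sketch below is a simplified illustration: it scores with a bigram model rather than the trigram model above, and the probability tables, the start symbol "<s>", and the name beam_tag are assumptions made for this example.

```python
def beam_tag(words, tags, p_trans, p_emit, k=2, start="<s>"):
    """Tag words left to right, keeping the k highest-scoring hypotheses."""
    beam = [(1.0, [start])]               # (score, tag sequence so far)
    for word in words:
        expanded = []
        for score, seq in beam:
            for tag in tags:
                s = score * p_trans.get((seq[-1], tag), 0.0) * p_emit.get((word, tag), 0.0)
                if s > 0:
                    expanded.append((s, seq + [tag]))
        # Step 1 of the slide: keep only the k best combinations.
        beam = sorted(expanded, key=lambda item: item[0], reverse=True)[:k]
    return beam[0][1][1:] if beam else None


# Toy model; the TO/race values are the ones quoted earlier in the slides,
# the rest are assumptions for this example.
P_TRANS = {("<s>", "TO"): 1.0, ("TO", "VB"): 0.340, ("TO", "NN"): 0.021}
P_EMIT = {("to", "TO"): 1.0, ("race", "VB"): 0.00003, ("race", "NN"): 0.00041}

print(beam_tag(["to", "race"], ["TO", "VB", "NN"], P_TRANS, P_EMIT))
```

    With a small k the search is fast, but a hypothesis pruned early can never be recovered, which is the trade-off noted in the pros and cons above.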

    Accuracy
    - > 96%
    - The simplest approach? 90%
      - Tag each word with its most frequent tag
      - Tag unknown words as nouns
    - Humans: 97% +/- 3%; with discussion: 100%
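
    The simplest approach described above fits in a few lines. The sketch below is a minimal illustration; the three-sentence training set and the names train_baseline and tag_baseline are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_baseline(words, model, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(w, model.get(w.lower(), unknown_tag)) for w in words]


# Toy training data (an assumption for this example).
TRAIN = [
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN")],
]
model = train_baseline(TRAIN)
print(tag_baseline(["to", "race", "zebra"], model))
```

    The baseline tags "race" as NN even after "to", which is the kind of error that context-aware methods are meant to fix.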

    Second approach: transformation-based tagging

    Transformation-based Learning (TBL):

    - Combines the rule-based and the probabilistic approaches: machine learning is used to correct the tags over several passes
    - Tag using the most general rules first, then narrower rules that change some of the tags, and so on

    Transformation-based painting

    [Figure: a sequence of slides illustrating transformation-based painting.]

  • Example with TBL

    1. Tag each word with its most frequent tag (usually about 90% accurate). From the Brown corpus:
       P(NN|race) = 0.98
       P(VB|race) = 0.02
    2. expected/VBZ to/TO race/NN tomorrow/NN
       the/DT race/NN for/IN outer/JJ space/NN
    3. Apply the transformation rule: change NN to VB when the previous tag is TO
       pos:'NN'>'VB' <- pos:'TO'@[-1]
       to/TO race/VB

    Part-of-speech tagging rules

    [Figure: rule examples, shown over two slides.]

    Learning TB rules in a TBL system

    Corpora

    - Training text:
      w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10
    - Current corpus (CC 1):
      dt vb nn dt vb kn dt vb ab dt vb
    - Reference corpus:
      dt nn vb dt nn kn dt jj kn dt nn

  • Templates for part-of-speech tagging rules
    - In TBL, only rules that fit a template are learned.
    - Example: the rules
        tag:'VB'>'NN' <- tag:'DT'@[-1].
        tag:'NN'>'VB' <- tag:'DT'@[-1].
      fit the template
        tag:A>B <- tag:C@[-1].
    - Templates can be written using anonymous variables:
        tag:_>_ <- tag:_@[-1].

    Learning TB rules in a TBL system

    Score, accuracy, thresholds

    - Score of a rule:
      score(R) = |pos(R)| - |neg(R)|
    - Accuracy of a rule
    - Threshold: the accuracy level that a rule must exceed in order to be selected.
    - In TBL, the accuracy threshold is usually < 0.5.

    Generating and scoring candidate rule 1
    - Template = tag:_>_ <- tag:_@[-1]
    - R1 = tag:vb>nn <- tag:dt@[-1]
    - pos(R1) = 3
    - neg(R1) = 1
    - score(R1) = pos(R1) - neg(R1) = 3 - 1 = 2

    Generating and scoring candidate rule 2
    - Template = tag:_>_ <- tag:_@[-1]
    - R2 = tag:nn>vb <- tag:vb@[-1]
    - pos(R2) = 1
    - neg(R2) = 0
    - score(R2) = pos(R2) - neg(R2) = 1 - 0 = 1
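
    The two candidate scores can be reproduced from the current and reference corpora given in the slides. The sketch below assumes that pos counts the positions where the rule fires and its new tag agrees with the reference, and neg the positions where it fires and disagrees; the name score_rule is made up for this example.

```python
def score_rule(current, reference, old, new, prev):
    """Score the rule 'change old to new when the previous tag is prev'.

    Returns (pos, neg, score) where score = pos - neg.
    """
    pos = neg = 0
    for i in range(1, len(current)):
        if current[i] == old and current[i - 1] == prev:
            if reference[i] == new:
                pos += 1
            else:
                neg += 1
    return pos, neg, pos - neg


# Corpora from the slides.
current = "dt vb nn dt vb kn dt vb ab dt vb".split()
reference = "dt nn vb dt nn kn dt jj kn dt nn".split()

print("R1", score_rule(current, reference, old="vb", new="nn", prev="dt"))
print("R2", score_rule(current, reference, old="nn", new="vb", prev="vb"))
```

    R1 fires four times and is right three times, so it outranks R2, which fires only once.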

    Learning TB rules in a TBL system

  • Choosing the best rule

    - Current ranking of the candidate rules:
      R1 = tag:vb>nn <- tag:dt@[-1]   Score = 2
      R2 = tag:nn>vb <- tag:vb@[-1]   Score = 1
    - If the score threshold <= 2, choose R1
    - Otherwise, if the score threshold > 2, stop

    Optimizing the choice of the best rule

    - Reduce redundant rules: only generate candidate rules that match at least one item in the training data.
    - Incremental evaluation:
      - Keep track of the best candidate rules
      - Skip rules that match fewer samples than the score of the best rule

    Greedy best-first search

    Evaluation function

    h(n) = estimated cost of the cheapest path from the state at node n to a goal state

    Advantages of TBL

    - Rules can be written by hand
    - Rules are easy to understand and logical
    - Easy to implement
    - Can run very fast (but that implementation is complex)

    Error analysis: what is hard for a part-of-speech tagger

    Common errors (> 4%)
    - NN (common noun) vs. NNP (proper noun) vs. JJ (adjective): hard to distinguish; the distinction is especially important in information extraction
    - RP (particle) vs. RB (adverb) vs. IN (preposition): all of these can appear right after a verb
    - VBD vs. VBN vs. JJ: distinguishing past tense, past participle, and adjective (raced vs. was raced vs. the out raced horse)

    The best way to handle unknown words

    - Based on 3 inflectional endings (-ed, -s, -ing); 32 derivational endings (-ion, etc.); capitalization; hyphenation
    - More generally:
      - Morphological analysis
      - Machine learning approaches

  • Vietnamese part-of-speech tagging

    A segmented Vietnamese sentence:

    Qua những lần từ Sài_Gòn về Quảng_Ngãi kiểm_tra công_việc , Sophie và Jane thường trò_chuyện với Mai , cảm_nhận ngọn_lửa_sống và niềm_tin mãnh_liệt từ người phụ_nữ VN này .

    The same sentence with part-of-speech tags:

    [The tag markup of the tagged sentence is not preserved in the transcript.]

    Part-of-speech annotation

    Steps
    - Word segmentation
    - A priori tagging (assign each word all the part-of-speech tags it can take).
      - For a new word, use a default tag or assign it the set of all tags. For inflectional languages, rely on the word's morphology
    - Decide the final tagging (remove ambiguity), based on:
      - grammar rules
      - probabilities
      - neural networks
      - hybrid systems combining probabilities and grammatical constraints
      - multi-layer tagging

    Data for tagging

    - Resources:
      - A lexicon
      - A tagged corpus, possibly accompanied by hand-built grammar rules
      - An untagged corpus, accompanied by linguistic information such as the tagset
      - An untagged corpus, with a tagset built automatically through statistical computation

    Difficulties in Vietnamese part-of-speech tagging

    - Language-specific characteristics
    - Lack of standard data sets such as Brown or Penn Treebank, which makes evaluating results difficult

    Approach 1
    [Đinh Điền] Dien Dinh and Kiem Hoang, POS-tagger for English-Vietnamese bilingual corpus. HLT-NAACL Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, 2003.

    - Transfers and maps part-of-speech information from English, because:
      - English part-of-speech tagging achieves high accuracy (>97%)
      - word alignment methods between language pairs have seen recent successes.

    [Đinh Điền]
    - Build an English-Vietnamese parallel corpus of ~5 million words (English and Vietnamese together).
    - Tag the English side using Transformation-based Learning (TBL) [Brill 1995]
    - Align the two languages (accuracy about 87%) to transfer the part-of-speech tags to Vietnamese.
    - The result is corrected by hand to serve as training data for a Vietnamese part-of-speech tagger.

  • [Đinh Điền]
    - Advantages:
      - Avoids manual part-of-speech tagging by exploiting part-of-speech information from another language.
    - Disadvantages:
      - English and Vietnamese differ in word formation, word order, and the grammatical function of words in the sentence, which makes alignment difficult
      - Errors accumulate over two stages: (a) English part-of-speech tagging and (b) alignment between the two languages
      - A tagset converted directly from English is not typical of Vietnamese parts of speech

    Approach 2 [Nguyen Huyen, Vu Luong]

    Thi Minh Huyen Nguyen, Laurent Romary, and Xuan Luong Vu, A Case Study in POS Tagging of Vietnamese Texts. The 10th annual conference TALN 2003.

    - Based on the linguistic properties of Vietnamese.
    - Builds a tagset for Vietnamese on top of a fairly general descriptive standard for Western European languages, modularizing the tagset into two layers:
      - kernel layer: the most general specification shared across languages
      - private layer: extensions and details for one specific language, based on the properties of that language

    [Nguyen Huyen, Vu Luong]

    - Kernel layer: noun (N), verb (V), adjective (A), pronoun (P), determiner (D), adverb (R), adposition (S), conjunction (C), numeral (M), interjection (I), and non-Vietnamese words (residual X, such as foreign words).
    - Private layer: elaborated for each of the categories above, for example countable/uncountable for nouns, masculine/feminine for pronouns, and so on.

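    Approach 3 [Phương]

    Nguyễn Thị Minh Huyền, Vũ Xuân Lương, Lê Hồng Phương. Sử dụng bộ gán nhãn từ loại xác suất QTAG cho văn bản tiếng Việt (Using the probabilistic POS tagger QTAG for Vietnamese text). Proceedings of the ICT.rda'03 workshop.

    - Works on a window of 3 words, after adding 2 dummy words at the beginning and end of the text.
    - The tag assigned to a word at the moment it slides out of the window is its final tag.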

    POS tagging procedure [Phương]

    1. Read the next token.
    2. Look the token up in the dictionary.
    3. If it is not found, assign it every possible tag.
    4. For each possible tag:
       a. compute Pw = P(tag | token)
       b. compute Pc = P(tag | t1, t2), where t1 and t2 are the tags of the two words preceding the token
       c. compute Pw,c = Pw * Pc, combining the two probabilities.
    5. Repeat the computation for the other two tags in the window.

    After each recomputation (3 per word), the resulting probabilities are combined into the overall probability of the tag assigned to the word.
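    Step 4 of the procedure above can be sketched as follows. This is only an illustration under stated assumptions, not QTAG's actual implementation: the function name `score_tags`, the table layouts `lexical_prob` and `context_prob`, the uniform fallback for unknown words, and the small default probability are all hypothetical, and the recombination over the three window positions (step 5) is left out.

```python
def score_tags(token, prev_tags, lexical_prob, context_prob, all_tags):
    """Rank candidate tags for `token` by Pw * Pc (step 4 only)."""
    # Steps 2-3: dictionary lookup; an unknown token may take any tag.
    candidates = lexical_prob.get(token)
    if candidates is None:
        # Assumption: unknown words get a uniform distribution over all tags.
        candidates = {tag: 1.0 / len(all_tags) for tag in all_tags}

    scores = {}
    for tag, pw in candidates.items():
        # Step 4a: pw = P(tag | token), taken from the lexicon table.
        # Step 4b: pc = P(tag | t1, t2); unseen contexts get a tiny value.
        pc = context_prob.get((prev_tags, tag), 1e-6)
        # Step 4c: combine the two probabilities.
        scores[tag] = pw * pc
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy tables, invented for illustration only.
lexical_prob = {"tranh": {"N": 0.7, "V": 0.3}}
context_prob = {(("M", "N"), "N"): 0.5, (("M", "N"), "V"): 0.1}
print(score_tags("tranh", ("M", "N"), lexical_prob, context_prob, ["N", "V"]))
```

    With these toy numbers the noun reading ranks first, because both the lexical and the contextual probability favour it.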

    [Phương]

    - Split the tagged corpus into two sets: a training set and a test set.
    - Tag the text portions automatically.
    - Compare the output with the reference data.
    - Training time for 32,000 words: about 30 s.

    [Phương]

    - Tagged sentence:

      hồi lên < w pos="Nn"> sáu , có lần tôi nhìn thấy một bức tranh tuyệt đẹp

    - Tag legend: Nc - individual noun, Vto - directional transitive verb, Nn - quantity noun, Vs - existential verb, Nu - unit noun, Pp - personal pronoun, Jt - temporal adverb, Vt - transitive verb, Nt - classifier noun, Jd - degree adverb, Aa - qualitative adjective.

    [Phương]

    - Sentence from the reference corpus:

      hồi lên < w pos="Nn"> sáu , có lần tôi nhìn thấy một bức tranh tuyệt đẹp

    - Sentence tagged by the program:

      hồi nhìn thấy một bức tranh tuyệt đẹp

    [Phương]

    - Results:
      - about 94% (9 lexical tags and 10 tags for symbol types)
      - about 85% (48 lexical tags and 10 tags for symbol types)
      - Without the lexicon (using only the tagged sample corpus), the results drop to about 80% and about 60% respectively.

    Approach 4: Phan Xuân Hiếu

    - Based on Maximum Entropy (MaxEnt) and Conditional Random Fields (CRFs), which are widely used for labeling the elements of sequence data.
    - Training data: the Viet Treebank corpus, with more than 10,000 Vietnamese sentences POS-tagged by linguists.

    [Hiếu]

    Learning the POS tagging model (figure)

    Feature selection

    - Context: "... thường trò_chuyện với Mai ..."
    - To determine the POS of "trò_chuyện", the features ask:
      - Which POS does the word "trò_chuyện" itself usually appear with in the Viet Treebank data?
      - Which POS does "trò_chuyện" usually have in the dictionary? Is it a verb?
      - What does the word "thường" immediately before "trò_chuyện" suggest?
      - What does the word "với" immediately after "trò_chuyện" suggest? Does it suggest that the word right before it is a verb?
      - What does the combination of the two words "với Mai" suggest? Is the preceding word ("trò_chuyện") then likely a verb?
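    The context questions above map naturally onto binary features over a small window. The sketch below is an assumption about how such features could be encoded for a MaxEnt or CRF tagger; the function name `extract_features`, the feature-name strings, and the boundary markers `<s>` and `</s>` are hypothetical and are not taken from the original system. The first token of the example is an assumed reading of the slide's context.

```python
def extract_features(tokens, i):
    """Build context features for tokens[i] from the surrounding words."""
    def at(k):
        # Pad positions outside the sentence with boundary markers.
        if k < 0:
            return "<s>"
        if k >= len(tokens):
            return "</s>"
        return tokens[k]

    return {
        "w0=" + at(i): 1,                                 # the word itself
        "w-1=" + at(i - 1): 1,                            # word right before
        "w+1=" + at(i + 1): 1,                            # word right after
        "w+1,w+2=" + at(i + 1) + "_" + at(i + 2): 1,      # following word pair
    }


# "thường" is an assumed restoration of the first context word.
tokens = ["thường", "trò_chuyện", "với", "Mai"]
for name in extract_features(tokens, 1):
    print(name)
```

    A dictionary-lookup feature (the POS listed for the word in a lexicon) could be added in the same way if a lexicon is available.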

    Context for feature extraction (two figures)

    POS tagging results using MaxEnt and CRFs (figure)

    Vietnamese POS tagset

    idPOS  Symbol  Vietnamese             English
    1      N       danh từ                noun
    2      V       động từ                verb
    3      A       tính từ                adjective
    4      M       số từ                  numeral
    5      P       đại từ                 pronoun
    6      R       phụ từ                 adverb
    7      O       giới từ                preposition
    8      C       liên từ                conjunction
    9      I       trợ từ                 auxiliary word
    10     E       cảm từ                 emotivity word
    11     Xy*     từ tắt                 abbreviation
    12     S       yếu tố từ (bất, vô)    component stem
    13     U       không xác định         undetermined

    * Abbreviations carry a double tag: X = the POS of the abbreviation, y = the abbreviation marker. Example: GDP-Ny, HIV-Ny.

    Vietnamese POS sub-tagset

    idPOS  idSub  Symbol  Vietnamese             English
    1      1      Np      danh từ riêng          proper noun
    1      2      Nc      danh từ đơn thể        countable noun
    1      3      Ng      danh từ tổng thể       collective noun
    1      4      Na      danh từ trừu tượng     abstract noun
    1      5      Ns      danh từ chỉ loại       classifier noun
    1      6      Nu      danh từ đơn vị         unit noun
    1      7      Nq      danh từ chỉ lượng      quantity noun
    2      8      Vi      động từ nội động       intransitive verb
    2      9      Vt      động từ ngoại động     transitive verb
    2      10     Vs      động từ trạng thái     state verb
    2      11     Vm      động từ tình thái      modal verb
    2      12     Vr      động từ quan hệ        relative verb
    3      13     Ap      tính từ tính chất      property adjective
    3      14     Ar      tính từ quan hệ        relative adjective
    3      15     Ao      tính từ tượng thanh    onomatopoetic adjective
    3      16     Ai      tính từ tượng hình     pictographic adjective
    4      17     Mc      số từ số lượng         cardinal numeral
    4      18     Mo      số từ thứ tự           ordinal numeral
    5      19     Pp      đại từ xưng hô         personal pronoun
    5      20     Pd      đại từ chỉ định        demonstrative pronoun
    5      21     Pq      đại từ số lượng        quantity pronoun
    5      22     Pi      đại từ nghi vấn        interrogative pronoun
    6      23     R       phụ từ                 adverb
    7      24     O       giới từ                preposition
    8      25     C       liên từ                conjunction
    9      26     I       trợ từ                 auxiliary word
    10     27     E       cảm từ                 emotivity word
    11     28     Xy      từ tắt                 abbreviation
    12     29     S       yếu tố từ (bất, vô)    component stem
    13     30     U       không xác định         undetermined

    Syntactic Parsing

    Lê Thanh Hương
    Department of Information Systems
    School of Information and Communication Technology, Hanoi University of Science and Technology

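    The parsing problem

    [Figure: a sentence and a grammar are fed to the parser, which outputs a parse tree; a scorer compares the output with reference parse trees and reports accuracy.]

    Current parsers reach high accuracy (Eisner, Collins, Charniak, etc.).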

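    The concept of a grammar

    - Parse the sentence "Bò vàng gặm cỏ non" ("The yellow cow grazes on young grass").
    - Parse tree, built from the rule set:
      - C → CN VN
      - CN → DN
      - VN → ĐgN
      - ĐgN → ĐgT DN
      - DN → DT TT
    - Symbols: C = sentence, CN = subject, VN = predicate, DN = noun phrase, ĐgN = verb phrase, ĐgT = verb, DT = noun, TT = adjective.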

    Grammar

    - A generative grammar is a system G = (T, N, S, R), where
      - T (terminal): the set of terminal symbols
      - N (non-terminal): the set of nonterminal symbols
      - S (start): the start symbol
      - R (rule): the set of rules, R = { α → β | α, β ∈ (T ∪ N)* }
      - α → β is called a production rule

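    Chomsky normal form

    - Every context-free language that does not contain ε can be generated by a grammar in which every production has the form A → BC or A → a, with A, B, C ∈ N and a ∈ T.
    - Exercise: find the Chomsky normal form of the grammar G with T = {a, b}, N = {S, A, B}, and R as follows:
      - S → bA | aB
      - A → bAA | aS | a
      - B → aBB | bS | b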

    Review of grammars

    - Grammar: a set of rewrite rules.
    - Terminal symbols: symbols that cannot be decomposed any further.
    - Nonterminal symbols: symbols that can be decomposed.
    - Consider the grammar G:
        S → NP VP
        NP → John | garbage
        VP → laughed | walks
    - G can generate the following sentences:
        John laughed. John walks. Garbage laughed. Garbage walks.
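    The four sentences can be checked mechanically by expanding the start symbol in every possible way. The sketch below is a minimal illustration that assumes a non-recursive grammar (otherwise the enumeration would not terminate); the names `GRAMMAR` and `expand` are chosen here for illustration.

```python
from itertools import product

# The toy grammar G from the slide: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["garbage"]],
    "VP": [["laughed"], ["walks"]],
}


def expand(symbol):
    """Yield every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:
        # A terminal symbol expands only to itself.
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
# ['John laughed', 'John walks', 'garbage laughed', 'garbage walks']
```

    The grammar happily produces "garbage laughed": it encodes syntax only, not whether a sentence is meaningful.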

    Syntactic structure

    A parse tree represents the grammatical structure of a sentence. Example: "Bò vàng gặm cỏ non."

    C
      CN
        DN
          DT  Bò
          TT  vàng
      VN
        ĐgN
          ĐgT  gặm
          DN
            DT  cỏ
            TT  non

    Applications of parsing

    - Machine translation (Alshawi 1996, Wu 1997, ...): English → operations on trees → Vietnamese.
    - Speech recognition using parsing (Chelba et al 1998): "Put the file in the folder." versus "Put the file and the folder."
    - Grammar checking (Microsoft).
    - Information extraction (Hobbs 1996): text corpus (NY Times) → database → query.

    Context-free grammar (CFG), also called phrase structure grammar

    - G = <T, N, P, S, R>
      - T: the set of terminal symbols (terminals)
      - N: the set of nonterminal symbols (non-terminals)
      - P: the preterminal symbols (preterminals), which rewrite to terminals; P ⊂ N
      - S: the start symbol
      - R: rules X → γ, where X is a nonterminal and γ is a string of terminals and nonterminals (possibly empty)
    - The grammar G generates a language L.
    - A recognizer returns yes or no.
    - A parser returns the set of parse trees.

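    Comparison with context-sensitive grammars

    - Unrestricted (phrase-structure) grammar:
      - α → β, with α ∈ V+, β ∈ V*
    - Context-sensitive grammar:
      - r = α → β, with α ∈ V+, β ∈ V*
      - and rules of the form α₁Aα₂ → α₁βα₂
    - Context-free grammar:
      - A → α, with A ∈ N, α ∈ V* = (T ∪ N)*
    - Regular grammar:
      - A → aB
      - A → Ba
      - A → a, with A, B ∈ N, a ∈ T

    [Figure: nested classes, regular ⊂ context-free ⊂ context-sensitive ⊂ unrestricted.]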

    Context-free grammar (figure)

    Applying the grammar rules

    - S → NP VP → DT NNS VBD → The children slept
    - S → NP VP → DT NNS VBD NP → DT NNS VBD DT NN → The children ate the cake

    Recursive phrase structure (figure)

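    Grammars for natural language are ambiguous

    Ambiguity: the PP can attach at two points (to the VP or to the NP).

    0 John 1 saw 2 snow 3 on 4 the 5 campus 6

    - PP attached to the VP: [S [NP John] [VP saw [NP snow] [PP on the campus]]]
    - PP attached to the NP: [S [NP John] [VP saw [NP [NP snow] [PP on the campus]]]]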

    Top-down parsing

    - Goal-directed.
    - Start with a list of symbols to expand (S, NP, VP, ...).
    - Rewrite the goals in the goal set by:
      - finding a rule whose left-hand side matches the goal to expand
      - expanding the goal with the right-hand side of the rule, trying to match the input sentence
    - If a goal can be rewritten in several ways, one rule has to be chosen (a search problem).
    - Breadth-first search or depth-first search can be used.

    Difficulties with top-down parsing

    - Left-recursive rules.
    - Top-down parsing is at a great disadvantage when many rules share the same left-hand side:
        S → NP X1, S → NP X2, ..., S → NP X600, S → VP Y1
    - Many wasted operations: it expands every node that can be analyzed top-down.
    - Top-down parsing works well when there is a suitable grammar-driven control strategy.
    - Top-down parsing cannot expand preterminal symbols into terminal symbols. In practice, a bottom-up method is usually used for this step.
    - Repeated work: wherever identical structures occur.

    Bottom-up parsing

    - Data-directed.
    - Initialize with the string to be parsed.
    - If a sequence in the goal set matches the right-hand side of a rule, replace it with the left-hand side of that rule.
    - Stop when the goal set = {S}.
    - If the right-hand sides of several rules match sequences in the goal set, one rule has to be chosen (a search problem).
    - Breadth-first search or depth-first search can be used.

    Difficulties with bottom-up parsing

    - Inefficient when there is a lot of lexical ambiguity.
    - Repeated work: whenever there are shared substructures.
    - Both top-down (LL) and bottom-up (LR) parsing have a complexity that is exponential in the length of the sentence.

    The CKY algorithm (recognizer)

    - Input: a string of n words
    - Output: yes/no
    - Data structure: an n x n table (chart table)
      - rows are numbered 0 to n-1
      - columns are numbered 1 to n
      - cell [i,j] lists all syntactic labels that span positions i to j

    The CKY algorithm (bottom-up)

    for i := 1 to n
        add every part of speech of word i to [i-1,i]
    for width := 2 to n
        for start := 0 to n-width
            end := start + width
            for mid := start+1 to end-1
                for each syntactic label X in [start,mid]
                    for each syntactic label Y in [mid,end]
                        for each way of combining X and Y (if any)
                            add the resulting label to [start,end] if it is not already there
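    The pseudocode above translates almost line by line into a short recognizer. The sketch below assumes a grammar that is already binary and uses a tiny invented grammar and lexicon (`BINARY`, `LEXICON`) purely for illustration; the chart is a dictionary keyed by (i, j) rather than an n x n array.

```python
from collections import defaultdict

# Hypothetical toy grammar: (Y, Z) -> set of X such that X -> Y Z.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBD", "NP"): {"VP"},
}
# Hypothetical lexicon: word -> set of parts of speech.
LEXICON = {"the": {"DT"}, "guy": {"NN"}, "ice-cream": {"NN"}, "ate": {"VBD"}}


def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    chart = defaultdict(set)  # chart[(i, j)] = labels spanning positions i..j

    # Fill the diagonal with the parts of speech of each word.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())

    # Combine smaller spans into larger ones, shortest spans first.
    for width in range(2, n + 1):
        for begin in range(0, n - width + 1):
            end = begin + width
            for mid in range(begin + 1, end):
                for x in chart[(begin, mid)]:
                    for y in chart[(mid, end)]:
                        chart[(begin, end)] |= BINARY.get((x, y), set())

    return start in chart[(0, n)]


print(cky_recognize("the guy ate the ice-cream".split()))
print(cky_recognize("ate the guy".split()))
```

    The three nested span loops give the familiar cubic dependence on sentence length, in contrast to the exponential behaviour of naive top-down or bottom-up search.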

    Example: "Bò vàng gặm cỏ non"

         1      2        3      4      5
    0    DT     CN, DN                 C
    1           TT
    2                    ĐgT           VN, ĐgN
    3                           DT     DN
    4                                  TT

    Context-free grammar

    1. Start → S
    2. S → NP VP
    3. NP → Det Noun
    4. NP → Name
    5. NP → Name PP
    6. PP → Prep NP
    7. VP → V NP
    8. VP → V NP PP
    9. V → ate
    10. Name → John
    11. Name → ice-cream, snow
    12. Noun → ice-cream, pizza
    13. Noun → table, guy, campus
    14. Det → the
    15. Prep → on

    Combination rule

    - Cell[i,j] contains the label X if
      - there is a rule X → Y Z;
      - Cell[i,k] contains the label Y and Cell[k,j] contains the label Z, with k between i and j.
    - Example: NP → DT [0,1] NN [1,2]

    CKY must use binary rules

    - Convert VP → V NP PP into:
      8.a. VP → V Arguments
      8.b. Arguments → NP PP
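    The conversion above can be automated for any rule with more than two right-hand symbols. The sketch below is one possible scheme and only an assumption about how it might be done: it folds the last two symbols into a fresh helper nonterminal, which here plays the role of "Arguments". The function name `binarize` and the generated names `_X1`, `_X2`, ... are hypothetical.

```python
def binarize(rules):
    """Split rules with more than two right-hand symbols into binary rules."""
    out = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            helper = f"_X{counter}"  # fresh helper nonterminal (made-up name)
            # The helper covers the last two symbols of the right-hand side.
            out.append((helper, tuple(rhs[-2:])))
            rhs = rhs[:-2] + [helper]
        out.append((lhs, tuple(rhs)))
    return out


print(binarize([("VP", ("V", "NP", "PP"))]))
# [('_X1', ('NP', 'PP')), ('VP', ('V', '_X1'))]
```

    Rules that are already unary or binary pass through unchanged, so the function can be applied to a whole rule set.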

    CKY chart

    Sentence: 0 The 1 guy 2 ate 3 the 4 ice-cream 5 on 6 the 7 table 8

    Initial chart (word categories on the diagonal):
      [0,1] DT   [1,2] NN   [2,3] VBD   [3,4] DT   [4,5] NN   [5,6] IN   [6,7] DT   [7,8] NN

    Applying the combination step:
      [0,2] NP

    Ambiguity! With the rules
      5. NP → NN PP
      8.a. VP → V Arguments
      8.b. Arguments → NP PP
    the chart contains:
      [0,2] NP
      [0,8] S
      [2,8] VP
      [3,5] NP
      [3,8] NP, Args
      [5,8] PP
      [6,8] NP

    The Earley algorithm (top-down)

    - Finds constituents and partial constituents from the input.
      - A → B C . D E is a partial constituent: B and C have been found, D and E are still needed.
      - Combining A → B C . D E with a completed D gives A → B C D . E.
    - Proceeds from left to right.

    Example

    ROOT → S       NP → Papa
    S → NP VP      N → caviar
    NP → Det N     N → spoon
    NP → NP PP     V → ate
    VP → VP PP     P → with
    VP → V NP      Det → the
    PP → P NP      Det → a

    Recursive descent

    Grammar (VP → VP PP listed before VP → V NP):
      ROOT → S      VP → VP PP    NP → Papa     V → ate
      S → NP VP     VP → V NP     N → caviar    P → with
      NP → Det N    PP → P NP     N → spoon     Det → the
      NP → NP PP                                Det → a

    Input: 0 Papa 1 ate 2 the 3 caviar 4 with 5 a 6 spoon 7

    Each line of the trace is an item: start position, dotted rule, current position.

    - 0 ROOT → . S 0
    - 0 S → . NP VP 0
      - 0 NP → . Papa 0
      - 0 NP → Papa . 1
    - 0 S → NP . VP 1
      - 1 VP → . VP PP 1
        - 1 VP → . VP PP 1
          - 1 VP → . VP PP 1 ... stack overflow

    [Figure: the goal stack grows without bound: VP, then VP PP, then VP PP PP, and so on.]

    Recursive descent (VP → V NP listed before VP → VP PP)

    Grammar:
      ROOT → S      VP → V NP     NP → Papa     V → ate
      S → NP VP     VP → VP PP    N → caviar    P → with
      NP → Det N    PP → P NP     N → spoon     Det → the
      NP → NP PP                                Det → a

    Input: 0 Papa 1 ate 2 the 3 caviar 4 with 5 a 6 spoon 7

    - 0 ROOT → . S 0
    - 0 S → . NP VP 0
      - 0 NP → . Papa 0
      - 0 NP → Papa . 1
    - 0 S → NP . VP 1
      - 1 VP → . V NP 1 (a nonterminal follows the dot: look for it recursively, i.e. predict)
        - 1 V → . ate 1 (a terminal follows the dot: look for it in the input, i.e. scan)
        - 1 V → ate . 2 (nothing follows the dot: the parent's child is complete, i.e. attach)
      - 1 VP → V . NP 2 (predict the next child)
        - 2 NP → . ... 2 (parsing continues and eventually reaches)
        - 2 NP → ... . 7 (the NP child of the parent is complete: attach)
      - 1 VP → V NP . 7 (attach)
    - 0 S → NP VP . 7 (attach)

    Recursive descent: backtracking

    - The trace is implemented with function calls: S() calls NP() and VP(), and VP is expanded recursively.
    - With VP → V NP tried first, the parser reaches 1 VP → V NP . 7 and 0 S → NP VP . 7, but it may have to backtrack and try the other VP rule.
    - Suppose the NP after the verb is guessed to span positions 2 to 4:
      - 1 VP → . V NP 1
      - 1 V → . ate 1, then 1 V → ate . 2
      - 1 VP → V . NP 2
      - 2 NP → . ... 2, eventually 2 NP → ... . 4
    - This goal also has to backtrack, and the parser tries 1 VP → . VP PP 1.
    - 1 VP → . VP PP 1 predicts 1 VP → . VP PP 1 again and again until the stack overflows, so nothing has been solved.
    - The rule set has to be changed to eliminate left recursion.

    The Earley algorithm

    - The Earley algorithm resembles the recursive algorithm above, but it solves the left-recursion problem.
    - It uses a chart like the CKY algorithm to store the information already found: dynamic programming.

    Operations of the algorithm

    - Process whatever follows the dot, in a recursive fashion:
      - If it is a word, scan the input to see whether it matches.
      - If it is a nonterminal symbol, predict the ways to match it (the number of predictions can be reduced by looking ahead k symbols in the input and using only the rules consistent with those k symbols).
      - If nothing follows the dot, a constituent is complete: attach it to the items that were waiting for it.
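    The three operations can be put together in a compact recognizer. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: it assumes the grammar has no empty rules, uses no lookahead, and represents an item as a tuple (start, lhs, rhs, dot). The names `GRAMMAR` and `earley_recognize` are chosen here; the rules are those of the Papa example.

```python
GRAMMAR = {
    "ROOT": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("Papa",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "N": [("caviar",), ("spoon",)],
    "V": [("ate",)],
    "P": [("with",)],
    "Det": [("the",), ("a",)],
}


def earley_recognize(words, start="ROOT"):
    """Return True if `words` can be derived from `start` (no empty rules)."""
    chart = [[] for _ in range(len(words) + 1)]  # one column per position

    def add(col, item):
        # Never add the same item twice: this is what tames left recursion.
        if item not in chart[col]:
            chart[col].append(item)

    for rhs in GRAMMAR[start]:
        add(0, (0, start, rhs, 0))

    for col in range(len(words) + 1):
        i = 0
        while i < len(chart[col]):  # the column may grow while it is processed
            origin, lhs, rhs, dot = chart[col][i]
            i += 1
            if dot == len(rhs):
                # Attach: advance every item that was waiting for `lhs`.
                for o2, l2, r2, d2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(col, (o2, l2, r2, d2 + 1))
            elif rhs[dot] in GRAMMAR:
                # Predict: start every rule for the nonterminal after the dot.
                for new_rhs in GRAMMAR[rhs[dot]]:
                    add(col, (col, rhs[dot], new_rhs, 0))
            elif col < len(words) and words[col] == rhs[dot]:
                # Scan: the next input word matches the terminal after the dot.
                add(col + 1, (origin, lhs, rhs, dot + 1))

    return any(o == 0 and l == start and d == len(r)
               for o, l, r, d in chart[-1])


print(earley_recognize("Papa ate the caviar with a spoon".split()))
print(earley_recognize("Papa ate the".split()))
```

    Because duplicate items are never added, the left-recursive rules NP → NP PP and VP → VP PP are predicted once per column instead of looping forever as in plain recursive descent.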

    Earley chart: column 0 (before any word is read)

    - Initialize: 0 ROOT → . S, corresponding to (0, ROOT → . S)
    - Predict the rules whose left-hand side is S: 0 S → . NP VP, i.e. (0, S → . NP VP)
    - Predict the rules whose left-hand side is NP (3 rules match): 0 NP → . Det N, 0 NP → . NP PP, 0 NP → . Papa
    - Predict the rules whose left-hand side is Det (2 rules): 0 Det → . the, 0 Det → . a
    - Predict the rules whose left-hand side is NP again: this was already done in an earlier step, so it is not repeated. Note: this repetition arises from the left-recursive rule.

    Earley chart: column 1 (after reading "Papa")

    - Scan: 0 NP → . Papa matches the input word, giving 0 NP → Papa .
    - Scan: 0 Det → . the and 0 Det → . a do not match.
    - Attach the newly completed NP (starting at 0) to the relevant items (the incomplete items ending at 0 that have NP after the dot): 0 S → NP . VP, 0 NP → NP . PP
    - Predict: 1 VP → . V NP, 1 VP → . VP PP
    - Predict: 1 PP → . P NP
    - Predict: 1 V → . ate
    - Predict: 1 P → . with

    Earley chart: column 2 (after reading "ate")

    - Scan: success, 1 V → ate .
    - Scan: 1 P → . with does not match.
    - Attach: 1 VP → V . NP
    - Predict: 2 NP → . Det N, 2 NP → . NP PP, 2 NP → . Papa
    - Predict (the following steps are analogous): 2 Det → . the, 2 Det → . a
    - Scan: 2 NP → . Papa fails this time, because "Papa" is not the next word.

    Earley chart: column 3 (after reading "the")

    - Scan: success, 2 Det → the .
    - Attach: 2 NP → Det . N
    - Predict: 3 N → . caviar, 3 N → . spoon

    Earley chart: column 4 (after reading "caviar")

    - Scan: 3 N → caviar .
    - Attach: 2 NP → Det N .
    - Attach: 1 VP → V NP ., 2 NP → NP . PP
    - Attach: 0 S → NP VP ., 1 VP → VP . PP
    - Predict: 4 PP → . P NP
    - Attach: 0 ROOT → S .
    - Predict: 4 P → . with

    Earley chart: column 5 (after reading "with")

    - Scan: 4 P → with .
    - Attach: 4 PP → P . NP
    - Predict: 5 NP → . Det N, 5 NP → . NP PP, 5 NP → . Papa
    - Predict: 5 Det → . the, 5 Det → . a

    Earley chart: column 6 (after reading "a")

    - Scan: 5 Det → a .
    - Attach: 5 NP → Det . N
    - Predict: 6 N → . caviar, 6 N → . spoon

    Earley chart: column 7 (after reading "spoon")

    - Scan: 6 N → spoon .
    - Attach: 5 NP → Det N .
    - Attach: 4 PP → P NP ., 5 NP → NP . PP
    - Attach: 2 NP → NP PP ., 1 VP → VP PP .
    - Predict: 7 PP → . P NP

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    89

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    90

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP

  • 0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    91

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    92

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    93

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    94

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with0 ROOT S .

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    95

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with0 ROOT S .

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    96

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with0 ROOT S .

  • 0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    97

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with0 ROOT S .

    0 Papa 1 ate 2 the 3 caviar 4 with a spoon 70 ROOT . S 0 NP Papa . 1 V ate . 2 Det the . 3 N caviar . 6 N spoon .0 S . NP VP 0 S NP . VP 1 VP V . NP 2 NP Det . N 2 NP Det N . 5 NP Det N .0 NP . Det N 0 NP NP . PP 2 NP . Det N 3 N . caviar 1 VP V NP . 4 PP P NP .0 NP . NP PP 1 VP . V NP 2 NP . NP PP 3 N . spoon 2 NP NP . PP 5 NP NP . PP0 NP . Papa 1 VP . VP PP 2 NP . Papa 0 S NP VP . 2 NP NP PP .0 D t th 1 PP P NP 2 D t th 1 VP VP PP 1 VP VP PP

    98

    0 Det . the 1 PP . P NP 2 Det . the 1 VP VP . PP 1 VP VP PP .0 Det . a 1 V . ate 2 Det . a 4 PP . P NP 7 PP . P NP

    1 P . with 0 ROOT S . 1 VP V NP .4 P . with 2 NP NP . PP

    0 S NP VP .1 VP VP . PP7 P . with0 ROOT S .
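The chart above can be reproduced with a small recognizer. The following is a minimal sketch, not the lecture's own implementation: the function names `earley_chart` and `recognize` and the tuple layout `(start, lhs, rhs, dot)` are assumptions made for illustration, and the grammar is the one read off the chart items (it has no empty rules, which the sketch relies on).

```python
# Grammar read off the chart: each non-terminal maps to its right-hand sides.
GRAMMAR = {
    "ROOT": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("Papa",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",), ("a",)],
    "N": [("caviar",), ("spoon",)],
    "V": [("ate",)],
    "P": [("with",)],
}


def earley_chart(words, grammar=GRAMMAR, root="ROOT"):
    """Return the Earley chart: one list of items per position 0..len(words).

    An item is (start, lhs, rhs, dot), i.e. "start lhs -> rhs" with the dot
    placed before rhs[dot].
    """
    chart = [[] for _ in range(len(words) + 1)]

    def add(col, item):
        if item not in chart[col]:
            chart[col].append(item)

    for rhs in grammar[root]:
        add(0, (0, root, rhs, 0))

    for col in range(len(words) + 1):
        i = 0
        while i < len(chart[col]):  # the column grows while it is processed
            start, lhs, rhs, dot = chart[col][i]
            i += 1
            if dot == len(rhs):
                # attach: advance every item in column `start` waiting for lhs
                for s2, l2, r2, d2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(col, (s2, l2, r2, d2 + 1))
            elif rhs[dot] in grammar:
                # predict: expand the non-terminal after the dot
                for new_rhs in grammar[rhs[dot]]:
                    add(col, (col, rhs[dot], new_rhs, 0))
            elif col < len(words) and words[col] == rhs[dot]:
                # scan: the terminal after the dot matches the next word
                add(col + 1, (start, lhs, rhs, dot + 1))
    return chart


def recognize(words, grammar=GRAMMAR, root="ROOT"):
    """True if the last column holds a completed root item starting at 0."""
    final = earley_chart(words, grammar, root)[-1]
    return any(start == 0 and lhs == root and dot == len(rhs)
               for start, lhs, rhs, dot in final)
```

Calling `earley_chart("Papa ate the caviar with a spoon".split())` yields eight columns. Duplicate items are suppressed by `add`, which is what keeps the left-recursive rules NP → NP PP and VP → VP PP from looping.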

    Problem with top-down parsing: left recursion

    With the left-recursive rule VP → VP PP, a top-down parser keeps attaching new copies of the rule to the tree before it has seen any PP. It would have to guess in advance how many PPs the input contains.

    The Earley algorithm handles this case without trouble:
    - The prediction 1 VP → . VP PP is added once, in column 1.
    - After "ate the caviar", the completed VP → V NP is attached to it, giving 1 VP → VP . PP (column 4).
    - The predicted item in column 1 can be reused. After "with a spoon", the completed VP → VP PP is attached to it, again giving 1 VP → VP . PP (column 7).
    - After "in his bed", the item 1 VP → VP PP . is completed in column 10, and attaching it to the same prediction in column 1 gives 1 VP → VP . PP once more (column 10).

    Recovering the parse tree

    Use a simple queue-based algorithm built on the notion of useful items:
    - A completed item in the final state set is useful.
    - If a useful item [s, i] in state set j was built from an item [q, i] in state set k and an item [r, k] in state set j (so that s = q r), then [q, i] and [r, k] are useful as well.

    Here [s, i] denotes an item for rule s with back pointer i.

    mark every item in state set S_n of the form [Start → S ., 0]
    for j = n downto 0 do
      for i = 0 to j do
        for each marked item [s, i] in state set S_j do
          for k = i to j do
            if [q, i] ∈ S_k and [r, k] ∈ S_j and s = q r then
              mark [q, i] and [r, k]

    Advantages

    - The Earley algorithm performs some top-down filtering: every item (state, or triple) added to a state set must be compatible with the left context derived so far. For example, S ⇒* w_i α, where w_i is the part of the sentence already scanned.

    Disadvantages

    - Explicit representation of rules: the parser wastes time building them.
    - It filters on the left context but not on the right context.

    Lookahead filtering for a non-terminal A:

    FIRST(A) = { x | A ⇒* x α }, where x is a single token
    e.g. FIRST(S) = {who, did, the, ...}
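The FIRST sets used for lookahead filtering can be computed by a fixed-point iteration. Below is a minimal sketch under two assumptions that are not from the slides: the grammar has no empty rules (so only the first symbol of each right-hand side matters), and the helper name `first_sets` and the example grammar (the "Papa ate the caviar" grammar from the Earley chart) are chosen for illustration.

```python
# Illustrative grammar (taken from the Earley chart example); no empty rules.
GRAMMAR = {
    "ROOT": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("Papa",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",), ("a",)],
    "N": [("caviar",), ("spoon",)],
    "V": [("ate",)],
    "P": [("with",)],
}


def first_sets(grammar):
    """Compute FIRST(A) = {x | A =>* x alpha} for every non-terminal A."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                # A terminal contributes itself; a non-terminal contributes
                # whatever is currently known about its own FIRST set.
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first
```

With this grammar, `first_sets(GRAMMAR)["S"]` contains the tokens that can begin a sentence, so a parser can discard a prediction for S as soon as the next input word is not among them.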

    Other methods

    - Other parsing methods correspond to different ways of finding constituents.
    - A constituent X[i, j] is a span labelled X that covers the input from position i to position j. Example:

      John ate ice cream on the table
      0 John 1 ate 2 ice-cream 3 on 4 the 5 table 6
      PP[3,6]; S[0,6]

    - The search space can be represented as an and-or tree:
      - Disjuncts (or) = the alternative analyses
      - Conjuncts (and) = the right-hand side of a rule, e.g. the right-hand side of S is NP VP

    Parsing as search

    0 the 1 guy 2 saw 3 ice-cream 4 on 5 the 6 hill 7

    [Figure: and-or tree rooted at S(0, 7). Node labels that survive from the slide: NP(0, 2), Det(0, 1), Noun(1, 2) ("the guy"); NP(0, 1), Name(0, 1); V(1, 2); VP(1, 8); VP(2, 7); V(2, 3) ("saw"); NP(3, 7); NP(3, 4), Name(3, 4) ("ice-cream"); PP(4, 7); Prep(4, 5) ("on"); NP(5, 7), Det(5, 6), Noun(6, 7) ("the hill"); Name(5, 6).]

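  • Left-corner parsing

    - Look bottom-up to find the first symbol (the left corner) of a constituent, then parse the rest of the constituent top-down.
    - The aim is to combine the best features of top-down and bottom-up parsing.

    Example grammar:

    S → NP VP
    NP → the Noun
    VP → ate NP

    [Figure: tree for S → NP VP. Step 1 finds "the" bottom-up; step 2 predicts Noun top-down to complete NP → the Noun. The word "ate" plays the same role for VP → ate NP.]

    This method works well for languages whose most important element comes first in the phrase, such as English. German, Dutch and Japanese place the most important element at the end.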

  • Probabilistic parsing

    Lê Thanh Hương
    Department of Information Systems
    School of Information and Communication Technology, Hanoi University of Science and Technology

    How do we choose the correct tree?

    - Example: I saw a man with a telescope.
    - As the number of rules grows, so does the potential for ambiguity.
    - NYU rule set: the Apple Pie parser has 20,000-30,000 rules for English.
    - Choosing which rules to apply to the sequence V DT NN PP:
      (1) VP → V NP PP
          NP → DT NN
      (2) VP → V NP
          NP → DT NN PP

    Word association (bigram probabilities)

    Example:
    - Eat ice-cream (high frequency)
    - Eat John (low, except on Survivor)

    Disadvantages:
    - P(John decided to bake a) has a high probability.
    - Consider the bigram approximation:

      $P(w_1 w_2 w_3) = P(w_3 \mid w_1 w_2)\, P(w_2 \mid w_1)\, P(w_1) \approx P(w_3 \mid w_2)\, P(w_2 \mid w_1)\, P(w_1)$

      This assumption is too strong: the subject can determine the object of the sentence, as in "Clinton admires honesty".
      → Use grammatical structure to stop the propagation.
    - Consider "Fred watered his mother's small garden". How does the word "garden" contribute?
      - Pr(garden | mother's small) is low → the trigram model performs poorly.
      - Pr(garden | X is the head of the object of the verb "to water") is higher.
      → Use bigrams together with grammatical relations.

    Word association (bigram probabilities)

    - A verb takes only certain kinds of objects → verb-with-obj, verb-without-obj.
    - Compatibility between subject and object:
      John admires honesty
      Honesty admires John ???

    Disadvantage: the grammar grows large.
    - One year of Wall Street Journal articles: 47,219 sentences, average length 23 words, hand-annotated. Only 4.7% of them (2,232 sentences) share a syntactic structure with another sentence.
    → We cannot rely on finding the correct syntactic structure for a whole sentence at once. We have to build a set of small grammar patterns.

    Example

    This apple pie looks good and is a real treat
    DT   NN    NN  VBX   JJ   CC  VBX DT JJ NN

    [Figure: parse tree over the tagged sentence. Rule 1 builds the NP "This apple pie", Rule 2 builds the NP "a real treat", and Rule 3 builds S over the whole sequence.]

    Rules:
    1. NP → DT NN NN
    2. NP → DT JJ NN
    3. S → NP VBX JJ CC VBX NP

    - Group (NNS, NN) as NX; (NNP, NNPS) as NPX; (VBP, VBZ, VBD) as VBX.
    - Choose rules according to their frequency.

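  • Computing probabilities

    The probability of a rule X → Y is estimated from counts:

    $\Pr(X \to Y) = \dfrac{|X \to Y|}{|X|}$

    For X = NP and Y = DT JJ NN the slide shows the counts 1470 (for NP → DT JJ NN) and 9711 (for NP) and reports Pr = 0.1532. Note that 1470/9711 evaluates to about 0.1514, so one of the printed figures is presumably off; the value 0.1532 is the one used in the next example.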

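    Computing Pr for a tree

    [Figure: tree for "The big guy ate the apple pie": S → NP VP; NP → DT JJ NN ("The big guy"); VP → VBX NP ("ate"); NP → DT JJ NN ("the apple pie"). The numbers 1-4 mark the order in which the rules are applied.]

    Rule probabilities:
    S → NP VP       0.35
    NP → DT JJ NN   0.1532
    VP → VBX NP     0.302

    Step  Rule applied     Running product
    1     S → NP VP        0.35
    2     NP → DT JJ NN    0.1532 x 0.35 = 0.0536
    3     VP → VBX NP      0.302 x 0.0536 = 0.0162
    4     NP → DT JJ NN    0.1532 x 0.0162 = 0.0025

    Pr = 0.0025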
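The running product in this example is simply the product of the probabilities of all rules used in the tree. A minimal sketch follows; the nested-tuple tree encoding and the name `tree_probability` are assumptions made for illustration, while the three rule probabilities are the ones given on the slide.

```python
# Rule probabilities from the slide, keyed by (lhs, rhs).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.35,
    ("NP", ("DT", "JJ", "NN")): 0.1532,
    ("VP", ("VBX", "NP")): 0.302,
}

# Hypothetical encoding: a node is (label, [children]); a bare string is a
# part-of-speech tag, which contributes no rule probability in this example.
TREE = ("S", [
    ("NP", ["DT", "JJ", "NN"]),            # The big guy
    ("VP", ["VBX",                         # ate
            ("NP", ["DT", "JJ", "NN"])]),  # the apple pie
])


def tree_probability(tree, rule_prob=RULE_PROB):
    """Multiply the probabilities of every rule used in the tree."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        p *= tree_probability(child, rule_prob)
    return p


print(round(tree_probability(TREE), 4))
```

The order of multiplication differs from the step-by-step table, but the product is the same up to rounding.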

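    Probabilistic context-free grammars

    A probabilistic context-free grammar (PCFG) consists of the usual parts of a CFG plus rule probabilities:

    - A set of terminal symbols $\{w^k\}$, $k = 1, \ldots, V$
    - A set of non-terminal symbols $\{N^i\}$, $i = 1, \ldots, n$
    - A start symbol $N^1$
    - A set of rules $\{N^i \to \zeta^j\}$, where $\zeta^j$ is a sequence of terminal and non-terminal symbols
    - A set of rule probabilities such that $\forall i:\ \sum_j P(N^i \to \zeta^j) = 1$
    - The probability of a parse tree is $P(T) = \prod_{i=1}^{n} p(r(i))$, where $r(1), \ldots, r(n)$ are the rules used to build T.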
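The normalization constraint on rule probabilities is easy to check mechanically. The sketch below is an illustration only: the helper name `check_normalized` is an assumption, and the rule list is borrowed from the a_dog / a_cat / a_telescope grammar that appears later in these slides.

```python
from collections import defaultdict
from math import isclose

# (lhs, rhs, probability) triples; grammar borrowed from a later example.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("V", "NP", "PP"), 0.4),
    ("VP", ("V", "NP"), 0.6),
    ("NP", ("N",), 0.7),
    ("NP", ("N", "PP"), 0.3),
    ("PP", ("PREP", "N"), 1.0),
    ("N", ("a_dog",), 0.3),
    ("N", ("a_cat",), 0.5),
    ("N", ("a_telescope",), 0.2),
    ("V", ("saw",), 1.0),
    ("PREP", ("with",), 1.0),
]


def check_normalized(rules):
    """For each left-hand side, report whether its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for lhs, _rhs, p in rules:
        totals[lhs] += p
    return {lhs: isclose(total, 1.0) for lhs, total in totals.items()}
```

`isclose` is used instead of `==` because sums such as 0.3 + 0.5 + 0.2 are subject to floating-point rounding.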

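    Assumptions

    - Place invariance: the probability of a subtree does not depend on where in the sentence the words it dominates are located.

      $\forall k:\ P(N^j_{k(k+c)} \to \zeta)$ is the same

    - Context-free: the probability of a subtree does not depend on words outside the subtree.

      $P(N^j_{kl} \to \zeta \mid \text{words outside } w_k \ldots w_l) = P(N^j_{kl} \to \zeta)$

    - Ancestor-free: the probability of a subtree does not depend on nodes outside the subtree.

      $P(N^j_{kl} \to \zeta \mid \text{nodes outside the subtree } N^j_{kl}) = P(N^j_{kl} \to \zeta)$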

    Algorithms

    - CKY
    - Beam search
    - Agenda/chart-based search
    - ...

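    Probabilistic CKY

    - Data structures:
      - A dynamic programming array indexed by [i, j, a] that stores the highest probability with which non-terminal a expands into the substring from i to j.
      - Backpointers (backptrs) that store links to the constituents in the tree.
    - Output: the probability of the most likely tree.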

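  • Computing Pr by induction

    - Base case: the input is a single word.

      $\Pr(\text{tree}) = p(A \to w_i)$

    - Recursive case: the input is a string of words.

      $A \Rightarrow^* w_{ij}$ if $\exists k:\ A \to B\,C,\ B \Rightarrow^* w_{ik},\ C \Rightarrow^* w_{kj},\ i \le k \le j$

      $p[i,j] = \max_{k,\ A \to B\,C} \big( p(A \to B\,C) \times p[i,k] \times p[k,j] \big)$

    [Figure: node A spanning w_ij with children B over positions i to k and C over positions k to j.]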

    Computing the Viterbi probability (CKY algorithm)

    [Figure: CKY chart for the Viterbi computation; the only value that survives from the slide is 0.0504.]

    Example

    S → NP VP    0.80        Det → the    0.50
    NP → Det N   0.30        Det → a      0.40
    VP → V NP    0.20        N → meal     0.01
    V → includes 0.05        N → flight   0.02

    Use the CYK algorithm to parse the input sentence:
    The flight includes a meal
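The recurrence can be sketched as a probabilistic CKY parser over the example grammar. This is a minimal illustration rather than the lecture's implementation: the names `viterbi_cky` and `build_tree`, the dictionary-based chart, and the lower-casing of input words are assumptions. The grammar happens to be in Chomsky normal form already, which the sketch relies on.

```python
from collections import defaultdict

# Example grammar from the slide, split into binary and lexical rules.
BINARY = {("S", "NP", "VP"): 0.80, ("NP", "Det", "N"): 0.30,
          ("VP", "V", "NP"): 0.20}
LEXICAL = {("V", "includes"): 0.05, ("Det", "the"): 0.50, ("Det", "a"): 0.40,
           ("N", "meal"): 0.01, ("N", "flight"): 0.02}


def viterbi_cky(words):
    """Fill best[i, j, A] = highest probability that A derives words[i:j]."""
    n = len(words)
    best = defaultdict(float)  # missing entries count as probability 0
    back = {}                  # backpointers for rebuilding the best tree
    # Base case: spans of length 1 come from lexical rules.
    for i, word in enumerate(words):
        for (a, w), p in LEXICAL.items():
            if w == word.lower():
                best[i, i + 1, a] = p
                back[i, i + 1, a] = word
    # Recursive case: combine two adjacent spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    candidate = p * best[i, k, b] * best[k, j, c]
                    if candidate > best[i, j, a]:
                        best[i, j, a] = candidate
                        back[i, j, a] = (k, b, c)
    return best, back


def build_tree(back, i, j, a):
    """Follow the backpointers to recover the most likely tree."""
    entry = back[i, j, a]
    if isinstance(entry, str):
        return (a, entry)
    k, b, c = entry
    return (a, build_tree(back, i, k, b), build_tree(back, k, j, c))
```

For the example sentence, `best[0, 5, "S"]` is the product 0.80 x (0.30 x 0.50 x 0.02) x (0.20 x 0.05 x (0.30 x 0.40 x 0.01)), and `build_tree(back, 0, 5, "S")` returns the corresponding tree.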

    Computing Pr

    1.  S → NP VP         1.0
    2.  VP → V NP PP      0.4
    3.  VP → V NP         0.6
    4.  NP → N            0.7
    5.  NP → N PP         0.3
    6.  PP → PREP N       1.0
    7.  N → a_dog         0.3
    8.  N → a_cat         0.5
    9.  N → a_telescope   0.2
    10. V → saw           1.0
    11. PREP → with       1.0

    Sentence: a_dog saw a_cat with a_telescope (N V N PREP N)

    [Figure: two parse trees. In the left tree the PP attaches to the verb phrase (VP → V NP PP); in the right tree it attaches to the noun phrase (VP → V NP with NP → N PP).]

    P_l = 1 x .7 x .4 x .3 x .7 x 1 x .5 x 1 x 1 x .2 = .00588
    P_r = 1 x .7 x .6 x .3 x .3 x 1 x .5 x 1 x 1 x .2 = .00378
    → P_l is chosen.

    Forward and backward probabilities

    [Figure: a sequence model over "The big brown fox", with state X_t at time t on a time axis 1 ... t-1, t ... T.]

    - Forward probability: $\alpha_i(t) = P(w_{1(t-1)}, X_t = i)$
      Forward = the probability of everything above and including a particular node.
    - Backward probability: $\beta_i(t) = P(w_{tT} \mid X_t = i)$
      Backward = the probability of everything below a particular node.

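  • Inside and outside probabilities

    [Figure: a tree rooted at N^1 = Start over w_1 ... w_m, with an internal node N^j dominating w_p ... w_q. The outside probability α_j(p, q) covers everything outside that subtree (w_1 ... w_{p-1} and w_{q+1} ... w_m); the inside probability β_j(p, q) covers the subtree itself.]

    - $N^j_{pq}$ = the non-terminal $N^j$ spanning positions p through q in the string
    - $\alpha_j$ = outside probability
    - $\beta_j$ = inside probability
    - $N^j$ covers the words $w_p \ldots w_q$ if $N^j \Rightarrow^* w_p \ldots w_q$

    $\alpha_j(p,q) = P(w_{1(p-1)},\, N^j_{pq},\, w_{(q+1)m} \mid G)$

    $\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$

    $\alpha_j(p,q)\,\beta_j(p,q) = P(N^1 \Rightarrow^* w_{1m},\ N^j \Rightarrow^* w_{pq} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G)\; P(N^j \Rightarrow^* w_{pq} \mid N^1 \Rightarrow^* w_{1m}, G)$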

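    Computing the probability of a string

    - Use the Inside algorithm, a dynamic programming algorithm based on inside probabilities:

      $P(w_{1m} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G) = P(w_{1m} \mid N^1_{1m}, G) = \beta_1(1,m)$

    - Base case:

      $\beta_j(k,k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)$

    - Induction:

      $\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$

    Induction step

    [Figure: node N^j with children N^r over w_p ... w_d and N^s over w_{d+1} ... w_q.]

    - Compute β_j(p, q) for p < q, for every non-terminal j, working bottom-up.
    - Multiply the three terms P(N^j → N^r N^s), β_r(p, d) and β_s(d+1, q), then sum over r, s and the split point d.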

    Example

    1.  S → NP VP         1.0
    2.  VP → V NP PP      0.4
    3.  VP → V NP         0.6
    4.  NP → N            0.7
    5.  NP → N PP         0.3
    6.  PP → PREP N       1.0
    7.  N → a_dog         0.3
    8.  N → a_cat         0.5
    9.  N → a_telescope   0.2
    10. V → saw           1.0
    11. PREP → with       1.0

    [Figure: the two parse trees of "a_dog saw a_cat with a_telescope" (N V N PREP N), one using VP → V NP PP and one using VP → V NP with NP → N PP.]

    P(a_dog saw a_cat with a_telescope)
      = 1 x .7 x .4 x .3 x .7 x 1 x .5 x 1 x 1 x .2 + 1 x .7 x .6 x .3 x .3 x 1 x .5 x 1 x 1 x .2
      = .00588 + .00378 = .00966
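The sum over both trees can be checked with a small inside-probability computation. The sketch below is an illustration under stated assumptions: the example grammar is not in Chomsky normal form (it has the ternary rule VP → V NP PP and the unary rule NP → N), so instead of the binary-rule recurrence it uses a memoized recursion that splits a span among any number of right-hand-side symbols. The names `sentence_probability`, `inside` and `cover` are hypothetical, and the recursion assumes the grammar has no empty rules and no unary cycles.

```python
from functools import lru_cache

# (lhs, rhs, probability) triples from the example above.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("V", "NP", "PP"), 0.4),
    ("VP", ("V", "NP"), 0.6),
    ("NP", ("N",), 0.7),
    ("NP", ("N", "PP"), 0.3),
    ("PP", ("PREP", "N"), 1.0),
    ("N", ("a_dog",), 0.3),
    ("N", ("a_cat",), 0.5),
    ("N", ("a_telescope",), 0.2),
    ("V", ("saw",), 1.0),
    ("PREP", ("with",), 1.0),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}


def sentence_probability(words, start="S"):
    """Sum the probabilities of all parse trees of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def inside(symbol, i, j):
        # Probability that `symbol` derives words[i:j].
        if symbol not in NONTERMINALS:
            return 1.0 if j - i == 1 and words[i] == symbol else 0.0
        return sum(p * cover(rhs, i, j)
                   for lhs, rhs, p in RULES if lhs == symbol)

    @lru_cache(maxsize=None)
    def cover(rhs, i, j):
        # Probability that the symbol sequence `rhs` derives words[i:j].
        if len(rhs) == 1:
            return inside(rhs[0], i, j)
        total = 0.0
        # The first symbol takes words[i:d]; every remaining symbol must
        # still receive at least one word.
        for d in range(i + 1, j - len(rhs) + 2):
            left = inside(rhs[0], i, d)
            if left:
                total += left * cover(rhs[1:], d, j)
        return total

    return inside(start, 0, len(words))


print(sentence_probability("a_dog saw a_cat with a_telescope".split()))
```

Unlike the Viterbi computation, which keeps the maximum over analyses, this computation adds them, so the result is the total of the two tree probabilities.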

    Beam search

    - Search in a state space.
    - Each state is a partial parse tree with an associated probability.
    - At each step, keep only the highest-scoring candidates.

  • Enriching PCFGs

    - A plain PCFG does not perform well because of its independence assumptions.
    - Remedy: add more information.
    - Structural dependence:
      - The expansion of a node depends on its position in the tree (independently of its lexical content).
      - Example: enrich a node with information about its parent. An NP whose parent is S is treated as different from an NP whose parent is VP.

    Enriching PCFGs

    - Lexicalized PCFG: PLCFG (Probabilistic Lexicalized CFG; Collins 1997; Charniak 1997)

    z Gn t vng vi cc nt ca lutz Cu trc H