Conditional Random Fields

Uploaded by: vinh-le

Posted on 19-Jul-2015


Conditional Random Fields (CRFs)

Overview of CRFs

A CRF (conditional random field) is a probabilistic model over sequences, trained to maximize the conditional probability of the labels given the observations. It is a framework for building probabilistic models that segment and label sequence data [1]. According to [3], a CRF is an undirected graphical model, like a Markov random field. Each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables.

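Figure 1: Chain-structured graph of a CRF.

X is a random variable over the data sequence to be labeled, and Y is a random variable over the corresponding label (or state) sequence. For example, X may be the sequence of words observed in natural-language sentences, and Y the sequence of part-of-speech tags assigned to those sentences (the tags come from a predefined tag set). A linear-chain CRF with parameters Λ = {λ_1, ..., λ_K} defines the conditional probability of a label sequence y = y_1...y_T given an input sequence x = x_1...x_T as [2]:

$$P_\Lambda(y \mid x) = \frac{1}{Z_x} \exp\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k f_k(y_{t-1}, y_t, x, t) \right)$$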


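Here Z_x is a normalization factor that ensures the probabilities of all state sequences sum to 1 [4]:

$$Z_x = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \right)$$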

f_k(y_{t-1}, y_t, x, t) is a feature function. It is usually binary-valued but may also be real-valued. λ_k is a learned weight associated with feature f_k.

Feature functions can measure any aspect of a state transition y_{t-1} → y_t and of the observation sequence x, centered at the current time step t. For example, a feature function might take the value 1 when y_{t-1} is the state TITLE, y_t is the state AUTHOR, and x_t is a word that appears in a lexicon of person names.

CRFs are usually trained by maximizing the likelihood of the training data with optimization techniques such as L-BFGS¹. Inference with the learned model means finding the label sequence that corresponds to an input observation sequence. For CRFs, inference on new data is usually performed with a dynamic programming algorithm, typically Viterbi². Viterbi is a dynamic programming algorithm that finds the most likely sequence of hidden states [5].
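To make the formula and the decoding step concrete, here is a minimal Python sketch of a linear-chain CRF. Everything in it is hypothetical: the two labels, the two feature functions, the weights, and the `START` marker standing in for y_0. The weights are fixed by hand rather than trained with L-BFGS.

The sketch computes Z_x with the forward algorithm, a standard dynamic-programming alternative to enumerating every label sequence. It then finds the most likely labeling with Viterbi, which uses the same recursion with max in place of log-sum-exp.

```python
import math

# Hypothetical toy model: two labels and two hand-written feature functions.
LABELS = ["TITLE", "AUTHOR"]
START = None  # stands in for y_0, the "label" before the first position (an assumption)

def f_title_to_author(y_prev, y_cur, x, t):
    # 1 on a TITLE -> AUTHOR transition whose current word is capitalized
    return 1.0 if y_prev == "TITLE" and y_cur == "AUTHOR" and x[t][:1].isupper() else 0.0

def f_start_title(y_prev, y_cur, x, t):
    # 1 when the sequence begins with TITLE
    return 1.0 if y_prev is START and y_cur == "TITLE" else 0.0

FEATURES = [f_title_to_author, f_start_title]
WEIGHTS = [1.5, 2.0]  # illustrative lambda_k values, not learned from data

def local_score(y_prev, y_cur, x, t):
    """sum_k lambda_k * f_k(y_prev, y_cur, x, t)"""
    return sum(lam * f(y_prev, y_cur, x, t) for lam, f in zip(WEIGHTS, FEATURES))

def sequence_score(y, x):
    """sum over t of local_score; the exponent in the CRF formula."""
    return sum(local_score(y[t - 1] if t > 0 else START, y[t], x, t) for t in range(len(x)))

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_z(x):
    """Forward algorithm: log Z_x in O(T * |Y|^2) instead of summing over |Y|^T sequences."""
    alpha = {y: local_score(START, y, x, 0) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {y: logsumexp([alpha[yp] + local_score(yp, y, x, t) for yp in LABELS])
                 for y in LABELS}
    return logsumexp(list(alpha.values()))

def prob(y, x):
    """P(y | x) = exp(score(y, x)) / Z_x"""
    return math.exp(sequence_score(y, x) - log_z(x))

def viterbi(x):
    """Most likely label sequence: the log_z recursion with max instead of logsumexp."""
    delta = {y: local_score(START, y, x, 0) for y in LABELS}
    backptrs = []  # backptrs[t-1][y] = best previous label when position t has label y
    for t in range(1, len(x)):
        new_delta, ptr = {}, {}
        for y in LABELS:
            best = max(LABELS, key=lambda yp: delta[yp] + local_score(yp, y, x, t))
            new_delta[y] = delta[best] + local_score(best, y, x, t)
            ptr[y] = best
        delta = new_delta
        backptrs.append(ptr)
    path = [max(LABELS, key=lambda y: delta[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

For x = ["Deep", "Smith"], viterbi(x) returns ["TITLE", "AUTHOR"]. The values prob(y, x) over all four label sequences sum to 1, up to floating-point rounding.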

¹ http://en.wikipedia.org/wiki/L-BFGS
² http://en.wikipedia.org/wiki/Viterbi_algorithm

Demo Example

Source: http://crf.sourceforge.net/, developed by Prof. Sunita Sarawagi. The package is designed for tasks such as:

Information Extraction

Segmentation of text into attributes

Sequence Classification

The training and test datasets of US addresses consist of the following files:

us50.train.raw: raw text file with one address per line. In total, 51 address lines are used for training.

us50.train.tagged: file containing the labeled address lines (divided into 7 classes), used for training.

us50.test.raw: raw text file containing the test address lines, 690 addresses in total.

us.test.tagged: file containing the hand-labeled test addresses. The program's labels are matched against it to evaluate the automatic labeling.

Each address is divided into 7 classes as follows:
1: House number
2: N/A
3: Street name
4: N/A
5: City name
6: State name
7: Zip code

After running the program, the results are as follows:

A total of 220 features were found. The features depend mainly on the words in the training set: most words in the training set become features. The program produces the following output:
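The observation that most training words become features can be illustrated with a minimal sketch. It creates one indicator feature per (word, class) pair seen in training. This is an assumption about the general idea, not the package's actual feature generator. The in-memory format, the function name `build_word_features`, the lowercasing, and the `min_count` cutoff are all hypothetical, and the two sample addresses below are invented.

```python
from collections import Counter

def build_word_features(tagged_sentences, min_count=1):
    """Collect one indicator feature per (word, label) pair seen in training.

    tagged_sentences: list of sentences, each a list of (word, label) tuples
    (a hypothetical in-memory format, not the package's file format).
    Returns (word_counts, feature_index), where feature_index maps (word, label) -> id.
    """
    # Count how often each (lowercased) word occurs in the training set.
    word_counts = Counter(w.lower() for sent in tagged_sentences for w, _ in sent)
    feature_index = {}
    for sent in tagged_sentences:
        for word, label in sent:
            w = word.lower()
            # Keep a word only if it is frequent enough; assign ids in first-seen order.
            if word_counts[w] >= min_count and (w, label) not in feature_index:
                feature_index[(w, label)] = len(feature_index)
    return word_counts, feature_index

# Invented addresses labeled with the 7-class scheme (1 house number, 3 street, ...).
train = [
    [("12", "1"), ("Main", "3"), ("St", "3"), ("Boston", "5"), ("MA", "6"), ("02115", "7")],
    [("7", "1"), ("Oak", "3"), ("St", "3"), ("Salem", "5"), ("MA", "6"), ("01970", "7")],
]
```

On this toy data, build_word_features(train) yields 10 features, because "st" and "ma" each repeat with the same class. Raising min_count to 2 keeps only those two. Real CRF packages usually add transition and word-shape features as well.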

learntModels directory: contains 2 files:
- features: the file containing the features, in two parts. The first part lists the words found in the training set, each with an index, a class name, and an occurrence count. The second part lists the features, each with an index and a class name.

Figure 2: First part of the features file.

Figure 3: Second part of the features file.
- crf: contains the total number of features and the values associated with each feature.

out directory: contains us50.test.tagged, which stores the addresses as labeled by the program.

Steps performed by the program:

Step 1: read the training set, consisting of the raw text file and the labeled data file.

Step 2: split the training set into individual words and count how often each word occurs.

Step 3: run training and write out the features and crf files.

Step 4: read the test set.

Step 5: using the trained features, label the test set and write the output file.

Step 6: compare the program's output file with the hand-labeled test file and compute the evaluation results.
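Step 6 compares the program's output with the hand-labeled file. Below is a minimal sketch of such a comparison. It assumes both files have already been parsed into aligned lists of per-token class labels. The parsing, the function name `evaluate`, and the choice of metrics are hypothetical, and the package's own scoring may differ. The function computes token accuracy plus per-class precision and recall.

```python
from collections import Counter

def evaluate(predicted, gold):
    """Compare predicted label sequences with hand-labeled ones, token by token.

    predicted, gold: lists of label sequences (one per address), aligned token by token.
    Returns (token_accuracy, per_label), where per_label[label] = (precision, recall).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of addresses")
    correct = total = 0
    tp, pred_count, gold_count = Counter(), Counter(), Counter()
    for p_seq, g_seq in zip(predicted, gold):
        if len(p_seq) != len(g_seq):
            raise ValueError("sequences must be aligned token by token")
        for p, g in zip(p_seq, g_seq):
            total += 1
            pred_count[p] += 1
            gold_count[g] += 1
            if p == g:
                correct += 1
                tp[g] += 1  # true positive for this class
    per_label = {}
    for label in set(pred_count) | set(gold_count):
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / gold_count[label] if gold_count[label] else 0.0
        per_label[label] = (precision, recall)
    return (correct / total if total else 0.0), per_label
```

For a single address labeled ["1", "3", "3", "5"] against the gold ["1", "3", "5", "5"], the token accuracy is 0.75. Class "3" gets precision 0.5 and recall 1.0, and class "5" gets precision 1.0 and recall 0.5.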

References

[1] John Lafferty, Andrew McCallum, Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. 2001.
[2] Andrew McCallum, Fuchun Peng. Accurate Information Extraction from Research Papers using Conditional Random Fields. 2004.
[3] http://en.wikipedia.org/wiki/Conditional_random_field
[4] Charles Sutton, Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning.
[5] Phan Xuân Hiếu. JVnTagger: A Vietnamese Part-of-Speech Tagging Tool Based on Conditional Random Fields and Maximum Entropy (in Vietnamese).
