
Page 1: BenG Update on automatic labelling

MM P05 automatic labeling / term extraction

Victor de Boer

Josefien Schuurman

Roeland Ordelman

Page 2: BenG Update on automatic labelling

• Input: TT888 subtitles

• Output: GTAA terms

  • Onderwerpen (subjects)

  • Persoonsnamen (person names)

  • Namen (names)

  • Geografische namen (geographic names)

– For the entire video (corresponds to documentalist tasks)

Term extraction from TT888

Page 3: BenG Update on automatic labelling

• version 0.1 – ‘naive baseline’ – Test input and output

• version 0.2 – Multiple GTAA axes – Improve statistics – Discussion with metadata management

• version 0.3 – More improvements – Evaluation

• version 1.0 – To be reimplemented

Planning

http://www.recensiekoning.nl/2011/09/48928/ondertiteling

Page 4: BenG Update on automatic labelling

• Java to make integration easier

• XML and CSV outputs – URI of GTAA term – pref-label – confidence value – axis

• Input comes from the Immix OAI API, where segmentation should already have taken place – the algorithm expects one OAI identifier (Expressie or Selectie)

• Matching with GTAA using an ElasticSearch instance

Implementation details
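The slide lists the output fields (URI of GTAA term, pref-label, confidence value, axis) but not their serialization; below is a minimal sketch in Java (the implementation language named above) of writing one extracted term as a CSV row. The record name, delimiter and example values are illustrative assumptions, not the actual implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record for one extracted term; the four fields follow the slide
// (URI of GTAA term, pref-label, confidence value, axis), the names are assumed.
record ExtractedTerm(String gtaaUri, String prefLabel, double confidence, String axis) {

    // Serialize this term as one CSV row.
    String toCsvRow() {
        return String.join(";", gtaaUri, prefLabel, String.valueOf(confidence), axis);
    }
}

class CsvOutputSketch {
    public static void main(String[] args) {
        // Example values only; the confidence score is made up.
        List<ExtractedTerm> terms = List.of(
                new ExtractedTerm("gtaa:002151", "theater", 0.83, "Onderwerpen"));
        String csv = "uri;prefLabel;confidence;axis\n"
                + terms.stream().map(ExtractedTerm::toCsvRow).collect(Collectors.joining("\n"));
        System.out.println(csv);
    }
}
```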

Page 5: BenG Update on automatic labelling

For every item:
1. Get TT888 words in a frequency list
2. Discard stop words (‘de’, ‘het’, ‘op’, ‘naar’, ...)
3. Take all words with freq > n
4. Match with GTAA “Onderwerpen” with ElasticSearch score > m

– Pref-label + alt-label

version 0.1

[Diagram: OAI input and a stop-word list feed the algorithm, which matches against GTAA, e.g. gtaa:002151 “theater”]
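A minimal sketch of the version 0.1 steps listed above: build a frequency list from the TT888 text, discard stop words, keep words with freq > n, and match the survivors against GTAA “Onderwerpen”. The ElasticSearch lookup on pref- and alt-labels is only stubbed here; the stop-word list and all names are assumptions, not the actual code.

```java
import java.util.*;
import java.util.stream.Collectors;

class TermExtractorV01 {

    // Small illustrative stop-word list; the real list would be longer.
    static final Set<String> STOP_WORDS = Set.of("de", "het", "op", "naar", "een", "en");

    // Steps 1-3: frequency list over the TT888 words, stop words removed,
    // only words occurring more than n times kept.
    static Map<String, Long> frequentWords(String tt888Text, int n) {
        return Arrays.stream(tt888Text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isBlank() && !STOP_WORDS.contains(w))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Step 4: keep words whose GTAA "Onderwerpen" match scores above m.
    static List<String> extractTerms(String tt888Text, int n, double m) {
        List<String> terms = new ArrayList<>();
        for (String word : frequentWords(tt888Text, n).keySet()) {
            if (matchOnderwerpen(word) > m) {
                terms.add(word);
            }
        }
        return terms;
    }

    // Stub for the ElasticSearch query against GTAA pref- and alt-labels.
    static double matchOnderwerpen(String word) {
        return 0.0; // would be the ElasticSearch match score in a real implementation
    }
}
```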

Page 6: BenG Update on automatic labelling

Informal Evaluation:

Compare to the existing (historical) labels (“Onderwerpen”)

Works a bit (< 20% correct). Input for version 0.2

version 0.1

[Same version 0.1 diagram as on the previous slide]

Page 7: BenG Update on automatic labelling

• Intermediate version, uses a Named Entity Recognizer. Results discussed with Lisette and Vincent -> version 0.3

version 0.2

[Diagram: OAI input, stop words, Named Entity Recognition and NL word frequencies feed the algorithm, which matches against GTAA, e.g. “theater”, “Jos Brink”, “Amsterdam”]

Page 8: BenG Update on automatic labelling

• Webservice CLTL @ VU

• Input:

– “Hallo, mijn naam is Victor de Boer en ik woon in de mooie stad Haarlem. Ik werk nu bij het Nederlands Instituut voor Beeld en Geluid in Hilversum. Hiervoor was ik werkzaam bij de Vrije Universiteit.” (Dutch example input: “Hello, my name is Victor de Boer and I live in the beautiful city of Haarlem. I now work at the Nederlands Instituut voor Beeld en Geluid in Hilversum. Before that I worked at the Vrije Universiteit.”)

• Output:[ Victor de Boer | PERSON ], [ Haarlem | LOCATION ], [ Nederlands | MISC ], [ Instituut voor Beeld en Geluid | ORGANIZATION ], [ Hilversum | LOCATION ], [ Vrije Universiteit | ORGANIZATION ]

Named Entity Recognition
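The webservice output above is shown in a bracketed [ entity | TYPE ] form; the sketch below parses that textual format into entity/type pairs. The regex and class name are assumptions for illustration; the actual call to the CLTL webservice is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NerOutputParser {

    // Matches entries like "[ Victor de Boer | PERSON ]" as shown on the slide.
    static final Pattern ENTITY = Pattern.compile("\\[\\s*([^|\\]]+?)\\s*\\|\\s*([A-Z]+)\\s*\\]");

    // Returns surface form -> entity type, in order of appearance.
    static Map<String, String> parse(String nerOutput) {
        Map<String, String> entities = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(nerOutput);
        while (m.find()) {
            entities.put(m.group(1), m.group(2));
        }
        return entities;
    }

    public static void main(String[] args) {
        String output = "[ Victor de Boer | PERSON ], [ Haarlem | LOCATION ], [ Hilversum | LOCATION ]";
        System.out.println(parse(output)); // {Victor de Boer=PERSON, Haarlem=LOCATION, Hilversum=LOCATION}
    }
}
```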

Page 9: BenG Update on automatic labelling

For every item:

1. Track 1
   1. Get TT888 words in a frequency list
   2. Discard stop words (‘de’, ‘het’, ‘op’, ‘naar’, ...)
   3. Take all n-grams with normalized frequency > n
   4. Match with GTAA “Onderwerpen” with score > m

2. Track 2
   1. Present TT888 to the Named Entity Recognizer (VU webservice)
   2. Match the result (with freq > L) with GTAA “Persoonsnamen”, “Geografische namen”, “Onderwerpen”, “Namen”

version 0.3

[Diagram: OAI input, stop words, Named Entity Recognition and NL word frequencies feed the algorithm, which matches against GTAA, e.g. “theater”, “Jos Brink”, “Amsterdam”]
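Track 1 above moves from single words to n-grams with a normalized frequency; the sketch below shows that counting step, assuming “normalized frequency” means the raw count divided by the number of n-grams in the subtitle text (so the threshold n no longer depends on subtitle length).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NgramFrequency {

    // Count n-grams of the given size and normalize by the total number of n-grams.
    static Map<String, Double> normalizedNgramFreq(List<String> tokens, int size) {
        Map<String, Integer> counts = new HashMap<>();
        int total = Math.max(1, tokens.size() - size + 1);
        for (int i = 0; i + size <= tokens.size(); i++) {
            counts.merge(String.join(" ", tokens.subList(i, i + size)), 1, Integer::sum);
        }
        Map<String, Double> normalized = new HashMap<>();
        counts.forEach((ngram, c) -> normalized.put(ngram, c / (double) total));
        return normalized;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("jos", "brink", "speelt", "in", "het", "theater", "met", "jos", "brink");
        System.out.println(normalizedNgramFreq(tokens, 2)); // bigram "jos brink" occurs twice
    }
}
```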

Page 10: BenG Update on automatic labelling

version 0.3 > Example output

Page 11: BenG Update on automatic labelling

• Setup

– 4 evaluators (Vincent, Lisette, Alma, Tim)

• 3 in one 50 min session

• 1 in another session

– ~8 minutes per item

– Video + extracted terms

• Open videos in IE browser

• GTAA URIs + pref-labels

• Any other info allowed

– Five-point Likert scale

• Only precision, no recall

Evaluation

The evaluation scale used. 0 means genuinely wrong (e.g. a wrong homonym) or genuinely not relevant (wrong person). Since these interact, the scale cannot be split out much further.

0: Term is not relevant at all

1: Term is not relevant

2: Term is somewhat relevant

3: Term is relevant

4: Term is highly relevant

Page 12: BenG Update on automatic labelling

Evaluation

Page 13: BenG Update on automatic labelling

• Total of 70 terms for 13 videos (5.4 terms per video)

– Some videos did not start -> discarded

– 38 terms with three evaluations

– 32 with one

Results

Page 14: BenG Update on automatic labelling

Results

Per-evaluator mean score over all terms:

             eval_1   eval_2   eval_3   eval_4   Avg
mean (gem)     2.59     1.35     2.00     2.37   2.08

Example terms (each scored by three evaluators):

item      F   Term                scores     Avg
item 1    6   licht               0  0  0    0
item 1    2   Friesland           0  0  2    0.666667
item 3    2   soul                0  1  1    0.666667
item 3    3   Romme, Gianni       3  4  4    3.666667
item 3    2   Somerville, Jimmy   4  2  2    2.666667
item 3    3   Harrison, George    4  4  3    3.666667
item 3    4   Clapton, Eric       4  4  2    3.333333
item 3    2   Milwaukee           3  1  1    1.666667

Page 15: BenG Update on automatic labelling

• Term “Milwaukee”

– Top2000 a gogo

Example of disagreement

Eval 1 -> score=3: “The term in itself is not very relevant, but in combination with Romme, Gianni it is still valuable. Again: NER gains strength if the user also gets a time code and can play the fragment back to check whether it is relevant for their search/re-use.”

Eval 3 -> score=1: “mentioned twice, not relevant”

Eval 2 -> score=1: “…”

Page 16: BenG Update on automatic labelling

Pearson   eval1   eval2   eval3
eval1      1
eval2      0.52    1
eval3      0.67    0.58    1
eval4      0.78    x       0.92

Inter-annotator agreement

Agreement between 3 and 4 is large, between 1 and 4 is substantial, and between 1 and 2, 1 and 3, and 2 and 3 is lower but acceptable.

The task is fairly objective, but somewhat subjective. We look mainly at averages for the rest.
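The agreement figures above are Pearson correlations computed over the terms that a pair of evaluators both scored; a minimal sketch of that computation, with made-up example scores:

```java
class PearsonAgreement {

    // Pearson correlation coefficient between two evaluators' scores on the same terms.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        double[] scoresA = {0, 0, 0, 3, 4, 4, 4, 3}; // example scores, not the real evaluation data
        double[] scoresB = {0, 2, 1, 4, 2, 3, 2, 1};
        System.out.printf("r = %.2f%n", pearson(scoresA, scoresB));
    }
}
```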

Page 17: BenG Update on automatic labelling

• Total average of 2.15 (“somewhat relevant” or better)

Results: average scores

At threshold of 2: Precision = 0.61

At threshold of 3: Precision = 0.36

[Chart: average score per extracted term (y-axis 0–4.5, x-axis 0–1.2)]
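A minimal sketch of the precision figures above, assuming precision here means the fraction of extracted terms whose average score reaches the threshold; the example data are the per-term averages from the earlier results slide, not the full 70-term set:

```java
import java.util.Arrays;

class PrecisionAtThreshold {

    // Fraction of terms whose average evaluator score is at or above the threshold.
    static double precision(double[] avgScores, double threshold) {
        long relevant = Arrays.stream(avgScores).filter(s -> s >= threshold).count();
        return relevant / (double) avgScores.length;
    }

    public static void main(String[] args) {
        // Per-term averages from the results slide (items 1 and 3 only).
        double[] avgScores = {0.0, 0.67, 0.67, 3.67, 2.67, 3.67, 3.33, 1.67};
        System.out.printf("P@2 = %.2f, P@3 = %.2f%n", precision(avgScores, 2), precision(avgScores, 3));
    }
}
```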

Page 18: BenG Update on automatic labelling

Results per video

item       average
item 1     0.3333333
item 3     2.6111111
item 5     2.4444444
item 6     1.75
item 8     1.4
item 9     3.6666667
item 10    2.4545455
item 13    2.375
item 14    0 (!)
item 15    1.33
item 17    2.08
item 19    4.00 (!)
item 20    1.67

• For some videos we shouldn’t do this

– Nederland in Beweging

– Metadata at Reeks (series) level

“Advice: exclude Level 1 programmes from keyword extraction, probably also from NER”

Page 19: BenG Update on automatic labelling

• Correlation between frequency of term in text and average score

– No correlation (?)

Results: correlation freq/score

Page 20: BenG Update on automatic labelling

• For some videos this shouldn’t be done – game shows, drama, ... – annotate at Reeks level

• Some axes seem to work better than others – Persoonsnamen, Namen, Geografische namen

• More abstraction or combination would be helpful – semantic clustering?

• Subtitles with * are song lyrics

• Still a need for time-coded terms

Evaluator remarks

Page 21: BenG Update on automatic labelling

• Limited evaluation

• But it works (precision 0.61) – with some tweaks to 0.7-0.8

• Named entities: lower threshold; subjects: higher

• Better Elasticsearch matching

– With semantic clustering to 0.8-0.9?

• Currently being re-implemented by Arjen as a proper service

• Re-use for annotating program guides

Conclusion and current steps

Page 22: BenG Update on automatic labelling

A huge thanks to the annotators for their valuable effort!!

Questions?

antwoordnu.nl