BenG update on automatic labelling
TRANSCRIPT
MM P05 automatic labeling / term extraction
Victor de Boer
Josefien Schuurman
Roeland Ordelman
• Input:
  – TT888 subtitles
• Output:
  – GTAA terms
    • Onderwerpen (subjects)
    • Persoonsnamen (person names)
    • Namen (names)
    • Geografische namen (geographic names)
  – For the entire video (corresponds to documentalist tasks)
Term extraction from TT888
• version 0.1 – 'naive baseline' – Test input and output
• version 0.2 – Multiple GTAA axes – Improve statistics – Discussion with metadata management
• version 0.3 – More improvements – Evaluation
• version 1.0 – To be re-implemented
Planning
http://www.recensiekoning.nl/2011/09/48928/ondertiteling
• Java to make integration easier
• XML and CSV outputs (illustrative record below)
  – URI of GTAA term
  – pref-label
  – Confidence value
  – Axis
• Input comes from the Immix OAI API, where segmentation should already have taken place
  – Algorithm expects one OAI identifier (Expressie or Selectie)
• Matching with GTAA using an Elasticsearch instance
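For illustration, one output record might look roughly like this; the element names, the CSV column order and the confidence value are placeholders (only the four fields themselves, and the example term gtaa:002151 "theater", come from these slides):

```xml
<term>
  <uri>gtaa:002151</uri>            <!-- URI of the matched GTAA concept (prefixed form) -->
  <prefLabel>theater</prefLabel>    <!-- pref-label of the concept -->
  <confidence>7.3</confidence>      <!-- e.g. the Elasticsearch match score (made-up value) -->
  <axis>Onderwerpen</axis>          <!-- GTAA axis the concept belongs to -->
</term>
```

The same record as a CSV line would simply be: gtaa:002151,theater,7.3,Onderwerpen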
Implementation details
For every item:
1. Get TT888 words in a frequency list
2. Discard stop words ('de', 'het', 'op', 'naar', ...)
3. Take all words with frequency > n
4. Match with GTAA "Onderwerpen" with Elasticsearch score > m
   – Pref-label + alt-label
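A minimal Java sketch of these four steps; the stop-word list, the thresholds n and m, and the gtaaScore() lookup are illustrative placeholders, not the actual implementation (the real matching runs against a GTAA index in Elasticsearch):

```java
import java.util.*;
import java.util.stream.*;

public class NaiveTermExtractor {

    // Illustrative stop-word list; the real list is much longer.
    private static final Set<String> STOP_WORDS = Set.of("de", "het", "op", "naar", "een", "en");

    /** Steps 1-3: word frequency list from the TT888 text, stop words removed, frequency > n. */
    static Map<String, Long> candidateTerms(String tt888Text, int n) {
        return Arrays.stream(tt888Text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isBlank() && !STOP_WORDS.contains(w))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    /** Step 4: keep candidates whose best GTAA "Onderwerpen" match (pref- or alt-label) scores above m. */
    static List<String> extract(String tt888Text, int n, double m) {
        return candidateTerms(tt888Text, n).keySet().stream()
                .filter(term -> gtaaScore(term) > m)
                .collect(Collectors.toList());
    }

    /** Placeholder for the Elasticsearch query against the GTAA "Onderwerpen" axis. */
    static double gtaaScore(String term) {
        return 0.0;
    }
}
```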
version 0.1
Algorithm [diagram with: OAI, stop words, GTAA (e.g. gtaa:002151 "theater")]
Informal Evaluation:
Compare to existing (historical) "Onderwerpen" labels
Works a bit (< 20% correct). Input for version 0.2
version 0.1
• Intermediate version, uses a Named Entity Recognizer. Results discussed with Lisette and Vincent -> version 0.3
version 0.2
Algorithm [diagram with: OAI, stop words, word frequency NL, Named Entity Recognition, GTAA (e.g. "theater", "Jos Brink", "Amsterdam")]
• Web service CLTL @ VU
• Input:
– “Hallo, mijn naam is Victor de Boer en ik woon in de mooie stad Haarlem. Ik werk nu bij het Nederlands Instituut voor Beeld en Geluid in Hilversum. Hiervoor was ik werkzaam bij de Vrije Universiteit. “
• Output:[ Victor de Boer | PERSON ], [ Haarlem | LOCATION ], [ Nederlands | MISC ], [ Instituut voor Beeld en Geluid | ORGANIZATION ], [ Hilversum | LOCATION ], [ Vrije Universiteit | ORGANIZATION ]
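A generic sketch of how the pipeline could call such an HTTP NER endpoint from Java; the URL, the plain-text request format and the response handling below are assumptions, not the actual CLTL interface:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NerClient {

    private static final String ENDPOINT = "https://example.org/ner"; // placeholder URL

    static String recognize(String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Expected shape (per the example above): "[ Victor de Boer | PERSON ], [ Haarlem | LOCATION ], ..."
        return response.body();
    }
}
```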
Named Entity Recognition
For every item:
1. Track 1
   1. Get TT888 words in a frequency list
   2. Discard stop words ('de', 'het', 'op', 'naar', ...)
   3. Take all n-grams with normalized frequency > n
   4. Match with GTAA "Onderwerpen" with score > m
2. Track 2
   1. Present TT888 to the Named Entity Recognizer (VU web service)
   2. Match the results (with frequency > L) with GTAA "Persoonsnamen", "Geografische namen", "Onderwerpen", "Namen"
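Schematically, the two tracks could be combined as below; every helper and threshold value here is a placeholder for the building blocks named in the list above, not the actual implementation:

```java
import java.util.*;

public class TwoTrackLabeler {

    record GtaaTerm(String uri, String prefLabel, String axis, double score) {}
    record NamedEntity(String text, String type, int frequency) {}

    double n = 0.01; // minimum normalized n-gram frequency (illustrative value)
    int L = 1;       // minimum entity frequency (illustrative value)
    double m = 5.0;  // minimum match score (illustrative value)

    List<GtaaTerm> labelItem(String tt888Text) {
        List<GtaaTerm> result = new ArrayList<>();
        // Track 1: frequent n-grams (stop words removed) matched against "Onderwerpen"
        for (String ngram : frequentNgrams(tt888Text, n)) {
            matchGtaa(ngram, m, "Onderwerpen").ifPresent(result::add);
        }
        // Track 2: named entities from the NER web service, matched against four axes
        for (NamedEntity ne : recognizeEntities(tt888Text)) {
            if (ne.frequency() > L) {
                matchGtaa(ne.text(), m, "Persoonsnamen", "Geografische namen", "Onderwerpen", "Namen")
                        .ifPresent(result::add);
            }
        }
        return result;
    }

    // Placeholders for the real building blocks (frequency list, CLTL NER call, Elasticsearch match).
    List<String> frequentNgrams(String text, double minNormFreq) { return List.of(); }
    List<NamedEntity> recognizeEntities(String text) { return List.of(); }
    Optional<GtaaTerm> matchGtaa(String label, double minScore, String... axes) { return Optional.empty(); }
}
```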
version 0.3
version 0.3 > Example output
• Setup
  – 4 evaluators (Vincent, Lisette, Alma, Tim)
    • 3 in one 50-minute session
    • 1 in another session
  – ~8 minutes per item
  – Video + extracted terms
    • Open videos in IE browser
    • GTAA URIs + pref-labels
    • Any other info allowed
  – Five-point Likert scale
    • Only precision, no recall
Evaluation
The evaluation scale used. 0 means really wrong (e.g. a wrong homonym) or really not relevant (wrong person). Since these interact, this cannot be split out much further.
0: Term is not relevant at all
1: Term is not relevant
2: Term is somewhat relevant
3: Term is relevant
4: Term is very relevant
Evaluation
• Total of 70 terms for 13 videos (5.4 terms per video)
  – Some videos did not start -> discarded
  – 38 terms with three evaluations
  – 32 with one
Results
Per-evaluator mean scores:
  eval_1: 2.59   eval_2: 1.35   eval_3: 2.00   eval_4: 2.37   overall: 2.08

Sample of evaluated terms (three evaluations each):
  item     F  Term               scores   avg
  item 1   6  licht              0, 0, 0  0.00
  item 1   2  Friesland          0, 0, 2  0.67
  item 3   2  soul               0, 1, 1  0.67
  item 3   3  Romme, Gianni      3, 4, 4  3.67
  item 3   2  Somerville, Jimmy  4, 2, 2  2.67
  item 3   3  Harrison, George   4, 4, 3  3.67
  item 3   4  Clapton, Eric      4, 4, 2  3.33
  item 3   2  Milwaukee          3, 1, 1  1.67
• Term “Milwaukee”
– Top2000 a gogo
Example of disagreement
Eval 1 -> score = 3: "The term in itself is not very relevant, but in combination with Romme, Gianni it is still valuable. Again: NER gains strength if the user also gets the time code and can play it back to check whether the fragment is relevant for their search/re-use."
Eval 3 -> score = 1: "mentioned twice, not relevant"
Eval 2 -> score = 1: "…"
Pearson  eval1  eval2  eval3
eval1    1
eval2    0.52   1
eval3    0.67   0.58   1
eval4    0.78   x      0.92
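The pairwise values above are plain Pearson correlations over the terms that both evaluators scored; for reference, a generic computation sketch (not the authors' analysis script):

```java
public class Agreement {

    /** Pearson correlation between two evaluators' scores for the same terms. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }
}
```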
Inter-annotator agreement
Agreement between 3 and 4 is large; between 1 and 4 it is substantial; between 1 and 2, 1 and 3, and 2 and 3 it is lower but OK.
The task is fairly objective, but somewhat subjective. We look mainly at averages for the rest.
• Total average of 2.15 ("somewhat relevant" or better)
Results: average scores
At a threshold of 2: precision = 0.61
At a threshold of 3: precision = 0.36
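Precision at a threshold here presumably means the fraction of extracted terms whose average rating reaches that threshold; a small sketch under that assumption (avgScores holds one average rating per extracted term):

```java
import java.util.Arrays;

public class PrecisionAtThreshold {

    // Assumed reading: a term counts as correct when its average rating is >= the threshold.
    static double precisionAt(double[] avgScores, double threshold) {
        long correct = Arrays.stream(avgScores).filter(s -> s >= threshold).count();
        return (double) correct / avgScores.length;
    }
}
```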
Results per video
item     average
item 1   0.33
item 3   2.61
item 5   2.44
item 6   1.75
item 8   1.40
item 9   3.67
item 10  2.45
item 13  2.38
item 14  0.00 (!)
item 15  1.33
item 17  2.08
item 19  4.00 (!)
item 20  1.67
• For some videos we shouldn't do this
  – Nederland in Beweging
  – Metadata at Reeks (series) level
"Advice: exclude Level 1 programmes from keyword extraction, probably also from NER"
• Correlation between frequency of term in text and average score
– No correlation (?)
Results correlation freq/score
• For some videos this shouldn't be done
  – Game shows, drama, ...
  – Annotate at Reeks (series) level
• Some axes seem to work better than others
  – Persoonsnamen, Namen, Geografische namen
• More abstraction or combination would be helpful
  – Semantic clustering?
• Subtitles with * are song lyrics
• Still a need for time-coded terms
Evaluator remarks
• Limited evaluation
• But it works (precision 0.61)
  – With some tweaks, to 0.7-0.8
    • Lower threshold for NEs, higher for Subjects
    • Better Elasticsearch matching
  – With semantic clustering, to 0.8-0.9?
• Currently being re-implemented by Arjen as a proper service
• Re-use for annotating program guides
Conclusion and current steps
A huge thanks to the annotators for their valuable effort!!
Questions?
antwoordnu.nl