Applicative evaluation of bilingual terminologies

Estelle Delpech, NODALIDA, 12th May 2011


DESCRIPTION

Material presented at the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia.
Download paper: http://hal.archives-ouvertes.fr/hal-00585187
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina

TRANSCRIPT

Page 1: Applicative evaluation of bilingual terminologies

Applicative evaluation of bilingual terminologies

Estelle Delpech
NODALIDA, 12th May 2011

Page 2: Applicative evaluation of bilingual terminologies

Outline

1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements


Page 4: Applicative evaluation of bilingual terminologies

Context of the work

• Bilingual terminology mining from comparable corpora

• Application to:
  – computer-aided translation
  – computer-aided terminology

Page 5: Applicative evaluation of bilingual terminologies

Scope of the work

• Find a way to show the added value of the acquired terminology when it is used for technical translation
  – do translators translate better and/or faster?

• Design and test an "applicative" evaluation protocol for bilingual terminologies


Page 7: Applicative evaluation of bilingual terminologies

Comparable corpora

English texts on breast cancer:
  – It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer...
  – Histological evaluation revealed the presence of DCIS...

French texts on breast cancer:
  – L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire...
  – Un diagnostic histologique est nécessaire...


Page 9: Applicative evaluation of bilingual terminologies

Comparable corpora

English texts on breast cancer:
  – It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer...
  – Histological evaluation revealed the presence of ductal carcinoma in situ.

French texts on breast cancer:
  – L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire...
  – Un diagnostic histologique est nécessaire...

Page 10: Applicative evaluation of bilingual terminologies

Advantages of comparable corpora

• More readily available
  – new domains
  – unprecedented language pairs

• Quality
  – spontaneous language
  – not influenced by source texts

Page 11: Applicative evaluation of bilingual terminologies

Reference evaluation of bilingual terminologies

• Reference evaluation:
  – the output of the program is compared with a list of reference translations

• Precision:
  – percentage of output translations which are in the reference

$$\text{precision} = \frac{|\text{output} \cap \text{reference}|}{|\text{output}|}$$
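As a minimal sketch of the precision measure described on this slide, the ratio can be computed over two sets of translations. The function name `precision` and the toy data are illustrative assumptions, not taken from the experiments.

```python
def precision(output, reference):
    """Share of output translations that also appear in the reference."""
    if not output:
        return 0.0  # convention chosen here for an empty output
    return len(output & reference) / len(output)


# Hypothetical toy data: four candidate translations, one of them in the reference.
output = {"diagnostic", "histologie", "histologique", "nécessaire"}
reference = {"histologique"}

print(f"{precision(output, reference):.0%}")  # prints 25%
```

Using sets means that duplicate candidates are counted once; a list-based variant would be needed if duplicates mattered.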


Page 12: Applicative evaluation of bilingual terminologies

Reference evaluation with comparable corpora

• Output:
  – source term → ordered list of candidate translations

• Example:
  – histological → diagnostic (1), histologie (2), histologique (3), … nécessaire (n)

Page 13: Applicative evaluation of bilingual terminologies

Reference evaluation with comparable corpora

• Precision:
  – percentage of output translations which are in the reference when only the Top 20 or Top 10 candidate translations are taken into account

• State of the art:
  – between 42% and 80% on the Top 20, depending on corpus size, corpus type and the nature of the translated elements [Morin and Daille, 2009]
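The Top N precision above can be sketched as follows. This sketch assumes one common reading of the measure: a source term counts as correctly translated when at least one reference translation appears among its first N candidates. The function name `top_n_precision` and the data are hypothetical.

```python
def top_n_precision(candidates, reference, n=20):
    """Fraction of source terms with a reference translation in their top n candidates."""
    if not candidates:
        return 0.0
    hits = sum(
        1
        for term, ranked in candidates.items()
        if set(ranked[:n]) & reference.get(term, set())
    )
    return hits / len(candidates)


# Hypothetical ranked candidate lists (best candidate first).
candidates = {
    "histological": ["diagnostic", "histologie", "histologique", "nécessaire"],
    "breast": ["mammaire", "sein"],
}
reference = {"histological": {"histologique"}, "breast": {"sein"}}

print(top_n_precision(candidates, reference, n=1))  # 0.0
print(top_n_precision(candidates, reference, n=2))  # 0.5
print(top_n_precision(candidates, reference, n=3))  # 1.0
```

The score can only grow with N, which is why results are reported for a fixed cut-off such as Top 10 or Top 20.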


Page 14: Applicative evaluation of bilingual terminologies

Reference vs. applicative evaluation

• Reference evaluation:
  – OK for testing/developing the alignment program
  – fast, cheap, reproducible, objective

• Applicative evaluation:
  – how much does the alignment program help the end users?
  – can the terminologies improve translation quality?


Page 16: Applicative evaluation of bilingual terminologies

Applicative evaluation scenario


Page 20: Applicative evaluation of bilingual terminologies

Questions raised

1) How do you assess translation quality?

2) Evaluate the whole of the translations or technical terms only?

Page 21: Applicative evaluation of bilingual terminologies

1) How do you assess translation quality?

• Translation studies evaluation grids:
  – SICAL, SAE J2450
  – too complex, scarcely documented

• Machine translation objective metrics:
  – BLEU, METEOR
  – not suited to human translation
  – reproducibility is not an advantage in our case

Page 22: Applicative evaluation of bilingual terminologies

1) How do you assess translation quality?

• Machine translation subjective evaluation
  – translations are evaluated by humans:
    • quality judgement: adequacy, fluency...
    • ranking
  – use an inter-annotator agreement measure to check that agreement between the judges is sufficient
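To illustrate what an agreement measure does, here is a minimal sketch of Cohen's kappa for two judges. The slides report an agreement value K without naming the coefficient, so the choice of Cohen's kappa is an assumption, as are the function name and the toy labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two judges, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both judges labelled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on four translations.
judge_1 = ["correct", "correct", "acceptable", "wrong"]
judge_2 = ["correct", "acceptable", "acceptable", "wrong"]

print(round(cohen_kappa(judge_1, judge_2), 3))  # 0.636
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which gives a scale for reading the K values reported with the results.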


Page 23: Applicative evaluation of bilingual terminologies

2) Evaluate the whole text or just some terms?

• Quality of a text translation = complex interaction of several parameters

• Focus on the elements for which the translators felt they needed a linguistic resource:
  – evaluates only the part of the translation on which the terminology has an impact
  – easier and faster

Page 24: Applicative evaluation of bilingual terminologies

Applicative evaluation protocol

• Compare 3 different "situations of translation"
  – one situation = one type of resource

• Translators translate the texts and note down the terms they had to look up

• The quality of the translations of these terms is assessed by human judges

Page 25: Applicative evaluation of bilingual terminologies

Situations of translation


Page 29: Applicative evaluation of bilingual terminologies

Translations' assessment

1. Quality judgement:
  – correct: standard term or expression
  – acceptable: meaning is retained
  – wrong: no meaning is retained

2. Ranking:
  – from best to worst
  – ties allowed
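One possible way to summarise rankings with ties is to compare two situations term by term. This is only a sketch: the situation keys (`sit0`, `sit1`, `sit2`), the function name `compare_ranks` and the data are hypothetical, and it is not necessarily how the ranking figures in this deck were computed.

```python
from collections import Counter


def compare_ranks(rankings, a, b):
    """Share of terms where situation a is ranked better than, tied with, or worse than b."""
    if not rankings:
        raise ValueError("no rankings given")
    outcomes = Counter()
    for ranks in rankings:
        if ranks[a] < ranks[b]:  # rank 1 is the best translation
            outcomes["better"] += 1
        elif ranks[a] == ranks[b]:
            outcomes["tie"] += 1
        else:
            outcomes["worse"] += 1
    total = len(rankings)
    return {key: outcomes[key] / total for key in ("better", "tie", "worse")}


# Hypothetical ranks given by one judge to the translations of four terms.
rankings = [
    {"sit0": 2, "sit1": 1, "sit2": 1},
    {"sit0": 1, "sit1": 1, "sit2": 1},
    {"sit0": 1, "sit1": 3, "sit2": 2},
    {"sit0": 2, "sit1": 2, "sit2": 1},
]

print(compare_ranks(rankings, "sit1", "sit0"))
# {'better': 0.25, 'tie': 0.5, 'worse': 0.25}
```

Keeping ties as a separate outcome avoids forcing a preference when two situations yield the same translation.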



Page 31: Applicative evaluation of bilingual terminologies

Data

• Comparable corpora:
  – breast cancer: 400k words/language
  – water science: 2M words/language

• Texts to translate:
  – research paper abstracts: ~500 words/domain
  – lay science texts: ~500 words/domain

Page 32: Applicative evaluation of bilingual terminologies

Translators' feedback

"Globally, 75% of technical words aren't in the glossary, and for the other 25%, 99% have between 10 and 20 candidate translations and none has been validated. So most of the time, you are just partly sure, but you are never totally sure of your translation. And in the worst cases, you translate instinctively."

• Translators were not prepared to use a bilingual terminology with many candidate translations

• The terminology only partially covered the vocabulary of the texts to translate

Page 33: Applicative evaluation of bilingual terminologies

Terminology coverage of texts to translate

• Breast cancer
  – 94% of the vocabulary of the texts is in the terminology
  – fine-grained topic

• Water science
  – 14% of the vocabulary of the texts is in the terminology
  – topic is too general
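The slides do not say how these coverage figures were computed. As a rough sketch, coverage could be taken as the share of distinct word forms of a text that appear in the terminology. The assumptions here are single-word entries, no lemmatisation and no stop-word filtering; the function name `coverage` and the data are hypothetical.

```python
import re


def coverage(text, terminology):
    """Share of distinct word forms of the text that are found in the terminology."""
    vocabulary = set(re.findall(r"\w+", text.lower()))
    if not vocabulary:
        return 0.0
    known = {term.lower() for term in terminology}
    return len(vocabulary & known) / len(vocabulary)


# Hypothetical text and terminology entries.
text = "Histological evaluation revealed the presence of DCIS"
terminology = {"histological", "evaluation", "presence", "DCIS"}

print(f"{coverage(text, terminology):.0%}")  # prints 57%
```

A realistic measure would probably also need to handle multi-word terms and ignore function words, which this sketch counts as uncovered vocabulary.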


Page 34: Applicative evaluation of bilingual terminologies

Quality judgement / Breast Cancer

Breast cancer, K = 0.25

             Sit. 0 / Gen. lang.   Sit. 1 / CC   Sit. 2 / Web
Correct      38%                   43%           47%
Acceptable   42%                   38%           35%
Wrong        20%                   19%           18%

• Equivalent proportion of incorrect translations

• The Internet gives the most correct translations, followed by the comparable corpora

Page 35: Applicative evaluation of bilingual terminologies

Quality judgement / Water Science

• Translations are much better with the Internet

• The comparable corpora produce worse translations than the general language resources

Water science, K = 0.42

             Sit. 0 / Gen. lang.   Sit. 1 / CC   Sit. 2 / Web
Correct      59%                   56%           77%
Acceptable   23%                   23%           16%
Wrong        18%                   21%           7%

Page 36: Applicative evaluation of bilingual terminologies

Results seem incoherent

• Translations produced in situation 1 are worse than translations produced in situation 0 (baseline)

• But they share the same "general language resource" basis

Baseline: general language resources
Situation 1: general language resources + terminology mined from comparable corpora

Page 37: Applicative evaluation of bilingual terminologies

Possible explanation

When translators have a specialized resource, they tend to ignore the general language resource

                            Baseline   Situation 1 (comparable corpora)   Situation 2 (Web)
General language resource   43%        14%                                3%
Specialized resource        -          25%                                56%
Intuition                   79%        77%                                44%

Page 38: Applicative evaluation of bilingual terminologies

Possible explanation

If the translators in situation 1 had always looked up the general resource first, the translations of situation 1 would have been at least as good as those of situation 0

Page 39: Applicative evaluation of bilingual terminologies

Ranking / Breast Cancer

Breast cancer, K = 0.69 (bar series in chart order; series legend not available)

           CC vs. Gen. lang.   CC vs. Web
Series 1   28%                 26%
Series 2   47%                 42%
Series 3   26%                 32%

Page 40: Applicative evaluation of bilingual terminologies

Ranking / Water science

Water science, K = 0.63 (bar series in chart order; series legend not available)

           CC vs. Gen. lang.   CC vs. Web
Series 1   18%                 16%
Series 2   49%                 41%
Series 3   33%                 43%

Page 41: Applicative evaluation of bilingual terminologies

Outline

1. Context and scope of work
2. Bilingual terminology mining: comparable vs. parallel corpora
3. Evaluation of bilingual terminologies
4. Applicative evaluation protocol
5. Experimentation and results
6. Future improvements

Page 42: Applicative evaluation of bilingual terminologies

Improvement 1: terminology coverage

• Dependency between:
  – the added value of the bilingual terminology
  – its coverage of the texts to translate

• Any added-value measure should also indicate to what extent the terminology contains the vocabulary of the translated texts

Page 43: Applicative evaluation of bilingual terminologies

Improvement 1: terminology coverage

• Perspectives:
  – create a "coverage" measure
  – find out the minimum coverage a terminology needs to be "useful" for translating a given text
  – gather smaller but finer-grained corpora

Page 44: Applicative evaluation of bilingual terminologies

Improvement 2: situations of translation

• When translators have several resources at their disposal, they tend to ignore the general language resource

• Consequence: the same resource is used differently depending on the situation

• This seems to be the cause of the incoherent results

Page 45: Applicative evaluation of bilingual terminologies

Improvement 2: situations of translation

• Perspective: use 0 or 1 resource per situation of translation

Situation 0: no resource
Situation 1: terminology mined from comparable corpora
Situation 2: Web

Page 46: Applicative evaluation of bilingual terminologies

Improvement 3: train translators

• Prepare translators to use "ambiguous", unvalidated terminologies

• Do a first dry-run evaluation to:
  – train the translators
  – train the judges → results in higher agreement

Page 47: Applicative evaluation of bilingual terminologies

Acknowledgements

This work was funded by:
  – French National Research Agency, grant no. ANR-08-CORD-009
  – Lingua et Machina, www.lingua-et-machina.com

Annotators:
  – Clémence De Baudus
  – Mathieu Delage