Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics. Thesis Project. Elisabet Comelles. Supervisors: Irene Castellon and Victoria Arranz


Post on 14-Jan-2016


TRANSCRIPT

Page 1: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Thesis Project

Elisabet Comelles

Supervisors: Irene Castellon and Victoria Arranz

Page 2: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Outline

• Introduction

• State of the Art

• Discussion of MT Evaluation Metrics

• Hypothesis & Objective

• Methodology & Schedule

Page 3: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Introduction

• Quick access to multilingual information

• Need for quick translation

• Sharp increase in the number of MT systems

• Need for evaluation of those MT Systems

• Evaluation needs to be quick and reliable

Page 4: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Introduction

• The most widely used current evaluation metrics show problems

• New approaches to evaluation using linguistic information:
  – Syntactic info
  – Semantic info

• Our scenario:
  – Comparison between already existing systems
  – Direction of translation to test: English-Spanish

Page 5: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art

• MT is closely linked to MT evaluation

• Purpose of the evaluation methods:
  – Error analysis
  – System comparison

• Chronologically:
  1. Human MT Evaluation
  2. Automatic MT Evaluation

Page 6: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Focused on context:
  – Context-based Evaluation (FEMTI)
    • Evaluates the suitability of the MT technology and the MT system for the user's purpose
    • Parameters of analysis: functionality, reliability, usability, efficiency, maintainability, portability, cost, etc.

• Focused on quantity & quality:
  – Human Evaluation and Automatic Evaluation

Page 7: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Human Evaluation:
  – Several approaches:
    • Fidelity (ALPAC report)
    • Intelligibility (ALPAC report)
    • Comprehensive evaluation of informativeness (ARPA)
    • Quality panel evaluation
    • Adequacy and Fluency (Semantics and Syntax)
    • Preferred Translation
    • Required Post-Editing

Page 8: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Human Evaluation:
  – Advantage: human evaluators can evaluate the overall quality of the system
  – Disadvantages:
    • Time-consuming
    • Expensive
    • Subjective

Page 9: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Automatic Evaluation:
  – Approaches:
    • Based on Lexical Matching
    • Based on Syntax
    • Based on Semantics

Page 10: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Based on Lexical Matching:
  – Dominant approach to Automatic MT Evaluation
  – Looks for lexical similarities between the MT output and reference translations
  – Types:
    • Edit Distance Measures (WER)
    • Precision-oriented Measures (BLEU)
    • Recall-oriented Measures (ROUGE)
    • Measures balancing Precision & Recall (GTM)
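The three families of lexical measures listed above can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: tokenisation is plain whitespace splitting, there is a single reference, and `ngram_precision_recall` computes clipped n-gram matches only. It is not full BLEU (no brevity penalty, no averaging over n-gram orders), nor full ROUGE or GTM. The function names and the example sentences are invented for illustration.

```python
from collections import Counter


def wer(hypothesis, reference):
    """Word error rate: word-level edit distance divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from output
                            curr[j - 1] + 1,      # extra word in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def ngram_counts(tokens, n):
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(hypothesis, reference, n=1):
    """Clipped n-gram matches, normalised by output length (precision,
    the BLEU-style view) and by reference length (recall, the ROUGE-style view)."""
    hyp = ngram_counts(hypothesis.split(), n)
    ref = ngram_counts(reference.split(), n)
    matches = sum((hyp & ref).values())  # '&' keeps the minimum count per n-gram
    precision = matches / max(sum(hyp.values()), 1)
    recall = matches / max(sum(ref.values()), 1)
    return precision, recall


# Illustrative sentences (invented, not from the presentation).
reference = "the cat is on the mat"
hypothesis = "the cat sat on mat"

print(wer(hypothesis, reference))                     # 2 edits / 6 reference words
print(ngram_precision_recall(hypothesis, reference))  # 4 matches: 4/5 and 4/6
```

All three scores depend only on surface word overlap with the reference, which is the property of lexical metrics that the later slides criticise.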

Page 11: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Based on Syntax:
  – Recently developed
  – Focused on the syntax of the output sentence
  – Types:
    • Constituency Parsing
    • Dependency Parsing
    • Combination of both analyses (Liu & Gildea 2005)
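As a rough illustration of what a dependency-based measure can capture, the sketch below scores the overlap of (head, relation, dependent) triples between an MT output and a reference. This is a simplified stand-in, not the metric of Liu & Gildea (2005). The triples are written by hand rather than produced by a parser, and the relation labels (`nsubj`, `obj`, `det`) and the function name are assumptions made for the example.

```python
from collections import Counter


def dependency_overlap(hyp_triples, ref_triples):
    """Precision, recall and F1 over (head, relation, dependent) triples."""
    hyp, ref = Counter(hyp_triples), Counter(ref_triples)
    matched = sum((hyp & ref).values())
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-annotated triples (hypothetical) for:
#   reference: "the cat chased the mouse"
#   MT output: "the mouse chased the cat"
ref_triples = [("chased", "nsubj", "cat"), ("chased", "obj", "mouse"),
               ("cat", "det", "the"), ("mouse", "det", "the")]
hyp_triples = [("chased", "nsubj", "mouse"), ("chased", "obj", "cat"),
               ("mouse", "det", "the"), ("cat", "det", "the")]

print(dependency_overlap(hyp_triples, ref_triples))
```

Both sentences contain exactly the same words, so a unigram-based score would be perfect, while the triple overlap penalises the swapped subject and object.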

Page 12: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

State of the Art: Types of MT Evaluation

• Based on Semantics:
  – Recently developed
  – Focused on the semantics of the output
  – Types:
    • Named Entities (NEs): quality over NEs (NEE)
    • Semantic Roles: similarities over Semantic Roles (SR)

Page 13: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Discussion of MT Evaluation Metrics

• Human Evaluation:
  – Advantages:
    • Allows overall quality to be evaluated
  – Disadvantages:
    • Time-consuming
    • Expensive
    • Subjective

Page 14: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Discussion of MT Evaluation Metrics

• Automatic Evaluation:
  – Advantages:
    • Fast
    • Not expensive
    • Objective
    • Updatable
  – Disadvantages?

Page 15: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Discussion of MT Evaluation Metrics

• Automatic Metrics based on Lexical Matching:
  – Brought great advances in MT research over the last decade
  – Widely accepted & used by the SMT research community
  – BLEU is the most widely used automatic metric
  – Criticized by those not developing SMT systems
  – Usually depend on reference translations
  – Only take lexical similarities into account & disregard syntax
  – Biased

Page 16: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Discussion of MT Evaluation Metrics

• Automatic Metrics based on Syntax:
  – Good improvement
  – Work at sentence level
  – Only focused on syntax
  – What about meaning?

• Automatic Metrics based on Semantics:
  – Good improvement
  – Only NEs & Semantic Roles
  – NEs not too relevant
  – Need further development
  – Only focused on meaning; what about syntax?

Page 17: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Discussion of MT Evaluation Metrics

• Discussion of Automatic Metrics:
  – Each metric focuses on a partial aspect of quality. Consequences:
    • Strongly biased evaluations
    • Unfair comparison between systems
    • Overtuning of the system
  – Need for integration of metrics:
    • Parametric vs. non-parametric
    • Evaluation of the quality of a metric combination:
      – Human likeness
      – Human acceptability
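One possible reading of these two points, sketched in Python: a non-parametric combination that simply averages the scores of the individual metrics with no weights to tune, and a check of human acceptability as the Pearson correlation between the combined scores and human assessments. Both choices are assumptions made for illustration. The presentation does not fix a combination scheme or a correlation measure, and all numbers below are invented toy values, not experimental results.

```python
from math import sqrt


def uniform_combination(metric_scores):
    """Non-parametric combination: unweighted mean of the metric scores.

    metric_scores maps a metric name to a list of scores, one per system,
    all assumed to lie on a comparable scale (e.g. [0, 1])."""
    per_system = zip(*metric_scores.values())
    return [sum(scores) / len(scores) for scores in per_system]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy scores for three hypothetical MT systems (invented for illustration).
metric_scores = {
    "lexical":   [0.2, 0.4, 0.6],
    "syntactic": [0.4, 0.4, 0.8],
}
human_scores = [2.0, 3.0, 4.5]  # e.g. mean adequacy judgements, also invented

combined = uniform_combination(metric_scores)
print(combined)
print(pearson(combined, human_scores))
```

A parametric alternative would replace the unweighted mean with weights fitted on held-out human judgements, at the cost of needing such data and risking overfitting to it.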

Page 18: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Hypothesis & Objective

• Hypothesis: Adding new linguistic information will improve the performance of Automatic Metrics

• Main Objective: Proposing a new Automatic Evaluation Metric based on linguistic information.

Page 19: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Hypothesis & Objective

• Secondary Objectives:
  – Explore linguistic information:
    • Syntactic info: POS, shallow parsing, chunking, full parsing, dependency parsing, constituency parsing, etc.
    • Semantic info: Semantic Roles, semantic features, WordNet, FrameNet, Lexical Semantics, etc.
  – Look for linguistic resources suitable for computational processing
  – Look for publicly available linguistic resources
  – Explore the appropriate way to combine this information

Page 20: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Methodology & Schedule

• 4 stages:
  – Stage 1 (year 1 & 2):
    • Bibliography research and analysis:
      – Detailed exploration and analysis of Automatic Evaluation Metrics
      – Detailed exploration, analysis and selection of the adequate linguistic information
      – Exploration of the feasibility and availability of the linguistic resources needed
  – Stage 2 (year 1 & 2):
    • Selection of the evaluation corpus

Page 21: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Methodology & Schedule

  – Stage 3 (year 3):
    • Experiments on how to combine this linguistic information and the automatic evaluation metrics
    • Evaluation of our metric combination based on either human likeness or human acceptability
  – Stage 4 (year 4):
    • Analysis & discussion of the results obtained
    • Summary of the findings and reflection on the results obtained
    • Proposal of a new evaluation metric