rules based machine translation

Post on 23-Feb-2016

TRANSCRIPT

Page 1: Rules Based Machine Translation

Rules Based Machine Translation

Fred Hollowood, Consultant

RBMT and CL

Page 2: Rules Based Machine Translation

Sample Agenda

1. Introduction

2. Rules Based Machine Translation

3. Post-Editing

4. Quality Measurement

5. Controlled Language

Page 3: Rules Based Machine Translation

Introduction

The Aim

Bring rapid, cost-effective translation to Symantec’s product and service divisions

  Connect Symantec’s CMS to translation technologies
  Metrics on the reduction of translation costs and time to market

The Approach

Structure source content so it accommodates MT

  Use a language checker to monitor source grammar

Promote terminology as a key process and deliverable

  Proactive rather than reactive

Define measures to monitor and drive productivity

  GTM, Meteor, BLEU

Work with post-editors to ensure a win-win

Technology Initiative - The Aim

Page 4: Rules Based Machine Translation

Rules Based Machine Translation

Flowchart of Rule-Based Machine Translation (RBMT)

SL Text -> Analysis -> Transfer -> Synthesis -> TL Text

Analysis uses: SL Lexicon & Grammars

Transfer uses: SL->TL Lexical & Structural Rules

Synthesis uses: TL Lexicon & Grammars
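
The three stages in the flowchart can be sketched in a few lines of Python. This is a toy illustration only: the three-word lexicons, the single adjective-noun reordering rule, and the function names are assumptions made for this example and are not taken from the slides or from any real RBMT engine.

```python
# Toy analysis -> transfer -> synthesis pipeline (English to French).
# All lexicon entries and the single structural rule are hypothetical.

SL_LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN"}          # SL lexicon
TRANSFER_LEXICON = {"the": "la", "red": "rouge", "car": "voiture"}  # SL->TL lexical rules


def analyse(text):
    """Analysis: tag each source word using the SL lexicon."""
    return [(word, SL_LEXICON[word]) for word in text.lower().split()]


def transfer(tagged):
    """Transfer: apply one structural rule (ADJ NOUN -> NOUN ADJ),
    then replace each source word with its target equivalent."""
    reordered = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ"
                and tagged[i + 1][1] == "NOUN"):
            reordered.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            reordered.append(tagged[i])
            i += 1
    return [(TRANSFER_LEXICON[word], tag) for word, tag in reordered]


def synthesise(tagged):
    """Synthesis: produce the TL surface string. A real system would also
    handle agreement and inflection here; this toy version only joins words."""
    return " ".join(word for word, _ in tagged)


def translate(text):
    return synthesise(transfer(analyse(text)))


print(translate("the red car"))
```

A word missing from the toy lexicon raises a `KeyError`, which loosely mirrors why dictionary coverage matters so much for a rule-based system.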

Page 5: Rules Based Machine Translation

MT Process Overview

Controlled Language Authoring

Automated Pre-processing

User Dictionary

Translation System

Normalisation Dictionary

Automated Post-processing

Human Post-Editing

Systran Engine

Remote Human Activity

System Control Phases

Text Processing

Page 6: Rules Based Machine Translation

Post-Editing

Fundamentally the same relationship as with a traditional vendor

Increased daily throughput expected for post-edited content (6-8k vs 2.5k per day)

Style requirements have been critically reviewed in the light of PE

E.g. stylistic inconsistencies are acceptable for post-edited content
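
The throughput figures on this slide imply a gain of roughly 2.4x to 3.2x. A minimal sketch of that arithmetic follows; the function name is illustrative, and the only inputs are the 6-8k and 2.5k daily figures quoted on the slide.

```python
def throughput_gain(post_edit_low, post_edit_high, traditional):
    """Return the (low, high) ratio of post-editing throughput
    to traditional translation throughput."""
    return post_edit_low / traditional, post_edit_high / traditional


low, high = throughput_gain(6000, 8000, 2500)
print(f"Post-editing throughput is {low:.1f}x to {high:.1f}x the traditional rate")
```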

Page 7: Rules Based Machine Translation

Measurement

Page 8: Rules Based Machine Translation

Metrics based on Comprehensibility

Score: Excellent MT output (E) (4)

Criteria: Read the MT output first. Then read the source text (ST). Your understanding of the MT output is not improved by the reading of the ST because the MT output is satisfactory and would not need to be modified. An end-user who does not have access to the ST would be able to understand the MT output.

Score: Good MT output (G) (3)

Criteria: Read the MT output first. Then read the ST. Your understanding of the MT output is not improved by the reading of the ST even though the MT output contains minor grammatical mistakes. An end-user who does not have access to the ST could possibly understand the MT output.

Score: Medium MT output (M) (2)

Criteria: Read the MT output first. Then read the ST. Your understanding of the MT output is improved by the reading of the ST, due to significant errors in the MT output. An end-user who does not have access to the ST could only get the gist of the MT output.

Score: Poor MT output (P) (1)

Criteria: Read the MT output first. Then read the ST. Your understanding only derives from the reading of the ST, as you could not understand the MT output. An end-user who does not have access to the ST would not be able to understand the MT output at all.
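
One way to turn these ratings into a per-language summary is to count each grade and average the numeric values. The sketch below assumes ratings are recorded as the letters E, G, M, and P with the values 4 to 1 shown on the slide; the function name and the sample ratings are made up for illustration.

```python
from collections import Counter

# Letter grades and numeric values as listed on the slide.
SCORES = {"E": 4, "G": 3, "M": 2, "P": 1}


def summarise_ratings(ratings):
    """Count each grade and compute the mean comprehensibility score."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    unknown = set(counts) - set(SCORES)
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    mean = sum(SCORES[grade] * n for grade, n in counts.items()) / len(ratings)
    return {"counts": {grade: counts.get(grade, 0) for grade in SCORES},
            "mean": mean}


# Hypothetical sample: four segments rated by one evaluator.
print(summarise_ratings(["E", "G", "G", "P"]))
```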

Page 9: Rules Based Machine Translation

Quality by Human Inspection

[Bar chart: Hamlet Language Analysis TK 1 - 6. Categories: Poor, Medium, Good, Excellent. Series: Spanish, Italian, German, French. Vertical axis: 0 to 300.]

Page 10: Rules Based Machine Translation

GTM Scoring

From the machine

From the post-editor
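
GTM compares the machine output with the post-edited version of the same segment. The sketch below is a simplified stand-in, not the GTM tool itself: it computes a unigram precision/recall F-measure, whereas the full GTM metric also gives extra weight to longer contiguous matches. The function name and whitespace tokenisation are assumptions made for this example.

```python
from collections import Counter


def unigram_f_measure(machine_output, post_edited):
    """F-measure over word overlap between MT output and its post-edited form.
    Simplified: every matched word counts once, ignoring word order."""
    candidate = machine_output.lower().split()
    reference = post_edited.lower().split()
    if not candidate or not reference:
        return 0.0
    # Multiset intersection: each word matches at most as often as it
    # occurs in both strings.
    matches = sum((Counter(candidate) & Counter(reference)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    return 2 * precision * recall / (precision + recall)


print(unigram_f_measure("the cat sat", "the cat sat down"))
```

A score of 1.0 means the post-editor left the segment unchanged; lower scores indicate more editing.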

Page 11: Rules Based Machine Translation

Quality Metrics by Language

Hamlet GTM Results (percentage per GTM score range; chart axis 0% to 25%)

GTM range   French   Spanish   Italian   German
0.0-0.1      0.64%     0.60%     0.97%    2.97%
0.1-0.2      0.29%     0.28%     0.86%    2.24%
0.2-0.3      1.11%     1.20%     3.33%    5.67%
0.3-0.4      2.62%     2.97%     9.42%   10.85%
0.4-0.5      6.39%     8.87%    17.49%   16.29%
0.5-0.6     13.96%    18.37%    22.31%   18.53%
0.6-0.7     18.92%    24.19%    18.82%   16.14%
0.7-0.8     20.07%    19.54%    12.55%   11.80%
0.8-0.9     14.69%    12.06%     6.88%    6.30%
0.9-0.99     2.93%     2.35%     1.25%    0.99%
1.00        18.37%     9.57%     6.12%    8.21%

Project Scores by Language

French: 73%

Spanish: 68%

Italian: 59%

German: 57%
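
The slide does not say how the project scores were derived. One reading that is consistent with the chart data is a weighted mean of the GTM distribution, using the midpoint of each score range. The sketch below makes that assumption explicit; the midpoints, the function name, and the mapping of each data series to a language (taken from the legend order) are assumptions, while the percentages are the chart's data labels.

```python
# Midpoint of each GTM range: 0.0-0.1, 0.1-0.2, ..., 0.9-0.99, then 1.00.
BIN_MIDPOINTS = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.945, 1.0]

# Percentage per range, read from the chart's data labels.
DISTRIBUTIONS = {
    "French":  [0.64, 0.29, 1.11, 2.62, 6.39, 13.96, 18.92, 20.07, 14.69, 2.93, 18.37],
    "Spanish": [0.60, 0.28, 1.20, 2.97, 8.87, 18.37, 24.19, 19.54, 12.06, 2.35, 9.57],
    "Italian": [0.97, 0.86, 3.33, 9.42, 17.49, 22.31, 18.82, 12.55, 6.88, 1.25, 6.12],
    "German":  [2.97, 2.24, 5.67, 10.85, 16.29, 18.53, 16.14, 11.80, 6.30, 0.99, 8.21],
}


def project_score(shares):
    """Weighted mean GTM score, approximating each range by its midpoint."""
    total = sum(shares)  # close to 100, allowing for rounding in the labels
    return sum(mid * share for mid, share in zip(BIN_MIDPOINTS, shares)) / total


for language, shares in DISTRIBUTIONS.items():
    print(f"{language}: {project_score(shares):.0%}")
```

Under this approximation the rounded results agree with the four project scores on the slide, which suggests, but does not prove, that the scores are mean GTM values.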

Page 12: Rules Based Machine Translation

Example Style rules

Avoid using a colon after a drive letter

Avoid “he”, “she”, “he/she”, and “s/he”

Use numerals for all measurements over 10

Use the serial comma

Do not use more than two adverbs or adjectives in a series

Keep the subject and verb close to each other early in a sentence

Avoid meaningless openers

Avoid progressive tense when describing product use

Do not use future when describing product use

Make positive statements that tell users what to do or what they need to know

Use sentence-style capitalization for bulleted lists

Use a colon at the end of a sentence to introduce a bulleted list

Punctuate imperative sentences in bulleted lists

Use number × number

Use a hyphen in a unit

Repeat the unit of measure
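
Some of these style rules lend themselves to simple pattern checks. The sketch below covers two of them (gendered pronouns and future tense) with regular expressions. It is a rough illustration rather than the language checker used in the project: the patterns and function name are assumptions, and a word-level test for "will" will also flag the noun.

```python
import re

# "he", "she", "he/she", "s/he" as whole words.
GENDERED_PRONOUN = re.compile(r"\b(?:he/she|s/he|she|he)\b", re.IGNORECASE)
# Crude future-tense signal: the auxiliary "will".
FUTURE_TENSE = re.compile(r"\bwill\b", re.IGNORECASE)


def check_style(sentence):
    """Return the names of the style rules the sentence appears to break."""
    issues = []
    if GENDERED_PRONOUN.search(sentence):
        issues.append("gendered pronoun")
    if FUTURE_TENSE.search(sentence):
        issues.append("future tense")
    return issues


print(check_style("He will click Next to continue."))
print(check_style("Click Next to continue."))
```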

Page 13: Rules Based Machine Translation

CL rules based on CDG

Avoid using the passive voice

Do not use more than 25 words in a sentence (original recommendation was 20)

Use relative pronouns

Use complementizers (“that”)

Avoid unnecessary words (such as “basic” or “just”)

Do not use 'this' or 'that' when they are not followed by a noun

Place all non-translatable text on its own line (programming code snippets)
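
Two of the rules above, the 25-word limit and the list of unnecessary words, are easy to check mechanically. The following is a minimal sketch under stated assumptions: the tokenisation is a simple regular expression, the word list contains only the two examples named on the slide, and the function name is hypothetical.

```python
import re

MAX_WORDS = 25                     # limit quoted on the slide
UNNECESSARY = {"basic", "just"}    # the two examples given on the slide


def check_controlled_language(sentence):
    """Flag sentences that are too long or contain unnecessary words."""
    words = re.findall(r"[A-Za-z0-9'-]+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"{len(words)} words (limit {MAX_WORDS})")
    for word in words:
        if word.lower() in UNNECESSARY:
            issues.append(f"unnecessary word: {word}")
    return issues


print(check_controlled_language("Just click the basic button."))
```

Rules such as passive voice or missing relative pronouns need syntactic analysis and are beyond a sketch of this size.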

Page 14: Rules Based Machine Translation

CL rules for MT

Do not use slashes to list lexical items

Do not write the full name of each operating system

Avoid –ing words

Use a noun at the start of a subordinate clause

Repeat the head noun in ambiguous coordinated structures

Use a hyphen to indicate the first part of a compound

Use articles in specific contexts (for disambiguation)

Keep both parts of a two-part verb together

Use "could" with "if"

Avoid parenthetical expressions in the middle of a sentence
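
The slash rule and the -ing rule can be approximated with pattern matching. This sketch is illustrative only: the patterns and function name are assumptions, and the -ing pattern will over-flag ordinary words such as "string" or "setting", so its hits are candidates for review rather than confirmed violations.

```python
import re

# Words joined by slashes, for example "file/folder".
SLASH_LIST = re.compile(r"\b[A-Za-z]+(?:/[A-Za-z]+)+\b")
# Any word ending in -ing.
ING_WORD = re.compile(r"\b[A-Za-z]+ing\b", re.IGNORECASE)


def check_mt_rules(sentence):
    """Return candidate violations of the slash rule and the -ing rule."""
    return {
        "slash_lists": SLASH_LIST.findall(sentence),
        "ing_words": ING_WORD.findall(sentence),
    }


print(check_mt_rules("Select the file/folder before scanning."))
```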

Page 15: Rules Based Machine Translation

Examples of CL Violation

Keep both parts of a two-part verb together

Violation:

EN: This document gives directions to turn email scanning on or off.

DE: Dieses Dokument gibt Richtungen zum Umdrehung E-Mail-Prüfung an oder weg.

FR: Ce document donne des directions à l'analyse du courrier électronique de tour en fonction ou hors fonction.

Compliant:

EN: This document gives directions to turn on or turn off email scanning.

DE: Dieses Dokument gibt Richtungen, E-Mail-Prüfung zu aktivieren oder zu deaktivieren.

FR: Ce document donne des directions pour activer ou désactiver l'analyse du courrier électronique.
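
A source checker could flag the first sentence before it reaches the MT engine. The sketch below looks for a verb whose particle appears a few words later instead of directly after it. The verb and particle lists, the four-word window, and the function name are assumptions for this example; a production checker would rely on proper parsing.

```python
import re

# Hypothetical, deliberately tiny lists of two-part verb components.
VERBS = {"turn", "switch"}
PARTICLES = {"on", "off"}


def split_two_part_verbs(sentence, window=4):
    """Return (verb, particle, words_between) for each verb whose particle
    is separated from it by other words within the window."""
    tokens = re.findall(r"[A-Za-z'-]+", sentence.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token not in VERBS:
            continue
        following = tokens[i + 1:i + 1 + window]
        # Verb at end of sentence, or particle directly after it: compliant.
        if not following or following[0] in PARTICLES:
            continue
        for offset, nxt in enumerate(following):
            if nxt in VERBS:
                break  # a new verb starts; stop scanning
            if nxt in PARTICLES:
                hits.append((token, nxt, " ".join(following[:offset])))
                break
    return hits


print(split_two_part_verbs("This document gives directions to turn email scanning on or off."))
print(split_two_part_verbs("This document gives directions to turn on or turn off email scanning."))
```

The first call reports the split verb together with the words that separate its parts; the second returns an empty list.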

Page 16: Rules Based Machine Translation

Lessons Learned

Strict implementation when there is:

  New content
  Little leverage
  Time

Rules can be context-sensitive

  Different results depending on client application
  May not always flag tag problems

Language-specific rules

  Probably best implemented as:
    Pre-processing step
    Normalization dictionaries

CL + MT is not sufficient

  Terminology work to update dictionaries
  PE when a specific quality standard is required
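
A normalization dictionary applied as a pre-processing step can be pictured as a lookup that rewrites known source variants into the form the MT dictionaries expect. The sketch below is a generic illustration and does not reflect the format or behaviour of any particular engine's normalization dictionaries; the entries and function name are invented.

```python
import re

# Hypothetical entries: variant spelling -> preferred form. Keys are lowercase.
NORMALISATION = {
    "e-mail": "email",
    "log-in": "login",
}


def normalise(text, mapping=NORMALISATION):
    """Replace each known variant with its preferred form before translation."""
    # Longest keys first so that longer variants win over shorter overlaps.
    alternatives = "|".join(
        re.escape(key) for key in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


print(normalise("Turn off E-mail scanning before log-in."))
```

Keeping such rules in a dictionary rather than in the authoring guidelines means writers do not have to learn them, which fits the slide's point that language-specific rules are better handled automatically.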

Page 17: Rules Based Machine Translation

Thank you!

Copyright © 2010 Fred Hollowood Consulting. All rights reserved.

This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.

Fred [email protected]