New Breakthrough With Machine Translation? The Secret Is in the Source

Upload: acrolinx

Post on 11-Apr-2017

TRANSCRIPT

Page 1: Webinar Machine Translation Acrolinx Welocalize

New Breakthrough With Machine Translation?

The Secret Is in the Source

Page 2: Webinar Machine Translation Acrolinx Welocalize

New Breakthrough With Machine Translation? The Secret Is in the Source
Joint Webinar With Welocalize and Acrolinx

Olga Beregovaya, Vice President Technology Solutions, Welocalize
Elaine O’Curran, Program Manager, Language Tools, Welocalize
Peter Beargie, Senior Sales Engineer, Acrolinx

January 21, 2016

Page 3: Webinar Machine Translation Acrolinx Welocalize

Summary
• Machine Translation and Post-Editing – Process, Benefits, and Issues
• Source Content Challenges
• Benefits of Pre-Editing the Source
• Post-Editing Effort – “Before” and “After”
• Findings and Conclusions

Page 4: Webinar Machine Translation Acrolinx Welocalize

1997 1571186

60055384Global Offices

Languages

Employees

Global Clients

Projects in 2014

Year Founded

13

Welocalize is the 9th largest provider in the world and 4th largest language service provider in the US.

Source: Common Sense Advisory, 2014

956 Million Words Translated in 2014

2.6 Million Words Translated per Day

Welocalize – Who We Are

Page 5: Webinar Machine Translation Acrolinx Welocalize

clients+expertise

Dell, Intuit, PayPal, NetApp, Cisco, Canon, AOL, Western Union, Sherwin Williams, Mettler Toledo, Terex, John Deere, P&G, Cargill, IKEA, Optimizely

consumer, technology, manufacturing, legal, learning, oil + gas, finance, travel + hospitality

hotels.com, Schlumberger, VMware, Harley-Davidson, Danaher, TripAdvisor, Microsoft, McAfee, Siemens, Autodesk, Schneider Electric, Salesforce.com, Videojet, Blackboard, Rolls Royce, F-Secure

Page 6: Webinar Machine Translation Acrolinx Welocalize

MT Engines
• Moses
• MS Hub
• ProMT
• Systran
• Asia Online
• IPTranslator
• Google Translator

MT Readiness
• Source Content audit
• Pre-translation editing
• Engine Training Data Optimization
• Author training on creating global content

MT Post-Editing
• Evaluate MT output quality via human assessment + automated scoring
• Train post-editors for various engines output
• Provide engine training feedback to MT developers + Productivity testing

Page 7: Webinar Machine Translation Acrolinx Welocalize

Post-Editing: Process and Resources

Process:
• Benefits: MT is a productivity tool that helps reduce time-to-market, increases translator productivity, and lowers overall localization program costs
• MT as a Part of Localization Workflow: MT PE and conventional translation often coexist within the same program

Talent:
• Same Talent: Our post-editors are our regular translators who we train
• Account Knowledge: We keep our know-how and our experience
• No Quality Degradation: Post-editing is just another way to reach the same result

We see between 10% and 100+% productivity gain depending on language and content complexity

Page 8: Webinar Machine Translation Acrolinx Welocalize

Content That We Post-Edit
• Technical / IT / Training Exams
• Business / Management Comms / Training
• Corporate Image / Branding / Advertising
• Voice-Over / Subtitles / Video
• Marketing / Transcreation / Copywriting / Blurbs
• Technical Documentation
• User Interface / Website
• User Assistance / Consumer Documentation

…but is the source always MT-friendly?

Page 9: Webinar Machine Translation Acrolinx Welocalize

Typical MT Output Issues
• Issues around capitalization
• Punctuation and spacing
• Inconsistent terminology
• Omissions / additions of text
• Unknown / new words translated literally or left in English
• Word order
• Compound formation
• Word form agreement

CAN ANY OF THIS BE CONTROLLED BY INTRODUCING “QUALITY AT THE SOURCE”?

Page 10: Webinar Machine Translation Acrolinx Welocalize

Case Study Overview

Re-author
• English segment selection from four different IT accounts (1000 words)
• Re-authoring of 52% of segments based on Acrolinx recommendations

MT
• Machine translation of both versions of the segments into German
• Ranking by human evaluator of both MT versions

Post-edit
• Post-editing by account translators of both MT versions
• Automatic scoring of MT and post-edited segments for Post-Edit Distance comparison

Page 11: Webinar Machine Translation Acrolinx Welocalize

Examples of Source Content Challenges for MT

Page 12: Webinar Machine Translation Acrolinx Welocalize

Security Systems

English Source: When installing outdoors, make sure the mounting gasket provided with the camera is used to maintain the bracket’s water resistance.

German MT: Bei einer Installation, stellen Sie sicher, dass der Rahmen des sichergestellt der dichtungsring, die im Lieferumfang der Kamera verwendet wird.

• Avoid prerequisite in step: make sure
• Use relative pronoun: gasket provided -> gasket that is provided
• Avoid possessives: bracket’s

Page 13: Webinar Machine Translation Acrolinx Welocalize

Software / Hardware

English Source: If you have any further problems contact Support (using the Support option on the portal or send an email to your relevant regional alias listed on first page of this guide).

German MT: Wenn Sie über die Probleme Contact Support (mit der Option "Support" auf dem Portal weitere verfügen oder Senden Sie eine e-Mail an Ihren entsprechenden regional Alias auf der ersten Seite dieses Handbuchs aufgeführt).

• Avoid “-ing” word: using
• Sentence too long: 31 words
• Use comma after subordinate phrase: problems contact -> problems, contact

Page 14: Webinar Machine Translation Acrolinx Welocalize

Control Sensing Device

English Source: When enabled, if a duplicate IP address is detected a message appears on the Internet Protocol page and on the image of the Micro850 controller noting the IP address conflict.

German MT: Wenn aktiviert, wenn eine doppelte IP-Adresse, dass eine Meldung wird angezeigt erkannt wird, auf der Seite Internetprotokoll und auf das Bild des Micro850-Controllers der IP-Adressenkonflikt beachten.

• Sentence too long: 30 words
• Disambiguate “-ing” word: noting
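The two most common flags in these examples, sentence length and “-ing” words, can be approximated with a very small check. The following is only a rough sketch and not how Acrolinx works: the 25-word threshold, the regular expression, and the name `check_sentence` are all assumptions made for illustration, and the “-ing” pattern is naive (it would also flag nouns such as “string”).

```python
import re

# Hypothetical threshold; the webinar does not state the limit Acrolinx uses.
MAX_WORDS = 25

# Naive pattern: any alphabetic word ending in "ing".
ING_WORD = re.compile(r"\b[A-Za-z]+ing\b")


def check_sentence(sentence, max_words=MAX_WORDS):
    """Return a list of readability flags for one source sentence."""
    flags = []
    word_count = len(sentence.split())  # whitespace-separated tokens
    if word_count > max_words:
        flags.append(f"Sentence too long: {word_count} words")
    for match in ING_WORD.finditer(sentence):
        flags.append(f'Avoid "-ing" word: {match.group(0)}')
    return flags


# The two source sentences from the Software / Hardware and
# Control Sensing Device slides.
software_hardware = (
    "If you have any further problems contact Support "
    "(using the Support option on the portal or send an email to your "
    "relevant regional alias listed on first page of this guide)."
)
control_sensing = (
    "When enabled, if a duplicate IP address is detected a "
    "message appears on the Internet Protocol page and on the image of the "
    "Micro850 controller noting the IP address conflict."
)

for source in (software_hardware, control_sensing):
    for flag in check_sentence(source):
        print(flag)
```

Counting whitespace-separated tokens reproduces the 31-word and 30-word counts shown on the slides; a real checker would need proper tokenization and part-of-speech information to tell gerunds from ordinary nouns.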

Page 15: Webinar Machine Translation Acrolinx Welocalize

Digital Security

English Source: Enhanced memory scanning works in conjunction with the monitoring component to detect malware variants during live scans and take quarantine actions against threats.

German MT: Verbesserter Memory-Technologie durchsuchen zusammen mit der Verhaltensüberwachung zur Malware-Varianten während einer Echtzeitsuche nicht durchsucht und Quarantäne-Aktionen gegen Bedrohungen ergreifen.

• Avoid redundant word: with
• Remove unnecessary space: component  to -> component to
• Disambiguate “-ing” word: scanning
• Simplify word: in conjunction with

Page 16: Webinar Machine Translation Acrolinx Welocalize

About Acrolinx and How We Can Help

Page 17: Webinar Machine Translation Acrolinx Welocalize

ABOUT US

DEVELOPED: German Research Center for Artificial Intelligence

LAUNCHED: Company Started 2002

GROWTH: 250+ Customers, 75 Employees

Page 18: Webinar Machine Translation Acrolinx Welocalize

ABOUT US

Acrolinx helps the world's greatest brands create amazing content: on-brand, on-target, and at scale.

Page 19: Webinar Machine Translation Acrolinx Welocalize

HOW IT WORKS

Acrolinx is the only software that actually “reads” your content and guides writers to make it better

Page 20: Webinar Machine Translation Acrolinx Welocalize

Writing for Machine Translation

• Use correct spelling and terminology

• Use clear and concise sentences

• Reuse sentences

• Avoid ambiguity

Page 21: Webinar Machine Translation Acrolinx Welocalize

How Acrolinx Helps

• Ensure style guide consistency

• Ensure brand terms are used consistently

• Increase sentence reuse

• Guide writers to correct issues

Page 22: Webinar Machine Translation Acrolinx Welocalize

Security Systems
• Acrolinx checks the content

If you are surface mounting the camera, remove the mounting gasket and tuck the connected cables to one side through the cable entry notch in the mounting bracket.

If you surface mount the camera, first remove the mounting gasket. Then tuck the connected cables to one side through the cable entry notch in the mounting bracket.

Page 23: Webinar Machine Translation Acrolinx Welocalize

Security Systems
• Acrolinx Scorecard

Page 24: Webinar Machine Translation Acrolinx Welocalize

Security Systems
• Use relative pronoun
• Avoid possessives
• Avoid prerequisite in step

Page 25: Webinar Machine Translation Acrolinx Welocalize

Software / Hardware
• Use comma after introductory phrase
• Sentence too long
• Avoid “-ing” word

Page 26: Webinar Machine Translation Acrolinx Welocalize

Control Sensing Device
• Sentence too long
• Avoid “-ing” word

Page 27: Webinar Machine Translation Acrolinx Welocalize

Digital Security
• Remove unnecessary space
• Avoid “-ing” word
• Avoid redundant word
• Simplify word

Page 28: Webinar Machine Translation Acrolinx Welocalize

Examples of Improved MT Quality and Post-Editing Effort

Page 29: Webinar Machine Translation Acrolinx Welocalize

Post-Edit Distance

• Lower is better: fewer edits!
• Values are derived by comparing the post-edited segments with the corresponding machine translation segments
• Measures the number of insertions, deletions, and substitutions required to transform MT output to the required quality level
• Our analysis applies the Levenshtein algorithm and is character-based
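A character-based Levenshtein comparison of this kind can be sketched in a few lines. This is a minimal illustration, not the Welocalize scoring tool: the function names are hypothetical, and the normalization (edits divided by the length of the longer segment, expressed as a percentage) is an assumption, since the slides do not say how the percentage is derived.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def pe_distance(mt, post_edited):
    """Character edit distance as a percentage of the longer segment.
    The choice of denominator is an assumption for this sketch."""
    longest = max(len(mt), len(post_edited))
    if longest == 0:
        return 0.0
    return 100.0 * levenshtein(mt, post_edited) / longest


# Usage: compare a raw MT segment with its post-edited version.
raw_mt = "Senden Sie eine e-Mail"
edited = "senden Sie eine E-Mail"
print(f"PE distance: {pe_distance(raw_mt, edited):.1f}%")
```

A score of 0% means the post-editor accepted the MT output unchanged; higher values mean more of the segment had to be retyped.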

Page 30: Webinar Machine Translation Acrolinx Welocalize

Security Systems

Before Acrolinx
Source: When installing outdoors, make sure the mounting gasket provided with the camera is used to maintain the bracket’s water resistance.
Acrolinx comments:
• Avoid prerequisites: Make sure
• Use relative pronoun: gasket provided -> gasket that is provided
• Avoid possessives: bracket’s
MT: Bei einer Installation, stellen Sie sicher, dass der Rahmen des sichergestellt der dichtungsring, die im Lieferumfang der Kamera verwendet wird.
Post-Edit: Stellen Sie bei einer Installation im Außenbereich sicher, den der Kamera beiliegenden Dichtungsring zu verwenden, um die Wasserbeständigkeit des Rahmens zu gewährleisten.
PE Distance: 57%

After Acrolinx
Source: When installing outdoors, use the mounting gasket that is provided with the camera to maintain the water resistance of the bracket.
MT: Bei einer Installation, verwenden Sie den dichtungsring, der zur Verfügung gestellt wird mit der Kamera die der Halterung sichergestellt.
Post-Edit: Bei einer Installation im Außenbereich verwenden Sie den Dichtungsring, der mit der Kamera zur Verfügung gestellt wird, um die Wasserbeständigkeit der Halterung zu gewährleisten.
PE Distance: 35%

Rank: New MT better
Improved: 22%

Page 31: Webinar Machine Translation Acrolinx Welocalize

Software / Hardware

Before Acrolinx
Source: If you have any further problems contact Support (using the Support option on the portal or send an email to your relevant regional alias listed on first page of this guide).
Acrolinx comments:
• Sentence too long: 31 words
• Use comma after subordinate phrase: problems contact -> problems, contact
• Avoid “-ing” word: using
MT: Wenn Sie über die Probleme Contact Support (mit der Option "Support" auf dem Portal weitere verfügen oder Senden Sie eine e-Mail an Ihren entsprechenden regional Alias auf der ersten Seite dieses Handbuchs aufgeführt).
Post-Edit: Wenn Sie weitere Probleme haben, wenden Sie sich über die entsprechende Option im Portal an den Support, oder senden Sie eine E-Mail an Ihren zuständigen regionalen Alias, den Sie auf der ersten Seite dieses Handbuchs finden.
PE Distance: 40%

After Acrolinx
Source: If you have any further problems, contact Support. Use the Support option on the portal or send an email to your relevant regional alias listed on the first page of this guide).
MT: Falls Sie weitere Probleme haben, wenden Sie sich an den Support. Verwenden Sie die Option "Support" auf dem Portal oder senden Sie eine e-Mail an Ihren entsprechenden regional Alias, auf der ersten Seite dieses Handbuchs aufgeführt).
Post-Edit: Falls Sie weitere Probleme haben, wenden Sie sich an den Support. Verwenden Sie die Option „Support“ im Portal, oder senden Sie eine E-Mail an Ihren entsprechenden regionalen Alias, den Sie auf der ersten Seite dieses Handbuchs finden.
PE Distance: 15%

Rank: New MT better
Improved: 25%

Page 32: Webinar Machine Translation Acrolinx Welocalize

Control Sensing Device

Before Acrolinx
Source: When enabled, if a duplicate IP address is detected a message appears on the Internet Protocol page and on the image of the Micro850 controller noting the IP address conflict.
Acrolinx comments:
• Sentence too long: 30 words
• Avoid “-ing” word: noting
MT: Wenn aktiviert, wenn eine doppelte IP-Adresse, dass eine Meldung wird angezeigt erkannt wird, auf der Seite Internetprotokoll und auf das Bild des Micro850-Controllers der IP-Adressenkonflikt beachten.
Post-Edit: Bei einer Aktivierung wird auf der Seite „Internetprotokoll“ und auf dem Bild des Micro850-Controllers eine Meldung angezeigt, die über einen IP-Adressenkonflikt benachrichtigt, sobald eine doppelte IP-Adresse erkannt worden ist.
PE Distance: 72%

After Acrolinx
Source: When enabled: If a duplicate IP address is detected, a message appears on the Internet Protocol page and on the image of the Micro850 controller. The message notes the IP address conflict.
MT: Wenn aktiviert: Wenn eine doppelte IP-Adresse erkannt wird, erscheint die Meldung auf der Seite Internetprotokoll und auf das Bild des Micro850-Controllers. Die Nachricht Versionshinweise für die IP-Adressenkonflikt.
Post-Edit: Bei Aktivierung: Wenn eine doppelte IP-Adresse erkannt worden ist, wird auf der Seite „Internetprotokoll“ und auf dem Bild des Micro850-Controllers eine Meldung angezeigt. Die Meldung weist auf einen IP-Adressenkonflikt hin.
PE Distance: 32%

Rank: New MT better
Improved: 40%

Page 33: Webinar Machine Translation Acrolinx Welocalize

Digital Security

Before Acrolinx
Source: Enhanced memory scanning works in conjunction with the monitoring component to detect malware variants during live scans and take quarantine actions against threats.
Acrolinx comments:
• Avoid “-ing” word: scanning
• Simplify word: in conjunction with
• Avoid redundant word: with
• Remove unnecessary space: component  to -> component to
MT: Verbesserter Memory-Technologie durchsuchen zusammen mit der Verhaltensüberwachung zur Malware-Varianten während einer Echtzeitsuche nicht durchsucht und Quarantäne-Aktionen gegen Bedrohungen ergreifen.
Post-Edit: Die optimierte Speicherscanfunktion lässt sich in Kombination mit der Überwachungskomponente zur Erkennung von Malware-Varianten bei Live-Scans und zur Einleitung von Quarantänemaßnahmen bei Bedrohungen verwenden.
PE Distance: 61%

After Acrolinx
Source: Enhanced memory scan works with the monitoring component to detect malware variants during live scans and take quarantine actions against threats.
MT: Verbesserter Memory Scan arbeitet mit der Überwachung Komponente, Erkennen von Malware-Varianten bei der Live-Suche, und führen Quarantäneaktionen gegen Bedrohungen.
Post-Edit: Die optimierte Speicherscanfunktion lässt sich in Kombination mit der Überwachungskomponente zur Erkennung von Malware-Varianten bei Live-Scans und zur Einleitung von Quarantänemaßnahmen bei Bedrohungen verwenden.
PE Distance: 47%

Rank: Equal
Improved: 14%

Page 34: Webinar Machine Translation Acrolinx Welocalize

Analysis and Results

Page 35: Webinar Machine Translation Acrolinx Welocalize

Post-Edit Distance Results
• PE Distance is 9.2 points lower for segments that were edited by Acrolinx
• TER values are 7 points lower for segments that were edited by Acrolinx
• TER is a word-based edit metric that captures insertion, deletion, and substitution of single words as well as shifts of word sequences

Chart: PE Distance (%) and TER (Translation Edit Rate), scale 0 to 100
• Before editing with Acrolinx suggestions: PE Distance 52, TER 54
• After editing with Acrolinx suggestions: PE Distance 43, TER 47
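To make the word-based side of the comparison concrete, here is a simplified sketch of a word-level edit rate. It is deliberately not full TER: TER also counts a shift of a whole word sequence as a single edit, and that operation is omitted here, so this sketch can overestimate the true TER score. The function name is hypothetical; the normalization by the number of words in the post-edited reference follows the usual TER definition.

```python
def word_edit_rate(mt, post_edited):
    """Word-level insertions, deletions, and substitutions needed to turn
    the MT output into the post-edited text, as a percentage of the
    number of words in the post-edited text. Shifts are NOT modelled."""
    hypothesis = mt.split()
    reference = post_edited.split()
    if not reference:
        raise ValueError("post-edited segment must contain at least one word")

    # Standard dynamic-programming edit distance over word lists.
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete hyp_word
                current[j - 1] + 1,      # insert ref_word
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return 100.0 * previous[-1] / len(reference)


# Usage: one substituted word out of four reference words.
print(word_edit_rate("a x c d", "a b c d"))
```

Because a moved phrase costs several word edits here but only one shift in TER, the two metrics diverge most on languages such as German, where post-editors often reorder clauses.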

Page 36: Webinar Machine Translation Acrolinx Welocalize

Overall Results

• 52% of source segments received re-authoring suggestions by Acrolinx

• 68% of the re-authored segments produced better MT quality according to human ranking

• PE Distance improved 9.2 points for re-authored segments

• Most frequent re-authoring suggestions were:
1. Avoid “-ing” word
2. Sentence too long

Page 37: Webinar Machine Translation Acrolinx Welocalize

Why This Matters

• 9% Improvement in PE Distance translates to:
– 7-8% Productivity gain for translators
– 5% Post-Editing discount improvement
– 5% Time-to-Market improvement

Page 38: Webinar Machine Translation Acrolinx Welocalize

Do You Have Any Questions?

Page 39: Webinar Machine Translation Acrolinx Welocalize

Thank You