Can big data change the translation industry?

Post on 17-Feb-2017

Category: Business


Can Big Data from the Cloud Revolutionize Translation Metrics?

Quick intro
• 2010: Memsource founded
• 2015: 50,000 users & 100+ million words translated monthly
• Some of the world’s largest translation providers and buyers are customers

SEGA FUJIFILM

Cloud tools lead to Big Data
• Server tools – private data silos
• Cloud tools – centralized data

And the clouds are getting bigger…

In May alone, users processed 0.8 billion words in Memsource

…Which opens opportunities for benchmarking and trendwatching

Impact

• Find market pain points

• Usage stats

• Universal performance metrics

• Eliminate free tests

• ROI tracking

• Identify synergies

• Higher margins

• Real-time benchmarking

• Notifications that help manage operations

Translation companies Buyers

Technology providers

Freelancers and Project managers

Example problem - quality
• Free testing
• Since the end of LISA everyone has a unique quality metric
• Can we embed a certain standard into the tool itself?

Freelancer profile on Upwork.com

Our analytics building blocks

SQL technology

Visualization: 400 filters

Legacy solution

Visualization: about 20 filters

Kibana console look

So what can we track there?
• In theory, anything:
  • Translation data
  • Productivity
  • Business analytics
  • Notifications
• In practice (challenges):
  • Data clean-up
  • Relevance
  • Interpretation

Translation memory used for 85% of jobs

Users save 10 to 40% with TM

[Chart: Overall TM leverage by top 50 volume users. x-axis: users (1 to 50); y-axis: 0% to 100%; series: repetitions, tm.match101, tm.match100, tm.match95, tm.match85, tm.match75, tm.match50, tm.match0]

Data for jobs where post-editing analysis has been performed, December 2015 - May 2016

Sample: 9 bn words

Savings approx. $300 million
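To make the idea of TM savings concrete, here is a minimal sketch of how a saving can be derived from a match-band analysis. The band names follow the chart legend, but the word counts and the pay factors are hypothetical values chosen for illustration. They are not Memsource data, and this is not necessarily how the $300 million estimate was produced.

```python
# Hypothetical analysis of one job: words per TM match band.
# Band names follow the chart legend; the counts are invented.
WORDS_PER_BAND = {
    "repetitions": 500,
    "tm.match101": 500,
    "tm.match100": 1_000,
    "tm.match95": 1_000,
    "tm.match85": 1_000,
    "tm.match75": 1_000,
    "tm.match50": 1_000,
    "tm.match0": 4_000,
}

# Hypothetical share of the full per-word rate paid for each band.
PAY_FACTOR = {
    "repetitions": 0.1,
    "tm.match101": 0.0,
    "tm.match100": 0.1,
    "tm.match95": 0.3,
    "tm.match85": 0.5,
    "tm.match75": 0.7,
    "tm.match50": 1.0,
    "tm.match0": 1.0,
}


def weighted_words(words, factors):
    """Word count after each band is scaled by its pay factor."""
    return sum(count * factors[band] for band, count in words.items())


def saving_ratio(words, factors):
    """Fraction of the full-rate cost saved thanks to TM matches."""
    total = sum(words.values())
    if total == 0:
        return 0.0
    return 1 - weighted_words(words, factors) / total


if __name__ == "__main__":
    ratio = saving_ratio(WORDS_PER_BAND, PAY_FACTOR)
    print(f"Saving: {ratio:.1%}")
```

With these invented inputs the saving lands inside the 10 to 40% range quoted above; real results depend entirely on the actual band distribution and the discount scheme agreed between buyer and vendor.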

MT is currently used on 31% of projects

Top MT Engines

ENGINE                      %
Microsoft with Feedback     15.8%
Microsoft Translator Hub    9.9%
Google Translate            2.6%
Microsoft Translator        2.5%
SDL BeGlobal                0.4%
Other                       0.6%
MT not used                 68.2%

Up to 80% of content is pasted from MT, then edited

Sample size 20 million words, December 2015 - May 2016

[Chart: Edit distance for major language pairs. x-axis: sample language pairs (en:es, pt:en, en:pt, es:en, en:ru, ru:en, en:de, pt:es, en:fr, es:pt); y-axis: % of words in segments from MT (0% to 100%); series: mt.match100, mt.match95, mt.match85, mt.match75, mt.match50, mt.match0; annotations: MT not used, Raw MT, Moderate edits, Heavily edited]
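As an illustration of how a segment could end up in one of the mt.match bands, here is a minimal sketch that compares raw MT output with the final translation using character-level Levenshtein distance. The scoring formula and the reading of the band names as lower thresholds are assumptions made for this example; the deck does not say how Memsource computes the score.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Assumed lower thresholds for each band, highest first.
BANDS = [
    (100, "mt.match100"),
    (95, "mt.match95"),
    (85, "mt.match85"),
    (75, "mt.match75"),
    (50, "mt.match50"),
]


def similarity(mt_output: str, final: str) -> float:
    """Score from 0 to 100; 100 means the MT output was left untouched."""
    longest = max(len(mt_output), len(final))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(mt_output, final) / longest)


def band(mt_output: str, final: str) -> str:
    """Map a segment to the first band whose threshold it reaches."""
    score = similarity(mt_output, final)
    for threshold, name in BANDS:
        if score >= threshold:
            return name
    return "mt.match0"


if __name__ == "__main__":
    print(band("The cat sat on the mat.", "The cat sat on the mat."))
    print(band("The cat sat on mat.", "The cat sat on the mat."))
```

Under this reading, a segment in mt.match100 corresponds to raw MT, the middle bands to moderate edits, and the low bands to heavy editing or retranslation.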

Many linguists translate more than 10 pages a day consistently

[Chart: Top 1000 Linguist role Productivity, Pages in April 2016. x-axis: users; y-axis: pages completed in April (0 to 1600); reference lines: norm of 8 pages a day x 20 days, 10 pages a day, 20 pages a day; annotation: "Probably not human translation"]
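The reference lines on this chart are daily rates, while the chart itself counts pages per month. A small sketch of the conversion, assuming the 20 working days given in the norm; the function names and the wording of the returned labels are made up for this example.

```python
WORKING_DAYS = 20  # from the norm on the slide: 8 pages a day x 20 days

# Daily reference rates named on the chart, highest first.
REFERENCE_RATES = [
    (20, "above 20 pages a day"),
    (10, "above 10 pages a day"),
    (8, "above the norm of 8 pages a day"),
]


def pages_per_day(monthly_pages: float, working_days: int = WORKING_DAYS) -> float:
    """Convert a monthly page total into an average daily rate."""
    return monthly_pages / working_days


def monthly_threshold(daily_rate: float, working_days: int = WORKING_DAYS) -> float:
    """Monthly page total that corresponds to a daily reference rate."""
    return daily_rate * working_days


def classify(monthly_pages: float) -> str:
    """Place a linguist's monthly total relative to the reference lines."""
    rate = pages_per_day(monthly_pages)
    for daily_rate, label in REFERENCE_RATES:
        if rate > daily_rate:
            return label
    return "at or below the norm"


if __name__ == "__main__":
    for daily_rate, _ in REFERENCE_RATES:
        print(daily_rate, "pages a day =", monthly_threshold(daily_rate), "pages a month")
```

On these assumptions the norm corresponds to 160 pages a month, and the 10 and 20 pages a day lines to 200 and 400 pages a month.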

Project manager productivity

Jobs created by PMs and completed by linguists in the last 30 days – test organization

PM           Jobs
Renato       408
Joana        325
Kris         313
John         263
Bill         159
Robert       143
Alex         122
Sandor       74
Dave         68
Millingan    63
Mihiko       31
Olga         10
Barbora      5

Benchmarking possibilities

PM Productivity, Completed Jobs Per Month (December – May 2016)

Number of jobs completed    Number of users
1 or less                   674
from 1 to 10                440
from 11 to 100              428
from 101 to 200             94
from 200 to 300             37
from 300 to 400             13
from 400 to 500             9
from 500 to 600             5
from 501 to 1000            12
from 1001 to 2000           10
more than 2000              7

Chart annotation: Top 10%
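One way to locate the "Top 10%" marker on this histogram is to count users from the busiest bucket downwards until 10% of the population is covered. The sketch below assumes the bar labels map to the buckets in the order they appear on the slide; the bucket labels are copied as shown, including their overlaps, and the function name is made up for this example.

```python
# (bucket label, number of users), read from the bar labels in order.
# The label-to-bucket mapping is an assumption based on that order.
HISTOGRAM = [
    ("1 or less", 674),
    ("from 1 to 10", 440),
    ("from 11 to 100", 428),
    ("from 101 to 200", 94),
    ("from 200 to 300", 37),
    ("from 300 to 400", 13),
    ("from 400 to 500", 9),
    ("from 500 to 600", 5),
    ("from 501 to 1000", 12),
    ("from 1001 to 2000", 10),
    ("more than 2000", 7),
]


def top_share_bucket(histogram, share):
    """Return the bucket in which the top `share` of users begins."""
    total = sum(count for _, count in histogram)
    target = total * share
    running = 0
    # Walk from the busiest bucket down until the share is covered.
    for label, count in reversed(histogram):
        running += count
        if running >= target:
            return label
    return histogram[0][0]


if __name__ == "__main__":
    print(top_share_bucket(HISTOGRAM, 0.10))
```

With these figures the boundary of the top 10% falls in the "from 101 to 200" bucket, so a PM completing a few hundred jobs a month would sit inside the top decile.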

Project manager productivity


Top 10% of Global PM User Population

“In fact, Big Data applications are bound only by the human imagination”.

Peter Pham

What you can do now
• What to track?
• How can organizations benefit from each other’s data?
• Which data should not be shared?

Thank you!

konstantin@memsource.com
