Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013
DESCRIPTION
Presenter: Rahzeb Choudhury (TAUS). This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.
TRANSCRIPT
![Page 1: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/1.jpg)
Industry-Scale Crowdsourcing of Data & Terminology
Rahzeb Choudhury, TAUS
![Page 2: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/2.jpg)
TAUS Mission
Our mission is to increase the size and significance of the translation industry to help the world communicate better.
Sharing Data & Knowledge
…on an industry level in an open and transparent landscape brings us all to a higher level of competence.
![Page 3: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/3.jpg)
Where We Stand
Together We Know More
We Know Better
![Page 4: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/4.jpg)
Four Focus Areas
This slide may not be used or copied without permission from TAUS
Translation as a Utility
Data
Technology
Interoperability
Metrics
![Page 5: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/5.jpg)
![Page 6: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/6.jpg)
![Page 7: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/7.jpg)
Members
![Page 8: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/8.jpg)
Global Members
![Page 9: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/9.jpg)
Academic, NGO & Government Members
![Page 10: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/10.jpg)
Large Corporate Members
![Page 11: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/11.jpg)
Small Corporate Members
![Page 12: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/12.jpg)
Agency Members
![Page 13: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/13.jpg)
Terminology
![Page 14: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/14.jpg)
43.5%
39.9%
14.8%
1.8%
Importance of Terminology Work
Very important
Quite important
Less important
Not important
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
![Page 15: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/15.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Information Sources
![Page 16: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/16.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Information Sources
![Page 17: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/17.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Information Sources
![Page 18: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/18.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Information Sources
![Page 19: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/19.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Information Sources
![Page 20: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/20.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Main Problems
20.6%
12.2%
11.5%
10.3%
36.0%
9.4%
Lack of resources/Insufficient terminology management
Poor quality/Up-to-dateness
Lack of information
Lack of convincing verification/Misleading information online
Rest
![Page 21: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/21.jpg)
Too many sources.
Takes too much time.
Effort is duplicated.
Results questionable.
![Page 22: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/22.jpg)
…Centralization…
![Page 23: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/23.jpg)
![Page 24: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/24.jpg)
Owned
Shared
Web
![Page 25: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/25.jpg)
Machine Translation
![Page 26: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/26.jpg)
Data and Quality
Amount of Data
MT Quality
More data
Algorithms
In-domain Data
![Page 27: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/27.jpg)
Owned
Shared
Web
![Page 28: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/28.jpg)
Lack of access.
Copyright.
Takes too much time.
Effort is duplicated.
Quality questionable.
![Page 29: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/29.jpg)
…Centralization…
![Page 30: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/30.jpg)
Central Source of In-domain Data
Owned
Shared
Web – to come in 2014
![Page 31: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/31.jpg)
![Page 32: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/32.jpg)
Terminology and Machine Translation
![Page 33: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/33.jpg)
Data and Quality
Amount of Data
MT Quality
More data
Algorithms
In-domain Data
Usage/Feedback Data
…Terminology!
![Page 34: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/34.jpg)
…Centralization…
![Page 35: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/35.jpg)
TAUS Mission
Our mission is to increase the size and significance of the translation industry to help the world communicate better.
Sharing Data & Knowledge
…on an industry level in an open and transparent landscape brings us all to a higher level of competence.
![Page 36: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/36.jpg)
Central Sources of Data and Terminology
Own Data – Private Vault
Shared Data – In-domain data
Web Data – Data Collector
Own Terms – Build Own Collections
Shared Terms – In-domain terms
Web Terms – Term Collector
But what about the crowd?
For language workers, CAT Tools & MT Systems
![Page 37: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/37.jpg)
Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
Main Problems
20.6%
12.2%
11.5%
10.3%
36.0%
9.4%
Lack of resources/Insufficient terminology management
Poor quality/Up-to-dateness
Lack of information
Lack of convincing verification/Misleading information online
Rest
![Page 38: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/38.jpg)
Central Sourcing of Data and Terminology
The crowd must verify!
Web Data – Data Collector
Web Terms – Term Collector
But what about the crowd?
The crowd must source!
![Page 39: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/39.jpg)
Unless the crowd helps to source and verify…
Too many sources.
Takes time.
Effort is duplicated.
Results questionable.
We maintain the status quo.
![Page 40: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/40.jpg)
Register and engage: demo.taas-project.eu
![Page 41: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013](https://reader034.vdocuments.net/reader034/viewer/2022051323/54b9c4ad4a7959d40d8b46cb/html5/thumbnails/41.jpg)
This slide may not be used or copied without permission from TAUS
Thank you.
Contact: [email protected]