MOSES CORE
Deliverable D 4.7 Report on third year’s industry outreach events
Work Package: WP 4: Industry Outreach
Date (mm/yy): January 2015
Dissemination level: Public
Author: Yulia Korobova
Table of Contents
EXECUTIVE SUMMARY
1. OUTREACH EVENTS IN EUROPE
1.1. MT SHOWCASE, DUBLIN, IRELAND
1.1.1. RESULTS
1.2. MT DISCUSSION AT THE VVIN CONFERENCE, THE HAGUE (THE NETHERLANDS)
2. OUTREACH EVENTS IN NORTH AMERICA
2.1. MT SHOWCASE IN VANCOUVER, CANADA
2.1.1. RESULTS
2.2. MOSES INDUSTRY ROUNDTABLE IN VANCOUVER, CANADA
2.2.1. RESULTS
3. GENERAL FINDINGS
4. CONCLUSIONS
APPENDIX 1: TAUS MT SHOWCASE DUBLIN 2014 DISCUSSIONS
APPENDIX 2: TRANSCRIPT OF THE RECORDING OF THE TAUS MT SHOWCASE PANEL DISCUSSION
APPENDIX 3: NOTES FROM MOSES INDUSTRY ROUNDTABLE BREAKOUT DISCUSSIONS
APPENDIX 4: TAUS MT SHOWCASE VANCOUVER 2014 DISCUSSIONS
Executive Summary
This report on industry outreach events outlines the third-year MosesCore results in accordance with the Communication plan (D4.1) delivered in May 2012.

Based on the number of participants in previous years (low attendance in Asia), we decided to focus the 2014 Moses outreach events on Europe and North America. Together with our partner Localization World, we organized two MT Showcases (in Dublin and in Vancouver) with the aim of fostering the use of machine translation. In addition, AMTA hosted a TAUS Industry Roundtable in Vancouver, and TAUS gave a presentation at the VViN Lustrum in The Hague. Cooperation with these organizations was beneficial for the exposure of Moses.

At all events we used Moses collateral. During the autumn events in Canada we distributed leaflets outlining the MosesCore project and the results reached over the last three years. Prior to the events, promotion campaigns were run to reach and attract participants: social media posts, one-on-one emails and conversations, and large e-bulletins.

For communication synergy, in 2014 we continued to use the three key messages of the MosesCore project:
1. Moses is a state-of-the-art machine translation toolkit. It is best suited to making specialized MT engines for specific clients and industry domains.
2. Using Moses helps ensure flexibility and choice for users and fosters a healthy competitive landscape.
3. Using Moses helps to improve translation processes and capacity and create new business opportunities.

Participants
TAUS is the leader of work package 4 “Industry Outreach” and is supported by UEDIN and ALS (now Capita Translation and Interpreting).
1. Outreach events in Europe
1.1. MT Showcase, Dublin, Ireland
Hosted at: Localization World
Date: 4 June 2014
Audience: Translation buyers, translation companies, translation technology providers
Aim: To raise awareness and demystify MT, to help set expectations and share knowledge/best practices of using MT and Moses
Number of participants: 61
Use cases: European Commission, Iconic Translation Machines, KantanMT, Sovee, Tilde
Web presence: 1,215 views of the presentations (18/01/2014-14/01/2015).

An overview of the presentations:

MT@EC for European public administrations and online services, Spyridon Pilos (European Commission)1
This is a showcase of the new MT system built by the Directorate General for Translation (DGT) using Moses.

Beyond Data: Delivering Machine Translation with Subject Matter Expertise, John Tinsley (Iconic Translation Machines)2
A success story of commercial machine translation systems.

Enabling MT for the everyone!, Tony O'Dowd (KantanMT)3
A cloud-based implementation of Moses.

Sovee Smart Engine 2.0: a Leap beyond Base Moses Technology, Scott Gaskill (Sovee)4
A demonstration of the automated language tuning and training capabilities of the Sovee Smart Engine 2.0.

MT applications in the EU public sector, Andrejs Vasiljevs (Tilde)5
A case study about the benefits of MT in the public sector.
1 http://www.slideshare.net/TAUS/taus-mt-showcase-mtec-for-european-public-administrations-and-online-services-spyridon-pilos-european-commission
2 http://www.slideshare.net/TAUS/taus-mt-showcase-beyond-data-john-tinsley-iconic-translation-machines
3 http://www.slideshare.net/TAUS/enabling-mt-for-the-everyone-tony-odowd-kantanmt
4 http://www.slideshare.net/TAUS/taus-mt-showcase-sovee-smart-engine-20-a-leap-beyond-base-moses-technology-scott-gaskill-sovee
5 http://www.slideshare.net/TAUS/taus-mt-showcace-mt-applications-in-the-eu-public-sector-adrejs-vasiljevs-tilde
All the presentations are publicly available on the MosesCore project website6, on the Moses Resources page on taus.net7 and on Slideshare.
1.1.1. Results
The MT Showcase in Dublin started some interesting discussions. Appendix 1 gives an overview of the questions based on the presentations and the podium discussion. All participants were asked to fill in a survey that was handed out at the event. The survey aimed to collect information about the current adoption of Moses and the influence of the MosesCore project on the MT community.

In addition to the survey, and following the review recommendations, we recorded the panel discussion. Appendix 2 contains a transcript of this discussion. It covered a number of interesting points about the current use of Moses by industry leaders, as well as the sustainability of Moses beyond EC funding.

“Moses has a place in any type of MT world, so even if you are a company like Systran who provides essentially rule-based machine translation systems you can use, something like Moses could be used at various stages in that process to enhance MT.”
John Tinsley (Iconic Translation Machines)

“I have a very considerate view on Moses. If you equate Moses to the internal combustion engine: every car manufactured in the world uses the internal combustion engine and that is what Moses is to MT providers, it’s like an internal combustion engine but just like every car is different from every car manufacturer we are going to get lots of different flavours, we are going to get huge leaps in innovation, we are going to get lots of new reordering models, going to get analytic models and we are going to extend the power over and over again.”
Tony O’Dowd (KantanMT)
1.2. MT Discussion at the VViN Conference, The Hague (the Netherlands)
Hosted by: VViN
Date: 19 September 2014
Aim: To raise awareness of MT, share knowledge/best practices, explain what Moses is
Audience: Translation companies
Moderator: TAUS
Number of participants: 20
6 http://www.statmt.org/mosescore/index.php?n=Main.Videos
7 https://translate.taus.net/translate/mosescore/mosescore-resources#use-cases
Participants of the previous MT Showcases had pointed out a shortage of similar events in Europe. To address this, the TAUS team took part in the VViN Conference in September. Together with 20 highly motivated participants we held a Q&A about open-source MT, Moses and the adoption of MT in general.
2. Outreach events in North America
2.1. MT Showcase in Vancouver, Canada
Hosted at: Localization World
Date: 29 October 2014
Aim: To share knowledge/best practices, explain what Moses is
Audience: Translation companies, buyers
Presenters: eBay, Precision Translation Tools, Unbabel, Translated, TAUS
Number of participants: 41
Web presence: 889 views of the presentations (04/10/2014-14/01/2015).

An overview of the presentations:

TAUS Introduction and MT market overview, Jaap van der Meer & Achim Ruopp, TAUS8
This presentation outlined the results of the 2014 MT Market report, including the use of Moses, and a number of predictions about the further development of the MT market.

Machine Translation at eBay, Saša Hassan, eBay9
In this talk, eBay presented recent launches of Machine Translation, based on Moses, on the eBay site for various locales, e.g. Russian and Latin American markets, which enables buyers to shop in their native languages and fosters overall cross-border trade.

The Simplified Guide to Getting Started in SMT, Tom Hoar, Precision Translation Tools10
This session reviews the fundamentals of selecting an SMT solution with examples that reference use cases with PTTools' DoMT Desktop, a commercial application with a Moses kernel.

Seamless Globalization with distributed crowd post editing, Vasco Pedro, Unbabel11
8 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-taus-introduction-and-mt-market-overview-taus-2014
9 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-machine-translation-at-ebay-2014
10 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-the-simplified-guide-to-getting-started-in-smt-precision-translation-tools-2014?utm_source=slideshow&utm_medium=ssemail&utm_campaign=post_upload
11 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-seamless-globalization-with-distributed-crowd-post-editing-unbabel-2014?utm_source=slideshow&utm_medium=ssemail&utm_campaign=post_upload
In this talk Unbabel presented its method and technology for using Moses-based MT in combination with community post-editing, and showcased key integrations and early results.

Introduction to Matecat, the open-source CAT tool for post-editing, Marco Trombetti, Translated12
Marco Trombetti discussed the strategy beyond CAT tools, the use cases for LSPs and buyers, as well as tutorials on advanced Moses integration including real-time online learning.
2.1.1. Results
As at the other MT Showcases, in Vancouver we showcased a variety of ways in which MT and Moses-based solutions can be implemented in different environments, from cross-border commerce to crowd-sourced post-editing. This latest showcase presented the breadth of solutions that Moses enables, pointing to the versatility and value that this open source solution offers as an enabling technology.
2.2. Moses Industry Roundtable in Vancouver, Canada
Hosted at: AMTA
Date: 26 October 2014
Aim: Discussing the future of Moses: Moses beyond the MosesCore project
Audience: Translation companies, government, academia
Number of participants: 37
Web presence: 589 views of the presentations (04/10/2014-14/01/2015).

An overview of the presentations:

TAUS Moses Industry Roundtable 2014, MT Market, Jaap van der Meer, Achim Ruopp, TAUS13
In this presentation TAUS analyzed market MT trends, opportunities and challenges as well as market drivers and inhibitors.

TAUS Moses Industry Roundtable 2014, Moses-Past, Present, Future, Hieu Hoang, Ulrich Germann, University of Edinburgh14
This presentation is about EC projects dedicated to MT.

TAUS Moses Industry Roundtable 2014, Changes in Moses, Hieu Hoang, University of Edinburgh15
12 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-mate-cat-translated-2014
13 http://www.slideshare.net/TAUS/taus-machine-translation-showcase-taus-introduction-and-mt-market-overview-taus-2014
14 http://www.slideshare.net/TAUS/taus-moses-industry-roundtable-2014-moses-past-present-future-hieu-hoang-ulrich-germann-university-of-edinburgh
15 http://www.slideshare.net/TAUS/taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh
Hieu Hoang presented results reached within the MosesCore project in the last three years.

TAUS Moses Industry Roundtable 2014, Moses-Past, Present, Future, Hieu Hoang, Ulrich Germann, University of Edinburgh16
This presentation provided an overview of the EU projects that funded Moses in the past and of those on the horizon that will continue to fund some of the development.

TAUS Moses Industry Roundtable 2014, Introducing Strategic Questions17
Q&A and results of the breakout discussions.

At the first Moses Industry Roundtable last year, TAUS brought together the Moses developer community and Moses users from industry and governments to discuss common challenges and opportunities for cooperation to tackle common issues. By co-locating the roundtable at AMTA 2014 with the subsequent TAUS and Localization World conferences this year, TAUS gave the broadest possible audience the opportunity to participate and continue the conversation.

As discussion facilitator, TAUS captured notes from an organizational and a technical breakout (Appendix 3) and made an audio recording of the stakeholder discussion that followed the breakouts. These valuable resources help to identify a stepping stone towards continued maintenance, support and development of Moses, including additional non-governmental funding given the increased use of Moses by industry.

The organizational breakout discussed the pros and cons of different options to organize and fund Moses development in the coming years for the different stakeholders present from academia, government and industry. In the technical breakout, the current development process and the framework for contributions were discussed. After the breakouts, the stakeholders came together in a larger group to discuss the breakout findings and identify opportunities for continued development.
While a foundation idea was generally supported by the participants, potential funders stressed the importance of a defined foundation scope and a description of the benefits they would get from such an organization. The stakeholders also discussed where such a foundation should be located.
2.2.1. Results
The above described results of the Moses Industry Roundtable discussions provided valuable input to the MosesCore Sustainability Report (D5.5). The roundtables organized by TAUS in 2013/2014 established a core community of Moses stakeholders from industry, academia and government that can carry the development of Moses forward after the end of public funding.

16 http://www.slideshare.net/TAUS/taus-moses-industry-roundtable-2014-moses-past-present-future-hieu-hoang-ulrich-germann-university-of-edinburgh
17 http://www.slideshare.net/TAUS/taus-moses-industry-roundtable-2014-introducing-strategic-questions
3. General Findings
In 2014 we set a goal to attract at least 40 registered participants per MT Showcase session (see Deliverable D4.1: Industrial Outreach Plan 2012-2015)18. Both Showcases, in Dublin and in Vancouver, attracted the targeted number of attendees.

Chart 1 shows an overview of the number of participants at each Moses event we organized over the past 3 years. It is clear that the events held in the western part of the world were better attended than those in the eastern part of the world. One reason for this is that the audience of LocWorld (the conference that hosted the MT Showcases) in Asia is always smaller than at its conferences in Europe and North America. Chart 2 shows the average number of participants per year. The average dropped considerably in 2013 but climbed again in 2014.
Chart 1 Number of participants at TAUS MosesCore events, 2012-2014
18 http://www.statmt.org/mosescore/uploads/Internal/io-plan2.pdf
Chart 2 Average number of participants in TAUS MosesCore events per year

Following the recommendations listed in the previous EC reviews, in 2014 we handed out short surveys during the MT Showcases in Dublin and Vancouver. The aim of these surveys was to gain more insight into how attendees learn about Moses, what their plans are, and which pros and cons influence their decisions to use (or not use) Moses. We also wanted to learn more about how their organizations were using MT in general.
Chart 3 How is your organization using MT?
Chart 3 gives an overview of how the different organizations are using MT. It also indicates what percentage of organizations is already using Moses and what percentage is not (yet) doing so and can thus be considered prospects for the Moses technology.
Chart 4 How did you learn about Moses, 2014

Chart 4 shows an overview of the attendees’ answers in 2014 to the question “How did you learn about Moses?”. These answers were returned by 56 participants (out of a total of 102 participants at the MT Showcases in Dublin and Vancouver). The answers show that a large share of the respondents learned about Moses via events (Localization World, a previous MT Showcase, MT Marathon), from business partners presenting at an MT Showcase (Precision Translation Tools, Asia Online, Sovee) or through online research.

Chart 5 shows the answers of the MT Showcase attendees in Vancouver to the question “After attending this MT showcase are you more likely to look into Moses or a Moses-based MT solution?”.
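The attendance and survey figures quoted in this section can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the variable and function names are hypothetical, and the only inputs are numbers stated in this report (61 and 41 showcase participants, the target of 40 per showcase, and 56 survey respondents).

```python
# Illustrative cross-check of figures quoted in this report.
# All names here are hypothetical; only the numbers come from the report.
participants = {"Dublin": 61, "Vancouver": 41}  # MT Showcase attendance, 2014
TARGET_PER_SHOWCASE = 40   # goal cited from Deliverable D4.1
survey_responses = 56      # respondents to "How did you learn about Moses?"


def response_rate(responses: int, total: int) -> float:
    """Return the share of participants who responded, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * responses / total


# Total attendance across both showcases.
total = sum(participants.values())

# Did each showcase reach the target of at least 40 participants?
met_target = {city: n >= TARGET_PER_SHOWCASE for city, n in participants.items()}

rate = response_rate(survey_responses, total)

print(f"Total participants: {total}")
print(f"Target met: {met_target}")
print(f"Survey response rate: {rate:.1f}%")
```

Under these inputs the combined attendance matches the total of 102 given above, and a little over half of the attendees answered the question.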
Chart 5 After attending this MT showcase are you more likely to look into Moses or a Moses-based MT solution, 2014
Results show that 24% of those who answered the question are interested in the further exploitation and study of Moses. During the events it was also interesting to learn more about the factors influencing people’s choice for/against Moses. 20 attendees of the MT Showcases were kind enough to share their reasons.

Which factors influenced your decision to use Moses? | Which factors influenced your decision not to use Moses?
Use case, quality | I am only just now hearing about it
Robust, fast, many companies use it | OSS
MT is a product we sell | I need to learn more
Availability, cooperation | Applicability. Few customers have data
It is open source | Scale of internal resources to implement into interval workflow systems
It works. Good support community | It does not support fine tuning (pre and post translation adjustment), segmentation rules and so on
Existing tool to leverage with it is fully replaced | We are very new to MT and it is all customized. We need to save time. That is why we didn't use it.
It is easier to use for us | Still gathering info

Table 1 Factors influencing the Moses use, 2014
4. Conclusions
The MT Showcases in 2014 have proven to be very successful, not only in terms of the number of people attending the events, but also in terms of the feedback. Eighty-three percent of the attendees answered that they are more likely to look into Moses as their MT solution than before attending the MT Showcase events.

The MosesCore project has been crucial in developing not just awareness of Moses but also market share. The Moses MT Market Report (a separate deliverable) indicates that Moses MT constitutes 20% of the overall machine translation market, and it also lists the new providers of Moses-based solutions that have entered the marketplace since the start of the MosesCore project.

To give the readers of this report the opportunity to witness the liveliness and maturity of the discussions at the different events, we add transcriptions (see appendices) of the discussions that took place at the Dublin and Vancouver events.

In 2015 TAUS plans to continue organizing MT Showcases during the Localization World Conferences. In December 2014 TAUS introduced free Academic Membership. Post-docs and students from universities around the world can get free access to TAUS Data, knowledge bases around Moses and the Dynamic Quality Framework to help them learn and experiment with the training of MT engines. With this initiative we intend to bridge the gap between industry and education and help companies find MT talent and computational linguists.
Appendix 1: TAUS MT Showcase Dublin 2014 Discussions

1. Panel Discussion questions

Moses(Core) related
Is your solution Moses-based? If yes, what was the reason to base the solution on Moses?
What is missing in the Moses open source project for industry use?

Quality/Evaluation related
I’m a small LSP. How can I verify that the solution you are offering works for my use case? How much will this evaluation cost me?
Do you see the TAUS Dynamic Quality Framework as a good way to independently evaluate and compare different MT solutions?

MT Market related
In a survey for our upcoming MT Market report respondents identified the following main trends and drivers: Acceptance, Availability, Usability, Large(r) quantities of data, Low costs, Speed.
The MT market has always been driven on the abovementioned elements, so what’s new?
Can you take me through what you/your company see(s) as the current state of MT?
Do you see the drivers changing in the future?
Where do you/does your company see MT going in the next 5 years?
Where do you see growth areas for MT use? (over the next 5 years)
What are the challenges to growth of the MT market sector?
Is there a market for MT? Where is pricing going?

MT Solutions related
Where do you see the biggest opportunities for MT solutions over the next 5 years?
General vs. domain-specific vs. customized
Cloud vs. on-premise
Broad language coverage vs. focus on small set of languages?
Industry verticals?
Post-edited MT vs. gisting and other non-edited uses

2. Questions from MT Market Report Survey
1. Please indicate the percentages of your MT related offerings for each of the items listed (as % of revenue)
2. What is the geographical spread (approximately) of your revenue in MT (as % of revenue)?
3. What is the delta in MT related revenue for your company from 2012 to 2013?
4. What do you see as the key market trends and what is driving the MT market sector?
5. What are the challenges to growth of the MT market sector?
6. What do you see as the opportunities for your company or for the MT sector in general?
7. What do you see as the threats for your company or for the MT sector in general?
3. Questions from MT Market Report Interviews

MT Market Drivers and Inhibitors
Main trends and drivers according to responses in the survey: Acceptance, Availability, Usability, Large(r) quantities of data, Low costs, Speed.
The MT market has always been driven on the abovementioned elements, so what’s new?
Can you take me through what you/your company see(s) as the current state of MT?
Do you see the drivers changing in the future?
What are the challenges to growth of the MT market sector?
Where do you/does your company see MT going in the next 5 years?
Where do you see growth areas for MT use? (over the next 5 years)

4. Questions derived from presentations

1. MT@EC for European public administrations and online services, Spyridon Pilos (European Commission)
The European Commission’s new machine translation system has been available since June 2013. It was built by the Directorate General for Translation (DGT) using Moses and the EU institutions’ translation memories, stored in the Euramis database. It is continuously improving through close collaboration with EC translators, and regular inclusion of their more recent translations. MT@EC will be the starting point for an "automated translation platform" to be funded by the Connecting Europe Facility in order to support multilingualism of other European digital service infrastructures.

2. Beyond Data: Delivering Machine Translation with Subject Matter Expertise, John Tinsley (Iconic Translation Machines)
There are a number of current approaches to developing commercial machine translation systems, ranging from do-it-yourself platforms to fully customized development as a professional service. While these various approaches have their relative merits, they all present a number of drawbacks for the end user, be it the inability to handle complex content or a long and expensive period of development and testing. At Iconic Translation Machines, our approach goes beyond basic engineering of data to build MT systems and overcome these drawbacks. We combine deep domain knowledge and linguistic expertise to deliver highly focused MT engines for targeted domains and languages. Our IPTranslator service, for example, has been developed using this approach to produce intelligent MT systems adapted for patent and legal content. We demonstrate how this approach has delivered significant value to end users and describe how these systems serve as an ideal launchpad for ongoing adaptation and optimization.

3. Enabling MT for everyone! Tony O'Dowd (KantanMT)
Working with Moses and building high quality MT systems is not for the faint hearted. It requires a wide range of technical and linguistic knowledge that is often difficult to find and develop within organizations. Consequently, only the biggest organizations have the financial muscle to invest and reap the rewards of MT. This puts small-to-medium sized organizations at a distinct disadvantage. KantanMT changes everything! KantanMT is a cloud-based implementation of Moses which enables SMEs to embrace the advantages of MT, quickly and economically. This presentation will demonstrate the KantanMT approach to rapid engine training and tuning, data analytics used to predict MT quality and create tiered pricing structures, and instantaneous engine deployment, all of which are driving the new MT Revolution!

4. Sovee Smart Engine 2.0: A Leap Beyond Base Moses Technology, Scott Gaskill (Sovee)
This month marks the advent of a new generation in Machine Translation. With the release of Sovee Smart Engine 2.0, it is now possible to process virtually unlimited simultaneous transactions without the limitations originally inherent to the base Moses technology. Sovee's latest development delivers an unprecedented 500 language engines, which will expand to thousands of languages in the next few years. This workshop will demonstrate the automated language tuning and training capabilities of Sovee Smart Engine 2.0. It will highlight the deep cascading framework that delivers the highest level of accuracy ever imagined for machine translation, and a new combined process for SMT and post-editing.
Appendix 2: Transcript of the recording of the TAUS MT Showcase Panel Discussion
(53:00)(03/06/2014)
Maria -‐ Pangeanic: Well just, interesting presentation, one question; did Google start to notice your efforts? Have they, have they seen what you guys are doing? Andrejs Vasiljevs: They are a search engine they should have noticed. (laughing) JVM: What do you mean? Whether they are afraid? Maria -‐ Pangeanic: Afraid, I don’t know if Google can be afraid. But, just if they noticed, you know that your engines for, you know, your search engines, or your MT engines are a little bit better than theirs for your languages. JVM: And by the way this was Maria from Pangeanic who asked the question. Andrejs Vasiljevs (54:07): Thank you for the question. The research community will, do share all over the world for instance published works of papers and client by client have a nice reactions with people from Google, but mostly we do not want disclose what we are doing for those reasons. But I think we are not pretty much worried about other developments, and we were happy to be approached by Microsoft, and we helped actually with Microsoft research for the ‘Bing’ engine for some of the languages. *text missing, help with others; Google as well* JVM (55:00): Alright, Good, anything from the audience, we are now going to zoom in on sort of the general implementation questions, concerns around using MT, we heard at the start that there are only actually a few of you, who are not using MT yet, but planning to do that. Have you learned something? Those of you who are not using MT yet, that you didn’t know before? What do you think? Are you now happier to start using MT? Sergio VMware: I never used MT before, but for my current company it would be a fresh start definitely, and that’s why I’m here. Yeah, but what I learn is; that there is a lot to learn. JVM: And that’s Sergio from VMware. What about Amazon are you now happier than you were before, after this session? 
Name (male), Amazon: Well I think in terms of MT, we touched different points and the data was one of the points that John discussed there, as one of the key areas, that the more data you have the better the quality of the translations you will get with the MT. But I was very impressed with John, John’s presentation because it demonstrates a little bit of a different aspect of how the MT can be used, so we are very, we were used to use machine translations as a hybrid, as a rule-‐based or a
21
statistical and see the different cogs and the different parts that would, could enable each other. It opened my mind a little bit on machine translation, yeah and the ways that we can go forward. JVM: Thank you, any other … George are you already using MT? George, Boffin Language: We don’t use MT too much; we just tried from Chinese to Japanese for, for patent abstract, something like that. The result is OK, it is pretty good plus PE (Post-‐editing) the result is our customer satisfied. We didn’t try from English to Chinese. We are thinking about that, thank you. JVM: What did you use? What are you using? George, Boffin Language: We cooperate with MT start-‐up company in Beijing, so we don’t know exactly the technology behind. JVM: OK, using Moses do you think? George, Boffin Language: I don’t know actually. JVM: We never know, that was George from Boffin Language. So what do we do with that kind of obscure, yeah it’s a kind of obscurity exists in the market where people don’t know what’s really behind that screen, you know is that important or another question like when you’re an LSP, like some of you, do you have to tell your customer that you are using MT, is that important, do they have to know? Any observations from the panel here? How transparent we have to be about what’s inside? Tony O’Dowd (58:26): I think, If, if you go back to the very early days of translation memory everybody wanted to know how a translation memory system worked, they wanted to know the innards because they didn’t trust technology, they didn’t trust ‘fuzzy match’, go fast forward to today, nobody questions fuzzy match. Today it’s a fundamental fact of this industry, it drives the cost model, it drives scheduling, project managers use it every day, in fact it drives the RFQ process that most companies use to a great extent. So I think in the absence in a situation where you have high trust in technology they tend not to question it because they just accept it. OK. 
With Machine translation, although it has been around longer than translation memory and that’s a startling fact, translation memory came after machine translation. JVM: It was just a lower feature of MT. Tony O’Dowd: Absolutely, yes, it’s a longer technology. But its only in the last couple of years that it has really started to emerge as a viable tool to aid productivity, so I think today there is a great interest and curiosity as to what goes on behind the scenes. But I would anticipate that as more and more people grow to trust it that, that curiosity will become less and less, and the curiosity will shift onto how we can
22
actually maximise the benefits of machine translation, not understand the technology behind it, it’s how we can leverage those benefits in our business. So for instance a lot of our clients when we engaged with them, when we were giving the product out for free, about 18 months ago were all about; what sort of reordering models your using, what sort of data crunching your using, what sort of, what’s the minimum number, amount of words you need to build a model. It was all sorts of technical questions, whereas today, the clients that are actually in deployment of machine translation They don’t care about that, it’s not part of their, their dialogue. Today they are all about; how many words can we pump through an engine. How scalable is the engine? How fast will this engine work? Can we take that engine and stick it on our customer service portal or user support forum. So it’s all about gaining the benefits, rather than understanding the technology. And I think that’s going to go better or sorry the curiosity of understanding *the technology* is going to get less and put more focus on maximising the benefits for the product. JVM: Yeah I guess an interesting, can I just carry on with Tony for awhile since you mentioned it, 28 years. So you’ve been in that previous revolution of translation memory entering the market. Would you say it’s very similar, exactly the same story that people are just shocked at the beginning and you know so? Tony O’Dowd: I just want to make it very clear that I was 12 when it entered the market (audience laughs). I think Jaap, that you were 20 I think? JVM: I started in 1913, so (audience laughs) Tony O’Dowd: Sorry the question was? JVM: The question was; is the MT revolution that we are going through now, very very similar or exactly the same as the translation memory revolution in the eighties, late eighties. 
Tony O’Dowd: It has certain characteristics. I remember back when, I remember one of the first meetings we had with you guys, you had to build your first translation memory system. I think at the time there was only one other product available, which was Xl8. I remember this is going back 25-28 years ago. Ah, this man remembers it as well. It was like, you know, if you were using it you were the pioneer, OK. It’s like a brick in a dam; you’re a pioneer, and that’s one brick out of the dam, and then you get somebody else using it and that’s another brick out of the dam, and eventually, the more and more bricks that get out of the dam, the dam bursts and the revolution is here, and that’s what we are going through now. So we are seeing lots and lots of progressively curious LSPs and ISVs that want to get onto this train, the MT train. They are taking bricks out of the dam but we haven’t got the dam burst yet. It’s not quite a torrent of water but I think it’s accelerating; you know, I think the work that TAUS is doing in these shows and exhibitions is certainly adding to that and I just
get a sense, and maybe my competitors there probably sense, that today there are more and more people gravitating towards the benefits of MT. It is not a replacement strategy that they are adopting. It’s, you know, augmenting, a supplementary strategy to help them do things faster, cheaper, to translate more content. I think it’s coming and I think Moses, the open source version of Moses, has clearly been at the centre of that, to make MT accessible to more people than any other effort of MT has ever done before.
JVM: Right, yeah, so now we get all these questions because people are, they are fearing, and they are not knowing, and that will just go away after a year or two from now, no questions asked, it’s just a given: you use MT technology.
Scott Gaskill (1:03:31): Customers are going to see MT is an enabler to help them get their translations done. It’s no longer going to be how did you run it through the tool, what percentage went through the tool and so forth; it’s really going to get down to delivering translations to the customer the way the customer wants it, and enabling those tools, different tools will offer that today, to be able to deliver that to an LSP or directly out to a customer. At the end of the day, I don’t think we are going to be asking questions about TMs and MT and everything else; we are really going to be asking questions: how well can we deliver to our customers, and that our customers have the ability to give us information back so that we can make it better in the long run.
Tony (1:04:22): That’s a very strong point, because if you think about the ultimate end in that, it is that we won’t be talking in 3 or 4 years’ time about MT and TM, we’ll be just talking about pre-translation. You won’t care where it came from, it’s just high quality pre-translation. So this argument will kind of be almost muted in a few years’ time.
We are getting to that today; almost every client we have today is not using machine translation in the absence of translation memory, they are using both technologies, it’s a seamless experience. So I think we will be talking pre-translation, there will be no distinction.
JVM: And since this is a Moses Core funded workshop, I want to ask the question, for the record, because everything is recorded and we are reporting back to the European Commission. On the front row we have Systran, here in the audience. You know, actually we requested that we would open this workshop up, we have been running it for three years now, almost, and called it the MT Showcase and not the Moses Core showcase necessarily, because we would like to know what else is out there and what else is progressing; is the future just Moses? We didn’t select you because you’re using Moses, because you’re all using Moses, right. And so does the European Commission. What do you think? I mean, how would the other part of the machine translation market develop, and I’ll come back to you if you would like to comment on that too.
John Tinsley (1:06:02): I think, as I kind of presented earlier on, Moses has a place in any type of MT world, so even if you are a company like Systran who provides
essentially rule-based machine translation systems, something like Moses could be used at various stages in that process to enhance MT.
JVM: And they do, yeah, they put it in the mix too. Let me come straight to you.
*Name (male)*, Systran: Hi. Yes, Systran is actually using Moses since 2009, right? (Systran colleague agrees) 2009, to do some kind of statistical post-editing. So Moses alleviates our research and development just as it does for you, so that we can concentrate on peripheral technologies and improving the output from Moses. Actually Systran is reasoning a little like you, combining different steps and building the translation along, using either RBMT, SMT, pre-processing, post-processing and so on. We counted that we have close to 49 different processes between the time you enter your document into the system and the time it gets out of there. So yeah, we are using Moses and we believe Moses is a very nice initiative, because it allows us to combine forces to provide core technologies that we can build around.
John Tinsley: I think a good point there, you say it allows you to kind of alleviate your R&D efforts and kind of use it as a supplementary tool, but I think one of the real powers of Moses is that, because it’s open source, you have the capacity to actually perfect it yourself, so it’s a really, really strong kind of baseline that it gives you that you can then build upon and make it do things that it doesn’t necessarily do yet, for your own benefits as well.
JVM: Yes Andrejs, go ahead.
Andrejs Vasiljevs, Tilde (1:08:07): It’s Moses, I think, yes indeed, for, for, for some time to come, and actually Moses is one of the most mature and best implementations of the breakthrough in machine translation in the late eighties, when researchers at IBM’s Thomas J. Watson Research Center published very famous papers on statistical machine translation.
And in those few details, Moses and people at Edinburgh and other things were able to implement these methods in quite a robust platform. But I think there is an ongoing discussion in the research community that we have to look for other alternatives, other directions, that statistical machine translation is not the end of the game, but there probably could be some next breakthrough possible in the coming decades. But it will take, first we have to come to this breakthrough in the research field and then it will take at least a decade until the breakthrough will be mature enough to use for practical purposes.
JVM: Thank you. Yeah, Tony?
Tony O’Dowd (1:09:20): Just in relation to Moses, I have a very considerate view on Moses, if you equate Moses to the internal combustion engine. Every car manufactured in the world uses the internal combustion engine and that is what Moses is to MT providers, it’s like an internal combustion engine, but just like every
car is different from every car manufacturer, we are going to get lots of different flavours, we are going to get huge leaps in innovation, we are going to get lots of new reordering models, going to get analytic models and we are going to extend the power over and over again. And just like if you’re in Formula One, you’re going to have a 3.5, 300 brake horsepower engine; the family sedan is only going to need 100 brake horsepower. So that is what I view Moses as, it’s the internal combustion engine for a whole range of industries and it’s just going to change the way we translate content.
Chris: Do you mind if I….
JVM: Yes, Chris, grab the microphone and then I’ll come to you, Tom.
Chris (1:10:21): So to take that analogy just a little bit further. Somebody is going to come around and advance the electric engine, and you’re going to have to test that. And that’s going to start the next revolution, and so we are working really hard, just as we all are together; the internal combustion engine does go very far but there are its limitations and flaws. But another analogy that we kind of joke around about is: does anybody know, I don’t know why they call the Moses project the Moses project, but if you think about it, Moses stuttered, and he had a lot of responsibilities, if you know the story. God gave him a lot of responsibilities, he kind of talked about him stuttering and then some responsibilities were taken away from him and given to his brother Aaron. So there is going to be borrowing, and other tools that supplement, because Moses stutters, you know, Moses isn’t perfect, we can supplement it, but some big breakthrough is going to change and revolutionise, and Moses never entered the promised land, he wandered through the wilderness. So just the same way I don’t think Moses is going to take us into the promised land of.
Comment from Audience – too difficult to hear.
Chris: Yeah, yeah, forty years, yeah, and Moses, we can count down and figure out when that fortieth year is.
Comment from Audience – too difficult to hear.
Chris: But something is going to come, and it’s going to be the electric engine or, you know, it’s going to be somebody else leading us, some other tool leading us into the promised land.
JVM: Chris, thank you for that nice allegory. Tom Hoar is in the audience also, from Precision Language. User of Moses.
Tom Hoar (1:12:09): Tom Hoar, MD Precision Translation Tools; we have a distributal software application that people license and insource the production of SMT. Like Tony said, you were talking about the pre-processing tool chain. Or pre-processing. I like to call it, it’s simply MT production or translation production; we are all in the
business of translation production, and when we get into the concept that we are really like a factory and there are lots of things that go in the tool chain, and that’s what we are doing. But anyway, I agree with you Tony, I agree with you that we’ve got an engine, it’s a generic thing that is customised. You put an internal combustion engine on four wheels, you can get a truck, a pick-up truck, or you can get a Lamborghini, OK. And they are different things and each has a different purpose. But let’s look at it: Moses has been out there for 8, almost 9 years; the development of Moses has been heavily funded by the European Community and by DARPA. Moses Core was a three-year scheduled project, we are in the third year. Sometime towards the end of this year, Moses release three and their deliverables will make the Moses Core project complete. I don’t know, what the European, so one question is: what is the European Community or other communities’ involvement in supplementing the Moses development after Moses Core expires?
JVM: Nobody knows.
Tom Hoar (1:13:50): Nobody knows, OK, it’s undetermined. So let’s take some hypothetical: what happens if government funding stops for Moses? Each of you and I and maybe some other people in the room have systems built around Moses. We have a contingency plan for what happens if Moses is no longer developed; it really basically puts us all in a position as commercial vendors around a government-sponsored open source project of having to either fork the code, branch the code or do something with the code that’s there. And so I’d like, on the, so A) have any of you thought about that eventuality and do you have contingency plans in place? So that’s a question, and finally, in some of the, I’ve proposed on the discussion board that, in one of the unconference sessions, probably the second to last or the last one, not tomorrow but on the last day of the conference, which is Friday.
I suggested that some of us as implementers get together and look at the practical things that are necessary to keep Moses going should funding disappear; what are we going to do?
JVM: Thank you, that’s on record, and I suggest you give the mic to your neighbour. Do you have a contingency plan?
*Name (male)*, Precision Translation Tools (1:15:24): I wanted to add to Tom’s words. Yes, we have been Moses users for, since five years ago anyway *if there really is one*. Moses, I think you will agree with me, levels the playing field; we are all using basically the same translation engine, it is just the other technologies that we merge around it that make our technologies, our offerings different. There are other translators, there are other statistical translators out there. Now with the funding coming to an end, as Tom says, I don’t envisage the death of Moses, but I think it is going to speed up the birth of other alternatives and that could affect our business models. So it’s only the business model of people that only offer technology. We have to feed into, I think it’s going to speed up the birth of other technologies, disruptive technologies or disruptive translators. *
JVM: So it sounds like a recommendation to stop the funding. To stimulate innovation, disruptive. So what are your thoughts there, what’s next if the funding stops?
Chris: With a development background that works in the open source community from time to time: open source typically, if it’s a viable tool, the community builds it and continues moving forward. It doesn’t have to have a governing body behind it pushing it. I think that was the initial intent, to bring it a little more into the limelight of the open source community, so the community actually contributes back to it. And as far as our contingency plan, as we said, we have been taking away responsibilities from Moses and we actually have a timeline of when that is, of when those layers are finally replaced. But it is also a transitionary period where we have to make adjustments.
Question from Tom: And you have identified Moses as a legacy technology, which you are moving away from?
Chris: Exactly, yeah.
JVM: Do you think that’s on record? Did we hear that, really hear that?
Tom Hoar: I’ll repeat that, so you have identified Moses as a legacy technology that you are moving away from, and just for the record.
Chris: Yes, we are actually surprised we were able to use it in the workflow that we are currently showcasing of the online learning, to be able to learn in real time, to adapt, to do both immediate learning and post analysis afterwards to make sure each post-edit truly does take into account and adapts over time, that tuning based on each post-edit. We were surprised that we could get it done while implementing with Moses.
Tom Hoar (1:18:24): Can I *arbour*, Tony, you were shaking your head, yeah, yeah, yeah, can you contribute here?
Tony: (Laughing) Well I, you know, I wish I had a crystal ball and could stick it on the table here and kind of predict the future but nobody can go measure.
I think there are plenty of examples of open source technology that is not government supported that have flourished; everybody uses them day to day. I use the *Xerces XML parser*, it’s been out there for donkey’s years, it’s fully open source, fantastic. In fact I’d say most developers in this room probably use the *Xerces XML parser*.
Tom Hoar: Question: does anybody use the *Postgres database* in this room? Do you use lib, what is it? libpqxx, the C++ library for it? The author is sitting right here. (laughing). Anyway, OK Tony, next I’d like to get a little bit of *input* because everybody was shaking their heads about legacy, the contingency plan.
John Tinsley: Yeah, I think people seem to like analogies here, so I think Moses is like a snowball that’s, you know, it’s after gaining enough momentum that if you turned off the funding tap tomorrow, I think nearly every MT research group in the world, I’d say, is using Moses, is developing Moses, many of them are contributing to the code base, so I don’t think the development from that perspective is going to go away. But what I think as commercial providers of Moses we need to do is build the competency in Moses within our teams. A) so that, you know, if suddenly our pool of open source developers who are doing all the work for us disappeared, we’d be able to pick it up ourselves. But also, to be able to improve Moses we have to understand it, and it’s a very modular piece of software, so you know there are methods in there for doing the word alignment, the phrase alignment, heuristics, there is the decoder, etc. We might want our own version of that to get it in some different formats so we can manipulate it in other ways.
Tom Hoar: And finally?
Andrejs Vasiljevs (1:20:34): This community is so dynamic; if you look at the papers presented at the research conferences, there are tons of papers on different modules and components, and methods have been viewed to complement or place some complements on the Moses. So it’s not just a European Commission funded activity, actually it’s a very vibrant field, funded by universities and other funding agencies.
But still the question is who will take care of packaging that together and ensuring some level of quality and support for the core Moses toolkit, and that is still a question for the community, to organise better support activity * when it branches the Moses operations, and sometimes you see a very nice feature in one package and a very nice feature in another package, and if that would be packaged together it would be so excellent. But who will do that?
Tom Hoar: OK, I’m going to pass it to Jeroen, because he has a question; you want to say something.
JVM: You can take it over, Tom.
Tom Hoar: Introduce yourself.
JVM: That’s OK.
Jeroen Vermeulen, Canonical: Jeroen Vermeulen with Canonical, the makers of Ubuntu Linux, and Precision Translation Tools here. I would like to amplify what Andrejs has just said. It is not just a matter of funding; in open source projects there are certain pathologies which set in. It’s a matter of stewardship: you need a central accepted authority to bring together developments in a project, and it’s OK if authority breaks down as long as somebody else steps in and becomes accepted as the active developer. If that turns into conflict, confusion or entirely disconnected
development, there is substantial risk to the further development of the combustion engine. To go back to that analogy of the combustion engine, there was a time not very long ago when companies were confronted with a task of designing a combustion engine from scratch and discovered they could not do it. Some very well-known companies nearly broke their backs financially trying to reconstruct that, to develop a new engine from scratch. Today that is something we laugh at, several companies have been able to do it now, but this really substantially slowed down the development of the car industry, and the same thing happens in open source. I could name you some vital, absolutely vital open source infrastructure to computing today, such as the X Window System, which has been stagnant since the turn of the century, I would say. Everybody knows this needs a replacement, everybody agrees on it, everybody is working on it and yet we are not getting there. So I say funding be damned, but there needs to be good stewardship. That is essential; you cannot just put a bunch of people in a room and say we’ll do it together. Somebody needs to have the responsibility; it’s not a philosophical point, it’s a practical necessity.
JVM (1:24:09): Thank you, this is all very valuable material. And I’ll tell you, just as you say Tom, we are approaching the end of the third round of funding of the Moses Core project now and we will be interviewing a lot of Moses users and developers in the next two months. So since we have lovely insiders here, I’d like to just do a very quick exercise. I think we have a pretty good idea of who the users and developers around Moses are, but can I just do a quick, sort of name dropping, so that we know that perhaps we don’t forget about a party. Can I go with you first, just drop names and they are all on record.
Names of companies using Moses:
• Precision Translation Tools; our product is do Moses yourself as an open source collection of all the tools.
• Pangeanic (Manuel)
• Tauyou
• *Euroscript* We are also using Moses, we can mention also Asia Online with Philipp Koehn, and the group in Edinburgh is instrumental for the development of Moses.
• Moravia is also using Moses among other engines
• Crosslang *Natalie* we build Moses engines for our customers with their data; they are also using it indirectly I suppose.
• Google
• Welocalize
• Safaba
• Sovee
• KantanMT
• *Logos*
• Arabis (LSP)
• Iconic Translation Machines
JVM: Another topic? Is there still an appetite for another topic? We just talked about the sustainability of Moses and, you know, what happens and so on. But a totally different topic around Moses or MT, do you have something that you would like to discuss? Or are we sort of getting tired and ready for drinks?
Tony O’Dowd: Let the audience decide.
JVM: Ladies and gentlemen, just fill out the form and you are free to go. Thank you, panel, thank you all for doing this.
Appendix 3: Notes from Moses Industry Roundtable Breakout Discussions
Business Breakout
Thoughts from Phil Koehn:
• Follow the model of Apache sponsors
• sponsors who only get advertising and nothing else
• governed by board
• membership time limited and voted by foundation members
• different levels of sponsorship
• Possibly good to start foundation housed at one university
• Integration and documentation would be 2 priorities
• also events and then evaluation campaigns, plus student sponsorship
• should levels of sponsorship give you more votes? probably not because don’t want sponsors telling you what to do
• for a working group, where does the money come from?
• need to have at least a couple of staff members to do code maintenance; who will pay for them?
• for a corporate sponsor, what if that company loses interest? does that scare off other companies? or is it like Android and multiple companies will still be interested
• Safaba would be more comfortable if there were not just one company controlling
• Tilde would need assurance of how much the sponsor would control the plan
• Being part of the open source project allows you access to the new development; going on your own you miss out on the developments of the code
• if people go off on their own, the summed costs much much higher than pooling efforts
• Advantage of a single big sponsor is that they are willing to put more effort behind it
• like with Apache, each company invests very little in the environment
• Academic sponsor: scale of support would be lower; could house it there at a university, but the money won’t come from there
• Could there be an independent revenue stream, a la the Mozilla foundation? But hard to see where that stream would be
• Looking at where Moses is as a platform, still very blue sky, very early stages of the development; the tool will grow for decades to come
• Need to have a framework from state-of-art research for long time to come so that platform serves its purpose
• Some companies will develop their own proprietary solution, but they will have to support it entirely themselves
On the subject of community
• Cracker provides money for community-based development but also drives the comparison and the shared tasks
• comparison and shared tasks most important way to drive community
• so these were successful to drive the advancements through the 2000s (NIST, DARPA)
• Moses foundation could be more region-independent
• WME is more Euro-focused
• A large corporation could sponsor a specific shared task to encourage concentration on one task or problem
what would be the first steps?
• Talk with possible host organizations to see if they are willing
• Geographic location is a question. Maybe one in EU, one in US, how about in Asia?
• Moving out of EU will help to make it less EU-identified
• Also, see if there are enough people willing to put in funding
• Need to clarify legal issues and mission statement first before can begin the fundraising
• possibly could start with sponsoring smaller projects, beginning with baby steps
Technical Breakout
Some organizations require software approval for major version changes
• this should be kept in mind when numbering Moses releases
Windows support
• Moses originally developed on Windows
• Later more contributors for Linux version
• Supported under Cygwin
Users often develop own infrastructure around decoder
Hieu described release process
• Train & run models for 16 language pairs
• Branch about one month before release
• Rare patch releases
API changes
• Unclear how API is used upstream
• No versioning
• Command line/named pipes don’t change
Contributions in /contrib folder
• Often missing description
• Often unclear if maintained or not
• Some contributions were moved to /contrib folder to clean up decoder code base
• Some users would be open to contribute components, but infrastructure developed around decoder (see above) often creates dependencies
Feature requests
• Training cycle too long
• Translation quality – how to judge(?)
• Multilingual phrase table
Appendix 4: TAUS MT Showcase Vancouver 2014 Discussions
Q&As (1:07)
(Achim) Thanks to all the presenters for an interesting and diverse set of presentations. I know we haven’t had much time for questions to Saša and Tom. Does anybody have any questions for them?
Q(Marco) Am I allowed? (Laughs). So, Saša, I’ve seen the numbers you’ve shown us and they are incredible and good news for the Moses community. Now you have been delivering scale in terms of number of requests with Moses. But I need some clarification. You were saying that you managed to go to 20 milliseconds, but I think that in order to set the expectation, am I right that this is only applicable for query translation, where the reordering model is very easy, the phrase table can be pruned a lot? So, do you have any data on Moses speed if you try to translate the entire description?
A(Saša) So first of all yes, you are right, the 20 milliseconds are only for the search backend and this is fortunately only queries, which are mostly very short sequences, and as I said we need, like, reordering, so actually the distortion limit is only 1 in Moses and also the stack limits of the pruning are very aggressive without much degradation in quality. So we still look at kind of finding the sweet spots, so one part of our tuning or training of the systems is to look where the sweet spot is for the training systems in terms of quality and speed. Obviously when you move to longer sequences like titles these 20 milliseconds won’t hold any more. But these are also use cases where you do not necessarily really need this real time thing. So search is something which is kind of user generated content essentially, and if a user types a search query, the users don’t like to wait, so there is really where you need the real time component. For the titles we allow some latency because the sellers put up their item, but then we have time.
We then can take a couple of seconds to translate, put it in the caches, and even if someone hits it in a search, like within the second that it went online, and if they don’t see the translation, this is something we accept because it will just be a couple of customers, but then at some time, once the translation is ready, we serve this.
Q(Marco) Can I ask the question in another way: So, do we still need to improve Moses in terms of performance? What I mean is that, if you have the full reordering model, if you don’t prune the phrase table, you don’t run optimization, is Moses able to translate a sentence as fast as Google Translate and Microsoft, or is the technology not there because we need to make it more scalable, in your opinion?
A(Saša) In my opinion, I think Moses is pretty fast, so it is very competitive. I don’t know what the speeds of Google are, I mean they have a kind of massive parallelism, I don’t know if they do some kind of sub-sentential splitting or something, probably not, and so still they will not be able to, like, for long complex sentences they will probably also take a little bit longer, but I don’t know about the measurements.
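The title workflow Saša describes in this exchange (translate a newly listed title in the background, put the result in a cache, and show the untranslated title until the translation is ready) can be illustrated with a minimal Python sketch. This is not eBay’s implementation: the class `TitleTranslationCache`, its methods, and the `translate` callable (which stands in for a request to a Moses server) are hypothetical names introduced only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TitleTranslationCache:
    """Hypothetical sketch: translate titles in the background, serve from a cache."""

    def __init__(self, translate, max_workers=4):
        # `translate` is an assumed callable (e.g. a wrapper around a Moses
        # server request); it takes a source title and returns a translation.
        self._translate = translate
        self._cache = {}
        self._pending = set()
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, title):
        """Queue a newly listed title unless it is already cached or in flight."""
        with self._lock:
            if title in self._cache or title in self._pending:
                return
            self._pending.add(title)
        self._pool.submit(self._work, title)

    def _work(self, title):
        # Runs on a worker thread; a failed translation simply leaves the
        # cache empty for this title so that it can be submitted again later.
        try:
            translated = self._translate(title)
        except Exception:
            translated = None
        with self._lock:
            self._pending.discard(title)
            if translated is not None:
                self._cache[title] = translated

    def lookup(self, title):
        """Return the cached translation, or the original title if not ready."""
        with self._lock:
            return self._cache.get(title, title)

    def close(self):
        """Wait for queued translations to finish and stop the workers."""
        self._pool.shutdown(wait=True)
```

A search that hits a title before its translation has finished gets the original text back from `lookup`, which corresponds to the short window Saša says is accepted; later lookups return the cached translation.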
Q(Marco) You are satisfied?
A(Saša) Yes, I am pretty satisfied with the robustness of Moses. I think our Moses servers crash very, very rarely, so this was something that we raised earlier on when we developed this orchestration layer, and a lot of fail safety went into the Java or the orchestration layer at eBay’s old Java where we thought what happens if our Moses server starts crashing. What if we have 10 Moses online and the first one crashes, and then the others take the load and then the next one crashes? Because at this point we fully load the phrase tables in the memory, so the startup time is not instant, it takes some time. So what if we have this worst case chain where everything fails but the first Moses server didn’t come up yet? But so far, I mean knock on wood, it has not happened and it is pretty much stable. I don’t know the exact numbers, I mean there are crashes just here and there, but as I said it is very rare.
Q(audience) How customized is your Moses engine? Is it the standard thing or did you guys rewrite parts of it?
A(Saša) No, so actually it’s pretty much out of the box, no fancy features, no complex models that are allowed within Moses, so we use phrase-based MT, no syntax or hierarchical phrase-based. It is all too slow; no complex reordering models, it is just like the baseline of eight to ten features. We treat a lot the way the data goes in, so like we have a full repository of data assets and essentially the training pipeline allows you to figure which data goes in with which weight, and there you can sort of fine-tune your systems towards the e-commerce domain, and there is a lot of selection on the data side essentially, and filtering and so on. But then when it comes to the engine, large language models, because we have large amounts of titles and queries, so these go in, I mean queries from the target language if we have it, and titles also, depending on whether we have titles available in that language.
So for Russian for instance, for titles, translating then from English to Russian, we don’t have Russian titles so we crawled the web at other e-commerce sites like TopShop and so on. So there are data efforts definitely, and it is targeted crawls because sometimes it is not easily crawlable, so we need to find specific solutions to go into the website and kind of extract the content. But besides that, from the technological point, it is just Moses 2.1.1 I think, the latest version.
Q(audience) Are you planning at any point to open an API for letting other people also use your engine?
A(Tom) Yes, I think long term that is envisaged. I don’t know when, so I think that if you ask this question Asani would say yes, tomorrow we’ll do it, but if you ask me I say no, we first have to get the basics right, but I mean at some point, when everything is sophisticated, why not. I think there is some value, I don’t know how, I mean if this is going to be monetized or something, I don’t know, but the systems, if they work for our customers and they are happy, they should also be useful to others. There is no reason not to do this. But as I said, at this point we just started and we figured just the basics out and I think we need more time to kind of optimize
this process as it currently stands, to be able to manage even more language pairs, to cut down the development time. So currently it takes 2 to 3 months to launch a new language pair and I think ideally if everything, so the longest time actually that we have to wait is for the in-domain data that we post-edit, for the training data, and that we human translate for the test sets, and this is going to (1:15) external vendors currently through our localization team and this takes then, depending on how much it is, it can take 4-8 weeks, or even, I don’t know, we had cases with large batches initially that took like more than 2 months, but we are trying now to select better, to do better sampling of what is relevant and then only send this out. But if we can reduce these cycles, maybe by using even your technology at some point, I hope we should be able to launch new language pairs within 1 or 1 and a half months, and if we get this right we can like look and see who can use this.
Q(Achim) Any more questions from the audience to the podium?
Q(audience) I have not seen your language pairs in any of your tools but do you have regional differences like French from France, French from Quebec…
A(Saša) We do Spanish, Latin American Spanish, Portuguese from Brazil and Portuguese. Curiously, initially, at some point we split Latin American Spanish to Mexican Spanish etc. and that was too much, like, it fragmented our community too much and we didn’t have requests for it, so we actually reverted, and the other curious thing for us was that so far no one has ever asked us for English from the UK. So we didn’t split that.
A(Marco) For us the CAT tool can have all the language locales but when it comes to MT, for example, the MT engine will be only one for Portuguese, only one for Spanish, so the CAT tool supports everything but the background suggestions will come only for the major language, and we have to pick one of the two, which made many Brazilian translators not very happy.
Q (Tom) Oh, did you pick Portuguese from Portugal?
A (Marco) Yes.
A () That’s interesting because Google Translate does the Portuguese from Brazil.
A (Marco) So, that is the opposite for us. The engine, like the commercial engine, would be Google Translate in this case, so, is this correct, the Portuguese are complaining?
A () They don’t declare that it is Portuguese from Brazil. I think that there is much more data and purchase from Brazil, so it tends to…
A () Probably for them, yes. So for the eBay use cases we use like the public data that is available for Spanish, which is mostly like the European Continental Spanish, but for our in-domain data we use Colombian contractors, I think, because they are kind of closest to the general Spanish use case. So we try to address this by the in-domain data to get it right for Latin America. But still there is this smoothing happening because there are like regional dialects, but we don’t address this. But then we
actually used our Latin American system and launched it in Europe for UK English to Spain kind of language pairs, and the acceptability was just a little bit smaller than the others. So I don’t know the exact numbers, but if the acceptability number for one system was 85%, the same system within Europe was at 81 or 82, so you can see like a little impact. People might be saying ‘oh, this is not the right word, but still I kind of understand it’, but it is not that big a problem.
A () So for us we noticed that the difference between Latin American Spanish and European Spanish was a lot about tone. So for queries it would not be so much of an issue, but in text in Spain you almost always use the ‘tu’, so very informal, whereas in Latin America you use ‘usted’, so it is very formal, and in those cases verbs change and they all have to agree etc., so it is more in the conversation aspect where you see all the differences.
A (Saša) Yes, again for eBay, for titles and queries it is not real languages, we define it by linguistic standards, it’s just like kind of a known language.
A () And that also shows again that you need to evaluate the MT for your specific use case and your material, …
A () I think it shows that your training data, whatever you use to create your SMT engine, sets the tone of whatever your translations are going to create. So if you train an engine through your data, it is going to come out in the style of your data, and that is true whether it is these guys using their system or whether it is our system or whether it is Microsoft hub… (the rest not very audible)
Q (not very audible) … (somebody asking Translated and eBay) Are you both going to succeed?
A (Marco) Well, it is a 250M dollar market, a good draw, but since we wanted to make 1B, 250M was not enough…
A () In 1990, although we launched Photoshop in the market where outsourcing graphics production was 300 dollars per hour, they now make 4B dollars a year and outsourcing graphics production went to 30 dollars per hour. I don’t know, who is going to win? There is not going to be a winner…
A (Marco) No zero winners, two winners!
…
Q () A question for the Matecat guys: do you train data for the engines?
A (Marco) We have not developed that yet. So if you use Matecat now, it will come from a commercial engine, Google Translate or Microsoft, if you use this interface. What we plan to add in the future is that after you load your TM into the system there will be a magic button to say ‘get the Moses engine with this data’, and basically within the CAT tool as you are translating you will have the Moses engine with this online learning functionality. So today it is more manual. You take the data, you create the engine and basically you can connect the engine very easily to Matecat, but we haven’t created the automation of that.
C () One suggestion. We have experimented with this a little bit, with system combination. The problem with Moses a lot of the time is lack of coverage, and so what we started doing is: can we do a combination of Google Translate and Moses, and with this combination have extra coverage but still have Moses to be able to do, to support regional… I mean this has been just an in-house experiment but it seems to do fairly well. With TR we are able to have most of the segments out and then use that for extra coverage.
Q (Marco) You mean to suggest two things to the translator, or combine?
A () Combine.
C (Marco) That is smart. I don’t know how to do it.
… (not audible)
() eBay is using version 2.1.1. What are you two using in your basic systems or when you build your Moses systems? What version?
() I think we are using the same. Yes, because we started using Moses in June? Whatever the latest version was in June.
(audience) You guys are building a new tool!
() There are people right now working on it.
(Marco) No, it is night in Europe, they sleep!
() Not our guys! (laughs)
Q (Achim) I have a few questions, two for Saša. You’re kind of building out this functionality now on the eBay site to use MT to enable processes for trade. I guess you have a rush for the markets you are targeting, you already have translated interfaces. But for the actual content, do you see opportunities where you can use post-editors to correct the translations that the MT system makes, or at least give sellers the option to use the services?
A (Saša) So I think we are working on the sellers as well. I think … that this is a bullet point that we will offer to sellers if they want to reach more people, to buy translation services. MT is a free service so they can kind of see, make changes if they speak the language; there are options for post-editing or even like human quality translation.
I am not sure how this is going to play out. I am a little bit skeptical because I do not directly see the need, but there are business people at eBay who see the use case better and they see some value in there. Regarding the
corrections, I think you asked like if we want to do post-editing of MT output on the side. So we have some feedback, lots of possibilities internally: if we have mistranslated something there is an interface where you can correct it, and then automatically we fix it and essentially overwrite it, but at this time it is not fed back into the MT engine. So this is something that we hope to address with, yes, the recent changes to Moses, where at some point when we identify that we make mistakes this then will also be fixed automatically in the engines. So there is still some potential there. Currently we would need to re-train and fully re-deploy a system, which is still costly.
Q (Achim) Tom, you talked a lot about what is required to train an MT system, what skills are required. What are the advantages of running your own MT on premises, maybe in the cloud…
A (Tom) Well, that is actually my presentation on Friday! There is another presentation on Friday about the pros and cons of insourcing and outsourcing, OK? But in general if you need to go to quick production and you want your MT system to satisfy an immediate requirement, insourcing is probably not the way to go. It is definitely not your way to go. It takes time to learn how to operate the system, how to use the system and put it into implementation. So that is kind of to say that I had a lot of potential customers who come to us and say: ‘we need a system up and running in two weeks and we need it to…’ and I say, go to somebody to outsource it, so it is not something you are getting into when you build your system and you insource it. The advantages of insourcing are that you eliminate the recurring cost of a subscription-based system, so whatever investment you make in building and operating your system eventually becomes a fixed cost.
And you have the ability to create new models that can create a new system and therefore make more money on that new system, I mean you can put yourself into a declining cost return on investment model. You also have control over what you do, you can do your own quality assurance integration of the system. If you are using for example Google Translate, they may re-train their engine tomorrow and you have totally lost control of the quality control of the internet engine that was there yesterday.
Q () Do you think that this is a little comparable with that slightly older discussion: shall I keep the services in-house or shall I move to the cloud, Amazon EC2?
A (Tom) Totally different conversation. For example our system is built in such a way that our server can only run EC2. So, do you want your infrastructure that runs our software in-house or on EC2 is a different question than if I want to run my own service or outsource the production to somebody else.
C () Right, so, what I mean is that instead of having someone with the skills necessary to manage the system, you can have someone that, besides doing other things, communicates with the subscription service that does it. So, the advantage of having services in the cloud is that at some point it always becomes more cost efficient to have them in the cloud unless you get to huge scale, and I wonder if just the efficiency that you were going to get if you are running it on something like
KantanMT or your system, if it runs in the cloud there is so much efficiency that unless you run a huge system there is no, it is not cost efficient to do it yourself. It will always be better to do subscription.
A () With your own data, sending your own data, of course. We actually have customers that provide that service. We don’t; we make the tools and our customers can either use the system they build for themselves or they can open it up as a tool for their customers. So we have customers who build their own system and then they use that system to service their customers. We just don’t provide that service as such.
Q () Question for Marco: Will we have Matecat with a mobile UI?
A () Well, it’s open source, so, maybe in two weeks we’ll have one! (laughs)
A (Marco) The target that we are going for with Matecat is more standard translation projects, so I guess they will be longer. And today you can open it on a mobile interface; you cannot create projects but you can open the translation interface. So as a translator, you can receive a link and you can open the document. Before, in Trados, you had to go to your desktop, import the thing, before seeing the content. So you can see the content, you can actually edit. It is not mobile friendly, so it is not like Unbabel that has been designed around that experience, so I would not recommend anyone to use Matecat on the mobile, but at least you can see the work and you can fix a sentence today. So it is not in the short roadmap to have a mobile system so you can create projects on the mobile. It is not the intent, but who knows.
Q (Achim) Doesn’t this mean it relates to Unbabel then, because you’re focusing much on mobile editing for the medium term? Does it mean that you are good at targeting a certain type of content and that is working very well for that, or?
(1:34) A (Vasco Pedro) Yes, I mean I think that naturally certain types of content fit better on mobile, so anything that is social, conversational, you know, email type tends to be better than if it’s a technical document. We don’t, we are not right now filtering that out, in the sense that since we are chunking, everything goes to the mobile, but you can see it, I mean it’s … there are just certain types of content that people, for example, you go to the site, the way it works, and you say ‘give me a task’, but you can skip it, you are not forced to do it. But you can select, like, what you want to do. So we assign you something. Now what you can see is that certain types of content are skipped much more on the mobile than on the web because they are less effective. Right now we are not so differentiating on this but I think that this will just happen naturally.
Q (audience) I have an intellectual property question. So we see here on the screen: Input your TMs. My question is what happens to my data after we do the translation job, whether it’s the TMX or the actual document I want to translate. How private will this be, or will it continue to be, after the translation jobs?
A (Marco) There is a link called Terms in a button that explains that. So what we do with this … first, it is a cloud infrastructure, so we are saying that this is something useful for everybody that is open to the cloud, open to using Gmail as an email provider etc. So if you’re talking about strictly confidential information about rockets, probably you shouldn’t be doing translations in the cloud at all. But if you are there, the Terms basically say that the content is your content, you can get it back at any time that you want and delete all the content that is there, and what we do as the tool works, if you’ve seen the online learning, is we use your edits in order to improve the MT. And in the conditions we say we don’t share the data across customers, so if you have a private TM obviously you will be the only one accessing this TM and no one else. We use your edits in order to improve the MT.
Q (Achim) For everybody or specifically for you?
A (Marco) Well, today only for you. But the Terms say that we can use the data in order to improve the MT in general.
Q (audience) And for Unbabel, is that similar?
A () Ah, in what sense? I mean we don’t… you send us text, right? Other people don’t access your content but we can use it to train, unless people specifically ask us. So far no one did. We only had one customer who said, ‘could I do it?’ Yes, sure, we would be able to do that and have this content not be used. But it is not really requested. I mean we only use that just to train the engine. To be honest, initially we thought it would be much more of an issue. We thought about it a lot.
For example one of the modules we are building is an anonymizer. But we thought, oh, it would be great to send things out, because of the crowd, in a way of being completely honest, by sending an email to other people, do you want to translate it. We are actually developing a tool to anonymize text after translation correction and to re-construct it. We started doing this, initially, as a research project with IST, but to be honest no one requested it. Like we thought that people would immediately ask ‘what about anonymization?’ No one.
C () Again, in Friday’s session we talk about that. It falls into what I call Category. It comes out of the term Cryptography because it is called Trusted third parties. OK? You’re in a trusted third party relationship with the outsourcing provider. So read the terms and conditions; if you accept the terms and conditions then you’re in a trusted relationship, but you still have to trust them as a third party to do what they say they will do. OK? And I am not questioning anybody’s integrity, I am just saying that this is the reality of it. It is a trusted relationship with the service provider being a third party, because you and your customer are the first and second party and you’re sending it to a third party. OK? I think, my experience has been that those people
who have requirements where the security and/or intellectual property requirements demand that it can’t be connected to the web don’t go for these services anyway. It is a self-filtering environment.
C () We only had some requests at the beginning that were a bit scary to us. We started getting some requests from the Middle East and one of the requests was documents, like they sent us documents in pdf, and that’s when we said, unfortunately we can’t do it for technical reasons, but it did say, I think at the time it was the Palestinian police, and it was like this, people that were murdered and things like this, so we said, what do we do with this? It was very early on. There was another story involved in there, but you know, that scared us a little bit, I said, oh my God, what would happen. And fortunately we were not able to do it. If it helps anything, of course there is a trusted relationship. Everybody that signs up as an editor, you know, basically they are accepting the terms of service, they can’t copy etc., and they are working on our platform, which limits, makes it a little harder to extract any content out. So we don’t send jobs out to them, they go in and that’s it, they don’t have access to the text afterwards. So there’s a few things that limit, but there hasn’t been an issue.
C (Marco) Can I say something in favor of the Cloud? One reason we also did it in Matecat and Translated LSP is because we had privacy issues. We have been shipping content to translators via email and even the TMs via email. And basically when you send something to someone by email you have no control of what will happen afterwards with the data. So also the security of the desktop of the translator is not as good as a server. But the big problem is that you cannot control it. So, by going into the Cloud, at least for us what was really good was that I give access to the translator to the document he has to translate and the portion of the TM. But I don’t distribute data all around.
Because one point is that if I want to remove that now, I simply remove it and it is done. So to us it was an improvement in terms of security, the Cloud.
C () I agree. And it really depends on what you are comparing it to. I am in no way degrading or putting down the Cloud, because my customers use our software to expose their services to the cloud. OK? Just to share one very recent contract we just got. We were approached in the summertime, in July, by a client saying ‘we have a system that cannot be connected to the internet. Can you provide us a server and editing environment that we can disconnect from the internet altogether and still get machine translations?’ I said yes. We are not normally in the business of providing an editing environment, so there is a CAT tool in this particular contract we did. In this contract also was that we went out of our way to satisfy the customer because he had a budget. And paying customers come to the front line. And we are building their statistical models, we are training their system, we are building their corpus for them, they will be deployed into their system and they will operate it when it is actually built and deployed and functional. So, you know, we can satisfy those clients’ requirements in a way that other system providers can’t. So in that regard you really have to look into your system requirements, whether it is IP or security.
C (Achim) There are two winners.
C () The reality is that nowadays we are using so many services in the Cloud that the vast majority of people are getting used to it, on one hand. So in our case there is more of a self-selection, like if you are going to use something that is a crowd translation service, you know, you are not going to use it if you are the kind of customer that needs…
C () My customers would never consider that.
C (Marco) Yes, but since you destroy documents into chunks and yet you consider anonymizing…
C () Yes, that’s true. That is one of the things, so the fact that you only see a portion of it. I mean, arguments can be made on both sides, one can be not enough and another better…
C () I am not a security consultant. Go to your security consultant and to your lawyer for this.
C () We don’t go to those lanes and anything like that… I hope that it is clear to everybody that using data as TMs is quite different from using them to train MT systems. Through training MT systems it is not like getting whole sentences or that you are getting information that needs to be private (rest not audible)
C () You can change the name of the country and see what happens. (laughs)
Q () I think we have a very good model with Matecat. I think if you were able to provide more security to LSPs, I think you have an amazing business model out there.
Q (Marco) What do you mean, more security?
A () More security, I mean to be able to create my micro-Cloud or private Clouds to be able to contain our TMs and also educate our own engines, because in my case I don’t want like ten years of private data to be shared, to be an advantage to competition. And I am also dealing with very private industries, so, in which I am liable for the protection of my data. If I am able to have more protection for my data, I think this is an amazing system. I would love to be able to use it and also educating the engines. That would be something else.
C () OK, that educating the engine I think that this is something that we will want to do, so, I think that partially we solve the problem because remember that this is open source, so you can click, you can get it, Matecat, that experience into your server, into your private environment. Now if you are patient and have the engineers to run the infrastructure and maintain it, so if for security reasons you want to do it
you can do it, actually. What you are saying, I think, is that we create private vaults in the Cloud for you, so that it is in the Cloud but it is just for you, an instance. Correct? OK. We haven’t thought about it but it is a good idea.
C () I think that another point that would move in that direction is that if you encrypt the models, like your TMs or any data that comes from someone, with like a second special key or something which is not generated by a server, sorry, it can be generated but it is kind of just used by the user
C (Marco) Client-side encryption.
C () Yes, the client would provide the key and you would use that for encryption.
C () You mean encrypting the phrase tables? And the translation models?
C () All the data, the models themselves. If they are encrypted you don’t care if they are somewhere in the cloud, that someone hacks the service and can extract them, because they will still be encrypted and very hard to decrypt without the key you provided.
C (Achim) I think we are out of time. Very interesting discussions, very different use models of the open-source software. So I want to thank all the presenters and the panelists. Thank you. And please don’t forget to fill in the survey in the back.