THE MULTILINGUALWEB-LT PROJECT
DRUPALCON MUNICH, GERMANY. AUGUST 21, 2012
The current state of …
Page 2 2012-08-21
COCOMORE ESSENTIALS
THE MULTILINGUALWEB-LT PROJECT
Agency for integrated communication and IT services
130 employees
Offices in Germany and Bulgaria
Our clients include Fresenius, Nestlé, Procter & Gamble, PwC, RTL, Sanofi-Aventis
Drupal is our CMS of choice
THE MULTILINGUALWEB-LT PROJECT
WHAT WE WILL TALK ABOUT TODAY
1 THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
2 METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED
3 OUR PROTOTYPE IMPLEMENTATION OF THE STANDARD IN DRUPAL
4 CONCLUSION AND BACKGROUND INFORMATION
MULTILINGUAL CONTENT: AN ESSENTIAL BUT (STILL) INSUFFICIENT PART OF THE WEB
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
Less than 33% of Web users are native English speakers
• We need content for all non-English speakers too
• Knowledge and business opportunities shouldn’t be constrained by language borders
• Many people already use online translation engines for on-demand translation, but the quality can be lacking and some languages may not be supported
Translation is a US$ 21–26 billion per year business and steadily growing
Translation overhead amounts to 20% due to lack of standards
Reducing translation costs and increasing translation speed can help provide content to more Web users
Source: http://www.w3.org/2012/02/mlw-lt.html.en
To achieve the ambitious goal of helping the multilingual Web progress, the MultilingualWeb-LT project aims to
• create the translation metadata standard ITS 2.0, which will help translators and multilingual content managers deal with multilingual content
• bring the CMS and the localization chain closer together
Of course, we also want to foster adoption of the results.
Both goals should improve access to language resources and help increase the availability of multilingual content.
But why do we need metadata?
THE MULTILINGUALWEB-LT PROJECT
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
LT = Language technologies; ITS = Internationalization Tag Set
Metadata can assist the choice of LT resources
• Domain & genre
• Such information can help language service providers (LSPs) choose the right lexicon and syntax for their translation systems, or make human translators aware of particular content requirements. A medical text, for example, has different requirements than a cooking recipe.
Metadata may guide the translation process and ensure quality
• Do-not-translate items and terminology
• This tells machine translation services and human translators which parts of the content may or may not be translated, and which parts are terminology that needs to be translated with care.
WHY METADATA ARE IMPORTANT
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
Do-not-translate items and terminology
METADATA – EXAMPLES, PLEASE?
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
Domain & genre
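To make the do-not-translate and terminology categories concrete, here is a minimal Python sketch of how a tool might read such markup from HTML content. It assumes the attribute spellings used for HTML in the ITS 2.0 specification as later published (the HTML5 `translate` attribute and `its-term`); the draft discussed in this talk may differ. The sample sentence, the class name and the `scan` helper are hypothetical and not part of the project's prototype.

```python
from html.parser import HTMLParser

# Elements without an end tag; they never hold text, so they are skipped.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ItsLocalMarkupScanner(HTMLParser):
    """Collects text that is marked do-not-translate or marked as a term."""

    def __init__(self):
        super().__init__()
        self._stack = []      # one (translate, is_term) pair per open element
        self.protected = []   # text that must not be translated
        self.terms = []       # text flagged as terminology

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # "translate" is inherited from the parent unless overridden.
        translate = self._stack[-1][0] if self._stack else True
        if "translate" in attrs:
            translate = attrs["translate"] != "no"
        # Assumption: the term flag applies to the marked element only.
        is_term = attrs.get("its-term") == "yes"
        self._stack.append((translate, is_term))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        translate, is_term = self._stack[-1]
        if not translate:
            self.protected.append(text)
        if is_term:
            self.terms.append(text)


def scan(html):
    """Return (protected_texts, term_texts) for an HTML fragment."""
    scanner = ItsLocalMarkupScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.protected, scanner.terms


# Hypothetical medical sentence with one term and one protected name.
SAMPLE = (
    '<p>Take <span its-term="yes">ibuprofen</span> as directed by '
    '<span translate="no">ACME Health</span>.</p>'
)

if __name__ == "__main__":
    protected, terms = scan(SAMPLE)
    print("do not translate:", protected)
    print("terminology:", terms)
```

A translation tool could pass the protected strings through unchanged and look up the terms in a domain glossary before translating the rest.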
In the content at the LSP, of course, but the CMS also needs to store these metadata
• That includes the CMS's ability to enrich content with metadata and to manage these metadata
The CMS needs to be able to send content to the LSP while preserving the metadata, even when the LSP requires a particular file format
WHERE DOES THIS METADATA NEED TO BE?
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
Thirteen partners working on the standard and its implementation until the end of 2013
We will make Drupal the first open-source CMS to handle MultilingualWeb-LT metadata
WHO'S DOING THE WORK?
THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW
We also work with a W3C Working Group which includes an even wider range of partners working on the standard. We hope to bring the standard to W3C Recommendation status by the end of 2013! An ambitious goal indeed.
Several rounds of requirements gathering
• The consortium and the W3C Working Group approached all those whom the standard aims to please: LSPs of all kinds and content producers
• Collecting requirements for the standard from each such party
• Collating the information and generating metadata categories from these requirements
REQUIREMENTS GATHERING
METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED
Published metadata standard ITS 2.0 draft
• The collected and defined categories were then published as the first ITS 2.0 draft
• The draft is in its second iteration now and a third is coming soon
• We plan on having a “feature final” draft by November
• The current draft: http://www.w3.org/TR/its20/
PUBLISHED METADATA STANDARD
METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED
Due to security considerations, the CMS will only act as a “client”, sending information to and requesting information from the LSP “server”
PROGRESS – DEFINED API BETWEEN CMS & LSP
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
[Diagram labels: Web Publisher; Language Service Provider (Translation, Enrichment, Quality Assurance ...); Multilingual Web Content; service request; content]
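The client-only arrangement described on this slide can be sketched as a submit-and-poll loop: the CMS starts every exchange and the LSP never calls back into the CMS. The slides do not show the actual API, so everything below is a hypothetical stand-in: the `InMemoryLsp` class, its `submit` and `fetch` methods, and the fake "translation" it returns.

```python
class InMemoryLsp:
    """Hypothetical stand-in for an LSP server; it never contacts the CMS."""

    def __init__(self):
        self._jobs = {}

    def submit(self, content, target_lang):
        """Accept a job and hand back an identifier the CMS can poll with."""
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = {"content": content, "lang": target_lang, "polls": 0}
        return job_id

    def fetch(self, job_id):
        """Return the result, or None while the job is still in progress."""
        job = self._jobs[job_id]
        job["polls"] += 1
        # Pretend the translation becomes available on the second poll.
        if job["polls"] < 2:
            return None
        return f"[{job['lang']}] {job['content']}"


def translate_via_polling(lsp, content, target_lang, max_polls=5):
    """CMS side: every exchange is started by the CMS, never by the LSP."""
    job_id = lsp.submit(content, target_lang)
    for _ in range(max_polls):
        result = lsp.fetch(job_id)
        if result is not None:
            return result
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


if __name__ == "__main__":
    print(translate_via_polling(InMemoryLsp(), "Hello", "de"))
```

Because the CMS only makes outgoing requests, it does not need to expose an endpoint that accepts incoming traffic from the LSP. A real client would wait between polls and talk to a remote service instead of an in-memory object.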
Drupal is being enhanced to allow the inclusion of the MultilingualWeb-LT metadata, using the ITS 2.0 standard
Three ways to apply metadata:
• 1. by the content manager through an extension to the WYSIWYG module
• 2. through an automatic process, either in the CMS or through a service
• 3. by the processes and technologies of the LSP
Options 2 and 3 use plugins for the Translation Management module
Connection to multiple LSPs with XLIFF roundtripping
XLIFF: An OASIS Standard
PROGRESS – DRUPAL INTEGRATION
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
d.o/project/tmgmt d.o/project/wysiwyg
http://de.wikipedia.org/wiki/XML_Localization_Interchange_File_Format
Interface to Language Service Providers
• extends the TMgmt module with LSP adaptors
• provides a tool set for XLIFF (a popular format for LSP systems) roundtripping
THE PROTOTYPE INTERFACE WITH LSPS
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
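As an illustration of what XLIFF roundtripping involves, the following Python sketch exports source segments to an XLIFF 1.2 document, lets a stand-in for the LSP add targets, and reads the translations back by segment id. It is not the TMgmt implementation: the function names, the `node/1` identifier and the fake LSP step are assumptions made for this example, and only the most basic XLIFF 1.2 elements are used.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize without a namespace prefix


def export_xliff(segments, source_lang, target_lang):
    """Serialize {segment_id: source_text} as an XLIFF 1.2 string."""
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": "node/1",  # hypothetical CMS identifier
        "datatype": "plaintext",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for seg_id, text in segments.items():
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": seg_id})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")


def fake_lsp_translate(xliff_text, translate):
    """Stand-in for the LSP step: adds a <target> to every trans-unit."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    for unit in root.iterfind(".//x:trans-unit", ns):
        source = unit.find("x:source", ns)
        target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
        target.text = translate(source.text)
    return ET.tostring(root, encoding="unicode")


def import_xliff(xliff_text):
    """Read {segment_id: target_text} back from a translated XLIFF string."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    translations = {}
    for unit in root.iterfind(".//x:trans-unit", ns):
        target = unit.find("x:target", ns)
        if target is not None:
            translations[unit.get("id")] = target.text
    return translations


if __name__ == "__main__":
    sent = export_xliff({"title": "Hello", "body": "World"}, "en", "de")
    returned = fake_lsp_translate(sent, str.upper)  # upper-casing as fake translation
    print(import_xliff(returned))
```

Keeping the segment ids stable across export and import is what lets the CMS map each returned target back to the field it came from.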
Allows for local attribution
Global attribution supports XPath 1.0 selectors as required by the proposed ITS 2.0 standard
• Localisation notes
• Named entities
• Translate
• Language information
PROTOTYPE: METADATA SUPPORTED
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
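Global attribution means that a rule with a selector applies metadata to every matching node, instead of marking each node locally. The sketch below applies ITS-style `translateRule` elements to a small XML document. It assumes the rule syntax of the ITS 2.0 specification as later published, and it uses Python's ElementTree, which supports only a subset of XPath 1.0, so it handles just simple `//name` selectors. The sample document, the rules and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical global rules: code samples and legal notes stay untranslated.
RULES = f"""
<its:rules xmlns:its="{ITS_NS}" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//p[@class='legal']" translate="no"/>
</its:rules>
"""

# Hypothetical content to which the rules are applied.
DOCUMENT = """
<article>
  <p>Enable the <code>tmgmt</code> module first.</p>
  <p class="legal">All rights reserved.</p>
</article>
"""


def untranslatable_text(document_xml, rules_xml):
    """Return the text of every node selected by a translate="no" rule."""
    doc = ET.fromstring(document_xml)
    rules = ET.fromstring(rules_xml)
    protected = []
    for rule in rules.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        selector = rule.get("selector")
        # ElementTree only accepts relative paths, so "//x" becomes ".//x".
        # Other absolute selectors are outside the scope of this sketch.
        if selector.startswith("//"):
            selector = "." + selector
        for element in doc.findall(selector):
            protected.append("".join(element.itertext()).strip())
    return protected


if __name__ == "__main__":
    print(untranslatable_text(DOCUMENT, RULES))
```

A full implementation would need a complete XPath 1.0 engine and the precedence rules between global and local markup, both of which this sketch leaves out.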
Cocomore: [email protected]
W3C Working Group
• w3.org/International/multilingualweb/lt/
Translation Management Module
• d.o/project/tmgmt
Drupal Discussion
• g.d.o/multilingualweb-lt
MORE INFO AND CONTACT
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
The MultilingualWeb-LT Working Group receives funding from the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) Grant Agreement No. 287815 - http://l.olav.net/fp7-lt-web
FUNDING
PROGRESS IN THE MULTILINGUALWEB-LT PROJECT