The MultilingualWeb-LT (=Language Technology) Project

THE MULTILINGUALWEB-LT PROJECT DRUPALCON MUNICH, GERMANY. AUGUST 21, 2012 The current state of …


Page 2 2012-08-21

COCOMORE ESSENTIALS

THE MULTILINGUALWEB-LT PROJECT

Agency for integrated communication and IT services

130 employees

Offices in Germany and Bulgaria

Our clients include Fresenius, Nestlé, Procter & Gamble, PwC, RTL, Sanofi-Aventis

Drupal is our CMS of choice


THE MULTILINGUALWEB-LT PROJECT

WHAT WE WILL TALK ABOUT TODAY

1 THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW

2 METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED

3 OUR PROTOTYPE IMPLEMENTATION OF THE STANDARD IN DRUPAL

4 CONCLUSION AND BACKGROUND INFORMATION


MULTILINGUAL CONTENT: AN ESSENTIAL BUT (STILL) INSUFFICIENT PART OF THE WEB

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW

Less than 33% of Web users are native English speakers

• We need content for all non-English speakers too

• Knowledge and business opportunities shouldn’t be constrained by language borders

• Many people already use online translation engines for on-demand translation, but the quality can be lacking and some languages may not be supported

Translation is a US$ 21–26 billion per year business and steadily growing

Translation overhead amounts to 20% due to lack of standards

Reducing translation costs and increasing translation speed can help provide content to more Web users

Source: http://www.w3.org/2012/02/mlw-lt.html.en


To achieve the ambitious goal of helping the multilingual Web progress, the MultilingualWeb-LT project aims to

• create the translation metadata standard ITS 2.0, which will help translators and multilingual content managers deal with multilingual content

• bring CMSs and the localization chain closer together.

• Of course we also want to foster adoption of the results.

Both goals should improve access to language resources and help increase the availability of multilingual content.

But why do we need metadata?

THE MULTILINGUALWEB-LT PROJECT

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW

LT = Language technologies; ITS = Internationalization Tag Set


Metadata can assist the choice of LT resources

• Domain & genre

• Such information can help LSPs use the right lexicon and syntax for their translation systems, or make human translators aware of particular content requirements. A medical text, for example, has different requirements than a cooking recipe.

Metadata may guide the translation process and ensure quality

• Do-not-translate items and terminology

• This helps machine translation services and human translators know which parts of the content should or should not be translated, and which parts are terminology that needs to be translated with care.

WHY METADATA ARE IMPORTANT

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW


Do-not-translate items and terminology

METADATA – EXAMPLES, PLEASE?

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW

Domain & genre
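
One way such metadata could look in HTML content, and how a tool might read it. This is a minimal sketch, not the project's code: it assumes the HTML5 translate attribute for do-not-translate items and the ITS 2.0 its-term attribute for terminology. The class name and the sample sentence are made up for illustration, and domain and genre are not covered.

```python
from html.parser import HTMLParser

# Elements without an end tag; they never open a new scope.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TranslatableText(HTMLParser):
    """Sort text into translatable and protected parts and collect terms."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one (translate, is_term) pair per open element
        self.translatable = []
        self.protected = []
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # The translate flag is inherited unless the element sets it itself.
        inherited = self.stack[-1][0] if self.stack else True
        flag = attrs.get("translate")
        translate = inherited if flag is None else flag != "no"
        self.stack.append((translate, attrs.get("its-term") == "yes"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        translate, is_term = self.stack[-1] if self.stack else (True, False)
        (self.translatable if translate else self.protected).append(text)
        if is_term:
            self.terms.append(text)


# Hypothetical sample content with one protected name and one marked term.
SAMPLE = (
    '<p>Install the <span translate="no">Drupal</span> module and read the '
    '<span its-term="yes">style guide</span> first.</p>'
)

parser = TranslatableText()
parser.feed(SAMPLE)
parser.close()
print("translate:", parser.translatable)
print("keep as is:", parser.protected)
print("terminology:", parser.terms)
```

A machine translation service or a translator's editor could use the same information to lock the protected span and to flag the term for a glossary lookup.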


In the content at the LSP, of course, but the CMS also needs to store these metadata

• That includes the CMS’s ability to enhance content with metadata and manage these metadata

The CMS needs to be able to send content to the LSP while preserving the metadata, even when the LSP requires a particular file format

WHERE DOES THIS METADATA NEED TO BE?

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW


Thirteen partners working on the standard and its implementation until the end of 2013

We will make Drupal the first open-source CMS to handle MultilingualWeb-LT metadata

WHO’S DOING THE WORK?

THE MULTILINGUALWEB-LT PROJECT. AN OVERVIEW

We also work with a W3C Working Group which includes an even wider range of partners working on the standard. We hope to bring the standard to W3C Recommendation status by the end of 2013! An ambitious goal indeed.


Several rounds of requirements gathering

• The consortium and the W3C Working Group approached all those whom the standard aims to please: LSPs of all kinds and content producers

• Collecting requirements for the standard from each such party

• Collating the information and generating metadata categories from these requirements

REQUIREMENTS GATHERING

METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED


Published metadata standard ITS 2.0 draft

• The collected and defined categories were then published as the first ITS 2.0 draft

• The draft is in its second iteration now and a third is coming soon

• We plan on having a “feature final” draft by November

• The current draft: http://www.w3.org/TR/its20/

PUBLISHED METADATA STANDARD

METADATA STANDARDIZATION AND WHAT WE HAVE ACHIEVED


Due to security considerations, the CMS will only act as a “client”, sending information to and requesting it from the LSP “server”

PROGRESS – DEFINED API BETWEEN CMS & LSP

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT

[Diagram: Web Publisher and Language Service Provider (Translation, Enrichment, Quality Assurance ...) exchanging service requests, content, and multilingual Web content]
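
The client-only role of the CMS can be sketched as a pull model: the CMS submits a job and later asks for the result, and the LSP never opens a connection back to the CMS. Everything below is hypothetical (FakeLsp, CmsClient, and their methods are illustration names, not the API defined by the project), and the server is an in-memory stand-in for what would be a remote service.

```python
from itertools import count


class FakeLsp:
    """In-memory stand-in for an LSP server (hypothetical interface)."""

    def __init__(self):
        self._ids = count(1)
        self._jobs = {}

    def submit(self, content, target_lang):
        # Store the job and hand back an id the client can ask about later.
        job_id = next(self._ids)
        self._jobs[job_id] = {"content": content, "lang": target_lang, "result": None}
        return job_id

    def fetch(self, job_id):
        # Returns None until the job has been finished.
        return self._jobs[job_id]["result"]

    def finish(self, job_id, translation):
        # Server-side work; note that it never calls the CMS.
        self._jobs[job_id]["result"] = translation


class CmsClient:
    """The CMS side: it initiates every exchange."""

    def __init__(self, lsp):
        self.lsp = lsp
        self.pending = {}        # job id -> content id
        self.translations = {}   # content id -> translated text

    def request_translation(self, content_id, content, target_lang):
        self.pending[self.lsp.submit(content, target_lang)] = content_id

    def poll(self):
        """Ask the LSP about open jobs; return how many results were collected."""
        collected = 0
        for job_id, content_id in list(self.pending.items()):
            result = self.lsp.fetch(job_id)
            if result is not None:
                self.translations[content_id] = result
                del self.pending[job_id]
                collected += 1
        return collected


lsp = FakeLsp()
cms = CmsClient(lsp)
cms.request_translation("node/1", "Hello world", "de")
first_poll = cms.poll()          # job still open, nothing collected
lsp.finish(1, "Hallo Welt")      # the LSP completes the job on its side
second_poll = cms.poll()         # the CMS picks up the result
print(first_poll, second_poll, cms.translations)
```

Because only the CMS opens connections, it does not need to expose an endpoint that accepts incoming content from outside.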


Drupal is being enhanced to allow the inclusion of the MultilingualWeb-LT metadata, using the ITS 2.0 standard

Three ways to apply metadata:

• 1. by the content manager through an extension to the WYSIWYG module

• 2. through an automatic process, either in the CMS or through a service

• 3. by the processes and technologies of the LSP

Options 2 and 3 use plugins for the Translation Management module

Connection to multiple LSPs with XLIFF roundtripping

XLIFF: An OASIS Standard

PROGRESS – DRUPAL INTEGRATION

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT

d.o/project/tmgmt d.o/project/wysiwyg

http://de.wikipedia.org/wiki/XML_Localization_Interchange_File_Format


Interface to Language Service Providers

• extends the TMgmt module with LSP adaptors

• provides a tool set for roundtripping XLIFF, a popular format for LSP systems

THE PROTOTYPE INTERFACE WITH LSPS

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
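
XLIFF roundtripping can be illustrated with a small sketch: export segments with their metadata, let the LSP add targets, and read the targets back. It assumes XLIFF 1.2 and uses only that format's own translate attribute and note element to carry metadata. The function names, the segment tuples, and the "node/1" identifier are invented for the example, and fake_lsp_translate merely stands in for the LSP; this is not the prototype's tool set.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)   # serialize XLIFF as the default namespace


def tag(name):
    return f"{{{XLIFF_NS}}}{name}"


def export_xliff(segments, source_lang, target_lang):
    """Build XLIFF 1.2 from (id, text, translate, note) tuples."""
    root = ET.Element(tag("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, tag("file"), {
        "original": "node/1",            # hypothetical content identifier
        "datatype": "html",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, tag("body"))
    for seg_id, text, translate, note in segments:
        unit = ET.SubElement(body, tag("trans-unit"), {
            "id": seg_id,
            "translate": "yes" if translate else "no",
        })
        ET.SubElement(unit, tag("source")).text = text
        if note:
            ET.SubElement(unit, tag("note")).text = note
    return ET.tostring(root, encoding="unicode")


def fake_lsp_translate(xml_text, translations):
    """Stand-in for the LSP: add <target> to every translatable unit."""
    root = ET.fromstring(xml_text)
    for unit in root.iter(tag("trans-unit")):
        if unit.get("translate") == "no":
            continue                     # respect the do-not-translate flag
        target = ET.Element(tag("target"))
        target.text = translations[unit.get("id")]
        unit.insert(1, target)           # <target> directly follows <source>
    return ET.tostring(root, encoding="unicode")


def import_xliff(xml_text):
    """Read back {id: target text} for every unit that has a <target>."""
    ns = {"x": XLIFF_NS}
    result = {}
    for unit in ET.fromstring(xml_text).iterfind(".//x:trans-unit", ns):
        target = unit.find("x:target", ns)
        if target is not None:
            result[unit.get("id")] = target.text
    return result


segments = [
    ("title", "Welcome", True, "Page heading"),
    ("brand", "Drupal", False, None),
]
exported = export_xliff(segments, "en", "de")
returned = fake_lsp_translate(exported, {"title": "Willkommen"})
imported = import_xliff(returned)
print(imported)
```

The protected unit comes back without a target, so the CMS keeps its source text unchanged.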


Allows for local attribution

Global attribution supports XPath 1.0 selectors, as required by the proposed ITS 2.0 standard

• Localisation notes

• Named entities

• Translate language information

PROTOTYPE: METADATA SUPPORTED

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT
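
How local and global attribution interact can be sketched for the Translate category. In ITS, local markup on an element takes precedence over global rules, later global rules win over earlier ones, and the value is otherwise inherited with "translate" as the default. The sketch below is an assumption-laden illustration, not the prototype: rules are given as plain (selector, flag) pairs instead of ITS rule elements, and it relies on the XPath subset of Python's ElementTree rather than full XPath 1.0, so selectors are written relative to the root (".//code" instead of "//code").

```python
import xml.etree.ElementTree as ET


def translate_flags(root, global_rules):
    """Return {element: bool} from global rules, local flags, and inheritance."""
    selected = {}
    for selector, translate in global_rules:      # later rules overwrite earlier ones
        for el in root.findall(selector):
            selected[el] = translate

    flags = {}

    def walk(el, inherited):
        local = el.get("translate")
        if local is not None:
            value = local == "yes"                 # local markup beats global rules
        else:
            value = selected.get(el, inherited)    # global rule, else inherit
        flags[el] = value
        for child in el:
            walk(child, value)

    walk(root, True)                               # default: content is translatable
    return flags


# Hypothetical document: one global rule protects all <code> elements,
# one paragraph is protected locally, one <code> is re-enabled locally.
DOC = ET.fromstring(
    "<doc>"
    "<p>Run <code>drush cc all</code> now.</p>"
    "<p translate='no'>ACME <code translate='yes'>note</code></p>"
    "</doc>"
)
RULES = [(".//code", False)]

flags = translate_flags(DOC, RULES)
code_flags = {el.text: ok for el, ok in flags.items() if el.tag == "code"}
paragraph_flags = [flags[p] for p in DOC.findall("p")]
print(code_flags, paragraph_flags)
```

Global rules let a site protect whole classes of content in one place, while local attributes handle the exceptions a content manager marks by hand.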


Cocomore: [email protected]

W3C Working Group

• w3.org/International/multilingualweb/lt/

Translation Management Module

• d.o/project/tmgmt

Drupal Discussion

• g.d.o/multilingualweb-lt

MORE INFORMATION AND CONTACT

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT


The MultilingualWeb-LT Working Group receives funding from the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) Grant Agreement No. 287815 - http://l.olav.net/fp7-lt-web

FUNDING

PROGRESS IN THE MULTILINGUALWEB-LT PROJECT