L10N Standards Warszawa 2014


Upload: morgan-moody

Post on 18-Dec-2015


Page 1: L10N Standards Warszawa 2014

L10N Standards

Warszawa 2014


Page 2: L10N Standards Warszawa 2014

Why Standards?

Page 3: L10N Standards Warszawa 2014

Why have Standards?

Page 4: L10N Standards Warszawa 2014

L10N Standards

What are we going to cover:

1. Why L10N standards are important
2. The role XML has to play
3. Key L10N data standards
4. How to leverage L10N standards
5. Creating a fully data-driven, automated L10N process
6. Interoperability

Page 5: L10N Standards Warszawa 2014

Why have Standards?

Page 6: L10N Standards Warszawa 2014

Current State of Art

Page 7: L10N Standards Warszawa 2014

L10N Typical Workflow

Page 8: L10N Standards Warszawa 2014

What you need is a better crane!???

Page 9: L10N Standards Warszawa 2014

Localization without Standards

[Diagram: workflow boxes labelled Customer, source text, extract, extracted text, TM, process, prepared text, translate, translated text, merge, target text and QA.]

Page 10: L10N Standards Warszawa 2014

True Cost of Translation

Page 11: L10N Standards Warszawa 2014

Standards = Uniform Data

Page 12: L10N Standards Warszawa 2014

ISO Standard

Page 13: L10N Standards Warszawa 2014

Standards = Efficiency

Page 14: L10N Standards Warszawa 2014

Standards = Lower Costs

Page 15: L10N Standards Warszawa 2014

Standards = Safe to Implement

Page 16: L10N Standards Warszawa 2014

Standards = Greater Interoperability

Page 17: L10N Standards Warszawa 2014

Standards: Unforeseen Benefits

Page 18: L10N Standards Warszawa 2014

Standards: Unforeseen Benefits

Page 19: L10N Standards Warszawa 2014

Standards: Misuse


Page 20: L10N Standards Warszawa 2014

Standards: Abuse

Page 21: L10N Standards Warszawa 2014

Standards: Sabotage

• Sabotaged Standards:
– Proprietary extensions
– Bad implementations

Page 22: L10N Standards Warszawa 2014
Page 23: L10N Standards Warszawa 2014

The importance of XML

Everything is now XML
• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• Open Office
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL Open Architecture for XML Authoring and Localization

Page 24: L10N Standards Warszawa 2014

The power of XML

Any electronic format not in XML can be converted to XML
• FrameMaker
• RTF
• Microsoft Office pre-2007
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.

And then back into the original format
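
The round trip described on this slide can be sketched in a few lines. This is a minimal illustration only, assuming a simplified key=value resource format (comments and escapes are ignored); the `resources` and `entry` element names are hypothetical and not taken from any standard.

```python
import xml.etree.ElementTree as ET


def properties_to_xml(text: str) -> str:
    """Convert simple key=value lines into an XML document (hypothetical vocabulary)."""
    root = ET.Element("resources")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this sketch does not preserve blank lines or comments
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", {"key": key.strip()})
        entry.text = value.strip()
    return ET.tostring(root, encoding="unicode")


def xml_to_properties(xml_text: str) -> str:
    """Convert the XML form back into key=value lines."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{e.get('key')}={e.text or ''}" for e in root.iter("entry"))


source = "greeting=Hello\nfarewell=Goodbye"
xml_form = properties_to_xml(source)
round_tripped = xml_to_properties(xml_form)
print(xml_form)
print(round_tripped == source)
```

Once the content is in XML, the same extraction and translation memory tooling can be applied to it regardless of the original format.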

Page 25: L10N Standards Warszawa 2014

Benefits of XML for L10N

• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization

Page 26: L10N Standards Warszawa 2014

The significance of XML

• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization

Page 27: L10N Standards Warszawa 2014

Benefits of XML for L10N

Why use XML for Localization?
• Most localizable documents are now in XML
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Open Standards based
• You can use XML to assist its own localization
• One extraction + TM + SMT engine

Page 28: L10N Standards Warszawa 2014

Core L10N Standards

• W3C ITS Document Rules

• ETSI LIS SRX

• ETSI LIS xml:tm

• ETSI LIS TMX

• ETSI LIS TBX

• ETSI LIS GMX

• OASIS XLIFF

• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)

• Linport Interoperability: TIPP XLIFF:doc

Page 29: L10N Standards Warszawa 2014

ITS

• Internationalization and Localization Tag Set
– http://www.w3.org/International/its
• Internationalization Tag Set – Document Rules for a given XML vocabulary:
– Inline elements (within text)
– Sub flows
– Non-translatable
– Translatable attributes
• Guidelines for localizing XML documents
• Internationalization and Localization Markup Requirements
• Version 1.0, 2008
• Version 2.0, 2013
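
As an illustration of the "non-translatable" document rule, here is a hedged sketch that applies an ITS `translateRule` to a small document. The sample `doc`, `para` and `code` vocabulary is invented for the example, and the sketch simplifies heavily: it ignores rule precedence and inheritance, and it relies on ElementTree's limited XPath support, so it is not a conformant ITS processor.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# A global rule marking every <code> element as non-translatable.
RULES = f"""
<its:rules xmlns:its="{ITS_NS}" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
</its:rules>
"""

# Hypothetical document vocabulary, used only for this example.
DOC = """
<doc>
  <para>Press the button.</para>
  <code>printf("hello")</code>
  <para>Then wait.</para>
</doc>
"""


def non_translatable(doc_root, rules_root):
    """Return the elements that a translate="no" rule selects."""
    blocked = set()
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") == "no":
            # ElementTree needs a relative path, so "//code" becomes ".//code".
            blocked.update(doc_root.findall("." + rule.get("selector")))
    return blocked


def translatable_text(doc_root, rules_root):
    """Collect the text of every element not blocked by a rule."""
    blocked = non_translatable(doc_root, rules_root)
    return [
        el.text.strip()
        for el in doc_root.iter()
        if el not in blocked and el.text and el.text.strip()
    ]


doc_root = ET.fromstring(DOC)
rules_root = ET.fromstring(RULES)
print(translatable_text(doc_root, rules_root))
```

Keeping the rules outside the document means one rule file can describe a whole XML vocabulary, which is what lets an extraction tool handle any XML format generically.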

Page 30: L10N Standards Warszawa 2014

TMX

• http://www.etsi.org/deliver/etsi_gs/lis/001_099/002/01.04.02_60/gs_lis002v010402p.pdf
• Translation Memory Exchange
• Current version 1.4b, 2.0 undergoing review
• Allows for the interchange of translation memories between different vendor systems
– No translation vendor lock-in
– Free exchange of translation assets
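
To make the interchange idea concrete, the following sketch writes a tiny Level 1 (plain text) TMX 1.4 file and reads it back. The tool name, header values and sample sentences are placeholders chosen for the example, and real translation memories carry far more metadata than shown here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="pl"):
    """Serialize (source, target) sentence pairs as a minimal TMX document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(tmx_text):
    """Return one {language: segment} dictionary per translation unit."""
    root = ET.fromstring(tmx_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        for tu in root.iter("tu")
    ]


tmx_text = build_tmx([("Hello", "Cześć"), ("Thank you", "Dziękuję")])
memory = read_tmx(tmx_text)
print(memory)
```

Because both sides agree on the `tu`, `tuv` and `seg` structure, a memory exported by one vendor's tool can be imported by another, which is the lock-in point made on the slide.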

Page 31: L10N Standards Warszawa 2014

TMX History

• First LISA OSCAR Standard
– Version 1.1 1998
– Version 1.2 1999
– Version 1.3 2001
– Version 1.4b 2002
• Moved to ETSI/LIS 2012
– Version 2.0 2014?
• Two levels of implementation:
– Level 1 (Plain Text Only)
– Level 2 (Content Markup)

Page 32: L10N Standards Warszawa 2014

SRX

http://www.gala-global.org/oscarStandards/srx/srx20.html

• Segmentation Rules Exchange
• Current version 2.0 2008
• How sentences are segmented
• Allows for the exchange of segmentation rules using regular expressions
• Complements TMX standard
• Referenced by XLIFF, TMX and xml:tm

Page 33: L10N Standards Warszawa 2014

SRX Key Concepts

• Unicode regular expression syntax defined
• Meta characters – Unicode regular expressions: "\X", "\s", "\S" etc.
• Operators – "*", "|", "?", "+" etc.
• Defines:
– Language rules: segmentation rules
– Map rules: how to apply the segmentation rules
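
A rough sketch of how language rules of this kind can drive segmentation: each rule pairs a "before break" and an "after break" regular expression and says whether a break is allowed, and the first rule that matches a position decides. The two rules below are invented for the example, map rules are left out, and Python's `re` module lacks some Unicode constructs such as `\X`, so this is an approximation rather than an SRX implementation.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True = break here, False = exception (no break)
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Exceptions come first: the first rule that matches a position wins.
RULES = [
    Rule(False, r"\b(?:Mr|Dr|e\.g)\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s+"),
]


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text into segments at positions where a break rule applies."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search("(?:" + rule.before + r")\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides this position
    segments.append(text[start:].strip())
    return [s for s in segments if s]


sentences = segment("Dr. Smith arrived. He was late! Really?", RULES)
print(sentences)
```

Exchanging the rules rather than the code is the point: two tools that load the same rule set should cut the same text into the same segments, so their translation memories line up.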

Page 34: L10N Standards Warszawa 2014

GMX

http://docbox.etsi.org/ISG/Open/ISGLIS/GMX-V/GMX-V/GMX-V-2.0.html

• Global Information Management Metrics eXchange

• GMX/V Approved LISA OSCAR Standard February 2007

• Tripartite– GMX-V : Volume, published for public comment

– GMX-C : Complexity, initial specification

– GMX-Q : Quality

• Standard for defining an L10N job

• Allows for quantifying job complexity

• GMX/V 2.0 Approved ETSI LIS

– added support for CJK word counts

– overall character count including white space characters

Page 35: L10N Standards Warszawa 2014

GMX-V

• GIM Metrics eXchange – Volume
• Objectives:
– Unambiguous and verifiable definition of word and character counts
– A method of exchanging counts within an XML framework
• Two types of count:
– Verifiable, based on electronic documents
– Non-verifiable
• Canonical form: XLIFF based
• Word boundaries: Unicode TR29
• Unicode character encoding
• Minimum conformance
– Total Character Count
– Total Word Count
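
The idea of a verifiable count can be sketched as a deterministic function over the extracted segments. Note the assumptions: the regular expression below is only a stand-in for Unicode TR29 word boundaries (it will not handle CJK text correctly), and the dictionary keys are names made up for this example, not GMX-V element names.

```python
import re
import unicodedata

# Simplified word pattern: runs of word characters, allowing inner
# apostrophes and hyphens. A real implementation would follow TR29.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def count_volume(segments):
    """Return word and character totals for a list of text segments."""
    words = 0
    chars_without_space = 0
    chars_with_space = 0
    for seg in segments:
        text = unicodedata.normalize("NFC", seg)  # count composed characters once
        whitespace = sum(1 for ch in text if ch.isspace())
        words += len(WORD.findall(text))
        chars_with_space += len(text)
        chars_without_space += len(text) - whitespace
    return {
        "words": words,
        "characters_excluding_whitespace": chars_without_space,
        "characters_including_whitespace": chars_with_space,
    }


totals = count_volume(["Hello world.", "It's fine"])
print(totals)
```

Because the input is normalized and the rules are fixed, anyone holding the same document can rerun the count and get the same numbers, which is what makes a quote based on it verifiable.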

Page 36: L10N Standards Warszawa 2014

XLIFF

• http://www.oasis-open.org/committees/xliff
• XLIFF – XML Localization Interchange File Format
• Current status
– XLIFF 1.1 Committee Specification (31 Oct 2003)
– XLIFF 1.2 Approved as an OASIS Standard 2008
  • Segmentation support
  • (X)HTML XLIFF 1.1 Representation Guide
  • PO / POT XLIFF 1.1 Representation Guide
  • Java / Windows / .Net Representation Guide
– XLIFF 2.0 currently out for public comment (not backwards compatible)

Page 37: L10N Standards Warszawa 2014

XLIFF

Page 38: L10N Standards Warszawa 2014

XLIFF

• Single format for exchanging L10N data from disparate sources
• Loss-less
• Tool-neutral
• Formalized as an XML vocabulary
• Can embed skeleton file
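
As a small illustration of the tool-neutral exchange format, this sketch reads source and target text out of an XLIFF 1.2 file. The sample file content, file name and language pair are invented for the example, and inline markup, skeletons and segmentation elements are not handled.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: one translated unit and one still awaiting translation.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="pl" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Cześć</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source, target) tuples; target is None when untranslated."""
    root = ET.fromstring(xliff_text)
    return [
        (
            unit.get("id"),
            unit.findtext("x:source", namespaces=NS),
            unit.findtext("x:target", namespaces=NS),
        )
        for unit in root.iterfind(".//x:trans-unit", NS)
    ]


units = read_units(SAMPLE)
pending = [unit_id for unit_id, _, target in units if target is None]
print(units)
print(pending)
```

Whatever the original file type was, the translation tool only ever sees `trans-unit` elements, which is why one format can serve disparate sources.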

Page 39: L10N Standards Warszawa 2014

xml:tm

http://www.xtm-intl.com/manuals/xml-tm/xml-tm2.0.html

• XML based Text Memory
– Radical rethink of how to handle Translation Memory
– Donated by XML INTL to LISA OSCAR
– OSCAR Standard Feb 2007
– Adopted by ETSI LIS, version 2.0 ready for adoption
• Takes the DITA reuse principle down to sentence level
– Author Memory
– Translation Memory

Page 40: L10N Standards Warszawa 2014

xml:tm - Namespace

• Namespace is a major feature of XML
• Allows the mapping of different ontological entities onto the same representation
• Allows different ways to look at the same data
• Namespaces can be made transparent

Page 41: L10N Standards Warszawa 2014

xml:tm

• XML based text memory
• Revolutionary approach to translating XML documents
• First significant advance in translation memory technology
• Uses XML namespace to transparently embed contextual information
• The one ring that binds them all

Page 42: L10N Standards Warszawa 2014

xml:tm namespace

Example of the use of tm namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
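
The claim that the namespace can be made transparent can be illustrated with a short sketch: the same tree yields either the list of text units or, with the tm elements unwrapped, the plain source document. The namespace URI is the one shown on the slide; the `flatten` and `rebuild` helpers are written for this example and are not part of xml:tm.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as given on the slide

DOC = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para><tm:te><tm:tu>Namespace is very flexible.</tm:tu> <tm:tu>It is very easy to use.</tm:tu></tm:te></para>
    </section>
  </tm:tm>
</document>"""


def text_units(root):
    """Text memory view: the content of every tm:tu element."""
    return [tu.text for tu in root.iter(f"{{{TM_NS}}}tu")]


def flatten(elem):
    """Return elem's content as strings and elements, with tm wrappers removed."""
    items = [elem.text or ""]
    for child in elem:
        if child.tag.startswith(f"{{{TM_NS}}}"):
            items.extend(flatten(child))  # splice the wrapper's content in place
        else:
            items.append(rebuild(child))
        items.append(child.tail or "")
    return items


def rebuild(elem):
    """Copy elem without any tm elements (the source document view)."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    last = None
    for item in flatten(elem):
        if isinstance(item, str):
            if last is None:
                new.text = (new.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            new.append(item)
            last = item
    return new


root = ET.fromstring(DOC)
plain = rebuild(root)
print(text_units(root))
print(plain.find(".//para").text)
```

Both views come from one file, so the segmentation travels with the document instead of living in a separate database.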

Page 43: L10N Standards Warszawa 2014

xml:tm namespace

[Diagram: the same source document in two views. The tm namespace view shows the doc, title, section and para elements together with tm, te and tu elements wrapping each sentence; the source document view shows only doc, title, section, para and text.]

Page 44: L10N Standards Warszawa 2014

xml:tm Text Memory

• Author memory
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
• Translation memory
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs

Page 45: L10N Standards Warszawa 2014

xml:tm DOM differencing

[Diagram: DOM differencing. The original source document contains tu id="1" to tu id="6"; the updated source document contains tu id="1", "2", "3", "4", "7", "6" and "8", with units marked as deleted, modified and new.]
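
The classification of text units into unchanged, deleted, modified and new can be approximated with a sequence diff. This sketch uses Python's `difflib` over lists of segment strings as a stand-in; it is not the DOM differencing algorithm used by xml:tm, and the sample segments are invented.

```python
from difflib import SequenceMatcher


def diff_units(old, new):
    """Classify text units of the updated document against the original."""
    changes = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            changes += [("unchanged", text) for text in new[b1:b2]]
        elif op == "delete":
            changes += [("deleted", text) for text in old[a1:a2]]
        elif op == "insert":
            changes += [("new", text) for text in new[b1:b2]]
        else:  # "replace": pair units up, then report any surplus
            paired = min(a2 - a1, b2 - b1)
            changes += [("modified", text) for text in new[b1:b1 + paired]]
            changes += [("new", text) for text in new[b1 + paired:b2]]
            changes += [("deleted", text) for text in old[a1 + paired:a2]]
    return changes


original = ["A", "B", "C", "D", "E", "F"]
updated = ["A", "B", "D", "E2", "F", "G"]
result = diff_units(original, updated)
for status, text in result:
    print(status, text)
```

Only the modified and new units need to go to a translator; unchanged units keep their existing translations, which is where the cost saving comes from.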

Page 46: L10N Standards Warszawa 2014

xml:tm translated document in Polish

[Diagram: the translated Polish document in the same two views. The tm namespace view wraps each sentence (zdanie) in te and tu elements; the translated document view shows only doc, title, section, para and text (tekst).]

Page 47: L10N Standards Warszawa 2014

Putting It All Together

Page 48: L10N Standards Warszawa 2014

• Open Architecture for XML Authoring and Localization (OAXAL)

– http://wiki.oasis-open.org/oaxal/FrontPage

Page 49: L10N Standards Warszawa 2014

OAXAL 2.0

Page 50: L10N Standards Warszawa 2014

OAXAL 2.0

Page 51: L10N Standards Warszawa 2014

OAXAL Benefits

• SOA (Service Oriented Architecture) Open Architecture

• Open Standards - Open APIs

• Easy Exchange

• Modular design

• Interoperability

• Very high level of automation

Page 52: L10N Standards Warszawa 2014

Interoperability Now!/Linport

Interoperability Now!
http://www.interoperability-now.org/
• Born out of frustration and necessity
• Early 2012
• Members
– Bioloom Group
– Kilgray
– Medtronic
– Ontram
– Spartan Software
– XTM-INTL
• The goal:
– True 100% roundtrip interoperability between TMS/CAT tools
• Now part of Linport

Page 53: L10N Standards Warszawa 2014

Interoperability Now!/Linport

Linport
http://www.linport.org/
• Language INteroperability Portfolio
• Created in 2012 by the merging of two initiatives:
– Multilingual Electronic Dossier
– The Container Project
• Sponsored by:
– The European Union DG Translation
– JIAMCATT (http://jiamcatt.org/): Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology

Page 54: L10N Standards Warszawa 2014

OAXAL in Action

Page 55: L10N Standards Warszawa 2014

Translating English Soccer Articles into Arabic 24x7

Page 56: L10N Standards Warszawa 2014

Translating English Soccer Articles into Arabic 24x7

Page 57: L10N Standards Warszawa 2014

Browser-Based Workbench

Page 58: L10N Standards Warszawa 2014

OAXAL In Action

Page 59: L10N Standards Warszawa 2014

• Contact details:
– Andrzej Zydroń
– [email protected]
– http://www.xtm-intl.com