How to Localize XML Documents: A Workshop on the Do's and Don'ts of XML Localization
November 2014
http:///
XML dominant!
• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• OpenOffice
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL (Open Architecture for XML Authoring and Localization)
Benefits of XML for L10N
• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization
The real significance of XML
• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization
Benefits of XML for L10N
Why use XML for localization?
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Based on open standards
• XML can be used to assist its own localization
• One extraction + translation memory (TM) + statistical machine translation (SMT) engine
Benefits of XML
Any electronic format not in XML can be converted to XML:
• FrameMaker
• RTF
• Microsoft Office (pre-2007)
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.
And then back into the original format
Do: Use Standard XML Libraries
DO!
• Xerces/Xalan
• MSXML
DO NOT!
• Write your own XML parser
• Write your own serializer
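As a minimal illustration, assuming Python's standard `xml.etree.ElementTree` module stands in for Xerces or MSXML: the sketch parses a paragraph, replaces the text of an inline element, and lets the library's serializer escape reserved characters. The function name `replace_tool_text` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample paragraph with reserved characters already escaped in the source.
SOURCE = '<para>Use a <tool id="a1098">claw hammer</tool> &amp; a &lt;10 mm spanner.</para>'


def replace_tool_text(xml_text: str, new_text: str) -> str:
    """Parse with the standard parser, change the <tool> text, and let the
    standard serializer handle escaping on the way out."""
    root = ET.fromstring(xml_text)
    root.find("tool").text = new_text
    return ET.tostring(root, encoding="unicode")


# Illustrative replacement text containing '&', which must be escaped.
result = replace_tool_text(SOURCE, "hammer & wrench")
print(result)
```

A hand-written serializer that simply concatenated strings would have emitted the bare `&` and produced a document that no conforming parser accepts.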
Do: Use XLIFF
XML Localization Interchange File Format (XLIFF), OASIS Standard:
https://www.oasis-open.org/committees/xliff
1. Extract text for translation
2. Use a standard library parser!
• Java/C++: Apache Xerces
• .NET: MSXML
3. XLIFF:doc, a simplified XLIFF
• XLIFF 1.2 subset
• http://code.google.com/p/interoperability-now/
4. Use a skeleton file
• Use markers for inline elements
• Use only <g> and <x> inline elements
• Use <mrk> for terminology
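The extraction steps above might be sketched as follows, again assuming Python's standard `xml.etree.ElementTree`. Each `<para>` becomes a `<trans-unit>`, inline elements with content become `<g>` and empty ones become `<x>`. This is a simplified sketch, not XLIFF:doc itself: the names `extract_to_xliff` and `copy_inline` are hypothetical, attributes of inline elements are not preserved, and the skeleton file needed for merging is not written.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def copy_inline(src, dst, counter):
    """Copy the mixed content of src into dst. Inline elements with content
    become <g>, empty ones become <x>; ids are sequential per segment."""
    dst.text = src.text
    for child in src:
        counter[0] += 1
        has_content = bool(child.text) or len(child) > 0
        marker = ET.SubElement(dst, q("g" if has_content else "x"), id=str(counter[0]))
        if has_content:
            copy_inline(child, marker, counter)
        marker.tail = child.tail


def extract_to_xliff(doc_xml: str, original: str = "doc.xml", block_tag: str = "para") -> str:
    """Create one <trans-unit> per block element (hypothetical helper)."""
    root = ET.fromstring(doc_xml)
    xliff = ET.Element(q("xliff"), version="1.2")
    file_elem = ET.SubElement(
        xliff, q("file"),
        {"original": original, "source-language": "en", "datatype": "xml"},
    )
    body = ET.SubElement(file_elem, q("body"))
    for n, block in enumerate(root.iter(block_tag), start=1):
        unit = ET.SubElement(body, q("trans-unit"), id=str(n))
        copy_inline(block, ET.SubElement(unit, q("source")), [0])
    return ET.tostring(xliff, encoding="unicode")


print(extract_to_xliff('<doc><para>Use a <tool id="a1098">claw hammer</tool> to release the catch.</para></doc>'))
```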
XML Document Design Issues
Important points to take into account
Word/Phrase Substitution Issues
What works in English or Chinese will not work in most other languages.
English and Chinese have an impoverished morphology:
• Please undo the bolt using a spanner.
• Proszę odkręcić śrubę kluczem.
Instrumental case: kluczem (nominative form: klucz)
The real killers: gender and case
• New Ford models: New Fiesta, New Mondeo, New Focus
• Nowa Ford Fiesta
• Nowe Ford Mondeo
• Nowy Ford Focus
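To make the gender problem concrete, the hypothetical sketch below contrasts a substitution template that hard-codes the masculine adjective with the phrases a translator would write. The gender table covers only the three examples on this slide; handling case as well would need further tables per language, which is why translating complete sentences is simpler than engineering around substitution.

```python
# Hypothetical substitution template: "Nowy" is the masculine form of "new".
TEMPLATE = "Nowy Ford {model}"

# Grammatical gender of each model name, taken from the slide's examples.
GENDER = {"Fiesta": "feminine", "Mondeo": "neuter", "Focus": "masculine"}
NEW = {"masculine": "Nowy", "feminine": "Nowa", "neuter": "Nowe"}


def substituted(model: str) -> str:
    """What a word-substitution mechanism produces: one fixed adjective form."""
    return TEMPLATE.format(model=model)


def agreed(model: str) -> str:
    """What a translator writes when the whole phrase is translated."""
    return f"{NEW[GENDER[model]]} Ford {model}"


for model in GENDER:
    print(f"{substituted(model):20} {agreed(model)}")
```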
Avoid translatable Entity References
<para>Use a &tool; to release the catch.</para>
Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems
Solution:
<para> Use a <tool id="a1098">claw hammer</tool> to release the CPU retention catch. </para>
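One of the parsing difficulties can be shown with Python's standard parser (an assumption; other parsers differ in detail). Without a DTD, the entity reference is undefined and the document is rejected; with a DTD, the entity is expanded at parse time, so extraction only sees the fixed English replacement text and the translator cannot adapt it to the surrounding sentence. `parse_with_entity` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SOURCE = "<para>Use a &tool; to release the catch.</para>"
DTD = '<!DOCTYPE para [<!ENTITY tool "claw hammer">]>'


def parse_with_entity(xml_text: str, dtd: str = "") -> str:
    """Parse the paragraph, optionally with an internal DTD subset that declares &tool;."""
    try:
        return ET.fromstring(dtd + xml_text).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


# Without the DTD the reference is undefined and the parser rejects the document.
print(parse_with_entity(SOURCE))
# With the DTD the entity is expanded while parsing; the reference itself is gone.
print(parse_with_entity(SOURCE, DTD))
```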
Avoid Word Substitution Mechanisms
<p>Using a <keyword conref="tools.dita#tools/ClawHammer"/>, remove the CPU from its mount.</p>
Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems
Solution:
<para> Use a <tool id="a1098">claw hammer</tool> to release the CPU from its mount. </para>
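A team migrating content could flag such substitutions automatically. The following sketch, assuming Python's `xml.etree.ElementTree`, reports empty elements that carry DITA `conref` or `keyref` attributes; the rule and the function `find_substitutions` are hypothetical, not an official DITA check.

```python
import xml.etree.ElementTree as ET

# Attributes that pull text in from elsewhere (hypothetical lint rule).
SUBSTITUTION_ATTRS = ("conref", "keyref")


def find_substitutions(xml_text: str) -> list:
    """Return (tag, attribute, value) for every element whose text is
    supplied by reference rather than written inline."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        for attr in SUBSTITUTION_ATTRS:
            value = elem.get(attr)
            if value is not None and not (elem.text or "").strip():
                hits.append((elem.tag, attr, value))
    return hits


problem = '<p>Using a <keyword conref="tools.dita#tools/ClawHammer"/>, remove the CPU from its mount.</p>'
solution = '<para>Use a <tool id="a1098">claw hammer</tool> to release the CPU from its mount.</para>'
print(find_substitutions(problem))
print(find_substitutions(solution))
```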
Incorrect use of Translatable Attributes
<para> Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch. </para>
Problems:
• Grammatical difficulties
• Text flow problems
Solution:
<para> Use a <tool id="a1098">claw hammer</tool> to release the CPU retention catch. </para>
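The text flow problem can be seen by collecting a paragraph's text content the way a segmenter or translation memory would. In this sketch (Python standard library assumed, `text_flow` is a hypothetical helper), the attribute version loses the noun from the sentence entirely.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_VERSION = '<para> Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch. </para>'
ELEMENT_VERSION = '<para> Use a <tool id="a1098">claw hammer</tool> to release the CPU retention catch. </para>'


def text_flow(xml_text: str) -> str:
    """Return the paragraph's text content with whitespace normalized,
    which is roughly what a translator or TM sees as the segment."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


print(text_flow(ATTRIBUTE_VERSION))  # the tool name is missing from the sentence
print(text_flow(ELEMENT_VERSION))
```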
CDATA Sections
<TEMPLATE><![CDATA[<p>Please refer to the <em>index page </em> page for further information</p> ]]></TEMPLATE>
Problems:
• Grammatical difficulties
• Text flow problems
Solution:
<TEMPLATE> <dx:p>Please refer to the <dx:em>index page </dx:em> page for further information</dx:p> </TEMPLATE>
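The sketch below, assuming Python's `xml.etree.ElementTree`, shows why CDATA is opaque to XML tools: the parser reports no child elements, only a string that merely looks like markup. The `dx` namespace URI is a placeholder added so that the solution fragment parses.

```python
import xml.etree.ElementTree as ET

CDATA_VERSION = ('<TEMPLATE><![CDATA[<p>Please refer to the <em>index page </em> page '
                 'for further information</p> ]]></TEMPLATE>')
# Placeholder namespace URI; the slide's fragment leaves the dx prefix undeclared.
MARKUP_VERSION = ('<TEMPLATE xmlns:dx="http://example.com/dx"><dx:p>Please refer to the '
                  '<dx:em>index page </dx:em> page for further information</dx:p></TEMPLATE>')

cdata_root = ET.fromstring(CDATA_VERSION)
markup_root = ET.fromstring(MARKUP_VERSION)

# Inside CDATA the "tags" are just characters: no child elements exist.
print(len(cdata_root), repr(cdata_root.text[:12]))
# As real markup, the paragraph and its emphasis are visible to the parser.
print(len(markup_root), markup_root[0][0].text)
```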
Infinite Naming Schemes
<?xml version="1.0" ?> <resources xml:lang="en"> <err001>Cannot open file $1.</err001> <hint001>Hint: does file $1 exist.</hint001> <err002>Incorrect value.</err002> <hint002>Hint: value must be between $1 and $2.</hint002> <err003>Connection timeout.</err003> . . </resources>
Problems:
• Poor XML practice
• Problems for extraction programs
Solution:
Infinite Naming Schemes contd.
<?xml version="1.0" ?><resources xml:lang="en"> <error id="001"> <caption>Cannot open file $1.</caption> <hint>Does file $1 exist.</hint> </error> <error id="002"> <caption>Incorrect value.</caption> <hint>Value must be between $1 and $2.</hint> </error> . .</resources>
Avoid Processing Instructions
<para> Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>
Problems:
• Grammatical difficulties
• PIs are not guaranteed to survive transformations
• Text flow problems
Solution:
<para> Use a <tool id="a1098">claw hammer</tool> to release the CPU retention catch. </para>
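Whether a processing instruction survives depends on the tool. Python's `xml.etree.ElementTree`, used here as an example, discards PIs by default and keeps them only when its tree builder is created with `insert_pis=True` (available from Python 3.8).

```python
import xml.etree.ElementTree as ET

PI_VERSION = '<para> Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>'

# Default parser: ElementTree's TreeBuilder discards processing instructions.
dropped = ET.tostring(ET.fromstring(PI_VERSION), encoding="unicode")

# Python 3.8+: the PI survives only if the tree builder is asked to keep it.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
kept = ET.tostring(ET.fromstring(PI_VERSION, parser=keeping_parser), encoding="unicode")

print(dropped)
print(kept)
```

The same document can therefore lose its only copy of "claw hammer" depending on a parser setting that the author never sees.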
Avoid text in bitmap graphics
Text expansion
Never make assumptions about text length in design.
Do: Always use UTF-8 or UTF-16 Encoding
• Avoid code conversion issues
• Always use Unicode encoding
• Not just a CJK issue
• Also dingbats and special characters such as:
• Em dash
• En dash
• Non-breaking spaces
• Etc.
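A minimal check of which encodings can carry these characters, assuming Python: `survives` is a hypothetical helper that encodes and decodes a sample containing an en dash, an em dash and a no-break space. The last lines write the XML itself as UTF-8 and confirm the round trip.

```python
import xml.etree.ElementTree as ET

# En dash, em dash and no-break space: characters that are easily lost in transit.
SAMPLE = "Pages 10\u201312\u2014see\u00a0index"


def survives(text: str, encoding: str) -> bool:
    """True if text can be encoded in `encoding` and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


for enc in ("utf-8", "utf-16", "latin-1", "ascii"):
    print(enc, survives(SAMPLE, enc))

# Writing the XML itself as UTF-8 keeps every character intact on a round trip.
para = ET.Element("para")
para.text = SAMPLE
data = ET.tostring(para, encoding="utf-8", xml_declaration=True)
print(ET.fromstring(data).text == SAMPLE)
```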
Do not break text over non-inline elements
<para> <line>This text should not be</line> <line>broken this way – the translated text may well be in a different order.</line></para>
Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems
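The sketch below, assuming Python's `xml.etree.ElementTree` and that `<line>` is treated as a segment boundary, shows the two sentence fragments a translator would receive. `looks_incomplete` is a hypothetical heuristic, not a standard check, and would need tuning per language.

```python
import xml.etree.ElementTree as ET

BROKEN = ('<para><line>This text should not be</line>'
          '<line>broken this way.</line></para>')


def segments(xml_text: str, block_tags: set) -> list:
    """Split a document into one segment per block-level element,
    with whitespace normalized."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(e.itertext()).split()) for e in root.iter() if e.tag in block_tags]


def looks_incomplete(segment: str) -> bool:
    """Hypothetical heuristic: flag segments without terminal punctuation."""
    return not segment.rstrip().endswith((".", "!", "?", ":"))


for seg in segments(BROKEN, {"line"}):
    print(looks_incomplete(seg), seg)
```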
Avoid the use of typographical elements
<para><b>Do not use</b> <br/>'br' type elements.</para>
Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems
Solution:
<para> <emph>Do not use</emph> 'br' type elements.</para>
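Converting existing markup to the solution above could be sketched like this in Python; `clean_typography` is a hypothetical helper that renames `<b>` to `<emph>` and removes `<br/>` while keeping the text that followed it.

```python
import xml.etree.ElementTree as ET


def clean_typography(xml_text: str) -> str:
    """Rename presentational <b> to <emph> and drop <br/>, moving each
    <br/>'s tail text onto the previous kept sibling (or the parent)."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        previous = None  # last child that is kept
        for child in list(parent):
            if child.tag == "br":
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                if child.tag == "b":
                    child.tag = "emph"
                previous = child
    return ET.tostring(root, encoding="unicode")


print(clean_typography("<para><b>Do not use</b> <br/>'br' type elements.</para>"))
```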
Do not mix translatable and non-translatable text
<data-items> <data id="class"> com.xmlintl.data.dataDefDefinition </data> <data id="text">Replace generic data definitions with specific instances. </data></data-items>
Problems:
• Poor XML practice
• Problems for extraction programs
Solution:
Do not mix translatable and non-translatable text contd.
<data-items> <class id="com.xmlintl.data.dataDefinition"><text>Replace generic data definitions with specific instances.</text> </class></data-items>
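The extraction difference, sketched with Python's standard library (function names hypothetical): a rule that extracts every `<data>` element also sends the Java class name to translation, while the separated markup lets the rule be simply "translate `<text>`".

```python
import xml.etree.ElementTree as ET

MIXED = ('<data-items><data id="class"> com.xmlintl.data.dataDefDefinition </data>'
         '<data id="text">Replace generic data definitions with specific instances. </data></data-items>')
SEPARATED = ('<data-items><class id="com.xmlintl.data.dataDefinition"><text>Replace generic data '
             'definitions with specific instances.</text></class></data-items>')


def all_data_text(xml_text: str) -> list:
    """Naive extraction by element name: the class name is picked up as text."""
    return [d.text.strip() for d in ET.fromstring(xml_text).iter("data")]


def text_elements(xml_text: str) -> list:
    """With separated markup, translatable text is simply every <text> element."""
    return [t.text.strip() for t in ET.fromstring(xml_text).iter("text")]


print(all_data_text(MIXED))
print(text_elements(SEPARATED))
```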
Avoid mixed language documents
<para> <text xml:lang="en"> My hovercraft is full of eels. </text> <text xml:lang="fr"> Mon aéroglisseur est plein d'anguilles. </text> <text xml:lang="hu"> Légpárnás hajóm tele van angolnákkal. </text> <text xml:lang="ja"> 私のホバークラフトは鰻で一杯です。 </text> <text xml:lang="pl"> Mój poduszkowiec jest pełen węgorzy. </text> <text xml:lang="es"> Mi aerodeslizador está lleno de anguilas. </text> <text xml:lang="zh-CH"> 我隻氣墊船裝滿晒鱔. </text> <text xml:lang="zh-TW"> 我的氣墊船充滿了鱔魚 [我的气垫船充满了鳝鱼 ] </text></para>
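If such a document must be processed anyway, one option is to split it into monolingual parts first. A minimal sketch, assuming Python's `xml.etree.ElementTree` and only a few of the languages above; `split_by_language` is a hypothetical helper, and `xml:lang` lives in the predefined XML namespace.

```python
import xml.etree.ElementTree as ET

# xml:lang is in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

MIXED = """<para>
  <text xml:lang="en"> My hovercraft is full of eels. </text>
  <text xml:lang="fr"> Mon aéroglisseur est plein d'anguilles. </text>
  <text xml:lang="pl"> Mój poduszkowiec jest pełen węgorzy. </text>
</para>"""


def split_by_language(xml_text: str) -> dict:
    """Return {language tag: text} so each language can be handled as its own
    monolingual document."""
    root = ET.fromstring(xml_text)
    return {t.get(XML_LANG): t.text.strip() for t in root.iter("text")}


for lang, text in split_by_language(MIXED).items():
    print(lang, text)
```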
Do: clearly mark non-translatable text
<para xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"> The following part of this sentence should <phrase its:translate="no">not be translated</phrase> at all.</para>
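An extractor that honours the ITS local `translate` attribute could be sketched as follows, assuming Python and only local markup (global `its:rules` are ignored). The example marks the span with a DocBook-style `phrase` element, which is an assumption. Following ITS, the value is inherited by descendants, and text after a non-translatable element belongs to its parent. A production tool would replace the protected span with a placeholder such as an XLIFF `<x/>` rather than drop it.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

DOC = ('<para xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">'
       'The following part of this sentence should '
       '<phrase its:translate="no">not be translated</phrase> at all.</para>')


def translatable_text(elem, inherited: bool = True) -> str:
    """Collect text that should go to translation. its:translate is inherited
    by descendants; a child's tail belongs to its parent's content."""
    flag = elem.get(ITS_TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    parts = [elem.text or ""] if translate else []
    for child in elem:
        parts.append(translatable_text(child, translate))
        if translate:
            parts.append(child.tail or "")
    return "".join(parts)


print(translatable_text(ET.fromstring(DOC)))
```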
Core L10N Interoperability Standards
• W3C ITS Document Rules
• ETSI LIS SRX
• ETSI LIS xml:tm
• ETSI LIS TMX
• ETSI LIS TBX
• ETSI LIS GMX
• OASIS XLIFF
• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)
• Unicode TR29
Putting It All Together
• Open Architecture for XML Authoring and Localization (OAXAL)
– http://wiki.oasis-open.org/oaxal/FrontPage
OAXAL
Localization without Standards
Diagram (Customer): source text → extract → extracted text → TM process → prepared text → translate → translated text → merge → target text → QA
True Cost of Translation
OAXAL in Action
Translating English Soccer Articles into Arabic 24x7
Flagship website
Browser-Based Workbench
OAXAL In Action
Your opinion is important to us! Please tell us what you thought of the lecture. We look forward to your feedback via smartphone or tablet at
http://LOC23.honestly.de or scan the QR code.
The feedback tool will be available even after the conference!
• Contact details:
• Andrzej Zydroń
• [email protected]
• http://www.xtm-intl.com