XML and Localization

XML and LOCALIZATION An overview by @Fantpmas from @YamagataEurope

Upload: yamagata-europe

Post on 03-Jul-2015


Category:

Technology



DESCRIPTION

An overview of XML and how it is used in the localization world

TRANSCRIPT

Page 1: XML and Localization

XML and LOCALIZATION

An overview by @Fantpmas from @YamagataEurope

Page 2: XML and Localization

What is XML? And why do you people love acronyms so much?

Page 3: XML and Localization

XML stands for eXtensible Markup Language

You can write your own language/dialect

A language to store data in a human readable format

Page 4: XML and Localization

XML is designed to carry data, not to display it like HTML does.

XML doesn't do anything on its own. Nada, zilch!

Page 5: XML and Localization

A sample XML document (Don't worry it's all plain text)

The root element

3 child elements
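The sample document on this slide did not make it into the transcript. A minimal sketch of what such a document could look like, parsed with Python's standard `xml.etree.ElementTree`; the element names (`note`, `to`, `from`, `body`) are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element with three child elements.
SAMPLE = """<note>
  <to>Translator</to>
  <from>Project manager</from>
  <body>Please translate this file.</body>
</note>"""

root = ET.fromstring(SAMPLE)
print(root.tag)                       # name of the root element
print([child.tag for child in root])  # names of the child elements
```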

Page 6: XML and Localization

An XML element in detail

Start tag End tag

Attribute

Element content

Attribute value
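The labels above belong to a picture that is missing here. As a rough illustration, the same parts can be read back from a parsed element; the element name `title` and the attribute `lang` are assumptions, not taken from the slide:

```python
import xml.etree.ElementTree as ET

# Hypothetical element: start tag with one attribute, content, end tag.
element = ET.fromstring('<title lang="en">XML and Localization</title>')

print(element.tag)     # element name, as written in the start and end tag
print(element.attrib)  # attributes as a dict: attribute name -> attribute value
print(element.text)    # element content
```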

Page 7: XML and Localization

XML elements can be empty

is the same as

Self-closing element
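A small sketch of the two spellings, assuming a made-up element name `linebreak`. To a parser, a start tag immediately followed by an end tag and a self-closing tag describe the same empty element:

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; both spellings describe an empty element.
explicit = ET.fromstring("<linebreak></linebreak>")
self_closing = ET.fromstring("<linebreak/>")

# Neither has content or children, so both serialize to the same bytes.
same = ET.tostring(explicit) == ET.tostring(self_closing)
print(same)
```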

Page 8: XML and Localization

There are rules to follow

When all rules are abided by, the XML is well-formed

Page 9: XML and Localization

XML well-formedness rules (not exhaustive)

• There must be a root element

• Elements must follow naming rules

• All elements must be closed

• Element names are case sensitive

• Elements must be properly nested

• Attribute values must be quoted

• An attribute can appear only once in the same start tag

• Some characters cannot be used as such

• Entities must be declared
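Several of these rules can be tried out with any XML parser. The sketch below uses Python's `xml.etree.ElementTree` and treats "the parser accepts it" as a stand-in for well-formedness; the helper `is_well_formed` and all snippets are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: one good document, then one broken rule per entry.
checks = {
    "well-formed":        '<doc><p id="1">text</p></doc>',
    "two root elements":  '<doc/><doc/>',
    "element not closed": '<doc><p>text</doc>',
    "case mismatch":      '<doc><P>text</p></doc>',
    "bad nesting":        '<doc><b><i>text</b></i></doc>',
    "unquoted attribute": '<doc id=1/>',
    "repeated attribute": '<doc id="1" id="2"/>',
}

results = {label: is_well_formed(snippet) for label, snippet in checks.items()}
for label, accepted in results.items():
    print(f"{label}: {accepted}")
```

Only the first snippet is accepted; every other one is rejected with a parse error.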

Page 10: XML and Localization

There must be a root element

Page 11: XML and Localization

Elements must follow naming rules

Names can only start with

• A letter (in any language, including accented letters)

• A colon

• An underscore

Example: 筆者 is a valid element name

Page 12: XML and Localization

Elements must follow naming rules

Names cannot contain

• White spaces

• Most punctuation characters except colon, underscore, hyphen, dot, middle dot

• Symbol characters

Example: 筆 者 is not a valid element name, because it contains a white space
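The naming rules can be checked the same way. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the helper `parses` and the element names other than 筆者 are hypothetical:

```python
import xml.etree.ElementTree as ET

def parses(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

valid_japanese = parses("<筆者>Yamagata</筆者>")             # letters in any script
valid_underscore = parses("<_author>x</_author>")            # leading underscore
valid_inner = parses("<author.name-1>x</author.name-1>")     # dot, hyphen, digit inside
invalid_space = parses("<筆 者>Yamagata</筆 者>")            # white space in the name
invalid_digit = parses("<1author>x</1author>")               # leading digit

print(valid_japanese, valid_underscore, valid_inner)
print(invalid_space, invalid_digit)
```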

Page 13: XML and Localization

All elements must be closed

Page 14: XML and Localization

Element names are case sensitive

Page 15: XML and Localization

Elements must be properly nested

Page 16: XML and Localization

Attribute values must be quoted

Single or double quotes

Page 17: XML and Localization

Attention to those darn quotes

If the value is delimited by double quotes, you cannot use a literal double quote inside it; write &quot; instead. The same applies to single quotes (&apos;).
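A short sketch of the quoting options, with a made-up element `q` and attribute `text`. Either switch to the other kind of quote as delimiter, or escape the quote inside the value:

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute values that contain quote characters.
a = ET.fromstring("<q text='He said \"hello\"'/>")          # single outside, double inside
b = ET.fromstring('<q text="He said &quot;hello&quot;"/>')  # double outside, escaped inside
c = ET.fromstring('<q text="It\'s fine"/>')                 # double outside, single inside

print(a.get("text"))
print(b.get("text"))
print(c.get("text"))

# A literal double quote inside a double-quoted value ends the value too early.
try:
    ET.fromstring('<q text="He said "hello""/>')
    broken_accepted = True
except ET.ParseError:
    broken_accepted = False
print(broken_accepted)
```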

Page 18: XML and Localization

Attributes must be unique in tags

Page 19: XML and Localization

Some characters cannot be used

• < and & need to be escaped as entities: &lt; and &amp;

• Most control characters cannot be used at all (e.g. backspace; tab, line feed and carriage return are allowed)
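Escaping does not have to be done by hand. A minimal sketch using `escape` from Python's standard `xml.sax.saxutils`; the sample text, the element names and the helper `parses` are made up:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def parses(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

raw = "Fish & chips < 5 EUR"   # hypothetical text to store
escaped = escape(raw)          # replaces &, < and > with entities
print(escaped)

# The parser turns the entities back into the original characters.
round_trip = ET.fromstring(f"<price>{escaped}</price>").text
print(round_trip == raw)

unescaped_ok = parses(f"<price>{raw}</price>")       # raw & and < in content
tab_ok = parses("<p>tab\there</p>")                  # tab is an allowed control character
backspace_ok = parses("<p>backspace\x08here</p>")    # backspace is not
print(unescaped_ok, tab_ok, backspace_ok)
```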

Page 20: XML and Localization

A word about entities

Entities represent a character, or a sequence of characters that needs to be repeated throughout a document.

Syntax: an ampersand, the entity name, then a semicolon (&name;)

Page 21: XML and Localization

Predefined XML entities

There are 5 predefined character entities; only 2 of them (&lt; and &amp;) are obligatory

&lt; < less than

&gt; > greater than

&amp; & ampersand

&apos; ' apostrophe

&quot; " quotation mark

Page 22: XML and Localization

Entities must be declared

Except for the predefined entities, all entities must be declared in the Document Type Definition (DTD)

Entity

DTD Entity declaration
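The declaration shown on the slide is missing from the transcript. A sketch of the idea, assuming a hypothetical entity called `company` declared in an internal DTD, and Python's `xml.etree.ElementTree` as parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "company", declared in an internal DTD subset.
DECLARED = """<!DOCTYPE doc [
  <!ENTITY company "Yamagata Europe">
]>
<doc>Translated by &company;.</doc>"""

expanded = ET.fromstring(DECLARED).text
print(expanded)

# The same reference without a declaration is rejected by the parser.
try:
    ET.fromstring("<doc>Translated by &company;.</doc>")
    undeclared_accepted = True
except ET.ParseError as error:
    undeclared_accepted = False
    print("rejected:", error)
```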

Page 23: XML and Localization

Other constructs

• XML declaration

• Stylesheet declaration

• Document Type declaration

• Comments

• CDATA
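All five constructs can appear in one small document. The sketch below is an illustration only: `style.xsl` and the element names are placeholders, and the parser is Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "style.xsl" and the element names are placeholders.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!DOCTYPE doc [
  <!ELEMENT doc (code)>
  <!ELEMENT code (#PCDATA)>
]>
<doc>
  <!-- A comment: dropped by this parser's default settings -->
  <code><![CDATA[if (a < b && b < c) { ... }]]></code>
</doc>"""

root = ET.fromstring(DOCUMENT.encode("utf-8"))

# Inside CDATA, < and & are plain text and need no escaping.
code_text = root.find("code").text
print(code_text)
print(len(root))  # number of child elements of the root
```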

Page 24: XML and Localization

Document Type Definition

A DTD defines the structure of an XML document

Page 25: XML and Localization

How to declare DTDs

DTDs can be internal

DTD

Page 26: XML and Localization

How to declare DTDs

DTDs can be external
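The two declaration styles side by side, as a sketch. The element names and the file name `note.dtd` are placeholders. Note that `xml.etree.ElementTree`, used here, only checks well-formedness: it does not validate against the DTD and does not fetch the external file.

```python
import xml.etree.ElementTree as ET

# Internal DTD: the declarations sit inside the DOCTYPE, in square brackets.
INTERNAL = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Translator</to><body>Hello</body></note>"""

# External DTD: the DOCTYPE only points to a file ("note.dtd" is a placeholder).
EXTERNAL = """<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Translator</to><body>Hello</body></note>"""

# Both documents are well-formed, so both parse to the same structure.
roots = [ET.fromstring(document) for document in (INTERNAL, EXTERNAL)]
for root in roots:
    print(root.tag, [child.tag for child in root])
```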

Page 27: XML and Localization

XML Schema

XML Schema (*.xsd) is an XML-based alternative to DTD

Page 28: XML and Localization

DTDs in the localization world

Don't be scared, but XML really is everywhere

• TMX

• TBX

• XLIFF

• TTX

• SRX

• Qt Linguist TS

• DITA

• ...

Page 29: XML and Localization

Encoding

All XML parsers must support at least UTF-8 and UTF-16.

Default encoding is UTF-8.

It's always a good idea to specify the encoding.
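A quick sketch of the same content stored in both mandatory encodings, each with its encoding stated in the XML declaration. The element name `city` is made up; the parser is Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

TEXT = "東京"

# Same document, two encodings; the declaration names the encoding used.
utf8_bytes = f'<?xml version="1.0" encoding="UTF-8"?><city>{TEXT}</city>'.encode("utf-8")
utf16_bytes = f'<?xml version="1.0" encoding="UTF-16"?><city>{TEXT}</city>'.encode("utf-16")

utf8_text = ET.fromstring(utf8_bytes).text
utf16_text = ET.fromstring(utf16_bytes).text

print(utf8_text == TEXT, utf16_text == TEXT)
print(len(utf8_bytes), len(utf16_bytes))  # same content, different byte counts
```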

Page 30: XML and Localization

Byte Order Mark

A character to indicate the byte order of an XML document

In UTF-8 it's optional and not even recommended

In UTF-16 it's used to indicate endianness: little-endian or big-endian

If you see stray characters such as ï»¿ (a UTF-8 BOM read as Latin-1) at the start of a file, something's wrong
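To see what a BOM looks like on disk, and what it turns into when an editor guesses the wrong encoding, here is a small sketch using the constants in Python's standard `codecs` module. The helper `detect_bom` is hypothetical and only knows the three BOMs listed:

```python
import codecs

# The BOM byte sequences defined in Python's codecs module.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 little-endian": codecs.BOM_UTF16_LE,
    "UTF-16 big-endian": codecs.BOM_UTF16_BE,
}

def detect_bom(data: bytes) -> str:
    """Return the name of the BOM at the start of data, or 'none'."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return "none"

for name, bom in BOMS.items():
    # Decoding the BOM bytes as Latin-1 shows the stray characters that
    # appear when a tool reads the file with the wrong encoding.
    print(name, bom.hex(), bom.decode("latin-1"))

with_bom = "<doc/>".encode("utf-8-sig")  # UTF-8 with a BOM in front
without_bom = "<doc/>".encode("utf-8")   # plain UTF-8
print(detect_bom(with_bom), detect_bom(without_bom))
```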

Page 31: XML and Localization

Complementary technologies

What? There's more of this geek stuff!?

Page 32: XML and Localization

Extensible Stylesheet Language Transformations (XSLT)

It's XML to transform another XML document!

Page 33: XML and Localization

XSL Transformations

XML in; (X)HTML, XML or TXT out

Page 34: XML and Localization

How to apply an XSLT

Declare the stylesheet in the XML file itself

Use an application like XMLSpy or xmlstarlet

Page 35: XML and Localization

XSLT localization examples

• Convert a TTX to a two-column HTML or CSV

• Convert a TMX to a TBX

• Convert a TMX to a TXT (for spell-check in MS Word)

• Convert multilingual XML to TMX/TBX

• Generate HTML preview for XML in SDL Trados Studio

• Prepare XML files for translation
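To give a feel for this kind of conversion, here is a TMX-to-CSV sketch. It is not XSLT: Python's standard library has no XSLT processor, so the same idea is written with `xml.etree.ElementTree` instead. The TMX content is a heavily trimmed, made-up sample, and `tmx_to_rows` is a hypothetical helper:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed TMX content (a real file has a fuller header).
TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save file</seg></tuv>
      <tuv xml:lang="nl"><seg>Bestand opslaan</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Cancel</seg></tuv>
      <tuv xml:lang="nl"><seg>Annuleren</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_rows(tmx_text, source="en", target="nl"):
    """Return one (source, target) pair per translation unit.

    Only the plain text directly inside <seg> is read; inline tags are ignored.
    """
    rows = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {tuv.get(XML_LANG): tuv.findtext("seg", default="")
                    for tuv in tu.iter("tuv")}
        rows.append((segments.get(source, ""), segments.get(target, "")))
    return rows

rows = tmx_to_rows(TMX)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # two columns: source, target
print(buffer.getvalue())
```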

Page 36: XML and Localization

XPath

It's a query language to select nodes from an XML document

It's used in XSLT

Will select all elements that have an attribute called

and whose value is

And also in SDL Trados Studio file types
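The expression from the slide is not in the transcript, so the sketch below uses stand-ins: an attribute called `translate` with the value `no`. It relies on the limited XPath subset supported by Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the attribute name "translate" and the value "no"
# stand in for the names on the original slide.
DOCUMENT = """<doc>
  <p>Translate me.</p>
  <p translate="no">PRODUCT-CODE-123</p>
  <note><code translate="no">printf()</code></note>
  <p translate="yes">Me too.</p>
</doc>"""

root = ET.fromstring(DOCUMENT)

# All elements, at any depth, that have an attribute called translate
# whose value is "no".
locked = root.findall(".//*[@translate='no']")
locked_tags = [element.tag for element in locked]
print(locked_tags)
```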

Page 37: XML and Localization

Is XML good for localization? Yes, but not always

Page 38: XML and Localization

XML is great for localization

• Unicode supported by default

• Metadata gives more information about content

• Separates content from formatting (to some extent)

• Human readable

• Easily transformable using XSLT

• Excellent for single-sourcing

Page 39: XML and Localization

But bad XML is bad

• Translatable content in attributes

• No metadata to distinguish between types of content, e.g. mixed languages, translatable vs. non-translatable

• CDATA is just plain cheating

• Bad implementations of standards (XLIFF)

Page 40: XML and Localization

And also

• Multilingual XML can be challenging (XSLT can help)

東京

• Big files and one-liners can cause processing problems (pretty-printing can help)
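Pretty-printing a one-liner takes a few lines with Python's standard `xml.dom.minidom`. A minimal sketch with a made-up document:

```python
from xml.dom import minidom

# Hypothetical one-liner: everything on a single line.
ONE_LINER = "<doc><section><p>First</p><p>Second</p></section></doc>"

# Re-serialize with one element per line, indented by two spaces per level.
pretty = minidom.parseString(ONE_LINER).toprettyxml(indent="  ")
print(pretty)
```

Keep in mind that pretty-printing adds white space, which can matter in elements where white space is significant.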

Page 41: XML and Localization

Tools, tools, tools

• Altova XMLSpy: all-round XML editor

• Altova DiffDog: compare XML files

• xmlstarlet: command line XML toolkit

• EditPad Pro for all encoding/BOM matters

Page 42: XML and Localization

"Specification is only theory. In practice, there is only the parser."

@Tnkrd