20120510 webl10n memoqfest budapest
TRANSCRIPT
www.loctimize.com
Loctimize GmbH
Localizing dynamic websites created from open source content management systems
memoQfest 2012, May 10, 2012, Budapest
Daniel Zielinski
Martin Beuster
[daniel|martin]@loctimize.com
Agenda
• Open source content management systems
• The localization challenges
• General localization strategies
• Conclusions
© 2012 Loctimize – All rights reserved
Open source content management systems
Challenges
• Create / update content
• Identify content
• Extract content
• Prepare content
• Translate content
• Integrate translated content
• Test localization
• Fix bugs
• Publish localized website
Identify content - Database
• Most of the content is stored in databases
• Databases are made up of related tables
• The tables are made up of rows and columns
• The fields contain the content in different formats (text, HTML, XML, proprietary format)
• The fields also contain metadata used for identifying/filtering the relevant content, for example:
translate = 0
language = 2
published = 1
deleted = 1
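A minimal sketch of how such metadata could be used to filter the relevant content. The table name, the column names and the meaning of the flag values (translate = 1 means translatable, deleted = 0 means not deleted) are assumptions for illustration; every CMS defines its own schema.

```python
import sqlite3

# Hypothetical schema: real CMS tables and flag semantics differ per system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content ("
    "id INTEGER PRIMARY KEY, body TEXT, translate INTEGER, "
    "language INTEGER, published INTEGER, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "<p>Welcome</p>", 1, 2, 1, 0),   # relevant
        (2, "<p>Old news</p>", 1, 2, 1, 1),  # deleted
        (3, "<p>Draft</p>", 1, 2, 0, 0),     # not published
        (4, "<p>Internal</p>", 0, 2, 1, 0),  # flagged as not translatable
    ],
)


def relevant_rows(connection, language_id):
    """Return (id, body) for rows that are translatable, published and not deleted."""
    query = (
        "SELECT id, body FROM content "
        "WHERE translate = 1 AND language = ? AND published = 1 AND deleted = 0 "
        "ORDER BY id"
    )
    return connection.execute(query, (language_id,)).fetchall()


rows = relevant_rows(conn, 2)
print(rows)
```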
Identify content - Database
HTML content
Text content?
Identify content - File system
• Template files (HTML, CSS, JPG, PNG, GIF)
• Configuration files (INI, PHP, PROPERTIES, TXT…)
• Localization files (XLIFF, XML, PHP…)
• User files (PDF, DOC, XLS, PPT,…)
Identify content - Template files (HTML)
Translatable content?
Identify content - Configuration files – INI Files
• Some of the content is stored in INI files.
• It is stored in key-value pairs.
Keys = Values
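As an illustration, key-value pairs of this kind can be read with a few lines of script. This is a sketch only: the key names and the comment conventions (lines starting with ";" or "#", section headers in brackets) are assumptions, and real CMS language files may follow other rules.

```python
def parse_ini_pairs(text):
    """Collect key = value pairs, skipping blank lines, comments and section headers."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith((";", "#", "[")):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, separator, value = line.partition("=")
        if separator:
            pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical sample file content.
sample = """; language file
COM_EXAMPLE_TITLE = Welcome to our site
COM_EXAMPLE_SAVE=Save
"""
pairs = parse_ini_pairs(sample)
print(pairs)
```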
Identify content - Configuration files – PHP Files
• Some of the content is stored in PHP files
• It is stored in key-value pairs or arrays
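A rough sketch of how quoted 'key' => 'value' pairs could be located in a PHP array with a regular expression. The sample source is hypothetical, and a regular expression is not a PHP parser: concatenated strings, nested arrays or heredocs would need a proper tokenizer.

```python
import re

# Matches 'key' => 'value' or "key" => "value"; escaped quotes inside the value are tolerated.
PHP_PAIR = re.compile(
    r"""(?P<kq>['"])(?P<key>[^'"]+)(?P=kq)\s*=>\s*(?P<vq>['"])(?P<value>.*?)(?<!\\)(?P=vq)"""
)


def extract_php_pairs(source):
    """Return a list of (key, value) tuples found in PHP array syntax."""
    return [(m.group("key"), m.group("value")) for m in PHP_PAIR.finditer(source)]


# Hypothetical PHP language file.
php_source = """<?php
return array(
    'title' => 'Welcome',
    "save"  => "Save changes",
);
"""
pairs = extract_php_pairs(php_source)
print(pairs)
```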
Identify content - Localization files - XML
UI strings
IDs
Language groups
Extract content – Database
• Manually by copying
• Available extensions
– that understand the I18N/L10N logic of the CMS
– that extract and export into a translatable
exchange format
• Develop scripts and exchange formats
– to extract and export the content into a
translatable exchange format
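A script of the kind mentioned in the last bullet might look roughly like this. The element names (export, page, title, body) and the row fields are invented for illustration and do not correspond to the format of any particular CMS extension; the point is that row IDs and the source URL travel with the text.

```python
import xml.etree.ElementTree as ET


def export_rows(rows):
    """Serialize content rows into a simple XML exchange format (hypothetical layout)."""
    root = ET.Element("export")
    for row in rows:
        # Keep the database ID and the source URL so the translation can be re-imported.
        page = ET.SubElement(root, "page", id=str(row["id"]), url=row["url"])
        ET.SubElement(page, "title").text = row["title"]
        ET.SubElement(page, "body").text = row["body"]  # HTML is escaped as text
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows, as they might come out of a database query.
rows = [
    {"id": 7, "url": "http://www.example.com/home", "title": "Home", "body": "<p>Hello</p>"},
]
xml_text = export_rows(rows)
print(xml_text)
```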
Extract content – Database
• Joomla!: Joom!Fish Plus, Jolomea (XML, XLIFF, PO)
• TYPO3: Localization Manager (XML)
• Drupal: i18n, Translation Management (XML, XLIFF)
• WordPress: Easy Translator Pro (PO)
• WordPress: WPML (XLIFF)
Extract content – Database
Metadata
Page content
Source URL
Extract content – Database
IDs
Extract content – Files
• Copy files
• Know the file structure of the CMS
• FTP access
• Access to CMS backend with appropriate rights
Automate workflow?
• Use a content connector and/or an API to pass the localisable content on to memoQ.
Prepare files
• Defining non-translatable content
– Add additional tags
– Defining filter settings
• XML filter
• HTML filter
• RegEx text filter
• Cascading filters
• RegEx tagger
• Joining files
Translate content
• Lack of context
– Translation of content deltas (updates)
– Translation without visual information (XML, INI)
• Placeholders like %1, $2, {1}, $VAR, \n, \t
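One way to guard against damaged placeholders is a simple check that compares source and target segments. The sketch below is an assumption about how such a check could be scripted outside the translation tool; the pattern covers only the placeholder styles listed above, with \n and \t treated as literal backslash sequences as they appear in resource files.

```python
import re
from collections import Counter

# %1, $2, {1}, $VAR and the literal two-character sequences \n and \t.
PLACEHOLDER = re.compile(r"%\d+|\$\d+|\{\d+\}|\$[A-Z_]+|\\[nt]")


def placeholders(text):
    """Count every placeholder occurrence in a segment."""
    return Counter(PLACEHOLDER.findall(text))


def missing_placeholders(source, target):
    """Return placeholders that occur more often in the source than in the target."""
    return placeholders(source) - placeholders(target)


source = r"Hello %1, you have {1} new messages.\n"
target = r"Hallo %1, Sie haben neue Nachrichten.\n"
missing = missing_placeholders(source, target)
print(missing)
```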
Translate content - HTML
• HTML files are added to memoQ using the standard filter.
• Tags and attributes cannot be configured (localized hyperlinks).
• A preview is available to translators and revisers.
Translate content - HTML
Preview
Lookup results
Editor
Translate content – XML files
• Add the XML files to memoQ using a pre-defined XML filter (and a cascading HTML/RegEx text filter).
• Content is grouped by page
• Source URL in comments field
Translate content - XML
Preview
Lookup results
Editor
Source URL
Translate content – INI files
• Add the INI files to memoQ using a Regex text
filter and a cascading HTML filter.
• The Regex text filter defines paragraphs as
([^=]*=)(.+) with content group 2.
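The effect of this expression can be tried out in a short script. This sketch assumes the expression is applied to one whole line at a time, which may differ from how the filter evaluates it internally; group 1 is the key including the equals sign, group 2 is the translatable content.

```python
import re

# The paragraph pattern from the slide: group 1 = key and "=", group 2 = content.
PARAGRAPH = re.compile(r"([^=]*=)(.+)")


def split_line(line):
    """Return (protected_part, translatable_part), or None if the line has no content."""
    match = PARAGRAPH.fullmatch(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_line("TITLE=Welcome to <b>our</b> site"))
print(split_line("; a comment line"))
```

The HTML markup left in group 2 is what the cascading HTML filter would then turn into tags.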
Post-processing translated content – Convert to SQL
• A script converts the translated HTML files to SQL files.
• The IDs extracted from the tags in the HTML are
used to update the correct rows.
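A conversion script along these lines could be sketched as follows. The wrapper tag (a div with an id of the form row-N), the table name and the column name are assumptions made for the example; nested div elements would need a real HTML parser, and a parameterized import is safer than generated SQL text where the database can be reached directly.

```python
import re

# Hypothetical wrapper written at export time: <div id="row-42">…</div>
BLOCK = re.compile(r'<div id="row-(\d+)">(.*?)</div>', re.DOTALL)


def sql_literal(text):
    """Quote a string for SQL by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def html_to_sql(html, table="content", column="body"):
    """Build one UPDATE statement per wrapped block, keyed by the extracted row ID."""
    statements = []
    for row_id, body in BLOCK.findall(html):
        statements.append(
            f"UPDATE {table} SET {column} = {sql_literal(body.strip())} "
            f"WHERE id = {int(row_id)};"
        )
    return statements


translated_html = '<div id="row-42"><p>L\'été arrive</p></div>\n<div id="row-43"><p>Contact</p></div>'
statements = html_to_sql(translated_html)
for statement in statements:
    print(statement)
```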
Integrate localised content
• Manually by copying & pasting
• Available extensions
– that understand the I18N/L10N logic of the CMS
– that import the localized content
• Develop scripts
– to import the localized content
Importing localised content - Database
• Preview links with login information
• Overwrite mode
Importing localised content - Database
• The translated SQL file is imported into the
database.
• The table rows are updated with the translated
content along with other settings.
– Original text
– Modified date
– Published flag
– Hashed value
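The update of a row together with these settings might look like the sketch below. The table layout is hypothetical, and the assumption that the hashed value is an MD5 digest of the original text (used to detect later source changes) is an illustration only; each CMS or extension defines its own rules.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical translation table; real extensions use their own layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations ("
    "id INTEGER PRIMARY KEY, original TEXT, value TEXT, "
    "modified TEXT, published INTEGER, original_hash TEXT)"
)
conn.execute(
    "INSERT INTO translations (id, original, value, published) "
    "VALUES (1, 'Welcome', 'Welcome', 0)"
)


def import_translation(connection, row_id, original, translated):
    """Store the translation and refresh the accompanying settings in one UPDATE."""
    connection.execute(
        "UPDATE translations SET value = ?, original = ?, modified = ?, "
        "published = 1, original_hash = ? WHERE id = ?",
        (
            translated,
            original,
            datetime.now(timezone.utc).isoformat(),
            hashlib.md5(original.encode("utf-8")).hexdigest(),  # assumed hash scheme
            row_id,
        ),
    )
    connection.commit()


import_translation(conn, 1, "Welcome", "Willkommen")
row = conn.execute(
    "SELECT value, published, original_hash FROM translations WHERE id = 1"
).fetchone()
print(row)
```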
Importing localised content – INI files
• Translated INI files are exported from memoQ
• These are stored in the appropriate folders on
the web server
Automate workflow?
• Watch export folders and use CMS API/script to
import localized files
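Watching an export folder can be as simple as polling it for files that have not been seen before. In this sketch the function import_into_cms is a hypothetical stand-in for whatever CMS API call or import script is available, and the restriction to *.xml files is an assumption.

```python
import tempfile
from pathlib import Path


def import_into_cms(path):
    """Hypothetical placeholder for a CMS API call or an import script."""
    print(f"importing {path.name}")


def poll_once(export_dir, seen, importer=import_into_cms):
    """Import every exported file not handled before; return the names processed."""
    new_files = []
    for path in sorted(Path(export_dir).glob("*.xml")):
        if path.name not in seen:
            importer(path)
            seen.add(path.name)
            new_files.append(path.name)
    return new_files


# Demonstration with a temporary folder standing in for the export folder.
with tempfile.TemporaryDirectory() as export_dir:
    (Path(export_dir) / "home.xml").write_text("<page/>", encoding="utf-8")
    seen = set()
    first_pass = poll_once(export_dir, seen)
    second_pass = poll_once(export_dir, seen)

print(first_pass, second_pass)
```

In practice poll_once would be called in a loop with a pause between passes, or triggered by a scheduler.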
Test localised content
• Find the localised content in the website
(frontend)
• Proof-read
• Layout check
Fix localization bugs
• Find where content to be updated came from
• Update content in CMS
• Update bilingual files and/or translation memory
• Modify stylesheets (CSS)
• …
Publish localized page
Update!
Conclusion
• Complex processes
• Interaction of a lot of people
• No standard procedures
• Need to develop processes and tools
• Risk of losing/missing data when trying to mimic CMS core functionality
Conclusion
Translation Service Provider
• Time consuming
• Scoping is a non-trivial first step
• Expertise in CMS, web technology, databases
• Develop tools
• Educate client and web developers
• Sponsor development!
Client
• Choose CMS wisely
• I18N & L10N strategy
• Expect additional costs for localisation engineering and/or development
• Time consuming!
Thank you very much for your attention!