Page 1: ppt

© Copyright 2007 Achim Ruopp
Web 2.0 Expo 2007

Making Cents of Yens and Euros: Web 2.0 Internationalization

Achim Ruopp
[email protected]
http://www.digitalsilkroad.net/

Page 2: ppt

Demo
A Currency Converter Application – before and after Web 2.0 Internationalization

Page 3: ppt

Agenda

Introduction to Web Internationalization (i18n)
• Selecting and Persisting User Preferences
• Locales and Locale Identifiers
• Unicode
• Localization – Model and Tools

Multi-lingual Syndication
• RSS
• Atom

Client-side Scripting
• Javascript Internationalization
• Ajax

International Web Services Design
• REST
• SOAP

Page 4: ppt

Intro to Web Internationalization
Language and Location

en-US

fr en;0.8

da-DK

Page 5: ppt

Intro to Web Internationalization
User Preferences

Language
• HTTP Accept-Language header
• E.g.: en, fr-CA;q=0.8, fr;q=0.6
• Language negotiation with the server

Locale
• Cultural preferences for formatting, sorting etc.
• Infer from Accept-Language header
• Map IPv4 address to ccTLD (country code top-level domain)
    Public information accessible through libraries
    • E.g. Perl IP::Country CPAN module
    Commercial services offer more precision

Always provide option to change defaults
Store preferences in cookies
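The language negotiation described on this slide can be sketched in a few lines. This is a minimal Python sketch, not the demo code from the talk: the function names `parse_accept_language` and `negotiate` and the list of supported languages are hypothetical, and it assumes the standard `q=` quality syntax of the header.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language header, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((quality, position, tag))
    # Highest quality first; the order in the header breaks ties.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [tag for quality, _, tag in ranges if quality > 0]


def negotiate(header, supported, default="en"):
    """Pick the first supported language, falling back from fr-CA to fr."""
    lowered = {s.lower(): s for s in supported}
    for tag in parse_accept_language(header):
        candidate = tag.lower()
        while candidate:
            if candidate in lowered:
                return lowered[candidate]
            candidate = candidate.rpartition("-")[0]  # drop the last subtag
    return default


print(parse_accept_language("en, fr-CA;q=0.8, fr;q=0.6"))
print(negotiate("de;q=0.9, fr-CA;q=0.8", ["en", "fr"]))
```

The result is only a default: as the slide says, the user should still be able to override it, for example through a preference stored in a cookie.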

Page 6: ppt

Intro to Web Internationalization
Internet Language Tags

IETF Language Tags (BCP 47)
Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*

Examples
• en-CA: English in Canada
• zh-Hant-TW: Chinese written in traditional Chinese script used in Taiwan

Obsoletes RFC 3066 & RFC 1766
• Often still used in products/earlier standards
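To make the tag structure concrete, here is a deliberately simplified Python sketch that splits a tag into language, script and region. It is an assumption-laden illustration, not a full BCP 47 parser: extended language subtags, variants, extensions and private-use subtags are ignored, and the name `parse_tag` is hypothetical.

```python
import re

# Simplified pattern: language, optional script, optional region only.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag):
    """Split a tag such as zh-Hant-TW into its subtags, or return None."""
    match = TAG.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        # Tags are case-insensitive; this is the conventional casing.
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(parse_tag("en-CA"))
print(parse_tag("zh-hant-tw"))
print(parse_tag("en_US"))  # a POSIX-style identifier is not a language tag
```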

Page 7: ppt

Internationalization Changes

Page 8: ppt

Intro to Web Internationalization
POSIX Locales

Cross-platform API
• Locale identifiers can have variations
    Un*x: en_US
    Windows: English_United States
• Results can be platform-dependent

Basis for locale functionality in all scripting languages

Provides functionality for
• Number Formatting: 1,000,000.23
• Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
• Sorting
• String processing (e.g. upper-/lower-casing)
• Some translated strings like weekdays, yes/no messages
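The platform dependence mentioned above shows up as soon as a script asks for a locale by name. The following Python sketch uses the standard `locale` module, which wraps the POSIX locale API. The candidate names and the helper names `set_first_available` and `format_amount` are assumptions for illustration; which names a given system accepts varies.

```python
import locale

# Candidate names for the same locale; which one works depends on the platform.
CANDIDATES = ("en_US.UTF-8", "en_US", "English_United States")


def set_first_available(names=CANDIDATES):
    """Activate the first locale name the platform accepts, else the "C" locale."""
    for name in names:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue
    return locale.setlocale(locale.LC_ALL, "C")


def format_amount(value):
    """Format a number with the active locale's grouping and decimal point."""
    return locale.format_string("%.2f", value, grouping=True)


active = set_first_available()
print(active, format_amount(1000000.23))
```

Note that `setlocale` changes process-wide state, so in a web application serving several users at once this approach needs care; that is one reason to look at library-based locale data instead.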

Page 9: ppt

Intro to Web Internationalization
International Components for Unicode

IBM Open Source project
Extensive locale data and APIs
• Data vetted as part of Common Locale Data Repository (CLDR) project
Java and C++ APIs
Wrappers for scripting languages
• PyICU (Python)
• ICU4R (Ruby) – abandoned?
• DIY – difficult because of API complexity and character encoding issues

Page 10: ppt

Intro to Web Internationalization
Microsoft Internationalization APIs

Windows NLS API
Microsoft .NET Framework System.Globalization namespace
Similar set of data to ICU
• Data vetted by Microsoft subsidiaries
APIs accessible from all Microsoft programming languages

Page 11: ppt

Intro to Web Internationalization
Unicode 5.0

[Figure: Unicode code space. Planes from 000000 to 100000: Basic Multilingual Plane, Dead Languages & Math, Han Characters, Language Tags, Private Use. Basic Multilingual Plane from 0000 to F000: Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]

99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined

Page 12: ppt

Intro to Web Internationalization
Unicode Encoding Forms

Variable length: UTF-8/UTF-16
Fixed length: UTF-32

U+2122: ™: Trade Mark Sign
UTF-8: 0xE2 0x84 0xA2 (11100010 10000100 10100010)
UTF-16: 0x2122 (00100001 00100010)
UTF-32: 0x00002122 (0…00100001 00100010)
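The byte values on this slide can be reproduced with Python's built-in codecs. This is a small sketch; the helper names `encoding_forms` and `bits` are made up for illustration, and the big-endian codec variants are used so that no byte order mark is added.

```python
def encoding_forms(char):
    """Return the hex bytes of one character in the three encoding forms."""
    return {
        "UTF-8": char.encode("utf-8").hex(),
        "UTF-16": char.encode("utf-16-be").hex(),
        "UTF-32": char.encode("utf-32-be").hex(),
    }


def bits(data):
    """Render bytes as space-separated groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in data)


trade_mark = "\u2122"
print(encoding_forms(trade_mark))
print(bits(trade_mark.encode("utf-8")))

# Variable length in practice: a character outside the Basic Multilingual
# Plane needs four bytes in UTF-8 and a surrogate pair in UTF-16.
print(encoding_forms("\U00010000"))
```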

Page 13: ppt

Intro to Web Internationalization
Unicode on the Web

XML processors are required to process UTF-8/UTF-16

Encoding declaration precedence
1. HTTP Content-Type header charset declaration
2. XML encoding declaration (XHTML)
3. meta charset declaration in (X)HTML
4. link element charset attribute

Approx. 4% of pages have encoding errors*
No real need for character references
• ü: &uuml; or &#252;
• Exceptions: <, >, &, "

Use styles to control font selection

* source: Google presentation at IUC30
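The precedence list can be read as a lookup order. The Python sketch below follows the first three steps and is only a rough illustration: the function name `effective_encoding` is hypothetical, the regular expressions are simplifications rather than real HTML or XML parsing, the link element case is left out, and the document is assumed to start in an ASCII-compatible encoding.

```python
import re

CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*"?([\w.:-]+)', re.I)
XML_DECLARATION = re.compile(r'\s*<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']')
META_CHARSET = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.I)


def effective_encoding(content_type=None, body=b"", default="utf-8"):
    """Resolve the declared encoding using the precedence order above."""
    # 1. The HTTP Content-Type header wins if it carries a charset.
    if content_type:
        match = CHARSET_IN_HEADER.search(content_type)
        if match:
            return match.group(1).lower()
    # Only the start of the document is inspected.
    head = body[:1024].decode("ascii", errors="replace")
    # 2. XML encoding declaration.
    match = XML_DECLARATION.match(head)
    if match:
        return match.group(1).lower()
    # 3. meta charset declaration.
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).lower()
    return default


page = b'<?xml version="1.0" encoding="ISO-8859-1"?><html/>'
print(effective_encoding("text/html; charset=UTF-8", page))
print(effective_encoding("text/html", page))
```

The first call illustrates how a server-level header silently overrides what the page author wrote in the document, which is a plausible source of the encoding errors mentioned on the slide.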

Page 14: ppt

Demo
A Currency Converter Application – globalized but not localized

Page 15: ppt

Intro to Web Internationalization
Localization Recommendations

• Avoid translatable text in graphics
• Make sure graphics are culturally neutral
• Avoid absolute sizing
• Use HTML flow layout
• Write complete sentences

Page 16: ppt

Intro to Web Internationalization
Localization Model and Tools

Text translation
• Localization formats
    HTML with template library
    • W3C Internationalization Tag Set (tool support?)
    GNU gettext/PO
    XLIFF - XML Localization Interchange File Format
• Localization tools
    OmegaT
    Open Language Tools (Sun)
    The WordForge Project: Pootle
    …

Searchability – Links/Sitemap

Page 17: ppt

Demo
A Currency Converter Application – fully internationalized Web 1.0 application

Page 18: ppt

Client-side Scripting
Javascript Internationalization

ECMAScript edition 3 added a range of internationalization features (1999)
• Good support for Unicode processing
• Set of locale-sensitive functions
    Dependent on host locale (i.e. browser)
• Set of locale-insensitive functions
• No number or date/time parsing

Javascript libraries with additional internationalization functionality
• dojo Toolkit (i18n contributed by IBM)
• Microsoft AJAX Library

Page 19: ppt

Client-side Scripting
AJAX Recommendations

Late globalization
• Transmit data in locale-independent form with XMLHttpRequest
• Might require some creative parsing/UI

Early localization
• Text localization server-side
• Browsers are missing a message-catalog facility
• Dynamically created page content is invisible to search engines
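A server-side sketch of "late globalization": the response to an XMLHttpRequest carries only locale-independent values, and formatting is left to the client. The Python below is an assumption for illustration; the field names, the function `rate_payload` and the choice of JSON as the wire format are not from the talk, and the currency pair is made up.

```python
import json
from datetime import datetime, timezone


def rate_payload(base, quote, rate, as_of):
    """Serialize an exchange rate without any locale-specific formatting."""
    return json.dumps(
        {
            "base": base,    # ISO 4217 currency code, not a localized name
            "quote": quote,
            "rate": rate,    # plain JSON number: "." decimal point, no grouping
            "asOf": as_of.astimezone(timezone.utc).isoformat(),  # ISO 8601, UTC
        },
        sort_keys=True,
    )


payload = rate_payload(
    "USD", "CAD", 1.1785, datetime(2007, 3, 8, 12, 0, tzinfo=timezone.utc)
)
print(payload)
```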

Page 20: ppt

Multi-lingual Syndication
RSS 2.0

Character encoding
• RSS 2.0 is an XML application
• XML encoding rules apply

Language
• Element only on channel (feed), not on item
    Create one channel per language
• Specified to comply with RFC 1766 language tags

Date/Time
• In standard RFC 822 format (including 4-digit years)
    E.g. “Wed, 02 Oct 2002 08:00:00 EST”
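RFC 822 dates use fixed English day and month abbreviations, so they should not be produced with locale-sensitive formatting. A minimal Python sketch using the standard `email.utils` helpers, which write these names independently of the active locale (the variable names are illustrative):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

published = datetime(2002, 10, 2, 13, 0, 0, tzinfo=timezone.utc)

# Always English names and a numeric or GMT zone, whatever the host locale is.
pub_date = format_datetime(published, usegmt=True)
print(pub_date)

# Reading the slide's example back into a timezone-aware datetime.
parsed = parsedate_to_datetime("Wed, 02 Oct 2002 08:00:00 EST")
print(parsed.isoformat())
```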

Page 21: ppt

Multi-lingual Syndication
Atom Syndication

More granular language marking
• xml:lang can be applied to any human readable text in the format
• Aggregators need to deal with this

Better date/time format: RFC 3339
• E.g. “2003-12-13T18:30:02-05:00”

Acknowledgement: Tim Bray
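The RFC 3339 example on this slide round-trips through Python's `datetime` without custom parsing. A small sketch, assuming Python 3.7 or later and a numeric offset (older `fromisoformat` versions do not accept the `Z` suffix); the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_rfc3339(text):
    """Parse a timestamp with a numeric UTC offset."""
    return datetime.fromisoformat(text)


def to_rfc3339(moment):
    """Format a timezone-aware datetime with second precision."""
    return moment.isoformat(timespec="seconds")


updated = parse_rfc3339("2003-12-13T18:30:02-05:00")
print(to_rfc3339(updated))
print(to_rfc3339(updated.astimezone(timezone.utc)))
```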

Page 22: ppt

Demo
A Currency Converter Application – adding a syndication feed with exchange rate information

Page 23: ppt

International Web Services Design
Service Patterns

Locale Neutral
• Description: Neutral data formats
• Request data: CAD
• Return data: 1.1785

Client Influenced
• Description: Service reacts to client-locale e.g. HTTP Accept-Language
• Request data: CAD (Accept-Language: de)
• Return data: Kanadischer Dollar

Service Determined
• Description: Service is locale-specific and ignores client preference
• Return data: 03/08/2007 12:00pm EST

Data Driven
• Description: Service adjusts formatting and language to locale the data refers to
• Request data: NOK; CHF
• Return data: norske kroner; ?

Page 24: ppt

International Web Services Design
REST

REST naturally ties into i18n features in HTTP/HTML/XML
• Locale indicated with HTTP Accept-Language
• Encoding and language marking in markup

Special caution for HTTP GET parameters
• Locale-independent formatting recommended
• Text parameters
    Encode in UTF-8 and escape in URIs
    IRI (Internationalized Resource Identifier) functionality might provide this for you
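Encoding text parameters as UTF-8 and percent-escaping them is what `urllib.parse` does by default in Python. A minimal sketch; the endpoint `http://example.org/convert`, the parameter names and the helper `build_url` are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(endpoint, params):
    """Append parameters as a UTF-8 encoded, percent-escaped query string."""
    return endpoint + "?" + urlencode(params, encoding="utf-8")


url = build_url(
    "http://example.org/convert",
    {
        "amount": "1000000.23",  # locale-independent: "." decimal, no grouping
        "to": "EUR",
        "note": "Zürich",        # text parameter with a non-ASCII character
    },
)
print(url)

# The receiving side reverses the escaping and decodes UTF-8 again.
print(parse_qs(urlsplit(url).query)["note"])
```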

Page 25: ppt

International Web Services Design
SOAP

Locale can be communicated in
• Transport header (e.g. HTTP)
• SOAP header
• SOAP message body

Beware of automatically generated SOAP interfaces
• Might be locale-dependent, but not allow you to specify the locale

Use of XML Schema data types promotes locale-independence
Also consider localization of error messages

Page 26: ppt

Conclusions

Unification
• One code base

Customization
• Localization and adaptation for locales

Next step: cross-language “leakage”
• Provide views in multiple languages to the same (user-generated) data
• Translate user-generated content
    Volunteers
    Machine Translation

Page 27: ppt

Call for Contributions

Presentation and Perl CGI demo code
• http://www.digitalsilkroad.net/web2expo

Add a version in your preferred language
• Ruby on Rails
• PHP
• Python
• …

Similar ASP.NET application
• http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx

Page 28: ppt

References

W3C Internationalization Activity
• http://www.w3.org/International/

POSIX Locale
• http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html

International Components for Unicode
• http://www-306.ibm.com/software/globalization/icu/

Unicode/Common Locale Data Repository
• http://www.unicode.org/

Microsoft Internationalization APIs
• http://msdn2.microsoft.com/en-us/library/ms776254.aspx
• http://msdn2.microsoft.com/en-us/library/system.globalization.aspx

Page 29: ppt

References

OmegaT
• http://www.omegat.org/omegat/omegat_en/omegat.html

Open Language Tools
• https://open-language-tools.dev.java.net/

The WordForge Project
• http://www.wordforge.org/drupal/

Javascript Internationalization
• http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html

RSS 2.0
• http://www.rssboard.org/rss-specification

Atom Syndication
• http://www.atomenabled.org/developers/syndication

RSS 1.0
• http://web.resource.org/rss/1.0/spec

W3C Web Services Internationalization Usage Scenarios
• http://www.w3.org/TR/ws-i18n-scenarios/

Page 30: ppt

Additional Slides

Page 31: ppt

Multi-lingual Syndication
RSS 1.0

Character encoding
• RSS 1.0 is an XML application
• XML encoding rules apply

Complies with the RDF (Resource Description Framework) specification
• Definition of language and date/time formats is left to RDF metadata formats
    Dublin Core Metadata Element Set
        Language: RFC 1766/ISO 639-2
        Date/Time: ISO 8601 (superset of RFC 3339)
• Dublin Core also allows specifying time periods!