LRC-XI: 11th Annual Internationalisation and Localisation Conference. A Paper On Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach. Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote, Department of Computer Engineering
1
Presented By:
Prof. Manikrao L. Dhore
Mr. Abhishek K. Dhote
Department of Computer Engineering
Vishwakarma Institute of Technology, Pune, India
A Paper On
Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach
LRC-XI-11th Annual Internationalisation and Localisation Conference
Organised By:
Localisation Research Centre (LRC),
Department of Computer Science and Information Systems (CSIS),
University of Limerick, Limerick, Ireland.
2
Agenda
Introduction
– Why Web Page Localisation?
– Borderless Integration
– Why Multilingual Web Sites?
– What is Locale and multi-locale Operation?
– Internationalisation and Key Challenges
– I18n Standard: Important Issues and Business Context
– Variance: Regional and Cultural Issues
System Design
– Web Localisation and Rural India
– Localisation Approaches
– Architecture of Servers
System Implementation and Test Results
– Configuration of Server
– Localisation Test Results
– Alternative Approach
Conclusion
References
3
Why Web Page Localisation?
Web Localisation
Information Repository
Internet
Banking Sector
Online Business
Service Sector
Open Linguistic Barriers
Closed Linguistic Barriers
Objective
Information Convenience
International Market and Customers
Increased Sales Leads
Advantage of Global growth
Reduce Marketing Costs
4
Borderless Integration
Model Business Process
Integration Logic
Resource Mapping
Analyse
Optimize Process
Integration Deployment
Business Entities
Business Logic
Customer
Market Research
Internet Framework
Local
Global
5
Why Multilingual Websites?
Over 100 million people access the Internet in a language other than English.
Over 50% of web users speak a native language other than English.
According to Forrester Research, 50% of all online sales are expected to occur outside the USA.
Web users are four times more likely to purchase from a site that communicates in the customer’s native language.
“Your website is your window to the world…”
6
Basic Terminology
Locale
Set of features that can be varied depending on the language and culture of the user or the data
Internationalisation
The process of designing software so that it can be easily adapted to different locales
Localisation
The process of adapting software to a locale
7
What is Locale?
A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences.
Java locales are associated with upwards of 300 pieces of data, for example:
– time zone names
– collation sequences
– the infinity symbol
– number formats
– days of the week
Locales generally do not contain this data themselves. They represent a way of obtaining “localized behavior” in the system.
Locales are generally part of the programming context or environment.
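The idea that a locale only selects localised behaviour, rather than holding the data itself, can be sketched with the standard symbol classes. This is a minimal illustration, not the authors' code; the class name LocaleSymbolsDemo and the choice of locales are assumptions made for the example.

```java
import java.text.DateFormatSymbols;
import java.text.DecimalFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

// Hypothetical demo class: the Locale object is only a key; the data
// (weekday names, separators, infinity symbol) lives in the runtime.
final class LocaleSymbolsDemo {

    // Weekday name for a locale; the array is indexed by Calendar constants.
    static String monday(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getWeekdays()[Calendar.MONDAY];
    }

    // Decimal separator used when formatting numbers for a locale.
    static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // String used to represent infinity for a locale.
    static String infinity(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getInfinity();
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.GERMANY, Locale.FRANCE, Locale.forLanguageTag("hi-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag()
                + ": Monday=" + monday(locale)
                + ", decimal separator=" + decimalSeparator(locale)
                + ", infinity=" + infinity(locale));
        }
    }
}
```

The exact strings printed for some locales depend on the locale data shipped with the Java runtime in use.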
8
Multi-Locale Operation
System Context
Context Separation
Design Policy
Server Processes
APIs provide late binding localisation
Message Passing
Logic Execution
Client Locale
9
Internationalisation
"I18n" is an abbreviation for the word "Internationalisation". The term "i18n" is derived from its spelling as the letter "i" plus 18 letters plus the letter "n".
I + [n t e r n a t i o n a l i s a t i o] (18 letters) + n
The extension of this naming convention to the terms Localisation (l10n), Europeanisation (e13n), Japanisation (j10n), Globalisation (g11n), seemed to come somewhat after the invention of "i18n".
– Potentially handle multiple languages and customs in the world
– Displaying/inputting characters for the users' native languages
– Handling popular encodings for the users' native languages
– Native characters for file names and other items
– Character classification and sorting
– Typesetting and hyphenation rules
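The naming convention behind "i18n", "l10n" and the related abbreviations is simple arithmetic: first letter, count of the letters in between, last letter. A small sketch (the class name Numeronym is made up for this example):

```java
import java.util.Locale;

// Hypothetical helper that derives abbreviations such as "i18n".
final class Numeronym {

    // First letter + number of letters in between + last letter.
    static String of(String word) {
        if (word.length() < 4) {
            return word; // too short for the abbreviation to save anything
        }
        String lower = word.toLowerCase(Locale.ROOT);
        int between = lower.length() - 2;
        return "" + lower.charAt(0) + between + lower.charAt(lower.length() - 1);
    }

    public static void main(String[] args) {
        String[] words = {
            "Internationalisation", "Localisation", "Europeanisation",
            "Japanisation", "Globalisation"
        };
        for (String word : words) {
            System.out.println(word + " -> " + of(word));
        }
    }
}
```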
10
Key Challenges
Standards
Data Correspondence
Reference Information
Encoding and Character Set
– Unicode support and implementation
– Use of language specific encoding
– Configuring encoding
Locale and Parameterisation
– Availability, Performance
– Continuity of i18n features
– Translation
Presentation, Processing
– UI design
– Handling collation
– Migration of existing data
11
Important Issues in I18n
Currency
Language rules
UI preferences
Localisation
Culture context
Date/Time
Character encodings
Business impact
Content management
12
Business Context of I18n
Internationalisation
Old Application, New Application
Old Product, New Product
Existing Service, New Service
To improve effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.
To reach out to a global customer base by providing language/culture specific interfaces and allowing for international preferences.
Mergers / Acquisitions.
To consolidate same-functionality applications/services developed and maintained separately for separate languages/regions.
To support region specific functionality (due to legal aspects, financial practice etc.).
To provide region specific value added services (like UI, look and feel, Sorting/Searching).
13
Regional and Cultural Differences
Software solutions should be designed to fit into the cultural context of the user
Examples
– Naming of the product
– Differences in the meanings of jargon
– Confusing graphical symbols
– National rules, conventions
– Religious beliefs and assumptions
– Basic cultural values and customs
– No appropriate translations available for phrases and slogans
– Favorite sports and slang
– Cultural anachronisms
– Reading left-to-right, top-to-bottom etc.
14
Language and Character Encoding
Language peculiarities
– Hyphenation
– Collation
– Spelling
– Transliteration
English: ABC...RSTUVWXYZ
German: AÄB...NOÖ...SßTUÜV…YZ
Swedish/Finnish: AB...STUVWXYZÅÄÖ
Norwegian: AB…VWXYÜZÆØÅ
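The alphabet orders above are why sorting cannot rely on raw character codes. A minimal sketch using java.text.Collator follows; the class name CollationDemo and the word list are assumptions for illustration, and the order produced for German or Swedish depends on the collation data in the Java runtime.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo: the same words sorted under different locale rules.
final class CollationDemo {

    // Returns a sorted copy using the collation rules of the given locale.
    static List<String> sortFor(Locale locale, List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collator collator = Collator.getInstance(locale);
        Collections.sort(copy, collator);
        return copy;
    }

    // Returns a sorted copy using plain code unit order (String.compareTo).
    static List<String> sortBinary(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "apple", "\u00C4rger", "Banana", "\u00D6l");
        System.out.println("binary : " + sortBinary(words));
        System.out.println("English: " + sortFor(Locale.ENGLISH, words));
        System.out.println("German : " + sortFor(Locale.GERMAN, words));
        System.out.println("Swedish: " + sortFor(Locale.forLanguageTag("sv"), words));
    }
}
```

Binary order places every upper-case ASCII letter before every lower-case one, which is rarely what a user expects; a Collator compares by letter first and by case and accent afterwards.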
Various encoding “standards” exist, and they differ between languages
ISO and vendor encodings: ISO-8859-1 to ISO-8859-7, Windows-1252
Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312
Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR
15
Unicode Character Standard
Developed by the Unicode Consortium
Covers all major living scripts
Version 4.0 has 96,000+ characters
Capacity for 1 million+ characters
Unicode Character Set = ISO 10646
Unicode adds character properties and algorithms
ISO and Unicode work together to synchronize
ISO support enhances international acceptance
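The "1 million+" capacity corresponds to the Unicode code point range U+0000 to U+10FFFF. In Java, code points above U+FFFF occupy two char values, so character counts and string lengths can differ. A small sketch (the class name CodePointDemo is made up for this example):

```java
// Hypothetical demo of code points versus Java char units.
final class CodePointDemo {

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A', DEVANAGARI LETTER A (U+0905), and U+10400, which lies
        // outside the Basic Multilingual Plane and needs a surrogate pair.
        String s = "A\u0905" + new String(Character.toChars(0x10400));

        System.out.println("char units : " + s.length());
        System.out.println("code points: " + codePoints(s));
        System.out.println("max code point: U+"
            + Integer.toHexString(Character.MAX_CODE_POINT).toUpperCase());
        System.out.println("code point capacity: " + (Character.MAX_CODE_POINT + 1));
    }
}
```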
16
Date / Time Formats Variance
Locale     Example     Format
U.S.A.     2/16/05     mdy, /
France     16.2.05     dmy, .
France     16-2-05     dmy, -
CJKT       2005/2/16   ymd, /
Japan      17/2/16     ¥md, /
Hour/minute separators, AM/PM, time zone
• India : 4:00 P.M.
• U.S.A. : 4:00 p.m.
• France : 16.00
• Japan : 1600
• Japan : 4:00
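Variations like those in the table are normally left to the platform rather than hand-coded. The sketch below assumes Java 8 or later (java.time); the class name DateDemo is invented for the example, and the exact output for each locale depends on the locale data of the Java runtime, so it may not match the table character for character.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo: one date and one time, formatted per locale.
final class DateDemo {

    // Short date in the conventions of the given locale.
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    // Short time of day in the conventions of the given locale.
    static String shortTime(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT)
                .withLocale(locale)
                .format(time);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        LocalTime time = LocalTime.of(16, 0);
        Locale[] locales = {
            Locale.US, Locale.FRANCE, Locale.JAPAN, Locale.forLanguageTag("en-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                + shortDate(date, locale) + "  " + shortTime(time, locale));
        }
    }
}
```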
17
Numbers / Currency Variance
Varieties in group and fractional separators
• India : 12,34,567.89
• England : 12,345.67
• Germany : 12.345,67
• Switzerland: 12’345,67
• Swiss money: 12’345.67
• France : 12 345,67
Varieties in symbol placement, symbol length, precision, number width, rounding rules
• India : Rs. 12,34,567.89 ; Re. 1
• U.S.A : US $1,234,567.89
• France : 12.345,67 €
• Portuguese : 12$34ESC
• Portuguese : 12$34€
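Separator and currency conventions can likewise be delegated to java.text.NumberFormat. This is a sketch only: the class name NumberDemo is an assumption, and results for some locales (for instance Indian digit grouping or the Swiss group separator) vary with the locale data of the Java runtime.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: the same value under different locale conventions.
final class NumberDemo {

    // Plain number with locale specific group and decimal separators.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Currency amount with locale specific symbol, placement and precision.
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.UK, Locale.GERMANY, Locale.FRANCE,
            Locale.forLanguageTag("de-CH"), Locale.forLanguageTag("en-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                + number(1234567.89, locale) + "  |  "
                + currency(1234567.89, locale));
        }
    }
}
```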
18
System Design
19
Indian Languages Profile
20
Percentage Languages Usage Index
Data Source: 2001 Census of India
Language           Number         Percentage
Hindi              337,272,114    40.22%
Bengali             69,595,738     8.30%
Telugu              66,017,615     7.87%
Marathi             62,481,681     7.45%
Tamil               53,006,368     6.32%
Urdu                43,406,932     5.18%
Gujarati            40,673,814     4.85%
Kannada             32,753,676     3.91%
Malayalam           30,377,176     3.62%
Oriya               28,061,313     3.35%
Punjabi             23,378,744     2.79%
Assamese            13,079,696     1.56%
Sindhi               2,122,848     0.25%
Nepali               2,076,645     0.25%
Konkani              1,760,607     0.21%
Manipuri             1,270,216     0.15%
Kashmiri                56,693     0.01%
Sanskrit                49,736     0.01%
Other Languages     31,142,376     3.71%
Total              838,583,988   100.00%
21
Population residing in villages of India: 70%
Total number of languages in India: 40
Official languages: 22
Overall literacy rate: 64.20%
English language literacy: 17.75%
Indian Currency Example: language panel of an Indian currency note (value Rs. 10), showing 15 major Indian languages
22
Information Channelisation
Internationalisation: Prepare material for localisation (account for text expansion, avoid embedded text, etc.)
Text Extraction: Extract text from source files (graphics, PDFs, etc.)
Translation: Translate content from extracted materials
Localisation: Replace graphics, change colors, redesign layout to accommodate target culture.
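For plain HTML, the text extraction stage can be sketched as pulling the text between tags into key/value pairs and leaving placeholders behind. This is an illustrative assumption, not the authors' tool: the class name TextExtractor, the msg.N key scheme and the ${key} placeholder syntax are all invented here, and a regular expression is only adequate for simple, well-formed pages (it does not handle scripts, styles, comments or attribute text).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor: moves visible text out of an HTML string.
final class TextExtractor {

    // Text that sits directly between a closing '>' and the next '<'.
    private static final Pattern TEXT = Pattern.compile("(?<=>)[^<>]+(?=<)");

    // Fills 'messages' with key -> text and returns the HTML with each
    // text run replaced by a ${key} placeholder.
    static String extract(String html, Map<String, String> messages) {
        Matcher m = TEXT.matcher(html);
        StringBuffer template = new StringBuffer();
        int counter = 0;
        while (m.find()) {
            String text = m.group();
            if (text.trim().isEmpty()) {
                // Whitespace between tags is layout, not translatable text.
                m.appendReplacement(template, Matcher.quoteReplacement(text));
                continue;
            }
            String key = "msg." + (++counter);
            messages.put(key, text.trim());
            m.appendReplacement(template, Matcher.quoteReplacement("${" + key + "}"));
        }
        m.appendTail(template);
        return template.toString();
    }

    public static void main(String[] args) {
        Map<String, String> messages = new LinkedHashMap<>();
        String template = extract("<h1>Welcome</h1> <p>Contact us</p>", messages);
        System.out.println(template);
        for (Map.Entry<String, String> e : messages.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

The key/value lines printed at the end have the shape of a .properties file, which is what a translator would then work on.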
23
Localisation Process
Language selection
Web page is “dynamically” converted into target language
Static web page is selected and displayed
Translation
Localisation
Site Acceptance Factors
– Color
– Image
– Representation
Translation Errors
Text Placement in Separate File
Late Binding
Mapping Techniques
24
Server Architecture
Client Browser_1
Client Browser_2
Client Browser_3
Client Browser_n
SOCKET API
HTML Server
Parse Request Module
Localised Content
Default Alternative Language Response
Property File
25
Implementation: Parse Request Module
Definition
– To parse the request header
Responsibilities
– To parse the request header
– To analyze and forward the request
– Provide log to the administrator
Compositions
– Main server loop
– Threads
Interfaces/Ports
– Socket APIs
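One way a request header can drive the choice of language is the HTTP Accept-Language header. The slides do not say which header or selection rule the authors used, so the following is only a sketch under that assumption; the class name LanguageNegotiator, the list of supported languages and the English default are hypothetical. It relies on Locale.LanguageRange and Locale.lookup, available from Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: picks a supported language from request headers.
final class LanguageNegotiator {

    // Assumed set of languages the server has translations for.
    static final List<Locale> SUPPORTED = Arrays.asList(
        Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN, Locale.forLanguageTag("mr"));

    // Assumed default when nothing acceptable is offered.
    static final Locale DEFAULT = Locale.ENGLISH;

    // Finds a header value in raw "Name: value" lines, or null if absent.
    static String headerValue(List<String> headerLines, String name) {
        String prefix = name.toLowerCase(Locale.ROOT) + ":";
        for (String line : headerLines) {
            if (line.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                return line.substring(prefix.length()).trim();
            }
        }
        return null;
    }

    // Chooses the best supported locale for an Accept-Language value.
    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT;
        }
        try {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale best = Locale.lookup(ranges, SUPPORTED);
            return best != null ? best : DEFAULT;
        } catch (IllegalArgumentException e) {
            return DEFAULT; // malformed header: fall back instead of failing
        }
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
            "Host: example.org",
            "Accept-Language: mr-IN,mr;q=0.9,en;q=0.5");
        String value = headerValue(headers, "Accept-Language");
        System.out.println("header : " + value);
        System.out.println("chosen : " + choose(value).toLanguageTag());
    }
}
```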
26
Parse Request Module Architecture
Main Server Loop
Thread 1
Thread 2
Thread 3
Thread 4
Thread 5
Thread n
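A main loop that accepts connections and hands each one to a worker thread could look roughly like the sketch below. It is an assumption about the general shape only: the class name MiniServer, the pool size of four, and the fixed response are invented for illustration and are not taken from the authors' server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical minimal server: one accept loop, a pool of worker threads.
final class MiniServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    MiniServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 picks any free port
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Main server loop: accept a client, pass it to a worker, repeat.
    void start() {
        Thread loop = new Thread(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept();
                    workers.execute(() -> handle(client));
                } catch (IOException e) {
                    break; // accept() fails once the server socket is closed
                }
            }
        }, "main-server-loop");
        loop.setDaemon(true);
        loop.start();
    }

    // Worker: read the request line and headers, send a fixed page.
    private static void handle(Socket client) {
        try (Socket c = client) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(c.getInputStream(), StandardCharsets.ISO_8859_1));
            String line = in.readLine();
            while (line != null && !line.isEmpty()) {
                line = in.readLine(); // skip headers up to the blank line
            }
            byte[] body = "<html><body>OK</body></html>".getBytes(StandardCharsets.UTF_8);
            String head = "HTTP/1.0 200 OK\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "Content-Length: " + body.length + "\r\n\r\n";
            OutputStream out = c.getOutputStream();
            out.write(head.getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
        } catch (IOException e) {
            System.err.println("request failed: " + e.getMessage());
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
        workers.shutdownNow();
    }

    // Starts a server, sends one request over loopback, returns the status line.
    static String demo() {
        try (MiniServer server = new MiniServer(0)) {
            server.start();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.port())) {
                s.getOutputStream().write(
                    "GET /index.html HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1));
                s.getOutputStream().flush();
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.ISO_8859_1));
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A fixed pool bounds the number of concurrent requests; a thread per connection, as the diagram suggests, is the simpler alternative when load is small.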
27
HTML Server
Definition
– Default implementation of HTTP protocol
– Processes static HTML requests
Responsibilities
– Process static HTML request
– Process dynamic Internationalisation request
Compositions
– Server Processes
Interfaces/Ports
– Socket APIs
28
HTML Server Architecture
Parse Protocol (GET/POST)
GET Request Processor: Default Language, Alternative Language, Static Response
POST Request Processor: Default Language, Alternative Language, Static Response
.properties file
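The step from a page template plus a .properties file to a response in the alternative language can be sketched as late-bound key substitution with a fallback to the default language. The ${key} placeholder syntax, the class name TemplateLocaliser and the sample keys are assumptions made for this example; values are inserted without HTML escaping, which a real server would need to add.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical localiser: fills ${key} placeholders from property sets.
final class TemplateLocaliser {

    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Looks each key up in the alternative language first, then in the
    // default language, and finally falls back to the key itself.
    static String localise(String template, Properties alternative, Properties defaults) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value = alternative.getProperty(key, defaults.getProperty(key, key));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties english = new Properties();
        english.setProperty("title", "Welcome");
        english.setProperty("contact", "Contact us");

        Properties french = new Properties();
        french.setProperty("title", "Bienvenue"); // "contact" left untranslated

        String template = "<h1>${title}</h1><p>${contact}</p>";
        System.out.println(localise(template, french, english));
    }
}
```

In a deployment the Properties objects would be loaded from files with Properties.load rather than filled in code.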
29
System Implementation and Test Results
30
Java Support for Internationalisation
The Locale class lets applications identify locales, allowing for truly multilingual applications.
The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container.
The Date, Calendar, and TimeZone classes provide the basis for time handling around the globe.
The String and Character classes as well as the java.text package contain rich functionality for text processing, formatting, and parsing.
Text stream input and output classes support converting text between Unicode and other character encodings.
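How these classes combine can be sketched with in-memory bundles, so the example runs without any files. The class name BundleDemo, the keys and the translations are assumptions for illustration. In an application the bundles would more typically be loaded with ResourceBundle.getBundle from .properties files on the classpath (for instance a hypothetical Messages_fr.properties), which also provides the locale fallback chain that is written by hand here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo: per-language bundles plus locale aware formatting.
final class BundleDemo {

    private static final Map<String, ResourceBundle> BUNDLES = new HashMap<>();
    static {
        BUNDLES.put("en", load("greeting=Welcome, {0}!\nvisits=Visits: {0,number,integer}"));
        BUNDLES.put("fr", load("greeting=Bienvenue, {0} !\nvisits=Visites : {0,number,integer}"));
        BUNDLES.put("de", load("visits=Besuche: {0,number,integer}"));
    }

    // Parses .properties formatted text into a bundle.
    private static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Picks the bundle for the locale's language, falls back to English
    // for unknown languages or missing keys, then formats the arguments.
    static String message(Locale locale, String key, Object... args) {
        ResourceBundle english = BUNDLES.get("en");
        ResourceBundle bundle = BUNDLES.getOrDefault(locale.getLanguage(), english);
        String pattern = bundle.containsKey(key) ? bundle.getString(key) : english.getString(key);
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.JAPAN };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                + message(locale, "greeting", "Asha") + "  "
                + message(locale, "visits", 12345));
        }
    }
}
```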
31
Conversion Process
Character conversion is a pretty straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input only consists of characters or bytes that have mappings.
The reality is:
– A single character in a non-Unicode encoding may have multiple equivalent representations (say, a precomposed character and a sequence of base character and combining mark).
– A character in one encoding may not have an equivalent in the other encoding.
– An invalid sequence of bytes or characters may show up in the input.
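Each of the three complications can be checked explicitly with java.nio.charset and java.text.Normalizer. The sketch below is illustrative; the class name ConversionDemo and the sample characters are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo of the three conversion pitfalls.
final class ConversionDemo {

    // True if every character of the text has a mapping in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // True if the bytes form a valid sequence in the charset. REPORT makes
    // the decoder fail instead of silently substituting a replacement.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if two strings are equal once both are in composed form (NFC).
    static boolean sameAfterNormalisation(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // e with acute as one character
        String combining = "e\u0301";  // base letter + combining acute accent
        System.out.println("equal as-is      : " + precomposed.equals(combining));
        System.out.println("equal after NFC  : " + sameAfterNormalisation(precomposed, combining));

        // The euro sign has no mapping in ISO-8859-1.
        System.out.println("euro in Latin-1  : " + canEncode("\u20AC", StandardCharsets.ISO_8859_1));

        // 0xC3 must be followed by a continuation byte in UTF-8; 0x28 is not one.
        byte[] broken = { (byte) 0xC3, (byte) 0x28 };
        System.out.println("valid UTF-8 bytes: " + isValid(broken, StandardCharsets.UTF_8));
    }
}
```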
32
Process: Configure Server
33
Process: Register
34
Process: Log
35
Process: Localise Servlet
36
Web Page in English with IE
37
Web Page in Spanish with IE
38
Web Page in Dutch with IE
39
Web Page in French with IE
40
Web Page in Italian with IE
41
Web Page in Portuguese with IE
42
Web Page in German with IE
43
Web Page in English with IE
44
Web Page in Marathi with IE
45
Conclusion
The Java localisation APIs come in handy for dynamically localising a web page into alternative languages.
The rich set of Java class libraries such as java.util.ResourceBundle and java.util.Locale provides an efficient approach to working with locale-specific information.
More manageable workspace for users in their native language.
Regional settings, colour and image representation are not disturbed.
Improves effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.
Supports region specific functionality (due to legal aspects, financial practice etc.).
Provides region specific value added services (like UI, look and feel, Sorting/Searching).
Consolidates same-functionality applications/services developed and maintained separately for separate languages/regions.
46
References
[1] Fernandez, N. C. (2000), Web Site Localisation and Internationalisation: A Case Study, published, City University
[2] Khachane, J. (2005), Web Page Localisation, published, Pune University
[3] DePalma, D. A. (1999), Strategies for Global Sites, Forrester Research Inc, May 1998 and The eBusiness Report. In: eMarketer
[4] Roche, M. (2000), Managing Multilingual Web Applications, 16th International Unicode Conference, Amsterdam
[5] Nielsen, J. (1999), Designing Web Usability, Indianapolis: New Riders Publishing
[6] Deitsch, Loukides, M., Java Internationalisation
47