bangla localization of - pan localization

Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Bangla Localization of OpenOffice.org

Post on 04-Feb-2022

L10n is the process of adapting the text and applications of a

product or service to enable its acceptability for a particular

cultural or linguistic market.

Localization primarily includes:

• Translating text content, software source code, web sites, or database content

• Adjusting graphic and visual elements and examples to make them culturally appropriate

• Numerous locale details (currency, national regulations and holidays, cultural sensitivities, product or service names, etc.)

Localization

The term "software localization"

describes the process of altering

software products for marketing to

people who speak languages other

than English.

Software Localization

I18n is planning and implementing products and services

so that they can easily be localized for specific languages

and cultures.

Internationalization includes:

• Creating illustrations for documents

• Allowing space in user interfaces

• Creating print or web site graphic images

• Ensuring that the tools and product can support international character sets for software

• Ensuring data space so that messages can be translated from languages with single-byte character codes (such as English) into languages requiring multiple-byte character codes (such as Japanese Kanji)
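The data-space point can be illustrated with a small sketch. It assumes UTF-8 as the storage encoding (the slides do not name one), and the sample strings are only illustrations.

```python
def byte_widths(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

# Illustrative samples: the same buffer size holds far fewer
# Japanese or Bangla characters than English ones.
samples = {"English": "text", "Japanese": "漢字", "Bangla": "বাংলা"}

for language, sample in samples.items():
    points, size = byte_widths(sample)
    print(f"{language}: {points} code points, {size} bytes")
```

A message buffer sized for the English string would be too small for its translation, which is why space has to be planned before localization starts.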

Internationalization

The whole OpenOffice.org architecture is based on a

layered approach. There are four well-defined layers,

each covering a special area of the functionality.

• System Abstraction Layer

• Infrastructure Layer

• Framework Layer

• Application Layer

OpenOffice.org architecture

The L10N and I18N project contains a framework and tools

for localization (l10n) and internationalization (i18n).

I18N

The i18n framework offers functionality which is needed to

internationalize applications like an office suite (OOo).

• The old i18n framework (i18n) supports only Western languages.

• The new i18n framework supports Western languages, Chinese, Japanese, and Korean (CJK), and languages that need Complex Text Layout (CTL), such as Arabic, Hebrew, Indic languages, and others.

L10N and I18N Project of OpenOffice.org

• Word, line and sentence break

• Search and replace

• Paragraph numbering

• Transliteration

• Character classification

• Number Formats

• Calendar

• Collation

• Locale data (date/time/number/currency format,

calendar information, etc.)
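Character classification for Bangla can be sketched with Python's standard `unicodedata` module. This is not the OpenOffice.org i18n API; the `classify` helper is a hypothetical name used only for illustration.

```python
import unicodedata

def classify(ch):
    """Map a character to a coarse class using its Unicode general category."""
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category in ("Mn", "Mc"):
        return "mark"      # dependent vowel signs, hasanta, etc.
    if category == "Nd":
        return "digit"
    return "other"

for ch in "ক", "া", "৫", " ":
    print(f"U+{ord(ch):04X} {classify(ch)}")

# Bangla decimal digits carry numeric values, so they parse like ASCII digits.
print(int("১২৩"))
```

A word-break or search routine needs exactly this kind of information to know that a vowel sign belongs to the preceding consonant.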

I18N Framework Functionality

• Internationalization (i18n) of an application is complete only if any locale

support can be added without changing the application binary.

• Development platforms such as Win32® and Java® provide i18n APIs, but these internationalize only applications that run on the Windows and Java platforms.

• The OpenOffice.org I18n framework provides a rich set of i18n APIs to

internationalize OpenOffice.org applications using the Universal Network

Objects (UNO) component model.

• This i18n framework is platform-independent and can run on any platform

on which OpenOffice.org is supported.

• The new OpenOffice.org i18n framework is Unicode based and offers a

rich set of APIs and functionality.

• The i18n framework allows localization developers to add new locales or

enhance existing locale behavior to meet regional market requirements

without modifying the OpenOffice binary.
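The idea of adding a locale without touching the binary can be sketched as locale data kept apart from code. Everything below is hypothetical: the JSON layout, the field names, and the `format_integer` helper are illustrations, not the OpenOffice.org locale file format. Western three-digit grouping is assumed for both locales to keep the sketch short.

```python
import json

# Hypothetical locale records; in practice these would be external data files.
LOCALE_DATA_JSON = """
{
  "en_US": {"group": ",", "currency": "USD", "digits": "0123456789"},
  "bn_BD": {"group": ",", "currency": "BDT", "digits": "০১২৩৪৫৬৭৮৯"}
}
"""

def load_locales(raw):
    """Parse locale records from a JSON string."""
    return json.loads(raw)

def format_integer(value, locale_id, locales):
    """Format an integer with the locale's grouping separator and digits."""
    data = locales[locale_id]
    grouped = f"{value:,}".replace(",", data["group"])
    return grouped.translate(str.maketrans("0123456789", data["digits"]))

locales = load_locales(LOCALE_DATA_JSON)
print(format_integer(1234567, "en_US", locales))
print(format_integer(1234567, "bn_BD", locales))
```

Supporting another locale here means adding one more record to the data; `format_integer` itself does not change.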

I18N Framework

• The L10N framework of OpenOffice.org provides an easy-to-use environment for introducing new languages to the system and for supporting new localizations of OpenOffice.org.

• The L10N framework is based on the multi-platform build environment of OpenOffice.org, and it allows rebuilding only those targets that localization requires.

L10N Framework

A second method for native language support.

The l10n module offers several localization tools which

support extraction of strings and context information

out of source code. Also merging back localized strings

is supported.

1. Adding a New Language to the Office Suite

2. Extracting and Merging Strings and Messages

The framework is built over the component model UNO

thus making the addition of new localization

components easy.

L10N Tools for Translation

Localization of OpenOffice.org

Localization of OpenOffice.org involves:

• Assuring that OpenOffice.org can work with your script

in the platform in which you want to use it.

• Assuring that some changes are made in the

OpenOffice.org source, so that the program recognizes

your language as one of the languages it can work in.

• Translation of OpenOffice.org to your language.

Translation has different levels.

1. The first one is translating the menus and messages of

the program itself.

2. The second one includes also the translation of the help

pages of OpenOffice.org, not a small task.

3. The third level adds the development of documentation

for OpenOffice.org in your language.

Steps of translation

• Extracting all strings and messages out of the source

code for translation.

• Translating the extracted strings using well-known localization tools (localize).

• Merging back the translated strings into the code.

• Rebuilding the localization targets inside the build

environment.

• A localized installation set is then created automatically.
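The extract, translate, and merge steps above can be sketched on a toy resource format. The `KEY = "text"` syntax and both helper functions are hypothetical; the real OpenOffice.org tools work on their own resource file formats.

```python
import re

# Hypothetical resource line: KEY = "text"
ENTRY = re.compile(r'^(\w+)\s*=\s*"(.*)"$')

def extract_strings(resource_text):
    """Collect translatable strings, keyed by their resource identifier."""
    strings = {}
    for line in resource_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            strings[match.group(1)] = match.group(2)
    return strings

def merge_strings(resource_text, translations):
    """Write translated strings back; untranslated entries keep the original."""
    merged = []
    for line in resource_text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group(1) in translations:
            key = match.group(1)
            merged.append(f'{key} = "{translations[key]}"')
        else:
            merged.append(line)
    return "\n".join(merged)

source = 'MENU_FILE = "File"\nMENU_EDIT = "Edit"'
print(extract_strings(source))
print(merge_strings(source, {"MENU_FILE": "ফাইল"}))
```

Falling back to the original text for untranslated keys keeps a partially translated build usable.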

Status of Bangla Localization of OO

• The aim is to develop a Bangla version of the office suite.

• A Bangla locale file has been developed for OpenOffice.org.

• The bn_BD and bn_IN locale files are stable.

• Bangla is not in the supported languages list in the

Localization framework project of OpenOffice.org.

OpenOffice.org support for Bangla

Bangla Script Support

Bangla Rendering Support

Rendering Problem

• Forward and backward cursor movement is unsynchronized, which causes problems for replacement and deletion operations in Bangla text processing.

• There is no Bangla script support in dialogue boxes.
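A cursor that steps one code point at a time can land inside a Bangla conjunct or between a consonant and its vowel sign. Whether that is the cause of the problem reported here is not stated, so the sketch below only illustrates the general issue. The `bangla_clusters` helper is hypothetical and uses a simplified rule, not the full Unicode text segmentation algorithm: marks attach to the previous cluster, and a letter following a hasanta (U+09CD) joins the conjunct.

```python
import unicodedata

HASANTA = "\u09cd"  # BENGALI SIGN VIRAMA

def bangla_clusters(text):
    """Split text into simplified clusters a cursor should treat as one unit."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins_conjunct = (
            bool(clusters)
            and clusters[-1].endswith(HASANTA)
            and category == "Lo"
        )
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch   # stay inside the current cluster
        else:
            clusters.append(ch)  # start a new cursor stop
    return clusters

word = "বাংলা"
print(len(word), "code points,", len(bangla_clusters(word)), "cursor stops")
```

Moving, deleting, and replacing by cluster keeps forward and backward movement consistent, because both directions use the same boundaries.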

Lingucomponent Project

The Lingucomponent Project provides the writing aid

features: spell checking, hyphenation, and thesaurus.

One of the goals of the Lingucomponent project is to

develop dictionaries and affix files to support spell

checking in different languages.

OpenOffice.org does not provide a dictionary for Bangla.

MySpell supports only 8-bit encodings, so there is no Unicode support for Bangla script.
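Why an 8-bit encoding rules out a Bangla dictionary can be shown with a quick check. Latin-1 stands in here for any 8-bit character set; the `fits_in_8bit` helper is a hypothetical name.

```python
def fits_in_8bit(text, encoding="latin-1"):
    """Return True if every character of text exists in the 8-bit encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_8bit("spell"))   # an English dictionary word
print(fits_in_8bit("বানান"))    # a Bangla dictionary word
```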

Ispell’s dictionary and affix files are converted to conform to MySpell. The licensed dictionary files are then sent to the project authority, who includes the language's dictionary in the next release of OpenOffice.org.

Lingucomponent Project Spellchecker

Bangla Computing Integration into OpenOffice.org

OpenOffice.org provides flexibility for adding components

to it through a component-based API called UNO (Universal

Network Objects).

• UNO is the base component technology for

OpenOffice.org.

• It is used to write components that interact across

languages, component technologies, computer platforms,

and networks.

• Currently UNO is available on Linux, Solaris, and

Windows for Java, C++ and OpenOffice.org Basic.

UNO Features

• UNO is used to access OpenOffice.org using its Application Programming

Interface (API).

• The OpenOffice.org API is the comprehensive specification that describes the

programmable features of OpenOffice.org.

• It is possible to connect to a local or remote instance of OpenOffice.org from

C++, Java and COM/DCOM using UNO.

• C++ and Java desktop applications, Java servlets, JavaServer Pages, JScript and VBScript, and languages such as Delphi, Visual Basic, and many others can use OpenOffice.org to work with office documents.

• It is possible to develop UNO Components in C++ or Java that can be

instantiated by the office process and add new capabilities to OpenOffice.org.

For example, Chart Add-ins or Calc Add-ins, linguistic extensions, new file

filters, database drivers and even complete applications, such as a groupware

client.

Remote Connectivity to OpenOffice.org

[Figure: Java, C++, and VB client programs use the UNO OpenOffice API to reach the services of the OpenOffice.org server program: open documents, write components, add components.]

A Unicode file is loaded remotely in OpenOffice.org

This is the normalized form of the loaded file. One Bangla string is then searched for and replaced with another, and the replaced characters are highlighted.
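Normalizing before searching matters because the same Bangla text can be stored in more than one code point sequence. The sketch below assumes NFC (the slides do not say which normalization form was used) and uses Python's `unicodedata` rather than the OpenOffice.org API. The `normalize_and_replace` helper is hypothetical.

```python
import unicodedata

def normalize_and_replace(text, old, new, form="NFC"):
    """Normalize all three strings to one form, then do a plain replace."""
    text, old, new = (unicodedata.normalize(form, s) for s in (text, old, new))
    return text.replace(old, new)

# The vowel sign O can be stored as one code point (U+09CB)
# or as two (U+09C7 followed by U+09BE).
decomposed = "\u0995\u09c7\u09be"   # ka + sign E + sign AA
composed = "\u0995\u09cb"           # ka + sign O

print(composed in decomposed)                          # raw search misses it
print(normalize_and_replace(decomposed, composed, "X"))
```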

Simple Bangla OCR and OpenOffice Integration

OCR program with sample Bangla input

• The OCR client program accesses the OpenOffice.org server to use its services. This client-server approach is platform independent.

• After getting the scanned Bangla document the Bangla OCR

program will simply recognize the characters and send the Unicode

code points of the characters to the OpenOffice.org server with a

request to open a window to display the recognized characters where

a user can edit or modify the document.

• Here a simple Bangla OCR program is used as a sample to

demonstrate the approach that could be extensively used to develop

useful programs or utility components as client programs in a

distributed system by getting the service from OpenOffice.org.
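The client side of this approach might look like the sketch below. It assumes the PyUNO bridge that ships with OpenOffice.org and an office instance started with `-accept=socket,host=localhost,port=2002;urp;`. The original client's language is not stated, so Python is a substitution, and the host, port, and function names are assumptions.

```python
def code_points_to_text(code_points):
    """Turn the OCR output (a list of Unicode code points) into a string."""
    return "".join(chr(cp) for cp in code_points)

def show_in_writer(text, host="localhost", port=2002):
    """Open a new Writer document on a running office server and insert text."""
    import uno  # PyUNO; imported lazily so the helper above works without it

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    # Connect to the listening office process.
    ctx = resolver.resolve(
        f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext")
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    # Create an empty, editable text document and write the recognized text.
    doc = desktop.loadComponentFromURL("private:factory/swriter", "_blank", 0, ())
    doc.Text.insertString(doc.Text.createTextCursor(), text, False)
    return doc

# Hypothetical OCR result for the word "বাংলা".
recognized = [0x09AC, 0x09BE, 0x0982, 0x09B2, 0x09BE]
print(code_points_to_text(recognized))
```

Calling `show_in_writer(code_points_to_text(recognized))` with a server running would open the editable window described above.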

Editable output in an OpenOffice.org document

References

OpenOffice Homepage

http://www.openoffice.org

OpenOffice Localization and Internationalization Project

http://l10n.openoffice.org

I18n API

http://api.openoffice.org

UNO Home page

http://udk.openoffice.org

THANK YOU