"computing support for pakistani languages, challenges & practices" by dr. sarmad...

Post on 20-Oct-2014


DESCRIPTION

This presentation was shared at the PAS Digital Marketing Conference "Dig-It 2.0".
Session name: Urdu Internet - Leveraging Technologies
Presentation: Computing Support for Pakistani Languages, Challenges & Practices
Speaker: Dr. Sarmad Hussain, Professor and Head, Center for Language Engineering, University of Engineering and Technology, Pakistan

TRANSCRIPT

www.cle.org.pk 1

Computing Support for Pakistani Languages – Challenges and Practice

Sarmad Hussain
Center for Language Engineering
Al-Khawarizmi Institute of Computer Science
University of Engineering and Technology
Lahore

sarmad@cantab.net

Unlocking Information for Human Development

www.CLE.org.pk

Need

ICTs promise significant socio-economic impact

Impact depends on the size of the population that can use ICTs

• 180 million citizens need access
• 66+ languages
• 10% understand English
• 58% literate
• 11% have access to computers
• 70% have access to mobile phones
• ITU IDI: Pakistan ranked 127 of 155 nations

Human Language Technology is necessary to bridge the gap


Languages of Pakistan

Percent Population of Pakistan by Mother Tongue

         Urdu    Punjabi   Sindhi   Pushto   Balochi   Saraiki   Others (60)
Total     7.57   44.15     14.10    15.42    3.57      10.53     4.66
Rural     1.48   42.51     16.46    18.06    3.99      12.97     4.53
Urban    20.22   47.56      9.20     9.94    2.69       5.46     4.93
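The mother-tongue figures can be checked for internal consistency with a few lines of Python. This is only a sketch over the numbers transcribed from the slide; the names `SHARES`, `row_sums` and `urban_rural_gap` are illustrative and not part of the presentation.

```python
# Percent of population by mother tongue, transcribed from the slide.
LANGUAGES = ["Urdu", "Punjabi", "Sindhi", "Pushto", "Balochi", "Saraiki", "Others"]
SHARES = {
    "Total": [7.57, 44.15, 14.10, 15.42, 3.57, 10.53, 4.66],
    "Rural": [1.48, 42.51, 16.46, 18.06, 3.99, 12.97, 4.53],
    "Urban": [20.22, 47.56, 9.20, 9.94, 2.69, 5.46, 4.93],
}


def row_sums(shares):
    """Return the sum of each row, rounded to two decimals."""
    return {row: round(sum(values), 2) for row, values in shares.items()}


def urban_rural_gap(shares):
    """Urban share minus rural share per language, in percentage points."""
    return {
        lang: round(urban - rural, 2)
        for lang, urban, rural in zip(LANGUAGES, shares["Urban"], shares["Rural"])
    }


if __name__ == "__main__":
    print(row_sums(SHARES))         # each row should add up to 100
    print(urban_rural_gap(SHARES))  # positive values: more common in cities
```

Each row adds up to 100, and the gap makes the urban concentration of Urdu speakers explicit: 20.22% in urban areas against 1.48% in rural areas.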


Economic

Socio-cultural


Languages of Pakistan in Danger (UNESCO)

• Vulnerable
• Definitely endangered
• Severely endangered


How?

USE

Human Language Technology

Linguistic Research

Standards

Applications

Materials

Training

Relevant Content Access

Relevant Content Generation

Adoption


Human Language Technology – Bridging Barriers

• Interfacing
• Assisting
• Enabling
• Empowering


و سخرالشمس والقمر

Interfacing

Language
– Character Set
  • Input Methods
  • Writing
  • Collation
– Terminology Translation

Technology
– Applications
  • Fonts
  • Keyboards, Keypads and Other Input Methods
  • Collation Methods
  • Localized Platform

Standards
– National
– International
  • ISO 639
  • ISO 3166
  • ISO 10646/Unicode
– Platforms: Computers and Phones
  • Linux/Unix and Symbian
  • Microsoft Windows and Phone
  • iOS – iPad, iPhone, MacBook, …
  • Google – Gmail, Docs, …, Android
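Why collation needs its own methods on top of ISO 10646/Unicode can be illustrated with a small Python sketch. In Unicode, the Urdu letter PEH (U+067E) has a higher code point than TEH (U+062A), although PEH comes before TEH in the Urdu alphabet, so a plain code-point sort gives the wrong order. The `URDU_ORDER` list and the `urdu_key` function below are hypothetical and cover only the first six letters; they are not a full collation table and not the method used by the presenters.

```python
import unicodedata

# First few letters of the Urdu alphabet in their conventional order:
# ALEF, BEH, PEH, TEH, TTEH, THEH (illustrative subset only).
URDU_ORDER = ["\u0627", "\u0628", "\u067E", "\u062A", "\u0679", "\u062B"]
RANK = {ch: i for i, ch in enumerate(URDU_ORDER)}


def urdu_key(word):
    """Sort key: alphabet rank of each letter; unknown letters sort last."""
    return [RANK.get(ch, len(URDU_ORDER)) for ch in word]


letters = ["\u062A", "\u067E", "\u0628"]  # TEH, PEH, BEH

by_code_point = sorted(letters)              # default: Unicode code-point order
by_alphabet = sorted(letters, key=urdu_key)  # order of the Urdu alphabet

if __name__ == "__main__":
    for label, result in (("code point", by_code_point), ("alphabet", by_alphabet)):
        names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in result]
        print(label, names)
```

The code-point sort places PEH last, while the alphabet-based key places it between BEH and TEH. A production system would rely on a complete, standardized collation table rather than a hand-written list like this one.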

Software Localization

SeaMonkey Navigator

OpenOffice.org Writer


Terminology and Content


Assisting

• Text
  – Assistive input/auto-complete methods
  – Thesaurus, Spelling and Grammar Checking
  – Machine Translation, Language Identification, Text Summarization, …
• Speech
  – Speech Recognition
  – Text to Speech
  – Emotion Detection, …
• Image
  – Optical Character Recognition – www.UrduOCR.net
  – Handwriting Recognition


Enabling

• Hybrid
  – Online Content Sharing Tools – CMS, Social Networks
  – Screen Readers
  – Book Readers
  – Text based Search Engines
  – Dialogue Systems
  – Speech to Speech Translation
  – Multi-modal Search Engines


Dialogue System


Empowering

• ICT for ICT - Focused on infrastructure
• ICT for Development - Focused on content and applications
• ICT for Human Development - Focused on participatory process


LANGUAGE AND ICT TRAINING

[Chart: percent of teachers preferring Urdu vs. English for Software and for Training Material, before and after training. Vertical axis: Percent Teachers, 0 to 100. Series: Preference for Urdu, Preference for English.]


LANGUAGE AND ICT TRAINING

Icons

Icon Identification by Students

             Urdu          English       English Transliterated   Didn't        Sub-Total
                                         into Urdu                Recognize
             F     M       F     M       F     M                  F     M
Sub-Total    691   656     132   198     150   183                49    40      2099
Total        1347          330           333                      89            2099
Share        64%           16%           16%                      4%
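The shares shown on the slide follow directly from the counts. The sketch below recomputes them in Python from the female and male sub-totals; the names `ICON_COUNTS`, `category_totals` and `percent_shares` are illustrative only.

```python
# (female, male) student counts per icon label type, transcribed from the slide.
ICON_COUNTS = {
    "Urdu": (691, 656),
    "English": (132, 198),
    "English transliterated into Urdu": (150, 183),
    "Didn't recognize": (49, 40),
}


def category_totals(counts):
    """Add female and male counts for each category."""
    return {label: female + male for label, (female, male) in counts.items()}


def percent_shares(totals):
    """Share of each category in the grand total, rounded to whole percent."""
    grand_total = sum(totals.values())
    return {label: round(100 * n / grand_total) for label, n in totals.items()}


totals = category_totals(ICON_COUNTS)
shares = percent_shares(totals)

if __name__ == "__main__":
    print(sum(totals.values()))  # grand total of responses
    print(shares)
```

The recomputed totals (1347, 330, 333, 89; 2099 overall) and the rounded shares (64%, 16%, 16%, 4%) agree with the figures on the slide.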


ACCESSING INFO ONLINE

Students    Language Used        Total
            Urdu     English
Female      44       2           46
Male        45       2           47
Total       89       4           93

Participant    English    Urdu
Students       0          138
Teachers       5          13
Total          5          151

Preferred Language for Setting a Homepage

Language Preference for Searching on the Internet


LANGUAGE IN ONLINE COMMUNICATION

Urdu       89%
English     9%
Punjabi     1%
Others      2%

1467 emails and 363 chats


LANGUAGE FOR CONTENT DEVELOPMENT

Website Competition Category                              Language of Website
                                                          Urdu    English    Total
School Website (by 10 School Teacher Teams)               9       1          10
Local Village Website (by 10 School Student Teams) [1]    8       0          8
Open Category (Individual Students)                       38      0          38
Total                                                     55      1          56

[1] One school did not participate, and one school website was disqualified as the team took significant external assistance.


CONTENT

Development Process of Human Language Technology

• Core Linguistic Analysis and Definition
• Detailed Linguistic Analysis
• Development of Localization Utilities
• Linguistic Data Collection
• Annotation of Linguistic Data
• Localization of Existing Applications
• Development of Linguistic Utilities
• Extension of Localization Applications
• Development of Advanced HLT Application
• Publishing Language Computing Standards
• Publishing Data Annotations Schema
• Publishing Annotated Linguistic Resources
• Select Language

Status of Human Language Technology

[Figure series, one diagram per language: URDU, SINDHI, PUSHTO, PUNJABI, BALOCHI, SARAIKI, OTHERS. Each diagram repeats the stages of the Development Process of Human Language Technology, shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
