1 probabilistic artificial neural network for recognizing the arabic hand written characters khalaf...

29
1 Probabilistic Artificial Neural Network For Probabilistic Artificial Neural Network For Recognizing the Recognizing the Arabic Hand Written Characters Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary ,and Basem Al- Rifai Journal of Computer Science-2006 Rahma A. Al- Zahrani Presented by King Saud University King Saud University College of Computer and Information Science College of Computer and Information Science Computer Science Department Computer Science Department CSC 563 Neural Network CSC 563 Neural Network Nailah S. Al- Hassoun Prof. Mohamed Batouche Supervised by

Upload: jasper-owen

Post on 15-Jan-2016

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

1

Probabilistic Artificial Neural Network For Recognizing theProbabilistic Artificial Neural Network For Recognizing theArabic Hand Written CharactersArabic Hand Written Characters

Khalaf khatatneh, Ibrahiem El Emary ,and Basem Al- RifaiJournal of Computer Science-2006

Rahma A. Al-Zahrani

Presented by

King Saud UniversityKing Saud UniversityCollege of Computer and Information ScienceCollege of Computer and Information Science

Computer Science DepartmentComputer Science DepartmentCSC 563 Neural NetworkCSC 563 Neural Network

Nailah S. Al-Hassoun

Prof. Mohamed Batouche

Supervised by

Page 2: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

2

ObjectivesObjectives

• Overview the various techniques used in recognizing the hand written Arabic characters.

• Focus on description of Optical Character Recognition (OCR)

• Present new technique which assists in developing a recognition

system for handling the Arabic Hand Written text (AHOCR).

Page 3: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

3

OutlineOutline

Introduction. DefinitionDefinition HistoryHistory

(OCR): Optical Character Recognition system. Proposed method (AHOCR) “Arabic Hand Written Optical Character

Recognition”. Moment InvariantsMoment Invariants ProbabilisticProbabilistic NeuralNeural NetworkNetwork

Experimental Results. Conclusion. References.

Page 4: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

4

IntroductionIntroduction

• Machine simulation of human reading has been the subject of intensive research.

• A large number of researches are concerned with Latin, chiness

and Japanese characters.

• A little work has been conduct on the automatic of Arabic characters because of its complexity of text.

• We will present the state of Arabic character recognition research .Then, we will present the proposed method .

Page 5: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

5

DefinitionDefinition

Recognized Characters Categories:Recognized Characters Categories:

The task of recognized characters can be separated into two categories:

1. The recognition of machine printed data.

Uniform (size,and position for any given font). Simpler.

2. The recognition of hand written data. Non-uniform( different styles and sizes by different writers and by the

same writers). Difficult.

Page 6: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

6

There are two kinds of input for Character Recognition : • Off-line character recognition:

1. Takes a raster image from a scanner, digital camera or other digital input source.

2. Off-line processing happens after the writing of characters is complete and the scanned image is preprocessed.

3. Its knowledge is limited to whether a given pixel is on (1) or off (0).

• On-line character recognition:1. Takes (x,y) coordinate pairs from an electronic pen touching a

pressure-sensitive digital tablet. 2. On-line processing happens in real-time while the writing is

taking place. 3. Relationships between pixels and strokes are supplied due to the

implicit sequencing of on-line systems that can assist in the recognition task.

Example: off-line and on-line

handwriting inputs

off-line on-line

Definition contDefinition cont . .

Page 7: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

7

HistoryHistory

(1990) Hierarchical rule based approach: In this approach the author (Sheik and Al Taweel )assumed :

• A reliable segmentation stage which divide letters into the four groups of position:

1. Initial,2. media,3. final,4. Isolated.

• The recognition system depends on a hierarchical division by the number of strokes. (One stroke letters were classified separately from two stroke letters …etc)

Page 8: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

8

History ContHistory Cont..

(1990) Segmented structural analysis approach: In this approach the author (Al -Emami and Usher) presented :

• An on-line system to recognize handwritten Arabic words.• Use a structural analysis method for selecting features of Arabic

characters.• The classifications use a decision tree.• Words are segmented into primitives that are usually smaller than

characters. • The system is fed by specifications of the primitives of each character.• The system was trained on 10 writers. They use one tester

who had a recognition rate of 86%.

Page 9: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

9

History Cont.History Cont.

(1995) Template matching and dynamic programming approach:

In this approach the author ( Alimi and Ghorbel) showed :

• how to minimize error in an off-line recognition system for isolated Arabic characters using template matching and dynamic programming.

• The reference bank of prototypes was prepared.• When new data was presented to the system, the distance

between the prototype and the new data string was minimized using dynamic programming.

• The number of prototypes was varied to see the effect on recognition rates. More prototypes give better accruing.

Page 10: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

10

History ContHistory Cont..

(1998) Structural and fuzzy approach:In this approach the author (Amin and Bouslama) presented :

• A hybrid system that combine structural and fuzzy techniques.

– Structural analysis : separated between various letter classes to be recognized.

– Fuzzy logic : allowed for variability in people hand writing within the same class.

Page 11: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

11

History Cont.History Cont.

(2005) Artificial neural networks classifiers: In this approach the author ( Haraty and El-Zabadani) present:

• A system for recognition of handwritten Arabic text using neural networks.

• Their work builds upon previous work that dealt with the vertical segmentation of the written text.

• In fact, They faced some problems like overlapping characters that share the same vertical space.

• They tried to fix that problem by performing horizontal segmented.• The system was tested and the rate of recognition obtained was

90%.

Page 12: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

12

Optical Character Recognition Optical Character Recognition system (OCR)system (OCR)

• Optical Character Recognition (OCR) :It is machine reads (machine printed /hand written) characters and tries to determine which character from a fixed set of the (machine printed /hand written) characters is intended to represent.

• The goal : is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters.

• The process of OCR involves several steps including:

1. segmentation, 2. feature extraction,3. classification.

Page 13: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

13

OCR-Pre-processingOCR-Pre-processing

• Pre-processing receives a first binary image of characters.

• It is primarily used to reduce variations of characters.

• The pre-processing steps often performed in OCR are:

– Scanning, – Binarization, – Noise Removal, – Segmentation,– Normalization.

Pre-processing

Optical Character Recognition System

Page 14: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

14

Feature ExtractorFeature Extractor

• Feature extraction abstracts high level information about individual patterns to facilitate recognition.

Page 15: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

15

RecognizerRecognizer

• There are three categories of character Recognizers (classifiers):

– Neural network approach,

– Statistical approach,

– Structural approach.

Page 16: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

16

Classification ProcessClassification Process

For the classification process, there are two steps in building a classifier:

1. Training.

2. Testing.

1 2

Page 17: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

17

Post-processorPost-processor

• The post-Processor is designed to improve the accuracy of the recognition process.

• The post-processor use various techniques to distinguish one character from another.

• Example, the geometry of an image can be used to distinguish characters represented by the image.

Geometric features include:– loops such that appear in the

handwritten letters ،ق،ف ه و، . – Straight lines which appear in such

handwritten letters as أ.

Page 18: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

18

Proposed Method (AHOCR)Proposed Method (AHOCR)

• Arabic Hand Written Optical Character Recognition (AHOCR)

• It aims to convert images document to text.

– moving the document from a storage location to a digital scanner.

– the output is formatted image with bmp, 'jpg' ,or 'jpeg’

– applies the image to a recognition system according to the invention.

Page 19: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

19

Input Image document

Binarization

Noise removal

Normalization

Segmentation

Thinning

Size normalization

Slant correction

Feature extraction ( moment invariant)

Recognizer (probabilistic neural network)

Text document

Preprocess

Character feature extraction

Page 20: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

20

TheThe main components of AHOCRmain components of AHOCR

• Preprocess:

– This stage is structured from various phase given by: Binarization, noise removal, segmentation and normalization.

• Character feature extraction operation of AHOCR:

– Geometric moment invariant , and probabilistic neural network classifier of AHOCR

Page 21: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

21

Scanned handwritten document on scanner with (300dpi) to create image document (as bmp or jpg file

format)

Binarization: convert coloring image to black andwhite (binary image) image

Noise removal by using median filtering and wienerfilter

Preprocess image document to segment characters tomake each character on the image document as a

single bmp image file

Thinning

Size normalization & slant correction

Calculate moment invariant (with sevenmoments invariant) for each segmented image

(Handwritten Arabic characters) and store it inmoment text file

Page 22: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

22

Moment InvariantsMoment Invariants

• Moment invariants are properties of connected regions in binary images that are invariant to translation, rotation and scale.

• They are useful because they define a simply calculated set of region properties that can be used for shape classification and part recognition.

Page 23: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

23

ProbabilisticProbabilistic NeuralNeural NetworkNetwork

• A PNN Probabilistic Neural Network has three layers of nodes :

– The input layer contains N nodes, one for each of the N input features of a feature vector.

– The hidden layer contains a node for each training vector. The hidden nodes are collected into groups: one group for each of the K classes.

– The output layer has a node for each class that is recognized by the PNN.

Page 24: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

24

ProbabilisticProbabilistic NeuralNeural NetworkNetwork

Page 25: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

25

Experimental ResultsExperimental Results

• 25 independent writers document• Documents were then processed. The experiments were done on

3 disjoint data sets given by:• 1. Training (37800)= 20 volunteers x 5 iterations x 378

characters• 2. Validation (3780)= 10 volunteers x 378 characters• 3. Test (764) (5 volunteers with different number of characters

in each document).

Page 26: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

26

• The baseline for recognition accuracy was defined as the average accuracy of the validation and test set of the best PNN architecture

• Probabilistic neural network (PNN) has 7 input nodes in its input layer and 142 nodes in its output layer architecture

ExperimentalExperimental ResultsResults

Page 27: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

27

Experimental ResultsExperimental Results

• After running our algorithm with a learning rate of η=0.9Train recognition

accuracy rateValidation recognition

accuracy rateTest recognition

accuracy rate Average accuracy rate

99% 98%97%97%

Page 28: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

28

ConclusionConclusion

•Arabic handwritten recognition is a difficult problem but the AHOCR system will be a step towards a neural network approach to robustly solve it.

• AHOCR is optical handwritten Arabic character recognition (OCR) software capable of producing a fully editable electronic document with current accuracy of 97% for isolated Arabic handwritten character recognition and 96% for Arabic handwritten document recognition.

Page 29: 1 Probabilistic Artificial Neural Network For Recognizing the Arabic Hand Written Characters Khalaf khatatneh, Ibrahiem El Emary,and Basem Al- Rifai Journal

29

ReferencesReferences

• Klassen, T., “Towards Neural Network Recognition Of Handwritten Arabic Letters. ”Dalhousie University (2001). (Link)

• Abuhaiba, I.S. “A Discrete Arabic Script for Better Automatic Document Understanding,” The Arabian J. Science and Eng., vol. 28, pp. 77-94, (2003). (Link)

• https://www.dacs.dtic.mil/techs/neural/neural7.php

• www.homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FISHER/mominv.htm

• www.scipub.org/fulltext/ajas/ajas.html