mtech monica



Page 1: Mtech Monica

Handwritten Character Recognition in English

April 12, 2015

Faculty of Technology, Dharmsinh Desai University

Prepared By: Monica D. Patel (13MEPOS009)

Guided By: Prof. Shital P. Thakkar

1 / 38

Page 2: Mtech Monica

Outline

- Handwritten Character Recognition System
- Literature Survey for separate characters
- Handwritten Character Recognition of separate characters
- Handwritten Character Recognition from paragraph
- Literature Survey for cursive words
- Segmentation of lines and words from handwritten paragraph
- Character segmentation and recognition
- Observation of handwritten cursive words
- Rule based Character Segmentation
- Result analysis
- Conclusion
- Future Scope
- References

Page 3: Mtech Monica

Handwritten Character Recognition (HCR) System

Page 4: Mtech Monica

Literature Survey for separate characters

"Handwritten Character Recognition in English-A Survey", published in International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), Vol. 4, Issue 2, February 2015.

Page 5: Mtech Monica

Pre-processing

- Binarization

Figure: Binarization by Otsu's method

- Skew detection and correction
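The binarization step names Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. A minimal pure-Python sketch follows. It assumes 8-bit gray levels and dark ink on a light background, so pixels at or below the threshold become foreground (1). The function names `otsu_threshold` and `binarize` are illustrative and do not come from the slides.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    `pixels` is a flat list of integer gray levels in [0, levels).
    Class 0 holds pixels <= t, class 1 holds pixels > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of class 0 so far
    sum0 = 0    # sum of gray levels of class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, nothing to compare
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image):
    """Binarize a 2-D list of gray levels: 1 = ink (dark), 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]
```

Calling `binarize` on a scanned character image (as a list of rows) yields the 0/1 image that the later feature-extraction sketches take as input.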

Page 6: Mtech Monica

Database Creation

- Collected handwritten documents from different persons and scanned them to create a database of 1378 uppercase and lowercase character samples.
- Collected a database of 830 English numeral samples.
- Normalized each sample to 64×64 pixels and extracted features from it.
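The size normalization to 64×64 can be illustrated with a nearest-neighbour resize. The slides do not say which interpolation was used, so nearest-neighbour is an assumption here, and `resize_nearest` is a hypothetical helper name.

```python
def resize_nearest(img, out_rows=64, out_cols=64):
    """Resize a 2-D list to out_rows x out_cols by nearest-neighbour sampling.

    Each output pixel copies the input pixel whose index is the output
    index scaled by the size ratio (integer floor).
    """
    rows, cols = len(img), len(img[0])
    return [
        [img[r * rows // out_rows][c * cols // out_cols] for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

For example, `resize_nearest(char_img)` returns a 64×64 grid regardless of the size of the cropped character.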

Page 7: Mtech Monica

Feature Extraction

- Structural features: Structural features describe the geometrical and topological properties of a character, e.g. endpoints, crossing points, loops, aspect ratio, centroid, and the up, down, left and right projection profiles.

Figure: Endpoint, crosspoint and loop in a character

Figure: Up, down, left and right projection profiles of a character

- Global transformation: Global-transformation-based features represent the shape of the image by translating it from the spatial domain to the frequency domain, e.g. DFT, DCT, DWT.
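Two of the structural features listed above, the directional profiles and the centroid, can be sketched in a few lines. The slides do not define the profiles precisely. This sketch assumes the common definition in which each profile entry is the distance from one image edge to the first ink pixel along that row or column. The input is a 0/1 image with 1 for ink, and the function names are illustrative.

```python
def _first_ink(line):
    """Index of the first foreground pixel in `line`, or len(line) if none."""
    for k, v in enumerate(line):
        if v:
            return k
    return len(line)


def directional_profiles(img):
    """Return (up, down, left, right) profiles of a 0/1 image.

    up[c]    = distance from the top edge to the first ink pixel in column c
    down[c]  = same, measured from the bottom edge
    left[r]  = distance from the left edge to the first ink pixel in row r
    right[r] = same, measured from the right edge
    """
    columns = [list(col) for col in zip(*img)]
    up = [_first_ink(col) for col in columns]
    down = [_first_ink(col[::-1]) for col in columns]
    left = [_first_ink(row) for row in img]
    right = [_first_ink(row[::-1]) for row in img]
    return up, down, left, right


def centroid(img):
    """Mean (row, column) of the ink pixels; assumes at least one ink pixel."""
    ink = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    n = len(ink)
    return (sum(r for r, _ in ink) / n, sum(c for _, c in ink) / n)
```

Concatenating the four profiles with the centroid gives one possible structural feature vector for a normalized character image.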

Page 8: Mtech Monica

Discrete Wavelet Transform

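- The DWT represents an image jointly in the frequency and time (space) domains.

$$W_\varphi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0, m, n}(x, y)$$

$$W_\psi^{i}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{i}_{j, m, n}(x, y), \quad i \in \{H, V, D\}$$

for $j \ge j_0$.

Figure: Original image, DWT image and IDWT image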
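To make the approximation and detail coefficients concrete, here is a minimal sketch of one decomposition level in pure Python. The slides do not name the wavelet family, so the Haar wavelet is an assumption chosen for simplicity, and `haar_dwt2` is a hypothetical name. Each 2×2 block of the image produces one approximation value and one value for each of the H, V and D detail bands.

```python
def haar_dwt2(img):
    """One level of the 2-D Haar DWT of an image with even height and width.

    Returns (approx, horiz, vert, diag); each band is half the size of `img`.
    The factor 1/2 keeps the transform orthonormal (energy preserving).
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")

    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((p + q + s + t) / 2.0)  # local average (low-pass)
            h_row.append((p + q - s - t) / 2.0)  # top minus bottom rows
            v_row.append((p - q + s - t) / 2.0)  # left minus right columns
            d_row.append((p - q - s + t) / 2.0)  # diagonal difference
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag
```

Applying `haar_dwt2` again to the returned approximation band gives the next coarser level. A flattened approximation band is one plausible wavelet feature vector, though the slides do not state which coefficients were used.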

Page 9: Mtech Monica

Types of Classifier

- Unsupervised classifier: The model is not given the correct results during training. It is used to cluster the input data into classes on the basis of their statistical properties only, e.g. different types of clustering such as k-means.
- Supervised classifier: The classifier learns from training data that include both the inputs and the desired results. The training data are used as references for classifying new (test) data, e.g. SVM, HMM.

Page 10: Mtech Monica

Support Vector Machine (SVM) Classifier

- SVM finds the hyperplane that separates the data perfectly into its two classes. The hyperplane is parameterized by a vector (w) and a constant (b).
- To maximize the geometric distance (margin) between the hyperplane and the nearest data points, ||w|| should be minimized.

Figure: Linearly separable and not linearly separable data
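Written out, the statement that ||w|| should be minimized corresponds to the standard hard-margin formulation. The labels $y_i \in \{-1, +1\}$ and the training points $x_i$ are notation assumed here; they do not appear on the slide.

$$\min_{w,\, b} \ \tfrac{1}{2}\,\|w\|^{2} \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n$$

Under these constraints the distance between the two margin boundaries is $2 / \|w\|$, which is why a smaller $\|w\|$ gives a wider margin.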

Page 11: Mtech Monica

Kernel Functions

- Radial basis function (RBF) kernel:

$$K(x, y) = e^{-\gamma \|x - y\|^{2}}$$

Figure: Effect of gamma (γ) in the RBF kernel
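The effect of γ can be seen by evaluating the kernel directly. The short sketch below implements the formula as written above; the function name `rbf_kernel` and the sample vectors are illustrative.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# The same pair of points looks "similar" for a small gamma and
# almost unrelated for a large gamma.
similar = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=0.1)
distant = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=10.0)
```

A large γ makes the kernel fall off quickly with distance, so each training sample influences only its close neighbourhood. A small γ gives a smoother decision boundary.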

Page 12: Mtech Monica


Page 13: Mtech Monica

Fuzzy K-Nearest Neighbour Classifier

- In the KNN classifier, the labels of the K nearest neighbours are considered and a majority vote decides the label of the unknown instance.
- Fuzzy K-nearest neighbour assigns class memberships to the sample vector rather than assigning the vector to one particular class.
- The vector's memberships in all the classes must sum to one.

$$u_i(x) = \frac{\sum_{j=1}^{k} u_{ij}\, \|x - x_j\|^{-\frac{1}{m-1}}}{\sum_{j=1}^{k} \|x - x_j\|^{-\frac{1}{m-1}}}$$

Here $u_{ij}$ is the membership of the j-th neighbour in class i, and $m > 1$ controls how strongly the distance is weighted.
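The membership formula can be sketched as follows. Several choices here are assumptions and are not taken from the slides: training samples carry crisp labels (so $u_{ij}$ is 1 for the neighbour's own class and 0 otherwise), an exact match returns full membership in the matching sample's class, and the exponent follows the slide's form $-1/(m-1)$. Keller et al. [16] write the exponent as $2/(m-1)$ in the denominator, so the exponent is best treated as a tunable detail.

```python
import math


def fuzzy_knn(x, train, k=3, m=2.0):
    """Return {label: membership} for vector `x`.

    `train` is a list of (vector, label) pairs with crisp labels.
    Memberships are distance-weighted votes of the k nearest neighbours
    and sum to one.
    """
    labels = sorted({label for _, label in train})
    neighbours = sorted(
        ((math.dist(x, v), label) for v, label in train),
        key=lambda pair: pair[0],
    )[:k]

    # Assumption: a zero distance means x coincides with a training sample.
    for distance, label in neighbours:
        if distance == 0.0:
            return {c: (1.0 if c == label else 0.0) for c in labels}

    exponent = -1.0 / (m - 1.0)  # exponent as it appears on the slide
    weights = [(distance ** exponent, label) for distance, label in neighbours]
    total = sum(w for w, _ in weights)
    return {c: sum(w for w, lab in weights if lab == c) / total for c in labels}


memberships = fuzzy_knn(
    [1.0, 0.0],
    [([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([4.0, 0.0], "b")],
)
```

The class with the largest membership can be taken as the predicted label, while the remaining memberships indicate how ambiguous the sample is.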

Page 14: Mtech Monica


Page 15: Mtech Monica

Handwritten Character Recognition from paragraph


Page 16: Mtech Monica

Methods of cursive handwritten word recognition

- Holistic Approach: Recognizes entire words without splitting them into single characters.
- Segmentation based Approach: Character segmentation strictly precedes character classification and hence word recognition.
- Recognition-Based Segmentation Approach: The character segmentation and character classification steps are not totally separate.
- Mixed Approach: Systems that belong to this group contain elements from the above-mentioned groups.

Page 17: Mtech Monica

Literature Survey of Cursive words


Page 18: Mtech Monica

Problem in Line segmentation

- Line segmentation using the horizontal projection profile.
- Line segmentation by connected component labeling.
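Line segmentation by horizontal projection profile can be sketched as below. The sketch assumes a 0/1 image with 1 for ink and treats every run of rows that contain ink as one text line. It is a simplification: touching or skewed lines leave no empty row between them, which is the kind of problem this slide refers to. The name `segment_lines` is illustrative.

```python
def segment_lines(img):
    """Return (first_row, last_row) pairs for each run of rows containing ink.

    The horizontal projection profile is the ink count of every row;
    rows whose count is zero are treated as gaps between text lines.
    """
    profile = [sum(row) for row in img]
    lines, start = [], None
    for r, count in enumerate(profile):
        if count > 0 and start is None:
            start = r                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, r - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # the last line reaches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```

Running the same function on the transpose of a line image (column sums instead of row sums) gives a rough word or character gap detector.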

Page 19: Mtech Monica

Collected Database

- Collected a database of a finite set of words (city names) written by 13 different persons.
- Collected a database of handwritten paragraphs from 40 different persons.

Page 20: Mtech Monica

Segmentation

- Line Segmentation
- Word Segmentation

Page 21: Mtech Monica

Slant Correction

- The word is thinned.
- Near-horizontal strokes in the thinned word image are removed.
- The remaining parts are covered by a bounding box.
- The average slant is estimated by measuring the slant of the individual fragments inside the considered box.
- Using this slant angle, a shear (vertical) affine transform is performed.

$$\begin{bmatrix} x & y & 1 \end{bmatrix} = \begin{bmatrix} v & w & 1 \end{bmatrix} T = \begin{bmatrix} v & w & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ s & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
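The shear transform above maps a point (v, w) to x = v + s·w and y = w. The sketch below applies it to individual points. The mapping from slant angle to shear factor, s = −tan(θ), is an assumption: it holds when w is measured upward from the baseline and the stroke leans toward +v by θ degrees. The function names are illustrative.

```python
import math


def shear_matrix(s):
    """Shear (vertical) matrix T for row vectors, as in [x y 1] = [v w 1] T."""
    return [[1.0, 0.0, 0.0],
            [s,   1.0, 0.0],
            [0.0, 0.0, 1.0]]


def transform(v, w, T):
    """Multiply the row vector [v, w, 1] by T and return (x, y)."""
    row = (v, w, 1.0)
    x, y, _ = (sum(row[k] * T[k][j] for k in range(3)) for j in range(3))
    return x, y


def deslant(points, slant_deg):
    """Remove a slant of `slant_deg` degrees from a list of (v, w) points.

    Assumes w grows upward from the baseline and the writing leans toward +v.
    """
    T = shear_matrix(-math.tan(math.radians(slant_deg)))
    return [transform(v, w, T) for v, w in points]
```

After the transform, a stroke that leaned by the estimated angle becomes vertical, while points on the baseline (w = 0) stay where they are.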

Page 22: Mtech Monica

Character Segmentation

Problems in character segmentation

- Words are written in different manners: cursive, separate and mixed.
- Broken and incomplete characters.
- Overlapping characters make separation difficult.
- Some characters are joined with each other.
- Handwriting intensity is irregular because of differences in applied pressure while writing, so some words are broken after thresholding.

Page 23: Mtech Monica

Methods used in Literature for Character segmentation from word

- Removal of over-segmented points by using a threshold [19].

Page 24: Mtech Monica

- Segmentation point validation using Artificial Neural Network [17].

Page 25: Mtech Monica

- Handwritten Text Segmentation using Average Longest Path Algorithm [18].

Page 26: Mtech Monica

Probable Solution

- Character Segmentation using Stroke width and Aspect ratio

Page 27: Mtech Monica

- Segmentation of joined characters
  - Self-organizing feature maps (SOM) are implemented to identify the touching portion of the characters.
  - Pixels of the image are mapped into a coordinate system as feature vectors, and they are clustered into three classes: left, right and middle.
  - Vertical segmentation is performed at the winner node of the middle region, which is found by using SOM.

Page 28: Mtech Monica

Results of Self Organizing Feature Maps for joined characters

Page 29: Mtech Monica

- Character Segmentation using Neural Network

The following features of segmentation points are extracted:

  - The position where the segment boundary intersects the character, with respect to the main height.
  - The value of the vertical histogram at the starting boundary of the segment.
  - The width of the current segment as a proportion of the main height.
  - The position of the two longest horizontal strokes (if they exist) in the segment, with respect to the main height.
  - The position of the two longest vertical strokes (if they exist) in the segment, with respect to the segment width.

Page 30: Mtech Monica

Observation of Handwritten cursive words [10]

- Both ascenders and descenders of a word complicate the tracing of possible segment boundaries, so segmentation has to focus on the main body of the word.
- Characters without ascenders and descenders, such as a, e, c, etc., have almost the same height and width.
- The width of characters like m and w is 3/2 × height.
- The width of characters like l, j and i is 1/2 × height.
- The distance between segmentation points should be at least 1/3 of the height.
- Segment boundaries that intersect a character twice are excluded, since they are extremely likely to be false.
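Two of the observations above translate directly into code: the minimum spacing of 1/3 of the main-body height between segmentation points, and the expected character widths. The sketch below is only an illustration of those two rules, not the rule set of the thesis. The function names, the choice to keep the leftmost point of a close pair, and the default width for characters the slide does not mention are all assumptions.

```python
def filter_segmentation_points(candidates, main_height):
    """Drop candidate columns closer than main_height / 3 to the last kept one.

    Assumption: of two points that are too close, the leftmost is kept.
    """
    min_gap = main_height / 3.0
    kept = []
    for x in sorted(candidates):
        if not kept or x - kept[-1] >= min_gap:
            kept.append(x)
    return kept


WIDE = set("mw")     # width about 3/2 of the main height
NARROW = set("lji")  # width about 1/2 of the main height


def expected_width(ch, main_height):
    """Expected width of character `ch` according to the observations above.

    Assumption: any character not listed as wide or narrow is treated
    like a, e, c (width roughly equal to the main height).
    """
    if ch in WIDE:
        return 1.5 * main_height
    if ch in NARROW:
        return 0.5 * main_height
    return float(main_height)
```

For a word whose main body is 24 pixels high, `filter_segmentation_points([10, 12, 20, 21, 35], 24)` keeps only columns that are at least 8 pixels apart.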

Page 31: Mtech Monica

Rule based Segmentation


Page 32: Mtech Monica

Results of Words from Paragraph

- A database of paragraphs from 40 different persons, which includes different words, is used for testing.
- A total of 3588 characters, including samples of both uppercase and lowercase characters, are used for training the SVM.

Figure: Results of words from paragraph

Page 33: Mtech Monica

Results of Finite Words

Words of city names written by ten different persons are manually segmented and used as training data. The same city names written by three different persons are used as test data.

Figure: Results of finite words

Page 34: Mtech Monica

Conclusion

- Wavelet features capture the distinct characteristics of a character more efficiently than structural features and thus provide higher accuracy, so they are used for feature extraction.
- SVM and fuzzy KNN classifiers are implemented for both wavelet and structural features. SVM gives higher accuracy than fuzzy KNN: about 92.307% for uppercase characters, 81.5% for lowercase characters and 96.3% for numerals.
- Lower accuracy for lowercase characters is observed due to the different writing styles of different persons.
- Different methods were tested to segment characters from words. SOM is useful for segmenting joined characters, and a learning method, Neural Network based segmentation, was also tested to classify segmentation points as correct or incorrect, but these methods did not give satisfactory results.
- We used a rule based method in which characters are pre-segmented by using stroke width information and the segmentation is then improved by rules. We got a segmentation accuracy of 80% for finite words and 69.53% for different words. Segmentation accuracy is reduced by overlapping and open characters. We got a recognition accuracy of 53% for finite words and 42% for different words. Recognition accuracy is reduced by misclassification of characters.

Page 35: Mtech Monica

Future Scope

- To improve the segmentation accuracy of words by methods that prevent the segmentation of open characters like 'u', 'v', 'w', 'r' and 'b' and that are able to segment overlapping characters.
- To improve recognition accuracy by increasing the amount of training data so that it includes as many variations in characters as possible.

Page 36: Mtech Monica

References

1. J. Pradeep, E. Srinivasan and S. Himavathi, "Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten", International Journal of Engineering (IJE) Transactions B: Applications, Vol. 25, No. 2, May 2012, 99-106.
2. D. K. Patel, T. Som and M. K. Singh, "Improving the Recognition of Handwritten Characters using Neural Network through Multiresolution Technique And Euclidean Distance Metric", International Journal of Computer Applications (0975-8887), Volume 45, No. 6, May 2012.
3. M. Blumenstein, B. Verma and H. Basli, "A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters", Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR'03), 0-7695-1960-1/03, IEEE, 2003.
4. Sumedha B. Hallale and Geeta D. Salunke, "Twelve Directional Feature Extraction for Handwritten English Character Recognition", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume 2, Issue 2, May 2013.
5. Amit Choudhary, Rahul Rishi and Savita Ahlawat, "Off-Line Handwritten Character Recognition using Features Extracted from Binarization Technique", American Applied Science Research Institute, doi:10.1016/j.aasri.2013.10.045.
6. Rafael M. O. Cruz, George D. C. Cavalcanti and Tsang Ing Ren, "An Ensemble Classifier For Offline Cursive Character Recognition Using Multiple Feature Extraction Techniques", Center of Informatics, Federal University of Pernambuco, Recife, Brazil, 978-1-4244-8126-2/10, IEEE, 2010.
7. Sepideh Barekat Rezaei, Abdolhossein Sarrafzadeh and Jamshid Shanbehzadeh, "Skew Detection of Scanned Document Images", Proceedings of the International MultiConference of Engineers and Computer Scientists 2013, Vol. I, IMECS 2013, March 13-15, 2013, Hong Kong.
8. Radmilo M. Bozinovic and Sargur N. Srihari, "Off-line Cursive Script Word Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 1, January 1989.
9. H. Bunke, M. Roth and E. G. Schukat-Talamazzini, "Offline Cursive Handwriting Recognition Using Hidden Markov Models", Pattern Recognition, Vol. 28, No. 9, 1995, Elsevier Science Ltd.
10. E. Kavallieratou, N. Fakotakis and G. Kokkinakis, "An unconstrained handwriting recognition system", International Journal on Document Analysis and Recognition, Springer, 2002.

Page 37: Mtech Monica

11. Nafiz Arica, "An Off-line Character Recognition System for Free Style Handwriting", thesis submitted to the Graduate School of Natural and Applied Sciences of the Middle East Technical University, 1998.
12. Yong Haw Tay, Pierre-Michel Lallican, Marzuki Khalid, Christian Viard-Gaudin and Stefan Knerr, "An Offline Cursive Handwritten Word Recognition System", IEEE Catalogue No. 01 CH37239, 2001.
13. Anshul Gupta, Manisha Srivastava and Chitralekha Mahanta, "Offline Handwritten Character Recognition", International Conference on Computer Applications and Industrial Electronics (ICCAIE), 2011.
14. Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Bell Laboratories, Lucent Technologies, Data Mining and Knowledge Discovery, 2, 121-167 (1998), Kluwer Academic Publishers, Boston.
15. Asa Ben-Hur and Jason Weston, "A User's Guide to Support Vector Machines", Department of Computer Science, Colorado State University, and NEC Labs America, Princeton, NJ 08540, USA.
16. James M. Keller, Michael R. Gray and James A. Givens, "A Fuzzy K-Nearest Neighbour Algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-15, No. 4, July/August 1985.
17. D. Kavitha and P. Shamini, "Handwritten Document into Digitized Text Using Segmentation Algorithm", 4th National Conference on Advanced Computing, Applications & Technologies, May 2014.
18. Dhaval Salvi, Jun Zhou, Jarrell Waggoner and Song Wang, "Handwritten Text Segmentation using Average Longest Path Algorithm", 978-1-4673-5052-5/12, IEEE, 2012.
19. Amit Choudhary, Rahul Rishi and Savita Ahlawat, "A New Character Segmentation Approach for Off-Line Cursive Handwritten Words", published by Elsevier B.V., 2013 International Conference on Information Technology and Quantitative Management.
20. Fajri Kurniawan, Mohd Shafry Mohd Rahim, Daut Daman, Amjad Rehman, Dzulkifli Mohamad and Siti Mariyam Shamsuddin, "Region Based Touched Character Segmentation in Handwritten Words", International Journal of Innovative Computing, Information and Control, Volume 7, Number 6, June 2011.

Page 38: Mtech Monica

Thank You
