Coursework for ISSALE - 2014 Project Demonstration



Post on 06-Jan-2016



Page 1: Coursework for ISSALE - 2014          Project Demonstration

Coursework for ISSALE - 2014 Project Demonstration

SINHALA LANGUAGE OCR

● Kasun Perera
● Chamila Liyanage
● Tharaka Viswakula
● Laksri Wijerathna

Page 2: Coursework for ISSALE - 2014          Project Demonstration

Sinhala Script consists of:

● 18 vowels
● 40 consonants

Page 3: Coursework for ISSALE - 2014          Project Demonstration

Sinhala Script

● 18 modifiers
● other symbols (rakaranshaya, yansaya)

Font: Abhaya
Font Size: 12

Page 4: Coursework for ISSALE - 2014          Project Demonstration

Selected characters

700 අ 708 ල්

701 ැ� 709 න්

702 නි 710 ණ

703 ර 711 සි

704 ස 712 ත්

705 ත 713 යි

706 ක් 714 එ

707 කි 708 ල්

Page 5: Coursework for ISSALE - 2014          Project Demonstration

Document Image

The document image contains 16 different character types, with 11 samples of each type.

Page 6: Coursework for ISSALE - 2014          Project Demonstration

Line and Main Body Segmentation

● All lines were segmented correctly
  o Number of lines in the input image: 9
  o Program output: 9 line segments
  o 100% accuracy
● All main bodies were segmented correctly (no diacritics)
  o 100% accuracy
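The slides do not say which line segmentation method was used. A common approach is a horizontal projection profile, and the following is a minimal sketch under that assumption: the image is taken to be already binarized (1 = ink, 0 = background), and the function name `segment_lines` and the toy `page` are hypothetical, not taken from the project.

```python
def segment_lines(image):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `image` is a list of pixel rows; each row is a list of 0 (background)
    or 1 (ink). This is an assumed input format for illustration.
    """
    # Horizontal projection profile: amount of ink in every pixel row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                    # first inked row of a new line
        elif ink == 0 and start is not None:
            lines.append((start, y))     # a blank row closes the line
            start = None
    if start is not None:                # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7x4 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
line_segments = segment_lines(page)
print(line_segments)
```

Comparing `len(line_segments)` with the known number of lines in the input image gives the kind of check reported on this slide. Real scans usually need a small ink threshold instead of `ink > 0` to tolerate noise.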

Page 7: Coursework for ISSALE - 2014          Project Demonstration

Decision Tree Recognition Results

● Creation of Training (35) and Test (15) data
● Decision Tree created using Weka, from the Training data
● Accuracy tested using the Test data

Overall accuracy: 70%

Poorly recognized characters: 702 - නි / 708 - ල් / 711 - සි / 712 - ත්
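The overall figure and the list of poorly recognized characters can both be derived from pairs of actual and predicted labels. The sketch below shows one way to do that in Python; it is not the Weka workflow used in the project, the helper names are hypothetical, and the label lists are made-up values for illustration only, not the project's test data.

```python
from collections import defaultdict


def overall_accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def per_class_accuracy(actual, predicted):
    """Map each actual label to the fraction of its samples recognized."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            correct[a] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels (character codes in the 700-series used on the slides).
actual = [700, 700, 702, 702, 708, 708, 703, 703]
predicted = [700, 700, 702, 707, 709, 708, 703, 703]

by_class = per_class_accuracy(actual, predicted)
# Classes below an arbitrary 70% threshold count as poorly recognized here.
weak = sorted(label for label, acc in by_class.items() if acc < 0.7)

print(overall_accuracy(actual, predicted))
print(weak)
```

Looking at accuracy per class rather than only the overall number makes it easier to see whether errors cluster on visually similar characters.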

Page 8: Coursework for ISSALE - 2014          Project Demonstration

Tesseract Recognition results

Overall accuracy: 93.181%

Page 9: Coursework for ISSALE - 2014          Project Demonstration

Complete OCR - DT Method

Overall accuracy - 28%

Page 10: Coursework for ISSALE - 2014          Project Demonstration

Complete OCR - Tesseract

Overall accuracy - 92.8%
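The slides do not state how accuracy was measured for the complete OCR output. One common choice is character accuracy based on edit distance between the ground-truth text and the recognized text. The following is a sketch under that assumption; `edit_distance` and `character_accuracy` are hypothetical helper names, and the sample strings are illustrative, not project output. Note that it counts Unicode code points, so a Sinhala character such as ක් (consonant plus hal kirima) counts as two units.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """1 - (edit distance / ground-truth length), clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


# Illustrative strings: the last character ත is misread as න.
truth = "අරසත"
recognized = "අරසන"
print(character_accuracy(truth, recognized))
```

Edit distance also penalizes inserted and dropped characters, which a position-by-position comparison would miscount once segmentation errors shift the output.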

Page 11: Coursework for ISSALE - 2014          Project Demonstration

Tesseract Output File

Page 12: Coursework for ISSALE - 2014          Project Demonstration

Conclusion

Test dataset (15)
● Tesseract accuracy - 93%
● DT accuracy - 70%

Document Image
● Tesseract accuracy - 92.8%
● DT accuracy - 28%

Page 13: Coursework for ISSALE - 2014          Project Demonstration

ස්තුතියි...! (Thank you...!)