Coursework for ISSALE - 2014 Project Demonstration
SINHALA LANGUAGE OCR
● Kasun Perera
● Chamila Liyanage
● Tharaka Viswakula
● Laksri Wijerathna
Sinhala Script consists of:
● 18 vowels
● 40 consonants
● 18 modifiers
● other symbols (rakaranshaya, yansaya)
Font: Abhaya
Font Size: 12
Selected characters
700 අ 708 ල්
701 ැ� 709 න්
702 නි 710 ණ
703 ර 711 සි
704 ස 712 ත්
705 ත 713 යි
706 ක් 714 එ
707 කි 708 ල්
Document Image
Image document has 16 different character types and 11 samples of each character type.
Line and Main Body segmentation
● All lines were segmented correctly
o No. of lines in input image - 9
o Program outputs 9 line segments
o 100% accuracy
● All main bodies were segmented correctly (no diacritics)
o 100% accuracy
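The slides do not say which method was used to segment lines. A common approach is a horizontal projection profile: count the ink pixels in each row and treat runs of non-empty rows as text lines. The sketch below assumes a binarized image given as a list of rows of 0/1 values; the function name `segment_lines` and the `min_ink` threshold are hypothetical, chosen only for illustration.

```python
def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    This is an assumed representation, not the one used in the project.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a text line ends at a blank row
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7-row "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

Segmentation accuracy as reported above would then be the number of segments found that match the expected lines, divided by the number of lines in the input image (9 of 9 here).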
Decision Tree Recognition results
● Creation of Training (35) and Test data (15)
● Decision Tree created using Weka - using Training data
● Tested accuracy using Test data
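Weka reads training and test data in ARFF format. The slides do not list the features that were extracted, so the sketch below uses two hypothetical numeric features (`aspect_ratio`, `ink_density`) and made-up values, purely to show what such a file could look like. The `c` prefix on the class labels and the helper name `to_arff` are also assumptions.

```python
def to_arff(relation, feature_names, samples):
    """Build ARFF text from (feature_vector, label) pairs.

    Feature names and values are placeholders; the real project's
    features are not given in the slides.
    """
    labels = sorted({label for _, label in samples})
    out = ["@relation " + relation, ""]
    for name in feature_names:
        out.append("@attribute " + name + " numeric")
    # Nominal class attribute listing every character code seen.
    out.append("@attribute class {" + ",".join(labels) + "}")
    out.append("")
    out.append("@data")
    for features, label in samples:
        out.append(",".join(str(v) for v in features) + "," + label)
    return "\n".join(out) + "\n"


# Toy samples: two characters from the selected set, invented feature values.
toy_samples = [
    ([0.8, 0.31], "c700"),
    ([1.2, 0.27], "c704"),
]
arff_text = to_arff("sinhala_chars", ["aspect_ratio", "ink_density"], toy_samples)
print(arff_text)
```

One file built this way from the training samples and another from the test samples could then be loaded into Weka to build and evaluate the tree.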
Overall accuracy: 70%
Poorly recognized characters: 702 - නි / 708 - ල් / 711 - සි / 712 - ත්
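A per-character breakdown is what makes it possible to single out the weakest classes. The following is a minimal sketch of computing overall and per-class accuracy from pairs of true and predicted labels. The labels in the example are toy values, not the project's actual predictions, and `accuracy_report` is a hypothetical helper name.

```python
from collections import defaultdict


def accuracy_report(true_labels, predicted_labels):
    """Return (overall_accuracy, {label: accuracy}) for paired label lists."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / total[label] for label in total}
    return overall, per_class


# Toy data only: one sample of class 702 is misread as 711.
truth = ["700", "700", "702", "702", "704"]
predicted = ["700", "700", "711", "702", "704"]
overall, per_class = accuracy_report(truth, predicted)
print(overall)    # 0.8
print(per_class)  # {'700': 1.0, '702': 0.5, '704': 1.0}
```

Sorting `per_class` by value would list the worst-recognized characters first.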
Tesseract Recognition results
Overall accuracy: 93.181%
Complete OCR - DT Method
Overall accuracy - 28%
Complete OCR - Tesseract
Overall accuracy - 92.8%
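The slides do not state how accuracy on the complete document was measured. One common choice for end-to-end OCR is character accuracy based on edit distance between the ground-truth text and the OCR output. The sketch below assumes that metric; `edit_distance` and `char_accuracy` are illustrative names, and the strings are toy sequences of characters from the selected set, not real output.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(truth, ocr):
    """1 - (edit distance / ground-truth length), floored at 0."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))


# Toy example: four single-code-point letters, the last one misread.
truth_text = "අරසත"
ocr_text = "අරසණ"
print(char_accuracy(truth_text, ocr_text))  # 0.75
```

Note that Python strings are compared per Unicode code point, so a character with a modifier such as ල් counts as two units here; grouping code points into whole characters first would give a slightly different figure.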
Tesseract Output File
Conclusion
Test dataset (15)
● Tesseract accuracy - 93%
● DT accuracy - 70%
Document Image
● Tesseract accuracy - 92.8%
● DT accuracy - 28%
ස්තුතියි...! (Thank you...!)