thanks google hindi ocr guidelines
7/27/2019 Thanks Google Hindi Ocr Guidelines
http://slidepdf.com/reader/full/thanks-google-hindi-ocr-guidelines 1/15
Installation guidelines for Hindi/Indic languages OCR (Windows)
Thanks to Google, UBUNTU and all open source resources for Internet based Gayatri and Yagya, which helped us to develop Hindi/Indic languages OCR (Optical Character Recognition) for our mega Unicode conversion project of Vedic Literature. This will help us to propagate and implement OUR WILL / Our Solemn Pledge for everyone to have a life like our P. Gurusatta.
Vandaniya Mataji: "Son! Never separate me from Guruji." Then she said, "Son, in the times to come the world will look for the solution to its problems in my songs and in Guruji's discourses." It is true: how could Shiva and Shakti ever be separated? (From "Rishi Yugm ki Jhalak-Jhanki".)
Overview:
1. Scan the document (300 DPI for better output) to an image or PDF file.
2. For post-processing of scanned pages, save/export the PDF as images into one folder.
3. Use Scan Tailor for post-processing of scanned pages.
4. Make a PDF file from images by creating a PDF: Files > Create PDF from multiple files > Add files. Check and correct the serial order of the pages/document.
5. Use gImageReader / VietOCR for OCR. Save the file in UTF-8 format.
6. Check spellings using a spell checker.
7. Convert the font using a font convertor. Print for manual proof reading.
8. Manually check the document for logical errors.
NOTE: Tesseract hin.traineddata was found to work well for Chanakya-like fonts.
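For readers who prefer a script to the GUI front ends, step 5 can be sketched with the Tesseract command line that those front ends wrap. This is a minimal sketch, not part of the original guide: it assumes the `tesseract` executable is on the PATH and that the Hindi data (`hin`) is installed; the function names and the list of image suffixes are illustrative.

```python
import subprocess
from pathlib import Path

# Illustrative set of page-image types; extend as needed.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_folder(image_dir, text_dir, lang="hin"):
    """Run Tesseract on every image in image_dir, writing one text file per page."""
    image_dir, text_dir = Path(image_dir), Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        output_base = text_dir / image.stem  # Tesseract appends ".txt" itself
        subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
        written.append(Path(str(output_base) + ".txt"))
    return written
```

Tesseract writes its text output as UTF-8, which matches the format asked for in step 5.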
Required Installation instructions: (We should be connected to the Internet throughout the installation process.)
1. gs905w32.exe - GPL Ghostscript. http://sourceforge.net/projects/ghostscript/
2. jre-6u34-windows-i586.exe - Java Runtime 6.0. http://www.oracle.com/technetwork/java/javase/downloads/jre6-downloads-1637595.html
3. vcredist_x86.exe - MS VC Redistributable Setup. http://www.microsoft.com/en-in/download/details.aspx?id=5555
4. Scan Tailor - An interactive post-processing tool for scanned pages. http://sourceforge.net/projects/scantailor/
5. tesseract-ocr-setup-3.02.02.exe http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.02.02.exe
   a. Make the Internet connection ON.
   b. Choose Components: Download & Install Hindi Language Data; Download & Install Math / Equation Detect.
   c. Installation completes successfully.
6. Restart immediately; this is important.
7. Copy-paste the hin.traineddata file into the C:\Program Files\Tesseract-OCR\tessdata folder if it was not downloaded there, from http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.hin.tar.gz
8. Try more Indic language traineddata files and paste them into the above folder, from http://code.google.com/p/parichit/downloads/list . Thanks to Indu and RKVS Raman (http://code.google.com/p/parichit/) for their Parichit (परिचित) project. Accuracy is low.
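Steps 7 and 8 depend on the traineddata files sitting in the tessdata folder. A small check like the following can confirm that before starting OCR. It is only a sketch: the default path is the folder named in step 7 and is an assumption about where Tesseract was installed, and `missing_traineddata` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed install location (the folder named in step 7); adjust if yours differs.
DEFAULT_TESSDATA = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def missing_traineddata(tessdata_dir=DEFAULT_TESSDATA, langs=("hin",)):
    """Return the language codes whose <lang>.traineddata file is not present."""
    tessdata_dir = Path(tessdata_dir)
    return [
        lang
        for lang in langs
        if not (tessdata_dir / f"{lang}.traineddata").is_file()
    ]
```

An empty list means every requested language file is in place; any code returned still needs to be downloaded and pasted into the folder.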
Scan Tailor - An interactive post-processing tool for scanned pages. http://sourceforge.net/projects/scantailor/
1. Download and install.
2. Put all scanned images / exported images from the PDF file into one folder.
3. Start Scan Tailor. Open a new project. Select the folder.
4. Select all files / required files. Click "Fix DPI even if ...". Click OK.
5. Fix orientation here if needed. Split pages if needed (the first ones manually, then automatically/manually). Deskew all pages automatically by clicking the arrow button. (This makes pages vertical for better scanning.)
6. Select content. To select the content of all pages automatically, click the arrow button.
7. Check the selection on all pages manually and correct it if needed.
8. Margins is optional. Try it if needed.
9. Output. On clicking this tab, a single page will be output into the out folder of your chosen folder.
   Important: Change resolution to DPI = 300 & 2%-5% thicker (or 600 & 2%-10% thicker) for all pages for better OCR.
   For automatic output of all pages, click the arrow button.
10. The output is ready in the out folder of the chosen folder for OCR.
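Once Scan Tailor has written its pages, they need to be fed to OCR in page order. The sketch below collects the files from the out folder and sorts them so that page 10 follows page 9 rather than page 1. It assumes the output files are TIFF images with a page number somewhere in the file name; both helper names are illustrative.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and integer chunks so 'page10' sorts after 'page2'."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


def output_pages(project_dir, suffixes=(".tif", ".tiff")):
    """List the processed page images in <project_dir>/out in page order."""
    out_dir = Path(project_dir) / "out"
    pages = [p for p in out_dir.iterdir() if p.suffix.lower() in suffixes]
    return sorted(pages, key=lambda p: natural_key(p.name))
```

Sorting by `natural_key` rather than by plain string comparison is what keeps the serial order of the pages correct.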
GUI software Installation: A simple version. http://code.google.com/p/tesseract-ocr/wiki/3rdParty
1. gimagereader_0.9-1_win32.exe - G Image Reader. http://sourceforge.net/projects/gimagereader/
Using G Image Reader:
1. Open it through the program shortcut.
2. Configure it for Hindi language data.
3. Choose the languages tab.
4. Add a custom Tesseract language for Hindi. Click the Add button.
   Prefix: hin, Name: Hindi or हिन्दी, ISO Code: hi_IN
   Prefix: guj, Name: Gujarati or ગુજરાતી, ISO Code: gu_IN
   Click OK.
5. Click Apply.
6. Restart immediately; this is important.
7. Enjoy using G Image Reader for image and PDF files.
8. Every time we open G Image Reader, we have to configure > apply.
9. We have to set the Hindi language option every time.
10. Open any image/PDF file.
11. Select the area to be scanned.
12. Click Recognize Selection.
13. Wait for the OCR to be done.
14. Save the file in Unicode at your desired destination.
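When the recognised text is handled by a script instead of the Save dialog, the last step amounts to writing UTF-8 and, optionally, checking that the result really is Devanagari. The following is a rough sketch with hypothetical helper names; the Devanagari check simply counts characters in the Unicode block U+0900 to U+097F and is not something the guide itself prescribes.

```python
from pathlib import Path


def save_unicode_text(text, destination):
    """Write recognised text to destination as UTF-8 and return the path."""
    path = Path(destination)
    path.write_text(text, encoding="utf-8")
    return path


def devanagari_ratio(text):
    """Share of non-whitespace characters that fall in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if "\u0900" <= c <= "\u097f")
    return hits / len(chars)
```

A low ratio on a page that should be Hindi is a hint that the language option was not set before recognition.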
GUI software Installation: An advanced version. http://code.google.com/p/tesseract-ocr/wiki/3rdParty
1. VietOCR: VietOCR is a Java GUI frontend for the Tesseract OCR engine, providing character recognition support for common image formats and multi-page images. http://vietocr.sourceforge.net/ http://sourceforge.net/projects/vietocr/
2. Download and extract the folder.
3. To download hin.traineddata, go to Settings > Download Language data > Select Language and download.
4. Start with the ocr.bat file.
5. Set the OCR language to Hindi.
6. Start OCR'ing any image file.
7. PDF files are supported but require more technical support. Please go through the readme.html file. We were unable to open PDF files.
More links for your help.
1. Use the Hindi spell checker http://www.awgp.in/spellchecker/ or http://www.bhashagiri.com/.
2. Use Hindi Lekhak http://www.awgp.in/hindilekhak/ or https://dl.dropbox.com/u/53672006/Hindi-Lekhak-Release-20.01.2012.zip for font conversion/typing, and Pramukh Type Pad (currently it supports 20 Indian languages - http://www.vishalon.net/PramukhIME/PramukhTypePad.aspx) for typing.
3. Bhasha India's TBIL Converter 3.0 http://bhashaindia.com/Downloads/Pages/home.aspx , TBIL Converter 3.0 with Microsoft .NET Framework 3.5 Service pack 1 http://www.microsoft.com/en-in/download/details.aspx?id=22 http://download.microsoft.com/download/0/6/1/061F001C-8752-4600-A198-53214C69B51F/dotnetfx35setup.exe
   For OFFLINE use (231MB), Microsoft .NET Framework 3.5 Service pack 1 (Full Package) http://www.microsoft.com/en-us/download/details.aspx?id=25150 http://download.microsoft.com/download/2/0/E/20E90413-712F-438C-988E-FDAA79A8AC3D/dotnetfx35.exe
4. Upcoming integrated module: All-in-One (OCR + spell checker + font conversion within one pack). Visit http://www.bhashagiri.com/ .
5. Irfan Viewer http://www.irfanview.com/ for image file conversion in batch.
6. linux-intelligent-ocr-solution from Nalin & Sathyan. http://code.google.com/p/linux-intelligent-ocr-solution/downloads/list .
Examples of Hindi OCR
[Sample 1: Hindi book page with its OCR output, headed "आपत्काल का अध्यात्म" ("The Spirituality of Times of Crisis").]
[Sample 2: Hindi book page with its OCR output; the passage discusses the grace of the deities (देवता) and what they give.]
Installation guidelines for Hindi OCR (UBUNTU)
1. Connect to the Internet.
2. Open Synaptic Manager.
3. Search for tesseract-ocr. Install tesseract-ocr and the tesseract-ocr-hin language file.
4. Install.
5. Open Google and search for the gImageReader download http://sourceforge.net/projects/gimagereader/ .
6. Download the .deb file from Sourceforge.
7. Install it.
8. Open gImageReader.
9. Configure the Hindi language.
10. File > Configure > Languages > Add > Prefix=hin, Name=Hindi, ISO code=hi_IN
11. OK and Apply.
12. Select the required language: Hindi > hi.
13. Open the file.
14. Scan the selected area, the full page or all pages.
15. Scan Tailor - An interactive post-processing tool for scanned pages. http://sourceforge.net/projects/scantailor/
16. Use Synaptic Manager in UBUNTU for installing the Scantailor software.
17. Thanks to Nalin & Sathyan Ji for their project linux-intelligent-ocr-solution.
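After installing the packages in step 3, it can be worth confirming that Tesseract actually sees the Hindi data before configuring a front end. The sketch below parses the output of `tesseract --list-langs`; it assumes the installed Tesseract supports that option and that `tesseract` is on the PATH, and both function names are illustrative.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def hindi_is_installed():
    """Return True if 'hin' appears among the languages Tesseract reports."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Depending on the build, the list may arrive on stdout or stderr; read both.
    return "hin" in parse_list_langs(result.stdout + "\n" + result.stderr)
```

If `hindi_is_installed()` returns False, the tesseract-ocr-hin package from step 3 is the first thing to recheck.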
Linux-intelligent-ocr-solution-1.6
Lios is a free and open source software for converting print into text using either a scanner or a camera. It can also produce text out of scanned images from other sources such as PDF, Image or Folder containing Images. The program is given total accessibility for the visually impaired. Lios is written in Python, and we release it under the GPL3 license. Lios will work with Debian based operating systems. There are a great many possibilities for this program. Feedback is the key to it. Expecting your feedback at Nalin.x.Linux@Gmail.com or at 09446012215.
It has a super GUI and works great for Hindi. More modifications are going on. Your help is needed to improve it. They are working selflessly for Visually Impaired Persons.