![Page 1: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/1.jpg)
OCR System
Presented By:-
Vijay Apurva (9910103462),
4th year, CSE
Guided By:-
Mr. Ankur Kulhari
![Page 2: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/2.jpg)
ABSTRACT:-
The current capacity to translate paper documents
quickly and accurately into machine-readable form using
optical character recognition technology expands the
opportunities for document searching and storage, as well
as for automated document processing. Translating large
collections of image-based electronic documents into
structured electronic documents quickly is still a
problem. The large number of processing units available
in Grid environments, together with free optical character
recognition tools, can be exploited to produce a fast translation.
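The idea of spreading a large collection over many processing units can be sketched on a single machine with a worker pool. This is only an illustration, not the Grid implementation: `recognize_page` is a hypothetical stand-in for a real OCR tool, and `translate_collection` and the `workers` parameter are names assumed for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page_image: bytes) -> str:
    """Hypothetical stand-in for a real OCR call on one page image."""
    # A real worker would run an OCR tool here; this stub only decodes bytes.
    return page_image.decode("ascii", errors="replace")


def translate_collection(pages: list[bytes], workers: int = 4) -> list[str]:
    """Recognize pages concurrently and return the texts in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even when pages finish out of order.
        return list(pool.map(recognize_page, pages))


if __name__ == "__main__":
    scanned = [b"page one", b"page two", b"page three"]
    print(translate_collection(scanned))
```

Because each page is recognized independently, the same structure carries over to many machines: only the pool has to be replaced by a job scheduler.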
![Page 3: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/3.jpg)
CONTENTS :-
What is OCR?
When and Why OCR?
Existing System.
Proposed System.
Architecture of OCR.
Algorithms of OCR.
Modules of OCR.
Design of OCR.
Design of Screen shots for OCR.
Conclusion.
![Page 4: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/4.jpg)
WHAT IS OCR? :-
OCR stands for Optical Character
Recognition. It is a system that allows us to
scan printed, typewritten or handwritten text
(numerals, letters or symbols) and convert the scanned
image into a computer-processable format, either as
plain text or as a word-processor document.
The converted documents can later be edited, used
or reused in other documents. Thus the documents
become editable.
![Page 5: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/5.jpg)
WHEN AND WHY OCR? :-
OCR is used when manually recreating a paper
document in electronic form would take more
time.
The converted text files take less space than the
original image files and can be indexed. Hence
OCR is an advantage to a user who has to
convert a large amount of paperwork
into electronic form.
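To illustrate what "can be indexed" means in practice, the sketch below builds a small inverted index over recognized text. It is a minimal example under assumed names (`build_index`, `search`, and the sample document ids are not part of the original system).

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    docs = {"d1": "Invoice for paper", "d2": "Paper archive"}
    print(search(build_index(docs), "paper invoice"))
```

A scanned image offers no such lookup; the index only becomes possible once the page has been converted to text.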
![Page 6: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/6.jpg)
EXISTING SYSTEM:-
Today there is a growing
demand from users to convert printed documents
into electronic documents in order to keep
their data secure. Hence the basic OCR system was invented
to convert the data available on paper into
computer-processable documents, so that the documents can be
edited and reused.
![Page 7: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/7.jpg)
PROPOSED SYSTEM:-
Our proposed system is OCR ON A
GRID INFRASTRUCTURE, a character recognition
system that supports recognition of the characters of
multiple languages. This feature is what we call grid
infrastructure, and it eliminates the problem of
heterogeneous character recognition. In this context, Grid
infrastructure means the infrastructure that supports
a specific group of languages. Thus OCR on a grid
infrastructure is multi-lingual.
![Page 8: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/8.jpg)
ARCHITECTURE :-
The architecture of the optical character recognition
system on a grid infrastructure consists of three main
components. They are:-
Scanner
OCR Hardware or Software
Output Interface
![Page 9: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/9.jpg)
[Block diagram]
Scanner: Document, Illuminator, Detector
  -> Document image
OCR Hardware or Software: Document Analysis, Character Recognition, Contextual Processing
  -> Recognition Results
Output Interface
  -> To application user
![Page 10: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/10.jpg)
TYPES OF TRAINING:-
There are two major types of training with which
a neural network system can be trained. They are:-
Supervised Training
Unsupervised Training
![Page 11: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/11.jpg)
FLOWCHART FOR UNSUPERVISED LEARNING:-
![Page 12: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/12.jpg)
KOHONEN NETWORK:-
The Kohonen network is presented with data, but the correct output that corresponds to that data is not specified. Using the Kohonen network this data can be classified into groups.
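The grouping behaviour can be sketched with a small winner-take-all trainer. This is a simplified illustration, not the system's code: the update that moves the winning neuron toward the input is one common variant, and the function names, epoch count and seed are assumptions made for the example.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length else list(vector)


def winner(weights, sample):
    """Index of the output neuron whose weights best match the sample."""
    scores = [sum(w * x for w, x in zip(row, sample)) for row in weights]
    return scores.index(max(scores))


def train(samples, n_outputs, learn_rate=0.3, epochs=50, seed=0):
    """Winner-take-all training: only the winning neuron's weights move."""
    rng = random.Random(seed)
    n_inputs = len(samples[0])
    weights = [normalize([rng.random() for _ in range(n_inputs)])
               for _ in range(n_outputs)]
    data = [normalize(s) for s in samples]
    for _ in range(epochs):
        for x in data:
            k = winner(weights, x)
            # Pull the winner toward the input, then restore unit length.
            moved = [w + learn_rate * (xi - w) for w, xi in zip(weights[k], x)]
            weights[k] = normalize(moved)
    return weights


if __name__ == "__main__":
    samples = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    trained = train(samples, n_outputs=2)
    print([winner(trained, s) for s in samples])
```

No labels are supplied at any point: similar inputs end up with the same winning neuron, and that neuron's index serves as the group.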
![Page 13: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/13.jpg)
FLOWCHART FOR KOHONEN TRAINING:-
![Page 14: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/14.jpg)
ALGORITHMS OF OCR:-
TRAINING ALGORITHM:-
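One of the most common learning algorithms is called
Hebb’s Rule. This rule was developed to assist with
unsupervised training.
Hebb’s rule is expressed as:
$\Delta w_{ij} = \mu \, a_i \, a_j$
where $\mu$ is the learning rate and $a_i$, $a_j$ are the activations of the two neurons joined by the weight $w_{ij}$.
When a desired output $d$ is available, as in supervised training, the update is scaled by the error $(d - a)$ instead, which gives the delta rule:
$\Delta w_{ij} = \mu \, a_i \, (d_j - a_j)$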
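The plain Hebbian update, in which each weight grows by the learning rate times the product of the two connected activations and no error term is involved, can be sketched as follows. The function name, the square weight matrix and the choice to skip self-connections are assumptions made for this example.

```python
def hebbian_update(weights, activations, learn_rate=0.1):
    """Add learn_rate * a_i * a_j to every weight w_ij, in place."""
    n = len(activations)
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-connections in this sketch
                weights[i][j] += learn_rate * activations[i] * activations[j]
    return weights


if __name__ == "__main__":
    w = [[0.0] * 3 for _ in range(3)]
    hebbian_update(w, [1.0, 1.0, -1.0])
    for row in w:
        print(row)
```

Neurons that are active together get a stronger connection, and neurons with opposite activations get a weaker one.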
![Page 15: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/15.jpg)
MODULES :-
The Modules that were identified in the Optical
Character Recognition system are as follows:-
Document Processing
Neural network System Training
Document Recognition
Document Editing
Document Searching
![Page 16: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/16.jpg)
DESIGN OF OCR :-
The design of our OCR system can be
best explained with the following diagram:-
[Design diagram]
Operations: Scan, Store, Recognize, Editing, Searching
Data store: Document and users Database
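How the store, edit and search steps might sit on top of a document database can be sketched with an in-memory SQLite table. The schema and function names are assumptions for illustration; only the column names `docid`, `docname` and `doctype` are borrowed from the class diagram.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one table for recognized documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "docid INTEGER PRIMARY KEY, docname TEXT, doctype TEXT, body TEXT)"
    )
    return conn


def store_document(conn, docname, doctype, body):
    """Store recognized text and return the new document id."""
    cur = conn.execute(
        "INSERT INTO documents (docname, doctype, body) VALUES (?, ?, ?)",
        (docname, doctype, body),
    )
    conn.commit()
    return cur.lastrowid


def edit_document(conn, docid, new_body):
    """Replace the stored text of one document."""
    conn.execute("UPDATE documents SET body = ? WHERE docid = ?", (new_body, docid))
    conn.commit()


def search_documents(conn, term):
    """Return names of documents whose text contains the term."""
    rows = conn.execute(
        "SELECT docname FROM documents WHERE body LIKE ? ORDER BY docid",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = open_store()
    doc_id = store_document(conn, "letter.txt", "printed", "Dear customer")
    edit_document(conn, doc_id, "Dear customer, your invoice is attached")
    print(search_documents(conn, "invoice"))
```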
![Page 17: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/17.jpg)
OVERALL USECASE DIAGRAM:-
[Use case diagram]
Actors: end-user, end-user1, end-user2, administrator
Use cases: scan documents, store documents, Document processing, Document recognition, Document editing, Document modification, Document deletion, Trains the system
Relationships: two <<includes>> relationships
![Page 18: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/18.jpg)
OVERALL CLASS DIAGRAM:-
[Class diagram]
Document
  Attributes: docid : integer, docname : String, docsize : integer, doctype : String
  Operations: getDocumentDetails(), scanDocument(), covertToImage(), storeImage()
Editor
  Operations: cut(), copy(), paste(), new(), open(), find()
HelpFrame
HEntry
  Operations: hLineClear(), vLineClear(), findBounds()
TrainingSet
  Attributes: inputCount : int, outputcount : int, trainingSetCount : int
  Operations: setInputCount(), setOutputCount(), setTrainingSetCount(), setClassify()
MainScreen
  Operations: editor(), helpFrame(), printedFrame(), handWrittenFrame()
Entry
  Attributes: recog : int, downSampleLeft : int, downSampleRight : int, downSampleTop : int, downSampleBottom : int
  Operations: hLineClear(), hLineClearWithin(), vLineClear(), vLineClearWithin()
PrintedFrame
  Operations: open_action(), train_action(), topen_action(), recogniseAll_action()
KohenNetwork
  Attributes: LearnMethod = 1 : int, LearnRate = 0.3 : double, quitError : double
  Operations: copyWeights(), clearWeights(), winner(), normalizeInput()
Associations between classes carry the multiplicities 1 and 1..*.
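The names `findBounds()`, `hLineClear()`, `vLineClear()` and the `downSample*` fields suggest that each character image is cropped to its inked area and reduced to a small fixed grid before it reaches the network. The sketch below follows that reading, which is an assumption: the deck does not show the actual code, and the 5 x 7 grid size and the list-of-rows image format are chosen only for illustration.

```python
def h_line_clear(image, y):
    """True if row y contains no ink (all pixels are 0)."""
    return not any(image[y])


def v_line_clear(image, x):
    """True if column x contains no ink."""
    return not any(row[x] for row in image)


def find_bounds(image):
    """Return (left, top, right, bottom) of the inked area, inclusive."""
    height, width = len(image), len(image[0])
    rows = [y for y in range(height) if not h_line_clear(image, y)]
    cols = [x for x in range(width) if not v_line_clear(image, x)]
    if not rows or not cols:
        raise ValueError("image contains no ink")
    return cols[0], rows[0], cols[-1], rows[-1]


def downsample(image, grid_w=5, grid_h=7):
    """Crop to the bounds, then mark each grid cell that contains any ink."""
    left, top, right, bottom = find_bounds(image)
    box_w, box_h = right - left + 1, bottom - top + 1
    grid = []
    for gy in range(grid_h):
        y0 = top + gy * box_h // grid_h
        y1 = max(y0 + 1, top + (gy + 1) * box_h // grid_h)
        row = []
        for gx in range(grid_w):
            x0 = left + gx * box_w // grid_w
            x1 = max(x0 + 1, left + (gx + 1) * box_w // grid_w)
            cell = any(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            row.append(1 if cell else 0)
        grid.append(row)
    return grid


if __name__ == "__main__":
    glyph = [
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(find_bounds(glyph))
    print(downsample(glyph, grid_w=2, grid_h=2))
```

Cropping first makes the result independent of where the character was drawn, and the fixed grid gives the network the same number of inputs for every character.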
![Page 19: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/19.jpg)
DESIGN OF SCREEN SHOTS FOR OCR:-
The screenshots that describe the operations carried
out by our system are as follows :-
Main Screen
Hand Written Recognition Screen
Scanned Document Recognition Screen
Training Screen
Recognition Screen
Editor Screen
![Page 20: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/20.jpg)
![Page 21: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/21.jpg)
![Page 22: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/22.jpg)
![Page 23: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/23.jpg)
![Page 24: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/24.jpg)
![Page 25: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/25.jpg)
![Page 26: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/26.jpg)
![Page 27: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/27.jpg)
![Page 28: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/28.jpg)
![Page 29: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/29.jpg)
![Page 30: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/30.jpg)
![Page 31: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/31.jpg)
![Page 32: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/32.jpg)
![Page 33: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/33.jpg)
CONCLUSION:-
The Grid infrastructure used in the implementation
of the Optical Character Recognition system can be used
to speed up the translation of image-based
documents into structured documents that are
easy to discover, search and process.
Automated data entry by OCR is one of the most attractive labor-reducing technologies.
The system recognizes characters in new fonts easily and quickly.
The information in the documents can be edited more conveniently, and the edited information can be reused as and when required.
Extending the system to software other than editing and searching is a topic for future work.
![Page 34: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/34.jpg)
• Training and recognition speeds can be increased further, and the system can be made more user-friendly.
• Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task considering the diversity that exists in ordinary penmanship. However, progress is being made.
![Page 35: optical character recognition system](https://reader030.vdocuments.net/reader030/viewer/2022020717/54825930b07959490c8b47a5/html5/thumbnails/35.jpg)