

Computer Vision: Optical Character Recognition

Mark Kurtz

Fontbonne University

Department of Mathematics and Computer Science

ABSTRACT

Humans are very visual creatures. Our visual system is involved in nearly everything we do or think about, in activities spanning everything from reading to driving. Computer Vision is the study of how to implement the human visual system, and the visual tasks we perform, in machines and programs. My studies and this paper revolve around this topic, specifically Optical Character Recognition (OCR), which tries to recognize characters or words inside of images. My focus was on implementing OCR through custom algorithms I wrote after studying techniques used in the computer vision field. I successfully created a program implementing a few of these algorithms; the rest are untested but documented. The results for the tested algorithms are reported in this paper.


1.1 Introduction

Computer Vision has slowly worked its way into our lives in limited aspects, in the

fields of automotive drive assistance, eye and head tracking, film and video analysis, gesture

recognition, industrial automation and inspection, medical analysis, object recognition,

photography, security, 3D modeling, etc. (Lowe) These applications and the algorithms behind

them are extremely specific, so much of the code does not successfully transfer among different applications, and there is no master code or algorithm for the computer vision field. Hence even the visual system of a two-year-old cannot be replicated: computer programs still cannot successfully find all the animals in a picture. The reasons for this are many, but they simplify to one point: the human visual system is extremely hard to

understand and replicate. The process becomes an inverse problem where we try to form a

solution that resembles what our eyes process. This seems easy, but there are many different

hidden layers between the images formed on our eyes and what we perceive. With numerous

unknowns, much of the focus in the computer vision field has resorted to physics-based or

statistical models to determine potential solutions. (Szeliski)

Because of how well our visual system handles images, I underestimated the complexity

of computer vision. It seems easy to select different objects in an image in our everyday world.

Looking around you can readily distinguish objects in your surroundings, what they are, how far

away they are, and their three-dimensional shape. To determine all of this our brains transform

the images taken in by our eyes in many different steps. As an example, we may perceive colors

darker or lighter than their actual values. This is how we see the same shade of red


on an apple throughout the day despite the changing colors of light reflecting through the

atmosphere. An example is in the picture that follows. The cells A and B are the exact same

color, but our visual system changes the color we perceive based on the surrounding colors:

Current algorithms have yet to effectively replicate the human color perception system.

(McCann) Other visual tricks our eyes perform range from reconstructing a three-dimensional

reality from two-dimensional images in our retinas to perceiving lines and edges from missing

image data. An article titled What Visual Perception Tells Us about Mind and Brain explains this

in more detail. “What we see is actually more than what is imaged on the retina. For example,

we perceive a three-dimensional world full of objects despite the fact that there is a simple

two-dimensional image on each retina. … Thus perception is inevitably an ambiguity-solving

process. The perceptual system generally reaches the most plausible global interpretation of

the retinal input by integrating local cues”. (Shimojo, Paradiso, and Fujita)

Despite the complexity of the human visual system, many people have tried to replicate

it and implement it in machines and robotics. Many algorithms have been developed for

specific applications across different fields and disciplines. One of the most prevalent in

consumer applications is facial recognition. In fact, it has nearly become part of everyday life through its use in social networks such as Facebook, digital cameras, and photo-editing software such as Picasa. The most successful

algorithm creates Eigenfaces, which are a set of eigenvectors, and then looks for these inside of

an image. The eigenvectors are derived from statistical analysis of many different pictures of

faces. (Szeliski) In other words, it creates a template from other pre-labeled faces and searches

through an image to see where they occur. This approach surprises me because the algorithm

never tries to break apart the constituents of the image or even determine what is contained

within the image. The algorithms never process corresponding shapes, three-dimensional

figures, edges, etc. from the image data. With no other processing performed other than a

simple template search, the program has no clues about context, leading it to label faces within

an image that we may consider unimportant. For example, I used Picasa to sort my images by

who was in each picture. It does this through facial recognition. However, in some images it

recognized faces that were far off in the background, faces that were out of focus, and even a

face on the cover of a magazine someone was reading in the background of the image.
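
To make the Eigenface idea concrete, the following is a minimal sketch (my own illustration, not the algorithm used by Picasa or any particular product) of how a set of eigenfaces can be derived from pre-labeled face images and then used to score a candidate patch; the array shapes and the number of components are assumptions.

import numpy as np

def compute_eigenfaces(faces, k=16):
    # faces: assumed array of shape (num_faces, height, width), gray scale.
    n, h, w = faces.shape
    flat = faces.reshape(n, h * w).astype(np.float64)
    mean_face = flat.mean(axis=0)            # the "average" face
    centered = flat - mean_face              # statistical analysis starts from the mean
    # The right singular vectors of the centered data are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k]                 # each row of vt[:k] is one eigenface

def face_likeness(patch, mean_face, eigenfaces):
    # Project a candidate patch onto the eigenface subspace; a small
    # reconstruction error means the patch resembles the training faces.
    v = patch.reshape(-1).astype(np.float64) - mean_face
    coeffs = eigenfaces @ v
    reconstruction = eigenfaces.T @ coeffs
    return -np.linalg.norm(v - reconstruction)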

Not only does facial recognition suffer from a lack of context, but the exact method of template matching with Eigenfaces also makes the algorithm extremely

specific. This speaks volumes about the methods used in computer vision. The methods and

algorithms for facial recognition cannot readily be used to identify animals in an image without

completely retraining the algorithm. By extension, an algorithm developed for one application in computer vision cannot transfer over to another application. I see this as a

huge problem since we cannot possibly have thousands of different image analysis processes in

our brain running all at once to look specifically for certain objects. I believe the way our brain

determines there is a person in an image is the same as how it determines there are animals in


an image. The current field of computer vision is moving towards this template matching. While

this may work in specific situations, I cannot see how this will ever replicate human vision

for the reasons stated above. To break away from the specificity of the computer vision field, I took a few ideas from lower level processing techniques, developed my own algorithms

in place of the ones used, and then explained a new matching technique.

1.2 New Techniques Needed in Computer Vision

Over the decades that computer vision has been studied and developed, it has not

progressed as well as most predicted or hoped for. In fact, it has led some who contributed to and

pioneered the field to form the extreme view that computer vision is dead. (Chetverikov) While

I do not believe the study of computer vision is dead, I think new algorithms are needed in the

field. The algorithms need to move away from the specific applications and involve more ideas

from neuroscience and biological processes.

Jeff Hawkins, a pioneer of PDAs (personal digital assistants, essentially the precursors to the smartphone) and now an artificial intelligence researcher, spoke at a TED (Technology,

Entertainment, and Design) conference in 2003. His speech focused on artificial intelligence. In

it he states that not only is computer vision moving in the wrong direction, but so is the entire field of artificial intelligence. He explains that this comes from an incorrect but strongly held belief that intelligence is defined by behavior: we are intelligent because of the way we do things. He

counters that our intelligence comes from experiencing the world through a sequence of

patterns that we store and recall to match with reality. In this way we make predictions about

what will happen next, or what should happen next. An example he gives is our recognition of


faces. Studies have observed that when humans view a person, we first look at one eye, then the other eye, then at the nose, and finally at the mouth. This is simply explained by predictions

happening and then being confirmed as we observe our world. We expect an eye, then an eye

next to it, then a nose, then a mouth. If we see something different, it will not match up with our predictions, and learning or more concentrated analysis will occur. (How Brain Science Will

Change Computing)

I developed my research around the direction of prediction as explained by Jeff

Hawkins. In this way patterns, sequences, and predictions can be used in object recognition and

classification. Logically pattern matching seems to make more sense than specific template

matching. By looking for exact matches to templates, we will never be able to reproduce visual

tricks such as when we see shapes in clouds. Also, template matching often produces false

results if an object is shaded differently or partially hidden. (Ritter and Wilson)

Pattern matching seems to offer a fix for the different lighting conditions that occur in images.

Also, pattern matching may have a better chance of identifying objects which are slightly

obscured or are missing some data. This is something that must be studied further through

experimentation, however.

1.3 Simplifying Things to Optical Character Recognition

In my beginning research, I hoped to apply the ideas I developed to full images for

classification and recognition of objects. Unfortunately time constraints limited my research

and implementation. Thus I decided to focus on OCR (Optical Character Recognition). This field

is a subset of computer vision which focuses on text recognition in images. In OCR there is a


limited number of objects that can occur within the images. Also, the images processed in OCR

are much simpler than the full images processed in computer vision. In most cases the input is a two-dimensional binary image, such as an image of a page within a book. While it is simpler than

computer vision, the algorithms I created can be implemented within OCR since it is a field

within computer vision.

OCR is used by many industries and businesses. It is often packaged with Adobe PDF

software and document scanning software. Also, the United States Postal Service has heavily

implemented OCR to recognize addresses written on packages and letters. OCR is generally

divided into two methods. The first is matrix matching where characters are matched pixel by

pixel to a template. The second is feature extraction where the program looks for general

features such as open areas, closed shapes, and diagonal lines. This method is used much more widely than matrix matching and is much more successful. (What’s OCR?)
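
As a minimal sketch of the matrix-matching idea (my own illustration, assuming binary character bitmaps that have already been cropped and scaled to a fixed size), the comparison really is pixel by pixel:

import numpy as np

def matrix_match(glyph, templates):
    # glyph: binary array; templates: dict mapping characters to arrays of the same shape.
    best_char, best_score = None, -1
    for char, template in templates.items():
        score = np.sum(glyph == template)    # count of agreeing pixels
        if score > best_score:
            best_char, best_score = char, score
    return best_char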

Feature extraction has become the default for OCR software. It works by analyzing

characters to determine strokes, lines of discontinuity, and the background. From here the

program builds up from the character to the word and then assigns a confidence level after

comparing to a dictionary of words. This seems to work well for images converted straight from

computer documents with a 99% accuracy rate. For older, scanned papers the accuracy rate

drops and varies wildly from 71% to 98%. (Holley)

The method I have developed follows the feature-extraction approach. The ideas of

feature extraction have worked very well within OCR so far and seem to resemble the idea of

prediction matching described earlier. The problem is defining what features are and how to

decode them inside of an image.


1.4 Describing the Overall Idea

The general idea of my algorithms in computer vision and OCR is to separate an image

into constituent objects and then define those objects by their surprising features. In doing this,

the algorithm builds a definition of an object rather than a template for an object. It seems

more natural to describe objects by a definition rather than a template, so I pursued algorithms that work this way. For example, when we describe a face we do not build an

exact template from thousands of faces we have seen in the past. Instead we describe a face as

having two eyes, a nose, and a mouth. These are features that distinguish a face from anything

else. If there were no features that protruded from the interior of the face, we would have a

hard time distinguishing it. Definitions work the same for outlines of an object, too. We can

easily draw the outline of a dolphin because we know the border points that are most

memorable and stick out from a regular oval or other shape. For a dolphin these are the tail, the dorsal fin, and the mouth.

While definition building seems to be a more natural way of understanding images, it also may offer reasons as to why we see objects in clouds or ink blots. I believe we see images

in these objects because certain features resemble patterns we have seen before in other

objects. We may see a face in an ink blot because there are two dots to represent eyes and a

nose all in the correct relation to each other. We may see a dolphin in the clouds because a part

of it resembles the dorsal fin and the nose. In both of these examples general objects portray

specific objects because certain key features match up.


1.5 Lower Level Processing

Lower level processing in computer vision involves finding key points, features, edges,

lines, etc. in an image. At the lower level no matching takes place. The focus is to decode the

image into basic points, lines, and shapes for processing later on. (Szeliski) One of the

many lower level processes is edge detection. Edges define boundaries between regions in an

image and occur when there is a distinct change in the intensity of an image. There are tens if

not hundreds of algorithms written for edge detection ranging from the simple to the extremely

complex. (Nadernejad)

The most widely used algorithms for edge detection are the Marr-Hildreth edge

detector and the Canny edge detector. (Szeliski) The Marr-Hildreth edge detector first applies a Gaussian smoothing operator (a matrix which approximates a bell-shaped curve) and then applies a two-dimensional Laplacian to the image (another matrix which is equivalent to taking the second derivative of the image). The Gaussian

reduces the amount of noise in the image simply by blurring it. This has the unwanted effect of

losing fine detail in the image, though. The Laplacian is applied to take the second derivative of

the image. The idea is that if there is a step difference in the intensity of an image, it is

represented by a zero crossing in the second derivative. The Canny edge detector also begins by

applying a Gaussian smoothing operator. It then finds the gradient of the image at each point to

indicate the presence of edges while suppressing any points that are not maximum gradient

values in a small region. After all this has been performed, thresholding is applied using hysteresis, which combines a high and a low threshold. Again, the Gaussian loses detail as it tries to

reduce the amount of noise in an image. Both the Marr-Hildreth and the Canny edge detectors


are very expensive in terms of computation time because of the operations that are involved. I

stepped away from these approaches and tried to look at edge detection in a simpler way that

could be reproduced by artificial neural networks.
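
For comparison, both classical detectors described above are available in common libraries; the sketch below uses OpenCV, with the kernel size, sigma, and hysteresis thresholds chosen arbitrarily for illustration rather than taken from this project.

import cv2

def classical_edges(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Marr-Hildreth style: Gaussian smoothing followed by the Laplacian;
    # edges correspond to zero crossings of the result.
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
    log_response = cv2.Laplacian(blurred, cv2.CV_64F)
    # Canny: gradient, non-maximum suppression, and hysteresis thresholding.
    canny = cv2.Canny(gray, 100, 200)
    return log_response, canny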

Artificial neural networks were inspired by the way biological nervous systems process

information. The neural networks are composed of a large number of processing elements that

work together to solve specific problems. Neurons inside the neural networks are

interconnected and feed inputs into each other. If an operation applied to all of the inputs of a neuron exceeds a certain threshold, then the neuron sends a signal to other neurons. The

way the neurons are connected, the thresholds set for each, and the operations performed on

each can all change and are adapted to improve the performance of the algorithm. This replicates

the learning process in biological brains and nervous systems. Artificial neural networks have

been effectively applied in pattern recognition and other applications because of their ability to

change and adapt to new inputs. (Stergiou and Siganos)

Neural networks have a downside, though. They need many sets of training data in

order to achieve accurate results. These sets of training data take a lot of time to make. So,

instead of abandoning neural networks I tried to combine the best of traditional computing and

neural networks for my edge detection algorithms. I used arrays as the inputs and outputs that form the neurons of the network. Next I predetermined what the operations

would be, and then applied a threshold which could be changed to maximize the accuracy of

the algorithm.
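
A minimal sketch of this hybrid idea, under the assumption that each "neuron" applies a predetermined weighted operation to an array of inputs and fires only above an adjustable threshold (the weights and threshold here are placeholders):

import numpy as np

def neuron(inputs, weights, threshold):
    # Predetermined operation: a weighted sum over the input array.
    value = float(np.dot(inputs, weights))
    # Fire only if the response clears the adjustable threshold.
    return value if abs(value) > threshold else 0.0

# Example: a simple difference operation whose threshold can be tuned later.
print(neuron(np.array([120.0, 122.0, 200.0]), np.array([-1.0, 0.0, 1.0]), threshold=30.0))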

My first idea built upon the previously described combination of neural networks and

predefined algorithms. I explored the idea that edges are step changes (non-algebraic) in light


intensity in an image. I figured out that I could calculate the change in light intensity along a

specific direction in the image. By doing this, I could approximate the derivative at each point in

an image. After this I could approximate the second derivative, the third derivative, and so on.

With the derivatives approximated, the algorithm can then work backwards and figure out

what the next pixel value should be for an algebraic equation. The following image explains this

further:

This system works perfectly for predicting the next value in algebraic equations such as y = 2x +

20 or y = 5x^3 + 3x - 10. If the array is expanded to include more pixel values, it can work with

even higher order equations. The algorithm is able to do this because eventually the

approximate derivative is a constant value or 0. Surprisingly, the algorithm also was able to

reasonably approximate the next value in y = cos(x) and y = ln(x). I then designed a program to

apply the algorithm to images. For each pixel it would calculate a predicted value from the

surrounding pixels. If the predicted value and the actual value were off by a certain threshold,


then the pixel would be marked as an edge. The algorithm did not work as well in practice,

though.
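
A sketch of that prediction step is shown below, assuming a one-dimensional scan along a row of gray-scale values; the window size and threshold are placeholders rather than values from the finished program.

import numpy as np

def predict_next(values):
    # Repeatedly difference the values; for a polynomial of degree < len(values)
    # the highest-order difference is constant, so summing the last entry of
    # each difference row extrapolates the next value exactly.
    diffs = [np.asarray(values, dtype=float)]
    while len(diffs[-1]) > 1:
        diffs.append(np.diff(diffs[-1]))
    return float(sum(row[-1] for row in diffs))

def prediction_edges(row, window=4, threshold=25.0):
    # Mark positions where the value predicted from the preceding pixels
    # misses the actual pixel value by more than the threshold.
    edges = []
    for i in range(window, len(row)):
        if abs(predict_next(row[i - window:i]) - row[i]) > threshold:
            edges.append(i)
    return edges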

Small variations in light intensity in an image would create large changes in values

higher up in the array. This led to extreme predicted values which did not match with what a

human might expect the next value to be. Another problem appeared whenever edges

occurred between the values used for the prediction. For example, if an edge occurred at Pixel

2 in the image shown above, then the value of PR4 became an extreme value. Noise such as a

bad pixel or speck in the image also created the same problem as an edge occurring in the values

used for the prediction. All three of these problems created false edges and spots inside of the

generated images. Here are the results of this algorithm (the top pictures are gray scale, and

the bottom are the edge detection):

After days of trying and several algorithms, I derived an algorithm based on the previously described combination of neural networks and predefined algorithms. Essentially it

approximates the derivative of order n on each side of the pixel being tested. This brings local pixels into the reasoning for finding edges. The benefit of including these local pixels is to

eliminate noise that might be present in the image already. It also approximates the first


derivative on each side of the pixel to be tested. The purpose of the first derivative is to make

sure the edge occurred at the pixel being tested instead of in the general region of the pixel

being tested. Next, instead of predicting the next values, my algorithm simply compared the

values of the approximated derivatives. The following picture explains the algorithm in a visual

way.

The new algorithm seemed to work better and ran fairly quickly since all operations are performed on integer values. I believe it is faster than the Canny and Marr-Hildreth edge detectors, but that

is only speculation. To speed up the algorithm and use less memory, I figured out the relation of

each successive approximate derivative. It follows Pascal’s Triangle with alternating signs. For

example, to approximate only the first derivative you take PL1 – P1 (when referring back to the

image above). To approximate the second derivative you would normally calculate the first and

then find the difference between these first derivatives. This equation can be simplified from

DLA2-DLA3 to PL2 – 2*PL1 + P1 (again when referring to the image above). To approximate the


third derivative, the equation simplifies to PL3 – 3*PL2 + 3*PL1 – P1 (when referring to the

image above).

After the program finds the necessary approximate derivatives, it then calculates the

difference between these derivatives. If the difference is more than a specific threshold, then it

records the results as an edge. The thresholds are adjustable, but I was not able to experiment

with many thresholds or design the program to choose the appropriate thresholds. Despite the

limited testing, the results seem promising. Here are the results of this algorithm (the top

pictures are gray scale, and the bottom are the edge detection):

As shown in the above image, the algorithm seems to work fairly well for single objects

within a landscape. The algorithm has problems decoding edges when there are a lot of objects

within the same image such as with the picture of trees. However, I believe all edge detectors

have this problem. It also has problems with texture such as the water in the dolphin picture or

the fur on the kangaroo. Some of these problems with texture occurred because the algorithm

only uses the gray scale version of the images.
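
Putting the pieces above together, a simplified one-dimensional sketch of this detector might look like the following; it compares a single derivative order on each side of the tested pixel (the version described above also checks the first derivative on each side), and the order and threshold values are placeholders.

from math import comb

def side_derivative(row, i, order, direction):
    # One-sided derivative estimate at pixel i using Pascal's Triangle
    # coefficients with alternating signs; for order 2 on the left this is
    # PL2 - 2*PL1 + P1, matching the simplification described above.
    total = 0.0
    for k in range(order + 1):
        total += comb(order, k) * (-1) ** k * row[i + direction * (order - k)]
    return total

def pascal_edges(row, order=2, threshold=40.0):
    # Mark a pixel as an edge when the left and right derivative estimates
    # disagree by more than the threshold.
    edges = []
    for i in range(order, len(row) - order):
        left = side_derivative(row, i, order, -1)
        right = side_derivative(row, i, order, +1)
        if abs(left - right) > threshold:
            edges.append(i)
    return edges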

To fix the problem of only being able to analyze gray scale images, I decided to make a

three-dimensional color mapping that plots every possible color combination as a point. I


worked on the color mapping using two main facts. The first is that the lowest possible color is

black and the highest possible color is white. All other colors can be considered to lie between these two extremes. The fact that there are two colors that all others build from gave me two

limiting points, one at each end of the map, to build the mapping from. This gave me a basis for where to put colors along the Z-axis in the three-dimensional mapping. The sum of the red,

blue, and green pixel values would determine where the color was on the Z-axis. Black (with

RGB pixel values 0,0,0) occurs at Z=0. White (with RGB pixel values of 255,255,255) occurs at

Z=765. Red, green and blue then occur at Z=255 (Red has an RGB value of 255,0,0 for example)

and yellow, cyan, and magenta occur at Z=510 (yellow has an RGB value of 255,255,0 for

example). The second fact I used comes from color theory where the colors red, blue, and

green are all separated by an angle of 120 degrees on a color wheel. By using this color wheel

and the separation of the three primary colors by 120 degrees, I created an equilateral triangle

mapping for color values with the maximum red values at the top corner of the triangle, the

maximum green values in the left corner of the triangle, and the maximum blue values in the

right corner of the triangle. Using this triangle, I mapped an XY axis over it with the Y axis

aligned with the red corner of the triangle and the origin at the centroid of the triangle. Also,

the Z-axis was shifted so that 382 (the approximate center of the range 0 to 765) occurs at Z =

0. With all of this explained, here are the equations for the mapping and a pictorial diagram:

X = Blue*cos(-π/6) + Green*cos(7π/6)

Y = Red + Blue*sin(-π/6) + Green*sin(7π/6)

Z = (Red + Green + Blue) - 382


Computation time was decreased by scaling up and rounding the values so that

everything was kept as an integer. The formulas then became:

X = (Blue - Green)*173

Y = Red*100 - (Blue + Green)*50

Z = (Red + Green + Blue - 382)*100
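
These formulas translate directly into code. The helper below follows the integer version above; the Euclidean distance function is my own addition for the comparison step described next.

from math import sqrt

def color_to_xyz(red, green, blue):
    # Integer-scaled mapping onto the equilateral-triangle color space.
    x = (blue - green) * 173                 # 173 is roughly 200*cos(pi/6)
    y = red * 100 - (blue + green) * 50
    z = (red + green + blue - 382) * 100
    return x, y, z

def color_distance(c1, c2):
    # Straight-line distance between two pixels in the mapped space.
    x1, y1, z1 = color_to_xyz(*c1)
    x2, y2, z2 = color_to_xyz(*c2)
    return sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)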

The first algorithm written with this new color mapping worked very well despite its

simplicity. The algorithm compared the distances between the color values of pixels on either

side of the pixel being tested. It did this to all pixels within a predetermined distance away from

the pixel being tested along the horizontal and vertical directions in the image. If the calculated distance between the color values of the pixels on either side exceeded a threshold, then the pixel being tested was marked as an edge. Here is an image diagram of

the algorithm, the results, and the results compared with other edge detectors:


(Images from Ehsan Nadernejad’s “Edge Detection Techniques: Evaluations and Comparisons”)
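
A sketch of that first color-mapping edge detector follows, reusing the color_distance helper from the previous listing; the radius and threshold are illustrative rather than the tuned values from the project.

import numpy as np

def color_map_edges(rgb, radius=2, threshold=8000):
    # rgb: assumed array of shape (height, width, 3).
    rgb = rgb.astype(int)                    # avoid uint8 overflow in the mapping
    h, w, _ = rgb.shape
    edges = np.zeros((h, w), dtype=np.uint8)
    for yy in range(radius, h - radius):
        for xx in range(radius, w - radius):
            # Distance in the mapped color space between the pixels on either
            # side of the tested pixel, horizontally and vertically.
            horiz = color_distance(rgb[yy, xx - radius], rgb[yy, xx + radius])
            vert = color_distance(rgb[yy - radius, xx], rgb[yy + radius, xx])
            if horiz > threshold or vert > threshold:
                edges[yy, xx] = 255          # mark the tested pixel as an edge
    return edges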

The edge detection algorithms listed so far are all the ones that have been developed

and tested. I still have two more I would like to develop and test, so the general ideas behind

the algorithms are included here. The first builds further on the color mapping I developed. The

algorithm uses the color mapping to parse through an image and group pixels together

according to which pixels are closest in value to each other. It does this recursively by matching every pixel in an image to the neighboring pixels that are closest in value. I created an

algorithm to attempt this methodology, but it took several minutes to go through one image

and did not produce accurate results. The second algorithm yet to be developed and tested can

be added onto any of the algorithms listed in this paper. Essentially the algorithm checks to

make sure a pixel marked as an edge is part of a line and not just random values or single pixels

in an image. Also, the algorithm will be able to fill in missing pixels based on the presence of

surrounding edge values.
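
That second idea is also untested, but a first cut of the line-membership check might look like the following, where an edge pixel is kept only if enough of its eight neighbors are edges as well (the neighbor count is a placeholder):

import numpy as np

def clean_edges(edges, min_neighbors=2):
    # edges: binary edge map from any of the detectors above.
    h, w = edges.shape
    cleaned = np.zeros_like(edges)
    for yy in range(1, h - 1):
        for xx in range(1, w - 1):
            if edges[yy, xx]:
                window = edges[yy - 1:yy + 2, xx - 1:xx + 2]
                # Count edge neighbors, excluding the pixel itself.
                if np.count_nonzero(window) - 1 >= min_neighbors:
                    cleaned[yy, xx] = edges[yy, xx]
    return cleaned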


The results of the edge detection were easiest to see in regular images, but the

algorithms are fully applicable in OCR. Here is an example of text in an image that has been run

through the color mapping edge detection algorithm:

The edge detection algorithms will help with recognition in the next section.

1.6 Dopamine Maps

With the lower level processing resolved, I still needed a way to match characters and

objects. The general ideas for the algorithms to match characters and objects come from a

prediction and pattern approach. Generally humans define objects by their features. For

example, a face contains two eyes, a nose, and a mouth. The letter “A” is made up of two

slanted lines which meet at a point at the top with a horizontal line in the middle. I followed

this methodology rather than trying to match objects by templates like many algorithms before.

My way of matching assumes smooth transitions between areas and lines. In this way it can

predict what the next value would be much like the first algorithm I described for edge

detection. If the predictions the algorithm makes do not match up, then it marks the points as

surprises or unexpected values. In the same way that dopamine in your brain is released by


unexpected events, the algorithm marks points that it did not expect. I will explain these

algorithms in detail, but the programs behind the algorithms are not working yet.

There are three types of dopamine maps I have developed and plan to implement. The

first map is called the boundary dopamine map which describes the outside shape of an object.

The algorithm traces along the boundary of an object predicting where the next point should be

based on previous points and their slopes. If the point predicted by the algorithm is not part

of the boundary of the object, then that point is marked in the dopamine boundary map. A

good example is the letter “A” again. Here the algorithm explores the boundary of the letter

and marks anything it did not expect. In this case it marks the ends of two slanted lines since it

expected the lines to continue. It also marks the point at the top and the intersections of the

horizontal and the slanted lines because the change in slope is very extreme. It did not mark those points simply because they were corners; any extreme change in slope will create a surprise

point according to the algorithm. The image below shows the final output from this type of

algorithm.

The points marked from the boundary dopamine algorithm are only relational to each other. In

other words, these points can show up in any orientation and will still be matched as long as

their distances and orientations relative to each other are the same.
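
Since the program for this map is not working yet, the following is only a sketch of the idea under my own assumptions: the boundary is an ordered list of (row, column) points, the prediction is a straight-line extrapolation of the recent slope, and a surprise is recorded when the predicted point does not land on the boundary.

import numpy as np

def boundary_surprises(points, window=4):
    # points: ordered boundary points, e.g. traced from an edge map above.
    boundary_set = set(points)
    surprises = []
    for i in range(window, len(points)):
        recent = np.asarray(points[i - window:i], dtype=float)
        step = (recent[-1] - recent[0]) / (window - 1)    # average recent slope
        predicted = recent[-1] + step
        cell = (int(np.rint(predicted[0])), int(np.rint(predicted[1])))
        if cell not in boundary_set:                      # prediction missed the boundary
            surprises.append(points[i])                   # mark a surprise (dopamine) point
    return surprises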


The next map is called the interior dopamine map. Again, the algorithm makes

predictions about what it expects to see and marks anything that does not match up. The

interior dopamine maps algorithm assumes smooth transitions between surfaces. It tracks

along the image and continues predicting what the next values should be based on the previous

values. For example, we defined a face earlier as being made up of two eyes, a nose, and a

mouth. Those objects distinguish a face, and that is exactly what this type of algorithm would

mark. The features that show up inside of an object and their relation to each other define

what an object is. It only makes sense to design an algorithm to try this. Also, once the algorithm has been trained on what a nose, eyes, and mouth are, it can define a face in the

same way. So, instead of saying there should be feature points in this position relative to this

position, it can build from the bottom up. If the algorithm finds an eye, it would expect another

eye and a nose to be present for a face.

The final map is called the background dopamine map. It is defined by using the border

and interior methods. The only difference is every object stores the types of backgrounds it can

be found on. This is to avoid confusion between objects that look like other objects. If the

algorithm saw a marble that looked like an eye, the above algorithms would try to label it as an

eye. By including the background, the algorithm can figure out that it is not an eye since it does

not occur on a face.

1.7 Generating Dopamine Maps and Their Importance

The maps are generated from a pre-labeled data set, with a dopamine map generated for each image in the set. An algorithm would then find the


correlation among the maps and create a new map taking into account the differences between

the pictures. If a dopamine map is sufficiently different from other maps, then learning will take

place and a new map would be created alongside the old one. The need for learning becomes

apparent when we look at the letter “a” in different fonts which may be represented as “a” or

“a”. These correlation maps allow a general description for an object that is self relational and

definition based so that it can recognize different types of objects it has not seen before.

These maps seem to be a much more human way of thinking about things than

template matching. When we are asked to describe what an object looks like, we are not

drawing an exact object from a template stored in our brain. Instead we draw an object based on its definition. For the letter “E” we understand that it is a vertical line with horizontal lines that extend a certain distance from the top and bottom and another horizontal line that extends from the middle. These maps allow learning and adjustments to be made within the

program constantly.

1.8 Smart Algorithms

The focus so far has been on algorithms that allow for learning and prediction of values.

These are present in the human cognitive process and provide what should be a much more

dynamic object recognition process. Also, all the implementations have been simple

mathematical operations which can be easily and quickly performed. I intentionally stayed

away from complex mathematics such as Gaussian smoothing, eigenvectors, or statistical analysis because it is hard to see how our brain's neural cells could implement these operations.


Also, the methods I have written so far have all had customizable values or thresholds. This

allows the program to learn and adapt so that it runs efficiently and outputs the best results.

1.9 Conclusion and Results

I have very few results to report at this time because of time constraints. I can report

results on the edge detection algorithm, though. It appears to perform at the same level as

more complicated algorithms as shown in the included pictures. Also, the implementation in

OCR will help out greatly. I ran the program on sample text, and it performed better than I could

have expected. The text images were full of different text colors, text sizes, and background

colors. The program was able to successfully separate out the characters into a binary black and

white image. The next step will be to implement the border prediction algorithm.

I plan to continue this research outside of class since it looks promising, and I look forward to working on it. The working code used so far is attached.

2.0 Acknowledgments

I would like to thank Dr. Abkemeier and the Fontbonne Math Department for letting me

pursue this area of interest. This paper was submitted to the faculty of Fontbonne University’s

Department of Mathematics and Computer Science in partial fulfillment of the requirements for the degree of

Bachelor of Science in Mathematics.


2.1 References

Chetverikov, Dmitry. Is Computer Vision Possible? Rep. N.p.: n.p., n.d. Print.

Holley, Rose. "How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale

Historic Newspaper Digitisation Programs." D-Lib Magazine N.p., n.d. Web. 5 Apr. 2013.

<http://www.dlib.org/dlib/march09/holley/03holley.html>.

How Brain Science Will Change Computing. Dir. Jeff Hawkins. TED, n.d. Web.

Lowe, David. The Computer Vision Industry. N.p., n.d. Web. 6 Apr. 2013.

<http://www.cs.ubc.ca/~lowe/vision.html>.

McCann, John J. Human Color Perception. Cambridge: Polaroid Corporation, 1973. Print.

Nadernejad, Ehsan. "Edge Detection Techniques: Evaluations and Comparisons." Mazandaran

Institute of Technology, n.d. Print.

Ritter, G. X., and Joseph N. Wilson. Handbook of Computer Vision Algorithms in Image Algebra.

Boca Raton: CRC, 1996. Print.

Shimojo, Shinsuke, Michael Paradiso, and Ichiro Fujita. What Visual Perception Tells Us about

Mind and Brain. Rep. N.p., n.d. Web. 5 Apr. 2013.

<http://www.pnas.org/content/98/22/12340.full>.

Stergiou, Christopher, and Dimitrios Siganos. "Neural Networks." Neural Networks. N.p., n.d.

Web. 5 Apr. 2013.

<http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html>.

Szeliski, Richard. Computer Vision: Algorithms and Applications. London: Springer, 2011. Print.

"What's OCR?" Data ID. N.p., n.d. Web. 5 Apr. 2013. <http://www.dataid.com/aboutocr.htm>.