mcgillivary thesis

Upload: craig-mcgillivary

Post on 09-Apr-2018

  • 8/8/2019 McGillivary Thesis

    1/118

    QUANTIFYING NOISE EFFECTS IN BILEVEL DOCUMENT IMAGES

    by

    Craig D. McGillivary

    A thesis

    submitted in partial fulfillment

    of the requirements for the degree of

    Master of Science in Electrical Engineering

    Boise State University

    October, 2007

    © 2007 Craig D. McGillivary

    ALL RIGHTS RESERVED

    The thesis presented by Craig D. McGillivary entitled Quantifying Noise Effects in Bilevel Document Images is hereby approved

    Elisa H. Barney Smith Date

    Advisor

    Tim Andersen Date

    Committee Member

    Jim Browning Date

    Committee Member

    John R. Pelton Date

    Dean of the Graduate College

    ACKNOWLEDGEMENTS

    I would like to thank Dr. Barney Smith, who encouraged me to get a graduate degree and

    then poked and prodded me until I completed it. I would not have succeeded without her

    support and mentorship.

    I would also like to thank my friends and family. They helped me to overcome the

    stress and frustrations that came up from time to time as I worked towards my goals. I

    would especially like to thank my coworkers Joetta Anderson, Chris Hale, Darrin Reed, and

    Jim Steele, who acted as sounding boards for ideas, reviewed and edited my writing, and

    provided friendship in this journey. I am honored and grateful to my committee members

    Dr. Tim Andersen and Dr. Jim Browning, who provided time and energy to review this

    thesis.

    This material is based upon work supported by the National Science Foundation

    under Grant No. CCR-0238285. Any opinions, findings, and conclusions or

    recommendations expressed in this material are those of the author(s) and do not

    necessarily reflect the views of the National Science Foundation.

    ABSTRACT

    The effect of binarization via global thresholding on additive Gaussian noise in high

    contrast images is explored. A measure of noise in bilevel images called noise spread is

    developed with the use of a degradation model that applies to many image degradations

    included in desktop scanning. When high contrast images are binarized, noise is

    concentrated on the edges of objects in the image. Noise spread is the breadth of the

    domain in which pixels are affected by noise after binarization. It depends on both the

    noise level and the gradients of the image prior to thresholding.

    There is a strong linear relationship between noise spread and the expected Hamming

    distance between an image with noise added and the same image without noise added. It

    is also known that if two images of an object are synthetically generated with

    independent random phases and zero noise, there is a small Hamming distance between

    them. Experiments on circles and on a 5 were run to determine the combined effect of

    random independent phase and noise spread on the expected Hamming distance. The results

    show that the two factors are not additive and that the phase effects become less significant

    when the noise spread increases. The degree to which this is true depends on the shape of the object

    being scanned.

    In addition to experiments on Hamming distance, experiments were run to determine

    the geometric precision of images with noise. This includes experiments relating noise

    spread to the localizability of straight edges at several different orientations. The

    localizability of an edge is defined by the ability to determine the orientation and

    position of an edge segment. The quality of edge measurements is quantified by the angle

    between the measured edge and the true edge and by the distance between the measured

    edge segment and the midpoint of the true edge segment. Surprisingly, the distance

    measurements for edges at certain orientations actually become more precise when the noise spread

    increases, but this variation is offset by less precision in the measurements of edge

    orientation. For most edge orientations the precision of both distance measurements and

    orientation measurements decreases when noise spread increases. Experiments relating

    noise spread to the localizability of circles were also conducted. These experiments

    reveal that the positional error in circle measurements has a Rayleigh distribution, while

    the radius measurements have a normal distribution, and that circle localizability

    decreases as noise spread increases.

    Noise spread provides a strong theoretical foundation for future research. Since

    random effects play a critical role in optical character recognition (OCR) and in pattern

    recognition generally, it is important to understand and quantify them. Future research

    will focus on relating noise spread to human preference, on finding novel techniques for

    measuring noise spread directly from binary images, and on developing filters and other

    techniques which will make OCR systems less susceptible to noise. Noise spread may

    also have unforeseen applications in problems other than document research.

    TABLE OF CONTENTS

    LIST OF FIGURES

    LIST OF TABLES

    LIST OF SYMBOLS, TERMS & ABBREVIATIONS

    1. INTRODUCTION

    2. TECHNICAL BACKGROUND

        2.1. GENERATING SYNTHETIC SCANNED IMAGES

            2.1.1. Basic Scanner Model

            2.1.2. Scanned Straight Edges

            2.1.3. Scanned Strokes

            2.1.4. Scanned Circles

            2.1.5. Scanned Characters

        2.2. EDGE FINDING TECHNIQUES

        2.3. CIRCLE FITTING TECHNIQUES

        2.4. EFFECT OF SAMPLING

    3. NOISE SPREAD THEORY

        3.1. NOISE SPREAD FOR STRAIGHT ISOLATED EDGES

        3.2. EXTENDING NOISE SPREAD TO GENERAL SHAPES

        3.3. RELATIONSHIP BETWEEN NOISE SPREAD AND HAMMING DISTANCE

        3.4. NOISE SPREAD WITH VARYING NOISE LEVELS

    4. HAMMING DISTANCE BETWEEN SCANNED OBJECTS

        4.1. HAMMING DISTANCE BETWEEN SCANNED EDGES

        4.2. HAMMING DISTANCE OF SCANNED CIRCLES

        4.3. HAMMING DISTANCE OF SCANNED CHARACTERS

    5. GEOMETRIC MEASUREMENTS OF BILEVEL SCANS

        5.1. LOCALIZABILITY OF SCANNED EDGES

            5.1.1. Comparing Various Approaches to Finding Edges

            5.1.2. Effect of Threshold

            5.1.3. Effect of PSF Width

            5.1.4. Effect of Edge Orientation

        5.2. LOCALIZABILITY OF SCANNED CIRCLES

    6. CONCLUSIONS AND FUTURE WORK

    REFERENCES

    APPENDIX A

    Iso Curves for Cauchy and Gaussian PSFs

    APPENDIX B

    Code for Calculating

    APPENDIX C

    Code for Calculating Ri

    LIST OF FIGURES

    Figure 1.1 The ideal edge and the noisy edge have the same phase and orientation. The Hamming distance between the two images is the number of pixels that are different between the two images.

    Figure 2.1 This scanner model is used to determine the value of the pixel f[i,j] centered on each sensor element.

    Figure 2.2 Gaussian ESFs are shown with w1=1 and w2=2. When the image is blurred and thresholded the position of the edge is shifted from the dotted step function to the solid edge function. This shift is called c and is by convention positive when the edge is shifted to the left.

    Figure 2.3 When a stroke characterized by two parallel edges is scanned the resulting stroke thickness changes. Interference between the parallel edges causes the grayscale value of pixels to be less than that predicted by the ESF. As a result the stroke thickness will be less than that predicted by c.

    Figure 2.4 The Iso Curves for a Cauchy PSF are significantly different from the c curves even when the stroke width is 15.

    Figure 2.5 When a circle is scanned its radius changes. As with scanned strokes, if the threshold is too high the circle will disappear when it is scanned.

    Figure 2.6 Algebraic Fitting can sometimes result in a poor fit. The center of the Algebraic fit is (10.24,20.98) and the radius is 4.83. The center of the Geometric fit is (10.10,7.92) and the radius is 11.77.

    Figure 2.7 Spirographs are a useful tool for studying the phase effects of straight edges. This spirograph was created using N=20 and an edge with a 35 degree angle of inclination. The marks on the circle represent the locations of sample points relative to the edge. The lines on the interior of the circle connect sample points that are on adjacent columns of the image. The location of the edge can be represented by a point on the circle.

    Figure 2.8 (a) The variance of the distance between the measured and actual edges is determined for edges that are 20 pixels long. (b) The variance in the angular error is determined for noiseless edges that are 20 pixels long.

    Figure 3.1 Edges with varying amounts of noise spread. While the standard deviation of the noise in the first three images is the same, the noise spread is different. The picture on the far left shows an extreme amount of noise.

    Figure 3.2 (a) Edge after blurring with a generic PSF of width w. When no noise is added, the thresholding produces the edge shift c. (b) Edge with noise. The uncertain boundary, shown in gray, is the noise spread region. The effects of sampling are not shown.

    Figure 3.3 The threshold probability (THP) function is shown for a Gaussian PSF with w=1, =0.7 and noise=0.1. Noise spread in this case is about .72. The piecewise approximation of threshold probability is inaccurate at the tails of the THP function.

    Figure 3.4 The threshold probability function given in Equation 3.1 (solid) is compared to the approximation in Equation 3.8 (dashed) with the parameters w=1, =0.7 and =0.1. The actual threshold probability is a little lower on the tails.

    Figure 4.1 The relationship between the expected Hamming distance and the noise spread is shown for (a) 13 different PSF widths, (b) 7 different thresholds, and (c) 8 different angles of inclination. The PSF width, threshold and orientation of the edge don't affect the relationship between Hamming distance and noise spread as long as it is not a degenerate orientation with a slope that can be represented by an irreducible fraction of small integers such as 0 degrees.

    Figure 4.2 Hamming Distance versus noise spread for circles of varying radii. (a) Both images have the same phase shift. (b) Images have random independent phases. (c) Noise spread is plotted against the Hamming distance for both in-phase and independent phase together using the average of the results from each radius. The effect of phase is less significant when the noise spread is higher.

    Figure 4.3 (a) Results of in-phase experiments. (b) Results of independent phase experiments show a strong relationship between noise spread and expected Hamming distance.

    Figure 4.4 The effect of using an independent phase becomes less significant as noise increases.

    Figure 5.1 An example of a straight edge scanned with w=3, =.3 and NS=.6. The solid line shows the position of the original edge. The dotted line is the theoretical position of the scanned edge which results from shifting by c. The center point is the midpoint of the theoretical edge segment.

    Figure 5.2 (a) The variance of angle measurements from perpendicular fitting and standard fitting outperform the other methods and LoG is particularly poor. (b) The variance of distance measurements clearly shows that standard and perpendicular fitting both perform markedly better than the other methods.

    Figure 5.3 (a) The perpendicular and standard fitting both have low perpendicular bias. However, this is not true of the other methods. (b) There is a small angular bias in the standard fitting compared to the perpendicular fitting. LoG has a very large angular bias.

    Figure 5.4 (a) Variance of angular error with a fixed sampling grid. (b) Variance of distance measurements for fixed sampling grid.

    Figure 5.5 (a) Bias in angle measurement with a fixed sampling grid. (b) Bias in edge position measurement with a fixed sampling grid.

    Figure 5.6 (a) Variance of angular error with a random sampling grid. (b) Variance of distance measurements for random sampling grid.

    Figure 5.7 (a) Bias in angle measurement. (b) Bias in edge position measurement.

    Figure 5.8 (a) Variance of angular error with a random sampling grid and several different values of w. (b) Variance of distance measurements for a random sampling grid and several different values of w.

    Figure 5.9 (a) Bias in the orientation measurements of edges with fixed sampling grid and several different values of w. (b) Bias of distance measurements for a fixed sampling grid and several different values of w.

    Figure 5.10 (a) Variance of angular error with a random sampling grid and several different values of w. (b) Variance of distance measurements for a random sampling grid and several different values of w.

    Figure 5.11 (a) Bias in the orientation measurements of edges with random sampling grid and several different values of w. (b) Bias of distance measurements for a random sampling grid and several different values of w.

    Figure 5.12 (a) Bias in the orientation measurements of edges with a fixed sampling grid and several different edge orientations. (b) Bias in the distance measurements of edges with a fixed sampling grid and several different edge orientations.

    Figure 5.13 (a) Variance in the orientation measurements of edges with a fixed sampling grid and several different edge orientations. (b) Variance of the distance measurements of edges with a fixed sampling grid and several different edge orientations.

    Figure 5.14 (a) Variance of angular error with a random sampling grid and several different edge orientations. (b) Variance of distance measurements for random sampling grid and several different edge orientations.

    Figure 5.15 (a) Bias in the orientation measurements of edges with random sampling grid and several different edge orientations. (b) Bias of distance measurements for random sampling grid and several different edge orientations.

    Figure 5.16 (a) Variance of angular error with a fixed sampling grid and degenerate edge orientations. (b) Variance of distance measurements for a fixed sampling grid and degenerate edge orientations.

    Figure 5.17 (a) Bias of angular error with a fixed sampling grid and degenerate edge orientations. (b) Bias of distance measurements for a fixed sampling grid and degenerate edge orientations.

    Figure 5.18 (a) Variance of angular error with a random sampling grid and degenerate edge orientations. (b) Variance of distance measurements for a random sampling grid and degenerate edge orientations.

    Figure 5.19 (a) Bias in the orientation measurements of edges with random sampling grid and degenerate edge orientations. (b) Bias of distance measurements for random sampling grid and degenerate edge orientations.

    Figure 5.20 Distance and Angular Error for 0 degree edge without noise.

    Figure 5.21 Distance and Angular Error for 20 degree edge without noise.

    Figure 5.22 Distance and Angular Error for 0 degree edge with NS=.3.

    Figure 5.23 Distance and Angular Error for 20 degree edge with NS=.3.

    Figure 5.24 (a) The radius measurements from Experiment 1 have a normal distribution. (b) The distances between the measured circle centers and the actual circle centers from Experiment 1 have a Rayleigh distribution.

    Figure 5.25 (a) The Rayleigh parameter of the positional error for several different values of w. (b) Variance of radius measurements for several different values of w.

    Figure 5.26 (a) The Rayleigh parameter of the positional error for several different values of. (b) Variance of radius measurements for several different values of.

    LIST OF TABLES

    Table 2.1 Simple operators used for edge detection.

    Table 3.1 Functions for calculating blurred noise.

    Table 4.1 Edge experiment parameters.

    Table 5.1 Parameters for threshold experiments.

    Table 5.2 Parameters for PSF width experiments.

    Table 5.3 Parameters for edge orientation experiments.

    Table 5.4 Circle experiment parameters.

    LIST OF SYMBOLS, TERMS & ABBREVIATIONS

    OCR Optical Character Recognition

    PSF Point Spread Function

    ESF Edge Spread Function

    LSF Line Spread Function

    CSF Circle Spread Function

    o(x,y) bilevel continuous input image

    s(x,y) intensity of pixels at (x, y) before noise

    s(x) intensity of pixels at position x before noise

    s[i,j] sampled unquantized output of blurring convolution

    f[i,j] final binary scanner output

    n[i,j] Gaussian noise added in degradation model

    noise standard deviation of Gaussian noise

    w general width parameter for PSF

    binarization or threshold level

    Distance between samples

    max binarization or threshold level which causes stroke to disappear.

    c This is the distance that an edge shifts when it is scanned

    Thickness of a scanned stroke before scanning

    scanned Thickness of a scanned stroke after scanning

    Amount that the thickness of a scanned stroke increases by

    Rf Scanned circle radius

    Ri Original circle radius

    Cutoff used for noise spread

    Z Z value for alpha cutoff for noise spread

    (x) Density of pixels

    v(x,y) Noise on source image

    qs(x,y) Variance of noise on source image

    qb(x,y) Variance of noise after blurring

    Angle of inclination for edge

    1. INTRODUCTION

    High contrast images, which occur frequently in document images, are often digitized

    into binary images. These binary images are then analyzed by optical character

    recognition (OCR) systems which convert the text images into ASCII characters. OCR

    systems depend on many features measured from the text images. They use large sets of

    images that have already been classified and ideally are representative of the characters in

    document images. Features are then measured from these training sets and those

    measurements are used to label unclassified characters. One way of generating these

    training sets is to generate large numbers of synthetic character images whose labels are

    known and then to use these images to train the OCR system. To do this effectively it is

    best if the synthetic characters are as similar as possible to real scanned characters. It is

    not enough that individual characters be similar to characters in real document images;

    the statistical properties of large numbers of characters need to match those of real

    characters. This means that a good theoretical basis for the nondeterministic effects in the

    generation of scanned characters is required.

    Part of the nondeterministic aspects of generated characters is the random position or

    phase of the sampling grid relative to a scanned object. When the sampling grid is shifted

    relative to a continuous character, a large number of different bitmaps can result. It is

    possible to determine the number and frequency of each of these bitmaps using modulo

    grid diagrams [1]. Modulo grid diagrams are formed by performing a modulo one

    operation with respect to both the horizontal and vertical coordinates of an object's

    boundary. The effects of random phase can be incorporated into OCR by generating

    training sets with random phases.
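
The modulo one operation and the random phase can be illustrated with a short sketch. It only folds the boundary points of a circle into the unit square and draws one random grid offset; it does not reproduce the full modulo grid diagram construction of [1]. The function names, the circle parameters, and the choice of a uniform phase over one sample spacing are illustrative assumptions, not details taken from the thesis.

```python
import math
import random


def circle_boundary(center_x, center_y, radius, num_points=360):
    """Sample points on a circle boundary, in units of the sample spacing."""
    return [
        (center_x + radius * math.cos(2 * math.pi * k / num_points),
         center_y + radius * math.sin(2 * math.pi * k / num_points))
        for k in range(num_points)
    ]


def fold_modulo_one(points):
    """Apply a modulo-one operation to both coordinates of each boundary point."""
    return [(x % 1.0, y % 1.0) for x, y in points]


def random_phase(rng):
    """Draw a sampling-grid offset uniformly from the unit square (assumed distribution)."""
    return rng.random(), rng.random()


rng = random.Random(0)  # fixed seed so the sketch is repeatable
phase_x, phase_y = random_phase(rng)
boundary = circle_boundary(5.0 + phase_x, 5.0 + phase_y, radius=3.2)
folded = fold_modulo_one(boundary)
```

Every folded point lies inside the unit square, so the folded boundary shows where the object outline falls relative to a single sampling cell. Drawing a new phase for each synthetic character is one way a training set with random phases could be produced.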

    Unfortunately random phase is not the only nondeterministic aspect of scanned

    characters. Another source of randomness is the noise that is in the document image prior

    to digitization and the noise that is added during scanning. When the image is binarized,

    the noise becomes concentrated on the boundaries of objects. In order for training sets to

    have the same statistical properties as real scanned characters, it is necessary for the

    training sets to have the appropriate amount of noise. Before this problem can be

    addressed the amount of noise on the edges of binary images must be quantified and

    understood theoretically.

    The amount of noise in binary images is not only dependent on the amount of noise

    added to the image, but on the shape of the object being scanned and on the scanner

    model parameters. This research focuses on quantifying the noise in bilevel images and

    on relating that amount of noise to the parameters of a commonly used degradation

    model. The quantity that was developed is called noise spread. Noise spread is the size of

    the domain over which pixels in the binary image are affected by noise. While the

    research in this thesis focuses on document images, the concept of noise spread could be

    applied to other situations where an image is binarized.

    Noise spread provides a theoretical basis for understanding and measuring noise in

    document images. It is critical for developing methods to mitigate the effects of noise in

    binary images. Noise spread allows bilevel images to be created with different

    degradation parameters but the same amount of noise. Filters can be tested to see if the

    negative effects of noise can be suppressed. Measurement techniques can also be

    developed to measure noise directly from document images. The noise in a binary

    document image can be measured, and the training sets that are used to design an OCR

    system could have the same levels of noise as the documents that the OCR system is

    being applied to. Alternatively, the noise measurement could be incorporated into a general

    OCR system as an additional feature.

    Noise in binary images is equivalent to errors in general binary signals. In

    information theory the Hamming distance between two binary signals of the same length

    is the number of bits that are different between the two signals. The amount of noise in a

    bilevel image can reasonably be measured by the Hamming distance between the image

    with noise and the same image without noise. Figure 1.1 shows the Hamming distance for

    images of a straight edge with and without noise. Theory about the relationship between

    Hamming distance and noise spread will be introduced in Section 3.3, and experiments to

    verify this theory are described in Chapter 4. Chapter 4 also includes experiments on the

    relationship between the Hamming distance and noise spread when the phases of the two

    objects are independent.
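
The Hamming distance between two bilevel images of the same size can be computed by counting the pixel positions at which they differ. The following is a minimal sketch; the function name and the two small example images are hypothetical and are not data from the thesis.

```python
def hamming_distance(image_a, image_b):
    """Count the pixels that differ between two equally sized bilevel images."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


# Hypothetical 3x4 images of a vertical edge (0 = white, 1 = black).
ideal_edge = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
noisy_edge = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
print(hamming_distance(ideal_edge, noisy_edge))  # prints 2
```

In this example the differing pixels both lie next to the edge, which mirrors the observation that binarization concentrates noise on object boundaries.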

    Figure 1.1: The ideal edge and the noisy edge have the same phase and orientation. The Hamming

    distance between the two images is the number of pixels that are different between the two

    images. (Panels: Ideal Edge, Noisy Edge, Hamming Distance.)

    Since objects with noise are likely harder to precisely locate in an image, several

    localization experiments were conducted to verify that noise spread accurately quantifies

    the noise in thresholded images. The experiments were designed to verify that noise

    spread accounts for all the effects of different document degradation parameters on the

    localizability of circles and straight edges. There are applications where the ability to

    precisely locate edges in bilevel images is important. For instance, Hok Sum Yam [2]

    developed a method for finding the degradation parameters from bilevel document

    images. The method depended on precise edge measurements in order to determine how

    much the corners of characters were eroded by scanning.

    Chapter 2 discusses the technical background. That chapter includes a discussion of

    how synthetic scanned images are created, a review of edge and circle localization

    techniques, and a discussion of the effect of sampling. Chapter 3 introduces noise spread

    in great detail and shows that it is related to Hamming distance. Chapter 4 provides an

    experimental relationship between Hamming distance and noise spread that reinforces

    and expands upon the relationship theorized in Chapter 3. Experiments in Chapter 5

    provide evidence that all of the effects of the document degradation parameters on the

    localizability of scanned edges and circles are determined by the noise spread.

    2. TECHNICAL BACKGROUND

    Before the theory behind noise in binarized images can be presented, it is necessary to

    discuss the basic degradation model upon which it is based. It is also necessary to review

    literature on the precise geometric measurements of straight edges and circles in images.

    Determining the effect of noise on localizability is important to prove that noise spread is

    a good quantitative measure of edge noise in bilevel images. The effects of phase are also

    important because phase affects both localizability and Hamming distance between

    objects.

    The degradation model describes the acquisition of a binary scanned image as a

    multistage process whose steps include: convolving with a point spread function (PSF),

    sampling, adding noise, and thresholding. Section 2.1 will discuss the document

    degradation model in more depth and show how the model can be applied to scanned

    edges, circles, and strokes. A method for determining the gradients of the simulated grey

    level images is also discussed.

    Edge finding is a very basic computer vision task and has been widely studied. The

    experiments in this thesis are designed to determine the localizability of scanned edges

    under different amounts of noise. Section 2.2 provides a review of the literature on edge

    finding techniques and discusses in detail the edge finding techniques that were used in

    this thesis.

    There are several techniques for localizing circles. All the techniques that were

    considered use least squares, but some techniques work better than others. Section 2.3

    discusses these circle localization techniques and describes the Gauss-Newton algorithm

    used to solve the nonlinear least squares problem.
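
As a rough illustration of the nonlinear least squares problem, the sketch below fits a circle by minimizing the sum of squared radial distances from the points to the circle with Gauss-Newton iterations. It is not the formulation from Section 2.3: the function names, the centroid-based initial guess, and the fixed iteration count are all assumptions made for this example.

```python
import math


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_circle_gauss_newton(points, iterations=20):
    """Geometric circle fit; returns (center_x, center_y, radius)."""
    # Assumed initial guess: centroid of the points and mean distance to it.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r = sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]  # J^T J
        jtr = [0.0] * 3                      # J^T residual
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            residual = d - r                 # signed radial distance to the circle
            jac = (-(x - cx) / d, -(y - cy) / d, -1.0)
            for i in range(3):
                jtr[i] += jac[i] * residual
                for j in range(3):
                    jtj[i][j] += jac[i] * jac[j]
        # Gauss-Newton step: solve (J^T J) step = -J^T residual.
        step = solve_3x3(jtj, [-v for v in jtr])
        cx, cy, r = cx + step[0], cy + step[1], r + step[2]
    return cx, cy, r


# Noise-free points on a half circle with center (2, -1) and radius 3.
arc_points = [
    (2.0 + 3.0 * math.cos(math.pi * k / 29), -1.0 + 3.0 * math.sin(math.pi * k / 29))
    for k in range(30)
]
fitted = fit_circle_gauss_newton(arc_points)
```

The half-circle example starts from a deliberately poor guess, because the centroid of an arc is not the circle center, and the iterations still recover the true parameters for noise-free points.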

    The effects of sampling images have been studied extensively. Section 2.4 reviews

    the literature on sampling effects, discusses the effect of sampling and edge orientation

    on the localizability of scanned edges, and also discusses tools like the modulo-grid

    diagram which provide a means for predicting the possible bitmaps that result from

    different shapes due to phase effects alone.

    2.1. Generating Synthetic Scanned Images

    This thesis uses a degradation model based on a model proposed by Baird [3]. In that

    model an ideal continuous bilevel image is convolved with a point spread function (PSF)

    and sampled. Then Gaussian noise is added to the image to represent noise added

    during scanning and noise that would have been originally present on the paper image.

    Finally, the image is binarized at a certain threshold level as shown in Figure 2.1.
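    The four stages of this model can be illustrated for a single vertical edge. The Python sketch below is only an illustration and is not the code used in this thesis. It assumes a Gaussian PSF of width w, so that the blurred edge profile is the Gaussian cumulative distribution, it places black to the right of the edge, and it labels a pixel black when its noisy value is at or above the threshold. The names gaussian_esf and scan_edge are hypothetical.

```python
import math
import random


def gaussian_esf(x, w):
    """Blurred intensity of a unit step edge at x = 0 under a Gaussian PSF of width w."""
    return 0.5 * (1.0 + math.erf(x / (w * math.sqrt(2.0))))


def scan_edge(rows, cols, edge_pos, w, sigma_noise, threshold, rng):
    """Simulate one bilevel scan of a vertical edge: blur and sample, add noise, threshold."""
    bitmap = []
    for _ in range(rows):
        row = []
        for j in range(cols):
            s = gaussian_esf(j - edge_pos, w)        # blurred, sampled intensity
            a = s + rng.gauss(0.0, sigma_noise)      # independent additive Gaussian noise
            row.append(1 if a >= threshold else 0)   # binarize (assumed tie rule: black)
        bitmap.append(row)
    return bitmap


rng = random.Random(0)
clean = scan_edge(4, 10, edge_pos=4.3, w=1.0, sigma_noise=0.0, threshold=0.5, rng=rng)
noisy = scan_edge(4, 10, edge_pos=4.3, w=1.0, sigma_noise=0.1, threshold=0.5, rng=rng)
```

    With the noise standard deviation set to zero every row of the bitmap is identical; with noise, only pixels whose blurred intensity lies close to the threshold are likely to flip.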

    This section begins with a detailed mathematical description of the scanner model.

    Then in subsequent subsections this model is applied to straight edges, scanned strokes,

    circles, and general scanned characters. In order to understand how noise affects scanned

    bilevel images, we will need to determine the intensity gradients of the image before

    thresholding. These gradients help determine how noisy a bilevel image will be. A

    technique for obtaining these gradients is discussed for each of the different types of

    scanned objects.

    2.1.1. Basic Scanner Model

    The basic scanner model describes the sampling of the spatially continuous image of

    blackness or absorptance, o(x,y), where absorptance is one minus the reflectance. The

    values of o(x,y) can be either 0 (white) or 1 (black). The image is digitized by a sensor

    array in the scanner. A PSF is used to model the fact that for each point on a physical

    paper image different amounts of light are reflected to each sensor. The PSF is the 2-D

    equivalent to the impulse response of a scanner. This equivalence means that convolution

    can be used to predict the amount of reflected light each sensor detects. If the image is

    sampled at points (x_j, y_i) on a rectangular grid, then the image is given by

    \[ s[i,j] = \iint \mathrm{PSF}(x_j - u,\, y_i - v)\, o(u,v)\, du\, dv. \tag{2.1} \]

    This equation assumes that the scanner is spatially invariant over the field of view, which

    is valid for small regions. In order to model the noise that would exist on the original

    image and the noise that is added during scanning, Gaussian noise n[i,j] is added to the

    image

    \[ a[i,j] = s[i,j] + n[i,j]. \tag{2.2} \]

    The noise is added to every sensor independently and has a mean of zero and a standard

    deviation of σ_noise. Other types of additive noise could be used as well.

    To produce a bilevel image the intensity is quantized using a thresholding operation

    [ ][ ]

    [ ]

    (2.23)

    and

    c 2 . (2.24)

    The lower bound comes from the fact that the scanned stroke width cannot be negative, and the

    upper bound comes from Equation 2.20, Equation 2.13 and the fact that the ESF is always

    positive. These upper and lower bounds can be used with the bisection method of root

    finding to find the scanned stroke width.

    Values of w and Θ which result in a given value of half the change in stroke width can be

    represented by iso curves, which are equipotential curves in w and Θ along which half the

    change in stroke width is constant. Figure 2.4 shows these iso curves for a Cauchy PSF on

    the same plot as the δ_c curves. As can be seen, half the change in stroke width is

    significantly smaller than δ_c even when the stroke width is 15. Appendix A shows the iso

    curves that result from both Gaussian and Cauchy PSFs and for several different stroke

    widths. Code for this calculation is included in Appendix B.

    For a Gaussian PSF the difference between half the change in stroke width and δ_c becomes

    insignificant as the stroke width gets larger. If δ_c is used to estimate the size of the

    scanned stroke, then it is necessary to determine whether the edges are close enough to

    have interference. As a rule of thumb, a Gaussian ESF is approximately either 1 or 0 at 3w

    from an edge. This means that if the size of the stroke after scanning predicted by δ_c is

    greater than 3w, no interference occurs. This rule of thumb can be summarized by the

    following inequality

    which if satisfied means that no interference occurs

    w c 3 . (2.25)

    As with straight edges it is very important to determine the magnitude of the gradient

    of s(x) at the location of the thresholded edges. The derivative of s(x) is given by

    ( )w

    w

    x

    w

    x-

    xs

    +

    =

    2

    2LSF

    2

    2LSF

    . (2.26)

    Since the slope on the rising edge is positive

    w

    w

    w

    s

    scannedscanned

    scanned

    ++

    +

    =

    2LSF

    2LSF

    2. (2.27)

    Figure 2.4 The Iso Curves for a Cauchy PSF are significantly different from the δ_c curves even when the stroke width is 15.

    The gradient can also be expressed as a function of

    w

    ww

    s scanned

    +

    +=

    2LSF

    2

    2LSF

    2

    . (2.28)

    2.1.4. Scanned Circles

    Circles are among the simplest geometric shapes. As a consequence, when

    experiments are done on the effects of noise on bilevel images, it is useful to apply them

    to circles. A circle spread function CSF can be defined to describe the intensity of pixels

    as a function of the distance from the center of the circle. When the circle is scanned, its

    size changes. The scanned circle radius R_f can be found from the original circle radius R_i

    given the scanner parameters. Likewise, when circles are generated for

    experiments, R_i sometimes needs to be obtained from R_f. As with edges and scanned strokes the

    gradient of the scanned circle can be determined for both Gaussian and Cauchy PSFs.

    Figure 2.5 shows the cross section of a circle scanned with a Cauchy PSF.

    The intensity of a pixel as a function of the distance from the center of the circle can

    be obtained using

    \[ \mathrm{CSF}(r) = \int_{-R_i}^{R_i} \int_{-\sqrt{R_i^2 - x^2}}^{\sqrt{R_i^2 - x^2}} \mathrm{PSF}(x - r,\, y;\, w)\, dy\, dx. \tag{2.29} \]

    For a Gaussian PSF the equation becomes

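    \[ \mathrm{CSF}_{\mathrm{Gaussian}}(r) = \frac{1}{w\sqrt{2\pi}} \int_{-R_i}^{R_i} \exp\!\left(-\frac{(x - r)^2}{2w^2}\right) \operatorname{erf}\!\left(\frac{\sqrt{R_i^2 - x^2}}{w\sqrt{2}}\right) dx. \tag{2.30} \]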

    For a Cauchy PSF the equation becomes

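    \[ \mathrm{CSF}_{\mathrm{Cauchy}}(r) = \frac{w}{\pi} \int_{-R_i}^{R_i} \frac{\sqrt{R_i^2 - x^2}}{\left((x - r)^2 + w^2\right)\sqrt{r^2 - 2rx + R_i^2 + w^2}}\, dx. \tag{2.31} \]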

    The integrals have to be evaluated numerically. Special care must be taken when

    numerically solving for the Gaussian CSF. The value of the integrand is near zero for a

    large part of the domain over which it is integrated. This causes large errors when certain

    numerical algorithms are used. If

    \[ r - 5w > -R_i, \tag{2.32} \]

    then the following integral should be used to calculate the Gaussian CSF

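    \[ \mathrm{CSF}_{\mathrm{Gaussian}}(r) = \frac{1}{w\sqrt{2\pi}} \int_{r - 5w}^{R_i} \exp\!\left(-\frac{(x - r)^2}{2w^2}\right) \operatorname{erf}\!\left(\frac{\sqrt{R_i^2 - x^2}}{w\sqrt{2}}\right) dx. \tag{2.33} \]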
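    One way to carry out this integration is sketched below in Python. It is a minimal illustration under stated assumptions and is not the thesis code: composite Simpson's rule with a fixed number of intervals stands in for whatever numerical algorithm is preferred, and the function name gaussian_csf and its parameter n are hypothetical. The lower limit is raised to r - 5w whenever the condition of Equation 2.32 holds.

```python
import math


def gaussian_csf(r, R_i, w, n=2000):
    """Gaussian CSF(r) by composite Simpson integration over x (n must be even).

    The lower limit is raised to r - 5w when that point lies inside the circle,
    which skips the part of the domain where the integrand is essentially zero.
    """
    lo = max(-R_i, r - 5.0 * w)
    hi = R_i
    if lo >= hi:
        return 0.0  # r is so far outside the circle that the value is treated as zero

    def integrand(x):
        half_chord = math.sqrt(max(R_i * R_i - x * x, 0.0))
        return (math.exp(-((x - r) ** 2) / (2.0 * w * w))
                * math.erf(half_chord / (w * math.sqrt(2.0))))

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0 / (w * math.sqrt(2.0 * math.pi))
```

    A useful sanity check on any such routine is that the value at r = 0 agrees with the closed form obtained in polar coordinates.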

    The value of R_f depends on R_i, Θ, and w. As with stroke thickness, R_f is defined

    implicitly by

    Figure 2.5 When a circle is scanned its radius changes. As with scanned strokes, if the threshold is too

    high the circle will disappear when it is scanned.

    \[ \mathrm{CSF}(R_f;\, w, R_i) = \Theta. \tag{2.34} \]

    Numerical methods are necessary to find the value of R_f. The CSF is a monotonically

    decreasing function because of the restrictions that were placed on the PSF. As with

    scanned strokes there is a Θ_max above which R_f will be zero. To find Θ_max we use

    \[ \mathrm{CSF}(0) = \Theta_{\max}. \tag{2.35} \]

    The easiest way to find CSF(0) is to use polar coordinates

    \[ \mathrm{CSF}(0) = 2\pi \int_{0}^{R_i} \mathrm{PSF}(r;\, w)\, r\, dr. \tag{2.36} \]

    For a Gaussian PSF this simplifies to

    \[ \mathrm{CSF}_{\mathrm{Gaussian}}(0) = 1 - \exp\!\left(-\frac{R_i^2}{2w^2}\right). \tag{2.37} \]

    For a Cauchy PSF it is

    \[ \mathrm{CSF}_{\mathrm{Cauchy}}(0) = 1 - \frac{w}{\sqrt{w^2 + R_i^2}}. \tag{2.38} \]

    Once it is confirmed that Θ is not greater than Θ_max, the value of R_f can be found. There is

    a lower bound on R_f since it cannot be negative. To find an upper bound on R_f, a value

    must be found which causes the CSF to be less than Θ. The upper bound is first chosen to

    be two times R_i. If this does not result in a CSF value less than Θ, then 2R_i becomes the

    new lower bound, and the upper bound is chosen to be four times R_i. The assumption for

    the upper bound is doubled until it results in a CSF value less than Θ. Once the upper

    bound is determined, a bisection method can be used to find R_f. It is also possible to

    determine R_i if R_f, w and Θ are known. R_i is greater than zero and R_f increases

    monotonically as R_i increases. The algorithm for finding R_i is essentially the same as the

    method for finding R_f. The algorithm for finding R_i is included in Appendix C.
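    The bracketing and bisection procedure for the scanned radius can be sketched as follows. This Python example is an illustration only and is not the Appendix C code: the function names are hypothetical, the Gaussian CSF is evaluated with a simple midpoint rule to keep the example self-contained, and the threshold is assumed to be greater than zero so that the doubling loop terminates.

```python
import math


def gaussian_csf(r, R_i, w, n=4000):
    """Gaussian circle spread function by midpoint-rule integration (illustrative)."""
    h = 2.0 * R_i / n
    total = 0.0
    for k in range(n):
        x = -R_i + (k + 0.5) * h
        total += (math.exp(-((x - r) ** 2) / (2.0 * w * w))
                  * math.erf(math.sqrt(R_i * R_i - x * x) / (w * math.sqrt(2.0))))
    return total * h / (w * math.sqrt(2.0 * math.pi))


def find_scanned_radius(csf, R_i, threshold, tol=1e-6):
    """Solve csf(R_f) = threshold for R_f by bracketing and bisection.

    csf must be monotonically decreasing in r and threshold must be positive.
    Returns 0.0 when the threshold is at or above csf(0), i.e. the circle disappears.
    """
    if threshold >= csf(0.0):
        return 0.0
    lower, upper = 0.0, 2.0 * R_i
    while csf(upper) >= threshold:       # double the upper bound until it brackets the root
        lower, upper = upper, 2.0 * upper
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if csf(mid) > threshold:
            lower = mid                  # still above the threshold: the edge lies further out
        else:
            upper = mid
    return 0.5 * (lower + upper)


R_i, w = 5.0, 1.0
R_f = find_scanned_radius(lambda r: gaussian_csf(r, R_i, w), R_i, threshold=0.5)
```

    Because the CSF is monotonic, the same routine can be reused to recover the original radius from a desired scanned radius by bisecting on R_i instead of r.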

    The magnitude of the gradient of scanned circles is important for determining the

    effects of noise. If the gradient is calculated by evaluating the CSF at two points, the

    calculation is prone to error. The numerical integration has some noise which is

    magnified by this technique. Instead it is better to take the derivative analytically. The

    gradient is given by

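    \[ \frac{\partial\, \mathrm{CSF}(r)}{\partial r} = \int_{-R_i}^{R_i} \int_{-\sqrt{R_i^2 - x^2}}^{\sqrt{R_i^2 - x^2}} \frac{\partial}{\partial r} \mathrm{PSF}(x - r,\, y)\, dy\, dx. \tag{2.39} \]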

    For a Cauchy PSF this simplifies to

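    \[ \frac{\partial\, \mathrm{CSF}_{\mathrm{Cauchy}}(r)}{\partial r} = \frac{w}{\pi} \int_{-R_i}^{R_i} \frac{(x - r)\sqrt{R_i^2 - x^2}\,\left(3(x - r)^2 + 3w^2 + 2\left(R_i^2 - x^2\right)\right)}{\left((x - r)^2 + w^2\right)^2 \left(r^2 - 2rx + R_i^2 + w^2\right)^{3/2}}\, dx. \tag{2.40} \]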

    For a Gaussian PSF the gradient is given by

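    \[ \frac{\partial\, \mathrm{CSF}_{\mathrm{Gaussian}}(r)}{\partial r} = \frac{1}{w^3\sqrt{2\pi}} \int_{-R_i}^{R_i} (x - r) \exp\!\left(-\frac{(x - r)^2}{2w^2}\right) \operatorname{erf}\!\left(\frac{\sqrt{R_i^2 - x^2}}{w\sqrt{2}}\right) dx. \tag{2.41} \]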

    In both cases the gradient is found by numerical integration. For a Gaussian PSF the same

    problem exists with the integrand being near zero over a large part of the domain over

    which it is integrated. The solution for finding the CSF can be applied in exactly the same

    way to obtain the gradient.

    2.1.5. Scanned Characters

    The shape of scanned characters is too complicated to use many of the analytical

    methods in the previous section. Instead the value of the scanned image is determined by

    using the discrete convolution of a sampled character image and a sampled PSF. There is

    some error associated with this method, but it is reduced by using sampled images and

    PSFs with resolutions larger than the resolutions of the final images. The scale factor is

    defined as the simulated resolution divided by the resolution of the final image. A PSF is

    generated which is sampled at the same scale factor as the character image. Because

    each pixel in the sampled PSF represents an area smaller than the pixels in the final

    image, the PSF kernel is

    \[ \mathrm{PSFKernel}[i,j] = \frac{\mathrm{PSF}(x_j,\, y_i)}{\mathit{scalefactor}^2}. \tag{2.42} \]

    Since the PSF kernel must be finite in size, the PSF is effectively truncated.

    There are several advantages of using a Gaussian PSF over a Cauchy PSF in terms of

    accurately simulating the continuous convolution. A Gaussian PSF can be safely

    truncated at four times w and have very little error. However, for a Cauchy PSF this is

    a problem because it is a heavy tailed distribution. To achieve the same accuracy the

    Cauchy PSF would have to be truncated at about 3000 times w. Another advantage of

    the Gaussian PSF is that it is separable. This means that the convolution can be calculated

    by taking the one dimensional PSF, convolving it with each row, and then convolving it

    with each column. While a Gaussian PSF provides several advantages over the Cauchy

    PSF, the experiments in this thesis involve isolated characters. The white background

    makes it possible for this situation to be simulated even for a Cauchy PSF as long as the

    convolution kernel is a little more than twice the size of the original character. This is

    because the truncated part of the Cauchy PSF would always be over white background.
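    The kernel construction and the separable convolution described above can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: the function names are hypothetical, w is assumed to be expressed in final-image pixels, the truncation at four times w follows the rule quoted above, and pixels outside the image are assumed to be white.

```python
import math


def gaussian_kernel_1d(w, scale_factor, truncate=4.0):
    """1-D Gaussian PSF sampled at 1/scale_factor pixel spacing, truncated at truncate*w."""
    half = int(math.ceil(truncate * w * scale_factor))
    kernel = []
    for k in range(-half, half + 1):
        x = k / scale_factor                       # position in final-image pixels
        density = math.exp(-x * x / (2.0 * w * w)) / (w * math.sqrt(2.0 * math.pi))
        kernel.append(density / scale_factor)      # each sample covers 1/scale_factor pixel
    return kernel


def convolve_rows(image, kernel):
    """Convolve every row with a 1-D kernel, treating pixels outside the image as white (0)."""
    half = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(kernel[k + half] * row[j - k]
                for k in range(-half, half + 1) if 0 <= j - k < n)
            for j in range(n)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def separable_blur(image, kernel):
    """Blur rows, then columns; equivalent to 2-D convolution with the outer-product kernel."""
    return transpose(convolve_rows(transpose(convolve_rows(image, kernel)), kernel))
```

    The product of two 1-D samples equals the 2-D Gaussian PSF divided by the scale factor squared, so the two passes together apply the kernel of Equation 2.42 at a much lower cost than a full 2-D convolution.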

    After the high resolution images are convolved with the high resolution truncated

    PSFs, the images are then down sampled. The location of the final sampling grid does not

    necessarily coincide with the high resolution sampling grid. In order to have random

    continuous phase shifts and non-integer scale factor values it is necessary to interpolate the

    values of pixels. To do this, bilinear interpolation can be used because of its simplicity

    and because the errors associated with it are not significant.
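    A small Python sketch of this down sampling step is given below. It is an illustration under stated assumptions, not the thesis code: the function names bilinear and downsample are hypothetical, coordinates outside the image are clamped to the border, and the image must be at least two pixels in each direction.

```python
def bilinear(image, x, y):
    """Interpolate image (a list of rows) at real-valued column x and row y."""
    rows, cols = len(image), len(image[0])
    x = min(max(x, 0.0), cols - 1.0)     # clamp to the image (assumption of this sketch)
    y = min(max(y, 0.0), rows - 1.0)
    x0 = min(int(x), cols - 2)           # top-left corner of the surrounding 2x2 cell
    y0 = min(int(y), rows - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def downsample(image, scale_factor, phase_x, phase_y, out_rows, out_cols):
    """Sample a high-resolution image on a coarser grid with a sub-pixel phase offset."""
    return [[bilinear(image, (j + phase_x) * scale_factor, (i + phase_y) * scale_factor)
             for j in range(out_cols)]
            for i in range(out_rows)]
```

    Here phase_x and phase_y are given in final-image pixels, so drawing them uniformly from [0, 1) gives a random continuous phase.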

    It is also necessary to determine the gradients of the scanned characters. While the

    gradients could be measured from the high resolution grey level image, this is not the

    method that was used. The derivative is a linear operation which means that the

    derivative of the PSF can be taken and then the gradient of the image can be determined

    by convolving the original character image with the resulting kernel. The derivative with

    respect to x of the Cauchy PSF is

    \[ \frac{\partial}{\partial x} \mathrm{PSF}_{\mathrm{Cauchy}}(x, y) = -\frac{3wx}{2\pi\left(x^2 + y^2 + w^2\right)^{5/2}}. \tag{2.43} \]

    For the Gaussian PSF the derivative is

    \[ \frac{\partial}{\partial x} \mathrm{PSF}_{\mathrm{Gaussian}}(x, y) = -\frac{x}{2\pi w^4} \exp\!\left(-\frac{x^2 + y^2}{2w^2}\right). \tag{2.44} \]

    When these functions are used to create convolution kernels, the functions have to be

    divided by the scale factor squared. The kernels for the derivatives with respect to y can

    be obtained by transposition. The Gaussian kernel is separable, which can be used to

    speed up computations.

    2.2. Edge Finding Techniques

    This thesis focuses on the effect of additive Gaussian noise on scanned images. One

    component is to explore the ability to accurately locate edges in scanned document

    images. Finding lines in an image is critically important in the fields of image processing

    and computer vision, and there is a substantial amount of work that has been done on the

    topic. A significant amount of attention has gone to developing operators, which bring

    out the edges in an image. This is usually followed by techniques that use the Hough

    transform to find the location of the line [6]. There has also been study of accurately

    locating edges and lines in bilevel rasterized images [8].

    One approach to edge detection is to convolve the image with an edge detector and

    then to threshold the image and locate the edge using a Hough transform [6]. The Hough

    transform works by mapping points to the set of lines that pass through those points.

    Edges can be represented by two parameters such as angle and distance from the origin.

    These two parameters form a parameter space which can be divided into discrete bins.

    The Hough transform is performed by looping through every edge point in the image and

    then incrementing the value in every bin that contains the parameters of an edge that runs

    through the point. After this is done for every edge point the true edge can be determined

    by finding the bin with the largest value.
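    The voting procedure can be sketched in a few lines of Python. The example is an illustration only: it assumes the normal form rho = x cos(theta) + y sin(theta) for a line, one bin per degree of angle and per unit of distance, and a hypothetical function name hough_line.

```python
import math


def hough_line(points, n_theta=180, rho_step=1.0):
    """Vote for lines rho = x*cos(theta) + y*sin(theta); return the best (theta, rho, votes)."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            bin_key = (t, int(round(rho / rho_step)))   # discrete bin in parameter space
            votes[bin_key] = votes.get(bin_key, 0) + 1
    (t_best, r_best), count = max(votes.items(), key=lambda item: item[1])
    return math.pi * t_best / n_theta, r_best * rho_step, count


# Edge points lying on the vertical line x = 3.
theta, rho, count = hough_line([(3, y) for y in range(100)])
```

    The resolution of the recovered edge is limited by the bin size, which is one reason sub-pixel edge localization is usually done with a fitting step instead.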

    There are a variety of operators that can be used for edge detection. One such

    operator is the Sobel operator. The Sobel operator is a combination of two operators

    which estimate the two components of the image gradient G_x and G_y. If A is the original

    image, G_x is given by

    \[ G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A. \tag{2.45} \]

    G_y is calculated using an operator that is simply the transpose of the one used to calculate

    G_x. The magnitude of the gradient can then be estimated as

    \[ G = \sqrt{G_x^2 + G_y^2}. \tag{2.46} \]

    Once the gradient image is determined, it is thresholded to find the edge points, and then

    the Hough transform is used to find the edge. The maximums of the Hough transform

    correspond to the parameters of the edge. The Prewitt and Roberts operators work in a

    way that is similar to that of the Sobel operator. Table 2.1 shows the Sobel, Prewitt, and

    Roberts operators.

    Table 2.1: Simple operators used for edge detection.

    Sobel Operators      Prewitt Operators    Roberts Operators

    -1  0  1             -1  0  1              0 -1
    -2  0  2             -1  0  1              1  0
    -1  0  1             -1  0  1

    -1 -2 -1             -1 -1 -1             -1  0
     0  0  0              0  0  0              0  1
     1  2  1              1  1  1
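    As an illustration of how the Sobel operators are applied, the Python sketch below computes the gradient magnitude of a small grey level image. It is not the implementation used in this thesis: the function names are hypothetical, border pixels are simply left at zero, and the operators are applied without flipping, which changes only the sign of each component and not the magnitude.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_operator(image, op, r, c):
    """Weighted sum of the 3x3 neighbourhood of pixel (r, c) with operator op."""
    return sum(op[i][j] * image[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel_magnitude(image):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) at interior pixels; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_operator(image, SOBEL_X, r, c)
            gy = apply_operator(image, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: white (0) on the left, black (1) on the right.
step = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
magnitude = sobel_magnitude(step)
```

    For this step edge the magnitude is nonzero only in the two columns adjacent to the transition, which are the pixels a subsequent threshold would mark as edge points.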

    In addition to the simple Sobel, Prewitt, and Roberts operators more complicated

    operators can be used. One such operator is the Laplacian of Gaussian (LoG) operator.

    This operator is given by

    \[ h(r) = -\left(\frac{r^2 - \sigma^2}{\sigma^4}\right) \exp\!\left(-\frac{r^2}{2\sigma^2}\right). \tag{2.47} \]

    This operator is the second derivative of a Gaussian function with a width parameter of σ.

    The operator is circularly symmetrical. Numerically it is represented by at least a five by

    five kernel. One approximation of the LoG kernel is given by

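    \[ \mathrm{LoG} = \begin{bmatrix} 0 & 0 & -1 & 0 & 0 \\ 0 & -1 & -2 & -1 & 0 \\ -1 & -2 & 16 & -2 & -1 \\ 0 & -1 & -2 & -1 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{bmatrix}. \tag{2.48} \]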

    Some of the most important work on finding edges in grey level images was done by

    Canny [7]. The theoretical basis for the edge detection mask developed by Canny

    depends on being able to separate the image into noise and signal components. However,

    when an image is subjected to a nonlinearity such as thresholding, the noise and signal

    components cannot be separated in this way. The Canny operator begins by smoothing

    the image with a Gaussian. Then the gradients of the image are determined. Two

    thresholds are used to determine which pixels are edge pixels. The first threshold is set

    higher than the other, and any pixel whose gradient exceeds this threshold is labeled

    as an edge pixel. Then pixels that are adjacent to an edge pixel are also labeled edge

    pixels if their gradient exceeds the second threshold. The Canny operator was

    implemented in this thesis using Matlab's built-in edge detection function.

    Because using operators such as the Canny operator has no strong theoretical basis in

    bilevel images, we can use a more basic approach. This approach involves selecting data

    points between each pair of adjacent black and white pixels. Then a line can be fitted to

    these points based on the least squared distance. The least squares fitting can either use

    the squared vertical distance of points to the edge or use the squared perpendicular

    distances. Gordon and Seering [8] analyzed the accuracy of least squares at finding the

    location of edges. They use an assumption that the vertical distance between points on a

    digitized line and its corresponding continuous line vary independently of one another.

    Using this assumption they determined the estimation error of edges. The case in which

    the points do not vary independently of one another will be explored in more detail in

    Section 2.4.
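    The two least squares variants can be sketched in Python as follows. This is an illustration and not the thesis code: the function names are hypothetical, the vertical fit returns a slope and an intercept, and the perpendicular fit is computed from the principal axis of the point scatter and returned in normal form, which is one standard way to minimize perpendicular distances.

```python
import math


def fit_line_vertical(points):
    """Least squares on vertical distances: returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx                       # undefined for a vertical edge (sxx == 0)
    return slope, my - slope * mx


def fit_line_perpendicular(points):
    """Least squares on perpendicular distances: returns (angle, distance).

    The fitted line is x*cos(angle) + y*sin(angle) = distance, where angle is
    the direction of the line's normal.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    direction = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # principal axis of the scatter
    angle = direction + math.pi / 2.0                    # the normal is perpendicular to it
    return angle, mx * math.cos(angle) + my * math.sin(angle)
```

    Unlike the vertical fit, the perpendicular fit treats both coordinates symmetrically, so it remains well defined for steep and vertical edges.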

    The least squares approach and the operator based approaches are explored

    extensively in Section 5.1.1. In that section experiments are conducted to determine

    which of the methods work best for bilevel straight edges. The effectiveness of

    perpendicular vs. vertical least squares will also be analyzed.

    2.3. Circle Fitting Techniques

    In order to understand the effects of noise on 2-D objects, it is necessary to explore

    the effect of noise on the ability to precisely determine the position and radius of scanned

    circles. To do this, data points were selected between adjacent pairs of black and white

    pixels, then a circle was fit to these data points. Several classical methods of doing this

    fitting are discussed in [9]. This section includes a discussion of these methods.

    The simplest method for fitting a circle to data points is called Algebraic circle fitting.

    The equation of a circle can be given implicitly by

    \[ F(\mathbf{x}) = a\,\mathbf{x}^T\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0, \tag{2.49} \]

    where the coefficients a, b and c are such that a is not zero and b is a two element

    column vector. If the values of each data point are plugged into this equation, the result is

    \[ B\mathbf{u} = \boldsymbol{\varepsilon}, \tag{2.50} \]

    where ε is the error vector which is to be minimized, B is a matrix

    \[ B = \begin{bmatrix} x_{11}^2 + x_{12}^2 & x_{11} & x_{12} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1}^2 + x_{m2}^2 & x_{m1} & x_{m2} & 1 \end{bmatrix} \tag{2.51} \]

    and u = [a, b_1, b_2, c]^T. Since both sides of Equation 2.49 can be multiplied by a constant, a

    constraint can be applied to u that it must be a unit vector. The squared Euclidean norm

    of ε can be minimized using Lagrange multipliers. The constraint that u is a unit vector is

    applied to create the following equation

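    \[ \frac{\partial \|\boldsymbol{\varepsilon}\|^2}{\partial \mathbf{u}} = \lambda\, \frac{\partial \|\mathbf{u}\|^2}{\partial \mathbf{u}}. \tag{2.52} \]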

    The left side of the equation becomes

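    \[ \frac{\partial \|\boldsymbol{\varepsilon}\|^2}{\partial \mathbf{u}} = \frac{\partial\, (B\mathbf{u})^T (B\mathbf{u})}{\partial \mathbf{u}} = \frac{\partial\, \mathbf{u}^T B^T B \mathbf{u}}{\partial \mathbf{u}} = 2 B^T B \mathbf{u}. \tag{2.53} \]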

    The right side also simplifies giving

    \[ 2 B^T B \mathbf{u} = 2 \lambda \mathbf{u}. \tag{2.54} \]

    The Lagrange multipliers are also the eigenvalues of B^T B. Substitution gives

    \[ \|\boldsymbol{\varepsilon}\|^2 = \mathbf{u}^T B^T B \mathbf{u} = \lambda\, \mathbf{u}^T \mathbf{u} = \lambda. \tag{2.55} \]

    This means that the squared Euclidean norm of ε is minimized by using the value of u

    associated with the smallest eigenvalue of B^T B. Equivalently u is the right singular vector

    associated with the smallest singular value of B. The center can be obtained from u using

    \[ \mathbf{z} = \left( -\frac{b_1}{2a},\; -\frac{b_2}{2a} \right). \tag{2.56} \]

    The radius is obtained by using

    \[ r = \sqrt{\frac{\|\mathbf{b}\|^2}{4a^2} - \frac{c}{a}}. \tag{2.57} \]

    The problem with Algebraic circle fitting is that minimizing the Euclidean norm of ε does

    not necessarily result in the best fitting circle. It is especially poor when fitting a circle to

    an arc of data points.

    An alternative to the Algebraic method is Geometric circle fitting. Geometric circle

    fitting is a nonlinear least squares procedure which minimizes the sum of the squared

    distances of points to the nearest point on the circle. If the center point of the circle is z

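    and the radius is r, then the squared distance of a point x_i to the circle is

    \[ d_i^2 = \left( \|\mathbf{z} - \mathbf{x}_i\| - r \right)^2. \tag{2.58} \]

    If u = [z_1, z_2, r]^T defines the circle, then u needs to be selected to minimize

    \[ \sum_{i=1}^{m} d_i(\mathbf{u})^2. \tag{2.59} \]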

    A method called Gauss-Newton is used to minimize this expression. The method starts

    out with a decent guess of the best value of u. If d(u) is a column vector of the functions

    d_i(u), then the idea is to find the change h in u which will minimize d(u) in the least

    squares sense. To do this d(u+h) is approximated using a Taylor series expansion

    \[ \mathbf{d}(\mathbf{u} + \mathbf{h}) \approx \mathbf{d}(\mathbf{u}) + J(\mathbf{u})\,\mathbf{h}, \tag{2.60} \]

    where J(u) is the Jacobian matrix. In this case the Jacobian is given by

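    \[ J(\mathbf{u}) = \begin{bmatrix} \dfrac{u_1 - x_{11}}{\sqrt{(u_1 - x_{11})^2 + (u_2 - x_{12})^2}} & \dfrac{u_2 - x_{12}}{\sqrt{(u_1 - x_{11})^2 + (u_2 - x_{12})^2}} & -1 \\ \vdots & \vdots & \vdots \\ \dfrac{u_1 - x_{m1}}{\sqrt{(u_1 - x_{m1})^2 + (u_2 - x_{m2})^2}} & \dfrac{u_2 - x_{m2}}{\sqrt{(u_1 - x_{m1})^2 + (u_2 - x_{m2})^2}} & -1 \end{bmatrix}. \tag{2.61} \]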

    The change in u that is required is found by solving the linear least squares problem

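    \[ \mathbf{d}(\mathbf{u}) + J(\mathbf{u})\,\mathbf{h} \approx 0. \tag{2.62} \]

    The value of h is

    \[ \mathbf{h} = -\left( J(\mathbf{u})^T J(\mathbf{u}) \right)^{-1} J(\mathbf{u})^T \mathbf{d}(\mathbf{u}). \tag{2.63} \]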

    With every iteration of the algorithm, h is used to update u and a closer approximate

    solution of the nonlinear least squares problem is found. This method produces a much

    better fit. As was stated earlier, this method requires an initial guess of the value of u.

    One way to obtain this is to use the Algebraic circle fitting. Another way is to find the

    mean of all the data points and make this the center of the circle. The radius can be

    estimated by taking the distance of each point to this center and taking the mean of those

    distances. Figure 2.6 compares the results of Algebraic and Geometric least squares for a

    certain set of points. The points were chosen experimentally to show the weakness of the

    Algebraic technique. The Geometric technique always produces better results because the

    error that it attempts to minimize is more sensible.
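    The Gauss-Newton iteration for Geometric circle fitting can be sketched in pure Python as shown below. It is a minimal illustration, not the thesis implementation: the function names are hypothetical, the initial guess uses the centroid and mean distance described above, the normal equations are solved directly with a small Gaussian elimination routine, and no safeguards such as step damping are included.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_circle_geometric(points, iterations=50, tol=1e-12):
    """Gauss-Newton fit of a circle; returns (center_x, center_y, radius)."""
    n = len(points)
    # Initial guess: centroid of the points and mean distance to it.
    z1 = sum(x for x, _ in points) / n
    z2 = sum(y for _, y in points) / n
    r = sum(math.hypot(x - z1, y - z2) for x, y in points) / n
    for _ in range(iterations):
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtd = [0.0] * 3
        for x, y in points:
            dist = math.hypot(z1 - x, z2 - y)
            row = [(z1 - x) / dist, (z2 - y) / dist, -1.0]   # one row of the Jacobian
            d = dist - r                                     # signed distance to the circle
            for a in range(3):
                Jtd[a] += row[a] * d
                for b in range(3):
                    JtJ[a][b] += row[a] * row[b]
        h = solve3(JtJ, [-v for v in Jtd])                   # h = -(J^T J)^{-1} J^T d
        z1, z2, r = z1 + h[0], z2 + h[1], r + h[2]
        if max(abs(v) for v in h) < tol:
            break
    return z1, z2, r
```

    For points that lie exactly on a half circle the iteration recovers the true center and radius even though the centroid used as the starting center is noticeably displaced from it.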

    2.4. Effect of Sampling

    Sampling of continuous bilevel images can produce several undesirable effects. Since

    the position of the sampling grid relative to the image is random, there are variations that

    occur in the resulting bitmaps. Even without noise there is an unavoidable Hamming

    Distance between different scans of an image, even from the same scanner. The

    geometric precision of edge and circle measurements is limited by the sampling

    Figure 2.6 Algebraic Fitting can sometimes result in a poor fit. The center of the Algebraic fit is

    (10.24,20.98) and the radius is 4.83. The center of the Geometric fit is (10.10,7.92) and the

    radius is 11.77.

    resolution. In addition the sampling grid for images is anisotropic. This means that edges

    at certain orientations are measured with less precision than edges at other orientations.

    One of the goals of this thesis is to explore how the random effects of noise and random

    phase shifts affect document images.

    A review of the literature shows several tools for analyzing the effect of phase on

    scanned images. Dorst and Smeulders [10] gave an expression for determining the set of

    continuous line segments which could generate a certain chaincode string. This

    expression could also be used to find the worst case positional accuracy of an edge

    segment. Dorst and Duin [11] introduced the concept of spirographs and used it to

    calculate the average and worst case positional accuracy of edges. Havelock [12] used

    modulo grids to analyze the positional accuracy of various shapes. Sarkar [1] expanded

    upon Havelock's work by using modulo grids to calculate the number and frequency of

    bitmaps that an object would produce.

    Spirographs can be used to describe the way in which a continuous edge is sampled.

    Any edge can be flipped on the reflection lines x=y, y=0 and x=0. Because of this the

    effect of sampling any straight edge can be determined by studying those straight edges

    with a slope in the range (0, 1). A spirograph consists of a circle with N points on it

    which divide it into N arcs as seen in Figure 2.7.

    Each consecutive point is placed the same constant clockwise distance around the

    circle from the previous point. The sampling grid for an edge can be represented by

    making the distance between each consecutive point equal to the slope of the edge. The

    random location of the sampling grid with respect to the edge can be represented by

    randomly placing the edge as a point on the spirograph; N is then the number of columns

    in the edge image. If the edge is shifted up vertically, it is moved clockwise around the

    circle. If it crosses a sample point on the spirograph, the bitmap of the edge will change,

    and the number of segments formed around the spirograph is the number of bitmaps a

    certain edge can have.
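    The construction of a spirograph can be sketched in Python as follows. The example is illustrative only and rests on assumptions: the circle is taken to have circumference 1, the first point is placed at position 0, exact rational arithmetic is used so that coincident points are detected reliably, and the function name spirograph_arcs is hypothetical.

```python
from fractions import Fraction


def spirograph_arcs(slope, n_columns):
    """Arc lengths of a spirograph on a circle of circumference 1.

    Point k sits at (k * slope) mod 1. Coincident points are merged, so the
    number of arcs returned is the number of distinct bitmaps of the edge.
    """
    points = sorted({(k * slope) % 1 for k in range(n_columns)})
    arcs = [b - a for a, b in zip(points, points[1:])]
    arcs.append(1 - points[-1] + points[0])   # arc that wraps past the starting point
    return arcs


# An edge of slope 3/10 spanning 3 columns has three possible bitmaps.
arcs = spirograph_arcs(Fraction(3, 10), 3)
```

    Since the phase of the edge is uniformly distributed around the circle, each arc length is also the probability of observing the corresponding bitmap.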

    One special case is when the slope of an edge can be represented by the irreducible

    fraction p/q and when q

    the sampling grid. The position of the edge relative to the sampling grid is a random

    number. So the variance of the perpendicular distance between the measured and actual

    positions of a noise free edge is

    \[ \mathrm{Var} = \frac{1}{12\left(p^2 + q^2\right)}. \tag{2.65} \]

    In order for the length of an edge segment with slope m to be L, the number of

    columns N must be determined by

    \[ N = \operatorname{round}\!\left( \frac{L}{\sqrt{m^2 + 1}} \right). \tag{2.66} \]

    If a spirograph is defined with the first parameter being the distance between successive

    points and the second being N, then the spirograph for this edge is

    \[ \mathrm{SPIRO}\!\left( m,\; \operatorname{round}\!\left( \frac{L}{\sqrt{m^2 + 1}} \right) \right). \tag{2.67} \]

    The precision of edge measurements can be determined by the combination of two

    parameters. The distance parameter is the perpendicular distance of the edge from the

    midpoint of the continuous edge segment where the distance is positive if the measured

    edge is above the continuous edge and negative otherwise. The angular error is the

    difference between the angle of inclination associated with the theoretical edge and the

    angle of inclination associated with the measured edge. The variance in the distance for

    an edge can be shown to be

    \[ \mathrm{Var}(\mathit{Distance}) = \sum_i \left( \frac{d_i^3}{12\left(m^2 + 1\right)} + d_i\, pe_i^2 \right), \tag{2.68} \]

    where d_i is the length of the i-th arc on the spirograph and pe_i is the perpendicular distance

    between the actual and measured edge when the phase is chosen to be the midpoint of the

    i-th arc. The variance of the angular error between the measured and theoretical edge can

    be shown to be

    \[ \mathrm{Var}(\mathit{AngularError}) = \sum_i d_i\, ae_i^2, \tag{2.69} \]

    where ae_i is the angular error of the measured edge when the phase is chosen to be on the

    i-th arc. Figure 2.8 shows the variance of the distance and angular error as a function of

    slope. The slopes of 0, 1/2 and 1 have large distance variance, but the angular errors for

    these slopes are zero. The greatest angular errors occur for edges with slopes close to but

    not equal to 0, 1/2 and 1.

    In addition to the effects of sampling on the geometric measurements, sampling also

    affects the Hamming distance between two scans of the same object. If the two scans had

    the same phase and there were no noise, then the two scans would have a Hamming

    distance of zero. However, because different phases result in different bitmap

    configurations, there is some Hamming distance even when two objects are aligned to

    Figure 2.8: (a) The variance of the distance between the measured and actual edges is determined for

    edges that are 20 pixels long. (b) The variance in the angular error is determined for

    noiseless edges that are 20 pixels long.

    minimize the difference. The Hamming distance that will occur depends on the shape of

    the object being scanned. Modulo grids can be used to determine the expected Hamming

    distance between scans with independent random phases and no noise. However, this

    approach probably would not be more efficient than large experiments that generate scans

    and then find the minimum Hamming distance. Certain shapes like circles are known to

    have high Hamming distances because the sizes of the locales in the modulo grid are small.

    For this same reason these shapes have been analyzed for their use in image registration

    [13],[14].
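    The experimental alternative mentioned above, in which scans are generated and then aligned to minimize the Hamming distance, can be sketched in Python as follows. The example is an illustration only: the function names are hypothetical, alignment is restricted to whole-pixel translations within a small search window, and pixels shifted in from outside the bitmap are assumed to be white.

```python
def hamming(a, b):
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def shift(bitmap, dr, dc):
    """Translate a bitmap by whole pixels, filling uncovered pixels with white (0)."""
    rows, cols = len(bitmap), len(bitmap[0])
    return [[bitmap[r - dr][c - dc] if 0 <= r - dr < rows and 0 <= c - dc < cols else 0
             for c in range(cols)]
            for r in range(rows)]


def min_hamming(a, b, max_shift=1):
    """Smallest Hamming distance over integer translations of b up to max_shift pixels."""
    return min(hamming(a, shift(b, dr, dc))
               for dr in range(-max_shift, max_shift + 1)
               for dc in range(-max_shift, max_shift + 1))


square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
moved = shift(square, 0, 1)   # the same square, one pixel to the right
```

    Two scans that differ only by a whole-pixel translation align perfectly, whereas scans with different sub-pixel phases generally retain a nonzero minimum distance.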

    Neither modulo grids nor spirographs can predict the effect of combining sampling

    and noise on geometric measurements. The phase of simulated scans in an experiment

    can be fixed in order to isolate the effects of noise. Then further experiments can explore

    the combined effects of noise and random phase. These noise effects are explored in

    detail in Chapters 4 and 5.


    3. NOISE SPREAD THEORY

    For grey level images, noise is usually described by the standard deviation σ_noise of the
    additive noise. However, the amount of noise present in a bilevel scanned image does not depend
    purely on the level of noise added prior to thresholding. This can be seen clearly in
    Figure 3.1. The first three images all have the same amount of additive noise, yet the noise
    spread (NS) increases from left to right. One of the central points of this thesis is to derive
    this quantity and show that it is a good representation of the amount of noise in a bilevel
    image. This makes it possible to generate synthetic bilevel images with specific amounts of noise.

    Figure 3.1: Edges with varying amounts of noise spread. While the standard deviation of the noise
    in the first three images is the same, the noise spread is different. The picture on the far right
    shows an extreme amount of noise. Panel parameters, from left to right:
    (w=0.64, Θ=0.5, σ_noise=0.05, NS=0.2); (w=1.27, Θ=0.5, σ_noise=0.05, NS=0.4);
    (w=1.9, Θ=0.5, σ_noise=0.05, NS=0.6); (w=3.16, Θ=0.5, σ_noise=0.1, NS=2.0).

    3.1. Noise Spread for straight isolated edges

    The basic idea behind noise spread is that when an image is thresholded the noise is
    concentrated on the edges of the objects in the image. The noise spread for a given edge is the
    size of the domain in which pixels are affected by additive noise. Typically this domain, called
    the noise spread region, is less than a pixel thick. Its size is still relevant because the larger
    the region is, the more likely it is that an edge pixel will fall inside it. Noise spread is
    dependent in part on the shape of the object being scanned. Initially noise spread is derived for
    isolated edges. Isolated edges are among the simplest shapes upon which to do experiments and can
    be represented in one dimension as step functions. Section 2.1.2 discussed how straight edges are
    affected by scanning, but that section focused on the deterministic effects of scanning.
    Nondeterministic effects such as additive noise must be discussed in the context of probability.

    Figure 3.2: (a) Edge after blurring with a generic PSF of width w. When no noise is added, the
    thresholding produces the edge shift δ_c. (b) Edge with noise. The uncertain boundary, shown in
    gray, is the noise spread region. The effects of sampling are not shown.

    Figure 3.2 shows how an edge is affected by scanning. Figure 3.2(a) shows what happens when noise
    is disregarded. As was discussed in Section 2.1.2, the edge shifts by δ_c. However, as shown in
    Figure 3.2(b), when noise is added there is a region in which the value of pixels after
    thresholding is uncertain. This region is called the noise spread region. The size of this region
    is called the noise spread (NS), and as illustrated in Figure 3.1, it is a good quantitative
    measure of how noisy a bilevel image is. To precisely define NS it is necessary to define the
    probability that a pixel at a certain distance from the edge will be above the threshold. This
    threshold probability (THP) depends on the cumulative distribution function (CDF) of the noise
    and is

    \[ \mathrm{THP}(x) = \mathrm{CDF}_{\mathrm{Gauss}}\!\left(\frac{\mathrm{ESF}(x/w) - \Theta}{\sigma_{noise}}\right). \tag{3.1} \]
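    Equation 3.1 can be evaluated directly once a PSF is chosen. The sketch below assumes a Gaussian
    PSF, so that ESF(x/w) is the standard normal CDF evaluated at x/w; that choice, and the function
    names, are assumptions made for illustration.

```python
import math


def gauss_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def threshold_probability(x, w, theta, sigma_noise):
    """Probability that a pixel at position x is above the threshold (Equation 3.1).

    Assumption: Gaussian PSF, so ESF(x / w) is the standard normal CDF of x / w.
    """
    esf = gauss_cdf(x / w)
    return gauss_cdf((esf - theta) / sigma_noise)
```

    With a threshold of 0.5 the blurred edge crosses the threshold at x = 0, where the function
    returns 0.5; far from the edge it approaches 0 on one side and 1 on the other.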

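    The noisy edge will be above the threshold with probability near 0 on one side of the NS region
    and with a probability near 1 on the other side of the NS region. The THP can then be
    represented with a piecewise approximation

    \[ \mathrm{THP}(x) \approx \begin{cases} 0, & x < -\delta_c - \frac{NS}{2} \\[4pt] \dfrac{x + \delta_c}{NS} + \dfrac{1}{2}, & -\delta_c - \frac{NS}{2} \le x \le -\delta_c + \frac{NS}{2} \\[4pt] 1, & x > -\delta_c + \frac{NS}{2} \end{cases} \tag{3.2} \]

    where

    \[ NS = \left( \left. \frac{\partial\,\mathrm{THP}}{\partial x} \right|_{x=-\delta_c} \right)^{-1}. \tag{3.3} \]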
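    The piecewise approximation is simple enough to state as a function. In this sketch the edge
    shift and the noise spread are taken as given inputs; the function name and argument names are
    hypothetical.

```python
def thp_piecewise(x, delta_c, noise_spread):
    """Piecewise-linear approximation of the threshold probability (Equation 3.2).

    The ramp is centered on the shifted edge at x = -delta_c and is noise_spread wide.
    """
    lower = -delta_c - noise_spread / 2.0
    upper = -delta_c + noise_spread / 2.0
    if x < lower:
        return 0.0
    if x > upper:
        return 1.0
    return (x + delta_c) / noise_spread + 0.5
```

    The ramp passes through 0.5 at the shifted edge and has slope 1/NS, which is the property that
    Equation 3.3 uses to define NS.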

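    The derivative of the THP is a function of the noise's probability density function (PDF)

    \[ \frac{\partial\,\mathrm{THP}}{\partial x} = \frac{1}{w\,\sigma_{noise}}\,\mathrm{LSF}\!\left(\frac{x}{w}\right)\mathrm{PDF}_{\mathrm{Gauss}}\!\left(\frac{\mathrm{ESF}(x/w) - \Theta}{\sigma_{noise}}\right). \tag{3.4} \]

    Evaluating at x = -δ_c, which is where ESF(x/w) = Θ and the argument of the PDF is therefore
    zero, gives

    \[ \left. \frac{\partial\,\mathrm{THP}}{\partial x} \right|_{x=-\delta_c} = \frac{\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right)}{\sqrt{2\pi}\,w\,\sigma_{noise}}. \tag{3.5} \]

    Substituting this back into Equation 3.3 gives an estimate of the noise spread

    \[ NS = \frac{\sqrt{2\pi}\,\sigma_{noise}\,w}{\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right)}. \tag{3.6} \]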
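    As a numerical check of Equation 3.6, the sketch below evaluates the noise spread for a Gaussian
    PSF. Treating the ESF as the standard normal CDF and the LSF as the standard normal PDF is an
    assumption of the example, as is the function name.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed Gaussian PSF: ESF is its CDF, LSF is its PDF


def noise_spread(w, theta, sigma_noise):
    """Noise spread of a straight isolated edge (Equation 3.6)."""
    edge_position = STD_NORMAL.inv_cdf(theta)  # ESF^{-1}(theta)
    lsf_at_edge = STD_NORMAL.pdf(edge_position)
    return math.sqrt(2.0 * math.pi) * sigma_noise * w / lsf_at_edge
```

    Under this assumption, the parameter sets listed for Figure 3.1 give values close to the NS
    labels of that figure; for example, `noise_spread(0.64, 0.5, 0.05)` is approximately 0.2.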


    Figure 3.3 shows the piecewise approximation of the threshold probability curve. While the
    piecewise approximation is illustrative, it is also fairly crude. The original definition of the
    noise spread was the size of the domain in which the values of pixels are uncertain. This level
    of uncertainty can be quantified by defining the noise spread as the breadth of the domain over
    which the threshold probability is in the range (ε, 1-ε). The arbitrary cutoff ε is used to
    determine the boundaries of the noise spread region. The more accurate approximation of the THP
    starts by linearizing the ESF at x = -δ_c

    \[ \mathrm{ESF}\!\left(\frac{x}{w}\right) \approx \Theta + \frac{1}{w}\,\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right)(x + \delta_c). \tag{3.7} \]

    Substituting this into Equation 3.1 gives

    \[ \mathrm{THP}(x) \approx \mathrm{CDF}_{\mathrm{Gauss}}\!\left(\frac{\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right)(x + \delta_c)}{w\,\sigma_{noise}}\right). \tag{3.8} \]

    Figure 3.3: The threshold probability (THP) function is shown for a Gaussian PSF with w=1, Θ=0.7
    and σ_noise=0.1. Noise spread in this case is about 0.72. The piecewise approximation of the
    threshold probability is inaccurate at the tails of the THP function.

    A parameter Z can be defined such that

    \[ \mathrm{CDF}_{\mathrm{Gauss}}(Z) = 1 - \varepsilon. \tag{3.9} \]

    Since the Gaussian CDF is odd symmetric, the noise spread region will be centered on x = -δ_c, so

    \[ \mathrm{THP}\!\left(-\delta_c + \frac{NS}{2}\right) = 1 - \varepsilon. \tag{3.10} \]

    This can be evaluated by using Equation 3.8, which gives

    \[ 1 - \varepsilon = \mathrm{CDF}_{\mathrm{Gauss}}\!\left(\frac{\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right) NS}{2\,w\,\sigma_{noise}}\right). \tag{3.11} \]

    NS is then solved for by using Equations 3.9 and 3.11, which produces

    \[ NS = \frac{2\,Z\,\sigma_{noise}\,w}{\mathrm{LSF}\!\left(\mathrm{ESF}^{-1}(\Theta)\right)}. \tag{3.12} \]

    This definition is identical to the one in Equation 3.6 if

    \[ Z = \frac{\sqrt{2\pi}}{2} = 1.253. \tag{3.13} \]
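    The link between the parameter Z and the cutoff can be checked numerically. The sketch below
    converts between the two and evaluates Equation 3.12 for an arbitrary Z; the Gaussian PSF and
    all names are assumptions made for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Value of Z from Equation 3.13, which makes Equation 3.12 agree with Equation 3.6.
Z_DEFAULT = math.sqrt(2.0 * math.pi) / 2.0


def cutoff_from_z(z):
    """Cutoff probability implied by Z (Equation 3.9 rearranged)."""
    return 1.0 - STD_NORMAL.cdf(z)


def z_from_cutoff(cutoff):
    """Z implied by a chosen cutoff probability (Equation 3.9)."""
    return STD_NORMAL.inv_cdf(1.0 - cutoff)


def noise_spread_for_z(w, theta, sigma_noise, z):
    """Noise spread for a chosen Z (Equation 3.12), assuming a Gaussian PSF."""
    lsf_at_edge = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(theta))
    return 2.0 * z * sigma_noise * w / lsf_at_edge
```

    Here `cutoff_from_z(Z_DEFAULT)` evaluates to roughly 0.105. A smaller cutoff maps to a larger Z
    and therefore to a proportionally larger noise spread for the same edge.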

    In order to maintain consistency, the cutoff defined in Equation 3.13 will be used. The resulting
    value of ε is 0.105, which is a reasonable level of uncertainty. This cutoff means that the noise
    spread is the breadth of the domain over which the threshold probability is in the range
    (0.105, 0.895). The edge images in Figure 3.1 show that noise is very noticeable in images with
    NS values as low as 0.2. In most cases NS will be less than the extreme example on the far right
    in Figure 3.1. At some point the added noise is extreme enough that even pixels away from the
    edge have an uncertain value. When this occurs, the approximation in Equation 3.8 is no longer
    valid. Where this occurs depends on the PSF used and the degradation parameters. Generally the
    approximation is better when Θ is close to 0.5 and when noise levels are small. Figure 3.4
    illustrates the approximation in Equation 3.8. The parameters used in Figure 3.4 are extreme;
    for most sets of degradation parameters it is very hard to distinguish the results from
    Equations 3.1 and 3.8.
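    The quality of the linearized approximation can be probed by comparing Equations 3.1 and 3.8 on
    a grid of positions. This is a minimal sketch assuming a Gaussian PSF; the function names and
    the grid settings are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thp_exact(x, w, theta, sigma_noise):
    """Threshold probability from Equation 3.1 with a Gaussian ESF."""
    return STD_NORMAL.cdf((STD_NORMAL.cdf(x / w) - theta) / sigma_noise)


def thp_linearized(x, w, theta, sigma_noise):
    """Threshold probability from Equation 3.8 (ESF linearized at the shifted edge)."""
    edge = STD_NORMAL.inv_cdf(theta)  # ESF^{-1}(theta); the shifted edge is at x = w * edge
    slope = STD_NORMAL.pdf(edge) / (w * sigma_noise)
    return STD_NORMAL.cdf(slope * (x - w * edge))


def max_abs_difference(w, theta, sigma_noise, half_range=3.0, steps=601):
    """Largest gap between the two curves within half_range * w of the shifted edge."""
    center = w * STD_NORMAL.inv_cdf(theta)
    worst = 0.0
    for i in range(steps):
        x = center + w * half_range * (2.0 * i / (steps - 1) - 1.0)
        gap = abs(thp_exact(x, w, theta, sigma_noise) - thp_linearized(x, w, theta, sigma_noise))
        worst = max(worst, gap)
    return worst
```

    Calling `max_abs_difference` with the Figure 3.4 parameters and then with a threshold of 0.5 and
    less noise shows how much closer the two curves are in the milder case.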

    Figure 3.4: The threshold probability function given in Equation 3.1 (solid) is compared to the
    approximation in Equation 3.8 (dashed) with the parameters w=1, Θ=0.7 and σ_noise=0.1. The actual
    threshold probability is a little lower on the tails.

    3.2. Extending Noise Spread to general shapes

    Noise spread was introduced for straight edges in Section 3.1, but it is possible to extend the
    noise spread theory to arbitrary shapes. In Section 3.1 the noise spread region was defined to be
    anywhere the THP is between ε and 1-ε. This applies directly to general shapes. However, with
    the exceptions of isolated straight edges, scanned strokes, and circles, the noise spread region
    will not have the same thickness along the boundary of a general object. For this reason the
    noise spread must be defined for any point on the contour C defined by s(x,y)=Θ. To do this, NS
    can be defined as the thickness of the noise spread region along the direction defined by the
    gradient at any point on the contour C. NS of an entire object can then be defined as the mean
    value of the noise spread on the contour C. If the mean reciprocal of the magnitude of the
    gradient of s(x,y) along C is estimated, then this estimate can be used to find the noise spread
    of the object. To do this a linearization procedure is required.

    If (x_0, y_0) is a point on the contour s(x,y)=Θ, and the notation s_x and s_y is used to denote
    the partial derivatives with respect to x and y, then the approximation

    \[ s\!\left(x_0 + u\,\frac{s_x(x_0,y_0)}{\|\nabla s(x_0,y_0)\|},\; y_0 + u\,\frac{s_y(x_0,y_0)}{\|\nabla s(x_0,y_0)\|}\right) \approx s(x_0,y_0) + u\,\|\nabla s(x_0,y_0)\| \tag{3.14} \]

    can be made. This relationship can be used to find the value of s(x,y) at any point near the
    contour s(x,y)=Θ. Following the reasoning that gave Equation 3.1, the THP can be represented as a
    function of u and the magnitude of the gradient

    \[ \mathrm{THP}(u) = \mathrm{CDF}_{\mathrm{Gauss}}\!\left(\frac{u\,\|\nabla s\|}{\sigma_{noise}}\right). \tag{3.15} \]

    The noise spread at any point on the contour can be expressed as a function of the gradient at
    that point

    \[ NS(x_0,y_0) = \frac{2\,Z\,\sigma_{noise}}{\|\nabla s(x_0,y_0)\|}. \tag{3.16} \]

    The noise spread of the object is just the line integral with respect to arc length divided by
    the total arc length L

    \[ NS = \frac{1}{L}\int_C NS(x,y)\,dl = \frac{2\,Z\,\sigma_{noise}}{L}\int_C \frac{1}{\|\nabla s(x,y)\|}\,dl. \tag{3.17} \]

    In the case of straight edges, scanned strokes, and circles the gradient magnitude is constant
    along the contour. For other shapes the line integral has to be estimated numerically.
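    A numerical estimate of the line integral in Equation 3.17 can be sketched for a shape whose
    blurred image is known in closed form. The elliptical Gaussian blob s(x,y) = exp(-(x²/a² + y²/b²))
    used below is purely a stand-in chosen for illustration, not an object studied in this thesis;
    its contour s(x,y)=Θ is an ellipse that can be parameterized directly. The function name is
    hypothetical.

```python
import math

Z = math.sqrt(2.0 * math.pi) / 2.0  # cutoff parameter from Equation 3.13


def blob_noise_spread(a, b, theta, sigma_noise, steps=10000):
    """Mean noise spread along the contour s = theta (Equations 3.16 and 3.17).

    Assumed image: s(x, y) = exp(-(x**2 / a**2 + y**2 / b**2)), with 0 < theta < 1.
    """
    k = -math.log(theta)  # the contour satisfies x**2 / a**2 + y**2 / b**2 = k
    root_k = math.sqrt(k)
    dt = 2.0 * math.pi / steps
    length = 0.0
    weighted_sum = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt  # midpoint rule in the contour parameter
        x = a * root_k * math.cos(t)
        y = b * root_k * math.sin(t)
        # On the contour s = theta, so |grad s| = 2 * theta * hypot(x / a**2, y / b**2).
        grad = 2.0 * theta * math.hypot(x / a ** 2, y / b ** 2)
        dl = root_k * math.hypot(a * math.sin(t), b * math.cos(t)) * dt  # arc length element
        length += dl
        weighted_sum += (2.0 * Z * sigma_noise / grad) * dl
    return weighted_sum / length
```

    When a equals b the contour is a circle and the gradient magnitude is constant, so the result
    reduces to the pointwise value from Equation 3.16. For an elongated blob the result lies between
    the smallest and largest pointwise values on the contour.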


    Noise spread has been generalized to apply to arbitrary shapes. It is a very powerful

    measure of the edge noise in binary document images. It makes it possible to compare

    noise levels of different objects scanned with different scanner parameters. In Section 3.3

    its relationship with the Hamming distance of a scanned object and its noiseless template

    is explored.

    3.3. Relationship between Noise Spread and Hamming Distance

    The real benefit of determining the noise spread of a scanned object is that it provides an
    effective measure of how noisy an object is. Since Hamming distance provides a metric of how
    different two scanned objects are, it is very useful for analyzing the noise in bilevel images.
    The Hamming distance between a template and a scanned character is determined by the combination
    of the phase effects and of the noise. The phase effects were described earlier. If the phase
    effects are removed by forcing the template and the scanned object to have the same phase, then
    the effects of noise alone can be analyzed. When this is done it is possible to relate the
    expected Hamming distance H to the noise spread.

    To see why this is true it is best to start with the case of isolated scanned edges. The
    probability of error (PE) is defined to relate the expected Hamming distance to the noise spread.
    The probability of error is related to the THP and is the probability that noise gives a pixel
    a different value than it would have without noise. Formally it is defined by

    tol)

    R=(Rmin+Rmax)/2;              % midpoint of the current search interval
    [CSF]=CauchyCSF(Rf,alpha,R);
    if CSF-theta>0
        Rmax=R;                   % CSF exceeds theta: keep the lower half
    else
        Rmin=R;                   % otherwise keep the upper half
    end
    end