Document Auto Alignment

Uploaded by yoram-furth, posted 08-Jul-2018

  • 8/19/2019 Document Auto Alignment

    1/14

     

    Document Auto-alignment 

    Yoram Furth, Felix Vilensky, Moti Ben Laish

     ABSTRACT 

These days, when smartphones take over the functions of the traditional office, document
scanning with just "one click" is a necessary ability.
Such an application has many use cases: business cards, receipts, ID cards, lecture notes,
blackboards, screens, brochures, etc.
The task is automatic alignment and enhancement dedicated to documents.
The main challenge is to correctly detect the document under a variety of conditions:
different viewing angles (projection), different backgrounds, and detection when more than a
single document is present in the image.
The proposed solution performs comparably to, and in some scenarios outperforms, existing
commercial algorithms.
Some ideas, if implemented, could push performance even further.


1.  The overall system at a glance

The provided system takes a photograph of a document, automatically detects it, eliminates
the projection, and performs basic enhancement.
The system operation flow is as follows (see Figure 1.1): image acquisition (from file or from
webcam), document boundary detection, projection elimination, and basic enhancement.

The most challenging part of this project is the document boundary detection, so this
report focuses mainly on it.

The “detection” is divided into three main parts (see Figure 1.2):

1.  Textual region detection – detection of the textual region in the target document.

2.  Candidate line detection – detection of potential document boundary lines.

3.  Document quadrangle finding – finding the combination of lines which best represents
the target document.

Part 1 – receives an RGB image and returns a polygon representing the area unifying the
textual regions within the photographed document. The assumption is that many such
regions exist in the document. The algorithm uses sophisticated tools which take into
account texture, color components, and scale variety. Finally, all the detected regions are
unified into one big polygon.

    Part 2 – receives an RGB image and returns a “stack” of “suspicious” lines, for each of the 4

    document quadrangle segments. The assumption is that the document represents a rigid

    planar rectangle in the space, and that its boundaries are well observed in the photographed

    image. The algorithm analyses the gradients of the color image, and estimates the general

    orientation angle of the document. Then, using Directional Radon, it finds candidate lines

    corresponding to each direction of the searched quadrangle.

Part 3 – receives the outputs of parts 1 & 2, and returns the final detected quadrangle. The
assumption of this part is that the edge along the document boundaries is relatively uniform.
The algorithm first filters out lines which intersect the text area (part 1). Then, all possible
combinations of 4 lines are investigated. For each combination the algorithm analyses the
gradients along the 4 lines, and calculates the estimated probability that they represent a
uniform straight edge.


Figure 1.1: Detection overall process (1 Acquisition → 2 Detection → 3 Editing → 4 Back-projection & Enhancement)

Figure 1.2: Main blocks

Figure 1.3: Implemented GUI


2.  Textual Area Detection in a Document

Student Name: Felix Vilensky

    The input of this section is the image and the output is a polygon containing the union of

    textual regions of a document. It enables the system to eliminate non-boundary lines from

the pool of lines obtained from the line detection stage. The polygon should be
high-confidence, meaning it should not exceed the boundaries of the document, even at the
price of not detecting everything.

2.1. The physical problem: Measuring texture and salient regions in a document

    Regions with a lot of text, dense drawings and handwriting are characterized by

    strong texture - strong edges in different directions in a small area. These regions

    are also characterized by strong salience relative to the background.

2.2. Multi-scale corner density and MSER analysis

To measure texture strength, and inspired by SIFT, the approach of multi-scale corner
density is introduced. Harris corner detection is applied at different scales using a
Gaussian pyramid of the gray-level image. Each image in the

    pyramid is divided into a grid of non-overlapping spatial bins. The corner detections

    in each such bin are counted and a corner count map for each scale in the pyramid is

obtained. Since the grid size is the same for all scales of the pyramid, the count
maps are added together bin to bin (see Fig 2.1) to obtain a single map. The corner
count map is then scaled to image size, thresholded, and multiplied by the MSER mask
(in order to utilize the salience property).
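The bin-to-bin summation across scales can be sketched as follows. Corner detection itself (Harris on each pyramid level) is assumed to have already produced per-scale coordinate lists; the function and grid size here are illustrative, not the project's actual code.

```python
import numpy as np

def multiscale_corner_count(corners_per_scale, shapes, grid=(8, 8)):
    """Sum per-scale Harris corner counts over a fixed grid of bins.

    corners_per_scale : list of (N, 2) arrays of (row, col) corner
                        coordinates, one per Gaussian-pyramid level.
    shapes            : (height, width) of each pyramid level.
    Because the grid has the same number of bins at every scale, the
    per-scale count maps can simply be added together bin to bin.
    """
    total = np.zeros(grid, dtype=int)
    for pts, (h, w) in zip(corners_per_scale, shapes):
        pts = np.asarray(pts, dtype=float).reshape(-1, 2)
        if pts.size == 0:
            continue
        # Count corners falling into each spatial bin of this level.
        counts, _, _ = np.histogram2d(
            pts[:, 0], pts[:, 1], bins=grid, range=[[0, h], [0, w]])
        total += counts.astype(int)
    return total
```

The resulting map would then be thresholded (e.g. top 1–5%, as in Fig 2.4) before being combined with the MSER mask.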

2.3. Differentiating between background and document

    Since the goal is to auto-crop and auto-align a single document from an image, the

    following reductions can be reasonably made:

    1.  There is a significant color difference between the document and its

    background.

2.  The center of the image is a pixel in the document.

3.  The image is taken with “good intentions” – at least part of the document is in
focus and without strong shadows.

4.  The surface on which the document lies is relatively Lambertian.

Reductions 2 and 3 are almost naturally maintained, since the image should be taken
to capture a good-quality document. Reduction 4 is needed to avoid light spots (Fig
2.2), which are hard to filter using a single image.

    In order to differentiate between detected regions inside the target document and

    in the background, color histogram analysis is performed. For each MSER, a color

    histogram of its neighborhood is calculated. The histogram of the MSER, which is

    closest to the center of the image, is considered “canonical” and MSERs with

    histograms that are substantially different are excluded, thereby removing possible


detections in the background. The final polygon is obtained as the convex hull of the
centroids of the regions in the final mask (color-enhanced MSER combined with the
thresholded, image-sized corner count map).

See the full outline of the algorithm in Fig 2.3.
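The convex-hull step can be sketched with Andrew's monotone-chain algorithm, a stand-in for whatever hull routine the project actually used (e.g. MATLAB's convhull):

```python
def convex_hull(points):
    """Convex hull of 2-D points (counter-clockwise order), via
    Andrew's monotone chain. Input: iterable of (x, y) centroids."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for chain in (pts, pts[::-1]):          # lower hull, then upper hull
        part = []
        for p in chain:
            while len(part) >= 2 and cross(part[-2], part[-1], p) <= 0:
                part.pop()
            part.append(p)
        hull.extend(part[:-1])              # drop duplicated endpoints
    return hull
```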

The histogram difference measure is calculated by converting the histograms to
binary ones (1 for a color that occurs, 0 for a color that does not). The difference is
then the L1 distance between the binary histograms, d(h1, h2) = sum over colors c of
|b1(c) − b2(c)|, where b_i(c) indicates whether color c occurs in histogram i. Hence a
single color present in one histogram and absent in the other increases the histogram
difference metric by one.
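A minimal sketch of this metric; the quantization of pixels into a small palette (15 colors in Fig 2.4) is assumed to have happened upstream, so the inputs are per-color occurrence counts:

```python
import numpy as np

def binary_hist_distance(hist1, hist2):
    """L1 distance between binarized color histograms.

    Each input is an array of per-color occurrence counts. A color
    present in one histogram and absent in the other adds exactly
    one to the distance.
    """
    b1 = (np.asarray(hist1) > 0).astype(int)   # 1 where the color occurs
    b2 = (np.asarray(hist2) > 0).astype(int)
    return int(np.abs(b1 - b2).sum())
```

MSERs whose distance from the "canonical" (center-most) histogram exceeds the threshold (2 in Fig 2.4) would be excluded.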

    2.4. Performance analysis of this part

A total of 97 images are used to compute two performance measures for this part. The
selected images maintain reductions 1–3; reduction 4 is not always maintained, both to
demonstrate the strength of the algorithm (it sometimes deals well with non-Lambertian
surfaces) and its limitations, and because it is not a natural demand from the user.

The measures calculated are precision (between 0 and 1), defined as
area(detected polygon ∩ document) / area(detected polygon), and recall (between 0 and 1),
defined as area(detected polygon ∩ document) / area(document).

The main demand on this part of the whole system is that the polygon does not
exceed the area of the target document (precision = 1).
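Computed on rasterized masks, the two measures can be sketched as:

```python
import numpy as np

def precision_recall(detected, document):
    """Precision and recall of a detected-polygon mask against the
    ground-truth document mask (boolean arrays of equal shape)."""
    detected = np.asarray(detected, dtype=bool)
    document = np.asarray(document, dtype=bool)
    inter = np.logical_and(detected, document).sum()
    precision = inter / max(detected.sum(), 1)   # fraction of detection inside doc
    recall = inter / max(document.sum(), 1)      # fraction of doc covered
    return precision, recall
```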

Three sets of parameters are evaluated, varying the threshold on the corner count map,
the number of colors in the histogram, and the histogram metric threshold used to
differentiate between background and document. For each set of parameters the mean and
standard deviation of each measure are computed. For the results table, see Fig 2.4.

    It can be seen that the precision is generally very high and close to 1 and that there

    is a tradeoff between recall and precision.

2.5. What have I learned?

    First, I’ve learned that trivial solutions usually don’t achieve the desired results. Second,

    I’ve grasped the power of multi-scale analysis.


Figure 2.1: The multi-scale corner density method – the grid is scale invariant.

Figure 2.2: The effects of strong light spots, and the ease with Lambertian surfaces.


    Figure 2.3: The general outline of the algorithm. 

Criteria                     Set 1    Set 2    Set 3
Corner Count Map Threshold   top 5%   top 1%   top 3%
Number of Colors             15       15       15
Color Difference Threshold   2        2        2
Mean Precision               0.9913   0.9942   0.9934
STD Precision                0.0419   0.0360   0.0358
Mean Recall                  0.5046   0.3261   0.4768
STD Recall                   0.2652   0.2424   0.2571

    Figure 2.4: Performance Analysis.

[Fig 2.3 blocks: Multi-Scale Corner, Color Histogram, MSER, Convex Hull, Polygon]


3.  Find candidate boundary lines

Student Name: Yoram Furth

This part, as just described, receives an RGB image and returns a “stack” of suspicious lines
corresponding to each of the 4 document quadrangle segments. The main requirement of
this part is not to miss any boundary. That is, each of the 4 document boundaries must
have a corresponding line among all the returned lines. These lines are used by the
following section in order to find the ones which belong to the rectangle of interest.

3.1. The physical problem: Find boundaries of planar rectangular objects

The assumption is that the document represents a rigid planar rectangle in space,
and that its boundaries are well observed in the photographed image. In fact, its
projection onto the image plane transforms this rectangle into a general quadrangle. Usually
the observed edges are a function of intensity changes, but sometimes also of other
components such as color, texture, focus, etc. Hence, the challenge is to find aligned
gradients which could be part of a rectangular plane in space.

The natural photographing context has advantages and disadvantages. On the one
hand it allows some reductions, but on the other hand it requires releasing some
degrees of freedom. For example, it allows assuming that the planar rectangle has a
relatively small tilt angle with respect to the optical axis (


assumption is that the document boundaries are strong relative to other straight
edges in the image. The proposed solution is to build an orientation histogram. It is
similar to the one in SIFT, but computed over the whole image, with adaptations to the
specific case.

c) Candidate line detection – at this stage the suspicious lines are detected based on a
Radon transform applied around the orientation angle. First, a Radon projection is
applied at the estimated orientation angle ±20°. Then, all the highest peaks are selected.
Each peak represents a candidate line in the original image. See also figure 3.3.

3.3. Combining gradient directions

The gradient magnitude often carries only partial information, which is not good enough
for detection and estimation. In this project, as in many real cases, it was found that
taking the gradient direction into consideration can have a very positive impact on the
detection quality.

One important usage of the gradient directions was assimilated into the Radon
transform. Instead of naively accumulating the Radon along the magnitude map, a special
Directional Radon was developed. For each projected angle, a corresponding projected
gradient map is calculated, and only then is the Radon accumulated. By this technique, all
the points that have no contribution toward the search direction are weakened, and the
line scores become much more significant.

Another usage arises while preparing the line profiles for their analysis in section 4
(below). There, one could sample the line profiles directly on the magnitude map.
Instead, for each line a gradient projection map is generated at the specific direction,
and only then is the profile sampled.
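The directional weighting idea can be sketched as follows. This is a simplified stand-in for the full Directional Radon: the projection here is restricted to the axis-aligned case (summing along image rows), and the cosine weighting is an assumption about how the "projected gradients map" is formed.

```python
import numpy as np

def directional_projection(mag, grad_angle, theta):
    """One Radon-style projection at normal direction `theta`,
    restricted here to the axis-aligned case (summing along rows).

    mag        : gradient magnitude map, shape (H, W)
    grad_angle : gradient direction map in radians, shape (H, W)
    theta      : normal direction of the searched lines, radians
    """
    # Weight each pixel by how well its gradient agrees with theta;
    # points with no contribution toward the search direction vanish.
    weight = np.abs(np.cos(grad_angle - theta))
    projected = mag * weight
    return projected.sum(axis=0)   # one projection bin per column
```

A vertical edge whose gradients point horizontally scores highly when theta matches, and contributes almost nothing to the perpendicular search direction.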

3.4. Finding perpendicular orientation

One way to estimate orientation is by using gradient directions, similar to what is done
in SIFT [2]. In the actual case this involves several challenges. First, since the search area is
not local, one may confuse the desired orientation with other objects, or even with
noise. Second, even within the document itself there is not one single direction, but
4 different directions. So, the developed solution has 2 parts, as follows:

While building the histogram of gradient angles, a special weighted accumulation is
done, where the weight is the magnitude amplitude at each pixel. In that way the most
dominant edges become more dominant in the histogram, at the expense of the weak
ones, such as noise.

While searching for the orientation, it would be nice if the document edges could
contribute to each other rather than disturb. Therefore, the proposed solution is to
“fold” the histogram by a cycle of 90°. In this way, in the ideal case, the histogram
remains with only one significant mode.

An illustration is available in figure 3.2.
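The two-part solution (magnitude-weighted accumulation, then folding by a 90° cycle) can be sketched as a short numpy routine; the bin count is an illustrative choice:

```python
import numpy as np

def dominant_orientation(magnitude, angle_deg, bins=90):
    """Estimate the dominant document orientation from a
    magnitude-weighted histogram of gradient angles, folded to a
    90-degree cycle so the four document edges reinforce one
    another instead of competing. Returns degrees in [0, 90).
    """
    folded = np.asarray(angle_deg) % 90.0            # fold by 90° cycle
    hist, edges = np.histogram(
        folded.ravel(), bins=bins, range=(0.0, 90.0),
        weights=np.asarray(magnitude, dtype=float).ravel())
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])     # bin center
```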


    3.5. Combining color components

In many cases simple “gray” gradients are enough for finding the document boundaries.
But sometimes there are boundaries with very weak edges in the intensity domain, even
though they are well observed by human eyes. Usually those kinds of edges are well seen
in the color domains. That is why dedicated Color Gradients were developed here.

Those Color Gradients are composed of a magnitude and a direction. The magnitude is a
simple extension to the vector case, whose meaning is the Euclidean distance between
colors:

‖∇C(x, y)‖ = ( ‖∇R(x, y)‖² + ‖∇G(x, y)‖² + ‖∇B(x, y)‖² )^(1/2)

For the direction, in this case it is enough to find the most dominant direction. So, for
each pixel the direction is selected from the color channel whose gradient magnitude is
maximal, that is,

θ(x, y) = θ_k(x, y),  where  k = argmax_c ‖∇_c(x, y)‖

3.6. What have I learned, and what is left open?

In this project I learned that sometimes we don't need to look far for good solutions.
Often one can develop solutions based on a deep understanding of relatively simple
tools, fitting or extending them to the specific project requirements. In this context it is
particularly important to define well what is given and what is required to be solved.

In the end, I would like to go deeper into color and texture analysis. For example, I
would like to develop a mechanism that can detect fractional edges defined by different
operators, such as color, texture, etc. Although there are works on the subject, I feel that
there is still quite a bit of progress to be made in the field.


    Figure 3.1: The general outline of the algorithm. 

    Figure 3.2: Orientation detection process.

    Figure 3.3: Lines detection process for a specific direction. 


    4.  Finding the Best Document Quadrangle

    Student Name: Moti Ben Laish

The purpose of this part is to find four points that could be the corners of a document. The
input to the algorithm is a set of lines (Yoram) and the polygon of the text area (Felix) in the
potential document. The algorithm is based on two assumptions: first, that all the
document lines are present in the input lines received from Yoram; second, that the text
area is accurate.

4.1. The physical problem: Find the document quadrangle in the image

In my case the physical problem was filtering out wrong lines. Wrong lines are lines
created by objects or texture in the input image without relation to the real boundaries
of the document.

4.2. Smart information from lines

During the integration with Yoram we built a smart line structure which includes
information about the Radon score, the r and theta values of each line, the color gradient
of each pixel along the line, and the start and end coordinates.
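This line structure could be sketched as a dataclass; the field names are illustrative assumptions, not the project's actual identifiers:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class CandidateLine:
    """Candidate boundary line handed from the line-detection stage
    to the quadrangle-finding stage (field names are illustrative)."""
    rho: float                 # Radon/Hough distance parameter (r)
    theta: float               # line angle, radians
    radon_score: float         # peak height in the directional Radon
    start: tuple               # (row, col) of one endpoint
    end: tuple                 # (row, col) of the other endpoint
    color_gradient: np.ndarray = field(
        default_factory=lambda: np.array([]))  # per-pixel gradient profile
```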

4.3. Finding the Best Quadrangle – step by step

1.  Text filtering – first, all the lines that pass through the text polygon are dropped.

2.  Creating groups of lines – the way a line is created by Yoram gave me the idea to
separate the lines into four groups: up, down, right and left of the target document.

3.  Filtering out the lines with a low Radon score.

4.  Finding all the possible quadrangle combinations from the line groups.

5.  Correlation calculation – every possible combination gets a correlation score. As
can be seen in figure 4.2, I take a vector of edge intensities of each line in the
group and a vector of a series of 4 windows; the size and place of each window
depend on the location of the vertex points. Then I calculate the correlation
coefficients for each group of lines. The highest-scoring group gives the best fitting
group, and hence the 4 vertices of the quadrangle.
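Steps 4–5 above can be sketched as follows. The filtering steps are assumed done, and the score here is a simplified stand-in for the report's windowed correlation (which needs the vertex geometry): mean edge strength minus its spread, so strong and uniform edges win.

```python
import itertools
import numpy as np

def best_quadrangle(top, bottom, left, right, edge_profile):
    """Pick the 4-line combination whose edges look most uniform.

    Each group is a list of candidate line identifiers;
    edge_profile(line) returns the vector of gradient intensities
    sampled along that line.
    """
    best, best_score = None, -np.inf
    # Enumerate every combination of one line per side.
    for combo in itertools.product(top, bottom, left, right):
        profiles = [np.asarray(edge_profile(l), dtype=float) for l in combo]
        # Reward strong, uniform edges; penalize spotty ones.
        score = sum(p.mean() - p.std() for p in profiles)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score
```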

4.4. What have I learned, and what is left open?

In this project I realized the importance of integration between the parts of a project,
and I learned about the capabilities of Radon-based algorithms. To make the output of
this part more robust, I would recommend evaluating the entropy within the scope of
the candidate quadrangle – figure 4.3.


Figure 4.1 – All the possible lines: purple – lines passing through text (polygon in red),
thick yellow – the document.

Figure 4.2 – Correlation method: red – vector of 4 windows, blue – intensity of the
gradient along each line.

Figure 4.3 – Entropy measured: left – entropy of the image in the Lab color map; right –
entropy of the scope of the candidate rectangle.


5.  Summary

The proposed algorithm succeeds in a large percentage of cases under the aforementioned
reductions, and in many cases even when these reductions are not met. A visual comparison
with commercial applications shows that the proposed algorithm achieves better results in a
substantial number of cases (especially when a single image is given as input).

Some ideas to further improve the results include: per-image optimization of algorithm
parameters, cross-talk between the line detection part and the textual region detection
part, improving the reliance of the quadrangle finding part on the textual region detection
part to better deal with cases when document boundaries are absent from the image, and
using a preview to eliminate undesired effects in the captured image (e.g. light spots).

An Excel file with performance results for the proposed algorithm (as a whole and by part)

    applied to many images can be found here:

    Performance Analysis Excel File.

    A detailed description of the performance measures can be found in the “Performance

    Analysis Readme” tab.

6.  References

[1]  Canny, John. "A computational approach to edge detection." IEEE Transactions on Pattern Analysis and
     Machine Intelligence 8.6 (1986): 679-698.
[2]  Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive representation for local image descriptors."
     Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
     (CVPR 2004), Vol. 2. IEEE, 2004.
[3]  Jung, Claudio Rosito, and Rodrigo Schramm. "Rectangle detection based on a windowed Hough transform."
     Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing. IEEE, 2004.
[4]  Szeliski, Richard. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
     Section "Color edge detection", pp. 233-234.
[5]  Epshtein, Boris. "Determining document skew using inter-line spaces." 2011 International Conference on
     Document Analysis and Recognition (ICDAR). IEEE, 2011.
[6]  Hartl, Andreas, and Gerhard Reitmayr. "Rectangular target extraction for mobile augmented reality
     applications." 2012 21st International Conference on Pattern Recognition (ICPR). IEEE, 2012.
[7]  Skoryukina, Natalya, et al. "Real time rectangular document detection on mobile devices." Seventh
     International Conference on Machine Vision (ICMV 2014). International Society for Optics and Photonics, 2015.
[8]  Kurilin, Ilya V., et al. "High-performance automatic cropping and deskew of multiple objects on scanned
     images." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014.

https://drive.google.com/open?id=0BxGZYQZOPhUQamltQUNLaDRlN3c&authuser=0