8/19/2019 Document Auto Alignment
Document Auto-alignment
Yoram Furth, Felix Vilensky, Moti Ben Laish
ABSTRACT
These days, when smartphones take on the functionality of the traditional office, scanning a
document with just "one click" is a necessary capability.
Such an application has many use cases: business cards, receipts, ID cards, lecture notes,
blackboards, screens, brochures, etc.
The task is automatic alignment and enhancement dedicated to documents.
The main challenge is to correctly detect the document under a variety of conditions:
different viewing angles (projection), different backgrounds, and detection when more than a
single document is present in the image.
The proposed solution performs similarly to, and in some scenarios even outperforms,
existing commercial algorithms.
Some ideas, if implemented, could push performance even further.
1. The overall system at a glance

The provided system takes a photograph of a document, automatically detects it, eliminates
projection, and performs basic enhancement.
The system operation flow is as follows (see Figure 1.1): image acquisition (from file or from
webcam), document boundary detection, projection elimination, and basic enhancement.
The most challenging mission of this project is the document boundary detection, so this
report focuses mainly on it.
The “detection” is divided into 3 main parts (see Figure 1.2):
1. Textual region detection – detection of the textual region in the target document.
2. Candidate line detection – detection of potential document boundary lines.
3. Document quadrangle finding – finding the combination of lines which represents
the target document best.
Part 1 – receives an RGB image and returns a polygon representing the area unifying the
textual regions within the photographed document. The assumption is that many such
regions exist in the document. The algorithm uses sophisticated tools which take into
account texture, color components, and scaling variety. Finally, all the detected regions are
unified into one big polygon.
Part 2 – receives an RGB image and returns a “stack” of “suspicious” lines, for each of the 4
document quadrangle segments. The assumption is that the document represents a rigid
planar rectangle in the space, and that its boundaries are well observed in the photographed
image. The algorithm analyses the gradients of the color image, and estimates the general
orientation angle of the document. Then, using Directional Radon, it finds candidate lines
corresponding to each direction of the searched quadrangle.
Part 3 – receives the outputs of parts 1 & 2, and returns the final detected quadrangle. The
assumption of this part is that the edge along the document boundaries is relatively uniform.
The algorithm first filters out lines which intersect the text area (part 1). Then, all possible
combinations of 4 lines are investigated. For each combination the algorithm analyses the
gradients along the 4 lines, and calculates the estimated probability that they represent a
uniform straight edge.
Figure 1.1: Detection overall process (1 Acquisition → 2 Detection → 3 Editing → 4 Back-projection & Enhancement).
Figure 1.2: Main blocks.
Figure 1.3: Implemented GUI.
2. Textual Area Detection in a Document
Student Name: Felix Vilensky
The input of this section is the image, and the output is a polygon containing the union of the
textual regions of a document. It enables the system to eliminate non-boundary lines from
the pool of lines obtained from the line detection stage. The polygon should be high
confidence, meaning it should not exceed the boundaries of the document, even at the price
of not detecting everything.
2.1. The physical problem: Measuring texture and salient regions
in a document
Regions with a lot of text, dense drawings and handwriting are characterized by
strong texture - strong edges in different directions in a small area. These regions
are also characterized by strong salience relative to the background.
2.2. Multi-Scale corner density and MSER analysis
In order to measure texture strength, and inspired by SIFT, the approach of
multi-scale corner density is introduced. Harris corner detection is applied at
different scales using a Gaussian pyramid of a gray-level image. Each image in the
pyramid is divided into a grid of non-overlapping spatial bins. The corner detections
in each such bin are counted, and a corner count map is obtained for each scale in the
pyramid. Since the grid size is the same for all the scales of the pyramid, the count
maps are added together bin by bin (see Fig 2.1) to obtain a single one. The corner
count map is then scaled to image size, thresholded, and multiplied by the MSER mask
(in order to utilize the salience property).
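The multi-scale corner density described above can be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: the Harris response is computed with Sobel derivatives, the grid is fixed at 8×8 bins, and the top-1% threshold, pyramid depth, and smoothing values are all assumptions.

```python
import numpy as np
from scipy import ndimage

def harris_response(img, sigma=1.0, k=0.04):
    """Classic Harris corner response from the smoothed structure tensor."""
    ix = ndimage.sobel(img, axis=1, mode='reflect')
    iy = ndimage.sobel(img, axis=0, mode='reflect')
    sxx = ndimage.gaussian_filter(ix * ix, sigma)
    syy = ndimage.gaussian_filter(iy * iy, sigma)
    sxy = ndimage.gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

def corner_count_map(img, levels=3, grid=(8, 8), thresh_quantile=0.99):
    """Accumulate corner detections from all pyramid scales into one shared grid."""
    counts = np.zeros(grid)
    level = img.astype(float)
    for _ in range(levels):
        r = harris_response(level)
        corners = r > np.quantile(r, thresh_quantile)  # keep strongest responses
        h, w = corners.shape
        ys, xs = np.nonzero(corners)
        # same fixed-size grid at every scale, so the maps can be added bin by bin
        bin_y = np.minimum((ys * grid[0]) // h, grid[0] - 1)
        bin_x = np.minimum((xs * grid[1]) // w, grid[1] - 1)
        np.add.at(counts, (bin_y, bin_x), 1)
        # next pyramid level: blur, then downsample by 2
        level = ndimage.gaussian_filter(level, 1.0)[::2, ::2]
    return counts
```

The returned grid would then be upscaled to image size, thresholded, and combined with the MSER mask as the text describes.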
2.3. Differentiating between background and document
Since the goal is to auto-crop and auto-align a single document from an image, the
following reductions can reasonably be made:
1. There is a significant color difference between the document and its background.
2. The center of the image is a pixel in the document.
3. The image is taken with “good intentions” – at least part of the document is in
focus and without strong shadows.
4. The surface on which the document lies is relatively Lambertian.
Reductions 2 and 3 are almost naturally maintained, since the image should be taken
to capture a good quality document. Reduction 4 is needed to avoid light spots (Fig
2.2), which are hard to filter using a single image.
In order to differentiate between detected regions inside the target document and
in the background, color histogram analysis is performed. For each MSER, a color
histogram of its neighborhood is calculated. The histogram of the MSER which is
closest to the center of the image is considered “canonical”, and MSERs with
histograms that are substantially different are excluded, thereby removing possible
detections in the background. The final polygon is obtained as the convex hull of the
centroids of the regions in the final mask (color-enhanced MSER combined with the
thresholded, image-size corner count map).
See the full outline of the algorithm in Fig 2.3.
The histogram difference measure is calculated by converting the histograms to
binary ones (1 for a color with occurrences, 0 for a color without). Then the difference
is calculated as

    diff(h1, h2) = Σc |h1(c) − h2(c)|

over all colors c. Hence a single color present in one histogram and absent in the other
increases the histogram difference metric by one.
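This binary-histogram distance can be sketched as below. Note an assumption: the report uses color histograms with a configurable number of colors, while this sketch quantizes gray-level patches into 15 bins as an illustrative simplification.

```python
import numpy as np

def binary_hist_diff(patch_a, patch_b, n_colors=15):
    """Count colors present in exactly one of the two patch histograms."""
    # quantize intensities into n_colors bins, keep only presence/absence
    edges = np.linspace(0, 256, n_colors + 1)
    ha = np.histogram(patch_a, bins=edges)[0] > 0
    hb = np.histogram(patch_b, bins=edges)[0] > 0
    # each color present in one histogram and absent in the other adds 1
    return int(np.sum(ha != hb))
```

An MSER whose neighborhood histogram differs from the canonical one by more than a threshold (2 in the experiments below) would be excluded.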
2.4. Performance analysis of this part
A total of 97 images are used to compute two performance measures for this part. The
selected images maintain reductions 1–3; reduction 4 is not always maintained, both to
demonstrate the strength of the algorithm (it sometimes deals well with non-Lambertian
surfaces) and its limitations, and because it is not a natural demand from the user.
The measures calculated are precision (between 0 and 1), which is defined as

    precision = Area[Polygon ∩ Document] / Area[Polygon],

and recall (between 0 and 1), which is defined as

    recall = Area[Polygon ∩ Ground-truth text area] / Area[Ground-truth text area].

The main demand on this part of the whole system is that the polygon does not
exceed the area of the target document (precision = 1).
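These two measures can be computed from binary masks as sketched below. The mask names (`doc_mask`, `gt_text_mask`) stand for hand-marked ground truth and are illustrative; the report does not specify them in code form.

```python
import numpy as np

def precision_recall(pred_mask, doc_mask, gt_text_mask):
    """Precision/recall of a detected text polygon against ground-truth masks."""
    pred = pred_mask.astype(bool)
    # precision: fraction of the detected polygon lying inside the document
    precision = np.logical_and(pred, doc_mask).sum() / max(pred.sum(), 1)
    # recall: fraction of the ground-truth text area covered by the polygon
    recall = np.logical_and(pred, gt_text_mask).sum() / max(gt_text_mask.sum(), 1)
    return precision, recall
```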
Three sets of parameters are evaluated, varying three values: the threshold on the corner
count map, the number of colors in the histogram, and the histogram metric threshold that
differentiates between background and document. For each set of parameters the
mean and standard deviation of each measure is computed. For the results table see
Fig 2.4.
It can be seen that the precision is generally very high and close to 1 and that there
is a tradeoff between recall and precision.
2.5. What have I learned?
First, I’ve learned that trivial solutions usually don’t achieve the desired results. Second,
I’ve grasped the power of multi-scale analysis.
Figure 2.1: The multi-scale corner density method – the grid is scale invariant.
Figure 2.2: The effects of strong light spots, and the ease with Lambertian surfaces.
Figure 2.3: The general outline of the algorithm.
Criteria                     Set 1    Set 2    Set 3
Corner Count Map Threshold   top 5%   top 1%   top 3%
Number of Colors             15       15       15
Color Difference Threshold   2        2        2
Mean Precision               0.9913   0.9942   0.9934
STD Precision                0.0419   0.0360   0.0358
Mean Recall                  0.5046   0.3261   0.4768
STD Recall                   0.2652   0.2424   0.2571
Figure 2.4: Performance Analysis.
(Diagram blocks: Multi-Scale Corner, MSER, Color Histogram, Convex Hull Polygon.)
3. Find candidate boundary lines
Student Name: Yoram Furth
This part, as just described, receives an RGB image and returns a “stack” of suspicious lines
corresponding to each one of the 4 document quadrangle segments. The main requirement
of this part is not to miss any boundary. That is, each of the 4 document boundaries must
have a corresponding line among all the returned lines. Those lines are then used by the
following section to find the ones which belong to the rectangle of interest.
3.1. The physical problem: Find boundaries of planar rectangular objects

The assumption is that the document represents a rigid planar rectangle in space,
and that its boundaries are well observed in the photographed image. In fact, the
projection to the image plane transforms this rectangle into a general quadrangle. Usually
the observed edges are a function of intensity changes, but sometimes also of other
components such as color, texture, focus, etc. Hence, the challenge is to find aligned
gradients which could be part of a rectangular plane in space.
The natural photographing context has advantages and disadvantages. On the one
hand it allows some reductions, but on the other hand it requires releasing some
degrees of freedom. For example, it allows assuming that the planar rectangle has a
relatively small tilt angle with the optical axis (
assumption is that the document boundaries are strong relative to other straight
edges in the image. The proposed solution is to build an orientation histogram. It is
similar to the one of SIFT, but computed on the whole image, with adaptations to the
specific case.
c) Candidate lines detection – at this stage the suspicious lines are detected based on a
Radon transform applied around the orientation angle. First, a Radon projection is
applied at the estimated orientation angle ±20°. Then, all the highest peaks are selected.
Each peak represents a candidate line in the original image. See also figure 3.3.
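This stage can be sketched as below. It is an approximation of the described procedure, not the project's code: the Radon projection at each angle is emulated by rotating the edge map and summing rows, and the angular step and peak count are illustrative.

```python
import numpy as np
from scipy import ndimage

def candidate_lines(edge_mag, orient_deg, span=20, step=2, n_peaks=5):
    """Collect the strongest Radon-style peaks around the estimated orientation."""
    candidates = []
    for ang in np.arange(orient_deg - span, orient_deg + span + 1, step):
        # rotating by -ang makes lines at angle `ang` horizontal, so each
        # row sum of the rotated edge map is one projection value
        rot = ndimage.rotate(edge_mag, -ang, reshape=True, order=1)
        proj = rot.sum(axis=1)
        for offset in np.argsort(proj)[-n_peaks:]:
            candidates.append((float(proj[offset]), float(ang), int(offset)))
    candidates.sort(reverse=True)  # strongest peaks first
    return candidates[:n_peaks]
```

Each returned tuple (score, angle, offset) corresponds to one candidate line in the original image.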
3.3. Combining gradient directions

The gradient magnitude often carries only partial information, which is not good enough
for detection and estimation. In this project, as in many real cases, it was found that
taking the gradient direction into consideration may have a very positive impact on the
detection quality.
One important usage of the gradient directions was assimilated into the Radon
transform. Instead of naively accumulating the Radon along the magnitude map, a special
Directional Radon was developed. For each projected angle, a corresponding projected
gradients map is calculated, and only then the Radon is accumulated. By this technique all
the points that have no contribution toward the search direction are weakened, and the
line scores become much more significant.
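One plausible reading of this per-angle weighting is sketched below (the function name and angle convention are assumptions): before accumulating the projection at a given line angle, keep only the gradient component along that line's normal.

```python
import numpy as np

def directional_map(gx, gy, line_angle_deg):
    """Weight a gradient field toward one line direction."""
    # a line at angle θ has its edge gradient along the normal (θ + 90°);
    # projecting each pixel's gradient onto that normal weakens points
    # whose gradient does not support this line direction
    t = np.radians(line_angle_deg)
    nx, ny = -np.sin(t), np.cos(t)   # unit normal of the line direction
    return np.abs(gx * nx + gy * ny)
```

The Radon accumulation would then run over this map instead of the raw magnitude map.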
Another usage exists while preparing the line profiles for analyzing them in section 4
(below). There, one could sample the line profiles directly on the magnitude map.
Instead, for each line a gradient projection map is generated at the specific direction,
and only then the profile is sampled.
3.4. Finding perpendicular orientation

One way to estimate orientation is by using gradient directions, similar to what is done
in SIFT [2]. In the actual case it involves several challenges. First, since the search area is
not local, one may confuse the desired orientation with other objects, or even with
noise. Second, even within the document itself there is not one single direction, but
4 different directions. So, the developed solution has 2 parts, as follows:
While building the histogram of the gradient angles, a special weighted accumulation is
done, where the weight is the magnitude amplitude at each pixel. In that way the most
dominant edges become more dominant in the histogram, at the expense of the weak
ones, like noise.
While searching for the orientation, it would be nice if the document edges could
contribute to each other rather than disturb. Therefore, the proposed solution is to
“fold” the histogram by a cycle of 90°. In this way, in the ideal case, the histogram
remains with only one significant mode.
An illustration is available in figure 3.2.
3.5. Combining color components

In many cases simple “gray” gradients are enough for finding the document boundaries.
But sometimes there are boundaries with very weak edges in the intensity domain, even
though they are well observed by human eyes. Usually those kinds of edges are well seen
in the color domains. That is why dedicated Color Gradients were developed here.
Those Color Gradients are composed of a magnitude and a direction. The magnitude is a
simple extension to the vector case, whose meaning is the Euclidean distance between
colors:

    ‖∇C(x, y)‖ = ( ‖∇R(x, y)‖² + ‖∇G(x, y)‖² + ‖∇B(x, y)‖² )^(1/2)
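A numpy sketch of such color gradients, assuming Sobel derivatives per channel (an assumption; the report does not name the derivative operator). The magnitude combines the channels Euclideanly, and the direction is taken from the strongest channel, as the text goes on to describe.

```python
import numpy as np
from scipy import ndimage

def color_gradients(rgb):
    """Per-channel Sobel gradients combined into one magnitude/direction map."""
    mags, dirs = [], []
    for c in range(3):
        gx = ndimage.sobel(rgb[..., c], axis=1)
        gy = ndimage.sobel(rgb[..., c], axis=0)
        mags.append(np.hypot(gx, gy))   # per-channel gradient magnitude
        dirs.append(np.arctan2(gy, gx))
    mags = np.stack(mags)               # shape (3, H, W)
    dirs = np.stack(dirs)
    # total magnitude: Euclidean distance between neighboring colors
    magnitude = np.sqrt((mags ** 2).sum(axis=0))
    # direction: taken from the channel whose magnitude is maximal
    k = mags.argmax(axis=0)
    direction = np.take_along_axis(dirs, k[None], axis=0)[0]
    return magnitude, direction
```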
For the direction, in this case, it is enough to find the most dominant direction. So, for
each pixel the direction is selected from the map in which the magnitude is maximal,
that is,

    θ(x, y) = θk(x, y), where k = argmax over channels c of ‖∇Cc(x, y)‖

3.6. What have you learned, and what is left open?
In this project I learned that sometimes we don't need to look far for good solutions.
Often one can develop solutions based on a deep understanding of relatively simple
tools, and by fitting or extending them to the specific project requirements. It is
particularly important in this context to define well what is given, and what is required
to be solved.
In the end, I would like to go deeper into color and texture analyses. For example, I
would like to develop a mechanism that can detect fractional edges defined by different
operators, such as color, texture, etc. Although there are works on the subject, I feel that
there is still quite a bit of progress that may be made in the field.
Figure 3.1: The general outline of the algorithm.
Figure 3.2: Orientation detection process.
Figure 3.3: Lines detection process for a specific direction.
4. Finding the Best Document Quadrangle
Student Name: Moti Ben Laish

The purpose of this part is finding four points that could be the corners of a document. The
inputs of the algorithm are a set of lines (Yoram) and the polygon of the text area (Felix) in the
potential document. The algorithm is based on two assumptions. The first is that all the
document lines are present among the input lines received from Yoram; the second is that
the text area is accurate.
4.1. The physical problem: Find document quadrangle in image
In my case the physical problem was filtering out wrong lines. Wrong lines are lines created
by objects or texture in the input image, with no relation to the real boundaries of
the document.
4.2. Smart information from lines

During integration with Yoram we built a smart structure for a line, which includes
information about the Radon score, the r and theta values of each line, the color gradient of
each pixel on the line, and the start and end coordinates.
4.3. Finding the Best Quadrangle – step by step
1. Text filtering – first I drop all the lines that pass through the text polygon.
2. Creating groups of lines – the way a line is created by Yoram gave me the
idea to separate the lines into four groups: up, down, right and left of the target
document.
3. Filtering the lines with a low Radon score.
4. Finding all the possible combinations of a quadrangle from the line groups.
5. Correlation calculation – every possible combination gets a correlation score. As
can be seen in figure 4.2, I take a vector of edge intensities of each line in the
group and a vector made of a series of 4 windows; the size and the place of each
window depend on the location of the vertex points. Then I calculate the correlation
coefficients for each group of lines. The highest scoring group gives the best fitting
group and hence the 4 vertices of the quadrangle.
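Steps 4–5 above reduce to an exhaustive search over one candidate line per side; a minimal sketch follows. The line representation and the scoring function are placeholders for the "smart" line structure and the correlation score described above.

```python
import itertools

def best_quadrangle(top, bottom, left, right, score_fn):
    """Try every combination of one candidate line per side, keep the best."""
    best, best_score = None, float("-inf")
    for quad in itertools.product(top, bottom, left, right):
        s = score_fn(quad)          # e.g. the correlation score of step 5
        if s > best_score:
            best, best_score = quad, s
    return best, best_score
```

With a handful of candidate lines per side, the combination count stays small, so the brute-force search is affordable.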
4.4. What have you learned and what is left open?
In this project I realized the importance of integration between the parts of a project, and I
learned about the capabilities of Radon-based algorithms. To make the output of this part
more robust, I would recommend evaluating the entropy within the scope of the candidate
quadrangle – figure 4.3.
Figure 4.1 – All the possible lines: purple – lines passing through text (polygon in red);
thick yellow – the document.
Figure 4.2 – Correlation method: red – vector of 4 windows; blue –
intensity of the gradient on each line.
Figure 4.3 – Entropy measured: left – entropy of the image in the Lab color map; right –
entropy of the scope of the candidate rectangle.
5. Summary:

The proposed algorithm succeeds in a large percentage of cases under the aforementioned
reductions, and in many cases even when these reductions are not met. A visual comparison
with commercial applications shows that the proposed algorithm achieves better results in a
substantial number of cases (especially when a single image is given as input).
Some ideas to further improve the results include: per-image optimization of the algorithm
parameters; cross-talk between the line detection part and the textual regions detection part;
improving the reliance of the quadrangle finding part on the textual regions detection part, to
better deal with cases when document boundaries are absent from the image; and using a
preview to eliminate undesired effects in the captured image (e.g. light spots).
An Excel file with performance results for the proposed algorithm (as a whole and by part)
applied to many images can be found here:
Performance Analysis Excel File.
A detailed description of the performance measures can be found in the “Performance
Analysis Readme” tab.
6. References:

[1] Canny, John. "A computational approach to edge detection." IEEE Transactions on Pattern Analysis and
Machine Intelligence 6 (1986): 679-698.
[2] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive representation for local image descriptors."
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
(CVPR 2004). Vol. 2. IEEE, 2004.
[3] Jung, Claudio Rosito, and Rodrigo Schramm. "Rectangle detection based on a windowed Hough transform."
Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing. IEEE, 2004.
[4] Szeliski, Richard. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
Section "Color edge detection", pp. 233-234.
[5] Epshtein, Boris. "Determining document skew using inter-line spaces." 2011 International Conference on
Document Analysis and Recognition (ICDAR). IEEE, 2011.
[6] Hartl, Andreas, and Gerhard Reitmayr. "Rectangular target extraction for mobile augmented reality
applications." 21st International Conference on Pattern Recognition (ICPR 2012). IEEE, 2012.
[7] Skoryukina, Natalya, et al. "Real time rectangular document detection on mobile devices." Seventh
International Conference on Machine Vision (ICMV 2014). International Society for Optics and Photonics, 2015.
[8] Kurilin, Ilya V., et al. "High-performance automatic cropping and deskew of multiple objects on scanned
images." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014.
https://drive.google.com/open?id=0BxGZYQZOPhUQamltQUNLaDRlN3c&authuser=0