8/19/2019 Document Auto Alignment
Document Auto-alignment
Yoram Furth, Felix Vilensky, Moti Ben Laish
ABSTRACT
These days, when smartphones take on the functionality of the traditional office, scanning a
document with just "one click" is a necessary capability.
Such an application has many use cases: business cards, receipts, ID cards, lecture notes,
blackboards, screens, brochures, etc.
The task is automatic alignment and enhancement dedicated to documents.
The main challenge is to correctly detect the document under a variety of conditions:
different viewing angles (projection), different backgrounds, and detection when more than a
single document is present in the image.
The proposed solution performs similarly to, and in some scenarios even outperforms,
existing commercial algorithms.
Some ideas, if implemented, could push performance even further.
1. The overall system at a glance

The provided system takes a photograph of a document, automatically detects it, eliminates
projection, and performs basic enhancement.
The system operation flow is as follows (see Figure 1.1): image acquisition (from file or from
webcam), document boundary detection, projection elimination, and basic enhancement.
The most challenging mission of this project is the document boundary detection, so this
report focuses mainly on it.
The “detection” is divided into 3 main parts (see Figure 1.2):
1. Textual region detection – detection of the textual region in the target document.
2. Candidate line detection – detection of potential document boundary lines.
3. Document quadrangle finding – finding the combination of lines which represents
the target document best.
Part 1 – receives an RGB image and returns a polygon representing the area unifying the
textual regions within the photographed document. The assumption is that many such
regions exist in the document. The algorithm uses sophisticated tools which take into
account texture, color components, and scaling variety. Finally, all the detected regions are
unified into one big polygon.
Part 2 – receives an RGB image and returns a “stack” of “suspicious” lines, for each of the 4
document quadrangle segments. The assumption is that the document represents a rigid
planar rectangle in the space, and that its boundaries are well observed in the photographed
image. The algorithm analyses the gradients of the color image, and estimates the general
orientation angle of the document. Then, using Directional Radon, it finds candidate lines
corresponding to each direction of the searched quadrangle.
Part 3 – receives the outputs of parts 1 & 2, and returns the final detected quadrangle. The
assumption of this part is that the edge along the document boundaries is relatively uniform.
The algorithm first filters out lines which intersect the text area (part 1). Then, all possible
combinations of 4 lines are investigated. For each combination the algorithm analyses the
gradients along the 4 lines, and calculates the estimated probability that they represent a
uniform straight edge.
Figure 1.1: Detection overall process (1 Acquisition → 2 Detection → 3 Editing → 4 Back-projection & Enhancement).
Figure 1.2: Main blocks.
Figure 1.3: Implemented GUI.
2. Textual Area Detection in a Document
Student Name: Felix Vilensky
The input of this section is the image, and the output is a polygon containing the union of the
textual regions of a document. It enables the system to eliminate non-boundary lines from
the pool of lines obtained from the line detection stage. The polygon should be high
confidence, meaning it should not exceed the boundaries of the document, even at the price
of not detecting everything.
2.1. The physical problem: Measuring texture and salient regions
in a document
Regions with a lot of text, dense drawings and handwriting are characterized by
strong texture - strong edges in different directions in a small area. These regions
are also characterized by strong salience relative to the background.
2.2. Multi-Scale corner density and MSER analysis
In order to measure texture strength, and inspired by SIFT, the approach of
multi-scale corner density is introduced. Harris corner detection is applied at
different scales using a Gaussian pyramid of a gray-level image. Each image in the
pyramid is divided into a grid of non-overlapping spatial bins. The corner detections
in each such bin are counted, and a corner count map is obtained for each scale in the
pyramid. Since the grid size is the same for all the scales of the pyramid, the count
maps are added together bin by bin (see Fig 2.1) to obtain a single one. The corner
count map is then scaled to image size, thresholded, and multiplied by the MSER mask
(in order to utilize the salience property).
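The multi-scale corner density described above can be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: the Harris response is computed with Sobel derivatives, the grid is fixed at 8×8 bins, and the top-1% threshold, pyramid depth, and smoothing values are all assumptions.

```python
import numpy as np
from scipy import ndimage

def harris_response(img, sigma=1.0, k=0.04):
    """Classic Harris corner response from the smoothed structure tensor."""
    ix = ndimage.sobel(img, axis=1, mode='reflect')
    iy = ndimage.sobel(img, axis=0, mode='reflect')
    sxx = ndimage.gaussian_filter(ix * ix, sigma)
    syy = ndimage.gaussian_filter(iy * iy, sigma)
    sxy = ndimage.gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

def corner_count_map(img, levels=3, grid=(8, 8), thresh_quantile=0.99):
    """Accumulate corner detections from all pyramid scales into one shared grid."""
    counts = np.zeros(grid)
    level = img.astype(float)
    for _ in range(levels):
        r = harris_response(level)
        corners = r > np.quantile(r, thresh_quantile)  # keep strongest responses
        h, w = corners.shape
        ys, xs = np.nonzero(corners)
        # same fixed-size grid at every scale, so the maps can be added bin by bin
        bin_y = np.minimum((ys * grid[0]) // h, grid[0] - 1)
        bin_x = np.minimum((xs * grid[1]) // w, grid[1] - 1)
        np.add.at(counts, (bin_y, bin_x), 1)
        # next pyramid level: blur, then downsample by 2
        level = ndimage.gaussian_filter(level, 1.0)[::2, ::2]
    return counts
```

The returned grid would then be upscaled to image size, thresholded, and combined with the MSER mask as the text describes.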
2.3. Differentiating between background and document
Since the goal is to auto-crop and auto-align a single document from an image, the
following reductions can reasonably be made:
1. There is a significant color difference between the document and its background.
2. The center of the image is a pixel in the document.
3. The image is taken with “good intentions” – at least part of the document is in
focus and without strong shadows.
4. The surface on which the document lies is relatively Lambertian.
Reductions 2 and 3 are almost naturally maintained, since the image should be taken
to capture a good quality document. Reduction 4 is needed to avoid light spots (Fig
2.2), which are hard to filter using a single image.
In order to differentiate between detected regions inside the target document and
in the background, color histogram analysis is performed. For each MSER, a color
histogram of its neighborhood is calculated. The histogram of the MSER which is
closest to the center of the image is considered “canonical”, and MSERs with
histograms that are substantially different are excluded, thereby removing possible
detections in the background. The final polygon is obtained as the convex hull of the
centroids of the regions in the final mask (color-enhanced MSER combined with the
thresholded, image-size corner count map).
See the full outline of the algorithm in Fig 2.3.
The histogram difference measure is calculated by converting the histograms to
binary ones (1 for a color with occurrences, 0 for a color without). Then the difference
is calculated as

    diff(h1, h2) = Σc |h1(c) − h2(c)|

over all colors c. Hence a single color present in one histogram and absent in the other
increases the histogram difference metric by one.
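This binary-histogram distance can be sketched as below. Note an assumption: the report uses color histograms with a configurable number of colors, while this sketch quantizes gray-level patches into 15 bins as an illustrative simplification.

```python
import numpy as np

def binary_hist_diff(patch_a, patch_b, n_colors=15):
    """Count colors present in exactly one of the two patch histograms."""
    # quantize intensities into n_colors bins, keep only presence/absence
    edges = np.linspace(0, 256, n_colors + 1)
    ha = np.histogram(patch_a, bins=edges)[0] > 0
    hb = np.histogram(patch_b, bins=edges)[0] > 0
    # each color present in one histogram and absent in the other adds 1
    return int(np.sum(ha != hb))
```

An MSER whose neighborhood histogram differs from the canonical one by more than a threshold (2 in the experiments below) would be excluded.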
2.4. Performance analysis of this part
A total of 97 images are used to compute two performance measures for this part. The
selected images maintain reductions 1–3; reduction 4 is not always maintained, both to
demonstrate the strength of the algorithm (it sometimes deals well with non-Lambertian
surfaces) and its limitations, and because it is not a natural demand from the user.
The measures calculated are precision (between 0 and 1), which is defined as

    precision = Area[Polygon ∩ Document] / Area[Polygon],

and recall (between 0 and 1), which is defined as

    recall = Area[Polygon ∩ Ground-truth text area] / Area[Ground-truth text area].

The main demand on this part of the whole system is that the polygon does not
exceed the area of the target document (precision = 1).
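These two measures can be computed from binary masks as sketched below. The mask names (`doc_mask`, `gt_text_mask`) stand for hand-marked ground truth and are illustrative; the report does not specify them in code form.

```python
import numpy as np

def precision_recall(pred_mask, doc_mask, gt_text_mask):
    """Precision/recall of a detected text polygon against ground-truth masks."""
    pred = pred_mask.astype(bool)
    # precision: fraction of the detected polygon lying inside the document
    precision = np.logical_and(pred, doc_mask).sum() / max(pred.sum(), 1)
    # recall: fraction of the ground-truth text area covered by the polygon
    recall = np.logical_and(pred, gt_text_mask).sum() / max(gt_text_mask.sum(), 1)
    return precision, recall
```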
Three sets of parameters are evaluated, varying three values: the threshold on the corner
count map, the number of colors in the histogram, and the histogram metric threshold that
differentiates between background and document. For each set of parameters the
mean and standard deviation of each measure is computed. For the results table see
Fig 2.4.
It can be seen that the precision is generally very high and close to 1 and that there
is a tradeoff between recall and precision.
2.5. What have I learned?
First, I’ve learned that trivial solutions usually don’t achieve the desired results. Second,
I’ve grasped the power of multi-scale analysis.
Figure 2.1: The multi-scale corner density method – the grid is scale invariant.
Figure 2.2: The effects of strong light spots, and the ease with Lambertian surfaces.
Figure 2.3: The general outline of the algorithm.
Criteria                     Set 1    Set 2    Set 3
Corner Count Map Threshold   top 5%   top 1%   top 3%
Number of Colors             15       15       15
Color Difference Threshold   2        2        2
Mean Precision               0.9913   0.9942   0.9934
STD Precision                0.0419   0.0360   0.0358
Mean Recall                  0.5046   0.3261   0.4768
STD Recall                   0.2652   0.2424   0.2571
Figure 2.4: Performance Analysis.
(Diagram blocks: Multi-Scale Corner, MSER, Color Histogram, Convex Hull Polygon.)
3. Find candidate boundary lines
Student Name: Yoram Furth
This part, as just described, receives an RGB image and returns a “stack” of suspicious lines
corresponding to each one of the 4 document quadrangle segments. The main requirement
of this part is not to miss any boundary. That is, each of the 4 document boundaries must
have a corresponding line among all the returned lines. Those lines are then used by the
following section to find the ones which belong to the rectangle of interest.
3.1. The physical problem: Find boundaries of planar rectangular objects

The assumption is that the document represents a rigid planar rectangle in space,
and that its boundaries are well observed in the photographed image. In fact, the
projection to the image plane transforms this rectangle into a general quadrangle. Usually
the observed edges are a function of intensity changes, but sometimes also of other
components such as color, texture, focus, etc. Hence, the challenge is to find aligned
gradients which could be part of a rectangular plane in space.
The natural photographing context has advantages and disadvantages. On the one
hand it allows some reductions, but on the other hand it requires releasing some
degrees of freedom. For example, it allows assuming that the planar rectangle has a
relatively small tilt angle with the optical axis (
assumption is that the document boundaries are strong relative to other straight
edges in the image. The proposed solution is to build an orientation histogram. It is
similar to the one of SIFT, but computed on the whole image, with adaptations to the
specific case.
c) Candidate lines detection – at this stage the suspicious lines are detected based on a
Radon transform applied around the orientation angle. First, a Radon projection is
applied at the estimated orientation angle ±20°. Then, all the highest peaks are selected.
Each peak represents a candidate line in the original image. See also figure 3.3.
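This stage can be sketched as below. It is an approximation of the described procedure, not the project's code: the Radon projection at each angle is emulated by rotating the edge map and summing rows, and the angular step and peak count are illustrative.

```python
import numpy as np
from scipy import ndimage

def candidate_lines(edge_mag, orient_deg, span=20, step=2, n_peaks=5):
    """Collect the strongest Radon-style peaks around the estimated orientation."""
    candidates = []
    for ang in np.arange(orient_deg - span, orient_deg + span + 1, step):
        # rotating by -ang makes lines at angle `ang` horizontal, so each
        # row sum of the rotated edge map is one projection value
        rot = ndimage.rotate(edge_mag, -ang, reshape=True, order=1)
        proj = rot.sum(axis=1)
        for offset in np.argsort(proj)[-n_peaks:]:
            candidates.append((float(proj[offset]), float(ang), int(offset)))
    candidates.sort(reverse=True)  # strongest peaks first
    return candidates[:n_peaks]
```

Each returned tuple (score, angle, offset) corresponds to one candidate line in the original image.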
3.3. Combining gradient directions

The gradient magnitude often carries only partial information, which is not good enough
for detection and estimation. In this project, as in many real cases, it was found that
taking the gradient direction into consideration may have a very positive impact on the
detection quality.
One important usage of the gradient directions was assimilated into the Radon
transform. Instead of naively accumulating the Radon along the magnitude map, a special
Directional Radon was developed. For each projected angle, a corresponding projected
gradients map is calculated, and only then the Radon is accumulated. By this technique all
the points that have no contribution toward the search direction are weakened, and the
line scores become much more significant.
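One plausible reading of this per-angle weighting is sketched below (the function name and angle convention are assumptions): before accumulating the projection at a given line angle, keep only the gradient component along that line's normal.

```python
import numpy as np

def directional_map(gx, gy, line_angle_deg):
    """Weight a gradient field toward one line direction."""
    # a line at angle θ has its edge gradient along the normal (θ + 90°);
    # projecting each pixel's gradient onto that normal weakens points
    # whose gradient does not support this line direction
    t = np.radians(line_angle_deg)
    nx, ny = -np.sin(t), np.cos(t)   # unit normal of the line direction
    return np.abs(gx * nx + gy * ny)
```

The Radon accumulation would then run over this map instead of the raw magnitude map.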
Another usage exists while preparing the line profiles for analyzing them in section 4
(below). There, one could sample the line profiles directly on the magnitude map.
Instead, for each line a gradient projection map is generated at the specific direction,
and only then the profile is sampled.
3.4. Finding perpendicular orientation

One way to estimate orientation is by using gradient directions, similar to what is done
in SIFT [2]. In the actual case it involves several challenges. First, since the search area is
not local, one may confuse the desired orientation with other objects, or even with
noise. Second, even within the document itself there is not one single direction, but
4 different directions. So, the developed solution has 2 parts, as follows:
While building the histogram of the gradient angles, a special weighted accumulation is
done, where the weight is the magnitude amplitude at each pixel. In that way the most
dominant edges become more dominant in the histogram, at the expense of the weak
ones, like noise.
While searching for the orientation, it would be nice if the document edges could
contribute to each other rather than disturb. Therefore, the proposed solution is to
“fold” the histogram by a cycle of 90°. In this way, in the ideal case, the histogram
remains with only one significant mode.
An illustration is available in figure 3.2.
3.5. Combining color components

In many cases simple “gray” gradients are enough for finding the document boundaries.
But sometimes there are boundaries with very weak edges in the intensity domain, even
though they are well observed by human eyes. Usually those kinds of edges are well seen
in the color domains. That is why dedicated Color Gradients were developed here.
Those Color Gradients are composed of a magnitude and a direction. The magnitude is a
simple extension to the vector case, whose meaning is the Euclidean distance between
colors:

    ‖∇C(x, y)‖ = ( ‖∇R(x, y)‖² + ‖∇G(x, y)‖² + ‖∇B(x, y)‖² )^(1/2)
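A numpy sketch of such color gradients, assuming Sobel derivatives per channel (an assumption; the report does not name the derivative operator). The magnitude combines the channels Euclideanly, and the direction is taken from the strongest channel, as the text goes on to describe.

```python
import numpy as np
from scipy import ndimage

def color_gradients(rgb):
    """Per-channel Sobel gradients combined into one magnitude/direction map."""
    mags, dirs = [], []
    for c in range(3):
        gx = ndimage.sobel(rgb[..., c], axis=1)
        gy = ndimage.sobel(rgb[..., c], axis=0)
        mags.append(np.hypot(gx, gy))   # per-channel gradient magnitude
        dirs.append(np.arctan2(gy, gx))
    mags = np.stack(mags)               # shape (3, H, W)
    dirs = np.stack(dirs)
    # total magnitude: Euclidean distance between neighboring colors
    magnitude = np.sqrt((mags ** 2).sum(axis=0))
    # direction: taken from the channel whose magnitude is maximal
    k = mags.argmax(axis=0)
    direction = np.take_along_axis(dirs, k[None], axis=0)[0]
    return magnitude, direction
```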
For the direction, in this case, it is enough to find the most dominant direction. So, for
each pixel the direction is selected from the map in which the magnitude is maximal,
that is,

    θ(x, y) = θk(x, y), where k = argmax over channels c of ‖∇Cc(x, y)‖

3.6. What have you learned, and what is left open?
In this project I learned that sometimes we don't need to look far for good solutions.
Often one can develop solutions based on a deep understanding of relatively simple
tools, and by fitting or extending them to the specific project requirements. It is
particularly important in this context to define well what is given, and what is required
to be solved.
In the end, I would like to go deeper into color and texture analyses. For example, I
would like to develop a mechanism that can detect fractional edges defined by different
operators, such as color, texture, etc. Although there are works on the subject, I feel that
there is still quite a bit of progress that may be made in the field.
Figure 3.1: The general outline of the algorithm.
Figure 3.2: Orientation detection process.
Figure 3.3: Lines detection process for a specific direction.
4. Finding the Best Document Quadrangle
Student Name: Moti Ben Laish

The purpose of this part is finding four points that could be the corners of a document. The
inputs of the algorithm are a set of lines (Yoram) and the polygon of the text area (Felix) in the
potential document. The algorithm is based on two assumptions. The first is that all the
document lines are present among the input lines received from Yoram; the second is that
the text area is accurate.
4.1. The physical problem: Find document quadrangle in image
In my case the physical problem was filtering out wrong lines. Wrong lines are lines created
by objects or texture in the input image, with no relation to the real boundaries of
the document.
4.2. Smart information from lines

During integration with Yoram we built a smart structure for a line, which includes
information about the Radon score, the r and theta values of each line, the color gradient of
each pixel on the line, and the start and end coordinates.
4.3. Finding the Best Quadrangle – step by step
1. Text filtering – first I drop all the lines that pass through the text polygon.
2. Creating groups of lines – the way a line is created by Yoram gave me the
idea to separate the lines into four groups: up, down, right and left of the target
document.
3. Filtering the lines with a low Radon score.
4. Finding all the possible combinations of a quadrangle from the line groups.
5. Correlation calculation – every possible combination gets a correlation score. As
can be seen in figure 4.2, I take a vector of edge intensities of each line in the
group and a vector made of a series of 4 windows; the size and the place of each
window depend on the location of the vertex points. Then I calculate the correlation
coefficients for each group of lines. The highest scoring group gives the best fitting
group and hence the 4 vertices of the quadrangle.
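Steps 4–5 above reduce to an exhaustive search over one candidate line per side; a minimal sketch follows. The line representation and the scoring function are placeholders for the "smart" line structure and the correlation score described above.

```python
import itertools

def best_quadrangle(top, bottom, left, right, score_fn):
    """Try every combination of one candidate line per side, keep the best."""
    best, best_score = None, float("-inf")
    for quad in itertools.product(top, bottom, left, right):
        s = score_fn(quad)          # e.g. the correlation score of step 5
        if s > best_score:
            best, best_score = quad, s
    return best, best_score
```

With a handful of candidate lines per side, the combination count stays small, so the brute-force search is affordable.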
4.4. What have you learned and what is left open?
In this project I realized the importance of integration between the parts of a project, and I
learned about the capabilities of Radon-based algorithms. To make the output of this part
more robust, I would recommend evaluating the entropy within the scope of the candidate
quadrangle – figure 4.3.
Figure 4.1 – All the possible lines: purple – lines passing through text (polygon in red);
thick yellow – the document.
Figure 4.2 – Correlation method: red – vector of 4 windows; blue –
intensity of the gradient on each line.
Figure 4.3 – Entropy measured: left – entropy of the image in the Lab color map; right –
entropy of the scope of the candidate rectangle.
5. Summary:

The proposed algorithm succeeds in a large percentage of cases under the aforementioned
reductions, and in many cases even when these reductions are not met. A visual comparison
with commercial applications shows that the proposed algorithm achieves better results in a
substantial number of cases (especially when a single image is given as input).
Some ideas to further improve the results include: per-image optimization of the algorithm
parameters; cross-talk between the line detection part and the textual regions detection part;
improving the reliance of the quadrangle finding part on the textual regions detection part, to
better deal with cases when document boundaries are absent from the image; and using a
preview to eliminate undesired effects in the captured image (e.g. light spots).
An Excel file with performance results for the proposed algorithm (as a whole and by part)
applied to many images can be found here:
Performance Analysis Excel File.
A detailed description of the performance measures can be found in the “Performance
Analysis Readme” tab.
6. References:

[1] Canny, John. "A computational approach to edge detection." IEEE Transactions on Pattern Analysis and
Machine Intelligence 6 (1986): 679-698.
[2] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive representation for local image descriptors."
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
(CVPR 2004). Vol. 2. IEEE, 2004.
[3] Jung, Claudio Rosito, and Rodrigo Schramm. "Rectangle detection based on a windowed Hough transform."
Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing. IEEE, 2004.
[4] Szeliski, Richard. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
Section "Color edge detection", pp. 233-234.
[5] Epshtein, Boris. "Determining document skew using inter-line spaces." 2011 International Conference on
Document Analysis and Recognition (ICDAR). IEEE, 2011.
[6] Hartl, Andreas, and Gerhard Reitmayr. "Rectangular target extraction for mobile augmented reality
applications." 21st International Conference on Pattern Recognition (ICPR 2012). IEEE, 2012.
[7] Skoryukina, Natalya, et al. "Real time rectangular document detection on mobile devices." Seventh
International Conference on Machine Vision (ICMV 2014). International Society for Optics and Photonics, 2015.
[8] Kurilin, Ilya V., et al. "High-performance automatic cropping and deskew of multiple objects on scanned
images." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014.
https://drive.google.com/open?id=0BxGZYQZOPhUQamltQUNLaDRlN3c&authuser=0