
Recognising Text from CAPTCHA - IIT Kanpur (home.iitk.ac.in/~ankitkr/projects/captcha/report.pdf)

Recognising Text from CAPTCHA
CS674 : Machine Learning Project Report

Akshay Mittal (Y8056)

Ankit Kumar (Y8088)

Abstract—In this report we present the application of machine learning techniques to the problem of reading text from CAPTCHAs. We focus on Eliot's CAPTCHA for this purpose. We first present techniques for reading a simple CAPTCHA that requires only single-character classification and trivial segmentation. We then show how we use a more complex segmentation approach along with clustering techniques to read a more difficult CAPTCHA: one that requires significant effort in noise removal and de-cluttering before classification of individual characters can be attempted. We detail our noise-removal and clustering-based techniques, and present our results for this CAPTCHA.

Keywords-captcha; Eliot; noise;

I. INTRODUCTION

A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are supposedly unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, as it is administered by a machine and targeted at a human, in contrast to the standard Turing test that is typically administered by a human and targeted at a machine. A common type of CAPTCHA requires the user to type letters or digits from a distorted image that appears on the screen.

A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that combine pre-processing, segmentation and classification. The pre-processing and classification steps are easy tasks for computers. The only step where humans still outperform computers is segmentation. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on making segmentation hard.

Most of the previous work in this area has focussed on CAPTCHAs which increase complexity through curvilinear distortion and overlapping of characters. Little work has been done on CAPTCHAs where the complexity arises from background clutter which connects the text. In such a case, with current software techniques, segmentation becomes almost impossible. Hence, we focus on one such open-source CAPTCHA: Eliot's CAPTCHA [1].

For computational reasons, we generate CAPTCHAs from the generator using only the digits (0-9) as characters.

A. Prior Work

Breaking CAPTCHAs is not new. Mori and Malik [2] have successfully broken the EZ-Gimpy (92% success) and Gimpy (33% success) CAPTCHAs from CMU. Researchers in the past have worked on cracking the CAPTCHAs of Yahoo, Gmail, TicketMaster, MSN/Hotmail and Register.com [3], [4] with good success rates. Chellapilla [3], [4] shows that cracking a CAPTCHA is essentially a two-fold process: first segmenting the CAPTCHA into individual characters (which is harder), and then the easier part of classifying the segmented characters for recognition. Chellapilla has shown that the classification part is the easier one when using a convolutional neural network [5]. These papers also cite some preprocessing techniques that could help in segmentation of the characters.

II. THE ELIOT’S PHP CAPTCHA

Eliot's PHP CAPTCHA system [1] is a CAPTCHA-generating library freely available on the web. Some examples of CAPTCHAs generated with EPC are shown in Fig. 1, Fig. 2 and Fig. 3. The important characteristics to note in this CAPTCHA are the following. First, the individual characters are placed in invisible, equally-sized subdivisions of the image. Second, the characters use a regular computer typeface (with random font color) without any transformations other than translation, rotation, size scaling and shadowing. Third, lines have been randomly drawn with random colors across the image.

Figure 1: Type-1 Eliot Captcha


Figure 2: Type-2 Eliot Captcha

Figure 3: Type-3 Eliot Captcha

III. OUR APPROACH

A. SVM Classification

We use an SVM classifier for classifying each segmented digit obtained from a CAPTCHA, using raw pixel values as features. We used the libsvm Support Vector Machine package [6] for this purpose.

The libsvm library provides support for several kernels. We determined the classification accuracy with libsvm using the linear kernel and the radial basis function (RBF) kernel. The libsvm authors recommend that the RBF kernel be used with the kernel parameter γ and the penalty parameter C chosen by cross-validation. The linear kernel provides good results, and has the advantage that, when selected, the SVM will train far more quickly than with the RBF kernel. The default cross-validation training implementation in libsvm produces excellent results, but requires considerable computing resources. With the linear kernel we used the libsvm default C = 1, and with the RBF kernel we used libsvm's grid-search optimization of C and γ.
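The kernel comparison above can be sketched as follows using scikit-learn's SVC, which wraps the same libsvm library. The data here is a synthetic stand-in (the report's real features are raw 40 x 40 pixel values), and the parameter grid is an illustrative assumption, not the exact grid used.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Fake dataset: 10 "numeral" classes, 100 noisy samples per class
# (mirroring the report's 100 training samples per numeral).
centers = rng.normal(size=(10, 64))
X = np.vstack([c + 0.3 * rng.normal(size=(100, 64)) for c in centers])
y = np.repeat(np.arange(10), 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Linear kernel with the libsvm default C = 1.
linear_acc = SVC(kernel="linear", C=1.0).fit(Xtr, ytr).score(Xte, yte)

# RBF kernel with C and gamma chosen by cross-validated grid search,
# as the libsvm authors recommend.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.001]},
                    cv=3)
rbf_acc = grid.fit(Xtr, ytr).score(Xte, yte)
```

On well-separated data both kernels do well; the report's Table I shows the RBF kernel's advantage on the harder end-to-end task.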

Table I: Impact of SVM kernel choice on accuracy

Accuracy              Linear Kernel   RBF Kernel
Individual Character  98.4%           99.9%
End-to-End            94.5%           96.5%

Table I shows the classification accuracy for the Type-1 Eliot CAPTCHA. Recognising the superior performance of the RBF kernel, we have used the RBF kernel for all further classifications.

For each type of Eliot's CAPTCHA, we trained the SVM with 100 samples for each numeral (40 x 40 px), i.e. 1000 in total. For testing, we used 100 CAPTCHA images of the respective type.

B. Targeting Type-1 and Type-2 Eliot CAPTCHA

1) Vertical Segmentation: A vertical segmentation method is applied to segment a CAPTCHA vertically into several chunks, each of which contains exactly one character [7]. The process of vertical segmentation starts by mapping the image to a histogram that represents the sum of the intensities of the pixels in each column of the image. Then, vertical segmentation lines (Fig. 4) separate the image into chunks by cutting through columns whose total intensity is less than a predefined threshold.
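The column-histogram cut described above can be sketched as follows. The toy binary image and the zero threshold are illustrative assumptions; the report operates on real pixel intensities with its own threshold.

```python
import numpy as np

def vertical_segments(img, threshold=0):
    """Return (start, end) column ranges whose column sums exceed threshold."""
    col_sums = img.sum(axis=0)          # histogram of per-column intensity
    ink = col_sums > threshold          # columns that contain character pixels
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                   # a new chunk begins
        elif not has_ink and start is not None:
            segments.append((start, x)) # chunk ends at a low-intensity column
            start = None
    if start is not None:
        segments.append((start, len(ink)))
    return segments

# Two "characters" separated by a blank column band:
img = np.zeros((5, 12), dtype=int)
img[:, 1:4] = 1
img[:, 7:10] = 1
print(vertical_segments(img))   # -> [(1, 4), (7, 10)]
```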

Figure 4: Vertical Segmentation Example [7]

2) Horizontal Segmentation: After obtaining the individual numerals, we perform horizontal segmentation (analogous to vertical segmentation) in order to get the best-fit rectangle enclosing each numeral.

3) Scaling: The best-fit rectangles so obtained are of varied sizes, as the numerals in the CAPTCHAs are of different sizes. Hence, we scale the obtained best-fit rectangles to a fixed 40 x 40 resolution and then feed them to the SVM for classification.
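A minimal sketch of this crop-and-scale step: find the best-fit bounding box of a binary glyph, then resample it onto the fixed 40 x 40 grid. Nearest-neighbour sampling is our assumption; the report does not state which interpolation it used.

```python
import numpy as np

def crop_and_scale(img, size=40):
    # Best-fit rectangle: first/last non-empty row and column.
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    glyph = img[r0:r1 + 1, c0:c1 + 1]
    # Nearest-neighbour resample onto the fixed size x size grid.
    ri = (np.arange(size) * glyph.shape[0] / size).astype(int)
    ci = (np.arange(size) * glyph.shape[1] / size).astype(int)
    return glyph[np.ix_(ri, ci)]

glyph = np.zeros((10, 6), dtype=int)
glyph[2:8, 1:5] = 1        # a 6x4 "numeral" inside a larger box
out = crop_and_scale(glyph)
print(out.shape)           # -> (40, 40)
```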

4) Results: The results of the SVM classifier for Type-1 and Type-2 Eliot's CAPTCHAs are shown in Table II.

Table II: Classification accuracy for Type-1 and Type-2 Eliot CAPTCHAs

Accuracy              Type-1   Type-2
Individual Character  99.9%    89.7%
End-to-End            96.5%    33.4%

C. Targeting Type-3 Eliot CAPTCHA

Note that the Type-3 CAPTCHAs contain a lot of clutter in the background. As a result, the vertical segmentation technique cannot be applied to the CAPTCHA. To solve this problem, we use the fact that the numerals in the Eliot CAPTCHA are drawn in invisible boxes whose size is one-fifth of the CAPTCHA size. Hence, we cut the CAPTCHA into 5 equal parts to separate out the numerals.
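Because the layout is fixed, the segmentation step reduces to slicing the image into five equal vertical strips. A one-line sketch with a toy numpy array:

```python
import numpy as np

def split_fifths(img):
    # Each numeral sits in an invisible box one-fifth of the image width.
    w = img.shape[1] // 5
    return [img[:, i * w:(i + 1) * w] for i in range(5)]

captcha = np.arange(10 * 100).reshape(10, 100)   # toy 10x100 "image"
boxes = split_fifths(captcha)
print(len(boxes), boxes[0].shape)                # -> 5 (10, 20)
```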

To separate the numerals from the clutter, the first idea that comes to mind is to use a clustering technique. This method, however, fails since the number of clusters is not fixed and varies from CAPTCHA to CAPTCHA depending upon the amount of clutter. Fig. 5 and Fig. 6 show the result of directly applying k-means clustering on a sample segmented numeral box.

From the above figures, it is evident that a clustering algorithm cannot be directly applied to a CAPTCHA with such a huge amount of noise. In the following sections, we discuss a 5-fold filtering process for refining the numeral.

(a) Original  (b) k=2  (c) k=5

Figure 5: Clustering for different classes

(a) Original  (b) k=2  (c) k=3

Figure 6: Clustering for different classes

The various stages of the sample numeral through the following filters are shown in Fig. 8, with the starting image being Fig. 7a.

1) Color-Based Filter: The clutter in the CAPTCHAs is due to the various lines present in the background. These lines are of different color compositions. We utilise this information to reduce the noise level. Each pixel of the image is analysed and its Manhattan distance from each of its 8 neighbouring pixels is computed. If the Manhattan distance of a pixel A from another pixel B is greater than a threshold α, then pixel A is said to be distant from pixel B. The color-based filter removes each pixel (i.e. converts the pixel to white) which has at least β distant neighbours. The result of the filter for the segmented numeral is shown in Fig. 7b. For this example, α = 70 and β = 4. Fig. 7b shows significant noise removal.
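The color-based filter can be sketched as follows. The values α = 70 and β = 4 follow the example in the text; the toy image is our own. This is a direct but unoptimised rendering of the rule above, not the report's implementation.

```python
import numpy as np

def color_filter(img, alpha=70, beta=4):
    h, w, _ = img.shape
    out = img.copy()
    for y in range(h):
        for x in range(w):
            distant = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        # Manhattan (L1) distance between the RGB triples.
                        d = int(np.abs(img[y, x].astype(int)
                                       - img[ny, nx].astype(int)).sum())
                        if d > alpha:
                            distant += 1
            if distant >= beta:
                out[y, x] = 255      # whiten the noisy pixel
    return out

# A uniform patch with one loud noise pixel in the middle:
img = np.full((5, 5, 3), 200, dtype=np.uint8)
img[2, 2] = (0, 0, 0)
cleaned = color_filter(img)
print(cleaned[2, 2])     # -> [255 255 255]
```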

2) Binary Image Filter: The cluttering lines present in the background of the CAPTCHA undergo many intersections. As a result, many holes are present in the CAPTCHA background. The binary-image filter aims at using the vacant space around individual pixels to remove secluded pixels from the image obtained after color-based filtering. This filter first converts the RGB image to grayscale and then binarizes the pixel intensities with some threshold. The filter then removes the black pixels which have at least η white neighbours. The result of the filter for the segmented numeral is shown in Fig. 7c. For this example, η = 6. Fig. 7c shows the removal of various secluded noise pixels.
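A sketch of the binary-image filter, with η = 6 as in the text. The channel-averaging grayscale conversion, the threshold value, and counting out-of-bounds neighbours as white are our assumptions.

```python
import numpy as np

def binary_filter(img, threshold=128, eta=6):
    gray = img.mean(axis=2)             # simple grayscale conversion
    black = gray < threshold            # True where the pixel is "ink"
    h, w = black.shape
    out = black.copy()
    for y in range(h):
        for x in range(w):
            if not black[y, x]:
                continue
            white = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Out-of-bounds neighbours count as white here.
                    if (dy or dx) and (not (0 <= ny < h and 0 <= nx < w)
                                       or not black[ny, nx]):
                        white += 1
            if white >= eta:
                out[y, x] = False       # secluded pixel: drop it
    return out

stroke = np.full((5, 5, 3), 255, dtype=np.uint8)
stroke[1:4, 1:4] = 0      # a solid 3x3 "stroke" survives
noise = np.full((5, 5, 3), 255, dtype=np.uint8)
noise[2, 2] = 0           # one secluded pixel is removed
print(binary_filter(stroke).sum(), binary_filter(noise).sum())  # -> 9 0
```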

3) Connected Component Filter (Pass-1): The resultant image from the binary filter has many small connected components of clutter. Note that the pixel count of most of these noise patches is significantly less than the pixel count of the numerals. Hence, the aim of the connected component filter is to remove from the image the connected components whose pixel count is less than a threshold λ. It is called Pass-1 as this filter is applied once again at a later stage. The result of the filter for the segmented numeral is shown in Fig. 8a. For this example, λ = 50. Fig. 8a shows a significant removal of clutter patches.

(a) Original  (b) Color-Based Filter  (c) Binary Image Filter

Figure 7

(a) Connected Component Filter (Pass-1)  (b) k-means Clustering Filter (2 classes)  (c) k-means Clustering Filter (RGB)  (d) Connected Component Filter (Pass-2)

Figure 8: The Applied Filters
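The connected-component filter can be sketched with a plain breadth-first search. The 4-connectivity choice and the tiny toy threshold are assumptions; the report's example uses λ = 50 on real images.

```python
from collections import deque

def cc_filter(grid, lam):
    """Erase every connected component with fewer than lam pixels."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not grid[sy][sx] or seen[sy][sx]:
                continue
            # BFS to collect one 4-connected component.
            comp, q = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while q:
                y, x = q.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] \
                            and not seen[ny][nx]:
                        seen[ny][nx] = True
                        q.append((ny, nx))
            if len(comp) < lam:          # small patch: treat as clutter
                for y, x in comp:
                    out[y][x] = 0
    return out

grid = [[1, 1, 0, 0, 1],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0]]
print(cc_filter(grid, lam=3))
# The 4-pixel block survives; the lone pixel at (0, 4) is erased.
```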

4) k-means Clustering Filter: The resultant image from the connected component filter still shows noise pixels attached to the numerals. We showed earlier that k-means clustering could not be applied directly because of an undetermined number of classes. The image at this stage of the filtering process consists mostly of the numeral with some noise. The situation is different now, as the previous filters have reduced the number of classes as well as their cardinality. Hence, the number of classes for the k-means clustering algorithm can be fixed at 2. The result of the k-means clustering filter for the segmented numeral is shown in Fig. 8b and Fig. 8c. Fig. 8c shows the removal of noise attached to the numeral which was earlier difficult to remove. Note that even if the noise does not get completely removed, the numeral gets detached from the noise. The noise is then removed in the next filter.
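With the class count fixed at k = 2, the clustering step separates "numeral" colours from "residual noise" colours. A minimal Lloyd's-algorithm sketch on toy RGB points (not the report's implementation):

```python
import numpy as np

def kmeans(points, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster went empty.
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated colour groups: dark numeral ink vs light noise.
pts = np.array([[10, 10, 10], [12, 9, 11], [11, 10, 12],
                [240, 240, 240], [238, 241, 239]], dtype=float)
labels, _ = kmeans(pts)
print(labels[:3], labels[3:])   # the three dark pixels share one label
```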

5) Connected Component Filter (Pass-2): The resultant image from the k-means clustering filter shows traces of very small noise patches, which are removed by passing it through the connected component filter once again. The result of the filter for the segmented numeral is shown in Fig. 8d. For this example, λ = 50. Fig. 8d shows the numeral almost free from any noise.

6) Crop and Scale: The image obtained after applying the aforementioned filters is then subjected to vertical and horizontal segmentation and scaling as explained earlier. The numeral image is then fed to the SVM classifier.


Table III: Classification accuracy for Type-3 Eliot CAPTCHAs

Number of Cluttering Lines   Individual Character   End-to-End
10                           94.7%                  69.7%
20                           93.4%                  67.5%
40                           92.2%                  65.1%
60                           88.3%                  55.8%

7) Results: We used 100 Eliot test CAPTCHAs of Type-3. After passing all these CAPTCHAs through the aforementioned filters, they were fed to the SVM classifier. The results of the classifier are shown in Table III.

D. An Alternative Approach - The Difference Method

When the Manhattan or Euclidean distance metric is used for computing the distance between two RGB values, the weight given to each color channel R, G and B is the same. We tried a different approach in which we applied the distance metric to the individual color channels. In this approach, instead of removing the pixels which have a large number of distant neighbours, we removed those pixels whose neighbours are all nearby. The resultant images for the red channel of the RGB scale are shown in Fig. 9. This approach, however, was not very successful, as it resulted in many disconnected components of the numerals, which made it difficult to properly distinguish between the numeral and the noise, as shown in Fig. 10.

Figure 9: Difference Method: Red Region

Figure 10: Difference Method: Resultant RGB Image

IV. CONCLUSION

We have demonstrated that SVMs can be used to classify uncluttered CAPTCHA images with a success rate of 96.5%, shadowed CAPTCHA images with 33.4%, and cluttered CAPTCHA images with 65% on average. We have also presented methods for attacking CAPTCHAs with high amounts of background noise. To our knowledge, little work has been done on solving such CAPTCHAs. In particular, for the Eliot CAPTCHA, we found one attempt [8] which our approach outperforms significantly.

V. FUTURE WORK

This work can be extended to the case where the numerals in the CAPTCHA are overlapping. Curvilinear transformations of characters in CAPTCHAs also require further attention.

ACKNOWLEDGEMENT

We would like to sincerely thank our advisor Dr. Krithika Venkataramani for her guidance in this project throughout the semester. We would also like to thank the course TAs for their valuable support.

REFERENCES

[1] Ed Eliot’s PHP CAPTCHAs, software available at http://www.ejeliot.com/pages/2.

[2] G. Mori and J. Malik, "Recognizing objects in adversarial clutter: Breaking a visual captcha," in CVPR, 2003, pp. 134–141.

[3] K. Chellapilla and P. Y. Simard, “Using machine learning tobreak visual human interaction proofs (hips),” in NIPS, 2004.

[4] K. Chellapilla, K. Larson, P. Y. Simard, and M. Czerwinski, "Computers beat humans at single character recognition in reading-based human interaction proofs (hips)," in CEAS, 2005.

[5] P. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in ICDAR, 2003, pp. 958–962.

[6] C.-C. Chang and C.-J. Lin, LIBSVM: a library for supportvector machines, 2001, software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.

[7] J. Yan and A. S. El Ahmad, "A low-cost attack on a Microsoft captcha," in Proceedings of the 15th ACM Conference on Computer and Communications Security, ser. CCS '08. New York, NY, USA: ACM, 2008, pp. 543–554. [Online]. Available: http://doi.acm.org/10.1145/1455770.1455839

[8] R. Fortune, G. Luu, and P. MacMahon, "Cracking captchas: Learning to read obscured and distorted text in images," Stanford University, Tech. Rep., 2008.