
ISSC 2008, Galway, June 18–19

Automatic ASCII Art Conversion of Binary Images Using Non-Negative Constraints

Paul D. O’Grady and Scott T. Rickard

Complex & Adaptive Systems Laboratory,

University College Dublin,

Ireland.

E-mail: [email protected]

Abstract: It is hard to avoid ASCII art in today’s digital world. From the ubiquitous emoticons, such as ;), to the esoteric artistic creations that reside in many people’s e-mail signatures, everybody has come across ASCII art at some stage. The origins of ASCII art can be traced back to the days when computers had a high price, slow operating speeds and limited graphics capabilities, which forced computer programmers and enthusiasts to develop innovative ways to render images using the limited graphics blocks available, viz., text characters. Here, we treat automatic ASCII art conversion of binary images as an optimisation problem, and present an application of our work on Non-Negative Matrix Factorisation to this task, where a basis constructed from monospace font glyphs is fitted to a binary image using a winner-takes-all assignment.

Keywords: ASCII Art, Non-Negative Matrix Factorisation.

I Introduction

The art of creating images using text characters as their constituent elements has been around since long before the advent of computers. A well-known example appears in the book Alice in Wonderland by Lewis Carroll, which was written in 1865. In chapter 3, page 37 of the book, the text is arranged in the form of a mouse’s tail, which is considered to be one of the first printed character art creations.

In the digital realm, character art was first created using Radio Teletype (RTTY), which is a machine-to-machine communication method using radio or telephone lines. RTTY typically used the 5-bit Baudot code character encoding scheme, which provides a limited palette that includes only capital letters. It was not until the arrival of computer bulletin board systems and the Internet, when ASCII was the most popular standard for character encoding, that character art really took off, giving the name ASCII Art [1] to this creative endeavour. Furthermore, a subculture developed around the art, resulting in the creation of ASCII animations, comic books, etc.

ASCII art continues to capture the imagination of computer enthusiasts. The advent of text markup languages such as HTML has enabled ASCII art to evolve to include colour information, where it is possible to quantise images to a full RGB colour space. Furthermore, new character encoding schemes such as Unicode [2] have a huge variety of characters, enabling the creation of character art with a lot more detail.

In this work, we employ methods related to Non-Negative Matrix Factorisation (NMF) in the conversion of binary images, i.e., black and white, to ASCII art, where a fixed basis constructed from monospace font glyphs is fitted to the image, and the glyphs used to represent the image are selected using a winner-takes-all approach. Furthermore, we use a parameterisable divergence measure known as the β-divergence as the algorithm’s reconstruction cost function, which enables the selection of a range of different cost functions, each producing different ASCII art.

We focus our attention on classic ASCII art, which is created using the standard 7-bit ASCII table resulting in a palette of 95 printable characters (numbered 32 to 126), as opposed to high ASCII art, which is created from an extended 8-bit ASCII set that includes block graphics characters [1].
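The classic palette described above can be enumerated directly from the character codes. The following is a minimal sketch; the variable name `palette` is ours and not part of the original work.

```python
# Classic ASCII art palette: the 95 printable 7-bit ASCII characters.
# Codes 32 (space) to 126 (tilde) inclusive.
palette = [chr(code) for code in range(32, 127)]

print(len(palette))       # number of glyphs available to the converter
print("".join(palette))   # space, punctuation, digits and letters
```

Note that the first entry is the space character, whose glyph contains no ink.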

This paper is organised as follows: We present an overview of NMF in Section II and discuss its application to ASCII art conversion of binary images in Section III. We present some examples in Section IV and finish the paper with a discussion and conclusion in Section V and Section VI respectively.

II Non-negative Matrix Factorisation

Non-Negative Matrix Factorisation [3, 4] is a method for the decomposition of multivariate data, where a non-negative matrix, V, is approximated as a product of two non-negative matrices, V ≈ WH. NMF is a parts-based approach that makes no statistical assumption about the data. Instead, it assumes for the domain at hand, e.g. binary images, that negative numbers are physically meaningless, which is the foundation for the assumption that the search for a decomposition should be confined to a non-negative space, i.e., the nonnegativity assumption. The lack of statistical assumptions makes it difficult to prove that NMF will give correct decompositions. However, it has been shown in practice to give correct results.

NMF and its extensions have been applied to a wide variety of problems including face recognition [5], brain imaging [6] and tensor factorisation [7]. Furthermore, in combination with a magnitude spectrogram representation, NMF has been applied to audio processing tasks such as speech separation [8, 9, 10] and automatic transcription of music [11].

a) Standard NMF Algorithm

Given a non-negative matrix $V \in \mathbb{R}^{\geq 0, M \times T}$, the goal is to approximate V as a product of two non-negative matrices $W \in \mathbb{R}^{\geq 0, M \times R}$ and $H \in \mathbb{R}^{\geq 0, R \times T}$,

$$V \approx WH, \qquad v_{ik} \approx \sum_{j=1}^{R} w_{ij} h_{jk}. \tag{1}$$

Typically, R < M, where W contains a low-rank basis and H contains associated activations.

Two NMF algorithms were introduced by Lee and Seung [4], each optimising a different cost function to measure reconstruction quality. The cost functions specified are the Squared Euclidean Distance (SED),

$$D_{SED}(V, W, H) = \frac{1}{2} \| V - WH \|^2, \tag{2}$$

and a generalised version of the Kullback-Leibler Divergence (KLD),

$$D_{KLD}(V \| W, H) = \sum_{ik} \left( v_{ik} \log \frac{v_{ik}}{[WH]_{ik}} - v_{ik} + [WH]_{ik} \right). \tag{3}$$

NMF is treated as an optimisation problem that minimises the selected cost function, and enforces a non-negativity constraint on the factors:

$$\min_{W,H} D(V \| W, H) \quad W, H \geq 0,$$

resulting in a parts-based decomposition, where the basis vectors in W resemble parts of the input data, which can only be summed together to approximate V. Both Eq. 2 and Eq. 3 are convex in W and H individually, but not together. Therefore, NMF algorithms usually alternate updates of W and H. The cost functions are minimised using a diagonally rescaled gradient descent algorithm [4], which enforces the non-negativity constraint and leads to the following multiplicative updates for Squared Euclidean Distance (SED),

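$$w_{ij} \leftarrow w_{ij} \frac{[VH^T]_{ij}}{[WHH^T]_{ij}}, \tag{4a}$$

$$h_{jk} \leftarrow h_{jk} \frac{[W^TV]_{jk}}{[W^TWH]_{jk}}, \tag{4b}$$

and Kullback-Leibler Divergence (KLD),

$$w_{ij} \leftarrow w_{ij} \frac{\sum_{k=1}^{T} (v_{ik}/[WH]_{ik}) h_{jk}}{\sum_{k=1}^{T} h_{jk}}, \tag{5a}$$

$$h_{jk} \leftarrow h_{jk} \frac{\sum_{i=1}^{M} w_{ij} (v_{ik}/[WH]_{ik})}{\sum_{i=1}^{M} w_{ij}}. \tag{5b}$$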

As the NMF algorithm iterates, its factors converge to a local optimum of its cost function.
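The cost functions of Eq. 2 and Eq. 3 and the multiplicative updates of Eq. 4 and Eq. 5 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names, the random initialisation, and the small constant `EPS` (added to avoid division by zero and log of zero) are our assumptions.

```python
import numpy as np

EPS = 1e-12  # assumed numerical guard, not part of the original equations


def sed(V, W, H):
    """Squared Euclidean distance (Eq. 2)."""
    return 0.5 * np.sum((V - W @ H) ** 2)


def kld(V, W, H):
    """Generalised Kullback-Leibler divergence (Eq. 3)."""
    WH = W @ H + EPS
    return np.sum(V * np.log((V + EPS) / WH) - V + WH)


def nmf(V, R, cost="sed", n_iter=200, seed=0):
    """Alternate multiplicative updates of W and H; returns factors and cost history."""
    rng = np.random.default_rng(seed)
    M, T = V.shape
    W = rng.random((M, R)) + EPS
    H = rng.random((R, T)) + EPS
    history = []
    for _ in range(n_iter):
        if cost == "sed":
            W *= (V @ H.T) / (W @ H @ H.T + EPS)                               # Eq. 4a
            H *= (W.T @ V) / (W.T @ W @ H + EPS)                               # Eq. 4b
            history.append(sed(V, W, H))
        else:
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1) + EPS)           # Eq. 5a
            H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)  # Eq. 5b
            history.append(kld(V, W, H))
    return W, H, history


if __name__ == "__main__":
    V = np.random.default_rng(1).random((6, 8))
    for name in ("sed", "kld"):
        W, H, history = nmf(V, R=3, cost=name)
        print(name, history[0], history[-1])  # the cost should not increase
```

Because the updates are multiplicative, factors that start non-negative remain non-negative without any explicit projection step.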

The parameter R, which specifies the number of columns in W and rows in H, determines the rank of the approximation. If R < M then W is over-determined and NMF reveals low-rank features of the data. The selection of an appropriate value for R usually requires prior knowledge, and is important in obtaining a satisfactory decomposition.

III ASCII Art Conversion Using Non-Negative Constraints

To increase the flexibility of our NMF algorithm we employ the β-divergence as its cost function. The Beta Divergence (BD) (proposed as a cost function for NMF by [12]; also referred to as the modified alpha divergence [13]) is a parameterised divergence measure that encompasses the previously discussed cost functions of SED and KLD, and the Itakura-Saito Divergence (ISD) [14],

$$D_{ISD}(V \| W, H) = \sum_{ik} \left( \frac{v_{ik}}{[WH]_{ik}} - \log \frac{v_{ik}}{[WH]_{ik}} - 1 \right). \tag{6}$$

Fig. 1: Plot of NMF cost functions. Solid line: Itakura-Saito divergence; dashed line: Kullback-Leibler divergence; dotted line: Squared Euclidean distance. The curves indicate the penalty scheme imposed by each function, where the reconstruction error is represented on the x-axis, while the associated penalty is represented on the y-axis. Here, the reference variable is 3, where an estimate of 3 has no penalty.

The NMF cost function utilising β-divergence is

$$D_{BD}(V \| W, H, \beta) = \sum_{ik} \left( v_{ik} \frac{v_{ik}^{\beta-1} - [WH]_{ik}^{\beta-1}}{\beta(\beta-1)} + [WH]_{ik}^{\beta-1} \frac{[WH]_{ik} - v_{ik}}{\beta} \right), \tag{7}$$

for β = 2, SED is obtained; for β → 1, the divergence tends to KLD; and for β → 0, it tends to ISD. The choice of the β parameter depends on the statistical distribution of the data, and requires prior knowledge, see [15, Chapter 3]. The utility of the β-divergence cost function is that it enables the selection of many different reconstruction penalty schemes, as illustrated in Fig. 1, through the selection of a single parameter, in effect providing a wide selection of NMF algorithms, each producing different results.
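To make the special cases of Eq. 7 concrete, the following sketch evaluates the β-divergence for scalar or array inputs and treats β = 1 and β = 0 through their limits (Eq. 3 and Eq. 6). The function name and the tolerance used to detect the limiting cases are our assumptions; inputs are assumed strictly positive so that the logarithms and ratios are defined.

```python
import numpy as np


def beta_divergence(v, x, beta):
    """Beta-divergence (Eq. 7) between data v and estimate x, summed over all entries."""
    v = np.asarray(v, dtype=float)
    x = np.asarray(x, dtype=float)
    if abs(beta - 1.0) < 1e-9:      # limit beta -> 1: KLD (Eq. 3)
        return float(np.sum(v * np.log(v / x) - v + x))
    if abs(beta) < 1e-9:            # limit beta -> 0: ISD (Eq. 6)
        return float(np.sum(v / x - np.log(v / x) - 1.0))
    return float(np.sum(
        v * (v ** (beta - 1) - x ** (beta - 1)) / (beta * (beta - 1))
        + x ** (beta - 1) * (x - v) / beta
    ))


if __name__ == "__main__":
    reference = 3.0  # same reference value as in Fig. 1
    for estimate in (1.0, 3.0, 5.0):
        penalties = [beta_divergence(reference, estimate, b) for b in (2.0, 1.0, 0.0)]
        print(estimate, penalties)  # SED, KLD and ISD penalties for this estimate
```

For β = 2 the two terms of Eq. 7 combine to (v − x)²/2, which is the SED of Eq. 2; an estimate equal to the reference incurs zero penalty for every β.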

For the purposes of binary image to ASCII art conversion, W is known in advance, where the font glyphs of the specified monospace font are used to construct the basis. This removes the requirement for a W update as in standard NMF. The columns of matrix V are constructed from the binary image under consideration, where the image is partitioned into blocks, which are the same dimension as the glyphs used to construct W. Each image block corresponds to a single font glyph in the ASCII art image. V is fitted to W using the following update rule for H,

$$h_{jk} \leftarrow h_{jk} \frac{\sum_{i=1}^{M} w_{ij} \left( v_{ik} / [WH]_{ik}^{2-\beta} \right)}{\sum_{i=1}^{M} w_{ij} [WH]_{ik}^{\beta-1}}. \tag{8}$$
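In matrix form, the update of Eq. 8 is a single element-wise multiplication of H. The sketch below assumes a fixed, non-negative W; the function name `update_h` and the guard constant `eps` are ours. Setting β = 2 recovers Eq. 4b and β = 1 recovers Eq. 5b.

```python
import numpy as np


def update_h(V, W, H, beta, eps=1e-12):
    """One multiplicative update of H (Eq. 8) with the basis W held fixed."""
    WH = W @ H + eps                                # eps: assumed guard against 0/0
    numerator = W.T @ (V / WH ** (2.0 - beta))      # sum_i w_ij v_ik / [WH]_ik^(2-beta)
    denominator = W.T @ WH ** (beta - 1.0) + eps    # sum_i w_ij [WH]_ik^(beta-1)
    return H * numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((8, 3))
    V = rng.random((8, 5))
    H = rng.random((3, 5))
    for _ in range(100):          # repeat for the desired number of iterations
        H = update_h(V, W, H, beta=2.0)
    print(H.shape, H.min() >= 0)  # activations stay non-negative
```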

Subsequent to fitting, the ASCII art representation of the image is indicated by H:

$$V \approx W \, \mathrm{maxcol}(H, \epsilon), \tag{9}$$

where the maxcol operator sets all entries to zero and replaces the maximum activation in each column with a one, provided that the maximum value is greater than ε. From our experience, thresholding of the maximum values improves the appearance of the resultant ASCII art, as whitespace is more pleasing to the eye than a glyph with a small activation. The columns of the resulting approximation to V contain the bitmaps for the winning glyphs at each block location; the block partitioning step is then reversed and the ASCII art image is constructed.
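A possible NumPy realisation of the maxcol operator is sketched below. The function signature is our assumption; columns whose maximum does not exceed ε are left entirely zero, which this sketch takes to mean that the block is rendered as whitespace.

```python
import numpy as np


def maxcol(H, epsilon=0.0):
    """Winner-takes-all: one-hot each column of H at its maximum, if that maximum exceeds epsilon."""
    winners = np.argmax(H, axis=0)             # row index of the largest activation per column
    cols = np.arange(H.shape[1])
    keep = H[winners, cols] > epsilon          # columns whose winner passes the threshold
    selected = np.zeros(H.shape, dtype=float)
    selected[winners[keep], cols[keep]] = 1.0
    return selected


H = np.array([[0.1, 0.00, 0.7],
              [0.9, 0.01, 0.2]])
print(maxcol(H, epsilon=0.05))
# [[0. 0. 1.]
#  [1. 0. 0.]]
```

In the example, the middle column has a maximum activation of 0.01, which is below the threshold, so no glyph is selected for that block.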

More formally, we use the following procedure for automatic conversion of binary images to ASCII art:

1. Construct W from a monospace font, e.g., Courier, where the glyphs that represent the 95 printable characters (numbered 32 to 126) of the 7-bit ASCII character encoding scheme are stored as M × N bitmaps, which are arranged as vectors of length MN and placed in the columns, $w_j$, of W. Rescale each column to the unit L2-norm, $w_j = w_j / \|w_j\|$, j = 1, . . . , R.

2. Partition the binary image $X \in \mathbb{R}^{P \times Q}$ into M × N blocks forming a P/M × Q/N grid, where each block corresponds to a font glyph in the final ASCII art image. Construct V from the blocks by arranging them as vectors and placing them in columns. If X is not evenly divisible into M × N blocks then perform zero padding to the required dimensions.

3. Randomly initialise H; specify β and ε.

4. Fit V to W using the H update rule (Eq. 8), and repeat for the desired number of iterations.

5. Assign each block location in the original image a glyph based on a winner-takes-all approach, where the maximum value in each column of H corresponds to the winning glyph in W (Eq. 9). Reverse the block partitioning procedure of step 2 and render the ASCII art image using the identified glyphs in the specified monospace font.
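The five steps above can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the authors' code: the 2 × 2 bitmaps in `GLYPHS` are hypothetical stand-ins for rendered font glyphs, all function names are ours, and the blank glyph is omitted because its all-zero column cannot be rescaled to unit norm, so whitespace arises only from the ε threshold.

```python
import numpy as np

# Hypothetical 2x2 "glyph" bitmaps (1 = ink) standing in for rendered font glyphs.
GLYPHS = {
    "_": [[0, 0], [1, 1]],
    "|": [[1, 0], [1, 0]],
    "#": [[1, 1], [1, 1]],
}


def build_basis(glyphs):
    """Step 1: vectorise each glyph bitmap into a column of W with unit L2 norm."""
    chars = list(glyphs)
    W = np.array([np.ravel(glyphs[c]) for c in chars], dtype=float).T
    return chars, W / np.linalg.norm(W, axis=0)


def image_to_blocks(X, m, n):
    """Step 2: zero-pad X to a multiple of the m x n glyph size, one block per column of V."""
    rows, cols = -(-X.shape[0] // m), -(-X.shape[1] // n)      # ceiling division
    padded = np.zeros((rows * m, cols * n))
    padded[:X.shape[0], :X.shape[1]] = X
    blocks = padded.reshape(rows, m, cols, n).swapaxes(1, 2)   # (rows, cols, m, n)
    return blocks.reshape(rows * cols, m * n).T, (rows, cols)


def to_ascii(X, glyphs, m, n, beta=2.0, epsilon=0.0, n_iter=100, seed=0):
    chars, W = build_basis(glyphs)
    V, (rows, cols) = image_to_blocks(np.asarray(X, dtype=float), m, n)
    H = np.random.default_rng(seed).random((W.shape[1], V.shape[1])) + 1e-3  # step 3
    for _ in range(n_iter):                                                   # step 4 (Eq. 8)
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH ** (2.0 - beta))) / (W.T @ WH ** (beta - 1.0) + 1e-12)
    winners = H.argmax(axis=0)                                                # step 5 (Eq. 9)
    text = [chars[j] if H[j, k] > epsilon else " " for k, j in enumerate(winners)]
    return ["".join(text[r * cols:(r + 1) * cols]) for r in range(rows)]


if __name__ == "__main__":
    image = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 0],
             [1, 1, 1, 0]]
    for line in to_ascii(image, GLYPHS, m=2, n=2):
        print(repr(line))
```

With a real font, `GLYPHS` would instead hold one rasterised bitmap per printable character, and the returned lines of text would be rendered in that same font.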

IV ASCII Art Examples

In order to demonstrate the utility of the proposed approach we select a test image (UCD CASL logo) and perform conversion using SED (β = 2) and KLD (β = 1). Following the procedure detailed in Section III, we specify 100 iterations of Eq. 8, ε = 0, and construct W from a Courier font (font file = c0419bt .pfb) with a glyph size of 19 × 38 pixels (width × height). The glyph basis is fitted to our 1209 × 962 pixel test image, resulting in an ASCII representation with 91 × 37 characters. As a way of comparison, we also convert the test image using the pseudoinverse, $H = |(W^TW)^{-1}W^TV|$, and present the resultant ASCII art images in Fig. 2.

[Figure 2 panels: Binary Image; Pseudoinverse; NMF SED; NMF KLD]

Fig. 2: A test image (UCD CASL logo) and three ASCII art representations, which are created using the pseudoinverse and NMF utilising the SED (β = 2) and KLD (β = 1) cost functions. Inspection of the logo text reveals that NMF preserves the curves best and minimises black space. Furthermore, the selection of a different β creates a different ASCII art representation.

On initial inspection, the most noticeable difference between the three ASCII art images is the glyph used to represent a fully black block (p for pseudoinverse, Q for SED and | for KLD), which occurs because each cost function has a different notion of what a correct solution should be, resulting in the selection of different glyphs. Comparison of the pseudoinverse image to the SED image, where both methods produce minimum ℓ2-norm solutions for an over-determined system of equations, reveals that the non-negative constraint on the SED image appears to make the logo text look better defined. This is especially evident when inspecting the logo letters C and S on the pseudoinverse image, where the curve at the top of each letter, as indicated in the original binary image, appears to be truncated. Therefore, it appears that introducing a non-negativity constraint minimises the black space in the ASCII representation, preserving the curves in the original binary image. Moreover, our subjective assertion is backed up quantitatively, where the Frobenius norm of the matrix representation of the pseudoinverse and SED image is $1.0 \times 10^5$ and $9.5 \times 10^4$ respectively.
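The pseudoinverse baseline and the Frobenius-norm comparison can be sketched as follows. The function names are ours, and we assume the norm is taken over the rendered ASCII art bitmap; `np.linalg.pinv(W)` equals $(W^TW)^{-1}W^T$ when W has full column rank.

```python
import numpy as np


def pseudoinverse_activations(V, W):
    """Baseline activations H = |(W^T W)^{-1} W^T V|, via the Moore-Penrose pseudoinverse."""
    return np.abs(np.linalg.pinv(W) @ V)


def frobenius_norm(bitmap):
    """Frobenius norm of a rendered bitmap, used here as a measure of black space."""
    return np.linalg.norm(np.asarray(bitmap, dtype=float), "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((8, 3))
    V = rng.random((8, 5))
    raw = np.linalg.pinv(W) @ V
    print((raw < 0).any())                    # unconstrained solutions may contain negative entries
    H = pseudoinverse_activations(V, W)
    print(H.argmax(axis=0))                   # winner-takes-all glyph index per block
    print(frobenius_norm(np.ones((2, 2))))    # a fully black 2x2 bitmap has norm 2
```

Unlike the multiplicative update, the least-squares solution is not constrained to be non-negative, which is why the absolute value is taken before the winner-takes-all step.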

The selection of the β-divergence as the proposed algorithm’s cost function introduces an element of flexibility to the algorithm, where different ASCII art can be produced for the same input image by specifying a different β. The effect of the selection of β is demonstrated in Fig. 2, where it is evident that the KLD image utilises different glyphs in its ASCII representation than SED, while continuing to minimise black space. Using the proposed method, we present a number of ASCII art examples of various images in Fig. 3.

Fig. 3: ASCII art representations, created by the proposed NMF procedure, of Homer J. Simpson, the Aphex Twin Logo and Andy Warhol’s print of Marilyn Monroe (after thresholding), which are available at http://ee.ucd.ie/~pogrady/

V Discussion

It may be possible to improve the resultant ASCII art representations by finding the most natural grid for the binary image, which may be achieved by shifting the image both vertically and horizontally and fitting the image to W. The grid that results in the best reconstruction, as indicated by the signal-to-noise ratio for example, may be considered to be the most natural grid.
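One way such a grid search might look is sketched below. Everything here is an assumption for illustration: `reconstruct` is a hypothetical callable that maps an image to the bitmap of its ASCII art rendering (same shape as its input), and `toy_reconstruct` is a stand-in that uses only a blank and a full-block glyph, not the NMF procedure of Section III.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between an image and its reconstruction."""
    noise = np.sum((reference - estimate) ** 2)
    signal = np.sum(reference ** 2)
    return np.inf if noise == 0 else 10.0 * np.log10(signal / noise)


def shift_image(X, dy, dx):
    """Shift X down by dy and right by dx pixels, padding with zeros."""
    shifted = np.zeros((X.shape[0] + dy, X.shape[1] + dx))
    shifted[dy:, dx:] = X
    return shifted


def best_grid(X, m, n, reconstruct):
    """Try every offset within one m x n glyph cell; return (best SNR, dy, dx)."""
    best = None
    for dy in range(m):
        for dx in range(n):
            shifted = shift_image(X, dy, dx)
            score = snr_db(shifted, reconstruct(shifted))
            if best is None or score > best[0]:
                best = (score, dy, dx)
    return best


def toy_reconstruct(img, m=2, n=2):
    """Hypothetical converter: each block becomes fully black if mostly ink, else blank."""
    rows, cols = -(-img.shape[0] // m), -(-img.shape[1] // n)
    padded = np.zeros((rows * m, cols * n))
    padded[:img.shape[0], :img.shape[1]] = img
    means = padded.reshape(rows, m, cols, n).mean(axis=(1, 3))
    rendered = np.kron((means > 0.5).astype(float), np.ones((m, n)))
    return rendered[:img.shape[0], :img.shape[1]]


if __name__ == "__main__":
    X = np.zeros((4, 4))
    X[1:3, 1:3] = 1.0            # a black square that straddles four grid cells
    print(best_grid(X, 2, 2, toy_reconstruct))
```

In this toy case, shifting the image by one pixel in each direction aligns the square with a single grid cell, which gives a perfect reconstruction.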

The chosen glyphs in an ASCII art image are selected based on a winner-takes-all approach (Eq. 9). It is possible to reduce the number of activations in H by using a sparse NMF algorithm [10], which may result in fewer iterations to achieve the same ASCII art representation.

For the glyph set used to construct W in our examples, the glyph M had the largest amount of black space as indicated by the Frobenius norm. However, M was not chosen as the fully black block glyph using any of the presented cost functions, which suggests that a more suitable cost function exists.

The utility of ASCII Art in the early computing era is clear. In today’s world, where transmission of photograph-quality images is not a problem, ASCII art still has relevance. For example, the proposed method may be employed in image manipulation software, or may be used to create ASCII art for the many bulletin board systems that are still popular today, e.g. 2channel [16].

Finally, in this work we concentrate on binary images, where the resultant ASCII art is monochromatic. However, it is possible to create multi-colour ASCII art, where a binary image is created from a colour image and ASCII art conversion is performed giving a monochromatic ASCII art representation, which is subsequently used to mask the original colour image.
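The masking step described above might be sketched as follows. The function name, the white background, and the assumption that the rendered ASCII art bitmap has the same pixel dimensions as the colour image are ours.

```python
import numpy as np


def colourise(ascii_bitmap, colour_image):
    """Mask an RGB image with a rendered monochrome ASCII art bitmap (1 = glyph ink)."""
    mask = np.asarray(ascii_bitmap).astype(bool)[..., None]   # broadcast over colour channels
    background = np.full_like(colour_image, 255)               # assumed white background
    return np.where(mask, colour_image, background)


if __name__ == "__main__":
    bitmap = np.array([[1, 0],
                       [0, 1]])
    colour = np.array([[[200, 10, 10], [10, 200, 10]],
                       [[10, 10, 200], [90, 90, 90]]], dtype=np.uint8)
    print(colourise(bitmap, colour))  # inked pixels keep their colour, the rest turn white
```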

VI Conclusion

In this paper, we presented a novel application of NMF related methods to the task of automatic ASCII art conversion, where we fit a binary image to a basis constructed from monospace font glyphs using a winner-takes-all assignment.

We presented some examples, and demonstrated that when compared to a standard pseudoinverse approach, non-negative constraints minimise the black space of the ASCII art image, producing better defined curves. Furthermore, we propose the use of the β-divergence cost function for this task, as it provides an element of control over the final ASCII art representation.

Acknowledgements

This material is based upon works supported by the Science Foundation Ireland under Grant No. 05/YI2/I677.

References

[1] Wikipedia. ASCII Art - Wikipedia, the free encyclopedia, 2008. [Online; accessed 11-March-2008].

[2] Wikipedia. Unicode - Wikipedia, the free encyclopedia, 2008. [Online; accessed 11-March-2008].

[3] P. Paatero and U. Tapper. Positive matrix factorization: A nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–26, 1994.

[4] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Adv. in Neu. Info. Proc. Sys. 13, pages 556–62. MIT Press, 2001.

[5] David Guillamet and Jordi Vitria. Classifying faces with non-negative matrix factorization, 2002.

[6] M. N. Schmidt and H. Laurberg. Non-negative matrix factorization with gaussian process priors. Computational Intelligence and Neuroscience, 2008.

[7] Amnon Shashua and Tamir Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In ICML ’05: Proceedings of the 22nd international conference on Machine learning, pages 792–799, New York, NY, USA, 2005. ACM.

[8] Paris Smaragdis. Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs. In Fifth International Conference on Independent Component Analysis, LNCS 3195, pages 494–9, Granada, Spain, September 22–24 2004. Springer-Verlag.

[9] D. FitzGerald, M. Cranitch, and E. Coyle. Sound source separation using shifted non-negative tensor factorisation. In Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.

[10] Paul D. O’Grady and Barak A. Pearlmutter. Discovering convolutive speech phones using sparseness and non-negativity. In Seventh International Conference on Independent Component Analysis, LNCS 4666, pages 520–7, London, UK, September 2007. Springer-Verlag.

[11] S. A. Abdallah and M. D. Plumbley. Polyphonic transcription by non-negative sparse coding of power spectra. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), pages 318–25, 2004.

[12] Raul Kompass. A generalized divergence measure for non-negative matrix factorization. In Neuroinformatics workshop, Torun, Poland, September 2005.

[13] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari. Csiszar’s divergences for non-negative matrix factorization: Family of new algorithms. In Justinian P. Rosca, Deniz Erdogmus, Jose Carlos Príncipe, and Simon Haykin, editors, Independent Component Analysis and Blind Signal Separation, 6th International Conference, ICA 2006, Charleston, SC, USA, March 5-8, 2006, Proceedings, volume 3889 of Lecture Notes in Computer Science, pages 32–39. Springer, 2006.

[14] F. Itakura and S. Saito. An analysis-synthesis telephony based on maximum likelihood method. In 6th Int. Conf. Acoustics, pages 17–20, 1968.

[15] Paul D. O’Grady. Sparse Separation of Under-Determined Speech Mixtures. PhD thesis, National University of Ireland Maynooth, 2007.

[16] Wikipedia. 2channel - Wikipedia, the free encyclopedia, 2008. [Online; accessed 21-May-2008].
