mathfor: the mathematical formula recognition system · 2015-07-29 · mathfor: the mathematical...

MathFoR: The Mathematical FormulaRecognition System

E. Tapia. P. Holzschneider, A. Stoffel, S. Ihlefeld, G. Hohl and R. Rojas

Freie Universitat Berlin, Institut fur Informatik, Takustr. 9, 14195 Berlin, [email protected]

Abstract. We describe MathFoR, a system for the recognition of on-line handwritten mathematical expressions. The system consists of twomain components. The first component is a complete set of Java librariesthat handles and recognizes digital ink. The second component is the lay-out analyzer that translates the recognized ink into a tree structure inXML format, which is transformed to another language using XSLT. Wepresent also example of final top level applications developed with oursystem.Keywords: Graphics Recognition, Technical Drawings, Complete Sys-tems.

1 Introduction

The introduction of devices such as personal digital assistants (PDAs) or TabletPCs has influenced a growing interest on developing pen-computing applicationsduring the last years. Such devices use the stylus as input tool, being a substitutefor keyboards and mouse and a natural extension of the pen and paper, themost widely used form to collect information. Handwritten information storedin electronic form eases its processing and understanding, in particular, withartificial intelligence techniques.

Handwriting Recognition (HWR) is an important technology that convertshandwriting into text or other data structures that can be automatically pro-cessed by computers. An example of these technologies is the stroke alphabetGraffiti that is the most widely used method for symbol recognition in PDAs.HWR can be also used to develop methods for user-computer interaction and in-terfaces, by recognizing gestures that indicate the execution of operating systemcommands.

Most of the handwritten information in pen-computing is, however, a two-dimensional composite of symbols and drawings that have rich structure, whichcannot be simply interpreted and recognized as plain text. In particular, recog-nition mathematical notation –such as formulas, matrices and diagrams– is acomplex and difficult task that has gained attention among the AI communityduring the last years. Recognition of such two-dimensional structures have apotential application in scientific document processing, human-computer inter-action and mathematical knowledge management, only to mention some areas.

1.1 Existing Architectures

Development of pen-computing applications led vendors and researchers to de-velop software architectures and to implement libraries that handle digital ink.The next sections give a short overview about such architectures.

General-Purpose ArchitecturesMicrosoft, for instance, released the Tablet PC version of the Windows oper-

ating system that provides a Software Development Kit (SDK) for processing,storing, and recognizing digital ink. These libraries are specialized towards therecognition of letters and words of western languages and East Asian languages,allowing a more natural writing when compared with Graffiti. One of its mostattractive characteristics is that the recognizers are integrated natively in theoperating system. Unfortunately, some researchers have found difficult to extendthe recognizers and to define new ones within the SDK.

Madhvanath et al. [3] offer, in contrast, a general-purpose open-source toolkitfor on-line handwritten recognition. The aim of the LipiTk toolkit is to facilitatethe development of new recognizers and their use in real-world applications. Themain components that the toolkit provides are a generic library to handle digi-tal ink, two algorithms for symbol recognition, and tools to collect and annotatedigital ink, and to train classifiers. LipiTk runs in both Linux and Windows, andmost of the programs are written in standard C++. The toolkit is a very am-bitious project that aims also to define standard interfaces and communicationprotocols for exchange of digital-ink among different platforms and devices.

Specialized Architectures for Mathematical NotationMost of the working systems and prototype programs for the recognition of

pen-mathematics follow, in essence, the steps proposed by Lee and Wang [2].Although his system is specialized for off-line mathematical equations, i.e. ex-pressions in scanned documents, almost all of the procedures can be used aswell for the recognition of pen-based mathematics. We have only to substitutethe “Optical Scanning” step for a “Ink Input” step in their flow diagram. Theirprocedural framework can be divided in three main modules:

1. Document Segmentation: Mathematical expressions are extracted and iso-lated from text lines.

2. Symbol Recognition: The label of symbols is established by means of a clas-sifier

3. Structural Analysis: The structure of the recognized symbols is analyzed toform a hierarchical structure that represents a mathematical expression.

The internal hierarchical structure is processed and interpreted to obtain a finalresult. This result can be a character string that represents the expression inLaTeX, another structure used by a computer algebra system, or an image rep-resenting the plot of a function given by the expression, among other structures.

Fig. 1. MathFoR architecture.

Differences between recognition systems found in the literature are generated byvariations and improvements of these modules.

Smirnova and Watt [4] propose an architectural framework for pen-basedmathematics from the perspective of software engineering. Their target deploy-ment is document processing and mathematical computing. Their objective isto define a platform-independent pen-based framework for mathematical entry,edition and calculation. These objectives lead them to define two portabilitycriteria for the framework, which consider the software life and usability given aplatform, and the software life across platforms. The modules of their frameworkwhich shall vary, depending of the platform, are 1) basic software to collect digitalink, 2) low level processing module to interconnect the components that processdigital ink and manipulate mathematical objects, and 3) use of the frameworkin some hosting application.

2 The MathFoR System Architecture

MathFoR is a system for the recognition of on-line handwritten mathematicalexpressions. Even though our system did not intent to be, at least at the verybeginning of our research, a general-purpose toolkit for pen-based systems, ourexperience developing MathFoR led us to regard the whole system in a generalconcept as represented in Fig. 1(a).

MathFoR’s current structure aims to build top-level application in the topof two main components: one handling the raw data and the other analyzingthe document structure. Such an application is a bridge between the main com-ponents. The main components hide a third component that implements therequired classification algorithms. It remains, in general but not necessarily, hid-den to the top level application. The whole architecture is based on an executionplatform that can be the operating system or a virtual machine.

The very concrete implementation of the architecture uses the Java virtualmachine by Sun Microsystems as execution platform, and Weka library [8] forthe classification component, see Fig. 1(b). Java allows users to run the systemover a wide variety of operating systems including Windows, Linux, MacOS andSun OS. Weka contains a collection of machine learning algorithms that can be

included directly in the Java code. We developed a pair of specialized toolkits,Jink and MathFoR, as a concrete implementation of the raw data handling andthe structural analysis components. Both are described in the next section.

3 MathFoR Main Components

3.1 Digital-Ink Toolkit: Jink

Processing and interpreting digital stylus input imposes special requirements toraw data representation, visualization and integration into applications. Eventhough available general-purpose vector graphics frameworks offer a completeinfrastructure to deal with vector graphics, they don’t comply with the specialneeds (i.e. runtime guarantees or data representation) when dealing with digitalink. Additionally, most of them don’t integrate well and seamlessly to customprojects -not speaking of being lightweight and easy-to-use. Thus, forcing theuse of an improper or inconvenient framework, or quick and dirty custom GUIcode, constrains and slows down the development on small experimental projectsfor applications evolving out of current research.

In contrast to existing frameworks, JInk is a Digital Ink Toolkit for Java thatemerged from the cumulated working practice of the Workgroup for ArtificialIntelligence on Project MathFoR. The library has been especially designed todefuse the most significant difficulties when dealing with digital ink, such thatforthcoming projects can be placed upon a solid foundation. In short: the toolkitkeeps all needed runtime guarantees without dropping versatility and applicationquality look and feel.

JInk provides:

1. Straight and uncorrupted live-input recording: Input recording responsive-ness and processing speed does not depend on input length, document sizeor visualization complexity. Stroke insertion and deletion, visibility calcula-tions, and painting are local access operations that have O(1) execution timeexecution time in practice, no matter how large the document is.

2. Ease-of-use and flexibility: JInk consists of 100% Java code and is entirelybased on Sun’s Java2D foundation and the Swing GUI Toolkit –J2SDK 1.3and up or J2SDK 1.5 when using generics. JInk can be integrated into everyJava AWT or Swing-based application or applet. All GUI classes conse-quently follow the Swing design patterns, such that every developer who isused to Swing-API can work with JInk without reading a manual. JInk islightweight –less than 100k binary package size– and does not introduce anyadditional dependencies to heavyweight frameworks.

3. Especially designed for concurrent and deferred processing: Most applica-tions of digital ink interpretation perform complex computations on recordedinput concurrently or distributed among (remote) workers in order not toharm user interface responsiveness. This practice usually introduces manydata synchronization difficulties and pitfalls. By using JInk, the raw dataexchange with concurrent or distributed workers is inherently thread-safe

without any use of thread synchronization, and without handing the needfor thread synchronization to the application-level code: The data represen-tation virtually mimics the copy-on-write paradigm that is used by JavaStrings. Modifications on ink strokes do not alter strokes itself, because avirtual copy is created instead. Therefore, referenced data cannot lose in-tegrity.

The JInk framework itself can be divided into two main packages:

– The core package contains everything needed for the actual digital ink datarepresentation. The central component ”Ink” encapsulates the raw data andalso provides functions for most common modification operations. All thoseoperations are well designed and stick to guaranteed execution cost bound-aries. Additionally, a sequence of modifications is executed at the moment ofaccess. Such collected operations are aggregated and, if possible, conflated.This minimizes the overall execution cost of data batch processing and alsooptimizes the overall memory footprint and data storage or transfer volumes,since only the executed operations need to be stored or transferred insteadof the modified data. Finally, the Ink component fully integrates into Java’sGUI API and can be directly used as a drawing primitive in Sun’s Java2D.

– The Editor package contains GUI components that can be assembled toan end-user ink editor. The editor displays and edits arbitrary large inkdocuments, and provides familiar input and drawing methods, including truerubber erase, selection and transformation and unconstrained undo and redo.

3.2 Structural Analysis Toolkit: MathFoR

The layout analysis of mathematical notation operates on recognized symbols,acting independently from the original raw input, because we do not relay di-rectly on the stylus information. This approach assumes that the raw data isrepresented as node elements containing information about their identity andspatial location within the abstract document. The layout analyzer takes a listof nodes and, depending on their spatial relationships, constructs a baselinestructured tree that represents the mathematical expression, see Fig fig:mathfor.

SymbolsMathFoR defines symbols and their relationships in a XML configuration file

that consists of three main parts: symbol classes, orientation and the symbolsthemselves. A symbol class defines a ”grammar” of the sensitive regions of asymbol and the type of relation between these regions. In our standard gram-mar, for instance, we expect that an element of the ”variable” class has super-scripted or subscripted components. An element of the ”left parenthesis” classhas neither a superscripted component nor a subscripted one. The orientationpart defines some spatial regions of symbols. This information contains the as-cent and descent thresholds of the symbols and the left and right border space

Fig. 2. A conmutative diagram recognized by MathFoR. The baseline structure tree isshowed at the left of the component.

in the bounding box. Finally, the symbol part lists concrete symbols associatedwith a class and orientation and associate ambiguities which can be resolved byMathFoR. The information about ambiguities helps the system to distinguish,for example, between a capital sigma and the sum operator.

The configuration file allows the use of different symbols sets and grammarsunder a common layout analyzer. Users can change symbol sets and grammarswithout modifying the MathFoR core classes or their program code. Especiallythe orientations part of the symbols depends on the user and the input method:If one uses an OCR system for the recognition of symbols, one does not concernabout the irregularities in writing; the configuration can be more restrictive,increasing the recognition rate in this case.

RecognitionRecognition of mathematical expressions uses some algorithms developed previ-

ously by our work group [6, 5, 7]. The algorithms use a Minimum Spanning Tree(MST) construction, using the symbols as nodes of a totally connected graph,combined with information about the typical usage of the symbols, as given inthe aforementioned configuration file. Mathematical structures with tabular lay-

out require, however, special treatment, which led us to develop a new algorithmto recognize such structures.

Mathematical structures with tabular layout, like matrices or commutativediagrams, require special assumptions. One assumption about matrices is thatthey occur only between parentheses. Therefore, when MathFoR recognizes anopen and closing bracket pair, it switches to matrix mode and tries to recognizethe matrix, assuming that the matrix components are mainly cells aligned alonga grid. Commutative diagrams consist on an abstract level of cells and arrows,where each cell is connected to another by at least one arrow. This representationis inspired by several packages for LaTeX that commonly uses a matrix formatto describe commutative diagrams. The recognition of matrices uses projectionmethods to find their cells, and the recognition of diagrams is a two-step methodthat uses an optimization criterion [9].

The first step constructs an initial grouping of the raw symbols in the dia-gram, using the arrow ends in combination with the MST approach, see Fig. 3(a)-(b). By using the MST, each symbol is assigned to a start, end point or to a labelof an arrow. Symbols that are connected to a start or end point are grouped withtheir neighbors to construct an initial group of cells in the diagram.

The second step finds the final cells and the grid structure of the diagram. Tosimplify this task, we assume that the cells are aligned in a grid structure, and therows and columns of the grid are parallel to the screen coordinate system. Theseassumptions allow us to define a cost function for a given grid, transforming ourlayout analysis into a minimization problem. We found in our experiments thata simple hill climbing algorithm is sufficient to find, in most cases, the correctdiagram. See Fig. 3(c)-(d).

The cost function is a linear combination of several numerical features of thegrid. At first, we consider the number of rows and the height of the highest row.The minimization of these features ensures that the rows are compact, and thatthere are not too many rows in the grid. The next features are row overlappingand distance between rows, which help to overcome some irregularities in theinput. If the overlapping increases, then the grid cost decreases. The minimaloverlapping of groups in the same row is used to split a row, even when thetwo rows are closed to each other. If the distance between two adjacent rows istoo small, then they merge. Analogous features are used to create and destroycolumns within the grid. Output

The output is a structure tree which can be used by a Java program directly,using the classes defined in MathFoR. The structure tree has also a representa-tion as XML document, which allows the user to transform the tree into a customformat using XSLT. At time, we use a set of XSLT transformations to convertthe structure tree into mathematical expressions in LaTeX or Mathematica for-mat. The transformations defined in the XSLT file translate the names of thesymbols into a proper representation and deal with the quirks of the selectedformat. In LaTeX, for example, it is necessary to create two objects \left( anda \right., when the original text only has an opening parenthesis.

Fig. 3. (a) The intial grouping collect symbols that belong to the same cell in differentgoups, which leads to (b) an incorrect conversion of the diagram. (c) After otimizationthe final grid is found that leads to (d) the correct interpretation of the diagram.

Queries on the Structure TreeAccessing and extracting nodes in the structure tree were quite simple oper-

ations in the first versions of MathFoR, because we accessed only direct childnodes and discarded the nodes that we don’t needed. Such operations became,however, a complicated and confusing task during the recognition of commuta-tive diagrams, because we not only accessed direct children but also grandchil-dren using some special constraints. For that reason, we decided to implementa query mechanism that allows us to build up a query on the structure tree inadvance, and to execute this query late as many times as we need. This reducesthe code complexity and improves the readability.

4 Top-Level Applications and Tools

JEdit Plug-inAs a demonstrator of the MathFoR system, we developed a JEdit plug-in that

embeds the recognizer in an editor used to write LaTeX manuscripts. Figure 4(a)

Fig. 4. (a) The JEdit plug-in and (b) WebFoR.

shows a view of the LaTeX editor in the background, when the handwritingrecognizer plug-in has been started. The idea is that when a person is workingwith LaTeX and needs to enter a complex formula, he starts the plug-in bypressing a function key on the keyboard. The function key opens a window wherethe user can draw the formula using the mouse or a digitizing tablet connected tothe computer. The digital ink is then translated into LaTeX code that the editorimmediately incorporates into the document. Here we are thinking of proficientLaTeX users who usually enter small expressions directly using the keyboard.Thus, the handwriting recognition plug-in will be called only for larger formulas,where handwriting input is more convenient.

The advantage of using a standard LaTeX editor for embedding the hand-writing recognizer is that information about the variables and processed symbolsis immediately available. The special syntax of LaTeX tells the system exactlywhich variables are being used. In a future research, the LaTeX editor will allowus to use context information for on-line improvement of symbol recognition,which can be accomplished more easily as when only considering the digital inkdocument without the context.

WebFoRAnother demonstrator for the MathFoR system and specifically for the capa-

bilities of the JInk library is WebFoR, a web-based version of the JEdit LaTeXPlug-In, see Fig. 4(b). The WebFoR application design splits the MathFoR sys-tem into a notepad applet integrated into a website, which is executed withinthe client-side web browser. A server-side ink recognition daemon delegates allincoming recognition requests among a set of workers. The client-side notepadallows a user to input a formula as done in the JEdit plug-in. All Input andmodifications are transmitted to the ink recognition daemon and evaluated. Theresult is then shown within the website.

Fig. 5. (a) Main view of the classification wizard. (b) The classifier dialog.

This split design keeps the client-side user interface responsive and smallenough for a website, while server-side daemon encapsulates the heavyweightsymbol classifier and the formula recognition system. In that way, a full-scalerecognition system can be used, since there is no need to cut down the recognitioncomponent in order to fit into an applet.

Character Classification WizardThe main purpose of the Character Classification Wizard is to offer a comfort-

able and easy way to enter and manage sets of handwritten symbols, which willbe used as training samples to build new classifiers. The classification wizard isbased on the JInk classes, see Fig. 5.

One can use the wizard to create projects that organize the training samplesinto user-defined alphabets. Such a structure makes possible to select a certainsubset of the whole database to train a classifier. These alphabets contain a set ofisolated characters that are managed internally as an instance of the InkGruopclass each.

Selected characters are used to train default and custom classifiers. The wiz-ard offers two default classifiers, k-nearest neighbors and multilayer perceptron,which are included in the Weka library. We have chosen the Weka library be-cause it already implements a wide variety of classification algorithms and isfreely available. Custom self-written classifiers can be created by only imple-menting the Classifier interface of Weka.

The output of the training process is an instance of the InkClassifier class.This is just a convenience class that contains the trained Weka-classifier. InkClas-sifier encapsulates the digital-ink preprocessing and feature extraction algo-rithms needed as input for the Weka classifier. The default preprocessing settings

Fig. 6. The UNIPEN viewer.

can be adjusted in the training dialog or, again, custom preprocessing algorithmscan be implemented by using the given Preprocessor interface. Object serializa-tion is used to store an InkClassifier abject. This eases the inclusion of the clas-sifier in some external application that tryes to classify a new InkGroup withoutworrying about a reimplementation of preprocessing and data conversion.

UNIPEN Reader and ViewerTo make sure that classifiers deliver the best possible results, it is necessary to

train them with as much data as possible. For this reason we use the UNIPENdatabase [1], the de-facto database of on-line handwriting for benchmarkingclassifiers. This database contains a collection of online handwritten symbols,collected by 40 different institutions and companies around the world.

The UNIPEN data format is very flexible and allows the user to store a broadvariety of symbols, since the writer has the freedom to choose the hardwareand the organisation of the data into metadata files. A typical UNIPEN datafile consists on sets of strokes organized as segments that can be handwrittenparagraphs, word or characters. The file can contain several .INCLUDE codewordthat indicate the path to the actual stroke data that define the segments.

The main function of the viewer is, of course, the visualization of the UNIPENdata, and its conversion into an own internal format. The viewer uses the meta-data file to collect the information needed to find the right coordinate sequencefor a symbol in the database. The coordinates are normalized and used to createinstances of the InkGroup object that are used by our classification wizard orother algorithms. See Fig. 6.

5 Status

The first version of the MathFoR system was developed between 2004 and 2005.Some modification and extensions were made in 2006 to include the recognitionof matrices. This version of the system was included as an intelligent tool withinthe Electronic Chalkboard (E-Chalk) [6]. The treatment of digital ink becamea toolkit as its own and has been developed separately since the beginning of2007. That allowed the rewriting of the layout analysis libraries from scratch,also including the separate definition of layout parameters using XML files andthe use of XSLT to convert the final recognition into several formats. The wholeset of libraries has been developed by some researches in our work group and,currently, by hard-working students of our university.

The current version has been used only internally to develop the tools andapplications described in the last section. We plan to release a beta version of thelibraries at the beginning of October under some open source license. Studentsof our university will use our library during the next semester in the patternrecognition courses and other specialized seminars, related with recognition ofon-online handwriting and pen-based interfaces.

6 Summary and Further Work

MathFoR uses the Java Virtual Machine as execution platform, taking as advan-tage that the virtual machine runs on a wide variety of operating systems. Themain components of the system are the libraries JInk and MathFoR. They han-dle digital ink and analyze the spatial relations between the recognized symbols.A top level application is a bridge between both packages. It is also a concreteimplementation of several abstract classes and interfaces defined in JInk andMathFoR.

JInk is a library based completely in the Component Architecture of theJava AWT. For this reason, experimented Java programmers can easily integratedigital ink documents into GUI Applications. Digital ink is defined as an interfacethat has a concrete implementation as a GUI Component Object, offering all theadvantages of the components patterns defined in Java AWT: handling mouseevents, drawing, serialization, affine transformations, etc. The library also offersan editor for digital ink, that can be easily extended for the developer to includeown processing algorithms and recognizers.

The structural analysis is based on previous research in our work group. Anew library for the structural analysis has been implemented from scratch, of-fering a better ordering and organization of the classes that describe the dataused by the algorithms. The XML representation of the data and the recognizedstructures allows a flexible the translation, via XSLT, of the recognized expres-sions into several formatting and programming languages. We have currentlytranslators into the LaTeX language.

Our libraries have been growing ”horizontally” depending on the needs ofthe system. We have on-going projects for storing and retrieval of digital ink

data, annotation of digital ink, word recognition and communication using client-server architecture for the recognition of digital ink.

References

1. I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and S. Janet. UNIPENProject of On-Line Data Exchange and Recognizer Benchmarks. Pattern Recogni-tion, 1994. Vol. 2-Conference B: Proceedings of the 12th IAPR International Con-ference on Computer Vision & Image Processing, 2, 1994.

2. H.J. Lee and J.S. Wang. Design of a mathematical expression understanding system.Pattern Recognition Letters, 18(3):289–298, 1997.

3. S. Madhvanath, D. Vijayasenan, and T.M. Kadiresan. LipiTk: A Generic Toolkit forOnline Handwriting Recognition. In Proceedings of the tenth International Work-shop on Frontiers in Handwriting Recognition (IWFHR-10), 2006.

4. E. Smirnova and S. Watt. A context for pen-based mathematical computing. Pro-ceedings of the 2005 Maple Summer Conference, 2005.

5. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Expres-sions using a Minimum Spanning Tree Construction and Symbol Dominance. FifthIAPR International Workshop on Graphics Recognition (GREC), 2003.

6. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Formulasin the E-Chalk System. Proceedings of the Seventh International Conference onDocument Analysis and Recognition (ICDAR), 2003.

7. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Formulasin the E-Chalk System - An Extension. Proceedings of the Eighth InternationalConference on Document Analysis and Recognition (ICDAR), 2005.

8. I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools andTechniques with Java Implementations. Morgan Kaufmann, 1999.

9. M. Ye, H. Sutanto, S. Raghupathy, C. Li, and M. Shilman. Grouping Text Linesin Freeform Handwritten Notes. Proceedings of the Eighth International Conferenceon Document Analysis and Recognition (ICDAR), 2005.

mathfor: the mathematical formula recognition system · 2015-07-29 · mathfor: the mathematical...

Documents