
Issues in Mathematical Table Recognition

Mohamed Alkalai and Volker Sorge

School of Computer Science, University of Birmingham
Email: M.A.Alkalai|[email protected]

Abstract. While a number of techniques have been developed for table recognition in ordinary text documents, when dealing with tables in mathematical documents these techniques are often ineffective, as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is even difficult to clearly distinguish table recognition in mathematics from layout analysis of mathematical formulas. Again, it is not straightforward to adapt general layout analysis techniques for mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite to further semantic interpretation of the represented formulae. We present the first steps towards a table recognition technique that specialises in tables in mathematical documents. It is based on a robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. Based on that, we develop a cell detection approach that will ultimately allow for a number of alternative, nevertheless valid interpretations of tables.

1 Introduction

Table recognition is a field of document analysis that has attracted much attention over the years. Our aim is to develop a table recognition algorithm that is particularly suited to tables containing mathematical expressions. As the distinction between tables and complex typeset mathematical formulas spanning multiple lines is often difficult, we forgo a narrow definition of table and instead consider a far wider range of expressions as tables than is usually the case in the literature [15].

The tabular form in which many mathematical expressions are presented can often lead to ambiguities in the interpretation of what essentially constitutes a table component (i.e., a column, row or cell). While in table understanding of ordinary text tables the goal is generally to restrict the result to a single valid interpretation, for mathematical tables these ambiguities can lead to several perfectly acceptable interpretations. Therefore, the aim of our recognition procedure is to produce as a result a number of equally valid interpretations.

Our approach to the problem proceeds in two steps: First we design an algorithm that enables the separation and detection of lines in mathematical documents. This is a non-trivial task since, contrary to ordinary text, simple line separation by vertical whitespace does not suffice, due to the two-dimensional nature of

Fig. 1. Cell identification with tables containing multiline expressions.

mathematical formulas. The procedure uses simple yet effective spatial measures and serves as the basis of a cell detection algorithm. This algorithm builds on some of the same spatial measures as the line detection and will ultimately lead to the full-fledged table recognition algorithm.

In the next section we will discuss some examples to further motivate our problem. We will then describe the histogrammatic algorithm for line detection in mathematical documents and a first approach at cell detection for tabular mathematical formulas.

2 Interpretation of Mathematical Tables

The boundary between what constitutes a mathematical table and what is a complex formula given in a tabular presentation is quite fluid. While some tables, for example Cayley tables in abstract algebra, are quite straightforward to recognise due to their simple tabular composition and sometimes clear separation of rows and columns with bars, this is generally not the case. In fact, the common absence of any vertical or horizontal bars, as well as the complexity of formulas often spanning multiple lines, makes it not only difficult to identify the cell structure but can lead to a number of different interpretations for the same table, which are often equally valid.

Figure 1 presents two tables taken from [15] with a fairly conservative column and row layout. There is indeed a unique ideal interpretation for Table 1, consisting of two columns and three rows, where the cell in the lower right hand corner contains a math expression spanning two lines. And given the difference in font weights one could even interpret the first line as a clear header row.

Table 2 on the right of Fig. 1 is slightly less straightforward given the overlapping expressions in the second line. However, one can still come up with a unique interpretation of five rows and three columns. But it becomes already obvious that, due to the overlap of formulas as well as the condition formulas

Fig. 2. Two different interpretations of a single table.

which are effectively in a column of their own, this interpretation is not so easy to obtain automatically.

Figure 2 presents a clipping from a more complex table, also taken from [15]. Here we can already see two different interpretations, both with their own merits. While both interpretations regard the basic table as consisting of four columns, the first interpretation results in three rows, using the formula names on the right hand side as header column. The second interpretation on the other hand uses the enumeration in the first column as header. Obviously there are still more interpretations: For example, one with three columns with the middle column containing complex formulas, or even one with only two columns, where the right column contains named multiline formulas that could possibly even be considered as subtables.

3 A Histogrammatic Approach to Line Segmentation

The basis of our table recognition system is a robust algorithm for the separation of mathematical lines. While in regular text documents lines can generally be clearly separated by simply detecting consecutive whitespace between lines and applying vertical cuts, for documents containing mathematical expressions simple vertical separation of lines does not suffice, due to the occurrence of particular artifacts of mathematical notation (e.g., math accents, the limits of sum symbols). As a consequence, our first goal is the development of a robust line recognition algorithm that works reliably independent of knowledge of any peculiarities of mathematical expressions.

The basic idea of our approach is to detect all possible individual lines first and then merge neighbouring lines into single lines likely to contain mathematical expressions. Thereby we rely neither on knowledge of the content of lines, nor on font information, nor on vertical distance. Instead we use a histogrammatic measure of space within a single line. In a final step we then employ simple height considerations to detect lines that have not been merged or have been wrongly merged.

In summary our procedure consists of the following six steps:

1. Compute bounding boxes of all glyphs on a page.

2. Construct lines by separating sets of glyphs with respect to vertical whitespace. That is, we use vertical cuts similar to projection profile cutting [30].

3. We classify lines as principal or non-principal (the former constitute a proper line, while the latter should be merged with a principal line above or below) by comparing the horizontal distances of neighbouring glyphs. A line is classified as non-principal if no pair of its glyphs has a separation distance within a certain threshold, where the threshold is computed from the histogram of all horizontal distances between neighbouring glyphs for the entire page.

4. The classification is then further corrected by comparing non-principal lines to neighbouring principal lines with respect to their maximum character height. If a non-principal line is not within a ratio of the character height of its neighbouring principal lines, it is promoted to a principal line.

5. In a second corrective step we compare all lines identified as principal on a page with the maximum height of all existing non-principal lines on that page. If the height of any principal line is less than or equal to that maximum, it is demoted to a non-principal line.

6. Finally, non-principal lines are merged recursively with their closest neighbouring principal lines.

More formally, we define the following concepts for our algorithms. First we render our notion of bounding boxes more precise:

Definition 1 (Bounding Box). Let g be a glyph; then the limits of its bounding box are defined by l(g), r(g), t(g), b(g), representing the left, right, top and bottom limit respectively. We also have l < r and t < b.

Definition 2 (Vertical and Horizontal Overlap). Let g1, g2 be two glyphs. We say g1 overlaps vertically with g2 if we have [t(g1), b(g1)] ∩ [t(g2), b(g2)] ≠ ∅, where [t(g), b(g)] is the interval defined by the top and bottom limit of the glyph g.

Similarly, we define horizontal overlap of two glyphs g1, g2 by [l(g1), r(g1)] ∩ [l(g2), r(g2)] ≠ ∅.
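The two overlap tests of Definition 2 translate directly into interval-intersection checks. The following is a minimal sketch assuming bounding boxes represented as (l, r, t, b) tuples, a hypothetical representation chosen for illustration, not the authors' implementation:

```python
# Sketch of Definition 2 on bounding boxes represented as (l, r, t, b)
# tuples (hypothetical representation; the paper only fixes l < r, t < b).

def vertical_overlap(g1, g2):
    """True if the [top, bottom] intervals of the two glyphs intersect."""
    return g1[2] <= g2[3] and g2[2] <= g1[3]

def horizontal_overlap(g1, g2):
    """True if the [left, right] intervals of the two glyphs intersect."""
    return g1[0] <= g2[1] and g2[0] <= g1[1]
```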

We can now define a line using the vertical overlap on a set of glyphs.

Definition 3 (Line). Let G = {g1....gn} be a set of glyphs. We call a subsetL ⊆ G a line if for every g ∈ L there is a h ∈ L such that g and h overlapvertically and there is not g ∈ G \L that overlaps vertically with any element inL.

We can now define the distance measure with respect to which we will consider histograms.

Definition 4 (Horizontal Distance). Let L = {g1, . . . , gn} be a line. We call two glyphs g, g′ ∈ L neighbours if r(g) < l(g′) and there does not exist a g′′ ∈ L with g′′ ≠ g and g′′ ≠ g′ such that [l(g′′), r(g′′)] ∩ [r(g), l(g′)] ≠ ∅. We then define the horizontal distance d between two neighbouring glyphs g, g′ as d(g, g′) = l(g′) − r(g).

Observe that in the above definition we define distances only for elements of the line that do not overlap horizontally. Thus the distances represent the actual whitespace in lines.
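The neighbour distances of Definition 4 can be approximated by ordering the glyphs of a line from left to right and measuring the positive gaps between consecutive pairs. This is a sketch, not the authors' implementation, and again assumes (l, r, t, b) tuples:

```python
# Sketch of Definition 4: horizontal distances between neighbouring,
# non-overlapping glyphs of one line ((l, r, t, b) tuples, hypothetical).

def horizontal_distances(line):
    xs = sorted(line, key=lambda g: g[0])         # left-to-right order
    dists = []
    for g, h in zip(xs, xs[1:]):
        d = h[0] - g[1]                           # d(g, g') = l(g') - r(g)
        if d > 0:                                 # skip horizontal overlaps
            dists.append(d)
    return dists
```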

The distance measure from the previous definition now allows us to compute a histogram that captures the horizontal distances between glyphs in lines for the entire page. Figure 3 shows two examples of such histograms, where the x-axis denotes the values of the distance measure d in pixels and the y-axis the number of occurrences of a particular distance. Note that when building the histogram, we deliberately omit all values where no distance occurs, in other words, where the y value is equal to 0.

We can observe a general pattern in these histograms: They can be split into two parts by a global minimum that is roughly in the middle of the x-axis. This leaves two parts, each with a global maximum. Furthermore, in the right part one can identify a further global minimum. While this can be at the very end of the x-axis, it usually is not. We call these two minimal points v1 and v2, respectively, and use them to define the classification of lines as follows:

Definition 5 (Principal Lines). Let L be a line. We call L a principal line if there exist two neighbouring glyphs g, h ∈ L with v1 ≤ d(g, h) ≤ v2. Otherwise L is a non-principal line.

The intuition behind this definition is that the values in the histogram less than v1 represent distances between single characters in a word or a mathematical expression, whereas the area between v1 and v2 represents the distances between single words, which generally do not occur in lines that only constitute part of a mathematical formula, for example, those consisting of the limit expression of a sum.
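Locating v1 and v2 and applying Definition 5 might be sketched as follows. The way the two minima are found here (minimum count over the left half of the x-range, then minimum count after the right-hand peak) is a heuristic assumption, since the paper does not spell out the exact procedure; `hist` maps each occurring distance to its count:

```python
# Heuristic sketch of locating v1 and v2 in the page histogram and of
# Definition 5. The split heuristics are assumptions, not the authors'
# exact procedure.

def split_points(hist):
    xs = sorted(hist)
    mid = xs[len(xs) // 2]
    left = [x for x in xs if x <= mid]
    v1 = min(left, key=lambda x: hist[x])      # minimum roughly mid-axis
    right = [x for x in xs if x > v1]
    peak = max(right, key=lambda x: hist[x])   # global maximum of right part
    tail = [x for x in xs if x > peak]
    v2 = min(tail, key=lambda x: hist[x]) if tail else xs[-1]
    return v1, v2

def is_principal(dists, v1, v2):
    """Definition 5: some neighbour distance falls inside [v1, v2]."""
    return any(v1 <= d <= v2 for d in dists)
```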

While this measure alone already yields good results, it can be improved upon by considering a simple ratio between glyph heights of principal and non-principal lines.

Definition 6 (Height Ratio). Let L1 and L2 be two consecutive lines, such that L1 is a non-principal line and L2 is the nearest principal line to L1. If max_{g∈L1} [b(g) − t(g)] > (1/T) max_{g∈L2} [b(g) − t(g)], where 1 ≤ T ≤ 2, then L1 is converted into a principal line.

Observe that the value for the parameter T is fixed and determined empirically by experiments on a small sample set.
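The promotion test of Definition 6 then reduces to a single comparison. A sketch using the hypothetical (l, r, t, b) tuple representation and the paper's empirically determined T = 1.7:

```python
# Sketch of the Definition 6 correction: promote a non-principal line l1
# when its tallest glyph exceeds 1/T of the tallest glyph of the nearest
# principal line l2. T = 1.7 follows the paper's experiments.

def promote(l1, l2, t_param=1.7):
    h1 = max(b - t for (_, _, t, b) in l1)   # tallest glyph of l1
    h2 = max(b - t for (_, _, t, b) in l2)   # tallest glyph of l2
    return h1 > h2 / t_param
```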

Fig. 3. Examples of pages and their histograms of the gaps between glyphs.

Since the previous step tackles only the problem of lines wrongly classified as non-principal, we also need to define a corrective instrument to detect non-principal lines that have wrongly been classified as principal lines. This is achieved as follows:

Definition 7 (Non-principal Height Bound). Let Ln = {n1, n2, . . . , nl} be the set of non-principal lines of a page and Lp = {p1, p2, . . . , pk} be the principal lines of the same page.

Then we define the non-principal height bound as the maximum height M of all non-principal lines:

M = max_{n∈Ln} [b′(n) − t′(n)],

where t′ and b′ are the top and bottom limits of a line, such that t′(n) = min_{g∈n} t(g) and b′(n) = max_{g∈n} b(g).

Any p ∈ Lp is converted to a non-principal line if and only if b′(p) − t′(p) ≤ M.
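A sketch of the demotion step of Definition 7, again over hypothetical (l, r, t, b) glyph tuples:

```python
# Sketch of Definition 7: find all principal lines whose height does not
# exceed the maximum height M over the non-principal lines of a page.
# Lines are lists of (l, r, t, b) glyph tuples (hypothetical).

def line_height(line):
    """b'(L) - t'(L): extent from highest top to lowest bottom."""
    return max(b for (_, _, _, b) in line) - min(t for (_, _, t, _) in line)

def lines_to_demote(principal_lines, non_principal_lines):
    m = max(line_height(n) for n in non_principal_lines)
    return [p for p in principal_lines if line_height(p) <= m]
```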

Once the classification of lines is finished, in the final step we merge non-principal lines with their horizontally closest neighbouring principal line, but only if there exists a horizontal overlap between them. If not, the non-principal line is converted into a principal line. This can be formally defined as follows:

Definition 8 (Merging Lines). Let N and P be a non-principal and a principal line respectively, such that P is the nearest neighbour of N. For a line L, let l′(L) and r′(L) be its left and right limits, such that l′(L) = min_{g∈L} l(g) and r′(L) = max_{g∈L} r(g). If l′(P) ≤ r′(N) and l′(N) ≤ r′(P), that is, if the horizontal extents of N and P overlap, then N and P are merged. Otherwise, N is converted into a principal line.
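The merge-or-promote decision of Definition 8 can be sketched on horizontal extents; the overlap test below uses the conjunction of both comparisons, which is what the prose ("only if there exists a horizontal overlap") implies:

```python
# Sketch of Definition 8: merge a non-principal line n into its nearest
# principal line p only if their horizontal extents overlap; otherwise n
# is promoted. Lines are lists of (l, r, t, b) glyph tuples (hypothetical).

def horizontal_extent(line):
    """(l'(L), r'(L)) of a line."""
    return min(g[0] for g in line), max(g[1] for g in line)

def merge_or_promote(n, p):
    ln, rn = horizontal_extent(n)
    lp, rp = horizontal_extent(p)
    if lp <= rn and ln <= rp:          # extents overlap: merge
        return p + n, None
    return None, n                     # no overlap: n becomes principal
```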

4 Experimental Results

We have carried out an initial experimental evaluation of our approach on 200 pages. These pages are taken from 12 documents comprising a mixture of books [17, 22, 24, 10, 27] and journal articles [21, 20, 31, 28, 12, 16, 8].

In Table 1 we present the experimental results for our dataset. We have compared using simple vertical cuts with our technique of using the horizontal distance measure introduced in Def. 4, as well as additionally using the height ratio defined in Def. 6 and the non-principal height bound defined in Def. 7. As height ratio parameter we have set T = 1.7, a value that was experimentally determined on a small independent sample set.

Altogether we manually identified 5801 lines in the 200 pages of the dataset. We compare this number with the number of lines found altogether and the number of lines identified correctly, that is, those lines corresponding to the actual lines as manually identified in the dataset.

Method        Pages  Total lines  Lines found  Correct lines  Accuracy
Vert. Cuts     200    5801         6987         5015          86.4%
Hori. Dist.    200    5801         5727         5265          90.7%
Height Ratio   200    5801         5910         5587          96.3%
Height Bound   200    5801         5863         5625          96.9%

Table 1. Experimental results for line recognition.

Not surprisingly, simple vertical cuts result in a larger number of lines and, as there is no subsequent merging of lines, in a relatively low accuracy of 86.4%. Using the horizontal distance measure improves this accuracy; however, it in general merges too many lines. This is corrected by the addition of the height ratio, which re-classifies some of the lines incorrectly assumed to be non-principal as principal lines. As a consequence we get a slightly higher number of lines but also a higher accuracy of 96.3%. A further slight improvement of this accuracy to 96.9% is obtained using the height bound.

To further examine the robustness of our technique, and in particular to rule out that it was overfitted to our original data set, we have experimented with a second, independent and larger data set. This data set contains 1000 pages composed from more than 60 mathematical papers different from those of our original set. (Some of the papers included are [1–3, 9, 4, 5, 11, 6, 7].)

We ran our technique on this second larger data set and then manually checked the results by painstakingly going through every page line by line. Consequently, we have done this comparison only for the full classification including both the height ratio and height bound corrections. And while we cannot rule out some classification mistakes due to human error, we are very confident that the experimental results given in Table 2 are accurate.

Pages  Total lines  Lines found  Correct lines  Accuracy
1000    34146        34526        33678         98.6%

Table 2. Experimental results for 1000 pages.

Table 2 demonstrates that although the data set is five times the size of the previous one, our classification results remain stable. In fact, one can see that in comparison with Table 1 we even have an increase in recognition rate of approximately 2%. This result gives us confidence in the effectiveness of our technique even on large datasets and documents.

For the lines that were not correctly identified, it is possible to categorise the recognition errors into two types.

Incorrect Non-principal Lines: The most common error stems from classifying a line with respect to the horizontal distance measure as a principal line that should actually be non-principal. This is the case when there is a gap between two neighbouring glyphs that satisfies the horizontal distance condition. Below are some examples showing several cases of errors taken directly from our dataset.

Although these expressions should be detected as a single line, the limits under the two summation symbols are at a distance that coincides with the distance identified by the histogram for the entire page. Likewise, in the expression below, also taken from our dataset, the tilde accents have a similar distance:

Incorrect Principal Lines: This error occurs when a line is initially classified as a non-principal line because it does not contain any glyph gaps that coincide with the distance measure derived from the histogram. Examples of such lines are those with single words, page numbers, single expressions, etc. While these can be corrected by the height ratio, sometimes they are not, as they do not satisfy the ratio condition. Below is an example taken from our dataset:

Here the page number 12 is merged as a non-principal line into the expression above, as firstly it does not exhibit a glyph gap satisfying the distance measure and secondly its height is significantly smaller than the height of the open parenthesis in the mathematical expression.

5 Towards Full Cell Recognition

Having a robust line finding algorithm available for documents containing large numbers of mathematical expressions gives us a good basis to tackle our actual goal, identifying the tabular layout of mathematical expressions. The method will attempt to overcome the problems discussed in the literature. In particular, Silva et al. [26] state that most approaches to table recognition are still not generalist enough to derive the physical model of complex tables with overlapping cells if no grid-lines are available. As an example, they present the table in Fig. 4, where the top left cell spans two columns but can usually not be detected by table recognition approaches if the cell contains more than one distinguishable word (e.g., [29, 18]). Unfortunately, tabular layout of this nature is very common for mathematical formulas.

An exceptional approach that can deal with these types of cells is presented by Hurst [13, 14]. Rather than using abstract words and their position to identify columns, Hurst uses a language model in order to group words into cells before constructing rows and columns. While this makes the method more robust to the presence of multicolumn or multirow cells, as well as to some extent to cell overlap, one needs a sufficiently robust language model first, which to our knowledge does not exist for arbitrary mathematical expressions.

To appreciate the obstacles encountered in segmenting such tables, we have applied an implementation of Kieninger's algorithm [18] to the two tables with

Fig. 4. A table most segmentation algorithms would not process accurately

Fig. 5. Tables with mathematical expressions before applying Kieninger's algorithm.

mathematical expressions presented in Fig. 5. The result of the algorithm is given in Fig. 6.

The cell recognition errors are indicated with red and blue stars in Fig. 6. These errors can be classified into the typical ones that occur during table segmentation, namely splitting and merging errors. A merging error occurs when two blocks of cells are wrongly transformed into one block; a splitting error occurs in the converse case. Splitting errors can be found in table 1, whereas merging errors can be found in table 2.

Splitting Errors Analysis The splitting errors in table 1 result in fake columns. The cause of this is the structural nature of math expressions, which usually have an inconsistent distribution of space values between their symbols, which in turn makes Kieninger's algorithm more likely to mistake incidental space alignment for a column delimiter.

Merging Errors Analysis The merging errors in table 2 result in too few cells and, more importantly, fail to detect the correct rows. And while some overlapping cells are detected, it is probably not the overlap one would like to have. The method primarily fails to split columns because there is no discernible space between them relative to the other rows. That is, consecutive cells in one row obscure the column structure in the next row.

Table Column Extraction To avoid errors of the above nature, in our approach we identify cells first before combining them into rows and columns. In

Fig. 6. The results of applying Kieninger’s algorithm.

the absence of a language model we have to rely on purely spatial techniques to identify enough features to determine a cell boundary.

We first use our line finding algorithm to identify the single lines in the page or area of the table we want to recognise. We then again exploit the histogrammatic information that we have gathered during the line finding algorithm in the following way:

We consider the distribution of horizontal distance values d between consecutive glyphs as given by the histogram derived in the line finding algorithm. However, we now consider the gaps between the values on the x-axis. As mentioned previously, we do not record any x-value for which no horizontal distance exists in the page. As a consequence, not all values are given along the axis. We now look for the first difference between neighbouring values that exceeds a particular threshold (in our case we generally take the value 10). We assign the smaller of the two neighbouring values to be our cell separation distance T. This procedure can be formally defined as follows:

Definition 9 (Cell Separation Distance). Let X = {x0, . . . , xn} be the ordered set of all existing x-values of a page. Then let V = {v1, . . . , vn} be the set of differences between neighbouring elements of X, that is, vi = xi − xi−1.

The cell separation distance T is then assigned the smallest xi−1 such that vi ≥ 10, where i = 1, . . . , n.

We then separate lines into columns in the places where the horizontal distance between two neighbouring glyphs is at least T.

Definition 10 (Cells). Let L = {g1, . . . , gn} be a line. We impose a partition on L as follows:

(i) two neighbouring glyphs gi, gi+1 belong to the same set if l(gi+1) − r(gi) < T,
(ii) two neighbouring glyphs gi, gi+1 belong to different sets if l(gi+1) − r(gi) ≥ T.

We call the subsets of L given by this partition cells.
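The partition of Definition 10 can be sketched as a single left-to-right scan; glyph tuples (l, r, t, b) are again a hypothetical representation:

```python
# Sketch of Definition 10: split a line into cells wherever the gap
# between neighbouring glyphs reaches the cell separation distance T.
# Glyphs are (l, r, t, b) tuples (hypothetical representation).

def cells(line, t_sep):
    xs = sorted(line, key=lambda g: g[0])   # left-to-right order
    out = [[xs[0]]]
    for g, h in zip(xs, xs[1:]):
        if h[0] - g[1] >= t_sep:
            out.append([h])                 # gap >= T starts a new cell
        else:
            out[-1].append(h)
    return out
```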

To exemplify the idea, assume that we have the following horizontal distances occurring between all glyphs of a given page: 12, 15, 17, 21, 23, 28, 35, 50, 56, 75, 120. Using our threshold parameter of 10, the first gap exceeding this value is between 35 and 50. Consequently, we will choose T = 35 and separate every line into columns whenever the horizontal distance between two neighbouring glyphs is greater than 35.
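The scan for T described in Definition 9 can be sketched as:

```python
# Sketch of Definition 9: scan the ordered, de-duplicated distance values
# of a page and return the value just before the first gap that reaches
# the threshold (10 in the paper).

def cell_separation_distance(xs, gap=10):
    xs = sorted(set(xs))
    for prev, cur in zip(xs, xs[1:]):
        if cur - prev >= gap:
            return prev
    return xs[-1]                      # fallback: no large gap found
```

Applied to the distances of the example above, the first qualifying gap is the 15-pixel jump from 35 to 50, so the function returns T = 35.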

While the choice of this distance measure is fairly simple, it has the advantage that it does not rely on any additional knowledge of the content or on elaborate statistical considerations. Nevertheless, in practice we have observed very good results. For example, when running our algorithm over table 1 and table 2 from Fig. 5 we obtain the result presented in Fig. 7, where we can see that we indeed obtain the cells that we would ideally identify before putting together the table.

Fig. 7. Ideal segmentation output of both table 1 and table 2 in Fig. 5.

As a means both to evaluate the performance of our approach and to later be able to locate tables in a page, we adapt a technique from the literature. For example, in [19, 23] lines on a page are classified as tabular lines if they can be split into two or more segments, or otherwise as text lines. This classification is then used to identify table cells; consecutive tabular lines are clustered and the top and bottom boundaries of a table are calculated based on the first and last such lines.

And although for mathematical documents this technique needs to be modified, as not all tabular lines should be considered part of a table, the classification is still of valid use in evaluating our algorithm's performance, where we are only interested in extracting the correct table cells. Consequently, we classify lines into two categories, either text or table, together with the number of recognised cells. Fig. 8 gives an example of a fully classified page.

For our evaluation we have manually ground-truthed the first 200 pages of [15], a document that contains many tabular components. We then ran our technique over the 200 pages, which produces a corresponding classified page for each page in the test set that contains a mixture of Tab and Txt label annotations.

Fig. 8. An example of a classified page from [15].

Pages  Total cells  Cells found  Correct cells  Accuracy
200     9508         9849         9292          97.7%

Table 3. Experimental results for cell recognition.

Although our experimental results, presented in Table 3, already show a promisingly high accuracy rate, there is still a significant problem with the detection of too many cells. An error analysis yields that the majority of misrecognised cells are due to splitting errors, as mentioned previously in this section. One possible approach to tackle this kind of error, and eventually improve the accuracy of our approach, is to experimentally determine the threshold for setting the cell separation distance using a small independent sample set.

6 Conclusions

We have presented some issues regarding the recognition of tables containing mathematical expressions. As those tables are often particularly complex and can be ambiguous in interpretation, generalist algorithms for table recognition tend to fail.

As the basis of our table recognition algorithm serves a line detection procedure that is particularly suitable for mathematical expressions. The procedure exploits only simple spatial features in a histogrammatic approach, avoiding the use of many parameters that need to be fine-tuned or reliance on statistical data from large sample sets. Our experiments show that we nevertheless get a high rate of accuracy in detecting correct lines.

One issue we have not yet addressed is when two distinct lines have vertical overlap. These occur occasionally when mathematical expressions of exceptional height vertically overlap with ascenders or descenders of neighbouring lines, or when lines are deliberately vertically squeezed. Then lines cannot be separated simply by vertical cuts, which is the very first step of our algorithm. We are currently exploring techniques such as seam carving [25] as a solution to this problem.

The actual table recognition is based on the idea of detecting cells first, rather than doing a spatial analysis of rows and columns. In the absence of a language model for mathematical expressions we again use spatial features analogous to our line finding algorithm. Initial experiments with our approach yield promising results; however, a next step is to evaluate and further hone our techniques in a case study. For this we are currently putting together a large ground truth set. Since mathematical table structures often allow for a number of equally correct interpretations, we are currently looking into a framework that allows for adequate representation of these interpretations and ways to move between them. We are currently considering graph rewrite systems as a possible framework to exploit.

References

1. M. Aassila and A. Benaissa. Existence of global solutions to a quasilinear wave equation with general nonlinear damping. Electronic Journal of Differential Equations, 2002(91):1–22, 2002.
2. B. Ahmad, R. Khan, and P. Eloe. Generalized quasilinearization method for a second order three point boundary-value problem with nonlinear boundary conditions. Electronic Journal of Differential Equations, 2002(90):1–12, 2002.
3. N. Aissaoui. Maximal operators, Lebesgue points and quasicontinuity in strongly nonlinear potential theory. Acta Math. Univ. Comenianae, LXXI(1):35–50, 2002.
4. Y. Akdim, E. Azroul, and A. Benkirane. Existence of solutions for quasilinear degenerate elliptic equations. Electronic Journal of Differential Equations, 2001(71):1–19, 2001.
5. D. Akhmetov, M. Lavrentiev, and R. Spigler. Existence and uniqueness of classical solutions to certain nonlinear integro-differential Fokker-Planck type equations. Electronic Journal of Differential Equations, 2002(24):1–17, 2002.
6. H. Alkahby, G. Ansong, P. Frempong-Mireku, and A. Jalbout. Symbolic computation of Appell polynomials using Maple. Electronic Journal of Differential Equations, pages 1–13, 2001.
7. M. Allali. Compression on the digital unit sphere. Electronic Journal of Differential Equations, pages 15–24, 2001.
8. C. Chui, C. Mercat, W. Orrick, and P. Pearce. Integrable lattice realizations of conformal twisted boundary conditions. In , 2001.
9. E. Dads and K. Ezzinbi. Boundedness and almost periodicity for some state-dependent delay differential equations. Electronic Journal of Differential Equations, 2002(67):1–13, 2002.
10. D. Denisov and V. Wachtel. Random walks in cones. , 2011.
11. S. Al Ghour. Components and local prehomogeneity. Acta Math. Univ. Comenianae, LXXIII(2):187–196, 2004.
12. D. Guido and T. Isola. Novikov-Shubin invariants and asymptotic dimensions for open manifolds. In , 1998.
13. M. Hurst. Layout and language: An efficient algorithm for detecting text blocks based on spatial and linguistic evidence. In Document Recognition and Retrieval VIII, pages 55–67, 2001.
14. M. Hurst. A constraint-based approach to table structure derivation. In ICDAR-03, pages 911–915, 2003.
15. A. Jeffrey and D. Zwillinger. Table of Integrals, Series, and Products. Elsevier Inc, 2007.
16. M. Jinzenji. Completion of the conjecture: Quantum cohomology of Fano hypersurfaces. , 1999.
17. T. Judson. Abstract Algebra: Theory and Applications. , 2009.
18. T. Kieninger. Table structure recognition based on robust block segmentation. In Proc. of the Fifth SPIE Conference on Document Recognition, pages 22–32, 1998.
19. S. Mandal, S. Chowdhury, A. Das, and B. Chanda. Detection and segmentation of tables and math-zones from document images. In SAC-06, pages 841–846, 2006.
20. P. Mateus and A. Sernadas. Weakly complete axiomatization of exogenous quantum propositional logic. Inf. Comput., 204(5), 2006.
21. S. Meljanac, A. Perica, and D. Svrtan. The energy operator for a model with a multiparametric infinite statistics. Journal of Physics A: Mathematical and General, 36:6337–6349, 2003.
22. S. Meljanac and D. Svrtan. Determinants and inversion of Gram matrices in Fock representation of qkl. , 2003.
23. E. Oro and M. Ruffolo. PDF-TREX: An approach for recognizing and extracting tables from PDF documents. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, ICDAR '09, pages 906–910, Washington, DC, USA, 2009. IEEE Computer Society.
24. B. Riemann. On the number of prime numbers less than a given quantity. , 1998.
25. R. Saabni and J. El-Sana. Language-independent text lines extraction using seam carving. In Document Analysis and Recognition, pages 563–568. IEEE Computer Society, 2011.
26. A. Costa Silva, A. Jorge, and L. Torgo. Design of an end-to-end method to extract information from tables. International Journal of Document Analysis Research, 8(2):144–171, 2006.
27. S. Sternberg. Theory of Functions of a Real Variable. , 2005.
28. T. Takebe. A system of difference equations with elliptic coefficients and Bethe vectors. Communications in Mathematical Physics, 183(1):161–181, 1996.
29. S. Tupaj, Z. Shi, and C. Hwa Chang. Extracting tabular information from text files. EECS Department, Tufts University, 1996.
30. R. Zanibbi. Recognition of mathematics notation via computer using baseline structure. External technical report, Department of Computing and Information Science, Queen's University, Kingston, Canada, 2000.
31. Y. Zhang and M. Ge. GHZ states, almost-complex structure and Yang-Baxter equation. Quantum Information Processing, 6(5):363–379, 2007.