a combinatorial approach to the phonetic similarity of

25
Mathématiques et sciences humaines Mathematics and social sciences 179 | Automne 2007 Varia A combinatorial approach to the phonetic similarity of language Une approche combinatoire de la ressemblance phonétique entre langues Daniele A. Gewurz and Andrea Vietri Electronic version URL: http://journals.openedition.org/msh/6803 DOI: 10.4000/msh.6803 ISSN: 1950-6821 Publisher Centre d’analyse et de mathématique sociales de l’EHESS Printed version Date of publication: 1 September 2007 Number of pages: 21-44 ISSN: 0987-6936 Electronic reference Daniele A. Gewurz and Andrea Vietri, « A combinatorial approach to the phonetic similarity of language », Mathématiques et sciences humaines [Online], 179 | Automne 2007, Online since 21 December 2007, connection on 23 July 2020. URL : http://journals.openedition.org/msh/6803 ; DOI : https://doi.org/10.4000/msh.6803 © École des hautes études en sciences sociales

Upload: others

Post on 09-Jan-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A combinatorial approach to the phonetic similarity of

Mathématiques et sciences humainesMathematics and social sciences 179 | Automne 2007Varia

A combinatorial approach to the phoneticsimilarity of languageUne approche combinatoire de la ressemblance phonétique entre langues

Daniele A. Gewurz and Andrea Vietri

Electronic versionURL: http://journals.openedition.org/msh/6803DOI: 10.4000/msh.6803ISSN: 1950-6821

PublisherCentre d’analyse et de mathématique sociales de l’EHESS

Printed versionDate of publication: 1 September 2007Number of pages: 21-44ISSN: 0987-6936

Electronic referenceDaniele A. Gewurz and Andrea Vietri, « A combinatorial approach to the phonetic similarity oflanguage », Mathématiques et sciences humaines [Online], 179 | Automne 2007, Online since 21December 2007, connection on 23 July 2020. URL : http://journals.openedition.org/msh/6803 ; DOI :https://doi.org/10.4000/msh.6803

© École des hautes études en sciences sociales

Page 2: A combinatorial approach to the phonetic similarity of

Math. & Sci. hum. / Mathematics and Social Sciences (45e année, n◦ 179, 2007(3), p. 21–44)

A COMBINATORIAL APPROACH TO THE PHONETIC SIMILARITYOF LANGUAGES

Daniele A. GEWURZ1, and Andrea VIETRI2

résumé – Une approche combinatoire de la ressemblance phonétique entre languesEn exploitant une représentation géométrique des phonèmes vocaliques, nous réalisons un modèle

bidimensionnel dans lequel des voyelles sont des points et les distances entre ces points expriment

des différences auditives. Ceci nous permettra de décrire le système vocalique d’une langue du

point de vue d’une autre langue au moyen d’une partition d’un ensemble fini dont les propriétés

combinatoires peuvent être explorées. Le concept de base que nous utilisons est celui du diagramme

de Voronoï, qui a été largement utilisé dans d’autres domaines. Dans le cas présent, nous mettons

en évidence quelques particularités combinatoires de partitions d’entiers qui décrivent des dissem-

blances entre des inventaires vocaliques de différentes langues et classons les relations possibles entre

inventaires par le biais des graphes orientés appropriés et, en particulier, parmi les diagrammes

de Voronoï adéquats. Nous appliquons cette théorie à quelques langues réelles, en recherchant des

améliorations pouvant faciliter la compréhension d’un inventaire vocalique par un auditeur dont la

catégorisation auditive est différente. Enfin, nous décrivons des inventaires privilégiés, facilement

compréhensibles dans beaucoup de langues en même temps.

mots clés – Arrangements de points, Diagrammes de Voronoï, Graphe fonctionnel, Langagenaturel, Partitions

summary – By exploiting a well-known geometrical representation of vowel phonemes, we

devise a two-dimensional model in which vowels are points and distances between points express

auditory discrepancies. This will allow us to describe the vowel system of a language, as seen by

another language, by means of a set partition whose combinatorial properties can be explored. The

basic concept we employ is that of the Voronoi diagram, which has been, so far, extensively used

for many other purposes. In the present framework, we point out some combinatorial features of

integers partitions which describe dissimilarities between vowel inventories of different languages.

We classify the possible relations between inventories via suitable directed graphs related to point

configurations and, in particular, to the pertinent Voronoi diagrams. We apply the above theory to

some real languages we also look for possible improvements that make a vowel inventory easier to

understand by a listener whose auditory categorisation is different. Finally, we describe particular

inventories, easily understandable in many languages at the same time.

keywords – Functional graphs, Natural languages, Partitions, Point arrangements, Voronoidiagrams

1Dipartimento di Matematica, Università di Roma “La Sapienza”, Piazzale A. Moro 5, 00185Roma, Italia, [email protected]

2Dipartimento Me.Mo.Mat., Università di Roma “La Sapienza”, via A. Scarpa 16, 00161 Roma,Italia, [email protected]

Page 3: A combinatorial approach to the phonetic similarity of

22 d. a. gewurz, a. vietri

1. INTRODUCTION

In this paper we analyse phonetic similarities between vowels of distinct languages,by means of combinatorial properties of point arrangements in the plane. To thisend, we shall borrow from phonetics a rigorous geometrical setting where the sim-ilarity between vowel phonemes can be assessed. We start by introducing a basicconcept which will provide the link between phonetics and combinatorics.

definition 1. Let a1, . . . , an be distinct points in the Euclidean plane. The Voronoidomain related to ai, in symbols V(ai), is the region all of whose elements are closerto ai than to any other aj. Representing all the V(ai)’s in the plane yields theVoronoi diagram for a1, . . . , an.

r

r

rr

r

figure 1. A Voronoi diagram

Clearly, any two Voronoi domains (which are, topologically, open sets) are dis-joint, while the union of the closures of all domains is precisely the whole plane.Voronoi diagrams have been so far employed in several applicative and theoreticalcontexts. We quote, for instance, [Aurenhammer, 1991; Klein, 1989; Okabe et al.,2000].

In order to introduce the phonetical setting to which Voronoi diagram theorywill be applied, we start by describing a related model that actually provided theinitial motivation for the present study. This model, namely the so-called “vowelquadrilateral” first introduced in the 1910s by Daniel Jones [1917, 1922], is widelyrecognised as a valuable tool for effectively classifying vowels without getting in-volved in too deep calculations. In fact, such a quadrilateral (see Figure 2) displaysvowels3 in a 2-dimensional grid, according to the part of the tongue raised whenuttering the sound, and to the extent of the raising. More precisely, the horizontalaxis of the grid measures the “tongue backness”, that is, the position of the raisedpart (the more on the left, the closer to the lips), while the vertical axis refers tothe raising degree (higher points correspond to positions closer to the palate). Forexample, when uttering the vowel /i/ of “see” the tongue tip is quite close to theupper incisives, at one extreme of the mouth cavity. On the contrary, the /2/ of“cut” requires positioning the tongue quite backward, without touching the palate.

It must be noticed that representing vowels as above fails to yield an injectivemap. In fact, one does not take into account possible lip rounding, or nasalisation,

3In keeping with the standard conventions, the symbols for vowel phonemes are taken from theInternational Phonetic Alphabet (see [1999]).

Page 4: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 23

æ 5

UI Y

Æ•3•

@8•9•

0•1• u•W•

o•7•

O•2•

6•A•Œ•a•

œ•E•

ø•e•

y•i•

figure 2. The vowel quadrilateral

or other adjustments, which would produce distinct sounds when applied to somefixed position of the tongue. Furthermore, although the vowel trapeze is an unques-tionably practical and suggestive tool, it does not so faithfully describe the actualposition of the tongue (as it can be checked using X-rays or artificial palates). Sucha lack of accuracy does not arise in a more rigorous model than the vowel trapeze,namely the diagram obtained by suitably plotting on a 2-dimensional space the firstand second formant of any given vowel. For, roughly speaking, any given sound maybe regarded as the superposition of a fundamental frequency (its “pitch”) and somehigher frequencies, called harmonics. In human speech, the vocal tract emphasisesto different degrees the harmonics generated by the vocal cords, thus giving rise tofrequency peaks, the formants of a given vowel. By definition, the (n+1)th formantFn+1 has a higher frequency than Fn.

In Figure 3 we compare some of the vowels in the vowel quadrilateral with thesame ones plotted in the plane with the formants as coordinates. Formants are givenin Hertz, with the first one on the vertical axis; the reference is [Wise, 1957].

æ

UIu

O•2•

A

E

i

0100020000

400

800

b

i

bI

bEb

2b

æb

A

bObU

b

u

figure 3. The vowel quadrilateral compared with formant plane

Unlike in the vowel trapeze case, many secondary adjustments subject to a fixedtongue position result in distinct points of the new model. For example, roundingthe lips modifies the length of the oral cavity and, consequently, lowers the secondformant (in the same figure, notice the disambiguation of /2/ and /O/). As a matterof fact, distinct vowels are mostly characterised by the first and second formant– which are almost entirely determined by the tongue position and all additionaladjustments – while the zeroth formant is strongly related to vocal cords (see e.g.[Canepari, 2005; Fairbanks, Grubb, 1961; Ladefoged, 1975; Wise, 1957]). It is then

Page 5: A combinatorial approach to the phonetic similarity of

24 d. a. gewurz, a. vietri

clear that whenever two vowels are perceived as distinct, they are extremely likelyto correspond to distinct points of the formant diagram.

Due to its precision, the acoustic model provides a handy geometrical settingwhere similarities between vowel sets of different languages can be analysed. In thesequel we will actually use the logarithmic plotting of F1 against F2, which is wellknown to yield vowel configurations rather similar to those appearing in the vowelquadrilateral. The use of logarithms is justified by the fact that the human earperceives sounds having distinct frequencies, say f1, . . . , fn, more or less as pointslog(f1), . . . , log(fn) in a 1-dimensional space, where a greater distance correspondsto a more marked distinction of the sound [Wise, 1957]4.

The main working hypothesis in this papers takes into account the set of vowels ofan individual’s mother tongue, with the individual trying to relate some unfamiliarvowel inventory to his own inventory, during an auditory process. Our assumptionis that the listener assigns, more or less unconsciously, every perceived vowel tothe most similar one in his inventory, and we translate this similarity into a spatialcloseness.

On the way to giving a formalisation of this point of view, we plot two given vowelsets A = {a1, . . . , am}, B = {b1, . . . , bn} in the (log(F1) , log(F2))-plane togetherwith one or both the corresponding Voronoi diagrams (see for example Figure 6).It then becomes quite easy to assess how much one set “comprehends” the other.In fact, by regarding these sets as the vowel inventories of two individuals IA, IB

speaking different languages, we can interpret any formula “bi ∈ V(aj)” as “vowelbi is approximated to vowel ai by individual IA”, and conversely using the otherVoronoi diagram.

In Section 2 we provide a formal description of the case where one set compre-hends or, more generally, recognises the other (here sets consist of points in theEuclidean plane, not necessarily regarded as vowels). As a major tool we employsuitable “approximation” functions, which in the phonetic model assign each vowelof a given language to the “more similar” vowel of another language. Further onin that section, to each ordered pair of sets (A, B) we associate a real numberIB,A ∈ [0, 1] indicating the “degree of unbalance” of the approximations of pointsof B by means of points in A. In particular, the least value of that number forfixed sizes |A|, |B| corresponds to recognition or, if |A| ≥ |B|, comprehension. Sucha value is attained precisely when the listened vowels are distributed as equablyas possible among the listener’s Voronoi domains. The whole section is managedin a geometrical-combinatorial fashion, with a fleeting mention to the underlyingphonetical meaning (Theorem 1).

The approximation functions between sets lend themselves to a purely graph-theoretical investigation, which would, though, carry us to a far different topicthan the present one. Nonetheless we could not refrain from collecting some basicdefinitions and properties in the Appendix 1, where approximation functions areclassified via a special class of directed graphs. This classification is shown to be on

4A sharper representation of vowels is obtainable by replacing F1, F2 with F2 − F1, F3 − F2

respectively [Ladefoged, 1975], and by using a different scale (the Bark) instead of the usual Hertzscale [Sorianello, 2003; Syrdal, 1985; Traunmüller, 1981]. However, the logarithmic plotting of F1,F2 is precise enough for the present purposes. Future investigations might possibly involve theabove refinements.

Page 6: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 25

the one side well-posed, while on the other side it associates to each graph at leastone pair of sets.

The third and last section is devoted to applicative issues such as the measure-ment of vowel dissimilarities between different real-life languages, and the construc-tion of vowel sets which are “as much as possible understood” by an arbitrarily largeset of languages. In that section we shall exploit previously proved results to addressconcrete issues.

Geometrical notions related to the vowel quadrilateral – e.g. collinearity of vow-els, parallelism of vowel segments, minimum distance between vowels in differentsets – have been tacitly invoked by many linguists who, although in more intuitiveterms, came to grips with actual mathematical problems. On the other side, nota great number of mathematicians seem to have tackled the phonetic classificationproblem from a geometric-combinatorial viewpoint. Among them, Petitot [1989]appears to have been the only one to resort to Voronoi diagrams. His definitionand results, mostly concerned with consonants, are partly along the same lines asthe present work. Other geometrical approaches can be found in [Hall, 1999; Liljen-crants, Lindblom, 1972].

Not surprisingly, the idea of comparing systems of phonemes of different lan-guages is far from new. Over the decades, very many methods have been developedfor the analysis and comparison of speech inventories (see, for example, [Kondrak,2001; Nerbonne et al., 1996; Nerbonne, Heeringa, 1997]). A big amount of work hasalso been devoted, and is still being, to the development of effective algorithms forspeech recognition, synthesised speech generation, and so forth. Among the numer-ous applicative contributions of mathematical flavour we quote [Badino et al., 2004];a formal approach from a logical viewpoint can be found in [Batóg, 1961, 1968].

2. COMPREHENSION AND RECOGNITION

As already mentioned, we are regarding vowels as points in the Euclidean plane.For this reason, many results shall be expressed in purely geometrical terms, whilebearing in mind the initial, applicative, setting.

Let us assume that two finite sets A = {a1, . . . , am}, B = {b1, . . . , bn} in theEuclidean plane endowed with the usual metric (where we denote by xy the distancebetween x and y) have the following property: for each element bi there exists anelement aj such that bi ∈ V(aj) (as the Voronoi domains are disjoint, aj is necessarilyunique). In other words, for every i, bi is closest to a unique aj . If this propertyholds we say that B is in general position with respect to A and define ϕB,A : B → Aas the map sending bi to the above defined aj , for every i.

In the present paper we shall assume that pairs of sets like (A, B) above arealways in general position with respect to one another. In phonetical terms, thisamounts to postulating that for any vowel v of a given language there exists onlyone vowel of another fixed language, lying at the smallest distance from v in the(log(F1) , log(F2))-plane. Such an assumption seems sensible enough, dealing as weare with empirical data. In the next section we shall show how to manage thecase where some vowels of a language lie too close to the boundary of a Voronoidomain of another language. Accordingly, to each pair (A, B) we shall associate a

Page 7: A combinatorial approach to the phonetic similarity of

26 d. a. gewurz, a. vietri

number δ(B, A) which, if too small, warns that some vowel in B might be hard toapproximate using vowels in A, because it lies more or less at the same distancefrom two or more vowels in A. Other tools will be introduced that manage suchlimit cases.

We now provide the basic definitions that formalise the notions of comprehen-sion and recognition between sets of vowels (actually, between sets of points in theEuclidean plane).

definition 2. The set A is said to comprehend B (A |= B) if ϕB,A is injective (inparticular, |A| ≥ |B|).

More generally, A recognises B (A ⊢ B) if ⌊|B|/|A|⌋ ≤ |ϕ−1B,A(aj)| ≤ ⌈|B|/|A|⌉

for each j. (Notice that, in the comprehension case, these inequalities reduce to|ϕ−1

B,A(aj)| ≤ 1 for each j.)Finally, if A ⊢ B and |B| is a multiple of |A|, we say that A equably recognises

B (A ⊢e B).

Figure 4 illustrates four different situations with respect to comprehension orrecognition. The sets A, B consist respectively of full and empty circles. The Voronoidiagram for A is displayed in each case.

r

r

r

���

b

b

A |= B

r

r

r

���

b

b b

b

A ⊢ B

r

r

r

���

b

b b

b

b

b

A ⊢e B

r

r

r

���

b

b b

b

b

A 6⊢ B

figure 4. Comprehension and recognition: a showcase

Using Voronoi diagrams we can express |ϕ−1B,A(aj)| as |B ∩ V(aj)|. In fact, any

of the three properties above means that vowels from B are as fairly as possibledistributed in the Voronoi domains related to A. Comprehension and recognitionare not in general symmetric nor transitive relations, even if points are assumed tolie in a line. Nonsymmetry of comprehension is trivial, due to the size constraint.On the top of Figure 5 a case is shown where A |= B |= C and A 6|= C 6|= B;in particular, notice that both symmetry and transitivity fail to hold in general –comprehension possibly lacking symmetry even in the equal size case. In the bottomof the same figure we deal with recognition. In this case we have that A ⊢ B ⊢ Cand B 6⊢ A 6⊢ C.

b r d b r dc1 b1 a1 c2 b2 a2

d r b b d d r r dc1 b1 a1 a2 c2 c3 b2 b3 c4

figure 5. Relations |= and ⊢ are not symmetric nor transitive

It turns out that mutual comprehension holds only in the trivial case, namelywhen ϕA,B(a) = b implies ϕB,A(b) = a for all a ∈ A, b ∈ B, that is, when ϕA,B and

Page 8: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 27

ϕB,A are inverse of each other. This follows quickly by a preparatory lemma, whichwill be utilised later on again.

lemma 1. Let (a1, b1, . . . , ak, bk) be a sequence of points in the Euclidean plane, withno repeated element, such that bi = ϕA,B(ai) for all i, ai = ϕB,A(bi−1) for all i > 1,and a1 = ϕB,A(bk). Then, k = 1.

Proof. If k > 1, using the very definition of ϕ_,_ we deduce that a1b1 > b1a2 >

a2b2 > · · · > akbk > bka1, which is a contradiction. For, a1 cannot be nearer to bk

than to b1.

proposition 2. A |= B |= A if and only if ϕA,B and ϕB,A are inverse of each other.

Proof. The if part holds trivially, because the assumption implies that the two mapsare injective. Reasoning by contradiction, let us conversely assume that some a1 ∈ Aexists such that ϕB,A(ϕA,B(a1)) = a2 6= a1. Now using the injectivity of both mapsone obtains a maximal sequence a1, b1, a2, b2, a3, . . . , an (or . . . , bn) with n ≥ 2,where bi = ϕA,B(ai), ai = ϕB,A(bi−1), such that the points are all distinct. However,Lemma 1 prevents the last point to be mapped to any preceding element (such amapping would indeed cause a nontrivial loop). Due to the finite size of the sets, acontradiction is then reached.

Having settled the mutual comprehension case, we now deal with the more gen-eral issue of assessing the extent to which a given set fails to recognise another set.Before introducing the relevant formal notion, we resort to Figure 4 again and pro-vide a simple justification of the mathematical tools to be introduced later on. Inthe first illustration, namely in the presence of comprehension, the membership ofempty circles in the Voronoi domains of A can be described by the partition (0, 1, 1)of 2 = |B|, according to the number of circles in each domain (we use nondecreasingentries). In the other illustrations one obtains respectively the partitions (1, 1, 2),(2, 2, 2), (1, 1, 3). If we now consider the most balanced partition of |B| for eachcase – using rational numbers if necessary – then we get (|B|/|A|, |B|/|A|, |B|/|A|)with |B|/|A| respectively equal to 2/3, 4/3, 2, and 5/3. Now a way of assessing thediscrepancies between the actual partitions and the most balanced ones is summingthe absolute values of the differences over all entries. By doing so, in the first casewe find the “error” 2/3 + 1/3 + 1/3 = 4/3, and similar computations yield 4/3, 0,8/3. In order to normalise the error estimation we divide each sum by the greatestpossible error, coming from the most unbalanced partition (0, 0, . . . , 0, |B|), thusobtaining (4/3)/(8/3) = 0.5, then similarly (4/3)/(16/3) = 0.25, then 0 and finally0.4. Notice that the normalisation plays a crucial role in discriminating the first twocases, while the third error, equal to 0, signals an optimal situation. By this handfulof examples we are now led to the formal definition.

definition 3. Given two finite sets A, B and the map ϕB,A as above, let πB,A denotethe partition of |B| in |A| parts given by (|ϕ−1

B,A(a1)|, |ϕ−1B,A(a2)|, . . . , |ϕ−1

B,A(a|A|)|) (weregard partitions of a given integer as tuples of nonnegative integers summing up tothat number, and consider two partitions as equal if they differ only in the ordering

Page 9: A combinatorial approach to the phonetic similarity of

28 d. a. gewurz, a. vietri

of the terms). Furthermore, for any partition ω = (ω1, . . . , ωv) of m in at most vparts (as above, with ωi possibly equal to zero for some i), let the error Eω stand for

v∑

i=1

∣∣ωi −

mv

∣∣

2m − 2mv

.

We define the recognition index from B to A, in symbols IB,A, as EπB,A.

Notice that 2m − 2m/v is equal to (v − 1) · |0 − m/v| + |m − m/v| (as we willsoon show, this quantity is the largest numerator obtainable among all partitions).In plain words, Eω provides a normalised measure of the “unevenness” of ω withrespect to (m/v, m/v, . . . , m/v) – which is in general not a partition.

Before focussing on some mathematical aspects of IB,A, let us apply the abovedefinitions to a practical context. We denote by eng the set of British Englishmonophthongs (i.e., single vowels, as opposed to diphthongs or more complex sounds)as described in [Wells, 1962]5 and by ita the Italian vowel set, as in [Cosi, Ferrero,Vagges, 1995]. In Figure 6, left side, we have depicted the elements of eng – regardedas points lying in the (log(F1) , log(F2))-plane – in the Voronoi diagram of ita. Vow-els are confined into a suitable area which is a little larger than the the area spannedby human frequencies (see also the right side of Figure 3). Italian vowels are fullcircles, and the Voronoi region of x is labeled by xita. The resulting partition πeng,ita,namely (1, 1, 1, 1, 2, 2, 3), reveals that ita does not recognise eng (see the right sideof Figure 6). The recognition index from eng to ita is 8/33. Notice that this numberis greater than 2/11 = E(1,1,1,2,2,2,2), which refers to the recognition case.

The next result sheds light on a basic property of recognition indices, by pointingout the “most balanced” and the “most unbalanced” partitions.

proposition 3. If ω = (ω1, . . . , ωv) is a partition of m in at most v parts, with

ωi ≤ ωj if i < j, then r(v−r)m(v−1)

≤ Eω ≤ 1, where m = qv + r according to theEuclidean division. The lower and upper bounds are attained, respectively, only bythe partitions

σ = (q, q, . . . , q︸ ︷︷ ︸

v−r

, q + 1, q + 1, . . . , q + 1︸ ︷︷ ︸

r

) and σ′ = (0, 0, . . . , 0, m) .

Proof. We do not provide the easy calculation showing that σ and σ′ yield theclaimed values. In order to prove the lower bound claim let us consider a partitionω different from σ. The general idea is that starting from ω we can construct apartition carrying a smaller error. The existence of an entry ωh which differs fromthe corresponding σh actually entails the existence of two indices i, j such thatωi < σi and ωj > σj . This easily implies that i is smaller than j (the only case that

5As a matter of fact, phoneticians do not always agree on the number of vowels in a givenlanguage, as well as on the very definition of vowel (see e.g. [Abercrombie, 1967], p.79-80), onthe classification of diphthongs and other issues which provide many challenging open questionsin phonetics. For example, Wells’ vowel inventory omits the “schwa” /@/, whereas this vowel isincluded in other inventories.

Page 10: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 29

b

b

b

b

b

b

b

iita eita Eita

aita

Oitaoitauita

bcibcI

bcE

bcæ

bc3bc2

bcA

bc6

bc

O

bcUbcu

5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.86.4

6.6

6.8

7.0

7.2

7.4

7.6 iIEæA6OUu23

i

e

E

a

O

o

u

figure 6. English vowels as seen by Italian language

deserves some more attention is when σi > σj , but if this occurs then σi = q + 1and σj = q, which entails that ωi ≤ σj and eventually – as in the other cases – thati < j; hence, the present case cannot actually occur).

Let us first assume that σi = q and σj = q + 1. By increasing ωi by 1 anddecreasing ωj by 1, we get a new partition τ such that Eτ = Eω − 2 (the contributeof each change amounts indeed to 1).

Let us now assume that σi = σj = q. Again, i is smaller than j. By defining apartition τ as above, we have that Eτ = Eω −1+(|ωj −1−m/v|−|ωj −m/v|). Notethat in this case the contribute of the j-th term is not necessarily equal to 1. Usingthe triangle inequality we have |ωj − 1 − m/v| ≤ |ωj − m/v| + 1, which is actuallystrict because ωj > m/v. Therefore, also in this case we have that Eτ < Eω. Asimilar argument can be used to manage the case σi = σj = q + 1.

Now we prove the upper bound claim. Let us consider again the partition ω =(ω1, ω2, . . . , ωv) and modify it to ω′ = (0, ω2, ω3, . . . , ωv−1, ωv + ω1). Since ωv ≥m/v ≥ ω1, we have that Eω′ ≥ Eω, equality holding only if ω1 = 0. Indeed, thesummand |ω1 − m/v| = m/v − ω1 in Eω becomes |0 − m/v| = m/v in Eω′ , andlikewise |ωv − m/v| = ωv − m/v becomes |ωv + ω1 − m/v| = (ωv − m/v) + ω1. So,Eω′ = Eω + 2ω1.

Iterate this procedure to get a sequence of partitions, the i-th of which is obtainedfrom the (i − 1)-th by reducing to 0 the ith term ωi and increasing the vth termby ωi. If ω̃ and ω̃′ are consecutive partitions in this sequence, we have Eω̃′ =Eω̃ − |ωi − m/v| + m/v + ωi. If ωi ≥ m/v, this equals Eω̃ + 2m/v, else it equalsEω̃ + 2ωi. In both cases we have Eω̃′ ≥ Eω̃, with the equality holding only if ωi = 0.

By recursively setting entries to zero while increasing the rightmost entry, we endup with (0, 0, . . . , 0, m). The claim is thus proved by concatenating the equalities orstrict inequalities obtained from each step, noting that at least one strict inequalityholds if ω 6= σ′.

Page 11: A combinatorial approach to the phonetic similarity of

30 d. a. gewurz, a. vietri

We are now in a position to deduce the following result which – due to itsrelevance – we record as a theorem, although it is simply an instantiation ofProposition 3.

theorem 1. For any sets of vowels A, B, where |B| = q|A| + r according to the

Euclidean division, the inequalities r(|A|−r)|B|(|A|−1)

≤ IB,A ≤ 1 hold. The lower bound isattained if and only if A recognises B, whereas the upper bound is attained if andonly if ϕB,A(B) = {a} for some a ∈ A. In particular, IB,A = 0 if and only if A ⊢e B.

Proof. Because πB,A is a partition of |B| in at most |A| parts, the lower bound claimfollows by Proposition 3. The upper bound claim can be deduced in a similar way.The final claim follows even more easily.

3. RESULTS ON VOWEL SETS OF REAL LANGUAGES

In this final section we use the previously developed machinery to obtain practicalresults on similarities between vowel sets of languages actually spoken. First, wecompare some different languages. Second, we describe a way of modifying a givenset A, so as to increase the recognition index from another set to A. Third, weexhibit privileged sets of vowels which can be recognised, or comprehended, byseveral languages at the same time.

We start by associating a positive, real number to a given set B in generalposition with respect to another set A. This number plays a fundamental role inany context involving distance measurements for two given sets. In fact, it providesa rating how meaningfully the present combinatorial model suits the experimentaldata.

For any fixed a ∈ A, b ∈ B with b ∈ V(a), let Vb denote the Voronoi domain ofA, different from V(a), to which b is closest. Also, let pb denote the distance of bfrom the closest boundary. Finally, let ∆(b) stand for pb/ab. If the minimum, overall b ∈ B, of ∆(b) is smaller than 1, we denote that number by δ(B, A). Otherwise,we set δ(B, A) = 1.

�������PPPPPP���������XXXXXXXJ

JJ

JJJ

AA

HHHH

rb

ba

pb

V(a)

Vb

figure 7. ∆(b) = 12

definition 4. The set B is said to be in general position of degree δ(B, A) withrespect to A.

Page 12: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 31

A value of δ which approaches zero indicates that at least one vowel in the set Blies more or less at the same distance from two vowels of the listener’s inventory (theset A). In that case, although the general position property ensures a deterministiccorrespondence from vowels in B to vowels in A, yet the actual correspondence fromspeaker to listener may well not be deterministic. In fact, the listener might havedifficulties in spontaneously categorising the perceived vowels. On the other hand,the higher δ(B, A), the more reliable IB,A. In particular, δ is equal to 1 preciselywhen no b ∈ V(a) lies closer to Vb than to a.

The above considerations and, notably, the introduction of δ(_, _), are justifiedby the fact that empirically one never has pointlike data, but rather distributionsspread around an average value. So, it becomes of great importance to look at thebehaviour of points lying pretty close to a border.

Instead of looking for a degree δ(B, A) large enough, we might be contented if thepartition πB,A does not change should any troublesome vowel, or a number of them,cross the corresponding closest boundaries. In other words, we are now looking atsets of vowels which, irrespective of their “swinging” auditive interpretation, yieldthe same, global, partition in the listener’s mind.

definition 5. The loose degree of B with respect to A, in symbols λ(B, A), is thelargest number l in {∆(b):b ∈ B}∪{0} with the property that, given any b ∈ B suchthat ∆(b) ≤ l, the partition πB,A is not altered by moving b to Vb.

It is straightforward to observe that the loose degree attains the minimum value,namely 0, if and only if any element b such that ∆(b) = δ(B, A) affects πB,A whenmoving to Vb.

Notice that even if each single point of a subset {b1, b2, . . .} ⊆ B can cross thecorresponding closest boundary without affecting the partition, the same propertymay not be true for some, or all, points simultaneously crossing the boundaries (thisis easily seen, for example, when considering two points lying in a half plane, andthree other points lying in the complementary half plane). For this reason, it seemsworthwhile to improve and generalise the above definition, as follows.

definition 6. Let n be a positive integer. The n-loose (or simply “loose” ifn = 1) degree of B with respect to A, in symbols λn(B, A), is the largest number l in{∆(b):b ∈ B} ∪ {0} with the property that given any u ≤ n points b1, b2, . . . , bu ∈ Bsuch that ∆(bi) ≤ l for every i, the partition πB,A is not altered by simultaneouslymoving bi to Vbi

for every i.

We now have all the ingredients to compare different vowel sets and, in particular,to assess and quantify the degree of similarity in each case. Let us therefore consideragain Figure 6. The following result, partly established in Section 2, can now beclaimed in full.6

property 1. πeng,ita = (1, 1, 1, 1, 2, 2, 3). In particular, ita 6⊢ eng. Furthermore,Ieng,ita = 8

33= 0.24, while δ(eng, ita) = ∆(A) = 0.08. Finally, the corresponding

loose degree is zero (indeed, the vowel /A/ modifies the partition, once it crosses theclosest boundary).

6The numerical data from whence the numbers ∆(x) have been calculated are collected in theAppendix 2.

Page 13: A combinatorial approach to the phonetic similarity of

32 d. a. gewurz, a. vietri

As remarked earlier, recognition would hold only if the partition were (1, 1, 1,2, 2, 2, 2), the recognition index being 2/11 = 0.18. On the other hand, a quickglance to the Voronoi diagram of eng – which we are not providing – would showthat eng |= ita, the related partition being indeed (0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1). Alsonotice that, due to the trivial inequality:

λu(B, A) ≥ λu+1(B, A) for all u ≥ 1 ,

there is no hope of finding any positive n-loose degrees.A different situation occurs if we look at Figure 8. Here the German vowel

inventory, say ger, is taken from [Delattre, 1981].

b

b

b

b

b

b

b

iita eita Eita

aita

Oitaoitauita

bcibcI

bcE

bce

bca

bco

bcA

bcy

bcObcU

bcu

bcYbc

ø bcœ

5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.86.4

6.6

6.8

7.0

7.2

7.4

7.6

figure 8. German vowels in the Italian Voronoi diagram (Italian /O/ and German /O/virtually coincide)

property 2. πger,ita = (1, 1, 2, 2, 2, 2, 4). In particular, ita 6⊢ ger. Furthermore,Iger,ita = 1

6= 0.16, while δ(ger, ita) = ∆(U) = λ1(ger, ita) = 0.01. Finally,

λn(ger, ita) = 0 for any n > 1.

A better case concerns the Spanish inventory spa, depicted in Figure 9. Again,the plotted frequencies refer to [Delattre, 1981].

property 3. πspa,ita = (0, 0, 1, 1, 1, 1, 1). Therefore, ita |= spa or, equivalently,Ispa,ita = 1

3= 0.3 is the smallest possible. Furthermore, δ(spa, ita) = ∆(o) = 0.31,

and λ1(spa, ita) = λ2(spa, ita) = 0.65 = ∆(e). Finally, λn(spa, ita) = 0 for anyother n.

Although the positiveness of λ2(spa, ita) is not quite surprising – due to the littlesize of the inventory – still such a property also relies on the fairly balanced positionof the five vowels which, in general, may not be taken for granted.

Page 14: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 33

b

b

b

b

b

b

b

iita eita Eita

aita

Oitaoitauita

bci

bc

e

bca

bco

bcu

5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.86.4

6.6

6.8

7.0

7.2

7.4

7.6

figure 9. Spanish vowels in the Italian Voronoi diagram

We conclude by showing, in Figure 10, an instance of recognition which is not acomprehension. The inventory is now the French one [Delattre, 1981]. Notice thatthe vowel /o/ somehow weakens the nice recognition property. This example offersindeed a good opportunity to see δ(_, _) and λ(_, _) in action.

property 4. πfre,ita = (1, 1, 1, 1, 2, 2, 2). Therefore, ita ⊢ fre or, equivalently,Ifre,ita = 1

5= 0.2 is the smallest possible. However, δ(fre, ita) = ∆(o) = 0.27, with

λ1(fre, ita) = 0.

As anticipated, the middle of this section is devoted to an improvement procedurewhich, by slightly changing the position of some vowels of the listener’s inventory,increases the auditory capabilities with respect to a given foreign language. Leavingaside the purely combinatorial questions raised by this procedure,7 such a vowelmoving could be interpreted in several ways. For example, it gives an indicationabout the effort of a listener who wishes to adjust his own inventory, so as to makeit more similar to – and hence more capable of understanding – another inventory.

definition 7. A map f : A → A′ is a recognition improvement of type τ , withrespect to B, if, for each a ∈ A, one has

∣∣∣∣

∣∣ϕ−1

B,A(a)∣∣−

|B|

|A|

∣∣∣∣≥

∣∣∣∣

∣∣ϕ−1

B,A′(f(a))∣∣−

|B|

|A|

∣∣∣∣

where at least one inequality is strict and, for any a such that equality holds, V(a) =V(f(a)), with the exception of τ vowels.

7A study of these questions could not be contained in few lines. It deserves at least a separatesection, which would make the present paper too lengthy. Some related work, by the same authors,is in progress.

Page 15: A combinatorial approach to the phonetic similarity of

34 d. a. gewurz, a. vietri

b

b

b

b

b

b

b

iita eita Eita

aita

Oitaoitauita

bcibce

bcE

bca

bcy

bcø

bc

œ

bcO

bcobcu

5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.86.4

6.6

6.8

7.0

7.2

7.4

7.6

figure 10. French vowels in the Italian Voronoi diagram

Therefore, the larger τ , the more domains are modified which are not involvedin the actual recognition improvement. Clearly, if τ is equal to 0 we have done areally good job.

Let us then analyse the English and German examples above (Figures 6 and8, respectively), looking for some possible recognition improvements. In the firstexample, we could try to move the Italian vowel /a/, in such a way that either theEnglish /3/ or /æ/ finds relocated from V(a) to V(E). However, in the former case/3/ would eventually belong to V(O), whereas in the latter case it seems hard toplace /a/ without causing the English vowel /A/ to fall in V(a) instead of V(O) –using the data in the Appendix 2 the reader can verify the above. We encounterfewer difficulties when trying to move the (Italian) /E/. The required adjustmentcan be realised as shown in Figure 11. Unfortunately, the resulting recognitionimprovement is of type 2 (besides V(E) and V(a), also V(e) and V(O) get affected).8

Finally, we remark that the just described improvement produces a vowel inventorythat actually recognises eng. However, the type equal to 2 warns us there is a priceto pay.

Let us now examine the second example. Here the situation is more entangled,for we should as carefully as possible modify V(e) together with either V(o) orV(O) (note that, with our data, /U/ ∈ V(u)). The only practicable way seems thatof moving /o/ and/or /O/ (which is eclipsed by the homonymous German vowel)towards the centre of the diagram. By doing so, however, there is no hope of bringingτ down to 3, which is not exactly a great result. At any rate, this experiment helpsus to get a better knowledge of the German inventory, when compared to the Italianone.

8Although improvements of type 0 are rather uncommon to find – depending as they are frompeculiar geometric features of the involved domains – devising improvements of type 1 is expectedto be not so difficult in general.

Page 16: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 35

b

b

b

b

b

b

b

+

iita eita Eita

aita

Oitaoitauita

Eita

bcibc

I

bcE

bcæ

bc3bc2

bcA

bc6

bc

O

bcUbcu

5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.86.4

6.6

6.8

7.0

7.2

7.4

7.6

figure 11. Enhancing recognition capabilities

Recognition improvements of small type are certainly welcome although, asshown in the above examples, they seem to be seldom obtainable. If we refrainfrom requiring that certain Voronoi domains be not affected, another definition ofimprovement can be given which, remarkably, lends itself to a fruitful generalisationin the case of more than two languages involved. The great difference between thiscombinatorial approach and the previous one reflects in the even greater differencebetween the two phonetical interpretations. Namely, in the new setting we shall nolonger try to modify an existing inventory. Rather, we would like to work out aprivileged inventory, starting from the given inventories. The final outcome mightvery well be the inventory of a nonexistent language – and yet quite useful. We havethus come to the third and last part of this section.

We are going to point out a sufficient condition, on the Voronoi diagrams of somegiven vowel sets, for the existence of a vowel set of given size which is recognised, orcomprehended when possible, by all the initial sets. The already defined inventoriesita, eng and spa will be the main ingredients of the next example, showing how toobtain a suitable inventory of size 11 (as the reader will soon realise, the chosen sizeis not related to the size of eng).

Let us then focus on the next three diagrams. Numbers 1 to 11 have beenassigned to the domains of each Voronoi diagram, in such a way that the followingtwo properties hold:

1. For each integer, the three domains it is assigned to (in the three diagrams)have, all together, nonempty intersection.

2. For every diagram, the numbers are distributed as equably as possible. Equiva-lently, in every domain there are either n or n+1 numbers, where n = ⌊11/|A|⌋and A is the corresponding vowel inventory.

Page 17: A combinatorial approach to the phonetic similarity of

36 d. a. gewurz, a. vietri

Using the above numbering, it is now immediate to obtain a new inventory ofsize 11 which is recognised by both ita and spa and comprehended by eng. Indeed,for each i = 1, 2, . . . , 11 it suffices to pick any point in the intersection of the threecorresponding domains, which is non-empty by the first of the above properties (seeFigure 12, bottom right side).

1 2 3 4

5

6

78

910

11

5.2 5.6 6.0 6.4 6.86.4

6.8

7.2

7.6 1 2 3 4

5 11

6 710

8 9

5.2 5.6 6.0 6.4 6.86.4

6.8

7.2

7.6

1 2 3 4

5 6 11

7 108 9

5.2 5.6 6.0 6.4 6.86.4

6.8

7.2

7.6

1 2 3 4

5

6

78

910

11

5.2 5.6 6.0 6.4 6.86.4

6.8

7.2

7.6

figure 12. Defining an inventory recognised by eng, ita, and spa

Notice that the above example could be slightly modified by removing either thethree 5’s, or the 6’s, or the 11’s, thus ending up with a suitable inventory of size 10.One could further reduce the size (some possible choices for an inventory of size 9are those obtained by removing the 5’s and the 6’s simultaneously, or the 6’s andthe 8’s; a possible inventory of size, say, 5 is {1, 3, 5, 7, 8}, and so forth). And also,in the other direction, adding three 12’s to the domains labelled by 1 (or to thoselabelled by 2 or 10) would yield an inventory of size 12 which is recognised by thethree initial sets (clearly, in this case eng no longer comprehends the inventory).

The two above properties are instances of a far more general case that can beformalised as follows (the elementary proof is omitted).

proposition 4. Let {V11, V12, . . . , V1i1}, {V21, V22, . . . , V2i2}, . . . , {Vs1, Vs2, . . . , Vsis}be the Voronoi domains associated to some sets A1, . . . , As. Assume that, for somepositive integer t, there exist s maps {fj : {1, 2, . . . , t} → {1, 2, . . . , ij} : 1 ≤ j ≤ s}such that

⌊t

ij

≤ |f−1j (h)| ≤

⌈t

ij

for all admissible j, h, and with the further property that

Page 18: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 37

1≤j≤s

Vjfj(k) 6= ∅

for each k ∈ {1, 2, . . . , t}. Then there exists a set C, of size t, which is recognisedby every Aj (thus, in particular, Aj |= C whenever ij ≥ t).

In the special case where s = 2 and t = i1 ≤ i2, by assuming – clearly withno loss of generality – that f1 is the identity, the above stipulations amount tosaying that the family {Np} (1 ≤ p ≤ t) of subsets of {1, 2, . . . , i2}, defined throughq ∈ Np ⇔ V1p ∩ V2q 6= ∅, has a transversal or a system of distinct representatives(namely, a set of t distinct elements {ρ1, ρ2, . . . , ρt} such that ρi ∈ Ni for all i). It isthen possible to apply Hall’s Marriage Theorem (see e.g. [Bryant, 1993]) and obtainthe following.

proposition 5. Under the above assumptions, a set of size t comprehended by A1

and A2 exists if and only if, for every I ⊆ {1, 2, . . . , t},

∣∣∣∣∣

{

q :

(⋃

p∈I

V1p

)

∩ V2q 6= ∅

}∣∣∣∣∣≥ |I| .

Also in the present context, combinatorial questions could be addressed, such asthe classification of all possible transversals for small sets of points (with respectto the geometrical features of the Voronoi domains). Finally, some basic properties(e.g. the transversal hypothesis being strictly weaker than the recognition of A2 byA1) could be established.

4. CONCLUSION

The present work was a springboard for fruitful discussions with many researchersin the world and, in the authors’ minds, a motivation for further research both onthe purely mathematical side and on the applicative one. In this last regard, severalopen questions in Linguistics seem to be quite compatible with the recognition-comprehension approach and, even more simply, with the representation by meansof Voronoi domains.

Acknowledgments. The authors wish to thank Francesca Merola for her valuable comments,suggestions, and support provided along the way. They are also grateful to R.A. Bailey and JamesW.P. Hirschfeld for their kindly accepting to undergo some phonetical tests about English vowelpronunciation, as well as for their vivid encouragement, and to Paolo Milizia for reading, andconstructively commenting, a preliminary draft of the paper. They also thank the anonymousreferees for their precious and detailed comments. Finally, the authors found TIPA, the packagefor processing IPA symbols in LATEX by Rei Fukui, indispensable for the typesetting of this paper.

Page 19: A combinatorial approach to the phonetic similarity of

38 d. a. gewurz, a. vietri

REFERENCES

abercrombie d., Elements of general phonetics, Edinburgh, Edinburgh University Press,1967.

aurenhammer f., “Voronoi diagrams – a survey of a fundamental geometric data struc-ture”, ACM Comp. Surveys 23, 1991, p. 345-405.

badino l., barolo c., quazza s., Language independent phoneme mapping for foreignTTS, 5th ISCA Speech Synthesis Workshop, Pittsburgh, 2004.

batóg t., “A logical reconstruction of the concept of phoneme” (Polish), Studia Logica 11,1961, p. 139-183.

batóg t., The axiomatic method in phonology, New York, Humanities Press, 1968.

bryant v., Aspects of combinatorics: A wide-ranging introduction, Cambridge, CambridgeUniversity Press, 1993.

canepari l., A Handbook of phonetics, München, Lincom Europa, 2005.

cosi p., ferrero f., vagges k., “Rappresentazioni acustiche e uditive delle vocali ital-iane”, A. Cocchi, Atti del XXIII Convegno Nazionale AIA, Bologna, 1995, p. 151-156,http://www.csrf.pd.cnr.it/Papers/PieroCosi/cp-AIA95.pdf.

delattre p., Studies in comparative phonetics: English, German, Spanish and French,Heidelberg, J. Groos, 1981.

fairbanks g., grubb p., “A psychophysical investigation of vowel formants”, Journal ofSpeech and Hearing Research 4, 1961, p. 203-219.

hall d.c., “On the geometry of auditory space”, Toronto Working Papers in Linguistics17, 1999.

Handbook of the international phonetic association. A guide to the use of the internationalphonetic alphabet, Cambridge, Cambridge University Press, 1999.

harary f., “The number of functional digraphs”, Math. Ann. 138, 1959, p. 203-210.

jones d., English pronouncing dictionary, London, J.M. Dent & Sons Ltd, 1917.

jones d., An outline of english phonetics, [second edition], Leipzig, Teubner, 1922.

klein r., “Concrete and abstract Voronoi diagrams”, Lecture Notes in Comput. Sci. 400,Berlin, Springer, 1989.

kondrak g., “Identifying cognates by phonetic and semantic similarity”, Proceedings ofthe Second Meeting of the North American Chapter of the Association for ComputationalLinguistics (NAACL 2001), Pittsburgh, 2001, p. 103-110.

ladefoged p., A Course in phonetics, New York, Harcourt Brace Jovanovich, 1975.

liljencrants j., lindblom b., “Numerical simulation of vowel quality system: the roleof perceptual contrast”, Language 48, 1972, p. 839-862.

nerbonne j. et al., “Phonetic distance between Dutch Dialects”, G. Durieux, W. Daele-mans, and S. Gillis (eds.), CLIN VI: Proc. of the sixth CLIN meeting, Antwerp, 1996, p.185-202.

nerbonne j., heeringa w., “Measuring dialect distance phonetically”, J. Coleman (ed.),Workshop on Computational Phonology, Madrid, 1997, p. 11-18.

okabe a., boots b., sugihara k., nok chiu s., Spatial tessellations. Concepts andapplications of Voronoi diagrams,[second edition], New York, John Wiley & Sons, 2000.

petitot j., “Morphodynamics and the categorical perception of phonological units”, The-oretical Linguistics 15, 1989, p. 25-71.

Page 20: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 39

sorianello p., “Un tentativo di classificazione uditiva delle vocali senesi”, P. Cosi,E. Magno Caldognetto, and A. Zamboni (eds.), Studi in onore di Franco Ferrero, Padova,Unipress, 2003.

syrdal a.k., “Aspects of a model of the auditory representation of American English”,Speech Communication 4, 1985, p. 121-135.

traunmüller h., “Perceptual dimension of openness in vowels”, Journal of the AcousticalSociety of America 69, 1981, p. 1086-1100.

wells j.c., A study of the formants of the pure vowels of British English, MA thesis,University of London, 1962, [unpublished],http://www.phon.ucl.ac.uk/home/wells/formants/index.htm.

wise c.m., Applied phonetics, Englewood Cliffs (NJ), Prentice-Hall, 1957.

Page 21: A combinatorial approach to the phonetic similarity of

40 d. a. gewurz, a. vietri

APPENDIX 1: CLASSIFYING APPROXIMATIONS VIA GRAPHS

Let us consider two sets A,B of points in the Euclidean plane, in general position withrespect to one another (see beginning of Section 2.). By putting together ϕA,B and ϕB,A

so as to obtain a unique map µ : A ⊔ B → A ⊔ B, the involved points can be interpretedas vertices of a directed graph, GA,B , whose generic arc (u, v) indicates that µ(u) = v (seeleft side of Figure 13; empty circles correspond to elements, say, of A).

s

s

c

c

c

c

c

s

s

c

s

c

-

6? @@R6?

6?-HHHHj

�����@@I

=⇒ re r re re

r

r

r

r

re0

0 10

figure 13. The functional digraph from a map µ and the corresponding (0,1)-rootedforest

Directed graphs arising from mappings of a set into itself (a directed arc (a, b) meaningthat b is the image of a) are called functional digraphs [Harary, 1959]. They have beencharacterised by the following theorem. (A digraph or component is called weak if it isconnected as a non-directed graph; R−1(b) denotes the set of vertices from which thereexists a directed path to the vertex b; and a directed rooted tree is a directed tree in whichevery arc is directed towards the root.)

theorem 2. [Harary, 1959] A finite digraph is functional if and only if each of its weakcomponents contains exactly one directed cycle Z and, for each vertex b of Z, the subgraphR−1(b) is a directed rooted tree with root b.

In the present context, Lemma 1 and the above theorem immediately imply the fol-lowing.

proposition 6. The unique cycle of any weak component of a digraph GA,B is a cycle(v,w, v) of length 2.

By shrinking every such trivial cycle to a vertex, any digraph GA,B can be regarded asa disjoint union of trees, each of which has a root (that is, a distinguished vertex) corre-sponding to the cycle, and whose edges leaving the root have been labelled using numbersin {0, 1} (so as to recover the vertices adjacent to the removed cycle, by associating 0 to A,1 to B; see the right side of Figure 13). Let us term any such graph a (0, 1)-rooted forest. Ifwe define two (0,1)-rooted forests isomorphic when a vertex bijection exists between themwhich preserves roots, incidences, and labels, then nonisomorphic (0,1)-rooted forests turnout to be associated to “essentially distinct” pairs (A,B), according to the next terminology.

notation. We write (A,B) ∼ (A′, B′) if there exist two bijections ξ : A → A′, η : B → B′

such that ξ ◦ ϕB,A = ϕB′,A′ ◦ η and η ◦ ϕA,B = ϕA′,B′ ◦ ξ.

Page 22: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 41

A B

A′ B′

ϕA,B

ϕB,A

ϕA′,B′

ϕB′,A′

ξ η

In plain words, we require that the two pairs behave in the same way with respect to theapproximation functions ϕ_,_. It is not difficult to show that ∼ is an equivalence relationand that the (0,1)-rooted forests arising from two pairs (A,B), (A′, B′) are isomorphic ifand only if (A,B) ∼ (A′, B′). We leave to the reader the proof details of the above facts,which we summarise as follows.

proposition 7. Let FA,B be the (0,1)-rooted forest associated to (A,B) in the way de-scribed above. Then, the map

G :{ pairs (A,B) }

∼−→

{ (0,1)-rooted forests }

isomorphism

which sends [(A,B)]∼ to the isomorphism class of FA,B, is well-defined and injective.

For example, pairs of sets comprehending one another, and of fixed size s, make up asingle equivalence class. The (0,1)-rooted forest associated to any pair of this class is a setof s roots with no edges.

We can actually say something more about G.

theorem 3. The map G is surjective (thus, it is a bijection).

Proof. We have to show that, given any (0,1)-rooted forest F , there exists a pair (A,B)such that F = FA,B. Let us first examine the case of a connected forest, that is, of a tree.In keeping with Proposition 6 and with the remarks following it, we can replace F withthe corresponding cycle C = (v1, v2, v1) having directed trees T_1, T2 rooted at verticesv1, v2 respectively (trees may consist of a single vertex). For j = 1, 2 let us define T ′

j asthe digraph obtained by adding the arcs (v1, v2), (v2, v1) to Tj .

As a first step we exhibit two pairs of sets (Xj , Yj) (j = 1, 2) such that, for both j,FXj ,Yj

= T ′j . The construction is performed by induction on the depth, d, of the tree Tj,

as follows (the case d = 0 requires a little adjustment and is therefore postponed). If d = 1and there are m children, we consider a convex sector (see top of Figure 14) and placeone point in the middle of each of the arcs σ1, . . . , σm which equably partition the arc ofthe sector (in the illustration, m is equal to 1). We also place a point in the centre of thesector, and denote by r1 the sector radius.

The recursion step, with d ≥ 2, is managed as follows. We first remove the leavesat depth d and consider the sector and points arising, by induction, from the modifiedtree. Let rd−1, rd−2 denote the radii of the two sectors last constructed (set r0 = 0). Weextend the sector to a new sector having the same amplitude and whose radius, rd, satisfiesrd − rd−1 > rd−1 − rd−2. On the arc, A, of the new sector we define as many points as theremoved leaves, in the following way. Let v be the number of leaves adjacent to a givenvertex N of the smaller, “defoliated”, tree. Consider the projection – with respect to thesector centre – on A of the arc σN corresponding to N . Once partitioned such projectioninto v isometric arcs, we place one point in the middle of each arc.

Page 23: A combinatorial approach to the phonetic similarity of

42 d. a. gewurz, a. vietri

v3−j

vj

Tj

T ′j

d = 4b

hj

b

b b

b

bc

bc bc bc bc

v1 v2T1 T2

Σ1Σ2

b b

p q

figure 14. Construction of a suitable pair (A,B)

We complete the whole construction by placing one point hj at a distance from thecentre smaller than r1, and opposing the sector.

The reader can easily verify that the eventually obtained set of points, say Σj, yields therequired digraph T ′

j, provided its elements are partitioned in two classes Xj , Yj accordingto the parity of the depth at which they lie. Notice that an arbitrarily small amplitude ofthe sector can be chosen at the beginning. Indeed, what really counts is that the differencesof consecutive radii increase as new radii are introduced.

Let us finally put the two above pairs {(Xj , Yj)} together, as follows. We set twopoints, p, q, at a distance smaller than r1. Then for each j we remove hj and identify thetwo points, one from Σj and one from {p, q}, which refer to the same vertex in FXj ,Yj

andC. Subsequently, we define A as the set of points of Σ1 occupying an even level, togetherwith the points of Σ2 occupying an odd level. Clearly, the remaining points will form theset B.

In the special case where a tree Tj consists of a unique point (that is, the case d = 0), wedefine the corresponding set Σj as a unique point and proceed with the vertex identification,as above.

By possibly reducing the sectors amplitudes (to avoid “interferences” between the twosectors), while rotating sectors so as to let their axes coincide, with {p, q} contained inneither sector (see Figure 14), the claimed pair (A,B) is easily obtained.

In the general case of a forest with connected components K1, . . . ,Km, the aboveconstruction can be replicated, with the proviso that the corresponding pairs(A1, B1), . . . , (Am, Bm) be placed at a suitable distance apart (say, with the minimumdistance greater than the maximum diameter of the set of points corresponding to a com-ponent).

The above argument relies on the “plenty of room” available in the 2-dimensional space(even though the sectors can be arbitrarily narrow). In fact, the 1-dimensional analogue ofTheorem 3 does not hold in general. As an example, in the left side of Figure 15 we havedepicted an admissible functional digraph which cannot be obtained from a pair (A,B) ofsets of points entirely contained in a line. For let us suppose that any such configuration on

Page 24: A combinatorial approach to the phonetic similarity of

a combinatorial approach to the phonetic similarity of languages 43

a line exists. Then, without loss of generality we can assume that either a3 is freely placedbetween a1 and b1, or it lies outside the interval [a1, b1] and closer to b1, with a1b1 > b1a3.In the first case the only two possible arrangements – with a2 and a4 still to be placed –are those depicted in the right upper side of the same figure (b2 and b3 can be thereforeinterchanged). Such configurations prevent any placement of a2 or a4 respectively. Thediscussion in the second case is similar, and is left to the reader (see the right lower sideof the figure).

s

c

scs c s

a1

b1

a3b2a2 b3 a4

?

wo

- - � �

s s c c c

a1 a3 b1 b2 b3b3 b2

no

distance

constraint

s c s c c

a1 b1 a3 b2 b3

b3 b2

⇒ b1a3 < a3b2 .

⇒ b1a3 < a3b3 .

a1b1 > b1a3 ,

figure 15. Some difficulties in the 1-dimensional case

Page 25: A combinatorial approach to the phonetic similarity of

44 d. a. gewurz, a. vietri

APPENDIX 2

In the following tables we present the first and second formants (in Hertz) of the vowelphonemes for Italian [Cosi, Ferrero, Vagges, 1995], English [Wells, 1962], German, French,and Spanish (the last three from [Delattre, 1981]).

Italian (ita) English (eng) German (ger)phoneme F1 F2 phoneme F1 F2 phoneme F1 F2

i 291 2251 i 285 2373 i 300 2300e 394 2082 I 356 2098 I 350 2100E 513 1989 E 569 1965 e 400 2100a 742 1420 æ 748 1746 E 525 1850O 552 949 A 677 1083 a 725 1400o 447 856 6 599 891 A 750 1200u 325 789 O 449 737 O 550 950

U 376 950 o 425 850u 309 939 U 375 8752 722 1236 u 300 8253 581 1381 y 300 1750

Y 350 1600ø 400 1550œ 525 1475

French (fre) Spanish (spa)phoneme F1 F2 phoneme F1 F2

i 275 2400 i 300 2250e 400 2200 e 475 1950E 550 1900 a 750 1400a 750 1400 o 475 950O 575 1050 u 300 800o 400 800u 275 775y 275 1900ø 400 1600œ 600 1350