Dept. for Speech, Music and Hearing Quarterly Progress and Status Report A Fourier series description of the tongue profile Liljencrants, J. journal: STL-QPSR volume: 12 number: 4 year: 1971 pages: 009-018 http://www.speech.kth.se/qpsr



II. SPEECH PRODUCTION

A. FOURIER SERIES DESCRIPTION OF THE TONGUE PROFILE+

J. Liljencrants

Introduction

In the study of the movements of the articulatory organs the sagittal projection is often used. Such images can, for instance, be obtained by x-ray photography.

The purpose of this report is to show how data on the tongue profile obtained that way may be described and modelled in terms of a Fourier series. Capitalizing on the generally smooth shape of the tongue, the description is compact, and the accuracy can be selected with the number of coefficients included. The method can thus be conveniently used both to describe profiles of live speakers and to operate simplified artificial models.

Attention is given to some practical aspects like placement of the coordinate system, and interpretation of the Fourier coefficients.

Data collection

The material investigated is a set of x-ray photographs of two speakers, each uttering ten different sustained vowels. Some processing has also been made on material with the same persons singing, both subjects being educated singers. The material was courteously supplied by Sundberg and Lindblom and is the same as used by them for other investigations, see ref. (1).

The mid-sagittal contour of the tongue body was traced from the photographs and put on a coordinate system, as shown in Fig. II-A-1. The system has a polar part covering the oral cavity and a Cartesian part for the pharyngeal region. The contour was sampled at 10° intervals in the polar system and at 5 mm intervals in the Cartesian. The data were given to the computer as a matrix with a 30-point element for each vowel. In general the profile occupied samples number 2 to 28. The remaining samples number 0, 1, and 29 were filled in with values linearly interpolated between samples number 28 and 2.
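The filling of the unused samples can be sketched as follows. This is only an illustration under assumptions: the function name `pad_profile` and the example values are hypothetical, and the interpolation is taken to run from sample 28 around the end of the 30-point record to sample 2, so that the periodic continuation of the profile closes smoothly.

```python
def pad_profile(profile, first=2, last=28, n_total=30):
    """Fill the samples outside first..last by linear interpolation,
    going from sample `last` around the wrap to sample `first`.
    ASSUMPTION: the record is treated as one period of a periodic shape."""
    y = list(profile)
    # Number of steps from `last` to `first` of the next period
    # (28 -> 29 -> 0 -> 1 -> 2 gives 4 steps).
    span = (first + n_total) - last
    for step in range(1, span):
        idx = (last + step) % n_total
        t = step / span
        y[idx] = (1.0 - t) * y[last] + t * y[first]
    return y


# Made-up record: only samples 2 and 28 matter for the filling.
example = [0.0] * 30
example[2] = 1.0
example[28] = 3.0
filled = pad_profile(example)
```

Samples 29, 0 and 1 then lie on the straight line joining the last and the first measured values.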

All coordinate values in this study are in cm. The scale pertains to the x-ray photographs, which are larger than the subjects by a factor of 1.2.

+ Expanded version of paper DD16, presented at the 79th Meeting of the Acoustical Society of America, April 1970.

To arrive at a suitable placement of the coordinate system the different tracings for one subject were aligned with respect to the upper incisors and the frontal protrusion of the second vertebra. Then the origin was placed at a visually determined center of the oral part of the tongue profile and the vertical axis was made approximately parallel to the rear pharyngeal wall.

Fig. II-A-2 graphically shows the input data for one subject arranged in a linear manner. The horizontal coordinate is the sample number. The vertical coordinate is the radius in the polar system of the mouth, and the distance from the vertical axis in the pharyngeal system.

Different subjects and placements of the coordinate systems are indicated with speaker codes in the illustrations as explained in Table II-A-I. Also the vowels used are indicated with codes as given in Table II-A-II.

Philosophy

By inspection of Figs. II-A-1 and II-A-2 we can make some elementary observations. First we see that the mean value of the "excursion coordinate", plotted vertically, for the different profiles does not change significantly between vowels. This can be regarded as a consequence of the "conservation of mass" in the tongue body. Of course this statement has only a limited validity since the third dimension, perpendicular to the sagittal plane, has been ignored. Secondly, the strong coherence between successive samples is apparent, that is, the geometrical magnitude of the fine structure is much smaller than that of the overall gross shape variation. Many of the shapes have a strong resemblance to a sinusoid. These are the reasons for the proposition of this experiment, that the shape could efficiently be described in terms of a Fourier series. To test this a number of computations have been performed.

Basic processing

The profiles were analyzed into a number of Fourier coefficients. For convenience a representation in terms of sine and cosine terms was selected:

$$CC_r = \frac{2}{N}\sum_{n=0}^{N-1} Y_n \cos(2\pi r n / N)$$

$$CS_r = \frac{2}{N}\sum_{n=0}^{N-1} Y_n \sin(2\pi r n / N)$$

Fig. II-A-2. Tongue profiles on a Cartesian coordinate system. The data has been normalized using the distance along the contour as the abscissa. Dots indicate points in the original coordinate system. (Panel: Speaker 2, length normalized; horizontal axis: sample number, 0 to 28, from lips to larynx.)

Table II-A-I. Speaker codes in the illustrations

Speaker   Comments
0         Subject RL, fixed coordinate system
2         Subject JS, fixed coordinate system
5         Subject RL, moving coordinate system, following the mandible
6         Subject RL, fixed coordinate system, translated 0.5 cm upwards and 0.5 cm forwards as compared to speaker 0

Table II-A-II. Vowel codes in the illustrations

Vowel: 0, 1, 2, 3, 4, 5, 6, 7, 8
IPA: u i a 0

Table II-A-III.

Speakers   cos/sin CC
0, 2       0.794   0.895   0.957   0.365   0.637
0, 5       0.977   0.987   0.956   0.882   0.968
0, 6       0.961   0.948   0.994   0.650   0.970

$Y_n$ are the ordinate values of the N samples from the original profile. $CS_r$ and $CC_r$ are the sine and cosine Fourier coefficients in the single-sided (towards positive spatial frequencies) line spectrum of the periodic continuation of the shape.

Alternatively the Fourier coefficients may be represented with magnitude and phase as
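The expressions themselves are not legible at this point in the present copy. Under the usual convention, and consistent with the symbols $R_0$ and $\varphi_r$ that appear later in the text, they would plausibly read as below; the quadrant-aware arctangent, giving a phase between $-\pi$ and $\pi$, is an assumption suggested by the normalized phase measure used in the correlation study.

$$R_r = \sqrt{CC_r^2 + CS_r^2}, \qquad \varphi_r = \arctan\frac{CS_r}{CC_r}, \qquad -\pi < \varphi_r \le \pi$$

With this choice $CC_r = R_r\cos\varphi_r$ and $CS_r = R_r\sin\varphi_r$.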

Having computed the Fourier coefficients the shape can be reconstructed

using the inverse transform. Let us then include only the lower terms, up

to number U:
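The expression is missing at this point in the present copy. A plausible form, assuming the $2/N$ scaling of the analysis equations, is given below. The factor 1/2 on the DC term follows from that scaling and is an assumption, not something the source confirms; $Y'_n$ is a symbol introduced here for the re-synthesized ordinate.

$$Y'_n = \frac{CC_0}{2} + \sum_{r=1}^{U}\left[ CC_r \cos(2\pi r n / N) + CS_r \sin(2\pi r n / N) \right]$$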

The omission of the higher terms is equivalent to a spatial lowpass filtering, and only the gross features of the original shape will be preserved. Thus the computation will render an approximate model of the contour. The accuracy will depend on the number of coefficients included. For U = N/2 - 1 there will be no approximation; the re-synthesized contour will match the original exactly.

In Figs. II-A-4 and II-A-5 the input data are shown as small squares and the re-synthesized data, using U = 2, are drawn as solid lines. At the lower left corner of each plot the Fourier cosine and sine coefficients are given. Also, preceded by an E, is the RMS deviation between the data and the model. This error represents the total "power" of all the higher harmonics not used in the re-synthesis.
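The analysis, the truncated re-synthesis and the RMS deviation can be sketched together as follows. This is a minimal illustration, not the original program: the function names and the 30-point test profile are made up, and the DC term is assumed to enter the synthesis as CC_0 / 2 because of the 2/N scaling of the analysis equations.

```python
import math


def analyze(y, max_term):
    """Cosine and sine coefficients CC_r, CS_r for r = 0..max_term,
    using the 2/N scaling of the analysis equations."""
    n_samples = len(y)
    cc, cs = [], []
    for r in range(max_term + 1):
        w = 2.0 * math.pi * r / n_samples
        cc.append(2.0 / n_samples * sum(v * math.cos(w * n) for n, v in enumerate(y)))
        cs.append(2.0 / n_samples * sum(v * math.sin(w * n) for n, v in enumerate(y)))
    return cc, cs


def synthesize(cc, cs, n_samples):
    """Rebuild the contour from the supplied terms only (spatial lowpass).
    ASSUMPTION: with 2/N scaling the DC term enters as CC_0 / 2."""
    out = []
    for n in range(n_samples):
        value = cc[0] / 2.0
        for r in range(1, len(cc)):
            w = 2.0 * math.pi * r * n / n_samples
            value += cc[r] * math.cos(w) + cs[r] * math.sin(w)
        out.append(value)
    return out


def rms_error(y, model):
    """Root-mean-square deviation between data and model."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, model)) / len(y))


# Made-up profile: smooth fundamental plus a small third-harmonic ripple.
N = 30
profile = [3.0 + 1.2 * math.cos(2 * math.pi * n / N - 0.8)
           + 0.1 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
cc, cs = analyze(profile, max_term=2)
model = synthesize(cc, cs, N)
error = rms_error(profile, model)
```

In this example the only omitted component is the third-harmonic ripple, so the RMS error equals the RMS value of that ripple, in line with the statement that the error represents the power of the harmonics left out.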

As can be seen in the plots the error is in general quite small, even using the DC term and the fundamental only (Fig. II-A-3). This is a primary indicator that the Fourier expansion is an efficient descriptor of the profile shape.

It may be meaningful to measure the fidelity of the Fourier model to the input data in terms like harmonic distortion. If the fundamental is the highest harmonic used this distortion can be defined in a conventional manner:

using U = 2. If some harmonics higher than the fundamental are also incorporated in the model we can use U greater than 2.

Fig. II-A-3 - 7. Input data (dots) and Fourier model (solid) for the various speakers and a set of 10 vowels. (Cf. Tables II-A-I and II-A-II). Legend at lower left in each plot shows vowel number, RMS error, and cosine and sine coefficients.

Fig. II-A-3. Speaker 0, UNNORM, HI TERM 1.

Fig. II-A-4. Speaker 0, UNNORM, HI TERM 2.

Fig. II-A-6. Speaker 5, UNNORM, HI TERM 2.

Fig. II-A-7. Speaker 6, UNNORM, HI TERM 2.

Fig. II-A-8 - 11. Fourier coefficients for the profiles of Figs. II-A-4 - II-A-7. Digits denote vowel number and are placed according to the coefficients of the spatial fundamental. Vectors extending from the digits denote the coefficients of the second harmonic.

This expresses the ratio of the "unwanted signal" (which is equivalent to the RMS error) to the "wanted signal" (namely the data). The formula also includes the DC term, which is a deviation from usual practice in defining distortion on signals.

Visualization of analysis data

The representation of the tongue profile by means of a few Fourier coefficients may at first glance seem very far-fetched and difficult to interpret. We shall now illustrate that this need not be the case.

In Figs. II-A-8 and II-A-9 the sine and cosine coefficients of the fundamental are plotted on a Cartesian coordinate system. The values for each of the contours in Figs. II-A-4 and II-A-5 are indicated with numerals corresponding to the vowel number. Also, from each of these points there is a vector showing the coefficients for the spatial second harmonic. The representation of the coefficients in terms of magnitude and phase is aided in the coefficient plot by the circles. These indicate the magnitude coordinate of the spatial fundamental.

Figs. II-A-12 and II-A-13 show the profiles and coefficients for a simplified, artificial case. Item number 0 here has a DC coefficient only and corresponds to a case with "neutral" articulation. In items 1 to 5 a spatial fundamental having a constant magnitude is added, and its phase is systematically varied. In the corresponding plots of contours we see how the point of maximal excursion, or equivalently the point of narrowest constriction, moves from a "dental" position to a "pharyngeal" one. Thus the phase of the spatial fundamental is a measure of the place of articulation. This is also indicated in Fig. II-A-13 with markers around the circumference of the plot.
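The relation between phase and place of articulation can be illustrated with a small sketch. The contour form dc + R cos(2πn/N − φ), the function names and the numerical values are assumptions chosen for illustration; the actual values behind Fig. II-A-12 are not given in the text. Sample numbers are taken to run from the lips towards the larynx, as in Fig. II-A-2.

```python
import math


def artificial_contour(dc, magnitude, phase, n_samples=30):
    """DC term plus a spatial fundamental of given magnitude and phase.
    ASSUMPTION: phase convention CC = R cos(phase), CS = R sin(phase)."""
    return [dc + magnitude * math.cos(2.0 * math.pi * n / n_samples - phase)
            for n in range(n_samples)]


def peak_sample(contour):
    """Sample number of maximal excursion (narrowest constriction)."""
    return max(range(len(contour)), key=contour.__getitem__)


# Constant magnitude, phase stepped systematically (hypothetical values).
phases_deg = (60, 120, 180, 240, 300)
peaks = [peak_sample(artificial_contour(3.0, 1.0, math.radians(d)))
         for d in phases_deg]
```

As the phase increases, the sample of maximal excursion moves monotonically along the contour, which is the sense in which the phase measures the place of articulation.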

In the simple model with DC term and fundamental, and having a fixed coordinate value for the contour of palate, velum, and rear pharyngeal wall, it is also easy to find the locus of complete lingual closure in the coefficient plot. It is the circle where the magnitude of the fundamental equals the difference between the palatal-pharyngeal coordinate value and the DC coefficient.
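This closure condition can be written out as a short worked relation. The symbols $A_0$ (the DC level of the contour, which would be $CC_0/2$ under a $2/N$ scaling of the coefficients) and $Y_w$ (the fixed coordinate value of the palate, velum and pharyngeal wall) are introduced here for illustration only.

$$Y_n = A_0 + R_1 \cos(2\pi n/N - \varphi_1), \qquad \max_n Y_n = A_0 + R_1$$

$$\text{closure:}\quad A_0 + R_1 = Y_w \;\Longrightarrow\; R_1 = Y_w - A_0$$

Since the condition does not involve $\varphi_1$, the locus is a circle of radius $Y_w - A_0$ around the origin of the coefficient plot.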

When also a second harmonic is introduced, as in items 6 to 9 in Fig. II-A-12, we see that the increased spatial resolution may be used in modeling for instance the tongue tip. It is then however more difficult to find a locus of complete closure, but it can still be done in an elementary way by vector addition in the coefficient plot, keeping in mind to scale the phase of the higher harmonic vectors properly.

Fig. II-A-12. Plots of various artificial contours, using the two lowest spatial harmonics, to illustrate the function of the model. Cf. Fig. II-A-13.

Fig. II-A-13. Coefficients for the artificial profiles of Fig. II-A-12.

The function of the Fourier model of the tongue may also be further illustrated with a mechanical device as outlined in Fig. II-A-14. A circular disk has a central pivot. When the disk is tilted the height of its circumference will be approximately a sinusoidal function of the angle around the disk. The vertical movements of equidistant points on the disk are transferred with thin wires to pointers arranged in a coordinate system. The set of pointers will indicate the model contour. The cosine and sine coefficients of the spatial fundamental are given by the back-forth and the sideways tilt of the disk respectively. The DC term may be controlled by elevating the pivot of the disk.

Positioning of the coordinate system

At the outset of this experiment it was thought natural to use a coordinate system having the mandible as a reference, since a large part of the lingual muscles are joined with the mandible. Some data for this placement of the coordinate system are shown in Figs. II-A-6 and II-A-10.

A comparison with the data of Figs. II-A-4 and II-A-8, which pertain to the same subject, shows that the deviation of the model from the ideal is not improved; the mean error is essentially the same. So the accuracy aspect does not seem to be an argument for a coordinate system following the mandible. One could however still hope that the additional information necessary to specify the mandible position would give a dividend in the form of a smaller variation in the coefficients describing the tongue shape. Especially this ought to be the case with the spatial DC term if the bulk of the tongue were to follow the jaw. A comparison of the data shows that the standard deviation of the DC term is indeed smaller in the mandibular coordinate system, but not very much. In the fixed coordinate system we have $\sigma_{R_0} = 0.219$ and in the moving $\sigma_{R_0} = 0.172$.

The conclusion will be that for the purpose of describing the profile shape there is no gain in taking the mandibular position into account, because the increased number of parameters will not give a better accuracy or an easier interpretation. Also, with a moving tongue coordinate system it will be more cumbersome to determine the distance between the tongue profile and the stationary structures of the speech organs.

A different problem is to determine how critical the positioning of the coordinate system is in relation to the stationary structure. Figs. II-A-7 and II-A-11 show data on speaker 6. These pertain to the same tracings as for speaker 0 but the coordinate system has been translated 0.5 cm forwards and 0.5 cm upwards. Inspecting the polar coefficient plots in Figs. II-A-8 and II-A-11 shows that the distribution of the coefficients is very closely similar. We see that the set of data points has been translated and somewhat rotated, but the interrelations are essentially unchanged.

Fig. II-A-14. Mechanical device to illustrate the Fourier model. Sine and cosine coefficients of the spatial fundamental control the tilt of a circular disc with a central pivot. Vertical movements of its periphery are transferred with thin wires to the coordinate system at top. Wires are kept taut by small weights.

Normalizing

When the tongue contour comes close to the origin of the polar system, the sampling density, defined as samples per unit of contour length, becomes rather high in this portion. To compensate for that over-representation of a small part of the contour a length normalization was performed in part of the computations. To do this an auxiliary table was computed which contained the distances between the successive samples. This table was then converted by summation into another table with cumulative distance along the contour. Finally, normalized input data was derived by interpolation from the original input in 30 points equidistant along the contour. Some data normalized in this way are shown in Fig. II-A-2, where the original data points are marked with dots.
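The three steps (distance table, cumulative table, interpolation at equidistant points) can be sketched as below. This is an illustration under assumptions: the contour is taken as a list of (x, y) points in cm, the function name `resample_by_arc_length` is hypothetical, and straight-line segments between samples stand in for whatever distance measure was used in the original mixed polar/Cartesian system.

```python
import math


def resample_by_arc_length(points, n_out=30):
    """Return n_out points equidistant along the polyline through `points`."""
    # Auxiliary table: distances between successive samples.
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    # Second table: cumulative distance along the contour.
    cumulative = [0.0]
    for d in steps:
        cumulative.append(cumulative[-1] + d)
    total = cumulative[-1]

    out = []
    seg = 0
    for k in range(n_out):
        target = total * k / (n_out - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(steps) - 1 and cumulative[seg + 1] < target:
            seg += 1
        # Fractional position within the segment (guard zero-length steps).
        t = (target - cumulative[seg]) / steps[seg] if steps[seg] > 0 else 0.0
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# Made-up contour with unevenly spaced samples along a straight line.
even = resample_by_arc_length([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)], n_out=5)
```

The densely sampled stretch near the start of the toy contour no longer receives more points than the sparsely sampled stretch after resampling.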

When the errors are compared in the two cases, with and without normalizing with respect to the contour length, we find that the difference in general is so small that it is of no consequence. An important proviso is however that the coordinate system has been put reasonably central with respect to the population of contours so that these will not come too close to the origin of the polar system.

As a consequence the extra labour of normalizing the data was abandoned for a larger part of the investigation.

Correlation analysis of the Fourier coefficients

For a further illustration a number of correlation coefficients have been computed. In Table II-A-III, different speakers are pairwise compared. For each of the Fourier coefficients the correlation coefficient has been computed, using the sine/cosine coefficients. For the Fourier coefficient X with ordinal number r, speakers k and l, and using index n for the N vowels, the correlation coefficient was computed as

with the covariance

variance

and mean

$$m_{r,k} = \frac{1}{N}\sum_{n} X_{r,k,n}$$
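The expressions for the correlation coefficient, covariance and variance are not legible in the present copy. The standard definitions, which match the wording of the text, would read as follows; the symbols $\rho$, $c$ and $\sigma^2$ are assumptions, and so is the $1/N$ normalization (which in any case cancels in the correlation coefficient).

$$\rho_{r,k,l} = \frac{c_{r,k,l}}{\sqrt{\sigma^2_{r,k}\,\sigma^2_{r,l}}}$$

$$c_{r,k,l} = \frac{1}{N}\sum_{n}\left(X_{r,k,n} - m_{r,k}\right)\left(X_{r,l,n} - m_{r,l}\right), \qquad \sigma^2_{r,k} = \frac{1}{N}\sum_{n}\left(X_{r,k,n} - m_{r,k}\right)^2$$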

A high correlation for different speakers using the sine/cosine coefficients will indicate that their corresponding coefficient plots (as in Figs. II-A-8 to II-A-11) are similar. The differences eliminated by the correlation analysis are those of a parallel translation and of magnitude scaling. A shortcoming is however that differences caused by rotation of the plots are not compensated for. Rotational differences do exist as can be seen from the plots, and if these had been compensated for the correlations of Table II-A-III would probably have been still higher.
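The insensitivity to translation and scaling, and the sensitivity to rotation, can be checked with a small sketch. It assumes the ordinary (Pearson) correlation coefficient, and all coefficient values are made up for illustration; none are taken from the tables.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / math.sqrt(var_a * var_b)


# Hypothetical cosine and sine coefficients of one harmonic, one per vowel.
cos_k = [0.9, -0.4, 0.1, 1.3, -0.8]
sin_k = [0.2, 0.7, -0.5, 0.1, 0.6]

# Second "speaker": same pattern, translated and scaled in the plot.
cos_shifted = [0.5 + 2.0 * c for c in cos_k]
r_shifted = correlation(cos_k, cos_shifted)

# Third "speaker": same pattern, rotated by 30 degrees in the plot.
theta = math.radians(30.0)
cos_rotated = [c * math.cos(theta) - s * math.sin(theta)
               for c, s in zip(cos_k, sin_k)]
r_rotated = correlation(cos_k, cos_rotated)
```

The translated and scaled copy correlates perfectly with the original, whereas the rotated copy does not, even though its pattern of points is identical in shape.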

The purpose of these computations is to show that:

a. comparing different subjects (speakers 0 and 2), the DC term and the spatial fundamental are strongly correlated. This implies that the gross shapes are similar for the two subjects uttering different vowels. The second harmonic is not as strongly correlated, which should be due to a more subject-dependent fine structure.

b. comparing different coordinate systems, cranium based and jaw based (speakers 0 and 5), all coefficients are strongly correlated. From this it is concluded that it is irrelevant which coordinate system is selected, so far as the relative distribution of the coefficients for the vowels is concerned. In other words, the points in Figs. II-A-8 and II-A-10 are arranged in essentially the same pattern.

c. considering a displacement of a cranium based coordinate system (speakers 0 and 6) the same conclusions as in b. seem to hold.

These results are not very revolutionary in themselves, but they confirm the usefulness of the Fourier coefficients as descriptors.

Another correlation study is shown in Table II-A-IV. For each speaker, the correlation between the various Fourier coefficients has been computed. The same formulae as above were used, but using r as subscript for the speaker and k and l as subscripts for two Fourier coefficients. Furthermore these computations were made on both the sine/cosine and the magnitude/phase representations. Then a normalized phase measure was used, varying from 0 to 1:

$$f_r = (\varphi_r + \pi)/2\pi$$

In the table, for simplicity, the nonexistent DC sine and phase components are represented by zeros.

Table II-A-IV.

Speaker 2   -0.580   0.000   -0.7h0   0.000
Speaker 5   -0.518   0.000   -0.809   0.000
Speaker 6   -0.445   0.000   -0.824   0.000

Speaker 2   -0.153   0.000    0.709   0.000

The more interesting part of the table is the one with the correlations between the magnitude/phase coefficients, because those of the spatial fundamental can be interpreted as measures of effort and place of articulation. Examination of the table may lead to the following conclusions:

a. In all cases there is a strong correlation between the DC term and the phase of the spatial fundamental. This indicates that these two terms are not orthogonal when regarded as articulatory parameters.

b. The fact that this holds also for speaker 5 (jaw based coordinate system) tends to counterindicate the usefulness of the mandibular position as an articulatory parameter.

Conclusions

The mid-sagittal profile of the tongue can be efficiently described in terms of a set of Fourier coefficients. For modelling purposes it should be sufficient with only two numbers to describe the profile: the magnitude and the phase of the spatial fundamental. Possibly the additional inclusion of the DC term may be useful, but this is found to be correlated to the fundamental phase. For increased accuracy in the modelling of live subjects an arbitrary number of higher terms may be included, without changes in the values of the basic, low spatial frequency harmonics. There seem to be no tangible gains in using the mandibular position as a reference in the description. On the contrary, a moving coordinate system for the tongue profile will make estimations of vocal tract diameter and area function a good deal more complicated. It is thus recommended that the mandibular position is not used as a primary articulatory parameter, but rather as a secondary parameter, dependent on the others.

Reference

(1) Lindblom, B. and Sundberg, J.: "Acoustical consequences of lip, tongue, jaw and larynx movement", J. Acoust. Soc. Am. 50, No. 2 (1971), pp. 1166-1179.