Articulatory Speech Synthesis from Vocal-Tract MRI Data

Rachel Alexander, Asterios Toutios, Shrikanth Narayanan
Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, USA

Results

The synthesizer generated vowel-consonant-vowel sound sequences, including voiced, voiceless, and nasal consonants, with reasonable intelligibility compared to the original recorded sound.

Figure: Synthesized speech signal for VCV sequence /asi/

Conclusion

Thus far, acoustics of vowel-consonant-vowel sequences have been synthesized on the basis of MRI data. Expansion to synthesis of short sentences is currently under investigation.

With further developments, this synthesizer could potentially produce fully comprehensive text-to-speech simulation, using available vocal-tract data to generate more realistic and extensive articulatory synthesis results.

References

[1] Maeda, S.: “A digital simulation method of the vocal-tract system”. Speech Commun., 1 (3–4) (1982), 199–229.
[2] Story, B.; Titze, I.; Hoffman, E.: “Vocal tract area functions from magnetic resonance imaging”. J. Acoust. Soc. Am., 100 (1) (1996), 537–554.
[3] Maeda, S.: “Phonemes as concatenable units: VCV synthesis using a vocal-tract synthesizer”. In: Sound Patterns of Connected Speech: Description, Models and Explanation, A. Simpson and M. Patzold (Eds.), 1996, pp. 145–164.
[4] Toutios, A.; Maeda, S.: “Articulatory VCV synthesis from EMA data”. Interspeech, Portland, Oregon, 2012.

Email: [email protected]

Methodology

The synthesizer creates a consistent glottal pulse comprised of slow- and fast-varying components, given data representing the amplitudes of each component, the frequency of glottal vibration, and a variable sampling frequency, to simulate the noise source emitted from the vocal cords, using empirical rules developed by Maeda [3].
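The glottal-area construction described above can be illustrated with a short sketch. The poster's synthesizer is written in MATLAB and follows Maeda's empirical rules [3], which are not reproduced here; the Python code below is only a minimal stand-in that assumes a raised-cosine pulse shape and a fixed open quotient. The function name `glottal_area`, the `open_quotient` parameter, and all numeric values are hypothetical.

```python
import math

def glottal_area(slow_amp, fast_amp, f0, fs, open_quotient=0.6):
    """Glottal area per sample (cm^2) as slow component + fast pulsating component.

    slow_amp, fast_amp, f0 are per-sample sequences; fs is the sampling frequency (Hz).
    The raised-cosine pulse and open_quotient are assumptions for illustration only.
    """
    if not (len(slow_amp) == len(fast_amp) == len(f0)):
        raise ValueError("slow_amp, fast_amp and f0 must have the same length")
    area = []
    phase = 0.0  # position within the current glottal cycle, in [0, 1)
    for slow, fast, freq in zip(slow_amp, fast_amp, f0):
        if phase < open_quotient:
            # Open phase: smooth pulse rising from 0 to 1 and back to 0.
            pulse = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase / open_quotient))
        else:
            # Closed phase: the fast component contributes nothing.
            pulse = 0.0
        area.append(slow + fast * pulse)
        phase = (phase + freq / fs) % 1.0
    return area

# Illustrative values only: 100 ms at 20 kHz with a 120 Hz vibration frequency.
fs = 20000.0
n = 2000
voiced = glottal_area([0.05] * n, [0.2] * n, [120.0] * n, fs)
voiceless = glottal_area([0.3] * n, [0.0] * n, [120.0] * n, fs)
```

In this sketch a voiceless segment is simply a wide, non-vibrating glottis (large slow component, zero fast component), while a voiced segment keeps the slow component small and lets the fast component pulse at the vibration frequency.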
Figure: Simulated glottal area of voiceless VCV sequence /asi/

The glottal signal is passed through a series of acoustic equations simulating a dynamically changing lumped electrical transmission-line network derived from the provided vocal-tract shape dynamics.

The network consists of three segments (the vocal tract prior to the nasal branching point, the vocal tract beyond the nasal branching point, and the nasal tract) and is solved at each point in time with a backward substitution and elimination procedure, which also accounts for earlier states of the network.

The output is the volume velocity at the exits of the lips and nostrils. The final speech signal is given by the time derivative of the sum of these velocities.

Figure: Electrical transmission-line network simulation of the vocal tract. Taken from [1]

Introduction

Using a model-based method of articulatory synthesis, outlines of the midsagittal vocal tract provide a substantial basis for simulating human speech production.

A speech synthesizer has been programmed in MATLAB, which, in the long term, will output sound signals corresponding to given inputs of dynamically changing vocal-tract shapes, obtained through real-time magnetic resonance imaging data from several speakers.

Background and Data

Articulatory synthesis: Synthesis of speech acoustics by simulation of the physics of the propagation of sound in the vocal tract and the dynamics of the vocal-tract shaping. The time-domain simulation was developed by Maeda [1].

Vocal-tract data: Vocal-tract area function dynamics for vowel-consonant-vowel sequences were generated by interpolating area functions for static vowels and consonants derived from MRI data by Story and Titze [2]. Inputs describing the area of the glottis were derived by rules described by Maeda [3] and Toutios and Maeda [4].

Figure: Vocal-tract area functions for /a/ and /s/ on the basis of data from [2]

The area functions were effectively modified to replicate a variety of consonants.
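The interpolation of static area functions into a vowel-consonant-vowel trajectory, as described under Background and Data, might look roughly like the following Python sketch. The poster does not state which interpolation scheme was used, so linear interpolation is an assumption here, as are the function names and the toy four-section area values (they are not taken from [2]).

```python
def interpolate_area_functions(start, end, n_frames):
    """Linearly interpolate between two area functions (lists of section areas, cm^2)."""
    if len(start) != len(end):
        raise ValueError("area functions must have the same number of sections")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(n_frames):
        w = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([(1.0 - w) * a + w * b for a, b in zip(start, end)])
    return frames

def vcv_trajectory(v1, c, v2, n_frames):
    """Concatenate V1->C and C->V2 transitions without repeating the consonant frame."""
    first = interpolate_area_functions(v1, c, n_frames)
    second = interpolate_area_functions(c, v2, n_frames)
    return first + second[1:]

# Toy area functions with four sections each (illustrative values only).
area_a = [0.5, 1.0, 4.0, 6.0]
area_s = [1.5, 2.0, 0.1, 1.0]
area_i = [3.0, 4.0, 0.5, 1.5]
trajectory = vcv_trajectory(area_a, area_s, area_i, n_frames=11)
```

Each frame of `trajectory` is one area function, so the list can be read as the time-varying vocal-tract shape that drives the acoustic simulation.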
Fricative consonants were created by modifying the area functions for plosives and automatically adding friction noise in the vocal tract when the correct aerodynamic conditions were met. Nasal consonants were synthesized using the area functions for plosives, by accounting for the shape of the nasal tract and the velopharyngeal port opening.

Figure panel: Glottal area vs time (x-axis: Time (ms); y-axis: Glottal area (cm²))
Figure panel: Speech signal for VCV /asi/ (x-axis: Time (ms))
Figure panel: Vocal-tract area for vowel /a/ (x-axis: Distance from glottis (cm); y-axis: Area of vocal-tract section (cm²))
Figure panel: Vocal-tract area for consonant /s/ (x-axis: Distance from glottis (cm); y-axis: Area of vocal-tract section (cm²))
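The rule of adding friction noise only when aerodynamic conditions are met can be sketched as below. The poster does not specify the conditions used, so this Python example assumes one common criterion from the articulatory-synthesis literature: noise is injected at a constriction when the Reynolds number exceeds a critical value, with amplitude growing with the squared excess. The function names, the critical value of 1800, the gain, and the air constants are illustrative assumptions, not values from the poster.

```python
import math
import random

RHO = 1.14e-3  # assumed air density, g/cm^3
MU = 1.86e-4   # assumed air viscosity, g/(cm*s)

def reynolds_number(volume_velocity, area):
    """Reynolds number at a section, treating it as a circular duct of the given area."""
    if area <= 0.0:
        return 0.0  # fully closed section: no flow, no turbulence
    diameter = math.sqrt(4.0 * area / math.pi)
    return RHO * abs(volume_velocity) * diameter / (MU * area)

def friction_noise_sample(volume_velocity, area, re_crit=1800.0, gain=1e-6, rng=random):
    """Return one noise sample, zero unless the flow is turbulent (Re > re_crit)."""
    re = reynolds_number(volume_velocity, area)
    excess = re * re - re_crit * re_crit
    if excess <= 0.0:
        return 0.0
    return gain * excess * rng.uniform(-1.0, 1.0)

# Same flow (300 cm^3/s) through an open vowel section and a narrow /s/-like constriction.
rng = random.Random(0)
open_noise = friction_noise_sample(300.0, 4.0, rng=rng)
constricted_noise = friction_noise_sample(300.0, 0.1, rng=rng)
```

With these assumed constants the open section stays below the critical Reynolds number and produces no noise, while the narrow constriction exceeds it and produces a nonzero random sample.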

Source: …sipi.usc.edu/~toutios/pdfs/alexander2016articulatory.pdf · 2016-09-16



