
Post on 11-Jan-2016


S Legrand

Snack for Ruby

Talk Objectives

Tour of API
Learn the walk and talk
Have fun

Snack

The Snack library is a tool to aid in learning about sound, voice, and ASR, and is hopefully a fun way to experiment
Snack is a Tcl-based API
Snack has been adapted to Python and included in the standard Python distribution

Snack

Snack is Swedish for “talk” or “chat”
Kåre Sjölander is the principal investigator for Tcl-based Snack
Tcl Snack is available at http://www.speech.kth.se/snack/

Snack for Ruby

rbSnack is a Ruby wrapper around Tcl Snack
rbSnack has additional Ruby-based utilities
rbSnack has HTML-based help (rdoc + rbTeX)
rbSnack can be found at http://rbsnack.sourceforge.net/

Snack Toolkit Includes

Recording, playback
Waveform display
Spectrogram: Fourier, LPC
Formant analysis
Power analysis
Filters

(will demo)

The Speech Signal

Continuous speech is discretely sampled
The signal consists of rapidly changing data points
The display of the sampled signal is called the waveform
Snack can display the waveform in real time

Analysis uses frames

The signal is broken into frames
Frames may overlap
Characteristics of the signal are analyzed using Fourier and LPC analysis on a per-frame basis
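
The framing step can be sketched in plain Ruby. This is only an illustration and does not use the rbSnack API; the method name `frames` and the parameters `frame_size` and `hop` are assumptions made for this example. A `hop` smaller than `frame_size` gives overlapping frames.

```ruby
# Split a sampled signal into fixed-size frames.
# frame_size: samples per frame; hop: samples between frame starts.
# Trailing samples that do not fill a whole frame are dropped.
def frames(signal, frame_size, hop)
  return [] if signal.size < frame_size

  (0..signal.size - frame_size).step(hop).map do |start|
    signal[start, frame_size]
  end
end

signal = (0...10).to_a              # stand-in for sampled audio
overlapping = frames(signal, 4, 2)  # each frame shares 2 samples with the next
overlapping.each { |frame| p frame }
```

Each frame would then be passed on to the per-frame Fourier or LPC analysis.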

Going in Circles

Complex numbers are just a funny way of multiplying: add the angles.

Euler's formula: e^(iθ) = cos θ + i sin θ
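
A quick check of the "add angles" idea, using Ruby's built-in `Complex` class. The two angles are arbitrary values chosen for this sketch.

```ruby
# Two points on the unit circle, given by length and angle (radians).
a = Complex.polar(1.0, Math::PI / 6)   # 30 degrees
b = Complex.polar(1.0, Math::PI / 3)   # 60 degrees

# Multiplying them adds the angles: the product sits at about 90 degrees.
product = a * b
puts product.arg * 180 / Math::PI

# Euler's formula: the point at angle theta is cos(theta) + i*sin(theta).
theta = Math::PI / 6
euler = Complex(Math.cos(theta), Math.sin(theta))
puts (a - euler).abs   # very close to zero
```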

Fourier Analysis

The Fourier matrix is a unitary matrix
Multiplication by the Fourier matrix returns the frequency components of the signal, called the Fourier coefficients
The inverse is easy to compute: it is called the inverse Fourier transform

The Fourier Matrix Looks Like

Spinning disks

Multiplication by signal produces Fourier coefficients (frequency components)
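
A minimal sketch of the matrix view in plain Ruby (not rbSnack code). It assumes the common convention where entry (j, k) is e^(-2πi·jk/N) / sqrt(N); with that scaling the matrix is unitary, so its inverse is its conjugate transpose. The helper names are made up for this example, and a real implementation would use an FFT instead of a full matrix product.

```ruby
# N x N Fourier matrix: each row is a "spinning disk" turning at its own rate.
def fourier_matrix(n)
  Array.new(n) do |j|
    Array.new(n) do |k|
      Complex.polar(1.0, -2 * Math::PI * j * k / n) / Math.sqrt(n)
    end
  end
end

# Plain matrix-times-vector product.
def multiply(matrix, vector)
  matrix.map { |row| row.zip(vector).sum { |m, x| m * x } }
end

# For a unitary matrix the inverse is the conjugate transpose.
def conjugate_transpose(matrix)
  matrix.transpose.map { |row| row.map(&:conj) }
end

n = 8
signal = Array.new(n) { |t| Math.cos(2 * Math::PI * t / n) }  # one cycle per frame
f = fourier_matrix(n)

coefficients = multiply(f, signal)
magnitudes = coefficients.map(&:abs)   # energy shows up in bins 1 and n - 1
recovered = multiply(conjugate_transpose(f), coefficients).map(&:real)
```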

Examining Fourier components

A spectrogram gives a picture of the Fourier components (coefficients) as they evolve over time
Snack can display it in real time
Looks like an X-ray
Bands of high activity correspond to formants

Linear Filters

Useful for understanding the nature of speech signals
Generators: generate square waves, sine waves, sawtooth waves, etc.
Composers: compose several filters
FIR: finite impulse response
IIR: infinite impulse response

FIR Filter

Determined completely by its response to a unit impulse
Response is finite in duration

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n)

(We will demo FIR using rbSnack)
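
The FIR equation can also be tried directly in plain Ruby. This sketch is not the rbSnack demo; the method name `fir` and the coefficient values are assumptions for illustration, and samples before t = 0 are treated as zero.

```ruby
# y(t) = b0 x(t) + b1 x(t-1) + ... + bn x(t-n)
def fir(b, x)
  x.each_index.map do |t|
    b.each_with_index.sum do |coeff, k|
      t - k >= 0 ? coeff * x[t - k] : 0.0
    end
  end
end

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.5, 0.3, 0.2]

# Feeding in a unit impulse reads the coefficients back out,
# after which the response is zero: finite in duration.
response = fir(b, impulse)
p response
```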

IIR Filter

Also called a recursive filter
Response is infinite in duration

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n) + a1 y(t-1) + a2 y(t-2) + … + an y(t-n)

(We will demo IIR using rbSnack)
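
The recursive form can be sketched the same way in plain Ruby. Again this is not the rbSnack demo; the method name `iir` and the coefficients are assumptions for illustration. Here `a[0]` holds a1, the weight on y(t-1), and values before t = 0 are treated as zero.

```ruby
# y(t) = b0 x(t) + ... + bn x(t-n) + a1 y(t-1) + ... + an y(t-n)
def iir(b, a, x)
  y = []
  x.each_index do |t|
    feedforward = b.each_with_index.sum do |coeff, k|
      t - k >= 0 ? coeff * x[t - k] : 0.0
    end
    # Feedback uses outputs that were already computed.
    feedback = a.each_with_index.sum do |coeff, k|
      t - k - 1 >= 0 ? coeff * y[t - k - 1] : 0.0
    end
    y << feedforward + feedback
  end
  y
end

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# y(t) = x(t) + 0.5 y(t-1): the impulse response halves at every step
# and never reaches exactly zero.
response = iir([1.0], [0.5], impulse)
p response
```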

Linear Predictive Analysis

Analogous to Fourier analysis
Assumption: for each frame, the signal is predicted by

y(t) = a1 y(t-1) + a2 y(t-2) + … + ap y(t-p)

The LPC coefficients are the best least-squares approximation
Can also be used to predict formants
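
One common way to find such least-squares coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method Snack or rbSnack uses, so the sketch below is only an illustration of the idea in plain Ruby; the method name `lpc` and the test signal are assumptions. It expects a frame that is not all zeros.

```ruby
# Estimate a1..ap in y(t) = a1 y(t-1) + ... + ap y(t-p) for one frame.
def lpc(frame, order)
  # Autocorrelation of the frame at lags 0..order.
  r = (0..order).map do |lag|
    (lag...frame.size).sum { |t| frame[t] * frame[t - lag] }
  end

  a = Array.new(order + 1, 0.0)   # a[k] is the weight on y(t-k); a[0] is unused
  error = r[0]                    # prediction error energy so far

  (1..order).each do |i|
    reflection = (r[i] - (1...i).sum { |k| a[k] * r[i - k] }) / error
    previous = a.dup
    a[i] = reflection
    (1...i).each { |k| a[k] = previous[k] - reflection * previous[i - k] }
    error *= 1 - reflection**2
  end

  a[1..order]
end

# Test signal built from a known predictor: y(t) = 1.3 y(t-1) - 0.4 y(t-2).
signal = [1.0, 1.3]
198.times { signal << 1.3 * signal[-1] - 0.4 * signal[-2] }

coefficients = lpc(signal, 2)
p coefficients   # close to 1.3 and -0.4
```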

What is Sound? What is Speech?

Sound is the signal created by longitudinal waves in some medium like air
Sound waves are continuous
Can be decomposed into a linear combination of sine waves
Speech is a special noise made by humans

It’s Just Tubing…

The simplest model of speech is to consider the lungs and trachea as one long tube
Resonance frequencies are called formants

[Figure labels: F1, F2]
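
To get a feel for the numbers, here is a small sketch of the textbook idealization: a uniform tube closed at one end and open at the other resonates at odd multiples of a quarter wavelength, F_n = (2n - 1) · c / (4 · L). The slides do not state this formula, and the tube length and speed of sound below are illustrative assumptions, not measurements.

```ruby
# Resonances of an idealized uniform tube, closed at one end, open at the other.
# length_m: tube length in metres; speed_of_sound in metres per second.
def tube_resonances(length_m, count, speed_of_sound = 343.0)
  (1..count).map { |n| (2 * n - 1) * speed_of_sound / (4.0 * length_m) }
end

# Assumed tube length of 17 cm; prints the first three resonances in Hz.
resonances = tube_resonances(0.17, 3)
resonances.each_with_index { |f, i| puts "F#{i + 1}: #{f.round} Hz" }
```

In this idealized model the second resonance is three times the first; real vowels differ because the tube shape changes.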

Some Speech Recognition Features

Formants
Pitch
Voiced/Unvoiced
Nasality
Frication
Energy

Our current work only uses Formants and Energy
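
Energy is the simpler of the two features to compute. A minimal sketch, assuming energy means the sum of squared samples in each non-overlapping frame (a common definition; the slides do not spell out the one rbSnack uses):

```ruby
# Short-time energy: sum of squared samples per frame.
def frame_energies(signal, frame_size)
  signal.each_slice(frame_size).map do |frame|
    frame.sum { |sample| sample * sample }
  end
end

# Silence followed by a louder and then a quieter stretch.
signal = [0.0, 0.0, 1.0, -1.0, 0.5, 0.5]
energies = frame_energies(signal, 2)
p energies
```

Frames with energy near zero are candidates for silence, which is one simple cue for where a word starts and ends.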

Basic Utterances

A basic unit of speech is called a phone
Vowels are utterances with constant formants
A diphthong is the transition from one vowel to another
Vowels and diphthongs are essentially characterized by the first and second formants

Other Phones: The Consonants

Plosives: closure in oral cavity /p/
Nasal: closure of nasal cavity /m/
Fricative: turbulent airstream noise /s/
Retroflex liquid: vowel-like, tongue high and curled back /r/
Lateral liquid: vowel-like, tongue central, side air stream /l/
Glide: vowel-like /y/

Some Problems with Speech Signals

Segmentation: when does a word begin and end? (Noise?)
Wetware: the speaker's internal configuration, plus lip smacks, breathing, etc.

SegmentationWorkshop demos one approach.

Code Books

A code book consists of code words
The idea is to search through the code book to find the code word that best matches the feature sequence
RbSnack uses the code book approach in word recognition
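
A toy version of the search, in plain Ruby rather than rbSnack code. The code book below is hypothetical: each code word is a single (F1, F2) pair with rough illustrative values in Hz, and the best match is taken to be the smallest Euclidean distance. A real recognizer would compare whole feature sequences.

```ruby
# Hypothetical code book: vowel label => rough (F1, F2) in Hz.
CODEBOOK = {
  "i" => [270.0, 2290.0],
  "a" => [730.0, 1090.0],
  "u" => [300.0, 870.0]
}.freeze

# Euclidean distance between two feature vectors.
def distance(u, v)
  Math.sqrt(u.zip(v).sum { |p, q| (p - q)**2 })
end

# Search the whole code book and return the label of the closest code word.
def best_match(codebook, features)
  codebook.min_by { |_label, code_word| distance(code_word, features) }.first
end

puts best_match(CODEBOOK, [290.0, 2200.0])
puts best_match(CODEBOOK, [700.0, 1100.0])
```

The search always returns some code word, even for input that matches nothing well, which is one reason the approach is prone to errors.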

Code Book Approach

++ Easy to implement

+ Good for isolated words

+- Works best on small vocabularies

-- Is insensitive to context, prone to errors

Code Book Approach

WhichWay is a simple demo of this approach

More Problems with Speech Signals

Accent: Southern vs. New England vs. California Valley vs. other
Variation in rate of speech makes it hard to compare words

Dynamic Time Warping

A pattern comparison technique
A way of stretching or compressing one sequence to match another
Evaluated using dynamic programming

Dynamic Programming

Form a grid, with start at lower left and end at upper right
Label each node with the difference (error) between pattern 1 at time i and pattern 2 at time j
Find the minimal distance from start to end using dynamic programming

Dynamic Programming

[Figure: a possible path through the grid]

Basic Assumption:

If best path P(S,E) passes through node N, then P(S,E) is the concatenation of P(S,N) (best from S to N) and P(N,E) (best from N to E)
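
The grid and the basic assumption translate into a short dynamic programming routine. The sketch below is plain Ruby, not one of the rbSnack examples: it assumes one-dimensional patterns, uses the absolute difference as the node error, and allows three unweighted local steps (one step in either pattern, or a diagonal step). The method name `dtw_distance` is made up for this example.

```ruby
# Minimal total error along the best path from the start node to the end node.
def dtw_distance(pattern1, pattern2)
  raise ArgumentError, "patterns must not be empty" if pattern1.empty? || pattern2.empty?

  inf = Float::INFINITY
  cost = Array.new(pattern1.size) { Array.new(pattern2.size, inf) }

  pattern1.each_index do |i|
    pattern2.each_index do |j|
      error = (pattern1[i] - pattern2[j]).abs
      # The best path to (i, j) extends the best path to one of its predecessors.
      best_previous =
        if i.zero? && j.zero?
          0.0
        else
          [
            i > 0 ? cost[i - 1][j] : inf,
            j > 0 ? cost[i][j - 1] : inf,
            i > 0 && j > 0 ? cost[i - 1][j - 1] : inf
          ].min
        end
      cost[i][j] = error + best_previous
    end
  end

  cost[-1][-1]
end

puts dtw_distance([1, 2, 3], [1, 2, 2, 3])   # same shape, slower rate
puts dtw_distance([0, 0], [1, 1])
```

Stretching the middle of the first pattern lets it line up with the second at no cost, which is the point of warping time before comparing.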

Dynamic Programming

RbSnack includes examples for various time alignment approaches

[Figure: local path constraint diagrams with step weights: Type I, Type III, Itakura, Type IV]

Hidden Markov Models

Sometimes the second (or third) best match is the right word
Use HMMs to ascertain the correct word in the context of the sentence (ditto for phones within a word)
HMMs are similar to non-deterministic finite state machines, except that their output is also non-deterministic

Hidden Markov Models

Dynamic programming is used to compute weights
HMMs look like:

[Figure: state diagram with states 1 to 4, transition probabilities .4, .2, .4, and output probabilities P(/i/)=.5, P(/a/)=.2, P(/o/)=.3]
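
One standard dynamic programming computation on an HMM is the forward algorithm, which gives the probability that a model produces an observed sequence. The sketch below is plain Ruby and is not taken from rbSnack. The two-state model is hypothetical: only the output probabilities of state 1 are borrowed from the slide, and every other number is made up for illustration. It expects a non-empty observation sequence.

```ruby
# Probability that the model emits the given observation sequence.
def forward_probability(initial, transition, emission, observations)
  states = initial.keys

  # alpha[s]: probability of the observations so far, ending in state s.
  alpha = states.to_h { |s| [s, initial[s] * emission[s][observations.first]] }

  observations.drop(1).each do |obs|
    alpha = states.to_h do |s|
      incoming = states.sum { |prev| alpha[prev] * transition[prev][s] }
      [s, incoming * emission[s][obs]]
    end
  end

  alpha.values.sum
end

# Hypothetical model: always start in state 1; state 2 is absorbing.
initial    = { 1 => 1.0, 2 => 0.0 }
transition = { 1 => { 1 => 0.6, 2 => 0.4 },
               2 => { 1 => 0.0, 2 => 1.0 } }
emission   = { 1 => { "i" => 0.5, "a" => 0.2, "o" => 0.3 },
               2 => { "i" => 0.1, "a" => 0.7, "o" => 0.2 } }

probability = forward_probability(initial, transition, emission, %w[i a])
puts probability
```

Scoring the same observations against one model per candidate word and keeping the highest probability is one way such a model could pick among the best few matches.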

Possible Future Directions

Examine other features (pitch?)
Incorporate other libraries (do the computationally hard work in C)
Add more signal processing routines
Add more examples
Use Hidden Markov Models

Lessons Learned/to be learned

Document everything
Nothing's perfect
Automate everything
The project is never done

What’s next?

Try it out.
