Wild Dolphin Project
11-751 Speech Final Project
by Jiazhi Ou (jzou@cs.cmu.edu) and Tal Blum (blum@cs.cmu.edu)


Posted on 23-Dec-2015


Outline

• Wild Dolphin Project, dolphin speech
• Data, labeling, labeling problems
• Previous work
• Model training
• Experiments & results
• Conclusions

The Wild Dolphin Project (WDP)

The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved photographing, videotaping, and audio taping a group of resident dolphins, with the aim of learning about their lives.

http://www.wilddolphinproject.org/index.cfm

Dolphin’s Speech

• The range of frequencies is wider
• Two mechanisms for producing sound simultaneously
• Some of the frequencies are directional
• Carried in water
• Can travel large distances

Dolphin speech is very different from human speech.

Dolphin’s Speech (2)

Is used for:
• Identification
• Communicating
  • Fighting
  • Defending
  • Courting
  • Warning
  • Calling
• Hunting

Dolphin’s Speech (3)

3 main types:
• Whistles
  • Signature
  • Non-signature
• Clicks
• Spike trains

What do we know?

• Not much
• We know that each dolphin has a unique whistle, called its signature whistle.
• A baby dolphin's signature whistle is similar to the whistles of the dolphins in close contact with it.

Data

• 164 files containing sounds of one dolphin whose name is known
• Average file length is 7 s
• Total data length is less than 20 minutes, of which about half is silence
• The data does not contain all of the relevant frequencies
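Taking the 7 s average at face value, the stated totals are consistent with each other:

```latex
164 \times 7\,\mathrm{s} = 1148\,\mathrm{s} \approx 19.1\,\mathrm{min} < 20\,\mathrm{min},
\qquad
\tfrac{1}{2} \times 19.1\,\mathrm{min} \approx 9.6\,\mathrm{min}\ \text{of non-silent audio}
```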

Labeling

• Dolphin names
  • Dolphin ID project
• Pause, noise, dolphin signature whistles, dolphin non-signature whistles

Labeling Problems

• How do we distinguish between the two kinds of whistles (signature and non-signature)?
• How do we distinguish between whistles and non-whistles?
  • They co-occur
• How do we determine the duration of a label?
  • Should labels that are close together be merged into one label?
  • This choice affects the model
• Some signals are weak, probably due to a change in the dolphin's direction

Mapping from Labels to Models

Label                                                        Model
dd                                                           Signature Whistles
dp, md                                                       Non-Signature Whistles
click, electnoise, electricnoise, h#, H#, MachineSpike, s    GARBAGE
pau                                                          PAUSE (Water)
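The mapping above is a simple lookup. A minimal Python sketch of how it might be applied to a file's label sequence follows; the model identifiers (`SIGNATURE`, `NON_SIGNATURE`, ...) and the function `labels_to_models` are hypothetical names for illustration, not taken from the project's scripts:

```python
# Hypothetical sketch of the label-to-model table; model identifiers are illustrative.
LABEL_TO_MODEL = {
    "dd": "SIGNATURE",
    "dp": "NON_SIGNATURE",
    "md": "NON_SIGNATURE",
    "click": "GARBAGE",
    "electnoise": "GARBAGE",
    "electricnoise": "GARBAGE",
    "h#": "GARBAGE",
    "H#": "GARBAGE",  # labels are case-sensitive, so both spellings are listed
    "MachineSpike": "GARBAGE",
    "s": "GARBAGE",
    "pau": "PAUSE",
}


def labels_to_models(labels, merge_repeats=False):
    """Map raw labels to model names; optionally collapse consecutive repeats."""
    models = []
    for label in labels:
        if label not in LABEL_TO_MODEL:
            raise ValueError(f"unmapped label: {label!r}")
        model = LABEL_TO_MODEL[label]
        # Skip a model identical to the previous one when merging is requested.
        if merge_repeats and models and models[-1] == model:
            continue
        models.append(model)
    return models
```

Whether adjacent segments of the same class should be merged is the open question raised under Labeling Problems; the `merge_repeats` flag only makes that choice explicit.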

Label Statistics

                                   PAUSE   SIGWHISTLE   GARBAGE   DOLPHIN
# occurrences                       756       633          13        24
Accumulated time (s)                466       320          7.1       11.3
Average time per occurrence (s)     0.6       0.5          0.55      0.47
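The last row follows from the first two (accumulated time divided by number of occurrences). This short Python check recomputes it from the table's figures and also shows each label's share of the labeled time; it is an illustration only, not part of the original project:

```python
# Figures copied from the Label Statistics table: label -> (occurrences, accumulated seconds).
STATS = {
    "PAUSE": (756, 466.0),
    "SIGWHISTLE": (633, 320.0),
    "GARBAGE": (13, 7.1),
    "DOLPHIN": (24, 11.3),
}


def summarize(stats):
    """Return total labeled seconds and, per label, (average seconds, percent of total time)."""
    total = sum(seconds for _, seconds in stats.values())
    rows = {}
    for name, (count, seconds) in stats.items():
        rows[name] = (round(seconds / count, 2), round(100 * seconds / total, 1))
    return total, rows


total, rows = summarize(STATS)
for name, (avg, share) in rows.items():
    print(f"{name:<11} avg {avg:.2f} s   {share:.1f}% of {total:.1f} s")
```

Pause makes up roughly 58% of the labeled time, in line with the earlier remark that about half of the data is silence.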

Previous Work

• Dolphin-ID project by Tanja, Alan and Yue
  • Task: identify dolphins by their signature whistles
  • 51 files labeled by Alan
  • 13 HMMs: one for each of 10 dolphins, plus DOLPHIN, PAUSE, and GARBAGE
  • Used Janus for training and testing
  • Tried different kinds of features

Our Work

• Model generalized signature whistles
• Label more files
• Create HMMs for signature whistles, non-signature whistles, garbage, and pause
• Train and test the HMMs using Janus
• Evaluate the test results with our own method
• Compare different model selections

Signal Processing

• Tanja's scripts
  • Downsampling
  • High-pass filter
  • FFT
  • LDA
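The slide names the front-end steps without giving Tanja's scripts or their parameters. As a rough illustration only, a NumPy sketch of the first three steps might look like this. The downsampling factor of 2, the 2 kHz high-pass cutoff, the 101-tap windowed-sinc filters, and the 256-sample frames with 50% overlap are all hypothetical choices, and LDA is left out because it needs labeled training frames:

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs, numtaps=101):
    """Windowed-sinc low-pass FIR (Hamming window), normalized to unit gain at DC."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = 2 * cutoff_hz / fs * np.sinc(2 * cutoff_hz / fs * n)
    h *= np.hamming(numtaps)
    return h / h.sum()


def highpass_fir(cutoff_hz, fs, numtaps=101):
    """High-pass by spectral inversion: unit impulse minus a low-pass (numtaps must be odd)."""
    h = -lowpass_fir(cutoff_hz, fs, numtaps)
    h[(numtaps - 1) // 2] += 1.0
    return h


def front_end(x, fs, factor=2, hp_cutoff_hz=2000.0, frame=256, hop=128):
    """Downsample, high-pass filter, then return FFT magnitudes (frames x bins) and the new rate.

    All default parameter values are hypothetical, not the project's settings.
    """
    # 1) Downsampling: anti-alias low-pass below the new Nyquist, then keep every factor-th sample.
    x = np.convolve(x, lowpass_fir(0.45 * fs / factor, fs), mode="same")[::factor]
    fs = fs / factor
    # 2) High-pass filter: suppress energy below hp_cutoff_hz.
    x = np.convolve(x, highpass_fir(hp_cutoff_hz, fs), mode="same")
    # 3) FFT magnitudes over overlapping Hamming-windowed frames.
    if len(x) < frame:
        return np.empty((0, frame // 2 + 1)), fs
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hamming(frame)
    frames = np.stack([x[i * hop : i * hop + frame] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)), fs
```

For example, a 48 kHz recording containing a 500 Hz hum and an 8 kHz tone comes out at 24 kHz with the hum strongly attenuated and the spectral peak near 8 kHz.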

HMM Topologies

[Figure: HMM topologies for Signature Whistles, Non-Signature Whistles, Garbage, and Pause (Water); states are labeled b, m, and e]
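The figure did not survive, but the state labels b, m, and e presumably denote begin, middle, and end states of left-to-right HMMs. As a hedged illustration, the following Python sketch builds such a transition matrix and computes the expected number of frames a path spends in the model. The 0.5 self-loop probability is an arbitrary assumption, and this is not Janus's topology format:

```python
def left_to_right(state_names, self_loop=0.5):
    """Left-to-right HMM transitions: rows are states, columns are states plus a final 'exit'.

    self_loop is a hypothetical stay probability shared by all states.
    """
    n = len(state_names)
    matrix = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance (the last state advances to exit)
    return matrix


def expected_frames(matrix):
    """Expected frames spent in the model: sum over states of 1 / (1 - stay probability)."""
    return sum(1.0 / (1.0 - row[i]) for i, row in enumerate(matrix))


whistle = left_to_right(["b", "m", "e"])  # three-state begin/middle/end model
pause = left_to_right(["m"])              # one-state model
print(expected_frames(whistle), expected_frames(pause))  # 6.0 2.0
```

With a single state, the stay probability alone sets the duration, which then follows a geometric distribution. This may be a coarse fit for a class whose segments vary as much as pauses do.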

Model Selection

• Scheme 1: Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE
• Scheme 2: Signature Whistles, GARBAGE, PAUSE
• Scheme 3: 10 HMMs (one for each dolphin), GARBAGE, PAUSE

Evaluation

• We cannot use WER here, since there are no words, just segments.
• Instead, we computed a confusion matrix over hidden states.
• Janus treats silence differently and does not report silence classifications, which complicates the evaluation.
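One way to build such a confusion matrix is frame by frame: pair each frame's reference class with the class of the HMM state it was aligned to, count the pairs, and normalize each row to percentages. The sketch below assumes both sequences are already available as equal-length lists of class names (how the hypotheses were extracted from Janus is not described); the function name and the choice of rows as reference classes are illustrative:

```python
from collections import Counter


def confusion_percent(reference, hypothesis, classes):
    """Row-normalized confusion matrix in percent from frame-level class sequences.

    Rows are reference classes, columns are hypothesized classes; pairs involving
    a class not listed in `classes` are ignored.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same number of frames")
    counts = Counter(zip(reference, hypothesis))
    table = {}
    for ref in classes:
        row_total = sum(counts[(ref, hyp)] for hyp in classes)
        table[ref] = {
            hyp: (100.0 * counts[(ref, hyp)] / row_total if row_total else 0.0)
            for hyp in classes
        }
    return table
```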

Experiments

• Data
  • 162 labeled files were used
  • Half of the data for training, half for testing
  • Then the training and test sets were swapped
  • 162 test results altogether
• Features
  • The same as those in the Dolphin-ID project
• Model selection
  • 3 different schemes
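Training on one half, testing on the other, and then swapping is a two-fold cross-validation. A minimal Python sketch, assuming a random split with a fixed seed (the actual assignment of files to halves is not specified, and `two_fold_splits` is a hypothetical helper):

```python
import random


def two_fold_splits(files, seed=0):
    """Return [(train, test), (train, test)] where the two halves swap roles."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # assumed random split, fixed seed for repeatability
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return [(first, second), (second, first)]
```

Testing each half once yields exactly one result per file, which is how 162 files give 162 test results.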

Results – Scheme 1

           Sig    Non-Sig   Garbage   Pause
Sig        58%    6%        18%       34%
Non-Sig    33%    8%        37%       22%
Garbage    77%    0%        5%        18%
Pause      31%    6%        27%       34%

Results – Scheme 2

           Sig    Garbage   Pause
Sig        79%    9%        21%
Garbage    52%    21%       27%
Pause      48%    14%       38%

Results – Scheme 3

           Sig    Garbage   Pause
Sig        91%    0.6%      8%
Garbage    80%    10%       10%
Pause      69%    1%        30%

Analysis of Results

• You can only get as good as your labels
• Scheme 3 is the best at aligning signature whistles (speaker dependent)
• Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage
• Scheme 2 is in the middle (speaker independent)
• Pause is the most difficult to model: it contains all kinds of different sounds, and we modeled it with only 1 state

Conclusion

Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds.
• There is a lot of work to be done in the signal processing stage
• Partly supervised training
  • It might be better to construct models only for the labels we are sure of, and let the model learn what signature whistles are, or learn units that discriminate between different labels.

We also tried …

• One-state model for non-signature whistles, garbage, and pause
  • Segmentation fault in training
• "Loop back" model for signature whistles
  • The loop-back transition makes no difference

Acknowledgement

Tanja Schultz

Yue Pan

Alan W Black

Szu-Chen Stan Jou

Hua Yu

Thank You!

Jiazhi Ou

Tal Blum

{jzou, tblum}@cs.cmu.edu