adam coates at ai frontiers: ai for 100 million people with deep learning
TRANSCRIPT
AdamCoates
SiliconValleyAILab
• Mission: DevelophardAItechnologiesthatletushavesignificantimpactonhundredsofmillionsofusers.
Ø Chooseproblemsthatsignificantlyaffectlargenumbersofpeople.
AdamCoates
AIfor100millionpeople
• Firstgoal:speechrecognitioneverywhere.
Ifyou’reconnectingtointernetforfirsttimein2017,you’relikelyusingamobiledevice.
Speechwilltransformmobiledeviceinterfaces.
AdamCoates
AIfor100millionpeople
• Firstgoal:speechrecognitioneverywhere.
Mobiledevices
Captioning
Homedevices
Cars/Hands-freeinterfaces
AdamCoates
AIfor100millionpeople
• Firstgoal:speechrecognitioneverywhere.
Diversity
Accuracy
Specializedsolutions(e.g.,IVR)
TraditionalLVCSR
Human-levelspeechrecognition
AdamCoates
Speechrecognition:TraditionalASR• TraditionalspeechsystemsbuiltonstandardML+engineering
practices.Features
Computefeatures
Acou
sticM
odel
Predictphonemes
LanguageM
odel
Transcrip
tion
“WhattimeisitinBeijing?”
Mergewithpronunciationandlanguagedata.
Sequ
encem
odel
Combineintostatesequence
Someapplicationscanbesolvedthisway.Butit’shardtoscaleourowncleverness.
AdamCoates
Deeplearning
Data&Computation
Performance DeepLearningalgorithms
Manypreviousmethods
Majoradvantageofdeeplearning:scalability.
Effort
AdamCoates
Speechrecognitionwithdeeplearning
• Replacemostofspeechsystemwithlargeneuralnetwork.
Spectrograms
Languagem
odel
Transcrip
tion
“WhattimeisitinBeijing?”
SimpleLM(nolinguisticinfo)
Deep
Learning
AdamCoates
“DeepSpeech”
• Poureffortintodata+computation.– Trytocatchuptohumanaccuracybyscaling.
Data&Computation
Performance DeepSpeech
Manypreviousmethods
AdamCoates
DeepSpeech
Spectrogram
• Traingiantneuralnetworkstopredictcharactersfromaudio.– Train“endtoend”
[e.g.,Gravesetal.,2006]
Ø Hardpartistrainingatscale andsearchingforbestmodel.
Ø Needdata+computingpower.
AdamCoates
RawTrainingData
0
2000
4000
6000
8000
10000
12000
14000
WSJ Switchboard Fisher DeepSpeech
80 300
2000
11940
Hoursofrawspeechdata
WeneedalotofdataforendtoendDLsystems:usereadspeech.
AdamCoates
Dataaugmentation
• Manyformsofdistortionthatmodelshouldberobustto:– Reverb,noise,farfieldeffects,echo,
compressionartifacts,changesintempo.
• Learntoberobustbytrainingfromdatawithdistortions!– Easiertoengineerdatapipelinethantoengineerrecognitionpipeline.
Rawaudio($$$$) Novelaudio
Augmentation
AdamCoates
Example:farfield
Speech
ImpulseResponse
=*AugmentedFar-fieldspeech
Compare:RealFar-fieldspeech
[BillyJun,Rewon Child,SanjeevSatheesh]
Reduceserrorsonfar-fielddataby15%-20%.Reliesonmodelsearch+large-scaletraining.
AdamCoates
Augmenteddataset
0100002000030000400005000060000700008000090000100000
WSJ Switchboard Fisher DeepSpeech
80 300 2000
~96,000
Synthesizeddata
• Augmentationgreatlyexpandsavailabledata.– Trainedmodelshavehearddecades ofuniqueaudio.
AdamCoates
Compute
TitanXGPU~6TeraFLOPs
“Speedoflight”=3-6weekson1GPU
1experimentrequires>10,000,000,000,000,000,000FLOPs(10sofExaFLOPs)
AdamCoates
Compute
Infiniband network
ScaleouttolargenumbersofGPUs(e.g.,8– 64)
• Cutexperimenttimesto~3-5days.– Achieve~50%ofpeakFLOPson8+GPUs.– Comparabletosupercomputingworkloads.
AdamCoates
DeepSpeechforMandarin
• DeepSpeechisdrivenbydata.–MandarinisverydifferentfromEnglish.• “Tonal”,thousandsofcharacters
DeepSpeechTraining
“Myfavoritesingeris…”
Englishtrainingdata
AdamCoates
DeepSpeechforMandarin
• DeepSpeechisdrivenbydata.–MandarinisverydifferentfromEnglish.• “Tonal”,thousandsofcharacters
DeepSpeechTraining
“我最喜欢的歌手是…”
Mandarintrainingdata
AdamCoates
DeepSpeechforMandarin
• Withafewchanges,asinglealgorithmcanlearnanentirelynewlanguage.– Competitivewithcommitteeofnativespeakersforshortaudioclips.
Ø Learnshybridspeech(e.g.,famouspeople,iphones):
我最喜欢的歌手是Angelababy
AdamCoates
Comparingspeechwithkeyboardinput
• Compareuserperformance/experienceforspeechvs.traditionalkeyboard.[Ruan etal.,arxiv.org/abs/1608.07323]
AdamCoates
TalkType – voice-centrickeyboardforAndroid
• OpportunitytorethinkproductexperiencesaroundspeechandAI.
[NickyChan,Bijit Halder,KennyLiou,Thuan Nguyen,NinaWei]talktype.baidu.com
AdamCoates
AIfor100millionpeople
• DeepLearningisclosinggapwithhumansonspeech,throughscalability.– Stillmoretodo;butitkeepsgettingbetter.
• SpeechalreadyenablingproliferationofnewAIproducts.– Let’smakethemworkforeveryone.
AdamCoates
Thankyou!
AdamCoates
@adampaulcoates
IfyouwanttohelpbringAIto100sofmillionsofpeople,cometalktous!
http://research.baidu.com