Machine Learning @ Amazon
Ralf Herbrich
9/20/16 1
Overview
• What is Machine Learning?
  • A Computer Science and Statistics Perspective
  • History of Machine Learning
• Machine Learning @ Amazon
  • Forecasting
  • Machine Translation
  • Visual Systems
Amazon’s Virtuous Cycles
[Diagram: cycle linking Growth, Customer Experience, Traffic, Sellers, Selection & Convenience, Lower Prices, and Lower Cost Structure]
1. Saving costs by better planning (e.g., forecasting)
2. Saving costs by automating human decision making (e.g., pricing)
3. Increasing revenue by low-friction experience (e.g., recommendation)
Machine Learning: The Science
Science
• Computer Science
• Statistics
• Neuroscience
• Operations Research
Artificial Intelligence
• Rule extraction from data
• Inspired by human learning
• Adaptive algorithms
Engineering
• Training: Data → Models
• Prediction: Models → Forecast
• Decision: Forecast → Actions
Machine Learning: A Programmer's Perspective
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
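The contrast between the two paradigms can be sketched in a few lines of Python. This is a toy illustration, not something from the talk: the threshold rule, the function names, and the data are all assumptions chosen to keep the example small. In the first function a person writes the program; in the second, the program (here, a threshold) is found from data and the desired outputs.

```python
def handwritten_program(x: float) -> bool:
    """Traditional programming: a human picks the rule (5.0 is an arbitrary choice)."""
    return x > 5.0


def learn_program(data, outputs):
    """Machine learning: search for the threshold that best reproduces the outputs."""
    best_threshold, best_errors = None, len(data) + 1
    for threshold in sorted(data):
        # Count how often the rule "x > threshold" disagrees with the desired output.
        errors = sum((x > threshold) != y for x, y in zip(data, outputs))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    # The returned function is the "program" produced from data and outputs.
    return lambda x: x > best_threshold


data = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
outputs = [False, False, False, True, True, True]
learned = learn_program(data, outputs)
```

The learned function reproduces every example in `outputs` without anyone having typed a threshold by hand.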
ML Examples: Named Entity Extraction
Data (Author): High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. "Robert, hurry up! I knew I should have married a younger man!" Her smile was magic. …
Program:
if (word is capitalized) and (word before is ‘in’) then PLACE
else if (word = ‘her’) or (word = ‘his’) or (word = ‘he’) or (word = ‘she’) then PERSON
...
Output (Annotation, Annotator): the same passage with entities marked
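The two hand-written rules from the slide can be turned into a small runnable sketch. The tokenizer (a simple regular expression) and the case-insensitive pronoun match are assumptions added here; the slide only gives the rules themselves.

```python
import re

PRONOUNS = {"her", "his", "he", "she"}


def annotate(text):
    """Apply the two slide rules word by word; return (word, label) pairs."""
    words = re.findall(r"[A-Za-z]+", text)  # assumed tokenizer: runs of letters
    labels = []
    for i, word in enumerate(words):
        previous = words[i - 1].lower() if i > 0 else ""
        if word[0].isupper() and previous == "in":
            labels.append((word, "PLACE"))
        elif word.lower() in PRONOUNS:  # assumption: match regardless of case
            labels.append((word, "PERSON"))
    return labels


passage = ('High atop the steps of the Pyramid of Giza a young woman laughed '
           'and called down to him. "Robert, hurry up! I knew I should have '
           'married a younger man!" Her smile was magic.')
print(annotate(passage))
```

On this passage the rules label only "Her": "Giza" follows "of" rather than "in", and "Robert" matches neither rule, which shows how brittle a hand-written program of this kind is.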
ML Examples: Named Entity Extraction
Author: High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. "Robert, hurry up! I knew I should have married a younger man!" Her smile was magic. …
Annotator: … "Robert, hurry up! I knew I should have married a younger man!" …
Machine Learning Service: High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. … Her smile was magic. …
History of Machine Learning
2015 ("AI")
• Deep Neural Networks
• Fast hardware (GPUs)
• Distributed computing and storage
• Learning = Adaptation of Weights in a brain-like layered architecture
2010 (“Service”)
• Distributed computing and storage
• Adaptive systems
• Learning = Scalable, Adaptive Computation for Various Big Data
2005 (“Graphical Models”)
• Wide application in products
• Statistical Modeling of Data
• Learning = Parameter Estimation or Inference
2000 (“Kernel Machines”)
• Statistical Learning Theory
• Scoring Systems
• Learning = Optimization of Convex Functions
1990 (“Symbolic”)
• Expert Systems
• Decision-Tree Learning (C4.5)
• Learning = Methods to automatically build Expert Systems
1980 (“Neuro”)
• Neural Networks
• Artificial Intelligence
• Learning = Adaptation of Neurons based on External Stimuli
Machine Learning Opportunities @ Amazon
Retail
• Demand Forecasting
• Vendor Lead Time Prediction
• Pricing
• Packaging
• Substitute Prediction
Customers
• Product Recommendation
• Product Search
• Visual Search
• Product Ads
• Shopping Advice
• Customer Problem Detection
Seller
• Fraud Detection
• Predictive Help
• Seller Search & Crawling
Catalog
• Browse-Node Classification
• Meta-data validation
• Review Analysis
• Hazmat Prediction
Digital
• Named-Entity Extraction
• XRay
• Plagiarism Detection
• Echo Speech Recognition
• Knowledge Acquisition
Locations
[Map: ML Seattle, ML Bangalore, S9, A9, A2Z, Ivona, ML Berlin, Evi]
Forecasting
Setting
• Given past sales of a product in every region, predict regional demand up to one year into the future
Challenges
• New Products: No past demand!
• Regionalized: 100+ fulfillment centers worldwide
• Sparsity: Huge skew – many products sell very few items
• Seasonal: Huge variation due to external, seasonal events
• Distributions: Future is uncertain → predictions must be distributions
• Scale: 20M+ products fulfilled by Amazon alone!
• Orders: Customers demand bundles of products
• Censored: Past sales ≠ past demand (inventory constraint)
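Two of these challenges, "Distributions" and "Censored", can be made concrete with a minimal sketch. It assumes a single product with invented weekly numbers, treats any week in which sales reached the available inventory as censored, and reports empirical quantiles instead of a single point forecast. The function names and data are hypothetical, and this is not the forecasting method described in the talk.

```python
from statistics import quantiles


def uncensored_sales(sales, inventory):
    """Keep only weeks where stock did not run out, so that sales equal demand."""
    return [sold for sold, stock in zip(sales, inventory) if sold < stock]


def quantile_forecast(history):
    """Return (P10, P50, P90) of past demand as a crude distributional forecast."""
    deciles = quantiles(history, n=10)  # nine cut points: P10, P20, ..., P90
    return deciles[0], deciles[4], deciles[8]


weekly_sales = [3, 5, 4, 10, 6, 10, 2, 7, 5, 4]   # invented numbers
weekly_inventory = [10] * 10                      # 10 units in stock each week

demand = uncensored_sales(weekly_sales, weekly_inventory)
p10, p50, p90 = quantile_forecast(demand)
print(f"P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f}")
```

Dropping the sold-out weeks is the simplest possible treatment and biases the estimate downward, since those are exactly the weeks in which demand was at least as high as the stock.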
Demand Forecasting
Training Range: Non-fashion items have longer training ranges that we can leverage. We need to share information across new and old products.
Seasonality: This item has Christmas seasonality with higher growth over time. This is where we need growth features in addition to date features.
Missing Features or Input: Unexplained spikes in demand are likely caused by missing features or incomplete input data.
Example Softlines product to illustrate the challenges of forecasting.
New Products
Learning across groups of products with varying ages to improve accuracy for new products
New Product Without Sharing: Product is less than 1 year old and hasn’t seen all dates before. Features learned per product are not very strong.
[Plot legend: Red = Actual Demand, Black = Forecast]
New Product With Sharing: Once we share data across groups of products, we start to see the appropriate lift for new holidays.
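The idea of sharing across a product group can be sketched with a deliberately simple stand-in: estimate a holiday lift from older products that have already been through the holiday, then apply it to a new product's recent baseline. The ratio-based lift, the function names, and all numbers are assumptions for illustration, not the model used at Amazon.

```python
def holiday_lift(group_history):
    """Average ratio of holiday-week to regular-week demand across older products.

    group_history: list of (regular_week_demand, holiday_week_demand) pairs.
    """
    ratios = [holiday / regular for regular, holiday in group_history if regular > 0]
    return sum(ratios) / len(ratios)


def forecast_new_product(recent_weekly_demand, lift):
    """Scale the new product's recent average demand by the shared lift."""
    baseline = sum(recent_weekly_demand) / len(recent_weekly_demand)
    return baseline * lift


older_products = [(100, 250), (40, 100), (10, 25)]  # invented group history
new_product_recent = [8, 12, 10]                    # new product, no holiday seen yet

lift = holiday_lift(older_products)
holiday_forecast = forecast_new_product(new_product_recent, lift)
```

Without the shared lift, the new product's own history contains no holiday week, so a per-product model has nothing from which to learn the spike.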
ASIN Machine Translation
[Chart: Contribution Profit (y-axis) versus ASINs (x-axis), with regions labelled Human Translation, Machine Translation, and Selection Gap]
Machine Translation Pipeline
[Diagram: pipeline from Input Request to Translated Request. Stages shown: Input Normalization, Detection & Escaping of Non-translatables, Sentence Segmentation, Tokenization, Lowercasing, Translation/Decoding, Recasing, De-Tokenization, Re-insertion of (converted) Non-translatables, Post-processing]
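A few of the named stages can be strung together in a minimal sketch. The stage order, the placeholder format, the rule that any token containing a digit is non-translatable, and the three-word English-to-German lexicon are all assumptions made for illustration; a word-for-word lookup stands in for the real Translation/Decoding step.

```python
import re

# Hypothetical word-for-word table standing in for the translation model.
TOY_LEXICON = {"red": "roter", "shoe": "Schuh", "size": "Größe"}


def escape_non_translatables(text):
    """Replace digit-bearing tokens (sizes, model numbers) with placeholders."""
    kept = []

    def stash(match):
        kept.append(match.group(0))
        return f"__nt{len(kept) - 1}__"

    return re.sub(r"\b\w*\d\w*\b", stash, text), kept


def translate_request(text):
    escaped, kept = escape_non_translatables(text)      # detection & escaping
    tokens = escaped.split()                            # tokenization
    tokens = [t.lower() for t in tokens]                # lowercasing
    tokens = [TOY_LEXICON.get(t, t) for t in tokens]    # translation/decoding (toy)
    if tokens:                                          # recasing (first letter only)
        tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    output = " ".join(tokens)                           # de-tokenization
    for i, original in enumerate(kept):                 # re-insertion
        output = output.replace(f"__nt{i}__", original)
    return output


print(translate_request("Red shoe size 42"))
```

Escaping before lowercasing is what keeps a token such as a model number intact: it bypasses both the case change and the lexicon lookup and is restored verbatim at the end.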
Machine Translation: Deep Dive
$$p(\text{English} \mid \text{Chinese}) = \frac{p(\text{English}) \times p(\text{Chinese} \mid \text{English})}{p(\text{Chinese})} \propto \underbrace{p(\text{English})}_{\text{Language Model}} \times \underbrace{p(\text{Chinese} \mid \text{English})}_{\text{Translation Model}}$$
• Language Model: What are fluent English sentences?
• Translation Model: What English sentences account well for a given Chinese sentence?
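The decomposition suggests a simple decoding rule: among candidate English sentences, pick the one that maximizes language-model probability times translation-model probability. The sketch below assumes a fixed candidate list and invented probabilities for a single Chinese source sentence; real systems search a far larger space.

```python
import math

# Invented scores for one (unspecified) Chinese source sentence.
language_model = {          # p(English): how fluent is the sentence?
    "the cat sleeps": 0.02,
    "cat the sleeps": 0.0001,
    "the dog sleeps": 0.02,
}
translation_model = {       # p(Chinese | English): does it explain the source?
    "the cat sleeps": 0.5,
    "cat the sleeps": 0.5,
    "the dog sleeps": 0.01,
}


def decode(candidates):
    """Return the candidate maximizing log p(English) + log p(Chinese | English)."""
    def score(english):
        return math.log(language_model[english]) + math.log(translation_model[english])
    return max(candidates, key=score)


best = decode(list(language_model))
print(best)
```

The denominator p(Chinese) is the same for every candidate, which is why it can be dropped from the comparison. Here the language model rules out the scrambled word order and the translation model rules out the fluent but unfaithful candidate.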
Automated Produce Inspection: The Goal
[Figure: Current Inspection versus New Automated Inspection using Computer Vision]
Conclusions
• Machine Learning is an emerging and scientifically young discipline!
• Machine Learning “translates” data from the past into accurate predictions about the future!
• Amazon has a broad range of applications for Machine Learning – it’s central to Amazon’s business!
Thanks!