learn to build an app to find similar images using deep learning- piotr teterwak
TRANSCRIPT
- 1. Learn to Build an App to Find Similar Images using Deep Learning Piotr Teterwak Dato, Machine Learning Engineer
- 2. 2 First things first, installation http://bit.ly/dato_pydata import graphlab sf = graphlab.Sframe() sa = graphlab.Sarray(range(100)).apply(lambda x: x)
- 3. 3 Who is Dato?
- 4. Graphlab Create: Production ML Pipeline DATA YourWebServiceor IntelligentApp ML Algorithm Data cleaning & feature eng Offline eval & Parameter search Deploy model Data engineering Data intelligence Deployment Goal: Platform to help implement, manage, optimize entire pipeline
- 5. Deep Learning
- 6. Todays talk
- 7. Todays tutorial Deep Learning Overview Training a model on MNIST Building a Similar Dress Recommender Deploying the Dress Recommender
- 8. Features are key to machine learning
- 9. 9 Simple example: Spam filtering A user opens an email - Will she thinks its spam? Whats the probability email is spam? Text of email User info Source info Input: x MODEL Yes! No Output: Probability of y
- 10. 10 Feature engineering: the painful black art of transforming raw inputs into useful inputs for ML algorithm E.g., important words, complex transformation of input, MODEL Yes! No Output: Probability of y Feature extraction Features: (x) Text of email User info Source info Input: x
- 11. Deep Learning for Learning Features
- 12. 12 Linear classifiers Most common classifier - Logistic regression - SVMs - Decision correspond to hyperplane: - Line in high dimensional space w0 + w1 x1 + w2 x2 > 0 w0 + w1 x1 + w2 x2 < 0
- 13. 13 What can a simple linear classifier represent? AND 0 0 1 1
- 14. 14 What can a simple linear classifier represent? OR 0 0 1 1
- 15. 15 What cant a simple linear classifier represent? XOR 0 0 1 1 Need non-linear features
- 16. 16 Non-linear feature embedding 0 0 1 1
- 17. 17 Graph representation of classifier: Useful for defining neural networks x 1 x 2 x d y 1 w2 w0 + w1 x1 + w2 x2 + + wd xd > 0, output 1 < 0, output 0 Input Output
- 18. 18 What can a linear classifier represent? x1 OR x2 x1 AND x2 x 1 x 2 1 y -0.5 1 1 x 1 x 2 1 y -1.5 1 1
- 19. Solving the XOR problem: Adding a layer XOR = x1 AND NOT x2 OR NOT x1 AND x2 z 1 -0.5 1 -1 z1 z2 z 2 -0.5 -1 1 x 1 x 2 1 y 1 -0.5 1 1 Thresholded to 0 or 1
- 20. 20 http://deeplearning.stanford.edu/wiki/images/4/40/Network3322.png Deep Neural Networks P(cat|x) P(dog|x)
- 21. 21 Deep Neural Networks Can model any function with enough hidden units. This is tremendously powerful: given enough units, it is possible to train a neural network to solve arbitrarily difficult problems. But also very difficult to train, too many parameters means too much memory+computation time.
- 22. 22 Neural Nets and GPUs Many operations in Neural Net training can happen in parallel Reduces to matrix operations, many of which can be easily parallelized on a GPU.
- 23. 24 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 24. 25 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 25. 26 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 26. 27 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 27. 28 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 28. 29 Convolutional Neural Nets Strategic removal of edges Input Layer Hidden Layer
- 29. 30 Convolutional Neural Nets http://ufldl.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif
- 30. 31 Pooling layer Ranzato, LSVR tutorial @ CVPR, 2014. www.cs.toronto.edu/~ranzato
- 31. 32 Pooling layer http://ufldl.stanford.edu/wiki/images/6/6c/Pooling_schematic.gif
- 32. 33 Final Network Krizhevsky et al. 12
- 33. Applications to computer vision
- 34. 35 Image features Features = local detectors - Combined to make prediction - (in reality, features are more low-level) Face! Eye Eye Nose Mouth
- 35. 36 Standard image classification approach Input Computer$vision$features$ SIFT$ Spin$image$ HoG$ RIFT$ Textons$ GLOH$ Slide$Credit:$Honglak$Lee$ Extract features Use simple classifier e.g., logistic regression, SVMs Face
- 36. 37 Many hand crafted features exist Computer$vision$features$ SIFT$ Spin$image$ HoG$ RIFT$ Textons$ GLOH$ Slide$Credit:$Honglak$Lee$ but very painful to design
- 37. 38 Change image classification approach? Input Computer$vision$features$ SIFT$ Spin$image$ HoG$ RIFT$ Textons$ GLOH$ Slide$Credit:$Honglak$Lee$ Extract features Use simple classifier e.g., logistic regression, SVMs FaceCan we learn features from data?
- 38. 39 Use neural network to learn features Input Learned hierarchy Output Lee et al. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations ICML 2009
- 39. Sample results Traffic sign recognition (GTSRB) - 99.2% accuracy House number recognition (Google) - 94.3% accuracy 40
- 40. Krizhevsky et al. 12: 60M parameters, won 2012 ImageNet competition 41
- 41. 42 ImageNet 2012 competition: 1.2M images, 1000 categories 42
- 42. 43 Application to scene parsing Carlos Guestrin 2005-2014 Y LeCun MA Ranzato Semantic Labeling: Labeling every pixel with the object it belongs to [ Farabet et al. ICML 2012, PAMI 2013] Would help identify obstacles, targets, landing sites, dangerous areas Would help line up depth map with edge maps
- 43. Challenges of deep learning
- 44. Deep learning score card Pros Enables learning of features rather than hand tuning Impressive performance gains on - Computer vision - Speech recognition - Some text analysis Potential for much more impact Cons
- 45. Deep learning workflow Lots of labeled data Training set Validation set 80% 20% Learn deep neural net model Validate
- 46. Deep learning score card Pros Enables learning of features rather than hand tuning Impressive performance gains on - Computer vision - Speech recognition - Some text analysis Potential for much more impact Cons Computationally really expensive Requires a lot of data for high accuracy Extremely hard to tune - Choice of architecture - Parameter types - Hyperparameters - Learning algorithm - Computational + so many choices = incredibly hard to tune
- 47. 48 Can we do better? Input Learned hierarchy Output Lee et al. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations ICML 2009
- 48. Deep features: Deep learning + Transfer learning
- 49. 50 Transfer learning: Use data from one domain to help learn on another Lots of data: Learn neural net Great accuracy Some data: Neural net as feature extractor + Simple classifier Great accuracy on new problem Old idea, explored for deep learning by Donahue et al. 14
- 50. 51 Whats learned in a neural net Neural net trained for Task 1 Very specific to Task 1More generic Can be used as feature extractor vs.
- 51. 52 Transfer learning in more detail Neural net trained for Task 1 Very specific to Task 1More generic Can be used as feature extractor Keep weights fixed! For Task 2, learn only end part Use simple classifier e.g., logistic regression, SVMs Class?
- 52. 53 Using ImageNet-trained network as extractor for general features Using classic AlexNet architechture pioneered by Alex Krizhevsky et. al in ImageNet Classification with Deep Convolutional Neural Networks It turns out that a neural network trained on ~1 million images of about 1000 classes makes a surprisingly general feature extractor First illustrated by Donahue et al in DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition 53
- 53. Transfer learning with deep features Training set Validation set 80% 20% Learn simple model Some labeled data Extract features with neural net trained on different task Validate Deploy in production
- 54. In real life
- 55. What else can we do with Deep Features? 56
- 56. 57 Clustering images Goldberger et al. Set of Images
- 57. Finding similar images 58
- 58. 59 Finding similar images A B C A B C
- 59. Deploying ML in production
- 60. 67 Deployment system Model Historical Data Predictions Batch training Real-time predictions REST Client Model Mgmt Dato Predictive Services Robust, Elastic Direct Train deep model REST API for predictions for new images, text,
- 61. Summary
- 62. Deep learning made easy with deep features Deep learning: exciting ML development Slow, lots of tuning, needs lots of data Can still achieve excellent performance Deep features: reuse deep models for new domains Needs less data Faster training times Much simpler tuning
- 63. 70 Next Build a Deep Learning model with GLC Creating an Image Similarity Service Deploying with Predictive Services
- 64. 71 First things first, installation http://bit.ly/dato_pydata import graphlab sf = graphlab.Sframe() sa = graphlab.Sarray(range(100)).apply(lambda x: x)