  • Deep Learning

    Knowledge Discovery and Data Mining 2 (VU) (707.004)

    Roman Kern, Stefan Klampfl

    Know-Center, KTI, TU Graz

    2015-05-07

    Roman Kern, Stefan Klampfl (Know-Center, KTI, TU Graz) Deep Learning 2015-05-07 1 / 33

  • Outline

    1 Introduction

    2 Deep Learning: Definition, History, Approaches

  • Introduction

    Introduction to Deep Learning: What & Why

  • Introduction

    History of Artificial Intelligence

  • Introduction

    Success Stories of Deep Learning

    Unsupervised high-level feature learning, using a deep network with 1 billion parameters, trained on 10 million images (sampled from YouTube) on 1,000 machines (16,000 cores) for 1 week.

    Evaluation on the ImageNet data set (20,000 categories), accuracy:

      • 0.005% random guessing
      • 9.5% prior state of the art
      • 16.1% for the deep architecture
      • 19.2% including pre-training

    https://research.google.com/archive/unsupervised_icml2012.html
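The arithmetic behind these figures can be checked with a short sketch. It only uses the numbers quoted on the slide; the helper names `chance_level_percent` and `relative_gain_percent` are illustrative and not taken from the paper.

```python
# Figures quoted on the slide (accuracy on ImageNet with 20,000 categories).
NUM_CATEGORIES = 20_000
accuracy_percent = {
    "state of the art": 9.5,
    "deep architecture": 16.1,
    "deep architecture + pre-training": 19.2,
}


def chance_level_percent(num_categories: int) -> float:
    """Expected accuracy (in percent) of uniform random guessing."""
    return 100.0 / num_categories


def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


chance = chance_level_percent(NUM_CATEGORIES)
print(f"chance level: {chance:.3f}%")

baseline = accuracy_percent["state of the art"]
for name, acc in accuracy_percent.items():
    gain = relative_gain_percent(acc, baseline)
    print(f"{name}: {acc}% ({gain:+.0f}% relative to the state of the art)")
```

Uniform guessing over 20,000 categories gives 100 / 20,000 = 0.005%, which matches the first figure on the slide.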


  • Introduction

    Success Stories of Deep Learning

    Primarily in speech recognition and images

    Interest from the big players:

      • Facebook: face recognition
        https://research.facebook.com/publications/480567225376225/deepface-closing-the-gap-to-human-level-performance-in-face-verification/
      • Baidu: speech recognition
        https://gigaom.com/2014/12/18/baidu-claims-deep-learning-breakthrough-with-deep-speech/
      • Microsoft: deep learning technology centre, e.g. NLP with the Deep Semantic Similarity Model
        http://research.microsoft.com/en-us/projects/dssm/

  • Introduction

    Prerequisite Knowledge

    Neural networks

      • Backpropagation
      • Recurrent neural networks (good for time series, NLP)

    Optimization

      • Generalisation (over-fitting), regularisation, early stopping
      • Logistic sigmoid, (stochastic) gradient descent

    Hyperparameters

      • Number of layers, size of e.g. mini-batches, learning rate, ...
      • Grid search, manual search (a.k.a. Graduate Student Descent, GSD)
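Several of these prerequisites fit into one small example. The sketch below is not from the lecture: it assumes a one-dimensional logistic model on synthetic toy data, trains it with mini-batch stochastic gradient descent, stops early when the validation loss stalls, and runs a grid search over two hyperparameters. The function names, the data generator and the grid values are all hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, b, data):
    """Mean cross-entropy of the model sigmoid(w * x + b) on (x, y) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(train_set, val_set, learning_rate, batch_size,
          max_epochs=200, patience=5, seed=0):
    """Mini-batch SGD with early stopping on the validation loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best = (log_loss(w, b, val_set), w, b)
    bad_epochs = 0
    data = list(train_set)
    for _ in range(max_epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean cross-entropy w.r.t. w and b.
            grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / len(batch)
            grad_b = sum(sigmoid(w * x + b) - y for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        val_loss = log_loss(w, b, val_set)
        if val_loss < best[0] - 1e-6:
            best, bad_epochs = (val_loss, w, b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # early stopping: validation loss no longer improves
    return best


def make_data(n, rng):
    """Toy data (an assumption): the label is 1 when a noisy copy of x is positive."""
    points = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        y = 1 if x + rng.gauss(0, 0.5) > 0 else 0
        points.append((x, y))
    return points


rng = random.Random(42)
train_set, val_set = make_data(200, rng), make_data(100, rng)

# Grid search: try every combination and keep the lowest validation loss.
results = {}
for lr in (0.01, 0.1, 1.0):
    for batch_size in (1, 16):
        val_loss, w, b = train(train_set, val_set, lr, batch_size)
        results[(lr, batch_size)] = val_loss

best_config = min(results, key=results.get)
print("best (learning rate, batch size):", best_config)
```

Selecting hyperparameters on a held-out validation set, rather than on the training loss, is what ties the grid search to generalisation.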


  • Introduction

    Neural Network Properties

    1-layer networks can only solve linearly separable problems (the decision boundary is a hyperplane)

    2-layer networks with a non-linear activation function can approximate any continuous function (given a sufficiently large number of hidden neurons)

    With more than 2 layers, fewer nodes are needed → therefore one wants to have deep neural networks
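XOR is the classic illustration of the first two points. The sketch below is an illustration under assumptions, not part of the lecture: it uses threshold units, hand-picked weights for the two-layer network, and a coarse weight grid for the single unit. It demonstrates the XOR case only and is not a proof of the general approximation result.

```python
from itertools import product


def step(z: float) -> int:
    """Threshold activation: the non-linearity used in this sketch."""
    return 1 if z > 0 else 0


def one_layer(x1, x2, w1, w2, b):
    """Single threshold unit; its decision boundary is the line w1*x1 + w2*x2 + b = 0."""
    return step(w1 * x1 + w2 * x2 + b)


def two_layer(x1, x2):
    """Two hidden threshold units (OR and AND) followed by one output unit."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND = XOR


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Brute-force search over a coarse weight grid: no single unit reproduces XOR.
grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
one_layer_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(one_layer(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
]
print("single-unit solutions found on the grid:", len(one_layer_solutions))

two_layer_ok = all(two_layer(x1, x2) == y for (x1, x2), y in XOR.items())
print("two-layer network reproduces XOR:", two_layer_ok)
```

The grid search finds nothing because XOR is not linearly separable, so no choice of weights can work for a single unit. Adding one hidden layer is enough to solve it.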


  • Introduction

    Neural Network Properties

    Backpropagation does not work well for more than 2 layers

      • Non-convex optimization function
      • Uses only local gradient information
      • Depends on initialisation
      • Gets trapped in local minima
      • Generalisation is poor
      • Cumulative backpropagation error signals either shrink rapidly or grow out of bounds (exponentially) (Hochreiter, 1991)

    Severity increases with the number of layers

    Focus shifted to convex optimization problems (e.g., SVM)
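The shrinking or growing error signal can be reproduced with a deliberately simplified model. The sketch below assumes a chain of scalar sigmoid units that all share one weight and one bias, which is an assumption made for illustration and not the setting analysed by Hochreiter. By the chain rule, the backpropagated signal is multiplied at every layer by the weight times the sigmoid derivative, so it changes exponentially with depth.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def backprop_factor(weight: float, bias: float, depth: int, x: float = 0.5) -> float:
    """d(output)/d(input) for a chain of `depth` scalar sigmoid units.

    Each layer computes h = sigmoid(weight * h_prev + bias); by the chain
    rule the backpropagated signal is multiplied by weight * h * (1 - h)
    at every layer.
    """
    h, grad = x, 1.0
    for _ in range(depth):
        h = sigmoid(weight * h + bias)
        grad *= weight * h * (1.0 - h)
    return grad


for depth in (2, 5, 10, 20):
    # bias = -weight / 2 keeps every activation at 0.5, where the sigmoid is steepest.
    small = backprop_factor(weight=1.0, bias=-0.5, depth=depth)  # factor 0.25 per layer
    large = backprop_factor(weight=8.0, bias=-4.0, depth=depth)  # factor 2.0 per layer
    print(f"depth {depth:2d}: small weights {small:.3e}, large weights {large:.3e}")
```

Because the sigmoid derivative is at most 0.25, the signal shrinks unless the weights are large, and with large weights it grows instead. In both cases the effect compounds with every additional layer.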


  • Deep Learning

    Deep Learning Approaches: overview of the most common techniques

  • Deep Learning: Definition

    Definition of Deep Learning

    Several definitions exist. Two key aspects:

    1 models consisting of multiple layers or stages of nonlinear information processing

    2 methods for supervised or unsupervised learning of feature representations at successively higher, more abstract layers
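The first aspect, multiple stages of nonlinear processing, can be pictured as function composition. The sketch below assumes a small feed-forward network with tanh units as one possible instance of the definition. The parameters are hypothetical and hand-picked; in an actual deep learning method they would be learned from data.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def layer(weights: Matrix, biases: Vector, inputs: Vector) -> Vector:
    """One stage of nonlinear processing: tanh(W x + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def deep_model(params: List[Tuple[Matrix, Vector]], inputs: Vector) -> List[Vector]:
    """Apply the layers in sequence and return every intermediate representation."""
    representations = [inputs]
    for weights, biases in params:
        representations.append(layer(weights, biases, representations[-1]))
    return representations


# Hypothetical, hand-picked parameters for a 3 -> 2 -> 2 -> 1 model.
params = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[0.7, -0.3]], [0.2]),
]

representations = deep_model(params, [1.0, 2.0, 3.0])
for depth, h in enumerate(representations):
    print(f"layer {depth}: {h}")
```

Each entry of `representations` is computed only from the one before it, which is the sense in which higher layers hold more abstract features of the input.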

    Deep learning architectures originated from, but are not limited to, artificial neural networks

    Contrasted with conventional shallow learning approaches

    Not to be confused with deep learning in educational psychology:

    “Deep learning describes an approach to learning that is characterized by active engagement, intrinsic motivation, and a personal search for meaning.”


  •