Statistical Machine Learning: an overview
Andreas Svensson
Department of Information Technology, Uppsala University
[email protected] Department of Information Technology, Uppsala University
Machine Learning: a very subjective view
'Classical' approach: first-principles modeling, followed by calibration, simulations, and predictions. The first principles are physical laws such as Kirchhoff's laws, the ideal gas law, Newton's laws of motion, the Stefan-Boltzmann law, the drag equation, Stokes' law, Graham's law, . . . Data enters by calibrating a model with unknown parameters.

Statistical machine learning approach: flexible black-box modeling, followed by learning, simulations, and predictions. Data enters by learning a model from a flexible model structure.

- Level of detailed knowledge present?
- Model reduction: manual or automated?
- When to make approximations?
- Interpretation of the model?
- Representing uncertainty?
Outline
- Introduction
- Deep learning
- Gaussian processes
- Outlook
Introduction to neural networks (I/III)
What is a neural network?
A neural network (NN) is a nonlinear function y = g_\theta(x) from an input variable x to an output variable y, parameterized by \theta.
Linear regression models the relationship between the input x and the output y as a linear combination,

y = \sum_{i=1}^{n} x_i \theta_i + \theta_0 = x^T \theta,

where the parameter vector \theta is composed of the "weights" \theta_i and the offset ("bias") term \theta_0,

\theta = (\theta_0 \; \theta_1 \; \theta_2 \; \cdots \; \theta_n)^T, \quad x = (1 \; x_1 \; x_2 \; \cdots \; x_n)^T.
Introduction to neural networks (II/III)
1. Form m_1 linear combinations of the input,

   a_j^{(1)} = \sum_{i=1}^{n} \theta_{ji}^{(1)} x_i + \theta_{j0}^{(1)} = x^T \theta_j^{(1)}, \quad j = 1, \dots, m_1.

2. Apply a (simple) nonlinear transformation,

   z_j^{(1)} = f(a_j^{(1)}), \quad j = 1, \dots, m_1.

   (common choices: f(a) = 1/(1 + e^{-a}), f(a) = \tanh(a) or f(a) = \max(0, a))

3. Form a linear combination of the z_j^{(1)}s,

   y = \sum_{j=1}^{m_1} \theta_j^{(2)} z_j^{(1)} + \theta_0^{(2)} = z^{(1)T} \theta^{(2)}.
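A minimal sketch of these three steps as code, assuming f = tanh and randomly drawn weights purely for illustration:

```python
import numpy as np

def forward(x, Theta1, theta1_0, theta2, theta2_0, f=np.tanh):
    a1 = Theta1 @ x + theta1_0     # step 1: m1 linear combinations a_j^(1)
    z1 = f(a1)                     # step 2: elementwise nonlinearity z_j^(1)
    return theta2 @ z1 + theta2_0  # step 3: linear combination of the z_j^(1)s

# Randomly drawn weights, purely to show the shapes involved
rng = np.random.default_rng(1)
n, m1 = 4, 8
x = rng.normal(size=n)
y = forward(x, rng.normal(size=(m1, n)), rng.normal(size=m1),
            rng.normal(size=m1), rng.normal())
print(y)
```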
Introduction to neural networks (III/III)
y(\theta) = \sum_{j=1}^{m_1} \theta_j^{(2)} \underbrace{f\bigg( \sum_{i=1}^{n} \theta_{ji}^{(1)} x_i + \theta_{j0}^{(1)} \bigg)}_{z_j^{(1)}} + \theta_0^{(2)}

[Figure: the network as a graph, with inputs x_1, x_2, . . . , x_n, a hidden layer z^{(1)} of m_1 units f, and an output layer producing y; weights \theta_{11}^{(1)}, . . . , \theta_{m_1 n}^{(1)} connect the inputs to the hidden layer, and \theta_1^{(2)}, . . . , \theta_{m_1}^{(2)} connect the hidden layer to the output.]
Deep neural networks
[Figure: a deep neural network with inputs x_1, x_2, x_3, . . . , x_n, hidden layers z^{(1)}_1, . . . , z^{(1)}_{m_1} through z^{(8)}_1, . . . , z^{(8)}_{m_8}, and an output y.]
How to learn the unknown parameters \theta_{ij}^{(d)}?
By differentiating the squared error (\hat{y} - y)^2 with respect to \theta_{ij}^{(d)}: backpropagation.
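A minimal sketch of such learning, assuming the one-hidden-layer network from above with f = tanh and a toy sin(x) regression target; the hand-derived gradients below implement the backpropagation rules for the mean squared error, with all data and hyperparameters chosen purely for illustration.

```python
import numpy as np

# Toy setup: one hidden layer, tanh nonlinearity, squared-error loss.
rng = np.random.default_rng(2)
n, m1, N = 1, 10, 100
X = rng.uniform(-3, 3, size=(N, n))
y = np.sin(X[:, 0])                           # assumed regression target

Theta1 = 0.5 * rng.normal(size=(m1, n)); b1 = np.zeros(m1)   # b ~ theta_{j0}
theta2 = 0.5 * rng.normal(size=m1);      b2 = 0.0
lr = 0.01                                     # gradient-descent step size

for epoch in range(2000):
    A1 = X @ Theta1.T + b1                    # (N, m1) pre-activations
    Z1 = np.tanh(A1)
    y_hat = Z1 @ theta2 + b2                  # (N,) predictions

    # Backpropagation of the mean squared error
    e = 2 * (y_hat - y) / N                   # d(MSE)/d(y_hat)
    g_theta2 = Z1.T @ e
    g_b2 = e.sum()
    dA1 = np.outer(e, theta2) * (1 - Z1**2)   # tanh'(a) = 1 - tanh(a)^2
    g_Theta1 = dA1.T @ X
    g_b1 = dA1.sum(axis=0)

    Theta1 -= lr * g_Theta1; b1 -= lr * g_b1
    theta2 -= lr * g_theta2; b2 -= lr * g_b2

print("final MSE:", np.mean((y_hat - y) ** 2))
```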
Deep learning: Image classification
Input: pixels of an image. Output: object identity. Each hidden layer extracts increasingly abstract features.
M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks, Computer Vision - ECCV.
Deep learning: A recent example
An AI defeated a human professional for the first time in the game of Go.
D. Silver et al. (2016) Mastering the game of Go with deep neural networks and tree search, Nature, vol 529.
Deep learning—Why now?
Neural networks have been around for more than fifty years. Why have they become so popular now (again)?
To solve really interesting problems you need:
1. Efficient learning algorithms
2. Efficient computational hardware
3. A lot of labeled data!
These requirements were not met to a satisfactory level until the last 5-10 years.
Outline
- Introduction
- Deep learning
- Gaussian processes
- Outlook
GP: a probability distribution over functions
Gaussian distribution for f_1,

p(f_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(f_1 - \mu)^2\Big)

Multivariate Gaussian distribution for f = [f_1 \; f_2]^T,

p(f) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\!\Big(-\frac{1}{2}(f - \mu)^T \Sigma^{-1} (f - \mu)\Big)

Multivariate Gaussian distribution for f = [f_1 \; f_2 \; f_3 \; f_4 \; f_5]^T,

p(f) = \frac{1}{(2\pi)^{5/2} |\Sigma|^{1/2}} \exp\!\Big(-\frac{1}{2}(f - \mu)^T \Sigma^{-1} (f - \mu)\Big)

Gaussian process distribution for f(x): for any finite set of d test inputs x_\star,

p(f(x_\star)) = \frac{1}{(2\pi)^{d/2} |K(x_\star, x_\star)|^{1/2}} \exp\!\Big(-\frac{1}{2}(f(x_\star) - \mu(x_\star))^T K(x_\star, x_\star)^{-1} (f(x_\star) - \mu(x_\star))\Big)
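A sketch of what this means in practice: evaluate a covariance function on a grid of inputs and draw joint Gaussian samples of the function values. The squared-exponential kernel is one common choice; the grid, hyperparameters, and jitter term below are illustrative assumptions.

```python
import numpy as np

def k_se(x1, x2, ell=1.0, sf2=1.0):
    # Squared-exponential covariance k(x, x') = sf2 * exp(-(x - x')^2 / (2 ell^2))
    return sf2 * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

x_star = np.linspace(0, 5, 100)
mu = np.zeros_like(x_star)                             # mean function mu(x) = 0
K = k_se(x_star, x_star) + 1e-8 * np.eye(len(x_star))  # jitter for stability

rng = np.random.default_rng(3)
samples = rng.multivariate_normal(mu, K, size=5)       # five draws of f(x_star)
print(samples.shape)                                   # (5, 100)
```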
GP: a probability distribution over functions
[Figure: the GP distribution for f(x) conditioned on one observation (orange dot); axes x and f(x).]
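A sketch of this conditioning step, using the standard Gaussian conditioning formulas (cf. Rasmussen and Williams, 2006, Algorithm 2.1); the kernel, the noise variance sn2, and the data points are assumptions for illustration.

```python
import numpy as np

def k_se(x1, x2, ell=1.0, sf2=1.0):
    return sf2 * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

X = np.array([1.0, 2.5, 4.0])      # observed inputs
y = np.array([0.3, -0.8, 0.5])     # observed function values
x_star = np.linspace(0, 5, 100)    # test inputs
sn2 = 1e-4                         # observation-noise variance

K = k_se(X, X) + sn2 * np.eye(len(X))
K_s = k_se(X, x_star)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

mean = K_s.T @ alpha                          # posterior mean of f(x_star)
v = np.linalg.solve(L, K_s)
cov = k_se(x_star, x_star) - v.T @ v          # posterior covariance
print(mean.shape, cov.shape)                  # (100,), (100, 100)
```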
Gaussian Processes: a flexible model
[Figure: four panels of samples of f(x) drawn from GPs with different hyperparameter settings; axes x and f(x).]

There are 'tuning knobs' (hyperparameters) for the smoothness etc.
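One such knob, illustrated in a short sketch under the squared-exponential kernel assumption: the lengthscale ell controls how rapidly samples vary, with small ell giving wiggly functions and large ell giving smooth ones.

```python
import numpy as np

def k_se(x1, x2, ell):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

x = np.linspace(0, 5, 200)
rng = np.random.default_rng(4)
for ell in (0.1, 0.5, 2.0):
    K = k_se(x, x, ell) + 1e-8 * np.eye(len(x))
    f = rng.multivariate_normal(np.zeros(len(x)), K)
    # Mean step-to-step change as a crude roughness proxy
    print(f"ell = {ell}: mean |delta f| = {np.mean(np.abs(np.diff(f))):.3f}")
```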
Gaussian Processes: Fault detection I
O. Samuelsson et al. (2016) Improved monitoring and fault detection of wastewater treatment processes with Gaussian process regression, manuscript.
Gaussian Processes: Fault detection II
[Figure: two panels of dissolved oxygen (mg/l) versus time (hours), over 0-180 hours.]
A. Svensson, J. Dahlin and T. B. Schön (2015) Marginalizing Gaussian Process Hyperparameters using Sequential Monte Carlo, IEEE CAMSAP.
GP optimization
- Optimization of some parameters
- Expensive to evaluate the objective function (e.g., a lengthy computer simulation); see the sketch below
M. Osborne, R. Garnett and S. Roberts (2009) Gaussian Processes for Global Optimization, LION3
B. Shahriari et al. (2016) Taking the Human Out of the Loop: A Review of Bayesian Optimization, Proceedings of the IEEE, vol 104.
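A minimal Bayesian-optimization sketch in the spirit of these references: fit a GP to the evaluations gathered so far, then pick the next query point by expected improvement on a grid. The objective function, kernel, grid, and all settings are illustrative assumptions, not the specific methods of the cited papers.

```python
import numpy as np
from scipy.stats import norm

def k_se(x1, x2, ell=0.5):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

def objective(x):                      # stand-in for an expensive black box
    return np.sin(3 * x) + 0.3 * x**2

grid = np.linspace(-2, 2, 400)
X = np.array([-1.5, 0.0, 1.5])         # initial design
y = objective(X)

for _ in range(10):
    # GP posterior mean and variance on the grid (noise-free, jittered)
    K = k_se(X, X) + 1e-6 * np.eye(len(X))
    K_s = k_se(X, grid)
    alpha = np.linalg.solve(K, y)
    mu = K_s.T @ alpha
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    sd = np.sqrt(np.maximum(var, 1e-12))

    # Expected improvement over the best value so far (minimization)
    best = y.min()
    z = (best - mu) / sd
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)

    x_next = grid[np.argmax(ei)]       # query where EI is largest
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print("best found:", X[np.argmin(y)], y.min())
```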
Outline
- Introduction
- Deep learning
- Gaussian processes
- Outlook
Of course there is more...
- Regression
- Classification
  - Support vector machines
  - Decision trees
  - Boosting
- Clustering
- Reinforcement learning
- Probabilistic programming
- Computational methods
  - Markov chain Monte Carlo
  - Sequential Monte Carlo
  - Variational inference
Learn more
General
Z. Ghahramani (2015) Probabilistic machine learning and artificial intelligence, Nature, vol 521
G. James, D. Witten, T. Hastie and R. Tibshirani (2013) An introduction to statistical learning with applications in R, Springer
C. Bishop (2006) Pattern recognition and machine learning, Springer
Deep learning
I. Goodfellow, Y. Bengio and A. Courville (2016) Deep learning, Book in preparation for MIT Press, http://www.deeplearningbook.org/
Y. LeCun, Y. Bengio and G. Hinton (2015) Deep learning, Nature, vol 521
http://deeplearning.net/
Gaussian processes
C. Rasmussen and K. Williams (2006) Gaussian processes for machine learning, MIT Press
http://gaussianprocess.org/
Conferences: Neural Information Processing Systems (NIPS, http://nips.cc/), the International Conference on Machine Learning (ICML, http://icml.cc/)
Journals: Journal of Machine Learning Research (JMLR), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
New course
Statistical Machine Learning 5 hp, January-March 2017
- Classical and Bayesian linear regression
- Classification via logistic regression
- Linear discriminant analysis
- Gaussian processes and kernel methods
- Cross-validation and model selection techniques
- Regularization (ridge regression and the LASSO)
- Regression and classification trees
- Principal component analysis
- k-means clustering
- Neural networks
http://www.it.uu.se/edu/course/homepage/sml
Thank you!
Questions?