marios polycarpou - msccis.ucy.ac.cy · • occam’s razor – is a principle that states that...

45
1 funded by: Machine Learning Lecture 1 Marios Polycarpou Director, KIOS Research and Innovation Center of Excellence Professor, Electrical and Computer Engineering University of Cyprus MSc on Intelligent Critical Infrastructure Systems

Upload: others

Post on 03-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

1

funded by:

Machine LearningLecture 1

Marios PolycarpouDirector, KIOS Research and Innovation Center of Excellence

Professor, Electrical and Computer Engineering

University of Cyprus

MSc on Intelligent Critical Infrastructure Systems

Page 2: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

2

Fourth Industrial Revolution

What is a technological revolution? 1st Industrial Revolution (1760) 2nd Industrial Revolution (1900) 3rd Industrial Revolution (1960) 4th Industrial Revolution (2000) – term coined in 2015 by Klaus Schwab,

executive director of the World Economic Forum. Combines hardware, software, and biology (cyber-physical systems), and emphasizes advances in communication and connectivity. It is expected to be marked by breakthroughs in fields such as machine learning, robotics, nanotechnology, biotechnology, the internet of things (IoT), wireless technologies (5G), 3D printing and fully autonomous vehicles.

Page 3: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

3

Sensor technology• wealth of sensors• new generation sensors

Information & Communication Technology (ICT)• store, process and transmit data collected by sensors

Motivation for Machine Learning

Page 4: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

4

Motivation for Machine Learning

Big Data• sensor technology and ICT enabled the collection of

extremely large data sets• contribute to the better perception of complex systems

Internet of Things (IoT)• sensor enabled devices

connected to the internet and able to communicate with each other

Page 5: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

5

Motivation for Machine Learning

Image processing Speech recognition Prediction in financial services Finding patterns in marketing and sales Improving healthcare Automation – no human intervention needed Continuous improvement Wide applications

Page 6: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

6

How is Machine Learning related to Intelligent Critical Infrastructure Systems?

Critical Infrastructure Systems (CIS)

Heterogeneous - Interdependent - InterconnectedRisks, Faults, Attacks

Intelligent Systems and Networks

Sensors, ActuatorsBig Data, Internet of Things

Monitoring, ControlManagement, Security

Information Communication Technologies (ICT)

Big Data Smart Decisions

Page 7: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

7

How is Machine Learning related to Intelligent Critical Infrastructure Systems?

Smart BuildingsSmart Grids

Page 8: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

8

How is Machine Learning related to Intelligent Critical Infrastructure Systems?

Water Systems

Revenue & Water Losses Water Quality Energy Consumption Safety & Security

Page 9: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

9

How is Machine Learning related to Intelligent Critical Infrastructure Systems?

Multi-Agent FormationIntelligent Transportation

Page 10: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

10

How is Machine Learning related to Intelligent Critical Infrastructure Systems?

Smart Cities

Physical Interconnections

Cyber Interconnections

Interdependencies

Page 11: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

11

Defining Machine Learning

A machine learning algorithm is an algorithm that is able to learn from data. Other related terms: data-driven methods, computational intelligence, artificial intelligence (AI).

In general, Artificial Intelligence and Computational Intelligence are bigger concepts aimed at creating intelligent machines that can simulate human thinking capability and behaviour. Machine Learning is a subset of AI that allows machines to learn from data without being programmed explicitly.

Page 12: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

12

Defining Machine Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Mitchell, 1997).

T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.

Page 13: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

13

The Task T

Classification Regression Monitoring and anomaly detection Transcription Speech recognition Machine translation Navigation and Control

Page 14: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

14

The Task T -- Classification

x 1, 2,y K Classifier

: 1, 2,nf K

1

2

n

xx

x

x

Features

1, 2,y K

Classes

Page 15: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

15

The Task T -- Regression

x yPredictor

: nf

Page 16: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

16

The Performance Measure P

Error rate; error properties Training set; Test set Different performance measures may be used What are norms?

Examples of norms ( )n R

11

: i

n

i

21

1/22:

n

ii

(1-norm) (2-norm)Euclidean norm

1: max ii n

(∞-norm)

Page 17: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

17

The Experience E

What type of dataset is available?

Types of Learning Supervised Learning (there exist labelled examples) Unsupervised Learning (clustering, dimensionality reduction) Semi-Supervised Learning Reinforcement Learning

Page 18: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

18

Machine Learning – Simple Example

Dataset:

1 2

( ) sin(2 )( )

( , , ) 10N

f x xy f xx x x x N

Page 19: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

19

Adaptive Approximation Model: polynomial

20 1 2

0

ˆ ( ; )M

M jM j

j

f x x x x x

Learning: minimizing a cost function (or error function)

2

1

1 ˆ( ) ( ; )2

N

n nn

J f x y

Machine Learning – Simple Example

Page 20: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

20

Machine Learning – Simple Example

underfitting underfitting

overfittingappropriate capacity

Page 21: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

21

underfitting and overfitting depends on both M and N

Machine Learning – Simple Example

Page 22: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

22

Capacity, Overfitting and Underfitting

• The ability to perform well on previously unobserved inputs is called generalization. This is a key challenge in machine learning.

• Training set vs. test set. Typically, the generalization error (test error) is measuring the performance on a test set.

• The expected test error is greater than or equal to the expected training error

The performance of a machine learning algorithm is based on both:1. Making the training error small2. Making the gap between the training and test error small

Page 23: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

23

Capacity, Overfitting and Underfitting

• Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should choose the simplest one. Named after William of Ockham (1287-1347).

• Statistical Learning Theory provides various means of quantifying model capacity. The Vapnik-Chervonenkis dimension (VC dimension) is the most well known. However, it is not very practical with advanced machine learning algorithms

Page 24: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

24

2 2

1

1 ˆ( ) ( ; )2 2

N

R n nn

J f x y

Regularizarion

Page 25: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

25

Hyperparameters and Validation Sets

• Hyperparameters – parameters that affect the performance of the algorithm but they are not adapted by the learning algorithm itself; e.g., the degree of the polynomial is a capacity hyperparameter; the value of λ in regularization is another hyperparameter.

• Validation Set – is used to select the hyperparameters. Split the training set into two disjoint subsets; one subset of data is used to learn the parameters, the other subset (validation set) is used to estimate the generalization error in order to update the hyperparameters; e.g.; 80% of data is used for training and 20% for validation.

• Three types of data: Training set, Validation set, Test set.

Page 26: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

26

Batch Learning vs. Online Learning

• Batch Learning – machine learning methods that are based on learning on the entire training data set.

• Online Learning – machine learning methods that are based on data becoming available in sequential order and is sued to update the parameters at each step. Data does not need to be stored after it is used to update the adjustable parameters

• Something in between batch learning and online learning is to use a window of data (not the entire data) to update the parameters.

Page 27: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

27

Mathematical Foundations for Machine Learning

Linear Algebra Optimization Approximation Theory Probability Theory Random Signals and Systems

Page 28: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

28

Optimization

Optimization is fundamental in multiple science and engineering fields Main idea is to take the derivative and set it to zero (but it soon becomes 

more complicating than that……)

Page 29: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

29

OptimizationOptimization problem:

minimize

subject to 0 1,2,

0 ', ' 1

, '

,,

nx

i

i

F x

c x i

c x i m m

m

n

R

:{ }:i

F xc

Objective function constraint functions

Some notation: : nF R R

Gradient of :F1 2

, , ,n

F F FFx x x

Hessian matrix:( symmetric matrix)

2 2 2

22

2

2

12

1 1

2

1 2

i j

nn

F F Fx x x x xn

FFx x

F Fx x x

n n

Page 30: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

30

Optimization

Necessary and sufficient conditions for unconstrained optimization

Global

minimum

(strong) local 

minimum(weak) local 

minima

f x

x

1

minimizex

f xR

(univariate case)

0

0

f x

f x

Necessary conditions for         to be a minimum

0

0

f x

f x

Sufficient conditions for         to be a minimum

x

x

Page 31: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

31

Optimization

1 2

minmize

, ,,n

n

xF x

F x F x xx

R

(multivariate case)

2

0F x

F x

NecessaryConditions is positive semi‐definite

2

0F x

F x

SufficientConditions is positive definite

Page 32: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

32

Optimization

When is a matrix A positive (semi)‐definite?

is positive‐definite if                      for any non‐zero     .

is positive‐semidefinite if

Lemma:       

A ( 0)A 0x Ax x

A ( 0)A 0 .xx Ax

0 0 1, ,iA A i n

all the eigenvalues are positive

Page 33: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

33

OptimizationConvexity:The set       is convex if

A function                       is convex over       if 

Lemma: If                              and       is convex over        then      is a global minimum of         iff                          .

S (1 ) , , 0,1 .x y S x y S

convex Not convex

:F S R S (1 ) (1 ) , , 0,1F x y F x F y x y S

x y F x

F y

1( , )CF FF

xnR 0F x

Page 34: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

34

OptimizationExample: Quadratic function 1minimize

2nxc x x Gx

R

n n nc G R R symmetric

0F x c Gx F x Gx c

2F x G x F xis a minimum of 0

Gx cG

Note:               is convex  F x 0G

Warning:  not all functions that we want to minimize are smooth

Page 35: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

35

OptimizationContour plots:

for different values of      .

F x

2 21 2 1 2,F x x x x

1 2 21 2 1 2 1 2 2, 4 2 4 2 1xF x x e x x x x x

Page 36: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

36

Optimization Linear Programming:  (Simplex method)

Standard Form: 

Unconstrained Minimization: (Steepest descent;   Newton method)

Constrained Minimization: (Gradient Projection; Penalty methods ;   Lagrange methods)

minimize

subject to0

nxc x

Ax bx

R

, ,A b c are given.

minnxF x

R

minimize

0subject to

0

nxF x

c xd x

R

Page 37: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

37

Optimization

Optimization Algorithms

What are the performance criteria?• Convergence• Rate of convergence• Complexity

Page 38: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

38

Optimization

Algorithms for Unconstrained Minimization

(A) Steepest Descent Method (Gradient Method)

Start

minnxF x

R

1 20 kx x x x vectors

1k k k kx x F x 0k

k is determined be a linear search so that         minimizes           in 

the direction                         from       .1kx F x

k kd F x kx

Page 39: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

39

Optimization

Trade‐offs in choosing       :• speed• convergence• complexity

1 const.k k kx x F x

Simplest gradient descent: 

Page 40: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

40

Optimization

Intuition for Steepest Descent

x

x

x

1-dimentional xR

0x1x

x

2-dimentional 2xR

Contour lines

1 2,F x x

Page 41: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

41

Optimization

Example:  212

F x x 1

0(1 )

k k

k k k

kk

F x xx x

x x

x

Example:  1 2 21 2 1 2 1 2 2, 4 2 4 2 1xF x x e x x x x x

1

1

2 21 2 1 2 1 2

1 21 2

4 2 4 8 6 1,

4 4 2

x

x

e x x x x x xF x x

e x x

1k k k kx x F x

Page 42: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

42

OptimizationIn continuous‐time:

as

After rescaling:

11

k kk k k k

x xx F x F xx

0 x F x 00x x

x F x 0 is symmetric & positive definite

Example:  minnxx Gx F x

R

0; (0)F x G x xx Gx x

0Gtx t e x

Page 43: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

43

Optimization(B) Newton’s Method: min

nxF x

R( is convex)F

0F x (necessary condition)

2

( )k

k

x x

F x

k kdF x F x F x x xdx

Let 2:H x F x

Hessian

( ) 0k k kF x F x H x x x (we want)

1 10 ( )k k k k kF x F x H x x x

11k k k kx x H x F x (assuming exists) 1

kH x

Page 44: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

44

Optimization

In continuous‐time: 1 ; 0x H x F x

Example:  212

F x x

11 1F x x H x H x

11

1 0.k k k k

k k k

x x H x F xx x x

0 1 0x x

Convergence in one step!

Page 45: Marios Polycarpou - msccis.ucy.ac.cy · • Occam’s Razor – is a principle that states that among competing hypothesis that explain known observations equally well, we should

45

Optimization

Example:  1 2 21 2 1 2 1 2 2, 4 2 4 2 1xF x x e x x x x x

1

1

2 21 2 1 2 1 2

1 21 2

4 2 4 8 6 1,

4 4 2

x

x

e x x x x x xF x x

e x x

1

2 21 2 1 2 1 2 1 2

1 21 2

4 2 4 16 10 9 4 4 6,

4 4 6 4x x x x x x x x x

H x x ex x

1

1 22 2

1 2 1 2 1 221 22 1 2 1 2

1

1 2

4 (4 4 6), 4 2 4

(4 4 6)8( 2 2 )16 10 9

xx x

eH x x x x x xx xx x x x x

x x

11k k k kx x H x F x