cidu2011 banerjee intro to ml 01

120
Tutorial: Introduction to Machine Learning Arindam Banerjee [email protected] Dept of Computer Science & Engineering University of Minnesota, Twin Cities 2011 NASA Conference on Intelligent Data Understanding October 19, 2011

Upload: yahamid

Post on 21-Jul-2016

12 views

Category:

Documents


7 download

DESCRIPTION

Its is an introduction to machine learning.

TRANSCRIPT

Page 1: Cidu2011 Banerjee Intro to Ml 01

Tutorial Introduction to Machine Learning

Arindam Banerjee banerjeecsumnedu

Dept of Computer Science amp Engineering

University of Minnesota Twin Cities

2011 NASA Conference on Intelligent Data Understanding

October 19 2011

Success Stories

Parent of a baby

Medical Doctor Intro to Machine Learning 2

Success Stories

Bioinformatics

Computer vision

Medical imaging

Social network analysis

From UCLA Center on Aging

Intro to Machine Learning 3

Types of Data

bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial

Spatiotemporal

bull Data Dependencies ndash Independent and Identically

Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies

bull Other Considerations ndash Observable vs Latent variables

Average wind direction

From NOAA Earth System Research Laboratory

Average annual temperature

Intro to Machine Learning 4

bull Supervised ndash Given learn ndash Examples Classification Regression

bull Unsupervised

ndash Given find structure in X ndash Examples Clustering Embedding

bull Semi-supervised ndash Given and Learn

Types of Learning

Intro to Machine Learning 5

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 2: Cidu2011 Banerjee Intro to Ml 01

Success Stories

Parent of a baby

Medical Doctor Intro to Machine Learning 2

Success Stories

Bioinformatics

Computer vision

Medical imaging

Social network analysis

From UCLA Center on Aging

Intro to Machine Learning 3

Types of Data

bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial

Spatiotemporal

bull Data Dependencies ndash Independent and Identically

Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies

bull Other Considerations ndash Observable vs Latent variables

Average wind direction

From NOAA Earth System Research Laboratory

Average annual temperature

Intro to Machine Learning 4

bull Supervised ndash Given learn ndash Examples Classification Regression

bull Unsupervised

ndash Given find structure in X ndash Examples Clustering Embedding

bull Semi-supervised ndash Given and Learn

Types of Learning

Intro to Machine Learning 5

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 3: Cidu2011 Banerjee Intro to Ml 01

Success Stories

Bioinformatics

Computer vision

Medical imaging

Social network analysis

From UCLA Center on Aging

Intro to Machine Learning 3

Types of Data

bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial

Spatiotemporal

bull Data Dependencies ndash Independent and Identically

Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies

bull Other Considerations ndash Observable vs Latent variables

Average wind direction

From NOAA Earth System Research Laboratory

Average annual temperature

Intro to Machine Learning 4

bull Supervised ndash Given learn ndash Examples Classification Regression

bull Unsupervised

ndash Given find structure in X ndash Examples Clustering Embedding

bull Semi-supervised ndash Given and Learn

Types of Learning

Intro to Machine Learning 5

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 4: Cidu2011 Banerjee Intro to Ml 01

Types of Data

bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial

Spatiotemporal

bull Data Dependencies ndash Independent and Identically

Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies

bull Other Considerations ndash Observable vs Latent variables

Average wind direction

From NOAA Earth System Research Laboratory

Average annual temperature

Intro to Machine Learning 4

bull Supervised ndash Given learn ndash Examples Classification Regression

bull Unsupervised

ndash Given find structure in X ndash Examples Clustering Embedding

bull Semi-supervised ndash Given and Learn

Types of Learning

Intro to Machine Learning 5

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 5: Cidu2011 Banerjee Intro to Ml 01

bull Supervised ndash Given learn ndash Examples Classification Regression

bull Unsupervised

ndash Given find structure in X ndash Examples Clustering Embedding

bull Semi-supervised ndash Given and Learn

Types of Learning

Intro to Machine Learning 5

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 6: Cidu2011 Banerjee Intro to Ml 01

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Classification

Regression

Regularization

Example Land Variable Regression

Nonlinear Models Ensembles

6

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 7: Cidu2011 Banerjee Intro to Ml 01

Classification Linear Models

bull Linear Classification Rule ndash If then ndash If then

bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps

bull Two Aspects Optimization Statistical guarantees

Intro to Machine Learning

+1

-1

7

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 8: Cidu2011 Banerjee Intro to Ml 01

Regression Linear Models

bull Linear Regression

ndash Model

bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise

bull Two Aspects Optimization Statistical Guarantees

Intro to Machine Learning 8

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 9: Cidu2011 Banerjee Intro to Ml 01

Hierarchical Linear Models

bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value

etc) is fed into next node

bull Family of modelsmethods ndash ClassificationRegression trees

Multi-layer Perceptrons Deep Belief Networks

bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees

Intro to Machine Learning 9

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 10: Cidu2011 Banerjee Intro to Ml 01

bull Controlling complexity of predictor is key

ndash Low loss with low complexity predictor guarantees good generalization

bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models

Regularization Generalization

Intro to Machine Learning

x

y

Loss term Regularization term

10

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 11: Cidu2011 Banerjee Intro to Ml 01

bull The Kernel Trick

ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces

bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels

Non-linear Methods

Intro to Machine Learning 11

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 12: Cidu2011 Banerjee Intro to Ml 01

bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging

bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features

Ensembles

Intro to Machine Learning 12

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 13: Cidu2011 Banerjee Intro to Ml 01

High Dimensional Modeling

bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees

bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression

Intro to Machine Learning 13

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 14: Cidu2011 Banerjee Intro to Ml 01

Land Variable Regression

bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)

bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative

Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations

bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations

bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant

Intro to Machine Learning 14

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 15: Cidu2011 Banerjee Intro to Ml 01

Land Regions for Prediction Temp Precip

Intro to Machine Learning 15

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 16: Cidu2011 Banerjee Intro to Ml 01

Ordinary Least Squares

bull Naive method Use ordinary least squares (OLS)

bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model

θ

Location 1 Location L

Response Variable

16

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 17: Cidu2011 Banerjee Intro to Ml 01

Sparse Group Lasso (SGL)

bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo

Location 1 Location L

Response Variable 17

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 18: Cidu2011 Banerjee Intro to Ml 01

Example Land Variable Regression

Relevant Variables for Temperature Prediction in Brazil

Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees

18

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 19: Cidu2011 Banerjee Intro to Ml 01

Error (RMSE) SGL NC OLS

Variable Region Sparse Group Lasso Network Clusters Ordinary Least

Squares A

ir Te

mpe

ratu

re

Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383

Prec

ipita

tion

Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 20: Cidu2011 Banerjee Intro to Ml 01

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Basics

Inference

Example Matrix Completion

Example Drought Detection

Example Text Analysis

20

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 21: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Graphical Models What and Why bull Graphical models

ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult

bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency

bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)

bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions

X1

X3

X4 X5

X2

21

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 22: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Example Burglary Network

22

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 23: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Bayesian Networks

bull Joint distribution in terms of P(X|Parents(X))

X1

X3

X4 X5

X2

23

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 24: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Example II Rain Network

24

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 25: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Latent Variable Models

bull Bayesian network with hidden variables ndash Semantically more accurate less parameters

Heart disease

25

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 26: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Inference

bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called

bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)

26

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 27: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Inference Algorithms bull Graphs without loops Tree-structured Graphs

ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases

bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)

bull Graphs with loops ndash Junction tree algorithms

bull Convert into a graph without loops bull May lead to exponentially large graph

ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice

ndash Approximate inference

27

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 28: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Approximate Inference bull Variational Inference

ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions

bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation

bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)

bull Powerful family of methods ndash Gibbs sampling

bull Useful special case of MCMC methods

28

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 29: Cidu2011 Banerjee Intro to Ml 01

The Inference Problem

Intro to Machine Learning 29

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 30: Cidu2011 Banerjee Intro to Ml 01

Bayes Nets to Factor Graphs

Intro to Machine Learning 30

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 31: Cidu2011 Banerjee Intro to Ml 01

Factor Graphs Product of Local Functions

Intro to Machine Learning 31

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 32: Cidu2011 Banerjee Intro to Ml 01

Marginalize Product of Functions (MPF)

bull Marginalize (or maximize) product of functions

bull For g1(x1) we have bull For g3(x3) we have

Intro to Machine Learning

Distributive Law ab + ac = a(b+c)

32

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 33: Cidu2011 Banerjee Intro to Ml 01

Example Computing g1(x1)

Intro to Machine Learning 33

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 34: Cidu2011 Banerjee Intro to Ml 01

Compute product of descendants with f Then do not-sum over parent

Message Passing

Intro to Machine Learning

Compute product of descendants

34

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 35: Cidu2011 Banerjee Intro to Ml 01

The Sum-Product Algorithm

bull To compute gi(xi) form a tree rooted at xi

bull Starting from the leaves apply the following two rules ndash Product Rule

At a variable node take the product of descendants ndash Sum-product Rule

At a factor node take the product of f with descendants then perform not-sum over the parent node

bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc

Intro to Machine Learning 35

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 36: Cidu2011 Banerjee Intro to Ml 01

Example Computing g3(x3)

Intro to Machine Learning 36

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 37: Cidu2011 Banerjee Intro to Ml 01

Hidden Markov Models (HMMs)

Intro to Machine Learning

Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT

Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T

p(z1T|x1T)

Similar problems for chain-structured Conditional Random Fields (CRFs)

37

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 38: Cidu2011 Banerjee Intro to Ml 01

Distributive Law on Semi-Rings

bull Idea can be applied to any commutative semi-ring bull Semi-ring 101

ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)

Intro to Machine Learning

bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip

38

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 39: Cidu2011 Banerjee Intro to Ml 01

Message Passing in General Graphs

bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters

bull General Graphs ndash Active research topic

bull Progress has been made over the past 10 years ndash Loopy Message passing

bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo

ndash New approaches to convergent and correct message passing

bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter

Intro to Machine Learning 39

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 40: Cidu2011 Banerjee Intro to Ml 01

Temporal Models

bull State Space Models

bull Time Series Analysis

Intro to Machine Learning

State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems

Time Domain Models

ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting

Spectral Analysis

Spectral Density Fourier Transforms Wavelet Transforms Filtering

40

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 41: Cidu2011 Banerjee Intro to Ml 01

Spatial Models

bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)

bull Models for Discrete Data ndash Markov Random Fields (MRFs)

bull Structure Learning ndash Finding statistical dependencies

Intro to Machine Learning

From H Wallach Introduction to Gaussian process regression 2005

Gau

ssia

n P

roce

ss

41

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 42: Cidu2011 Banerjee Intro to Ml 01

Drought Detection

bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences

bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl

bull Dataset Climate Research Unit

ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude

Intro to Machine Learning 42

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 43: Cidu2011 Banerjee Intro to Ml 01

Detection with Markov Random Field (MRF)

bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)

bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018

bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200

bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 44: Cidu2011 Banerjee Intro to Ml 01

Drought Regions from MRFs

Intro to Machine Learning 44

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 45: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in the Sahel in the 1970s

Drought in Mongolia in the 1970s

Drought in northern Russia in the 1970s

Intro to Machine Learning 45

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 46: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in northwest America in the 1920s

Drought in southern Africa in the 1920s

Drought in eastern China in the 1920s

The Dust Bowl in Midwest US in the 1930s

Drought in central Canada in the 1920s

Intro to Machine Learning 46

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 47: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in Iran In the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 47

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 48: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in north America in the 1910s

Drought in Turkmenistan and Afghanistan in the 1910s

Drought in former Soviet Union in the 1910s

Drought in northeast China in the 1920s

Intro to Machine Learning 48

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 49: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in central Africa In the 1960s

Drought in India in the 1960s

Intro to Machine Learning 49

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 50: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in eastern Africa in the 1930s

Drought in Kazakhstan In the 1930s

Intro to Machine Learning 50

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 51: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in northern Mexico in the 1950s

Drought in western Europe in the 1940s

Drought in eastern Canada in the 1940s

Intro to Machine Learning 51

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 52: Cidu2011 Banerjee Intro to Ml 01

Results Drought Detection

Drought in Iran in the 1950s

Drought in central China in the 1950s

Intro to Machine Learning 52

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 53: Cidu2011 Banerjee Intro to Ml 01

Graphical Models (Contd)

bull Basics Inference

bull Markov Random Fields ndash Example Draught Detection

bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling

bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems

Intro to Machine Learning 53

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 54: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Background Plate Diagrams

a

b

3

a

b1 b2 b3

Compact representation of large Bayesian networks

54

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 55: Cidu2011 Banerjee Intro to Ml 01

θ

x

d n

d

Intro to Machine Learning

03

1

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Model 1 Independent Features

d=3 n=1

55

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 56: Cidu2011 Banerjee Intro to Ml 01

θ

x

d n

d

π

z

θ

x

d n

k d

Model 2 Naiumlve Bayes (Mixture Models)

56

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 57: Cidu2011 Banerjee Intro to Ml 01

-1

3

-05

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

57

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 58: Cidu2011 Banerjee Intro to Ml 01

01

21

-15

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Naiumlve Bayes Model

n k

π z θ x

d d

58

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 59: Cidu2011 Banerjee Intro to Ml 01

θ

x d

n

d

π

z

θ

x d

n

k d

π

z

x d

n

α

θ k d

Model 3 Mixed Membership Model

Intro to Machine Learning 59

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 60: Cidu2011 Banerjee Intro to Ml 01

07

31

-1

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

60

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 61: Cidu2011 Banerjee Intro to Ml 01

09

21

-2

x

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

-10 -8 -6 -4 -2 0 2 4 6 8 100

01

02

03

04

05

06

07

08

Intro to Machine Learning

Mixed Membership Models

π z x

d n

α θ

k d

61

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 62: Cidu2011 Banerjee Intro to Ml 01

01

21

-15

x

Intro to Machine Learning

Mixture Model vs Mixed Membership Model

-1

3

-05

x 07

31

-1

x 09

21

-2

x

Single component membership Multi-component mixed membership

62

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 63: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Nd D

zi

xi

π (d)

α

p(d) sim Dirichlet(α)

zi sim Discrete(π (d) )

xi sim Discrete(β (zi) )

K

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Latent Dirichlet Allocation (LDA)

βj

63

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 64: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

LDA Generative Model

document1 document2

64

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 65: Cidu2011 Banerjee Intro to Ml 01

Learning Inference and Estimation

Intro to Machine Learning 65

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 66: Cidu2011 Banerjee Intro to Ml 01

Variational Inference

Intro to Machine Learning 66

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 67: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Nd D

zi

xi

p (d)

φ (j)

α

β p(d) sim Dirichlet(α)

zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)

xi sim Discrete(φ (zi) )

T

distribution over topics for each document

topic assignment for each word

distribution over words for each topic

word generated from assigned topic

Dirichlet priors

Smoothed Latent Dirichlet Allocation

67

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 68: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Aviation Safety Reports (NASA)

68

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 69: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Results NASA Reports I

Arrival Departure

Passenger Maintenance

runway approach departure altitude

turn tower

air traffic control heading taxi way

flight

passenger attendant

flight seat

medical captain

attendants lavatory

told police

maintenance engine

mel zzz

air craft installed

check inspection

fuel Work

69

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 70: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Results NASA Reports II

Medical Emergency

Wheel Maintenance

Weather Condition

Departure

medical passenger

doctor attendant oxygen

emergency paramedics

flight nurse aed

tire wheel

assembly nut

spacer main axle bolt

missing tires

knots turbulence

aircraft degrees

ice winds wind speed

air speed conditions

departure sid

dme altitude

climbing mean sea level

heading procedure

turn degree

70

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 71: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight

While performing a sky diving a jet approaches at the same altitude but an

accident is avoided

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

71

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 72: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Altimeter has a problem but the pilot overcomes the

difficulty during the flight

During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out

the problem

Red Flight Crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

72

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 73: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

The pilot has a landing gear problem Maintenance crew

joins radio conversation to help

The captain has a medical emergency

Red Flight crew Blue Passenger Green Maintenance

Two-Dimensional Visualization for Reports

73

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 74: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Red Flight Crew Blue Passenger Green Maintenance

Mixed Membership of Reports

Flight Crew 07039 Passenger 00009 Maintenance 02953

Flight Crew 01405 Passenger 00663 Maintenance 07932

Flight Crew 02563 Passenger 06599 Maintenance 00837

Flight Crew 00013 Passenger 00013 Maintenance 09973

74

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 75: Cidu2011 Banerjee Intro to Ml 01

Generalizations

bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath

bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles

bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc

Intro to Machine Learning 75

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 76: Cidu2011 Banerjee Intro to Ml 01

Recommendation Systems

Intro to Machine Learning

Age 28 Gender Male Job Sales man Interest Travel hellip

Users

Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip

Movie ratings matrix 76

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 77: Cidu2011 Banerjee Intro to Ml 01

Advertisements on the Web

Intro to Machine Learning

1 2 001 hellip

01 2 3 hellip

2 2 05 hellip

02 03 15 2 hellip

25 1 hellip

15 1 004 hellip

Click-Through-Rate matrix

Category Sports shoes Brand Nike Ratings 425 hellip

Category Baby URL babyearthcom Content Webpage text Hyperlinks

Webpages

Products

hellip

77

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 78: Cidu2011 Banerjee Intro to Ml 01

Forest Ecology

Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density

2 3 5 hellip

4 1 2 hellip

3 3 hellip

1 1 3 2 hellip

4 2 1 hellip

1 1 3 hellip

Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)

Plants

Traits

78 Intro to Machine Learning

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 79: Cidu2011 Banerjee Intro to Ml 01

Matrix Factorization

bull Singular value decomposition

bull Problems

ndash Large matrices with millions of rowcolumns bull SVD can be rather slow

ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries

Intro to Machine Learning

asymp

79

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 80: Cidu2011 Banerjee Intro to Ml 01

Probabilistic Matrix Factorization (PMF)

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(0 σu2I)

N(0 σv2I)

uiT ~ N(0 σu

2I) vj ~ N(0 σv

2I) Rij ~ N(ui

Tvj σ2)

Inference using gradient descent

80 Intro to Machine Learning

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 81: Cidu2011 Banerjee Intro to Ml 01

Bayesian Probabilistic Matrix Factorization

Xij ~ N(ui

Tvj σ2)

uiT

vj

N(microu Λu)

N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui

~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui

Tvj σ2)

Wishart

Gaussian

Inference using MCMC 81 Intro to Machine Learning

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 82: Cidu2011 Banerjee Intro to Ml 01

Parametric PMF (PPMF)

bull Are the priors suitable Can we get faster algorithms

Intro to Machine Learning

uiT

vj

N(0 σu2I)

N(0 σv2I) PMF

Diagonal covariance

uiT

vj

N(microu Λu)

N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo

uiT

vj

N(microu Λu)

N(microv Λv)

Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo

82

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 83: Cidu2011 Banerjee Intro to Ml 01

Results PMF on Netflix

Intro to Machine Learning 83

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 84: Cidu2011 Banerjee Intro to Ml 01

Results Bayesian PMF on Netflix

Intro to Machine Learning 84

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 85: Cidu2011 Banerjee Intro to Ml 01

Results Bayesian PMF on Netflix

Intro to Machine Learning 85

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 86: Cidu2011 Banerjee Intro to Ml 01

Results PMF on Netflix

Intro to Machine Learning 86

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 87: Cidu2011 Banerjee Intro to Ml 01

Results Bayesian PMF on Netflix

Intro to Machine Learning 87

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 88: Cidu2011 Banerjee Intro to Ml 01

PPMF vs PMF BPMF

Intro to Machine Learning

PPMF mostly achieves higher accuracy

88

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 89: Cidu2011 Banerjee Intro to Ml 01

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Learning with Expert Advice

Online Convex Optimization

Example Portfolio Selection

89

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 90: Cidu2011 Banerjee Intro to Ml 01

Learning with Expert Advice

bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts

Intro to Machine Learning

e1

e2

e3

e4

xt1

xt2

xt3

xt4

wt1

wt2

wt3

wt4

y

90

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 91: Cidu2011 Banerjee Intro to Ml 01

x1 x2 x3 x4 x5 xT

Goal Maximize prediction accuracy

Today 2020

TM Choose best w in hindsight

OL Choose wt before seeing xt

w1 w2 w3 w4

x1 x2 x3 x4 wT

xT

Time Machine (TM) vs Online Learning (OL)

Guarantee OL will be competitive with TM

Base Models

91

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 92: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Online Concave Optimization (OCO)

92

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 93: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

OCO Good News and Bad News

93

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 94: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Portfolio Selection Notation and Definitions

94

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 95: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

The Portfolio Selection Problem

95

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 96: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Constant Rebalanced Portfolio (CRP)

96

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 97: Cidu2011 Banerjee Intro to Ml 01

ge

o(T)

minus

1T

log(wt sdot xt )t=1

T

sum ge1T

log(w sdot xt ) minusεt=1

T

sum

Regret Difference of log wealth between Best CRP and Online Strategy

log(wt sdot xt )t=1

T

sum

On-Line

sum=

sdotT

ttxw

1

)log(

Best CRP

Intro to Machine Learning

Regret Competing with the Best CRP

97

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 98: Cidu2011 Banerjee Intro to Ml 01

Datasets

Intro to Machine Learning

bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)

bull NYSE

ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974

98

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 99: Cidu2011 Banerjee Intro to Ml 01

Results UP EG

Intro to Machine Learning 99

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 100: Cidu2011 Banerjee Intro to Ml 01

Results UP EG ONS

Intro to Machine Learning 100

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 101: Cidu2011 Banerjee Intro to Ml 01

Results UP EG ONS AC30

Intro to Machine Learning 101

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 102: Cidu2011 Banerjee Intro to Ml 01

Results UP EG ONS AC30

Intro to Machine Learning 102

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 103: Cidu2011 Banerjee Intro to Ml 01

Best of both worlds

Competitive with best CRP

Strong empirical performance

Heuristic based methods

Strong Empirical Performance No performance guarantee

Universal Algorithms

Competitive with best CRP Can be outdone by simple heuristics

Intro to Machine Learning

Motivation for Meta Algorithms

103

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 104: Cidu2011 Banerjee Intro to Ml 01

Meta Algorithm

Base Algo 2

Base Algo 3

Base Algo m

Base Algo 1

hellip

pt1 ptm pt3 pt2

pt1x ptkxt pt3xt pt2xt

Day t

xt Wealth Gain

tjtjt xpr sdot=

m base algorithms

wt distribution over algorithms

w(1) w(2) w(3) w(m)

Weight update

Intro to Machine Learning

bull wt updated based on rt bull Good algorithms get more money

104

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 105: Cidu2011 Banerjee Intro to Ml 01

Results MAEG MAONS

Intro to Machine Learning 105

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 106: Cidu2011 Banerjee Intro to Ml 01

Results MAEG MAONS

Intro to Machine Learning 106

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 107: Cidu2011 Banerjee Intro to Ml 01

Results UP EG ONS AC30

Intro to Machine Learning 107

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 108: Cidu2011 Banerjee Intro to Ml 01

Results MAEG MAONS

Intro to Machine Learning 108

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 109: Cidu2011 Banerjee Intro to Ml 01

Results New Base Algorithm BAH(AC30)

Intro to Machine Learning 109

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 110: Cidu2011 Banerjee Intro to Ml 01

Results MAEG MAONS

Intro to Machine Learning 110

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 111: Cidu2011 Banerjee Intro to Ml 01

Results NYSE Base Algorithms

Intro to Machine Learning 111

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 112: Cidu2011 Banerjee Intro to Ml 01

Results MAEG MAONS

Intro to Machine Learning 112

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 113: Cidu2011 Banerjee Intro to Ml 01

Overview

Intro to Machine Learning

Predictive Models

Graphical Models

Online Learning

Exploratory Analysis

Clustering

Dimensionality Reduction

113

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 114: Cidu2011 Banerjee Intro to Ml 01

bull Mixture Modeling

ndash l ndash Expectation Maximization (EM) or SpectralRandom projections

bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation

Model-based Clustering

Intro to Machine Learning

n

x

θ h

π

k

114

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 115: Cidu2011 Banerjee Intro to Ml 01

Spectral Clustering

bull Weighted Graph with weights bull Spectral Clustering

ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means

Intro to Machine Learning

From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001

wwwcsberkeleyedu~fowlkesBSE

115

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 116: Cidu2011 Banerjee Intro to Ml 01

Linear Dimensionality Reduction

bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem

bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation

bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized

Intro to Machine Learning

PCA LDA

116

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 117: Cidu2011 Banerjee Intro to Ml 01

Nonlinear Dimensionality Reduction

bull Manifold Embedding

ndash Preserve local geometric structure

bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps

bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models

Intro to Machine Learning 117

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 118: Cidu2011 Banerjee Intro to Ml 01

Summary and Conclusions

bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip

bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity

bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies

bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts

Intro to Machine Learning 118

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 119: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning

Acknowledgements

Amrudin Agovic Qiang Fu Soumyadeep Chatterjee

Hanhuai Shan Puja Das

119

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120
Page 120: Cidu2011 Banerjee Intro to Ml 01

Intro to Machine Learning 120

  • Tutorial Introduction to Machine Learning
  • Success Stories
  • Success Stories
  • Types of Data
  • Types of Learning
  • Overview
  • Classification Linear Models
  • Regression Linear Models
  • Hierarchical Linear Models
  • Regularization Generalization
  • Non-linear Methods
  • Ensembles
  • High Dimensional Modeling
  • Land Variable Regression
  • Land Regions for Prediction Temp Precip
  • Ordinary Least Squares
  • Sparse Group Lasso (SGL)
  • Example Land Variable Regression
  • Error (RMSE) SGL NC OLS
  • Overview
  • Graphical Models What and Why
  • Example Burglary Network
  • Bayesian Networks
  • Example II Rain Network
  • Latent Variable Models
  • Inference
  • Inference Algorithms
  • Approximate Inference
  • The Inference Problem
  • Bayes Nets to Factor Graphs
  • Factor Graphs Product of Local Functions
  • Marginalize Product of Functions (MPF)
  • Example Computing g1(x1)
  • Message Passing
  • The Sum-Product Algorithm
  • Example Computing g3(x3)
  • Hidden Markov Models (HMMs)
  • Distributive Law on Semi-Rings
  • Message Passing in General Graphs
  • Temporal Models
  • Spatial Models
  • Drought Detection
  • Detection with Markov Random Field (MRF)
  • Drought Regions from MRFs
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Results Drought Detection
  • Graphical Models (Contd)
  • Background Plate Diagrams
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • Slide Number 58
  • Slide Number 59
  • Slide Number 60
  • Slide Number 61
  • Slide Number 62
  • Latent Dirichlet Allocation (LDA)
  • LDA Generative Model
  • Learning Inference and Estimation
  • Variational Inference
  • Smoothed Latent Dirichlet Allocation
  • Aviation Safety Reports (NASA)
  • Results NASA Reports I
  • Results NASA Reports II
  • Slide Number 71
  • Slide Number 72
  • Slide Number 73
  • Slide Number 74
  • Generalizations
  • Recommendation Systems
  • Advertisements on the Web
  • Forest Ecology
  • Matrix Factorization
  • Probabilistic Matrix Factorization (PMF)
  • Bayesian Probabilistic Matrix Factorization
  • Parametric PMF (PPMF)
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results Bayesian PMF on Netflix
  • Results PMF on Netflix
  • Results Bayesian PMF on Netflix
  • PPMF vs PMF BPMF
  • Overview
  • Learning with Expert Advice
  • Time Machine (TM) vs Online Learning (OL)
  • Online Concave Optimization (OCO)
  • OCO Good News and Bad News
  • Portfolio Selection Notation and Definitions
  • The Portfolio Selection Problem
  • Constant Rebalanced Portfolio (CRP)
  • Regret Competing with the Best CRP
  • Datasets
  • Results UP EG
  • Results UP EG ONS
  • Results UP EG ONS AC30
  • Results UP EG ONS AC30
  • Motivation for Meta Algorithms
  • Meta Algorithm
  • Results MAEG MAONS
  • Results MAEG MAONS
  • Results UP EG ONS AC30
  • Results MAEG MAONS
  • Results New Base Algorithm BAH(AC30)
  • Results MAEG MAONS
  • Results NYSE Base Algorithms
  • Results MAEG MAONS
  • Overview
  • Model-based Clustering
  • Spectral Clustering
  • Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Summary and Conclusions
  • Acknowledgements
  • Slide Number 120