Unit 3: Multiple Linear Regression
TRANSCRIPT
NYUWIRELESS
EL-GY 6143/CS-GY 6923: Introduction to Machine Learning
PROF. PEI LIU
Learning Objectives
• Formulate a machine learning problem as a multiple linear regression model.
  ◦ Identify the predictor vector and target for the problem.
• Write the regression model in matrix form; write the feature matrix.
• Compute the least-squares solution for the regression coefficients on training data.
• Derive the least-squares formula from minimization of the RSS.
• Manipulate 2D arrays in Python (indexing, stacking, computing shapes, ...).
• Compute the LS solution using Python linear algebra and machine learning packages.
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Example: Blood Glucose Level
• Diabetes patients must monitor their glucose level
• What causes blood glucose levels to rise and fall?
• Many factors
• We know the mechanisms qualitatively
• But quantitative models are difficult to obtain
  ◦ Hard to derive from first principles
  ◦ Difficult to model the physiological processes precisely
• Can machine learning help?
Data from AIM 94 Experiment
• Data collected as a series of events:
  ◦ Eating
  ◦ Exercise
  ◦ Insulin dosage
• Target variable: glucose level, monitored over time
Demo on GitHub
• All code is available on GitHub:
  https://github.com/pliugithub/MachineLearning/blob/master/unit03_mult_lin_reg/demo_glucose.ipynb
Using Google Colaboratory
• Two options for running the demos:
• Option 1:
  ◦ Clone the GitHub repository to your local machine and run it using Jupyter notebook
  ◦ Need to install all the software correctly
• Option 2: Run in the cloud on Google Colaboratory
• For Option 2:
  ◦ Go to https://colab.research.google.com/
  ◦ File -> Open Notebook
  ◦ Select the GitHub tab
  ◦ Enter the GitHub URL: https://github.com/sdrangan/introml
  ◦ Select unit03_mult_lin_reg/demo1_glucose.ipynb
Demo on Google Colab
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Simple vs. Multiple Regression
• Simple linear regression: one predictor (feature)
  ◦ Scalar predictor $x$
  ◦ Linear model: $\hat{y} = \beta_0 + \beta_1 x$
  ◦ Can only account for one variable
• Multiple linear regression: multiple predictors (features)
  ◦ Vector predictor $\mathbf{x} = (x_1, \ldots, x_k)$
  ◦ Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
  ◦ Can account for multiple predictors
  ◦ Reduces to simple linear regression when $k = 1$
Comparison to Single Variable Models
• We could compute a model for each variable separately:
  $y = a_1 + b_1 x_1, \quad y = a_2 + b_2 x_2, \quad \ldots$
• But this doesn't provide a way to account for joint effects
• Example: Consider three linear models for predicting longevity:
  ◦ A: Longevity vs. some factor in diet (e.g., amount of fiber consumed)
  ◦ B: Longevity vs. exercise
  ◦ C: Longevity vs. diet AND exercise
  ◦ What does C tell you that A and B do not?
Special Case: Single Variable
• Suppose $k = 1$ predictor.
• Feature matrix and coefficient vector:
  $A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$
• LS solution: $\hat{\beta} = \left(\tfrac{1}{n} A^T A\right)^{-1} \tfrac{1}{n} A^T y = P^{-1} r$, where
  $P = \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{bmatrix}, \quad r = \begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix}$
• Obtain the single-variable solutions for the coefficients (after some algebra):
  $\beta_1 = \frac{s_{xy}}{s_x^2}, \quad \beta_0 = \bar{y} - \beta_1 \bar{x}, \quad R^2 = \frac{s_{xy}^2}{s_x^2 s_y^2}$
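As a minimal sketch (with made-up data), the single-variable formulas can be evaluated directly in NumPy and checked against a library fit:

```python
import numpy as np

# Hypothetical 1-D data for illustration (any x, y arrays would work)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample covariance and variances (1/n convention)
sxy = np.mean((x - x.mean()) * (y - y.mean()))
sx2 = np.var(x)   # s_x^2
sy2 = np.var(y)   # s_y^2

beta1 = sxy / sx2                     # slope
beta0 = y.mean() - beta1 * x.mean()   # intercept
R2 = sxy**2 / (sx2 * sy2)             # coefficient of determination

print(beta0, beta1, R2)
```

The slope and intercept agree with `np.polyfit(x, y, 1)`, which minimizes the same least-squares criterion.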
Loading the Data
• scikit-learn package:
  ◦ Many methods for machine learning
  ◦ Datasets
  ◦ Will be used throughout this class
• The diabetes dataset is one example
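As a quick sketch, the diabetes dataset can be loaded directly from scikit-learn:

```python
from sklearn import datasets

# Load the diabetes dataset bundled with scikit-learn
data = datasets.load_diabetes()
X, y = data.data, data.target   # feature matrix and target vector

print(X.shape, y.shape)   # (442, 10) (442,)
```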
Simple Linear Regression for Diabetes Data
• Try a fit of each variable individually
• Compute the $R_j^2$ coefficient for each variable
• Use the formula on the previous slide
• The "best" individual variable is a poor fit
  ◦ $R_j^2 \approx 0.34$ (best individual variable)
Scatter Plot
• No one variable explains glucose well
• Multiple linear regression could be much better
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Finding a Mathematical Model
• Goal: Find a function to predict glucose level from the 10 attributes
• Problem: Several attributes
  ◦ Need a multi-variable function
• Attributes: $x_1$: Age, $x_2$: Sex, $x_3$: BMI, $x_4$: BP, $x_5$: S1, ..., $x_{10}$: S6
• Target: $y$ = glucose level
• Model: $y \approx \hat{y} = f(x_1, \ldots, x_{10})$
Matrix Representation of Data
• Data is a matrix and a target vector:
  $X = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nk} \end{bmatrix}$ (rows = samples, columns = attributes), $\quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ (target vector)
• $n$ samples:
  ◦ One sample per row
• $k$ features / attributes / predictors:
  ◦ One feature per column
• In this example:
  ◦ $y_i$ = blood glucose measurement of the $i$-th sample
  ◦ $x_{ij}$ = $j$-th feature of the $i$-th sample
  ◦ $\mathbf{x}_i^T = [x_{i1}, x_{i2}, \ldots, x_{ik}]$: feature or predictor vector
  ◦ The $i$-th sample is the pair $(\mathbf{x}_i, y_i)$
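A short sketch of the 2D-array manipulations this representation calls for (indexing rows and columns, checking shapes, stacking), using a small made-up matrix:

```python
import numpy as np

# Hypothetical data matrix: n = 4 samples, k = 3 features
X = np.arange(12.0).reshape(4, 3)
y = np.array([1.0, 2.0, 3.0, 4.0])

print(X.shape)    # (4, 3): n samples x k features
print(X[0, :])    # first sample (one row)
print(X[:, 1])    # second feature (one column)

# Stack a column of ones on the left to form the feature matrix A
A = np.column_stack((np.ones(X.shape[0]), X))
print(A.shape)    # (4, 4): n x (k+1)
```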
In-Class Exercise
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Multivariable Linear Model for Glucose
• Goal: Find a function to predict glucose level from the 10 attributes
• Attributes: Age, Sex, BMI, BP, S1, ..., S6; 10 features $\mathbf{x} = [x_1, \ldots, x_{10}]$
• Target: $y$ = glucose level, $\; y \approx \hat{y} = f(x_1, \ldots, x_{10})$
• Linear Model: Assume glucose is a linear function of the predictors:
  glucose $\approx$ prediction $= \beta_0 + \beta_1(\text{Age}) + \cdots + \beta_4(\text{BP}) + \beta_5(\text{S1}) + \cdots + \beta_{10}(\text{S6})$
• General form:
  $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_4 x_4 + \beta_5 x_5 + \cdots + \beta_{10} x_{10}$
  ◦ $\beta_0$ is the intercept
Multiple Variable Linear Model
• Vector of features: $\mathbf{x} = [x_1, \ldots, x_k]$
  ◦ $k$ features (also known as predictors, independent variables, attributes, covariates, ...)
• Single target variable $y$
  ◦ What we want to predict
• Linear model: Make a prediction $\hat{y}$:
  $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• Data for training:
  ◦ Samples are $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$
  ◦ Each sample has a vector of features $\mathbf{x}_i = [x_{i1}, \ldots, x_{ik}]$ and a scalar target $y_i$
• Problem: Learn the best coefficients $\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_k]$ from the training data
Example: Heart Rate Increase
• Linear Model: HR increase $\approx \beta_0 + \beta_1(\text{mins exercise}) + \beta_2(\text{exercise intensity})$
• Data (measuring fitness of athletes):

  Subject number | HR before | HR after | Mins on treadmill | Speed (min/km) | Days exercise/week
  123            | 60        | 90       | 1                 | 5.2            | 3
  456            | 80        | 110      | 2                 | 4.1            | 1
  789            | 70        | 130      | 5                 | 3.5            | 2
  ...            | ...       | ...      | ...               | ...            | ...

  https://www.mercurynews.com/2017/10/29/4851089/
Why Use a Linear Model?
• Many natural phenomena have linear relationships
• Predictor has small variation
  ◦ Suppose $y = f(x)$
  ◦ If the variation of $x$ is small around some value $x_0$, then a first-order expansion gives
    $y \approx f(x_0) + f'(x_0)(x - x_0) = \beta_0 + \beta_1 x$, where $\beta_0 = f(x_0) - f'(x_0)\,x_0$ and $\beta_1 = f'(x_0)$
• Simple to compute
• Easy to interpret the relation
  ◦ Coefficient $\beta_j$ indicates the importance of feature $j$ for the target
• Advanced: Gaussian random variables
  ◦ If two variables are jointly Gaussian, the optimal predictor of one from the other is a linear predictor
Matrix Review
• Consider
  $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 0 \\ 3 & 2 \end{bmatrix}, \quad x = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$
• Compute (computations on the board):
  ◦ Matrix-vector multiply: $Ax$
  ◦ Transpose: $A^T$
  ◦ Matrix multiply: $AB$
  ◦ Solution to linear equations: solve $x = Bu$ for $u$
  ◦ Matrix inverse: $B^{-1}$
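The same review computations can be checked in NumPy (a sketch using the matrices above):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2, 0], [3, 2]])
x = np.array([2, 3])

print(A @ x)                 # matrix-vector multiply: [ 8 18 28]
print(A.T)                   # transpose, shape (2, 3)
print(A @ B)                 # matrix multiply, shape (3, 2)
u = np.linalg.solve(B, x)    # solve B u = x for u
print(u)                     # [1. 0.]
print(np.linalg.inv(B))      # matrix inverse of B
```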
Slopes, Intercept and Inner Products
• Model with coefficients $\boldsymbol{\beta}$: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• Sometimes use the weight-bias version:
  $\hat{y} = b + w_1 x_1 + \cdots + w_k x_k$
  ◦ $b = \beta_0$: bias or intercept
  ◦ $\mathbf{w} = \boldsymbol{\beta}_{1:k} = [\beta_1, \ldots, \beta_k]$: weights or slope vector
• Can write either with an inner product:
  $\hat{y} = \beta_0 + \boldsymbol{\beta}_{1:k} \cdot \mathbf{x}$ or $\hat{y} = b + \mathbf{w} \cdot \mathbf{x}$
• Inner product:
  ◦ $\mathbf{w} \cdot \mathbf{x} = \sum_{j=1}^{k} w_j x_j$
  ◦ Alternate notation: $\mathbf{w}^T \mathbf{x} = \langle \mathbf{w}, \mathbf{x} \rangle$
Matrix Form of Linear Regression
• Data: $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
• Predicted value for the $i$-th sample: $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$
• Matrix form:
  $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}$
• Matrix equation: $\hat{\mathbf{y}} = \mathbf{A}\boldsymbol{\beta}$, where
  ◦ $\hat{\mathbf{y}}$ is the vector of $n$ predicted values
  ◦ $\mathbf{A}$ is the $n \times (k+1)$ feature matrix
  ◦ $\boldsymbol{\beta}$ is the coefficient vector with $p = k+1$ entries
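The matrix equation can be sketched in NumPy with made-up data: prepend a column of ones to form the feature matrix, and the element-wise and matrix predictions coincide.

```python
import numpy as np

# Hypothetical data: n = 5 samples, k = 2 features, arbitrary coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
beta = np.array([1.0, 2.0, -0.5])   # [beta0, beta1, beta2]

# Feature matrix A: column of ones for the intercept, then the features
A = np.column_stack((np.ones(X.shape[0]), X))

# Predictions via the matrix equation y_hat = A beta
y_hat = A @ beta

# Same predictions written element-wise: beta0 + x . beta_{1:k}
y_hat2 = beta[0] + X @ beta[1:]

print(np.allclose(y_hat, y_hat2))   # True
```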
In-Class Exercise
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Least Squares Model Fitting
• How do we select the parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_k)$?
• Define $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$
  ◦ Predicted value on sample $i$ for parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_k)$
• Define the residual sum of squares:
  $\mathrm{RSS}(\boldsymbol{\beta}) := \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  ◦ Note that $\hat{y}_i$ is implicitly a function of $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_k)$
  ◦ Also called the sum of squared residuals (SSR) and sum of squared errors (SSE)
• Least squares solution: find $\boldsymbol{\beta}$ to minimize the RSS
Variants of RSS
• Often use some variant of RSS
  ◦ Note: these names are not standard
• Residual sum of squares: $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• RSS per sample, or mean squared error:
  $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Normalized RSS, or normalized MSE:
  $\frac{\mathrm{RSS}/n}{s_y^2} = \frac{\mathrm{MSE}}{s_y^2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
Finding Parameters via Optimization: A General ML Recipe
• General ML problem:
  ◦ Pick a model with parameters
  ◦ Get data
  ◦ Pick a loss function
    - Measures goodness of fit of the model to the data
    - Function of the parameters
  ◦ Find the parameters that minimize the loss
• Multiple linear regression:
  ◦ Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
  ◦ Data: $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$
  ◦ Loss function: $\mathrm{RSS}(\beta_0, \ldots, \beta_k) := \sum_i (y_i - \hat{y}_i)^2$
  ◦ Select $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_k)$ to minimize $\mathrm{RSS}(\boldsymbol{\beta})$
RSS as a Vector Norm
• RSS is given by the sum:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Define the norm of a vector:
  ◦ $\|\mathbf{x}\| = (x_1^2 + \cdots + x_n^2)^{1/2}$
  ◦ Standard Euclidean norm
  ◦ Sometimes called the $\ell_2$ norm ($\ell$ is for Lebesgue)
• Write RSS in vector form: $\mathrm{RSS} = \|\mathbf{y} - \hat{\mathbf{y}}\|^2$
Least Squares Solution
• Consider the RSS cost function:
  $\mathrm{RSS}(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \hat{y}_i = \sum_{j=0}^{k} A_{ij}\beta_j$
  ◦ The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution
• Least squares solution: the vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
  ◦ Can compute the best coefficient vector analytically
  ◦ Just solve a linear system of equations
  ◦ Will show the proof below
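A sketch of the least-squares formula in NumPy, on synthetic data generated from a known coefficient vector (all data here is made up for illustration):

```python
import numpy as np

# Synthetic data from a known linear model plus small noise
rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([2.0, 1.0, -3.0, 0.5])    # [beta0, beta1, beta2, beta3]
y = beta_true[0] + X @ beta_true[1:] + 0.01 * rng.normal(size=n)

A = np.column_stack((np.ones(n), X))

# Normal equations beta = (A^T A)^{-1} A^T y, solved as a linear system
beta_ne = np.linalg.solve(A.T @ A, A.T @ y)

# Dedicated least-squares routine applied to A itself
beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(beta_ls, 2))   # close to beta_true
```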
Proving the LS Formula
• Least squares formula: the vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
• To prove this formula, we will:
  ◦ Review gradients of multi-variable functions
  ◦ Compute the gradient $\nabla \mathrm{RSS}(\boldsymbol{\beta})$
  ◦ Solve $\nabla \mathrm{RSS}(\boldsymbol{\beta}) = 0$
Gradients of Multi-Variable Functions
• Consider a scalar-valued function of a vector: $f(\boldsymbol{\beta}) = f(\beta_1, \ldots, \beta_n)$
• The gradient is the column vector:
  $\nabla f(\boldsymbol{\beta}) = \begin{bmatrix} \partial f(\boldsymbol{\beta})/\partial \beta_1 \\ \vdots \\ \partial f(\boldsymbol{\beta})/\partial \beta_n \end{bmatrix}$
• Represents the direction of maximum increase
• At a local minimum or maximum: $\nabla f(\boldsymbol{\beta}) = 0$
  ◦ Solve $n$ equations in $n$ unknowns
• Example: $f(\beta_1, \beta_2) = \beta_1 \sin\beta_2 + \beta_1^2 \beta_2$
  ◦ Compute $\nabla f(\boldsymbol{\beta})$. Solution on the board
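For reference, working the example through (taking the two partial derivatives of $f(\beta_1, \beta_2) = \beta_1 \sin\beta_2 + \beta_1^2 \beta_2$) gives:

```latex
\nabla f(\boldsymbol{\beta})
= \begin{bmatrix} \partial f/\partial \beta_1 \\ \partial f/\partial \beta_2 \end{bmatrix}
= \begin{bmatrix} \sin\beta_2 + 2\beta_1\beta_2 \\ \beta_1\cos\beta_2 + \beta_1^2 \end{bmatrix}
```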
Proof of the LS Formula
• Consider the RSS cost function:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \hat{y}_i = \sum_{j=0}^{k} A_{ij}\beta_j$
  ◦ The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution
• Compute the partial derivatives via the chain rule:
  $\frac{\partial\,\mathrm{RSS}}{\partial \beta_j} = -2\sum_{i=1}^{n} (y_i - \hat{y}_i)\,A_{ij}, \quad j = 0, 1, \ldots, k$
• Matrix form: $\mathrm{RSS} = \|\mathbf{A}\boldsymbol{\beta} - \mathbf{y}\|^2$, $\quad \nabla \mathrm{RSS} = -2\mathbf{A}^T(\mathbf{y} - \mathbf{A}\boldsymbol{\beta})$
• Solution: $\mathbf{A}^T(\mathbf{y} - \mathbf{A}\boldsymbol{\beta}) = 0 \;\Rightarrow\; \boldsymbol{\beta} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$ (the least-squares solution of $\mathbf{A}\boldsymbol{\beta} = \mathbf{y}$)
• Minimum RSS: $\mathrm{RSS} = \mathbf{y}^T\big(\mathbf{I} - \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\big)\mathbf{y}$
  ◦ Proof on the board
LS Solution via Auto-Correlation Functions
• Each data sample has a linear feature vector:
  $A_i = [A_{i0}, \ldots, A_{ik}] = [1, x_{i1}, \ldots, x_{ik}]$
• Define the sample auto-correlation matrix and cross-correlation vector:
  ◦ $R_{AA} = \frac{1}{n}\mathbf{A}^T\mathbf{A}$, $\; R_{AA}(\ell, m) = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}A_{im}$ (correlation of feature $\ell$ and feature $m$)
  ◦ $R_{Ay} = \frac{1}{n}\mathbf{A}^T\mathbf{y}$, $\; R_{Ay}(\ell) = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}\,y_i$ (correlation of feature $\ell$ and the target)
• Least squares solution: $\boldsymbol{\beta} = R_{AA}^{-1}R_{Ay}$
Mean-Removed Form of the LS Solution
• Often useful to remove the mean from the data before fitting
• Sample means: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\; \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, $\; \bar{\mathbf{x}} = [\bar{x}_1, \ldots, \bar{x}_k]$
• Define the mean-removed data: $\tilde{X}_{ij} = x_{ij} - \bar{x}_j$, $\; \tilde{y}_i = y_i - \bar{y}$
• Sample covariance matrix and cross-covariance vector:
  ◦ $S_{xx}(\ell, m) = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell}-\bar{x}_\ell)(x_{im}-\bar{x}_m)$, $\; S_{xx} = \frac{1}{n}\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$
  ◦ $S_{xy}(\ell) = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell}-\bar{x}_\ell)(y_i-\bar{y})$, $\; S_{xy} = \frac{1}{n}\tilde{\mathbf{X}}^T\tilde{\mathbf{y}}$
• Mean-removed form of the least squares solution:
  $\hat{y} = \boldsymbol{\beta}_{1:k} \cdot \mathbf{x} + \beta_0, \quad \boldsymbol{\beta}_{1:k} = S_{xx}^{-1}S_{xy}, \quad \beta_0 = \bar{y} - \boldsymbol{\beta}_{1:k} \cdot \bar{\mathbf{x}}$
• Proof: on the board
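The mean-removed form can be verified numerically against the direct least-squares fit (a sketch on synthetic data; the two solutions agree exactly up to floating-point error):

```python
import numpy as np

# Hypothetical data from a known linear model plus noise
rng = np.random.default_rng(2)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 1.5 + X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=n)

# Mean-removed (covariance) form of the LS solution
Xt = X - X.mean(axis=0)
yt = y - y.mean()
Sxx = Xt.T @ Xt / n
Sxy = Xt.T @ yt / n
w = np.linalg.solve(Sxx, Sxy)           # beta_{1:k}
b0 = y.mean() - w @ X.mean(axis=0)      # beta_0

# Direct least-squares fit with an intercept column
A = np.column_stack((np.ones(n), X))
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(np.concatenate(([b0], w)), beta))   # True
```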
R²: Goodness of Fit
• Multiple variable coefficient of determination:
  $R^2 = \frac{s_y^2 - \mathrm{MSE}}{s_y^2} = 1 - \frac{\mathrm{MSE}}{s_y^2}$
  ◦ $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  ◦ Sample variance: $s_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, $\; \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
• Interpretation:
  ◦ $\frac{\mathrm{MSE}}{s_y^2} = \frac{\text{error with the linear predictor}}{\text{error predicting by the mean}}$
  ◦ $R^2$ = fraction of variance reduced or "explained" by the model
• On the training data (not necessarily on the test data):
  ◦ $R^2 \in [0, 1]$ always
  ◦ $R^2 \approx 1 \Rightarrow$ the linear model provides a good fit
  ◦ $R^2 \approx 0 \Rightarrow$ the linear model provides a poor fit
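A short sketch computing RSS, MSE, and $R^2$ from the definitions above, on made-up targets and predictions:

```python
import numpy as np

# Hypothetical targets and predictions for illustration
y = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
y_hat = np.array([2.8, 1.2, 3.5, 2.0, 4.6])

rss = np.sum((y - y_hat)**2)
mse = rss / len(y)            # RSS per sample
sy2 = np.var(y)               # sample variance (1/n convention)
R2 = 1 - mse / sy2            # coefficient of determination

print(rss, mse, R2)
```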
In-Class Exercise
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions
Arrays and Vectors in Python and MATLAB
• There are some key differences between MATLAB and Python that you need to get used to
• MATLAB:
  ◦ All arrays have at least 2 dimensions
  ◦ Vectors are 1×N (row) or N×1 (column) vectors
  ◦ Matrix-vector multiplication syntax depends on whether the vector is on the left or right: x'*A or A*x
• Python:
  ◦ Arrays can have 1, 2, 3, ... dimensions
  ◦ Vectors can be 1D arrays; matrices are generally 2D arrays
  ◦ Vectors that are 1D arrays are neither row nor column vectors
  ◦ If x is 1D and A is 2D, then left and right multiplication use the same syntax: x.dot(A) and A.dot(x)
• Lecture notes: we will generally treat $\mathbf{x}$ and $\mathbf{x}^T$ the same
  ◦ Can write $\mathbf{x} = (x_1, \ldots, x_n)$ and still multiply by a matrix on the left or right
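A small sketch of the Python behavior described above, using made-up arrays:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 2D array, shape (3, 2)
x = np.array([1.0, 2.0])                              # 1D array, shape (2,)

print(x.shape)      # (2,): neither a row nor a column vector
print(A.dot(x))     # right multiply: result has shape (3,)
print(A @ x)        # same thing with the @ operator

z = np.array([1.0, 1.0, 1.0])
print(z.dot(A))     # left multiply a (3,) vector: result has shape (2,)
```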
Fitting Using sklearn
• Return to the diabetes data example
• All code is in the demo
• Divide the data into two portions:
  ◦ Training data: first 300 samples
  ◦ Test data: remaining 142 samples
• Train the model on the training data
• Test the model (i.e., measure the RSS) on the test data
• The reason for splitting the data is discussed in the next lecture
Manually Computing the Solution
• Use numpy linear algebra routines to solve
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
• Common mistake:
  ◦ Compute the matrix inverse $P = (\mathbf{A}^T\mathbf{A})^{-1}$, then compute $\boldsymbol{\beta} = P\mathbf{A}^T\mathbf{y}$
  ◦ A full matrix inverse is VERY slow and not needed
  ◦ Can directly solve the least-squares system $\mathbf{A}\boldsymbol{\beta} = \mathbf{y}$
  ◦ Numpy has routines to solve this directly
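A sketch contrasting the three approaches on made-up data; all give the same coefficients here, but the explicit inverse is the slowest and least numerically stable option:

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.column_stack((np.ones(50), rng.normal(size=(50, 4))))
y = rng.normal(size=50)

# Avoid: explicit matrix inverse
beta_bad = np.linalg.inv(A.T @ A) @ A.T @ y

# Better: solve the normal equations as a linear system
beta_solve = np.linalg.solve(A.T @ A, A.T @ y)

# Best: dedicated least-squares routine applied to A itself
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(beta_bad, beta_solve), np.allclose(beta_solve, beta_lstsq))
```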
Calling the sklearn Linear Regression Method
• Construct a linear regression object
• Run it on the training data
• Predict values on the test data
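The steps above can be sketched as follows, using the same train/test split as the demo (first 300 samples for training, the remaining 142 for testing):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Load the diabetes data and split it
data = datasets.load_diabetes()
X, y = data.data, data.target
Xtr, ytr = X[:300], y[:300]
Xts, yts = X[300:], y[300:]

# Construct the regression object and fit it on the training data
regr = LinearRegression()
regr.fit(Xtr, ytr)

# Predict on the test data and measure the RSS there
y_pred = regr.predict(Xts)
rss_test = np.sum((yts - y_pred)**2)
print(regr.intercept_, regr.coef_.shape, rss_test)
```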
In-Class Exercise
Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing in Python
• Extensions
Transformed Linear Models
• Standard linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• A linear model may be too restrictive
  ◦ The relation between $x$ and $y$ can be nonlinear
• Useful to look at models in transformed form:
  $\hat{y} = \beta_1 \phi_1(\mathbf{x}) + \cdots + \beta_p \phi_p(\mathbf{x})$
  ◦ Each function $\phi_j(\mathbf{x}) = \phi_j(x_1, \ldots, x_d)$ is called a basis function
  ◦ Each basis function may be nonlinear and a function of multiple variables
• Can write in vector form: $\hat{y} = \boldsymbol{\phi}(\mathbf{x}) \cdot \boldsymbol{\beta}$
  ◦ $\boldsymbol{\phi}(\mathbf{x}) = [\phi_1(\mathbf{x}), \ldots, \phi_p(\mathbf{x})]$, $\; \boldsymbol{\beta} = [\beta_1, \ldots, \beta_p]$
Fitting Transformed Linear Models
• Consider the transformed linear model
  $\hat{y} = \beta_1 \phi_1(\mathbf{x}) + \cdots + \beta_p \phi_p(\mathbf{x})$
• We can fit this model exactly as before
  ◦ Given data $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
  ◦ Want to fit the model from the transformed variables $\phi_j(\mathbf{x})$ to the target $y$
  ◦ Define the transformed matrix:
    $A = \begin{bmatrix} \phi_1(\mathbf{x}_1) & \cdots & \phi_p(\mathbf{x}_1) \\ \vdots & & \vdots \\ \phi_1(\mathbf{x}_n) & \cdots & \phi_p(\mathbf{x}_n) \end{bmatrix}$
  ◦ Predictions: $\hat{\mathbf{y}} = A\boldsymbol{\beta}$
  ◦ Least squares fit: $\hat{\boldsymbol{\beta}} = (A^TA)^{-1}A^T\mathbf{y}$
Example: Polynomial Fitting
• Suppose $y$ depends on only a single variable $x$
• Want to fit a polynomial model:
  $y \approx \beta_0 + \beta_1 x + \cdots + \beta_d x^d$
• Given data $(x_i, y_i)$, $i = 1, \ldots, n$
• Take basis functions $\phi_j(x) = x^j$, $j = 0, \ldots, d$
• Transformed model: $\hat{y} = \beta_0\phi_0(x) + \cdots + \beta_d\phi_d(x)$
• Transformed matrix:
  $\mathbf{A} = \begin{bmatrix} 1 & x_1 & \cdots & x_1^d \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^d \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_d \end{bmatrix}$
  ◦ $p = d+1$ transformed features from 1 original feature
• Will discuss how to select $d$ in the next lecture
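A sketch of polynomial fitting via the transformed matrix, on synthetic data drawn from a known quadratic (the data and true coefficients here are made up for illustration):

```python
import numpy as np

# Synthetic data from y = 1 - 2x + 3x^2 plus small noise
rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

d = 2
# Transformed matrix with columns 1, x, ..., x^d
A = np.vander(x, d + 1, increasing=True)

# Least-squares fit of the polynomial coefficients
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 1))   # close to [1, -2, 3]
```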
Other Nonlinear Examples
• Multinomial model: $\hat{y} = a + b_1 x_1 + b_2 x_2 + c_1 x_1^2 + c_2 x_1 x_2 + c_3 x_2^2$
  ◦ Contains all second-order terms
  ◦ Define the parameter vector $\beta = [a, b_1, b_2, c_1, c_2, c_3]$
  ◦ Transformed vector $\phi(x_1, x_2) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$
  ◦ Note that the features are nonlinear functions of $\mathbf{x} = [x_1, x_2]$
• Exponential model: $\hat{y} = a_1 e^{-b_1 x} + \cdots + a_d e^{-b_d x}$
  ◦ If the parameters $b_1, \ldots, b_d$ are fixed, the model is linear in the parameters $a_1, \ldots, a_d$
  ◦ Parameter vector $\beta = [a_1, \ldots, a_d]$
  ◦ Transformed vector $\phi(x) = [e^{-b_1 x}, \ldots, e^{-b_d x}]$
  ◦ But if the parameters $b_1, \ldots, b_d$ are not fixed, the model is nonlinear in $b_1, \ldots, b_d$
Linear Models via Re-Parametrization
• Sometimes models can be made linear via re-parametrization
• Example: Consider the model $\hat{y} = A x_1 (1 + B e^{-x_2})$
  ◦ Parameters $(A, B)$
• This is nonlinear in $(A, B)$ due to the product $AB$: $\hat{y} = A x_1 + AB\, x_1 e^{-x_2}$
• But we can define a new set of parameters:
  ◦ $\beta_1 = A$ and $\beta_2 = AB$
• Then $\hat{y} = \beta_1 x_1 + \beta_2 x_1 e^{-x_2}$
• Basis functions: $\phi(x_1, x_2) = [x_1, \; x_1 e^{-x_2}]$
• After we solve for $\beta_1, \beta_2$ we can recover $(A, B)$ by inverting the equations:
  $A = \beta_1, \quad B = \frac{\beta_2}{A}$
Example: Learning Linear Systems
• Linear system: $y_t = a_1 y_{t-1} + \cdots + a_d y_{t-d} + b_0 x_t + \cdots + b_m x_{t-m} + w_t$
• Transfer function: $H(z) = \dfrac{b_0 + \cdots + b_m z^{-m}}{1 - a_1 z^{-1} - \cdots - a_d z^{-d}}$
• Given input and output sequences for $T$ samples, how do we determine $\beta = [a_1, \ldots, a_d, b_0, \ldots, b_m]^T$?
• Can be solved using linear regression!
• Write $\mathbf{y} = \mathbf{A}\boldsymbol{\beta} + \mathbf{w}$ and define $\mathbf{A}$, $\mathbf{y}$
  ◦ See homework problem
• Many applications:
  ◦ Learning dynamics in robots / mechanical systems
  ◦ Modeling responses in neural systems
  ◦ Stock market time series
  ◦ Speech modeling: fit a model every 25 ms
One-Hot Encoding
• Suppose that one feature $x_j$ is a categorical variable
• Example: Predict the price of a car, $y$, given the model $x_1$ and interior space $x_2$
  ◦ Suppose there are 3 different car models (Ford, BMW, GM)
  ◦ Bad idea: arbitrarily assign an index to each possible car model
  ◦ Can give unreasonable relations
• One-hot encoding example:
  ◦ With 3 possible categories, represent $x_1$ using 3 binary features $(\phi_1, \phi_2, \phi_3)$:

    Model | φ1 | φ2 | φ3
    Ford  | 1  | 0  | 0
    BMW   | 0  | 1  | 0
    GM    | 0  | 0  | 1

  ◦ Model: $y = \beta_0 + \beta_1\phi_1 + \beta_2\phi_2 + \beta_3\phi_3 + \beta_4 x_2$
  ◦ Essentially obtain 3 different models:
    - Ford: $y = \beta_0 + \beta_1 + \beta_4 x_2$
    - BMW: $y = \beta_0 + \beta_2 + \beta_4 x_2$
    - GM: $y = \beta_0 + \beta_3 + \beta_4 x_2$
  ◦ Allows different intercepts (or mean values) for different categories!
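One-hot encoding can be sketched by hand in NumPy on made-up car data (the model names and numbers below are purely illustrative):

```python
import numpy as np

# Hypothetical car data: categorical model and a numeric feature
models = np.array(["Ford", "BMW", "GM", "Ford", "BMW"])
space = np.array([4.0, 5.5, 4.8, 4.2, 5.1])   # interior space

# One-hot encode the categorical feature: one binary column per category
categories = np.array(["Ford", "BMW", "GM"])
onehot = (models[:, None] == categories).astype(float)
print(onehot)

# Feature matrix: intercept column, 3 one-hot columns, then the numeric feature
A = np.column_stack((np.ones(len(models)), onehot, space))
print(A.shape)   # (5, 5)
```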
Lab: Robot Calibration
• Predict the current draw
  ◦ Needed to predict power consumption
• Predictors:
  ◦ Joint angles, velocity and acceleration
  ◦ Strain gauge readings (a measure of load)
• Full website at TU Dortmund, Germany:
  ◦ http://www.rst.e-technik.tu-dortmund.de/cms/en/research/robotics/TUDOR_engl/index.html
  ◦ http://www.rst.e-technik.tu-dortmund.de/forschung/robot-toolbox/MERIt/MERIt_Documentation.pdf