
Data Modeling
Patrice Koehl
Department of Biological Sciences
National University of Singapore
http://www.cs.ucdavis.edu/~koehl/Teaching/BL5229
dbskoehl@nus.edu.sg


Page 1:

Data Modeling

Patrice Koehl
Department of Biological Sciences
National University of Singapore
http://www.cs.ucdavis.edu/~koehl/Teaching/BL5229
dbskoehl@nus.edu.sg

Page 2:

Data Modeling

Data Modeling: least squares

Data Modeling: non-linear least squares

Data Modeling: robust estimation

Page 3:

Data Modeling

Page 4:

Least squares

Suppose that we are fitting N data points (x_i, y_i) (with errors σ_i on each data point) to a model Y defined with M parameters a_1, …, a_M:

\[
y_i \approx Y(x_i; a_1, \ldots, a_M)
\]

The standard procedure is least squares: the fitted values for the parameters a_j are those that minimize:

\[
\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - Y(x_i; a_1, \ldots, a_M)}{\sigma_i} \right)^2
\]

Where does this come from?

Page 5:

Least squares

Let us suppose that:
- The data points are independent of each other
- Each data point has a measurement error that is random, distributed as a Gaussian distribution around the “true” value Y(x_i)

The probability of the data points, given the model Y, is then:

\[
P(\mathrm{Data} \mid \mathrm{Model}) \propto \prod_{i=1}^{N} \exp\left( -\frac{1}{2} \left( \frac{y_i - Y(x_i)}{\sigma_i} \right)^2 \right)
\]

Page 6:

Least squares

Application of Bayes’ theorem:

\[
P(\mathrm{Model} \mid \mathrm{Data}) = \frac{P(\mathrm{Data} \mid \mathrm{Model}) \, P(\mathrm{Model})}{P(\mathrm{Data})}
\]

With no information on the models, we can assume that the prior probability P(Model) is constant.

Finding the coefficients a_1, …, a_M that maximize P(Model|Data) is then equivalent to finding the coefficients that maximize P(Data|Model). This is equivalent to maximizing its logarithm, or minimizing the negative of its logarithm, namely:

\[
-\ln P(\mathrm{Data} \mid \mathrm{Model}) = \frac{1}{2} \sum_{i=1}^{N} \left( \frac{y_i - Y(x_i)}{\sigma_i} \right)^2 + \mathrm{const} = \frac{1}{2} \chi^2 + \mathrm{const}
\]

which is exactly the least squares criterion.

Page 7:

Fitting data to a straight line

Page 8:

Fitting data to a straight line

This is the simplest case:

\[
Y(x) = a + b x
\]

Then:

\[
\chi^2(a, b) = \sum_{i=1}^{N} \left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2
\]

The parameters a and b are obtained from the two equations:

\[
\frac{\partial \chi^2}{\partial a} = -2 \sum_{i=1}^{N} \frac{y_i - a - b x_i}{\sigma_i^2} = 0,
\qquad
\frac{\partial \chi^2}{\partial b} = -2 \sum_{i=1}^{N} \frac{x_i \left( y_i - a - b x_i \right)}{\sigma_i^2} = 0
\]

Page 9:

Fitting data to a straight line

Let us define:

\[
S = \sum_{i=1}^{N} \frac{1}{\sigma_i^2}, \quad
S_x = \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2}, \quad
S_y = \sum_{i=1}^{N} \frac{y_i}{\sigma_i^2}, \quad
S_{xx} = \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2}, \quad
S_{xy} = \sum_{i=1}^{N} \frac{x_i y_i}{\sigma_i^2}
\]

then:

\[
a S + b S_x = S_y, \qquad a S_x + b S_{xx} = S_{xy}
\]

a and b are given by:

\[
\Delta = S S_{xx} - S_x^2, \qquad
a = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta}, \qquad
b = \frac{S S_{xy} - S_x S_y}{\Delta}
\]
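These sums translate directly into code. Below is a minimal Python sketch (the data, errors, and the function name fit_line are invented for illustration) that fits y = a + b·x by computing S, S_x, S_y, S_xx and S_xy:

```python
# Fit y = a + b*x by weighted least squares using the sums
# S, Sx, Sy, Sxx, Sxy defined above. Data and errors are illustrative.

def fit_line(x, y, sigma):
    """Return (a, b) minimizing chi^2 for the model y = a + b*x."""
    w = [1.0 / s ** 2 for s in sigma]        # weights 1/sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx * Sx
    a = (Sxx * Sy - Sx * Sxy) / delta        # intercept
    b = (S * Sxy - Sx * Sy) / delta          # slope
    return a, b

x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.1, 7.0]                     # roughly y = 1 + 2x
sigma = [0.1, 0.1, 0.1, 0.1]
a, b = fit_line(x, y, sigma)
```

For the data above the fit returns an intercept close to 1 and a slope close to 2, as expected.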

Page 10:

Fitting data to a straight line

We are not done!

Uncertainty on the values of a and b:

\[
\sigma_a^2 = \frac{S_{xx}}{\Delta}, \qquad \sigma_b^2 = \frac{S}{\Delta}
\]

Evaluate goodness of fit:

- Compute χ² and compare it to N−M (here N−2)
- Compute the residual error on each data point: Y(x_i) − y_i
- Compute the correlation coefficient R²
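These checks can be sketched in a few lines of Python; the data, errors, and previously fitted values of a and b are illustrative:

```python
# Goodness-of-fit checks for a fitted line y = a + b*x.
# Data, errors, and the fitted a, b are illustrative.

x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.1, 7.0]
sigma = [0.1, 0.1, 0.1, 0.1]
a, b = 1.04, 1.99                            # values from a previous fit

model = [a + b * xi for xi in x]
residuals = [yi - mi for yi, mi in zip(y, model)]

# chi^2, to be compared with N - M degrees of freedom (here N - 2 = 2)
chi2 = sum((r / s) ** 2 for r, s in zip(residuals, sigma))

# correlation coefficient R^2 = 1 - SS_res / SS_tot
ybar = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot
```

Here χ² ≈ 2.7 for N−2 = 2 degrees of freedom, i.e. an acceptable fit.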

Page 11:

Fitting data to a straight line

Page 12:

General Least Squares

The model is now a linear combination of M basis functions X_k(x):

\[
Y(x) = \sum_{k=1}^{M} a_k X_k(x)
\]

Then:

\[
\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - \sum_{k=1}^{M} a_k X_k(x_i)}{\sigma_i} \right)^2
\]

The minimization of χ² occurs when the derivatives of χ² with respect to the parameters a_1, …, a_M are 0. This leads to M equations:

\[
\sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left( y_i - \sum_{k=1}^{M} a_k X_k(x_i) \right) X_j(x_i) = 0, \qquad j = 1, \ldots, M
\]

Page 13:

General Least Squares

Define the design matrix A such that:

\[
A_{ij} = \frac{X_j(x_i)}{\sigma_i}
\]

Page 14:

General Least Squares

Define two vectors b and a such that:

\[
b_i = \frac{y_i}{\sigma_i}
\]

and a contains the parameters a_1, …, a_M.

The parameters a that minimize χ² satisfy:

\[
\left( A^T A \right) a = A^T b
\]

Note that χ² can be rewritten as:

\[
\chi^2 = \left\| A a - b \right\|^2
\]

These are the normal equations for the linear least squares problem.

Page 15:

General Least Squares

How to solve a general least squares problem:

1) Build the design matrix A and the vector b

2) Find the parameters a_1, …, a_M that minimize

\[
\chi^2 = \left\| A a - b \right\|^2
\]

(usually by solving the normal equations)

3) Compute the uncertainty on each parameter a_j:

if \( C = A^T A \), then

\[
\sigma^2(a_j) = \left( C^{-1} \right)_{jj}
\]
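The three steps can be sketched in Python with NumPy; the quadratic model, data, and errors below are invented for illustration:

```python
import numpy as np

# General linear least squares via the normal equations (A^T A) a = A^T b,
# for the illustrative quadratic model Y(x) = a1 + a2*x + a3*x^2.

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.9, 5.2, 10.1, 16.8])    # roughly y = 1 + x^2
sigma = np.full_like(x, 0.1)

# 1) design matrix A[i, j] = X_j(x_i) / sigma_i with basis 1, x, x^2,
#    and vector b_i = y_i / sigma_i
basis = np.vstack([np.ones_like(x), x, x ** 2]).T
A = basis / sigma[:, None]
b = y / sigma

# 2) solve the normal equations for the parameters a
C = A.T @ A
a = np.linalg.solve(C, A.T @ b)

# 3) uncertainty on each parameter: sigma^2(a_j) = (C^{-1})_jj
C_inv = np.linalg.inv(C)
param_err = np.sqrt(np.diag(C_inv))
```

Solving the normal equations directly can be numerically delicate when AᵀA is ill-conditioned; QR- or SVD-based solvers are the usual remedy in practice.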

Page 16:

Data Modeling

Page 17:

Non-linear least squares

In the general case, the model g(x; X_1, …, X_n) is a non-linear function of the parameters X_1, …, X_n; χ² is then also a non-linear function of these parameters:

\[
\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - g(x_i; X_1, \ldots, X_n)}{\sigma_i} \right)^2
\]

Finding the parameters X_1, …, X_n is then treated as finding the X_1, …, X_n that minimize χ².

Page 18:

Minimizing χ²

Some definitions:

Gradient:
The gradient of a smooth function f with continuous first and second derivatives is defined as:

\[
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T
\]

Hessian:
The n × n symmetric matrix of second derivatives, H(x), is called the Hessian:

\[
H_{ij}(x) = \frac{\partial^2 f}{\partial x_i \, \partial x_j}
\]
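When analytical derivatives are not available, both quantities can be approximated by finite differences. A minimal Python sketch (the test function, step sizes, and evaluation point are illustrative):

```python
import numpy as np

# Finite-difference approximations of the gradient and Hessian defined above.
# The test function and step sizes are illustrative.

def gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the symmetric Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# f(x, y) = x^2 + 3y^2 has gradient (2x, 6y) and Hessian diag(2, 6)
f = lambda v: v[0] ** 2 + 3 * v[1] ** 2
g = gradient(f, [1.0, 2.0])
H = hessian(f, [1.0, 2.0])
```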

Page 19:

Minimizing χ²

Minimization of a multi-variable function is usually an iterative process, in which updates of the state variable x are computed using the gradient, and in some (favorable) cases, the Hessian.

Steepest descent (SD):

The simplest iteration scheme consists of following the “steepest descent” direction:

\[
x_{k+1} = x_k - \lambda_k \nabla f(x_k)
\]

(λ_k sets the minimum along the line defined by the gradient)

Usually, SD methods lead to improvement quickly, but then exhibit slow progress toward a solution. They are commonly recommended for initial minimization iterations, when the starting function and gradient-norm values are very large.
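A minimal Python sketch of steepest descent follows; a simple backtracking rule stands in for the exact line minimization, and the test function is invented for illustration:

```python
import numpy as np

# Steepest descent: x_{k+1} = x_k - lambda_k * grad f(x_k).
# A backtracking rule stands in for the exact line minimization;
# the test function is illustrative.

def steepest_descent(f, grad, x0, steps=200, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        lam = 1.0
        # shrink the step until the function value decreases
        while f(x - lam * g) >= f(x) and lam > 1e-12:
            lam *= 0.5
        x = x - lam * g
    return x

# minimum of f(x, y) = (x - 1)^2 + 10 (y + 2)^2 is at (1, -2)
f = lambda v: (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
xmin = steepest_descent(f, grad, [0.0, 0.0])
```

On this mildly ill-conditioned quadratic the iterates zig-zag toward (1, −2), which illustrates the slow late-stage progress mentioned above.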

Page 20:

Minimizing χ²

Conjugate gradients (CG):

In each step of conjugate gradient methods, a search vector p_k is defined by a recursive formula:

\[
p_k = -\nabla f(x_k) + \beta_k p_{k-1}
\]

The corresponding new position is found by line minimization along p_k:

\[
x_{k+1} = x_k + \lambda_k p_k
\]

The CG methods differ in their definition of β_k.
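For a quadratic function f(x) = ½ xᵀA x − bᵀx the line minimization has a closed form, λ_k = (gᵀg)/(pᵀA p), which makes CG easy to sketch. The matrix and vector below are illustrative, and β_k uses the Fletcher-Reeves formula:

```python
import numpy as np

# Conjugate gradients on the quadratic f(x) = 0.5 x^T A x - b^T x,
# whose line minimization along p_k has the closed form
# lambda_k = (g^T g) / (p^T A p). A and b are illustrative;
# beta_k uses the Fletcher-Reeves formula.

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                   # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.zeros(2)
g = A @ x - b                                # gradient of f at x
p = -g                                       # first direction: steepest descent
for _ in range(2):                           # n = 2 steps suffice in 2-D
    lam = (g @ g) / (p @ A @ p)              # exact line minimization
    x = x + lam * p
    g_new = A @ x - b
    beta = (g_new @ g_new) / (g @ g)         # Fletcher-Reeves beta_k
    p = -g_new + beta * p
    g = g_new
# x now minimizes f, i.e. solves A x = b
```

On an n-dimensional quadratic, CG with exact line minimization converges in at most n steps, unlike steepest descent.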

Page 21:

Minimizing χ²

Newton’s methods:

Newton’s method is a popular iterative method for finding a zero of a one-dimensional function:

\[
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
\]

(figure: successive iterates x_0, x_1, x_2, x_3 converging to the zero of f)

It can be adapted to the minimization of a one-dimensional function, in which case the iteration formula is:

\[
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
\]
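Both one-dimensional iterations can be sketched in a few lines of Python (the example functions are invented for illustration):

```python
# Newton iteration in one dimension. Finding a zero of f uses
# x_{k+1} = x_k - f(x_k)/f'(x_k); minimizing g applies the same
# update to g'. The example functions are illustrative.

def newton_root(f, fprime, x0, steps=20):
    """Iterate Newton's update a fixed number of times."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# zero of f(x) = x^2 - 2, starting from x0 = 1: converges to sqrt(2)
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.0)

# minimum of g(x) = (x - 3)^2 + 1: apply Newton to g'(x) = 2(x - 3)
xmin = newton_root(lambda x: 2 * (x - 3), lambda x: 2.0, 10.0)
```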

Page 22:

Minimizing χ²

The equivalent iterative scheme for multivariate functions is based on:

\[
x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k)
\]

Several implementations of Newton’s method exist that avoid computing the full Hessian matrix: quasi-Newton, truncated Newton, “adopted-basis Newton-Raphson” (ABNR), …
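A minimal Python sketch of the multivariate Newton iteration, using an invented convex quadratic for illustration (for a quadratic, a single Newton step reaches the minimum):

```python
import numpy as np

# Multivariate Newton iteration: x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k).
# The convex quadratic below is illustrative.

def newton_minimize(grad, hess, x0, steps=20, tol=1e-10):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # solve H d = g rather than forming H^{-1} explicitly
        x = x - np.linalg.solve(hess(x), g)
    return x

# f(x, y) = (x - 1)^2 + 2 (y + 1)^2 + x y, minimum at (12/7, -10/7)
grad = lambda v: np.array([2 * (v[0] - 1) + v[1], 4 * (v[1] + 1) + v[0]])
hess = lambda v: np.array([[2.0, 1.0], [1.0, 4.0]])
xmin = newton_minimize(grad, hess, [5.0, 5.0])
```

Solving H d = ∇f instead of inverting H is the standard design choice; the quasi-Newton and truncated-Newton variants mentioned above go further and avoid building H at all.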

Page 23:

Data analysis and Data Modeling

Page 24:

Robust estimation of parameters

Least squares modeling assumes Gaussian statistics for the experimental data points; however, this may not always be true. There are other possible distributions that may lead to better models in some cases.

One of the most popular alternatives is to use a distribution of the form:

\[
P \propto \prod_{i=1}^{N} \exp\left( - \left| \frac{y_i - Y(x_i)}{\sigma_i} \right| \right)
\]

Let us look again at the simple case of fitting a straight line to a set of data points (t_i, Y_i), which is now written as finding a and b that minimize:

\[
\sum_{i=1}^{N} \left| Y_i - a t_i - b \right|
\]

For a given a, b = median(Y − at), and a is found by non-linear minimization.
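This procedure can be sketched in Python: for each trial slope a, set b = median(Y − at) and evaluate the sum of absolute deviations. The coarse grid scan below stands in for a proper one-dimensional minimizer, and the data (with a deliberate outlier) are invented for illustration:

```python
import numpy as np

# Robust (L1) fit of y = a*t + b: for a fixed slope a, the optimal
# intercept is b = median(y - a*t); the slope is then found by a
# one-dimensional minimization, here a coarse grid scan for simplicity.
# The data (with a deliberate outlier at t = 5) are illustrative.

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 3.1, 5.0, 7.1, 8.9, 30.0])   # roughly y = 2t + 1, one outlier

def l1_cost(a):
    """Sum of absolute deviations with the optimal b for this slope."""
    b = np.median(y - a * t)
    return np.sum(np.abs(y - a * t - b))

slopes = np.linspace(0.0, 6.0, 601)             # grid of trial slopes
costs = [l1_cost(s) for s in slopes]
a = slopes[int(np.argmin(costs))]
b = float(np.median(y - a * t))
```

For comparison, an ordinary least-squares fit of the same data gives a slope near 4.7, pulled upward by the single outlier, while the L1 fit stays close to the underlying line y ≈ 2t + 1.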

Page 25:

Robust estimation of parameters