interpolation and regression


    Numerical Methods for Eng [ENGR 391] [Lyes KADEM 2007]

    Interpolation & Regression 60

    CHAPTER V

    Interpolation and Regression

    Topics

    Interpolation: Direct Method; Newton's Divided Difference; Lagrangian Interpolation; Spline Interpolation.

    Regression: Linear and non-linear.

    1. What is interpolation?

    A function $y = f(x)$ is often given only at discrete points such as $(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}), (x_n, y_n)$. How does one find the value of $y$ at any other value of $x$? A continuous function $f(x)$ may be used to represent the $n+1$ data values, with $f(x)$ passing through the $n+1$ points. Then we can find the value of $y$ at any other value of $x$. This is called interpolation. Of course, if $x$ falls outside the range of $x$ for which the data is given, it is no longer interpolation but is instead called extrapolation.

    So what kind of function $f(x)$ should we choose? A polynomial is a common choice for an interpolating function because polynomials are easy to

    - Evaluate,
    - Differentiate, and
    - Integrate

    as opposed to other choices such as a sine or exponential series. Polynomial interpolation involves finding a polynomial of order $n$ that passes through the $n+1$ points. One of the methods is called the direct method of interpolation. Other methods include Newton's divided difference polynomial method and the Lagrangian interpolation method.

    [Figure: interpolating curve $f(x)$ passing through the data points $(x_0, y_0)$, $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$; axes $x$ and $y$.]


    1.2. Direct Method

    The direct method of interpolation is based on the following principle. If we have $n+1$ data points, fit a polynomial of order $n$,

    $$y = a_0 + a_1 x + \cdots + a_n x^n \qquad (1)$$

    through the data, where $a_0, a_1, \ldots, a_n$ are $n+1$ real constants. Since $n+1$ values of $y$ are given at $n+1$ values of $x$, one can write $n+1$ equations. Then the $n+1$ constants $a_0, a_1, \ldots, a_n$ can be found by solving the $n+1$ simultaneous linear equations (Aha! Do you remember the previous course?). To find the value of $y$ at a given value of $x$, simply substitute the value of $x$ into the polynomial.

    But it is not necessary to use all the data points. How does one then choose the order of the polynomial and which data points to use? This concept and the direct method of interpolation are best illustrated using an example.

    1.2.1. Example

    The upward velocity of a rocket is given as a function of time in Table 1.

    Table 1. Velocity as a function of time

    t [s]     v(t) [m/s]
    0         0
    10        227.04
    15        362.78
    20        517.35
    22.5      602.97
    30        901.67

    1. Determine the value of the velocity at t = 16 s using the direct method and a first order polynomial.

    2. Determine the value of the velocity at t = 16 s using the direct method and a third order polynomial.
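One possible way to work this example numerically is sketched below. It builds the $n+1$ simultaneous equations of the direct method and solves them with a small Gaussian elimination routine. The function names `solve_linear_system` and `direct_interpolate` are hypothetical helpers, and the choice of data points (the two points bracketing t = 16 s for the first order case, the four closest points for the third order case) is an assumption, since the text leaves that choice open.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def direct_interpolate(ts, vs, t):
    """Fit the polynomial of order len(ts) - 1 through the points and evaluate it at t."""
    n = len(ts)
    # Row i of the system is a0 + a1*t_i + ... + a_{n-1}*t_i**(n-1) = v_i.
    A = [[ti ** k for k in range(n)] for ti in ts]
    coeffs = solve_linear_system(A, vs)
    return sum(a * t ** k for k, a in enumerate(coeffs))


# First order: the two points that bracket t = 16 s (assumed choice).
v_linear = direct_interpolate([15, 20], [362.78, 517.35], 16)
# Third order: the four points closest to t = 16 s (assumed choice).
v_cubic = direct_interpolate([10, 15, 20, 22.5],
                             [227.04, 362.78, 517.35, 602.97], 16)
print(round(v_linear, 2), round(v_cubic, 2))
```

With these point choices the sketch gives roughly 393.69 m/s for the first order polynomial and 392.06 m/s for the third order polynomial.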


    Figure 5.2. Velocity vs. time data for the rocket example (axes: t [s], v(t) [m/s]).

    1.3. Newton's divided difference interpolation

    To illustrate this method, we will start with linear and quadratic interpolation; then the general form of Newton's Divided Difference Polynomial method will be presented.

    1.3.1. Linear interpolation

    Given $(x_0, y_0)$ and $(x_1, y_1)$, fit a linear interpolant through the data. Note that $y_0 = f(x_0)$ and $y_1 = f(x_1)$. Assuming a linear interpolant means:

    $$f_1(x) = b_0 + b_1 (x - x_0)$$

    Since at $x = x_0$:

    $$f_1(x_0) = f(x_0) = b_0 + b_1 (x_0 - x_0) = b_0,$$

    and at $x = x_1$:

    $$f_1(x_1) = f(x_1) = b_0 + b_1 (x_1 - x_0) = f(x_0) + b_1 (x_1 - x_0)$$

    Then

    $$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

    so

    $$b_0 = f(x_0)$$

    $$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

    And the linear interpolant,

    $$f_1(x) = b_0 + b_1 (x - x_0)$$


    becomes:

    $$f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x - x_0)$$

    1.3.2. Quadratic interpolation

    Given $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$, fit a quadratic interpolant through the data. Note that $y = f(x)$, $y_0 = f(x_0)$, $y_1 = f(x_1)$ and $y_2 = f(x_2)$. Assume the quadratic interpolant $f_2(x)$ is given by

    $$f_2(x) = b_0 + b_1 (x - x_0) + b_2 (x - x_0)(x - x_1)$$

    At $x = x_0$:

    $$f_2(x_0) = f(x_0) = b_0 + b_1 (x_0 - x_0) + b_2 (x_0 - x_0)(x_0 - x_1) = b_0$$

    $$b_0 = f(x_0)$$

    At $x = x_1$:

    $$f_2(x_1) = f(x_1) = b_0 + b_1 (x_1 - x_0) + b_2 (x_1 - x_0)(x_1 - x_1)$$

    $$f(x_1) = f(x_0) + b_1 (x_1 - x_0)$$

    then

    $$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

    At $x = x_2$:

    $$f_2(x_2) = f(x_2) = b_0 + b_1 (x_2 - x_0) + b_2 (x_2 - x_0)(x_2 - x_1)$$

    $$f(x_2) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x_2 - x_0) + b_2 (x_2 - x_0)(x_2 - x_1)$$

    then

    $$b_2 = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$

    Hence the quadratic interpolant is given by

    $$f_2(x) = b_0 + b_1 (x - x_0) + b_2 (x - x_0)(x - x_1)$$

    $$f_2(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x - x_0) + \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0} (x - x_0)(x - x_1)$$
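As an illustration of the formulas for $b_0$, $b_1$ and $b_2$, the following minimal sketch evaluates the quadratic interpolant for the rocket data of Table 1. The function name `newton_quadratic` is hypothetical, and the three data points used (t = 10, 15 and 20 s) are an assumed choice made only for this example.

```python
def newton_quadratic(xs, ys, x):
    """Evaluate the quadratic Newton interpolant through three points at x."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    b0 = y0
    b1 = (y1 - y0) / (x1 - x0)
    b2 = ((y2 - y1) / (x2 - x1) - b1) / (x2 - x0)
    return b0 + b1 * (x - x0) + b2 * (x - x0) * (x - x1)


# Velocity at t = 16 s from the points at t = 10, 15 and 20 s.
v_quadratic = newton_quadratic([10, 15, 20], [227.04, 362.78, 517.35], 16)
print(round(v_quadratic, 2))
```

For these three points the sketch gives about 392.19 m/s.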


    Figure 5.4. Quadratic interpolation

    1.3.3. General Form of Newton's Divided Difference Polynomial

    In the two previous cases, we saw how linear and quadratic interpolation are derived by Newton's Divided Difference polynomial method. Let us analyze the quadratic polynomial interpolant formula

    $$f_2(x) = b_0 + b_1 (x - x_0) + b_2 (x - x_0)(x - x_1)$$

    where

    $$b_0 = f(x_0)$$

    $$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

    $$b_2 = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$

    Note that $b_0$, $b_1$ and $b_2$ are finite divided differences. $b_0$, $b_1$ and $b_2$ are the first, second, and third finite divided differences, respectively. Denoting the first divided difference by

    $$f[x_0] = f(x_0)$$

    the second divided difference by

    $$f[x_1, x_0] = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

    and the third divided difference by

    $$f[x_2, x_1, x_0] = \frac{f[x_2, x_1] - f[x_1, x_0]}{x_2 - x_0} = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$


    where $f[x_0]$, $f[x_1, x_0]$ and $f[x_2, x_1, x_0]$ are called bracketed functions of their variables enclosed in square brackets.

    We can write:

    $$f_2(x) = f[x_0] + f[x_1, x_0](x - x_0) + f[x_2, x_1, x_0](x - x_0)(x - x_1)$$

    This leads to the general form of Newton's divided difference polynomial for $(n+1)$ data points, $(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}), (x_n, y_n)$, as

    $$f_n(x) = b_0 + b_1 (x - x_0) + \cdots + b_n (x - x_0)(x - x_1) \cdots (x - x_{n-1})$$

    where

    $$b_0 = f[x_0]$$

    $$b_1 = f[x_1, x_0]$$

    $$b_2 = f[x_2, x_1, x_0]$$

    $$\vdots$$

    $$b_{n-1} = f[x_{n-1}, x_{n-2}, \ldots, x_0]$$

    $$b_n = f[x_n, x_{n-1}, \ldots, x_0]$$

    where the definition of the $m$-th divided difference is

    $$b_m = f[x_m, \ldots, x_0] = \frac{f[x_m, \ldots, x_1] - f[x_{m-1}, \ldots, x_0]}{x_m - x_0}$$

    From the above definition, it can be seen that the divided differences are calculated recursively.

    For an example of a third order polynomial, given $(x_0, y_0)$, $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$,

    $$f_3(x) = f[x_0] + f[x_1, x_0](x - x_0) + f[x_2, x_1, x_0](x - x_0)(x - x_1) + f[x_3, x_2, x_1, x_0](x - x_0)(x - x_1)(x - x_2)$$

    The divided differences can be arranged in a table, with $b_0$, $b_1$, $b_2$ and $b_3$ read off the top diagonal:

    x_0   f(x_0) = b_0
                          f[x_1, x_0] = b_1
    x_1   f(x_1)                               f[x_2, x_1, x_0] = b_2
                          f[x_2, x_1]                                    f[x_3, x_2, x_1, x_0] = b_3
    x_2   f(x_2)                               f[x_3, x_2, x_1]
                          f[x_3, x_2]
    x_3   f(x_3)
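The recursive construction of the divided difference table can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: `divided_differences` and `newton_eval` are hypothetical names, and the data points taken from Table 1 (t = 10, 15, 20 and 22.5 s) are an assumed choice.

```python
def divided_differences(xs, ys):
    """Return [b0, b1, ..., bn], the top diagonal of the divided difference table."""
    n = len(xs)
    col = [float(y) for y in ys]  # first column of the table: f[x_i]
    coeffs = [col[0]]
    for order in range(1, n):
        # After this pass, col[i] holds f[x_{i+order}, ..., x_i].
        col = [(col[i + 1] - col[i]) / (xs[i + order] - xs[i])
               for i in range(n - order)]
        coeffs.append(col[0])
    return coeffs


def newton_eval(xs, coeffs, x):
    """Evaluate b0 + b1 (x - x0) + ... + bn (x - x0)...(x - x_{n-1}) by nesting."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


ts = [10, 15, 20, 22.5]
vs = [227.04, 362.78, 517.35, 602.97]
b = divided_differences(ts, vs)
v_newton = newton_eval(ts, b, 16)
print([round(c, 5) for c in b], round(v_newton, 2))
```

Only one column of the table is kept in memory at a time, which is enough because each new column depends only on the previous one. For these four points the third order estimate at t = 16 s is about 392.06 m/s.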


    1.4. Lagrangian Interpolation

    Polynomial interpolation involves finding a polynomial of order $n$ that passes through the $n+1$ points. One of the methods to find this polynomial is called Lagrangian interpolation.

    The Lagrangian interpolating polynomial is given by

    $$f_n(x) = \sum_{i=0}^{n} L_i(x) f(x_i)$$

    where $n$ in $f_n(x)$ stands for the $n$-th order polynomial that approximates the function $y = f(x)$ given at $(n+1)$ data points as $(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}), (x_n, y_n)$, and

    $$L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$

    $L_i(x)$ is a weighting function that includes a product of $n$ terms, with the term $j = i$ omitted.
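The weighting functions $L_i(x)$ translate almost directly into a double loop. The sketch below is one straightforward way to write it; `lagrange_interpolate` is a hypothetical name and the four rocket data points are an assumed choice for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrangian interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0  # L_i(x): product over all j except j == i
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total


ts = [10, 15, 20, 22.5]
vs = [227.04, 362.78, 517.35, 602.97]
v_lagrange = lagrange_interpolate(ts, vs, 16)
print(round(v_lagrange, 2))
```

Because the interpolating polynomial through a given set of points is unique, this should agree with the direct and Newton forms for the same four points (about 392.06 m/s at t = 16 s).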

    1.5. Spline Method of Interpolation

    The spline method was introduced to address one of the drawbacks of polynomial interpolation. In fact, when the order ($n$) becomes large, in many cases oscillations appear in the resulting polynomial. This was shown by Runge when he interpolated data based on the simple function

    $$y = \frac{1}{1 + 25 x^2}$$

    on the interval $[-1, 1]$. For example, take six equidistantly spaced points in $[-1, 1]$ and find $y$ at these points as given in Table 1.

    Example

    Use the same previous data for the upward velocity of a rocket to determine the value of the velocity at t = 16 s using third order polynomial interpolation with Newton's Divided Difference polynomial.

    Example

    Use the same previous data for the upward velocity of a rocket to determine the value of the velocity at t = 16 s using third order polynomial interpolation with the Lagrangian polynomial.


    Table 1: Six equidistantly spaced points in [-1, 1]

    x        y = 1/(1 + 25x^2)
    -1.0     0.038461
    -0.6     0.1
    -0.2     0.5
    0.2      0.5
    0.6      0.1
    1.0      0.038461

    Now through these six points, we can pass a fifth order polynomial

    $$f_5(x) = 0.56731 - 1.7308 x^2 + 1.2019 x^4, \qquad -1 \le x \le 1$$

    Figure 5.5. 5th order polynomial vs. exact function.

    When plotting the fifth order polynomial and the original function, you can notice that the two do not match well. So maybe you will consider choosing more points in the interval [-1, 1] to get a better match, but the interpolant diverges even more (see Figure 5.6). In fact, Runge found that as the order of the polynomial becomes infinite, the polynomial diverges in the intervals $-1 < x < -0.726$ and $0.726 < x < 1$.

    Figure 5.6. Higher order polynomial interpolation is a bad idea.
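Runge's observation can be checked numerically. The sketch below interpolates $1/(1 + 25x^2)$ on equally spaced points and measures the largest deviation on a fine grid. The helper names, the sampled orders (5, 10 and 20) and the grid size are all assumptions made for this illustration.

```python
def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_interpolate(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total


def equispaced_nodes(order):
    """order + 1 equally spaced points on [-1, 1]."""
    return [-1.0 + 2.0 * k / order for k in range(order + 1)]


def max_error(order, samples=2001):
    """Largest |interpolant - exact| over a fine grid on [-1, 1]."""
    nodes = equispaced_nodes(order)
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_interpolate(nodes, values, x) - runge(x))
               for x in grid)


errors = {order: max_error(order) for order in (5, 10, 20)}
for order, err in errors.items():
    print(order, round(err, 3))
```

In this sketch the maximum error grows as the order increases, which is the behavior Figure 5.6 illustrates. The six-point interpolant evaluated at x = 0 also reproduces the constant term 0.56731 of the fifth order polynomial above.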


    1.5.1. Linear spline interpolation

    Given $(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}), (x_n, y_n)$, fit linear splines to the data. This simply involves joining consecutive data points with straight lines. So if the above data is given in ascending order, the linear splines are given by (with $y_i = f(x_i)$)

    Figure 5.7. Linear splines.

    $$f(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x - x_0), \qquad x_0 \le x \le x_1$$

    $$f(x) = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1} (x - x_1), \qquad x_1 \le x \le x_2$$

    $$\vdots$$

    $$f(x) = f(x_{n-1}) + \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}} (x - x_{n-1}), \qquad x_{n-1} \le x \le x_n$$

    Note that the terms

    $$\frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}$$

    in the above function are simply the slopes between $x_{i-1}$ and $x_i$.
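A linear spline evaluation only needs to locate the segment that contains $x$ and apply the slope formula. The sketch below assumes the data is sorted in ascending order; `linear_spline` is a hypothetical name and the rocket data of Table 1 is used as the example input.

```python
from bisect import bisect_right


def linear_spline(xs, ys, x):
    """Evaluate the linear spline through (xs, ys) at x; xs must be ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the data range (extrapolation)")
    # Index i of the segment [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])


ts = [0, 10, 15, 20, 22.5, 30]
vs = [0, 227.04, 362.78, 517.35, 602.97, 901.67]
v_spline = linear_spline(ts, vs, 16)
print(round(v_spline, 2))
```

At t = 16 s only the segment between t = 15 s and t = 20 s is used, so the result (about 393.69 m/s) matches first order interpolation between those two points.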

    1.5.2. Quadratic Splines

    In these splines, a quadratic polynomial approximates the data between two consecutive data points. The splines are given by


    $$f(x) = a_1 x^2 + b_1 x + c_1, \qquad x_0 \le x \le x_1$$

    $$f(x) = a_2 x^2 + b_2 x + c_2, \qquad x_1 \le x \le x_2$$

    $$\vdots$$

    $$f(x) = a_n x^2 + b_n x + c_n, \qquad x_{n-1} \le x \le x_n$$

    Now, how do we find the coefficients of these quadratic splines? There are $3n$ such coefficients:

    $$a_i, \quad b_i, \quad c_i, \qquad i = 1, 2, \ldots, n$$

    To find $3n$ unknowns, we need $3n$ equations, which are then solved simultaneously. These $3n$ equations are found as follows.

    1) Each quadratic spline goes through two consecutive data points

    $$a_1 x_0^2 + b_1 x_0 + c_1 = f(x_0)$$

    $$a_1 x_1^2 + b_1 x_1 + c_1 = f(x_1)$$

    $$\vdots$$

    $$a_i x_{i-1}^2 + b_i x_{i-1} + c_i = f(x_{i-1})$$

    $$a_i x_i^2 + b_i x_i + c_i = f(x_i)$$

    $$\vdots$$

    $$a_n x_{n-1}^2 + b_n x_{n-1} + c_n = f(x_{n-1})$$

    $$a_n x_n^2 + b_n x_n + c_n = f(x_n)$$

    This condition gives $2n$ equations, as there are $n$ quadratic splines each going through two consecutive data points.

    2) The first derivatives of two adjacent quadratic splines are continuous at the interior points. For example, the derivative of the first spline $a_1 x^2 + b_1 x + c_1$ is

    $$2 a_1 x + b_1$$

    The derivative of the second spline $a_2 x^2 + b_2 x + c_2$ is

    $$2 a_2 x + b_2$$


    and the two are equal at $x = x_1$, giving

    $$2 a_1 x_1 + b_1 = 2 a_2 x_1 + b_2$$

    $$2 a_1 x_1 + b_1 - 2 a_2 x_1 - b_2 = 0$$

    Similarly at the other interior points,

    $$2 a_2 x_2 + b_2 - 2 a_3 x_2 - b_3 = 0$$

    $$\vdots$$

    $$2 a_i x_i + b_i - 2 a_{i+1} x_i - b_{i+1} = 0$$

    $$\vdots$$

    $$2 a_{n-1} x_{n-1} + b_{n-1} - 2 a_n x_{n-1} - b_n = 0$$

    Since there are $(n-1)$ interior points, we have $(n-1)$ such equations. Now, the total number of equations is $(2n) + (n-1) = (3n-1)$. We then still need one more equation. We can assume that the first spline is linear, that is:

    $$a_1 = 0$$

    This gives us $3n$ equations and $3n$ unknowns. These can be solved by a number of techniques used to solve simultaneous linear equations.
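The three groups of equations can be assembled into one $3n \times 3n$ system and solved directly. The sketch below orders the unknowns as $(a_1, b_1, c_1, a_2, b_2, c_2, \ldots)$, which is an assumed convention, and uses a small Gaussian elimination helper. All function names are hypothetical, and the rocket data of Table 1 serves as the test input.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def quadratic_spline_coeffs(xs, ys):
    """Return [(a_1, b_1, c_1), ..., (a_n, b_n, c_n)] for ascending xs."""
    n = len(xs) - 1  # number of splines
    A, rhs = [], []
    # 1) Each spline passes through both of its end points (2n equations).
    for i in range(n):
        for x, y in ((xs[i], ys[i]), (xs[i + 1], ys[i + 1])):
            row = [0.0] * (3 * n)
            row[3 * i:3 * i + 3] = [x * x, x, 1.0]
            A.append(row)
            rhs.append(y)
    # 2) First derivatives match at the n - 1 interior points.
    for i in range(n - 1):
        x = xs[i + 1]
        row = [0.0] * (3 * n)
        row[3 * i:3 * i + 2] = [2.0 * x, 1.0]
        row[3 * i + 3:3 * i + 5] = [-2.0 * x, -1.0]
        A.append(row)
        rhs.append(0.0)
    # 3) The first spline is assumed to be linear: a_1 = 0.
    row = [0.0] * (3 * n)
    row[0] = 1.0
    A.append(row)
    rhs.append(0.0)
    flat = solve_linear_system(A, rhs)
    return [tuple(flat[3 * i:3 * i + 3]) for i in range(n)]


def quadratic_spline_eval(xs, coeffs, x):
    """Evaluate the spline on the first segment that contains x."""
    for i, (a, b, c) in enumerate(coeffs):
        if xs[i] <= x <= xs[i + 1]:
            return a * x * x + b * x + c
    raise ValueError("x lies outside the data range")


ts = [0, 10, 15, 20, 22.5, 30]
vs = [0, 227.04, 362.78, 517.35, 602.97, 901.67]
coeffs = quadratic_spline_coeffs(ts, vs)
v_qspline = quadratic_spline_eval(ts, coeffs, 16)
print(round(v_qspline, 2))
```

For this data the first spline comes out as the straight line $22.704\,t$, as the assumption $a_1 = 0$ requires, and the estimate at t = 16 s is about 394.24 m/s.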


    $$y = a_0 + a_1 x$$

    and using minimization of $\sum_{i=1}^{n} \varepsilon_i$ as a criterion to find $a_0$ and $a_1$, we find that for (Figure 5.8)

    $$y = 4x - 4$$

    Figure 5.8. Regression curve $y = 4x - 4$ for y vs. x data.

    the sum of the residuals is $\sum_{i=1}^{4} \varepsilon_i = 0$, as shown in the table below.

    x      y      y_predicted    ε = y - y_predicted
    2.0    4.0    4.0            0.0
    3.0    6.0    8.0            -2.0
    2.0    6.0    4.0            2.0
    3.0    8.0    8.0            0.0

    $$\sum_{i=1}^{4} \varepsilon_i = 0$$

    So does this give us the smallest error? It does, as $\sum_{i=1}^{4} \varepsilon_i = 0$. But it does not give unique values for the parameters of the model. A straight line of the model $y = 6$


    Figure 5.9. Regression curve $y = 6$ for y vs. x data.

    also makes $\sum_{i=1}^{4} \varepsilon_i = 0$, as shown in the table below.

    x      y      y_predicted    ε = y - y_predicted
    2.0    4.0    6.0            -2.0
    3.0    6.0    6.0            0.0
    2.0    6.0    6.0            0.0
    3.0    8.0    6.0            2.0

    $$\sum_{i=1}^{4} \varepsilon_i = 0$$

    Since this criterion does not give a unique regression model, it cannot be used for finding the regression coefficients. Why? Because we want to minimize

    $$\sum_{i=1}^{n} \varepsilon_i = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)$$

    Differentiating this equation with respect to $a_0$ and $a_1$, we get

    $$\frac{\partial \sum_{i=1}^{n} \varepsilon_i}{\partial a_0} = -\sum_{i=1}^{n} 1 = -n$$

    $$\frac{\partial \sum_{i=1}^{n} \varepsilon_i}{\partial a_1} = -\sum_{i=1}^{n} x_i = -n \bar{x}$$


    Setting these equations to zero gives $n = 0$, but this is impossible. Therefore, unique values of $a_0$ and $a_1$ do not exist.

    You may think that the reason the minimization criterion $\sum_{i=1}^{n} \varepsilon_i$ does not work is that negative residuals cancel with positive residuals. So would minimizing $\sum_{i=1}^{n} |\varepsilon_i|$ be a better criterion? Let us look at the data given below for the equation $y = 4x - 4$. It makes $\sum_{i=1}^{4} |\varepsilon_i| = 4$, as shown in the following table.

    x      y      y_predicted    |ε| = |y - y_predicted|
    2.0    4.0    4.0            0.0
    3.0    6.0    8.0            2.0
    2.0    6.0    4.0            2.0
    3.0    8.0    8.0            0.0

    $$\sum_{i=1}^{4} |\varepsilon_i| = 4$$

    The value $\sum_{i=1}^{4} |\varepsilon_i| = 4$ is also obtained for the straight line model $y = 6$. No other straight line for this data has $\sum_{i=1}^{4} |\varepsilon_i|$ …
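The two candidate criteria can be checked directly on the four data points from the tables. The short sketch below computes the plain sum of residuals and the sum of absolute residuals for both straight lines. The helper name `residuals` is hypothetical.

```python
xs = [2.0, 3.0, 2.0, 3.0]
ys = [4.0, 6.0, 6.0, 8.0]


def residuals(a0, a1):
    """Residuals y_i - (a0 + a1 * x_i) for the straight line y = a0 + a1 x."""
    return [y - (a0 + a1 * x) for x, y in zip(xs, ys)]


results = {}
for label, a0, a1 in (("y = 4x - 4", -4.0, 4.0), ("y = 6", 6.0, 0.0)):
    r = residuals(a0, a1)
    results[label] = (sum(r), sum(abs(e) for e in r))
    print(label, results[label])
```

Both lines give the same value under each criterion (0 for the plain sum, 4 for the sum of absolute values), so neither criterion singles out one line.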


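    $$a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$

    $$a_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$

    Redefining

    $$S_{xy} = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}$$

    $$S_{xx} = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2$$

    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

    $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

    we can rewrite

    $$a_1 = \frac{S_{xy}}{S_{xx}}$$

    $$a_0 = \bar{y} - a_1 \bar{x}$$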
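The $S_{xy}$ and $S_{xx}$ form is convenient to code. The following minimal sketch applies it to the four-point data set used earlier in the regression discussion. `linear_regression` is a hypothetical name, and the reuse of that data set is an assumption made only to have a concrete input.

```python
def linear_regression(xs, ys):
    """Return (a0, a1) of the least squares line y = a0 + a1 x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    a1 = s_xy / s_xx
    a0 = y_bar - a1 * x_bar
    return a0, a1


xs = [2.0, 3.0, 2.0, 3.0]
ys = [4.0, 6.0, 6.0, 8.0]
a0, a1 = linear_regression(xs, ys)
sum_sq = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
print(a0, a1, sum_sq)
```

For this data the least squares line is $y = 1 + 2x$ with a sum of squared residuals of 4, and unlike the two earlier criteria it is a single, unique line.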

    2.4. Nonlinear models using least squares

    2.4.1. Exponential model

    Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can fit $y = a e^{bx}$ to the data. The variables $a$ and $b$ are the constants of the exponential model. The residual at each data point $x_i$ is

    $$E_i = y_i - a e^{b x_i}$$

    The sum of the squares of the residuals is

    $$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} \left( y_i - a e^{b x_i} \right)^2$$

    To find the constants $a$ and $b$ of the exponential model, we minimize $S_r$ by differentiating with respect to $a$ and $b$ and equating the resulting equations to zero.


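    $$\frac{\partial S_r}{\partial a} = \sum_{i=1}^{n} 2 \left( y_i - a e^{b x_i} \right) \left( -e^{b x_i} \right) = 0$$

    $$\frac{\partial S_r}{\partial b} = \sum_{i=1}^{n} 2 \left( y_i - a e^{b x_i} \right) \left( -a x_i e^{b x_i} \right) = 0$$

    or

    $$-\sum_{i=1}^{n} y_i e^{b x_i} + a \sum_{i=1}^{n} e^{2 b x_i} = 0$$

    $$\sum_{i=1}^{n} y_i x_i e^{b x_i} - a \sum_{i=1}^{n} x_i e^{2 b x_i} = 0$$

    These equations are nonlinear in $a$ and $b$ and thus cannot be solved in closed form, as was the case for linear regression. In general, iterative methods must be used to find the values of $a$ and $b$.

    However, in this case, $a$ can be written explicitly in terms of $b$ as

    $$a = \frac{\sum_{i=1}^{n} y_i e^{b x_i}}{\sum_{i=1}^{n} e^{2 b x_i}}$$

    Substituting gives

    $$\sum_{i=1}^{n} y_i x_i e^{b x_i} - \frac{\sum_{i=1}^{n} y_i e^{b x_i}}{\sum_{i=1}^{n} e^{2 b x_i}} \sum_{i=1}^{n} x_i e^{2 b x_i} = 0$$

    This equation is still a nonlinear equation in $b$ and can be solved by numerical methods such as the bisection method or the secant method.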
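A possible way to carry this out with the bisection method is sketched below. The function names, the synthetic data (generated from $a = 2$, $b = 0.3$ so the answer is known) and the bracket $[0, 1]$ are all assumptions for illustration. With real data a bracket containing a sign change has to be found first.

```python
import math


def a_given_b(xs, ys, b):
    """Closed-form a for a fixed b, from the first normal equation."""
    num = sum(y * math.exp(b * x) for x, y in zip(xs, ys))
    den = sum(math.exp(2 * b * x) for x in xs)
    return num / den


def g(xs, ys, b):
    """Left-hand side of the single nonlinear equation in b."""
    a = a_given_b(xs, ys, b)
    return (sum(y * x * math.exp(b * x) for x, y in zip(xs, ys))
            - a * sum(x * math.exp(2 * b * x) for x in xs))


def fit_exponential(xs, ys, b_lo, b_hi, tol=1e-12, max_iter=200):
    """Find b by bisection on [b_lo, b_hi], then recover a; returns (a, b)."""
    f_lo = g(xs, ys, b_lo)
    if f_lo * g(xs, ys, b_hi) > 0:
        raise ValueError("the bracket [b_lo, b_hi] does not contain a sign change")
    for _ in range(max_iter):
        b_mid = 0.5 * (b_lo + b_hi)
        f_mid = g(xs, ys, b_mid)
        if f_mid == 0.0 or b_hi - b_lo < tol:
            break
        if f_lo * f_mid < 0:
            b_hi = b_mid
        else:
            b_lo, f_lo = b_mid, f_mid
    return a_given_b(xs, ys, b_mid), b_mid


# Synthetic, noise-free data from y = 2 exp(0.3 x) (assumed test case).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
a_fit, b_fit = fit_exponential(xs, ys, 0.0, 1.0)
print(round(a_fit, 6), round(b_fit, 6))
```

Reducing the problem to one unknown keeps the root search one-dimensional, which is why a simple bracketing method is enough here.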

    2.4.2. Growth model

    Growth models common in scientific fields have been developed and used successfully for specific situations. The growth models are used to describe how something grows with changes in a regressor variable (often time). Examples in this category include growth of population with time. Growth models include

    $$y = \frac{a}{1 + b e^{-cx}}$$

    where $a$, $b$ and $c$ are the constants of the model. At $x = 0$, $y = \frac{a}{1 + b}$, and as $x \to \infty$, $y \to a$. The residual at each data point $x_i$ is

    $$E_i = y_i - \frac{a}{1 + b e^{-c x_i}}$$

    The sum of the squares of the residuals is


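    $$\frac{\partial S_r}{\partial a_0} = \sum_{i=1}^{n} 2 \left( y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m \right) (-1) = 0$$

    $$\frac{\partial S_r}{\partial a_1} = \sum_{i=1}^{n} 2 \left( y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m \right) (-x_i) = 0$$

    $$\vdots$$

    $$\frac{\partial S_r}{\partial a_m} = \sum_{i=1}^{n} 2 \left( y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m \right) (-x_i^m) = 0$$

    Writing these equations in matrix form gives

    $$\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \cdots & \sum_{i=1}^{n} x_i^m \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \cdots & \sum_{i=1}^{n} x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_i^m & \sum_{i=1}^{n} x_i^{m+1} & \cdots & \sum_{i=1}^{n} x_i^{2m} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^m y_i \end{bmatrix}$$

    The above system is solved for $a_0, a_1, \ldots, a_m$.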
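The matrix form maps directly onto code: entry $(r, c)$ of the matrix is $\sum x_i^{r+c}$ and entry $r$ of the right-hand side is $\sum x_i^r y_i$. The sketch below builds and solves that system. The function names are hypothetical and the test data (exact values of $1 + 2x + 3x^2$) is an assumed example.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def polynomial_regression(xs, ys, m):
    """Return [a0, a1, ..., am] of the least squares polynomial of order m."""
    # Matrix entry (r, c) is the power sum of x_i ** (r + c); entry (0, 0) is n.
    A = [[sum(x ** (r + c) for x in xs) for c in range(m + 1)]
         for r in range(m + 1)]
    rhs = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(m + 1)]
    return solve_linear_system(A, rhs)


# Exact values of y = 1 + 2x + 3x^2 (assumed test data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 6.0, 17.0, 34.0, 57.0]
poly_coeffs = polynomial_regression(xs, ys, 2)
print([round(c, 6) for c in poly_coeffs])
```

Since the data lies exactly on a quadratic, the fit should return coefficients close to 1, 2 and 3. For high orders or widely spread $x$ values this system becomes poorly conditioned, so the sketch is best suited to low order fits.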

    2.4.4. Logarithmic Functions

    The form for the log regression models is

    $$y = \beta_0 + \beta_1 \ln(x)$$

    This is a linear function between $y$ and $\ln(x)$, and the usual least squares method applies, in which $y$ is the response variable and $\ln(x)$ is the regressor.

    2.4.5. Power Functions

    The power function equation describes many scientific and engineering phenomena:

    $$y = a x^b$$

    The method of least squares is applied to the power function by first linearizing the data (the assumption is that $b$ is not known). If the only unknown is $a$, then a linear relation exists between $x^b$ and $y$. The linearization of the data is as follows:

    $$\ln(y) = \ln(a) + b \ln(x)$$

    The resulting equation shows a linear relation between $\ln(y)$ and $\ln(x)$.


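    We can put

    $$z = \ln(y), \qquad w = \ln(x)$$

    $$a_0 = \ln(a) \ \text{(then } a = e^{a_0}\text{)}, \qquad a_1 = b$$

    we get

    $$z = a_0 + a_1 w$$

    $$a_1 = \frac{n \sum_{i=1}^{n} w_i z_i - \sum_{i=1}^{n} w_i \sum_{i=1}^{n} z_i}{n \sum_{i=1}^{n} w_i^2 - \left( \sum_{i=1}^{n} w_i \right)^2}$$

    $$a_0 = \frac{\sum_{i=1}^{n} z_i}{n} - a_1 \frac{\sum_{i=1}^{n} w_i}{n}$$

    Since $a_0$ and $a_1$ can be found, the original constants of the model are

    $$b = a_1$$

    $$a = e^{a_0}$$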
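The linearized power fit can be sketched as follows. `fit_power` is a hypothetical name, the data must have positive $x$ and $y$ for the logarithms to exist, and the test data (exact values of $3x^2$) is an assumed example.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by linear least squares on (ln x, ln y); returns (a, b)."""
    w = [math.log(x) for x in xs]
    z = [math.log(y) for y in ys]
    n = len(xs)
    sum_w, sum_z = sum(w), sum(z)
    sum_wz = sum(wi * zi for wi, zi in zip(w, z))
    sum_ww = sum(wi * wi for wi in w)
    a1 = (n * sum_wz - sum_w * sum_z) / (n * sum_ww - sum_w ** 2)
    a0 = sum_z / n - a1 * sum_w / n
    return math.exp(a0), a1


# Exact values of y = 3 x^2 (assumed test data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x ** 2 for x in xs]
a_pow, b_pow = fit_power(xs, ys)
print(round(a_pow, 6), round(b_pow, 6))
```

Note that this minimizes the squared residuals of $\ln(y)$, not of $y$ itself, so for noisy data the result can differ somewhat from a direct nonlinear least squares fit of the power model.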