
Page 1

Process Model Formulation and Solution, 3E4
Section D: Curve Fitting Techniques: Part A - Regression

Instructor: Kevin Dunn, [email protected]

Department of Chemical Engineering

Course notes: © Dr. Benoît Chachuat

29 November 2010

Page 2

Recall: (one possible) model classification

First-principles models (or mechanistic models, or white-box models):

- Describe physico-chemical processes using engineering knowledge, e.g. conservation principles
- No direct use of measurement data
- Do use physical properties (e.g. from Perry's Handbook)

Empirical models (a.k.a. black-box models):

- Describe physico-chemical processes using collected sets of measurement data, e.g. statistical methods
- No prior engineering knowledge of the system

Grey-box models: ← most practical process engineering models!

- Suitable combination of:
  - a priori engineering knowledge, e.g. model structure
  - measured data, e.g. kinetic and transport rates

Page 3

Curve fitting methods: various types

Different curve fitting approaches are distinguished by the amount of noise associated with the data.

Least-squares regression techniques:

- Where the data exhibit a significant degree of error (or noise)
- Principle: fit a single curve that represents the general trend of the data
  - no attempt is made to intersect every point
  - follow the pattern of the points taken as a group

Interpolation techniques:

- Where the data are known to be very precise
- Principle: fit a curve or a series of curves that pass directly through each data point
  - data usually taken from tables (e.g. thermodynamic data)
  - also used to build a simplified, more handy version of a complicated model

Page 4

Example of least-squares regression

Problem: fit an empirical model relating temperature (T) and vapor pressure (P°) for a pure component, using experimental data.

Antoine equation:

    ln P° = a0 − a1 / (a2 + T)

    T (K)   P° (Pa)     T (K)   P° (Pa)
    298       95.2      348      730.8
    303      187.4      353      867.0
    308      133.1      358      997.7
    313      185.9      363     1104.5
    318      267.7      368     1262.6
    323      356.0      373     1507.8
    328      366.2      378     1721.8
    333      416.8      383     1943.3
    338      531.6      388     2126.2
    343      637.7      393     2428.2

Page 5

Example of data interpolation

Problem: fit an empirical model relating temperature (T) and specific heat capacity (Cp) for a pure component, using experimental data.

Cubic splines: piecewise 3rd-order polynomials,

    Cp = a0 + a1 T + a2 T² + a3 T³

    T (K)   Cp (J/K/mol)     T (K)   Cp (J/K/mol)
    300     29.85            1000    34.00
    400     29.93            1100    34.49
    500     30.51            1200    34.86
    600     31.25            1300    35.23
    700     32.01            1400    35.54
    800     32.74            1500    35.79
    900     33.42            1600    36.01
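As a sketch (not part of the original notes), the cubic-spline interpolation posed above can be carried out with SciPy's CubicSpline; the choice of SciPy is an assumption, since the course notes do not name a tool:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Heat-capacity data from the table above (T in K, Cp in J/K/mol)
T = np.array([300, 400, 500, 600, 700, 800, 900,
              1000, 1100, 1200, 1300, 1400, 1500, 1600], dtype=float)
Cp = np.array([29.85, 29.93, 30.51, 31.25, 32.01, 32.74, 33.42,
               34.00, 34.49, 34.86, 35.23, 35.54, 35.79, 36.01])

spline = CubicSpline(T, Cp)    # piecewise 3rd-order polynomials

print(float(spline(300.0)))    # reproduces the tabulated point exactly
print(float(spline(350.0)))    # interpolated value between 300 and 400 K
```

By construction the spline passes through every tabulated point exactly, which is the defining property of interpolation as opposed to regression.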

Page 6

Outline and recommended readings

Least-squares regression:
- Linear regression
- Multiple linear regression
- Polynomial regression

(Strongly) recommended readings:

- Chapters in Part 5: 17.1-17.2 and 18.1-18.6 of S. C. Chapra and R. P. Canale, "Numerical Methods for Engineers", McGraw-Hill, 5th/6th edition

Page 7

Refresher: descriptive statistics

- Given a collection of n data points y1, y2, ..., yn

- Arithmetic mean:

      ȳ = (1/n) Σ yk        (sum over k = 1, ..., n)

  - Locates the "center" of the distribution of the data

- Standard deviation:

      σy = sqrt( Σ (yk − ȳ)² / (n − 1) )
         = sqrt( [ Σ (yk)² − (Σ yk)² / n ] / (n − 1) )

  - Measures the "spread" of the distribution of the data
  - The variance is the square of the standard deviation, σy²
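As a quick numerical check (not part of the notes), the two forms of the standard-deviation formula agree; the data values here are borrowed from the straight-line example later in this section:

```python
import math

y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]   # sample data
n = len(y)

ybar = sum(y) / n

# Definition form of the standard deviation
s1 = math.sqrt(sum((yk - ybar) ** 2 for yk in y) / (n - 1))
# One-pass (computational) form
s2 = math.sqrt((sum(yk ** 2 for yk in y) - sum(y) ** 2 / n) / (n - 1))

print(ybar, s1, s2)   # the two formulas give the same spread
```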

Page 8

Linear regression: basics

Problem: fit a straight line y = a0 + a1 x that best describes a set of paired observations (x1, y1), (x2, y2), ..., (xn, yn).

How? Devise a criterion to establish a basis for the fit.

- Due to measurement error, the data are such that:

      yk = a0 + a1 xk + ek

- ek is the residual between the measured and predicted values of yk
- One possible strategy: "minimize the sum of the squares of the residual errors for all the available data" (other criteria are possible)

Find a0, a1 that minimize:

    Sr = Σ (ek)² = Σ (yk − a0 − a1 xk)²        (sums over k = 1, ..., n)

Page 9

Linear regression: calculations

- Values of a0, a1 that minimize Sr are such that:

      ∂Sr/∂a0 = ∂Sr/∂a1 = 0

- This leads to a system of linear algebraic equations (sums over k = 1, ..., n):

      Σ yk − a0 n − a1 Σ xk = 0
      Σ xk yk − a0 Σ xk − a1 Σ (xk)² = 0

- The best possible straight line is obtained for:

      a1 = [ n Σ xk yk − (Σ xk)(Σ yk) ] / [ n Σ (xk)² − (Σ xk)² ]

      a0 = (Σ yk)/n − a1 (Σ xk)/n

Page 10

Linear regression: example

Use least-squares regression to fit a straight line to:

    xk   1     2     3     4     5     6     7
    yk   0.5   2.5   2.0   4.0   3.5   6.0   5.5

    xk    yk    xk yk   (xk)²
    1.0   0.5    0.5     1.0
    2.0   2.5    5.0     4.0
    3.0   2.0    6.0     9.0
    4.0   4.0   16.0    16.0
    5.0   3.5   17.5    25.0
    6.0   6.0   36.0    36.0
    7.0   5.5   38.5    49.0
    Σ    28.0  24.0   119.5   140.0

    a1 = [ 7(119.5) − 28(24) ] / [ 7(140) − (28)² ] ≈ 0.8393

    a0 = 24/7 − a1 (28/7) ≈ 0.0714
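A minimal Python sketch of the closed-form solution, reproducing the worked example above:

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)

Sx  = sum(x)                                  # 28
Sy  = sum(y)                                  # 24
Sxy = sum(xk * yk for xk, yk in zip(x, y))    # 119.5
Sxx = sum(xk ** 2 for xk in x)                # 140

# Closed-form least-squares coefficients for y = a0 + a1 x
a1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a0 = Sy / n - a1 * Sx / n

print(a1, a0)    # about 0.8393 and 0.0714
```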

Page 11

Linear regression: error quantification

- Assume that the measurement error is normally distributed
  - Gaussian white noise
- Standard deviation from the regression line:

      σr = sqrt( Sr / (n − 2) ) = sqrt( Σ (yk − a0 − a1 xk)² / (n − 2) )

  - Divide by n − 2 since there are 2 data-derived estimates, a0 and a1
  - Quantifies the spread around the regression line

For a linear regression to be meaningful: σr < σy

- Various regressions can be compared based on the coefficient of determination:

      r² = 1 − Sr/Sy = 1 − Σ (yk − a0 − a1 xk)² / Σ (yk − ȳ)²

  - r² = 1: perfect fit
  - r² = 0: no improvement over the mean ȳ

Page 12

Linear regression: example (cont’d)

Use least-squares regression to fit a straight line to:

    xk   1     2     3     4     5     6     7
    yk   0.5   2.5   2.0   4.0   3.5   6.0   5.5

    a0 ≈ 0.0714,  a1 ≈ 0.8393

    xk    yk    (yk − ȳ)²   (yk − a0 − a1 xk)²
    1.0   0.5    8.5765       0.1687
    2.0   2.5    0.8622       0.5625
    3.0   2.0    2.0408       0.3473
    4.0   4.0    0.3265       0.3265
    5.0   3.5    0.0051       0.5896
    6.0   6.0    6.6122       0.7972
    7.0   5.5    4.2908       0.1993
    Σ    28.0   24.0  22.7143   2.9911

- Merit: σy = sqrt(22.7143 / (7 − 1)) ≈ 1.9457 > σr = sqrt(2.9911 / (7 − 2)) ≈ 0.7735
- Goodness: r² = 1 − 2.9911/22.7143 ≈ 0.868
  - 86.8% of the variance is explained by the linear model
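The error measures for this example can be checked numerically; the coefficients a0, a1 below are the (rounded) values computed earlier:

```python
import math

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)
a0, a1 = 0.0714286, 0.8392857     # rounded coefficients from the fit

ybar = sum(y) / n
St = sum((yk - ybar) ** 2 for yk in y)                        # total spread
Sr = sum((yk - a0 - a1 * xk) ** 2 for xk, yk in zip(x, y))    # residual spread

sigma_y = math.sqrt(St / (n - 1))
sigma_r = math.sqrt(Sr / (n - 2))
r2 = 1 - Sr / St

print(sigma_y, sigma_r, r2)   # ~1.9457, ~0.7735, ~0.868
```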

Page 13

Linear regression: application to nonlinear models

- Linear regression requires a linear relationship between dependent and independent variables
- Concept: find a transformation to express a nonlinear model in a form compatible with linear regression, Y = a0 + a1 X

Example 1: exponential model, y = α0 exp(α1 x)

1. Linearize by taking the natural logarithm:

       ln y = ln α0 + α1 x

2. Define new variables and data:

       X = x,   Y = ln y,   a0 = ln α0,   a1 = α1

3. Calculate the linear regression: Y = a0 + a1 X

4. Recover the original parameter values:

       α0 = exp(a0),   α1 = a1
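A sketch of the four-step recipe on made-up, noise-free data (the values α0 = 2.0 and α1 = 0.5 are illustrative assumptions, not from the notes):

```python
import math

# Made-up, noise-free data from y = 2.0 * exp(0.5 x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xk) for xk in x]
n = len(x)

# Steps 1-2: transform to a linear problem, Y = ln y
Y = [math.log(yk) for yk in y]

# Step 3: ordinary linear regression on (x, Y)
Sx, SY = sum(x), sum(Y)
Sxy = sum(xk * Yk for xk, Yk in zip(x, Y))
Sxx = sum(xk ** 2 for xk in x)
a1 = (n * Sxy - Sx * SY) / (n * Sxx - Sx ** 2)
a0 = SY / n - a1 * Sx / n

# Step 4: recover the original parameters
alpha0, alpha1 = math.exp(a0), a1
print(alpha0, alpha1)    # recovers 2.0 and 0.5 on noise-free data
```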

Page 14

Linear regression: application to nonlinear models

- Linear regression requires a linear relationship between dependent and independent variables
- Concept: find a transformation to express a nonlinear model in a form compatible with linear regression, Y = a0 + a1 X

Example 2: power model, y = α0 x^α1

1. Linearize by taking the natural logarithm:

       ln y = ln α0 + α1 ln x

2. Define new variables and data:

       X = ln x,   Y = ln y,   a0 = ln α0,   a1 = α1

3. Calculate the linear regression: Y = a0 + a1 X

4. Recover the original parameter values:

       α0 = exp(a0),   α1 = a1

Page 15

Linear regression: application to nonlinear models

- Linear regression requires a linear relationship between dependent and independent variables
- Concept: find a transformation to express a nonlinear model in a form compatible with linear regression, Y = a0 + a1 X

Remarks:

- The transformation must be tailored to the nonlinear model at hand!
  - E.g., the logarithm function is useful to linearize product terms
- A transformation may or may not be found that yields a linear regression form
- A transformation may be found that leads to a multi-linear regression or a polynomial regression (instead of a linear regression)

Page 16

Multiple linear regression: basics

Problem: fit a hyperplane y = a0 + a1 x1 + ... + am xm that best describes a set of observations (x11, ..., xm1, y1), (x12, ..., xm2, y2), ..., (x1n, ..., xmn, yn).

How? "Minimize the sum of the squares of the residual errors for all the available data."

Find a0, a1, ..., am that minimize:

    Sr = Σ (yk − a0 − a1 x1k − ... − am xmk)²        (sum over k = 1, ..., n)

- Values of a0, a1, ..., am that minimize Sr are such that:

      ∂Sr/∂a0 = ∂Sr/∂a1 = ... = ∂Sr/∂am = 0

Page 17

Multiple linear regression: calculations

- For the jth coefficient aj (j ≠ 0):

      ∂Sr/∂aj = 0  ⇒  Σ xjk yk − a0 Σ xjk − a1 Σ xjk x1k − ... − am Σ xjk xmk = 0

- Overall, m + 1 linear algebraic equations (in m + 1 variables; sums over k = 1, ..., n):

      Σ yk − a0 n − a1 Σ x1k − ... − am Σ xmk = 0
      Σ x1k yk − a0 Σ x1k − a1 Σ (x1k)² − ... − am Σ x1k xmk = 0
      ⋮
      Σ xmk yk − a0 Σ xmk − a1 Σ xmk x1k − ... − am Σ (xmk)² = 0

Page 18

Multiple linear regression: calculations (cont’d)

- In matrix form, M a = r:

      M = [ n         Σ x1k        ...   Σ xmk
            Σ x1k     Σ (x1k)²     ...   Σ x1k xmk
            ⋮         ⋮                  ⋮
            Σ xmk     Σ x1k xmk    ...   Σ (xmk)²  ]

      a = [ a0, a1, ..., am ]ᵀ,   r = [ Σ yk, Σ x1k yk, ..., Σ xmk yk ]ᵀ

- Use numerical methods for the solution of linear algebraic equations to get the best coefficient values a0, a1, ..., am
  - E.g., the LU decomposition method

Page 19

Multiple linear regression: example

Use least-squares regression to fit a plane to:

    x1k   0    2    2.5   1    4    7
    x2k   0    1    2     3    6    2
    yk    5    10   9     0    3    27

    x1k   x2k   yk   x1k yk   x2k yk   (x1k)²   (x2k)²   x1k x2k
    0     0     5     0        0        0        0        0
    2     1     10    20       10       4        1        2
    2.5   2     9     22.5     18       6.25     4        5
    1     3     0     0        0        1        9        3
    4     6     3     12       18       16       36       24
    7     2     27    189      54       49       4        14
    Σ     16.5  14    54       243.5    100      76.25    54       48

    [ n        Σ x1k       Σ x2k     ]   [ 6      16.5    14 ]
    [ Σ x1k    Σ (x1k)²    Σ x1k x2k ] = [ 16.5   76.25   48 ]
    [ Σ x2k    Σ x1k x2k   Σ (x2k)²  ]   [ 14     48      54 ]

    [ Σ yk, Σ x1k yk, Σ x2k yk ]ᵀ = [ 54, 243.5, 100 ]ᵀ

Page 20

Multiple linear regression: example (cont’d)

Use least-squares regression to fit a plane to:

    x1k   0    2    2.5   1    4    7
    x2k   0    1    2     3    6    2
    yk    5    10   9     0    3    27

Solving M a = r gives:

    [ a0, a1, a2 ]ᵀ = [ 5, 4, −3 ]ᵀ
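A sketch of assembling and solving the normal equations M a = r for this example; numpy.linalg.solve is used here as a stand-in for an explicit LU decomposition:

```python
import numpy as np

x1 = np.array([0.0, 2.0, 2.5, 1.0, 4.0, 7.0])
x2 = np.array([0.0, 1.0, 2.0, 3.0, 6.0, 2.0])
y  = np.array([5.0, 10.0, 9.0, 0.0, 3.0, 27.0])
n  = len(y)

# Assemble the normal equations M a = r (same sums as the table above)
M = np.array([[n,         x1.sum(),       x2.sum()],
              [x1.sum(),  (x1**2).sum(),  (x1*x2).sum()],
              [x2.sum(),  (x1*x2).sum(),  (x2**2).sum()]])
r = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

a = np.linalg.solve(M, r)    # LAPACK solves this via LU factorization
print(a)                     # ≈ [5, 4, -3]
```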

Page 21

Multiple linear regression: error quantification

- The good news: very similar to basic linear regression
- Standard deviation from the regression:

      σr = sqrt( Sr / (n − (m + 1)) ) = sqrt( Σ (yk − a0 − a1 x1k − ... − am xmk)² / (n − (m + 1)) )

  - Divide by n − (m + 1) since there are m + 1 data-derived estimates, a0, a1, ..., am
  - Quantifies the spread around the regression plane

For a multi-linear regression to be meaningful: σr < σy

- Various regressions can be compared based on the coefficient of determination:

      r² = 1 − Sr/Sy = 1 − Σ (yk − a0 − a1 x1k − ... − am xmk)² / Σ (yk − ȳ)²

  - r² = 1: perfect fit
  - r² = 0: no improvement over the mean ȳ

Page 22

Multiple linear regression: example (cont’d)

Use least-squares regression to fit a plane to:

    x1k   0    2    2.5   1    4    7
    x2k   0    1    2     3    6    2
    yk    5    10   9     0    3    27

    [ a0, a1, a2 ]ᵀ = [ 5, 4, −3 ]ᵀ

    x1k   x2k   yk   (yk − ȳ)²   (yk − a0 − a1 x1k − a2 x2k)²
    0     0     5     16           0
    2     1     10    1            0
    2.5   2     9     0            0
    1     3     0     81           0
    4     6     3     36           0
    7     2     27    324          0
    Σ     16.5  14    54   458     0

- Spread: σr = 0
- Goodness: r² = 1. A perfect fit!

Page 23

Polynomial regression: basics

Problem: fit a polynomial y = a0 + a1 x + ... + am x^m that best describes a set of observations (x1, y1), (x2, y2), ..., (xn, yn).

How? Transform an mth-order polynomial model into an m-dimensional multi-linear model:

1. Define new data:

       x1k = xk,   x2k = (xk)²,   ...,   xmk = (xk)^m,   for k = 1, ..., n

2. Calculate the multi-linear regression:

       y = a0 + a1 x1 + ... + am xm

Page 24

Polynomial regression: calculations

- In matrix form, M a = r:

      M = [ n          Σ xk           ...   Σ (xk)^m
            Σ xk       Σ (xk)²        ...   Σ (xk)^(m+1)
            ⋮          ⋮                    ⋮
            Σ (xk)^m   Σ (xk)^(m+1)   ...   Σ (xk)^(2m)  ]

      a = [ a0, a1, ..., am ]ᵀ,   r = [ Σ yk, Σ xk yk, ..., Σ (xk)^m yk ]ᵀ

- The resulting linear algebraic equations tend to become more and more ill-conditioned as the polynomial order m increases
  - Potential round-off error problem with LU decomposition
  - Most practical problems involve low-order polynomials

Page 25

Polynomial regression: example

Use least-squares regression to fit a 2nd-order polynomial to:

    xk   0     1     2      3      4      5
    yk   2.1   7.7   13.6   27.2   40.9   60.1

    xk   yk     (xk)²   (xk)³   (xk)⁴   xk yk   (xk)² yk
    0    2.1    0       0       0       0       0
    1    7.7    1       1       1       7.7     7.7
    2    13.6   4       8       16      27.2    54.4
    3    27.2   9       27      81      81.6    244.8
    4    40.9   16      64      256     163.6   654.4
    5    60.1   25      125     625     305.5   1527.5
    Σ    15     152.6   55      225     979     585.6   2488.8

    [  6    15    55  ] [ a0 ]   [  152.6 ]
    [ 15    55   225  ] [ a1 ] = [  585.6 ]
    [ 55   225   979  ] [ a2 ]   [ 2488.8 ]

    a0 ≈ 2.4786,  a1 ≈ 2.3593,  a2 ≈ 1.8607

Page 26

Least-squares regression: conclusion

Which type of model should I choose in practice?

- Trade-off between accuracy (the best possible fit of the data) and parsimony (the least possible number of parameters)

Practical procedure:

1. Start by fitting a linear model to the data; order m = 1
2. Increase the order, m ← m + 1, and fit the corresponding polynomial model
3. Repeat step 2 as long as σr decreases with increasing m
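The practical procedure above can be sketched as a loop; the stopping bound on m (keeping at least one degree of freedom) is an added safeguard, not from the notes:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 60.1])
n = len(x)

def fit_poly(m):
    """Least-squares polynomial of order m via the normal equations."""
    M = np.array([[np.sum(x ** (i + j)) for j in range(m + 1)]
                  for i in range(m + 1)])
    r = np.array([np.sum(x ** i * y) for i in range(m + 1)])
    return np.linalg.solve(M, r)

def sigma_r(a, m):
    """Standard deviation around an order-m polynomial fit."""
    pred = sum(a[i] * x ** i for i in range(m + 1))
    return np.sqrt(np.sum((y - pred) ** 2) / (n - (m + 1)))

# Step 1: start with a straight line
m = 1
best = sigma_r(fit_poly(m), m)
# Steps 2-3: raise the order while sigma_r keeps decreasing
while m + 3 <= n:                 # safeguard: keep n - (m+1) >= 1 at order m+1
    nxt = sigma_r(fit_poly(m + 1), m + 1)
    if nxt >= best:               # sigma_r stopped decreasing: stop
        break
    m, best = m + 1, nxt

print(m)    # chosen polynomial order for this data
```

On this (near-quadratic) data the loop stops at a 2nd-order polynomial, matching the example worked earlier.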