Chapter 4: Statistics (Purdue Engineering)
CHAPTER 4

STATISTICS

When we make a measurement several different things may be true:

(1) The parameter we are measuring is random, e.g. flow noise.

(2) The parameter we are trying to measure is deterministic but other parameters affect the

measurement: e.g. environmental factors, electronic noise, quantization noise. Many of

these parameters are random.

The result is our measurement is random, and the result of the next measurement cannot be

predicted exactly. A deterministic measurement is one that can be predicted exactly. So how do

we describe the measurements if they are random? How does the description relate to what we

are trying to measure as opposed to what we do measure? Let's consider the following example

as an illustration of some of the things we should be thinking about when we make measurements

of parameters that are random in nature.

EXAMPLE

Suppose we wish to measure the average age of US citizens in Indiana, and we decide to do this

by going into a high school class, asking the ages of the students there and averaging the

answers. It turns out that on this day only 5 students are in class. Let's make some comments

about this.

(a) The Process of Collecting the Data

The average age of students in a high school class is much lower than the average of the US

population. So even if we had thousands of students in our high school class and averaged

their ages, the result would be much less than the US population average age. We say that

there is a "bias" in our estimation procedure. The variation of the ages in the high school class would also be much lower than the variation of ages in the US population. Had

we conducted the test at a location peopled by all age groups we might have wished to take

many samples of people's ages, so that we would feel "happier" that the calculated average

was closer to the true average age of the US population.

(b) The Data

Suppose we wanted to know the average age of students in Grade 11, and the class we

sampled was Grade 11. We would now feel more comfortable that we were collecting a

more appropriate range of samples. However, we only have 5 samples. If we calculated the

average age when more students were in class, the answer would be different, and perhaps

having more sample ages to average we would feel "better" about the calculation.


In this example we have talked about:

- the average value of the data samples,
- the spread of the data samples,
- the idea that somehow more data is "better",
- the idea that the bigger the variation, the more desirable it is to take more samples, and finally,
- the idea of bias, where you are consistently "off" in your estimates due to some problem with the way you are calculating what you wish to measure.

In this chapter we will try to formalize these concepts with a mathematical description of the

random data.

DISTRIBUTION OF THE DATA

Suppose we take samples of what we are trying to measure. Let $X_1, X_2, \ldots, X_N$ be the sample measurements. What is the probability that a measurement ($X$) is less than a certain value $x$? We write this as:

$$P[X < x].$$

Based on our sample we would estimate this probability by counting how many $X_i$ are less than $x$. Let's denote this count by $n_i$, and we estimate the probability that a measurement is less than $x$ by dividing $n_i$ by $N$. So:

$$\hat{P}[X < x] = \frac{n_i}{N} \qquad (1)$$

where ^ denotes an estimate. As $x$ increases, $\hat{P}[X < x]$ gets larger until we reach the point where $x$ is greater than the maximum of all the samples taken, $X_{\max} = \max(X_i;\ i = 1, 2, \ldots, N)$, at which point $n_i = N$ for all values above this and $\hat{P}[X < x] = 1$. Let's plot $\hat{P}[X < x]$ versus $x$ (Figure 1).

Figure 1: Cumulative Probability (Sample Population)

Every time x equals one of the sample values this function jumps by 1/N. If N were large then

1/N would be small and the jumps would be less noticeable; they would also be occurring more

often. So when N → ∞, i.e., the number of samples we take → ∞, this function becomes a

smooth function of x, as shown below in Figure 2. We refer to the set of all possible samples as

the parent population. When we only have a finite number of samples we refer to this group as a

sample population.
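The estimate in equation (1) is simple to compute directly. The sketch below (with hypothetical sample values, not taken from the text) shows the staircase behavior: the estimate jumps by 1/N at each sample value, reaching 1 above the largest sample.

```python
# Empirical estimate of P[X < x] from N samples, as in equation (1):
# count the samples below x and divide by N.
def cdf_estimate(samples, x):
    """Return n_i / N, the fraction of samples less than x."""
    n_i = sum(1 for s in samples if s < x)
    return n_i / len(samples)

samples = [2.1, 3.7, 3.9, 5.0, 6.4]   # N = 5 hypothetical measurements

# Below the smallest sample the estimate is 0; above the largest it is 1,
# and it climbs in steps of 1/N = 0.2 in between.
print(cdf_estimate(samples, 2.0))   # 0.0
print(cdf_estimate(samples, 4.0))   # 0.6  (three of five samples below 4.0)
print(cdf_estimate(samples, 7.0))   # 1.0
```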

Figure 2: Cumulative Probability (Total Population)

This shape tells us there are not many data values when x < a or when x > b.

In order to show where most data values lie, let's look at the probability of a data value lying in an interval $\Delta x$, centered on an arbitrary value of $x$. This is given by:

$$P\left[x - \tfrac{\Delta x}{2} < X \le x + \tfrac{\Delta x}{2}\right] = P\left[X \le x + \tfrac{\Delta x}{2}\right] - P\left[X \le x - \tfrac{\Delta x}{2}\right] \qquad (2)$$

The result of this calculation depends on the interval width $\Delta x$, so let's normalize the calculation by $\Delta x$ and define

$$\tilde{p}(x) = \frac{P\left[X \le x + \tfrac{\Delta x}{2}\right] - P\left[X \le x - \tfrac{\Delta x}{2}\right]}{\Delta x} \qquad (3)$$

Now let $\Delta x \to 0$ and then

$$p(x) = \lim_{\Delta x \to 0} \tilde{p}(x) = \frac{d}{dx} P[X < x]. \qquad (4)$$

If we differentiate the function $P[X < x]$ shown in Figure 2, we should have a function similar to the $p(x)$ shown in Figure 3.


Figure 3: Probability density function

4-4

p(x) is called the probability density function and shows us where the data values are most likely

to occur. From Figure 3 we can see that many of the data samples will have values near x = c.

There are many different types of probability density functions. Here are two examples.

(a) GAUSSIAN OR NORMAL PROBABILITY DENSITY FUNCTION

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x - \mu_x)^2}{2\sigma_x^2}} \qquad (5)$$

Figure 4: Gaussian probability density function

Many noise signals approximately have this distribution. Also, if a measurement is the outcome

of many random events, the distribution will tend to be Gaussian. We see that values of x close to $\mu_x$ are more likely to occur than other values.

(b) UNIFORM PROBABILITY DENSITY FUNCTION

$$p(x) = \frac{1}{2B}, \quad A - B < x < A + B; \qquad p(x) = 0, \ \text{elsewhere.} \qquad (6)$$

Figure 5: Uniform probability density function

Calculating the Probability that X Lies in an Interval, a < X < b by using p(x)

Since p(x) is the derivative of P[X < x], the probability that X lies between a and b is the integral of p(x) between a and b, given by:

$$P[a < X < b] = \int_a^b p(x)\, dx. \qquad (7)$$

Figure 6: Rayleigh distribution, $p(x) = \frac{x}{r^2}\, e^{-\frac{x^2}{2r^2}}$


This is illustrated in Figure 6 where the probability density function (pdf) of a Rayleigh

distributed random variable is shown.

Note that a property of ALL probability density functions is:

$$\int_{-\infty}^{\infty} p(x)\, dx = 1, \qquad (8)$$

which is to say that the probability that X takes a value between $-\infty$ and $+\infty$ is 1.
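Equations (7) and (8) are easy to check numerically. The sketch below integrates the uniform pdf of equation (6) with a simple midpoint rule; the values of A and B are arbitrary illustrations, not values from the text.

```python
# Numerical check of equations (7) and (8) for the uniform pdf of
# equation (6): p(x) = 1/(2B) on A-B < x < A+B, zero elsewhere.
A, B = 5.0, 2.0   # illustrative values only

def p(x):
    return 1.0 / (2.0 * B) if (A - B) < x < (A + B) else 0.0

def integrate(f, lo, hi, n=100000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

total = integrate(p, A - B, A + B)     # equation (8): total area should be 1
prob = integrate(p, 4.0, 6.0)          # equation (7): P[4 < X < 6] = 2/(2B) = 0.5
print(round(total, 6), round(prob, 6))  # 1.0 0.5
```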

EXAMPLE

If p(x) is the pdf of a uniformly distributed random variable, such that

$$p(x) = 12, \quad 10 < x < Q; \qquad p(x) = 0, \quad x < 10 \ \text{or}\ x > Q,$$

calculate Q.

Solution

The total area under the probability density function should equal one:

$$\text{Area} = (Q - 10) \times 12 = 1.$$

Therefore $Q = 10\tfrac{1}{12}$.

Figure 7: Uniform distribution

EXAMPLE

If the pdf of a random variable, X, is

$$p(x) = \frac{1}{A}\, e^{-\frac{x^2}{B}},$$

what is the relationship between A and B?

Solution

This looks like a Gaussian distributed random variable, where p(x) has the form:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x - \mu_x)^2}{2\sigma_x^2}}.$$

Setting $\mu_x = 0$, $2\sigma_x^2 = B$ and $\sqrt{2\pi}\,\sigma_x = A$, we derive the relationship:

$$A = \sqrt{2\pi}\,\sigma_x = \sqrt{2\pi}\sqrt{\tfrac{B}{2}} = \sqrt{\pi B}.$$

EXAMPLE


The pdf of a random variable X is

$$p(x) = \frac{x}{r^2}\, e^{-\frac{x^2}{2r^2}} \quad \text{for } x > 0, \qquad p(x) = 0 \ \text{for } x < 0.$$

What is the probability of X lying between 10 and 25?

Solution

$$P[10 < X < 25] = \int_{10}^{25} \frac{x}{r^2}\, e^{-\frac{x^2}{2r^2}}\, dx = \int_{10}^{25} -\frac{d}{dx}\left(e^{-\frac{x^2}{2r^2}}\right) dx.$$

Note that

$$\frac{d}{dx}\, e^{-\frac{x^2}{2r^2}} = -\frac{x}{r^2}\, e^{-\frac{x^2}{2r^2}},$$

so

$$P[10 < X < 25] = \left[-e^{-\frac{x^2}{2r^2}}\right]_{10}^{25} = e^{-\frac{100}{2r^2}} - e^{-\frac{625}{2r^2}}.$$
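The closed-form answer above can be checked against direct numerical integration of the Rayleigh pdf. The text leaves r general, so the value r = 10 below is an arbitrary choice for illustration.

```python
import math

# Check of the worked Rayleigh example: P[10 < X < 25] should equal
# exp(-100/(2 r^2)) - exp(-625/(2 r^2)).
r = 10.0   # illustrative; the text keeps r general

def p(x):
    """Rayleigh pdf (x/r^2) exp(-x^2 / (2 r^2)) for x > 0."""
    return (x / r**2) * math.exp(-x**2 / (2.0 * r**2)) if x > 0 else 0.0

def integrate(f, lo, hi, n=200000):
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

closed_form = math.exp(-100.0 / (2 * r**2)) - math.exp(-625.0 / (2 * r**2))
numeric = integrate(p, 10.0, 25.0)
print(round(closed_form, 6), round(numeric, 6))  # both about 0.5626
```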

MOMENTS OF THE DATA (e.g., mean, variance, etc.)

In this section we will describe calculation of moments of a random variable, e.g., mean value,

mean square value, etc.

Mean Value

If we wished to estimate the average or mean of our data samples we would calculate

$$\hat{\mu}_x = \frac{1}{N}\sum_{i=1}^{N} X_i.$$

If we know p(x), how can we calculate the mean value? Consider a small interval of width $\Delta x$, centered on $x_c$, as shown in Figure 8(a). The shaded area is $P[x_c - \tfrac{\Delta x}{2} < X \le x_c + \tfrac{\Delta x}{2}]$, and the normalized histogram of the data is $n_{c_i}/(N\,\Delta x)$.

Let's start with calculating how many of the total number of samples, N, fall into this interval:

$$x_c - \frac{\Delta x}{2} < X \le x_c + \frac{\Delta x}{2}.$$

The number of values in this interval, $n_c$, is the total number of values (N) times the probability of a value lying in the interval, which is:

$$n_c = N\, P\left[x_c - \tfrac{\Delta x}{2} < X \le x_c + \tfrac{\Delta x}{2}\right] = N \int_{x_c - \Delta x/2}^{x_c + \Delta x/2} p(x)\, dx. \qquad (9)$$

If $\Delta x$ is small, this tells us approximately how many values we have equal to $x_c$. Splitting the full range of data into bins $\Delta x$ wide, centered on the $x_{c_i}$, we can replace p(x) by an approximate histogram representation (Figure 8(b)). We can then say that we have approximately $n_{c_1}$ values equal to $x_{c_1}$, $n_{c_2}$ values equal to $x_{c_2}$, and so on. The average can then be estimated by using:

$$\frac{1}{N}\left(n_{c_1} x_{c_1} + n_{c_2} x_{c_2} + \cdots + n_{c_M} x_{c_M}\right) = \sum_{i=1}^{M} \frac{n_{c_i}}{N}\, x_{c_i} \approx \sum_{i=1}^{M} x_{c_i} \int_{x_{c_i} - \Delta x/2}^{x_{c_i} + \Delta x/2} p(x)\, dx. \qquad (10)$$

Letting $\Delta x \to 0$, the bins become smaller and the $x_{c_i}$ become closer until they join up and become continuous, and the sum of integrals over the bins becomes a single integral over the whole range. Therefore the mean, which we denote by $\mu_x$, is given by:

$$\mu_x = \text{average or mean or } E[X] = \int_{-\infty}^{\infty} x\, p(x)\, dx. \qquad (11)$$

The E[.] denotes expected value of, or average value of whatever is inside the brackets.

EXAMPLE

A random variable, X, has a probability density function p(x) = 0 when x < 0, p(x) = x when 0 < x < B, and p(x) = 0 when x > B. Calculate the mean value of X.

Solution

Let's denote the mean value of X by $\mu$.

$$\mu = E[X] = \int_0^B x \cdot x\, dx = \frac{1}{3} B^3.$$

Because we know that the total area under the p.d.f. is 1, we can also calculate B:

$$1 = \int_0^B x\, dx = \frac{1}{2} B^2.$$

Hence $B^2 = 2$, so $B = \sqrt{2}$ and $\mu = \frac{1}{3} B^3 = \frac{2\sqrt{2}}{3}$.
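This example can be verified numerically: with B = √2 the pdf integrates to 1, and equation (11) reproduces the mean 2√2/3 ≈ 0.9428.

```python
import math

# Numerical check of the example: p(x) = x on 0 < x < B, zero elsewhere.
# Normalization gives B = sqrt(2); the mean is B^3/3 = 2*sqrt(2)/3.
B = math.sqrt(2.0)

def integrate(f, lo, hi, n=100000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

area = integrate(lambda x: x, 0.0, B)        # total area under the pdf
mean = integrate(lambda x: x * x, 0.0, B)    # E[X] = integral of x * p(x)
print(round(area, 6), round(mean, 6))        # 1.0 0.942809
```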

Mean Value of a Function of a Random Variable

Using similar arguments to the derivation of the formula for calculating the mean, it is possible

to show the following. If Y is a function of a random variable X, Y = g(X), and X has a

probability density function p(x), then the mean value of Y is given by:

$$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x)\, p(x)\, dx. \qquad (12)$$

EXAMPLE

Suppose that $P = \frac{nRT}{V}$ and n, R and T are constant, but V is a uniformly distributed random variable with a p.d.f. $p(V) = \frac{1}{B}$ for V in the range $V_o - \frac{B}{2} < V < V_o + \frac{B}{2}$, and p(V) = 0 outside this range. Note that, because V has a uniform distribution, the mean value of V is $V_o$, the center value of the range of values that V can take. Calculate the mean value of P.

Solution

$$\mu_P = E[P] = E\left[\frac{nRT}{V}\right] = \int_{V_o - B/2}^{V_o + B/2} \frac{nRT}{V}\, p(V)\, dV = \int_{V_o - B/2}^{V_o + B/2} \frac{nRT}{V} \cdot \frac{1}{B}\, dV$$

$$= \frac{nRT}{B}\Big[\log_e(V)\Big]_{V_o - B/2}^{V_o + B/2} = \frac{nRT}{B}\, \log_e\!\left(\frac{V_o + B/2}{V_o - B/2}\right).$$

You might note here that

$$\mu_P \ne \frac{nRT}{E[V]}, \quad \text{which would equal } \frac{nRT}{V_o}.$$

However, if B is small, it can be shown, by expanding the log function in a Taylor series expansion, that $\mu_P$ is approximately equal to $\frac{nRT}{V_o}$.


Later in the propagation of error section we will make this assumption. Remember when you

use it, that it is based on the assumption that the variation is small which may not, in general, be

true. You actually knew this, if you thought about it. For example, consider Y = X2, with X

uniformly distributed in the interval -3 < X < +3. Y will take values in the interval 0 < Y < 9 and

is not uniformly distributed. The mean of X is 0, but the mean of Y is obviously not the square of the mean of X, i.e., $\mu_x^2$, which is 0. In fact, the mean of Y is 3. Using equation (12), prove this as an exercise.
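The exercise can also be checked numerically with equation (12): for X uniform on (-3, 3), E[Y] = ∫ x² (1/6) dx = 3, while E[X] = 0.

```python
# The exercise above: X uniform on (-3, 3), so p(x) = 1/6, and Y = X^2.
# Equation (12) gives E[Y] = 3, which is not the square of E[X] = 0.
def integrate(f, lo, hi, n=100000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

mean_x = integrate(lambda x: x * (1.0 / 6.0), -3.0, 3.0)        # E[X]
mean_y = integrate(lambda x: x * x * (1.0 / 6.0), -3.0, 3.0)    # E[X^2]
print(round(mean_y, 6))   # 3.0
```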

Variance and Standard Deviation

To measure the spread of the data we look at how the data varies about its mean value, . If we

have our N samples, and say we know the mean, we measure the variation in the data with the

following estimate:

$$\hat{\sigma}_x^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2. \qquad (13)$$

If we know p(x) we can calculate this exactly. This quantity is called the variance, which we denote by $\sigma_x^2$, and using equation (12), where the function is $(x - \mu)^2$, we can show that:

$$\sigma_x^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, dx. \qquad (14)$$

The standard deviation, $\sigma_x$, is the square root of the variance.

EXAMPLE

Show that the mean value of a random variable with the following Gaussian distribution is c.

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - c)^2}$$

Solution

$$E[X] = \int_{-\infty}^{\infty} x\, p(x)\, dx = \int_{-\infty}^{\infty} \frac{x}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - c)^2}\, dx$$

$$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, (x - c)\, e^{-\frac{1}{2}(x - c)^2}\, dx + c \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - c)^2}\, dx$$

$$= \left[-\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - c)^2}\right]_{-\infty}^{\infty} + c \int_{-\infty}^{\infty} p(x)\, dx = 0.0 + c \cdot 1 = c.$$

As $x \to \pm\infty$, $e^{-\frac{1}{2}(x - c)^2} \to 0$, therefore the first term is zero. By definition the integral of p(x) over the entire range of possible values for x is 1; the second term is therefore equal to c, giving the required result.

EXAMPLE

Calculate the variance of a random variable with the following uniform distribution:

$$p(\theta) = \frac{1}{2\alpha} \ \text{for } -\alpha < \theta < \alpha; \qquad p(\theta) = 0, \ \text{elsewhere.}$$

Solution

$$\sigma^2 = E[(\theta - \mu)^2]; \quad \mu = 0 \text{ from inspection, so } \sigma^2 = E[\theta^2] = \int_{-\alpha}^{\alpha} \theta^2\, p(\theta)\, d\theta$$

$$= \int_{-\alpha}^{\alpha} \frac{\theta^2}{2\alpha}\, d\theta = \left[\frac{\theta^3}{6\alpha}\right]_{-\alpha}^{\alpha} = \frac{\alpha^3}{6\alpha} + \frac{\alpha^3}{6\alpha} = \frac{\alpha^2}{3}.$$

EXAMPLE

Y = 3X² – 12 and X is a random variable with a uniform distribution:

$$p(x) = \frac{1}{2\alpha} \ \text{for } -\alpha < x < \alpha; \qquad p(x) = 0, \ \text{elsewhere.}$$

What is the standard deviation of Y?

Solution

Let's first calculate the variance of Y.

$$\sigma_y^2 = E[(Y - \mu_y)^2] = E[Y^2 - 2Y\mu_y + \mu_y^2] = E[Y^2] - 2\mu_y E[Y] + \mu_y^2 = E[Y^2] - \mu_y^2$$

$$\mu_y = E[Y] = 3E[X^2] - 12 = \alpha^2 - 12$$

Note we used $E[X^2] = \frac{\alpha^2}{3}$, which we calculated in the previous example.

$$E[Y^2] = E[9X^4 - 72X^2 + 144] = 9E[X^4] - 72E[X^2] + 144$$

$$E[X^4] = \int_{-\alpha}^{\alpha} \frac{x^4}{2\alpha}\, dx = \left[\frac{x^5}{10\alpha}\right]_{-\alpha}^{\alpha} = \frac{\alpha^4}{5}$$

$$\sigma_y^2 = E[Y^2] - \mu_y^2 = \frac{9\alpha^4}{5} - 24\alpha^2 + 144 - (\alpha^2 - 12)^2 = \frac{9\alpha^4}{5} - \alpha^4 = \frac{4\alpha^4}{5}.$$

The standard deviation of Y is $\sigma_y$, that is, the square root of the variance. Hence,

$$\sigma_y = \frac{2\alpha^2}{\sqrt{5}}.$$
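The result σ_y = 2α²/√5 can be checked by direct integration. The text keeps α general; the choice α = 2 below is purely illustrative.

```python
import math

# Check of the Y = 3X^2 - 12 example by direct integration,
# with the illustrative choice alpha = 2 (the text keeps alpha general).
alpha = 2.0

def integrate(f, lo, hi, n=100000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

p = 1.0 / (2.0 * alpha)            # uniform pdf value on (-alpha, alpha)
g = lambda x: 3.0 * x * x - 12.0   # Y = g(X)

mu_y = integrate(lambda x: g(x) * p, -alpha, alpha)
var_y = integrate(lambda x: (g(x) - mu_y) ** 2 * p, -alpha, alpha)
sigma_y = math.sqrt(var_y)
print(round(sigma_y, 6))           # 2*alpha^2/sqrt(5) ~= 3.577709
```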

ESTIMATES OF MEAN AND STANDARD DEVIATION

Most often we do not know p(x) and must estimate $\mu_x$ and $\sigma_x$ from the sample data values. The formulas we use are:

$$\hat{\mu}_x = \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad (15)$$

and

$$\hat{\sigma}_x^2 = s_x^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)^2 \quad \text{or} \quad \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2. \qquad (16)$$

Note that we have introduced some new notation. ^ is used to denote an estimate of a quantity, and hence $\hat{\mu}_x$ denotes an estimate of the mean of X. The notation $\bar{X}$ is often used in texts to denote the same thing, and is included here so that you become familiar with both notations. In different texts you will find a wide variety of notation, so take care when reading to understand that particular author's notation. Following our ^ notation, $\hat{\sigma}_x^2$ denotes an estimate of the variance; another commonly used notation for this estimate is $s_x^2$.

Each time we measure N samples and calculate $\bar{X}$ and $s_x$ we get a different answer; $\bar{X}$ and $s_x$ are themselves random variables. In this section we will look at the properties of $\bar{X}$ and how they are related to the properties of X, the data samples.
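Equations (15) and (16) translate directly into code. The sketch below uses the 1/(N-1) form of equation (16) with the sample mean; the data values are hypothetical.

```python
import math

# Sample estimates of the mean and standard deviation,
# equations (15) and (16), for a small hypothetical data set.
def mean_est(xs):
    """Equation (15): the sample mean."""
    return sum(xs) / len(xs)

def var_est(xs):
    """The 1/(N-1) form of equation (16), using the sample mean."""
    xbar = mean_est(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [9.8, 10.2, 10.1, 9.9, 10.0]
xbar = mean_est(data)
s_x = math.sqrt(var_est(data))
print(round(xbar, 6), round(s_x, 6))   # 10.0 0.158114
```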

If $X_1, X_2, \ldots, X_N$ are independent measurements then we may write, for $i \ne j$,

$$E[(X_i - \mu_x)(X_j - \mu_x)] = E[(X_i - \mu_x)]\, E[(X_j - \mu_x)].$$

If i = j, then we have

$$E[(X_i - \mu_x)^2] = \sigma_x^2, \ \text{by definition.}$$

If the means of the $X_i$ are all equal to $\mu_x$, the mean of $\bar{X}$ is also $\mu_x$. Formally proved:

$$E[\bar{X}] = E\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\, N \mu_x = \mu_x.$$

Here we used the property: the expectation of a sum of terms is the sum of the expectations of the individual terms in the sum. We say that $\bar{X}$ is an unbiased estimator of $\mu_x$ because $E[\bar{X}] - \mu_x = 0$. This means that if we repeated the test many times and calculated the mean of each test, each time we would generate a different $\bar{X}$, but these $\bar{X}$'s are centered around $\mu_x$.

The standard deviation of $\bar{X}$ is the square root of its variance, which is given by:

$$\sigma_{\bar{x}}^2 = E[(\bar{X} - \mu_x)^2] = E\left[\left(\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_x)\right)^{\!2}\right] = \frac{1}{N^2}\, E\left[\sum_{i=1}^{N}\sum_{j=1}^{N} (X_i - \mu_x)(X_j - \mu_x)\right].$$

If $i \ne j$,

$$E[(X_i - \mu_x)(X_j - \mu_x)] = E[(X_i - \mu_x)]\, E[(X_j - \mu_x)] = 0,$$

because $E[X_i - \mu_x] = E[X_i] - \mu_x = 0$. If i = j, and all the $X_i$ have the same variance, $\sigma_x^2$, then

$$E[(X_i - \mu_x)^2] = \sigma_x^2 \ \text{by definition.}$$

Hence only the i = j terms survive the double summation. Thus:

$$\sigma_{\bar{x}}^2 = E[(\bar{X} - \mu_x)^2] = \frac{1}{N^2}\, N\, \sigma_x^2 = \frac{\sigma_x^2}{N}.$$

The standard deviation of the estimated means is:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}. \qquad (17)$$

This is an example of a case where you can see that more data (larger N) will lead to lower

variance of the estimated mean, and hence the estimate of the mean is improved by taking more

data.
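Equation (17) is easy to demonstrate by simulation: repeat the experiment many times, compute the mean each time, and compare the spread of the means with σ_x/√N. All the numbers below (σ_x = 2, N = 25, 2000 repeated experiments) are illustrative.

```python
import random
import statistics

# Simulation of equation (17): the standard deviation of the estimated
# mean falls as sigma_x / sqrt(N). Illustrative numbers only.
random.seed(0)
sigma_x, N, trials = 2.0, 25, 2000

means = []
for _ in range(trials):
    samples = [random.gauss(0.0, sigma_x) for _ in range(N)]
    means.append(sum(samples) / N)   # equation (15) for this experiment

observed = statistics.pstdev(means)   # spread of the 2000 estimated means
predicted = sigma_x / N ** 0.5        # equation (17): 2/5 = 0.4
print(round(observed, 3), predicted)
```

The observed spread comes out close to 0.4, within the scatter expected from a finite number of trials.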

What is $p(\bar{x})$, the pdf for $\bar{X}$? How does it relate to p(x)? If p(x) is Gaussian distributed and N is large (in excess of 60), then $p(\bar{x})$ is also approximately Gaussian with mean $\mu_x$ and standard deviation $\sigma_{\bar{x}} = \sigma_x / N^{1/2}$. The two distributions are shown together in Figure 9, for N = 100.

Figure 9: The probability density function of the data and of the estimated mean of the data

CONFIDENCE INTERVALS BASED ON X

Having taken N measurements and calculated $\bar{X}$, we can derive the distribution of $\bar{X}$ if we assume that X is a Gaussian distributed random variable. (For N large, $\bar{X}$ will have a Gaussian distribution; for N < 60 it will have a Student-t distribution.) Consider the case where N is large and

$$p(\bar{x}) = \frac{1}{\sqrt{2\pi}\,\sigma_{\bar{x}}}\, e^{-\frac{(\bar{x} - \mu_x)^2}{2\sigma_{\bar{x}}^2}} = \frac{\sqrt{N}}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{N(\bar{x} - \mu_x)^2}{2\sigma_x^2}}. \qquad (18)$$

This is illustrated in Figure 10. The total area of the shaded regions in this figure is 0.32. So if the estimation of the mean were repeated many times, 68% of the estimated mean values would lie in the region $\mu_x \pm \sigma_{\bar{x}}$.


We might say that

$$P[\mu_x - \sigma_{\bar{x}} < \bar{X} < \mu_x + \sigma_{\bar{x}}] \approx 0.68.$$

Rearranging the inequality, we say we are 68% confident that the true mean ($\mu_x$) lies in the interval:

$$\bar{X} - \sigma_{\bar{x}} < \mu_x < \bar{X} + \sigma_{\bar{x}}.$$

This is a 68% confidence interval.

Figure 10: The probability density function of the estimated mean, X .

To be more confident that the true value lies in a particular interval, we need to widen that interval. In general, if

$$\int_{\mu_x - z_Q \sigma_{\bar{x}}}^{\mu_x + z_Q \sigma_{\bar{x}}} p(\bar{x})\, d\bar{x} = Q,$$

then the 100Q% confidence interval for $\mu_x$ is

$$\bar{X} - z_Q \sigma_{\bar{x}} < \mu_x < \bar{X} + z_Q \sigma_{\bar{x}}. \qquad (19)$$

$z_Q$ can be taken from look-up tables (see Appendix A of this chapter). $\sigma_x$ is not known but is estimated by using:

$$\sigma_{\bar{x}} \approx s_{\bar{x}} = \frac{s_x}{N^{1/2}}.$$

EXAMPLE


400 measurements of temperature are taken. $\bar{X}$ was calculated as 85.6°F and $s_x$ was calculated as 5.3°F. What is the 95% confidence interval for the temperature?

Solution

The 95% confidence interval is, approximately, $\bar{X} \pm z_{.95}\, s_{\bar{x}}$.

$$s_{\bar{x}} = \frac{s_x}{N^{1/2}} = \frac{5.3}{20} = 0.265°\text{F}$$

From look-up tables $z_{.95} = 1.96$. Therefore the 95% confidence interval is

$$85.6 \pm 1.96 \times 0.265\ °\text{F}.$$
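The arithmetic of this example can be laid out in a few lines; the only inputs are the numbers given above.

```python
# Arithmetic of the temperature example: 400 measurements,
# sample mean 85.6 F, sample standard deviation 5.3 F, z_.95 = 1.96.
N = 400
xbar, s_x = 85.6, 5.3
z95 = 1.96

s_xbar = s_x / N ** 0.5        # 5.3 / 20 = 0.265 F
half_width = z95 * s_xbar      # 1.96 * 0.265 = 0.5194 F
print(round(s_xbar, 4), round(half_width, 4))   # 0.265 0.5194
print(f"95% CI: {xbar} +/- {round(half_width, 3)} F")
```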

Calculating confidence intervals for the true mean when N < 60

When fewer than 60 data values are available, we may no longer use the $z_Q$ values derived from assuming that $\bar{X}$ has a Gaussian distribution. In these cases we choose values taken from a Student-t distribution; this distribution is a function of the number of data samples, N. Again we use look-up tables to determine these values (see Appendix B of this chapter). So for N < 60 the 100Q% confidence interval is approximately:

$$\bar{X} \pm t_{\nu,Q}\, s_{\bar{x}}, \quad \text{where } \nu = N - 1. \qquad (20)$$

EXAMPLE

25 measurements of pressure are taken, and the mean and standard deviation calculated:

$$\bar{X} = 201.3 \text{ p.s.i. and } s_x = 3.5 \text{ p.s.i.}$$

Calculate the 99% confidence interval for the true value. How many more measurements need to be made so that the 99% confidence interval is $\bar{X} \pm 0.005$ p.s.i.?

Solution

$$s_{\bar{x}} = \frac{s_x}{N^{1/2}} = \frac{3.5}{5} = 0.7 \text{ p.s.i.}$$

The 99% confidence interval is

$$\bar{X} \pm t_{24,.99}\, s_{\bar{x}} = 201.3 \pm 2.797 \times 0.7 \text{ p.s.i.}$$

The 0.005 p.s.i. interval is so small that we will assume that the number of measurements necessary will be in excess of 100, so the Gaussian $z_{.99}$ value may be used. Therefore

$$0.005 = z_{.99}\, \frac{3.5}{N^{1/2}}$$

$$N^{1/2} = \frac{z_{.99} \times 3.5}{0.005} = 700 \times 2.575 = 1802.5$$

$$N = 3{,}249{,}006.25$$

Number of extra measurements = 3,249,007 − 25 = 3,248,982.
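The sample-size arithmetic above is worth checking mechanically, since the numbers are large:

```python
import math

# Second part of the pressure example: how many samples for a 99%
# half-width of 0.005 p.s.i., with s_x = 3.5 and z_.99 = 2.575.
s_x, half_width, z99 = 3.5, 0.005, 2.575

sqrt_N = z99 * s_x / half_width        # 2.575 * 700 = 1802.5
N_required = math.ceil(sqrt_N ** 2)    # 1802.5^2 = 3,249,006.25, round up
extra = N_required - 25                # beyond the 25 already taken
print(N_required, extra)               # 3249007 3248982
```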

PROPAGATION OF ERROR

Often the determination of a physical quantity requires measurement of more than one parameter. Consider, for example, the problem of obtaining the density of a gas which may be assumed ideal:

$$\rho = \frac{P}{RT}$$

Both pressure and temperature must be measured to determine $\rho$, and each measurement may contain an error. What, then, is the best estimate of the resulting error in the density, i.e., how do the errors propagate through the calculation?

We approach the problem by using the chain rule of differential calculus. Consider a general function u = f(x, y), where x and y are the measured values. Then

$$du = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy \qquad (21)$$

or,

$$\delta u = \frac{\partial f}{\partial x}\, \delta x + \frac{\partial f}{\partial y}\, \delta y$$

where $\delta x$ and $\delta y$ represent the measurement errors, and $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are evaluated at $\bar{X}$ and $\bar{Y}$.

Now $\delta x$ and $\delta y$ may be positive or negative in general, so we define the absolute error $\delta u_a$ as

$$\delta u_a = \left|\frac{\partial f}{\partial x}\, \delta x\right| + \left|\frac{\partial f}{\partial y}\, \delta y\right|, \qquad (22)$$

where $\delta u_a$ is the maximum error which could be expected in the computed value of u based on $\delta x$ and $\delta y$. This may be too conservative, since it is just as likely that the second term in the above equation could subtract from the first as add. Therefore another commonly used error estimate is $\delta u_r$:

$$\delta u_r = \left[\left(\frac{\partial f}{\partial x}\, \delta x\right)^{\!2} + \left(\frac{\partial f}{\partial y}\, \delta y\right)^{\!2}\right]^{1/2} \qquad (23)$$

In general, then, the value of u would be reported in terms of a 100Q% confidence interval of the form:

$$\bar{u} \pm \delta u$$

where $\delta u$ is chosen to be $\delta u_r$, and the 100Q% confidence intervals for X and Y were $\bar{X} \pm \delta x$ and $\bar{Y} \pm \delta y$, respectively. We approximate $\bar{u}$ by $f(\bar{X}, \bar{Y})$, which is an approximation that is true only if $\delta x$ and $\delta y$ are small, as discussed earlier.

We have limited our discussion to functions of only two independent variables. However, the extension to functions of three or more variables is straightforward. As the number of independent variables increases, $\delta u_r$ rather than $\delta u_a$ becomes a better estimate of the error, since the probability that all errors will be additive decreases. If

$$Y = f(X_1, X_2, \ldots, X_N)$$

and we have calculated 100Q% confidence intervals $\bar{X}_i \pm \delta x_i$ for each $X_i$, then the 100Q% confidence interval for Y is $\bar{Y} \pm \delta y$, where:

$$\bar{Y} = f(\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_N) \quad \text{and} \quad \delta y = \left[\sum_{i=1}^{N}\left(\frac{\partial f}{\partial X_i}\bigg|_{\bar{X}_1, \ldots, \bar{X}_N}\, \delta x_i\right)^{\!2}\right]^{1/2} \qquad (24)$$

EXAMPLE

As an example of the above procedure, consider again the gas density measurement problem. The basic equation is

$$\rho = \frac{P}{RT}$$

Assume R is known exactly and the confidence intervals for P and T are $\bar{P} \pm \delta P$ and $\bar{T} \pm \delta T$, respectively.

$$\frac{\partial \rho}{\partial P} = \frac{1}{RT} \quad \text{and} \quad \frac{\partial \rho}{\partial T} = -\frac{P}{RT^2}$$

The absolute error in $\rho$ is then

$$\delta\rho_a = \left|\frac{1}{RT}\, \delta P\right| + \left|\frac{P}{RT^2}\, \delta T\right|$$

and

$$\delta\rho_r = \left[\left(\frac{1}{RT}\, \delta P\right)^{\!2} + \left(\frac{P}{RT^2}\, \delta T\right)^{\!2}\right]^{1/2}$$

With these equations, and estimates of the resolution of available instrumentation for measuring pressure and temperature, one can readily determine whether $\delta\rho_a$ is dominated by the expected error in P or in T. Steps can then be taken to improve the instrumentation for one or both measurements to achieve a desired error bound in the measured density. One of the advantages of this analysis is that it can provide such information before any data is actually taken, thus saving time and money.
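The two error estimates can be sketched numerically. The values below (air-like pressure, temperature and gas constant, with assumed instrument errors) are illustrative only; none of them come from the text.

```python
import math

# Sketch of the density error-propagation formulas with illustrative,
# air-like numbers: P = 101325 Pa +/- 200 Pa, T = 300 K +/- 1 K,
# R = 287 J/(kg K). All values are assumptions for demonstration.
R = 287.0
P, dP = 101325.0, 200.0
T, dT = 300.0, 1.0

rho = P / (R * T)
drho_dP = 1.0 / (R * T)            # partial derivative w.r.t. P
drho_dT = -P / (R * T ** 2)        # partial derivative w.r.t. T

drho_a = abs(drho_dP * dP) + abs(drho_dT * dT)    # absolute error
drho_r = math.hypot(drho_dP * dP, drho_dT * dT)   # root-sum-square error

print(round(rho, 5), round(drho_a, 6), round(drho_r, 6))
```

As expected, the root-sum-square estimate is smaller than the absolute (worst-case) estimate, and comparing the two contributing terms shows which instrument dominates the density error.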

FITTING FUNCTIONS TO DATA

When we do static calibration we are often interested in fitting a model, f(x), to the input-output data $(X_i, Y_i)$. Let's assume for simplicity that the $X_i$ are exact but the $Y_i$ deviate from the model. So

$$Y_i = f(X_i) + n_i \qquad (25)$$

where $n_i$ is the error in the i-th measurement, or the amount the i-th measurement deviates from the data model.

There are many ways that we can optimize the choice of the function parameters, given the

data $(X_i, Y_i)$, i = 1, 2, ..., N. For example, one is called Maximum Likelihood Estimation and

another is called Least Squares Estimation. If we assume that the ni are independent, Gaussian

distributed random variables, both approaches lead to the same set of equations. We will briefly

outline the least squares approach here.

Least Squares Estimation

In this approach we compose a function that measures the cost of getting the model wrong. This

is called the cost function. Here we will just use a measure of the total squared error at each data

point as our cost function. Sometimes, for example in controls, people add other terms to the

cost function such as how much power is being used to actuate a system to achieve the desired

output. In this way we ensure that the optimum solution is one that does not take too much

power to achieve, and so there will be a trade-off between power consumption and accuracy.

However, if we just use the squared error the cost function will look like:


$$J(a_0, a_1, \ldots) = \sum_{i=1}^{N} n_i^2 = \sum_{i=1}^{N} \left[Y_i - f(X_i, a_0, a_1, \ldots)\right]^2. \qquad (26)$$

We now minimize this cost function J by our choice of the parameters in the model: $a_0, a_1, \ldots$. To do this we differentiate equation (26) with respect to $a_i$, i = 0, 1, 2, ..., and set the results to zero. This will give us a set of equations to solve for the parameter estimates. If the function that we are fitting to the data is a straight line, $f(x) = a_0 + a_1 x$, then the equations we need to solve are:

$$0 = \sum_{i=1}^{N} 2(Y_i - \hat{a}_0 - \hat{a}_1 X_i)(-1) \qquad (27a)$$

and

$$0 = \sum_{i=1}^{N} 2(Y_i - \hat{a}_0 - \hat{a}_1 X_i)(-X_i). \qquad (27b)$$

Note that we have replaced $a_0$ and $a_1$ by $\hat{a}_0$ and $\hat{a}_1$, respectively. This is because the result that you get is dependent on the data set that you collected. Even if the number of data points (N) is large, each time the experiment is repeated and a new set of $(X_i, Y_i)$ are taken, the estimates will be slightly different.

For simplicity, we will now stop explicitly including the limits on the summations. Rearranging equations (27) gives:

$$\sum Y_i = \hat{a}_0 N + \hat{a}_1 \sum X_i, \qquad (28a)$$

and

$$\sum X_i Y_i = \hat{a}_0 \sum X_i + \hat{a}_1 \sum X_i^2, \qquad (28b)$$

or in matrix form:

$$\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} \hat{a}_0 \\ \hat{a}_1 \end{bmatrix}. \qquad (29)$$

These are called the normal equations.

We could also have derived these equations by using vector notation. Starting with equation (25) and writing it down using this straight-line model for i = 1, 2, ..., N, we can generate a set of equations. This set of equations can be written in matrix form:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_N \end{bmatrix}, \qquad (30a)$$

which can be written even more simply as:

$$\mathbf{y} = \mathbf{A}\mathbf{a} + \mathbf{n}. \qquad (30b)$$

The cost function can now be written as:

$$J = \mathbf{n}^T \mathbf{n} = (\mathbf{y} - \mathbf{A}\mathbf{a})^T(\mathbf{y} - \mathbf{A}\mathbf{a}) = \mathbf{y}^T\mathbf{y} - \mathbf{a}^T\mathbf{A}^T\mathbf{y} - \mathbf{y}^T\mathbf{A}\mathbf{a} + \mathbf{a}^T\mathbf{A}^T\mathbf{A}\mathbf{a}. \qquad (31)$$

If you minimize this set of equations with respect to the parameters in $\mathbf{a}$, then you will get the set of normal equations:

$$\mathbf{A}^T\mathbf{y} = \mathbf{A}^T\mathbf{A}\,\hat{\mathbf{a}}, \qquad (32)$$

which you can show are identical to equations (29).

Equation (32) could be thought of as the result of pre-multiplying equation (30b) by $\mathbf{A}^T$, removing the $\mathbf{A}^T\mathbf{n}$ term and replacing $\mathbf{a}$ by $\hat{\mathbf{a}}$. The solution to these equations is:

$$\hat{a}_1 = \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{\Delta}, \qquad (33a)$$

and

$$\hat{a}_0 = \frac{\sum Y_i \sum X_i^2 - \sum X_i \sum X_i Y_i}{\Delta}, \qquad (33b)$$

where

$$\Delta = \det[\mathbf{A}^T\mathbf{A}] = N \sum X_i^2 - \left(\sum X_i\right)^{\!2}. \qquad (33c)$$

Thus, equations (33a-c) can be used to determine the gradient and intercept of the best straight-line fit to the data.
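Equations (33a-c) translate directly into a short fitting routine. The data values below are hypothetical, chosen to lie roughly on y = 1 + 2x with small errors.

```python
# Straight-line least squares via the closed-form solution,
# equations (33a-c). Data values are hypothetical.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.1, 7.0, 8.9]      # roughly y = 1 + 2x plus small errors

N = len(X)
Sx = sum(X)
Sy = sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))

delta = N * Sxx - Sx ** 2          # equation (33c)
a1 = (N * Sxy - Sx * Sy) / delta   # gradient, equation (33a)
a0 = (Sy * Sxx - Sx * Sxy) / delta # intercept, equation (33b)
print(round(a1, 4), round(a0, 4))  # 1.97 1.06
```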

Standard Deviation of $n_i$, $\hat{a}_0$ and $\hat{a}_1$.

Recalling that $Y_i = f(X_i) + n_i$ and that the $X_i$ are known exactly, the standard deviation of $Y_i$, once the relationship to $X_i$ is removed, is equal to the standard deviation of $n_i$. That is, $\sigma_{y_i - f(x_i)} = \sigma_n$. We will retain the subscript n to remind ourselves that it is the standard deviation of the error between the model and the data.


The easiest way to look at the variance of the parameter estimates is to look at:

$$E[(\hat{\mathbf{a}} - \mathbf{a})(\hat{\mathbf{a}} - \mathbf{a})^T] = E\left[[\mathbf{A}^T\mathbf{A}]^{-1}\mathbf{A}^T\, \mathbf{n}\mathbf{n}^T\, \mathbf{A}\,[\mathbf{A}^T\mathbf{A}]^{-1}\right] = \sigma_n^2\, [\mathbf{A}^T\mathbf{A}]^{-1}. \qquad (34)$$

This result comes from assuming that the errors ($n_i$) are independent. That is, the error at one data point does not have a functional relationship to the error at another data point. The diagonal elements of this covariance matrix give the variances of the parameters. Using equations (29) and (32) and your linear algebra knowledge gives:

$$[\mathbf{A}^T\mathbf{A}]^{-1} = \frac{1}{\Delta}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & N \end{bmatrix}.$$

Therefore, pulling off the diagonal elements of this matrix, recalling that $\Delta = \det[\mathbf{A}^T\mathbf{A}]$, and scaling by $\sigma_n^2$ as shown in equation (34), gives:

$$\sigma_{\hat{a}_1}^2 = \frac{N\, \sigma_n^2}{\Delta}, \qquad \sigma_{\hat{a}_0}^2 = \frac{\sigma_n^2 \sum X_i^2}{\Delta}. \qquad (35)$$

We can estimate $\sigma_n^2$ by estimating the variance of the noise, that is, the difference between the model prediction of $Y_i$ and the measured $Y_i$. The formula is:

$$\hat{\sigma}_n^2 = \frac{1}{N-2}\sum_{i=1}^{N} \hat{n}_i^2 = \frac{1}{N-2}\sum_{i=1}^{N} (Y_i - \hat{a}_0 - \hat{a}_1 X_i)^2. \qquad (36)$$

Note that there is an (N − 2) in the denominator. In order to calculate this we needed to estimate $a_0$ and $a_1$ from the data. Since the data has been used to estimate two of the parameters used in this formula, we have to subtract two from N. It can be shown formally that this yields an unbiased estimate of $\sigma_n^2$.

You need to be very careful about notation here, because the mean value of Y varies with X. Subtract $f(X_i)$ from $Y_i$ and then calculate $\hat{\sigma}_n^2$; this is really a measure of the variance of Y and thus is sometimes denoted by $\sigma_y^2$. However, in other texts $\hat{\sigma}_n^2$ is denoted by $s_{yx}^2$, which one could term the variance of y once the functional relationship with x has been removed. More usually $\sigma_y^2$ is defined as:

$$\sigma_y^2 = E[(Y - \mu_y)^2], \quad \text{or} \quad \hat{\sigma}_y^2 = s_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2, \qquad (37)$$

i.e., calculated ignoring the functional relationship between X and Y. The moral is: you need to understand the notation in the text you are using; not all texts use the same notation, and not all notation makes that much sense! We will use equation (37) as the definition of $\sigma_y^2$, the variance of Y.

Confidence Intervals for the Parameter Estimates.

Now that we can calculate the estimates (equations (33)) and the standard deviations of the estimates, $\sigma_{\hat{a}_0}$ and $\sigma_{\hat{a}_1}$ (equations (35)), we are in a position to write down the confidence intervals for the parameter estimates. For N < 60, the distribution that is used to construct the confidence interval scaling, $t_{\nu,100Q}$, has N − 2 degrees of freedom, i.e., $\nu = N - 2$. So the 100Q% confidence interval is:

$$\hat{a}_i \pm t_{N-2,100Q}\, \hat{\sigma}_{\hat{a}_i} \quad \text{for } N < 60, \qquad (38a)$$

or

$$\hat{a}_i \pm z_Q\, \hat{\sigma}_{\hat{a}_i} \quad \text{for } N \ge 60, \qquad (38b)$$

where the subscript i is either 0 (bias estimate) or 1 (sensitivity estimate).

How well does the model fit the data?

A measure of how well the model fits the data is given by:

Adjusted:

$$R^2 = 1 - \frac{\hat{\sigma}_n^2}{\hat{\sigma}_y^2} \qquad (39a)$$

Unadjusted:

$$R^2 = 1 - \frac{\text{sum of squares of error}}{\text{sum of squares of } (Y_i - \bar{Y})} = 1 - \frac{\hat{\sigma}_n^2\, (N-2)}{\hat{\sigma}_y^2\, (N-1)}, \qquad (39b)$$

which is the fraction of the output data variation that is explained by the model. $\hat{\sigma}_n^2$ is calculated by using equation (36) and $\hat{\sigma}_y^2$ is calculated by using equation (37). Here R2 = 1 implies a perfect fit to the data. R2 = 0.5 may imply a poor fit to the data. What is poor depends on what you are measuring. For most measurements we do in this course, 0.5 would imply a poor model. If you were measuring something that has inherently high variability, e.g., a person's response to music (like/dislike), and relating it to sound level, a value of 0.5 might imply a reasonably good model.
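Equations (36), (37) and (39a) chain together naturally after a fit. The sketch below reuses the closed-form fit of equations (33a-c) on a small hypothetical data set and then computes the adjusted R2; all data values are illustrative.

```python
# Adjusted R^2 from equations (36), (37) and (39a), computed for a
# hypothetical straight-line fit obtained with equations (33a-c).
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.1, 7.0, 8.9]

N = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))
delta = N * Sxx - Sx ** 2
a1 = (N * Sxy - Sx * Sy) / delta
a0 = (Sy * Sxx - Sx * Sxy) / delta

# Equation (36): residual variance with N-2 in the denominator.
var_n = sum((y - a0 - a1 * x) ** 2 for x, y in zip(X, Y)) / (N - 2)

# Equation (37): variance of Y, ignoring the relationship with X.
ybar = Sy / N
var_y = sum((y - ybar) ** 2 for y in Y) / (N - 1)

R2_adj = 1.0 - var_n / var_y       # equation (39a)
print(round(R2_adj, 4))            # 0.9989
```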

Too few data points (N) compared to the number of parameters (NP)

How large does N need to be? The answer to this is really determined by the level of noise. If
there is a lot of noise on the data, you will need a lot of data to cancel out the effects of the noise
and to make the variances of the estimates small. While it is not obvious from the formulae,
there is a (1/N) factor in equations (35) because the terms in the denominator (the determinant Δ)
arise from products of two summations, and the terms in the numerator arise from single
summations. This 1/N factor means that the variance decreases as the number of data points
used in the estimation increases.

If there are too few data points (N is not much bigger than NP), the R2 value can be close to

one. For example, if you are fitting a straight line to two data points R2 = 1.0 because there is

apparently no error, the line goes through the data points. So you might think that this is the

perfect model. However, if you took another two data points and fit a line through them the

answer would be different if the ni are not zero, and yet R2 still equals 1. People often say that

the model is adapting to or fitting the noise in the data set. To avoid this problem you need to

have the number of data points to be at least 4-10 times the number of parameters, and, in

general, even more data is better.
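The two-data-point case described above is easy to demonstrate. The following is an illustrative Python sketch (not part of the original notes; the true line, noise level, and seed are made up): a line through N = 2 noisy points always gives an unadjusted R² of 1, yet the parameter estimates change with every new pair of samples.

```python
import random

# With N = 2 the fitted line passes exactly through both points, so the
# residual sum of squares is zero and R^2 = 1 -- the fit "models the noise".
random.seed(0)

def fit_two_points():
    x = [0.0, 1.0]
    y = [2.0 + 3.0 * xi + random.gauss(0, 0.5) for xi in x]  # true line + noise
    a1 = (y[1] - y[0]) / (x[1] - x[0])   # line through both points
    a0 = y[0] - a1 * x[0]
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - sum(y) / 2) ** 2 for yi in y)
    return a1, 1 - sse / sst

s1, r2_1 = fit_two_points()
s2, r2_2 = fit_two_points()
print(s1, s2)       # two different sensitivity estimates from the same system
print(r2_1, r2_2)   # both equal to 1 to within round-off
```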

A very high R2 value is suspicious. You should repeat the test to see if your parameter

estimates change significantly with the new set of data. If they do, you are probably modeling the

noise in your data set. If they change little, then perhaps you do really have a good model!

Another indicator of this problem is when the adjusted and unadjusted R2 values are very

different.

If R2 is small, you could still have a good model of the underlying functional relationship

between the Xi and the Yi. This would be the case if, the model structure (straight line) was

correct, there was a lot of noise on the measurements, but you used many measurements when

making the estimates of the parameters of the model (N very large). In this case the 2nˆ will be

large because of the high level of noise, and as a consequence R2 will be small.

So the bottom line is: R2 is a measure of how well your model fits the data set you have.

It is not a measure of how good your model is. However, when used carefully, R2 calculations

may be useful in helping you determine if your model is good.

Example: Using MATLAB to fit a straight line to some calibration data

Given:

The following calibration data.

Temperature (°C) 20 30 40 50 60 70 80 90 100

Voltage (mV) 1.77 2.21 2.60 2.97 3.51 3.79 4.13 4.69 5.02

You may assume that the data is approximately linear over this range of inputs.

To Find:

The 99% confidence interval for the sensitivity and bias.

Solution:


An m-file was written to do the calculation using regress and also by explicitly doing the

calculations after first calculating the vector and matrix in the normal equations (29). The m-file

is listed below, after the output it generates.

M-file Output

ans =

Regress results

biasCI =

0.9780 +- 0.1760 mV

SensitivityCI =

0.0405 +- 0.0027 mV/degC

R2 =

Coefficient of determination: 0.9975

ans =

Own analysis results

biasCI =

0.9780 +- 0.1760 mV

SensitivityCI =

0.0405 +- 0.0027 mV/degC

R2 =

Unadjusted coefficient of determination: 0.9975

R2 =

Adjusted coefficient of determination: 0.9971

It appears that the R2 calculation that MATLAB does is the unadjusted R2 value.

M-file listing

% exreg.m program to fit a straight line through
% static calibration from a thermocouple.
%
temp=(20:10:100)';
N=length(temp);
y=[1.77 2.21 2.60 2.97 3.51 3.79 4.13 4.69 5.02]';
%
% Now do straight line fit using regress and construct the
% 99% confidence intervals.
[a,aint,r,rint,stats] = regress(y,[ones(size(temp)),temp],0.01);
%
% Calculate the uncertainty in the estimates
% from the calculated confidence intervals.
Da=(aint(:,2)-aint(:,1))/2;
%
sprintf('\n Regress results ')
biasCI = sprintf('%4.4f +- %4.4f mV',a(1),Da(1))
SensitivityCI = sprintf('%4.4f +- %4.4f mV/degC',a(2),Da(2))
R2 = sprintf('Coefficient of determination: %4.4f ',stats(1))
%
%-------------------------------------------------------------
% Now do it by programming the normal equations yourself:
% A'y = A'A a
%
ATA(1,1)=N;
ATA(1,2)=sum(temp);
ATA(2,1)=ATA(1,2);
ATA(2,2)=sum(temp.*temp);
ATy(1)=sum(y);
ATy(2)=sum(temp.*y);
%
% Solve equations: see equations (33) in Notes, Chapter 4.
Del = ATA(1,1)*ATA(2,2)-ATA(1,2)*ATA(2,1);
a(1) = (ATA(2,2)*ATy(1)-ATA(2,1)*ATy(2))/Del;
a(2) = (ATA(1,1)*ATy(2)-ATA(1,2)*ATy(1))/Del;
%
% Work out what the noise is, and its variance.
ypred=a(1)+a(2)*temp;
noise=y-ypred;
varn=sum(noise.^2)/(N-2);
%
% Now work out the confidence interval.
% We use the t-distribution tables, but with
% nu = N-2 = 9-2 = 7.
tnu_99=3.499;
Da = tnu_99*sqrt(varn*[ATA(2,2);ATA(1,1)]/Del);
%
sprintf('\n Own analysis results ')
biasCI = sprintf('%4.4f +- %4.4f mV',a(1),Da(1))
SensitivityCI = sprintf('%4.4f +- %4.4f mV/degC',a(2),Da(2))
%
% Now calculate the variance of y for the R^2 calculation.
vary=(1/(N-1))*sum((y-mean(y)).^2);
%
% Calculate the unadjusted R^2 and adjusted R^2 values.
% Unadjusted is based on sums of squares, adjusted on estimated
% variances.
sta = 1 - (((N-2)*varn)/((N-1)*vary));
stats(1) = 1.0 - varn/vary;
R2 = sprintf('Unadjusted coefficient of determination: %4.4f ',sta)
R2 = sprintf('Adjusted coefficient of determination: %4.4f ',stats(1))

Example: Using MATLAB to fit a quadratic function to some calibration data

Given:

The following calibration data.

Temperature (°C) 20 30 40 50 60 70 80 90 100

Voltage (mV) 1.77 2.21 2.60 2.97 3.51 3.79 4.13 4.69 5.02

Find:

A quadratic function that fits this data.

Solution:

Following is the output of MATLAB program listed below. In contrast to the above example, I

have used MATLAB's own linear equation solver.


M-file Output

ans =

Regress results

b0_CI =

0.9609 +- 0.4461 mV

b1_CI =

0.0412 +- 0.0166 mV/degC

b2_CI =

-0.000006 +- 0.000136 mV/(degC*degC)

R2 =

Coefficient of determination: 0.9975

ans =

Own analysis results

b0_CI =

0.9609 +- 0.4461 mV

b1_CI =

0.0412 +- 0.0166 mV/degC

b2_CI =

-0.000006 +- 0.000136 mV/(degC*degC)

R2 =

Unadjusted coefficient of determination: 0.9975

R2 =

Adjusted coefficient of determination: 0.9967

Note that the coefficient of the squared term is effectively zero, showing us that this model has
no advantage over a linear model. Also, if you compare the R² value in this example with that
calculated with the linear model, shown in the previous example, the value has not really
changed at all.
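For readers following along in another language, here is a hedged Python sketch of the same quadratic fit via the normal equations (A'A) b = A'y, solved with Cramer's rule for the 3×3 system instead of MATLAB's backslash operator. This is an illustrative equivalent, not part of the original notes.

```python
# Quadratic fit y = b0 + b1*x + b2*x^2 via the normal equations.
x = [20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [1.77, 2.21, 2.60, 2.97, 3.51, 3.79, 4.13, 4.69, 5.02]

# Build A'A and A'y for the basis {1, x, x^2}.
S = [sum(xi ** p for xi in x) for p in range(5)]   # S[p] = sum of x^p
Q = [[S[0], S[1], S[2]],
     [S[1], S[2], S[3]],
     [S[2], S[3], S[4]]]
p = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]

def det3(M):   # determinant of a 3x3 matrix
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

D = det3(Q)
b = []
for k in range(3):                  # Cramer's rule: replace column k with p
    Qk = [row[:] for row in Q]
    for r in range(3):
        Qk[r][k] = p[r]
    b.append(det3(Qk) / D)

print([round(bi, 4) for bi in b])   # b2 is effectively zero
```

The solution agrees with the regress output above: b0 ≈ 0.9609, b1 ≈ 0.0412, b2 ≈ 0.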

M-file Listing

% exreg2.m program to fit a quadratic to calibration data.
%
temp=(20:10:100)';
N=length(temp);
y=[1.77 2.21 2.60 2.97 3.51 3.79 4.13 4.69 5.02]';
%
% Now do quadratic fit using regress and construct the
% 99% confidence intervals.
A=[ones(size(temp)),temp,temp.*temp];
[b,bint,r,rint,stats] = regress(y,A,0.01);
%
% Calculate the uncertainty in the estimates
% from the calculated confidence intervals.
Da=(bint(:,2)-bint(:,1))/2;
%
sprintf('\n Regress results ')
b0_CI = sprintf('%4.4f +- %4.4f mV',b(1),Da(1))
b1_CI = sprintf('%4.4f +- %4.4f mV/degC',b(2),Da(2))
b2_CI = sprintf('%4.6f +- %4.6f mV/(degC*degC)',b(3),Da(3))
R2 = sprintf('Coefficient of determination: %4.4f ',stats(1))
%
%_____________________________________________________________
% Now do it via the normal equations.
% For ease of notation set Q = A'A and p = A'y.
%
A=[ones(size(temp)),temp,temp.*temp];
Q=A'*A; p=A'*y;
%
% Solve the set of linear normal equations: p = Q b
b=Q\p;
% This should give the same results as:
b1=A\y;
% because MATLAB solves y = A b1 using least squares.
%
% Find the error (noise) vector and its variance.
no=y-A*b;
varn=(sum(no.*no))/(N-3);
%
% Find the variance of Y.
vary=sum((y-mean(y)).^2)/(N-1);
%
% Find the diagonal elements of Q inverse and scale by the variance
% of the noise (varn) to find the variances of the estimates.
varb=diag(inv(Q))*varn;
%
% Scale by t(N-3,100Q)=t(6,99) to generate the confidence interval.
t6_99=3.707;
Da=sqrt(varb)*t6_99;
%
% Now calculate the R^2 values.
stats(1)=1-(((N-3)*varn)/((N-1)*vary));
sta=1-(varn/vary);
%
% Now print out the results.
sprintf('\n Own analysis results ')
b0_CI = sprintf('%4.4f +- %4.4f mV',b(1),Da(1))
b1_CI = sprintf('%4.4f +- %4.4f mV/degC',b(2),Da(2))
b2_CI = sprintf('%4.6f +- %4.6f mV/(degC*degC)',b(3),Da(3))
R2 = sprintf('Unadjusted coefficient of determination: %4.4f ',stats(1))
R2 = sprintf('Adjusted coefficient of determination: %4.4f ',sta)

Differentiating Between Repeatability and Nonlinearity.

When the variation in data is large it is difficult to determine whether the deviations from a

straight line that you observe are due to nonlinearity or due to random variability (noise). To

distinguish between the two we need to reduce the random variation in the data until it is small

when compared to the variation due to the nonlinearity. We do this by repeating the test and

averaging. For each input value we can calculate the mean and the standard deviation of a set of

nominally identical measurements, and then we can estimate confidence intervals (68% or 95%

or 99%) for the true mean at each input value. This is illustrated in Figures 11a and 11b. The

standard deviation of the data can be used as a measure of the repeatability of the measurements.


Having plotted the means and the confidence limits, you can now start to examine the data for

nonlinearities. The difference between the straight fit and the averaged data must be larger than

the confidence interval sizes, for it to be apparent that you are indeed seeing a nonlinearity, and

not some effect due to the random variation in the data.
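The repeat-and-average procedure above can be sketched in a few lines. The following Python example (the notes' examples use MATLAB) simulates readings from the relationship used in Figure 11a, O = −2 + 4I + 1.5I², plus Gaussian noise; the noise level, seed, and input values are made-up illustrations.

```python
import math
import random
import statistics

# For each input value, average R nominally identical readings and form a
# 99% confidence interval for the true mean at that input.
random.seed(1)
inputs = [1, 2, 3, 4, 5]
R = 25                    # repeats per input value
t_24_99 = 2.797           # t for nu = R - 1 = 24 at 99% (between the
                          # nu = 21 and nu = 30 rows of Appendix B)
results = []
for I in inputs:
    readings = [-2 + 4 * I + 1.5 * I**2 + random.gauss(0, 2) for _ in range(R)]
    m = statistics.mean(readings)
    s = statistics.stdev(readings)            # repeatability of the readings
    half = t_24_99 * s / math.sqrt(R)         # CI half-width for the true mean
    results.append((I, m, half))
    print(f"I = {I}: mean O = {m:6.2f} +- {half:.2f}")
```

As the text notes, a nonlinearity is only apparent where the deviation of the means from the straight-line fit exceeds these interval half-widths.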

Figure 11a: Calibration data from 25 tests and 1 test. Underlying functional relationship is:
O = −2 + 4I + 1.5I². Measured O subject to random fluctuations.

Figure 11b: Calibration data averaged from 100 tests. Intervals are 99% confidence intervals for

the true mean of the measurements. (Data for input increasing only, no hysteresis effects

examined.)

When you are unable to repeat measurements, it is difficult for you to determine whether
you are observing complicated nonlinearities or variability due to randomness in the
measurement, or in the quantity that you are trying to measure. The second plot in Figure 11a
illustrates this. Beware of attributing the maximum difference between the straight line and the
data to nonlinearity, unless you are sure that the random variability is much smaller than the
so-called nonlinear effect that you are measuring.

TESTING AN HYPOTHESIS USING THE χ² TEST


When reading papers and texts on statistical analysis you will notice that it is often assumed

that the data has a Gaussian distribution. This is because it has been observed that measurements

of real phenomena often do have Gaussian distributions. There is a theorem that says if your

measurements are the result of many different random events added together, the measurements

will have a Gaussian distribution if the number of contributing events is very large. This is

called the Central Limit Theorem. However, it is important to be able to test this assumption.

Here we have an hypothesis: "The data is Gaussian distributed." How do we test this, or indeed
any hypothesis? Examples of other hypotheses that you may want to test in Engineering are:

"The relationship between two variables, x and y, is y = f(x)."

"This data has a uniform distribution."

The approach is the same for testing all these hypotheses and can be summarized in the
procedure outlined below. We will use the test for Gaussianity as the example. The function here
is the number of data values in an interval or bin. At the start of the analysis, you will decide how
many bins to use. There are many ways of splitting the data into bins: e.g., divide into M equally
spaced bins by dividing (XMAX − XMIN) by M. Actually, bins need not be equally spaced, and it is
common even with equally spaced bins to extend the first and last to start from −∞ and end at
+∞, respectively. We must be careful here as we don't want too many bins (not much data for
each bin) or too few bins (fine detail in the distribution is lost). A rule of thumb is that 80% of
the bins must contain 5 or more values. When you have a large amount of data (N > 40), you can
use the following formula for the number of bins:

M = 1.87 (N − 1)^0.40 + 1. (40)
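Equation (40) is a one-line calculation; here is a hedged Python sketch (rounding to a whole number of bins is my assumption, since the formula itself returns a fraction):

```python
# Rule-of-thumb number of bins M for a chi-squared test, equation (40).
def number_of_bins(N):
    return round(1.87 * (N - 1) ** 0.40 + 1)

print(number_of_bins(100))   # prints 13, the bin count used in the example below
```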

1. Based on your assumption, predict what the M function values should be. Denote this by Ek,
where k = 1, 2, 3, ..., M.

Here our function is the number of values in each bin. The fraction of values in a bin extending from x_{i-1} to x_i is
found from integrating the probability density function between these values. To find out how many we would
expect to see, we multiply this by N, the number of data values measured. Since we are testing to see if the data
is Gaussian, the Ek are calculated thus:

E_k = N ∫ from x_{i-1} to x_i of p(x) dx = N ∫ from x_{i-1} to x_i of [1/(σx √(2π))] exp(−(x − μx)²/(2σx²)) dx. (41)

This integration is usually done by manipulating tabulated results such as those shown in Appendix A.
However, to evaluate the integral of the Gaussian p.d.f., we need to estimate the mean (μx) and the standard
deviation (σx) of the data. We now assume that the data has exactly this estimated mean and standard deviation,
perhaps not a good assumption if N is small.

2. Now list out the result of measuring the function values. Denote these values by Ok,

k = 1, 2, 3,.....M.

These are just the number of values you found in each bin when you sorted out your measurements into the

bins.

3. Form a normalized measure of the error between what you saw and what you expected to
see. In the χ² test this measure is:

χ²o = [1/(n − c)] Σ (k = 1 to M) (Ek − Ok)² / Ek, (42)

where (n − c) = d is the number of degrees of freedom.

n is the number of function values you start with. In this case that is the number of bins you are sorting data into
(M). c is the number of constraints, the number of things you had to estimate in order to find Ek. Here, because
the hypothesized distribution was Gaussian, you had to estimate the mean and standard deviation, and you also
used the total number of data values, N, which is the sum of the numbers in each bin. So in this example, testing
whether the data is Gaussian, c = 3.

4. We now decide whether this value of χ²o is good or bad. We determine whether it lies in the
interval defined by:

(large error) 0.05 < P[χ² ≥ χ²o] < 0.95 (suspiciously small error).

The probability is calculated from look-up tables such as the one shown in Appendix C of
this Chapter. You need to know d = (n − c) and χ²o to use the look-up table.

If we repeated the test, gathering a new set of data each time, and the data was indeed Gaussian, then the
distribution of the χ²o's would look something like that shown in Figure 12. (The shape of this chi-squared
distribution is a function of the number of degrees of freedom; as d gets large, it looks more Gaussian.)
While any positive value is possible for χ²o, it is unlikely that the calculated value would be large. Note that a
large value corresponds to a large difference between what was observed and what was expected in each bin.
We are also suspicious of very small values because the data seems too good to be true, a result which is
probably not warranted with the amount of data sampled and the crudeness of this test.

Figure 12: Chi-squared (χ²) probability density function for some d.
Shaded areas each equal 0.05. P[χ² ≥ A] = 0.95 and P[χ² ≥ B] = 0.05.
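Step 1, the expected-count calculation of equation (41), is the only step that needs numerical integration. Here is a hedged Python sketch (the worked example below uses MATLAB's erf instead); the bin edges are those of that example, with the first and last bins extended to ±∞:

```python
import math

# Expected bin counts E_k for a Gaussian with the estimated mean and
# standard deviation, via the Gaussian cumulative distribution function.
def expected_counts(edges, N, mu, sigma):
    def Phi(x):   # Gaussian CDF
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return [N * (Phi(hi) - Phi(lo)) for lo, hi in zip(edges[:-1], edges[1:])]

# 13 bins: (-inf, -2.75), 11 bins of width 0.5, (2.75, +inf).
edges = [-math.inf] + [-2.75 + 0.5 * k for k in range(12)] + [math.inf]
E = expected_counts(edges, N=100, mu=0.0, sigma=1.8)
print(round(sum(E), 4))   # the E_k sum to N, here 100
```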

Example

One hundred data points were collected from an experiment where the phase of a signal was
measured. Test the hypothesis that the data is Gaussian distributed. The estimated mean was
0.0013 and the estimated standard deviation 1.8. The data was sorted into 13 bins using the hist
function in MATLAB. The centers of the bins ranged from -3.0 to +3.0 and the bins were 0.5
wide. The number of data points in each bin, starting from the bin centered on -3.0, was: 7, 4,
12, 8, 4, 7, 14, 6, 4, 8, 6, 12, 8.

Solution Use a mean = 0.0 and a standard deviation of the data = 1.8. The standard deviation of the
estimated mean is 0.18, and so the third and fourth decimal places in the estimated mean are not
significant. For the chi-squared analysis the first and last bins were expanded to −∞ and +∞,
respectively.

Shown in Figure 13 is the estimated probability density function, derived by dividing the
number of values in each bin by (bin width × total number of data points) = 0.5 × 100 = 50. Also
shown is a Gaussian probability density function having a mean of 0.0 and a standard deviation
of 1.8.

Figure 13: Estimated and Gaussian probability density functions.

The expected values were calculated from equation (41), utilizing the erf function in MATLAB.
You could also do these calculations using the table in Appendix A. The data is shown in the
Table below.

Bin Limits, Calculated and Measured Data

x_{i-1}   x_i     O_k     E_k      Error
−∞       −2.75     7      6.9657    0.0002
−2.75    −2.25     4      4.3567    0.0292
−2.25    −1.75    12      6.0188    5.9439
−1.75    −1.25     8      7.7387    0.0088
−1.25    −0.75     4      9.2607    2.9884
−0.75    −0.25     7     10.3139    1.0648
−0.25     0.25    14     10.6910    1.0242
 0.25     0.75     6     10.3139    1.8044
 0.75     1.25     4      9.2607    2.9884
 1.25     1.75     8      7.7387    0.0088
 1.75     2.25     6      6.0188    0.0001
 2.25     2.75    12      4.3567   13.4093
 2.75     +∞       8      6.9657    0.1536


Check that the calculations are correct by summing the Ek's; the result should equal the number
of data points (100).

d = M − 3 = 13 − 3 = 10. χ²o = 2.9424.

From Appendix C, P[χ² ≥ 2.9424] ≈ 0.002, expressed as 0.2% in the table.

Therefore we reject the hypothesis that the data was Gaussian distributed at the 5% significance
level.

For your interest, listed below is the MATLAB m-file used to do the calculations and plotting for

this example.

% ch4exchi.m program to do a chi squared example
%
% Data generation for example.
n=100;
rand('seed',sum(100000*clock))
x=pi*2*(rand(1,n)-0.5);
%
% Definition of bins, based on Eq. (40), Chapter 4.
xbin=(-3.:0.5:3.); wid=0.5; wid2=0.25;
%
% Calculation of histogram, y contains no. of points in each bin.
% nen contains the bin centers. M is the number of bins.
[y,nen] = hist(x,xbin); [a,M]=size(nen)
%
% Data statistics.
sigx=std(x); me=mean(x);
%
% Calculation of the area of each bin.
for i=1:M;
  %
  % First normalize bin boundaries for erf function.
  xt=max([abs(nen(i)-wid2),abs(nen(i)+wid2)])/(sqrt(2)*sigx);
  xb=min([abs(nen(i)-wid2),abs(nen(i)+wid2)])/(sqrt(2)*sigx);
  %
  % If this is the first or last bin calculate the additional area.
  if i==1
    xxl=n*0.5*(1-erf(xt));
  elseif i==M
    xxu=n*0.5*(1-erf(xt));
  end
  %
  % Now calculate the areas associated with each boundary,
  % subtract unless the mean is contained in the bin, then add.
  if ( me>=(nen(i)-wid2) & me<=(nen(i)+wid2) )
    e(i)=n*0.5*(erf(xt)+erf(xb));
  else
    e(i)=n*0.5*(erf(xt)-erf(xb));
  end
end
%
% Now correct first and last bin to add the tails.
e(1)=e(1)+xxl;
e(M)=e(M)+xxu;
%
% Check that e sums to n, the number of data points.
[n sum(e)]
%
% Calculate the bin boundaries.
binl=nen-wid2; binu=nen+wid2;
%
% Calculate the error and the chi-squared statistic.
erro=(y-e).^2./abs(e);
chi=sum(erro)/(M-3)
%
% Now tabulate the results in a diary file.
!rm diary
diary on
[binl' binu' y' e' erro']
diary off
%
% Plot the estimated and the Gaussian distributions.
% Put the measured data into a bar chart form.
[xx,yy]=bar(nen,y);
%
% Normalize yy (histogram of y) to form p hat (x).
yy=yy/(n*wid);
%
% Calculate the Gaussian from the estimated mean and standard deviation.
xgr=(-500:500)/100;
ygr=(1./(sqrt(2*pi)*sigx))*exp((-(xgr-me).^2)/(2*sigx^2));
%
% Plot the histogram and the Gaussian with text annotation.
plot(xx,yy,xgr,ygr)
xlabel('x - measured values')
ylabel('p(x)')
gtext('Measured pdf')
gtext('Gaussian pdf')

SUMMARY

In this Chapter we have given you a brief introduction to statistics. Many Measurements texts

cover this material as well as Statistics and Error Analysis texts. In putting these notes together

the references listed below were used.

Here we have considered ways of describing a quantity that is random, basically through its

probability density function. Often we don't know this, but we can estimate the mean and


standard deviation of the data, which is a measure of what value the data is centered on and how

much variation is present. Often it is assumed that the data has a Gaussian distribution; we have

looked at the chi-squared test of this assumption. We developed confidence intervals for the true

value of the quantity that we are trying to measure based on the estimated mean and standard

deviation of the data.

We then considered what happens when we have a function of one or several quantities that

are random variables, and how the variation propagates through the function to produce

variability in the output of the function. In static calibration, where we compare the amplitude of

the output of a measurement system with the amplitude of the input, we often approximate the

input-output relationship with a straight line. In Chapter 2 we did this in a very qualitative
manner; in this chapter we derived the formulae for calculating the intercept and gradient of the
best-fit straight line, given a set of (Xi, Yi) calibration measurements. We also derived formulae
for the variation in these intercept (bias) and gradient (sensitivity) estimates, as well as
describing a measure (R²), which is an indicator of how well the model fits the data.

References

1. J. R. Taylor, An introduction to error analysis, University Science Books, Oxford

University Press, 1982.

2. T. G. Beckwith, R. D. Marangoni and J. H. Lienhard, Mechanical measurements, Addison-Wesley, 1993.

3. R. S. Figliola and D. E. Beasley, Theory and design for mechanical measurements,

Wiley, 1995.

Appendix A: INTEGRATION OF THE NORMAL DISTRIBUTION

In this Appendix is a table of values resulting from integrating a normal distribution:

100Q% = 100 ∫ from (μx − zQ σx) to (μx + zQ σx) of p(x) dx = 2 × 100 ∫ from μx to (μx + zQ σx) of p(x) dx.

This is the percentage of the total area under p(x), the normal probability density function,
between μx − zQ σx and μx + zQ σx; this is the shaded region in Figure A1.

Table A1: Percentage of Total Area under p(x).

zQ 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.00 0.80 1.60 2.39 3.19 3.99 4.79 5.58 6.38 7.17

0.1 7.97 8.76 9.55 10.35 11.14 11.93 12.71 13.50 14.29 15.07

0.2 15.85 16.63 17.41 18.19 18.97 19.74 20.51 21.28 22.05 22.82

0.3 23.58 24.34 25.10 25.86 26.61 27.37 28.11 28.86 29.60 30.34

0.4 31.08 31.82 32.55 33.28 34.00 34.73 35.45 36.16 36.88 37.58

0.5 38.29 38.99 39.69 40.39 41.08 41.77 42.45 43.13 43.81 44.48

0.6 45.15 45.81 46.47 47.13 47.78 48.43 49.07 49.71 50.35 50.98

0.7 51.61 52.23 52.85 53.46 54.07 54.67 55.27 55.87 56.46 57.05

0.8 57.63 58.21 58.78 59.35 59.91 60.47 61.02 61.57 62.11 62.65

0.9 63.19 63.72 64.24 64.76 65.28 65.79 66.30 66.80 67.29 67.78

1.0 68.27 68.75 69.23 69.70 70.17 70.63 71.09 71.54 71.99 72.43

1.1 72.87 73.30 73.73 74.15 74.57 74.99 75.40 75.80 76.20 76.60

1.2 76.99 77.37 77.76 78.13 78.50 78.87 79.24 79.59 79.95 80.30

1.3 80.64 80.98 81.32 81.65 81.98 82.30 82.62 82.93 83.24 83.55

1.4 83.85 84.15 84.44 84.73 85.01 85.30 85.57 85.85 86.11 86.38

1.5 86.64 86.90 87.15 87.40 87.64 87.89 88.12 88.36 88.59 88.82

1.6 89.04 89.26 89.48 89.69 89.90 90.11 90.31 90.51 90.70 90.90

1.7 91.09 91.27 91.46 91.64 91.81 91.99 92.16 92.33 92.49 92.65

1.8 92.81 92.97 93.12 93.27 93.42 93.57 93.71 93.85 93.99 94.12

1.9 94.26 94.39 94.51 94.64 94.76 94.88 95.00 95.11 95.23 95.34

2.0 95.45 95.56 95.66 95.76 95.86 95.96 96.06 96.15 96.25 96.34

2.1 96.43 96.51 96.60 96.68 96.76 96.84 96.92 97.00 97.07 97.15

2.2 97.22 97.29 97.36 97.42 97.49 97.55 97.62 97.68 97.74 97.80

2.3 97.85 97.91 97.96 98.02 98.07 98.12 98.17 98.22 98.27 98.31

2.4 98.36 98.40 98.45 98.49 98.53 98.57 98.61 98.65 98.68 98.72

2.5 98.76 98.79 98.82 98.86 98.89 98.92 98.95 98.98 99.01 99.04

2.6 99.07 99.09 99.12 99.14 99.17 99.19 99.22 99.24 99.26 99.28

2.7 99.30 99.33 99.35 99.37 99.38 99.40 99.42 99.44 99.45 99.47

2.8 99.49 99.50 99.52 99.53 99.55 99.56 99.58 99.59 99.60 99.61

2.9 99.63 99.64 99.65 99.66 99.67 99.68 99.69 99.70 99.71 99.72

3.0 99.73 99.74 99.75 99.75 99.76 99.77 99.78 99.79 99.79 99.80


Appendix B: STUDENT-t DISTRIBUTION

In the following table are given the scaling factors for the uncertainty, given ν = N − 1, the number
of degrees of freedom, and 100Q%, the percent confidence. Numbers are only given for 50%,
90%, 95% and 99% confidence intervals. Note that if ν ≥ 60, then the values are the same as
those given for a Gaussian distribution (see Appendix A).

Table B1: t Values for Confidence Intervals

ν     t(ν,50)  t(ν,90)  t(ν,95)  t(ν,99)

1 1.000 6.314 12.706 63.657

2 0.816 2.920 4.303 9.925

3 0.765 2.353 3.182 5.841

4 0.741 2.132 2.776 4.604

5 0.727 2.015 2.571 4.032

6 0.718 1.943 2.447 3.707

7 0.711 1.895 2.365 3.499

8 0.706 1.860 2.306 3.355

9 0.703 1.833 2.262 3.250

10 0.700 1.812 2.228 3.169

11 0.697 1.796 2.202 3.106

12 0.695 1.782 2.179 3.055

13 0.694 1.771 2.160 3.012

14 0.692 1.761 2.145 2.977

15 0.691 1.753 2.131 2.947

16 0.690 1.746 2.120 2.921

17 0.689 1.740 2.110 2.898

18 0.688 1.734 2.101 2.878

19 0.688 1.729 2.093 2.861

20 0.687 1.725 2.086 2.845

21 0.686 1.721 2.080 2.831

30 0.683 1.697 2.042 2.750

40 0.681 1.684 2.021 2.704

50 0.680 1.679 2.010 2.679

60 0.679 1.671 2.000 2.660

∞ 0.674 1.645 1.960 2.576


Appendix C: INTEGRATION OF THE χ² DISTRIBUTION

Tabulated below is 100 × P[χ² ≥ χ²o].

d = n − c is the number of degrees of freedom. n is the number of function values, equal to the
number of bins if you are testing to see if the data is Gaussian. c is the number of constraints,
which usually equals 3 when the hypothesis is that the data is Gaussian distributed. χ²o is the
chi-squared statistic that you calculated from your test.

Recall that you are probably interested in obtaining a value between 5 (errors very large) and 95
(errors suspiciously small) for an acceptance of your hypothesis.

d \ χ²o  0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 8.0 10.0

1 100 48 32 22 16 11 8.3 6.1 4.6 3.4 2.5 1.9 1.4 0.5 0.2

2 100 61 37 22 14 8.2 5.0 3.0 1.8 1.1 0.7 0.4 0.2 - -

3 100 68 39 21 11 5.8 2.9 1.5 0.7 0.4 0.2 0.1 - - -

4 100 74 41 20 9.2 4.0 1.7 0.7 0.3 0.1 0.1 - - - -

5 100 78 42 19 7.5 2.9 1.0 0.4 0.1 - - - - - -

d \ χ²o  0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0

1 100 65 53 44 37 32 27 24 21 18 16 14 12 11 9.4 8.3

2 100 82 67 55 45 37 30 25 20 17 14 11 9.1 7.4 6.1 5.0

3 100 90 75 61 49 39 31 24 19 14 11 8.6 6.6 5.0 3.8 2.9

4 100 94 81 66 52 41 31 23 17 13 9.2 6.6 4.8 3.4 2.4 1.7

5 100 96 85 70 55 42 31 22 16 11 7.5 5.1 3.5 2.3 1.6 1.0

6 100 98 88 73 57 42 30 21 14 9.5 6.2 4.0 2.5 1.6 1.0 0.6

7 100 99 90 76 59 43 30 20 13 8.2 5.1 3.1 1.9 1.1 0.7 0.4

8 100 99 92 78 60 43 29 19 12 7.2 4.2 2.4 1.4 0.8 0.4 0.2

9 100 99 94 80 62 44 29 18 11 6.3 3.5 1.9 1.0 0.5 0.3 0.1

10 100 100 95 82 63 44 29 17 10 5.5 2.9 1.5 0.8 0.4 0.2 0.1

11 100 100 96 83 64 44 28 16 9.1 4.8 2.4 1.2 0.6 0.3 0.1 0.1

12 100 100 96 84 65 45 28 16 8.4 4.2 2.0 0.9 0.4 0.2 0.1 -

13 100 100 97 86 66 45 27 15 7.7 3.7 1.7 0.7 0.3 0.1 0.1 -

14 100 100 98 87 67 45 27 14 7.1 3.3 1.4 0.6 0.2 0.1 - -

15 100 100 98 88 68 45 26 14 6.5 2.9 1.2 0.5 0.2 0.1 - -

16 100 100 98 89 69 45 26 13 6.0 2.5 1.0 0.4 0.1 - - -

17 100 100 99 90 70 45 25 12 5.5 2.2 0.8 0.3 0.1 - - -

18 100 100 99 90 70 46 25 12 5.1 2.0 0.7 0.2 0.1 - - -

19 100 100 99 91 71 46 25 11 4.7 1.7 0.6 0.2 0.1 - - -

20 100 100 99 92 72 46 24 11 4.3 1.5 0.5 0.1 - - - -

22 100 100 99 93 73 46 23 10 3.7 1.2 0.4 0.1 - - - -

24 100 100 100 94 74 46 23 9.2 3.2 0.9 0.3 0.1 - - - -

26 100 100 100 95 75 46 22 8.5 2.7 0.7 0.2 - - - - -

28 100 100 100 95 76 46 21 7.8 2.3 0.6 0.1 - - - - -

30 100 100 100 96 77 47 21 7.2 2.0 0.5 0.1 - - - - -
