statistics block presentation slides day 4
Posted on 16-Nov-2015
-
Statistics of Measurements and Reliability
Kristiaan Schreve
Stellenbosch University
kschreve@sun.ac.za
January 26, 2015
Kristiaan Schreve (SU) Stats Block January 26, 2015 1 / 181
-
Overview I
1 Introduction
2 Some Important Concepts
3 Excel Demonstration
4 Graphing Data
  Choosing the right type of graph
  Guidelines for creating good scientific graphs
5 Calculating Averages with Excel
6 Standard Deviation and Variance
7 Z Scores
8 Higher Order Distribution Descriptors
9 Frequency and Histograms
10 Box-and-whisker Plots
11 The Normal Distribution
12 Confidence Limits
  Sampling distributions
-
Overview II
  Central limit theorem
  Limits of confidence
  t-distribution
  Normal distribution and t-distribution confidence limits compared
13 One-sample Hypothesis Testing
  Some revision
  Hypothesis testing
  Summary of one-sample hypothesis tests
14 Two-sample Hypothesis Testing
  Hypotheses for two-sample means testing
  Hypotheses for two-sample variance testing
  Summary of two-sample hypothesis tests
15 Analysis of Variance - Part One
  Introduction to ANOVA
  Single factor ANOVA
-
Overview III
  After the F-test
16 Regression
  Linear regression
  Testing hypotheses about regression
  Excel's R-squared
  Excel functions for regression
  Multiple regression
  Guidelines
17 Correlation
  Pearson's correlation coefficient
  Correlation and regression
  Testing hypotheses about correlation
18 Uncertainty of Measurement
  Evaluation of standard uncertainty
  Type A evaluation of standard uncertainty
  Type B evaluation of standard uncertainty
-
Overview IV
  Law of propagation of uncertainty for uncorrelated quantities
  Law of propagation of uncertainty for correlated quantities
  Determining expanded uncertainty
  Reporting uncertainty
  Example
19 Selecting the Right Method
-
Some Important Concepts I
Samples and Populations
[7]: pp. 10-17
-
Some Important Concepts II
Probability
Pr(event) = (Number of ways the event can occur) / (Total number of possible events)
Conditional Probability
Pr(event|condition)
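The counting definition above can be sketched in a few lines of Python (a made-up example with one fair die; the list of outcomes and the "even number" event are assumptions for illustration):

```python
# Classical probability: Pr(event) = ways the event can occur / total possible outcomes
# Hypothetical example: rolling an even number with one fair die
outcomes = [1, 2, 3, 4, 5, 6]
event = [x for x in outcomes if x % 2 == 0]

pr = len(event) / len(outcomes)
```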
-
Some Important Concepts III
Hypothesis
A statement of what you are trying to prove.
What is the probability of obtaining the data, given that this hypothesis is correct?
Can only be rejected.
Null hypothesis
H0
Alternate hypothesis
H1
-
Some Important Concepts IV
Type I error
Rejecting H0 when you should not.
Type II error
Not rejecting H0 when you should.
-
Excel Demonstration I
Accessing statistical functions (pp. 37)
Array functions (pp. 38)
Just remember to press Ctrl+Shift+Enter to complete the function
Naming cells or arrays (pp. 42)
Data analysis tools (pp. 51)
[7]: pp. 37-55
-
Graphing Data I: Choosing the right type of graph
Column graphs
E.g. show percentage change over time for nominal values
Discrete data: open space between columns
Continuous data: no space between columns
[7]: pp. 65-96
-
Graphing Data II: Choosing the right type of graph
Avoid 3D. Here it works to show a zero value.
-
Graphing Data III: Choosing the right type of graph
Pie graph
E.g. show percentages that make up one total
Avoid 3D effects; they can distort the ability to distinguish between the sizes of the slices
As few slices as possible
-
Graphing Data IV: Choosing the right type of graph
-
Graphing Data V: Choosing the right type of graph
Line graph
E.g. show trends, or relationships between parameters
Figure: Global Temperature
-
Graphing Data VI: Choosing the right type of graph
Figure: Global Temperature
-
Graphing Data VII: Choosing the right type of graph
Bar graph
E.g. make a point about reaching a goal
Good if the labels on the horizontal axis take too much space
Arrange in ascending/descending order whenever appropriate
-
Graphing Data VIII: Choosing the right type of graph
-
Graphing Data IX: Choosing the right type of graph
Linear regression
E.g. show relationship between parameters
Use with great care!
-
Graphing Data X: Choosing the right type of graph
Figure: Regression example
-
Graphing Data I: Guidelines for creating good scientific graphs
Avoid colour graphs
Black & white printers
Colour blindness: up to 10% of the male population suffers from red-green colour blindness (www.colour-blindness.com)
Using colour in presentations is OK.
Don't wear out the viewer's eyes
Pie graphs: avoid too many slices
Line graphs: avoid too many series/lines
Avoid unnecessary junk - it distracts from the main message (grid lines, 3D effects, etc.)
Include all information (axis labels, units, appropriate legends)
Excel's smoothed scatter plots are almost always a bad idea
Independent variable on horizontal axis
Dependent variable on vertical axis
Not in textbook
-
Graphing Data II: Guidelines for creating good scientific graphs
Use regression with great care
The order of the regression must be appropriate for the number of data points and the trend in the data, e.g. don't fit a quadratic polynomial to only 3 data points.
In general, don't extrapolate beyond the data range.
Give an indication of the goodness of fit, see Figure 3.
Give the confidence limits, see Figure 18.
Check that the regression curve gives a valid prediction, e.g. a curve fitted to data that predicts temperature in kelvin cannot give negative values.
Too large samples can be bad (see pp. 417 in the textbook).
When plotting experimental data, use markers, with no lines between them, see Figure 17.
-
Graphing Data III: Guidelines for creating good scientific graphs
Whenever appropriate, include variability in your graphs (error bars...). Also indicate what the error bars mean (95% confidence, min/max range, standard deviation, etc.), see Figure 17.
Graphing a categorical (discrete) variable as though it is a quantitative variable is just wrong (see Fig 19-1 in the textbook).
Choose the range of the variables appropriately, see Figure 5.
When the dependent and independent variables have the same unit, make sure that the axes have the same scale, see Figure 3.
-
Graphing Data IV: Guidelines for creating good scientific graphs
Figure: An example of how NOT to plot categorical data.
-
Graphing Data V: Guidelines for creating good scientific graphs
Figure: Use appropriate vertical range [4]
-
Graphing Data VI: Guidelines for creating good scientific graphs
Table: Data set A, Running Times. [3]
Name Time [s]
Thomas 19
Anthony 26
Emma 18
Jaspal 19.6
Lisa 21
Meena 22
Navtej 27
Nicola 23
Sandeep 17
Tanya 23
-
Graphing Data VII: Guidelines for creating good scientific graphs
Figure: Charts based on data in Table 1 [3]
-
Graphing Data VIII: Guidelines for creating good scientific graphs
Horizontal bars useful for large number of bars
Also useful if there is too much text for the horizontal axis
Rank of each athlete is clearly visible on bottom graph
None of the graphs shows the distribution of the data
-
Graphing Data IX: Guidelines for creating good scientific graphs
Figure: Pie chart based on data in Table 1 [3]
Pie graphs generally OK for showing discrete data
Must show parts of a whole - not in this case!
-
Graphing Data X: Guidelines for creating good scientific graphs
Figure: Histogram showing distribution of data in Table 1 [3]
Histograms show continuous data - no spaces between the bars.
-
Graphing Data XI: Guidelines for creating good scientific graphs
Figure: Correct histogram showing distribution of data in Table 1 [3]
-
Graphing Data XII: Guidelines for creating good scientific graphs
Table: Data set B: Wind in January [3]
Wind type Days
Strong wind 10
Calm 5
Gale 7
Light breeze 9
Total 31
-
Graphing Data XIII: Guidelines for creating good scientific graphs
Figure: Bar chart based on data in Table 2 [3]
Discrete data should have spaces between columns
Sequence of wind categories is not helpful
-
Graphing Data XIV: Guidelines for creating good scientific graphs
Figure: Bar chart based on the data in Table 2 [3]
Meaningless to compare Total to the wind categories. It looks like another category.
-
Graphing Data XV: Guidelines for creating good scientific graphs
Figure: Bar chart based on the data in Table 2 [3]
Note the discontinuity at the start of the Y-axis. This distorts the effect of the columns.
-
Graphing Data XVI: Guidelines for creating good scientific graphs
Figure: Correct bar chart based on the data in Table 2 [3]
-
Graphing Data XVII: Guidelines for creating good scientific graphs
Figure: This is how you show a discontinuity in an axis.
-
Graphing Data XVIII: Guidelines for creating good scientific graphs
Figure: Pie chart based on the data in Table 2 [3]
Data in Table 2 is ideal for pie charts.
Including the Total makes no sense in the pie graph, since the pie represents components of the total.
-
Graphing Data XIX: Guidelines for creating good scientific graphs
Figure: Correct pie chart based on the data in Table 2 [3]
-
Graphing Data XX: Guidelines for creating good scientific graphs
Figure: Graphing experimental data. Error bars show the measurement error range.
-
Graphing Data XXI: Guidelines for creating good scientific graphs
Figure: Graphing regression curves
-
Graphing Data XXII: Guidelines for creating good scientific graphs
Figure: Example of a bad graph
-
Graphing Data XXIII: Guidelines for creating good scientific graphs
Figure: Example of a bad graph
-
Graphing Data XXIV: Guidelines for creating good scientific graphs
Figure: Example of a bad graph
-
Calculating averages with Excel I
Mean (Excel: AVERAGE, AVERAGEA, AVERAGEIF, AVERAGEIFS, TRIMMEAN)
(We don't do the geometric mean or harmonic mean on pp. 106-107)
Median (Excel: MEDIAN)
Mode (Excel: MODE.MULT, MODE.SNGL)
[7]: pp. 97-112
-
Standard Deviation and Variance I
Population variance
σ² = Σ(X - μ)² / N

Excel functions: VAR.P and VARPA
[7]: pp. 113-123
-
Standard Deviation and Variance II
Sample variance
s² = Σ(X - X̄)² / (N - 1)
Excel functions: VAR.S and VARA
Why divide by (N - 1)? Calculating the average of the sample, X̄, effectively takes away one degree of freedom.
-
Standard Deviation and Variance III
Standard deviation of a population
σ = √(σ²) = √( Σ(X - μ)² / N )
Excel functions: STDEV.P and STDEVPA
NOTE: the standard deviation has the same unit as the original measurements
-
Standard Deviation and Variance IV
Standard deviation of a sample
s = √(s²) = √( Σ(X - X̄)² / (N - 1) )
Excel functions: STDEV.S and STDEVA
NOTE: whenever presenting a mean, always provide a standard deviation as well
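The four Excel functions named in these slides have direct counterparts in Python's standard statistics module, which can be used to check the N versus N - 1 divisors (the data set below is made up for illustration):

```python
import statistics

# Hypothetical measurements
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_var = statistics.pvariance(data)   # sigma^2: divide by N (Excel: VAR.P)
pop_std = statistics.pstdev(data)      # sigma (Excel: STDEV.P)
samp_var = statistics.variance(data)   # s^2: divide by N - 1 (Excel: VAR.S)
samp_std = statistics.stdev(data)      # s (Excel: STDEV.S)
```

For this data the mean is 5, so the population variance is 32/8 = 4 while the sample variance is the slightly larger 32/7.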
-
Z Scores I
How do you compare scores in one year to another year for, say, Mechatronics 424?
Z scores take the mean as a zero point and the standard deviation as a unit of measure. Therefore, for a sample

z = (X - X̄) / s

and for a population

z = (X - μ) / σ
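A quick sketch of the sample formula, assuming a made-up set of class marks (Excel's STANDARDIZE does the same arithmetic):

```python
import statistics

# Hypothetical sample of test marks
marks = [55, 60, 65, 70, 75]

x_bar = statistics.mean(marks)
s = statistics.stdev(marks)    # sample standard deviation (N - 1 divisor)

# Z score of a mark of 70, as in STANDARDIZE(70, x_bar, s)
z = (70 - x_bar) / s
```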
[7]: pp. 131-145
-
Z Scores II
IQ scores are typically transformed Z scores
IQ = 16z + 100
The implication of this formula: the mean IQ score is 100 and the standard deviation of IQ scores is 16.
-
Z Scores III
Excel functions related to Z scores
STANDARDIZE
PERCENTILE.EXC, PERCENTILE.INC
PERCENTRANK.EXC, PERCENTRANK.INC
QUARTILE.EXC, QUARTILE.INC
-
Higher Order Distribution Descriptors I
Descriptors
Variance: Describes the spread in the data.
Skewness: Describes how symmetrically the data is distributed.
Kurtosis: Describes whether or not there is a peak in the distribution close to the mean.
[7]: pp. 152-156
-
Higher Order Distribution Descriptors II
Skewness
Excel function: SKEW

skewness = Σ(X - X̄)³ / ((N - 1)s³)
-
Higher Order Distribution Descriptors III
Kurtosis
Excel function: KURT

kurtosis = Σ(X - X̄)⁴ / ((N - 1)s⁴) - 3
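Both descriptors can be computed directly from the slide formulas (the data set in the test is made up; note that Excel's SKEW and KURT apply slightly different small-sample bias corrections, so their results will differ a little from these expressions):

```python
import statistics

def skewness(data):
    # Slide formula: sum((X - Xbar)^3) / ((N - 1) * s^3)
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return sum((x - x_bar) ** 3 for x in data) / ((n - 1) * s ** 3)

def kurtosis(data):
    # Slide formula: sum((X - Xbar)^4) / ((N - 1) * s^4) - 3
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return sum((x - x_bar) ** 4 for x in data) / ((n - 1) * s ** 4) - 3
```

A symmetric data set has zero skewness, which is an easy sanity check.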
-
Frequency and Histograms I
Frequency: Excel function FREQUENCY - remember, it is an array function.
Histogram: Use the Data Analysis Tool
[7]: pp. 156-160
-
Frequency and Histograms II
Histogram: Shows the number of items in a certain category.
Frequency distribution: Shows the percentage of the total in a certain category, i.e. the histogram number for the category is divided by the total number of samples in the histogram.
Histograms and frequency distributions are good for studying central tendencies, i.e. the tendency of all values in a sample of random variables to be scattered around a certain value.
The following is a guideline for the number of intervals K (from [2]):

K = 1.87(N - 1)^0.4 + 1

As N, the number of measurements, becomes large, choose K ≈ √N [2]
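The guideline can be wrapped in a small helper (rounding to a whole number of intervals is my assumption; the slides give only the formula):

```python
def n_intervals(n):
    # Guideline from the slides: K = 1.87 * (N - 1)**0.4 + 1,
    # rounded here to a whole number of histogram intervals
    return round(1.87 * (n - 1) ** 0.4 + 1)
```

For example, 100 measurements give roughly 13 intervals.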
-
Box-and-whisker Plots I
Figure: Box-and-whisker plot generated with Python
Not in textbook
-
Box-and-whisker Plots II
Gives an indication of the distribution of the data
Compare with histogram
Useful to compare different distributions
Matlab and Python both have useful tools to create these plots. It is more difficult with Excel.
-
Box-and-whisker Plots III
Figure: Box-and-whisker plot generated with Python
-
Box-and-whisker Plots IV
Example (Showing results of robot movement - Table)
-
Box-and-whisker Plots V
Example (Showing results of robot movement - Box-and-whisker plot)
-
The Normal Distribution I
f(x) = ( 1 / (σ√(2π)) ) e^( -(x-μ)² / (2σ²) )

f(x)  Probability density
σ  Standard deviation
μ  Mean
[7]: pp. 173-183
-
The Normal Distribution II
Properties of the normal curve [8], pp. 141
The point where the curve reaches its maximum is at x = μ
The curve is symmetric about a vertical line through x = μ
Points of inflection are at x = μ ± σ. The curve is concave downward if μ - σ < x < μ + σ, and concave upward otherwise.
It approaches the horizontal axis asymptotically in both directions away from x = μ
The total area under the curve above the horizontal axis is 1.
Other names for the normal curve
Gaussian curve
Bell curve
-
The Normal Distribution III
Standard Normal Distribution

μ = 0
σ = 1

If Z scores are normally distributed, they will fit the standard normal distribution.
Normal distribution of IQ scores
-
The Normal Distribution IV
Cumulative Normal Distribution
Gives the cumulative area under the normal distribution.

F(x) = ( 1 / (σ√(2π)) ) ∫_{-∞}^{x} e^( -(t-μ)² / (2σ²) ) dt
Figure: Cumulative Normal Distribution
-
The Normal Distribution V
The vertical axis gives the area under the normal distribution to the left of x.
It asymptotically approaches 1.
Areas under the normal distribution are used to calculate probabilities as follows.
Probability of an event between two values:

P(x1 < x < x2) = ( 1 / (σ√(2π)) ) ∫_{x1}^{x2} e^( -(t-μ)² / (2σ²) ) dt
-
The Normal Distribution VI
Figure: Probability of an event between x1 and x2
The grey area is the probability of an event, x, between x1 and x2, i.e. P(x1 < x < x2).
F(x1) is the probability of an event, x, less than x1, i.e. P(x < x1). This is found from the cumulative distribution function.
Therefore P(x1 < x < x2) = F(x2) - F(x1)
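Since Python 3.8, the standard library's statistics.NormalDist offers the same cumulative distribution as Excel's NORM.DIST, so F(x2) - F(x1) can be sketched directly (here using the IQ distribution from these slides, mean 100 and standard deviation 16):

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 16 (from the slides)
iq = NormalDist(mu=100, sigma=16)

# P(x1 < x < x2) = F(x2) - F(x1), mirroring
# NORM.DIST(x2,100,16,TRUE) - NORM.DIST(x1,100,16,TRUE)
p_between = iq.cdf(116) - iq.cdf(84)   # probability of an IQ within one sigma
```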
-
The Normal Distribution VII
In Excel: P(x1 < x < x2) = NORM.DIST(x2, mean, standard deviation, TRUE) - NORM.DIST(x1, mean, standard deviation, TRUE)
-
The Normal Distribution VIII
Probability of an event less than a value:

P(x < x1) = ( 1 / (σ√(2π)) ) ∫_{-∞}^{x1} e^( -(t-μ)² / (2σ²) ) dt
Figure: Probability of event less than x1
-
The Normal Distribution IX
The grey area is the probability of an event, x, less than x1, i.e. P(x < x1) = F(x1).
In Excel: P(x < x1) = NORM.DIST(x1, mean, standard deviation, TRUE)
-
The Normal Distribution X
Probability of an event more than a value:

P(x > x1) = ( 1 / (σ√(2π)) ) ∫_{x1}^{∞} e^( -(t-μ)² / (2σ²) ) dt
Figure: Probability of event more than x1
-
The Normal Distribution XI
The grey area is the probability of an event, x, more than x1, i.e. P(x > x1) = 1 - F(x1).
Note: the cumulative distribution gives the area to the left of x1. Since we are interested in the area to the right, we must subtract F(x1) from 1.
In Excel: P(x > x1) = 1 - NORM.DIST(x1, mean, standard deviation, TRUE)
-
The Normal Distribution XII
Excel functions
NORM.DIST, NORM.S.DIST
NORM.INV, NORM.S.INV
Use NORM.DIST(x, mean, standard deviation, TRUE) for the cumulative distribution function
Use NORM.DIST(x, mean, standard deviation, FALSE) for the probability density function
-
The Normal Distribution XIII
Example (Interpreting the normal curve [3], pp. 291)
The example refers to the distribution of normal IQ scores.
What proportion of the population measures an IQ less than 105?
90% of the population will have an IQ below what value?
The top 1% of the population will have an IQ above what value?
What range of IQs define the 95% interval?
Someone with a measured IQ in excess of 140 is considered eligiblefor MENSA. What is the probability that a randomly chosen personfalls in this category?
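One way to sketch the calculations for these questions, using statistics.NormalDist from the Python standard library (the variable names are mine; the results should match Excel's NORM.DIST and NORM.INV):

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 16
iq = NormalDist(mu=100, sigma=16)

p_below_105 = iq.cdf(105)      # proportion of the population with IQ below 105
iq_90th = iq.inv_cdf(0.90)     # 90% of the population falls below this IQ
iq_top1 = iq.inv_cdf(0.99)     # the top 1% lies above this IQ
p_mensa = 1 - iq.cdf(140)      # P(IQ > 140), the MENSA question
```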
-
Confidence Limits I: Sampling distributions
A sampling distribution is the distribution of all possible values of a statistic for a given sample size.
Remember, the statistic can be anything, e.g. the mean or the standard deviation.
We are talking about a statistic because we are talking about samples, not populations, which would have parameters.
In other words, if we repeatedly take samples from the same population, we would get a slightly different statistic, say the mean, each time. The sampling distribution is the description of all the possible values that the statistic can have.
The sampling distribution therefore has its own mean and standard deviation.
The mean of the sampling distribution of the mean is μ_x̄.
The standard deviation of the sampling distribution is called the standard error.
The standard error is denoted as σ_x̄.
[7]: pp. 187-189
-
Confidence Limits I: Central limit theorem
Theorem (Central limit theorem)
If X̄ is the mean of a random sample of size n taken from a population with mean μ and finite variance σ², then the limiting form of the distribution of

Z = (X̄ - μ) / (σ/√n)

as n → ∞, is the standard normal distribution with μ = 0 and σ = 1. [8]
[7]: pp. 189-195
-
Confidence Limits II: Central limit theorem
Implications of the central limit theorem:
The sampling distribution of the mean is approximately a normal distribution if the sample size is large enough (i.e. 30 or more samples).
The mean of the sampling distribution of the mean is the same as the population mean, μ = μ_x̄.
The standard error (or standard deviation of the sampling distribution of the mean) is equal to the population standard deviation divided by the square root of the sample size, σ_x̄ = σ/√N.
The population does not have to be a normal distribution.
-
Confidence Limits I: Limits of confidence
Theorem (Confidence interval of μ; σ known)
If x̄ is the mean of a random sample of size n from a population with known variance σ², a (1 - α)100% confidence interval for μ is given by

x̄ - z_{α/2} σ/√n < μ < x̄ + z_{α/2} σ/√n

where z_{α/2} is the z value leaving an area of α/2 to the right. [8]

Note: for non-normal populations, n > 30 still gives good results thanks to the central limit theorem.
Work through the example on pp. 195-198.
Excel functions: CONFIDENCE.NORM, CONFIDENCE.T
Note: only use CONFIDENCE.NORM when n > 30 and the population is normally distributed.
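A minimal sketch of the z-based interval, assuming made-up values for x̄, σ and n (the margin is what Excel's CONFIDENCE.NORM returns):

```python
from statistics import NormalDist

# Hypothetical sample: mean of n measurements, population sigma known
x_bar, sigma, n = 25.0, 2.0, 36
alpha = 0.05

z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
margin = z_half * sigma / n ** 0.5             # CONFIDENCE.NORM(alpha, sigma, n)
lo, hi = x_bar - margin, x_bar + margin        # the (1 - alpha)100% interval
```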
[7]: pp. 195-199
-
Confidence Limits I: t-distribution
What if the sample size is < 30 or the distribution is not normal? The t-distribution works better.

t = (x̄ - μ) / (s/√n)
[7]: pp. 199-201
-
Confidence Limits II: t-distribution
Shape of the distribution depends on the degrees of freedom or df.
Figure: From [6]
-
Confidence Limits III: t-distribution
Theorem (Confidence interval for μ; σ unknown)
If x̄ and s are the mean and standard deviation of a random sample from a normal population with unknown variance σ², a (1 - α)100% confidence interval for μ is given by

x̄ - t_{α/2} s/√n < μ < x̄ + t_{α/2} s/√n

where t_{α/2} is the t value with n - 1 degrees of freedom, leaving an area of α/2 to the right. [8]
Excel functions:
T.INV, T.INV.2T
T.DIST, T.DIST.2T, T.DIST.RT
Repeat example on pp. 195-198, but use t-scores.
-
Confidence Limits I: Normal distribution and t-distribution confidence limits compared
Figure: Comparison of 90% confidence limits for the normal and t-distributions
Not in textbook
-
Confidence Limits II: Normal distribution and t-distribution confidence limits compared
Note: the range of μ for the t-distribution is much larger than for the normal distribution.
-
One-sample Hypothesis Testing I: Some revision
Hypothesis Essentially a guess about the way the world works.
Null hypothesis H0 The data won't show anything new or interesting. Any deviation from the norm is strictly due to chance.
Alternative hypothesis H1 Explains the world differently.
H0 can only be rejected or not rejected. We can never accept a hypothesis.
Type I error Incorrectly rejecting H0.
Type II error Not rejecting H0 when it should have been rejected.
Hypothesis testing is about setting criteria for rejecting H0. This sets the probability of making a Type I error. This probability is called α.
[7]: pp. 203-204
-
One-sample Hypothesis Testing I: Hypothesis testing
Figure: From [6]
[7]: pp. 205-209
-
One-sample Hypothesis Testing II: Hypothesis testing
α and β are areas that show the probabilities of making decision errors.
α is typically 0.05. This corresponds to a 5% chance of making a Type I error. It also represents the likelihood that the sample mean, x̄, is in that shaded region.
β represents the likelihood that x̄ is in the H1 distribution.
β is never set beforehand. It depends on the distributions and where α is set.
-
One-sample Hypothesis Testing III: Hypothesis testing
Example on pp. 207-209
-
One-sample Hypothesis Testing IV: Hypothesis testing
Guidelines for writing the hypotheses [8], pp. 299
For a simple direction such as more than, less than, superior to, inferior to, etc., state H1 as an appropriate inequality (< or >). H0 will be stated with the = sign.
If the claim suggests an equality and direction such as at least, equal to or greater, at most, no more than, etc., then state H0 using ≤ or ≥. State H1 with the opposite inequality (> or <) sign.
If no direction is claimed (two-tailed tests), state H1 with ≠ and H0 with =.
-
One-sample Hypothesis Testing V: Hypothesis testing
One-sided (or one-tailed) tests are stated as

H0: μ = μ0 (or μ ≤ μ0)
H1: μ > μ0

or

H0: μ = μ0 (or μ ≥ μ0)
H1: μ < μ0

Two-sided (or two-tailed) tests are stated as

H0: μ = μ0
H1: μ ≠ μ0
-
One-sample Hypothesis Testing VI: Hypothesis testing
Reject H0, with variance known, if x̄ > b or x̄ < a, where

a = μ0 - z_{α/2} σ/√n
b = μ0 + z_{α/2} σ/√n
Figure: From [8]
-
One-sample Hypothesis Testing VII: Hypothesis testing
The above is for a two-tailed test. A similar test can be formulated for a one-tailed hypothesis.
-
One-sample Hypothesis Testing VIII: Hypothesis testing
Tests on a single mean (variance unknown)
Rejection of H0 at significance level α for

t = (x̄ - μ0) / (s/√n)

when

t > t_{α/2, n-1} or t < -t_{α/2, n-1}
Excel function: T.DIST
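The t statistic itself is plain arithmetic; a sketch with made-up measurements (the critical value t_{α/2, n-1} still has to come from a table or Excel's T.INV.2T, since Python's standard library has no t-distribution):

```python
import statistics

# Hypothetical measurements, testing H0: mu = 50
sample = [51.2, 49.8, 50.6, 51.0, 50.4, 51.4]
mu0 = 50.0

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

# Test statistic: t = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1
t = (x_bar - mu0) / (s / n ** 0.5)
```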
-
One-sample Hypothesis Testing IX: Hypothesis testing
Hypotheses involving variances
What if the hypothesis uses a variance rather than a mean?

H0: σ² = σ0² (or σ² ≤ σ0²)
H1: σ² > σ0²

or

H0: σ² = σ0² (or σ² ≥ σ0²)
H1: σ² < σ0²
-
One-sample Hypothesis Testing X: Hypothesis testing
Two-sided (or two-tailed) tests are stated as

H0: σ² = σ0²
H1: σ² ≠ σ0²
-
One-sample Hypothesis Testing XI: Hypothesis testing
Hypotheses involving variances
The chi-square distribution is used in the hypothesis test.
Like the t-distribution, it also involves the degrees of freedom in the sample (df = n - 1).

χ² = (N - 1)s² / σ²
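The statistic is simple to compute once s² is known; a sketch with made-up numbers (the critical value comes from a χ² table or Excel's CHISQ.INV.RT):

```python
# Hypothetical: sample of N = 10 with s^2 = 0.64, testing H0: sigma0^2 = 0.25
n, s2, sigma0_sq = 10, 0.64, 0.25

# chi^2 = (N - 1) * s^2 / sigma0^2, with df = n - 1
chi_sq = (n - 1) * s2 / sigma0_sq
```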
-
One-sample Hypothesis Testing XII: Hypothesis testing
-
One-sample Hypothesis Testing XIII: Hypothesis testing
H0 is rejected at significance level α under the following conditions.
One-tailed hypothesis
For H1: σ² < σ0²: χ² < χ²_{1-α}
For H1: σ² > σ0²: χ² > χ²_{α}
Two-tailed hypothesis
χ² < χ²_{1-α/2} or χ² > χ²_{α/2}
Excel functions
CHISQ.DIST, CHISQ.DIST.RT
CHISQ.INV, CHISQ.INV.RT
CHISQ.TEST
-
One-sample Hypothesis Testing I: Summary of one-sample hypothesis tests

H0 | Value of Test Statistic | H1 | Critical Region
μ = μ0 or μ ≥ μ0 | z = (x̄ - μ0)/(σ/√n), σ known | μ < μ0 | z < -z_α
μ = μ0 or μ ≤ μ0 | | μ > μ0 | z > z_α
μ = μ0 | | μ ≠ μ0 | z < -z_{α/2} or z > z_{α/2}
μ = μ0 or μ ≥ μ0 | t = (x̄ - μ0)/(s/√n), σ unknown, df = n - 1 | μ < μ0 | t < -t_α
μ = μ0 or μ ≤ μ0 | | μ > μ0 | t > t_α
μ = μ0 | | μ ≠ μ0 | t < -t_{α/2} or t > t_{α/2}
σ² = σ0² or σ² ≥ σ0² | χ² = (n-1)s²/σ0², df = n - 1 | σ² < σ0² | χ² < χ²_{1-α, df}
σ² = σ0² or σ² ≤ σ0² | | σ² > σ0² | χ² > χ²_{α, df}
σ² = σ0² | | σ² ≠ σ0² | χ² < χ²_{1-α/2, df} or χ² > χ²_{α/2, df}
Not in textbook
-
Two-sample Hypothesis Testing I: Hypotheses for two-sample means testing
Objective: do the two samples come from two different populations or not?
Null hypothesis: Difference between the two samples are strictly due tochance. They come from the same population.
Alternative hypothesis: There is a real difference between the samples.They come from different populations.
[7]: pp. 219-235
-
Two-sample Hypothesis Testing II: Hypotheses for two-sample means testing
One-tailed tests

H0: μ1 - μ2 = 0
H1: μ1 - μ2 > 0

or

H0: μ1 - μ2 = 0
H1: μ1 - μ2 < 0

Two-tailed tests

H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
-
Two-sample Hypothesis Testing III: Hypotheses for two-sample means testing
Hypothesis testing procedure
1 Write the hypotheses, H0 and H1
2 Select the probability for making a Type I error
3 Calculate the sample means and standard deviations, x̄1, x̄2, s1 and s2
4 Compare the test statistic to a sampling distribution of test statistics (see next slides)
5 Reject (or do not reject) H0
-
Two-sample Hypothesis Testing IV: Hypotheses for two-sample means testing
For this type of testing, the sampling distribution of the difference between means is needed.
The sampling distribution of the difference between means is the distribution of all possible values of differences between pairs of sample means, with the sample sizes held constant from pair to pair.
-
Two-sample Hypothesis Testing V: Hypotheses for two-sample means testing
Figure: From [6]
-
Two-sample Hypothesis Testing VI: Hypotheses for two-sample means testing
NOTE:
All samples from population 1 must have the same size.
All samples from population 2 must have the same size.
The two sample sizes are not necessarily equal.
Characteristics of the sampling distribution of the difference between means according to the Central Limit Theorem:
For large samples, it is approximately normally distributed.
For normally distributed populations, it is normally distributed.
The mean is the difference between the population means, μ_{x̄1-x̄2} = μ1 - μ2.
The standard deviation (or standard error of the difference between means) is σ_{x̄1-x̄2} = √(σ1²/N1 + σ2²/N2)
-
Two-sample Hypothesis Testing VII: Hypotheses for two-sample means testing
Tests on two means (variance known). Rejection of H0 at significance level α for

z = ((x̄1 - x̄2) - (μ1 - μ2)) / √(σ1²/N1 + σ2²/N2)

when H1: μ1 - μ2 < 0 (one-tailed tests)

z < -z_α

or H1: μ1 - μ2 > 0 (one-tailed tests)

z > z_α

or (two-tailed tests)
-
Two-sample Hypothesis Testing VIII: Hypotheses for two-sample means testing
z > z_{α/2} or z < -z_{α/2}
-
Two-sample Hypothesis Testing IX: Hypotheses for two-sample means testing
Tests on two means (variance unknown, but equal)
The Central Limit Theorem is no longer applicable. Now, rather use the t-distribution.
Calculate the pooled estimate of the standard error of the difference between means:

s_p² = ((N1 - 1)s1² + (N2 - 1)s2²) / ((N1 - 1) + (N2 - 1))
df = (N1 - 1) + (N2 - 1)
Kristiaan Schreve (SU) Stats Block January 26, 2015 108 / 181
-
Two-sample Hypothesis Testing X
Hypotheses for two-sample means testing

Rejection of $H_0$ at significance level $\alpha$ for

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$

when $H_1: \mu_1 - \mu_2 < 0$ (one-tailed test):
$t < -t_{\alpha,df}$
or $H_1: \mu_1 - \mu_2 > 0$ (one-tailed test):
$t > t_{\alpha,df}$
or (two-tailed test):
$t > t_{\alpha/2,df}$ or $t < -t_{\alpha/2,df}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 109 / 181
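Outside Excel, the pooled t-test can be sketched as follows. The samples are hypothetical, and the critical value 2.262 is the two-tailed t-table value for alpha = 0.05 and df = 9:

```python
import math

# Hypothetical data: two small samples
sample1 = [20.1, 19.8, 20.5, 20.0, 19.9, 20.3]
sample2 = [19.5, 19.7, 19.4, 19.9, 19.6]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):  # sample variance, divisor N - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

N1, N2 = len(sample1), len(sample2)
# Pooled estimate of the common variance
sp2 = ((N1 - 1) * var(sample1) + (N2 - 1) * var(sample2)) / (N1 + N2 - 2)
t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / N1 + 1 / N2))
df = N1 + N2 - 2   # 9 degrees of freedom

# Two-tailed test at alpha = 0.05; critical value from a t table
t_crit = 2.262
reject_H0 = abs(t) > t_crit
```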
-
Two-sample Hypothesis Testing XI
Hypotheses for two-sample means testing

Tests on two means (variance unknown, and unequal). Same test as the previous test (two means, variance unknown), but the degrees of freedom are adjusted as follows [5], pp. 356:

$df = \dfrac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$

df will in general not be an integer. Round down to the nearest integer to use a t table.
Kristiaan Schreve (SU) Stats Block January 26, 2015 110 / 181
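The adjusted (Welch) degrees of freedom are easy to compute directly; the summary statistics below are hypothetical:

```python
import math

# Hypothetical summary statistics for two samples with unequal variances
s1_sq, n1 = 12.0, 10
s2_sq, n2 = 40.0, 15

a = s1_sq / n1
b = s2_sq / n2
df_exact = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
df = math.floor(df_exact)   # round down to use a t table
```

For these numbers df_exact is about 22.4, so df = 22 would be used in the t table.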
-
Two-sample Hypothesis Testing XII
Hypotheses for two-sample means testing

Rejection of $H_0$ at significance level $\alpha$ for

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$

when $H_1: \mu_1 - \mu_2 < 0$ (one-tailed test):
$t < -t_{\alpha,df}$
or $H_1: \mu_1 - \mu_2 > 0$ (one-tailed test):
$t > t_{\alpha,df}$
or (two-tailed test):
$t > t_{\alpha/2,df}$ or $t < -t_{\alpha/2,df}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 111 / 181
-
Two-sample Hypothesis Testing XIII
Hypotheses for two-sample means testing

Hypothesis testing of paired samples [5], pp. 359
One-tailed test:
$H_0: \mu_1 - \mu_2 = D_0$
$H_1: \mu_1 - \mu_2 > D_0$ [or $H_1: \mu_1 - \mu_2 < D_0$]
Two-tailed test:
$H_0: \mu_1 - \mu_2 = D_0$
$H_1: \mu_1 - \mu_2 \neq D_0$

$t = \dfrac{\bar{d} - D_0}{s_d/\sqrt{n}}; \quad df = n - 1$

Assumptions

The relative frequency distribution of the population of differences is approximately normal.
The paired differences are randomly selected from the population of differences.
Kristiaan Schreve (SU) Stats Block January 26, 2015 112 / 181
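A minimal sketch of the paired test in Python, with made-up before/after readings; the critical value 1.895 is the one-tailed t-table value for alpha = 0.05 and df = 7:

```python
import math

# Hypothetical paired observations (e.g. before/after on the same specimens)
before = [72, 68, 75, 70, 74, 69, 73, 71]
after  = [70, 66, 74, 67, 73, 68, 70, 70]

d = [b - a for b, a in zip(before, after)]   # paired differences
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))

D0 = 0
t = (d_bar - D0) / (s_d / math.sqrt(n))
df = n - 1   # 7

# One-tailed test (H1: mu1 - mu2 > 0) at alpha = 0.05
t_crit = 1.895
reject_H0 = t > t_crit
```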
-
Two-sample Hypothesis Testing I
Hypotheses for two-sample variance testing

Comparing the variances of two samples. Two-tailed hypothesis:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

To compare variances of two samples, the F-test is used. The test statistic is the F-ratio

$F = \dfrac{s_a^2}{s_b^2}$ where $s_a^2 > s_b^2$

To draw a conclusion, the F-distribution is needed.
Kristiaan Schreve (SU) Stats Block January 26, 2015 113 / 181
[7]: pp. 239-248
-
Two-sample Hypothesis Testing II
Hypotheses for two-sample variance testing
Figure: From [6]
Kristiaan Schreve (SU) Stats Block January 26, 2015 114 / 181
-
Two-sample Hypothesis Testing III
Hypotheses for two-sample variance testing

NOTE

The distribution depends on two degrees of freedom, $df_a$ and $df_b$.

$df_a = n_a - 1$
$df_b = n_b - 1$
Kristiaan Schreve (SU) Stats Block January 26, 2015 115 / 181
-
Two-sample Hypothesis Testing IV
Hypotheses for two-sample variance testing

Rejection of $H_0$ at significance level $\alpha$ when

$F > F_{1-\alpha/2}(df_a, df_b)$ or $F < F_{\alpha/2}(df_a, df_b)$

The F-test can be used to see if the variances of two samples differ significantly before deciding which t-test to use for testing the difference between the means. In this case, we are not looking for small differences between the variances, therefore it is desirable to choose a higher $\alpha$, say 0.2, for the variance test.

Excel functions
F.TEST
F.DIST, F.DIST.RT
F.INV, F.INV.RT
Data analysis tool: F-test two sample for variances
Kristiaan Schreve (SU) Stats Block January 26, 2015 116 / 181
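The F-ratio itself needs no special tooling; the resulting F would then be compared against F-table values (or Excel's F.INV.RT) at the chosen alpha. The sample variances below are hypothetical:

```python
# Hypothetical sample variances; by convention the larger goes on top
s_a_sq, n_a = 8.4, 12    # larger variance
s_b_sq, n_b = 3.1, 10

F = s_a_sq / s_b_sq              # about 2.71
df_a, df_b = n_a - 1, n_b - 1    # 11 and 9
# Compare F with F-table critical values for (df_a, df_b)
```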
-
Two-sample Hypothesis Testing I
Summary of two-sample hypothesis tests

$H_0: \mu_1 - \mu_2 = 0$ ($\sigma_1$ and $\sigma_2$ known)
Test statistic: $z = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/N_1 + \sigma_2^2/N_2}}$
$H_1: \mu_1 - \mu_2 < 0$, critical region: $z < -z_\alpha$
$H_1: \mu_1 - \mu_2 > 0$, critical region: $z > z_\alpha$
$H_1: \mu_1 - \mu_2 \neq 0$, critical region: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

$H_0: \mu_1 - \mu_2 = 0$ ($\sigma_1$ and $\sigma_2$ unknown but equal; $df = N_1 + N_2 - 2$)
Test statistic: $t = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{s_p\sqrt{1/N_1 + 1/N_2}}$
$H_1: \mu_1 - \mu_2 < 0$, critical region: $t < -t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 > 0$, critical region: $t > t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 \neq 0$, critical region: $t < -t_{\alpha/2,df}$ or $t > t_{\alpha/2,df}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 117 / 181
Not in textbook
-
Two-sample Hypothesis Testing II
Summary of two-sample hypothesis tests

$H_0: \mu_1 - \mu_2 = 0$ ($\sigma_1$ and $\sigma_2$ unknown and unequal; $df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$)
Test statistic: $t = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/N_1 + s_2^2/N_2}}$
$H_1: \mu_1 - \mu_2 < 0$, critical region: $t < -t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 > 0$, critical region: $t > t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 \neq 0$, critical region: $t < -t_{\alpha/2,df}$ or $t > t_{\alpha/2,df}$

$H_0: \mu_1 - \mu_2 = D_0$ (paired samples; $df = n - 1$)
Test statistic: $t = \dfrac{\bar{d}-D_0}{s_d/\sqrt{n}}$
$H_1: \mu_1 - \mu_2 < D_0$, critical region: $t < -t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 > D_0$, critical region: $t > t_{\alpha,df}$
$H_1: \mu_1 - \mu_2 \neq D_0$, critical region: $t < -t_{\alpha/2,df}$ or $t > t_{\alpha/2,df}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 118 / 181
-
Two-sample Hypothesis Testing III
Summary of two-sample hypothesis tests

$H_0: \sigma_1^2 = \sigma_2^2$ ($df_a = n_a - 1$, $df_b = n_b - 1$)
Test statistic: $F = \dfrac{s_a^2}{s_b^2}$
$H_1: \sigma_1^2 < \sigma_2^2$, critical region: $F < F_\alpha(df_a, df_b)$
$H_1: \sigma_1^2 > \sigma_2^2$, critical region: $F > F_{1-\alpha}(df_a, df_b)$
$H_1: \sigma_1^2 \neq \sigma_2^2$, critical region: $F < F_{\alpha/2}(df_a, df_b)$ or $F > F_{1-\alpha/2}(df_a, df_b)$
Kristiaan Schreve (SU) Stats Block January 26, 2015 119 / 181
-
Analysis of Variance - Part One
Introduction to ANOVA

Example (Based on Table 12-1, [6])

Table: Data from Three Training Methods

Method 1  Method 2  Method 3
95        83        68
92        89        75
89        85        79
90        89        74
99        81        75
88        89        81
96        90        73
98        82        77
95        84
          80

Mean                93.44   85.20   75.25
Variance            16.28   14.18   15.64
Standard Deviation   4.03    3.77    3.96
Kristiaan Schreve (SU) Stats Block January 26, 2015 120 / 181
[7]: pp. 251-253
-
Analysis of Variance - Part One I
Introduction to ANOVA

Example (Continued...)

Hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1:$ Not $H_0$
$\alpha = 0.05$

Performing multiple t-tests possibly sets us up for a disaster. Let's see why:

The chance of NOT making a Type I error with one comparison, at a significance level of $\alpha = 0.05$, is 95%.
So, for 3 samples, 3 tests must be done: Method 1 vs Method 2, Method 1 vs Method 3 and Method 2 vs Method 3.
Kristiaan Schreve (SU) Stats Block January 26, 2015 121 / 181
-
Analysis of Variance - Part One II
Introduction to ANOVA

Each test will have a probability of NOT making a Type I error of $p_i = 95\%$.
The combined probability of NOT making a Type I error is therefore

$p_1 \cdot p_2 \cdot p_3 = 0.95 \times 0.95 \times 0.95 = 0.86$

Therefore, the combined chance (note, this is covered in chapter 16) of making a Type I error is

$1 - p_1 p_2 p_3 = 0.14$ or 14%

In general, the chance of making a Type I error increases as $1 - (1-\alpha)^N$, where N is the number of t-tests.
Kristiaan Schreve (SU) Stats Block January 26, 2015 122 / 181
-
Analysis of Variance - Part One III
Introduction to ANOVA

Table: Increasing chance of making a Type I error for multiple t-tests, from [6]

Number of samples  Number of tests  Pr(at least one significant t)
3                   3               0.14
4                   6               0.26
5                  10               0.40
6                  15               0.54
7                  21               0.66
8                  28               0.76
9                  36               0.84
10                 45               0.90
Kristiaan Schreve (SU) Stats Block January 26, 2015 123 / 181
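The table rows follow directly from the formula on the previous slide; a short sketch that reproduces them:

```python
from math import comb

def familywise_error(num_samples, alpha=0.05):
    """Chance of at least one Type I error over all pairwise t-tests."""
    num_tests = comb(num_samples, 2)          # e.g. 3 samples -> 3 tests
    return num_tests, 1 - (1 - alpha) ** num_tests

tests3, p3 = familywise_error(3)     # 3 tests, probability about 0.14
tests10, p10 = familywise_error(10)  # 45 tests, probability about 0.90
```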
-
Analysis of Variance - Part One IV
Introduction to ANOVA

The idea with ANOVA is to separate the total variability into the following components [8]:

1 Variability between samples, measuring systematic and random variation.
2 Variability within samples, measuring only random variation.
3 Finally, determine if component 1 is more significant than component 2.
Kristiaan Schreve (SU) Stats Block January 26, 2015 124 / 181
-
Analysis of Variance - Part One V
Introduction to ANOVA

The idea can also be illustrated with the following plots. The figure shows a single factor experiment at two levels, i.e. two treatments.

Figure: From [5] pp. 627

Is there sufficient evidence to indicate a difference between the population means?
Kristiaan Schreve (SU) Stats Block January 26, 2015 125 / 181
-
Analysis of Variance - Part One VI
Introduction to ANOVA

How about these two plots?

Figure: From [5] pp. 627

What statistics of the two samples in these plots did we intuitively use to make a decision on the difference between the population means?
Kristiaan Schreve (SU) Stats Block January 26, 2015 126 / 181
-
Analysis of Variance - Part One I
Single factor ANOVA

Recall the definition of the sample variance

$s^2 = \dfrac{\sum (x - \bar{x})^2}{N-1}$

This is often called the Mean Square, because it is almost a mean of squared deviations.

Numerator: sum of squares $= \sum (x - \bar{x})^2$
Denominator: degrees of freedom, df
Kristiaan Schreve (SU) Stats Block January 26, 2015 127 / 181
[7]: pp. 253-265
-
Analysis of Variance - Part One II
Single factor ANOVA

We can calculate the following variances (or mean squares) (alternative definitions are derived from [8], pp. 472).

$MS_T = \dfrac{SS_T}{df_T} = \dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \left(\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}\right)^2 \left(\sum_{i=1}^{k} n_i\right)^{-1}}{\left(\sum_{i=1}^{k} n_i\right) - 1}$

Mean Square for all the data.
Subscript T is for total data.
Numerator: Total sum of squares
Denominator: Total degrees of freedom. All the data - 1.
Kristiaan Schreve (SU) Stats Block January 26, 2015 128 / 181
-
Analysis of Variance - Part One III
Single factor ANOVA

In the second equation

k is the number of samples or treatments
$n_i$ is the number of data points in the i-th sample
$y_{ij}$ is the j-th data point, from the i-th sample.

$MS_W = \dfrac{SS_W}{df_W} = \dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{k}\left(\sum_{j=1}^{n_i} y_{ij}\right)^2 / n_i}{\sum_{i=1}^{k}(n_i - 1)}$

Mean squares within samples. It is a pooled estimate of the population variance.
Kristiaan Schreve (SU) Stats Block January 26, 2015 129 / 181
-
Analysis of Variance - Part One IV
Single factor ANOVA

Indication of variances within samples.
Subscript W stands for within
Numerator: Within samples sum of squares
Denominator: Sum of degrees of freedom of each sample

$MS_B = \dfrac{SS_B}{df_B} = \dfrac{SS_T - SS_W}{df_T - df_W}$

Mean squares between samples. Indicates how the means differ.
Subscript B stands for between
Numerator: Between samples sum of squares
Kristiaan Schreve (SU) Stats Block January 26, 2015 130 / 181
-
Analysis of Variance - Part One V
Single factor ANOVA

Denominator: Number of samples - 1

Note that
$SS_B + SS_W = SS_T$ and
$df_B + df_W = df_T$

Note that both $MS_W$ and $MS_B$ are estimates of the population variance. If there is a meaningful difference between the variances, then the samples cannot all come from the same populations and therefore there is a meaningful difference between the samples that cannot be attributed just to random errors.

ANOVA translates
$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$
$H_1:$ Not $H_0$
Kristiaan Schreve (SU) Stats Block January 26, 2015 131 / 181
-
Analysis of Variance - Part One VI
Single factor ANOVA

into

$H_0: \sigma_B^2 \leq \sigma_W^2$
$H_1: \sigma_B^2 > \sigma_W^2$

Variances are compared with the F-distribution. The test statistic is therefore

$f = \dfrac{MS_B}{MS_W}$

Reject $H_0$ at significance level $\alpha$ if $f > f_\alpha$
Kristiaan Schreve (SU) Stats Block January 26, 2015 132 / 181
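The slides do this with Excel's single-factor ANOVA tool; purely as a cross-check, the same computation for the training-method table can be sketched in Python. The digits are taken from the table as printed (so the Method 1 summary statistics differ from the slide in the last digit), and the critical value 3.40 is F(0.05; 2, 24) taken from an F table:

```python
# Single-factor ANOVA on the three training methods
method1 = [95, 92, 89, 90, 99, 88, 96, 98, 95]
method2 = [83, 89, 85, 89, 81, 89, 90, 82, 84, 80]
method3 = [68, 75, 79, 74, 75, 81, 73, 77]
groups = [method1, method2, method3]

N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

# Between-samples and within-samples sums of squares
SSB = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
SSW = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
dfB = len(groups) - 1            # 2
dfW = N - len(groups)            # 24

F = (SSB / dfB) / (SSW / dfW)
reject_H0 = F > 3.40             # F is about 47, so the methods differ
```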
-
Analysis of Variance - Part One I
After the F-test

If $H_0$ is rejected, how can you find where the differences lie?

Planned comparisons
Also called a priori tests
Essentially these are t-tests comparing means of different samples.
The test statistic is

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_W\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$

The hypotheses are:
$H_0: \mu_1 \leq \mu_2$
$H_1: \mu_1 > \mu_2$

The rest of the test is a standard t-test with $df = df_W$.
Kristiaan Schreve (SU) Stats Block January 26, 2015 133 / 181
[7]: pp. 258-261
-
Analysis of Variance - Part One II
After the F-test

Unplanned comparisons
There may be some situations where the conditions for the t-test mentioned above are not met. This is then called an unplanned comparison. Also known as a posteriori or post hoc tests. Numerous tests are available...
Kristiaan Schreve (SU) Stats Block January 26, 2015 134 / 181
-
Regression I
Linear regression
Kristiaan Schreve (SU) Stats Block January 26, 2015 135 / 181
[7]: pp. 293-299
-
Regression II
Linear regression
Figure: Left: Scatter plot. Right: With linear trend line.
Kristiaan Schreve (SU) Stats Block January 26, 2015 136 / 181
-
Regression I
Testing hypotheses about regression

Residual variance of estimate

$s_{yx}^2 = \dfrac{\sum (y - y')^2}{N - 2} = \dfrac{\sum (y - y')^2}{N - n - 1}$

n is the degree of the polynomial fitted to the data. In the linear case, n = 1.
N is the number of data points.
$y - y'$ is the difference between the measured and predicted value.
Kristiaan Schreve (SU) Stats Block January 26, 2015 137 / 181
[7]: pp. 299-306
-
Regression II
Testing hypotheses about regression

Standard error of estimate

$s_{yx} = \sqrt{s_{yx}^2} = \sqrt{\dfrac{\sum (y - y')^2}{N-2}}$

Hypothesis
$H_0:$ No real relationship
$H_1:$ Not $H_0$

Similar to ANOVA, the hypothesis will compare variances. Therefore, rewrite
Kristiaan Schreve (SU) Stats Block January 26, 2015 138 / 181
-
Regression III
Testing hypotheses about regression

$H_0: \sigma_{Regression}^2 \leq \sigma_{Residual}^2$
$H_1: \sigma_{Regression}^2 > \sigma_{Residual}^2$

To find the variances, we need the sums of squares and their corresponding degrees of freedom.
Kristiaan Schreve (SU) Stats Block January 26, 2015 139 / 181
-
Regression IV
Testing hypotheses about regression
Figure: Deviations in a scatter plot, from [6]
Kristiaan Schreve (SU) Stats Block January 26, 2015 140 / 181
-
Regression V
Testing hypotheses about regression

$SS_{Residual} = \sum (y - y')^2$
This represents the variability around the regression curve.

$SS_{Regression} = \sum (y' - \bar{y})^2$
This represents the gain in prediction by using a regression curve rather than just the average of the data.

$SS_{Total} = \sum (y - \bar{y})^2$
This represents the total variance.
Kristiaan Schreve (SU) Stats Block January 26, 2015 141 / 181
-
Regression VI
Testing hypotheses about regression

The following identities hold

$SS_{Residual} + SS_{Regression} = SS_{Total}$
$df_{Residual} + df_{Regression} = df_{Total}$
$df_{Residual} = N - 2$
$df_{Total} = N - 1$
Kristiaan Schreve (SU) Stats Block January 26, 2015 142 / 181
-
Regression VII
Testing hypotheses about regression

Similar to ANOVA, we use mean squares for the variances

$MS_{Regression} = \dfrac{SS_{Regression}}{df_{Regression}}$

$MS_{Residual} = \dfrac{SS_{Residual}}{df_{Residual}}$

$MS_{Total} = \dfrac{SS_{Total}}{df_{Total}}$

Test the hypothesis with an F test

$F = \dfrac{MS_{Regression}}{MS_{Residual}}$

Reject $H_0$ at significance level $\alpha$ if $F > F_\alpha$
Kristiaan Schreve (SU) Stats Block January 26, 2015 143 / 181
-
Regression VIII
Testing hypotheses about regression

Testing the slope
(Note, this is a different approach from the textbook on pp. 267)
Is the slope different from zero? Or, is the mean an equally good predictor?
Hypotheses
$H_0: \beta = 0$
$H_1: \beta \neq 0$
This is a standard one-sample, two-tailed t-test. In what follows, $\beta = 0$. The test statistic is

$t = \dfrac{b - \beta}{s_b}; \quad df = N - 2$
Kristiaan Schreve (SU) Stats Block January 26, 2015 144 / 181
-
Regression IX
Testing hypotheses about regression

The denominator estimates the standard error of the slope

$s_b = \dfrac{s_{yx}}{s_x\sqrt{N-1}}$

$s_{yx} = \sqrt{\dfrac{\sum (y - y')^2}{N-2}}$

$s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N-1}}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 145 / 181
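A sketch of the slope test for a small hypothetical data set, computing the fit and the standard error of the slope directly from the formulas above (the critical value 2.776 is the two-tailed t-table value for alpha = 0.05 and df = 4):

```python
import math

# Hypothetical (x, y) data for a simple linear fit
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
N = len(x)

x_bar = sum(x) / N
y_bar = sum(y) / N
Sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / Sxx  # slope
a = y_bar - b * x_bar                                               # intercept

residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(residual_ss / (N - 2))
s_x = math.sqrt(Sxx / (N - 1))
s_b = s_yx / (s_x * math.sqrt(N - 1))

t = b / s_b                 # H0: beta = 0, df = N - 2
reject_H0 = abs(t) > 2.776  # the slope is clearly nonzero here
```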
-
Regression X
Testing hypotheses about regression

Testing the intercept
Is the intercept not zero?
Hypotheses
$H_0: \alpha = 0$
$H_1: \alpha \neq 0$
This is a standard one-sample, two-tailed t-test. In what follows, $\alpha = 0$. The test statistic is

$t = \dfrac{a - \alpha}{s_a}; \quad df = N - 2$

$s_a = s_{yx}\sqrt{\dfrac{1}{N} + \dfrac{\bar{x}^2}{(N-1)s_x^2}}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 146 / 181
-
Regression I
Excel's R-squared

Coefficient of Determination

$R^2 = \dfrac{SS_{Regression}}{SS_{Total}}$

When $R^2 \approx 1$, there is a good correlation. When $R^2 \approx 0$, not so!
Kristiaan Schreve (SU) Stats Block January 26, 2015 147 / 181
Not in textbook
-
Regression I
Excel functions for regression
SLOPE
INTERCEPT
STEYX
FORECAST
TREND
LINEST
Data analysis tool: Regression
Kristiaan Schreve (SU) Stats Block January 26, 2015 148 / 181
[7]: pp. 307-319
-
Multiple regression I
Regression for more than one independent variable.
E.g. a plane:

$y = a + b_1 x_1 + b_2 x_2$

Any number of independent variables is possible.

$y = a + \sum b_i x_i$

Other types of fitting are also possible in Excel (logarithmic, exponential, higher order polynomials, etc.). Make careful decisions about the trend in the data and choose an appropriate model. Use hypothesis testing to test your assumptions.
Kristiaan Schreve (SU) Stats Block January 26, 2015 149 / 181
[7]: pp. 320-327
-
Regression
Guidelines
Give an indication of the goodness of fit.
Report the range of the independent variable(s) for which the regression was done and therefore the range for which the goodness of fit test is valid.

Check the validity of the prediction of the regression result over the range of the independent variable. E.g. sometimes the predicted result must be a positive value (e.g. the score of the tut test). If the regression result allows the possibility of predicting a negative value in this case, the result must be reconsidered.
Fit the lowest order curve possible.
Kristiaan Schreve (SU) Stats Block January 26, 2015 150 / 181
Not in textbook
-
Correlation I
Pearson's correlation coefficient

Correlation is an alternative to regression for looking at relationships between parameters. With regression it is possible to make predictions. With correlation it is easier to say that some relationships are stronger than others.
Positive correlation means that as one parameter increases, the otheralso increases.
Negative correlation means that as one parameter increases, the otherdecreases.
Note that correlation does not imply causality. (The same is true forregression.)
Kristiaan Schreve (SU) Stats Block January 26, 2015 151 / 181
[7]: pp. 331-334
-
Correlation II
Pearson's correlation coefficient

Pearson's product-moment correlation coefficient

$r = \dfrac{\left[\dfrac{1}{N-1}\right]\sum (x - \bar{x})(y - \bar{y})}{s_x s_y} = \dfrac{\mathrm{cov}(x, y)}{s_x s_y}$

Numerator: the covariance represents how x and y vary together.
Denominator: standard deviations of the x and y variables.

$r = -1$ implies perfect negative correlation (minimum value r can have)
$r = 1$ implies perfect positive correlation (maximum value r can have)
$r = 0$ implies no correlation.
Kristiaan Schreve (SU) Stats Block January 26, 2015 152 / 181
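In Excel this is CORREL or PEARSON; as a cross-check, the formula above can be evaluated directly on a small hypothetical data set:

```python
import math

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)

x_bar, y_bar = sum(x) / N, sum(y) / N
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (N - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (N - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (N - 1))

r = cov / (s_x * s_y)   # about 0.77: fairly strong positive correlation
```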
-
Correlation I
Correlation and regression

$r = \sqrt{r^2} = \sqrt{\dfrac{SS_{Regression}}{SS_{Total}}}$

$r^2$ is just Excel's Coefficient of Determination.
$R^2 = 0.667$ implies $SS_{Regression}$ is 66.7% of $SS_{Total}$. To find out if that is significant, do a hypothesis test...
Kristiaan Schreve (SU) Stats Block January 26, 2015 153 / 181
[7]: pp. 334-337
-
Correlation I
Testing hypotheses about correlation

Correlation coefficient greater than zero?
The sample statistic is r.
Test for positive correlation:
$H_0: \rho \leq 0$
$H_1: \rho > 0$
Test statistic, with $(N-2)$ degrees of freedom:

$t = \dfrac{r - \rho}{s_r}$ where $\rho = 0$

$s_r = \sqrt{\dfrac{1-r^2}{N-2}}$

Reject $H_0$ at significance level $\alpha$ if $t > t_\alpha$.
Kristiaan Schreve (SU) Stats Block January 26, 2015 154 / 181
[7]: pp. 338-340
-
Correlation II
Testing hypotheses about correlation

Example (Too much data for regression? [6] pp. 371)

Say N = 102 and $\alpha = 0.05$. Say r = 0.195. Is it a significant correlation?

$t = \dfrac{r\sqrt{N-2}}{\sqrt{1-r^2}} = 1.988$

$t_\alpha = 1.984$. Since $t > t_\alpha$, reject $H_0$. We suspect the correlation is significant.
BUT
$r^2 = 0.038$, which implies that $SS_{Regression}$ is just 4% of $SS_{Total}$.
Kristiaan Schreve (SU) Stats Block January 26, 2015 155 / 181
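The numbers in this example are easy to reproduce directly:

```python
import math

# Reproduce the example: r = 0.195, N = 102, alpha = 0.05
r, N = 0.195, 102
t = r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)   # about 1.988

r_squared = r ** 2   # about 0.038: regression explains only ~4% of SS_Total
```

So t barely exceeds the critical 1.984: statistically "significant", yet practically weak.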
-
Correlation III
Testing hypotheses about correlation

Example (Too much data for regression? Continued...)

r      t      t_alpha  N - 2  Reject?
0.195  2.178  1.980    120    Yes
0.195  2.085  1.982    110    Yes
0.195  1.988  1.984    100    Yes
0.195  1.886  1.987     90    No
0.195  1.778  1.990     80    No
Kristiaan Schreve (SU) Stats Block January 26, 2015 156 / 181
-
Correlation IV
Testing hypotheses about correlation

Do two correlation coefficients differ?

$H_0: \rho_1 = \rho_2$
$H_1: \rho_1 \neq \rho_2$

We have to transform the r value with

$z_r = 0.5[\ln(1+r) - \ln(1-r)]$

The test statistic is then

$z = \dfrac{z_{r_1} - z_{r_2}}{\sigma_{z_{r_1}-z_{r_2}}}$

where
Kristiaan Schreve (SU) Stats Block January 26, 2015 157 / 181
-
Correlation V
Testing hypotheses about correlation

$\sigma_{z_{r_1}-z_{r_2}} = \sqrt{\dfrac{1}{N_1-3} + \dfrac{1}{N_2-3}}$

Reject $H_0$ at significance level $\alpha$ if

$z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
Kristiaan Schreve (SU) Stats Block January 26, 2015 158 / 181
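A sketch of the Fisher transformation test with hypothetical correlation coefficients (the critical value 1.96 is the two-tailed normal value for alpha = 0.05):

```python
import math

# Hypothetical correlation coefficients from two independent samples
r1, N1 = 0.8, 50
r2, N2 = 0.6, 60

def fisher_z(r):
    """Fisher transformation z_r = 0.5[ln(1+r) - ln(1-r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

z1, z2 = fisher_z(r1), fisher_z(r2)
sigma = math.sqrt(1 / (N1 - 3) + 1 / (N2 - 3))
z = (z1 - z2) / sigma              # about 2.06

reject_H0 = z > 1.96 or z < -1.96  # two-tailed, alpha = 0.05
```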
-
Uncertainty of Measurement I
Based on ISO Guide 98-3[1].
Formal standard for expression of uncertainty in measurement.
The true value of the measurand can never be known.

Therefore, the measurement error can also never be known.

Measurement results should therefore be expressed in statistical terms, i.e. as a distribution.

Therefore, we should report some nominal value, e.g. the mean value, with some expression of the measurement uncertainty.
Kristiaan Schreve (SU) Stats Block January 26, 2015 159 / 181
Not in textbook
-
Uncertainty of Measurement II
Example (Measuring Power Dissipated from a Resistor [1])
If a potential difference V is applied to the terminals of a temperature-dependent resistor that has a resistance of $R_0$ at the defined temperature $t_0$ and a linear temperature coefficient of resistance $\alpha$, the power P (the measurand) dissipated by the resistor at the temperature t depends on V, $R_0$, $\alpha$ and t according to

$P = f(V, R_0, \alpha, t) = \dfrac{V^2}{R_0[1 + \alpha(t - t_0)]}$

P is never directly measured. We will measure V and t. With enough repetitions, measurement uncertainties for V and t can be found. Hopefully, the uncertainties in the reference values of $R_0$, $\alpha$ and $t_0$ are known. Then we need a method to propagate the uncertainty of these values to the uncertainty of the measurand P.
Kristiaan Schreve (SU) Stats Block January 26, 2015 160 / 181
-
Uncertainty of Measurement III
The example illustrates a few things
The measurand is seldom measured directly. Often it is derived from a functional relationship such as

$Y = f(X_1, X_2, \ldots, X_N)$

Y is the measurand.
The $X_i$ are either known from measurements or from some prior knowledge (e.g. a catalogue value).
There are two types of evaluation of standard uncertainty:
Type A is determined from statistical analysis of a set of measurements.
Type B is determined by any other means.
We need a method to propagate uncertainty (see slide 164).
Kristiaan Schreve (SU) Stats Block January 26, 2015 161 / 181
-
Uncertainty of Measurement IV
To find the mean value of the measurand, do you take the mean of the input quantities, or do you first calculate the measurand for each set of measurements and then take the mean of the measurand?
Kristiaan Schreve (SU) Stats Block January 26, 2015 162 / 181
-
Uncertainty of Measurement V
Example (When to calculate the mean)

The table shows voltage and temperature readings for the power dissipated by the resistor in the previous example. If $R_0$ = 4.33 Ω, $\alpha$ = 0.00393 °C⁻¹ and $t_0$ = 20 °C, the mean power dissipated is

21.43545 W if P is calculated for each data point and then the mean of the 10 power values is taken.

21.43568 W if the mean voltage (10.006565 V) and mean temperature (40.0563 °C) are used.

The difference is due to the nonlinear function for P. The GUM guide [1] states that for nonlinear relations, the measurand for each data point must be calculated and then the mean of the set of measurands must be taken.

Voltage [V]  Temperature [°C]
10.030       39.930
9.991        39.962
9.971        39.916
10.023       40.102
10.000       39.949
10.039       40.250
10.073       40.315
9.987        39.921
9.935        40.124
10.017       40.093
Kristiaan Schreve (SU) Stats Block January 26, 2015 163 / 181
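As a sketch of this check in Python. The readings are the three-decimal values printed on the slide, so the slide's higher-precision means are reproduced only approximately; the point is that the two ways of averaging give slightly different answers:

```python
# Mean power: per-point P averaged, vs P evaluated at the mean V and t
V = [10.030, 9.991, 9.971, 10.023, 10.000, 10.039, 10.073, 9.987, 9.935, 10.017]
t = [39.930, 39.962, 39.916, 40.102, 39.949, 40.250, 40.315, 39.921, 40.124, 40.093]
R0, alpha, t0 = 4.33, 0.00393, 20.0

def power(v, temp):
    return v ** 2 / (R0 * (1 + alpha * (temp - t0)))

P_pointwise = sum(power(v, temp) for v, temp in zip(V, t)) / len(V)
P_from_means = power(sum(V) / len(V), sum(t) / len(t))
# The two answers differ slightly because P is nonlinear in V and t
```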
-
Uncertainty of Measurement I
Evaluation of standard uncertainty

From the examples it is clear that there are two types of uncertainty.

One is based on a set of repeated measurements (Type A). In the example, it is the standard uncertainty of the temperature t and voltage V.
Another is based on other information, e.g. data sheets (Type B). In the example, it is the standard uncertainty of the constants $R_0$, $\alpha$ and $t_0$.
Kristiaan Schreve (SU) Stats Block January 26, 2015 164 / 181
Not in textbook
-
Uncertainty of Measurement I
Type A evaluation of standard uncertainty

Type A standard uncertainty is based on repeated measurements.

It is typically estimated with

$s_{\bar{x}} = \dfrac{s}{\sqrt{N}}$

Note, this is the standard error (or standard deviation of the sampling distribution of the mean).

It can also be evaluated by other means, depending on the situation.

It is important to always report the degrees of freedom with the Type A standard uncertainty.
Kristiaan Schreve (SU) Stats Block January 26, 2015 165 / 181
Not in textbook
-
Uncertainty of Measurement I
Type B evaluation of standard uncertainty

Type B standard uncertainty is NOT based on repeated measurements.

Typical sources of information [1]:

previous measurement data
previous experience and good engineering judgement
manufacturers' specifications
data provided in calibration and other certificates
uncertainties assigned to reference data taken from handbooks.

If the source does not give the standard uncertainty explicitly, it may be derived. The GUM Guide [1] gives several examples in section 4.3.
Kristiaan Schreve (SU) Stats Block January 26, 2015 166 / 181
Not in textbook
-
Uncertainty of Measurement
Law of propagation of uncertainty for uncorrelated quantities

When the measurand is not directly measured, as in the example, the standard uncertainty of the measurand depends on the combined Type A and Type B standard uncertainties. It can be shown, if the input quantities are independent, that the combined standard uncertainty is

$s_c^2(y) = \sum_{i=1}^{N}\left(\dfrac{\partial f}{\partial x_i}\right)^2 s^2(x_i)$

This is called the law of propagation of uncertainty.
f is the function $Y = f(X_1, X_2, \ldots, X_N)$ and $x_i$ are the estimates of $X_i$.
Note, the partial derivatives essentially scale the input uncertainties. They are sometimes called sensitivity coefficients.
If the partial derivatives cannot be calculated directly, they may be evaluated numerically, or estimated experimentally (see sections 5.1.3 and 5.1.4 in the GUM Guide [1]).
Kristiaan Schreve (SU) Stats Block January 26, 2015 167 / 181
Not in textbook
-
Uncertainty of Measurement
Law of propagation of uncertainty for correlated quantities

The law of propagation of uncertainty for correlated input quantities is

$s_c^2(y) = \sum_{i=1}^{N}\left(\dfrac{\partial f}{\partial x_i}\right)^2 s^2(x_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\dfrac{\partial f}{\partial x_i}\dfrac{\partial f}{\partial x_j}\, s(x_i, x_j)$

$s(x_i, x_j)$ is the estimated covariance associated with $x_i$ and $x_j$. It is calculated as

$s(x_i, x_j) = \dfrac{1}{N(N-1)}\sum_{k=1}^{N}(x_{i,k} - \bar{x}_i)(x_{j,k} - \bar{x}_j)$

EXCEL: Covariance is calculated with COVARIANCE.P (populations) or COVARIANCE.S (samples).
How do you handle a situation where some quantities are correlated and some not?
Kristiaan Schreve (SU) Stats Block January 26, 2015 168 / 181
Not in textbook
-
Uncertainty of Measurement
Determining expanded uncertainty

In some practical cases the combined uncertainty is insufficient to capture the uncertainty.

The expanded uncertainty is

$U = k s_c(y)$

where k is the coverage factor.

Typically, $2 \leq k \leq 3$.
The result of the measurement is then typically expressed as $Y = y \pm U$.
k can be chosen to cover a certain confidence interval, in which case the confidence level should also be given.
Kristiaan Schreve (SU) Stats Block January 26, 2015 169 / 181
Not in textbook
-
Uncertainty of Measurement I
Reporting uncertainty

In general, give all the information needed to repeat the evaluation.

Rather report too much.

What is reported should be in line with the intended use of the measurement result, e.g. a calibration certificate for a nanometre precision measurement device would require a lot more information than a laser distance sensor you can buy at the local hardware store.

Consider including the following [1]:

clearly describe the methods used to calculate the measurement result and its uncertainty from the experimental observations (Type A standard uncertainty) and input data (Type B standard uncertainty)
list all the uncertainty components and document fully how they were evaluated
present the data analysis in such a way that each of its important steps can be readily followed and the calculation of the reported result can be independently repeated
Kristiaan Schreve (SU) Stats Block January 26, 2015 170 / 181
Not in textbook
-
Uncertainty of Measurement II
Reporting uncertainty

give all the corrections and constants used in the analysis and their sources
in the case of reporting expanded uncertainty, report the coverage factor.

The numerical result of the uncertainty is reported in one of the following four ways. (Assume a mass $m_s$ of an object weighing about 100 g is being reported.) The words below in parentheses may be omitted. [1]

$m_s$ = 100,021 47 g with (a combined standard uncertainty) $s_c$ = 0,35 mg
$m_s$ = 100,021 47(35) g, where the number in parentheses is the numerical value of (the combined standard uncertainty) $s_c$ referred to the corresponding last digits of the quoted result.
$m_s$ = 100,021 47(0,000 35) g, where the number in parentheses is the numerical value of (the combined standard uncertainty) $s_c$ expressed in the unit of the quoted result.
Kristiaan Schreve (SU) Stats Block January 26, 2015 171 / 181
-
Uncertainty of Measurement III
Reporting uncertainty

$m_s$ = (100,021 47 ± 0,000 35) g, where the number following the symbol ± is the numerical value of (the combined standard uncertainty) $s_c$ and not a confidence interval.

Report an expanded uncertainty as

$m_s$ = (100,021 47 ± 0,000 79) g, where the number following the symbol ± is the numerical value of (an expanded uncertainty) $U = k s_c$, with U determined from (a combined standard uncertainty) $s_c$ = 0,35 mg and (a coverage factor) k = 2,26 based on the t-distribution for v = 9 degrees of freedom, and defines an interval estimated to have a level of confidence of 95 percent.
Kristiaan Schreve (SU) Stats Block January 26, 2015 172 / 181
-
Uncertainty of Measurement I
Example

Continue with the example from the beginning of the section. Use

$s_c^2(y) = \sum_{i=1}^{N}\left(\dfrac{\partial f}{\partial x_i}\right)^2 s^2(x_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\dfrac{\partial f}{\partial x_i}\dfrac{\partial f}{\partial x_j}\, s(x_i, x_j)$

to calculate the combined uncertainty for

$P = f(V, R_0, \alpha, t) = \dfrac{V^2}{R_0[1 + \alpha(t - t_0)]}$

Let
Kristiaan Schreve (SU) Stats Block January 26, 2015 173 / 181
-
Uncertainty of Measurement II
Example

$x_1 = V$
$x_2 = R_0$
$x_3 = \alpha$
$x_4 = t$

Ignore the uncertainty contribution of $t_0$. Assume it is a very well known reference value with negligible uncertainty. Then
Kristiaan Schreve (SU) Stats Block January 26, 2015 174 / 181
-
Uncertainty of Measurement III
Example

$\dfrac{\partial f}{\partial V} = \dfrac{2V}{R_0[1 + \alpha(t - t_0)]}$

$\dfrac{\partial f}{\partial R_0} = -\dfrac{V^2}{R_0^2[1 + \alpha(t - t_0)]}$

$\dfrac{\partial f}{\partial \alpha} = -\dfrac{(t - t_0)V^2}{R_0[\alpha(t - t_0) + 1]^2}$

$\dfrac{\partial f}{\partial t} = -\dfrac{\alpha V^2}{R_0[\alpha(t - t_0) + 1]^2}$

Evaluate these at the mean values of V, $R_0$, $\alpha$ and t, i.e. V = 10.007 V, $R_0$ = 4.33 Ω, $\alpha$ = 0.00393 °C⁻¹ and t = 40.056 °C.
This gives
Kristiaan Schreve (SU) Stats Block January 26, 2015 175 / 181
-
Uncertainty of Measurement IV
Example

$\dfrac{\partial f}{\partial V} = 4.284$

$\dfrac{\partial f}{\partial R_0} = -4.950$

$\dfrac{\partial f}{\partial \alpha} = -398.506$

$\dfrac{\partial f}{\partial t} = -0.078$

Assume only V and t are correlated. Hence, from EXCEL, find s(V, t) = 0.00296. Also, from the data we can find
Kristiaan Schreve (SU) Stats Block January 26, 2015 176 / 181
-
Uncertainty of Measurement V
Example

$s^2(V) = 0.00149$
$s^2(t) = 0.02076$

Finally, let's assume that somehow we know that

$s^2(R_0) = 0.001$
$s^2(\alpha) = 0.02$

Now it is straightforward to calculate $s_c^2(P)$.
Kristiaan Schreve (SU) Stats Block January 26, 2015 177 / 181
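Purely as a numerical cross-check, the combined variance can be assembled from the quoted sensitivity coefficients (with signs restored) and (co)variances; only the V-t covariance enters the correlated term, per the slide's assumption:

```python
import math

# Sensitivity coefficients and (co)variances as quoted on the slides
dP_dV, dP_dR0, dP_dalpha, dP_dt = 4.284, -4.950, -398.506, -0.078
s2_V, s2_t, s2_R0, s2_alpha = 0.00149, 0.02076, 0.001, 0.02
cov_Vt = 0.00296

# Law of propagation of uncertainty, correlated form (only V and t correlated)
s2_P = (dP_dV ** 2 * s2_V
        + dP_dR0 ** 2 * s2_R0
        + dP_dalpha ** 2 * s2_alpha
        + dP_dt ** 2 * s2_t
        + 2 * dP_dV * dP_dt * cov_Vt)
s_P = math.sqrt(s2_P)
# With the assumed (very large) s2_alpha, the alpha term dominates s2_P
```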
-
Selecting the Right Method I
Method Typical Use
Confidence interval of μ; σ known
Calculate confidence limits for your estimate of the population mean. You know the population variance.

Confidence interval of μ; σ unknown
Calculate confidence limits for your estimate of the population mean. You do not know the population variance.

One sample hypothesis test: z-test on single mean
You have one sample and some guess of the population mean. You want to know if the guess is right or how it differs. You know the population variance.

One sample hypothesis test: t-test on single mean
You have one sample and some guess of the population mean. You want to know if the guess is right or how it differs. You do not know the population variance.
Kristiaan Schreve (SU) Stats Block January 26, 2015 178 / 181
Not in textbook
-
Selecting the Right Method II
One sample hypothesis test: χ²-test on single variance
You have one sample and some guess of the population variance. You want to know if the guess is right or how it differs.

Two sample hypothesis test: z-test on two means
You have two samples and want to know if they are the same or not. You know the population variances.

Two sample hypothesis test: t-test on two means with equal variances
You have two samples and want to know if they are the same or not. You do not know the population variances, but know that they are equal.

Two sample hypothesis test: t-test on two means with unknown, unequal variances
You have two samples and want to know if they are the same or not. You have no knowledge about the population variances.

Two sample hypothesis test: paired samples
Comparing two samples, but the specimens in the two samples are somehow linked. You do not know the population variance.
Kristiaan Schreve (SU) Stats Block January 26, 2015 179 / 181
-
Selecting the Right Method III
Two sample hypothesis test: F-test
Comparing the variances of two samples.

Single factor ANOVA
Seeing if there is a difference in the means of more than two samples.

Single factor ANOVA: Planned comparison
A priori t-tests on the means of selected samples to find out if there is a significant difference.

Single factor ANOVA: Unplanned comparison
A posteriori tests on sample means. Not covered in this course.

Regression
If you suspect there is a trend between the dependent and independent variables.

Regression: F-test
Test the above mentioned suspicion.

Regression: Testing the slope
See if the slope of the linear regression curve is significant, otherwise the mean is an equally good predictor.
Kristiaan Schreve (SU) Stats Block January 26, 2015 180 / 181
-
Selecting the Right Method IV
Regression: Testing the intercept
See if the intercept plays a significant role; otherwise it could have been zero.

Regression: Coefficient of determination R^2
Indication of goodness of fit. It is not a hypothesis test and should be combined with an F-test for the regression.

Correlation: Pearson's correlation coefficient
Similar to the coefficient of determination, but distinguishes between positive and negative correlation. Indicates whether the data are correlated, but not how. It is not a hypothesis test.

Correlation: Is the correlation coefficient greater than zero?
Hypothesis test to evaluate the correlation coefficient.

Correlation: Do two correlation coefficients differ?
Hypothesis test for whether the correlations in two data sets differ.
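Pearson's correlation coefficient and its associated hypothesis test can be sketched as below (SciPy, invented data; an illustration, not the course's Excel procedure):

```python
# Sketch: Pearson's r and the hypothesis test that the true
# correlation is zero (SciPy; data invented).
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

r, p = stats.pearsonr(x, y)
# r gives the strength and sign of the linear relationship;
# r**2 is the coefficient of determination for a linear fit.
# p tests H0: the population correlation is zero.
```

Excel's CORREL and PEARSON functions return r itself, but not the p-value; the significance test has to be done separately.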
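The test for whether two correlation coefficients differ is commonly done with the Fisher z-transformation. The sketch below implements that standard technique; the function name and data are mine, not from the slides:

```python
# Sketch: do two correlation coefficients differ?
# Standard Fisher z-transformation test (function name invented).
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z-test for H0: rho1 == rho2."""
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-tailed normal p
    return z, p

# Example: r = 0.9 from 50 pairs versus r = 0.3 from 50 pairs.
z, p = compare_correlations(0.9, 50, 0.3, 50)
```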
-
References I
Uncertainty of measurement, Part 3: Guide to the expression of uncertainty in measurement, 1995.

R.S. Figliola and D.E. Beasley. Theory and Design for Mechanical Measurements. Wiley, Hoboken, 4th edition, 2006.

A. Graham. Statistics: A Complete Introduction. Hodder & Stoughton, 2013.

D. Huff and I. Geis. How to Lie with Statistics. Norton, New York, 1954.

W. Mendenhall and T. Sincich. Statistics for Engineering and the Sciences. MacMillan, New York, 3rd edition, 1992.
-
References II
J. Schmuller. Statistical Analysis with Excel for Dummies. Wiley, Hoboken, 2nd edition, 2009.

J. Schmuller. Statistical Analysis with Excel for Dummies. Wiley, Hoboken, 3rd edition, 2013.

R.E. Walpole and R.H. Myers. Probability and Statistics for Engineers and Scientists. MacMillan, New York, 4th edition, 1990.
-
The End