basic statistics for engineers. collection, presentation, interpretation and decision making. prof....

Basic Statistics for Engineers

Collection, presentation, interpretation and decision making.

Prof. Dudley S. Finch

Statistics

Four steps:
– Data collection, including sampling techniques
– Data presentation
– Data analysis
– Conclusions and decisions based on the analysis

Data types

Discrete
– Defined as: a variable consisting of separate values; for example, the number of bolts in a packet. There may be 8 or 9, but there cannot be 8.5.

Continuous
– Defined as: a variable which may have any value; for example, the diameter of steel bars after machining. Any diameter is possible within the allowable tolerance to which the machine is set.

Sampling

It is often not practical to examine every component, therefore sampling techniques are used.

The sample should be representative of the complete set (the population) of values from which it has been chosen.

Although it cannot be guaranteed, we attempt to choose an unbiased sample.

To be unbiased, every possible sample must have an equal chance of being chosen. This is satisfied if the sample is chosen at random; that is, if there is no order in the way the sample is chosen. This is called a random sample.

Random samples

The larger the random sample the more representative of the population it is likely to be.

Random sampling can be carried out by allocating a number to each member of the population and then drawing numbered balls from a bag or using a random number generator.

Sampling techniques involve probability theory (will be dealt with later).
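The numbered-ball procedure described above can be sketched with a random number generator. This is a minimal illustration, not a prescribed method: the function name `draw_random_sample`, the population size of 500 and the sample size of 10 are all assumptions chosen for the example.

```python
import random


def draw_random_sample(population_size, sample_size, seed=None):
    """Return `sample_size` distinct member numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    member_numbers = range(1, population_size + 1)  # one number per member
    # sample() draws without replacement, like taking balls from a bag
    return rng.sample(member_numbers, sample_size)


# Hypothetical population of 500 components, sample of 10
sample = draw_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(sample))
```

Because every member number is equally likely to be drawn, every possible sample of the given size has the same chance of being chosen.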

Data presentation

Measured weights of a casting (lbs):

51.4  55.3  56.1  50.5  55.5
52.8  55.6  55.3  50.2  56.1
52.1  54.8  49.6  57.0  52.0
56.5  55.3  54.0  51.6  52.1
57.3  53.9  53.5  56.1  57.2
54.6  55.4  55.9  56.0  52.9
54.1  55.0  54.2  54.2  54.5
53.0  52.7  54.5  54.7  58.4
56.2  55.8  54.1  56.0  55.1
55.1  54.4  57.2  53.2  55.4
53.9  50.9  54.5  56.9  54.0
56.4  53.1  51.8  52.8  50.5
53.7  52.8  54.0  56.4  55.0
53.8

Frequency distribution

The class interval should be one that emphasizes any pattern in the data. Typically between 8 and 15 class intervals should be chosen.

In the example used, a class interval of 1 lb is chosen. The 50 lb class therefore includes 49.5 to 50.4 lbs. We can therefore compile a frequency distribution table.

Mass of casting (lbs)               50  51  52  53  54  55  56  57  58
Number of castings (frequency), f    2   4   5   8  13  15  12   6   1
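The tally can be reproduced from the measured weights. The sketch below assumes each weight is assigned to the nearest whole pound with halves rounded up (so 49.5 to 50.4 falls in the 50 lb class); the helper name `class_of` is hypothetical.

```python
from collections import Counter

# The 66 measured casting weights (lbs) from the data table
weights_lbs = [
    51.4, 55.3, 56.1, 50.5, 55.5,
    52.8, 55.6, 55.3, 50.2, 56.1,
    52.1, 54.8, 49.6, 57.0, 52.0,
    56.5, 55.3, 54.0, 51.6, 52.1,
    57.3, 53.9, 53.5, 56.1, 57.2,
    54.6, 55.4, 55.9, 56.0, 52.9,
    54.1, 55.0, 54.2, 54.2, 54.5,
    53.0, 52.7, 54.5, 54.7, 58.4,
    56.2, 55.8, 54.1, 56.0, 55.1,
    55.1, 54.4, 57.2, 53.2, 55.4,
    53.9, 50.9, 54.5, 56.9, 54.0,
    56.4, 53.1, 51.8, 52.8, 50.5,
    53.7, 52.8, 54.0, 56.4, 55.0,
    53.8,
]


def class_of(weight_lbs):
    """Assign a weight to its 1 lb class, e.g. 49.5 to 50.4 -> 50."""
    tenths = round(weight_lbs * 10)  # whole tenths avoid float edge cases
    return (tenths + 5) // 10        # round half up to the nearest pound


frequency = Counter(class_of(w) for w in weights_lbs)
for mass in sorted(frequency):
    print(mass, frequency[mass])
```

Python's built-in `round` rounds halves to the nearest even number, which is why the class assignment is done with integer arithmetic instead.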

Bar chart

[Figure: bar chart of frequency (f) against variable x (lbs), for classes from 50 to 58 lbs.]

Histogram

[Figure: histogram of frequency (f) against variable x (lbs), for classes from 50 to 58 lbs.]

Frequency polygon

[Figure: frequency polygon of frequency (f) against variable x (lbs), for classes from 50 to 58 lbs.]

Frequency curve

[Figure: frequency curve of frequency against mass of casting (lbs).]

Pie chart showing relative frequency

Relative frequency = class frequency / total frequency of the sample. For example, the relative frequency of the 53 lb class is 8/66, or 0.121.

[Figure: pie chart of relative frequency by class, labelled as follows.]

Class (lbs)              50  51  52  53  54  55  56  57  58
Relative frequency (%)    3   6   8  12  20  22  18   9   2
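As a quick check of the relative frequency definition, the sketch below divides each class frequency from the castings table by the total frequency. The variable names are illustrative only.

```python
# Class frequencies from the castings table (mass in lbs -> number of castings)
frequency = {50: 2, 51: 4, 52: 5, 53: 8, 54: 13, 55: 15, 56: 12, 57: 6, 58: 1}

total = sum(frequency.values())  # total frequency of the sample

# Relative frequency = class frequency / total frequency
relative_frequency = {mass: f / total for mass, f in frequency.items()}

for mass, rel in relative_frequency.items():
    print(f"{mass} lb: {rel:.3f}")
```

The relative frequencies sum to 1, which is what allows them to be drawn as slices of a single pie.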

Numerical methods of a distribution

A frequency distribution can be represented by two numerical quantities:
– Central tendency or average value of the distribution
– Dispersion or scatter of variables about the average value

Numerical measures of central tendency

Mid point of range:
– The value midway between the largest and smallest values of the variable, (largest + smallest) / 2
– Generally a poor measure of central tendency since it depends only on the extreme values of the variable and is not influenced by the form of the distribution.

Mode:
– The most frequently occurring value of the variable
– Easily obtained from the frequency table. For the casting the mode = 55 lbs.
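Both measures can be read off the castings data directly. In this sketch the mid point of the range is taken as the value halfway between the smallest and largest measured weights (49.6 and 58.4 lbs in the data table), and the mode as the class with the highest frequency; the variable names are assumptions.

```python
# Class frequencies from the castings table (mass in lbs -> number of castings)
frequency = {50: 2, 51: 4, 52: 5, 53: 8, 54: 13, 55: 15, 56: 12, 57: 6, 58: 1}

# Extremes of the 66 measured weights in the data table
smallest_lbs, largest_lbs = 49.6, 58.4

# Mid point of range: halfway between the two extreme values
mid_range = (smallest_lbs + largest_lbs) / 2

# Mode: the class whose frequency is largest
mode = max(frequency, key=frequency.get)

print(f"mid point of range = {mid_range:.1f} lbs")
print(f"mode = {mode} lbs")
```

Changing either extreme value shifts the mid point of the range even if the other 64 weights stay the same, which illustrates why it is a poor measure.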

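Arithmetic mean

– Determined by adding all the values of the variable and dividing this by the total number of values. If $x_1, x_2, x_3, \dots, x_N$ are the N values then

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N}\sum x$$

When the values $x_1, x_2, \dots, x_n$ occur with frequencies $f_1, f_2, \dots, f_n$:

$$\text{mean} = \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$$

where $f_1 + f_2 + \dots + f_n = N$, or

$$\bar{x} = \frac{1}{N}\sum f x$$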
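Applied to the castings frequency table, the frequency-weighted mean can be evaluated as in the sketch below. It uses the class values (50 to 58 lbs) rather than the raw weights, so the result is the mean of the grouped data; the variable names are illustrative.

```python
# Class frequencies from the castings table (mass in lbs -> number of castings)
frequency = {50: 2, 51: 4, 52: 5, 53: 8, 54: 13, 55: 15, 56: 12, 57: 6, 58: 1}

n_total = sum(frequency.values())                    # N = sum of f
sum_fx = sum(f * x for x, f in frequency.items())    # sum of f * x

mean = sum_fx / n_total
print(f"N = {n_total}, sum(fx) = {sum_fx}, mean = {mean:.2f} lbs")
```

Here N = 66 and the sum of fx is 3587, giving a mean of about 54.3 lbs.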

For frequency distribution tables:

Evaluate the deviations:

$$(x_1 - \bar{x}),\ (x_2 - \bar{x}),\ \dots,\ (x_n - \bar{x})$$

Evaluate the squares of the deviations:

$$(x_1 - \bar{x})^2,\ (x_2 - \bar{x})^2,\ \dots,\ (x_n - \bar{x})^2$$

Evaluate the sum $\sum f(x - \bar{x})^2$:

$$\sum f(x - \bar{x})^2 = f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + \dots + f_n(x_n - \bar{x})^2$$

To calculate standard deviation:

Evaluate the average squared deviation:

$$\frac{\sum f(x - \bar{x})^2}{N}$$

Evaluate the standard deviation s:

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}}$$
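The steps above can be followed one at a time for the castings frequency table. This is a sketch using the class values and dividing by N, as in the formula given here; the variable names are assumptions.

```python
import math

# Class frequencies from the castings table (mass in lbs -> number of castings)
frequency = {50: 2, 51: 4, 52: 5, 53: 8, 54: 13, 55: 15, 56: 12, 57: 6, 58: 1}

n_total = sum(frequency.values())
mean = sum(f * x for x, f in frequency.items()) / n_total

# Deviation of each class value from the mean, then its square
squared_deviation = {x: (x - mean) ** 2 for x in frequency}

# Sum of f * (x - mean)^2 over all classes
sum_f_sq_dev = sum(f * squared_deviation[x] for x, f in frequency.items())

# Average squared deviation, then its square root
average_sq_dev = sum_f_sq_dev / n_total
s = math.sqrt(average_sq_dev)

print(f"average squared deviation = {average_sq_dev:.3f}")
print(f"standard deviation s = {s:.2f} lbs")
```

The result is about 1.83 lbs. Dividing by N − 1 instead of N, as some texts do for samples, would give a slightly larger value.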

Estimation

Estimation addresses the difficulty of obtaining data about the population from which the sample was drawn, and of setting up a mathematical model to describe this population.

Statistical inference has two components: estimation, and testing of hypotheses about the chosen model.

Two types of estimates:

Point estimate
– Estimate of a population parameter expressed as a single number
– This method gives no indication as to the accuracy of the estimate

Interval estimate
– Estimate of a population parameter expressed as two numbers
– This method is preferable as it gives an indication as to where the population parameter is expected to lie

Confidence intervals

In practice, the true standard deviation, $\sigma$, is unknown, and the sample standard deviation, s, is used to estimate $\sigma$.

If a random sample of size n is drawn, an estimate of the standard error of the sample mean is given by

$$\frac{s}{\sqrt{n}}$$

We need to determine the confidence interval for the true mean, $\mu$.

For n > 30 the normal distribution gives a good approximation. For small samples a wider interval is used.

Use of Student t-distribution tables

Look up the value of t for (n − 1) degrees of freedom at the tail probability corresponding to the desired confidence level (0.01 = 98%, 0.005 = 99%, 0.001 = 99.8%, etc.).

Find $t_{\alpha/2,\,n-1}$. The true mean is then given by:

$$\mu = \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

For castings example:

Sample mean = 54.3 lbs

Standard deviation, s = 1.83 lbs

n = 66, so $s/\sqrt{n} = 1.83/\sqrt{66} = 0.225$

Using $t_{0.005,\,65} = 2.66$, the true mean is given by:

$$\mu = 54.3 \pm 2.66 \times 0.225 = 54.3 \pm 0.599$$

Thus we can be 99% confident that the true mean lies between 53.7 and 54.9 lbs.
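The arithmetic of the castings example can be checked with a few lines of code. The sketch below takes the t value of 2.66 from the example itself rather than computing it, and the variable names are illustrative.

```python
import math

sample_mean = 54.3  # lbs, from the castings example
s = 1.83            # sample standard deviation, lbs
n = 66              # sample size
t_value = 2.66      # t for 99% confidence and 65 degrees of freedom, as quoted

standard_error = s / math.sqrt(n)        # estimated standard error of the mean
half_width = t_value * standard_error    # half-width of the interval

lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"standard error = {standard_error:.3f}")
print(f"99% confidence interval: {lower:.1f} to {upper:.1f} lbs")
```

A higher confidence level calls for a larger t value from the tables and therefore a wider interval for the same data.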
