revision workshop 17 january 2013

REVISION WORKSHOP

NUBE 17TH JANUARY 2013

Organising and graphing quantitative data in a frequency distribution table

• A frequency table consists of a number of classes; each observation is counted and recorded in the frequency of the class into which it falls.

• If n observations need to be classified into a frequency table, determine:

– Number of classes: $c = 1 + 3{,}3 \log n$

– Class width: $\dfrac{x_{\max} - x_{\min}}{c}$

Example: The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

8 11 12 20 18 10 14 18 16 9

5 7 11 12 15 14 16 9 17 11

6 18 9 15 13 12 11 6 10 8

11 13 22 11 11 14 11 10 9

19 14 17 9 3 3 16 8 2

Number of classes: $c = 1 + 3{,}3 \log n = 1 + 3{,}3 \log 48 = 6{,}5 \approx 7$

Class width: $\dfrac{x_{\max} - x_{\min}}{c} = \dfrac{22 - 2}{7} = 2{,}86 \approx 3$

Frequency distribution


– First class: $[x_{\min} ; x_{\min} + \text{class width}) = [2 ; 2 + 3) = [2 ; 5)$

– Second class: $[5 ; 5 + \text{class width}) = [5 ; 5 + 3) = [5 ; 8)$



"[" means the value is included in the class.

")" means the value is excluded from the class.

Classes | Tally | Count

[2;5) | ||| | 3

[5;8) | |||| | 4

[8;11) | ||||| ||||| | | 11

[11;14) | ||||| ||||| ||| | 13

[14;17) | ||||| |||| | 9

[17;20) | ||||| | | 6

[20;23) | || | 2

Classes Frequency (f)

[2;5) 3

[5;8) 4

[8;11) 11

[11;14) 13

[14;17) 9

[17;20) 6

[20;23) 2

Total 48



Classes f % frequency

[2;5) 3 3/48×100 = 6,3

[5;8) 4 4/48×100 = 8,3

[8;11) 11 11/48×100 = 22,9

[11;14) 13 27,1

[14;17) 9 18,8

[17;20) 6 12,5

[20;23) 2 4,2

Total 48 100



Classes f % f Cumulative frequency (F)

[2;5) 3 6,3 3

[5;8) 4 8,3 3 + 4 = 7

[8;11) 11 22,9 7 + 11 = 18

[11;14) 13 27,1 18 + 13 = 31

[14;17) 9 18,8 31 + 9 = 40

[17;20) 6 12,5 40 + 6 = 46

[20;23) 2 4,2 46 + 2 = 48

Total 48 100



Classes f % f F % F

[2;5) 3 6,3 3 3/48×100 = 6,3

[5;8) 4 8,3 7 7/48×100 = 14,6

[8;11) 11 22,9 18 18/48×100 = 37,5

[11;14) 13 27,1 31 64,6

[14;17) 9 18,8 40 83,3

[17;20) 6 12,5 46 95,8

[20;23) 2 4,2 48 100

Total 48 100



Classes f F Class mid-points (x)

[2;5) 3 3 (2 + 5)/2 = 3,5

[5;8) 4 7 (5 + 8)/2 = 6,5

[8;11) 11 18 (8 + 11)/2 = 9,5

[11;14) 13 31 (11 + 14)/2 = 12,5

[14;17) 9 40 15,5

[17;20) 6 46 18,5

[20;23) 2 48 21,5

Total 48



Classes f % f F % F (x)

[2;5) 3 6,3 3 6,3 3,5

[5;8) 4 8,3 7 14,6 6,5

[8;11) 11 22,9 18 37,5 9,5

[11;14) 13 27,1 31 64,6 12,5

[14;17) 9 18,8 40 83,3 15,5

[17;20) 6 12,5 46 95,8 18,5

[20;23) 2 4,2 48 100 21,5

Total 48 100
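The completed table above can also be reproduced programmatically. The Python sketch below is not part of the workshop material. It assumes base-10 logarithms, that the number of classes and the class width are both rounded up (as in the worked example), and that the first class starts at the smallest observation. The names `calls`, `freq`, `cum` and `midpoints` are illustrative only.

```python
import math

# Hourly call counts from the example (48 observations).
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]

n = len(calls)
# Assumption: round up, as the worked example does (6,5 -> 7 and 2,86 -> 3).
num_classes = math.ceil(1 + 3.3 * math.log10(n))
width = math.ceil((max(calls) - min(calls)) / num_classes)
start = min(calls)

# Count each observation in its class [lower; upper): upper boundary excluded.
freq = [0] * num_classes
for x in calls:
    freq[(x - start) // width] += 1

# Cumulative frequencies F.
cum = []
running = 0
for f in freq:
    running += f
    cum.append(running)

# Class mid-points x = (lower + upper) / 2.
midpoints = [start + i * width + width / 2 for i in range(num_classes)]

for i, f in enumerate(freq):
    lower = start + i * width
    print(f"[{lower};{lower + width})  f={f}  %f={100 * f / n:.1f}  "
          f"F={cum[i]}  %F={100 * cum[i] / n:.1f}  x={midpoints[i]}")
```

Integer division by the class width places a boundary value such as 5 in [5;8) rather than [2;5), which matches the "[" and ")" convention used in the slides.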


Histograms

Classes f % f

[2;5) 3 6,3

[5;8) 4 8,3

[8;11) 11 22,9

[11;14) 13 27,1

[14;17) 9 18,8

[17;20) 6 12,5

[20;23) 2 4,2

The classes go on the x-axis and the frequencies (f) on the y-axis.

[Figure: histogram, "Number of telephone calls per hour at a municipal call centre"; x-axis: Number of calls (class boundaries 2, 5, 8, 11, 14, 17, 20, 23); y-axis: Number of hours (0 to 14).]

Definitions

Frequency Polygon

A line graph of a frequency distribution that offers a useful alternative to a histogram. A frequency polygon is useful in conveying the shape of the distribution.

Ogive

A graphic representation of the cumulative frequency distribution. It is used for approximating the number of values less than or equal to a specified value.

Frequency polygons

Class mid-points (x) f % f

3,5 3 6,3

6,5 4 8,3

9,5 11 22,9

12,5 13 27,1

15,5 9 18,8

18,5 6 12,5

21,5 2 4,2

The class mid-points (x) go on the x-axis and the frequencies (f) on the y-axis.

[Figure: frequency polygon, "Number of telephone calls per hour at a municipal call centre"; x-axis: Number of calls (class mid-points 0,5 to 24,5); y-axis: Number of hours (0 to 14). The end points 0,5 and 24,5 are arbitrary mid-points added to close the polygon.]

Ogives

Classes F % F

[2;5) 3 6,3

[5;8) 7 14,6

[8;11) 18 37,5

[11;14) 31 64,6

[14;17) 40 83,3

[17;20) 46 95,8

[20;23) 48 100

The class boundaries go on the x-axis and % F on the y-axis.

[Figure: "Ogive of number of calls received at a call centre per hour"; x-axis: Number of calls (2, 5, 8, 11, 14, 17, 20, 23); y-axis: % cumulative number of hours (0 to 100).]

Readings from the ogive:

• None of the hours had fewer than 2 calls.

• About 50% of the hours had fewer than 12 calls per hour.

• About 80% of the hours had fewer than 17 calls per hour.

• About 20% of the hours had more than 17 calls per hour.

Exam question 2 A garbage removal company would like to start charging by the weight of a customer's bin rather than by the number of bins put out. They select a sample of 25 customers and weigh their garbage bins. The weights in kg are given below:

1. Construct a frequency table to describe the data. Include a frequency and a relative (%) frequency column. (Hint: start the class intervals with the whole number just smaller than the lowest value in the dataset.)

14.5 5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5 7.4

13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1

14.8 15.1 4.9 17.0 10.3

Procedure

1. Calculate the range of the dataset

2. Calculate the no of classes

3. Calculate the class width

4. Construct table showing the intervals calculated in 1 to 3

5. Put in the tally for each interval and then show as frequency

6. Calculate the relative (%) frequency

13 marks

Range

= 31.4 - 4.9 = 26.5

No. of classes

$c = 1 + 3.3 \log n$

n = 25: $c = 1 + 3.3 \log(25) = 5.61 \approx 6$

Class width

$= \dfrac{x_{\max} - x_{\min}}{c} = \dfrac{26.5}{6} = 4.42 \approx 5$

No. of classes = 6, class width = 5

INTERVALS | TALLY | FREQUENCY (f) | RELATIVE FREQUENCY (%f)

4 - < 9 | ||| | 3 | 12

9 - < 14 | ||||| | | 6 | 24

14 - < 19 | ||||| |||| | 9 | 36

19 - < 24 | ||| | 3 | 12

24 - < 29 | ||| | 3 | 12

29 - < 34 | | | 1 | 4

Total | | 25 | 100

Exam question 2

2. Comment on the interval containing the lowest percentage

3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?

4. Comment on the shape of the distribution without drawing a graph. Give reasons

Answers:

2. The interval 29 - < 34 kg has the lowest percentage: only 4% of the bins weighed between 29 and 34 kg.

3. The largest number of bins weighed between 14 and 19 kg. We assume the mode will fall in this interval (highest frequency).

4. Positively skewed, as more of the values are located in the lower intervals.

7 MARKS

Quartiles & Box & Whisker Plots

• Quartiles • Percentiles • Interquartile range

QUARTILES

– Order data in ascending order.

– Divide data set into four quarters.

Min [25%] Q1 [25%] Q2 [25%] Q3 [25%] Max

Example – Determine Q1 for the sample of nine measurements:

2 5 8 −3 5 2 6 5 −4

• Order the measurements:

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

• $Q_1$ is the $(n + 1)\frac{1}{4} = (9 + 1)\frac{1}{4} = 2{,}5$th value.

• Find the difference between the 2nd and 3rd values, 2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5.

• Add this to the smaller of the two values: −3 + 2,5, so Q1 = −0,5.

Determine Q3 for the sample of nine measurements:

2 5 8 −3 5 2 6 5 −4

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

$Q_3$ is the $(n + 1)\frac{3}{4} = (9 + 1)\frac{3}{4} = 7{,}5$th value.

Q3 = 5 + 0,5(6 − 5) = 5,5

INTERQUARTILE RANGE (IQR)

• Difference between the third and first quartiles

• Indicates how far apart the first and third quartiles are

IQR = Q3 – Q1

For the sample 2 5 8 −3 5 2 6 5 −4:

Q3 = 5,5

Q1 = −0,5

Interquartile range = Q3 – Q1 = 5,5 – (−0,5) = 6
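The position rule used for Q1 and Q3 can be written as a short Python sketch. This is an illustration rather than part of the workshop material: the function name `value_at_position` is hypothetical, and returning the smallest or largest observation when the position falls outside the data is an assumption the slides do not cover.

```python
def value_at_position(data, fraction):
    """Value at the (n + 1) * fraction position of the ordered data (1-based)."""
    values = sorted(data)
    position = (len(values) + 1) * fraction
    whole = int(position)        # position of the lower neighbouring value
    decimal = position - whole   # decimal portion of the position
    # Assumption: clamp to the ends if the position falls outside the data.
    if whole < 1:
        return values[0]
    if whole >= len(values):
        return values[-1]
    lower, upper = values[whole - 1], values[whole]
    # Lower value plus the decimal portion of the gap to the next value.
    return lower + decimal * (upper - lower)


sample = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = value_at_position(sample, 0.25)
q3 = value_at_position(sample, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)  # -0.5 5.5 6.0
```

Other software may use a different interpolation rule for quartiles, so results can differ slightly from the (n + 1) position method shown here.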


BOX & WHISKER PLOT

• Provides a graphical summary of data based on 5 summary measures or values

– First quartile, median, third quartile, lower limit, upper limit

• Box and whisker plot detects outliers in a data set

LL = Q1 – 1,5 (IQR)

UL = Q3 + 1,5 (IQR)

BOX-AND-WHISKER PLOT

[Figure: box-and-whisker plot on a scale from 0 to 28, with the box spanning the IQR from Q1 to Q3, the median marked inside it, and 1,5(IQR) marked on either side of the box.]

Me = 12,38

Q3 = 15,67

Q1 = 9,36

IQR = 6,31

LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11

UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14

• Any value smaller than −0,11 will be an outlier.

• Any value larger than 25,14 will be an outlier.

Exam question 3 The Tubeka brothers spent the following amounts in Rand on groceries over the last 8 weeks:-

1. Calculate a five number summary table

2. Construct a box and whisker plot for the data

3. Determine whether there are any outliers. Show calculations

20 MARKS

PROCEDURE

1. Reorder the data set

2. Identify maximum and minimum values in dataset

3. Calculate median

4. Calculate Q1 & Q3

5. Construct plot

6. Calculate upper & lower limits for dataset to determine if outliers present

54 56 89 67 74 57 43 51

Ordered: 43 51 54 56 57 67 74 89

xmin = 43

xmax = 89

median = (56 + 57)/2 = 56.5

Q1 is the (n + 1)(¼) = (8 + 1) × ¼ = 2.25th value, which lies between 51 and 54.

54 – 51 = 3; multiply by the decimal portion of the position, 3 × 0.25 = 0.75, and add the lower value: Q1 = 51 + 0.75 = 51.75

Q3 is the (n + 1)(¾) = (8 + 1) × ¾ = 6.75th value, which lies between 67 and 74.

74 – 67 = 7; multiply by the decimal portion of the position, 7 × 0.75 = 5.25, and add the lower value: Q3 = 67 + 5.25 = 72.25

OUTLIERS

Calculate upper and lower limits:

LL = Q1 – 1.5(IQR)

UL = Q3 + 1.5(IQR)

IQR = 72.25 – 51.75 = 20.5

LL = 51.75 – 1.5(20.5) = 21

UL = 72.25 + 1.5(20.5) = 103

No values are smaller than 21 or greater than 103, therefore no outliers are present.
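The outlier check can be sketched in Python as follows. This is an illustration only: the function name `outlier_limits` is hypothetical, and the quartiles are taken as given from the worked answer rather than recomputed.

```python
def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


groceries = [54, 56, 89, 67, 74, 57, 43, 51]
lower_limit, upper_limit = outlier_limits(51.75, 72.25)

# Any observation outside the limits is flagged as an outlier.
outliers = [x for x in groceries if x < lower_limit or x > upper_limit]
print(lower_limit, upper_limit, outliers)  # 21.0 103.0 []
```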

MEASURES OF LOCATION

• ARITHMETIC MEAN

– Data is given in a frequency table

– Only an approximate value of the mean

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $f_i$ = frequency of the $i$th class interval

$x_i$ = class midpoint of the $i$th class interval

• MEDIAN

– Data is given in a frequency table.

– First cumulative frequency ≥ n/2 will indicate the median class interval.

– Median can also be determined from the ogive.

$$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$$

where $l_i$ = lower boundary of the median interval

$u_i$ = upper boundary of the median interval

$F_{i-1}$ = cumulative frequency of the interval preceding the median interval

$f_i$ = frequency of the median interval

• MODE

– Class interval that has the largest frequency value will contain the mode.

– Mode is the class midpoint of this class.

– Mode must be determined from the histogram.

Example – The following data represent the number of telephone calls received over two days at a municipal call centre. The data were measured per hour.

To calculate the mean for the sample of the 48 hours, determine the class midpoints:

Number of calls | Number of hours (fi) | xi

[2–under 5) | 3 | 3,5

[5–under 8) | 4 | 6,5

[8–under 11) | 11 | 9,5

[11–under 14) | 13 | 12,5

[14–under 17) | 9 | 15,5

[17–under 20) | 6 | 18,5

[20–under 23) | 2 | 21,5

n = 48

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} = 12{,}44$$

The average number of calls per hour is 12,44.
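As a check on the arithmetic, the grouped mean can be computed with a few lines of Python. This sketch is not from the workshop; the names `grouped_mean`, `frequencies` and `midpoints` are illustrative.

```python
def grouped_mean(freqs, mids):
    """Approximate mean of grouped data: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for f, x in zip(freqs, mids))
    return total_fx / sum(freqs)


# Frequencies and class midpoints from the call-centre table.
frequencies = [3, 4, 11, 13, 9, 6, 2]
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]

mean_calls = grouped_mean(frequencies, midpoints)
print(round(mean_calls, 2))  # 12.44
```

The result is only an approximation of the true mean, because every observation in a class is treated as if it were equal to the class midpoint.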

Exam question 3 The number of overtime hours worked by 40 part-time employees of a security company in 1 week is shown in the following frequency distribution:-

1. Estimate the mean number of overtime hours worked

2. What % of employees worked at least 4.2 hours overtime?

8 marks

Hours per week | Frequency (f)

2.1 - < 2.8 12

2.8 - < 3.5 13

3.5 - < 4.2 7

4.2 - < 4.9 5

4.9 - < 5.6 2

5.6 - < 6.3 1

Exam question 3 Procedure

1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2

2. Multiply f by the midpoint x

3. Total the fx and f columns

4. Divide ∑fx by ∑f

Exam question 3

Hours per week | Frequency (f) | Midpoint (x) | fx

2.1 - < 2.8 | 12 | (2.1 + 2.8)/2 = 2.45 | 29.4

2.8 - < 3.5 | 13 | 3.15 | 40.95

3.5 - < 4.2 | 7 | 3.85 | 26.95

4.2 - < 4.9 | 5 | 4.55 | 22.75

4.9 - < 5.6 | 2 | 5.25 | 10.5

5.6 - < 6.3 | 1 | 5.95 | 5.95

Total | 40 | | 136.5

Mean = ∑fx / ∑f = 136.5/40 = 3.41 hrs

Employees who worked at least 4.2 hrs = 5 + 2 + 1 = 8, so 8/40 × 100 = 20%

PERCENTILES

• PERCENTILES

– Order data in ascending order.

– Divide data set into hundred parts.

Min [80%] P80 [20%] Max

Min [50%] P50 = Q2 [50%] Max

Min [10%] P10 [90%] Max

Determine P20 for the sample of nine measurements:

2 5 8 −3 5 2 6 5 −4

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

$P_{20}$ is the $(n + 1)\frac{p}{100} = (9 + 1)\frac{20}{100} = 2$nd value.

P20 = −3

Number of calls | Number of hours (fi) | F

[2–under 5) | 3 | 3

[5–under 8) | 4 | 7

[8–under 11) | 11 | 18

[11–under 14) | 13 | 31

[14–under 17) | 9 | 40

[17–under 20) | 6 | 46

[20–under 23) | 2 | 48

n = 48

P60: np/100 = 48(60)/100 = 28,8

The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).

$$P_p = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p}$$

$$P_{60} = 11 + \frac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$$

For 60% of the hours there were fewer than 13,49 calls per hour; for 40% of the hours there were more than 13,49 calls per hour.
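The grouped-data percentile formula can be sketched in Python as below. This is an illustration, not the workshop's own method: the function name `grouped_percentile` and its parameters are hypothetical, and the sketch assumes equal class widths and non-empty classes.

```python
def grouped_percentile(lower_bounds, width, freqs, p):
    """Percentile of grouped data: l + (u - l) * (np/100 - F_prev) / f."""
    n = sum(freqs)
    target = n * p / 100
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        # The first cumulative frequency >= np/100 marks the percentile class.
        if cumulative + f >= target:
            return lower + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must be between 0 and 100")


# Call-centre table: classes [2;5), [5;8), ..., [20;23).
lower_bounds = [2, 5, 8, 11, 14, 17, 20]
frequencies = [3, 4, 11, 13, 9, 6, 2]

p60 = grouped_percentile(lower_bounds, 3, frequencies, 60)
median = grouped_percentile(lower_bounds, 3, frequencies, 50)
print(round(p60, 2), round(median, 2))  # 13.49 12.38
```

Setting p = 50 gives the median of the grouped data, since the median formula is the same expression with n/2 in place of np/100.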

Exam question 3

1. John, one of the part-time workers, was told he falls on the 70th percentile. Calculate the value and explain what it means.

PROCEDURE

1. Calculate the cumulative frequencies

2. Calculate which class the required percentile falls into by using P =np/100

3. Once you have identified the class use the percentile formula given in the tables book to calculate the value. Take CARE to order the calculation correctly.

4 MARKS

Exam question 3

Hours per week | Frequency (f) | Cumulative frequency (F)

2.1 - < 2.8 | 12 | 12

2.8 - < 3.5 | 13 | 25

3.5 - < 4.2 | 7 | 32

4.2 - < 4.9 | 5 | 37

4.9 - < 5.6 | 2 | 39

5.6 - < 6.3 | 1 | 40

Total | 40

np/100 = 40 × 70/100 = 28

The first cumulative frequency ≥ 28 is 32, so P70 lies in the interval 3.5 - < 4.2.

P70 = 3.5 + [(4.2 - 3.5)(28 - 25)]/7

= 3.5 + 0.3

= 3.8

70% of the workers worked fewer overtime hours than John, that is, fewer than 3.8 hrs. 30% of the workers worked more overtime hours than John, that is, more than 3.8 hrs.

CONFIDENCE INTERVALS

Confidence interval

– An interval is calculated around the sample statistic; the population parameter is expected to be included in this interval

– An upper and lower limit within which the population parameter is expected to lie

– Limits will vary from sample to sample

– Specify the probability that the interval will include the parameter

– Typically used: 90%, 95%, 99%

– Probability denoted by (1 – α)

• (1 – α) is known as the level of confidence

• α is the significance level

Example:

Meaning of a 90% confidence interval: 90% of all possible samples taken from the population will produce an interval that includes the population parameter.

• An interval estimate consists of a range of values with an upper & lower limit

• The population parameter is expected to lie within this interval with a certain level of confidence

• Limits of an interval vary from sample to sample therefore we must also specify the probability that an interval will contain the parameter

• Ideally probability should be as high as possible


SO REMEMBER

•We can choose the probability

•Probability is denoted by (1-α)

•Typical values are 0.9 (90%); 0.95 (95%) and 0.99 (99%)

•The probability is known as the LEVEL OF CONFIDENCE

•α is known as the SIGNIFICANCE LEVEL

•α corresponds to an area under a curve

•Since we take the confidence level into account when we estimate an interval, the interval is called CONFIDENCE INTERVAL

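Confidence interval for Population Mean, n ≥ 30

- population need not be normally distributed

- sampling distribution of the sample mean will be approximately normal

$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if σ is known

$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if σ is not known

Example: 90% confidence interval

$1 - \alpha = 0{,}90$, so $\alpha = 0{,}10$ and $\dfrac{\alpha}{2} = \dfrac{0{,}10}{2} = 0{,}05$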

[Figure: normal curve centred on $\bar{x}$, with the lower and upper confidence limits marked. The central area is the confidence level 1 – α (here 0,90): 90% of all sample means fall in this area. Each tail has area α/2 (here 0,05). The two tail areas added together = α, i.e. 10%.]

• Confidence interval for Population Mean, n < 30

– For a small sample from a normal population where σ is known, the normal distribution can be used.

– If σ is unknown we use s to estimate σ

– We need to replace the normal distribution with the t-distribution

$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$

[Figure: t Distribution; a standard normal curve and a t-distribution curve plotted together.]

• Example

– The manager of a small departmental store is concerned about the decline of his weekly sales.

– He calculated the average and standard deviation of his sales for the past 12 weeks: $\bar{x}$ = R12400 and s = R1346.

– Estimate with 99% confidence the population mean sales of the departmental store.

$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 12400 \pm t_{11;\,0{,}995} \dfrac{1346}{\sqrt{12}} = 12400 \pm 3{,}106 \times \dfrac{1346}{\sqrt{12}} = 12400 \pm 1206{,}86$

[11193,14 ; 13606,86]

We are 99% confident that the mean weekly sales will be between R11 193,14 and R13 606,86.
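The interval for the sales example can be checked with the Python sketch below. It is illustrative only: the function name `mean_confidence_interval` is hypothetical, and the critical value 3,106 is read from the t-table as in the slides, because the Python standard library has no t-distribution.

```python
import math


def mean_confidence_interval(mean, std_dev, n, critical_value):
    """Interval: mean +/- critical_value * std_dev / sqrt(n)."""
    margin = critical_value * std_dev / math.sqrt(n)
    return mean - margin, mean + margin


# Weekly sales: mean R12400, s = R1346, n = 12, t(11; 0,995) = 3,106 from tables.
low, high = mean_confidence_interval(12400, 1346, 12, 3.106)
print(round(low, 2), round(high, 2))
```

The same function covers the large-sample case if a Z critical value (for example 1,96 for 95%) is passed in place of the t value.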

• Confidence interval for Population proportion

– Each element in the population can be classified as a success or failure

– Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$

– Proportion always between 0 and 1

– For large samples the sample proportion $\hat{p}$ is approximately normal

$CI(p)_{1-\alpha} = \hat{p} \pm Z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Exam question 7

1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.

2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.

15 MARKS

Exam question 7 PROCEDURE

1. Determine what measure you are looking at: mean, proportion or standard deviation

2. Select the appropriate formula based on step 1 and the sample size (t for small sample sizes, n < 30; z for larger sample sizes)

3. Put the numbers into the formula and calculate the confidence intervals

Exam question 7

1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.

$\hat{p} = \dfrac{x}{n} = \dfrac{120}{200} = 0.6$

$Z_{1-\frac{\alpha}{2}} = Z_{0.975} = 1.96$

$CI(p)_{1-\alpha} = \hat{p} \pm Z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 0.6 \pm 1.96 \sqrt{\dfrac{(0.6)(0.4)}{200}} = 0.6 \pm 0.07$

0.53 < p < 0.67

At a confidence level of 95%, between 53% and 67% of residents believe the tax rate is too high.
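The proportion interval can be verified with a short Python sketch. This is an illustration, not part of the workshop material: the function name `proportion_confidence_interval` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` instead of the printed Z-table.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Interval: p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    # Z value with area 1 - alpha/2 to its left (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 120 of 200 residents believe the property taxes are too high.
low, high = proportion_confidence_interval(120, 200, 0.95)
print(round(low, 2), round(high, 2))  # 0.53 0.67
```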

Exam question 7

The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer

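$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 45 \pm t_{19;\,0.975} \dfrac{14}{\sqrt{20}} = 45 \pm 2.093 \times \dfrac{14}{\sqrt{20}} = 45 \pm 6.55$

38.45 < µ < 51.55

At a confidence level of 95% the population average time to complete a tune up is between 38.45 and 51.55 minutes.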

HYPOTHESIS TESTING

STEPS OF A HYPOTHESIS TEST

Step 1 • State the null and alternative hypotheses

Step 2 • State the value of α

Step 3 • Calculate the value of the test statistic

Step 4 • Determine the critical value

Step 5 • Make a decision using decision rule or graph

Step 6 • Draw a conclusion


• Hypothesis test for Population Mean, n < 30

– If σ is unknown we use s to estimate σ

– We need to replace the normal distribution with the t-distribution with (n - 1) degrees of freedom

Testing H0: μ = μ0 for n < 30

Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Alternative hypothesis | Decision rule: Reject H0 if

H1: μ ≠ μ0 | $|t| \ge t_{n-1;\,1-\alpha/2}$

H1: μ > μ0 | $t \ge t_{n-1;\,1-\alpha}$

H1: μ < μ0 | $t \le -t_{n-1;\,1-\alpha}$

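• Hypothesis testing for Population proportion

– Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$

– Proportion always between 0 and 1

Testing H0: p = p0 for n ≥ 30

Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

Alternative hypothesis | Decision rule: Reject H0 if

H1: p ≠ p0 | $|z| \ge Z_{1-\alpha/2}$

H1: p > p0 | $z \ge Z_{1-\alpha}$

H1: p < p0 | $z \le -Z_{1-\alpha}$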

Exam question 8

1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.

2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use a 0.05 significance level.

16 MARKS

Exam question 8 Procedure

1. State H0 and Ha

2. Determine the critical value from the appropriate test table using α, and n

3. Compute test statistic (t or z value??)

4. Draw conclusion

Exam question 8

1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.

State hypothesis

H0: µ = 42.5

Ha: µ > 42.5

Determine critical value

$t_{n-1;\,1-\alpha} = t_{23;\,0.9} = 1.319$

Reject H0 if the test statistic is > 1.319

Calculate test statistic

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{40 - 42.5}{2/\sqrt{24}} = -6.12$

Since -6.12 is not greater than 1.319, do not reject H0. At the 10% level of significance the sample does not support the claim that cars remain in the short term car park longer than 42.5 minutes on average.
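The t-test calculation for the car park question can be sketched in Python. This is an illustration only: the function name `t_statistic` is hypothetical, and the critical value 1.319 is taken from the t-table as in the worked answer.

```python
import math


def t_statistic(sample_mean, mu0, s, n):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


t = t_statistic(40, 42.5, 2, 24)
critical_value = 1.319           # t(23; 0.9) from the t-table
reject_h0 = t >= critical_value  # right-tailed test, Ha: mu > 42.5
print(round(t, 2), reject_h0)    # -6.12 False
```

Because the sample mean (40) lies below the hypothesised value (42.5), the statistic is negative and can never exceed a positive right-tail critical value.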

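Exam question 8

2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use a 0.05 significance level.

State hypothesis

H0: p = 0.55

Ha: p > 0.55

Determine critical value

α = 0.05, $Z_{1-\alpha} = Z_{0.95} = 1.64$

Reject H0 if the test statistic is > 1.64

Calculate test statistic

$\hat{p} = \dfrac{x}{n} = \dfrac{42}{70} = 0.6$

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.6 - 0.55}{\sqrt{\dfrac{(0.55)(0.45)}{70}}} = 0.84$

Since 0.84 is not greater than 1.64, do not reject H0. At the 0.05 significance level there is not enough evidence that more than 55% of commuters would use the route, so the route does not meet the Gautrain criteria.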
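The proportion test can be checked with the Python sketch below. It is illustrative rather than the workshop's method: the function name `z_statistic` is hypothetical, and the critical value is computed with `statistics.NormalDist` from the standard library, which gives about 1.645 where the table value used in the answer is 1.64.

```python
import math
from statistics import NormalDist


def z_statistic(successes, n, p0):
    """Test statistic z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


z = z_statistic(42, 70, 0.55)
critical_value = NormalDist().inv_cdf(1 - 0.05)  # right-tailed test at alpha = 0.05
reject_h0 = z >= critical_value
print(round(z, 2), reject_h0)  # 0.84 False
```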

CORRELATION COEFFICIENT


Coefficient of correlation

• The coefficient of correlation is used to measure the strength of association between two variables.

• The coefficient values range between -1 and 1.

– If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line.

– If r = 0 there is no linear pattern.

• The coefficient can be used to test for linear relationship between two variables.

[Figure: six scatter plots of Y against X, illustrating: Perfect positive (r = +1); High positive (r = +0,9); Low positive (r = +0,3); Perfect negative (r = -1); High negative (r = -0,8); No correlation (r = 0).]

Exam question 10 The cost of repairing cars that were involved in accidents is one reason that insurance premiums are so high. In an experiment 5 cars were driven into a wall. The speeds were varied between 20km/hr and 80km/hr (X). The costs of repair (Y) were estimated and are listed below:

1. Use your calculator to calculate the coefficient of correlation. Interpret your answer

2. Calculate and interpret the coefficient of determination for this data

3. Use your calculator to construct the regression line equation and predict the repair cost at 50km/h

10 MARKS

SPEED (Km/h) (X) COST OF REPAIR (R’000) (Y)

20 3

30 5

40 8

60 24

80 34

Exam question 10

1. Put data into calculator

2. Select regression function and select r

3. Calculate coefficient of determination = r² × 100%

4. Interpret results

5. Using Y = A + BX select regression function on calculator and determine values for A & B

6. Put x = 50 into formula and calculate result

Exam question 10

1. r = 0.98

There is a very strong positive linear relationship between the repair cost and speed.

2. r² × 100% = 0.98² × 100 = 96%

96% of the variation in the cost of repair is explained by the variation in the speed at which the car crashed

3. Y = -10.7 + 0.55X

At X = 50: Y = -10.7 + 0.55(50) = 16.8, i.e. a predicted repair cost of about R16 800
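For readers without the calculator's regression mode, the same quantities can be computed with the Python sketch below. It is an illustration only, using the usual least-squares formulas; the variable names are hypothetical. Because it works with unrounded values, it gives r² of about 97% and a prediction of about 17.0 (R'000), slightly different from the 96% and 16.8 obtained above from the rounded figures r = 0.98, A = -10.7 and B = 0.55.

```python
import math

speed = [20, 30, 40, 60, 80]  # X, km/h
cost = [3, 5, 8, 24, 34]      # Y, R'000

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(cost) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, cost))
sxx = sum((x - mean_x) ** 2 for x in speed)
syy = sum((y - mean_y) ** 2 for y in cost)

r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
r_squared = r ** 2              # coefficient of determination
b = sxy / sxx                   # slope B
a = mean_y - b * mean_x         # intercept A
predicted = a + b * 50          # predicted repair cost at 50 km/h

print(round(r, 2), round(a, 1), round(b, 2))  # 0.98 -10.7 0.55
```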
