revision workshop 17 january 2013
Post on 05-Dec-2014
REVISION WORKSHOP
NUBE 17TH JANUARY 2013
Organising and graphing quantitative data in a frequency distribution table
• A frequency table consists of a number of classes; each observation is counted and recorded in the frequency of its class.
• If n observations need to be classified into a frequency table, determine:
– Number of classes: $c = 1 + 3,3 \log n$
– Class width: $\dfrac{x_{\max} - x_{\min}}{c}$
Example: The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9
19 14 17 9 3 3 16 8 2
Number of classes: $c = 1 + 3,3 \log n = 1 + 3,3 \log 48 = 6,5 \approx 7$
Class width: $\dfrac{x_{\max} - x_{\min}}{c} = \dfrac{22 - 2}{7} = 2,86 \approx 3$
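The two calculations above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the logarithm is taken as base 10, and rounding up to the next whole number is an assumption inferred from the worked values (6,5 becomes 7 and 2,86 becomes 3).

```python
import math


def number_of_classes(n: int) -> int:
    """c = 1 + 3.3 * log10(n), rounded up (rounding rule is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(x_min: float, x_max: float, c: int) -> int:
    """(x_max - x_min) / c, rounded up to a whole number (assumption)."""
    return math.ceil((x_max - x_min) / c)


# Call-centre example: n = 48 observations, minimum 2, maximum 22
c = number_of_classes(48)
width = class_width(2, 22, c)
print(c, width)
```

Rounding up rather than to the nearest integer guarantees that c classes of the chosen width cover the whole range of the data.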
Frequency distribution
– First class: $[x_{\min} ; x_{\min} + \text{class width}) = [2 ; 2 + 3) = [2 ; 5)$
– Second class: $[5 ; 5 + \text{class width}) = [5 ; 5 + 3) = [5 ; 8)$
“[“ value is included in class
“)“ value is excluded from class
Tally each observation into its class:
Classes Tally Count
[2;5) ||| 3
[5;8) |||| 4
[8;11) ||||||||||| 11
[11;14) ||||||||||||| 13
[14;17) ||||||||| 9
[17;20) |||||| 6
[20;23) || 2
Classes Frequency (f)
[2;5) 3
[5;8) 4
[8;11) 11
[11;14) 13
[14;17) 9
[17;20) 6
[20;23) 2
Total 48
Classes f % frequency
[2;5) 3 3/48×100 = 6,3
[5;8) 4 4/48×100 = 8,3
[8;11) 11 11/48×100 = 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2
Total 48 100
Classes f % f Cumulative frequency (F)
[2;5) 3 6,3 3
[5;8) 4 8,3 3 + 4 = 7
[8;11) 11 22,9 7 + 11 = 18
[11;14) 13 27,1 18 + 13 = 31
[14;17) 9 18,8 31 + 9 = 40
[17;20) 6 12,5 40 + 6 = 46
[20;23) 2 4,2 46 + 2 = 48
Total 48 100
Classes f % f F % F
[2;5) 3 6,3 3 3/48×100 = 6,3
[5;8) 4 8,3 7 7/48×100 = 14,6
[8;11) 11 22,9 18 18/48×100 = 37,5
[11;14) 13 27,1 31 64,6
[14;17) 9 18,8 40 83,3
[17;20) 6 12,5 46 95,8
[20;23) 2 4,2 48 100
Total 48 100
Classes f F Class mid-points (x)
[2;5) 3 3 (2 + 5)/2 = 3,5
[5;8) 4 7 (5 + 8)/2 = 6,5
[8;11) 11 18 (8 + 11)/2 = 9,5
[11;14) 13 31 (11 + 14)/2 = 12,5
[14;17) 9 40 15,5
[17;20) 6 46 18,5
[20;23) 2 48 21,5
Total 48
Classes f % f F % F (x)
[2;5) 3 6,3 3 6,3 3,5
[5;8) 4 8,3 7 14,6 6,5
[8;11) 11 22,9 18 37,5 9,5
[11;14) 13 27,1 31 64,6 12,5
[14;17) 9 18,8 40 83,3 15,5
[17;20) 6 12,5 46 95,8 18,5
[20;23) 2 4,2 48 100 21,5
Total 48 100
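The full table above can be rebuilt from the raw data. The following is a rough sketch in Python, assuming half-open classes [lower; upper) as defined earlier; the function name and dictionary keys are hypothetical, and percentages are left unrounded so that display rounding stays with the reader.

```python
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]


def frequency_table(data, start, width, n_classes):
    """Return one row per class [lower; upper) with f, %f, F, %F and midpoint."""
    n = len(data)
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        # "[" includes the lower boundary, ")" excludes the upper boundary
        f = sum(1 for x in data if lower <= x < upper)
        cumulative += f
        rows.append({
            "class": (lower, upper),
            "f": f,
            "pct_f": 100 * f / n,
            "F": cumulative,
            "pct_F": 100 * cumulative / n,
            "midpoint": (lower + upper) / 2,
        })
    return rows


table = frequency_table(calls, start=2, width=3, n_classes=7)
for row in table:
    print(row)
```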
Classes f % f
[2;5) 3 6,3
[5;8) 4 8,3
[8;11) 11 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2
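Histograms
[Figure: histogram "Number of telephone calls per hour at a municipal call centre". x-axis: number of calls, with class boundaries 2, 5, 8, 11, 14, 17, 20, 23; y-axis: number of hours, scaled 0 to 14. The classes are plotted on the x-axis and the frequencies on the y-axis.]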
Definitions
Frequency Polygon
A line graph of a frequency distribution that offers a useful alternative to a histogram. A frequency polygon is useful in conveying the shape of the distribution.
Ogive
A graphic representation of the cumulative frequency distribution, used for approximating the number of values less than or equal to a specified value.
Class mid-points (x) f % f
3,5 3 6,3
6,5 4 8,3
9,5 11 22,9
12,5 13 27,1
15,5 9 18,8
18,5 6 12,5
21,5 2 4,2
Frequency polygons
[Figure: frequency polygon "Number of telephone calls per hour at a municipal call centre". x-axis: number of calls, marked at the class mid-points 0,5; 3,5; 6,5; 9,5; 12,5; 15,5; 18,5; 21,5; 24,5. y-axis: number of hours, scaled 0 to 14. The class mid-points are plotted on the x-axis and the frequencies on the y-axis; 0,5 and 24,5 are arbitrary mid-points added to close the polygon.]
Classes F % F
[2;5) 3 6,3
[5;8) 7 14,6
[8;11) 18 37,5
[11;14) 31 64,6
[14;17) 40 83,3
[17;20) 46 95,8
[20;23) 48 100
Ogives
[Figure: ogive "Number of calls received at a call centre per hour". x-axis: number of calls, with class boundaries 2, 5, 8, 11, 14, 17, 20, 23; y-axis: % cumulative number of hours, scaled 0 to 100.]
Reading from the ogive:
• None of the hours had fewer than 2 calls.
• 50% of the hours had fewer than 12 calls per hour.
• 80% of the hours had fewer than 17 calls per hour.
• 20% of the hours had more than 17 calls per hour.
Exam question 2
A garbage removal company would like to start charging by the weight of a customer's bin rather than by the number of bins put out. They select a sample of 25 customers and weigh their garbage bins. The weights in kg are given below:
14.5 5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5 7.4
13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1
14.8 15.1 4.9 17.0 10.3
1. Construct a frequency table to describe the data. Include a frequency and relative (%) frequency column. (Hint: start the class intervals with the whole number just smaller than the lowest value in the dataset.)
Procedure
1. Calculate the range of the dataset
2. Calculate the no of classes
3. Calculate the class width
4. Construct table showing the intervals calculated in 1 to 3
5. Put in the tally for each interval and then show as frequency
6. Calculate the relative (%) frequency
13 marks
Range
31.4 − 4.9 = 26.5
No. of classes
K or c = 1 + 3.3 log n
n = 25: K or c = 1 + 3.3 log(25) = 5.61 ≈ 6
Class width
= (x_max − x_min)/c = 26.5/6 = 4.42 ≈ 5
No. of classes = 6, class width = 5
INTERVALS TALLY FREQUENCY (f) RELATIVE FREQUENCY (%f)
4 - < 9 ||| 3 12
9 - < 14 |||||| 6 24
14 - < 19 ||||||||| 9 36
19 - < 24 ||| 3 12
24 - < 29 ||| 3 12
29 - < 34 | 1 4
Total 25 100
Exam question 2 (continued)
2. Comment on the interval containing the lowest percentage.
Answer: 4% of bins weighed between 29 and 34 kg.
3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?
Answer: The largest number of bins weighed between 14 and 19 kg. We assume the mode will fall in this interval (highest frequency).
4. Comment on the shape of the distribution without drawing a graph. Give reasons.
Answer: Positively skewed, as more values are located in the lower intervals.
7 MARKS
Quartiles & Box & Whisker Plots
• Quartiles • Percentiles • Interquartile range
QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.
Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q1 for the sample of nine measurements:
• Order the measurements
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
• $Q_1$ is the $\frac{1}{4}(n + 1) = \frac{1}{4}(9 + 1) = 2,5$th value
• Find the difference between the 2nd and 3rd values, 2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5
• Add to the smaller value: −3 + 2,5, so Q1 = −0,5
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q3 for the sample of nine measurements:
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
• $Q_3$ is the $\frac{3}{4}(n + 1) = \frac{3}{4}(9 + 1) = 7,5$th value
• Q3 = 5 + 0,5(6 − 5) = 5,5
INTERQUARTILE RANGE (IQR)
• Difference between the third and first quartiles
• Indicates how far apart the first and third quartiles are
IQR = Q3 – Q1
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Q3 = 5,5
Q1 = −0,5
Interquartile range = Q3 – Q1 = 5,5 – (−0,5) = 6
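The position-and-interpolate method used in the quartile examples can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, and the sketch assumes the computed position lies between 1 and n, which holds for the quartiles of the small data sets used here.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    whole = int(position)            # e.g. 2 for position 2.5
    fraction = position - whole      # e.g. 0.5 for position 2.5
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_data[whole]
    # lower value + decimal portion of the position * difference
    return lower + fraction * (upper - lower)


def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) as the k(n + 1)/4-th ordered value."""
    ordered = sorted(data)
    return value_at_position(ordered, k * (len(ordered) + 1) / 4)


sample = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = quartile(sample, 1)
q3 = quartile(sample, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```

Other quartile conventions exist (statistical software often uses different interpolation rules), so results from a calculator or package may differ slightly from this method.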
BOX & WHISKER PLOT
• Provides a graphical summary of data based on 5 summary measures or values
– First quartile, median, third quartile ,lower limit, upper limit
• Box and whisker plot detects outliers in a data set
LL = Q1 – 1,5 (IQR)
UL = Q3 + 1,5 (IQR)
BOX-AND-WHISKER PLOT
[Figure: box-and-whisker plot on a scale from 0 to 28, with the box spanning the IQR and limits 1,5(IQR) below Q1 and above Q3]
Q1 = 9,36
Me = 12,38
Q3 = 15,67
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
Exam question 3
The Tubeka brothers spent the following amounts in Rand on groceries over the last 8 weeks:
54 56 89 67 74 57 43 51
1. Calculate a five number summary table
2. Construct a box and whisker plot for the data
3. Determine whether there are any outliers. Show calculations
20 MARKS
PROCEDURE
1. Reorder the data set
2. Identify maximum and minimum values in dataset
3. Calculate median
4. Calculate Q1 & Q3
5. Construct plot
6. Calculate upper & lower limits for dataset to determine if outliers present
Reordered data:
43 51 54 56 57 67 74 89
xmin = 43
xmax = 89
median = (56 + 57)/2 = 56.5
Q1 is the (n + 1)(¼) = (8 + 1) × ¼ = 2.25th value, between 51 and 54
54 − 51 = 3; multiply by the decimal portion of the position, 3 × 0.25 = 0.75, and add the lower value
Q1 = 51 + 0.75 = 51.75
Q3 is the (n + 1)(¾) = (8 + 1) × ¾ = 6.75th value, between 67 and 74
74 − 67 = 7; multiply by the decimal portion of the position, 7 × 0.75 = 5.25, and add the lower value
Q3 = 67 + 5.25 = 72.25
OUTLIERS
1. Calculate upper & lower limits
LL = Q1 – 1.5(IQR)
UL = Q3 + 1.5(IQR)
IQR = 72.25 – 51.75 = 20.5
LL = 51.75 – 1.5(20.5) = 21
UL = 72.25 + 1.5(20.5) = 103
No values are smaller than 21 or greater than 103, therefore no outliers are present.
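The five-number summary and outlier check for the grocery data can be reproduced with a short Python sketch. The function names are hypothetical, and the quartiles use the same k(n + 1)/4 position rule as the worked answer; the sketch assumes that position falls between 1 and n.

```python
def quartile(data, k):
    """k-th quartile as the k(n + 1)/4-th ordered value, with interpolation."""
    ordered = sorted(data)
    position = k * (len(ordered) + 1) / 4
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    return lower + fraction * (ordered[whole] - lower)


def five_number_summary(data):
    return {
        "min": min(data),
        "Q1": quartile(data, 1),
        "median": quartile(data, 2),
        "Q3": quartile(data, 3),
        "max": max(data),
    }


def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


groceries = [54, 56, 89, 67, 74, 57, 43, 51]
summary = five_number_summary(groceries)
lower_limit, upper_limit = outlier_limits(summary["Q1"], summary["Q3"])
outliers = [x for x in groceries if x < lower_limit or x > upper_limit]
print(summary, lower_limit, upper_limit, outliers)
```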
MEASURES OF LOCATION
• ARITHMETIC MEAN
– Data is given in a frequency table
– Only an approximate value of the mean
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
where
$f_i$ = frequency of the i-th class interval
$x_i$ = class midpoint of the i-th class interval
• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the median class interval.
– Median can also be determined from the ogive.
$M_e = l_i + \dfrac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$
where
$l_i$ = lower boundary of the median interval
$u_i$ = upper boundary of the median interval
$F_{i-1}$ = cumulative frequency of the interval preceding the median interval
$f_i$ = frequency of the median interval
• MODE
– Class interval that has the largest frequency value will contain the mode.
– Mode is the class midpoint of this class.
– Mode must be determined from the histogram.
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints:
Number of calls  Number of hours (fi)  Class midpoint (xi)
[2–under 5) 3 3,5
[5–under 8) 4 6,5
[8–under 11) 11 9,5
[11–under 14) 13 12,5
[14–under 17) 9 15,5
[17–under 20) 6 18,5
[20–under 23) 2 21,5
n = 48
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{597}{48} = 12,44$
The average number of calls per hour is 12,44.
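The grouped-mean calculation above can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name; classes are given as (lower, upper) boundaries, and the result is an approximation because every observation is represented by its class midpoint.

```python
def grouped_mean(classes, frequencies):
    """Approximate mean from a frequency table: sum(f * x) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


call_classes = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)]
call_frequencies = [3, 4, 11, 13, 9, 6, 2]

mean_calls = grouped_mean(call_classes, call_frequencies)
print(round(mean_calls, 2))
```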
Exam question 3
The number of overtime hours worked by 40 part-time employees of a security company in 1 week is shown in the following frequency distribution:
Hours per week  Frequency (f)
2.1 - < 2.8 12
2.8 - < 3.5 13
3.5 - < 4.2 7
4.2 - < 4.9 5
4.9 - < 5.6 2
5.6 - < 6.3 1
1. Estimate the mean number of overtime hours worked
2. What % of employees worked at least 4.2 hours overtime?
8 marks
Exam question 3 Procedure
1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2
2. Multiply f by the midpoint x
3. Total the fx and f columns
4. Divide ∑fx by ∑f
Exam question 3
Hours per week  Frequency (f)  Midpoint (x)  fx
2.1 - < 2.8 12 (2.1 + 2.8)/2 = 2.45 29.4
2.8 - < 3.5 13 3.15 40.95
3.5 - < 4.2 7 3.85 26.95
4.2 - < 4.9 5 4.55 22.75
4.9 - < 5.6 2 5.25 10.5
5.6 - < 6.3 1 5.95 5.95
Total 40 136.5
Mean = 136.5/40 = 3.41 hrs
Employees who worked at least 4.2 hrs = 5 + 2 + 1 = 8; 8/40 × 100 = 20%
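PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.
P10: 10% of the values lie between Min and P10, and 90% between P10 and Max
P50 = Q2: 50% of the values lie between Min and P50, and 50% between P50 and Max
P80: 80% of the values lie between Min and P80, and 20% between P80 and Max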
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
• $P_{20}$ is the $\frac{p}{100}(n + 1) = \frac{20}{100}(9 + 1) = 2$nd value
• P20 = −3
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls  Number of hours (fi)  F
[2–under 5) 3 3
[5–under 8) 4 7
[8–under 11) 11 18
[11–under 14) 13 31
[14–under 17) 9 40
[17–under 20) 6 46
[20–under 23) 2 48
n = 48
Position of P60 = np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).
$P_p = l_p + \dfrac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p}$
$P_{60} = 11 + \dfrac{(14 - 11)(28,8 - 18)}{13} = 13,49$
60% of the time fewer than 13,49 calls per hour were received, or 40% of the time more than 13,49 calls per hour.
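The grouped-data percentile formula can be sketched in Python as below. The function name is hypothetical and the code is only an illustration of the steps in the example: locate the first class whose cumulative frequency reaches np/100, then interpolate inside that class. With p = 50 the same function gives the grouped median.

```python
def grouped_percentile(classes, frequencies, p):
    """P_p = l + (u - l) * (n*p/100 - F_prev) / f for the percentile class."""
    n = sum(frequencies)
    target = n * p / 100
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must be between 0 and 100")


call_classes = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)]
call_frequencies = [3, 4, 11, 13, 9, 6, 2]

p60 = grouped_percentile(call_classes, call_frequencies, 60)
median = grouped_percentile(call_classes, call_frequencies, 50)
print(round(p60, 2), round(median, 2))
```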
Exam question 3
1. John, one of the part-time workers, was told he falls on the 70th percentile. Calculate the value and explain what it means.
PROCEDURE
1. Calculate the cumulative frequencies
2. Calculate which class the required percentile falls into by using P = np/100
3. Once you have identified the class, use the percentile formula given in the tables book to calculate the value. Take CARE to order the calculation correctly.
4 MARKS
Exam question 3
Hours per week  Frequency (f)  Cumulative F
2.1 - < 2.8 12 12
2.8 - < 3.5 13 25
3.5 - < 4.2 7 32
4.2 - < 4.9 5 37
4.9 - < 5.6 2 39
5.6 - < 6.3 1 40
Total 40
P = np/100 = 40 × 70/100 = 28
The first cumulative frequency ≥ 28 is 32, so P70 lies in the class 3.5 - < 4.2.
P70 = 3.5 + [(4.2 − 3.5)(28 − 25)]/7
= 3.5 + 0.3
= 3.8
70% of the workers worked fewer overtime hours than John, i.e. fewer than 3.8 hrs. 30% of the workers worked more overtime hours than John, i.e. more than 3.8 hrs.
CONFIDENCE INTERVALS
Confidence interval
– An interval is calculated around the sample statistic
[Figure: confidence interval drawn around the sample statistic, with the population parameter included in the interval]
– An upper and lower limit within which the population parameter is expected to lie
– Limits will vary from sample to sample
– Specify the probability that the interval will include the parameter
– Typically used: 90%, 95%, 99%
– Probability denoted by (1 – α)
• (1 – α) is known as the level of confidence
• α is the significance level
Example:
Meaning of a 90% confidence interval: 90% of all possible samples taken from the population will produce an interval that includes the population parameter.
• An interval estimate consists of a range of values with an upper & lower limit
• The population parameter is expected to lie within this interval with a certain level of confidence
• Limits of an interval vary from sample to sample therefore we must also specify the probability that an interval will contain the parameter
• Ideally probability should be as high as possible
SO REMEMBER
• We can choose the probability
• Probability is denoted by (1 − α)
• Typical values are 0.9 (90%); 0.95 (95%) and 0.99 (99%)
• The probability is known as the LEVEL OF CONFIDENCE
• α is known as the SIGNIFICANCE LEVEL
• α corresponds to an area under a curve
• Since we take the confidence level into account when we estimate an interval, the interval is called a CONFIDENCE INTERVAL
Confidence interval for Population Mean, n ≥ 30
– The population need not be normally distributed
– The sampling distribution of the sample mean will be approximately normal
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if σ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if σ is not known
Example: 90% confidence interval
1 – α = 0,90, so α = 0,10 and α/2 = 0,10/2 = 0,05
[Figure: normal curve centred at x̄ with the lower and upper confidence limits marked. The central area is the confidence level 1 – α = 0,90 (90% of all sample means fall in this area); each tail has area α/2 = 0,05, and these 2 areas added together = α, i.e. 10%.]
• Confidence interval for Population Mean, n < 30
– For a small sample from a normal population where σ is known, the normal distribution can be used.
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with the t-distribution
$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
[Figure: t distribution compared with the standard normal curve]
• Example
– The manager of a small departmental store is concerned about the decline of his weekly sales.
– He calculated the average and standard deviation of his sales for the past 12 weeks: x̄ = R12 400 and s = R1 346.
– Estimate with 99% confidence the population mean sales of the departmental store.
$\bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 12400 \pm 3,106 \times \dfrac{1346}{\sqrt{12}} = 12400 \pm 1206,86$, where $t_{11;0,995} = 3,106$
Interval: (11 193,14 ; 13 606,86)
We are 99% confident the mean weekly sales will be between R11 193,14 and R13 606,86.
• Confidence interval for Population proportion
– Each element in the population can be classified as a success or failure
– Proportion always between 0 and 1
– For large samples the sample proportion p̂ is approximately normal
Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.
2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.
15 MARKS
Exam question 7 PROCEDURE
1. Determine what measure you are looking at: mean, proportion or standard deviation
2. Select appropriate formula based on 1. and sample size (t for small sample sizes <30; z for larger sample sizes)
3. Put the numbers into the formula and calculate the confidence intervals
Exam question 7
1. Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{120}{200} = 0.6$
$Z_{1-\frac{\alpha}{2}} = Z_{0.975} = 1.96$
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = 0.6 \pm 1.96 \sqrt{\dfrac{(0.6)(0.4)}{200}} = 0.6 \pm 0.07$
0.53 < p < 0.67
At a confidence level of 95%, between 53% and 67% of residents believe the tax rate is too high.
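The proportion interval in this answer can be checked with the following Python sketch. The function name is hypothetical; the z value is obtained from the standard library's `statistics.NormalDist` instead of a table, so it is 1.95996... rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n) for a large sample."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 120 of 200 residents believe property taxes are too high, 95% confidence
lower, upper = proportion_confidence_interval(120, 200, 0.95)
print(round(lower, 2), round(upper, 2))
```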
Exam question 7
2. $CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 45 \pm 2.093 \times \dfrac{14}{\sqrt{20}} = 45 \pm 6.55$
38.45 < µ < 51.55
At a confidence level of 95%, the population average time to complete a tune up is between 38.45 and 51.55 minutes.
HYPOTHESIS TESTING
STEPS OF A HYPOTHESIS TEST
Step 1 • State the null and alternative hypotheses
Step 2 • State the values of α
Step 3 • Calculate the value of the test statistic
Step 4 • Determine the critical value
Step 5 • Make a decision using decision rule or graph
Step 6 • Draw a conclusion
• Hypothesis test for Population Mean, n < 30
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with the t-distribution with (n − 1) degrees of freedom
Testing H0: μ = μ0 for n < 30
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Alternative hypothesis  Decision rule: reject H0 if
H1: μ ≠ μ0  $|t| \ge t_{n-1;1-\frac{\alpha}{2}}$
H1: μ > μ0  $t \ge t_{n-1;1-\alpha}$
H1: μ < μ0  $t \le -t_{n-1;1-\alpha}$
• Hypothesis testing for Population proportion
– Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
– Proportion always between 0 and 1
Testing H0: p = p0 for n ≥ 30
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
Alternative hypothesis  Decision rule: reject H0 if
H1: p ≠ p0  $|z| \ge Z_{1-\frac{\alpha}{2}}$
H1: p > p0  $z \ge Z_{1-\alpha}$
H1: p < p0  $z \le -Z_{1-\alpha}$
Exam question 8
1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.
2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use a 0.05 significance level.
16 MARKS
Exam question 8 Procedure
1. State H0 and Ha
2. Determine the critical value from the appropriate test table using α, and n
3. Compute test statistic (t or z value??)
4. Draw conclusion
Exam question 8
1. State hypothesis
H0: µ = 42.5
Ha: µ > 42.5
Determine critical value
$t_{n-1;1-\alpha} = t_{23;0.9} = 1.319$
Reject H0 if the test statistic is > 1.319
Calculate test statistic
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{40 - 42.5}{2 / \sqrt{24}} = -6.12$
Do not reject H0, since −6.12 < 1.319. At the 10% level of significance there is insufficient evidence that cars remain in the short term car park longer than 42.5 minutes on average.
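The test statistic and decision for this right-tailed t-test can be sketched in Python. The function name is hypothetical, and the critical value 1.319 is read from the t-table as in the answer rather than computed.

```python
import math


def t_statistic(sample_mean, mu_0, s, n):
    """t = (x-bar - mu_0) / (s / sqrt(n))."""
    return (sample_mean - mu_0) / (s / math.sqrt(n))


# Car park example: x-bar = 40, mu_0 = 42.5, s = 2, n = 24
t = t_statistic(40, 42.5, 2, 24)
t_critical = 1.319            # t(23; 0.90) from the t-table
reject_h0 = t >= t_critical   # right-tailed test, Ha: mu > 42.5
print(round(t, 2), reject_h0)
```

The sample mean lies below the hypothesised value, so a right-tailed test cannot reject H0 here regardless of the significance level chosen.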
Exam question 8
2. State hypothesis
H0: p = 0.55
Ha: p > 0.55
Determine critical value
α = 0.05, Z = 1.64
Reject H0 if Z test > 1.64
Calculate test statistic
Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{42}{70} = 0.6$
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.6 - 0.55}{\sqrt{\dfrac{(0.55)(0.45)}{70}}} = 0.84$
Do not reject H0, since 0.84 < 1.64. At the 0.05 significance level there is insufficient evidence that more than 55% of commuters would use the route, so the route does not meet the Gautrain criteria.
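The z-test for a proportion can be sketched in Python as follows. The function name is hypothetical; the critical value comes from `statistics.NormalDist`, which gives about 1.645 where the answer uses the table value 1.64.

```python
import math
from statistics import NormalDist


def proportion_z_statistic(successes, n, p_0):
    """z = (p-hat - p_0) / sqrt(p_0 * (1 - p_0) / n)."""
    p_hat = successes / n
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)


# Bus route example: 42 of 70 commuters, H0: p = 0.55, Ha: p > 0.55
z = proportion_z_statistic(42, 70, 0.55)
z_critical = NormalDist().inv_cdf(1 - 0.05)  # right-tailed, alpha = 0.05
reject_h0 = z >= z_critical
print(round(z, 2), round(z_critical, 3), reject_h0)
```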
CORRELATION COEFFICIENT
Coefficient of correlation
• The coefficient of correlation is used to measure the strength of association between two variables.
• The coefficient values range between -1 and 1.
– If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line.
– If r = 0 there is no linear pattern.
• The coefficient can be used to test for linear relationship between two variables.
[Figure: six scatter plots of Y against X: perfect positive (r = +1), high positive (r = +0,9), low positive (r = +0,3), perfect negative (r = −1), high negative (r = −0,8), no correlation (r = 0)]
Exam question 10
The cost of repairing cars that were involved in accidents is one reason that insurance premiums are so high. In an experiment 5 cars were driven into a wall. The speeds were varied between 20 km/h and 80 km/h (X). The costs of repair (Y) were estimated and are listed below:
SPEED (km/h) (X)  COST OF REPAIR (R'000) (Y)
20 3
30 5
40 8
60 24
80 34
1. Use your calculator to calculate the coefficient of correlation. Interpret your answer.
2. Calculate and interpret the coefficient of determination for this data.
3. Use your calculator to construct the regression line equation and predict the repair cost at 50 km/h.
10 MARKS
Exam question 10
1. Put data into calculator
2. Select regression function and select r
3. Calculate coefficient of determination = r² × 100%
4. Interpret results
5. Using Y = A + BX select regression function on calculator and determine values for A & B
6. Put x = 50 into formula and calculate result
Exam question 10
1. r = 0.98
There is a very strong positive relationship between the repair cost and speed.
2. r² × 100% = 0.98² × 100 = 96%
96% of the variation in the cost of repair is explained by the variation in the speed at which the car crashed.
3. Y = −10.7 + 0.55X
For X = 50: Y = 16.8, i.e. a predicted repair cost of about R16 800.
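The calculator results above can be cross-checked with a least-squares sketch in Python. The function name is hypothetical and the formulas are the standard ones for r, the slope B and the intercept A in Y = A + BX; this is not necessarily how a given calculator computes them internally.

```python
import math


def linear_fit(xs, ys):
    """Return intercept A, slope B and correlation r for Y = A + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


speed = [20, 30, 40, 60, 80]   # km/h (X)
cost = [3, 5, 8, 24, 34]       # repair cost in R'000 (Y)

a, b, r = linear_fit(speed, cost)
predicted = a + b * 50
print(round(a, 1), round(b, 2), round(r, 2), round(predicted, 2))
```

With unrounded coefficients the prediction at 50 km/h is about 17.0 (R17 000) and r² is about 97%; the values 16.8 and 96% in the answer come from working with A, B and r rounded to −10.7, 0.55 and 0.98.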