Revision workshop 17 January 2013
NUBE Revision Workshop
![Page 1: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/1.jpg)
REVISION WORKSHOP
NUBE 17TH JANUARY 2013
![Page 2: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/2.jpg)
Organising and graphing quantitative data in a frequency distribution table.
• A frequency table consists of a number of classes; each observation is counted and recorded in the frequency of the class it falls into.
• If n observations need to be classified into a frequency table, determine:
– Number of classes: $c = 1 + 3{,}3 \log_{10} n$
– Class width: $\dfrac{x_{\max} - x_{\min}}{c}$
![Page 3: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/3.jpg)
3
Example: The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9
19 14 17 9 3 3 16 8 2
Organising and graphing quantitative data in a frequency
distribution table.
![Page 4: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/4.jpg)
Number of classes: $c = 1 + 3{,}3 \log_{10} n = 1 + 3{,}3 \log_{10} 48 = 6{,}5 \approx 7$
Class width: $\dfrac{x_{\max} - x_{\min}}{c} = \dfrac{22 - 2}{7} = 2{,}86 \approx 3$
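The two calculations above can be sketched in Python. This is only an illustration: the function names are made up for this example, and rounding both results up to the next whole number is an assumption that happens to agree with the rounding used in the workshop examples.

```python
import math


def number_of_classes(n: int) -> int:
    """c = 1 + 3,3 log10(n), rounded up (rounding rule is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(x_min: float, x_max: float, classes: int) -> int:
    """(x_max - x_min) / c, rounded up to a whole number."""
    return math.ceil((x_max - x_min) / classes)


# Call-centre example: n = 48 observations, smallest value 2, largest value 22.
c = number_of_classes(48)
width = class_width(2, 22, c)
print(c, width)
```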
Frequency distribution
8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9
19 14 17 9 3 3 16 8 2
![Page 5: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/5.jpg)
– First class: $[x_{\min} ; x_{\min} + \text{class width}) = [2 ; 2 + 3) = [2 ; 5)$
– Second class: $[5 ; 5 + \text{class width}) = [5 ; 5 + 3) = [5 ; 8)$
Frequency distribution
8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9
19 14 17 9 3 3 16 8 2
“[“ value is included in class
“)“ value is excluded from class
![Page 6: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/6.jpg)
Frequency distribution
Classes Tally Count
[2;5) ||| 3
[5;8) |||| 4
[8;11) ||||| ||||| | 11
[11;14) ||||| ||||| ||| 13
[14;17) ||||| |||| 9
[17;20) ||||| | 6
[20;23) || 2
![Page 7: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/7.jpg)
7
Classes Frequency (f)
[2;5) 3
[5;8) 4
[8;11) 11
[11;14) 13
[14;17) 9
[17;20) 6
[20;23) 2
Total 48
Frequency distribution
![Page 8: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/8.jpg)
8
Classes f % frequency
[2;5) 3 3/48×100 = 6,3
[5;8) 4 4/48×100 = 8,3
[8;11) 11 11/48×100 = 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2
Total 48 100
Frequency distribution
![Page 9: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/9.jpg)
9
Classes f % f Cumulative frequency (F)
[2;5) 3 6,3 3
[5;8) 4 8,3 3 + 4 = 7
[8;11) 11 22,9 7 + 11 = 18
[11;14) 13 27,1 18 + 13 = 31
[14;17) 9 18,8 31 + 9 = 40
[17;20) 6 12,5 40 + 6 = 46
[20;23) 2 4,2 46 + 2 = 48
Total 48 100
Frequency distribution
![Page 10: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/10.jpg)
10
Classes f % f F % F
[2;5) 3 6,3 3 3/48×100 = 6,3
[5;8) 4 8,3 7 7/48×100 = 14,6
[8;11) 11 22,9 18 18/48×100 = 37,5
[11;14) 13 27,1 31 64,6
[14;17) 9 18,8 40 83,3
[17;20) 6 12,5 46 95,8
[20;23) 2 4,2 48 100
Total 48 100
Frequency distribution
![Page 11: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/11.jpg)
11
Classes f F Class mid-points (x)
[2;5) 3 3 (2 + 5)/2 = 3,5
[5;8) 4 7 (5 + 8)/2 = 6,5
[8;11) 11 18 (8 + 11)/2 = 9,5
[11;14) 13 31 (11 + 14)/2 = 12,5
[14;17) 9 40 15,5
[17;20) 6 46 18,5
[20;23) 2 48 21,5
Total 48
Frequency distribution
![Page 12: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/12.jpg)
12
Classes f % f F % F (x)
[2;5) 3 6,3 3 6,3 3,5
[5;8) 4 8,3 7 14,6 6,5
[8;11) 11 22,9 18 37,5 9,5
[11;14) 13 27,1 31 64,6 12,5
[14;17) 9 18,8 40 83,3 15,5
[17;20) 6 12,5 46 95,8 18,5
[20;23) 2 4,2 48 100 21,5
Total 48 100
Frequency distribution
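As a cross-check, the full table (f, % f, F, % F and class mid-points) can be rebuilt from the raw data with a short Python sketch. The function name and the tuple layout are assumptions made for this illustration; the classes are half-open, [lower ; upper), as in the slides.

```python
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]


def frequency_table(data, start, width, classes):
    """Return one row per class: (lower, upper, f, %f, F, %F, midpoint)."""
    n = len(data)
    rows = []
    cumulative = 0
    for i in range(classes):
        lower = start + i * width
        upper = lower + width
        # "[" includes the lower boundary, ")" excludes the upper boundary.
        f = sum(1 for x in data if lower <= x < upper)
        cumulative += f
        rows.append((lower, upper, f, 100 * f / n,
                     cumulative, 100 * cumulative / n, (lower + upper) / 2))
    return rows


table = frequency_table(calls, start=2, width=3, classes=7)
for lower, upper, f, pct_f, F, pct_F, mid in table:
    print(f"[{lower};{upper}) {f} {pct_f:.1f} {F} {pct_F:.1f} {mid}")
```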
![Page 13: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/13.jpg)
13
Classes f % f
[2;5) 3 6,3
[5;8) 4 8,3
[8;11) 11 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2
Histograms
x-axis: classes; y-axis: frequency (f or % f)
![Page 14: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/14.jpg)
Histograms
[Figure: histogram, "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls, class boundaries 2, 5, 8, 11, 14, 17, 20, 23. y-axis: Number of hours, 0 to 14.]
![Page 15: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/15.jpg)
Definitions
Frequency Polygon
A line graph of a frequency distribution that offers a useful alternative to a histogram. It is useful for conveying the shape of the distribution.
Ogive
A graphic representation of the cumulative frequency distribution. It is used to approximate the number of values less than or equal to a specified value.
![Page 16: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/16.jpg)
16
Class mid-points (x) f % f
3,5 3 6,3
6,5 4 8,3
9,5 11 22,9
12,5 13 27,1
15,5 9 18,8
18,5 6 12,5
21,5 2 4,2
Frequency polygons
x-axis: class mid-points (x); y-axis: frequency (f or % f)
![Page 17: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/17.jpg)
Frequency polygons
[Figure: frequency polygon, "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls, plotted at the class mid-points (x) 3,5 to 21,5. y-axis: Number of hours, 0 to 14. Arbitrary mid-points at 0,5 and 24,5 are added to close the polygon.]
![Page 18: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/18.jpg)
18
Classes F % F
[2;5) 3 6,3
[5;8) 7 14,6
[8;11) 18 37,5
[11;14) 31 64,6
[14;17) 40 83,3
[17;20) 46 95,8
[20;23) 48 100
Ogives
x-axis: class boundaries; y-axis: cumulative frequency (F or % F)
![Page 19: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/19.jpg)
Ogives
[Figure: ogive, "Ogive of number of calls received at a call centre per hour". x-axis: Number of calls, 2 to 23. y-axis: % cumulative number of hours, 0 to 100.]
None of the hours had fewer than 2 calls.
![Page 20: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/20.jpg)
Ogives
[Figure: ogive, "Ogive of number of calls received at a call centre per hour", with readings marked. x-axis: Number of calls, 2 to 23. y-axis: % cumulative number of hours, 0 to 100.]
• 50% of the hours had fewer than 12 calls per hour.
• 80% of the hours had fewer than 17 calls per hour.
• 20% of the hours had more than 17 calls per hour.
![Page 21: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/21.jpg)
Exam question 2
A garbage removal company would like to start charging by the weight of a customer's bin rather than by the number of bins put out. They select a sample of 25 customers and weigh their garbage bins. The weights in kg are given below:
14.5 5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5 7.4
13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1
14.8 15.1 4.9 17.0 10.3
1. Construct a frequency table to describe the data. Include a frequency and a relative (%) frequency column. (Hint: start the class intervals with the whole number just smaller than the lowest value in the dataset.)
![Page 22: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/22.jpg)
Procedure
1. Calculate the range of the dataset
2. Calculate the no of classes
3. Calculate the class width
4. Construct table showing the intervals calculated in 1 to 3
5. Put in the tally for each interval and then show as frequency
6. Calculate the relative (%) frequency
13 marks
![Page 23: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/23.jpg)
Range
31.4 − 4.9 = 26.5
No of classes
$c$ (or $k$) $= 1 + 3.3 \log_{10} n$
$n = 25$: $c = 1 + 3.3 \log_{10}(25) = 5.61 \approx 6$
Class width
$= \dfrac{x_{\max} - x_{\min}}{c} = \dfrac{26.5}{6} = 4.42 \approx 5$
![Page 24: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/24.jpg)
No of classes = 6 Class width = 5
INTERVALS TALLY FREQUENCY (f) RELATIVE FREQUENCY (%f)
4 - < 9 ||| 3 12
9 - < 14 ||||| | 6 24
14 - < 19 ||||| |||| 9 36
19 - < 24 ||| 3 12
24 - < 29 ||| 3 12
29 - < 34 | 1 4
Total 25 100
![Page 25: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/25.jpg)
Exam question 2
2. Comment on the interval containing the lowest percentage.
Answer: 4% of the bins weighed between 29 and 34 kg.
3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?
Answer: The largest number of bins weighed between 14 and 19 kg. We assume the mode will fall in this interval (highest frequency).
4. Comment on the shape of the distribution without drawing a graph. Give reasons.
Answer: Positively skewed, as more values are located in the lower intervals.
7 MARKS
![Page 26: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/26.jpg)
Quartiles & Box & Whisker Plots
![Page 27: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/27.jpg)
27
• Quartiles • Percentiles • Interquartile range
![Page 28: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/28.jpg)
QUARTILES
28
![Page 29: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/29.jpg)
29
• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.
Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max
![Page 30: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/30.jpg)
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q1 for the sample of nine measurements:
• Order the measurements
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
$Q_1$ is the $\tfrac{1}{4}(n + 1) = \tfrac{1}{4}(9 + 1) = 2{,}5$th value
Find the difference between the 2nd and 3rd values: 2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5
Add to the smaller of the two values: −3 + 2,5, so Q1 = −0,5
![Page 31: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/31.jpg)
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q3 for the sample of nine measurements:
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
$Q_3$ is the $\tfrac{3}{4}(n + 1) = \tfrac{3}{4}(9 + 1) = 7{,}5$th value
Q3 = 5 + 0,5(6 − 5) = 5,5
![Page 32: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/32.jpg)
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Q1 = −0,5
Q3 = 5,5
Interquartile range = Q3 – Q1
= 5,5 – (−0,5)
= 6
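The position method used in this example (find the (n + 1) × fraction position, then interpolate between the two neighbouring ordered values) can be sketched in Python. The function name is hypothetical, and the error raised for positions that fall outside the data is an assumption; the slides do not cover that case.

```python
import math


def value_at_position(data, fraction):
    """Quartile/percentile by the (n + 1) * fraction position method."""
    ordered = sorted(data)
    n = len(ordered)
    position = fraction * (n + 1)          # 1-based position in ordered data
    if position < 1 or position > n:
        raise ValueError("position falls outside the ordered data")
    whole = math.floor(position)
    decimal = position - whole             # decimal portion of the position
    lower_value = ordered[whole - 1]
    if decimal == 0:
        return lower_value
    upper_value = ordered[whole]
    # Add the decimal portion of the gap to the smaller of the two values.
    return lower_value + decimal * (upper_value - lower_value)


sample = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = value_at_position(sample, 0.25)
q3 = value_at_position(sample, 0.75)
print(q1, q3, q3 - q1)
```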
![Page 33: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/33.jpg)
INTERQUARTILE RANGE (IQR)
• Difference between the third and first quartiles
• Indicates how far apart the first and third quartiles are
IQR = Q3 – Q1
33
![Page 34: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/34.jpg)
BOX & WHISKER PLOT
• Provides a graphical summary of data based on 5 summary measures or values
– First quartile, median, third quartile, lower limit, upper limit
• A box and whisker plot can be used to detect outliers in a data set
LL = Q1 – 1,5 (IQR)
UL = Q3 + 1,5 (IQR)
34
![Page 35: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/35.jpg)
BOX-AND-WHISKER PLOT
[Figure: box-and-whisker plot on an axis from 0 to 28, with the spans labelled 1,5(IQR), IQR and 1,5(IQR).]
Q1 = 9,36
Me = 12,38
Q3 = 15,67
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
![Page 36: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/36.jpg)
Exam question 3
The Tubeka brothers spent the following amounts in Rand on groceries over the last 8 weeks:
54 56 89 67 74 57 43 51
1. Calculate a five number summary table
2. Construct a box and whisker plot for the data
3. Determine whether there are any outliers. Show calculations
20 MARKS
PROCEDURE
1. Reorder the data set
2. Identify maximum and minimum values in dataset
3. Calculate median
4. Calculate Q1 & Q3
5. Construct plot
6. Calculate upper & lower limits for dataset to determine if outliers present
![Page 37: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/37.jpg)
43 51 54 56 57 67 74 89
xmin = 43
xmax = 89
median = (56 + 57)/2 = 56.5
Q1 = 51.75
Q3 = 72.25
Q1 position = (n + 1)(¼) = (8 + 1) × ¼ = 2.25th value, between 51 & 54
54 − 51 = 3; multiply by the decimal portion of the position, 3 × 0.25 = 0.75, and add the lower value: Q1 = 51 + 0.75 = 51.75
Q3 position = (n + 1)(¾) = (8 + 1) × ¾ = 6.75th value, between 67 & 74
74 − 67 = 7; multiply by the decimal portion of the position, 7 × 0.75 = 5.25, and add the lower value: Q3 = 67 + 5.25 = 72.25
![Page 38: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/38.jpg)
43 51 54 56 57 67 74 89
xmin = 43, xmax = 89, median = (56 + 57)/2 = 56.5, Q1 = 51.75, Q3 = 72.25
OUTLIERS
1. Calculate upper & lower limits
LL = Q1 – 1.5(IQR)
UL = Q3 + 1.5(IQR)
IQR = 72.25 – 51.75 = 20.5
LL = 51.75 – 1.5(20.5) = 21
UL = 72.25 + 1.5(20.5) = 103
No values smaller than 21 or greater than 103 therefore no outliers present
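The outlier check can be reproduced with a few lines of Python. This is a sketch only: the function name is an assumption, and Q1 and Q3 are taken as given from the working above rather than recalculated.

```python
def outlier_limits(q1, q3):
    """Return (LL, UL) with LL = Q1 - 1.5(IQR) and UL = Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


groceries = [54, 56, 89, 67, 74, 57, 43, 51]
lower_limit, upper_limit = outlier_limits(51.75, 72.25)

# Any value below LL or above UL is flagged as an outlier.
outliers = [x for x in groceries if x < lower_limit or x > upper_limit]
print(lower_limit, upper_limit, outliers)
```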
![Page 39: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/39.jpg)
MEASURES OF LOCATION
![Page 40: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/40.jpg)
• ARITHMETIC MEAN
– Data is given in a frequency table
– Only an approximate value of the mean
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
where $f_i$ = frequency of the $i$th class interval
$x_i$ = class midpoint of the $i$th class interval
![Page 41: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/41.jpg)
• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the median class interval.
– Median can also be determined from the ogive.
$M_e = l_i + \dfrac{(u_i - l_i)\left(\tfrac{n}{2} - F_{i-1}\right)}{f_i}$
where $l_i$ = lower boundary of the median interval
$u_i$ = upper boundary of the median interval
$F_{i-1}$ = cumulative frequency of the interval preceding the median interval
$f_i$ = frequency of the median interval
![Page 42: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/42.jpg)
42
• MODE
– Class interval that has the largest frequency value will contain the mode.
– Mode is the class midpoint of this class.
– Mode must be determined from the histogram.
![Page 43: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/43.jpg)
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the mean for the sample of the 48 hours: determine the class midpoints
Number of calls   Number of hours (fi)   xi
[2–under 5) 3 3,5
[5–under 8) 4 6,5
[8–under 11) 11 9,5
[11–under 14) 13 12,5
[14–under 17) 9 15,5
[17–under 20) 6 18,5
[20–under 23) 2 21,5
n = 48
![Page 44: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/44.jpg)
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls   Number of hours (fi)   xi
[2–under 5) 3 3,5
[5–under 8) 4 6,5
[8–under 11) 11 9,5
[11–under 14) 13 12,5
[14–under 17) 9 15,5
[17–under 20) 6 18,5
[20–under 23) 2 21,5
n = 48
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{597}{48} = 12{,}44$
Average number of calls per hour is 12,44.
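The grouped mean can be checked with a short Python sketch. The list-of-tuples layout and the function name are assumptions for this illustration; each class is stored as (lower boundary, upper boundary, frequency).

```python
call_classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]


def grouped_mean(classes):
    """Approximate mean: sum of f * midpoint divided by the sum of f."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


print(round(grouped_mean(call_classes), 2))
```

The same function applies to any grouped table in the workshop, for example the overtime-hours table in the exam question.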
![Page 45: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/45.jpg)
Exam question 3
The number of overtime hours worked by 40 part-time employees of a security company in 1 week is shown in the following frequency distribution:
Hours per week   Frequency (f)
2.1 - < 2.8 12
2.8 - < 3.5 13
3.5 - < 4.2 7
4.2 - < 4.9 5
4.9 - < 5.6 2
5.6 - < 6.3 1
1. Estimate the mean number of overtime hours worked
2. What % of employees worked at least 4.2 hours overtime?
8 marks
![Page 46: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/46.jpg)
Exam question 3 Procedure
1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2
2. Multiply f by the midpoint x
3. Total the fx and f columns
4. Divide ∑fx by ∑f
![Page 47: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/47.jpg)
Exam question 3
Hours per week   Frequency (f)   Mid point (x)   fx
2.1 - < 2.8 12 (2.1 + 2.8)/2 = 2.45 29.4
2.8 - < 3.5 13 3.15 40.95
3.5 - < 4.2 7 3.85 26.95
4.2 - < 4.9 5 4.55 22.75
4.9 - < 5.6 2 5.25 10.5
5.6 - < 6.3 1 5.95 5.95
Total 40 136.5
Mean = 136.5/40 = 3.41 hrs
Employees who worked at least 4.2 hrs = 5 + 2 + 1 = 8; 8/40 × 100 = 20%
![Page 48: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/48.jpg)
PERCENTILES
48
![Page 49: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/49.jpg)
49
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.
Min | 80% | P80 | 20% | Max
Min | 50% | P50 = Q2 | 50% | Max
Min | 10% | P10 | 90% | Max
![Page 50: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/50.jpg)
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:
Value: −4 −3 2 2 5 5 5 6 8
Position: 1 2 3 4 5 6 7 8 9
$P_{20}$ is the $\tfrac{p}{100}(n + 1) = \tfrac{20}{100}(9 + 1) = 2$nd value
P20 = −3
![Page 51: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/51.jpg)
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls   Number of hours (fi)   F
[2–under 5) 3 3
[5–under 8) 4 7
[8–under 11) 11 18
[11–under 14) 13 31
[14–under 17) 9 40
[17–under 20) 6 46
[20–under 23) 2 48
n = 48
P60: np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is F = 31, so P60 lies in the class [11–under 14).
![Page 52: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/52.jpg)
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls   Number of hours (fi)   F
[2–under 5) 3 3
[5–under 8) 4 7
[8–under 11) 11 18
[11–under 14) 13 31
[14–under 17) 9 40
[17–under 20) 6 46
[20–under 23) 2 48
n = 48
$P_{60} = l_p + \dfrac{(u_p - l_p)\left(\tfrac{np}{100} - F_{p-1}\right)}{f_p} = 11 + \dfrac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$
For 60% of the hours there were fewer than 13,49 calls per hour; for 40% of the hours there were more than 13,49 calls per hour.
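The percentile formula for grouped data can be sketched in Python as follows. The function name and the (lower, upper, frequency) tuple layout are assumptions for this illustration, and the sketch assumes 0 < p ≤ 100. With p = 50 it gives the median of the grouped data.

```python
call_classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]


def grouped_percentile(classes, p):
    """P = l + (u - l)(np/100 - F_prev) / f for the class containing np/100."""
    n = sum(f for _, _, f in classes)
    target = n * p / 100
    cumulative_before = 0                  # F of the preceding interval
    for lower, upper, f in classes:
        # First cumulative frequency >= np/100 marks the percentile class.
        if cumulative_before + f >= target:
            return lower + (upper - lower) * (target - cumulative_before) / f
        cumulative_before += f
    raise ValueError("p must be between 0 and 100")


print(round(grouped_percentile(call_classes, 60), 2))
print(round(grouped_percentile(call_classes, 50), 2))
```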
![Page 53: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/53.jpg)
Exam question 3
1. John, one of the part-time workers, was told he falls on the 70th percentile. Calculate the value and explain what it means.
PROCEDURE
1. Calculate the cumulative frequencies
2. Calculate which class the required percentile falls into by using P =np/100
3. Once you have identified the class use the percentile formula given in the tables book to calculate the value. Take CARE to order the calculation correctly.
4 MARKS
![Page 54: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/54.jpg)
Exam question 3
Hours per week   Frequency (f)   Cumulative F
2.1 - < 2.8 12 12
2.8 - < 3.5 13 25
3.5 - < 4.2 7 32
4.2 - < 4.9 5 37
4.9 - < 5.6 2 39
5.6 - < 6.3 1 40
Total 40
Position: np/100 = 40 × 70/100 = 28
The first cumulative frequency ≥ 28 is 32, so P70 lies in the class 3.5 - < 4.2.
P70 = 3.5 + [(4.2 − 3.5)(28 − 25)]/7
= 3.5 + 0.3
= 3.8
70% of the workers worked fewer overtime hours than John, that is, fewer than 3.8 hrs. 30% of the workers worked more overtime hours than John, that is, more than 3.8 hrs.
![Page 55: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/55.jpg)
CONFIDENCE INTERVALS
![Page 56: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/56.jpg)
Confidence interval
– An interval is calculated around the sample statistic
[Figure: a confidence interval drawn around the sample statistic, with the population parameter included in the interval.]
![Page 57: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/57.jpg)
Confidence interval
– An upper and lower limit within which the population parameter is expected to lie
– Limits will vary from sample to sample
– Specify the probability that the interval will include the parameter
– Typically used: 90%, 95%, 99%
– Probability denoted by (1 – α)
• (1 – α) is known as the level of confidence
• α is the significance level
Example:
Meaning of a 90% confidence interval: 90% of all possible samples taken from the population will produce an interval that includes the population parameter
![Page 58: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/58.jpg)
• An interval estimate consists of a range of values with an upper & lower limit
• The population parameter is expected to lie within this interval with a certain level of confidence
• Limits of an interval vary from sample to sample therefore we must also specify the probability that an interval will contain the parameter
• Ideally probability should be as high as possible
58
![Page 59: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/59.jpg)
SO REMEMBER
•We can choose the probability
•Probability is denoted by (1-α)
•Typical values are 0.9 (90%); 0.95 (95%) and 0.99 (99%)
•The probability is known as the LEVEL OF CONFIDENCE
•α is known as the SIGNIFICANCE LEVEL
•α corresponds to an area under a curve
•Since we take the confidence level into account when we estimate an interval, the interval is called CONFIDENCE INTERVAL
59
![Page 60: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/60.jpg)
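Confidence interval for Population Mean, n ≥ 30
- population need not be normally distributed
- the sample mean will be approximately normally distributed
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if $\sigma$ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if $\sigma$ is not known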
![Page 61: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/61.jpg)
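Example:
90% confidence interval
1 – α = 0,90
α = 0,10
α/2 = 0,10/2 = 0,05
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if $\sigma$ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if $\sigma$ is not known
[Figure: normal curve centred on x̄, between the lower and upper confidence limits. The central area is the confidence level 1 – α = 0,90: 90% of all sample means fall in this area. Each tail has area α/2 = 0,05; these 2 areas added together = α, i.e. 10%.]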
![Page 62: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/62.jpg)
62
![Page 63: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/63.jpg)
• Confidence interval for Population Mean, n < 30
– For a small sample from a normal population where σ is known, the normal distribution can be used.
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with the t-distribution
$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
[Figure: the standard normal curve compared with the t-distribution.]
![Page 64: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/64.jpg)
t Distribution
64
![Page 65: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/65.jpg)
• Example
– The manager of a small departmental store is concerned about the decline of his weekly sales.
– He calculated the average and standard deviation of his sales for the past 12 weeks: $\bar{x}$ = R12 400 and s = R1 346
– Estimate with 99% confidence the population mean sales of the departmental store.
$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 12400 \pm 3{,}106 \times \dfrac{1346}{\sqrt{12}}$, with $t_{11;\,0{,}995} = 3{,}106$
$= 12400 \pm 1206{,}86$
$= [11193{,}14 \; ; \; 13606{,}86]$
We are 99% confident that the mean weekly sales will be between R11 193,14 and R13 606,86.
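The arithmetic in this example can be reproduced with a small Python sketch. The function name is an assumption, and the critical value is passed in as read from the t-table (3,106 for 11 degrees of freedom at 0,995) rather than computed, so no statistics library is needed.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, critical_value):
    """x-bar +/- critical value * s / sqrt(n); critical value comes from tables."""
    margin = critical_value * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Weekly sales example: x-bar = 12400, s = 1346, n = 12, t(11; 0.995) = 3.106.
lower, upper = mean_confidence_interval(12400, 1346, 12, 3.106)
print(round(lower, 2), round(upper, 2))
```

The same function works for the large-sample case by passing a Z value instead of a t value.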
![Page 66: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/66.jpg)
• Confidence interval for Population proportion
– Each element in the population can be classified as a success or failure
– Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
– Proportion always between 0 and 1
– For large samples the sample proportion $\hat{p}$ is approximately normal
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
![Page 67: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/67.jpg)
Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.
2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.
15 MARKS
![Page 68: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/68.jpg)
Exam question 7 PROCEDURE
1. Determine what measure you are looking at: mean, proportion or standard deviation
2. Select appropriate formula based on 1. and sample size (t for small sample sizes <30; z for larger sample sizes)
3. Put the numbers into the formula and calculate the confidence intervals
![Page 69: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/69.jpg)
Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$, where $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
$\hat{p}$ = 120/200 = 0.6
$Z_{1-\frac{\alpha}{2}}$ = 1.96
CI = 0.6 ± 1.96 √((0.6)(0.4)/200)
CI = 0.6 ± 0.07
0.53 < p < 0.67
At a confidence level of 95%, between 53% and 67% of residents believe the tax rate is too high.
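A short Python sketch of the proportion interval, for checking the arithmetic. The function name is hypothetical, and the z value (1.96 for 95% confidence) is supplied from the tables rather than computed.

```python
import math


def proportion_confidence_interval(successes, n, z):
    """p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n); z comes from tables."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 120 of 200 residents believe the taxes are too high; 95% confidence.
lower, upper = proportion_confidence_interval(120, 200, 1.96)
print(round(lower, 2), round(upper, 2))
```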
![Page 70: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/70.jpg)
Exam question 7
2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.
$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, with $t_{19;\,0.975} = 2.093$
= 45 ± 2.093 × 14/√20
= 45 ± 6.55
38.45 < µ < 51.55
At a confidence level of 95% the population average time to complete a tune up is between 38.45 and 51.55 minutes.
![Page 71: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/71.jpg)
HYPOTHESIS TESTING
![Page 72: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/72.jpg)
STEPS OF A HYPOTHESIS TEST
Step 1 • State the null and alternative hypotheses
Step 2 • State the values of α
Step 3 • Calculate the value of the test statistic
Step 4 • Determine the critical value
Step 5 • Make a decision using decision rule or graph
Step 6 • Draw a conclusion
72
![Page 73: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/73.jpg)
• Hypothesis test for Population Mean, n < 30
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with the t-distribution with (n - 1) degrees of freedom
Testing H0: μ = μ0 for n < 30
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Alternative hypothesis   Decision rule: Reject H0 if
H1: μ ≠ μ0   $|t| \ge t_{n-1;\,1-\frac{\alpha}{2}}$
H1: μ > μ0   $t \ge t_{n-1;\,1-\alpha}$
H1: μ < μ0   $t \le -t_{n-1;\,1-\alpha}$
![Page 74: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/74.jpg)
• Hypothesis testing for Population proportion
– Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
– Proportion always between 0 and 1
Testing H0: p = p0 for n ≥ 30
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
Alternative hypothesis   Decision rule: Reject H0 if
H1: p ≠ p0   $|z| \ge Z_{1-\frac{\alpha}{2}}$
H1: p > p0   $z \ge Z_{1-\alpha}$
H1: p < p0   $z \le -Z_{1-\alpha}$
![Page 75: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/75.jpg)
Exam question 8
1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.
2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use a 0.05 significance level.
16 MARKS
![Page 76: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/76.jpg)
Exam question 8 Procedure
1. State H0 and Ha
2. Determine the critical value from the appropriate test table using α and n
3. Compute the test statistic (t or z value, depending on the test)
4. Draw conclusion
![Page 77: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/77.jpg)
Exam question 8
State hypothesis
H0: µ = 42.5
Ha: µ > 42.5
Determine critical value
$t_{n-1;\,1-\alpha} = t_{23;\,0.90} = 1.319$
Reject H0 if the test statistic is > 1.319
Calculate test statistic
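$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{40 - 42.5}{2/\sqrt{24}} = -6.12$
Do not reject H0, since -6.12 < 1.319. At the 10% level of significance there is insufficient evidence that cars remain in the short term car park area longer than 42.5 minutes on average.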
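The arithmetic for part 1 can be checked with a short script. This is only a sketch: the function name `one_sample_t` is made up for illustration, and the critical value is typed in from the t-table (as on the slide) rather than computed.

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """Return the t statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Values from exam question 8, part 1
t_stat = one_sample_t(x_bar=40, mu0=42.5, s=2, n=24)

# t_{23; 0.90} read from the t-table (assumption: not computed here)
t_crit = 1.319

# Right-tailed decision rule for Ha: mu > 42.5
reject_h0 = t_stat >= t_crit

print(round(t_stat, 2), reject_h0)  # -6.12 False
```

Because the sample mean (40) lies below 42.5, the statistic is negative and can never reach a positive right-tail critical value.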
![Page 78: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/78.jpg)
Exam question 8
State hypothesis
H0: p = 0.55
Ha: p > 0.55
Determine critical value
α = 0.05, so $Z_{1-\alpha} = Z_{0.95} = 1.64$
Reject H0 if the test statistic z > 1.64
Calculate test statistic
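$\hat{p} = \dfrac{42}{70} = 0.6$
$z = \dfrac{0.6 - 0.55}{\sqrt{(0.55)(0.45)/70}} = 0.84$
Do not reject H0, since 0.84 < 1.64. At the 0.05 significance level there is insufficient evidence that more than 55% of commuters would use the route, so the route does not meet the Gautrain criteria.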
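Part 2 can be checked the same way. The sketch below uses only the Python standard library; `proportion_z` is an illustrative name, and the critical value comes from `statistics.NormalDist` instead of the printed table, so it appears as 1.645 rather than the rounded 1.64.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Return z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Values from exam question 8, part 2: 42 of 70 commuters, H0: p = 0.55
z_stat = proportion_z(successes=42, n=70, p0=0.55)

# Right-tailed critical value Z_{1 - alpha} for alpha = 0.05
z_crit = NormalDist().inv_cdf(0.95)

# Decision rule for Ha: p > 0.55
reject_h0 = z_stat >= z_crit

print(round(z_stat, 2), round(z_crit, 3), reject_h0)  # 0.84 1.645 False
```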
![Page 79: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/79.jpg)
CORRELATION COEFFICIENT
![Page 80: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/80.jpg)
80
Coefficient of correlation
• The coefficient of correlation (r) is used to measure the strength of the linear association between two variables.
• The coefficient values range between -1 and 1.
– If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line.
– If r = 0 there is no linear pattern.
• The coefficient can be used to test for a linear relationship between two variables.
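The slides obtain r from a calculator. For reference, and assuming the calculator reports the usual Pearson sample correlation coefficient, r for n pairs $(x_i, y_i)$ can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when X and Y tend to move together and negative when they move in opposite directions, which is why the sign of r gives the direction of the association.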
![Page 81: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/81.jpg)
81
[Figure: six scatter plots of Y against X. Perfect positive (r = +1); high positive (r = +0,9); low positive (r = +0,3); perfect negative (r = -1); high negative (r = -0,8); no correlation (r = 0).]
![Page 82: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/82.jpg)
Exam question 10
The cost of repairing cars that were involved in accidents is one reason that insurance premiums are so high. In an experiment 5 cars were driven into a wall. The speeds (X) were varied between 20 km/h and 80 km/h. The costs of repair (Y) were estimated and are listed below:

| SPEED (km/h) (X) | COST OF REPAIR (R'000) (Y) |
|---|---|
| 20 | 3 |
| 30 | 5 |
| 40 | 8 |
| 60 | 24 |
| 80 | 34 |

1. Use a calculator to calculate the coefficient of correlation. Interpret your answer.
2. Calculate and interpret the coefficient of determination for this data.
3. Use your calculator to construct the regression line equation and predict the repair cost at 50 km/h.
10 MARKS
![Page 83: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/83.jpg)
Exam question 10
1. Put data into calculator
2. Select regression function and select r
3. Calculate coefficient of determination = r² × 100%
4. Interpret results
5. Using Y = A + BX select regression function on calculator and determine values for A & B
6. Put x = 50 into formula and calculate result
![Page 84: Revision workshop 17 january 2013](https://reader034.vdocuments.net/reader034/viewer/2022042515/5487b5d9b4af9f910d8b5461/html5/thumbnails/84.jpg)
Exam question 10
1. r = 0.98
There is a very strong positive linear relationship between the repair cost and speed.
2. r² × 100% = 0.98² × 100% = 96% (about 97% if r is not rounded before squaring)
96% of the variation in the cost of repair is explained by the variation in the speed at which the car crashed.
3. Y = -10.7 + 0.55X
For X = 50: Y = -10.7 + 0.55(50) = 16.8, i.e. a predicted repair cost of about R16 800.
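The calculator results can be reproduced by hand from the five data pairs. The sketch below assumes the calculator uses the Pearson correlation coefficient and the ordinary least-squares line Y = A + BX; the variable names are illustrative.

```python
import math

speed = [20, 30, 40, 60, 80]  # X, km/h
cost = [3, 5, 8, 24, 34]      # Y, R'000

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(cost) / n

# Sums of squared deviations and cross-products
sxx = sum((x - mean_x) ** 2 for x in speed)
syy = sum((y - mean_y) ** 2 for y in cost)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, cost))

r = sxy / math.sqrt(sxx * syy)   # coefficient of correlation
r_squared = r ** 2               # coefficient of determination
b = sxy / sxx                    # slope B
a = mean_y - b * mean_x          # intercept A
predicted = a + b * 50           # predicted repair cost at 50 km/h, in R'000

print(round(r, 2), round(r_squared, 2), round(a, 1), round(b, 2), round(predicted, 1))
```

With unrounded coefficients the prediction at 50 km/h comes out near 17.0 (about R17 000), slightly above the 16.8 obtained from the rounded equation Y = -10.7 + 0.55X.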