s1 notes (edexcel) · ... a2 notes and igcse / gcse worksheets 1 s1 notes (edexcel) for use only in...
TRANSCRIPT
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 1
S1 Notes (Edexcel)
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 2
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 3
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 4
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 5
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 6
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 7
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 8
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 9
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 10
Definitions for S1 Statistical Experiment A test/investigation/process adopted for collecting data to provide evidence for or against a hypothesis. “Explain briefly why mathematical models can help to improve our understanding of real world problems” Simplifies a real world problem; enables us to gain a quicker / cheaper understanding of a real world problem Advantage and disadvantage of statistical model Advantage : cheaper and quicker Disadvantage : not fully accurate
“Write down two reasons for using statistical models” To simplify a real world problem To improve understanding / describe / analyse a real world problem Quicker and cheaper than using real thing To predict possible future outcomes Refine model / change parameters possible Any 2
“Statistical models can be used to describe real world problems. Explain the process involved
in the formulation of a statistical model.” Observe real-world problem Devise a statistical model and collect data (Experimental) data collected Model used to make predictions Compare and observe against expected outcomes and test model; Statistical concepts are used to test how well the model describes the real-world problem Refine model if necessary.
A sample space A list of all possible outcomes of an experiment Event Sub-set of possible outcomes of an experiment. Normal Distribution Bell shaped curve symmetrical about mean; mean = mode = median 95% of data lies within 2 standard deviations of mean 68.3% between one standard deviation of mean
Independent Events
( ) ( ) ( )P A B P A P B Mutually Exclusive Events
( ) 0P A B
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 11
Explanatory and response variables The response variable is the dependent variable. It depends on the explanatory variable (also called the independent variable). So in a graph of length of life versus number of cigarettes smoked per week, the dependent variable would be length of life. It depends (or may do) on the number of cigarettes smoked per week. Give two reasons to justify the use of statistical models Used to simplify or represent a real world problem Cheaper or quicker or easier (than the real situation) or more easily modified (any two lines) To improve understanding of the real world problem B1 Used to predict outcomes from a real world problem (idea of predictions) Describe the main features and uses of a box plot.
Two tests for skewness
Positive skew if 3 2 2 1 0Q Q Q Q and if Mean > Median > Mode
Negative skew if 3 2 2 1 0Q Q Q Q and if Mean < Median < Mode
A good way to remember the condition for skewness involving mean, median and mode. Write down the three averages in alphabetical order, that is
Mean…Median…Mode
For positive skew fill the blank with a “>” sign to give
Mean > Median > Mode For negative skew fill the blank with a “<” sign to give
Mean < Median < Mode
Which sort of average and which sort of measure of dispersion should you use? If there are outliers or if the data is skewed then use median and IQR, If the are not outliers then use mean and standard deviation.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 12
Data Discrete Discrete data can only take certain values in any given range. Number of cars in a household is an example of discrete data. The values do not have to be whole numbers (e.g. shoe size is discrete). Continuous Continuous data can take any value in a given range. So a person’s height is continuous since it could be any value within set limits. Categorical Categorical data is data which is not numerical, such as choice of breakfast cereal etc. Data may be displayed as grouped data or ungrouped data. We say that data is “grouped” when we present it in the following way:
Weight (w) Frequency 65- 3 70- 7
Or
Score (s) Frequency 5-9 2
10-14 5 NB: We can group discrete data or continuous data. We must know how to interpret these groups, So that
Weight (w) 65- 65 70w 70- 70 75w
Or
Score (s) 5-9 4.5 9.5s
10-14 9.5 14.5s
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 13
Representation of Data Histograms, stem and leaf diagrams, box plots. Use to compare distributions. Back-to-back stem and leaf diagrams may be required. Stem and Leaf Diagrams The stem and leaf diagram is a very useful way of grouping data whilst retaining the original data. For example suppose we had the following scores from children in a Maths test: 85, 18, 38, 67, 43, 75, 78, 81, 92, 71, 52, 62, 49, 62, 82, 69, 55, 57, 95, 62, 37 We see that the smallest value is 18 and the largest is 95. The classes of stem and leaf diagrams must be of equal width and so it would seem sensible to choose classes 10-19, 20-29, etc. The “stem” in this case represents the tens and the “leaf” represents the units so we have the following: Scores in Maths Test
Stem (Tens) Leaf (Units) 1 8 2 3 8 7 4 3 9 5 2 5 7 6 7 2 2 9 2 7 5 8 1 8 5 1 2 9 2 5
We then arrange these in numerical order to give the following: Scores in Maths Test
Stem (Tens) Leaf (Units) 1 8 2 3 7 8 4 3 9 5 2 5 7 6 2 2 2 7 9 7 1 5 8 8 1 2 5 9 2 5
We m also include a “key” with the diagram, so we say
1 8 means 18
This diagram tells us the basic shape of the distribution. We can easily see the smallest and largest values and we can see that the mode is 62. We can also use it to calculate 1Q , 2Q and 3Q .
NB : the data must be in order in a Stem and Leaf Diagram.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 14
NB: If we wanted to represent the interval 18-22 on a stem & leaf we could not make 1 the stem since not all the numbers would begin with 1. What we could do is have a stem of 18 and then make the leaf the number we add on to the stem. In this case our key would be:
18 0 means 18 and 18 4 means 22
Back to back stem diagrams We can use these to compare two samples by using a “back to back stem plot”. In this we put stems down the middle and then one set of data on the left and the on the on the right. So we might end up with a diagram as follows:
Physics Maths 7 5 1 8
1 2 6 5 3 3 7 8 4 2 1 4 3 9
9 4 3 1 0 5 2 5 7 8 4 2 6 2 2 2 7 9
6 3 7 1 5 8 5 1 8 1 2 5
9 2 5 Our key here would be
In Physics 7 1 means 17
In Maths 1 8 means 18
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 15
Histograms Data that has been grouped can be represented using a histogram. A histogram is made up of rectangles of varying widths and heights – there are no gaps between the blocks.
The key feature of a histogram is that the area of each block is proportional to the frequency In order for the area to be equal (or proportional) to the frequency we plot frequency density on the
vertical axis, where widthclass
frequency densityfrequency . The class width is the width of the interval
(i.e. it runs from the lower boundary to the upper boundary). Remember Frequency Density is Frequency Divided by class width. Example Plot a histogram for the following:
Length (h) Frequency Class width Frequency Density
650- 3 20 0.15 670- 7 10 0.7 680- 20 10 2 690- 16 10 1.6
700-720 4 20 0.2 So the first block runs from 650 to 670 and has height 0.15 etc.
NB: If there are gaps between the stated upper limit of one class interval and the lower limit of the next class interval then we need to fill those gaps as shown below. For example,
Length (m) 15-19 14.5 19.5x 20-24 19.5 24.5x 25-29 24.5 29.5x
NB: Be careful with age since “15-19” would mean 15 20x since one is 19 until the moment before one’s 20th birthday. The shape of the histogram gives us information about the mean and the dispersion
When question says “give a reason to justify the use of a histogram to represent these data”…. The answer is “Data is continuous”
So the class width is 5. Not 19 15 4
FD
Length
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 16
Box Plot Diagram (or Box and whisker diagram) This is a diagram used to illustrate the dispersion of data. There is a box which runs from the lower quartile, 1Q to the upper quartile 3Q with the median, 2Q marked on it. The whisker then goes from
this box to the lowest value in one direction and to the highest value in the other. We end up with a diagram as follows: Skewness. Concepts outliers. Any rule to identify outliers will be specified in the question. If the question refers to outliers then use a refined box plot where the maximum length for the whisker is to, for example, 3 11.5 Q Q at the end where an outlier lies. In this case outliers (which
lie outside of the whiskers) are marked with crosses. NB We can tell something about the skewness / symmetry of the distribution from the box plot. For example,
1Q2Q 3Q
Highest Value Lowest Value
1Q2Q 3Q
Highest Value Lowest Value
NB : It must have a horizontal axis with a scale on it.
1Q2Q 3Q
3 3 11.5Q Q Q
Outlier
Lowest Value
Note that the whisker does not need to take the its maximum length. It goes only to the lowest value on this side.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 17
We can see from the above that this is positively skewed 3 2 2 1 0Q Q Q Q with a long tail of
high values. Similarly, The above is negatively skewed since 3 2 2 1 0Q Q Q Q .
1Q2Q 3Q
Highest Value Lowest Value
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 18
Measures of Location Measures of location - mean, median, mode. Data may be discrete, continuous, grouped or ungrouped. Understanding and use of coding. Measures of dispersion – variance, standard deviation, range and interpercentile ranges. Simple interpolation may be required. Interpretation of measures of location and dispersion. Ungrouped Data Mean
The mean of a set of data is valuesofnumber
values theall of sum, which can be written as
x fxx
n f
.
When the data is grouped so that it appears in class intervals we must use the midpoints of those intervals to calculate the mean (in which case it is only an estimate of the mean) Example Find the mean of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0, 62.8, 79.1, 85.6, 79.3, 85.2, 69.1, 75.2
75.6 81.4 67.3 81.5 84.0 62.8 79.1 85.6 79.3 85.2 69.1 75.2
12926.1
77.175 kg12
x
Example Find the mean of the following scores:
Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 2
12 3 13 5 ... 17 1 299
14.22 5 ... 2 21
x
Median, Quartiles, and Percentiles The lower quartile of a set of data is the value one quarter of the way into the data set, when written in ascending order. It is represented by the symbol 1Q .
The media n of a set of data is the value one half of the way into the data set, when written in ascending order. It is represented by the symbol 2Q .
The upper quartile of a set of data is the value three-quarters of the way into the data set, when written in ascending order. It is represented by the symbol 3Q .
There are many methods for calculating 1Q , 2Q and 3Q but the following method is recommended
by Edexcel and will get full credit in exams…. The xth percentile, xP , to be the value which is x% of the way through the distribution.
If there are n observations then the xth percentile is calculated as follows: If 100
xn is an integer r then the xth percentile is the midpoint of the rth and 1r th values.
If 100xn lies between the integers r and 1r then the xth percentile is the 1r th value
Also given the letter .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 19
We use the same method to find 1Q , 2Q and 3Q , treating them as the 25th, 50th and 75th percentiles
respectively. If there are n observations then…. the lower quartile is calculated as follows: If 4
n is an integer r then the lower quartile is the midpoint of the rth and 1r th values.
If 4n lies between the integers r and 1r then the lower quartile is the 1r th value.
the median is calculated as follows: If 2
n is an integer r then the median is the midpoint of the rth and 1r th values.
If 2n lies between the integers r and 1r then the median is the 1r th value.
the upper quartile is calculated as follows: If 3
4n is an integer r then the upper quartile is the midpoint of the rth and 1r th values.
If 34n lies between the integers r and 1r then the upper quartile is the 1r th value.
Example Find the median and quartiles of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0, 62.8, 79.1, 85.6, 79.3, 85.2, 69.1, 75.2 First of all list them in order: 62.8, 67.3, 69.1, 75.2, 75.6, 79.1, 79.3, 81.4, 81.5, 84, 85.2, 85.6 There are 12 pieces of data
124 3 so the lower quartile is the midpoint of the 3rd and 4th values, that is
69.1 75.272.15
2
kg
122 6 so the median is the midpoint of the 6th and 7th values, that is 79.2kg 3 12
4 9 so the median is the midpoint of the 9th and 10th values, that is 82.75kg
The interquartile range is 3 1Q Q .
Example Find the median of the following scores:
Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 2
There are 21 pieces of data: 214 5.25 so the lower quartile is 6th value, that is 13. 212 10.5 so the median is the 11th value, that is 14. 3 21
4 15.75 so the upper quartile is the 16th value, that is 15.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 20
Grouped Data Mean Use the midpoints of each category and use the same method for ungrouped data. Example Find an estimate of the mean of the following data:
Height, h (cm) 140- 145- 150- 155- 160- 165-170 Frequency 140 245 354 214 105 57
142.5 140 147.5 245 ... 167.5 57 170387.5
153 cm140 245 ... 57 1115
x
(to 3sf)
Median When dealing with grouped data we have to interpolate to find estimates for 1Q , 2Q and 3Q .
Example Find the median of the following data:
Height, h (cm) 140- 145- 150- 155- 160- 165-170 Frequency 140 245 354 214 105 57
There are 1115 pieces of data so the median is the height associated with the 1115
2 557.5 th value.
This lies somewhere between 150cm and155cm.
So the median lies 172.5
354of the way into the interval so our estimate for
the median is 172.5
150 5 152.4354
cm (to 1dp).
Use exactly the same method for finding lower quartile and upper quartile So the lower quartile is the height associated with the 1115
4 278.75 th value. This lies somewhere
between 145cm and150cm.
The lower quartile lies 278.75 140
245
of the way into the interval so our estimate for the lower
quartile is 138.75
145 5 147.8245
cm (to 1dp).
The first 739 (that is 140 245 354 ) have a height less than 155cm.
Median 150cm 155cm
385 739
The first 385 (that is 140 245 ) values have a height less than 150cm
557.5
557.5 385 172.5
739 385 354
With grouped data, work
with 2
nth value for the
median (eg. 557.5) , do not round up (e.g. to 558) as with ungrouped data.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 21
Mode The mode is the most frequently occurring value. Example Find the mode of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0 There is no mode because all values occur with equal frequency. Example Find the mode of the following scores:
Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 1
The mode is 14. Example Find the modal class of the following data:
Height, h (cm) 140- 145- 150- 160- 180- 200-250 Frequency 140 245 354 214 105 57
Frequency Density 28 49 35.4 10.7 5.25 1.14 The modal class is the class interval with the highest frequency density and so is “145-”. Variance Suppose that we have the following frequency table:
Value 1x 2x 3x 4x
rx
Frequency 1f 2f 3f 4f
rf
The variance is a measure of spread, or dispersion. It is the average of the square of the difference of each piece of data from the mean. That is, it is the average of all the values 2)( xxi .
So we have,
f
fxx 22 )(
Variance .
This can be rearranged so that it becomes easier to use and we have
22
2 Variance xf
fx
.
We tend not to put in the f and to write f as n, giving
2
2 2Variance x
xn
“mean of the squares minus the square of the mean”
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 22
Standard Deviation The standard deviation is simply the square root of the variance.
Hence the standard deviation, , is given by 2 2
2( )x x x
xn n
.
Example Find the standard deviation of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.2
First of all we calculate the mean as 75.6+81.4+67.3+81.5+84.2 390
785 5
x
We use this to get that the standard deviation is
2 2 2 2 2
275.6 +81.4 +67.3 +81.5 +84.278 6.04
5 kg (to 3sf)
Example Find the standard deviation of the following scores:
Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 1
First of all we calculate the mean as 12 3 13 5 ... 17 1 282
14.12 5 ... 1 20
x
We use this to get that the standard deviation is
2 2 2
212 2 13 5 ... 17 114.1 1.3
20
Example Find an estimate of the standard deviation of the following data:
Height, h (cm) 140- 145- 150- 155- 160- 165-170 Frequency 140 245 354 214 105 57
The only difference between this and the previous example is that we have to use midpoints. First of all we calculate the mean as
1531115
5.170387
57...245140
575.167...2455.1471405.142
x cm (to 3sf)
We use this to get that the standard deviation is
2 2 2
2142.5 140 147.5 245 ... 167.5 57153 6.6
1115 cm (to 2sf)
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 23
Understanding and use of coding. Coding Suppose we had to find the mean and standard deviation of 200.037, 200.039, 200.042, 200.054. Suppose now that we now consider 1000 200t x .
Our values of t are 37, 39, 42, 54 which are easier to deal with than the initial data. The summary statistics for t are : 5n , 172t and 2 7570t .
It follows from this that 172
434
t and 2757043 6.59
4t (to 3sf).
Shifting numbers along by 200 has no effect on the dispersion (i.e. the standard deviation) but it does shift the mean by 200. Dividing by 1000 has the effect of reducing both the mean and the standard deviation by a factor of 1000.
Since 2001000
tx it follows that
2001000
tx and that
1000t
x
Hence we see that 43
200 200.0431000
x and that 6.59
0.006591000x (to 3sf).
In general if we have x at b then x at b and x ta .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 24
Using calculator to find mean and standard deviation
Height (cm) 8-12 13-17 18-22 23-27 28-32 Frequency 6 8 12 8 6
On Texas…
Press to get:
Now press to get
Type in the following:
Press and choose CALC, 1-Var Stats
to see this:
Now type in to get
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 25
Press to see the following
Write down:
2
80020
40
1760020 6.32 (to 3sf)
40
x
sd
IGNORE Sx for S1.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 26
On Casio…
Scroll down using to get Press 3 to get Press 1
Press then 2 (STAT) and then 1 (1-VAR) to get Type in all the data to get
Press For x and 2x
Press to give Choose 4 (sum) to get Choosing 1 gives 17600
Choosing again. Choose 4 (sum) and then 2 gives 800.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 27
For the value of n, standard deviation and mean
Press to give
Choose 5 (Var) to give
Choosing 1 gives n to be 40. For standard deviation Choose 5 (Var) and then 3 to get
So standard deviation us 6.23 (to 3sf)
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 28
Skewness and Outliers Skewness. Concepts outliers. Any rule to identify outliers will be specified in the question. Skewness A distribution that is not symmetric is said to be skewed.
Mean Median Mode
Positive Skew
Mean Median
Mode
Negative Skew
If the distribution has a long ‘tail’ of high values then it is said to be positively skewed. In this case the median is less than the mean.
If the distribution has a long ‘tail’ of low values then it is said to be negatively skewed. In this case the median is greater than the mean.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 29
Two tests for skewness
Positive skew if 3 2 2 1 0Q Q Q Q and if Mean > Median > Mode
Negative skew if 3 2 2 1 0Q Q Q Q and if Mean < Median < Mode
A good way to remember the condition for skewness involving mean, median and mode. Write down the three averages in alphabetical order, that is
Mean…Median…Mode
For positive skew fill the blank with a “>” sign to give
Mean > Median > Mode For negative skew fill the blank with a “<” sign to give
Mean < Median < Mode
Outliers Values which are at one of the extremes of the distribution are said to be outliers. How extreme a value has to be in order to be classified as an outlier will be stated in the question. One measure of an outlier would be if the value was greater than 3 3 11.5Q Q Q at one end or if
it was less than 1 3 11.5Q Q Q at the other end.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 30
Probability Elementary probability. Sample space. Exclusive and complementary events. Sum and product laws.
The probability of Tottenham Hotspur winning their next game is 0.75. The probability of scoring a six when a die is thrown is 1
6 .
The probability of a part made in a factory being faulty is 0.02.
Probabilities can be expressed either as a fraction or a decimal. All probabilities lie between 0 and 1.
Sets and Venn Diagrams A set is a collection of items – such as numbers, people, letters etc. For example Set A is the set of pupils in year 11 who wear glasses Set B is the set of pupils in year 11 who have blonde hair A Venn diagram is a diagram which represents various sets. It is made up of a large rectangle with various oblongs inside to represent the various sets. These may or may not overlap. By the side of the large rectangle is the symbol . This represents “the universal set” and so could be, for example, all year 11 pupils. In this example only year 11 pupils in this school would feature in the Venn diagram. If an element is in set A and also in set B then it lies in the intersection of the two oblongs. If an element is neither in A or B then it lies inside the rectangle but outside both oblongs.
0
If an event has a probability of 0 then it will certainly not occur.
If an event has a probability of 1 then it will certainly occur.
It is more likely that an event occurs as its probability moves along this line.
event not happening 1 event happeningp p
A B
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 31
Example The diagram shown below is a Venn diagram indicating who enjoy swimming and running in a class. (a) There are 5 students who enjoy both running and swimming as there are 5 students lying in the intersection of the two oblongs. (b) There are 13 who enjoy running. That is the 8 who enjoy running but not swimming and the 5 who enjoy both running and swimming. (c) There are 10 students who enjoy neither running nor swimming. These are the 10 who lie outside both oblongs. Example In a class of 33 pupils, 20 like chess, 12 like draughts and 5 like neither. Represent this on a Venn diagram. Those who like chess will be represented by one oblong, those who like draughts will be represented by another oblong. Those who like both are in the overlap of the oblongs whilst those who like neither will lie outside both oblongs. So the 4 who like neither go outside the two oblongs.
How is the rest of the Venn diagram to be filled in? Think of the two oblongs are being two rugs which can overlap. One rug has area 20, one has area 12. The combined area has to be 28. If the oblongs did not overlap their combined area would be 32 so the overlap must be 4. So…
There are 20 who like chess in all so 16 must go in here.
There are 12 who like draughts in all so 8 must go in here
Running Swimming
8 5 3
10
Chess Draughts
5
Chess Draughts
5
16 4 8
These 5 pupils do not lie in either oblong so that leaves 28 to be put in the three positions of the oblongs.
(a) How many students enjoy both running and swimming? (b) How many enjoy running? (c) How many enjoy neither running nor swimming?
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 32
Notation
NB When A is a subset of B , A B B .
AB represents the fact that A is not a subset of B Mutually Exclusive This is an example of two events being mutually exclusive. A and B are said to be mutually exclusive if they cannot both occur at the same time, for example getting a six and a five from one throw of the die.
A B
A B
A B represents all the elements which belong to A or B or both.
A B represents all the elements which belong to A and B.
A
B
A B represents the fact that A is a subset of B
A B
A represents all the elements outside of A.
In this example A B since the two oblongs do not intersect.
A B
A B
From this we see that P A B P A P B P A B
If A and B are mutually exclusive then )()()( BPAPBAP .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 33
Example If there are 7 red balls, 2 blue balls and 6 yellow balls in a bag. A ball is chosen out of the bag then what is the probability that the ball is: (a) blue (b) red (c) either red or blue
(a) 2 out of 15 of the balls are blue so the probability of picking a blue ball is 2
15
(b) 7 out of 15 of the balls are blue so the probability of picking a red ball is 7
15
(c) The events of being red and blue are mutually exclusive so: Probability of being red or blue = Probability of being red + Probability of being red
2 7
15 159
153
5
BE CAREFUL…
Example A spinner is equally likely to land on the numbers 1, 2, 3, 4, 5 or 6. Find the probability that the spinner lands on: (a) an even number (b) a number greater than 4 (c) an even number or a number greater than 4
(a) There are three even numbers (that is 2, 4, 6) so the probability is 3 1
6 2 .
(b) There are two numbers greater than 4 (that is 5 and 6) so the probability is 2 1
6 3 .
(c) The event of landing on an even number and the event of landing on a number greater than 4 are not mutually exclusive so the probabilities cannot be added. The numbers which are even or greater than 4 are 2, 4, 5 or 6.
Hence the probability is 4 2
6 3 .
Example The probability of a girl in a school wearing glasses is 0.3 and the probability of a girl having blond hair is 0.4. Joshua thinks that the probability of wearing glasses or having blond hair is 0.7. Is he right? The answer is that he is not right. Some people have blond hair and wear glasses. In other words, wearing glasses and having blond hair are not mutually exclusive, so their probabilities cannot be added together.
1 2
3
4 5
6
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 34
Set notation Set A might be all the blonde haired pupils in year 11 in a school. It could also be the set of even numbers less than 10. In which case curly brackets are used and A is written as 2, 4,6,8A .
4 A means that 4 is an element of A. 9 A means that 9 is not an element of A.
( )n A represents the number of elements in A. If 2, 4,6,8A then ( ) 4n A .
Example
is all whole numbers less than 10.
: is an even numberA x x : is a prime numberB x x
(a) Express this on a Venn diagram. (b) List all the elements of A . (c) What is n A B ?
(d) Find A B . (a) Since is all whole numbers less than 10, the only numbers appearing on the Venn diagram are 1, 2, 3, 4, 5, 6, 7, 8 and 9 (b) A represents all the elements outside of A. So 1, 3, 5, 7, 9A
(c) A B represents all the elements which belong to A or B or both. So 2,3, 4,5,6,7,8A B . There are 8 elements in this set so 8n A B .
(d) A B represents all the elements which belong to A and B. So 2A B since 2 is the
only number which is both even and prime. Shading Unions Example A B includes any region that is shaded in A or B (or both). Area represented by A Area represented by B Area represented by A B
A B
A B
=
A B
4, 6, 8 2 3, 5, 7
1, 9
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 35
Example A B includes any region that is shaded in A or B (or both). Shading Intersections Example A B includes any region that is shaded in both A and B . Area represented by A Area represented by B Area represented by A B Some important regions to know
A B
A B
A B
=
A B
C
A C
B
A C
B
Area represented by A Area represented by B Area represented by A B
=
A B
A B
A B
A B
A B A BA B
A B
A B
A B
A BB
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 36
Combined Events P A B means the probability of A happening, given that B has happened.
Independent Events Two events A and B are said to be independent if they have no effect on each other (e.g. being late to school in Kenya and raining in Bolivia are independent).
If A and B are independent then B happening has no effect on A happening so P A B P A .
If A and B are independent then B then, since
( )( )
P A BP A B
P B
, it follows that
( )
( )P A B
P AP B
and so P A B P A P B .
A B
P A B means the probability of A
happening, given that B has happened.
Referring to the Venn diagram, P A B
represents the probability of being in the shaded area
So
( )( )
P A BP A B
P B
If A and B are independent then P A B P A P B
The probability of A happening given that B happens is
represented by P A B where
( )( )
P A BP A B
P B
.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 37
Exam Question (S1, January 2006) For the events A and B,
P(A B) = 0.32, P(A B) = 0.11 and P(A B) = 0.65.
(a) Draw a Venn diagram to illustrate the complete sample space for the events A and B. (3)
(b) Write down the value of P(A) and the value of P(B). (3)
(c) Find P(AB). (2)
(d) Determine whether or not A and B are independent. (3)
(a) (b) 0.32 0.22 0.54P A
0.22 0.11 0.33P B
(c) If 0.33P B then 1 0.33 0.67P B
0.32 32
0.478 to 3dp0.67 67
P A BP A B
P B
(d) 0.54 0.33 0.1782P A P B
0.22P A B
So P A B P A P B and so A and B are not independent.
OR…. 0.54P A and 0.478 to 3dpP A B so P A B P A and so A and B are
not independent.
A B
0.32 0.110.22
0.65 0.11 0.32 0.22
0.35
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 38
Example If a coin is tossed and a die is rolled then find the probability of getting a head on the coin and a five on the die. Suppose the event A is getting a head on the coin and that the event B is getting a five on the die.
1p
2A and 1
p6
B .
These are independent events so 1 1 1p and p p
2 6 12A B A B .
Example Alexander fires three arrows at a target. With each arrow, the probability that he hits the target is 0.3. Whether he hits with any given arrow is independent of what happened to the previous arrows he fired.
Find the probability that he hits with the first two but misses with the third. The probability that he hits the target with his first arrow is 0.3. The probability that he hits the target with his second arrow is 0.3. The probability that he misses the target with his third arrow is 1 0.3 0.7 . Since the events are assumed to be independent, the probability that he hits with the first and hits with the second and misses with the third is obtained by multiplying the three numbers given above. So the probability that he hits with the first two but misses with the third is 0.3 0.3 0.7 0.063 .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 39
Questions often involve choosing two or more balls from a bag. In these questions it is vital to establish whether or not the first ball has been replaced before the next one has been chosen. Read the question carefully.
Tree Diagrams Tree diagrams are a very clear way of representing the possible outcomes of combined events. On tree diagrams: as we move across we use a cross ( ) and so multiply probabilities as we move Down we aDD probabilities
The case when the ball is replaced…. Example A bag contains 10 balls, seven of which are red and the rest are green. A boy takes out a ball from the bag, notes its colour and puts it back. He then repeats this process. (a) Draw a tree diagram to represent this information. (b) Find the probability that he chose two red balls. (c) Find the probability that he chose two balls of different colours.
(a)
(b) Probability of two reds is 7 7 49
10 10 100 (as we move across we use a cross ( ))
(c) 7 3 21
p( )10 10 100
RG and 3 7 21
p( )10 10 100
GR
As we move Down we aDD so probability is 21 21 42 21
100 100 100 50
7/10R 49/100
3/10 G 21/100
7/10 R 21/100
3/10 G 9/100
7/10 R
3/10 G
as we move across we use a cross ( ).
7 710 10
Notice that the probabilities on the branches that start from the same point always add up to 1 .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 40
The case when the ball is not replaced…. Example 3 red balls and 2 green balls in a bag. One ball is chosen at random, its colour is noted and it is not replaced. A second ball is then chosen and its colour is also noted. Find the probability that: (a) Both balls are red. (b) The second ball is green (c) At least one of the balls is green. (d) The first ball was red, given that the second ball was green. On tree diagrams: as we move across ( ) we multiply. as we move Down we aDD (a) 3 6 32
5 4 20 101 2( )P R R (b) 6 82 2
20 20 20 52 1 2 1 2(G ) (R G ) (G G )P P P (c) Either “At least one of the balls is green” or both of them are red.
So the probabilities of these two events add up to 1. So 1 2(at least one green) 1 ( )P P R R
The tree diagram shows us that 3
101 2( )P R R .
And so we have that
1 2
6 71420 20 10
(at least one green) 1 ( )
1
P P R R
(d) We need to find 1 2P R R .
61 2 20 3
41 2 8202
P R GP R G
P G
35 R
G
R
G
R
G
24
34
25
24
14
620
220
620
620
When the 5 balls are in the bag, the probability of choosing a red is 3
5 , since 3 of the 5 balls are red.
If a red ball is chosen first, then there are now 4 balls in the bag and only 2 of them are red. The probability of choosing a red is 2
4 .
3 25 4
Questions involving “At least one” This is an example of an important short cut. You DO NOT need to think of all the cases when there is at least one green ball.
Use the notation that 1R represents
the first ball being red etc.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 41
Example 60% of the population of a certain town are vaccinated against flu. The probability of someone getting the flu given that they have had the vaccination is 0.2 but the probability of someone getting flu given that they have not had the vaccination is 0.65. (a) Find the probability that a person chosen at random gets flu. (b) Write down the probability that a person chosen at random doesn’t get flu given that he
didn’t have the vaccination. (c) Find, as a fraction, the probability that that a person chosen at random didn’t have the
vaccination given that he did get flu. (a) There are two groups of people who get flu. Either a person has a vaccination and gets flu or a person does not have a vaccination and gets flu. The probability of a person has a vaccination and gets flu is 0.12 The probability of a person does not have a vaccination and gets flu is 0.26 As we move Down we aDD so probability of getting flu is 38.026.012.0)( FP So 38.026.012.0)( FP (b) 35.0)( NVNFP (straight from tree diagram)
(c) ( ) 0.26 26 13
( )( ) 0.26 0.12 38 19
P NV FP NV F
P F
0.6
0.4
V
NV
0.2
0.8
0.65
0.35
F
NF
F
NF
0.12
0.48
0.26
0.14
Notice that the probabilities on the branches that come from the same point always add up to 1 . So 0.6 0.4 1
0.2 0.8 1
0.65 0.35 1
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 42
Correlation Scatter Diagram Many statistical investigations have to do with the relationship between several variables. We will now examine ways of handling data with two variables.
Suppose we want to compare two sets of data x and y. For example x might be the boy’s mark in a Maths exam and y might be the boy’s mark in an English exam.
Boy A B C D E F G H I J Maths Mark, x 1 8 15 17 23 28 33 39 45 45 English Mark, y 3 15 9 20 19 17 36 26 15 29
We plot these values on a graph in the form of a scatter diagram.
In the diagram we have marked on the dotted lines through x and y . This splits the diagram into four quadrants. If all the points of our scatter diagram lie near a straight line then we say that there is linear correlation between x and y.
(a) If y tends to increase as x increases then we say that there is a positive linear correlation. In this case most points will be in either the first or the third quadrants.
(b) If y tends to decrease as x increases then we say that there is a negative linear correlation. In this case most points will be in either the second or the fourth quadrants.
(c) If there is no relationship between x and y then we say that there is no correlation. In this case the points will be randomly distributed in all the quadrants.
Scattter Diagram of M aths and English M arks
05
10152025303540
0 10 20 30 40 50
M aths M ark
Eng
lish
Mar
k
x
y
4th
2nd 3rd
1st
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 43
How do we measure the strength of linear correlation? Suppose, for example that we look at the case above. If ,i ix y represented the general point then
because most points lie in either the first or the third quadrant we would find that i ix x y y
was positive most of the time and so we would find that i ix x y y would be positive.
However the problem with i ix x y y is that it varies with the size of x and y.
To avoid this we divide by 2
ix x and by 2
iy y and we end up with the product
moment correlation coefficient. Product Moment Correlation Coefficient (PMCC) The product moment correlation coefficient, its use, interpretation and limitations. Derivations and tests of significance will not be required. As stated above this measures the degree of linear correlation between x and y.
The coefficient r, known as the PMCC, is defined as r x x y y
x x 2y y 2
.
This can also be written as:
2 2
2 2
x yxy
nrx y
x yn n
FORMULA BOOK HAS
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 44
We now define three further quantities, which are, xxS , xyS and yyS . These are defined as
follows:
2
2 2xx
xS x x x
n
xy
x yS x x y y xy
n
2
2 2yy
yS y y y
n
Using these we see that xy
xx yy
Sr
S S .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 45
Now consider the properties of this coefficient.
We know that 20 for all real x x y y .
2 22
2
2 2 2
2 2 2 2
2 2
Hence 2 0 for all real
This quadratic in is always postive and so " 4 ". Hence
4 4
1 1
1 1
y y y y x x x x
b ac
y y x x x x y y
x x y y y y x x x x y y
y y x x
x x y y
r
NB1 1r if all the points lie on a straight line with positive gradient, in which case we say there is perfect positive correlation. NB2 1r if all the points lie on a straight line with negative gradient, in which case we say there is perfect negative correlation. NB 3 The lower the value of r the less linear correlation there is.
NB4 r is only a measure of a linear relationship between x and y. They may be related in another way but r would still be very low. NB5 Coding the values of x and y does not change the product moment correlation coefficient, r. So if we make X ax b and Y cx d then the product moment correlation coefficient of x and y is the same as the product moment correlation coefficient of X and Y. NB6 Suppose we told to make X ax b and Y cx d and are then asked to find the regression line of Y on X having been given the summary statistics for x and y. First off find the regression line of y on x then replace y and x in that equation with
X b
xa
and
Y by
a
.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 46
Regression
Scatter diagrams. Linear regression. Explanatory (independent) and response (dependent) variables. Applications and ations. Use to make predictions within the
range of values of the explanatory variable and the dangers of extrapolation. Derivations will not be required. Variables other than x and y may be used. Linear change of variable may be required. When we looked at the product moment correlation coefficient we were looking to measure the amount of linear association between two quantities x and y. We now want to go further and find the relationship between x and y when we know that there is this linear association. Once have plotted our data on a scatter diagram we look for a relationship y a bx (since the relationship is linear). We work back or ‘regress’ from the points we have plotted to find the equation of the line and hence the line is called the regression line of y on x. If we had a set of data such as
x 10 20 30 40 50 y 17 23 29 32 38
then x has been controlled and so it is called the explanatory (or independent) variable. y is called the response (or dependent) variable. If there is a linear relationship between y and x then for each value of iy there will be an
experimental error associated with it. We denote this error with i which can be either positive or
negative. Therefore we can write i i iy x . The mean value of the errors, i , will be about
zero. So if we are given a set of points how do we determine the equation of that line? How do we find a and b? We could simply try to draw a line of best fit by eye. There is, however, a standard line that we draw, called “least-squares regression line” which minimises the sum of squares of the distances shown below.
x
y y a bx
1 1,x y
1 1,x a bx
The explanatory (or independent) variable always goes on the horizontal axis
interpret
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 47
These distances could be denoted by id where i i id y a bx . We are, therefore, trying to
minimise 2
i iy a bx .
We can derive the values of a and b using calculus and when we do so we find that xy
xx
Sb
S where,
as we have already seen,
xy
x yS x x y y xy
n
and 2
2 2xx
xS x x x
n
Once we have the value of b we can find a using the fact that the line passes through the point ( , )x y . It follows from this that y a bx and so that a y bx .
So we say that the equation of the regression line of y on x is y a bx where xy
xx
Sb
S and
a y bx .
When using coding we might make x p
Xq
and
y mY
n
. We could then use the values of X
and Y to find the line of regression of Y on X. This would be Y a bX .
What we have left to do, is, therefore to replace X with x p
q
and Y with
y m
n
and so we get:
y m x pa b
n q
. We can then rearrange this to get y in terms of x.
We can use this find to find an estimate for the mean value of y (the response variable) when we have been given a value of x (the explanatory variable). This process is called interpolation. NB: When finding a value of y from a value of x we must be careful only to use values of x which lie in outside the range of x values which have been given. If we predict values of y using values of x which lie outside the range of given values then we are using extrapolation. We should take great care when extrapolating to see if this can be justified.
NB : When drawing the line of best fit always mark the point ,x y on your diagram and make
sure that the line of best fit passes through this point.
A word on ‘interpretation’ If the x values were the lengths of salmon and the y values were the weights of salmon and if r was close to 1 then the interpretation is not “positive correlation. It is “as the length of the salmon increases, so the weight of the salmon increases”. The January 2011 markscheme suggests that if r is greater than 0.4 then this is evidence of positive correlation though this is not explained until S2.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 48
Using calculators to find product moment correlation coefficient and line of regression
For example find the summary statistics for the data shown below..
1 4 7 10 13 16 19
5 8 13 2 11 27 35
On Texas….
1. Press
2. Press and type in all the data so that it look as follows
3. Press
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 49
4. Highlight CALC
5. Highlight 2-Var Stats and press twice and you get this:
Scrolling down gives us all we need to use in the formulae
2
2
70
101
1388
952
2337
7
x
y
xy
x
y
n
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 50
On Casio….
1. Press
2. Press to get
3. Press (A+BX) to get
4. Type in all the data so that it look as follows
5. Press to get
6. Press to get
7. Press to get
NB : To go back to Data on Casio choose
to get
Press to get back to
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 51
8. Using the different options gives us
2
2
70
101
1388
952
2337
7
x
y
xy
x
y
n
Using the values from the calculator….
Formula book tells us, for example, that 2
2 2xx
xS x x x
n
.
So we see that 270
952 2527xxS .
Formula book tells us, for example, that 2
2 2xx
xS x x x
n
.
So we see that 270
952 2527xxS .
Similarly the formula book tells us that
xy
x yS xy
n
so we see that
70 1011388 378
7xyS
In the same way the formula book tells us that that 2
2yy
yS y
n
so we see that
21012337 880
7yyS .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 52
PMCC
We could calculate directly using xy
xx yy
Sr
S S to give
3780.803
252 880r
(to 3sf).
On Texas…
Type , choosing 5: Statistics, choosing EQ and then 7 to get 0.803. On Casio…
Press and then to get
Press to get
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 53
Regression
We are also given that the regression line of y on x is y a bx where xy
xx
Sb
S .
So we can calculate 378
1.5252
b .
We then use the fact that y a bx passes through the point ,x y to see that a y bx and so
101 701.5 0.571
7 7a (to 3sf).
On Texas…
Press , choose CALC and press 4 to get the following
On Casio…
Press to get
Press to get
Press to get
Press to get
NB Formula book refers to y a bx , whilst Texas uses y ax b
NB Casio uses the same notation as the formula book, that is y a bx .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 54
Discrete Random Variables The concept of a discrete random variable. A random variable is a variable, taking numerical values, which represents the value obtained when we take a measurement from a real-life experiment. There are two kinds of random variable - discrete and continuous. A continuous random variable can take any value in a given range, whereas a discrete random variable can take only certain values. The probability function and the cumulative distribution function for a discrete random variable. Simple uses of the probability function p(x) where ( ) ( ) p x P X x . The set of all possible values of a random variable along with the associated probabilities is called the probability distribution. Example Suppose a coins is tossed three times and the number of heads is the random variable X. The probability distribution for X is as follows:
Number of Heads (X) 0 1 2 3 Probability 0.125 0.375 0.375 0.125
Sometimes the probabilities are expressed as a function as opposed to a table. This function is called the probability function. Example Find the probability function for X, the score on a die. We know that 1 1 1
6 6 6( 1) , ( 2) , ( 3)P X P X P X etc. and so we get the formula
16 for 1, 2,3, 4,5,6
( )0 otherwise
xP X x
NB ( )p x (or simply p) is a short way of writing P(X x). If X is a continuous random variable we cannot talk about the probability of X taking particular values but only of the probability of it taking a range of values, for example ( 3)P X . We cannot, therefore, use a simple probability function to define X.
NB: ( ) 1p x Example Find p in the following table
x 1 2 3 4
P X x 0.2 0.1 p 0.55 Since ( ) 1p x , it follows that 0.2 0.1 0.55 1p and so 0.15p .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 55
Use of the cumulative distribution function 0
0 0 ( )x x
F x P X x p x
Just as we have met cumulative frequency graphs in GCSE so we know meet a cumulative distribution function for a random variable, X. The cumulative distribution function of X is defined as 0 0F x P X x so that 0F x means
“the probability that X is less than or equal to 0x ”.
If X is discrete then we can also write 0F x as
0
0 ( )x x
F x p x
.
If r is the largest value which X can take then 1F r .
If s is the smallest value which X can take then F s P X s .
If X takes consecutive integer values then 1 1F r F r P r
Example Find 1F and 3F in the following
Number of Heads (X) 0 1 2 3 Probability 0.125 0.375 0.375 0.125
(1) 0 1 0.125 0.375 0.5F P X P X
3 is the largest value that X can take so (3) 1F .
Example The random variable X is such that 2
16
k xF x
for 0,1,2,3x . Find the positive
integer k is positive and draw the probability distribution table for X.
3 is the largest value that X can take so (3) 1F . 23
3 116
kF
. So 2
3 16k .
3 4k so 7 or 1k k . k is positive, so 1k .
Using 21
16
xF x
we have:
x 0 1 2 3
F x 1
16
4
16
9
16
16
16
3 0 1 2 3F P X P X P X P X and
2 0 1 2F P X P X P X so 3 2 3F F P X . So
16 9 73
16 16 16P X .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 56
Similarly 2 1 2F F P X so 9 4 52
16 16 16P X .
And 1 0 1F F P X so 4 1 31
16 16 16P X .
And 0 0F P X so 10
16P X
.So we have the following table
x 0 1 2 3
P X x 1
16
3
16
5
16
7
16
Mean and variance of a discrete random variable. Use of E X , 2E X for calculating the
variance of X. Knowledge and use of E EaX b a X b and 2Var VaraX b a X .
The mean of a probability distribution is called the expected value of the random variable X. It is written as E(X) for short and its symbol is .
We have already met the formula for the mean of a sample, i.e. x xff and by replacing the
relative frequency f
f with p we have the formula:
E(X) xp for the random variable X
We have also met the formula 2
2variancex f
xf
.
If we replace the relative frequency f
f with p as we did before and replace x with E X then
we have the formula, 2 2Var( ) E( )X x p X .
Now for any function g(X) we have E( )g X g x p and so we see from this that 2 2E( )X x p .
Hence we have the following formula for the variance of X:
22Var( ) E EX X X
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 57
In an exam if all the scores were doubled then the mean score would clearly double. Since every
value of x would increase by a factor of 4 it follows that both 2E X and 2E X increase by a
factor of 4. Hence 22Var( ) E EX X X increases by a factor of 4.
So E 2 2E and Var 2 4VarX X X X .
Similarly if in exam if all the scores were increased by 10 then the mean would increase by 10. However, since the spread of the scores has not changed, the variance remains the same. So E 10 E 10 and Var 10 VarX X X X .
Generalising the results shown above gives the following:
2
2
E E
E E
Var Var
Var Var
aX a X
aX b a X b
aX a X
aX b a X
Example The random variable X has the following probability distribution :
X 1 2 5
P X x 0.1 0.2 0.7
Find
(a) E
(b) Var
(c) E 3 1
(d) Var 10 7
X
X
X
X
(a) & (b) E 1 0.1 2 0.2 5 0.7 4X
2 2 2 2 2E 1 0.1 2 0.2 5 0.7 4 18.4X
2 2Var( ) E( ) E( ) 18.4 16 2.4X X X
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 58
On Casio…
Scroll down using to get Press 3 to get Press 1
Press then 2 (STAT) and then 1 (1-VAR) to get
Type in all the data to get
Press For mean and variance
Press to give
Choose 5 (Var) to get Choosing 2 gives the mean to be 4.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 59
Choosing again. Choose 4 (Sum) and then 1 to get 2x , that is 2E X when dealing
with probabilities not frequencies. This is 18.4
Choosing again. Choose 5 (Var) and then 3 to get standard deviation. Square to give the variance to be 2.4.
If using the calculator, write down 22 2Var E E 18.4 4 2.4X X X .
(c) Use E 3 1 3E 1 3 4 1 13X X
DO NOT USE 3 1Y X
y 4 7 16
P Y y 0.1 0.2 0.7
(d) Use 2 2Var 10 7 10 Var 10 2.4 240X X
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 60
The discrete uniform distribution. This is the distribution in which the random variable can take a set of, say, n values and X is equally likely to take each value (e.g. X being the score on the die). In general, then, if X can take the values
1 2 3, , ,... nx x x x we have 1
( )rP X xn
for 1, 2, 3, ...r n .
The mean and variance of this distribution. In the above example we see that:
1 1
1 1E
n n
r rr r
X x xn n
and 2 2
2 2
21 1 1 1
1 1 1 1Var
n n n n
r r r rr r r r
X x x x xn n n n
If the random variable X took the values 1, 2, 3, 4, …n then
1 1
1
2
n n
rr r
n nx r
and
2 2
1 1
1 2 1
6
n n
rr r
n n nx r
(these are standard results).
In which case we see that
1
1 11 1E
2 2
n
rr
n n nX x
n n
And, in a similar way we see that
22
21 1
2 2
2
2
1 1Var
1 2 1 1 1 2 1 11 1
6 2 6 4
1 14 2 3 1 1
12 12
1
12
n n
r rr r
X x xn n
n n n n n n n n
n n
n nn n n
n
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 61
So for the random variable X which has discrete uniform distribution over the integers 1, 2 , 3,…, n
we have 1E
2
nX
and
2 1Var
12
nX
From this we can, using coding, find E X and Var( )X if X has discrete uniform distribution over,
for example, the integers 3, 5 , 7,…, 19. We say that 2 1X Y and so Y has discrete uniform distribution over the integers 1, 2 , 3,…, 9.
We know, therefore, that 9 1E 5
2Y
and that
29 1 80 20Var( )
12 12 3Y
.
Using the coding we see that :
E =E 2 1 2E 1 11X Y Y and that 80
Var( ) Var(2 1) 4Var( )3
X Y Y
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 62
Normal Distribution The Normal distribution including the mean, variance and use of tables of the cumulative distribution function. Knowledge of the shape and the symmetry of the distribution is required. Knowledge of the probability density function is not required. Derivation of the mean, variance and cumulative distribution function is not required. Interpolation is not necessary. Questions may involve the solution of simultaneous equations. The normal distribution provides us with a suitable model for a very large number of distributions and so is the most important distribution in statistical theory. Heights or weights of men or women, time taken to run the 200m etc. all follow the normal distribution. It has some key features, notably that the vast majority of data is centred around the mean and that very small or large values are rare. It is symmetrical about the mean (an hence the mean is also equal to the mode and the median). We describe the normal distribution using two parameters, its mean and its variance 2 . So if
the random variable X has normal distribution with mean and variance 2 we write X~ 2,N
for short.
A sketch of the distribution of Z~ N 0 ,1 is shown below:
We have looked at the cumulative distribution functions of discrete random variables. We are now dealing with a continuous random variable and so we cannot talk about the probability of X taking any particular value. We can only talk about the probability of X lying between two values a and b.
Z is the letter given to the random variable which has mean 0 and variance 1. In fact the random variable Z is called the standard normal distribution and it and so is written as N 0 ,1 .
NB: The total area under this curve is 1.
So 21
21
e d 12
tt
The equation that generates
this curve is 21
21
e2
xy
.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 63
The cumulative distribution function for the random variable Z is written as z .
In other words 21
21
( )2
zt
z P Z z e dt
.
From the tables we have 0 0.5, 1 0.8413, 2 0.9772, 3 0.9987 etc.
It is always true that
The symmetry of the graph gives us the following: These two areas combine to give 1 and so we have the following:
In the same way
Combining (3) and (1) these gives
z
So z is the area shown in the diagram.
z -z
z z
1z z (2)
-z z
P Z z
P Z z
1P Z z P Z z (3)
1P Z z z (1)
P Z z z (4)
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 64
We need to be able to use the tables in the formula book.
Finding a probability when given a value of z Three stages: a. Turn it into a “less than problem”, using 1P Z z z
b. Ensure that the z value is positive, using 1z z
c. Read off from tables We use the large table to find probabilities when we have been given values of z: Case 1: Find 1.7 .
a. Already is “less than problem”. b. z value is already positive. c. Read off from tables, to give 1.7 0.9554
Case 2: Find 0.8 .
a. Already is “less than problem” b. Use 1z z to get 0.8 1 0.8
c. Read off from tables. 0.8 1 0.8 1 0.7881 0.2119
Case 3: Find 0.3P Z
a. Use 1P Z z z to get 0.3 1 0.3P Z
b. z value is already positive. c. Read off from tables. 0.3 0.6179 and so 0.3 1 0.6179 0.3821P Z
Case 4: Find 1.7P Z .
a. Use 1P Z z z to get 1.7 1 1.7P Z
b. Use 1z z to get 1 1.7 1 1 1.7 1.7
c. Read off from tables. 1.7 0.9554
NB : We could have used P Z z z to give
1.7 1.7P Z
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 65
With our knowledge of these four cases we can find the area between any limits:
Example Find 0.7 2.1P Z .
0.7 2.1 2.1 0.7
2.1 0.7
0.9821 0.7580
0.224 (to 3dp)
P Z P Z P Z
Example Find 1.2 1.8P Z .
1.2 1.8 1.8 1.2
1.8 1.2
1.8 1 1.2
0.9641 1 0.8849
0.849 (to 3dp)
P Z P Z P Z
Example Find 0.9 0.1P Z .
0.9 0.1 0.1 0.9
0.1 0.9
1 0.1 1 0.9
0.9 0.1
0.8159 0.5398
0.276 (to 3dp)
P Z P Z P Z
NB : It is worth knowing that P a Z b P b Z a
a b
P a Z b
P Z b P Z a
b a
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 66
Finding a value of z when given a ‘neat’ probability (0.3, 0.8, 0.05 etc.) We use the small table to find probabilities when we have been given values of z: Three stages: a. Turn it into a “greater than problem”, using 1P Z z z
b. Ensure that the probability is less than 1
2 , using 1P Z z P Z z
c. Read off from tables Case 1 : Solve 0.2P Z z
a. Already is a “greater than problem”.
b. The probability is already less than 1
2
c. Read off from tables to give 0.8416z . Case 2 : Solve 0.7P Z z
a. Already is a “greater than problem”. b. Use 1P Z z P Z z to give 1 0.7 0.3P Z z
c. Read off from tables to give 0.5244z and so 0.5244z . Case 3 : Solve 0.9z
a. Use 1P Z z z to give 0.1P Z z .
b. The probability is already less than 1
2 .
c. Read off from tables to give 1.2816z . Case 4 : Solve 0.15z
a. Use 1P Z z z to give 0.85P Z z .
b. Use 1P Z z P Z z to give 1 0.85 0.15P Z z
c. Read off from tables to give 1.0364z and so 1.0364z .
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 67
We have done all the working so far on Z~ N 0 ,1 . We have seen how to find probabilities and z values. However, heights and weights and so on are not normally distributed with mean 0 and variance 1. It would be impossible to have a set of tables showing us all possible areas for all possible values of
and 2 . So we need some way of turning the random variable X ~ N , 2 into the standard
normal distribution Z~ N 0 ,1 .
The transformation we need to make is Z X
.
From the work we did on E aX b and Var aX b we see that EE 0
XZ
and that
2
Var( )Var( ) 1
XZ
.
All we have done is shift and stretch the curve shown above and so Z still has the normal distribution.
Thus Z X
may be used to transform N , 2 into N 0 ,1 . For example…
X~ 2120 ,5N Z~ N 0 ,1
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 68
Finding probability when given a value of x Five stages:
Write it out using the random variable X Turn it into a “less than problem” Turn it into a “Z problem” Use 1z z to ensure that the value inside ... is positive.
Read off from tables Example The amount of jam in a jar is normally distributed with mean 140g and standard deviation 5.2g. Find the probability that the amount of jam in a jar chosen at random is:
(a) Less than 143g (b) More than 135g (c) Less than 138g (d) More than 146g (e) Between 139g and 145g
(a)
143
143 140
5.2
0.58
0.7190
P X
(c)
138
138 140
5.2
0.38
1 0.38
1 0.6480 0.3520
P X
(e)
139 145 145 139
145 140 139 140
5.2 5.2
0.96 0.19
0.8315 1 0.19
0.8315 1 0.5753
0.4068
P X P X P X
(b)
135
1 135
135 1401
5.2
1 0.96
0.96 0.8315
P X
P X
(d)
146
1 146
146 1401
5.2
1 1.15
1 0.8749 0.1251
P X
P X
This value must rounded to 2dp and stated. The z values in the table are given to 2dp.
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 69
Finding a value of x given a percentage Five stages:
Write it out using the random variable X Turn it into a “greater than problem” Turn it into a “Z problem” Use 1P Z z P Z z to ensure that
the probability on the right hand side is less than ½.
Use tables (REVERSE SIDE) Example The heights of a certain animal is normally distributed with mean 60cm and standard deviation 2cm. (a) Find the height above which 20% of animals lie. (b) Find the height above which 60% of animals lie. (c) Find the height below which 10% of animals lie. (d) Find the height below which 70% of animals lie. (a)
0.2
600.2
2
600.8416
260 2 0.8416 61.7 cm (to 3sf)
P X x
xP Z
x
x
(c)
0.1
0.9
600.9
2
600.1
2
601.2816
260 2 1.2816 57.4 cm (to 3sf)
P X x
P X x
xP Z
xP Z
x
x
SHORTCUT : All these problems come down to finding a value which is either z or z , where the z value is the value linked in the tables above with p (if 0.5p ) or 1 p (if 0.5p ). It is easy to work out whether it is z or z . This is explained overleaf.
(b)
0.6
600.6
2
600.4
2
600.2533
260 2 0.2533 59.5 cm (to 3sf)
P X x
xP Z
xP Z
x
x
(d)
0.7
0.3
600.3
2
600.5244
260 2 0.5244 61.0 cm (to 3sf)
P X x
P X x
xP Z
x
x
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 70
SHORTCUT Example The heights of a certain animal is normally distributed with mean 60cm and standard deviation 2cm. (a) Find the height above which 20% of animals lie. (b) Find the height above which 60% of animals lie. (c) Find the height below which 10% of animals lie. (d) Find the height below which 70% of animals lie. (a) Using the shortcut above, the z value to use is 0.8416 (corresponding to 0.2) Since 20% of animals lie above this value, the value is clearly greater than the mean, so the value is 60 0.8416 2 61.7 cm (to 3sf). (b) Using the shortcut above, the z value to use is 0.2533 (corresponding to 1 0.6 0.4 ). Since 60% of animals lie above this value, the value is clearly less than the mean, so the value is 60 2 0.2533 59.5 cm (to 3sf) . (c) Using the shortcut above, the z value to use is 1.2816 (corresponding to 0.1). Since 10% of animals lie below this value, the value is clearly less than the mean, so the value is 60 2 1.2816 57.4 cm (to 3sf) . (d) Using the shortcut above, the z value to use is 0.5244 (corresponding to 1 0.7 0.3 ). Since 70% of animals lie below this value, the value is clearly greater than the mean, so the value is 60 2 0.5244 61.0 cm (to 3sf) .
Example If X~ 287,3N then find x such that 0.9821P X x .
0.9821 clearly does not appear on the reverse table so we have to use the front.
870.9821
3
x
which gives 87
2.103
x and so 93.3x (to 3sf).
NB : A question may give us a probability which is not one of the values listed on the back table and which is not listed on the front. We need to use the front table and linearly interpolate between the relevant values to find an x value.
Example If X~ 21620,35N then find x such that 0.75P X x .
0.75 clearly does not appear on the reverse table so we have to use the front. It lies between 0.7486 and 0.7517.
0.75 lies 14
31th of the way between 0.7486 and 0.7517. So the z value is
140.67 0.01 0.6745
31
(to 4dp) 1620
0.7535
x
which gives 1620
0.674535
x and so 1640x (to 3sf).
NB : A question may give us a probability which is not ‘neat’, so not one of the values listed on the back table. We then need to use the front table. How to do this is explained below….
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 71
Find mean or standard deviation Five stages:
Write it out using the random variable X Turn it into a “greater than problem” Turn it into a “Z problem” Ensure that the probability on the right hand
side is less than ½. Use tables (REVERSE SIDE)
Example The amount of jam in a jar is normally distributed with mean and standard deviation 5g. Find the mean if the probability that a jam chosen at random contains: (a) more than 142g is 0.05 (b) less than 142g is 0.01 (a) Using the shortcut outlined earlier, 142 5 1.6449 , since 142 is greater than . 142 5 1.6449 133.8g (to 1dp) . (b) Using the shortcut outlined earlier, 142 5 2.3263 , since 142 is less than . 142 5 2.3263 153.6g (to 1dp) . Example X is normally distributed with mean 140g and standard deviation . Find given that the probability of being more than 150g is 0.2. Using the shortcut outlined earlier, since 150 is greater than the mean, 150 140 0.8416 . Hence 11.9g (to 1dp) . Alternatively… 150 0.2
150 1400.2
100.8416
11.9g (to 1dp)
P X
P Z
For use only in Badminton School November 2011 S1 Note
Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 72
Find mean and standard deviation
X is normally distributed with mean and standard deviation . Find and given that the probability of being more than 70 is 0.1 and the probability of being less than 60 is 0.2. Using the shortcut outlined earlier: 70 1.2816 , since 70 is greater than . 60 0.8416 , since 60 is less than . Alternatively
70 0.1 60 0.2
700.1 60 0.8
70 601.2816 0.8
6070 1.2816 0.2
600.8416
60 0.8416
P X P X
P Z P Z
P Z
P Z
We now have two simultaneous equations to solve 70 1.2816 (1)
60 0.8416 (2)
Subtract (2) from (1) gives us
10 1.2816 0.8416
10 2.1232
4.71 (to 3sf)
Plugging this back into (1) gives us 64.0 (to 3sf)