s1 notes (edexcel) · ... a2 notes and igcse / gcse worksheets 1 s1 notes (edexcel) for use only in...

For use only in Badminton School November 2011 S1 Note

Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 1

S1 Notes (Edexcel)



Definitions for S1 Statistical Experiment A test/investigation/process adopted for collecting data to provide evidence for or against a hypothesis. “Explain briefly why mathematical models can help to improve our understanding of real world problems” Simplifies a real world problem; enables us to gain a quicker / cheaper understanding of a real world problem Advantage and disadvantage of statistical model Advantage : cheaper and quicker Disadvantage : not fully accurate

“Write down two reasons for using statistical models” To simplify a real world problem To improve understanding / describe / analyse a real world problem Quicker and cheaper than using real thing To predict possible future outcomes Refine model / change parameters possible Any 2

“Statistical models can be used to describe real world problems. Explain the process involved

in the formulation of a statistical model.” Observe real-world problem Devise a statistical model and collect data (Experimental) data collected Model used to make predictions Compare and observe against expected outcomes and test model; Statistical concepts are used to test how well the model describes the real-world problem Refine model if necessary.

A sample space A list of all possible outcomes of an experiment Event Sub-set of possible outcomes of an experiment. Normal Distribution Bell shaped curve symmetrical about mean; mean = mode = median 95% of data lies within 2 standard deviations of mean 68.3% between one standard deviation of mean

Independent Events

( ) ( ) ( )P A B P A P B Mutually Exclusive Events

( ) 0P A B



Explanatory and response variables The response variable is the dependent variable. It depends on the explanatory variable (also called the independent variable). So in a graph of length of life versus number of cigarettes smoked per week, the dependent variable would be length of life. It depends (or may do) on the number of cigarettes smoked per week. Give two reasons to justify the use of statistical models Used to simplify or represent a real world problem Cheaper or quicker or easier (than the real situation) or more easily modified (any two lines) To improve understanding of the real world problem B1 Used to predict outcomes from a real world problem (idea of predictions) Describe the main features and uses of a box plot.

Two tests for skewness

Positive skew if 3 2 2 1 0Q Q Q Q and if Mean > Median > Mode

Negative skew if 3 2 2 1 0Q Q Q Q and if Mean < Median < Mode

A good way to remember the condition for skewness involving mean, median and mode. Write down the three averages in alphabetical order, that is

Mean…Median…Mode

For positive skew fill the blank with a “>” sign to give

Mean > Median > Mode For negative skew fill the blank with a “<” sign to give

Mean < Median < Mode

Which sort of average and which sort of measure of dispersion should you use? If there are outliers or if the data is skewed then use median and IQR, If the are not outliers then use mean and standard deviation.



Data Discrete Discrete data can only take certain values in any given range. Number of cars in a household is an example of discrete data. The values do not have to be whole numbers (e.g. shoe size is discrete). Continuous Continuous data can take any value in a given range. So a person’s height is continuous since it could be any value within set limits. Categorical Categorical data is data which is not numerical, such as choice of breakfast cereal etc. Data may be displayed as grouped data or ungrouped data. We say that data is “grouped” when we present it in the following way:

Weight (w) Frequency 65- 3 70- 7

Or

Score (s) Frequency 5-9 2

10-14 5 NB: We can group discrete data or continuous data. We must know how to interpret these groups, So that

Weight (w) 65- 65 70w 70- 70 75w

Or

Score (s) 5-9 4.5 9.5s

10-14 9.5 14.5s



Representation of Data Histograms, stem and leaf diagrams, box plots. Use to compare distributions. Back-to-back stem and leaf diagrams may be required. Stem and Leaf Diagrams The stem and leaf diagram is a very useful way of grouping data whilst retaining the original data. For example suppose we had the following scores from children in a Maths test: 85, 18, 38, 67, 43, 75, 78, 81, 92, 71, 52, 62, 49, 62, 82, 69, 55, 57, 95, 62, 37 We see that the smallest value is 18 and the largest is 95. The classes of stem and leaf diagrams must be of equal width and so it would seem sensible to choose classes 10-19, 20-29, etc. The “stem” in this case represents the tens and the “leaf” represents the units so we have the following: Scores in Maths Test

Stem (Tens) Leaf (Units) 1 8 2 3 8 7 4 3 9 5 2 5 7 6 7 2 2 9 2 7 5 8 1 8 5 1 2 9 2 5

We then arrange these in numerical order to give the following: Scores in Maths Test

Stem (Tens) Leaf (Units) 1 8 2 3 7 8 4 3 9 5 2 5 7 6 2 2 2 7 9 7 1 5 8 8 1 2 5 9 2 5

We m also include a “key” with the diagram, so we say

1 8 means 18

This diagram tells us the basic shape of the distribution. We can easily see the smallest and largest values and we can see that the mode is 62. We can also use it to calculate 1Q , 2Q and 3Q .

NB : the data must be in order in a Stem and Leaf Diagram.



NB: If we wanted to represent the interval 18-22 on a stem & leaf we could not make 1 the stem since not all the numbers would begin with 1. What we could do is have a stem of 18 and then make the leaf the number we add on to the stem. In this case our key would be:

18 0 means 18 and 18 4 means 22

Back to back stem diagrams We can use these to compare two samples by using a “back to back stem plot”. In this we put stems down the middle and then one set of data on the left and the on the on the right. So we might end up with a diagram as follows:

Physics Maths 7 5 1 8

1 2 6 5 3 3 7 8 4 2 1 4 3 9

9 4 3 1 0 5 2 5 7 8 4 2 6 2 2 2 7 9

6 3 7 1 5 8 5 1 8 1 2 5

9 2 5 Our key here would be

In Physics 7 1 means 17

In Maths 1 8 means 18



Histograms Data that has been grouped can be represented using a histogram. A histogram is made up of rectangles of varying widths and heights – there are no gaps between the blocks.

The key feature of a histogram is that the area of each block is proportional to the frequency In order for the area to be equal (or proportional) to the frequency we plot frequency density on the

vertical axis, where widthclass

frequency densityfrequency . The class width is the width of the interval

(i.e. it runs from the lower boundary to the upper boundary). Remember Frequency Density is Frequency Divided by class width. Example Plot a histogram for the following:

Length (h) Frequency Class width Frequency Density

650- 3 20 0.15 670- 7 10 0.7 680- 20 10 2 690- 16 10 1.6

700-720 4 20 0.2 So the first block runs from 650 to 670 and has height 0.15 etc.

NB: If there are gaps between the stated upper limit of one class interval and the lower limit of the next class interval then we need to fill those gaps as shown below. For example,

Length (m) 15-19 14.5 19.5x 20-24 19.5 24.5x 25-29 24.5 29.5x

NB: Be careful with age since “15-19” would mean 15 20x since one is 19 until the moment before one’s 20th birthday. The shape of the histogram gives us information about the mean and the dispersion

When question says “give a reason to justify the use of a histogram to represent these data”…. The answer is “Data is continuous”

So the class width is 5. Not 19 15 4

FD

Length



Box Plot Diagram (or Box and whisker diagram) This is a diagram used to illustrate the dispersion of data. There is a box which runs from the lower quartile, 1Q to the upper quartile 3Q with the median, 2Q marked on it. The whisker then goes from

this box to the lowest value in one direction and to the highest value in the other. We end up with a diagram as follows: Skewness. Concepts outliers. Any rule to identify outliers will be specified in the question. If the question refers to outliers then use a refined box plot where the maximum length for the whisker is to, for example, 3 11.5 Q Q at the end where an outlier lies. In this case outliers (which

lie outside of the whiskers) are marked with crosses. NB We can tell something about the skewness / symmetry of the distribution from the box plot. For example,

1Q2Q 3Q

Highest Value Lowest Value

1Q2Q 3Q


NB : It must have a horizontal axis with a scale on it.

1Q2Q 3Q

3 3 11.5Q Q Q

Outlier

Lowest Value

Note that the whisker does not need to take the its maximum length. It goes only to the lowest value on this side.



We can see from the above that this is positively skewed 3 2 2 1 0Q Q Q Q with a long tail of

high values. Similarly, The above is negatively skewed since 3 2 2 1 0Q Q Q Q .

1Q2Q 3Q




Measures of Location Measures of location - mean, median, mode. Data may be discrete, continuous, grouped or ungrouped. Understanding and use of coding. Measures of dispersion – variance, standard deviation, range and interpercentile ranges. Simple interpolation may be required. Interpretation of measures of location and dispersion. Ungrouped Data Mean

The mean of a set of data is valuesofnumber

values theall of sum, which can be written as

x fxx

n f

.

When the data is grouped so that it appears in class intervals we must use the midpoints of those intervals to calculate the mean (in which case it is only an estimate of the mean) Example Find the mean of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0, 62.8, 79.1, 85.6, 79.3, 85.2, 69.1, 75.2

75.6 81.4 67.3 81.5 84.0 62.8 79.1 85.6 79.3 85.2 69.1 75.2

12926.1

77.175 kg12

x

Example Find the mean of the following scores:

Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 2

12 3 13 5 ... 17 1 299

14.22 5 ... 2 21

x

Median, Quartiles, and Percentiles The lower quartile of a set of data is the value one quarter of the way into the data set, when written in ascending order. It is represented by the symbol 1Q .

The media n of a set of data is the value one half of the way into the data set, when written in ascending order. It is represented by the symbol 2Q .

The upper quartile of a set of data is the value three-quarters of the way into the data set, when written in ascending order. It is represented by the symbol 3Q .

There are many methods for calculating 1Q , 2Q and 3Q but the following method is recommended

by Edexcel and will get full credit in exams…. The xth percentile, xP , to be the value which is x% of the way through the distribution.

If there are n observations then the xth percentile is calculated as follows: If 100

xn is an integer r then the xth percentile is the midpoint of the rth and 1r th values.

If 100xn lies between the integers r and 1r then the xth percentile is the 1r th value

Also given the letter .



We use the same method to find 1Q , 2Q and 3Q , treating them as the 25th, 50th and 75th percentiles

respectively. If there are n observations then…. the lower quartile is calculated as follows: If 4

n is an integer r then the lower quartile is the midpoint of the rth and 1r th values.

If 4n lies between the integers r and 1r then the lower quartile is the 1r th value.

the median is calculated as follows: If 2

n is an integer r then the median is the midpoint of the rth and 1r th values.

If 2n lies between the integers r and 1r then the median is the 1r th value.

the upper quartile is calculated as follows: If 3

4n is an integer r then the upper quartile is the midpoint of the rth and 1r th values.

If 34n lies between the integers r and 1r then the upper quartile is the 1r th value.

Example Find the median and quartiles of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0, 62.8, 79.1, 85.6, 79.3, 85.2, 69.1, 75.2 First of all list them in order: 62.8, 67.3, 69.1, 75.2, 75.6, 79.1, 79.3, 81.4, 81.5, 84, 85.2, 85.6 There are 12 pieces of data

124 3 so the lower quartile is the midpoint of the 3rd and 4th values, that is

69.1 75.272.15

2

kg

122 6 so the median is the midpoint of the 6th and 7th values, that is 79.2kg 3 12

4 9 so the median is the midpoint of the 9th and 10th values, that is 82.75kg

The interquartile range is 3 1Q Q .

Example Find the median of the following scores:

Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 2

There are 21 pieces of data: 214 5.25 so the lower quartile is 6th value, that is 13. 212 10.5 so the median is the 11th value, that is 14. 3 21

4 15.75 so the upper quartile is the 16th value, that is 15.



Grouped Data Mean Use the midpoints of each category and use the same method for ungrouped data. Example Find an estimate of the mean of the following data:

Height, h (cm) 140- 145- 150- 155- 160- 165-170 Frequency 140 245 354 214 105 57

142.5 140 147.5 245 ... 167.5 57 170387.5

153 cm140 245 ... 57 1115

x

(to 3sf)

Median When dealing with grouped data we have to interpolate to find estimates for 1Q , 2Q and 3Q .

Example Find the median of the following data:


There are 1115 pieces of data so the median is the height associated with the 1115

2 557.5 th value.

This lies somewhere between 150cm and155cm.

So the median lies 172.5

354of the way into the interval so our estimate for

the median is 172.5

150 5 152.4354

cm (to 1dp).

Use exactly the same method for finding lower quartile and upper quartile So the lower quartile is the height associated with the 1115

4 278.75 th value. This lies somewhere

between 145cm and150cm.

The lower quartile lies 278.75 140

245

of the way into the interval so our estimate for the lower

quartile is 138.75

145 5 147.8245

cm (to 1dp).

The first 739 (that is 140 245 354 ) have a height less than 155cm.

Median 150cm 155cm

385 739

The first 385 (that is 140 245 ) values have a height less than 150cm

557.5

557.5 385 172.5

739 385 354

With grouped data, work

with 2

nth value for the

median (eg. 557.5) , do not round up (e.g. to 558) as with ungrouped data.



Mode The mode is the most frequently occurring value. Example Find the mode of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.0 There is no mode because all values occur with equal frequency. Example Find the mode of the following scores:

Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 1

The mode is 14. Example Find the modal class of the following data:


Frequency Density 28 49 35.4 10.7 5.25 1.14 The modal class is the class interval with the highest frequency density and so is “145-”. Variance Suppose that we have the following frequency table:

Value 1x 2x 3x 4x

rx

Frequency 1f 2f 3f 4f

rf

The variance is a measure of spread, or dispersion. It is the average of the square of the difference of each piece of data from the mean. That is, it is the average of all the values 2)( xxi .

So we have,

f

fxx 22 )(

Variance .

This can be rearranged so that it becomes easier to use and we have

22

2 Variance xf

fx

.

We tend not to put in the f and to write f as n, giving

2

2 2Variance x

xn

“mean of the squares minus the square of the mean”



Standard Deviation The standard deviation is simply the square root of the variance.

Hence the standard deviation, , is given by 2 2

2( )x x x

xn n

.

Example Find the standard deviation of the following weights (in kg): 75.6, 81.4, 67.3, 81.5, 84.2

First of all we calculate the mean as 75.6+81.4+67.3+81.5+84.2 390

785 5

x

We use this to get that the standard deviation is

2 2 2 2 2

275.6 +81.4 +67.3 +81.5 +84.278 6.04

5 kg (to 3sf)

Example Find the standard deviation of the following scores:

Score 12 13 14 15 16 17 Frequency 2 5 6 4 2 1

First of all we calculate the mean as 12 3 13 5 ... 17 1 282

14.12 5 ... 1 20

x


2 2 2

212 2 13 5 ... 17 114.1 1.3

20

Example Find an estimate of the standard deviation of the following data:


The only difference between this and the previous example is that we have to use midpoints. First of all we calculate the mean as

1531115

5.170387

57...245140

575.167...2455.1471405.142

x cm (to 3sf)


2 2 2

2142.5 140 147.5 245 ... 167.5 57153 6.6

1115 cm (to 2sf)



Understanding and use of coding. Coding Suppose we had to find the mean and standard deviation of 200.037, 200.039, 200.042, 200.054. Suppose now that we now consider 1000 200t x .

Our values of t are 37, 39, 42, 54 which are easier to deal with than the initial data. The summary statistics for t are : 5n , 172t and 2 7570t .

It follows from this that 172

434

t and 2757043 6.59

4t (to 3sf).

Shifting numbers along by 200 has no effect on the dispersion (i.e. the standard deviation) but it does shift the mean by 200. Dividing by 1000 has the effect of reducing both the mean and the standard deviation by a factor of 1000.

Since 2001000

tx it follows that

2001000

tx and that

1000t

x

Hence we see that 43

200 200.0431000

x and that 6.59

0.006591000x (to 3sf).

In general if we have x at b then x at b and x ta .



Using calculator to find mean and standard deviation

Height (cm) 8-12 13-17 18-22 23-27 28-32 Frequency 6 8 12 8 6

On Texas…

Press to get:

Now press to get

Type in the following:

Press and choose CALC, 1-Var Stats

to see this:

Now type in to get



Press to see the following

Write down:

2

80020

40

1760020 6.32 (to 3sf)

40

x

sd

IGNORE Sx for S1.



On Casio…

Scroll down using to get Press 3 to get Press 1

Press then 2 (STAT) and then 1 (1-VAR) to get Type in all the data to get

Press For x and 2x

Press to give Choose 4 (sum) to get Choosing 1 gives 17600

Choosing again. Choose 4 (sum) and then 2 gives 800.



For the value of n, standard deviation and mean

Press to give

Choose 5 (Var) to give

Choosing 1 gives n to be 40. For standard deviation Choose 5 (Var) and then 3 to get

So standard deviation us 6.23 (to 3sf)



Skewness and Outliers Skewness. Concepts outliers. Any rule to identify outliers will be specified in the question. Skewness A distribution that is not symmetric is said to be skewed.

Mean Median Mode

Positive Skew

Mean Median

Mode

Negative Skew

If the distribution has a long ‘tail’ of high values then it is said to be positively skewed. In this case the median is less than the mean.

If the distribution has a long ‘tail’ of low values then it is said to be negatively skewed. In this case the median is greater than the mean.



Two tests for skewness

Positive skew if 3 2 2 1 0Q Q Q Q and if Mean > Median > Mode

Negative skew if 3 2 2 1 0Q Q Q Q and if Mean < Median < Mode

A good way to remember the condition for skewness involving mean, median and mode. Write down the three averages in alphabetical order, that is

Mean…Median…Mode

For positive skew fill the blank with a “>” sign to give

Mean > Median > Mode For negative skew fill the blank with a “<” sign to give

Mean < Median < Mode

Outliers Values which are at one of the extremes of the distribution are said to be outliers. How extreme a value has to be in order to be classified as an outlier will be stated in the question. One measure of an outlier would be if the value was greater than 3 3 11.5Q Q Q at one end or if

it was less than 1 3 11.5Q Q Q at the other end.



Probability Elementary probability. Sample space. Exclusive and complementary events. Sum and product laws.

The probability of Tottenham Hotspur winning their next game is 0.75. The probability of scoring a six when a die is thrown is 1

6 .

The probability of a part made in a factory being faulty is 0.02.

Probabilities can be expressed either as a fraction or a decimal. All probabilities lie between 0 and 1.

Sets and Venn Diagrams A set is a collection of items – such as numbers, people, letters etc. For example Set A is the set of pupils in year 11 who wear glasses Set B is the set of pupils in year 11 who have blonde hair A Venn diagram is a diagram which represents various sets. It is made up of a large rectangle with various oblongs inside to represent the various sets. These may or may not overlap. By the side of the large rectangle is the symbol . This represents “the universal set” and so could be, for example, all year 11 pupils. In this example only year 11 pupils in this school would feature in the Venn diagram. If an element is in set A and also in set B then it lies in the intersection of the two oblongs. If an element is neither in A or B then it lies inside the rectangle but outside both oblongs.

0

If an event has a probability of 0 then it will certainly not occur.

If an event has a probability of 1 then it will certainly occur.

It is more likely that an event occurs as its probability moves along this line.

event not happening 1 event happeningp p

A B



Example The diagram shown below is a Venn diagram indicating who enjoy swimming and running in a class. (a) There are 5 students who enjoy both running and swimming as there are 5 students lying in the intersection of the two oblongs. (b) There are 13 who enjoy running. That is the 8 who enjoy running but not swimming and the 5 who enjoy both running and swimming. (c) There are 10 students who enjoy neither running nor swimming. These are the 10 who lie outside both oblongs. Example In a class of 33 pupils, 20 like chess, 12 like draughts and 5 like neither. Represent this on a Venn diagram. Those who like chess will be represented by one oblong, those who like draughts will be represented by another oblong. Those who like both are in the overlap of the oblongs whilst those who like neither will lie outside both oblongs. So the 4 who like neither go outside the two oblongs.

How is the rest of the Venn diagram to be filled in? Think of the two oblongs are being two rugs which can overlap. One rug has area 20, one has area 12. The combined area has to be 28. If the oblongs did not overlap their combined area would be 32 so the overlap must be 4. So…

There are 20 who like chess in all so 16 must go in here.

There are 12 who like draughts in all so 8 must go in here

Running Swimming

8 5 3

10

Chess Draughts

5

Chess Draughts

5

16 4 8

These 5 pupils do not lie in either oblong so that leaves 28 to be put in the three positions of the oblongs.

(a) How many students enjoy both running and swimming? (b) How many enjoy running? (c) How many enjoy neither running nor swimming?



Notation

NB When A is a subset of B , A B B .

AB represents the fact that A is not a subset of B Mutually Exclusive This is an example of two events being mutually exclusive. A and B are said to be mutually exclusive if they cannot both occur at the same time, for example getting a six and a five from one throw of the die.

A B

A B

A B represents all the elements which belong to A or B or both.

A B represents all the elements which belong to A and B.

A

B

A B represents the fact that A is a subset of B

A B

A represents all the elements outside of A.

In this example A B since the two oblongs do not intersect.

A B

A B

From this we see that P A B P A P B P A B

If A and B are mutually exclusive then )()()( BPAPBAP .



Example If there are 7 red balls, 2 blue balls and 6 yellow balls in a bag. A ball is chosen out of the bag then what is the probability that the ball is: (a) blue (b) red (c) either red or blue

(a) 2 out of 15 of the balls are blue so the probability of picking a blue ball is 2

15

(b) 7 out of 15 of the balls are blue so the probability of picking a red ball is 7

15

(c) The events of being red and blue are mutually exclusive so: Probability of being red or blue = Probability of being red + Probability of being red

2 7

15 159

153

5

BE CAREFUL…

Example A spinner is equally likely to land on the numbers 1, 2, 3, 4, 5 or 6. Find the probability that the spinner lands on: (a) an even number (b) a number greater than 4 (c) an even number or a number greater than 4

(a) There are three even numbers (that is 2, 4, 6) so the probability is 3 1

6 2 .

(b) There are two numbers greater than 4 (that is 5 and 6) so the probability is 2 1

6 3 .

(c) The event of landing on an even number and the event of landing on a number greater than 4 are not mutually exclusive so the probabilities cannot be added. The numbers which are even or greater than 4 are 2, 4, 5 or 6.

Hence the probability is 4 2

6 3 .

Example The probability of a girl in a school wearing glasses is 0.3 and the probability of a girl having blond hair is 0.4. Joshua thinks that the probability of wearing glasses or having blond hair is 0.7. Is he right? The answer is that he is not right. Some people have blond hair and wear glasses. In other words, wearing glasses and having blond hair are not mutually exclusive, so their probabilities cannot be added together.

1 2

3

4 5

6



Set notation Set A might be all the blonde haired pupils in year 11 in a school. It could also be the set of even numbers less than 10. In which case curly brackets are used and A is written as 2, 4,6,8A .

4 A means that 4 is an element of A. 9 A means that 9 is not an element of A.

( )n A represents the number of elements in A. If 2, 4,6,8A then ( ) 4n A .

Example

is all whole numbers less than 10.

: is an even numberA x x : is a prime numberB x x

(a) Express this on a Venn diagram. (b) List all the elements of A . (c) What is n A B ?

(d) Find A B . (a) Since is all whole numbers less than 10, the only numbers appearing on the Venn diagram are 1, 2, 3, 4, 5, 6, 7, 8 and 9 (b) A represents all the elements outside of A. So 1, 3, 5, 7, 9A

(c) A B represents all the elements which belong to A or B or both. So 2,3, 4,5,6,7,8A B . There are 8 elements in this set so 8n A B .

(d) A B represents all the elements which belong to A and B. So 2A B since 2 is the

only number which is both even and prime. Shading Unions Example A B includes any region that is shaded in A or B (or both). Area represented by A Area represented by B Area represented by A B

A B

A B

=

A B

4, 6, 8 2 3, 5, 7

1, 9



Example A B includes any region that is shaded in A or B (or both). Shading Intersections Example A B includes any region that is shaded in both A and B . Area represented by A Area represented by B Area represented by A B Some important regions to know

A B

A B

A B

=

A B

C

A C

B

A C

B

Area represented by A Area represented by B Area represented by A B

=

A B

A B

A B

A B

A B A BA B

A B

A B

A B

A BB



Combined Events P A B means the probability of A happening, given that B has happened.

Independent Events Two events A and B are said to be independent if they have no effect on each other (e.g. being late to school in Kenya and raining in Bolivia are independent).

If A and B are independent then B happening has no effect on A happening so P A B P A .

If A and B are independent then B then, since

( )( )

P A BP A B

P B

, it follows that

( )

( )P A B

P AP B

and so P A B P A P B .

A B

P A B means the probability of A

happening, given that B has happened.

Referring to the Venn diagram, P A B

represents the probability of being in the shaded area

So

( )( )

P A BP A B

P B

If A and B are independent then P A B P A P B

The probability of A happening given that B happens is

represented by P A B where

( )( )

P A BP A B

P B

.



Exam Question (S1, January 2006) For the events A and B,

P(A B) = 0.32, P(A B) = 0.11 and P(A B) = 0.65.

(a) Draw a Venn diagram to illustrate the complete sample space for the events A and B. (3)

(b) Write down the value of P(A) and the value of P(B). (3)

(c) Find P(AB). (2)

(d) Determine whether or not A and B are independent. (3)

(a) (b) 0.32 0.22 0.54P A

0.22 0.11 0.33P B

(c) If 0.33P B then 1 0.33 0.67P B

0.32 32

0.478 to 3dp0.67 67

P A BP A B

P B

(d) 0.54 0.33 0.1782P A P B

0.22P A B

So P A B P A P B and so A and B are not independent.

OR…. 0.54P A and 0.478 to 3dpP A B so P A B P A and so A and B are

not independent.

A B

0.32 0.110.22

0.65 0.11 0.32 0.22

0.35



Example If a coin is tossed and a die is rolled then find the probability of getting a head on the coin and a five on the die. Suppose the event A is getting a head on the coin and that the event B is getting a five on the die.

1p

2A and 1

p6

B .

These are independent events so 1 1 1p and p p

2 6 12A B A B .

Example Alexander fires three arrows at a target. With each arrow, the probability that he hits the target is 0.3. Whether he hits with any given arrow is independent of what happened to the previous arrows he fired.

Find the probability that he hits with the first two but misses with the third. The probability that he hits the target with his first arrow is 0.3. The probability that he hits the target with his second arrow is 0.3. The probability that he misses the target with his third arrow is 1 0.3 0.7 . Since the events are assumed to be independent, the probability that he hits with the first and hits with the second and misses with the third is obtained by multiplying the three numbers given above. So the probability that he hits with the first two but misses with the third is 0.3 0.3 0.7 0.063 .



Questions often involve choosing two or more balls from a bag. In these questions it is vital to establish whether or not the first ball has been replaced before the next one has been chosen. Read the question carefully.

Tree Diagrams Tree diagrams are a very clear way of representing the possible outcomes of combined events. On tree diagrams: as we move across we use a cross ( ) and so multiply probabilities as we move Down we aDD probabilities

The case when the ball is replaced…. Example A bag contains 10 balls, seven of which are red and the rest are green. A boy takes out a ball from the bag, notes its colour and puts it back. He then repeats this process. (a) Draw a tree diagram to represent this information. (b) Find the probability that he chose two red balls. (c) Find the probability that he chose two balls of different colours.

(a)

(b) Probability of two reds is 7 7 49

10 10 100 (as we move across we use a cross ( ))

(c) 7 3 21

p( )10 10 100

RG and 3 7 21

p( )10 10 100

GR

As we move Down we aDD so probability is 21 21 42 21

100 100 100 50

7/10R 49/100

3/10 G 21/100

7/10 R 21/100

3/10 G 9/100

7/10 R

3/10 G

as we move across we use a cross ( ).

7 710 10

Notice that the probabilities on the branches that start from the same point always add up to 1 .



The case when the ball is not replaced…. Example 3 red balls and 2 green balls in a bag. One ball is chosen at random, its colour is noted and it is not replaced. A second ball is then chosen and its colour is also noted. Find the probability that: (a) Both balls are red. (b) The second ball is green (c) At least one of the balls is green. (d) The first ball was red, given that the second ball was green. On tree diagrams: as we move across ( ) we multiply. as we move Down we aDD (a) 3 6 32

5 4 20 101 2( )P R R (b) 6 82 2

20 20 20 52 1 2 1 2(G ) (R G ) (G G )P P P (c) Either “At least one of the balls is green” or both of them are red.

So the probabilities of these two events add up to 1. So 1 2(at least one green) 1 ( )P P R R

The tree diagram shows us that 3

101 2( )P R R .

And so we have that

1 2

6 71420 20 10

(at least one green) 1 ( )

1

P P R R

(d) We need to find 1 2P R R .

61 2 20 3

41 2 8202

P R GP R G

P G

35 R

G

R

G

R

G

24

34

25

24

14

620

220

620

620

When the 5 balls are in the bag, the probability of choosing a red is 3

5 , since 3 of the 5 balls are red.

If a red ball is chosen first, then there are now 4 balls in the bag and only 2 of them are red. The probability of choosing a red is 2

4 .

3 25 4

Questions involving “At least one” This is an example of an important short cut. You DO NOT need to think of all the cases when there is at least one green ball.

Use the notation that 1R represents

the first ball being red etc.



Example 60% of the population of a certain town are vaccinated against flu. The probability of someone getting the flu given that they have had the vaccination is 0.2 but the probability of someone getting flu given that they have not had the vaccination is 0.65. (a) Find the probability that a person chosen at random gets flu. (b) Write down the probability that a person chosen at random doesn’t get flu given that he

didn’t have the vaccination. (c) Find, as a fraction, the probability that that a person chosen at random didn’t have the

vaccination given that he did get flu. (a) There are two groups of people who get flu. Either a person has a vaccination and gets flu or a person does not have a vaccination and gets flu. The probability of a person has a vaccination and gets flu is 0.12 The probability of a person does not have a vaccination and gets flu is 0.26 As we move Down we aDD so probability of getting flu is 38.026.012.0)( FP So 38.026.012.0)( FP (b) 35.0)( NVNFP (straight from tree diagram)

(c) ( ) 0.26 26 13

( )( ) 0.26 0.12 38 19

P NV FP NV F

P F

0.6

0.4

V

NV

0.2

0.8

0.65

0.35

F

NF

F

NF

0.12

0.48

0.26

0.14

Notice that the probabilities on the branches that come from the same point always add up to 1 . So 0.6 0.4 1

0.2 0.8 1

0.65 0.35 1



Correlation Scatter Diagram Many statistical investigations have to do with the relationship between several variables. We will now examine ways of handling data with two variables.

Suppose we want to compare two sets of data x and y. For example x might be the boy’s mark in a Maths exam and y might be the boy’s mark in an English exam.

Boy A B C D E F G H I J Maths Mark, x 1 8 15 17 23 28 33 39 45 45 English Mark, y 3 15 9 20 19 17 36 26 15 29

We plot these values on a graph in the form of a scatter diagram.

In the diagram we have marked on the dotted lines through x and y . This splits the diagram into four quadrants. If all the points of our scatter diagram lie near a straight line then we say that there is linear correlation between x and y.

(a) If y tends to increase as x increases then we say that there is a positive linear correlation. In this case most points will be in either the first or the third quadrants.

(b) If y tends to decrease as x increases then we say that there is a negative linear correlation. In this case most points will be in either the second or the fourth quadrants.

(c) If there is no relationship between x and y then we say that there is no correlation. In this case the points will be randomly distributed in all the quadrants.

Scattter Diagram of M aths and English M arks

05

10152025303540

0 10 20 30 40 50

M aths M ark

Eng

lish

Mar

k

x

y

4th

2nd 3rd

1st



How do we measure the strength of linear correlation? Suppose, for example that we look at the case above. If ,i ix y represented the general point then

because most points lie in either the first or the third quadrant we would find that i ix x y y

was positive most of the time and so we would find that i ix x y y would be positive.

However the problem with i ix x y y is that it varies with the size of x and y.

To avoid this we divide by 2

ix x and by 2

iy y and we end up with the product

moment correlation coefficient. Product Moment Correlation Coefficient (PMCC) The product moment correlation coefficient, its use, interpretation and limitations. Derivations and tests of significance will not be required. As stated above this measures the degree of linear correlation between x and y.

The coefficient r, known as the PMCC, is defined as r x x y y

x x 2y y 2

.

This can also be written as:

2 2

2 2

x yxy

nrx y

x yn n

FORMULA BOOK HAS



We now define three further quantities, which are, xxS , xyS and yyS . These are defined as

follows:

2

2 2xx

xS x x x

n

xy

x yS x x y y xy

n

2

2 2yy

yS y y y

n

Using these we see that xy

xx yy

Sr

S S .



Now consider the properties of this coefficient.

We know that 20 for all real x x y y .

2 22

2

2 2 2

2 2 2 2

2 2

Hence 2 0 for all real

This quadratic in is always postive and so " 4 ". Hence

4 4

1 1

1 1

y y y y x x x x

b ac

y y x x x x y y

x x y y y y x x x x y y

y y x x

x x y y

r

NB1 1r if all the points lie on a straight line with positive gradient, in which case we say there is perfect positive correlation. NB2 1r if all the points lie on a straight line with negative gradient, in which case we say there is perfect negative correlation. NB 3 The lower the value of r the less linear correlation there is.

NB4 r is only a measure of a linear relationship between x and y. They may be related in another way but r would still be very low. NB5 Coding the values of x and y does not change the product moment correlation coefficient, r. So if we make X ax b and Y cx d then the product moment correlation coefficient of x and y is the same as the product moment correlation coefficient of X and Y. NB6 Suppose we told to make X ax b and Y cx d and are then asked to find the regression line of Y on X having been given the summary statistics for x and y. First off find the regression line of y on x then replace y and x in that equation with

X b

xa

and

Y by

a

.



Regression

Scatter diagrams. Linear regression. Explanatory (independent) and response (dependent) variables. Applications and ations. Use to make predictions within the

range of values of the explanatory variable and the dangers of extrapolation. Derivations will not be required. Variables other than x and y may be used. Linear change of variable may be required. When we looked at the product moment correlation coefficient we were looking to measure the amount of linear association between two quantities x and y. We now want to go further and find the relationship between x and y when we know that there is this linear association. Once have plotted our data on a scatter diagram we look for a relationship y a bx (since the relationship is linear). We work back or ‘regress’ from the points we have plotted to find the equation of the line and hence the line is called the regression line of y on x. If we had a set of data such as

x 10 20 30 40 50 y 17 23 29 32 38

then x has been controlled and so it is called the explanatory (or independent) variable. y is called the response (or dependent) variable. If there is a linear relationship between y and x then for each value of iy there will be an

experimental error associated with it. We denote this error with i which can be either positive or

negative. Therefore we can write i i iy x . The mean value of the errors, i , will be about

zero. So if we are given a set of points how do we determine the equation of that line? How do we find a and b? We could simply try to draw a line of best fit by eye. There is, however, a standard line that we draw, called “least-squares regression line” which minimises the sum of squares of the distances shown below.

x

y y a bx

1 1,x y

1 1,x a bx

The explanatory (or independent) variable always goes on the horizontal axis

interpret



These distances could be denoted by id where i i id y a bx . We are, therefore, trying to

minimise 2

i iy a bx .

We can derive the values of a and b using calculus and when we do so we find that xy

xx

Sb

S where,

as we have already seen,

xy

x yS x x y y xy

n

and 2

2 2xx

xS x x x

n

Once we have the value of b we can find a using the fact that the line passes through the point ( , )x y . It follows from this that y a bx and so that a y bx .

So we say that the equation of the regression line of y on x is y a bx where xy

xx

Sb

S and

a y bx .

When using coding we might make x p

Xq

and

y mY

n

. We could then use the values of X

and Y to find the line of regression of Y on X. This would be Y a bX .

What we have left to do, is, therefore to replace X with x p

q

and Y with

y m

n

and so we get:

y m x pa b

n q

. We can then rearrange this to get y in terms of x.

We can use this find to find an estimate for the mean value of y (the response variable) when we have been given a value of x (the explanatory variable). This process is called interpolation. NB: When finding a value of y from a value of x we must be careful only to use values of x which lie in outside the range of x values which have been given. If we predict values of y using values of x which lie outside the range of given values then we are using extrapolation. We should take great care when extrapolating to see if this can be justified.

NB : When drawing the line of best fit always mark the point ,x y on your diagram and make

sure that the line of best fit passes through this point.

A word on ‘interpretation’ If the x values were the lengths of salmon and the y values were the weights of salmon and if r was close to 1 then the interpretation is not “positive correlation. It is “as the length of the salmon increases, so the weight of the salmon increases”. The January 2011 markscheme suggests that if r is greater than 0.4 then this is evidence of positive correlation though this is not explained until S2.



Using calculators to find product moment correlation coefficient and line of regression

For example find the summary statistics for the data shown below..

1 4 7 10 13 16 19

5 8 13 2 11 27 35

On Texas….

1. Press

2. Press and type in all the data so that it look as follows

3. Press



4. Highlight CALC

5. Highlight 2-Var Stats and press twice and you get this:

Scrolling down gives us all we need to use in the formulae

2

2

70

101

1388

952

2337

7

x

y

xy

x

y

n



On Casio….

1. Press

2. Press to get

3. Press (A+BX) to get

4. Type in all the data so that it look as follows

5. Press to get

6. Press to get

7. Press to get

NB : To go back to Data on Casio choose

to get

Press to get back to



8. Using the different options gives us

2

2

70

101

1388

952

2337

7

x

y

xy

x

y

n

Using the values from the calculator….

Formula book tells us, for example, that 2

2 2xx

xS x x x

n

.

So we see that 270

952 2527xxS .

Formula book tells us, for example, that 2

2 2xx

xS x x x

n

.

So we see that 270

952 2527xxS .

Similarly the formula book tells us that

xy

x yS xy

n

so we see that

70 1011388 378

7xyS

In the same way the formula book tells us that that 2

2yy

yS y

n

so we see that

21012337 880

7yyS .



PMCC

We could calculate directly using xy

xx yy

Sr

S S to give

3780.803

252 880r

(to 3sf).

On Texas…

Type , choosing 5: Statistics, choosing EQ and then 7 to get 0.803. On Casio…

Press and then to get

Press to get



Regression

We are also given that the regression line of y on x is y a bx where xy

xx

Sb

S .

So we can calculate 378

1.5252

b .

We then use the fact that y a bx passes through the point ,x y to see that a y bx and so

101 701.5 0.571

7 7a (to 3sf).

On Texas…

Press , choose CALC and press 4 to get the following

On Casio…

Press to get

Press to get

Press to get

Press to get

NB Formula book refers to y a bx , whilst Texas uses y ax b

NB Casio uses the same notation as the formula book, that is y a bx .



Discrete Random Variables The concept of a discrete random variable. A random variable is a variable, taking numerical values, which represents the value obtained when we take a measurement from a real-life experiment. There are two kinds of random variable - discrete and continuous. A continuous random variable can take any value in a given range, whereas a discrete random variable can take only certain values. The probability function and the cumulative distribution function for a discrete random variable. Simple uses of the probability function p(x) where ( ) ( ) p x P X x . The set of all possible values of a random variable along with the associated probabilities is called the probability distribution. Example Suppose a coins is tossed three times and the number of heads is the random variable X. The probability distribution for X is as follows:

Number of Heads (X) 0 1 2 3 Probability 0.125 0.375 0.375 0.125

Sometimes the probabilities are expressed as a function as opposed to a table. This function is called the probability function. Example Find the probability function for X, the score on a die. We know that 1 1 1

6 6 6( 1) , ( 2) , ( 3)P X P X P X etc. and so we get the formula

16 for 1, 2,3, 4,5,6

( )0 otherwise

xP X x

NB ( )p x (or simply p) is a short way of writing P(X x). If X is a continuous random variable we cannot talk about the probability of X taking particular values but only of the probability of it taking a range of values, for example ( 3)P X . We cannot, therefore, use a simple probability function to define X.

NB: ( ) 1p x Example Find p in the following table

x 1 2 3 4

P X x 0.2 0.1 p 0.55 Since ( ) 1p x , it follows that 0.2 0.1 0.55 1p and so 0.15p .



Use of the cumulative distribution function 0

0 0 ( )x x

F x P X x p x

Just as we have met cumulative frequency graphs in GCSE so we know meet a cumulative distribution function for a random variable, X. The cumulative distribution function of X is defined as 0 0F x P X x so that 0F x means

“the probability that X is less than or equal to 0x ”.

If X is discrete then we can also write 0F x as

0

0 ( )x x

F x p x

.

If r is the largest value which X can take then 1F r .

If s is the smallest value which X can take then F s P X s .

If X takes consecutive integer values then 1 1F r F r P r

Example Find 1F and 3F in the following

Number of Heads (X) 0 1 2 3 Probability 0.125 0.375 0.375 0.125

(1) 0 1 0.125 0.375 0.5F P X P X

3 is the largest value that X can take so (3) 1F .

Example The random variable X is such that 2

16

k xF x

for 0,1,2,3x . Find the positive

integer k is positive and draw the probability distribution table for X.

3 is the largest value that X can take so (3) 1F . 23

3 116

kF

. So 2

3 16k .

3 4k so 7 or 1k k . k is positive, so 1k .

Using 21

16

xF x

we have:

x 0 1 2 3

F x 1

16

4

16

9

16

16

16

3 0 1 2 3F P X P X P X P X and

2 0 1 2F P X P X P X so 3 2 3F F P X . So

16 9 73

16 16 16P X .



Similarly 2 1 2F F P X so 9 4 52

16 16 16P X .

And 1 0 1F F P X so 4 1 31

16 16 16P X .

And 0 0F P X so 10

16P X

.So we have the following table

x 0 1 2 3

P X x 1

16

3

16

5

16

7

16

Mean and variance of a discrete random variable. Use of E X , 2E X for calculating the

variance of X. Knowledge and use of E EaX b a X b and 2Var VaraX b a X .

The mean of a probability distribution is called the expected value of the random variable X. It is written as E(X) for short and its symbol is .

We have already met the formula for the mean of a sample, i.e. x xff and by replacing the

relative frequency f

f with p we have the formula:

E(X) xp for the random variable X

We have also met the formula 2

2variancex f

xf

.

If we replace the relative frequency f

f with p as we did before and replace x with E X then

we have the formula, 2 2Var( ) E( )X x p X .

Now for any function g(X) we have E( )g X g x p and so we see from this that 2 2E( )X x p .

Hence we have the following formula for the variance of X:

22Var( ) E EX X X



In an exam if all the scores were doubled then the mean score would clearly double. Since every

value of x would increase by a factor of 4 it follows that both 2E X and 2E X increase by a

factor of 4. Hence 22Var( ) E EX X X increases by a factor of 4.

So E 2 2E and Var 2 4VarX X X X .

Similarly if in exam if all the scores were increased by 10 then the mean would increase by 10. However, since the spread of the scores has not changed, the variance remains the same. So E 10 E 10 and Var 10 VarX X X X .

Generalising the results shown above gives the following:

2

2

E E

E E

Var Var

Var Var

aX a X

aX b a X b

aX a X

aX b a X

Example The random variable X has the following probability distribution :

X 1 2 5

P X x 0.1 0.2 0.7

Find

(a) E

(b) Var

(c) E 3 1

(d) Var 10 7

X

X

X

X

(a) & (b) E 1 0.1 2 0.2 5 0.7 4X

2 2 2 2 2E 1 0.1 2 0.2 5 0.7 4 18.4X

2 2Var( ) E( ) E( ) 18.4 16 2.4X X X



On Casio…

Scroll down using to get Press 3 to get Press 1

Press then 2 (STAT) and then 1 (1-VAR) to get

Type in all the data to get

Press For mean and variance

Press to give

Choose 5 (Var) to get Choosing 2 gives the mean to be 4.



Choosing again. Choose 4 (Sum) and then 1 to get 2x , that is 2E X when dealing

with probabilities not frequencies. This is 18.4

Choosing again. Choose 5 (Var) and then 3 to get standard deviation. Square to give the variance to be 2.4.

If using the calculator, write down 22 2Var E E 18.4 4 2.4X X X .

(c) Use E 3 1 3E 1 3 4 1 13X X

DO NOT USE 3 1Y X

y 4 7 16

P Y y 0.1 0.2 0.7

(d) Use 2 2Var 10 7 10 Var 10 2.4 240X X



The discrete uniform distribution. This is the distribution in which the random variable can take a set of, say, n values and X is equally likely to take each value (e.g. X being the score on the die). In general, then, if X can take the values

1 2 3, , ,... nx x x x we have 1

( )rP X xn

for 1, 2, 3, ...r n .

The mean and variance of this distribution. In the above example we see that:

1 1

1 1E

n n

r rr r

X x xn n

and 2 2

2 2

21 1 1 1

1 1 1 1Var

n n n n

r r r rr r r r

X x x x xn n n n

If the random variable X took the values 1, 2, 3, 4, …n then

1 1

1

2

n n

rr r

n nx r

and

2 2

1 1

1 2 1

6

n n

rr r

n n nx r

(these are standard results).

In which case we see that

1

1 11 1E

2 2

n

rr

n n nX x

n n

And, in a similar way we see that

22

21 1

2 2

2

2

1 1Var

1 2 1 1 1 2 1 11 1

6 2 6 4

1 14 2 3 1 1

12 12

1

12

n n

r rr r

X x xn n

n n n n n n n n

n n

n nn n n

n



So for the random variable X which has discrete uniform distribution over the integers 1, 2 , 3,…, n

we have 1E

2

nX

and

2 1Var

12

nX

From this we can, using coding, find E X and Var( )X if X has discrete uniform distribution over,

for example, the integers 3, 5 , 7,…, 19. We say that 2 1X Y and so Y has discrete uniform distribution over the integers 1, 2 , 3,…, 9.

We know, therefore, that 9 1E 5

2Y

and that

29 1 80 20Var( )

12 12 3Y

.

Using the coding we see that :

E =E 2 1 2E 1 11X Y Y and that 80

Var( ) Var(2 1) 4Var( )3

X Y Y



Normal Distribution The Normal distribution including the mean, variance and use of tables of the cumulative distribution function. Knowledge of the shape and the symmetry of the distribution is required. Knowledge of the probability density function is not required. Derivation of the mean, variance and cumulative distribution function is not required. Interpolation is not necessary. Questions may involve the solution of simultaneous equations. The normal distribution provides us with a suitable model for a very large number of distributions and so is the most important distribution in statistical theory. Heights or weights of men or women, time taken to run the 200m etc. all follow the normal distribution. It has some key features, notably that the vast majority of data is centred around the mean and that very small or large values are rare. It is symmetrical about the mean (an hence the mean is also equal to the mode and the median). We describe the normal distribution using two parameters, its mean and its variance 2 . So if

the random variable X has normal distribution with mean and variance 2 we write X~ 2,N

for short.

A sketch of the distribution of Z~ N 0 ,1 is shown below:

We have looked at the cumulative distribution functions of discrete random variables. We are now dealing with a continuous random variable and so we cannot talk about the probability of X taking any particular value. We can only talk about the probability of X lying between two values a and b.

Z is the letter given to the random variable which has mean 0 and variance 1. In fact the random variable Z is called the standard normal distribution and it and so is written as N 0 ,1 .

NB: The total area under this curve is 1.

So 21

21

e d 12

tt

The equation that generates

this curve is 21

21

e2

xy

.



The cumulative distribution function for the random variable Z is written as z .

In other words 21

21

( )2

zt

z P Z z e dt

.

From the tables we have 0 0.5, 1 0.8413, 2 0.9772, 3 0.9987 etc.

It is always true that

The symmetry of the graph gives us the following: These two areas combine to give 1 and so we have the following:

In the same way

Combining (3) and (1) these gives

z

So z is the area shown in the diagram.

z -z

z z

1z z (2)

-z z

P Z z

P Z z

1P Z z P Z z (3)

1P Z z z (1)

P Z z z (4)



We need to be able to use the tables in the formula book.

Finding a probability when given a value of z Three stages: a. Turn it into a “less than problem”, using 1P Z z z

b. Ensure that the z value is positive, using 1z z

c. Read off from tables We use the large table to find probabilities when we have been given values of z: Case 1: Find 1.7 .

a. Already is “less than problem”. b. z value is already positive. c. Read off from tables, to give 1.7 0.9554

Case 2: Find 0.8 .

a. Already is “less than problem” b. Use 1z z to get 0.8 1 0.8

c. Read off from tables. 0.8 1 0.8 1 0.7881 0.2119

Case 3: Find 0.3P Z

a. Use 1P Z z z to get 0.3 1 0.3P Z

b. z value is already positive. c. Read off from tables. 0.3 0.6179 and so 0.3 1 0.6179 0.3821P Z

Case 4: Find 1.7P Z .

a. Use 1P Z z z to get 1.7 1 1.7P Z

b. Use 1z z to get 1 1.7 1 1 1.7 1.7

c. Read off from tables. 1.7 0.9554

NB : We could have used P Z z z to give

1.7 1.7P Z



With our knowledge of these four cases we can find the area between any limits:

Example Find 0.7 2.1P Z .

0.7 2.1 2.1 0.7

2.1 0.7

0.9821 0.7580

0.224 (to 3dp)

P Z P Z P Z


1.2 1.8 1.8 1.2

1.8 1.2

1.8 1 1.2

0.9641 1 0.8849

0.849 (to 3dp)

P Z P Z P Z


0.9 0.1 0.1 0.9

0.1 0.9

1 0.1 1 0.9

0.9 0.1

0.8159 0.5398

0.276 (to 3dp)

P Z P Z P Z

NB : It is worth knowing that P a Z b P b Z a

a b

P a Z b

P Z b P Z a

b a



Finding a value of z when given a ‘neat’ probability (0.3, 0.8, 0.05 etc.) We use the small table to find probabilities when we have been given values of z: Three stages: a. Turn it into a “greater than problem”, using 1P Z z z

b. Ensure that the probability is less than 1

2 , using 1P Z z P Z z

c. Read off from tables Case 1 : Solve 0.2P Z z

a. Already is a “greater than problem”.

b. The probability is already less than 1

2

c. Read off from tables to give 0.8416z . Case 2 : Solve 0.7P Z z

a. Already is a “greater than problem”. b. Use 1P Z z P Z z to give 1 0.7 0.3P Z z

c. Read off from tables to give 0.5244z and so 0.5244z . Case 3 : Solve 0.9z

a. Use 1P Z z z to give 0.1P Z z .

b. The probability is already less than 1

2 .

c. Read off from tables to give 1.2816z . Case 4 : Solve 0.15z

a. Use 1P Z z z to give 0.85P Z z .

b. Use 1P Z z P Z z to give 1 0.85 0.15P Z z

c. Read off from tables to give 1.0364z and so 1.0364z .



We have done all the working so far on Z~ N 0 ,1 . We have seen how to find probabilities and z values. However, heights and weights and so on are not normally distributed with mean 0 and variance 1. It would be impossible to have a set of tables showing us all possible areas for all possible values of

and 2 . So we need some way of turning the random variable X ~ N , 2 into the standard

normal distribution Z~ N 0 ,1 .

The transformation we need to make is Z X

.

From the work we did on E aX b and Var aX b we see that EE 0

XZ

and that

2

Var( )Var( ) 1

XZ

.

All we have done is shift and stretch the curve shown above and so Z still has the normal distribution.

Thus Z X

may be used to transform N , 2 into N 0 ,1 . For example…

X~ 2120 ,5N Z~ N 0 ,1



Finding probability when given a value of x Five stages:

Write it out using the random variable X Turn it into a “less than problem” Turn it into a “Z problem” Use 1z z to ensure that the value inside ... is positive.

Read off from tables Example The amount of jam in a jar is normally distributed with mean 140g and standard deviation 5.2g. Find the probability that the amount of jam in a jar chosen at random is:

(a) Less than 143g (b) More than 135g (c) Less than 138g (d) More than 146g (e) Between 139g and 145g

(a)

143

143 140

5.2

0.58

0.7190

P X

(c)

138

138 140

5.2

0.38

1 0.38

1 0.6480 0.3520

P X

(e)

139 145 145 139

145 140 139 140

5.2 5.2

0.96 0.19

0.8315 1 0.19

0.8315 1 0.5753

0.4068

P X P X P X

(b)

135

1 135

135 1401

5.2

1 0.96

0.96 0.8315

P X

P X

(d)

146

1 146

146 1401

5.2

1 1.15

1 0.8749 0.1251

P X

P X

This value must rounded to 2dp and stated. The z values in the table are given to 2dp.



Finding a value of x given a percentage Five stages:

Write it out using the random variable X Turn it into a “greater than problem” Turn it into a “Z problem” Use 1P Z z P Z z to ensure that

the probability on the right hand side is less than ½.

Use tables (REVERSE SIDE) Example The heights of a certain animal is normally distributed with mean 60cm and standard deviation 2cm. (a) Find the height above which 20% of animals lie. (b) Find the height above which 60% of animals lie. (c) Find the height below which 10% of animals lie. (d) Find the height below which 70% of animals lie. (a)

0.2

600.2

2

600.8416

260 2 0.8416 61.7 cm (to 3sf)

P X x

xP Z

x

x

(c)

0.1

0.9

600.9

2

600.1

2

601.2816

260 2 1.2816 57.4 cm (to 3sf)

P X x

P X x

xP Z

xP Z

x

x

SHORTCUT : All these problems come down to finding a value which is either z or z , where the z value is the value linked in the tables above with p (if 0.5p ) or 1 p (if 0.5p ). It is easy to work out whether it is z or z . This is explained overleaf.

(b)

0.6

600.6

2

600.4

2

600.2533

260 2 0.2533 59.5 cm (to 3sf)

P X x

xP Z

xP Z

x

x

(d)

0.7

0.3

600.3

2

600.5244

260 2 0.5244 61.0 cm (to 3sf)

P X x

P X x

xP Z

x

x



SHORTCUT Example The heights of a certain animal is normally distributed with mean 60cm and standard deviation 2cm. (a) Find the height above which 20% of animals lie. (b) Find the height above which 60% of animals lie. (c) Find the height below which 10% of animals lie. (d) Find the height below which 70% of animals lie. (a) Using the shortcut above, the z value to use is 0.8416 (corresponding to 0.2) Since 20% of animals lie above this value, the value is clearly greater than the mean, so the value is 60 0.8416 2 61.7 cm (to 3sf). (b) Using the shortcut above, the z value to use is 0.2533 (corresponding to 1 0.6 0.4 ). Since 60% of animals lie above this value, the value is clearly less than the mean, so the value is 60 2 0.2533 59.5 cm (to 3sf) . (c) Using the shortcut above, the z value to use is 1.2816 (corresponding to 0.1). Since 10% of animals lie below this value, the value is clearly less than the mean, so the value is 60 2 1.2816 57.4 cm (to 3sf) . (d) Using the shortcut above, the z value to use is 0.5244 (corresponding to 1 0.7 0.3 ). Since 70% of animals lie below this value, the value is clearly greater than the mean, so the value is 60 2 0.5244 61.0 cm (to 3sf) .

Example If X~ 287,3N then find x such that 0.9821P X x .

0.9821 clearly does not appear on the reverse table so we have to use the front.

870.9821

3

x

which gives 87

2.103

x and so 93.3x (to 3sf).

NB : A question may give us a probability which is not one of the values listed on the back table and which is not listed on the front. We need to use the front table and linearly interpolate between the relevant values to find an x value.

Example If X~ 21620,35N then find x such that 0.75P X x .

0.75 clearly does not appear on the reverse table so we have to use the front. It lies between 0.7486 and 0.7517.

0.75 lies 14

31th of the way between 0.7486 and 0.7517. So the z value is

140.67 0.01 0.6745

31

(to 4dp) 1620

0.7535

x

which gives 1620

0.674535

x and so 1640x (to 3sf).

NB : A question may give us a probability which is not ‘neat’, so not one of the values listed on the back table. We then need to use the front table. How to do this is explained below….



Find mean or standard deviation Five stages:

Write it out using the random variable X Turn it into a “greater than problem” Turn it into a “Z problem” Ensure that the probability on the right hand

side is less than ½. Use tables (REVERSE SIDE)

Example The amount of jam in a jar is normally distributed with mean and standard deviation 5g. Find the mean if the probability that a jam chosen at random contains: (a) more than 142g is 0.05 (b) less than 142g is 0.01 (a) Using the shortcut outlined earlier, 142 5 1.6449 , since 142 is greater than . 142 5 1.6449 133.8g (to 1dp) . (b) Using the shortcut outlined earlier, 142 5 2.3263 , since 142 is less than . 142 5 2.3263 153.6g (to 1dp) . Example X is normally distributed with mean 140g and standard deviation . Find given that the probability of being more than 150g is 0.2. Using the shortcut outlined earlier, since 150 is greater than the mean, 150 140 0.8416 . Hence 11.9g (to 1dp) . Alternatively… 150 0.2

150 1400.2

100.8416

11.9g (to 1dp)

P X

P Z



Find mean and standard deviation

X is normally distributed with mean and standard deviation . Find and given that the probability of being more than 70 is 0.1 and the probability of being less than 60 is 0.2. Using the shortcut outlined earlier: 70 1.2816 , since 70 is greater than . 60 0.8416 , since 60 is less than . Alternatively

70 0.1 60 0.2

700.1 60 0.8

70 601.2816 0.8

6070 1.2816 0.2

600.8416

60 0.8416

P X P X

P Z P Z

P Z

P Z

We now have two simultaneous equations to solve 70 1.2816 (1)

60 0.8416 (2)

Subtract (2) from (1) gives us

10 1.2816 0.8416

10 2.1232

4.71 (to 3sf)

Plugging this back into (1) gives us 64.0 (to 3sf)

s1 notes (edexcel) · ... a2 notes and igcse / gcse worksheets 1 s1 notes (edexcel) for use only in...

Documents