stats pack jun14



  • The Actuarial Education Company © IFE: 2014 Examinations

    ActEd Study Materials: 2014 Examinations

    Stats Pack

    Contents

    Introduction 12 Chapters

    If you think that any pages are missing from this pack, please contact our administration team by email at [email protected] or by phone on 01235 550005.

    Important: Copyright Agreement

    This study material is copyright and is sold for the exclusive use of the purchaser. You may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of it. You must take care of your material to ensure that it is not used or copied by anybody else. By opening this pack you agree to these conditions.

    Item code: PSTA14


    All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Faculty and Institute of Actuaries.

    You may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.

    You must take care of your study material to ensure that it is not used or copied by anybody else.

    Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.

    These conditions remain in force after you have finished using the course.

  • Stats Pack-00: Introduction Page 1


    Stats Pack

    Introduction

    Background

    Stats Pack was originally developed in response to requests from students who have studied very little statistics before in their schooling and for whom Subject CT3 is a significant jump. It now forms part of the syllabus for the Actuarial Common Entrance Test (ACET).

    How to use the Stats Pack

    Don't be put off by the size of this course! The style of this pack is deliberately chatty, its purpose being to ensure that you understand the concepts, as doing so will make remembering and applying the results far easier.

    The earlier parts of some chapters are pitched deliberately low so that those with a non-mathematical background (or those who haven't studied maths for a while) can quickly get into it. If you find it too easy, skip through it and try the questions!

    Do, however, take the time to try the questions. This will make a real difference to your understanding (especially if you try them before looking at the solutions). You will find extra practice questions at the end of each chapter to enable you to consolidate what you have learnt. Many of these questions are from the CT3 exam.


    Stats Pack Online Classroom

    Please note that this Stats Pack comes with complimentary access to the Stats Pack online classroom. This is a series of pre-recorded tutorials covering the main points from the course with examples as well as a dedicated forum for queries staffed by tutors. To access the online classroom please visit:

    https://learn.bpp.com

    You should have received an email with your access details. If you have lost this, enter your username (which is the email address used by ActEd) and click the "Forgotten your password?" link to have a new password emailed to you. Should you have any problems accessing the online classroom, please email our admin team at [email protected].

    Queries and feedback

    We have worked hard to ensure the Stats Pack is clear and accessible and we honestly believe that the Stats Pack will be an invaluable aid in helping you to get to grips with the fundamentals of statistics. However, if you find that anything is still unclear, please post your queries in the forum in the Stats Pack Online Classroom. Alternatively, you can post your query in the FAC and StatsPack forum at www.ActEd.co.uk/forums (or use the link from our homepage at www.ActEd.co.uk).

    If you have any feedback on this course then please do email [email protected]. Thanks.

    ACET Mock Exam

    A practice exam containing questions of the same standard as the ACET exam can be found in the reference resources section of the FAC online classroom.

  • Stats Pack: Index Page 1

    The Actuarial Education Company IFE: 2013 Examinations

    Stats Pack Index

    Addition rule for mutually exclusive events: Ch4 p6
    Addition rule for non-mutually exclusive events: Ch4 p9
    Attribute data: Ch1 p4
    Bar chart: Ch1 p9
    Bernoulli distribution: Ch8 p7
    Binomial distribution: Ch8 p11
    Bivariate data: Ch12 p2
    Boxplot: Ch1 p23, Ch3 p18
    Categorical data: Ch1 p2
    Central moment: Ch3 p33, Ch7 p38
    Coefficient of skewness: Ch7 p35, Ch9 p33
    Combinations: Ch6 p8
    Combinations to calculate probabilities: Ch6 p11
    Comparison of data: Ch3 p39
    Complementary events: Ch4 p4
    Conditional probability: Ch4 p15, Ch5 p10
    Continuous uniform distribution: Ch10 p2
    Continuous random variables: Ch9 p3
    Correlation: Ch12 p4
    Correlation coefficient: Ch12 p10
    Covariance: Ch12 p7
    Cumulative distribution function: Ch7 p11, Ch9 p11
    Cumulative frequency curve: Ch1 p20
    Cumulative frequency table: Ch1 p8
    Dichotomous data: Ch1 p4
    Discrete data: Ch1 p3
    Discrete random variables: Ch7 p3
    Discrete uniform distribution: Ch8 p2
    Dotplot: Ch1 p19


    Expectation
        Of a continuous random variable: Ch9 p17
        Of a discrete random variable: Ch7 p17
        Of a function of a continuous random variable: Ch9 p22
        Of a function of a discrete random variable: Ch7 p20
        Of linear functions of random variables: Ch7 p22, Ch9 p24
    Explanatory variable: Ch12 p3
    Exponential distribution: Ch10 p10
    Frequency density: Ch1 p12
    Frequency distribution: Ch1 p5
    Grouped frequency distribution: Ch1 p6
    Histogram: Ch1 p10
    Independent events: Ch4 p11
    Interpolation: Ch2 p31
    Interquartile range
        From a frequency distribution: Ch3 p10
        From a grouped frequency distribution: Ch3 p13
        From a list: Ch3 p5
        Using cumulative frequency: Ch3 p16
    Line of best fit: Ch12 p15
    Lineplot: Ch1 p19
    Location: Ch2 p1
    Lower quartile: Ch3 p5
    Mean
        From a frequency distribution: Ch2 p9
        From a grouped frequency distribution: Ch2 p11
        From a list: Ch2 p7
        Of a discrete random variable: Ch7 p17
        Of a continuous random variable: Ch9 p16
    Median
        From a frequency distribution: Ch2 p18
        From a grouped frequency distribution: Ch2 p20
        From a list: Ch2 p15
        Of a continuous random variable: Ch9 p18
        Of a discrete random variable: Ch7 p18
        Using cumulative frequency: Ch2 p21


    Mode
        From a frequency distribution: Ch2 p4
        From a grouped frequency distribution: Ch2 p5
        From a list: Ch2 p3
        Of a continuous random variable: Ch9 p20
        Of a discrete random variable: Ch7 p19
    Moment: Ch2 p25, Ch3 p33, Ch7 p37, Ch9 p34
    Multiplication rule for independent events: Ch4 p11
    Mutually exclusive events: Ch4 p5
    Negative correlation: Ch12 p4
    Nominal data: Ch1 p4
    Normal distribution
        General probability: Ch11 p22
        Moments: Ch11 p7
        PDF: Ch11 p3
        Probabilities for any normal distribution: Ch11 p26
        Standard normal: Ch11 p8
        Standard normal probabilities: Ch11 p9
        Standardising: Ch11 p24
    Numerical data: Ch1 p2
    Ordinal data: Ch1 p4
    Permutations of all objects: Ch6 p4
    Permutations of some objects: Ch6 p5
    Poisson distribution: Ch8 p21
    Positive correlation: Ch12 p4
    Probability: Ch4 p2
    Probability density function: Ch9 p6
    Probability distributions: Ch7 p4
    Probability functions: Ch7 p5
    Probability tree diagrams: Ch5 p5
    Qualitative data: Ch1 p2
    Quantitative data: Ch1 p2


    Random variable: Ch7 p3
    Range
        From a frequency distribution: Ch3 p3
        From a grouped frequency distribution: Ch3 p4
        From a list: Ch3 p2
    Regression line: Ch12 p20
    Residual: Ch12 p21
    Response variable: Ch12 p3
    Sample space: Ch4 p2
    Scatterplot: Ch12 p2
    Skewness: Ch1 p26, Ch1 p34, Ch3 p34, Ch7 p32, Ch9 p30
        Negative skew: Ch2 p26
        Positive skew: Ch2 p26
    Standard deviation
        From a frequency distribution: Ch3 p26
        From a grouped frequency distribution: Ch3 p29
        From a list: Ch3 p20
        Of a continuous random variable: Ch9 p26
        Of a discrete random variable: Ch7 p26
    Standard normal distribution: Ch11 p8
        Probabilities: Ch11 p9
    Stem and leaf diagram: Ch1 p17
    Transformation of data: Ch2 p28, Ch3 p37
    Tree diagrams: Ch5 p5
    Upper quartile: Ch3 p5
    Uniform distribution (continuous): Ch10 p2
    Uniform distribution (discrete): Ch8 p2
    Variance
        From a list: Ch3 p24
        Of a continuous random variable: Ch9 p26
        Of a discrete random variable: Ch7 p26
        Of a linear function of random variables: Ch7 p30, Ch9 p28
    Waiting time for a Poisson distribution: Ch10 p21

  • Stats Pack-01: Statistical diagrams Page 1


    Chapter 1

    Statistical diagrams

    Links to CT3: Chapter 1, Sections 1.1-1.6, 4.1

    Syllabus objectives: (i)1. Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a bar chart, histogram, stem and leaf plot, or other elementary device.

    0 Introduction

    The whole basis of this course is that we will be dealing with data (that is, information or facts) such as claim types and amounts, number and age of deaths and so on. We will then summarise these data using diagrams (Chapter 1) and analyse them using averages and measures of spread (Chapters 2 and 3).

    We can then take it a step further: we use these figures to construct statistical models that fit the data we observe. An insurance company can then make predictions about future claims using these models.

    Technically speaking, the word "data" is in fact plural ("datum" is the singular) and so we use "these data" rather than "this data".


    1 Types of data

    Before we start summarising data using diagrams we will briefly describe the various types of data that we might meet and give them their mathematical names. First we need to consider whether we are dealing with numbers or not.

    data
        numerical (ie numbers)
        categorical (ie not numbers)

    Numerical data consists of numbers (eg 341, 0.8, 3.7) or quantities (eg 1.2 kg). This is why, in some textbooks, you will see numerical data being referred to as quantitative data.

    Categorical data consists of non-numerical information (eg sex, eye colour, preferred payment method) and can therefore only take various categories (eg male/female, blue/green/brown/..., cheque/cash/visa/...). In some textbooks, you will see categorical data being referred to as qualitative data.

    Next we can subdivide the numerical data into two types:

    data
        numerical (ie numbers)
            discrete
            continuous
        categorical (ie not numbers)


    Discrete data is numerical data that can only take particular values. For example, the number of claims can only be a whole number (0, 1, 2, 3, ...). We certainly can't have 3.8 claims! Typically we get discrete data from counting, eg number of actuaries, number of claims, number of deaths.

    Continuous data is numerical data that can take any value within a specified range. For example, the length of time between claims can take any positive value: it doesn't have to be a whole number such as 85 minutes, but could be 84.6914... minutes. Typically we get continuous data from measuring, eg height (cm) or time (secs). Since continuous data can take an infinite number of different values, it is usually rounded off when written down, eg to the nearest second.

    Question 1.1

    (i) For each of the following state whether the data is numerical or categorical:
        (a) weight
        (b) place of birth
        (c) number of claims to be processed
        (d) nature of car insurance claim
        (e) age
        (f) amount of claim.
    (ii) For the numerical data in part (i), state whether it is discrete or continuous.


    We now subdivide the categorical data into 3 types:

    data
        numerical (ie numbers)
            discrete
            continuous
        categorical (ie not numbers)
            attribute (dichotomous)
            nominal
            ordinal

    Attribute (or dichotomous) data is categorical (ie non-numerical) data that has only two categories. For example, claim/no claim, dead/alive or male/female. It is called attribute data as we are simply saying whether the item has this attribute (ie characteristic) or not.

    Nominal data is categorical (ie non-numerical) data that cannot be ordered in any way. For example, hair colour (blonde, brunette, ginger or black), type of policy (whole life assurance, term assurance or endowment assurance) and nature of claim (fire, theft, accident, earthquake, etc).

    Ordinal data is categorical (ie non-numerical) data that can be ordered. For example, tidiness (messy, fairly tidy or very neat), build (fat, medium sized or thin), agreement (strongly agree, agree, neither agree nor disagree, disagree, strongly disagree).

    Question 1.2

    State what type of data is required by each of these questions:
    (i) Which area do you work in (life, pensions, general, health or investment)?
    (ii) Did you study mathematics at university?
    (iii) How would you rate your revision technique (1 = excellent, 5 = poor)?

    Since we will be concentrating on numerical data in the Subject CT3 course, it is the distinction between discrete and continuous data that will be the most important.


    2 Summarising data in tables

    The first step to summarising a list of data values is to put the values in a table.

    2.1 Frequency distributions

    Below we have the number of claims reported each day to a small general insurance company over the last 28 working days:

    4 2 0 3 2 1 1 4 2 5 0 3 2 1 3 4 3 5 1 2 4 2 3 1 4 2 3 2

    This list of data is not very helpful in telling us exactly what is going on. So to help make things clearer we're going to count how many there are of each number (called the frequency) and then put this into a table. In our list we have two days where 0 claims were reported, five days where only 1 claim was reported, eight days where 2 claims were reported and so on.

    Claims reported each day    Frequency
    0                           2
    1                           5
    2                           8
    3                           6
    4                           5
    5                           2

    This table is called a frequency distribution as it shows the distribution of the frequencies between the data values (ie how the frequencies are shared out amongst the data values). A frequency distribution is suitable for categorical or discrete numerical data.
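The counting described above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the variable names are our own, and `collections.Counter` is just one convenient way of tallying.

```python
from collections import Counter

# Daily claim counts for the 28 working days listed in Section 2.1.
claims = [4, 2, 0, 3, 2, 1, 1, 4, 2, 5, 0, 3, 2, 1,
          3, 4, 3, 5, 1, 2, 4, 2, 3, 1, 4, 2, 3, 2]

# Counter tallies how many times each value occurs (its frequency).
frequency = Counter(claims)

print("Claims reported each day  Frequency")
for value in sorted(frequency):
    print(f"{value:>24}  {frequency[value]:>9}")
```

Sorting the keys before printing keeps the rows in the same order as the table in the text.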

    Question 1.3

    For the frequency distribution above, explain how we can obtain:
    (i) the number of results obtained
    (ii) the total number of claims reported.


    2.2 Grouped frequency distributions

    Below is a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

    57 68 75 66 72 86 80 81 70 78
    76 72 88 84 69 77 83 90 48 63
    74 81 94 51 73 96 81 66 77 101

    The frequency distribution for this data is:

    Age at death    Frequency
    48              1
    51              1
    57              1
    63              1
    66              2
    67              0
    68              1
    69              1
    70              1
    71              0
    72              2
    73              1
    74              1
    75              1
    76              1
    77              2
    78              1
    79              0
    80              1
    81              3
    82              0
    83              1
    84              1
    85              0
    86              1
    87              0
    88              1
    89              0
    90              1
    94              1
    96              1
    101             1


    As you can see this is particularly unhelpful! Why? Because the data values are too spread out. To counter this we can put the data into groups (called classes). We have 1 result (48) that is between 40 and 49, 2 results (51, 57) that are between 50 and 59, 5 results (63, 66, 66, 68, 69) that are between 60 and 69 and so on.
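The grouping just described can be sketched as follows. This is a rough illustration under our own assumptions: the helper `class_start` and the constant `CLASS_WIDTH` are hypothetical names, and ten-year classes starting at multiples of 10 are assumed because that is how the text groups the ages.

```python
from collections import Counter

# Ages last birthday at death of the 30 policyholders in Section 2.2.
ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

CLASS_WIDTH = 10  # assumption: ten-year classes (40-49, 50-59, ...)


def class_start(age, width=CLASS_WIDTH):
    """Return the lowest age in the class containing `age` (eg 57 -> 50)."""
    return (age // width) * width


# Tally how many ages fall into each class.
grouped = Counter(class_start(age) for age in ages)

for start in sorted(grouped):
    print(f"{start}-{start + CLASS_WIDTH - 1}: {grouped[start]}")
```

Integer division is used here only because the classes happen to start at multiples of 10; classes with other boundaries would need an explicit list of limits.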

    Age at death    Frequency
    40-49           1
    50-59           2
    60-69           5
    70-79           10
    80-89           8
    90-99           3
    100-109         1

    This table is called a grouped frequency distribution as it shows the distribution of the frequencies between the groups (classes).

    Continuous data is unlikely to produce any repeats (as the values could be anything) and so we would expect the data values to be spread out. Hence, a grouped frequency distribution is how we should tabulate continuous data.

    Question 1.4

    A consumer watchdog measures the length of time (to the nearest 1/100th of a minute) for which 30 phone calls to a helpline were put on hold. The results are:

    1.45 0.32 1.81 0.90 1.02 2.00 1.63 0.86 8.56 0.78
    0.16 3.36 2.70 0.64 1.46 4.29 0.50 3.18 4.64 1.70
    2.69 4.20 1.50 3.90 6.20 3.15 4.99 2.05 7.90 9.10

    Complete this grouped frequency distribution:

    Time (t)        Frequency
    0 ≤ t < 0.5
    0.5 ≤ t < 1
    1 ≤ t < 2
    2 ≤ t < 5
    5 ≤ t < 10


    2.3 Cumulative frequency tables

    A cumulative frequency table is one where we accumulate (ie add up) the frequencies as we go through each of the data values. For example, using the ages of death given in the previous section we get:

    Age at death    Frequency    Cumulative frequency
    40-49           1            1
    50-59           2            3
    60-69           5            8
    70-79           10           18
    80-89           8            26
    90-99           3            29
    100-109         1            30

    What does each of the cumulative frequencies represent? Well, the 1 is all the deaths from 40-49 (ie up to age 49), the 3 is all the deaths from 40-59 (ie up to age 59), the 8 is all the deaths from 40-69 (ie up to age 69) and so on. Therefore it would make sense to label the cumulative frequency table as follows:

    Age at death    Cumulative frequency
    up to 49        1
    up to 59        3
    up to 69        8
    up to 79        18
    up to 89        26
    up to 99        29
    up to 109       30
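As a quick check of the running totals, here is a minimal Python sketch. The list names are our own; `itertools.accumulate` simply adds up the frequencies as it goes, which is exactly what "cumulative" means here.

```python
from itertools import accumulate

# Upper age of each class and its frequency, from the grouped table above.
upper_ages = [49, 59, 69, 79, 89, 99, 109]
frequencies = [1, 2, 5, 10, 8, 3, 1]

# accumulate() yields the running total after each class.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_ages, cumulative):
    print(f"up to {upper}: {total}")
```

The final cumulative frequency should always equal the number of data values (30 here), which is a handy check on the arithmetic.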

    Question 1.5

    Draw up a cumulative frequency table for length of time for which 30 phone calls to a helpline were put on hold using the data from Question 1.4.

    A cumulative frequency table is helpful in finding the positions of data values, such as the middle value (the median). This will be covered in Chapter 2.

    (The cumulative frequencies are built up as 1 + 2 = 3, 3 + 5 = 8, 8 + 10 = 18 and so on.)


    3 Summarising data in diagrams

    Whilst putting data into frequency tables is helpful, a diagram can often make the patterns in the data much clearer. We now look at six types of diagram.

    3.1 Bar chart

    A bar chart can be drawn for discrete or categorical data. For each data item, we simply draw a bar showing its frequency (ie how often that value occurs). A general insurance company has analysed the types of claims it received over the last month. The results are as follows:

    Claim type      Frequency
    House theft     57
    House fire      48
    Car theft       156
    Car accident    245

    The bar chart for these data is:

    [Bar chart: x-axis "Types of claims" (House theft, House fire, Car theft, Car accident); y-axis "Frequency", scaled from 0 to 300.]

    Generally, the x-axis is used to show the data items and the y-axis is used to show frequency. However, they can be drawn the other way round. If we are given a list of data values, it is usually easier to put them into a frequency table first and then draw the bar chart.
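To give a feel for the shape of this bar chart, the sketch below prints a rough text version with horizontal bars (ie drawn "the other way round"). It is only an illustration: the scale of one symbol per 10 claims and the helper name `text_bar` are our own assumptions, not something taken from the course.

```python
# Claim frequencies from the table in Section 3.1.
claim_types = {
    "House theft": 57,
    "House fire": 48,
    "Car theft": 156,
    "Car accident": 245,
}

CLAIMS_PER_SYMBOL = 10  # hypothetical scale: one '#' per 10 claims


def text_bar(frequency, scale=CLAIMS_PER_SYMBOL):
    """Return a bar of '#' symbols, rounding to the nearest whole symbol."""
    return "#" * ((frequency + scale // 2) // scale)


for label, freq in claim_types.items():
    # Each bar's length is proportional to the frequency of that claim type.
    print(f"{label:<13}{text_bar(freq)} {freq}")
```

Because each bar has the same thickness, it is the length alone that shows the frequency, just as the height does in the vertical version.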


    Question 1.6

    ActEd carried out a study into the mock exam results of students who passed their Subject CT3 exam. The results of a randomly selected group of 20 students (who subsequently passed) in their Subject CT3 mock exam are as follows:

    72, 70, 71, 74, 68, 69, 71, 72, 70, 75, 71, 72, 71, 71, 72, 69, 74, 70, 72, 71

    Draw a bar chart to represent these data.

    Bar charts show the shape of the distribution clearly and simply, but are not suitable for continuous data. This is because continuous data can take any value and so we would need a bar for every number! In Subject CT3 we shall be dealing with numerical data (eg number of claims, age of death, amount of claims) and so all the remaining diagrams in this chapter are only suitable for numerical data.

    3.2 Histogram

    In the last section we used a bar chart to display discrete data. A histogram is similar to a bar chart but is used to display continuous data. Therefore we will use a continuous scale with no gaps between the bars. A general insurance company recorded the claim amounts that it received over the last week. The results are as follows:

    Claim amount (x)        Frequency
    0 ≤ x < 500             6
    500 ≤ x < 1,000         10
    1,000 ≤ x < 1,500       9
    1,500 ≤ x < 2,000       8
    2,000 ≤ x < 2,500       3
    2,500 ≤ x < 3,000       2
    3,000 ≤ x < 3,500       1
    3,500 ≤ x < 4,000       1


    The histogram for these data would be:

    [Histogram: x-axis "claim amount (£)", 0 to 4,000; y-axis "Frequency", 2 to 10.]

    In this case the groups (called classes) all have the same width (called the class width) of 500. However, in practice we may have groups with different widths:

    Claim amount (x)        Frequency
    0 ≤ x < 500             6
    500 ≤ x < 1,000         10
    1,000 ≤ x < 1,500       9
    1,500 ≤ x < 2,000       8
    2,000 ≤ x < 4,000       7

    This would mean our diagram would look like this:

    [Diagram: x-axis "claim amount (£)", 0 to 4,000; y-axis "Frequency", 2 to 10, with the last bar spanning 2,000 to 4,000.]


    The problem with this diagram is that most people would think that the 2,000 ≤ x < 4,000 group has the most claims. Why? Because of its huge area! How can we draw a fairer diagram? Well, since the bar for the last group is four times wider than the other bars, we should reduce the height by a factor of four:

    [Diagram: x-axis "claim amount (£)", 0 to 4,000; vertical scale 2 to 10, with the height of the last bar divided by four.]

    Essentially what we are doing is working with the area of the bars instead of the heights (as we did in a bar chart). So for a histogram:

    $$\text{area} = \text{frequency}$$

    But wait a second! We can no longer use frequency on the vertical axis, since there are clearly more than 7 ÷ 4 = 1.75 claims in the 2,000 ≤ x < 4,000 group! Since we are dealing with rectangular bars the area is height × width. Therefore, if we are given the frequency and the class width of the group we can calculate the height by:

    $$\text{height} = \frac{\text{frequency}}{\text{class width}}$$

    Definition

    The frequency density (height) of each bar on a histogram is given by:

    $$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$


    So for our data we get:

    Claim amount (x)        Frequency    Frequency density
    0 ≤ x < 500             6            6 ÷ 500 = 0.012
    500 ≤ x < 1,000         10           10 ÷ 500 = 0.02
    1,000 ≤ x < 1,500       9            9 ÷ 500 = 0.018
    1,500 ≤ x < 2,000       8            8 ÷ 500 = 0.016
    2,000 ≤ x < 4,000       7            7 ÷ 2,000 = 0.0035
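The frequency density calculation above can be sketched as below. This is a minimal illustration with names of our own choosing (`classes`, `frequency_density`); it also multiplies each density back by its class width to confirm that the area of each bar recovers the frequency.

```python
# Each class is (lower limit, upper limit, frequency), from the table above.
classes = [
    (0, 500, 6),
    (500, 1000, 10),
    (1000, 1500, 9),
    (1500, 2000, 8),
    (2000, 4000, 7),
]


def frequency_density(lower, upper, frequency):
    """Height of the histogram bar: frequency divided by class width."""
    return frequency / (upper - lower)


densities = [frequency_density(*c) for c in classes]

for (lower, upper, freq), density in zip(classes, densities):
    area = density * (upper - lower)  # area of the bar = frequency
    print(f"{lower}-{upper}: density = {density:.4f}, area = {area:.0f}")
```

Working backwards from a histogram to a frequency table uses the same relationship in reverse: frequency = frequency density × class width.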

    Hence, our histogram is given by:

    [Histogram: x-axis "claim amount (£)", 0 to 4,000; y-axis "Frequency density", 0.004 to 0.020.]

    All that has changed from our previous "fair" diagram is the scale on the vertical axis, so that the area of each bar is now the frequency (eg for the 0 ≤ x < 500 group, the area is 0.012 × 500 = 6, which is the frequency).

    Note that in general, a histogram is drawn with vertical bars and a continuous scale on the x-axis. However, it can be drawn with horizontal bars instead. In an exam, it is expected that you would use graph paper to draw a histogram.

    In summary, to draw a histogram we first have to calculate the frequency densities (by dividing the frequencies by the class widths). We then draw the histogram using the frequency densities for the heights.

    Technically the area is proportional to the frequency. Thus A = k × f; however, for simplicity we have assumed that k = 1 for this section.


    Question 1.7

    Another general insurance company recorded the claim amounts that it received during the previous month:

    Claim amount (x)        Frequency
    0 ≤ x < 250             60
    250 ≤ x < 500           75
    500 ≤ x < 1,000         50
    1,000 ≤ x < 2,000       40
    2,000 ≤ x < 5,000       30

    (i) Calculate the frequency densities for each of the groups.
    (ii) Hence draw a histogram to represent these data.

    Once we know the width of the group it is fairly straightforward to then calculate the height of the bar (ie the frequency density) and thus draw the histogram. We are now going to look at how we can calculate the widths for two other ways of grouping continuous data.

    Continuous data could be rounded, eg time to the nearest minute. In this case we could get a group of:

    10-19 mins

    Since the times are rounded to the nearest minute, the smallest value that could be included in this group is 9.5 mins (as this will round up to 10 mins). Similarly, the largest value that could be included in this group is (just below) 19.5 mins (as this will round down to 19 mins). Therefore we get:

    class width = 19.5 - 9.5 = 10 mins

    When we construct our histogram we would actually draw the 10-19 mins bar from 9.5 to 19.5.

    The only other type of group that we could meet is one that involves ages. In this case we could get a group of:

    11-20 years

    The problem with this group is that most people give their age last birthday (eg someone who is actually 24 years 9 months would say that they were 24 years old). The lowest age that could be included in this group is 11 years (as it could be the person's 11th birthday). However, up until the day before your 21st birthday you would still say that you were 20 years old. Therefore the largest age that could be included in this group is (just below) 21 years. So we get:

    class width = 21 − 11 = 10 years

    When we construct our histogram we would actually draw the 11-20 years bar from 11 to 21.
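The two rules above (values rounded to the nearest unit, and ages recorded as age last birthday) can be sketched as a small helper. The function names and the `recording` labels are hypothetical and chosen purely for illustration. The sketch also assumes that values can fall below the quoted lower limit, so it does not deal with a group whose values cannot go below a floor such as zero.

```python
def class_boundaries(lowest, highest, recording="nearest"):
    """Return the true (lower, upper) boundaries of a group quoted as lowest-highest.

    recording="nearest": values are rounded to the nearest unit, so the group
        stretches half a unit either side of the quoted limits.
    recording="last_birthday": ages are quoted as age last birthday, so the
        group runs from the lowest age up to (just below) the highest age plus one.
    """
    if recording == "nearest":
        return lowest - 0.5, highest + 0.5
    if recording == "last_birthday":
        return lowest, highest + 1
    raise ValueError(f"unknown recording method: {recording}")


def class_width(lowest, highest, recording="nearest"):
    """Return the width of the bar that would be drawn for the group."""
    lower, upper = class_boundaries(lowest, highest, recording)
    return upper - lower


print(class_boundaries(10, 19, "nearest"))        # times to the nearest minute
print(class_boundaries(11, 20, "last_birthday"))  # age last birthday
```

The boundaries returned are also the values between which the bar would be drawn on the histogram.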

    Question 1.8

    Write down the class width for each of these groups:
    (i) 150 ≤ x < 170 where x represents a claim amount
    (ii) 150-169 for claim amounts recorded to the nearest £
    (iii) 0-149 for claim amounts recorded to the nearest £
    (iv) 30-35 years for age last birthday before the death of an individual.

    Question 1.9

    A life assurance company has analysed the ages of its current policyholders. All ages are recorded as age last birthday. The results are as follows:

    Age        Frequency
    24-29      72
    30-34      80
    35-39      100
    40-49      80
    50-64      75

    Draw a histogram of these data.

    We can also work backwards from a histogram to get the frequency table. For a bar chart, we just needed to read the frequencies off the vertical axis. Recall that for a histogram the frequency of each group is the area of its bar. This histogram shows the journey times (in minutes) of employees to their offices:

    [Figure: histogram of the journey times. Horizontal axis: journey time (mins), 0 to 120. Vertical axis: frequency density, 0 to 7.]

    The first group (0 to 10 mins) has a frequency of:

    5 × 10 = 50

    Question 1.10

    Complete the frequency table for the journey times histogram:

    Time (t)        Frequency
    0 ≤ t < 10      50
    10 ≤ t < 20
    20 ≤ t < 40

    3.3 Stem and leaf diagram

    A stem and leaf diagram is an alternative to a histogram. Here are the ages of 9 individuals in a company:

    17 19 19 24 25 27 28 30 31

    A stem and leaf diagram splits each data value up into 2 parts as follows:

    1 | 7 9 9
    2 | 4 5 7 8
    3 | 0 1

    The single number on the left-hand side is called the stem and the numbers on the right-hand side are the leaves associated with that stem. For the first row, 1 | 7 9 9, the stem is 1 and the leaves are 7, 9 and 9. This row represents the numbers 17, 19 and 19. In this case each number has been split up into tens (stem) and units (leaves). Each of the numbers on the right-hand side represents a data value. To make clear what value each entry actually represents, we need a key:

    Key: 2|4 represents 24

    Question 1.11

    Write down the data represented by this stem and leaf diagram:

    1 | 7 9 9
    2 | 4 5 7 8
    3 | 0 1

    Key: 2|4 represents 2.4

    Note how we have arranged the leaves in numerical order. This will allow us to use the diagram to find the middle value (the median) and the values that are a quarter and three-quarters of the way through the data (the lower and upper quartiles). This will be covered in Chapters 2 and 3.

    In the previous example each of the numbers had only two digits, eg 24. In cases where we have more digits we can either place more digits on the stem or round the numbers. For example, a company averages its students' mock examination results and gets the following data:

    56.2, 61.0, 62.8, 63.9, 64.5, 61.8, 59.4, 58.6, 65.1, 62.1, 60.3, 57.9, 62.3, 62.1, 60.7, 59.4, 61.4, 58.7, 63.0, 70.5, 68.3, 61.9, 60.5, 63.2, 64.8

    Using a key of 61|4 represents 61.4 we get:

    56 | 2
    57 | 9
    58 | 6 7
    59 | 4 4
    60 | 3 5 7
    61 | 0 4 8 9
    62 | 1 1 3 8
    63 | 0 2 9
    64 | 5 8
    65 | 1
    66 |
    67 |
    68 | 3
    69 |
    70 | 5

    Alternatively, rounding each of the data values to the nearest whole number and using a key of 5|8 represents 58 gives:

    5 | 6 8 9 9 9 9
    6 | 0 1 1 1 1 2 2 2 2 2 3 3 3 4 5 5 5 8
    7 | 1
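The rounded diagram above can be reproduced with a short Python sketch. The helper names `round_half_up` and `stem_and_leaf` are our own. Note one assumption that matters: the diagram rounds halves upwards (eg 60.5 becomes 61), whereas Python's built-in `round` rounds halves to the nearest even number, so a separate helper is used here.

```python
import math
from collections import defaultdict

results = [56.2, 61.0, 62.8, 63.9, 64.5, 61.8, 59.4, 58.6, 65.1, 62.1,
           60.3, 57.9, 62.3, 62.1, 60.7, 59.4, 61.4, 58.7, 63.0, 70.5,
           68.3, 61.9, 60.5, 63.2, 64.8]


def round_half_up(x):
    """Round to the nearest whole number, with halves rounded up."""
    return math.floor(x + 0.5)


def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # Keep any empty stems so that gaps in the data remain visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


diagram = stem_and_leaf([round_half_up(x) for x in results])
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 5|8 represents 58")
```

Sorting the values before splitting them is what puts the leaves in numerical order on each row.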

    Question 1.12

    Represent the following claim amounts on a stem and leaf diagram:

    1730, 2480, 3010, 2820, 5390, 6360, 8340, 3710, 2270, 2500, 3450, 4830, 2360, 4340, 7510, 6270, 1750, 2720, 9340, 7550, 11920, 4840, 5670, 930, 2750, 220, 2340, 3510, 4890, 1040, 3410, 5580, 3760

    Comment on the shape of the diagram.

    Stem and leaf diagrams show the shape of the distribution (like bar charts) but have the advantage of not losing the detail of the original data.

    3.4 Dotplot/Lineplot

    A dotplot (also called a lineplot) is another alternative to the histogram. Here are the starting salaries (in £000s) of 10 new students joining a company:

    21 23 24 24 25 25 25 27 27 28

    We just plot each data value against a number line using a cross or a dot:

    [Figure: dotplot of the starting salaries. Horizontal axis: salary (£000s), 20 to 29.]

    If there are two or more pieces of data to be plotted against the same number then you use the appropriate number of crosses (or dots) on top of each other.
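As a rough illustration of the stacking rule just described, the following Python sketch draws a text version of the dotplot for the salaries listed above. The function name `dotplot_lines` and the cell width are arbitrary choices of ours, not part of the course.

```python
from collections import Counter

salaries = [21, 23, 24, 24, 25, 25, 25, 27, 27, 28]  # in thousands


def dotplot_lines(values, low, high, cell=3):
    """Return the rows of a text dotplot, top row first, axis last."""
    counts = Counter(values)
    tallest = max(counts.values())
    axis = range(low, high + 1)
    lines = []
    # Work down from the top of the tallest stack to the bottom row.
    for level in range(tallest, 0, -1):
        row = "".join(("x" if counts[v] >= level else "").rjust(cell) for v in axis)
        lines.append(row.rstrip())
    # The number line goes underneath, with each cross above its value.
    lines.append("".join(str(v).rjust(cell) for v in axis))
    return lines


plot = dotplot_lines(salaries, 20, 29)
print("\n".join(plot))
```

Each cross sits directly above the last digit of its value, and repeated values are stacked on top of each other.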

    Question 1.13

    Plot the CT3 mock exam results from Question 1.6 on a line plot:

    72, 70, 71, 74, 68, 69, 71, 72, 70, 75, 71, 72, 71, 71, 72, 69, 74, 70, 72, 71

    Like histograms, dot plots show the shape of the distribution clearly. They also have the advantage of being quick to draw.

    A dot plot (or line plot) is often used in the Subject CT3 course to compare the spread (variance) of two or more data sets. Dot plots are also commonly used in exam questions as a quick way to check whether the data set looks like it has come from a normal distribution. This is covered in Chapter 10.

    3.5 Cumulative frequency curves

    In Section 2.3 we constructed cumulative frequency tables from frequency tables:

    Claim amount (x)       Frequency      Claim amount (x)     Cumulative frequency
    0 ≤ x < 500            6              x < 500              6
    500 ≤ x < 1,000        10             x < 1,000            16
    1,000 ≤ x < 1,500      9              x < 1,500            25
    1,500 ≤ x < 2,000      8              x < 2,000            33
    2,000 ≤ x < 4,000      7              x < 4,000            40

    To obtain a cumulative frequency curve of these data, all we do is plot a graph of the cumulative frequencies against the largest claim amount in each group (ie plot 6 against 500).

    [Figure: cumulative frequency curve. Horizontal axis: claim size, 0 to 4,000. Vertical axis: cumulative frequency, 0 to 40.]

    In this case we can start at zero as this is the lowest possible value the claims can be. In an exam you would be expected to use graph paper to draw this diagram.
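The points to plot can be generated as follows. This is a minimal sketch using the claim amount table above; the variable names are our own.

```python
from itertools import accumulate

upper_bounds = [500, 1000, 1500, 2000, 4000]  # largest claim amount in each group
frequencies = [6, 10, 9, 8, 7]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Start the curve at (0, 0), as no claim can be for less than zero.
points = [(0, 0)] + list(zip(upper_bounds, cumulative))
print(points)
```

The last cumulative frequency should always equal the total number of data values (here 40), which is a useful check.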

    Typically we get an S-shaped graph as there tend to be lots of values in the middle (so the cumulative frequency rises quickly here) and few extreme values (so the cumulative frequency rises slowly at the ends).

    Recall from Section 3.2 that there were various ways of grouping continuous data. We are now going to look at how we plot points for each of these other ways.

    Continuous data could be rounded, eg time to the nearest minute. In that case we could get a group of:

    10-19 mins

    Since the times are rounded to the nearest minute, the largest value that could be included in this group is (just below) 19.5 mins (as this will round down to 19 mins). Therefore we would plot the cumulative frequency against 19.5 mins.

    For groups involving ages, such as age last birthday:

    11-20 years

    the largest age that could be included in this group is (just below) 21 years. This is because up until the day before your 21st birthday you would still say that you were 20 years old. So we would plot the cumulative frequency against 21 years.

    Question 1.14

    A life assurance company has analysed the ages of its current policyholders. All ages are recorded as age last birthday. The results are as follows:

    Age        Frequency
    24-29      70
    30-34      80
    35-39      100
    40-49      80
    50-64      70

    Construct a cumulative frequency graph for these data.

    We will now use the cumulative frequency curve to make some estimates about the data. For example, how many of our claims were for less than 750? Reading 750 off our graph, we see that about 11 claims are less than this amount.

    [Figure: the cumulative frequency curve (cumulative frequency against claim size), used to read off the cumulative frequency at a claim size of 750.]

    Similarly, we could find the amount that 50% of the claims were under by reading off (50% of 40 which is) the 20th value. We see that this is about 1,225.

    [Figure: the cumulative frequency curve (cumulative frequency against claim size), used to read off the claim size at a cumulative frequency of 20.]
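Reading values off the graph can be imitated numerically. The sketch below is an approximation under a stated assumption: it joins the plotted points with straight lines, whereas a hand-drawn curve is smooth, so its answers will differ slightly from readings taken off graph paper. The function names are hypothetical.

```python
# (claim size, cumulative frequency) points from the table in this section.
points = [(0, 0), (500, 6), (1000, 16), (1500, 25), (2000, 33), (4000, 40)]


def cumulative_at(x, points):
    """Estimate the cumulative frequency below x by linear interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


def value_at(c, points):
    """Estimate the value below which a cumulative frequency of c lies."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= c <= c1 and c1 > c0:
            return x0 + (x1 - x0) * (c - c0) / (c1 - c0)
    raise ValueError("c lies outside the plotted range")


print(cumulative_at(750, points))       # number of claims below 750
print(round(value_at(20, points)))      # amount that 50% of the claims are under
```

With straight lines the second estimate comes out a little below the figure of about 1,225 read from the curve, which shows how much the answer depends on how the curve is drawn.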

    Question 1.15

    Use your cumulative frequency curve from Question 1.14 to estimate:
    (i) how many policyholders are aged 32 or less
    (ii) the age under which 75% of the policyholders lie.

    3.6 Boxplot

    A boxplot (also called a box and whisker plot) is another way of showing data:

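    [Figure: boxplot. The box runs from the lower quartile (Q1) to the upper quartile (Q3) with a line at the median (M). The whiskers extend to the lowest value and the highest value. Each of the four sections contains 25% of the data.]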

    The rectangle (box) in the middle represents the middle 50% of the data (between the values that are a quarter and three-quarters of the way through the data). The lines (whiskers) extend from the box to the smallest and largest values. The diagram also shows the middle value (called the median).

    A boxplot is particularly effective when comparing two sets of data. However, to draw the diagram we need to calculate the median and the quartiles. Since the median will be covered in Chapter 2 and the quartiles will be covered in Chapter 3, we will deal with this type of diagram at the end of Chapter 3. In the exam it is expected that you would draw a boxplot accurately on graph paper.

    4 Using diagrams to compare data

    Once we have drawn our diagrams we can use them to interpret the patterns in the data or compare two or more data sets. In Subject CT3 we will be looking at three features of any data set: the location, the spread and the skewness.

    4.1 Location

    The location of a data set is simply where the data is located, ie where the centre of the data is, or about what values it is grouped. In everyday language you may use the word average to describe the location. The stem and leaf diagrams below show the claim amounts (in $s) under two different types of insurance:

    Type A                     Type B
    0 | 2 7                    0 | 8
    1 | 1 1 3 6 8 9            1 | 0 2 3
    2 | 3 4 4 4 7              2 | 1 4 6 8
    3 | 0 5                    3 | 2 3 3 6 9 9
    4 | 1                      4 | 0 1 5
    5 | 2                      5 | 4

    Key: 2|5 represents $250

    Type A claims are mostly located between $100 and $200 whereas type B claims are located between $200 and $300. So we could say the type B claims are greater on average than type A claims. In Chapter 2, we will use the mean, median and mode to measure the location of a set of data.

    4.2 Spread

    The spread of a set of data is simply how spread out (ie how variable) the values are. Are the values bunched together or are they very diverse? The dotplots below show the number of telephone calls received in the last six hours in two different departments of the same company:

    [Figure: two dotplots, one for Dept A and one for Dept B, each drawn against a number line from 0 to 10.]

    For Department A, the numbers of phone calls are all bunched together at about 5 per hour, whereas for Department B they are very diverse, ranging from zero to ten. So we would say that the number of phone calls per hour is more spread out in Department B than in Department A. In Chapter 3, we will use the interquartile range and standard deviation to measure the spread of a set of data.

    4.3 Skewness (shape)

    The skewness describes the shape of the distribution: is it symmetrical or not? The more skewed the data, the more asymmetrical the distribution is. The histograms below show the ages of the population in two different towns:

    [Figure: two histograms of frequency density against age (years), one for the Town A population and one for the Town B population.]

    We can see that the population in Town A is skewed (ie not symmetrical) as the hump is on the left. However, it is called positively skew as most of the people in the town are to the right of the hump (ie on the positive side). The population in Town B is also skewed (ie not symmetrical) as the hump is on the right. We call this negatively skew as most of the people in the town are to the left of the hump (ie on the negative side). Smoother sketches are shown below:

    [Figure: smooth sketches of three distributions, labelled positively skewed, symmetrical and negatively skewed.]

    In Chapter 3, we will use the third central moment to measure the skewness of a data set.
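As a preview only, the following Python sketch computes a third central moment for the claim amounts from Question 1.12. Chapter 3 is not reproduced here, so the exact definition used there is an assumption on our part: we take the average of the cubed deviations from the mean, dividing by the number of values. The function name is our own.

```python
claims = [1730, 2480, 3010, 2820, 5390, 6360, 8340, 3710, 2270, 2500, 3450,
          4830, 2360, 4340, 7510, 6270, 1750, 2720, 9340, 7550, 11920, 4840,
          5670, 930, 2750, 220, 2340, 3510, 4890, 1040, 3410, 5580, 3760]


def third_central_moment(values):
    """Average cubed deviation from the mean (assumed definition, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 3 for v in values) / n


moment = third_central_moment(claims)
# A positive result goes with a long tail of large values (positive skew),
# a negative result with a long tail of small values (negative skew).
print("positively skewed" if moment > 0 else "not positively skewed")
```

For a perfectly symmetrical data set such as 1, 2, 3 the cubed deviations cancel out and the result is zero.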

    Question 1.16

    The diagrams below show the boxplots for two different distributions:

    [Figure: boxplots for Group A and Group B, drawn against a common scale from 0 to 20.]

    Compare the location, spread and skewness of these two distributions using the middle lines (the median), the boxes and the whole boxplot, respectively.

    Extra practice questions

    Section 3: Summarising data in diagrams

    P1.1 The lengths travelled by snails in 5 mins were measured to the nearest cm. The results are shown in the table below:

    Length (cm)    Frequency
    0-4            4
    5-6            7
    7-8            15
    9-12           23
    13-18          11

    Calculate the frequency densities that you would need to plot on a histogram for these data.

    P1.2 The mortality of males before retirement is being investigated. The age last birthday at death of 500 males was as follows:

    Age:         5-19   20-29   30-39   40-49   50-54   55-59   60-64
    Frequency:      3      20      27      63      67     116     204

    (i) Draw a histogram to represent these data.

    Below is a histogram showing the deaths of 500 females in the same age range:

    [Figure: histogram of frequency density against age at death for the females. Horizontal axis marked at 5, 20, 30, 40, 50 and 60. Bar heights as labelled, from the youngest group to the oldest: 0.333, 0.7, 1.5, 3.8, 15.6, 26.2 and 45.2.]

    (ii) Use the two histograms to compare the male and female mortality.
    (iii) Construct a grouped frequency distribution for the females.

    P1.3 The following data shows the times taken (in days) to completely process some simple claims:

    8.02 5.11 5.04 3.88 4.76 3.25 4.41 5.19 4.48 6.28
    9.12 6.53 5.14 2.57 6.80 7.31 5.71 6.16 7.51 8.58

    (i) Display these data in a stem and leaf diagram by rounding to 1 decimal place.
    (ii) Comment on the shape of the distribution.

    P1.4 The length of time (in minutes) for which calls to a helpline were put on hold are given in the following table:

    Time (t)         Frequency
    0 ≤ t < 0.5      2
    0.5 ≤ t < 1      5
    1 ≤ t < 2        7
    2 ≤ t < 5        12
    5 ≤ t < 10       4

    (i) Construct a cumulative frequency curve for these data.
    (ii) Use this graph to estimate:
         (a) how many calls were held for less than 3 minutes
         (b) the time that more than 50% of the calls were on hold for.

    P1.5 Subject C1, September 1996, Q8 (part)

    The following table gives the ages of 100 men (in years) in the form of a grouped frequency distribution, where the ages are in groups of width five years, with the exception of the final group.

    Age last birthday:  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-64
    Number of men:          1      2     10     16     22     20     15     14

    Draw a histogram of the data. [2]

    P1.6 Subject 101, September 2002, Q6 (part)

    As part of an investigation an insurance company collected data for the year 2000 on claim sizes for all claims on a certain type of motor insurance policy. The resulting data are given below in the form of a grouped frequency distribution.

    Claim size (£)          Frequency
    ≤ 100                   862
    > 100 and ≤ 200         608
    > 200 and ≤ 300         1,253
    > 300 and ≤ 400         1,066
    > 400 and ≤ 500         558
    > 500                   1,290
    Total                   5,637

    (i) Calculate the cumulative frequencies and draw a graph of the claim size distribution function (ie the cumulative frequencies against claim size). [3]
    (ii) Determine the proportion of claim sizes which are less than £250. [2]
    [Total 5]

    Section 4: Using diagrams to compare data

    P1.7 The ages of employees in two departments are given below:

    Marketing    24 25 27 27 28 28 28 29 29 32
    Personnel    27 31 35 38 44 44 47 47 47 51

    Draw dotplots for each of these departments and hence compare the two departments.

    P1.8 Subject 101, April 2002, Q7

    The following information on white blood cell count (WBCC) was collected from subjects one week after the start of chemotherapy treatment. One group of subjects (A) received steroids in addition to the chemotherapy treatment and the other group (B) received a placebo in addition to the chemotherapy. The subjects were assigned to the groups at random.

    Group A: Steroid. WBCC (millions of cells per ml)

    12.4 15.2 12.7 15.9 12.2 14.2 12.9 14.2 12.4 14.6 12.7 13.6 12.5 13.3 12.1 13.9 17.1 13.6 17.2 13.1

    Group B: Placebo. WBCC (millions of cells per ml)

    17.0 13.5 15.4 14.1 15.4 14.8 12.9 14.4 13.2 13.1 12.9 13.9 13.0 13.6 13.0 13.4 12.9 13.1 14.4 13.8

    (i) Construct stem and leaf diagrams for Group A and Group B separately. [2]
    (ii) Comment on the results in the context of investigating an association between WBCC and the treatment with or without steroids. [2]
    [Total 4]

    Chapter 1 Summary

    Data

    Data (ie information or facts) can be subdivided as follows:

    data
      numerical (ie numbers): discrete or continuous
      categorical (ie not numbers): nominal, ordinal or attribute (dichotomous)

    Discrete data is numerical data that can only take particular values (eg 0, 1, 2, 3, …). Continuous data is numerical data that can take any value.

    We can summarise data using tables (frequency distributions) or diagrams.

    Histograms

    A histogram is similar to a bar chart but is drawn for continuous data. Therefore it has no gaps between the bars. However, for a histogram the area of the bar gives the frequency of the group (class). The frequency density (the height of the bars) is found from:

    frequency density = frequency ÷ class width

    where the class width is the difference between the largest and smallest values allowed in the class.

    Line plots

    We just plot each data value against a number line using a cross or a dot.

    Stem and leaf diagrams

    A stem and leaf diagram splits each data value up into 2 parts (a stem and a leaf) as follows:

    1 | 7 9 9
    2 | 4 5 7 8
    3 | 0 1

    Key: 2|4 represents 24

    This diagram represents the values: 17, 19, 19, 24, 25, 27, 28, 30, 31.

    Cumulative frequency diagrams

    Cumulative frequency is the running total of the frequencies. A cumulative frequency diagram plots the cumulative frequency against the largest possible value in each group.

    Boxplots

    [Figure: boxplot. The box runs from the lower quartile (Q1) to the upper quartile (Q3) with a line at the median (M). The whiskers extend to the lowest value and the highest value. Each of the four sections contains 25% of the data.]

    Comparing data sets

    When comparing data sets we look at the location, spread and skewness (shape) of each distribution. The types of skewness are:

    [Figure: smooth sketches of three distributions, labelled positively skewed, symmetrical and negatively skewed.]

    Chapter 1 Solutions

    Solution 1.1

    (i) (a) Numerical (eg 75kg, 200g, 3 tons)
        (b) Categorical (eg London, Glasgow, Bognor Regis)
        (c) Numerical (eg 12 claims, 193 claims)
        (d) Categorical (eg theft, fire, accident, hurricane)
        (e) Numerical (eg 23 years, 65 years)
        (f) Numerical (eg £180, £2m, $740.99)

    (ii) (a) Continuous, as items can weigh absolutely any positive value.

         (c) Discrete, as there can only be a whole number of claims (ie 0, 1, 2, …).

         (e) Depends! When we give our age we usually give our age last birthday (eg 23 years) which is discrete, rather than our exact age (eg 23 years, 3 months, 2 days, 14 hours, …) which is continuous.

         (f) Well, technically discrete since you can only have a whole number of pence (eg £450.62). However, in Subject CT3 we shall treat it as continuous as the numbers involved are often so large (eg £4,267,593.81) that it can take (as good as) any value.

    The advantage of treating it as continuous is that we can use continuous

    functions (eg 2 2 1y x x ) to calculate amounts (and then just round them to the nearest pence afterwards). This is much more preferable to awkward functions that would only give whole numbers

    Solution 1.2

    (i) Nominal data as we cannot put them in any order.
    (ii) Attribute data as the answer is yes or no.
    (iii) Ordinal data as the categories are ordered from poor to excellent.

    Solution 1.3

    (i) We just total up the frequencies: 2 + 5 + 8 + 6 + 5 + 2 = 28.

    (ii) From the table, we have 2 days with 0 claims (total 2 × 0 = 0 claims), 5 days with 1 claim (total 5 × 1 = 5 claims), 8 days with 2 claims (total 8 × 2 = 16 claims), 6 days with 3 claims (total 6 × 3 = 18 claims), 5 days with 4 claims (total 5 × 4 = 20 claims) and 2 days with 5 claims (total 2 × 5 = 10 claims). This gives us a grand total of 0 + 5 + 16 + 18 + 20 + 10 = 69 claims.

    What we are doing is multiplying each data value by its frequency and then totalling all of these up. Later we shall write this in shorthand as Σfx.

    Solution 1.4

    The completed frequency table is as follows:

    Time (t)         Frequency
    0 ≤ t < 0.5      2
    0.5 ≤ t < 1      5
    1 ≤ t < 2        7
    2 ≤ t < 5        12
    5 ≤ t < 10       4

    The only problems that might occur are placing 0.5 in the 0 ≤ t < 0.5 group rather than the 0.5 ≤ t < 1 group, and not including some of the data values in the table. Crossing off the data values as you put them in the table is a useful way to ensure we don't miss any values. We could also check that we have the correct total number of results by adding up the frequencies.

    Solution 1.5

    The cumulative frequency table is:

    Time (t)      Cumulative frequency
    t < 0.5       2
    t < 1         7
    t < 2         14
    t < 5         26
    t < 10        30

    Or we could use up to 0.5 mins, up to 1 min, etc as the groups.

    Solution 1.6

    Putting this data into a frequency table:

    Mock result    Frequency
    68             1
    69             2
    70             3
    71             6
    72             5
    73             0
    74             2
    75             1

    It is now easy to draw the bar chart:

    [Figure: bar chart of the mock results (68 to 75) against frequency (0 to 7).]

    Solution 1.7

    (i) Using frequency density = frequency ÷ class width, we get:

    Claim amount (x)       Frequency    Frequency density
    0 ≤ x < 250            60           60 ÷ 250 = 0.24
    250 ≤ x < 500          75           75 ÷ 250 = 0.3
    500 ≤ x < 1,000        50           50 ÷ 500 = 0.1
    1,000 ≤ x < 2,000      40           40 ÷ 1,000 = 0.04
    2,000 ≤ x < 5,000      30           30 ÷ 3,000 = 0.01

    (ii) The histogram is:

    [Figure: histogram of the claim amounts. Horizontal axis: claim amount (£), 0 to 5,000. Vertical axis: frequency density, 0 to 0.3.]

    Solution 1.8

    (i) The group ranges from exactly 150 to (just below) 170. Hence:

        class width = 170 − 150 = 20

    (ii) Since the amounts are rounded to the nearest £, the smallest value that could be included in this group is 149.50 (as this would round up to 150). Similarly (treating the amounts as continuous) the largest value that could be included in this group is (just below) 169.50. Hence:

        class width = 169.50 − 149.50 = 20

    (iii) This is very similar to part (ii) except that we can't get claims smaller than 0. Therefore the smallest value that could be included is 0. Hence:

        class width = 149.50 − 0 = 149.50

    (iv) The smallest age that could be included in this group is 30 years (as the person could have died on their 30th birthday). However, if a person dies up until the day before their 36th birthday we would still say they were age 35. Hence the largest age that could be included is (just below) 36 years. This gives:

        class width = 36 − 30 = 6 years

    Solution 1.9

    First we need to calculate the frequency densities. The first class goes from age 24 to (just below) age 30, so the class width is 30 − 24 = 6. Similarly, the second class goes from age 30 to (just below) 35, so the class width is 35 − 30 = 5, and so on.

    Age        Frequency    Frequency density
    24-29      72           72 ÷ 6 = 12
    30-34      80           80 ÷ 5 = 16
    35-39      100          100 ÷ 5 = 20
    40-49      80           80 ÷ 10 = 8
    50-64      75           75 ÷ 15 = 5

    Now we can draw the histogram, remembering to draw the start and end points of the bars at the correct values (eg the first bar should be drawn from ages 24 to 30):

    [Figure: histogram of the policyholders' ages. Horizontal axis: age (years), 20 to 60. Vertical axis: frequency density, 0 to 20.]

    Solution 1.10

    Using the fact that the frequency is given by the area:

    frequency = area = height (frequency density) × class width

    we get:

    Time (t)         Frequency
    0 ≤ t < 10       50
    10 ≤ t < 20      7 × 10 = 70
    20 ≤ t < 40      4 × 20 = 80
    40 ≤ t < 70      2.5 × 30 = 75
    70 ≤ t < 120     0.5 × 50 = 25
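The same working can be checked with a short Python sketch. The bar heights are the frequency densities used in this solution; the variable names are our own.

```python
# Bars read off the journey times histogram: (start, end, frequency density).
bars = [(0, 10, 5), (10, 20, 7), (20, 40, 4), (40, 70, 2.5), (70, 120, 0.5)]

# The frequency of each group is the area of its bar: height x class width.
frequencies = [density * (end - start) for start, end, density in bars]

for (start, end, _), freq in zip(bars, frequencies):
    print(f"{start} to {end} mins: frequency {freq:g}")
print("total number of employees:", f"{sum(frequencies):g}")
```

Adding up the areas gives the total number of journey times represented by the histogram.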

    Solution 1.11

    The data represented by the stem and leaf diagram is: 1.7 1.9 1.9 2.4 2.5 2.7 2.8 3.0 3.1

    Solution 1.12

    Rounding each of the claims to the nearest 100 we get a stem and leaf diagram of:

     0 | 2 9
     1 | 0 7 8
     2 | 3 3 4 5 5 7 8 8
     3 | 0 4 5 5 7 8
     4 | 3 8 8 9
     5 | 4 6 7
     6 | 3 4
     7 | 5 6
     8 | 3
     9 | 3
    10 |
    11 | 9

    Key: 4|8 is 4,800.

    The data is concentrated at the lower end, ie there are many claims for small amounts and few claims for high amounts. This is known as positively skewed. We will meet this in Section 4 of this chapter and also in Chapter 3.

    Solution 1.13

    The dot plot (or line plot) is as follows:

    [Figure: dot plot of the mock exam marks. Horizontal axis: mock exam mark, 68 to 75.]

    Solution 1.14

    We first need to calculate the cumulative frequencies:

    Age       Cumulative frequency
    ≤ 29      70
    ≤ 34      150
    ≤ 39      250
    ≤ 49      330
    ≤ 64      400

    Since the first group started at age 24, the graph can start from this value. However, don't get caught out! Age 29 goes all the way up to (just before) age 30. Therefore the first cumulative frequency should be plotted against 30, the second against 35 and so on.

    [Figure: cumulative frequency curve. Horizontal axis: age (years), 20 to 60. Vertical axis: cumulative frequency, 0 to 400.]

    Solution 1.15

    (i) We use our graph from Solution 1.14 to read off 32 years:

    [Figure: the cumulative frequency curve (cumulative frequency against age in years), used to read off the cumulative frequency at age 32.]

    We can see that roughly 100 policyholders are younger than this.

    (ii) 75% of 400 policyholders is 300. So reading off the 300th value:

    [Figure: the cumulative frequency curve (cumulative frequency against age in years), used to read off the age at a cumulative frequency of 300.]

    We can see that this is roughly 45 years.

    Solution 1.16

    Using the middle line (the median) on each boxplot to compare the locations, we see that Group A is located at 8 and Group B is located at 7. Therefore on average the values in Group A are higher than Group B. Using the boxes to measure the spread, we see that Group A has a smaller spread than Group B. Looking at the whole boxplot, we see that Group A is roughly symmetrical whereas Group B is positively skew (as most of the data values are to the right of the middle value).

    Solutions to extra practice questions

    P1.1 Since the lengths are rounded to the nearest cm, the first group ranges from 0 cm to 4.5 cm. Similarly the second group ranges from 4.5 cm to 6.5 cm and so on. This gives:

    Length (cm)    Frequency    Frequency density
    0-4            4            4 ÷ 4.5 = 0.89
    5-6            7            7 ÷ 2 = 3.5
    7-8            15           15 ÷ 2 = 7.5
    9-12           23           23 ÷ 4 = 5.75
    13-18          11           11 ÷ 6 = 1.83

    Note that if we were constructing the histogram we would draw the first bar from 0 to 4.5, the second bar from 4.5 to 6.5 and so on.

    P1.2 (i) Using frequency density = frequency ÷ class width, we get:

    Age (years)    Frequency    Frequency density
    5-19           3            3 ÷ 15 = 0.2
    20-29          20           20 ÷ 10 = 2
    30-39          27           27 ÷ 10 = 2.7
    40-49          63           63 ÷ 10 = 6.3
    50-54          67           67 ÷ 5 = 13.4
    55-59          116          116 ÷ 5 = 23.2
    60-64          204          204 ÷ 5 = 40.8

    The first bar is drawn from 5 to 20, the second bar from 20 to 30 and so on:

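    [Figure: histogram of frequency density against age at death for the males. Horizontal axis marked at 5, 20, 30, 40, 50 and 60. Bar heights as labelled, from the youngest group to the oldest: 0.2, 2, 2.7, 6.3, 13.4, 23.2 and 40.8.]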

    (ii) The mortality for this group of males is much higher in the 20-49 age range and lower in the 50-64 age range than the mortality for this group of females. So it appears that on average males die at a younger age. Both male and female ages at death have negatively skewed distributions.

    (iii) Using frequency = area = height (frequency density) × class width, we get:

    Age (years)    Frequency
    5-19           0.333 × 15 = 5
    20-29          0.7 × 10 = 7
    30-39          1.5 × 10 = 15
    40-49          3.8 × 10 = 38
    50-54          15.6 × 5 = 78
    55-59          26.2 × 5 = 131
    60-64          45.2 × 5 = 226

    P1.3 (i) Rounding each of the values to 1 decimal place, we get:

    2 | 6
    3 | 3 9
    4 | 4 5 8
    5 | 0 1 1 2 7
    6 | 2 3 5 8
    7 | 3 5
    8 | 0 6
    9 | 1

    Key: 3|9 represents 3.9

    (ii) The data appears to be symmetrical about roughly 5 days.

    P1.4 (i) The cumulative frequency table for these data is:

    Time (t)      Cumulative frequency
    t < 0.5       2
    t < 1         7
    t < 2         14
    t < 5         26
    t < 10        30

    Since the data starts at 0, our cumulative frequency curve will start from there. The next point would be at (0.5, 2) and so on.

    [Figure: cumulative frequency curve. Horizontal axis: time (mins), 0 to 10. Vertical axis: cumulative frequency, 0 to 30.]

    (ii) (a) Reading 3 mins off the graph gives about 19 phone calls. (b) Reading off 15 (50% of 30) gives about 2 minutes.

    [Cumulative frequency curve repeated: cumulative frequency (0 to 30) against time (mins) (0 to 10)]


    P1.5 First we need to calculate the frequency densities:

    Age      Frequency   Frequency density
    20–24        1       1/5 = 0.2
    25–29        2       2/5 = 0.4
    30–34       10       10/5 = 2
    35–39       16       16/5 = 3.2
    40–44       22       22/5 = 4.4
    45–49       20       20/5 = 4
    50–54       15       15/5 = 3
    55–64       14       14/10 = 1.4

    Now we can draw the histogram, remembering to draw the bars the correct widths as well (eg the first bar should be drawn from ages 20 to 25).

    [Histogram: frequency density (0 to 5) against age (years) (20 to 65)]


    P1.6 (i) The cumulative frequencies are shown in the following table:

    Claim size (£)   Cumulative frequency
    100                     862
    200                   1,470
    300                   2,723
    400                   3,789
    500                   4,347
    all claims            5,637

    Claims start from £0, hence our cumulative frequency curve will start from there. Once again note that the values are plotted at the end of each group:

    [Cumulative frequency curve: cumulative frequency (0 to 6,000) against claim size (£) (0 to 500)]

    Since we don't know what the largest claim is, we simply draw a line to indicate the maximum cumulative frequency that can be attained.


    (ii) Reading £250 off the cumulative frequency curve to see how many values are less than this we get:

    [Cumulative frequency curve repeated: cumulative frequency (0 to 6,000) against claim size (£) (0 to 500)]

    From the graph we can see that about 2,075 claims are less than £250 (well it would be if it was drawn on graph paper). This would give a proportion of $2{,}075 / 5{,}637 \approx 37\%$.

    Alternatively, using the original frequency table, £250 is halfway through the £200–£300 group. So half of the 1,253 values in this group will be less than £250. In addition, the 862 and 608 values in the first two groups are also less than £250. Hence $862 + 608 + 626.5 = 2{,}096.5$ values are less than £250. So the proportion of claim sizes less than £250 is $2{,}096.5 / 5{,}637 = 37.2\%$. This method is called interpolation and will be met in more detail in the next chapter.
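    The interpolation step can be sketched in Python as follows. The function name `count_below` and the list names are illustrative, and the sketch assumes that claims are spread evenly within each group, which is the assumption behind the calculation above.

```python
# Upper boundary of each group and the cumulative frequency up to it (from P1.6).
upper_bounds = [0, 100, 200, 300, 400, 500]
cumulative = [0, 862, 1470, 2723, 3789, 4347]

def count_below(value):
    """Estimate how many claims are below `value` by linear interpolation."""
    for i in range(1, len(upper_bounds)):
        lo, hi = upper_bounds[i - 1], upper_bounds[i]
        if lo <= value <= hi:
            fraction = (value - lo) / (hi - lo)
            return cumulative[i - 1] + fraction * (cumulative[i] - cumulative[i - 1])
    raise ValueError("value lies outside the tabulated range 0 to 500")

total_claims = 5637
below_250 = count_below(250)
proportion = below_250 / total_claims
print(below_250, round(100 * proportion, 1))
```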


    P1.7 The dot plot (or line plot) for each department is:

    [Dot plots of age for Marketing and for Personnel, each on an axis from 20 to 55]

    We can see that the ages of those working in Personnel are higher on average than those working in Marketing. The spread of the ages of those working in Personnel is wider than Marketing. Finally, the ages appear to be fairly symmetrical in Marketing whereas they are negatively skewed for Personnel.

    P1.8 (i) For Group A:

    12 | 1 2 4 4 5 7 7 9
    13 | 1 3 6 6 9
    14 | 2 2 6
    15 | 2 9
    16 |
    17 | 1 2

    For Group B:

    12 | 9 9 9
    13 | 0 0 1 1 2 4 5 6 8 9
    14 | 1 4 4 8
    15 | 4 4
    16 |
    17 | 0

    Key: 13|1 is 13.1

    (ii) Group A seems to have slightly more data at lower values, so the results for

    group A are slightly lower on average than group B. However, group A is slightly more spread out than group B (from 12.1 to 17.2 whereas group B ranges from 12.9 to 17.0). Finally, both distributions are positively skewed. So overall it seems that the treatment with or without steroids is pretty much the same.



    Chapter 2

    Sample calculations 1

    Links to CT3: Chapter 1 Sections 2.1-2.3

    Syllabus objectives: (i)2. Describe the level/location of a set of data using the mean, median, mode, as appropriate.

    0 Introduction

    Below is a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

    57 68 75 66 72 86 80 81 70 78
    76 72 88 84 69 77 83 90 48 63
    74 81 94 51 73 96 81 66 77 101

    This list is not very helpful in telling us what exactly is going on. In Chapter 1 we used diagrams to make sense of the data, such as the simple dot plot below:

    [Dot plot: age of death (yrs), on an axis from 40 to 110]

    We could also look at the location of the distribution, the spread of the distribution and the shape of the distribution (skewness). Recall that the location gives the centre or average of a set of data.


    In this chapter we will find a single numerical value to summarise the location of the entire data set. That is, a single figure that will tell us whereabouts the data is grouped (ie a typical value to represent the data). We will cover the three measures of location of a sample data set: the mode, the median and the mean.


    1 Sample mode

    1.1 Sample mode from a list

    Here are the salaries of 7 individuals in a company (in £000s):

    18 21 25 25 25 25 30

    If I asked you to give one salary that summarised these results, it's quite likely that you would say £25,000. Why? Because most of the employees earn £25,000. This summary figure is called the mode of the data. It is simply the data value that appears most often (ie the most frequent value). You may also see the mode referred to as the modal value.

    Question 2.1

    Below are the numbers of new actuarial students taken on in 2003 by six pension companies:

    8 5 19 3 6 5

    Find the modal number of new actuarial students employed.

    The mode is very easy to obtain and is not affected by extreme values (eg 19 in the above question). However, there are a couple of problems that limit its usefulness. These are illustrated in the next question.

    Question 2.2

    Find the mode of each of the following data sets:

    (i) 6 4 7 5 4 6
    (ii) 1 2 3 4 5

    Since the mode may not exist or may not be unique, we will not be making much use of the mode as a measure of location.
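    The two problems with the mode can be seen in a short Python sketch. The helper name `modes` is illustrative, and the sketch assumes the convention that there is no mode when every value occurs equally often.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # Convention assumed here: if every value is equally frequent
    # (and there is more than one distinct value), there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([18, 21, 25, 25, 25, 25, 30]))  # salaries from the text
print(modes([6, 4, 7, 5, 4, 6]))            # two values tie
print(modes([1, 2, 3, 4, 5]))               # every value appears once
```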


    1.2 Sample mode from a frequency distribution

    We will now look at how we can calculate the mode from a table of results (a frequency distribution) that we used in Chapter 1 to summarise a large data set. Recall that the mode was the data value that occurred most often. Take this set of 20 values:

    1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

    It is clear to see that the mode is 3 as there are more of this number than any other. Putting this set of data into a frequency table:

    Value, x        1   2   3   4   5
    Frequency, f    3   4   6   5   2

    We can see that the number 3 has the highest frequency (since it occurs most). This gives us the method of finding the mode from a frequency table: find the value with the highest frequency.

    Question 2.3

    The numbers of personal pension reviews completed by a student each day over the last four weeks are given below:

    Reviews completed in a day    4   5   6   7   8
    Frequency                     5   7   4   3   1

    (i) James thinks that the modal number of reviews completed in a day is 8 as it is the highest number. What has he done wrong?
    (ii) What is the correct mode?
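    The method of finding the mode from a frequency table can be sketched in Python using the worked example of 20 values from earlier in this section. The dictionary name is illustrative.

```python
# Frequency table from the worked example: value -> frequency.
frequency_table = {1: 3, 2: 4, 3: 6, 4: 5, 5: 2}

# The mode is the value (key) with the highest frequency,
# not the highest value and not the highest frequency itself.
mode = max(frequency_table, key=frequency_table.get)
print(mode)
```

    Note that `max` returns only the first value it finds if two values tie for the highest frequency, so a tie would need separate handling.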


    1.3 Sample mode from a grouped frequency distribution

    At the beginning of this chapter we had a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

    57 68 75 66 72 86 80 81 70 78
    76 72 88 84 69 77 83 90 48 63
    74 81 94 51 73 96 81 66 77 101

    This data is best suited to a grouped frequency table (as there are 25 different values with hardly any repeats).

    Age          40–49   50–59   60–69   70–79   80–89   90–99   100–109
    Frequency      1       2       5      10       8       3        1

    If we do not have the original list of data we will not be able to tell which value occurs most. For example, looking at the frequency table above, you would not be able to tell that 81 is the mode! All we can do in this situation is to state the modal group, which in this case is the 70–79 group.

    Question 2.4

    A general insurance company records the amount claimed on the last 100 claims on a particular type of car insurance. The results were:

    Claim Amount, c        No. of claims
    0 ≤ c < 500                  6
    500 ≤ c < 1,000             11
    1,000 ≤ c < 1,500           49
    1,500 ≤ c < 2,000           26
    2,000 ≤ c < 5,000            8

    State the modal group.
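    Returning to the ages at death listed at the start of this section, the grouping and the modal group can be reproduced with a short Python sketch. The variable names are illustrative, and the sketch assumes equal-width 10-year groups.

```python
from collections import Counter

ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

# Map each age to the lower end of its 10-year group (e.g. 75 -> 70).
group_counts = Counter(10 * (age // 10) for age in ages)

# The modal group is the group with the highest frequency.
lower = max(group_counts, key=group_counts.get)
modal_group = (lower, lower + 9)
print(sorted(group_counts.items()))
print(modal_group)
```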


    1.4 Sample mode summary

    In summary:

    Advantages
      Easy to calculate
      Unaffected by extreme values (see Question 2.1)

    Disadvantages
      May not be unique (see Question 2.2 (i))
      May not exist (see Question 2.2 (ii))
      Does not use all the data values
      Cannot be used in further calculations
      May only be able to obtain a modal group


    2 Sample mean

    2.1 Sample mean from a list

    Looking again at the salaries of 7 individuals in a company (in £000s):

    18 21 25 25 25 25 30

    We could share out the salaries equally between the 7 individuals as a way of finding the average or centre salary. This gives:

    $\dfrac{18 + 21 + 25 + 25 + 25 + 25 + 30}{7} = \dfrac{169}{7} = 24.143$

    So this gives a salary of £24,143 each. This method has the advantage of using all the data values and we can see that this gives a value slightly less than the mode of £25,000 because there were two people who earned less than this compared to one who earned more. This summary figure is called the mean of the data and this is what most people would call the average.

    Question 2.5

    The sizes of ten car claims received by an insurance company were:

    1,500 1,820 840 260 2,100 790 530 1,360 1,780 1,650

    Find the mean car insurance claim amount.

    The formula

    Suppose we have a sample of n values $x_1, x_2, \ldots, x_n$. We add these numbers up and divide by how many data values there are (ie n):

    $\dfrac{x_1 + x_2 + \cdots + x_n}{n}$

    Using the sigma notation for summation, this becomes:

    $\dfrac{\sum_{i=1}^{n} x_i}{n}$

    Although it is usually abbreviated to:

    $\dfrac{\sum x_i}{n}$ or $\dfrac{\sum x}{n}$

    Definition

    The sample mean, $\bar{x}$, is given by:

    $\bar{x} = \dfrac{\sum x}{n}$

    This formula is given on page 22 of the Tables. Whilst the mean is a vast improvement over the mode as a measure of the centre of the data, it still can give some dodgy answers. The next question illustrates this:

    Question 2.6

    (i) Below are the numbers of new actuarial students taken on in 2003 by six pension companies:

    8 5 19 3 6 5

    Find the mean number of new actuarial students employed.

    (ii) Below are the salaries (in £000s) of eight individuals in a small company:

    12 12 12 12 12 12 12 50

    Find the mean salary.

    (iii) What are the problems with the values obtained in (i) and (ii)?


    Despite these problems, the mean is still used as the main measure of location throughout the actuarial exams. This is mainly due to the fact that the sample mean has a number of properties that make it useful in further calculations. These will be covered in Chapters 8 and 9 of the Subject CT3 course.
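    The sample mean of a list is straightforward to check by computer. A minimal Python sketch using the seven salaries from the start of this section (the variable names are illustrative):

```python
from statistics import mean

salaries = [18, 21, 25, 25, 25, 25, 30]  # in thousands, from the text

# Sample mean = sum of the values / number of values.
x_bar = sum(salaries) / len(salaries)
print(round(x_bar, 3))

# The standard library gives the same result.
assert abs(x_bar - mean(salaries)) < 1e-12
```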

    2.2 Sample mean from a frequency distribution

    We will now look at how we can calculate the mean from a frequency distribution (ie a table of results) that we used in Chapter 1 to summarise a large data set. Recall that we calculated the sample mean, $\bar{x}$, by first adding up all the data values and then dividing the total by how many values there were. Take this set of 20 values:

    1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

    We would find the mean by:

    $\bar{x} = \dfrac{(1+1+1) + (2+2+2+2) + (3+3+3+3+3+3) + (4+4+4+4+4) + (5+5)}{20} = \dfrac{59}{20} = 2.95$

    Surely there must be a quicker way? There is! How about we say we have three 1s and four 2s and so on? The calculation then becomes:

    $\bar{x} = \dfrac{(3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5)}{20} = \dfrac{59}{20} = 2.95$

    Notice that we are multiplying each value by its frequency. Notice also that the total number of values is given by the total of the frequencies $3 + 4 + 6 + 5 + 2 = 20$.

    Question 2.7

    Use the shortcut method to calculate the mean of this set of data: 2, 2, 2, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 10, 10, 10, 10, 12, 12, 14


    So, how does this relate to a frequency table? Well, we are given the values and their frequencies in the table so we can work out the total in the same way as above (by multiplying each of the values by their respective frequencies) and then dividing by how many values there are (the total of the frequencies):

    Value, x        1   2   3   4   5
    Frequency, f    3   4   6   5   2

    $\bar{x} = \dfrac{(3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5)}{3 + 4 + 6 + 5 + 2} = \dfrac{59}{20} = 2.95$

    Common Error: Students often confuse dividing by the total number of values (which is obtained by totalling the frequencies) with dividing by the number of groups.

    Question 2.8

    The frequency table shows the number of claims made on 100 car insurance policies in the last year. Calculate the mean number of claims per policy:

    Number of claims per policy    0    1    2    3
    Frequency                     74   19    5    2

    Formula

    In our table we have, say, m different values $x_1, x_2, \ldots, x_m$ with frequencies $f_1, f_2, \ldots, f_m$. To find the mean we multiplied the frequencies by the corresponding data values and divided by the total of the frequencies:

    $\bar{x} = \dfrac{f_1 x_1 + \cdots + f_m x_m}{f_1 + \cdots + f_m}$

    Writing this using the sigma notation for summation we get:

    $\bar{x} = \dfrac{\sum fx}{\sum f}$
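    The frequency-table formula for the mean can be sketched in Python using the table of 20 values from this section. The variable names are illustrative.

```python
values = [1, 2, 3, 4, 5]
frequencies = [3, 4, 6, 5, 2]

total = sum(f * x for f, x in zip(frequencies, values))  # sum of f*x
n = sum(frequencies)                                      # sum of f
x_bar = total / n
print(total, n, x_bar)
```

    Dividing by `n` (the total of the frequencies) rather than by `len(values)` (the number of distinct values) avoids the common error described above.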


    2.3 Sample mean from a grouped frequency distribution

    Now suppose we want to find the mean from this grouped frequency distribution:

    Claim Amount, c        Frequency
    0 ≤ c < 500                 6
    500 ≤ c < 1,000            11
    1,000 ≤ c < 1,500          49
    1,500 ≤ c < 2,000          26
    2,000 ≤ c < 5,000           8

    Before, when we calculated the mean from a frequency table, we multiplied the values by the frequency. The question now is: which value in each group do we multiply the frequency by? The natural choice is the middle of each group, so we will use the midpoint of each group. We find the midpoint by averaging the largest and smallest possible value in each group. So the midpoint for the $0 \le c < 500$ group is $\dfrac{0 + 500}{2} = 250$. Similarly, the midpoints for the other groups are 750, 1,250, 1,750 and 3,500. The mean claim amount is then:

    $\bar{x} = \dfrac{(6 \times 250) + (11 \times 750) + (49 \times 1{,}250) + (26 \times 1{,}750) + (8 \times 3{,}500)}{6 + 11 + 49 + 26 + 8} = \dfrac{144{,}500}{100} = 1{,}445$
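    The midpoint calculation for the claim amounts can be sketched in Python as below. The tuple layout and variable names are illustrative choices.

```python
# (lower boundary, upper boundary, frequency) for each claim-amount group.
groups = [(0, 500, 6), (500, 1000, 11), (1000, 1500, 49),
          (1500, 2000, 26), (2000, 5000, 8)]

# Midpoint = average of the smallest and largest possible value in the group.
midpoints = [(lo + hi) / 2 for lo, hi, _ in groups]
total_frequency = sum(f for _, _, f in groups)

# Estimated mean = sum of (midpoint * frequency) / total frequency.
estimated_mean = sum(m * f for m, (_, _, f) in zip(midpoints, groups)) / total_frequency
print(midpoints, estimated_mean)
```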

    Question 2.9

    The heights, in cm, of thirty actuaries are recorded below. Find their mean height.

    Heights, h         Frequency
    150 ≤ h < 160          4
    160 ≤ h < 170          6
    170 ≤ h < 175         11
    175 ≤ h < 180          7
    180 ≤ h < 195          2


    When calculating the midpoint of groups with rounded data and ages we will need to take care that we do use the correct largest and smallest possible values for that group. For example, when values are rounded to the nearest cm, the 10–19 cm group ranges from 9.5 cm to (just below) 19.5 cm. Hence, the midpoint would be:

    $\dfrac{9.5 + 19.5}{2} = 14.5$ cm

    Similarly, when age last birthday is used, the 10–19 years group ranges from 10 years to (just below) 20 years old. Hence, the midpoint would be:

    $\dfrac{10 + 20}{2} = 15$ years
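    The two conventions can be captured in a pair of small Python helpers. The function names are illustrative, and the `unit` parameter (the rounding precision) is an assumption added here to generalise the nearest-cm example.

```python
def midpoint_rounded(lowest, highest, unit=1):
    """Midpoint of a group whose values were rounded to the nearest `unit`."""
    return ((lowest - unit / 2) + (highest + unit / 2)) / 2

def midpoint_age_last_birthday(lowest, highest):
    """Midpoint of a group of ages recorded as age last birthday."""
    return (lowest + (highest + 1)) / 2

print(midpoint_rounded(10, 19))            # 10-19 cm, nearest cm
print(midpoint_age_last_birthday(10, 19))  # 10-19 years, age last birthday
```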

    Question 2.10

    The table below contains the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

    Age          40–49   50–59   60–69   70–79   80–89   90–99   100–109
    Frequency      1       2       5      10       8       3        1

    Find the mean age of the policyholders.

    Note that when we use the midpoint we assume that the values are evenly spread through the group. This is not necessarily the case. For example, the actual data values for Question 2.10 were:

    57 68 75 66 72 86 80 81 70 78
    76 72 88 84 69 77 83 90 48 63
    74 81 94 51 73 96 81 66 77 101

    The true mean of these values is $\bar{x} = \dfrac{2{,}277}{30} = 75.9$ years. This is slightly different to the mean value obtained in Question 2.10. Hence, the mean using midpoints is just an estimate. It is the best we can do without having the original list of data.
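    The size of the difference between the midpoint estimate and the true mean can be checked with a short Python sketch. The variable names are illustrative, and the midpoints assume the age last birthday convention (so the 40–49 group has midpoint 45).

```python
ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

# Grouped table: midpoints for age last birthday, e.g. 40-49 -> 45.
midpoints = [45, 55, 65, 75, 85, 95, 105]
frequencies = [1, 2, 5, 10, 8, 3, 1]

true_mean = sum(ages) / len(ages)
estimated_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)
print(round(true_mean, 1), round(estimated_mean, 1))
```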


    2.4 Other questions involving the mean

    There are a couple of other questions that could be asked about the mean. For example, we could be given the mean and be asked to calculate a single value or the sample total. The following question covers each of these possibilities:

    Question 2.11

    (i) The mean age of death of 12 assurance policyholders was 72. What was the total age of the 12 policyholders?

    (ii) The mean of the following list of investment returns is 4.2%.

    5% 4.75% 3.6% x% 3.25%

    Find the value of x.

    (iii) A small department employs ten actuaries; their mean salary is £48,000. When an eleventh actuary joins the department the mean salary of all the actuaries drops to £45,800. Find the salary of the new employee.

    (iv) The mean sum assured on 12 term assurances was £50,000 whereas the mean sum assured on 8 endowment assurances was £30,000. Calculate the mean sum assured on all 20 policies.
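    Each part of this type of question rests on rearranging $\bar{x} = \sum x / n$. A sketch of the general relationships is given below; the subscripts and the labels such as $x_{\text{new}}$ are notation introduced here for illustration and are not taken from the Tables.

```latex
\begin{align*}
\sum x &= n\,\bar{x}
  && \text{(sample total from the mean)} \\
x_{\text{missing}} &= n\,\bar{x} - \sum_{\text{known}} x
  && \text{(one unknown value in a list of $n$)} \\
x_{\text{new}} &= (n+1)\,\bar{x}_{\text{new}} - n\,\bar{x}_{\text{old}}
  && \text{(value added when the mean changes)} \\
\bar{x}_{\text{combined}} &= \frac{n_1\,\bar{x}_1 + n_2\,\bar{x}_2}{n_1 + n_2}
  && \text{(mean of two samples combined)}
\end{align*}
```

    In every case the idea is the same: convert each mean back to a total, combine or subtract the totals, then divide by the relevant number of values.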


    2.5 Sample mean summary

    In summary:

    Advantages
      Uses all the data values
      Has properties that make it useful in further calculations (see Subject CT3 Chapter 6)

    Disadvantages
      Can give impossible figures for discrete data (see Question 2.6 (i))
      Affected by extreme values (see Question 2.6 (ii))
      Can only be estimated when using grouped data


    3 Sample median

    3.1 Sample median from a list

    Whilst the mean is the preferred measure of location it is affected by extreme values. So what we need is a measure that is unaffected by extreme values (unlike the mean) and that always exists and is unique (unlike the mode). Consider the heights of the five individuals below:

    [Figure: five individuals of different heights, named Billy, Bertie, Barry, Boris and Bart]

    If I asked you to give me the person with the typical height of these individuals you would probably choose Bart (despite his name). Why? Because he has the middle height. This gives us our third and final measure of location: the median, the middle value.

    So how do we calculate the median in practice? Here is a list of 5 numbers:

    9 7 2 9 4

    So the median is the middle value which is 2!?! Clearly not! We need to put the numbers in increasing order first (like the heights above) otherwise the number in the middle of the list is not necessarily the middle value numerically! This gives:

    2 4 7 9 9

    So all we have to do now is locate the middle value. Simply counting in from both ends we arrive at the number 7.

    2  4  7  9  9
          ↑
        median


    Question 2.12

    Find the median of these sums assured (£000s) by a certain life assurance company:

    125 75 25 20 50 25 50 15 30

    What happens if we have an even number of data values? Consider the following list:

    5 9 11 3 6 12

    Firstly, rearranging them in order gives:

    3 5 6 9 11 12

    Counting in to the middle, we see that the middle lies between 6 and 9:

    3  5  6  9  11  12
            ↑
          median

    All we need is the value that is halfway between 6 and 9. This is 7.5 so the median is 7.5. If you have trouble finding the value halfway between the two middle numbers, just find the average of them, ie $\dfrac{6 + 9}{2} = 7.5$.
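    The counting-in procedure for both odd and even numbers of values can be sketched in Python using the two lists above. The function name `median` is an illustrative choice (the standard library's `statistics.median` behaves the same way).

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([9, 7, 2, 9, 4]))       # odd number of values
print(median([5, 9, 11, 3, 6, 12]))  # even number of values
```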

    Question 2.13

    Find the median of the following data set:

    9 1 4 10 15 5 3 9

    Now if we have a long list of numbers the last thing we would want to do is count in to find the middle. Hence, we need to find a shortcut to locate the middle value.


    Suppose we have 6 numbers. Surely the median would just be the $6 \div 2 = 3$rd