statistical reasoning chapter 5. chapter 5 5.1 – exploring data

Statistical Reasoning CHAPTER 5

Upload: tracy-gilmore

Post on 18-Jan-2016


TRANSCRIPT

Page 1: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Statistical Reasoning CHAPTER 5

Page 2: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.1 – EXPLORING DATA

Page 3: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

DATA SET

The payrolls for three small companies are shown in the table. Figures include year-end bonuses. Each company has 15 employees. Sanela wonders if the companies have similar “average” salaries.

What is the range of salaries for each company?

Media Focus Advertising: 245 000 – 27 300 = 217 700

Computer Rescue: 362 000 – 52 300 = 309 700

Auto Value Sales: 97 500 – 55 250 = 42 250

Are there any outliers for any of the companies?

Page 4: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

MEASURES OF CENTRAL TENDENCY

What are the measures of central tendency (mean, median, and mode)?

Mean:

Company A: 27 300 + 28 500 + 33 400 + 36 200 + 39 500 + 42 500 + 47 400 + 57 500 + 61 000 + 61 000 + 65 000 + 71 000 + 86 000 + 162 000 + 245 000 = 1 063 300
1 063 300/15 = 70 887

Company B: 52 300 + 52 700 + 53 100 + 53 800 + 55 200 + 55 900 + 56 500 + 59 000 + 59 200 + 62 500 + 63 000 + 96 500 + 96 500 + 112 000 + 362 000 = 1 290 200
1 290 200/15 = 86 013

Company C: 55 250 + 55 250 + 56 900 + 57 300 + 57 900 + 58 200 + 58 300 + 58 900 + 61 500 + 62 300 + 62 800 + 62 800 + 64 400 + 66 900 + 97 500 = 936 200
936 200/15 = 62 413

Page 5: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

MEDIAN AND MODE

What are the measures of central tendency (mean, median, and mode)?

Median: We’re looking for the term right in the middle

Company A: 57 500
Company B: 59 000
Company C: 58 900

Mode: The number that appears the most times

Company A: 61 000
Company B: 96 500
Company C: 55 250 and 62 800 (each appears twice)

Mean:
Company A: 70 887
Company B: 86 013
Company C: 62 413
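The three measures can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the salary lists are copied from the mean calculations on the previous slide, and the helper name `summarize` is only illustrative. `multimode` reports every value tied for most frequent, which matters when a data set has more than one mode.

```python
from statistics import mean, median, multimode

# Salaries copied from the mean calculations (15 employees per company)
payrolls = {
    "Company A": [27300, 28500, 33400, 36200, 39500, 42500, 47400, 57500,
                  61000, 61000, 65000, 71000, 86000, 162000, 245000],
    "Company B": [52300, 52700, 53100, 53800, 55200, 55900, 56500, 59000,
                  59200, 62500, 63000, 96500, 96500, 112000, 362000],
    "Company C": [55250, 55250, 56900, 57300, 57900, 58200, 58300, 58900,
                  61500, 62300, 62800, 62800, 64400, 66900, 97500],
}

def summarize(salaries):
    """Return (mean rounded to the nearest dollar, median, list of modes)."""
    return round(mean(salaries)), median(salaries), multimode(salaries)

summary = {name: summarize(s) for name, s in payrolls.items()}
for name, (avg, mid, modes) in summary.items():
    print(name, avg, mid, modes)
```

Notice how far the mean sits above the median for Companies A and B: the single very large salary in each pulls the mean up but leaves the median almost untouched.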

Page 6: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Paulo needs a new battery for his car. He is trying to decide between two different brands. Both brands are the same price. He obtains data for the lifespan, in years, of 30 batteries of each brand, as shown below.

What’s the range? Brand X: 3.1 – 8.2 (a range of 5.1 years) Brand Y: 4.5 – 7.0 (a range of 2.5 years)

Range is a measure of dispersion. Dispersion is a measure of the spread among the data in a set; dispersion has a value of zero if all the data in a set is identical, and it increases in value as the data becomes more spread out.

What’s the mode? Brand X: 5.7 Brand Y: 5.6, 5.7, 5.8, 5.9, 6.0, 6.8

Is the mode useful in this comparison?

Which brand would you choose?

Page 7: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

HANDOUT

Today we will be examining sets of data. Answer all of the questions in the handout to the best of your ability, because this is a summative assessment.

Page 8: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.2 – FREQUENCY TABLES, HISTOGRAMS, AND FREQUENCY POLYGONS

Page 9: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

The following data represents the flow rates of the Red River from 1950 to 1999, as recorded at the Redwood Bridge in Winnipeg, Manitoba.

Page 10: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

Determine the water flow rate that is associated with serious flooding by creating a frequency distribution.

Frequency distribution is a set of intervals (table or graph), usually of equal width, into which data is organized; each interval is associated with a frequency that indicates the number of measurements in this interval.

What’s the lowest water flow rate? 159

What’s the highest? 4587

The range is 4587 – 159 = 4428

We can divide this into 10 equal parts: 4428/10 = 442.8

So, let’s round up and let the interval width be 500.

Flow Rate (m³/s)    Tally    Frequency (# of years)
0 - 500
500 - 1000
1000 - 1500
1500 - 2000
2000 - 2500
2500 - 3000
3000 - 3500
3500 - 4000
4000 - 4500
4500 - 5000
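The tallying step can be sketched in code. The flow-rate data itself is not reproduced in this transcript, so the list below is hypothetical apart from the lowest (159) and highest (4587) values; the function name `frequency_table` is also just illustrative. One assumption worth noting: a value that lands exactly on a boundary, such as 500, is counted in the upper interval.

```python
from collections import Counter

def frequency_table(values, width, start=0):
    """Count values in half-open intervals [lo, lo + width), starting at `start`."""
    counts = Counter(int((v - start) // width) for v in values)
    top = max(counts)  # index of the highest interval that holds any data
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(top + 1)]

# Hypothetical flow rates in m^3/s; only 159 and 4587 come from the slide.
flows = [159, 480, 620, 990, 1210, 1480, 1730, 1950, 2300, 4587]

table = frequency_table(flows, width=500)
for lo, hi, n in table:
    print(f"{lo:>4} - {hi:<4}  {'|' * n:<4}  {n}")
```

With an interval width of 500 and a maximum of 4587, the table needs ten rows, matching the blank table on the slide.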

Page 11: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

Instead, you could have created a histogram. A histogram is the graph of a frequency distribution, in which equal intervals of values are marked on a horizontal axis and the frequencies associated with these intervals are indicated by the areas of the rectangles drawn for these intervals.

We know that there were nine floods. Based on the histogram, the flow rate was greater than 1950 m³/s in only 12 years. These 12 years must include the flood years. We could predict that floods occur when the flow rate is greater than 1950 m³/s.

Page 12: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

Or, we could create a frequency polygon. A frequency polygon is the graph of a frequency distribution, produced by joining the midpoints of the intervals using straight lines. Most of the data is in the first four intervals, and the most common water flow is between 1500 and 2000 m³/s. After this, the frequencies drop off dramatically.

There were six years where the flow rate was around 2625, 3075 or 4425 m³/s. These must have been flood years. The other three floods should have occurred when the flow rate was around 2175 m³/s.

Assuming that the flow rate in three of those years was 2175 m³/s or greater, floods should occur when the flow rate is 2175 m³/s or greater.

Page 13: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

The magnitude of an earthquake is measured using the Richter scale. Examine the histograms for the frequency of earthquake magnitudes in Canada from 2005 to 2009. Which of these years could have had the most damage from earthquakes?

Page 14: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

We could use a frequency table:

Year                                          2005  2006  2007  2008  2009
Frequency of Earthquakes from 4.0 to 4.9         5     8     9    16     7

2008 had the greatest number of earthquakes with the potential for minor damage.

Four of the years had three earthquakes with magnitudes from 5.0 to 5.9, while 2008 had five earthquakes with these magnitudes.

Year    Magnitude on Richter Scale               Total
        4.0 – 4.9    5.0 – 5.9    6.0 – 6.9
2008    16           5            4              25
2009    7            3            1              11

Therefore, 2008 could have had the most damage from earthquakes.

Page 15: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

We could have made a frequency polygon.

Both 2008 and 2009 had the strongest earthquakes, registering from 6.0 to 6.9 on the Richter scale.

The number of earthquakes in the three highest intervals was greater in 2008 than in 2009, so 2008 could have had the most damage from earthquakes.

Page 16: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Independent Practice

PG. 249-253, # 1, 3, 5, 6, 8, 10, 11, 12

Page 17: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.3 – STANDARD DEVIATION

Page 18: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

USING EXCEL TO FIND MEAN AND STANDARD DEVIATION

We will go over how to find the mean and standard deviation using Excel. It’s important that you pay close attention as I go through the steps, since you will need to use this information to complete today’s summative assessment.

Page 19: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

HANDOUT

Today we will be finding the mean and the standard deviation of a set of data. Answer all of the questions in the handout to the best of your ability, because this is a summative assessment.

Page 20: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

STANDARD DEVIATION

Standard deviation is a measure of dispersion, of how the data in a set is distributed.

Standard deviation, σ, can be expressed as:

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

The mean, x̄, can be expressed as:

$\bar{x} = \dfrac{\sum x}{n}$

where x is each term in a set of data, and n is the number of terms in the data.

What does Σ mean?
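Σ (capital sigma) means "add up": Σx is the sum of every term x in the data set. As a rough sketch, and assuming the formula intended on this slide is the population form that divides by n (which matches "n is the number of terms"), the calculation looks like this in Python. The sample data is made up for illustration.

```python
from math import sqrt
from statistics import pstdev

def mean(data):
    """Sum of the terms divided by the number of terms."""
    return sum(data) / len(data)

def std_dev(data):
    """Population standard deviation: sqrt(sum((x - mean)^2) / n)."""
    m = mean(data)
    return sqrt(sum((x - m) ** 2 for x in data) / len(data))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, not from the slides

print(mean(sample), std_dev(sample))
# Cross-check against the standard library's population standard deviation
print(pstdev(sample))
```

The hand-written `std_dev` and the library's `pstdev` should agree, which is a quick way to confirm the formula was applied correctly.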

Page 21: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Brendan works part-time in the canteen at his local community centre. One of his tasks is to unload delivery trucks. He wondered about the accuracy of the mass measurements given on two cartons that contained sunflower seeds. He decided to measure the masses of the 20 bags in the two cartons. One carton contained 227 g bags, and the other carton contained 454 g bags.

How can measures of dispersion be used to determine if the accuracy of measurement is the same for both bag sizes?

Page 22: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

We can find the mean, x̄, and standard deviation, σ, using our calculator.

STAT

EDIT

Enter all of the terms into L1

STAT

CALC

1-Var Stats

We are given all of the information about the data set. Try it with these two sets:

Set 1: Set 2:

The accuracy is not the same for both bags. The 454 g bags are more accurate.

Page 23: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Angele conducted a survey to determine the number of hours per week that Grade 11 males in her school play video games. She determined that the mean was 12.84 h, with a standard deviation of 2.16 h.

Janessa conducted a similar survey of Grade 11 females in her school and organized her results in this frequency table. Compare the results.

First, we need to use the midpoint of each Hours interval, since we can’t enter a range into a calculator list.

Midpoints: 4, 6, 8, 10, 12, 14

Page 24: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

For grouped data, like this, there is an extra step.

Enter the Hours midpoints (4, 6, 8, 10, 12, 14) into L1, and the Frequency into L2

STAT

CALC

1-Var Stats

( 2nd STAT L1 , 2nd STAT L2 ) ENTER

For the girls, we find:

Compared to the boys: Boys have a higher mean (they play more video games), and they have a lower standard deviation (so they’re more consistent than the girls).
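The same grouped-data calculation can be sketched without a calculator: each midpoint is weighted by its frequency. The midpoints below are from the slide, but Janessa's frequency table is not reproduced in this transcript, so the frequencies are hypothetical and the printed results are not her actual values.

```python
from math import sqrt

def grouped_stats(midpoints, frequencies):
    """Mean and population standard deviation for grouped data."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2
                   for m, f in zip(midpoints, frequencies)) / n
    return mean, sqrt(variance)

midpoints = [4, 6, 8, 10, 12, 14]   # from the slide
frequencies = [1, 2, 4, 6, 5, 2]    # hypothetical, for illustration only

girls_mean, girls_sd = grouped_stats(midpoints, frequencies)
print(round(girls_mean, 2), round(girls_sd, 2))
```

Passing L1 and L2 to 1-Var Stats does the same thing: the second list tells the calculator how many times to count each value in the first.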

Page 25: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Independent Practice

PG. 261-265, # 2, 4, 5, 7, 9, 10, 12, 13, 14

Page 26: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.4 – THE NORMAL DISTRIBUTION

Page 27: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

NORMAL DISTRIBUTION

If we rolled a single die 50 000 times, what do you think the graph would look like?

What about two dice?

In partners, roll two dice 50 times. One person rolls, one person keeps a frequency diagram.
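To compare your 50 rolls with a much larger experiment, here is a small simulation sketch. It assumes fair six-sided dice and uses a fixed seed so the run is repeatable; the function name `roll_sums` is illustrative.

```python
import random
from collections import Counter

def roll_sums(num_dice, num_rolls, seed=0):
    """Roll `num_dice` fair dice `num_rolls` times and count each total."""
    rng = random.Random(seed)
    return Counter(sum(rng.randint(1, 6) for _ in range(num_dice))
                   for _ in range(num_rolls))

one_die = roll_sums(1, 50_000)
two_dice = roll_sums(2, 50_000)

# Text histogram for two dice: one '#' per 500 rolls
for total in range(2, 13):
    print(f"{total:>2} {'#' * (two_dice[total] // 500)}")
```

A single die gives a roughly flat graph, since each face is equally likely. The sum of two dice piles up in the middle, with 7 the most common total, and starts to take on the symmetric, single-peaked shape discussed on the next slide.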

Page 28: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

NORMAL DISTRIBUTION

A normal curve is a symmetrical curve that represents the normal distribution; also called a bell curve.

Normal distribution is data that, when graphed as a histogram or a frequency polygon, results in a unimodal symmetric distribution about the mean.

Page 29: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Heidi is opening a new snowboard shop near a local ski resort. She knows that the recommended length of a snowboard is related to a person’s height. Her research shows that most of the snowboarders who visit this resort are males, 20 to 39 years old. To ensure that she stocks the most popular snowboard lengths, she collects height data for 1000 Canadian men, 20 to 39 years old. How can she use the data to help her stock her store with boards that are the appropriate lengths?

First, enter the data into your calculator. Remember to use midpoints for the height range (e.g., 60.5, 61.5, and so on).

Page 30: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Heights within one standard deviation of the mean:
69.52 – 2.98 = 66.5 inches
69.52 + 2.98 = 72.5 inches

Heights within two standard deviations of the mean:
69.52 – 2(2.98) = 63.5 inches
69.52 + 2(2.98) = 75.5 inches

Number of men within one standard deviation (from 66.5–72.5 in):
116 + 128 + 147 + 129 + 115 + 63 = 698 men

Number of men within two standard deviations (from 63.5–75.5 in):
30 + 52 + 64 + 116 + 128 + 147 + 129 + 115 + 63 + 53 + 29 + 20 = 946 men

She surveyed 1000 men, so 698/1000 or 69.8% of men are 66.5 to 72.5 inches tall, while 946/1000 or 94.6% of men are 63.5 to 75.5 inches tall.

Page 31: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Jim raises Siberian husky sled dogs at his kennel. He knows, from the data he has collected over the years, that the weights of adult male dogs are normally distributed, with a mean of 52.5 lbs and a standard deviation of 2.4 lbs. Jim used this information to sketch a normal curve, with:
• 68% of the data within one standard deviation of the mean
• 95% of the data within two standard deviations of the mean
• 99.7% of the data within three standard deviations of the mean

What percent of dogs at Jim’s kennel would you expect to have a weight between 47.7 lbs and 54.9 lbs?

Page 32: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

We want to find the percentage of dogs who weigh between 47.7 lbs and 54.9 lbs.

Shade in the area on the provided graph. We can see that 68% of dogs will be between 50.1 and 54.9 lbs, so we just need to know what percent falls between 47.7 and 50.1.

How can we figure it out?

What percent falls between 47.7 and 52.5? 95/2 = 47.5%

What percent is between 50.1 and 52.5? 68/2 = 34%

So, what percent is left between 47.7 and 50.1? 47.5% – 34% = 13.5%

So, 68% + 13.5% = 81.5% are between 47.7 lbs and 54.9 lbs.
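The 68–95–99.7 percentages are rounded, so the 81.5% above is an estimate. As a quick check, this sketch uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later) to compare the rule-based estimate with the area under the exact normal curve for Jim's mean and standard deviation.

```python
from statistics import NormalDist

weights = NormalDist(mu=52.5, sigma=2.4)

low = 52.5 - 2 * 2.4   # two standard deviations below the mean (47.7 lbs)
high = 52.5 + 2.4      # one standard deviation above the mean (54.9 lbs)

# Estimate from the rule: half of 95% plus half of 68%
rule_estimate = 95 / 2 + 68 / 2

# Area under the normal curve between the same two weights, as a percent
exact = (weights.cdf(high) - weights.cdf(low)) * 100

print(rule_estimate, round(exact, 1))
```

The two values differ by less than half a percentage point, which is why the rounded rule is good enough for sketching and estimating.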

Page 33: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Sometimes, we use the symbol μ to represent the mean.

a) Enter it into your calculator, to find that:

μ = 2.526
σ = 0.482
Median = 2.55 (scroll down in the 1-Var list)

When the median and the mean are close, that suggests the data may be normally distributed.

We can make a histogram using this data on our calculator to check if it is normally distributed. First, enter all of the data into your STAT list.

Page 34: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Making a histogram on your calculator:

2nd Y=

Switch it to ON

ENTER

Use the arrows to navigate to the histogram picture, press ENTER

ZOOM

9

GRAPH

It’s also helpful to change the x-scale under WINDOW to the standard deviation (0.482, in this case)

Page 35: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Now that we have a graph, we can see that it looks like it’s probably normally distributed. However, to make sure, we need to check that it has the right percents—normally distributed graphs have approximately 68% of the data within one standard deviation of the mean, approximately 95% within two standard deviations of the mean, and approximately 99.7% within three. These are important to remember. We can check this on the calculator, using the TRACE function.

When on the graph screen, press TRACE, and then use the arrow keys to find the number of terms within each standard deviation.

In the two tallest bars, we have n = 18 and n = 17. That’s within one standard deviation, so 17 + 18 = 35 terms are within one standard deviation.

35/50 = 70% are within one standard deviation.

In the next two, we have n = 7 and n = 6. 35 + 7 + 6 = 48 terms are within two standard deviations.

48/50 = 96%

In the last, we have n = 1 and n = 1. 48 + 2 = 50, and 50/50 = 100% are within three standard deviations.

70%, 96% and 100% are pretty close to the percents for a normal distribution, so we can say that this data approximates a normal distribution.

Page 36: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

b) If Shirley purchases this cellphone, what is the likelihood that it will last for more than three years?

Sketch a frequency diagram, and show one standard deviation (0.482) above the mean (2.526):

Using the standard normal distribution percents (68%, 95%, 99.7%), we know that 50% of data should fall below the mean of 2.526.

68%/2 = 34%, so approximately 34% should fall within one standard deviation above the mean, that is, between 2.526 and 2.526 + 0.482 = 3.008, or about 3 years.

So, approximately, what percent would last longer than 3 years? 100% – 34% – 50% = 16%

There is a 16% chance that her cellphone will last more than three years.
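The 16% estimate treats three years as exactly one standard deviation above the mean. As a hedged check, and assuming the lifespans really do follow a normal distribution with the μ and σ found in part a), `NormalDist` from Python's `statistics` module gives the area to the right of 3 directly.

```python
from statistics import NormalDist

# Mean and standard deviation found in part a)
lifespan = NormalDist(mu=2.526, sigma=0.482)

# Probability of lasting MORE than 3 years = area to the right of 3
p_more_than_3 = 1 - lifespan.cdf(3)

print(f"{p_more_than_3:.1%}")
```

The result is close to the 16% found with the 68–95–99.7 percentages.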

Page 37: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

STANDARD NORMAL DISTRIBUTION

Page 38: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Independent Practice

PG. 279-282, # 1, 3, 4-7, 9, 12-16

Page 39: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.5 – Z-SCORES

Page 40: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Z-SCORES

A z-score is a standardized value that indicates the number of standard deviations of a data value above or below the mean. It tells you how far away from the mean a value is: the larger the size of the z-score (ignoring its sign), the further a value is from the mean. A positive z-score is above the mean and a negative z-score is below it.

The standard normal distribution is a normal distribution that has a mean of zero and a standard deviation of one.

When we do problems involving z-scores, we need to use the z-score tables on pages 580-581.

Page 41: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Hailey and Serge belong to a running club in Vancouver. Part of their training involves a 200 m sprint. Below are normally distributed times for the 200 m sprint in Vancouver and on a recent trip to Lake Louise. At higher altitudes, run times improve.

Determine at which location Hailey’s run time was better, when compared with the results.

For any score, x, we have x = μ + zσ, where z represents the number of standard deviations of the score from the mean.

Solve for z:

$z = \dfrac{x - \mu}{\sigma}$

Page 42: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE (CONTINUED)

Vancouver: Lake Louise:

In this case, is a score that is more negative a good or bad thing?

Hailey’s time was better than the mean in both places. However, since her z-score was lower for Lake Louise, her better time was in Lake Louise.

Try it: Use z-scores to figure out which of Serge’s run times was better.

Page 43: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

IQ tests are sometimes used to measure a person’s intellectual capacity at a particular time. IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15. If a person scores 119 on an IQ test, how does this score compare with the scores of the general population?

Diagram (score of 119 marked):

$z = \dfrac{119 - 100}{15} \approx 1.27$

Now that we have the z-score, we can find the percentile of the score, which tells you the percent of people who would be lower on the graph.

The value in the z-score table is 0.8980. This means that a person who scores 119 scores higher than 89.80% of the population. We say that this person is in the 89.8th percentile.
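The table lookup can be reproduced in code. This sketch computes the z-score from z = (x – μ)/σ, rounds it to two decimal places as you would before using a printed table, and then uses the standard normal distribution from Python's `statistics` module in place of the table. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(119, mu=100, sigma=15)

# Round to two decimals, as when reading a z-score table, then look up the
# area to the left of z under the standard normal curve (mean 0, sd 1).
percentile = NormalDist().cdf(round(z, 2))

print(round(z, 2), round(percentile, 4))
```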

Page 44: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

Athletes should replace their running shoes before the shoes lose their ability to absorb shock. Running shoes lose their shock-absorption after a mean distance of 640 km, with a standard deviation of 160 km. Zack is an elite runner and wants to replace his shoes at a distance when only 25% of people would replace their shoes. At what distance should he replace his shoes?

Diagram:

Now, we use the z-score table in the opposite way: look for the number closest to 0.25 in the middle of the table.

What is the accompanying z-score?

0.2500 isn’t an option on the table, so we see that it must fall between 0.2483 and 0.2514. That means that the z-score falls between –0.67 and –0.68. We can say that z = –0.675.

x = μ + zσ = 640 + (–0.675)(160) = 532

He should replace his shoes after 532 km.
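Working backwards from an area to a z-score is the inverse of the usual lookup. As a sketch, and assuming the distances are normally distributed (as using the z-score table implies), `NormalDist.inv_cdf` in Python's `statistics` module does that inverse lookup without interpolating between table entries.

```python
from statistics import NormalDist

shoes = NormalDist(mu=640, sigma=160)

# z-score with 25% of the area to its left on the standard normal curve
z = NormalDist().inv_cdf(0.25)

# Distance at the same position on the shoe distribution
distance = shoes.inv_cdf(0.25)

# Value from the slide's interpolated table lookup, x = mu + z * sigma
table_distance = 640 + (-0.675) * 160

print(round(z, 4), round(distance, 1), round(table_distance, 1))
```

The interpolated table value z = –0.675 is very close to the computed z-score, so both approaches give about 532 km.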

Page 45: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

The ABC Company produces bungee cords. When the manufacturing process is running well, the lengths of the bungee cords produced are normally distributed, with a mean of 45.2 cm and a standard deviation of 1.3 cm. Bungee cords that are shorter than 42.0 cm or longer than 48.0 cm are rejected by the quality control workers. If 20 000 bungee cords are manufactured each day, how many bungee cords would you expect the quality control workers to reject?

Minimum = 42 cm, so z = (42.0 – 45.2)/1.3 ≈ –2.46

Maximum = 48 cm, so z = (48.0 – 45.2)/1.3 ≈ 2.15

Look up the percentiles for these values: for z = –2.46, the table gives 0.0069.

We want to find the area to the right of the max, so the process is different. For z = 2.15, the table gives 0.9842, so the area to the right is 1 – 0.9842 = 0.0158.

There is 0.69% below the minimum, and 1.58% above the max. They will reject approximately 0.69 + 1.58 = 2.27% of the bungee cords.

(0.0227)(20 000) = 454

So, they would reject approximately 454 cords.
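The two tail areas can also be computed directly. This sketch uses `NormalDist` from Python's `statistics` module with the mean and standard deviation given in the problem. Because it does not round the z-scores to two decimal places, the count it prints lands within a few cords of the table-based 454 rather than matching it exactly.

```python
from statistics import NormalDist

cords = NormalDist(mu=45.2, sigma=1.3)
daily_output = 20_000

too_short = cords.cdf(42.0)      # area to the left of the minimum
too_long = 1 - cords.cdf(48.0)   # area to the right of the maximum

rejected = (too_short + too_long) * daily_output

print(f"{too_short:.4f} {too_long:.4f} {rejected:.0f}")
```

Note the asymmetry: the mean of 45.2 cm is closer to the upper limit than to the lower one, so more cords are rejected for being too long than for being too short.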

Page 46: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

EXAMPLE

A manufacturer of personal music players has determined that the mean life of the players is 32.4 months, with a standard deviation of 6.3 months. What length of warranty should be offered if the manufacturer wants to restrict repairs to less than 1.5% of all the players sold?

1.5% = 0.015

Find the z-score for 0.0150, using your calculator

z = –2.17

x = μ + zσ = 32.4 + (–2.17)(6.3) ≈ 18.7 months

They should offer an 18-month warranty.
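The warranty calculation can be sketched the same way. This assumes player lifetimes are normally distributed, which the problem implies by using z-scores but does not state outright. The cutoff is rounded down to whole months so that the share of players failing under warranty stays below 1.5%.

```python
from statistics import NormalDist

players = NormalDist(mu=32.4, sigma=6.3)

# z-score with 1.5% of the area to its left
z = NormalDist().inv_cdf(0.015)

# Lifetime (in months) that only 1.5% of players fail to reach
cutoff = players.inv_cdf(0.015)

# Round DOWN: a longer warranty would cover more than 1.5% of players
warranty_months = int(cutoff)

print(round(z, 2), round(cutoff, 1), warranty_months)
```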

Page 47: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA
Page 48: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Independent Practice

PG. 292-294 # 1-4, 6, 8, 10, 12, 13, 15, 17, 20, 23

Page 49: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Chapter 5

5.6 – CONFIDENCE INTERVALS

Page 50: Statistical Reasoning CHAPTER 5. Chapter 5 5.1 – EXPLORING DATA

Independent Practice

PG. 302-304, #