ways to look at the data


Upload: lenci

Post on 09-Feb-2016


DESCRIPTION

Ways to look at the data. Box plot. Dot plot. Histogram. Number of hurricanes that occurred each year from 1944 through 2000 as reported by Science magazine. 3 Characteristics of data. Shape Center Spread. Shape of the data – Symmetric. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Ways to look at the data

Ways to look at the data

Number of hurricanes that occurred each year from 1944 through 2000, as reported by Science magazine

[Figure: the same hurricane counts (Ch04_Hurricanes) shown three ways: histogram, dot plot, and box plot. Horizontal axis: Hurricanes, 0 to 8.]

Page 2: Ways to look at the data

3 Characteristics of data

Shape

Center

Spread

Page 3: Ways to look at the data

Shape of the data – Symmetric

The age of all US Presidents at the time they took office

Notice that this distribution has only one mode

[Figure: Presidents histogram. Horizontal axis: Age (years), 40 to 75; vertical axis ticks from 2 to 14.]

Page 4: Ways to look at the data

Shape of the data – Bimodal

[Figure: Ch04_Kentucky_Derby histogram. Horizontal axis: Time, 110 to 180; vertical axis ticks from 10 to 50.]

The winning times in the Kentucky Derby from 1875 to the present. Why two modes?

Page 5: Ways to look at the data

Shape of the data – Bimodal

[Figure: Ch04_Kentucky_Derby histogram. Horizontal axis: Time, 110 to 180; vertical axis ticks from 10 to 50.]

The winning times in the Kentucky Derby from 1875 to the present. Why two modes?

The length of the track was reduced from 1.5 miles to 1.25 miles in 1896. The race officials thought that 1.5 miles was too far.

Page 6: Ways to look at the data

Shape of the data – skewed

Data for two different variables for all female heart attack patients in New York state in one year. One is skewed left; the other is skewed right. Which is which?

LEFT RIGHT

Page 7: Ways to look at the data

Center and Spread of Data

Maximum = 100th percentile

Q3 = 75th percentile

Median = 50th percentile

Q1 = 25th percentile

Minimum = 0th percentile

[Figure: Ch04_Hurricanes box plot with the five values marked. Horizontal axis: 0 to 8.]

These numbers are called the 5 number summary.

The median measures the center of the data.

Q3 – Q1 = Interquartile range (IQR) measures the spread.
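The 5 number summary and the IQR can be computed directly. The sketch below is one possible way to do it in Python, using the standard library's `statistics.quantiles`. The function name `five_number_summary` and the data are hypothetical (they are not the hurricane counts from the slides), and textbooks differ on exactly how quartiles are interpolated, so other tools may give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" treats the smallest and largest observations as the
    # 0th and 100th percentiles, matching the labels on the slide.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical data for illustration only.
counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(counts)

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```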

Page 8: Ways to look at the data

Symbols:

• s² = Sample Variance

• s = Sample Standard Deviation

• σ² = Population Variance (Pop. St. Dev. Squared)

• σ = Population Standard Deviation (Sq. Root of Variance)

• REMEMBER: The Variance is the SD squared! And the SD is the Sq. root of the Variance!

• x̄ = Mean
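As a rough illustration of how these symbols relate, the following sketch computes each quantity with Python's standard `statistics` module. The scores are hypothetical. The sample versions divide by n − 1 and the population versions divide by n, which is why the two variances differ.

```python
import statistics

# Hypothetical scores for illustration only.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(scores)                      # the mean
sample_variance = statistics.variance(scores)        # s squared (divides by n - 1)
sample_sd = statistics.stdev(scores)                 # s
population_variance = statistics.pvariance(scores)   # sigma squared (divides by n)
population_sd = statistics.pstdev(scores)            # sigma

# The variance is the SD squared, and the SD is the square root of the variance.
print(mean, sample_variance, sample_sd, population_variance, population_sd)
```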

Page 9: Ways to look at the data

The normal distribution and standard deviations

The total area under the curve is 1.

In a normal distribution:

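[Figure: normal curve divided at each standard deviation from the mean. Areas from left to right: 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%.]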

Page 10: Ways to look at the data

The normal distribution and standard deviations

In a normal distribution:

Approximately 68% of scores will fall within one standard deviation of the mean.

Page 11: Ways to look at the data

The normal distribution and standard deviations

In a normal distribution:

Approximately 95% of scores will fall within two standard deviations of the mean.
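The 68% and 95% figures can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is made up for this example. The slide percentages are rounded, so the computed values are close to them rather than identical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = area_within(1)  # roughly 68%
within_two = area_within(2)  # roughly 95%
print(f"{within_one:.1%} {within_two:.1%}")
```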

Page 12: Ways to look at the data

The number of points that one standard deviation equals varies from distribution to distribution. On one math test, a standard deviation may be 7 points. If the mean were 45, then we would know that about 68% of the students scored from 38 to 52.

[Figure: normal curve for the math test. Horizontal axis: Points on Math Test, marked at 24, 31, 38, 45, 52, 59, 66. Areas from left to right: 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%.]

On another test, a standard deviation may equal 5 points. If the mean were 45, then about 68% of the students would score from 40 to 50 points.

[Figure: normal curve for the other test. Horizontal axis: Points on a Different Test, marked at 30, 35, 40, 45, 50, 55, 60. Areas from left to right: 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%.]
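The two score ranges above come from the same arithmetic: mean minus one standard deviation up to mean plus one standard deviation. A minimal sketch, with a hypothetical helper name `interval_around_mean`:

```python
def interval_around_mean(mean, sd, k=1):
    """Return the (low, high) scores lying k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


# The two tests described on the slide: same mean, different standard deviations.
math_test = interval_around_mean(45, 7)
different_test = interval_around_mean(45, 5)
print(math_test, different_test)
```

Passing `k=2` gives the range that holds about 95% of the scores on a normally distributed test.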

Page 13: Ways to look at the data

Using standard deviation units to describe individual scores

Here is a distribution with a mean of 100 and standard deviation of 10:

[Figure: number line marked 80 (-2 sd), 90 (-1 sd), 100 (mean), 110 (1 sd), 120 (2 sd).]

What score is one sd below the mean? 90

What score is two sd above the mean? 120

Page 14: Ways to look at the data

Using standard deviation units to describe individual scores

Here is a distribution with a mean of 100 and standard deviation of 10:

[Figure: number line marked 80 (-2 sd), 90 (-1 sd), 100 (mean), 110 (1 sd), 120 (2 sd).]

How many standard deviations below the mean is a score of 90? 1

How many standard deviations above the mean is a score of 120? 2
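Converting a score to standard deviation units is a subtraction followed by a division. The sketch below uses a hypothetical function name, `sd_units`; a negative result means the score is below the mean.

```python
def sd_units(score, mean, sd):
    """How many standard deviations a score lies from the mean (negative = below)."""
    return (score - mean) / sd


# The distribution from the slide: mean 100, standard deviation 10.
below = sd_units(90, mean=100, sd=10)
above = sd_units(120, mean=100, sd=10)
print(below, above)
```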

Page 15: Ways to look at the data

Using standard deviation units to describe individual scores

Here is a distribution with a mean of 100 and standard deviation of 10:

[Figure: number line marked 80 (-2 sd), 90 (-1 sd), 100 (mean), 110 (1 sd), 120 (2 sd).]

What percent of your data points are < 80? 2.50%

What percent of your data points are > 90? 84%
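The answers on this slide follow from the rounded 68% and 95% figures: half of the remaining 5% lies below 80, and 34% + 50% lies above 90. As a cross-check, this sketch computes the same areas from an exact normal curve with `statistics.NormalDist`, assuming the data really are normally distributed. The exact values come out near the slide's answers but not identical to them.

```python
from statistics import NormalDist

# The distribution from the slide: mean 100, standard deviation 10.
scores = NormalDist(mu=100, sigma=10)

below_80 = scores.cdf(80)        # fraction of data points below 80
above_90 = 1 - scores.cdf(90)    # fraction of data points above 90
print(f"{below_80:.2%} {above_90:.2%}")
```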

Page 16: Ways to look at the data

Types of Sampling: Self-selected Sample

• This method lets members of the sample choose themselves by responding to a general appeal (volunteering to be surveyed).

• Examples of self-selected samples: a call-in radio poll, an internet poll on a website.

• Problem with self-selected samples: bias, because people with strong opinions on the topic (especially negative opinions) are most likely to respond.

Page 17: Ways to look at the data

Convenience Sampling

• In a convenience sample, individuals are chosen because they are easy to reach.

• Example: People conducting a survey go to the mall and stop people who are shopping. This is convenient for the person doing the survey but does not guarantee that the sample is representative of the population of the study.

• Convenience sampling can also introduce bias on the part of the interviewer.

Page 18: Ways to look at the data

Random Samples

• A random sample of size “n” consists of “n” individuals from the population, chosen in such a way that every set of “n” individuals has an equal chance to be the sample selected.

• Example: Putting everyone’s name in a hat and drawing 3 names to participate in the study.
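Drawing names from a hat can be imitated in code. This sketch uses `random.sample` from the Python standard library, which draws without replacement so that every set of n names is equally likely. The function name, the list of names, and the seed are all hypothetical; the seed is only there to make the example repeatable.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Draw n distinct individuals so every set of n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for demos
    return rng.sample(population, n)


# Hypothetical names in the hat.
names = ["Ana", "Ben", "Cho", "Dev", "Eli", "Fay", "Gus", "Hana"]
chosen = draw_random_sample(names, 3, seed=1)
print(chosen)
```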

Page 19: Ways to look at the data

Systematic Sample

• A rule is used to select members of the population.

• Example: every third person on an alphabetized list.
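The "every third person" rule can be written as a list slice. In this sketch the function name and the roster are hypothetical, and starting at the third person is an assumption; many textbooks pick the starting position at random instead.

```python
def systematic_sample(ordered_list, step, start=None):
    """Select every step-th member of an ordered list."""
    if start is None:
        start = step - 1  # assumption: begin with the step-th person
    return ordered_list[start::step]


# Hypothetical alphabetized roster.
roster = sorted(["Gus", "Ana", "Eli", "Cho", "Ben", "Fay", "Dev", "Hana", "Ivy"])
every_third = systematic_sample(roster, step=3)
print(every_third)
```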

Page 20: Ways to look at the data

Stratified Random Sample

To select a stratified random sample, first divide the population into groups of similar individuals, called STRATA. Then choose a separate random sample from each stratum and combine these to form the full sample. A common example is separating by gender or race first, then selecting samples from each group.
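The two steps (divide into strata, then sample within each one) might look like this in code. The function name, the grade-level strata, and the names are all hypothetical, and taking the same number from every stratum is a simplifying assumption; surveys often sample each stratum in proportion to its size.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a separate random sample from each stratum and combine them."""
    rng = random.Random(seed)
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))
    return combined


# Hypothetical population already divided into groups of similar individuals.
strata = {
    "freshmen": ["Ana", "Ben", "Cho", "Dev"],
    "seniors": ["Eli", "Fay", "Gus", "Hana"],
}
full_sample = stratified_sample(strata, per_stratum=2, seed=1)
print(full_sample)
```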