Statistics Class 2 1/25/2012

Upload: martina-powers

Post on 31-Dec-2015

Statistics Class 2

1/25/2012

Intro to Statistics

Data are collections of observations (such as measurements, genders, survey responses).

Statistics is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.

 

A Population is the complete collection of all individuals (scores, people, measurements, and so on) to be studied. The collection is complete in the sense that it includes all of the individuals to be studied.

Statistical Thinking

A Census is the collection of data from every member of the population.

 

A Sample is a sub-collection of members selected from a population.

When conducting a statistical analysis, it is important to consider:

 •Context of the data

 •Source of the data

 •Sampling method

 •Conclusions

 •Practical implications

Context of Data

When examining data it is important to consider the context of the data.  Consider the statement below by John Allen Paulos.

 The problem isn’t with statistical tests themselves but with what we do before and after we run them. First, we count if we can, but counting depends a great deal on previous assumptions about categorization. Consider, for example, the number of homeless people in Philadelphia, or the number of battered women in Atlanta, or the number of suicides in Denver. Is someone homeless if he’s unemployed and living with his brother’s family temporarily? Do we require that a woman self-identify as battered to count her as such? If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself?

Source of Data

Consider this news clip from the UK-based paper The Independent:

 Tobacco companies are funding research into infertility in a bid to counter widespread evidence that smoking drastically undermines the chances of conceiving. Philip Morris, one of the world's largest cigarette firms, is being accused by the anti-smoking lobby of attempting to deceive smokers into believing they can improve their chances of having children if they take vitamin supplements.

Sampling Method

The way sample data are collected for a study can have a great influence on the results of the study.

 •Literary Digest poll, page 3 of the text. (Demographics)

 •Question: Do you think the site Rate my Professor has accurate data or results? (Self-Selection)

Conclusions

It is important to state conclusions carefully, to avoid claiming more than your results justify.

Practical Implications

Sometimes the statistical significance of a study can differ from its practical significance.

Example 6 in your book cites the results of a study of the Atkins weight loss program. The mean weight loss on the program was 2.1 pounds after 1 year. Such a small loss may be statistically significant, yet of little practical significance to someone choosing a diet.

Statistical Significance

Ex 7: With MicroSort, 13 out of 14 couples who wanted to have girls reportedly had girls. By chance alone, there is only about a 1 in 1000 chance of that happening.

Ex 8: What if instead of 13 out of 14, only 8 out of 14 couples had a baby girl?

The results from Example 7 would be statistically significant, while those of Example 8 would not.
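The "about 1 in 1000" figure in Example 7 can be checked with a short calculation. The sketch below assumes that, without any treatment, each birth is a girl with probability 0.5, that births are independent, and that "13 out of 14" is read as "at least 13 out of 14". Those assumptions and the helper name prob_at_least are illustrative and do not come from the slides.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example 7: at least 13 girls among 14 births, by chance alone
p_example_7 = prob_at_least(13, 14)

# Example 8: at least 8 girls among 14 births, by chance alone
p_example_8 = prob_at_least(8, 14)

print(f"P(at least 13 girls in 14) = {p_example_7:.6f}")
print(f"P(at least 8 girls in 14)  = {p_example_8:.4f}")
```

Under these assumptions the first probability is 15/16384, roughly 1 in 1092, while the second is about 0.40, so 8 girls in 14 births could easily happen by chance.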

Types of Data 

A Parameter is a numerical measurement describing some characteristic of a population.

A Statistic is a numerical measurement describing some characteristic of a sample.
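To make the parameter/statistic distinction concrete, here is a minimal sketch. The population of 500 exam scores is made up for illustration, and the variable names are hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam scores for all 500 students in a course
population = [random.randint(40, 100) for _ in range(500)]

# A census uses every member of the population, so its mean is a parameter
parameter = mean(population)

# A sample is a sub-collection of the population, so its mean is a statistic
sample = random.sample(population, 25)
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The statistic will usually be close to the parameter but not equal to it, and a different sample would give a different statistic.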

Quantitative (or numerical) data consist of numbers representing counts or measurements.

Categorical (or qualitative or attribute) data consist of names or labels that are not numbers representing counts or measurements.

Discrete data result when the number of possible values is either a finite number or a "countable" number.

 

Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.

The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).

Data are at the ordinal level of measurement if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.

The interval level of measurement is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, data at this level do not have a natural zero starting point (where none of the quantity is present).

The ratio level of measurement is the interval level with the additional property that there is also a natural zero starting point (where zero indicates none of the quantity is present). For values at this level, differences and ratios are both meaningful.

Levels of Measurement

Ratio: There is a natural zero starting point and ratios are meaningful.

ex: Distances

Interval: Differences are meaningful, but there is no natural zero starting point and ratios are meaningless.

ex: Body temperatures in degrees Fahrenheit

Ordinal: Categories are ordered, but differences can't be found or are meaningless.

ex: Ranks of colleges in U.S. News and World Report

Nominal: Categories only. Data cannot be arranged in an ordering scheme.

ex: Eye Colors
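A small worked example may help show why ratios are meaningless at the interval level but meaningful at the ratio level. The sketch below compares 100 and 50 degrees Fahrenheit with 10 and 5 miles. The specific values and the function names are chosen for illustration only.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a natural zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15


def miles_to_km(miles):
    """Convert miles to kilometers."""
    return miles * 1.609344


# Interval level: the ratio depends on where the scale puts its zero
fahrenheit_ratio = 100 / 50
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

# Ratio level: the ratio is the same no matter which unit is used
mile_ratio = 10 / 5
km_ratio = miles_to_km(10) / miles_to_km(5)

print(f"100 F vs 50 F: ratio {fahrenheit_ratio:.2f} in Fahrenheit, {kelvin_ratio:.2f} in kelvins")
print(f"10 mi vs 5 mi: ratio {mile_ratio:.2f} in miles, {km_ratio:.2f} in kilometers")
```

100 degrees Fahrenheit is not "twice as hot" as 50 degrees, because the ratio changes when the same temperatures are expressed on a scale with a true zero. Ten miles is twice as far as five miles in any unit.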

Quiz 1

1. Explain the difference between a Census and a Sample.

2. Explain the difference between numerical and categorical data.

3. Give an example of data that are at the nominal level of measurement.

Quiz 1 Solutions

1. Explain the difference between a Census and a Sample. A census is a collection of data from the entire population, while a sample is a collection of data from a subset of the population.

2. Explain the difference between numerical and categorical data. Numerical data consist of numbers representing counts or measurements. Categorical data consist of names, labels, or categories.

3. Give an example of data that are at the nominal level of measurement. Political affiliation.

Critical Thinking  

We must think carefully about the context of the data, the source of the data, the method used in data collection, the conclusions reached, and the practical implications.

Fun Quotes

 •"There are three kinds of lies: lies, damned lies, and statistics." -Benjamin Disraeli

 •"Figures don't lie; liars figure." -Mark Twain

 •"There are two kinds of statistics, the kind you look up, and the kind you make up." -Rex Stout

Bad Statistics

Bad statistics happen either by evil intent or by unintentional errors. How to Lie with Statistics, a book written by Darrell Huff in 1954, is the classic text on this topic and has many examples of intentional or unintentional misuses of statistics. Misuse of graphs is a common way to misrepresent data or results.

Bad samples may result in incorrect findings as well. Bad samples occur when the methods used to collect the data result in a biased sample, so that the sample does not represent the population from which it was obtained.

A voluntary response sample (or self-selected sample) is one in which the respondents themselves decide whether to be included.

Ex: Any poll or survey where the readers or listeners decide to participate

Another way to misinterpret statistical data is to find a statistical association between two variables and to conclude that one of the variables caused the other. The association is called a correlation. Only when one of the variables actually causes a change in the other do we have causality.
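A short simulation can show how two variables can be strongly correlated without either one causing the other. In the sketch below, a third (lurking) variable drives both. All of the data are simulated, and the variable names (temperature, ice_cream_sales, swimming_accidents) are hypothetical, not taken from the text.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Lurking variable: daily temperature on 200 simulated days
temperature = [random.uniform(0, 30) for _ in range(200)]

# Both quantities depend on temperature plus noise, but not on each other
ice_cream_sales = [t + random.gauss(0, 2) for t in temperature]
swimming_accidents = [t + random.gauss(0, 2) for t in temperature]

r = pearson_r(ice_cream_sales, swimming_accidents)
print(f"correlation between the two simulated variables: r = {r:.2f}")
```

The correlation is strong even though, by construction, neither simulated variable has any effect on the other.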

Correlation and Causality

Reported results

When collecting data from people, it is better to take the measurements yourself rather than rely on subjects to report results.

Ex. 3 Voting Behavior: When surveyed about whether they had voted, about 70% of 1000 eligible voters reported that they had voted. Voting records showed that only 61% had actually voted.

Small Samples

Conclusions should not be drawn from samples that are far too small.

Ex. 4: The Children's Defense Fund published an article, Children Out of School in America, in which it was reported that in a certain school district 67% of the students were suspended at least three times. The figure is based on a sample of only 3 students.
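A quick calculation shows how little a percentage from 3 students can say. The sketch below simply lists every percentage that a sample of size 3 can produce; the variable names are illustrative.

```python
n = 3  # sample size from Ex. 4

# Each possible count of suspended students (0 through n) as a rounded percentage
possible_percentages = [round(100 * k / n) for k in range(n + 1)]

print(possible_percentages)
```

With only four possible values, the reported 67% just means 2 of the 3 students, and a single different student would move the figure to 33% or 100%.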

Percentages

Percentages can be cited in a manner that is either unclear or misleading. The fact is that 100% of something is all of it. So if you see percentages above 100% being cited, they are probably not justified.

Other Sampling considerations

Wording of a Question

 •97% yes: "Should the President have the line item veto to eliminate waste?"

 •57% yes: "Should the President have the line item veto, or not?"

Order of Questions

Nonresponse

Missing Data: Phone surveys miss people without phones.

Self-Interest Study: Kiwi Shoe Polish, getting a job

 

Other Sampling Considerations

Precise Numbers: You cannot assume precise numbers are accurate.

Deliberate Distortions: Avis vs. Hertz

Collecting Sample Data

If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

In an observational study, we observe and measure specific characteristics, but we don't attempt to modify the subjects being studied.

In an experiment, we apply some treatment and then proceed to observe its effects on the subjects. (Subjects in experiments are called experimental units.)

Types of Samples

A Simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.

In a random sample, members from the population are selected in such a way that each individual member in the population has an equal chance of being selected.

A probability sample involves selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.

Types of Samples Example

Each of the 50 states sends two senators to Congress, so there are exactly 100 senators. Suppose that we write the name of each state on a separate index card, then mix the 50 cards in a bowl, and then select one card. If we consider the two senators from the selected state to be sampled, is this result a random sample? Yes, since each individual senator has an equal chance of being picked.

Simple random sample? No, not all samples of size two have the same chance of being picked (a sample of senators from different states cannot be picked at all).

Probability sample? Yes, since each senator has a known chance of being selected.
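The three answers in the senator example can be verified by brute-force enumeration. In the sketch below, senators are labeled as (state index, seat) pairs, which is a labeling chosen for illustration rather than anything from the text.

```python
from fractions import Fraction
from itertools import combinations

# 100 senators, labeled (state_index, seat)
senators = [(state, seat) for state in range(50) for seat in (0, 1)]

# Drawing one state card can only ever produce these 50 samples,
# each with probability 1/50
possible_samples = [((state, 0), (state, 1)) for state in range(50)]

# Chance that each individual senator ends up in the sample
chance = {senator: Fraction(0) for senator in senators}
for sample in possible_samples:
    for senator in sample:
        chance[senator] += Fraction(1, 50)

# Every pair of senators a simple random sample of size 2 would have to allow
all_pairs = list(combinations(senators, 2))

print("distinct selection chances:", set(chance.values()))
print("pairs the card draw can produce:", len(possible_samples))
print("pairs of size two in total:", len(all_pairs))
```

Every senator has the same known chance (1/50), so this is both a random sample and a probability sample. Only 50 of the 4950 possible pairs can occur, so it is not a simple random sample.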

Other Sampling Methods

In Systematic sampling, we select some starting point and then select every kth element in the population.

 

With convenience sampling, we simply use the results that are very easy to get.

 

With Stratified sampling, we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket), then we draw a sample from each subgroup (or stratum).

 

In Cluster sampling, we first divide the population area into sections (or clusters), then randomly select some of those clusters, and then choose all the members from those selected clusters.
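The systematic, stratified, and cluster methods described above can be sketched in a few lines. The population of 60 people, the section labels, the age brackets, and the sample sizes are all invented for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 people, each with a section (A-F) and an age bracket
sections = ["A", "B", "C", "D", "E", "F"]
population = [
    {
        "id": i,
        "section": sections[i // 10],
        "bracket": "30 and over" if i % 3 == 0 else "under 30",
    }
    for i in range(60)
]

# Systematic sampling: pick a starting point, then take every kth element
k = 6
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: split into strata that share a characteristic,
# then draw a sample (here 5 people) from each stratum
strata = {}
for person in population:
    strata.setdefault(person["bracket"], []).append(person)
stratified = [p for group in strata.values() for p in random.sample(group, 5)]

# Cluster sampling: randomly select some clusters (here 2 sections),
# then take all the members of the selected clusters
chosen_sections = random.sample(sections, 2)
cluster = [p for p in population if p["section"] in chosen_sections]

print("systematic ids:", [p["id"] for p in systematic])
print("stratified ids:", sorted(p["id"] for p in stratified))
print("cluster sections:", chosen_sections, "size:", len(cluster))
```

Note the difference between the last two methods: stratified sampling takes some members from every subgroup, while cluster sampling takes every member from some subgroups.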

Multistage sampling occurs when pollsters collect data using a combination of the basic sampling methods. In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.

Homework

1-2: 1-14, 15-25 odd.

1-3: 1-4, 5, 7, 11-17 odd, 21-33 odd.

1-4: 1-4, 8-20, 23, 25

1-5: 1-4, 5-26

Say Hello via a comment on my website

Extra Credit Coupons for articles citing devious uses of statistics.

Remember the odds are in the back :)