chapter 5.1 data production - math with...

AP Statistics

Upload: vuongduong

Post on 16-Mar-2018


Page 1: Chapter 5.1 Data Production - Math with Mayermathwithmayer.weebly.com/uploads/3/7/2/7/37277397/ap_stats_5.2...Treatments: 2 different lengths of commercials (30 or 90 seconds) combined

AP Statistics

One way to gather data is a sample survey, with the sample carefully selected using (among other methods) an SRS, a stratified random sample, or a multistage sample.

The idea is to build deliberate randomness into the selection of individuals so as to avoid potential bias. The goal is to get a picture of the population that is disturbed as little as possible by the act of gathering information.
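
As a rough illustration of how chance drives the selection, here is a minimal Python sketch of an SRS and a stratified random sample. The population of 40 students split by grade, the sample sizes, and the function names are all assumptions made up for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n individuals so that every group of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS within each stratum, then combine the results."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population: 40 students labeled by grade (illustration only).
strata = {
    "grade 11": [f"11-{i:02d}" for i in range(1, 21)],
    "grade 12": [f"12-{i:02d}" for i in range(1, 21)],
}
population = strata["grade 11"] + strata["grade 12"]

srs = simple_random_sample(population, 8, seed=1)
stratified = stratified_random_sample(strata, 4, seed=1)
print("SRS:", srs)
print("Stratified:", stratified)
```

The SRS may happen to over-represent one grade, while the stratified sample always contains exactly four students from each grade.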

An observational study observes individuals and measures variables of interest but does not attempt to influence the response.

Sample surveys are one kind of observational study.

However, an observational study is a poor way to gauge the effect of an intervention.

When our goal is to understand cause and effect, experiments are the only source of fully convincing data.

An experiment deliberately imposes some treatment on individuals in order to observe their responses.

The goal of the research is to establish a causal link between a particular treatment and a response.

Why do you consume caffeine?

What kind of potential side effects are there then?

Could any variables be confounded with taking caffeine? (What else could create that side effect?)

Research question: Does caffeine affect pulse rate?

The individuals on which the experiment is done are the experimental units; human experimental units are called subjects.

The experimental condition applied to the units (aka the thing we ‘do’ to the participants) is called the treatment.

The distinction between explanatory and response variables is essential. In an experiment, the explanatory variables are often called factors.

Factors: the variables of interest (example: a study of differences in gender and alcohol preference has 2 factors: gender and alcohol preference)

Levels: the ‘categories’ of each factor (gender has 2 levels: M/F; alcohol, let’s say, has 3 levels: hard liquor/beer/wine)

Researchers at the University of North Carolina were concerned about the increasing dropout rate in the state’s high schools, especially for low-income students. Surveys of recent dropouts revealed that many of these students had started to lose interest during middle school. They said they saw little connection between what they were studying in school and their future plans. To change this perception, researchers developed a program called CareerStart. The big idea of the program is that teachers show students how the topics they learn get used in specific careers.

To test the effectiveness of CareerStart, the researchers recruited 14 middle schools in Forsyth County to participate in an experiment. Seven of the schools, chosen at random, used CareerStart along with the district’s standard curriculum. The other seven schools just followed the standard curriculum. Researchers followed both groups of students for several years, collecting data on students’ attendance, behavior, standardized test scores, level of engagement in school, and whether the students graduated from high school. Results: Students at schools that used CareerStart generally had better attendance and fewer discipline problems, earned higher test scores, reported greater engagement in their classes, and were more likely to graduate.

Identify the experimental units, explanatory and response variables, and the treatments in the CareerStart experiment.

Experimental Units: 14 middle schools in Forsyth County, NC

Explanatory Variable/Factor: Whether or not the school used CareerStart with its students

Treatments: 2: (1) the standard middle school curriculum, and (2) the standard curriculum plus CareerStart

Response Variables: test scores, attendance, behavior, student engagement, graduation rates

What are the effects of repeated exposure to an advertising message? The answer may depend on both the length of the ad and on how often it is repeated. An experiment investigated this question using undergraduate students as subjects. All subjects viewed a 40-minute television program that included ads for a digital camera. Some subjects saw a 30-second commercial; others, a 90-second version. The same commercial was shown either 1, 3, or 5 times during the program. After viewing, all the subjects answered questions about their recall of the ad, their attitude towards the camera, and their intention to purchase it.

For the advertising study:

(a) identify the explanatory and response variables

(b) list all the treatments

Explanatory Variables/Factors: 2: (1) length of the commercial and (2) number of repetitions

Response Variables: recall of the ad, attitude towards the camera, intention to purchase

Treatments: 2 different lengths of commercial (30 or 90 seconds) combined with 3 repetition levels (1, 3, or 5 times) make a total of 2 × 3 = 6 different treatments.
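
The treatment count for the advertising study can be checked by listing every combination of factor levels. The short Python sketch below does this with `itertools.product`. The factor levels come from the study description, while the variable names are chosen here for illustration.

```python
from itertools import product

# Factors and levels taken from the advertising study described above.
factors = {
    "length_seconds": [30, 90],
    "repetitions": [1, 3, 5],
}

# Each treatment is one combination of a level from every factor.
treatments = list(product(*factors.values()))

for number, (length, reps) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {length}-second ad shown {reps} time(s)")

print("Number of treatments:", len(treatments))
```

Adding a third factor to the dictionary would multiply the number of treatments by that factor's number of levels.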

Should women take hormones such as estrogen after menopause, when natural production of these hormones ends? In 1992, several major medical organizations said “yes.” Women who took hormones seemed to reduce their risk of a heart attack by 35% to 50%. The risks of taking hormones appeared small compared with the benefits.

The evidence in favor of hormone replacement came from a number of observational studies that compared women who were taking hormones with others who were not. But the women who chose to take hormones were richer and better educated and saw doctors more often than women who didn’t take hormones. Because women who took hormones did many other things to maintain their health, it isn’t surprising that they had fewer heart attacks.

What kind of study was this?

Explanatory variable?

Response variable?

In the early observational studies, the effect of taking hormones was mixed up with the characteristics of women who chose to take them.

A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable.

Two lurking variables in the hormone study:

Number of doctor visits per year: the women who took hormones saw their doctors more often than those who didn’t

Age

Did the women in the hormone group have fewer heart attacks because they took hormones, or because they got better health care?

When the effects of two variables, lurking or otherwise, on a response variable cannot be separated from each other (for example, whether a lower heart attack risk comes from taking hormones or from better health care), it is called confounding.

Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.

We often use lab experiments to protect against lurking variables, which can arise when conducting experiments ‘in the field’ or with living subjects.

Fortunately, with a control group we can compare treatments. Instead of giving everyone in the study the treatment and examining the results in isolation, we can give one group the treatment and another a placebo. This comparison lets us control the effects of lurking variables, which is the first principle of statistical design.

When a study compares two treatments in such a way, the experimental design is called comparative. The control provides a baseline for comparing the effects of other treatments.

Without comparing treatments, the result is often bias, systematically favoring one outcome (how convenient for money-making companies!)

Well-designed experiments take steps to prevent confounding.

Observational studies often fail because of confounding between the explanatory variable and potentially multiple lurking variables.
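
To see how confounding can mislead, consider a small simulation loosely modeled on the hormone example. Everything numeric here is an assumption invented for illustration, not data from any study: quality of health care is the lurking variable, hormones are given no effect at all on heart attacks, and women with good care are more likely to choose hormones. The sketch compares that observational setup with one where a coin flip decides who takes hormones.

```python
import random

def attack_rate_difference(n, randomized, seed):
    """Return (heart attack rate with hormones) - (rate without) in a simulated study."""
    rng = random.Random(seed)
    attacks = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        good_care = rng.random() < 0.5  # lurking variable (assumed 50/50)
        if randomized:
            # A coin flip decides who takes hormones.
            takes_hormones = rng.random() < 0.5
        else:
            # Assumed: women with good care choose hormones more often.
            takes_hormones = rng.random() < (0.8 if good_care else 0.2)
        # Assumed: hormones have NO effect; only health care changes the risk.
        attack = rng.random() < (0.10 if good_care else 0.20)
        counts[takes_hormones] += 1
        attacks[takes_hormones] += attack
    return attacks[True] / counts[True] - attacks[False] / counts[False]

observational_diff = attack_rate_difference(20000, randomized=False, seed=0)
randomized_diff = attack_rate_difference(20000, randomized=True, seed=0)
print(f"Observational study:   {observational_diff:+.3f}")
print(f"Randomized experiment: {randomized_diff:+.3f}")
```

In the observational version the hormone group shows a clearly lower heart attack rate even though hormones do nothing in this model, because hormone use and good care travel together. With the coin flip, good care is spread roughly evenly across both groups and the difference shrinks toward zero.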

Step 1: Determine the response variable(s), the factors, specific treatments, and if necessary, make a “before” measurement.

In our example: Identify the response variable(s), the factors/explanatory variables, specific treatments, and do we need to make a “before” measurement?

Experimental Units: 30-ish (depends on attendance!) AP Stats students from THS

Explanatory Variable/Factor: Whether or not the subjects consumed caffeine in soda

Treatments: 2: (1) regular Coca-Cola and (2) caffeine-free Coca-Cola

Response Variables: We’re just measuring heart rate

Comparison of treatment to control alone isn’t enough to produce trustworthy results, as there could still be bias.

Random assignment means that experimental units are assigned to treatments at random using some form of chance process. This eliminates the opportunity for bias by not allowing participants to choose which group to volunteer for. When the subjects are assigned to treatments by chance, we call the experimental design randomized.

In our class experiment, how did we decide the random assignment? How else could we have done it?

In a completely randomized experimental design, the treatments are assigned to all the experimental units completely by chance.

This design can compare any number of treatments.

Constructing an outline of a completely randomized design is an effective way to see the distribution of treatments; on the AP test, using it in isolation is not enough as you must also describe how the treatments were assigned, and identify what will be measured/compared.

The diagrams include not only the treatments, the response variable, and the number of units in each group, but also how the units were randomized and what is being compared.
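
The chance assignment at the heart of a completely randomized design can also be sketched in code. The Python example below shuffles the units and deals them out to the treatments in turn, so it works for any number of treatments. The function name, the school labels, and the seed are hypothetical, and the 14 schools and two treatments are borrowed from the CareerStart example purely for illustration.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments like cards."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical labels for the 14 middle schools (real names are not given).
schools = [f"School {i:02d}" for i in range(1, 15)]
treatments = ["standard curriculum", "standard curriculum + CareerStart"]

groups = completely_randomized_design(schools, treatments, seed=2024)
for treatment, assigned in groups.items():
    print(treatment, "->", sorted(assigned))
```

Because the shuffle decides the order, no school or researcher gets to choose a group, and each treatment ends up with seven schools.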

Music students often don’t evaluate their own performances accurately. Can small-group discussions help? The subjects were 29 students preparing for the end-of-semester performance that is an important part of their grade. Assign 15 students to the treatment: videotape a practice performance, ask the student to evaluate it, then have the student discuss the tape with a small group of other students. The remaining 14 students form a control group who watch and evaluate their tapes alone. At the end of the semester, the discussion-group students evaluated their final performance more accurately.

1. Outline a completely randomized design for this experiment.

2. Describe how you would carry out the random assignment. Provide enough detail that a classmate could implement your procedure.

3. What is the purpose of the control group in this experiment?

2. To get a simple random sample, random numbers could be generated via a random number table or a calculator/computer program. Alphabetize the class and assign each student a number from 01 to 29. Then, using your random number generator of choice, choose 15 different numbers (ignoring repeats). Those students form the treatment group, the group that meets to discuss the videos. The remaining 14 students form the control group.

3. The purpose of the control group is to provide a baseline for comparison with the treatment group. In this experiment, it shows how accurately students evaluate their own work without the small-group discussion.
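
The procedure in answer 2 could be carried out with a computer roughly as follows. This is only a sketch: the labels 01 to 29 stand in for the alphabetized class list (the students' names are not given), and the seed is an arbitrary choice made here so the result can be reproduced.

```python
import random

# Labels 01-29 stand in for the alphabetized class list (names are not given).
labels = [f"{i:02d}" for i in range(1, 30)]

rng = random.Random(5)  # arbitrary seed so the assignment is reproducible

# random.sample draws 15 different labels, so repeats cannot occur.
discussion_group = sorted(rng.sample(labels, 15))
control_group = [label for label in labels if label not in discussion_group]

print("Discussion (treatment) group:", discussion_group)
print("Control group:", control_group)
```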
