1
Simpson Paradox
And related problems
2
Simpson Paradox
• Admission data from the 1960s showed that men and women had different admission rates at a famous university graduate school.
• But everyone involved at the graduate school claimed that the process was very fair.
3
Hypothetical Data
• Two schools (Arts and Engineering)
• Male admission rate = 35/80 = .44
• Female admission rate = 20/60 = .33
4
2 by 2 table
admit deny
male 35 45
female 20 40
5
Further analysis
• School of Art
• Male admission rate = 5/20 = .25
• Female admission rate = 10/40 = .25
• School of Engineering
• Male admission rate = 30/60 = 0.5
• Female admission rate = 10/20 = 0.5
6
Data
• Notice that
• 30 + 5 = 35 (males admitted)
• 30 + 15 = 45 (males denied)
• 10 + 10 = 20 (females admitted)
• 10 + 30 = 40 (females denied)
7
School of Engineering
admit deny
male 30 30
female 10 10
8
School of Arts
admit deny
male 5 15
female 10 30
9
Why?
• Within each school, admission is fair: men and women are admitted at the same rate.
• But, on the whole, it seems that female students are discriminated against.
10
Reason
• More female students apply for school of arts
• The admission rate for school of arts is low for both male and female
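The reason above can be checked with a little arithmetic. The following is a minimal sketch in Python using the hypothetical counts from the "Further analysis" slide (the dictionary layout and the function names are our own, not part of the slides). It shows that each sex's overall admission rate is a weighted average of the two school rates, weighted by where that sex applied.

```python
# Hypothetical counts from the slides: (admitted, denied) per school and sex.
data = {
    "Arts":        {"male": (5, 15),  "female": (10, 30)},
    "Engineering": {"male": (30, 30), "female": (10, 10)},
}

def applicants(school, sex):
    admitted, denied = data[school][sex]
    return admitted + denied

def school_rate(school, sex):
    admitted, _ = data[school][sex]
    return admitted / applicants(school, sex)

def overall_rate(sex):
    admitted = sum(data[school][sex][0] for school in data)
    applied = sum(applicants(school, sex) for school in data)
    return admitted / applied

def application_share(school, sex):
    # Fraction of this sex's applications that went to the given school.
    total = sum(applicants(s, sex) for s in data)
    return applicants(school, sex) / total

for sex in ("male", "female"):
    # Overall rate = sum over schools of (share of applications) x (school rate).
    weighted = sum(application_share(s, sex) * school_rate(s, sex) for s in data)
    print(sex, "overall:", round(overall_rate(sex), 3), "weighted:", round(weighted, 3))
    for s in data:
        print("   ", s, "share:", round(application_share(s, sex), 3),
              "rate:", school_rate(s, sex))
```

The school rates are identical for both sexes; only the application shares differ, and that alone produces the gap in the overall rates.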
11
Maori versus Non-Maori
Age     Maori (deaths/1000)   Non-Maori (deaths/1000)
0-4     3.68                  2.75
5-14    0.28                  0.27
15-24   1.26                  1.06
25-44   2.44                  1.31
45-64   15.0                  8.76
65+     67.36                 54.75
12
But, on the whole
• For Maori, death rate = 4.65/1000
• For non-Maori, death rate = 8.35/1000
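The reversal can be confirmed directly from the figures on these two slides. A small Python sketch (variable names are our own) checks that the Maori death rate is higher in every age band while the overall rate is lower. The slides do not give the age distribution of either population, so the overall rates are taken as stated rather than recomputed.

```python
# Deaths per 1000 by age band, from the table: (Maori, non-Maori).
rates = {
    "0-4":   (3.68, 2.75),
    "5-14":  (0.28, 0.27),
    "15-24": (1.26, 1.06),
    "25-44": (2.44, 1.31),
    "45-64": (15.0, 8.76),
    "65+":   (67.36, 54.75),
}

# Overall deaths per 1000 as stated on the slide: (Maori, non-Maori).
overall = (4.65, 8.35)

higher_in_every_band = all(maori > non_maori for maori, non_maori in rates.values())
lower_overall = overall[0] < overall[1]

print("Maori rate higher in every age band:", higher_in_every_band)
print("Maori rate lower overall:", lower_overall)
```

Because an overall rate is a weighted average of the age-specific rates, both statements can hold only if the two populations have different age distributions, presumably with a larger share of the Maori population in the low-mortality age bands.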
13
The lesson we learn
• We cannot draw a conclusion based on the data without understanding how the data are obtained.
14
Causal relation
• When we say that there is sex discrimination in the admission process, we mean that sex is the cause and admission is the consequence.
• In science, how can we conclude that factor A causes outcome B?
15
Some possible mistakes
• Data---from hospital record
• Death rates of surgical patients are different for operations with different anesthetics
• Halothane (1.7%), Pentothal (1.7%), Cyclopropane (3.4%), Ether (1.9%)
• Can we say that cyclopropane is more dangerous than the other anesthetics?
16
Answer
• No! The sickest patients were receiving cyclopropane.
17
Study the effect of vaccine on preventing Polio
• Can we give the vaccine to all students and compare the proportion of students who have polio at the end of the year with the proportion last year?
• Can we give the vaccine to all students in New York City and compare the proportion of students who have polio with the corresponding proportion in Chicago?
18
Further questions
• Can we compare the above proportion for students from private schools with that for students from public schools?
• Can we compare the above proportion for male students with that for female students?
19
How to know the effect of vaccine in preventing polio
• We need two groups: a control group (no “real” treatment) and a treatment group (given the vaccine)
20
We should compare the two groups under “equal” conditions
• People are different from each other
• By randomly assigning participants to the two groups, we can make the groups nearly identical in their conditions, e.g., about the same on average
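Random assignment can be illustrated with a short simulation. This is only a sketch under stated assumptions: the participants and their ages are simulated, hypothetical values (not data from any trial), and age stands in for any characteristic that differs between people.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical participants: each has an age between 5 and 12 (illustrative only).
ages = [random.randint(5, 12) for _ in range(10_000)]

# Random assignment: shuffle the participant indices, then split in half.
indices = list(range(len(ages)))
random.shuffle(indices)
half = len(indices) // 2
treatment = [ages[i] for i in indices[:half]]
control = [ages[i] for i in indices[half:]]

print("mean age, treatment group:", statistics.mean(treatment))
print("mean age, control group:  ", statistics.mean(control))
```

The two group means come out close to each other without anyone having matched the groups on age; the same holds, on average, for characteristics that were never measured.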
21
Real difficulties
• Many factors affect the outcome, and it is impossible to control all of them
22
Design of an Experiment
• To compare one treatment (A) with another treatment (B), we need to randomly assign each patient to a group receiving one of the treatments
23
The vaccine can prevent Polio
• 1954---USA---over two million children involved
• Can we let the students voluntarily select their own treatment?
24
Randomization
• We need to randomly assign each schoolchild to receive either the vaccine or a placebo
• The purpose of such randomization is to ensure the comparability of the two groups
• Unfortunately, many physicians did not understand the importance of randomization
25
Placebo
• In this case, the placebo is another kind of liquid, similar in appearance to the vaccine, that is injected into the children.
• It is used so that all children receive the “same” treatment, so that a difference in the results cannot be explained as a psychological effect
26
Data
                    Polio (after half year)   No polio (after half year)
Control (placebo)   A=115                     B=201,114
Treatment           C=33                      D=200,712
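The counts in the table above can be turned into comparable rates. A minimal Python sketch (the variable names and the choice of relative risk as a summary are ours, not the slides'):

```python
# Counts from the table above.
A, B = 115, 201_114   # control (placebo): polio, no polio
C, D = 33, 200_712    # treatment (vaccine): polio, no polio

control_rate = A / (A + B)
treatment_rate = C / (C + D)

# Relative risk: polio rate among the vaccinated divided by the rate among controls.
relative_risk = treatment_rate / control_rate

print(f"control:   {control_rate * 100_000:.1f} cases per 100,000")
print(f"treatment: {treatment_rate * 100_000:.1f} cases per 100,000")
print(f"relative risk: {relative_risk:.2f}")
```

The two groups are almost the same size, so the raw counts (115 versus 33) already tell the story: roughly 57 versus 16 cases per 100,000 children.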
27
An example
• The University Group Diabetes Program
• Randomly assign patients to 4 groups:
• Group 1: Placebo
• Group 2: Tolbutamide
• Group 3: Insulin Standard
• Group 4: Insulin variable
28
The results are controversial
• Is it really random?
29
Seven risk factors
• There are 7 risk factors related to diabetes
• Age 55 or older, high blood pressure, history of chest pains, abnormal electrocardiogram (ECG), history of digitalis use, high cholesterol level, overweight, and calcification of the arteries
30
Risk factor distributions
No. of risk factors   Group I   Group II   Group III   Group IV
0 28 25 22 15
1 60 50 62 76
2 59 58 60 57
3 26 34 34 30
4 10 17 8 4
5 2 4 8 4
6 0 1 1 1
31
Surprise?
• The distributions in the four groups are almost identical
• Notice that the distributions were studied after the experiment was done. It is quite likely that randomization makes all potential risk factors about equally distributed across the groups
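One conventional way to examine how similar the four distributions are is a chi-square test of homogeneity. The slides do not say which check, if any, was used, so the sketch below is an assumption on our part; it computes the statistic by hand in plain Python from the table on the previous slide.

```python
# Rows: number of risk factors (0 to 6); columns: groups I, II, III, IV.
counts = [
    [28, 25, 22, 15],
    [60, 50, 62, 76],
    [59, 58, 60, 57],
    [26, 34, 34, 30],
    [10, 17, 8, 4],
    [2, 4, 8, 4],
    [0, 1, 1, 1],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (observed - expected) ** 2 / expected

dof = (len(counts) - 1) * (len(col_totals) - 1)

print("group sizes:", col_totals)
print("chi-square statistic:", round(chi2, 2), "on", dof, "degrees of freedom")
```

The statistic can be compared with the 5% critical value of the chi-square distribution with 18 degrees of freedom, about 28.87. The expected counts in the last two rows are small, so the approximation is rough; merging the rows for 5 and 6 risk factors would be a reasonable refinement.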
32
Exercise one
• How can we show that vitamin C helps prevent the common cold?
33
FDA
• Food and Drug Administration
• Guidelines for developing drugs and treatments
• Statisticians should be involved in the design of the experiment and analysis of the data
34
Some past errors
• Hormone therapy (approved by the FDA): used to treat menopausal symptoms and to prevent osteoporosis, the age-related loss of bone density
• Later experiments showed that it does not protect against heart disease or strokes, and that it increases the risk of dangerous blood clots and gallbladder disease.
35
Smoking and Lung Cancer
• For ethical reasons, we cannot randomly assign a person to smoke or not to smoke
36
Observational study
• Case-Control study
• We study the smoking habits of patients with lung cancer in the hospital
• In the same hospital, we study the smoking habits of patients with other diseases (without lung cancer, around the same age, same gender)
• Or, we can study the individuals without lung cancer from the same community
37
Example
• Oral contraceptives and Thromboembolic diseases
• Cases—all women in the hospital having thromboembolic diseases
• Control--?
38
Selection of controls
• Hospital---same as case
• Discharge date---same 6-month interval
• Discharge status---all alive
• Age—same 5-year span
• Marital status---same
• Residence---same metropolitan area
• Race---same
39
Selection of controls
• Parity---same (no pregnancies, one or two, three or more)
• Hospital status---same (ward, semiprivate, or private room)
40
Observational study
• Cohort study
• At the beginning, we have two groups, one smoking and the other non-smoking
• Wait for 5 years and study the proportions of persons getting lung cancer in the two groups
41
Cancer risk
• Many reports on cancer risk were based on observational studies. Their results were not really reliable.
42
Exercise two
• Think about the validity of using a case-control study for the following task: showing that salted fish can cause nasopharyngeal cancer.
43
Question
• Comment on the following:
• In a 1996 study by Dr. Leslie Wolfson of the University of Connecticut, tai chi was compared to balance training, strength training, and combined balance and strength training in people with an average age of 80. Those who learned tai chi gained significantly more balance and strength than the other groups.
44
Case 1
• We obtain data on recoveries for males and females who have received a treatment (t) and a control (c)
45
Males
R=1 R=0
T=t 18 12
T=c 7 3
46
Females
R=1 R=0
T=t 2 8
T=c 9 21
47
Combined
R=1 R=0
T=t 20 20
T=c 16 24
48
Question
• The recovery rate is higher for T=c for both males and females
• But the recovery rate is higher for T=t for the combined group?
• For a new subject whose gender is unknown, which treatment should we prefer, t or c?
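The rates behind the first two statements can be recomputed from the three tables. A minimal Python sketch (the data layout and helper name are ours); it only does the arithmetic and leaves the question of which treatment to prefer open.

```python
# (recovered R=1, not recovered R=0) for each treatment, by sex.
tables = {
    "males":   {"t": (18, 12), "c": (7, 3)},
    "females": {"t": (2, 8),   "c": (9, 21)},
}

def recovery_rate(recovered, not_recovered):
    return recovered / (recovered + not_recovered)

# Stratum-specific recovery rates.
for group, arms in tables.items():
    for arm, cell in arms.items():
        print(f"{group:8s} T={arm}: {recovery_rate(*cell):.2f} (n={sum(cell)})")

# Combined table: add the male and female counts cell by cell.
combined = {
    arm: tuple(sum(tables[group][arm][k] for group in tables) for k in (0, 1))
    for arm in ("t", "c")
}
for arm, cell in combined.items():
    print(f"combined T={arm}: {recovery_rate(*cell):.2f} (n={sum(cell)})")
```

Note the unequal allocation: 30 of the 40 subjects given t are male, while 30 of the 40 given c are female, and males recover more often than females under either treatment.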
49
Another situation
• Data on yields and heights for samples of black and white plants
50
Tall
Y=1 Y=0
C=w 18 12
C=b 7 3
51
Short
Y=1 Y=0
C=w 2 8
C=b 9 21
52
Combined
Y=1 Y=0
C=w 20 20
C=b 16 24
53
Question
• Should we plant a white (C=w) or a black variety of plant, in ignorance of the height the plant will grow to?