8/3/2019 Health Economics- Lecture Ch03
Statistical Tools
Dr. Katherine Sauer
Metropolitan State College of Denver
Health Economics
-
Outline:
I. Hypothesis Testing
II. Difference of Means
III. Regression Analysis
-
I. Hypothesis Testing
A. Simple Hypothesis
Men and women smoke different numbers of cigarettes
State the hypothesis:
Null hypothesis
(hypothesis we wish to disprove):
H0: cm = cw
ex: men and women
smoke the same number
of cigarettes
Alternative hypothesis
(hypothesis that theory
suggests to be the case)
H1: cm ≠ cw
ex: men and women do not smoke the same
number of cigarettes
-
B. Composite Hypothesis
Rich people spend more on health care than do poor
people
State the hypothesis
Null hypothesis
(hypothesis we wish to disprove):
H0: Er= Ep
ex: the rich and poor
spend the same amount
Alternative hypothesis
(hypothesis that theory
suggests to be the case)
H1: Er> Ep
ex: the rich spend more than the poor
-
II. Difference in Means
Consider the example of men's and women's smoking.
To compare men's and women's smoking rates, we could ask people from the population at large how many
cigarettes they smoke per day.
Since we can't ask everyone, how do we decide upon the sample to use?
-
Since many things other than gender may affect the
number of cigarettes a person smokes, we can account
for this by selecting a sample of people randomly
from the universe of all people.
We could also select a sample of people from a
relatively homogeneous group, like, college
sophomores from a given college.
-
Types of Data
Continuous - natural measures that in principle could take
on different values for each observation
ex: height, weight, income, price
Categorical - refer to arbitrary categories
ex: gender (male or female)
race (black, white, or other)
location (urban or rural)
Is the number of cigarettes smoked continuous or
categorical?
-
Using NIH data for smokers from 2001 and 2002, it was found that:
For 4,714 men, cm = 15.60 cigarettes per day
For 4,841 women, cw = 13.47 cigarettes per day
The difference is cm - cw = 2.13 cigarettes per day
-
The data shows a difference in the average number of
cigarettes smoked per day by men and women.
Does the difference represent a true difference
between men and women smoking?
or
Did the sample randomly draw a higher average level
for men (15.60) than for women (13.47)?
Let's look at the sample distribution.
-
Based on the distribution,
some men and some women smoked far fewer
and some smoked far
more than the average.
Variance is a measure of
the dispersion of
cigarettes smoked around
the average.
mean: men (15.60) , women (13.47)
-
The larger the variance, the greater the dispersion around
the mean.
- another observation may be far from the
sample mean
The smaller the variance, the smaller the dispersion around
the mean.
- another observation is likely close to the
sample mean
In testing a hypothesis, would you rather see a large or
small variance in your sample data?
-
The square root of the variance is called the standard deviation, s.
A larger standard deviation indicates more dispersion
around the mean.
A smaller standard deviation indicates less dispersion
around the mean.
-
The standard error of the mean is the standard deviation
divided by the square root of the number of observations.
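A minimal Python sketch of the variance, the standard deviation, and the standard error of the mean. The sample is made up for illustration and is not the NIH data.

```python
import math
import statistics

# Hypothetical sample of cigarettes smoked per day (illustrative only).
sample = [10, 12, 15, 18, 20]

mean = statistics.mean(sample)                 # sample mean
variance = statistics.variance(sample)         # sample variance (divides by n - 1)
std_dev = math.sqrt(variance)                  # standard deviation, s
std_error = std_dev / math.sqrt(len(sample))   # standard error of the mean

print(mean, variance, round(std_dev, 3), round(std_error, 3))
```

With more observations, the same standard deviation gives a smaller standard error, because the divisor grows with the square root of the sample size.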
-
To test our smoking hypothesis formally, we can
construct a difference of means test.
- good for continuous data that can be broken
up by categories
We wish to compare the value,
difference = cm - cw, to zero, which is the value under the null hypothesis.
Recall: difference = 2.13
The standard error of the difference is calculated to be
equal to 0.216.
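The slides do not say how 0.216 was computed. One common formula for the standard error of a difference between two independent sample means is sketched below. The sample sizes come from the slides. The standard deviations are assumed values, because the slides do not report them.

```python
import math

def se_of_difference(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Sample sizes from the slides.
n_men, n_women = 4714, 4841
# Hypothetical standard deviations, for illustration only.
s_men, s_women = 11.0, 10.0

print(round(se_of_difference(s_men, n_men, s_women, n_women), 3))
```

Because each sample has several thousand observations, the standard error is small relative to the standard deviations.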
-
About 68 percent of a normal distribution lies within 1 standard error:
2.13 - 0.216 = 1.91
2.13 + 0.216 = 2.35
About 95 percent of a normal distribution lies within 2 standard errors:
2.13 - (2)(0.216) = 1.70
2.13 + (2)(0.216) = 2.56
How does this compare to our null hypothesis that the
value difference is zero?
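A short Python sketch of this comparison, using the difference and standard error reported in the slides. The helper name `interval` is illustrative.

```python
difference = 2.13   # cm - cw, from the slides
std_error = 0.216   # standard error of the difference, from the slides

def interval(center, se, k):
    """Return the range center +/- k standard errors."""
    return (center - k * se, center + k * se)

low, high = interval(difference, std_error, 2)
null_value = 0.0

# Zero lies far below the lower end of the 2-standard-error range.
print(low <= null_value <= high)  # prints False
```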
-
The t test:
The t statistic is calculated as the value divided by the
standard error.
In our example: 2.13 / 0.216 = 9.86
As a rule of thumb, if the t statistic is greater than 2 in absolute value,
the result is statistically significant (at roughly the 5 percent level).
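The calculation can be sketched in Python as follows. The function name is illustrative, and the cutoff of 2 is the rule of thumb from the slide rather than an exact critical value.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / std_error

t = t_statistic(2.13, 0.216)   # values from the slides
significant = abs(t) > 2       # rule-of-thumb cutoff
print(round(t, 2), significant)  # prints 9.86 True
```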
-
This experiment would find very good evidence that
among smokers, women smoke fewer cigarettes than
men.
The males have higher levels than the females, and the
difference is statistically significant at well beyond the
95 percent confidence level.
-
III. Regression Analysis
- good for data that is continuous
Suppose we wish to explore the relationship between the cigarette tax and the number of cigarettes smoked per
day.
null hypothesis: no effect (b = 0)
alternative hypothesis: tax is inversely related to
the quantity smoked
(b < 0)
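A minimal sketch of how the coefficient b could be estimated with a one-variable least-squares regression. The data are hypothetical and chosen only to show a negative slope. They are not the data behind the lecture's results.

```python
def ols_fit(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: tax in dollars, cigarettes smoked per day.
tax = [0.0, 0.5, 1.0, 1.5]
cigarettes = [16.0, 15.0, 13.0, 12.0]

a, b = ols_fit(tax, cigarettes)
print(round(a, 2), round(b, 2))  # b < 0: higher tax goes with fewer cigarettes
```

Testing the null hypothesis then means asking whether the estimated b is far enough from zero, relative to its standard error, to be statistically significant.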
-
We want to know if the coefficient of -3.24 is
significantly different from zero.
-
A coefficient of -3.24 means:
A $1 increase in the tax is associated with a decrease in
quantity demanded of 3.24 cigarettes per day.
-
The elasticity is -0.09. This means a 1% increase in the
tax will lead to a 0.09% reduction in quantity
demanded.
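A small Python sketch of how the elasticity is applied. The function name is illustrative, and the approximation is meant for small percentage changes.

```python
def pct_change_in_quantity(elasticity, pct_change_in_tax):
    """Approximate percent change in quantity for a given percent change in tax."""
    return elasticity * pct_change_in_tax

elasticity = -0.09  # from the slides

# A 10% tax increase corresponds to roughly a 0.9% reduction in quantity.
print(round(pct_change_in_quantity(elasticity, 10), 2))
```

Because the elasticity is well below 1 in absolute value, quantity demanded is inelastic with respect to the tax.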
-
A multiple regression includes more than one
explanatory variable.
ex: gender, race, age, education, income
Some of the variables may be continuous, some may be
categories.
- interpretation is different
-
Continuous variables
Notice how adding more variables changes the
coefficient on excise tax.
Is it still significant?
-
Income:
Age:
Education:
-
When using categorical variables in a regression, we need
to assign them a numerical value.
- dummy variables
Dummy variables are used in regression analysis to
determine whether groups of people differ from others.
For example, maybe we would want to know if African
Americans smoke more than other groups.
We can create a dummy variable that assigns the value 1
if the person is African American or 0 otherwise.
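A minimal sketch of coding dummy variables in Python. The records and field names are hypothetical and serve only as an illustration.

```python
# Hypothetical records; the field names are illustrative assumptions.
people = [
    {"race": "African American", "gender": "female"},
    {"race": "white", "gender": "male"},
    {"race": "other", "gender": "female"},
]

def dummy(value, category):
    """Return 1 if value equals category, otherwise 0."""
    return 1 if value == category else 0

for p in people:
    p["african_american"] = dummy(p["race"], "African American")
    p["male"] = dummy(p["gender"], "male")

print([(p["african_american"], p["male"]) for p in people])
# prints [(1, 0), (0, 1), (0, 0)]
```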
-
Because male appears as a variable, we know that male was
assigned a value of 1 (female = 0).
Is the male coefficient significant?
-
The interpretation of a dummy variable is different from
that of a continuous variable.
Difference in cigarettes smoked, relative to a white female:
                    African American
                    No = 0    Yes = 1
Male    No = 0      0         -5.05
        Yes = 1     2.23      2.23 - 5.05 = -2.82
An African American female smokes 5.05 fewer
cigarettes than a white female.
A white male smokes 2.23 more cigarettes than a
white female.
An African American male smokes 2.82 fewer
cigarettes than a white female.
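The way the two dummy coefficients combine can be sketched in Python. This assumes, as the slide's arithmetic does, that the two effects simply add, with no interaction term. The names are illustrative.

```python
# Coefficients from the slides, measured relative to a white female.
COEF_MALE = 2.23
COEF_AFRICAN_AMERICAN = -5.05

def difference_from_baseline(male, african_american):
    """Predicted difference from the baseline group; inputs are 0/1 dummies."""
    return COEF_MALE * male + COEF_AFRICAN_AMERICAN * african_american

for male in (0, 1):
    for aa in (0, 1):
        print(male, aa, round(difference_from_baseline(male, aa), 2))
```

The case with both dummies equal to 0 is the baseline group, so its predicted difference is zero.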
-
Summary of Statistical Competencies:
Formulate questions in terms of hypotheses.
Read statistical test results to determine if the result is
significant.
Understand statistical significance.
Interpret reported regression results.