Ch1. Introduction (kocwcontents.kocw.net/kocw/document/2015/gachon/kimnamhyoung...)
1.1 Categorical Response Data
• A categorical variable has a measurement scale consisting of a set of categories.
• For example, political philosophy may be measured as “liberal”, “moderate”, or “conservative”.
• Categorical scales are commonly used in the social and health sciences for measuring attitudes, opinions and responses.
• They also occur in the behavioral sciences, public health, zoology, education, marketing, engineering sciences and industrial quality control.
Response/Explanatory Variable
• Response variable (dependent variable or Y variable)
• Explanatory variable (independent variable or X variable)
• The subject of this course is the analysis of categorical response variables.
• The explanatory variables can be categorical or continuous.
Nominal/Ordinal Scale
• Categorical variables have two main types of measurement scales:
– Ordinal variables: ordered scales, such as attitude toward something, appraisal of a company’s inventory level, response to a medical treatment, and frequency of feeling symptoms of anxiety
– Nominal variables: unordered scales, such as religious affiliation, primary mode of transportation to work, favorite type of music, and favorite place to shop
• Methods designed for ordinal variables cannot be used with nominal variables.
• Methods designed for nominal variables can be used with nominal or ordinal variables, but they ignore the ordering information, which can cause a serious loss of power.
Problems
• 1.1 In the following examples, identify the response variable and the explanatory variables.
– a. Attitude toward gun control (favor, oppose), Gender (female, male), Mother’s education (high school, college)
– b. Heart disease (yes, no), Blood pressure, Cholesterol level
– c. Race (white, nonwhite), Religion (Catholic, Jewish, Protestant), Vote for president (Democrat, Republican, Other), Annual income
– d. Marital status (married, single, divorced, widowed), Quality of life (excellent, good, fair, poor)
• 1.2 Which scale of measurement is most appropriate for the following variables: nominal or ordinal?
– a. Political party affiliation (Democrat, Republican, unaffiliated)
– b. Highest degree obtained (none, high school, bachelor’s, master’s, doctorate)
– c. Patient condition (good, fair, serious, critical)
– d. Hospital location (London, Boston, Madison, Rochester, Toronto)
– e. Favorite beverage (beer, juice, milk, soft drink, wine, other)
– f. How often feel depressed (never, occasionally, often, always)
1.2 Probability Distributions for Categorical Data
• Key distributions for categorical data are the binomial and multinomial distributions.
Binomial Distribution
• Consider n independent and identical trials, each with two possible outcomes, “success” and “failure”.
• Identical trials: the probability of success is the same for each trial.
• Independent trials: the response outcomes are independent random variables.
• Trials satisfying these conditions are called Bernoulli trials.
• Let Y denote the number of successes out of the n trials, with π the probability of success on a given trial.
• The probability of outcome y for Y equals
$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n$$
• For fixed n, the distribution becomes more skewed as π moves toward 0 or 1.
• For fixed π, it becomes more bell-shaped as n increases.
• When n is large, it can be approximated by a normal distribution with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$.
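As a rough numerical check of the normal approximation, the sketch below compares exact binomial probabilities with the normal density having mean nπ and standard deviation √(nπ(1−π)). The values n = 100, π = 0.5 and the helper name `binom_pmf` are illustrative assumptions, not taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(y: int, n: int, pi: float) -> float:
    """Binomial probability P(y) = n!/(y!(n-y)!) * pi^y * (1-pi)^(n-y)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

n, pi = 100, 0.5                          # illustrative values (assumption)
mu, sigma = n * pi, sqrt(n * pi * (1 - pi))
approx = NormalDist(mu, sigma)            # normal approximation N(n*pi, n*pi*(1-pi))

for y in (40, 45, 50, 55, 60):
    exact = binom_pmf(y, n, pi)
    normal = approx.pdf(y)                # density at y approximates P(Y = y) for unit-spaced y
    print(f"y={y:3d}  exact={exact:.4f}  normal={normal:.4f}")
```

With n this large and π = 0.5 the two columns agree to about three decimal places; repeating the run with π near 0 or 1 shows the skewness that makes the approximation worse.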
• Table 1.1. Binomial distribution with n = 10 and π = 0.20, 0.50, and 0.80. The distribution is symmetric when π = 0.5.

y     P(y) when π = 0.2   P(y) when π = 0.5   P(y) when π = 0.8
0     0.107               0.001               0.000
1     0.268               0.010               0.000
2     0.302               0.044               0.000
3     0.201               0.117               0.001
4     0.088               0.205               0.006
5     0.026               0.246               0.026
6     0.006               0.205               0.088
7     0.001               0.117               0.201
8     0.000               0.044               0.302
9     0.000               0.010               0.268
10    0.000               0.001               0.107
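The entries of Table 1.1 can be reproduced directly from the binomial formula. The sketch below does so with Python's `math.comb`; the helper name `binom_pmf` is our own, not from the slides.

```python
from math import comb

def binom_pmf(y: int, n: int, pi: float) -> float:
    """Binomial probability of y successes in n trials with success probability pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

n = 10
pis = (0.2, 0.5, 0.8)
# One row per y: P(y) for each pi, rounded to three decimals as in Table 1.1
table = {y: [round(binom_pmf(y, n, pi), 3) for pi in pis] for y in range(n + 1)}

print("y   " + "  ".join(f"pi={pi}" for pi in pis))
for y, row in table.items():
    print(f"{y:<3} " + "  ".join(f"{p:6.3f}" for p in row))
```

The π = 0.8 column is the π = 0.2 column read in reverse order, since P(y) under π equals P(n − y) under 1 − π.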
Multinomial Distribution
• The multinomial distribution applies when each observation has more than two possible outcomes.
• Let c denote the number of outcome categories.
• For n independent observations, with category probabilities $\pi_1, \ldots, \pi_c$ satisfying $\sum_j \pi_j = 1$, the multinomial probability that $n_1$ fall in category 1, $n_2$ fall in category 2, …, $n_c$ fall in category c (where $n_1 + \cdots + n_c = n$) equals
$$P(n_1, n_2, \ldots, n_c) = \frac{n!}{n_1!\,n_2!\cdots n_c!}\,\pi_1^{n_1}\pi_2^{n_2}\cdots\pi_c^{n_c}$$
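The multinomial formula translates directly into code. The following is a minimal sketch; the function name `multinomial_pmf` and the input checks are our own choices, not from the slides.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_c) = n!/(n_1! ... n_c!) * pi_1^n_1 * ... * pi_c^n_c."""
    if len(counts) != len(probs):
        raise ValueError("need one probability per category")
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)   # stays an exact integer at every step
    return coef * prod(p**k for p, k in zip(probs, counts))

# c = 3 categories, one observation in each: 3!/(1!1!1!) * 0.2 * 0.3 * 0.5, about 0.18
print(multinomial_pmf([1, 1, 1], [0.2, 0.3, 0.5]))
```

With c = 2 the formula reduces to the binomial probability, so `multinomial_pmf([3, 7], [0.2, 0.8])` matches the y = 3, π = 0.2 entry of Table 1.1.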
1.3 Statistical Inference for a Proportion
• In practice, the parameter values for the binomial and multinomial distributions are unknown.
• Using sample data, we estimate the parameters.
• In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model.
Likelihood Function
• The probability of the observed data, expressed as a function of the parameter, is called the likelihood function.
• For example, in n=10 trials, suppose a binomial count equals y=0.
• From the binomial formula with parameter π, the probability of this outcome equals
$$P(y = 0) = \frac{10!}{0!\,10!}\,\pi^{0}(1-\pi)^{10} = (1-\pi)^{10}$$
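Evaluating this likelihood at a few parameter values shows how it behaves as a function of π. The sketch below assumes the example's n = 10 and y = 0; the function name `likelihood` is our own.

```python
from math import comb

def likelihood(pi: float, n: int = 10, y: int = 0) -> float:
    """Binomial probability of the observed y, viewed as a function of pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)   # equals (1 - pi)^10 when y = 0

for pi in (0.0, 0.1, 0.2, 0.4, 0.6):
    print(f"pi={pi:.1f}  L(pi)={likelihood(pi):.4f}")
```

For y = 0 the likelihood (1 − π)^10 decreases steadily in π, so it is largest at π = 0.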
Maximum Likelihood Estimation(MLE)
• The maximum likelihood (ML) estimate of a parameter is the parameter value for which the probability of the observed data takes its greatest value.
• In general, for the binomial outcome of y successes in n trials, the ML estimate of π equals p = y/n (the sample proportion of successes for the n trials).
• The ML estimate is often denoted by the parameter symbol with a ^ (a “hat”) over it, for example $\hat{\pi} = p$.
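The result p = y/n follows from maximizing the log-likelihood. Here is a short derivation that assumes 0 < y < n, so the maximum lies strictly inside (0, 1). When y = 0 or y = n the likelihood is monotone in π and the maximum sits at the boundary, which is still y/n.

```latex
\ell(\pi) = \log\binom{n}{y} + y\log\pi + (n-y)\log(1-\pi)

\frac{d\ell}{d\pi} = \frac{y}{\pi} - \frac{n-y}{1-\pi} = 0
\;\Longrightarrow\; y(1-\pi) = (n-y)\,\pi
\;\Longrightarrow\; \hat{\pi} = \frac{y}{n} = p

\frac{d^{2}\ell}{d\pi^{2}} = -\frac{y}{\pi^{2}} - \frac{n-y}{(1-\pi)^{2}} < 0
\quad\text{(so the stationary point is a maximum)}
```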
Significance Test About a Binomial Proportion
• The ML estimator for the parameter π is the sample proportion, p.
• The sampling distribution of the sample proportion p has mean and standard error
$$E(p) = \pi, \qquad \sigma(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$$
• The sampling distribution of 𝑝𝑝 is approximately normal for large n.
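A small simulation can check these sampling-distribution facts. The sketch below draws many samples of n Bernoulli trials and compares the mean and standard deviation of the resulting sample proportions with π and √(π(1−π)/n). The values π = 0.3, n = 100, 10,000 replications, the seed, and the name `simulate_p` are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_p(pi: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` sample proportions, each from n independent Bernoulli(pi) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

pi, n = 0.3, 100                      # illustrative values (assumption)
ps = simulate_p(pi, n, reps=10_000)
print(f"mean of p: {mean(ps):.4f}   (theory: {pi})")
print(f"sd of p:   {stdev(ps):.4f}   (theory: {sqrt(pi * (1 - pi) / n):.4f})")
```

A histogram of `ps` would also look roughly bell-shaped for n this large.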
• Null hypothesis $H_0: \pi = \pi_0$
• The test statistic is
$$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$
• For large samples, the null sampling distribution of the z test statistic is the standard normal.
Example: Survey Results on Legalizing Abortion
• Let π denote the proportion of the American adult population that responds “yes” to the question:
• “Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children.”
• Of 893 respondents to this question, 400 replied “yes” and 493 replied “no”.
• p = 400/893 = 0.448
• $H_0: \pi = 0.50$, $H_a: \pi \neq 0.50$
• The test statistic is
$$z = \frac{0.448 - 0.50}{\sqrt{(0.50)(0.50)/893}} = -3.1$$
• The two-sided P-value is 0.002
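The large-sample z test above needs only the standard library. The sketch below applies it to the survey counts; the function name `binomial_z_test` is hypothetical, and the P-value uses the identity 2·P(Z > |z|) = erfc(|z|/√2) for a standard normal Z.

```python
from math import sqrt, erfc

def binomial_z_test(y: int, n: int, pi0: float) -> tuple[float, float]:
    """Large-sample z test of H0: pi = pi0 against Ha: pi != pi0.

    The standard error uses the null value pi0, as in the test statistic.
    Returns (z, two-sided P-value) from the standard normal approximation.
    """
    p = y / n
    se0 = sqrt(pi0 * (1 - pi0) / n)
    z = (p - pi0) / se0
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * P(Z > |z|)
    return z, p_value

z, p_value = binomial_z_test(400, 893, 0.50)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Computing from the exact counts instead of the rounded p = 0.448 gives z ≈ −3.11, consistent with the −3.1 and P-value 0.002 reported above.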
Confidence Intervals for a Binomial Proportion
• A 100(1 − α)% confidence interval for π is
$$p \pm z_{\alpha/2}\,SE, \qquad \text{where } SE = \sqrt{p(1-p)/n}$$
• Here $z_{\alpha/2}$ denotes the standard normal percentile having right-tail probability equal to α/2.
• However, unless π is close to 0.50, this interval does not work well unless n is very large.
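A minimal sketch of the interval p ± z_{α/2}·SE, applied to the survey counts and to the n = 10, y = 0 example from the likelihood discussion. The function name `wald_ci` is our own; `statistics.NormalDist().inv_cdf` supplies the normal percentile.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(y: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Interval p +/- z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = y / n
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # right-tail probability alpha/2
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(wald_ci(400, 893))   # survey example: roughly (0.415, 0.481)
print(wald_ci(0, 10))      # y = 0: SE = 0, so the interval collapses to (0.0, 0.0)
```

The second call shows the weakness noted above. When p is at or near 0 (or 1), the estimated standard error is tiny or zero, and the interval becomes far too narrow.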
• A better way to construct confidence intervals uses a duality with significance tests.
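• For given p and n, the $\pi_0$ values that have test statistic value $z_{\alpha/2}$ are the solutions for $\pi_0$ to the equation
$$\frac{|p - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} = z_{\alpha/2}$$
• The confidence interval consists of the $\pi_0$ values between these two solutions, that is, the null values not rejected by the two-sided test at level α.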
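Squaring both sides of that equation gives a quadratic in π₀, $(1 + z^2/n)\,\pi_0^2 - (2p + z^2/n)\,\pi_0 + p^2 = 0$, whose two roots are the interval endpoints. This test-inversion interval is usually called the score (or Wilson) interval. The sketch below solves the quadratic directly; the function name `score_ci` is our own.

```python
from math import sqrt
from statistics import NormalDist

def score_ci(y: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Solve |p - pi0| / sqrt(pi0 (1 - pi0) / n) = z_{alpha/2} for pi0.

    Squaring gives a*pi0^2 + b*pi0 + c = 0 with a = 1 + z^2/n,
    b = -(2p + z^2/n), c = p^2; the interval lies between the two roots.
    """
    p = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    k = z * z / n
    a, b, c = 1 + k, -(2 * p + k), p * p
    disc = sqrt(b * b - 4 * a * c)          # equals sqrt(k^2 + 4pk(1-p)) >= 0
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

print(score_ci(400, 893))
print(score_ci(0, 10))   # about (0, 0.278): not degenerate, unlike p +/- z*SE at y = 0
```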