population proportion and sample proportion
© 蘇國賢 2007社會統計(上) Page 1
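Population proportion and sample proportion
• Many everyday surveys simply ask whether people approve of … or support …, and then compute the proportion that the count of "approve" (or "oppose") answers makes up.
• This chapter introduces statistical methods for inference about a single proportion. The next chapter covers inference about the distribution of a set of proportions.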
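Population proportion and sample proportion
• Suppose we want to estimate A-bian's (阿扁) vote share in the presidential election, that is, the proportion of all voters who vote for A-bian. Using an appropriate sampling method, we draw a sample of size n and observe the share of the sample that supports A-bian. This share is A-bian's support rate in the sample, called the sample proportion.
• If we know the sampling distribution of the sample proportion (its expected value, variance, and shape), we can use the sample proportion to make inferences about the population proportion.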
Sampling Distribution of the Sample Proportion
• Let p denote the proportion of items in a population that possess a certain characteristic (e.g., unemployed, income below the poverty level).
• To estimate p, we take a random sample of n observations from the population and count the number X of items in the sample that possess the characteristic.
• The sample proportion $\hat{p} = X/n$ is used to estimate the population proportion p.
Sampling Distribution of the Sample Proportion
Definition
• Suppose a random experiment has only two possible outcomes (X = 1: supports A-bian; X = 0: does not support A-bian). If the population contains N units (all voters) and K of them vote for A-bian, then the population proportion supporting A-bian is
• $p = K/N$ (N = population size, K = total number of A-bian supporters)
Sampling Distribution of the Sample Proportion
Definition
• Valid votes cast in the last presidential election: N = 12,664,393
• Votes for A-bian: K = 4,977,697
• Population proportion: p = K/N = 39.30%
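Sampling Distribution of the Sample Proportion
Definition
• Draw n elements at random from the population of N as a sample, written $(X_1, X_2, \dots, X_n)$. If k of the n sampled people support A-bian, the share supporting A-bian is called the sample proportion:
$$\hat{p} = \frac{k}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
• (n = sample size, k = number of supporters in the sample)
• k is the total count of cases in the sample with X = 1 (supports A-bian).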
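Sampling Distribution of the Sample Proportion
Definition
• Before the election, a polling center surveyed 1,500 people (n = 1500), of whom 573 supported A-bian (k = 573). The sample proportion is $\hat{p} = 573/1500 = 38.2\%$.
• The sampling error is $|\hat{p} - p| = |38.2\% - 39.3\%| = 1.1\%$.
• Because each sample contains different people, the computed sample proportion differs from sample to sample, so the sample proportion is itself a random variable.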
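The arithmetic on the two slides above can be checked in a few lines. This is only a sketch using the vote and poll counts quoted in the slides; the variable names are illustrative.

```python
# Population figures quoted in the slides (valid votes, votes for A-bian).
N = 12_664_393
K = 4_977_697
p = K / N                       # population proportion

# Poll figures quoted in the slides.
n = 1500
k = 573
p_hat = k / n                   # sample proportion

sampling_error = abs(p_hat - p)
print(f"p = {p:.4f}, p_hat = {p_hat:.4f}, |p_hat - p| = {sampling_error:.4f}")
```

A different sample of 1,500 people would give a different k, and therefore a different p_hat and sampling error.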
The Bernoulli Distribution
Definition
• P(X=1) = p
• P(X=0) = 1 − p
• If we let q = 1 − p, then the probability function (p.f.) of X can be written as follows:
$$f(X) = \begin{cases} p^{X} q^{1-X} & \text{for } X = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$
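The Bernoulli Distribution
• $E(X) = 1\cdot p + 0\cdot q = p$ (the expected value of X equals the population proportion)
• $E(X^2) = \sum X^2 f(X) = 1^2\cdot p + 0^2\cdot q = p$
• $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p) = p\cdot q$
Definition
$$f(X) = \begin{cases} p^{X} q^{1-X} & \text{for } X = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$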
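As a quick numerical check of the Bernoulli mean and variance, the moments can be computed directly from the probability function. This is a minimal sketch; the helper name `bernoulli_moments` and the value p = 0.393 are illustrative choices, not part of the slides.

```python
def bernoulli_moments(p):
    """Return (E[X], Var[X]) computed from the Bernoulli probability function."""
    q = 1 - p
    pmf = {0: q, 1: p}                                  # f(X) = p^X * q^(1-X)
    mean = sum(x * f for x, f in pmf.items())           # E(X)
    second = sum(x ** 2 * f for x, f in pmf.items())    # E(X^2)
    var = second - mean ** 2                            # E(X^2) - [E(X)]^2
    return mean, var

mean, var = bernoulli_moments(0.393)
print(mean, var)   # mean equals p, variance equals p * q
```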
Sampling Distribution of the Sample Proportion
• The Normal Approximation Rule for Proportions: Let p denote the proportion of a population possessing some characteristic of interest. Take a random sample of n observations from the population. Let X denote the number of items in the sample possessing the characteristic. We estimate the population proportion p by the sample proportion $\hat{p} = X/n$. If $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with:
$$E(\hat{p}) = p, \qquad \mathrm{Var}(\hat{p}) = \frac{pq}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
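Sampling Distribution of the Sample Proportion
• Proof that $E(\hat{p}) = p$, where $X = X_1 + X_2 + \dots + X_n$ and $E(X_i) = p$:
$$
\begin{aligned}
E(\hat{p}) &= E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) \\
&= \frac{1}{n}E(X_1 + X_2 + \dots + X_n) \\
&= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \\
&= \frac{1}{n}\left[p + p + \dots + p\right] = \frac{np}{n} = p
\end{aligned}
$$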
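Sampling Distribution of the Sample Proportion
• Proof that $\mathrm{Var}(\hat{p}) = pq/n$, where $\mathrm{Var}(X_i) = pq$ and $X_1, X_2, \dots, X_n$ are assumed independent:
$$
\begin{aligned}
\mathrm{Var}(\hat{p}) &= \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) \\
&= \frac{1}{n^2}\mathrm{Var}(X_1 + X_2 + \dots + X_n) \\
&= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right] \\
&= \frac{1}{n^2}\left[pq + pq + \dots + pq\right] = \frac{npq}{n^2} = \frac{pq}{n}
\end{aligned}
$$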
Sampling Distribution of the Sample Proportion
• If the distribution of $\hat{p}$ is approximately normal, with
$$E(\hat{p}) = p, \qquad \mathrm{Var}(\hat{p}) = \frac{pq}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
then
$$\hat{p} \sim N\left(p, \frac{pq}{n}\right), \qquad Z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
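Example
• Suppose 55% of voters will support A-bian in this election, and we draw a random sample of n = 400 people to predict the outcome. What is the probability that we predict A-bian will lose?
$$n = 400, \quad p = .55, \quad q = 1 - .55 = .45$$
$$np = 400(.55) \ge 5, \qquad nq = 400(.45) \ge 5$$
$$\hat{p} \sim N\left(.55,\ \frac{pq}{n} = \frac{(.55)(.45)}{400} = .00062\right)$$
$$Z = \frac{(.50) - (.55)}{\sqrt{.00062}} = \frac{-.05}{.025} = -2$$
$$P(\hat{p} < .5) = P(Z < -2) = .0228$$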
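The calculation in this example can be sketched with the standard library's `statistics.NormalDist`. The function name `prob_phat_below` is a hypothetical helper, and the code assumes the normal approximation rule applies (np ≥ 5 and nq ≥ 5).

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_below(c, p, n):
    """Normal-approximation P(p_hat < c) for a sample of size n."""
    se = sqrt(p * (1 - p) / n)          # standard error sqrt(pq/n)
    return NormalDist(mu=p, sigma=se).cdf(c)

prob = prob_phat_below(0.50, p=0.55, n=400)
print(round(prob, 4))   # about 0.022
```

The result is slightly below the .0228 on the slide because the slide rounds the standard error to .025, which gives Z = −2 exactly.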
Example
• Of your first 15 grandchildren, what is the chance there will be more than 10 boys? (Assume equal probability of male and female.)
• "More than 10 boys" means "the proportion of boys is more than 10/15".
• Use the Normal Approximation Rule:
$$\hat{p} \sim N\left(.50,\ \frac{pq}{n} = \frac{(.5)(.5)}{15}\right)$$
$$Z = \frac{\hat{p} - p}{SE} = \frac{10/15 - .5}{.129} = 1.29$$
$$P(\hat{p} > 10/15) = P(Z > 1.29) = .099$$
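With n = 15 the normal approximation is fairly rough, so it is instructive to compare it with the exact binomial probability. The sketch below reproduces the approximation used in this example (no continuity correction) and, as an addition not in the slides, computes the exact probability of 11 or more boys.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 15, 0.5
se = sqrt(p * (1 - p) / n)                 # standard error, about .129
z = (10 / n - p) / se                      # about 1.29
approx = 1 - NormalDist().cdf(z)           # normal approximation, P(Z > z)

# Exact binomial: P(X >= 11) when X ~ Binomial(15, 0.5)
exact = sum(comb(n, k) for k in range(11, n + 1)) / 2 ** n

print(round(z, 2), round(approx, 3), round(exact, 3))
```

The exact probability is about .059, noticeably smaller than the approximation of about .098, which illustrates how coarse the approximation can be for small samples.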
Confidence intervals for proportions (large samples)
We know that $\hat{p} \sim N(p, pq/n)$, where $q = 1 - p$ (provided $np \ge 5$ and $nq \ge 5$), so
$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
$$1 - \alpha = P(-z_{\alpha/2} < Z < z_{\alpha/2})$$
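Value of $z_{\alpha}$
• $P(Z \ge z_{\alpha/2}) = \alpha/2$
• $P(Z \le -z_{\alpha/2}) = \alpha/2$
• $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$
[Figure: standard normal density f(x) for x from −4 to 4. Each tail beyond $\pm z_{\alpha/2}$ has area α/2, and the central area is 1 − α/2 − α/2 = 1 − α.]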
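Confidence intervals for proportions (large samples)
$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
$$
\begin{aligned}
1 - \alpha &= P(-z_{\alpha/2} < Z < z_{\alpha/2}) \\
&= P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{pq/n}} < z_{\alpha/2}\right) \\
&= P\left(\hat{p} - z_{\alpha/2}\sqrt{pq/n} < p < \hat{p} + z_{\alpha/2}\sqrt{pq/n}\right)
\end{aligned}
$$
$$\left(\hat{p} - z_{\alpha/2}\sqrt{pq/n},\ \hat{p} + z_{\alpha/2}\sqrt{pq/n}\right)$$
The formula above requires the population proportion p in order to compute the standard error.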
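Confidence intervals for proportions (large samples)
$$\left(\hat{p} - z_{\alpha/2}\sqrt{pq/n},\ \hat{p} + z_{\alpha/2}\sqrt{pq/n}\right)$$
Because we have no information on p and q, when the sample is large enough we usually use the sample proportion $\hat{p}$ to estimate the standard error:
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}\right)$$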
Confidence interval for the population proportion p
Definition
Let p denote the population proportion. Suppose we take a large random sample of n observations and obtain the sample proportion $\hat{p}$. A confidence interval for the population proportion with confidence level 100(1−α)% is given by
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}\right)$$
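The large-sample interval translates directly into code. The sketch below uses a hypothetical helper `large_sample_ci` and, for illustration only, the poll counts quoted earlier in the deck (573 supporters out of 1,500).

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(k, n, conf=0.95):
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = k / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)    # z_{alpha/2}, about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = large_sample_ci(573, 1500)
print(round(lo, 4), round(hi, 4))   # roughly (0.357, 0.407)
```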
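Wilson estimate
• Using the sample proportion in place of the population proportion to estimate the standard error is not always accurate.
• Example: toss a coin three times and get heads all three times. Then
• $\hat{p} = 3/3 = 1$
• $\hat{p}(1 - \hat{p})/n = 1(1 - 1)/3 = 0$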
Wilson estimate
We must know the standard deviation (s.d.) of the population to get a CI for p.
• Unfortunately, modern computer studies reveal that confidence intervals based on this approach can be quite inaccurate, even for large samples:
-- when the sample is not an SRS (simple random sample);
-- when the sample size is small.
Wilson estimate
• The Wilson estimate $\tilde{p}$: add 2 successes and 2 failures (so that the sample proportion is moved slightly away from 0 and 1).
$$\tilde{p} = \frac{x + 2}{n + 4} \qquad \text{compared with} \qquad \hat{p} = \frac{x}{n}$$
-- Because this estimate was first suggested by Edwin Bidwell Wilson in 1927, we call it the Wilson estimate.
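Wilson estimate
• The sampling distribution of $\tilde{p}$ is approximately normal with mean p and standard deviation $\sqrt{p(1-p)/(n+4)}$.
• An approximate level C confidence interval for p is $\tilde{p} \pm z\,SE_{\tilde{p}}$, where z is the standard normal critical value for level C.
• The margin of error is
$$m = z\,SE_{\tilde{p}} = z\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$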
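Confidence interval for the population proportion p
Example
The government wants to estimate the proportion of households with a monthly income below NT$25,000. 500 households are interviewed, and 200 of them have an income below NT$25,000. Find the 95% confidence interval for p.
$$\tilde{p} = \frac{x + 2}{n + 4} = \frac{202}{504} \approx .4, \qquad \tilde{q} = 1 - \tilde{p} = .6$$
$$\left(\tilde{p} - z_{\alpha/2}\sqrt{\tilde{p}\tilde{q}/(n+4)},\ \tilde{p} + z_{\alpha/2}\sqrt{\tilde{p}\tilde{q}/(n+4)}\right)$$
$$\left(.4 - 1.96\sqrt{(.4)(.6)/504},\ .4 + 1.96\sqrt{(.4)(.6)/504}\right) = (.3572,\ .4428)$$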
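Example
• A random sample of 500 people is drawn from Taipei City and asked whether they favor the referendum; 312 are in favor. Find the 95% confidence interval for the proportion of Taipei residents who favor the referendum.
$$\tilde{p} = \frac{314}{504} = 0.623$$
$$0.623 - z_{0.025}\sqrt{\frac{0.623(1 - 0.623)}{504}} < p < 0.623 + z_{0.025}\sqrt{\frac{0.623(1 - 0.623)}{504}}$$
$$0.623 - 0.042 < p < 0.623 + 0.042$$
The confidence interval for p is:
$$0.581 < p < 0.665$$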
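Both worked examples use the same recipe (add 2 successes and 2 failures, then apply the normal interval with n + 4 in the denominator), so they can be checked with one small function. This is a sketch; `plus_four_ci` is a hypothetical name, and the inputs are the counts from the two examples above.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(x, n, conf=0.95):
    """CI based on p_tilde = (x + 2) / (n + 4), as defined in the slides."""
    p_t = (x + 2) / (n + 4)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p_t * (1 - p_t) / (n + 4))
    return p_t - half_width, p_t + half_width

income_lo, income_hi = plus_four_ci(200, 500)   # household income example
vote_lo, vote_hi = plus_four_ci(312, 500)       # Taipei referendum example
print(round(income_lo, 4), round(income_hi, 4))
print(round(vote_lo, 3), round(vote_hi, 3))
```

The first interval differs from (.3572, .4428) in the third decimal place because the slide rounds 202/504 to .4 before computing the margin of error.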
One-sided confidence intervals for the population proportion
Suppose that we take a random sample of n observations from some population having unknown proportion p. Suppose we wish to find the lower confidence limit LCL such that the probability is (1−α) that p exceeds LCL.
The one-sided interval (LCL, 1.00) is a left-sided confidence interval.
The LCL is given by:
$$LCL = \hat{p} - z_{\alpha}\sqrt{\hat{p}\hat{q}/n}$$
One-sided confidence intervals for the population proportion
Construct a right-sided 95% CI for the proportion of defective items produced by a machine if 16 items are found to be defective in a random sample of 100 items.
$$UCL = \tilde{p} + z_{\alpha}\sqrt{\tilde{p}\tilde{q}/(n+4)}$$
$$\tilde{p} = 18/104 = .17, \qquad 1 - \alpha = .95, \quad \alpha = .05$$
$$UCL = .17 + 1.645\sqrt{(.17)(.83)/104} = .2306$$
The 95% right-sided CI for p is (0, .2306). This means that we can be 95% confident that the population proportion is less than .2306.
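A one-sided limit uses $z_{\alpha}$ rather than $z_{\alpha/2}$. The sketch below recomputes the upper limit for the defective-items example; `plus_four_ucl` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ucl(x, n, conf=0.95):
    """Upper confidence limit p_tilde + z_alpha * sqrt(p_tilde * q_tilde / (n + 4))."""
    p_t = (x + 2) / (n + 4)
    z_alpha = NormalDist().inv_cdf(conf)        # one-sided: about 1.645 for 95%
    return p_t + z_alpha * sqrt(p_t * (1 - p_t) / (n + 4))

ucl = plus_four_ucl(16, 100)
print(round(ucl, 4))
```

Without rounding 18/104 to .17, the limit comes out near .234 rather than .2306.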
Determining the sample size
Margin of Error
Suppose that we take a random sample from some population. How many observations are needed so that a 100(1−α)% confidence interval for the population proportion extends at most a distance m on each side of the sample proportion?
$$\left(\tilde{p} - z_{\alpha/2}\sqrt{\tilde{p}\tilde{q}/(n+4)},\ \tilde{p} + z_{\alpha/2}\sqrt{\tilde{p}\tilde{q}/(n+4)}\right)$$
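Determining the sample size
$$m = z_{\alpha/2}\sqrt{\frac{\tilde{p}\tilde{q}}{n + 4}} \quad\Longrightarrow\quad \sqrt{n + 4} = \frac{z_{\alpha/2}\sqrt{\tilde{p}\tilde{q}}}{m} \quad\Longrightarrow\quad n + 4 = \frac{(z_{\alpha/2})^2\,\tilde{p}\tilde{q}}{m^2}$$
The problem is that we do not yet know $\tilde{p}$ (the sample size has not even been decided), so the formula above cannot be used unless we have an estimate of p.
(1) We can use a pilot study to obtain an estimate of p.
(2) When the sample proportion is unknown, we can take the most conservative approach and use the largest possible variance, .5 × .5 = .25, to determine n.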
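Sample size and confidence interval for the proportion
If the population proportion cannot be estimated, the sample size is given by:
$$n + 4 = \frac{.25\,(z_{\alpha/2})^2}{m^2}$$
If the population proportion p can be estimated, the sample size is given by:
$$n + 4 = \frac{(z_{\alpha/2})^2\,\tilde{p}\tilde{q}}{m^2}$$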
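Sample size and confidence interval for the proportion
A polling organization wants to know a presidential candidate's vote share. How large a sample is needed so that, at 95% confidence, the margin of error does not exceed .03?
$$\hat{p} = \hat{q} = .5, \qquad m = 0.03, \qquad 1 - \alpha = .95, \qquad z_{\alpha/2} = 1.96$$
$$n + 4 = \frac{.25\,(z_{\alpha/2})^2}{m^2} = \frac{.25\,(1.96)^2}{(.03)^2} = 1{,}067.1 \approx 1068$$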
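Sample size and confidence interval for the proportion
A polling organization wants to know a presidential candidate's vote share. Suppose the firm requires that the error between the sample proportion and the population proportion not exceed 0.01, with 95% confidence. What should the sample size be?
Since p is unknown, substitute $p = 1/2$ and solve:
$$n + 4 = \frac{(Z_{\alpha/2})^2\,P(1 - P)}{(0.01)^2} = \frac{(1.96)^2\,(1/4)}{(0.01)^2} = 9604$$
So at least 9,600 sample points should be selected.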
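The sample-size formula from the preceding slides, $n + 4 = (z_{\alpha/2})^2\,\tilde{p}\tilde{q}/m^2$, can be wrapped in a small function. This is a sketch: `sample_size` is a hypothetical name, `p_guess=0.5` is the conservative choice described above, and the code uses the unrounded critical value instead of 1.96.

```python
from math import ceil
from statistics import NormalDist

def sample_size(m, conf=0.95, p_guess=0.5):
    """Smallest n such that n + 4 >= z^2 * p * (1 - p) / m^2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    total = z ** 2 * p_guess * (1 - p_guess) / m ** 2   # required value of n + 4
    return ceil(total - 4)

n_03 = sample_size(0.03)
n_01 = sample_size(0.01)
print(n_03, n_01)
```

For m = .01 this agrees with the 9,600 quoted above. For m = .03 the required n + 4 is about 1,067.1, so n itself is a few units smaller than the rounded-up total of 1,068.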
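Tests of the population proportion
Sampling distribution of the sample proportion, $f(\hat{p})$: if the population proportion is p, and $np \ge 5$ and $nq \ge 5$, then the sample proportion $\hat{p}$ is approximately normally distributed, $\hat{p} \sim N(p, pq/n)$.
The Normal Approximation Rule for Proportions: If $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with:
$$E(\hat{p}) = p, \qquad \mathrm{Var}(\hat{p}) = \frac{pq}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$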
Sampling Distribution of the Sample Proportion
• If the distribution of $\hat{p}$ is approximately normal, then the random variable
$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
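Tests of the population proportion
Assume $np \ge 5$ and $nq \ge 5$, and test the following hypotheses:
H0: p = p0 (or H0: p ≥ p0) versus H1: p < p0
If H0 is true, the sample proportion satisfies $\hat{p} \sim N(p_0, p_0 q_0/n)$, where $p_0$ is the population proportion when the hypothesis is true, so
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} \sim N(0, 1)$$
Reject H0 if $Z < -z_{\alpha}$, or equivalently if $\hat{p} < \hat{p}^{*}$ (critical value approach), where
$$\hat{p}^{*} = p_0 - z_{\alpha}\sqrt{p_0 q_0/n}$$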
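Example: Testing a population proportion
A pan-Blue legislator claims that polls show 60% of the public supports Lien Chan's (連戰) visit to China. A pan-Green group claims that support does not exceed 60%. You use a sample of 100 people to check: H0: p = .6 vs. H1: p < .6
Suppose 55 people in the sample support the visit. At the 5% significance level, can we reject the pan-Blue legislator's claim?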
Example: Testing a population proportion
Solution:
If H0 is true, then $\hat{p}$ has a normal distribution with mean p = .6 and variance pq/n = (.6)(.4)/100 = .0024.
If we use a one-tailed test at the 5% level of significance, the critical region consists of all values of Z less than $-z_{\alpha} = -z_{.05} = -1.645$.
From the sample, $\hat{p} = x/n = 55/100 = .55$.
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.55 - .60}{\sqrt{(.6)(.4)/100}} = -1.02$$
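Example: Testing a population proportion
We do not reject H0.
[Figure: standard normal curve with the critical value −1.645 and the observed statistic z = −1.02 marked.]
$$\hat{p}^{*} = p_0 - z_{\alpha}\sqrt{p_0 q_0/n} = .6 - 1.645\sqrt{(.6)(.4)/100} = .519$$
$$p\text{-value} = P(Z < -1.02) = .1539$$
The sample proportion actually observed is .55 > .519, so we cannot reject the null hypothesis.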
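The three quantities in this example (test statistic, p-value, and critical sample proportion) can be reproduced as follows. This is a minimal sketch of a left-tailed one-proportion z test; the function name `left_tailed_prop_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_prop_test(x, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p < p0 using the normal approximation."""
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)               # standard error under H0
    z = (p_hat - p0) / se0
    p_value = NormalDist().cdf(z)               # P(Z < z)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = .05
    p_crit = p0 - z_alpha * se0                 # critical value of p_hat
    return z, p_value, p_crit, z < -z_alpha

z, p_value, p_crit, reject = left_tailed_prop_test(55, 100, 0.6)
print(round(z, 2), round(p_value, 4), round(p_crit, 3), reject)
```

Both decision rules agree: z is above −1.645 and the observed .55 is above the critical proportion, so H0 is not rejected.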
Sampling distribution of the difference between sample proportions
• Suppose we take independent samples of size n1 and n2 from two populations. Let p1 and p2 be the proportions of items in each population that possess a certain characteristic, and let q1 = (1 − p1), q2 = (1 − p2). If n1p1 > 5, n1q1 > 5, n2p2 > 5, and n2q2 > 5, then the random variable $(\hat{p}_1 - \hat{p}_2)$ is approximately normally distributed with
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
$$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$
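Example
• A marketing firm wants to know how popular a TV program is among high-income and low-income people. Suppose 40% of high-income people like the program, and 50% of low-income people like it. The firm draws a sample of 100 from the high-income population and a sample of 200 from the low-income population. What is the probability that the two sample proportions differ by less than .05?
$$P(-.05 < \hat{p}_1 - \hat{p}_2 < .05) = \ ?$$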
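Example
$$p_1 = .4,\ q_1 = .6,\ n_1 = 100; \qquad p_2 = .5,\ q_2 = .5,\ n_2 = 200$$
$$n_1 p_1 \ge 5,\ n_1 q_1 \ge 5; \qquad n_2 p_2 \ge 5,\ n_2 q_2 \ge 5$$
$$p_1 - p_2 = .4 - .5 = -.10$$
$$\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} = \frac{(.4)(.6)}{100} + \frac{(.5)(.5)}{200} = .00365$$
$$P\left(\frac{-.05 - (-.1)}{\sqrt{.00365}} < Z < \frac{.05 - (-.1)}{\sqrt{.00365}}\right) = P(.83 < Z < 2.48) = .1967$$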
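The same probability can be obtained by treating the difference of sample proportions as a single normal random variable. The sketch below uses the population values given in this example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.4, 100      # high-income population and sample size
p2, n2 = 0.5, 200      # low-income population and sample size

mean_diff = p1 - p2                                  # E(p1_hat - p2_hat)
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2   # Var(p1_hat - p2_hat)

diff_dist = NormalDist(mu=mean_diff, sigma=sqrt(var_diff))
prob = diff_dist.cdf(0.05) - diff_dist.cdf(-0.05)    # P(-.05 < difference < .05)
print(round(var_diff, 5), round(prob, 4))
```

The result is about .197, matching the table-based .1967 up to the rounding of the z values.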
Confidence intervals for the difference of two population proportions
Let $\hat{p}_1$ denote the observed proportion of successes in a random sample of n1 observations from a population with proportion p1 of successes, and let $\hat{p}_2$ denote the observed proportion of successes in an independent random sample of n2 observations from a population with proportion p2 of successes. A 100(1−α)% confidence interval for (p1 − p2) is given by the interval
$$\left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}},\ \ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}\right)$$
This result holds provided $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, and $n_2 q_2 \ge 5$.
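Tests concerning differences of proportions
• We want to test whether the difference between two population proportions equals a specified value (for example, whether the two are equal). Let the proportion in population 1 be p1 and the proportion in population 2 be p2:
• H0: p1 − p2 = D0
• Draw samples of size n1 and n2 from the two populations and compute the sample proportions $\hat{p}_1$ and $\hat{p}_2$.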
Tests concerning differences of proportions
• If the null hypothesis H0: p1 − p2 = D0 is true, and $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$, then
$$(\hat{p}_1 - \hat{p}_2) \sim N\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$
• Usually we want to test the null hypothesis H0: p1 − p2 = 0, that is, H0: p1 = p2.
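Tests concerning differences of proportions
$$(\hat{p}_1 - \hat{p}_2) \sim N\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$
• Because p1 and p2 are unknown, we cannot compute the variance.
• Because we assume p1 = p2, a reasonable way to estimate the population variance is to use the data from both samples together to estimate the common population proportion p = p1 = p2. This estimate is called the pooled sample proportion.
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$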
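Tests concerning differences of proportions
Test H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0, using the pooled sample proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}$$
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}/n_1 + \hat{p}\hat{q}/n_2}} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}}$$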
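Example
• Of 400 products made by the first machine, 23 are defective; in a sample of 400 made by the second machine, 17 are defective. Test at the 5% significance level whether the two machines are of equal quality.
$$\alpha = .05, \qquad \text{critical value } z_{.025} = 1.96$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{23 + 17}{400 + 400} = \frac{40}{800} = .05$$
The variance of $(\hat{p}_1 - \hat{p}_2)$ is
$$\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2} = \frac{(.05)(.95)}{400} + \frac{(.05)(.95)}{400} = .0002375$$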
Example (continued)
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}/n_1 + \hat{p}\hat{q}/n_2}} = \frac{(.0575 - .0425) - 0}{\sqrt{.0002375}} = 0.97 \approx 1$$
Since |z| < 1.96, we fail to reject H0.
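The pooled two-proportion test used in this example can be sketched as a short function. The name `pooled_two_prop_test` is hypothetical, and the two-sided p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_prop_test(x1, n1, x2, n2):
    """z test of H0: p1 = p2 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # pooled sample proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value
    return z, p_value

z, p_value = pooled_two_prop_test(23, 400, 17, 400)
print(round(z, 2), round(p_value, 2))
```

Here z is about 0.97, well inside ±1.96, so the data give no evidence that the two machines differ.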
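Example
• The numbers of credit card applications and approvals at two banks last month are shown in the table below:

Bank    Applications    Approvals
A       350             273
B       450             378

• At the 5% significance level, test whether the two banks' credit card approval rates are the same. If they differ, find the 95% confidence interval for the difference in approval rates.
Hypotheses to test:
$$H_0: p_A - p_B = 0, \qquad H_1: p_A - p_B \ne 0$$
Pooled sample proportion:
$$\hat{p} = \frac{273 + 378}{350 + 450} = 0.81375$$
Test statistic:
$$Z = \frac{\dfrac{273}{350} - \dfrac{378}{450}}{\sqrt{0.81375 \times 0.18625 \times \left(\dfrac{1}{350} + \dfrac{1}{450}\right)}} = -2.16$$
Left-tail critical value:
$$-Z_{0.025} = -1.96$$
• Because the test statistic is less than the left-tail critical value, we reject the null hypothesis: the two banks' credit card approval rates differ.
95% confidence interval for the difference in approval rates:
$$-0.06 \pm Z_{0.025}\sqrt{0.81375 \times 0.18625 \times \left(\frac{1}{350} + \frac{1}{450}\right)} = (-0.114,\ -0.006)$$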
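The bank example can be verified numerically. The sketch below follows the slide in reusing the pooled standard error for the confidence interval; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xa, na = 273, 350      # bank A: approvals, applications
xb, nb = 378, 450      # bank B: approvals, applications

diff = xa / na - xb / nb                       # 0.78 - 0.84 = -0.06
pooled = (xa + xb) / (na + nb)                 # pooled sample proportion
se = sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))

z = diff / se                                  # test statistic
z_crit = NormalDist().inv_cdf(0.975)           # about 1.96
reject = abs(z) > z_crit
ci = (diff - z_crit * se, diff + z_crit * se)  # interval for p_A - p_B

print(round(z, 2), reject, round(ci[0], 3), round(ci[1], 3))
```

The interval lies entirely below zero, which is consistent with rejecting H0 and suggests that bank A's approval rate is the lower of the two.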