Source: dm.ieu.edu.tr/Workshop_RSS_2011/monday1.pdf

Introduction to Ranked Set Sampling

Omer Ozturk

The Ohio State University

Department of Mathematics

June 20, 2011

Omer Ozturk (OSU), RSS, Izmir Economy Univ and TUBITAK

Outline

1 Setting and Main Idea

2 Examples

3 Statistical Inference for Population Mean

4 Generalization of the Results

5 Statistical Inference for Population Variance

6 Concluding Remark


Setting and Main Idea

The population of interest could be either finite or infinite.
Actual measurement of a unit is either expensive or time-consuming.
Ranking a small set of experimental units is cheap and easy.
Objective: by using this cheap ranking information, we wish to construct a more representative sample.


Ranked set sampling

Select M units at random from a specified population.
Rank these M units by expert judgment, without measuring them.
Retain the unit judged smallest and return the others.
Select a second set of M units and retain the unit judged second smallest.
Continue the process until M ordered units have been measured.
Note: these M ordered observations $X_{[1]i}, \ldots, X_{[M]i}$ are called a cycle.
Note: the process is repeated for cycles $i = 1, \ldots, n$ to obtain $nM$ observations. These $nM$ observations are called a standard ranked set sample.
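The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ranked_set_sample`, the `draw` callback, and the `rank_key` argument are hypothetical, the population is taken to be standard normal, and ranking is assumed perfect (units are ordered by their true values), which real judgment ranking need not be.

```python
import random

def ranked_set_sample(draw, M, n, rng, rank_key=None):
    """Return sample[i][h]: the unit judged (h+1)-th smallest in its set, cycle i+1.

    draw(rng) returns one population unit. rank_key(unit) is the cheap judgment
    score used for ranking; None ranks by the true value (perfect ranking, an
    assumption of this sketch).
    """
    sample = []
    for _ in range(n):                       # n cycles
        cycle = []
        for h in range(M):                   # one fresh set per judgment rank
            units = [draw(rng) for _ in range(M)]
            units.sort(key=rank_key)         # rank the set
            cycle.append(units[h])           # fully measure only this unit
        sample.append(cycle)
    return sample

rng = random.Random(2011)
sample = ranked_set_sample(lambda r: r.gauss(0.0, 1.0), M=4, n=2, rng=rng)
for i, cycle in enumerate(sample, start=1):
    print(f"cycle {i}:", [round(x, 3) for x in cycle])
```

Each measured unit costs M draws, so the M − 1 unmeasured units per set are the price paid for the ranking information.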

Omer Ozturk (OSU) RSSIzmir Economy Univ and TUBITAK 4 /

24

Page 10: Introduction to Ranked Set Samplingdm.ieu.edu.tr/Workshop_RSS_2011/monday1.pdfIntroduction to Ranked Set Sampling Omer Ozturk The Ohio State University Department of Mathematics June

Setting and Main Idea

Diagram

Let M = 4 and n = 2. Each row is one set of M units ordered by judgment rank; the unit marked with * is the one retained for full measurement.

Cycle  Set  Rank 1    Rank 2    Rank 3    Rank 4
1      1    X[1]1*    X[2]1     X[3]1     X[4]1
1      2    X[1]1     X[2]1*    X[3]1     X[4]1
1      3    X[1]1     X[2]1     X[3]1*    X[4]1
1      4    X[1]1     X[2]1     X[3]1     X[4]1*
2      1    X[1]2*    X[2]2     X[3]2     X[4]2
2      2    X[1]2     X[2]2*    X[3]2     X[4]2
2      3    X[1]2     X[2]2     X[3]2*    X[4]2
2      4    X[1]2     X[2]2     X[3]2     X[4]2*

$X_{[1]1}, \ldots, X_{[4]2}$ is called a ranked set sample.
For each fully measured unit, we need M − 1 additional units for ranking.
Measured units are all independent.
Under a stable ranking condition, observations from the same judgment class are identically distributed.


Why ranked-set sampling?

Before ranking

Unit 1 Unit 2 Unit 3 Unit 4 Unit 5

? ? ? ? ?

After Ranking

Unit 2 Unit 5 Unit 4 Unit 1 Unit 3

R1, ? R2, ? R3, ? R4, ? R5, ?


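[Figure: density curves f(y) plotted against y, with the horizontal axis running from −4 to 4; legend entries "Simple R", "jgmt 1" to "jgmt 5"; sample points marked X and Y under the labels "Ranked Set Sample" and "Simple Random Sample".]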


Design Considerations

Rankers should be blinded: the ranker should not know which ranked unit will be selected for full measurement.
Sets should be constructed at random.


Benefit of Ranked Set Sampling

Ranked set sampling creates covariates through the ranking process.
These covariates partition the population into several strata.
The benefit of a ranked set sample can then be anticipated from the theory of stratified sampling.
The covariates create some positive dependence among the responses of ranked units. This may be useful in certain settings.
The marginal distribution of $X_{[h]}$ has a smaller variance than the variance of $X$.


Examples

Examples for ranked set sampling

The quantity of interest is the length (in micrometers) of a kind of bacterial cell in a microscopic field. The actual measurement may be laborious, but ordering two or three cells might be very easy.
The quantity of interest is the number of a kind of bacterial cell per unit volume in a test tube. Assume there are several test tubes available. The counting might be very time-consuming and costly, but ordering via an optical instrument might be very easy and cheap.
The quantity of interest is the bilirubin level in jaundiced neonatal babies. Babies can be ranked according to their appearance without actual measurements.
The quantity of interest is the prevalence of heart disease in a population. The subjects can be ranked with respect to BMI, and the expensive and time-consuming variables can be measured on the ranked subjects.


Notation

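Let the pair $(R, X)$ represent the rank and the value of the unit selected for full measurement.
For example, if $R = h$, we measure $X_{[h]}$.
For a balanced ranked set sample, $P(R = h) = 1/M$.
The conditional distribution of $X$ given $R = h$ is $F_{[h]}$.

$$X \sim F \text{ with pdf } f, \qquad X_{[h]} \sim F_{[h]} \text{ with pdf } f_{[h]}$$

$$\mu = E(X), \quad \operatorname{var}(X) = \sigma^2, \quad \mu_{[h]} = E(X_{[h]}),$$

$$\operatorname{var}(X_{[h]}) = \sigma^2_{[h]}, \quad \operatorname{cov}(X_{[h]}, X_{[h']}) = \sigma_{[h,h']}$$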


Assumptions

Sets are independent.
Judgment order statistics are independent.
A consistent (stable) ranking mechanism is used.


Some Elementary Results

The CDF of the population distribution $F$ can be partitioned as follows:

$$F(x) = \frac{1}{M}\sum_{h=1}^{M} F_{[h]}(x).$$

Assume that $E|X|^k < \infty$; then $E|X_{[h]}|^k < \infty$ for $h = 1, \ldots, M$.

$$E(X) = \frac{1}{M}\sum_{h=1}^{M} \mu_{[h]}.$$

$$\sigma^2 = \frac{1}{M}\sum_{h=1}^{M} \sigma^2_{[h]} + \frac{1}{M}\sum_{h=1}^{M} (\mu_{[h]} - \mu)^2.$$

The last two items connect simple random sampling to stratified sampling in the infinite population setting.
In the context of ANOVA, total variation = within strata + between strata.
In mathematical statistics, MSE = Variance + Bias$^2$.
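The mean and variance partitions above can be checked exactly in one special case. The sketch below assumes a U(0,1) population and perfect ranking, so that $X_{[h]}$ is the h-th order statistic of M uniforms with the standard Beta(h, M − h + 1) moments; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def uniform_order_stat_moments(h, M):
    """Mean and variance of the h-th order statistic of M iid U(0,1) draws."""
    mean = Fraction(h, M + 1)
    var = Fraction(h * (M - h + 1), (M + 1) ** 2 * (M + 2))
    return mean, var

def decomposition(M):
    """Return (mu, within, between) for the partition of sigma^2."""
    moments = [uniform_order_stat_moments(h, M) for h in range(1, M + 1)]
    mu = sum(m for m, _ in moments) / M                   # (1/M) sum mu_[h]
    within = sum(v for _, v in moments) / M               # (1/M) sum sigma^2_[h]
    between = sum((m - mu) ** 2 for m, _ in moments) / M  # (1/M) sum (mu_[h] - mu)^2
    return mu, within, between

for M in (2, 3, 5):
    mu, within, between = decomposition(M)
    # mu is 1/2 and within + between is 1/12, the U(0,1) mean and variance
    print(M, mu, within, between, within + between)
```

As M grows, more of the total variance moves from the within-class term to the between-class term, which is where the gain from ranking comes from.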


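Let $X_1, \ldots, X_M$ be a simple random sample and $X_{[1]}, \ldots, X_{[M]}$ be a ranked set sample.

$$\operatorname{var}(\bar X_{SRS}) = \frac{1}{M^2}\operatorname{var}\Big(\sum_{i=1}^{M} X_i\Big) = \frac{1}{M^2}\operatorname{var}\Big(\sum_{i=1}^{M} X_{[i]}\Big) = \frac{1}{M^2}\Big\{\sum_{i=1}^{M}\sigma^2_{[i]} + \sum_{i \neq j}\sigma_{[i,j]}\Big\} = \operatorname{var}(\bar X_{RSS}) + \text{cov}$$

In the second equality the $X_{[i]}$ are the judgment order statistics of the same set, so their sum equals the sum of the $X_i$; the covariance term is $\text{cov} = \frac{1}{M^2}\sum_{i \neq j}\sigma_{[i,j]} \ge 0$. Hence

$$\operatorname{var}(\bar X_{SRS}) \ge \operatorname{var}(\bar X_{RSS})$$

The inequality becomes an equality when the ranking is completely random.
This improved efficiency result holds for almost all statistical procedures based on RSS.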
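A small Monte Carlo run can illustrate the inequality between the two variances. The following is a sketch under assumptions that are not in the slides: a standard normal population, perfect ranking, set size M = 3 with a single cycle, and hypothetical helper names `srs_mean` and `rss_mean`.

```python
import random
import statistics

def srs_mean(M, rng):
    """Mean of a simple random sample of M measured units."""
    return statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(M)])

def rss_mean(M, rng):
    """Mean of one RSS cycle: from the h-th set keep the h-th smallest unit."""
    measured = []
    for h in range(M):
        units = sorted(rng.gauss(0.0, 1.0) for _ in range(M))
        measured.append(units[h])
    return statistics.fmean(measured)

rng = random.Random(20110620)
M, reps = 3, 20000
var_srs = statistics.pvariance([srs_mean(M, rng) for _ in range(reps)])
var_rss = statistics.pvariance([rss_mean(M, rng) for _ in range(reps)])
print(f"var(SRS mean) ~ {var_srs:.4f}, var(RSS mean) ~ {var_rss:.4f}")
```

Both estimators use M fully measured units, so the comparison is at equal measurement cost; the RSS design additionally consumes the units that were ranked but not measured.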


Statistical Inference for Population Mean

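Point Estimation

Let $X_{[h]i}$, $i = 1, \ldots, n_h$; $h = 1, \ldots, M$, be a ranked set sample from distribution $F$. Consider two estimators, SRS ($\bar X$) and RSS ($\hat\mu$):

$$\bar X = \frac{1}{N}\sum_{h=1}^{M}\sum_{i=1}^{n_h} X_{hi}, \qquad \hat\mu = \frac{1}{M}\sum_{h=1}^{M}\frac{1}{n_h}\sum_{i=1}^{n_h} X_{[h]i} = \frac{1}{M}\sum_{h=1}^{M}\bar X_{[h]},$$

where $N = \sum_{h=1}^{M} n_h$ is the total sample size.

If $n_h > 0$ for all $1 \le h \le M$, $\hat\mu$ is unbiased.
If $\min(n_1, \ldots, n_M) \to \infty$, $\hat\mu$ is consistent.

$$\operatorname{var}(\hat\mu) = \frac{1}{M^2}\sum_{h=1}^{M}\frac{\sigma^2_{[h]}}{n_h}.$$

For a balanced design ($n_h = n$, $N = nM$):

$$\operatorname{var}(\bar X) = \operatorname{var}(\hat\mu) + \frac{1}{MN}\sum_{h=1}^{M}(\mu_{[h]} - \mu)^2$$

$$\operatorname{eff}(\bar X, \hat\mu) = \frac{\operatorname{var}(\bar X)}{\operatorname{var}(\hat\mu)} = \frac{\sigma^2}{\sigma^2 - \frac{1}{M}\sum_{h=1}^{M}(\mu_{[h]} - \mu)^2} \ge 1$$

Equality holds iff $\mu_{[h]} = \mu$ for all $h = 1, \ldots, M$.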
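To make the efficiency formula concrete, the sketch below evaluates it exactly for a U(0,1) population under perfect ranking, where $\mu_{[h]} = h/(M+1)$ and $\sigma^2 = 1/12$. These inputs are assumptions for illustration, and the function name `efficiency` is hypothetical.

```python
from fractions import Fraction

def efficiency(class_means, sigma2):
    """eff = sigma^2 / (sigma^2 - (1/M) sum_h (mu_[h] - mu)^2), balanced design."""
    M = len(class_means)
    mu = sum(class_means) / M
    between = sum((m - mu) ** 2 for m in class_means) / M
    return sigma2 / (sigma2 - between)

# U(0,1) with perfect ranking: mu_[h] = h / (M + 1), sigma^2 = 1/12
for M in (2, 3, 5):
    means = [Fraction(h, M + 1) for h in range(1, M + 1)]
    print(M, efficiency(means, Fraction(1, 12)))

# Completely random ranking: every mu_[h] equals mu, so the efficiency is 1
print(efficiency([Fraction(1, 2)] * 3, Fraction(1, 12)))
```

In this uniform case the efficiency works out to (M + 1)/2, so the gain grows with the set size as long as ranking stays accurate.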


Hypothesis Testing

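$$SE(\hat\mu) = \sqrt{\frac{1}{M^2}\sum_{h=1}^{M}\frac{S^2_{[h]}}{n_h}},$$

where $S^2_{[h]}$ is the sample variance of the observations in judgment class $h$.

If $\min(n_1, \ldots, n_M) \to \infty$, then $T_n = (\hat\mu - \mu)/SE(\hat\mu)$ converges to a standard normal.
Let $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$; we then reject the null hypothesis for large values of $|T_n|$.
By inverting the above test, the $(1 - \alpha)100\%$ confidence interval for $\mu$ is constructed as follows:

$$\{\hat\mu - t_n(1 - \alpha/2)\,SE(\hat\mu),\ \hat\mu + t_n(1 - \alpha/2)\,SE(\hat\mu)\}.$$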
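The test statistic and interval above could be computed along the following lines. This is a sketch, not the author's implementation: the function name `rss_mean_inference` and the data are made up, and because the slides do not specify the quantile $t_n(1 - \alpha/2)$ further, the code substitutes the standard normal quantile as a large-sample approximation.

```python
import math
import statistics

def rss_mean_inference(classes, mu0=0.0, alpha=0.05):
    """classes[h] holds the measured values from judgment class h+1 (n_h >= 2)."""
    M = len(classes)
    mu_hat = sum(statistics.fmean(c) for c in classes) / M
    # SE^2 = (1/M^2) * sum_h S_[h]^2 / n_h, with S_[h]^2 the class sample variance
    se = math.sqrt(sum(statistics.variance(c) / len(c) for c in classes)) / M
    t_stat = (mu_hat - mu0) / se
    # Assumption: normal quantile used in place of t_n(1 - alpha/2)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return mu_hat, se, t_stat, (mu_hat - z * se, mu_hat + z * se)

classes = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # made-up data: M = 2, n_h = 3
mu_hat, se, t_stat, ci = rss_mean_inference(classes, mu0=3.0)
print(mu_hat, round(se, 4), round(t_stat, 4), ci)
```

With class sizes this small the normal approximation is rough; the example only shows how the pieces of the formula fit together.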


Generalization of the Results

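Linear Transformation of the Sample

Let $g$ be any function of $X$ and $Y = g(X)$. The results we developed for $X$ earlier hold for $Y$ as well.
Some examples of $g$:

$$g(X) = X^r,\ r = 1, 2, 3, \ldots, \qquad g(X) = \begin{cases} 1 & X \le c \\ 0 & \text{otherwise} \end{cases}$$

It is easy to establish

$$\mu_g = E(g(X)), \quad \sigma^2_g = \operatorname{var}(g(X)),$$

$$\mu_{[h]g} = E(g(X_{[h]})), \quad \sigma^2_{[h]g} = \operatorname{var}(g(X_{[h]}))$$

An estimator for $\mu_g$ can be established easily:

$$\bar g = \frac{1}{M}\sum_{h=1}^{M}\bar g_{[h]}, \quad \text{where } \bar g_{[h]} = \frac{1}{n_h}\sum_{j=1}^{n_h} g(X_{[h]j})$$

All distributional properties developed earlier hold.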


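As the sample size increases, by the CLT, $(\bar g - \mu_g)/SE(\bar g)$ converges in distribution to a standard normal, where

$$SE(\bar g) = \sqrt{\frac{1}{M^2}\sum_{h=1}^{M}\frac{S^2_{[h]g}}{n_h}}, \qquad S^2_{[h]g} = \frac{1}{n_h - 1}\sum_{j=1}^{n_h}\big(g(X_{[h]j}) - \bar g_{[h]}\big)^2.$$

Special Case

$$g(X) = \begin{cases} 1 & X \le c \\ 0 & \text{otherwise} \end{cases}$$

$g(X)$ has a Bernoulli distribution with parameter $p = P(X \le c)$. It is clear that $E(g(X)) = p$ and $\operatorname{var}(g(X)) = p(1 - p)$.
It is also clear that $E(g(X_{[h]})) = p_{[h]}$ and $\operatorname{var}(g(X_{[h]})) = p_{[h]}(1 - p_{[h]})$.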


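Estimator for $p_{[h]}$. Let $I_{[h]j} = I(X_{[h]j} \le c)$ and $\bar g_{[h]} = \frac{1}{n_h}\sum_{j=1}^{n_h} I_{[h]j}$.

$\bar g_{[h]}$ is unbiased and consistent for $p_{[h]}$.

$S^2_{[h]g} = \frac{1}{n_h - 1}\sum_{j=1}^{n_h}\{I_{[h]j} - \bar g_{[h]}\}^2$ is approximately equal to $\bar g_{[h]}(1 - \bar g_{[h]})$.
From the central limit theorem, with $\hat p = \bar g = \frac{1}{M}\sum_{h=1}^{M}\bar g_{[h]}$, we then have

$$\frac{\sqrt{N}(\hat p - p)}{\sqrt{\frac{1}{M^2}\sum_{h=1}^{M}\frac{N\,\bar g_{[h]}(1 - \bar g_{[h]})}{n_h}}} \xrightarrow{D} N(0, 1)$$

Tests and confidence intervals can be constructed in a similar fashion.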
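The proportion estimator and the plug-in standard error implied by the limit above can be sketched as follows. The function name `rss_proportion` and the data are hypothetical; the factor $N$ cancels between numerator and denominator, so the code works directly with $\hat p$ and its standard error.

```python
import math

def rss_proportion(classes, c):
    """Estimate p = P(X <= c); classes[h] holds the measured values of class h+1."""
    M = len(classes)
    g_bar = [sum(x <= c for x in cls) / len(cls) for cls in classes]   # g-bar_[h]
    p_hat = sum(g_bar) / M
    # plug-in SE: sqrt( (1/M^2) * sum_h g_[h] (1 - g_[h]) / n_h )
    se = math.sqrt(sum(g * (1 - g) / len(cls) for g, cls in zip(g_bar, classes))) / M
    return p_hat, se

classes = [[0.2, 0.4, 0.9, 1.1], [0.8, 1.3, 1.6, 2.0]]   # made-up data: M = 2, n_h = 4
p_hat, se = rss_proportion(classes, c=1.0)
print(p_hat, round(se, 4))
```

A large-sample test of $H_0: p = p_0$ would then compare $(\hat p - p_0)/SE$ with standard normal quantiles, mirroring the test for the mean.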


Statistical Inference for Population Variance

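Estimation of population variance

We first consider two estimators: the SRS estimator and the RSS estimator of Stokes (1980).

$$S^2_{SRS} = \frac{1}{N - 1}\sum_{h=1}^{M}\sum_{i=1}^{n_h}(X_{hi} - \bar X_{SRS})^2$$

$$ST^2_{RSS} = \frac{1}{N - 1}\sum_{h=1}^{M}\sum_{i=1}^{n_h}(X_{[h]i} - \bar X_{RSS})^2$$

The SRS estimator $S^2_{SRS}$ is unbiased for $\sigma^2$.

The Stokes estimator $ST^2$ is biased for finite $M$ and $n$, but asymptotically unbiased as $M$ and $n$ grow:

$$E(ST^2) = \sigma^2 + \frac{1}{M(Mn - 1)}\sum_{h=1}^{M}(\mu_{[h]} - \mu)^2$$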
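The size of the bias term can be evaluated exactly in a simple case. The sketch below assumes a U(0,1) population with perfect ranking, so that $\mu_{[h]} = h/(M+1)$; the function name `stokes_bias` is hypothetical.

```python
from fractions import Fraction

def stokes_bias(class_means, n):
    """E(ST^2) - sigma^2 = (1 / (M (Mn - 1))) * sum_h (mu_[h] - mu)^2."""
    M = len(class_means)
    mu = sum(class_means) / M
    return sum((m - mu) ** 2 for m in class_means) / (M * (M * n - 1))

for M, n in [(2, 2), (2, 5), (5, 2), (5, 5)]:
    means = [Fraction(h, M + 1) for h in range(1, M + 1)]   # U(0,1), perfect ranking
    print(M, n, stokes_bias(means, n))
```

The bias is always nonnegative and shrinks as the total number of measured units Mn increases.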


Unbiased Estimation (MacEachern, Ozturk, Wolfe, Stark, JRSS-B, 2002)

Key idea: note that we have established that

$$\sigma^2 = \frac{1}{M}\sum_{h=1}^{M}\sigma^2_{[h]} + \frac{1}{M}\sum_{h=1}^{M}(\mu_{[h]} - \mu)^2$$

If we obtain an unbiased estimator for each piece and put them together, we can construct an unbiased estimator.

$$\hat\sigma^2 = \frac{1}{2n^2M^2}\sum_{h \neq h'}\sum_{i=1}^{n}\sum_{j=1}^{n}(X_{[h]i} - X_{[h']j})^2 + \frac{1}{2n(n - 1)M^2}\sum_{h=1}^{M}\sum_{i=1}^{n}\sum_{j=1}^{n}(X_{[h]i} - X_{[h]j})^2$$


Alternative Expression

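Assume that we have a one-way ANOVA model, treating judgment classes as treatment groups.
The unbiased estimator $\hat\sigma^2$ can be computed from

$$\hat\sigma^2 = \frac{1}{nM}\{(M - 1)\,\mathrm{MST} + (nM - M + 1)\,\mathrm{MSE}\}$$

Let

$$RE_1 = \frac{\operatorname{var}(ST^2_{RSS})}{\operatorname{var}(\hat\sigma^2)}, \qquad RE_2 = \frac{\operatorname{var}(S^2_{SRS})}{\operatorname{var}(\hat\sigma^2)}$$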
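The pairwise-difference form and the ANOVA form of the unbiased estimator $\hat\sigma^2$ can be compared numerically. The sketch below assumes a balanced design and the usual one-way ANOVA definitions, with MST the between-class mean square on M − 1 degrees of freedom and MSE the within-class mean square on M(n − 1) degrees of freedom; the function names and data are made up for illustration.

```python
import random
import statistics

def sigma2_pairwise(x):
    """x[h][i]: i-th measured value in judgment class h+1 (balanced, n >= 2)."""
    M, n = len(x), len(x[0])
    between = sum((a - b) ** 2
                  for h in range(M) for k in range(M) if h != k
                  for a in x[h] for b in x[k])
    within = sum((a - b) ** 2 for row in x for a in row for b in row)
    return between / (2 * n**2 * M**2) + within / (2 * n * (n - 1) * M**2)

def sigma2_anova(x):
    """Same estimator via one-way ANOVA mean squares (classes as treatments)."""
    M, n = len(x), len(x[0])
    class_means = [statistics.fmean(row) for row in x]
    grand = statistics.fmean(class_means)
    mst = n * sum((m - grand) ** 2 for m in class_means) / (M - 1)
    mse = sum((a - m) ** 2 for row, m in zip(x, class_means) for a in row) / (M * (n - 1))
    return ((M - 1) * mst + (n * M - M + 1) * mse) / (n * M)

rng = random.Random(7)
x = [[rng.gauss(h, 1.0) for _ in range(4)] for h in range(3)]   # made-up data: M = 3, n = 4
print(sigma2_pairwise(x), sigma2_anova(x))   # the two forms agree up to rounding
```

The ANOVA form needs only class means and sums of squares, so it avoids the double loop over all pairs of observations.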


Relative Efficiency

Dist       n,M   RE1     RE2
U(0,1)     2,2   1.412   1.294
U(0,1)     2,5   1.300   1.708
U(0,1)     5,2   1.158   1.136
U(0,1)     5,5   1.116   1.521
3x^2/2     2,2   1.393   1.554
3x^2/2     2,5   1.461   2.284
3x^2/2     5,2   1.204   1.381
3x^2/2     5,5   1.195   1.758
exp(−u)    2,2   1.135   1.052
exp(−u)    2,5   1.110   1.187
exp(−u)    5,2   1.049   1.043
exp(−u)    5,5   1.041   1.174
N(0,1)     2,2   1.315   1.135
N(0,1)     2,5   1.203   1.393
N(0,1)     5,2   1.111   1.056
N(0,1)     5,5   1.075   1.323
Γ(1,5)     2,2   1.236   1.098
Γ(1,5)     2,5   1.161   1.298
Γ(1,5)     5,2   1.084   1.053
Γ(1,5)     5,5   1.059   1.257
LN(0,1)    2,2   1.026   1.004
LN(0,1)    2,5   1.024   1.021
LN(0,1)    5,2   1.009   1.005
LN(0,1)    5,5   1.009   1.020


Concluding Remark

We looked at how one can use subjective information to create a more representative sample.
Ranked set sampling is one of the designs that make good use of subjective information.
The efficiency improvement holds for almost all inferential procedures.
Ranking errors could be a problem, but careful planning or the use of robust procedures may decrease the impact of ranking error.
There are many open problems to be worked out.
