Estimating Phone Service and Usage Percentages:
How to Weight the Data from a Local, Dual-Frame Sample Survey
of Cellphone and Landline Telephone Users in the United States
Presented at
AAPOR 2009
Hollywood, FL
May 14, 2009
Thomas M. [email protected]
Center for Survey Research, University of Virginia
The Problem
• Dual-frame telephone surveys are becoming more prevalent in U.S. survey research
– The rising percentages and distinctive demographics of cellphone-only [CPO] households make it imperative that sample designs cover them
– Landline RDD + cellphone RDD sample frames
• Result: sample data for 3 phone-service segments
– CPO; overlap (dual-phone); landline-only [LLO]
• Problem: what is the correct population distribution across the 3 phone-service segments?
National data? No problem
• National Health Interview Survey [NHIS] data are the ‘gold standard’
– Uses a very large N, continuous sampling, and in-person mode to establish household phone service
– Provides fairly current data on cellphone coverage, percent CPO, and phone-segment distributions
• NHIS data are available for the U.S. and for the four census regions
– State estimates were released in 2009 using CPS + NHIS
• SOLUTION: For a national sample, weight the phone-service segments to the NHIS percentages for the U.S.
What about local studies?
• We cannot assume that the local phone-service segment distribution is the same as national or regional averages.
• Cellphone penetration and CPO lifestyle adoption vary considerably across areas.
• Cellphone penetration is higher in high-density areas, metro areas, high-income areas, areas with flat terrain, and areas near interstates
• CPO percentage varies with age, ethnicity, urbanicity, and landline phone costs
• NHIS shows strong variation in phone service across regions and states
– Variation within states is probably similar in magnitude
Why not use percents from the local sample data?
• In a local dual-frame sample, we directly observe % CPO in the cell sample and % LLO in the landline sample.
• But estimation from these observed percents is problematic for several reasons:
1) If we just combine the two samples, we overlook the fact that overlap households are double-sampled.
2) It’s not intuitively obvious how to calculate the percentages for the combined sample from the split-sample results.
3) Cellphone-only cases are substantially overcounted in a cellphone sample.
• CPOs have different telephone behaviors. Compared with dual-phone users, they are more likely to:
– have their phone with them
– have their phone turned on
– accept calls from unknown numbers
4) Cellphone samples are usually kept small because of their higher cost per completed interview
• So we can’t just add up the segment counts from the two samples.
Can we use the local sample data?
• The data collected from the two realized local samples surely contain useful information about local phone-service segments
• Overcounts of CPO and LLO distort these data
• We have to do the math correctly
• IDEA: Estimate the amount of CPO and LLO overcount in national dual-frame studies, then apply an adjustment to the local sample data to arrive at local estimates of %CPO and %LLO
Overview: A proposed solution
• Develop an algebraic solution for combining the two sample results from a dual-frame design into an overall phone-service segment distribution, assuming equal response rates
• Develop an algebraic solution for combining the two samples when response rates are NOT equal
– Higher response rates (overcounts) are assumed for CPO and LLO, compared with overlap
• Compare 2007 CHIS to 2007 NHIS (West region) to estimate the ‘response rate ratios’ that correspond to the observed overcount
• Apply these ratios to newly collected dual-frame survey data from three counties in Virginia
– Result: plausible, locality-specific estimates of phone segments
Key assumptions
• Local phone-service segment distributions vary
– Forcing NHIS segment distributions onto local data would distort results
• Response rate ratios (rates of overcount) are constant across surveys
– Provided that fielding and screening procedures are similar
• Sampling variability is ignorable
– In the comparison of NHIS to CHIS
– In the projection from the local samples to the local population
How to combine dual-frame sample results (equal response rates)
The universe of telephone households (100%)
• Cellphone samples (Frame 1) include some households that are also in the RDD frame; landline-only households are excluded. Frame 1 covers 81.1% of telephone households (PaT + PabT = .132 + .679).
• RDD samples (Frame 2) cover all landline households; cellphone-only households are excluded. Frame 2 covers 86.8% (PabT + PbT = .679 + .189).
• RDD and cell samples overlap and together yield complete coverage:
– CPO (cell only): 13.2%, PaT = .132
– Overlap (cell + landline): 67.9%, PabT = .679
– LLO (landline only): 18.9%, PbT = .189
All percentages are from 2007 NHIS data (West region).
These proportions define the population distribution of segments:
$$P_a^T + P_{ab}^T + P_b^T = 1$$
With equal response rates, the cell sample would show:
• CPO as percent of Frame 1: Pa′ = .132/.811 = .163
• Overlap as percent of Frame 1: Pab′ = .679/.811 = .837
$$P'_a = \frac{P_a^T}{P_a^T + P_{ab}^T}, \qquad P'_{ab} = \frac{P_{ab}^T}{P_a^T + P_{ab}^T}, \qquad P'_a + P'_{ab} = 1$$
With equal response rates, the RDD sample would show:
• Overlap as percent of Frame 2: Pab″ = .679/.868 = .783
• LLO as percent of Frame 2: Pb″ = .189/.868 = .218
$$P''_{ab} = \frac{P_{ab}^T}{P_{ab}^T + P_b^T}, \qquad P''_b = \frac{P_b^T}{P_{ab}^T + P_b^T}, \qquad P''_{ab} + P''_b = 1$$
So, if response rates were equal, we would have . . .

Segment    True values          Observed through    Observed through
           (NHIS West 2007)     cell sample         RDD sample
CPO        PaT  = 13.2%         Pa′  = 16.3%        -
Overlap    PabT = 67.9%         Pab′ = 83.7%        Pab″ = 78.3%
LLO        PbT  = 18.9%         -                   Pb″  = 21.7%
Total      100.0%               100.0%              100.0%
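A minimal Python sketch of the arithmetic behind this table, assuming (as these slides do) equal response rates in every segment: each frame's observed shares are simply the population shares renormalized to that frame's coverage. The function name `frame_shares` is illustrative only.

```python
def frame_shares(p_a, p_ab, p_b):
    """Segment shares each frame would show if every segment responded at
    the same rate. Inputs are population shares PaT, PabT, PbT (sum to 1)."""
    frame1 = p_a + p_ab  # cellphone frame covers CPO + overlap
    frame2 = p_ab + p_b  # landline RDD frame covers overlap + LLO
    cell = {"CPO": p_a / frame1, "Overlap": p_ab / frame1}
    rdd = {"Overlap": p_ab / frame2, "LLO": p_b / frame2}
    return cell, rdd


# 2007 NHIS, West region
cell, rdd = frame_shares(0.132, 0.679, 0.189)
print("Cell sample:", {k: round(v, 3) for k, v in cell.items()})
print("RDD sample: ", {k: round(v, 3) for k, v in rdd.items()})
```

From these rounded inputs the RDD overlap share rounds to .782 rather than the .783 shown, a rounding-level difference.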
How do we get from observed percentages to population percentages?

Segment    True values          Observed through    Observed through
           (NHIS West 2007)     cell sample         RDD sample
CPO        PaT  = ??            Pa′  = 16.3%        -
Overlap    PabT = ??            Pab′ = 83.7%        Pab″ = 78.3%
LLO        PbT  = ??            -                   Pb″  = 21.7%
Total      100.0%               100.0%              100.0%
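Formulas for calculating the underlying population distribution
Solving Pab′ = PabT/(PaT + PabT) and Pab″ = PabT/(PabT + PbT) together with PaT + PabT + PbT = 1 gives:
$$P_{ab}^T = \frac{1}{\dfrac{1}{P'_{ab}} + \dfrac{1}{P''_{ab}} - 1}$$
$$P_a^T = \frac{P'_a \, P_{ab}^T}{P'_{ab}}$$
With PabT and PaT evaluated, we have:
$$P_b^T = 1 - P_a^T - P_{ab}^T$$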
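The three formulas can be checked with a short Python sketch (the function name is hypothetical). Feeding in the observed percentages from the equal-response-rate table should return the NHIS West values, to within rounding.

```python
def true_shares_equal_rates(pa_cell, pab_cell, pab_rdd):
    """Population shares (PaT, PabT, PbT) from observed frame shares,
    assuming equal response rates across segments."""
    p_ab = 1.0 / (1.0 / pab_cell + 1.0 / pab_rdd - 1.0)
    p_a = pa_cell * p_ab / pab_cell
    p_b = 1.0 - p_a - p_ab
    return p_a, p_ab, p_b


# Observed shares from the equal-response-rate table (NHIS West 2007 basis)
p_a, p_ab, p_b = true_shares_equal_rates(0.163, 0.837, 0.783)
print(f"PaT = {p_a:.3f}, PabT = {p_ab:.3f}, PbT = {p_b:.3f}")
```

The result lands within about .001 of the NHIS values (.132, .679, .189); the remaining gap comes from the rounded inputs.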
Combining dual-frame sample results when response rates are not equal
Three segments, four response rates
• Cell sample response rate for CPOs: ra
• Cell sample response rate for overlap: rab′
• RDD sample response rate for LLOs: rb
• RDD sample response rate for overlap: rab″
Four response rates, two response rate ratios
• The reduction in the base response rate for dual-phone users in the cell sample is:
$$r_1 = \frac{r'_{ab}}{r_a}$$
– This is the ‘response rate ratio’ that applies to the cellphone sample.
• The reduction in the base response rate for dual-phone users in the RDD sample is:
$$r_2 = \frac{r''_{ab}}{r_b}$$
– This is the response rate ratio for the RDD sample.
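It follows that . . .
$$r'_{ab} = r_1 r_a, \qquad r''_{ab} = r_2 r_b$$
• The cell sample now shows Pa′ = PaT/(PaT + r1·PabT) and the RDD sample shows Pab″ = r2·PabT/(r2·PabT + PbT), so our expressions for calculating the true population phone-service segments are modified by incorporating the response rate ratios:
$$P_{ab}^T = \frac{1}{\dfrac{r_1}{P'_{ab}} + \dfrac{r_2}{P''_{ab}} + 1 - r_1 - r_2}$$
$$P_a^T = \frac{P'_a \, r_1 P_{ab}^T}{P'_{ab}}$$
$$P_b^T = 1 - P_a^T - P_{ab}^T$$
• With r1 = r2 = 1 these reduce to the equal-response-rate formulas.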
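A hedged Python sketch of the modified formulas. The helper `observed_shares` is an assumption added only for checking: it applies the response rate ratios to known population shares to produce the percentages each frame would show (the base rates ra and rb cancel), and `true_shares` inverts that. The ratios .5 and .8 in the round-trip check are arbitrary illustrative values, not estimates.

```python
def observed_shares(p_a, p_ab, p_b, r1, r2):
    """Shares each frame would show, given population shares and the
    response rate ratios r1 (cell sample) and r2 (RDD sample)."""
    cell_total = p_a + r1 * p_ab  # CPO respond at ra, overlap at r1 * ra
    rdd_total = r2 * p_ab + p_b   # LLO respond at rb, overlap at r2 * rb
    return p_a / cell_total, r1 * p_ab / cell_total, r2 * p_ab / rdd_total


def true_shares(pa_cell, pab_cell, pab_rdd, r1, r2):
    """Invert observed frame shares into population shares (PaT, PabT, PbT)."""
    p_ab = 1.0 / (r1 / pab_cell + r2 / pab_rdd + 1.0 - r1 - r2)
    p_a = pa_cell * r1 * p_ab / pab_cell
    p_b = 1.0 - p_a - p_ab
    return p_a, p_ab, p_b


truth = (0.132, 0.679, 0.189)
# Round trip with arbitrary, illustrative ratios.
obs = observed_shares(*truth, r1=0.5, r2=0.8)
recovered = true_shares(*obs, r1=0.5, r2=0.8)
print([round(x, 3) for x in recovered])
```

Setting r1 = r2 = 1 in `true_shares` gives back the equal-response-rate estimator.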
How to calculate response rate ratios
• Now assume that we have observed results from a dual-frame phone survey.
• We also know the true population distribution.
• We can calculate the response rate ratios:
$$r_1 = \frac{P_a^T \, P'_{ab}}{P_{ab}^T \, P'_a}, \qquad r_2 = \frac{P_b^T \, P''_{ab}}{P_{ab}^T \, P''_b}$$
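In Python the two ratio formulas are one line each. This sketch plugs in the 2007 NHIS West shares and the CHIS 2007 observed shares reported on the following slides; Pb″ is taken as 1 − Pab″, since the RDD-sample shares sum to 1. The function name is hypothetical.

```python
def response_rate_ratios(p_a, p_ab, p_b, pa_cell, pab_cell, pab_rdd, pb_rdd):
    """r1 and r2 from known population shares and observed frame shares."""
    r1 = (p_a * pab_cell) / (p_ab * pa_cell)  # cell sample: overlap vs. CPO
    r2 = (p_b * pab_rdd) / (p_ab * pb_rdd)    # RDD sample: overlap vs. LLO
    return r1, r2


# NHIS West 2007 population shares vs. CHIS 2007 observed shares
r1, r2 = response_rate_ratios(
    p_a=0.132, p_ab=0.679, p_b=0.189,
    pa_cell=0.346, pab_cell=0.654,
    pab_rdd=0.683, pb_rdd=1 - 0.683,
)
print(round(r1, 3), round(r2, 3))
```

From these rounded inputs both ratios come out within about .002 of the reported r1 = .368 and r2 = .598.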
Deriving response rate ratios by comparing CHIS 2007 to NHIS
CHIS 2007: California Health Interview Survey

Segment    True values          Observed through    Observed through
           (NHIS West 2007)     cell sample         RDD sample
CPO        PaT  = 13.2%         Pa′  = 34.6%        -
Overlap    PabT = 67.9%         Pab′ = 65.4%        Pab″ = 68.3%
LLO        PbT  = 18.9%         -                   Pb″  = 31.7%
Total      100.0%               100.0%              100.0%

The observed Pa′ (34.6%) is well above the 16.3%, and Pb″ well above the 21.7%, that equal response rates would produce.
From these data we can evaluate r1 and r2
$$r_1 = \frac{P_a^T \, P'_{ab}}{P_{ab}^T \, P'_a} = .368, \qquad r_2 = \frac{P_b^T \, P''_{ab}}{P_{ab}^T \, P''_b} = .598$$
• In the cellphone sample, the overlap response rate is only 37% of the CPO rate.
• In the RDD sample, the overlap response rate is about 60% of the LLO rate.
• The overcount of CPOs is greater than the overcount of LLOs. This suggests that many dual-phone users still use their cellphone as a secondary device.
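A quick consistency check, sketched in Python under the model assumed here (overlap cases respond at r1 times the CPO rate in the cell sample and at r2 times the LLO rate in the RDD sample): applying the estimated ratios to the NHIS West shares should reproduce the CHIS observed percentages, while ratios of 1 give the equal-response-rate values. The function name is illustrative.

```python
def observed_shares(p_a, p_ab, p_b, r1, r2):
    """Frame-specific shares implied by population shares and r1, r2."""
    cell_total = p_a + r1 * p_ab
    rdd_total = r2 * p_ab + p_b
    return {
        "Pa'": p_a / cell_total,
        "Pab'": r1 * p_ab / cell_total,
        "Pab''": r2 * p_ab / rdd_total,
        "Pb''": p_b / rdd_total,
    }


nhis_west = (0.132, 0.679, 0.189)
for label, r1, r2 in (("equal rates", 1.0, 1.0), ("CHIS ratios", 0.368, 0.598)):
    shares = observed_shares(*nhis_west, r1, r2)
    print(label, {k: round(v, 3) for k, v in shares.items()})
```

With the CHIS ratios, the cell-sample CPO share roughly doubles, from about 16% to about 35%: this is the overcount the ratios are meant to capture.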
Calculating local area estimates of population phone-service segment distributions
2008 Prince William County Survey
• Citizen satisfaction survey in a large suburban county in Northern Virginia
• N = 1,666
• Triple-frame design: cellphone, landline RDD, and directory-listed samples
– Here we combine the landline samples and treat the design as dual-frame
• Screening questions patterned after those on CHIS
2008 Results for Prince William County, VA

Segment    Observed through    Observed through
           cell sample         RDD sample
CPO        Pa′  = 40.6%        0.7%
Overlap    Pab′ = 59.4%        Pab″ = 88.5%
LLO        -                   Pb″  = 10.5%
Total      100.0%              100.0%
Segment    True values          Observed through    Observed through
           for PWC              cell sample         RDD sample
CPO        PaT  = ??            Pa′  = 40.6%        0.7%
Overlap    PabT = ??            Pab′ = 59.4%        Pab″ = 88.5%
LLO        PbT  = ??            -                   Pb″  = 10.5%
Total      100.0%               100.0%              100.0%
Apply the formulas given above
Calculations based on: r1 = .368, r2 = .598
$$P_{ab}^T = \frac{1}{\dfrac{r_1}{P'_{ab}} + \dfrac{r_2}{P''_{ab}} + 1 - r_1 - r_2} = .753$$
$$P_a^T = \frac{P'_a \, r_1 P_{ab}^T}{P'_{ab}} = .190$$
$$P_b^T = 1 - P_a^T - P_{ab}^T = .057$$
2008 Results for Prince William County, VA

Segment    True values          Observed through    Observed through
           for PWC              cell sample         RDD sample
CPO        PaT  = 19.0%         Pa′  = 40.6%        0.7%
Overlap    PabT = 75.3%         Pab′ = 59.4%        Pab″ = 88.5%
LLO        PbT  = 5.7%          -                   Pb″  = 10.5%
Total      100.0%               100.0%              100.0%
2008 Albemarle County Survey
• Citizen satisfaction survey
• Suburban and rural county surrounding the City of Charlottesville, VA
• Triple-frame design similar to that of the PWC survey
• Smaller sample size: n = 700
2008 Results for Albemarle County, VA

Segment    Observed through    Observed through
           cell sample         RDD sample
CPO        Pa′  = 21.9%        0.2%
Overlap    Pab′ = 78.1%        Pab″ = 82.7%
LLO        -                   Pb″  = 17.2%
Total      100.0%              100.0%
Segment    True values          Observed through    Observed through
           for Albemarle        cell sample         RDD sample
CPO        PaT  = 8.4%          Pa′  = 21.9%        0.2%
Overlap    PabT = 81.4%         Pab′ = 78.1%        Pab″ = 82.7%
LLO        PbT  = 10.2%         -                   Pb″  = 17.2%
Total      100.0%               100.0%              100.0%
2008 Chesterfield County Survey
• Citizen satisfaction survey
• Suburban county adjacent to Richmond, VA
• Triple-frame design similar to that of the PWC survey
– Treated as dual-frame here
• n = 1,600
2008 Results for Chesterfield County, VA

Segment    Observed through    Observed through
           cell sample         RDD sample
CPO        Pa′  = 20.4%        0.1%
Overlap    Pab′ = 79.6%        Pab″ = 87.6%
LLO        -                   Pb″  = 12.4%
Total      100.0%              100.0%
Segment    True values          Observed through    Observed through
           for Chesterfield     cell sample         RDD sample
CPO        PaT  = 8.0%          Pa′  = 20.4%        0.1%
Overlap    PabT = 84.8%         Pab′ = 79.6%        Pab″ = 87.6%
LLO        PbT  = 7.2%          -                   Pb″  = 12.4%
Total      100.0%               100.0%              100.0%
Contrasting results

Segment         NHIS     CHIS        Prince     Albemarle   Chesterfield
                         [= NHIS]    William
CPO (PaT)       13.2%    13.2%       19.0%      8.4%        8.0%
Overlap (PabT)  67.9%    67.9%       75.3%      81.4%       84.8%
LLO (PbT)       18.9%    18.9%       5.7%       10.2%       7.2%
Total           100.0%   100.0%      100.0%     100.0%      100.0%
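The three county columns can be reproduced with a few lines of Python, assuming the same estimator and ratios as above (r1 = .368, r2 = .598). Only Pa′, Pab′ and Pab″ enter the formulas, so the small shares of CPO cases found in the RDD samples are not used. The function and variable names are illustrative.

```python
def true_shares(pa_cell, pab_cell, pab_rdd, r1, r2):
    """Population shares (PaT, PabT, PbT) under response rate ratios r1, r2."""
    p_ab = 1.0 / (r1 / pab_cell + r2 / pab_rdd + 1.0 - r1 - r2)
    p_a = pa_cell * r1 * p_ab / pab_cell
    p_b = 1.0 - p_a - p_ab
    return p_a, p_ab, p_b


R1, R2 = 0.368, 0.598  # ratios derived from CHIS 2007 vs. NHIS West 2007

# Observed shares (Pa', Pab', Pab'') from the 2008 county surveys
counties = {
    "Prince William": (0.406, 0.594, 0.885),
    "Albemarle": (0.219, 0.781, 0.827),
    "Chesterfield": (0.204, 0.796, 0.876),
}

estimates = {name: true_shares(*obs, R1, R2) for name, obs in counties.items()}
for name, (p_a, p_ab, p_b) in estimates.items():
    print(f"{name:15s} CPO {p_a:6.1%}  Overlap {p_ab:6.1%}  LLO {p_b:6.1%}")
```

Albemarle and Chesterfield match the table to one decimal. Prince William comes out at about 18.9%, 75.2% and 5.8%, within two tenths of a point of the reported values, which is consistent with rounding of the inputs.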
Using the estimated segment distribution to weight the sample data
Example: PWC 2008

                 Observed through    Observed through    Combined sample
                 cell sample         RDD sample          (unweighted)
Segment          n      %            n      %            n      %
CPO              76     40.6%        11     0.7%         87     5.3%
Overlap          111    59.4%        1303   88.5%        1414   85.4%
LLO              -      -            154    10.5%        154    9.3%
Total            187    100.0%       1468   100.0%       1655   100.0%
3-segment weights: PWC 2008

                 Combined sample     True value   Weight   Weighted   Weighted
                 (unweighted)        for PWC               N          %
Segment          n      %
CPO              87     5.3%         19.0%        3.61     314        19.0%
Overlap          1414   85.4%        75.3%        .88      1247       75.3%
LLO              154    9.3%         5.7%         .61      94         5.7%
Total            1655   100.0%       100.0%                1655       100.0%
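These 3-segment weights are simple ratio (cell-weighting, post-stratification-style) adjustments: every case in a segment gets the segment's estimated population share divided by its unweighted share in the combined sample. A minimal Python sketch with the PWC figures; `cell_weights` is a hypothetical name.

```python
def cell_weights(counts, targets):
    """Weight = target population share / unweighted sample share."""
    total = sum(counts.values())
    return {seg: targets[seg] / (n / total) for seg, n in counts.items()}


counts = {"CPO": 87, "Overlap": 1414, "LLO": 154}          # combined PWC sample
targets = {"CPO": 0.190, "Overlap": 0.753, "LLO": 0.057}  # estimated true shares

weights = cell_weights(counts, targets)
weighted_n = {seg: weights[seg] * n for seg, n in counts.items()}
for seg in counts:
    print(f"{seg:8s} weight {weights[seg]:.2f}  weighted N {weighted_n[seg]:.0f}")
```

With the rounded targets, the overlap weighted N comes out at about 1,246 rather than the 1,247 shown, a rounding difference; the weighted total still equals the unweighted 1,655.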
But wait . . . we have 4 segments

                 Observed through    Observed through    Combined sample
                 cell sample         RDD sample          (unweighted)
Segment          n      %            n      %            n      %
CPO              76     40.6%        11     0.7%         87     5.3%
Overlap via cell 111    59.4%        -      -            111    6.7%
Overlap via RDD  -      -            1303   88.5%        1303   78.7%
LLO              -      -            154    10.5%        154    9.3%
Total            187    100.0%       1468   100.0%       1655   100.0%
If the 2 frames split the overlap equally:

                 Combined sample     True value   Weight   Weighted   Weighted
                 (unweighted)        for PWC               N          %
Segment          n      %
CPO              87     5.3%         19.0%        3.61     314        19.0%
Overlap via cell 111    6.7%         37.7%        5.62     623        37.7%
Overlap via RDD  1303   78.7%        37.7%        .48      623        37.7%
LLO              154    9.3%         5.7%         .61      94         5.7%
Total            1655   100.0%       100.0%                1655       100.0%
If the overlap-via-cell segment gets weight = 2:

                 Combined sample     True value   Weight   Weighted   Weighted
                 (unweighted)        for PWC               N          %
Segment          n      %
CPO              87     5.3%         19.0%        3.61     314        19.0%
Overlap via cell 111    6.7%                      2.00     222        13.4%
Overlap via RDD  1303   78.7%                     .79      1025       61.9%
LLO              154    9.3%         5.7%         .61      94         5.7%
Total            1655   100.0%       100.0%                1655       100.0%

The true value of 75.3% for overlap applies to the two overlap rows combined.
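The two overlap-allocation options can be compared with a short sketch. Under the first, the overlap target (75.3%) is split evenly between overlap cases reached through each frame; under the second, overlap-via-cell cases get a fixed weight (2 here) and overlap-via-RDD cases absorb the rest of the overlap target. `overlap_split_weights` and its parameters are hypothetical.

```python
def overlap_split_weights(counts, cpo_target, overlap_target, llo_target,
                          cell_overlap_weight=None):
    """Weights for CPO, overlap via cell, overlap via RDD, and LLO.

    If cell_overlap_weight is None, the overlap target is split equally
    between the two frames; otherwise overlap-via-cell cases get that fixed
    weight and overlap-via-RDD cases take the rest of the overlap target."""
    total = sum(counts.values())
    share = {seg: n / total for seg, n in counts.items()}
    if cell_overlap_weight is None:
        cell_target = rdd_target = overlap_target / 2
    else:
        cell_target = cell_overlap_weight * share["Overlap via cell"]
        rdd_target = overlap_target - cell_target
    targets = {
        "CPO": cpo_target,
        "Overlap via cell": cell_target,
        "Overlap via RDD": rdd_target,
        "LLO": llo_target,
    }
    return {seg: targets[seg] / share[seg] for seg in counts}


counts = {"CPO": 87, "Overlap via cell": 111, "Overlap via RDD": 1303, "LLO": 154}
equal = overlap_split_weights(counts, 0.190, 0.753, 0.057)
fixed = overlap_split_weights(counts, 0.190, 0.753, 0.057, cell_overlap_weight=2.0)
for seg in counts:
    print(f"{seg:17s} equal split {equal[seg]:.2f}   fixed weight {fixed[seg]:.2f}")
```

The fixed-weight option keeps the small overlap-via-cell group from being weighted up by a factor of more than 5, in line with the caution about weighting cellphone samples up too much.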
In Summary . . .
Problem and solution
• We don’t have ‘gold standard’ data by which to weight the results of a dual-frame telephone survey in a local area
• Weighting to national or state averages might not be accurate
• We developed the needed formulas relating observed percentages to the underlying population phone-segment distribution
• We calculated ‘response rate ratios’ by comparing CHIS 2007 to regional NHIS 2007 results
• We applied these ratios to calculate the underlying distributions in three local telephone surveys
Results
• The estimates for three suburban counties in Virginia differ considerably from the national phone-segment distribution, and from each other
– Cellphone penetration is higher in Northern Virginia than in the downstate suburbs or in the national estimates
– Fewer people in the downstate suburbs have adopted the CPO lifestyle
• The estimates can guide the weighting of sample data
– But we must use caution not to weight our cellphone samples up too much
– Larger cellphone samples are needed in the future
Future research
• This is a time of rapid change in the telephone system
– We are just learning how to deal with the weighting issues in cellphone surveys
• We need to look at the optimization of our dual-frame designs (cf. Hartley 1962)
• Estimates of response rate ratios can be updated by comparing more current national phone surveys to NHIS
• Results would be strengthened if external local data were available to validate the estimates