Online use of randomized response measurements – C.Snijders, J.Weesie. GOR, Hamburg, March 10-12, 2008
The Online Use of Randomized Response Measurements
Chris Snijders
Eindhoven University of Technology
The Netherlands
Jeroen Weesie
Utrecht University
The Netherlands
Questions in surveys
• Surveys are one of the standard instruments of the social scientist
• You ask for behavior, attitudes, characteristics, etc.
• Big problem: non-response (especially among firms), so you get selective responses (cf. Dutch elections)
• Many surveys are now conducted online, after an invitation by email, banner, etc.
Internet surveys
• Seem to work relatively well
• Except for sensitive questions (which were problematic off-line as well)
• Social desirability bias: the tendency to report about oneself in a favourable manner or in accordance with local norms (Edwards, 1957)
Getting rid of social desirability bias
• Indirect questions
• "Covariate technique": Marlowe-Crowne scale (MCSDS)
• Lie-detector (!; this does not seem to work that well online)
• stress-reduction through question wording ("everybody does things they later regret ...")
• randomized response
Sensitive questions in surveys
For instance, questions about
– criminal behavior
– sexual preferences
– monetary issues
– ...
Two major concerns
– Survey drop-out
– Useless answers (respondents do not admit to behavior that is likely to be considered inappropriate or weird)
Throughout
ONLY BINARY RESPONSE VARIABLES
(0/1)
YES = ADMITTING TO THE SENSITIVE ISSUE
See, e.g., Warner, 1965; Kuk, 1990; Chaudhuri & Mukerjee, 1988; Fox, 1986
Basic idea (here you see the forced response method):
Did you cheat on your tax-return last year?
Respondent is instructed to roll two dice:
if 2, 3 or 4 : reply YES
if 11 or 12 : reply NO
otherwise : tell truth
Possible solution: Randomized Response
Randomized Response
• Allows estimation of group averages (if respondents follow the protocol).
• Protocols other than using dice are possible, e.g., using a question such as:
If your mother's birthday is in Jan, Feb, Mar : YES
If your mother's birthday is in Nov, Dec : NO
Otherwise : truth
Did you cheat on your tax-return last year?
Respondent is instructed to roll two dice:
if 2, 3 or 4 : reply YES
if 11 or 12 : reply NO
otherwise : tell truth
The estimate, 4/3*(Obs – 1/6), might be negative!
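The arithmetic behind the dice protocol above can be sketched in a few lines of Python. With two fair dice, a sum of 2, 3 or 4 has probability 6/36 and a sum of 11 or 12 has probability 3/36, so the observed share of YES is 1/6 + 3/4 times the true share. The function name and the example shares below are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Two fair dice: sums 2, 3, 4 force YES; sums 11, 12 force NO.
P_FORCED_YES = Fraction(6, 36)              # 1/6
P_FORCED_NO = Fraction(3, 36)               # 1/12
P_TRUTH = 1 - P_FORCED_YES - P_FORCED_NO    # 3/4


def forced_response_estimate(observed_yes_share):
    """Estimate the true share of YES from the observed share of YES.

    Inverts Obs = 1/6 + 3/4 * true, assuming everyone follows the protocol.
    """
    return (observed_yes_share - P_FORCED_YES) / P_TRUTH


# Hypothetical observed shares, for illustration only.
print(forced_response_estimate(Fraction(1, 4)))    # 1/9, about 11%
print(forced_response_estimate(Fraction(1, 10)))   # below zero: Obs < 1/6
```

Any observed share below 1/6 yields a negative estimate, which can happen through sampling noise or because respondents do not follow the protocol.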
Other ways to do randomized response
• Roll the dice. If you roll 2 through 7 answer question number 1, otherwise answer question 2:
1) I own an illegal copy of Microsoft Office. correct / not correct
2) I do not own an illegal copy of Microsoft Office. correct / not correct
Or:
• How many of the following issues pertain to you:
Version 1
- you own a laptop
- you like country music
- you own a motor-cycle
- you play a musical instrument
Version 2
- you own a laptop
- you like country music
- you own a motor-cycle
- you play a musical instrument
- you own an illegal copy of Microsoft Office
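Both alternatives above come with simple group-level estimators. The sketch below is a minimal illustration, not taken from the slides: it assumes two fair dice (so question 1 is answered with probability 21/36), assumes respondents are randomly split between the two list versions, and uses hypothetical function names and data.

```python
from fractions import Fraction

# Two fair dice: P(sum is 2 through 7) = 21/36 = 7/12.
P_STATEMENT_1 = Fraction(21, 36)


def warner_estimate(share_correct, p_first=P_STATEMENT_1):
    """Share for whom statement 1 is true, from the observed share of 'correct'.

    P(correct) = p * pi + (1 - p) * (1 - pi), solved for pi.
    Requires p != 1/2, otherwise the answers carry no information.
    """
    return (share_correct - (1 - p_first)) / (2 * p_first - 1)


def item_count_estimate(counts_short_list, counts_long_list):
    """Difference in mean counts between the five-item and four-item lists."""
    mean_short = sum(counts_short_list) / len(counts_short_list)
    mean_long = sum(counts_long_list) / len(counts_long_list)
    return mean_long - mean_short


# Hypothetical data, for illustration only.
print(warner_estimate(Fraction(11, 24)))                   # 1/4
print(item_count_estimate([1, 2, 1, 2], [2, 2, 1, 3]))     # 0.5
```

In the two-statement design the denominator 2p - 1 is only 1/6 with these dice, so sampling noise in the observed share is multiplied by six; that is the price of the extra privacy.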
Randomized Response
• Main use: dichotomous variables (yes-no)
• Two kinds of studies:
– With an objective control
– Without an objective control, we assume higher observed percentages are better measurements
• RR improves results (in paper-and-pencil surveys; Edgell et al., 1982; Lensvelt-Mulders et al., 2005), but is still far from perfect
Randomized Response
But ... not often used, because:
• Necessary sample size is larger (typically 750 or more, given a prevalence of 7%)
• Wide-spread myth that analyses at the individual level are impossible.
And
• Most of the evidence is based on off-line research
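Why the necessary sample size grows can be seen from the standard error of the estimate. The sketch below assumes the two-dice forced response design (forced YES with probability 1/6, truth with probability 3/4), full compliance, and simple random sampling; it does not attempt to reproduce the figure of 750, whose derivation is not given in the slides. Function names are hypothetical.

```python
import math

P_FORCED_YES = 1 / 6
P_TRUTH = 3 / 4


def se_direct(prevalence, n):
    """Standard error of a directly asked proportion."""
    return math.sqrt(prevalence * (1 - prevalence) / n)


def se_forced_response(prevalence, n):
    """Standard error of the forced response estimate (Obs - 1/6) / (3/4)."""
    lam = P_FORCED_YES + P_TRUTH * prevalence   # P(observed YES)
    return math.sqrt(lam * (1 - lam) / n) / P_TRUTH


def sample_size_ratio(prevalence):
    """Factor by which n must grow for RR to match the direct standard error."""
    return (se_forced_response(prevalence, 1) / se_direct(prevalence, 1)) ** 2


print(round(se_direct(0.07, 750), 4))
print(round(se_forced_response(0.07, 750), 4))
print(round(sample_size_ratio(0.07), 2))
```

Under these assumptions, at a prevalence of 7% forced response needs between four and five times as many respondents as a direct question to reach the same precision. The comparison ignores the bias that direct questioning may suffer from, which is the reason for using RR in the first place.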
Individual level (logistic) regressions
... are possible, but that is not common knowledge.
1. Stata
2. SPSS www.randomizedresponse.nl, search for HLanalyse.pdf (in Dutch, unfortunately)
capture program drop rrlogit_lf
program rrlogit_lf
    args lnf lp
    tempvar p
    quietly {
        * logistic probability of truly having the sensitive trait
        gen double `p' = exp(`lp')/(1 + exp(`lp'))
        * forced response with two dice: P(yes) = 1/6 + (3/4)*P(trait)
        replace `p' = (1/6) + (3/4)*`p'
        replace `lnf' = ($ML_y1==1)*log(`p') + ($ML_y1==0)*log(1-`p')
    }
end

ml clear
ml model lf rrlogit_lf (y = x1 x2)
ml max
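For readers without Stata, the same likelihood can be sketched in plain Python. This is an illustrative analogue, not the authors' implementation: it assumes the two-dice forced response design, uses plain gradient ascent instead of Stata's `ml` optimizer, and all function names and data are hypothetical.

```python
import math

P_FORCED_YES = 1 / 6   # dice sum 2, 3 or 4
P_TRUTH = 3 / 4        # dice sum 5 through 10


def _yes_prob(beta, row):
    """Return (P(trait), P(observed yes)) for one respondent."""
    lp = sum(b * x for b, x in zip(beta, row))
    p = 1 / (1 + math.exp(-lp))
    return p, P_FORCED_YES + P_TRUTH * p


def rr_loglik(beta, X, y):
    """Log-likelihood; each row of X starts with 1.0 for the intercept."""
    total = 0.0
    for row, yi in zip(X, y):
        _, lam = _yes_prob(beta, row)
        total += math.log(lam) if yi == 1 else math.log(1 - lam)
    return total


def rr_logit_fit(X, y, steps=5000, rate=1.0):
    """Maximize the log-likelihood by gradient ascent (a simple sketch)."""
    n, k = len(y), len(X[0])
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            p, lam = _yes_prob(beta, row)
            # d loglik / d lp = (y - lam) / (lam (1 - lam)) * (3/4) p (1 - p)
            weight = (yi - lam) / (lam * (1 - lam)) * P_TRUTH * p * (1 - p)
            for j, x in enumerate(row):
                grad[j] += weight * x
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: intercept only, 40 of 100 respondents answer YES.
X = [[1.0]] * 100
y = [1] * 40 + [0] * 60
beta = rr_logit_fit(X, y)
prevalence = 1 / (1 + math.exp(-beta[0]))
print(round(prevalence, 3))   # close to 4/3 * (0.40 - 1/6)
```

With only an intercept the fitted prevalence coincides with the group-level estimate, which is a convenient sanity check. The fixed step size is an assumption that suits this small example; with several covariates a proper optimizer would be the safer choice.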
Randomized Response online
• Might work: replying online already makes you feel more anonymous; combined with RR you feel even more anonymous, and hence do not mind answering sensitive questions
• Might not work:
– Implementation online is non-trivial
– Since online already makes one feel anonymous, the loss in precision might not be compensated for
– Respondents might "play it safe" and not follow protocol
Design
Population: Internet panel in the Netherlands - EuroClix
Sensitive questions in four conditions (n = 3,557 completed surveys)
A  direct [control condition]               n = 1,078
B  dice embedded in the survey              n = 910
C  "downloadable dice"                      n = 679
D  optional rand. response (if yes: as B)   n = 890
Question ...
Sensitive questions about three topics
1. Behavior in surveys
2. Traffic violations
3. Illegal copies of software / movies / music
Behavior in surveys
S1: At PanelClix I am registered under more than just one name.
S2: I fill out the surveys without really reading what they ask me.
S3: In the past two weeks, I filled out 4 or more surveys from PanelClix
S4: I sometimes fill out surveys under the id of another PanelClix member
S5: I sometimes let somebody else fill out surveys under my id.
S6: I sometimes lie about personal characteristics in a PanelClix survey
S7: When I have to respond to large numbers of statements I sometimes just rush through the answers.
S8: I am what you could call "a professional respondent"
S9: Almost always I leave open questions blank.
Traffic violations
T1: Have you had a speeding ticket in the past 3 months?
T2: Do you ever drive faster than 100 where only 50 is allowed?
T3: Do you ever drive a car or motorcycle when you know you have had too much to drink?
T4: Did you ignore a red traffic light in the past week (by car or motorcycle)?
T5: On the highway I tend to drive closely behind the car in front of me, so that they will get out of my way (tailgating, Dutch: "bumperkleven").
T6: Have you ever damaged the vehicle of somebody else without reporting it?
T7: In the past two months, have you driven faster than 150 km/h with a car or motorcycle?
T8: In the past two months, did you park in a place where you had to pay, but paid less than you had to, or nothing at all?
Illegal music, movies and software
IC1: I own copied music for which I have not paid although I should have
IC2: I own copied movies for which I have not paid although I should have
IC3: I own copied software for which I have not paid although I should have
IC4: I have passed on copied music, movies or software to others so that they do not have to pay for it, although I know they should have just bought it.
IC5: I have an illegal copy of Microsoft Windows in my possession.
IC6: Whenever possible, I try to get commercial software without having to pay for it.
IC7: The largest part of my music collection is actually illegal.
IC8: I have an illegal copy of Microsoft Office in my possession.
Do you understand ...
...what you are supposed to do?
No clue 4%
Not really 3%
I think I do 39%
Completely clear 55%
...what the purpose of the procedure is?
[Figure: density of the answers to "Understand usefulness of procedure", on a scale from 0 to 10]
Some data cleaning is necessary ...
Completion rates per condition
(Mean time for survey-completion 15 minutes)
(given that respondent started survey)
A: direct 85.5%
B: RR embedded 80.7%
C: RR download 62.2%
D: RR optional 78.5%
So downloadable dice cost about 16 to 19 percentage points of the completion rate relative to the other RR conditions (and about 23 points relative to direct questioning)
Also: nine indirect questions
Out of 100 people, how many ...
(behavior in surveys) S6, S2, S4
(traffic violations) T2, T3, T4
(illegal copies) IC1, IC5, IC8
And several covariates:
- age
- gender
- computer literacy
- ...
[for those in the control condition]
Indirect questions correlate with the direct question scores,
but are not strong enough predictors to actually predict the behavioral data.
Results: surveys
      Direct  RRemb  RRdown  RRoptN  RRoptY
S1       6     -6      -2       4       0
S2       2     -7      -8       2      -4
S3      30     30      31      30      29
S4       2      9     -11       1      -9
S5       2      8     -11       1     -11
S6       4      4      -5       5      -6
S7      27     24      29      32      27
S8       9      4      -1       9      -1
S9      27     24      26      25      25
NB Estimate = 4/3*(Obs – 1/6)
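The negative entries in the table follow directly from this formula. As a worked example (assuming the table entries are percentages rounded to whole numbers), an estimate is negative exactly when the observed share of "yes" answers falls below the forced-yes probability of 1/6:

```latex
\hat{\pi} = \tfrac{4}{3}\left(\mathrm{Obs} - \tfrac{1}{6}\right) < 0
\quad\Longleftrightarrow\quad
\mathrm{Obs} < \tfrac{1}{6} \approx 16.7\%

% Example: S1 under RRemb, with an estimate of about -6%
\mathrm{Obs} = \tfrac{1}{6} + \tfrac{3}{4}\,(-0.06) \approx 0.122
```

So in that cell only about 12% of respondents answered "yes", fewer than the dice alone should produce if everyone followed the protocol.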
Results: traffic
      Direct  RRemb  RRdown  RRoptN  RRoptY
T1      16     10      11      18       8
T2       3     -1       2       6       0
T3      12      9       2      10       1
T4      31     32      26      37      34
T5      16     14      14      19      13
T6       6     -1       1       7       0
T7      20     15       1      19      12
T8      38     32       6      34      37
Results: illegal copies
      Direct  RRemb  RRdown  RRoptN  RRoptY
IC1     62     63      60      62      70
IC2     43     44      46      44      55
IC3     47     53      44      44      51
IC4     43     46      49      44      51
IC5     14     13      13      16       9
IC6     46     51      49      44      56
IC7     26     26      24      25      22
IC8     21     24      24      22      15
This does not seem to work that well...
• The group of people who is convinced by the Randomized Response method is not large enough
• Or ... respondents are not following the protocol!
Who are not following the protocol?
7.0% of respondents have 2 "yes"-answers or fewer
2.4% give no "yes"-answers at all
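A rough plausibility check shows why very few "yes" answers are suspicious. The sketch below rests on assumptions that the slides do not state: that a respondent answers all 25 listed statements (9 + 8 + 8) under forced response, that the dice are rolled independently for each statement, and that the per-item chance of a "yes" is at its minimum of 1/6 (every truthful answer would be "no"). The function name is hypothetical.

```python
from math import comb


def prob_at_most_k_yes(n_items, k, p_yes):
    """Binomial probability of at most k 'yes' answers among n_items."""
    return sum(
        comb(n_items, j) * p_yes**j * (1 - p_yes) ** (n_items - j)
        for j in range(k + 1)
    )


# Upper bounds for a fully compliant respondent, under the assumptions above.
print(round(prob_at_most_k_yes(25, 0, 1 / 6), 4))   # no "yes" at all
print(round(prob_at_most_k_yes(25, 2, 1 / 6), 4))   # two "yes" or fewer
```

Under these assumptions a compliant respondent gives no "yes" at all with a probability of at most about 1%, and any truthful "yes" pushes that probability further down. How this compares with the observed shares depends on which conditions those percentages were computed over, which the slide does not specify.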
Logistic regression on 2 "yes"-answers or fewer (sign of the effect)
Age +
Female +
Education -
Computer literacy 1 (podcasts, RSS etc.) 0
Computer literacy 2 (basic internet skills) -
Understand how 0
Understand why -
Randomized response: non-compliance
Three types of respondents:
1) Honest yes =
2) Honest no =
3) Cheater: has yes, but says no =
And: they add up to 1, and we are interested in
Assumption: if ordered to say no, all do so
Then we have:
Direct questions :
Indirect questions :
Note: the idea itself is old (and not mine)
The idea for this is in Clarke 1998.
Downloadable from
http://chrissnijders.com/tempback/Clarke1998.pdf
Taking non-compliance into account
      Direct  RRemb  RRdown  RRoptN  RRoptY  RRadjNC
IC1     62     63      60      62      70       84
IC2     43     44      46      44      55       61
IC3     47     53      44      44      51       72
IC4     43     46      49      44      51       63
IC5     14     13      13      16       9       22
IC6     46     51      49      44      56       69
IC7     26     26      24      25      22       38
IC8     21     24      24      22      15       36
Conclusions: Randomized Response online
• Not much support for Randomized Response (with the Forced Response method) for these particular topics if non-compliance is not taken into account.
For illegal software we find small positive effects. Larger effects if non-compliance is taken into account.
• Some indication that RR works better as the sensitivity of the topic increases
Conclusions (2)
• Quick and dirty result: compliance with the protocol is more common among younger, male, highly educated, computer-literate respondents (who understand what RR is for).
• Allowing for optional Randomized Response does not seem to work very well; perhaps some support with the illegal copying topic
• Downloadable dice – not a good idea
?
So there is good and bad news ...
• The Internet comes with high openness, which makes RR less necessary.
• For really sensitive behavior, RR can be conducted online and analyzed relatively easily ...
• ... but compliance with the protocol is a major issue and has to be explicitly modeled (with different kinds of non-compliance, analyses become less straightforward)
Possible assignments
First review the literature on Randomized Response measurement (given online).
Either:
1) For the fanatics: Run a mini-survey on some topic that is sensitive but interests you (help will be provided). Try to come up with different ways to measure the topic of interest.
2) Design an experiment to further test the use of randomized response measurement. For instance, compare different methods.
3) Give a brief overview of randomized response measurement, and come up with a large set of questions that can be used as randomizers (such as "in which month was your mother born?")
Results: traffic
      RRnCOMPL  Direct  RRemb  RRdown  RRoptN  RRoptY
T1       72       16     10      11      18       8
T2       69        3     -1       2       6       0
T3       55       12      9       2      10       1
T4       34       31     32      26      37      34
T5       42       16     14      14      19      13
T6       52        6     -1       1       7       0
T7       65       20     15       1      19      12
T8       33       38     32       6      34      37