How to seamlessly integrate data quality measures into your questionnaire


Post on 17-Jul-2015


Page 1

How to seamlessly integrate data quality measures into your questionnaire

Presented by Annie Pettit, Chief Research Officer at Peanut Labs

Please tweet! @LoveStats @PeanutLabsMedia

Page 2

How to distinguish between these people?

When answering surveys, good, honest people will naturally make a mistake or two.

But incentive-getting, satisficing cheaters make lots of mistakes.

Page 3

Multi-Select Questions

Red Herrings

High/Low Incidence

Over/Under Clicking

Following Instructions

Page 4

Red Herrings

• Question Type: Multi-selects, Single-selects, Rating scales

• Concern: Respondents may have misread or confused a name. It may be a real name!

• Cheaters: Incentive getting. Responder is providing the minimum amount of information required to proceed.

• Application: Choose at least TWO fake names and Google them to ensure they are extremely low incidence.

• Scoring: Flag answers in sets of two.

Which of the following stores have you visited in the past 3 months? (Please select all that apply)

Abercrombie & Fitch
Aéropostale
American Apparel
American Eagle Outfitters
Anthropologie
Bellamis New York
Bloomingdale's
Brooks Brothers
Club Forenzo
Coldwater Creek
Dillard's
DKNY
Other
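The red herring scoring can be sketched in a few lines of Python. This is only one reading of "flag answers in sets of two": a single fake pick is forgiven and each pair of fake picks earns one flag. The names `FAKE_STORES` and `red_herring_flags` are hypothetical, and treating "Bellamis New York" and "Club Forenzo" as the planted names is an assumption (they differ from Barneys New York and Club Monaco in the otherwise matching list on the Overclicking slide).

```python
# Hypothetical sketch: the planted names below are an assumption, not confirmed by the slides.
FAKE_STORES = {"Bellamis New York", "Club Forenzo"}

def red_herring_flags(selected):
    """Return one flag per pair of fake names ticked; a lone fake pick is forgiven."""
    fake_picks = len(FAKE_STORES & set(selected))
    return fake_picks // 2

careful = ["Dillard's", "DKNY"]
one_slip = ["Dillard's", "Club Forenzo"]
suspect = ["Bellamis New York", "Club Forenzo", "DKNY"]
print([red_herring_flags(r) for r in (careful, one_slip, suspect)])  # prints [0, 0, 1]
```

Forgiving a single slip matches the concern on the slide that an honest respondent may simply have confused a name.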

Page 5

High Incidence

• Question Type: Multi-selects

• Concern: Respondents are tired or bored.

• Cheaters: Satisficing. Responder is providing the minimum amount of information required to proceed.

• Application: Works best on longer lists. Include at least a few answers that ought to be selected by everyone.

• Scoring: Flag people with the fewest clicks for very common responses.

Which of the following activities have you participated in during the last 3 months? (Please select all that apply)

Attended a sporting event
Attended a music event
Exercised
Listened to music
Used the internet
Visited a community center
Visited a library
Watched TV
Went to a grocery store
Went to the movies
Went to school
Went to work
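A minimal sketch of the high incidence check, in Python. Which activities count as "very common" and how few clicks is too few are both assumptions here: the set `COMMON` and the `min_common=2` threshold are illustrative and should be tuned against real data.

```python
# Assumed near-universal activities; the slide does not say which ones to use.
COMMON = {"Listened to music", "Used the internet", "Watched TV", "Went to a grocery store"}

def high_incidence_flag(selected, min_common=2):
    """Return 1 when fewer than `min_common` near-universal activities are ticked."""
    return int(len(COMMON & set(selected)) < min_common)

print(high_incidence_flag(["Exercised"]))
print(high_incidence_flag(["Watched TV", "Used the internet", "Went to work"]))
```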

Page 6

Low Incidence

• Question Type: Multi-selects

• Concern: Respondents are thinking of people other than themselves.

• Cheaters: Incentive getting. They may be trying to avoid screening out so they will qualify for an incentive.

• Application: Incorporate at least 3 extremely low incidence options. Rare doesn't happen in pairs.

• Scoring: Flag people who select two or more rare answers.

Which of the following ailments do you have? (Please select all that apply)

Acid Reflux
Acidosis
Adrenal Disorders
AIDS
Alzheimer's Disease
Amyloidosis
Anemia
Anorexia Nervosa
Arteriosclerosis
Autism
Blood Pressure (High)
Bronchitis
Cancer
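The "two or more rare answers" rule translates directly into code. In this Python sketch the contents of `RARE` are an assumption made for illustration; which ailments are genuinely low incidence should be checked against published incidence figures.

```python
# Assumed rare options; verify against real incidence data before use.
RARE = {"Acidosis", "Adrenal Disorders", "Amyloidosis"}

def low_incidence_flag(selected):
    """Return 1 when two or more rare options are ticked."""
    return int(len(RARE & set(selected)) >= 2)

print(low_incidence_flag(["Acid Reflux", "Amyloidosis"]))
print(low_incidence_flag(["Acidosis", "Amyloidosis", "Anemia"]))
```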

Page 7

Underclicking

• Question Type: Multi-selects

• Concern: Respondents are tired or bored.

• Cheaters: Satisficing. Responder is providing the minimum amount of information required to proceed.

• Application: Works best on longer lists. Include at least a few answers that ought to be selected by everyone.

• Scoring: Flag people with the fewest clicks across the whole question.

Which of the following stores have you visited in the past 3 months? (Please select all that apply)

Albertsons
Big Lots
Costco
CVS
Dollar General
Family Dollar
Kroger
Publix
Safeway
Target
Walgreens
Walmart

Page 8

Overclicking

• Question Type: Multi-selects

• Concern: They may be thinking of their household or family members.

• Cheaters: Incentive getting. They may be trying to qualify for the incentive.

• Application: Works best on longer questions. Ensure there are options that don’t really go together.

• Scoring: Flag people with the most clicks across the whole question.

Which of the following stores have you visited in the past 3 months? (Please select all that apply)

Abercrombie & Fitch
Aéropostale
American Apparel
American Eagle Outfitters
Anthropologie
Barneys New York
Bloomingdale's
Brooks Brothers
Club Monaco
Coldwater Creek
Dillard's
DKNY
Eddie Bauer
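Overclicking and underclicking are both scored from the number of boxes each respondent ticked, so one helper can cover both tails. The Python sketch below is hypothetical: the slides do not say how many respondents to flag, so the 5% `share` is an assumption, as is the function name `click_flags`.

```python
def click_flags(click_counts, share=0.05, most=True):
    """Flag roughly `share` of respondents with the most (or fewest) ticks.

    Ties at the cutoff are all flagged, so slightly more than `share` can be flagged.
    """
    n_flag = max(1, round(len(click_counts) * share))
    cutoff = sorted(click_counts, reverse=most)[n_flag - 1]
    if most:
        return [int(c >= cutoff) for c in click_counts]
    return [int(c <= cutoff) for c in click_counts]

# Number of boxes ticked by each of 20 made-up respondents.
counts = [2, 3, 1, 4, 2, 13, 3, 2, 5, 3, 2, 1, 4, 3, 2, 3, 2, 4, 3, 2]
over = click_flags(counts)               # overclicking: most ticks
under = click_flags(counts, most=False)  # underclicking: fewest ticks
print(over)
print(under)
```

Because the cutoff is relative to the other respondents, it can only be computed once all completes are in.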

Page 9

Following Instructions

• Question Type: Unvalidated multi-selects

• Concern: Respondents may have misunderstood the task, or want to provide a ‘more accurate’ answer.

• Cheaters: Satisficing. The responder wants to finish and get their incentive.

• Application: Word questions so it makes sense to ask for exactly 2 or 3 choices. Don’t validate!

• Scoring: Flag anyone who chooses more or fewer responses than requested.

Which three of the following stores did you visit most often? (Please select only 3)

Abercrombie & Fitch
Aéropostale
American Apparel
American Eagle Outfitters
Anthropologie
Bellamis New York
Bloomingdale's
Brooks Brothers
Club Forenzo
Coldwater Creek
Dillard's
DKNY
Other
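Since the survey software is deliberately not validating the count, the check happens afterwards in the data. A minimal Python sketch, with the hypothetical helper name `instruction_flag`:

```python
def instruction_flag(selected, requested=3):
    """Return 1 when the number of distinct choices differs from the number requested."""
    return int(len(set(selected)) != requested)

print(instruction_flag(["DKNY", "Dillard's", "Anthropologie"]))
print(instruction_flag(["DKNY", "Dillard's"]))
print(instruction_flag(["DKNY", "Dillard's", "Anthropologie", "Other"]))
```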

Page 10

Don’t Know

• Question Type: Multi-selects, Single-selects

• Concern: They probably do know if they would think about it a little bit.

• Cheaters: Satisficing. Responder is providing the minimum amount of information required to proceed.

• Application: Follow good survey design practice of including DK wherever it is possible.

• Scoring: Count how many times they select ‘Don’t Know’ across the survey.

Which of the following stores have you visited in the past 3 months? (Please select all that apply)

Abercrombie & Fitch
Aéropostale
American Apparel
American Eagle Outfitters
Anthropologie
Barneys New York
Bloomingdale's
Brooks Brothers
Club Monaco
Coldwater Creek
Dillard's
DKNY
Don’t know
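Counting "Don’t know" answers across a whole survey could look like the Python sketch below. The data layout (a dict mapping question IDs to either a string or a list of strings) and the helper names are assumptions for illustration; the slide does not say how many DK answers should count as too many.

```python
def is_dk(answer):
    """True if a single-select answer, or any multi-select tick, is 'Don't know'."""
    values = answer if isinstance(answer, list) else [answer]
    # Normalize case and curly apostrophes so "Don’t Know" and "don't know" match.
    return any(v.strip().lower().replace("’", "'") == "don't know" for v in values)

def dont_know_count(answers):
    """Count questions answered with 'Don't know' across one respondent's survey."""
    return sum(is_dk(a) for a in answers.values())

respondent = {"Q1": ["DKNY", "Don’t know"], "Q2": "Don't know", "Q3": "Never"}
print(dont_know_count(respondent))  # prints 2
```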

Page 11

Grid Rating Questions

Straightlining

Page 12

Straightlining

• Question Type: Multi-select ratings

• Concern: Respondents may have only glanced at the items, not noticed reverse keyed items.

• Cheaters: Satisficing. Get in and get out!

• Application: Be sure to include positively and negatively keyed items.

• Scoring: Measure every grid for patterns: vertical, diagonal, repetitive. Each straightline generates a flag.

What is your opinion about each of these statements?

Tastes good       o o o o o
Smells tempting   o o o o o
Feels nice        o o o o o
Looks pretty      o o o o o

[Figure: example response patterns labeled Type A, Type B, and Type C]
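A rough Python sketch of pattern detection for one grid. It assumes each row's answer is stored as a column number from 1 to 5, and it reads "vertical" as the same column in every row, "diagonal" as a one-column step per row, and "repetitive" as a two-row zigzag. Those readings, the four-row minimum, and the name `straightline_flag` are assumptions rather than the presenter's definitions.

```python
def straightline_flag(ratings):
    """ratings: chosen column (1-5) for each grid row, top to bottom. Returns 0 or 1."""
    if len(ratings) < 4:
        return 0  # too few rows to call anything a pattern
    steps = [b - a for a, b in zip(ratings, ratings[1:])]
    vertical = all(s == 0 for s in steps)
    diagonal = all(s == 1 for s in steps) or all(s == -1 for s in steps)
    zigzag = ratings[0] != ratings[1] and all(
        ratings[i] == ratings[i - 2] for i in range(2, len(ratings))
    )
    return int(vertical or diagonal or zigzag)

for grid in ([3, 3, 3, 3], [1, 2, 3, 4], [1, 5, 1, 5], [2, 4, 3, 5]):
    print(grid, straightline_flag(grid))
```

On a short grid an honest respondent can produce one of these shapes by chance, which is one reason to count flags across several measures rather than act on a single grid.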

Page 13

“Select the third answer”

• Question Type: Multi-Select Rating

• Concern: It’s confusing. It creates suspicion. SQUIRREL!

• Cheaters: Satisficing. Answer quickly and get your incentive.

• Application: Ask responders to choose a specific answer, “Select Somewhat Disagree in this row”

• Scoring: Don’t! This is a terrible measure!

What is your opinion about each of these statements?

Scale: Agree Strongly, Agree Somewhat, Neutral, Disagree Somewhat, Disagree Strongly

Tastes bland               o o o o o
Would recommend            o o o o o
Select Disagree Somewhat   o o o o o
Box is ugly                o o o o o
Smells delicious           o o o o o

Squirrel!

Page 14

Open-End Verbatims

Page 15

Verbatim Length

• Question Type: Long verbatims

• Concern: It’s difficult to introspect and self-evaluate.

• Cheaters: Satisficing. Responder is providing the minimum amount of information required to proceed.

• Application: Create at least one question that requires a long answer. “Describe three reasons why…” (And benefit from the tidbits of gold!)

• Scoring: Flag any response under 10 characters.

Example verbatims:

dunno

NA

I lik that their crunchy

?

none

1) If Congress cannot do its job - pass a Budget EVERY year (not a CR), they should NOT get paid AND should NOT get ANY benefits; 2) An ex government official should be prohibited form becoming a lobbyist within 2 years of leaving office; 3) Contracting out should NOT be done if NO REAL costs savings can be realized; and 4) People making over $250,000 should pay a higher percentage of income tax - minimum of 30%
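The length rule is a one-liner. This Python sketch assumes leading and trailing whitespace should not count toward the 10 characters; the helper name `short_verbatim_flag` is hypothetical.

```python
def short_verbatim_flag(text, min_chars=10):
    """Return 1 when the trimmed response is shorter than `min_chars` characters."""
    return int(len(text.strip()) < min_chars)

examples = ["dunno", "NA", "I lik that their crunchy", "?", "none"]
print([short_verbatim_flag(t) for t in examples])  # prints [1, 1, 0, 1, 1]
```

Note that the misspelled but genuine answer passes: the measure targets effort, not spelling.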

Page 16

Verbatim Gibberish

• Question Type: Short verbatims, Long verbatims

• Concern: It’s difficult to introspect and self-evaluate.

• Cheaters: Satisficing. Responder is providing the minimum amount of information required to proceed.

• Application: Ensure there is at least one open-end question for a text response.

• Scoring: Look for no spaces, improbable letter combinations (hh, kk, yy, hj, bt, js). Assign points based on the severity (e.g., crude words=3, ‘ass’=2, ‘asdf’=1).

Example verbatims:

asdf

You’re stupid

dumbdumbdumb

ass

hjsrthetutjytnse5hyyjn,jiuuyjydtpysrgawe.btzergrsebearh

uyllkjhhgcfddwad
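A hedged Python sketch of the gibberish scoring. The letter pairs and the point values for ‘ass’ and ‘asdf’ come from the slide; everything else is an assumption: the empty `CRUDE_WORDS` placeholder (to be filled by the analyst), the rule that a single unbroken string longer than 15 characters counts as "no spaces", and the choice to give pattern hits 1 point.

```python
IMPROBABLE_PAIRS = ("hh", "kk", "yy", "hj", "bt", "js")  # from the slide
WORD_POINTS = {"asdf": 1, "ass": 2}                      # point values from the slide
CRUDE_WORDS = set()  # placeholder: analyst-supplied list, each hit scores 3

def gibberish_points(text):
    """Return a severity score from 0 (looks fine) to 3 (crude)."""
    cleaned = text.strip().lower()
    words = cleaned.split()
    points = 0
    for word in words:
        if word in CRUDE_WORDS:
            points = max(points, 3)
        else:
            points = max(points, WORD_POINTS.get(word, 0))
    # Assumed thresholds: one long unbroken string, or any improbable letter pair.
    long_unbroken = len(words) == 1 and len(cleaned) > 15
    odd_pairs = any(pair in cleaned for pair in IMPROBABLE_PAIRS)
    if long_unbroken or odd_pairs:
        points = max(points, 1)
    return points

for sample in ("asdf", "ass", "uyllkjhhgcfddwad", "I lik that their crunchy"):
    print(sample, gibberish_points(sample))
```

This is deliberately crude. Real words such as "subtle" or "withhold" contain the listed pairs, and a response like "dumbdumbdumb" slips through, so flagged and unflagged verbatims both deserve a human glance.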

Page 17

Other Question Types

Contradictions; Don’t Know; Sums and Ranks; Speeding

Page 18

Contradictions

• Question Type: Any two questions

• Concern: Respondents may misread one of the questions. The question was poorly worded.

• Cheaters: Satisficing.

• Application: Ensure the questions are far apart. Focus on related, contextual questions, not identical questions. Allow improbable answers.

• Scoring: Flag cases where the answers don’t match.

Q3. What pets do you have in your household?

Dog
Cat
Fish
Bird
Small mammal
Other
None

Q19. How often does your household buy pet food?

o At least once per week
o About once per month
o Several times per year
o About once per year
o Less often
o Never
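For the pet example, a contradiction check might look like the Python sketch below. It assumes a mismatch means either "no pets but buys pet food" or "has pets but never buys pet food"; the second case can be legitimate (someone else may buy the food), which is why this earns a flag rather than an automatic removal. The function name is hypothetical.

```python
def pet_contradiction_flag(q3_pets, q19_pet_food):
    """q3_pets: list of ticked Q3 options; q19_pet_food: the single Q19 answer."""
    has_pets = any(p != "None" for p in q3_pets)
    buys_food = q19_pet_food != "Never"
    return int(has_pets != buys_food)

print(pet_contradiction_flag(["Dog"], "Never"))
print(pet_contradiction_flag(["None"], "About once per month"))
print(pet_contradiction_flag(["Cat", "Fish"], "At least once per week"))
```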

Page 19

Rank Orders

• Question Type: Unvalidated fill-in-the-blank rank orders

• Concern: They didn’t read carefully. They weren’t thinking carefully. They didn’t understand what they were being asked for.

• Cheaters: Satisficing. Responder is incentive getting.

• Application: Create a question with 5 to 8 options. Don’t validate!

• Scoring: Flag cases with any number below the minimum or above the maximum, or with any duplicate numbers.

Please rank the importance of these store features from 1 to 5.
1 = Most Important
5 = Least Important
(Please use each number only once.)

Price
Location
Style
Service
Selection
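The rank order scoring rule maps onto two checks, out-of-range values and duplicates. A minimal Python sketch with the hypothetical name `rank_order_flag`, assuming the typed ranks have already been converted to integers:

```python
def rank_order_flag(ranks, lowest=1, highest=5):
    """Return 1 if any rank is out of range or any rank is used more than once."""
    out_of_range = any(r < lowest or r > highest for r in ranks)
    duplicates = len(set(ranks)) != len(ranks)
    return int(out_of_range or duplicates)

# Ranks typed for Price, Location, Style, Service, Selection.
print(rank_order_flag([2, 1, 5, 3, 4]))
print(rank_order_flag([1, 1, 2, 3, 4]))
print(rank_order_flag([0, 2, 3, 4, 5]))
```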

Page 20

Sums

• Question Type: Unvalidated fill-in-the-blank sums

• Concern: They didn’t understand the question. Their math doesn’t reflect reality. They aren’t good with math.

• Cheaters: Satisficing.

• Application: Create a question with 4 to 6 options so the math is reasonable. Don’t validate!

• Scoring: Flag any response that doesn’t add to 100.

What percentage of your monthly income is spent in these areas? (Please make sure the numbers add up to 100%.)

Housing
Food
Utilities (e.g., heat, electricity)
Transportation
Entertainment
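A minimal Python sketch of the sums check. The small tolerance is an assumption added so that decimal entries such as 33.3 are compared safely; the function name `sum_flag` is hypothetical.

```python
def sum_flag(percentages, target=100):
    """Return 1 when the entries do not add up to `target`."""
    return int(abs(sum(percentages) - target) > 1e-9)

# Housing, Food, Utilities, Transportation, Entertainment
print(sum_flag([40, 20, 10, 20, 10]))
print(sum_flag([50, 30, 20, 20, 10]))
```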

Page 21

Speeding

• Question Type: The entire survey

• Concern: Some people are fast readers. Some have a lot of survey experience. Some got all the skips and short paths.

• Cheaters: Satisficing. Incentive getting.

• Application: Wait until all the completes are in. Fastest 5% is ideal. Fastest 2% is impossible!

• Scoring: Give one point to the fastest 5%. Give two points to the fastest 2%.

Seconds      Cumulative Frequency
0 - 40       0
41 - 50      0
51 - 60      0.2
61 - 70      0.5
71 - 80      0.9
81 - 90      1.5
91 - 100     2.4
101 - 110    3.5
111 - 120    4.7
121 - 130    6.4
131 - 140    8.3
141 - 150    10.5
151 - 160    13.6
161 - 170    16.6
171 - 180    19.7
181 - 190    23.6
191 - 200    27.6
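The speeding points can be assigned from each respondent's rank once all completes are in. In the table above, the 2% mark falls inside the 91 to 100 second band and the 5% mark inside the 121 to 130 second band. The Python sketch below works from raw completion times instead of bands; the function name and the made-up durations are illustrative assumptions.

```python
def speeding_points(durations):
    """Give 2 points to the fastest 2% and 1 point to the rest of the fastest 5%."""
    n = len(durations)
    fastest_first = sorted(range(n), key=lambda i: durations[i])
    points = [0] * n
    for position, idx in enumerate(fastest_first):
        rank = position + 1
        # Integer arithmetic avoids floating-point surprises at the 2% and 5% edges.
        if rank * 100 <= 2 * n:
            points[idx] = 2
        elif rank * 100 <= 5 * n:
            points[idx] = 1
    return points

durations = list(range(60, 160))  # 100 made-up completion times in seconds
points = speeding_points(durations)
print(points.count(2), points.count(1))  # prints 2 3
```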

Page 22

Creating Cut-Scores

Question Level

Survey Level

Page 23

Question Level Cut-Scores

Red Herring     Percent of people failing
0               78.9
1               15.4
2               2.9
3               2.8

Over Clicking   Percent of people failing
0               76.3
1               13.5
2               9.4
3               0.8

Rank Order      Percent of people failing
0               77.7
1               9.6
2               2.9
3               9.8

Ideal Scenario | Too Few Fail | Too Many Fail

• Check and improve the scoring
• Use it anyways
• Don’t use it at all

• Check and improve the scoring
• Use it anyways

• Anything between 3 and 7 is ideal
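One way to work with these tables is to ask which cut-score, if any, fails a share of respondents in the ideal range. The Python sketch below rests on two assumptions: that each table gives the percent of respondents at each score (each column sums to 100), and that "between 3 and 7" refers to the percent of respondents a measure should fail. The function names are hypothetical.

```python
DISTRIBUTIONS = {
    "Red Herring": [78.9, 15.4, 2.9, 2.8],
    "Over Clicking": [76.3, 13.5, 9.4, 0.8],
    "Rank Order": [77.7, 9.6, 2.9, 9.8],
}

def fail_rate(percent_by_score, cut):
    """Percent of respondents scoring at or above `cut`."""
    return round(sum(percent_by_score[cut:]), 1)

def ideal_cuts(percent_by_score, low=3.0, high=7.0):
    """Cut-scores whose fail rate lands inside the assumed ideal range."""
    return [cut for cut in range(1, len(percent_by_score))
            if low <= fail_rate(percent_by_score, cut) <= high]

for name, dist in DISTRIBUTIONS.items():
    print(name, ideal_cuts(dist))
```

Under these assumptions only the Red Herring measure has a cut-score (2 or more) in the ideal range, failing 5.7% of respondents.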

Page 24

Question Level Scores of Individual People

                 Person 1           Person 2       Person 3
                 (Regular Person)   (Satisficer)   (Incentive Getter)
Overclick        0                  0              1
Underclick       0                  1              0
Speeding         0                  2              0
Red Herring      1                  1              2
Low Incidence    0                  0              1
High Incidence   0                  1              0
Rank Order       0                  0              0
Sums             0                  0              0
TOTAL            1                  5              4

This is not a problem

THESE are problems!

Page 25

Survey Level Cut-Score

• Goal: Fail 5% of respondents

• Cut-Score: 3

• Result: 4.1% of respondents failed

• Why?

– Survey was poorly written

– Cheaters, liars, meanies

– Respondents couldn’t understand the questions, perhaps ESL or low reading level

Flags   Percent of respondents
0       64.0
1       25.0
2       6.9
3       2.5
4       0.8
5       0.5
6       0.1
7       0.1
8       0.1
9       0.0
10      0.0
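The 4.1% result can be reproduced from the flags table by treating each row as the percent of respondents with exactly that many flags (the rows sum to 100) and failing everyone at or above the cut-score. The Python sketch below does that and also picks the cut-score closest to a target fail rate; the function names are hypothetical.

```python
# Percent of respondents with exactly 0, 1, 2, ... 10 flags, from the table.
FLAG_PERCENT = [64.0, 25.0, 6.9, 2.5, 0.8, 0.5, 0.1, 0.1, 0.1, 0.0, 0.0]

def percent_failed(cut):
    """Percent of respondents with `cut` or more flags."""
    return round(sum(FLAG_PERCENT[cut:]), 1)

def closest_cut(goal=5.0):
    """Cut-score whose fail rate is nearest to the goal."""
    return min(range(1, len(FLAG_PERCENT)),
               key=lambda cut: abs(percent_failed(cut) - goal))

print(percent_failed(3))  # prints 4.1
print(closest_cut())      # prints 3
```

A cut-score of 2 would fail 11.0% and a cut-score of 4 only 1.6%, so 3 is the closest available choice to the 5% goal.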

Page 26

Tips

• Try to incorporate at least 4 different measures into every survey

• Don’t cram too many tests into one question

– You COULD put overclicking, underclicking, high incidence, low incidence, and red herrings in the same question. But if that’s the only question a good respondent had trouble with, you’ve excluded good data.

– Use each question a maximum of two times

• Spread the tests throughout the survey: beginning, middle, and end.

• This doesn’t have to be a lot of work. Create SPSS/Excel syntax that you can use repeatedly.

• Remember that human beings make mistakes. You make lots of mistakes every day. Don’t expect other people to be better than you.

Page 27

Thank you!

Annie Pettit

Chief Research Officer

Peanut Labs

[email protected]

twitter.com/LoveStats

ca.linkedin.com/in/anniepettit/

Questions about our Sample Services?

Jonathan Cheriff

Director of Sales & Marketing

[email protected]

twitter.com/paperbackdad

www.linkedin.com/in/jonathancheriff