a comprehensive quality evaluation of security and privacy ... · reported useradoption...

36
A Comprehensive Quality Evaluation of Security and Privacy Advice on the Web @eredmil1 [email protected] Elissa M. Redmiles, Noel Warford, Amritha Jayanti, and Aravind Koneru, Sean Kross, Miraida Morales, Rock Stevens and Michelle L. Mazurek

Upload: others

Post on 29-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

A Comprehensive Quality Evaluation of Security and Privacy Advice on the Web

@[email protected]

Elissa M. Redmiles, Noel Warford, Amritha Jayanti, and Aravind Koneru,Sean Kross, Miraida Morales, Rock Stevens and Michelle L. Mazurek

Page 2: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

2

Page 3: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

3

People must learn a variety of security & privacy behaviors

Page 4: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

By Elissa Redmiles, May 16, 2017

Estimates of damage caused by phishing vary widely, ranging from $61 million per year to $3 billion per year of direct losses to victims in the U.S.

By Jason Hong

Despite advances on core security problems,user decisions can still lead to significant security risks

4

Page 5: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

5

How do they learn security? Is security education working?

Page 6: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

6

Ecosystem-wide quality measurement of one of the most prevalent security education sources: online articles

Where is the Digital Divide? A Survey of Security, Privacy, and Socioeconomics. CHI2017. How I Learned to be Secure: a Census-Representative Survey of Security Advice Sources and Behavior. CCS2016.

Page 7: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

7

Comprehensibility: can users understand the document?

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

Evaluate quality of corpus along three axes

Page 8: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

8

User Generated Search Queries (989 docs)

Expert Recommended Advice (889 docs)

• List 5 search queries for each of 3 digital security topics you’re interested in learning more about

• Show up to 6 security & privacy news articles• First one they indicate interest in: ask for 3 search queries

10 security experts & librarians

Collected representative corpus of online security advice

Step 2: Crowd workers clean corpus “Is this document about online privacy/security?”

1,264 documents left after cleaning

Step 1: Collect documents based on user-generated searches & expert recommendations

Page 9: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

9

Comprehensibility: can users understand the document?

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

Evaluate quality of corpus along three axes

Page 10: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

10

Comprehensibility: can users understand the document?

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

Evaluate quality of corpus along three axes

Page 11: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

What to use when evaluating security documents?

Domain-Specific Application?

First-Glance PerceptionMatters?

No No YesYes

Expect Uniform Distribution?

YesNo

Cloze

FRES

Smart Cloze

FRES

Ease

FRES

Smart Cloze

Ease

Figure Credit: Comparing and Developing Tools to Measure the Readability of Domain-Specific Texts. EMNLP 2019.

Page 12: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

12

Smart Cloze tool creates domain-relevant distractors

Use NLP techniques to generate four grammatically-probable distractors:two distractors drawn from a domain-specific dictionary we generatetwo from a general dictionary

Page 13: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

13

Each document evaluated by three test-takers,who had excellent reliability (ICC>0.90)

Census-representative sample of test takers

Page 14: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

14

55% of documents at least partially comprehensibleAverage doc perceived as “somewhat” easy to read

Mean = 47.5% Partial comprehensionFull comprehension

Page 15: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

15

Variance within domain groupings: some government providers far more comprehensible than others

Partial comprehension

Full comprehension

Page 16: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

16

Evaluate quality of corpus along three axes

Comprehensibility: measure with Smart Cloze & perceived ease

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

55% of documents at least partially comprehensible

Page 17: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

17

Evaluate quality of corpus along three axes

Comprehensibility: measure with Smart Cloze & perceived ease

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

55% of documents at least partially comprehensible

Page 18: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

18

To measure actionability (and accuracy) need to extractadvice imperatives from documents

Two research assistants manually annotated 1,264 documents to extract imperatives

Page 19: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

19

Started with literature-grounded taxonomy of 194 codes, 206 new codes discovered through annotation

374 unique advice imperatives2,780 pieces of advice

securityadvice.cs.umd.edu

Page 20: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

20

12 high level topics of security advice

Page 21: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

21

Evaluate quality of corpus along three axes

Comprehensibility: measure with Smart Cloze & perceived ease

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

55% of documents at least partially comprehensible

Page 22: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

22

Four theoretically-grounded actionability sub-metrics

Time Consumption: how time consuming would it be to follow this advice?economic frameworks (cost)

Difficulty: how difficult would it be to follow this advice?HiTL (capabilities)

Confidence: how confident is the user that they can follow the advice?PMT (perceived ability) & HiTL (knowledge acquisition)

Disruption: how disruptive would it be to follow this advice?economic frameworks (cost)

Answered on a Likert Scale: Very to Not at All

Page 23: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

23

Each piece of advice evaluated by three evaluators,who had good reliability (ICC>0.85)

Census-representative sample of evaluators

Page 24: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

Majority of advice rated as actionable

⁄𝟑 𝟒 of advice “somewhat”+ confident⁄𝟐 𝟑 of advice at most “slightly”

time consuming, disruptive, and difficult

20% of documents contain at least one unactionable piece of advice

Page 25: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

25

Evaluate quality of corpus along three axes

Comprehensibility: measure with Smart Cloze & perceived ease

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

People are somewhat or very confident about implementing ⁄𝟑 𝟒 of advice⁄𝟐 𝟑 considered at most slightly time consuming, disruptive, or difficult to implement

55% of documents at least partially comprehensible

Page 26: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

26

Evaluate quality of corpus along three axes

Comprehensibility: measure with Smart Cloze & perceived ease

Actionability: can users follow the advice?

Accuracy: will following the advice make users more secure?

55% of documents at least partially comprehensible

People are somewhat or very confident about implementing ⁄𝟑 𝟒 of advice⁄𝟐 𝟑 considered at most slightly time consuming, disruptive, or difficult to implement

Page 27: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

27

Recruit security experts to evaluate advice accuracy

RecruitmentQualification

CTF, pen testing,secure development

OR those who are certified

41 Experts

2+

Page 28: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

28

Ask experts to evaluate impact on risk & to prioritize

Perceived accuracy: accurate, useless, harmful

Risk reduction (or increase): 0-50+%

Priority: number 1, top 3, top 5, top 10

Page 29: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

29

Each piece of advice evaluated by three experts,who had good reliability (ICC>0.85)

Average of 38 pieces of advice evaluated by each expert

Page 30: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

30

Experts perceive 333 pieces of advice (89%) as accurate

All documents contain at least one piece of accurate advice

Page 31: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

31

Experts are a bit more discerning when prioritizing advicebut 118 pieces of advice are rated in the ”top 5”

Used matrix factorization to generate full ranked list across all votes

#1 Use unique passwords for different accounts#2 Update devices#3 Use anti-malware software#4 Scan attachments you open for viruses…

Top Advice

Page 32: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

32

Expert Priority Rankingof Advice

Reported User Adoption

User Priority Rankingof Advice Advice Actionability Ratings

Confidence Time Consumption

Disruption Difficulty

r = 0.600r = 0.212

Users’ reported adoption of advice correlates with actionability & prioritization

r =0.391 r =0.305 r =0.355 r =0.367

Page 33: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

Problem with online security advice: there is too much

Comprehensibility: average document is “partially” comprehensible to the average U.S. user

Actionability: majority of advice rated as actionable andactionability correlates with prioritization & adoption

Accuracy: 89% of advice rated accurate

Leaves behind low-literacy users

Data storage & network security advice not very actionable20% of documents contain at least one unactionable piece of advice

Lack of prioritization & falsifiability: experts think (almost) all the advice is great

Page 34: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

Future of Security AdviceNow What?

Page 35: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

Future of security advice requires falsifiability for security claims and empirical studies to narrow down behaviors

Page 36: A Comprehensive Quality Evaluation of Security and Privacy ... · Reported UserAdoption UserPriority Ranking of Advice Advice Actionability Ratings Confidence Time Consumption Disruption

A Comprehensive Quality Evaluation of Security and Privacy Advice on the Web

Collected a corpus of 1,264 security advice documentsThrough user generated queries and expert recommendations

Evaluated Quality along three axesAverage document is partially comprehensible to the average U.S. userMajority of advice rated actionable; actionability correlated w/ reported behavior89% of advice rated accurate by experts

Experts can’t narrow down advice; need empirical scienceExperts struggle to identify the most impactful adviceWe need more concrete measurement & falsifiability

Elissa M. Redmiles, Noel Warford, Amritha Jayanti, and Aravind Koneru,Sean Kross, Miraida Morales, Rock Stevens and Michelle L. Mazurek

@eredmil1 [email protected]