Teaching Students How (Not) to Lie with Statistics
Lynette Hoelter
American Sociological Association
August 23, 2015
Presentation Outline:
• Statistics as social construction
• Questioning evidence
• Practice, practice, practice
• Ways stats can “catch” us
• Sources of “numbers” for practice
Numbers lend “authority”
• Make arguments seem more “scientific”
• Appear definitive
But sometimes…
• Sources are given more credibility than they should be (e.g., “Univ. of Michigan data suggest” referring to results from a study of UM students)
• Key information needed to evaluate the claim is missing and/or numbers are taken out of context
Numbers as social construction
• Evidence is evidence, right?
• Numbers/statistics do not exist apart from people
– Who counted?
– What exactly did they count?
– Why did they count it?
• Quantitative literacy is the first step, then add sociology (or vice versa)
Questions to ask upon sighting data¹
• What is the source of the statement and/or data?
• How is the information reported?
• Is the sample of adequate size and representative?
¹ Adapted from Healey, Joseph F. 2013. The Essentials of Statistics: A Tool for Social Research (3rd Ed). Belmont, CA: Wadsworth, Cengage Learning.
We ALL need practice
• Using data in (any) class:
– Start class with data
– Tie survey data to topic of lecture
– Use real data as examples for problems or exams
– Require evidence-based arguments
Easy Example:
EXTRA CREDIT: The charts below were part of a blog post by the Federal Reserve Bank of New York (9/2/2014) and demonstrate two ways of looking at the value of a college degree. Net Present Value represents the additional income earned by someone with a Bachelor’s degree compared to someone without, added over a 40+ year working life. In a couple of sentences, describe the trends in each chart and then answer the question: Is a college degree worth it? Why or why not? (5 points)
Ways stats can “catch” us
• Definition issues
• Big numbers
• Proper measure of central tendency
• Percent/percent change
• Risks/Rates
• Correlation & causation
• Trends over time
• Statistical vs. substantive significance
• Funky graphics
• Reducing complexity of social patterns
Definition Issues
• What was included, what was excluded?
• How was a “positive” defined?
• If looking at costs/benefits: are all costs/benefits really being measured? (Compare apples to apples)
• From whom were data collected (sampling)?
Source: http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225
Definitions (cont’d)
• Rates = fairly straightforward, but the denominator matters
• US Divorce Rate: commonly reported as ~50%
• Numerator is easy (formal divorces?)
• Denominator??
– All current marriages
– All first marriages
– All marriages in one year
• Large differences by age at first marriage, number of previous marriages, etc.
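A minimal sketch of the denominator problem, for classroom use. All counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not real divorce or marriage figures. The same numerator yields very different "rates" depending on which denominator is used.

```python
# Hypothetical counts for one community in one year (illustration only, not real data).
divorces_this_year = 1_000
denominators = {
    "all current marriages": 50_000,
    "all first marriages": 35_000,
    "marriages formed this year": 2_000,
}

def rate_percent(numerator, denominator):
    """Return numerator as a percentage of denominator."""
    return 100 * numerator / denominator

# Same numerator, three candidate denominators.
rates = {label: rate_percent(divorces_this_year, d) for label, d in denominators.items()}

for label, rate in rates.items():
    print(f"Divorces per 100 of {label}: {rate:.1f}")
```

With these made-up counts, only the last denominator produces a figure near the familiar "50%", which is a useful prompt for asking students what a reported rate is actually a rate of.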
Definition of credit card fraud given on site: Credit card fraud is a theft committed using a credit card or debit card, as a fraudulent source of funds in a transaction. The purpose may be to obtain goods without paying, or to obtain unauthorized funds from an account. According to the United States Federal Trade Commission, while identity theft had been holding steady for the last few years, it saw a 21 % increase in 2008.
No hint as to whether denominator includes all Americans, Americans with credit cards, etc.
Source: www.statisticbrain.com/credit-card-fraud-statistics/
Big Numbers
• Shock value
• No context
• More memorable
– Annual deaths from flu, 1976-2006, range from 3,000 to 49,000
– 49,000 is a lot, isn’t it?!
– 1,715,434 deaths in US in 2015 so far
Providing Context for Big Numbers
• Using seconds¹:
– One million seconds ~ 11.6 days (86,400 seconds = 1 day)
– One billion seconds ~ 31.7 years
• Using $$: $17 Trillion US Debt
• Population sizes²:
– 100,000 people ~ South Bend, IN
– 1,000,000 people ~ San Jose, CA or Austin, TX; Montana or Rhode Island
– 10,000,000 people ~ North Carolina or Georgia
– US Pop. = 320,145,187 (320 million)
– China Pop. = 1,393,783,836 (1.39 billion)
– World Pop. = 7,361,779,045 (7.36 billion)
¹ Paulos, 2001
² US Census and Worldometers.com
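The conversions above are easy to check in class. The sketch below redoes the seconds arithmetic and, as one possible way to give the $17 trillion figure some context, divides it by the US population listed on the slide. The per-person framing and the 365.25-day year are assumptions made for this illustration, not part of the original slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
DAYS_PER_YEAR = 365.25           # assumption: average year length, including leap years

def seconds_to_days(seconds):
    return seconds / SECONDS_PER_DAY

def seconds_to_years(seconds):
    return seconds_to_days(seconds) / DAYS_PER_YEAR

million_in_days = seconds_to_days(1_000_000)
billion_in_years = seconds_to_years(1_000_000_000)

# Figures taken from the slide; dividing one by the other is this sketch's own choice.
us_debt_dollars = 17e12
us_population = 320_145_187
debt_per_person = us_debt_dollars / us_population

print(f"One million seconds is about {million_in_days:.1f} days")
print(f"One billion seconds is about {billion_in_years:.1f} years")
print(f"$17 trillion is roughly ${debt_per_person:,.0f} per US resident")
```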
Central Tendency
• Plays on our understanding of “average”
• Skewed distributions should be summarized with the median
– E.g., “Average” household income in US, 2011
• Median: $50,502
• Mean: $69,821
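A small sketch of why the mean and median diverge in a skewed distribution. The ten incomes below are hypothetical and are not drawn from the 2011 household income data; one very high value is enough to pull the mean well above what a typical household earns.

```python
from statistics import mean, median

# Hypothetical household incomes (illustration only); one high earner skews the list.
incomes = [22_000, 31_000, 38_000, 45_000, 50_000,
           56_000, 64_000, 75_000, 90_000, 400_000]

mean_income = mean(incomes)
median_income = median(incomes)

# How many households actually earn more than the "average"?
above_mean = sum(1 for income in incomes if income > mean_income)

print(f"Mean:   ${mean_income:,.0f}")
print(f"Median: ${median_income:,.0f}")
print(f"Households above the mean: {above_mean} of {len(incomes)}")
```

In this made-up list only two of ten households earn more than the mean, which is the pattern the real median/mean gap on the slide hints at.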
Percent/Percent Change
• Beware of percentages in tables
– Make sure they add to 100% within each category of the independent variable
• Percent change
– Each calculation changes the base
– Why a 50% off sale is not the same as 20% off plus an additional 30% off
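The stacked-discount point can be worked through directly. This is a minimal sketch using a hypothetical $100 list price; the function name is made up for the example. Because the second discount applies to the already-reduced price, 20% off followed by 30% off comes to 44% off, not 50%.

```python
def apply_discounts(price, *percents_off):
    """Apply each discount in turn; every step uses the reduced price as its new base."""
    for pct in percents_off:
        price = price * (1 - pct / 100)
    return price

list_price = 100.0  # hypothetical list price

single_price = apply_discounts(list_price, 50)        # one 50% discount
stacked_price = apply_discounts(list_price, 20, 30)   # 20% off, then 30% off the new price
effective_pct_off = 100 * (1 - stacked_price / list_price)

print(f"50% off:                 ${single_price:.2f}")
print(f"20% off then 30% off:    ${stacked_price:.2f}")
print(f"Effective stacked discount: {effective_pct_off:.0f}%")
```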
Percent Alone Can Be Misleading
Percent Change
Risks & Rates
Risk of developing breast cancer in the next 10 years goes up by 230% from age 30 to 40, and by 58% from age 40 to 50.
From: http://www.cdc.gov/cancer/breast/statistics/age.htm
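One way to show students why a relative increase needs a baseline: the sketch below converts a 230% relative increase into an absolute change. The 0.5% baseline risk is a hypothetical value picked for round numbers and is not the CDC figure; the function name is also made up for this example.

```python
def risk_after_increase(baseline_risk, percent_increase):
    """Return the new risk after a relative (percent) increase over the baseline."""
    return baseline_risk * (1 + percent_increase / 100)

baseline_risk = 0.005  # hypothetical 10-year risk of 0.5% (NOT a CDC number)
later_risk = risk_after_increase(baseline_risk, 230)

# Absolute change, expressed in percentage points.
absolute_change_points = 100 * (later_risk - baseline_risk)

print(f"Baseline risk: {100 * baseline_risk:.2f}%")
print(f"Risk after a 230% increase: {100 * later_risk:.2f}%")
print(f"Absolute change: {absolute_change_points:.2f} percentage points")
```

Under this assumed baseline, a "230% increase" amounts to a little over one percentage point, so the headline number and the absolute change tell very different stories.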
Correlation vs. Causation
• From: Spurious Correlations
Trends (or “Trends”) over Time
• Legends of charts
• Time frame presented can change interpretation
• Changes in defining/reporting
• Be wary of trends that suddenly change direction (life doesn’t move that quickly)
Incidents were classified as school shootings when a firearm was discharged inside a school building or on school or campus grounds, as documented by the press or confirmed through further inquiries with law enforcement. Incidents in which guns were brought into schools but not fired, or were fired off school grounds after having been possessed in schools, were not included.
Statistical vs. “Real” Significance
“Funky” Graphics
All examples from http://flowingdata.com/category/statistics/mistaken-data/
Simplifying Complex Processes
• Identifying one event/process/change as driving change in a complex process
– E.g., “Broken Windows” theory of crime
In Short:
• Get students thinking about numbers and their context as early and often as possible
Websites to Start Your Search
• ABCNews Who’s Counting (Paulos’ column)
• Association of Religion Data Archives Learning Center
• Choosing a Good Chart (decision table)
• Data360
• Gapminder
• ICPSR: Resources for Instructors
– Data-driven Learning Guides
• Pew Research Center: Fact Tank, Reports, Datasets, Interactives
• Population Pyramids of the World
• Social Explorer: US mapping
• Social Science Data Analysis Network
• Spurious Correlations
• Statistic Brain
• Stats.org
• Survival Curve
• TeachingWithData.org
• Worldometers, USA Live Stats
• Public Opinion:
– Gallup Organization
– National Opinion Research Center (GSS Explorer)
– Roper Center (iPoll)
• Government Centers such as the Census (American FactFinder), NCES, or NCHS
• Professional Development:
– Science Education Resource Center (Carleton College)
– TeachQR.org (Lehman College)
– Making Data Meaningful (United Nations Economic Commission for Europe)
• International:
– UK Data Services Teaching with Data
– European Social Survey EduNet
(A Few) Interesting Reads:
Best, Joel. 2012. Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists (2nd Ed). Berkeley: University of California Press.
Best, Joel. 2004. More Damned Lies and Statistics: How Numbers Confuse Public Issues. Berkeley: University of California Press.
Huff, Darrell. 1993. How to Lie With Statistics (2nd Ed). New York: W.W. Norton & Company.
Klass, Gary. 2012. Just Plain Data Analysis: Finding, Presenting, and Interpreting Social Science Data (2nd Ed). New York: Rowman & Littlefield Publishers, Inc.
Paulos, John Allen. 2001. Innumeracy: Mathematical Illiteracy and Its Consequences (2nd Ed). New York: Hill & Wang.
Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t. New York: Penguin Group (USA).