Writing with Data: Incorporating Statistics Into Causal Research
Statlab Workshop, Spring 2011
Brian Fried and Kevin Callender
Outline of Workshop
Part I: Causation and Statistics
- What is Causation? Correlation?
- Why Statistics?
- Threats to Inference
Part II: Gathering and Using Data
- Gathering Data
- Managing Data
Part III: Writing with Statistics
- A General Outline, with an example
Causation vs. Correlation
Causation…
…correlation
Why Statistics
- Probabilistic Relationships (see previous graph)
- Multivariate Relationships: we can analyze the relationships between multiple variables at the same time (e.g., education, age, gender, income … -> voting).
What is a regression?
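As a minimal sketch of what a regression does, the Python function below fits an ordinary least squares (OLS) line with one predictor: it picks the intercept and slope that minimize the sum of squared residuals. The education and income figures are made up for illustration; in practice your statistical package does this computation for you.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of education and income (in $1,000s)
education = [10, 12, 12, 14, 16, 16, 18, 20]
income = [28, 33, 35, 38, 45, 47, 52, 60]

b0, b1 = simple_ols(education, income)
print(f"predicted income = {b0:.2f} + {b1:.2f} * years of education")
```

The slope is the predicted change in the outcome for a one-unit change in the predictor; the multivariate case adds more predictors to the same idea.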
Threats to Inference
- Endogeneity (vs. exogeneity of errors)
- Autocorrelation (time series)
- Homo/heteroskedasticity
- Internal vs. external validity
Addressing these threats is probably the most important step in research design; advanced techniques can often compensate for them.
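To see why endogeneity matters, the simulation below (made-up parameters, Python standard library only) builds in an omitted confounder z that drives both x and y. Regressing y on x alone overstates the true effect of 2.0: with these parameters the naive slope converges to about 3.5. Controlling for z, done here by first residualizing x on z (the Frisch–Waugh–Lovell result), recovers roughly 2.0. This is a sketch of one common source of endogeneity, not a general remedy.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, z):
    """Part of x not explained by z: residuals from regressing x on z."""
    b = ols_slope(z, x)
    a = sum(x) / len(x) - b * sum(z) / len(z)
    return [xi - (a + b * zi) for xi, zi in zip(x, z)]


random.seed(42)
n = 10_000
TRUE_EFFECT = 2.0
z = [random.gauss(0, 1) for _ in range(n)]            # unobserved confounder
x = [zi + random.gauss(0, 1) for zi in z]             # x partly driven by z
y = [TRUE_EFFECT * xi + 3.0 * zi + random.gauss(0, 1)  # y driven by x and z
     for xi, zi in zip(x, z)]

naive = ols_slope(x, y)                      # omits z: biased upward
adjusted = ols_slope(residualize(x, z), y)   # controls for z
print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  true: {TRUE_EFFECT}")
```

The bias disappears here only because z was measured; if the confounder is unobserved, a design-based strategy is needed instead.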
Part II: Data
Think about analyses early! (Ideal vs. Possible)
- What’s Possible? What’s Convincing?
- Experimental Ideal
- Practical Data Limitations
- Collecting Your Own Data
- Using Other Data
Some data sources:
- Statlab Webpage (http://statlab.stat.yale.edu)
- Advisors/Professional Contacts
- Yale StatCat (http://ssrs.yale.edu/statcat/)
- ICPSR (http://www.icpsr.umich.edu)
- Reference Librarian (Julie Linden)
(Quant.) Data Types and Uses
- Dependent Variable (response, outcome, criterion)
- Independent Variables (explanatory or predictor variables)
- Control / Confounding Variables
- Categorical and Continuous Variables
Remember: The types of variables we choose determine the statistics we use.
Qualitative knowledge always helps!
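As a rough sketch of how variable types steer the choice of statistic, the hypothetical helper below maps a dependent and an independent variable type to the families of tests discussed in this workshop (correlation/regression, t-test/ANOVA, chi-square). It is a rule of thumb, not a decision procedure; the branch for a categorical outcome with a continuous predictor (logistic regression) is an assumption that goes beyond these slides.

```python
def suggest_method(dv_type, iv_type, n_groups=2):
    """Rule-of-thumb method for one dependent (dv) and one independent (iv)
    variable; types are 'continuous' or 'categorical'. n_groups is the number
    of categories when the independent variable is categorical."""
    if dv_type == "continuous" and iv_type == "continuous":
        return "correlation / regression"
    if dv_type == "continuous" and iv_type == "categorical":
        # Compare group means: two groups vs. three or more
        return "t-test" if n_groups == 2 else "ANOVA"
    if dv_type == "categorical" and iv_type == "categorical":
        return "chi-square test"
    if dv_type == "categorical" and iv_type == "continuous":
        return "a model for categorical outcomes (e.g., logistic regression)"
    raise ValueError("types must be 'continuous' or 'categorical'")


# Hypothetical use: coverage (continuous) vs. PT vote share (continuous)
print(suggest_method("continuous", "continuous"))
# Coverage (continuous) compared across PT / non-PT mayors (two groups)
print(suggest_method("continuous", "categorical"))
```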
Once You’ve Found or Collected Your Data
- Download the data and documentation
- StatTransfer (Statlab)
- Determine the data file type: probably a text file (.txt, .dat, .raw)
- Convert text & delimited files
- Choose a statistical software program
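For delimited text files, conversion can be as simple as reading with one delimiter and writing with another. The sketch below uses Python's standard csv module to turn a tab-delimited file into a comma-separated .csv that most statistical programs import directly. The file names and contents are hypothetical. Fixed-width .dat or .raw files are different: they need the column positions from the codebook, which this sketch does not handle.

```python
import csv
import os
import tempfile


def convert_delimited(src_path, dst_path, src_delimiter="\t"):
    """Rewrite a delimited text file as a comma-separated file.
    Returns the number of rows written (including the header)."""
    rows = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=src_delimiter)
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows


# Create a small hypothetical tab-delimited file to convert
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "municipalities.txt")
dst = os.path.join(tmpdir, "municipalities.csv")
with open(src, "w", encoding="utf-8") as f:
    f.write("muni_id\tname\tpoor_families\n")
    f.write("1\tAlpha\t1200\n")
    f.write("2\tBeta\t850\n")

row_count = convert_delimited(src, dst)
with open(dst, newline="", encoding="utf-8") as f:
    print(list(csv.reader(f)))
```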
Managing Your Data
- Back up all master data files
- Codebook
- Merging data: adding variables, adding cases, computing new variables
- Keep a roadmap: keep a log of all analyses and of what you have done
- Save syntax files
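The two kinds of merges named above can be sketched in plain Python. Below, merge_on adds variables by matching an ID shared by two files, and append_cases adds new observations with the same variables. Both function names, the ID variable muni_id, and the records are hypothetical. Returning the unmatched IDs instead of silently dropping them gives you something concrete to record in your log.

```python
def merge_on(left_rows, right_rows, key):
    """Add variables: attach right_rows' columns to left_rows by matching `key`.
    Returns (merged rows, keys in left_rows with no match in right_rows)."""
    right_index = {row[key]: row for row in right_rows}
    merged, unmatched = [], []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            merged.append({**row, **match})
    return merged, unmatched


def append_cases(rows, new_rows):
    """Add cases: stack new observations that have exactly the same variables."""
    columns = set(rows[0])
    for row in new_rows:
        if set(row) != columns:
            raise ValueError(f"case {row} does not have columns {sorted(columns)}")
    return rows + new_rows


# Hypothetical municipality-level files sharing the ID variable "muni_id"
recipients = [
    {"muni_id": 101, "recipients": 900},
    {"muni_id": 102, "recipients": 450},
    {"muni_id": 103, "recipients": 1300},
]
elections = [
    {"muni_id": 101, "pt_mayor_2008": 1},
    {"muni_id": 102, "pt_mayor_2008": 0},
]

merged, unmatched = merge_on(recipients, elections, "muni_id")
merged = append_cases(merged, [{"muni_id": 104, "recipients": 700, "pt_mayor_2008": 0}])
print(len(merged), "cases; no match for:", unmatched)
```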
Syntax Files
What are they?
- Text files used to enter commands in bulk
Why?
- You will make mistakes and need to make changes
How do I know what to write?
- The program’s manual provides the underlying command
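The slides do not tie syntax files to a particular program. As an analogous sketch in Python (not Stata or SPSS syntax), the script below plays the role of a syntax file: every step from loading to summarizing is written down so it can be rerun after a mistake is fixed, and the standard logging module records each step to a hypothetical analysis.log, which doubles as the log of analyses. The coverage values are made up.

```python
import logging
import statistics

# Record each step both on screen and in analysis.log (hypothetical file name)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(message)s",
    handlers=[logging.FileHandler("analysis.log"), logging.StreamHandler()],
)
log = logging.getLogger("analysis")


def main():
    # Step 1: load data (hard-coded here; a real script would read the master file)
    coverage = [0.91, 1.05, 0.88, 1.20, None]
    log.info("Loaded %d municipalities", len(coverage))

    # Step 2: drop missing values and record how many were dropped
    observed = [c for c in coverage if c is not None]
    log.info("Dropped %d missing values", len(coverage) - len(observed))

    # Step 3: summarize
    mean = statistics.mean(observed)
    log.info("Mean coverage: %.3f", mean)
    return mean


if __name__ == "__main__":
    main()
```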
Part III: Writing
- Introduction
- Theory (Lit Review)
- Data Description
- Analysis/Results
- Conclusion
Introduction
Question
- What is the question you want to answer?
- Why should we care?
Hypothesis
- Succinctly state your claim
Context & Summary
Motivation
- Are politics becoming more programmatic in Brazil?
- Is Bolsa Familia, a conditional cash transfer (CCT) program that benefits a quarter of Brazil’s population, programmatic?
An Illustrative Example: Bolsa Familia
Programa Bolsa Família – Key Facts
- Conditional cash transfer (CCT) program, launched in October 2003. It was not the first CCT program in Brazil; some existing programs (like Bolsa Escola) were incorporated into Bolsa Familia.
- Benefits families with per capita income below US$78.
- 12 million poor families (almost 50 million people) currently receive support in all 5,564 Brazilian municipalities.
- Size of stipend: between US$13 and US$114, depending on the family’s size and poverty level.
- Average amount: US$54 per family.
- 2009 budget: US$10.5 billion (0.4% of Brazil’s GDP).
Theory/Lit. Review
- What does existing theory say? What do you believe? Position yourself within theoretical debates.
- Identify testable hypotheses.
- Choose the method best suited to testing your hypothesis.
- Do you need statistics after all? Quantitative v. qualitative research
Research Question
- Do political criteria explain the variation in Bolsa Familia’s coverage across municipalities?
- There are theoretical (Cox and McCubbins 1986, Dixit and Londregan 1996, Lindbeck and Weibull 1987) and empirical (Ames 1987, Levitt and Snyder 1995, Schady 2000, Dahlberg and Johansson 2002, Stokes 2004, Kitschelt 2010) reasons to believe that political spending is often targeted, especially given Brazil’s history of clientelism and pork.
How do politicians target?
- “Core”
- “Swing”
- Mobilization
Descriptive Statistics
Variables
- Dependent Variable(s)
- Independent Variable(s)
- Important Control Variable(s)
Graphs
Summary Statistics on Key Variables
- Number, Mean, Minimum, Maximum, Standard Deviation
Cross-Tabs
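A minimal sketch of both kinds of description, using only Python's standard library: summarize reports the number of observations, mean, standard deviation, minimum, maximum, and missing count (the same columns as the tables that follow), and crosstab counts combinations of two categorical variables. The data are hypothetical, and treating None as missing is an assumption. statistics.stdev is the sample standard deviation; check which version your own software reports.

```python
import statistics
from collections import Counter


def summarize(values):
    """Summary statistics, skipping missing values (None)."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample standard deviation
        "min": min(observed),
        "max": max(observed),
        "missing": len(values) - len(observed),
    }


def crosstab(rows, cols):
    """Count each (row category, column category) combination."""
    return Counter(zip(rows, cols))


# Hypothetical data for five municipalities
coverage = [0.8, 1.1, None, 0.95, 1.3]
pt_mayor = [1, 0, 0, 1, 0]
close_election = [0, 0, 1, 1, 0]

print(summarize(coverage))
print(crosstab(pt_mayor, close_election))
```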
Descriptive Statistics
Variable | Mean | Stand. Dev. | Min | Max | Missing
Dependent Variable
Coverage in 2009 | 0.976 | 0.229 | 0.018 | 6.276 | 12
Explanatory Variables
PT Vote Share for Deputado Federal | 0.060 | 0.048 | 0.000 | 0.326 | 345
PT Vote Share for President | 0.470 | 0.107 | 0.110 | 0.826 | 18
Key Variables
- Coverage in 2009: This continuous variable is the ratio of recipients to the number estimated to be poor in each municipality, as of November 2009.
- PT Vote Share for Deputado Federal: This continuous variable captures a core targeting strategy and measures the average PT vote share for federal deputy across the 2002 and 2006 elections.
- PT Vote Share for President: This continuous variable captures a core targeting strategy and measures the average PT vote share for president across the 2002 and 2006 elections.
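The variable definitions above translate into simple computations. The Python sketch below constructs coverage as recipients divided by the number estimated to be poor, and each PT vote-share variable as the average of the 2002 and 2006 results. The function names and inputs are hypothetical, and treating the average as missing whenever either election is missing is an assumption; the slides do not say how missing elections were handled.

```python
def coverage_ratio(recipients, estimated_poor):
    """Coverage: recipients divided by the number estimated to be poor.
    Returns None (missing) if either input is unavailable or the estimate is 0."""
    if recipients is None or not estimated_poor:
        return None
    return recipients / estimated_poor


def average_pt_share(share_2002, share_2006):
    """Average PT vote share across the 2002 and 2006 elections.
    Assumption: the average is missing if either election is missing."""
    if share_2002 is None or share_2006 is None:
        return None
    return (share_2002 + share_2006) / 2


# Hypothetical municipality; coverage can exceed 1, as the Max column above shows
print(coverage_ratio(1150, 1000))
print(average_pt_share(0.44, 0.50))
print(average_pt_share(0.05, None))
```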
Descriptive Statistics
Variable | Mean | Stand. Dev. | Min | Max | Missing
Explanatory Variables
PT Mayor in 2008 | 0.098 | 0.297 | 0 | 1 | 0
Base Mayor in 2008 | 0.609 | 0.488 | 0 | 1 | 0
Change in Support for PT Presidential Candidate | 0.055 | 0.080 | 0 | 0.603 | 18
Close Presidential Election in 2006 | 0.190 | 0.392 | 0 | 1 | 0
So, how do I analyze my data?
- Correlational design
  - Correlation allows you to quantify relationships between variables (r, r-squared)
  - Correlation, partial correlation
  - Regression allows you to predict scores on one variable from subjects’ scores on one or more other variables
- Group differences
  - t-test & ANOVA
  - Chi-square for categorical and frequency data
- Significance v. effect size
- Simulations
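A short Python sketch of three quantities from this list: Pearson's r (and r-squared), a t statistic for a difference in group means (Welch's version, which does not assume equal variances), and Cohen's d as an effect size. The data are hypothetical. Repeating the same two groups ten times shows the significance-versus-effect-size point: t more than triples with the larger sample, while d changes only modestly. p-values require the t distribution, which your statistical package supplies.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """t statistic for the difference in means of groups a and b (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def cohens_d(a, b):
    """Effect size: difference in means in pooled-standard-deviation units."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


r = pearson_r([1, 2, 3, 4], [2, 4, 5, 9])
print(f"r = {r:.3f}, r-squared = {r ** 2:.3f}")

small_a, small_b = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
big_a, big_b = small_a * 10, small_b * 10
print("t:", welch_t(small_a, small_b), welch_t(big_a, big_b))    # grows with n
print("d:", cohens_d(small_a, small_b), cohens_d(big_a, big_b))  # changes little
```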
Methods of Analysis (Empirical Strategy)
We discussed this in Part I, but one generally devotes a section, before the results section, to explaining how one will identify a causal relationship.
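$\text{Coverage} = \beta_0 + \beta_1(\text{political criteria}) + \beta_X X + e$
where X denotes the control variables, β_X their coefficients, and e the error term.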
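To make the estimating equation concrete, the sketch below estimates the coefficients by OLS in plain Python, solving the normal equations (X'X)b = X'y with Gaussian elimination. The predictors (a vote share and a PT-mayor indicator) and their values are hypothetical, and the outcome is generated without noise from known coefficients, so the estimates recover those coefficients up to rounding. A real analysis would use a statistical package, which also reports standard errors.

```python
def ols(X, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk + e.
    X is a list of rows of predictor values (without the constant)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, k):
            factor = a[i][col] / a[col][col]
            for c in range(col, k + 1):
                a[i][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Hypothetical, noise-free data: coverage = 1.0 - 0.5*vote_share + 0.2*pt_mayor
vote_share = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pt_mayor = [0, 1, 0, 1, 1, 0]
coverage = [1.0 - 0.5 * v + 0.2 * m for v, m in zip(vote_share, pt_mayor)]

coefs = ols(list(zip(vote_share, pt_mayor)), coverage)
print([round(c, 4) for c in coefs])
```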
Results: Explaining Coverage in 2009
Explanatory Variable | Regression Coefficient
Core Indicators
PT Vote Share for Deputado Federal | -.473***
PT Vote Share for President | -.0972***
PT Mayor | -.0241**
Base Mayor | -.0208***
Swing Indicators
Change in Support for PT Presidential Candidate | -.175***
Close Presidential Election | .00651
Effect of Standard Deviation Shift of Explanatory Variables on Coverage in 2009
Shift Explained by Political Criteria | Effect of Shift in Support
PT Vote Share for Deputado Federal | -0.023
PT Vote Share for President | -0.010
PT Mayor* | -0.024
Base Mayor* | -0.021
Change in Support for PT Presidential Candidate | -0.014
Close Presidential Election in 2006* | 0.007
* Indicator variable: the effect shown is for a shift from 0 to 1.
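The figures in this table can be reproduced from the regression coefficients and the descriptive statistics. For continuous variables the effect equals coefficient × standard deviation (e.g., -0.473 × 0.048 ≈ -0.023); for the starred variables the reported figure matches the coefficient itself, consistent with a 0-to-1 change in an indicator. The Python sketch below recomputes the column under that reading; the dictionary layout and function name are illustrative.

```python
# name: (regression coefficient, standard deviation, is_indicator),
# taken from the results and descriptive statistics tables
variables = {
    "PT Vote Share for Deputado Federal": (-0.473, 0.048, False),
    "PT Vote Share for President": (-0.0972, 0.107, False),
    "PT Mayor": (-0.0241, 0.297, True),
    "Base Mayor": (-0.0208, 0.488, True),
    "Change in Support for PT Presidential Candidate": (-0.175, 0.080, False),
    "Close Presidential Election in 2006": (0.00651, 0.392, True),
}


def shift_effect(coef, sd, is_indicator):
    """Predicted change in coverage: a one-SD shift for continuous variables,
    a shift from 0 to 1 for indicator variables."""
    return coef if is_indicator else coef * sd


for name, (coef, sd, is_indicator) in variables.items():
    print(f"{name}: {shift_effect(coef, sd, is_indicator):.3f}")
# prints -0.023, -0.010, -0.024, -0.021, -0.014, 0.007 (one per line, with names)
```

Expressing effects this way puts variables measured on different scales on a comparable footing, which is why the table reports shifts rather than raw coefficients.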
Robustness
- Identify Threats to Inference!
- (Do I have any?)
Robustness Check: Relationship between Coverage in 2004 and Prior Elections
Shift Explained by Political Criteria | Effect of Shift in Support
PT Vote Share for Deputado Federal in 2002 | 0.018
PT Vote Share for President in 2002 | 0.034
PT Mayor in 2000* | 0.002
Base Mayor in 2000* | 0.005
Change in Support for PT Presidential Candidate (1998 to 2002) | -0.003
Close Presidential Election in 2002* | -0.016
Putting Output into a Paper
Cut and Paste
- Graphs
  - Cut and paste into word processing document
  - Save as .jpeg or .tif file
- Tables
  - Cut and paste
  - Format in word processing document
  - Import into Excel, format, and then place in Word
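For the Excel route, most statistical programs can export a table to a CSV file, which Excel opens directly. As a sketch of the same step in Python, the code below writes the rows of the 2009 results table to a hypothetical coverage_results.csv with the standard csv module; section labels get an empty coefficient cell so the layout survives the move to Excel and then Word.

```python
import csv

# Rows of the "Explaining Coverage in 2009" results table
results = [
    ("Core Indicators", ""),
    ("PT Vote Share for Deputado Federal", "-.473***"),
    ("PT Vote Share for President", "-.0972***"),
    ("PT Mayor", "-.0241**"),
    ("Base Mayor", "-.0208***"),
    ("Swing Indicators", ""),
    ("Change in Support for PT Presidential Candidate", "-.175***"),
    ("Close Presidential Election", ".00651"),
]


def write_table(path, header, rows):
    """Write a header row plus data rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


write_table("coverage_results.csv",
            ("Explanatory Variable", "Regression Coefficient"), results)
```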
More Advanced Analysis
Multivariate techniques are only a start; they help to account for confounding factors and allow for testing change over time and more complex hypotheses…
(See: Tabachnick & Fidell, Using Multivariate Statistics)
1) Be honest about your abilities.
2) Ask for help.
3) You are best off including techniques that you fully understand, but it may be worth learning something new!
Take Away Messages
1) Begin by thinking about what question interests you.
2) Look for data and consider appropriate methods; identify which hypotheses are actually testable.
3) Design and run analyses; keep codebook and syntax files!
4) Back up data.
5) Ask for help, especially when choosing a method, and seek feedback on research design.
6) Research and writing are an iterative process.