Accessing Statistics Canada Data and Resources
Hugh McCague
Valerie Preston
Walter Giesbrecht
Sara Tumpane
Outline
• Survey Terminology
• Research Data Centre (RDC)
• RDC versus Public Use Microdata Files (PUMF)
• Accessing the RDC
• Statistics Canada Surveys and Data
• Statistical Software
• Research Opportunities
• Statistical Consulting Service
• Resources
Some Survey Terminology
• Population
• Elements
• Sample: Simple Random Sample, Probability Sample
• Response Rate
• Weights: Simple Weights
• Demographics
• Strata
• Clusters (primary sampling units, PSUs)
• Complex Sample
• Complex Weights, Bootstrap and Jackknife Replicate Weights
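The weighting terms above can be made concrete with a minimal sketch. It assumes a toy data set, made-up survey weights, and randomly perturbed replicate weights that only stand in for real bootstrap weights; the function names are hypothetical. The variance formula is the common average squared deviation of the replicate estimates from the full-sample estimate. Always check the survey's user guide for the exact formula and any adjustment factor.

```python
import random


def weighted_mean(values, weights):
    """Weighted estimate of a population mean (or proportion for 0/1 data)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, full_weights, replicate_weights):
    """Average squared deviation of replicate estimates from the full estimate."""
    full_estimate = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    return sum((est - full_estimate) ** 2 for est in replicate_estimates) / len(
        replicate_estimates
    )


# Hypothetical toy data: 1 = has the characteristic, 0 = does not.
values = [1, 0, 1, 1, 0]
full_weights = [120.0, 80.0, 200.0, 150.0, 50.0]

# Assumption: stand-in replicate weights made by random perturbation.
# Real bootstrap weights are supplied with the survey master file.
random.seed(0)
replicate_weights = [
    [w * random.uniform(0.5, 1.5) for w in full_weights] for _ in range(200)
]

estimate = weighted_mean(values, full_weights)
variance = bootstrap_variance(values, full_weights, replicate_weights)
print(f"estimate = {estimate:.4f}, standard error = {variance ** 0.5:.4f}")
```

In practice the same calculation is usually delegated to the survey procedures in SAS, Stata, SPSS, or R rather than coded by hand.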
• Cross-sectional data
• Longitudinal data: periods, waves, cycles, trajectory, life course
• Attrition: attrition rate.
• Helpful reference: Ornstein, Michael. A Companion to Survey Research. London; Thousand Oaks, CA: SAGE, 2013.
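As a small worked illustration of attrition in longitudinal data, the sketch below computes cumulative and wave-to-wave attrition rates from respondent counts. The counts and function names are hypothetical, and surveys differ in how they define attrition (for example, whether deaths or out-of-scope cases are excluded), so treat this as one possible definition.

```python
def cumulative_attrition(wave_counts):
    """Share of wave-1 respondents lost by each later wave."""
    baseline = wave_counts[0]
    return [1 - n / baseline for n in wave_counts[1:]]


def wave_to_wave_attrition(wave_counts):
    """Share of the previous wave's respondents lost at each wave."""
    return [
        1 - current / previous
        for previous, current in zip(wave_counts, wave_counts[1:])
    ]


# Hypothetical respondent counts for three waves of a panel survey.
wave_counts = [1000, 900, 810]

for wave, rate in enumerate(cumulative_attrition(wave_counts), start=2):
    print(f"wave {wave}: cumulative attrition {rate:.1%}")
```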
Research Data Centre (RDC)
• Access to Statistics Canada data and statistical software
• Microdata & administrative data
• For York students and faculty, access is free
• A “secure” environment
• Researchers are “deemed employees” of Statistics Canada
• Must work in RDC
• CRDCN Network
The CRDCN Network
York RDC
• 282 York Lanes
• Staffed by:
o Analyst Sara Tumpane ([email protected])
o Assistant Theresa Kim ([email protected])
• 8 workstations
• Open 3-3.5 days/week
• http://www.isr.yorku.ca/rdc/
Before you apply to the RDC…
• Consider your options
• Is what you need available from a more readily accessible source (either a PUMF or an aggregate file)?
RDC or PUMF?
Confidential Microdata in Research Data Centres
Characteristics:
o Contains most of the original information collected during the survey
o Continuous variables are accessible
o Longitudinal identifiers provided
o Contains bootstrap weights used for calculating exact variance
Access is appropriate when:
o Sensitive variables not provided in PUMF
o A PUMF does not exist
o Longitudinal data is necessary
o Analytical work is complex in nature
Public Use Microdata Files accessed online
Characteristics:
o Manipulated by aggregating, capping, or deleting variables that could be “identifiers”; survey respondents cannot be identified
o Many continuous variables transformed into categorical variables
o Longitudinal identifiers stripped
Access is appropriate when:
o Immediate data access is required
o Analysis is for a course paper or equivalent
o Data exploration
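To illustrate what "capping" and transforming a continuous variable into a categorical one can look like, here is a minimal sketch. The cap, cut points, labels, and income values are all hypothetical and are not Statistics Canada's actual disclosure-control rules.

```python
import bisect


def top_code(value, cap):
    """Cap a value so that extreme (potentially identifying) values are hidden."""
    return min(value, cap)


def to_category(value, cut_points, labels):
    """Map a continuous value to the label of the interval it falls in.

    cut_points must be sorted; labels needs one more entry than cut_points.
    """
    return labels[bisect.bisect_right(cut_points, value)]


# Hypothetical cut points and labels for personal income.
cut_points = [20_000, 40_000, 80_000]
labels = ["<20,000", "20,000-39,999", "40,000-79,999", "80,000+"]

incomes = [12_500, 20_000, 39_999.99, 250_000]  # made-up master-file values
for income in incomes:
    capped = top_code(income, cap=150_000)
    print(income, "->", capped, "->", to_category(income, cut_points, labels))
```

The trade-off is visible even in this toy case: once incomes are grouped, analyses that need the exact values can no longer be run on the public file.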
CCHS 2012 Example 1
PUMF
• 1381 variables
• Sources of personal income
o Employment inc.
o EI/Worker's comp
o Senior benefits
o Other
Master File
• 1815 variables
• Sources of personal income
o wages and salaries
o income from self-employment
o dividends and interest
o employment insurance
o worker's compensation
o CPP or QPP
o job related retirement pensions
o RRSP/RRIF
o OAS and GIS
o social assistance/welfare
o child tax benefits
o child support
o alimony
o other
o none
CCHS 2012 Example 2
PUMF
• Geography
o Province of residence of respondent - (G)
o Health Region - (G)
o B.C. Health Authority (BCHA) - (D)
Master File
• Geography
o Province of residence of respondent
o Postal code - (D)
o Health region of residence of respondent - (D)
o Sub-health region (Québec only) - (D)
o Nova Scotia district health authority
o British Columbia local health authority - (D)
o Regional health authority (RHA) - Alberta - (D)
o British Columbia health authority - (D)
o Local health integrated networks - Ontario - (D)
o 2006 census dissemination area
o Federal electoral district - (D)
o Census subdivision - (D)
o Census division - (D)
o Statistical area classification type - (D)
o 2006 Census metropolitan area (CMA)
o Health region peer group
o Urban and rural areas
o Urban and rural areas - 2 levels - (D)
o Subzones for Alberta
o Manitoba health authority - (D)
Accessing PUMFs & master file metadata
• Statistics Canada Nesstar data portal
o metadata only, for PUMFs and master files
o http://www62.statcan.ca/webview/
• YUL: Data & Statistics library guide
o http://researchguides.library.yorku.ca/data
• <odesi> (OCUL)
o http://www.library.yorku.ca/e/resolver/id/1165738
How to apply to an RDC and available datasets
• RDC Application Pages
• SSHRC Website
• Data available in the RDCs
Accessing the RDC
Action | Timeline | Notes
Apply through the SSHRC website | 1-2 hours | Provide list of academic contributions; 5-10 page project proposal
Evaluation of the proposal | 2-4 weeks | Approval based on relevance of methods and data, and demonstrated need for microdata
Security screening process | 1-3 weeks for approval |
Sign Microdata Research Contract | 1-3 weeks for approval |
Project Proposal
• The project proposal is a maximum of ten pages and includes the following elements:
o Title of the Project
o Rationale and objectives of the study
o Proposed data analysis and software requirements
o Data requirements
o Expected project start and end dates
o Expected products
o References
Data at the RDC
• Canadian Community Health Survey (CCHS): 2001-2014
o Health status, health care utilization, and health determinants
o Annual Component (starting in 2001, N ~ 130,000)
o Mental Health (2002, 2012) N ~ 37,000
o Nutrition (2004) N ~ 35,000
o Healthy Aging (2008-2009) N ~ 52,000 (sample 45+)
• Canadian Health Measures Survey (CHMS): 2011, 2012, 2013
o Survey and administrative data
• Hate Crime Data (Pilot): 2010-2012
o Characteristics of hate-motivated criminal incidents, victims, and accused persons
Data (continued)
• General Social Survey (GSS): 1985-2014
• Health (1985, 1991)
• Time Use (1986, 1992, 1998, 2005, 2010)
• Victimization (1988, 1993, 1999, 2004, 2009, 2014)
• Education, Work and Retirement (1989, 1994)
• Family (1990, 1995, 2001, 2006, 2011)
• Caregiving and Care Receiving (1996, 2002, 2007, 2012)
• Access to and Use of Information Technology (2000)
• Social Networks/Social Identity (2003, 2008, 2013)
• Giving, Volunteering and Participating (2013)
• National Longitudinal Survey of Children and Youth (NLSCY): 8 cycles
o Development and well-being: birth - early adulthood
o Follow-ups every two years to age 25
Data by Themes
• Health and Health Care
o National Population Health Survey (NPHS)
o Participation and Activity Limitation Survey (PALS)
o Canadian Tobacco, Alcohol and Drugs Survey (CTADS)
• Occupations and Organizations
o Workplace and Employee Survey (WES)
o Survey of Labour and Income Dynamics (SLID)
o Census
• Education
o Youth in Transition Survey (YITS)
o National Graduates Survey (NGS)
• Race and Ethnicity
o Aboriginal Peoples Survey (APS)
o Longitudinal Survey of Immigrants to Canada (LSIC)
o Ethnic Diversity Survey (EDS)
Pilot Data
• Canadian Cancer Registry (CCR)
• Vital Statistics
• Uniform Crime Reporting
• Homicide Survey
• Hate Crime Data
• Ministry of Community and Social Services (MCSS)
• Citizenship and Immigration Canada (CIC)
Which Statistical Software to use at the York RDC?
Features to Consider
• SPSS 23
• SAS 9.4
• Stata 13
• R 3.0.3
Statistical Software Resources: Institute for Digital Research and Education (IDRE), UCLA
http://www.ats.ucla.edu/stat/
An Example of a Psychology Research Project at the York RDC
• Ames, M. E., Rawana J. S., Gentile P., and Morgan A. S. “The protective role of optimism and self-esteem on depressive symptom pathways among Canadian Aboriginal youth.” Journal of Youth and Adolescence 44.1 (2013): 142-154.
• National Longitudinal Survey of Children and Youth
• Complex Sample Design, Post-Stratification
• Longitudinal Linear Mixed Models with Mediation
A Few of Many Quantitative Methods Research Opportunities
• Extending methods to Complex Sample Designs
• Proper methods for the Structural Equation Modeling of Complex Survey Data are strongly needed (Bollen et al., 2013)
• R package lavaan.survey has started to address this issue (Oberski, 2014)
• Item Response Theory with Complex Survey Data needs much more development (Cyr and Davies, 2005)
Statistical Consulting Service (SCS)
• Statistical consulting is provided by a group of York faculty and graduate students, together with staff at the Institute for Social Research (ISR).
• Usually, no fee for York faculty and student researchers
• Online appointment scheduler
• ISR/SCS Short Courses and Spring Seminar Series on data analysis, qualitative research methods, survey methods, and related software
• More details: http://www.isr.yorku.ca/centres/scs/