using large data sets
![Page 1: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/1.jpg)
tues, feb 4, 2014
using large data sets
![Page 2: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/2.jpg)
Analyzing a research article
• Use the Analyzing Research Articles handout
• Select one of the five research articles linked from our class schedule (Feb 4)
• http://ils.unc.edu/courses/2014_spring/inls200_001/schedule.html
• Focus on the purpose of the study, description of study design (participants, methods, how they collected data), data analysis and conclusions
• Don’t worry about specific statistical analysis methods
• Due next Tuesday – print or email to me by class time
• Format – whatever works for you (bullets, address some but not necessarily all questions/points from handout)
• Counts as one pop quiz (worth up to 2 points)
![Page 3: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/3.jpg)
Collecting Quantitative Data for a Study
• sample survey: sample people from a population and interview them.
  example: General Social Survey
• experiment: compare responses of subjects under different conditions, with subjects assigned to the conditions.
  example: food labeling studies
![Page 4: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/4.jpg)
![Page 5: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/5.jpg)
General Social Survey
• The GSS (General Social Survey) is a biennial personal interview survey of U.S. households conducted by the National Opinion Research Center (NORC). The first survey took place in 1972.
• Approximately 3000 American adults are interviewed in person for about 90 minutes and asked around 450 questions.
![Page 6: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/6.jpg)
http://www3.norc.org/gss+website/
![Page 7: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/7.jpg)
Purpose of GSS
• gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes over time
• compare the United States to other societies
![Page 8: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/8.jpg)
General Social Survey
• demographics & attitudes
– The questionnaire contains a standard core of demographic and attitudinal variables, plus certain topics of special interest selected for rotation (called "topical modules")
– Items include national spending priorities, drinking behavior, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and membership in voluntary associations
![Page 9: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/9.jpg)
![Page 10: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/10.jpg)
variables
• variable – a characteristic that can vary in value among subjects in a sample or a population. We are interested in similarities and differences - variance
• types of variables
– categorical (also called qualitative)
– quantitative
![Page 11: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/11.jpg)
categorical variable
• scale for measurement is a set of categories
• examples:
– Racial-ethnic group (white, black, Hispanic)
– Political party identification (Dem., Repub., Indep.)
– Vegetarian? (yes, no)
– Mental health evaluation (well, mild symptom formation, moderate symptom formation, impaired)
– Happiness (very happy, pretty happy, not too happy)
– Religious affiliation
– Major
![Page 12: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/12.jpg)
SPANKING: Categorical (Single) Do you strongly agree, agree, disagree, or strongly disagree that it is sometimes necessary to discipline a child with a good, hard spanking?
Categories: Code as:
{strongly_agree} Strongly agree 5
{agree} Agree 4
{disagree} Disagree 3
{strongly_disagree} Strongly disagree 2
{dontknow} DON'T KNOW 1
{refused} REFUSED 0
Sample question from GSS
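To see what "code as" means in practice, here is a minimal Python sketch that turns answers to the SPANKING question into the numeric codes listed on the slide. The code values come from the slide; the `responses` list is made-up illustration data, not GSS data.

```python
from collections import Counter

# Code values copied from the SPANKING slide.
SPANKING_CODES = {
    "strongly_agree": 5,
    "agree": 4,
    "disagree": 3,
    "strongly_disagree": 2,
    "dontknow": 1,
    "refused": 0,
}

# Hypothetical answers, for illustration only (not real GSS data).
responses = ["agree", "strongly_agree", "disagree", "agree", "refused"]

# Replace each answer label with its numeric code.
coded = [SPANKING_CODES[r] for r in responses]

# Count how often each category was chosen.
counts = Counter(responses)

print(coded)
print(counts.most_common(1))
```

The numbers are labels for categories, so a frequency count is a safer summary here than an average of the codes.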
![Page 13: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/13.jpg)
scales of measurement
for categorical variables, two types:
nominal scale – unordered categories
o preference for president, race, gender, religious affiliation, major
o opinion items (favor vs. oppose, yes vs. no)
ordinal scale – ordered categories
o political ideology (very liberal, liberal, moderate, conservative, very conservative)
o anxiety, stress, self-esteem (high, medium, low)
o mental impairment (none, mild, moderate, severe)
o government spending on environment (up, same, down)
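The difference between the two scales shows up in which summaries make sense. The Python sketch below is one way to illustrate it, using made-up answers (not GSS data): a nominal variable is summarized by its most common category, while an ordinal variable also has a meaningful middle category.

```python
from collections import Counter
from statistics import median_low

# Hypothetical answers, for illustration only.
party = ["Dem.", "Repub.", "Indep.", "Dem.", "Dem."]         # nominal
impairment = ["none", "mild", "severe", "mild", "moderate"]  # ordinal

# Nominal: categories have no order, so report the most frequent one (the mode).
party_mode = Counter(party).most_common(1)[0][0]

# Ordinal: categories are ordered, so a median category is also meaningful.
ORDER = ["none", "mild", "moderate", "severe"]
ranks = [ORDER.index(answer) for answer in impairment]
median_category = ORDER[median_low(ranks)]

print(party_mode)
print(median_category)
```

`median_low` is used so the result is always one of the observed ranks, which can be mapped back to a category name.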
![Page 14: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/14.jpg)
PRES12: Categorical (Single) Did you vote for Obama or Romney?
Categories: Code as:
Obama 5
Romney 4
Other Candidate (Specify) 3
Didn’t vote for president 2
Don’t know 1
Refused 0
nominal scale – unordered categories
![Page 15: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/15.jpg)
POLVIEWS: Categorical (Single) We hear a lot of talk these days about liberals and conservatives. I'm going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal--point 1--to extremely conservative--point 7. Where would you place yourself on this scale?
Categories: Code as:
Extremely liberal 7
Liberal 6
Slightly liberal 5
Moderate, middle of the road 4
Slightly conservative 3
Conservative 2
Extremely conservative 1
DON'T KNOW 0
REFUSED 8
ordinal scale – ordered categories
![Page 16: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/16.jpg)
quantitative variable
• possible values differ in magnitude
• examples:
– Age, height, weight, BMI = weight(kg)/[height(m)]²
– Annual income
– GPA
– Time spent on Internet yesterday
– Reaction time to a stimulus (e.g., cell phone while driving in experiment)
– Number of “life events” in past year
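As a small worked example of a quantitative variable, the BMI formula from the list above can be written as a Python function. The function name and the sample weight and height are illustrative assumptions.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical person: 70 kg, 1.75 m.
print(round(bmi(70, 1.75), 1))
```

Unlike a category code, the result is a magnitude: a difference of 2 BMI points means the same thing anywhere on the scale.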
![Page 17: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/17.jpg)
![Page 18: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/18.jpg)
use of statistics to describe, summarize, and explain or make sense of a given set of data
![Page 19: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/19.jpg)
Comparison of mean and median
• Mean
– Uses all of the data
– Has desirable statistical properties
– Affected by extreme high or low values (outliers, MJ example)
– May not best characterize skewed distributions
• Median
– Not affected by outliers
– May better characterize skewed distributions
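A few numbers make the contrast concrete. This Python sketch uses made-up salaries (not real data) and adds a single extreme value to show how far the mean moves while the median barely changes.

```python
from statistics import mean, median

# Hypothetical starting salaries in dollars, for illustration only.
salaries = [28_000, 30_000, 31_000, 33_000, 35_000]

# The same group with one extreme earner added.
with_outlier = salaries + [3_000_000]

print(mean(salaries), median(salaries))
print(mean(with_outlier), median(with_outlier))
```

With the outlier included, the mean is pulled above half a million dollars, a figure that describes nobody in the group, while the median stays close to what a typical member earns.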
![Page 20: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/20.jpg)
In the mid-1980s at the University of North Carolina, the average starting salary of geography students was well over $100,000
![Page 21: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/21.jpg)
![Page 22: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/22.jpg)
sample patterns from GSS data
– median income of female respondents compared with average income of male respondents
– median level of education of respondents who own a gun
– number of female respondents who own a gun compared with number of male respondents who own a gun
– average age of respondents who indicated the government should spend more on space exploration
– self-reported level of happiness compared with income level
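In class these comparisons are done in SPSS, but the logic can be sketched in a few lines of Python: filter the respondents into groups, then summarize each group. The records and field names below are invented for illustration and are not GSS variable names or GSS data.

```python
from statistics import mean, median

# Hypothetical respondents, for illustration only (not real GSS records).
respondents = [
    {"sex": "female", "income": 32_000, "owns_gun": False},
    {"sex": "female", "income": 45_000, "owns_gun": True},
    {"sex": "female", "income": 27_000, "owns_gun": False},
    {"sex": "male", "income": 38_000, "owns_gun": True},
    {"sex": "male", "income": 52_000, "owns_gun": True},
]

def incomes(sex):
    """Return the incomes of all respondents of the given sex."""
    return [r["income"] for r in respondents if r["sex"] == sex]

# Median income of female respondents vs. average income of male respondents.
median_income_female = median(incomes("female"))
mean_income_male = mean(incomes("male"))

# Number of gun owners in each group.
gun_owners_by_sex = {
    sex: sum(1 for r in respondents if r["sex"] == sex and r["owns_gun"])
    for sex in ("female", "male")
}

print(median_income_female, mean_income_male)
print(gun_owners_by_sex)
```

Note that the first pattern compares a median with a mean, so any difference may partly reflect the choice of statistic and not only a difference between the groups.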
![Page 23: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/23.jpg)
Sample characteristics of the GSS
• The sampling frame of the General Social Survey is all U.S. adults living in households. The sampling frame includes 97.3% of all U.S. adults.
• Who does not live in a household?
– college students in dorms
– military personnel in barracks
– prisoners
– elderly persons in retirement homes
![Page 24: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/24.jpg)
Does the GSS sample really draw from all the adults in its sampling frame?
• After the GSS sample is drawn, only 70% of persons in the sample actually respond to the survey (in the 2004 study).
– 23% refuse or cut the survey off in the middle
– 2% are unavailable or can’t be found
– 5% are missing for other reasons
• In general, a response rate of 60% or more is considered minimally acceptable, but you should check your results in any way you can.
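The response-rate arithmetic can be checked with a short Python sketch. The percentages are the 2004 figures quoted above; the sample size of 1,000 is a hypothetical round number chosen only to make the counts easy to read.

```python
# Percentages from the slide (2004 GSS).
outcomes = {
    "responded": 70,
    "refused or broke off": 23,
    "unavailable or not found": 2,
    "missing for other reasons": 5,
}

# The four outcomes should account for everyone who was sampled.
assert sum(outcomes.values()) == 100

sampled = 1_000  # hypothetical sample size, not the actual GSS figure
expected_respondents = sampled * outcomes["responded"] // 100
nonresponse_pct = 100 - outcomes["responded"]

# Compare against the 60% rule of thumb mentioned on the slide.
meets_threshold = outcomes["responded"] >= 60

print(expected_respondents, nonresponse_pct, meets_threshold)
```

Clearing the 60% threshold does not rule out bias: if the 30% who did not respond differ from those who did, the results can still be skewed.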
![Page 25: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/25.jpg)
Let’s look at a GSS questionnaire
Start at page 31
![Page 26: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/26.jpg)
where can you access SPSS?
• Odum Institute – Davis Library 2nd floor – ask lab assistant
• https://virtuallab.unc.edu
• Lab in the Undergraduate Library (need to confirm)
![Page 27: using large data sets](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814889550346895db59e90/html5/thumbnails/27.jpg)
Notes…
• Bring a flash drive to the Odum lab on Thursday – you may want to save your work
• Davis Library >> Room 219
• The dataset that we are using (GSS 2012) is available for download on our class website
– schedule>>feb 06
– the dataset is in .sav format – opens with SPSS