STATA for S-052

M. Shane Tutwiler

Your Friendly S-040 Lecturer

William Johnston

IT Services

Harvard Graduate School of Education


Getting the files

The do-file used in this workshop as well as all data files are in the Stata Help tab of the course iSite.

– Download SATdata.csv, auto.dta, and Stata for S-052.do, and save them to a new folder called Stata_Workshop on your desktop or on a USB drive.

Contact Information

• Office: Gutman 324

• Email: [email protected]

• Want to set up a consultation? – hgse.service-now.com/ess/research.do

• Want to learn more on your own? – itservices.gse.harvard.edu/its/services/research-online-resources/stata


Agenda: Overview

I. Overview of Stata

II. Getting Started

III. ‘Do’ files

IV. Basic data cleaning

V. Basic data management

VI. Beginning analysis

VII. Questions


Getting Help in Stata

• Many pathways to getting help in Stata:

. help command

. search command

. findit command

• Use the help menu

• Look online with a web browser

• Set up an appointment ([email protected])!

Some notes

• A word about programming in and using Stata

• Stata is case sensitive, so Myvar is different from myvar

• All commands in Stata are lower-case

• and = “&”, or = “|”, not = “!”

• Assignment is “=”, a test of equality is “==”

• Numeric missing values are treated as larger than any real number and are displayed as a period (.); missing string values are empty ("")
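Because a numeric missing value counts as larger than any number, a "greater than" condition quietly includes it. The lines below are a minimal sketch of that pitfall, assuming the auto dataset that ships with Stata (loaded with sysuse) rather than the workshop files; the variable top_rated and the scalar names are made up for illustration.

```stata
* Assumption: the shipped auto dataset, whose rep78 variable has some missing values.
sysuse auto, clear

* A numeric missing (.) is larger than every real number, so ">" picks it up.
count if rep78 > 4
scalar n_with_missing = r(N)

* "&" is and, "!" is not: exclude the missing values explicitly.
count if rep78 > 4 & !missing(rep78)
scalar n_without_missing = r(N)

* "==" tests equality; "=" assigns (here, inside generate).
* top_rated is a hypothetical name; it stays missing where rep78 is missing.
generate byte top_rated = (rep78 == 5) if !missing(rep78)

* The difference is the number of missing rep78 values that slipped in.
display n_with_missing - n_without_missing
```

The same guard, & !missing(varname), is worth adding to any condition that uses > or >=.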


How to Begin a Session?

• Specify your directory

– cd "_______"

• Begin using a log file

– log using "______.log"

• Open your data and look at it

– insheet using "SATdata.csv", comma

– browse

– describe
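Put together, the start of a session might look like the sketch below. The folder path and the log file name are hypothetical, and the sketch assumes SATdata.csv has already been saved in that folder. In Stata 13 and later, import delimited supersedes insheet, although insheet continues to work.

```stata
* Hypothetical folder: replace with wherever you saved the workshop files.
cd "~/Desktop/Stata_Workshop"

* Close any log left open by an earlier run; capture suppresses the error if none is open.
capture log close

* Hypothetical log name; replace overwrites the log from a previous session.
log using "workshop_session.log", replace text

* Read the comma-separated file; clear drops any data already in memory.
insheet using "SATdata.csv", comma clear

* List the variables, their storage types, and their labels.
describe

log close
```

Opening the log before loading the data means the import messages are recorded too.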


Anatomy of a Stata Command

• Stata commands follow a pattern:

• [prefix:] command [varlist] [if] [in] [weight] [, options]

• For example:

. bysort region: summarize expense, detail

. mean csat if income >= 30000 & region != .

. list state in 1/10, nolabel
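The same pattern can be tried without the workshop files. This is a sketch that swaps in the auto dataset shipped with Stata, so the variable names (price, mpg, rep78, make, foreign) belong to that dataset and not to SATdata; the comments mark which part of the pattern each piece fills.

```stata
* Assumption: the shipped auto dataset stands in for SATdata here.
sysuse auto, clear

* [prefix:] command [varlist] [, options]
bysort foreign: summarize price, detail

* command [varlist] [if]: two conditions joined by &, dropping missing rep78
mean mpg if price >= 5000 & rep78 != .

* command [varlist] [in] [, options]: first ten observations, numeric codes shown
list make foreign in 1/10, nolabel
```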


Getting Started

• Opening Data

• Stata-formatted data (.dta): use "file name"

• Comma-separated values: insheet using "file name", comma

• Tab-delimited values: insheet using "file name", tab

• Web-based data files: webuse "file name" for Stata's example datasets, or use "URL" for a file at any web address

• Flat files: Create a dictionary (beyond the scope of this workshop)

Looking at Data

• Look at your data – did our data import correctly?

• How are our data measured?

• What kinds of variables do we have?

• Editor

. edit

• Browser

. browse

• Other commands

. codebook

. describe

Examining Data

• There are several ways to look at our data in Stata

• How would we describe the distribution of our data?

• Graphs of distribution

– Histograms: . histogram

– Scatterplots: . scatter

• Charts/Tables of frequency and distribution

– Frequency tables: . table

– Cross-tabs: . tabulate
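A quick way to try each of these commands is sketched below, assuming the auto dataset shipped with Stata in place of the workshop data; the choice of variables is only illustrative.

```stata
* Assumption: the shipped auto dataset stands in for the workshop data.
sysuse auto, clear

* Histogram of one continuous variable, with counts on the y-axis.
histogram mpg, frequency

* Scatterplot: y variable first, x variable second.
scatter mpg weight

* One-way frequency table.
table foreign

* Cross-tab of two categorical variables; missing adds a column for missing values.
tabulate foreign rep78, missing
```

Each graph command replaces the previous graph window, so look at the histogram before running scatter.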


Basic Data Operations, part 1

• Generating a new variable

gen newvarname=expression

• Subsetting

. keep varlist

. drop varlist

– if (as a qualifier, e.g. keep if expression)

• Joining Two Datasets

. merge

• Note: this is covered in detail in the Data Management Workshop!
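The difference between subsetting variables and subsetting observations is easy to mix up, so here is a small sketch. It assumes the auto dataset shipped with Stata; price_per_lb and the cutoff of 10000 are made up for illustration. Merging is left out, since it is covered in the other workshop.

```stata
* Assumption: the shipped auto dataset stands in for the workshop data.
sysuse auto, clear

* Generate a new variable from an expression (price_per_lb is a hypothetical name).
generate price_per_lb = price / weight

* "if" as a qualifier restricts a command without deleting anything.
summarize price_per_lb if foreign == 1

* keep/drop with a varlist act on variables (columns).
keep make price weight price_per_lb foreign
drop weight

* keep/drop with "if" act on observations (rows).
keep if foreign == 0
drop if price > 10000
```

keep and drop change the data in memory only; the file on disk is untouched until you save.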

Basic Data Operations, part 2

• Labeling

• To label a variable: label variable varname "label text"

• To label values:

. label define labelname 1 "high" 0 "low"

. label values varname labelname

• Renaming

. rename varname1 varname2

• Replacing values of an already generated variable

. replace newvarname=expression
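These commands usually appear together when building an indicator variable. The sketch below assumes the auto dataset shipped with Stata; the variable names hi_mpg and high_mileage, the label name hilo, and the cutoff of 20 are all hypothetical.

```stata
* Assumption: the shipped auto dataset stands in for the workshop data.
sysuse auto, clear

* Start from 0, then replace where the condition holds.
generate byte hi_mpg = 0
replace hi_mpg = 1 if mpg > 20 & !missing(mpg)

* Keep missing values missing instead of coding them as 0.
replace hi_mpg = . if missing(mpg)

* Label the variable itself, then define and attach value labels.
label variable hi_mpg "Mileage above 20 mpg"
label define hilo 0 "low" 1 "high"
label values hi_mpg hilo

* Rename: old name first, new name second.
rename hi_mpg high_mileage

tabulate high_mileage
```

Value labels need straight double quotes; the curly quotes that presentation software inserts will not work when pasted into Stata.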


Apply Your Knowledge

• Use the SATdata dataset

• Generate a dichotomous variable called hi_score from the csat variable, where a value of 1 indicates a score of greater than 922 and a 0 is less than or equal to 922.

• Label it as 0=low and 1=high.


Beginning Analysis

• Useful commands

• Looking at Distributions

– table, histogram, summarize

• Testing the Normality Assumption

– sktest, ladder, gladder

• Beginning to Look at Relationships

– tabulate, pwcorr, ttest, anova
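One possible pass through these commands is sketched below, assuming the auto dataset shipped with Stata. The variables are chosen only to show the syntax, and the scalar p_ttest is a hypothetical name for holding one result.

```stata
* Assumption: the shipped auto dataset stands in for the workshop data.
sysuse auto, clear

* Distributions: detailed summary, then a histogram with a normal curve overlaid.
summarize price, detail
histogram price, normal

* Normality: skewness/kurtosis test, then candidate transformations.
sktest price
ladder price
gladder price

* Relationships between two categorical variables.
tabulate foreign rep78, chi2

* Pairwise correlations with significance levels.
pwcorr price mpg weight, sig

* Two-group comparison of means; keep the p-value for later use.
ttest mpg, by(foreign)
scalar p_ttest = r(p)

* One-way ANOVA of mpg across repair-record categories.
anova mpg rep78
```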


Apply Your Knowledge

• Generate a histogram of the expense variable.

• Generate a two-way table to see if distributions are the same or different for the values of expense by the different values of your newly created hi_score variable.

• If you have time, see if there is a significant correlation between scores on SATs and the average amount of money that each state spends on education (expense).


Building Regression Models

• Regression models

• Linear regression

. regress depvar indepvar1 indepvar2 …

• Logistic regression

. logit depvar indepvar1 indepvar2 …
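Both commands take the outcome first and the predictors after it. The sketch below assumes the auto dataset shipped with Stata, with price as a continuous outcome and foreign as a 0/1 outcome; the choice of predictors is only illustrative.

```stata
* Assumption: the shipped auto dataset stands in for the workshop data.
sysuse auto, clear

* Linear regression: continuous outcome first, predictors after.
regress price mpg weight

* Logistic regression: 0/1 outcome first; coefficients are in log-odds.
logit foreign mpg weight

* Same model, reported as odds ratios.
logit foreign mpg weight, or
```

logit expects an outcome coded 0 and nonzero, so check the coding with tabulate before fitting.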


Apply Your Knowledge

• Generate two scatterplots – one to look at the relationship between expense and csat , one to look at expense and hi_score.

• Depending on your estimation of the relationship (linear or not), run the appropriate regression to test for the relative effect of expense on either csat or hi_score.


Saving data, code, and output

• Saving your newly transformed data

. save "pathname\filename.dta"

. outsheet using "pathname\filename"

• Saving your code

– SAVE YOUR DO-FILE!!!!!

• Saving your output

– Create a log file

. log using "pathname\filename"

. log close (!!!!) Not closing = not saving!

• Saving graphs

. graph save
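A sketch of one way to save everything from a session follows. It assumes the auto dataset shipped with Stata and writes to the current working directory; every file name below is hypothetical. Note that outsheet writes tab-delimited text unless the comma option is given.

```stata
* Assumption: the shipped auto dataset; all file names below are hypothetical.
sysuse auto, clear

* Open a log first so everything that follows is recorded.
capture log close
log using "saving_demo.log", replace text

generate price_k = price / 1000

* Save in Stata format; replace overwrites an existing file of the same name.
save "auto_workshop.dta", replace

* Export as comma-separated text for use in other programs.
outsheet using "auto_workshop.csv", comma replace

* Draw a graph, then save it in Stata's .gph format.
histogram price_k
graph save "price_hist.gph", replace

* Close the log so the output is written out in full.
log close
```

Without replace, each of these commands stops with an error if the file already exists, which protects earlier work from being overwritten by accident.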


Thanks!

Questions?

Gutman Library, room 323a

[email protected]

http://itservices.gse.harvard.edu/its/services/research