A Gentle Introduction to STATA

Jose Ramon G. Albert
Research Division Chief
Statistical Research & Training Center (SRTC)
email: [email protected]

SIAP-SRTC Training Course on Sampling, Acceed Center, AIM, Makati, Philippines, 4 April 2002

Upload: marva

Post on 19-Mar-2016

33 views

Category:

Documents


0 download

DESCRIPTION

A Gentle Introduction toSTATA. Jose Ramon G. Albert Research Division Chief Statistical Research & Training Center (SRTC) email: [email protected]. SIAP-SRTC Training Course on Sampling Acceed Center, AIM, Makati Philippines 4 April 2002. OUTLINE. Statistical Computing Resources - PowerPoint PPT Presentation

TRANSCRIPT

  • SIAP-SRTC Training Course on Sampling, Acceed Center, AIM, Makati, Philippines, 4 April 2002

    2000 SPSS Public Sector User Exchange

    *SIAP-SRTC Training on Sampling

    OUTLINE
    Statistical Computing Resources
    Data Management with Stata
    Table Generation: Tab and Table Commands
    Survey Commands

    Computing Resources
    The Age of ICT has brought about a synergy of computing and communications. Implications:
    More DATA collected
    More DATA stored
    More DATA accessible and distributed

    Computing Resources
    There is a host of statistical software packages that provide pre-programmed analytical and data management capabilities. These packages may be classified according to use and cost.

    Computing Resources
    Types of Stat Software by Usage
    General Purpose: SAS, SPSS, R, S-plus, Statistica, Stata
    Special Purpose: econometric modeling (EViews), seasonal adjustment (X12), Bayesian modeling (WinBUGS), survey data tabulation & variance estimation (IMPS, CENVAR)

    Computing Resources
    Types of Stat Software by Cost
    Commercial Software: SAS, SPSS, Stata, S-plus
    Freeware: R, IMPS, X12

    Computing Resources
    FOR SURVEY DATA
    Bascula from Statistics Netherlands
    CENVAR (& IMPS) from U.S. Bureau of the Census
    CLUSTERS from University of Essex
    Epi Info from Centers for Disease Control
    Generalized Estimation System (GES) from Statistics Canada
    IVEWare (beta version) from University of Michigan

    Computing Resources
    FOR SURVEY DATA
    PCCARP from Iowa State University
    SAS/STAT from SAS Institute
    Stata from Stata Corporation
    SUDAAN from Research Triangle Institute
    VPLX from U.S. Bureau of the Census
    WesVar from Westat, Inc.

    Computing Resources
    Lists of Statistical Software
    http://members.aol.com/johnp71/javasta2.html
    http://www.stir.ac.uk/Departments/HumanSciences/SocInfo/Statistical.htm
    http://www.fas.harvard.edu/~stats/survey-soft/
    http://www.feweb.vu.nl/econometriclinks/software.html

    Computing Resources
    This afternoon, we will demonstrate how to use STATA to accomplish some of the most common tasks in data management, statistical computing, and the analysis of survey data.

    Computing Resources: Stata
    Estimation of means, totals, ratios, and proportions; linear regression, logistic regression, and probit regression. Point estimates, associated standard errors, confidence intervals, and design effects are displayed for the full population or for subpopulations.

    Computing Resources: Stata
    Auxiliary commands display various information for linear combinations (e.g., differences) of estimators and conduct hypothesis tests. New in Stata: contingency tables with Rao-Scott corrections of chi-squared tests; new survey-corrected regression commands, including tobit, interval, censored, instrumental variables, multinomial logit, ordered logit and probit, and Poisson regression.

    Computing Resources: Stata
    Supported designs include stratified designs and cluster sampling. FPCs (finite population corrections) can be applied for simple random sampling without replacement of sampling units within strata. Variance estimation for multistage sample data is carried out through the customary calculation based on squared differences between PSUs within each stratum.
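
The between-PSU calculation for an estimated total can be sketched in Python (this is not Stata's own code). Assumptions: each record carries a stratum, a PSU identifier numbered within its stratum, a sampling weight, and a value; every stratum has at least two sample PSUs; the optional FPC uses a hypothetical dict giving the number of PSUs N_h in each stratum population. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def total_and_variance(stratum, psu, weight, y, n_psu_pop=None):
    """Estimated total of y and its between-PSU (ultimate cluster) variance.

    stratum, psu, weight, y: equal-length sequences, one entry per record.
    n_psu_pop: optional dict {stratum: N_h}, the number of PSUs in the stratum
    population; when given, each stratum term is multiplied by (1 - n_h/N_h).
    """
    # Weighted total z_hi for each sample PSU i in stratum h
    psu_totals = defaultdict(float)
    for h, i, w, value in zip(stratum, psu, weight, y):
        psu_totals[(h, i)] += w * value

    # Group the PSU totals by stratum
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    total = sum(psu_totals.values())
    variance = 0.0
    for h, z in by_stratum.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 sample PSUs")
        z_bar = sum(z) / n_h
        squared_diffs = sum((z_hi - z_bar) ** 2 for z_hi in z)
        fpc = 1.0 if n_psu_pop is None else 1.0 - n_h / n_psu_pop[h]
        variance += fpc * n_h / (n_h - 1) * squared_diffs
    return total, variance


# Toy data: 2 strata, 2 PSUs per stratum, 2 records per PSU
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
weight = [10] * 8
y = [1, 2, 3, 4, 5, 6, 7, 8]
print(total_and_variance(stratum, psu, weight, y))  # (360.0, 3200.0)
```

Passing n_psu_pop={1: 4, 2: 4} applies the factor (1 - 2/4) in each stratum and halves the variance to 1600.0.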

    Computing Resources: Stata
    Variance estimation in the survey analysis commands is done through Taylor-series linearization. There are also commands for jackknife and bootstrap variance estimation, but these are not specifically oriented toward survey data.
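
The linearization idea can be sketched in Python for a ratio of two estimated totals, R = Y/X: each record is replaced by its linearized value u = (y - R x)/X, and the between-PSU variance formula is applied to the weighted total of u. A weighted mean is the special case x = 1. This is a minimal sketch with hypothetical function names and toy data, without FPC, assuming PSUs are numbered within strata and every stratum has at least two PSUs; it does not reproduce Stata's degrees of freedom or confidence intervals.

```python
from collections import defaultdict
from math import sqrt


def between_psu_variance(stratum, psu, weight, u):
    """Between-PSU variance of the weighted total of u (no FPC)."""
    psu_totals = defaultdict(float)
    for h, i, w, value in zip(stratum, psu, weight, u):
        psu_totals[(h, i)] += w * value
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    variance = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((z_hi - z_bar) ** 2 for z_hi in z)
    return variance


def ratio_linearized(stratum, psu, weight, y, x):
    """Ratio R = Y/X of weighted totals and its linearized standard error."""
    Y = sum(w * yi for w, yi in zip(weight, y))
    X = sum(w * xi for w, xi in zip(weight, x))
    R = Y / X
    # First-order Taylor (linearized) value of the ratio for each record
    u = [(yi - R * xi) / X for yi, xi in zip(y, x)]
    return R, sqrt(between_psu_variance(stratum, psu, weight, u))


# Toy data: 2 strata, 2 single-record PSUs each; x = 1 gives a weighted mean
stratum, psu, weight = [1, 1, 2, 2], [1, 2, 1, 2], [1, 1, 1, 1]
y = [2, 4, 6, 8]
print(ratio_linearized(stratum, psu, weight, y, [1, 1, 1, 1]))
```

For this toy design the mean is 5.0 with SE sqrt(0.5), which matches the textbook stratified-mean formula (two equal strata, each with sample variance 2 over 2 units).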

    Computing Resources
    Note: We will demonstrate the use of STATA version 6. The current version is version 7, which also comes in a Special Edition (SE) that can handle up to 32,766 variables, strings up to 244 characters, and matrices up to 11,000 x 11,000.

    Data Management
    STARTING UP
    Go to Start, Programs, Stata, Intercooled Stata.
    Alternatively, from Windows Explorer, go to folder c:\stata and double-click wstata.exe.

    Data Management

    Data Management
    CREATING A NEW DATASET
    Open the STATA spreadsheet editor.

    Data Management
    CREATING A NEW DATASET
    Enter data into the editor; when done, close the editor.

    Data Management
    CREATING A NEW DATASET
    In the STATA COMMAND window, enter the command
    save newfile

    Data Management
    NOTE
    A STATA dataset has the extension .dta; that is, newfile is actually newfile.dta.
    Public use files of some surveys, e.g., the VLSS (Vietnam Living Standards Survey), are in Stata format.

    Data Management
    INSPECTING THE DATABASE
    In the STATA COMMAND window, enter the following commands:
    describe
    list
    summarize

    Data Management
    NOTE:
    Stata is case sensitive; commands must be typed in lowercase.
    Stata commands may be abbreviated, e.g., d for describe, sum for summarize, etc.
    We may use the Page Up/Page Down keys, or the mouse in the Review window, to re-select previous commands.

    Data Management
    NOTE:
    Commands and output are shown in the Results window. Windows may be resized.
    Commands and output may be logged into a log file by pressing the Open Log button.

    Data Management
    RENAMING VARIABLES
    ONE WAY: (From the Data Editor) Double-click anywhere in the variable's column; this opens a dialog box.

    Data Management
    RENAMING VARIABLES
    SECOND WAY: (In the STATA COMMAND window) enter
    rename var1 domain
    rename var2 hcn
    rename var3 age
    label variable age "HH head age"

    Data Management
    SAVING THE EDITED DATABASE
    In the STATA COMMAND window, enter the command
    save newfile, replace
    Note: typing only save newfile will result in an error message, because newfile.dta already exists; the replace option allows Stata to overwrite it.

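    Data Management
    READING A PRE-EXISTING STATA DATASET
    If the dataset is in folder c:\fies2000 and the filename is fies00small.dta, enter
    clear
    set mem 64m
    cd c:\fies2000
    use fies00small
    NOTE: set mem is important for MEMORY MANAGEMENT: Stata holds the whole dataset in memory, so enough memory (here 64 MB) must be allocated before the data are read.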

    Data Management
    IMPORTING DATA
    Suppose we have a dataset try.txt in the c:\fies2000 folder.
    NOTE: Missing data are coded as . (a period).

    Data Management
    IMPORTING DATA
    Suppose we have a dataset try.txt in the c:\fies2000 folder. Use the infile command, with syntax
    infile variable-list using filename.raw
    In particular, enter
    cd c:\fies2000
    infile domain hcn age using try.txt, automatic

    Data Management
    TRIVIA ON STRING VARIABLES
    When using the infile command with character (string) variables, we need to identify these variables. For instance,
    infile domain hcn str30 prov using tr.txt
    declares prov as a string variable of up to 30 characters. For more details regarding infile, enter
    help infile1

    Data Management
    IMPORTING DATA
    Suppose we have a dataset try2.txt in the c:\fies2000 folder, with the data in specific fields (columns).
    NOTE: This assumes the last line is a blank line.

    Data Management
    IMPORTING DATA
    Suppose we have a dataset try2.txt in the c:\fies2000 folder, with the data in specific fields. Use the infix command:
    infix domain 1 hcn 2 age 3-4 using try2.txt, clear
    Here domain is read from column 1, hcn from column 2, and age from columns 3-4.
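
What infix does with its column specification can be mimicked in Python: each variable is cut from a fixed range of 1-based, inclusive columns. A minimal sketch assuming numeric fields only, with all-blank fields treated as missing (None) and blank lines skipped; the function name and the sample records are hypothetical, not the contents of try2.txt.

```python
def read_fixed_columns(lines, spec):
    """Parse fixed-format text lines.

    spec: list of (name, first_col, last_col), 1-based and inclusive, so
    'domain 1 hcn 2 age 3-4' becomes the list used below.
    Numeric fields only; an all-blank field becomes None (missing).
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        record = {}
        for name, first, last in spec:
            field = line[first - 1:last].strip()
            record[name] = float(field) if field else None
        records.append(record)
    return records


spec = [("domain", 1, 1), ("hcn", 2, 2), ("age", 3, 4)]
sample = ["1245", "2367", "31  ", ""]  # hypothetical records
for rec in read_fixed_columns(sample, spec):
    print(rec)
```

The third record has a blank age field, so its age comes back as None, the counterpart of Stata's missing value.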

    Data Management
    Thus, Stata can read text files with
    infile: if the data are separated by spaces and have no strings, or if strings are just one word, or if all strings are enclosed in quotes
    infix: fixed-format text
    insheet: if the text file was created by a spreadsheet or database program

    Data Management
    NOTE:
    The commands infile, infix, and insheet read data from ASCII files; outfile is a way to save the data in ASCII. There are third-party programs, especially Stat/Transfer and DBMS/COPY, that translate from one data format (e.g., dBASE, Excel, SAS, SPSS, Stata) to another.

    Data Management

    Data Management
    OTHER USEFUL COMMANDS
    To sort the dataset by age:
    sort age
    To get a listing of the dataset:
    list
    To get a listing of the 2nd to 4th observations:
    list in 2/4

    Data Management
    OTHER USEFUL COMMANDS
    To summarize the restricted dataset of HHs whose head's age is less than or equal to 50:
    summarize if age <= 50
    (Relational operators include >, >=, ==, <, and <=.)
    To get the correlation matrix:
    correlate x y z

    Data Management
    GENERATING & REPLACING VARIABLES
    Suppose we want to obtain the per capita income (pci) of FIES 2000 households:
    clear
    cd c:\fies2000
    use fies00small
    gen pci=toinc/hsize

    Data Management
    GENERATING & REPLACING VARIABLES
    Now tag a household as poor (1) if pci is below some threshold, say 13823, and determine the percentage of HHs that are poor:
    gen poor=1 if pci < 13823
    replace poor=0 if poor==.
    sum poor [aw=rfact]
    save fies00small, replace
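
The sum poor [aw=rfact] step works because the weighted mean of a 0/1 indicator is a weighted proportion. A minimal Python sketch of that arithmetic, assuming no missing pci values; the function name, incomes, and weights are hypothetical, not FIES data.

```python
def weighted_poverty_incidence(pci, rfact, line=13823):
    """Weighted share of households with per capita income below `line`.

    Follows the logic of:
        gen poor=1 if pci < 13823
        replace poor=0 if poor==.
        sum poor [aw=rfact]
    """
    poor = [1 if income < line else 0 for income in pci]
    return sum(w * p for w, p in zip(rfact, poor)) / sum(rfact)


pci = [10000, 20000, 12000, 30000]  # hypothetical per capita incomes
rfact = [100, 300, 200, 400]        # hypothetical raising factors (weights)
print(weighted_poverty_incidence(pci, rfact))  # 0.3
```

Weighting matters: two of the four toy households are poor (0.5 unweighted), but they carry only 300 of the 1,000 units of weight.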

    Data Management
    NOTE
    A small portion of the FIES 2000 data set was used. The Family Income and Expenditure Survey (FIES) is conducted by the National Statistics Office (NSO) every 3 years. Data may be purchased through the NSO website: www.census.gov.ph

  • SIAP-SRTC Training Course on Sampling, Acceed Center, AIM, Makati, Philippines, 5 April 2002

    Data Management
    RECALL
    What we obtained with our FIES 2000 data set:
    set mem 64m
    cd c:\fies2000
    use fies00small
    sum poor [aw=rfact]
    Note that the poverty line we provided is a weighted average of the poverty lines, which vary across the urban and rural areas of the different regions of the Philippines.

    Estimating Food Poverty Line
    The food poverty line is estimated from low-cost one-day menus (breakfast, lunch, supper, snack) constructed by the Food and Nutrition Research Institute (FNRI) for each urban and rural area of a region. The menus meet 100% sufficiency in energy and protein requirements and 80% sufficiency of other nutrients and vitamins.
    RDA for energy: 2000 kcal per person
    RDA for protein: 50 grams per person
    29 such menus were constructed on the basis of the 1988 Food Consumption Survey.

    Annual Per Capita Food Line, Urban, by Region

    Annual Per Capita Food Line, Rural, by Region

    Estimating Poverty Line
    Poverty Line = Food Threshold / Engel's Coefficient
    Engel's Coefficient = Food Expenditure / Total Basic Expenditure
    Engel's coefficient is estimated by analyzing the consumption pattern of families having incomes within plus or minus 10 percentage points of the food threshold.
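
The two formulas above can be combined in a short Python sketch. Assumptions not stated on the slide: "within plus or minus 10 percentage points" is read as per capita income between 90% and 110% of the food threshold, Engel's coefficient is computed as a ratio of aggregate food expenditure to aggregate basic expenditure over those reference families, and the threshold, family records, and function name are hypothetical.

```python
def poverty_line(food_threshold, families, band=0.10):
    """Return (poverty line, Engel's coefficient).

    families: list of (per_capita_income, food_exp, total_basic_exp).
    Reference families: per capita income within +/- band of the threshold.
    """
    low, high = food_threshold * (1 - band), food_threshold * (1 + band)
    reference = [(food, basic) for income, food, basic in families
                 if low <= income <= high]
    if not reference:
        raise ValueError("no reference families near the food threshold")
    engel = sum(food for food, _ in reference) / sum(basic for _, basic in reference)
    return food_threshold / engel, engel


families = [
    (9500, 6000, 10000),   # inside the band
    (10500, 6400, 10000),  # inside the band
    (20000, 9000, 25000),  # outside the band, ignored
]
line, engel = poverty_line(10000, families)
print(round(engel, 2), round(line, 2))  # 0.62 16129.03
```

Because the food share is below 1, dividing by it lifts the food threshold to a poverty line that also covers basic nonfood needs.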

    Annual Per Capita Poverty Line, Urban, by Region

    Annual Per Capita Poverty Line, Rural, by Region

    Poverty Statistics (Family) [Standard Error]

    Measure              2000            1997
    Poverty Incidence    33.6% [0.3%]    31.8%
    Poverty Gap          10.7% [0.1%]    10.0%
    Severity Index        4.6% [0.1%]     4.3%
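
The three measures in the table are commonly defined as members of the Foster-Greer-Thorbecke (FGT) family: P_alpha is the weighted sum, over units with income y below the poverty line z, of ((z - y)/z)^alpha, divided by the sum of all weights, with alpha = 0 (incidence), 1 (poverty gap), and 2 (severity). Assuming that definition, here is a minimal Python sketch with a single poverty line and hypothetical data; the official figures use area-specific lines and are not reproduced here.

```python
def fgt(income, weight, z, alpha):
    """Weighted Foster-Greer-Thorbecke poverty index P_alpha.

    alpha = 0: poverty incidence; 1: poverty gap; 2: severity index.
    Only units with income strictly below the poverty line z contribute.
    """
    contribution = sum(w * ((z - y) / z) ** alpha
                       for y, w in zip(income, weight) if y < z)
    return contribution / sum(weight)


income = [50, 100, 150, 200]  # hypothetical per capita incomes
weight = [1, 1, 1, 1]
for alpha in (0, 1, 2):
    print(alpha, fgt(income, weight, z=100, alpha=alpha))
```

This prints 0.25, 0.125, and 0.0625: one unit in four is poor, and its income falls short of the line by half. Squaring the shortfall in the severity index gives more weight to the poorest.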

    Poverty Incidence, All Areas, by Region

    Small Area Poverty Stats?
    Stata has some add-ons for generating SEs for poverty statistics. If we wish to generate provincial poverty statistics, we will find that the SEs are too high, i.e., the figures are unreliable.

    Data Management
    NOTE:
    STATA uses several types of weights:
    fw  frequency weights
    aw  analytic weights
    iw  importance weights
    pw  probability weights
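
A small Python illustration of two of these weight types. Frequency weights behave like replicated observations, so the weighted mean equals the plain mean of the expanded data. Analytic weights give the same point estimate, but Stata rescales them to sum to the number of observations, so the sample size it works with differs. The data and helper names are hypothetical, and the variance formulas that distinguish the weight types are not shown.

```python
def weighted_mean(x, w):
    """Weighted mean; the same point estimate under fw, aw, or pw."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def expand(x, fw):
    """Replicate each observation fw times (fw must be positive integers)."""
    return [xi for xi, k in zip(x, fw) for _ in range(k)]


x = [10, 20, 30]
w = [1, 2, 3]

expanded = expand(x, w)
print(weighted_mean(x, w), sum(expanded) / len(expanded))

# fw: N is the sum of the weights; aw: weights rescaled to sum to the number of obs
n_fw = sum(w)
aw_rescaled = [wi * len(x) / sum(w) for wi in w]
print(n_fw, sum(aw_rescaled))
```

Both means equal 140/6; the implied sample size is 6 under frequency weights but 3 under analytic weights.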

    Data Management
    NOTE:
    Within the commands generate and replace, we may transform or create variables by using functions, e.g.,
    generate loginc=ln(toinc)
    generate y=cos(x*_pi/180)
    replace newvar=normd(z)
    generate rvar=uniform()

    Data Management
    DELETING VARIABLES/DATA
    To drop a variable, say age:
    drop age
    To drop some observations:
    drop in 2/3
    Try also the command keep.
    To drop all data in memory:
    clear

    Data Management
    NOTE:
    So far we have used STATA interactively. We can also do batch processing by writing the commands into a do-file with the DO FILE editor and running it.

    Data Management
    NOTE:
    The STATA toolbar has 13 buttons.

    The first three are to
    OPEN a Stata dataset
    SAVE the resident dataset to disk
    PRINT a graph or log

    Data Management

    The next five are for
    Starting/stopping/suspending a LOG
    Bringing the Log to the front
    Bringing the Dialog to the front
    Bringing the Results to the front
    Bringing the Graph to the front

    Data Management

    The last five are for
    Opening the DO FILE editor
    Opening the DATA editor
    Opening the DATA browser
    Telling Stata to continue when it has paused in the middle of long output
    Stopping the current task

    Exercise
    What is the average income of families whose expenditure is below the mean family expenditure, and of families whose expenditure is above it?

    Exercise
    Compare the correlation between food expenditure (fexp) and nonfood expenditure for families in rural and urban areas.

    Extra
    Enter
    graph food nfood

    Extra
    Now try
    sort urb
    graph food nfood, by(urb)
    graph food nfood, by(urb) total

    Extra
    Matrix plots
    graph toinc food nfood, matrix

    Table Generation w/ tab
    Earlier, we showed the use of the tab(ulate) command. Try
    tab urb
    tab urb [aw=rfact]
    tab urb [iw=rfact]
    tab urb regn

    Tab
    The tab command has options for generating one-way tables, e.g., of summary statistics by category:
    tab urb, summ(toinc)
    and two-way tables:
    tab urb sex
    tab urb sex, row
    tab urb sex, row col chi2
    tab urb sex, all exact
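
The chi2 option reports Pearson's chi-squared statistic for the two-way table. A minimal Python sketch of that statistic from a table of counts, with a hypothetical function name and hypothetical counts; it ignores the survey design, which is what the Rao-Scott corrections in svytab address.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-way table.

    table: list of rows of observed counts.
    Expected count for cell (i, j) = row_total_i * col_total_j / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows = urb categories, columns = sex of head
counts = [[10, 20],
          [20, 10]]
stat, df = pearson_chi2(counts)
print(round(stat, 4), df)  # 6.6667 1
```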

    Table Generation w/ table
    Aside from the tab command, we can generate tables of statistics with the table command. Compare
    tab urb
    with
    table urb

    Table
    To generate the average (family) income and average (family) expenditure across urban and rural areas, enter
    table urb, c(mean toinc mean toexp)
    Using weights:
    table urb [aw=rfact], c(mean toinc mean toexp)
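
Each cell of such a table is the (weighted) mean of a variable within one category. A minimal Python sketch with a hypothetical function name and hypothetical data; with every weight equal to 1 it reproduces the unweighted table.

```python
from collections import defaultdict


def table_means(group, weight, columns):
    """Weighted mean of each column within each group.

    group: category of each record (e.g. urb); weight: e.g. rfact;
    columns: dict {name: list of values}, aligned with group.
    Returns {category: {name: weighted mean}}.
    """
    weight_sum = defaultdict(float)
    sums = defaultdict(lambda: defaultdict(float))
    for k, (g, w) in enumerate(zip(group, weight)):
        weight_sum[g] += w
        for name, values in columns.items():
            sums[g][name] += w * values[k]
    return {g: {name: sums[g][name] / weight_sum[g] for name in columns}
            for g in sorted(weight_sum)}


urb = [1, 1, 2, 2]
rfact = [1, 3, 2, 2]
data = {"toinc": [100, 200, 300, 500], "toexp": [80, 160, 250, 350]}
print(table_means(urb, rfact, data))
print(table_means(urb, [1] * 4, data))  # unweighted
```

In category 1 the weighted mean income is 175 rather than 150, because the higher-income record carries three times the weight.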

    Table
    The contents option may specify at most five of the following statistics:
    freq (for frequency)
    mean varname (for mean of varname)
    sd varname (for standard deviation)
    sum varname (for sum)
    rawsum varname (for sums ignoring optionally specified weights)
    count varname (for count of nonmissing observations)

    Table
    The contents option may specify at most five of the following statistics:
    n varname (same as count)
    max varname (for maximum)
    min varname (for minimum)
    median varname (for median)
    p1 varname (for 1st percentile)
    p2 varname (for 2nd percentile)
    ...
    iqr varname (for interquartile range)

    Exercise Using Table
    Obtain the average and median per capita income of households by sex of household head:
    table sex, c(mean pci median pci)
    Obtain the weighted frequency of poor and nonpoor households across regions:
    table poor regn [iw=rfact]

    Using Survey Commands
    Stata has a family of commands designed especially for sample surveys. These commands all begin with svy:
    svyset    setting variables
    svydes    describe strata and PSUs
    svymean   estimate popn & subpop means
    svytotal  estimate popn & subpop totals

    Using Survey Commands
    Svy commands
    svyprop    estimate popn & subpop proportions
    svyratio   estimate popn & subpop ratios
    svytab     for two-way tables
    svyreg     for regression
    svyivreg   for instrumental variables regression
    svylogit   for logit regression
    svyprobit  for probit regression

    Using Survey Commands
    Svy commands
    svytest   for hypothesis testing
    svylc     for estimating linear combinations
    svymlog   for multinomial logistic regression
    svyolog   for ordered logistic regression
    svyoprob  for ordered probit regression
    svypois   for Poisson regression
    svyintrg  for censored & interval regression

    Using Survey Commands
    Before issuing any svy estimation command, we identify the weight, strata, and PSU identifier variables:
    svyset pweight rfact
    svyset strata domain
    svyset psu hcn

    Using Survey Commands
    To obtain the average family income and average family expenditure:
    svymean toinc toexp
    To obtain the total family income and total family expenditure by region:
    svytotal toinc toexp, by(regn)

    Using Survey Commands
    To obtain the per capita income & per capita expenditure:
    svyratio toinc/fsize toexp/fsize
    pci & pce by urban/rural:
    svyratio toinc/fsize toexp/fsize, by(urb)
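
svyratio toinc/fsize estimates a ratio of weighted totals (total income over total family size). In general this differs from the weighted average of each family's own per capita income, such as the mean of a generated pci variable. A small Python illustration with hypothetical function names and values:

```python
def ratio_of_totals(weight, y, x):
    """Estimated total of y divided by estimated total of x (as in svyratio)."""
    return (sum(w * a for w, a in zip(weight, y))
            / sum(w * b for w, b in zip(weight, x)))


def mean_of_ratios(weight, y, x):
    """Weighted mean of the family-level ratios y/x."""
    return sum(w * a / b for w, a, b in zip(weight, y, x)) / sum(weight)


rfact = [1, 1]
toinc = [100, 300]  # hypothetical family incomes
fsize = [1, 5]      # hypothetical family sizes
print(ratio_of_totals(rfact, toinc, fsize))  # 400/6, about 66.67
print(mean_of_ratios(rfact, toinc, fsize))   # (100 + 60)/2 = 80.0
```

The ratio of totals counts every person once. The mean of ratios counts every family once, so members of large families count for less.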

    Using Survey Commands
    Linear regression of ln(pci):
    gen loginc=ln(pci)
    svyreg loginc age fsize sex prov urb
    Compare the results with those of the regular regression command:
    reg loginc age fsize sex prov urb

    Using Survey Commands
    Two-way tables:
    svytab urb poor, row se
    compared with
    tab urb poor [aw=rfact], nofreq row

  • Alternatives to STATA

    Learning More about Stata
    Online tutorial: type
    tutorial intro
    List of Tutorials
    Tutorial    Description
    -----------------------------------------------------
    intro       An introduction to Stata
    graphics    How to make graphs
    tables      How to make tables
    regress     Estimating regression models, incl. 2SLS
    anova       Estimating one-, two- and N-way ANOVA and ANCOVA models

    Learning More about Stata
    Tutorial    Description
    -----------------------------------------------------
    logit       Estimating maximum-likelihood logit and probit models
    survival    Estimating ML survival models
    factor      Estimating factor and principal component models
    ourdata     Description of the data we provide
    yourdata    How to input your own data into Stata

    Learning More about Stata
    Email distribution list: send email to [email protected]. In the body of your email message, type
    subscribe statalist email@address
    or, for a daily summary,
    subscribe statalist-digest email@address

  • Maraming salamat sa inyong pakikinig. (Thank you for your attention.)
