PRR 475: SPSS for Windows - Basics, Lab Nov 17-24


PRR844 SPSS & Statistics

Survey data analysis with SPSS 10.1

We will practice data analysis with the dataset from the 1996 Huron-Clinton Metro-parks visitor survey.

To prepare for lab: (March 10 and 17 in 218 NR, March 12 in 105 Farrell Hall)

1. Read the brief description of HCMA survey methods (page 16).
2. Study the questionnaire (handed out in class) and the codebook (pages 17-18) to become familiar with the questions asked and how variables are coded in the computer data file.
3. Review basic statistical procedures (pages 10-15, particularly 11 and 12).
4. Go to lab and walk through portions of the SPSS Tutorial, starting at the beginning.
5. Skim the SPSS Procedure summary (pages 2-9).
6. In lab we will walk you through the practice exercises (page 7).

TIP: This exercise requires that you have some familiarity with the HCMA survey, some knowledge of basic statistics, and some familiarity with SPSS procedures. Thinking about management, planning or policy questions that suggest particular analysis of this dataset is also helpful. Please review the HCMA questionnaire and codebook prior to the lab and begin formulating ideas about variables you are interested in and hypotheses to test.

We can return to micro-lab on March 17/19 to provide individual help. You should first try to complete the exercise yourself.


EXERCISE: DUE – March 29

Formulate a couple of research questions/mini-analyses for the HCMA survey, run appropriate procedures in SPSS and report the results. Refer to the HCMA questionnaire and codebook to identify measurement scales of variables and how to interpret each variable.

a. First describe two or more variables - run suitable descriptive statistics (FREQ, DESC).
b. Then test at least one hypothesis about a relationship between two variables and/or estimate a confidence interval around a population parameter estimate.

Begin with the CROSSTABS procedure with a Chi square statistic, or the MEANS procedure to compare means for two or more groups.

Write up the analysis in 3-5 pages organizing the results for presentation. DO NOT simply DUMP out the raw SPSS output. Create your own tables, format them nicely, and explain the results, reporting only what is important. Attach the SPSS printouts of the procedures you ran as an appendix.


BASIC SPSS PROCEDURES

0. Log in with your pilot name and password. Remember to log out when done. Choose Windows 2000 and wait for system to load.

1. LOAD SPSS – From Start, Programs, Math/Stat Apps, SPSS, … SPSS 10.1 for Windows. Close the dialog box that opens at the beginning.

2. RETRIEVE DATA FILE – In SPSS, from menus, choose File, Open, Data. Navigate on AFS Root (U drive) to msu.edu/course/prr/844 and pick HCMA96.SAV.

3. GUIDANCE TO SPSS WINDOWS

DATA FILE (SAV) – two tabs: Data View has raw data, Variable View has codebook. Note that some menu commands only show in Data View.

OUTPUT (SPO) – results are shown here with an outline. Each command you run pastes outputs to the end of this file/window. This file is not readable outside SPSS. You can Copy and Paste to Word or Excel. PRINT from here by selecting the sections you want to print and choosing File, Print. I recommend copying the tables you want to Word or Excel and printing from there.

SYNTAX FILES (SPS) – can store sequences of commands here to run in "batches". The PASTE button on most procedures pastes the command to this file. This is a simple text file that can be edited in Notepad.

Switch windows from the buttons at bottom or the Window command on the menus at top.

4. SETTING OPTIONS – In EDIT, OPTIONS on menu bar

Recommend – General Tab: Variables in alphabetical order, Display Names or Variable Labels.
Viewer Tab: check "display commands in log" box at bottom. This will write the procedures to the output window.

For option changes to take effect, you must re-retrieve the file:
File, New, Data – opens a blank file
File, choose HCMA96.SAV from recently used files at bottom

5. SPSS PROCEDURES – General Steps
a) Choose a PROCEDURE from the ANALYZE menu
b) Choose variables by moving them into boxes
c) Select Options, Statistics, etc. buttons for other than the standard output
d) Click OK
e) RESULTS appear in the OUTPUT window.

6. BASIC PROCEDURES
FREQUENCIES – for frequency table of a single nominal/ordinal scale variable
DESCRIPTIVES – for mean and standard deviation of interval scale variables
CROSSTABS – for bivariate distribution of two or more nominal/ordinal scale variables
COMPARE MEANS – compare means on an interval scale dependent variable for two or more groups defined by a nominal/ordinal independent variable
CORRELATIONS – bivariate correlations of interval (Pearson) or ordinal (Spearman/Kendall tau) variables
HYPOTHESIS TESTS: Pick these within the above procedures – ask for standard error of mean in DESCR, Chi Square in CROSSTABS, ANOVA Table & Eta in COMPARE MEANS.

7. WEIGHTS: You can weight cases to expand from the sample to the population of all HCMA visitors and/or to adjust for disproportionate sampling. There are four weights for this file:

If analyzing characteristics of visits,
VSTWT – adjusts sample to 2.78 million visits
VSTWT2 – adjusts for disproportionate sampling, but maintains same sample size

If analyzing characteristics of visitors,
VSITORWT – expands sample to population of all HCMA visitors in 1996
VSITORWT2 – adjusts to population of visitors, but maintains given sample size.

TO SET WEIGHTS, in Data View, choose DATA, Weight Cases, choose "Weight cases by" and move the appropriate weight variable into the box. NOTE - the weight remains on until turned OFF. To turn the weight off, come back and select "Do not weight cases".


SPSS FOR WINDOWS version 10.0 - LAB March 12-26.

Contents: SPSS procedures - 1-4; Practice Exercise - 5-6 ; Assigned Exercise is on page 6 at bottom. Sample analysis - 7; HCMA study description - 8 Codebook 9-10.

SPSS stands for Statistical Package for the Social Sciences. Other popular statistical software includes SAS, SYSTAT and MINITAB. SPSS is well suited to analysis of social science/survey data. Like all statistical packages, SPSS works with a table of data with cases as rows and variables as columns (just like an Excel Table - in fact you can import Excel tables directly to SPSS and vice versa). For survey data, each case is a respondent or questionnaire and each variable is usually a numeric coding of the response to a single question on the survey instrument. Statistical packages prefer to analyze data in numeric form so one codes variables like GENDER as something like 1=male, 2=female ( 1=male, 0=female is better). We will be analyzing data from the 1996 Huron Clinton Metropark visitor survey. The HCMA survey dataset includes 4,031 cases and 136 variables (original). A few of the messier variables have been dropped for this exercise and other variables have been computed.

You will need copies of the HCMA96.SAV file to complete this exercise. You may retrieve it directly in SPSS in the micro-labs from the Course AFS space. You also should have reviewed the HCMA questionnaire and codebook to become familiar with the data set – variables, coding etc.

1. Loading SPSS-PC. Run SPSS by selecting the SPSS program from the START menu (in Math/Stat applications, SPSS, SPSS10.1.4, SPSS 10.1 for Windows).

When SPSS opens you will see options to run tutorial, enter data, or open an existing file (the default). Run through tutorials for a preview on your own. To retrieve HCMA data file, close the opening dialogue box and retrieve file directly from SPSS menus. File, Open, Data then browse to the HCMA96.SAV file in the PRR844 course AFS space.

When the file is loaded, you will see the data in the data window in spreadsheet format. Variable names are at top of columns. Cases run down rows. Each case/row represents one respondent/completed questionnaire. See HCMA codebook and questionnaire to match variables with items on the questionnaire. To see codes as Values rather than numbers, choose View on menus and check Value Labels (uncheck to toggle back to numbers). To see information about any variable, choose "Variable view" tab at bottom. On menus, Utilities, Variables shows you information for all variables. You are now ready to run statistical analysis.

2. To run Statistical Procedures choose the ANALYZE option on menu and then the statistical procedure you wish to run. We will work mostly with Descriptive Statistics and the Compare Means procedure.

Descriptive Statistics

FREQUENCIES frequencies for nominal & ordinal variables

DESCRIPTIVES means etc. for interval/ratio scale variables

EXPLORE Exploratory data analysis procedures to see distributions

CROSSTABS Tables for nominal or ordinal (few categories) variables, Chi square test

[Menu illustrations. File menu: New, Open (Data, Syntax, Output). Analyze menu: Reports, Descriptive Statistics, Custom Tables, Compare Means, General Linear, Correlate, Regression, Loglinear, Classify, Data Reduction, Scale, Nonparametric Tests, Survival, Multiple Response. Descriptive Statistics submenu: Frequencies, Descriptives, Explore, Crosstabs.]

COMPARE MEANS Interval dependent variable, nominal or limited category independent variable

Means Compare subgroup means, Options ANOVA for stat test

One Sample T-Test Test H0 : Mean of variable = some constant

Indep. Samples T-Test Two groups, Test H0 : Mean for group 1 = Mean for group 2

Paired samples T-Test Paired variables - applies in pre-test, post-test situation

One Way ANOVA Compare means for more than two groups

3. General Steps for Running Procedures.

a. First choose a procedure from the Analyze menu. Note that the appropriate procedure depends on the measurement levels of your variables and the nature of the intended analysis. See 5 below for details.

b. Choose variables: Select from the list of variables at left, click the arrow to move them into the Variable Box at right. Note that you can choose several variables at a time - move one at a time by selecting and clicking the arrow or by double clicking on the variable name. Hold the CTRL key down while clicking to select several variables and move them to the Variable Box as a group. To unselect a variable, click on it in the Variable Box on the right; the arrow switches direction, click it to move the variable back.

c. Select buttons at bottom for special Options, Statistics, etc. - complete the dialog boxes, then CONTINUE.

d. Click OK to run the procedure.

e. Results appear in the OUTPUT window. SPSS automatically switches to the output window when you run a procedure. Scroll around in this window to view results. To return to the Data window click the HCMA96 button on the application bar at bottom or choose HCMA96 from the Window menu item.

4. SPSS Windows and files. SPSS throws up lots of WINDOWS, often not maximized. Use the MAXIMIZE buttons at top right of windows to expand display to full screen. Use WINDOW command on menu bar to choose between the Output or Data Windows or choose them from Application bar at bottom. Three primary windows are

The Data Window - a spreadsheet showing raw data, variables across columns, cases down rows. Run most procedures from here. SPSS data files have an *.SAV extension. SPSS 10.0 has added a "variable view" page to the data window accessed via Excel-type tabs at bottom. The Variable view page has definitions of variables and coding information.

Output window - when you run a procedure, results are shown in the Output window. This is like a wordprocessor with outline at left to select particular results. You may print results from here or copy and paste them to WORD or EXCEL. SPSS Output files have an *.SPO extension

Syntax window - optional. If you use Paste option, you can paste procedures to syntax window, where you can easily rerun them or edit them. SPSS syntax files have an *.SPS extension.

SPSS data and output files are specially coded files you can only read in SPSS. There are utilities to save data files as Excel or Access files, or to import data from those formats to SPSS. The syntax files are simple text files that can be read by a wordprocessor.

5. Guidance on individual procedures - basic statistics

a. FREQUENCIES - run this on variables at nominal or ordinal scale with a small number of categories. Gives frequency distribution for the variable and optional statistics.

b. DESCRIPTIVES - run for interval scale variables to get mean, standard deviation, etc. Choose S.E. Mean in the Statistics dialog box to compute confidence intervals.

c. CROSSTABS - for nominal/ordinal variables, choose a row and column variable (variable with fewer categories for columns). In Statistics, select Chi square for a hypothesis test, in Cells choose Row Pct and Column Pct.

d. COMPARE MEANS - dependent variable must be interval scale (or dichotomous), independent variable forms subgroups (should take on limited set of values - usually nominal or ordinal).

e. CORRELATE - for two or more interval scale variables use Pearson; use Spearman/Kendall for ordinal measures.

[Menu illustration. Compare Means submenu: Means, One Sample T-Test, Independent Samples T-Test, Paired Samples T-Test, One Way ANOVA.]

6. Variable Transformations - Sometimes you want to change coding of a variable or compute a new variable. RECODING AND COMPUTING procedures are in the TRANSFORM menu. Use RECODE to change coding of a variable (maybe to collapse into fewer groups or reassign missing codes) and COMPUTE to compute new variables (e.g. simple sum of other variables).

a. RECODE changes coding of a variable. First choose whether you want to put new codes in same variable or a different (new) one . The latter preserves old codes and sets up new variable with new codes. To preserve the original coding on the file, choose recode "into new variable". Then you must add name for new variable and press the CHANGE button. In either case, specify coding changes as follows. Select variable you want to change codes for and choose the "old and new values" option. Then complete the Dialog Box to indicate how codes should be changed. Press ADD button to add each coding change to the recode box. Repeat procedure for as many codes as you wish to change. Then press OK to execute the changes.

For example, to group "within the past 5 years" (3) and "more than 5 years ago" (4) together on the FIRST variable by changing code 4 to 3: select recode into same variable, choose the FIRST variable, choose the OLD AND NEW VALUES button, then enter a 4 in the box for old value and a 3 for new value at right. Click the ADD button and a line mapping 4 to 3 will appear in the box. Click CONTINUE, then click OK to perform the recoding. If you look in the DATA window under the FIRST column, all the 4's should now be 3's. When you run a FREQ on FIRST, 3's and 4's will be grouped and show up as 3's. Be careful, as value labels won't be corrected automatically.
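The same recode logic can be sketched outside SPSS. The short Python example below is only an illustration, not part of the lab: the response codes are made up, and the names `first`, `first_recoded` and `recode_map` are hypothetical. It recodes into a new list so the original codes are preserved, which mirrors the "into new variable" choice.

```python
from collections import Counter

# Hypothetical codes for the FIRST question (1-4); not real HCMA data.
first = [1, 4, 3, 2, 4, 4, 1]

# Old value -> new value, like the single 4-to-3 line added in the Recode dialog.
recode_map = {4: 3}

# Recode into a NEW list so the original coding is preserved.
first_recoded = [recode_map.get(code, code) for code in first]

# Check the result with frequencies on the old and new variables.
print("old:", sorted(Counter(first).items()))
print("new:", sorted(Counter(first_recoded).items()))
```

Comparing the two frequency listings is the same check the handout recommends after any recode.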

b. COMPUTE: To compute new variables from old. Choose transform, Compute. Enter a name for new variable in the Target Variable Box( 8 characters or less). Then enter a mathematical expression in the larger box after the = sign indicating how new variable is computed. Press OK to execute the procedure. Your new variable is added as a column at the end of file in DATA window. You may now use this variable in any procedure (refer to it by the name you assigned).

e.g. to compute a variable equal to length of time each party stayed in the park. Enter HOURS as a name in Target Variable Box. In numeric expression box enter LEAVE - ARRIVE. Press OK. Be careful to spell variable names correctly. You can paste variables into box by double clicking on them in the list of variables at left and then adding (or pasting from calculator pad) math expressions in between. You can edit inside box to correct mistakes. SPSS will add the new variable to the file - check it at far right in data window. You can now use the new HOURS variable like any other in a statistical procedure. It won't be kept when you exit SPSS unless you save file (probably no need to save file, but if you do, you'll have to put it in your own AFS space). Beware of missing values when computing new variables. Result will be missing if any variables in formula are missing.

Good practice when recoding or transforming is to always check the result before proceeding with further analysis. Check via frequencies on new and old variables or by manually checking a few cases in data window.
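To see why the result of a COMPUTE is missing whenever an input is missing, here is a minimal Python sketch of the HOURS = LEAVE - ARRIVE example. The times are invented, `None` stands in for a missing value, and the assumption that ARRIVE and LEAVE are stored as hours on a 24-hour clock is ours; check the codebook for the real coding.

```python
def compute_hours(arrive, leave):
    """Return leave - arrive, or None when either input is missing."""
    if arrive is None or leave is None:
        return None
    return leave - arrive

# Hypothetical (ARRIVE, LEAVE) pairs in hours; None marks a missing value.
cases = [(10.0, 14.5), (9.0, None), (None, 16.0), (13.0, 15.0)]

hours = [compute_hours(arrive, leave) for arrive, leave in cases]
valid = [h for h in hours if h is not None]

print(hours)
print(len(valid), "valid cases out of", len(hours))
```

Two of the four cases drop out here, so any mean of the new variable rests on half the sample. That is the kind of thing a quick check of the N will reveal.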

7. Other Procedures and Tips

a. OPTIONS. SPSS may be set up to show variables in either alphabetic or file order in pick lists. To get "File" order of variables, choose EDIT, OPTIONS in the main SPSS menu and change Variable order from Alpha to File order (push radio buttons on General Tab at right). You must do this BEFORE retrieving the file. If you have already loaded the file, choose File, New, Data and then re-retrieve it for this change to take effect. This doesn't change the order of variables in the data window, only in variable pick lists.

b. CUSTOM TABLES: The Custom Tables procedures let you run descriptive statistics on groups of variables and assembles the results in tables, giving you some control over formatting and labeling. It produces what are sometimes called "banner tables" summarizing a number of variables in a single table. Use "Basic Tables" for descriptive statistics, "General Tables" for crosstabulations, and "Tables of Frequencies" for frequency distributions. You may check out this procedure after you have mastered those in SUMMARIZE section, if you wish.

c. PRINTING and SAVING. You may print results as you generate them from the Output window, or copy the ones you want into a word processor. To save output when you exit SPSS (by the File, Exit command), answer YES to the question about saving your output. Enter a path and filename, e.g. A:SPSS.SPO to put it on your floppy, or enter a path to your AFS space. You don't need to save the data (respond NO to this question when exiting). The SPSS.SPO file can only be read by SPSS. You can also copy and paste SPSS output to WORD or EXCEL by opening both SPSS and these applications. The Output window is a simple text editor - you can add your own notations and delete items you don't want. The outline at left is handy for finding a procedure you ran or deleting it.

d. Selecting and Sorting Cases: The Data menu has procedures to SORT the data file on a particular variable or to SELECT subsets of cases to use in an analysis. For example, to select only cases from Kensington Metropark, choose Data, Select Cases, then push the IF tab and enter the filter PARK=1 (Kensington is park 1 in the coding scheme). Any subsequent analysis will use only the Kensington cases; you will see a "filter on" message in the status bar, and cases not from Kensington are "slashed out" in the data window. REMEMBER to turn the filter off when you want to return to all cases: come back to DATA, Select Cases and choose the "all cases" radio button.

e. WEIGHTS: Weights can be used to adjust the sample to better represent the population or to expand cases from sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to the population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to the population of about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use VSITORWT when describing people and VSTWT when describing park vehicle entries. The weights adjust the sample to the actual distribution of use in 1996 by park, season, and weekday/weekend, correcting for disproportionate sampling and different response rates across parks and periods. DO NOT use these expansion weights when conducting statistical tests, as nearly every test will come out significant (the tests think they are based on a sample of 2.8 million). Instead use VISWT2 or VSTORWT2, which adjust for disproportionate sampling but then normalize weights back to the actual sample size, so statistical tests are based on the true sample size. The spelling of these two normalized weights varies within this handout, so check the variable list for the exact names. You can also run tests unweighted. When a weight is on, a message appears on the status bar. To set a weighting variable or turn weighting off, go to Data, Weight Cases on the menu and choose the desired weighting variable.
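The difference between an expansion weight and a normalized weight can be sketched in a few lines of Python. The rescaling below (multiply each weight by n divided by the sum of the weights) is a common way to normalize weights back to the sample size. It is our assumption about how the second set of weights was built, not something documented in the handout, and all numbers are made up.

```python
# Hypothetical expansion weights: each case "stands for" this many visits.
expansion = [500.0, 1500.0, 1000.0, 1000.0]
# Hypothetical responses (e.g. days of use) for the same four cases.
values = [2, 10, 4, 6]

n = len(expansion)
total = sum(expansion)  # the weighted "population" size

# Normalized weights keep the same proportions but sum to the sample size n.
normalized = [w * n / total for w in expansion]

def weighted_mean(x, w):
    """Weighted mean of x using weights w."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

print("weighted N, expansion: ", sum(expansion))
print("weighted N, normalized:", sum(normalized))
print("mean, expansion: ", weighted_mean(values, expansion))
print("mean, normalized:", weighted_mean(values, normalized))
```

Both sets of weights give the same weighted mean. Only the weighted N differs, and that is the number a significance test relies on.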

f. OTHER PROCEDURES. Feel free to explore other parts of the SPSS program. You can generate GRAPHS in the graphs menu (be careful to save before trying graphs - labs are crash prone on graphing), and view various information about the file in UTILITIES menu. See the HELP menus for tutorial and further information about using SPSS for WINDOWS. If you’d like more instruction in SPSS, the Computer Lab runs shortcourses. Also check out the SPSS site on the web at www.spss.com and tutorials.

g. SYNTAX WINDOW. SPSS also lets you paste commands into a syntax window (look for the PASTE buttons on most procedures). If you prefer you can type, edit and run procedures from the syntax window if you know the syntax. This is sometimes faster than navigating thru the menus, but requires some familiarity with SPSS syntax. If you paste commands to Syntax window, you can save the syntax file and easily rerun procedures later. This simplifies rerunning a complicated set of procedures. The HCMA.SPS syntax will run the procedures in the practice exercise that follows. To retrieve, in SPSS menu use File Open, Syntax file and point to hcma.sps file on U drive in course AFS space. Then choose Run, All to run all procedures. Check Output window for results.

h. Using Excel for data Entry. If you want to enter survey data in Excel and then import result to SPSS, use these guidelines. On first row of spreadsheet enter a short (8 characters or less) variable name - avoid spaces and special characters as SPSS may not like them. You can name variables VAR1, VAR2, … but this makes them harder to identify. Enter each questionnaire as a separate case below the names, one case to a row - no blank rows. Save Excel file when complete and close it. To retrieve this file into SPSS, Use File, Open, Data and change default extension to xls files and pick your Excel file. Enter range on spreadsheet where data is located and click button for variable names in first row if you have done that. Should read data into SPSS. Be careful about blanks in Excel as these will come in as missing.

i. MISSING VALUES and N's. SPSS allows certain values to be designated as "missing" for each variable and has a general "system missing value" designated by a "." . It is a good idea to pay attention to the number of cases for any procedure you run and to understand when lots of cases are missing or when you have filtered out cases with a SELECT CASES procedure. Watch the N's. If you just look at percentages and means, these may be based on only a few cases. Remember that confidence levels of results will depend on the sample size. Also beware of statistical tests from WEIGHTed analyses, which may distort the actual sample size.

SPSS PRACTICE EXERCISE

0. So we all have the same options, let's list variables in alphabetic order rather than file order. Choose Edit, Options from the SPSS menu, then the General tab. In variable lists at right select "Alphabetical" and choose "Display names" instead of variable labels. Also choose the "Viewer" tab and make sure "Display commands in log" at bottom is checked. Then OK. (Sometimes it is easier to see variables in file order; this option is also set on the General tab.)

You will need to re-retrieve the data file for these changes to take effect. The following is the easiest way:

File, New, Data - will open a blank file
File, choose HCMA96.SAV from recently used files at bottom OR File, Open and point to it again.

1. FREQUENCIES for nominal, ordinal variable. Describe the characteristics of park visitors - income and age. From codebook note these variables are measured in categories- i.e. ordinal scale with small number of categories. Run FREQUENCIES. In menu choose Analyze, Descriptive Statistics, Frequencies. Find INCOME and AGE2 variables on list at left (near end in file order, or in alphabetic order). Select them with mouse and click arrow to move to the variable box at right (or double click on the variable). Click OK to run frequencies.
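What FREQUENCIES reports can be sketched in plain Python. This is only an illustration with invented income codes (`None` marks a missing answer), not the SPSS implementation. It counts each code and gives the percent of valid cases.

```python
from collections import Counter

# Hypothetical INCOME category codes; None marks a missing response.
income = [1, 2, 2, 3, 3, 3, 4, None, 2, 3]

valid = [code for code in income if code is not None]
counts = Counter(valid)

print("code  count  valid %")
for code in sorted(counts):
    pct = 100 * counts[code] / len(valid)
    print(f"{code:>4}  {counts[code]:>5}  {pct:7.1f}")
print("valid N =", len(valid), " missing =", len(income) - len(valid))
```

Note the valid N at the bottom: the percentages are based on cases with an answer, not on the whole file.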

2. DESCRIPTIVES for interval scale variable. How many female visitors were there on average in each party? Find the variable in the codebook = TOTFEMAL, and note it is interval scale. In the menu choose Analyze, Descriptive Statistics, Descriptives. Complete the dialog box as above by selecting TOTFEMAL (near end in file order) and moving it to the input box.

3. CROSSTABS with two nominal or ordinal variables. Crosstabs generates a table using two variables, one for rows and one for columns. Question - What is the distribution of the sample by age and income? From the menu, choose Analyze, Descriptive Statistics, Crosstabs. Complete the dialog box by choosing INCOME for the row variable and AGE2 for the column variable. Also click the

STATISTICS button at bottom and ask for all of them -- CONTINUE
CELLS button at bottom and ask for observed count, row percents, and column percents -- then CONTINUE and OK to run the procedure.

4. COMPARE MEANS. Use this procedure to compare averages of two or more subgroups. Dependent variable is interval scale variable (means are computed for this variable). Independent variable should be nominal or have small number of values/groups - it forms groups. Let's see if visitors using an annual motor vehicle permit visit parks more often than those entering on a daily permit. Find variables - independent = MVP95 identifies those with a MVP (grouping variable). Dependent= HCMATOT measures days of use of Metroparks last year.

5. Simple Hypothesis testing

a. Confidence interval for an average. In the DESCRIPTIVES procedure (Analyze, Descriptive Statistics, Descriptives) let's compute the average days that people used Metroparks last year - HCMATOT, just as in 2 above, but also click the OPTIONS button at the bottom of the dialog box and ask for SE mean (standard error of the mean). Click CONTINUE, then OK to run. To get a 95% confidence interval, add and subtract two standard errors from the sample mean.
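As a worked illustration of the "mean plus or minus two standard errors" rule, here is a small Python sketch. The ten values are invented stand-ins for HCMATOT, and the factor of 2 is the handout's rule of thumb (the exact multiplier for a 95% interval is slightly different and depends on sample size).

```python
import math
import statistics

# Hypothetical days of Metropark use for ten respondents (not real HCMA data).
days = [2, 5, 10, 1, 20, 8, 3, 15, 6, 10]

n = len(days)
mean = statistics.mean(days)
sd = statistics.stdev(days)   # sample standard deviation (n - 1 in the denominator)
se = sd / math.sqrt(n)        # standard error of the mean

# Approximate 95% confidence interval: mean +/- 2 standard errors.
lower = mean - 2 * se
upper = mean + 2 * se

print(f"N = {n}, mean = {mean:.2f}, SE = {se:.2f}")
print(f"approx. 95% CI: {lower:.2f} to {upper:.2f}")
```

With only ten cases the interval is wide. With the full survey sample the standard error, and so the interval, shrinks considerably.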

b. Differences in means. Are those who use annual permits older than those who use dailies? Two variables are MVP which indicates whether people used a daily or annual permit to enter the park and AGE which gives age of the respondents (interval scale). We want to compute means for AGE for each type of entry permit. In menu, choose Analyze, Compare Means, Means as in 4 above. Complete Dialog Box by choosing dependent variable - the interval scale one = AGE, then independent or subgroup variable - nominal or ordinal scale with small number of categories = MVP.

For statistical test of hypotheses that all the subgroup means are equal, also choose OPTIONS and ask for the "ANOVA Table and eta" at bottom. Also ask for SE Mean by moving this from list of statistics to "Cell Statistics" at right. Then click OK to run the procedure.

Alternatively you could perform the independent samples T-test on two groups defined by MVP (say group 1 has MVP = 0, group 2 has MVP = 1).
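The quantities behind the "ANOVA Table and eta" option can be sketched by hand. The Python example below uses made-up ages for two permit groups (the group codes 0 and 1 and all values are hypothetical). It splits the total variation in AGE into between-group and within-group parts, then forms the F statistic and eta squared. It does not compute the SIG value, which SPSS reports for you.

```python
# Hypothetical ages grouped by permit type (0 = daily, 1 = annual).
groups = {0: [30, 40, 50], 1: [50, 60, 70]}

all_values = [v for vals in groups.values() for v in vals]
n_total = len(all_values)
k = len(groups)
grand_mean = sum(all_values) / n_total

# Between-groups sum of squares: group size times squared distance of group mean from grand mean.
ss_between = 0.0
# Within-groups sum of squares: squared distance of each case from its own group mean.
ss_within = 0.0
for vals in groups.values():
    group_mean = sum(vals) / len(vals)
    ss_between += len(vals) * (group_mean - grand_mean) ** 2
    ss_within += sum((v - group_mean) ** 2 for v in vals)

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
eta_squared = ss_between / (ss_between + ss_within)

print(f"F = {f_stat:.2f}, eta squared = {eta_squared:.2f}")
```

Eta squared is the share of variation in the dependent variable accounted for by group membership, so it speaks to practical importance, while F and its SIG speak to statistical significance.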

c. Chi square - tests for relationships in a crosstab table between two nominal/ordinal variables. Are higher income visitors more likely to use annual permits? Note MVP95 and INCOME are measured in a small number of categories (nominal or ordinal). Run the Crosstab procedure (Analyze, Descriptive Statistics, Crosstabs - see 3 above) with MVP95 and INCOME. In Statistics, choose Chi square, In Cells choose row and column percents.
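To show what the Chi square statistic measures, here is a minimal Python sketch for a 2 x 2 table. The counts are invented, and income is collapsed to two groups only to keep the example small (the real INCOME variable has more categories). The p-value formula used here is valid only for 1 degree of freedom, which is what a 2 x 2 table has.

```python
import math

# Hypothetical counts: rows = income (lower, higher), columns = permit (daily, annual).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / N under the null of no relationship.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# For 1 degree of freedom only, the upper-tail probability is erfc(sqrt(chi2 / 2)).
sig = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, SIG = {sig:.4f}")
```

Here SIG falls just under .05, so at the 95% level the null hypothesis of no relationship between income group and permit type would be rejected for these made-up counts.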

d. Correlations - For two interval scale variables. Is age (AGE) correlated with total days of use (HCMATOT)? Procedure is Analyze, Correlate, Bivariate. Choose AGE and HCMATOT.
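The Pearson correlation that this procedure reports can be computed by hand as a check on your understanding. The Python sketch below uses five invented (AGE, HCMATOT) pairs and a hypothetical helper named `pearson_r`. It is an illustration of the formula, not the SPSS routine, and it does not compute the significance level.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages and days of park use for five respondents.
age = [25, 35, 45, 55, 65]
days = [4, 6, 5, 9, 11]

r = pearson_r(age, days)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. As always, check the N before reading much into the coefficient.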

6. General rules for interpreting hypothesis tests.

1. You test a NULL hypothesis - The NULL hypothesis is a statement of NO relationship between the two variables (e.g., means are the same for different subgroups, correlation is zero, no relationship between row and column variable in a crosstab table).

2. TESTS are conducted at a given "confidence level" - most common is a 95% level. At this level there is a 5% chance of incorrectly rejecting the null hypothesis when it is true. For stricter test, use 99% confidence level and look for SIG's <.01. Weaker, use 90% , SIG's < .10.

3. On computer output look for the SIGnificance or PROBability associated with the test. The F, T, Chi-square, etc. are the actual "test statistics", but the SIG's are what you need to complete the test. SIG gives the probability you could get results like those you see from a random sample of this size IF there were no relationship between the two variables in the population from which it is drawn. If the probability is small (<.05) you REJECT the assumption of no relationship (the null hypothesis).

For 95% level, you REJECT null hypothesis if SIG < .05
If SIG > .05 you FAIL TO REJECT
REJECTING NULL HYPOTHESIS means the data suggest that there is a relationship.

4. Hypothesis tests evaluate whether you can generalize from information in the sample to draw conclusions about relationships in the population. With very small samples most null hypotheses cannot be rejected, while with very large samples almost any hypothesized relationship will be "statistically significant" - even when not practically significant. Be cognizant of sample size (N) when making tests.

7. RECODING AND COMPUTING NEW VARIABLES - TRANSFORM . Sometimes you want to create new variables or change the coding of an existing variable (e.g. to collapse categories)

COMPUTING NEW VARIABLES. What is the average number of visitors in each party? You will need to COMPUTE a new variable equal to the sum of total female and male visitors and then run DESCRIPTIVES on this new variable. Choose Transform, Compute. Enter the name of the new variable (PARTY), then enter the formula TOTMALE + TOTFEMAL in the box (paste in the names to be sure of spelling). Click OK. This variable has already been created and saved on file.
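For readers who want to see the logic of Transform, Compute spelled out, here is a minimal Python sketch. The records are made up, the key names follow the codebook spelling (TOTMALE, TOTFEMAL), and `compute_party` is a hypothetical helper. It assumes a case with a missing count gets a missing PARTY value and is left out of the mean.

```python
# Made-up party records; None stands for a missing answer.
records = [
    {"TOTMALE": 2, "TOTFEMAL": 1},
    {"TOTMALE": 0, "TOTFEMAL": 3},
    {"TOTMALE": 1, "TOTFEMAL": None},
    {"TOTMALE": 2, "TOTFEMAL": 2},
]

def compute_party(rec):
    """PARTY = TOTMALE + TOTFEMAL; missing if either part is missing."""
    if rec["TOTMALE"] is None or rec["TOTFEMAL"] is None:
        return None
    return rec["TOTMALE"] + rec["TOTFEMAL"]

for rec in records:
    rec["PARTY"] = compute_party(rec)

# Equivalent of running DESCRIPTIVES on the new variable (mean and valid N only)
valid = [rec["PARTY"] for rec in records if rec["PARTY"] is not None]
mean_party = sum(valid) / len(valid)
print(f"mean party size = {mean_party:.2f} (N = {len(valid)})")
```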

RECODING a variable. Suppose we want to collapse income into two categories, say above or below $50,000. Choose TRANSFORM on the menu, then RECODE. Complete the dialog box to recode income into a NEW variable (see Recode command on previous page) and call it INCOM2. Then run FREQUENCIES on INCOM2.
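The recode can be pictured as a simple lookup. The sketch below assumes INCOME is coded 1-4 for the four income groups and 5 for "choose not to answer", as described in the FREQUENCIES discussion. The new codes 1 and 2 for INCOM2, the sample values, and the helper `recode_income` are illustrative assumptions.

```python
from collections import Counter

def recode_income(income):
    """Collapse INCOME into two groups; anything else becomes missing."""
    if income in (1, 2):      # under $25,000 or $25,000 to $49,999
        return 1              # assumed code: below $50,000
    if income in (3, 4):      # $50,000 to $74,999 or $75,000 or more
        return 2              # assumed code: $50,000 or more
    return None               # 5 = choose not to answer, or blank

# Made-up INCOME codes, NOT taken from the HCMA file.
incomes = [1, 2, 2, 3, 4, 5, None, 3]
incom2 = [recode_income(v) for v in incomes]

# Equivalent of running FREQUENCIES on INCOM2 (valid cases only)
counts = Counter(v for v in incom2 if v is not None)
for code in sorted(counts):
    print(code, counts[code])
```

Recoding into a new variable, as the handout recommends, keeps the original INCOME codes available for later analysis.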

8. WEIGHTS: Weights can be used to adjust the sample to better represent the population or to expand cases from sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to the population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to the population of about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use VSITORWT when describing people; use VSTWT when describing park vehicle entries. The weights adjust the sample to the actual distribution of use in 1996 by park, season, and weekday/weekend, correcting for disproportionate sampling and different response rates across parks and periods. DO NOT use these expansion weights when conducting statistical tests, as all hypotheses will be significant (tests think they are based on a sample of 2.8 million). Instead use VSTWT2 or VSTORWT2, which adjust for disproportionate sampling but then normalize weights back to the actual sample size, so statistical tests are based on the true sample size. You can also run tests unweighted. When a weight is on, a message appears on the status bar. To set the weighting variable, go to Data, Weight Cases on the menu and choose the desired weighting variable. To turn weights off, return here and check “Do not weight cases”.
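To make "normalizing weights back to the actual sample size" concrete, here is a minimal sketch of one common way to do it: rescale the expansion weights so they sum to the number of cases. Whether the normalized weights in the HCMA file were built exactly this way is an assumption, and the weight values are made up.

```python
# Made-up expansion weights: each case "stands for" this many visits.
expansion = [500.0, 1200.0, 800.0, 1500.0]

n = len(expansion)          # actual sample size
total = sum(expansion)      # size of the population the weights expand to

# Rescale so the weights sum to n while keeping their relative sizes.
normalized = [w * n / total for w in expansion]

print(normalized)           # relative weights, averaging 1.0
print(sum(normalized))      # equals the sample size, up to float rounding
```

Because the relative sizes are unchanged, weighted percentages and means come out the same with either set of weights. Only the apparent sample size, and therefore the statistical tests, differ.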

SAMPLE ANALYSIS: All of this will easily fit on one page. (attach the relevant SPSS output).

I hypothesized that the respondents who list price of admission as an important factor in choosing a park will have larger family sizes. The appropriate survey variables are Q7PRICE measured as a 5 item Likert scale (ordinal) and TOTFAM. TOTFAM is computed as a sum of the number of children 18 and under (HOUSEKID) and adults (HOUSEADT) that live in household.

First I computed TOTFAM=HOUSEKID + HOUSEADT. Then I ran Descriptives on these three variables to get means. Put these into a Table and show results. I've omitted the numbers. You should briefly describe and interpret results in a paragraph and display details in short table or figure (format tables & figures properly).

Table 1. Average family size for visitors

Category    Number of People
Children
Adults
Total

Report a 95% confidence interval for TOTFAM by getting DESCRIPTIVES and asking for SE Mean.

Run frequencies on Q7PRICE variable - report as percentages in a simple table. (Use the Valid Pct Column, do not report everything SPSS prints out e.g - omit cumulative pcts unless they are meaningful to you)

Table 2. Rating of importance of admission price

Importance             Pct
extremely important
very important
important
somewhat important
not important

Based on Table 2, I split the sample into two groups: Q7PRICE = 1, 2 or 3 (extremely important, very important, and important) formed group one, and 4 or 5 formed group two. An independent samples T-Test was run to test for a difference in the average family sizes (TOTFAM) across the two subgroups. (When asked to define groups, I used 4 as the Cut Point. All codes less than the cut point form one group, and all codes greater than or equal to the cut point form the other group.) Show results of this in a short table. Note those rating price as more important (Group 1) have somewhat larger family sizes (2.6 people compared to 2.3). The difference is statistically significant at the 95% confidence level.

Table 3. Test of Difference in Family Sizes by Importance of Admission Price

Importance Subgroup                       Average Family Size   T-statistic   SIG
Group 1: Extremely, very, or important    2.6
Group 2: Somewhat or not important        2.3
Test of difference in Means                                     3.77          .000

This example illustrates how to explain the analysis including variables you selected and any changes you made in them (recodes) and also the results. If you choose nominal or limited category variables you will use crosstabs and Chi Square test. Be sure to first describe variables and then perform the statistical test.
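The cut-point grouping and the T-test in the sample analysis can be sketched in a few lines of Python. The (Q7PRICE, TOTFAM) pairs below are made up and are not the HCMA data. The sketch uses the pooled-variance form of the t statistic, which assumes equal variances in the two groups.

```python
import math

# Made-up (Q7PRICE, TOTFAM) pairs, NOT taken from the HCMA file.
cases = [(1, 4), (2, 3), (3, 3), (1, 5), (2, 2),
         (4, 2), (5, 1), (4, 3), (5, 2), (4, 2)]
CUT_POINT = 4

# Codes below the cut point form group 1; codes at or above it form group 2.
group1 = [fam for price, fam in cases if price < CUT_POINT]
group2 = [fam for price, fam in cases if price >= CUT_POINT]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(group1), len(group2)
pooled = ((n1 - 1) * sample_var(group1) + (n2 - 1) * sample_var(group2)) / (n1 + n2 - 2)
t = (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

print(f"group 1 mean = {mean(group1):.2f}, group 2 mean = {mean(group2):.2f}")
print(f"t = {t:.2f} on {n1 + n2 - 2} degrees of freedom")
```

SPSS supplies the SIG for this t value, so no probability is computed here.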

STATISTICS - SUMMARY

1. Functions of statistics
   a. description: summarize a set of data
   b. inference: make generalizations from sample to population. parameter estimates, hypothesis tests.

2. Types of statistics

i. Descriptive statistics: describe a set of data
   a. frequency distribution - SPSS Frequencies
   b. central tendency: mean, median (order statistics), mode. SPSS - Descriptives
   c. dispersion: range, variance & standard deviation in Descriptives
   d. Others: shape - skewness, kurtosis.
   e. EDA procedures (exploratory data analysis). SPSS Explore
      Stem & leaf display: ordered array, freq distrib. & histogram all in one.
      Box and Whisker plot: Five number summary - min., Q1, median, Q3, and max.
      Resistant statistics: trimmed and winsorized means, midhinge, interquartile deviation.

ii. Inferential statistics: make inferences from samples to populations.
   a. Parameter estimation - compute confidence intervals around population parameters
   b. Hypothesis testing - test relationships between variables

iii. Parametric vs non-parametric statistics
   a. parametric: assume interval scale measurements and normally distributed variables.
   b. nonparametric (distribution free statistics): generally weaker assumptions: ordinal or nominal measurements, don't specify the exact form of distribution.

3. General rules for interpreting hypothesis tests.

i. You test a NULL hypothesis - The NULL hypothesis is a statement of NO relationship between the two variables (e.g., means are the same for different subgroups, correlation is zero, no relationship between row and column variable in a crosstab table).
   a. Pearson Correlation: $r_{xy} = 0$
   b. T-Test: $\mu_x = \mu_y$
   c. One Way ANOVA: $M_1 = M_2 = M_3 = \dots = M_n$
   d. Chi square: No relationship between X and Y. Formally, this is captured by the "expected table", which assumes cells in the X-Y table can be generated completely from row and column totals.

ii. TESTS are conducted at a given "confidence level" - most common is a 95% level. At this level there is a 5% chance of incorrectly rejecting the null hypothesis when it is true. For stricter test, use 99% confidence level and look for SIG's < .01. Weaker, use 90%, SIG's < .10.

iii. On computer output look for the SIGnificance or PROBability associated with the test. The F, T, Chi-square, etc. are the actual "test statistics", but the SIG's are what you need to complete the test. SIG gives the probability you could get results like those you see from a random sample of this size IF there were no relationship between the two variables in the population from which it is drawn. If the probability is small (<.05) you REJECT the assumption of no relationship (the null hypothesis).

For 95% level, you REJECT null hypothesis if SIG < .05
If SIG > .05 you FAIL TO REJECT
REJECTING NULL HYPOTHESIS means the data suggest that there is a relationship.

iv. Hypothesis tests assess whether one can generalize from information in the sample to draw conclusions about relationships in the population. With very small samples most null hypotheses cannot be rejected, while with very large samples almost any hypothesized relationship will be "statistically significant" - even when not practically significant. Be cognizant of sample size (N) when making tests.

Type I error: rejecting null hypothesis when it is true. Prob of Type I error is 1 - confidence level.
Type II error: failing to reject null hypothesis when it is false. Power of a test = 1 - prob of a Type II error.


DESCRIPTIVE STATISTICS

As the name implies, these are used to describe characteristics of the sample or the population it is intended to represent. Begin by describing variables one at a time (univariate statistics). There are two basic procedures for this:

FREQUENCIES

If a variable is nominal, or ordinal with a small number of categories/levels, use the SPSS FREQUENCIES procedure. This will produce a table giving the number and percentage of cases that gave each of the possible responses.

Here is a sample SPSS output table from FREQUENCY of INCOME variable. Check questionnaire to see that income was measured in 4 categories with a “choose not to answer” response. The codebook or Variable View page on SPSS file will indicate variable was coded 1-4 for four income groups, and 5 for the “Choose not to answer” response.

TOTAL HOUSEHOLD INCOME BEFORE TAXES

                                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid     UNDER $25,000                 112      10.4            14.0                 14.0
          $25,000 TO $49,999            267      25.0            33.5                 47.5
          $50,000 TO $74,999            227      21.2            28.4                 75.9
          $75,000 OR MORE               193      18.0            24.1                100.0
          Total                         798      74.5           100.0
Missing   CHOOSE NOT TO ANSWER          182      17.0
          System                         90       8.4
          Total                         273      25.5
Total                                  1071     100.0

The five possible responses are the rows. Notice response categories (values) are labeled (“Under $25,000” etc).

Frequency = number of cases selecting this response
Percent = percentage this is of all cases
Valid Percent = percentage of “non-missing” cases. Here the No answer response is missing, as is “system missing” (cases that left this question blank).
Cumulative Percent = running total (not always useful or relevant)

Generally, you want to report the Valid Percent as your best estimate of the percentages of all visitors (in the population) in each income group. Raw counts are largely a function of sample size and not that useful.

DESCRIPTIVES

For interval or ratio scale variables, you usually want to compute means and standard deviations rather than frequencies. Here’s a table from running the DESCRIPTIVES procedure with the age variable. Age was measured as interval scale.

                       N     Minimum   Maximum   Mean    Std. Error   Std. Deviation
AGE OF SUBJECT         925   16        86        43.29   .48          14.52
Valid N (listwise)     925

In this case the average age was 43, the lowest age in the sample was 16 and the highest was 86. The average is based on 925 cases that answered this question. The standard deviation indicates the “spread” of ages in the sample. You may compute a 95% confidence interval for the estimate of average age by computing the standard error = standard deviation/sqrt(n). In this example SE = 14.52/sqrt(925) = .48. A 95% confidence interval is two standard errors either side of the mean = (43 + or - 2*.48) or roughly (42, 44). SPSS computes the SE for you.
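The arithmetic for the age example can be checked with a short Python sketch that uses only the numbers printed in the DESCRIPTIVES table. The variable names are mine.

```python
import math

# Values read from the DESCRIPTIVES table for AGE.
mean_age = 43.29
sd_age = 14.52
n = 925

se = sd_age / math.sqrt(n)      # standard error of the mean
low = mean_age - 2 * se         # two standard errors below the mean
high = mean_age + 2 * se        # two standard errors above the mean

print(f"SE = {se:.2f}")
print(f"95% CI = ({low:.1f}, {high:.1f})")
```

The interval comes out near 42.3 to 44.2, which the handout rounds to roughly (42, 44).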


Guidance to Statistical Tests - Hypothesis Tests

Testing hypotheses is a little more complicated. Here we want to test for a relationship between two or more variables. Again, which procedure to use depends on the measurement scales of the variables.

CROSSTABULATIONS – The CROSSTABS procedure – This is simply the bivariate version of FREQUENCIES. Run this when you have two variables that are nominal scale or have small number of categories/levels. (Nominal x Nominal)

SPSS produces the bivariate distribution in the sample and a Chi Square statistic, which tests the null hypothesis of no relationship between two variables. This is analogous to using a Pivot Table in Excel. As a rule of thumb, you need an expected count of at least 5 cases per cell in your table, so don’t run this with variables that have too many categories (recode if necessary to collapse categories). The Pearson Chi Square statistic in SPSS provides a test of whether or not the two variables are related.

Example of Crosstabs with HCMA data

Examine relationship between age and income - CROSSTABS AGE2 BY INCOME (note AGE2 puts age into a small set of categories).
Compare activity participation or attitudes of men and women - CROSSTAB of GENDER with one of the activity or attitude variables.
To get the Chi Square test along with the table, select the Statistics button and check Chi square.

Look for Significance levels smaller than .05 to reject null hypothesis of no relationship at the 95% confidence level. If SIG > .05 sample doesn’t provide enough evidence to conclude there is a relationship within the full population.

COMPARING SUBGROUP MEANS

Another common bivariate analysis is to compare means on an interval scale variable across two or more population subgroups. In this case you want an interval scale dependent variable (the one you compute means for) and a nominal scale independent variable (the one that forms the groups). (Nominal x Interval)

SPSS has several different procedures for comparing means. It will suffice to use the MEANS procedure. Put the interval scale variable in the dependent variable box and the variable for forming subgroups in independent variable box. To get a hypothesis test, select the Options button and check the “Anova table and eta” box at the bottom, then CONTINUE.

CORRELATIONS - Interval by Interval

Pearson Correlation: Run CORRELATION procedure to get the correlation coefficient between the two variables AND a test of null hypothesis that the correlation in population is zero. Be sure you understand distinction here between the measure of association between the two variables in the sample (correlation coefficient) and the test of hypothesis that correlation is zero (making inference to the population).

Regression : is multivariate extension of correlation. A linear relationship between a dependent variable and several independent variables is estimated. t-statistics for each regression coefficient test for a relationship between X and Y while controlling for the other independent variables. Standardized regression coefficients (betas) indicate relative importance of each independent variable. The R square statistic (use adjusted R square) measures amount of variation in Y explained by the X’s.


EXAMPLES OF T-TEST/ANOVA AND CHI SQUARE

The Independent Samples T-TEST Tests for differences in means (or percentages) across two subgroups. ANOVA is simply the extension to more than two groups and uses the F statistic. Null hypothesis with two groups is that the mean of Group 1 = mean of group 2. This test assumes interval scale measure of dependent variable (the one you compute means for) and that the distribution in the population is normal. The generalization to more than two groups is called a one way analysis of variance (ANOVA) and the null hypothesis is that all the subgroup means are identical. These are parametric statistics since they assume interval scale and normality.

In SPSS use Compare Means, which offers several options:

Means                    Compare subgroup means; Options, ANOVA for stat test
One Sample T-Test        Test H0: Mean of variable = some constant
Indep. Samples T-Test    Two groups; Test H0: Mean for group 1 = Mean for group 2
Paired Samples T-Test    Paired variables - applies in pre-test, post-test situation
One Way ANOVA            Compare means for more than two groups
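For the case of more than two groups, the F statistic behind a one way ANOVA can be sketched as the ratio of between-group to within-group variance. The three groups below are made-up values and `one_way_f` is a hypothetical helper. The SIG for the resulting F would come from SPSS or an F table and is not computed here.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up values for three subgroups, NOT taken from the HCMA file.
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
f_stat, df1, df2 = one_way_f(groups)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```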

Chi square is a nonparametric statistic to test if there is a relationship in a contingency table, i.e. Is the row variable related to the column variable? Is there any discernible pattern in the table? Can we predict the column variable Y if we know the row variable X?

The Chi square statistic is calculated by comparing the observed table from the sample, with an "expected" table derived under the null hypothesis of no relationship. If Fo denotes a cell in the observed table and Fe a corresponding cell in expected table, then

$\chi^2 = \sum_{\text{cells}} \frac{(F_o - F_e)^2}{F_e}$

The cells in the expected table are computed from the row ($n_r$) and column ($n_c$) totals for the sample as follows:

$F_e = \frac{n_r \, n_c}{n}$

CHI SQUARE TEST EXAMPLE: Suppose a sample (n=100) from student population yields the following observed table of frequencies:

                 GENDER
IM-USE     Male   Female   Total
Yes          20       40      60
No           30       10      40
Total        50       50     100

EXPECTED TABLE UNDER NULL HYPOTHESIS (NO RELATIONSHIP)

                 GENDER
IM-USE     Male   Female   Total
Yes          30       30      60
No           20       20      40
Total        50       50     100

$\chi^2 = \frac{(20-30)^2}{30} + \frac{(40-30)^2}{30} + \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20} = \frac{100}{30} + \frac{100}{30} + \frac{100}{20} + \frac{100}{20} = 16.67$

Chi square tables report the probability of getting a Chi square value this high for a particular random sample, given that there is no relationship in the population. If doing the test by hand, you would look up the probability in a table. There are different Chi square tables depending on the number of cells in the table. Determine the number of degrees of freedom for the table as (rows - 1) x (columns - 1). In this case it is (2-1)*(2-1) = 1. The probability of obtaining a Chi square of 16.67 given no relationship is less than .001. (The last entry in my table gives 10.83 as the chi square value corresponding to a probability of .001, so 16.67 would have a smaller probability.)

If using a computer package, it will normally report both the Chi square and the probability or significance level corresponding to this value. In testing your null hypothesis, REJECT if the reported probability is less than .05 (or whatever level you have chosen). FAIL TO REJECT if the probability is greater than .05.

REVIEW OF STEPS IN HYPOTHESIS TESTING: For the above example:
(1) Nominal level variables, so we used Chi square.
(2) State null hypothesis: No relationship between gender and IM-USE.
(3) Choose confidence level: 95%, so alpha = .05; critical region is $\chi^2 > 3.84$.
(4) Draw sample and calculate the statistic: $\chi^2 = 16.67$.
(5) 16.67 > 3.84, so inside critical region, REJECT null hypothesis. Alternatively, the SIG on the computer printout is below .001; since this is less than .05, REJECT null hypothesis. Note we could have rejected null hypothesis at .001 level here.

WHAT HAVE WE DONE? We have used probability theory to determine the likelihood of obtaining a contingency table with a Chi square of 16.67 or greater given that there is no relationship between gender and IM-USE. If there is no relationship (null hypothesis is true), obtaining a table that deviates as much as the observed table does from the expected table would be very rare - a chance of less than one in 1000. We therefore assume we didn't happen to get this rare sample, but instead our null hypothesis must be false. Thus we conclude there is a relationship between gender and IM-USE.
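The whole calculation for the gender by IM-USE example can be reproduced with a short Python sketch. It builds the expected table from the row and column totals and sums the four cell contributions 100/30 + 100/30 + 100/20 + 100/20, which comes to about 16.67. The probability line relies on a shortcut that is valid only for 1 degree of freedom. The variable names are mine.

```python
import math

# Observed frequencies: rows are IM-USE (Yes, No), columns are (Male, Female).
observed = [[20, 40],
            [30, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected cell = row total * column total / n (null hypothesis of no relationship)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1 only, the upper-tail probability equals erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi square = {chi_square:.2f}, df = {df}, p = {p_value:.5f}")
```

For tables with more than one degree of freedom, read the probability from SPSS or a Chi square table instead.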

The test doesn't tell us what the relationship is, but we can inspect the observed table to find out. Calculate row or column percents and inspect these. For row percents divide each entry on a row by the row total. Row percents:

                 GENDER
IM-USE     Male   Female   Total
Yes         .33      .67    1.00
No          .75      .25    1.00
Total       .50      .50    1.00

To find the "pattern" in table, compare row percents for each row with the "Totals" at bottom. Thus, half of sample are men, whereas only a third of IM users are male and three quarters of nonusers are male. Conclusion - men are less likely to use IM.

Column Percents: Divide entries in each column by column total.

                 GENDER
IM-USE     Male   Female   Total
Yes         .40      .80     .60
No          .60      .20     .40
Total      1.00     1.00    1.00

PATTERN: 40% of males use IM, compared to 80% of women. Conclude women more likely to use IM. Note in this case the column percents provide a clearer description of the pattern than row percents.
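Row and column percents are simple divisions, as this minimal Python sketch of the gender by IM-USE table shows. The dictionary layout and names are my own choice.

```python
# Observed frequencies from the gender by IM-USE example.
observed = {
    "Yes": {"Male": 20, "Female": 40},
    "No": {"Male": 30, "Female": 10},
}

# Row percents: divide each entry by its row total.
row_pct = {
    use: {g: count / sum(row.values()) for g, count in row.items()}
    for use, row in observed.items()
}

# Column percents: divide each entry by its column total.
col_totals = {g: sum(observed[use][g] for use in observed)
              for g in ("Male", "Female")}
col_pct = {
    use: {g: count / col_totals[g] for g, count in row.items()}
    for use, row in observed.items()
}

for use in observed:
    print(use, "row:", {g: round(p, 2) for g, p in row_pct[use].items()},
          "column:", {g: round(p, 2) for g, p in col_pct[use].items()})
```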


COMPUTING Confidence intervals around parameter estimates. When you use a sample statistic to estimate a population parameter, you base your estimate on a single sample. Estimates will vary somewhat from one sample to another. Reporting results as confidence intervals acknowledges this variation due to sampling error. When probability samples are used we can estimate the size of this error. The standard error of the mean (SE Mean) is the standard deviation of the sampling distribution - i.e. how much do means for different samples of a given size from the same population vary? The SE Mean provides the basic measure of likely sampling error in a sample estimate.

A 95% confidence interval is two (1.96) standard errors (SE) either side of the sample mean.

SEMean = standard deviation in population/ square root of n (sample size)

SPSS computes standard deviations and/or standard errors for you. You should be able to compute a 95% confidence interval if you have the sample mean (say X) and

a) the standard error of the mean (SEMean): 95% CI = (X - 2*SEMean, X + 2*SEMean)
b) the standard deviation of the variable in the population ($\sigma$) and the sample size (n): SEMean = $\sigma/\sqrt{n}$, so 95% CI = (X - 2*$\sigma/\sqrt{n}$, X + 2*$\sigma/\sqrt{n}$)

Examples:
a) In a sample of size 100, pct reporting previous visit to park is 40%. If SEMean is 5%, then 95% CI is (40% + or - 2 * 5%) = (30%, 50%).
b) In a sample of size 100, pct reporting previous visit to park is 40%. If standard deviation in population is 30%, then SEMean is $\sigma/\sqrt{n}$ = 30/sqrt(100) = 30/10 = 3%, and 95% CI = (40% + or - 2*SEMean) = (40% + or - 2*3%) = 40% + or - 6% = (34%, 46%).
c) If same mean and standard deviation as b) but using a bigger sample of 900, note the 95% CI = (40% + or - 2 * 30%/sqrt(900)) = 40% + or - 2*(30/30)% = 40% + or - 2% = (38%, 42%).
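The three examples can be verified with a small Python sketch. The helper names `se_from_sd` and `ci95` are mine, and the interval uses the handout's rule of two standard errors rather than 1.96.

```python
import math

def se_from_sd(sd, n):
    """Standard error of the mean from a standard deviation and sample size."""
    return sd / math.sqrt(n)

def ci95(estimate, se):
    """Approximate 95% confidence interval: two standard errors either side."""
    return (estimate - 2 * se, estimate + 2 * se)

# Example a: estimate 40%, SE given directly as 5%
a = ci95(40, 5)
# Example b: estimate 40%, population SD 30%, n = 100
b = ci95(40, se_from_sd(30, 100))
# Example c: same as b but n = 900
c = ci95(40, se_from_sd(30, 900))

print(a, b, c)
```

Comparing b and c shows the effect of sample size: a sample nine times larger cuts the standard error, and the width of the interval, to one third.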

OTHER STATISTICAL NOTES

a. Measures of strength of a relationship vs a statistical test of a hypothesis. There are a number of statistics that measure how strong a relationship is, say between variable X and variable Y. These include parametric statistics like the Pearson Correlation coefficient, rank order correlation measures for ordinal data (Spearman's rho and Kendall's tau), and a host of non-parametric measures including Cramer's V, phi, Yule's Q, lambda, gamma, and others. DO NOT confuse a measure of association with a test of a hypothesis. The Chi square statistic tests a particular hypothesis. It tells you little about how strong the relationship is, only whether you can reject a hypothesis of no relationship based upon the evidence in your sample. The problem is that the size of Chi square depends on strength of relationships as well as sample size and number of cells. There are measures of association based on chi square that control for the number of cells in table and sample size. Correlation coefficients from a sample tell how strong the relationship is in the sample, not whether you can generalize this to the population. There is a test of whether a correlation coefficient is significantly different from zero that evaluates generalizability from the sample correlation to the population correlation. This tests the null hypothesis that the correlation in the population is zero.

b. Statistical significance versus practical significance. Hypothesis tests merely test how confidently we can generalize from what was found in the sample to the population we have sampled from. They assume random sampling; thus, you cannot do statistical hypothesis tests from a non-probability sample or a census. The larger the sample, the easier it is to generalize to the population. For very large sample sizes, virtually ALL hypothesized relationships are statistically significant. For very small samples, only very strong relationships will be statistically significant. What is practically significant is a quite different matter from what is statistically significant. Check to see how large the differences really are to judge practical significance, i.e. does the difference make a difference?

1996 Huron Clinton Metroparks User Survey

BACKGROUND. The Huron Clinton Metropolitan Parks Authority (HCMA) manages a system of 13 parks in southeast Michigan. As part of HCMA’s continuing effort to meet the needs of people of Southeast Michigan, a user survey was conducted during 1995-96. The results of the survey will be used to update HCMA’s 5 year plan.

OBJECTIVES

1. Describe characteristics and patterns of use of HCMA park users

2. Identify trends in user characteristics and patterns via comparisons with previous surveys.

3. Identify and profile managerially relevant market segments

4. Evaluate visitor satisfaction with HCMA parks and measure visitor preferences for new facilities and programs.

METHODS: A self-administered survey of HCMA visitors was conducted between Dec 1, 1995 and November 30, 1996. Four page questionnaires were distributed to a sample of visitors in vehicles entering one of the 13 HCMA units during this period.

The sample was stratified by park, season, weekend-weekday and time of arrival at the park. Sampling was disproportionate across these strata to assure an adequate size sample for each park and season. Weights adjust the sample to the actual distribution of visits in 1995-96. Each park distributed questionnaires on 10-12 dates during each season. Dates were uniformly distributed throughout each season and divided evenly between weekends and weekdays. Gate attendants distributed surveys to each vehicle entering the park during the first 5 minutes of each hour on sampling dates. During busy periods surveys were given to every other vehicle and during slow periods sampling was conducted for the first 10 minutes of each hour. Visitors could return surveys at drop boxes located at each park exit or by return mail.

The four page questionnaire was developed from the 1990 HCMA survey instrument. Questions cover party characteristics, use of daily vs annual permits, activities in the park, importance & satisfaction with park attributes (for an I-P analysis), knowledge and use of HCMA parks, preferences for new programs and facilities, and a set of household characteristics.

You will be analyzing data covering winter, spring, summer, and fall seasons. A total of 4,031 surveys were completed over this period (overall response rate of 42%). Surveys by park range from 815 at Kensington to just over 80 at some of the more lightly used parks.

SUGGESTED ANALYSIS

1. Use descriptive statistics to profile park users - some sample results at website (hcma study).

2. Compare two or more subgroups (maybe market segments defined by age, income, use of annual or daily permit, etc.). Develop segments by classifying visitors into useful subgroups and then describing important differences between the subgroups. Example - see activity segment table at hcma results website (link is in (1)).

3. Test for a relationship between two variables using CROSSTABS (Chi square) or COMPARE MEANS (T/F-Test)

We will be using SPSS-PC to analyze this survey. The data file HCMA96.SAV is a specially coded SPSS data file that can be retrieved from within SPSS. It is available in course AFS space.

CODEBOOK: HURON-CLINTON METROPARKS USER SURVEY

Part 1. Variables that APPEAR on the questionnaire. Refer to the questionnaire for more details.

QUESTION | VARIABLE NAMES | CODING | COMMENT
Q1-Time | ARRIVE, LEAVE | coded in military time |
Q2 | N/A | dropped from file |
Q3-Permit | MVP | 0=daily, 1=annual permit |
Q4-Permit 95 | MVP95 | 0=No, 1=Yes |
Q5-Activity | NATURE to PLAYGRND | 0=not participate, 1=participate | see footnote for variable names & code numbers
Q6-primary activity | PRIMACT | activity number from Q5 | see footnote below for activity codes
Q7-Importance (characteristics) | Q7BEAUTY to Q7CROWD (see questionnaire for details) | 1=extremely important to 5=not important |
Q8-Importance (reasons) | Q8FAMILY to Q8NATURE (see questionnaire for details) | 1=extremely important to 5=not important |
Q9-Facilities | Q9WATER to Q9NONEWF (see questionnaire for details) | 0=not chosen, 1=like developed |
Q10-Programs | Q10NATURE to Q10OTHER (see questionnaire for details) | 0=not chosen, 1=like developed |
Q11-Aware free admission | FREEDAY1, FREEDAY2 | 0=no, 1=yes | if yes for FREEDAY1 continue for FREEDAY2
Q12-Familiarity (column a) | META to LAKA (see questionnaire for details) | 0=blank, 1=familiar |
Q12-Familiarity (column b) | METB to LAKB (see questionnaire for details) | number of times visited in the past 12 months | interval
Q13-First visit | FIRST | 1=today, 2=within the past year, 3=within the past 5 years, 4=more than 5 years ago |
Q14-Get info | INFOTV to INFOOTHR | 0=blank, 1=get info from it |
Q15-Performance | Q5BEAUTY to Q5OVERAL (see questionnaire for details) | 1=excellent, 2=very good, 3=good, 4=fair, 5=poor, -8=don’t know |
Q16-Comments | N/A | N/A |
Q17-Zipcode | ZIPCODE | 5 digit zipcode |
Q18-Age | AGE | code age | interval
Q19-Gender | GENDER | 1=female, 2=male |
Q20-Employment | EMPLOY | (see questionnaire for detail) | interval
Q21-Employed | EMPLFULL, EMPLPART | number of people employed | interval
Q22-Marital status | MARITAL | (see questionnaire for detail) |
Q23-Family members | HOUSEKID, HOUSEADT | numbers of children (adults) at home | interval
Q24-Education | EDUCATE | (see questionnaire for detail) |
Q25-Income | INCOME | (see questionnaire for detail) |
Q26-Race | ETHNIC | (see questionnaire for detail) |
Q27-Final | N/A | N/A |

17

Page 18: PRR 475: SPSS FOR WINDOWS - BASICS, LAB Nov 17-24 · Web view9 = Delhi, 10 = Lower Huron, 11 = Willow, 12 = Oakwoods, 13 = Lake Erie Park where survey was distributed- determined

PRR844 SPSS & Statistics Part 2. Created Variables. These variables were created by using recodes and computes.VARIABLE NAMES CODING COMMENTPARK 1 = Metro Beach, 2 = Wolcott Mill,

3 = Stony Creek, 4 = Indian Springs, 5 = Kensington, 6 = Huron Meadows, 7 = Hudson Mills, 8 = Dexter-Huron, 9 = Delhi, 10 = Lower Huron, 11 = Willow, 12 = Oakwoods, 13 = Lake Erie

Park where survey was distributed- determined from survey ID number

COUNTY 1 = OAKLAND, 2 = LIVINGSTON, 3 = WAYNE, 4 = WASHTENAW, 5 = MACOMB, 6 = MONROE, 7 = other; county of residence, determined from zipcode

AGE2 1= <17, 2= 18-35, 3= 36-59, 4= >59; grouping of age into categories
TOTFEMAL total number of females in party; sum from question #2
TOTMALE total number of males in party; sum from question #2
DAY 1= Monday, 2= Tuesday etc.; day of the week of visit
WEEKEND 0=weekday, 1=weekend; from date distributed
HCMATOT sum of total days visited parks for 1995; sum from question #12
VSTWT2 weight to use to adjust sample to population of visits
VSTORWT2 weight to adjust sample to population of visitors
FLC 0 = S/NC/18-35, 1 = S/C/18-35, 2 = M/NC/18-35, 3 = M/C/18-35, 4 = S/NC/36-55, 5 = S/C/>36, 6 = M/NC/36-55, 7 = M/C/36-55, 8 = S/NC/>55, 9 = M/NC/>55, 10= M/C/>55; family life cycles computed from age, marital status & children in household

SEASON 1= winter, 2= spring, 3= summer, 4= fall; from date distributed
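The created variables are simple recodes of the original ones. As an illustration only, the Python sketch below mirrors two of them, AGE2 and WEEKEND. It rests on two assumptions that the codebook does not settle: the listed AGE2 groups ("<17" and "18-35") leave age 17 unassigned, so the sketch places 17 in group 1; and the DAY coding is assumed to continue through 7 = Sunday. The function names are hypothetical.

```python
def age_group(age):
    """Recode age in years to the AGE2 categories (1-4).

    Assumption: age 17 is not covered by the codebook ranges;
    this sketch assigns it to group 1.
    """
    if age is None:
        return None
    if age <= 17:
        return 1
    if age <= 35:
        return 2
    if age <= 59:
        return 3
    return 4


def weekend_flag(day):
    """Recode DAY to WEEKEND (0 = weekday, 1 = weekend).

    Assumption: DAY runs 1 = Monday through 7 = Sunday.
    """
    if day is None:
        return None
    return 1 if day in (6, 7) else 0


# Show the boundary ages and every day code.
for age in (16, 17, 18, 35, 36, 59, 60):
    print("AGE", age, "-> AGE2", age_group(age))
for day in range(1, 8):
    print("DAY", day, "-> WEEKEND", weekend_flag(day))
```

In SPSS the same groupings would normally be produced with a recode into a new variable, so that the original AGE and DAY values are kept.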

Code numbers, variable names, and activities for questions 5 and 6.


1 NATURE NATURE OBSERVATION OR PHOTOGRAPHY
2 SCENIC SCENIC DRIVE
3 PICNIC PICNIC
4 BIKE BICYCLE
5 WALK WALK OR HIKE
6 WALKPET WALK PET(S)
7 RUN RUN OR JOG
8 ROLLER ROLLERSKATE OR IN-LINE SKATE OR SKI
9 VISITNC VISIT NATURE CENTER
10 VISITF VISIT FARM
11 VISITGM VISIT GRIST MILL
12 SUNBATHE SUNBATHE
13 BOATNM BOAT - NON-MOTOR
14 BOATM BOAT - MOTOR
15 FISHB FISH FROM BOAT
16 FISHS FISH FROM SHORE
17 WATERSL WATERSLIDE
18 SWIMLAKE SWIM OR WADE IN LAKE
19 SWIMPOOL SWIM OR WADE IN POOL (INCLUDING WAVEPOOL)
20 EVENT ATTEND A SPECIAL EVENT IN THE PARK
21 OTHERACT PARTICIPATE IN AN OTHER ACTIVITY
22 GOLF GOLF
23 PLAYGAME PLAY OTHER GAMES OR SPORTS (NOT GOLF)
24 WATCH WATCH GAMES OR SPORTS
25 PLAYGRND USE PLAYGROUND EQUIPMENT OR TOT LOT

(the following codes are for question 6 only)

26 ICE FISH
27 CROSS COUNTRY SKI
28 SLED OR TOBOGGAN
29 ICE SKATE
30 FISHING (UNDETERMINED)
31 NONE
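When reading frequency output for questions 5 and 6, it helps to translate code numbers back into activity names. The Python sketch below shows one way to do that with a lookup table. It includes only a subset of the codes listed above, and the response values and the helper name activity_label are hypothetical.

```python
from collections import Counter

# Subset of the question 5 and 6 activity codes from the codebook:
# code number -> (variable name, activity).
ACTIVITY_CODES = {
    1: ("NATURE", "Nature observation or photography"),
    3: ("PICNIC", "Picnic"),
    4: ("BIKE", "Bicycle"),
    5: ("WALK", "Walk or hike"),
    22: ("GOLF", "Golf"),
}


def activity_label(code):
    """Return the activity text for a code, or None if it is not in the table."""
    entry = ACTIVITY_CODES.get(code)
    return entry[1] if entry else None


# Hypothetical coded responses, not real survey data.
responses = [3, 4, 3, 22, 3, 5]
counts = Counter(responses)
for code, n in counts.most_common():
    print(code, activity_label(code), n)
```

Codes missing from the table return None, which makes incomplete lookups easy to spot before results are reported.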
