a p p e n d i x binfo.igme.es/sidpdf/035000/001/apendice b/35001_0004.pdf · 2007. 7. 27. · a p p...

57

Upload: others

Post on 21-Nov-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs
Page 2: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

A P P E N D I X B

INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS

The complete programs may be obtained from:

SPSS Inc., 444 North Michigan Dr., Suite 3000

Chicago , I1 60611

BMDP Statistical Software Inc.

1964 Westwood Blvd, Suite 202

Los Angeles , CA 90025

Page 3: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

3 ®P Stati tc software1981

W. J. Dixon, chief editor

M. B. BrownL. EngeimanJ. W. FraneM. A. HiII

_ A. I. JennrichJ. D. Toporek

UNIVERSIiY OF CALIFORNIA PRESS

BERKELEY • LOS ANGELES • LONDON • 1981

Page 4: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

-- Í

This publication repo rts work sponsored in part under grant RR -3 and con-tinuation contract number N01-RR-8.2107 of the Biotechnology ResourcesBranch of the National Institutes of Health. Reproduction in whole or in part isperni tted for any purpose of the United States Government

Orders for this publication should be directed to

UNIVERSITY OF CALIFORNIA PRESS2223 Fulton Street

Berkeley, California 94720

UNIVERSITY OF CALIFORNIA PRESS, LTD.London . England

Copyright © 1981 by The Regents of the University of California

Library of Congress Catalog Card Number: 80-53770

Comments on programs or orders for copies of theprogram should be addressed toBMOP Statistical Softwareas described in Appendix 0.

Manufactured in the United States of America

Page 5: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Contents

1. Introduction 1 7. The BMDP File -- Savinr _D:: ta and 65St:nttstics for Further Analysís

.1.1 A Cuide to This Manual 27.1 Examples of Writing and Using a 65

BMDP File2. Data Analysís -- Using che BMDP Programs 3 7.2 What is Stored in a &MDP File 67

7.3 Creacing a BMDP Filet the SAVE 682.1 Basic Terminology for Common 3 Paragraph

Features and Stattstics 7.4 Readtng the BMDP File: che INPUT 692.2 Overview of Data Analysis with BMDP 6 Paragraph

7.5 Additional BMDP File Features 70

3. Using BMDP Programs 15Types oí Anatyses - Program Oescnptions

3.1 Examples with Input and Output 153.2 Research Forms , Coding Sheets and 21 8. Data Description 73

Keypunching Cards3.3 Specifying the Data Format 21 8.1 P1D -- Simple Daca Des.,: iption and 74

Data Management8.2 P20 -- Detailed Daca Descriptíon, 80

Features Common to Ah Programs Including Frequencies8.3 P4D -- Single Column Frequencies -- 86

4, Requirements for a., Analysis -- BMDP 25 Numeric and NonnumericSyscere Instruction

4.1 The &NDP Instruction Language 25 9. Data in Groups -- Descriptíon , t Tests 934.2 System Cards 30 One-Way and Taro-uay Analysís of Variante4.3 Cosumon Errors in Specifying BMDP 31

Instructions 9.1 P3D -- Comparison of Two Cro..ps wich 94t Tests

9.2 P7D -- Description of Groups ( Straca) 1055. Describing Data and Variables with 35 with Histograms and

BMDP Instructions Analysís of Variante9.3 P9D -- Multiway Descripcion of Groups 116

5.1 Input and Output Examples Illustracing 36Instructions Common co All BMDPPrograms 10. Plots and Histograms 123

5.2 The PROBLEM Paragraph 385.3 Readtng the Data : the INPUT Paragraph 39 10.1 P5D -- Histograms and Univarlate 1245.4 Describing the Variables : che 40 Plots

VARIABLE Paragraph 10.2 P6D -- Bivariate ( Starter ) Plots 1335.5 The CRDUP or CATEGORY Paragraph 435.6 Free Format Data Reading ( new in 1981) 445.7 Additional Opcions and Features for 46 11. Frequency Tables 143

Formatted Data Reading ( new in 1981)5.8 Multiple Problems and Subproblems 47 P4F -- Two-tJay and Multiway Frequency 143

Tables -- Measures of Associationand the Log-Linear Model

6. Data Editing -- Variable Transformation 49 (Complete and Incomplete Tables)

and Case Seleccion

6.1 Data Editing and Transformacions -- 50 12. Míssing Values -- Parteras . Estinacion 207

The &MDP TRANSFORM Paragraph and Correlations

6.2 FORTRAN Transformations Using the 56BIMEDT Procedure 12.1 P80 -- Correlations with Options 209

6.3 PIS -- Mulcipass Transfornations 59 for Incomplete Data12.2 PAM -- Descriptíon and Estimation 217

of Iissing Data

v

Page 6: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

13. Regression 235 19. Survival Analysis 555

13.1 21R -- Multiple Linear Regression 237 19.1 P1L Life Tables and Survlval 55713.2 PU -- Stepwise Regression 251 Functions13.3 P9K -- All Possible Subsets 264 19.2 P21. -- Survival Analysis wíth 576

Regression Covaríates -- Cox Models13.4 P4K -- Regression on Principal 278

Components13.5 P5R -- Polynomial Regression 283 20. Time Series 595

20.1 PIT -- Univariate and Bivariate 60414. Nonlinear Regression -- And Maximum 289 Spectral Analysis

Likelihood Estimar ion 20.2 P2T -- Box-Jenkins Time Series 639Analysis

14.1 P3R -- Nonlinear RegressLon 29014.2 PAR -- Derivative-Free Nonlinear 305

Regression Appendices14.3 Applicacions of Nonlinear Regresston 315

Algoríthms Including Maximum A. Computacional ProceduresLikelihood Estimatinn

14.4 System Carde for PU and PAR 328 A.1 Random Numbers 66114.5 PLR -- Stepwise Logístíc Regression 330 A.2 Method of Provisional Means 662

A.3 Hotelling' s TZ and Mahalanobis D2 662

Y A.4Sartl

15. Anal sis of Variance and Covariance 345r

ett's Statistic (P9D) 663A.5 Tests and Measures in the Two-Uay 663

15.1 P1V -- One-Way Analysis of Variance 347 Frequency Table (P4F)and Covariance A.6 The Log-Linear Model (P4F) 666

15.2 P2V -- Analysis of Variance and 359 A.7 Parameter Escimates and Standard 667Covariance , Including Errors for Log-Linear Models (P4F)Repeaced Measures A.8 Instruccions for PIF , PU and PU 668

15.3 P4V -- General Univariate and 388 A . 9 Estimating (Smoothing) the Missing 671Multivariate Analysis of Value Correlation Matrix (PAN)Variance and Covariance , A.10 Replacing Missing Values (PAM) 671Including Repeaced Measures A.11 Linear Regression -- Estimating the 671

15.4 P3V -- Ceneral Mixed Model Analysis 413 Coefficients (PLR or P2K)of Variance A.12 Residual Analysis in P9K 672

15.5 PU -- General Mixed Model Analysis 427 A.13 Regression on Principal Components 672of Variance -- Equal Cell (P4R)Sizes A .14 Polynomial Regression (P5R) 673

A.15 Nonlinear Least Squares (P3R) 673A.16 Derivative- Free Nonlinear Regression 674

16. Nonparametric Analysis 437 (PAR)A.17 One-Uay Analysis of Variance and .675

PU -- Nonparametric Statistics 437 Covariance (P1V)A.18 Analysis of Variance and Covariance, 676

Including Repeaced Measures (P2V)17. Cluster Analysis 447 A. 19 General Mixed Model Analysis of 677

Variance (P3V)17.1 P1M -- Cluster Analysis of Variables 448 A.20 General Univarture and 'lultivariate 67817.2 P2M -- Cluster Analysis of Cases 456 Analysis of Variance and Covariance17.3 PKM -- K-Means Clustering of Cases 464 (P4V)17.4 P3M -- Block Clustering 474 A.21 Maximum Likelihood Factor Analysis 678

(P4M)A.22 Canonical Analysis (P6M) 679

18. Multivariate Analysis 479 A.23 Stepwise Discriminant Analysis (P7M) 679A.24 Using BMDP Proerams from a Terminal 681

18.1 P4M -- Factor Analysis 480 A.25 Missing Value Covariance Matrices 68118.2 P6M -- Canonical Correlation Analysis 500 (P8D)18.3 P6K -- Partial Correlation and 509 A.26 Classification Functions (P7M) 683

Multivariare Regression A.27 General Mixed Model ANOVA -- Empty 68318.4 P7M -- Stepwise Discriminant Analysis 519 Cells (28V)18.5 P8)4 -- Boolean Factor Analysis 538 A.28 Logistic Regression (PLR) 68318.6 P9M -- Linear Scores for Preferente 547 A.29 K-Moans Clustering (PKM) 684

Pairs

vi

Page 7: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

A.30 Survival Functions ( PNL) 684 TablesA.31 Regresslon with Incompleta Survival 684

Data (P2L) Table Title Page

A.32 Univariate and Bivariate Spectral 686Analysis (PNT) 2.1 Fear. ures common to BMDP programa 7

A.33 Box-Jenkins Time Series Analysis 690 5.1 Werner et al ( 1970 ) blood chamtstry 37

(P2T) data

A.34 Boolean Factor Analysis (P891) 692 6.1 BMDP transformar.ions available In si

A.35 Linear Scores for Preferente Palrs 692 the TRANSFORM paragraph

(P9M) 8 . 1 Nlneteen cases of data concaintng 89

A.36 Robust Estimators ( P2D) 693 nonnumeric characters

A.37 Cluster Analysis of Cases ( P2M) 693 11.1 Data on 117 male coconary patients 147patients

11.2 Tests and measures computed by 24F 152

3. Size of Programa 11.3 Incidente of peptic ulcer by blood 170group ( Woolf)

B.1 tncreasing che Capacity of BMDP 694 11.4 Three - year survival of breast cancer 178

Programs patients

8.2 ProgramLimitatíons 694 14.1 Radioactivity in the blood of a 290baboon named Brunhilda

14.2 Radioactivity corresponding to levels 297

C. Selected Articles from BMDP Communications of insulin standard14.3 Data on sensitivity of Chorda tympant 318

C.1 Detecting Outliers vich Scepvise 698 fibers in ra es tongue to four test

Regression ( P2R) stimulus sources

C.2 tt Wasn ' t an Aceident ( F-to-enter , 698 15.1 Daca from Afifi and Azen ( 1972, p.166) 360

F-to-remove ) omitting the sane cases omitted by

C.3 Scaling for Minimum Interaction 699 Kutner (1974)

Using &4DP6M 15 . 2 Numerical example from Winer ( 1971, 366

C.4 Computing Predictions 700 p.525)

C.S Tolerante in Regression Analysis 700 15.3 Numerical example from Winer (1971, 368

C.6 Bandos Case Selection 701 p.564)

C.7 Ridge Regresslon L'sing BMDPZR 701 15.4 Numerical example from Winer ( 1971, 370

C.8 Analysis of Multivariate Change 702 p.546)

Scores 15 . 5 Numerical example from Winer ( 1971, 373

C.9 Checking Order of Carda in a Data 702 p.806)

Deck 15.6 Numerical example from Winer ( 1971, 375

C.I0 Quick and Dirty Monte Carlo 703 p.803)C.11 First Sceps 703 15 .7 Life times of electrontc components 418

C.12 Using P1D to Identify and List Cases 704 15.8 Random effects model analysis of 418

Containing Special or Unacceptable varianceValues 15 . 9 Maximum likelihood analysis for the 419

C.13 The Iterated Least Squares Method of 705 random effects modal

r.titi.mar.i ng Mean and Variante 15.10 Maximum likelihood analysis for the 421

Ca:..punents Using P3R mixed modal

C.14 Latged Variables Using Transformation 706 15.11 Mixed model analysis of variante 421

Paragraph vich no interaction

C.15 Cross Validation in BIDP9R 707 15.12 Maximum likelihood analysis for the 421

C.16 Parcial and Canonical Correlation 707 mixed modal vich no interaccion16.1 Data from Siegel (1956, p. 253) 43817.1 Data from Jarvik's smoking 448

D. Ilow To • 708 questionnaire , administered to 110sub j ec ts

Request Copies of the BMD and BMDP Programs 17.2 Realth indicators 456

Obr_ain Additional Manuals 18.1 Fisher iris daca 520

Report Difficulties and Suspected Errors 18.2 Daca from a study of lymphocyte 539

Arrange a Visit by a BM DP Statistician blood cells

Obtain Bi.IDP CO.CiUNICATIONS 19.1 Survival of lung eancer paciente 557

Obtain BMDP Technical Reports 19.2 Artificial dates giving risa to che 570survival times in Table 19.1

E. IBM OS System Cards Required to Use BMDP 713Programs

REFERENCES 714

INDEX 720

vil

Page 8: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Preface

This manual describes che capabilicies and editions vere done primarily by James Frane , MaryAnn

usage of che BMDP computer programs . These programs Mill and W.J. Dixon . This procese vas simpLified by

provide a wide varfety of analytic capabilicies that che expert edicing provided by Morton Brown vho

range free plots and simple data description co edited the previous edicion.advanced statistical techniques . The first chapter This revision could noc have been completedoutlines the organízation of the manual and suggcsts withouc che support of many BMDP staff members. Inhow to use i t. addition Co those mentioned elsewhere ve vish to

This edition differs i n many respects from acknovledge che aasiscance of Noel luheeler, andearlier editions ; therefore ve recommend chal you Linda Moody.read Chapter One even if you are familiar with che Ellen Sommers tested and cer.ested che programaprevious editions. co document bou che options vork and prepared che

The first BMD Biomedical . Computer Programs examples in chis manual.manual appeared In 1961, and vas folloved by Throughout Janice Cammell provided expertnumeraus editions . Each new edicion included new technical editing, improved phrasing, organízationprograma víth improved feacures , novel statiscical and attentíon to detail. Ching Liu and Avis williamstechniques and more robusc statistical algorithms . expertly typed che many drafts of che manuscript andIn 1968 ve began Co develop the English- based prepared che camera- ready copy. The text in chisControl Language used vish Che B1DP programs manual vas produced vish che aid of a text-editingdescribed in chis manual . This method of specifying program produced by che CltnLcal Evaluation Unit ofínstructions is more flexible Chan che fixed format Che Brencwood Vecerans Adminiscrac ion Medicalused in che BMD programs . In addition , repeated Center. Layout vas done by Barbara Widawskf.analyses of Che same daca, or similar analyses of Appendix D concains information on ordering chemultiple sets of data , can be done by stating a programa and che com ? uters on which they -aremínimum number of Control Language instructions . available . Although che programa have been under

In chis edicion ve attempt co incegrate an test and development for some time ve knovextensive discussion of mosc program features vish diffícultíes may avise in theír use; Cheseexamples oí hoy they are used. There are numerous difficulties should be reported co us as describedinput / outpuc examples , many of which are annotated . In Appendix D. Your comments and critícisas aboutOur emphasis Lo on che use of che stacisclcal che programa and chis manual vill also betechniques and noc on che numerical results obtained appreciaced.-- therefore ve explaín che cerms used in che In che fírsc edítion of Che manual 26 programaresulta , but do noc repeat che numerícal ansvers in vere described . The 1979 edítion concained 36. Thische discussion of che results . edition concains 42 programs . The development of new

The manual la vritten for new users of computen programo and revision of previously releasedprograma as vell as for experienced statiscicfans . programs is an ongoing project . New apcions areChapter One gives en oucline of che entíre manual continually beíng added . che new programs in chisand a Cuide co its use . Chapter Two describes Che manual are:scope of analycical techniques available, basicterminology and a short descriptfon of each program . BMDP4F - Two -vay and Multivay Frequency TableIn ehapters four through seven ve discuss che Analysis - inciudes all of che essentialspecifications common co many programs and explain feacures of che older 3NDPIF , 8.`(DPZF, andhow they are specified in Control Language . Chapcers Bi DP3F programs and adds new features foreight chrough tventy are devoced Co che individual screening log-linear models. Models can beprograms . The programs are discussed in relation to selected In a stepwise manner. Structuralche analyses they perfore . Each program is zeros are pecmicted for boch tvo -vay andextensively illustraced by annocated input / outpuc multiway cables. Cells oc scraca whoseexamples . Difficult formulas and computacional frequencíes deviate from che expeccedalgorithms are provided In an appendix . frequencies under a proposed model can be

New chapters are written by che designaCed idencified i n a stepwise manner.auchors. Revisions of material appearing in previous

V1íí

Page 9: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

BMDP2L - Cox Models for Survival Analysis - analyzes A case has a score of one if Le has aCox style proportional hazards survival positive response for any of the variablesanalysis models with covariaces . There can dominant in che factor (those not havingbe both time-independent and time- dependent zero loadíngs ) and zero ocherwise. Thecovariaces . Hodels can be selected la a program has been found useful instepwise manner by either exact maximua serological and other studies.likelihood or by an efficient approximationusing parcial derivatives . Several plots BMDP9M - Linear Seores from Preference Pairs -can be selected co aceompany che results . constructs for each case a score thac is a

linear combinatíon of the variables whereBMDPIT - Spectral Analysis - performs spectral the coefftciencs are based en che judgments

analysis . Control Language ls a series of (preferentes ) of experta . The expertbrief commands to allow easy Lnteractive compares cases two at a time, stating vhichuse. Features include estimation of of che pair he prefers. The analysisspectral densities, coherentes , variable veighcs che observed variables te bestva. time plots, lagged starter plots, and replícate the preferente of the judgecomplex demodulation plots. Attencion Ls (vhatever subjective or objectivegiven to missing values, removal of information he may use ). The expert needseasonal means and linear trends and the not judge all possible pairs of cases.use of several kinds of filters.

The score I s determined in a stepwiseBMDP2T - Interactive Box-Jenkíns Analysis, Including manner veigliting che variables by their

Transfer Functions - deals vith a general Lmportance to che expert . Preferences fromcasss of time series models that i ncludes more than one expert can be analyzed.ARMA, inrervention , and transfer functioncomponente . For the ARIMA component, Ltallows any number of autoregressive andmoving -average factocs and each factor may

Other New Features �jhave any number of paramecers of any orderof lags. Intervention and transfer functioncomponents also have dynamie structure Other modules added since the 1977 edition are:

similar to the ARIMA components. PtotsB`tDPI?1 - K-means Cluster Analysisinclude i ndividual and multiple timeE;tDPLR - Scepwise Logiscic Regressionseries, autocorrelacíon , parcial

sutocorrelation and cross correlation B,1DP8V - Mixed-model Anova, Equal Cell Sízes

functions. Control Language Ls designed for Importanc modifications have beca malle co severalinteractíve and batch use. Outpuc isadjusted to the width of a computer programa:

terminal and can be performed i n a seriesof acepe guided by che user . BMDPID - Option for file sorting

BMDPID - Option for stem and leaf display

BMDP4V - Univariate and Multivariate Analysis of Bi1DP3D - Robuse option for t statistics

Variante and Covarianee, Includíng Repeated BHDP6D - Mulciple pairs of variables in bivariate

Measures and Cell Weighcs expands che plots permítted

analysis of variante capabilities of che BMDP7D - Capacicy co handle more groups in side-b;•

older BMDP2V. (Ir vas originally developed -side histograms , plot of cell means vs.

as URNAS, che University of Rochester standard deviations

Weighted ANOVA System.) Some of its new BMDP8D - Completely rewritten with a much more

features include Creenhouse -Ceisser and efficíent algorithm for computing

Huynh-Feldt adjustments to che univariate correlatlons from incomplete daca _approach co repeated measures ; univariate BMDP2M Option for single linkage and related case

and multívariace approaches to repeated clustering

measures ; covariaces for boch fixed effects BMDPAM - Maximum likelihood estimation of covarianee

and repeated measures; user defined cell matrices from incomplece data, display of

weights ( cell importante ) for hypotheses partero of occurrence of missing values

tested; interactive analysis of submodels ; afcer clustering partera of missingness byroes and colrmnsand orthogonalization of effects under user

control in order to yield various types of RMOP3R - Specification of function and derivacives

suma of squares for unbalanced designs. for nonlinear regression through EIDPControl Language racher than FORTRAN

BMDP8M - Boolean (Binary Data ) Factor Analysis - BFIDPAR - Specification of function for derivati`e-

perforas a factor analysis of dichotomous free nonlinear regression through 6.1DP

(binary) data . The analysis in chis programControl Language racher than FORTRAN,

differs from that of classical factor modifications co permic pharaacokineticmodels defined by differencial equations

analysis (see P4H) based on btnary data gMDP2V - Cell mean vs. cell standard deviation plot,evenn though the goal and model(symbotically) appear similar. The goal is

Creenhouse- Ceisser and Huynh-Feldt

co express p variables by a factors viere madjustments te degrees of freedom for

is considerably smaller than p. repeated Measures

The arithmetic used in the matrixmultiplication is Boolean, so che scoresand loadings are binary.

ix

Page 10: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Features added to all BMDP modules i nclude : Ph.D. chasis, which provides che algorithm for thederivative- free nonlinear regression. He Ls

- Character handling through che cransformation currently involved in planning m.iny ochar programs.processor , e.g., conversion of alphabetic codes to Jim Frane made significant concributions in chenumeric codeo area of multivariate analysis. He designed and

- Easier specification of cacegory codes and mees programmed several of che analyses at a level that- Free- format data reader makes che programs more accessible co the usar; he- Fortran variable formal daca reading replaced by Ls now supervisor of applications programming and isconversion within BMDP of card i mages co numbers deeply involved in making improvemenr. s and programfor enhanced diagnoscics and permicting blanks as testing.missing value codes for all computer systems Alan Hopkins developed che life cable programs.

- Codebook í nterpretation of input format Horton Brown developed che frequency tabla- Dynamic scorage mechanism to allow easy programs , and John Hartigan (of Yale Universicy)

specificacion of very larga analyses contribuced greatly in che area of cluster analysis.- CPU usage report, date and time repoct Ochar staff members , such as Al Forsythe, Ray- Increased control over output, e.g., for Mickey and Jerry Toporek made significant

interactive execucion concríbutions to many of che programs.- Output of data filas in binary, F, and G format Many statisticians have made concributions and- Printing of problem tiCle at top of each paga suggestions for developíng che programs: R.L.- Improved data printing Anderson in regression and analysis of variante,- Specification of certain special characters in Virginia Clark and Robert Elashoff in survival

variable narres without che peed for enclosing them analysis, Robert Ling in che mechod of pictoriallyIn aposcrophes , e.g., che use of underscore allows representing a matrix , and John Tukey in dacaeasier interface viCh SAS analysis. Valuable comments have been received from

- Enhanced portability, especially for CDC and 16 David Andrews, Petar Claringbold, Cuthbert Daniel,bit machines Charles Dunnett, Janet Elashoff, Ivor Francis,

George Furníval, Don Guthrie, Henry Kaiser, SirMaurice Kendall, H.L. Lucas, John Nelder, Shayle

ACKNOWLEDGEMENTS Searle, Frank Sctct, Max Woodbury, Karen Yuen, andCoralee Yale. This lisc i s incomplete. Many

It is ímpossible to properly credic all who have statisticians have visitad che Department ofcontribuced to che development of [hese programs. Biomathematics , and/or have discussed che programsEach program goas Chrough many stages from che vith us at conferences . Even che list of referentes,inicial planning co che final release for Ls, of necessity, incompleta.discribution. In chis manual ve list as author (s), Tony Thrall, Lon-Mu Liu and Laszlo Engelmanwhere possible , che person ( s) who vas most developed the time series programs receiving helpfulinstrumental in Che development of che program . comments from John Tukey, David Brillinger, PetarGenerally Chis person showed originalicy in che Bloomfield, George Box and George Tíao.design of che program and aleo contribuced to che During che past three years NIH provided us withstatistlcal methodology required for che analytic an Advisory Committee chal considered direccíons fortechnique. At che end of each program we narre che our research making suggestions for Lmprovements indesignar and prograemer . But these credits do noc, numerical methods , program design and standards,and cannot , fully cover che concributions of our statistical techniques, exchange of information vichstaff. statisticians and che scacistical computing

Laszlo Engelman designed che basic Eramework of community.Che 3MDP programs, such as che Control l.anguage used BMDP has benefited from che work of manyco specify instructions, che methods used for conversion cantero thar have cuscomized BMDP for usetransformacions and che mechod of saving daca and on a variety of. computer systems. For a completeresults between analyses (che BMDP ( Save ) Fila). He list of systems contact B?1DP at (213) 825-5940. Somealso programmed many of che subroutines common co of che many who have been responsible forall programs and several of che B&`1DP analyses. For conversíons include Rachel Countryman (Burroughs),many years he vas supervisor of applications Eli Cohen (CDC 6000, Cyber), Michael Hatzek (DECprogramming and supervisad the development of many 10/20), Gary Anderson (HP 3000), M.H. Barritt (LCLof che programs . 2900), James Krupp (PDP-11), Rob Charlton (Prime),

Robert Jennrich proposed and designed many of che Malte Sund (Siemens), Robert Byers (Univac 70/90),programs i n regression analysis, analysis of Ann Coleman (Univac 1100), Bernie Ryan (VAX), Randalvariance and discriminant analysis. He made Leavict (Xerox), Lois Secrist (Perkin-Elmer ), Aeneasignificant concributions in nonlinear regression Reid (Honeywell 66), H. Koll (Telefunken), and W.( P3R), factor analysis (P411) and che analysis of Haase (.110DCOMP Classic).variance (P2V and P3V). He supervisad Mary Ralston's W. J. Dixon

x

Page 11: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

i -

Page 12: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

1INTRODUCTION

The &MDP computer programs are designed co aid complexicy , and some numbers do not appear.daca analysis by providing methods ranging from For example, che program " Simple Dacasimple data display and description to advanced Description " is labelled P1D lince it is che firststatistical techníques . Data are usually analyzed by program in che Descriptiva series, and "Nonlinearan iterative "examine and modify " series of steps. Regression " is labelled P3R as che third program inFirst che data are examinad for unreasonable values, the Regression series. Since programs in chis manualgraphically and numerically . If unreasonable values are described by concent, "Multivariate Regression"are found chey are checked and, if possible , ( P6R) is explained in che chapter on Multivariatecorrecced . An analysis i s then performed. This Analysis vith programs from che M- series.analysis may identify other inconsistent Nev programs are continually beíng developed andobservations or indicare that further analyses are released . The first edition of che BMDP manualneeded . Th e BMDP programs are designed to handle all (Dixon, 1975 ) contained 26 programo, che 1979steps in an analysis , from che simple co che edition thirty-six. Nev in chis manual are:sophisticated.

The &MDP programs are organizad so che problem to PGF -- Frequency Tablas -- Tvo-vay and Multiwaybe analyzed,' che variables to be used in the This program replaces P1F, P2F and P3F andanalysis , and che layout of che data are specified expands che capability for analyzingin a uniform manner for all programs . This permits concingency cablesdifferent analyses of che sama data víth only minor P2L -- Survival Analysis vich Covariates --changas in the instructions . Cox models

This manual is arranged by che type of analysis P8M -- Boolean Factor Analysisappropriate to che data . Included are chapters on P9M -- Linear Scores for Preferente Pairsdata description and screeníng , plotting , frequency P1T -- Univariate and Bivariace Spectral Analysiscables , regression , analysis of variante , P27 - Box-Jenkins Time Series Analysismultivariate analysis , etc. Each chapter describes P4V General Univariate and Multivariateche programs that are available to do a specific Analysis of Variante , Including Celltype of analysis . In the i ntroduction to each Geights and Repeated Measureschapter the programs are described and contrastedvith each ocher to índicace which is preferred for a In addition, all programs are reviewed andspecific analysis . Programs in ocher chapters are revised in response to suggestíons from users, tocross - referenced if chey provide a similar funccion. i mprove efficíency or to correct errors. For

Th e programs are loosely classified inco series : example:

D: data description - P3D (t tests ) has been expanded to give crimmed tF: frequency cables scatistics and Levene's test for equalicy ofR: regression analysis variantesV: analysis of variante - P7D nov plots an escimate of che cell standardM: multivariate analysis deviation vs. the cell mean and also plots che logsL: life cables and survival analysis of each cell stacíscic . The slope of che regressionS: special ( miscellaneous ) líne for che second ploc is used to determine aT: time series transformacion for scabilizing variantes

P8D (Missing Value Correlation) has beenMany programs cross boundaries betveen tvo series. revritten to include a much more efficient algorithmFor example , multivariate regression be longs to both - PAM (Description and Estimation of Missingche multivariate ( M) and regression ( R) series. Values ) has be en expanded to include che maximum

Each program is identified by a chree -character likelihood mechod of escimacing covariance andcode ; che first i s P (from BMDP ) and che last is che correlacion matrices from incomplete dataseries classification . The middle character is - P2V ( Repeated Measures ANOVA) now includes cheassigned when the programming be gins; i t can be 1-9 Creenhousé-Geisser and Huynh - Feldc adjustments toor a lecter . Th e order does not índicace increasing degrees of freedom.

1

Page 13: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

1 Infroduction

In addition , diagnostic messages for daca reading instrucrions and results are labelled Example and-

have been enhanced and free - formacted data reading Oucput respectively and are identified by che lastis nov available . tu* characters of che program name and a sequence

Mejor changos in che programa and novel vaya to number vithín the program, e.g., 2R.3 denotes cheuse them are documentad in che nevsletter , BMDP• third example for BMDP2R. Numbers in circles ( e.g.,Communications . Articles from BMDP Communications O

o. ...) are used to annocace che results; chethat describe vaya co use che programa are reprinted numbers on che outpuc correspond co circled numbersin Appendix C. in the legend or che text.

This manual describes che currenc status of the A líst of program options follovs the examples)programa . Since your faeility may have en earlier vith page references to where they are discussed.version of che programa , ve have included notes in Each option is described ; exam les areche manual regardtng changos airare our November 1978 many of them .

P provided for

release. Note thac each BMDP program prints a date The &MDP programo can analyze large amounts ofIn che upper left comer of che first paga of the data . Near che and of each program description a -oucput • scatement is made of the largest problem che program

An abbreviaced manual , che BMDP User's Digesc and can analyze vithout modification . A more detaileda B1WP Control Language Pocket Cuide are available • formula for determining che sine of problem cheSee Appendix D. program can analyze i s given in Appendix B. If your

problem exceeds che limic , Appendix 8 also containsa description of che changes needed co analyze

1.1 A GUIDE TO THIS MANUAL larger problema.The last section.in che program description

inUCducto ry Material (Chaplers 1 through 3 ) $toces che formulas and algorithms chat are

Th e acope of che statistical analyses provided bydescribed in greacer detall i n Appendix A. The more

the BMDP programa is discussed in Chapter 2. Sectiondifficult formulas and computacional procedures are

2.1 introduces terma used throughout chis manual andcollecced in che appendix.

describes analytical features available in more chau Each program concludes vith a summary. Thesummary consists of a Cable thac describes che &MDP

ene program . The acope of che possible analyses with instrucrions used in cheehe BMDP programa la outlined in Section 2.2. program, and Provides short

For readera vho are using a computer for chedefinitions and paga references co explanations ofthe pr

first time , Chapter 3 gives annocated examples ofogram options. These cables ca n be usad as

simple analyses , describes hoy co organice your dataindexes co che program descriptions.

sheeta ( research foros ), and describes the layout(foroat ) of your data . Useful Aids

An índex to che manual Ls included in che lastFeatures Common to Alt BMDP Programa ( Chapters 4 through 7) fev pages.

On che inside front cover the programa are listadChapter 4 describes che English - based BMDP by their ehree - characcer identification codea; page -

instruccion language used to describe che data and referentes are given to program descriptions andspecify che analysis . The terminology and notation summaries.usad throughout chis manual are defined here. ve On che inside back cover che Control Languagerecommend cha[ you read che definitions in Section Common to all programs (and described in Chapters4.1 even Lf you are familiar vith che BMDP programs . 4-7) is presenced i n summary form . Opposite the --The 31DP instrucrions used to describe che data and inside back cover, space is left for you co fill invariables are presented in Chapter S. The che instrucrions necessary to begin an analysis orafree - forma[ data reader described in Chapter 5 Ls your computer ( see Chapters 4-7).

evailable in all programs except P4D . Methods oftransforming and editing data are treated in Chapter Where to Sta rt Reading6. Since am analysis ofcen requires multiple sceps,che data or resulta from one program can be saved ina BMDP ( Save ) Pila (Chapter 7) and then usad in

If you are using computers for the first cine, ve

other programs . The use of a 81IDP Pile eliminates recommend [hat you orear ch Chapter 3. Then real

having co repeat the description of your daca . Chapters 4 and 5 for a deacription of che ControlLanguage. Arad finally , try one of che programs inChapters 8, 9 or 10.

Program Descriptions ( Chapters 8 through 20) If you are not familiar vith che &MDP prograas,but have previously usad a computer, ve recommend

Chapters 8 through 20 describe methods of that you read Chapter 2 (Section 2.1 and 2 . 2) for enanalysis and Che programs available to perforen them . overviev of che programs and Chapters 4 and 5 for aEach chapcer begins vith an introduction describing description of che Control Language before turningalcernate methods and programs. Th is is folloved by to che specific analysis you vant to do. -decailed descriptions of each program . ( The teatures If you are already familiar vith che BMDPdescribed i n Chapters 4 through 7 are noc repeated ). programs , buc not vith chis manual, ve recommend

Each program description begins vith a short thac you skim Chapcer 4. If you are already familiarabstracc and a list of che examples and ocher vith che program co be used, cura co the summary foroptions 6 features . Th is is folloved by one or more che program (see che list of programs inside cheexamples that illustrate che simples [ oc most common fronc cover ); othervise you can either Cura co cheusage of the program . Each example consista of che chapcer describing the type of analysis chac youBMDP instrucrions thac are required by che program vant co do, or co Chapter •2 for ao overviev of alland che resulta produced by che program . The che programs. --

2

f

Page 14: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

2DATA ANALYSIS

Using the BMDP Programs

INTRODUCTION Missing values 9Regcession 9

The meaning of "daca analysis" is different for Nonlinear regression 10each of us, depending on our level of statistical Analysis of varíance and covariance 10training. Techniques used in data analysis vary from Nonparametric statistics 11the simplest display of data in a histogram or a Cluster analysis 11plot and che calculation of statistics (such as che Mulcivaríate analysis (factor analysis, 11mean and standard deviacion) to advanced mechods of canonícal correlation, parcial correlation,multivariate analysis. To some , daca analysis multivariate regression, discriminantínvolves a single display or set of compucacions; co analysis and preferente pairs)ochers it involves a sequence of steps, detecting Survival analysis 13che presente of outliers and inconsistencies, Time series 13ensuring that assumptions necessary to che analysisare met , etc. Each step may suggest furtheranalyses.

The BMDP programs provide eany analyticalcapaba).ities -- from elementary to advanced. In chis Chapter Organization Program Labeischapter ve describe features that are available inmore than one program and oucline che acope oí 8. Data descripcion 1D, 2D, 4Davailable analytical techniques. 9. Data in groups 3D, 7D, 9D

10. Plots and histograma 5D, 6D11. Frequency tables 47

WHERETOFINOIT 12. Missing values 8D, AMpage 13. Regression IR, 2R, 9R, 4R, 5R

2.1 Basic Terminology for Common Feacures 3 14. Nonlinear regression andand Statistics maximum likelihood 3R, AR, LR

15. Analysis of varíance and

Data and acceptable values 4 covariance W. 2V, 3V, 4V, 8VCases-used vhen some values are unacceptable 4 16. Nonparametric analysis 3S

(missing ) 17. Cluster analysis 1M, 24, 3M, NMTransformations and selecting subpopulations 4 18. Multivariate analysis 4M, 6M, 6R, 7M, SM, 9M

for analysis 19. Survival analysis 1L, 21.Hov to define subpopulations or groups 4 20. Time series iT, 2TUnivariace statistics 4Covariances and correlatíons 5Plots and histograms 5Frinting data and resulta for each case 5The BMDP File 6 2.1 BASIC TERMINOLOGY FOR COMMON FEATURESCase veights 6 ANO STATISTICSComputacional accuracy 6Table of common features and statistics 7 Each BMDP program ís designed to provide a

cercain analycical capability, such as datadescripcion, plots, or regression analysis. Some

2.2 Overviev of Data Analysis vith BMDP 6 statistics, plots, or other results are provided byseveral programs.

Data screening and descripcion 8 In chis seccion ve describe some of Che commonData in groups 8 features and introduce terminology that is usedTransformations 8 throughout chis manual. Table 2.1 (p. 7) lista chePlots and histograms 8 features and identifies che programs that containFrequency tablea 9 each feature.

3

Page 15: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

?.0 Data Analysis

Data and Acceptable Values acceptable values for both variables ( cases 1, 2 and3). Hethod C allows che means to be computed from

Data are codes representing characteristics cases chac have acceptable values for all three

( e.g.. sex , oye color ), values of measurements variables ( only cases 2 and 3).

(e.g., height , veight ) or responses co questions . Method C may use fewer cases than either of theEach characteristic , measurement or response Ls other methods . You can specify a list of variablescalled a variable . Each case contains values for all to be checked by Method C with Che USE list in theChe variables for one subject, animal, or sampling VARIABLE paragraph. Variables thac are excluded fromunir. A case may represent such things as responses Che lisc are not used in any of che computations _to questions , outcomes of tests or measurements made ( not even their means are computed). In Chapcer 5 ve

ora a subject or test animal. describe hoy Chis list Ls specified.

Some values in a case may noc be recorded . A The method used by each program Ls shovn in Tablavalue that tu not recorded may be lefc blank or =ay 2.1. Three programs , P3D, P8D and PAM a11ov you tobe recorded vith a special code; Che blank or choose explicicly betveen methods. P8D computes -

special code . vhichever La used , Ls called a missing estimares of correlacion matrices using all observedvalue code and Che unrecorded value i s called a values instead of only those values from complete

míssin value . Miscing values are excluded from all cases . PAM does che sama and, in additíon, provides

computations. estimares to fill in vhere observations are missing.In any program you can restrict Che analysis of a PAM aleo has features useful for describing the

variable to a specified range by assigning an upper pattern of where values are missing.limít (maximum ) and a lower limit (minimum) for,values of che variables . A value that Ls greaterthan Che upper 1LmLt or lesa than che lover limit Ls

Transformations and Selecting Subpopulations for Analysisout of range and Ls excluded from all computations.

An acceptable value la ene that la col equal to amissing value code and Ls not out of range . A Transformations can be used to replace che valuecomplete case Ls a case In which Che values of all of a variable by its transformed value; e.g., veightthe variables are acceptable ( there are no values by Che logarithm of weight . Also, new variables canmissing or out of range ). be created from Che observed variables by

cransformations . For example, Lf pulse roce ismeasured before and alter exercise, che difference

Cases Used When Some Values are Unacceplable betveen Che tv* measurements can be a meaningful -(Missing ) quantity. This difference can be specified as a nev

variable and can be used In Che analysis . Any numberAll BMDP programa ( excepc P4D) check for values of new variables can be treated as functions of the

missing and out of rango. The treatmenc of cases observed variables . These functions , calleddepends ora the primary purpose of che analysis . For cransformations , can involve arichmetical operatíonsexample . P1D - Simple Data Description -- computes ( e.g., +, -, *, /), povers , crigonometric functions,statiscics for each variable from ' all acceptable and complex condicional stacements ( LF(A/8-C LT 0)values for che variable ; but P2R -- Stepvise TREN D-0.).Regression -- uses only complete cases co estímate You can aleo select cases to be used in anthe linear regression equation . analysis ( e.g., data for males and for respondents

The B.MDP programs include cases in an analysis in cheir cvencies -- USE - SER EQ 1 AND ACE CE 20according to one of three criterio* AND ACE LT 30.).

Methods of specifyíng transformations and, caseA All cases are included ( all acceptable values selection are described in detail in Chapcer 6.for each variable are used)B Cases are Lncluded only if they have acceptable

values for all variables specified in che analysis How lo Define Subpopulalions or GroupsC Only complete cases are included ( cases that

have acceptable values for all variables ) Some analyses , such as a t test betveen tvogroups or a one-way analysis of variance, require

The difference betveen Che aboye criterio can be that che cases be classified Lnto groups. Theexplained in termo of an example . Suppose ve have variable -hose values are used to classify the cases -.three variables , each of vhich has some unacceptable into groups is called a grouoing variable . Groups

values . Th e data for five cases are: can be idencified as codes ( such as sex codes 1 and2 for males and females) or as intervals ( such as

height veight age age 10- 19, 20-29 , etc.).In all programs you can select cases be longing co

1) 67 130 - specific groups (fulfilling certain criteria) by2) 64 106 21 case selection ( Chapcer 6 ). The purpose of some

3) 65 117 27 analyses Ls to compare groups, such as in a plot or4) 67 - 24 by a t test or by an analysis of variance . In come -5) - 121 29 programs you can explicitly specify groups to be

analyzed ( or plotted).We request means for Che first tvo variables only.Using Method A, che sean of each variable iscomputed from all Lts acceptable values , vhecher or Univariate Slatisticsnoc either of Che other variables have acceptablevalues ( 1.e., four values for each variable). By Mosc programs compute the sean, standardMethod 8 the mearas of the first tvo variables are deviation and frequency for each variable. Incomputed from data in only those cases that have additíon, ochar univariace statiscics are computed -

4

Page 16: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Basic Terminology 2.1

in several BMDP programs . We review the defínitions The correlation can also be compuced afterof these statistici below. adjusting for che linear effecc of one or more other

Let x1, xZ, .. xN be che observed ( acceptable ) variables. For example , ve may want che correlationvalues f or a variable i n the cases used in an between x and y adjusted for z and w (sometimesanalysis . Then N is che samnie size , frequency or referred to as correlation at a fixed value of z andcount for the variable. The meón, R, is defined as w). This is called the partial correlation

coefficienc betveen x and y given z and w. It isExj/N equivalent to fitting separate regression equations

in z and w co x and co y, and computing che(Other estimates of locacion, such as che median and correlation betveen che residuals from che twomore robust estimates, are available in P2D!and regression fines.p7D.) A mlltiole correlation coefficienc ( R) is the

The standard deviation , s, is maximum correlation chac can be actained betveen onevariable and a linear combination of other

s [£(xj - x)z1(N - 1)� variables. This is che correlación betveen che firscvariable and che predicted value from che multiple

The variance ís s and che standard error of che regression of that variable on che other variables.mean is s //• The coefficienc of variation is che R 2 is the proportion of variance of che firscratio of che standard deviation co che mean , s/z. If variable explained by the multiple regressiona variable has a very small coefficienc of relating it to che other variables.variacion, loas of computacional accuracy can resultdue to the limitad accuracy vich which a number canbe represented internally in the computer. Ptots and Histograms

Many analyses require chas che distribution ofche daca be normal , or at leasc symmecric . A measure An assumpcion of normality is required by many

of symmecry is skewness , and a measure of analyses. The assumpcion can be assessed by a normal

long- tailedness is kurtosis . The BMDP programs probability plot . The assumption of normalíty is notcompute skewness , g1, as usually vich respect to the data , but vich respect

to che residuals . the differences betveen cheg = E(x - z ) z/(Ns3) observed value and che value predicted by che

z j statistical model. Many programs plot che residuals

and kurtosis, gz, as in a normal probability plot.Scaccer plots of one variable againsc anocher are

g = E(x 3 useful i d examining Che relacionships betveen chej cvo variables; they are also useful in assessing che

If che data are from a normal distribution , che fit qf*a statistical modal (such as regression).standard error of g1 is (6/N)14 and of gz i s (24/N)4. Scatcer plots of che data and results are providedA significanc nonzero value of skewness i s an in man' BMDP programs.

indícation of asymmetry -- a posícive value Histograms , or bar graphs , are a basic tool in

indícating a long right tail, a negative value a data screening . They can be used to screen for

long left cail. A value of g, significancly greaterextreme values or for the shape of che distribution

than zero indicases a discribution that is of data. Several BMDP programs plot histograms as

longer- tailed than che normal. We recommend that you Parc of their analyses.

also examine histograms when using these statistics Table 2.1 indicases vhich programs produce plots

for chey are sensitive to a few extreme values. and histograms as parí of their analyses. Chapter 10

The smallest observed (acceptable) value, xmin,describes cvo programs whose p.imary purpose is co

and che largest , xmax, are printed by several províde pinta and histograms in inal form.

programs. The ranga is (xmax - xmin). The smallestand largest standard scores (z-scores, zmin and zmax Printing the Oata and Results for Each Caserespectively ) are also printed by some programs. Wedefine zmin and zmax as 1 Many programs can list che daca for each case.

Some programa print results for each case , such asz zmin = (xmin - X-)15 and max = (fax - x)/s predicted values and residuals from a regression or

scores frota a factor analysis. Several programa have

Covariances and Corretationsspecial capabilities for listing che data:

- P4D can princ che data in a compact card imageCovariances and correlations are used i n many form

statistical analyses. The covariance betveen cvo - P4D can also print only cases that containvariables, x and y, is nonnumeric svmbols

- P10 can lisc all che data so chas each columncov(x,y) - E(x1 - !)(Y1 - r)/(N - 1) . contaíns all che values of one variable or so that

all variables for one case are princed before choseThe correlation , r, betveen cwo variables is for che next case

- P1D can print only cases vich missing values orcov(x,y) E(xj - x)(y1 only cases that have values out of range

r = - P1D can also princ che daca after sorting checases aceording to one or more variables

sx Sy [E(xj - x)z£(Yj PAM can print , in a compressed list, thepositions of the missing values and values out of

This is also called the product-moment correlation rangecoefficient. - P'_*1 and P4M can print standard scores

5

Page 17: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

2.0 Data Analysis.

The BMDP File Lec w be the case weight for che j th case. Then

A set of data is usually analyzed many times by z • Ewjxj/EwjBMDP programa . For example, the data may first be

examinad for extreme values ( outliers ) and for

distribu t ional assumptions ; then necesaary s {Ewj ( xj - x)Z/[(N - 1)Ewj/N]}transformationa can be performed , meaningful

hypotheses cested, or relacionships becveen che

variables studied. The results of an analysis may cov(x,y) - Ewj ( xj - x)(yj -1)Ewj1N] -suggest that further analyses are needed.

All programa can read a data matriz es input. Al.l

programa ( excepc P4D) can copy che data into a 314DP Evj ( xj - z)(yj - 'y)Fije. The 82WP Me is a means of storing your daca r =or resulto from an analysis so you can reuse them -more efficiently in other BMDP programs ; the File {Ewj(xj- z)zEwj(yj - y)z}can be created or read by any BMDP program ( exceptp4D). There are several advantages to using a BMDP

File: where N is the number of acceptable observationsused in the analysis vith positive ( nonzero)

- daca are read efficiently from a BMDP File; the veights.cost of reading a large amount of data from a BMDP Lthen case weights are not specified, wi is set toFile la substantially less than when a format one for all cases and che formulas are iáentical tostatement is used the formulas given on p. S.- many of the Control Language i nstructions,specified when che BMDP Fila 19 created, need not berespecified for each additional analysis . For ComputationalAccuracyexample, the variable names , che indicators formissing values and values out of range, codes and The computer representa each number by a binarynames for categories , etc., are stored vith the File sequence of limited accuracy . As a result there can- data are stored i n the BMDP Fila after be a loss of accuracy in certain types oftransformations and case selection are performed computation , such as matrix i nveraion . Loss of _- the BMDP Fila is the only way to atore resulta accuracy is especially pronounced if a variable has

( such as factor scores , residuals from a regression a small coefficienc of variacion (s/x) or if aanalysis or a covariance matrix) so they can be variable has a very hígh multiple correlacion withanalyzed further by other BMDP programs other variables.

All programs represent data values i n singleTabla 2 . 1 shows that all programa ( except P4D ) precision . Some programs do computations in single

can save che data in a BMDP Fila. Many programs save precision . Others do computations i n doubleresults, such as predicted values or residuals . Some precision ; these programs are the unes whoseprograms can aleo save a covariance or correlacion computations are most likely to be affected by amatrix or some ochar matrix of results . loas of accuracy if che computations are done in

All programs but P4D accept data from a BMDP File single precision.(a BMDP File can be created by one program and readby a different program ). Several regression andmultivariate analysis programs accept the covarianceor correlation matrix from a BMDP Fila, thus saving 2.2 OVERVIEW OF DATA ANALYSIS WITH BMDPcompucer time and cost. P4F accepts multivay tablesas input . The breadth of techniques available in che BMDP

programs is indicated by the chapter titles: -

Case Weights 8. Data Description9. Data in Groups -- Description , t Tests andMosc stacistical analyses assume chat che error One-4ay Analysis of Variante

of each observation has a constant variante . When 10 . Plots and Histogramsche variante l a not constant , the computations of ll . Frequency Tableathe meen, standard deviation and other statistics 12. Missing Values -- Patterns , Estination andare best done by veighting each case by the inverse Correlacionof che variante . For example , Che researcher knows 13. Regression -che variante from previous vork and includes i ts 14 . Nonlinear Regression and Maximum Likelihoodinverse as an additional variable for each case . Estimation

Case veights can also be used to represent che 15. Analysis of Variante and Covariancefrequency of an observacion when che lame 16. Nonparametric Analysisobse rvation is made more than once but is recorded 17. Cluster Analysisin only one case ; hovever, except for the frequency 18. Multivariate Analysistable programs , the sample size vill be the number 19. Su rv ival Analysisof cases and not che sum of weights. 20. Time Series Analysis

You can apecify case weights i n many BMDPprograms . The effect of the case weighc on Che In Chis section ve describe some of chesecomputacion of che unívariate statistics , covarlance techniques in general terms, and explain uhen cheand correlacion l a described below . techniques are useful ( see siso " First Steps",

6

Page 18: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Data Analysis 2.2

Table 2 . 1 Features common co BNIDP programs

Chapter 8 9 10 11 12 13 14 15 16 17 18 19 6 20

Program1241379 56 4 8A 12945 3AL112348 3 123K 466789 12 1 12DDD DDD DO F DM RRRRR RRR , WVVV S 1MPCM;l14RMMM LL S TT

Cases usedA all cases ( all acceptable values ) AAA AAA AA A AA AAi A A AB cases with acceptable values for all vars. B B B B ; BBBB B8 BB

specified in analysis ( see p. 4)C complete cases only CCICC C CC fC C CC C C CI C

Analycic capabilities!TrTTTT transformations and case selectíon TTT TTT TT T Tr.TTTTT TTTT TTTTITTTTTTITT T TT

W case weights W wu!www w WWW1 WW W WWWWW w,D double precision 0: 0 0 0D0 ! DDDD �DODD 0 0

Univariate statisticsTe mean . icc zcz R2 Rpc zxzz 1i - u RR x x xzzz z z zz

ocher estimates of location x xs standard deviacion ss sss ss 5 ss ssss sss I ss 5 s¡s sssss 5 s ssv coefficient of variation v vv vvvv Í ! 1 ;vvvg skewness and kurcosis g g 86 gj ggg gm min. and max . of che acceptable value mm mm m m mmm mmml m a• m msm m m mz z-score for min. and max . values z z zz ! ¡ z:zzzN number of cases ( sample size) NNN NNN NN N¡NN NNNNN NNN;NNNN NiNNNN NNNNNNNN NNG aboye statistics compuced for each group G GCC GG G G GG GC G G G

Other statisticsr covariances and/or eorrelatiorts rr r rr rrrr r!r rsrrr rp parcial correlations p ip p pR squared mulciple correlation ( R2 or SMC) R RRR R R R; Rf frequency counts, for each value or cacegory fff f f f( f f% frequency counts , in % % IX X '.), eigenvalues and/or eigenvectors 1; a aaC aboye stacistics compuced for each group CC GG C G.G 1 G

Test statisticsF comparison of group means (c,F) FFF� FFFFF F1 Fi FV comparison of group variances ! WV,x comparison of group or cell frequencies x xD mulcivariate comparison of group means ID D DI D

(T2 , D2 , A, U)

Plots and RraDhícal disDlays 1 i 1H histograms H Lid HN normal prabability plocs N NNNNNUN NS scacter plots of daca 5 S, 5 SI SSS SR scacter plots of results R RRRRR!RRR .R RiRRRR R,R0 other ( see program description) 0 0 0000 0 ?00i 00

Prints for each case It !1D data ( after transformations , if any) 0 Di 0 0 ODD DIODO 0 0 0 DDDDD DiD DO5 cases with special values S si Sz standacdized scores z IzR residuals and/or predicted values R: RRR RIRRR RR R RR R; RF factor, princ, comp . or canon. var. scores F FF FFM Mahalanobis distances M M M M M0 other ( see program description) 0 0 0 000 001001 0

BMDP Filo outputD data ( after transformacions , if any) DO DDD DO, DDD DDDDD DDD DDDDD 0 DDOD DDDDDD DD D DOR results for each case as parí of data ! R RRR R RRR R RRRRRR RC codes / cutpoints saved with data G GGC GG C G G C GGG GC correlacion and/or covariance matriz ICC.000 CCC0 ocher ( see program description) o! 0 0 0 0 0 0 0

Input from BMDP File0 data DO DDD DO DDD'DDDDD DDD DDDDD 0 0ODD 0DD000 DD 0 DOC correlation or covariance matrix I CCC CCC0 other ( see program description) 0 0 0

Input to- from BMDP FileD daca DDD DDD DDI 0 00 000DD DDD DDDDD 0 DORO DDDDDD DO 0 DO0 ocher ( see program description) 0 0 0 0 000 00

10

7

Page 19: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Data Analysis

Appendix C.11). Speciftc program options are distribution of data vithín groups , and examinepresented in the program descriptions . Some of che whether che assumptíon of normality is reasonable.more advanced techniques , such as maximum Likelihood Heteroscedasticity ( lack of constan [ variante ovarestlmation , are discussed only in che program groups ) can also be observad and tested , and maydescriptions . indicate thac che Input data should be transformed.

An analysis of variante using P7D can indicarewhether group differences are large enough to

Data Screening and Description suggest that futuro analyses should be stratified.P7D also computes AHOVA tests that do not assume

The first step in ao analysis is to examine che equal group variances , plots cell standarddata for errors and for che appropriateness of deviations vs. cell means and reports Bonferroniassumptions to be used In che analysis ( such as probabilíties for pairvise tests of cell means. Morenormality ). lf errors remain in the data they can information ora group differences (both univariatecause a "garbage - in, garbage -out" analysis , Blunders and multivariate ) can be obtained by using P3D. Itor extreme outliera in che data may peed to be yieids t statistics , Hotelling ' s T2 and Mahalanobisremoved to achieve a meaningful analysis . The data D2 for each pair of groups; t statistics , based onmay peed to be transformed to fit che various both pooled and separare variance estimares, areassumptions ( constan[ variance, normality , ecc.) printed In che output . A trimmed t test is availablerequired by che atatistical modél. i n che 1979 version. The Levene test for equality of

After Che original data have been recordad , variantes i s in both P3D and M.various descripcive characteristics of che data can tlhen che cases are classified by more [han onebe used co decect gross errors in che obse rv ations , grouping variable or factor, P90 (Multivayin coding che data , In includtng inappropriate Description of Croups ) can be used co compute cellcases , etc. A good place to begin screening Ls to frequencies , means and standard deviations . Groupingcheck for variables can be suppressed Co obcain information

about marginal celta. The program tests fór the- symbola or characters , such as letters vhere equality of cell frequencies and cell means and fornumbers should be (P4D counts sil distinct homogeneity of cell variantes . These tests arecharactera for each column of data , one column at a performed ora all cells or on specified marginals.time ); many programs will not run íf nonnumeric Cell means are plotted four variables per page in asymbols are In che daca usad for analysis - error compacc graphical display scaled by che overall meanmessages are reported , hovever, by all programa vhen and standard deviation . This display is helpful forillegal characters are found. understanding Lnteraccions i n more complex A11OVA- outliers or blunders (P2D can be used to obtain designs.

a smal1 histogram and frequency counts for sildistinct values of each variable)

TransformationsListing che cases by one of che methods described onp. 5 may alzo locate problems in che daca . Alter screening and describing your daca, you

0ucliers can be i dentified by multivariate should be ready co make decisions regardingscreening . For each case P4M princs che Mahalanobis cransformations. The transformed daca can be putdistance squared from che case to che center of all directly on a B; IDP File ready for easy input Lotocases . Vtthin group multivariate outliers can be any other BMDP program . Although all programs canidentified by using P7M , vhich prints che perforen data cransformations , you may peed to useMahalanobis distance squared from che case co che PIS, che multipass transformation pregram,.forcenter of each group. P9R siso prints distance getting che data transformed and ready for furthermeasures helpful for idencifying unusual cases . analyses . P15 can be used when your transformacion

Univariate descriptive statistics are found in requires more chao one pass Chrough che daca.most programs , but especially in P2D and P1D. For -example, from che cumulative percentiles printed inP2D for each distinct value, you can make summary Plots and Histogramastacements such as, "síxty percent of the patientsare in che 50-60 age group , vhile only severa percent Many research workers like to see cheir data inare in their tventies ", etc. A ates and leal graphical form; scacter olots , for example, are ahistogram Ls also available in chis program . good vay to present information concisely and

clearly i n final reports . Scatter plots that takeadvantage of knovn informacion can be designed co

Data in Groups display unusual cases or outliers -- for example, to -show vhether or not an individual ' s systolic blood

In screening , you often peed to examine groups pressure level Ls higher [han his diascolic level. A(straca or subpopulacions ) of che data . Unusual daca scacter ploc of these two variables will show if chevalues that are masked In a total populacion may data coding is mistakenly reversed for some cases. _stand out vhen che data are separated Lato groups or Or in a plot of height versus weighc , a case thatstraca . Some variables are easily coded into groups , has a height of 72 inches and 225 lbs. vill clearlysuch as sex ( males-l, females-2 ). Contínuous stand out if che height i s mispunched as 52 inches.variables can be cacegorized by a grouping variable . A grouping variable can be used in che P6D

27D is especially poverful for examining groups ; scacter ploc program co provide information about a -it princs histograms (side-by-side for each group ) variable noc used as che ploc axes . If age, forand statistics for each group; it siso provides a example , is divided Loto groups -- less [han orchoice ofone -vay or tvo -vay analysis of variance co equal to 15, 16-35, 36 - 55 and over 55 -- che letterscheck group differences. From Chis output you can A. 8, C and D are used to represen [ cases from eachidentify extreme outliers , obtain en idea of che age group . When two ocher measuremencs • for the

8

Page 20: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Data Analysis 2.2

subjects are plotted , the chiidren may appear In a including [hose that require complete data. PAMseparate arca of the ploc, i ndicating that they i nsures thac the resulting correlation matrix isshould be analyzed separately in later analyses . P6D numerically appropriate ( positiva semidefinite) for

can also perform a simple regression analysis for a regression or factor analysis ; P8D allovs you coche data in a seatcer plot. This analysis may choose betveen four methods to compute theindicare vhether or not en analysis of covariance correlations.should be used later . Variables can be plottedagainst time of encry into a study to see ifobse rv ations are independent , or if a drift over Regressiontime is occurríng.

P5D can prínt a histogram for all the data or for A regression analysis studies the relationshipone or more groups , each identifíed by a differenc betveen a dependent variable, y, and one or morelecter . You can specify the scales of the histogram independent variables , x. Th e linear least squaresco produce a histogram suitable Por a final reporc . model vith parameters oc regression coefficients,

Normality can be roughly checked by looking at Bi, can be vrictenhiscograms in P7D or P5D . P5D can also print anormal probability plot that provides a becter y - 30 + 31x1 + ... + Bpxp + eassessment of normality and helps co identifyoutliers . For simple linear regression (x1 is che only

independent variable in che model ), P6D, PiR and P2Rcan be used. If there are several independent

Frequency Tables variables, PIR, P2R or P9R can be used to perforamultiple linear regression analyses.

Cross tabulations are frequently used es a form PIR , PZR and P9R differ in three importantof final reporting to give a picture of the number respetes:of cases i n specified cacegories (orcross -classiEications ). Tables can be formed from - the criterion for including independentdata or from cell frequencies . Tables can also be variables in the multiple linear regressionformed for each level of a third variable ( such as - the ability to repeat the analysis on subgroupsseparately for males and females ). Tvency-chree of the cases and to compare the subgroupsstatistics appropriate for the analysis of - the residual analysis availablecontingency tablas are available in P4F (vhichincludes all the features formerly contained in PNR includes all the specified independentprograms P1F, P2F and P3F). variables in the multiple regression equation. It

P4F can test whether roes are independent of computes a multiple linear regression on all Checolumns using the frequencies in all cells . P4F can daca and on groups or subpopulations . If grouping isalso test the sane hypothesis using any subset of requested, ?IR firsc analyzes all cases combined andthe calla; for example , are roes i ndependent of the then analyzes each group separately . After allcolumns for all cells, excluding the cells on the groups have been analyzed , the regression equecionsdiagonal ? P4F can also identify cells that are tested for equality betveen groups.concríbute heavily to a sígnificant chi-square test P2R computes che multiple linear regression in aof independence. stepwise manner . At each scep i t entera into the

Multivay frequency tablas are formed and analyzed regression equacion the variable thac bese helps toby P4F . A log-Linear model can be fitted to the cell predice y or removes che leas [ helpful variable.frequencies and the fit tested. P4F can be used co Several criterio are available for entering orselect an appropriate model for the daca and to removing variables from the equation (sea P2Restimare the parameters o£ the model . program description ). A stepvise procedure is useful

for idencifying a good set of predictor variables(separating the most important variables from [hose

Missing Values that may not be necessary at all ). and whensufficient preliminary informacion regarding the

All coo often the data recorded are not complete effectiveness of the independent variables is notand some values are míssíng . Th ese missing values available . In practtcal applications the stepviseare usually left blank or coded by a special ende procedure i s often a satisfactory solution.called the "missing value code ". Missing values, and P9R identifies "best" subsets of independentunusually extreme values that appear co be vrong , variables i n terma of a criterion such as R2are excluded from an analysis . adjusted R2 or Mallovs " Cp (described in P9R program

PAM lista cases containing missing values or data description ). It also identifies alternativa goodto be excluded from che analysis , computes the subsets of the independent variables . P9R computespercentage of missing data for each variable, and only a small fraction of all possible regressions toreports special parteros in che data . PAM can also find the numerically bese subsec.estimare values co replace che missing value code All three programa princ and plot residuals and(or excluded values ) based upon the data present in predicted values. The plots are useful in detectingthe case . lack of- linearity, heteroscedasticity (lack of

Most regression and multivariate analyses requíre constan [ v ariante ), unusual outliers , gross errors,complete cases; i.e., no missing or excluded values an unusual subpopulation thac should be separatedin any case . Many of these analyses can begin from a from the analysis , etc. The plocs may also indícatecorrelation or covariance matriz . Both PAM and P8D chal transfornattons of the data are necessary orcan estimare correlations using cases vith some data that an inappropriate model vas chosen.missing; che correlation matrix can then be stored The residual analysis in P9R is che moscin a BMDP Fila and used as input co other programa , extensiva of the three. P9R also allows easy

cross -validacion of che regression model by testing

9

Page 21: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

.2 Data Analysis

it on a subset of the cases excluded from che Analysis of Variance and Covarianceanalysis.

P4R creares new independent variables , called Analysis of variance is used to test forprincipal componente , that are linear combinations differences betveen che means of tvo or more groupsof che original independent variables . These or subpopulations . In a simple oneway analysis ofprincipal componencs are determined in a vay that variance each individual ( or subject ) is classifiedprovides a parsimonious summary of che original i nto one category or group -- for example, in avariables ; a subset of 'che principal components medical problem patiencs could be assigned toexplains mear of the total variance of che original treatment A, 8 or C. The pacients are grouped by cheset of independent variables . The program then type of treatment. The model for Chis one -vay designregresses che dependen[ variable in a stepwise lamanner on che principal componencs ; not all cheprincipal components may be used , but useful Yik - µ + al + eikinformacion i s based on all che variables. Theregression equations at each step are expressed in where a7, a2 and a2 might represen[ che effect ofCeras of che principal componente and che original treatments A. B and C, respectively, on chevariables. dependen [ variable , Yik, a blood pressure reading

The relation between an independent and a for case k in group i. Programe P7D, P9D , PIV anddependent variable may requitre rerms vith higher P2V can be used to test che hypothesispovers. The model for polynomial regression in P5Ris Ho: all ol - 0

y - 0, +gix +52x2 + ... + 1kxk+ e . that there i s no difference betveen treatments.Group sizes may be unequal in ell tour of these

PSR reports polynomials of degree one through a programa . For each dependent variable analyzed, P7Dspecified degree ; chis helps to determine che presenta side-by-side histograma that give snhighest-order equation necessary for an adequate fit excellent visual picture of how che groups differ.of che data. As higher-order tercos are introduced In che medical treatment example aboye , if cheLato che model , che fitted regression cu rv e and che covariate x (age ) also affects che dependentoriginal data can be plotted at each step for a variable (blood pressure ), che one -vay model becomesvisual check on how the fic la proceeding.

Yik u + al + $ ( xik - x) + eik ' -

Nonlinear Regression P1V could be used to examine treatment effects afteradjusting for che linear effecc of age. P1V algo

To fit a model vhere che equation Ls nor linear allows multiple covariates . It prints sn analysis ofin che parameters you can use che nonlinear variance tabla vith F tests for equalicy of alopes,regression programa, P3R and PAR . These are least zero slopes and equalicy of adjusted group mesassquares programe appropriate for a vide variety of (which adjuscs for che effect of che covariate) andproblems that are not vell - represented by equations a number of residual plots.vith linear parameters . Several different functions Several factors (or characteristics ) may be --are available in P3R by simply stating a number , involved i n en analysis of variance model. In aincluding such functions as sums of exponencials evo-vay factorial analysis of variance, Che

individuals in each group are classified by twop2t p,C characteristics , such as sex and treatment. The

pie + p3e , model can be written

ratios of polynomials , a combination of sine and Yijk - u + al +ni

+ (arl ) ij + eijkexponencial. functions, etc. If you van[ a functiondifferent from choza described in che P3R program Here che ai ' s could be treatment effect che 7) 'sdescriptton , you can requesc i t by FORTRAN sex effect and (a7)) 1. a possible interaction bety enstatements in P3R or PAR . In PU you must also sex and treatment . P

L/D can be used to analyze [hese

specífy che function ' s parcial derivacives . data . The accompanying histograms give additionalA special nonlinear model is che logistic informacion.

function . PLR computes che maximum likelihood P2V handles general fixed effects analysis ofescímates of che parameters of variance and Covariance models. This program can

analyze repeated responses , such as che measurementse8x oí a subject ' s blood pressure every day for a veek.

E( ñ ) The repeated responses are called trial factors orrepeated measures factors and need noc be

1 + esx statístically independent . In che blood pressureexample aboye , time could be a seven-level trial

where s is Che sum of che binary ( 0,1) dependent factor ( e.g., a subject ' s blood pressure could be -variable y ( E y - s) and x represents Che recorded every day of che veek ). In P2V che usualindependent variables . The dependent (outcome) analysis of variance factors, such as sex andvariable records events such as success or failure, treatment , are called grouping factors coresponse or no response , etc. The independent distinguish checo from trial factors. The models may(explanatory or covariate ) variables can be have only trial factors , only grouping factors, orcategorical ( e.g.., sex , treatment , hospital ) and both . The groups can contain en unequal number ofcontinuous ( e.g., age, height , blood pressure ). The subjects , but data for each subject must include allprogram generales design variables for che obse rv acions over che trial factor ( a blood pressurecategorical variables and their interactions. reading must be given for each day).

10

Page 22: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Data Analysis 2.2

Mixed models are treated by PSV (vhich requires distance matriz can also be printed in shaded formequal cell sízes ) and by P3V (vhich allows unequal to display pictoríally che clusters.cell sises and covariates ). P4V is a very general The clustering method in P2M is hierarchical. Theprogram that handles multivariate models , including procedure in PKM Ls callad k-means and begins viththose vith repeated measures and covariates '. The user-specified clusters or vith all che data in oneuser may specify cell veights for use in che cluster : at each step one cluster is split finto cvo.definitfon of model componente such as main effects This procedure is useful when you Nave a largeand lover order interactions in factorial models or number of cases or vhen your goal is to divide théspecify unequal ince rvals for orehogonal polynomials cases inca homogeneous subsecs . PKM provides severalin response surface analysis . ways to standardize the data In order to avoid

problems caused by stale differences.The programa discussed aboye look for variables

Nonparametric Statistics to be clustered across all cases or for cases to beclustered ( by similarity ) across all variables.

If your data grossly violate che usual analysis Hovever, your data may include differences betveenof variance normalicy assumptions, you could try tvo cases that do not extend across all che variables,nonparametric tests in P3S -- che Kruskal-Wallis or your variables may not cluster across all cases.one-way analysis of variance test, or che Friedman P3M allows some of che variables ( columna) to betvo-way analysis of variance test . Nonparametric clustered as a subset of che cases ( rows ) and vicetests such as che Mann-Whitney U test , che sigo test versa . Thfs clustering by both cases and variablesand che Wilcoxon stgned rank test can siso be is representad by a data matriz i n che form of acomputed víth P3S . These tests can be used vhen che block diagram ; roes and columna are permuted andresearcher wants to avoid a t test assumption of smaller blocks ( submacrices ) of similar valuesnormality . vithín che larger block are outlined . This gives a

good visual representation of parteros of likevalues In che data matriz and can be used as a

Cluster Analysis multivariate histogram . P3M Ls best suited to treaccategorical variables that Cake on a small number of

Although many research atudies involve values.multivariate obse rvations (many variables obse rv edfor each case ), sometimes little Ls knovn abouc cheínter-relacions betveen variables , betveen cases , or Multivariate Analysisbetween variables and cases . In discussing screeningand data descriptí on , ve emphasized that groups or Cluster analysis is not appropriace forsubpopulations should be examined; however, problems expressing complex functional relationships. Foroften arise when groups are not clearly defined or example , if you are interested In describing the,,han it is diffícult co see if the data are ínter -relacions among your variables , factorstructured . Clustering is a good techníque to use in analysis may be better suited to your needs, andexploratory or early data analysis '.hen you suspecc discríminant analysis provides functions of chethat che data may not be homogeneous and you want to variables that best separata cases i nto predefinedclassify or reduce Che data foco groups . Clusteríng groups.performs a display functton for multivariate datasimilar to graphs or histograms for univarfate data ; Factor analysis . Factor analysis i s useful init provides a multívariate summary -- a description exploracory data analysis . Ir has three generalof characteristics of clusters i nstead of individual objectives : to study the correlations of a largecases. number of variables by clustering che variables finto

Three different types of clustering can be factors , such that variables vithín each factor areperformed by BMDP programs: clusters of variables . híghly correlated; to ínterpret each factor(P1M), clusters of cases ( P24 and PKM ), and clusters according to the variables belonging to it; and toof both cases and variables ( P3M). After deciding summarize many variables by a fev factors. The usualvhich program í s applícable to your problem , other factor analysis model expresses each variable as aquestions must be ansvered : Hoy víll you measure function of factors common to several variables anddistances between objects ( variables in P1M, cases a factor unique to che variable:P2M and PKM )? Hoy vill you use che distances coamalgamare or group che objects Loto clusters ? Hoy zj - a jlf 1 + aj2f2

+ ... + ajmfm+ Uj

vill you display che resulting clusters ? The bestansvers to these questions are atill being vheredeveloped ; investigators have their ovo preferencesas to which distance measure or vhich amalgamation zj - the jth standardized variableprocedure Ls best . You may want to try several m - che number of factors common to all theoptions given in che program descriptions to see variablesvhich one provides the best results for your Uj - che factor unique to variable zjproblem. ají - factor loadings

in both P1M and P2M the.clustering begins by fi - common factorsfinding che closest pair of objects (In P1M,columna , or variables ; in P2M, rocas, or cases ) The number of factors , m, should be small and cheaccording to che distance matriz and combintng them concributions of che unique factors should also beto form a cluster . The algorichm continuas , joining small . The individual factor loadings , a•i. for eachpairs of objects , pairs of clusters , or an object variable should be either very large or very smallvith a cluster, until all che data are in one so each variable ís associated with a mínimum numbercluster . These clustering steps are shovn in the of factors.output cluster diagram , or cree . The correlation or

11

Page 23: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

2 Data Analysis

To che extent thac chis factor model is Parcial correlations and multivariate regression.

appropriate for your data , the objectives stated Parcial correlaciona can be computed in P6R; the

aboye can be aehieved . Variables vith high loadings correlation between each pair oí dependent variables

on a factor tend to be highly correlated vith each is computad after taking out the linear effect ofother , and variables that do not have che same the set of i ndependent variables. For example, if -loading patterns tend to be leas highly correlated . you want to do a factor analysis on severalEach factor i s interpreted accordíng to the variables ( systolic blood pressure, diastolic bloodmagnitudes oí the loadings associated vich it. The pressure , blood chemistry measurements , income,original variables may be replaced by che factors etc.) but vant to remove the linear effect oí tvo _vich little loas of information . Each case receives variables ( age and veight ) from the measurements,a score for each factor ; [ hese factor scores are you can state that the tvo variables ( age andcomputed as: veight ) are independent variables and the test are

dependent variables. The resulting parcialfi .

bilzi+ bi2z2 + ... + bipzp correlation matriz ( of the dependent variables with -

che effects of age and weight removed) can be scoredvhere bi . are the factor score coefficients. Factor as a matriz i n a BMDP File , and can be used as inputscores clan be used in later analyses , replacing che in P4M , the factor analysis program.values oí the original variables . Under certain P6R can be used co regresa a number of dependentcircumstances there few factor • scores are freer from variables on one set of independent variables. Thismeasurement error than Che original variables , and multivariate regression program gives you a separateare therefore more reliable measures . The scores regression equation for each dependent variable,expresa the degree to which each case possesses che squared multiple correlatíon ( RZ) of eachquality or property that the factor describes . The independent variable vich all other independentfactor scores have mean zero and standard deviation variables , R2 oí each dependent variable vith theone. set oí i ndependent variables, and tests oí

There are Tour main steps in factor analysis : significance oí multiple regression.first, che correlation or covariance matrix iscomputed ; second , che factor loadings are estimated Discriminan [ analysis . In discriminan [ analysis,(inicial factor extraction ); third, the factors are the cases or subjects are divided into groups androtaced to obtaín a simple incerpretation (making the analysis is used to find classificationche loadings for each factor either large or small, functions ( linear combinations oí the variables)not in-between ); and fourth , the factor scores are cha[ best characcerize che differences between thecomputed . P4M provides several methods for inicial groups. These functions are also useful forfactor extraction and rotation . You can specify che classifying new cases.methods to be used or P4M vill use preassigned P7M, che stepvise discriminan [ analysis program,options . The resulta can be presented in a variety is used to find che subset oí variables thatof plots . maximizes group differences . Variables are entered

P8M, Boolean Factor Analysis , i s an alternase luto the classification function one at a time untiltechnique when Che variables are binary or che group separation ceases to improve notably (chisdichotomous . is similar to the stepvise regression program, P2R, -

used to find a good subset oí variables forCanonical correlation analvsis . Canonical prediction ). P7M is also used as a multivariate test

correlatíon analysis (P611 ) examines the relationship for group differences ( or multivariate analysis ofbetween two sets of variables , and can be vieved as varíance ); Wilks' lambda (U stacistic ) and the Fan extension of multiple regression analysis or oí approximacion co lambda are printed at each step, o£ -multíple correlation . Multiple regression deals vich che output for testing group differences.one dependent variable , Y, and p independent A geometrical interpretation of discríminantvariables , x33 . The regression problem is co find a anal ysis can be given by plocting each case as alinear combination oí che X variables that has poínt in a space where each variable is a dimensionmaximum correlatíon with Y . In canonical correlation (has an axis). The points are projected onto a planethere is more [han one dependent Y variable -- vhere or hyperplane selecced so the groups are farthestis a set of them . The problem is to find a linear epart , giving a good visual representation oí hoycombination of the X variables chat has maximum distinct che groups are (for tvo groups, the pointscorrelatíon vich a linear combination oí che Y ( cases ) are projected onto a line vhere che groupsvariables . This correlation is called che canonical are farthest apart ). P7M presenta plots that showcorrelation coefficient . Then a second pair oí such a plane . The X axis is che direction where thelinear combinations , vich maximum correlation groups have che maximum spread; che Y axis shows thebecveen chis pair and zero correlations vich che maximum spread of che groups in a direction -firsc pair oí linear combinations i s found. The orthogonal to che X axis - chis is a plot oí thenumber oí pairs oí linear combinations oí che Y and canonical variables.Y sets is equal to che number oí variables in the The canonical variables are rehaced co canonicalsmaller set (X or Y ). The technique can be used to correlation analysis , which finds the lineartest the independence oí two sets of variables , or combinations oí che tvo sets oí variables that areto predict information abouc a hard- co-measure set most highly correlaced . The first set contains theoí variables from a set chal is easier to neasure . variables in che classification function; the secondle can also be used to relate a combination oí set can be vieved as dummy variables used tooutcome measures to a combination oí history or indicate group membership . The value oí che first -baseline measures . The original and canonical canonical variable oí che classification functionvariables can be plocced one against che other in set is plocted on che X axis; che value oí chescatter plots . second on che Y axis. The coefficients for these

12

1

Page 24: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

Data Analysis 2.2

canonical variables appear in the output . The subset regression program , P9R, prices alcernativecoefficients for the second set ( dummy variables ) do functions for each subset vhich may classify thenot appear in the output . The eigenvalues and cases equally as vell.canonical correlaciona for all canonical variablesand che canonical variable atores associaced vith Preferente Pairs . P9M, Linear Scores fromthe firsc and second canonical variables are also Preferente Pairs, i s used co obcain a linearreported . function of one set of variables that reproduces the

Ac each step , P7M uses a one-vay analysis of ordering of cases as established by recordedvariante F statistic (F-to-enter ) to determine vhich preferentes ( stated by expert j udges ) betveenvariable should join the function next . At step selected pairs of cases.zero, the standard unlvariate analysis of variancetest is made for each of che variables . Th e variablefor vhich the means differ most is entered firsc Su rv ival Analysisfinco the classification funetion. Alter step zero,the computed F-to-enter values are conditioned on The techniques described in Chapter 19 areche variables already present in che function . Th is appropriate when outcome measurements represent cheis like an analysis of covariance , vhere the time to occurrence of some event or response (e.g.,previously enterad variables can be vieved as su rv ival time , or time co disease recurrente ). Whatcovariates and the nonentered variables are distinguishes the techníques of chis chapter fromconsidered as dependent variables. ocher statistical methodology is the ability to

At each step after a variable is entered , •the handle censored (incomplete ) data ; that is, thereclassification functions are recompuced íncluding are cases for which the response is not observad butche nevly entered variable . The number of che data ( time in study ) are included in checlassification functions i s equal to che number of analysis . This eould occur in a study of su rv ival,groups. If you have six groups , the values of all where an individual may remain alive at the Glose ofsix functions are computed for each case and che che obse rv ation period or may drop out before chevalues are used to compute the posterior end.probability ; each case i s assigned to che group in PIL estimates che su rv ival (time - to-response)vhich the value of the posterior probability is distribucion of individuals obse rv ed oven varyingmaxímum . In multiple group díscriminanc analysis , time periods . These estimates can be obtainedone function is sometimes stated in che literacure separately for different groups of patients; chefor separating each pair of groups . To get thls equality of che distributions for [hese groups canfunction from P 7M. you subtract che classification be tested by tvo nonparamecric rank tests . Plots offunction coefficients of che che firsc sember from che survival . hazard and relaxed functions can bethose of the second. At each step , F statiscics ( che printed.F matríx ) that test che equality of means betveen P21. provides Cox model survival analysis wheneach pair of groups are given. These F scatistics there are covariates . Covariaces can be selected inare proportional to Hotelling ' s T2 and che a stepwise manner.Mahalanobis D2 and give an indication of vhichgroup means are elosest together and which arefarthest apart . Alter all variables have been Time Series Analysisentered , che program lista the Mahalanobis DZ fromeach case to che cencer of each graup , and the The primary distínguishing feature of time seriesposterior probability of che case assigned to each analysis , as opposed to ocher cypes of scaciscicalgroup. Th ese tvo bits of information present a good analysis , is the assumption chat cases of datapiccure of how well ( or hov poorly ) each case has represenc measurements or observations made atbeen classified . equispaced points along some linear dimension.

Th e discriminant analysis procedure i s successful Usually the underlying linear dimension is time, asif few cases are classified fi nto che vrong groups. in che record of a subject ' s blood pressure takenIf a large percentage of che cases are classified every second over a period of time . Hovever time iscorrectly (if che posterior probability assigns them occasionally replaced by some other dimension. Forto their original group ) you knov thac group example , che thickness of thread from a certaindifferences do exist and that you have selected a nanufacturing process might be measured eachset of variables that exhibit the differences . Th e millimeter along che length of the thread. ThisP7M output presenta chis classification information would constitute a 'time ' series in which lengthIn a Cable of counts indicacing hoy many cases from replaces time as che underlying linear dimension.each original group are assigned to each of che Nevertheless , ve follow convencional terminology andpossible groups. A pseudo - jackknife classification use che vord ' time'.table i s also p r i n t e d : for each case a A basic goal of time series analysis is toclassification funecíon i s computed with che case characterize che way in which che daca vary overomitted from che computations . Th e 'function ís then time . Th e a priori assumption , common in most otherused to classify che omitted case . This resulta i n a cypes of statistical analysis, chat cases areclassification vith less bias. ( A classification stacistically independent , ls here relaxed. We allowfunction can produce opcimistic results vhen it i s that cases may be correlated , assuming that theused to classify ehe same cases that vere used to correlacion betveen cases depends on the timecompute it.) inter al separacing them . In addition , ve allov for

PLR, Stepvise Logistic Regression , provides en che presente of a trend in the daca. Thus che trendalternative to che multivariate normal model of P7M . of increasing commodity prices ovar che Iast decadeUhen there are only tvo groups, che a11. possible mighc be represented by a straíght fine , or by en

13

Page 25: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

2 Data Analysis

exponencial curva . So the estímated trend and identify che groups of frequencies contributing mostautocorrelation function are one possible of che overall varíability of che data . 1n the timecharacterization of a time series . Other domain approach , ve seek a model Erom a family ofcharacterizations and more elaborate models are also parametric time domain models that is simple and yetpossible. captures the variability of che data . BMDP2T uses

The BMDP package includes tvo programs for time the iterative model building procedure of Box andseries analysis : BMDP1T employs che frequency domain Jenkins , consistíng of tentative modelapproach , chile BMDP2T uses the time domain i dentification , parametric estimation, andapproach . Ve may describe the frequency domain diagnostic checking or residual analysis. Once a --approach as representing che data by a superposition satisfactory model has been reached , che usar may

of sinusoidal vaves at different frequencies . A request BWP2T to forecast futura values of che time

central function of BMDPIT is tb enable the user , by series . Both programa have further capabilities

means of printer plots and accompanying printouc, co which are descríbed i n Chapter 20.

14

Page 26: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

STATISTICAL PACKAGE FOR THE SOCIAL SCIENCESSECOND EDITION

NORMAN H. NIEDepartment of Political Science

andNational Opinion Research Center

University Of Chicago

C. HADLAI HULLComputation Center

University of Chicago

JEAN G . JENKINSMcGRAW-HILL National Opinion Research Center

800K COMPANY University of ChicagoNew YorkSt. Louis

San FranciscoAuckland

Dusseldo rf KARIN STEINBRENNERJohannesburg National Opinion Research CenterKuala Lumpur University of ChicagoLondon

MexicoMontreal

New DelhiPanama

Paris DALE H. BENTSáo Paulo Faculty of Business AdministrationSingapore

Sydney and

Tokyo Computing ServicesToronto The University of Alberta

Page 27: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

No wri tt en materials describing the SPSS system may be producedor distributed without the express written permission of thecopyright holders of both the SPSS programs and the SPSS manual,

Library of Congress Cataloging in Publication DataNie, Norman H

SPSS: statistical package for the social sciences.1. SPSS ( Electronic computer system) 2. Electronic data

processing-Social sciences . 1. Title. II. Title: Statistical package forthe social sciences.HA33.N48 1975 029.7 74-16223ISBN 0-07-046531-2

STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

Copyright © 1970, 1975 by McGraw-Hill, Inc. Al¡ rights reserved.Printed in the United States of America. No part of this publicationmay be reproduced, stored in a retrieval system, or transmirted, inany form or by any mearas, electronic, mechanical, photocopying,recording, or otherwise, without the prior written permission of thepublisher.

8 9 0 W C W C 7 9 8

This book was set in Times Roman by Creative Book Services,division of McGregor & Warner, Inc. The editors were Kenneth J.Bowman and Matthew Cahill; the designer was Joseph Gillians; theproduction supervisor was Thomas J. LoPinto. New drawings weredone by J & R Services, Inc.Webcrafter, Inc., was printer and binder.

Page 28: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS

PREFACE xxi

1AN INTRODUCTION TO COMPUTING WITH SPSS 1

1.1 COMPUTERS AND THE PROCESS OF INDUCTIVE SOCIAL RE-SEARCH 2

1.1.1 Statistical Analysis on Computers: Use and Abuse 3

1.2 USING THE STATISTICAL CAPABILITIES OF SPSS 4

1.2.1 A Note on Levels of Measurement 41.2.2 Statistical Procedures in SPSS 6

1.3 AN OVERVIEW OF THE OPERATION OF SPSS 11

1.3.1 Sequencing Calculations 111.3.2 Entering and Processing Data 121.3.3 Subfiles 161.3.4 Missing Data 171.3.5 Recoding Data 171.3.6 Variable Transformations 171.3.7 Sampling, Selecting, and Weighting Data 181.3.8 Aggregating Data 181.3.9 File Modification and Management 181.3.10 Retrieval of Data from the System 191.3.11 Output of Results from the System 19

1.4 SUGGESTIONS FOR THE USE OF THIS TEXT 20

v

Page 29: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

- vi CONTENTS

2ORGANIZATION AND CODING OF DATA FOR INPUTINTO THE SPSS SYSTEM 21

2.1 THE CASE AS THE UNIT OF ANALYSIS 212.2 CODING CONVENTIONS FOR DATA 21

2.2.1 Variables and Variable Types 222.2.2 Coding Missing Values 222.2.3 Organizing Data for Input 23

2.3 SUBFILES: THEIR FUNCTION AND STRUCTURE 25

2.3.1 Using Subfiles 262.3.2 Subfile Structure 26

2.4 RECORDING DATA ON MACHINE-READABLE MEDIA 26- 2.5 LIMITATIONS ON INPUT DATA 27

3SPSS CONTROL CARDS 29

3.1 CONTROL-CARO PREPARATION: GENERAL FORMAT ANDCONVENTIONS 29

3.1.1 Control Field 303.1.2 Specification Field 303.1.3 Summary of General Rules for Control-Card Preparation 32

3.2 NOTATION USED IN PRESENTING CONTROL-CARD FORMATS 33

4DEFINING AN SPSS FILE: THE DATA-DEFINITION CARDS 35

4.1 FILE NAME CARO 364.2 VARIABLE LIST CARO 364.3 SUBFILE LIST CARO 384.4 INPUT MEDIUM CARO 394.5 N OF CASES CARO 404.6 INPUT FORMAT CARO 41

4.6.1 Fixed-Column, Free-Field, or Binary Format 414,6.2 Format List for Data in Fixed-Column Format 424.6.3 Format Elements 424.6.4 VARIABLE LIST and Format List 464.6.5 INPUT FORMAT Card for Data in Binary Format 47

- 4.6.6 INPUT FORMAT Card for Data in FREEFIELD Format 47

4.7 DATA LIST CARO: AN ALTERNATE TO THE VARIABLE LIST ANOINPUT FORMAT CAROS 49

4.7.1 DATA LIST for Data in Fixed-Column BCD Format 504.7.2 DATA LIST for Data in Binary Format 55

4.8 MISSING VALUES CARO 57

4.8.1 Entering Lists of Variables: General Conventions 584.8.2 Specifying the Missing Va!ues 59

4.9 VALUE LABELS CARO 594.10 PRINT FORMATS CARO 614.11 VAR LABELS CARO 62

Page 30: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS vi¡

4.12 REFERENCING ALL VARIABLES SIMULTANEOUSLY:THE KEYWORD ALL 63

4.13 RULES GOVERNING THE ORDER OF DATA-DEFINITION CARDS 63

5CONTROLLING THE CALCULATIONS:THE TASK-DEFINITION CARDS 65

5.1 PROCEDURE CARDS 655.2 OPTIONS CARO 665.3 STATISTICS CARD 665.4 READ INPUT DATA CARD 675.5 RUN SUBFILES CARD 675.6 RAW OUTPUT UNIT CARD 695.7 SUMMARY OF TASK-DEFINITION CAROS 71

6THE RUN CARDS 72

6.1 RUN NAME CARD 726.2 TASK NAME CARO 736,3 FINISH CARO 736.4 PAGESIZE CARO 746.5 PRINT BACK CARD 746.6 COMMENT CARO 756.7 DOCUMENT CARD 766.8 NUMBERED CARO 76

7GENERATING AND PROCESSING SPSS FILES:CARD ORDER AND DECK SETUP 78

7.1 PROCESSING FROM A RAW-INPUT-DATA FILE ON CARDS,TAPE, OR DISK: CARD ORDER AND DECK SETUP 78

7.2 GENERATING AND RETAINING SPSS SYSTEM FILES: THE SAVEFILE CARD 81

7.3 PROCESSING DATA FROM SPSS SYSTEM FILES 83

7.3.1 GET FILE Card 857.3.2 Deck Setup and Card Order for Processing Data from SPSS

System Files 857.3.3 Modifying or Adding to Data-Definition !nformation During

Processing Runs from SPSS System Files 86

8RECODING AND VARIABLE TRANSFORMATION:THE DATA-MODIFICATION CARDS 89

8.1 RECODING VARIABLES: THE RECODE CARD 90

8.1.1 RECODE Specification List 908.1.2 Aids in Recodíng: The Keywords LOWEST, HIGHEST, and

ELSE 918.1.3 Differentiating Between Blanks and Zeros: The Keyword

BLANK 928.1.4 Converting Variables from Aiphanumeríc to Numeric 92

Page 31: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

viii CONTENTS

8.1.5 Temporary versus Permanent Recoding: the 'RECODECard 94

8.1.6 Example Usages and Deck Setup for RECODE andRECODE Cards 94

8.1.7 Limitations on the RECODE card 96

8.2 COMPUTING VARIABLES BY MEANS OF ARITHMETICEXPRESSIONS: THE COMPUTE CARD 96

8.2.1 Constructing Arithmetic Expressions on the COMPUTECard 97

8.2.2 Example Usages of the COMPUTE Card 998.2.3 Temporary Variable Computations : The COMPUTE Card 101

8.3 VARIABLE TRANSFORMATIONS WITH CONDITIONALASSIGNMENTS: THE IF CARD 101

- 8.3.1 Abbreviating the IF Statement: Implied Operands and Re-lational Operators 105

8.3.2 Temporary Variable Transformations : The *IF Card

8.4 LIMITATIONS ON COMPUTE AND IF CAROS 1078.5 CREATING ADDITIVE INDICES WITH THE COUNT CARD 107

8.5.1 COUNT Value List 1088.5.2 General Format of the COUNT Card 1088.5.3 Temporary Counts: The *COUNT Card 1098.5.4 Limitations on the COUNT Card 110

8.6 STARRED [t] AND NONSTARRED VERSIONS OF RECODE,COMPUTE, IF, AND COUNT CARDS: TEMPORARYVERSUS PERMANENT DATA MODIFICATION 110

8.7 ALLOCATING CORE STORAGE SPACE FOR DATAMODIFICATIONS: THE ALLOCATE CARD 111

8.7.1 Calculating the Amount of TRANSPACE Required 1118.7.2 Preparation and Use of the ALLOCATE Card 113

8.8 MISSING -DATA SPECIFICATIONS ANDOTHER DATA-DEFINITION INFORMATIONFOR TRANSFORMED VARIABLES 115

8.8.1 Treatment of Missing Data in Variable Transformations:The ASSIGN MISSING Card 115

8.8.2 An Alternative Treatment of Missing Data in VariableTransformations : The MISSING VALUES Card 119

8.8.3 Initializing Variables Created with IF Statements 1208.8.4 Inserting Other Data-Definition Information for Trans-

formed Variables 121

8.9 REDUCING CONTROL-CARD PREPARATION WITH THE REPEATFACILITY: DO REPEAT AND END REPEAT CARDS 121

8.9.1 Valid Control Cards Between DO REPEAT and END RE-PEAT 123

8.9.2 Limitations on the REPEAT Facility 124

8.10 DECK SETUP AND CARD ORDER FOR DATA-MODIFICATIONCAROS 126

Page 32: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS ix

9DATA-SELECTION CARDS 127

9.1 GENERATING A RANDOM SAMPLE FROM A FILE: SAMPLE AND*SAMPLE CAROS 127

9.2 SELECTING CASES FROM A FILE: SELECT IF AND *SELECT IFCAROS 128

9.3 WEIGHTING CASES IN A FILE: WEIGHT AND *WEIGHT CAROS 1289.4 DECK SETUP AND CARD ORDER FOR DATA-SELECTION CARDS 131

10RETRIEVING DATA AND DESCRIPTIVE INFORMATIONFROM SPSS SYSTEM FILES:THE FILE RETRIEVAL (LIST AND WRITE) CARDS 133

10.1 LIST FILEINFO CARD 13410.2 LIST CASES CARD 13710.3 PRODUCING RAW-OUTPUT-DATA FILES: THE WRITE CASES

CARD 139

10.3.1 Variable List for the WRITE CASES Card 14010.3.2 Format Specification for the WRITE CASES Card 14010.3.3 A Note on Writing Cases from Weighted Files 14310.3.4 Listwise Deletion of Missing Data: WRITE CASES

Option 1 14310.3.5 Example Deck Setup and Card Order for the WRITE

CASES Procedure 144

10.4 WRITE FILEINFO CARO 14510.4.1 Variable List and Data-Definition Keywords for the WRITE

FILEINFO Card 14510.4.2 Output Option Keywords for the WRITE FILEINFO Card 146

_ 10.4.3 Using WRITE FILEINFO Cards to Regenerate a SystemFile 148

10.4.4 Producing a VARIABLE LIST with the WRITE FILEINFOCard for Use on EDIT Runs 148

11ADDING TO, DELETING FROM, AND REARRANGINGDATA IN SPSS SYSTEM FILES:THE FILE-MODIFICATION CARDS 149

11.1 DELETING AND RETAINING VARIABLES: DELETE VARS ANDKEEP VARS CAROS 149

11.2 ADDING VARIABLES TO AN EXISTING SPSS SYSTEM FILE: ADOVARIABLES AND ADD DATA LIST CAROS 151

11.2.1 Adding Variables with an ADD VARIABLES Card 15111.2.2 Adding Variables with an ADD DATA LIST Card 153

11.3 MERGING SPSS SYSTEM FILES: THE MERGE FILES CARD 15511.4 ADDING CASES AND SUBFILES TO EXISTING SPSS SYSTEM

FILES: ADD CASES AND ADD SUBFILES CAROS 157

11.4.1 ADD CASES Card 157

Page 33: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS

11.4.2 ADO SUBFILES Card 15711.4.3 Se tt ing Up Runs Which Add Cases or Subfiles 158

11.5 DELETING SUBFILES WITH THE DELETE SUBFILES CARD 15911.6 DELETING CASES: ANOTHER USE OF THE SELECT IF CARD 15911.7 REORDERING THE SEQUENCE OF VARIABLES IN AN SPSS

SYSTEM FILE: THE REORDER VARS CARD 16011.8 CHANGING THE SEQUENCE OF CASES IN AN SPSS SYSTEM

FILE: THE SORT CASES CARD 162

11.8.1 Handling Missing Values While Sorting Cases 16411.8.2 Sorting Alphanumeric Variables : Don't 16411.8.3 Determining the Number of Cases When Sorting for New

Subfiles 16411.8.4 Deck Location of the SORT CASES Control Card 16411.8.5 Special Operating System Control Statements for Use

with SORT CASES 165

11.9 CREATING OR REPLACING THE SUBFILE STRUCTURE WITH THESUBFILE LIST CARD 165

11.10 SUMMARY OF FILE-MODIFICATION CARDS AND THEIR DECKPLACEMENT 165

12CREATING, RETRIEVING, AND MANIPULATING FILESWITH MORE THAN 500 VARIABLES:GET ARCHIVE, SAVE ARCHIVE,AND LIST ARCHINFO CARDS 167

12.1 ACCESSING INPUT FILES WITH THE GET ARCHIVE CARD 16812.2 PERMANENTLY SAVING AN ARCHIVE FILE WITH THE SAVE

ARCHIVE CARO 16912.3 USING THE ARCHIVE FACILITY: EXAMPLES 171

12.3.1 Creating an Archive File Which Includes New VariablesProduced by Data Transformations 171

12.3.2 Creating an Archive File by Merging Severa) Input Files 17112.3.3 Creating an Archive File Which Includes New Variables

Input Via the ADD VARIABLES Facilíty 172

12.4 SAVING THE ACTIVE VARIABLES ON A STANDARD SPSSSYSTEM FILE: THE SAVE FILE CARD 172

12.5 USING ARCHIVE FILES FOR ROUTINE STATISTICALPROCESSING 173

12.6 RETRIEVING DESCRIPTIVE INFORMATION FROMARCHIVE INPUT FILES: THE LIST ARCHINFO CARD 173

12.7 LIMITATIONS FOR THE ARCHIVE FACILITY 174

13EDITING SPSS CONTROL-CARD DECKS:THE EDIT FACILITY 176

13.1 EDITING DECKS WHICH ACCESS A RAW-INPUT-DATA FILE 17713.2 EDITING DECKS WHICH ACCESS AN SPSS SYSTEM FILE 17713.3 CORE STORAGE SPACE REQUIRED FOR EDIT RUNS 17713.4 PRINTED OUTPUT FROM EDIT RUNS 17813.5 CONTROL-CARD DECK ERRORS THAT THE EDIT FACILITY WILL

NOT DETECT 178

Page 34: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS xi

13.6 EXAMPLE OUTPUT FROM EDIT RUNS 178

14DESCRIPTIVE STATISTICS ANDONE-WAY FREQUENCY DISTRIBUTIONS 181

14.1 A BRIEF INTRODUCTION TO DESCRIPTIVE STATISTICS 18214.2 SUBPROGRAM CONDESCRIPTIVE: DESCRIPTIVE STATISTICS

FOR CONTINUOUS VARIABLES 185

14.2.1 CONDESCRIPTIVE Procedure Card 18614.2.2 Processing Options for Subprogram CONDESCRIPTIVE 18914.2.3 Statistics Available for Subprogram CONDESCRIPTIVE 18914.2.4 Program Limitations for Subprogram CONDESCRIPTIVE 19014.2.5 Examples of the Use of Subprogram CONDESCRIPTIVE 191

14.3 SUBPROGRAM FREQUENCIES: ONE-WAY FREQUENCY DIS-TRIBUTIONS WITH DESCRIPTIVE STATISTICS 194

14.3.1 Selecting Between General and Integer Operating Modes 19414.3.2 FREQUENCIES Procedure Card 19514.3.3 Printed Output from the FREQUENCIES Procedure 19714.3.4 Optíons Available for Subprogram FREQUENCIES 20014.3.5 Statistics Available for Subprogram FREQUENCIES 20114.3.6 Limitations for Subprogram FREQUENCIES 201

15PRODUCING DESCRIPTIVE STATISTICSFOR AGGREGATED DATA FILES:SUBPROGRAM AGGREGATE 203

15.1 SPECIFYING AGGREGATION GROUPS 205

15.1.1 Required Structure of the Input File 205

15.2 OBTAINING THE OUTPUT FILE OF AGGREGATED DESCRIPTIVESTATISTICS 205

15.3 AGGREGATE PROCEDURE CARD 206

15.3.1 GROUPVARS= Variable List 20715.3.2 VARIABLES= List 20715.3.3 AGGSTATS= Keyword List 20815.3.4 Providing for Missing Values with the RMISS=value/ 20915.3.5 Repetitions of the VARIABLES=, AGGSTATS=, and

RMISS= Specifications 21015.3.6 Outputting the Actual Values of the Grouping Variables

with Option 3 211

15.4 CONTENT AND FORMAT OF AGGREGATED OUTPUT DATAFILES 211

15.4.1 Types of Aggregated Output Data Files 21215.4.2 Contents of Each Case on an Aggregated Output File 212

15.5 PRINTED OUTPUT FROM SUBPROGRAM AGGREGATE 21315.6 OPTIONS AVAILABLE FOR SUSPROGRAM AGGREGATE 21315.7 STATISTICS AVAILABLE FOR SUBPROGRAM AGGREGATE 21415.8 PROGRAM LIMITATIONS FOR SUBPROGRAM AGGREGATE 21515.9 EXAMPLE DECK SETUP AND OUTPUT FOR SUBPROGRAM

AGGREGATE 215

Page 35: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xii CONTENTS

16CONTINGENCY TABLES AND RELATEDMEASURES OF ASSOCIATION:SUBPROGRAM CROSSTABS 218

16.1 AN INTRODUCTION TO CROSSTABULATION 218

16.1.1 Crosstabulation Tables 21916.1.2 Summary Statistics for Crosstabulations 222

16.2 SUBPROGRAM CROSSTABS: TWO-WAY TO N-WAY CROSS-TABULATION TABLES AND RELATED STATISTICS 230

16.2.1 Components of the CROSSTABS Procedure Card 23116.2.2 TABLES= List 23116.2.3 VARIABLES= List for Integer Mode 234

16.3 FORMAT OF THE CROSSTABS PROCEDURE CARD 23616.4 PRINTED OUTPUT FROM SUBPROGRAM CROSSTABS 237

16.4.1 A Note on Value Labeis for CROSSTABS 23916.4.2 Including Missing Values Only in Tables 240

16.5 OPTIONS AVAILABLE FOR SUBPROGRAM CROSSTABS: THEOPTIONS CARD 241

16.6 STATISTICS AVAILABLE FOR SUBPROGRAM CROSSTABS: THESTATISTICS CARD 242

16.7 PROGRAM LIMITATIONS FOR SUBPROGRAM CROSSTABS 243

16.7.1 Program Limitations for Subprogram CROSSTABS, Gen-eral Mode 243

16.7.2 Program Limitations for Subprogram CROSSTABS, In-teger Mode 244

16.8 EXAMPLE DECK SETUPS FOR SUSPROGRAM CROSSTABS 245

17DESCRIPTION OF SUBPOPULATIONSAND MEAN DIFFERENCE TESTING:SUBPROGRAMS BREAKDOWN AND T-TEST 249

17.1 SUBPROGRAM BREAKDOWN 249

17.1.1 BREAKDOWN Operating Modes: Integer and General 25017.1.2 TABLES= List 25217.1.3 VARIABLES= List for Integer Mode 25417.1.4 Format of the BREAKDOWN Procedure Card 25517.1.5 Options Available for Subprogram BREAKDOWN: The

OPTIONS Card 25717.1.6 Statistics for Subprogram BREAKDOWN: One-Way

Analysis of Variance and Test of Linearity 25717.1.7 Program Limitations for Subprogram BREAKDOWN 26117.1.8 Example Deck Setup for Subprogram BREAKDOWN 26217.1.9 BREAKDOWN Tables Printed in Crosstabular Form: The

CROSSBREAK Facility 264

17.2 SUBPROGRAM T-TEST: COMPARISON OF SAMPLE MEANS 267

17.2.1 Introduction to the T-TEST of Significance 26717.2.2 The T-TEST Procedure Card 27117.2.3 Options and Statistics for Subprogram T-TEST 27317.2.4 Program Limitations for Subprogram T-TEST 273

_ r-

Page 36: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS xíii

17.2.5 Example Deck Setups and Output for SubprogramT-TEST 274

18BIVARIATE CORRELATION ANALYSIS:PEARSON CORRELATION, RANK-ORDER CORRELATION,AND SCATTER DIAGRAMS 276

18.1 INTRODUCTION TO CORRELATION ANALYSIS 27618.2 SUBPROGRAM PEARSON CORR: PEARSON

PRODUCT-MOMENT CORRELATION COEFFICIENTS 280

18.2.1 PEARSON CORR Procedure Card 28118.2.2 Options Available for Subprogram PEARSON CORR 28318.2.3 Statistics Available for Subprogram PEARSON CORR 28518.2.4 Program Limitations for Subprogram PEARSON CORR 28518.2.5 Sample Deck Setup and Output for Subprogram PEAR-

SON CORR 286

18.3 SUBPROGRAM NONPAR CORR: SPEARMAN ANDIORKENDALL RANK-ORDER CORRELATION COEFFICIENTS 288

1183.1 NONPAR CORR Procedure Card 29018.3.2 Options Available for Subprogram NONPAR CORR 29118.3.3 Statistics Available for Subprogram NONPAR CORR 29118.3.4 Program Limitations for Subprogram NONPAR CORR 29118.3.5 Sample Deck Setup and Output for Subprogram

NONPAR CORR 292

18.4 SUBPROGRAM SCATTERGRAM: SCATTER DIAGRAM OF DATAPOINTS AND SIMPLE REGRESSION 293

18.4.1 SCATTERGRAM Procedure Card 29418.4.2 Options Available for Subprogram SCATTERGRAM 29618.4.3 Statistics Available for Subprogram SCATTERGRAM 29718.4.4 Program Limitations for Subprogram SCATTERGRAM 29718.4.5 Sample Deck Setup and Output for Subprogram SCAT-

TERGRAM 298

19PARTIAL CORRELATION:SUBPROGRAM PARTIAL CORR 301

19.1 INTRODUCTION TO PARTIAL CORRELATION ANALYSIS 30219.2 PARTIAL CORR PROCEDURE CARD 305

19.2.1 Correlation List 30619.2.2 Control List 30619.2.3 Order Value(s) 307

19.3 SPECIAL CONVENTIONS FOR MATRIX INPUT FORSUBPROGRAM PARTIAL CORR 308

19.3.1 Requirements on the Forrn and Format for CorrelationMatrices 308

19.3.2 Methods for Specifying the Order of Variables on InputMatrices 309

19.3.3 Specifying the Order of Variables on Input Matrices bythe Partials List: The Default Method 309

19.3.4 Specifying the Order of Variables on Input Matrices bythe Variable List Card: Option 6 310

Page 37: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xiv CONTENTS

19.3.5 SPSS Control Cards Required for Matrix Input 311

19.4 OPTIONS AVAILABLE FOR SUBPROGRAM PARTIAL CORR 31219.5 STATISTICS AVAILABLE FOR SUBPROGRAM PARTIAL CORR 31519.6 PROGRAM LIMITATIONS FOR SUBPROGRAM PARTIAL CORR 31519.7 EXAMPLE DECK SETUPS FOR SUBPROGRAM PARTIAL CORR 316

20MULTIPLE REGRESSION ANALYSIS:SUBPROGRAM REGRESSION 320

20.1 INTRODUCTION TO MULTIPLE REGRESSION 321

20.1.1 Simple Bivariate Regression 32320.1.2 Extension to Multiple Regression 328

20.2 REGRESSION PROCEDURE CARD 342

20.2.1 VARIABLES List 34320.2.2 REGRESSION Desígn Statement 343

20.3 SUMMARY OF PROCEDURE CARDS 34820.4 SPECIAL CONVENTIONS FOR MATRIX INPUT WITH

SUBPROGRAM REGRESSION 349

20.4.1 Format of Input Correlation Matrices 34920.4.2 Format and Position of Input Means and Standard Devia-

tions 35020.4.3 Methods for Specifying the Number and Order of Vari-

ables on Input Correlation Matrices 35020.4.4 Control Cards Required ,o Enter Matrices 351

20.5 SPECIAL CONVENTIONS FOR HANDLING RESIDUAL OUTPUT 35120.6 OPTIONS AVAILABLE FOR SUBPROGRAM REGRESSION 35220.7 STATISTICS AVAILABLE FOR SUBPROGRAM REGRESSION 35520.8 PROGRAM LIMITATIONS FOR SUBPROGRAM REGRESSION 35620.9 PRINTED OUTPUT FROM SUBPROGRAM REGRESSION 35820.10 EXAMPLE DECK SETUPS AND OUTPUT FOR SUBPROGRAM

REGRESSION 360

21SPECIAL TOPICS IN GENERAL LINEAR MODELS 368

21.1 NONLINEAR RELATIONSHIPS 368

21.1.1 Data Transformation 36921.1.2 Examining Polynomial Trends 37121.1.3 Interaction Terms 372

21.2 REGRESSION WITH DUMMY VARIABLES 373

21.2.1 Dummy Variables: Coding and Interpretation 37421.2.2 Dummy Variable Regression with Two or More Categori-

cal Variables 377

21.3 PATH ANALYSIS AND CAUSAL INTERPRETATION 383

21.3.1 Principals of Path Analysis 38421.3.2 Interpretation of Path Analysis Results 38721.3.3 Statistical In,ference 39221.3.4 Standardized versus Unstandardized Coefficients 394

Page 38: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS xv

22ANALYSIS OF VARIANCE AND COVARIANCE:SUBPROGRAMS ANOVA AND ONEWAY 398

22.1 INTRODUCTION TO ANALYSIS OF VARIANCE ANDCOVARIANCE 399

22.1.1 Variation and Its Decomposition: Basic Ideas 40022.1.2 Factorial Design with Equal Cell Frequency 40122.1.3 Factorial Designs with Unequal Cell Frequency 40522.1.4 Covariance Analysis 40822.1.5 Multiple Classification Analysis 409

22.2 SUBPROGRAM ANOVA 410

22.2.1 ANOVA Procedure Card 41122.2.2 Specifying the Form of the Analysis with the OPTIONS

Card 41322.2.3 Multiple Classification Analysis (MCA) 41622.2.4 Options and Statistics Available for ANOVA 41822.2.5 Special Limitations for ANOVA 41922.2.6 Example Deck Setups and Output for Subprogram

ANOVA 421

22.3 SUBPROGRAM ONEWAY 422

22.3.1 Tests for Trends 42522.3.2 A Priori Contrasts 42522.3.3 A Posteriori Contrasts 42622.3.4 More Complex ONEWAY Deck Setups 42822.3.5 Options Available for Subprogram ONEWAY 42922.3.6 Statistics Available for Subprogram ONEWAY 43022.3.7 Program Limitations for Subprogram ONEWAY 43022.3.8 Example Deck Setup and Output for Subprogram ONE-

WAY 430

23DISCRIMINANT ANALYSIS 434

23.1 INTRODUCTION TO DISCRIMINANT ANALYSIS 435

23.1.1 A Two-Group Example 43623.1.2 A Multigroup Example 439

23.2 ANALYTIC FEATURES OF SUBPROGRAM DISCRIMINANT 441

23.2.1 Determining the Number of Discriminant Functions 44223.2.2 Interpretation of the Discriminant Function Coefficients 44323.2.3 Plots of Discriminant Scores 44423.2.4 Rotation of the Discriminant Function Axes 44423.2.5 Classification of Cases 44523.2.6 Selection Methods: Direct and StepNvise 446

23.3 DISCRIMINANT PROCEDURE CARD 448

23.3.1 GROUPS Specification 44923.3.2 VARIABLES Specification 45023.3.3 Specifying Subanalyses with the ANALYSIS Keyvvord 45023.3.4 METHOD 45223.3.5 TOLERANCE 453

1

Page 39: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xvi CONTENTS

23.3.6 MAXSTEPS 45323.3.7 Setting Mínimum Criteria for the Stepwise Procedure 45323.3.8 Controlling the Number of Discriminant Functions 45423.3.9 Establishing Prior Probabilities for Classification Pur-

poses 45523.3.10 Summary of the DISCRIMINANT Procedure Card Specifi-

cations 456

23.4 OPTIONS AVAILABLE IN SUBPROGRAM DISCRIMINANT 45623.5 STATISTICS AVAILABLE IN SUBPROGRAM DISCRIMINANT 45923.6 SPECIAL CONVENTIONS FOR MATRIX OUTPUT AND INPUT FOR

SUBPROGRAM DISCRIMINANT 46023.7 PROGRAM LIMITATIONS FOR SUBPROGRAM DISCRIMINANT 46123.8 SAMPLE DECK SETUP AND OUTPUT FOR SUBPROGRAMM

DISCRIMf NANT 462

24FACTOR ANALYSIS 468

24.1 INTRODUCTION TO FACTOR ANALYSIS 469

24.1.1 Types of Factor Analysis 46924.1.2 Meaning of Essential Tables and Statistics in Factor

Analysis Output 473

24.2 METHODS OF FACTORING AVAILABLE INSUBPROGRAM FACTOR 478

24.2.1 Principal Factoring without Iteration: PA1 47924.2.2 Principal Factoring with Iteration: PA2 48024.2.3 Remaining Methods of Factoring 481

24.3 METHODS OF ROTATION AVAILABLE INSUBPROGRAM FACTOR 482

24.3.1 Orthogonal Rotation: QUARTIMAX 48424.3.2 Orthogonal Rotation: VARIMAX 48524.3.3 Orthogonat Rotation: EQUIMAX z 8524.3.4 Oblique Rotation: OBLIQUE 48524.3.5 Graphical Presentation of Rotated Orthogonal Factors 486

24.4 BUILDING COMPOSITE INDICES (FACTOR SCORES) FROM THEFACTOR-SCORE COEFFICIENT(OR FACTOR-ESTIMATE) MATRIX 487

24.4.1 Values of Factor Scores Output by Subprogram FACTOR 48924.4.2 Calculation of Factor Scores When Missing Data Are to

Be Replaced by the Mean 489

24.5 FACTOR PROCEDURE CARD 490

24.5.1 VARIABLES= List 49024.5.2 Selection of Factoring Methods by the TYPE= Keyword 49024.5.3 Altering the Diagonal of the Correlation Matrix by Means

of the DIAGONAL Value List 49124.5.4 Controlling the Factoring Process: NFACTORS,

MINEIGEN, ITERATE, and STOPFACT Parameters 49224.5.5 Selecting the Method of Rotation With the ROTATE

Parameter 49424.5.6 Writing Factor Scores on a Raw-Output-Data File: The

FACSCORE Keyword 496

Page 40: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS xvii

24.5.7 Performing Multiple Factor Analyses 496- 24.5.8 Summary of the Format of the FACTOR Procedure

Card 497

24.6 SPECIAL CONVENTIONS FOR MATRIX INPUT AND OUTPUT- FOR SUBPROGRAM FACTOR 499

24.6.1 Format of Input Correlation Matrices 50024.6.2 Methods for Specifying the Number and Order of Vari-

ables on Input Correlation Matrices 50024.6.3 Input of the Factor Matrix 50124.6.4 Control Cards Required to Enter Matrices 50124.6.5 Output of Correlation and Factor Matrices 502

24.7 FACTOR-SCORES OUTPUT FORMAT 502

24.7.1 Factor Scores Produced from Selected Data 50324.7.2 Printed Output Generated For Factor Scores 503

24.8 OPTIONS AVAILABLE FOR SUBPROGRAM FACTOR 50324.9 STATISTICS AVAILABLE FOR SUBPROGRAM FACTOR 50624.10 PROGRAM LIMITATIONS FOR SUBPROGRAM FACTOR 50724.11 EXAMPLE DECK SETUPS FOR SUBPROGRAM FACTOR 508

25CANONICAL CORRELATION ANALYSIS:SUBPROGRAM CANCORR 515

25.1 INTRODUCTION TO CANONICAL CORRELATION ANALYSIS 51525.2 BUILDING COMPOSITE INDICES (CANONICAL VARIATE

SCORES) FROM THE CANONICAL VARIATECOEFFICIENT MATRICES 519

25.3 THE CANCORR PROCEDURE CARD 52025.4 SPECIAL CONVENTIONS FOR MATRIX INPUT AND OUTPUT 52125.5 OUTPUT FROM SUBPROGRAM CANCORR 52225.6 OPTIONS AVAILABLE IN SUSPROGRAM CANCORR 52325.7 STATISTICS AVAILABLE FOR SUBPROGRAM CANCORR 52525.8 LIMITATIONS FOR SUBPROGRAM CANCORR 52525.9 EXAMPLE DECK SETUP AND OUTPUT 526

26SCALOGRAM ANALYSIS:SUBPROGRAM GUTTMAN SCALE 528

26.1 INTRODUCTION TO GUTTMAN SCALE ANALYSIS 529

26.1.1 Evaluating Guttman Scales 53126.1.2 Building Guttman Scales 533

26.2 GUTTMAN SCALE PROCEDURE CARD 53526.3 OPTIONS AVAILABLE FOR SUBPROGRAM GUTTMAN SCALE 536

_ 26.4 STATISTICS AVAILABLE FOR SUBPROGRAM GUTTMAN SCALE 53726.5 LIMITATIONS FOR SUBPROGRAM GUTTMAN SCALE 53726.6 SAMPLE DECK SETUP AND OUTPUT FOR SUSPROGRAM

GUTTiMAN SCALE 538

Page 41: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xviii CONTENTS

APPENDIXES

- ASUMMARY OF SPSS CONTROL CARDS: FUNCTION,STATUS, FORMAT, AND POSITION IN CARD DECK 540

A.1 PRECEDENCE TABLE 540A.2 CONTROL-CARD FORMATS 541

A.2.1 Nonprocedure Control Cards 541A.2.2 Temporary (Starred) Data-Modification and Data-Selection

Cards 555A.2.3 Procedure Cards 555

BCHANGES IN CONTROL-CARD FORMATS:

- HELP FOR OLD FRIENDS 562

CSPSS MAXIVERSION 576

DSPSSG: THE MINIVERSION OF SPSS 578

D.1 GENERAL DESCRIPTION OF SPSSG 578D.2 DATA AND FILE MODIFICATIONS 579D.3 DIFFERENCES IN THE IMPLEMENTATION OF CERTAIN FEA-

TU R E S 580

D.3.1 *SELECT IF 580D.3.2 ADD CASES 580D.3.3 ALLOCATE 581D.3.4 REPEAT Facility 581D.3.5 Archive Files 581D.3.6 FREEFIELD 581

D.4 SPSSG PROGRAM LIMITATIONS 581

D.4.1 General Program Limitations 582D.4.2 Maximum Arguments on a Control Card 582D.4.3 Limitations on Data and File Modifications 582D.4.4 Specific Subprogram Limitations 584

D.5 SYSTEM FILE COMPATIBILITY 584

EJOB CONTROL LANGUAGE FORIBM OS/370 INSTALLATIONS 585

E.1 PURPOSE OF JCL 585E.2 GENERAL RULES FOR PREPARING JCL 586E.3 ELEMENTS OF JCL 587 !

E.3.1 JOB Statement 587E.3.2 EXEC Statement 588E.3.3 DD Statement 588

}

V. I. I IIVr u( IVICVIUIVI L.dIu V i`+

Page 42: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

CONTENTS xix

E.4 BASIC JCL STATEMENTS REQUIRED FOR SPSS 592E.5 JCL STATEMENTS FOR DEFINING THE USER'S FILES 594

E.S.1 Raw-Input-Data File 594E.S.2 Output System File 595E.5.3 Input System File 595E.5.4 Raw-Output- Data Files 596

E.6 JCL STATEMENTS REQUIRED FOR SPECIAL SPSS FEATURES 597

E.6.1 Input Archive Files 598E.6.2 SORT CASES 598E.6.3 Reading OSIRIS Data Sets 600

E.7 EXTENSIONS TO BASIC JCL 600E.8 DD STATEMENT REFERENCE SUMMARY 602

FCDC 6000 AND CYBER 70 VERSION OF SPSS 604

F.1 CONTROL CARDS 604

F.1.1 Job Cards 605F.1.2 SPSS Call Cards 605

F.2 SAMPLE JOBS 607

F.2.1 BCD Input Files 607F.2.2 BCD Output Files 608F.2.3 Control Cards for Output SPSS System Files 609F.2.4 SPSS Input System Files 610

F.3 SCRATCH FILES 611F.4 DIFFERENCES BETWEEN THE IBM 360 AND CDC 6000 VERSIONS 612F.5 MEMORY REQUIREMENTS FOR SPSS-6000 JOBS 612

GUNIVAC 1100 SERIES VERSION OF SPSS 614

G.1 SPSS CARDS UNDER SPSS-1100 614

G.1.1 INPUT MEDIUM Card 614G.1.2 N OF CASES Card 615G.1.3 PAGE SIZE 615G.1.4 RAW OUTPUT UNIT 615

G.2 CONTROL CARDS 615

G.2.1 SPSS Files under EXEC 8 615

G.3 SAMPLE RUNS 616G.4 SUMMARY OF THE FILES USED BY SPSS-1100 617G.5 SPACE CALCULATIONS 618

HXEROX VERSION OF SPSS 619

H.1 JOB CONTROL LANGUAGE FOR XEROX SPSS 620

H.1.1 Modifying Limits Raw Case Size 621H.1.2 Xerox SPSS Data-Control Blocks and Scratch Files 622

Page 43: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

-xx CONTENTS

H.2 DETERMINING MEMORY SPACE REQUIREMENTS FOR XEROXSPSS 622

H.3 DESCRIBING TAPE AND DISC FILES USING A FILE IDENTIFIER 623

H.3.1 Adding Files to a Labelled Tape 625

H.4 XEROX SPSS EXTENSIONS 625

H.4.1 INPUT MEDIUM Card 625H.4.2 SAVE FILE and GET FILE Cards 626H.4.3 RAW OUTPUT UNIT Card 626H.4.4 OUTPUT MEDIUM Card 626H.4.5 Examples 627H.4.6 SORT CASES Card 628H.4.7 MERGE FILES, GET ARCHIVE, and SAVE ARCHIVE Cards 628

_ H.4.8 COMMAND FILE Card 629H.4.9 ALLOCATE Card 630

A PROGRAMMER'S GUIDE TO SPSS 631

1.1 SYSTEM FLOW AND LOGIC 632_ 1.2 SPECIFIC SUBPROGRAM LOGIC 639

1.2.1 Utility Subprograms 6391.2.2 Data-Modification Routines 6421,23 Statistical Routines 643

1.3 INCORPORATION OF A NEW ROUTINE INTO SPSS 645

1.3.1 Service Routines of Interest 6451.3.2 Existing Statistical Routines of Interest 6461.3.3 Installing A New Statistical Procedure 647

JOSIRIS-SPSS INTERFACE 657

J.1 OPERATION OF THE OSIRIS-SPSS INTERFACE 657J.2 OSIRIS VARS CONTROL CARO 658J.3 HANDLING MISSING VALUES 659J.4 SPSS CONTROL-CARD DECKS FOR OSIRIS-SPSS INTERFACE

RUNS 659J.5 EXAMPLE DECK SETUP 660

INDEXES 663

NAME INDEXSUBJECTINDEX

(t�

Page 44: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

PREFACE

Statistical Packagefor ¡he Social Sciences (SPSS), the system of computer programs described

in Chis volume, represents nearly a decade of systems design and programming and documenta-

tion on the parí of che authors and others. When che first edition of chis manual was published in

1970, SPSS was being used at approximately 60 installations. It is now being used at nearly 600installations, including conversions to almost 20 different operating systems and computers.The current version of SPSS has almost double the amount of statistical procedures and

data-manaeement facilities as were documented in the first edition. Needless co say, we are

thankful for the opportunity to continue developing SPSS and to serve an ever-increasingaudience.

SPSS has been a success, much more so than the authors ever expected. We feel that che

major reason for chis success is that SPSS was developed through che close cooperation of three

types of specialists: practicing social science researchers, computer scientists, and statisticians.

At each stage in che development of SPSS we have attempted to satisfy [hese criteria:

1 That che statistical procederes be mathematically and statistically correct

2 That the program design and code be computationally efficient3 That the logic and syntax of the system parallel the way in which social scientists approach

data analysis4 That the system provide statistical procedures and data-management facilities tailored to che

particular needs of empirical social researchers

Without che contribution of experts in each field SPSS could not have effectively satisfied these

goals.Vying for importance in the success of SPSS has been our continuing concern for

documentation. While many of our users are very sophisticated social scientists, chey are not

necessarily sophisticated statisticians and computer specialists, nor should they be expected to

have the time or training necessary to master [hese fields. We feel that SPSS documentation

must be comprehensive enoueh co enable students and researchers alike to use SPSS accurately

xxi

Page 45: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xxii PREFACE

and efficiently. We hope that even someone who has no experience with computers will be able- to run SPSS jobs successfully by using this manual. Furthermore, since the SPSS system is used

in many social science methods courses, we feel it is important that the manual not only explainhow to invoke the statistical procedures but also provide brief discussions of che variousstatistical techniques available. Of course, in attempting to satisfy these various needs chemanual has grown to be a very large book indeed. However, we hope that che organization ofche manual as well as the inclusion of an índex will enable users to find what they needconveniently.

Finally, the availability within SPSS of a wide variety of data-management facilities andstatistical procedures simplifies the process of data analysis. It avoids many difficult, time-consuming, and generally unrewarding tasks involved in using a variety of single-purposecomputer programs, each with its own idiosyncratic control•card syntaxes and input dataformats. The researcher thus spends less time as a data-preparation clerk and more time as asocial scientist analyzing substantive results.

The development of SPSS began in 1965 at Stanford University as a result of che frustrationfelt by severa! of the authors in trying to serve che research and teaching needs of che politicalscience department and the Instirute for Political Studies by a library of single-purpose dataanalysis programs. We and che users suffered the annoyance of having to learn how to operaremany different programs and endured che time-consuming task of transferring data and resultsbetween essentially noncompatible programs. Furthermore, the documentation for many ofthese programs ranged from cryptic to nonexistent. Thac and the fact that che programs werewritten in many different languages made them extremely difficult to maintain. Finally, whilemany statistical analysis programs were available, we were forced to write our own programs toperform even che simplest recoding, data transformation, Pile editing, and other routine house-keeping chores chal are essential in social science data analysis. As a result of our experienceswe began to design an integrated system that would automate che routine tasks of dataprocessing and around which a series of statistical programs could be built. The basic data-modification, file-handling, and data-description facilities were programmed, and as timepermitted statistical analysis procedures were added.

The system developed into SPSS as it was described in che fcrst edition of che manual.Since that time we have followed our plan of incremental development. New facilities havegradually been added and many flaws in the original design have been corrected. SPSS usershave submitted suggestions and even complete procedures which have been incorporated intothe system. The update manuals that we published each year fnally became so urwieldy that wewere compelled to provide chis second edition of che manual so that SPSS would once again befully described in a single volume.

The current version of SPSS still has a number of deficiencies, but we feel that it meets agreat many of che needs of social science data analysis. SPSS has always been considered anopen-ended system, and we have again included a programmer's cuide to SPSS (Appendix 1) inthe hopes chal users who wish to add statistical procedures to che system will do so. Because ofthe design of SPSS, all future programs incorporated into the system can cake completeadvantage of the capabilities for file maintenance and data handling which exist in che package;chis should be an incentive to those contemplating the addition of other statistical programs.

For some time now we have felt that one of che most serious drawbacks of che currentSPSS system lies not in the lack of a wider variety of statistical techniques but in thai it operaresonly as a batch program. We and others have had a great deal of success running SPSS through atext-editing and remoce batch-entry system, enabling che user to prepare and enter all jobs froma remoce terminal and to retrieve and print small jobs directly on che terminal. However, thoughchis adds conveniente, it in no way substituces for a crue conversational statistical analysispackage. Consequently, the SPSS project staff is now developing a conversational version ofSPSS.

The difference between performing analysis in a batch system and a conversational systemcan be compared to the difference between interaccing with another person by leaer and byphone. With a conversational system che researcher is brought finto close contad with che databeing analyzed; results from a statistical procedure are retumed on che researcher's terminal in a

Page 46: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

PREFACE xxiii

few seconds and may be used immediately to guide che next step of the analysis. This facilitatesthe iterative process of investigative research in which ideas are tested on che data, che resultssuggest new ideas and modifications of oíd ideas, che new ideas are then tested, and so forth. Atrain of thought which could cake days (and many trips to a perhaps distant computer cerner) toexplore in a batch system can be explored in a single interactive session. Thus che researcher'sconcentration and interest are focussed on the relationships being investigated. At che same timethat a conversational system facilitares pursuing a single fine of thought, the limited printingspeed on remote terminals discourages the Brand fishing expeditions and che production ofexcessive amounts of output that plague research in a batch environment. Finally, a goodconversational system is somewhat self-teaching; it can guide the user in formulating che correctrequests, and che almost instant feedback that facilitares che use in performing statistical analysisis also invaluable in identifying syntax errors and enabling che user to correct them immediately.In short, we feel that a conversational statistical analysis program could help revolutionizeempirical social research.

Conversational SPSS, in addition to including conversational statistical analysis proce-dures, will feature interactive data- and file-definition capabilities, expanded data-transformation facilities, and ultimately che capacity to process hierarchical and other complexlile structures. Although we are very enthusiastic about the prospecta for conversational dataanalysis, we do not expect it to completely supplant batch processing. While a conversationalsystem is ideal for investigative research, hypothesis testing, and rapid retrieval of a smallnumber of specific statistics, it cannot be used efficiently for lengthy routine jobs andlorprocessing files that contain a very large number of data cases. Thus we expect that researcherswill wish to use both the batch and conversational versions of SPSS. This will be reflected in chedesign of conversational SPSS as well as che evolution of batch SPSS-we are attempting tomake the two systems useful as a team. Conversational SPSS will be capable of reading andproducing batch-SPSS-formar system files, and the control statement syntax for conversationalSPSS will be as compatible as possible with batch-SPSS conventions.

We are al¡ very excited about che future of SPSS, and we thank you, the users, for theopportunity to contirue its development. Your support, enthusiasm, and constructive criticismhave been both impetus and reward to us. We dedicate Chis book to you in che hoye that it willmake data analysis an easier task, and that in some small way, through your research, we arecontributing to the development of che social sciences.

ACKNOWLEDGMENTS

For che past five years che SPSS project has been located at the Nacional Opinion ResearchCenter at che University of Chicago. Throughout chis critical period of our development NORChas provided us with frnancial, organizational, and intellectual support. This support has indeedbeen generous and has been a major factor responsible for our rapid growth and development.We can think of no other organization which would have provided a berter home for us. Inparticular, we are grateful to Norman Bradbum and James Davis, rhe past and present directorsof NORC, and Associate Director Shelby Orrell, without whose backing we might havefloundered in che early ye ars when our financia¡ situation could only be described as precarious.We also wish co thank che University of Chicago Coniputation Center for donating computertime and allowing C. Hadlai Hull che flexibility to continue his work on the SPSS project.

The narre of Patrick Bova, che librarían of NORC, is very familiar to many SPSS users.Mr. Bova has oganized and directs the distribution of che SPSS system and all associated [[literature, and while maintaining our financia! records, had thus far managed to keep us both •�solvent and honest. We are all very grateful for his efforts in what has become a very difficult,complicated, and time-consuming effort. The SPSS project is also very fortunate in che recentaddition of ViAnn Beadle, who has joined che staff as user consuitant. dls. Beadle givescompetent and conscientious attention to whatever questions and problems users may have. Shehas taken an enormous burden off our programming staff, providing us with che time for, amongother things, che completion of this manual.

Page 47: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

xxiv PREFACE

Special thanks must be given to Jae-On Kim. who has contributed greatly to SPSS over aperiod of years; Professor Kim helped design the factor analysis and analysis of varianteroutines, wrote several chapters of chis manual, and regularly consults with us about statisticalmatters. We also wish to thank William R. Klecka, who contributed to che development of thediscriminant function analysis procedure and wrote che associated chapter; William C. Mitchell,who designed che multiple-regression processing algorithms: and Bill Reynolds of che Univer-sity of North Carolina, who designed the SCATTEGRAM procedure. For various improve-ments and corrections we are thankful to David Muxworthy of che University of Edinburgh,David Specht of Iowa State University, Jonathan B. Fry and Michael Klein of Akron Univer-sity, Alfred J. Tuchfarber of che University of Cincinnati, Philip Burns of the University ofIllinois, Cleve Moler of the University of Michigan, Harold W. Gugel of General Motors, andWilliam L. McKeown of Stanford University. Martin L. Levin of Emory University and KisunHan of DUALabs have developed SOS, the SPSS-Override System, which enables SPSS toprocess hierarchical files of che structure found in the Public Use Sample files.t Finally, wethank all those installations and users who have contributed suggestions and procedures. Eventhose which have not been implemented have provided us with ideas and thus have influencedche development of SPSS.

- We are very appreciative of the work of Susan Hull, who prepared all of che more than 500control-card and output examples appearing in this manual, and who implemented the integerversion of the BREAKDOWN procedure. We must add that she performed these tasks as avolunteer, yet was as dedicated and painstaking as any other member of che staff. Another largecontribution to the completion of this manual was made by Karin Donker, who typed andretyped until she nearly went mad, but who nevertheless cheerfully worked evenings andweekends when we had deadlines to meet.

We are indebted to the Vogelback Computing Center at Northwestern University forcontinued support of SPSS and in particular to James Tuccy of that cerner. He not only manasesthe CDC-6000 version of SPSS but also designed and programmed a number of the facilitiesdescribed in chis manual, including the CANCORR. DISCRIMINANT, ONEWAY, andT-TEST procedures. Space prevents�us from individually mentioning each of the other SPSSconversions and the conversion managers, but we are grateful to them, as, we are sure, are theirusers.

We also express our appreciation to those at Stanford University who supported SPSSduring its early development there: the political science department and the Institute of Politica!Studies, Professor Sidney Verba, Director of the Cross-National Program on Political andSocial Change, and Professor Heinz Eulau and Kenneth Prewitt of the City Council ResearchProject. We thank che Stanford Computer Center for donating computer time. We are alsograteful to the many students and researchers both at Stanford and at the University of Chicagowho patiently used test versions of SPSS.

A system like SPSS is, of course, never developed in a vacuum, and we wouid like toacknowledge the contributions made to our general thinking by several previous systems: DataText, developed at Harvard University, and BMD, developed at the University of California atLos Angeles, were especially significant. The factor analysis subprogram was adapted from aprogram developed at che University of Alberca. The Guttman Scale subprogram borrowedheavily from a program originally designed by Professor Ronald Anderson. The output formaland some of che cable statistics used in subprogram CROSSTABS were directly borrowed fromche Data Text system. While we wish to acknowledge our great indebtness to those whodeveloped these programs, we alone are responsible for whatever errors or mistakes are presentin their implementation.

Norman H. Yie

C. Hadlai HutlJean G. Jenkins

Karin SteinbrennerDale H. Sent

'SOS is currently operacional for tBM 3601370 and Univac Series 70 compciers. Fun.Ser information r.,ay be obiained

from Kisun Han oí DUALabs, Suite 900, ¡601 North Kent Street, Arlinuton, Vireinia 22209.

Page 48: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

2 STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

in selecting appropriate statistical procedures. An introduction to che general features andoperation of the SPSS system is presented along with several examples in Sec. 1.3. Theattention of the reader is drawn especially to Sec. 1.4 where suggestions for che use of chis textare given for persons with ;reaten or lesser experience in che use of computers for data analysis.

1.1 COMPUTERE AND THE PROCESS OF INDUCTIVE SOCIALRESEARCH

Whether che intent of the researcher is to construct broad-gauee or middle-range socialtheory or simply to describe social reality, or as in most actual research endeavors, to do a littlebit of each, the intellectual process of inductive social research ideally takes the following form:

1 The researcher beáins with a set of ideas concerning the operation of certain aspects of socialreality. This involves the isolation of variables at che conceptual level and che forrnation ofsome general notions concerning their interrelationships and causal effects upon each other.

2 An empirical data base is generated (or located from existing data files), containing indicatorsof the conceptual variables in which the researcher is interested.

3 The researcher then formulares more concrete hypotheses concerning what pattern of interre-lationships should be found in the empirical indicators if the original ideas abour the operationof social reality were correct.

4 The data are then analyzed using one or more of a variety of statístical techniques in order todetermine whether che expected pattern of relationships can actually be discerned in che data.

Most often it is then discovered that at best che actual patterns in the data only partiallyreflect the original conceptions. There then begins an iterative process in which original concep-tions are altered in lighc of che empirical findings and further analysis is performed to test theseideas. The data suggest new ideas, which in turn, suggest new analyses. This iterative process iscontinued in the hope that the researcher will be able to reach an understanding of interre!ation-ships and cause and effect as they are refflected in the patterns of the data.

The computer has become an indispensable tool in this process of inductive social re-search. First, efficiently desiened computer programs facilitate che movement back and forthbetween the researcher's ideas and the findings from che data, making this process both quickand painless. Second, such programs operating on high-speed computers have yielded anexplosion of statistical capability. This has meant not only che efficient production of traditionaltoois such as crosstabulation Cables, but the availability of complex multivariate statisticaltechniques such as regression analysis. Third. these programs have made it possible ro testsocial theory with data files containine larse numbers of cases and variables, which heretoforewere virtually impossible to handle. v V

The research process is particularly facilitated when the researcher is able to use a unifiedsystem of programs which performs most of the different statistical techniques necessary, andwhich shares a common set of conventions regardine the way in which the user interaccs withthe programs. If well designed, the system permits the user to execute a sequence of tasks with amínimum of manual intervention, data handling, etc. The SPSS system is such a set of relatedprograms for che manipulation and statistical analysis of many types of data, with a particularemphasis on the needs of the social sciences. Subsequently, we will refer to the programs of thesystem as subprograms, or procedures. Once the user has entered che raw data finto thesystem, che computer can be instructed to carry out a variety of related tasks in any sequence checircumstances dictate. It is not necessary for *. he user ro reer,ter the data at any time, since chesystem will store and retrieve the appropriate data when required.

While an attempr has been made to inc!ude in che SPSS system a number of che mostcommonly used procedures in social science data analysis, it is possible co retrieve data from che }system so that it can be used with some other program. Also, SPSS itself can be extended byexperienced computer programmers to inciude procedures which have not alreadc been pro-vided.

Page 49: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

AN INTRODUCTION TO COMPUTING WITH SPSS 3

SPSS provides a set of common conventions for use of its various subprograms. This setof conventions constitutes a simplified language, corresponding closely co the natural languagea social scientist might use to describe the operations to be performed on the data.

1.1.1 STATISTICAL ANALYSIS ON COMPUTERS: USE ANO ABUSE

As we have suegested, statistical systems like SPSS can be important tools in the researchprocess because they provide simple and rapid access to the researcher's data and make availa-ble a wide variety of statistical techniques. However, precisely because of their power they maybe easily abused as shown in the following two ways:

1 Ease of access often ,neans overaccess. Modem computing packages have made it so easy toproduce large amounts of information on a single pass of a data file that even the experiencedresearcher is tempted to go on "grand fishing expeditions,'' substituting the crudest forro ofempiricism for the careful interaction of concepts, hypotheses, and data analysis. With SPSS,for example, the following crosstabulation statement will produce one crosstabulation table:

CROSSrABS TABLES • A BY Z

At the same time literally hundreds of tables may be introduced by the following card:

CAOSSTA85 TAB LES - A TO ¿ BY A TO Z

Here a separate table will be generated for every combination of variables between A and Z;chis might mean hundreds of rabies which, while taking the computer only a few seconds toproduce, would require a lifetime to examine with any degree of tare or thought. Essentially,there are two inherent problems here that significandy impede che development of anymeaningful type of social research. First, there is the problem of overproduction of informa-tion. How does one look at 500 crosstabulation tables? Where does one begin to cut into sucha mountain of information or begin to digest the significante of even a small portion of it?Second, if one generares enough information, one is bound by che laws of probability to havesome statistically significant findings. For example, a 100 x 100 correlation matrix generatedfrom a Pile of totally random numbers produces almost 5,000 unique correlation coefficients.Within this correlation matrix there will likely be 5 large correlation coefficients significant atthe .001 leve!, 50 at the .01 leve!, and 250 at che .05 leve!. The best rule-of-thumb here is torequest only those tables, coefficients, etc., for which you have some theoretical expectationsbased upon the hypotheses in your research desien.

2 Uninformed use of che available statistical techniques. The wide dissemination of statisticalpackages such as SPSS, containing large numbers of complex statistical procedures, have,almost overnight, made these techniques available to che social science community. There islittle doubt that social scientists are using them, and diere is equally little doubt that in manyinstantes statistical techniques are being utilized by both students and researchers who under-stand neither che assumptions of the methods nor their statistical or mathernatical bases. Therecan also be little doubt that this situation leads to some "garbage-in, garbage-out" research.The statistical procedures in SPSS have little ability to distinguish between proper and É..improper applications of the statistical techniques. They are basically blind computationalalgorithms that apply their formulas to whatever data the user enters. For example, if sodirected, subprogram FACTOR will perform a factor analysis on nominal data as long as tt isnumerically coded. However, as is emphasized in che next section, entering nominal vari-ables into such statistical procedures as factor analysis will almost always produce meaning-less results, though this may nor be immediately apparent from looking at che printed output.

The general rule is that a user should never attempt to use a sratistical procedure unless heunderstands both the appropriate procedure for che type of data and also che meaning of thestatistics produced. We do nor mean to say that a researcher should nor use factor analysis if hepersonally cannot invert a matrix or extract an eigenvalue. On che other hand, we are equallyconvinced thac the basic ideas of principal componente and factors as bes( linear combinations of

Page 50: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

4 STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

variables and the general oeometric mechanisms behind factor rotation are an absolute requisite- for (he successful�use of factor analysis in research.

Associated with each statistical procedure described in this manual is a basic introductionto the statistical fundamentals underlving the technique. In general, the more complex thetechnique the longer the introduction. Researchers wishine co use a technique with which ,hev arenot totally familiar should read carefully these introductions, and they should also refer to the basicstatistical texts cited in these chapters.

1.2 USING THE STATISTICAL CAPABILITIES OF SPSS

The statistical techniques used in the social sciences differ no( only in the nature of theresearch questions that they are designed to answer, but also in the nature of the data to whichthey may be applied. The leve) of measurement of each of the variables in die user's data set isthe most basic information that a researcher must have before selecting the statistical techniquesthat will be applied to the data.

1.2.1 A NOTE ON LEVELS OF MEASUREMENT

WVhen data are being collected, the process of assigning a value or score to che observedphenomenon constitutes the process of measurement. The rules defining the assignment of an

appropriate value determine the leve! of measurement. The different levels are distinguished

on the basis of che orderins and distance properties inherent in the measurement rules. Knowl-

edse of these rules and their implications is important to che user of stadstics because each

statistical technique is appropriate for data measured only at certain levels. The computer does

mor know what leve¡ of measurement underlies che numbers it receives, and will processwhatever numbers are fed finto it. Thus, it is up to the usen to determine whether a particular

technique is suitable for his or her data.The traditional classification of levels of measurement was developed by S. S. Stevens

(1946). He identified four levels: nominal, ordinal, intervai, and ratio. This remains the basictypology that every user of statistics should know. Other variations exist, however, and severa¡issues concerning the statistical effect of ignoring levels of measurement are still being debatedby social scientists. Attention will be paid to these matters at the end of this section.

1.2.1.1 Nominal - Leves Measurement

The nominal level of measurement is the "lowest" in Stevens' typology, because it

makes no assumption whatever about the values being assigned to the data. Each value is a

distinct catesory, and the value itself serves merely as a label or name (hence. "nominal" level)for the category. t No assumption of ordering or distances between categories is made. Forinstance, the city where a person was bom is a nominal variable. There is no inherent orderins

amons sities implied by such a variable. Althoush we could order cides according to their size.

density, or degree of air pollution, those are quite different concepts from "place of birth."

When we attach numeric values to nominal categories, we are using numbers merely as symbols

that are easily read by the computer. The properties of the real number system. for example, being

able to add and multiply numbers, etc., cannot be transferred to these numerically coded

categories. Therefore, statistics that assume ordering or meaningful numerical distances hetween

the catezories should not be used.

1.2.1.2 Ordinal- Level Measurement

When it is possible to rank-order all of the catesories according lo some criterion, tren che

ordinal level of measurement has been achieved. For instance, (be classification of social classes

'Keep in mind that any Yatid measurement cheme r_qui:es that the assienment rules are inciuslve ¡¡id ;nutually

exclusive . That is. each possible case can be assiened to ore an<t ovh ore di,tinct vaiue.

Page 51: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

AN INTRODUCTION TO COMPUTING WITH SPSS 5

as workina, middle, and upper is ordered according to status. Each cateaory has a uniqueposition relative to the other categories, that ¡s, it is lower in value than some catecories and

hiaher than others unless, of course, it happens to be the lowest or hiehest categorv. Further-

more, knowing that middle class is hiaher than working class and that upper class is hiaher thanmiddle class automatically tells us that the upper class ís higher in the ordering than the workini

class. However, we do not know how much lower the middle class ¡s, relative to the upperclass. All we know is that it is lower; we do not know the distante. The characteristic ofordering is the sole rathematical property of this level, and the use of numeric values assymbols for category narres does not imply that any other properties of the real number systemcan be used to summarize relationships of an ordinal-leve! variable.

1.2.1.3 Interval-Level Measurement

In addition to ordering, the interval leve! of measurement has the property that thedistantes between the categories are defined in tercos of fixed and equal units. A thermometer,for instante, records temperatures in ternas of degrees, and a single degree implies the sameamount of heat whether the temperatura is at the lower or the upper end of the scale. Thus, thedifference between 30 and 31°F ¡s (he same as the dífference between 80 and 81'F. Theimportant thing to note is that an interval scale does not have un inherently determined zeropoint. In the Fahrenheit and Centigrade systems, zero desrees is determined by an agreed-upondefinition. Neither implies the absence of heat. Consequently, interval-level measurement al-lows us to study differences between things but not their proportionate magnitudes. That ¡s. itwould be incorrect to say that at 80'F twice as much heat is present as at 40'F.

In social research, it is very difficult to find true interval-Icvel measures. Usually, ifdistances between cateeories can be measured by some fixed unit, a natural zero point can alsobe established. Yet, a great many statistics assume no more than an ordinal leve¡ of measure-ment. What must be kept in mind is that statistics developed for one leve! of measurement canalways be used with hiaher-(arel variables, but nor wjrh variables measared al a losarleve!. The median, for example, assumes an ordinal leve¡ of measurement, but ir can be usedlegitimately with interval- or ratio-level scales; it cannot, however. be applied to variablesmeasured at the nominal level.

1.2.1.4 Ratio-Level Measurement

The ratio leve! of measurement has ail of the properties of an interval scale with theadditional property that the zero point is inherently defined by the measurement scheme. Thus,when we measure physical distances, whether we use feet or meters, a zero distante is naturallydefined: It is the absence of any distance between two objects. This property of a fixed and givenzero point means that ratio comparisons can be rnade, as well as distance comparisons. Forexample, it is quite meaningful to say that a 6-foot-tali man is twice as tall as a 3-foot-tal! boy.

Since ratio-level measurements satisfy all the properties of the real number system, (he [numbers employed to describe the cases are more convenient symbols. Any mathematicalmanipulations appropriate for real numbers can also be applied to ratio-level measures. Al-though this leve¡ of measurement is common in social research, very few statistics require al! ofits properties; however, it is important to remember that all statistics requiring variables rneas-ured at the interval leve¡ are also appropriate for use with variables measured at the ratio leve!.

1.2.1.5 The Special Case of Dichotomies

A dichotoniv is a variable with only two possible categories or values. sucli as sex (malaor female). While some dichotornies are based on a natural ordering (passing a course versusfailing it), many have no inherent basis on which either categorv could be judged superior,preferable, larger, etc. Yet, anv dichotomv can be treated as thou_h it viere an interval-levelmeasure and in some cases even a ratio-level variable.

Althouah a rank order may not be inherent in the cate?ory definitions, either arrangcmentof the categories satisfies he mathematical requirements of ordering. (tt does not matter which

Page 52: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

6 STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

end oí a ranking is considered "high" and which is ''low.") The requirement oí a distantemeasure based on equal-sized intervals is also satisfied because there is only one intervalnaturally equal to itself. Consequently, a dichotomy can be treated as either a nominal, ordinal,or interval-level measure, depending upon the research situation.

1.2.1.6 Other Typologies for Leves of Measurement

A simpler scheme than Stevens' is to divide variables finto quanritative and qualitativetypes. Quantitative variables are [hose for which a fixed unir oí measurement is defined-essentially, variables at the interval and ratio levels. These are the variables for which themost powerful and sophisticated techniques have been developed. Qualitative variables, then,are ala others-namely, those at the nominal or ordinal level.

Coombs (1953) has expanded upon Stevens' four-level typology by adding two morelevels. The partially ordered level falis between nominal and ordinal. lt appiies to situationswhere an orderino can be defined between some oí the categories but not over ala oí them. Oígreater interest to social scientists is the ordered metric level. Falling between the ordinaland interval levels, an ordered metric consists of ordered categories where the relativeordering of the intercategory distances is known even thou¢h their absolute magnitude cannot bemeasured. For example, consider the rating oí a person's ability (o read a foreign language as(A) no ability, (B) able to read with the assistance oí a dictionary, and (C) able to read withoutassistance. Although there is no way to ascertain Ihe distantes between A, B, and C, it could beargued that B is closer to C than to A, because a type B person can translate and understandwritten material in that laneuage. Thus, we could rank the distantes between the categories asBC beine the smallest, followed bv AB, with AC as the largest.

Abelson and Tukey (1959) argue that the proper assignment of numeric values to thecateeories oí an ordered metric scale will allow it to be treated as thouóh it viere measured at theinterval leveL Labovitz (1970) goes further by arguin_ that, except for extreme situations,interval statistics can be applied tocr)' ordinal-level variable. He argues, "Although some smallerror may accompany the treatment oí ordinal variables as interval, this is offset by (he use ofmore powerful, more sensitive, better developed, and more clearly interpretable statistics withknown sampling error. - Statistical purista disagree with some or ala oí these suggestions , butmore and more data analysts are following them, especially when the research is exploratory orheuristic in nature. Whatever position the user adopts, it remains his responsibility to select anappropriate statistic and to interpret the resulta in light of che nature oí the data.'

There is no unique method for classifving the different types oí statistical proceduresincluded in SPSS. One distinction, based on levels oí measurement, is between parametric (orquantitative) and nonparametric (or qualitative) statistics. Nonparametric statistical proceduresrequire few assumptions about the distribution or leve¡ of measurement oí the variables and maybe applied to nominal and ordinal data. The parametric procedures, on the other hand, theoreti-cally require more stringent assumptions concerning the distribution of the data (usually anassumption oí normality), and they are desivned primarilv for data at an interval or ratio level ofmeasurement. While (he statistical procedures in SPSS can be catalogued according to (hisrubric (for example, Spearman versus Pearson correlation, n-dimensional crosstabulation versuspartial correlation and multiple regression, Guttman scaling versus factor analysis. etc.), [heseassumptions are so often violated (often with justifiable reasons) during the process of dataanalysis that their utility is Cuestionable.

1.2.2 STATISTICAL PROCEDURES IN SPSS

SPSS contains many oí the most common statistical procedures eraplo%.ed by social

scientists, but it is by no means exhaustive oí the mane useful procedures that llave been

`For in extended arrurnent aauinst strict and btind adherence to retes lin.king specific statistics to paricuiar le<els of

measuze nent, see Labovitz (1972-1. He also argues that the le el of measurement for a concept can often he impro,ed by

reconceptualizins (he way in wñich it is operationalized (measured)

Page 53: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

AN INTRODUCTION TO COMPUTiNG WITH SPSS 7

invented for social research or that have come from other fields to Che social sciences. Thechoice of statistical procedures in SPSS has been determined by our examination of Che amountof use they receive in day-to-day statistical analysis and, of course, by the e.xigencies of timeand resources.

In presenting the statistical procedures contained in SPSS, we will start with those Chal the }researcher often begins with and then proceed through the various types of procedures accordingto increasing leve) of complexity and sophistication. No single research endeavor would nor-mally employ all or even a larse number of these procedures, but it will often be the case that atleast one procedure from each of the groups will be employed at some point during the analysis.

`

1.2.2.1 One-Way Frequency Distributions , Measures of Central Tendency andDispersion

In most types of social science research, the first task of data analysis is to examine (hedistributional characteristics of each of the independent and dependent variables under investí-aation . SPSS contains tuvo statistical procedures for Chis purpose. (1) CONDESCRIPTI VE calcu-lates numerous common measures of central tendency and dispersion for interval-scale variablesthat assume a lame number of values. (2) FREQUENCIES calculates similar types of descrip-tive statistics and generales tabular reports of absolute and elative simple-frequency distrihu-tions for use with variables that assume only a limited number of values. An exaniple of Che typeof variable for which CONDESCRIPTI VE is appropriate would be income measured in dollars,which can assume a continuum of values. FREQUENCIES would be applicable to a measure ofincome when Che information has been grouped , such as, $O-S3,000, S3.001-55,000,55,001-510,000, 510,001+. The latter procedure can also produce descriptive frequency dis-tributions for nominal variables, such as religious affiliation, race , or political party affiliation.

Both CONDESCRIPTIVE and FREQUENCIES can produce statistics such as Che mean,mode, minimum, maximum, standard deviation, variance, skewness, kurtosis, and range, at cheuser's discretion. FREQUENCIES can also be used to produce histogram plots, and allows theuser to select from a variety of tabular formats for distribution Cables. CONDESCRIPTIVE willoptionally punch or write che standardized values of variables,Z scores, for al¡ cases in Che Ci te.These standardized variables can be reentered and merged with the variables in the SPSS file ona subsequent run.

1.2.2.2 Table Displays of Relationships Between Two or More Variables

Alter the researcher understands Che characteristics of each of Che variables, he normally �..begins to investigate sets of relationships. One or more procedures for e.xamining relationshipswill be selec ed depending upon che- characteristics of Che variables and the purposes of Cheresearcher. The researcher may choose correlation analysis or sorne form of table display suchas those discussed in chis section, particularly if Che variables are nominal or ordinal and areclassified loto a limited number of categories.

SPSS procedure CROSSTABS permits the user to produce two-way to rr-way crosstabula-tíons of variables and to, compute a variety of nonparametric statistics based on these tables.CROSSTABS produces a sequence of ovo-way Cables displaying the joint frequency distributionof two variables. The frequency counts can be expressed as a percentage of Che row total,column total , Cable total, or any combination thereof. The statistics available to measure Chedegree of association of the two variables based on Che distribution of frequency counts in Chetable include chi-square, Cramer's V, Kendall's tau B and C, gamma statistics, and Somer's D.For n-way crosstabulations, a sequence of such tuvo-way Cables is produced, ene for eachtwo-dimensional subsection of the n-dimensional table.

Another technique for examining Che relationship between two or more variables in a cableformat is provided by the BREAKDOWN procedure. This procedure, which requires that Chedependent variable be at least ordinal in scale, compiles Che means, standard deviations, andvariances of a criterion or dependent variable for each desired subgroup in a sample or popula-tion. In many respects chis operation is analogous to crosstabulations of Che rape produced bv

Page 54: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

8 STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

CROSSTABS, only in this case, each mean and standard deviation suinmarizes the distr:butionof a complete row or column of a crosstabulation Cable. Also in Chis case. the means, etc., ofeach group within groups are available on a single table. The user may enser up to six variablesinto a single BREAKDOWN table. BREAKDOWN optionaily computes a one-way ANOVAtable including a test for linearity.

1.2.2.3 Bivariate Correlation Analysis and Scatterplots

COrrelariOlt analysis provides Che researcher with a technique for measuring Che linearrelationship between two variables and produces a single summary statistic describine thestrength of Che association; this statistic is known as the c•orrelution cocjjic•ient. SPSS has twoprograms for computing correlations. PEARSON CORR produces zero-order or pruduct-moment correlation coefficients (Pearson's R) that are best suited for normally distributed datawith an interval scale. NONPAR CORR. suitable for ordinal data ,vith a larger number oícategories than would be appropriate for crosstabulation Cables. cnables Che user to computeeither Spearman or Kendall rank-order correlation coefficients. or both. Both PEARSONCORR and NONPAR CORR can produce correlations for selected pairs or lists oí` variables aswell as complete matrices of coefficients. The output from both subprograms provides thecorrelation coefficient, the number of observations upon which Che correlation was based• andChe leve) of statistical significante of the coefficient. In addition, each procedure provides forthe output of correlation matrices that may be used when applying multivariate statisúcaltechniques.

Though bivariate correlation analysis provides a single summary statistic •iescribing Cherelationship between two variables, there are numerous instantes when the researcher may wishlo examine such a relationship in grcater detall. Subprograrn SCATTERGR1 provides thiscapability by producing a scatterplot diagram of the relationship between tuco variables. Thetotal correlational pattern may thus be visually inspected. In addition to the plot itself. ChePearson correlation, and che standard error of estimase, the regression intercept and siope arealso available at the user's request.

1.2.2.4 Partial Correlation

Partial correlarion provides a single measure of association (Che partial-correlationcoefficient) describine the linear relationship between tuvo variables .chile adjusting or control-ling for Che effects of one or more additional variables. In Chis respect, partial correlation isanalogous to n-dimensional crosstabulation for continuous variables. Fi-si- to nth-orderpart ial-correlation coefficients can be obtained for anv set of variables with the PARTIALCORR procedure. This program can operate on raw data or from matrices of simple corelatíoncoefficients, such as may be produced by a previous run of PEARSON CORR or NONPARCORR.

Up to five orders of partials can he sinwltaneously computed for any set of variables, andthe user has total control over the orders and Che partials to be computed. Output from chisprocedure includes the partial-correlation coefficients, and Che level of statistical significanteand degrees of freedom for each partial. The zero-order corelations, means. and standarddeviations of the variables may also be obtained. The user may alzo opdonally request theoutput of zero-order correlation matrices for further computation.

1.2.2.5 Multiple Correlation and Regression

.Llultiple regression is an estension of Che bivariate correlation coefficient te multisariare [[

analysis. \tultiple regression allows Che researcher ro studv the linear relationship between a set

of independent variables and a dependent variable while taking roto iccount Che intterrelation-ships among the independen! variables. The basic _oat of ntultiple regression is to produce alinear combination of independent variables which wilI correlate as hiuhiv as possible with Chedependent variable. This linear combination can then he used co ' predtct' ,alues of Che

Page 55: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

AN INTRODUCTION TO COMPUTING WITH SPSS 9

dependent variable, and the importante of each of the independent variables in that predicrioncan be assessed.

A variety of multiple-regression calculations can be accomplished with the use of theREGRESSION procedure. This subprogram, like PARTIAL CORR, can operate either on rawdata or a matrix of correlation coefficients. The user can perform the reoression upon a fixednumber of variables or, using a forward-selection stepwise technique, a!low che variables to beintroduced roto che computation sequentially depending upon their explanatory power. RE-GRESSION also allows the user to perform a regression procedure midway between these twoextremes; he can allow the program to choose the order of introduction of the variables from acertain set, then force certain other variables finto the calculation, then proceed stepwise for aperiod, and so forth. This flexibility, together with the ability of SPSS to transform variables,allows the user to handle most polynomial and dummy-variable multiple-regression applicationswith relative case. Output from the program includes both the standardized and nonstandardizedregression coefficients, their standard error, and the significance leve) of the coefficients.Multiple r, r2, and che siantficance of che regressiun equation are also computed at each srage.The user can also obtainwritten or punched zero-order correlation matrices.

Subprogram REGRESSION also permits the user to write or punch-out a fui] set ofresiduals for each individual case in the file for any set of regression equations. The residual can

_ then, in any subsequent run, be entered finto SPSS as a new variable or group of variables in theanalysis.

1.2.2.6 One-Way to n-Way Analysis of Variante and Covariance

Anal vais of variance is a statistical technique that assesses the effects of one or morecategorical independent variables (factors), measured at any level upon a continuous dependen[variable that is usually assumed to be measured at an interval level. Conceptually, che casesare divided into cate._ories based on their values for each of the independent variables, and thedifferences betweenthe means of these categories on the dependent variable are tested forstatistical significante. The relative effect upon the dependent variables of each of the indepen-dent variables, their combined effects and interactions, may be assessed.

Analysis of variance is in many ways similar to multiple regression and, in fact. the SPSSn-way ANOVA procedure is based on the least-squares general-linear hypothesis approach.Analysis of variance differs from regression insofar as it relaxes the restrictions on levels ofmeasurement of independent variables and provides a convenient way for examining the interac-tion effects of specific combinations of independent variables. In analysis of covariance. the setof independent variables includes both categorical and continuous variables; che continuousvariables (coi aviares) are assumed to be linearly related to the dependen[ variable.

SPSS subprogram ANOVA performs n-way analysis of variance with up ro five factorsand will adjust for up to five covariates. It can handle factorial desions that are unbalanced andcontain some empty cells. In addition, the ANOVA subprogram can present the results ofanalyses of variance and covariance in multiple classification analysis (.kICA) forma[. In addi-tion to the general n-way analysis of variance procedure (ANOVA), SPSS contains the verydetailed one-way analysis of variance subprogram ONEWAY. ONEWAY includes many spe-cial opcional features including tests for trends across categories of the independent variable,user -specified a priori contrasta, a posteriori contrasts, and homogeneity of variantes.

1.2.2.7 Discriminant Analysis

With discriminant nnnh,sis a researcher calculares che effects of a collection of interval-level independent variables on a nominal dependent variable (classification). Linear combina-tions of independent variables that best distinguish between cases in che categories of chedependent variable are found.

SPSS subprogram DISCRI'vIINANT calculates and prints discriminan(-function coeffi-cients and classification-function coefficients. Al! independent variables may be entered finto chediscriminant functions, or, if the user chooses, DISCRIMINANT will operate in a stepwise

Ñ s

Page 56: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

10 STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES

mode, entering variables in the order of their explanatory power- The user may control both thenumber of discriminant functions generated and the number of variables entered.

Discriminant scores, the probability of membership in each category of the dependentvariable, and the predicted category may be calculated and printed for each case in the Pile. Thediscriminant scores and the predicted category for each case may be punched or written on anoutput file and may be reentered into SPSS on a subsequent run for further analysis.

1.2.2.8 Gu ttman Scaling

All the statistical procedures previously discussed, with the exception of those used toexamine the characteristics of individual variables, represent different tnethods for examining,explaining, and predicting the relationship between one or more independent variables and adependent variable. In this section and the two sections following, we discuss procedurescontained in SPSS for locatina underiying continuums or variable sets from a larger groups ofvariables.

Gurtman-scale anal vais is a means of analyzino the underlying operating characteristicsof three of more items in order to determine if their interrelationships mees two special proper-ties that define an acceptable Guttman scale-unidimensionality and cumulativeness.

In the SPSS GUTTMAN SCALE procedure the scales are computed by the Goodenoughtechnique. Each item to be included in a scale may have up to three cutting points, and on anindividual scale the ítem is computed for all possible combinations of cutting points specified.The order of items may be automatically determined by the subprogram according to theproportion of the respondents who '`fail" or "reject" items. Aiternatively, the user maypersonally fix the order of items.

In addition to the basic tahle giving the frequencies, errors, and scale types, the user mayrequest a number of statistics that will aíd him in evaluating the scales. Included in the availablestatistics are: (1) the coefficient of reproducibility, (2) the mínimum marginal reproducibility,(3) the percent improvement, and (4) the coefficient of scalability. All these statistics help theuser to determine the quality of the scale. Interitem correlations and part-whole correlations maybe requested also.

1.2.2.9 Factor Analysis

Factor analysis is a much more óeneralized procedure for locating and defining dimen-sional space among a relatively large group of variables. Because of the generality of factoranalysis, it is difficult co present a capsule description of its funciions and appiications. Themajor use of factor analysis by social scientists is to locate a smaller number of valid dimen-sions, clusters, or factors contained in a larger set of independent items or variables. Andviewed from the other side, factor analysis can help determine the degree to which a givenvariable or several variables are part of a common underlying phenomenon.

The SPSS FACTOR procedure can begin with either raw data, a correlation matriz, or afactor matriz. The methods of factoring available are principal-component analysis, alphafactoring, principal-axis factoring, Rao's canonical fac:oring, and image factorin_. The factor-ing procedure can be controlled by specifying the number of iterations to be performed, ifapplicable, the number of factors co be e.xtracted, if applicable, or the mínimum value of aneigenvalue for which a factor will be extracted. Following the factor-extraction phase, rotationsmay be performed- The types of rotations that may be used are varimax, equimax, quartimax.and the direct oblimin method of oblique rotation-

Factor-seale scores for each observation in the lile can aiso be produced, and these scales(which can be written or punched on an output medium of (he user's choice) can be added asnew variables to the user's file on subsequent runs.

1.2.2.10 Canonical Correlation

Multiple regression is a technique for examining the relationships among several inde-pendent variables and a single dependent variable. Factor analysis is a technique of data

Page 57: A P P E N D I X Binfo.igme.es/SidPDF/035000/001/Apendice B/35001_0004.pdf · 2007. 7. 27. · A P P E N D I X B INTRODUCTION TO SPSS AND BMDP STATISTICAL PROGRAMS The complete programs

AN INTRODUCTION TO COMPUTING WITH SPSS 11

reduction that locates fewer underlying dimensions ( hi_her-order variables ) out of a largor poolof variables in which no distinction has been ;nade bet veen independent and dependent vari-ables . Canonical correlation is in some respects a combinarion of rhe two alternare multivariatetechniques. It contains data reduction capabilities similar to factor analysis , but, having requiredthe user to divide the variables into two sets, also assesses rhe relationship between the two setsof factors ( called canonical variares).

In chis way , the researcher is able to conveniently simplify and analvze the re!ationshipbetween a lar _ e number of independent variables and a larse number of dependent variables.More precisely, canonical correlation analysis Cakes as jis basic input two seis of variables, eachof which can be given theoretical meaning as a set , and extracts linear combinations of (hevari ables within each set; each linear combination maximally coi-relates with a correspondinglinear combination from che other set. These linear combinatinns ; irc che canonical variates andcome in associated pairs . Thus , che hi_her - order dimensions are ercared no( on che basis ofaccounting for the maximal v ariance within one set of variables ( as in factor analysis), but onche basis of accounting for a maxiinum amount of che relationship between the tw•o seis ofvariables.

Input to che SPSS CANCORR procedure can be cither raw mira or a correlation matrix.The user may specify che number of pairs of canonical variates to be extracted and the signifi-canee leve ) required for extraction . The procedure automatically outputs che canonical correla-tions, along with tesis of their statistical significance , and che cocfficients of che canonicalvariates . CANCORR will optionally punch or write che values of tito canonical variates for allcases in che fiie. These variates can be reentered ¡neo SPSS as new variables en a subsequentrun.

We have described che principal statistical procedures available within che SPSS system.It is important to realize , however . that these procedures can he exceuted in any sequence, orrepetitively in the course of a single run or session with che contputer. Thus the user may clec[ toperform some crosstabulations . do a multiple regression , and rhen do sume correlations uponthe same file of data in a single run . Also, the procedures described share the generalcapabilities of SPSS for file handling, variable manipulation, and so forth . so that they consti-tute a sequence of steps availahle te) rhe user in any order that malees :cose in (he context of cheproblem . In Sec . 1.3 we discuss some of rhe general capabilities of SPSS that are available inconjunction with any statistical procedure rhe user may specify.