Creating Summary Data Sets
Ron Cody, Ed.D.
Robert Wood Johnson Medical School
Test data set (CLINIC)
SUBJECT  GENDER  AGE_GROUP  BLOOD_TYPE  HR  SBP  DBP
1        M       1          A           80  130  80
2        M       1          B           68  128  70
3        M       2          O           .   120  72
4        M       1          A           48  140  86
5        F       2          A           56  160  94
6        F       1          B           60  109  64
7        F       2          O           82  118  70
8        F       2          O           64  .    76
9        F       1          A           56  .    88
10       F       1          B           88  188  110
11       M       1          B           64  120  80
12       M       2          B           62  120  76
PROC MEANS vs. PROC SUMMARY
PROC MEANS DATA=data_set_name NOPRINT;
is equivalent to
PROC SUMMARY DATA=data_set_name;
Creating a SUMMARY Data Set Containing MEANS
PROC MEANS DATA=CLINIC NOPRINT;
/****************************************
Equivalent to PROC SUMMARY DATA=CLINIC;
*****************************************/
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
Listing of data set OUT1
Obs  GENDER  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
1            0       12      66.1818  133.300  80.5000
2    F       1       6       67.6667  143.750  83.6667
3    M       1       6       64.4000  126.333  77.3333
Using a BY statement Instead of a CLASS Statement
PROC SORT DATA=CLINIC;
   BY GENDER;
RUN;
PROC MEANS DATA=CLINIC NOPRINT;
   BY GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
Listing of data set OUT1
Obs  GENDER  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
1    F       0       6       67.6667  143.750  83.6667
2    M       0       6       64.4000  126.333  77.3333
Creating a SUMMARY Data Set Containing MEANS Broken Down by GENDER and AGE_GROUP
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
GENDER  AGE_GROUP  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
        .          0       12      66.1818  133.300  80.5000
        1          1       7       66.2857  135.833  82.5714
        2          1       5       66.0000  129.500  77.6000
F       .          2       6       67.6667  143.750  83.6667
M       .          2       6       64.4000  126.333  77.3333
F       1          3       3       68.0000  148.500  87.3333
F       2          3       3       67.3333  139.000  80.0000
M       1          3       4       65.0000  129.500  79.0000
M       2          3       2       62.0000  120.000  74.0000
Explaining the _TYPE_ Variable
CLASS GENDER AGE_GROUP;
Class Variables      Representation
GENDER  AGE_GROUP    Binary  Decimal
0       0            00      0
0       1            01      1
1       0            10      2
1       1            11      3
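One way to read this table: each class variable gets one binary digit, with the leftmost digit belonging to the first variable in the CLASS statement, and a 1 means that variable takes part in the breakdown. The sketch below is a minimal illustration, assuming OUT1 is built from CLINIC with both class variables and without the CHARTYPE option, so that _TYPE_ is numeric. It selects only the rows broken down by GENDER alone.

```sas
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

PROC PRINT DATA=OUT1 NOOBS;
   /* Binary 10 = decimal 2: GENDER used, AGE_GROUP not used */
   WHERE _TYPE_ = 2;
RUN;
```

Changing the comparison to _TYPE_ = 1 would select the AGE_GROUP-only rows, and _TYPE_ = 3 the rows for each GENDER by AGE_GROUP combination.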
Demonstrating the NWAY Option
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
GENDER  AGE_GROUP  _TYPE_  _FREQ_  M_HR     M_SBP  M_DBP
F       1          3       3       68.0000  148.5  87.3333
F       2          3       3       67.3333  139.0  80.0000
M       1          3       4       65.0000  129.5  79.0000
M       2          3       2       62.0000  120.0  74.0000
Outputting More than One Statistic
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   =M_HR M_SBP M_DBP
                   N      =N_HR N_SBP N_DBP
                   MAX    =MAX_HR MAX_SBP MAX_DBP
                   MEDIAN =MED_HR MED_SBP MED_DBP;
RUN;
GENDER  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP    N_HR  N_SBP
        0       12      66.1818  133.300  80.5000  11    10
F       1       6       67.6667  143.750  83.6667  6     4
M       1       6       64.4000  126.333  77.3333  5     6

N_DBP  MAX_HR  MAX_SBP  MAX_DBP  MED_HR  MED_SBP  MED_DBP
12     88      188      110      64      124      78
6      88      188      110      62      139      82
6      80      140      86       64      124      78
Partial List of Some Available Statistics
Keyword  Description
________________________________
MEAN     Mean
N        Number of non-missing values
NMISS    Number of missing values
MIN      Smallest non-missing value
MAX      Largest value
MEDIAN   Median
RANGE    Range - difference between the minimum and maximum values
Q1       25th percentile
Q3       75th percentile
QRANGE   Interquartile range (difference between 25th and 75th percentile)
STD      Standard deviation
STDERR   Standard error of the mean
UCLM     Upper bound of the 95% confidence interval for the mean
LCLM     Lower bound of the 95% confidence interval for the mean
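As a rough sketch of how several of these keywords can be combined in one OUTPUT statement, the step below summarizes SBP by GENDER. The data set name SBP_STATS and the output variable names are made up for illustration; the keywords themselves come from the list above.

```sas
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER;
   VAR SBP;
   /* SBP_STATS and the variable names after each = are illustrative */
   OUTPUT OUT=SBP_STATS(DROP=_TYPE_ _FREQ_)
          N     =N_SBP
          NMISS =NMISS_SBP
          MIN   =MIN_SBP
          MAX   =MAX_SBP
          RANGE =RANGE_SBP
          STD   =STD_SBP
          LCLM  =LOWER_SBP
          UCLM  =UPPER_SBP;
RUN;
```

With the CLINIC data shown earlier, SBP is missing for two female subjects, so the F row should show N_SBP = 4 and NMISS_SBP = 2, and the M row N_SBP = 6 and NMISS_SBP = 0.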
Demonstrating the AUTONAME OUTPUT option
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   =
                   N      =
                   MAX    =
                   MEDIAN = / AUTONAME;
RUN;
GENDER  _TYPE_  _FREQ_  HR_Mean  SBP_Mean  DBP_Mean  HR_N  SBP_N
        0       12      66.1818  133.300   80.5000   11    10
F       1       6       67.6667  143.750   83.6667   6     4
M       1       6       64.4000  126.333   77.3333   5     6

DBP_N  HR_Max  SBP_Max  DBP_Max  HR_Median  SBP_Median  DBP_Median
12     88      188      110      64         124         78
6      88      188      110      62         139         82
6      80      140      86       64         124         78
Another Way of Naming Output Variables
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=;
RUN;
Listing of Data Set OUT1
GENDER  AGE_GROUP  _TYPE_  _FREQ_  HR       SBP    DBP
F       1          3       3       68.0000  148.5  87.3333
F       2          3       3       67.3333  139.0  80.0000
M       1          3       4       65.0000  129.5  79.0000
M       2          3       2       62.0000  120.0  74.0000
Dropping Unneeded Variables in the Output Dataset
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1(DROP= _:) MEAN=M_HR M_SBP M_DBP;
RUN;
Listing of Data Set OUT1
GENDER  AGE_GROUP  M_HR     M_SBP  M_DBP
F       1          68.0000  148.5  87.3333
F       2          67.3333  139.0  80.0000
M       1          65.0000  129.5  79.0000
M       2          62.0000  120.0  74.0000
Demonstrating the CHARTYPE Procedure Option
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
Demonstrating CHARTYPE Option
GENDER  AGE_GROUP  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
        .          00      12      66.1818  133.300  80.5000
        1          01      7       66.2857  135.833  82.5714
        2          01      5       66.0000  129.500  77.6000
F       .          10      6       67.6667  143.750  83.6667
M       .          10      6       64.4000  126.333  77.3333
F       1          11      3       68.0000  148.500  87.3333
F       2          11      3       67.3333  139.000  80.0000
M       1          11      4       65.0000  129.500  79.0000
M       2          11      2       62.0000  120.000  74.0000
Demonstrating the CHARTYPE Procedure Option
PROC PRINT DATA=OUT1 NOOBS;
   TITLE "Demonstrating CHARTYPE Option";
   WHERE _TYPE_ EQ "10";
RUN;
Demonstrating CHARTYPE Option
GENDER  AGE_GROUP  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
F       .          10      6       67.6667  143.750  83.6667
M       .          10      6       64.4000  126.333  77.3333
Another Way to Name Variables (instead of using a VAR statement)
PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   ***VAR STATEMENT OPTIONAL;
   OUTPUT OUT=OUT1 MEAN(HR)        =M_HR
                   N(HR SBP DBP)   =N_HR N_SBP N_DBP
                   MAX(SBP)        =MAX_SBP
                   MEDIAN(SBP DBP) =MED_SBP MED_DBP;
RUN;
GENDER  _TYPE_  _FREQ_  M_HR     N_HR  N_SBP  N_DBP  MAX_SBP  MED_SBP  MED_DBP
        0       12      66.1818  11    10     12     188      124      78
F       1       6       67.6667  6     4      6      188      139      82
M       1       6       64.4000  5     6      6      140      124      78
Multi-way Breakdowns Using a TYPES Statement
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES GENDER AGE_GROUP*GENDER BLOOD_TYPE*GENDER;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;
GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
F       .                      100     6       67.6667  143.750  83.6667
M       .                      100     6       64.4000  126.333  77.3333
F       .          A           101     2       56.0000  160.000  91.0000
F       .          B           101     2       74.0000  148.500  87.0000
F       .          O           101     2       73.0000  118.000  73.0000
M       .          A           101     2       64.0000  135.000  83.0000
M       .          B           101     3       64.6667  122.667  75.3333
M       .          O           101     1       .        120.000  72.0000
F       1                      110     3       68.0000  148.500  87.3333
F       2                      110     3       67.3333  139.000  80.0000
M       1                      110     4       65.0000  129.500  79.0000
M       2                      110     2       62.0000  120.000  74.0000
Using the _TYPE_ Values to Create Multiple Data Sets
DATA GENDER AGE_BY_GENDER BLOOD_BY_GENDER;
   SET OUT1;
   IF _TYPE_ = "100" THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = "110" THEN OUTPUT AGE_BY_GENDER;
   ELSE IF _TYPE_ = "101" THEN OUTPUT BLOOD_BY_GENDER;
RUN;
Listing of Data Set GENDER
GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_  M_HR     M_SBP    M_DBP
F       .                      100     6       67.6667  143.750  83.6667
M       .                      100     6       64.4000  126.333  77.3333
Listing of Data Set AGE_BY_GENDER
GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_  M_HR     M_SBP  M_DBP
F       1                      110     3       68.0000  148.5  87.3333
F       2                      110     3       67.3333  139.0  80.0000
M       1                      110     4       65.0000  129.5  79.0000
M       2                      110     2       62.0000  120.0  74.0000
Examples of TYPES Statements
TYPES A A*C D*C;
TYPES A*(B C D);
TYPES () A A*C*D;
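In these statements A, B, C, and D appear to stand for variables named in the CLASS statement. A parenthesized list is shorthand, so A*(B C D) requests A*B, A*C, and A*D, and the empty parentheses () request the overall summary with no class variables. The sketch below applies the same forms to the CLINIC data; the output data set name OUT2 is chosen only for illustration.

```sas
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   /* ()                            : overall summary           */
   /* GENDER                        : by GENDER alone           */
   /* GENDER*(AGE_GROUP BLOOD_TYPE) : GENDER*AGE_GROUP and      */
   /*                                 GENDER*BLOOD_TYPE         */
   TYPES () GENDER GENDER*(AGE_GROUP BLOOD_TYPE);
   OUTPUT OUT=OUT2 MEAN=M_HR M_SBP M_DBP;   /* OUT2 is an illustrative name */
RUN;
```

With CHARTYPE in effect, the resulting _TYPE_ values should be "000", "100", "110", and "101".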
Using PROC FREQ to Count Frequencies
PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER;
RUN;
Listing of Data Set NUMBER
AGE_GROUP  COUNT  PERCENT
1          7      58.3333
2          5      41.6667
Renaming the COUNT Variable
PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER(RENAME=(COUNT=N_AGE) DROP=PERCENT);
RUN;
Listing of Data Set NUMBER
AGE_GROUP  N_AGE
1          7
2          5
Using PROC MEANS to Count Frequencies
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS AGE_GROUP;
   VAR HR; /* ANY NUMERIC VARIABLE */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE) DROP=_TYPE_ DUMMY) N=DUMMY;
RUN;
Listing of Data Set COUNTS
AGE_GROUP  N_AGE
1          7
2          5
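A possible variation, not shown in the slides: when PROC SUMMARY is run without a VAR statement, it produces a simple count of observations, so the dummy statistic can be left out. The sketch below assumes that behavior and reuses the data set and variable names from the example above.

```sas
PROC SUMMARY DATA=CLINIC NWAY;
   CLASS AGE_GROUP;
   /* No VAR statement: the output holds only the class variable, */
   /* _TYPE_, and _FREQ_, so there is no dummy variable to drop   */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE) DROP=_TYPE_);
RUN;
```

The resulting COUNTS data set should match the listing above, with one row per AGE_GROUP value.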
Using PROC FREQ to Count Frequencies in a Two-way Table
PROC FREQ DATA=CLINIC NOPRINT;
   TABLES GENDER*BLOOD_TYPE / OUT=FREQOUT(DROP=PERCENT
                                          RENAME=(COUNT=NUMBER));
RUN;
Listing of Data Set FREQOUT
GENDER  BLOOD_TYPE  NUMBER
F       A           2
F       B           2
F       O           2
M       A           2
M       B           3
M       O           1
Using PROC FREQ to Output More than One Data Set
PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=OUT1;
   TABLES GENDER / OUT=OUT2;
   TABLES GENDER*AGE_GROUP / OUT=OUT3;
RUN;
Listing of Data Set OUT1
AGE_GROUP  COUNT  PERCENT
1          7      58.3333
2          5      41.6667
----------------------------------------------------------------
Listing of Data Set OUT2
GENDER  COUNT  PERCENT
F       6      50
M       6      50
----------------------------------------------------------------
Listing of Data Set OUT3
GENDER  AGE_GROUP  COUNT  PERCENT
F       1          3      25.0000
F       2          3      25.0000
M       1          4      33.3333
M       2          2      16.6667