producing descriptive statistics

26
Mainframe Online Training Poldsni Anil Kumar Producing Descriptive Statistics Statistical Analysis System

Upload: tauret

Post on 14-Jan-2016

37 views

Category:

Documents


2 download

DESCRIPTION

Producing Descriptive Statistics. Statistical Analysis System. Producing Descriptive Statistics. Introduction Computing Statistics Using PROC MEANS Creating a Summarized Data Set Using PROC SUMMARY Producing Frequency Tables Using PROC FREQ. Introduction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Producing Descriptive Statistics

Mainframe Online Training

Poldsni Anil Kumar

Producing Descriptive Statistics

Statistical Analysis System

Page 2: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 2

Producing Descriptive Statistics Introduction

Computing Statistics Using PROC MEANS

Creating a Summarized Data Set Using PROC SUMMARY

Producing Frequency Tables Using PROC FREQ

Page 3: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 3

Introduction

PROC REPORT is the ability to summarize large amounts of data by producing descriptive statistics. However, there are SAS procedures that are designed specifically to produce various types of descriptive statistics and to display them in meaningful reports

If the data values that you want to describe are continuous numeric values , then you can use the MEANS procedure or the SUMMARY procedure to calculate statistics such as the mean, sum, minimum, and maximum.

If the data values that you want to describe are discrete then you can use the FREQ procedure to show the distribution of these values.

Page 4: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 4

Computing Statistics Using PROC MEANS(Page 1)

The MEANS procedure provides descriptive statistics such as the mean, minimum, and maximum provide useful information about numeric data.

Procedure Syntax

PROC MEANS <DATA=SAS-data-set>

<statistic-keyword(s)> <option(s)>;

RUN;

Where

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specify the statistics to compute

option(s) control the content, analysis, and appearance

Example

proc means data=perm.survey;

run;

Page 5: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 5

Computing Statistics Using PROC MEANS(Page 2)

Selecting Statistics

Consider that you want to see the median and range of Perm. Survey numeric values, add the MEDIAN and RANGE keywords as options.

Example

proc means data=perm.survey median range;

run;

The following keywords can be used with PROC

MEANS to compute statistics:

Keyword DescriptionMAX Maximum value

MEAN Average

MODE Value that occurs most frequently

MIN Minimum value

VAR Variance

Page 6: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 6

Computing Statistics Using PROC MEANS(Page 3)

Limiting Decimal Places

To limit decimal places, use the MAXDEC= option in the PROC MEANS statement, and set it equal to the length that you prefer.

Syntax

PROC MEANS <DATA=SAS-data-set>

<statistic-keyword(s)> MAXDEC=n;

where n specifies the maximum number of decimal places.

Example

proc means data=clinic.diabetes min max maxdec=0;

run;

Page 7: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 7

Computing Statistics Using PROC MEANS(Page 4)

Specifying Variables in PROC MEANS

To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable names.

General form, VAR statement:

VAR variable(s);

Example

proc means data=clinic.diabetes min max maxdec=0;

var age height weight;

run;

In addition to listing variables separately,

you can use a numbered range of variables

var item1-item5;

Page 8: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 8

Computing Statistics Using PROC MEANS(Page 5)

Group Processing Using the CLASS Statement

To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure.

General form, CLASS statement:

CLASS variable(s);

where variable(s) specifies category variables for group processing.

CLASS variables can be either character or numeric, but they should contain a limited number of discrete values that represent meaningful groupings.

Example

proc means data=clinic.heart maxdec=1;

var arterial heart cardiac urinary;

class survive sex;

run;

Page 9: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 9

Computing Statistics Using PROC MEANS(Page 6)

Group Processing Using the BY Statement

Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.

General form, BY statement:

BY variable(s);

Difference between BY and CLASS

1.Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables.

2.BY group results have a layout that is different from the layout of CLASS group results. Note that the BY statement in the program below creates four small tables; a CLASS statement would produce a single large table.

Page 10: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 10

Computing Statistics Using PROC MEANS(Page 7)

Example for BY statement.

proc sort data=clinic.heart out=work.heartsort;

by survive sex;

run;

proc means data=work.heartsort maxdec=1;

var arterial heart cardiac urinary;

by survive sex;

run;

Page 11: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 11

Computing Statistics Using PROC MEANS(Page 8)

Creating a Summarized Data Set Using PROC MEANS

You might want to create an output SAS data set that contains just the summarized variable.

General form, OUTPUT statement:

OUTPUT OUT=SAS-data-set statistic=variable(s);

where

OUT= specifies the name of the output data set

statistic= specifies the summary statistic written out

Page 12: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 12

Computing Statistics Using PROC MEANS(Page 9)

Specifying the STATISTIC= Option

You can specify which statistics to produce in the output data set. To do so, you must specify the statistic and then list all of the variables. The variables must be listed in the same order as in the VAR statement. You can specify more than one statistic in the OUTPUT statement.

proc means data=clinic.diabetes;

var age height weight;

class sex;

output out=work.sum_gender

mean=AvgAge AvgHeight AvgWeight

min=MinAge MinHeight MinWeight;

run;

To see the contents of the output data set,

submit the following PROC PRINT step.

PROC MEANS in SAS LISTPROC MEANS in SAS LIST

PROC MEANS OUTPUT TO SAS DATASETPROC MEANS OUTPUT TO SAS DATASET

Page 13: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 13

Computing Statistics Using PROC MEANS(Page 10)

Creating only the output data set

You can use the NOPRINT option in the PROC MEANS statement to prevent the default report from being created. For example, the following program creates only the output data set:

Example

proc means data=clinic.diabetes noprint;

var age height weight;

class sex;

output out=work.sum_gender

mean=AvgAge AvgHeight AvgWeight;

run;

Page 14: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 14

Creating a Summarized Data Set Using PROC SUMMARY

You can also create a summarized output data set by using PROC SUMMARY.

The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.

Example

proc summary data=clinic.diabetes print;

var age height weight;

class sex;

output out=work.sum_gender

mean=AvgAge AvgHeight AvgWeight;

run;

Page 15: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 15

Producing Frequency Tables Using PROC FREQ (Page 1)

The FREQ procedure is a descriptive procedure as well as a statistical procedure. It produces one-way and n-way frequency tables,.

You can use the FREQ procedure to create cross-tabulation tables that summarize data for two or more categorical variables by showing the number of observations for each combination of variable values.

General form, basic FREQ procedure:

PROC FREQ <DATA=SAS-data-set>;

RUN;

By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and

cumulative percent of every

value of all variables in a data set.

Page 16: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 16

Producing Frequency Tables Using PROC FREQ (Page 2)

Example

For example, the following FREQ procedure creates a frequency table for each variable in the data set Parts. Widgets. All the unique values are shown for ItemName, LotSize, and Region.

proc freq data=parts.widgets;

run;

Page 17: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 17

Producing Frequency Tables Using PROC FREQ (Page 3)

Specifying Variables in PROC FREQ

By default, the FREQ procedure creates frequency tables for every variable in your data set.

To specify the variables to be processed by the FREQ procedure, include a TABLES statement.

Syntax

TABLES variable(s);

where variable(s) lists the variables to include.

Page 18: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 18

Producing Frequency Tables Using PROC FREQ (Page 4)

Example

Consider the SAS data set Finance.Loans. The variables Rate and Months are best described as categorical values, so they are the best choices for frequency tables.

proc freq data=finance.loans;

tables rate months;

run;

Page 19: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 19

Producing Frequency Tables Using PROC FREQ (Page 5)

Creating Two-Way Tables

It is often helpful to crosstabulate frequencies with the values of other variables. For example, census data is typically crosstabulated with a variable that represents geographical regions.

Syntax

TABLES variable-1 *variable-2 <* ... variable-n>;

where (for two-way tables)

variable-1 specifies table rows and variable-2 specifies table columns.

When crosstabulations are specified, PROC FREQ produces tables with cells that contain

column cell frequency

cell percentage of total frequency

cell percentage of row frequency

cell percentage of frequency.

Page 20: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 20

Producing Frequency Tables Using PROC FREQ (Page 6)

For example, the following program creates the two-way table shown below.

proc format;

value wtfmt low-139='< 140'

140-180='140-180‘

181-high='> 180';

value htfmt low-64='< 5''5"'

65-70='5''5-10"'

71-high='> 5''10"';

run;

proc freq data=clinic.diabetes;

tables weight*height;

format weight wtfmt.

height htfmt.;

run;

Tables

LEGEND BOX

Tables

LEGEND BOX

Page 21: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 21

Producing Frequency Tables Using PROC FREQ (Page 7)

Creating N-Way Tables

For a frequency analysis of more than two variables, use PROC FREQ to create n-way crosstabulations. A series of two-way tables is produced, with a table for each level of the other variables.

For example, suppose you want to add the variable Sex to your crosstabulation of Weight and Height in the data set Clinic.Diabetes. Add Sex to the TABLES statement, joined to the other variables with an asterisk (*).

Example

tables sex*weight*height;

The order of the variables is important. In n-way tables, the last two variables of the TABLES statement become the two-way rows and columns.

Page 22: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 22

Producing Frequency Tables Using PROC FREQ (Page 8)

Example

proc format;

value wtfmt low-139='< 140‘

140-180='140-180‘

181-high='> 180';

value htfmt low-64='< 5''5"'

65-70='5''5-10"‘

71-high='> 5''10"';

run;

proc freq data=clinic.diabetes;

tables sex*weight*height;

format weight wtfmt. height htfmt.;

run;

Page 23: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 23

Producing Frequency Tables Using PROC FREQ (Page 9)

Changing the Table Format

CROSSLIST option to your TABLES statement displays cross-tabulation tables

proc format;

value wtfmt low-139='< 140‘

140-180='140-180‘

181-high='> 180';

value htfmt low-64='< 5''5"'

65-70='5''5-10"‘

71-high='> 5''10"';

run;

proc freq data=clinic.diabetes;

tables sex*weight*height/crosslist;

format weight wtfmt. height htfmt.;

run;

Page 24: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 24

Producing Frequency Tables Using PROC FREQ (Page 10)

Creating Tables

When three or more variables are specified, the multiple levels of n-way tables can produce considerable output. Such bulky, often complex crosstabulations are often easier to read as a continuous list in List Format

Syntax

TABLES variable-1 *variable-2 <* … variable-n> / LIST;

Example

proc format;

value wtfmt low-139='< 140'

….;

run;

proc freq data=clinic.diabetes;

tables sex*weight*height / list;

format weight wtfmt. height htfmt.;

run;

Page 25: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 25

Producing Frequency Tables Using PROC FREQ (Page 11)

Suppressing Table Information

NOFREQ suppresses cell frequencies

NOPERCENT suppresses cell percentages

NOROW supresses row percentages

NOCOL suppresses column percentages.

Example

proc freq data=clinic.diabetes;

tables sex*weight / nofreq norow nocol;

format weight wtfmt.;

run;

Page 26: Producing Descriptive Statistics

Mainframe Online Training

SAS Presentation | IBM Confidential | Apr 21, 2023 26