bmtry 789 lecture9: proc tabulate

19
BMTRY 789 Lecture9: Proc Tabulate Readings – Chapter 11 & Selected SUGI Reading Lab Problems - 11.1, 11.2 Homework Due Next Week– HW6

Upload: kane-brock

Post on 30-Dec-2015

39 views

Category:

Documents


2 download

DESCRIPTION

BMTRY 789 Lecture9: Proc Tabulate. Readings – Chapter 11 & Selected SUGI Reading Lab Problems - 11.1, 11.2 Homework Due Next Week– HW6. Ready for a break yet? I AM!. Why Use Proc Tabulate?. SAS Software provides hundreds of ways you can analyze your data. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: BMTRY 789 Lecture9:  Proc Tabulate

BMTRY 789 Lecture9: Proc Tabulate

Readings – Chapter 11 & Selected SUGI Reading

Lab Problems - 11.1, 11.2Homework Due Next Week– HW6

Page 2: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 2

Ready for a break yet? I AM!

Page 3: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 3

Why Use Proc Tabulate?

SAS Software provides hundreds of ways you can analyze your data.

You can use the DATA step to slice and dice your data, and there are dozens of procedures that will process your data and produce all kinds of statistics.

Odds are that no matter how you organize and analyze your data, you’ll end up producing a report in the form of a table.

This is why every SAS user needs to know how to use PROC TABULATE.

Page 4: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 4

Why Use Proc Tabulate?

While TABULATE doesn’t do anything that you can’t do with other PROCs, the payoff is in the output.

TABULATE computes a variety of statistics, and it neatly packages the results in a single table.

Unfortunately, TABULATE has gotten a bad rap as being a difficult procedure to learn.

If we take things step by step, anyone can learn PROC TABULATE.

Page 5: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 5

One dimensional tables

To really understand TABULATE, you have to start very simply. The simplest possible table in TABULATE has to have three things: a PROC TABULATE statement, a TABLE statement, and a CLASS or VAR statement.

Page 6: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 6

One dimensional tables

The PROC TABULATE statement looks like this:PROC TABULATE DATA=TEMP;

TABLE SCORE1;RUN;

If you run this code as is, you will get an error message because TABULATE can’t figure out whether the variable SCORE1 is intended as an analysis variable, which is used to compute statistics, or a classification variable, which is used to define row or column categories in the table.

wrong

Page 7: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 7

One dimensional tables

In this case, we want to use SCORE1 as the analysis variable.

We will be using it to compute a statistic. To tell TABULATE that SCORE1 is an analysis variable, you use a VAR statement. The syntax of a VAR statement is simple: you just list the variables that will be used for analysis.

So now the syntax for our PROC TABULATE is:PROC TABULATE DATA=TEMP;

VAR SCORE1;TABLE SCORE1;

RUN;

right

Page 8: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 8

One dimensional tables The result is the table shown below. It

has a single column, with the header SCORE1 to identify the variable, and the header SUM to identify the statistic.

There is just a single table cell, which contains the value for the sum of Score1 for all of the observations in the dataset.

score1

Sum

245.27

Page 9: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 9

ADDING A STATISTIC The previous table shows what happens if you don’t

tell TABULATE which statistic to use. If the variable in your table is an analysis variable,

meaning that it is listed in a VAR statement, then the statistic you will get by default is the sum.

Sometimes the sum will be the statistic that you want. Most likely, sum isn’t the statistic that you want.

PROC TABULATE DATA=TEMP;VAR SCORE1;TABLE SCORE1*MEAN;

RUN;score1

Mean

10.22

Page 10: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 10

ADDING ANOTHER STATISTIC Each of the tables shown so far was useful, but the power of PROC

TABULATE comes from being able to combine several statistics and/or several variables in the same table.

TABULATE does this by letting you specify a series of “tables” within a single large table.

We’re going to add a “table” showing the number of observations, as well as showing the mean SCORE.

PROC TABULATE DATA=TEMP;VAR SCORE1;TABLE SCORE1*N SCORE1*MEAN;

RUN;OR

PROC TABULATE DATA=TEMP;VAR SCORE1;TABLE SCORE1*(N MEAN);

RUN;

*NOTE: When SAS is creating a one-dimensional table, additional variables, statistics, and categories are always added as new columns.

score1

N Mean

24 10.22

Page 11: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 11

ADDING A CLASSIFICATION VARIABLE

After seeing the tables we’ve built so far in this chapter, you’re probably asking yourself, “Why use PROC TABULATE? Everything I’ve seen so far could be done with a PROC MEANS.”

One answer to this question is classification variables. By specifying a variable to categorize your data, you can produce a concise table that shows values for various subgroups in your data.

PROC TABULATE DATA=TEMP;CLASS TREAT;VAR SCORE1;TABLE SCORE1*MEAN*TREAT;

RUN;

score1

Mean

treat

1 2

10.71 9.73

Page 12: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 12

TWO-DIMENSIONAL TABLES You probably noticed that our example table is not very elegant in

appearance. That’s because it only takes advantage of one dimension. It has multiple columns, but only one row.

It is much more efficient to build tables that have multiple rows and multiple columns. You can fit more information on a page, and the table looks better, too.

The easiest way to build a two-dimensional table is to build it one dimension at a time. First we’ll build the columns, and then we’ll add the rows.

PROC TABULATE DATA=TEMP;CLASS TREAT;VAR SCORE1;TABLE TREAT, SCORE1*(N MEAN);

RUN;

score1

N Mean

treat

12 10.711

212 9.73

Page 13: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 13

ADDING CLASSIFICATION VARIABLES ON BOTHDIMENSIONS

The previous example showed how to reformat a table from one to two dimensions, but it did not show the true power of two-dimensional tables.

With two dimensions, you can classify your statistics by two different variables at the same time.

To do this, you put one classification variable in the row dimension and one classification in the column dimension.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT, SCORE1*VISIT*(N MEAN);

RUN;

score1

visit

1 2 3

N Mean N Mean N Mean

treat

4 9.58 4 11.69 4 10.861

2 4 9.40 4 9.23 4 10.55

*Notice how the analysis variable SCORE1 remains as the column heading, and N and MEAN remains as the statistics, but now there are additional column headings to show the three categories of VISIT.

Page 14: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 14

ADDING ANOTHER CLASSIFICATION VARIABLE

The previous example showed how to add a classification variable to both the rows and columns of a two-dimensional table.

But you are not limited to just one classification per dimension.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT PTN;VAR SCORE1;TABLE TREAT PTN, SCORE1*VISIT*(N MEAN);

RUN;

*This ability to stack multiple mini-tables within a single table can be a powerful tool for delivering large quantities of information in a user-friendly format (i.e. TABLE 1 of most publications).

score1

visit

1 2 3

N Mean N Mean N Mean

treat

4 9.58 4 11.69 4 10.861

2 4 9.40 4 9.23 4 10.55

ptn

2 10.19 2 10.19 2 11.781

2 2 7.78 2 10.97 2 13.30

3 2 12.60 2 8.79 2 8.69

4 2 7.40 2 11.89 2 9.06

Page 15: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 15

NESTING THE CLASSIFICATION VARIABLES

So far all we have done is added additional “tables” to the bottom of our first table.

We used the space operator between each of the row variables to produce stacked tables.

By using a series of row variables, we can explore a variety of relationships between the variables.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT*SCORE1*VISIT*(N MEAN);

RUN;

treat

1 2

score1 score1

visit visit

1 2 3 1 2 3

N Mean N Mean N Mean N Mean N Mean N Mean

4 9.58 4 11.69 4 10.86 4 9.40 4 9.23 4 10.55

Page 16: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 16

ADDING TOTALS TO THE ROWS As your tables get more complex, you can help make your tables more readable

by adding row and column totals. Totals are quite easy to generate in TABULATE because you can use the ALL

variable. This is a built in classification variable supplied by TABULATE that stands for “all observations.”

You do not have to list it in the CLASS statement because it is a classification variable by definition.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT, SCORE1*(VISIT ALL)*(N MEAN);

RUN;score1

visit

All1 2 3

N Mean N Mean N Mean N Mean

treat

4 9.58 4 11.69 4 10.86 12 10.711

24 9.40 4 9.23 4 10.55 12 9.73

Page 17: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 17

ADDING TOTALS TO THE ROWS AND COLUMNS

Not only can you use ALL to add row totals, but you can also use ALL to produce column totals.

What you do is list ALL as an additional variable in the row definition of the TABLE statement. No asterisk is needed because we just want to add a total at the bottom of the table.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT ALL, SCORE1*(VISIT ALL)*(N MEAN);

RUN;score1

visit

All1 2 3

N Mean N Mean N Mean N Mean

treat

4 9.58 4 11.69 4 10.86 12 10.711

2 4 9.40 4 9.23 4 10.55 12 9.73

All 8 9.49 8 10.46 8 10.71 24 10.22

Page 18: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 18

MAKING THE TABLE PRETTY The preceding examples have shown how to create basic tables. They

contained all of the needed information, but they were pretty ugly. The next thing you need to learn is a few tricks to clean up your tables.

PROC TABULATE DATA=TEMP;CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT ALL=‘Total’, SCORE1*(VISIT ALL=‘Total’)*(N MEAN);

RUN;

score1

visit

Total1 2 3

N Mean N Mean N Mean N Mean

treat

4 9.58 4 11.69 4 10.86 12 10.711

2 4 9.40 4 9.23 4 10.55 12 9.73

Total 8 9.49 8 10.46 8 10.71 24 10.22

Page 19: BMTRY 789 Lecture9:  Proc Tabulate

Summer 2009 BMTRY 789 Intro to SAS Programming 19

MAKING THE TABLE PRETTY Another thing that could be improved about this table is getting rid of the excessive

column headings. Proc Format;

value visit 1='Visit 1' 2='Visit 2' 3='Visit 3';value tr 1='Therapy 1' 2='Therapy 2';

run;PROC TABULATE DATA=TEMP;

CLASS TREAT VISIT;VAR SCORE1;TABLE TREAT=‘ ’ ALL=‘Total’, SCORE1=‘ ‘ *(VISIT=‘ ‘ ALL=‘Total’)*MEAN/ box=‘Average Score’;Format treat tr. Visit visit.;

RUN;

Average ScoreVisit 1 Visit 2 Visit 3 Total

Therapy 19.58 11.69 10.86 10.71

Therapy 29.40 9.23 10.55 9.73

Total9.49 10.46 10.71 10.22