
SAS® Programming Basics: Producing Results from Your Data

Helen Carey, Carey Consulting, Kaneohe, HI

ABSTRACT

This session explores how to choose the most appropriate tool for turning your data into results and tailoring the output to the needs of your intended audience.

SAS procedures are designed for easy reporting and analysis of SAS data. This session will look at some of the basic reporting procedures, such as PRINT, MEANS, UNIVARIATE and FREQ. The procedures SQL, REPORT and TABULATE will be used to produce summary reports. PROC FORMAT will be used with the reporting procedures to customize the appearance of data values and to group observations and values. The session will also explore using the Output Delivery System (ODS) to change the destination and look of your reports, along with ODS Graphics to produce graphs.

INTRODUCTION

Once data has been stored in a SAS data set, you can produce results quickly using SAS’s library of built-in programs known as SAS procedures. Base SAS procedures use data values from SAS data sets to do a variety of tasks, such as creating and printing reports, performing statistical analysis, summarizing data, producing graphics, manipulating data, managing data libraries, and even creating SAS data sets.

You can choose the destination of your results, the type of information that is produced, and the appearance of your reports by using the options and statements available with each procedure. In addition, ODS, the Output Delivery System, gives you a lot of flexibility in how your output looks, where it is written, and in what form.

It is easy for people to feel dread when they see a stack of papers loaded with numbers, spreadsheets and graphs. Remember that the readers of your results are depending upon you to make sense of the data for them. One of the main purposes of analyzing the data and producing reports is to assist thinking and to communicate your results in a clear, easy-to-understand form. SAS gives you the tools to do this.

EXPLORING YOUR DATA WITH SAS PROCEDURES

KNOW YOUR DATA

I learned the need to know the data first-hand when analyzing and reporting on data collected from senior citizens. The analysis indicated that many were exercising more than 4 hours a day. That was not how I pictured retirement. The data was supposedly “clean,” and it was. By running PROC FREQ and PROC UNIVARIATE, we discovered that many retirees enjoyed gardening for 4 to 8 hours daily, and gardening was considered exercise in this research project. Another time, we had to inform a researcher that his published results on a public website were incorrect because he did not understand that the dates were coded differently for the last few months of the research.

Therefore, know your data. Before generating a report, you need to understand the structure and contents of your data. Check data values, range of values, frequencies of values, missing values, index variables with unique values, complete and consistent dates, required variables, and the occurrence of duplicate records.

When exploring your data, use Base SAS procedures like PRINT, UNIVARIATE, FREQ, and MEANS to find the range of data values, the frequencies, the distribution, the number of observations, and more. List the size of the data set and the name, label, format, length, and type of each variable by running PROC DATASETS or by using the DIR and VAR windows of the windowing environment. Also be sure to check the number of missing values.
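A minimal sketch of these checks, assuming SAS 9 with the SASHELP library available: the CONTENTS statement of PROC DATASETS lists the number of observations and each variable's attributes (the VARNUM option keeps them in creation order), and PROC MEANS with the N and NMISS keywords counts non-missing and missing values for every numeric variable. Using PROC MEANS for the missing-value count is one choice among several; the PROC FORMAT technique described later in this paper also works.

```sas
/* List the data set size and variable attributes in creation order */
proc datasets library=sashelp nolist;
   contents data=heart varnum;
run;
quit;

/* Count non-missing (N) and missing (NMISS) values for every numeric variable */
proc means data=sashelp.heart n nmiss;
run;
```

PROC DATASETS is an interactive procedure, which is why the QUIT statement is needed to end it.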

By exploring your data, you can determine the size of your report and whether you need additional labeling, formatting, or grouping for more readable reports.

FRAMINGHAM HEART STUDY

Some examples in this presentation use data sets found in the SAS library SASHELP, which is installed with Base SAS software and whose sample data is worth exploring. One of them is SASHELP.HEART, which contains data on 5,209 adult subjects from the Framingham Heart Study, begun in 1948.

This is a view of the first few observations and variables of the Heart study in the VIEWTABLE window of the windowing environment.

Figure 1. Framingham Heart Study Viewtable

PROC PRINT

PROC PRINT gives you a detailed report: a listing of all the variables for all observations in a data set, unless you give it limits. You can select the variables to print with the VAR statement, and the observations with the WHERE statement or with data set options such as OBS=.

To get a quick look at the first few observations of the data set, you can use the OBS= data set option to limit the output to the first 10 observations with a status of ‘Alive’. Also, let’s limit the variables printed with the VAR statement.

proc print data=sashelp.heart(obs=10);

where status = 'Alive';

var weight_status

ageatstart--weight

sex ;

run;

By default, variables are printed in the order in which they are stored in the program data vector (PDV). However, the VAR statement in the program above specifies which variables to print and the order in which to print them. The variable list ageatstart--weight, with the double hyphen, selects all the variables from AgeAtStart through Weight in the order in which they appear in the PDV. Variable names are displayed as they are stored but are case-insensitive when referenced. Notice that the column heading for the variable AgeAtStart breaks at the capital letters.
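The OBS= option used above always counts from the start of the data. As a sketch, combining FIRSTOBS= with OBS= prints a range from the middle of the data set, and the NOOBS option suppresses the Obs column; the range 11 through 15 is an arbitrary choice for illustration.

```sas
/* Print observations 11 through 15 only, without the Obs column */
proc print data=sashelp.heart(firstobs=11 obs=15) noobs;
   var status sex ageatstart--weight;
run;
```

OBS= names the number of the last observation to process, not a count, so FIRSTOBS=11 OBS=15 selects five observations.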

PROC FREQ

The FREQ procedure produces a frequency count of the observations at each value of a variable. It produces one-way to n-way frequency and cross-tabulation tables for numeric or character variables.

The first 10 observations printed in the example above all have a Weight_Status of ‘Overweight’. Explore the data further by counting the number of people in each category of Weight_Status.

proc freq

data=sashelp.heart;

where status = 'Alive';

tables weight_status;

run;

Figure 2. PRINT: Limiting Variables, Observations

Figure 3. FREQ: Frequency Count In Each Weight Status

Now let’s look at a two-way table of sex and Weight_status to see if there is any association between them.

PROC FREQ has options for additional analysis, such as the Pearson chi-square statistic, and for controlling how the results are displayed, such as the ORDER= option, which controls the order in which variable values are displayed.

proc freq data=sashelp.heart

order=freq;

where status = 'Alive';

tables sex*weight_status;

run;
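To request the Pearson chi-square statistic mentioned above, add the CHISQ option to the TABLES statement. This is a minimal sketch; the NOCOL and NOPCT options are optional and only remove the column percentages and overall percentages from each cell.

```sas
proc freq data=sashelp.heart order=freq;
   where status = 'Alive';
   /* CHISQ adds the Pearson chi-square and related statistics */
   tables sex*weight_status / chisq nocol nopct;
run;
```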

PROC UNIVARIATE

Use the UNIVARIATE procedure to find out all the interesting things about your data. PROC UNIVARIATE provides a large number of descriptive statistics on a single variable at a time. By default, PROC UNIVARIATE produces descriptive statistics for all numeric variables. To limit the analysis to specific variables, use a VAR statement. The CLASS statement specifies one or two variables used to group the data into subgroups.

In this example, we want to look only at the extreme values of weight, broken down by the variable SEX. This is another way of exploring and learning more about the data. The ID statement is normally used to print identifying information, such as the subject’s ID or name, to make it easier to locate the original data. In this case, I am using it to print the height associated with each extreme value of weight.

In this example, the lowest weight value is 67 for Sex=Female. That person’s corresponding height is 57.75 inches.

ods select ExtremeObs;

proc univariate data=sashelp.heart;

class sex;

var weight;

id height;

run;
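By default, the ExtremeObs table shows the five lowest and five highest observations. As a sketch, the NEXTROBS= option on the PROC statement changes that number; the value 3 is an arbitrary choice.

```sas
ods select ExtremeObs;
proc univariate data=sashelp.heart nextrobs=3;
   class sex;
   var weight;
   id height;   /* show the height that goes with each extreme weight */
run;
```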

PROC SQL

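For normally distributed data, you can use PROC SQL to select extreme values. This example lists scores that are more than 2 standard deviations above or below the mean for each gender. The ABS function returns the absolute value of its argument, here the difference between the group mean and each score. Because the summary functions appear in the HAVING clause, PROC SQL remerges the group statistics with the individual rows.

proc sql;

select *, avg(score) as mean,

std(score) as sd

from mylib.scores

group by gender

having abs(avg(score)-score) > 2*std(score);

quit;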
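Because MYLIB.SCORES is not available to most readers, here is a hedged sketch of the same technique against SASHELP.HEART. The aggregate functions are used directly in the HAVING clause, so PROC SQL remerges each group's mean and standard deviation with the detail rows. The column aliases mean_wt and sd_wt and the 3-standard-deviation cutoff are arbitrary choices for illustration.

```sas
proc sql;
   select sex, weight,
          avg(weight) as mean_wt format=6.1,
          std(weight) as sd_wt   format=6.1
   from sashelp.heart
   group by sex
   /* keep rows more than 3 standard deviations from the group mean */
   having abs(weight - avg(weight)) > 3*std(weight);
quit;
```

Expect a note in the log that the query requires remerging summary statistics back with the original data; that is the intended behavior here.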

Figure 4. FREQ: Two-Way Table

Figure 5. UNIVARIATE Results for Sex=Female

CHECK FOR MISSING VALUES USING PROC FORMAT

Here is a quick way to check the number or percentage of missing values for each numeric variable. Use PROC FORMAT to define valid groups for values that are OK and values that are missing. In this example, we have coded those responses where the student was excused with the special missing value of either ._ or .R, that is a period followed by an underscore and a period followed by the letter R. The special SAS name list _NUMERIC_ refers to all variables of type numeric. By running PROC FREQ on those groups, we can see how many missing values we have.

proc format;

value numfmt

. ='missing'

._,.R ='excused'

OTHER ='OK';

run;

proc freq data=grade;

format _numeric_ numfmt. ;

tables _numeric_ /nocum missing;

run;
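Special missing values such as .R and ._ have to get into the data somehow. When reading raw data, the MISSING statement tells SAS which characters in a numeric field to store as special missing values. Here is a minimal sketch using a hypothetical QUIZ data set; the data set, variable names, and values are invented for illustration.

```sas
proc format;
   value numfmt
      .      = 'missing'
      ._, .R = 'excused'
      other  = 'OK';
run;

data quiz;                 /* hypothetical example data */
   missing R _;            /* read R and _ as the special missing values .R and ._ */
   input Name $ Q1 Q2;
   datalines;
Ann 90 R
Bob .  85
Cy  _  77
;
run;

proc freq data=quiz;
   format _numeric_ numfmt.;
   tables _numeric_ / nocum missing;
run;
```

Without the MISSING statement, the R and _ in the data lines would be treated as invalid numeric data and stored as ordinary missing values, with notes in the log.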

PROC MEANS

The MEANS procedure produces descriptive statistics for numeric variables. By default, PROC MEANS analyzes all numeric variables in your data set. The default statistics are:

• N  Number of observations with a non-missing value of the analysis variable
• MEAN  Mean (average) of the analysis variable’s non-missing values
• STD  Standard deviation
• MAX  Largest (maximum) value
• MIN  Smallest (minimum) value

If you want to limit the analysis to certain variables, use the VAR statement. The MAXDEC=0 option on the PROC statement rounds the displayed statistics to whole numbers; MAXDEC=2 displays up to 2 decimal places. The MISSING option treats missing values of the classification variables as a valid group, so observations with a missing classification value are counted instead of being excluded.

proc means data=sashelp.heart

n nmiss mean median missing maxdec=0;

class sex;

var weight;

run;

Figure 6. FREQ Results For Missing Values

Figure 7. MEANS Descriptive Statistics

Our results show no missing values for our classification variable SEX. This means that everyone had their sex coded as either Female or Male. In exploring our data, it is important to know how many observations have a missing or non-missing value for the variables. We included the NMISS and N statistics keywords to find out that information for the analysis variable.

ANALYZING YOUR DATA WITH SAS ENTERPRISE GUIDE

If you have SAS Enterprise Guide installed at your workplace, you can use it to do some or all of your analyses. SAS Enterprise Guide enables you to explore your data using the Characterize Data task. With a simple wizard, you can create summary reports, graphs, and frequency and univariate statistics. Also, SAS Enterprise Guide can read your data from many sources, such as Excel, Oracle and other databases, and of course SAS data sets.

SAS Enterprise Guide is a powerful user interface to SAS. Through a point-and-click interface, it generates the SAS code and produces the results. I find it easier to create my PROC TABULATE code by running the summary task in SAS Enterprise Guide and saving the generated code. I then use the generated code as a starting point for writing my PROC TABULATE code and enhance it with additional options.

There is a section on Enterprise Guide at this WUSS conference, where you will find many papers explaining why adding SAS Enterprise Guide to your toolkit is a good idea. There are also online tutorials and books, such as SAS for Dummies and The Little SAS Book for SAS Enterprise Guide, to help in learning SAS EG.

The SAS Online Resources for Statistics Education website has steps with short movies on how to perform basic statistics using SAS Enterprise Guide. The way to find this resource is to do a Google search for SAS Online Resources. Remember to bookmark it. You can learn statistics using SAS Enterprise Guide and at the same time learn EG.

Go to the website, select which version of SAS Enterprise Guide to view, select an analysis from the list in the left frame to open a simulation window. There you will be able to view the steps for the analysis in SAS Enterprise Guide in a brief movie.

Figure 8. SAS Online Resources for Statistics Education

Finally, install and use the latest version of SAS Enterprise Guide; the enhancements make the upgrade worthwhile.

DESIGNING YOUR REPORTS

The point of your analysis is to assist thinking. SAS is a powerful system that will help you take your data and turn it into information.

Remember that the readers of your results are depending upon you to make sense of the data for them and to assist them in decision making. Carefully consider who will be using your results and the best way to present them. Ask yourself how you can deliver the information in a form that is best understood and used by the decision makers. Explore and understand the data by running procedures like DATASETS, UNIVARIATE, FREQ, and MEANS. Create a mock-up of how you want the tables to appear. Data presentation comes in many forms, such as summary and detail reports, lists, statistical tables, and graphical displays. After deciding how you want to present your data, decide which SAS procedure or technique is best to use. Taking the time to choose the best tool will save you considerable time in the long run and will usually make your programs more efficient. Therefore, think first, code later. Spend the time up front to minimize effort and maximize results.

This quote by Warren Buffett says it best:

As the old saying goes, what the wise man does at the beginning, fools do in the end…

Build your report with test data, and check the result. Double check the values of summary information and percentages. Determine how you want to handle missing values. When you like the layout and the test results are correct, turn your attention to making the table look good. Use titles, formats, labels, and options to make your report more readable and impressive.

Look at the delivery mechanism of your reports. Instead of a printed report, you could produce graphs, put the results on the company website, put them in a PDF file or Excel spreadsheet, or use SAS to email them. Also consider whether the report is really needed, or whether reporting other results might better help the decision-making process. With the Output Delivery System, you have the means to deliver the results in many ways and forms.

GENERATING SUMMARY REPORTS OF THE DATA SET

The procedures MEANS, SQL, REPORT and TABULATE produce summary reports. It is worthwhile learning the capabilities, features and limitations of each of these procedures. There are many conference papers, available at www.lexjansen.com, on learning the basics, advanced features and idiosyncrasies of these procedures. The book Step-by-Step Programming with Base SAS® Software (available as a free PDF at support.sas.com and mentioned in the companion paper SAS® Programming Basics: Getting Your Data In and Understanding the DATA Step) has chapters on “Creating Summary Tables with the TABULATE Procedure” and “Creating Detail and Summary Reports with the REPORT Procedure”.

The following examples show the same summary report produced by each of these procedures. They give the same results but are displayed differently.

This program generates the data set used in the reports.

data grade;
   /* column input: each field must start in the columns listed */
   input Name $ 1-8 Gender $ 11
         Status $ 13 Year $ 15-16 Section $ 18
         Score 20-21 FinalGrade 23-24;
   datalines;
Abbott    F 2 97 A 90 87
Branford  M 1 98 A 92 97
Crandell  M 2 98 B 81 71
Dennison  M 1 97 A 85 72
Edgar     F 1 98 B 89 80
Faust     M 1 97 B 78 73
Greeley   F 2 97 A 82 91
Hart      F 1 98 B 84 80
Isley     M 2 97 A 88 86
Jasper    M 1 97 B 91 93
;
run;

PROC MEANS

Many SAS programs use PROC MEANS to display descriptive statistics for analysis variables across all observations and within groups of observations. Analysis variables are the numeric variables that you want analyzed; they are specified in the VAR statement. Classification variables can be numeric or character and are used to produce analyses for each separate value of the classification variable. They are specified in the CLASS statement or as BY-group variables. The CLASS statement is often preferred because BY-group variables require the data to be sorted (or indexed) before the procedure is called, while CLASS variables do not.

You may see SAS code that uses the SUMMARY procedure. PROC SUMMARY is similar to PROC MEANS, except MEANS displays the output by default and SUMMARY creates an output data set by default. You can change the default for either procedure.
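A short sketch of that difference, assuming the GRADE data set created above: PROC SUMMARY needs the PRINT option to display a report, while PROC MEANS needs NOPRINT (together with an OUTPUT statement) to produce only a data set. The output data set name StatusMeans and the variable name MeanScore are arbitrary.

```sas
/* PROC SUMMARY displays nothing unless you add the PRINT option */
proc summary data=grade print;
   class Status;
   var Score;
run;

/* PROC MEANS prints by default; NOPRINT plus OUTPUT writes only a data set */
proc means data=grade noprint;
   class Status;
   var Score;
   output out=StatusMeans mean=MeanScore;
run;
```

StatusMeans holds one overall row (_TYPE_=0) plus one row for each value of Status.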

PROC MEANS analyzes the analysis variables at all types, or combinations of the values of the classification variables, unless you specify otherwise. If you do not need all types in your output, use the TYPES statement to specify particular subtypes. The TYPES statement used in the example below,

types () status*year;

requests that the analysis be performed on all the observations in the GRADE data set (indicated by the pair of parentheses “()”, which requests the overall total) as well as on the two-way combination of Status and Year, which results in four subgroups because Status and Year each have two unique values.

proc means data=grade maxdec=3;

var Score;

class Status Year;

types () status*year;

title 'Student Grades-Status';

run;
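The WAYS statement is another way to ask for the same subgroups. As a sketch, WAYS 0 2 requests the overall (zero-way) analysis plus every two-way combination of the CLASS variables; with only Status and Year as CLASS variables, that matches TYPES () Status*Year.

```sas
proc means data=grade maxdec=3;
   var Score;
   class Status Year;
   ways 0 2;      /* overall total plus every two-way combination */
   title 'Student Grades-Status';
run;
```

TYPES is more precise when you have three or more CLASS variables and want only some of the combinations; WAYS is shorter when you want all combinations of a given size.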

Figure 9. MEANS Results for Student Grades

PROC REPORT

PROC REPORT enables you to produce reports ranging from simple to complex and from detailed to summary.

If you run PROC REPORT without COLUMN or DEFINE statements on a data set that contains character variables, the output is similar to that of PROC PRINT but without the observation numbers. For a data set with only numeric variables, include DEFINE statements; without them, the numeric variables are treated as analysis variables and only a single line of totals is printed.

Be sure to include NOWD or NOWINDOWS on the PROC REPORT statement unless you want to open an interactive REPORT window for designing your report.

proc report data=Grade NOWD;

columns Status Year Score

Score=SD Score=MinSc Score=MaxSc;

define Status / group width=6;

define Year / group width=5 ;

define Score / mean 'Mean' F=6.2;

define SD / std 'STD' F=6.2;

define MinSc / min 'Min ' ;

define MaxSc / max 'Max ' ;

rbreak after/ dol summarize;

run;

In the example, the RBREAK statement with the summarize option gives the overall summary.
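PROC REPORT can also spread the values of a variable across the page. A minimal sketch, assuming the GRADE data set: Year is defined as an ACROSS variable, and the comma in the COLUMN statement places the mean Score under each Year heading.

```sas
proc report data=grade nowd;
   column Status Year,Score;          /* comma stacks Score under each Year */
   define Status / group width=6;
   define Year   / across;
   define Score  / analysis mean 'Mean Score' format=6.2;
run;
```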

Page 8: SAS Programming Basics: Producing Results from Your DataSAS® Programming Basics: Producing Results from Your Data 4 CHECK FOR MISSING VALUES USING PROC FORMAT Here is a quick way

SAS® Programming Basics: Producing Results from Your Data

8

Figure 10. REPORT Results for Student Grades

One extremely useful feature of PROC REPORT is the ability to print the same variable twice with different formats. To do this with PROC PRINT, you need to create a duplicate variable in a DATA step. With PROC REPORT, however, it’s easy to use the same variable with more than one format by creating an alias in the COLUMN statement. It is worth showing a very simple example here. To print the variable Birth with two different formats, first define the alias Day for the variable Birth in the COLUMN statement. Then use DEFINE statements to specify the formats that you want.

proc report nowindows;

column Birth Birth=Day;

define Birth/display format=worddate12.;

define Day/ display 'Day' format=downame.;

run;
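The example above does not name its input data set, so it uses the most recently created one. Here is a self-contained sketch with a hypothetical BIRTHS data set; the names and dates are invented for illustration.

```sas
data births;                      /* hypothetical sample data */
   input Name $ Birth :date9.;    /* read dates like 14MAR1990 as SAS date values */
   datalines;
Ann 14MAR1990
Bob 02JUL1985
;
run;

proc report data=births nowd;
   column Name Birth Birth=Day;   /* Day is an alias for Birth */
   define Name  / display;
   define Birth / display format=worddate12.;
   define Day   / display 'Day' format=downame.;
run;
```

Both columns read the same stored date value; only the format differs.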

PROC TABULATE

PROC TABULATE produces descriptive statistics in tables that can have up to three dimensions (page, row, and column). You have a lot of control over content and layout, and it can also generate an output data set. A variety of statistics are available, including quantile and percentile statistics. The TABULATE procedure is flexible and gives you many ways to produce tabular reports.

The syntax does not resemble that of other SAS procedures, so it can be difficult for beginning SAS programmers. I find it easier to create my PROC TABULATE code by running the summary task in SAS Enterprise Guide and using the generated code as a starting point. Also, put comments in your code to help you understand it when you return to it later.

Here is the code and results for our summary example.

proc tabulate data=Grade;

class Status Year;

var Score;

tables /* rows */

Status*Year ALL,

/* columns */

Score*(MEAN STD MIN MAX);

run;
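As a sketch of the page dimension, assuming the GRADE data set: a TABLES statement with three comma-separated parts creates one table per page value (here, per Gender), and an asterisk with F= applies a format to a single statistic.

```sas
proc tabulate data=grade;
   class Gender Status;
   var Score;
   tables /* page */    Gender,
          /* rows */    Status ALL,
          /* columns */ Score*(N MEAN*f=6.2);
run;
```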

Figure 11. REPORT: Printing a Variable Twice

Figure 12. TABULATE Results for Student Grades

PROC SQL

When SQL was first introduced in SAS, my student assistant always used PROC SQL. He was able to use one procedure to do many tasks, while I was using the SORT, MEANS, and PRINT procedures and the DATA step for merging (joining) data sets to accomplish the same thing. It still took me a while to take the plunge and learn SQL. After all, all my old tools still worked. There is a learning curve, because SQL is different from the other SAS procedures. I learned about Cartesian joins and how not to create thousands and thousands of unwanted rows of output. I learned to test joins on subsets of my tables and to include commas to separate the columns in the SELECT clause. Today, it is the first tool I reach for, instead of the previously mentioned procedures.

If you are not using PROC SQL, consider these reasons for taking the time to learn it: PROC SQL is easy and efficient to use for summarizing data and remerging the summarized data with the original data. PROC SQL is a natural for complex joins (merges) and has the ability to query up to 32 tables (data sets) simultaneously. SQL is a standardized, widely used language in data processing. By knowing SQL, you are more marketable and you know a common language to communicate with your data processing colleagues.

This example of PROC SQL gets the summary statistics by Status and Year for the Student Grades report. To get the overall summary, run the query without the GROUP BY clause and without the Status and Year columns.

proc sql;

select Status, Year,

COUNT(*) as n,

Avg(Score) as avg format=6.2,

STD(Score) as std format=6.2,

MIN(Score) as min,

MAX(Score) as max

from Grade

group by Status, Year;

QUIT;
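The overall summary described above can be sketched as follows, assuming the GRADE data set: it is the same query with the Status and Year columns and the GROUP BY clause removed, so the statistics are computed over all rows.

```sas
proc sql;
   select count(*)   as n,
          avg(Score) as avg format=6.2,
          std(Score) as std format=6.2,
          min(Score) as min,
          max(Score) as max
   from Grade;
quit;
```

PROC SQL executes each statement as soon as it is submitted, so QUIT, not RUN, ends the procedure.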

Figure 13. SQL Results for Student Grades

OUTPUT

DATA steps typically create or modify SAS data sets, but they can also be used to produce custom-designed reports. Procedures typically produce reports, but they can also create or modify SAS data sets. When you can, let a procedure do some of the work for you by using the output data set capabilities of procedures.
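One such capability is the ODS OUTPUT statement, which saves a table that a procedure displays as a SAS data set. A minimal sketch that captures the one-way frequency table from PROC FREQ; OneWayFreqs is the ODS table name PROC FREQ uses for one-way tables, and WtCounts is an arbitrary data set name.

```sas
ods output OneWayFreqs=WtCounts;   /* save the displayed table as WORK.WTCOUNTS */
proc freq data=sashelp.heart;
   where status = 'Alive';
   tables weight_status;
run;

proc print data=WtCounts noobs;
run;
```

If you do not know a table's name, submitting ODS TRACE ON before the procedure writes the names of its output objects to the log.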

CREATING A SIMPLE DATA SET USING THE PROC MEANS PROCEDURE

PROC MEANS uses the OUTPUT statement to put the generated statistics in a SAS data set instead of, or in addition to, producing a report. The OUTPUT statement controls which variables and observations are placed into the data set. The NOPRINT option on the PROC statement suppresses the printed report. In the following example, the output SAS data set is named HeartStudy. The MEAN and MEDIAN statistics are generated, and the AUTONAME option tells the procedure to name the variables for the statistics automatically. The names of the generated variables are Height_Mean, Weight_Mean, Height_Median, and Weight_Median.

proc means data=sashelp.heart noprint;

class sex;

var height weight;

output out=HeartStudy

mean= median=/autoname;

run;

/* printing the Proc Means output data set */

proc print data=HeartStudy;

title 'Framingham Heart Study';

run;
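The output data set also contains the variables _TYPE_ and _FREQ_ and, because of the CLASS statement, an overall row with _TYPE_=0. As a sketch, the NWAY option keeps only the rows for each value of Sex, and naming the statistics explicitly replaces the AUTONAME names; HeartStudy2, MeanHt, MeanWt, MedHt, and MedWt are arbitrary names.

```sas
proc means data=sashelp.heart noprint nway;
   class sex;
   var height weight;
   output out=HeartStudy2
          mean(height weight)=MeanHt MeanWt
          median(height weight)=MedHt MedWt;
run;

proc print data=HeartStudy2;
run;
```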

Figure 14. Printout Of Output Data Set Generated By PROC MEANS

OUTPUT DELIVERY SYSTEM

It is easy to add some excitement and interest to your output by using the destinations and styles available with the Output Delivery System. ODS controls the look of all procedure output. Each piece of output is composed of two component objects: a data component, which contains the data values, and a template component, which describes how the piece should look.

You can help the reader by using appropriate destinations and styles for your results. Include ODS statements to specify other destinations, such as the standard listing, HTML for web pages, output data sets, Rich Text Format (RTF), PDF, PCL, XML, Excel (via the ExcelXP tagset), and more. ODS has been designed so that multiple output destinations can be created at the same time.

Depending on the destination, you can include links to HTML pages, logos, and a table of contents. Styles range from simple to elaborate, and you can even create your own styles by building a template with PROC TEMPLATE.
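As a sketch of sending one procedure's output to two destinations at once, each with a style: the file names are placeholders, and JOURNAL is one of the styles supplied with SAS.

```sas
ods pdf file='heart_means.pdf' style=journal;   /* placeholder file names */
ods rtf file='heart_means.rtf' style=journal;

proc means data=sashelp.heart n mean maxdec=1;
   class sex;
   var weight;
run;

/* close each destination so the files are written completely */
ods rtf close;
ods pdf close;
```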

ODS gives you many features and capabilities, such as traffic lighting illustrated below.

Check out Haworth's paper “ODS Tips & Tricks” (WUSS 2010). Lauren Haworth, Cynthia Zender, and Michele Burlew wrote a comprehensive ODS book called Output Delivery System: The Basics and Beyond. It is available on Amazon in paperback and Kindle editions.

Figure 15. Listing Output

Traffic Lighting

Notice the pink and blue cells in the HTML output shown above. It uses a pink background for the classification of females and blue for males. This is called traffic lighting. It is done by defining formats for the different cell values and using the STYLE option on the VAR statement. Here is an example of traffic lighting with the Framingham Heart Study data. We are using it to highlight diastolic and systolic blood pressure values of concern.

Let's look at how easy it is to do this. First use PROC FORMAT to create the user defined formats DIAFMT and SYSFMT with the colors for the different values of the variables diastolic and systolic.

Next, add the ODS statements around the procedure to send the output to the HTML destination. Finally, add the STYLE option to the VAR statement so that the background is printed in a color that depends on the value of the variable.

Color coding the values gives the reader more information. Use the FOOTNOTE statement to explain the color coding.

proc format;   /* PROC FORMAT defines the background colors */
   value diafmt
      0-140     = 'White'
      140<-high = 'Cyan';
   value sysfmt
      0-<80, 210<-high = 'Pink'
      80-210           = 'White';
run;

ods html file='print.htm' style=analysis;

proc print data=sashelp.heart label split='*';
   label diastolic='Diastolic>140';
   label systolic='-- OR --*Systolic*(not 80-210)';
   where status = 'Alive' and
         ((diastolic > 140) or
          ((systolic not between 80 and 210) and
           (systolic is not missing)));
   /* The STYLE option on the VAR statements sets the cell colors */
   var diastolic / style(data)=[background=diafmt. font_weight=bold];
   var systolic  / style(data)=[background=sysfmt. font_weight=bold];
   var cholesterol;
run;

ods html close;

Figure 16. HTML Output With Traffic Lighting


Figure 17. Framingham Heart Study Results with Traffic Lighting
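A minimal sketch of the FOOTNOTE statement explaining the color coding follows. In the traffic-lighting program, the FOOTNOTE statements would go before the PROC PRINT step; to keep this sketch short it omits the formats and STYLE options. The footnote wording and the file name footnote.htm are illustrative assumptions.

```sas
/* Sketch: FOOTNOTE lines explaining the traffic-lighting colors.
   The wording and the file name are illustrative. */
ods html file='footnote.htm' style=analysis;

footnote1 'Cyan background: diastolic pressure above 140';
footnote2 'Pink background: systolic pressure below 80 or above 210';

proc print data=sashelp.heart(obs=5) label;
   var diastolic systolic cholesterol;
run;

footnote;   /* a FOOTNOTE statement with no text clears all footnotes */
ods html close;
```

Clearing the footnotes at the end keeps them from appearing on later reports in the same session.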

Output to Excel with a tagset

When I worked in the insurance industry, my manager wanted the results of my analysis in an Excel spreadsheet.

There are many destinations you can use to create an Excel spreadsheet. At first I just used HTML, which can be opened in Excel. After opening the spreadsheet, I had to resize the columns, change the fonts and do other manual steps to make it more readable. Then I heard a presentation on the TAGSETS.EXCELXP destination, which gives you control over many aspects of the spreadsheet, such as column widths, page layout, formats and colors.

ods tagsets.excelxp style=minimal
    file='c:\mylib\scores.xls';

proc means data=sashelp.heart
     n nmiss mean median missing maxdec=0;
   class sex;
   var weight;
run;

ods tagsets.excelxp close;

Figure 18. Excel Output Using the TAGSETS.EXCELXP Destination
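The control over the worksheet comes from tagset options given in the OPTIONS( ) suboption. The sketch below is one possible setup, not the only one: the file name weight.xml, the sheet name and the six column widths (one per column of this PROC MEANS table: Sex, N Obs, N, N Miss, Mean, Median) are illustrative assumptions.

```sas
/* Sketch: ExcelXP tagset options; file name and values are illustrative */
ods tagsets.excelxp file='weight.xml' style=minimal
    options(sheet_name='Weight by Sex'
            embedded_titles='yes'
            absolute_column_width='8,8,8,8,8,8');

title 'Weight by Sex';
proc means data=sashelp.heart n nmiss mean median maxdec=0;
   class sex;
   var weight;
run;
title;

ods tagsets.excelxp close;
```

Specifying options(doc='help') on the ODS TAGSETS.EXCELXP statement writes the full list of tagset options to the SAS log.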

USE GRAPHS

Effective graphs clarify and focus the data. Ideally they provide information quickly and unambiguously. A Google search on “sas effective graphs” turns up many SAS papers on designing effective graphs. Tufte's book The Visual Display of Quantitative Information, 2nd edition, shows how to display data for precise, effective, quick analysis. Borrow it from your local library or add it to your own collection.

Explore your data visually. Graphs let you see unusual values in the data and can easily show patterns and properties of the data. PROC PLOT and PROC CHART produce low-resolution printer graphs. Procedures such as PROC FREQ and PROC UNIVARIATE have options to produce graphics, and ODS Statistical Graphics lets you create simple and complex statistical graphs easily.
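As a small sketch of a low-resolution chart, the PROC CHART step below draws one vertical bar per category, assuming the output goes to the LISTING destination. WEIGHT_STATUS is the same SASHELP.HEART variable used in the PROC FREQ example later in this section.

```sas
/* Sketch: a low-resolution vertical bar chart with PROC CHART,
   written to the LISTING destination */
ods listing;
proc chart data=sashelp.heart;
   vbar weight_status;   /* one bar per WEIGHT_STATUS category */
run;
```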

A SIMPLE PLOT TO SEE OUTLIERS

This simple plot is enough to show the outliers.

PROC PLOT DATA=sashelp.heart;
   PLOT Height*Weight;
run;
quit;   /* PROC PLOT is interactive; QUIT ends it */

HISTOGRAMS WITH PROC UNIVARIATE

Many procedures, including UNIVARIATE, can also produce graphs of the results.

This example is from the UNIVARIATE procedure chapter of Base SAS® 9.2 Procedures Guide: Statistical Procedures, Third Edition. Search for "univariate Two-Way Comparative Histogram" on the SAS website to find the example.

This example shows how easy it is to compare data from two suppliers over two years graphically with PROC UNIVARIATE. Adding a HISTOGRAM statement to PROC UNIVARIATE produces a graph that helps in understanding the data. The DISK data set is created by a DATA step in that documentation example.

proc format;
   value mytime 1 = '2002' 2 = '2003';
run;

proc univariate data=disk noprint;
   class supplier year / keylevel = ('Supplier A' '2003');
   histogram width / intertile = 1.0
                     vaxis     = 0 10 20 30
                     ncols     = 2
                     nrows     = 2;
   title 'Results of Supplier Training Program';
run;
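Because the DISK data set comes from the documentation, here is a sketch you can run as-is on SASHELP.HEART. The choice of CHOLESTEROL, the CLASS variable SEX and the NORMAL option (which overlays a fitted normal curve) are illustrative assumptions, not part of the documentation example.

```sas
/* Sketch: comparative histograms on SASHELP.HEART, which needs no
   extra DATA step (unlike the DISK data set above) */
ods graphics on;

proc univariate data=sashelp.heart noprint;
   class sex;                        /* one panel per value of SEX   */
   histogram cholesterol / normal;   /* overlay a fitted normal curve */
run;

ods graphics off;
```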

FREQUENCY PLOTS WITH PROC FREQ

Many SAS programmers use only the basic features of the FREQ procedure to evaluate their data, and many valuable features of PROC FREQ go unused even by experienced SAS users. Before turning to more complex procedures, check out what is available in PROC FREQ.

The FREQ procedure can produce frequency plots, cumulative frequency plots, deviation plots, odds ratio plots, and kappa plots by using ODS Graphics. The following example produces a dot plot of the frequency of each WEIGHT_STATUS category.

Figure 19. PLOT with Outliers

Figure 20. PROC UNIVARIATE Histogram


ods _all_ close;
ods graphics on;
ods html path='c:\myoutput' (url=none)
         body="barGraph.htm"
         style=analysis;

proc freq data=sashelp.heart;
   where status = 'Alive';
   tables weight_status / plots=freqplot(type=dot);
run;

ods html close;
ods graphics off;
ods listing;

ODS GRAPHICS

Programmers with SAS/GRAPH experience will be amazed at how simple it is to create complex statistical graphs with ODS Statistical Graphics. New programmers may not be amazed because they do not know the drudgery of writing SAS/GRAPH code to get the results and layout they want. To learn more, see the on-demand webinar Getting Started with ODS Statistical Graphics in SAS® 9.2 by Bob Rodriguez, a statistician and Fellow of the American Statistical Association, whose talk is enlightening. Starting with SAS 9.3, ODS Graphics no longer requires a SAS/GRAPH license.

Here's an example:

ods graphics on;
ods html style=statistical;
ods select parameterestimates fitplot;

proc reg data=sashelp.class;
   model weight=height;
quit;

ods html close;
ods graphics off;

KEEP LEARNING

Never assume you know everything about a procedure. As with any new tool, such as a camera or an iPad, spend time getting to know the procedure. Review a procedure a few weeks or months after you first start using it to discover useful features and capabilities. Check out the SAS Press books and the many conference papers presented over the years to learn more about a procedure.

Read "What's New" when your company installs a new version of SAS software to learn about the enhancements and features of the new version. For example, SAS 9.2 makes it easy to sort data alphabetically regardless of case.

Figure 21. FREQ: Frequency Plot

Figure 22. ODS Graphics


Before SAS 9.2, to sort a variable regardless of case, you used either PROC SQL or a two-step process: create an uppercase version of the BY variable in a DATA step, and then sort by the new variable.
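The two-step process can be sketched as follows, using the same MAPS.NAMES data set and WHERE condition as the examples in this section. UP_TERRITORY and NAMES_UP are hypothetical names chosen for illustration.

```sas
/* Sketch of the pre-9.2 two-step approach.
   UP_TERRITORY and NAMES_UP are hypothetical names. */
data names_up;
   set maps.names;
   where Territory contains "France";
   up_territory = upcase(Territory);   /* uppercase copy of the BY variable */
run;

proc sort data=names_up out=territories;
   by up_territory;                    /* sort on the uppercase copy */
run;
```

The extra variable can be dropped afterward if it is not needed in the report.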

This example uses PROC SQL with the UPCASE function to sort data alphabetically regardless of case.

proc sql;
   create table territories as
   select Territory, Name
   from maps.names
   where Territory contains "France"
   order by upcase(Territory);
quit;

There is a simpler solution with the SORT procedure in SAS 9.2. SAS Usage Note 31369, Sorting Text Without Regard to Case in SAS 9.2, explains how to use the SORTSEQ=LINGUISTIC option with PROC SORT to treat uppercase and lowercase letters as equal.

proc sort data=maps.names out=territories
          sortseq=linguistic(strength=primary);
   by Territory name;
   where Territory contains "France";
run;

SAS examples are another resource for learning. You can access the SAS examples and Sample Library programs from within the SAS windowing environment or at support.sas.com.

From the menu bar of the windowing environment, select Help ⇒ SAS Help and Documentation. In the left pane, select the Contents tab. In the list of contents, select Learning to Use SAS ⇒ Sample SAS Programs.

Figure 23. SAS Help and Documentation Examples

REFERENCES

SAS Institute Inc. 2001. Step-by-Step Programming with Base SAS® Software. Cary, NC: SAS Institute Inc.

SAS Institute Inc. SAS Usage Note 31369: Sorting Text Without Regard to Case in SAS 9.2.

SAS Support site: http://support.sas.com

Rodriguez, Bob. Getting Started with ODS Statistical Graphics in SAS® 9.2. On-demand webinar.


CONCLUSION

This paper has only scratched the surface of a few of the many basic procedures available to analyze your data and generate reports. Each procedure has many options and additional statements for controlling the analysis and output. Once you have used a procedure for a short while, revisit the SAS documentation to learn more about it, and do an Internet search on “SAS” and the procedure name to find useful papers by other SAS users. With each new release installed at your workplace, visit the SAS website and read what’s new for the release.

While on vacation in Ireland, I bought a plaque that reads:

I am still learning.

Michelangelo (age 87)

With SAS’s wealth of rich features and the enhancements in each new release, you will find that no matter how long you have been using SAS, you are still saying “I am still learning.”

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Helen Carey E-mail: [email protected] Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.