  • 7/27/2019 SAS Short Course Presentation 11-4-09

    November 4, 2009

    Introduction to SAS

    LISA Short Course Series

    Mark Seiss, Dept. of Statistics

    Reference Material

    The Little SAS Book, Delwiche and Slaughter

    SAS Programming I: Essentials

    SAS Programming II: Manipulating Data with the DATA Step

    Presentation and Data

    http://www.lisa.stat.vt.edu/?q=node/167

    Presentation Outline

    1. Introduction to the SAS Environment

    2. Working With SAS Data Sets

    3. Summary Procedures

    4. Basic Statistical Analysis Procedures

    Presentation Outline

    Questions/Comments

    Introduction to the SAS Environment

    1. SAS Programs

    2. SAS Data Sets and Data Libraries

    3. Creating SAS Data Sets

    SAS Programs

    File extension: .sas

    Editor window has four uses:
      - Access and edit existing SAS programs
      - Write new SAS programs
      - Submit SAS programs for execution
      - Save SAS programs

    SAS program: a sequence of steps that the user submits for execution

    Submitting SAS programs
      - Entire program
      - Selection of the program

    SAS Programs

    Syntax rules for SAS statements
      - Free-format: can use upper or lower case
      - Usually begin with an identifying keyword
      - Can span multiple lines
      - Always end with a semicolon
      - Multiple statements can be on the same line

    Errors
      - Misspelled keywords
      - Missing or invalid punctuation (a missing semicolon is common)
      - Invalid options
      - Indicated in the Log window

    SAS Programs

    Two basic steps in SAS programs:

    DATA steps
      - Typically used to create SAS data sets and manipulate data
      - Begin with a DATA statement

    PROC steps
      - Typically used to process SAS data sets
      - Begin with a PROC statement

    The end of a DATA or PROC step is indicated by:
      - RUN statement: most steps
      - QUIT statement: some steps
      - Beginning of another step (DATA or PROC statement)

    SAS Programs

    Output generated from a SAS program appears in two windows:

    SAS log
      - Information about the processing of the SAS program
      - Includes any warnings or error messages
      - Accumulates in the order the DATA and PROC steps are submitted

    SAS output
      - Reports generated by the SAS procedures
      - Accumulates in the order the output is generated

    SAS Data Sets and Data Libraries

    SAS data set
      - Specifically structured file that contains data values
      - File extension: .sas7bdat
      - Rows-and-columns format, similar to Excel
          - Columns: variables in the table, corresponding to fields of data
          - Rows: a single record or observation
      - Two types of variables
          - Character: contain any value (letters, numbers, symbols, etc.)
          - Numeric: floating point numbers
      - Located in SAS data libraries

    SAS Data Sets and Data Libraries

    SAS data libraries
      - Contain SAS data sets
      - Identified by assigning a library reference name (libref)

    Temporary
      - Work library
      - SAS data files are deleted when the session ends
      - Library reference name not necessary

    Permanent
      - SAS data sets are saved after the session ends
      - SASUSER library
      - You can create and access your own libraries

    SAS Data Sets and Data Libraries

    SAS data libraries cont.

    Assigning library references
      Syntax: LIBNAME libref 'SAS-data-library';

    Rules for library references
      - 8 characters or fewer
      - Must begin with a letter or underscore
      - Other characters are letters, numbers, or underscores

    SAS Data Sets and Data Libraries

    SAS data libraries cont.

    Identifying SAS data sets within SAS data libraries:
        libref.filename

    Accessing SAS data sets within SAS data libraries
    Example:
        DATA new_data_set;
          SET libref.filename;
        RUN;

    Creating SAS data sets within SAS data libraries
    Example:
        DATA libref.filename;
          SET old_data_set;
        RUN;

    Creating SAS Data Sets

    Creating a SAS data set from raw data: 4 methods

    1. Importing existing raw data in SAS program

    2. Manually entering raw data in SAS program

    3. Importing existing data sets using Import menu option

    4. Manually entering raw data using Table Editor

    Creating SAS Data Sets

    Importing existing raw data in a SAS program

    1. Start the DATA step and name the SAS data set to be created (include the SAS data library it will be stored in)
         DATA libref.SAS-data-set;

    2. Identify the file that contains the raw data (.dat file)
         INFILE 'raw-data-filename';

    3. Provide instructions on how to read data from the raw data file
         INPUT input-specifications;
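
    The three steps above can be combined into one short program. This is a minimal sketch, assuming a hypothetical library folder and a hypothetical space-delimited file people.dat with a name, an age, and a height on each line; neither path comes from the course materials.

```sas
/* Assumption: both paths are hypothetical; point them at locations on your system */
LIBNAME course 'C:\sas_course\data';

DATA course.people;                          /* Step 1: name the data set and its library */
   INFILE 'C:\sas_course\raw\people.dat';    /* Step 2: identify the raw data file        */
   INPUT Name $ Age Height;                  /* Step 3: describe how to read each line    */
RUN;
```

    After the step runs, the Log window should report how many records were read from the file and how many observations were written.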

    Creating SAS Data Sets

    Input specifications
      - Specify the names of the SAS variables in the new data set
      - Specify whether the SAS variables are character or numeric
      - Identify the locations of the variables in the raw data file

    Input styles
      - List input
      - Column input
      - Formatted input
      - Mixed input

    Creating SAS Data Sets

    List input
      - Used when raw data values are separated by spaces
      - All data in a row must be read in
      - All missing data must be indicated by a period
      - Simple character data: no embedded spaces, no lengths greater than 8

    INPUT statement
      - Simply list variables after the INPUT keyword in the order they appear in the file
      - If a variable is character, place a $ after the variable name

    Example: INPUT Name $ City $ Age Height Weight Sex $;
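
    As a runnable illustration of list input, the sketch below feeds the example INPUT statement two made-up records through DATALINES. The data set name people and the data values are assumptions for illustration only; note the period standing in for the missing age.

```sas
DATA people;
   INPUT Name $ City $ Age Height Weight Sex $;
   DATALINES;
Alice Roanoke 34 65.5 130 F
Bob Salem . 70.0 185 M
;
RUN;

/* Print the result; Bob's Age is shown as a missing value (.) */
PROC PRINT DATA=people;
RUN;
```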

    Creating SAS Data Sets

    Column input
      - Used when the raw data file does not have delimiters between values (large data sets)
      - Each variable's values are found in the same columns in each row
      - Numeric data must be standard: numbers, decimals, signs, and scientific notation only

    Advantages
      - No spaces required
      - Missing values can be left blank
      - Character data can have embedded spaces
      - Ability to skip unwanted variables

    Creating SAS Data Sets

    Column input cont.

    INPUT statement
      - Numeric variables: list the variable name, then the column or range of columns where the variable is found in the raw data file
      - Character variables: list the variable name, a dollar sign, and then the column or range of columns

    Example: INPUT Name $ 1-10 Age 26-28 Sex $ 35;
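
    The column-input example can be tried with in-stream data. In this sketch the records are invented, and the data lines must start in column 1 of the editor so that the column ranges line up. The city in columns 11-25 is deliberately not read, which shows how unwanted fields are skipped and how a name can contain an embedded space.

```sas
DATA residents;
   /* Name in columns 1-10, Age in columns 26-28, Sex in column 35 */
   INPUT Name $ 1-10 Age 26-28 Sex $ 35;
   DATALINES;
Mary Ann  Blacksburg      42      F
Tom       Radford          7      M
;
RUN;

PROC PRINT DATA=residents;
RUN;
```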

    Creating SAS Data Sets

    Formatted input
      - Appropriate for reading:
          - Data in fixed columns
          - Standard and nonstandard character and numeric data
          - Calendar values to be converted to SAS date values
      - Reads data in using SAS informats
          - An informat is an instruction that SAS uses to read in data values

    General forms
      - Character: $informatw.
      - Numeric: informatw.d
      - Date: informatw.

    Creating SAS Data Sets

    Formatted input cont.

    Character informats
      - $w.: character string with a width of w, trims leading blanks
      - $CHARw.: character string with a width of w, does not trim leading or trailing blanks

    Numeric informats
      - w.d: standard numeric data with width w and d digits after the decimal point
          Raw data value = 1234567, informat = 8.2, SAS data value = 12345.67
      - COMMAw.d: numeric data with embedded commas
          Raw data value = 1,000,001, informat = COMMA10., SAS data value = 1000001
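
    Both numeric informat examples can be checked in one step. This is a small sketch with hypothetical variable names; the data line must begin in column 1, with one leading blank so that the first value fills columns 1-8 and the second fills columns 9-18.

```sas
DATA informat_demo;
   /* 8.2 reads columns 1-8 and, because the raw value has no decimal point, */
   /* treats the last two digits as decimals; COMMA10. reads columns 9-18    */
   INPUT Amount 8.2 Population COMMA10.;
   DATALINES;
 1234567 1,000,001
;
RUN;

/* Amount should print as 12345.67 and Population as 1000001 */
PROC PRINT DATA=informat_demo;
RUN;
```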

    Creating SAS Data Sets

    Formatted input cont.

    SAS date values
      - Stored as numeric data
      - Number of days between January 1, 1960 and the specified date
      - Informats are used to read and convert the dates

    Raw Data Value    Informat
    11/04/2009        MMDDYY10.
    11/04/09          MMDDYY8.
    04NOV2009         DATE9.
    04/11/2009        DDMMYY10.
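
    To see that all four raw values in the table describe the same day, the sketch below reads them with the listed informats. The variable names d1-d4 are hypothetical, and the +1 pointer control simply skips the blank between fields. Only d1 is given a display format, so the other three print as the underlying day count.

```sas
DATA date_demo;
   INPUT d1 MMDDYY10. +1 d2 MMDDYY8. +1 d3 DATE9. +1 d4 DDMMYY10.;
   FORMAT d1 DATE9.;   /* display d1 as a calendar date; d2-d4 stay as raw numbers */
   DATALINES;
11/04/2009 11/04/09 04NOV2009 04/11/2009
;
RUN;

/* d1 should print as 04NOV2009; d2, d3, and d4 should each print as 18205, */
/* the number of days from January 1, 1960 to November 4, 2009              */
PROC PRINT DATA=date_demo;
RUN;
```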

    Creating SAS Data Sets

    Formatted input cont.

    The columns read are determined by the starting point and the width of the informat

    Example:
        INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;

      - Name: character of length 10, columns 1-10
      - Age: numeric with length 3, columns 11-13
      - Height: numeric with length 5 (including the decimal point) and one decimal place (120.9, for instance), columns 14-18
      - BirthDate: date in MMDDYY form (11-04-2009, for instance), columns 19-28

    Creating SAS Data Sets

    Formatted input cont.

    Pointer controls
      - +n moves the pointer forward n positions
      - @n moves the pointer to column n

    Example:
        INPUT Flight 3. +4 Date mmddyy8. @20 Destination $3.;

      - Flight: numeric of length 3, columns 1 through 3
      - Date: date in mmddyy form (11/04/09) of length 8, columns 8 through 15
      - Destination: character of length 3, columns 20 through 22

    Creating SAS Data Sets

    Mixed input styles
      - Mix and match the previous 3 input styles

    Example:
        Raw data: Great Smoky Mountains NC/TN 1926 520,269
        INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;

      - ParkName: character of length 22, columns 1 through 22
      - State: character, separated by spaces
      - Year: numeric, separated by spaces
      - Acreage: numeric with informat COMMA9., starts in column 40

    Creating SAS Data Sets

    Manually entering raw data in a SAS program

    1. Start the DATA step and name the SAS data set to be created
         DATA library.SAS-data-set;

    2. Provide instructions on how to read the data
         INPUT input-specifications;

    3. Manually enter the raw data
         DATALINES;

    Creating SAS Data Sets

    Manually entering raw data in a SAS program: example

    Data uspresidents;

    INPUT President $ Party $ Number;

    DATALINES;

    Adams F 2

    Lincoln R 16

    Grant R 18

    Kennedy D 35

    ;

    Run;

    Creating SAS Data Sets

    Using the Import Data menu option

    1. File > Import Data
    2. Standard data source: select the file format
    3. Specify the file location or Browse to select the file
    4. Create a name for the new SAS data set and specify its location

    Creating SAS Data Sets

    Compatible file formats

    Microsoft Excel Spreadsheets

    Microsoft Access Databases

    Comma-Separated Files (.csv)

    Tab-Delimited Files (.txt)

    dBASE Files (.dbf)

    JMP data sets

    SPSS Files

    Lotus Spreadsheets

    Stata Files

    Paradox Files

    Creating SAS Data Sets

    Enter raw data directly into a SAS data set

    1. Tools > Table Editor
    2. Enter data manually into the table
       - Observations in each row
       - Variables in each column
    3. Left-click a column > Column Attributes
       - Variable Name, Variable Label, Type (Character/Numeric), Format, Informat
       Note: Informats determine how raw data is read. Formats determine how a variable is displayed.
    4. Close the window > Save Changes > Yes
       - Specify the file name and directory

    Introduction to the SAS Environment

    Questions/Comments

    Working With SAS Data Sets

    1. Data Set Manipulation

    2. Data Set Processing

    3. Combining Data Sets

    A. Concatenating/Appending

    B. Merging

    Data Set Manipulation

    Create a new SAS data set using an existing SAS data set as input
      - Specify the name of the new SAS data set in the DATA statement
      - Use the SET statement to identify the SAS data set being read

    Syntax:
        DATA output_data_set;
          SET input_data_set;
          additional SAS statements;
        RUN;

    By default, the SET statement reads all observations and variables from the input data set into the output data set.

    Data Set Manipulation

    Assignment statements
      - Evaluate an expression
      - Assign the resulting value to a variable
      General form: variable = expression;
      Example: miles_per_hour = distance/time;

    SAS functions
      - Perform arithmetic functions, compute simple statistics, manipulate dates, etc.
      General form: variable = function_name(argument1, argument2, ...);
      Example: Time_worked = sum(Day1, Day2, Day3, Day4, Day5);

    Data Set Manipulation

    Selecting variables
      - Use DROP and KEEP to determine which variables are written to the new SAS data set
      - Two ways:

    DROP and KEEP as statements (no equals sign)
      Form: DROP Variable1 Variable2;
            KEEP Variable3 Variable4 Variable5;

    DROP= and KEEP= as data set options in the SET statement
      Form: SET input_data_set (KEEP=Var1);
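
    The sketch below puts a function call and both variable-selection forms side by side. The timesheet data set and its values are invented for illustration. One detail worth noticing is that the statement form takes no equals sign, while the data set option form does.

```sas
DATA timesheet;
   INPUT Employee $ Day1-Day5;
   DATALINES;
Ann 8 8 7 8 6
Raj 8 . 8 8 8
;
RUN;

DATA weekly;
   SET timesheet;
   Time_worked = SUM(Day1, Day2, Day3, Day4, Day5);  /* SUM skips missing values */
   DROP Day1 Day2 Day3 Day4 Day5;                    /* statement form           */
RUN;

DATA names_only;
   SET timesheet (KEEP=Employee);                    /* data set option form     */
RUN;
```

    With these values, Time_worked should be 37 for Ann and 32 for Raj, because the SUM function ignores Raj's missing Day2 instead of returning a missing total.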

    Data Set Manipulation

    Conditional processing
      - Uses IF-THEN-ELSE logic

    General form:
        IF expression THEN statement;
        ELSE IF expression THEN statement;
        ELSE statement;

    The expression is a true/false statement, such as:
      - Day1=Day2, Day1 > Day2, Day1 < Day2
      - Day1+Day2=10
      - Sum(day1,day2)=10
      - Day1=5 and Day2=5

    Data Set Manipulation

    Conditional processing: comparison operators

    Symbolic    Mnemonic    Example
    =           EQ          IF region='Spain';
    ~= or ^=    NE          IF region ne 'Spain';
    >           GT          IF rainfall > 20;
    <           LT          IF rainfall lt 20;
    >=          GE          IF rainfall ge 20;

    Data Set Manipulation

    Conditional processing cont.
      - If the expression is true, the statement is processed
      - ELSE IF and ELSE are only processed if the preceding expression is false
      - Only one statement can be specified using this form
      - Use DO and END statements to execute a group of statements

    General form:
        IF expression THEN DO;
          statements;
        END;
        ELSE DO;
          statements;
        END;
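
    A small worked case may help make the DO/END form concrete. The rain data set, its values, and the variables Climate and Wet are assumptions made up for this sketch; each branch sets two variables, which is why the DO group is needed.

```sas
DATA rain;
   INPUT Region $ Rainfall;
   DATALINES;
Spain 15
France 25
;
RUN;

DATA rain_flag;
   SET rain;
   LENGTH Climate $ 3;          /* declare the character length before first use */
   IF Rainfall GE 20 THEN DO;   /* several statements run when the test is true  */
      Climate = 'Wet';
      Wet = 1;
   END;
   ELSE DO;
      Climate = 'Dry';
      Wet = 0;
   END;
RUN;
```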

    Data Set Manipulation

    Subsetting rows (observations)
      - We will look at two ways:
          - Using the IF statement
          - Using the WHERE= option in the SET statement

    IF statement
      - Only writes observations to the new data set for which an expression is true
      General form: IF expression;
      Examples: IF career = 'Teacher';
                IF sex ne 'M';
      - In the second example, only observations where sex is not equal to M will be written to the output data set

    Data Set Manipulation

    Subsetting rows (observations) cont.

    WHERE= option in the SET statement
      - Use the option to read only the rows from the input data set for which the expression is true
      General form: SET input_data_set (where=(expression));
      Example: SET vacation (where=(destination='Bermuda'));
      - Only observations where the destination equals Bermuda will be read from the input data set

    Comparison
      - The resulting output data set is equivalent
      - IF statement: all rows are read from the input data set
      - WHERE= option: only rows where the expression is true are read from the input data set
      - The difference in processing time matters when working with big data sets
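
    The two subsetting approaches can be compared directly. In this sketch the vacation records are invented; both steps should produce the same two-row result, and the notes in the Log window should show the difference in how many observations each step read from the input.

```sas
DATA vacation;
   INPUT Traveler $ Destination $;
   DATALINES;
Lee Bermuda
Kim Aruba
Pat Bermuda
;
RUN;

/* Subsetting IF: every row is read, then non-matching rows are discarded */
DATA bermuda_if;
   SET vacation;
   IF Destination = 'Bermuda';
RUN;

/* WHERE= option: only matching rows are read from the input data set */
DATA bermuda_where;
   SET vacation (WHERE=(Destination='Bermuda'));
RUN;
```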

    Data Set Manipulation

    PROC SORT sorts data according to specified variables

    General form:
        PROC SORT DATA=input_data_set options;
          BY Variable1 Variable2;
        RUN;

      - Sorts data according to Variable1 and then Variable2
      - By default, SAS sorts data in ascending order
          - Numbers low to high
          - A to Z
      - Use the DESCENDING keyword before a variable in the BY statement for numbers high to low and letters Z to A
          BY City DESCENDING Population;
      - SAS sorts data first by City A to Z and then by Population high to low

    Data Set Manipulation

    Some options

    NODUPKEY
      - Eliminates observations that have the same values for the BY variables

    OUT=output_data_set
      - By default, PROC SORT replaces the input data set with the sorted data set
      - Using this option, PROC SORT creates a newly sorted data set and the input data set remains unchanged
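
    The options above and the DESCENDING keyword can be combined in one call. This is a sketch on a made-up cities data set that contains one exact duplicate of the BY values.

```sas
DATA cities;
   INPUT City $ Population;
   DATALINES;
Salem 25000
Salem 25000
Radford 16000
Salem 24000
;
RUN;

/* OUT= leaves cities untouched; NODUPKEY drops the repeated Salem/25000 row */
PROC SORT DATA=cities OUT=cities_sorted NODUPKEY;
   BY City DESCENDING Population;
RUN;

PROC PRINT DATA=cities_sorted;
RUN;
```

    The sorted copy should hold three rows: Radford 16000, Salem 25000, and Salem 24000.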

    Data Set Processing

    DATA steps read in data from existing data sets or raw data files one row at a time, like a loop

    The DATA step reads data from the input data set in the following way:
      1. Read the current row from the input data set into the Program Data Vector (PDV)
      2. Process the SAS statements
      3. Write the PDV to the output data set
      4. Set the current row to the next row in the input data set
      5. Return to Step 1

    Only one row at a time is processed, so we cannot simply add the value of a variable in one row to the value in another row

    Data Set Processing

    Example: let the following be the input data set dfwlax:

    Flight Date Dest FirstClass Economy

    439 14955 LAX 20 137

    921 14955 DFW 15 131

    114 14956 LAX 15 85

    982 14956 DFW 5 196

    439 14957 LAX 14 116

    982 14957 DFW 20 166

    Data Set Processing

    Example: consider the following submitted code:

        DATA onboard;
          SET dfwlax;
          Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
        RUN;

    Data Set Processing

    Example: execution of the DATA step

    The statement marked "Current" is the one being executed. Each step shows the contents of the PDV and the rows written so far to the output data set Onboard. Both have the columns:

        Flight Date Dest FirstClass Economy Total FirstClassFull

    Iteration 1

    Current: SET dfwlax;
        PDV:     439 14955 LAX 20 137 . .
        Onboard: (no rows yet)

    Current: Total=FirstClass+Economy;
        PDV:     439 14955 LAX 20 137 157 .
        Onboard: (no rows yet)

    Current: IF FirstClass=20 THEN FirstClassFull=1;
        PDV:     439 14955 LAX 20 137 157 1
        Onboard: (no rows yet)

    Current: RUN;
        PDV:     439 14955 LAX 20 137 157 1
        Onboard: 439 14955 LAX 20 137 157 1

    Iteration 2

    Current: DATA onboard;
        PDV:     439 14955 LAX 20 137 . .
        Onboard: 439 14955 LAX 20 137 157 1

    Current: SET dfwlax;
        PDV:     921 14955 DFW 15 131 . .
        Onboard: 439 14955 LAX 20 137 157 1

    Current: Total=FirstClass+Economy;
        PDV:     921 14955 DFW 15 131 146 .
        Onboard: 439 14955 LAX 20 137 157 1

    Current: ELSE FirstClassFull=0;
        PDV:     921 14955 DFW 15 131 146 0
        Onboard: 439 14955 LAX 20 137 157 1

    Current: RUN;
        PDV:     921 14955 DFW 15 131 146 0
        Onboard: 439 14955 LAX 20 137 157 1
                 921 14955 DFW 15 131 146 0

    Combining Data Sets

    Concatenating (or appending)
      - Stacks each data set upon the other
      - If one data set does not have a variable that the other data sets do, the variable in the new data set is set to missing for the observations from that data set

    General form:
        DATA output_data_set;
          SET data1 data2;
        RUN;

      - PROC APPEND may also be used
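
    The missing-value behavior described above is easy to see on two tiny data sets that share only one variable. The data sets and values in this sketch are invented for illustration.

```sas
DATA data1;
   INPUT ID Score;
   DATALINES;
1 90
2 85
;
RUN;

DATA data2;
   INPUT ID Grade $;
   DATALINES;
3 A
;
RUN;

/* Rows from data1 come first, followed by rows from data2 */
DATA stacked;
   SET data1 data2;
RUN;

PROC PRINT DATA=stacked;
RUN;
```

    In the printed result, Grade should be blank for IDs 1 and 2, and Score should be missing (.) for ID 3.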

    Combining Data Sets

    Merging data sets

    One-to-One Match Merge
      - A single record in a data set corresponds to a single record in all other data sets
      - Example: patient and billing information

    One-to-Many Match Merge
      - Matching one observation from one data set to multiple observations in other data sets
      - Example: county and state information

    Note: Data must be sorted by the matching (BY) variables before merging can be done (PROC SORT)

    Combining Data Sets

    One-to-One Match Merge
      - Usually need at least one common variable between the data sets for matching purposes
      - For the example, a patient ID would be needed
      - Do not need a common variable if all data sets are in exactly the same order

    General form:
        DATA output_data_set;
          MERGE input_data_set1 input_data_set2;
          BY variable1 variable2;
        RUN;

    Combining Data Sets

    One-to-One Match Merge example:

    Performance
        Month Sales
        1 8223
        2 6034
        3 4220

    Goals
        Month Goal
        1 9000
        2 6000
        3 5000

    Code:
        DATA compare;
          MERGE performance goals;
          BY month;
          difference=sales-goal;
        RUN;

    Combining Data Sets

    One-to-One Match Merge Example cont.:

    Compare

    Month Sales Goal Difference

    1 8223 9000 -777

    2 6034 6000 34

    3 4220 5000 -780
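
    For readers who want to reproduce the Compare table, here is a self-contained sketch that builds both inputs with DATALINES. As an assumption for illustration, the goals rows are entered out of order so that the PROC SORT requirement is visible; everything else follows the example.

```sas
DATA performance;
   INPUT Month Sales;
   DATALINES;
1 8223
2 6034
3 4220
;
RUN;

DATA goals;            /* entered out of order on purpose */
   INPUT Month Goal;
   DATALINES;
3 5000
1 9000
2 6000
;
RUN;

/* Both inputs must be in BY-variable order before the merge */
PROC SORT DATA=goals;
   BY Month;
RUN;

DATA compare;
   MERGE performance goals;
   BY Month;
   Difference = Sales - Goal;
RUN;

PROC PRINT DATA=compare NOOBS;
RUN;
```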

    Combining Data Sets

    One-to-Many Match Merge
      - Requires at least one common variable in the data sets for matching purposes
      - For the example, state information is in both the state and county files
      - If two data sets have variables with the same name (other than the BY variables), the values from the second data set will overwrite those from the first

    General form:
        DATA output_data_set;
          MERGE Data1 Data2 Data3;
          BY Variable1 Variable2;
        RUN;

    Combining Data Sets

    One-to-Many Match Merge example:

    Videos
        Category Sales
        Aerobics 12.99
        Aerobics 13.99
        Aerobics 13.99
        Step 12.99
        Step 12.99
        Weights 15.99

    Adjustment
        Category Adjustment
        Aerobics .20
        Step .30
        Weights .25

    Code:
        DATA prices;
          MERGE videos adjustment;
          BY category;
          NewPrice=(1-adjustment)*sales;
        RUN;

    Combining Data Sets

    One-to-Many Match Merge example cont.:

    Prices

    Category Sales Adjustment NewPrice

    Aerobics 12.99 .20 10.39

    Aerobics 13.99 .20 11.19

    Aerobics 13.99 .20 11.19

    Step 12.99 .30 9.09

    Step 12.99 .30 9.09

    Weights 15.99 .25 11.99

    Working With SAS Data Sets

    Questions/Comments

    Summary Procedures

    1. Print Procedure

    2. Plot Procedure

    3. Univariate Procedure

    4. Means Procedure

    5. Freq Procedure

    Print Procedure

    PROC PRINT is used to print data to the output window
      - By default, prints all observations and variables in the SAS data set

    General form:
        PROC PRINT DATA=input_data_set options;
          optional SAS statements;
        RUN;

    Some options
      - input_data_set (obs=n): specifies the number of observations to be printed in the output
      - NOOBS: suppresses printing of observation numbers
      - LABEL: prints the labels instead of variable names

    Print Procedure

    Optional SAS statements

    BY variable1 variable2 variable3;
      - Starts a new section of output for every new value of the BY variables

    ID variable1 variable2 variable3;
      - Prints the ID variables on the left-hand side of the page and suppresses the printing of the observation numbers

    SUM variable1 variable2 variable3;
      - Prints the sum of the listed variables at the bottom of the output

    VAR variable1 variable2 variable3;
      - Prints only the listed variables in the output
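
    Several of these options and statements can be combined in one call. The sketch below uses SASHELP.CLASS, a sample data set that ships with SAS; it is not part of the course data, and the label text is made up for illustration. Because BY processing needs sorted input, a sorted copy is created first.

```sas
PROC SORT DATA=sashelp.class OUT=class_sorted;
   BY Sex;
RUN;

PROC PRINT DATA=class_sorted (OBS=10) NOOBS LABEL;
   BY Sex;                          /* new section of output for each value of Sex */
   VAR Name Age Height Weight;      /* print only these variables                  */
   SUM Weight;                      /* total of Weight under the listing           */
   LABEL Height = 'Student height'  /* labels shown because of the LABEL option    */
         Weight = 'Student weight';
RUN;
```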

    Plot Procedure

    Used to create basic scatter plots of the data
      - Use PROC GPLOT or PROC SGPLOT for more sophisticated plots

    General form:
        PROC PLOT DATA=input_data_set;
          PLOT vertical_variable*horizontal_variable / options;
        RUN;

      - By default, SAS uses letters to mark points on plots
          - A for a single observation, B for two observations at the same point, etc.
      - To specify a different character to represent a point:
          PLOT vertical_variable*horizontal_variable='*';

    Plot Procedure

    To specify a third variable to use to mark points:
        PLOT vertical_variable*horizontal_variable=third_variable;

    To plot more than one variable on the vertical axis:
        PLOT vertical_variable1*horizontal_variable='2'
             vertical_variable2*horizontal_variable='1' / OVERLAY;

    Univariate Procedure

    PROC UNIVARIATE is used to examine the distribution of data
      - Produces summary statistics for a single variable
      - Includes mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.

    General form:
        PROC UNIVARIATE DATA=input_data_set options;
          VAR variable1 variable2 variable3;
        RUN;

    If the VAR statement is not used, summary statistics will be produced for all numeric variables in the input data set.

    Univariate Procedure

    Options include:
      - PLOT: produces a stem-and-leaf plot, box plot, and normal probability plot
      - NORMAL: produces tests of normality
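
    A minimal call using both options might look like the following. It assumes the SASHELP.CLASS sample data set that ships with SAS, which is not part of the course data.

```sas
/* Summary statistics, plots, and normality tests for a single variable */
PROC UNIVARIATE DATA=sashelp.class PLOT NORMAL;
   VAR Height;
RUN;
```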

    Means Procedure

    Similar to the Univariate procedure

    General form:
        PROC MEANS DATA=input_data_set options;
          optional SAS statements;
        RUN;

    With no options or optional SAS statements, the Means procedure will print the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables in the input data set

    Means Procedure Options

    Statistics Available

    Keyword            Statistic                       Keyword     Statistic
    CLM                Two-sided confidence limits     RANGE       Range
    CSS                Corrected sum of squares        SKEWNESS    Skewness
    CV                 Coefficient of variation        STDDEV      Standard deviation
    KURTOSIS           Kurtosis                        STDERR      Standard error of the mean
    LCLM               Lower confidence limit          SUM         Sum
    MAX                Maximum value                   SUMWGT      Sum of weight variables
    MEAN               Mean                            UCLM        Upper confidence limit
    MIN                Minimum value                   USS         Uncorrected sum of squares
    N                  Number of non-missing values    VAR         Variance
    NMISS              Number of missing values        PROBT       Probability for Student's t
    MEDIAN (or P50)    Median                          T           Student's t
    Q1 (P25)           25% quantile                    Q3 (P75)    75% quantile
    P1                 1% quantile                     P5          5% quantile
    P10                10% quantile                    P90         90% quantile
    P95                95% quantile                    P99         99% quantile

    Note: The default confidence level for confidence limits is 95% (ALPHA=0.05). Use the ALPHA= option to specify a different level.

    Means Procedure

    Optional SAS statements

    VAR Variable1 Variable2;
      - Specifies the numeric variables for which statistics will be produced

    BY Variable1 Variable2;
      - Calculates statistics for each combination of the BY variables

    OUTPUT OUT=output_data_set;
      - Creates a data set with the default statistics
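
    The statistic keywords and the optional statements fit together as in the sketch below. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name class_stats is hypothetical. ALPHA=0.10 requests 90% confidence limits instead of the default 95%.

```sas
/* BY processing requires the input to be sorted by the BY variable */
PROC SORT DATA=sashelp.class OUT=class_sorted;
   BY Sex;
RUN;

PROC MEANS DATA=class_sorted N MEAN MEDIAN CLM ALPHA=0.10;
   VAR Height Weight;
   BY Sex;
   OUTPUT OUT=class_stats;   /* default statistics saved to a data set */
RUN;
```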

    FREQ Procedure

    PROC FREQ is used to generate frequency tables
      - Most common usage is to create a table showing the distribution of categorical variables

    General form:
        PROC FREQ DATA=input_data_set;
          TABLE variable1*variable2*variable3 / options;
        RUN;

    Options
      - LIST: prints cross tabulations in list format rather than a grid
      - MISSING: specifies that missing values should be included in the tabulations
      - OUT=output_data_set: creates a data set containing frequencies, in list format
      - NOPRINT: suppresses printing in the output window

    Use a BY statement to get percentages within each category of a variable
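
    As an illustration, the sketch below cross-tabulates two variables from SASHELP.CLASS, a sample data set that ships with SAS and is not part of the course data. The output data set name sex_age_counts is hypothetical.

```sas
/* Two-way table in list layout, with the counts also saved to a data set */
PROC FREQ DATA=sashelp.class;
   TABLE Sex*Age / LIST MISSING OUT=sex_age_counts;
RUN;
```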

    Summary Procedures

    Questions/Comments

    Statistical Analysis Procedures

    1. Correlation - PROC CORR

    2. Regression - PROC REG

    3. Analysis of Variance - PROC ANOVA

    4. Chi-square Test of Association - PROC FREQ

    5. Generalized Linear Models - PROC GENMOD

    CORR Procedure

    PROC CORR is used to calculate the correlations between variables

    Correlation coefficient measures the linear relationship between two variables

    Values range from -1 to 1

    Negative correlation - as one variable increases the other decreases

    Positive correlation - as one variable increases the other increases

    0 - no linear relationship between the two variables

    1 - perfect positive linear relationship

    -1 - perfect negative linear relationship

    General Form:

        PROC CORR DATA=input_data_set options;
            VAR Variable1 Variable2;
            WITH Variable3;
        RUN;

    CORR Procedure

    If the VAR and WITH statements are not used, correlation is computed for all pairs of numeric variables

    Options include

    SPEARMAN - computes Spearman's rank correlations

    KENDALL - computes Kendall's tau coefficients

    HOEFFDING - computes Hoeffding's D statistic
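
    For example, the following sketch correlates each VAR variable with the WITH variable and requests both Pearson and Spearman coefficients. It assumes the SASHELP.CLASS sample data set; PEARSON is listed explicitly because naming another correlation type replaces the default output.

```sas
/* Correlate HEIGHT and WEIGHT (VAR) with AGE (WITH). */
proc corr data=sashelp.class pearson spearman;
    var height weight;
    with age;
run;
```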

    REG Procedure

    PROC REG is used to fit linear regression models by least squares estimation

    One of many SAS procedures that can perform regression analysis

    Accepts only numeric independent variables, since it has no CLASS statement (use GENMOD for categorical variables)

    General Form:

        PROC REG DATA=input_data_set options;
            MODEL dependent = independent1 independent2 / options;
            optional statements;
        RUN;

    PROC REG statement options include

    PCOMIT=m - performs incomplete principal component estimation, omitting the last m principal components

    CORR - displays the correlation matrix for all variables listed in the MODEL or VAR statement

    REG Procedure

    MODEL statement options include

    SELECTION= - specifies that a model selection procedure be conducted; methods include FORWARD, BACKWARD, and STEPWISE

    ADJRSQ - computes the adjusted R-square

    MSE - computes the mean square error

    COLLIN - performs collinearity analysis

    CLB - computes confidence limits for parameter estimates

    ALPHA= - sets the significance level for confidence and prediction intervals and tests
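
    A minimal sketch combining these options, assuming the SASHELP.CLASS sample data set (the choice of variables is illustrative only). The first step fits one model with confidence limits and collinearity diagnostics; the second runs stepwise selection on the same candidate predictors.

```sas
/* Fit weight on height and age with 90% confidence limits */
/* for the parameter estimates and collinearity analysis.  */
proc reg data=sashelp.class corr;
    model weight = height age / clb alpha=0.10 collin;
run;
quit;

/* Stepwise model selection over the same predictors. */
proc reg data=sashelp.class;
    model weight = height age / selection=stepwise;
run;
quit;
```

    PROC REG is interactive, so QUIT; is used to end the procedure after RUN;.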

    REG Procedure

    Optional statements include

    PLOT dependent*independent1; - generates a plot of the data

    ANOVA Procedure

    PROC ANOVA performs analysis of variance

    Designed for balanced data (PROC GLM is used for unbalanced data)

    Can handle nested and crossed effects and repeated measures

    General Form:

        PROC ANOVA DATA=input_data_set options;
            CLASS independent1 independent2;
            MODEL dependent = independent1 independent2;
            optional statements;
        RUN;

    The CLASS statement must come before the MODEL statement; it defines the classification variables

    ANOVA Procedure

    Useful PROC ANOVA statement option

    OUTSTAT=output_data_set - generates an output data set that contains sums of squares, degrees of freedom, F statistics, and p-values for each effect in the model

    Useful optional statement

    MEANS independent1 / options;

    Used to perform multiple comparisons analysis

    Options include:

    TUKEY - Tukey's studentized range test

    BON - Bonferroni t test

    T - pairwise t tests

    DUNCAN - Duncan's multiple-range test

    SCHEFFE - Scheffe's multiple comparison procedure
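
    The sketch below runs a one-way ANOVA with Tukey comparisons. The data set fertilizer and its values are invented purely for illustration (three treatments, three observations each, so the design is balanced); the OUTSTAT= data set name is also hypothetical.

```sas
/* Made-up balanced data: 3 treatments x 3 replicates. */
data work.fertilizer;
    input treatment $ yield @@;
    datalines;
A 20 A 22 A 19 B 25 B 27 B 26 C 30 C 29 C 31
;
run;

/* One-way ANOVA; CLASS precedes MODEL, and MEANS requests */
/* Tukey's studentized range test for treatment means.     */
proc anova data=work.fertilizer outstat=work.anova_stats;
    class treatment;
    model yield = treatment;
    means treatment / tukey;
run;
quit;
```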

    FREQ Procedure

    PROC FREQ can also be used to perform analysis with categorical data

    General Form:

        PROC FREQ DATA=input_data_set;
            TABLE variable1*variable2 / options;
        RUN;

    TABLE statement options include:

    AGREE - tests and measures of classification agreement, including McNemar's test, Bowker's test, Cochran's Q test, and Kappa statistics

    CHISQ - chi-square tests of homogeneity or independence and measures of association based on chi-square

    MEASURES - measures of association, including Pearson and Spearman correlation, gamma, Kendall's tau-b, Stuart's tau-c, Somers' D, lambda, odds ratios, risk ratios, and confidence intervals
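
    As an illustration, the step below tests for association between two categorical variables. It assumes the SASHELP.HEART sample data set and its variables Sex and Status; any two categorical variables from your own data can be substituted.

```sas
/* Chi-square test of association plus measures of association */
/* for the two-way table of SEX by STATUS.                      */
proc freq data=sashelp.heart;
    table sex*status / chisq measures;
run;
```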

    GENMOD Procedure

    PROC GENMOD is used to fit generalized linear models, in which the response is not necessarily normal

    Logistic and Poisson regression are examples of generalized linear models

    General Form:

        PROC GENMOD DATA=input_data_set;
            CLASS independent1;
            MODEL dependent = independent1 independent2 / DIST=distribution LINK=link_function;
        RUN;

    GENMOD Procedure

    DIST= - specifies the distribution of the response variable

    LINK= - specifies the link function from the linear predictor to the mean of the response

    Example - Logistic Regression

    DIST = binomial

    LINK = logit

    Example - Poisson Regression

    DIST = poisson

    LINK = log
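
    Both examples can be sketched as follows. The logistic step assumes the SASHELP.HEART sample data set and its variables Status, Sex, and AgeAtStart; the Poisson step uses a small made-up data set (counts) created only for illustration.

```sas
/* Logistic regression: DESCENDING models the higher-sorted */
/* response level ("Dead") rather than the lowest.          */
proc genmod data=sashelp.heart descending;
    class sex;
    model status = sex ageatstart / dist=binomial link=logit;
run;

/* Hypothetical count data for the Poisson example. */
data work.counts;
    input dose events;
    datalines;
1 2
2 3
3 6
4 7
5 8
6 12
;
run;

/* Poisson regression with the log link. */
proc genmod data=work.counts;
    model events = dose / dist=poisson link=log;
run;
```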

    Statistical Analysis Procedures

    Questions/Comments