5. SAS procedures for linear regression

Karl B Christensen
http://192.38.117.59/~kach/SAS


Contents

Scatter plots

Correlation

Simple linear regression

Residual plots

Histogram, Probability plot, Box plot

Data example: obesity score and blood pressure

http://192.38.117.59/~kach/SAS/bp.sas7bdat

http://192.38.117.59/~kach/SAS/bp.txt

http://192.38.117.59/~kach/SAS/bp.xlsx


Use PROC PRINT to look at the data set

LIBNAME dat 'P:\';
PROC PRINT DATA=dat.bp(OBS=7);
RUN;

Output

Obs  sexnr  obese   bp
  1      1   1.31  130
  2      1   1.31  148
  3      1   1.19  146
  4      1   1.11  122
  5      1   1.34  140
  6      1   1.17  146
  7      1   1.56  132


Simplest scatter plot using PROC GPLOT

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese;
RUN; QUIT;

[Figure: scatter plot of bp (vertical axis, 90 to 210) against obese (horizontal axis, 0.8 to 2.4) with default GPLOT settings.]

This plot can be improved a lot.

Improved scatter plot using PROC GPLOT

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) MINOR=NONE;
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2) MINOR=NONE;
  SYMBOL1 V=CIRCLE H=2;
RUN; QUIT;

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4) with larger axis labels, larger axis values and circles as plotting symbols.]


Things to notice

AXIS1: larger label and larger numbers on the axis

AXIS2: also rotates the label

MINOR=NONE: removes minor tick marks

SYMBOL1: option V (for value) gives circles instead of +, option H makes the symbols larger


Correlation

Is blood pressure related to obesity? We compute the correlation using PROC CORR

The default is the parametric correlation, based on the bivariate normal distribution (the Pearson correlation).

The Spearman correlation, based on ranks, is the most commonly used nonparametric correlation

The Kendall correlation is an alternative rank correlation, based on the number of concordant and discordant pairs
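
To make "based on ranks" concrete, the following sketch (not part of the original slides) replaces bp and obese by their ranks with PROC RANK and then computes an ordinary Pearson correlation of the ranks, which should reproduce the Spearman coefficient. The data set name WORK.ranked and the variable names r_bp and r_obese are assumptions made for this illustration.

```sas
/* Sketch: Spearman correlation as the Pearson correlation of ranks.   */
/* WORK.ranked, r_bp and r_obese are hypothetical names.               */
PROC RANK DATA=dat.bp OUT=WORK.ranked TIES=MEAN;
  VAR bp obese;
  RANKS r_bp r_obese;     /* ranks of bp and obese; ties get the mean rank */
RUN;

PROC CORR DATA=WORK.ranked PEARSON;
  VAR r_bp;
  WITH r_obese;
RUN;
```

Comparing this output with the SPEARMAN option of PROC CORR on the original variables is a quick way to check the equivalence.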


Correlation coefficient

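Calculated as

$$ r = r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$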

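Measures the strength of the (linear) association between two variables

values are between −1 and 1

0 corresponds to no linear association (independent variables have correlation 0, but correlation 0 does not in general imply independence)

−1 and 1 correspond to perfect linearity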


Computing correlations in SAS

SAS code

PROC CORR DATA=dat.bp PEARSON SPEARMAN KENDALL;
  VAR bp;
  WITH obese;
RUN; QUIT;

Options PEARSON, SPEARMAN, and KENDALL: we get the Pearson correlation, the Spearman correlation, and the Kendall correlation. We can also use

PROC CORR DATA=dat.bp PEARSON SPEARMAN KENDALL;
  VAR bp obese;
RUN; QUIT;

but this gives us more (unneeded) output.


Output

Pearson Correlation Coefficients, N = 102
Prob > |r| under H0: Rho=0

              bp
obese    0.32614
          0.0008

Spearman Correlation Coefficients, N = 102
Prob > |r| under H0: Rho=0

              bp
obese    0.30363
          0.0019

Kendall Tau b Correlation Coefficients, N = 102
Prob > |tau| under H0: Tau=0

              bp
obese    0.21298
          0.0019


Linear regression

Y: response variable, outcome variable, dependent variable (bp)

X: explanatory variable, independent variable, covariate (obesity score)

Bivariate observations of X and Y for n individuals

$$ (x_i, y_i), \quad i = 1, \ldots, n $$


Linear regression

The equation for a straight line: Y = α + βX

α: intercept, intersection with the Y-axis (at X = 0)

The expected bp for an individual with obesity score 0 (often an unwarranted extrapolation).

β: slope, regression coefficient

Expected difference in bp for two individuals with a difference of 1 unit in obesity score.

Often the parameter of interest.


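Simple linear regression

Model

$$ Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independent} $$

Estimation by least squares: the estimates $\hat{\alpha}$ and $\hat{\beta}$ are the values of $\alpha$ and $\beta$ minimizing

$$ \sum_{i=1}^{n} \bigl(y_i - (\alpha + \beta x_i)\bigr)^2 $$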
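
As a reminder (a standard textbook result, added here and not taken from the original slides), setting the derivatives of this sum of squares with respect to α and β to zero gives closed-form estimates. The notation reuses S_xy and S_xx from the correlation coefficient slide; the hats marking the estimates are an assumption of this note.

```latex
\hat{\beta} = \frac{S_{xy}}{S_{xx}}
            = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

In particular, the slope and the Pearson correlation always have the same sign, since $\hat{\beta} = r \sqrt{S_{yy}/S_{xx}}$.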


Assumptions in linear regression

Linearity in the mean value

Independence between error terms $\varepsilon_i$

Normally distributed error terms, $\varepsilon_i \sim N(0, \sigma^2)$

Variance homogeneity, that is, identical variances for all $\varepsilon_i$

The first assumption is the most important. If the assumptions underlying the linear regression model are not met, we cannot trust the results.


Add a smooth curve to the scatter plot

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 V=CIRCLE H=2 I=SM75S L=2;
RUN; QUIT;

Interpolation I=SM75S gives a smooth curve, L=2 makes the curve dashed

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4) with a dashed smooth curve.]

Add regression line to the scatter plot

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 V=CIRCLE H=2 I=RL L=1;
RUN; QUIT;

Interpolation I=RL gives the regression line, L=1 makes the line solid

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4) with the fitted regression line.]

Regression using PROC REG

The SAS procedure PROC REG can be used for linear regression

PROC REG DATA=dat.bp;
  MODEL bp=obese;
RUN; QUIT;

Output

                       Analysis of Variance

                              Sum of         Mean
Source            DF         Squares       Square   F Value   Pr > F
Model              1      3552.41931   3552.41931     11.90   0.0008
Error            100           29846    298.45541
Corrected Total  101           33398

shows a significant association
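
The F test can be checked by hand from the numbers in the table. The sketch below is an illustration added to the slides: it divides the model mean square by the error mean square, looks up the p-value in the F(1, 100) distribution, and compares the share of the total sum of squares explained by the model with the squared Pearson correlation (0.32614 squared is about 0.106). The variable names are hypothetical.

```sas
/* Sketch: reproduce the F test of the ANOVA table by hand.            */
/* All numbers are copied from the PROC REG output; the variable       */
/* names are hypothetical.                                             */
DATA _NULL_;
  ms_model = 3552.41931;          /* mean square, Model (1 DF)       */
  ms_error = 298.45541;           /* mean square, Error (100 DF)     */
  f  = ms_model / ms_error;       /* F statistic, about 11.90        */
  p  = 1 - PROBF(f, 1, 100);      /* upper tail of F(1, 100)         */
  r2 = 3552.41931 / 33398;        /* model SS / corrected total SS   */
  PUT f= 8.2 p= 8.4 r2= 8.4;      /* results are written to the log  */
RUN;
```

With a single covariate, the F test and the test of zero Pearson correlation give the same p-value, which is why 0.0008 appears in both outputs.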


Regression using PROC REG

                   Parameter Estimates

                 Parameter    Standard
Variable    DF    Estimate       Error   t Value   Pr > |t|
Intercept    1    96.81793     8.91960     10.85     <.0001
obese        1    23.00135     6.66701      3.45     0.0008

Estimated relation:

bp = 96.81793 + 23.00135 × obese

Interpretation: A difference of 1 unit in obesity score corresponds to an expected difference of 23.00135 units in blood pressure.

Confidence limits: use the option CLB

PROC REG DATA=dat.bp;
  MODEL bp = obese / CLB;
RUN; QUIT;
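
What CLB reports for the slope can be sketched by hand as estimate ± t quantile × standard error, using the t distribution with the 100 error degrees of freedom. The DATA step below is an added illustration with hypothetical variable names; it also evaluates the estimated relation at obese = 1.3, a value chosen only as an example.

```sas
/* Sketch: 95% confidence limits for the slope, computed by hand.      */
/* Estimate and standard error are copied from the PROC REG output;    */
/* the variable names are hypothetical.                                */
DATA _NULL_;
  b     = 23.00135;               /* estimated slope                 */
  se    = 6.66701;                /* standard error of the slope     */
  df    = 100;                    /* error degrees of freedom        */
  tcrit = TINV(0.975, df);        /* 97.5% quantile of t(100)        */
  lower = b - tcrit * se;         /* lower 95% confidence limit      */
  upper = b + tcrit * se;         /* upper 95% confidence limit      */
  pred  = 96.81793 + b * 1.3;     /* expected bp at obese = 1.3      */
  PUT tcrit= 8.4 lower= 8.3 upper= 8.3 pred= 8.2;
RUN;
```

The limits come out at roughly 9.8 and 36.2, so the interval is wide but clearly excludes 0, in agreement with the p-value of 0.0008.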


The variance around the regression line, $\sigma^2$, is estimated as

$$ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \bigl(y_i - \hat{\alpha} - \hat{\beta} x_i\bigr)^2 $$

and is found in the output as the mean square error; here the value is 298.45541. The standard deviation around the regression line is found in the output as the root mean square error ('Root MSE'):

$$ s = \sqrt{s^2} = 17.27586 $$

It has the same unit as the outcome and is easier to interpret.


Confidence limits for regression line

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 V=CIRCLE I=RLCLM95 L=1;
RUN; QUIT;

The option I=RLCLM95 gives us 95% confidence limits for the regression line (I=RLCLI95 gives us 95% prediction limits for individual observations instead)
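
For reference, the standard textbook formulas behind the two kinds of limits at a covariate value $x_0$ are sketched here; they are not part of the original slides. Here $s$ is the root mean square error, $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ as in the correlation formula, and $t_{0.975,\,n-2}$ (notation assumed for this note) is the 97.5% quantile of the t distribution with $n-2$ degrees of freedom.

```latex
\text{Confidence limits for the line (CLM):}\quad
\hat{\alpha} + \hat{\beta} x_0 \;\pm\; t_{0.975,\,n-2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{Prediction limits for a new observation (CLI):}\quad
\hat{\alpha} + \hat{\beta} x_0 \;\pm\; t_{0.975,\,n-2}\; s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root is the variance of a single new observation, which is why prediction limits are always wider than confidence limits and do not shrink towards the line as n grows.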

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4) with the fitted regression line and its confidence limits.]

Model control in regression analysis

Residuals

$$ \hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) $$

should be plotted against:

1. the explanatory variable $x_i$, to check linearity

2. the fitted values $\hat{y}_i$, to check variance homogeneity (and normality)

3. 'normal scores', i.e. a probability plot, to check normality

The first two should give the impression of random scatter, while the probability plot ought to show a straight line.


Residual plots and linearity

Look for ∪ or ∩ forms


Plots for model checking in the HTML output

If you want to add a smooth curve to ease the check for ∪ or ∩ forms in the plots of the residuals against the covariates, try

ODS GRAPHICS ON;
PROC REG DATA=dat.bp PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
  MODEL bp = obese / CLB;
RUN;
ODS GRAPHICS OFF;

The distribution of the residuals is not well approximated by a normal distribution.

Types of residuals

Ordinary residuals = model deviations

$$ \hat{\varepsilon}_i = y_i - \hat{y}_i $$

SAS option: R

Standardized (studentized) residuals. SAS option: STUDENT

Residuals where the current observation is not used in the estimation of the corresponding line. SAS option: PRESS
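
In the OUTPUT statement of PROC REG, these options are written as keywords followed by = and the name of the new variable. The following sketch saves all three residual types together with the predicted values; the data set name WORK.res and the variable names after the = signs are assumptions made for this example.

```sas
/* Sketch: save all three residual types with the OUTPUT statement.    */
/* WORK.res and the variable names after the = signs are hypothetical. */
PROC REG DATA=dat.bp;
  MODEL bp = obese;
  OUTPUT OUT=WORK.res
         P=pred             /* predicted values                        */
         R=resid            /* ordinary residuals                      */
         STUDENT=stud       /* studentized residuals                   */
         PRESS=press;       /* residuals without the current obs.      */
RUN; QUIT;
```

Studentized residuals are on a common scale, so values far outside ±2 are easy to spot regardless of the unit of the outcome.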


Save residuals and predicted values

PROC REG calculates and saves all these quantities for later use, typically graphics (e.g., if the HTML output does not work on your own computer or you want to create your own plots)

PROC REG DATA=dat.bp;
  MODEL bp = obese / CLB;
  OUTPUT OUT=WORK.ch P=forv R=resid;
RUN;

We get a new data set WORK.ch with the new variables forv and resid, which we may then use to construct residual plots


Plot residuals and predicted values

PROC GPLOT DATA=WORK.ch;
  PLOT resid*(obese forv) /
       HAXIS=AXIS1
       VAXIS=AXIS2
       VREF=0
       CVREF=GRAYAA
       LVREF=33;
  SYMBOL1 V=CIRCLE CV=BLACK H=2 I=SM75S CI=RED L=8 W=3;
RUN; QUIT;

[Figure: residuals (−30 to 80) plotted against obese (0.8 to 2.4), with a smooth curve and a reference line at 0.]

[Figure: residuals (−30 to 80) plotted against the predicted value of bp (110 to 160), with a smooth curve and a reference line at 0.]

Plot histogram and probability plot for the residuals

PROC UNIVARIATE NORMAL DATA=WORK.ch;
  VAR resid;
  HISTOGRAM / HEIGHT=3 CFILL=GRAY NORMAL;
  PROBPLOT / HEIGHT=3 NORMAL(MU=EST SIGMA=EST L=2 COLOR=RED);
RUN; QUIT;

[Figure: histogram of the residuals (percent on the vertical axis) with the fitted normal curve.]

[Figure: normal probability plot of the residuals (−40 to 80) against normal percentiles.]

Exercise: Regression and graphics I

In this exercise we want to study the effect of age on the SIGF1 level in the Juul data.

1. Get the data into SAS using a LIBNAME statement.

2. Create a new data set containing only prepubertal children (Tanner stage 1 and age > 5).

3. Use PROC GPLOT to plot the relationship between SIGF1 and age for prepubertal children. Add regression lines or smooth curves.

4. Use PROC REG to do a regression analysis of SIGF1 vs. age for prepubertal children.

5. Use OUTPUT OUT=WORK.check .. to make a data set with residuals and use these to evaluate model fit.


More about scatter plots in SAS

PROC GPLOT in the simplest form

PROC GPLOT DATA=dat.bp;
  PLOT bp*obese;
RUN; QUIT;

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4) with default GPLOT settings.]

more code can give us nicer output

AXIS1 LABEL=(H=3 F=swissbe "Obese")
      VALUE=(H=2)
      ORDER=(0.8 TO 2.4 BY 0.2)
      MINOR=NONE
      OFFSET=(3,2)CM;
AXIS2 LABEL=(A=90 H=2 F=SWISSBI "Blood pressure")
      VALUE=(H=2)
      ORDER=(90 TO 210 BY 10)
      LENGTH=12CM;
PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 C=RED V=DOT H=2 I=SM45S L=1 W=3;
  LEGEND1 LABEL=(H=2.5) VALUE=(H=2 JUSTIFY=LEFT);
  TITLE1 F=CSCRIPT H=3 "BP and "
         F="Times New Roman" H=2 "obese";
RUN; QUIT;

[Figure: scatter plot of bp (90 to 210) against obese (0.8 to 2.4, in steps of 0.2) with customized axes, dots as plotting symbols and a smooth curve.]

AXIS1 LABEL=(H=3 F=swissbe "obese")
      VALUE=(H=2)
      ORDER=(0.8 TO 2.4 BY 0.2)
      MINOR=NONE
      OFFSET=(1,1)CM;
AXIS2 LABEL=(A=90 H=2 F=SWISSBI "Blood pressure")
      VALUE=(H=2)
      ORDER=(90 TO 210 BY 10)
      LENGTH=12CM;
PROC GPLOT DATA=dat.bp;
  PLOT bp*obese = sexnr / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1;
  SYMBOL1 C=RED V=CIRCLE H=2 I=SM85S L=1 W=3;
  SYMBOL2 C=BLUE V=DOT H=2 I=SM85S L=7 W=3;
  LEGEND1 LABEL=(H=2) VALUE=(H=2 JUSTIFY=LEFT);
  TITLE1 F=CSCRIPT H=2 F="Times New Roman" "BP and obese";
RUN; QUIT;

Use a third variable in the PLOT statement and include AXIS, SYMBOL, LEGEND and TITLE statements. The LEGEND1 definition only takes effect when it is referenced with LEGEND=LEGEND1 in the PLOT statement.

[Figure: scatter plot titled "BP and obese" of bp (90 to 210) against obese (0.8 to 2.4), with separate plotting symbols and smooth curves for sexnr 1 and 2; legend: sexnr 1 2.]

Symbol statements

One symbol statement for each group (each value of sexnr). Options:

C=RED: color of points and lines: BLACK, RED, BLUE, ...

V=DOT: plotting symbol: CIRCLE, DOT, STAR, ...

H=2: size of plotting symbol (default 1)

I=SM25S: interpolation method: NONE, JOIN, SM75S, RL, RLCLI95, RLCLM95, ...

L=1: line type (1: solid, 2-46: dashed)

W=3: line three times thicker than standard

MODE=INCLUDE: observations outside the range specified in the ORDER= option of the AXIS statement are used in the interpolation (default is MODE=EXCLUDE)


Plotting symbols

V= or VALUE= in SYMBOL statements


Interpolation methods

I= or INTERPOL= in SYMBOL statements


Line types

L= or LINE= in SYMBOL statements


Fonts in SAS

F= or FONT= in SYMBOL, AXIS or TITLE statements

or specify font names in quotes (e.g. "Times New Roman").

See choices: Tools → Options → Enhanced Editor → Appearance → Font Name:


AXIS specifications

HAXIS=AXIS1 VAXIS=AXIS2: horizontal axis uses AXIS1 and vertical axis uses AXIS2 (AXIS1 and AXIS2 must be prespecified)

LABEL=(A=90 H=2 ".."): specifies axis text, size of the letters, and direction of text (A=90: text string rotated 90 degrees counterclockwise)

VALUE=(H=2): the size of the digits on the axis

ORDER=(135 TO 195 BY 5): specifies the desired range and the specific numbers on the axis

MINOR=9: number of minor tick marks between the numbers, may be set to NONE

OFFSET=(3,2)CM: leaves 3 cm of empty space to the left and 2 cm of empty space to the right

LOGBASE=10: log-scaled axis, tick marks at powers of 10.

LENGTH=12CM: length of the axis.


Histograms in PROC UNIVARIATE

PROC UNIVARIATE DATA=dat.bp;
  CLASS sexnr;
  VAR bp;
  HISTOGRAM bp / CFILL=GRAY NORMAL;
  INSET MEAN STD SKEWNESS / HEADER="Descriptive stats" FORMAT=8.3;
RUN;

HISTOGRAM: gives a histogram for the variable bp

CFILL=GRAY: bars are gray

CLASS sexnr: one histogram for each gender

NORMAL: the best fitting normal curve is included.

Endpoints of the bars in the histogram may also be specified, e.g., ENDPOINTS=1.9 TO 2.3 BY 0.1

INSET MEAN STD SKEWNESS ...: a box with descriptive statistics is included.

FORMAT=8.3: formats can be used for the descriptive statistics

[Figure: histograms of bp (97.5 to 202.5; percent on the vertical axis) with fitted normal curves, one panel for each value of sexnr. Each panel has an inset 'Descriptive stats':]

sexnr      Mean   Std Deviation   Skewness
    1   127.955          16.591      1.217
    2   126.310          19.419      1.574

Probability plots in PROC UNIVARIATE

PROC UNIVARIATE DATA=dat.bp;
  VAR bp;
  PROBPLOT bp / HEIGHT=3 NORMAL(MU=EST SIGMA=EST L=41);
  INSET MEAN STD SKEWNESS / HEADER="Descriptive stats" FORMAT=8.1;
RUN;

PROBPLOT: gives us a probability plot

HEIGHT=3: size of text

NORMAL(MU=EST SIGMA=EST L=41): adds a (dashed) line for the best fitting normal distribution.

CLASS sexnr: separate plot for each gender (in some SAS versions the same fitted line is shown in all plots)

FORMAT=8.1: formats can be used for the descriptive statistics

[Figure: normal probability plot of bp (75 to 225) against normal percentiles, with inset 'Descriptive stats': Mean 127.0, Std Deviation 18.2, Skewness 1.4.]

Box plots

Two different ways: PROC BOXPLOT

PROC SORT DATA=dat.bp;
  BY sexnr;
RUN;
PROC BOXPLOT DATA=dat.bp;
  PLOT bp*sexnr / HEIGHT=3 BOXSTYLE=SCHEMATIC;
RUN;

or PROC GPLOT

PROC GPLOT DATA=dat.bp;
  PLOT bp*sexnr / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM ORDER=(1 TO 2 BY 1);
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

Box plots in PROC BOXPLOT

[Figure: schematic box plots of bp (75 to 225) for sexnr 1 and 2, produced by PROC BOXPLOT.]

Box plots in PROC GPLOT

[Figure: box plots of bp (90 to 210) for sexnr 1 and 2, produced by PROC GPLOT.]

PROC BOXPLOT

PROC SORT DATA=dat.bp;
  BY sexnr;
RUN;
PROC BOXPLOT DATA=dat.bp;
  PLOT bp*sexnr / HEIGHT=3 BOXSTYLE=SCHEMATIC;
RUN;

bp*sexnr: distribution of bp for each value of sexnr

Data must be sorted by sexnr

HEIGHT=3: size of text

BOXSTYLE=SCHEMATIC: type of box plot (there are several).

SCHEMATIC means that whiskers will extend to the most extreme points within 1.5 × the interquartile range, measured from the end of the box [1]

[1] If data are perfectly normally distributed, these limits lie about 2.7 standard deviations from the mean, close to the 0.35 and 99.65 percentiles; otherwise this is not really relevant.
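
Where the whisker limits fall under perfect normality can be worked out directly: the quartiles of a standard normal distribution lie at about ±0.6745, so the interquartile range is about 1.349 and the limits are at ±(0.6745 + 1.5 × 1.349) ≈ ±2.70 standard deviations, which leaves about 0.35% of the distribution beyond each limit. The DATA step below is an added sketch of this arithmetic with hypothetical variable names.

```sas
/* Sketch: SCHEMATIC whisker limits for a standard normal distribution. */
/* The variable names are hypothetical.                                 */
DATA _NULL_;
  q3    = PROBIT(0.75);           /* upper quartile, about 0.6745    */
  iqr   = 2 * q3;                 /* interquartile range             */
  upper = q3 + 1.5 * iqr;         /* upper whisker limit, about 2.70 */
  tail  = 1 - PROBNORM(upper);    /* probability above the limit     */
  PUT q3= 8.4 upper= 8.4 tail= 8.5;
RUN;
```

For skewed data such as bp, many more points than this will typically fall outside the whiskers on the long-tailed side.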


Box plot in PROC GPLOT

PROC GPLOT DATA=dat.bp;
  PLOT bp*sexnr / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM ORDER=(1 TO 2 BY 1);
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

bp*sexnr: the distribution of bp for each value of sexnr

OFFSET=(6,6)CM: needed, otherwise the first and last box will be placed adjacent to the left and right border of the plot

I=BOX10TJ: 10 means that the whiskers show the 10th and 90th percentiles (any number between 00 and 25 can be used), T means that there are serifs at the end of the whiskers, and J means that the medians are joined by a line [2]

W=3: width of boxes (and joining line)

[2] More interesting when the x-variable has more than two levels.

How to save graphics and tables

SAS can use the 'output delivery system' (ODS) to direct output from a SAS procedure to other files

ODS RTF FILE='p:\eksempel.rtf' STYLE=Journal BODYTITLE;
PROC GPLOT;
  ---;
RUN; QUIT;
PROC FREQ;
  ---;
RUN;
ODS RTF CLOSE;

This generates a file in 'rich text format' (RTF), which can be opened by Word. Alternative: PDF.

STYLE=Journal: a predefined style that looks a little like the presentation in scientific papers.

BODYTITLE: SAS titles and footnotes are placed in the body of the document instead of in the document's page header and page footer.
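
A PDF file can be produced in the same way by opening and closing the PDF destination instead. The sketch below is an added example; the file name p:\eksempel.pdf is a hypothetical path chosen to match the RTF example, and the simple scatter plot stands in for whatever procedures you want to save.

```sas
/* Sketch: send procedure output to a PDF file instead of RTF.         */
/* The path p:\eksempel.pdf is a hypothetical example.                 */
ODS PDF FILE='p:\eksempel.pdf' STYLE=Journal;
PROC GPLOT DATA=dat.bp;
  PLOT bp*obese;
RUN; QUIT;
ODS PDF CLOSE;
```

Everything produced between the opening ODS PDF statement and ODS PDF CLOSE ends up in the file, so several procedures can be collected in one document.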


Exercise: Regression and graphics II

1. Download the bp.sas7bdat data set from the homepage and get it into SAS using a LIBNAME statement.

2. Use graphical methods to answer the questions:

   Is there an association between obese and sexnr?
   Is there an association between bp and sexnr?
   Is there an association between obese and bp?

3. Use ODS RTF to save the plots in an rtf file that can be opened using Word.
