Lint, a SAS Program Checker

AD08 – PhUSE 2017
Igor Khorlo

Standards

PhUSE 2017 — AD08 — Lint, a SAS Program Checker 2 / 21

I guess most people here have their own team rules or styleguides, written out in some document. Many programming languages have community-driven, company-driven, and similar standards:

- NASA C Style Guide - 10 rules that allow NASA to write millions of lines of code with minimal errors
- A community-driven Ruby coding style guide
- Linux kernel coding style
- Google C++ Style Guide

Static Code Analyzers

Static code analyzers are programs which check your source code without running it. This can be achieved in many ways. The most common one is to generate an AST (Abstract Syntax Tree) from the raw source code and analyze it. Examples:

- Lint, a C Program Checker
- rubocop – A Ruby static code analyzer, based on the community Ruby style guide
- PVS-Studio analyzer – Static code analyzer for C, C++, and C#

SAS Examples

In the next slides, we will consider several examples of checks that a static code analyzer for SAS can perform. How this is implemented will be discussed later.

SAS Strings Limitations Checks

SAS has many limitations for string length:

- Most of you are familiar with the LRECL option for %include and similar statements. But there is a pitfall: a macro compiled via sasautos will use LRECL=256, and this cannot be changed.
- The maximum length of title/footnote statements is 255; for entrytitle/entryfootnote in GTL it is 256 (SAS 9.2).
- There is a general recommendation for a maximum line length of 80 characters (pep8, rubocop).
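A check for these limits can be very small. The following is only a sketch in Python (the linter described later in this talk is written in Java); the function name `check_line_lengths` and the default limit of 80 are assumptions made for illustration.

```python
def check_line_lengths(source, max_len=80):
    """Return (line_number, length) pairs for lines longer than max_len."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_len:
            findings.append((number, len(line)))
    return findings


# A three-line program whose second line is padded well past 80 characters.
program = "data a;\n" + "x = 1;" + " " * 80 + "\nrun;"

for number, length in check_line_lengths(program):
    print(f"line {number}: {length} characters, limit is 80")
```

The same function could be called with `max_len=256` to look for lines that might be affected by the LRECL pitfall mentioned above.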

Question

Any thoughts on what is wrong here?

if length(str) > 0 then do;
    /* ... */
end;

Hidden-true Expressions

Let's consider a common expression, length(str) > 0. The problem here is that:

LENGTH Function – Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string.

Therefore, this expression is always true, which is not obvious at first glance.
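A rule for this pattern could be sketched as follows. This is a regex-based Python illustration, not the AST-based approach described later in this talk; the names `ALWAYS_TRUE_LENGTH` and `find_always_true_length` are hypothetical. SAS also provides LENGTHN, which returns 0 for a blank string, so the sketch deliberately does not flag it.

```python
import re

# Matches comparisons such as `length(str) > 0` or `length( name )>0`.
# The lookbehind stops the pattern from firing on longer function names that
# merely end in "length"; the lookahead skips thresholds such as `> 0.5`.
ALWAYS_TRUE_LENGTH = re.compile(
    r"(?<![A-Za-z0-9_])length\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)\s*>\s*0(?![0-9.])",
    re.IGNORECASE,
)


def find_always_true_length(source):
    """Return (line, column, variable) for each `length(var) > 0` found."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in ALWAYS_TRUE_LENGTH.finditer(line):
            findings.append((number, match.start() + 1, match.group(1)))
    return findings


sample = "if length(str) > 0 then do;\nif lengthn(str) > 0 then do;\n"
for line, column, variable in find_always_true_length(sample):
    print(f"line {line}:{column} length({variable}) > 0 is always true")
```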

Dangerous But Legal Code

Suppose we have the following condition:

where pcng < 90;

The issue is that the condition may work as expected now, because there are no missing values in the pcng variable. But tomorrow, when new data arrives, it will silently work incorrectly: SAS treats a missing numeric value as smaller than any number, so rows with a missing pcng pass the filter without any error or warning, and you may spend several hours identifying the issue. The safe way is:

where . < pcng < 90;

Writing a check for this sort of situation will save you time and nerves in the future, as well as protect you from errors.
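As a rough illustration of such a check, the Python sketch below flags WHERE statements that consist of a single upper-bound comparison against a numeric constant. It is a simplification under stated assumptions: it only handles the `<` and `<=` operators, does not understand comments or compound conditions, and all names in it are hypothetical.

```python
import re

# A WHERE statement up to its terminating semicolon.
WHERE_STATEMENT = re.compile(r"\bwhere\s+(.*?);", re.IGNORECASE | re.DOTALL)
# The whole condition is `variable < number` or `variable <= number`.
UPPER_BOUND_ONLY = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*<=?\s*(-?\d+(?:\.\d+)?)\s*$"
)


def find_unguarded_upper_bounds(source):
    """Return (line, variable, condition) for conditions without a lower bound."""
    findings = []
    for match in WHERE_STATEMENT.finditer(source):
        condition = match.group(1)
        simple = UPPER_BOUND_ONLY.match(condition)
        if simple:
            line = source.count("\n", 0, match.start()) + 1
            findings.append((line, simple.group(1), condition.strip()))
    return findings


sample = (
    "data low;\n  set adlb;\n  where pcng < 90;\nrun;\n"
    "data ok;\n  set adlb;\n  where . < pcng < 90;\nrun;\n"
)
for line, variable, condition in find_unguarded_upper_bounds(sample):
    print(f"line {line}: missing values in {variable} satisfy '{condition}'")
```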

Styleguide

Many languages have a styleguide: a set of recommendations on how you should write, align, and format your source code. Some of them even have built-in functionality which enforces source code formatting, e.g. the Go language with gofmt.

Let's consider two pieces of code.

Poorly aligned:

data example;
set sashelp.cars;
if Origin='Europe' then do;
%do_something;
end;
proc freq data=example;
tables type;

Styleguide – Well-aligned

data example;
    set sashelp.cars;
    if Origin = 'Europe' then do;
        %do_something;
    end;
run;

proc freq data = example;
    tables type;
run;
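One difference between the two versions is that the well-aligned one closes every step explicitly with run;. A style rule for this could be sketched in Python as below. The sketch splits the program on semicolons, which is an assumption that breaks for semicolons inside string literals or comments, and the function name `find_missing_run` is hypothetical.

```python
def find_missing_run(source):
    """Return the steps that are not closed by an explicit run; or quit;."""
    findings = []
    open_step = None
    for statement in source.split(";"):
        words = statement.split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword in ("data", "proc"):
            # A new step starts while the previous one is still open.
            if open_step is not None:
                findings.append(open_step)
            open_step = " ".join(words[:2]).lower()
        elif keyword in ("run", "quit"):
            open_step = None
    if open_step is not None:
        findings.append(open_step)
    return findings


poorly_aligned = (
    "data example;\nset sashelp.cars;\nif Origin='Europe' then do;\n"
    "%do_something;\nend;\nproc freq data=example;\ntables type;\n"
)
for step in find_missing_run(poorly_aligned):
    print(f"step '{step}' has no explicit run; or quit;")
```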

Optimizations

data adlb_lymph;
    set adlb;
    if paramcd = 'LYMPH';
run;

Can be optimized to:

data adlb_lymph;
    set adlb;
    where paramcd = 'LYMPH';
run;
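A linter could at least point out candidates for this rewrite. The Python sketch below reports subsetting IF statements (an IF without THEN). It is an illustration with hypothetical names, it splits on semicolons and so ignores string literals and comments, and it only suggests a review: WHERE is a safe replacement only when the condition uses variables that already exist in the input data set.

```python
import re

SUBSETTING_IF = re.compile(r"^if\s+(.+)$", re.IGNORECASE)
THEN_KEYWORD = re.compile(r"\bthen\b", re.IGNORECASE)


def find_subsetting_ifs(source):
    """Return the conditions of IF statements that have no THEN clause."""
    findings = []
    for statement in source.split(";"):
        text = " ".join(statement.split())  # collapse newlines and spaces
        match = SUBSETTING_IF.match(text)
        if match and not THEN_KEYWORD.search(text):
            findings.append(match.group(1))
    return findings


sample = "data adlb_lymph;\n    set adlb;\n    if paramcd = 'LYMPH';\nrun;\n"
for condition in find_subsetting_ifs(sample):
    print(f"consider: where {condition};")
```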

Hardcode Detection

Check for undocumented hardcoded parts of the code.

Bad:

where (&itt_cond and ctd1 ^= 2.4 and paramcd = 'ALLCSMRT'
       and parcat1n = 1)
      or usubjid = '99999-9999-9999';

Good:

where (&itt_cond and ctd1 ^= 2.4 and paramcd = 'ALLCSMRT'
       and parcat1n = 1)
      /* cf. Section x.x of the data handling document */
      or usubjid = '99999-9999-9999';
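One possible way to approximate this rule is sketched below in Python. The assumptions are loud ones: only comparisons of usubjid with a quoted literal are treated as hardcodes, and a hardcode counts as documented when a /* ... */ comment sits on the same line or on the line directly above it. A real rule set would need a project-specific definition of both; the names used here are hypothetical.

```python
import re

# usubjid compared with a quoted literal, e.g. usubjid = '99999-9999-9999'.
HARDCODED_SUBJECT = re.compile(r"\busubjid\s*\^?=\s*['\"]", re.IGNORECASE)
COMMENT = re.compile(r"/\*.*?\*/")


def find_undocumented_hardcodes(source):
    """Return line numbers of subject hardcodes with no nearby comment."""
    lines = source.splitlines()
    findings = []
    for index, line in enumerate(lines):
        # Ignore matches that only occur inside a comment.
        if not HARDCODED_SUBJECT.search(COMMENT.sub("", line)):
            continue
        previous = lines[index - 1] if index > 0 else ""
        if COMMENT.search(line) or COMMENT.search(previous):
            continue  # documented on the same or the preceding line
        findings.append(index + 1)
    return findings


bad = (
    "where (&itt_cond and ctd1 ^= 2.4 and paramcd = 'ALLCSMRT'\n"
    "       and parcat1n = 1)\n"
    "      or usubjid = '99999-9999-9999';"
)
good = (
    "where (&itt_cond and ctd1 ^= 2.4 and paramcd = 'ALLCSMRT'\n"
    "       and parcat1n = 1)\n"
    "      /* cf. Section x.x of the data handling document */\n"
    "      or usubjid = '99999-9999-9999';"
)
print(find_undocumented_hardcodes(bad), find_undocumented_hardcodes(good))
```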

Linter Architecture

[Figure: linter architecture]

Abstract Syntax Tree – AST

Abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in the sense that it does not represent every detail appearing in the real syntax.

Abstract Syntax Tree for an Expression

a + b * (var3 / 2)

[Figure: parse tree for the expression above]

ANTLR

// Expr.g4
grammar Expr;

expr: expr ('*'|'/') expr
    | expr ('+'|'-') expr
    | INT
    | ID
    | '(' expr ')'
    ;

ID : [A-Za-z_][A-Za-z_0-9]* ; // match identifiers
INT: [0-9]+ ;                 // match integers
NL : '\r'? '\n' ;             // newlines
WS : [ \t]+ -> skip ;         // toss out whitespace
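To make the shape of the resulting tree concrete, here is a hand-written Python sketch that accepts the same expressions as the grammar above and returns the tree as nested tuples. It is not what ANTLR generates (ANTLR produces a Java parser from the grammar); the recursive-descent structure, the tuple representation, and all function names are assumptions for illustration. As in ANTLR 4, where earlier alternatives bind tighter, `*` and `/` take precedence over `+` and `-`.

```python
import re

# One token per match: an integer, an identifier, or any other single symbol.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_][A-Za-z_0-9]*)|(\S))")


def tokenize(text):
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("INT", number))
        elif name:
            tokens.append(("ID", name))
        elif symbol in "+-*/()":
            tokens.append((symbol, symbol))
        else:
            raise SyntaxError(f"unexpected character {symbol!r}")
    return tokens


def parse(text):
    """Parse an expression into nested (operator, left, right) tuples."""
    tokens = tokenize(text)
    position = 0

    def peek():
        return tokens[position][0] if position < len(tokens) else None

    def take():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def atom():
        kind = peek()
        if kind in ("INT", "ID"):
            return take()
        if kind == "(":
            take()
            node = additive()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return node  # parentheses leave no node of their own
        raise SyntaxError(f"unexpected token {kind!r}")

    def multiplicative():
        node = atom()
        while peek() in ("*", "/"):
            operator = take()[1]
            node = (operator, node, atom())
        return node

    def additive():
        node = multiplicative()
        while peek() in ("+", "-"):
            operator = take()[1]
            node = (operator, node, multiplicative())
        return node

    tree = additive()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * (var3 / 2)"))
```

Note that the parentheses of the input do not appear in the result; only the nesting they imply is kept, which is what makes the tree "abstract".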

Generating Data Flow Graph Using DOT Language

[Figure: data flow graph for the example program]
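The original example program and its graph are not reproduced in this transcript, so the following Python sketch only illustrates the idea. It assumes the simplest possible case, a DATA step whose first statement is a SET with one input data set, and emits one DOT edge from input to output; the regular expression and the name `to_dot` are hypothetical.

```python
import re

# `data <output>; set <input>;` with nothing between the two statements.
DATA_STEP = re.compile(
    r"\bdata\s+([A-Za-z_][\w.]*)\s*;\s*set\s+([A-Za-z_][\w.]*)\s*;",
    re.IGNORECASE,
)


def to_dot(program):
    """Return a DOT digraph with one edge per DATA step: input -> output."""
    lines = ["digraph dataflow {"]
    for output_table, input_table in DATA_STEP.findall(program):
        lines.append(f'    "{input_table.lower()}" -> "{output_table.lower()}";')
    lines.append("}")
    return "\n".join(lines)


program = (
    "data example;\n    set sashelp.cars;\nrun;\n"
    "data adlb_lymph;\n    set adlb;\n    where paramcd = 'LYMPH';\nrun;\n"
)
print(to_dot(program))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` tool.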

SAS Integration

In the end, we expect a tool which can be easily integrated into SAS. ANTLR generates a parser in Java, and the logic for linter rules/checks is written in Java as well. Since everything is written in Java, we can easily integrate it into SAS and call the linter from a SAS program. This can be done using the DATA step component Java object.

Final Look

line 1:9 SAS option PS was changed `ps=50`
line 1:15 SAS option LS was changed `ls=100`
line 1:22 SAS option FMTERR was changed `nofmterr`
line 5:5 warning: possible missings in the `height`
         variable may cause the condition `pcng < 90`
         to work not as expected
...

Conclusion

Why do you need to lint?

- Speed
- Less error-prone code
- Faster code reviews
- Machines should do the work
- Scalability
- Rules and checks are based on algorithms

Thank You

Company: INC Research/inVentiv Health

Email: [email protected]