Investigating Automatic Static Analysis Results to Identify Quality Problems: An Inductive Study
DESCRIPTION
Background: Automatic static analysis (ASA) tools examine source code to discover "issues", i.e. code patterns that are symptoms of bad programming practices and that can lead to defective behavior. Studies in the literature have shown that these tools find defects earlier than other verification activities, but they produce a substantial number of false positive warnings. For this reason, an alternative approach is to use the set of ASA issues to identify defect-prone files and components rather than focusing on the individual issues.
Aim: We conducted an exploratory study to investigate whether ASA issues can be used as early indicators of faulty files and components and, for the first time, whether they point to a decay of specific software quality attributes, such as maintainability or functionality. Our aim is to understand the critical parameters and feasibility of such an approach to feed into future research on more specific quality and defect prediction models.
Method: We analyzed an industrial C# web application using the Resharper ASA tool and explored whether significant correlations exist in such a data set.
Results: We found promising results when predicting defect-prone files. A set of specific Resharper categories are better indicators of faulty files than common software metrics or the collection of issues across all issue categories, and these categories correlate to different software quality attributes.
Conclusions: Our advice for future research is to perform analysis at the file rather than the component level and to evaluate the generalizability of categories. We also recommend using larger datasets, as we learned that data sparseness can lead to challenges in the proposed analysis process.
TRANSCRIPT
Investigating Automatic Static Analysis Results to Identify Quality Problems: an Inductive Study
Antonio Vetro’ – Nico Zazworka – Forrest Shull – Carolyn Seaman – Michele A. Shaw
IEEE Software Engineering Workshop (SEW-35), 12-13 October 2012 Heraclion, Crete, Greece
Introduction and motivations
Automatic Static Analysis (ASA) - process
Source code → ASA tool (rules, patterns) → Issues (warnings) → Fix issues → Refactored source code
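The pattern-matching step in the process above can be sketched as a toy checker. This is a minimal sketch, not Resharper's actual rule engine; the rule names and regular expressions below are hypothetical examples of the kinds of "bad practice" patterns ASA tools flag:

```python
import re

# Two hypothetical rules, in the spirit of ASA issue categories
RULES = {
    "RedundantBooleanComparison": re.compile(r"==\s*true\b"),
    "EmptyCatchBlock": re.compile(r"catch\s*(\([^)]*\))?\s*\{\s*\}"),
}

def find_issues(source: str):
    """Return (line number, rule name) for every rule match in the source."""
    issues = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                issues.append((lineno, rule))
    return issues

snippet = 'if (ready == true) { Run(); }\ntry { Run(); } catch (Exception e) { }'
print(find_issues(snippet))
```

Real tools such as Resharper work on the parsed syntax tree rather than raw text, which is what allows them to report issues at the granularity of symbols and declarations.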
Use of ASA: research streams
Looking at ASA issues to identify defects in single lines of code
Looking at large sets of issues as early indicators of the more defect prone modules
Legend: bug · ASA issue · ASA issue related to a bug · code
• Useful as an early verification technique
• Can shorten the time between defect insertion and removal
• However, many studies report a high rate of false positive ASA issues (from 30% to 96%)
• On realistically sized applications, ASA tools typically generate thousands of issues
• The output needs further refinement and tailoring by developers to be useful
Legend: software module (class, file, component)
• Useful as a proxy for defect location
• Can provide guidance for inspection/test planning
• Many studies report positive correlations between the number of defects and the number of ASA issues
This study
Contributions
• New tool/language/application combination (Resharper/ C#/ Web application).
• Analysis at two granularity levels, i.e. software components and source code files.
• We investigate whether specific types of ASA issues can be linked to specific quality dimensions.
Study context
• Web-based industrial application (C#) of about 35 KLOC
• 78 fixed and closed defects reported in the JIRA tracking system
• ASA tool: Resharper
Study goals
G1: Understand whether/which ASA issues are indicators of defect-proneness
    RQ C1: Which ASA issue categories can identify defect-prone components?
    RQ F1: Which ASA issue categories can identify defect-prone files?
G2: Understand whether/which ASA issues are related to specific software quality characteristics
    RQ C2: Which ASA issue categories can point to defect-prone components that impact various system quality characteristics?
    RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?
Legend: F = file, C = component, G = goal, RQ = research question
Goal 1
G1: Understand whether/which ASA issues are indicators of defect-proneness
RQ C1: Which ASA issue categories can identify defect-prone components?
    Method: ASA issues vs. defects — Spearman correlation
RQ F1: Which ASA issue categories can identify defect-prone files?
    Method: issues in non-defect-prone files vs. issues in defect-prone files — Mann-Whitney test
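The RQ C1 method, Spearman correlation between per-component issue counts and defect counts, can be sketched in pure Python (Spearman's rho is the Pearson correlation computed on ranks). The issue and defect counts below are made up for illustration, not data from the study:

```python
def average_ranks(values):
    """1-based ranks, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-component counts: issues in one Resharper category vs. defects
issues = [12, 3, 25, 7, 18]
defects = [4, 1, 9, 0, 5]
print(round(spearman(issues, defects), 3))  # rho near 1 signals a monotone relationship
```

A value near +1, as in the Language Usage Opportunities or Unused Symbols rows of the results table, means components with more issues in that category also tend to have more defects.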
Goal 2
G2: Understand whether/which ASA issues are related to specific software quality characteristics
RQ C2: Which ASA issue categories can point to defect-prone components that impact various system quality characteristics?
    Method: ASA issues vs. defects — Spearman correlation
RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?
    Method: issues in non-defect-prone files vs. issues in defect-prone files — Mann-Whitney test
Goal 2
SW Quality
Functionality
Reliability
Efficiency
Usability
Portability
Maintainability
A. Vetro', N. Zazworka, C. Seaman, and F. Shull, "Using the ISO/IEC 9126 product quality model to classify defects: a controlled experiment," IET Digest 2012, 187 (2012), DOI: 10.1049/ic.2012.0025
Mapping between ASA issues, Defects, Files, Components
Results
RQ C1-C2: Which ASA issue categories can identify defect-prone components?

Spearman correlations between Resharper issue categories and defects, by defect type:

Resharper issue category                 All    F      FR     FU     R      U
Common Practices and Code Improvements   -0.14  -0.13  -0.34  0.07   0      -0.2
Compiler Warnings                        0.3    0.31   0.48   0.28   0.04   0.25
Constraints Violations                   0.11   0.1    0.03   0.09   0.23   0.18
Language Usage Opportunities             0.57   0.53   0.55   0.5    0.2    0.43
Potential Code Quality Issues            0.54   0.5    0.51   0.44   0.22   0.44
Redundancies in Code                     0.52   0.49   0.47   0.33   0.39   0.53
Redundancies in Symbol Declarations      0.42   0.45   0.01   0.28   0.17   0.14
Unused Symbols                           0.53   0.53   0.75   0.57   0.33   0.56
Sum of Resharper issues                  0.19   0.18   0.1    0.09   0.23   0.23

Values significant at the 90% level were marked in bold in the original slide.
RQ F1: Which ASA issue categories can identify defect-prone files?

Resharper issue category                 p-value
ASP.NET                                  NA
Common Practices and Code Improvements   0.983
Compiler Warnings                        0.333
Constraints Violations                   0.014
Language Usage Opportunities             0.026
Potential Code Quality Issues            0.021
Redundancies in Code                     <0.001
Redundancies in Symbol Declarations      0.969
Unused Symbols                           NA
Sum                                      0.133

Values significant at the 90% level were marked in bold in the original slide.
RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?

Quality characteristic – Resharper issue category   p-value
F – Constraints Violations                          0.013
F – Redundancies in Code                            0.002
FR – Compiler Warnings                              0.001
FU – Constraints Violations                         0.002
FU – Redundancies in Code                           0.004
FU – Sum                                            0.062
R – Redundancies in Code                            0.033
R – Sum                                             0.029
U – Constraints Violations                          0.085
U – Language Usage Opportunities                    0.042
U – Potential Code Quality Issues                   <0.001
U – Redundancies in Code                            0.033
Follow-up analysis: sorting files by different indicators — cumulative distribution of defects in files vs. indicators
Follow-up analysis: sorting components by different indicators — cumulative distribution of defects in components vs. indicators
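The follow-up analysis above can be sketched as: rank files by an indicator (a category's issue count, or a traditional metric such as LOC) and track what fraction of all defects has been covered after inspecting each file in turn. The file records below are invented for illustration:

```python
def cumulative_defect_coverage(files, indicator):
    """Sort files by `indicator` (descending) and return the cumulative
    fraction of all defects covered after inspecting each file in turn."""
    total = sum(f["defects"] for f in files)
    coverage, seen = [], 0
    for f in sorted(files, key=indicator, reverse=True):
        seen += f["defects"]
        coverage.append(seen / total)
    return coverage

# Hypothetical files: issue count in one category, LOC, and known defects
files = [
    {"name": "a.cs", "redundancies": 20, "loc": 900,  "defects": 5},
    {"name": "b.cs", "redundancies": 2,  "loc": 1500, "defects": 0},
    {"name": "c.cs", "redundancies": 11, "loc": 300,  "defects": 4},
    {"name": "d.cs", "redundancies": 1,  "loc": 700,  "defects": 1},
]

print(cumulative_defect_coverage(files, lambda f: f["redundancies"]))
print(cumulative_defect_coverage(files, lambda f: f["loc"]))
```

In this toy data the category-specific indicator covers 90% of defects after two files, while sorting by LOC needs three; the study's follow-up analysis makes the same kind of comparison between specific issue categories, their sum, and traditional metrics.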
Conclusions
Summary
• A few Resharper categories showed positive correlations with defects at the component level
• Several Resharper categories were concentrated in defect-prone files
• The issue categories with higher correlations identify problems regarding code readability and performance, and more generally maintainability
• When defects were classified according to the ISO 9126 quality characteristics, different ASA issue categories were positively correlated with different quality characteristics
• Comparing the ability of Resharper issues to detect the faultiest modules, specific ASA issue categories were more efficient than their sum or than traditional indicators (i.e., software metrics)
Recommendations for future work
• Analysis at the file level might lead to more promising results than at the component level
• Analyzed projects should be at least as large as, and preferably larger than, our medium-sized project, to avoid the data sparseness problems we encountered
• Understand whether results for specific categories are useful in other environments or whether tailoring is necessary
• Provide practitioner-oriented methods for building prediction models rather than building new models
Questions?