Post on 24-May-2015
Investigating Automatic Static Analysis Results to Identify Quality Problems: an Inductive Study
Antonio Vetro’ – Nico Zazworka – Forrest Shull – Carolyn Seaman – Michele A. Shaw
IEEE Software Engineering Workshop (SEW-35), 12-13 October 2012 Heraclion, Crete, Greece
Introduction and motivations
Automatic Static Analysis (ASA) - process
Source code → ASA tool (rules, patterns) → Issues (warnings) → Fix issues → Refactored source code
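The process above can be pictured with a toy checker. This is only a sketch: the two rules, the `Issue` record, and the C#-like sample are invented for illustration and do not come from any real ASA tool.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    line: int   # 1-based line number in the analysed source
    rule: str   # name of the rule that fired
    text: str   # offending source line, stripped


# Hypothetical rules (name -> pattern); real tools ship far richer rule sets.
RULES = {
    "empty-catch": re.compile(r"catch\s*(\([^)]*\))?\s*\{\s*\}"),
    "todo-comment": re.compile(r"//\s*TODO"),
}


def analyze(source: str) -> List[Issue]:
    """Match every rule against every line and collect the resulting issues."""
    issues = []
    for number, text in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(text):
                issues.append(Issue(number, rule, text.strip()))
    return issues


# Invented sample input.
SAMPLE = """\
try { Save(order); }
catch (Exception) { }
// TODO: validate input
return order;
"""

for issue in analyze(SAMPLE):
    print(f"line {issue.line}: {issue.rule}: {issue.text}")
```

The developer then decides which reported issues to fix, which is the step that produces the refactored source code.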
Use of ASA: research streams
Looking at ASA issues to identify defects in single lines of code
Looking at large sets of issues as early indicators of the more defect prone modules
Legend: code, bug, ASA issue, ASA issue related to a bug
Looking at ASA issues to identify defects in single lines of code
• Useful as an early verification technique
• It can shorten the time between defect insertion and removal
• However, many studies report high rates of false-positive ASA issues (from 30% to 96%)
• On realistically sized applications, ASA tools typically generate thousands of issues
• Output needs further refinement and tailoring from developers to be useful
Looking at large sets of issues as early indicators of the more defect-prone modules
Legend: software module (class, file, component)
• Useful as a proxy for defect location
• It can provide guidance to inspection/test planning
• Many studies report positive correlations between the number of defects and the number of ASA issues
This study
Contributions
• New tool/language/application combination (Resharper / C# / Web application).
• Analysis at two granularity levels, i.e. software components and source code files.
• We investigate whether specific types of ASA issues can be linked to specific quality dimensions.
Study context
• Web-based industrial application (C#) of about 35 KLOC
• 78 fixed and closed defects reported in the JIRA tracking system
• ASA tool: Resharper
Study goals
G1: Understand whether/which ASA issues are indicators of defect-proneness
RQ C1: Which ASA issue categories can identify defect-prone components?
RQ F1: Which ASA issue categories can identify defect-prone files?
G2: Understand whether/which ASA issues are related to specific software quality characteristics
RQ C2: Which ASA issue categories can point to defect-prone components that impact various system quality characteristics?
RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?
Legend: F = file, C = component, G = goal, RQ = research question
Goal 1
G1: Understand whether/which ASA issues are indicators of defect-proneness
RQ C1: Which ASA issue categories can identify defect-prone components?
ASA issues vs defects: Spearman correlation
RQ F1: Which ASA issue categories can identify defect-prone files?
Issues in non-defect-prone files vs issues in defect-prone files: Mann-Whitney test
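For readers unfamiliar with the component-level statistic, Spearman's rho is the Pearson correlation computed on ranks. The sketch below shows the calculation in plain Python; the per-component counts are made up for illustration and are not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up counts for five components (illustrative only).
issues_per_component = [12, 40, 7, 25, 18]
defects_per_component = [1, 6, 0, 3, 4]

rho = spearman(issues_per_component, defects_per_component)
print(round(rho, 2))
```

A rank-based measure suits this kind of data because issue and defect counts are typically skewed, and only the ordering of components matters.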
Goal 2
G2: Understand whether/which ASA issues are related to specific software quality characteristics
RQ C2: Which ASA issue categories can point to defect-prone components that impact various system quality characteristics?
ASA issues vs defects: Spearman correlation
RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?
Issues in non-defect-prone files vs issues in defect-prone files: Mann-Whitney test
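The file-level comparison can be sketched as follows. This is a minimal illustration, assuming a two-sided test with the normal approximation and no tie or continuity correction; the issue counts are invented, and a statistics package would normally be used in practice.

```python
from math import erfc, sqrt


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, approximate two-sided p-value).

    U counts the pairs in which a value from sample_a exceeds a value from
    sample_b, with ties counting one half. The p-value comes from the normal
    approximation, without tie or continuity correction.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_a, p_two_sided


# Invented issue counts per file for one issue category (illustrative only).
issues_in_defect_prone_files = [9, 14, 11, 20, 16]
issues_in_other_files = [3, 8, 5, 10, 2, 7]

u, p = mann_whitney_u(issues_in_defect_prone_files, issues_in_other_files)
print(u, round(p, 3))
```

For RQ F2 the same comparison would be repeated per quality characteristic, with "defect-prone" restricted to files touched by defects of that characteristic.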
Goal 2
SW quality characteristics: Functionality, Reliability, Efficiency, Usability, Portability, Maintainability
Using the ISO/IEC 9126 product quality model to classify defects: a controlled experiment, A. Vetro’, N. Zazworka, C. Seaman, and F. Shull, IET Digest 2012, 187 (2012), DOI:10.1049/ic.2012.0025
Mapping between ASA issues, Defects, Files, Components
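One plausible way to build such a mapping is to count issues and defects per file and then roll both up to components. The sketch below assumes that the top-level folder names the component and that each defect is linked to the files changed by its fix; the paths, categories, and defect IDs are all hypothetical.

```python
from collections import Counter

# Hypothetical records, invented for illustration.
ISSUES = [  # (file, issue category)
    ("Web/Cart.cs", "Redundancies in Code"),
    ("Web/Cart.cs", "Compiler Warnings"),
    ("Data/Repo.cs", "Redundancies in Code"),
]
DEFECT_FIXES = [  # (defect id, file changed by the fix)
    ("DEF-1", "Web/Cart.cs"),
    ("DEF-2", "Web/Cart.cs"),
    ("DEF-2", "Data/Repo.cs"),
]


def component_of(path):
    """Assumption: the top-level folder identifies the component."""
    return path.split("/")[0]


issues_per_file = Counter(path for path, _ in ISSUES)
defects_per_file = Counter(path for _, path in DEFECT_FIXES)
issues_per_component = Counter(component_of(path) for path, _ in ISSUES)

# Count each defect once per component, even if its fix touched several files there.
defect_component_pairs = {(defect, component_of(path)) for defect, path in DEFECT_FIXES}
defects_per_component = Counter(component for _, component in defect_component_pairs)

print(dict(issues_per_component), dict(defects_per_component))
```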
Results
RQ C1-C2: Which ASA issue categories can identify defect-prone components?
Resharper issue categories \ Defect types   All (RQ C1)   F       FR      FU      R       U
Common Practices and Code Improvements      -0.14         -0.13   -0.34   0.07    0       -0.2
Compiler Warnings                           0.3           0.31    0.48    0.28    0.04    0.25
Constraints Violations                      0.11          0.1     0.03    0.09    0.23    0.18
Language Usage Opportunities                0.57          0.53    0.55    0.5     0.2     0.43
Potential Code Quality Issues               0.54          0.5     0.51    0.44    0.22    0.44
Redundancies in Code                        0.52          0.49    0.47    0.33    0.39    0.53
Redundancies in Symbol Declarations         0.42          0.45    0.01    0.28    0.17    0.14
Unused symbols                              0.53          0.53    0.75    0.57    0.33    0.56
Sum of Resharper issues                     0.19          0.18    0.1     0.09    0.23    0.23
In bold significant values (90%)
RQ F1: Which ASA issue categories can identify defect-prone files?
Resharper issues                            Pval
ASP.NET                                     NA
Common Practices and Code Improvements      0.983
Compiler Warnings                           0.333
Constraints Violations                      0.014
Language Usage Opportunities                0.026
Potential Code Quality Issues               0.021
Redundancies in Code                        <0.001
Redundancies in Symbol Declarations         0.969
Unused Symbols                              NA
Sum                                         0.133
In bold significant values (90%)
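Since the bold marking may not survive in a plain-text copy, the significant categories can be recovered from the p-values above. The sketch assumes that "90%" means a significance threshold of p < 0.10, and it stores the "<0.001" entry as its upper bound.

```python
# P-values copied from the RQ F1 table; None stands for NA.
P_VALUES = {
    "ASP.NET": None,
    "Common Practices and Code Improvements": 0.983,
    "Compiler Warnings": 0.333,
    "Constraints Violations": 0.014,
    "Language Usage Opportunities": 0.026,
    "Potential Code Quality Issues": 0.021,
    "Redundancies in Code": 0.001,  # reported as <0.001; upper bound used here
    "Redundancies in Symbol Declarations": 0.969,
    "Unused Symbols": None,
    "Sum": 0.133,
}

ALPHA = 0.10  # assumption: "significant (90%)" read as p < 0.10

significant = [
    name for name, p in P_VALUES.items() if p is not None and p < ALPHA
]

for name in significant:
    print(name)
```

Under that reading, four individual categories pass the threshold while the plain sum of all issues does not.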
RQ F2: Which ASA issue categories can point to defect-prone files that impact various system quality characteristics?
Quality characteristic – Resharper issue category   Pval
F – Constraints Violations                          0.013
F – Redundancies in Code                            0.002
FR – Compiler Warnings                              0.001
FU – Constraints Violations                         0.002
FU – Redundancies in Code                           0.004
FU – Sum                                            0.062
R – Redundancies in Code                            0.033
R – Sum                                             0.029
U – Constraints Violations                          0.085
U – Language Usage Opportunities                    0.042
U – Potential Code Quality Issues                   <0.001
U – Redundancies in Code                            0.033
Follow-up analysis: sorting files by different indicators
[Figure: cumulative distribution of defects in files and indicators]
Follow-up analysis: sorting components by different indicators
[Figure: cumulative distribution of defects in components and indicators]
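The idea behind these plots can be sketched in a few lines: sort the modules by an indicator, then track what share of all defects has been covered after each module. The file names, sizes, and counts below are invented, and `loc` merely stands in for a traditional size metric.

```python
def cumulative_defect_share(files, indicator):
    """Sort files by the indicator (descending) and return the running
    share of all defects covered after each file."""
    ranked = sorted(files, key=lambda f: f[indicator], reverse=True)
    total = sum(f["defects"] for f in ranked)
    shares, running = [], 0
    for f in ranked:
        running += f["defects"]
        shares.append(running / total)
    return shares


# Invented data for illustration only.
FILES = [
    {"name": "A.cs", "loc": 900, "redundancies": 2, "defects": 1},
    {"name": "B.cs", "loc": 300, "redundancies": 9, "defects": 5},
    {"name": "C.cs", "loc": 600, "redundancies": 5, "defects": 3},
    {"name": "D.cs", "loc": 150, "redundancies": 0, "defects": 1},
]

by_issue_category = cumulative_defect_share(FILES, "redundancies")
by_size = cumulative_defect_share(FILES, "loc")
print(by_issue_category)
print(by_size)
```

An indicator is more efficient when its curve rises faster, meaning that fewer modules need to be inspected to reach a given share of the defects.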
Conclusions
Summary
• Few Resharper categories had positive correlations with defects at the component level
• Several Resharper categories were concentrated in defect-prone files
• The issues with higher correlations identify problems with code readability and performance and, more generally, with maintainability.
• When defects were classified according to the ISO 9126 quality characteristics, different ASA issue categories were positively correlated with different quality characteristics.
• In detecting the faultiest modules, specific Resharper issue categories were more efficient than the sum of all issues or traditional indicators (i.e. software metrics).
Recommendations for future work
• Analysis at the file level might lead to more promising results than analysis at the component level.
• The project should be at least as large as, and preferably larger than, our medium-sized project, to avoid the data sparseness problems we found in our study.
• Understand whether results for specific categories are useful in other environments or whether tailoring is necessary.
• Provide practitioner-oriented methods to build prediction models rather than building new models.
Questions?
antonio.vetro@polito.it