An Empirical Study on the Adequacy of Testing in Open Source Projects
TRANSCRIPT
An Empirical Study on the Adequacy of Testing in Open Source Projects
Pavneet S. Kochhar1, Ferdian Thung1, David Lo1, and Julia Lawall2
1 Singapore Management University   2 Inria/LIP6, France
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg, [email protected]
Asia-Pacific Software Engineering Conference (APSEC’14)
Open-Source Software, Why Bother?
• A plethora of open source software is used by many commercial applications
• Large organizations invest time, effort, and money in open source development
Software Testing, Why Bother?
• Functionality -- meeting the requirements
• Bugs -- software reliability
• Costs -- bugs found late cost more to fix
Software Testing, Why Bother?
• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually
[1] J.R. Horgan and A.P. Mathur, “Software testing and reliability,” McGraw-Hill, Inc., 1996.
[2] G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, 2002.
Study Goals
• Understand the state of the practice of testing among open source projects
• Make recommendations to improve the state of the practice
Are open-source projects adequately tested?
Understanding State-of-Practice
• Study a large number of projects
• Check adequacy of testing
  – Execute test cases
  – Assess test adequacy
• Characterize cases of inadequate testing
  – Correlate project metrics with test adequacy
  – At various levels of granularity
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Test Adequacy
• Test Adequacy Criterion
  – Property that must be satisfied for a test suite to be thorough
  – Often measured by code coverage
• Code Coverage
  – Percentage of the code executed by test cases
    • Line coverage
    • Branch coverage
Test Adequacy
CT = number of branches that evaluate to true
CF = number of branches that evaluate to false
B  = total number of branches
LC = total number of lines that are executed
EL = total number of lines that are executable
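From these definitions, the slide's coverage formulas can be reconstructed; a minimal sketch, assuming Sonar-style metrics (line coverage = LC/EL, branch coverage counts each branch's true and false outcomes, and the combined formula is what Sonar reports as overall coverage):

```python
def line_coverage(lc: int, el: int) -> float:
    """LC / EL: fraction of executable lines executed by tests."""
    return lc / el

def branch_coverage(ct: int, cf: int, b: int) -> float:
    """(CT + CF) / 2B: each branch counts twice, once per outcome."""
    return (ct + cf) / (2 * b)

def overall_coverage(ct: int, cf: int, b: int, lc: int, el: int) -> float:
    """Sonar-style combined coverage over branch outcomes and lines."""
    return (ct + cf + lc) / (2 * b + el)

# Illustrative values: CT=7, CF=5 of B=10 branches; LC=80 of EL=100 lines.
print(line_coverage(80, 100))                          # 0.8
print(branch_coverage(7, 5, 10))                       # 0.6
print(round(overall_coverage(7, 5, 10, 80, 100), 3))   # 0.767
```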
Why Code Coverage?
• Mockus et al. [1]
  – Higher coverage leads to fewer post-release defects
• Berner et al. [2]
  – Judicious use of coverage helps in finding new defects
• Shamasunder [3]
  – Branch and block coverage correlate with fault detection
[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, “Test coverage and post-verification defects: A multiple case study,” in ESEM, 2009.
[2] S. Berner, R. Weber, and R. K. Keller, “Enhancing software testing by judicious use of code coverage information,” in ICSE, 2007.
[3] S. Shamasunder, “Empirical study - pairwise prediction of fault based on coverage,” Master’s thesis, 2012.
Source Code Metrics
• Number of lines of code (LOC)
• Cyclomatic complexity (CC)
  – Number of linearly independent paths through the source code
• Number of developers
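Cyclomatic complexity can be approximated as one plus the number of decision points in the code; a rough sketch using Python's ast module (an illustration only; Sonar's actual computation for Java may differ in which constructs it counts):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    decisions = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.BoolOp)
    return 1 + sum(isinstance(node, decisions)
                   for node in ast.walk(ast.parse(source)))

# A function with an if/elif chain: two decision points, so CC = 3.
snippet = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 3
```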
Tool Support
• Computes the source code metrics
• Runs test cases
• Computes the overall coverage
• Relies on the Maven directory structure
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Data Collection
• GitHub -- the largest site for open source project development
  – >3,000,000 users & 5,000,000 repositories
• Debian -- one of the most popular Linux distributions
Data Collection
• Find projects that use Maven
  – Needed to run Sonar
• 757 projects and 228 projects from the two sources
• 945 projects after removing duplicates
Data Collection
• mvn clean install – compiles the project
• mvn sonar:sonar – runs test cases and gets statistics
• 945 projects → 872 projects contain test suites → 327 projects successfully compile, run test cases & produce coverage
Data Collection
[Figure: distribution of the number of lines of code across projects]
[Figure: distribution of the number of test cases across projects]
Data Collection
[Figure: distribution of cyclomatic complexity across projects]
[Figure: distribution of the number of developers across projects]
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Research Questions
RQ1: What are the coverage levels and test success densities exhibited by different projects?
RQ2: What are the correlations between various software metrics and code coverage at the project level?
RQ3: What are the correlations between various software metrics and code coverage at the source code file level?
Research Questions
RQ1: Coverage Levels & Test Success Densities
RQ1: Coverage
Coverage Level Distribution

| Coverage Level (%) | Number of Projects |
|---|---|
| 0-25 | 105 |
| 25-50 | 90 |
| 50-75 | 92 |
| 75-100 | 40 |

• 40 projects have coverage between 75%-100%
• Average coverage – 41.96%
• Median coverage – 40.30%
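The distribution and averages above can be reproduced from per-project coverage values; a small sketch with made-up numbers (the slides do not include the raw per-project data):

```python
from statistics import mean, median

# Hypothetical per-project coverage percentages (illustrative only).
coverages = [12.0, 30.5, 48.2, 55.0, 61.3, 78.9, 92.4]

# Bin each project into the four coverage levels used in the table.
bins = {"0-25": 0, "25-50": 0, "50-75": 0, "75-100": 0}
for c in coverages:
    if c < 25:
        bins["0-25"] += 1
    elif c < 50:
        bins["25-50"] += 1
    elif c < 75:
        bins["50-75"] += 1
    else:
        bins["75-100"] += 1

print(bins)
print(round(mean(coverages), 2), median(coverages))  # 54.04 55.0
```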
RQ1: Success Density
• 254 projects have test success density >= 98%
• Test success density = passing tests / total tests
Research Questions
RQ2: Metrics vs. Coverage at Project Level
RQ2: Metrics vs. Coverage (Project)
Lines of Code vs. Coverage
• Spearman’s rho = -0.306 (Negative Correlation)
• p-value = 1.566e-08
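Spearman's rank correlation, used throughout RQ2 and RQ3, can be computed directly; a self-contained sketch on illustrative data (not the study's dataset), using the no-ties formula:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via the no-ties formula:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative data: coverage falls monotonically as project size grows.
loc = [1_000, 5_000, 20_000, 80_000, 300_000]
cov = [70.0, 65.0, 50.0, 30.0, 20.0]
print(spearman_rho(loc, cov))  # -1.0
```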
RQ2: Metrics vs. Coverage (Project)
• Spearman’s rho = -0.276 (Negative Correlation)
• p-value = 3.665e-07
Cyclomatic Complexity vs. Coverage
RQ2: Metrics vs. Coverage (Project)
• Spearman’s rho = 0.016 (Insignificant Correlation)
• p-value = 0.763
Number of Developers vs. Coverage
Research Questions
RQ3: Metrics vs. Coverage at File Level
RQ3: Metrics vs. Coverage (File)
• Spearman’s rho = 0.180 (Small Positive Correlation)
• p-value < 2.2e-16
Lines of Code vs. Coverage
RQ3: Metrics vs. Coverage (File)
• Spearman’s rho = 0.221 (Small Positive Correlation)
• p-value < 2.2e-16
Cyclomatic Complexity vs. Coverage
RQ3: Metrics vs. Coverage (File)
• Spearman’s rho = 0.050 (Negligible Correlation)
• p-value < 2.2e-16
Number of Developers vs. Coverage
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Recommendations
• Practitioners:
  ‒ Need to improve testing efforts, especially for large or complex software projects
  ‒ Need to look into automated test case generation tools
• Researchers:
  ‒ Need to promote new tools that can be easily used by developers
  ‒ Need to develop test case generation tools that can scale to large projects
Threats to Validity
• Internal validity:
  – Sonar might produce incorrect metrics or coverage values
    • Projects may not conform to the Maven directory structure
  – We have performed some manual checks
• External validity:
  – Only analyze 300+ projects from GitHub and Debian
Threats to Validity
• Construct validity:
  – We use a standard adequacy criterion: code coverage
  – We use standard code metrics: lines of code (LOC) and cyclomatic complexity (CC)
  – Threats to construct validity are therefore minimal
Related Work
• Empirical studies on testing and coverage
  – Mockus et al. study the impact of coverage on the number of post-release defects [1]
  – Shamasunder analyzes the impact of different kinds of coverage on fault detection [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite’s effectiveness in killing mutants [3]
[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, “Test coverage and post-verification defects: A multiple case study,” in ESEM, 2009.
[2] S. Shamasunder, “Empirical study - pairwise prediction of fault based on coverage,” Master’s thesis, 2012.
[3] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation by developers,” in ICSE, 2014.
Related Work
• Test case generation techniques
  – Thummalapenta et al. automatically generate a series of method invocations to produce a target object state [1]
  – Pandita et al. produce test inputs to achieve logical and boundary-value coverage [2]
  – Park et al. combine random testing with static program analysis and concolic execution [3]
[1] S. Thummalapenta et al., “Synthesizing method sequences for high-coverage testing,” in OOPSLA, 2011.
[2] R. Pandita et al., “Guided test generation for coverage criteria,” in ICSM, 2010.
[3] S. Park et al., “CarFast: Achieving higher statement coverage faster,” in FSE, 2012.
Conclusion
• Many open-source projects are poorly tested
  ‒ Only 40/327 projects have high coverage
  ‒ Average coverage: 41.96%
• Coverage is poorer when projects get larger and more complex.
• Coverage is better for larger and more complex source code files.
• The number of developers is not significantly correlated with coverage.
Future Work
• Expand the study to include more projects
  – Address the threats to external validity
• Investigate other software metrics
  – Common cases of poor coverage
• Investigate the amount of effort required to attain a particular level of coverage
  – Cost-effectiveness analysis: effort vs. benefit