An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows
IEEE Transactions on Software Engineering
Wes Masri, Andy Podgurski, David Leon
Presented by Jason R. Beck and Enrique G. Ortiz
Introduction and background definitions
Paper objectives
Filtering techniques
Profile types and tools
Empirical study description
Subject programs
Results
Conclusion
Pros/cons and suggestions
Information Flow
◦ An important concept in software testing research.
◦ Describes complex interactions between different program elements.
Software failures
◦ Often caused by untested information flows.
◦ Why? Information flows can be complex, and there are too many to make testing them all feasible.
Test Case Filtering
◦ Involves selecting a manageable number of test cases to use.
Software Profiles
◦ Software profiles are interactions recorded during program operation.
◦ Can describe control flow, data flow, input or variable values, object states, event sequences, and timing.
◦ Profiles can be analyzed for how likely they are to generate errors, and the likely ones can be tested further.
1. Reduce the number of test cases to be executed.
2. Reduce the number of test executions that need a manual interpretation of correct output.
◦ Anything that requires a human interpretation of results as part of the test involves much effort.
◦ This effort can be eliminated if test cases are automated and self-validating.
Presents the results of an empirical study of many test case filtering techniques.
Evaluates the techniques for their ability to reveal defects in programs.
Information profiles are created using a tool developed by the authors.
These are generally graph-theoretic models showing information flow in the software.
Many techniques have been proposed; the authors focus on:
1. Information flow between objects (data driven).
2. Dynamic program slicing: program-statement driven (think of a stack trace when debugging).
Both have static and runtime versions.
Two classes of techniques are compared, each driven by execution profiles that indicate the execution frequency of program elements:
1. Coverage-Based Techniques
2. Distribution-Based Techniques
“Select test cases to maximize the proportion of program elements of a given type”
◦ Attempts to cover as many elements of the program as possible with the fewest number of test cases.
◦ An instance of the set-cover problem.
Algorithm
◦ Each iteration selects a test case that covers the largest number of program elements not covered by the previously selected tests.
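The greedy loop above can be sketched as follows. The test IDs and covered-element sets are invented for illustration; in the study, elements are items such as basic blocks or method calls taken from execution profiles.

```python
# Greedy coverage maximization: the classic set-cover heuristic.
# Each test maps to the set of program elements its profile covers.
# Test names and element IDs below are illustrative only.

def coverage_maximization(profiles):
    """profiles: dict of test id -> set of covered elements.
    Returns tests in greedy selection order."""
    uncovered = set().union(*profiles.values())
    remaining = dict(profiles)
    selected = []
    while uncovered and remaining:
        # Pick the test covering the most still-uncovered elements.
        best = max(remaining, key=lambda t: len(remaining[t] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining test adds new coverage
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected

profiles = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {4, 5, 6},
    "t4": {1, 6},
}
print(coverage_maximization(profiles))  # ['t1', 't3']
```

Note that the study breaks ties between equally covering tests by randomizing the order of the tests across replications; this sketch simply takes the first maximum.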
Clustering technique
◦ Test cases are clustered, and a test case from each cluster can be selected to represent the group.
◦ Clusters are formed by treating execution profiles as patterns with n dimensions.
◦ Each dimension represents the execution count of a basic block of code.
Also uses failure-pursuit sampling
◦ Audits test cases near failures using a k-nearest-neighbor approach.
◦ This allows cases similar to the failing ones to be checked.
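A minimal sketch of one-per-cluster sampling, assuming single-linkage agglomerative clustering over execution-count vectors. The paper uses agglomerative hierarchical clustering with a proportional binary metric; the Euclidean distance and the tiny profiles here are simplifying assumptions for illustration.

```python
import random
from itertools import combinations

def euclid(a, b):
    # Euclidean distance between two execution-count vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative_clusters(profiles, c):
    """Single-linkage agglomerative clustering of execution-count
    vectors down to c clusters. profiles: dict test -> tuple of counts."""
    clusters = [{t} for t in profiles]
    while len(clusters) > c:
        # Merge the two clusters containing the closest pair of members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                euclid(profiles[a], profiles[b])
                for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

def one_per_cluster(profiles, c, rng=random.Random(0)):
    # One-per-cluster sampling: one random representative per cluster.
    return [rng.choice(sorted(cl)) for cl in agglomerative_clusters(profiles, c)]

profiles = {
    "t1": (5, 0, 1), "t2": (4, 0, 1),  # two similar profiles
    "t3": (0, 9, 0), "t4": (0, 8, 1),  # two more similar profiles
}
print(one_per_cluster(profiles, 2))  # one representative per cluster
```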
Profiles characterize test executions by keeping track of the execution frequencies of program elements.
The study takes eight types of profiles into account.
◦ Generated using the Byte Code Engineering Library to examine the byte code of Java programs.
◦ The study also uses an existing tool the authors created for dynamic information flow analysis.
Method Calls (MC)
◦ A count of how many times a method M was called.
Method Call Pairs (MCP)
◦ A count of how many times a method M1 called a method M2.
Basic Blocks (BB)
◦ A count of how many times a given basic block of code was executed.
Basic Block Edges (BBE)
◦ A count of how many times a basic block B1 branches to basic block B2.
Def-Use Pairs (DUP)
◦ A count of how many times a variable is defined and then later used.
All of the above combined (ALL)
◦ A combination of all the above profile types.
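The simpler count-based profiles can be illustrated by counting events in a call trace. The (caller, callee) trace format below is an assumption for illustration; the paper instruments Java byte code with BCEL rather than consuming a trace like this.

```python
from collections import Counter

def build_profiles(call_trace):
    """Build MC and MCP profiles from a recorded call trace.
    call_trace: list of (caller, callee) pairs (illustrative format)."""
    mc = Counter()   # Method Calls: callee -> count
    mcp = Counter()  # Method Call Pairs: (caller, callee) -> count
    for caller, callee in call_trace:
        mc[callee] += 1
        mcp[(caller, callee)] += 1
    return mc, mcp

trace = [("main", "parse"), ("parse", "lex"), ("parse", "lex"), ("main", "report")]
mc, mcp = build_profiles(trace)
print(mc["lex"])              # 2
print(mcp[("parse", "lex")])  # 2
```

The other count-based profiles (BB, BBE, DUP) have the same shape: a counter keyed by a block, a block edge, or a def-use pair instead of a method.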
More complex profile types
Information Flow Pairs (IFP)
◦ A count of how many times a variable x flowed into a variable y.
Slice Pairs (SliceP)
◦ For each statement pair s1 and s2, whether s1 occurs before s2 in at least one slice.
Basic Coverage Maximization
Cluster Filtering (one-per-cluster sampling)
Failure-Pursuit Sampling
Simple Random Sampling
Empirical Study
Ties
◦ “different tests that each covers the maximal number of program elements not covered by previously selected tests”
Ran 1,000 times per program/profile type
Randomly selected the order of the tests
Recorded:
◦ Number of tests selected
◦ How many failures and defects detected
Basic Coverage Maximization
Proportional binary metric and agglomerative hierarchical clustering
The number of clusters was varied to correspond to a range of percentages of the size of the test suite
Procedure
1. Tests are clustered into c clusters based on their profiles
2. One test is randomly selected from each cluster
3. The number of failures and defects revealed is recorded
Run 1,000 times
Failure pursuit: check the 5 nearest neighbors
Cluster Filtering and Failure-Pursuit Sampling
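Failure-pursuit sampling can be sketched as follows: whenever a sampled test fails, its k nearest neighbors by profile distance are audited too, since similarly profiled tests may expose the same or related defects. The distance metric, profiles, and failure oracle below are illustrative assumptions, not the paper's implementation.

```python
def nearest_neighbors(profiles, test, k=5):
    """Return the k tests whose profiles are closest to `test`'s."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    others = [t for t in profiles if t != test]
    return sorted(others, key=lambda t: dist(profiles[t], profiles[test]))[:k]

def failure_pursuit(profiles, sampled, is_failure, k=5):
    """Starting from an initial sample, repeatedly audit the k nearest
    neighbors of every failing test until no new failures appear."""
    to_check, seen, failures = list(sampled), set(sampled), []
    while to_check:
        t = to_check.pop()
        if is_failure(t):
            failures.append(t)
            for n in nearest_neighbors(profiles, t, k):
                if n not in seen:
                    seen.add(n)
                    to_check.append(n)
    return failures

profiles = {"a": (0, 1), "b": (0, 2), "c": (9, 9), "d": (1, 1)}
failing = {"a", "b"}
print(failure_pursuit(profiles, ["a"], failing.__contains__, k=2))  # ['a', 'b']
```

Starting from the single failing test "a", the pursuit reaches its near neighbor "b" (also failing) while never auditing the distant test "c".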
Randomly select tests without replacement
Record the number of failure-inducing tests and defects
Ran 1,000 times
Simple Random Sampling
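The random-sampling baseline is a draw without replacement; a minimal sketch (test names and failing set are invented for illustration):

```python
import random

# Simple random sampling baseline: draw n tests without replacement
# and count how many of them are failure-inducing.
tests = [f"t{i}" for i in range(100)]
failing = {"t3", "t42", "t77"}  # illustrative failing tests

rng = random.Random(1)          # fixed seed for reproducibility
sample = rng.sample(tests, 20)  # sampling without replacement
hits = sum(t in failing for t in sample)
print(len(sample), hits)
```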
Subject Programs and Test Suites
28,639 lines of code
Jacks test suite
◦ 3,140 tests
◦ 233 cause failures
javac Java Compiler
52,528 lines of code
XML Conformance Test Suite
◦ Used 1,667 of 2,000 tests; it was difficult to determine pass/fail for the dropped tests
◦ 10 cause failures
◦ Only checks syntax
Xerces XML Parser
Tests compliance with the Java Language Specification
1,000 files (tests) from Google Groups
◦ Failed on 47 of the test cases
Tidy HTML Syntax Checker
Defects that caused errors were traced
Results:
◦ Average percentage of defects revealed over a number of replicated applications, viewed as a function of the number of tests selected
◦ Techniques compared with respect to how often they revealed specific defects
Analysis
Basic Coverage Maximization Results
Several defects were revealed in 1,000 replications
Some defects were revealed only when SliceP and IFP coverage was maximized
“Maximization with one type of profile revealed defects that were not revealed with another type of profile that seems to be more detailed.”
Results
Simpler profile types (i.e. MC, MCP, BB, BBE, and DUP) revealed more defects than IFP
“Information Flow Pairs are recorded only when a variable is actually defined (assigned a value), but some defects may be triggered without executing such an operation.”
Anomalies
Distribution-Based Filtering Results
Programs too broad
Did not debug programs enough
Wrongly classified defects
Assumes the size of the final set of tests is an accurate measure of cost
Threats to Validity
Time and space costs increase with the level of profile detail
The time for collecting profile information is longer than the time needed for analysis
Observations: Cost and Analysis
Coverage maximization, one-per-cluster sampling, and failure-pursuit sampling were more effective than random sampling when the proportion of failures was high
Coverage maximization based on complex profiles revealed the most defects
Conclusions
One-per-cluster sampling and failure pursuit did not clearly perform better than coverage maximization
No clear performance difference between one-per-cluster and failure pursuit sampling
Conclusions
Empirically evaluates test case filtering techniques
Compares them with respect to:
◦ Effectiveness for revealing defects
◦ Simple random sampling as a baseline
Complex profiles such as IFP and SliceP are justifiable when a large number of tests is necessary
Conclusions
Pros
◦ Describes a good way to analyze programs.
◦ Uses profiles to help minimize complexity, focusing on only the most meaningful code chunks.
Cons
◦ Programs tested were just compilers and syntax checkers.
◦ Graphs could have better captions explaining what is occurring.
Use only one test suite
◦ Pick several different program types that can be tested with the same suite
◦ This eliminates an additional variable
Select several types of programs
Suggestions