wcre11a.ppt
TRANSCRIPT
An Exploratory Study of Macro Co-changes
Fehmi Jaafar, Yann-Gael Gueheneuc, Sylvie Hamel, and Giuliano Antoniol
Introduction and context
Problems and Motivation
Macocha
Empirical study
Validation
Conclusion and Ongoing Work
An Exploratory Study of Macro Co-changes
Fehmi Jaafar, Yann-Gael Gueheneuc, Sylvie Hamel, and Giuliano Antoniol
Universite de Montreal, Quebec, Canada
Thursday, October 20, 2011
Pattern Trace Identification, Detection, and Enhancement in Java
SOftware Cost-effective Change and Evolution Research Lab
Introduction
Context
◮ Developers must continually change their software programs to meet new requirements and user needs.
◮ Many approaches extract and analyse the changes undergone by artefacts and infer change propagation. [a]
◮ Several of these approaches identify co-changes among artefacts. [b]
[a] A. E. Hassan and R. C. Holt. ICSM 2004.
[b] Z. Xing and E. Stroulia. TSE 2005.
Introduction
Co-change
◮ Two artefacts are co-changing if they were changed by the same author and with the same log message in a time window of less than 200 ms. [a]
◮ Mockus et al. [b] defined the proximity in time of check-ins as the check-in times of adjacent files differing by less than three minutes.
◮ Other studies [c] described issues with identifying atomic change sets and reported that, in all cases, they differed by a few minutes.
[a] T. Zimmermann et al. ICSE 2004.
[b] A. Mockus et al. TSE 2004.
[c] D. M. German. TSE 2006.
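The classical co-change definition in the first bullet can be sketched as follows. This is only an illustration under stated assumptions: the function name `co_changing_pairs`, the tuple layout of a check-in, and the sample data are hypothetical; the 200 ms window is the value given on the slide.

```python
from collections import defaultdict
from itertools import combinations

WINDOW_MS = 200  # time window as stated on the slide


def co_changing_pairs(checkins, window_ms=WINDOW_MS):
    """Return pairs of files that co-change (hypothetical helper).

    checkins: iterable of (file, author, log_message, time_ms) tuples.
    Two files co-change when they share author and log message and
    their check-in times differ by less than window_ms.
    """
    by_key = defaultdict(list)
    for file, author, message, time_ms in checkins:
        by_key[(author, message)].append((time_ms, file))

    pairs = set()
    for entries in by_key.values():
        for (t1, f1), (t2, f2) in combinations(sorted(entries), 2):
            if f1 != f2 and t2 - t1 < window_ms:
                pairs.add(frozenset((f1, f2)))
    return pairs


# Made-up check-ins for illustration only.
checkins = [
    ("A.java", "mvw", "fix notation", 1000),
    ("B.java", "mvw", "fix notation", 1100),       # same author/message, 100 ms later
    ("C.java", "mvw", "fix notation", 5000),       # same author/message, too late
    ("D.java", "tfmorris", "fix notation", 1050),  # different author
]
pairs = co_changing_pairs(checkins)
```

In this sample only A.java and B.java are reported; C.java and D.java are missed, which is the kind of gap the following slides address.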
Problems and Motivation
Missing Dependencies
◮ Some files (e.g., in ArgoUML, NotationUtilityJava.java and ModelElementNameNotationUml.java) were never changed by the same developer at the same time but were changed by different developers (mvw and tfmorris) in two consecutive change periods.
◮ Previous co-change definitions are intrinsically limited in time. They cannot express patterns of changes across long time intervals (e.g., ArgoDiagram.java and ModeCreateAssociationClass.java were maintained by the same developer, but the changes were separated by a few hours).
Problems and Motivation
Macro co-change
A macro co-change is a set of two or more files that change together, i.e., they were maintained in the same change periods.
Figure: Two changes performed by one developer are sequential in time (a few hours apart); F1 and F2 are macro co-changing
Problems and Motivation
Dephase macro co-change
A dephase macro co-change is a set of two or more files that have been observed to change with the same shift s.
Figure: Files F1 and F2 are changed by different developers in two consecutive periods of time. They are dephase macro co-changing
Macocha (1/9)
Macocha
We propose Macocha to:
1. Mine version-control systems (CVS and SVN).
2. Identify the change periods in a program.
3. Group the program source files according to their stability through the change periods.
4. Identify among changed files those that have similar co-change patterns.
Macocha (2/9)
Macocha
We draw inspiration from and extend the classical sliding-window approach [a] to consider that two subsequent changes are part of one change period if they were committed:
◮ by any author;
◮ with any log message;
◮ without an interrupt between the two changes.
[a] T. Zimmermann et al. ICSE 2004.
Macocha (3/9)
Macocha
◮ The duration of a change period is less than 40 hours. [a]
◮ If the interrupt between two subsequent changes is more than t = 5.17 hours, these two changes belong to two different change periods.
◮ An SMCC is a set of two or more files that have identical profiles during the life cycle of a program.
◮ An SDMCC is the set composed of F1 and one or more files, F2...FM, such that F2...FM always macro co-change with the same shift in time s with respect to F1.
◮ We use the Hamming distance DH to measure the amount of difference between two change profiles.
[a] L. Hatton. Computer 2007.
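The rule for separating change periods can be sketched as follows. This is a minimal illustration, not Macocha's implementation: the function name `split_into_change_periods`, the representation of commits as plain timestamps, and the sample data are assumptions; only the threshold t = 5.17 hours comes from the slide, and the 40-hour bound on the duration of a period is not enforced here.

```python
from datetime import datetime, timedelta

T_HOURS = 5.17  # interrupt threshold t from the slide


def split_into_change_periods(commit_times, t_hours=T_HOURS):
    """Group commit timestamps into change periods (hypothetical helper).

    Two subsequent commits stay in the same period when the gap between
    them is at most t_hours, whatever their author or log message.
    """
    gap = timedelta(hours=t_hours)
    periods = []
    for ts in sorted(commit_times):
        if periods and ts - periods[-1][-1] <= gap:
            periods[-1].append(ts)   # no interrupt: extend current period
        else:
            periods.append([ts])     # interrupt longer than t: new period
    return periods


# Made-up commit times for illustration only.
commits = [
    datetime(2008, 3, 1, 9, 0),
    datetime(2008, 3, 1, 12, 30),  # 3.5 h after the previous commit
    datetime(2008, 3, 1, 19, 0),   # 6.5 h gap, starts a new period
    datetime(2008, 3, 2, 8, 0),    # 13 h gap, starts a new period
]
periods = split_into_change_periods(commits)
```

With these sample timestamps the first two commits share a period and the other two each open a new one.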
Macocha (4/9)
Figure: Analysis process of Macocha
Macocha (5/9)
Macocha
Step 1: Detection of change periods.
Figure: Analysis of commits and creation of change periods
Macocha (6/9)
Macocha
Step 2: Creation of file profiles.
Figure: From revision control systems to file profiles
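A file profile can be pictured as a bit vector with one bit per change period. The sketch below assumes that bit i is 1 when the file was changed in period i and 0 otherwise, which matches the bit vectors mentioned in Step 4 but is not spelled out on this slide; the function name `build_profiles` and the sample data are hypothetical.

```python
def build_profiles(period_changes, all_files):
    """Build one bit-vector profile per file (hypothetical helper).

    period_changes: list with, for each change period in chronological
    order, the set of file names changed during that period.
    """
    return {
        name: [1 if name in changed else 0 for changed in period_changes]
        for name in all_files
    }


# Made-up data: four change periods, four files.
period_changes = [{"F1", "F2"}, {"F3"}, {"F1", "F2"}, set()]
profiles = build_profiles(period_changes, ["F1", "F2", "F3", "F4"])
```

Under this assumption, a file whose profile contains only zeros (F4 here) was never changed and would count as idle.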
Macocha (7/9)
Macocha
Step 3: Stability analysis.
Figure: Profiles showing file stability
Macocha (8/9)
Macocha
Step 4: Detection of macro co-changes.
Figure: Files F1 and F2 are in macro co-change
Figure: Three different bit vectors showing approximate macro co-changes (D(F1,F2)=2; D(F1,F3)=4)
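The comparison of profiles can be sketched with a plain Hamming distance and a grouping of identical profiles. The bit vectors below are invented so that they reproduce the distances in the caption (2 and 4); they are not the vectors of the original figure, and the helper names `hamming` and `macro_co_change_sets` are assumptions.

```python
from collections import defaultdict


def hamming(p, q):
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


def macro_co_change_sets(profiles):
    """Group changed files with identical profiles (hypothetical helper).

    Files that never changed (all-zero profile) are ignored; only groups
    of two or more files are returned.
    """
    groups = defaultdict(list)
    for name, profile in profiles.items():
        if any(profile):
            groups[tuple(profile)].append(name)
    return [sorted(names) for names in groups.values() if len(names) >= 2]


# Invented profiles over eight change periods.
profiles = {
    "F1": [1, 0, 1, 1, 0, 1, 0, 1],
    "F2": [1, 0, 1, 0, 0, 1, 1, 1],
    "F3": [0, 1, 1, 1, 1, 1, 0, 0],
    "F4": [1, 0, 1, 1, 0, 1, 0, 1],  # identical to F1
}
d12 = hamming(profiles["F1"], profiles["F2"])
d13 = hamming(profiles["F1"], profiles["F3"])
sets = macro_co_change_sets(profiles)
```

An approximate variant would accept files whose distance stays below a chosen bound instead of requiring identical profiles; the bound is not given on the slide.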
Macocha (9/9)
Macocha
Step 5: Detection of dephase macro co-changes.
Figure: Three different bit vectors showing dephase macro co-changes
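One way to read the dephase case is that the profile of one file equals the profile of another delayed by s change periods. The sketch below encodes that reading under explicit assumptions: profiles are compared only on their overlapping part, at least one change must fall in that overlap, and the names `is_dephase` and `shift_between` and the sample vectors are hypothetical rather than taken from Macocha.

```python
def is_dephase(p1, p2, s):
    """True if p2 repeats the changes of p1 exactly s periods later."""
    if len(p1) != len(p2) or not 0 < s < len(p1):
        return False
    overlap = p1[:-s]
    return overlap == p2[s:] and any(overlap)


def shift_between(p1, p2, max_shift):
    """Smallest shift s in 1..max_shift for which p2 follows p1, else None."""
    for s in range(1, max_shift + 1):
        if is_dephase(p1, p2, s):
            return s
    return None


# Invented profiles over eight change periods.
f1 = [1, 0, 0, 1, 0, 1, 0, 0]
f2 = [0, 1, 0, 0, 1, 0, 1, 0]  # f1 delayed by one period
f3 = [0, 0, 1, 0, 0, 1, 0, 1]  # f1 delayed by two periods
f4 = [1, 1, 0, 0, 0, 0, 0, 0]  # unrelated to f1

s12 = shift_between(f1, f2, max_shift=3)
s13 = shift_between(f1, f3, max_shift=3)
s14 = shift_between(f1, f4, max_shift=3)
```

Replacing the exact comparison with a bounded Hamming distance would give an approximate variant of the same check.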
Empirical study
Research questions
1. How does Macocha compare to previous work in terms of precision and recall?
2. Are there (approximate) dephase macro co-changes among files and what is their usefulness?
Objects
We now present the results of our empirical study. We apply Macocha to four different programs: ArgoUML, FreeBSD, SIP, and XalanC, developed with three different programming languages: C, C++, and Java.
Empirical study
Objects
             ArgoUML    FreeBSD    SIP        XalanC
Languages    Java       C          Java       C++
Versions     30         8          2          21
Files        3,148      3,603      2,790      529
Changes      16,727     186,959    8,046      397,052
Start Dates  98-01-26   94-05-25   05-07-21   99-12-18
End Dates    09-01-29   09-02-11   10-12-09   09-01-17
CPs          2,843      1,121      1,553      924
Table: Descriptive statistics of the object programs (CPs: number of change periods)
Empirical study
Results
                 ArgoUML   FreeBSD   SIP     XalanC
Idle files       202       1,856     963     7
Changed files    2,946     1,747     1,827   522
# of SMCC        166       121       142     36
  Max # files    35        24        15      17
  Min # files    2         2         2       2
# of SMCCH       196       163       182     85
  Max # files    46        44        32      22
  Min # files    2         2         2       2
# of SDMCC       11        1         6       1
  Max # files    4         2         3       2
  Min # files    2         2         2       2
# of SDMCCH      53        63        36      4
  Max # files    6         8         5       2
  Min # files    2         2         2       2
Table: Cardinalities of the sets obtained in the empirical study
Validation (1/3)
Validation
We perform two types of validation:
◮ Quantitatively, we compare the stability analysis of Macocha with that of UMLDiff [a] and the co-change analysis of Macocha with association rules [b].
◮ Qualitatively, we use external information provided by bug reports, mailing lists, and requirement descriptions to validate the (dephase) macro co-changes not found using association rules.
[a] Z. Xing et al. ICSE 2005.
[b] T. Zimmermann et al. ICSE 2004.
Validation (2/3)
Validation
                                Idle Groups   Changed Groups
ArgoUML  Idle Clusters          202           0
         Short-lived Clusters   0             1,390
         Active Clusters        0             1,556
SIP      Idle Clusters          963           0
         Short-lived Clusters   0             997
         Active Clusters        0             830
XalanC   Idle Clusters          7             0
         Short-lived Clusters   0             291
         Active Clusters        0             231
Table: Cardinality of Macocha sets in comparison to UMLDiff
Validation (3/3)
Validation
          Association Rules       Macocha
          Precision   Recall      Precision   Recall
ArgoUML   15%         66%         20%         75%
FreeBSD   22%         100%        24%         100%
SIP       18%         89%         24%         91%
XalanC    16%         100%        22%         100%
Table: Association rules approach vs. Macocha
          Association Rules       External Information
          Precision   Recall      Precision   Recall
ArgoUML   86%         98%         100%        99%
FreeBSD   98%         100%        100%        100%
SIP       85%         96%         100%        98%
XalanC    90%         100%        100%        100%
Table: External evaluation of Macocha when using the results of the association rules approach as oracle and after manual validation using external information
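For reference, precision and recall against an oracle follow the standard definitions: precision is the share of detected co-changes that the oracle confirms, and recall is the share of oracle co-changes that were detected. The sketch below uses those standard formulas on invented file pairs; the function name `precision_recall` and the data are hypothetical and do not reproduce the figures in the tables.

```python
def precision_recall(detected, oracle):
    """Standard precision and recall of a detected set against an oracle."""
    detected, oracle = set(detected), set(oracle)
    true_positives = len(detected & oracle)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall


# Invented co-changing pairs for illustration only.
detected = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
oracle = {("A", "B"), ("B", "D"), ("E", "F")}
precision, recall = precision_recall(detected, oracle)
```

Here two of the four detected pairs are confirmed (precision 0.5) and two of the three oracle pairs are found (recall about 0.67).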
Conclusion and Ongoing Work
Conclusion
1. We introduced the novel concepts of macro co-changes and dephase macro co-changes to describe that two files were changed by developers within the same change periods, with possible shifts in time.
2. We described Macocha, an approach to detect (dephase) macro co-changes using file profiles and their stability in time.
3. We performed two types of validation, quantitative and qualitative, and we showed that SMCCs and SDMCCs do exist and bring supplementary information.
Conclusion and Ongoing Work
Ongoing Work
1. Performing a comprehensive study of the number of MCCs and DMCCs with varying values of t and s.
2. Performing a comprehensive study of the different kinds of DMCCs.
3. Relating MCCs and DMCCs with static analysis and external software characteristics, such as change proneness.