wcre11a.ppt

An Exploratory Study of Macro Co-changes. Fehmi Jaafar, Yann-Gaël Guéhéneuc, Sylvie Hamel, and Giuliano Antoniol. Université de Montréal, Québec, Canada. Thursday, October 20, 2011. Pattern Trace Identification, Detection, and Enhancement in Java; SOftware Cost-effective Change and Evolution Research Lab.

Upload: ptidej-team

Post on 11-Jun-2015


TRANSCRIPT

Page 1: WCRE11a.ppt

An Exploratory Study of Macro Co-changes

Fehmi Jaafar, Yann-Gaël Guéhéneuc, Sylvie Hamel, and Giuliano Antoniol

Introduction and context

Problems and Motivation

Macocha

Empirical study

Validation

Conclusion and Ongoing Work

An Exploratory Study of Macro Co-changes

Fehmi Jaafar, Yann-Gaël Guéhéneuc, Sylvie Hamel, and Giuliano Antoniol

Université de Montréal, Québec, Canada

Thursday, October 20, 2011

Pattern Trace Identification, Detection, and Enhancement in Java
SOftware Cost-effective Change and Evolution Research Lab

Page 2: WCRE11a.ppt


Introduction

Context

◮ Developers must continually change their software programs to meet new requirements and user needs.

◮ Many approaches extract and analyse the changes undergone by artefacts and infer change propagation [a].

◮ Several of these approaches identify co-changes among artefacts [b].

[a] A. E. Hassan and R. C. Holt. ICSM 2004.
[b] Z. Xing and E. Stroulia. TSE 2005.

Page 3: WCRE11a.ppt

Introduction

Co-change

◮ Two artefacts are co-changing if they were changed by the same author and with the same log message in a time window of less than 200 ms [a].

◮ Mockus et al. [b] defined the proximity in time of check-ins as the check-in times of adjacent files differing by less than three minutes.

◮ Other studies [c] described issues with identifying atomic change sets and reported that, in all cases, the check-ins differed by a few minutes.

[a] T. Zimmermann et al. ICSE 2004.
[b] A. Mockus et al. TSE 2004.
[c] D. M. German. TSE 2006.
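
The classical co-change definition above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the cited studies: the `Commit` record, the function name `group_co_changes`, and the choice to measure the window from the previous check-in of the same group are all assumptions. The window length is a parameter, so the values quoted on this slide can be tried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of one file check-in."""
    path: str
    author: str
    message: str
    timestamp: float  # seconds since an arbitrary origin


def group_co_changes(commits, window_seconds):
    """Group check-ins sharing author and log message into co-change sets.

    A check-in joins the latest group with the same (author, message) key
    if it follows that group's last check-in by less than window_seconds.
    """
    groups = []
    latest = {}  # (author, message) -> most recent group for that key
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        group = latest.get(key)
        if group is not None and c.timestamp - group[-1].timestamp < window_seconds:
            group.append(c)
        else:
            group = [c]
            groups.append(group)
            latest[key] = group
    return [{c.path for c in g} for g in groups]


if __name__ == "__main__":
    log = [
        Commit("A.java", "alice", "fix", 0.00),
        Commit("B.java", "alice", "fix", 0.10),
        Commit("C.java", "bob", "fix", 0.15),
        Commit("D.java", "alice", "fix", 500.0),
    ]
    print(group_co_changes(log, window_seconds=0.2))
```

With a 0.2 s window, A.java and B.java form one co-change set, while C.java (different author) and D.java (too late) stay alone.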

Page 4: WCRE11a.ppt

Problems and Motivation

Missing Dependencies

◮ Some files (e.g., in ArgoUML, NotationUtilityJava.java and ModelElementNameNotationUml.java) were never changed by the same developer at the same time but were changed by different developers (mvw and tfmorris) in two consecutive change periods. Previous co-change definitions miss such dependencies.

◮ Previous co-changes are intrinsically limited in time. They cannot express patterns of changes over long time intervals (e.g., ArgoDiagram.java and ModeCreateAssociationClass.java were maintained by the same developer but the changes were separated by a few hours).

Page 5: WCRE11a.ppt

Problems and Motivation

Macro co-change

A macro co-change is two or more files that change together, i.e., they were maintained in the same change periods.

Figure: Two changes performed by one developer are sequential in time (after a few hours); F1 and F2 are macro co-changing

Page 6: WCRE11a.ppt

Problems and Motivation

Dephase macro co-change

A dephase macro co-change is two or more files that have been observed to change with the same shift s.

Figure: Files F1 and F2 are changed by different developers in two consecutive periods of time. They are dephase macro co-changing

Page 7: WCRE11a.ppt


Macocha (1/9)

Macocha

We propose Macocha to:

1. Mine version-control systems (CVS and SVN).

2. Identify the change periods in a program.

3. Group the program source files according to their stability through the change periods.

4. Identify among changed files those that have similar co-change patterns.

Page 8: WCRE11a.ppt

Macocha (2/9)

Macocha

We draw inspiration from, and extend, the classical sliding window approach [a]: two subsequent changes are part of one change period if they were committed:

◮ by any author;

◮ with any log message;

◮ without an interrupt between the two changes.

[a] T. Zimmermann et al. ICSE 2004.

Page 9: WCRE11a.ppt

Macocha (3/9)

Macocha

◮ The duration of a change period is less than 40 hours [a].

◮ If the interrupt between two subsequent changes is more than t = 5.17 hours, these two changes belong to two different change periods.

◮ An SMCC is a set of two or more files that have identical profiles during the life cycle of a program.

◮ An SDMCC is the set composed of F1 and one or more files, F2...FM, such that F2...FM always macro co-change with the same shift in time s with respect to F1.

◮ We use the Hamming distance DH to measure the amount of difference between two change profiles.

[a] L. Hatton. Computer 2007.
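
The interrupt rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not Macocha's code: timestamps are taken to be plain numbers in hours, the function name `split_into_change_periods` is hypothetical, and the 40-hour bound on a period's duration is not enforced here because the slide does not say how it is applied.

```python
INTERRUPT_HOURS = 5.17  # threshold t quoted on the slide


def split_into_change_periods(commit_times_hours, interrupt=INTERRUPT_HOURS):
    """Partition commit times (in hours) into change periods.

    Two subsequent commits stay in the same period unless the gap
    between them is more than `interrupt` hours. Author and log
    message are deliberately ignored.
    """
    periods = []
    for t in sorted(commit_times_hours):
        if periods and t - periods[-1][-1] <= interrupt:
            periods[-1].append(t)  # no interrupt: extend the current period
        else:
            periods.append([t])    # interrupt: open a new period
    return periods


if __name__ == "__main__":
    print(split_into_change_periods([0, 1, 3, 10, 12, 30]))
```

In the example, the gaps of 7 and 18 hours exceed t, so the six commits fall into three change periods.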

Page 10: WCRE11a.ppt

Macocha (4/9)

Figure: Analysis process of Macocha

Page 11: WCRE11a.ppt

Macocha (5/9)

Macocha

Step 1: Detection of change periods.

Figure: Analysis of commits and creation of change periods

Page 12: WCRE11a.ppt

Macocha (6/9)

Macocha

Step 2: Creation of file profiles.

Figure: From revision control systems to file profiles
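
One plausible reading of a file profile is a bit vector with one position per change period, set to 1 when the file was changed in that period. The sketch below builds such vectors; the representation and the name `build_profiles` are assumptions made for illustration, since the figure itself is not available in this transcript.

```python
def build_profiles(change_periods):
    """Map each file to a tuple of 0/1 flags, one per change period.

    `change_periods` is a list of sets of file paths, in chronological
    order; a flag is 1 when the file was changed in that period.
    """
    files = sorted(set().union(*change_periods)) if change_periods else []
    return {
        f: tuple(1 if f in period else 0 for period in change_periods)
        for f in files
    }


if __name__ == "__main__":
    periods = [{"F1", "F2"}, {"F3"}, {"F1", "F2", "F3"}]
    for name, profile in build_profiles(periods).items():
        print(name, profile)
```

Here F1 and F2 get the same profile because they were changed in exactly the same change periods.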

Page 13: WCRE11a.ppt

Macocha (7/9)

Macocha

Step 3: Stability analysis.

Figure: Profiles showing file stability

Page 14: WCRE11a.ppt

Macocha (8/9)

Macocha

Step 4: Detection of macro co-changes.

Figure: Files F1 and F2 are in macro co-change

Figure: Three different bit vectors showing approximate macro co-changes (D(F1,F2)=2; D(F1,F3)=4)
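
As a sketch of this step, files with identical profiles can be grouped directly, and approximate matches can be found by thresholding the Hamming distance. The bit vectors below are invented so that the distances match those in the caption (2 and 4); they are not the vectors from the figure. The function names and the `max_distance` parameter are also assumptions.

```python
from collections import defaultdict


def hamming(p, q):
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


def exact_groups(profiles):
    """Groups of two or more changed files sharing an identical profile."""
    by_profile = defaultdict(list)
    for name, profile in profiles.items():
        if any(profile):  # only files that changed at least once
            by_profile[profile].append(name)
    return [sorted(names) for names in by_profile.values() if len(names) >= 2]


def approximate_pairs(profiles, max_distance):
    """Pairs of files whose profiles differ in at most max_distance periods."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


if __name__ == "__main__":
    profiles = {  # hypothetical bit vectors
        "F1": (1, 0, 1, 1, 0, 1, 0, 1),
        "F2": (1, 0, 1, 0, 0, 1, 1, 1),
        "F3": (0, 1, 1, 1, 1, 1, 0, 0),
        "F4": (1, 0, 1, 1, 0, 1, 0, 1),
    }
    print(hamming(profiles["F1"], profiles["F2"]))
    print(exact_groups(profiles))
    print(approximate_pairs(profiles, max_distance=2))
```

Comparing every pair is quadratic in the number of files, which is acceptable for a sketch but may need indexing on larger programs.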

Page 15: WCRE11a.ppt

Macocha (9/9)

Macocha

Step 5: Detection of dephase macro co-changes.

Figure: Three different bit vectors showing dephase macro co-changes
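
A dephase macro co-change can be illustrated by checking whether one profile reproduces another s change periods later. The sketch below is an assumption-laden illustration: it compares only the overlapping part of the two bit vectors, requires at least one change in that overlap, and uses hypothetical names (`is_dephase`, `find_shift`, `max_shift`). How Macocha treats the vector boundaries is not stated in the slides.

```python
def is_dephase(p1, p2, s):
    """True if p2 repeats the changes of p1 exactly s periods later.

    Only the overlapping part is compared: p1[i] must equal p2[i + s].
    The overlap must contain at least one change, so that two idle
    stretches do not match trivially.
    """
    if len(p1) != len(p2) or not 0 < s < len(p1):
        return False
    head, tail = p1[:-s], p2[s:]
    return any(head) and head == tail


def find_shift(p1, p2, max_shift):
    """Smallest shift s in 1..max_shift for which p2 follows p1, else None."""
    for s in range(1, max_shift + 1):
        if is_dephase(p1, p2, s):
            return s
    return None


if __name__ == "__main__":
    f1 = (1, 0, 1, 0, 0, 1, 0)  # hypothetical profiles
    f2 = (0, 1, 0, 1, 0, 0, 1)  # f1 delayed by one change period
    f3 = (0, 0, 1, 0, 1, 0, 0)  # f1 delayed by two change periods
    print(find_shift(f1, f2, max_shift=3))
    print(find_shift(f1, f3, max_shift=3))
```

An approximate variant could replace the equality test with a Hamming-distance threshold on the overlapping part.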

Page 16: WCRE11a.ppt

Empirical study

Research questions

1. How does Macocha compare to previous work in terms of precision and recall?

2. Are there (approximate) dephase macro co-changes among files and what is their usefulness?

Objects

We now present the results of our empirical study. We apply Macocha to four different programs, ArgoUML, FreeBSD, SIP, and XalanC, developed in three different programming languages: C, C++, and Java.

Page 17: WCRE11a.ppt

Empirical study

Objects

              ArgoUML    FreeBSD    SIP        XalanC
Languages     Java       C          Java       C++
Versions      30         8          2          21
Files         3,148      3,603      2,790      529
Changes       16,727     186,959    8,046      397,052
Start Dates   98-01-26   94-05-25   05-07-21   99-12-18
End Dates     09-01-29   09-02-11   10-12-09   09-01-17
CPs           2,843      1,121      1,553      924

Table: Descriptive statistics of the object programs (CPs: number of change periods)

Page 18: WCRE11a.ppt

Empirical study

Results

                ArgoUML   FreeBSD   SIP       XalanC
Idle files      202       1,856     963       7
Changed files   2,946     1,747     1,827     522
# of SMCC       166       121       142       36
  Max # files   35        24        15        17
  Min # files   2         2         2         2
# of SMCCH      196       163       182       85
  Max # files   46        44        32        22
  Min # files   2         2         2         2
# of SDMCC      11        1         6         1
  Max # files   4         2         3         2
  Min # files   2         2         2         2
# of SDMCCH     53        63        36        4
  Max # files   6         8         5         2
  Min # files   2         2         2         2

Table: Cardinalities of the sets obtained in the empirical study

Page 19: WCRE11a.ppt

Validation (1/3)

Validation

We perform two types of validation:

◮ Quantitatively, we compare the stability analysis of Macocha with that of UMLDiff [a] and the co-change analysis of Macocha with association rules [b].

◮ Qualitatively, we use external information provided by bug reports, mailing lists, and requirement descriptions to validate the (dephase) macro co-changes not found using association rules.

[a] Z. Xing et al. ICSE 2005.
[b] T. Zimmermann et al. ICSE 2004.

Page 20: WCRE11a.ppt

Validation (2/3)

Validation

                                 Idle Groups   Changed Groups
ArgoUML   Idle Clusters          202           0
          Short-lived Clusters   0             1,390
          Active Clusters        0             1,556
SIP       Idle Clusters          963           0
          Short-lived Clusters   0             997
          Active Clusters        0             830
XalanC    Idle Clusters          7             0
          Short-lived Clusters   0             291
          Active Clusters        0             231

Table: Cardinality of Macocha sets in comparison to UMLDiff

Page 21: WCRE11a.ppt

Validation (3/3)

Validation

           Association Rules      Macocha
           Precision   Recall     Precision   Recall
ArgoUML    15%         66%        20%         75%
FreeBSD    22%         100%       24%         100%
SIP        18%         89%        24%         91%
XalanC     16%         100%       22%         100%

Table: Association rules approach vs. Macocha

           Association Rules      External Information
           Precision   Recall     Precision   Recall
ArgoUML    86%         98%        100%        99%
FreeBSD    98%         100%       100%        100%
SIP        85%         96%        100%        98%
XalanC     90%         100%       100%        100%

Table: External evaluation of Macocha when using the results of the association rules approach as oracle and after manual validation using external information
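
For reference, precision and recall against an oracle can be computed as in the sketch below. The file pairs are invented for illustration and do not come from the study, and the function name `precision_recall` is hypothetical. Pairs are stored as frozensets so that (A, B) and (B, A) count as the same co-change.

```python
def precision_recall(detected, oracle):
    """Precision and recall of detected co-changes against an oracle.

    precision = |detected and in oracle| / |detected|
    recall    = |detected and in oracle| / |oracle|
    """
    detected, oracle = set(detected), set(oracle)
    true_positives = len(detected & oracle)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall


def pair(a, b):
    """Unordered pair of file names."""
    return frozenset((a, b))


if __name__ == "__main__":
    detected = {pair("A", "B"), pair("A", "C"), pair("B", "D"), pair("C", "D")}
    oracle = {pair("B", "A"), pair("C", "D"), pair("E", "F")}
    p, r = precision_recall(detected, oracle)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy example two of the four detected pairs are in the oracle (precision 0.50) and two of the three oracle pairs are detected (recall about 0.67).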

Page 22: WCRE11a.ppt

Conclusion and Ongoing Work

Conclusion

1. We introduced the novel concepts of macro co-changes and dephase macro co-changes to describe that two files were changed by developers within the same change periods, with possible shifts in time.

2. We described Macocha, an approach to detect (dephase) macro co-changes using file profiles and their stability in time.

3. We performed two types of validation, quantitative and qualitative, and we showed that SMCC and SDMCC do exist and bring supplementary information.

Page 23: WCRE11a.ppt

Conclusion and Ongoing Work

Ongoing Work

1. Performing a comprehensive study of the number of MCCs and DMCCs with varying values of t and s.

2. Performing a comprehensive study of the different kinds of DMCCs.

3. Relating MCCs and DMCCs with static analysis and external software characteristics, such as change proneness.
