Process Mining Software Repositories

Master project kickoff presentation

Wouter Poncin, [email protected]

/ Department of Mathematics and Computer Science 19-04-2023

Agenda

• Introduction
• Existing approaches
• Project goal
• Prototype
• Design
• Current work

Introduction

• Software development teams
• Software repositories
• Analysis

Existing approaches

• NavTracks [Sin05]
• eROSE [Zim05]
• DynaMine [Liv05]
• MarmoSet [Spa05]
• projectWatcher [Gut04]

• Traceability links [Kag07]
• Improve bug finding [Wil05]
• Predict change [Yin04]

Existing approaches – multiple data sources

• Hipikat: recommends relevant software artifacts based on the current context of a developer [Čub05]

Images from: http://www.cs.ubc.ca/labs/spl/projects/hipikat/

Existing approaches – multiple data sources

• Alitheia Core: a platform for software engineering research [Gou09]

Images from: http://www.sqo-oss.org/

Existing approaches – multiple data sources

• Other approaches:
  • Wolf et al. [Wol09]: Mining task-based social networks to explore collaboration in software teams
  • Bird et al. [Bir06]: Mining email social networks
  • Robles et al. [Rob05]: Developer identification methods for integrated data from various sources

Existing approaches – problems

• Mostly single data source
• Problems with multiple data source approaches:
  • Provide artifact centered view (Hipikat)
  • Focus on metric calculation (Alitheia Core)
• No analysis on global process overview
• Example analysis questions:
  − How does the real (mined) organizational model relate to the ‘used’ organizational model?
  − How to classify developers of open source projects? [Nak02]
  − Does the project follow a given development process model? (waterfall / XP / …)

Existing approaches – problems

• Mostly single data source
• No analysis on global process overview
• Solution: process mining

Intermezzo: process mining

Image from: http://prom.win.tue.nl/research/wiki/_detail/research/processmining.gif

Intermezzo: process mining

• Input: event log
• Output: models

Case ID  Task Name        Event Type  Originator  Timestamp
1        File Fine        Completed   Anne        20-07-2004 14:00:00
2        File Fine        Completed   Anne        20-07-2004 15:00:00
1        Send Bill        Completed   system      20-07-2004 15:05:00
2        Send Bill        Completed   system      20-07-2004 15:07:00
3        File Fine        Completed   Anne        21-07-2004 10:00:00
3        Send Bill        Completed   system      21-07-2004 14:00:00
4        File Fine        Completed   Anne        22-07-2004 11:00:00
4        Send Bill        Completed   system      22-07-2004 11:10:00
1        Process Payment  Completed   system      22-07-2004 15:05:00
1        Close Case       Completed   system      24-07-2004 15:06:00
2        Send Reminder    Completed   Mary        20-08-2004 10:00:00
3        Send Reminder    Completed   John        21-08-2004 10:00:00
2        Process Payment  Completed   system      22-08-2004 09:05:00
2        Close case       Completed   system      22-08-2004 09:06:00
4        Send Reminder    Completed   John        22-08-2004 15:10:00
4        Send Reminder    Completed   Mary        22-08-2004 17:10:00
4        Process Payment  Completed   system      29-08-2004 14:01:00
4        Close Case       Completed   system      29-08-2004 17:30:00
3        Send Reminder    Completed   John        21-09-2004 10:00:00
3        Send Reminder    Completed   John        21-10-2004 10:00:00
3        Process Payment  Completed   system      25-10-2004 14:00:00
3        Close Case       Completed   system      25-10-2004 14:01:00

Example from: [Med09]
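
To illustrate how an event log becomes input for a model, the following is a minimal Python sketch that groups the events of cases 1 and 2 from the example log into ordered traces and counts which task directly follows which. The function names `traces` and `directly_follows` are made up for this illustration; actual discovery algorithms in process mining tools do considerably more than this.

```python
from collections import defaultdict
from datetime import datetime

# Rows for cases 1 and 2 of the example log: (case id, task, originator, timestamp).
EVENTS = [
    (1, "File Fine", "Anne", "20-07-2004 14:00:00"),
    (2, "File Fine", "Anne", "20-07-2004 15:00:00"),
    (1, "Send Bill", "system", "20-07-2004 15:05:00"),
    (2, "Send Bill", "system", "20-07-2004 15:07:00"),
    (1, "Process Payment", "system", "22-07-2004 15:05:00"),
    (1, "Close Case", "system", "24-07-2004 15:06:00"),
    (2, "Send Reminder", "Mary", "20-08-2004 10:00:00"),
    (2, "Process Payment", "system", "22-08-2004 09:05:00"),
    (2, "Close Case", "system", "22-08-2004 09:06:00"),
]


def traces(events):
    """Group events per case and order the tasks of each case by timestamp."""
    per_case = defaultdict(list)
    for case_id, task, _originator, stamp in events:
        when = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
        per_case[case_id].append((when, task))
    return {case: [task for _, task in sorted(rows)]
            for case, rows in per_case.items()}


def directly_follows(trace_map):
    """Count how often task a is immediately followed by task b within a case."""
    counts = defaultdict(int)
    for tasks in trace_map.values():
        for a, b in zip(tasks, tasks[1:]):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    for pair, count in sorted(directly_follows(traces(EVENTS)).items()):
        print(pair, count)
```

The directly-follows counts are one simple starting point from which a process model can be derived; the case ID decides which events belong to the same trace.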

Project goal

• The goal of this project is to develop an application that facilitates process analysis of data from various software repositories in an easy manner.
  • Facilitate: export data to a log
  • Various repositories: combine data
  • Various repositories: later add new types of data
  • Easy manner: add a data source by URL
• Open source & closed source projects

Prototype

• Console application
• Input: repository URLs
• Output: MXML process log
• Analysis: ProM

• Simple developer matching
• High level events
• Case: originator
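
As a rough illustration of the export step, the sketch below writes high level events to an MXML-style log with one process instance per originator. This is not the prototype's code: the sample events, the `to_mxml` function and the `process_id` default are hypothetical, and the element names follow the MXML layout as commonly described (WorkflowLog, Process, ProcessInstance, AuditTrailEntry), which should be checked against the schema ProM expects.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical high level events: (originator, event name, ISO timestamp).
EVENTS = [
    ("alice", "SVN revision", "2009-03-01T10:00:00"),
    ("bob", "TRAC ticket", "2009-03-01T11:30:00"),
    ("alice", "Mail (devel)", "2009-03-02T09:15:00"),
]


def to_mxml(events, process_id="repository"):
    """Build an MXML-style log string with one ProcessInstance per originator."""
    per_developer = defaultdict(list)
    for originator, name, stamp in events:
        per_developer[originator].append((stamp, name))

    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    for originator in sorted(per_developer):
        # Case = originator: all events of one developer form one trace.
        instance = ET.SubElement(process, "ProcessInstance", id=originator)
        for stamp, name in sorted(per_developer[originator]):
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = name
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = stamp
            ET.SubElement(entry, "Originator").text = originator
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_mxml(EVENTS))
```

Choosing the originator as the case means the mined model describes the behaviour of individual developers rather than the life cycle of a ticket or file.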

Prototype

• Project: Gallery (web based photo gallery software), http://sourceforge.net/projects/gallery/
• Used data sources:
  • SVN repository (20740 revisions)
  • TRAC tickets (1028)
  • Mailing list archives: ‘devel’ (2867 messages), ‘translate’ (108 messages), ‘announce’ (69 messages)

Prototype – analysis

Legend:
- yellow: TRAC ticket
- white: SVN revision
- red: Mail (translations)
- blue: Mail (devel)
- green: Mail (announce)

Design

• Application requirements:
  • Support multiple data sources (software repositories)
  • Caching of data from data sources
  • Define data filters
  • Developer matching
  • Define mapping from data elements to log elements
  • Easy addition of new plugins for data source types / export types
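
One possible way to arrange these requirements is sketched below in Python, purely as an illustration and not as the application's actual design. All names (`Item`, `DataSource`, `CachedSource`, `StaticSource`, `build_log`) are hypothetical: a data source plugin is anything that implements `fetch`, caching wraps a plugin, and filters and mappings are plain functions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One data element from a repository (hypothetical fields)."""
    source: str
    author: str
    kind: str
    timestamp: str


class DataSource(ABC):
    """Plugin interface: each data source type implements fetch()."""

    @abstractmethod
    def fetch(self):
        """Return a list of Item objects."""


class StaticSource(DataSource):
    """Stand-in plugin that returns fixed items and counts its calls."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0

    def fetch(self):
        self.calls += 1
        return list(self.items)


class CachedSource(DataSource):
    """Wraps another source so the underlying repository is read only once."""

    def __init__(self, inner):
        self.inner = inner
        self._cache = None

    def fetch(self):
        if self._cache is None:
            self._cache = list(self.inner.fetch())
        return self._cache


def build_log(sources, keep, to_event):
    """Apply a data filter (keep) and a mapping to log elements (to_event)."""
    return [to_event(item)
            for source in sources
            for item in source.fetch()
            if keep(item)]


if __name__ == "__main__":
    svn = CachedSource(StaticSource([
        Item("svn", "alice", "revision", "2009-03-01T10:00:00"),
        Item("trac", "bob", "ticket", "2009-03-02T10:00:00"),
    ]))
    print(build_log([svn],
                    keep=lambda item: item.source == "svn",
                    to_event=lambda item: (item.author, item.kind, item.timestamp)))
```

Keeping filters and mappings as separate functions means a new data source type only has to produce items, while export types only have to consume the mapped events.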

Design

• Issues:
  • How to define a case
  • Level of granularity of events
  • How to define developer matching (manual/automatic)
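
For the automatic side of developer matching, a simple heuristic could look like the Python sketch below. It is an assumption for illustration only, not the method chosen in this project: identities given as (name, email) pairs are merged when they share an e-mail address or the same set of name words, and the sample identities are invented.

```python
import re
from collections import defaultdict


def keys(identity):
    """Matching keys for one identity given as (name, email)."""
    name, email = identity
    found = set()
    if email:
        found.add(email.strip().lower())
    words = re.findall(r"[a-z]+", name.lower())
    if len(words) >= 2:
        # Order-independent name key, so "Doe, Jane" matches "Jane Doe".
        found.add(" ".join(sorted(words)))
    return found


def match_developers(identities):
    """Group identities that are linked through at least one shared key."""
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for index, identity in enumerate(identities):
        for key in keys(identity):
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    groups = defaultdict(list)
    for index, identity in enumerate(identities):
        groups[find(index)].append(identity)
    return sorted(groups.values())


if __name__ == "__main__":
    people = [
        ("Jane Doe", "jane@example.org"),
        ("Doe, Jane", "jdoe@example.com"),
        ("jdoe", "jdoe@example.com"),
        ("John Smith", "john@example.org"),
    ]
    for group in match_developers(people):
        print(group)
```

A heuristic like this can produce false merges, for instance for two developers with the same name, so a manual step to confirm or correct groups may still be needed.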

Design

• Data sources to support:
  • Subversion
  • CVS
  • Git (used for jQuery / MooTools, for example)
  • Bugzilla
  • TRAC
  • Wiki articles (+history)
  • SourceForge mailing lists
  • SourceForge thumbs up/down
  • Twitter

Design

• Analysis tools:
  • ProM: www.processmining.org (open source)
  • Futura Reflect: www.futuratech.nl
  • Interstage Business Process Manager
  • Fluxicon: www.fluxicon.com
  • And others…

Current work

• Finish application development
  • Developer matching
  • Case definition
  • Internal cache
  • Implement data source plugins
• Analyze projects
  • (Large) open source projects
    − For example Firefox, WordPress, FileZilla
  • SEP / student projects

Questions

?

References

• [Bir06] Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A. Mining email social networks. In MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, pages 137–143, New York, NY, USA, (2006). ACM.

• [Čub05] Čubranić, D., Murphy, G.C., Singer, J., Booth, K.S. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446–465, (2005).

• [Gou09] Gousios, G., Spinellis, D. Alitheia core: An extensible software quality monitoring platform. Software Engineering, International Conference on, pages 579–582, (2009).

• [Gut04] Gutwin, C., Penner, R., Schneider, K. Group awareness in distributed software development. In CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81, New York, NY, USA, (2004).

• [Kag07] Kagdi, H., Maletic, J.I., Sharif, B. Mining software repositories for traceability links. In ICPC '07: Proceedings of the 15th IEEE International Conference on Program Comprehension, pages 145–154, Washington, DC, USA, (2007). IEEE Computer Society.

• [Liv05] Livshits, B., Zimmermann, T. DynaMine: finding common error patterns by mining software revision histories. In ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296–305, New York, NY, USA, (2005). ACM.

• [Med09] Medeiros, A.K.A. de, Aalst, W.M.P. van der. Process mining towards semantics. pages 35–80, (2009).

• [Moc00] Mockus, A., Fielding, R.T., Herbsleb, J. A case study of open source software development: the Apache server. In ICSE '00: Proceedings of the 22nd international conference on Software engineering, pages 263–272, New York, NY, USA, (2000). ACM.

References

• [Nak02] Nakakoji, K., Yamamoto, Y., Nishinaka, Y., Kishida, K., Ye, Y. Evolution patterns of open-source software systems and communities. In IWPSE '02: Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, New York, NY, USA, (2002). ACM.

• [Rob05] Robles, G., Gonzalez-Barahona, J.M. Developer identification methods for integrated data from various sources. In MSR '05: Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, New York, NY, USA, (2005). ACM.

• [Sin05] Singer, J., Elves, R., Storey, M. NavTracks: Supporting navigation in software maintenance. In ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 325–334, Washington, DC, USA, (2005). IEEE Computer Society.

• [Spa05] Spacco, J., Strecker, J., Hovemeyer, D., Pugh, W. Software repository mining with marmoset: an automated programming project snapshot and testing system. SIGSOFT Softw. Eng. Notes, 30(4):1–5, (2005).

• [Wil05] Williams, C.C., Hollingsworth, J.K. Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng., 31(6):466–480, (2005).

• [Wol09] Wolf, T., Schröter, A., Damian, D., Panjer, L.D., Nguyen, T.H.D. Mining task-based social networks to explore collaboration in software teams. IEEE Softw., 26(1):58–66, (2009).

• [Yin04] Ying, A.T.T., Murphy, G.C., Ng, R., Chu-Carroll, M.C. Predicting source code changes by mining change history. IEEE Transactions on Software Engineering, 30(9), (2004).

• [Zim05] Zimmermann, T., Dallmeier, V., Halachev, K., Zeller, A. eROSE: guiding programmers in eclipse. In OOPSLA '05: Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 186–187, New York, NY, USA, (2005). ACM.