situated learning among open source software developers


Upload: josef-hardi

Post on 28-Nov-2014


DESCRIPTION

Abstract: The presence of learning in organizations is important for success and survival. Recent research into open source software developers has primarily suggested a social constructivist view, in which knowledge is constructed through social relationships within the organizational culture. I report results from a case study that investigated the presence of situated learning among open source developers at an early stage of a project. Thirty-eight developers were systematically selected and examined on their performance, experience and roles during ten months of maintenance work. I followed a model of learning curve effects that associates the improvement in average resolving time with accumulated experience. I found a strong relationship between the two variables, which confirms the presence of learning. In addition, I found less convincing evidence that knowledge depreciates among open source software developers. The depreciation factor was estimated at 94 percent, compared with 65 to 85 percent in other studies. An additional investigation of the organizational structure examined whether core and peripheral members differ in average resolving time. The result was inconclusive: it does not support the claim that the two groups differ in mean issue resolution time. The consistency between this thesis and several related research efforts on the existence of learning suggests that learning is likely an intrinsic characteristic of open source software development rather than a speculative belief.

TRANSCRIPT

Page 1: Situated learning among open source software developers

A Master Thesis Presentation

(Dartington Pottery Training Workshop, 1978)

Author: Josef Hardi, European Master in Software Engineering

Supervisors: Prof. Barbara Russo, Dr. Richard Torkar

Situated Learning in Open Source Software Developers: The Case of Google Chrome Project

Thursday, August 4, 2011

Page 2: Situated learning among open source software developers

Introduction

• Situated Learning is the learning that occurs in workplaces [Brown et al., 1989].

• No separation between ‘knowing’ and ‘doing’.

• Situated learning is primarily practiced by the community of practitioners.


Page 3: Situated learning among open source software developers

Existing Findings


• Learning curve effect.

• “That the more times a task has been performed, the less time will be required on each subsequent iteration.” [T.P. Wright, 1936]

• [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache.

• [Au et al., 2009]: Learning is universally present in OSS projects.
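The learning curve effect quoted above is often written as a power law. The sketch below assumes Wright's cumulative-average form, T(n) = T1 · n^b with b = log2(learning rate). The function name, the 80% learning rate, and the 10-day first task are illustrative assumptions, not values from the thesis.

```python
import math

def wright_time(first_time: float, n: int, learning_rate: float = 0.8) -> float:
    """Average time per task after n repetitions under Wright's power law.

    learning_rate is the factor by which time shrinks each time n doubles
    (0.8 means a 20% reduction per doubling). Illustrative assumption only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    exponent = math.log(learning_rate, 2)  # negative for rates below 1
    return first_time * n ** exponent

# Hypothetical example: the first task takes 10 days.
times = [round(wright_time(10.0, n), 2) for n in (1, 2, 4, 8)]
print(times)
```

Each doubling of the repetition count multiplies the time by the learning rate, which is why the curve flattens as experience accumulates.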


Page 4: Situated learning among open source software developers

Distinctions in this Thesis

• Data are taken from each individual developer instead of from an aggregation of individuals.

• This gives more insight into individual characteristics, namely knowledge depreciation and team roles as factors that affect the learning process.

Page 5: Situated learning among open source software developers

Research Question 1: Is learning present in OSS developers?

• Hypothesis 1: There is a relation between the accumulated experience and the performance.

Research Question 2: What are the factors that affect learning?

• Hypothesis 2: Knowledge depreciates over time among the OSS developers.

• Hypothesis 3: Core developers resolve issues faster.

Page 6: Situated learning among open source software developers

Case Study

• Google Chrome Project.

• Duration: 10 months, about 10 releases (December 2008 - October 2009).

Page 7: Situated learning among open source software developers

Research Methodology

1. Data Collection: Issue Report Data and Review Interaction Data.

2. Data Exploration: Experience, Performance, and Team Role.

3. Construct Input Data.

4. Identification of Learning Curve Models and Data Fitting.

Page 8: Situated learning among open source software developers

Research Methodology: Data Collection

Issue Report Data (5,160 entries)

1. Unrelated project areas

2. Invalid issue status

3. Empty owner name

Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]
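As a rough illustration of this record layout, the sketch below models one issue report and a cleaning step. It reads the three numbered items as exclusion rules, which is an assumption. The field types, the Unix-second date encoding, and the `RELEVANT_AREAS` and `VALID_STATUSES` sets are hypothetical as well.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IssueReport:
    # Fields follow the record layout on the slide; types are assumptions.
    id: int
    type: str
    area: str
    status: str
    owner: str
    open_date: int                 # Unix seconds (assumed encoding)
    assigned_date: Optional[int]
    started_date: Optional[int]
    close_date: Optional[int]

# Hypothetical whitelists; the actual values are not given on the slide.
RELEVANT_AREAS = {"WebKit", "UI"}
VALID_STATUSES = {"Fixed", "Verified"}

def keep(issue: IssueReport) -> bool:
    """Apply the three assumed exclusion rules: area, status, and owner."""
    return (
        issue.area in RELEVANT_AREAS
        and issue.status in VALID_STATUSES
        and issue.owner.strip() != ""
    )

issues = [
    IssueReport(1, "Bug", "UI", "Fixed", "ben", 100, 110, 120, 200),
    IssueReport(2, "Bug", "Docs", "Fixed", "mal", 100, 110, 120, 200),      # unrelated area
    IssueReport(3, "Bug", "UI", "Duplicate", "sgk", 100, None, None, 150),  # invalid status
    IssueReport(4, "Bug", "UI", "Fixed", "", 100, None, None, 150),         # empty owner
]
kept = [issue.id for issue in issues if keep(issue)]
print(kept)
```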

Page 9: Situated learning among open source software developers

Research Methodology: Data Collection

Review Interaction Data (12,037 entries)

Interaction = [Owner, Reviewer, Comment date]

"ben","sky",1226700214

"ben","sky",1226706864

"ben","pkasting",1226707765

"mal","tony",1226809276

"sgk","tony",1226874776

"phajdan.jr","deanm",1227808551

"phajdan.jr","deanm",1227809341

"phajdan.jr","mark",1228496086

...
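The sample rows above can be read with a few lines of Python. This is a minimal sketch that assumes the third column is a Unix timestamp in seconds, which the values suggest but the slide does not state. The function name `parse_interactions` is made up for the example.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

SAMPLE = '''"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
'''

def parse_interactions(text):
    """Yield (owner, reviewer, comment time in UTC) tuples."""
    for owner, reviewer, stamp in csv.reader(io.StringIO(text)):
        # Assumption: the stamp is Unix time in seconds.
        yield owner, reviewer, datetime.fromtimestamp(int(stamp), tz=timezone.utc)

rows = list(parse_interactions(SAMPLE))

# Count how often each owner was reviewed by each reviewer.
pair_counts = Counter((owner, reviewer) for owner, reviewer, _ in rows)
print(rows[0][2].date(), pair_counts[("ben", "sky")])
```

Pair counts of this kind are one possible input for the owner-reviewer network used later to estimate team roles.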

Page 10: Situated learning among open source software developers

Research Methodology: Data Exploration

[Figure: Issue Report Data is mapped to two tables, Experience and Performance, each indexed by Developers and Releases]

• Measure Experience: number of resolved issues.

• Measure Performance: average issue resolution time.

Sample = 274 developers
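The two measures can be computed per developer and per release as sketched below. The input tuples are hypothetical. How resolution time is derived from the issue dates and how issues are assigned to releases are not specified here, so both are taken as given.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical resolved issues: (owner, release, resolution time in days).
resolved = [
    ("ben", 1, 4.0), ("ben", 1, 2.0), ("ben", 2, 1.5),
    ("sky", 1, 6.0), ("sky", 2, 3.0), ("sky", 2, 5.0),
]

times = defaultdict(list)
for owner, release, days in resolved:
    times[(owner, release)].append(days)

# Experience: number of resolved issues per developer and release.
experience = {key: len(values) for key, values in times.items()}
# Performance: average issue resolution time per developer and release.
performance = {key: mean(values) for key, values in times.items()}

print(experience[("ben", 1)], performance[("ben", 1)])
```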

Page 11: Situated learning among open source software developers

Research Methodology: Data Exploration

[Figure: Review Interaction Data is mapped to a Team Role table indexed by Developers and Releases]

Estimate Team Role: core and periphery structure model [Borgatti, 1999]

• Core entails a dense, cohesive structure and periphery entails a sparse, loose structure.

• The estimation is performed by using UCINET.

Sample = 274 developers
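To make the core and periphery idea concrete, here is a brute-force sketch of one common discrete formulation: a candidate core is scored by the Pearson correlation between the observed ties and an ideal pattern in which a tie exists whenever at least one endpoint is in the core. This is not UCINET's routine, and the network below is hypothetical. Exhaustive search is only feasible for very small networks.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def best_core(nodes, edges):
    """Try every candidate core and keep the one that best fits the ideal pattern."""
    pairs = list(combinations(sorted(nodes), 2))
    ties = {frozenset(edge) for edge in edges}
    observed = [1 if frozenset(pair) in ties else 0 for pair in pairs]
    best, best_fit = set(), -1.0
    for size in range(1, len(nodes)):
        for core in combinations(sorted(nodes), size):
            # Ideal pattern: a tie is expected whenever either endpoint is core.
            ideal = [1 if a in core or b in core else 0 for a, b in pairs]
            fit = pearson(observed, ideal)
            if fit > best_fit:
                best, best_fit = set(core), fit
    return best, best_fit

# Hypothetical review network: "a" and "b" interact with everyone,
# while "c", "d" and "e" never interact with each other.
nodes = {"a", "b", "c", "d", "e"}
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("b", "c"), ("b", "d"), ("b", "e")]
core, fit = best_core(nodes, edges)
print(sorted(core), round(fit, 3))
```

The search space doubles with every added developer, so a sample of 274 developers needs a heuristic optimizer rather than enumeration.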

Page 12: Situated learning among open source software developers

Research Methodology: Construct Input Data

• 274 developers, not all of whom worked on the project long-term.

• Refine: keep developers who participated in at least 8 releases.

• Result: 38 long-term contributors, forming new longitudinal data sets.
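The refinement step amounts to a threshold filter on participation. A minimal sketch follows, with a made-up participation record. Only the threshold of 8 releases comes from the slide.

```python
# Hypothetical participation record: developer -> releases with resolved issues.
participation = {
    "ben": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "sky": {1, 2, 3, 5, 6, 7, 9, 10},
    "mal": {1, 2, 3},
}

MIN_RELEASES = 8  # threshold stated on the slide

# Keep only developers active in at least MIN_RELEASES releases.
long_term = sorted(
    dev for dev, releases in participation.items()
    if len(releases) >= MIN_RELEASES
)
print(long_term)
```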

Page 13: Situated learning among open source software developers

Input data set: Performance

[Figure: the data distribution in the group of long-term developers. x-axis: Releases; y-axis: Average time of resolving issues (log days)]

Page 14: Situated learning among open source software developers

Input data set: Experience

[Figure: the data distribution in the group of long-term developers. x-axis: Releases; y-axis: Amount of resolved issues (N)]

Page 15: Situated learning among open source software developers

Input data set: Team Role

[Figure: one pie chart per release showing the team composition in the group of long-term developers]

R1: 46% / 54%

R2: 39% / 61%

R3: 39% / 61%

R4: 45% / 55%

R5: 53% / 47%

R6: 47% / 53%

R7: 47% / 53%

R8: 42% / 58%

R9: 42% / 58%

R10: 39% / 61%

Page 16: Situated learning among open source software developers

Research Methodology: Identification of Learning Curve Models and Data Fitting

Model 1:

Model 2:

Page 17: Situated learning among open source software developers

Result Summary

Hypothesis   Variable         Model 1     Model 2     Supported?

H1           KnowledgeStock   -0.01***    -0.01***    Yes

H2           Lambda           0.94***     0.94***     Yes

H3           TeamRole         NA          0.18        No

*** Statistically significant, p < 0.001
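The Lambda row can be read through a knowledge-stock recursion. The model equations are not shown in this transcript, so the sketch below assumes the form commonly used in the organizational learning literature, K_t = λ · K_(t-1) + q_t, where q_t is the number of issues resolved in period t. Treating one release as one period is also an assumption, and the input series is hypothetical.

```python
def knowledge_stock(resolved_per_release, lam):
    """Accumulate experience while carrying over a fraction lam each release.

    Assumed recursion: K_t = lam * K_(t-1) + q_t. With lam = 1.0 nothing is
    forgotten and the stock equals plain cumulative experience.
    """
    stock, history = 0.0, []
    for q in resolved_per_release:
        stock = lam * stock + q
        history.append(stock)
    return history

# Hypothetical developer resolving 10 issues in each of 3 releases.
q = [10, 10, 10]
print(knowledge_stock(q, 1.0))                          # no depreciation
print([round(k, 2) for k in knowledge_stock(q, 0.94)])  # lambda from the table
```

Under this reading, a lambda of 0.94 means that 94 percent of the accumulated stock carries over from one period to the next, so the depreciated series stays close to plain cumulative experience.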

Page 18: Situated learning among open source software developers

Threats to Validity

Internal Validity

• The improvement in issue resolution time might be caused by improvements in the system design rather than by learning.

• Some of the issue data are incomplete.

Construct Validity

• The estimated core and periphery structure might not reflect the real situation. However, the communication pattern is the best available indicator.

External Validity

• Both models have very low statistical predictive power (less than 5%).

Page 19: Situated learning among open source software developers

Conclusion

• I affirmed that learning is present among open source software developers.

• Knowledge does not significantly depreciate in the Google Chrome team.

• It remains inconclusive whether core developers work faster than developers in the periphery.

• Methodological contribution: a method to harvest and analyze data from code review.

Page 20: Situated learning among open source software developers

Thank you!

Bolzano, 8 October 2010