Situated Learning among Open Source Software Developers
Abstract: The presence of learning in organizations is important for success and survival. Recent research into open source software developers has primarily suggested a social constructivist view in which knowledge is constructed in the social relationships within the organizational culture. I report results from a case study that investigated the presence of situated learning among open source developers at an early stage of a project. Thirty-eight developers were systematically selected and examined on their performance, experience, and roles during ten months of maintenance work. I followed a model of learning curve effects that associates the improvement in the average resolving time with the accumulated experience. I found a strong relationship between the two variables and confirmed the presence of learning. In addition, I found less convincing evidence that knowledge depreciates among open source software developers. The depreciation factor was estimated to be 94 percent, compared to other studies, which ranged between 65 and 85 percent. An additional investigation of the organizational structure was conducted to understand whether core and peripheral members have different average resolving times. The finding was inconclusive as to whether the two groups differ in their mean issue resolution time. The consistency of the result on the existence of learning between this thesis and several related research efforts suggests that learning is likely to be an intrinsic characteristic of open source software development rather than just a speculative belief.
A Master Thesis Presentation
(Dartington Pottery Training Workshop, 1978)
Author:
Josef Hardi
European Master in Software Engineering
Supervisors:
Prof. Barbara Russo
Dr. Richard Torkar
Situated Learning in Open Source Software Developers:
The Case of Google Chrome Project
Thursday, August 4, 2011
Introduction
• Situated Learning is the learning that occurs in workplaces [Brown et al., 1989].
• No separation between ‘knowing’ and ‘doing’.
• Situated learning is primarily practiced by the community of practitioners.
Existing Findings
• Learning curve effect.
• “That the more times a task has been performed, the less time will be required on each subsequent iteration.” [T.P. Wright, 1936]
• [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache.
• [Au et al., 2009]: Learning is universally present in OSS projects.
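The learning curve effect quoted above is commonly written as a power law, T_n = T_1 * n^b with b = log2(learning rate). The following is a minimal sketch of that textbook form only; the function name and the example numbers (10 days for the first task, an 80 percent learning rate) are hypothetical and do not come from the thesis.

```python
import math


def wright_unit_time(first_unit_time: float, n: int, learning_rate: float) -> float:
    """Time needed for the n-th repetition under a power-law learning curve.

    learning_rate is the fraction of time still needed each time the
    cumulative number of repetitions doubles (0.8 means 80%).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # The exponent is negative whenever learning_rate < 1.
    b = math.log(learning_rate, 2)
    return first_unit_time * n ** b


if __name__ == "__main__":
    # Hypothetical values: first task takes 10 days, 80% learning rate.
    for n in (1, 2, 4, 8):
        print(n, round(wright_unit_time(10.0, n, 0.8), 2))
```

Each doubling of the repetition count multiplies the required time by the learning rate, which is the pattern the thesis looks for in issue resolution times.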
Distinctions in this Thesis
• Data are taken from each individual instead of from an aggregation of individuals.
• More insight into individual characteristics, i.e., knowledge depreciation and team roles as factors that affect the learning process.
Research Question 1: Is learning present in OSS developers?
Hypothesis 1: There is a relation between the accumulated experience and the performance.
Research Question 2: What are the factors that affect learning?
Hypothesis 2: Knowledge depreciates over time among the OSS developers.
Hypothesis 3: Core developers resolve issues faster.
Case Study
• Google Chrome Project.
• Duration: 10 months (about 10 releases), December 2008 to October 2009.
Research Methodology
1. Data Collection: Issue Report Data, Review Interaction Data
2. Data Exploration: Experience, Performance, Team Role
3. Construct Input Data
4. Identification of Learning Curve Models and Data Fitting
Research Methodology:
Data Collection
Issue Report Data (5,160 entries)
1. Unrelated project areas,
2. Invalid issue status,
3. Empty owner name.
Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]
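The issue report record and the three criteria listed on this slide can be sketched as a small filter. This assumes the three criteria describe entries that were excluded; the field types, the function name, and the sample area and status values ("UI", "Fixed", and so on) are hypothetical placeholders, not the thesis's actual lists.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class IssueReport:
    # Field names follow the slide; the types are assumptions.
    id: int
    type: str
    area: str
    status: str
    owner: str
    open_date: int
    assigned_date: Optional[int] = None
    started_date: Optional[int] = None
    close_date: Optional[int] = None


def filter_issues(
    issues: Iterable[IssueReport],
    relevant_areas: Set[str],
    valid_statuses: Set[str],
) -> List[IssueReport]:
    """Drop entries with an unrelated area, an invalid status, or no owner."""
    return [
        issue
        for issue in issues
        if issue.area in relevant_areas
        and issue.status in valid_statuses
        and issue.owner.strip() != ""
    ]


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        IssueReport(1, "Bug", "UI", "Fixed", "ben", 1226700214),
        IssueReport(2, "Bug", "Docs", "Fixed", "mal", 1226700300),
        IssueReport(3, "Bug", "UI", "Duplicate", "sgk", 1226700400),
        IssueReport(4, "Bug", "UI", "Fixed", "", 1226700500),
    ]
    kept = filter_issues(sample, {"UI"}, {"Fixed"})
    print([issue.id for issue in kept])
```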
Research Methodology:
Data Collection
Review Interaction Data (12,037 entries)
Interaction = [Owner, Reviewer, Comment date]
"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
...
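The interaction rows shown on this slide can be read with a standard CSV parser. The sketch below assumes the third column is a Unix timestamp in seconds (the values are consistent with late 2008, but the slide does not state the unit); the function name is hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# The first rows from the slide, used as sample input.
SAMPLE = '''"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
'''


def parse_interactions(text: str):
    """Return (owner, reviewer, comment_time) tuples from CSV text."""
    rows = []
    for owner, reviewer, stamp in csv.reader(io.StringIO(text)):
        # Assumption: the stamp is seconds since the Unix epoch, in UTC.
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        rows.append((owner, reviewer, when))
    return rows


if __name__ == "__main__":
    interactions = parse_interactions(SAMPLE)
    # Count how often each owner/reviewer pair interacted.
    pair_counts = Counter((owner, reviewer) for owner, reviewer, _ in interactions)
    for pair, count in pair_counts.items():
        print(pair, count)
```

Pair counts of this kind are the raw material for the reviewer network used later to estimate team roles.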
Research Methodology:
Data Exploration
Issue Report Data → Experience (Developers × Releases)
Issue Report Data → Performance (Developers × Releases)
Measure Experience: Number of resolved issues.
Measure Performance: Average of issue resolution time.
Sample = 274 developers
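The two measures on this slide can be sketched as per-developer, per-release aggregates. The input layout, a list of (owner, release, resolution time in days) tuples, is an assumption; the slide does not say which pair of issue dates defines the resolution time, so that step is left outside the sketch.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (developer, release index)


def explore(
    resolved: Iterable[Tuple[str, int, float]],
) -> Tuple[Dict[Key, int], Dict[Key, float]]:
    """Build the Experience and Performance tables from resolved issues.

    Experience  = number of issues a developer resolved in a release.
    Performance = average resolution time of those issues.
    """
    times = defaultdict(list)
    for owner, release, days in resolved:
        times[(owner, release)].append(days)
    experience = {key: len(values) for key, values in times.items()}
    performance = {key: mean(values) for key, values in times.items()}
    return experience, performance


if __name__ == "__main__":
    # Made-up resolution times for illustration only.
    data = [("ben", 1, 2.0), ("ben", 1, 4.0), ("ben", 2, 1.0), ("sky", 1, 5.0)]
    experience, performance = explore(data)
    print(experience)
    print(performance)
```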
Research Methodology:
Data Exploration
Review Interaction Data → Team Role (Developers × Releases)
Estimate Team Role: Core and periphery structure model [Borgatti, 1999]
Sample = 274 developers
• Core entails a dense, cohesive structure and periphery entails a sparse, loose structure.
• The estimation is performed by using UCINET.
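To illustrate the idea behind a core and periphery model, the sketch below scores every possible core set of a tiny network by correlating the observed ties with an idealized pattern in which a tie exists whenever at least one endpoint is in the core. This is a simplified brute-force illustration, not UCINET's procedure and not necessarily the exact variant used in the thesis; it treats review interactions as undirected ties, which is also an assumption, and it is only practical for a handful of nodes.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns -inf when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("-inf")
    return cov / sqrt(vx * vy)


def fit_core(nodes, edges):
    """Return the core set whose ideal pattern best matches the observed ties.

    edges is a set of frozenset({a, b}) pairs (undirected ties).
    """
    pairs = list(combinations(nodes, 2))
    observed = [1 if frozenset(pair) in edges else 0 for pair in pairs]
    best_core, best_score = set(), float("-inf")
    for size in range(1, len(nodes)):
        for core in combinations(nodes, size):
            core_set = set(core)
            # Ideal pattern: a tie exists if either endpoint is in the core.
            ideal = [1 if (a in core_set or b in core_set) else 0 for a, b in pairs]
            score = pearson(observed, ideal)
            if score > best_score:
                best_core, best_score = core_set, score
    return best_core, best_score


if __name__ == "__main__":
    # Hypothetical network: "a" and "b" review everyone, the rest do not interact.
    nodes = ["a", "b", "c", "d", "e"]
    ties = {frozenset(p) for p in [
        ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
        ("b", "c"), ("b", "d"), ("b", "e"),
    ]}
    print(fit_core(nodes, ties))
```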
Research Methodology:
Construct Input Data
274 Developers
Not all of them work long-term.
Refine: Participate for at least 8 releases
38 Long-term Contributors
New longitudinal data sets
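The refinement step can be sketched as a filter on the number of releases in which a developer took part. What counts as "participating" in a release is not defined on the slide; the sketch assumes it means having resolved at least one issue in that release, and the function name and input layout are hypothetical.

```python
from typing import Dict, List, Set


def long_term_contributors(
    activity: Dict[str, Set[int]], min_releases: int = 8
) -> List[str]:
    """Keep developers active in at least min_releases distinct releases.

    activity maps a developer to the set of release indices in which
    the developer resolved at least one issue (an assumed definition).
    """
    return sorted(
        developer
        for developer, releases in activity.items()
        if len(releases) >= min_releases
    )


if __name__ == "__main__":
    # Made-up activity data for illustration only.
    activity = {
        "ben": set(range(1, 11)),      # active in all 10 releases
        "sky": {1, 2, 3, 4, 5, 6, 7, 8},
        "mal": {1, 2, 3},
    }
    print(long_term_contributors(activity))
```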
Input data set: Performance
[Figure: The data distribution in the group of long-term developers. x-axis: Releases; y-axis: Average time of resolving issues (log days)]
Input data set: Experience
[Figure: The data distribution in the group of long-term developers. x-axis: Releases; y-axis: Amount of resolved issues (N)]
Input data set: Team Role
[Figure: The team composition in the group of long-term developers, one pie chart per release]
R1: 46% / 54%
R2: 39% / 61%
R3: 39% / 61%
R4: 45% / 55%
R5: 53% / 47%
R6: 47% / 53%
R7: 47% / 53%
R8: 42% / 58%
R9: 42% / 58%
R10: 39% / 61%
Research Methodology:
Identification of Learning Curve Models and Data Fitting
Model 1:
Model 2:
Result Summary
Hypothesis   Variable         Model 1    Model 2    Supported?
H1           KnowledgeStock   -0.01***   -0.01***   Yes
H2           Lambda           0.94***    0.94***    Yes
H3           TeamRole         NA         0.18       No
*** Statistically significant, p < 0.001
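To give a feel for what a Lambda of 0.94 means, the sketch below accumulates a knowledge stock with depreciation, K_t = Lambda * K_(t-1) + q_t, where q_t is the number of issues resolved in release t. This recurrence is a common form in the learning-curve literature and is used here as an assumption; the equations of Model 1 and Model 2 are not reproduced in this transcript, so it should not be read as the thesis's exact specification. The per-release issue counts are made up.

```python
from typing import Iterable, List


def knowledge_stock(resolved_per_release: Iterable[float], lam: float) -> List[float]:
    """Accumulate experience with depreciation: K_t = lam * K_(t-1) + q_t.

    lam is assumed to be the fraction of the previous stock that is
    carried forward; lam = 1.0 means no depreciation at all.
    """
    stock = 0.0
    history = []
    for resolved in resolved_per_release:
        stock = lam * stock + resolved
        history.append(stock)
    return history


if __name__ == "__main__":
    # Hypothetical developer who resolves 10 issues in each of 10 releases.
    counts = [10] * 10
    with_depreciation = knowledge_stock(counts, 0.94)  # Lambda from the table
    without_depreciation = knowledge_stock(counts, 1.0)
    print(round(with_depreciation[-1], 1), without_depreciation[-1])
```

With Lambda close to 1, the depreciated stock stays near the plain cumulative count, which is in line with the conclusion that knowledge does not depreciate much in this team.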
Threats to Validity
Internal Validity
• The improvement in issue resolution might be caused by improvements in the system design.
• Some of the issue data are incomplete.
Construct Validity
• The estimation of the core and periphery structure might not reflect the real situation. However, the communication pattern is the best available indicator.
External Validity
• Both models have very low statistical predictive power (less than 5%).
Conclusion
• I affirmed that learning is present in open source software developers.
• Knowledge does not significantly depreciate in the Google Chrome team.
• It is inconclusive to claim core developers work faster than those who are in the periphery.
• Methodological contribution: A method to harvest and analyze data from code review.
Thank you!
Bolzano, 8 October 2010