![Page 1: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/1.jpg)
Is the Pareto Principle Applicable to the Core Teams
of GitHub Projects?
Kazuhiro Yamashita
Yasutaka Kamei
Shane McIntosh
Naoyasu Ubayashi
Ahmed E. Hassan
![Page 2: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/2.jpg)
Core developers play a critical role
in software development
Core developers are responsible for guiding and coordinating the development of an OSS project. (Nakakoji)
Core developers are the most productive developers, who have made roughly 80% of the total contributions. (Mockus)
![Page 3: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/3.jpg)
In fact, some argue that core developers in OSS projects follow the Pareto Principle
[Figure: the Pareto Principle, with Effort and Result each split into 20% and 80%.]
![Page 4: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/4.jpg)
Pareto Principle in Software Development
[Figure: within a project, Developers and Artifacts are each split into 20% and 80%.]
![Page 5: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/5.jpg)
Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle
[Table: prior studies grouped as Pareto, Non-Pareto, or Other: Goeminne (IWSQM), Robles (RAMSS), Mockus (TOSEM), Geldenhuys (ECSEAA), Koch (ISJ), Dinh-Trong (TSE).]
The results depend on a small number of case study systems
![Page 6: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/6.jpg)
Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle
[Table: prior studies grouped by core team size, "< 10 or 15" or "Other": Goeminne (IWSQM), Robles (RAMSS), Mockus (TOSEM), Geldenhuys (ECSEAA), Koch (ISJ), Dinh-Trong (TSE).]
![Page 7: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/7.jpg)
Overview of our study of core teams on GitHub
Applicability of the Pareto Principle
Number of Core Developers
![Page 8: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/8.jpg)
Overview of our study of core teams on GitHub
Core and Non-Core Developers Activities
Applicability of the Pareto Principle
Number of Core Developers
![Page 9: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/9.jpg)
Collecting and analyzing GitHub data to study core team activity
[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop (Core Team Size), and Classify Commits (Activity).]
![Page 10: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/10.jpg)
![Page 11: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/11.jpg)
Preprocessing GitHub data to handle forks, duplicates, and to remove immature projects
8,510,504 repositories -> 2,496 repositories
![Page 12: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/12.jpg)
![Page 13: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/13.jpg)
Using heuristics to identify core team members
[Figure: three heuristics, Commit-based, LOC-based, and Access-based, each identifying the core developers.]
![Page 14: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/14.jpg)
Our commit-based core contributor heuristic
[Figure: number of commits per contributor (A, B, C, D); each mark represents one commit.]
![Page 15: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/15.jpg)
Step 1: Sort contributors by their number of commits
[Figure: contributors A, B, C, D ordered by number of commits.]
![Page 16: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/16.jpg)
Step 2: Compute the proportion of commits made by each contributor
[Figure: commit ratio per contributor: A 60%, B 20%, C 10%, D 10%.]
![Page 17: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/17.jpg)
Step 3: Core contributors are those developers at or below the 0.8 cumulative contribution cutoff
[Figure: cumulative ratio over contributors A, B, C, D, reaching 0.6 after A, 0.8 after B, and 1.0 after D.]
Num. CoreDev: 2
Pct. CoreDev: 2/4 * 100 = 50%
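The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `core_developers`, the name-based tie-breaking, and the concrete commit counts (chosen to reproduce the 60% / 20% / 10% / 10% split) are all assumptions.

```python
from fractions import Fraction


def core_developers(commit_counts, cutoff=Fraction(8, 10)):
    """Return the top-ranked contributors whose cumulative commit share reaches `cutoff`."""
    total = sum(commit_counts.values())
    if total == 0:
        return []
    # Step 1: sort contributors by number of commits, largest first.
    # Breaking ties by name is an assumption made for determinism.
    ranked = sorted(commit_counts.items(), key=lambda item: (-item[1], item[0]))
    core, running = [], 0
    for name, count in ranked:
        core.append(name)
        running += count
        # Steps 2 and 3: compare the cumulative ratio with the cutoff.
        # Exact fractions avoid floating-point surprises at the 0.8 boundary.
        if Fraction(running, total) >= cutoff:
            break
    return core


# Hypothetical counts matching the slide's 60% / 20% / 10% / 10% example.
commits = {"A": 60, "B": 20, "C": 10, "D": 10}
core = core_developers(commits)
num_core = len(core)
pct_core = 100 * num_core / len(commits)
print(core, num_core, pct_core)  # ['A', 'B'] 2 50.0
```

The contributor who brings the cumulative ratio to exactly 0.8 is counted as core here, which matches the slide's result of 2 core developers out of 4.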
![Page 18: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/18.jpg)
![Page 19: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/19.jpg)
![Page 20: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/20.jpg)
![Page 21: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/21.jpg)
![Page 22: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/22.jpg)
Our approach to studying core team size
Compliance with the Pareto Principle
[Figure: percentage of core developers, with marks at 10%, 20%, and 30%.]
Stratify projects along the confounding factors
[Figure: projects grouped into Small, Medium, and Large by LOC, Total Author, and Age.]
The example project does not follow the Pareto Principle
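A rough sketch of the compliance check and the stratification follows. Both rules are assumptions read off the figure rather than taken from the slides' text: compliance is taken to mean that the percentage of core developers falls between 10% and 30%, and the Small / Medium / Large strata are taken to be three roughly equal-sized groups. The project records and field names are hypothetical.

```python
def is_pareto_compliant(pct_core, low=10.0, high=30.0):
    """True if the percentage of core developers lies in the assumed 10-30% band."""
    return low <= pct_core <= high


def stratify(projects, factor):
    """Split projects into three roughly equal groups ordered by `factor` (assumed rule)."""
    ranked = sorted(projects, key=lambda p: p[factor])
    n = len(ranked)
    first, second = n // 3, (2 * n) // 3
    return {
        "Small": ranked[:first],
        "Medium": ranked[first:second],
        "Large": ranked[second:],
    }


# Hypothetical projects; p1 mirrors the example project with 50% core developers.
projects = [
    {"name": "p1", "loc": 1_500, "pct_core": 50.0},
    {"name": "p2", "loc": 4_000, "pct_core": 20.0},
    {"name": "p3", "loc": 12_000, "pct_core": 8.0},
    {"name": "p4", "loc": 30_000, "pct_core": 25.0},
    {"name": "p5", "loc": 90_000, "pct_core": 12.0},
    {"name": "p6", "loc": 250_000, "pct_core": 35.0},
]

strata = stratify(projects, "loc")
for label, group in strata.items():
    compliant = sum(is_pareto_compliant(p["pct_core"]) for p in group)
    print(f"{label}: {compliant}/{len(group)} compliant")
```

Under this band, a project with 50% core developers is classified as not following the Pareto Principle.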
![Page 23: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/23.jpg)
Core team proportions vary widely
[Figure: commit-based core team proportions, divided by LOC.]
![Page 24: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/24.jpg)
Often, there are 15 or fewer core developers in a project
Number of core developers in projects
Commit-Based: 88%, LOC-Based: 98%, Access-Based: 96%
![Page 25: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/25.jpg)
Overview of our study of core teams on GitHub
Core and Non-Core Developers Activities
Applicability of the Pareto Principle
Number of Core Developers
More than half of the projects do not follow the Pareto Principle
Most projects have 15 or fewer core developers
![Page 26: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/26.jpg)
![Page 27: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/27.jpg)
![Page 28: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/28.jpg)
Our approach to studying activity
We classify the commits by using keywords.

| | Activity Type | Keywords |
|---|---|---|
| Development | Forward Engineering | implement, add, request |
| Maintenance | Reengineering | optimiz, adjust |
| Maintenance | Corrective Engineering | bug, fix, issue, error |
| Maintenance | Management | license, formatting, TODO |
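The keyword table can be turned into a small classifier. This is a sketch under stated assumptions: that the keywords are matched against the commit message, that matching is case-insensitive substring matching (which is why a stem such as `optimiz` works), and that a commit may belong to more than one activity type. The function name `classify_commit` is hypothetical.

```python
# Keywords taken from the table; "TODO" is lower-cased for case-insensitive matching.
KEYWORDS = {
    "Forward Engineering": ("implement", "add", "request"),
    "Reengineering": ("optimiz", "adjust"),
    "Corrective Engineering": ("bug", "fix", "issue", "error"),
    "Management": ("license", "formatting", "todo"),
}


def classify_commit(message):
    """Return every activity type whose keywords occur in the commit message."""
    text = message.lower()
    return {
        activity
        for activity, words in KEYWORDS.items()
        if any(word in text for word in words)
    }


print(classify_commit("Fix crash on empty input"))  # {'Corrective Engineering'}
print(classify_commit("Optimize query planner"))    # {'Reengineering'}
print(classify_commit("Bump version"))              # set()
```

Substring matching is deliberately simple and can over-match, for example `add` also matches "address", so a real study would likely need word-boundary handling or manual validation.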
![Page 29: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/29.jpg)
No big differences in proportions of development activities
[Figure: proportions of development activities under the Commit-Based, LOC-Based, and Access-Based heuristics.]
![Page 30: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/30.jpg)
Overview of our study of core teams on GitHub
Core and Non-Core Developers Activities
Applicability of the Pareto Principle
Number of Core Developers
More than half of the projects do not follow the Pareto Principle
Most projects have 15 or fewer core developers
There are no big differences between core and non-core activities
![Page 31: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/31.jpg)
![Page 32: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/32.jpg)
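Extremely large core teams may be interesting
Number of projects by core team size:

| Heuristic | ≤15 | 16-20 | 21-50 | 51-100 | ≥101 |
|---|---|---|---|---|---|
| Commit-Based | 2,197 | 98 | 137 | 17 | 47 |
| LOC-Based | 2,454 | 15 | 13 | 4 | 10 |
| Access-Based | 1,164 | 24 | 24 | 0 | 0 |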
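As a quick arithmetic check, the counts in this table reproduce the 88%, 98%, and 96% shares of projects with at most 15 core developers reported on the earlier slide. The snippet below only restates the table; the variable and function names are illustrative.

```python
# Projects per core team size bin: up to 15, 16-20, 21-50, 51-100, 101 or more.
counts = {
    "Commit-Based": [2197, 98, 137, 17, 47],
    "LOC-Based": [2454, 15, 13, 4, 10],
    "Access-Based": [1164, 24, 24, 0, 0],
}


def share_small_teams(row):
    """Percentage of projects in the first bin (at most 15 core developers)."""
    return round(100 * row[0] / sum(row))


shares = {heuristic: share_small_teams(row) for heuristic, row in counts.items()}
print(shares)  # {'Commit-Based': 88, 'LOC-Based': 98, 'Access-Based': 96}
```

The Commit-Based and LOC-Based rows each sum to 2,496 projects, while the Access-Based row sums to 1,212.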
![Page 33: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/33.jpg)
Many projects face a bus factor risk
Commit-Based: 43% (Core=1: 8%)
LOC-Based: 81% (Core=1: 24%)
Access-Based: 54% (Core=1: 21%)
In fact, most projects have fewer than 5 core developers
![Page 34: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/34.jpg)
Conclusion
![Page 35: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/35.jpg)
![Page 36: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/36.jpg)
Core Developer: additional slides
![Page 37: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/37.jpg)
Additional description of our definition
[Figure: cumulative ratio with the 0.8 cutoff and the 1.0 total over contributors A, B, C, D, E; annotation: "Depend on Name".]
![Page 38: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/38.jpg)
Commit-based
[Figure: panels for Age and Total Author.]
![Page 39: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/39.jpg)
LOC-based
[Figure: panels for Age, Total Author, and LOC.]
![Page 40: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/40.jpg)
Access-based
[Figure: panels for Age, Total Author, and LOC.]
![Page 41: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/41.jpg)
Data Extraction
8,510,504 repositories -> 4,618 repositories
![Page 42: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/42.jpg)
![Page 43: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/43.jpg)
Data Extraction
(1) Filter projects using GHTorrent
Filter out forked repositories.
![Page 44: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/44.jpg)
Fork
One of the features of GitHub
[Figure: an Original Repository is forked (cloned) into a Fork Repository, which sends a Pull Request back to the original.]
![Page 45: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/45.jpg)
![Page 46: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/46.jpg)
![Page 47: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/47.jpg)
Data Extraction
(1) Filter projects using GHTorrent
Filter out forked repositories.
Filter out repositories with fewer than 10 developers.
Filter out repositories that are developed outside of GitHub.
8,510,504 repositories -> 4,618 repositories
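The three filters in step (1) amount to a simple predicate over repository metadata. The sketch below assumes each repository is available as a dictionary; the field names `is_fork`, `num_developers`, and `developed_outside_github` are hypothetical and do not correspond to actual GHTorrent column names.

```python
def keep_repository(repo, min_developers=10):
    """Apply the three step-(1) filters to one repository record (hypothetical fields)."""
    if repo["is_fork"]:
        return False  # filter out forked repositories
    if repo["num_developers"] < min_developers:
        return False  # filter out repositories with fewer than 10 developers
    if repo["developed_outside_github"]:
        return False  # filter out repositories developed outside of GitHub
    return True


# Hypothetical records for illustration only.
repositories = [
    {"name": "original", "is_fork": False, "num_developers": 42, "developed_outside_github": False},
    {"name": "a-fork", "is_fork": True, "num_developers": 42, "developed_outside_github": False},
    {"name": "tiny", "is_fork": False, "num_developers": 3, "developed_outside_github": False},
    {"name": "mirror", "is_fork": False, "num_developers": 80, "developed_outside_github": True},
]

kept = [repo["name"] for repo in repositories if keep_repository(repo)]
print(kept)  # ['original']
```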
![Page 48: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/48.jpg)
![Page 49: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/49.jpg)
Data Extraction
(2) Clone repositories
4,618 repositories -> 4,154 repositories
[Figure: repositories are cloned to a local server.]
![Page 50: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/50.jpg)
![Page 51: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/51.jpg)
Data Extraction
(3) Filter duplicate projects
[Figure: Project A can be forked (Fork of A), or cloned and registered as a separate Project B (Clone of A).]
![Page 52: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/52.jpg)
Data Extraction
(3) Filter duplicate projects
4,618 repositories -> 3,533 repositories
Compare the SHAs of Project A and Project B
c87cce1e1a7260f40ccb5455e44c8b67f28651fa5e
655b8be757dd93a4cf3718145880cf484e34e63bde
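One way to compare SHAs is to treat two repositories as duplicates when their histories share a commit. The slides do not state the exact matching rule, so the rule used here (any shared commit SHA marks a duplicate, and the first repository seen is retained) is an assumption, as are the function name and the shortened placeholder SHAs.

```python
def drop_duplicates(repositories):
    """Keep the first repository of each group that shares at least one commit SHA.

    `repositories` is a list of (name, iterable of commit SHAs) pairs.
    """
    seen_shas = set()
    kept = []
    for name, shas in repositories:
        shas = set(shas)
        if shas & seen_shas:
            continue  # shares history with a repository already kept
        kept.append(name)
        seen_shas |= shas
    return kept


# Placeholder SHAs for illustration; real commit SHAs are 40 hexadecimal characters.
repositories = [
    ("Project A", ["a1", "a2", "a3"]),
    ("Project B", ["a1", "a2", "b9"]),  # clone of A registered as a new project
    ("Project C", ["c1", "c2"]),
]
print(drop_duplicates(repositories))  # ['Project A', 'Project C']
```

Which repository of a duplicate pair should be retained is a separate design decision that this sketch resolves by input order.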
![Page 53: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/53.jpg)
![Page 54: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/54.jpg)
Data Extraction
(4) Calculate metrics
For each repository: LOC, Total Commits, Total Authors, Age
![Page 55: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/55.jpg)
![Page 56: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5cb1a28ab3e4b8b4687/html5/thumbnails/56.jpg)
Data Extraction
(5) Filter projects by metrics
4,618 repositories -> 2,496 repositories
Filter out repositories with fewer than 10 developers.
Filter out repositories with fewer than 1,000 LOC.
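Step (5) applies thresholds to the metrics calculated in step (4). The sketch below is illustrative only: the `RepoMetrics` record and its field names are hypothetical, the developer count is assumed to be the Total Authors metric, and expressing age in days is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RepoMetrics:
    """Per-repository metrics from step (4); field names are hypothetical."""
    name: str
    loc: int
    total_commits: int
    total_authors: int
    age_days: int


def is_mature(metrics, min_authors=10, min_loc=1000):
    """Keep repositories with at least 10 developers and at least 1,000 LOC."""
    return metrics.total_authors >= min_authors and metrics.loc >= min_loc


# Hypothetical repositories for illustration only.
candidates = [
    RepoMetrics("big-lib", loc=120_000, total_commits=5_400, total_authors=85, age_days=2_100),
    RepoMetrics("few-devs", loc=30_000, total_commits=900, total_authors=4, age_days=700),
    RepoMetrics("tiny-code", loc=400, total_commits=150, total_authors=12, age_days=300),
]

mature = [m.name for m in candidates if is_mature(m)]
print(mature)  # ['big-lib']
```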