![Page 1: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/1.jpg)
Evolution in Open Source Software: A Case Study
Michael W. Godfrey and Qiang Tu
Software Architecture Group University of Waterloo
![Page 2: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/2.jpg)
Overview
What is software evolution?
Why should we care?
Previous research
A case study: The Linux OS kernel
Observations, hypotheses, and future research
![Page 3: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/3.jpg)
What is software evolution?
“Evolution is what happens while you’re busy making other plans.”
Usually, we consider evolution to begin once the first version has been delivered:
Maintenance is the planned set of tasks to effect changes.
Evolution is what actually happens to the software.
![Page 4: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/4.jpg)
Previous research
Lehman’s laws
Parnas on software geriatrics
Eick et al. on code decay (10 MLOC telecom)
Gall et al. (10 MLOC telecom)
Munro, Burd et al. (2 MLOC gcc)
![Page 5: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/5.jpg)
Lehman’s Laws in a nutshell
Observations:
  (Most) useful software must evolve or die.
  As a software system gets bigger, its resulting complexity tends to limit its ability to grow.
  Development progress/effort is (more or less) constant; growth is at best constant.
Advice:
  Need to manage complexity.
  Do periodic redesigns.
  Treat software and its development process as a feedback system (and not as a passive theorem).
![Page 6: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/6.jpg)
Lehman’s examples
![Page 7: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/7.jpg)
A case study in evolution: The Linux OS kernel
![Page 8: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/8.jpg)
A case study in evolution: The Linux OS kernel
It’s Linux!
Large system, very stable, many releases over several years, many developers
Growing mainstream adoption
Open source development model
Interesting phenomenon in itself
Easy to track, can publish results, many experts
Not much previous study
![Page 9: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/9.jpg)
Linux background
Linux kernel v1.0 released March 1994
  487 source files, 165 KLOC, i386 only
Linux kernel v2.3.39 released January 2000
  4854 source files, 2.2 MLOC, 10 hardware architectures supported, over 300 developers credited
Maintained along two parallel paths: development and stable
![Page 10: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/10.jpg)
Methodology
Examined 96 versions of Linux kernel
  34 of the 67 stable releases
  62 of the 369 development releases
All measures considered only .c/.h files contained in the tarball
Counted LOC using “wc -l” and an awk script that ignored comments and blank lines
Counted # of fcns/vars/macros using ctags
Architectural model (hierarchy of subsystems, abbreviated SSs) based on default directory structure
We plotted growth against calendar time
  Lehman suggests plotting growth against release number
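The counting step can be sketched in a few lines. This is not the awk script used in the study, which is not shown here. It is a minimal Python approximation that assumes C-style comments and ignores the rare case of comment markers inside string literals. The names `count_uncommented_loc` and `loc_by_subsystem` are hypothetical, and "subsystem" is taken to mean the top-level directory of the unpacked tarball.

```python
import os
import re
from collections import defaultdict

# Matches C block comments (/* ... */) and line comments (// ...).
# Assumption: comment markers inside string literals are rare enough to ignore.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def _blank_out(match):
    # Keep the newlines so code on either side of a comment stays on its own line.
    return "\n" * match.group(0).count("\n")


def count_uncommented_loc(source_text):
    """Return the number of lines left after removing comments and blank lines."""
    stripped = _COMMENT_RE.sub(_blank_out, source_text)
    return sum(1 for line in stripped.splitlines() if line.strip())


def loc_by_subsystem(root):
    """Sum uncommented LOC of .c/.h files under each top-level directory of root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        subsystem = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith((".c", ".h")):
                path = os.path.join(dirpath, name)
                # latin-1 never fails to decode, which suits old source trees.
                with open(path, encoding="latin-1") as handle:
                    totals[subsystem] += count_uncommented_loc(handle.read())
    return dict(totals)
```

Calling `loc_by_subsystem` on an unpacked source tree would return a dictionary keyed by directory name (for example `drivers` or `fs`), which is the shape of data the per-subsystem charts rely on.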
![Page 11: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/11.jpg)
Growth of compressed tar file
[Figure: size in bytes (axis 0 to 20,000,000) against calendar time (axis Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3); stable releases (1.0, 1.2, 2.0, 2.2).]
![Page 12: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/12.jpg)
Growth of # of source files
[Figure: # of source code files (*.[ch]) (axis 0 to 6000) against calendar time (axis Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3); stable releases (1.0, 1.2, 2.0, 2.2).]
![Page 13: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/13.jpg)
Growth of # of global fcns, variables, and macros
[Figure: # of global fcns, variables, and macros (axis 0 to 140,000) against calendar time (axis Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3); stable releases (1.0, 1.2, 2.0, 2.2).]
![Page 14: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/14.jpg)
Growth of Lines of Code (LOC)
[Figure: total LOC (axis 0 to 2,500,000) against calendar time (axis Jan 1993 to Apr 2001). Series: total LOC ("wc -l") and total uncommented LOC, each shown for development releases and for stable releases.]
![Page 15: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/15.jpg)
Average/median .c file size
[Figure: uncommented LOC (axis 0 to 700) against calendar time (axis Jan 1993 to Apr 2001). Series: average .c file size and median .c file size, each shown for dev. releases and for stable releases.]
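Why the slide reports both an average and a median can be shown with a small sketch. The per-file counts below are hypothetical, not measurements from the kernel. They only illustrate that a few very large files pull the average up while leaving the median almost unchanged. The function name `file_size_summary` is likewise an assumption.

```python
from statistics import mean, median


def file_size_summary(loc_per_file):
    """Return (average, median) uncommented LOC for a list of per-file counts."""
    if not loc_per_file:
        raise ValueError("need at least one file")
    return mean(loc_per_file), median(loc_per_file)


# Hypothetical per-file counts: mostly small files plus one very large one.
sizes = [120, 150, 180, 200, 2350]
average, middle = file_size_summary(sizes)
print(average, middle)
```

With these made-up numbers the average is more than three times the median, which is the kind of gap that makes plotting both statistics worthwhile.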
![Page 16: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/16.jpg)
Average/median .h file size
[Figure: uncommented LOC (axis 0 to 140) against calendar time (axis Jan 1993 to Apr 2001). Series: average .h file size and median .h file size, each shown for dev. releases and for stable releases.]
![Page 17: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/17.jpg)
Growth of major SSs (dev. releases)
[Figure: total uncommented LOC (axis 0 to 1,200,000) against calendar time (axis Jan 1993 to Apr 2001). Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]
![Page 18: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/18.jpg)
SS LOC as percentage of total system
[Figure: percentage of total system uncommented LOC (axis 0.0 to 70.0) against calendar time (axis Jan 1993 to Apr 2001). Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]
![Page 19: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/19.jpg)
SS LOC as percentage of total system (ignoring drivers)
[Figure: percentage of total system uncommented LOC (axis 0.0 to 30.0) against calendar time (axis Jan 1993 to Apr 2001). Series: arch, include, net, fs, kernel, mm, ipc, lib, init.]
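The two percentage views (with and without drivers) differ only in which subsystems enter the total. The sketch below is a hedged illustration of that arithmetic. The LOC figures are hypothetical and chosen for easy mental checking, and `subsystem_shares` is an assumed helper name, not something from the study. One assumption to note: dropping `drivers` is read here as removing it from the denominator too, which the slide title does not state outright.

```python
def subsystem_shares(loc_by_subsystem, exclude=()):
    """Return each subsystem's share of total LOC as a percentage."""
    kept = {name: loc for name, loc in loc_by_subsystem.items() if name not in exclude}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no lines of code to apportion")
    return {name: 100.0 * loc / total for name, loc in kept.items()}


# Hypothetical LOC counts, chosen only to make the arithmetic easy to follow.
example = {"drivers": 600, "arch": 200, "fs": 120, "kernel": 80}
with_drivers = subsystem_shares(example)
without_drivers = subsystem_shares(example, exclude=("drivers",))
print(with_drivers)
print(without_drivers)
```

Under that reading, excluding the dominant subsystem rescales the remaining shares so that smaller subsystems become visible on the chart.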
![Page 20: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/20.jpg)
Growth of small core SSs
[Figure: total uncommented LOC (axis 0 to 9000) against calendar time (axis Jan 1993 to Apr 2001). Series: kernel, mm, ipc, lib, init.]
![Page 21: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/21.jpg)
Growth of arch SSs
[Figure: total uncommented LOC (axis 0 to 40,000) against calendar time (axis Jan 1993 to Apr 2001). Series: arch/ppc/, arch/sparc/, arch/sparc64/, arch/m68k/, arch/mips/, arch/i386/, arch/alpha/, arch/arm/, arch/sh/, arch/s390/.]
![Page 22: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/22.jpg)
Growth of drivers SSs
[Figure: total uncommented LOC (axis 0 to 300,000) against calendar time (axis Jan 1993 to Apr 2001). Series: drivers/net, drivers/scsi, drivers/char, drivers/video, drivers/isdn, drivers/sound, drivers/acorn, drivers/block, drivers/cdrom, drivers/usb, drivers/"others".]
![Page 23: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/23.jpg)
Observations and hypotheses
Growth along devel. path is super-linear
  y = .21*x^2 + 252*x + 90,055, r^2 = .997
  y = size in LOC; x = days since v1.0
  r^2 is the “coefficient of determination” using least squares
  [Lehman/Turski’s model: y’ = y + E/y^2, which gives y ≈ (3Ex)^(1/3)]
Linux’s strong growth is continuing.
This is stronger growth at MLOC level than observed by others (Lehman, Gall), even for other OSs.
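The fitted curve and the quantities named on this slide can be written down directly. The sketch below takes the quadratic coefficients from the slide. The function names are hypothetical, and `E` in the inverse-square step is assumed to be a constant of the Lehman/Turski model, as the notation suggests. The fitting procedure itself is not reproduced here; only the evaluation of the model and the definition of r^2 are shown.

```python
def linux_loc_model(days_since_v1_0):
    """Quadratic growth model quoted on the slide: y = .21*x^2 + 252*x + 90,055."""
    x = days_since_v1_0
    return 0.21 * x ** 2 + 252 * x + 90055


def coefficient_of_determination(observed, predicted):
    """r^2 = 1 - SS_res / SS_tot for paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired values")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1.0 - ss_res / ss_tot


def inverse_square_step(size, effort):
    """One step of the inverse-square model: y' = y + E / y^2."""
    return size + effort / size ** 2
```

The contrast between the two models is in the size of each increment: the quadratic model's increments grow with time, while the inverse-square step shrinks as the system gets larger, which is why the Linux data reads as a departure from the earlier observations.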
![Page 24: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/24.jpg)
Why has Linux been able to continue its geometric growth?
Core code quality is carefully maintained
Architecture/problem domain
  It’s largely drivers
  Much of the code is “parallel”
  It’s not as big as you might think
    Vanilla configuration used only 15% of files
Development model (OSD) and its sociology
  Popularity and visibility have encouraged outsiders (both hackers and industry) to contribute
![Page 25: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/25.jpg)
Growth of fetchmail [Raymond]
![Page 26: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/26.jpg)
Growth of pine (email client)
[Figure: # of modules (axis 0 to 350) against calendar time (axis Jan-93 to Apr-01).]
![Page 27: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/27.jpg)
Growth of X Windows
[Figure: # of modules (axis 0 to 3000) against calendar time (axis Nov-84 to Apr-01). Labelled releases: X10R3, X10R4, X11R1, X11R2, X11R3, X11R5, X11R6, X11R6.1, X11R6.3, X11R6.4.]
![Page 28: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/28.jpg)
Growth of gcc/g++/egcs
[Figure: # of modules (axis 0 to 1000) against calendar time (axis Aug-87 to Apr-01). Series: g++, gcc, egcs.]
![Page 29: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/29.jpg)
Growth of vim (text editor)
[Figure: total LOC (axis 0 to 160,000) against calendar time (axis May 1990 to Apr 2001). Series: total LOC ("wc -l"); total LOC (ignoring comments and blank lines).]
![Page 30: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/30.jpg)
vim avg % comments and blank lines per file
[Figure: average percent comments + blank lines (axis 25.0 to 31.0) against calendar time (axis May 1990 to Apr 2001).]
![Page 31: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/31.jpg)
vim avg/median file size
[Figure: uncommented LOC (axis 0 to 1000) against calendar time (axis May 1990 to Apr 2001). Series: average uncommented LOC per source file; median uncommented LOC per source file.]
![Page 32: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/32.jpg)
vim’s architecture
![Page 33: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/33.jpg)
Hypotheses
Factors affecting evolution include
  Size and age of system
  Use of traditional sw. eng. principles during development
PLUS
  Problem domain
    Problem complexity, multi-platform, multi-features
  Software architecture
  Process model
  Sociology, market forces, and acts-of-God
![Page 34: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/34.jpg)
Software evolution research: What next?
So far, we have examined only growth.
More case studies needed
  Qualitative and quantitative
  Industrial and open source systems
  Different problem domains, architectures
Supporting tools to aid analysing, visualizing, and querying program evolution
  More than just RCS and perl
  Support for architecture repair
Codified knowledge: Why and how does software change?
  Build catalogue of change patterns and evolutionary narratives
![Page 35: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/35.jpg)
Codified knowledge
Mature engineering disciplines codify knowledge and experience.
Arguably, this is lacking in software engineering.
  Software architecture styles [Shaw]
  Design patterns [GoF]
Codified knowledge of how and why programs evolve:
  Evolutionary narratives [Godfrey]
    Long term, coarse granularity
  Change patterns
    Short term, fine granularity
![Page 36: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/36.jpg)
Change patterns and evolutionary narratives
Phenomena observed in Linux evolution
  Bandwagon effect
  Contributed third party code
  “Mostly parallel” enables sustained growth
  Clone and hack
  Careful control of core code; more flexibility on contributed drivers, experimental features
![Page 37: Evolution in Open Source Software: A Case Study](https://reader036.vdocuments.net/reader036/viewer/2022062322/56814da4550346895dbaffa5/html5/thumbnails/37.jpg)