The Evolution of the R Software Ecosystem (CSMR 2013)

Post on 26-May-2015

DESCRIPTION

Software ecosystems form the heart of modern companies’ collaboration strategies with end users, open source developers and other companies. An ecosystem consists of a core platform and a halo of user contributions that provide value to a company or project. In order to sustain the level and number of high-quality contributions, it is crucial for companies and contributors to understand how ecosystems tend to evolve and can be maintained successfully over time. As a first step, this presentation explores the evolution characteristics of the statistical computing project GNU R, which is a successful, end-user programming ecosystem. We find that the ecosystem of user-contributed R packages has been growing steadily since R’s conception, at a significantly faster rate than core packages, yet each individual package remains stable in size. We also identified differences in the way user-contributed and core packages are able to attract an active community of users. http://sail.cs.queensu.ca/publications/pubs/German-CSMR2013.pdf

TRANSCRIPT

The Evolution of the R Software Ecosystem

Daniel M. German, University of Victoria

Bram Adams, École Polytechnique de Montréal

Ahmed E. Hassan, Queen's University

An Ecosystem is ...

Jansen et al., ICSE '09

a set of (1) businesses functioning as a unit and interacting with a shared market for (2) software and services, together with (3) the relationships among [the businesses].

In Other Words

[Diagram: a core platform, user contributions building on the platform, and the ecosystem infrastructure. For R, the infrastructure is CRAN. Contributed packages shown: ggplot, wethepeople, data.table, Sim.DiffProc, randomForest, rbundler, foreach, RODBC, rms, WGCNA, minpack.lm, fields, caret, heavy, plm, rv, ggplot2, Sim.DiffProcGUI.]

Bosch, SPLC '09

Desktop ecosystems for end-user programming are the holy grail of software platforms!

http://www.tiobe.com

http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html

But How Did They Get This Far?

• Very successful statistics analysis system
• Created by Robert Gentleman in 1993
• One of the most successful languages for non-programmers

The R Language

# Goals: A first look at R objects - vectors, lists, matrices, data frames.

# To make vectors "x" "y" "year" and "names"
x <- c(2,3,7,9)
y <- c(9,7,3,2)
year <- 1990:1993
names <- c("payal", "shraddha", "kritika", "itida")
# Accessing the 1st and last elements of y --
y[1]
y[length(y)]

# To make a list "person" --
person <- list(name="payal", x=2, y=9, year=1990)
person
# Accessing things inside a list --
person$name
person$x

# To make a matrix, pasting together the columns "year" "x" and "y"
# The verb cbind() stands for "column bind"
cbind(year, x, y)

# To make a "data frame", which is a list of vectors of the same length --
D <- data.frame(names, year, x, y)
nrow(D)
# Accessing one of these vectors
D$names
# Accessing the last element of this vector
D$names[nrow(D)]
# Or equally,
D$names[length(D$names)]

R has an ACTIVE Community

• package infrastructure
• mailing lists
• blogs
• books
• commercial partners
• conference

How does a Successful Ecosystem like R Evolve?

• Package Characteristics
• Package Evolution
• Package Dependencies
• Package Community

Package Data Used

CRAN: 23/04/1997 - 25/02/2011, 80 official R versions

Package categories: base, recommended, popular, contributed

Counts shown on the slide: 2,733; 15; 13; 179; 19,593 versions

How to Define Popular Packages?

contest providing list of installed packages by 52 users

[Figure: Number of Packages Installed. y-axis: number of different packages per user (log scale, 1 to 1000). Groups: All; Inst. by at least 20% users.]

popular packages = packages installed by at least 20% of users
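The popularity criterion above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the authors' script: the `installed` list and the `popular_packages` helper are hypothetical names, and the toy data is invented. Only the 20% threshold comes from the slide.

```r
# Hypothetical toy data: one character vector of installed packages per user.
installed <- list(
  user1 = c("ggplot2", "foreach", "plm"),
  user2 = c("ggplot2", "RODBC"),
  user3 = c("ggplot2", "foreach", "caret"),
  user4 = c("rms"),
  user5 = c("ggplot2", "caret")
)

# Return the packages installed by at least `threshold` (a fraction) of users.
popular_packages <- function(installed, threshold = 0.20) {
  # Count each package at most once per user.
  per_user <- lapply(installed, unique)
  counts <- table(unlist(per_user, use.names = FALSE))
  # Share of users that installed each package.
  share <- as.vector(counts) / length(installed)
  sort(names(counts)[share >= threshold])
}

# Packages installed by at least 40% of the five toy users.
popular_packages(installed, threshold = 0.40)
```

With real data, `installed` would hold the package lists submitted by the 52 contest participants, and the default threshold of 0.20 would apply.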

Mailing List Data Used

• R-help
• R-devel
• MailMiner [Bettenburg et al.]
• PostgreSQL


Documentation Files Dominate!

[Figure: Proportion of files for a given extension, for Base, Recommended, Popular and Contributed packages. y-axis: proportion of files (0.0 to 0.5). x-axis: file extensions (rd, c, pdf, cpp, gif, java, rnw and others). Annotations: documentation; source code.]

Extensive Package Documentation

[Figure: Size of Documentation per Package. Documentation files (.rd), in lines (log scale, 0 to 100k), for base, recommended, popular and contributed packages. Values shown: 5.3k, 3.6k, 1.7k, 0.6k.]

Contributed Packages Contain Less Code

[Figure: Size of Source Code per Package. SLOCs (log scale, 0 to 1M) by file type (all source code, r, c), for Base, Recommended, Popular and Contributed packages.]

[Figure: Size of Documentation per Package. Documentation files (.rd), in lines (log scale, 0 to 100k), for base, recommended, popular and contributed packages. Values shown: 7.3k, 3.5k, 1.8k, 0.7k.]

Package Characteristics

• extensive documentation
• small contributed packages

Fast Growth of Contributed Packages

[Figure: Number of Packages over Time, 1998 to 2010 (log scale), for Base, Recommended, Popular and Contributed packages, plus Total. Annotations: super-linear growth; conservative base/recommended evolution.]

Contributed Packages have Stable Size

[Figure: Evolution of the Size of Source Code per Package, 1998 to 2011 (log scale, 0 to 1M), one panel each for Base, Recommended, Popular and Contributed packages.]

The Less Core, the Fewer Releases

[Figure: Number of Releases Per Package (log scale, 1 to 160), x-axis from 0 to 1, for Recommended, Popular and Contributed packages. Annotations: 50% had <=3 releases; 50% had <=17 releases.]

... but Contributed Packages are Actively Maintained!

[Figure: Date of Latest Release per Package (2003 to 2011), x-axis from 0 to 0.9, for Recommended, Popular and Contributed packages. Annotation: >90% of packages had a release in the last 2 years.]

Package Evolution

• fast growth of contributed packages
• stable package size
• active maintenance

Packages have Few Dependencies

[Figure: Number of Dependencies Per Package. x-axis: proportion of packages (0 to 1). y-axis: number of dependencies (0 to 25). Series: Recommended, Popular and Contributed packages. Annotations: 1/3 has NONE; 1/4 has 1 dependency.]

Contributed Packages are Higher-Level

[Figure: Number of Dependents Per Package. x-axis: proportion of packages (0 to 1). y-axis: number of dependents (log scale, 0 to 260). Series: Recommended, Popular and Contributed packages. Annotations: NO dependents; 50% of popular packages have <=6 dependents.]
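The two measures plotted above, dependencies per package and dependents per package, can be derived from a plain list of dependency edges. The sketch below assumes a hypothetical data frame `deps` with one row per "from depends on to" relation. The package names `pkgA` to `pkgD` and the edges are invented for illustration, and this is not the authors' extraction code.

```r
# Hypothetical dependency edges: the package in `from` depends on the one in `to`.
deps <- data.frame(
  from = c("pkgA", "pkgA", "pkgB", "pkgC"),
  to   = c("MASS", "lattice", "MASS", "pkgA"),
  stringsAsFactors = FALSE
)
packages <- c("pkgA", "pkgB", "pkgC", "pkgD", "MASS", "lattice")

# Number of dependencies: how many packages each package depends on.
n_dependencies <- vapply(packages, function(p) sum(deps$from == p), integer(1))

# Number of dependents: how many packages depend on each package.
n_dependents <- vapply(packages, function(p) sum(deps$to == p), integer(1))

# Proportion of packages with no dependency, and with no dependent.
mean(n_dependencies == 0)
mean(n_dependents == 0)
```

A package with many dependents sits low in the stack, while a package with dependencies but no dependents is an end-user, higher-level package, which is the reading the slide title gives for contributed packages.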

Package Dependencies

• few dependencies
• contributed packages are higher level

Contributed Packages Generate More User Traffic

[Figure: #messages over time, 1998 to 2010 (0 to 20000), for base, recommended, popular and contributed packages.]

Contributed Packages take over Developer Traffic

[Figure: #messages over time, 1998 to 2010 (0 to 2500), for base, recommended, popular and contributed packages.]

The Less Core, the Less Traffic

[Figure: Total #messages (log scale) for base, recommended, popular and contributed packages. Annotation: strong competition.]

Starting up a Community takes 1 Year

[Figure: Time (instant, day, week, month, year, 5 year, 10 year) until the 1st, 10th, 100th and 1000th message, for base, recommended, popular and contributed packages. Annotations: 3 months; 1 year; 5 months slower; 44.9% gets here; only 6.5% gets this far.]
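A "time until the n-th message" measure of this kind could be computed as sketched below. The starting point of the clock is an assumption here (the package's first release date), as are the names `first_release`, `messages` and `days_to_nth_message` and the toy dates. The slides do not state how the authors measured it.

```r
# Hypothetical input for one package: its first release date and the dates
# of the mailing list messages that mention it.
first_release <- as.Date("2005-01-01")
messages <- first_release + c(10, 40, 95, 200, 370)

# Days between the first release and the n-th message.
# Returns NA when the package never reached n messages.
days_to_nth_message <- function(first_release, messages, n) {
  messages <- sort(messages)
  if (length(messages) < n) {
    return(NA_real_)
  }
  as.numeric(messages[n] - first_release)
}

days_to_nth_message(first_release, messages, 1)   # first message
days_to_nth_message(first_release, messages, 5)   # fifth message
days_to_nth_message(first_release, messages, 10)  # never reached
```

Applying such a function to every package, then taking the share of non-NA results, would give proportions comparable to the "44.9% gets here" and "only 6.5% gets this far" annotations.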

Package Community

• strong competition for attention
• building a community takes a year

So What?

• How do contributors deal with the fight for attention?
  – What is their motivation?
  – How much effort do they spend on their package?
• How does a package become popular/recommended?
  – Do bloggers/books have an impact?
  – Or is it the other way around?
• How do R-forge and the core team ensure high quality releases without broken packages?
• ...

Case Study on R

CRAN: 23/04/1997 - 25/02/2011, 80 official R versions

Package categories: base, recommended, popular, contributed

Counts shown on the slide: 2,733; 15; 13; 179; 19,593 versions

• Package Characteristics: extensive documentation; small contributed packages
• Package Evolution: fast growth of contributed packages; stable package size; active maintenance
• Package Dependencies: few dependencies; contributed packages are higher level
• Package Community: strong competition for attention; building a community takes a year

RELENG 2013: 1st International Workshop on Release Engineering

May 20, 2013, San Francisco, USA, http://releng.polymtl.ca
