Page 1: An Empirical Study of Identical Function Clones in CRAN

An Empirical Study of Identical Function Clones in CRAN

Maëlick Claes, Tom Mens, Narjisse Tabout & Philippe Grosjean

6th February 2014, IWSC 2015

Software Engineering Lab & Numerical Ecology of Aquatic Systems Lab

Page 2: An Empirical Study of Identical Function Clones in CRAN

Introduction

Page 3: An Empirical Study of Identical Function Clones in CRAN

R: statistical environment based on the S language
  Packages with code, doc, examples, tests, datasets

CRAN (Comprehensive R Archive Network)
  Official R package repository
  Strict policy for package acceptance
  Package quality regularly checked & archive process
  Complaints in the community: Hornik 2012, Are there too many R packages?

Empirical study of inter-project clones in CRAN

http://www.r-project.org

Page 4: An Empirical Study of Identical Function Clones in CRAN

Previous work

Preliminary empirical study using CRAN meta-data
  On the maintainability of CRAN packages (CSMR-WCRE 2014)
  R CMD check results from CRAN:
    Most errors resolved quickly without developer intervention
    Maintenance effort needs to focus on fixing errors caused by others
    Need for a more specific tool to detect problems related to dependency changes

Web-dashboard for CRAN maintainers
  maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
  Type-1 function clone identification

http://cran.r-project.org/web/checks/

Page 5: An Empirical Study of Identical Function Clones in CRAN

Identifying cloned R functions
  Parsing R code with R itself
  Assigning a SHA-1 hash to each function's AST
  Ignoring functions with fewer than 6 lines of code
  Identifying Type-1 clones = identifying identical hashes across packages
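The steps on this slide can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the authors' tool: the `sources` list, the helper names `extract_functions` and `find_clones`, and the use of deparsed line counts as "lines of code" are all hypothetical. Base R has no SHA-1 function for strings, so the sketch compares the normalised (deparsed) text of each function directly; with the digest package installed, `digest::digest(key, algo = "sha1", serialize = FALSE)` could replace the raw key.

```r
# Hypothetical toy input: package name -> R source text (not data from the study).
sources <- list(
  pkgA = "
    f <- function(x) {
      y <- x + 1   # this comment is dropped by the parser
      y <- y * 2
      y <- y - 3
      y <- y / 4
      y
    }",
  pkgB = "
    g <- function(x) {
      y <- x + 1
      y <- y * 2
      y <- y - 3
      y <- y / 4
      y
    }
    h <- function(x) x"
)

# Collect top-level `name <- function(...)` definitions from one source text.
# Definitions using `=` or nested inside other functions are ignored here.
extract_functions <- function(pkg, src, min_lines = 6) {
  exprs <- parse(text = src, keep.source = FALSE)  # parse R code with R itself
  rows <- list()
  for (e in as.list(exprs)) {
    is_assign <- is.call(e) && identical(e[[1]], as.name("<-")) &&
      length(e) == 3 && is.name(e[[2]])
    if (!is_assign) next
    rhs <- e[[3]]
    if (!(is.call(rhs) && identical(rhs[[1]], as.name("function")))) next
    code <- deparse(rhs)                 # normalised text of the function's AST
    if (length(code) < min_lines) next   # ignore short functions
    rows[[length(rows) + 1]] <- data.frame(
      pkg = pkg,
      name = as.character(e[[2]]),
      key = paste(code, collapse = "\n"),  # stands in for the SHA-1 hash
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, rows)
}

# Report functions whose key occurs in more than one package.
find_clones <- function(sources) {
  funs <- do.call(rbind, lapply(names(sources), function(p) {
    extract_functions(p, sources[[p]])
  }))
  n_pkgs <- tapply(funs$pkg, funs$key, function(p) length(unique(p)))
  clone_keys <- names(n_pkgs)[n_pkgs > 1]
  funs[funs$key %in% clone_keys, c("pkg", "name")]
}

clones <- find_clones(sources)
print(clones)  # f from pkgA and g from pkgB form one clone set; h is too short
```

Because comments and layout disappear during parsing, two functions that differ only in whitespace or comments get the same key, which matches the usual definition of a Type-1 clone.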

Page 6: An Empirical Study of Identical Function Clones in CRAN

Observed clone cases
  Coexisting package versions: plyr and dplyr, lme and nlme, np and npRmpi
  Fork package: Rcmdr and QCAGUI
  Frequently cloned package: distr
  Utility package: DescTools
  Popular package: MASS
  Popular function: permn() from combinat

Page 7: An Empirical Study of Identical Function Clones in CRAN

Research Questions
  How prevalent are (Type-1) function clones in CRAN?
  Why did these clones appear?
  Is it possible to remove them and how?

Page 8: An Empirical Study of Identical Function Clones in CRAN

How prevalent are (Type-1) function clones in CRAN?

Page 9: An Empirical Study of Identical Function Clones in CRAN

Evolution of the number of packages

Page 10: An Empirical Study of Identical Function Clones in CRAN

Evolution of the number of LOC

Page 11: An Empirical Study of Identical Function Clones in CRAN

Evolution of the relative size

Page 12: An Empirical Study of Identical Function Clones in CRAN

Why did clones appear?

Page 13: An Empirical Study of Identical Function Clones in CRAN

Categorizing clones

All clones on 1st December 2014
  7366 clones
  162k LOC
  1409 packages
  3184 clone sets

Identifying the origin of each clone set

Each clone set origin is either
  An anonymous and/or local function
  An archived global function
  A private global function
  A public global function

Page 14: An Empirical Study of Identical Function Clones in CRAN

Anonymous, local and global functions

From DescTools 0.99.8.1 package...

    qbinom.abscont <- function(p, size, x){
      fun <- function(prob, size, x, p){
        pbinom.abscont(x, size, prob) - p
      }
      uniroot(fun, interval = c(0, 1), size = size, x = x, p = p)$root
    }

... which could be rewritten as

    qbinom.abscont <- function(p, size, x){
      uniroot(function(prob, size, x, p){
        pbinom.abscont(x, size, prob) - p
      }, interval = c(0, 1), size = size, x = x, p = p)$root
    }
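A small sketch may help show why local and anonymous functions matter for reuse: a function defined inside another function exists only in that function's environment, so code elsewhere cannot call it by name. The names `square_plus_one` and `sq` below are hypothetical and unrelated to DescTools.

```r
# `sq` is a local function: it is created each time square_plus_one runs
# and is not bound to any name at the top level.
square_plus_one <- function(x) {
  sq <- function(v) v^2
  sq(x) + 1
}

result <- square_plus_one(3)   # 3^2 + 1 = 10

# The local helper is not visible in the global environment.
sq_is_global <- exists("sq", envir = globalenv(), inherits = FALSE)

print(result)
print(sq_is_global)
```

Another package that wants the behaviour of `sq` therefore has no binding to import; under that assumption, copying the code is the only way to get it.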

Page 15: An Empirical Study of Identical Function Clones in CRAN

NAMESPACE file

Also from DescTools 0.99.8.1

exportPattern("^[^\\.]")

importFrom("boot", "boot", "boot.ci", "corr")
import(tcltk)

useDynLib(DescTools)
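In a NAMESPACE file, `exportPattern` exports every object whose name matches the given regular expression, so the pattern above makes public everything that does not start with a dot. The sketch below applies the same pattern with `grepl`. Only the pattern comes from the slide; the list of names is hypothetical.

```r
# Hypothetical object names defined in a package.
defined <- c("qbinom.abscont", "pbinom.abscont", ".helper", "Mean")

# Pattern from the NAMESPACE file: names that do not start with a dot.
pattern <- "^[^\\.]"

exported <- defined[grepl(pattern, defined)]  # public functions
private <- setdiff(defined, exported)         # kept internal to the package

print(exported)
print(private)
```

A dot inside a name, as in `qbinom.abscont`, does not matter; only the first character is tested.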

Page 16: An Empirical Study of Identical Function Clones in CRAN

Classification of clone origins

Most clones were created because it was not possible to re-use the original function

Page 17: An Empirical Study of Identical Function Clones in CRAN

Is it possible to remove clones and how?

Page 18: An Empirical Study of Identical Function Clones in CRAN

Adding dependency to

The origin package
  673 out of the 1899 global clone set origins are public functions
  782 functions that could potentially be removed in 332 packages
  48 functions in a package where there is already a direct dependency
  20 functions in a package where a dependency cannot be added without creating cycles

A non-original clone copy
  Of the 2511 clone sets with a non-public origin function, only 250 have another public copy
  Only 299 functions could be removed by depending on another copy

=> Removing clones in CRAN packages cannot be reduced to code refactoring. Most of the time it would require communication between maintainers of different packages
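The cycle constraint mentioned above can be checked with a simple reachability test: making package `from` depend on package `to` closes a cycle exactly when `to` already reaches `from` through existing dependencies. The sketch below is an illustration only; the graph and the helper names `reaches` and `creates_cycle` are hypothetical, not taken from the study.

```r
# Hypothetical dependency graph: package -> packages it depends on.
deps <- list(
  A = c("B"),
  B = c("C"),
  C = character(0),
  D = c("A")
)

# TRUE if `target` can be reached from `start` by following dependencies.
reaches <- function(deps, start, target) {
  seen <- character(0)
  todo <- start
  while (length(todo) > 0) {
    pkg <- todo[1]
    todo <- todo[-1]
    if (pkg == target) return(TRUE)
    if (pkg %in% seen) next
    seen <- c(seen, pkg)
    todo <- c(todo, deps[[pkg]])  # unknown packages contribute nothing
  }
  FALSE
}

# Would adding the dependency `from` -> `to` create a cycle?
creates_cycle <- function(deps, from, to) reaches(deps, to, from)

c_on_a <- creates_cycle(deps, "C", "A")  # TRUE: A already reaches C via B
d_on_c <- creates_cycle(deps, "D", "C")  # FALSE: C depends on nothing

print(c(c_on_a, d_on_c))
```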

Page 19: An Empirical Study of Identical Function Clones in CRAN

Conclusion
  Cloned code represents a small fraction of all CRAN code but still more than 100K LOC across the biggest CRAN packages
  Most clones cannot be removed by adding dependencies without enforcing CRAN policy
  But there is still a significant number of clones that could, in theory, easily be removed
  Further work needed to understand if the refactorable clones are justified or not

Page 20: An Empirical Study of Identical Function Clones in CRAN

Future Work
  Asking developers (survey) about their cloning behavior
  Type-2 and Type-3 clones
  Clone patterns
  Inter-project cloning behavior in other languages / ecosystems

Page 21: An Empirical Study of Identical Function Clones in CRAN

Thanks for your attention

Questions?

Slides: http://maelick.net/presentations/iwsc2015/