An Empirical Study of Identical Function Clones in CRAN

Maëlick Claes, Tom Mens, Narjisse Tabout & Philippe Grosjean

6th February 2014, IWSC 2015

Software Engineering Lab & Numerical Ecology of Aquatic Systems Lab

Introduction

R
  Statistical environment based on the S language
  Packages with code, doc, examples, tests, datasets
CRAN (Comprehensive R Archive Network)
  Official R package repository
  Strict policy for package acceptance
  Package quality regularly checked & archive process
  Complaints in the community: Hornik 2012, Are there too many R packages?
Empirical study of inter-project clones in CRAN

http://www.r-project.org

Previous work

Preliminary empirical study using CRAN meta-data
  On the maintainability of CRAN packages (CSMR-WCRE 2014)
  R CMD check results from CRAN:
    Most errors resolved quickly without developer intervention
    Maintenance effort needs to focus on fixing errors caused by others
    Need for a more specific tool to detect problems related to dependency changes
Web-dashboard for CRAN maintainers
  maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
  Type-1 function clone identification

http://cran.r-project.org/web/checks/

Identifying cloned R functions

Parsing R code with R itself
Assigning a SHA-1 hash to each function's AST
Ignoring functions with less than 6 lines of code
Identifying Type-1 clones = identifying identical hashes across packages
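The steps listed on this slide can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the authors' tool: `hash_text`, `find_clones`, the `pkgA`/`pkgB` sources and the use of deparsed line counts as a stand-in for "lines of code" are all hypothetical choices. The SHA-1 hash comes from the `digest` package when it is installed; otherwise the deparsed text itself serves as the key.

```r
# Hash the deparsed AST text; fall back to the text itself without 'digest'.
hash_text <- function(txt) {
  if (requireNamespace("digest", quietly = TRUE)) {
    digest::digest(txt, algo = "sha1", serialize = FALSE)
  } else {
    txt
  }
}

# pkg_sources: named list mapping a package name to its R source code (a string).
find_clones <- function(pkg_sources, min_lines = 6) {
  rows <- list()
  for (pkg in names(pkg_sources)) {
    # Parsing without source references drops comments and layout.
    exprs <- parse(text = pkg_sources[[pkg]], keep.source = FALSE)
    for (e in as.list(exprs)) {
      # Only top-level definitions of the form: name <- function(...) ...
      is_def <- is.call(e) && length(e) == 3 &&
        identical(e[[1]], as.name("<-")) && is.name(e[[2]]) &&
        is.call(e[[3]]) && identical(e[[3]][[1]], as.name("function"))
      if (!is_def) next
      lines <- deparse(e[[3]])
      if (length(lines) < min_lines) next  # assumption: deparsed lines ~ LOC
      rows[[length(rows) + 1]] <- data.frame(
        package = pkg,
        name = as.character(e[[2]]),
        hash = hash_text(paste(lines, collapse = "\n")),
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) return(data.frame())
  funs <- do.call(rbind, rows)
  # A clone set is a hash that occurs in more than one package.
  n_pkgs <- tapply(funs$package, funs$hash, function(p) length(unique(p)))
  funs[funs$hash %in% names(n_pkgs)[n_pkgs > 1], ]
}

# Two hypothetical packages sharing one function under different names.
code_a <- "
helper <- function(x) {
  # centre and scale
  m <- mean(x)
  s <- sd(x)
  y <- x - m
  y / s
}
small <- function(x) x + 1
"
code_b <- "
scale01 <- function(x) {
  m <- mean(x)
  s <- sd(x)
  y <- x - m
  y / s
}
"
clones <- find_clones(list(pkgA = code_a, pkgB = code_b))
print(clones[, c("package", "name")])
```

Because the hash covers only the function expression and not the name it is assigned to, `helper` and `scale01` end up in the same clone set, while `small` is skipped by the size threshold.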

Observed clone cases

Coexisting package versions: plyr and dplyr, lme and nlme, np and npRmpi
Fork package: Rcmdr and QCAGUI
Frequently cloned package: distr
Utility package: DescTools
Popular package: MASS
Popular function: permn() from combinat

Research Questions

How prevalent are (Type-1) function clones in CRAN?
Why did these clones appear?
Is it possible to remove them and how?

How prevalent are (Type-1) function clones in CRAN?

Evolution of the number of packages

Evolution of the number of LOC

Evolution of the relative size

Why did clones appear?

Categorizing clones

All clones on 1st December 2014
  7366 clones
  162k LOC
  1409 packages
  3184 clone sets
Identifying the origin of each clone set
Each clone set origin is either
  An anonymous and/or local function
  An archived global function
  A private global function
  A public global function

Anonymous, local and global functions

From DescTools 0.99.8.1 package...

    qbinom.abscont <- function(p, size, x){
      fun <- function(prob, size, x, p){
        pbinom.abscont(x, size, prob) - p
      }
      uniroot(fun, interval = c(0, 1), size = size, x = x, p = p)$root
    }

... which could be rewritten as

    qbinom.abscont <- function(p, size, x){
      uniroot(function(prob, size, x, p){
        pbinom.abscont(x, size, prob) - p
      }, interval = c(0, 1), size = size, x = x, p = p)$root
    }

NAMESPACE file

Also from DescTools 0.99.8.1

    exportPattern("^[^\\.]")

    importFrom("boot", "boot", "boot.ci", "corr")
    import(tcltk)

    useDynLib(DescTools)
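The export pattern in this NAMESPACE file determines which global functions are public: every name that does not start with a dot is exported. A minimal sketch of that check follows. Only `qbinom.abscont` comes from the slides; the other two names are hypothetical, chosen to show both outcomes.

```r
# Pattern taken from the NAMESPACE file: names not starting with a dot.
export_pattern <- "^[^\\.]"

# 'qbinom.abscont' appears in the slides; the other names are made up.
fun_names <- c("qbinom.abscont", ".hidden_helper", "some_utility")

is_public <- grepl(export_pattern, fun_names)
visibility <- ifelse(is_public, "public", "private")
names(visibility) <- fun_names
print(visibility)
```

Under such a pattern a global function is private only if its name begins with a dot, which is why the distinction between private and public global functions can be read off the NAMESPACE file.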

Classification of clone origins

Most clones were created because it was not possible to re-use the original function

Is it possible to remove clones and how?

Adding dependency to

The origin package
  673 out of the 1899 global clone set origins are public functions
  782 functions that could potentially be removed in 332 packages
  48 functions in a package where there is already a direct dependency
  20 functions in a package where a dependency cannot be added without creating cycles

A non-original clone copy
  Of the 2511 clone sets with a non-public origin function, only 250 have another public copy
  Only 299 functions could be removed by depending on another copy

=> Removing clones in CRAN packages cannot be reduced to code refactoring. Most of the time it would require communication between maintainers of different packages
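The case above where a dependency cannot be added without creating cycles can be illustrated with a small reachability check on the package dependency graph. This is a sketch under assumptions, not the analysis used in the study: `depends_on`, `would_create_cycle` and the packages `A` to `D` are hypothetical, and the graph is a plain named list mapping each package to the packages it depends on.

```r
# TRUE if 'target' can be reached from 'start' by following dependencies.
depends_on <- function(deps, start, target) {
  seen <- character(0)
  stack <- start
  while (length(stack) > 0) {
    pkg <- stack[[1]]
    stack <- stack[-1]
    if (pkg == target) return(TRUE)
    if (pkg %in% seen) next
    seen <- c(seen, pkg)
    stack <- c(deps[[pkg]], stack)  # deps[[pkg]] is NULL for unknown packages
  }
  FALSE
}

# Adding the edge from -> to closes a cycle when 'to' already depends on 'from'.
would_create_cycle <- function(deps, from, to) {
  depends_on(deps, to, from)
}

# Hypothetical graph: A depends on B, B depends on C.
deps <- list(A = "B", B = "C", C = character(0), D = character(0))

# A clone in C whose origin is in A cannot be replaced by a dependency on A.
print(would_create_cycle(deps, from = "C", to = "A"))
# A clone in D with the same origin could be.
print(would_create_cycle(deps, from = "D", to = "A"))
```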

Conclusion

Cloned code represents a small fraction of all CRAN code but still more than 100K LOC across the biggest CRAN packages
Most clones cannot be removed by adding dependencies without enforcing CRAN policy
But a significant number of clones could, in theory, easily be removed
Further work needed to understand if the refactorable clones are justified or not

Future Work

Asking developers (survey) about their cloning behavior
Type-2 and Type-3 clones
Clone patterns
Inter-project cloning behavior in other languages / ecosystems

Thanks for your attention

Questions?

Slides: http://maelick.net/presentations/iwsc2015/
