An Empirical Study of Function Clones in Open Source Software

Chanchal K. Roy and James R. Cordy, Queen’s University. Presenter: MF Khan

Upload: kfurqan

Post on 03-Jun-2015


DESCRIPTION

This is a presentation on a research paper whose authors built a clone detection tool called NICAD.

TRANSCRIPT

Page 1: An Empirical Study Of Function Clones In Open Source Software

An Empirical Study of Function Clones in Open Source Software

Chanchal K. Roy and James R. Cordy

Queen’s University

Presenter: MF Khan

Page 2: An Empirical Study Of Function Clones In Open Source Software

Outline

• Introduction
• NICAD Overview
• Experimental Setup
• Experimental Results
• Conclusions
• Discussion

Page 3: An Empirical Study Of Function Clones In Open Source Software

Introduction

• Code clone (clone)
  – A code fragment reused by copying and pasting, with or without minor modifications
• Benefits of clone detection
  – Software maintenance (bug detection)
• History
  – Several detection techniques have been proposed
  – In-depth comparative studies of cloning across a variety of systems are lacking

Page 4: An Empirical Study Of Function Clones In Open Source Software

Introduction (Cont)

• NICAD
  – In-depth study of function cloning in 15+ C and Java systems, including Apache and the Linux kernel
  – Accurate detection of near-miss function clones
  – Focuses on detecting copy/pasted near-miss clones by using pretty-printing, code normalization and filtering
  – Lightweight: uses simple text-line comparison
  – Capable of detecting clones in very large systems in different languages

Page 5: An Empirical Study Of Function Clones In Open Source Software

NICAD Overview

• Three phases of clone detection
  – Extraction
    All potential clones are identified and extracted: all functions and methods in C and Java, with their original source coordinates.
  – Comparison (determination of clones)
    Potential clones are clustered and compared. The pretty-printed potential clones are compared line by line, as text, using the longest common subsequence (LCS).

Page 6: An Empirical Study Of Function Clones In Open Source Software

NICAD Overview

Unique Percentage of Items (UPI)

If the UPI of both line sequences is zero or below a certain threshold, the potential clones are considered to be clones.

  – Reporting
    Results from NICAD are reported as an XML database and as interactive HTML.
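The comparison and threshold check on this slide can be sketched in a few lines of Python. This is only an illustration, not NICAD's implementation: the slides do not give the UPI formula, so the sketch assumes UPI is the fraction of a function's pretty-printed lines that fall outside the LCS, and the names `lcs_length`, `upi` and `is_clone_pair` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def upi(lines, other):
    """Assumed UPI: fraction of `lines` not covered by the LCS with `other`."""
    if not lines:
        return 0.0
    return (len(lines) - lcs_length(lines, other)) / len(lines)


def is_clone_pair(f1, f2, threshold):
    """Both line sequences must have a UPI at or below the threshold."""
    return upi(f1, f2) <= threshold and upi(f2, f1) <= threshold
```

With a threshold of 0.0 only line-for-line identical functions pass; raising the threshold admits functions that differ in a growing share of their lines.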

Page 7: An Empirical Study Of Function Clones In Open Source Software

Experimental Setup

The paper applies NICAD to find function clones in a number of open source systems.

It then introduces a set of metrics to analyze the results.

Page 8: An Empirical Study Of Function Clones In Open Source Software

Experimental Setup

Subject systems: 10 C and 7 Java systems

Page 9: An Empirical Study Of Function Clones In Open Source Software

Clone Definition

• Non-empty functions of at least 3 LOC
• In pretty-printed format
• Different Unique Percentage of Items (UPI) thresholds are used to find exact and near-miss clones
• E.g.
  – A UPI threshold of 0.0 finds only exact clones
  – A UPI threshold of 0.10 treats two functions as clones if at most 10% of their pretty-printed lines differ
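The size criterion above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: real pretty-printing is language-aware, whereas the hypothetical `normalize` below only collapses whitespace and drops blank lines, and `is_candidate` is an illustrative name, not part of NICAD.

```python
def normalize(source):
    """Crude stand-in for pretty-printing: collapse runs of whitespace
    on each line and drop blank lines."""
    lines = (" ".join(line.split()) for line in source.splitlines())
    return [line for line in lines if line]


def is_candidate(source, min_lines=3):
    """A function qualifies if it has at least `min_lines` non-blank lines."""
    return len(normalize(source)) >= min_lines
```

Counting lines after normalization matters because the same function can span a different number of raw lines depending on how its author formatted it.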

Page 10: An Empirical Study Of Function Clones In Open Source Software

Validation of Clones

• Validating the detected clones is a two-step process
• 1: NICAD's interactive HTML output
  – Gives an overall view of the original source of the clone classes
• 2: XML output
  – Used to compare, pairwise, the original source of the functions in each clone class
  – Linux diff is used to determine the textual similarity of the original source
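The pairwise textual comparison in step 2 can be approximated in Python with the standard `difflib` module. This is a sketch only: the presentation uses the Linux `diff` tool, and `difflib` is swapped in here as a stand-in; the function names `textual_similarity` and `show_diff` are hypothetical.

```python
import difflib


def textual_similarity(source_a, source_b):
    """Similarity ratio in [0, 1] between two sources, compared line by line."""
    a = source_a.splitlines()
    b = source_b.splitlines()
    return difflib.SequenceMatcher(None, a, b).ratio()


def show_diff(source_a, source_b):
    """Unified diff of the two sources, in the style of `diff -u`."""
    return "\n".join(difflib.unified_diff(
        source_a.splitlines(), source_b.splitlines(),
        fromfile="a", tofile="b", lineterm=""))
```

Running `show_diff` over every pair of functions in a clone class gives a quick way to inspect how the members of the class actually differ.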

Page 11: An Empirical Study Of Function Clones In Open Source Software

Metrics and Visualizations

• Total Cloned Methods (TCM)
  – Gives the overall cloning statistics
• Files Associated with Clones (FAWC)
  – Overall localization of clones
  – From a software maintenance point of view, a lower value of FAWCP is desirable. Why?
  – Because the clones are then localized to certain specific files and may thus be easier to maintain
  – Still, one cannot say which files contain the majority of the clones in the system
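The two metrics on this slide can be sketched as percentages. The slides do not state the formulas, so this Python sketch assumes TCM is the share of all methods that belong to some clone class and FAWC is the share of all files that contain at least one cloned method. Methods are represented as hypothetical `(file_path, method_name)` tuples, and all function names are illustrative.

```python
def cloned_methods(clone_classes):
    """All methods that appear in at least one clone class."""
    return set().union(*clone_classes)


def tcm_percent(all_methods, clone_classes):
    """Assumed TCM: percentage of methods that are cloned."""
    return 100.0 * len(cloned_methods(clone_classes)) / len(all_methods)


def fawc_percent(all_methods, clone_classes):
    """Assumed FAWC: percentage of files containing a cloned method."""
    all_files = {path for path, _ in all_methods}
    clone_files = {path for path, _ in cloned_methods(clone_classes)}
    return 100.0 * len(clone_files) / len(all_files)
```

In a system of three files where both clones sit in one file, TCM can be high while FAWC stays low, which is the localized case the slide describes as easier to maintain.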

Page 12: An Empirical Study Of Function Clones In Open Source Software

Metrics and Visualizations

• Cloned Ratio of File for Methods (CRFM)
  – With CRFM we attempt to discover highly cloned files
  – Defined for a particular file (f)
• Profile of Cloning Locality w.r.t. Methods (PCLM)
  – Kapser and Godfrey provide three location-based categories of function clones
  – 1: in the same file 2: in the same directory 3: in different directories
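A per-file ratio and the three locality categories can be sketched as follows. The slide does not show the CRFM formula, so this Python sketch assumes it is the percentage of a file's methods that are cloned; methods are hypothetical `(file_path, method_name)` tuples, paths use `/` separators, and `crfm_percent` and `locality` are illustrative names.

```python
import posixpath


def crfm_percent(path, all_methods, cloned):
    """Assumed CRFM: percentage of the methods in file `path` that are cloned."""
    in_file = [m for m in all_methods if m[0] == path]
    if not in_file:
        return 0.0
    return 100.0 * sum(1 for m in in_file if m in cloned) / len(in_file)


def locality(path_a, path_b):
    """Classify a clone pair by where its two functions live."""
    if path_a == path_b:
        return "same file"
    if posixpath.dirname(path_a) == posixpath.dirname(path_b):
        return "same directory"
    return "different directory"
```

Tallying `locality` over all clone pairs of a system yields a locality profile, and sorting files by `crfm_percent` surfaces the most heavily cloned ones.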

Page 13: An Empirical Study Of Function Clones In Open Source Software

Experimental Results

1. More function cloning in open source Java than in C: on average about 15% (7.2% w.r.t. LOC).

2. The effect of increasing the UPI threshold is almost identical.

Page 14: An Empirical Study Of Function Clones In Open Source Software

Detail Overview

1. Several of the C systems have fewer than 10% cloned functions.

The Java systems are consistent in their amount of cloning.

Page 15: An Empirical Study Of Function Clones In Open Source Software

Clone Associated Files


Page 16: An Empirical Study Of Function Clones In Open Source Software

Clone Associated Files

• FAWC addresses the question of what portion of the files in a system is associated with clones.

• From a software maintenance point of view, a system with more clones that are associated with only a few files is in some sense better than a system with fewer clones scattered over many files.

Page 17: An Empirical Study Of Function Clones In Open Source Software

Profiles of Cloning Density

• It tells us which files are highly cloned, or which files contain the majority of the clones

That means scattered files and more near-miss clones.

Page 18: An Empirical Study Of Function Clones In Open Source Software

Profile of Cloning Density

The assumption is that cloned methods in files with a high cloning density have been intentionally copied and pasted.

Page 19: An Empirical Study Of Function Clones In Open Source Software

Profile of Cloning Localization

The location of a clone pair is a factor in software maintenance.

Except for Linux, there are no exact clones (UPI threshold 0.0) in the C systems.

When the UPI threshold is 0.3, on average 45.9% (49.0% LOC) of the clone pairs in C occur.

Page 20: An Empirical Study Of Function Clones In Open Source Software

Conclusion

• NICAD is capable of accurately finding
  1. Exact function clones
  2. Near-miss function clones

Page 21: An Empirical Study Of Function Clones In Open Source Software

Discussion

• What is the definition of a clone?
• What is the definition of a near-miss clone?
• Why is Weltab higher in slide 14?
• What if we use C++ or C#?
• What will happen if we use a smaller clone granularity, such as begin-end blocks?

Page 22: An Empirical Study Of Function Clones In Open Source Software

Thank you.
