information theory concepts in software engineering


Post on 24-Feb-2016


TRANSCRIPT

Page 1: Information  theory concepts in software engineering

Information theory concepts in software engineering

Richard Torkar


Page 2: Information  theory concepts in software engineering

Blekinge Institute of Technology

Page 3: Information  theory concepts in software engineering

A YOUNG INSTITUTE
• Founded in 1989
• One of three independent institutes of technology
• Three campuses

Page 4: Information  theory concepts in software engineering

ME
• Richard Torkar
– Former officer
– PhD in software engineering at BTH (studied at University West and Chalmers)
– REFASTEN project
– Director for SWELL
– Project manager for CONES
– Programme manager POKAL and EMSE
– Participating in EASE and RUAG
– Prof. Claes Wohlin’s research group SERL

Page 5: Information  theory concepts in software engineering
Page 6: Information  theory concepts in software engineering

Partner with problems…

• Millions of test cases constantly running (24/7)

• Tests a system containing 25-30 large subsystems

• Contractors and divisions all over the world use the same test bed

Page 7: Information  theory concepts in software engineering

NORMALISED COMPRESSION DISTANCE [2]
• Kolmogorov complexity, K
• Cilibrasi and Vitányi used a compression [3] algorithm, C, to approximate K
• NCD is a non-negative number, 0 ≤ NCD ≤ 1 + ε, where ε depends on how well C approximates K
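The slide does not spell out the formula. As published by Cilibrasi and Vitányi, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. A minimal sketch follows, assuming zlib as the compressor C; the talk does not say which compressor was used, and the function names are illustrative only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Stand-in for K(data): the length of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # C applied to the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 mean that one input adds little information to the other; values near 1 mean the inputs share almost nothing the compressor can exploit. With a real compressor the result can slightly exceed 1, which is the ε on the slide.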

Page 8: Information  theory concepts in software engineering

WHAT'S BEEN DONE?

• Information distance (ID)
– The ID between two binary strings, x and y, is the length of the shortest program that translates x to y and, consequently, y to x.
• ID is the universal distance metric
– Minimal among computable distance functions
– Uncovers all effective similarities

Page 9: Information  theory concepts in software engineering

Distance = 1 || 0?

Page 10: Information  theory concepts in software engineering

What to try?

Hcog: Ordering tests based on their ∆VAT distance cannot be distinguished from how a human would order the tests based on their ‘cognitive similarity’.

Page 11: Information  theory concepts in software engineering
Page 12: Information  theory concepts in software engineering

Defs

• A complete VAT trace of a test is a string with all the information about the actual execution of a test for all the variation points in the VAT model.

• The Universal Test Distance, denoted ∆VAT, in the VAT model is the information distance between the complete VAT traces of two tests.

Page 13: Information  theory concepts in software engineering

UNIVERSAL TEST DISTANCE

• Universal Test Distance (UTD): Information Distance of complete VAT traces of n tests (where n >= 2)

• Should discover any similarities [1] between tests…

• But ID is non-computable!

Page 14: Information  theory concepts in software engineering

USING NCD AS A TEST DISTANCE

• Does NCD uncover “meaningful” distances?
• Three engineers ordered 25 tests applied on the triangle problem
• Coded these tests in Bacon (Ruby)
• Saved an execution trace of each test
• Calculated an NCD matrix (distance tree)
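The last two steps above can be sketched as follows: compute the pairwise NCD of saved traces, then build a tree from the matrix. This is an illustration under stated assumptions, not the tooling used in the study. zlib stands in for the compressor, the traces in `TRACES` are hypothetical, and plain average-linkage clustering is swapped in for whatever tree-building method produced the rooted non-binary trees reported in the talk, so this sketch yields a binary tree.

```python
import zlib
from itertools import combinations


def ncd(x, y):
    """Normalised compression distance with zlib standing in for C."""
    size = lambda data: len(zlib.compress(data, 9))
    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def ncd_matrix(traces):
    """Map every ordered pair of distinct test names to a symmetrised NCD."""
    dist = {}
    for a, b in combinations(sorted(traces), 2):
        # C(xy) and C(yx) can differ slightly, so average both orders.
        d = (ncd(traces[a], traces[b]) + ncd(traces[b], traces[a])) / 2
        dist[a, b] = dist[b, a] = d
    return dist


def distance_tree(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = [(name, [name]) for name in names]  # (subtree, its leaves)

    def gap(pair):
        (_, left), (_, right) = pair
        total = sum(dist[a, b] for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > 1:
        first, second = min(combinations(clusters, 2), key=gap)
        clusters = [c for c in clusters if c is not first and c is not second]
        clusters.append(((first[0], second[0]), first[1] + second[1]))
    return clusters[0][0]


# Hypothetical traces: one line per executed step, repeated so that zlib
# has enough material to work with.
TRACES = {
    "equilateral": b"enter classify\nsides valid\na == b\nb == c\nreturn equilateral\n" * 20,
    "isosceles": b"enter classify\nsides valid\na == b\nb != c\nreturn isosceles\n" * 20,
    "invalid": b"enter classify\nsides invalid\nraise ArgumentError\n" * 20,
}
```

Calling `distance_tree(sorted(TRACES), ncd_matrix(TRACES))` returns nested tuples whose leaves are the test names; which tests end up as siblings depends on the compressor and on how much detail the traces carry.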

Page 15: Information  theory concepts in software engineering

[4]

Page 16: Information  theory concepts in software engineering

[4]

Page 17: Information  theory concepts in software engineering
Page 18: Information  theory concepts in software engineering

RESULTS

• Humans and NCD classified in the same way (rooted non-binary trees)

• NCD:
– Args permutations grouped together
– float case close to int
– Division between valid and non-valid

Page 19: Information  theory concepts in software engineering

CONCLUSIONS
• NCD can cluster tests on cognitive similarities
• Differences we see are mainly explained by “white-boxness” (traces include implementation details)
• Input data only is not sufficient
• NCD calculations are costly
• Could be used as a way to smooth the search space?

Page 20: Information  theory concepts in software engineering

WHAT WILL BE DONE?

• If we can measure distance, basically any distance, then why not measure:
– Scientific real world propagation
– Quality of alternative information sources
– Quality of individual engineers…
– Clustering trouble reports, using the CH or Silhouette index (and then do RCA on clusters instead of individual reports to get indications regarding fault modules)
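To illustrate the last item, the silhouette index can be computed directly from a precomputed distance matrix, such as one filled with NCD values between trouble reports. The sketch below covers only the silhouette index, not CH, and assumes `dist` is a square list of lists and `labels` holds one cluster id per report; both names are illustrative.

```python
def silhouette(dist, labels):
    """Mean silhouette width for a clustering with at least two clusters.

    dist[i][j] is the distance between items i and j; labels[i] is the
    cluster id of item i.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:  # an item alone in its cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n
```

Scores close to 1 indicate compact, well-separated clusters; scores below 0 suggest that many items sit closer to another cluster than to their own. Comparing the score across candidate clusterings is one way to choose how many clusters to hand over to RCA.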

Page 21: Information  theory concepts in software engineering

OTHER TODOs

• Statistical tests for cumulative voting
• Semi-automated systematic literature review via abstract clustering

http://www.torkar.se

Page 22: Information  theory concepts in software engineering

NODE DESCRIPTIONS

• For p. 12:
– XY_A1_A2_A3
– X = S/L: short/long integer arguments
– X = F: Float arguments
– Y = E: Equilateral triangle
– Y = S: Scalene triangle
– Y = I: Isosceles triangle
– Y = X: Invalid triangle
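The naming scheme above can be decoded mechanically. The sketch below uses only the letter codes listed on the slide; the slide does not say what A1 to A3 contain, so they are passed through as opaque strings, and a label such as "SS_3_4_5" is a hypothetical example rather than one taken from the figures.

```python
ARGUMENT_KINDS = {
    "S": "short integer arguments",
    "L": "long integer arguments",
    "F": "float arguments",
}
TRIANGLE_KINDS = {
    "E": "equilateral",
    "S": "scalene",
    "I": "isosceles",
    "X": "invalid",
}


def decode_node(label):
    """Split an XY_A1_A2_A3 node label into its documented parts."""
    head, *fields = label.split("_")
    if len(head) != 2 or len(fields) != 3:
        raise ValueError(f"expected XY_A1_A2_A3, got {label!r}")
    x, y = head
    return {
        "arguments": ARGUMENT_KINDS[x],  # KeyError on an unlisted letter
        "triangle": TRIANGLE_KINDS[y],
        "fields": fields,                # A1..A3, meaning not given on the slide
    }
```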

Page 23: Information  theory concepts in software engineering

References

[1] M. Li, X. Chen, X. Li, B. Ma and P.M.B. Vitányi, “The similarity metric,” in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, Maryland, January 12-14, 2003), Society for Industrial and Applied Mathematics, Philadelphia, PA, pp. 863-872, 2003.

[2] R.L. Cilibrasi and P.M.B. Vitányi, “The Google similarity distance,” IEEE Transactions on Knowledge and Data Engineering, pp. 370-383, March 2007.

[3] P.M.B. Vitányi and M. Li, “Minimum description length induction, Bayesianism, and Kolmogorov complexity,” IEEE Transactions on Information Theory, vol. 46, pp. 446-464, 2000.

[4] P.M.B. Vitányi, F.J. Balbach, R.L. Cilibrasi and M. Li, “Normalized information distance,” in Information Theory and Statistical Learning, Eds. F. Emmert-Streib and M. Dehmer, Springer-Verlag, New York, pp. 45-82, 2008.

Page 24: Information  theory concepts in software engineering

Questions?