correlation analysis in automated - fosdem · 2020. 1. 29. · correlation analysis in automated...

15
Correlation analysis in automated testing FOSDEM 2020 Łukasz Wcisło 1 / 15

Upload: others

Post on 12-Sep-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Correlation analysis in automatedtesting

FOSDEM 2020

Łukasz Wcisło

1 / 15

Page 2: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

IntroductionPurposeFunction definition & deviationsCovariance matrixPearson correlation coefficientCorrelation MatrixUse-case

Agenda

2 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 3: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Science may be described as the art of systematic over-simplification — the art of discerning what we may with

advantage omit.

Karl Popper

Introduction

3 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 4: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

SimplicityTime savingLogicElegance

Purpose

4 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 5: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Test result as a Boolean function, a relation between a release version and aresult of a test.

Red - FAIL

Green - PASS

Function definition

5 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 6: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Instead of using expected value, we can use the probability.

Function deviations

6 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 7: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Where

is a variance of variable X, and

is a covariance between two standardized random variables.

(In our case - between two tests)

Covariance matrix

7 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 8: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

We can extract meaningful tests for better performance. Diagonal containsvariance of each test, covariance matrix is symmetric. Also, every covariancematrix is positive semi-definite.

Covariance matrix 2

8 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 9: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

What brings us to Pearson correlation coefficient.

It is a covariance of two variables divided by the product of their standarddeviations:

Pearson correlation coefficient

9 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 10: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Where correlation is normalized and always stays between -1 and 1.

Correlation Matrix

10 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 11: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Source

Actual use-case

11 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 12: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Mean of x, of y, variance of x, of y, correlation between x and y, linear regressionand coefficient of determination of the linear regression are the same for eachdata set.

Anscombe's quartet

12 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 13: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

1. A. Buda and A.Jarynowski (2010) Life-time of correlations and itsapplications vol.1, Wydawnictwo Niezależne: 5–21, December 2010, ISBN978-83-915272-9-0

2. W.J. Krzanowski: Principles of Multivariate Analysis. Nowy Jork: OxfordUniversity Press, 2003, seria: Oxford Statistical Science. ISBN 0-19-850708-9.

3. Cox, D.R., Hinkley, D.V. (1974) Theoretical Statistics, Chapman & Hall(Appendix 3) ISBN 0-412-12420-3

4. Anscombe, F. J. (1973). "Graphs in Statistical Analysis". AmericanStatistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966

Bibliography

13 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 14: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Q & A

14 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło

Page 15: Correlation analysis in automated - FOSDEM · 2020. 1. 29. · Correlation analysis in automated testing | Łukasz Wcisło. Q & A 14 / 15 FOSDEM 2020 Correlation analysis in automated

Thank you for your attention

"There are three kinds of lies: lies, damned lies, and statistics."

Benjamin Disraeli

15 / 15FOSDEM 2020

Correlation analysis in automated testing | Łukasz Wcisło