MySQL performance metrics you can't measure: using regression instead


Upload: vividcortex

Post on 19-Jul-2015


KNOWING THE UNKNOWABLE

TO KNOW THAT WE KNOW WHAT WE KNOW, AND TO KNOW THAT WE DO NOT KNOW WHAT WE DO NOT KNOW, THAT IS TRUE KNOWLEDGE.

― NICOLAUS COPERNICUS

CERTAIN INVENTIONS DISCLOSED IN THIS PRESENTATION MAY BE CLAIMED WITHIN PATENTS OWNED OR PATENT APPLICATIONS FILED BY VIVIDCORTEX, INC.

LOGISTICS

SLIDES AND RECORDING WILL BE EMAILED TO YOU AFTER THE WEBINAR

TWEET QUESTIONS TO @VIVIDCORTEX AT ANY TIME (AND FOLLOW US!)

PLEASE OBSERVE THE LIGHTED EXIT SIGNS AND TURN OFF ALL ELECTRONIC DEVICES ;-)

BARON SCHWARTZ

@XAPRB ON TWITTER [email protected] LINKEDIN.COM/IN/XAPRB

[Book cover: High Performance MySQL, 3rd Edition, Covers Version 5.5. Optimization, Backups, Replication, and more. Baron Schwartz, Peter Zaitsev & Vadim Tkachenko]

MOTIVATION

WHY NOT MEASURE?

NOT INSTRUMENTED

LESS DATA

LESS OVERHEAD

MORE FLEXIBILITY

MORE ACCURATE

MEASURING QUERIES

QUERY CLASSES

PROCESS STATISTICS

STANDARD TECHNIQUES

LINEAR REGRESSION ANALYSIS

[Plot axes: CPU TIME and QUERY TIME]

MINIMIZING SQUARED ERRORS
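To make the idea on this slide concrete, here is a minimal sketch of a one-variable least-squares fit: it picks the slope and intercept that minimize the sum of squared vertical distances between the line and the observations. The function names and the sample numbers are hypothetical and are not taken from the presentation.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared residuals for a given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: query time (x) and server CPU time (y) per interval.
query_time = [1.0, 2.0, 3.0, 4.0, 5.0]
cpu_time = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(query_time, cpu_time)
best_sse = sum_squared_errors(query_time, cpu_time, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} sse={best_sse:.3f}")
```

Any other line through the same points, for example one with a slightly larger slope, produces a larger sum of squared errors than the fitted line.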

MULTIPLE REGRESSION
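With several x-variables (for example, one per query class), ordinary least squares can be solved through the normal equations. The sketch below is an illustration in plain Python, not the presenter's code; all names and numbers are hypothetical. Note that the matrix it builds has one row and one column per x-variable, so its size grows quadratically with the number of variables.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; x-vars may be perfectly correlated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [intercept, b1, ..., bk] for y = intercept + sum(b_j * x_j)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: times for two query classes per interval, and CPU time
# constructed as 5 + 2 * x1 + 0.5 * x2 so the expected answer is known.
rows = [(1, 10), (2, 8), (3, 15), (4, 11), (5, 20)]
cpu = [12.0, 13.0, 18.5, 18.5, 25.0]
coefficients = fit_multiple(rows, cpu)
print([round(c, 6) for c in coefficients])
```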

VARIATIONS

CONSTRAINED REGRESSION

SMOOTHING AND SAMPLING

STEP REGRESSION

LOCAL REGRESSION

LOGISTIC REGRESSION

DECISION TREES AND RANDOM FORESTS

BAYESIAN REGRESSION; MLE

ENSEMBLES

MACHINE LEARNING

WAVELET DECOMPOSITION AND FFT

COMMERCIAL SOFTWARE

PROBLEMS

TOO COMPLEX; TOO GENERAL

TOO SLOW AND COSTLY; O(N²) IN X-VARS

PARTIAL RESULTS

FOOLED BY CORRELATED X-VARS

[Figure: three time-series panels plotted against sample index (0 to 700): query q.6368bf5907564a9f time, query e.785f8ac3c1ea1c93 time, and server CPU time]

QUERY TIMES / SERVER CPU TIME

[Figure: two scatter plots of server CPU time against query time, one for query q.6368bf5907564a9f and one for query e.785f8ac3c1ea1c93]

QUERIES VS CPU

ORDINARY MULTIPLE LEAST-SQUARES REGRESSION

REQUIREMENTS

MEMORY & CPU EFFICIENT FOR LARGE DATASETS

FULL RESULTS

NO PRECOMPUTATION

REASONABLE ACCURACY

SIMPLE & PHYSICALLY REALISTIC

INSIGHTS

NON-NEGATIVE SLOPES

IDENTICAL DIMENSIONS

INDEPENDENCE OF X-VARS

ALL VARS SIGNIFICANT

VARS ROUGHLY SIMILAR

ION CANNONS NOT NEEDED

WEIGHTED LINEAR REGRESSION
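The slides do not spell out the algorithm, so the following is only one plausible reading, suggested by the plots of "allocated server CPU time" per query: in each sample, split the server CPU time among the query classes in proportion to each class's share of total query time, then fit an ordinary one-variable line per class against its allocated share. The white paper listed under REFERENCES is the authority on the actual method. Every function name, class name, and number below is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary one-variable least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def allocate(samples, totals):
    """Split each sample's total across classes in proportion to their x values.

    samples: list of dicts mapping class name -> query time in that sample
    totals:  list of server CPU time per sample
    Returns a dict mapping class name -> list of allocated CPU times.
    """
    names = sorted(samples[0])
    allocated = {name: [] for name in names}
    for sample, total in zip(samples, totals):
        weight_sum = sum(sample[name] for name in names)
        for name in names:
            share = sample[name] / weight_sum if weight_sum else 0.0
            allocated[name].append(total * share)
    return allocated


def weighted_fit(samples, totals):
    """Fit one line per class against its allocated share of the total."""
    allocated = allocate(samples, totals)
    return {
        name: fit_line([s[name] for s in samples], ys)
        for name, ys in allocated.items()
    }


# Hypothetical data: CPU time is exactly 2x the summed query time, so each
# class should come out with slope 2 and intercept 0.
samples = [
    {"q_a": 1.0, "q_b": 3.0},
    {"q_a": 2.0, "q_b": 1.0},
    {"q_a": 4.0, "q_b": 2.0},
    {"q_a": 3.0, "q_b": 5.0},
]
totals = [8.0, 6.0, 12.0, 16.0]
fits = weighted_fit(samples, totals)
for name, (slope, intercept) in fits.items():
    print(name, round(slope, 6), round(intercept, 6))
```

Under this reading, the work is one pass over the data plus one simple fit per class, so cost grows linearly with the number of x-variables. The proportional split also only makes sense because both sides are measured in the same units (time), which fits the "IDENTICAL DIMENSIONS" insight above.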

[Chart, repeated over four slides: series X-V, Y-V, and Z-V on an axis from 0 to 1000]

[Figure: scatter plots of server CPU time against query time for queries q.6368bf5907564a9f and e.785f8ac3c1ea1c93, followed by scatter plots of allocated server CPU time against query time for the same two queries]

RESULTS

METRICS OF QUALITY

R²

STANDARD ERROR; T-STATISTIC

F-STATISTIC

MAPE

RESIDUAL PLOTS
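Two of the metrics listed above are easy to compute by hand. The sketch below takes R² as 1 minus the ratio of residual to total sum of squares, and MAPE as the mean of absolute errors expressed as a percentage of the actual value. The slides do not say which R² definition was used, so that choice is an assumption, and the numbers are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted CPU times.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 300.0, 420.0]

print(f"R^2  = {r_squared(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.1f}%")
```

A perfect predictor gives R² = 1.0 and MAPE = 0%.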

OUR APPROACH

DESCRIPTIVE STATS

VISUALIZATION

PREDICTION AND SCORING

SUBSETTING

DESCRIPTIVE STATS

QUERY CLASS         SAMPLES  R     SLOPE  T-VALUE (SLOPE)  INTERCEPT  T-VALUE (INTERCEPT)
Q.6368BF5907564A9F  719      0.98  3.65   0.0054           0.000083   0.98
E.785F8AC3C1EA1C93  711      0.98  3.09   0.0053           993.4      2.12

[Figure: predicted CPU time plotted against actual CPU time, one panel each for sample 001 and sample 002]

PREDICTIONS VERSUS ACTUAL MEASUREMENTS

*PERFECT ACCURACY WOULD BE SLOPE = 1.0, R² = 1.0, AND MAPE = 0%

SLOPE 0.97, R² 0.96, MAPE 5.9%

SLOPE 1.00, R² 0.91, MAPE 5.9%

[Figure: percentage error plotted against sample index (0 to 700), one panel each for sample 001 and sample 002]

RESIDUAL PERCENTAGES

[Figure: histograms of percentage error (x-axis) against frequency (y-axis), one panel each for sample 001 and sample 002]

RESIDUAL HISTOGRAMS
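The residual views above can be reproduced with very little code. This sketch computes a signed percentage error for each observation and counts the errors in fixed-width bins. The sign convention (predicted minus actual), the bin width of 10 points, and the data are all assumptions made for illustration.

```python
import math
from collections import Counter


def percent_errors(actual, predicted):
    """Signed error of each prediction as a percentage of the actual value."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


def histogram(values, bin_width=10.0):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical actual and predicted CPU times.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [95.0, 230.0, 404.0, 650.0]

errors = percent_errors(actual, predicted)
bins = histogram(errors)
for lower_edge, count in bins.items():
    print(f"[{lower_edge:6.1f}, {lower_edge + 10.0:6.1f}) {'#' * count}")
```

A histogram centered near zero with a short tail suggests the fit has no large systematic bias. A long one-sided tail points to samples the model handles poorly.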

SURPRISE! QUERIES CAN USE MORE CPU THAN WALL-CLOCK TIME

GOTCHAS

TRICKY QUERIES

NO RELATIONSHIP

UNMEASURED EFFECTS

WRONG MEASUREMENTS

CHAOTIC SYSTEMS

(AGAIN) WHY NOT MEASURE? CAN’T THE DATABASE MEASURE THE CPU TIME MORE ACCURATELY?

STATING THE OBVIOUS?

[Figure: scatter plot of server CPU time against total query time]

LITMUS TEST

QUESTIONS? [email protected] • @XAPRB

REFERENCES

DETAILED TECHNICAL WHITE PAPER: HTTPS://VIVIDCORTEX.COM/WHITE-PAPERS/WLR/

SAMPLE DATA AND CODE: HTTPS://GITHUB.COM/VIVIDCORTEX/WLR