production-run software failure diagnosis via hardware performance...

33
Production-Run Software Failure Diagnosis via Hardware Performance Counters Joy Arulraj, Po-Chun Chang, Guoliang Jin and Shan Lu

Upload: others

Post on 04-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Production-Run Software Failure Diagnosis

via Hardware Performance Counters

Joy Arulraj, Po-Chun Chang, Guoliang Jin and Shan Lu

Page 2: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Motivation

Software inevitably fails on production machines

These failures are widespread and expensive

• Internet Explorer zero-day bug [2013]

• Toyota Prius software glitch [2010]

2

These failures need to be diagnosed before they can be fixed !

Page 3: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Production-run failure diagnosis

Diagnosing failures on client machines

• Limited info from each client machine

• One bug can affect many clients

• Need to figure out root cause & patch quickly

3

Page 4: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Executive Summary

4

Use existing hardware support to diagnose widespread production-run failures with low

monitoring overhead

Page 5: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnosing a real world bug

Sequential bug in print_tokens

5

int is_token_end(char ch){ if(ch == ‘\n’) return (TRUE); else if(ch == ‘ ’) // Bug: should return FALSE return (TRUE); else return (FALSE); }

Input: Abc Def

Expected Output:

{Abc}, {Def}

Actual Output:

{Abc Def}

Page 6: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnosing concurrency bugs

Concurrency bug in Apache server

6

decrement_refcnt(...) { atomic_dec( &obj->refcnt); if(!obj->refcnt) cleanup(obj); }

decrement_refcnt(...) { atomic_dec( &obj->refcnt); if(!obj->refcnt) cleanup(obj); }

2 --> 1

THREAD 1 THREAD 2

1 --> 0

0 0

Page 7: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Requirements for failure diagnosis

Performance

• Low runtime overhead for monitoring apps

• Suitable for production-run deployment

Diagnostic Capability

• Ability to accurately explain failures

• Diagnose wide variety of bugs

7

Page 8: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Existing work

8

Approach Performance Diagnostic Capability

FAILURE REPLAY

High runtime overhead OR Non-existent hardware support

Manually locate root cause

BUG DETECTION

Many false positives

Page 9: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Cooperative Bug Isolation

Cooperatively diagnose production-run failures

• Targets widely deployed software

• Each client machine sends back information

Uses sampling

• Collects only a subset of information

• Reduces monitoring overhead

• Fits well with cooperative debugging approach

9

Page 10: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Cooperative Bug Isolation

Approach Performance Diagnostic Capability

CBI / CCI >100% overhead for many apps (CCI)

Accurate & Automatic

10

Program Source

Predicates & J/L

Failure Predictors

TRUE in most FAILURE runs, FALSE in most SUCCESS runs.

Statistical Debugging

Code size increased

>10X

Compiler

Predicates

Sampling

Page 11: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Performance-counter based Bug Isolation

Requires no non-existent hardware support

Requires no software instrumentation

11

Program Binary

Predicates & J/L

Failure Predictors

Statistical Debugging

Hardware

Predicates

Sampling

Code size unchanged.

Hardware performance

counters

Page 12: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

PBI Contributions

Suitable for production-run deployment

Can diagnose a wide variety of failures

Design addresses privacy concerns

12

Approach Performance Diagnostic Capability

PBI <2% overhead for most apps evaluated

Accurate & Automatic

Page 13: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Outline

Motivation

Overview

PBI

• Hardware performance counters

• Predicate design

• Sampling design

Evaluation

Conclusion

13

Page 14: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Hardware Performance Counters

Registers monitor hardware performance events

• 1—8 registers per core

• Each register can contain an event count

• Large collection of hardware events

• Instructions retired, L1 cache misses, etc.

14

Page 15: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Accessing performance counters INTERRUPT-BASED POLLING-BASED

15

HW (PMU)

Kernel

User

Config

Config

Interrupt

Instruction

HW (PMU)

User

Special Config

Count

How do we monitor which event occurs at which instruction using performance counters ?

Page 16: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Predicate evaluation schemes

16

Natural fit for sampling Requires instrumentation

More precise Imprecise due to OO execution

INTERRUPT-BASED POLLING-BASED

Counter overflow

Kernel

Config Interrupt

Interrupt at Instruction C => Event occurred at C

old = readCounter() < Instruction C > new = readCounter() if(new > old) Event occurred at C

16

Page 17: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Concurrency bug failures

L1 data cache cache-coherence events

17

How do we use performance counters to diagnose concurrency bug failures ?

Local read Local write Remote read Remote write

Modified Exclusive Shared Invalid

Page 18: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Atomicity Violation Example

18

THD 1 on CORE 1

decrement_refcnt(...) { apr_atomic_dec( &obj->refcnt);

C:if(!obj->refcnt) cleanup_cache(obj); }

CORE 1 – THD 1

Local Write

Modified

Page 19: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Atomicity Violation Example

19

decrement_refcnt(...) { apr_atomic_dec( &obj->refcnt);

C:if(!obj->refcnt) cleanup_cache(obj); }

decrement_refcnt(...) { apr_atomic_dec( &obj->refcnt); if(!obj->refcnt) cleanup_cache(obj); }

CORE 1 – THD 1 CORE 2 - THD 2

Remote Write

Invalid

Page 20: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Atomicity Violation Bugs

20

THREAD INTERLEAVING FAILURE PREDICTOR

WWR Interleaving INVALID

RWR Interleaving INVALID

RWW Interleaving INVALID

WRW Interleaving SHARED

Page 21: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Order violation

21

print(‚End‛,Gend)

C:print(‚Run‛,Gend-init)

Gend = time()

CORE 1 – MASTER THD CORE 2 – SLAVE THD

Shared

Remote Write

Local Read

Page 22: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Order violation

22

print(‚End‛,Gend)

C:print(‚Run‛,Gend-init)

Gend = time()

Exclusive

CORE 1 – MASTER THD CORE 2 – SLAVE THD

Local Read

Page 23: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

PBI Predicate Sampling

We use Perf (provided by Linux kernel 2.6.31+)

23

perf record –event=<code> -c <sampling_rate> <program monitored>

Log Id

APP Core Performance Event

Instruction Function

1 Apache

2 0x140 (Invalid)

401c3b

decrement_refcnt

Page 24: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

PBI vs. CBI/CCI (Qualitative) Performance

Diagnostic capability

• Discontinuous monitoring (CCI/CBI)

• Continuous monitoring (PBI)

CCI

Are other threads sampling?

Sample in this region?

Are other threads sampling?

24

Sample in this region?

PBI CBI

Page 25: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Outline

Motivation

Overview

PBI

• Hardware performance counters

• Predicate design

• Sampling design

Evaluation

Conclusion

25

Page 26: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Methodology

23 real-world failures • In open-source server, client, utility programs

• All CCI benchmarks evaluated for comparison

Each app executed 1000 runs (400-600 failure runs)

• Success inputs from standard test suites

• Failure inputs from bug reports

• Emulate production-run scenarios

Same sampling settings for all apps

26

Page 27: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Evaluation

27

Program Diagnostic Capability PBI CCI-P CCI-H

Apache1 Apache2 Cherokee X FFT X LU X Mozilla-JS1 X Mozilla-JS2 Mozilla-JS3 MySQL1 - - MySQL2 - - PBZIP2

Page 28: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnostic Capability

28

Program Diagnostic Capability PBI CCI-P CCI-H

Apache1 (Invalid) Apache2 (Invalid) Cherokee (Invalid) X FFT (Exclusive) X LU (Exclusive) X Mozilla-JS1 (Invalid) X Mozilla-JS2 (Invalid) Mozilla-JS3 (Invalid) MySQL1 (Invalid) - - MySQL2 (Shared) - - PBZIP2 (Invalid)

Page 29: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnostic Capability

29

Program Diagnostic Capability PBI CCI-P CCI-H

Apache1 Apache2 Cherokee X FFT X LU X Mozilla-JS1 X Mozilla-JS2 Mozilla-JS3 MySQL1 - - MySQL2 - - PBZIP2

Page 30: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnostic Capability

30

Program Diagnostic Capability PBI CCI-P CCI-H

Apache1 Apache2 Cherokee X FFT X LU X Mozilla-JS1 X Mozilla-JS2 Mozilla-JS3 MySQL1 - - MySQL2 - - PBZIP2

Page 31: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnostic Overhead

Program Diagnostic Overhead PBI CCI-P CCI-H

Apache1 0.40% 1.90% 1.20% Apache2 0.40% 0.40% 0.10% Cherokee 0.50% 0.00% 0.00% FFT 1.00% 121% 118% LU 0.80% 285% 119% Mozilla-JS1 1.50% 800% 418% Mozilla-JS2 1.20% 432% 229% Mozilla-JS3 0.60% 969% 837% MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

31

Page 32: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Diagnostic Overhead

Program Diagnostic Overhead PBI CCI-P CCI-H

Apache1 0.40% 1.90% 1.20% Apache2 0.40% 0.40% 0.10% Cherokee 0.50% 0.00% 0.00% FFT 1.00% 121% 118% LU 0.80% 285% 119% Mozilla-JS1 1.50% 800% 418% Mozilla-JS2 1.20% 432% 229% Mozilla-JS3 0.60% 969% 837% MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

32

Page 33: Production-Run Software Failure Diagnosis via Hardware Performance Countersjarulraj/talks/2013.pbi... · 2018. 9. 25. · MySQL1 3.80% - - MySQL2 1.20% - - PBZIP2 8.40% 1.40% 3.00%

Conclusion

Low monitoring overhead

Good diagnostic capability

No changes in apps

Novel use of performance counters

33

PBI will help developers diagnose production-run software failures with low overhead

Thanks !