Department of Computer Science

Mining Performance Data from Sampled Event Traces

Bret Olszewski
IBM Corporation – Austin, TX

Ricardo Portillo, Diana Villa, Patricia J. Teller
The University of Texas at El Paso
Department of Computer Science

Outline

• Motivation
• Data Collection Environment
  • Workload & Platform
  • Monitored Events
• Data Analysis & Results
• Conclusions and Future Work

Motivation

Capturing Event Traces
• System Simulation: Overhead penalty is too high
• Real-time Metrics: Capture every event during actual execution

Problem
• Growing size of full event traces is becoming unmanageable

Goal
• Use sampled event traces to analyze execution behavior

Data Collection Environment

Workload
• TPC-C benchmark
  • Commercial OLTP

Platform
• IBM eServer pSeries 690 architecture (p690)
  • 8- and 32-processor configurations

Platform: 8-processor p690 configuration

[Figure: block diagram of two multi-chip modules (MCM 0 and MCM 1), showing processor cores (labeled P or X), L2 caches, and L3 caches.]

Platform: 32-processor p690 configuration

[Figure: block diagram of four multi-chip modules (MCM 0 through MCM 3), each showing processor cores (P), L2 caches, and an L3 cache.]

Monitored Events

L2-cache data-load misses
• L2.5
• L2.75
• L3
• L3.5
• MEM
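The event labels follow POWER4 performance-counter naming, where the fractional level indicates how far from the requesting chip an L2 miss was satisfied. The sketch below records that reading as a small lookup table. The descriptions are an interpretation based on that naming convention, not text from the talk, and `ResolutionLevel`, `EVENT_LABELS`, and `describe` are hypothetical names introduced for illustration.

```python
from enum import Enum


class ResolutionLevel(Enum):
    """Where an L2-cache data-load miss was satisfied (illustrative names)."""
    L2_5 = "L2 cache of another chip on the same MCM"
    L2_75 = "L2 cache on a different MCM"
    L3 = "L3 cache attached to the local MCM"
    L3_5 = "L3 cache attached to a different MCM"
    MEM = "main memory"


# Map the event labels used on the slide to the enum members.
EVENT_LABELS = {
    "L2.5": ResolutionLevel.L2_5,
    "L2.75": ResolutionLevel.L2_75,
    "L3": ResolutionLevel.L3,
    "L3.5": ResolutionLevel.L3_5,
    "MEM": ResolutionLevel.MEM,
}


def describe(label: str) -> str:
    """Return a one-line description for a monitored event label."""
    return EVENT_LABELS[label].value


for label in EVENT_LABELS:
    print(f"{label:>5}: {describe(label)}")
```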

Where is L2 Miss Resolved?

[Figure, built up over five slides: the 8-processor p690 diagram (MCM 0 and MCM 1, with their processors, L2 caches, and L3 caches) is annotated step by step with the L2.5 Event, L2.75 Event, L3 Event, and L3.5 Event.]

Data Collection

Performance Monitoring Unit (PMU)
• Special-purpose registers
• Programming interface
  • Kernel extension

eprof
• PMU configuration
• Event-based sampling
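Event-based sampling generally means recording one occurrence out of every N occurrences of a hardware event, typically by letting a counter overflow after N events. The slides do not describe eprof's internals, so the following is only a generic simulation of the idea; `sample_every_nth` and its `period` parameter are hypothetical names.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every_nth(events: Iterable[T], period: int) -> Iterator[T]:
    """Yield one event out of every `period` occurrences."""
    if period < 1:
        raise ValueError("period must be at least 1")
    countdown = period
    for event in events:
        countdown -= 1
        if countdown == 0:
            # Simulated counter overflow: record this occurrence, then re-arm.
            yield event
            countdown = period


# Ten consecutive misses, sampled with a period of four.
misses = range(1, 11)
sampled = list(sample_every_nth(misses, period=4))
print(sampled)  # [4, 8]
```

A larger period lowers the trace size and the monitoring overhead at the cost of coarser coverage, which is the trade-off the talk's motivation points at.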

Sampled Event Trace

10-minute observation interval
• Record periodic occurrences of an event
• 100 events/sec/CPU

Event record

PID     TID     Timestamp    Effective Instruction Address  Effective Data Address
372872  184469  0.328104637  000000000000A8C4               0000000000218880

Average number of samples collected/event
• 238,448 for 8-processor data
• 212,396 for 32-processor data
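A record in this layout can be parsed with a few lines of code. The sketch below assumes whitespace-separated fields in the order shown on the slide, decimal PID and TID, and hexadecimal addresses; the timestamp's unit is not stated in the slides, so it is kept as a plain float. `EventRecord` and `parse_record` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    pid: int           # process ID (assumed decimal)
    tid: int           # thread ID (assumed decimal)
    timestamp: float   # unit not stated in the slides
    instr_addr: int    # effective instruction address
    data_addr: int     # effective data address


def parse_record(line: str) -> EventRecord:
    """Parse one whitespace-separated sampled-event record."""
    pid, tid, ts, iaddr, daddr = line.split()
    return EventRecord(int(pid), int(tid), float(ts),
                       int(iaddr, 16), int(daddr, 16))


rec = parse_record(
    "372872 184469 0.328104637 000000000000A8C4 0000000000218880"
)
print(rec.pid, rec.tid, hex(rec.instr_addr), hex(rec.data_addr))
```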

Analysis

• Memory Hotspots

• Individual Address Region

• Process Migration

Memory Hotspots

• L3 and memory are the most active memory levels
• Counted total number of L3 hits
• Counted number of L3 hits per address region
• Counted number of unique cache lines referenced per region
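The counting steps listed for this analysis can be sketched as follows. The region boundaries in `REGIONS` are invented for illustration (real boundaries depend on the operating system and the workload), and the 128-byte line size is an assumption used to group addresses into cache lines; neither comes from the talk.

```python
from collections import Counter, defaultdict

CACHE_LINE_BYTES = 128  # assumption: line size used to group addresses

# Hypothetical (name, start, end) ranges; not the boundaries used in the study.
REGIONS = [
    ("Text", 0x1000_0000, 0x2000_0000),
    ("BufferPool", 0x7000_0000, 0x8000_0000),
]


def region_of(addr: int) -> str:
    """Return the name of the region containing addr, or 'Other'."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return "Other"


def hotspot_stats(data_addrs):
    """Map region -> (fraction of all hits, number of unique cache lines)."""
    hits = Counter()
    lines = defaultdict(set)
    for addr in data_addrs:
        region = region_of(addr)
        hits[region] += 1
        lines[region].add(addr // CACHE_LINE_BYTES)
    total = sum(hits.values())
    return {r: (hits[r] / total, len(lines[r])) for r in hits}


# Effective data addresses taken from sampled L3 events (made-up values).
stats = hotspot_stats([0x7000_0000, 0x7000_0010, 0x7000_0080, 0x1000_0000])
print(stats)
```

A region with a large share of hits but few unique cache lines is a hotspot in the sense used here: a small amount of data accounts for many of the sampled references.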

Memory Hotspots

[Figure: Distribution of L3 Data Load Hits. Bar chart with one group per address region (Kernel, Text, Data/BSS/Heap, BufferPool, Stack, Ublock & KernelStack, M_BUF, KERN_HEAP); the other axis is the fraction of data loads, from 0 to 0.5; series: Unique cache line, Hit %.]

Individual Address Region

• We can look at an address region in more detail

• Looked at Buffer Pool region

• Counted number of references per memory level

• Counted number of unique cache lines referenced per memory level
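For a single region, the same counts can be broken down by the memory level that resolved each miss. The sketch below assumes one list of effective data addresses per monitored event and a caller-supplied address range for the region; `level_profile`, the sample addresses, and the 128-byte line size are illustrative assumptions, not values from the study.

```python
def level_profile(traces, start, end, line_bytes=128):
    """Map event name -> (references, unique cache lines) inside [start, end).

    traces: dict of event name -> iterable of effective data addresses.
    """
    profile = {}
    for level, addrs in traces.items():
        in_region = [a for a in addrs if start <= a < end]
        unique_lines = {a // line_bytes for a in in_region}
        profile[level] = (len(in_region), len(unique_lines))
    return profile


# Made-up addresses; the region is the hypothetical range 0x0 to 0x1000.
traces = {
    "L3": [0x100, 0x104, 0x100, 0x900],
    "MEM": [0x180, 0x5000],
}
profile = level_profile(traces, start=0x0, end=0x1000)
print(profile)
```

Comparing the two numbers per level shows whether a level's references are spread over many lines or concentrated on a few.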

Individual Address Region

[Figure: Distribution of Data Load Hits: BUFFER_POOL. Bar chart over event names L2, L2.5 MOD, L2.75 MOD, L3, L3.5, and MEM; value axis from 0 to 120,000; series: DataLoadHits, UniqueCacheLines.]

Process Migration

• Process migration from one chip to another can degrade performance when all or part of the process's working set must follow it via L2-cache misses

• Looked at 885 threads

• Counted number of migrations per thread

• Counted number of L2.5 hits per thread
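Counting migrations per thread from a sampled trace might look like the sketch below. It assumes each sample can be attributed to a CPU (the event record shown in the slides has no CPU field, so this would have to come from elsewhere, for example per-CPU trace buffers) and that CPUs are numbered consecutively with eight per MCM. Because the trace is sampled, the counts are a lower bound on the true number of migrations. `count_migrations` and `CPUS_PER_MCM` are hypothetical names.

```python
from collections import defaultdict

CPUS_PER_MCM = 8  # assumption: 32-way system, CPUs numbered consecutively per MCM


def count_migrations(samples):
    """Map tid -> (intra-MCM migrations, inter-MCM migrations).

    samples: iterable of (timestamp, tid, cpu) tuples in any order.
    """
    by_thread = defaultdict(list)
    for ts, tid, cpu in samples:
        by_thread[tid].append((ts, cpu))

    result = {}
    for tid, seen in by_thread.items():
        seen.sort()  # order each thread's samples by timestamp
        intra = inter = 0
        for (_, prev), (_, cur) in zip(seen, seen[1:]):
            if prev == cur:
                continue  # same CPU in consecutive samples: no migration seen
            if prev // CPUS_PER_MCM == cur // CPUS_PER_MCM:
                intra += 1
            else:
                inter += 1
        result[tid] = (intra, inter)
    return result


# Made-up samples: thread 7 moves CPU 0 -> 1 (same MCM), then 1 -> 9 (other MCM).
samples = [(0.1, 7, 0), (0.2, 7, 1), (0.3, 7, 1), (0.4, 7, 9), (0.15, 8, 3)]
migrations = count_migrations(samples)
print(migrations)
```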

Process Migration

[Figure: 32-Way L2.5 Hits vs. Intra-MCM Migrations. Horizontal axis: Intra-MCM Migrations, from 0 to 6,000; vertical axis: L2.5 Modified Hits, from 0 to 25,000.]
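One way to put a number on the relationship between per-thread migrations and L2.5 hits is a Pearson correlation coefficient. The slides do not say whether the authors computed one, so this is only a possible follow-up; the sample values are invented and `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-thread values, not data from the talk.
intra_mcm_migrations = [100, 900, 2500, 4000, 5500]
l25_modified_hits = [12000, 3000, 15000, 2000, 9000]
r = pearson(intra_mcm_migrations, l25_modified_hits)
print(f"r = {r:.3f}")
```

A coefficient near zero would be consistent with migrations not driving L2.5 hits, while a value near 1 would suggest the opposite.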

Conclusions

• Only a few addresses in the Buffer Pool region cause most of its L3 hits
• For the Buffer Pool, heavily referenced shared data is constantly resolved outside an MCM
• Process migration is not a source of performance degradation

Future Work

• Quantify representativeness of sampled event traces
• Suggest more ways to improve p690 application performance
• Study sampled event traces for other workloads
• In-depth study of process characterization

Thank You!