Presenter : Shih-Tung Huang Tsung-Cheng Lin Kuan-Fu Kuo 111/03/21 EICE team dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core Chi-Neng Wen, Shu-Hsuan Chou and Tien-Fu Chen National Chung-Cheng University, Chia-Yi, Taiwan e-mail: {wcn93, csh93, chen}@cs.ccu.edu.tw 2009 10th International Symposium on Pervasive Systems,


Page 1: Presenter : Shih-Tung Huang Tsung-Cheng Lin Kuan-Fu Kuo 2015/6/26 EICE team dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core

Presenters: Shih-Tung Huang, Tsung-Cheng Lin, Kuan-Fu Kuo

112/04/18

EICE team

dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core

Chi-Neng Wen, Shu-Hsuan Chou and Tien-Fu Chen
National Chung-Cheng University, Chia-Yi, Taiwan
e-mail: {wcn93, csh93, chen}@cs.ccu.edu.tw

2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks

Page 2: Abstract (1)

Traditional debug facilities are limited in meeting the debugging requirements of multicore parallel programming. Synchronization problems and bugs caused by race conditions are particularly difficult to detect with software debugging tools. This work presents a fast and feasible hardware-assisted solution for non-intrusive debugging on many-core systems. The key idea is to keep track of data accesses to shared memory areas and of their lock synchronization activities, using data structures held in the proposed debugging IP (dIP). A page-based shared variable cache keeps shared variables as long as possible, and an inexpensive pluggable off-chip RAM can eliminate false positives efficiently.

Page 3: Abstract (2)

To reduce debugging traffic blocking, this work provides a thread library that specifies shared memory and lock events and transmits them to the dIP through a small hardware co-processor (eXtend dIP, XdIP) in each core. The experimental results show how worst-case debugging traffic blocking grows as cores are added, and that adding tolerance buffers in the XdIP eases it efficiently. Moreover, real workloads (SPLASH-2, MPEG-4, and H.264) run under the dIP's non-intrusive race detection with only a 4.7%~12.2% slowdown on average. Finally, the hardware cost of the dIP stays low as the number of cores grows.

Page 4: What's the problem

Data race detection in multi-core systems

Software methods
  Cause the probe effect

Hardware methods
  Need a lot of memory (or hardware area) to log core behavior
  Cause false positives

This paper's proposed method
  Is not a software method
  Uses related work [3] to avoid the probe effect
  Uses centralized race detection, so the hardware area does not grow hugely as cores are added

Page 5: Related work

The probe effect was introduced in related work [1]

The lock-set algorithm of related work [4] is used for data race detection

Related work [3] separates the debugging data path from the usual data path to avoid the probe effect

[Diagram: race detection (multi-core) divides into software [5][6] and hardware [7][8][9] methods; this paper's method is shown together with the lock-set algorithm [4] and related work [3]]

Page 6: Proposed MPSoC framework

Every core has an XdIP
  The XdIP acts as a co-processor for each core
  The XdIP sends debug events to the dIP through the Debug I/F

The interconnection follows the standard of related work [3]
  The Data I/F is used for the usual data path
  The Debug I/F is used for the debug event path

Page 7: XdIP architecture

The architecture is quite simple
  A filter passes debug events (lock and memory access info) to a buffer in the packet & send stage, where they wait to be sent to the dIP
  The filter is configured by software settings

Events are monitored and transferred in each core
  When the buffer is full, the XdIP notifies the dIP to stall all cores for event transfer
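The filter, buffer, and stall behavior described on this slide can be modeled in a few lines of software. The sketch below is only an illustration under stated assumptions: the class `XdIPModel`, the event kinds, the address-range filter setting, and the `sent` list standing in for the Debug I/F are all hypothetical names, not part of the paper's hardware design.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical event kinds; the slide only says "lock and memory access info".
MONITORED_KINDS = {"lock", "unlock", "read", "write"}


@dataclass(frozen=True)
class DebugEvent:
    core: int
    kind: str      # "lock", "unlock", "read", "write", or an unmonitored kind
    address: int


class XdIPModel:
    """Toy software model of one core's XdIP: filter, buffer, stall request."""

    def __init__(self, core, capacity, shared_ranges):
        self.core = core
        self.capacity = capacity
        self.shared_ranges = shared_ranges  # assumed software filter setting
        self.buffer = deque()
        self.sent = []            # stands in for events delivered to the dIP
        self.stall_requests = 0   # times the dIP was asked to stall all cores

    def _passes_filter(self, event):
        if event.kind not in MONITORED_KINDS:
            return False
        return any(lo <= event.address < hi for lo, hi in self.shared_ranges)

    def drain(self):
        """Transfer every buffered event to the dIP."""
        self.sent.extend(self.buffer)
        self.buffer.clear()

    def observe(self, event):
        """Return True if the event was buffered, False if it was filtered out."""
        if not self._passes_filter(event):
            return False
        if len(self.buffer) >= self.capacity:
            # Buffer full: request a stall of all cores, then transfer events.
            self.stall_requests += 1
            self.drain()
        self.buffer.append(event)
        return True
```

In this model a larger `capacity` means fewer stall requests for the same event stream, which is the effect the tolerance buffers are meant to achieve.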

Page 8: Data race detection flow

First, the table manager accepts debug events from the XdIP and then maintains the shared variable cache, the lock-set table, and the core-status table

Second, the rule logic checks whether a data race has happened
  If it has: an alert is enabled to notify the exception handler to fix the detected race
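The two-step flow above can be sketched in software with a simplified lock-set check. This is not the dIP's actual rule logic, which the slides do not spell out: it is a minimal version of the general lock-set idea (intersect the locks held at each access to a variable and flag the variable once no common lock remains), and it omits the initialization states that full lock-set detectors use. The class `RaceDetectorSketch` and its fields are hypothetical.

```python
class RaceDetectorSketch:
    """Simplified lock-set refinement; an illustration, not the dIP rule logic."""

    def __init__(self):
        self.locks_held = {}   # core id -> set of locks currently held
        self.candidates = {}   # address -> locks held on every access so far
        self.accessors = {}    # address -> set of cores that accessed it
        self.alerts = []       # (core, address) pairs flagged as possible races

    def on_event(self, core, kind, target):
        held = self.locks_held.setdefault(core, set())
        if kind == "lock":
            held.add(target)
        elif kind == "unlock":
            held.discard(target)
        elif kind in ("read", "write"):
            # Step 1 (table manager): update the tables for this address.
            previous = self.candidates.get(target)
            current = set(held) if previous is None else previous & held
            self.candidates[target] = current
            cores = self.accessors.setdefault(target, set())
            cores.add(core)
            # Step 2 (rule logic): no common lock and more than one core.
            if not current and len(cores) > 1:
                self.alerts.append((core, target))


detector = RaceDetectorSketch()
for core in (1, 2):
    detector.on_event(core, "lock", "S1")
    detector.on_event(core, "write", 0x10)   # always protected by S1
    detector.on_event(core, "unlock", "S1")
detector.on_event(1, "write", 0x20)          # unprotected
detector.on_event(2, "write", 0x20)          # unprotected, second core
```

After this run, address `0x10` raises no alert because both cores hold `S1` when they write it, while the second unprotected write to `0x20` is flagged.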

Page 9: dIP architecture

1~5 correspond to the data race detection flow
6 is for ordering debug events (SqID)
7 is the external RAM used on a cache miss

Page 10: Three tables

The page-based variable table records the latest access state of each variable

The lock-key table records how many lock-sets and how many lock-keys are available

The core-status table records the state of each core (thread, lock set, SqID)

Fully associative
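As a rough picture of what these tables hold, the sketch below lays them out as plain Python data classes. Everything beyond the field names taken from the slide (thread, lock set, SqID, latest access state) is an assumption: the 4096-byte page size, the per-core SqID counter, the default of 8 free lock-keys, and the names `VariableState`, `CoreStatus`, and `DipTablesSketch` are hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size; the slides do not give one


@dataclass
class VariableState:
    core: int     # core that made the latest access
    access: str   # "read" or "write"
    sq_id: int    # sequence ID of that access


@dataclass
class CoreStatus:
    thread: int
    lock_set: frozenset = frozenset()
    sq_id: int = 0   # assumed here to be a per-core event counter


@dataclass
class DipTablesSketch:
    # Page-based variable table: page number -> {offset: VariableState}
    pages: dict = field(default_factory=dict)
    # Core-status table: core id -> CoreStatus
    core_status: dict = field(default_factory=dict)
    # Lock-key table reduced to a count of free keys (assumed size)
    free_lock_keys: int = 8

    def record_access(self, core, address, access):
        """Store the latest access state for the variable at `address`."""
        status = self.core_status[core]
        status.sq_id += 1
        page, offset = divmod(address, PAGE_SIZE)
        self.pages.setdefault(page, {})[offset] = VariableState(
            core, access, status.sq_id
        )
        return page, offset
```

Grouping variables by page means one table entry can cover many nearby shared variables, which matches the slide's description of a page-based table.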

Page 11: Overall proposed framework

Page 12: Allocation/de-allocation of lock-keys

Allocation
  Thread A executes W_lock S1, and the XdIP sends the event to the dIP
  The dIP allocates a lock-key to thread A, and thread A saves the lock-key number with S1

De-allocation
  Thread A executes W_unlock S1; at the same time, the lock-key is sent to the dIP to be de-allocated
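The allocation and de-allocation steps above can be illustrated with a small free-list allocator. This is a sketch under assumptions, not the dIP's lock-key table: the class `LockKeyTableSketch`, the fixed number of keys, and the choice to return `None` when no key is free are all hypothetical.

```python
class LockKeyTableSketch:
    """Toy lock-key table: hands out keys on W_lock, reclaims them on W_unlock."""

    def __init__(self, num_keys):
        self.free_keys = list(range(num_keys))
        self.owner = {}   # lock-key -> (thread, lock name)

    def allocate(self, thread, lock):
        """Handle a W_lock event; return a lock-key, or None if none is free."""
        if not self.free_keys:
            return None
        key = self.free_keys.pop(0)
        self.owner[key] = (thread, lock)
        return key

    def deallocate(self, key):
        """Handle a W_unlock event that carries the lock-key back to the dIP."""
        if key not in self.owner:
            raise KeyError(f"lock-key {key} is not allocated")
        del self.owner[key]
        self.free_keys.append(key)


table = LockKeyTableSketch(num_keys=2)
key_a = table.allocate("A", "S1")   # thread A executes W_lock S1
table.deallocate(key_a)             # thread A executes W_unlock S1
```

Because the thread stores the key number alongside S1, the unlock event can name the key directly and the table does not need to search for it.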

Page 13: Data race detection rule

[Figure: data race detection rule, illustrated with core1 and core2]

Page 14: Experiments

When an XdIP buffer is full, the dIP stalls all cores to stay non-intrusive.

Stalls reduce system performance; an experiment with the SPLASH-2 benchmarks shows the stall ratio

Solution: add buffers in the XdIP

Page 15: Experiments

Across four different benchmarks, the worst-case performance drop is 12.25%

Comparison with related work [9]