presenter : shih-tung huang tsung-cheng lin kuan-fu kuo 2015/6/26 eice team dip: a non-intrusive...
Presenters: Shih-Tung Huang
Tsung-Cheng Lin
Kuan-Fu Kuo
112/04/18
EICE team
dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core
Chi-Neng Wen, Shu-Hsuan Chou and Tien-Fu Chen
National Chung-Cheng University, Chia-Yi, Taiwan
e-mail: {wcn93, csh93, chen}@cs.ccu.edu.tw
2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks
Traditional debug facilities fall short of the debugging requirements of multicore parallel programming. Synchronization problems or bugs due to race conditions are particularly difficult to detect with software debugging tools. This work presents a fast and feasible hardware-assisted solution for non-intrusive many-core debugging. The key idea is to keep track of data accesses to shared memory areas, and of their lock synchronization activities, using the data structures of the proposed debugging IP (dIP). A page-based shared variable cache keeps shared variables as long as possible, and an inexpensive pluggable off-chip RAM can efficiently eliminate false positives.
2
Abstract (1)
To reduce debugging traffic blocking, this work provides a thread library that specifies shared memory/lock events and transmits those events to the dIP through a small hardware co-processor (eXtend dIP, or XdIP) attached to each core. The experimental results show how the worst-case debugging traffic blocking grows with the number of cores, and that adding tolerance buffers to the XdIP can efficiently ease it. Moreover, real workloads (SPLASH-2, MPEG-4, and H.264) run under the dIP's non-intrusive race detection with only a 4.7%~12.2% slowdown on average. Finally, the hardware cost of the dIP stays low as the number of cores grows.
3
Abstract (2)
Data race detection in multi-cores
  Software methods
    Cause the probe effect
  Hardware methods
    Need a lot of memory (or hardware area) to log the cores' behavior
    Cause false positives
  The method proposed in this paper
    Is not a software method
    Uses related work [3] to avoid the probe effect
    Uses centralized race detection: hardware area does not grow hugely as the number of cores increases
4
What’s the problem
The probe effect was introduced in related work [1]
Related work [4] is used for data race detection
Related work [3] separates the debugging data path from the normal data path to avoid the probe effect
5
Related work
Race detection (multi-core)
  Software [5][6]
  Hardware [7][8][9]
  This paper's method
    Lock-set algorithm [4]
    Related work [3]
6
Proposed MPSoC framework
  Every core has an XdIP
    The XdIP acts as a co-processor for each core
    The XdIP sends debug events to the dIP through the Debug I/F
  The interconnection follows the standard of related work [3]
    The Data I/F is used for the normal data path
    The Debug I/F is used for the debug event path
7
XdIP architecture
  The architecture is quite simple
    A filter selects debug events (lock and memory access info) and passes them to a buffer in the Packet & Send block, where they wait to be sent to the dIP
    The filter is configured by software settings
  Event monitoring and transfer happen in each core
    When the buffer is full, the XdIP notifies the dIP to stall all cores until the events are transferred
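
The filter-then-buffer behavior described on this slide can be sketched in software. This is only a toy model under stated assumptions, not the hardware design from the paper: the class name `XdIPModel`, the event kinds, the buffer depth, and the string return values are all hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical event kinds; the slide only says "lock and memory access info".
LOCK, UNLOCK, MEM_READ, MEM_WRITE, OTHER = "lock", "unlock", "read", "write", "other"


@dataclass(frozen=True)
class DebugEvent:
    core_id: int
    kind: str
    address: int


class XdIPModel:
    """Toy model of one core's XdIP: software-configured filter plus bounded buffer."""

    def __init__(self, core_id, capacity, monitored_kinds):
        self.core_id = core_id
        self.capacity = capacity               # assumed buffer depth
        self.monitored = set(monitored_kinds)  # filter set "by SW setting"
        self.buffer = deque()

    def observe(self, kind, address):
        """Classify one event as 'filtered', 'buffered', or 'stall'."""
        if kind not in self.monitored:
            return "filtered"                  # not a debug event, dropped
        if len(self.buffer) >= self.capacity:
            return "stall"                     # buffer full: ask the dIP to stall the cores
        self.buffer.append(DebugEvent(self.core_id, kind, address))
        return "buffered"

    def send(self, max_events=1):
        """Pop up to max_events of the oldest events, as if sent over the Debug I/F."""
        sent = []
        while self.buffer and len(sent) < max_events:
            sent.append(self.buffer.popleft())
        return sent


xdip = XdIPModel(core_id=0, capacity=2, monitored_kinds={LOCK, UNLOCK, MEM_READ, MEM_WRITE})
outcomes = [xdip.observe(k, a) for k, a in
            [(OTHER, 0x0), (LOCK, 0x100), (MEM_WRITE, 0x200), (MEM_READ, 0x200)]]
print(outcomes)
```

In this model a deeper buffer postpones the `"stall"` outcome, which is the intuition behind adding tolerance buffers to the XdIP.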
8
Data race detection flow
  First, the table manager accepts debug events from the XdIP and then maintains the shared variable cache, lock-set, and core-status tables
  Second, the rule logic checks whether a data race has happened
    If it has: an alert is raised to notify the exception handler to deal with the detected race
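
The two-step flow above (table update, then rule check) can be illustrated with a simplified lock-set check. This is a sketch under assumptions: the slides cite a lock-set algorithm [4] but do not spell out the rule, so the code uses the common textbook form (intersect the locks held on every access to a variable, and report when the intersection is empty and more than one thread has touched it). The class and method names are hypothetical.

```python
class RaceDetectorModel:
    """Toy model: a table manager updates state, then rule logic checks for a race."""

    def __init__(self):
        self.held = {}        # core-status stand-in: thread -> set of locks held
        self.candidates = {}  # variable -> (candidate lock set, threads seen)
        self.alerts = []      # (variable, thread) pairs that raised an alert

    def on_event(self, thread, kind, target):
        # Step 1: table manager keeps lock and variable state up to date.
        locks = self.held.setdefault(thread, set())
        if kind == "lock":
            locks.add(target)
        elif kind == "unlock":
            locks.discard(target)
        elif kind in ("read", "write"):
            # Step 2: rule logic runs on every shared-memory access.
            self._check(thread, target, locks)

    def _check(self, thread, var, locks):
        if var not in self.candidates:
            # First access: every lock held right now is a candidate protector.
            self.candidates[var] = (set(locks), {thread})
            return
        candidate_locks, threads = self.candidates[var]
        candidate_locks &= locks   # keep only locks held on every access so far
        threads.add(thread)
        if not candidate_locks and len(threads) > 1:
            self.alerts.append((var, thread))  # assumed stand-in for the alert signal


detector = RaceDetectorModel()
for event in [("T1", "lock", "L"), ("T1", "write", "x"), ("T1", "unlock", "L"),
              ("T2", "lock", "L"), ("T2", "write", "x"), ("T2", "unlock", "L"),
              ("T2", "write", "y"), ("T1", "write", "y")]:
    detector.on_event(*event)
print(detector.alerts)
```

Here `x` is always accessed under lock `L`, so it raises no alert, while `y` is written by two threads with no common lock and is reported.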
9
dIP architecture
  The data race detection flow corresponds to 1~5
  6 is for ordering debug events (SqID)
  7 is the external RAM used on a cache miss
10
Three tables
  The page-based variable table records each variable's latest access state
  The lock-key table records how many lock-sets and how many lock keys are available
  The core-status table records each core's state (thread, lock set, SqID)
  Fully associative
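
A page-based, fully associative variable table backed by external RAM on a miss might behave roughly like the sketch below. The page size, the least-recently-used replacement policy, and all names (`PageVariableCache`, `record`, `latest`) are assumptions made for illustration; the slides do not state them.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KiB pages; the slides do not give a page size


class PageVariableCache:
    """Toy fully associative page cache with a backing store for evicted pages."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = OrderedDict()  # page number -> {offset: latest access state}
        self.external_ram = {}        # stand-in for the pluggable off-chip RAM
        self.hits = 0
        self.misses = 0

    def _load(self, page):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)       # mark as most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.num_entries:
                # Assumed LRU policy: spill the oldest page instead of dropping it.
                victim, data = self.entries.popitem(last=False)
                self.external_ram[victim] = data
            # Bring the page back from external RAM if it was spilled earlier.
            self.entries[page] = self.external_ram.pop(page, {})
        return self.entries[page]

    def record(self, address, state):
        page, offset = address >> PAGE_BITS, address & ((1 << PAGE_BITS) - 1)
        self._load(page)[offset] = state

    def latest(self, address):
        page, offset = address >> PAGE_BITS, address & ((1 << PAGE_BITS) - 1)
        return self._load(page).get(offset)


cache = PageVariableCache(num_entries=2)
cache.record(0x1000, "write by core1")
cache.record(0x2000, "read by core2")
cache.record(0x3000, "write by core2")   # evicts page 0x1 to external RAM
print(cache.latest(0x1000))              # state survives the eviction
```

Because evicted state is spilled rather than discarded, a later access to the same variable still sees its history. That is the intuition behind using off-chip RAM to avoid false positives after a cache miss.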
11
Overall proposed framework
12
Allocation/de-allocation of lock-keys
  Allocation
    Thread A executes W_lock S1, and the XdIP sends the event to the dIP
    The dIP allocates a lock-key to thread A, and thread A saves the lock-key number with S1
  De-allocation
    Thread A executes W_unlock S1; at the same time, the lock-key is sent to the dIP to be de-allocated
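
The allocation and de-allocation steps above can be sketched as a small free-pool model. This is a minimal illustration, assuming a fixed number of lock-keys handed out lowest-first; `LockKeyTable` and its methods are hypothetical names, and `W_lock`/`W_unlock` appear only as comments because the thread library itself is not shown in the slides.

```python
class LockKeyTable:
    """Toy model of the dIP side: a fixed pool of lock-keys."""

    def __init__(self, num_keys):
        # Reversed so that pop() hands out key 0 first (an arbitrary choice).
        self.free = list(range(num_keys - 1, -1, -1))
        self.owner = {}  # lock-key -> (thread, lock)

    @property
    def available(self):
        return len(self.free)

    def allocate(self, thread, lock):
        if not self.free:
            raise RuntimeError("no lock-key available")
        key = self.free.pop()
        self.owner[key] = (thread, lock)
        return key

    def deallocate(self, key):
        del self.owner[key]      # raises KeyError if the key was never allocated
        self.free.append(key)


table = LockKeyTable(num_keys=4)
saved_keys = {}                                # thread A's own record: lock -> lock-key

# Thread A executes W_lock S1: the dIP allocates a key, thread A saves it with S1.
saved_keys["S1"] = table.allocate("A", "S1")

# Thread A executes W_unlock S1: the saved key travels with the event and is freed.
table.deallocate(saved_keys.pop("S1"))
print(table.available)
```

Having the thread keep the key number means the unlock event can name the key directly, so the dIP does not need to search the table to release it.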
13
Data race detection rule
[Figure: detection rule example with core1 and core2]
When the XdIP buffer is full, the dIP stalls all cores so that debugging stays non-intrusive.
Stalls reduce system performance; an experiment with the SPLASH-2 benchmarks shows the stall ratio.
14
Experiments
Solution: add buffers to the XdIP
Across four different benchmarks, the worst-case performance drop is 12.25%
Comparison with related work [9]
15
Experiments