MemTracker Efficient and Programmable Support for
Memory Access Monitoring and Debugging
Guru Venkataramani, Brandyn Roemer, Yan Solihin, Milos Prvulovic
Venkataramani HPCA’07 2
Introduction
• Software is increasingly complex
• More complexity means more bugs
• Memory bugs are most common
  – Many are security vulnerabilities
• How to catch them efficiently?
Debugging and Monitoring
• Maintain information/state about memory
• Low performance overhead in “always-on” mode
  – Hard problem
• Flexible/programmable
  – Even harder!
Challenges
• Software approach
  + Flexible
  – Large (2X to 30X) slowdown
• Hardware approach
  + Faster
  – Most are checker-specific
  – Others need software intervention too often
Related Work
• DISE [ISCA’03]
  + Pattern-matches instructions, dynamically injects instrumentation
  – Modifies front end of pipeline
  – Adds extra code to instruction stream
• Mondrian [ASPLOS’02]
  + Fine-grain protection: different permissions for adjacent words
  – Software intervention for permission updates
  – Complex hardware (trie structure)
Objectives
• MemTracker
  * Maintains state for every memory word
  * No software intervention for most state checks and updates
  * Efficient checks and updates even when nearby locations have different states
  * Programmable (can implement different checkers)
What is MemTracker?
• A programmable state machine
  – (State, event) → (State, Exception)
• Supports up to 16 states (4 state bits/word)
  – Not a fundamental limit; can be extended
• All memory actions are events
  – Memory accesses: loads, stores
  – User events (affect only state)
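The (State, event) → (State, Exception) machine above can be modeled with a short software sketch. The hardware table layout and event encoding are not given on the slide, so the dict-of-dicts structure and event names here are assumptions for illustration only:

```python
# Minimal software model of MemTracker's programmable checker.
# Assumption: table[state][event] = (next_state, raise_exception).
class MemTrackerModel:
    def __init__(self, table):
        self.table = table
        self.state = {}                 # word address -> 4-bit state (default 0)

    def fire(self, event, word_addr):
        """Apply one event to one word's state; return True if the
        checker would raise an exception for this (state, event) pair."""
        cur = self.state.get(word_addr, 0)
        nxt, exc = self.table[cur][event]
        self.state[word_addr] = nxt     # state update, no software intervention
        return exc
```

Loads, stores, and user events (e.g. malloc/free) would all funnel through `fire`; only the `exc == True` cases hand control to software.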
Example Heap Checker
[State diagram: states NON-HEAP, UNALLOC, UNINIT, INIT. Malloc moves UNALLOC to UNINIT; a Store moves UNINIT to INIT; Free returns UNINIT or INIT to UNALLOC. ERROR transitions: Malloc/Free on NON-HEAP, Load/Store/Free on UNALLOC, Load on UNINIT, and Malloc on UNINIT or INIT. Load/Store on NON-HEAP and Load/Store on INIT are legal.]
MemTracker State Table

(The talk builds this table up over four slides; the completed table is below. Each entry gives the next state; “E” marks entries that also raise an exception.)

State ↓ \ Event →   UEVT0 (Alloc)   UEVT1 (Free)   LOAD   STORE
0 (Non-Heap)        0 E             0 E            0      0
1 (UnAlloc)         2               1 E            1 E    1 E
2 (UnInit)          2 E             1              2 E    3
3 (Init)            3 E             1              3      3
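Read as code, the completed table behaves as follows. This sketch transcribes the entries above directly (True marks an “E” exception entry; the dict encoding is illustrative, not the hardware format):

```python
# Heap-checker transition table from the slide. Entry = (next_state, error).
# States: 0 Non-Heap, 1 UnAlloc, 2 UnInit, 3 Init.
# Events: alloc (UEVT0), free (UEVT1), load, store.
TABLE = {
    0: {"alloc": (0, True),  "free": (0, True),  "load": (0, False), "store": (0, False)},
    1: {"alloc": (2, False), "free": (1, True),  "load": (1, True),  "store": (1, True)},
    2: {"alloc": (2, True),  "free": (1, False), "load": (2, True),  "store": (3, False)},
    3: {"alloc": (3, True),  "free": (1, False), "load": (3, False), "store": (3, False)},
}

def run(events, state=1):
    """Apply a sequence of events to one heap word (initially UnAlloc);
    return the final state and the events that triggered errors."""
    errors = []
    for ev in events:
        state, err = TABLE[state][ev]
        if err:
            errors.append(ev)
    return state, errors

# A read-before-write after malloc, and a read-after-free, are both caught:
final, errs = run(["alloc", "load", "store", "free", "load"])
# errs == ["load", "load"]: the uninitialized read and the freed read
```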
State Storage
[Figure: the application’s virtual address space is split into the normal virtual memory space, holding code, data, stack, and heap, and a protected, reserved virtual space that holds the state data.]
State Lookup – Word access only

[Figure: the state address (0xF0000ABC) is formed by adding the scaled data address (0xABCD) to the State Base Register (0xF0000000). The state block is read from the cache, and a MUX selects the state bits for the accessed word; with 2 state bits per word in this example, the selected state is 11.]
Caching State information
Shared Caching
• No additional resources for state
• Data and state blocks compete for cache lines in existing caches
• Loads/stores already have data lookups; now they also need state lookups
• These state lookups double the L1 port contention
Caching State information
Interleaved Caching
• Expand cache line to store state for its data
  + One lookup finds both data and state
  – L1 cache larger and slower even when not checking
Caching State information
Split Caching
• Dedicated (small) state L1 cache
• Provides separate ports for state lookups
• Leaves data L1 cache alone
• When NOT checking, turn SL1 off
Caching State information
• Summary of the three options: shared, interleaved, and split caching
• L2 and below use shared caching (no additional space for state)
• L2 is single ported, but rarely a contention problem (L1 filters out most accesses)
• State is smaller than data, so it needs less bandwidth and capacity
• We use split L1 and shared L2 (and below) in the rest of the talk
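The claim that state needs less capacity than data is simple arithmetic, sketched below: with 4 state bits per 32-bit word, state is 1/8 the size of the data it describes, which is why a 2 KB state L1 pairs with a 16 KB data L1 in the evaluation:

```python
# Back-of-envelope sizing for the split state L1 relative to the data L1.
DATA_BITS_PER_WORD = 32
STATE_BITS_PER_WORD = 4

ratio = DATA_BITS_PER_WORD // STATE_BITS_PER_WORD   # state is 8x smaller
data_l1_kb = 16
state_l1_kb = data_l1_kb / ratio                    # 2.0 KB covers the same footprint
```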
Pipeline
[Figure: baseline pipeline stages IF, ID, REN, REG, EXE, MEM, WB, CMT, with the data L1 accessed in MEM; the front end feeds the out-of-order back end.]
Pipeline Modifications
[Figure: a pre-commit check (PCHK) stage is added between WB and CMT. It looks up the state L1, alongside the existing data L1 access in MEM, with state forwarding between in-flight instructions and prefetching of state.]
Other Issues
• OS issues
  – Context switches (fast)
  – Paging (same as data)
• Multiprocessor implementation
  – Coherence: state information treated the same as data
  – Consistency: key issue is atomicity of state and data
    • Example: the same instruction accesses new data but old state
• More details in the paper!
Evaluation Platform
• SESC simulator, out-of-order core, 5 GHz
• L1 data cache: 16 KB, 2-way, 2 ports, 32 B blocks
• L1 state cache (split caching): 2 KB, 2-way, 2 ports, 32 B blocks
• L2 cache: 2 MB, 4-way, 1 port, 32 B blocks
Checkers used in Evaluation
• Heap Checker (example seen before)
  – 4 states: NonHeap, UnAlloc, UnInit, Init
• Return Address Checker
  – Detects return-address modifications
  – 3 states: NotRA, GoodRA, BadRA
• HeapChunks Checker
  – Detects sequential heap buffer overflows
  – 2 states: Delimit, NotDelimit
• Combined Checker
  – Combines all of the above
  – 7 states; 4 (although actually only 3 are needed) state bits per word
  – Most demanding; default in evaluation
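The Return Address Checker’s three states can be sketched the same way as the heap checker. The slide gives only the state names, so the event set here is an assumption: a call’s store marks the saved return address GoodRA, any ordinary store to that word marks it BadRA, and a return that loads a BadRA word is flagged:

```python
# Hedged sketch of a 3-state return-address checker; event names and
# transitions beyond the state names are assumptions, not from the talk.
NOT_RA, GOOD_RA, BAD_RA = 0, 1, 2

def ra_event(state, event):
    """Return (next_state, error?) for one (state, event) pair."""
    if event == "call_store":            # call instruction saves return address
        return GOOD_RA, False
    if event == "store":                 # plain store overwrites the slot
        return (BAD_RA, False) if state != NOT_RA else (NOT_RA, False)
    if event == "return_load":           # return instruction pops the slot
        return NOT_RA, state == BAD_RA   # flag a corrupted return address
    return state, False
```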
Performance of Checkers on SPEC
[Chart: runtime overhead (0–6%) of the HeapChunks, Heap, Stack, and Combined checkers on bzip2, eon, mcf, art, swim, and the average. The Combined checker averages 2.7%.]
Sensitivity - Prefetching
[Chart: runtime overhead (0–16%) on mcf, vortex, art, equake, mgrid, and the average for three configurations: imprecise exceptions, precise without prefetch, and precise with prefetch.]
MemTracker vs. other schemes
[Chart: runtime overhead (0–50%) on gcc, gzip, art, apsi, swim, and the average for MemTracker, Mondrian with 30-cycle updates, and a software checker with 5-cycle checks.]
Conclusions
• MemTracker
  – Monitors and checks memory accesses
  – Can be programmed to implement different checkers
  – Low performance overheads: 2.7% average and 4.7% worst case for the combined checker on SPEC
  – Tested on injected bugs – it finds them!
• More details in the paper
BACKUP SLIDES
Sensitivity – State Cache Size
[Chart: runtime overhead (0–10%) on crafty, eon, mcf, art, applu, and the average for state L1 sizes of 1 KB, 2 KB, 4 KB, and 16 KB.]
Caching Configurations on SPEC
[Chart: runtime overhead (0–25%) on mcf, parser, vortex, art, applu, and the average for shared, interleaved, interleaved (+1 port), and split caching.]