Post on 12-Jan-2016
Transactional Memory
CDA6159
Outline
Introduction
Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93)
Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ‘04)
Introduction
Transaction
A sequence of actions that appears indivisible and instantaneous to an outside observer.
Four specific attributes: atomicity, consistency, isolation, and durability, collectively known as the ACID properties.
Introduction
Concurrency control
  Locks? Poor performance, deadlock, etc.
  Alternatives: lock-free techniques, optimistic concurrency control
Herlihy and Moss in 1993 proposed hardware-supported transactional memory as a mechanism for building lock-free data structures.
Basic Transactional Mechanisms
Isolation
  Detect when transactions conflict
  Track read and write sets
Version management
  Record new and old values
Atomicity
  Commit new values
  Abort back to old values
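
The three mechanisms above can be sketched in software. This is a minimal illustration, not the hardware design from either paper; the names `Transaction` and `conflicts` are hypothetical. Writes are buffered so memory keeps the old values until commit, and two transactions conflict when one writes a location the other has read or written.

```python
class Transaction:
    """Toy transaction (hypothetical name): buffers writes until commit."""

    def __init__(self, memory):
        self.memory = memory       # shared dict: address -> value
        self.read_set = set()      # addresses this transaction has read
        self.write_buffer = {}     # address -> tentative new value

    def read(self, addr):
        self.read_set.add(addr)
        # Prefer our own tentative value; otherwise the old value in memory.
        return self.write_buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.write_buffer[addr] = value   # memory still holds the old value

    def commit(self):
        self.memory.update(self.write_buffer)   # new values become visible

    def abort(self):
        self.write_buffer.clear()               # fall back to the old values


def conflicts(t1, t2):
    """True if either transaction writes what the other reads or writes."""
    w1, w2 = set(t1.write_buffer), set(t2.write_buffer)
    return bool(w1 & (t2.read_set | w2)) or bool(w2 & t1.read_set)
```

Two transactions that only read the same location do not conflict under this rule; a conflict needs at least one writer.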
H/W Transactional Memory Systems
Knight’s Lisp Work
Transactional Memory
Oklahoma Update
SLE/TLR
Transactional Coherence and Consistency
Unbounded TM
Virtual TM
Thread-level TM
Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93)
Lock and Problems
Locks are commonly used to protect shared data
Priority inversion
  A lower-priority process holds a lock needed by a higher-priority process
Convoy effect
  When the lock holder is interrupted, the other processes are forced to wait
Deadlock
  Circular dependence between processes acquiring locks, so every process just waits for a lock
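
The circular dependence behind deadlock can be shown as a cycle in a wait-for relation. This is a small sketch under the assumption that each process waits for at most one lock holder; the function name `has_deadlock` and the dictionary layout are hypothetical.

```python
def has_deadlock(waits_for):
    """waits_for maps a process to the process holding the lock it needs.

    A value of None (or a missing key) means the process is not waiting.
    Returns True if following the chain from any process loops back on itself.
    """
    for start in waits_for:
        seen = set()
        p = start
        while p is not None and p not in seen:
            seen.add(p)
            p = waits_for.get(p)
        if p is not None:   # we revisited a process, so there is a cycle
            return True
    return False
```

For example, `{"A": "B", "B": "A"}` describes two processes each waiting for the other's lock, which is the situation the slide calls circular dependence.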
H&M’s Transactional Memory [’93]
Intended to replace short critical sections
Motivated by lock-free data structures
Transactions:
  Read and write multiple locations
  Commit in arbitrary order
  Implicit begin, explicit commit operations
  Abort affects memory, not registers
Software manages restarting execution
  Validate instruction detects pending abort
Implementation extends cache coherence
  Read/write locks correspond to MESI states
  Add orthogonal transaction states
Transactional Hardware State
Processor state
  Transaction active flag (TACTIVE)
    Whether a transaction is in progress; implicitly set by the first xactional operation
  Transaction status flag (TSTATUS)
    Whether the transaction is active (true) or aborted (false)
Small, fully associative xactional cache
  Disjoint from the L1 cache (a line can be in only one or the other)
  Holds tentative writes before propagation
    Invalidated if aborted; snooped and/or written back if committed
  Two copies of each xactional line
    Avoids writebacks to memory; lets an xactional write hold both the old and the new value
  Aborts another xaction that would cause a conflict
  Aborted by interrupts and xactional cache overflows
  Acts like a regular cache when not in an xaction
  Fast commit and abort (in a single cache cycle)
TM Instructions
Instructions for accessing memory
  Load-transactional (LT)
    Reads from shared memory into a private register
  Load-transactional-exclusive (LTX)
    LT plus a hint that a write is coming up
  Store-transactional (ST)
    Tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit
Instructions for manipulating xaction state
  Commit
    Tries to make tentative writes permanent. Succeeds if no other processor has read its write set or written its read/write set; the write set then becomes visible to others. On failure, discards all updates to the write set
  Abort
    Discards all updates to the write set
  Validate
    Returns the current transaction status, indicating whether the transaction has aborted. If the status is false, discards all updates to the write set
Transaction Example
/* keep trying */
while ( true ) {
    /* read variables */
    v1 = LT ( V1 ); …; vn = LT ( Vn );
    /* check consistency */
    if ( ! VALIDATE () ) continue;
    /* compute new values */
    compute ( v1, … , vn );
    /* write tentative values */
    ST ( v1, V1 ); …; ST ( vn, Vn );
    /* try to commit */
    if ( COMMIT () ) return result;
    else backoff;
}
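
The retry loop above can be tried out in software. This is a rough emulation, not the hardware mechanism: conflicts are detected with per-address version counters, an assumption of this sketch, and the names `SharedMemory`, `ToyTM`, `increment`, and `interfere` are all hypothetical. Backoff is omitted.

```python
class SharedMemory:
    """Shared values plus a version counter bumped on every committed write."""

    def __init__(self, values):
        self.value = dict(values)
        self.version = {addr: 0 for addr in values}


class ToyTM:
    """Per-processor stand-in for the LT / ST / VALIDATE / COMMIT instructions."""

    def __init__(self, mem):
        self.mem = mem
        self._reset()

    def _reset(self):
        self.seen = {}     # address -> version observed when first touched
        self.writes = {}   # address -> tentative value

    def LT(self, addr):
        self.seen.setdefault(addr, self.mem.version[addr])
        return self.writes.get(addr, self.mem.value[addr])

    def ST(self, value, addr):
        self.seen.setdefault(addr, self.mem.version[addr])
        self.writes[addr] = value          # tentative, invisible to others

    def VALIDATE(self):
        ok = all(self.mem.version[a] == v for a, v in self.seen.items())
        if not ok:
            self._reset()                  # discard tentative updates
        return ok

    def COMMIT(self):
        ok = self.VALIDATE()
        if ok:
            for addr, value in self.writes.items():
                self.mem.value[addr] = value
                self.mem.version[addr] += 1
            self._reset()
        return ok


def increment(tm, addr, interfere=None):
    """Retry loop from the slide; returns the number of attempts needed."""
    attempts = 0
    while True:
        attempts += 1
        v = tm.LT(addr)
        if interfere is not None and attempts == 1:
            interfere()                    # another processor commits here
        if not tm.VALIDATE():
            continue
        tm.ST(v + 1, addr)
        if tm.COMMIT():
            return attempts


mem = SharedMemory({"counter": 0})
p0, p1 = ToyTM(mem), ToyTM(mem)
# p1 commits an increment between p0's read and p0's validation.
tries = increment(p0, "counter", interfere=lambda: increment(p1, "counter"))
```

Here `p0` needs two attempts: its first `VALIDATE` fails because `p1` committed in between, and the second pass commits with the counter at 2.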
Transactional Cache
Extend cache coherency protocols
  Any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost
  Includes both snoopy-bus and directory protocols
Additional transactional tags
  EMPTY, NORMAL, XCOMMIT, XABORT
  Two entries per xactional data item: XCOMMIT (old value, discarded on commit) and XABORT (new value, discarded on abort)
Allocation policy
  EMPTY > NORMAL > XCOMMIT
Bus cycles
  T_READ and T_RFO (read for ownership)
  BUSY
    A request can be refused by responding BUSY
    When BUSY is received, the xaction is aborted
    This prevents deadlock and continual mutual aborts
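
The allocation policy on this slide can be read as a search order when the transactional cache needs room for a new entry. A minimal sketch follows, assuming the cache is represented as a plain list of tag strings; `pick_victim` is a hypothetical name.

```python
ALLOCATION_ORDER = ("EMPTY", "NORMAL", "XCOMMIT")


def pick_victim(tags):
    """Return the index of the entry to reuse, or None if none qualifies.

    tags is a list of tag strings, one per transactional cache entry.
    XABORT entries hold tentative values and are never chosen.
    """
    for wanted in ALLOCATION_ORDER:
        for index, tag in enumerate(tags):
            if tag == wanted:
                return index
    return None
```

A `None` result corresponds to the overflow case, in which the transaction cannot continue in the transactional cache.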
Processor Operations
LT
  Check for an XABORT entry; if found, return its value
  If none, check for a NORMAL entry
    Switch NORMAL to XABORT and allocate an XCOMMIT entry
  If none, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries
  If T_READ receives BUSY, abort:
    Set TSTATUS to false
    Drop all XABORT entries
    Set all XCOMMIT entries to NORMAL
    Return arbitrary data
LTX, ST
  Same as LT, except:
    Use T_RFO on a miss rather than T_READ, and set the cache line state to RESERVED
    For ST, the XABORT entry is updated
Processor Operations
VALIDATE
  Return the TSTATUS flag
  If false, set TSTATUS true and TACTIVE false
ABORT
  Set TSTATUS true, TACTIVE false
  Change XABORT entries to EMPTY and XCOMMIT entries to NORMAL
COMMIT
  Return TSTATUS; set TSTATUS true, TACTIVE false
  Drop all XCOMMIT entries and change all XABORT entries to NORMAL
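
The tag transitions listed on the two Processor Operations slides can be traced with a small single-processor model. This sketch makes several assumptions: there is no bus, so a miss simply reads a backing dictionary in place of a successful T_READ or T_RFO, BUSY never occurs, and EMPTY entries are represented by removing them from a list. The class name `ToyXCache` is hypothetical.

```python
class ToyXCache:
    """Single-processor model of the transactional cache tags (no bus)."""

    def __init__(self, memory):
        self.memory = memory     # backing store: address -> value
        self.entries = []        # each entry is [tag, address, value]
        self.tstatus = True
        self.tactive = False

    def _find(self, tag, addr):
        for entry in self.entries:
            if entry[0] == tag and entry[1] == addr:
                return entry
        return None

    def lt(self, addr):
        self.tactive = True                      # set by first xactional op
        entry = self._find("XABORT", addr)
        if entry is None:
            entry = self._find("NORMAL", addr)
            if entry is not None:
                entry[0] = "XABORT"              # NORMAL becomes XABORT
            else:
                # Miss: stands in for a T_READ that completes successfully.
                entry = ["XABORT", addr, self.memory[addr]]
                self.entries.append(entry)
            self.entries.append(["XCOMMIT", addr, entry[2]])  # old value
        return entry[2]

    def st(self, addr, value):
        self.lt(addr)                            # same lookup and allocation
        self._find("XABORT", addr)[2] = value    # update the XABORT entry

    def commit(self):
        ok = self.tstatus
        self.entries = [e for e in self.entries if e[0] != "XCOMMIT"]
        for entry in self.entries:
            if entry[0] == "XABORT":
                entry[0] = "NORMAL"
        self.tstatus, self.tactive = True, False
        return ok

    def abort(self):
        self.entries = [e for e in self.entries if e[0] != "XABORT"]
        for entry in self.entries:
            if entry[0] == "XCOMMIT":
                entry[0] = "NORMAL"
        self.tstatus, self.tactive = True, False
```

After `st` followed by `abort`, the line survives as a NORMAL entry with its old value; after `st` followed by `commit`, it survives as a NORMAL entry with the new value.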
Snoopy Cache Actions
Regular cache
  Acts as a MESI cache; treats T_READ as READ and T_RFO as RFO
Transactional cache
  Non-xactional cycle: acts like the regular cache, using NORMAL entries only
  T_READ: if the entry is valid (shared), returns the value
  All other xactional cycles: responds BUSY
Memory
  Responds to READ, T_READ, RFO, and T_RFO when no cache responds; also responds to WRITE
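
The transactional cache's snoop behaviour can be summarised as a small decision function. This is a simplified sketch: the return strings `"DATA"`, `"IGNORE"`, and `"BUSY"` and the function name `xcache_snoop` are hypothetical, the `tstatus` parameter reflects the assumption that an aborted transaction's cache behaves like a regular one, and state changes such as invalidation on RFO are not modelled.

```python
def xcache_snoop(cycle, tag, state, tstatus=True):
    """Response of the transactional cache to a snooped cycle for a held line.

    cycle: "READ", "RFO", "T_READ" or "T_RFO"
    tag:   transactional tag of the matching entry
    state: coherence state of the matching entry, e.g. "VALID"
    """
    transactional = cycle in ("T_READ", "T_RFO")
    if not transactional or not tstatus:
        # Acts like the regular cache, but only NORMAL entries take part.
        return "DATA" if tag == "NORMAL" else "IGNORE"
    if cycle == "T_READ" and state == "VALID":
        return "DATA"       # a shared line can still be read
    return "BUSY"           # refuse; the requester will abort
```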
Advantage and disadvantage
Single cache for both regular and xactional data
  Set size would determine the maximum xaction size
  Parallel commit/abort logic for a larger cache
Xaction size is limited by the xactional cache size
  Overflow traps into software
  Xaction data set is small
Cannot survive an interrupt
Simulation
Proteus simulator
32 processors
Regular cache
  Direct mapped, 2048 8-byte lines
Transactional cache
  Fully associative, 64 8-byte lines
Single-cycle cache access
4-cycle memory access
Both snoopy bus and directory are simulated
2-stage network with a switch delay of 1 cycle each
Benchmarks
Counter
  n processors each increment a shared counter (2^16)/n times
Producer/consumer buffer
  n/2 processors produce and n/2 processors consume through a shared FIFO
  Ends when 2^16 items are consumed
Doubly-linked list
  n processors try to rotate the contents from tail to head
  Ends when 2^16 items are moved
  Variables shared are conditional
  Traditional locking methods can introduce deadlock
Comparisons
Competitors
  Transactional memory
  Load-locked/store-conditional (Alpha)
  Spin lock with backoff
  Software queue
  Hardware queue
Counter Result
Producer/Consumer Result
Doubly Linked List Result
Conclusion
Avoids extra lock variables and lock problems
Trades deadlock for possible livelock/starvation
Comparable performance to lock techniques when the shared data structure is small
Relatively easy to implement
Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ‘04)
Basic TCC Transaction Control Bits
In each local cache
  Read bits (per cache line, or per word to eliminate false sharing)
    Set on speculative loads
    Snooped by a committing transaction (writes by another CPU)
  Modified bits (per cache line)
    Set on speculative stores
    Indicate what to roll back if a violation is detected
    Different from the dirty bit
During A Transaction Commit
Need to collect all of the modified cache lines together into a commit packet
Potential solutions
  A separate write buffer, or
  An address buffer maintaining a list of the line tags to be committed
  Size?
Broadcast all writes out as one single (large) packet to the rest of the system
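
The interplay of read bits, modified bits, and the commit packet can be illustrated with a toy model. This sketch assumes line-granularity tracking and represents the commit packet as a dictionary from address to value; the names `TccCache`, `commit_packet`, `snoop`, and `commit` are hypothetical and do not come from the TCC paper.

```python
class TccCache:
    """Toy per-processor cache state for a TCC-style transaction."""

    def __init__(self):
        self.read_bits = set()   # lines loaded speculatively
        self.modified = {}       # line -> speculatively stored value

    def load(self, addr, memory):
        if addr in self.modified:
            return self.modified[addr]
        self.read_bits.add(addr)             # set on speculative load
        return memory[addr]

    def store(self, addr, value):
        self.modified[addr] = value          # set on speculative store

    def commit_packet(self):
        """Gather all modified lines into one packet and end the transaction."""
        packet = dict(self.modified)
        self.read_bits.clear()
        self.modified.clear()
        return packet

    def snoop(self, packet):
        """Check a broadcast packet against read bits; roll back on a hit."""
        violated = any(addr in self.read_bits for addr in packet)
        if violated:
            self.read_bits.clear()
            self.modified.clear()            # discard speculative state
        return violated


def commit(committer, others, memory):
    """Broadcast the committer's writes; return each other cache's violation flag."""
    packet = committer.commit_packet()
    memory.update(packet)
    return [cache.snoop(packet) for cache in others]
```

A processor that speculatively read a line appearing in another processor's commit packet sees a violation and must restart its transaction; processors that touched only unrelated lines are unaffected.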
Other
Rollback is needed when a transaction cannot commit
Checkpoints are needed prior to a transaction
Checkpoint register state
  Hardware approach: flash-copying the rename table / architectural register file
  Software approach: extra instruction overhead
Overflow issue
  Conflict or capacity misses require all the victim lines to be kept somewhere (e.g. a victim cache)
  Or stall temporarily and request commit
Thanks!