Performance Evaluation of Adaptivity in STM
Mathias Payer and Thomas R. Gross
Department of Computer Science, ETH Zürich
ISPASS'11 / 2011-04-12 Mathias Payer / ETH Zürich 2
Motivation
● STM systems rely on many assumptions
  ● Often contradicting for different programs
  ● Statically tuned to a baseline
● Use self-optimizing systems
  ● Adapt to different workloads
● What parameters can be adapted?
● How to measure effectiveness?
Outline
● Introduction
● STM System
  ● STM Baseline
  ● Adaptive Parameters
● Evaluation
● Related work
● Conclusion
Introduction
● Software Transactional Memory (STM) applies transactions to memory
  ● (Optimistic) concurrency control mechanism
  ● Alternative to lock-based synchronization
● Multiple concurrent threads run transactions
  ● Concurrent memory modifications
Introduction
● Concurrent transactions modify memory without synchronization
  ● Transaction is verified after completion
  ● Conflicts are detected and resolved
  ● Changes committed for conflict-free transactions
  ● Modifications only visible after commit
Introduction
withdraw {
  tmp = balance;
  tmp = tmp - 100;
  balance = tmp;
}
deposit {
  tmp = balance;
  tmp = tmp + 100;
  balance = tmp;
}
● What happens when balance is accessed concurrently?
● Either locking or STM is needed to ensure the correct end balance
● STM system decides which tx is executed first
[Diagram: transaction timeline with the steps "TX starts", "balance in read-set", "balance in write-set", "conflict detection, data committed"]
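The withdraw/deposit race can be replayed deterministically with a small sketch. It is not the AdaptSTM implementation: `VersionedCell`, `try_commit`, and `run_tx` are hypothetical names, and a single version counter per value stands in for the lock/version table of a real STM.

```python
import threading

class VersionedCell:
    """One shared word guarded by a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()

    def read(self):
        # Return a consistent (value, version) snapshot for the read-set.
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        # Commit only if no other transaction committed since our read.
        with self._commit_lock:
            if self.version != seen_version:
                return False  # conflict detected: caller must re-execute
            self.value = new_value
            self.version += 1
            return True

def run_tx(cell, update):
    """Re-execute the update until it commits; return the number of aborts."""
    aborts = 0
    while True:
        value, version = cell.read()
        if cell.try_commit(version, update(value)):
            return aborts
        aborts += 1

balance = VersionedCell(1000)
# Interleave by hand: both transactions read before either one commits.
w_val, w_ver = balance.read()
d_val, d_ver = balance.read()
withdraw_ok = balance.try_commit(w_ver, w_val - 100)
deposit_ok = balance.try_commit(d_ver, d_val + 100)   # stale version, rejected
deposit_aborts = run_tx(balance, lambda v: v + 100)   # re-executed deposit
```

The stale deposit is rejected instead of overwriting the withdrawal, so after the re-execution the balance is back at 1000 rather than the 1100 a lost update would produce.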
STM Baseline
● Many efficient STM implementations agree on important design decisions:
  ● Word-based locking
  ● Global locking / version table
  ● Eager locking
  ● (Almost) no contention management
  ● Simple write-set and read-set implementations
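A minimal sketch of how word-based locking can map addresses onto a global lock/version table. The shift-and-mask hash, the 8-byte word size, and the name `lock_index` are assumptions for illustration; the slides only state that such a table exists and that its size and the number of shifts are tunable.

```python
WORD_BYTES = 8  # assumed word size

def lock_index(address, shift, table_bits):
    """Map an address to a slot of a global lock/version table with 2**table_bits entries."""
    return (address >> shift) & ((1 << table_bits) - 1)

a, b = 0x1000, 0x1008  # two neighbouring words

# shift=3 drops only the byte offset inside a word: one lock per word.
share_lock_fine = lock_index(a, 3, 16) == lock_index(b, 3, 16)

# shift=5 lets one lock cover 32 bytes: neighbouring words share it (over-locking).
share_lock_coarse = lock_index(a, 5, 16) == lock_index(b, 5, 16)

# A small table aliases addresses that are far apart; a larger table separates them.
c = a + (1 << 16) * WORD_BYTES
aliased_small_table = lock_index(a, 3, 16) == lock_index(c, 3, 16)
aliased_large_table = lock_index(a, 3, 20) == lock_index(c, 3, 20)
```

More shifts and smaller tables save memory and improve locality of the table, at the price of false conflicts between unrelated words.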
STM Baseline
[Diagram: a combined global write lock / version array shared by all transactions; each transaction keeps its own lock list, write hash, read hash, write list / buffer, and read list / buffer]
Adaptive STM Parameters
● Global adaptivity
  ● Synchronization needed
  ● Optimizes to global optimum
  ● Averages over all concurrent transactions
● (Thread-) local adaptivity
  ● No synchronization needed
  ● Limits adaptable parameters
  ● Best parameters for each thread/transaction
Adaptive STM Parameters
● Different adaptive parameters measured:
  ● Size of global locking/version-table *G
  ● Size of local hash-tables *L
  ● Write strategy *L
  ● Locality tuning for hash-functions *L
  ● Contention management *L
*L – local, *G – global
Adaptive Hash-Table
● Global hash-table: trade-off between over-locking and locality
  ● Global strategy: coordinate lock collisions and over-locking between threads
  ● Adapt size based on global information
● Local hash-table: trade-off between reset cost and # hash-collisions
  ● Local strategy: sample moving average of unique write locations
  ● Adapt size based on trend
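The local strategy can be sketched as follows. The exponential moving average, the smoothing factor, and the fill thresholds (1/2 and 1/8) are assumptions chosen for illustration, and `WriteHashSizer` is a hypothetical name; the slides only say that a moving average of unique write locations drives the size.

```python
class WriteHashSizer:
    """Suggests a power-of-two write-hash size from a moving average
    of unique write locations per transaction (all constants assumed)."""

    def __init__(self, size=64, alpha=0.25, min_size=16, max_size=1 << 16):
        self.size = size
        self.alpha = alpha        # weight of the newest sample
        self.avg = 0.0            # moving average of unique writes
        self.min_size = min_size
        self.max_size = max_size

    def sample(self, unique_writes):
        """Record one finished transaction and return the action taken."""
        self.avg = self.alpha * unique_writes + (1 - self.alpha) * self.avg
        if self.avg > self.size / 2 and self.size < self.max_size:
            self.size *= 2        # crowded table: collisions become likely
            return "enlarge"
        if self.avg < self.size / 8 and self.size > self.min_size:
            self.size //= 2       # sparse table: resetting it is wasted work
            return "shrink"
        return "no change"

sizer = WriteHashSizer()
large_phase = [sizer.sample(200) for _ in range(4)]   # write-heavy transactions
size_after_large = sizer.size
small_phase = [sizer.sample(2) for _ in range(12)]    # tiny transactions
size_after_small = sizer.size
```

Because the average lags behind individual samples, a single outlier transaction does not trigger a resize; only a sustained trend does.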
Adaptive Write Strategy
● Different costs depending on strategy
  ● Write-back: cheap abort, expensive commit
  ● Write-through: expensive abort, cheap commit
● Adapt strategy to per-thread workload
  ● Measure abort rate
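A minimal per-thread selector, assuming the abort rate is evaluated over a fixed window of transactions and compared against two thresholds. The window length, the thresholds, and the name `WriteStrategySelector` are illustrative assumptions, not values from the slides.

```python
class WriteStrategySelector:
    """Picks a write strategy per thread from the observed abort rate."""

    def __init__(self, window=100, high=0.20, low=0.05):
        self.window = window      # transactions per measurement window (assumed)
        self.high = high          # above this rate, aborts must be cheap
        self.low = low            # below this rate, commits must be cheap
        self.strategy = "write-back"
        self.finished = 0
        self.aborted = 0

    def record(self, aborted):
        """Record one finished transaction; return the strategy for the next one."""
        self.finished += 1
        self.aborted += 1 if aborted else 0
        if self.finished == self.window:
            rate = self.aborted / self.window
            if rate > self.high:
                self.strategy = "write-back"      # cheap abort
            elif rate < self.low:
                self.strategy = "write-through"   # cheap commit
            self.finished = 0
            self.aborted = 0
        return self.strategy

selector = WriteStrategySelector(window=10)
calm = [selector.record(False) for _ in range(10)]           # no aborts
contended = [selector.record(i % 2 == 0) for i in range(10)]  # 50% aborts
```

The gap between the two thresholds acts as hysteresis, so a thread whose abort rate hovers around one value does not flip strategies every window.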
Adaptive Locality Tuning
● Different applications have different data access patterns
  ● No optimal hash function for all data accesses
● Measure number of hash collisions for thread-local hash tables
  ● Cycle through different hash functions
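One way to picture the tuning loop: count collisions in the thread-local table after each transaction and move on to the next candidate hash function when the count is too high. The three candidate functions, the threshold, and the names `HASH_FUNCTIONS` and `HashTuner` are assumptions made for this sketch.

```python
# Candidate hash functions (hypothetical; each mixes different address bits).
HASH_FUNCTIONS = [
    lambda addr: addr >> 3,
    lambda addr: (addr >> 3) ^ (addr >> 12),
    lambda addr: (addr >> 6) ^ (addr >> 3),
]

def count_collisions(addresses, hash_fn, table_size):
    """Count addresses that land in an already occupied slot."""
    seen = set()
    collisions = 0
    for addr in addresses:
        slot = hash_fn(addr) % table_size
        if slot in seen:
            collisions += 1
        else:
            seen.add(slot)
    return collisions

class HashTuner:
    def __init__(self, table_size=64, max_collisions=4):
        self.table_size = table_size
        self.max_collisions = max_collisions
        self.current = 0          # index into HASH_FUNCTIONS

    def after_transaction(self, addresses):
        """Measure collisions; rotate to the next function if there are too many."""
        collisions = count_collisions(
            addresses, HASH_FUNCTIONS[self.current], self.table_size)
        if collisions > self.max_collisions:
            self.current = (self.current + 1) % len(HASH_FUNCTIONS)
        return collisions

tuner = HashTuner()
strided = [i * 512 for i in range(8)]   # eight writes with a 512-byte stride
history = [tuner.after_transaction(strided) for _ in range(4)]
```

For this strided pattern the first two functions map every address to slot 0, while the third spreads them over eight slots, so the tuner settles on it after two rotations.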
Adaptive Contention Management
● No single strategy works in all environments
● Measure contention and implement an adaptive back-off strategy
  ● Wait and retry
  ● Abort later
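A sketch of "wait and retry, abort later" under one possible reading: a transaction that meets a held lock retries a bounded number of times before aborting, and the bound adapts to whether waiting paid off. The doubling/halving policy and the names `BackoffManager` and `lock_released_after` are assumptions; a real system would also pause between attempts.

```python
class BackoffManager:
    """Adapts how many retries a thread spends on a held lock before aborting."""

    def __init__(self, budget=4, min_budget=1, max_budget=64):
        self.budget = budget
        self.min_budget = min_budget
        self.max_budget = max_budget

    def acquire(self, try_lock):
        """try_lock() returns True once the lock was obtained.
        Returns False if the transaction should abort instead."""
        for attempt in range(self.budget + 1):
            if try_lock():
                if attempt > 0:
                    # Waiting paid off: be more patient next time.
                    self.budget = min(self.budget * 2, self.max_budget)
                return True
        # Waiting was futile: abort now and give up sooner next time.
        self.budget = max(self.budget // 2, self.min_budget)
        return False

def lock_released_after(n):
    """Fake lock for the example: it becomes free after n failed attempts."""
    calls = {"count": 0}
    def try_lock():
        calls["count"] += 1
        return calls["count"] > n
    return try_lock

manager = BackoffManager(budget=4)
short_wait = manager.acquire(lock_released_after(2))    # succeeds on the 3rd try
budget_after_success = manager.budget
long_wait = manager.acquire(lock_released_after(100))   # never succeeds in time
budget_after_abort = manager.budget
```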
Local Adaptive STM Parameters (for local hash-table)
[Diagram: axis "# writes vs. hash-table space" with origin 0 and three regions: enlarge write-hash, no change, shrink write-hash]
Local Adaptive STM Parameters (for local hash-table)
[Diagram: axis "# hash collisions" starting at 0 with two regions: no change, change hash-function]
Local Adaptive STM Parameters (for local hash-table)
[Diagram: two axes, "# hash collisions" (starting at 0) and "# writes vs. hash-table space" (origin 0), with the regions: no change, change hash-function, enlarge write-hash, shrink write-hash, enlarge write-hash & change hash-function, shrink write-hash & change hash-function]
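The two measurements combine into one decision per thread-local write-hash. The sketch below treats "# writes vs. hash-table space" as a fill ratio and uses made-up thresholds; `decide` and its parameters are hypothetical, only the six resulting actions come from the slide.

```python
def decide(fill_ratio, collisions, low=0.125, high=0.5, max_collisions=4):
    """Return the adaptations for one thread-local write-hash.

    fill_ratio: average unique writes divided by table slots (assumed metric)
    collisions: hash collisions counted in the last measurement period
    """
    actions = []
    if fill_ratio > high:
        actions.append("enlarge write-hash")
    elif fill_ratio < low:
        actions.append("shrink write-hash")
    if collisions > max_collisions:
        actions.append("change hash-function")
    return actions or ["no change"]

crowded = decide(0.8, 10)       # too many writes and too many collisions
balanced = decide(0.3, 0)       # table fits the workload
sparse = decide(0.05, 9)        # oversized table that still collides
colliding_only = decide(0.3, 9)
```

The size decision and the hash-function decision are independent, which is why the slide shows combined regions such as "enlarge write-hash & change hash-function".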
AdaptSTM
● Adaptive STM system built on presented features
● Statically tuned competitive baseline
  – Static global hash function and hash table
● Mature and stable implementation
● Different local adaptive parameters
  – Write-set hash function and size of hash table
  – Write-through and write-back write strategy
  – Adaptive contention management
Evaluation
● Benchmark: STAMP 0.9.10
  ● ++ configuration (increased workload for kmeans)
● AdaptSTM version 0.5.1
● Intel 4-core Xeon E5520 CPU
  ● 8 cores @ 2.27GHz, 12GB RAM
  ● 64bit Ubuntu 9.04
Evaluation: Global Hash-Table
[Plots: run time in seconds vs. # shifts (0 to 10) at 4 threads for Genome and kmeans; legend: global table sizes 2^16, 2^18, 2^20, 2^22, 2^24, 2^26]
Evaluation: Global Adaptivity
● Global optimizations have limited potential
  ● Small optimization potential
  ● High synchronization cost
  ● Reasonable baseline outperforms global optimization
Evaluation: Local Adaptivity
● Different configurations:
  ● naWB: no adaptivity, use write-back
  ● aWBT: adaptivity, adjust write-through / write-back
  ● aWWH: aWBT plus an adaptive hash-table for the write-set
  ● aWHH: aWWH plus different hash functions
  ● aALL: all adaptive parameters plus Bloom filter for write-entries
● Adaptation system starts with best 'average' parameters, improves from there
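The Bloom filter in the aALL configuration is only named on the slide. As a rough illustration of why such a filter helps, the sketch below keeps a 64-bit summary of written addresses so that a read can skip the write-hash lookup when the address was certainly not written. The bit layout, the 8-byte word size, and the name `WriteFilter` are assumptions.

```python
class WriteFilter:
    """64-bit Bloom filter summarising the addresses in a write-set (sketch)."""

    BITS = 64

    def __init__(self):
        self.mask = 0

    def _bits(self, address):
        word = address >> 3                          # assumed 8-byte words
        first = 1 << (word % self.BITS)
        second = 1 << ((word >> 6) % self.BITS)
        return first | second

    def add(self, address):
        """Called on every transactional write."""
        self.mask |= self._bits(address)

    def may_contain(self, address):
        """False means 'definitely not written'; True means 'check the write-hash'."""
        bits = self._bits(address)
        return (self.mask & bits) == bits

    def clear(self):
        """Called when the transaction commits or aborts."""
        self.mask = 0

writes = WriteFilter()
empty_hit = writes.may_contain(0x1000)
writes.add(0x1000)
written_hit = writes.may_contain(0x1000)
unrelated_hit = writes.may_contain(0x2008)
writes.clear()
hit_after_clear = writes.may_contain(0x1000)
```

False positives are possible and only cost an extra write-hash lookup; false negatives are not possible, which keeps read-after-write results correct.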
Evaluation: Local Adaptivity
● aWBT: adaptive, write-back/-through
● aWWH: adaptive, write-back/-through, write-hash
● aWHH: adaptive, write-back/-through, write-hash, hash-function
● aALL: adaptive, write-back/-through, write-hash, hash-function, Bloom filter
[Plots: speedup relative to the non-adaptive configuration (in %) vs. number of threads (1, 2, 4, 8, 16) for kmeans and Labyrinth; series: aWBT, aWWH, aWHH, aALL]
Evaluation: Local Adaptivity
[Plots: speedup relative to the non-adaptive configuration (in %) vs. number of threads (1, 2, 4, 8, 16) for Genome and Vacation; series: aWBT, aWWH, aWHH, aALL]
Evaluation: Local Adaptivity
● No single optimization works for all benchmarks
● Combination of all options leads to best performance
● Impressive speed-ups for individual benchmarks compared to the globally optimized case
Related Work
● TL2 (Dice et al.): baseline STM system
● Different related work on static tuning of global parameters (Harris, Dice, Ennals, Felber)
  ● Crucial for efficient baseline
● TinySTM (Felber et al.): adapts size and hash function of global locking table
● ASTM (Marathe et al.): adapts lazy-eager locking strategies and different meta-formats
Conclusions
● Adaptivity in STM is important for good performance
  ● Speedups up to 10% possible
● Global optimizations are limited
  ● Low potential, high synchronization cost
● Local optimizations tune thread-local parameters
  ● High correlation with workload
Questions
● Contact: [email protected]
● Source: http://nebelwelt.net/projects/adaptSTM/
Evaluation: Global Hash-Table
[Plots: run time in seconds vs. # shifts (0 to 10) at 4 threads for Bayes, Genome, Vacation, and kmeans; legend: global table sizes 2^16, 2^18, 2^20, 2^22, 2^24, 2^26]
Evaluation: Global Hash-Table
[Plots: run time in seconds vs. # shifts (0 to 10) at 4 threads for Labyrinth, Intruder, SSCA2, and YADA; legend: global table sizes 2^16, 2^18, 2^20, 2^22, 2^24, 2^26]
STM Comparison
[Plots: relative run time vs. number of threads (1, 2, 4, 8, 16) for Genome, Vacation, Labyrinth, and Intruder; series: astm, tl2, tstm, tstm099]
Evaluation: Local Adaptivity
[Plots: speedup relative to the non-adaptive configuration (in %) vs. number of threads (1, 2, 4, 8, 16) for Bayes and SSCA2; series: aWBT, aWWH, aWHH, aALL]
Evaluation: Local Adaptivity
[Plot: speedup relative to the non-adaptive configuration (in %) vs. number of threads (1, 2, 4, 8, 16) for YADA; series: aWBT, aWWH, aWHH, aALL]