concurrency scalability

DESCRIPTION
Herb Sutter (GotW.ca) says that the concept of concurrency is easier to understand if split into three sub-concepts: scalability, responsiveness, and consistency. This presentation is the first of three covering these concepts, starting off with everyone's favorite: scalability, i.e. splitting a CPU-bound problem onto several cores in order to solve the problem faster. I will show which tools .NET offers, but also the performance pitfalls that arise from an escalating problem that has plagued computer architecture for the last 20 years.

TRANSCRIPT
Mårten Rånge, WCOM AB
@marten_range

Concurrency: Examples for .NET
Three pillars of Concurrency
- Scalability (CPU): Parallel.For
- Responsiveness: Task/Future, async/await
- Consistency: lock/synchronized, Interlocked.*, Mutex/Event/Semaphore, Monitor
Scalability
Which is fastest?
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
// ------------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => ints[i] = random.Next());
SHARED STATE: Race condition
SHARED STATE: Poor performance
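Why the `Parallel.For` version above is broken can be shown in a few lines. A minimal sketch (my code, not from the slides, reusing the slides' `InnerLoop` name): every worker thread hammers the one shared `Random` instance, and `Random.Next()` mutates internal state without any synchronization.

```csharp
using System;
using System.Threading.Tasks;

const int InnerLoop = 1_000_000;

var ints = new int[InnerLoop];
var random = new Random(); // one instance shared by all worker threads: a data race

Parallel.For(0, InnerLoop, i => ints[i] = random.Next());

// Random.Next mutates internal state unsynchronized, so results are suspect;
// on .NET Framework a torn state could even degrade to returning 0 forever.
var zeros = 0;
foreach (var v in ints) if (v == 0) ++zeros;
Console.WriteLine($"zeros observed: {zeros} of {InnerLoop}");
```

Even when the output looks plausible, the writes to the generator's internal seed array race with each other, and the constant cache-line traffic on that one object is what the "poor performance" callout refers to.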
Then and now

Metric        | VAX-11/750 ('80) | Today              | Improvement
MHz           | 6                | 3300               | 550x
Memory MB     | 2                | 16384              | 8192x
Memory MB/s   | 13               | R ~10000 / W ~2500 | 770x / 190x
Memory nsec   | 225              | 70                 | 3x
Memory cycles | 1.4              | 210                | -150x
299,792,458 m/s
Speed of light is too slow
0.09 m per clock cycle

99% - latency mitigation
1% - computation
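The 0.09 m figure is just the table's numbers combined: at roughly 3.3 GHz, light covers about 9 cm per clock tick, and a 70 ns memory access costs about 230 cycles. A quick arithmetic check (my own, assuming the clock rate from the table):

```csharp
using System;

const double c = 299_792_458.0;   // speed of light, m/s
const double clockHz = 3.3e9;     // the ~3300 MHz clock from the table

var metersPerCycle = c / clockHz;   // how far light travels in one clock tick
var vaxCycles = 225e-9 * 6e6;       // 225 ns latency at 6 MHz
var todayCycles = 70e-9 * clockHz;  // 70 ns latency at 3.3 GHz

Console.WriteLine($"{metersPerCycle:F3} m per cycle");            // ~0.091
Console.WriteLine($"VAX memory latency: {vaxCycles:F2} cycles");  // ~1.35
Console.WriteLine($"today: {todayCycles:F0} cycles");             // ~231
```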
2 Core CPU
[Diagram: each core (CPU1, CPU2) has a private L1 and L2 cache; the L3 cache and RAM are shared.]
2 Core CPU – L1 Cache
[Diagram sequence: after `new Random()` and `new int[InnerLoop]`, both cores end up holding the Random object in their L1 caches; every write by one core invalidates the other core's copy, so the object ping-pongs between the two L1 caches.]
4 Core CPU – L1 Cache
[Diagram: with `new Random()` and `new int[InnerLoop]` shared by CPU1 through CPU4, the same objects now bounce between four L1 caches, making the contention worse.]
2x4 Core CPU
[Diagram: two 4-core packages (CPU1 to CPU4 and CPU5 to CPU8), each core with private L1/L2 caches, an L3 per package, and shared RAM; sharing data across packages is costlier still.]
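The cache ping-pong in the diagrams can be felt directly with a small experiment. This is a sketch under simple assumptions (64-byte cache lines; absolute timings vary by machine): two threads bump counters that sit on the same cache line, then counters far enough apart to sit on different lines.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

const long Iterations = 50_000_000;

// 32 longs = 256 bytes; indices 0 and 1 share a 64-byte cache line,
// while indices 0 and 16 are 128 bytes apart and do not.
var counters = new long[32];

long Run(int a, int b)
{
    Array.Clear(counters, 0, counters.Length);
    var sw = Stopwatch.StartNew();
    var t1 = Task.Run(() => { for (long i = 0; i < Iterations; ++i) counters[a]++; });
    var t2 = Task.Run(() => { for (long i = 0; i < Iterations; ++i) counters[b]++; });
    Task.WaitAll(t1, t2);
    return sw.ElapsedMilliseconds;
}

Console.WriteLine($"same cache line: {Run(0, 1)} ms");
Console.WriteLine($"separate lines:  {Run(0, 16)} ms");
```

Each thread touches only its own array element, so there is no logical sharing at all; the slowdown in the first run comes purely from the cache line bouncing between cores, which is exactly the false-sharing effect the diagrams illustrate.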
Solution 1 – Locks
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i =>
  {
    lock (ints)
    {
      ints[i] = random.Next();
    }
  });
Solution 2 – No sharing
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop,
  () => new Random(),
  (i, pls, random) =>
  {
    ints[i] = random.Next();
    return random;
  },
  random => {});
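A variant of the no-sharing idea (my sketch, not from the talk) keeps the per-thread `Random` in a `ThreadLocal<T>` instead of `Parallel.For`'s local-state overload:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int InnerLoop = 1_000_000;

var ints = new int[InnerLoop];

// One Random per worker thread; mixing the thread id into the seed avoids the
// classic trap of several Randoms created in the same clock tick producing
// identical sequences.
using var random = new ThreadLocal<Random>(
    () => new Random(Environment.TickCount * 31 + Environment.CurrentManagedThreadId));

Parallel.For(0, InnerLoop, i => ints[i] = random.Value.Next());
```

The local-state overload on the slide is cheaper (no thread-local lookup per iteration), but `ThreadLocal<T>` is handy when the same per-thread resource must outlive a single loop.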
Parallel.For adds overhead
[Diagram: the iteration range is recursively partitioned into a task tree (Level0, Level1, Level2) before the leaf tasks finally touch ints[0] through ints[7].]
Solution 3 – Less overhead
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop / Modulus,
  () => new Random(),
  (i, pls, random) =>
  {
    var begin = i * Modulus;
    var end = begin + Modulus;
    for (var iter = begin; iter < end; ++iter)
    {
      ints[iter] = random.Next();
    }
    return random;
  },
  random => {});
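One caveat with this chunked version (my addition, not in the slides): if `InnerLoop` is not a multiple of `Modulus`, the tail elements are never written. A sketch that rounds the chunk count up and clamps the last chunk:

```csharp
using System;
using System.Threading.Tasks;

const int InnerLoop = 1_000_003;   // deliberately NOT a multiple of Modulus
const int Modulus = 1000;

var ints = new int[InnerLoop];
var chunks = (InnerLoop + Modulus - 1) / Modulus;   // round up

Parallel.For(
    0,
    chunks,
    () => new Random(),
    (i, pls, random) =>
    {
        var begin = i * Modulus;
        var end = Math.Min(begin + Modulus, InnerLoop);   // clamp the last chunk
        for (var iter = begin; iter < end; ++iter)
        {
            ints[iter] = random.Next();
        }
        return random;
    },
    random => { });
```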
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
Solution 4 – Independent runs
var tasks = Enumerable.Range(0, 8)
  .Select(i => Task.Factory.StartNew(
    () =>
    {
      var ints = new int[InnerLoop];
      var random = new Random();
      while (counter.CountDown())
      {
        for (var inner = 0; inner < InnerLoop; ++inner)
        {
          ints[inner] = random.Next();
        }
      }
    },
    TaskCreationOptions.LongRunning))
  .ToArray();
Task.WaitAll(tasks);
Parallel.For
- Only for CPU-bound problems

Sharing is bad
- Kills performance
- Race conditions
- Dead-locks

Cache locality
- RAM is a misnomer
- Class design
- Avoid GC

Natural concurrency
- Avoid Parallel.For

Act like an engineer
- Measure before and after
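"Measure before and after" can start as small as a `Stopwatch` harness (a sketch; a dedicated tool such as BenchmarkDotNet gives more trustworthy numbers):

```csharp
using System;
using System.Diagnostics;

const int InnerLoop = 10_000_000;
var ints = new int[InnerLoop];

long Time(string label, Action action)
{
    action();                        // warm-up run: JIT compilation, cache effects
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
    return sw.ElapsedMilliseconds;
}

var elapsed = Time("sequential fill", () =>
{
    var random = new Random();
    for (var i = 0; i < InnerLoop; ++i) ints[i] = random.Next();
});
```

Timing each variant from the slides with the same harness, on the same machine, before and after a change is the minimum bar; a single unwarmed run proves nothing.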
One more thing…
http://tinyurl.com/wcom-cpuscalability
Mårten Rånge, WCOM AB
@marten_range