Concurrent programming based on dataflow

ACENSI, Tour Monge - 22, Place des Vosges - 92 400 Courbevoie - La Défense 5 - www.acensi.fr

Concurrent programming based on dataflow

TPL DATAFLOW

A new approach to Monte Carlo VAR

09/02/2015 Document version

Upload: microsoft

Post on 18-Jul-2015



OVERVIEW

Optimization and multithreading

without getting your hands dirty!


SUMMARY

TPL Dataflow presentation

▬ Why TPL Dataflow?

▬ A natural extension of Framework 4.0

▬ The library

▬ Use cases

Case study: Monte Carlo Value at Risk (VAR)

▬ What is VAR?

▬ Monte Carlo VAR: basic approach

▬ Monte Carlo VAR: dataflow approach

Conclusion

WHO ARE WE?

Speakers

▬ Yves Alexandre SIMON, R&D Director

▬ James KOUTHON, Technical Director

▬ Julien LEBOT, .NET Expert

▬ Adina SANDOU, .NET Expert

Information systems and Microsoft technologies consulting

TPL DATAFLOW

Presentation

TPL DATAFLOW: A NATURAL EXTENSION OF FRAMEWORK 4.0

Promotes actor- and agent-oriented designs through message-passing primitives.

Allows developers to create blocks that express computations as directed dataflow graphs.


TPL DATAFLOW: THE LIBRARY

Overview

▬ TPL Dataflow falls in line with Map/Reduce

▬ Can handle large volumes of data

▬ Ideal for long computations

TPL Dataflow: paradigm shift

▬ Tasks are created and linked together as a graph

▬ Each node can receive data as input and/or output data


TPL DATAFLOW: THE LIBRARY

Source blocks (1): act as a source of data

▬ ISourceBlock<TOutput>

Target blocks (2): act as a receiver of data

▬ ITargetBlock<TInput>

Propagator blocks: act as both (1) and (2)

▬ IPropagatorBlock<TInput, TOutput>


TPL DATAFLOW: THE LIBRARY

Basic blocks

▬ BufferBlock: a queue, i.e. a FIFO (First In, First Out) buffer.

▬ ActionBlock: like a “foreach”, it executes a delegate for each input item.

    ex: var node = new ActionBlock<string>(s => Console.WriteLine(s));

▬ TransformBlock: acts like a LINQ “Select”.

    ex: var node = new TransformBlock<int, int>(p => p * 100);

Advanced blocks

▬ BroadcastBlock: forwards a copy of each data item to all of its linked targets.

▬ JoinBlock: collects one item from each of its inputs and outputs them together as a tuple.

▬ Others
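The two one-line examples above can be wired into a small runnable graph. The sketch below is only an illustration: it assumes a recent .NET SDK (C# top-level statements, System.Threading.Tasks.Dataflow available), and the names scale, collect and seen are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Propagator block: multiplies each input by 100, like a LINQ Select.
var scale = new TransformBlock<int, int>(p => p * 100);

// Target block: runs a delegate for each item. With default options it
// handles one item at a time, so the list is never written concurrently.
var seen = new List<int>();
var collect = new ActionBlock<int>(value => seen.Add(value));

// Connect the two nodes and let completion flow downstream.
scale.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var i in new[] { 1, 2, 3 })
{
    scale.Post(i);
}

scale.Complete();          // signal that no more input will arrive
collect.Completion.Wait(); // wait until every item has been processed

Console.WriteLine(string.Join(", ", seen)); // prints "100, 200, 300"
```

With default options both blocks process items one at a time and in order, which is why the output order matches the input order here.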


TPL DATAFLOW: THE LIBRARY

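Linking

▬ LinkTo connects two blocks together.

▬ Links accept predicates and link options; parallelism options are set on the blocks themselves.

▬ There’s no limit to what you can link: a block can have any number of sources and targets.

Completion status

▬ Each block supports an asynchronous form of completion to propagate its finished state.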
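As a rough illustration of linking with a predicate, the sketch below routes even numbers to one target and everything else to a second one. It assumes a recent .NET SDK with top-level statements; the block and variable names are hypothetical, not taken from the talk's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var source = new BufferBlock<int>();

var evens = new List<int>();
var odds = new List<int>();
var evenSink = new ActionBlock<int>(n => evens.Add(n));
var oddSink = new ActionBlock<int>(n => odds.Add(n));

var link = new DataflowLinkOptions { PropagateCompletion = true };

// Links are tried in the order they were created: the first one only
// accepts even numbers, the second one takes whatever was declined.
source.LinkTo(evenSink, link, n => n % 2 == 0);
source.LinkTo(oddSink, link);

for (var n = 1; n <= 6; n++)
{
    source.Post(n);
}

source.Complete();
Task.WaitAll(evenSink.Completion, oddSink.Completion);

Console.WriteLine("evens: " + string.Join(",", evens)); // prints "evens: 2,4,6"
Console.WriteLine("odds: " + string.Join(",", odds));   // prints "odds: 1,3,5"
```

An item that no link accepts stays in the source's output queue and blocks the items behind it, so a catch-all link like the second one is a common precaution.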


WHY TPL DATAFLOW?

TPL Dataflow benefits

Paradigm shift for higher code expressivity

Using multithreading without effort

Boosting performance (optimization) painlessly

Focusing on the 'what' rather than the 'how'


TPL DATAFLOW: USE CASES

Build more complex systems easily

Samples:

▬ Data analysis/mining services

▬ Web crawlers

▬ Image and sound processors

▬ Database engine designs

▬ Financial computation


CASE STUDY

Monte Carlo Value at Risk (VAR)

WHAT IS VAR?

What is VAR?

▬ Value at risk (VAR)

▬ Monitors risk in a trading portfolio

▬ Global financial risk indicator

Example

▬ VAR 99/1D: maximum loss over 1 day with 99% probability

Our use case

▬ Market VAR (VAR on market moves)

▬ Intensive computation (especially for Monte Carlo VAR)

VAR calculation methods

▬ Historical VAR (historical data)

▬ Parametric VAR (formula data)

▬ Monte Carlo VAR (Monte Carlo simulation data)
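To make "VAR 99/1D" concrete, here is a minimal sketch that reads the 99% VAR off a list of simulated one-day losses using a nearest-rank quantile. This is one common convention, not necessarily what the talk's StatisticsUtilities.CalculateVar does; ValueAtRisk and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the loss that is not exceeded with the given
// confidence, using the nearest-rank quantile of the simulated losses.
static double ValueAtRisk(IEnumerable<double> losses, double confidence)
{
    var sorted = losses.OrderBy(x => x).ToArray();
    var rank = (int)Math.Ceiling(confidence * sorted.Length);
    return sorted[Math.Clamp(rank - 1, 0, sorted.Length - 1)];
}

// Made-up data: 100 simulated one-day losses of 1, 2, ..., 100.
var simulatedLosses = Enumerable.Range(1, 100).Select(i => (double)i).ToList();

var var99 = ValueAtRisk(simulatedLosses, 0.99);
Console.WriteLine(var99); // prints "99"
```

In this toy distribution, 99 of the 100 scenarios lose 99 or less, so 99 is the loss not exceeded with 99% probability.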


SIMPLE MONTE CARLO VAR WORKFLOW

[Workflow diagram, stages numbered 1 to 4: Start → Portfolios Composition, Market Data, Static Data → Global Position → Position Pricing With Monte Carlo Calculus (three parallel boxes) → Statistics on Global Distribution (VAR) → End]

MONTE CARLO VAR

Basic approach

MONTE CARLO VAR: BASIC APPROACH

Pipeline: Start → Portfolios Composition → Market Data → Global Position → Position Pricing With Monte Carlo Calculus → Statistics on Global Distribution (VAR) → End

MONTE CARLO VAR: BASIC APPROACH

Portfolios composition: fetch the portfolios from the provider

    Portfolios = PortfolioProvider.Portfolios;

Market data: get the product parameters from the market data provider

    ProductParameters = ProductParametersProvider.ProductsParameters;

Global position: go through all portfolios, net the transactions per product and get the positions

    IEnumerable<KeyValuePair<Product, long>> allTransactions = Portfolios
        .SelectMany(x => x.Transactions)
        .GroupBy(y => y.Product)
        .Select(z => new KeyValuePair<Product, long>(z.Key, z.Sum(x => x.Position)));

    Positions = allTransactions.ToDictionary(t => t.Key, t => t.Value);

MONTE CARLO VAR: BASIC APPROACH

Position pricing with Monte Carlo calculus: for each product, run the Monte Carlo simulation

    IEnumerable<double> results = StatisticsUtilities.SimulateMonteCarloWithPosition(
        new MonteCarloInput
        {
            Parameters = parameters,
            Position = position,
            Product = product
        },
        TotalSimulations);

Statistics on global distribution: multiply the result by the position value and calculate the loss

    IList<double> totals = new List<double>();

    Func<IList<double>, string, IList<double>> sumList =
        (current, key) => Helpers.SumList(current, lostsValuesByProduct[key].ToList());

MONTE CARLO VAR: BASIC APPROACH

Statistics on global distribution (VAR)

Aggregate the loss values for all products

    totals = lostsValuesByProduct.Keys.Aggregate(totals, sumList);

Choose the VAR at 99% for 1 day

    StatisticsUtilities.CalculateVar(totals, 0.99);

MONTE CARLO VAR

Dataflow approach

MONTE CARLO VAR: DATAFLOW APPROACH

Dataflow graph: Portfolios Composition and Market Data → Global Position → Position Pricing With Monte Carlo Calculus (three parallel nodes) → Aggregator → Statistics on Global Distribution (VAR)

MONTE CARLO VAR: DATAFLOW APPROACH

Chosen approach: parallelize per product

[Diagram: N threads, one product per thread, each running CalculateLoss() x M iterations]

MONTE CARLO VAR: DATAFLOW APPROACH

Process overview

▬ TransformBlock (Calculate Loss): IN MonteCarloInput (price, mean, standard deviation, position), combined with draws from a normal distribution; OUT IEnumerable<double> (losses)

▬ ActionBlock (Aggregator): IN IEnumerable<double> (losses); OUT IEnumerable<double> (total losses)

MONTE CARLO VAR: DATAFLOW APPROACH

TransformBlock runs the Monte Carlo simulation

Key points:

▬ Do only one thing

▬ Keep work data local

▬ Fully enumerate returned data

    var monteCarlo = new TransformBlock<MonteCarloInput, IEnumerable<double>>(input =>
    {
        var normalDistribution = new NormalEnumerable();
        return normalDistribution.Take(TotalSimulations)
            .Select(alea => StatisticsUtilities.CalculateLoss(input, alea))
            .ToList(); // Very important: runs the simulation inside the block
    }, ExecutionOptions);
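Why "fully enumerate returned data" matters can be shown with a small experiment. The sketch below is an illustration under assumptions (recent .NET SDK, top-level statements, invented names): it counts how many elements were computed by the time the consumer receives each block's output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks.Dataflow;

var lazyCalls = 0;
var eagerCalls = 0;

// Lazy: the block returns a LINQ query that has not been executed yet.
var lazy = new TransformBlock<int, IEnumerable<int>>(n =>
    Enumerable.Range(1, n).Select(i => { lazyCalls++; return i * i; }));

// Eager: ToList() executes the query inside the block, before handing it on.
var eager = new TransformBlock<int, IEnumerable<int>>(n =>
    Enumerable.Range(1, n).Select(i => { eagerCalls++; return i * i; }).ToList());

lazy.Post(5);
eager.Post(5);

var lazyResult = lazy.Receive();
var eagerResult = eager.Receive();

var lazyCallsAtReceive = lazyCalls;   // nothing has been computed yet
var eagerCallsAtReceive = eagerCalls; // all five elements already computed

var lazySum = lazyResult.Sum();   // the work happens here, in the consumer
var eagerSum = eagerResult.Sum(); // only reads the precomputed list

Console.WriteLine($"{lazyCallsAtReceive} {eagerCallsAtReceive} {lazySum} {eagerSum}");
// prints "0 5 55 55"
```

Without the ToList() call, the expensive work would move out of the parallel TransformBlock and into whichever block enumerates the sequence, which in the talk's graph is the single-threaded aggregator.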


MONTE CARLO VAR: DATAFLOW APPROACH

ActionBlock aggregates the result

No need to synchronize access to shared data: by default, an ActionBlock processes one message at a time

    var totals = new List<double>();

    var aggregate = new ActionBlock<IEnumerable<double>>(doubles =>
    {
        if (!totals.Any())
        {
            // First result: use it to initialise the running totals
            totals.AddRange(doubles);
        }
        else
        {
            // Later results: add them scenario by scenario
            var losses = doubles.ToList();
            for (var i = 0; i < losses.Count; i++)
            {
                totals[i] += losses[i];
            }
        }
    });
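The pattern can be tried end to end with stand-in data. In the sketch below (assumptions: recent .NET SDK, top-level statements, invented names, and a trivial stand-in for the simulation), a parallel TransformBlock feeds an ActionBlock that keeps its default MaxDegreeOfParallelism of 1, so the running totals need no lock.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks.Dataflow;

var parallel = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

// Stand-in for the simulation: product k yields three "losses" equal to k.
var simulate = new TransformBlock<int, double[]>(
    k => Enumerable.Repeat((double)k, 3).ToArray(),
    parallel);

// Default options: one message at a time, so totals is never
// touched by two threads at once.
var totals = new double[3];
var aggregate = new ActionBlock<double[]>(losses =>
{
    for (var i = 0; i < losses.Length; i++)
    {
        totals[i] += losses[i];
    }
});

simulate.LinkTo(aggregate, new DataflowLinkOptions { PropagateCompletion = true });

for (var k = 1; k <= 8; k++)
{
    simulate.Post(k);
}

simulate.Complete();
aggregate.Completion.Wait();

Console.WriteLine(string.Join(", ", totals)); // prints "36, 36, 36"
```

If the aggregator were given a MaxDegreeOfParallelism above 1, the unsynchronized += would become a data race, so the no-lock guarantee depends on leaving that option at its default.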


MONTE CARLO VAR: DATAFLOW APPROACH

Linking the blocks together

Triggering the data flow chain

Data posted asynchronously

    // Link first, so results flow to the aggregator as soon as they are produced
    monteCarlo.LinkTo(aggregate, DataflowLinkOptions);

    // One message per product, carrying its net position across all portfolios
    foreach (var portfolio in Portfolios
        .SelectMany(x => x.Transactions)
        .GroupBy(y => y.Product)
        .Select(z => new KeyValuePair<Product, long>(z.Key, z.Sum(x => x.Position))))
    {
        var position = portfolio.Value;
        var parameters = ProductParameters.First(x => x.Product.Equals(portfolio.Key));

        monteCarlo.Post(new MonteCarloInput
        {
            Parameters = parameters,
            Position = position
        });
    }

MONTE CARLO VAR: DATAFLOW APPROACH

Completing the tasks

Tricky to get right

▬ Can cause deadlocks

▬ Solution: Automatically propagate completion

    monteCarlo.Complete();
    aggregate.Completion.Wait();

    DataflowLinkOptions = new DataflowLinkOptions
    {
        PropagateCompletion = true
    };
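Automatic propagation also carries failures downstream, which is what keeps a Wait() on the last block from hanging when an earlier block throws. The sketch below illustrates this with made-up blocks (assumptions: recent .NET SDK, top-level statements, hypothetical names).

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// First block fails on the value 2.
var first = new TransformBlock<int, int>(n =>
{
    if (n == 2) throw new InvalidOperationException("simulated failure");
    return n;
});
var last = new ActionBlock<int>(n => { });

// With PropagateCompletion, both normal completion and faults of
// "first" are forwarded to "last".
first.LinkTo(last, new DataflowLinkOptions { PropagateCompletion = true });

first.Post(1);
first.Post(2);
first.Complete();

string outcome;
try
{
    last.Completion.Wait();
    outcome = "completed";
}
catch (AggregateException ex)
{
    // Propagated exceptions can arrive nested; Flatten() unwraps them.
    outcome = ex.Flatten().InnerExceptions[0].GetType().Name;
}

Console.WriteLine(outcome); // prints "InvalidOperationException"
```

Without PropagateCompletion (or an equivalent manual continuation), last would never be completed or faulted, and the Wait() call would block indefinitely.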


MONTE CARLO VAR: DATAFLOW APPROACH

Manual completion propagation

    monteCarlo.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted)
        {
            ((IDataflowBlock)aggregate).Fault(t.Exception); // Pass exception
        }
        else
        {
            aggregate.Complete(); // Mark next completed
        }
    });

Maximizing CPU usage

    ExecutionOptions = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };

MONTE CARLO VAR: DATAFLOW APPROACH

Result

[Bar chart: Benchmark (lower is better). Execution time in milliseconds (axis from 0 to 4000) per CPU, Basic vs. Dataflow. CPUs: i5-4200U 4 @ 2.30GHz; Intel Celeron G1820 2 @ 2.70GHz; Intel i5-2400 4 @ 3.00GHz; i7-3770K w/ 8 @ 5.09GHz; i7-4790K w/ 8 @ 4.00GHz]

CONCLUSION

What did we learn?

CONCLUSION

Performance increase

▬ Faster

▬ Automatically scales to hardware

Paradigm shift

▬ Macro-level optimization

▬ New primitives

Find out more!

▬ github.com/acensi/techdays-2015

▬ msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx

▬ github.com/akkadotnet/akka.net

Going further

▬ Experiment with the code

▬ Parallelize data loading

▬ Try new blocks

▬ Come see us at the booth

www.acensi.fr

Let’s keep the conversation going!

Come see us at booth 26
