gpu applications for modern large scale asset management · gpu applications for modern large scale...

GPU Applications for Modern Large Scale AssetManagement

GTC 2014 – San Jose, California

Dr. Daniel Egloff

QuantAlea & IncubeAdvisory

March 27, 2014

Outline

Portfolio Construction

Constructing Asset Return Distributions

I) Entropy ApproachLarge Scale Convex Optimization with GPUs

II) Bayesian ApproachParallel Metropolis Hastings on GPUs

Conclusion

Portfolio Construction

I Markowitz mean variance portfolio optimizationI Optimally balance risk and performance

w∗λ = arg maxw , 1Tw=b

Uλ(w), Uλ(w) = µ>w − λw>Σw (1)

I Risk budgetingI Construct portfolio so that risk contributions satisfy

RCi (w) = bi for given risk budgets biI For volatility risk measure we obtain

RCi (w) = wi(Σw)i√w>Σw

(2)

I For both approaches we need means µ and covariances Σ ofasset returns

Asset Return Distributions

Two steps

1. Prior asset return distribution from historical data or inverseoptimization principle

2. Posterior asset return distribution should reflect expert viewsand beliefs

I Views are distortions of a prior distributionI Views translate into constraintsI Linear constraints on mean return pioneered by Black and

Litterman

We discuss two approaches

I Entropy based approach

I Bayesian methods

Entropy Approach

I Minimum discrimination principle (Kullback and Leibler 1951)infers a posterior distribution such that it

I Satisfies a set of moments constraintsI Is as hard as possible to differentiate from given prior

distribution

I Minimum discrimination principle suitable methodology toincorporate constraints on a prior distribution which are

I LinearI QuadraticI Convex

Nonparametric Setup

I Sample the probability distributionsI Discrete sample space X = {x1, . . . , xm}I Probabilities elements of the standard simplex

p = (p1, . . . , pm)> ∈ ∆m ⊂ Rm

I Implement views as distortions of prior probabilities piI Express views as feature functions fj : X → R, j = 1, . . . , n

I Identify feature vector f = (f1, . . . , fn)> : X → Rn with matrixF = (fji ) ∈ Rn×m, with fji = fj(xi )

Linear Constraints on Features

I Expectation of feature f : X → R w.r.t measure q ∈ ∆m is

Eq[f ] =∑

qi f (xi ) = q>f

I Minimum discrimination principle defines probability p∗ as

p∗ = arg minq∈∆m

q> (log(q)− log(p)) (9)

subject to constraints

Eq[fj ] ≤ cj , j = 1, . . . , n, Eq[fj ] = cj , j = 1, . . . , n. (10)

Optimization Problem

I Expectation views (10) lead to linear constraints

p∗ = arg minq ∈ Rm

+, 1>q = 1

Fq ≤ c , F q = c

q> (log(q)− log(p)) (11)

I Need large number of samples for accurate representation ofdistributions with m ' 106

I With linear constraints: Lagrange duality to solve the optimalsolution

I Strong duality, i.e. no duality gapI Converting problem of dimension m to much smaller problems

of dimension n + n to find optimal Lagrange multipliers

Convex Constraints

I How to extend to variance and covariance constraints?

I Lower variance constraint q>f 2 − (q>f )2 ≥ c convex but dualproblem is ill-conditioned

I Upper variance constraint q>f 2− (q>f )2 ≤ c not even convex

I Approximation by linearization q>f 2 ≤ c + (p>f )2 to get asecond moment constraint

I But .... linearized version often not accurate enough

Large Scale Convex Optimization

I For convex constraints solve directly the primal problem

I Several accelerated first order methods (FOM) for convexoptimization algorithms have been developed

I Modern algorithms adjust to the problem’s geometryI Examples

I Mirror descent (MD) algorithmI Level bundle methodsI Bundle mirror descentI Many different variations

Exploiting Geometry

I Gradient methods regularized with quadratic proximity term

y = arg minx∈X

{γ〈ξ,∇f (x)〉+ c‖ξ − x‖2}

I Replace ‖ · ‖2 with some distance like function D(·, ·) thatbetter exploits the geometry of X and is simple to calculate

I Candidates: Bregman type distance based on kernelω : X → R

D(x , y) = Dω(x , y) ≡ ω(x)− ω(y)− 〈x − y , ω′(y)〉

Bregman Distances

I Using ω(u) = 12‖u‖

2 gives Euclidian norm distance

Dω(x , y) =1

2‖x − y‖2

I The entropy ω(x) = x> log(x) on X = ∆m leads to

Dω(x , y) = x> log

(x

y

)which is 1-strongly convex with respect to the L1 norm

Dω(x , y) ≥ 1

2‖x − y‖2

1 ∀x , y ∈ ∆m

Constrained Mirror Descent Setup

I Convex problem minx∈X f0(x) subject to f1(x) ≤ 0

I X ⊂ E , closed convex subset of f.d. vector space E

I ‖ · ‖ norm on E , ‖ · ‖∗ dual norm on E ∗

I Duality pairing 〈·, ·〉 : E ∗ × E → RI Bregman distance Dω(x , y) from kernel ω : X → RI Prox mapping Proxx : E ∗ → X ◦

Proxx(ξ) = arg minu∈X

{〈ξ, u〉+ Dω(u, x)}

Constrained Mirror Descent

I Mirror descent with constraints is a simple FOM

I Initial search point x1 = xω = arg minx∈X ω(x)I For t = 1, . . . ,N do

I If f1(xt) ≤ tol then i(t) = 0 else i(t) = 1I Select step size γtI Update search point xt+1 = Proxxt (γt f

′i(t)(xt))

I Approximate solution after N steps

xN = arg minx∈{xt |i(t)=0}

f0(x)

I Possible choice tol = γ‖f ′1(xt)‖∗ and γt = γ‖f ′i(t)(xt)‖−1∗

Geometry on Simplex

I Optimization problem (11) formulated on simplex X = ∆m

I Norm ‖ · ‖1 with dual norm ‖ · ‖∞I Prox mapping allows analytical expression

Proxx(ξ) =(x>e−ξ

)−1x • e−ξ

I Initial point xω = n−11n the uniform distribution

Constrained Mirror Descent on GPU

I GPU implementation of constrained mirror descent algorithmsfor very large n

I Parallel coordinate-wise calculation of gradients for objectivefunction and constraints

I Vector reductions to calculate objective value, norms andscalar products

I Parallel coordinate-wise calculation of prox vector, searchpoints and execution of gradient step

I Gradient steps can be further parallelized by coordinateaggregation or randomization methods

Schematic Gradient Step

Gradient step

Test convergence

Calculate objective value

Calculate constraints and objective values and gradients

Calculate dual norm of gradients and update approximate solution

Calculate unnormalized prox vector

Calculate prox normalization factor

Normalize prox vector and update search point

No

Parallel reduction algorithm

(2 kernels)

Parallel vector transform

kernel

Tuning with Kernel Fusion

I Usually kernel launch time (≈ 50µs) not an issue

I For algorithms with many iterations, this becomes crucial

I Use kernel fusion technique to reduce number of kernel callsper gradient step

I Minimize number of arguments to pass to each kernel

Bayesian Approach

I Bayesian methods treat the model parameters as randomvariables

I Views go into the prior distributions of the model parameters

I Historical data is used to update the prior distribution

I Not as flexible as entropy approach

I Does not rely on optimization hence less fragile

I Work horse is a parallel Metropolis Hastings Markov chainMonte Carlo sampler

Parallel Metropolis Hastings

I Parallelization of Metropolis Hastings through multipleparallel chains

I Further parallelization through pre-fetching

I Require efficient parallel random number generation andbranch free methods to sample from proposal distribution


GPU Metropolis Hastings sampler generation from generic targetand proposal distribution expressions with F# and Alea.cuBase

let inline mhsamples (targetpdf : Expr<’T -> ’T>)

(proppdf : Expr<’T -> ’T -> ’T -> ’T>)

(proprnd : Expr<’T -> ’T -> ’T -> ’T>) = cuda {

let! proppdf = proppdf |> Compiler.DefineInlineFunction

let! proprnd = proprnd |> Compiler.DefineInlineFunction

let! targetpdf = targetpdf |> Compiler.DefineInlineFunction

let! kernel =

<@ fun (steps:int) (scale:’T) (initial:deviceptr<’T>)

(numbers:deviceptr<’T>) (result:deviceptr<’T>) ->

let idx = blockIdx.x * blockDim.x + threadIdx.x

let nchains = gridDim.x * blockDim.x

. . .


Expressions for target distribution and proposal distribution arecompiled directly into kernel

/// Dynamically generate expressions for MH at runtime

let targetpdf =

<@ fun x ->

exp(-x*x)*(2G + sin(5G*x) + sin(2G*x)) @>

let proppdf =

<@ fun sigma x y ->

exp(-(y - x)*(y - x)/(2G*sigma*sigma)) / (sigma*__sqrt2pi()) @>

let proprng =

<@ fun sigma x u -> x + sigma*u @>

/// Compiler integrated into F#, callable through API

let mh = mhsamples targetpdf proppdf proprng |> Util.load Worker.Default

Example

−8 −6 −4 −2 0 2 4 6 80

0.5

1

1.5

2

2.5x 10

4

Figure : Sample histogram fortarget densitye−x

2

(2 + sin(5x) + sin(2x))

−10 −8 −6 −4 −2 0 2 4 6 80

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Figure : Kernel densitysmoothed sample pdf fore−x

2

(2 + sin(5x) + sin(2x))

Conclusion

I We have shown two applications of GPUs for quantitativeasset management

I More applications in strategy calibration, back-testing andMIP

I Large scale convex optimization occurs in other fields such asI Machine learning such as e.g. deep believe network training in

a big data contextI Network analysisI Truss topology designI Compressed sensing

I Markov chain Monte Carlo methods require flexible ways tospecify various parts such as target pdf, the proposaldistribution and a dynamic way to derive tuning parameters,for which our Alea.cuBase F# GPU compiler is ideal

gpu applications for modern large scale asset management · gpu applications for modern large scale...

Documents