slides portfolio-2017-2

Arthur Charpentier, Université de Rennes 1, Portfolio Optimization - 2017

Portfolio Optimization # 1

A. Charpentier (Université de Rennes 1)

Université de Rennes 1, 2017/2018

@freakonometrics freakonometrics freakonometrics.hypotheses.org

https://twitter.com/freakonometrics

https://freakonometrics.github.io/

https://freakonometrics.hypotheses.org/


Markowitz (1952) & Theoretical Approach

Following Markowitz (1952), consider n infinitely divisible assets.

Their returns are random variables, denoted X, (jointly) normally distributed, $X \sim \mathcal{N}(\mu,\Sigma)$, i.e. $\mathbb{E}[X] = \mu$ and $\text{var}(X) = \Sigma$.

Let α denote the weights of a given portfolio.

Portfolio risk is measured by its variance, $\text{var}(\alpha^T X) = \sigma^2_\alpha = \alpha^T\Sigma\alpha$.

For minimal variance portfolios with a given portfolio return r, the optimization problem can be stated as

$$\alpha^\star = \underset{\alpha}{\text{argmin}}\ \alpha^T\Sigma\alpha \quad \text{s.t.} \quad \mathbb{E}(\alpha^T X) = \alpha^T\mu = r \ \text{ and } \ \alpha^T\mathbf{1} = 1$$

Allocation α is said to be efficient if it is not possible to find another one with the same expected return and a strictly lower variance or, dually, another one with the same variance and a strictly higher expected return.


https://www.math.ust.hk/~maykwok/courses/ma362/07F/markowitz_JF.pdf






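Recall that to solve a program

$$\alpha^* = \underset{\alpha}{\text{argmin}}\ f(\alpha) \quad \text{s.t.} \quad g_1(\alpha) = \cdots = g_p(\alpha) = 0,$$

where f, g_1, ..., g_p are continuously differentiable functions, a necessary condition for α* to be a solution (under a standard regularity condition on the constraints) is that (α*, λ) solves the n + p first order conditions

$$\frac{\partial}{\partial \alpha_i}\big(f(\alpha) + \lambda_1 g_1(\alpha) + \cdots + \lambda_p g_p(\alpha)\big) = 0 \ \text{ for } i = 1, 2, \ldots, n,$$

$$\frac{\partial}{\partial \lambda_j}\big(f(\alpha) + \lambda_1 g_1(\alpha) + \cdots + \lambda_p g_p(\alpha)\big) = 0 \ \text{ for } j = 1, 2, \ldots, p.$$

This condition is also sufficient when f is convex and the g_j's are linear, as in the problems below.

Constants λ = (λ_1, ..., λ_p) are Lagrange multipliers, and the function

$$(\alpha, \lambda) \mapsto f(\alpha) + \lambda_1 g_1(\alpha) + \cdots + \lambda_p g_p(\alpha)$$

is the Lagrangian associated with the optimization program.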







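Here we want to minimize $\text{var}(\alpha^T X) = \alpha^T\Sigma\alpha$ (a quadratic function) under two (linear) constraints. We want to solve

$$\frac{\partial}{\partial \alpha_i}\big(\alpha^T\Sigma\alpha + \lambda_1(\alpha^T\mathbf{1} - 1) + \lambda_2(\alpha^T\mu - r)\big) = 0 \ \text{ for } i = 1, 2, \ldots, n,$$

$$\frac{\partial}{\partial \lambda_j}\big(\alpha^T\Sigma\alpha + \lambda_1(\alpha^T\mathbf{1} - 1) + \lambda_2(\alpha^T\mu - r)\big) = 0 \ \text{ for } j = 1, 2.$$

Observe that

$$\frac{\partial}{\partial\alpha}\alpha^T\Sigma\alpha = 2\Sigma\alpha \quad\text{and}\quad \frac{\partial}{\partial\alpha}\mu^T\alpha = \mu,$$

so that $2\Sigma\alpha^* + \lambda_1\mathbf{1} + \lambda_2\mu = 0$. Writing λ_1 and λ_2 again for the rescaled multipliers −λ_2/2 and −λ_1/2, α* has to be $\alpha^* = \lambda_1\Sigma^{-1}\mu + \lambda_2\Sigma^{-1}\mathbf{1}$, where the Lagrange multipliers are given by the two constraints

$$\lambda_1\mathbf{1}^T\Sigma^{-1}\mu + \lambda_2\mathbf{1}^T\Sigma^{-1}\mathbf{1} = 1$$

$$\lambda_1\mu^T\Sigma^{-1}\mu + \lambda_2\mu^T\Sigma^{-1}\mathbf{1} = r,$$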







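Set $a = \mathbf{1}^T\Sigma^{-1}\mu$, $b = \mu^T\Sigma^{-1}\mu$ and $c = \mathbf{1}^T\Sigma^{-1}\mathbf{1}$, and write ε for the target return r. The last system becomes

$$\lambda_1 a + \lambda_2 c = 1, \qquad \lambda_1 b + \lambda_2 a = \varepsilon.$$

Set $d = bc - a^2$, so that

$$\lambda_1 = \frac{c\varepsilon - a}{d} \quad\text{and}\quad \lambda_2 = \frac{b - a\varepsilon}{d}.$$

From that expression of the Lagrange multipliers, use the first order conditions to express $\Sigma\alpha^* = \lambda_1\mu + \lambda_2\mathbf{1}$. Since $\alpha^{*T}\mu = \varepsilon$ and $\alpha^{*T}\mathbf{1} = 1$, the optimal variance of the portfolio is

$$\sigma^2_* = \alpha^{*T}\Sigma\alpha^* = \lambda_1\varepsilon + \lambda_2,$$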







Hence, without any risk-free asset, the efficient frontier in the mean-variance plane is a parabola

$$\sigma^2_* = \frac{c\varepsilon^2 - 2a\varepsilon + b}{d}$$

where ε is the expected return of the portfolio.

Further, we have optimal weights

$$\alpha^* = \Sigma^{-1}(\lambda_1\mu + \lambda_2\mathbf{1}),$$

which is a linear (affine) function of ε, since λ_1 and λ_2 are.
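As a quick numerical check of these closed-form expressions, here is a base R sketch. It uses the three-asset inputs (µ, Σ) of the Zivot (2013) example introduced below. The target return eps = 0.0427 and all variable names are illustrative choices. The code computes a, b, c, d and the multipliers, then verifies both constraints and the parabola σ²* = (cε² − 2aε + b)/d.

```r
# Three-asset inputs of the Zivot (2013) example (MSFT, NORD, SBUX)
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
Si <- solve(Sigma)

# Scalars of the derivation ('c0' stands for c, to keep base::c readable)
a  <- as.numeric(t(one) %*% Si %*% mu)
b  <- as.numeric(t(mu) %*% Si %*% mu)
c0 <- as.numeric(t(one) %*% Si %*% one)
d  <- b * c0 - a^2

eps <- 0.0427                                   # target expected return
lambda1 <- (c0 * eps - a) / d
lambda2 <- (b - a * eps) / d
alpha <- as.numeric(Si %*% (lambda1 * mu + lambda2 * one))

sig2.closed <- (c0 * eps^2 - 2 * a * eps + b) / d        # parabola
sig2.direct <- as.numeric(t(alpha) %*% Sigma %*% alpha)   # alpha' Sigma alpha
round(alpha, 5)
c(closed = sig2.closed, direct = sig2.direct)
```

The weights should coincide with the ones obtained below by solving the bordered Lagrangian system for the MSFT target.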






Computational Aspects

Consider the following three assets, as in Zivot (2013)

> asset.names <- c("MSFT", "NORD", "SBUX")
> mu.vec = c(0.0427, 0.0015, 0.0285)
> names(mu.vec) = asset.names
> sigma.mat = matrix(c(0.0100, 0.0018, 0.0011, 0.0018, 0.0109,
+                      0.0026, 0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
> dimnames(sigma.mat) = list(asset.names, asset.names)
> mu.vec
  MSFT   NORD   SBUX
0.0427 0.0015 0.0285
> sigma.mat
       MSFT   NORD   SBUX
MSFT 0.0100 0.0018 0.0011
NORD 0.0018 0.0109 0.0026
SBUX 0.0011 0.0026 0.0199


https://faculty.washington.edu/ezivot/econ424/






The figure (not reproduced here) shows the three assets in the (σ, µ)-plane, possibly together with many more portfolios.
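The figure itself is lost. The following base R sketch draws a comparable picture, assuming that the "many more" points are random long-only portfolios with weights drawn uniformly on the simplex. It also defines sd.vec, the vector of asset volatilities that the frontier plot further down uses to place the asset labels.

```r
mu.vec <- c(MSFT = 0.0427, NORD = 0.0015, SBUX = 0.0285)
sigma.mat <- matrix(c(0.0100, 0.0018, 0.0011,
                      0.0018, 0.0109, 0.0026,
                      0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3,
                    dimnames = list(names(mu.vec), names(mu.vec)))
sd.vec <- sqrt(diag(sigma.mat))            # individual asset volatilities

set.seed(1)
n.port <- 1000
w <- matrix(rexp(3 * n.port), ncol = 3)
w <- w / rowSums(w)                        # random long-only weights summing to one
port.mu <- as.numeric(w %*% mu.vec)
port.sd <- sqrt(rowSums((w %*% sigma.mat) * w))   # sqrt of w_i' Sigma w_i, row by row

plot(port.sd, port.mu, pch = ".", col = "grey",
     xlim = c(0, 0.17), ylim = c(0, 0.06),
     xlab = expression(sigma[p]), ylab = expression(mu[p]))
points(sd.vec, mu.vec, pch = 16, col = "blue")
text(sd.vec, mu.vec, labels = names(mu.vec), pos = 4)
```

None of the random portfolios can have a lower volatility than the global minimum variance portfolio computed below.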







Return and standard deviation of a given portfolio, here the equally weighted one, are given by

> x.vec = rep(1, 3)/3
> names(x.vec) = asset.names
> mu.p.x = crossprod(x.vec, mu.vec)
> sig2.p.x = t(x.vec) %*% sigma.mat %*% x.vec
> sig.p.x = sqrt(sig2.p.x)
> mu.p.x
        [,1]
[1,] 0.02423
> sig.p.x
        [,1]
[1,] 0.07587

The global minimum variance portfolio is given by solving

$$\min_\alpha \alpha^T\Sigma\alpha \quad\text{s.t.}\quad \alpha^T\mathbf{1} = 1$$






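First order conditions on the Lagrangian yield

$$\underbrace{\begin{pmatrix} 2\Sigma & \mathbf{1} \\ \mathbf{1}^T & 0 \end{pmatrix}}_{A} \begin{pmatrix} \alpha^\star \\ \lambda \end{pmatrix} = \underbrace{\begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix}}_{b}$$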







Here we have

> top.mat = cbind(2*sigma.mat, rep(1, 3))
> bot.vec = c(rep(1, 3), 0)
> Am.mat = rbind(top.mat, bot.vec)
> b.vec = c(rep(0, 3), 1)
> z.m.mat = solve(Am.mat) %*% b.vec
> m.vec = z.m.mat[1:3, 1]
> m.vec
  MSFT   NORD   SBUX
0.4411 0.3656 0.1933

The portfolio return and standard deviation are

> mu.gmin = as.numeric(crossprod(m.vec, mu.vec))
> mu.gmin
[1] 0.02489
> sig2.gmin = as.numeric(t(m.vec) %*% sigma.mat %*% m.vec)
> sig.gmin = sqrt(sig2.gmin)
> sig.gmin
[1] 0.07268

Another way to compute it is to go further with the analytical expression, and to derive

$$\alpha^\star = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}$$

> one.vec = rep(1, 3)
> sigma.inv.mat = solve(sigma.mat)
> top.mat = sigma.inv.mat %*% one.vec
> bot.val = as.numeric(t(one.vec) %*% sigma.inv.mat %*% one.vec)
> m.mat = top.mat/bot.val
> m.mat[, 1]
  MSFT   NORD   SBUX
0.4411 0.3656 0.1933






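Efficient portfolios are obtained by solving

$$\min_\alpha \alpha^T\Sigma\alpha \quad\text{s.t.}\quad \alpha^T\mu = r_0 \ \text{ and } \ \alpha^T\mathbf{1} = 1$$

First order conditions on the Lagrangian yield

$$\underbrace{\begin{pmatrix} 2\Sigma & \mu & \mathbf{1} \\ \mu^T & 0 & 0 \\ \mathbf{1}^T & 0 & 0 \end{pmatrix}}_{A} \begin{pmatrix} \alpha^\star \\ \lambda_1 \\ \lambda_2 \end{pmatrix} = \underbrace{\begin{pmatrix} \mathbf{0} \\ r_0 \\ 1 \end{pmatrix}}_{b}$$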

If our target is the same expected return as Microsoft (MSFT),

> top.mat = cbind(2*sigma.mat, mu.vec, rep(1, 3))
> mid.vec = c(mu.vec, 0, 0)
> bot.vec = c(rep(1, 3), 0, 0)
> A.mat = rbind(top.mat, mid.vec, bot.vec)
> bmsft.vec = c(rep(0, 3), mu.vec["MSFT"], 1)
> z.mat = solve(A.mat) %*% bmsft.vec
> x.vec = z.mat[1:3, ]
> x.vec
    MSFT     NORD     SBUX
 0.82745 -0.09075  0.26329

The portfolio return and standard deviation are

> mu.px = as.numeric(crossprod(x.vec, mu.vec))
> mu.px
[1] 0.0427
> sig2.px = as.numeric(t(x.vec) %*% sigma.mat %*% x.vec)
> sig.px = sqrt(sig2.px)
> sig.px
[1] 0.09166

If our target is the same expected return as Starbucks (SBUX),

> bsbux.vec = c(rep(0, 3), mu.vec["SBUX"], 1)
> z.mat = solve(A.mat) %*% bsbux.vec
> y.vec = z.mat[1:3, ]
> y.vec
  MSFT   NORD   SBUX
0.5194 0.2732 0.2075

with expected return and standard deviation

> mu.py = as.numeric(crossprod(y.vec, mu.vec))
> sig2.py = as.numeric(t(y.vec) %*% sigma.mat %*% y.vec)
> sig.py = sqrt(sig2.py)
> mu.py
[1] 0.0285
> sig.py
[1] 0.07355

Observe that these two portfolios are actually highly correlated:

> sigma.xy = as.numeric(t(x.vec) %*% sigma.mat %*% y.vec)
> rho.xy = sigma.xy/(sig.px*sig.py)
> rho.xy
[1] 0.8772

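Here again, one can go further with the analytical expressions: the first order conditions give

$$\alpha^\star = -\frac{1}{2}\lambda_1\Sigma^{-1}\mu - \frac{1}{2}\lambda_2\Sigma^{-1}\mathbf{1} = -\frac{1}{2}\Sigma^{-1}M\lambda$$

where M = [µ, 1] is the n × 2 matrix with columns µ and 1.

Plugging this expression into the two constraints $M^T\alpha^\star = (r_0, 1)^T$ shows that λ = (λ_1, λ_2) is the solution of

$$-\frac{1}{2}\underbrace{\begin{pmatrix} \mu^T\Sigma^{-1}\mu & \mu^T\Sigma^{-1}\mathbf{1} \\ \mathbf{1}^T\Sigma^{-1}\mu & \mathbf{1}^T\Sigma^{-1}\mathbf{1} \end{pmatrix}}_{B}\lambda = \underbrace{\begin{pmatrix} r_0 \\ 1 \end{pmatrix}}_{\tilde r_0}$$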

thus $\lambda = -2B^{-1}\tilde r_0$, and $\alpha^\star = \Sigma^{-1}MB^{-1}\tilde r_0$:

> M.mat = cbind(mu.vec, one.vec)
> B.mat = t(M.mat) %*% solve(sigma.mat) %*% M.mat
> mu.tilde.msft = c(mu.vec["MSFT"], 1)
> x.vec.2 = solve(sigma.mat) %*% M.mat %*% solve(B.mat) %*% mu.tilde.msft
> x.vec.2
         [,1]
MSFT  0.82745
NORD -0.09075
SBUX  0.26329

We can now compute the efficient frontier.

We have seen that the optimal weights are an affine function of the target return. Hence any minimum variance portfolio can be written as an affine combination (weights summing to one, possibly negative) of any two minimum variance portfolios with different target expected returns.

Consider our two previous portfolios, x and y. For a scalar a, define the portfolio z = a x + (1 − a) y.

> a = 0.5
> z.vec = a*x.vec + (1-a)*y.vec
> z.vec
  MSFT   NORD   SBUX
0.6734 0.0912 0.2354

with expected return and standard deviation

> mu.pz = as.numeric(crossprod(z.vec, mu.vec))
> sig2.pz = as.numeric(t(z.vec) %*% sigma.mat %*% z.vec)
> sig.pz = sqrt(sig2.pz)
> mu.pz
[1] 0.0356
> sig.pz
[1] 0.08006

or, equivalently,

> mu.pz = a*mu.px + (1-a)*mu.py
> sig.xy = as.numeric(t(x.vec) %*% sigma.mat %*% y.vec)
> sig2.pz = a^2 * sig2.px + (1-a)^2 * sig2.py + 2*a*(1-a)*sig.xy
> sig.pz = sqrt(sig2.pz)
> mu.pz
[1] 0.0356
> sig.pz
[1] 0.08006

If we have a given target expected return, we first have to find the appropriate weight a. For instance, if we want the same return as Nordstrom (NORD),

> a.nord = (mu.vec["NORD"] - mu.py)/(mu.px - mu.py)
> z.nord = a.nord*x.vec + (1 - a.nord)*y.vec
> z.nord
    MSFT     NORD     SBUX
-0.06637  0.96509  0.10128







Expected return and standard deviation are then

> mu.pz.nord = a.nord*mu.px + (1-a.nord)*mu.py
> sig2.pz.nord = a.nord^2 * sig2.px + (1-a.nord)^2 * sig2.py +
+   2*a.nord*(1-a.nord)*sigma.xy
> sig.pz.nord = sqrt(sig2.pz.nord)
> mu.pz.nord
  NORD
0.0015
> sig.pz.nord
  NORD
0.1033







Now, to compute the efficient frontier, consider a sequence of values of a, combining the global minimum variance portfolio m.vec with the MSFT efficient portfolio x.vec:

a = seq(from = 1, to = -1, by = -0.1)
n.a = length(a)
z.mat = matrix(0, n.a, 3)
mu.z = rep(0, n.a)
sig2.z = rep(0, n.a)
sig.mx = as.numeric(t(m.vec) %*% sigma.mat %*% x.vec)   # cov(m, x)
for (i in 1:n.a) {
  z.mat[i, ] = a[i]*m.vec + (1-a[i])*x.vec
  mu.z[i] = a[i]*mu.gmin + (1-a[i])*mu.px
  sig2.z[i] = a[i]^2 * sig2.gmin + (1-a[i])^2 * sig2.px + 2*a[i]*(1-a[i])*sig.mx
}
sd.vec = sqrt(diag(sigma.mat))                            # asset volatilities
plot(sqrt(sig2.z), mu.z, type = "b", ylim = c(0, 0.06), xlim = c(0, 0.17),
     pch = 16, col = "blue", ylab = expression(mu[p]), xlab = expression(sigma[p]))
text(sig.gmin, mu.gmin, labels = "Global min", pos = 4)
text(sd.vec, mu.vec, labels = asset.names, pos = 4)







The weights, as a function of volatility, are the following

library(PerformanceAnalytics)
colnames(z.mat) = asset.names
chart.StackedBar(z.mat, xaxis.labels = round(sqrt(sig2.z), 3))






Alternative Formulations

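The problem we solved was, for a given target return r,

$$\alpha^\star = \underset{\alpha}{\text{argmin}}\ \alpha^T\Sigma\alpha \quad\text{s.t.}\quad \mathbb{E}(\alpha^TX) = \alpha^T\mu = r \ \text{ and } \ \alpha^T\mathbf{1} = 1$$

Alternatively, for a given target variance σ², solve

$$\alpha^\star = \underset{\alpha}{\text{argmax}}\ \mathbb{E}(\alpha^TX) = \alpha^T\mu \quad\text{s.t.}\quad \alpha^T\Sigma\alpha = \sigma^2 \ \text{ and } \ \alpha^T\mathbf{1} = 1$$

Or, if λ > 0 denotes the Arrow-Pratt risk aversion coefficient, solve

$$\alpha^\star = \underset{\alpha}{\text{argmax}}\ \left\{\alpha^T\mu - \frac{\lambda}{2}\alpha^T\Sigma\alpha\right\} \quad\text{s.t.}\quad \alpha^T\mathbf{1} = 1$$

These three problems are equivalent in the sense that they trace out the same efficient frontier. Each efficient portfolio solves all three for suitable values of r, σ² and λ, with the global minimum variance portfolio obtained as the limit λ → ∞.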
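The following base R sketch illustrates the equivalence on the three-asset example, with an arbitrary λ = 3. It solves the risk-aversion problem in closed form and compares the result with the minimum variance portfolio at the same expected return. The closed form is our own derivation from the first order condition µ − λΣα − η1 = 0, where η is the budget multiplier. The helper names risk.averse and min.var are hypothetical.

```r
# Three-asset inputs of the example (MSFT, NORD, SBUX)
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
Si <- solve(Sigma)
a  <- sum(Si %*% mu)     # 1' Sigma^-1 mu
c0 <- sum(Si)            # 1' Sigma^-1 1

# Maximise alpha'mu - (lambda/2) alpha'Sigma alpha  s.t.  1'alpha = 1
# FOC: mu - lambda Sigma alpha - eta 1 = 0; the budget gives eta = (a - lambda)/c0
risk.averse <- function(lambda) {
  eta <- (a - lambda) / c0
  as.numeric(Si %*% (mu - eta * one)) / lambda
}

# Minimise alpha'Sigma alpha  s.t.  alpha'mu = r, 1'alpha = 1 (bordered system)
min.var <- function(r) {
  A <- rbind(cbind(2 * Sigma, mu, one),
             c(mu, 0, 0),
             c(one, 0, 0))
  solve(A, c(0, 0, 0, r, 1))[1:3]
}

lambda <- 3                      # arbitrary risk aversion
w1 <- risk.averse(lambda)
r  <- sum(w1 * mu)               # expected return reached for this lambda
w2 <- min.var(r)
rbind(risk.averse = w1, min.var = w2)
```

Both rows should be identical up to rounding. A larger λ moves the solution towards the global minimum variance portfolio.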






Interpretation of Markowitz optimal allocation

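Optimal portfolios can be expressed as affine functions of the target expected return ε, i.e.

$$\alpha^* = p + \varepsilon q,$$

where p and q depend only on µ and Σ (through a, b, c and d), not on ε:

$$p = \frac{1}{d}\left(b\Sigma^{-1}\mathbf{1} - a\Sigma^{-1}\mu\right) \quad\text{and}\quad q = \frac{1}{d}\left(c\Sigma^{-1}\mu - a\Sigma^{-1}\mathbf{1}\right)$$

p is actually a portfolio allocation since $p^T\mathbf{1} = 1$. q can be interpreted as a change with respect to allocation p, since $q^T\mathbf{1} = 0$ and $[p + q]^T\mathbf{1} = 1$ (see Merton (1972)). We can also write

$$\alpha^* = (1 - \varepsilon)p + \varepsilon(p + q).$$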
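Here is a short base R check of this decomposition on the three-asset example (µ and Σ as in Zivot (2013)). It builds p and q from a, b, c and d, confirms that $p^T\mathbf{1} = 1$ and $q^T\mathbf{1} = 0$, and recovers the MSFT-target efficient portfolio as p + εq. Variable names are illustrative.

```r
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
Si <- solve(Sigma)
a  <- as.numeric(t(one) %*% Si %*% mu)
b  <- as.numeric(t(mu) %*% Si %*% mu)
c0 <- as.numeric(t(one) %*% Si %*% one)   # the scalar c of the slides
d  <- b * c0 - a^2

# p and q depend on mu and Sigma only (%*% binds tighter than *)
p <- as.numeric(b * Si %*% one - a * Si %*% mu) / d
q <- as.numeric(c0 * Si %*% mu - a * Si %*% one) / d

eps <- 0.0427                  # e.g. the MSFT expected return
w <- p + eps * q               # frontier portfolio for that target
round(c(sum.p = sum(p), sum.q = sum(q)), 10)
round(w, 5)
```

Note that p itself is the frontier portfolio with zero expected return (ε = 0).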

Tobin (1958) and Sharpe (1964) extended Markowitz's theory by introducing a risk-free asset.


http://www.people.hbs.edu/rmerton/analytical%20derivation.72.pdf


http://web.uconn.edu/ahking/Tobin58.pdf

http://efinance.org.cn/cn/fm/Capital%20Asset%20Prices%20A%20Theory%20of%20Market%20Equilibrium%20under%20Conditions%20of%20Risk.pdf





Allocation when there is a Risk-Free Asset

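With a risk-free asset, the parabolic frontier degenerates: in the (σ, µ)-plane it becomes a straight line.

Let 0 be the index of that asset, so that X_0 = r_0, with E(X_0) = r_0 and var(X_0) = 0.

The investor can invest a share $\alpha^T\mathbf{1}$ in risky assets, and $1 - \alpha^T\mathbf{1}$ in the risk-free asset.

The return of the portfolio is the random variable $R_\alpha = \alpha^TX + [1 - \alpha^T\mathbf{1}]r_0$. Its first two moments are

$$\mu_\alpha = \alpha^T\mu + [1 - \alpha^T\mathbf{1}]r_0 \quad\text{and}\quad \sigma^2_\alpha = \alpha^T\Sigma\alpha.$$

Risk minimization yields the following optimization problem

$$\alpha^\star = \underset{\alpha}{\text{argmin}}\ \alpha^T\Sigma\alpha \quad\text{s.t.}\quad \alpha^T\mu + [1 - \alpha^T\mathbf{1}]r_0 = r$$

Here again, apply the Lagrange multiplier method to this convex optimization problem, noting that the constraint reads $\alpha^T(\mu - r_0\mathbf{1}) = r - r_0$:

$$\mathcal{L}(\alpha, \lambda_1) = \alpha^T\Sigma\alpha + \lambda_1\left[(r - r_0) - \alpha^T(\mu - r_0\mathbf{1})\right]$$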







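Hence (after rescaling the multiplier)

$$\alpha^\star = \lambda_1\Sigma^{-1}(\mu - r_0\mathbf{1}) \quad\text{where}\quad \lambda_1 = \frac{r - r_0}{(\mu - r_0\mathbf{1})^T\Sigma^{-1}(\mu - r_0\mathbf{1})}$$

Define the so-called Market Portfolio as follows: consider the fully-invested optimal portfolio α_M, with $\alpha_M^T\mathbf{1} = 1$, i.e.

$$\alpha_M = \lambda_1\Sigma^{-1}(\mu - r_0\mathbf{1}) \quad\text{where}\quad \lambda_1 = \frac{1}{\mathbf{1}^T\Sigma^{-1}(\mu - r_0\mathbf{1})}$$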







with expected return

$$r_0 + \frac{(\mu - r_0\mathbf{1})^T\Sigma^{-1}(\mu - r_0\mathbf{1})}{\mathbf{1}^T\Sigma^{-1}(\mu - r_0\mathbf{1})}$$

and variance

$$\frac{(\mu - r_0\mathbf{1})^T\Sigma^{-1}(\mu - r_0\mathbf{1})}{\left(\mathbf{1}^T\Sigma^{-1}(\mu - r_0\mathbf{1})\right)^2}$$

Tobin's Separation Theorem (Tobin, 1958): every optimal portfolio invests in a combination of the risk-free asset and the Market Portfolio.

Thus α* invests in the same risky assets as the Market Portfolio, and in the same proportions; the only difference is the total weight.

Represent the efficient frontier of optimal portfolios in the (σ_α, µ_α)-plane, with expected return µ_α plotted against standard deviation σ_α.

The portfolio expected return then increases linearly with the standard deviation: this is the so-called Capital Market Line.
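The Capital Market Line can be made explicit. Every combination of the risk-free asset and the Market Portfolio satisfies $\mu_\alpha = r_0 + \sigma_\alpha\sqrt{(\mu - r_0\mathbf{1})^T\Sigma^{-1}(\mu - r_0\mathbf{1})}$, so the slope is the Sharpe ratio of the Market Portfolio. The base R sketch below checks the expected return and variance formulas above on the three-asset example. The value r_0 = 0.005 is taken from the tangency example below, and the variable names are illustrative.

```r
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
r0 <- 0.005                                   # risk-free rate (assumption)
Si <- solve(Sigma)
ex <- mu - r0 * one                           # mu - r0 1

H <- as.numeric(t(ex) %*% Si %*% ex)          # (mu - r0 1)' Sigma^-1 (mu - r0 1)
k <- as.numeric(t(one) %*% Si %*% ex)         # 1' Sigma^-1 (mu - r0 1)

alpha.M <- as.numeric(Si %*% ex) / k          # Market Portfolio, weights sum to one
mu.M <- r0 + H / k                            # expected return (formula above)
sd.M <- sqrt(H / k^2)                         # standard deviation (formula above)
slope <- sqrt(H)                              # CML slope (valid since k > 0 here)

# A few points on the Capital Market Line: mu = r0 + slope * sigma
sigma.grid <- seq(0, 0.2, by = 0.05)
cbind(sigma = sigma.grid, mu = r0 + slope * sigma.grid)
round(c(mu.M = mu.M, sd.M = sd.M, sharpe = (mu.M - r0) / sd.M, slope = slope), 4)
```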








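Efficient portfolios of risky assets and a single risk-free asset (Treasury Bill) are portfolios consisting of the highest Sharpe ratio portfolio (tangency portfolio) and the Treasury Bill.

Thus, the first step is to compute the tangency portfolio

$$\alpha_t = \underset{\alpha}{\text{argmax}}\ \frac{\alpha^T\mu - r_f}{\sqrt{\alpha^T\Sigma\alpha}} \quad\text{s.t.}\quad \alpha^T\mathbf{1} = 1,$$

i.e.

$$\alpha_t = \frac{\Sigma^{-1}(\mu - r_f\mathbf{1})}{\mathbf{1}^T\Sigma^{-1}(\mu - r_f\mathbf{1})}$$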

> rf = 0.005
> sigma.inv.mat = solve(sigma.mat)
> one.vec = rep(1, 3)
> mu.minus.rf = mu.vec - rf*one.vec
> top.mat = sigma.inv.mat %*% mu.minus.rf
> bot.val = as.numeric(t(one.vec) %*% top.mat)
> t.vec = top.mat[, 1]/bot.val
> t.vec
   MSFT    NORD    SBUX
 1.0268 -0.3263  0.2994

Expected return and standard deviation are here

> mu.t = as.numeric(crossprod(t.vec, mu.vec))
> mu.t
[1] 0.05189
> sig2.t = as.numeric(t(t.vec) %*% sigma.mat %*% t.vec)
> sig.t = sqrt(sig2.t)
> sig.t
[1] 0.1116







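An alternative representation is given by a standard minimum variance computation. Let $\tilde\mu = \mu - r_f\mathbf{1}$ and consider some target excess return $\tilde r_0 = r_0 - r_f$. Our problem is to solve

$$\min_\alpha \alpha^T\Sigma\alpha \quad\text{s.t.}\quad \alpha^T\tilde\mu = \tilde r_0$$

One can prove that

$$\alpha^\star = -\lambda\Sigma^{-1}\tilde\mu \quad\text{where}\quad \lambda = -\frac{\tilde r_0}{\tilde\mu^T\Sigma^{-1}\tilde\mu}$$

or

$$\alpha^\star = \frac{\tilde r_0\,\Sigma^{-1}\tilde\mu}{\tilde\mu^T\Sigma^{-1}\tilde\mu}$$

and $1 - \alpha^{\star T}\mathbf{1}$ is invested in the risk-free asset.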
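Here is a quick base R check of this formula for a 7% target (r_f = 0.005, three-asset example). The risky weights come out proportional to the tangency portfolio, and their sum is the leverage factor x.t.07 computed below. Names such as mu.tilde and r0.tilde are illustrative.

```r
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
rf <- 0.005
Si <- solve(Sigma)
mu.tilde <- mu - rf * one                           # excess expected returns
r0.tilde <- 0.07 - rf                               # target excess return (7% target)

H <- as.numeric(t(mu.tilde) %*% Si %*% mu.tilde)
alpha <- r0.tilde * as.numeric(Si %*% mu.tilde) / H  # weights in the risky assets
w.free <- 1 - sum(alpha)                            # remainder in the Treasury Bill

t.vec <- as.numeric(Si %*% mu.tilde) / sum(Si %*% mu.tilde)   # tangency portfolio
round(c(alpha, risk.free = w.free), 4)
round(alpha / t.vec, 4)                             # same ratio for each asset
```

The common ratio is the total weight invested in the tangency portfolio, as predicted by the separation theorem.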

We have seen that the tangency portfolio is

> t.vec
   MSFT    NORD    SBUX
 1.0268 -0.3263  0.2994
> mu.t
[1] 0.05189
> sig.t
[1] 0.1116

If our target is a portfolio with a standard deviation of 2%, use

> x.t.02 = 0.02/sig.t
> x.t.02
[1] 0.1792
> 1-x.t.02
[1] 0.8208

i.e. 82% in the risk-free asset and 18% in the tangency portfolio. Expected return and standard deviation are here

> mu.t.02 = x.t.02*mu.t + (1-x.t.02)*rf
> sig.t.02 = x.t.02*sig.t
> mu.t.02
[1] 0.0134
> sig.t.02
[1] 0.02

If we want a 7% expected return,

> x.t.07 = (0.07 - rf)/(mu.t - rf)
> x.t.07
[1] 1.386
> 1-x.t.07
[1] -0.3862

which involves borrowing at the Treasury Bill rate (leverage).







Again, we can get the weights in the different assets (figure not reproduced here).






CAPM and APT

In this model (corresponding to the CAPM, the Capital Asset Pricing Model), only the systematic risk of the asset is valued by the market at equilibrium, and

$$\mathbb{E}(X_i) - r_0 = [\mathbb{E}(X_M) - r_0]\cdot\beta_i, \quad\text{where } X_M = \alpha_M^TX,$$

where E(X_M) − r_0 is the risk price, and [E(X_M) − r_0]·β_i is the risk premium of asset i, with

$$\beta_i = \frac{\text{cov}(X_M, X_i)}{\text{var}(X_M)}.$$

This β measures the sensitivity of asset i to market fluctuations. This model, introduced in Markowitz (1952), was presented in Sharpe (1963) as a regression model

$$X_i = \alpha_i + \beta_iX_M + \varepsilon_i.$$

The risk of asset i is $\text{var}(X_i) = \beta_i^2\text{var}(X_M) + \text{var}(\varepsilon_i)$. Here $\beta_i^2\text{var}(X_M)$ can be interpreted as the systematic risk of asset i, and var(ε_i) as its specific risk.
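In the Markowitz model with a risk-free asset, the CAPM relation holds exactly when the tangency portfolio plays the role of the market. The base R sketch below assumes exactly that (three-asset example, r_0 = 0.005). It computes β_i = cov(X_i, X_M)/var(X_M) from Σα_M and checks that r_0 + β_i[E(X_M) − r_0] reproduces µ_i. It also computes the specific risks var(X_i) − β_i²var(X_M).

```r
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
one <- rep(1, 3)
r0 <- 0.005
Si <- solve(Sigma)
ex <- mu - r0 * one
alpha.M <- as.numeric(Si %*% ex) / sum(Si %*% ex)   # tangency portfolio used as market
mu.M <- sum(alpha.M * mu)
var.M <- as.numeric(t(alpha.M) %*% Sigma %*% alpha.M)

beta <- as.numeric(Sigma %*% alpha.M) / var.M        # cov(X_i, X_M) / var(X_M)
capm <- r0 + beta * (mu.M - r0)                      # CAPM expected returns
spec <- diag(Sigma) - beta^2 * var.M                 # specific risks var(eps_i)
round(cbind(beta, mu, capm, systematic = beta^2 * var.M, specific = spec), 5)
```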






CAPM and APT

In the CAPM, only one risk factor is considered, a so-called market factor. Ross (1976) suggested extending the approach of Sharpe (1963) by introducing several risk factors, in the APT (Arbitrage Pricing Theory) model, where

$$X_i = \alpha_i + b_{1,i}F_1 + \cdots + b_{k,i}F_k + \varepsilon_i.$$

Those risk factors can be determined endogenously, using some principal component analysis, or exogenously, using some (economic) covariates.






Describing Diversification

Choueifaty & Coignard (2008) and Choueifaty et al. (2011) introduced a diversification ratio (DR). For a given weight vector ω in the allowed set of portfolio solutions Ω, it is defined as

$$DR(\omega) = \frac{\omega^T\sqrt{\text{diag}(\Sigma)}}{\sqrt{\omega^T\Sigma\omega}}$$

The higher the DR, the more diversified the portfolio.

For long-only portfolios, DR has a lower bound of one (obtained with single-asset portfolios).
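Here is a base R sketch of the diversification ratio on the three-asset covariance matrix of the earlier example. The function name DR is illustrative. The code checks the lower bound of one for a single-asset portfolio and shows an equally weighted portfolio above it.

```r
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
# Diversification ratio: weighted average volatility over portfolio volatility
DR <- function(w, Sigma) {
  sum(w * sqrt(diag(Sigma))) / sqrt(as.numeric(t(w) %*% Sigma %*% w))
}
ew <- rep(1, 3) / 3                 # equally weighted portfolio
single <- c(1, 0, 0)                # everything in the first asset
c(equal.weight = DR(ew, Sigma), single.asset = DR(single, Sigma))
```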


http://www.tobam.fr/wp-content/uploads/2014/.../TOBAM-JoPM-Maximum-Div-2008.pdf

http://www.qminitiative.org/UserFiles/files/FroidureSSRN-id1895459.pdf





Describing Diversification

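An alternative formulation is the following. Let CR denote the volatility-weighted concentration ratio (see the Herfindahl–Hirschman index)

$$CR(\omega) = \frac{\sum_i(\omega_i\sigma_i)^2}{\left(\sum_i\omega_i\sigma_i\right)^2}$$

(the latter is bounded in the interval [n^{-1}, 1] for long-only portfolios), and let ρ(ω) denote the volatility-weighted average correlation

$$\rho(\omega) = \frac{\sum_{i\neq j}\omega_i\omega_j\sigma_i\sigma_j\rho_{i,j}}{\sum_{i\neq j}\omega_i\omega_j\sigma_i\sigma_j}$$

Then

$$DR(\omega) = \frac{1}{\sqrt{CR(\omega) + \rho(\omega) - CR(\omega)\,\rho(\omega)}}$$

The most diversified portfolio is then the solution of

$$\underset{\omega\in\Omega}{\text{argmax}}\ DR(\omega)$$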
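The decomposition of DR can be verified numerically. The base R sketch below uses the three-asset covariance matrix and an arbitrary long-only portfolio. As an extra not stated on the slide, it also computes the maximiser of DR under the budget constraint alone. By the same Cauchy–Schwarz argument as for the tangency portfolio, that maximiser is proportional to Σ^{-1}σ. A long-only Ω would instead require a numerical optimizer.

```r
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
sig <- sqrt(diag(Sigma))
R <- cov2cor(Sigma)                       # correlation matrix rho_{i,j}

DR <- function(w) sum(w * sig) / sqrt(as.numeric(t(w) %*% Sigma %*% w))
CR <- function(w) sum((w * sig)^2) / sum(w * sig)^2
rho.bar <- function(w) {
  M <- outer(w * sig, w * sig)            # omega_i omega_j sigma_i sigma_j
  off <- row(M) != col(M)                 # pairs i != j
  sum(M[off] * R[off]) / sum(M[off])
}

w <- c(0.5, 0.3, 0.2)                     # arbitrary long-only portfolio
lhs <- DR(w)
rhs <- 1 / sqrt(CR(w) + rho.bar(w) - CR(w) * rho.bar(w))

# Maximiser of DR under 1'w = 1 only: proportional to Sigma^-1 sigma
w.mdp <- as.numeric(solve(Sigma, sig))
w.mdp <- w.mdp / sum(w.mdp)
round(c(DR = lhs, decomposition = rhs, DR.mdp = DR(w.mdp)), 4)
round(w.mdp, 4)
```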






Robust portfolio optimization

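The use of sample estimators for the expected returns and the covariance matrix can result in sub-optimal portfolios due to estimation error, see Maronna et al. (2006) and Todorov and Filzmoser (2009).

The most popular estimators are the ML (maximum likelihood) estimators. Consider i.i.d. data, with density f(·; θ):

$$\hat\theta = \underset{\theta}{\text{argmax}}\ L(\theta;\mathbf{x}) = \underset{\theta}{\text{argmax}}\prod_{i=1}^n f(x_i;\theta) = \underset{\theta}{\text{argmin}}\left\{-\sum_{i=1}^n\log f(x_i;\theta)\right\}$$

First order condition:

$$\sum_{i=1}^n\frac{1}{f(x_i;\hat\theta)}\frac{\partial f(x_i;\theta)}{\partial\theta}\bigg|_{\theta=\hat\theta} = \sum_{i=1}^n\Psi(x_i,\hat\theta) = 0$$

where $\Psi(x,\theta) = \partial\log f(x;\theta)/\partial\theta$ is the score function.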


https://www.jstatsoft.org/article/view/v032i03






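For the Gaussian distribution, $f(x,\theta) \propto \exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right)$, so that

$$\Psi(x,\theta) = \frac{x - \theta}{\sigma^2} \quad\text{and}\quad \hat\theta = \bar x = \frac{1}{n}\sum_{i=1}^n x_i.$$

For the Laplace distribution, $f(x,\theta) \propto \exp\left(-\frac{|x-\theta|}{\sigma}\right)$, so that

$$\Psi(x,\theta) = \frac{\text{sign}(x - \theta)}{\sigma} \quad\text{and}\quad \hat\theta = \text{median}(\mathbf{x}).$$

Under standard regularity conditions, ML estimators θ̂ are consistent (they converge in probability to the true parameter θ) and asymptotically normal,

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{\mathcal{L}} \mathcal{N}(0, V[\theta])$$

where the asymptotic variance is actually the smallest possible variance (for unbiased estimators): $V[\theta] = I[\theta]^{-1}$, where $I[\theta] = \mathbb{E}[\Psi^2(X,\theta)]$ denotes the Fisher information.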







Hence, for both the Gaussian and the Laplace distributions, V[θ] = σ².

See Casella and Berger (1990) for more details.

One can define the efficiency of any (univariate) estimator θ̂ as

$$\text{eff}(\hat\theta, F) = \frac{V[\hat\theta_{mle}, F]}{V[\hat\theta, F]} \in [0, 1]$$

For instance, eff(x̄, Laplace) = 50% and eff(median(x), Gaussian) = 2/π ≈ 63.7%.
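These efficiencies can be checked by simulation. In the base R sketch below, the sample size n = 200 and the B = 5000 replications are arbitrary choices, and Laplace draws are generated as differences of two Exp(1) variables. The Gaussian ratio should be close to 2/π ≈ 0.637 and the Laplace ratio close to 0.5, up to Monte Carlo and finite-sample error.

```r
set.seed(123)
n <- 200; B <- 5000
# Gaussian samples: the mean is the MLE, the median is the competitor
xg <- matrix(rnorm(n * B), nrow = B)
eff.median.gauss <- var(rowMeans(xg)) / var(apply(xg, 1, median))
# Laplace samples (difference of two Exp(1)): the median is the MLE
xl <- matrix(rexp(n * B) - rexp(n * B), nrow = B)
eff.mean.laplace <- var(apply(xl, 1, median)) / var(rowMeans(xl))
round(c(median.gauss = eff.median.gauss, mean.laplace = eff.mean.laplace,
        two.over.pi = 2 / pi), 3)
```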

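Since we don't know the (true) distribution, how could we know which estimator to use?

The MLE of the correlation is

$$\hat\rho_{j,k} = \frac{\hat\Sigma_{j,k}}{\sqrt{\hat\Sigma_{j,j}\hat\Sigma_{k,k}}} \quad\text{where}\quad \hat\Sigma_{j,k} = \frac{1}{n}\sum_{i=1}^n(x_{i,j} - \bar x_j)(x_{i,k} - \bar x_k)$$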







How robust is that estimator? See Martin (2014)

Consider the following two financial series (figure not reproduced here).







How robust is that correlation estimator? See Martin (2014)

MLE (or classical) correlation estimator: ρ̂ ≈ 30%

while a more robust estimator (here the fast minimum covariance determinant, MCD) yields ρ̂ ≈ 65%.







One can also consider a regression problem (EDS returns vs. market returns).

MLE (or classical) slope estimator: β̂ ≈ 1.41

while a more robust estimator yields β̂ ≈ 2.03.

The main reason for such a difference is outliers.







How to detect outliers?

Use the Mahalanobis distance (the Euclidean distance in sphered coordinates), induced by the norm

$$\|x_i\|^2 = (x_i - \mu)^T\Sigma^{-1}(x_i - \mu) = z_i^Tz_i \quad\text{where}\quad z_i = \Sigma^{-1/2}(x_i - \mu).$$
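Here is a base R sketch of outlier detection with the Mahalanobis distance, on simulated data. The correlation of 0.8 and the single planted point at (3, −3) are both assumptions. stats::mahalanobis returns the squared distances ‖x_i‖², which are compared with a χ²(2) quantile, the usual reference under Gaussianity. The planted point is only three standard deviations away on each axis, so it would not stand out coordinate by coordinate. Because classical estimates of µ and Σ are plugged in, many outliers could mask each other, which motivates the robust estimators below.

```r
set.seed(42)
n <- 200
rho <- 0.8
X <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, rho, rho, 1), 2, 2))
X <- rbind(X, c(3, -3))           # planted outlier, row 201

# Squared Mahalanobis distances with classical estimates of mu and Sigma
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
cutoff <- qchisq(0.999, df = 2)   # reference quantile under Gaussianity
which(d2 > cutoff)
round(d2[201], 1)
```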







How robust is that approach? See Martin (2014). [Figure not reproduced: MLE (or classical) correlation estimator on the left, and a more robust estimator for Σ on the right.]

See Tukey (1979): “[· · · ] just which robust methods you use is not important – what is important is that you use some. It is perfectly proper to use both classical and robust methods routinely, and only worry when they differ enough to matter. But when they differ, you should think hard.”







Use the trimmed mean (or truncated mean) instead of the standard sample mean:

$$\hat\mu = \bar x = \frac{1}{n}\sum_{i=1}^n x_{i:n} \quad\text{against}\quad \hat\mu_\alpha = \frac{1}{n(1 - 2\alpha)}\sum_{i=1+[\alpha n]}^{n-[\alpha n]} x_{i:n}$$

where classically α ≈ 5% in Tukey (1960), while 24% is suggested for the Libor rate fixing (BBA, see Wikipedia).
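In R, the trimmed mean is available as mean(x, trim = alpha), which drops floor(n·alpha) observations at each end. The base R sketch below uses simulated data with 5% gross outliers (an assumption). It checks that the formula above matches mean(x, trim = alpha) when αn is an integer.

```r
set.seed(7)
x <- c(rnorm(95), rnorm(5, mean = 20))      # 5% of gross outliers
alpha <- 0.05
n <- length(x)
k <- floor(alpha * n)                        # [alpha n]
xs <- sort(x)                                # order statistics x_{i:n}
mu.trim <- sum(xs[(1 + k):(n - k)]) / (n * (1 - 2 * alpha))
c(mean = mean(x), trimmed = mu.trim, base.R = mean(x, trim = alpha))
```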

Observe that on the same dataset, banks can have very different estimates, see Tuckman & Serrat (2011).

One can consider so-called location M-estimates,

$$\hat\theta = \underset{\theta}{\text{argmin}}\sum_{i=1}^n\rho(x_i;\theta)$$

for some function ρ,


https://en.wikipedia.org/wiki/Libor







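$$\hat\theta = \underset{\theta}{\text{argmin}}\sum_{i=1}^n\rho(x_i;\theta)$$

If ρ(x; θ) = −log f(x; θ), we get the MLE.

If ρ(x; θ) = (x − θ)², we get OLS.

In the case of the mean (location),

$$\hat\mu = \underset{\mu}{\text{argmin}}\sum_{i=1}^n\rho\left(\frac{x_i - \mu}{\sigma}\right)$$

which yields the first order condition

$$\sum_{i=1}^n\Psi\left(\frac{x_i - \hat\mu}{\sigma}\right) = 0, \quad\text{with } \Psi = \rho'.$$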







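Huber (1964) suggested

$$\rho(x) = \begin{cases} x^2/2 & \text{if } |x| \le \delta \\ \delta(|x| - \delta/2) & \text{otherwise} \end{cases}$$

which is (asymptotically) equivalent to a trimmed mean. The associated Ψ = ρ' function is

$$\Psi(x) = \begin{cases} x & \text{if } |x| \le \delta \\ \pm\delta & \text{otherwise (with the sign of } x\text{)} \end{cases}$$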
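Here is a minimal base R sketch of Huber's location M-estimate, computed by iteratively reweighted averages with weights Ψ(u)/u. The scale σ is fixed at the MAD and δ = 1.345. Both are conventional assumptions rather than choices made on the slides, and the helper names are hypothetical.

```r
huber.psi <- function(u, delta) pmax(-delta, pmin(delta, u))   # Psi = rho'

huber.location <- function(x, delta = 1.345, tol = 1e-10, maxit = 100) {
  s <- mad(x)                      # fixed robust scale (assumption)
  mu <- median(x)                  # robust starting value
  for (it in 1:maxit) {
    u <- (x - mu) / s
    w <- ifelse(abs(u) <= delta, 1, delta / abs(u))   # IRLS weights Psi(u)/u
    mu.new <- sum(w * x) / sum(w)
    if (abs(mu.new - mu) < tol) { mu <- mu.new; break }
    mu <- mu.new
  }
  mu
}

set.seed(7)
x <- c(rnorm(95), rnorm(5, mean = 20))   # 5% of gross outliers
mu.h <- huber.location(x)
c(mean = mean(x), median = median(x), huber = mu.h)
```

At convergence, the estimate satisfies the first order condition ΣΨ((x_i − µ̂)/σ) = 0 up to the tolerance.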


https://www.jstor.org/stable/2238020






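Hampel (1974) introduced the Empirical Influence Function. For a sample $\mathbf{x} = (x_1, \ldots, x_n)$ and an additional point x,

$$\text{eif}(x;\hat\theta,\mathbf{x}) = (n + 1)\left[\hat\theta(\mathbf{x}, x) - \hat\theta(\mathbf{x})\right]$$

The class of MM estimators was originally introduced by Yohai (1987) and Yohai et al. (1991) in the context of robust regression analysis.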
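The empirical influence function defined above is easy to compute by adding one point z to a sample. In this base R sketch, the data are a simulated standard Gaussian sample of size 50 (an assumption). The EIF of the mean is exactly z − x̄, hence unbounded, while the EIF of the median stays bounded.

```r
set.seed(1)
x <- rnorm(50)
n <- length(x)
# eif(z; theta, x) = (n + 1) [theta(x, z) - theta(x)]
eif <- function(estimator, z) (n + 1) * (estimator(c(x, z)) - estimator(x))

z.grid <- seq(-10, 10, by = 0.5)
eif.mean <- sapply(z.grid, function(z) eif(mean, z))
eif.median <- sapply(z.grid, function(z) eif(median, z))
plot(z.grid, eif.mean, type = "l", xlab = "added point z", ylab = "EIF")
lines(z.grid, eif.median, lty = 2)
```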

Consider some random vector X ∼ N(µ, Σ), and the related Mahalanobis norm

$$\|x\|^2 = (x - \mu)^T\Sigma^{-1}(x - \mu)$$

Robust estimators of µ and Σ are obtained from

$$(\hat\mu, \hat\Sigma) = \underset{\mu,\Sigma}{\text{argmin}}\sum_i\|x_i\|^2$$







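The Minimum Covariance Determinant (MCD) estimator was introduced by Rousseeuw (1984).

Given a sample x_1, ..., x_n, consider the subset of h observations (out of n) whose (classical) covariance matrix has the smallest determinant.

Usually, h = ⌊(n + p + 1)/2⌋, where p = dim(x).

A C-step ("concentration step") moves from one approximation (µ_1, C_1) of the mean and the variance to a new one (µ_2, C_2). It computes the Mahalanobis distances of all observations with respect to (µ_1, C_1), keeps the h observations with the smallest distances, and computes their mean µ_2 and covariance C_2. One can show that det(C_2) ≤ det(C_1).

> library(rrcov)
> mcd = CovMcd(X)    # X: numeric data matrix (n rows, p columns)
> summary(mcd)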







To compute it, consider the following 3 steps:

C-step iteration: from 500 random initial subsets, carry out a few C-steps and keep the best 10 solutions. From each of the top 10, carry out C-steps until convergence, and keep the best one.

Partitioning: if the dataset is large, partition it into (say) 5 disjoint subsets, and carry out the C-step iterations within each subset. Use the best solutions as starting points.

Nesting: if the dataset is even larger, consider a random subset of it and use the partitioning procedure.
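The C-step logic can be sketched in a few lines of base R. This is a simplified illustration of the first step only (random starts, C-steps, keep the best). It omits partitioning, nesting, and the reweighting and consistency corrections of production implementations such as rrcov::CovMcd. All function names are hypothetical. On simulated data with 10% clustered outliers (an assumption), it should recover a correlation close to the true 0.7, while the classical estimate is badly off.

```r
# One C-step: fit (mean, covariance) on subset H, keep the h closest points
c.step <- function(X, H, h) {
  mu <- colMeans(X[H, , drop = FALSE])
  C  <- cov(X[H, , drop = FALSE])
  order(mahalanobis(X, mu, C))[1:h]
}

mcd.sketch <- function(X, n.start = 500, n.keep = 10, max.iter = 100) {
  n <- nrow(X); p <- ncol(X)
  h <- floor((n + p + 1) / 2)
  logdet <- function(H) as.numeric(determinant(cov(X[H, , drop = FALSE]))$modulus)
  starts <- lapply(seq_len(n.start), function(i) {
    H <- sample(n, p + 1)                 # random (p + 1)-subset
    for (k in 1:2) H <- c.step(X, H, h)   # two C-steps from each start
    H
  })
  best <- starts[order(sapply(starts, logdet))[seq_len(n.keep)]]
  final <- lapply(best, function(H) {     # C-steps until the subset is stable
    for (it in seq_len(max.iter)) {
      H.new <- c.step(X, H, h)
      if (setequal(H.new, H)) break
      H <- H.new
    }
    H
  })
  H <- final[[which.min(sapply(final, logdet))]]
  list(center = colMeans(X[H, ]), cov = cov(X[H, ]), subset = H)
}

set.seed(2017)
n <- 200
X <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, 0.7, 0.7, 1), 2, 2))
X[1:20, ] <- cbind(rnorm(20, 4, 0.3), rnorm(20, -4, 0.3))   # clustered outliers
fit <- mcd.sketch(X)
round(c(classical = cor(X)[1, 2], mcd = cov2cor(fit$cov)[1, 2]), 3)
```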







Finally, the Stahel–Donoho estimator (SDE) is due to Stahel (1981) and Donoho (1982), while the OGK estimator was proposed by Maronna & Zamar (2002).






Computational Aspects

library(FRAPO)
library(zoo)
data(StockIndex)
pzoo = zoo(StockIndex, order.by = rownames(StockIndex))
rzoo = (pzoo / lag(pzoo, k = -1) - 1) * 100    # returns, in percent

The following function can be used to estimate Σ (the estimators come from package rrcov)

library(rrcov)
Moments <- function(x, method = c("CovClassic", "CovMcd", "CovMest", "CovMMest",
                                  "CovMve", "CovOgk", "CovSde", "CovSest"), ...) {
  method <- match.arg(method)
  ans <- do.call(method, list(x = x, ...))
  return(getCov(ans))
}






Computational Aspects

> Moments(as.matrix(rzoo), "CovClassic")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    17.772  12.706  13.766  17.802  19.451  18.924
N225     12.706  36.619  10.773  15.025  16.180  16.653
FTSE100  13.766  10.773  17.288  18.787  19.400  19.065
CAC40    17.802  15.025  18.787  30.947  29.904  22.774
GDAX     19.451  16.180  19.400  29.904  38.037  26.089
HSI      18.924  16.653  19.065  22.774  26.089  58.135

> Moments(as.matrix(rzoo), "CovMcd")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    18.216  14.622  13.998  17.863  20.773  17.844
N225     14.622  40.945  12.910  18.157  18.789  15.832
FTSE100  13.998  12.910  16.331  17.790  19.813  15.117
CAC40    17.863  18.157  17.790  28.204  28.072  17.589
GDAX     20.773  18.789  19.813  28.072  35.880  22.427
HSI      17.844  15.832  15.117  17.589  22.427  44.121






Computational Aspects

> Moments(as.matrix(rzoo), "CovMest")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    17.279  13.456  13.166  16.899  18.722  16.369
N225     13.456  40.132  11.482  15.795  16.992  16.907
FTSE100  13.166  11.482  15.438  16.666  17.828  14.273
CAC40    16.899  15.795  16.666  27.096  25.678  16.035
GDAX     18.722  16.992  17.828  25.678  31.926  20.549
HSI      16.369  16.907  14.273  16.035  20.549  42.624

> Moments(as.matrix(rzoo), "CovMMest")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    16.126  11.966  12.328  16.043  17.746  16.295
N225     11.966  35.634  10.322  14.605  15.432  14.458
FTSE100  12.328  10.322  15.109  16.447  17.367  15.439
CAC40    16.043  14.605  16.447  27.571  26.430  18.110
GDAX     17.746  15.432  17.367  26.430  33.008  21.800
HSI      16.295  14.458  15.439  18.110  21.800  46.634






Computational Aspects

> Moments(as.matrix(rzoo), "CovOgk")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    12.757   9.871   9.382  12.324  13.607  12.294
N225      9.871  27.533   8.252  11.905  12.626  11.448
FTSE100   9.382   8.252  10.920  11.811  12.419  10.366
CAC40    12.324  11.905  11.811  19.690  18.640  12.316
GDAX     13.607  12.626  12.419  18.640  23.188  15.027
HSI      12.294  11.448  10.366  12.316  15.027  32.958

> Moments(as.matrix(rzoo), "CovSde")
          SP500    N225 FTSE100   CAC40    GDAX     HSI
SP500    16.988  14.172  12.266  16.469  19.210  18.683
N225     14.172  39.978  12.342  18.379  19.660  16.415
FTSE100  12.266  12.342  14.234  15.402  17.316  15.634
CAC40    16.469  18.379  15.402  25.932  25.941  18.895
GDAX     19.210  19.660  17.316  25.941  32.906  23.186
HSI      18.683  16.415  15.634  18.895  23.186  47.170






Optimal Portfolio under Value-at-Risk Constraint

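Consider first the case of elliptical returns. Elliptical distributions (see Fang, Ng & Kotz (1991)) are interesting since they extend the Gaussian case, with similar properties.

If X ∼ N(µ, Σ), with

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$$

then

• $X_i \sim \mathcal{N}(\mu_i, \Sigma_{ii})$, for all i = 1, ..., d,

• $\alpha^TX = \alpha_1X_1 + \cdots + \alpha_dX_d \sim \mathcal{N}(\alpha^T\mu, \alpha^T\Sigma\alpha)$,

• $X_1 | X_2 = x_2 \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$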






Elliptical Distribution







For instance, if X is elliptical, then for any α, $\alpha^TX$ is also elliptical.

Elliptical distributions were (formally) introduced in Kelker (1970).

Spherical distributions are extensions of N(0, I), and elliptical ones are extensions of N(µ, Σ).

A random vector X in R^n is said to have an elliptical distribution with generator g and parameters µ and Σ, denoted X ∼ Ell(g, µ, Σ), if it has density

$$f_X(x) = \frac{c_n}{|\Sigma|^{1/2}}\,g\left(\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right),$$

for some normalization constant c_n defined as

$$c_n = \frac{\Gamma(n/2)}{(2\pi)^{n/2}}\left(\int_0^\infty t^{n/2-1}g(t)\,dt\right)^{-1}.$$

If g(t) = exp(−t), we obtain the multivariate Gaussian distribution N(µ, Σ).







Marginal distributions of elliptical ones are themselves elliptical:

$$X \sim Ell(g, \mu, \Sigma) \Rightarrow X_k \sim Ell(g, \mu_k, \sigma^2_k) \ \text{ for } k = 1, \ldots, n.$$

Furthermore, for any matrix B and vector b,

$$X \sim Ell(g, \mu, \Sigma) \Rightarrow b + BX \sim Ell(g, b + B\mu, B\Sigma B^T).$$

For instance, one can obtain the multivariate Student t distribution with

$$g(t) = \left(1 + \frac{t}{k_p}\right)^{-p},$$

where p should exceed n/2, and k_p is some normalization coefficient.

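The density of X is then given by

$$f_X(x) = \frac{c_n}{|\Sigma|^{1/2}}\left[1 + \frac{(x - \mu)^T\Sigma^{-1}(x - \mu)}{2k_p}\right]^{-p} \ \text{ for } x \in \mathbb{R}^n, \quad\text{with}\quad c_n = \frac{\Gamma(p)}{(2\pi k_p)^{n/2}\,\Gamma(p - n/2)}.$$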







This multivariate Student distribution can also be obtained as follows. Let Z ∼ N(0, I) and S ∼ χ²(m) be independent. Then the vector $Y = \sqrt{m}\,Z/\sqrt{S}$ has a Student t distribution with m degrees of freedom. The general distribution can then be obtained using $Y^* = \mu + \Sigma^{1/2}Y$, where $\Sigma^{1/2}(\Sigma^{1/2})^T = \Sigma$.
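The stochastic representation gives a direct simulation recipe. The base R sketch below assumes m = 10 degrees of freedom and uses the three-asset (µ, Σ) of the earlier example. chol() returns an upper triangular R with R^TR = Σ, so multiplying row vectors by R plays the role of Σ^{1/2}. For m > 2, the covariance of Y* is mΣ/(m − 2), which the check uses.

```r
set.seed(10)
N <- 100000; m <- 10
mu <- c(0.0427, 0.0015, 0.0285)
Sigma <- matrix(c(0.0100, 0.0018, 0.0011,
                  0.0018, 0.0109, 0.0026,
                  0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)

Z <- matrix(rnorm(N * 3), ncol = 3)     # rows ~ N(0, I)
S <- rchisq(N, df = m)                   # independent chi-square(m)
Y <- Z / sqrt(S / m)                     # row i divided by sqrt(S_i / m): standard t_m
R <- chol(Sigma)                         # upper triangular, t(R) %*% R = Sigma
Ystar <- sweep(Y %*% R, 2, mu, "+")     # rows ~ mu + Sigma^{1/2} Y

# For m > 2, cov(Y*) = m / (m - 2) * Sigma
round(diag(cov(Ystar)) / diag(Sigma), 2)
```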

Similarly, the multivariate Cauchy distribution has density

$$f_X(x) = \frac{\Gamma((n+1)/2)}{\pi^{(n+1)/2}|\Sigma|^{1/2}}\left(1 + (x - \mu)^T\Sigma^{-1}(x - \mu)\right)^{-(n+1)/2} \ \text{ for } x \in \mathbb{R}^n.$$

One can also introduce a multivariate logistic distribution, with generator

$$g(t) = \frac{\exp(-t)}{\left(1 + \exp(-t)\right)^2}.$$






The density can then be written

$$f_X(x) = \frac{c_n}{|\Sigma|^{1/2}}\,\frac{\exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right)}{\left(1 + \exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right)\right)^2} \ \text{ for } x \in \mathbb{R}^n.$$

Finally, consider the generator g(t) = exp(−r t^s), where r and s are two positive parameters. The associated density is

$$f_X(x) = \frac{c_n}{|\Sigma|^{1/2}}\exp\left(-r\left[\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right]^s\right) \ \text{ for any } x \in \mathbb{R}^n,$$

where the normalizing constant c_n is here

$$c_n = \frac{s\,\Gamma(n/2)}{(2\pi)^{n/2}\,\Gamma(n/(2s))}\,r^{n/(2s)}.$$

When s = r = 1, we recognize the Gaussian distribution, while s = 1/2 and r = √2 give the Laplace distribution.
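The normalizing constant can be checked numerically in dimension n = 1 (with µ = 0 and Σ = 1) using stats::integrate. The values r = 3 and s = 0.7 are arbitrary. The base R sketch also checks the two special cases mentioned above.

```r
# Normalising constant of the generator g(t) = exp(-r t^s) in dimension n
c.n <- function(n, r, s) {
  s * gamma(n / 2) * r^(n / (2 * s)) / ((2 * pi)^(n / 2) * gamma(n / (2 * s)))
}
# Density in dimension 1 with mu = 0 and Sigma = 1: c_1 g(x^2 / 2)
dens <- function(x, r, s) c.n(1, r, s) * exp(-r * (x^2 / 2)^s)

# The density is symmetric: integrate over (0, Inf) and double
total <- 2 * integrate(dens, 0, Inf, r = 3, s = 0.7, rel.tol = 1e-8)$value
gauss.ok <- all.equal(dens(0.3, r = 1, s = 1), dnorm(0.3))
laplace <- dens(0.3, r = sqrt(2), s = 0.5)      # should equal exp(-0.3) / 2
round(c(total = total, laplace = laplace, target = exp(-0.3) / 2), 6)
```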






Allocations Based on Higher Moments

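Higher moments are more complex to define in high dimension (d ≥ 2). In dimension 1, skewness and kurtosis are defined here through the third and fourth central moments,

$$s = m_3 = \mathbb{E}\left((X - \mathbb{E}(X))^3\right) \quad\text{and}\quad \kappa = m_4 = \mathbb{E}\left((X - \mathbb{E}(X))^4\right).$$

In dimension d, the variance becomes a d × d variance-covariance matrix, and the skewness a d × d² matrix. Set µ = E(X);

$$S = M_3 = \mathbb{E}\left((X - \mu)(X - \mu)^T \otimes (X - \mu)^T\right) = [s_{i,j,k}] \quad\text{where}\quad s_{i,j,k} = \mathbb{E}\left((X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)\right),$$

while

$$K = M_4 = \mathbb{E}\left((X - \mu)(X - \mu)^T \otimes (X - \mu)^T \otimes (X - \mu)^T\right) = [\kappa_{i,j,k,l}]$$

where $\kappa_{i,j,k,l} = \mathbb{E}\left((X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l)\right)$,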






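where ⊗ is the Kronecker product, e.g.

$$A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix},\quad B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix},\quad A\otimes B = \begin{pmatrix} B & 3B \\ 2B & 4B \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 6 \\ 0 & 1 & 0 & 3 \\ 2 & 4 & 4 & 8 \\ 0 & 2 & 0 & 4 \end{pmatrix}$$

In R, the matrix product is %*% while the Kronecker product is %x%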

> A = matrix(1:4, 2, 2)
> B = matrix(c(1, 0, 2, 1), 2, 2)
> A %*% B
     [,1] [,2]
[1,]    1    5
[2,]    2    8
> A %x% B
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    6
[2,]    0    1    0    3
[3,]    2    4    4    8
[4,]    0    2    0    4

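Consider the case of dimension d = 2 (2 assets). The skewness matrix is then

$$M_3 = \begin{pmatrix} s_{1,1,1} & s_{1,1,2} & s_{2,1,1} & s_{2,1,2} \\ s_{1,2,1} & s_{1,2,2} & s_{2,2,1} & s_{2,2,2} \end{pmatrix} = \begin{bmatrix} S_1 & S_2 \end{bmatrix},$$

where S_1 and S_2 are 2 × 2 matrices. Similarly,

$$M_4 = \begin{bmatrix} K_{1,1} & K_{1,2} & K_{2,1} & K_{2,2} \end{bmatrix},$$

where the K_{i,j} are 2 × 2 matrices.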






Allocations Based on Higher Moments

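The variance is based on d(d + 1)/2 distinct terms, the skewness on d(d + 1)(d + 2)/6 distinct terms, and the kurtosis on d(d + 1)(d + 2)(d + 3)/24 distinct terms.

Let X denote a random vector with mean µ, variance Σ, skewness S and kurtosis K. Then, for any α,

$$\mathbb{E}(\alpha^TX) = \alpha^T\mu \quad\text{and}\quad \text{var}(\alpha^TX) = \alpha^T\Sigma\alpha.$$

Similarly, skewness and kurtosis can be derived for $\alpha^TX$:

$$s = \left(\alpha^TS(\alpha\otimes\alpha)\right)^{1/3} \quad\text{while}\quad \kappa = \left(\alpha^TK(\alpha\otimes\alpha\otimes\alpha)\right)^{1/4}.$$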
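The Kronecker-based definitions translate directly into R. This base R sketch uses simulated skewed data (an assumption) to build sample versions of M_3 and M_4 with the 1/N convention. It then checks that α^TM_3(α⊗α) and α^TM_4(α⊗α⊗α) are exactly the third and fourth central moments of the portfolio return α^TX. These are the quantities whose cube and fourth roots define s and κ above.

```r
set.seed(5)
N <- 1000
X <- cbind(rexp(N), rexp(N) + rnorm(N))      # two skewed "asset returns"
Y <- sweep(X, 2, colMeans(X))                # centred observations

# Sample co-skewness (2 x 4) and co-kurtosis (2 x 8), 1/N convention
M3 <- matrix(0, 2, 4); M4 <- matrix(0, 2, 8)
for (i in 1:N) {
  y <- Y[i, ]
  M3 <- M3 + (y %*% t(y)) %x% t(y)
  M4 <- M4 + (y %*% t(y)) %x% t(y) %x% t(y)
}
M3 <- M3 / N; M4 <- M4 / N

alpha <- c(0.6, 0.4)                          # portfolio weights
m3.formula <- as.numeric(t(alpha) %*% M3 %*% (alpha %x% alpha))
m4.formula <- as.numeric(t(alpha) %*% M4 %*% (alpha %x% alpha %x% alpha))
p <- as.numeric(Y %*% alpha)                  # centred portfolio returns
c(m3 = m3.formula, m3.direct = mean(p^3), m4 = m4.formula, m4.direct = mean(p^4))
```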






Cornish-Fisher Decomposition to Approximate Value-at-Risk

In a Gaussian model X ∼ N(µ, σ²), the α-VaR is

$$\text{VaR}_\alpha(X) = -\mu + \Phi^{-1}(1 - \alpha)\cdot\sigma = -\mathbb{E}(X) + \Phi^{-1}(1 - \alpha)\cdot\sqrt{\text{Var}(X)},$$

where Φ is the c.d.f. of the N(0, 1) distribution.

For non-Gaussian variables, this Gaussian approximation

$$\text{VaR}(X, \alpha) = -\mathbb{E}(X) + \Phi^{-1}(1 - \alpha)\cdot\sqrt{\text{Var}(X)}$$

might be a bad approximation.

Alternatives include the Normal-Power approximation and the Edgeworth expansion.

The Cornish-Fisher approximation (from Cornish & Fisher (1950) or Hill & Davis (1968)) is based on a higher moment expansion of a normalized version of X, [X − E(X)]/σ.







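Set

$$\text{VaR}_\alpha(X) \approx -\mathbb{E}(X) + z_{1-\alpha}\cdot\sqrt{\text{Var}(X)},$$

where, writing $u = \Phi^{-1}(1 - \alpha)$,

$$z_{1-\alpha} = u - \frac{\zeta_1}{6}[u^2 - 1] + \frac{\zeta_2}{24}[u^3 - 3u] - \frac{\zeta_1^2}{36}[2u^3 - 5u],$$

and where ζ_1 is the skewness and ζ_2 the excess kurtosis of X,

$$\zeta_1 = \frac{\mathbb{E}([X - \mathbb{E}X]^3)}{\mathbb{E}([X - \mathbb{E}X]^2)^{3/2}} \quad\text{and}\quad \zeta_2 = \frac{\mathbb{E}([X - \mathbb{E}X]^4)}{\mathbb{E}([X - \mathbb{E}X]^2)^2} - 3.$$

The skewness term enters with a minus sign because the VaR is a quantile of the loss −X, whose skewness is −ζ_1.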

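Given a sample X_1, ..., X_n, the Cornish-Fisher estimate of the α-quantile is

$$\hat q_n(\alpha) = \hat\mu + z_\alpha\hat\sigma, \quad\text{where}\quad \hat\mu = \frac{1}{n}\sum_{i=1}^n X_i \quad\text{and}\quad \hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i - \hat\mu)^2},$$

and

$$z_\alpha = \Phi^{-1}(\alpha) + \frac{\hat\zeta_1}{6}[\Phi^{-1}(\alpha)^2 - 1] + \frac{\hat\zeta_2}{24}[\Phi^{-1}(\alpha)^3 - 3\Phi^{-1}(\alpha)] - \frac{\hat\zeta_1^2}{36}[2\Phi^{-1}(\alpha)^3 - 5\Phi^{-1}(\alpha)],$$

where ζ̂_1 is the natural estimator of the skewness of X, and ζ̂_2 the natural estimator of the excess kurtosis, i.e.

$$\hat\zeta_1 = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{\sqrt{n}\sum_{i=1}^n(X_i - \hat\mu)^3}{\left(\sum_{i=1}^n(X_i - \hat\mu)^2\right)^{3/2}} \quad\text{and}\quad \hat\zeta_2 = \frac{n-1}{(n-2)(n-3)}\left((n+1)\hat\zeta_2' + 6\right)$$

where

$$\hat\zeta_2' = \frac{n\sum_{i=1}^n(X_i - \hat\mu)^4}{\left(\sum_{i=1}^n(X_i - \hat\mu)^2\right)^2} - 3.$$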
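Here is a base R sketch of the Cornish-Fisher estimate, coded from the formulas above. The function names are hypothetical. On a simulated, strongly right-skewed sample (a centred exponential, an assumption), the Cornish-Fisher 5% quantile is much closer to the empirical quantile than the Gaussian one. The corresponding VaR is minus that quantile.

```r
# Cornish-Fisher correction of the standard Gaussian quantile
cf.z <- function(alpha, skew, exkurt) {
  u <- qnorm(alpha)
  u + skew / 6 * (u^2 - 1) + exkurt / 24 * (u^3 - 3 * u) -
    skew^2 / 36 * (2 * u^3 - 5 * u)
}

# Sample Cornish-Fisher quantile, with the adjusted estimators defined above
cf.quantile <- function(x, alpha) {
  n <- length(x)
  dev <- x - mean(x)
  g1 <- sqrt(n) * sum(dev^3) / sum(dev^2)^(3 / 2)
  skew <- sqrt(n * (n - 1)) / (n - 2) * g1
  g2 <- n * sum(dev^4) / sum(dev^2)^2 - 3
  exkurt <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
  mean(x) + cf.z(alpha, skew, exkurt) * sd(x)   # sd() uses the 1/(n-1) convention
}

set.seed(99)
x <- rexp(5000) - 1            # right-skewed returns with mean zero
alpha <- 0.05
q.cf <- cf.quantile(x, alpha)
q.gauss <- mean(x) + qnorm(alpha) * sd(x)
q.emp <- quantile(x, alpha, names = FALSE)
round(c(gauss = q.gauss, cornish.fisher = q.cf, empirical = q.emp, VaR.cf = -q.cf), 4)
```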






Numerical Optimization

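The goal is to find

$$x^* = \underset{x\in\mathbb{R}}{\text{argmin}}\ f(x) \quad\text{or}\quad \mathbf{x}^* = \underset{\mathbf{x}\in\mathbb{R}^p}{\text{argmin}}\ f(\mathbf{x})$$

where, in a very general setting, f is an R^p → R function.

Construct a sequence (x_n) or (**x**_n) following the gradient f'(x_n) or ∇f(**x**_n). Iteratively, starting from x_0 or **x**_0, construct

$$x_{n+1} = x_n - \mu_n\cdot f'(x_n) \quad\text{or}\quad \mathbf{x}_{n+1} = \mathbf{x}_n - \mu_n\cdot\nabla f(\mathbf{x}_n)$$

where the step µ_n ∈ R is obtained at each step using

$$f(x_{n+1}) = \inf_{\mu\in\mathbb{R}} f\left(x_n - \mu f'(x_n)\right) \quad\text{or}\quad f(\mathbf{x}_{n+1}) = \inf_{\mu\in\mathbb{R}} f\left(\mathbf{x}_n - \mu\nabla f(\mathbf{x}_n)\right)$$

In dimension 1, if f is differentiable and β-convex, i.e.

$$f\left(\frac{x + y}{2}\right) \le \frac{f(x) + f(y)}{2} - \frac{\beta}{8}\|x - y\|^2,$$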






and if f' is Lipschitz on any bounded set, i.e. for all M there is C_M such that

$$\|x\| + \|y\| \le M \Rightarrow \|f'(x) - f'(y)\| \le C_M\|x - y\|,$$

then the sequence (x_n) converges towards the minimizer x*, whatever x_0.

Fixed Step Gradient Descent

Consider here a sequence (x_n) or (**x**_n) constructed iteratively using

$$x_{n+1} = x_n - \mu\cdot f'(x_n) \quad\text{or}\quad \mathbf{x}_{n+1} = \mathbf{x}_n - \mu\cdot\nabla f(\mathbf{x}_n),$$

where µ > 0 is fixed.

The algorithm is simple, but more restrictive conditions are needed to ensure convergence.

If f is differentiable and β-convex, i.e.

$$f\left(\frac{x + y}{2}\right) \le \frac{f(x) + f(y)}{2} - \frac{\beta}{8}\|x - y\|^2,$$






and if f' is Lipschitz, i.e. there exists C such that

$$\|f'(x) - f'(y)\| \le C\|x - y\|,$$

then, if 0 < µ < 2β/C², the sequence (x_n) converges to the optimal solution, whatever x_0.

Moreover, the speed of convergence is then at least geometric, since

$$\|x_n - x^*\| \le \kappa^n\|x_0 - x^*\| \quad\text{where}\quad \kappa = \sqrt{1 - 2\beta\mu + \mu^2C^2} < 1.$$
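Here is a base R sketch of fixed-step gradient descent on a quadratic f(x) = x^TQx/2 − b^Tx, where Q and b are arbitrary choices. For such an f, β is the smallest eigenvalue of Q and C the largest. The step µ = β/C² therefore lies in (0, 2β/C²), and the error should stay below the geometric bound κ^n‖x_0 − x*‖.

```r
# f(x) = x'Qx/2 - b'x, with gradient Qx - b (Q symmetric positive definite)
Q <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
b <- c(1, -1)
grad <- function(x) as.numeric(Q %*% x - b)

ev <- eigen(Q, symmetric = TRUE)$values
beta <- min(ev)                   # beta-convexity constant
C <- max(ev)                      # Lipschitz constant of the gradient
mu <- beta / C^2                  # fixed step, inside (0, 2 beta / C^2)
kappa <- sqrt(1 - 2 * beta * mu + mu^2 * C^2)

x0 <- c(10, 10)                   # arbitrary starting point
x.star <- solve(Q, b)             # exact minimiser, for comparison
x <- x0
err <- numeric(200)
for (k in 1:200) {
  x <- x - mu * grad(x)
  err[k] <- sqrt(sum((x - x.star)^2))
}
bound <- kappa^(1:200) * sqrt(sum((x0 - x.star)^2))
c(kappa = kappa, final.error = err[200], final.bound = bound[200])
```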

Using Newton's Algorithm

Suppose f: R^n → R^n is twice differentiable, and let x* be a simple (regular) zero of f, in the sense that

$$f(x^*) = 0 \quad\text{and}\quad \nabla f(x^*) \text{ is an invertible matrix.}$$

A Taylor expansion of f around a point x in the neighborhood of x* yields

$$0 = f(x^*) = f(x) + \nabla f(x)(x^* - x) + O(\|x - x^*\|^2).$$

If we neglect the remainder term, we construct a sequence (x_n) by setting, from an initial value x_0,

$$x_{n+1} = x_n - \nabla f(x_n)^{-1}f(x_n) \ \text{ for all } n = 0, 1, 2, \ldots$$

If f: R^n → R^n is twice differentiable and x* is a simple zero of f, then the sequence (x_n) converges to x* provided x_0 is close enough to x*. Furthermore, there is C > 0 such that

$$\|x_{n+1} - x^*\| \le C\|x_n - x^*\|^2.$$

This algorithm is simple, but each iteration requires solving a linear system, ∇f(x_n) δ_n = f(x_n).
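Here is a base R sketch of Newton's method on an arbitrary smooth system, f(x) = exp(x) + x − b, applied componentwise with b chosen for illustration. Its Jacobian, diag(exp(x)) + I, is always invertible. Each iteration solves the linear system with solve() rather than inverting the Jacobian.

```r
# Solve G(x) = 0 with G(x) = exp(x) + x - b (componentwise)
b <- c(1, 2, 3)
G <- function(x) exp(x) + x - b
JG <- function(x) diag(exp(x)) + diag(length(x))   # Jacobian, always invertible

x <- c(0, 0, 0)                        # starting value x_0
for (it in 1:20) {
  step <- solve(JG(x), G(x))           # solve J(x_n) delta = G(x_n)
  x <- x - step
  if (max(abs(step)) < 1e-14) break
}
c(iterations = it, max.residual = max(abs(G(x))))
round(x, 6)
```

The first component stays at 0, which solves exp(x) + x = 1 exactly. The other components converge in a handful of iterations, as expected from quadratic convergence.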






Numerical Aspects

> asset.names = c("MSFT", "NORD", "SBUX")
> er = c(0.0427, 0.0015, 0.0285)
> names(er) = asset.names
> covmat = matrix(c(0.0100, 0.0018, 0.0011, 0.0018, 0.0109, 0.0026,
+                   0.0011, 0.0026, 0.0199), nrow = 3, ncol = 3)
> rk.free = 0.005
> dimnames(covmat) = list(asset.names, asset.names)

One can also use ready-made R functions for most computations, from Zivot (2013):

> source("http://freakonometrics.free.fr/portfolio.r")
> ew = rep(1, 3)/3
> equalWeight.portfolio = getPortfolio(er = er, cov.mat = covmat, weights = ew)
> equalWeight.portfolio
Call:






getPortfolio(er = er, cov.mat = covmat, weights = ew)
Portfolio expected return:     0.02423
Portfolio standard deviation:  0.07587
Portfolio weights:
  MSFT   NORD   SBUX
0.3333 0.3333 0.3333
> plot(equalWeight.portfolio)

To get the minimum variance portfolio, use

> gmin.port <- globalMin.portfolio(er, covmat)
> attributes(gmin.port)
$names
[1] "call"    "er"      "sd"      "weights"

$class
[1] "portfolio"

> gmin.port
Call:
globalMin.portfolio(er = er, cov.mat = covmat)
Portfolio expected return:     0.02489
Portfolio standard deviation:  0.07268
Portfolio weights:
  MSFT   NORD   SBUX
0.4411 0.3656 0.1933

or, for any efficient portfolio with a given target expected return,

> target.return <- er[1]
> e.port.msft <- efficient.portfolio(er, covmat, target.return)
> e.port.msft
Call:
efficient.portfolio(er = er, cov.mat = covmat,
    target.return = target.return)
Portfolio expected return:     0.0427
Portfolio standard deviation:  0.09166
Portfolio weights:
   MSFT    NORD    SBUX
 0.8275 -0.0907  0.2633

We can also consider the case where a risk-free asset is available

> tan.port <- tangency.portfolio(er, covmat, rk.free)
> tan.port
Call:
tangency.portfolio(er = er, cov.mat = covmat, risk.free = rk.free)
Portfolio expected return:     0.05189
Portfolio standard deviation:  0.1116
Portfolio weights:
   MSFT    NORD    SBUX
 1.0268 -0.3263  0.2994

> ef <- efficient.frontier(er, covmat, alpha.min = -2, alpha.max = 1.5,
+                          nport = 20)






Numerical Aspects

One can also use standard R packages for most computations.

> library(FRAPO)
> library(PerformanceAnalytics)
> library(fPortfolio)

(to be continued...)





