Page 1: Estimating Risk and Return of Infrequently-Traded Assets ...web.math.ku.dk/~rolf/MS_211207.pdf · Estimating Risk and Return of Infrequently-Traded Assets: A Bayesian Selection Model


Preliminary and Incomplete

Comments Welcome

Estimating Risk and Return of Infrequently-Traded Assets: A Bayesian Selection Model of Venture Capital

Arthur Korteweg

Morten Sorensen†

October, 2007

Abstract: When estimating the risk and return of infrequently traded assets, a bias arises due to heteroscedasticity and sample selection. We present a Bayesian model that resolves these problems, and we estimate the model using data on VCs’ investments in entrepreneurial companies. Correcting for the bias, we find that the intercept of the market model is significantly lower, and both systematic and idiosyncratic risks are significantly higher, than suggested by standard methods. The results are robust across specifications, and the estimates of the selection equation provide new insights into the determinants of refinancing decisions for these companies.

† Stanford University GSB and University of Chicago GSB and NBER. We are grateful to John Cochrane for very helpful discussions and suggestions, to Susan Woodward of Sand Hill Econometrics for generous access to data, and to the Center for Research in Security Prices (CRSP) and the Kauffman Foundation for financial support.

We present a new empirical model of the risk and return of infrequently traded assets and apply it to investments in entrepreneurial companies by venture capital (VC) investors. A number of assets, such as privately held companies, real estate, corporate bonds, some collateralized bonds, and other OTC securities, are only infrequently traded. Since assets’ market values are only known when they trade, data on these assets’ valuations and returns are necessarily sporadic. When the current value reflects an earlier trade, this leads to the “stale-price” problem (Scholes and Williams (1977) and Dimson (1979)). In addition, when the timing of the trades, and hence the timing of the observed valuations and returns, is endogenous, a potentially important sample selection problem arises (see Woodward (2004) and Cochrane (2005)). Focusing on the latter problem, we present a dynamic sample selection model and use it to estimate the risk and return of VC investments.

Our main findings are that, after correcting for the selection bias, the investments are riskier than previously found. Our estimates of β range from 2.6 to 3.0. Estimates of the Fama-French three-factor model result in loadings of 0.9 to 1.1 on the SMB factor and of -1.7 to -2.0 on the HML factor. Not surprisingly, entrepreneurial companies behave much like small growth companies. Moreover, we find that the intercepts in the market model decline substantially after correcting for selection.

Our model extends previous models by Woodward (2004) and Cochrane (2005). We find that it is more flexible and numerically tractable, which allows us to impose additional restrictions implied by the selection model and to estimate a richer specification of the selection equation. We estimate industry-specific risk and returns, and we find substantial variation across the major industry groups. Further, we estimate a Fama-French three-factor specification and, perhaps not surprisingly, find that the returns of entrepreneurial companies load heavily on the size (small) and M/B (growth) factors.

The empirical model combines a dynamic asset pricing model with a selection model. We assume that the valuations of entrepreneurial companies evolve according to a standard log-normal CAPM or three-factor model. However, most of these valuations are unobserved and must be treated as latent variables in the model. A company’s valuation is observed only when it receives a new round of financing, goes public, or is acquired, and to account for the potential endogeneity of these events, we add a selection equation to the model. For each company, at each point in time, this equation specifies the probability of observing the company’s valuation as a function of the valuation itself, the time since the last refinancing, and general market conditions.

Estimation of the model is numerically difficult. For each company, at each point in time, the selection equation defines a latent selection variable, which interacts with the latent valuation variables and the other variables in the model. To estimate the model, it is necessary to account for the entire joint distribution of the latent selection and valuation variables, and evaluating the likelihood function requires integrating over all these variables simultaneously, which is practically infeasible due to the “curse of dimensionality.” To overcome this numerical problem, we estimate the model using a Bayesian methodology based on a Markov Chain Monte Carlo (MCMC) method known as Gibbs sampling. The model is constructed such that the problem of simulating the parameters’ posterior distribution can be separated into three smaller interrelated problems: a Bayesian regression, a draw of truncated random variables, and a Kalman Filtering problem. Each of these problems is well understood and numerically easy, and the posterior distribution can be simulated by solving these smaller problems iteratively, as specified by the Gibbs sampling procedure.

A. Previous Literature

Our analysis is closely related to Woodward (2004) and Cochrane (2005). Like them, we develop a statistical selection model for the valuations of entrepreneurial companies, but we extend their models in important ways. Unlike Woodward (2004), we explicitly model the path of the unobserved market values and impose all restrictions implied by the selection model. As in the standard Heckman (1979) sample selection model, a valuation is observed only when a selection variable exceeds a certain threshold; conversely, the valuation is unobserved when the selection variable is below this threshold. However, in contrast to Heckman (1979), observations with unobserved outcomes contain important information about the outcome (valuation) variables, because the time-series nature of the data introduces serial correlation in the values. In other words, in the standard selection model, the observations with unobserved outcomes are used to estimate the first stage but not the second stage. Here, the serial correlation in valuations means that the restrictions on valuations implied by the observations where valuations are unobserved must also be imposed in the second stage.

Compared to Cochrane (2005), our model is numerically more tractable, and we are able to estimate more flexible specifications. In particular, Cochrane (2005) assumes that the probability of observing a valuation through a refinancing or exit event is a function of the valuation alone. While the valuation is clearly an important determinant of these events, this specification may be overly parsimonious: it implies that a company with a high valuation should refinance in every period, which does not happen in practice. By including the time since the last financing round as an additional variable in the selection equation, we capture the infrequent nature of these events. Further, we include variables measuring the aggregate investment activity of VCs in the market. These variables are related to the probability that an entrepreneurial company receives refinancing but are independent of the company’s value (conditional on the market return), and they provide exogenous variation in the selection equation. It is well known that such exogenous variation helps the statistical identification of selection models generally.

In addition, Ljungqvist and Richardson (2003), Kaplan and Schoar (2005), Driessen, Lin and Phalippou (2007), and Phalippou and Zollo (2006) estimate various aspects of the risk and return of investments in entrepreneurial companies from the cash flows between VCs and their limited partners (LPs), sometimes summarized in the funds’ IRRs. These flows capture the entire net return (net of fees) that VCs earn for their investors, but we find that there are some advantages to estimating the risk and return from the market values of the individual portfolio companies instead. First, we have more observations. Clearly, there are more individual entrepreneurial companies than VC funds, and the total return earned by a fund necessarily reflects an average return earned across a portfolio of companies over a long period of time. It is not possible to attribute this return to particular time periods, which makes it difficult to calculate betas and, more generally, exposures to risk factors, and it is hard to calculate, for example, industry-specific effects.


Second, and perhaps more importantly, the estimation of risk and return from cash flows alone faces an identification problem. To illustrate this problem, consider two VCs, both with an initial endowment of $1 and both investing for two periods, after which their entire capital is paid out. Their returns are given by the CAPM with a zero risk-free rate, the market returns are 10% and -11.2% in the two periods, and both VCs earn zero α. VC One has a β of one and pays out 10% of his total capital after the first period. Hence, the two cash flows observed for this investor are (1 + 10%) x 10% = 0.11 and (1 + 10%) (1 - 10%) (1 - 11.2%) ≈ 0.879. VC Two has a β of negative one and pays out 12.2% of her total capital after the first period. Hence, the observed cash flows for VC Two are also (1 - 10%) x 12.2% ≈ 0.11 and (1 - 10%) (1 - 12.2%) (1 + 11.2%) ≈ 0.879. The identification problem is obvious. The two investors have opposite betas, yet return the same cash flows (up to rounding) to their investors. When only these cash flows are observed, it is not possible to determine whether the investor has a positive or a negative β. If α is allowed to differ from zero, the identification problem becomes even more severe. To our knowledge, this identification problem has not been formally solved, and the risk and return calculations that are based on observed cash flows (or IRRs) implicitly rely on some (unspecified) combination of functional-form and distributional assumptions, assumptions about funds’ payout ratios, and assumptions about parameters being equal across funds and time periods. When the estimates are based directly on market values instead of cash flows, this identification problem disappears.
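The arithmetic of the two-VC example can be checked with a few lines of Python. This is a minimal sketch: the function name `fund_cash_flows` is hypothetical, and, as the numbers in the example imply, it assumes a zero risk-free rate, so that each period's fund return is simply β times the market return.

```python
def fund_cash_flows(beta, payout_ratio, market_returns=(0.10, -0.112)):
    """Cash flows of a fund that starts with $1, pays out a fraction of its capital
    after period one, and pays out everything after period two.

    Assumes zero alpha and a zero risk-free rate, so the fund return is beta * market return.
    """
    capital = 1.0 + beta * market_returns[0]                   # capital after period one
    first_payout = payout_ratio * capital                      # interim distribution
    capital -= first_payout                                    # capital left invested
    final_payout = capital * (1.0 + beta * market_returns[1])  # liquidating distribution
    return first_payout, final_payout


vc_one = fund_cash_flows(beta=1.0, payout_ratio=0.10)
vc_two = fund_cash_flows(beta=-1.0, payout_ratio=0.122)
print([round(x, 3) for x in vc_one])  # [0.11, 0.879]
print([round(x, 3) for x in vc_two])  # [0.11, 0.879]
```

Up to rounding, the two cash-flow streams coincide even though the betas have opposite signs, which is the identification problem in miniature.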

The paper proceeds as follows. Section I presents the econometric model, Section II presents the data, Section III discusses the empirical results, and Section IV concludes.


I. Econometric Model

A. The Valuation Equation

Let the market value of company $i$ be denoted $V_i$, and assume that this value evolves according to a continuous-time log-normal market model (the three-factor model arises from a standard extension of this specification):

$$\frac{dV_i}{V_i} - r\,dt = \alpha_i\,dt + \beta_{im}\left(\frac{dP_m}{P_m} - r\,dt\right) + dw_i \tag{1}$$

Here $P_m$ is the value of the market portfolio, $r$ is the risk-free rate, and $w_i$ is a Brownian motion with instantaneous variance $\sigma_w^2$. Starting from this continuous-time CAPM allows us to define the model’s parameters independently of the timing between the observed valuations. This is important when comparing the MCMC estimator to the OLS and GLS estimators below. An additional advantage is that this specification implies that the discrete-time evolution of the log-valuations is linear, which allows us to draw on standard Kalman Filtering techniques; non-linear filtering presents substantially greater challenges.

To derive the discrete-time change in the valuation, we use Ito’s lemma to write

$$d\ln V_i + \left(\tfrac{1}{2}\sigma_i^2 - r\right)dt = \alpha_i\,dt + \beta_{im}\left(d\ln P_m + \left(\tfrac{1}{2}\sigma_m^2 - r\right)dt\right) + dw_i \tag{2}$$


where $\sigma_i^2$ and $\sigma_m^2$ are the instantaneous variances of $V_i$ and $P_m$, respectively, and the variances are related by $\sigma_i^2 = \sigma_w^2 + \beta_{im}^2\sigma_m^2$. The discrete-time distribution of the return until time $t$ is

$$\ln\frac{V_i(t)}{V_i(0)} + \left(\tfrac{1}{2}\sigma_i^2 - r\right)t = \alpha_i t + \beta_{im}\ln\frac{P_m(t)}{P_m(0)} + \beta_{im}\left(\tfrac{1}{2}\sigma_m^2 - r\right)t + \int_0^t dw_i \tag{3}$$

Rearranging leads to

$$\ln\frac{V_i(t)}{V_i(0)} - rt = \left(\alpha_i - \tfrac{1}{2}\sigma_w^2 + \tfrac{1}{2}\beta_{im}\left(1 - \beta_{im}\right)\sigma_m^2\right)t + \beta_{im}\left(\ln\frac{P_m(t)}{P_m(0)} - rt\right) + \varepsilon_i \tag{4}$$

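Define $r_m = P_m(t)/P_m(0)$ and $r_f = rt$, and let $\varepsilon_i = \int_0^t dw_i \sim N(0, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon^2 = \sigma_w^2 t$. Writing $\alpha_i^{c}$ for the continuous-time drift $\alpha_i$ in equations (1) to (4), we henceforth use $\alpha_i$ to denote the one-period ($t = 1$) intercept of the log-market model,

$$\alpha_i = \alpha_i^{c} - \tfrac{1}{2}\sigma_w^2 + \tfrac{1}{2}\beta_{im}\left(1 - \beta_{im}\right)\sigma_m^2.$$

While this is not the abnormal return, this intercept is useful for comparing estimates arising from different statistical models. We can now write the one-period valuation equation of the econometric model as

$$\ln V_i(t+1) = \ln V_i(t) + r_f + \alpha_i + \beta_{im}\left(\ln r_m - r_f\right) + \varepsilon_i \tag{5}$$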
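To make the mapping from the continuous-time model to equation (5) concrete, the sketch below computes the one-period intercept from the continuous-time parameters and simulates a monthly log-valuation path. It is an illustration only: the function names are hypothetical, `alpha_c` stands for the continuous-time drift in equation (1), and the market and risk-free inputs are taken as given arrays of per-period log returns with arbitrary example values.

```python
import numpy as np


def one_period_intercept(alpha_c, sigma_w2, beta_im, sigma_m2):
    """One-period (t = 1) intercept of the log-market model implied by equation (4)."""
    return alpha_c - 0.5 * sigma_w2 + 0.5 * beta_im * (1.0 - beta_im) * sigma_m2


def simulate_log_valuations(log_v0, alpha_i, beta_im, sigma_eps, log_rf, log_rm, rng):
    """Simulate ln V_i(t) forward with the one-period valuation equation (5).

    log_rf: per-period log risk-free returns r_f
    log_rm: per-period log gross market returns ln r_m
    """
    log_rf = np.asarray(log_rf, dtype=float)
    log_rm = np.asarray(log_rm, dtype=float)
    shocks = rng.normal(0.0, sigma_eps, size=log_rm.shape)  # epsilon_i ~ N(0, sigma_eps^2)
    increments = log_rf + alpha_i + beta_im * (log_rm - log_rf) + shocks
    # Returns ln V_i(0), ln V_i(1), ..., ln V_i(n)
    return np.concatenate(([log_v0], log_v0 + np.cumsum(increments)))


rng = np.random.default_rng(0)
alpha_i = one_period_intercept(alpha_c=0.02, sigma_w2=0.10, beta_im=2.0, sigma_m2=0.0025)
path = simulate_log_valuations(0.0, alpha_i, 2.0, np.sqrt(0.10), np.full(24, 0.003),
                               rng.normal(0.005, 0.05, size=24), rng)
```

Because the shocks are independent across periods, the variance of the log-return between two observed valuations grows linearly with the time between them, which is the source of the heteroscedasticity that the GLS estimator in Section III corrects for.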

B. The Selection Equation

Most valuations are unobserved and are latent variables in the model. A valuation is only observed when the company experiences a refinancing or exit event, and we use the following selection equation to account for the endogeneity of these events. For company $i$, at time $t$, define the selection variable $w_i(t)$ as

$$w_i(t) = X_i(t)'\gamma + \eta_i(t) \tag{6}$$

Here, $X_i(t)$ is a vector of characteristics (including a constant term) that affect whether a refinancing or exit event happens. One of these characteristics is typically the asset’s own valuation or the return earned, either in total or since the previous financing round. This is natural, since more successful companies are more likely to receive additional financing, be acquired, or go public. Other characteristics are the time since the previous financing round and variables capturing general market conditions. The error term $\eta_i(t) \sim N(0, \sigma_\eta^2)$ is assumed i.i.d. and captures unobserved shocks. The valuation $V_i(t)$ is observed when $w_i(t) \ge 0$, and we define

$$d_i(t) = 1\left[w_i(t) \ge 0\right] \tag{7}$$

such that $d_i(t)$ equals one when a valuation is observed and zero otherwise. Overall, the model assumes that the population distribution of $\left(X_i(t), r_m(t), r_f(t), d_i(t), V_i(t)\,d_i(t)\right)$ is observed in the data.
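A minimal simulation of the selection mechanism in equations (6) and (7), assuming (as in the normalization used for the Gibbs sampler) that $\sigma_\eta^2 = 1$, so that the probability of observing a valuation is $\Phi(X_i(t)'\gamma)$. The covariates and coefficients below are hypothetical placeholders, not estimates from the paper, and the function names are illustrative.

```python
from statistics import NormalDist

import numpy as np

_PHI = NormalDist()  # standard Normal distribution


def simulate_selection(X, gamma, rng):
    """Draw w_i(t) = X_i(t)'gamma + eta_i(t), eta ~ N(0, 1), and d_i(t) = 1[w_i(t) >= 0]."""
    w = X @ gamma + rng.standard_normal(X.shape[0])
    d = (w >= 0).astype(int)
    return w, d


def selection_probability(X, gamma):
    """Probability that a valuation is observed: Phi(X_i(t)'gamma)."""
    return np.array([_PHI.cdf(v) for v in X @ gamma])


# Hypothetical covariates: constant, log valuation, months since last round (tau), tau^2
rng = np.random.default_rng(0)
tau = rng.integers(1, 37, size=8).astype(float)
X = np.column_stack([np.ones(8), rng.normal(0.0, 1.0, size=8), tau, tau ** 2])
gamma = np.array([-1.0, 1.0, 0.1, -0.002])  # hypothetical coefficients
w, d = simulate_selection(X, gamma, rng)
p = selection_probability(X, gamma)
```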

C. Gibbs Sampler

To estimate the model using Gibbs sampling (Geman and Geman (1984) and Gelfand and Smith (1990)), the variables are divided into three blocks. The first block contains the parameters, $\theta = (\alpha, \beta, \gamma, \sigma^2)$. The second block contains the latent selection variables, $w_i(t)$, and the third block contains the latent valuations, $V_i(t)$. The Gibbs sampler simulates the joint posterior distribution of all these variables by iteratively simulating the variables in each block conditional on the most recent values of the variables in the other blocks. Note that the resulting posterior distribution is not just the posterior distribution of the parameters of the model; it is augmented with the latent selection and valuation variables. This augmentation technique was introduced by Tanner and Wong (1987) and significantly improves the numerical tractability of the model. The model is simulated using 500 burn-in iterations followed by 500 iterations for the actual estimation. The simulation of the variables in each of the three blocks is described next.

First, the parameters are sampled using Bayesian regressions. The parameters of the valuation equation are estimated by a Bayesian regression in which the excess returns computed from all the valuations (both observed and latent) are regressed on the excess returns to the market and a constant term. It is well known that, conditional on the variance, the posterior distribution of these parameters is the Normal distribution

$$(\alpha, \beta) \mid \sigma^2 \sim N(\mu, \Sigma) \tag{8}$$

and the distribution of the variance is the Inverse Gamma distribution

$$\sigma^2 \sim IG(a, b) \tag{9}$$

The parameters in the Normal distribution are

$$\mu = \Sigma\left(\Sigma_0^{-1}\mu_0 + Z'Y/\sigma^2\right), \qquad \Sigma = \left(\Sigma_0^{-1} + Z'Z/\sigma^2\right)^{-1} \tag{10}$$

where the vector $Y$ contains the monthly excess log-returns computed from the latent valuations $V_i(t)$, the matrix $Z$ contains a constant term and the corresponding monthly excess log-returns on the market, and $\mu_0$ and $\Sigma_0$ are the prior mean and covariance matrix of $(\alpha, \beta)$. The parameters in the Inverse Gamma distribution are

$$a = a_0 + \tfrac{1}{2}\sum_i T_i, \qquad b = b_0 + \tfrac{1}{2}e'e \tag{11}$$

where $T_i$ is the number of months between the first and last observation for firm $i$, and $a_0$ and $b_0$ are the parameters of the prior distribution of $\sigma^2$, which is $IG(a_0, b_0)$. The vector $e$ contains the stacked idiosyncratic returns, $\varepsilon_i(t)$.
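A sketch of the first Gibbs block for the valuation equation, drawing $(\alpha, \beta)$ from equations (8) and (10) and then $\sigma^2$ from equations (9) and (11). It assumes the stacked excess returns $Y$ and regressors $Z$ have already been built from the current draw of the valuation path, and it parameterizes $IG(a, b)$ in the shape-scale form whose mean is $b/(a-1)$; that convention is an assumption of this sketch and may differ from the one behind the paper's prior. The function names are hypothetical.

```python
import numpy as np


def draw_alpha_beta(Y, Z, sigma2, mu0, Sigma0, rng):
    """Draw (alpha, beta) | sigma^2 ~ N(mu, Sigma) as in equations (8) and (10)."""
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma = np.linalg.inv(Sigma0_inv + Z.T @ Z / sigma2)
    mu = Sigma @ (Sigma0_inv @ mu0 + Z.T @ Y / sigma2)
    return rng.multivariate_normal(mu, Sigma)


def draw_sigma2(e, total_months, a0, b0, rng):
    """Draw sigma^2 ~ IG(a, b) with a = a0 + sum_i T_i / 2 and b = b0 + e'e / 2 (equation (11))."""
    a = a0 + 0.5 * total_months
    b = b0 + 0.5 * float(e @ e)
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ IG(a, b) with mean b / (a - 1).
    return 1.0 / rng.gamma(shape=a, scale=1.0 / b)
```

Within one sweep, one would call `draw_alpha_beta` with the current $\sigma^2$, recompute the residuals $e = Y - Z(\alpha, \beta)'$, and then call `draw_sigma2`.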

For the selection equation, the posterior distribution of the parameters is found by a Bayesian regression of the selection variables on the exogenous variables. The scale of this equation is not identified, and it is normalized by setting the variance of $\eta_i(t)$ equal to one. As a result, the posterior distribution of $\gamma$ is simply

$$\gamma \sim N(\theta, \Omega) \tag{12}$$

where

$$\Omega = \left(\Omega_0^{-1} + X'X\right)^{-1}, \qquad \theta = \Omega\left(\Omega_0^{-1}\theta_0 + X'W\right) \tag{13}$$

The matrix $X$ contains the stacked $X_i(t)$, and $W$ is the stacked vector of $w_i(t)$. The prior distribution of $\gamma$ is a Normal distribution with mean $\theta_0$ and covariance matrix $\Omega_0$.
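The corresponding draw for the selection-equation coefficients, following equations (12) and (13) with the error variance normalized to one, can be sketched the same way; `draw_gamma` is a hypothetical name, and `W` stands for the current draw of the stacked selection variables.

```python
import numpy as np


def draw_gamma(W, X, theta0, Omega0, rng):
    """Draw gamma ~ N(theta, Omega), equations (12)-(13); the variance of eta is fixed at one."""
    Omega0_inv = np.linalg.inv(Omega0)
    Omega = np.linalg.inv(Omega0_inv + X.T @ X)
    theta = Omega @ (Omega0_inv @ theta0 + X.T @ W)
    return rng.multivariate_normal(theta, Omega)
```

With the prior variances of 100 described under Prior Distributions, `Omega0` would be 100 times the identity matrix.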


In the second block, the selection variables are sampled conditional on the most recent values of the valuation variables and parameters. The conditional distribution of the selection variables follows directly from equations (5) to (7):

$$w_i(t) \sim TN\left(X_i(t)'\gamma, 1\right) \tag{14}$$

where $TN$ denotes a truncated Normal distribution. The truncation is at zero from above when the corresponding valuation is unobserved and at zero from below when it is observed.
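Equation (14) can be sampled with the inverse-CDF method, as in the sketch below. It uses only NumPy and the standard library's `statistics.NormalDist`; inverse-CDF sampling loses accuracy far in the tails (roughly when the mean exceeds 7 in absolute value), where a dedicated truncated-Normal sampler such as `scipy.stats.truncnorm` would be preferable. The function name is hypothetical.

```python
from statistics import NormalDist

import numpy as np

_PHI = NormalDist()  # standard Normal distribution


def draw_selection_variables(mean, observed, rng):
    """Draw w_i(t) ~ TN(X_i(t)'gamma, 1) as in equation (14).

    mean:     array of X_i(t)'gamma
    observed: boolean array, True where the valuation is observed (d_i(t) = 1)
    Observed valuations give w >= 0; unobserved valuations give w < 0.
    """
    mean = np.asarray(mean, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    p0 = np.array([_PHI.cdf(-m) for m in mean])  # P(w < 0) for each draw
    u = rng.uniform(size=mean.shape)
    # Restrict the uniform draw to the CDF range of the allowed region
    p = np.where(observed, p0 + u * (1.0 - p0), u * p0)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)           # inv_cdf needs 0 < p < 1
    return mean + np.array([_PHI.inv_cdf(q) for q in p])
```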

In the final block, the latent valuation variables are simulated. This block is the most complex part of the estimation procedure. After conditioning on the parameters and the selection variables, the evolution of the valuation variables corresponds to a Kalman Filter, which filters out the unobserved valuations using the observed valuations and the restrictions imposed by the selection equation. To simulate the conditional posterior distribution of these valuations, we rely on the Forward Filtering Backward Sampling (FFBS) procedure of Carter and Kohn (1994). This procedure allows us to draw a sample path of all of a company’s valuations from the posterior distribution of this path implied by the observed valuations and the selection equation. In the terminology of the Kalman Filter, the latent valuation variables are state variables, and the transition rule is the valuation equation

$$\ln V_i(t+1) = \ln V_i(t) + r_f(t) + \alpha_i + \beta_{im}\left(\ln r_m(t) - r_f(t)\right) + \varepsilon_i(t) \tag{15}$$


In the filtering terminology, $\alpha_i$ and $\beta_{im}\left(\ln r_m(t) - r_f(t)\right)$ (together with $r_f(t)$) are observed controls acting on the state. Depending on whether the valuation is observed or not, the system has one or two observation equations. Since we condition on the selection variables, these are always “observed,” and they are given by the equation

$$w_i(t) = X_i(t)'\gamma + \eta_i(t) \tag{16}$$

When a valuation is observed, it is modeled using a second observation equation

$$\ln V_i(t) = \ln V_i^*(t) \tag{17}$$

Here $V_i^*$ is the observed valuation and $V_i$ is the valuation in the Kalman Filter. These two valuations are equal, since valuations are assumed to be observed without error.
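A compact sketch of Forward Filtering Backward Sampling (Carter and Kohn (1994)) for one company's log valuations. It rests on a simplifying assumption that should be stated loudly: the log valuation enters the selection equation linearly with coefficient `g`, so that, after subtracting the contribution of the other covariates, each selection variable is a noisy linear measurement `y[t] = g * x[t] + eta` with unit variance. The drift `c[t]` collects $r_f(t) + \alpha_i + \beta_{im}(\ln r_m(t) - r_f(t))$ from equation (15), and an observed valuation pins the state down exactly, as in equation (17). All names are hypothetical.

```python
import numpy as np


def ffbs_log_valuations(y, g, c, s2, obs_log_v, observed, m0, P0, rng):
    """Forward-filter, backward-sample one company's latent log valuations x[t] = ln V_i(t).

    State:        x[t+1] = x[t] + c[t] + eps,  eps ~ N(0, s2)   (valuation equation)
    Observation:  y[t]   = g * x[t] + eta,     eta ~ N(0, 1)    (selection equation)
    Observation:  x[t]   = obs_log_v[t] exactly where observed[t] is True
    """
    T = len(y)
    m = np.empty(T)  # filtered means
    P = np.empty(T)  # filtered variances
    for t in range(T):
        # Predict from the previous filtered state (or the prior at t = 0)
        if t == 0:
            m_pred, P_pred = m0, P0
        else:
            m_pred, P_pred = m[t - 1] + c[t - 1], P[t - 1] + s2
        # Update with the selection-equation measurement (unit noise variance)
        gain = P_pred * g / (g * g * P_pred + 1.0)
        m[t] = m_pred + gain * (y[t] - g * m_pred)
        P[t] = P_pred / (1.0 + g * g * P_pred)
        # An error-free observed valuation pins down the state exactly
        if observed[t]:
            m[t], P[t] = obs_log_v[t], 0.0
    # Backward sampling: draw x[T-1], then x[t] | x[t+1] for t = T-2, ..., 0
    x = np.empty(T)
    x[-1] = m[-1] + np.sqrt(P[-1]) * rng.standard_normal()
    for t in range(T - 2, -1, -1):
        J = P[t] / (P[t] + s2)
        mean = m[t] + J * (x[t + 1] - m[t] - c[t])
        var = P[t] * (1.0 - J)
        x[t] = mean + np.sqrt(max(var, 0.0)) * rng.standard_normal()
    return x
```

Because an error-free observation sets the filtered variance to zero, the backward pass reproduces the observed valuations exactly and only fills in the path between them.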

D. Prior Distributions

We use diffuse priors for the model parameters. The prior mean of α is zero, and we use the GLS estimate of β as its prior mean. The prior variances of α and β are chosen to be 1000 times their estimated GLS variances (which are typically around 0.008). The prior distribution of $\sigma^2$ is an Inverse Gamma distribution with parameters 2.1 and 600. This distribution implies that $E(\sigma) = 0.12$ and that σ is between 1 and 50 percent per month with 99% probability.

For the parameters in the selection equation, we take the prior distributions to be Normal with prior variances of 100. The intercept has a prior mean of -1, and the coefficient on the log valuation has a prior mean of 1. The prior means of the coefficients on the time since the last refinancing and its square are 1 and -0.1, respectively. All other coefficients have prior means of zero. The prior distributions are diffuse relative to the resulting posterior distributions, and the estimates are largely robust to changes in the means and variances of the prior distributions; the estimated coefficients thus reflect information in the data rather than in the priors.

II. Description of Data

Monthly market returns and Fama-French portfolios are downloaded from Kenneth French’s website. The factor returns are constructed from all NYSE, AMEX and NASDAQ firms in CRSP. Monthly Treasury-Bill rates are from Ibbotson Associates and are also available on Ken French’s website.

A. Venture Capital Data

The data on VCs’ investments in entrepreneurial companies are provided by Sand Hill Econometrics (SHE), a commercial data provider. The data contain the majority of US investments in the period from 1987 to 2005. SHE combines and extends two commercially available databases, Venture Xpert (formerly Venture Economics) and VentureOne. These two databases are used extensively in the VC literature, and Gompers and Lerner (1999) and Kaplan, Sensoy and Strömberg (2002) investigate the completeness of the Venture Xpert data and find that they contain the majority of the investments and that the missing investments tend to be less significant ones. In addition, SHE has spent a substantial amount of time and effort ensuring the accuracy of the data. This includes removing investment rounds that did not actually occur, adding investment rounds that were not in the original data, and consolidating rounds so that each round corresponds to a single actual investment by one or more VCs. Cochrane (2005) uses similar data, but our version is more recent, and many of the previously encountered data problems have been resolved.

B. Calculating Valuations

VCs distinguish between the pre- and post-money valuations of an investment, and the data contain both of these valuations for a large number of the investments. When a VC invests $I$ in a company with a total value of $PV_{POST}$ after the investment (the post-money valuation), the pre-money valuation $PV_{PRE}$ is defined by $PV_{POST} = PV_{PRE} + I$.

To illustrate, imagine that VC A invests $1m in a company with a pre-money valuation of $2m and a post-money valuation of $3m. This implies that VC A receives 1/3 of the shares of the company. In the next round, VC B invests $4m at a pre-money valuation of $6m and a post-money valuation of $10m. The issuance of new shares to VC B dilutes VC A’s ownership fraction to 1/5, and the value of VC A’s shares is 1/5 x $10m = $2m. This number can also be calculated using VC A’s original ownership fraction and the pre-money valuation in the second round, as 1/3 x $6m = $2m. In short, we calculate the investor’s return from round t to round t⁺ using the formula

$$r_{t,t^+} = \frac{PV_{PRE}(t^+)}{PV_{POST}(t)} - 1 \tag{18}$$


We can use this relationship to construct a new valuation variable that strips out the dilution of the original ownership. Starting from $V(0) = 1$, future values of this valuation are calculated using the equation

$$\frac{V(t^+)}{V(t)} = 1 + r_{t,t^+} \tag{19}$$

and the resulting valuations are used as the observed valuations in the estimation procedure.

This variable can be constructed only when pre- and post-money valuations are observed for consecutive rounds. When they are not observed for an intermediate investment round, it is impossible to adjust for the dilution, and the valuation variable is “restarted” after the break. While this reduces the number of valuations used in the estimation, it does not introduce any bias.
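The dilution-adjusted valuation index of equations (18) and (19), including the restart after a round with missing valuations, can be sketched as follows. The input format (a chronological list of `(pre_money, post_money)` pairs, with `None` for a missing value) and the function name are assumptions for illustration. For the VC A example, rounds of (2, 3) and (6, 10) in $m give an index path of 1 and 2, matching VC A's $1m stake growing to $2m.

```python
def dilution_adjusted_index(rounds):
    """Build the valuation index of equation (19) from consecutive financing rounds.

    rounds: chronological list of (pre_money, post_money) pairs; None marks a missing value.
    Returns a list of segments; each segment starts at V = 1 and the index restarts
    after any round whose pre- or post-money valuation is missing.
    """
    segments, current, prev_post = [], None, None
    for pre, post in rounds:
        if pre is None or post is None:
            # Dilution cannot be adjusted across this round: close the current segment
            if current:
                segments.append(current)
            current, prev_post = None, None
            continue
        if current is None:
            current = [1.0]                            # restart at V(0) = 1
        else:
            r = pre / prev_post - 1.0                  # equation (18)
            current.append(current[-1] * (1.0 + r))    # equation (19)
        prev_post = post
    if current:
        segments.append(current)
    return segments


print(dilution_adjusted_index([(2.0, 3.0), (6.0, 10.0)]))  # [[1.0, 2.0]]
```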

C. Descriptive Statistics

We observe a total of 15,169 financing rounds, of which 9,637 have data on valuations. The observations are concentrated in the Biotech and IT industries. There is a total of 3,237 unique start-ups. A third of these firms ultimately go public, another 742 are acquired, and we have explicit information about 444 companies being liquidated. There is no information about the fate of the remaining 940 firms in the data. Some of these may be alive and well, some may be “zombies,” and most have likely been liquidated. The empirical model must deal with this uncertainty about unobserved outcomes.


On average, an entrepreneurial firm receives 4.7 financing rounds (the median is 4 rounds). However, this distribution is highly skewed, with some firms having as many as 9 rounds. On average, 12.4 months (the median is 10 months) pass between consecutive rounds. This distribution is also highly skewed: while 5% of follow-on investments occur after as little as 2 months, another 5% take more than 32 months. The observed returns between rounds are 129% on average, with an enormous standard deviation of 340%.

**** TABLE I ABOUT HERE ****

III. Empirical Results

We first present estimates of the CAPM and discuss the econometric issues in the context of this model. We then present and discuss the effects of major industry groups and time periods, different risk factors, and market-wide determinants of the return to VC investments.

We compare estimates arising from standard OLS, GLS, and our MCMC procedure. The OLS and GLS estimators are standard estimation procedures, which estimate the risk and return in a regression framework without accounting for the endogeneity of the observed returns. For these two estimators, we calculate the excess log-return and regress it on the corresponding excess log-return on the market. To facilitate comparison of the various estimators, these returns are calculated from the basic specification in equation (4). In particular, the following equation is estimated using OLS:

$$\ln\frac{V_i(t^+)}{V_i(t)} - \ln r_{f,t,t^+} = \alpha_{OLS}\left(t^+ - t\right) + \beta_{OLS}\left[\ln r_{m,t,t^+} - \ln r_{f,t,t^+}\right] + \varepsilon_{OLS} \tag{20}$$

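For each observed return, $t^+$ and $t$ denote the times of the current and the previous valuations, respectively, and $r_{m,t,t^+}$ and $r_{f,t,t^+}$ are the gross market and risk-free returns between these two dates.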

The GLS estimator adjusts for the heteroscedasticity in equation (20), which arises because valuations that are further apart have more volatile error terms. Formally, equation (4) implies that $\varepsilon_{OLS} \sim N\left(0, \sigma_w^2 (t^+ - t)\right)$, and the GLS estimator normalizes this variance by dividing equation (20) by the square root of the time between observed valuations. Hence, the GLS estimator is estimated from the specification

$$\frac{\ln\left(V_i(t^+)/V_i(t)\right) - \ln r_{f,t,t^+}}{\sqrt{t^+ - t}} = \alpha_{GLS}\sqrt{t^+ - t} + \beta_{GLS}\,\frac{\ln r_{m,t,t^+} - \ln r_{f,t,t^+}}{\sqrt{t^+ - t}} + \varepsilon_{GLS} \tag{21}$$
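To see what the GLS rescaling in equation (21) does, the sketch below simulates returns between irregularly spaced valuations from the valuation equation, without any selection, and fits both equation (20) and equation (21) by least squares. The parameter values are arbitrary and hypothetical, and the function names are illustrative. In this selection-free setting both estimators are consistent; the point is only that the GLS residual volatility is a per-month quantity, whereas the OLS residual mixes horizons of different lengths.

```python
import numpy as np


def least_squares(y, X):
    """Ordinary least squares coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def fit_ols_gls(excess_ret, excess_mkt, gap_months):
    """Fit equation (20) by OLS and equation (21) by GLS.

    excess_ret: ln(V(t+)/V(t)) minus the log risk-free return over each interval
    excess_mkt: log market return minus the log risk-free return over each interval
    gap_months: t+ - t, the number of months between consecutive valuations
    """
    dt = np.asarray(gap_months, dtype=float)
    X = np.column_stack([dt, excess_mkt])  # the intercept scales with the gap length
    coef_ols, resid_ols = least_squares(excess_ret, X)
    root = np.sqrt(dt)
    # Dividing every term by sqrt(t+ - t) gives each error the same variance sigma_w^2
    coef_gls, resid_gls = least_squares(excess_ret / root, X / root[:, None])
    return coef_ols, resid_ols.std(ddof=2), coef_gls, resid_gls.std(ddof=2)


rng = np.random.default_rng(1)
n = 50_000
gaps = rng.integers(1, 37, size=n)                     # hypothetical gaps of 1-36 months
mkt = rng.normal(0.005 * gaps, 0.045 * np.sqrt(gaps))  # hypothetical market excess returns
ret = 0.01 * gaps + 1.6 * mkt + rng.normal(0.0, 0.3 * np.sqrt(gaps))
coef_ols, vol_ols, coef_gls, vol_gls = fit_ols_gls(ret, mkt, gaps)
```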

A. Heteroscedasticity, OLS, GLS

Table II presents the estimates of the OLS, the GLS, and the MCMC models. All of these estimates are calculated without accounting for the endogeneity of the observed returns. The intercepts of monthly log-returns from the OLS and the GLS models are $\alpha_{OLS} = 0.91\%$ and $\alpha_{GLS} = 2.39\%$, respectively. The difference between these two estimates is statistically significant but difficult to interpret (Cochrane (2005) finds an annual intercept of 462%).

To turn these log-return intercepts into arithmetic returns, we correct for the log-linearization and add $\frac{1}{2}\sigma^2$ to the intercept. The estimated idiosyncratic risk in the OLS regression is 88%. However, because of the heteroscedasticity, it is not clear how to interpret this value.


The GLS estimation finds a volatility of 32.66%. Correcting the intercept leads to an estimated monthly abnormal return of 7.7%, a substantial return.
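The ½σ² correction comes from the log-normal identity E[e^X] = e^(μ + σ²/2) for X ~ N(μ, σ²). A small Python sketch that reproduces the 7.7% figure from the monthly GLS estimates in Table II; the function name is our own, for illustration only.

```python
def arithmetic_intercept(alpha_log, sigma):
    """Log-normal correction: a log-return intercept alpha plus sigma**2 / 2."""
    return alpha_log + 0.5 * sigma ** 2


# Monthly GLS estimates from Table II: alpha = 2.39%, sigma = 32.66%.
gls_abnormal = arithmetic_intercept(0.0239, 0.3266)
print(f"{gls_abnormal:.4f}")  # 0.0772, i.e. about 7.7% per month
```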

**** TABLE II ABOUT HERE ****

B. The Selection Equation

Table II shows that, ignoring selection, the GLS and MCMC methods produce similar estimates. To gauge the magnitude of the selection problem, we compare the MCMC estimates with and without the selection equation. Table III shows that the intercept of the log-market model declines from 2.13% to between -2.11% and -3.45%, depending on the specification of the selection model. The β increases from the initial estimate of 1.6152 to between 2.6377 and 3.0157, again depending on the selection model. The idiosyncratic risk estimates also increase, from 32.43% to between 39.49% and 45.74%. These are the changes one would expect from including the selection equation in the model: because a valuation is more likely to be observed after a high return, the observed returns overrepresent good outcomes, particularly when the market falls and mainly companies with high idiosyncratic returns are refinanced. Ignoring selection therefore biases the intercept upward and the β and volatility estimates downward.
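A toy Python simulation of this mechanism, which is not the paper's MCMC selection model: the data-generating process, the selection threshold, and all parameter values are hypothetical. A valuation is observed only when a latent index that rises with the company's own return exceeds a threshold, and OLS is then run on all returns and on the observed returns only.

```python
import numpy as np


def ols(y, x):
    """OLS of y on a constant and x; returns (intercept, slope, residual std)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef[0], coef[1], resid.std(ddof=2)


rng = np.random.default_rng(0)
n = 100_000
alpha, beta, sigma = 0.0, 2.0, 0.5  # hypothetical true parameters

m = rng.normal(0.0, 0.2, size=n)  # market excess log-return
y = alpha + beta * m + rng.normal(0.0, sigma, size=n)  # company excess log-return

# Hypothetical selection rule: a valuation is observed only when a latent
# index that rises with the company's own return exceeds a threshold.
latent = y + rng.normal(0.0, 0.5, size=n)
observed = latent > 0.5

full = ols(y, m)
sel = ols(y[observed], m[observed])
print("all returns:      alpha=%.3f beta=%.3f sigma=%.3f" % full)
print("observed returns: alpha=%.3f beta=%.3f sigma=%.3f" % sel)
```

On the full sample OLS recovers β close to 2, while on the selected sample the intercept is pushed up and both β and the residual volatility are pushed down, the same direction as the changes between Tables II and III.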

The estimates of the selection equation parameters are also sensible. Across all specifications, the probability of observing a valuation increases with the return since the previous financing round. As the time since the last financing round (τ) increases, a new financing round becomes more likely (in other words, the firm can trade off return against time since the last refinancing). However, when the time since the previous financing round becomes sufficiently large, the τ² term dominates, and the probability of refinancing declines.
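The point at which the τ² term starts to dominate follows from the two coefficients: the τ part of the selection index, γ_τ·τ + γ_τ²·τ², peaks at τ* = -γ_τ / (2γ_τ²). A short Python sketch using the first τ and τ² posterior means reported in Table III; the function name is hypothetical, plugging in posterior means gives only a point value, and because this section does not state how τ is scaled when it enters the selection equation, τ* is in those same units rather than necessarily in months.

```python
def tau_turning_point(g_tau, g_tau2):
    """tau at which g_tau * tau + g_tau2 * tau**2 peaks; needs g_tau2 < 0."""
    if g_tau2 >= 0:
        raise ValueError("no interior maximum unless the tau^2 coefficient is negative")
    return -g_tau / (2.0 * g_tau2)


# First tau and tau^2 posterior means reported in Table III.
tau_star = tau_turning_point(0.2635, -0.0290)
print(round(tau_star, 2))  # 4.54
```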


For the identification of selection models, it is important to have independent variation in the selection equation, that is, variables that affect the probability of observing a valuation but not the returns themselves. We use the numbers of acquisitions and public offerings of VC-backed companies, denoted ACQ and IPO, respectively, to provide this variation. The identifying assumption is that, conditional on the current valuation, ACQ and IPO contain no information about the future returns earned by the company. This assumption holds as long as investors rationally incorporate all beliefs about future returns into the current valuation. In specifications 3 and 4, ACQ and IPO both enter positively; IPO is significant in both specifications, while ACQ is significant only in specification 3. When there is more activity in the aggregate VC market, it is more likely that a given firm obtains a new financing round.

C. Estimating a Three-Factor Model

To provide further insights into the determinants of the risk and return to investments in entrepreneurial companies, Table IV presents estimates of a Fama-French three-factor specification. Not surprisingly, we find a large positive loading on the market factor, indicating that entrepreneurial companies are highly exposed to general market conditions. We find a positive loading on the SMB factor, indicating that the values of entrepreneurial companies behave like those of small stocks. Finally, we find negative loadings on the HML factor, indicating that our valuations behave like those of companies with low book-to-market ratios (growth companies). Perhaps more surprising is the large magnitude of these factor loadings. Controlling for these factors leaves α largely unchanged.


D. Industry Betas

Table V presents estimates of alphas and betas for the four major industry classifications: Biotech, IT, Retail, and Other. Risk and return differ substantially across these four industries. Retail has the highest betas in every specification, while the lowest betas are found for Biotech in the OLS and GLS estimates and for Other in the MCMC estimates. In the MCMC estimates, Biotech has the largest monthly intercepts, while the lowest intercepts are found for Retail and Other investments.

IV. Conclusion

When estimating risk and return for assets with infrequently observed market valuations, a number of empirical problems arise. When the duration between the observed valuations varies, the standard OLS approach suffers from heteroscedasticity. We show that a straightforward GLS correction resolves this problem and changes the estimated parameters substantially. Moreover, when the timing of the observed valuations is endogenous, a sample selection problem arises. To resolve this problem, we introduce a new sample selection model.

We estimate our model using a Bayesian approach that relies on MCMC methods. The model generalizes previous models by Cochrane (2005) and Woodward (2004). Like Cochrane's (2005) model, it explicitly models the underlying path of unobserved valuations, which is important for imposing the constraints of the selection model when valuations are unobserved. Unlike Cochrane's (2005) model, it is numerically more tractable and allows for more flexible and realistic specifications of the selection equation, which is important for accurately correcting for selection bias and provides insights into the process governing refinancing and exit events for entrepreneurial companies.

Our first result is that correcting for heteroscedasticity in the error terms is important for the estimates. The GLS estimates and the MCMC estimates (ignoring selection) correct for the heteroscedasticity in similar ways. In addition, when correcting for the selection bias, the MCMC procedure finds significantly higher risk exposures, both systematic and idiosyncratic, and lower intercepts. These findings are robust across a number of specifications of the pricing model, including the CAPM, the Fama-French three-factor model, and industry-specific loadings on the market factor. The results are also robust to various specifications of the selection equation. We include the underlying valuations to correct for the endogeneity. We include the time since the previous financing round to control for the fact that even well-performing companies receive financing only with some lag. Further, we include market-wide variables, such as the total number of companies receiving VC financing and the total amount invested during the same month. These market-wide variables provide exogenous variation in the selection equation, since it is reasonable to assume that, conditional on the entrepreneurial company's valuation, they are independent of its future returns.

We are reluctant to interpret the intercepts as abnormal returns for two reasons. First, investors and entrepreneurs cannot earn these returns, since they cannot rebalance their portfolios and since the calculation does not adequately account for the fact that most investments end with returns of negative one hundred percent for everybody. Second, translating the intercept in the log-normal model into an abnormal return in the standard arithmetic model is not a trivial task either. We are currently working on both of these issues.


Bibliography

Carter, C., and R. Kohn, 1994, On Gibbs Sampling for State Space Models, Biometrika 81, 541-553.

Cochrane, John, 2005, The Risk and Return of Venture Capital, Journal of Financial Economics 75, 3-52.

Dimson, E., 1979, Risk Measurement When Shares are Subject to Infrequent Trading, Journal of Financial Economics 7, 197-226.

Driessen, Joost, Tse-Chun Lin, and Ludovic Phalippou, 2007, Measuring the risk of private equity funds: A new approach, working paper.

Gelfand, Alan, and Adrian Smith, 1990, Sampling Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association 85, 398-409.

Geman, S., and D. Geman, 1984, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.

Gompers, Paul, and Josh Lerner, 1999, The Venture Capital Cycle (MIT Press, Cambridge).

Heckman, James, 1979, Sample Selection Bias as a Specification Error, Econometrica 47, 153-162.

Kaplan, Steven, and Antoinette Schoar, 2005, Private Equity Performance: Returns, Persistence, and Capital Flows, Journal of Finance 60, 1791-1823.

Kaplan, Steven, Berk Sensoy, and Per Strömberg, 2002, How Well do Venture Capital Databases Reflect Actual Investments? working paper, University of Chicago.

Ljungqvist, Alexander, and Matthew Richardson, 2003, The Cash Flow, Return and Risk Characteristics of Private Equity, working paper.

Phalippou, Ludovic, and Maurizio Zollo, 2006, What Drives Private Equity Fund Performance? working paper.

Scholes, M., and J. Williams, 1977, Estimating Betas from Nonsynchronous Data, Journal of Financial Economics 5, 309-328.

Tanner, Martin, and Wing Wong, 1987, The Calculation of Posterior Distributions by Data Augmentation, Journal of the American Statistical Association 82, 528-549.


Woodward, Susan, 2004, Measuring Risk and Performance for Private Equity, working paper.


Table I: Descriptive Statistics Table I presents descriptive statistics for the VC sample. Across the four industry classifications, Panel A contains the total number of rounds with and without observed returns. Similarly, Panel B contains the number of entrepreneurial firms across the four industry classifications and according to the observed exits. Finally, Panel C contains information about the average number of rounds received by each entrepreneurial firm. The variable tau contains the time since the previous round (measured in months), and return is the return earned since the previous round (presented in percent).

Panel A: Total Rounds

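| | Total | Biotech | IT | Retail | Other |
|---|---|---|---|---|---|
| Number of rounds | 15169 | 3892 | 9152 | 1700 | 425 |
| With valuations | 9637 | 2485 | 5759 | 1126 | 267 |

Panel B: Number of firms

| | Total | Biotech | IT | Retail | Other |
|---|---|---|---|---|---|
| IPO | 1111 | 343 | 623 | 121 | 24 |
| acquisition | 742 | 137 | 507 | 74 | 24 |
| liquidated | 444 | 70 | 288 | 77 | 9 |
| unknown/still alive | 940 | 260 | 549 | 92 | 39 |
| Total | 3237 | 810 | 1967 | 364 | 96 |

Panel C: Rounds per firm

| | mean | median | stdev | p5 | p95 |
|---|---|---|---|---|---|
| total | 4.6861 | 4 | 2.2607 | 2 | 9 |
| w/ valid data | 2.9771 | 3 | 1.3099 | 2 | 6 |
| tau (months) | 12.3522 | 10 | 11.2771 | 2 | 32 |
| return (%) | 128.77 | 50 | 340.29 | -62.99 | 500.77 |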


Table II: OLS, GLS, and MCMC Estimates

The table presents OLS, GLS, and MCMC estimates of the market model in monthly log-returns. The alphas are intercepts and the betas are coefficients from a regression of the companies' log-returns on the market log-return. Sigma is the estimated standard deviation (volatility) of the error term from this regression. The GLS estimator scales each observation by the inverse of the square root of the time since the last financing round, as specified in the text. Reported MCMC estimates are the mean and standard deviation of the parameters' simulated posterior distribution. The simulations use 500 iterations preceded by 500 discarded burn-in iterations. For frequentist estimates, ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively. For Bayesian estimates, ***, **, and * denote that zero lies outside the 99%, 95%, and 90% credible intervals, respectively.

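OLS and GLS entries are coefficients with standard errors in parentheses; MCMC entries are posterior means with posterior standard deviations in parentheses.

| | (1) OLS | (2) GLS | (3) MCMC |
|---|---|---|---|
| alpha_tilde | 0.0091 (0.0008)*** | 0.0239 (0.0013)*** | 0.0213 (0.0012)*** |
| beta | 1.6104 (0.0672)*** | 1.3891 (0.0869)*** | 1.6152 (0.0739)*** |
| sigma | 0.8777 | 0.3266 | 0.3243 (0.0031)*** |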


Table III: MCMC Estimates with selection correction

Entries are posterior means with posterior standard deviations in parentheses.

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| alpha_tilde | -0.0211 (0.0011)*** | -0.0345 (0.0038)*** | -0.0318 (0.0046)*** | -0.0295 (0.0037)*** |
| beta | 2.8898 (0.1087)*** | 3.0157 (0.1460) | 2.7753 (0.1635)*** | 2.6377 (0.1529)*** |
| sigma | 0.3949 (0.0028)*** | 0.4574 (0.0193)*** | 0.4552 (0.0214)*** | 0.4498 (0.0173)*** |
| Selection Equation | | | | |
| return | 0.1810 (0.0040)*** | 0.2214 (0.0085)*** | 0.2165 (0.0172)*** | 0.2157 (0.0137)*** |
| tau | | 0.2635 (0.0117)*** | 0.2664 (0.0110)*** | 0.2644 (0.0115)*** |
| tau^2 | | -0.0290 (0.0024)*** | -0.0298 (0.0028)*** | -0.0295 (0.0025)*** |
| X_rounds | | | | 0.4876 (0.0662)*** |
| X_dollars | | | 11.9621 (2.8409)*** | -8.2378 (4.1443)* |
| X_ACQ | | | 2.7620 (0.5679)*** | 0.3078 (0.5833) |
| X_IPO | | | 5.0547 (0.7148)*** | 5.4601 (0.6836)*** |
| const | -1.4817 (0.0052)*** | -1.6304 (0.0093)*** | -1.8056 (0.0164)*** | -1.8604 (0.0174)*** |


Table IV: MCMC Estimates of Fama-French 3-factor Model The table presents the posterior distributions of the parameters of the Fama-French model in log-returns. Factor and risk-free returns are from Kenneth French’s website. The MCMC algorithm uses two different selection models and 500 iterations for burn-in and 500 iterations to sample the posterior distributions. Alphas and sigmas are annualized.

OLS and GLS entries are coefficients with standard errors in parentheses; MCMC entries are posterior means with posterior standard deviations in parentheses.

| | (1) OLS | (2) GLS | (3) MCMC | (4) MCMC |
|---|---|---|---|---|
| alpha_tilde | 0.0082 (0.0009)*** | 0.0216 (0.0014)*** | -0.0316 (0.0034)*** | -0.0278 (0.0036)*** |
| RMRF | 1.4299 (0.0694)*** | 1.2327 (0.0877)*** | 2.5446 (0.1562)*** | 2.2662 (0.1354)*** |
| SMB | 0.4832 (0.0803)*** | 0.7230 (0.1026)*** | 1.0948 (0.1343)*** | 0.9155 (0.1182)*** |
| HML | -0.6322 (0.0686)*** | -0.8113 (0.0856)*** | -2.0071 (0.1189)*** | -1.7411 (0.1580)*** |
| sigma | 0.8714 | 0.3240 | 0.4455 (0.0126)*** | 0.4401 (0.0169)*** |
| Selection Equation | | | | |
| return | | | 0.2258 (0.0109)*** | 0.2244 (0.0128)*** |
| tau | | | 0.2756 (0.0124)*** | 0.2803 (0.0146)*** |
| tau^2 | | | -0.0296 (0.0024)*** | -0.0302 (0.0026)*** |
| X_rounds | | | | 0.4614 (0.0647)*** |
| X_dollars | | | | -10.8600 (3.9612)*** |
| X_ACQ | | | | 1.1112 (0.6038)** |
| X_IPO | | | | 5.2952 (0.6930)*** |
| const | | | -1.6386 (0.0090)*** | -1.8691 (0.0194)*** |


Table V: Industry level MCMC Estimates

OLS and GLS entries are coefficients with standard errors in parentheses; MCMC entries are posterior means with posterior standard deviations in parentheses.

| | (1) OLS | (2) GLS | (3) MCMC | (4) MCMC |
|---|---|---|---|---|
| alpha_tilde: Biotech | 0.0114 (0.0016)*** | 0.0200 (0.0025)*** | -0.0285 (0.0035)*** | -0.0249 (0.0036)*** |
| alpha_tilde: IT | 0.0091 (0.0011)*** | 0.0258 (0.0017)*** | -0.0318 (0.0036)*** | -0.0278 (0.0039)*** |
| alpha_tilde: Retail | 0.0068 (0.0030)** | 0.0326 (0.0044)*** | -0.0492 (0.0058)*** | -0.0476 (0.0062)*** |
| alpha_tilde: Other | 0.0063 (0.0053) | 0.0105 (0.0077) | -0.0505 (0.0081)*** | -0.0462 (0.0080)*** |
| beta: Biotech | 0.5927 (0.1300)*** | 0.3140 (0.1736)* | 2.0956 (0.2288)*** | 1.6607 (0.1841)*** |
| beta: IT | 1.8656 (0.0843)*** | 1.6240 (0.1087)*** | 2.9741 (0.1647)*** | 2.7108 (0.1563)*** |
| beta: Retail | 3.1028 (0.2244)*** | 2.7645 (0.2757)*** | 5.4072 (0.3543)*** | 5.1907 (0.3538)*** |
| beta: Other | 0.8432 (0.4012)** | 0.6402 (0.5138) | 1.7414 (0.6055)*** | 1.2323 (0.5800)*** |
| sigma | 0.8680 | 0.3232 | 0.4508 (0.0136) | 0.4445 (0.0172)*** |
| Selection Equation | | | | |
| return | | | 0.2204 (0.0140)*** | 0.2108 (0.0136)*** |
| tau | | | 0.2594 (0.0123)*** | 0.2610 (0.0120)*** |
| tau^2 | | | -0.0287 (0.0021)*** | -0.0299 (0.0023)*** |
| X_rounds | | | | 0.4714 (0.0632)*** |
| X_dollars | | | | -6.8734 (3.8495)* |
| X_ACQ | | | | 0.2844 (0.6360) |
| X_IPO | | | | 5.2208 (0.7185)*** |
| const | | | -1.6293 (0.0094)*** | -1.8505 (0.0175)*** |