key determinants of demand, credit underwriting, and...


Key determinants of demand, credit underwriting, and performance on government-insured

mortgage loans in Russia

Authors: Agatha M. Lozinskaya (Poroshina), Evgeniy M. Ozhegov

Affiliation: Higher School of Economics, Department of Economics, Research group for Applied Markets

and Enterprises Studies

Abstract

This research analyses lending by the Russian state-owned mortgage provider. Its two-level system of lending and mortgage insurance is associated with substantially higher default rates on insured loans. This suggests that the underwriting incentives for regional operators of government mortgage loans perform poorly. We use loan-level data on mortgages issued by one regional government mortgage provider to understand the interdependence between underwriting, the borrower’s choice of contract terms (including loan insurance) and loan performance.

JEL Classification: C36; D12; R20

Research proposal

Introduction

Key issues of government policy include providing affordable housing and identifying the main drivers of mortgage borrowing and mortgage loan performance. Therefore, the problem of developing optimal credit contracts and effective risk management systems, especially in the residential mortgage market, is becoming crucial.

The Agency of Home Mortgage Lending (AHML), the national institute for the development of housing activity, helps to implement government housing policy and anti-recessionary measures to support mortgage lending in Russia. AHML is a state-owned provider of government-insured loans that uses a two-level system of lending. In the first step, banks and non-credit organizations provide mortgage loans to households according to the common standards of AHML. The second step is the refinancing (redemption) of mortgage receivables by AHML. AHML develops special mortgage programs and refinances risks from its regional branches and from the commercial banks that operate such programs. The list of programs includes “Young researchers”, “Young teachers”, “Mortgage for Soldiers”, “Mothers’ capital” and other social and subprime programs. This research investigates the key drivers of borrowers’ self-selection into AHML credit programs, their choice of particular credit contract terms and loan performance, taking into account the possible interdependence of all these decisions.

When applying for an AHML loan, the potential borrower chooses, along with other mortgage terms, whether to have government loan insurance (provided by the AHML insurance company) against delinquency. If the loan-to-value ratio (LTV) exceeds 70%, the loan must be insured. While credit risk in the Russian residential mortgage market has been stable over the past 8 years, with a mean probability of default between 4% and 5% (see footnote 1), government-insured AHML loans performed substantially worse and showed a 16% probability of default. This means that government insurance covers potential losses from such loans and may affect their approval process. We are interested in the conditions that lead to a government-insured loan, in its performance and in the underwriting process for such loans. Preliminary data analysis allows us to assume that insured loans are underwritten more favourably by AHML regional banks despite their substantial credit risk, because this risk and the potential losses are distributed between AHML and its insurance company.

The results can help to understand how credit risk is distributed between regional operators, AHML and the AHML insurance company, and to analyze the tradeoff between achieving social goals and credit risk losses for the government. They may also help to revise the underwriting process and the incentives for regional AHML operators.

This proposal is structured as follows. It starts with a literature review and a summary of recent studies of the mortgage borrowing process. The second part describes the identification strategy, which corrects for sample selection bias and endogeneity. The third part describes the collected loan-level data and instrumental variables. Finally, we discuss the preliminary results and conclude with further work.

1 Agency of Housing Mortgage Lending data, www.ahml.ru


Literature review

The estimation of demand functions for differentiated products, under the assumption of equal consumption volume across individuals, is well developed in recent research. It builds on the classical papers of McFadden (1973, 1976), who proposed the logit model of discrete choice, and Berry, Levinsohn and Pakes (1995), who extended this approach to the case of random elasticities of demand with respect to product characteristics. Later, Nevo (2000) proposed estimating a discrete choice model with random coefficients, in which the elasticities of demand with respect to product characteristics are functions of the socio-demographic characteristics of consumers and a random component.

Classical models of consumer behavior in the mortgage market widely use these parametric approaches to construct regression functions. Probit and logit models for binary choices and linear regression for continuous choices are common. The main issue in such models is the sample selection bias that arises when borrowers self-select out of some steps of the borrowing process. Moreover, self-selection generates partial observability of contract terms and loan performance data: these data are available only for approved borrowers who signed the mortgage contract. The magnitude of the sample selection bias then depends on the strength of the correlation between the application process, the underwriting process, the choice of credit terms and loan performance (Ross, 2000).

The second issue in modeling the demand for credit is simultaneity bias. It arises when the terms of the credit contract and the characteristics of the flat are chosen simultaneously and these choices are correlated. Mortgage borrowing as a sequence of consumer and bank decisions was first introduced by Follain (1990). He defines the borrowing process as a choice of how much to borrow (the Loan-To-Value ratio, or LTV, decision), if and when to refinance or default (the termination decision), and the choice of the mortgage instrument itself (the contract decision). Later, Rachlis and Yezer (1993) suggested a theoretical model of the mortgage lending process, which consists of a system of four simultaneous equations: (1) the borrower’s application, (2) the borrower’s selection of mortgage terms, (3) the lender’s endorsement, and (4) the borrower’s default. They showed that all four equations (and decisions) should be treated as interdependent; otherwise the estimates would be inconsistent.

From the mid-1990s, American mortgage datasets such as the Federal Housing Administration (FHA) foreclosure data, the Boston Fed Study and the Home Mortgage Disclosure Act (HMDA) data became publicly available. Several empirical studies then used this sort of data to analyze the mortgage lending process and to study the interdependence of the bank’s endorsement decision and the borrower’s decisions, modeled by a bivariate probit model.

Extending the study of Rachlis and Yezer (1993), Yezer, Phillips and Trost (1994) used a Monte Carlo experiment to estimate the theoretical model described above. They showed empirically that modeling credit underwriting and default in isolation leads to biased parameter estimates. Later, Phillips and Yezer (1996) and Ross (2000) supported these findings.

Phillips and Yezer (1996) compared the estimation results of the single-equation approach with those of the bivariate probit model. They showed that estimates of discrimination are biased if the lender’s rejection decision is decoupled from the borrower’s self-selection of loan programs, or if the lender’s underwriting decision is decoupled from the borrower’s refusal decision.

Ross (2000) studied the link between loan approval and loan default with a bivariate probit model and found that, after correcting for sample selection, most of the parameters in the approval equation have the opposite sign to their counterparts in the default equation. The paper also pointed out that if the sample of defaulters and non-defaulters contains little information on borrowers’ characteristics, the estimated probability of default and the sample selection models will be strongly biased. The more information on borrowers’ characteristics is available, including credit history and other risk metrics, the smaller the sample selection bias will be.

The key determinants of default on a mortgage contract are usually taken to be the socio-demographic and financial characteristics of borrowers and the contract terms. When data on borrower characteristics are unavailable, some papers, e.g. Bajari et al. (2008), use aggregated demographics and the unemployment rate as proxies for individual demographics.

Attanasio et al. (2008), using the approach of Das et al. (2003) for nonparametric estimation of models with sample selection, showed that contract terms should enter the demand equation for auto credit non-linearly and that the assumption on the joint distribution function should be relaxed.

Summarizing the findings of recent research: 1) when modeling the demand equation, we should account for the simultaneity and interdependence of choices at all stages of the borrowing process; 2) estimates of the contract terms, credit risk and demand equations will be biased by sample selection; 3) the correlations between error terms and the regression functions can be non-linear and are difficult to specify.

Identification strategy

The mortgage borrowing process can be represented by the following sequence of decisions:

1. Application of borrower.

A potential borrower realizes the necessity of borrowing, chooses the credit organization and credit

program that reflects her preferences, fills an application form with demographic and financial

characteristics.

2. Approval of borrower.

Considering the application form and recent credit history, the credit organization verifies the form data and endorses the application or not (some banks also set a loan amount limit when the borrower is endorsed).

3. Choice of credit terms.

The approved borrower makes a choice on contract agreement and, when agreed, on property to

buy and credit terms: loan amount (not more than limit), down payment, annual payment, rate, type of rate

(adjusted or fixed) and maturity.

4. Loan performance.

The borrower chooses the strategy of loan performance: to pay in accordance with the contract terms, to default or to prepay.

The econometric model follows the steps of the structural one. The functional form of the regression functions is left unrestricted, following Das et al. (2003).

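\[
\begin{aligned}
d_i &= \begin{cases} 1, & g_0(w_{0i}, x_i^{1}) + e_{0i} \ge 0 \\ 0, & g_0(w_{0i}, x_i^{1}) + e_{0i} < 0 \end{cases} \\
y_{1i}^{*} &= g_1(x_i^{1}, x_i^{2*}, w_{1i}, y_{-1i}^{*}) + e_{1i} \\
&\;\;\vdots \\
y_{ki}^{*} &= g_k(x_i^{1}, x_i^{2*}, w_{ki}, y_{-ki}^{*}) + e_{ki} \\
x_i^{2*} &= \pi(x_i^{1}, z_i) + \nu_i \\
def_i^{*} &= \begin{cases} 1, & g_{def}(y_i^{*}, x_i^{1}, x_i^{2*}) + e_{def,i} \ge 0 \\ 0, & g_{def}(y_i^{*}, x_i^{1}, x_i^{2*}) + e_{def,i} < 0 \end{cases}
\end{aligned}
\tag{1}
\]

\[
(y_i, x_i^{1}, x_i^{2}, def_i) = d_i\,(y_i^{*}, x_i^{1}, x_i^{2*}, def_i^{*}) \ \text{is observed,}
\]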

where $d_i$ is a binary indicator of contract signing (both the bank’s and the borrower’s decision), $x_i^{1}$ is a set of demographic and financial characteristics of the borrower and co-borrowers, $y_i$ is a set of credit terms, $x_i^{2}$ is the logarithm of the loan limit, and $(w_{0i}, w_{1i}, \dots, w_{ki}, z_i)$ is a set of excluded instruments for the contract signing decision, the credit terms and the loan limit, respectively. The set of credit terms $y_i$ contains LTV, the logarithm of the rate, the type of rate, the logarithm of maturity and the probability of having government insurance. $def_i$ is a binary indicator of default.

The paper of Ozhegov (2014) extends existing methods for the identification and estimation of a non-triangular system of simultaneous equations with sample selection, endogenous regressors, and an arbitrary joint error distribution and functional form of the regression and control functions in reduced and structural forms. We may apply this method to estimate model (1) in the following steps.

1. First, we estimate the propensity score for the contract agreement equation:

\[ p = E[d \mid x^{1}, w_0] = g_0(w_0, x^{1}) \tag{2} \]

2. In the second step, we estimate the prediction of the endogenous regressor, the log of the loan limit, corrected for sample selection using the estimate of the propensity score $\hat{p}$:

\[ E[x^{2} \mid x^{1}, z, w_0, d = 1] = \pi(x^{1}, z) + \lambda(\hat{p}) \tag{3} \]

3. Then we estimate each contract term equation in reduced form, corrected for sample selection and endogeneity of the loan limit using the estimate of the propensity score and the residuals $\hat{\nu}$ from the loan limit equation:

\[ E[y_j \mid x^{1}, x^{2}, z, w, w_0, d = 1] = \gamma_j(x^{1}, x^{2}, w) + \mu(\hat{p}, \hat{\nu}) \tag{4} \]

4. In the next step, we estimate the structural-form contract terms equations, corrected for sample selection, endogeneity and simultaneity using the estimate of the propensity score, the residuals from the loan limit equation and the reduced-form contract terms residuals $\hat{\eta}_{-j}$:

\[ E[y_j \mid x^{1}, x^{2}, w_j, y_{-j}, z, w_{-j}, w_0, d = 1] = g_j(x^{1}, x^{2}, w_j, y_{-j}) + \varphi(\hat{p}, \hat{\nu}, \hat{\eta}_{-j}) \tag{5} \]

5. In the last step, we estimate the probability of default equation, corrected for sample selection and endogeneity of contract terms using the propensity score, the residuals from the loan limit equation and the structural-form residuals $\hat{\varepsilon}$:

\[ E[def \mid x^{1}, x^{2}, y, z, w, d = 1] = g_{def}(x^{1}, x^{2}, y) + \varphi_{def}(\hat{p}, \hat{\nu}, \hat{\varepsilon}) \tag{6} \]

Ozhegov (2014) showed that if all regression and correction functions are continuously differentiable, there is at least one excluded variable for the selection equation, and the matrix of marginal effects of the instruments in the reduced-form contract terms equations has full rank, then equations (2)-(6) are identified up to an additive constant.

The estimation procedure is based on approximation by series of power functions of the initial set of regressors.

Let $\omega = (\omega_1, \dots, \omega_\chi)$ be a set of variables with $\chi = \dim(\omega)$. Then

\[ \kappa(\rho, \chi) = \frac{(\rho + \chi)!}{\rho!\,\chi!} \]

is the number of polynomial terms of power no more than $\rho$ that may be obtained from $\chi$ variables.

Let $Q^{\rho}(\omega) = (q_1(\omega), \dots, q_\kappa(\omega))$ be a vector of $\kappa$ power functions that form the full set of polynomial terms of power no more than $\rho$ obtained from $\omega$, i.e.

\[ q_j(\omega) = \prod_{\tau=1}^{\chi} \omega_\tau^{s_\tau}, \qquad \sum_{\tau=1}^{\chi} s_\tau \le \rho, \qquad s_\tau \in \{0, 1, \dots, \rho\} \ \ \forall \tau = 1, \dots, \chi. \]

We call $Q^{\rho}(\omega)$ a polynomial approximating series of power $\rho$.
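As an illustration, the polynomial approximating series defined above can be constructed numerically as follows. This is a minimal sketch, not the authors' implementation: the function name `poly_series`, the use of NumPy and the ordering of the terms are assumptions made for the example.

```python
from itertools import product
from math import comb

import numpy as np


def poly_series(omega: np.ndarray, rho: int) -> np.ndarray:
    """Return all monomials of total power <= rho built from the columns of omega.

    omega has shape (n, chi); the result has shape (n, kappa), where
    kappa = (rho + chi)! / (rho! chi!) and the constant term is included.
    """
    chi = omega.shape[1]
    # Exponent vectors (s_1, ..., s_chi) with s_tau in {0, ..., rho} and sum <= rho.
    exponents = [s for s in product(range(rho + 1), repeat=chi) if sum(s) <= rho]
    # Order by total power so that the constant comes first (a convention only).
    exponents.sort(key=lambda s: (sum(s), s))
    columns = [np.prod(omega ** np.array(s), axis=1) for s in exponents]
    return np.column_stack(columns)


# Hypothetical data: 5 observations of chi = 3 variables, power rho = 2.
rng = np.random.default_rng(0)
omega = rng.normal(size=(5, 3))
Q = poly_series(omega, rho=2)
n_terms = comb(2 + 3, 3)  # kappa(rho, chi) from the formula above
```

The number of columns of `Q` should coincide with $\kappa(\rho, \chi)$, which grows quickly in both arguments; this is one reason to keep the power of the series low in applied work.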

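Then the propensity score of the selection equation may be estimated by OLS as

\[ \hat{p}_i = \hat{E}[d_i \mid x_i^{1}, w_{0i}] = Q^{\rho_0}(w_{0i}, x_i^{1}) \left[ Q^{\rho_0}(w_0, x^{1})' Q^{\rho_0}(w_0, x^{1}) \right]^{-1} Q^{\rho_0}(w_0, x^{1})' d \tag{7} \]

Let $a = (a_1, a_2)$ and $Q^{Z_1,Z_2}(x^{1}, z, \hat{p}) = (Q^{Z_1}(x^{1}, z), Q^{Z_2}(\hat{p}))$; then $a$ may be obtained by OLS as

\[ \hat{a} = \left[ Q^{Z_1,Z_2}(x^{1}, z, \hat{p})' Q^{Z_1,Z_2}(x^{1}, z, \hat{p}) \right]^{-1} Q^{Z_1,Z_2}(x^{1}, z, \hat{p})' x^{2} \tag{8} \]

Then the residuals of the loan limit equation may be obtained as

\[ \hat{\nu}_i = x_i^{2} - Q^{Z_1,Z_2}(x_i^{1}, z_i, \hat{p}_i)\,\hat{a} \tag{9} \]

Let $b_j = (b_{1j}, b_{2j})$ and $Q^{M_1,M_2}(\mathcal{W}) = Q^{M_1,M_2}(x^{1}, x^{2}, w, \hat{p}, \hat{\nu}) = (Q^{M_1}(x^{1}, x^{2}, w), Q^{M_2}(\hat{p}, \hat{\nu}))$; then $b_j$ may be obtained by OLS as

\[ \hat{b}_j = \left[ Q^{M_1,M_2}(\mathcal{W})' Q^{M_1,M_2}(\mathcal{W}) \right]^{-1} Q^{M_1,M_2}(\mathcal{W})' y_j \tag{10} \]

Then the reduced-form contract terms residuals will be

\[ \hat{\eta}_{ji} = y_{ji} - Q^{M_1,M_2}(x_i^{1}, x_i^{2}, w_i, \hat{p}_i, \hat{\nu}_i)\,\hat{b}_j \tag{11} \]

Let $\beta_j = (\beta_{1j}, \beta_{2j})$ and $Q^{\xi_1,\xi_2}(\mathcal{X}) = Q^{\xi_1,\xi_2}(x^{1}, x^{2}, w_j, y_{-j}, \hat{p}, \hat{\nu}, \hat{\eta}_{-j}) = (Q^{\xi_1}(x^{1}, x^{2}, w_j, y_{-j}), Q^{\xi_2}(\hat{p}, \hat{\nu}, \hat{\eta}_{-j}))$; then the estimate for $\beta_j$ may be obtained by OLS as

\[ \hat{\beta}_j = \left[ Q^{\xi_1,\xi_2}(\mathcal{X})' Q^{\xi_1,\xi_2}(\mathcal{X}) \right]^{-1} Q^{\xi_1,\xi_2}(\mathcal{X})' y_j \tag{12} \]

Then the structural-form contract terms residuals will be

\[ \hat{\varepsilon}_{ji} = y_{ji} - Q^{\xi_1,\xi_2}(\mathcal{X}_i)\,\hat{\beta}_j \tag{13} \]

Let $\alpha = (\alpha_1, \alpha_2)$ and $Q^{\theta_1,\theta_2}(x^{1}, x^{2}, y, \hat{p}, \hat{\nu}, \hat{\varepsilon}) = (Q^{\theta_1}(x^{1}, x^{2}, y), Q^{\theta_2}(\hat{p}, \hat{\nu}, \hat{\varepsilon}))$; then the estimate for $\alpha$ may be obtained by OLS as

\[ \hat{\alpha} = \left[ Q^{\theta_1,\theta_2}(x^{1}, x^{2}, y, \hat{p}, \hat{\nu}, \hat{\varepsilon})' Q^{\theta_1,\theta_2}(x^{1}, x^{2}, y, \hat{p}, \hat{\nu}, \hat{\varepsilon}) \right]^{-1} Q^{\theta_1,\theta_2}(x^{1}, x^{2}, y, \hat{p}, \hat{\nu}, \hat{\varepsilon})' def \tag{14} \]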
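The first two series regressions of this procedure, equations (7)-(9), can be sketched on simulated data as follows. The code is only an illustration under stated assumptions: the data-generating process, the variable names, the chosen powers and the helper functions `poly_series` and `ols_fit` are hypothetical and are not taken from the AHML sample or from Ozhegov (2014). The constant is dropped from the control-function block because the system is identified only up to an additive constant.

```python
from itertools import product

import numpy as np


def poly_series(omega, rho):
    """All monomials of total power <= rho in the columns of omega (constant first)."""
    chi = omega.shape[1]
    exps = [s for s in product(range(rho + 1), repeat=chi) if sum(s) <= rho]
    return np.column_stack([np.prod(omega ** np.array(s), axis=1) for s in exps])


def ols_fit(Q, y):
    """OLS coefficients; lstsq avoids forming the inverse of Q'Q explicitly."""
    coef, *_ = np.linalg.lstsq(Q, y, rcond=None)
    return coef


# Synthetic stand-in data (an assumption for illustration, not the AHML sample).
rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=(n, 1))   # borrower characteristic
w0 = rng.normal(size=(n, 1))   # excluded instrument for contract signing
z = rng.normal(size=(n, 1))    # excluded instrument for the loan limit
e0, nu = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n).T
d = (0.5 * x1[:, 0] + w0[:, 0] + e0 >= 0).astype(float)  # contract signed
x2 = 1.0 + x1[:, 0] + 0.8 * z[:, 0] + nu                  # log of loan limit

# Eq. (7): propensity score from a series regression of d on (w0, x1).
Q0 = poly_series(np.hstack([w0, x1]), rho=2)
p_hat = Q0 @ ols_fit(Q0, d)

# Eq. (8): loan limit regression on signed contracts with a control function
# in p_hat; its constant column is dropped to avoid perfect collinearity.
s = d == 1
Q1 = poly_series(np.hstack([x1[s], z[s]]), rho=2)
Q2 = poly_series(p_hat[s, None], rho=3)[:, 1:]
Qz = np.hstack([Q1, Q2])
a_hat = ols_fit(Qz, x2[s])

# Eq. (9): residuals of the loan limit equation, reused in the later steps.
nu_hat = x2[s] - Qz @ a_hat
```

The remaining steps (10)-(14) repeat the same pattern: each regression adds the residuals of the previous one to the control-function block.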

The crucial assumption for identification is that the matrix of marginal effects of the excluded instruments has full rank. Unlike the validity assumption, this one is testable. We follow the conditional F-test approach of Sanderson and Windmeijer (2014) to test the hypothesis that the rank of the matrix of marginal effects is reduced by one from full rank.

We first consider testing in a linear model with multiple endogenous variables. Then we generalize the test to a nonlinear semiparametric model with sample selection.

In the simplest case studied by Sanderson and Windmeijer (2014), we have one linear equation for $y$ with $k$ endogenous variables $X = (x_1, \dots, x_k) = (x_j, x_{-j})$ and $m$ instruments $Z = (z_1, \dots, z_m)$ that are independent of the error terms. The model may be expressed as

\[
\begin{aligned}
y &= X\beta + e_0 \\
x_j &= Z\pi_j + e_j \\
(e_0, e_1, \dots, e_k) &\perp Z
\end{aligned}
\]

The problem is to test whether $\operatorname{rank}[\Pi' Z' Z \Pi] = k$. Stock and Yogo (2005) introduced a test based on the minimal eigenvalue of the matrix $\hat{\Pi}' Z' Z \hat{\Pi} / m$ with $H_0: \operatorname{rank}[\Pi' Z' Z \Pi] = k - 1$. If the minimal eigenvalue differs statistically from zero, we can reject the null that the matrix does not have full rank and the instruments are weak. They also calculated critical values for the test, but only for $m \le 2$. Sanderson and Windmeijer (2014) followed the conditional F-test approach of Angrist and Pischke (2009) for testing the joint significance of instruments in the reduced-form regression. Angrist and Pischke (2009) proposed the conditional F-statistic, and Sanderson and Windmeijer (2014) corrected its asymptotic distribution and proved its equivalence to the Stock and Yogo (2005) test. For this application, the conditional F-test has the advantage of a known limiting distribution that can easily be extended to the semiparametric and sample selection case, and its critical values are easy to calculate even for $m > 2$. The test has three steps: 1) estimation of each endogenous variable conditional on all other endogenous variables, 2) estimation of the conditional reduced-form parameters and 3) calculation of the test statistic. Formally:

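1) Obtain $\hat{\gamma}_j$ by OLS from the regression $x_j = x_{-j}\gamma_j + \xi_j$.

2) Obtain $\hat{\kappa}_j$ by OLS from the regression $x_j - x_{-j}\hat{\gamma}_j = Z\kappa_j + \nu_j$.

3) For each endogenous variable, calculate the conditional F-statistic of the instruments:

\[ F_{x_j \mid x_{-j}} = \frac{\hat{\kappa}_j' Z' Z \hat{\kappa}_j}{(m - k + 1)\left(\hat{\nu}_j' \hat{\nu}_j / n\right)}. \]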
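The three steps above can be transcribed into code as follows. This is a sketch on simulated data that follows the steps exactly as stated in the text (OLS in steps 1 and 2); the function name `conditional_f` and the simulated instruments are assumptions for illustration, and the function requires at least two endogenous variables.

```python
import numpy as np


def conditional_f(X: np.ndarray, Z: np.ndarray, j: int) -> float:
    """Conditional F-statistic for column j of X, following steps 1)-3) in the text."""
    n, k = X.shape
    m = Z.shape[1]
    x_j = X[:, j]
    x_rest = np.delete(X, j, axis=1)
    # Step 1: regress x_j on the other endogenous variables.
    gamma_hat, *_ = np.linalg.lstsq(x_rest, x_j, rcond=None)
    resid_j = x_j - x_rest @ gamma_hat
    # Step 2: regress the residual from step 1 on the instruments.
    kappa_hat, *_ = np.linalg.lstsq(Z, resid_j, rcond=None)
    nu_hat = resid_j - Z @ kappa_hat
    # Step 3: explained sum of squares over (m - k + 1) times residual variance.
    return float(kappa_hat @ Z.T @ Z @ kappa_hat / ((m - k + 1) * (nu_hat @ nu_hat / n)))


# Simulated example with m = 3 instruments and k = 2 endogenous variables.
rng = np.random.default_rng(2)
n = 1000
Z = rng.normal(size=(n, 3))
Pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
X_strong = Z @ Pi + rng.normal(size=(n, 2))  # instruments shift both variables
X_weak = rng.normal(size=(n, 2))             # instruments are irrelevant

f_strong = conditional_f(X_strong, Z, j=0)
f_weak = conditional_f(X_weak, Z, j=0)
```

In this simulation the statistic for the relevant instruments should be far larger than for the irrelevant ones, which is the contrast the test is designed to detect.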

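For the case of a semiparametric equation with continuously differentiable regression functions, such as

\[
\begin{aligned}
y &= g(X) + e_0 \\
x_j &= \pi_j(Z) + e_j \\
(e_0, e_1, \dots, e_k) &\perp Z
\end{aligned}
\]

joint relevance of the instruments $Z$ requires that $\operatorname{rank}\left[\frac{\partial \pi(Z)}{\partial Z}\right] = \dim(X) = k$. If we approximate each unknown regression function $f(s)$ by its polynomial approximation $Q^{\rho}(s)\alpha$ of power $\rho$ with $\dim(\alpha) = \frac{(\rho + \dim(s))!}{\rho!\,\dim(s)!}$, then we can test the exclusion restriction as follows:

1) Obtain $\hat{\gamma}_j$ by OLS from the regression $x_j = Q^{\rho}(x_{-j})\gamma_j + \xi_j$.

2) Obtain $\hat{\kappa}_j$ by OLS from the regression $x_j - Q^{\rho}(x_{-j})\hat{\gamma}_j = Q^{\rho}(Z)\kappa_j + \nu_j$.

3) For each endogenous variable, calculate the conditional F-statistic of the instruments:

\[ F_{x_j \mid x_{-j}} = \frac{\hat{\kappa}_j' Q^{\rho}(Z)' Q^{\rho}(Z) \hat{\kappa}_j}{\left( \frac{(\rho + m)!}{\rho!\,m!} - \frac{(\rho + k)!}{\rho!\,k!} + 1 \right)\left(\hat{\nu}_j' \hat{\nu}_j / n\right)}. \]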

This test is easily generalized to the case of sample selection by including control functions for the error terms when estimating the regression parameters in steps 1)-2).

Data description

One of the regional AHML operators provided a data set of all mortgage applications collected from 2008 to 2012. For each of the 3870 applicants we know the demographic and financial characteristics of the main borrower and co-borrowers on the date of application, as well as the date of application itself (Table A.1). For all signed contracts we know the loan limit set by the bank, the contract terms and the value of the property. The characteristics of the borrower are fully observable, while the contract characteristics are observable only for the subsample of applicants who signed the contract (Table A.2).

Some mortgage programs allow the applicants not to provide any information on their income.

These programs are usually linked with a higher contract rate. The reason for this choice may be explained

by a temporary or changeable income (LaCour-Little, 2007), for instance, for entrepreneurs. Generally,

income should be considered endogenous while modeling the approval of borrower or contract terms.

However, we can control for employment category, which removes the inconsistency due to possible

endogeneity of income. Moreover, co-borrower income may also be endogenous and we cannot provide

any proxy for co-borrower income since we do not have any characteristics of co-borrowers. This is a

limitation of the research. But we may consider it as insignificant for the choice of contract terms compared

to the income of the main borrower.

To estimate the model we need to find a set of relevant excluded instruments for the probability of

signing a contract, the loan limit and each credit term.

Bajari et al. (2008) discussed the possibility of using aggregated district-level variables as proxies

for unavailable data. We will use the same strategy to find the set of instruments. Since we have data without

spatial variation we can use time variation in applications. We have data from July 2008 to August 2012

and we know the application date for each applicant. Each application was matched with the set of

aggregated mortgage and housing market characteristics by date of application. On average, the process

takes two months from the date of application to the date of contract agreement. Also, Ozhegov and

Poroshina (2013) showed that aggregated demand on mortgage reacts to changes in supply within two

months. We therefore fix the aggregated market characteristics for each application not only in the month of application but also in the one and two months prior to the application, and use these as instruments.
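The matching of an application to market-level characteristics at lags of zero, one and two months can be sketched as follows. The function names, the dictionary layout and the numeric values are hypothetical and serve only to illustrate the calendar arithmetic; they are not the authors' data.

```python
from datetime import date


def shift_month(year: int, month: int, lag: int) -> tuple[int, int]:
    """Return the (year, month) that lies `lag` months before the given month."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def attach_market_lags(application_date: date, market: dict, max_lag: int = 2) -> dict:
    """Collect a market-level series at lags 0..max_lag; missing months map to None."""
    out = {}
    for lag in range(max_lag + 1):
        key = shift_month(application_date.year, application_date.month, lag)
        out[f"mean_ltv_lag{lag}"] = market.get(key)
    return out


# Hypothetical monthly mean LTV of issued loans, keyed by (year, month).
market_mean_ltv = {(2008, 11): 0.58, (2008, 12): 0.57, (2009, 1): 0.55}
row = attach_market_lags(date(2009, 1, 15), market_mean_ltv)
```

Working with a single month index (year times 12 plus month) keeps the lag arithmetic correct across year boundaries, as in the January 2009 example.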

Table A.3 represents the descriptive statistics of aggregated mortgage and housing market

characteristics for the period from July 2008 to August 2012 (50 months).

About 15% of issued loans were refinanced by AHML, but not all of them were issued by the bank supplying the data. Generally, the number of applications to the bank is smaller than the number of loans refinanced by AHML across all the regional banks.

The difference between the number of loans refinanced by AHML and the number of applications

to the bank within a particular month may be the excluded variable which explains the probability of

contract agreement, but it does not affect the contract term choice. Since every commercial bank operates


with the same AHML programs, the difference in the approval process does not affect the term choice. But

an increase in the number of refinanced loans shows the changes in the underwriting process in other banks

and may correlate with the probability of a contract agreement with the bank. This variable should be

considered as exogenous since each individual decision explains negligible variation of the aggregated

market characteristic (less than 1%).

As an excluded instrument for the loan limit we use the mean Debt-to-Income ratio (DTI). The

positive dependence of these two variables is because the mean DTI for all issued loans reflects the

evaluation of the mean credit risk (the higher the DTI of issued loans, the less risk). It positively correlates

with the loan limit, which reflects the willingness to issue a larger loan for a particular borrower. This

variable is valid since individual shocks of loan limit do not affect the aggregated characteristic of issued

loans.

As excluded instruments for credit terms, LTV, rate, maturity and insurance, we used mean LTV,

median rate, median maturity for issued loans and the housing affordability coefficient. The relevance of

the first three instruments is implied by the interdependence of mortgage market characteristics and the

AHML credit programs conditions. Validity is implied by the exogeneity of the program terms for each

particular borrower.

The affordability coefficient is relevant for the probability of insurance because the increase of

affordability should lead to the choice of a lower LTV and consequently to a lower probability of loan

insurance. Validity is also implied by the independence of individual preference on insurance shocks and

the aggregated affordability of housing. All the variables are relevant and valid and may be used as

instruments. The relevance was proved for each model with the conditional F-testing approach.

Preliminary results

First, we estimated the model of the probability of a contract agreement (Table A.4) based on the

characteristics of the borrower and co-borrowers and the difference between the number of AHML

refinanced loans and the number of applications. The last variable which was taken as an excluded

instrument is significant at the 1% level. The sign and significance of borrower characteristics are consistent

with recent research. The demographic characteristics, such as age, sex, marital status and the level of

education of the borrower are insignificant, which supports the absence of discrimination. The probability


of a contract agreement is positively correlated with the income of the main borrower and co-borrowers

and, on the contrary, negatively correlates with the failure to provide income details. Entrepreneurs have a

higher probability of a contract agreement ceteris paribus.

These estimates were obtained from the linear probability model and were compared with the probit

model. The comparison showed an insignificant difference in the significance of the parameter estimates

and predictive power (with slightly higher predictive power for the linear probability model). The propensity score $\hat{p}_i = \hat{E}[d_i \mid x_i, w_{0i}]$ was obtained from the linear probability model.

The model of the logarithm of the loan limit was estimated for all the signed contracts (Table A.5). The excluded instrument (DTI) is significant at the 1% level. We used polynomials up to the third power as approximations for the control function of $\hat{p}$. The hypothesis of its significance was rejected (at the 29% level), which suggests there is no selection bias of underwriting on the set of loan limits. The estimated parameters for borrower characteristics are also in line with intuition.

For each credit term we estimated the reduced-form equation in order to test the relevance of the instruments. The control function was approximated by a polynomial of power $M_2$ in the estimate of the propensity score and the loan limit equation residuals. The regression function was estimated as partially polynomial: linear in the characteristics of the borrower and polynomial of power $M_1$ in the excluded instruments for contract terms. We tested three sets of instruments. For the first set we fixed the market-level variables in the month of application. For the second and third sets we used market-level data for one and two months before the month of application, respectively. The evidence of relevance of the excluded instruments, based on the conditional F-test, is provided in Table A.6. All sets of excluded instruments are relevant at the 5% level. The set of market-level variables fixed in the month before the month of application is significant at the 1% level.

All excluded variables appear to be relevant, and this result is robust to the degree of approximation, which allows us to use them to obtain consistent estimates of the contract terms and default equations.

Participants, funding and project plan

Evgeniy M. Ozhegov and Agatha M. Lozinskaya (Poroshina) are young research fellows of the Research Group for Applied Markets and Enterprises Studies (Grigory Kosenok, PhD in economics, full professor at NES, is the scientific advisor of the Group and of this project) and senior lecturers in the Department of Applied Mathematics at the Higher School of Economics. They have 18 published articles on demand estimation, credit risk evaluation and mortgage lending modeling, including one on demand estimation in an international double-blind peer-reviewed journal.

Parts of this research were presented at 6 international conferences in 2013-2014: American Real

Estate and Urban Economics Association International Conference, Perm Winter School on Market Risk

and Modeling of Financial Markets, Russian Economic Congress, Eurasian Business and Economic

Society, International Conference on Applied Research in Economics, XV April International Academic

Conference on Economic and Social Development.

No grants have recently been won for this project. The basic salary of a senior lecturer (full-time position) and a young researcher (half-time position) at NRU HSE is 700 euros.

The planned steps of the project are:

1. Estimate the contract terms choice and default equations using the procedure described above. Calculate the distribution of the expected loss for AHML and the insurance company.

2. Interpret the estimation and calculation results. Describe the results in a paper for a Russian journal (March 2014). Refine the paper and submit it to a journal on real estate (Real Estate Economics) (July 2015).

References

Angrist J., Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton

University Press, Princeton.

Ambrose, B., LaCour-Little M., Sanders, A. 2004. The Effect of Conforming Loan Status on Mortgage

Yield Spreads: A Loan Level Analysis. Real Estate Economics, 32(4), pp. 541–569.

Attanasio, O.P., Goldberg, P.K., Kyriazidou, E. 2008. Credit Constraints in the Market for Consumer

Durables: Evidence from Micro Data on Car Loans. International Economic Review, 49(2), pp. 401-

436.

Berry, S., Levinsohn, J., & Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica:

Journal of the Econometric Society, 841-890.

Das, M., Newey, W. K., Vella, F. 2003. Nonparametric estimation of sample selection models. The Review

of Economic Studies, 70(1), pp. 33-58.

Follain, J. R. 1990. Mortgage choice. Real Estate Economics, 18(2), pp. 125–144.

McFadden, D., 1973. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York, pp. 105-142.

McFadden, D., 1978. Modeling the choice of residential location. In: Karlquist, A. (Ed.), Spatial Interaction

Theory and Residential Location. North-Holland, Amsterdam, pp. 75–96.


Nevo, A., 2000. Mergers with differentiated products: the case of the ready-to-eat cereal industry, RAND

Journal of Economics, 31(3), pp. 395-421

Ozhegov E. M., Poroshina A. M. (2013). The Lagged Structure of Dynamic Demand Function for Mortgage

Loans in Russia. EJournal of Corporate Finance, 27, 37-49.

Ozhegov, E.M., 2014. Identification and estimation of nonparametric simultaneous equations with sample

selection, mimeo.

Phillips, R., Yezer, A. 1996. Self-Selection and Tests for Bias and Risk in Mortgage Lending: Can You

Price the Mortgage If You Don't Know the Process? Journal of Real Estate Research, 11, pp. 87–102.

Phillips, R., Trost, R., Yezer, A. 1994. Bias in Estimates of Discrimination and Default in Mortgage Lending: the Effects of Simultaneity and Self-Selection. Journal of Real Estate Finance and Economics, 9, pp. 197-215.

Rachlis, M., Yezer A. 1993. Serious Flaws in Statistical Tests for Discrimination in Mortgage Markets.

Journal of Housing Research, 4, pp. 315–336.

Ross, S. L. 2000. Mortgage Lending, Sample Selection and Default. Real Estate Economics, 28(4), pp. 581-

621.

Sanderson, E., Windmeijer, F. (2014). A Weak Instruments F-test in linear IV models with multiple

endogenous variables. Discussion paper 14/644, University of Bristol, Department of Economics.

Stock, J.H., Yogo M. (2005). Testing for weak instruments in linear IV regression. In: D.W.K. Andrews

and J.H. Stock (Eds.), Identification and Inference for Econometric Models, Essays in Honor of

Thomas Rothenberg, 80-108. New York: Cambridge University Press.

http://www.hse.ru/en/org/persons/24954570


Appendix

Tab. A1. Descriptive statistics for applicants’ characteristics.

Variable | Full sample (3344 obs., see footnote 2) | Signed contract (2019 obs.) | Did not sign contract (1325 obs.)

Age, years (see footnote 3) 33.77 (7.56) 33.93 (7.65) 33.53 (7.41)

Sex

Male 1848 (55.3%) 1151 (57.0%) 697 (52.6%)

Female 1496 (44.7%) 868 (43.0%) 628 (47.4%)

Marital status

Married 1793 (53.6%) 1132 (56.1%) 661 (49.9%)

Single 1013 (30.3%) 587 (29.1%) 426 (32.2%)

Divorced 497 (14.9%) 281 (13.9%) 216 (16.3%)

Widowed 41 (1.2%) 19 (0.9%) 22 (1.7%)

Category of employment

Hired employee 3210 (96.0%) 1923 (95.2%) 1287 (97.1%)

State-owned employee 111 (3.3%) 79 (3.9%) 32 (2.4%)

Entrepreneur 23 (0.7%) 17 (0.8%) 6 (0.5%)

Level of education

Complete higher 1756 (52.5%) 1116 (55.3%) 640 (48.3%)

Less than higher 1588 (47.5%) 903 (44.7%) 685 (51.7%)

Declared income of main borrower

Not declared 2333 (69.8%) 1223 (60.6%) 1110 (83.8%)

From 0 to $249 85 (2.5%) 47 (2.3%) 38 (2.9%)

From $250 to $499 279 (8.3%) 237 (11.7%) 42 (3.2%)

From $500 to $1 000 442 (13.2%) 358 (17.7%) 84 (6.3%)

More than $1 000 205 (6.1%) 154 (7.6%) 51 (3.8%)

Number of co-borrowers

0 1416 (42.3%) 823 (40.8%) 593 (44.8%)

1 1794 (53.6%) 1105 (54.7%) 689 (52.0%)

2 134 (4.0%) 91 (4.5%) 43 (3.2%)

Declared income of co-borrowers

Not declared 2939 (87.9%) 1677 (83.1%) 1262 (95.2%)

From 0 to $249 105 (3.1%) 97 (4.8%) 8 (0.6%)

From $250 to $499 157 (4.7%) 129 (6.4%) 28 (2.1%)

More than $500 143 (4.3%) 116 (5.7%) 27 (2.0%)

2 Outliers were excluded from the sample. We treat an observation as an outlier if the age, level of education, marital status or other characteristics are missing (119 obs.). We also exclude observations with borrowers under age 21, or with LTV or DTI (debt-to-income ratio) below 0 or above 1 (135 obs.). We consider these outliers random and due to errors in the database. We also exclude 2.5% of observations with extreme values of the purchased property from each side of its distribution (89 obs.). After excluding the outliers, the sample contains 3344 observations: 2019 applicants signed the mortgage contract, while 1325 did not.

3 Mean, with standard deviation in parentheses.


Tab. A2. Descriptive statistics of the issued loans (2019 contracts).

Variable                     Mean        St. dev.     Min       Max
Loan amount, $               25 068.3    12 440.54    3 750     120 000
Downpayment, $               20 130.0    13 740.91    1 250     117 500
Flat value, $                45 198.3    21 191.38    12 500    175 000
Monthly payment, $           303.5       158.8        60.2      1 766
Loan limit, $                25 964.3    13 394.9     3 750     183 750
Loan-to-value ratio (LTV)    0.57        0.16         0.11      0.94
Maturity, months             190.4       61.5         26        360
Rate, %                      11.4        1.54         9.5       19

Insurance
  Not insured                1837 (91.0%)
  Insured                    182 (9.0%)

Indicator of default         Insured        Not insured     Total
  Not defaulted              153 (84.1%)    1772 (96.5%)    1925 (95.3%)
  Defaulted                  29 (15.9%)     65 (3.5%)       94 (4.7%)
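The gap in default rates between insured and uninsured loans in Tab. A2 can be summarised with two simple ratios. The sketch below recomputes the default shares from the reported counts and adds a risk ratio and an odds ratio; these two summary measures are not reported in the paper and are shown only as an illustration of the arithmetic.

```python
# Counts copied from the default cross-tabulation in Tab. A2.
counts = {
    "insured": {"defaulted": 29, "not_defaulted": 153},
    "not_insured": {"defaulted": 65, "not_defaulted": 1772},
}


def default_rate(group):
    """Share of defaulted loans within one insurance group."""
    total = group["defaulted"] + group["not_defaulted"]
    return group["defaulted"] / total


def odds(group):
    """Odds of default within one insurance group."""
    return group["defaulted"] / group["not_defaulted"]


rate_insured = default_rate(counts["insured"])
rate_not_insured = default_rate(counts["not_insured"])

# Illustrative summary measures (not reported in the paper).
risk_ratio = rate_insured / rate_not_insured
odds_ratio = odds(counts["insured"]) / odds(counts["not_insured"])

print(f"default rate, insured:     {rate_insured:.3f}")
print(f"default rate, not insured: {rate_not_insured:.3f}")
print(f"risk ratio:                {risk_ratio:.2f}")
print(f"odds ratio:                {odds_ratio:.2f}")
```

Both ratios are unconditional: they do not control for borrower characteristics or contract terms, which is what the estimated equations in the paper are meant to do.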

Tab. A3. Aggregated mortgage and housing markets characteristics.

Variable                                         Mean        St. dev.    Min         Max
Volume of issued mortgages in region, mln. $     23.0        14.1        2.9         54.8
Volume of issued mortgages in region, number     894.4       529.2       134         2112
Mean loan amount, $                              28 814.2    6299.8      22 482.7    47 705.0
Median maturity, months                          200.79      12.81       173         222.2
Median rate, %                                   12.97       0.80        12          14.3
Mean LTV                                         0.58        0.03        0.48        0.65
Mean DTI⁴                                        0.35        0.01        0.33        0.37
Mean ft² value, $                                89.7        14.3        66.9        119.2
Affordability of housing coefficient⁵            0.287       0.055       0.215       0.389
Number of loans refinanced by AHML               129.1       83.7        30          310
Number of applications to the bank               121.4       51.9        43          222

4 DTI is the ratio between the monthly payment and monthly income.

5 The affordability coefficient reflects the ratio between the income of the mean household and the value of the mean flat.


Tab. A4. Estimated parameters for selection equation.

Variable                                                   (1) OLS          (2) Probit
Age of borrower                                            -0.006           -0.014
                                                           (0.009)          (0.025)
Age squared                                                0.000            0.000
                                                           (0.000)          (0.000)
Male                                                       0.028            0.081
                                                           (0.018)          (0.051)
Family status (Married is base level):
  Single                                                   -0.029           -0.093
                                                           (0.025)          (0.071)
  Divorced                                                 -0.042           -0.130
                                                           (0.029)          (0.083)
  Widowed                                                  -0.130*          -0.363*
                                                           (0.076)          (0.209)
Category of activity (Hired employee is base level):
  Entrepreneur                                             0.066            0.165
                                                           (0.099)          (0.294)
  State employee                                           0.140***         0.393***
                                                           (0.045)          (0.131)
Level of education (Complete higher is base level):
  Less than higher                                         -0.071***        -0.197***
                                                           (0.017)          (0.047)
Number of co-borrowers (No co-borrowers is base level):
  1 co-borrower                                            0.001            -0.015
                                                           (0.024)          (0.069)
  2 co-borrowers                                           0.019            0.055
                                                           (0.048)          (0.140)
Declared income of co-borrowers (Not declared is base level):
  From 0 to $249                                           0.155***         0.731***
                                                           (0.052)          (0.198)
  From $250 to $499                                        0.088**          0.291**
                                                           (0.043)          (0.135)
  More than $500                                           0.073            0.245*
                                                           (0.045)          (0.138)
Declared income of main borrower (Not declared is base level):
  From 0 to $249                                           -0.011           -0.083
                                                           (0.054)          (0.151)
  From $250 to $499                                        0.265***         0.798***
                                                           (0.034)          (0.107)
  From $500 to $999                                        0.232***         0.656***
                                                           (0.027)          (0.080)
  More than $1000                                          0.179***         0.475***
                                                           (0.036)          (0.105)
Difference between number of AHML loans and
number of applications                                     -0.000***        -0.001***
                                                           (0.000)          (0.000)
Constant                                                   0.646***         0.295
                                                           (0.161)          (0.452)
N                                                          3344             3344
k                                                          20               20
% of correct predictions                                   64.8             64.7
Test for excluded variable significance                    F(1, 3224)=31.98 χ²(1)=32.23

Note: Robust standard errors are in parentheses; significance levels are obtained from t-statistics:
* - 10%, ** - 5%, *** - 1%.
k: number of estimated parameters; N: number of observations.
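OLS and probit coefficients in Tab. A4 are on different scales, so they cannot be compared directly. A rough way to put them side by side is to multiply a probit coefficient by the standard normal density evaluated at the index that reproduces the sample share of positive outcomes. The sketch below does this under two assumptions that are not stated in the table: that the dependent variable is the contract-signing indicator from Tab. A1 (2019 of 3344 applicants), and that a single evaluation point is an acceptable stand-in for averaging over observations.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Assumed dependent variable: signed contract (Tab. A1: 2019 of 3344).
p_signed = 2019 / 3344

# Index value at which a probit model predicts exactly that share,
# and the normal density at that point (the scaling factor).
index_at_share = std_normal.inv_cdf(p_signed)
scale = std_normal.pdf(index_at_share)


def approx_marginal_effect(probit_coef):
    """Approximate effect on the probability, evaluated at one point."""
    return scale * probit_coef


# (probit coefficient, OLS coefficient) pairs copied from Tab. A4.
coefs = {
    "State employee": (0.393, 0.140),
    "Less than higher education": (-0.197, -0.071),
    "Main borrower income $250-$499": (0.798, 0.265),
}

for name, (probit_coef, ols_coef) in coefs.items():
    effect = approx_marginal_effect(probit_coef)
    print(f"{name}: probit-based {effect:.3f} vs OLS {ols_coef:.3f}")
```

Under these assumptions the rescaled probit coefficients land close to the OLS estimates, which is consistent with the two columns telling the same qualitative story.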


Tab. A5. Estimated parameters for loan limit equation.

Variable                                                   (1)            (2)            (3)
Age of borrower                                            0.016*         0.015          0.015
                                                           (0.010)        (0.010)        (0.010)
Age squared                                                -0.000*        -0.000*        -0.000*
                                                           (0.000)        (0.000)        (0.000)
Male                                                       -0.012         -0.011         -0.012
                                                           (0.021)        (0.021)        (0.021)
Family status (Single is base level):
  Married                                                  -0.037         -0.035         -0.034
                                                           (0.027)        (0.027)        (0.027)
  Divorced                                                 -0.043         -0.040         -0.039
                                                           (0.033)        (0.033)        (0.033)
  Widowed                                                  -0.065         -0.062         -0.041
                                                           (0.099)        (0.100)        (0.101)
Category of activity (Hired employee is base level):
  Entrepreneur                                             0.066          0.067          0.065
                                                           (0.101)        (0.101)        (0.101)
  State employee                                           -0.063         -0.058         -0.063
                                                           (0.053)        (0.053)        (0.054)
Level of education (Complete higher is base level):
  Less than higher                                         -0.166***      -0.169***      -0.164***
                                                           (0.023)        (0.023)        (0.023)
Number of co-borrowers (No co-borrowers is base level):
  1 co-borrower                                            0.087***       0.087***       0.085***
                                                           (0.027)        (0.027)        (0.027)
  2 co-borrowers                                           0.129**        0.134**        0.129**
                                                           (0.052)        (0.052)        (0.052)
Declared income of co-borrowers (Not declared is base level):
  From 0 to $249                                           -0.066         -0.068         -0.075
                                                           (0.058)        (0.063)        (0.063)
  From $250 to $499                                        0.038          0.037          0.039
                                                           (0.047)        (0.047)        (0.048)
  More than $500                                           0.200***       0.203***       0.207***
                                                           (0.046)        (0.047)        (0.047)
Declared income of main borrower (Not declared is base level):
  From 0 to $249                                           -0.908***      -0.916***      -0.910***
                                                           (0.065)        (0.066)        (0.066)
  From $250 to $499                                        -0.486***      -0.491***      -0.481***
                                                           (0.060)        (0.060)        (0.064)
  From $500 to $999                                        -0.081         -0.084         -0.073
                                                           (0.053)        (0.053)        (0.058)
  More than $1000                                          0.365***       0.360***       0.370***
                                                           (0.051)        (0.052)        (0.055)
Mean DTI                                                   0.071***       -2.677**       7.878
                                                           (0.015)        (1.231)        (8.655)
Mean DTI squared                                           -              0.040**        -2.190
                                                                          (0.018)        (2.319)
Mean DTI cubed                                             -              -              0.021
                                                                                         (0.022)
Prop. score                                                -0.201         -0.225         2.530
                                                           (0.180)        (0.559)        (2.661)
Prop. score squared                                        -              0.014          -3.988
                                                                          (0.388)        (3.883)
Prop. score cubed                                          -              -              1.862
                                                                                         (1.803)
Constant                                                   7.550***       5.155***       -8.291
                                                           (0.510)        (2.333)        (9.977)
N                                                          2019           2019           2019
k                                                          21             23             25
Test for excluded variable significance                    F(1,1998)=23.5 F(2,1998)=14.3 F(3,1998)=9.5

Note: Robust standard errors are in parentheses; significance levels are obtained from t-statistics:
* - 10%, ** - 5%, *** - 1%.
k: number of estimated parameters; N: number of observations.
Model (1) was estimated for 𝑍1 = 1, 𝑍2 = 1, model (2) for 𝑍1 = 2, 𝑍2 = 2, model (3) for 𝑍1 = 3, 𝑍2 = 3.
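The rows of Tab. A5 suggest that Z1 and Z2 are the orders of the power series in mean DTI and in the propensity score: each step from model (1) to model (3) adds one higher power of each variable. The sketch below builds such a regressor vector and checks that the resulting number of parameters matches the reported k. Reading Z1 and Z2 as polynomial orders is an interpretation of the table, and the function names and input values are hypothetical.

```python
def power_terms(value, order):
    """Return [value, value**2, ..., value**order]."""
    return [value ** p for p in range(1, order + 1)]


def build_regressors(base, mean_dti, prop_score, z1, z2):
    """Base regressors plus power series of order z1 and z2.

    z1 is the assumed polynomial order in mean DTI,
    z2 the assumed polynomial order in the propensity score.
    """
    return list(base) + power_terms(mean_dti, z1) + power_terms(prop_score, z2)


# Constant plus the 18 borrower characteristics listed in Tab. A5
# (the zeros are placeholders, not data).
base_row = [1.0] + [0.0] * 18

for order in (1, 2, 3):
    row = build_regressors(base_row, mean_dti=0.35, prop_score=0.6,
                           z1=order, z2=order)
    print(f"model ({order}): k = {len(row)}")
```

The printed counts are 21, 23 and 25, the same as the k row of the table, which supports the reading of Z1 and Z2 as polynomial orders.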

Tab. A6. Results of instruments’ relevance test

                       I                           II                          III
Equation               (1)      (2)      (3)       (1)      (2)      (3)       (1)      (2)      (3)
LTV                    2.115    3.039    2.324     3.128    2.162    2.197     2.449    2.059    1.854
Log. of rate           156.0    107.2    70.66     203.1    112.8    67.00     287.1    124.3    69.08
Log. of maturity       2.185    2.227    1.411     4.398    2.356    1.861     3.838    2.582    1.570
Prob. of insurance     4.055    2.275    1.617     2.275    2.805    2.400     2.701    2.756    2.013
10% critical values    1.414    1.352    1.300     1.414    1.352    1.300     1.414    1.352    1.300
5% critical values     1.561    1.473    1.400     1.561    1.473    1.400     1.561    1.473    1.400
1% critical values     1.863    1.719    1.602     1.863    1.719    1.602     1.863    1.719    1.602

Note: Table cells contain conditional F-statistics of the excluded instruments; the corresponding critical values are given in the bottom rows.
For each equation, models (I) are calculated with market-level instruments fixed in the month of application,
models (II) with market-level instruments fixed one month before the month of application,
and models (III) with market-level instruments fixed two months before the month of application.
For each equation, model (1) was estimated for 𝑀1 = 1, 𝑀2 = 1, model (2) for 𝑀1 = 2, 𝑀2 = 2, model (3) for 𝑀1 = 3, 𝑀2 = 3.
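Reading Tab. A6 amounts to comparing each conditional F-statistic with the critical values listed for the same model. The sketch below automates that comparison for a few cells. It assumes that a statistic above a critical value rejects the null of irrelevant instruments at that level, which is how such rows are usually read but is not spelled out in the note; the function name is hypothetical.

```python
# Critical values from the bottom rows of Tab. A6, keyed by model (1)-(3).
CRITICAL = {
    1: {"10%": 1.414, "5%": 1.561, "1%": 1.863},
    2: {"10%": 1.352, "5%": 1.473, "1%": 1.719},
    3: {"10%": 1.300, "5%": 1.400, "1%": 1.602},
}


def strictest_level_passed(f_stat, model):
    """Strictest level at which f_stat exceeds the critical value.

    Returns None if the statistic is below the 10% critical value.
    """
    for level in ("1%", "5%", "10%"):
        if f_stat > CRITICAL[model][level]:
            return level
    return None


# Selected cells from Tab. A6: (equation, lag block, model) -> statistic.
cells = {
    ("LTV", "I", 1): 2.115,
    ("Log. of maturity", "I", 3): 1.411,
    ("Log. of maturity", "III", 3): 1.570,
    ("Log. of rate", "III", 1): 287.1,
}

for (equation, block, model), f_stat in cells.items():
    level = strictest_level_passed(f_stat, model)
    print(f"{equation}, {block}({model}): F = {f_stat}, passes at {level}")
```

Among the selected cells, only the two log-maturity statistics for model (3) fall short of the 1% critical value; both still exceed the 5% value.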
