Key determinants of demand, credit underwriting, and performance on government-insured
mortgage loans in Russia
Authors: Agatha M. Lozinskaya (Poroshina), Evgeniy M. Ozhegov
Affiliation: Higher School of Economics, Department of Economics, Research group for Applied Markets
and Enterprises Studies
Abstract
This research analyses lending by a Russian state-owned mortgage provider. The two-level system of mortgage lending and insurance leads to substantially higher default rates for insured loans, which suggests that the underwriting incentives of regional operators of government mortgage loans perform poorly. We use loan-level data on mortgages issued by one regional government mortgage provider to understand the interdependence between underwriting, the borrower's choice of contract terms (including loan insurance) and loan performance.
JEL Classification: C36; D12; R20
Research proposal
Introduction
Key issues of government policy include providing affordable housing and identifying the main drivers of mortgage borrowing and of mortgage loan performance. Therefore, developing optimal credit contracts and effective risk management systems, especially in the residential mortgage market, is becoming crucial.
The national institute for the development of housing activity, the Agency of Home Mortgage Lending (AHML), helps to implement a strong government housing policy and anti-recessionary measures to support mortgage lending in Russia. AHML is a state-owned provider of government-insured loans that uses a two-level lending system. In the first step, banks and non-credit organizations issue mortgage loans to households according to the common standards of AHML. The second step is the refinancing (redemption) of these mortgage receivables by AHML. AHML develops special mortgage programs and refinances risks from its regional branches and from the commercial banks that operate such programs. The programs include “Young researchers”, “Young teachers”, “Mortgage for Soldiers”, “Mothers’ capital” and other social and subprime programs. This research investigates the key drivers of borrowers' self-selection into AHML credit programs, of their choice of particular credit contract terms, and of loan performance, taking into account the possible interdependence of all these decisions.
When applying for an AHML loan, the potential borrower chooses, along with other mortgage terms, whether to have government loan insurance (provided by the AHML insurance company) in case of delinquency. If the loan-to-value ratio (LTV) exceeds 70%, the loan must be insured. While credit risk in the Russian residential mortgage market has been stable over the past 8 years, with the mean probability of default varying from 4 to 5%¹, government-insured AHML loans performed substantially worse, with a 16% probability of default. One explanation is that government insurance covers potential losses from such loans and may therefore affect their approval process. We are interested in the conditions that lead to a government-insured loan, in the performance of such loans and in their underwriting process. Preliminary data analysis suggests that insured loans are more readily approved by AHML regional banks despite their substantial credit risk, because this risk and the potential losses are shared between AHML and its insurance company.
The results can help to understand how credit risk is distributed between regional operators, AHML and the AHML insurance company, and to analyze the tradeoff between achieving social goals and credit risk losses for the government. They may also help to revise the underwriting process and the incentives for regional AHML operators.
This proposal has the following structure. It starts with a literature review that summarizes recent studies of the mortgage borrowing process. The second part describes the identification strategy, which allows us to correct for sample selection bias and endogeneity. The third part describes the collected loan-level data and the instrumental variables. Finally, we discuss the preliminary results and conclude with plans for further work.
1 Agency of Housing Mortgage Lending data, www.ahml.ru
Literature review
Estimation of demand functions for differentiated products, under the assumption of equal consumption volumes across individuals, is well developed in recent research. It builds on the classical papers of McFadden (1973, 1978), who proposed the logit model of discrete choice, and of Berry, Levinsohn and Pakes (1995), who extended this approach to random elasticities of demand with respect to product characteristics. Later, Nevo (2000) proposed estimating a discrete choice model with random coefficients, in which the elasticities of demand with respect to product characteristics are functions of consumers' socio-demographic characteristics and a random component.
Classical models of consumer behavior in the mortgage market widely use these parametric approaches to construct regression functions: probit and logit models for binary choices and linear regression models for continuous choices are common. The main issue in such models is sample selection bias, which arises when borrowers self-select out of some steps of the borrowing process. Moreover, self-selection generates partial observability of contract terms and loan performance: these data are observed only for approved borrowers who signed the mortgage contract. The magnitude of the sample selection bias then depends on the strength of the correlation between the application process, the underwriting process, the choice of credit terms and loan performance (Ross, 2000).
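The following minimal simulation illustrates the point. It assumes a hypothetical data-generating process (not the AHML data): a selection equation for contract signing with an excluded variable `w`, and an outcome equation for a contract term whose error is correlated with the selection error. OLS on the full latent sample recovers the true slope of 2, while OLS on the selected subsample does not.

```python
import numpy as np


def simulate_selection_bias(n=200_000, rho=0.7, seed=0):
    """Compare OLS slopes on the full and on the selected sample.

    Hypothetical DGP: selection d = 1{w + x + e0 >= 0},
    outcome y = 1 + 2*x + e1, with corr(e0, e1) = rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    w = rng.normal(size=n)  # excluded variable: enters the selection equation only
    cov = [[1.0, rho], [rho, 1.0]]
    e0, e1 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    d = w + x + e0 >= 0          # e.g. contract signed
    y = 1.0 + 2.0 * x + e1       # e.g. a contract term, latent when d is False

    X = np.column_stack([np.ones(n), x])
    full = np.linalg.lstsq(X, y, rcond=None)[0]
    selected = np.linalg.lstsq(X[d], y[d], rcond=None)[0]
    return full[1], selected[1]


full_slope, selected_slope = simulate_selection_bias()
print(f"slope, full sample:     {full_slope:.3f}")
print(f"slope, selected sample: {selected_slope:.3f}")
```

Setting `rho=0.0` makes the two slopes coincide up to sampling noise, which is why the size of the bias depends on the correlation between the stages of the process.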
The second issue in modeling credit demand is simultaneity bias. It arises when the terms of the credit contract and the characteristics of the flat are chosen simultaneously and these choices are correlated. Mortgage borrowing as a sequence of consumer and bank decisions was first introduced by Follain (1990). He defines the borrowing process as the choice of how much to borrow (the loan-to-value, or LTV, decision), whether and when to refinance or default (the termination decision), and the choice of the mortgage instrument itself (the contract decision). Later, Rachlis and Yezer (1993) suggested a theoretical model of the mortgage lending process that consists of a system of four simultaneous equations: (1) the borrower's application, (2) the borrower's selection of mortgage terms, (3) the lender's endorsement and (4) the borrower's default. They showed that all four equations (and decisions) should be considered interdependent; otherwise, the estimates would be inconsistent.
From the mid-1990s, American mortgage datasets such as the Federal Housing Administration (FHA) foreclosure data, the Boston Fed Study and the Home Mortgage Disclosure Act (HMDA) data became publicly available. Several empirical studies then used these data to analyze the mortgage lending process and the interdependence between the bank's endorsement decision and the borrower's decisions, modeled with a bivariate probit.
Extending Rachlis and Yezer (1993), Yezer, Phillips and Trost (1994) used Monte Carlo experiments to study the theoretical model listed above. They showed that modeling the credit underwriting and default processes in isolation leads to biased parameter estimates. Later, Phillips and Yezer (1996) and Ross (2000) supported these findings.
Phillips and Yezer (1996) compared the estimates of the single-equation approach with those of the bivariate probit model. They showed that estimates of discrimination are biased if the lender's rejection decision is decoupled from the borrower's self-selection into loan programs, or if the lender's underwriting decision is decoupled from the borrower's refusal decision.
Ross (2000) studied the link between loan approval and loan default with a bivariate probit and found that, after correcting for sample selection, most of the approval equation parameters have the opposite sign to the corresponding default equation parameters. The paper also points out that if the sample of defaulters and non-defaulters contains little information on borrowers' characteristics, the estimated probability of default and the sample selection models will be severely biased. The more information on borrowers' characteristics is available, including credit history and other risk metrics, the smaller the sample selection bias.
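A minimal sketch of the likelihood behind this kind of joint approval-default model: a bivariate probit with sample selection, in which default is observed only for approved loans. It assumes, for illustration, a common regressor matrix `X` for both equations and jointly normal errors with correlation `rho`; the function name and the simulated data are hypothetical. In practice the function would be passed to a numerical optimizer.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def selection_probit_loglik(beta_a, beta_d, rho, X, approved, default):
    """Log-likelihood of a bivariate probit with sample selection.

    approved = 1{X @ beta_a + u_a >= 0}; default = 1{X @ beta_d + u_d >= 0}
    is observed only when approved == 1; (u_a, u_d) are standard bivariate
    normal with correlation rho.
    """
    xa, xd = X @ beta_a, X @ beta_d
    ll = np.zeros(len(X))

    rejected = approved == 0
    ll[rejected] = norm.logcdf(-xa[rejected])  # P(not approved)

    # P(approved, default) = Phi2(xa, xd; rho); P(approved, no default) = Phi2(xa, -xd; -rho)
    for outcome, sign in ((1, 1.0), (0, -1.0)):
        mask = (approved == 1) & (default == outcome)
        if not mask.any():
            continue
        cov = [[1.0, sign * rho], [sign * rho, 1.0]]
        points = np.column_stack([xa[mask], sign * xd[mask]])
        prob = np.atleast_1d(multivariate_normal.cdf(points, mean=[0.0, 0.0], cov=cov))
        ll[mask] = np.log(np.clip(prob, 1e-300, None))
    return ll.sum()


# Usage on hypothetical simulated data
rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_a, beta_d, rho = np.array([0.5, 1.0]), np.array([-1.0, -0.5]), 0.4
u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
approved = (X @ beta_a + u[:, 0] >= 0).astype(int)
default = ((X @ beta_d + u[:, 1] >= 0) & (approved == 1)).astype(int)

ll_true = selection_probit_loglik(beta_a, beta_d, rho, X, approved, default)
ll_wrong = selection_probit_loglik(beta_a, -beta_d, rho, X, approved, default)
print(ll_true, ll_wrong)
```

The default indicator of rejected applicants is never used, which is exactly the partial observability discussed above.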
Socio-demographic and financial characteristics of borrowers and contract terms are usually considered key determinants of mortgage default. When data on borrowers' characteristics are unavailable, some papers (e.g., Bajari et al., 2008) use aggregated demographics and the unemployment rate as proxies for individual demographics.
Attanasio et al. (2008), using the approach of Das et al. (2003) for nonparametric estimation of sample selection models, showed that contract terms should enter the demand equation for auto credit nonlinearly and that the assumption on the joint distribution function should be relaxed.
Summarizing the findings of recent research: 1) when modeling the demand equation, we should consider the simultaneity and interdependence of choices at all stages of the borrowing process; 2) estimates of the contract terms, credit risk and demand equations will be biased by sample selection; 3) the regression functions and the correlations between error terms can be nonlinear and are difficult to specify.
Identification strategy
The mortgage borrowing process can be represented by the following sequence of decisions:
1. Application of borrower.
A potential borrower realizes the need to borrow, chooses a credit organization and a credit program that reflect her preferences, and fills in an application form with her demographic and financial characteristics.
2. Approval of borrower.
Based on the application form and the applicant's recent credit history, the credit organization verifies the form data and either approves or rejects the application (some banks also set a loan amount limit when the borrower is approved).
3. Choice of credit terms.
The approved borrower decides whether to sign the contract and, if so, chooses the property to buy and the credit terms: loan amount (not more than the limit), down payment, annual payment, rate, type of rate (adjustable or fixed) and maturity.
4. Loan performance.
The borrower chooses a loan performance strategy: to pay according to the contract terms, to default or to prepay.
The econometric model follows the steps of the structural one. Following Das et al. (2003), the functional forms of the regression functions are left unrestricted.
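$$
\begin{aligned}
d_i &= \begin{cases} 1, & g_0(w_{0i}, x_i^1) + e_{0i} \ge 0 \\ 0, & g_0(w_{0i}, x_i^1) + e_{0i} < 0 \end{cases} \\
y_{1i}^* &= g_1(x_i^1, x_i^{2*}, w_{1i}, y_{-1i}^*) + e_{1i} \\
&\;\;\vdots \\
y_{ki}^* &= g_k(x_i^1, x_i^{2*}, w_{ki}, y_{-ki}^*) + e_{ki} \\
x_i^{2*} &= \pi(x_i^1, z_i) + \nu_i \\
\mathit{def}_i^* &= \begin{cases} 1, & g_{def}(y_i^*, x_i^1, x_i^{2*}) + e_{def,i} \ge 0 \\ 0, & g_{def}(y_i^*, x_i^1, x_i^{2*}) + e_{def,i} < 0 \end{cases} \\
(y_i, x_i^1, x_i^2, \mathit{def}_i) &= d_i\,(y_i^*, x_i^1, x_i^{2*}, \mathit{def}_i^*) \quad \text{is observed}
\end{aligned}
\tag{1}
$$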
where $d_i$ is a binary indicator of contract signing (reflecting both the bank's and the borrower's decisions), $x_i^1$ is a set of demographic and financial characteristics of the borrower and co-borrowers, $y_i$ is a set of credit terms, $x_i^2$ is the logarithm of the loan limit, and $(w_{0i}, w_{1i}, \dots, w_{ki}, z_i)$ are the sets of excluded instruments for the contract signing decision, the credit terms and the loan limit, respectively. The set of credit terms $y_i$ contains LTV, the logarithm of the rate, the type of rate, the logarithm of maturity and the probability of having government insurance. $\mathit{def}_i$ is a binary indicator of default.
Ozhegov (2014) extends these methods to the identification and estimation of a non-triangular system of simultaneous equations with sample selection, endogenous regressors, an arbitrary joint error distribution and arbitrary functional forms of the regression and control functions in the reduced and structural forms. We will apply this method to estimate model (1) in the following steps.
1. First, we estimate the propensity score for the contract agreement equation:
$$ p = E[d \mid x_0, w_0] = g_0(w_0, x_0) \tag{2} $$
2. Second, we estimate the prediction of the endogenous regressor, the log of the loan limit, corrected for sample selection using the estimated propensity score:
$$ E[x^2 \mid x^1, z, w_0, d = 1] = \pi(x^1, z) + \lambda(\hat p) \tag{3} $$
3. Then we estimate each contract term equation in reduced form, corrected for sample selection and for the endogeneity of the loan limit using the estimated propensity score and the residuals from the loan limit equation:
$$ E[y_j \mid x^1, x^2, z, w, w_0, d = 1] = \gamma_j(x^1, x^2, w) + \mu(\hat p, \hat\nu) \tag{4} $$
4. Next, we estimate the structural-form contract terms equations, corrected for sample selection, endogeneity and simultaneity using the estimated propensity score, the residuals from the loan limit equation and the reduced-form contract terms residuals:
$$ E[y_j \mid x^1, x^2, w_j, y_{-j}, z, w_{-j}, w_0, d = 1] = g_j(x^1, x^2, w_j, y_{-j}) + \varphi(\hat p, \hat\nu, \hat u_{-j}) \tag{5} $$
5. Finally, we estimate the probability of default equation, corrected for sample selection and for the endogeneity of contract terms using the propensity score, the residuals from the loan limit equation and the structural-form residuals:
$$ E[\mathit{def} \mid x^1, x^2, y, z, w, d = 1] = g_{def}(x^1, x^2, y) + \varphi_{def}(\hat p, \hat\nu, \hat e) \tag{6} $$
Ozhegov (2014) showed that if all regression and correction functions are continuously differentiable, there is at least one excluded variable in the selection equation, and the matrix of the instruments' marginal effects in the reduced-form contract terms equations has full rank, then equations (2)-(6) are identified up to an additive constant.
The estimation procedure is based on approximating each unknown function by a series of power functions of the corresponding regressors.
Let $\omega = (\omega_1, \dots, \omega_\chi)$ be a set of variables with $\chi = \dim(\omega)$. Then
$$ \kappa(\rho, \chi) = \frac{(\rho + \chi)!}{\rho!\,\chi!} $$
is the number of polynomial terms of power no more than $\rho$ that can be obtained from $\chi$ variables.
Let $Q_\rho(\omega) = (q_1(\omega), \dots, q_\kappa(\omega))$ be a vector of $\kappa$ power functions that form the full set of polynomial terms of power no more than $\rho$ obtained from $\omega$, i.e.
$$ q_j(\omega) = \prod_{\tau=1}^{\chi} \omega_\tau^{s_\tau}, \qquad \sum_{\tau=1}^{\chi} s_\tau \le \rho, \qquad s_\tau \in \{0, 1, \dots, \rho\}, \quad \tau = 1, \dots, \chi. $$
We call $Q_\rho(\omega)$ a polynomial approximating series of power $\rho$.
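The counting formula and the construction of $Q_\rho(\omega)$ can be sketched in Python. This is an illustrative implementation, not the authors' code; `kappa` and `poly_series` are hypothetical helpers, and the monomials are enumerated with `itertools.combinations_with_replacement`, so the number of columns equals $\kappa(\rho, \chi)$.

```python
import math
from itertools import combinations_with_replacement

import numpy as np


def kappa(rho, chi):
    """Number of monomials of total degree <= rho in chi variables: (rho+chi)!/(rho! chi!)."""
    return math.comb(rho + chi, chi)


def poly_series(omega, rho):
    """Q_rho(omega): all monomials prod_t omega_t**s_t with sum_t s_t <= rho.

    omega: (n, chi) array (a 1-D array is treated as a single variable).
    Returns an (n, kappa(rho, chi)) array whose first column is the constant.
    """
    omega = np.asarray(omega, dtype=float)
    if omega.ndim == 1:
        omega = omega[:, None]
    n, chi = omega.shape
    cols = [np.ones(n)]
    for degree in range(1, rho + 1):
        # each multiset of variable indices of size `degree` is one monomial
        for idx in combinations_with_replacement(range(chi), degree):
            cols.append(np.prod(omega[:, list(idx)], axis=1))
    return np.column_stack(cols)


Q = poly_series(np.array([[2.0, 3.0]]), rho=2)
print(Q)  # [[1. 2. 3. 4. 6. 9.]]
```

Raw power terms can become badly conditioned for larger $\rho$; rescaling the variables or using orthogonal polynomials is a common remedy, but that is a design choice beyond what the text specifies.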
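Then the propensity score of the selection equation may be estimated by OLS as
$$ \hat p_i = E[d_i \mid x_{0i}, w_{0i}] = Q_{\rho_0}(w_{0i}, x_{0i}) \left[ Q_{\rho_0}(w_0, x_0)'\, Q_{\rho_0}(w_0, x_0) \right]^{-1} Q_{\rho_0}(w_0, x_0)'\, d \tag{7} $$
Let $a = (a_1, a_2)$ and $Q_{Z_1,Z_2}(x^1, z, \hat p) = (Q_{Z_1}(x^1, z), Q_{Z_2}(\hat p))$; then $a$ may be obtained by OLS as
$$ \hat a = \left[ Q_{Z_1,Z_2}(x^1, z, \hat p)'\, Q_{Z_1,Z_2}(x^1, z, \hat p) \right]^{-1} Q_{Z_1,Z_2}(x^1, z, \hat p)'\, x^2 \tag{8} $$
Then the residuals of the loan limit equation may be obtained as
$$ \hat\nu_i = x_i^2 - Q_{Z_1,Z_2}(x_i^1, z_i, \hat p_i)\, \hat a \tag{9} $$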
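Let $b_j = (b_{1j}, b_{2j})$ and $Q_{M_1,M_2}(\mathcal{W}) = Q_{M_1,M_2}(x^1, x^2, w, \hat p, \hat\nu) = (Q_{M_1}(x^1, x^2, w), Q_{M_2}(\hat p, \hat\nu))$; then $b_j$ may be obtained by OLS as
$$ \hat b_j = \left[ Q_{M_1,M_2}(\mathcal{W})'\, Q_{M_1,M_2}(\mathcal{W}) \right]^{-1} Q_{M_1,M_2}(\mathcal{W})'\, y_j \tag{10} $$
Then the reduced-form contract terms residuals are
$$ \hat u_{ji} = y_{ji} - Q_{M_1,M_2}(x_i^1, x_i^2, w_i, \hat p_i, \hat\nu_i)\, \hat b_j \tag{11} $$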
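Let $\beta_j = (\beta_{1j}, \beta_{2j})$ and $Q_{\xi_1,\xi_2}(\mathcal{X}) = Q_{\xi_1,\xi_2}(x^1, x^2, w_j, y_{-j}, \hat p, \hat\nu, \hat u_{-j}) = (Q_{\xi_1}(x^1, x^2, w_j, y_{-j}), Q_{\xi_2}(\hat p, \hat\nu, \hat u_{-j}))$; then the estimate of $\beta_j$ may be obtained by OLS as
$$ \hat\beta_j = \left[ Q_{\xi_1,\xi_2}(\mathcal{X})'\, Q_{\xi_1,\xi_2}(\mathcal{X}) \right]^{-1} Q_{\xi_1,\xi_2}(\mathcal{X})'\, y_j \tag{12} $$
Then the structural-form contract terms residuals are
$$ \hat e_{ji} = y_{ji} - Q_{\xi_1,\xi_2}(\mathcal{X}_i)\, \hat\beta_j \tag{13} $$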
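Let $\alpha = (\alpha_1, \alpha_2)$ and $Q_{\theta_1,\theta_2}(x^1, x^2, y, \hat p, \hat\nu, \hat e) = (Q_{\theta_1}(x^1, x^2, y), Q_{\theta_2}(\hat p, \hat\nu, \hat e))$; then the estimate of $\alpha$ may be obtained by OLS as
$$ \hat\alpha = \left[ Q_{\theta_1,\theta_2}(x^1, x^2, y, \hat p, \hat\nu, \hat e)'\, Q_{\theta_1,\theta_2}(x^1, x^2, y, \hat p, \hat\nu, \hat e) \right]^{-1} Q_{\theta_1,\theta_2}(x^1, x^2, y, \hat p, \hat\nu, \hat e)'\, \mathit{def} \tag{14} $$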
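Steps (7)-(9) can be sketched on simulated data as follows. Everything here is an assumption for illustration: the data-generating process, the powers $\rho_0 = 3$, $Z_1 = 1$ and $Z_2 = 3$, and the helper names. Because the model is identified only up to an additive constant, the constant is dropped from $Q_{Z_2}(\hat p)$ to avoid perfect collinearity. The sketch compares the coefficient on $x^1$ with and without the control function.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_series(omega, rho, const=True):
    """Monomials of total degree <= rho in the given 1-D arrays (Q_rho)."""
    omega = np.column_stack(omega)
    n, chi = omega.shape
    cols = [np.ones(n)] if const else []
    for deg in range(1, rho + 1):
        for idx in combinations_with_replacement(range(chi), deg):
            cols.append(np.prod(omega[:, list(idx)], axis=1))
    return np.column_stack(cols)


def ols(Q, y):
    return np.linalg.lstsq(Q, y, rcond=None)[0]


# Hypothetical simulated data (not the AHML sample)
rng = np.random.default_rng(42)
n = 20_000
x1, w0, z = rng.normal(size=(3, n))
e0, v = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
d = w0 + x1 + e0 >= 0                  # contract signed
x2 = 0.5 + 1.0 * x1 + 1.0 * z + v      # log loan limit, observed only if d

# (7) propensity score: series OLS (linear probability model) of d on Q_rho0(w0, x1)
Q0 = poly_series([w0, x1], rho=3)
p_hat = Q0 @ ols(Q0, d.astype(float))

# (8) loan limit equation on signed contracts: Q_Z1(x1, z) plus control function Q_Z2(p_hat)
Q_main = poly_series([x1[d], z[d]], rho=1)
Q_ctrl = poly_series([p_hat[d]], rho=3, const=False)
Q_full = np.column_stack([Q_main, Q_ctrl])
a_hat = ols(Q_full, x2[d])

# (9) residuals of the loan limit equation, used as control variables in later steps
nu_hat = x2[d] - Q_full @ a_hat

naive = ols(Q_main, x2[d])
print("x1 coefficient, naive:", round(naive[1], 3), "corrected:", round(a_hat[1], 3))
```

Steps (10)-(14) repeat the same pattern: build the series for the regressors and for the estimated control variables, stack them, and run OLS.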
The crucial assumption for identification is that the matrix of marginal effects of the excluded instruments has full rank. Unlike instrument validity, this assumption is testable. We follow the conditional F-test approach of Sanderson and Windmeijer (2014) to test the null hypothesis that the rank of this matrix is reduced by one from full rank.
We first consider the test for a linear model with multiple endogenous variables and then generalize it to the nonlinear semiparametric model with sample selection.
In the simplest case studied by Sanderson and Windmeijer (2014), we have one linear equation for $y$ with $k$ endogenous variables $X = (x_1, \dots, x_k) = (x_j, x_{-j})$ and $m$ instruments $Z = (z_1, \dots, z_m)$ that are independent of the error terms. The model may be expressed as
$$ y = X\beta + e_0, \qquad x_j = Z\pi_j + e_j, \qquad (e_0, e_1, \dots, e_k) \perp Z $$
The problem is to test whether $\operatorname{rank}[\Pi' Z' Z \Pi] = k$. Stock and Yogo (2005) introduced a test based on the minimal eigenvalue of the matrix $\hat\Pi' Z' Z \hat\Pi / m$ with $H_0: \operatorname{rank}[\Pi' Z' Z \Pi] = k - 1$. If its minimal eigenvalue is statistically different from zero, we can reject the null hypothesis that the matrix does not have full rank and that the instruments are weak. They also calculated critical values for the test, but only for $m \le 2$. Sanderson and Windmeijer (2014) followed the conditional F-test approach of Angrist and Pischke (2009) for testing the joint significance of instruments in the reduced-form regression: Angrist and Pischke (2009) proposed the conditional F-statistic, and Sanderson and Windmeijer (2014) corrected its asymptotic distribution and proved its equivalence to the Stock and Yogo (2005) test. For this application, the conditional F-test approach has the advantage of a known limiting distribution that can easily be extended to the semiparametric and sample selection case, and its critical values can easily be calculated even for $m > 2$. Testing consists of three steps: 1) estimation of each endogenous variable conditional on all other endogenous variables, 2) estimation of the conditional reduced-form parameters and 3) calculation of the test statistics. Formally:
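1) Obtain $\hat\gamma_j$ by 2SLS from the regression $x_j = x_{-j}\gamma_j + \xi_j$, using $Z$ as instruments for $x_{-j}$.
2) Obtain $\hat\kappa_j$ by OLS from the regression $x_j - x_{-j}\hat\gamma_j = Z\kappa_j + \nu_j$.
3) For each endogenous variable, calculate the instruments' conditional F-statistic
$$ F_{x_j \mid x_{-j}} = \frac{\hat\kappa_j' Z' Z \hat\kappa_j \,/\, (m - k + 1)}{\hat\nu_j' \hat\nu_j \,/\, n} $$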
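The three steps of the linear test can be sketched with NumPy. This is a minimal illustration under stated assumptions: exogenous controls (including the constant) are partialled out beforehand, and step 1 uses 2SLS with $Z$ as instruments, as in Sanderson and Windmeijer (2014). The example contrasts a first stage of full rank with one whose coefficient matrix $\Pi$ has rank one; the data and the function name `conditional_f` are hypothetical.

```python
import numpy as np


def conditional_f(X, Z, j):
    """Conditional F-statistic for endogenous regressor j.

    X: (n, k) endogenous regressors; Z: (n, m) instruments with m >= k.
    Exogenous controls are assumed to be partialled out of X and Z.
    """
    n, k = X.shape
    m = Z.shape[1]
    xj = X[:, j]

    def project_on_z(a):
        # fitted values from an OLS regression of a on Z
        return Z @ np.linalg.lstsq(Z, a, rcond=None)[0]

    # Step 1: 2SLS of x_j on x_{-j}, using Z as instruments
    if k > 1:
        x_rest = np.delete(X, j, axis=1)
        gamma = np.linalg.lstsq(project_on_z(x_rest), xj, rcond=None)[0]
        resid = xj - x_rest @ gamma
    else:
        resid = xj
    # Step 2: OLS of the step-1 residual on Z
    kappa = np.linalg.lstsq(Z, resid, rcond=None)[0]
    nu = resid - Z @ kappa
    # Step 3: conditional F with (m - k + 1) numerator degrees of freedom
    return (kappa @ Z.T @ Z @ kappa / (m - k + 1)) / (nu @ nu / n)


# Hypothetical example: 2 endogenous regressors, 3 instruments
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
V = rng.normal(size=(n, 2))
X_strong = Z @ np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]) + V
X_rank1 = Z @ np.array([[1.0, 2.0], [0.5, 1.0], [0.0, 0.0]]) + V  # columns of Pi proportional

f_strong = [conditional_f(X_strong, Z, j) for j in range(2)]
f_rank1 = [conditional_f(X_rank1, Z, j) for j in range(2)]
print(f_strong, f_rank1)
```

In the rank-one design each endogenous variable is individually well predicted by $Z$, yet both conditional statistics stay small, which is the situation the test is meant to detect.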
For the case of a semiparametric equation with continuously differentiable regression functions,
$$ y = g(X) + e_0, \qquad x_j = \pi_j(Z) + e_j, \qquad (e_0, e_1, \dots, e_k) \perp Z, $$
joint relevance of the instruments $Z$ requires $\operatorname{rank}\left[ \partial \pi(Z) / \partial Z \right] = \dim(X) = k$. If we approximate each unknown regression function $f(s)$ by its polynomial approximation $Q_\rho(s)\alpha$ of power $\rho$, with
$$ \dim(\alpha) = \frac{(\rho + \dim(s))!}{\rho!\,\dim(s)!}, $$
then we can test the relevance condition as follows:
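1) Obtain $\hat\gamma_j$ by 2SLS from the regression $x_j = Q_\rho(x_{-j})\gamma_j + \xi_j$, using $Q_\rho(Z)$ as instruments.
2) Obtain $\hat\kappa_j$ by OLS from the regression $x_j - Q_\rho(x_{-j})\hat\gamma_j = Q_\rho(Z)\kappa_j + \nu_j$.
3) For each endogenous variable, calculate the instruments' conditional F-statistic
$$ F_{x_j \mid x_{-j}} = \frac{\hat\kappa_j' Q_\rho(Z)' Q_\rho(Z) \hat\kappa_j \Big/ \left( \frac{(\rho+m)!}{\rho!\,m!} - \frac{(\rho+k)!}{\rho!\,k!} + 1 \right)}{\hat\nu_j' \hat\nu_j \,/\, n} $$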
This test can easily be generalized to the case of sample selection by including control functions for the error terms when estimating the regression parameters in steps 1 and 2.
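The numerator degrees of freedom of the series test are the number of polynomial terms in $Q_\rho(Z)$ minus the number in $Q_\rho(X)$, plus one. A quick check, included only as an illustration, confirms that with $\rho = 1$ this reduces to $m - k + 1$, the degrees of freedom of the linear test; `numerator_df` is a hypothetical helper name.

```python
from math import comb


def numerator_df(rho, m, k):
    """(rho+m)!/(rho! m!) - (rho+k)!/(rho! k!) + 1 for m instruments and k endogenous variables."""
    return comb(rho + m, m) - comb(rho + k, k) + 1


# rho = 1 gives back the linear case m - k + 1
assert all(numerator_df(1, m, k) == m - k + 1 for m in range(1, 10) for k in range(1, m + 1))
print(numerator_df(2, 4, 2))  # 15 - 6 + 1 = 10
```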
Data description
One of the regional AHML operators provided the data set of all mortgage applications collected from 2008 to 2012. We know the demographic and financial characteristics of each of the 3870 applicants (main borrowers) and of their co-borrowers on the date of application, as well as the date of application itself (Table A.1). For all signed contracts, we know the loan limit set by the bank, the contract terms and the value of the property. The characteristics of the borrower are fully observable, while the contract characteristics are observable only for the subsample of applicants who signed the contract (Table A.2).
Some mortgage programs allow applicants not to provide any information on their income. These programs are usually linked with a higher contract rate. This choice may be explained by a temporary or variable income (LaCour-Little, 2007), for instance, for entrepreneurs. Generally, income should be considered endogenous when modeling the approval of a borrower or the contract terms. However, we can control for the employment category, which addresses the inconsistency due to the possible endogeneity of income. Co-borrower income may also be endogenous, and we cannot provide any proxy for it since we do not observe other characteristics of co-borrowers. This is a limitation of the research. However, we expect co-borrower income to be of minor importance for the choice of contract terms compared with the income of the main borrower.
To estimate the model we need to find a set of relevant excluded instruments for the probability of
signing a contract, the loan limit and each credit term.
Bajari et al. (2008) discussed using aggregated district-level variables as proxies for unavailable data. We use the same strategy to find the set of instruments. Since our data have no spatial variation, we use the time variation in applications. We have data from July 2008 to August 2012 and know the application date for each applicant, so each application was matched with a set of aggregated mortgage and housing market characteristics by the date of application. On average, it takes two months from the date of application to the date of contract agreement. Also, Ozhegov and Poroshina (2013) showed that aggregated mortgage demand reacts to changes in supply within two months. We therefore fix the aggregated market characteristics for each application not only in the month of application but also 1-2 months prior to the application, and use these as instruments.
Table A.3 presents the descriptive statistics of aggregated mortgage and housing market characteristics for the period from July 2008 to August 2012 (50 months).
About 15% of issued loans were refinanced by AHML, but not all of them were issued by the bank supplying the data. Generally, the number of applications to this bank is smaller than the number of loans refinanced by AHML across all regional banks.
The difference between the number of loans refinanced by AHML and the number of applications to the bank within a particular month may serve as the excluded variable that explains the probability of contract agreement but does not affect the choice of contract terms. Since every commercial bank operates the same AHML programs, differences in the approval process do not affect the choice of terms. However, an increase in the number of refinanced loans reflects changes in the underwriting process at other banks and may correlate with the probability of a contract agreement with this bank. This variable can be considered exogenous, since each individual decision explains a negligible share (less than 1%) of the variation in the aggregated market characteristic.
As an excluded instrument for the loan limit, we use the mean debt-to-income ratio (DTI). The positive dependence between these two variables arises because the mean DTI of all issued loans reflects the lenders' evaluation of mean credit risk (the higher the DTI of issued loans, the lower the assessed risk). It therefore correlates positively with the loan limit, which reflects the willingness to issue a larger loan to a particular borrower. This variable is valid since individual shocks to the loan limit do not affect the aggregated characteristics of issued loans.
As excluded instruments for the credit terms (LTV, rate, maturity and insurance), we use the mean LTV, the median rate and the median maturity of issued loans, and the housing affordability coefficient. The relevance of the first three instruments follows from the interdependence between mortgage market characteristics and the conditions of AHML credit programs. Validity follows from the exogeneity of the program terms for each particular borrower.
The affordability coefficient is relevant for the probability of insurance because an increase in affordability should lead to the choice of a lower LTV and, consequently, to a lower probability of loan insurance. Validity again follows from the independence between individual shocks to insurance preferences and the aggregated affordability of housing. All these variables may therefore be used as instruments; their relevance was confirmed for each model with the conditional F-test approach.
Preliminary results
First, we estimated the model of the probability of a contract agreement (Table A.4) based on the characteristics of the borrower and co-borrowers and on the difference between the number of loans refinanced by AHML and the number of applications. The latter variable, used as the excluded instrument, is significant at the 1% level. The signs and significance of the borrower characteristics are consistent with recent research. The demographic characteristics of the borrower, such as age, sex, marital status and level of education, are insignificant, which is consistent with the absence of discrimination. The probability of a contract agreement is positively correlated with the income of the main borrower and co-borrowers and negatively correlated with the failure to provide income details. Entrepreneurs have a higher probability of a contract agreement, ceteris paribus.
These estimates were obtained from the linear probability model and compared with those of the probit model. The comparison showed little difference in the significance of the parameter estimates and in predictive power (with slightly higher predictive power for the linear probability model). The propensity score $\hat p_i = E[d_i \mid x_i, w_{0i}]$ was obtained from the linear probability model.
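A comparison of this kind can be sketched on simulated data: fit a linear probability model by OLS and a probit by maximum likelihood, then compare hit rates and fitted probabilities. The data-generating process, the 0.5 classification threshold and the helper names are assumptions for illustration, not the specification behind Table A.4.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def lpm_fitted(X, d):
    """Fitted probabilities from a linear probability model (OLS of d on X)."""
    beta = np.linalg.lstsq(X, d, rcond=None)[0]
    return X @ beta


def probit_fitted(X, d):
    """Fitted probabilities from a probit model estimated by maximum likelihood."""
    def neg_loglik(b):
        xb = X @ b
        return -np.sum(d * norm.logcdf(xb) + (1 - d) * norm.logcdf(-xb))

    res = minimize(neg_loglik, np.zeros(X.shape[1]), method="BFGS")
    return norm.cdf(X @ res.x)


def hit_rate(p, d):
    """Share of correct predictions with a 0.5 classification threshold."""
    return np.mean((p >= 0.5) == (d == 1))


# Hypothetical simulated contract-signing data
rng = np.random.default_rng(3)
n = 5000
x, w0 = rng.normal(size=(2, n))
d = (0.3 + x + 0.5 * w0 + rng.normal(size=n) >= 0).astype(float)
X = np.column_stack([np.ones(n), x, w0])

p_lpm, p_probit = lpm_fitted(X, d), probit_fitted(X, d)
print(hit_rate(p_lpm, d), hit_rate(p_probit, d), np.corrcoef(p_lpm, p_probit)[0, 1])
```

Fitted values of the linear probability model can fall outside [0, 1]; this does not matter for using them as a control variable in a series approximation, but it would for interpreting them as probabilities.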
The model of the logarithm of the loan limit was estimated for all signed contracts (Table A.5). The excluded instrument (DTI) is significant at the 1% level. We used polynomials in $\hat p$ up to the third power to approximate the control function. The joint significance of the control function is not supported (the p-value is 0.29), which suggests that there is no underwriting-related selection bias in the loan limits. The estimated parameters for the borrower characteristics are also consistent with intuition.
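The test on the control function amounts to a joint F-test on the polynomial terms in $\hat p$. A generic sketch follows; it assumes homoskedastic errors and treats $\hat p$ as known, whereas a full analysis would account for $\hat p$ being estimated in the first step (for instance, by bootstrapping both steps). The data are simulated and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def rss(Q, y):
    """Residual sum of squares from an OLS fit of y on the columns of Q."""
    resid = y - Q @ np.linalg.lstsq(Q, y, rcond=None)[0]
    return resid @ resid


def joint_f_test(Q_restricted, Q_extra, y):
    """F test of H0: the coefficients on all columns of Q_extra are zero."""
    Q_full = np.column_stack([Q_restricted, Q_extra])
    n, q = len(y), Q_extra.shape[1]
    df_resid = n - Q_full.shape[1]
    rss_r, rss_u = rss(Q_restricted, y), rss(Q_full, y)
    stat = ((rss_r - rss_u) / q) / (rss_u / df_resid)
    return stat, f_dist.sf(stat, q, df_resid)


# Hypothetical data: p_hat plays the role of the estimated propensity score
rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
p_hat = rng.uniform(size=n)
Q_r = np.column_stack([np.ones(n), x])
Q_cf = np.column_stack([p_hat, p_hat**2, p_hat**3])  # control function terms up to power 3

y_no_sel = 1.0 + x + rng.normal(size=n)   # no selection effect
y_sel = y_no_sel + 2.0 * p_hat           # control function matters

print(joint_f_test(Q_r, Q_cf, y_no_sel))
print(joint_f_test(Q_r, Q_cf, y_sel))
```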
For each credit term, we estimated the reduced-form equation in order to test the relevance of the instruments. The control function was approximated by a polynomial of power $M_2$ in the estimated propensity score and the loan limit equation residuals. The regression function was estimated as partially polynomial: linear in the characteristics of the borrower and polynomial of power $M_1$ in the excluded instruments for contract terms. We test three sets of instruments. For the first set, we fix the market-level variables in the month of application; for the second and third sets, we use market-level data for one and two months before the month of application, respectively. The conditional F-test results on the relevance of the excluded instruments are reported in Table A.6. All sets of excluded instruments are relevant at the 5% level, and the set of market-level variables fixed in the month before the month of application is significant at the 1% level.
All excluded variables appear relevant, and this result is robust to the degree of approximation, which allows us to use them to obtain consistent estimates of the contract terms and default equations.
Participants, funding and project plan
Evgeniy M. Ozhegov and Agatha M. Lozinskaya (Poroshina) are young research fellows of the Research Group for Applied Markets and Enterprises Studies (Grigory Kosenok, PhD in economics, full professor at NES, is the scientific advisor of the Group and of this project) and senior lecturers in the Department of Applied Mathematics at the Higher School of Economics. They have 18 published articles on demand estimation, credit risk evaluation and mortgage lending modeling, including one on demand estimation in an international double-blind peer-reviewed journal.
Parts of this research were presented at six international conferences in 2013-2014: the American Real Estate and Urban Economics Association International Conference, the Perm Winter School on Market Risk and Modeling of Financial Markets, the Russian Economic Congress, the Eurasian Business and Economic Society conference, the International Conference on Applied Research in Economics, and the XV April International Academic Conference on Economic and Social Development.
No grants have been won for this project so far. The basic salary of a senior lecturer (full wage) and a young researcher (half wage) at NRU HSE is 700 euros.
The planned steps of the project are:
1. Estimating the contract terms choice and default equations using the procedure described above, and calculating the distribution of the expected loss for AHML and the insurance company.
2. Interpreting the estimation and calculation results, describing them in a paper for a Russian journal (March 2014), and refining and submitting the paper to a real estate journal (Real Estate Economics) (July 2015).
References
Ambrose, B., LaCour-Little, M., Sanders, A. 2004. The Effect of Conforming Loan Status on Mortgage Yield Spreads: A Loan Level Analysis. Real Estate Economics, 32(4), pp. 541–569.
Angrist, J., Pischke, J.-S. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, Princeton.
Attanasio, O.P., Goldberg, P.K., Kyriazidou, E. 2008. Credit Constraints in the Market for Consumer Durables: Evidence from Micro Data on Car Loans. International Economic Review, 49(2), pp. 401–436.
Berry, S., Levinsohn, J., Pakes, A. 1995. Automobile prices in market equilibrium. Econometrica, 63(4), pp. 841–890.
Das, M., Newey, W.K., Vella, F. 2003. Nonparametric estimation of sample selection models. The Review of Economic Studies, 70(1), pp. 33–58.
Follain, J.R. 1990. Mortgage choice. Real Estate Economics, 18(2), pp. 125–144.
McFadden, D. 1973. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York, pp. 105–142.
McFadden, D. 1978. Modeling the choice of residential location. In: Karlqvist, A. (Ed.), Spatial Interaction Theory and Residential Location. North-Holland, Amsterdam, pp. 75–96.
Nevo, A. 2000. Mergers with differentiated products: the case of the ready-to-eat cereal industry. RAND Journal of Economics, 31(3), pp. 395–421.
Ozhegov, E.M. 2014. Identification and estimation of nonparametric simultaneous equations with sample selection. Mimeo.
Ozhegov, E.M., Poroshina, A.M. 2013. The Lagged Structure of Dynamic Demand Function for Mortgage Loans in Russia. EJournal of Corporate Finance, 27, pp. 37–49.
Phillips, R., Yezer, A. 1996. Self-Selection and Tests for Bias and Risk in Mortgage Lending: Can You Price the Mortgage If You Don't Know the Process? Journal of Real Estate Research, 11, pp. 87–102.
Rachlis, M., Yezer, A. 1993. Serious Flaws in Statistical Tests for Discrimination in Mortgage Markets. Journal of Housing Research, 4, pp. 315–336.
Ross, S.L. 2000. Mortgage Lending, Sample Selection and Default. Real Estate Economics, 28(4), pp. 581–621.
Sanderson, E., Windmeijer, F. 2014. A Weak Instruments F-test in linear IV models with multiple endogenous variables. Discussion Paper 14/644, University of Bristol, Department of Economics.
Stock, J.H., Yogo, M. 2005. Testing for weak instruments in linear IV regression. In: Andrews, D.W.K., Stock, J.H. (Eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. Cambridge University Press, New York, pp. 80–108.
Yezer, A., Phillips, R., Trost, R. 1994. Bias in Estimates of Discrimination and Default in Mortgage Lending: the Effects of Simultaneity and Self-Selection. Journal of Real Estate Finance and Economics, 9, pp. 197–215.
Appendix
Tab. A1. Descriptive statistics for applicants’ characteristics.
Variable | Full sample (3344 obs.)² | Signed contract (2019 obs.) | Did not sign contract (1325 obs.)
Age³, years | 33.77 (7.56) | 33.93 (7.65) | 33.53 (7.41)
Sex
Male | 1848 (55.3%) | 1151 (57.0%) | 697 (52.6%)
Female | 1496 (44.7%) | 868 (43.0%) | 628 (47.4%)
Marital status
Married | 1793 (53.6%) | 1132 (56.1%) | 661 (49.9%)
Single | 1013 (30.3%) | 587 (29.1%) | 426 (32.2%)
Divorced | 497 (14.9%) | 281 (13.9%) | 216 (16.3%)
Widowed | 41 (1.2%) | 19 (0.9%) | 22 (1.7%)
Category of employment
Hired employee | 3210 (96.0%) | 1923 (95.2%) | 1287 (97.1%)
State-owned employee | 111 (3.3%) | 79 (3.9%) | 32 (2.4%)
Entrepreneur | 23 (0.7%) | 17 (0.8%) | 6 (0.5%)
Level of education
Complete higher | 1756 (52.5%) | 1116 (55.3%) | 640 (48.3%)
Less than higher | 1588 (47.5%) | 903 (44.7%) | 685 (51.7%)
Declared income of main borrower
Not declared | 2333 (69.8%) | 1223 (60.6%) | 1110 (83.8%)
From $0 to $249 | 85 (2.5%) | 47 (2.3%) | 38 (2.9%)
From $250 to $499 | 279 (8.3%) | 237 (11.7%) | 42 (3.2%)
From $500 to $1 000 | 442 (13.2%) | 358 (17.7%) | 84 (6.3%)
More than $1 000 | 205 (6.1%) | 154 (7.6%) | 51 (3.8%)
Number of co-borrowers
0 | 1416 (42.3%) | 823 (40.8%) | 593 (44.8%)
1 | 1794 (53.6%) | 1105 (54.7%) | 689 (52.0%)
2 | 134 (4.0%) | 91 (4.5%) | 43 (3.2%)
Declared income of co-borrowers
Not declared | 2939 (87.9%) | 1677 (83.1%) | 1262 (95.2%)
From $0 to $249 | 105 (3.1%) | 97 (4.8%) | 8 (0.6%)
From $250 to $499 | 157 (4.7%) | 129 (6.4%) | 28 (2.1%)
More than $500 | 143 (4.3%) | 116 (5.7%) | 27 (2.0%)
² Outliers were excluded from the sample. We treat an observation as an outlier if the age, level of education, marital status or other characteristics are missing (119 obs.). We also exclude borrowers under age 21 and observations with LTV or DTI (debt-to-income ratio) below 0 or above 1 (135 obs.). We consider these outliers to be random and attribute them to errors in the database. Finally, we exclude 2.5% of observations with extreme values of the purchased property from each side of its distribution (89 obs.). After excluding the outliers, the sample contains 3344 observations: 2019 applicants signed the mortgage contract, while 1325 did not.
³ Mean, with standard deviation in parentheses.
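The exclusion rules in footnote 2 can be expressed as a short filter. The sketch below is a minimal illustration, not the authors' code. The `Applicant` record and its field names are hypothetical. The rules are applied in the order the footnote lists them, which is an assumption. The 2.5% tails are cut at linearly interpolated quantiles of the property value, which is only one of several possible quantile conventions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Applicant:
    # Hypothetical record layout; field names are illustrative only.
    age: Optional[float]
    education: Optional[str]
    marital_status: Optional[str]
    ltv: float
    dti: float
    property_value: float


def quantile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated quantile of a non-empty, sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def clean_sample(rows: List[Applicant], tail: float = 0.025) -> List[Applicant]:
    # Rule 1: drop records with missing characteristics.
    rows = [r for r in rows
            if r.age is not None and r.education is not None
            and r.marital_status is not None]
    # Rule 2: drop borrowers under 21 and LTV or DTI outside [0, 1].
    rows = [r for r in rows
            if r.age >= 21 and 0 <= r.ltv <= 1 and 0 <= r.dti <= 1]
    if not rows:
        return []
    # Rule 3 (assumed to be applied last): trim `tail` from each side
    # of the property-value distribution.
    values = sorted(r.property_value for r in rows)
    low, high = quantile(values, tail), quantile(values, 1 - tail)
    return [r for r in rows if low <= r.property_value <= high]
```

In the paper the three rules remove 119, 135 and 89 observations respectively. With different rule ordering or quantile conventions, a reimplementation could produce slightly different counts.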
Tab. A2. Descriptive statistics of the issued loans (2019 contracts).
Variable | Mean | St. dev. | Min | Max
Loan amount, $ | 25 068.3 | 12 440.54 | 3 750 | 120 000
Downpayment, $ | 20 130.0 | 13 740.91 | 1 250 | 117 500
Flat value, $ | 45 198.3 | 21 191.38 | 12 500 | 175 000
Monthly payment, $ | 303.5 | 158.8 | 60.2 | 1 766
Loan limit, $ | 25 964.3 | 13 394.9 | 3 750 | 183 750
Loan-to-value ratio (LTV) | 0.57 | 0.16 | 0.11 | 0.94
Maturity, months | 190.4 | 61.5 | 26 | 360
Rate, % | 11.4 | 1.54 | 9.5 | 19
Insurance
Not insured | 1837 (91.0%)
Insured | 182 (9.0%)
Indicator of default | Insured | Not insured | Total
Not defaulted | 153 (84.1%) | 1772 (96.5%) | 1925 (95.3%)
Defaulted | 29 (15.9%) | 65 (3.5%) | 94 (4.7%)
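The default shares in the last panel of Tab. A2 follow directly from the counts. The sketch below recomputes them and adds a pooled two-proportion z statistic. The z test is an illustrative addition and is not part of the original table. As a raw comparison, it also ignores the selection into insurance that the paper models.

```python
import math


def compare_default_rates(d1: int, n1: int, d0: int, n0: int):
    """Default rates in two groups, their ratio, and a pooled two-proportion z statistic."""
    p1, p0 = d1 / n1, d0 / n0
    pooled = (d1 + d0) / (n1 + n0)  # default rate under the null of equal rates
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    return p1, p0, p1 / p0, (p1 - p0) / se


# Counts from Tab. A2: 29 of 182 insured loans and 65 of 1837 uninsured loans defaulted.
p_ins, p_unins, ratio, z = compare_default_rates(29, 182, 65, 1837)
print(f"insured {p_ins:.1%}, not insured {p_unins:.1%}, ratio {ratio:.2f}, z {z:.2f}")
```

The ratio shows that insured loans default roughly 4.5 times as often as uninsured ones in this raw comparison.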
Tab. A3. Aggregated mortgage and housing market characteristics.
Variable | Mean | St. dev. | Min | Max
Volume of mortgages issued in the region, mln $ | 23.0 | 14.1 | 2.9 | 54.8
Number of mortgages issued in the region | 894.4 | 529.2 | 134 | 2112
Mean loan amount, $ | 28 814.2 | 6299.8 | 22 482.7 | 47 705.0
Median maturity, months | 200.79 | 12.81 | 173 | 222.2
Median rate, % | 12.97 | 0.80 | 12 | 14.3
Mean LTV | 0.58 | 0.03 | 0.48 | 0.65
Mean DTI⁴ | 0.35 | 0.01 | 0.33 | 0.37
Mean value per ft², $ | 89.7 | 14.3 | 66.9 | 119.2
Housing affordability coefficient⁵ | 0.287 | 0.055 | 0.215 | 0.389
Number of loans refinanced by AHML | 129.1 | 83.7 | 30 | 310
Number of applications to the bank | 121.4 | 51.9 | 43 | 222
⁴ DTI is the ratio of the monthly payment to monthly income.
⁵ The affordability coefficient is the ratio of the income of the mean household to the value of the mean flat.
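Footnotes 4 and 5, together with the LTV columns in Tab. A2 and Tab. A3, define simple ratios. The sketch below writes them out as functions. The function names are hypothetical. `mean_dti` assumes that the regional figure is an average of loan-level ratios; the paper does not spell out how the aggregation is done.

```python
from typing import Sequence


def dti(monthly_payment: float, monthly_income: float) -> float:
    """Debt-to-income ratio: monthly payment over monthly income (footnote 4)."""
    return monthly_payment / monthly_income


def ltv(loan_amount: float, property_value: float) -> float:
    """Loan-to-value ratio: loan amount over the value of the flat."""
    return loan_amount / property_value


def affordability(mean_household_income: float, mean_flat_value: float) -> float:
    """Income of the mean household over the value of the mean flat (footnote 5)."""
    return mean_household_income / mean_flat_value


def mean_dti(payments: Sequence[float], incomes: Sequence[float]) -> float:
    """Average of loan-level DTI ratios (assumed aggregation rule)."""
    ratios = [dti(p, y) for p, y in zip(payments, incomes)]
    return sum(ratios) / len(ratios)
```

For example, `ltv(25068.3, 45198.3)` evaluated at the Tab. A2 means gives about 0.555. This is slightly below the reported mean LTV of 0.57, because a ratio of means need not equal the mean of ratios.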
Tab. A4. Estimated parameters of the selection equation.
Variable | (1) OLS | (2) Probit
Age of borrower | -0.006 (0.009) | -0.014 (0.025)
Age squared | 0.000 (0.000) | 0.000 (0.000)
Male | 0.028 (0.018) | 0.081 (0.051)
Family status (Married is base level):
Single | -0.029 (0.025) | -0.093 (0.071)
Divorced | -0.042 (0.029) | -0.130 (0.083)
Widowed | -0.130* (0.076) | -0.363* (0.209)
Category of activity (Hired employee is base level):
Entrepreneur | 0.066 (0.099) | 0.165 (0.294)
State employee | 0.140*** (0.045) | 0.393*** (0.131)
Level of education (Complete higher is base level):
Less than higher | -0.071*** (0.017) | -0.197*** (0.047)
Number of co-borrowers (No co-borrowers is base level):
1 co-borrower | 0.001 (0.024) | -0.015 (0.069)
2 co-borrowers | 0.019 (0.048) | 0.055 (0.140)
Declared income of co-borrowers (Not declared is base level):
From 0 to $249 | 0.155*** (0.052) | 0.731*** (0.198)
From $250 to $499 | 0.088** (0.043) | 0.291** (0.135)
More than $500 | 0.073 (0.045) | 0.245* (0.138)
Declared income of main borrower (Not declared is base level):
From 0 to $249 | -0.011 (0.054) | -0.083 (0.151)
From $250 to $499 | 0.265*** (0.034) | 0.798*** (0.107)
From $500 to $999 | 0.232*** (0.027) | 0.656*** (0.080)
More than $1000 | 0.179*** (0.036) | 0.475*** (0.105)
Difference between the number of AHML loans and the number of applications | -0.000*** (0.000) | -0.001*** (0.000)
Constant | 0.646*** (0.161) | 0.295 (0.452)
N | 3344 | 3344
k | 20 | 20
% of correct predictions | 64.8 | 64.7
Test for excluded variable significance | F(1, 3224) = 31.98 | χ²(1) = 32.23
Note: Robust standard errors are in parentheses; significance levels are obtained from t-statistics: * 10%, ** 5%, *** 1%.
k is the number of estimated parameters, N is the number of observations.
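Tab. A4 reports the share of correct predictions for both the linear probability (OLS) and the probit versions of the selection equation. The sketch below shows how such a share is usually computed. The probit probability is Φ(x'β), and an applicant is predicted to sign when the fitted probability reaches a cutoff. The 0.5 cutoff and the helper names are assumptions, since the paper does not state its rule. The covariates and coefficients in the usage lines are made-up values, not estimates from the table.

```python
import math
from typing import Sequence


def probit_prob(index: float) -> float:
    """Standard normal CDF evaluated at the linear index x'b."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def linear_index(x: Sequence[float], beta: Sequence[float], const: float) -> float:
    """Constant plus the dot product of covariates and coefficients."""
    return const + sum(xi * bi for xi, bi in zip(x, beta))


def percent_correct(probs: Sequence[float], outcomes: Sequence[int],
                    cutoff: float = 0.5) -> float:
    """Share (in %) of cases where the predicted class matches the outcome.

    The 0.5 cutoff is an assumption, not taken from the paper.
    """
    hits = sum((p >= cutoff) == bool(y) for p, y in zip(probs, outcomes))
    return 100.0 * hits / len(outcomes)


# Made-up covariates and coefficients, for illustration only.
applicants = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
signed = [1, 0, 0, 1]
probs = [probit_prob(linear_index(x, [0.8, -0.6], 0.1)) for x in applicants]
print(percent_correct(probs, signed))  # prints 75.0
```

For the OLS column, the same `percent_correct` call would be applied to the fitted values of the linear probability model in place of `probs`.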
Tab. A5. Estimated parameters of the loan limit equation.
Variable | (1) | (2) | (3)
Age of borrower | 0.016* (0.010) | 0.015 (0.010) | 0.015 (0.010)
Age squared | -0.000* (0.000) | -0.000* (0.000) | -0.000* (0.000)
Male | -0.012 (0.021) | -0.011 (0.021) | -0.012 (0.021)
Family status (Single is base level):
Married | -0.037 (0.027) | -0.035 (0.027) | -0.034 (0.027)
Divorced | -0.043 (0.033) | -0.040 (0.033) | -0.039 (0.033)
Widowed | -0.065 (0.099) | -0.062 (0.100) | -0.041 (0.101)
Category of activity (Hired employee is base level):
Entrepreneur | 0.066 (0.101) | 0.067 (0.101) | 0.065 (0.101)
State employee | -0.063 (0.053) | -0.058 (0.053) | -0.063 (0.054)
Level of education (Complete higher is base level):
Less than higher | -0.166*** (0.023) | -0.169*** (0.023) | -0.164*** (0.023)
Number of co-borrowers (No co-borrowers is base level):
1 co-borrower | 0.087*** (0.027) | 0.087*** (0.027) | 0.085*** (0.027)
2 co-borrowers | 0.129** (0.052) | 0.134** (0.052) | 0.129** (0.052)
Declared income of co-borrowers (Not declared is base level):
From 0 to $249 | -0.066 (0.058) | -0.068 (0.063) | -0.075 (0.063)
From $250 to $499 | 0.038 (0.047) | 0.037 (0.047) | 0.039 (0.048)
More than $500 | 0.200*** (0.046) | 0.203*** (0.047) | 0.207*** (0.047)
Declared income of main borrower (Not declared is base level):
From 0 to $249 | -0.908*** (0.065) | -0.916*** (0.066) | -0.910*** (0.066)
From $250 to $499 | -0.486*** (0.060) | -0.491*** (0.060) | -0.481*** (0.064)
From $500 to $999 | -0.081 (0.053) | -0.084 (0.053) | -0.073 (0.058)
More than $1000 | 0.365*** (0.051) | 0.360*** (0.052) | 0.370*** (0.055)
Mean DTI | 0.071*** (0.015) | -2.677** (1.231) | 7.878 (8.655)
Mean DTI squared | – | 0.040** (0.018) | -2.190 (2.319)
Mean DTI cubed | – | – | 0.021 (0.022)
Prop. score | -0.201 (0.180) | -0.225 (0.559) | 2.530 (2.661)
Prop. score squared | – | 0.014 (0.388) | -3.988 (3.883)
Prop. score cubed | – | – | 1.862 (1.803)
Constant | 7.550*** (0.510) | 5.155*** (2.333) | -8.291 (9.977)
N | 2019 | 2019 | 2019
k | 21 | 23 | 25
Test for excluded variable significance | F(1, 1998) = 23.5 | F(2, 1998) = 14.3 | F(3, 1998) = 9.5
Note: Robust standard errors are in parentheses; significance levels are obtained from t-statistics: * 10%, ** 5%, *** 1%.
k is the number of estimated parameters, N is the number of observations; "–" marks a term not included in the model.
Model (1) was estimated for Z₁ = 1, Z₂ = 1, model (2) for Z₁ = 2, Z₂ = 2, and model (3) for Z₁ = 3, Z₂ = 3, where Z₁ and Z₂ are the orders of the polynomials in mean DTI and in the propensity score.
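As the rows of Tab. A5 show, the orders Z₁ and Z₂ set how many powers of mean DTI and of the propensity score enter the loan limit equation. The sketch below builds those regressors; the helper names are hypothetical. It also reproduces the parameter counts k = 21, 23 and 25, computed as the 18 individual covariates listed in the table plus the polynomial terms plus the constant. How the propensity score itself is obtained (from the selection equation in Tab. A4) is not shown here.

```python
from typing import List, Sequence


def polynomial_terms(value: float, order: int) -> List[float]:
    """Powers value**1, ..., value**order."""
    return [value ** k for k in range(1, order + 1)]


def augment_regressors(x: Sequence[float], mean_dti: float, prop_score: float,
                       z1: int, z2: int) -> List[float]:
    """Individual covariates followed by the DTI and propensity-score polynomials."""
    return list(x) + polynomial_terms(mean_dti, z1) + polynomial_terms(prop_score, z2)


# 18 individual covariates in Tab. A5 (values irrelevant for counting).
covariates = [0.0] * 18
for order in (1, 2, 3):
    row = augment_regressors(covariates, 0.35, 0.6, order, order)
    print(f"Z1 = Z2 = {order}: k = {len(row) + 1}")  # +1 for the constant
```

Raising both orders together mirrors the table, where each model adds one power of each variable. The excluded-variable F-test then has Z₁ numerator degrees of freedom, as in F(1, ·), F(2, ·) and F(3, ·).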
Tab. A6. Results of the instruments' relevance test.
Equation | I (1) | I (2) | I (3) | II (1) | II (2) | II (3) | III (1) | III (2) | III (3)
LTV | 2.115 | 3.039 | 2.324 | 3.128 | 2.162 | 2.197 | 2.449 | 2.059 | 1.854
Log. of rate | 156.0 | 107.2 | 70.66 | 203.1 | 112.8 | 67.00 | 287.1 | 124.3 | 69.08
Log. of maturity | 2.185 | 2.227 | 1.411 | 4.398 | 2.356 | 1.861 | 3.838 | 2.582 | 1.570
Prob. of insurance | 4.055 | 2.275 | 1.617 | 2.275 | 2.805 | 2.400 | 2.701 | 2.756 | 2.013
10% critical values | 1.414 | 1.352 | 1.300 | 1.414 | 1.352 | 1.300 | 1.414 | 1.352 | 1.300
5% critical values | 1.561 | 1.473 | 1.400 | 1.561 | 1.473 | 1.400 | 1.561 | 1.473 | 1.400
1% critical values | 1.863 | 1.719 | 1.602 | 1.863 | 1.719 | 1.602 | 1.863 | 1.719 | 1.602
Note: Table cells report conditional F-statistics for the excluded instruments; the bottom rows give the critical values.
For each equation, models (I) are estimated with market-level instruments fixed in the month of application,
models (II) with market-level instruments fixed one month before the month of application,
and models (III) with market-level instruments fixed two months before the month of application.
For each equation, model (1) was estimated for M₁ = 1, M₂ = 1, model (2) for M₁ = 2, M₂ = 2, and model (3) for M₁ = 3, M₂ = 3.
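Comparing each conditional F-statistic in Tab. A6 with the critical values of its column can be automated. The sketch below is a minimal version. It treats a statistic as passing a level when it exceeds that level's critical value; this is the usual reading for such tests and is taken here as an assumption. The dictionary and function names are hypothetical. Column i of each group (I, II, III) is matched to model (i), following the note.

```python
from typing import Dict, Optional

# Critical values from Tab. A6, by model specification (1), (2), (3).
CRITICAL: Dict[int, Dict[str, float]] = {
    1: {"10%": 1.414, "5%": 1.561, "1%": 1.863},
    2: {"10%": 1.352, "5%": 1.473, "1%": 1.719},
    3: {"10%": 1.300, "5%": 1.400, "1%": 1.602},
}


def strongest_level(f_stat: float, model: int) -> Optional[str]:
    """Smallest tabulated level whose critical value f_stat exceeds, else None."""
    for level in ("1%", "5%", "10%"):
        if f_stat > CRITICAL[model][level]:
            return level
    return None


# Log. of maturity row: columns I (1)-(3), II (1)-(3), III (1)-(3).
log_maturity = [2.185, 2.227, 1.411, 4.398, 2.356, 1.861, 3.838, 2.582, 1.570]
for i, f in enumerate(log_maturity):
    model = i % 3 + 1  # (1), (2), (3) repeat within each group
    print(f"column {i + 1}: F = {f} -> {strongest_level(f, model)}")
```

Under this reading, every statistic in the log-maturity row exceeds at least the 5% critical value. The two cubic specifications in groups I and III fall between the 5% and 1% values.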