Benford’s Law and Property Appraisals for Private-label Mortgages
1. Introduction
A mathematical property, which has become known as Benford’s Law, was discovered independently by Newcomb (1881) and Benford (1938). Benford’s Law holds that, contrary to intuition, the digits in large sets of positive-valued, naturally occurring numbers that range over many orders of magnitude are not uniformly distributed. Instead, they often (though not always) follow a logarithmic distribution in which numbers beginning with smaller digits appear more frequently than those beginning with larger ones. Because manipulated, unrelated, or fabricated numbers usually do not follow a Benford distribution, Benford’s Law has been used to identify suspicious data in a variety of settings. Financial auditors, for example, routinely check data for compliance with Benford’s Law (Kumar and Bhattacharya, 2007). Although Benford’s Law is not without its critics (e.g. Diekmann and Jann, 2010), a growing body of empirical evidence suggests that fraud may be a possibility when data deviate from the Benford distribution.
This paper reports the first application of Benford’s Law to property appraisals for private-label mortgages. Using ABSNet Loan, which covers the majority of the private mortgage securitization industry, we examine whether the distribution of the first digit in house appraisal values deviates significantly from the Benford distribution. Our empirical results show significant deviations from the Benford distribution in the period leading up to the financial crisis. Also, property appraisal values for large originators’ mortgages do not conform to what Benford’s Law predicts, with the largest deviation for WaMu, which had a $9.9 million settlement for its appraisal fraud with eAppraiseIT. We also identify loan characteristics that are closely related to nonconformity of appraisal data with the Benford distribution. Most notably, ‘exotic’ loans with negative amortization, balloon payment, or interest-only features deviate significantly from the distribution of natural data. First-lien mortgages conform closely to the Benford distribution while second-lien loans do not. Regarding loan purpose, cash-out refinancing and purchase loans conform while refinance, debt consolidation, home improvement, and construction loans do not. Owner-occupied and second-home loans acceptably conform while non-owner/investment loans only marginally conform. Adjustable rate mortgages (ARM) have slightly larger deviations than fixed rate mortgages (FRM), although both types of mortgages at least marginally conform to the Benford distribution. Deviations in house value appraisals are not restricted to Arizona, California, Florida, and Nevada, the four states hit hardest by the subprime mortgage crisis. In most of the states where non-agency mortgages were popular, property appraisals show significant deviations.
The remainder of the paper is organized as follows. Section 2 provides a brief review of Benford’s Law. Section 3 reviews the literature related to Benford’s Law. Section 4 introduces the data and presents the empirical results regarding the conformity or deviation of property appraisals from the Benford distribution in terms of loan vintages, originators, and major loan-level characteristics. Section 5 concludes.
2. Benford’s Law
According to Benford’s Law the expected occurrence or proportion of a given number (a)
as the first digit in a number set (P1a) can be calculated using equation (1).
$$P_{1a} = \log_{10}(a + 1) - \log_{10}(a) \qquad (1)$$
Further, the expected proportion of a given number (a) as the first digit and the number (b)
as the second digit (P1a2b) can be calculated using equation (2).
$$P_{1a2b} = \log_{10}\left(a + \frac{b+1}{10}\right) - \log_{10}\left(a + \frac{b}{10}\right) \qquad (2)$$
Equation (3), which sums equation (2) over all possible values of a for a particular value of b, yields the overall expected proportion for b as the second digit, P2b (proportions are shown in the third column of Exhibit 1).
$$P_{2b} = \sum_{a=1}^{9} \left[ \log_{10}\left(a + \frac{b+1}{10}\right) - \log_{10}\left(a + \frac{b}{10}\right) \right] \qquad (3)$$
The expected proportion of each number in the third and all subsequent digits can be derived similarly. Exhibit 1 shows the proportion of each number in the first through fourth digits as predicted by Benford’s Law. Note that the proportions shown in Exhibit 1 are skewed toward the smallest admissible digit: toward 1 for the first digit (because zero cannot be a first digit) and toward zero for subsequent digits. The skew weakens with each successive position, so that by the fourth digit the proportions are nearly uniform.
Exhibit 1
Expected Proportions Based on Benford’s Law
Number 1st digit 2nd digit 3rd digit 4th digit
0 n/a .11968 .10178 .10018
1 .30103 .11389 .10138 .10014
2 .17609 .10882 .10097 .10010
3 .12494 .10433 .10057 .10006
4 .09691 .10031 .10018 .10002
5 .07918 .09668 .09979 .09998
6 .06695 .09337 .09940 .09994
7 .05799 .09035 .09902 .09990
8 .05115 .08757 .09864 .09986
9 .04576 .08500 .09827 .09982
Source: Nigrini (1996)
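The proportions in Exhibit 1 can be reproduced directly from equations (1) to (3). The following is a minimal Python sketch, not the authors' or Nigrini's code; the function name `benford_digit_probs` is hypothetical. It relies on the identity that equation (2) equals log10(1 + 1/(10a + b)), and for the third and fourth positions it assumes the natural generalization of equation (3): summing that joint probability over every possible prefix of the preceding digits.

```python
import math


def benford_digit_probs(position: int) -> dict[int, float]:
    """Expected proportion of each digit at a given position (1 = leading digit)."""
    if position < 1:
        raise ValueError("position must be at least 1")
    if position == 1:
        # Equation (1): P1a = log10(a + 1) - log10(a), for a = 1..9
        return {a: math.log10(a + 1) - math.log10(a) for a in range(1, 10)}
    # Equation (2) simplifies to log10(1 + 1 / (10a + b)). Summing it over every
    # possible prefix k of the earlier digits gives the marginal proportion of
    # digit d at this position (equation (3) when position == 2, with k = a = 1..9).
    first_prefix, end_prefix = 10 ** (position - 2), 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(first_prefix, end_prefix))
        for d in range(10)
    }


if __name__ == "__main__":
    # Print the four columns of Exhibit 1, rounded to five decimals
    for pos in range(1, 5):
        probs = benford_digit_probs(pos)
        print(pos, {d: round(p, 5) for d, p in probs.items()})
```

Each column sums to one because the logarithms telescope; for example, the first-digit terms collapse to log10(10) = 1.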
3. Literature Review
Newcomb’s mathematical discovery was ignored for nearly six decades until Benford rediscovered it, and for another six decades after its rediscovery, published empirical applications of Benford’s Law were sparse. In recent years, however, empirical studies have mushroomed. A variety of data have been shown to follow Benford’s Law, including, among others, aggregated data reported to American (Nigrini, 1996) and Italian (Mir et al., 2014) taxing agencies, prices in various stock markets (Ley, 1996), and eBay auctions (Giles, 2007).
As mentioned in the introduction, financial auditors routinely check data for compliance with Benford’s Law. For example, McGinty (2014) relates the results of an audit of a national call center. Several hundred call center operators were authorized to issue refunds up to $50 (anything larger required the permission of a supervisor), and each operator had processed more than 10,000 refunds over several years. Auditors decided to check whether the first digit of each operator’s refunds was consistent with Benford’s Law. For most operators no discrepancy was discovered, but for a small group there was a large spike in refunds beginning with 4, indicating that many refunds just below the $50 threshold were being issued. Further investigation revealed that these operators had issued thousands of dollars in fraudulent refunds to themselves, family, and friends.
Deviations from a Benford distribution are not necessarily a result of fraud. McGinty
(2014a) describes another case in which reasonable explanations for incongruities were
discovered. Auditors ran a Benford test on three types of a client’s expense accounts. Two
ended up exactly as predicted by Benford’s Law. For the third, auto and truck expenses, 9s were
overrepresented, and 1s were underrepresented. Further investigation, however, indicated the
discrepancies were not fraudulent as many employees were simply following company policy
which allowed them to expense gas purchases and to combine expenses as long as the combined
amount didn’t exceed $100. The price of a tank of gas had effectively eliminated 1s from the
equation, and combining expenses increased the frequency of 9s.
Because Benford’s Law works best with large data sets, many researchers using Benford’s Law to analyze the private sector use data from an entire industry or group of companies rather than focusing on a particular company, a procedure followed in the present study. Some of these studies report only small irregularities or data that conform to the Benford distribution. Alali and Romero (2013a) conducted tests on a variety of accounts from the financial statements of American banks that failed between October 2000 and February 2012. First, they compared the distribution of the first digit in the accounts to Benford’s theoretical distribution. They also computed what Nigrini (1996) termed the distortion factor model, which measures the difference between the mean of the observed first two digits and the mean expected under Benford’s Law. They report no significant anomalies. Özer and Babacan (2013) examine the first digit in annual off-balance sheet disclosures of Turkish banks over the period 1990-2010 and report significant deviations between the distribution of the reported numbers and Benford’s theoretical distribution for only one year: 1999. Gava and Vitiello (2014) compared the first-digit distribution of the asset accounts of fourteen Brazilian companies over the period 1986 through 2009 to the Benford distribution. The study period contains periods of high and low inflation; they found that data from the low-inflation period fit Benford’s Law better than data from the high-inflation period, and suggest that high inflation increases the possibility of fraud.
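The distortion factor used by Alali and Romero (2013a) is described above only in one line. The sketch below implements just that description (observed mean of the first two digits minus the Benford-expected mean); it is a simplification under stated assumptions, not necessarily Nigrini's exact formulation. The function names are hypothetical, and restricting the test to values of at least 10 is an assumption.

```python
import math


def first_two_digits(x: float) -> int:
    """Leading two significant digits of x as an integer in 10..99 (x must be >= 10)."""
    if x < 10:
        raise ValueError("this sketch restricts first-two-digit tests to values >= 10")
    s = format(x, ".14e")  # scientific notation, e.g. '3.53960000000000e+05'
    return int(s[0] + s[2])


def benford_expected_first_two_mean() -> float:
    # Under Benford's Law the first-two-digit combination n (10..99) has
    # probability log10(1 + 1/n), which is equation (2) with n = 10a + b.
    return sum(n * math.log10(1 + 1 / n) for n in range(10, 100))


def mean_first_two_digit_difference(values) -> float:
    """Observed mean of the first two digits minus the Benford-expected mean."""
    observed = [first_two_digits(v) for v in values if v >= 10]
    return sum(observed) / len(observed) - benford_expected_first_two_mean()
```

A positive result means the data lean toward larger leading digits than Benford's Law predicts; a negative result means they lean toward smaller ones.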
Other researchers report suspicious data. Johnson (2009) used Benford’s Law to analyze
the first digit of quarterly net income and earnings per share data for twenty-four randomly
selected publicly-traded companies for fiscal years 1999 through 2004 to identify firm
characteristics that may be associated with earnings management. Johnson identified several firm
characteristics where earnings management appeared possible because the earnings distributions
were inconsistent with Benford’s Law, including (1) companies with low capitalization (below
$45 billion), (2) companies with higher levels of inside trading (3% and higher), and (3)
companies that have been publicly traded for less than 25 years.
Hsieh and Lin (2013) analyzed the second digit of quarterly net income reported by 8,817 firms in the U.S. marine industry between the 1st quarter of 1980 and the 1st quarter of 2009. Finding significantly more zeros in the second digit than would be expected in a Benford distribution, they conclude that managers in the industry manage earnings by rounding earnings numbers to achieve key reference points.
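A finding of "significantly more zeros" implies a per-digit significance test. The exact test used by Hsieh and Lin is not described here, so the sketch below substitutes the z-statistic commonly applied in digit analysis (the form popularized by Nigrini), in which a 1/(2n) continuity correction is applied only when it is smaller than the absolute difference. The function name and the sample numbers are hypothetical.

```python
import math

# Expected share of 0 as the second digit: equation (3) with b = 0
EXPECTED_SECOND_ZERO = sum(math.log10(a + 1 / 10) - math.log10(a) for a in range(1, 10))


def digit_z_statistic(observed_prop: float, expected_prop: float, n: int) -> float:
    """Z-statistic comparing one digit's observed proportion with its Benford expectation.

    Assumed form: (|p_obs - p_exp| - 1/(2n)) / sqrt(p_exp * (1 - p_exp) / n), where
    the 1/(2n) continuity correction is applied only if it is smaller than |p_obs - p_exp|.
    """
    diff = abs(observed_prop - expected_prop)
    correction = 1 / (2 * n)
    if correction < diff:
        diff -= correction
    std_error = math.sqrt(expected_prop * (1 - expected_prop) / n)
    return diff / std_error


if __name__ == "__main__":
    # Hypothetical sample: 1,000 earnings figures, 15% of which have 0 as the second digit
    z = digit_z_statistic(0.15, EXPECTED_SECOND_ZERO, 1000)
    print(round(z, 2), "exceeds 1.96" if z > 1.96 else "does not exceed 1.96")
```

Comparing the statistic with 1.96 corresponds to a two-sided test at the 5% level.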
Several researchers have used Benford’s Law to scrutinize government entities.
Michalski and Stoltz (2013) analyzed data from 1989 through 2007 using Benford’s Law and
conclude that some countries strategically provide manipulated financial data to economic agents.
They observed non-Benford distributions for the first digits of data issued by groups of countries that are more vulnerable to high capital outflows, that have fixed exchange rate regimes, that have the highest levels of net indebtedness, and that were running current account deficits. In
addition, they report rejection of the Benford distribution for the first digits of the balance of
payments statistics for euro-adopting countries after these countries joined the euro zone.
Johnson and Weggenmann (2013) subjected the first digits in a small set of American state government data to Benford’s Law. The accounts examined for each of the fifty states were: (1) total general revenues of the primary government, (2) total fund balance of the general fund, and (3) total fund balance of governmental funds; all of these are often used as benchmarks in financial analysis. Most authorities (e.g., Durtschi, 2004) agree that Benford’s Law is most effective when applied to large data sets, but in the Johnson and Weggenmann study only three (unidentified) years of data were collected, yielding 150 data points for each account. The authors report distributions in conformity with Benford’s Law for the first two accounts, but nonconformity for the total fund balance of governmental funds. de Freitas Costa et al. (2012) analyzed 134,281 contracts issued by 20 management units in two Brazilian states and discovered significant deviations in the distribution of the 1st and 2nd digits from the distribution predicted by Benford’s Law. The first digit of the contract data contained an excess of the numbers 7 and 8, while 9 and 6 were rare, which the authors assert denoted a tendency to avoid conducting the bidding process. Analysis of the 2nd digit revealed a significant excess of the numbers 0 and 5, which indicated the use of rounding in determining the value of contracts.
4. Data and empirical findings
4.1. ABSNet Loan Data
ABSNet Loan¹ is one of the most popular loan-level data sources, competing with LoanPerformance from First American CoreLogic, Lender Processing Services (LPS, formerly known as McDash Analytics), and BBx data from BlackBox Logic LLC. ABSNet Loan normalizes and provides loan-level information based on non-agency RMBS performance data from various trustees and servicers. ABSNet data contain loan-level underwriting characteristics at the time of origination, and monthly performance and payment information for 22 million mortgages securitized by private institutions. Vintages for mortgages and HELOCs in ABSNet date back to 1950 for loan origination and to 1988 for deal closing; however, we focus on the period from 2002 to 2007, when the data have reasonable coverage of the entire private mortgage securitization industry. As shown in Tables 1 and 2, ABSNet covers 18 million loans and 5,564 deals between 2002 and 2007. We do not consider the post-crisis period after 2007, when the private-label mortgage market was practically frozen.
Table 1. Loan vintages covered in ABSNet Loan
Loan origination year Freq Pct Loan origination year Freq Pct
1950 177 0 1984 11701 0.06
1951 1 0 1985 8134 0.04
1954 8 0 1986 9869 0.05
1955 2 0 1987 15262 0.07
1957 1 0 1988 18942 0.09
1958 4 0 1989 24689 0.12
1959 1 0 1990 15829 0.08
1960 4 0 1991 25986 0.12
1961 4 0 1992 48175 0.23
1962 6 0 1993 69817 0.33
1963 11 0 1994 58503 0.28
1964 6 0 1995 40602 0.19
1965 28 0 1996 80094 0.38
1966 31 0 1997 181093 0.86
1967 39 0 1998 440716 2.09
1968 115 0 1999 513264 2.43
1969 293 0 2000 529794 2.51
1970 201 0 2001 878389 4.17
1971 510 0 2002 1454214 6.9
1972 1055 0.01 2003 2381305 11.29
1973 1129 0.01 2004 3711168 17.6
1974 1031 0 2005 4924732 23.36
1975 1321 0.01 2006 4375701 20.75
1976 6136 0.03 2007 1213662 5.76
1977 3411 0.02 2008 3500 0.02
1978 3725 0.02 2009 387 0
1979 4006 0.02 2010 459 0
1980 3020 0.01 2011 1122 0.01
1981 2516 0.01 2012 4164 0.02
1982 1853 0.01 2013 5157 0.02
1983 5316 0.03
1 See http://www.lewtan.com/products/absnetloan.html.
Table 2. Deal vintages covered in ABSNet Loan
Deal closing year Freq Pct Deal closing year Freq Pct
1988 6 0.09 2002 505 7.41
1989 13 0.19 2003 785 11.52
1990 6 0.09 2004 990 14.53
1991 27 0.4 2005 1254 18.41
1992 26 0.38 2006 1226 18
1993 77 1.13 2007 804 11.8
1994 68 1 2008 26 0.38
1995 30 0.44 2009 2 0.03
1996 69 1.01 2010 1 0.01
1997 72 1.06 2011 2 0.03
1998 166 2.44 2012 6 0.09
1999 157 2.3 2013 11 0.16
2000 151 2.22 2014 4 0.06
2001 329 4.83
To examine whether the distribution of the first digit in property appraisal values deviates from Benford’s Law, we focus on OriginalAppraisedValue in ABSNet Loan, which is defined as “the appraised value of the property at the time of underwriting.” Table 3 shows two sets of summary statistics for OriginalAppraisedValue: one for the entire period and one for the period of interest between 2002 and 2007. To avoid the possibility that our results are driven by differences in distribution between mortgages and HELOCs, we focus strictly on mortgages. ABSNet documents that OriginalAppraisedValue is calculated from LTV and the original loan balance if appraisal prices are not provided by trustees and servicers. Among the 20 million mortgages whose appraisal information is available, we focus on the 17.8 million loans originated between 2002 and 2007, which average $353,960 and range from $195 to $230 million.
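ABSNet's documentation, as cited above, says OriginalAppraisedValue is derived from LTV and the original balance when trustees and servicers do not report an appraisal. The sketch below shows one way such a fallback, together with the sample restriction (mortgages only, originated 2002 through 2007), could be expressed. It assumes LTV is stored as a percentage and that value = balance × 100 / LTV; apart from OriginalAppraisedValue, the record keys (`original_balance`, `ltv`, `product`, `origination_year`) are hypothetical and are not ABSNet's actual schema.

```python
from typing import Optional


def appraised_value(record: dict) -> Optional[float]:
    """Reported appraisal, or balance * 100 / LTV when none is reported (assumed rule)."""
    value = record.get("OriginalAppraisedValue")
    if value is not None:
        return value
    balance = record.get("original_balance")  # hypothetical key
    ltv = record.get("ltv")  # hypothetical key; LTV assumed to be a percentage, e.g. 80.0
    if balance is not None and ltv:  # skip missing or zero LTV
        return balance * 100 / ltv
    return None  # treated as a missing appraisal


def in_study_sample(record: dict) -> bool:
    """Mortgages (not HELOCs) originated from 2002 through 2007; keys are hypothetical."""
    year = record.get("origination_year")
    return record.get("product") == "mortgage" and year is not None and 2002 <= year <= 2007
```

For example, under these assumptions a $280,000 original balance at an 80% LTV implies a $350,000 appraised value.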
Table 3. Summary statistics for original appraisal values in ABSNet Loan
N Obs N Miss Mean Min Max
the entire period 20,781,988 644,136 $339,195 $1 $230,000,000
2002 - 2007 17,839,015 221,767 $353,960 $195 $230,000,000
4.2. Deviations by loan vintage
In this section, we examine whether deviations in property appraisal values from Benford’s Law are related to when mortgages were originated. We first calculate the actual distributions of the first digit in house appraisal prices year by year from 2002 to 2007, the period for which ABSNet Loan has reasonable industry coverage. As shown in Table 4, the number of privately securitized loans originated increases dramatically in the period leading up to the financial crisis, peaking at 4.86 million in 2005. Interestingly, the share of loans whose appraisal information is missing generally increases, from 0.62% in 2003 to 3.81% in 2007, which suggests a possible deterioration in the appraisal process.
Table 4. Actual distribution of the first digit in property appraisal values
Originated in
First digit 2002 2003 2004 2005 2006 2007
1 34.43% 34.59% 33.99% 30.63% 28.16% 25.72%
2 15.84% 17.57% 20.10% 20.90% 21.62% 19.46%
3 8.67% 10.04% 12.77% 14.05% 14.52% 13.03%
4 8.66% 7.83% 8.57% 9.38% 9.69% 9.15%
5 7.92% 7.41% 6.48% 7.39% 7.65% 8.85%
6 7.25% 6.74% 5.40% 5.84% 6.48% 8.84%
7 6.41% 5.88% 4.61% 4.63% 4.82% 6.46%
8 5.94% 5.46% 4.37% 3.97% 3.92% 4.88%
9 4.88% 4.48% 3.71% 3.22% 3.12% 3.61%
N Obs 1,429,814 2,366,622 3,689,843 4,864,008 4,321,249 1,167,479
N missing 24,400 14,683 21,325 60,724 54,452 46,183
missing rate 1.68% 0.62% 0.57% 1.23% 1.24% 3.81%
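Table 4 reports first-digit proportions and a missing rate for each origination year. The reported counts are consistent with missing rate = N missing / (N Obs + N missing); for 2002, 24,400 / (1,429,814 + 24,400) ≈ 1.68%. A minimal sketch of that tabulation follows. It assumes loans arrive as (vintage, appraised value) pairs and that absent or non-positive values count as missing; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def first_digit(x: float) -> int:
    """Leading significant digit of a positive number, via scientific notation."""
    return int(format(abs(x), ".14e")[0])


def distribution_by_group(loans):
    """loans: iterable of (group_key, appraised_value_or_None) pairs.

    Returns {group: (first-digit proportions, n_obs, n_missing, missing_rate)}.
    """
    digits = defaultdict(Counter)
    missing = Counter()
    for group, value in loans:
        if value is None or value <= 0:  # assumption: non-positive values count as missing
            missing[group] += 1
        else:
            digits[group][first_digit(value)] += 1
    result = {}
    for group in set(digits) | set(missing):
        n_obs = sum(digits[group].values())
        n_missing = missing[group]
        props = {d: digits[group][d] / n_obs if n_obs else 0.0 for d in range(1, 10)}
        result[group] = (props, n_obs, n_missing, n_missing / (n_obs + n_missing))
    return result
```

The same function works for any grouping key, such as originator, lien type, or property state, which is how the later tables are organized.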
Table 5 presents how much the proportion of each number in the first digit of actual appraised values deviates from the expected proportion based on Benford’s Law. We measure the deviation using the mean absolute deviation (MAD), which is the average of the absolute differences between the actual and expected proportions over the nine possible first digits. Following Drake and Nigrini (2000), we break MAD values into four ranges to determine the goodness of fit.
MAD: 0.000 to 0.004 (close conformity)
MAD: 0.004 to 0.008 (acceptable conformity)
MAD: 0.008 to 0.012 (marginally acceptable conformity)
MAD: greater than 0.012 (nonconformity)
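The MAD calculation and the Drake and Nigrini (2000) ranges above can be sketched as follows. This is an illustration, not the authors' code; the function names are hypothetical, and because the listed ranges share their endpoints, assigning a boundary value to the lower (stricter) category is an assumption.

```python
import math

# Expected first-digit proportions from equation (1)
BENFORD_FIRST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def deviations(observed: dict[int, float]) -> dict[int, float]:
    """Actual minus expected proportion for each first digit 1..9."""
    return {d: observed.get(d, 0.0) - BENFORD_FIRST[d] for d in range(1, 10)}


def mad(devs: dict[int, float]) -> float:
    """Mean absolute deviation over the nine first digits."""
    return sum(abs(v) for v in devs.values()) / len(devs)


def conformity(mad_value: float) -> str:
    # Drake and Nigrini (2000) first-digit ranges; boundary handling is assumed
    if mad_value <= 0.004:
        return "close conformity"
    if mad_value <= 0.008:
        return "acceptable conformity"
    if mad_value <= 0.012:
        return "marginally acceptable conformity"
    return "nonconformity"


# Example: the 2002 column of Table 4
actual_2002 = {1: 0.3443, 2: 0.1584, 3: 0.0867, 4: 0.0866, 5: 0.0792,
               6: 0.0725, 7: 0.0641, 8: 0.0594, 9: 0.0488}
mad_2002 = mad(deviations(actual_2002))
print(f"{mad_2002:.2%}", conformity(mad_2002))  # prints: 1.47% nonconformity
```

The 1.47% result for 2002 matches the MAD reported in Table 5.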
As shown in Table 5, the distribution of the first digit in loan appraisal values marginally conforms to Benford’s Law only for 2003 and 2005, with MAD values of 1.10% and 1.19%. The year with the largest MAD is 2004, for which the first digits 1 and 2 occur more often than Benford’s Law predicts by 3.89 and 2.49 percentage points. In 2002 and 2003, the actual distribution is skewed toward 1 at the expense of 3, while in the more recent years of 2006 and 2007 the proportion of 1 is smaller than expected and the digits 2 or 6 appear more frequently than Benford’s Law predicts.
Table 5. Deviations in the first significant digit from Benford’s Law
Originated in
First digit 2002 2003 2004 2005 2006 2007
1 4.33% 4.49% 3.89% 0.53% -1.94% -4.38%
2 -1.77% -0.04% 2.49% 3.29% 4.01% 1.85%
3 -3.82% -2.45% 0.28% 1.56% 2.03% 0.54%
4 -1.03% -1.86% -1.12% -0.31% 0.00% -0.54%
5 0.00% -0.51% -1.44% -0.53% -0.27% 0.93%
6 0.55% 0.04% -1.30% -0.86% -0.22% 2.15%
7 0.61% 0.08% -1.19% -1.17% -0.98% 0.66%
8 0.83% 0.35% -0.75% -1.15% -1.20% -0.24%
9 0.30% -0.10% -0.87% -1.36% -1.46% -0.97%
MAD 1.47% 1.10% 1.48% 1.19% 1.34% 1.36%
4.3. Deviations by originators
Mortgage properties are appraised during underwriting, a process over which originators have major control. Therefore, we break down our sample by originator to examine whether appraisal deviations are more severe for a particular group of originators. Table 6 presents, for mortgages closed by different groups of originators, the difference in the proportion of each number in the first digit of property appraisal values between the actual and Benford distributions. The distributions are listed by originator size in terms of market share. Originators are defined to be large if they rank in the top 20 by number of originations. Originators ranked below 20th are categorized as “small originators,” and mortgages whose originator identity is not available are categorized as “unidentified originators.” For all three groups, the actual distributions of the first digit in appraised prices acceptably conform to Benford’s Law; however, MAD is higher for large originators than for small originators by 0.19 percentage points, which suggests the possibility of a poorer appraisal process among large originators.
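The grouping rule above (top 20 originators by number of originations are "large", the rest "small", and loans without an originator "unidentified") can be sketched as below. The function name is hypothetical, and breaking ties at the cut-off by `Counter.most_common` order is an assumption; the text does not say how ties are handled.

```python
from collections import Counter
from typing import Optional


def originator_groups(originators: list[Optional[str]], top_n: int = 20) -> list[str]:
    """Label each loan 'large', 'small', or 'unidentified' by its originator's rank.

    originators holds one entry per loan (None = originator unknown). Rank is by
    number of originations in the sample; ties at the cut-off follow
    Counter.most_common order (an assumption).
    """
    counts = Counter(o for o in originators if o is not None)
    large = {name for name, _ in counts.most_common(top_n)}
    return [
        "unidentified" if o is None else ("large" if o in large else "small")
        for o in originators
    ]
```

The resulting labels can then serve as the grouping key for the MAD and missing-rate calculations.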
Table 6. Difference in proportion of the first digit between actual and expected distributions
First digit Large originators small originators Unidentified originators
1 2.08% 0.15% -0.46%
2 1.06% 1.63% 1.05%
3 -0.31% 0.51% 0.44%
4 -0.92% -0.18% 0.06%
5 -0.63% -0.08% 0.25%
6 -0.18% -0.09% 0.21%
7 -0.36% -0.49% -0.28%
8 -0.21% -0.52% -0.38%
9 -0.53% -0.93% -0.88%
N obs 12,068,607 8,713,381 5,994,009
N missing 338,884 305,252 269,665
Missing rate 2.73% 3.38% 4.31%
MAD 0.70% 0.51% 0.45%
If we break down the aggregate sample for large originators into individual institutions, the hypothesis of poorer appraisals among large originators seems more plausible.
Table 7. Difference in proportion of the first digit between actual and expected distributions
First digit Countrywide RFC Wells Fargo Option One New Century First Franklin IndyMac WaMu Ameriquest Long Beach Fremont
1 -0.98% 8.57% -3.67% 6.55% 3.22% 7.92% -9.23% -6.40% 12.32% 2.01% -1.11%
2 3.28% -1.15% -4.68% 2.68% 4.03% 3.50% 1.05% -7.54% 3.17% 1.99% 6.37%
3 1.39% -2.55% -3.45% -0.51% 2.13% -0.97% 3.53% -5.24% -1.88% 0.82% 3.93%
4 0.27% -2.74% -0.41% -2.77% -0.67% -2.70% 3.34% 0.23% -4.45% -0.62% 1.12%
5 -0.04% -2.06% 2.84% -3.00% -2.16% -2.71% 2.60% 4.80% -4.51% -1.05% -1.06%
6 -0.30% -0.84% 4.16% -1.77% -2.11% -1.97% 1.59% 6.19% -3.46% -0.93% -2.08%
7 -1.05% -0.21% 2.93% -0.84% -1.94% -1.56% -0.17% 4.36% -1.77% -0.95% -2.53%
8 -1.15% 0.51% 1.81% -0.03% -1.31% -0.85% -1.00% 2.73% 0.15% -0.63% -2.41%
9 -1.43% 0.46% 0.49% -0.31% -1.19% -0.66% -1.70% 0.86% 0.42% -0.65% -2.26%
N Obs 2,613,012 2,404,130 872,110 749,190 643,231 558,741 462,769 453,532 443,755 435,476 375,878
N missing 86,172 33 513 9,072 183 2 504 125 1 52 3
missing rate 3.19% 0.00% 0.06% 1.20% 0.03% 0.00% 0.11% 0.03% 0.00% 0.01% 0.00%
MAD 1.10% 2.12% 2.72% 2.05% 2.08% 2.54% 2.69% 4.26% 3.57% 1.07% 2.54%
Conformity Marginally acceptable No No No No No No No No Marginally acceptable No
Table 7. (cont’d) Difference in proportion of the first digit between actual and expected distributions
First digit BOA WEYERHAEUSER Argent Impac Chase Manhattan AHM Centex GMAC First Horizon
1 -2.33% -4.94% 6.94% -0.55% 2.91% -5.41% 7.18% -4.95% -3.24%
2 -6.00% 5.59% 5.52% 5.13% -3.23% 3.89% -3.04% -2.61% 4.06%
3 -4.76% 5.98% 0.95% 3.78% -3.37% 3.79% -4.16% -2.54% 2.29%
4 0.12% 2.18% -2.25% 0.91% -2.37% 1.51% -3.50% 0.30% -0.68%
5 3.70% -0.22% -3.67% -1.16% -0.03% 0.59% -1.62% 3.99% -0.59%
6 4.05% -1.55% -3.73% -1.89% 1.91% -0.06% 0.40% 3.70% 0.35%
7 2.84% -2.34% -2.15% -2.30% 1.85% -0.98% 1.47% 1.85% -0.19%
8 1.83% -2.36% -0.91% -2.02% 1.65% -1.48% 2.01% 0.67% -0.71%
9 0.57% -2.34% -0.70% -1.91% 0.68% -1.86% 1.26% -0.40% -1.31%
N Obs 323,127 331,696 332,917 286,715 286,610 226,962 161,547 49,102 58,107
N missing 21,783 3,576 342 135 119 22,773 3,238 103,956 86,347
missing rate 6.32% 1.07% 0.10% 0.05% 0.04% 9.12% 1.96% 67.92% 59.77%
MAD 2.91% 3.05% 2.98% 2.18% 2.00% 2.17% 2.74% 2.33% 1.49%
Conformity No No No No No No No No No
As shown in Table 7, among the group of large originators, only mortgages closed by Countrywide and Long Beach show marginally acceptable conformity to Benford’s Law, with MADs of 1.10% and 1.07%. If deviations from the Benford distribution are meaningfully related to data manipulation, it is not surprising that the largest MAD, 4.26%, belongs to WaMu, which had a $9.9 million settlement for its appraisal fraud with eAppraiseIT.
The nonconformity in terms of MAD that is prevalent among the largest players, including WaMu, Ameriquest, WEYERHAEUSER, Argent, and BOA, does not seem to be related to a particular business model or loan processing channel. Also, as shown in Figure 1, this nonconformity does not seem to be driven by selective missingness of appraised values with a particular number as the first digit.
Figure 1. MAD and missing rate for house appraisal values by originators
[Figure: vertical axis MAD, 0.00% to 4.50%; horizontal axis missing rate, 0.00% to 80.00%]
Interestingly, for small originators ranked below 20th and for unidentified originators, MAD is considerably smaller, implying acceptable conformity to Benford’s Law. This does not seem to be simply driven by the large number of observations in the groups of loans whose originators are small or unidentified, because Countrywide and Long Beach loans achieve marginally acceptable conformity with far fewer observations.
4.4. Deviations by loan characteristics
In this section, we examine the relation between different loan characteristics and the amount of deviation in appraised values from Benford’s Law. The most interesting finding is that riskier loans are more likely to be associated with poor appraisals, at least in terms of several major mortgage risk characteristics including lien type, negative amortization, balloon payment, interest-only payment, and occupancy type. We do not find evidence that larger deviations are related to particular property locations, loan documentation types, or purpose types.
Table 8. Deviations by lien type
Lien type
First digit 1st 2nd
1 0.49% 6.05%
2 0.74% 5.09%
3 -0.22% 1.32%
4 -0.53% -1.40%
5 -0.03% -2.65%
6 0.32% -2.78%
7 -0.06% -2.49%
8 -0.11% -1.80%
9 -0.60% -1.34%
N Obs 16,325,746 2,863,022
N Missing 197,950 168,719
Missing rate 1.20% 5.57%
MAD 0.34% 2.77%
Table 8 shows how deviations in the first-digit distribution of property appraised values vary depending on lien position. A mortgage is said to be in a first lien position on the collateralized property if the lender is given first priority over all other claims in case of default. If a borrower chooses to use a second mortgage to manage her loan-to-value ratio or to minimize the amount of private mortgage insurance premium, her second mortgage is said to be in the second lien position.
Mortgages in second lien positions are exposed to more default risk because their borrowers tend to have less equity in their houses and are less willing to make scheduled repayments under adverse conditions. As presented in Table 8, second lien loans have more missing values for their property appraised values (a missing rate of 5.57% versus 1.20%), and the distribution of the first digit in their appraised values does not conform to Benford’s Law, with a MAD of 2.77%, while first lien loans show close conformity to the Benford distribution (MAD of 0.34%).
Table 9. Distribution by Negative amortization
Negative Amortization
First digit No Yes
1 2.27% -14.40%
2 1.35% 0.26%
3 -0.30% 5.03%
4 -1.02% 5.41%
5 -0.70% 4.31%
6 -0.30% 2.53%
7 -0.43% 0.12%
8 -0.28% -1.19%
9 -0.61% -2.08%
N Obs 19,270,659 1,167,539
N Missing 371,532 6,872
Missing rate 1.89% 0.59%
MAD 0.81% 3.92%
Mortgages are said to be ‘exotic’ when they include one or more features that help borrowers qualify for mortgages they could not otherwise have obtained. One example of an exotic mortgage is one that allows negative amortization by giving the borrower an option to add deferred interest to the principal balance. Since negative amortization is engineered particularly for borrowers with a short-term inability to make monthly payments, it is reasonable to think that mortgages with negative amortization are exposed to more default risk. Table 9 presents the amount of deviation in appraised values from Benford’s Law depending on whether negative amortization is allowed. As shown in Table 9, riskier loans in terms of negative amortization tend to have more deviation in the first digit of appraised value, with a MAD of 3.92%, while loans without negative amortization marginally conform to Benford’s Law (MAD of 0.81%).
Balloon payment is the second exotic feature whose effect on appraisals we study in this section. A mortgage requires a balloon payment if a lump-sum payment of the remaining outstanding balance is scheduled before the end of the amortization term. As shown in Table 10, balloon loans are associated with a larger deviation in the first-digit distribution of appraised values, with a MAD of 2.54% compared with 0.52% for loans without balloon payments.
Table 10. Deviations by Balloon payment type
Balloon payment
First digit No Yes
1 1.56% -1.63%
2 0.76% 6.63%
3 -0.36% 3.98%
4 -0.75% 0.81%
5 -0.34% -1.01%
6 0.04% -1.90%
7 -0.21% -2.44%
8 -0.15% -2.28%
9 -0.55% -2.17%
N Obs 18,893,408 1,888,580
N Missing 599,100 45,036
Missing rate 3.07% 2.33%
MAD 0.52% 2.54%
The last exotic feature we examine is the interest-only (IO) payment option. Table 11 shows the difference in the amount of deviation between non-IO and IO loans. As expected from the results for other exotic features, IO loans do not conform to Benford’s Law, with a MAD of 2.57%, while non-IO loans show marginally acceptable conformity with a MAD of 0.84%.
Table 11. Deviations by interest-only type
Interest Only
First digit No Yes
1 2.94% -7.33%
2 0.84% 3.45%
3 -0.69% 3.64%
4 -1.12% 2.01%
5 -0.78% 1.62%
6 -0.32% 0.86%
7 -0.34% -0.70%
8 -0.10% -1.51%
9 -0.43% -2.03%
N Obs 16950671 3337174
N Missing 600904 24650
Missing rate 3.42% 0.73%
MAD 0.84% 2.57%
Mortgages can be categorized into three groups based on occupancy type. A property is said to be owner-occupied if it is used as a primary residence. A house can also be used as a second home. If a house is purchased as an investment, the property is non-owner-occupied. As shown in Table 12, non-owner-occupied loans show weaker conformity (marginally acceptable, with a MAD of 1.11%) than owner-occupied loans (0.55%) and second-home loans (0.59%).
Table 12. Deviations by occupancy type
Occupancy type
First digit Non-owner Owner-occupied Second Home
1 3.92% 0.91% 0.01%
2 0.64% 1.30% 1.79%
3 -1.52% 0.25% -0.58%
4 -2.06% -0.40% -1.30%
5 -1.10% -0.29% -0.21%
6 0.05% -0.14% 0.38%
7 0.14% -0.47% 0.29%
8 0.25% -0.41% 0.18%
9 -0.32% -0.75% -0.55%
N Obs 1,831,377 16,877,757 621,999
N Missing 19,398 242,756 5,417
Missing rate 1.05% 1.42% 0.86%
MAD 1.11% 0.55% 0.59%
Borrowers can use either fixed rate mortgages (FRM) or adjustable rate mortgages (ARM). As shown in Table 13, FRMs show acceptable conformity to the Benford distribution with a MAD of 0.49%, while ARMs only marginally conform with a MAD of 0.87%.
Table 13. Deviations by interest rate type
Interest rate type
First digit FRM ARM
1 1.86% 0.73%
2 0.04% 2.44%
3 -0.76% 0.76%
4 -0.81% -0.42%
5 -0.26% -0.53%
6 0.23% -0.47%
7 0.01% -0.80%
8 0.06% -0.71%
9 -0.37% -1.00%
N Obs 9930452 10851536
N Missing 546495 97641
Missing rate 5.22% 0.89%
MAD 0.49% 0.87%
Table 14 presents the deviations of the first digit in property appraisal prices from Benford’s Law for 20 states where private-label mortgages were particularly popular during the period leading up to the crisis. Regardless of property location, MADs are all higher than 1.2%, the threshold for nonconformity. Ohio shows the highest MAD of 5.66%, while Virginia shows the smallest deviation with a MAD of 1.48%. Nevada, California, Arizona, and Florida, the states that suffered most in the subprime crisis, do not stand out in terms of the size of their deviations.
Table 14. Deviations by property states
First digit CA FL TX IL NY AZ GA VA MI
1 -15.70% 7.83% 17.06% 6.05% -13.20% 8.20% 20.60% -4.25% 15.88%
2 -2.17% 8.21% -5.31% 6.57% -4.26% 11.52% 0.14% 0.43% -3.85%
3 5.17% -0.86% -6.81% -0.49% 5.64% -1.11% -4.65% 2.38% -6.73%
4 5.99% -3.64% -5.21% -3.05% 6.94% -4.07% -4.93% 1.95% -5.60%
5 5.12% -3.61% -3.09% -2.79% 3.86% -3.94% -3.93% 1.27% -3.22%
6 3.22% -2.81% -0.96% -2.00% 2.38% -3.44% -2.87% 0.64% -0.71%
7 0.77% -2.21% 0.41% -1.71% 0.38% -2.96% -2.19% -0.42% 0.87%
8 -0.68% -1.54% 1.81% -1.26% -0.51% -2.36% -1.34% -0.78% 1.92%
9 -1.72% -1.38% 2.10% -1.31% -1.23% -1.85% -0.83% -1.22% 1.43%
N Obs 5,002,396 1,901,792 1,114,388 795,898 783,259 724,914 637,270 593,493 593,548
N Missing 140,554 38,897 30,660 20,849 18,492 20,530 19,793 21,049 19,245
Missing rate 2.73% 2.00% 2.68% 2.55% 2.31% 2.75% 3.01% 3.43% 3.14%
MAD 4.50% 3.57% 4.75% 2.80% 4.27% 4.38% 4.61% 1.48% 4.47%
First digit MD NJ WA OH CO PA MA NV NC MN MO
1 -6.83% -9.32% 0.48% 14.47% 6.13% 7.69% -11.33% -6.56% 16.86% 8.36% 14.40%
2 6.52% 4.86% 11.47% -7.58% 12.21% -2.52% 5.67% 15.06% -4.05% 13.79% -6.72%
3 5.93% 7.62% 2.25% -8.52% -0.80% -4.54% 9.70% 7.48% -6.45% -3.00% -7.94%
4 0.87% 3.52% -2.01% -6.13% -3.26% -3.38% 3.31% -0.62% -5.45% -4.38% -5.51%
5 -0.30% 0.31% -2.49% -3.21% -3.24% -1.23% 0.77% -2.51% -3.29% -3.83% -2.37%
6 -0.64% -0.91% -2.43% -0.03% -2.93% 0.48% -1.09% -3.08% -1.12% -3.18% 0.67%
7 -1.57% -1.83% -2.60% 2.60% -2.93% 1.06% -2.14% -3.42% 0.39% -2.96% 2.15%
8 -1.82% -2.01% -2.37% 4.68% -2.66% 1.53% -2.31% -3.26% 1.58% -2.48% 3.04%
9 -2.14% -2.22% -2.29% 3.72% -2.53% 0.91% -2.57% -3.10% 1.53% -2.31% 2.27%
N Obs 576,836 575,909 525,652 514,907 503,789 499,255 420,328 423,628 394,860 311,317 296,892
N Missing 18,495 16,115 17,232 13,380 15,589 14,935 13,395 9,619 13,141 6,426 8,459
Missing rate 3.11% 2.72% 3.17% 2.53% 3.00% 2.90% 3.09% 2.22% 3.22% 2.02% 2.77%
MAD 2.96% 3.62% 3.15% 5.66% 4.08% 2.59% 4.32% 5.01% 4.53% 4.92% 5.01%
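As a worked check on how the MAD row relates to the per-digit deviations, the short Python sketch below recomputes the MAD for three states from the deviations printed in Table 14. It assumes that MAD is the mean of the nine absolute deviations (observed share minus Benford share); that assumption reproduces the published CA, VA, and OH figures. The dictionary and function names are illustrative only.

```python
# Per-digit deviations (observed share minus Benford share, in percentage
# points) for first digits 1-9, copied from Table 14.
TABLE14_DEVIATIONS = {
    "CA": [-15.70, -2.17, 5.17, 5.99, 5.12, 3.22, 0.77, -0.68, -1.72],
    "VA": [-4.25, 0.43, 2.38, 1.95, 1.27, 0.64, -0.42, -0.78, -1.22],
    "OH": [14.47, -7.58, -8.52, -6.13, -3.21, -0.03, 2.60, 4.68, 3.72],
}


def mad(deviations):
    """Mean absolute deviation (percent) across the nine first digits."""
    return sum(abs(d) for d in deviations) / len(deviations)


for state, devs in TABLE14_DEVIATIONS.items():
    # Observed and Benford shares each sum to 100%, so the deviations
    # should net to roughly zero (up to rounding in the published table).
    net = abs(sum(devs))
    print(f"{state}: MAD = {mad(devs):.2f}%, net deviation = {net:.2f}")
```

Running it prints MADs of 4.50%, 1.48%, and 5.66%, matching the MAD row of Table 14.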
There is also little evidence that loan documentation type is associated with poor
appraisal practices. As shown in Table 15, whether borrowers’ income, assets, and employment
status are fully, partially, or never verified, the distributions of the first digit in appraised
values do not conform to Benford’s Law, and they deviate to a similar degree (MADs of 1.39%,
1.51%, and 1.37%, respectively).
Table 15. Deviations by loan documentation type
Documentation type
First digit Full Low No Unknown
1 5.55% -4.01% -4.55% 2.42%
2 0.47% 2.02% 0.52% 1.54%
3 -1.88% 2.23% 0.43% 0.28%
4 -2.15% 1.26% 0.80% -0.77%
5 -1.36% 0.85% 1.96% -1.06%
6 -0.55% 0.45% 2.26% -0.75%
7 -0.23% -0.50% 0.21% -0.69%
8 0.21% -0.90% -0.45% -0.37%
9 -0.07% -1.38% -1.16% -0.61%
N Obs 9,213,483 7,603,108 429,422 2,074,158
N Missing 137,342 75,184 8,594 355,151
Missing rate 1.47% 0.98% 1.96% 14.62%
MAD 1.39% 1.51% 1.37% 0.94%
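The “Missing rate” rows are not defined in this section, but the published figures in Tables 14 through 16 are consistent with N Missing / (N Obs + N Missing), that is, loans without a usable appraisal value as a share of all loans in the group. The Python sketch below is a reconstruction under that assumption, not the authors’ code; it reproduces the Table 15 rates from the counts.

```python
# (N Obs, N Missing) by documentation type, copied from Table 15.
TABLE15_COUNTS = {
    "Full": (9_213_483, 137_342),
    "Low": (7_603_108, 75_184),
    "No": (429_422, 8_594),
    "Unknown": (2_074_158, 355_151),
}


def missing_rate(n_obs: int, n_missing: int) -> float:
    """Percent of loans in a group whose appraisal value is missing.

    Assumption: the denominator is all loans in the group (observed plus
    missing), which matches the rates printed in Tables 14-16.
    """
    return 100.0 * n_missing / (n_obs + n_missing)


for doc_type, (n_obs, n_missing) in TABLE15_COUNTS.items():
    print(f"{doc_type}: {missing_rate(n_obs, n_missing):.2f}%")
```

Dividing by N Obs alone would give 1.49% for full-documentation loans rather than the published 1.47%, which is why the sketch uses the combined denominator.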
Conformity to Benford’s distribution does vary with loan purpose. Table 16 shows that
appraisals prepared to support a mortgage for a house purchase, cash-out refinancing, or
refinancing show acceptable conformity to Benford’s Law, with MADs ranging from 0.75% to
1.17%, whereas appraisals prepared to support loans for debt consolidation, home improvement, or
construction do not conform (MADs of 6.69%, 2.52%, and 1.41%, respectively).
Table 16. Deviations by loan purpose type
First digit Cash-out refi Debt consolidation Home improvement Construction Purchase Refinance
1 0.34% 25.25% 8.42% 2.19% 1.46% 0.59%
2 2.59% -5.36% -2.91% -3.01% 1.59% -2.19%
3 0.93% -8.51% -3.51% -2.06% 0.31% -2.25%
4 -0.50% -7.22% -3.01% -1.27% -0.45% -0.82%
5 -0.80% -5.39% -1.63% 0.13% -0.32% 0.95%
6 -0.54% -3.03% -0.28% 0.91% -0.26% 1.65%
7 -0.67% -0.59% 0.61% 0.88% -0.68% 1.15%
8 -0.48% 1.78% 1.26% 1.48% -0.68% 0.84%
9 -0.85% 3.07% 1.05% 0.76% -0.97% 0.09%
N Obs 7,387,335 341,593 25,805 24,029 8,600,425 2,849,287
N Missing 110,243 32,145 6,366 811 86,122 26,827
Missing rate 1.47% 8.60% 19.79% 3.26% 0.99% 0.93%
MAD 0.85% 6.69% 2.52% 1.41% 0.75% 1.17%
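The loan-purpose comparison reduces to checking each group’s MAD against the 1.2% cutoff the paper uses for non-conformity. The Python sketch below does that for the Table 16 values; treating a MAD at or below 1.2% as conforming is an assumption drawn from the text, finer conformity bands are not modelled, and the names are hypothetical.

```python
NONCONFORMITY_THRESHOLD = 1.2  # percent; the cutoff cited in the text

# MAD (percent) by loan purpose, copied from Table 16.
TABLE16_MAD = {
    "Cash-out refi": 0.85,
    "Debt consolidation": 6.69,
    "Home improvement": 2.52,
    "Construction": 1.41,
    "Purchase": 0.75,
    "Refinance": 1.17,
}


def conforms(mad_pct: float, threshold: float = NONCONFORMITY_THRESHOLD) -> bool:
    """True if the MAD (in percent) does not exceed the threshold."""
    return mad_pct <= threshold


# List groups from closest to furthest from Benford's distribution.
for purpose, mad_pct in sorted(TABLE16_MAD.items(), key=lambda kv: kv[1]):
    status = "conforms" if conforms(mad_pct) else "does not conform"
    print(f"{purpose:<20} MAD {mad_pct:5.2f}%  {status}")
```

With these values, purchase (0.75%), cash-out refinancing (0.85%), and refinance (1.17%) fall at or below the cutoff, while construction, home improvement, and debt consolidation do not.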
5. Conclusion
This paper reports the results of an investigation of whether real property appraisals for
private-label mortgages conform to Benford’s Law. Using data covering the majority of the
private mortgage securitization industry for the six-year period leading up to the recent financial
crisis, we calculate the distribution of the first digit in house appraisal values and compare it
with Benford’s distribution. Significant deviations were discovered. Property appraisal values
for large originators’ mortgages did not conform to what Benford’s Law predicts, with the largest
deviation found for loans originated by WaMu, which, coincidentally, had a $9.9 million
settlement for its appraisal fraud with eAppraiseIT. We also identify loan characteristics that are
associated with nonconformity of appraised values with Benford’s distribution. Exotic loan
features, including payment terms that result in negative amortization, balloon payments, and
interest-only payments, are significantly associated with greater deviation from the distribution of
loans do not. Regarding loan purpose, cash-out refinancing and purchase loans conform while
25
refinance, debt consolidation, home improvement, and construction loans do not conform.
Owner occupied and second home loans were found to be in acceptable conformance while non-
owner/investment loans only marginally conformed.
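The procedure summarized above (take the first digit of each appraised value, tabulate its distribution, and compare it with Benford’s expected shares, log10(1 + 1/d) for d = 1, …, 9) can be sketched in Python as follows. This is a minimal illustration, not the authors’ code: the function names are hypothetical, the inputs are synthetic rather than ABSNet Loan data, and skipping missing, zero, negative, or non-finite values is an assumption.

```python
import math
from collections import Counter
from decimal import Decimal

# Benford's expected share of first digit d is log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value) -> int:
    """Leading digit of a positive number, computed exactly via Decimal."""
    # Decimal(x) converts an int or float exactly; its coefficient digits
    # carry no leading zeros, so digits[0] is the most significant digit.
    return Decimal(value).as_tuple().digits[0]


def benford_first_digit_test(values):
    """Return ({digit: deviation in percentage points}, MAD in percent).

    Deviation = observed share minus Benford share, as in Tables 14-16.
    Missing, zero, negative, and non-finite values are skipped (assumption).
    """
    digits = [
        first_digit(v)
        for v in values
        if v is not None and math.isfinite(v) and v > 0
    ]
    if not digits:
        raise ValueError("no positive values to test")
    counts = Counter(digits)
    n = len(digits)
    deviations = {d: 100 * (counts[d] / n - BENFORD[d]) for d in range(1, 10)}
    mad = sum(abs(x) for x in deviations.values()) / 9
    return deviations, mad


# Hypothetical inputs: log-uniform values spanning four orders of magnitude
# (close to Benford) versus values spread evenly over 100,000-999,990
# (each first digit equally likely).
log_uniform = [10 ** (2 + 4 * i / 40_000) for i in range(40_000)]
evenly_spread = range(100_000, 1_000_000, 10)

print(f"log-uniform MAD:   {benford_first_digit_test(log_uniform)[1]:.2f}%")
print(f"evenly spread MAD: {benford_first_digit_test(evenly_spread)[1]:.2f}%")
```

The log-uniform sample gives a MAD close to zero, while the evenly spread sample, whose first digits are uniform, gives about 5.97%, well above the 1.2% threshold. Using `Decimal` rather than floating-point logarithms avoids misreading the leading digit of values that sit just below a power of ten.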
The results presented here suggest that appraised values for certain loan types may have
been subject to manipulation, in that the distribution of their first digit does not conform to
Benford’s Law. Nonconformity, however, does not prove that the data have been manipulated. We
therefore consider this study preliminary and plan next to search for reasons that may explain
the deviations reported here.
References
Alali, F. and S. Romero. Characteristics of Failed U.S. Commercial Banks: An Exploratory
Study, Accounting & Finance, 2013a, 53:4, 1149-1174.
Alali, F.A. and S. Romero. Benford’s Law: Analyzing a Decade of Financial Data, Journal of
Emerging Technologies in Accounting, 2013b, 10, 1-39.
Amiram, D., Z. Bozanic and E. Rouen. Financial Statement Irregularities: Evidence from the
Distributional Properties of Financial Statement Numbers (May 8, 2014). Columbia
Business School Research Paper No. 14-9.
Benford, F. The Law of Anomalous Numbers, Proceedings of the American Philosophical Society,
1938, 78, 551–572.
da Silva, C.G. Gomes and P.M.R. Carreira. Selecting Audit Samples Using Benford's Law,
Auditing: A Journal of Practice & Theory, 2013, 32:2, 53-65.
de Freitas Costa, J.I., J. dos Santos and S.K. de Melo Travassos. An Analysis of Federal Entities'
Compliance with Public Spending: Applying the Newcomb-Benford Law to the 1st and
2nd Digits of Spending in Two Brazilian States, Revista Contabilidade & Finanças - USP,
2012, 23:60, 187-198.
de Marchi, S. and J.T. Hamilton. Assessing the Accuracy of Self-Reported Data: an Evaluation
of the Toxics Release Inventory, Journal of Risk and Uncertainty, 2006, 32:1, 57-76.
Diekmann, A. and B. Jann. Benford's Law and Fraud Detection: Facts and Legends, German
Economic Review, 2010, 11:3, 397-401.
Durtschi, C., W. Hillison and C. Pacini. The Effective Use of Benford’s Law to Assist in
Detecting Fraud in Accounting Data, Journal of Forensic Accounting, 2004, V, 17-34.
Gava, A.M. and L. Vitiello. Inflation, Quarterly Balance Sheets and the Possibility of Fraud:
Benford's Law and the Brazilian case, Journal of Accounting, Business and Management,
2014, 21:1, 43-52.
Geyer, C.L. and P.P. Williamson. Detecting Fraud in Data Sets Using Benford's Law,
Communications in Statistics: Simulation & Computation, 2004, 33:1, 229-246.
Giles, D.A.E. Benford's Law and Naturally Occurring Prices in Certain eBay Auctions, Applied
Economics Letters, 2007, 14:3, 157-161.
Hsieh, C.H. and F. Lin. Applying Digital Analysis to Detect Fraud: An Empirical Analysis of
US Marine Industry, Applied Economics, 2013, 45:1, 135-140.
Jackson, S.B. and M.K. Pitman. Auditors and Earnings Management, CPA Journal, 2001, 71:7,
39-44.
Johnson, G. Using Benford's Law to Determine if Selected Company Characteristics are Red
Flags for Earnings Management, Journal of Forensic Studies in Accounting & Business,
2009, 1:2, 39-65.
Johnson, G. and J. Weggenmann. Exploratory Research Applying Benford's Law to Selected
Balances in the Financial Statements of State Governments, Academy of Accounting and
Financial Studies Journal, 2013, 17:3, 31-44.
Kumar, K. and S. Bhattacharya. Detecting the Dubious Digits: Benford’s Law in Forensic
Accounting, Significance, 2007, 4:2, 81-83.
Ley, E. On the Peculiar Distribution of the U.S. Stock Indexes’ Digits, The American
Statistician, 1996, 50:4, 311–313.
McGinty, J.C. Accountants Increasingly Use Data Analysis to Catch Fraud: Auditors Wield
Mathematical Weapons to Detect Cheating, WSJ.com, 2014a. Retrieved on 1/6/2015 from
http://www.wsj.com/articles/accountants-increasingly-use-data-analysis-to-catch-fraud-1417804886
McGinty, J.C. When Using Math to Catch Crooks, You Can’t Jump to Conclusions, WSJ.com,
2014b. Retrieved on 1/6/2015 from
http://blogs.wsj.com/numbers/when-using-math-to-catch-crooks-you-cant-jump-to-conclusions-1870/
Michalski, T. and G. Stoltz. Do Countries Falsify Economic Data Strategically? Some
Evidence That They Might, The Review of Economics and Statistics, 2013, 95:2, 591-616.
Mir, T., M. Ausloos and R. Cerqueti. Benford's law predicted digit distribution of aggregated
income taxes: the surprising conformity of Italian cities and regions, European Physical
Journal B -- Condensed Matter, 2014, 87:11, 1-8.
Newcomb, S. Note on the Frequency of Use of the Different Digits in Natural Numbers,
American Journal of Mathematics, 1881, 4, 39–40.
Nigrini, M. J. Taxpayer Compliance Application of Benford’s Law, Journal of the American
Taxation Association, 1996, 18:1, 72-92.
Nigrini, M. J. 1999. Adding value with digital
Nigrini, M.J. and S.J. Miller. Data Diagnostics Using Second-Order Tests of Benford’s Law,
Auditing: A Journal of Practice & Theory, 2009, 28:2, 305-324.
Özer, G. and B. Babacan. Benford's Law and Digital Analysis: Application on Turkish Banking
Sector, Business and Economics Research Journal, 2013, 4:1, 29-41.
Rodriguez, R.J. Reducing False Alarms in the Detection of Human Influence on Data, Journal
of Accounting, Auditing & Finance, 2004, 19:2, 141-158.
Rose, A.M. and J.M. Rose. Turn Excel into a Financial Sleuth, Journal of Accountancy, 2003,
196:2, 58-60.