harnessing statistical power with e(mad) in benford's law research and practice

Walk the Line: Harnessing Power with E(MAD) in Benford’s Law Research and Practice*

*formerly, Are Sarbanes-Oxley and Dodd-Frank Working? Non-Parametric Exploration of Benford’s Law and Financial Statements 1970-2013

Bradley J. Barney, Kennesaw State University

Kurt S. Schulzke, Kennesaw State University

New Abstract - Three primary objectives

• Describe with greater precision than extant literature the mathematical relationship between the mean absolute deviation (MAD) from Benford’s Law frequencies and N

• Propose an alternative MAD-based statistic, Excess MAD, that explicitly adjusts for sample size to provide an estimator of deviation from Benford's Law frequencies

• Compare the Benford’s Law outcomes obtained with Excess MAD and with MAD, applying non-parametric generalized additive modelling to four decades of public company financial statement numbers

• Collateral finding: Evidence that Benford’s Law conformity did not noticeably increase after implementation of the Sarbanes-Oxley Act (2003) or the Dodd-Frank Act (2010)

Benford’s Law Overview

• Benford’s Law prescribes the frequency with which each leading digit(s) should occur in certain data sets.

• Examples:

• Pr(first digit=k) = log10(1+1/k) for k=1, 2, …, 9

• Pr(first two digits=k) =log10(1+1/k) for k=10, 11, …, 99

See, e.g., Nigrini (1996); Durtschi et al (2004); Nigrini (2011, Chapter 5); Fewster (2009)
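
The two formulas above can be evaluated directly. The following is a minimal sketch; the function name `benford_prob` is an illustrative choice, not something from the cited literature.

```python
import math

def benford_prob(k: int) -> float:
    """Benford probability that the leading digit(s) equal k."""
    return math.log10(1 + 1 / k)

# First-digit probabilities (k = 1..9) and first-two-digit probabilities (k = 10..99).
first_digit = {k: benford_prob(k) for k in range(1, 10)}
first_two = {k: benford_prob(k) for k in range(10, 100)}

for k, p in first_digit.items():
    print(k, round(p, 4))
```

Each set of probabilities sums to 1, because the sum of log10((k+1)/k) telescopes to log10(10) over k = 1..9 and to log10(100/10) over k = 10..99.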

Benford’s Law First Digit(s) Frequencies

Literature Review

• Nigrini (1996, 2005, 2011, 2012) Basics of BL, including N-sensitivity of MAD and Chi-Sq

• Durtschi, Hill & Pacini (2004) Basics plus caveats; link to conditional probability & Bayes’ rule

• Johnson & Ireland (2007) Used Z-stats, Chi-square, and MAD scores (without predetermined critical values) to test, individually and jointly, the first, second, and third digits of 22 P&L accounts, 1998 to 2003. Most P&L accounts diverged significantly from BL, most notably rental income (N=4,294) and loss provisions (N=4,756); upward revenue manipulation appears more frequently than downward expense manipulation; digits to the left (e.g., first digit) less than those to the right (e.g., third digit).

• Alali & Romero (2013) Broke 2001-2010 into six sub-periods. Abandoned Chi-square for MAD after reviewer comments questioned excess power but assumed MAD is sample-size insensitive. Total N = 24,453; subsets as small as 10 and 42.

• Dichev et al (2013) Not a BL study. Survey of CFOs suggests F/S manipulation base-rate of at least 20%

• Amiram et al. (2015) BL Conformity expected to drop in presence of earnings management

Literature Review – Johnson & Ireland (2007)

Correlations -0.68 to -0.76 (for all Ns); -0.64 and -0.78 (N > 5000). E.g., Table 3, N > 5000 and All N:

Assessing Conformity with BL

• Pearson’s chi-square test: popular for significance testing

$$\chi^2 = \sum_{k} \frac{(\mathrm{Obs}_k - \mathrm{Exp}_k)^2}{\mathrm{Exp}_k},$$

where $\mathrm{Obs}_k$ ($\mathrm{Exp}_k$) are the observed (expected) frequencies of leading digit(s) equaling k under BL

• Mean Absolute Deviation (MAD): popular for estimating effect size

$$\mathrm{MAD} = \frac{1}{90} \sum_{k=10}^{99} \frac{\lvert \mathrm{Obs}_k - \mathrm{Exp}_k \rvert}{n}$$

Gives the two-digit MAD
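
A minimal sketch of both statistics for the first-two-digit test, following the definitions above, with Obs_k and Exp_k taken as counts and n as the number of nonzero values. The helper names `first_two_digits` and `two_digit_stats` are hypothetical, and the digit extraction through scientific-notation formatting is one convenient assumption among several possible approaches.

```python
import math
from collections import Counter

def first_two_digits(x: float) -> int:
    """First two significant digits of a nonzero finite number, as an int in 10..99."""
    mantissa = f"{abs(x):.15e}"  # e.g. '1.234500000000000e+03'
    return int(mantissa[0] + mantissa[2])

def two_digit_stats(values):
    """Return (chi_square, mad) for the first-two-digit test against Benford's Law."""
    digits = [first_two_digits(v) for v in values if v != 0]
    n = len(digits)
    obs = Counter(digits)
    chi_square = 0.0
    abs_dev_sum = 0.0
    for k in range(10, 100):
        exp_k = n * math.log10(1 + 1 / k)  # expected count under BL
        chi_square += (obs[k] - exp_k) ** 2 / exp_k
        abs_dev_sum += abs(obs[k] - exp_k) / n
    return chi_square, abs_dev_sum / 90

# Toy input (made-up numbers, all with leading digits "10") to exercise the code.
chi_square, mad = two_digit_stats([10.5, 1020, 0.0107, 109, 10])
print(chi_square, mad)
```

The toy input is deliberately extreme; real use would pass thousands of financial statement amounts.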

Dependence of Statistics on N

• If Benford’s Law does not hold exactly (if actual data diverge from BL), the chi-square test statistic tends to increase with N.

• If it does hold, the statistic’s expected value converges to its degrees of freedom (df).

• MAD tends to converge to the population’s true deviation as N grows, rather than diverge, whether or not the population conforms precisely to Benford’s Law.

Dependence of MAD on N

• Benford’s Law is a model, not necessarily the truth.

• If the truth were known and MAD calculation used true digit probabilities in comparison with BL-implied probabilities, MAD would not depend on N.

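• Because the true probabilities are not known and must be estimated from the sample, MAD is biased upward when Benford’s Law holds: sampling noise alone keeps the observed deviations above zero, and more so for small N.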

• Literature cautions against using MAD if N is not sufficiently large.

• Nigrini suggests various possible minimum Ns for first-two digits.
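
The upward bias can be illustrated by simulation. The sketch below draws samples whose first two digits follow BL exactly and reports the average two-digit MAD at several N, next to the approximation 1/sqrt(159*N) that these slides give for the expected two-digit MAD under BL. The sample sizes, replication count, and seed are arbitrary illustrative choices.

```python
import math
import random

def simulate_mad(n, rng):
    """Two-digit MAD for one sample of n values whose leading digits follow BL."""
    counts = [0] * 90
    for _ in range(n):
        # 10**(1+U) with U ~ Uniform[0,1) lies in [10, 100) and its integer
        # part equals k with probability log10(1 + 1/k).
        k = int(10 ** (1 + rng.random()))
        counts[min(k, 99) - 10] += 1  # min() guards against float round-up to 100
    return sum(abs(counts[k - 10] / n - math.log10(1 + 1 / k))
               for k in range(10, 100)) / 90

rng = random.Random(2024)
reps = 40
mean_mad = {}
for n in (250, 1000, 4000):
    mean_mad[n] = sum(simulate_mad(n, rng) for _ in range(reps)) / reps
    # Columns: N, simulated mean MAD, approximation 1/sqrt(159*N)
    print(n, round(mean_mad[n], 5), round(1 / math.sqrt(159 * n), 5))
```

Even though every simulated sample conforms to BL by construction, the average MAD is well above zero and shrinks only as N grows.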

Minimum N for First Two Digits?

• 1,000 to 10,000

• 3,000 should provide a “good” Benford’s Law “fit”

• at least 1,000 for “good conformity” to BL

• not less than 300

• Which minimum is the real minimum?

Sources: Johnson and Weggenmann (2013, 37); Nigrini (2011); Nigrini (2012, 20).

Proposed MAD Thresholds for BL conformity

• Two-digit MAD > .0022: Nonconformity

• .0018 < Two-digit MAD < .0022: Marginal conformity

• .0012 < Two-digit MAD < .0018: Acceptable conformity

• Two-digit MAD < .0012: Close conformity

Nigrini (2011, p. 115)
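
As a sketch, the thresholds above translate into a simple lookup. The function name is hypothetical, and the handling of values exactly equal to a cutoff (assigned here to the better-conforming band) is an assumption, since the thresholds as listed use strict inequalities only.

```python
def classify_two_digit_mad(mad: float) -> str:
    """Map a first-two-digit MAD to the conformity bands listed above."""
    # Assumption: a value exactly on a cutoff falls into the lower (better) band.
    if mad > 0.0022:
        return "Nonconformity"
    if mad > 0.0018:
        return "Marginal conformity"
    if mad > 0.0012:
        return "Acceptable conformity"
    return "Close conformity"

for value in (0.0010, 0.0015, 0.0020, 0.0030):
    print(value, classify_two_digit_mad(value))
```

Note that the bands are fixed numbers with no adjustment for N, which is the issue the following slides take up.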

Why N Matters: potentially too many red flags

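Using BL to Screen for F/S Manipulation

• Interested in, say, Pr(Manipulation | MAD above threshold), hereafter, Pr(Manip | Large MAD)

Per Bayes’ Rule, equals

$$\Pr(\text{Manip} \mid \text{Large MAD}) = \frac{\Pr(\text{Large MAD} \mid \text{Manip})\,\Pr(\text{Manip})}{\Pr(\text{Large MAD} \mid \text{Manip})\,\Pr(\text{Manip}) + \Pr(\text{Large MAD} \mid \text{No Manip})\,\Pr(\text{No Manip})}$$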

• Nature of diagnostic tests: As Pr(Large MAD|Manip) increases, so too will Pr(Large MAD|No Manip)

• If manipulation is not common, then for desired conditional probability to be large we need to greatly limit false positives.
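
A small worked example of the Bayes’ Rule calculation shows how strongly the false-positive rate drives the result. The 20% base rate echoes the survey-based figure cited earlier (Dichev et al. 2013); the 80% detection rate and the false-positive rates are purely hypothetical values chosen for illustration.

```python
def posterior_manipulation(sensitivity, false_positive_rate, base_rate):
    """Pr(Manip | Large MAD) via Bayes' Rule.

    sensitivity         = Pr(Large MAD | Manip)
    false_positive_rate = Pr(Large MAD | No Manip)
    base_rate           = Pr(Manip)
    """
    numerator = sensitivity * base_rate
    denominator = numerator + false_positive_rate * (1 - base_rate)
    return numerator / denominator

base_rate = 0.20    # base rate suggested by the CFO survey cited in the slides
sensitivity = 0.80  # hypothetical
for fpr in (0.30, 0.10, 0.02):  # hypothetical false-positive rates
    print(fpr, round(posterior_manipulation(sensitivity, fpr, base_rate), 3))
```

With these illustrative inputs, a 30% false-positive rate leaves the flag wrong more often than right (posterior 0.4), while cutting false positives to 2% raises the posterior above 0.9.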

Conundrum

• Small N: Nothing works particularly well. MAD likely to be spuriously high.

• Large N: Chi-square statistic is sensitive to even minor BL violations, a.k.a., “excess power”. MAD works well at summarizing severity of violation.

• Moderate N: What to do with moderate sizes: N=1000 to 2500 for a two-digit test?

Excess MAD

• Excess MAD: MAD – Expected MAD under BL

• Advantages:

    • Removes N-related bias and thus facilitates comparability of MAD for different N’s.

• Easy to accurately approximate:

    • E.g., for first two digits: Excess MAD ≈ MAD – 1/sqrt(159*N)

• Functions well whenever MAD would; functions better if MAD doesn’t work.

• Can use with more moderately sized N’s than MAD requires, and yet can still use with large sizes.
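
The approximation for the first two digits is a one-line calculation. The sketch below applies it to the same MAD value at two sample sizes; the function names and the example MAD of 0.0030 are illustrative assumptions, not figures from the study.

```python
import math

def expected_mad_two_digit(n: int) -> float:
    """Approximate expected first-two-digit MAD under BL: 1/sqrt(159*N)."""
    return 1 / math.sqrt(159 * n)

def excess_mad_two_digit(mad: float, n: int) -> float:
    """Excess MAD = observed MAD minus its approximate expectation under BL."""
    return mad - expected_mad_two_digit(n)

observed_mad = 0.0030  # hypothetical observed value
for n in (1000, 10000):
    print(n,
          round(expected_mad_two_digit(n), 6),
          round(excess_mad_two_digit(observed_mad, n), 6))
```

At N = 1,000 most of the hypothetical MAD is what chance alone would produce under BL, whereas at N = 10,000 most of it remains after the adjustment, so the same raw MAD carries different information at different N.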

Illustrative Example – SOX & Dodd-Frank

• Was conformity with BL affected by SOX or DF?

• Data: Domestic firms in Compustat database

• Fiscal years 1970-2013

• Focus on 14 variables, including net income, total assets, and comprehensive income.

• Considered (Excess) MAD values with N ≥ 1000 for yearly total.

• Generalized additive model (GAM) used to smoothly yet flexibly model changes over time while allowing AR(1) correlation in errors

First-two-digit MAD for Positive Net Incomes, by Year

N by Year for Positive Net Income

First-two-digit Excess MAD for Positive Net Incomes, by Year

First-two-digit MAD, N, and Excess MAD for Positive Net Incomes, by Year

First-two-digit MAD, N, and Excess MAD for Negative Net Incomes, by Year

First-two-digit MAD, N, and Excess MAD for (Positive) Total Net Property, Plant, and Equipment, by Year

First-two-digit MAD, N, and Excess MAD for (Positive) Total Receivables, by Year

Conclusions

• Using first-two-digit MAD, with the limitation that N ≥ 1000, often suggested a temporal trend, with nonconformity increasing after 2000.

• When Excess MAD is applied to these data, most trends essentially disappear.

• Excess MAD is not only quite stable; its proximity to 0 also suggests that MAD is not appreciably higher than could be explained by chance alone if Benford’s Law holds.

• Excess MAD should never be less reliable than other metrics

Recommendations

• Strive for large N

• Regardless of N, use Excess MAD

• Recognize that BL analysis comes with subtle limitations, one of which is its sensitivity to sample size
