Walk the Line: Harnessing Power with E(MAD) in Benford’s Law Research and Practice*
*formerly, Are Sarbanes-Oxley and Dodd-Frank Working? Non-Parametric Exploration of Benford’s Law and Financial Statements 1970-2013
Bradley J. Barney, Kennesaw State University
Kurt S. Schulzke, Kennesaw State University
New Abstract - Three primary objectives
• Describe with greater precision than extant literature the mathematical relationship between the mean absolute deviation (MAD) from Benford’s Law frequencies and N
• Propose an alternative MAD-based statistic, Excess MAD, that explicitly adjusts for sample size to provide an estimator of deviation from Benford's Law frequencies
• Compare the Benford’s Law outcomes obtained from Excess MAD with those obtained from MAD, applying non-parametric, generalized additive modelling to four decades of public company financial statement numbers
• Collateral finding: Evidence that Benford’s Law conformity did not noticeably increase after implementation of the Sarbanes-Oxley Act (2003) or the Dodd-Frank Act (2010)
Benford’s Law Overview
• Benford’s Law prescribes the frequency with which each leading digit(s) should occur in certain data sets.
• Examples:
• Pr(first digit = k) = log10(1 + 1/k) for k = 1, 2, …, 9
• Pr(first two digits = k) = log10(1 + 1/k) for k = 10, 11, …, 99
See, e.g., Nigrini (1996); Durtschi et al (2004); Nigrini (2011, Chapter 5); Fewster (2009)
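As a quick illustration of the two formulas above, the following minimal Python sketch tabulates the Benford probabilities for first digits and first two digits. The function name `benford_prob` is an illustrative choice, not something from the presentation.

```python
import math

def benford_prob(k: int) -> float:
    """Benford's Law probability that the leading digit(s) equal k."""
    return math.log10(1 + 1 / k)

# First-digit probabilities (k = 1..9) and first-two-digit probabilities (k = 10..99).
first_digit = {k: benford_prob(k) for k in range(1, 10)}
first_two_digits = {k: benford_prob(k) for k in range(10, 100)}

for k, p in first_digit.items():
    print(k, round(p, 4))
```

Each set of probabilities sums to 1 because the logarithms telescope: the first-digit sum is log10(10/1) and the first-two-digit sum is log10(100/10).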
Benford’s Law First Digit(s) Frequencies
Literature Review
• Nigrini (1996, 2005, 2011, 2012) Basics of BL, including N-sensitivity of MAD and Chi-Sq
• Durtschi, Hill & Pacini (2004) Basics plus caveats; link to conditional probability & Bayes’ rule
• Johnson & Ireland (2007) Used Z-stats, Chi-square, and MAD scores (without predetermined critical values) to test individually and jointly the first, second, and third digits of 22 P&L accounts, 1998 to 2003. Most P&L accounts diverged significantly from BL, most notably rental income (N=4,294) and loss provisions (N=4,756); upward revenue manipulation appears more frequently than downward expense manipulation; digits to the left (e.g., first digit) diverge less than those to the right (e.g., third digit).
• Alali & Romero (2013) Broke 2001-2010 into six sub-periods. Abandoned Chi-square for MAD after reviewer comments questioned excess power but assumed MAD is sample-size insensitive. Total N = 24,453; subsets as small as 10 and 42.
• Dichev et al (2013) Not a BL study. Survey of CFOs suggests F/S manipulation base-rate of at least 20%
• Amiram et al. (2015) BL Conformity expected to drop in presence of earnings management
Literature Review – Johnson & Ireland (2007)
Correlations -0.68 to -0.76 (for all Ns); -0.64 and -0.78 (N > 5000). E.g., Table 3, N > 5000 and All N:
Assessing Conformity with BL
• Pearson’s chi-square test: popular for significance testing
$$\chi^2 = \sum_{k} \frac{(Obs_k - Exp_k)^2}{Exp_k},$$
where Obs_k (Exp_k) are the observed (expected) frequencies of leading digit(s) equaling k under BL
• Mean Absolute Deviation (MAD): popular for estimating effect size
$$\mathrm{MAD} = \frac{1}{90} \sum_{k=10}^{99} \left| \frac{Obs_k - Exp_k}{n} \right|$$
Gives the two-digit MAD
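The two statistics above can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the helper names (`first_two_digits`, `chi_square`, `two_digit_mad`) are hypothetical, digit extraction by string formatting is one of several possible approaches, and zero values are assumed to have been removed beforehand.

```python
import math
from collections import Counter

def first_two_digits(x: float) -> int:
    """First two significant digits (10..99) of a nonzero number."""
    s = f"{abs(x):.15e}"          # e.g. 1234.5 -> '1.234500000000000e+03'
    return int(s[0] + s[2])       # leading digit plus first digit after the point

def _observed_expected(values):
    """Observed counts and BL-expected counts for k = 10..99."""
    n = len(values)
    obs = Counter(first_two_digits(v) for v in values)
    exp = {k: n * math.log10(1 + 1 / k) for k in range(10, 100)}
    return n, obs, exp

def chi_square(values) -> float:
    """Pearson chi-square statistic against first-two-digit BL frequencies."""
    _, obs, exp = _observed_expected(values)
    return sum((obs.get(k, 0) - exp[k]) ** 2 / exp[k] for k in range(10, 100))

def two_digit_mad(values) -> float:
    """Mean absolute deviation of observed from BL proportions, k = 10..99."""
    n, obs, exp = _observed_expected(values)
    return sum(abs(obs.get(k, 0) - exp[k]) / n for k in range(10, 100)) / 90

sample = [1234.5, 0.0056, -987, 45.1, 10.0, 19999]
print(chi_square(sample), two_digit_mad(sample))
```

Both functions work from the same observed and expected counts; the chi-square scales squared deviations by the expected count, while MAD averages absolute deviations on the proportion scale.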
Dependence of Statistics on N
• If Benford’s Law does not exactly hold (if the actual data diverge from BL), the chi-square test statistic tends to increase with N.
• If it does hold, the expected value of the chi-square statistic converges to df.
• As N grows, MAD tends to converge to the population’s true deviation rather than diverge, whether or not the population conforms precisely to Benford’s Law.
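A small Monte Carlo sketch can illustrate the case where BL holds exactly. It draws first-two-digit samples directly from the Benford distribution and averages both statistics over repeated samples. The sample sizes, repetition count, and seed are arbitrary assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter

DIGITS = list(range(10, 100))
BENFORD = [math.log10(1 + 1 / k) for k in DIGITS]

def simulate(n, reps, rng):
    """Average chi-square and two-digit MAD over `reps` samples of size n drawn from BL."""
    chi_total = 0.0
    mad_total = 0.0
    for _ in range(reps):
        obs = Counter(rng.choices(DIGITS, weights=BENFORD, k=n))
        chi = 0.0
        abs_dev = 0.0
        for k, p in zip(DIGITS, BENFORD):
            diff = obs.get(k, 0) - n * p      # observed minus expected count
            chi += diff ** 2 / (n * p)
            abs_dev += abs(diff) / n
        chi_total += chi
        mad_total += abs_dev / 90
    return chi_total / reps, mad_total / reps

rng = random.Random(2013)                      # arbitrary seed for reproducibility
results = {n: simulate(n, 100, rng) for n in (500, 5000)}
for n, (chi_mean, mad_mean) in results.items():
    print(f"N={n}: mean chi-square={chi_mean:.1f}, mean MAD={mad_mean:.5f}")
```

With data that truly follow BL, the average chi-square should stay near df = 89 at both sample sizes, while the average MAD shrinks toward the true deviation of zero as N grows.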
Dependence of MAD on N
• Benford’s Law is a model, not necessarily the truth.
• If the truth were known and MAD calculation used true digit probabilities in comparison with BL-implied probabilities, MAD would not depend on N.
• Because the true probabilities are not known and must be estimated from the sample, MAD is biased upward when Benford’s Law holds.
• Literature cautions against using MAD if N is not sufficiently large.
• Nigrini suggests various possible minimum Ns for first-two digits.
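One way to see where the upward bias comes from is the standard normal approximation to a sample proportion. This is offered as a sketch, not necessarily the presenters' own derivation. Here p_k denotes the BL probability of leading digits k and \hat{p}_k the sample proportion; both symbols are introduced only for this illustration.

```latex
\hat{p}_k = \frac{Obs_k}{N}, \qquad
\hat{p}_k - p_k \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(0,\ \frac{p_k(1-p_k)}{N}\right)
\quad\Longrightarrow\quad
E\,\bigl|\hat{p}_k - p_k\bigr| \;\approx\; \sqrt{\frac{2\,p_k(1-p_k)}{\pi N}} \;>\; 0

E[\mathrm{MAD}] \;\approx\; \frac{1}{90}\sum_{k=10}^{99}\sqrt{\frac{2\,p_k(1-p_k)}{\pi N}}
\;\propto\; \frac{1}{\sqrt{N}}
```

Even when BL holds exactly, each absolute deviation has a positive expectation, so the expected MAD is positive and shrinks only at the rate 1/sqrt(N).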
Minimum N for First Two Digits?
• 1,000 to 10,000
• 3,000 should provide a “good” Benford’s Law “fit”
• at least 1,000 for “good conformity” to BL
• not less than 300
• Which minimum is the real minimum?
Sources: Johnson and Weggenmann (2013, 37); Nigrini (2011); Nigrini (2012, 20).
Proposed MAD Thresholds for BL conformity
• Two-digit MAD > .0022: Nonconformity
• .0018 < Two-digit MAD < .0022: Marginal conformity
• .0012 < Two-digit MAD < .0018: Acceptable conformity
• Two-digit MAD < .0012: Close conformity
Nigrini (2011, p. 115)
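The thresholds above translate directly into a small lookup function. The function name is hypothetical. The slide uses strict inequalities and so leaves the exact boundary values unassigned, so placing a boundary value in the worse category is an assumption of this sketch.

```python
def classify_two_digit_mad(mad: float) -> str:
    """Map a first-two-digit MAD to the conformity labels listed above.

    Assumption: a value exactly on a boundary falls into the worse category.
    """
    if mad < 0.0012:
        return "close conformity"
    if mad < 0.0018:
        return "acceptable conformity"
    if mad < 0.0022:
        return "marginal conformity"
    return "nonconformity"

for value in (0.0010, 0.0015, 0.0020, 0.0030):
    print(value, classify_two_digit_mad(value))
```

The labels depend only on MAD and take no account of N.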
Why N Matters: potentially too many red flags
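Using BL to Screen for F/S Manipulation
• Interested in, say, Pr(Manipulation | MAD above threshold), hereafter Pr(Manip | Large MAD)
• Per Bayes’ Rule, this equals
$$\Pr(\text{Manip} \mid \text{Large MAD}) = \frac{\Pr(\text{Large MAD} \mid \text{Manip})\,\Pr(\text{Manip})}{\Pr(\text{Large MAD} \mid \text{Manip})\,\Pr(\text{Manip}) + \Pr(\text{Large MAD} \mid \text{No Manip})\,\Pr(\text{No Manip})}$$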
• Nature of diagnostic tests: As Pr(Large MAD|Manip) increases, so too will Pr(Large MAD|No Manip)
• If manipulation is not common, then for desired conditional probability to be large we need to greatly limit false positives.
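A worked example may make the false-positive point concrete. The 20% base rate echoes the Dichev et al. (2013) survey figure cited earlier. The detection rate of 0.80 and the false-positive rates of 0.30 and 0.05 are purely hypothetical numbers chosen for illustration, as is the function name.

```python
def posterior_manipulation(sensitivity: float,
                           false_positive_rate: float,
                           base_rate: float) -> float:
    """Pr(Manip | Large MAD) via Bayes' rule.

    sensitivity         = Pr(Large MAD | Manip)
    false_positive_rate = Pr(Large MAD | No Manip)
    base_rate           = Pr(Manip)
    """
    numerator = sensitivity * base_rate
    denominator = numerator + false_positive_rate * (1 - base_rate)
    return numerator / denominator

# Hypothetical inputs: same detection rate, two different false-positive rates.
loose = posterior_manipulation(0.80, 0.30, 0.20)
strict = posterior_manipulation(0.80, 0.05, 0.20)
print(round(loose, 3), round(strict, 3))
```

Under these assumed inputs, cutting the false-positive rate from 0.30 to 0.05 raises the posterior probability from 0.4 to 0.8, which is why spuriously large MAD values at small N matter for screening.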
Conundrum
• Small N: Nothing works particularly well; MAD is likely to be spuriously high.
• Large N: The chi-square statistic is sensitive to even minor BL violations, a.k.a. “excess power”; MAD works well at summarizing severity of violation.
• Moderate N: What to do with moderate sizes, e.g., N = 1,000 to 2,500 for a two-digit test?
Excess MAD
• Excess MAD: MAD – Expected MAD under BL
• Advantages:
• Removes N-related bias and thus facilitates comparability of MAD for different N’s.
• Easy to approximate accurately, e.g., for first two digits: Excess MAD ≈ MAD – 1/sqrt(159*N)
• Functions well whenever MAD would; functions better where MAD does not work.
• Can be used with more moderately sized N’s than MAD requires, and can still be used with large N’s.
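The adjustment can be sketched in Python as below. `excess_mad_two_digit` applies the 1/sqrt(159*N) shortcut from the slide. `expected_mad_two_digit` is a separate cross-check based on a normal approximation to each digit proportion. That cross-check is an assumption of this sketch and not necessarily how the presenters obtained the constant 159.

```python
import math

def expected_mad_two_digit(n: int) -> float:
    """Normal-approximation expected two-digit MAD when BL holds exactly (assumed method)."""
    total = 0.0
    for k in range(10, 100):
        p = math.log10(1 + 1 / k)
        # E|p_hat - p| for an approximately normal proportion with variance p(1-p)/n
        total += math.sqrt(2 * p * (1 - p) / (math.pi * n))
    return total / 90

def excess_mad_two_digit(mad: float, n: int) -> float:
    """Excess MAD using the shortcut: MAD - 1/sqrt(159*N)."""
    return mad - 1 / math.sqrt(159 * n)

for n in (300, 1000, 2500, 10000):
    shortcut = 1 / math.sqrt(159 * n)
    print(n, round(expected_mad_two_digit(n), 5), round(shortcut, 5))
```

The two columns should agree closely at every N. At N = 1000 the shortcut gives roughly 0.0025, which already exceeds the .0022 nonconformity threshold even for data that follow BL exactly.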
Illustrative Example – SOX & Dodd-Frank
• Was conformity with BL affected by SOX or DF?
• Data: Domestic firms in Compustat database
• Fiscal years 1970-2013
• Focus on 14 variables, including net income, total assets, and comprehensive income.
• Considered (Excess) MAD values with N ≥ 1000 for yearly total.
• Generalized additive model (GAM) used to smoothly yet flexibly model changes over time while allowing AR(1) correlation in errors
First-two-digit MAD for Positive Net Incomes, by Year
N by Year for Positive Net Income
First-two-digit Excess MAD for Positive Net Incomes, by Year
First-two-digit MAD, N, and Excess MAD for Positive Net Incomes, by Year
First-two-digit MAD, N, and Excess MAD for Negative Net Incomes, by Year
First-two-digit MAD, N, and Excess MAD for (Positive) Total Net Property, Plant, and Equipment, by Year
First-two-digit MAD, N, and Excess MAD for (Positive) Total Receivables, by Year
Conclusions
• Using first-two-digit MAD, with limitation that N >= 1000, often suggested temporal trend with nonconformity increasing post 2000.
• When Excess MAD is applied to these data, most trends essentially disappear.
• Excess MAD is not only quite stable; its proximity to 0 also suggests that MAD is not appreciably higher than could be explained by chance alone, given that Benford’s Law holds
• Excess MAD should never be less reliable than other metrics
Recommendations
• Strive for large N
• Regardless of N, use Excess MAD
• Recognize that BL analysis comes with subtle limitations, one of which is its sensitivity to sample size