
Page 1: DIGITAL ANALYSIS TESTS AND STATISTICS - Nigrininigrini.com/data_software/Program_Details_2009.docx · Web viewon all the negative numbers equal to or less than -0.000001. The positive

DIGITAL ANALYSIS TESTS AND STATISTICS

DATAS

Using Digit and Number Patterns to Detect Fraud, Errors, Biases, Irregularities, & Processing Inefficiencies

DATAS 2009 FOR EXCEL 2007

INSTALLATION OF PROGRAMS

RUNNING THE PROGRAMS

DESCRIPTION OF PROGRAMS

DESCRIPTION OF TABLES

DESCRIPTION OF TESTS

Copyright 1995-2009 by Mark J. Nigrini. All rights reserved.

www.nigrini.com


DATAS 2009 FOR EXCEL: INSTALLATION OF PROGRAMS

1. DATAS 2009 for Excel 2007 contains 3 Excel 2007 macro-enabled workbooks, 3 Excel 2007 workbooks (data files), one documentation file (ProgramDetails_2009.docx), and two Benford’s Law papers in a directory called Datas2009_Excel. Copy the files from the CD contents to your c:\ drive. After the copying is done your c:\ drive should have a Datas2009_Excel directory that is an exact copy of the Datas2009_Excel directory on the CD. If you received your files by e-mail then unzip and extract the files to a folder called Datas2009_Excel.

2. The contents of Datas2009_Excel must be copied as is to your c:\ drive. If you have purchased a site license, you may copy the files to a network drive. If not, the files may only be loaded onto a drive accessible by one computer.

3. Datas2009_Excel contains 3 data files. The names of these files are DataCensus2000.xlsx, DataInvoices.xlsx, and DataStreamflow.xlsx. The invoices data is used with the permission of the owner. The census data is available from the website of the U.S. Census Bureau. The streamflow data was provided by the authors of a paper on streamflows. The source website is,

http://www.ce.unlv.edu/~piechota/DataSets3.htm

and the paper reference is,

Tootle, G.A., T.C. Piechota, and A.K. Singh, 2005. Coupled Interdecadal and Interannual Oceanic / Atmospheric Variability and United States Streamflow. Water Resources Research, 41(W12408).

These data files can be used for practice.


DATAS 2009 FOR EXCEL: RUNNING THE PROGRAMS

The section below assumes that your security setting is Disable all macros with notification. If your security setting is to Disable all macros without notification then this needs to be adjusted.

You must Enable the macros to run the programs. This is done after you open the .xlsm files, as follows,

Select Enable this content

Click OK.


You now need to load your data. This is done by copying and pasting your data into column B starting at cell B2. The F5 (Go To) key will allow you to select the data range from the census data file,

Click OK, to give,

Right click, select Copy,


Go to the BenfordsLawFirstSecondFirstTwo.xlsm worksheet, right click and select Paste,

To give,


The programs are run using Excel 2007’s macro menu. In the View tab click Macros and then View Macros,

To give,

Click Run and wait for program to finish running.

Done!!! will be written in cell A1.

For small data sets the execution speed is quick. At the maximum number of records (1,048,575 records) it might take 5 minutes to process on a reasonably fast machine. Excel 2007 has 1,048,576 rows but 1 row is used for the heading, leaving 1,048,575 rows for data.

To speed up execution for large data sets, it could help to close all programs that are not needed.

[Screenshots of the output worksheets.]

DATAS 2009 FOR EXCEL: DESCRIPTION OF PROGRAMS

Data Profile and Digit Tests (BenfordsLawFirstSecondFirstTwo.xlsm)

This program prepares the Data Profile (count, the % of total count, and the sum of all the records) for,

1. Positive amounts (equal to or larger than 10.00, and amounts from 0.01 to 9.99),
2. Amounts equal to zero,
3. Negative amounts (equal to or smaller than -10.00, and amounts from -0.01 to -9.99),
4. Small (0.01 to 50.00) and large (over 100,000.00) amounts.

This program also prepares a graph and table for the first, second, and first-two digit tests:

The digit graphs and tables are prepared based on numbers >= 0.000001. Numbers smaller than this are rare in real life. Negative numbers should be analyzed separately from positive numbers. Negative numbers would need to be converted to positive numbers by using the absolute (ABS) function.

The graphs do not include upper and lower confidence bounds showing the limits for differences at the 0.05 level of significance. These bounds are calculated in the Bounds worksheet. The bounds can be added to the charts by going to the Design tab and selecting Select Data. Click Add and enter the facts required, e.g.,

The output is viewed by clicking on the tabs at the bottom of the spreadsheet.

Number Frequencies (NumberFrequencies.xlsm)

This program prepares a table of the number frequencies showing the frequency with which each number occurred in the data set. The headings are self-explanatory.

Benford’s Law Second Order Tests (BenfordsLawSecondOrderTests.xlsm)

This program runs the new Benford’s Law second order test. This test analyzes the digits of the differences between the numbers that have been sorted (ordered) from smallest to largest.

The output is similar to that of the usual Benford’s Law tests, except that neither the Data Profile nor the bounds are calculated because they are not really appropriate here. The expected results are digit patterns that are approximately Benford. If the numbers are clustered together then the spiked pattern below is likely. If the results are close to neither Benford nor the spiked pattern, then there is some significant anomaly in the data.


DATAS 2009 FOR EXCEL: DESCRIPTION OF TABLES

The contents of the BenfordsLawFirstSecondFirstTwo.xlsm and BenfordsLawSecondOrderTests.xlsm tables are described below.

A. (Detail). Not used.

B. ($). These are the numbers that were copied from the Profile tab. The column should include all positive numbers equal to or larger than 0.000001.

C. (TwoDig). This converts each amount to a number in the range [10,100) by moving the decimal point. Each number in column C is therefore greater than or equal to 10 and strictly less than 100. This calculation is needed to identify the first-two digits of the number.

D. (First 2). This contains the first-two digits of the amounts. It is therefore an integer from 10 to 99. The formula in this column is copied from D2 downwards.

E. (First). This contains the first digit of the amount which is an integer in the range [1, 2, ...,9]. The formula in this column is copied from E2 downwards.

F. (Second). This contains the second digit of the record which is an integer in the range [0, 1, ...,9].

G. (Digit). This contains the bin ranges for the frequency counts. G2..G10 is the bin range for the first digits and G12..G21 is the bin range for the second digits.

H. (Count). This contains the counts of the digits to the left of the count number. H2..H10 contains the first digit frequencies. For example, the number "28" in H3 would indicate that there are 28 first digit 2's in the data set. H12..H21 contains the second digit frequencies. The first digit counts (H2..H10) and the second digit counts (H12..H21) each sum to the number of records greater than or equal to 0.000001.

I. (Prop). This is the proportion that each count represents. For example, if the data set has 200 observations and H3 had a value of 28, then the proportion would be 0.140 (28/200).

J. (Ben Law). This contains the expected proportions of Benford's Law. J2..J10 contains the expected first digit proportions and J12..J21 contains the expected second digit proportions. The proportions in each row are matched horizontally with the digit in column G that it relates to.

K. (Diff). This shows the difference between the actual proportions (I) and the expected proportions (J). A positive difference means that the actual proportion exceeds the expected proportion.

L. (Signif). This column shows the statistical significance of the difference between the two proportions. For example, the Z-statistic in the first row measures the statistical significance of the difference for first digit 1. Significance takes into account the size of the difference (over or under), the expected proportion, and the sample size. Scores above 1.96 are significant at the 0.05 level, and scores above 2.57 are significant at the 0.01 level.

M. (FTDigit). This is the bin range 10 to 99 for the first-two digit frequency counts.

N. (Count). See (H) above except that this count relates to the first-two digits.


O. (Prop). See (I) above.

P. (Ben Law). The expected proportions of Benford's Law matched rowwise with column M.

Q. (Diff). See (K) above except that this is the difference between column O and column P.

R. (Signif). See (L) above.
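The Ben Law, Diff, and Signif columns can be reproduced outside Excel. The sketch below is a minimal illustration in Python, not the workbook's own code. The expected proportions use the standard Benford logarithm formulas. The Z-statistic form, including the continuity correction of 1/(2N), is an assumption, because the formula behind the Signif column is not stated in this document. All function names are hypothetical.

```python
import math


def benford_expected(digits):
    """Expected Benford proportion for a first digit (1..9) or first-two digits (10..99)."""
    return math.log10(1 + 1 / digits)


def benford_second_digit(d2):
    """Expected Benford proportion for a second digit (0..9), summed over all first digits."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))


def z_statistic(count, n, expected):
    """Z-statistic for one digit: count observed out of n records vs. the expected proportion.

    Assumed form: (|actual - expected| - 1/(2n)) / sqrt(expected * (1 - expected) / n),
    where the 1/(2n) continuity correction is applied only if it is smaller than the difference.
    """
    actual = count / n
    diff = abs(actual - expected)
    correction = 1 / (2 * n)
    if correction < diff:
        diff -= correction
    return diff / math.sqrt(expected * (1 - expected) / n)


# The example from column I: 28 records with first digit 2 in a data set of 200 records.
prop = 28 / 200                          # column I (Prop)
expected = benford_expected(2)           # column J (Ben Law)
difference = prop - expected             # column K (Diff), negative means under-represented
signif = z_statistic(28, 200, expected)  # column L (Signif), compare with 1.96 and 2.57
```

In this example the difference is negative and the Z-statistic falls below 1.96, so the shortfall of first digit 2's would not be flagged at the 0.05 level.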


DATAS 2009 FOR EXCEL: DESCRIPTION OF TESTS

Data Profile

The Data Profile is there to give the data analyst insights into the composition of the numbers in the data set under review. The test could be called “getting to know your numbers better.” In addition to displaying the data in groups, the Data Profile also serves some control and reasonableness purposes.

The Data Profile provides statistics on the following data partitions:

1. Numbers equal to or larger than 10.00,
2. Numbers from 0.01 to 9.99,
3. Numbers equal to zero,
4. Numbers from -0.01 to -9.99, and
5. Numbers equal to or smaller than -10.00.

The above categories will be called large positive numbers, small positive numbers, zeroes, small negative numbers, large negative numbers. The program adds,

6. Numbers from 0.01 to 50.00, and
7. Numbers above 100,000.00

to your data profile. Categories (6) and (7) are useful for accounts payable and would point auditors to the low-value items (that cost resources to process) and to the high-value items that would usually be material.

The output from the test data set (DataInvoices.xlsx) is as follows:


For some data sets it might be more appropriate to use more intervals or to change the value of the low-value dollar cutoffs. This might be particularly appropriate in countries where there are many units of the local currency to the US dollar. Examples would include Norway, India, and South Africa. Different break points would also be appropriate for data sets with particularly large numbers such as a database of Frequent Flyer Miles or bank wire transfers. In these cases you would have to do the calculations manually using Excel’s COUNTIF and SUMIF functions.
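As an alternative to building COUNTIF and SUMIF formulas by hand, the same profile can be sketched in a few lines of Python with adjustable break points. This is an illustration only and not part of DATAS: the function name, the parameter names, and the stratum labels are hypothetical, and amounts strictly between 0 and the small cutoff are treated as "small" on the assumption that the data is in currency units with two decimals.

```python
def data_profile(amounts, small_cutoff=10.00, low_value=50.00, high_value=100_000.00):
    """Return {stratum: (count, percent of total count, sum)} for a list of amounts."""
    strata = {
        "large positive": lambda a: a >= small_cutoff,
        "small positive": lambda a: 0 < a < small_cutoff,
        "zero": lambda a: a == 0,
        "small negative": lambda a: -small_cutoff < a < 0,
        "large negative": lambda a: a <= -small_cutoff,
        "low value": lambda a: 0 < a <= low_value,   # overlaps the positive strata
        "high value": lambda a: a > high_value,      # overlaps "large positive"
    }
    n = len(amounts)
    profile = {}
    for name, belongs in strata.items():
        members = [a for a in amounts if belongs(a)]
        percent = 100 * len(members) / n if n else 0.0
        profile[name] = (len(members), percent, sum(members))
    return profile


# Hypothetical amounts, using the default break points from the text.
sample = [25.00, 5.50, 0, -3.25, -150.00, 250000.00, 49.99]
profile = data_profile(sample)

# A currency with many units to the US dollar might use larger break points instead.
profile_other = data_profile(sample, small_cutoff=100.00, low_value=500.00,
                             high_value=1_000_000.00)
```

The first five strata partition the data, so their counts add up to the number of records. The last two overlap the others and are reported separately, as in the program's output.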

Notes and tips

Experience has provided the following guidelines:

1. The first task is to reconcile the total dollars in the file (in this case $93,136,553.14) to the financial records. This would help to ensure that you are working with a complete file (i.e., all the transactions for the period). The Data Profile would also help you to understand what transactions are included in the data set and what transactions are excluded. For example, an auditor analyzing health care claims might discover that certain types of claims (e.g., dental claims or claims by retired employees) are processed by another system and that a second, or third, separate analysis is also needed. I have also at times seen that companies process “immediate and urgent” checks through a system separate from accounts payable, and government agencies process contract payments through separate payables systems.

2. In Accounts Payable data the first finding is usually that there is a high proportion of low-value ($50 and under) invoices. The norm for low-value invoices is about 15 percent. There have been cases where the Data Profile has shown low-value invoices to be above 50 percent. A value-added data analysis project would recommend ways to cut down on the percentage or count of low-value invoices. Examples would include purchasing cards or only reimbursing employees for expenses by deposit linked to the payroll run.

3. Many zero-value invoices would also be an audit finding. The Data Profile once showed that a company had 8,000 zero invoices. It turned out that these were warranty claims that were being processed like normal purchases. A value-added auditor would recommend system changes where there were excessive zero-value invoices. Inefficient systems are usually found in companies that have experienced explosive growth. The system that worked well when the company was younger and smaller becomes inefficient with large transaction volumes.

4. The percentage of credit memos should be reviewed. The norm is about 3 percent. Percentages above 3 percent might indicate that an excessive amount of correcting is done to data after it has been entered for processing. Percentages lower than 3 percent might indicate that the firm is missing credits due to it, or that not enough correcting or reviewing is being done. At the extremes the Data Profile has shown credit memos as high as 6 percent and as low as 1 percent.

5. The Data Profile is also there to detect negative numbers in data sets that should not have negative numbers, e.g., perpetual inventory numbers and payroll (gross or net pay) numbers. Other examples would include census numbers, election results and car odometer readings.


First, Second, and First-Two Digit Tests

The digit tests are tests of the (a) first digits, (b) second digits, and (c) first-two digits. The Basic Digit Tests are each run on either all the positive numbers equal to or greater than 0.000001, or on all the negative numbers equal to or less than -0.000001. The positive and negative numbers are evaluated separately because the incentive to manipulate is opposite for these groups of numbers. For example, when Earnings Per Share are positive, management want the number to be bigger, but conversely, when making a loss, management want the EPS number to be smaller. Previous versions of DATAS deleted all numbers less than 10 but this has been changed for DATAS 2009 for Excel.
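The digit extraction itself can be sketched as follows. This is an illustrative Python version, not the workbook's formulas: it mirrors the TwoDig idea of moving the decimal point so that the number lies in [10,100), and it uses the decimal module so that binary floating-point rounding does not change a digit. The function name and the sample amounts are hypothetical.

```python
from decimal import Decimal

MIN_AMOUNT = Decimal("0.000001")  # smallest magnitude analyzed, as stated in the text


def digits_of(amount):
    """Return (first-two digits, first digit, second digit) of a positive amount.

    Returns None for amounts below the 0.000001 cutoff (including zero and negatives).
    """
    value = Decimal(str(amount))
    if value < MIN_AMOUNT:
        return None
    # Move the decimal point so that 10 <= two_dig < 100 (the TwoDig idea).
    two_dig = value.scaleb(1 - value.adjusted())
    first_two = int(two_dig)  # truncate, do not round
    return first_two, first_two // 10, first_two % 10


# Positive and negative numbers are analyzed separately; negatives are made positive first.
amounts = [1234.56, 0.0456, -4980.00, 0, 25]
positives = [a for a in amounts if a > 0]
negatives = [abs(a) for a in amounts if a < 0]
positive_digits = [digits_of(a) for a in positives]
negative_digits = [digits_of(a) for a in negatives]
```

Truncating rather than rounding matters here: 1,299.99 must give first-two digits 12, not 13.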

The results of the first digit test on the DataInvoices.xlsx table are shown below,

Notes and Tips

The First Digit test is an overall test of reasonableness. I compare the test to looking out the window of the plane when you’re descending to land in your home city. One or two landmarks and the look of the terrain would be a reasonableness check that you are indeed landing at your home city. For data sets that are expected to conform to Benford’s Law I would expect a good fit for data sets of 3,000 or more records. The general rule is that a weak fit to Benford’s Law is a signal that the data set contains abnormal duplications and anomalies. Benford’s Law is thus a bit of a predictor test in that it will be our first indicator of data issues.

For Accounts Payable data and other data sets where prices are involved, the second digits graph will usually show excess 0s and 5s due to round numbers (such as 75, 100, and 250). This should not be a cause for concern. If the second digits graph shows an excess of (say) 8s, the usual step would be to go to the first-two digits graph to check which first-two digits combination is causing the spike (excess). The result might be that 48 has a large spike, in which case the analyst would have isolated the cause to a smaller sample of suspect records.

The first-two digits test can highlight possible biases in the data. A bias is described as a gravitation to some part(s) of the number line due to control level critical points or due to psychological boundaries assumed by the other party to a transaction. Examples of results are:

1. A common finding when analyzing company expenses is a spike at 24. A spike is an actual proportion that exceeds the expected (Benford) proportion by a significant amount. This usually occurs at firms that require employees to submit vouchers for expenses that are $25 and higher. The graph would then show that employees are excessively submitting claims for just under $25.

2. The auditor should check for spikes at 48 and 49 and 98 and 99. This would indicate that amounts are being entered that are just below the psychological cutoff points of $100, $500, $1000, $5000, or $10000.

3. An auditor would also check for spikes that are just below internal authorization levels. For example, an insurance company might allow junior and mid-level adjusters to approve claims just below (say) $5,000 and $10,000. Spikes at 48, 49, 98, 99 would signal excessive paid claims just below authorization levels. This might signal fraud.

4. Auditors at a bank analyzed credit card balances written off. The first-two digits graph showed a spike at 49. The number duplication test showed many amounts for $4,900 to $4,980. Most of these amounts were attributable to one employee. The final result of the audit showed that the person was having cards issued to friends and family. The employee’s write-off limit was $5,000. The friends and family then ran up balances to just below $5,000 (as evidenced by the spike at 49) and the employee then wrote the balance off. The activity was detected on the first-two digits graph because the person was systematic in their actions.

5. An auditor ran the first-two digits test on two consecutive months of cost prices on inventory sheets. The graphs showed some differences in the first-two digits patterns. The follow-up work showed that many of the items with positive cost values in the first month erroneously had zero cost amounts in the second month.

6. A finding recently reported by Inland Revenue in the U.K. was a big spike at 14 for revenue numbers reported by small businesses. The analysis showed them clearly that many business people were “managing” their sales numbers to just below 15,000 GBP (pounds). The tax system in the U.K. allows businesses with sales under 15,000 GBP to use a “Schedule C Easy” when filing.

7. Employee credit card purchases were analyzed. The organization had a limit for any purchase by credit card of $2,500.00. Our analysis showed a big spike (excess) at 24 due to employees purchasing with great gusto in the $2,400.00 to $2,499.99 range.


Number Frequencies

The data profile and the digit tests give us a high level overview of our data. The number frequencies test drills down into finer partitions of the data searching for anomalies usually in the form of abnormal duplications. This test is essentially a numbers hit parade. The number duplication test will tell us what specific numbers were causing spikes (positive differences) on the first-two digits graph. Spikes on the first-two digits graph would be correlated with certain numbers that are occurring abnormally often. For example, a large spike at 50 could correlate with an abnormal duplication of perhaps $50 or $500.
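The "hit parade" is a frequency count sorted from most to least common. A minimal sketch in Python follows; it is an illustration rather than the NumberFrequencies.xlsm code, and the function name and sample amounts are hypothetical.

```python
from collections import Counter


def number_frequencies(amounts, top=10):
    """Return the most frequent numbers as (number, count, proportion of all records)."""
    n = len(amounts)
    counts = Counter(amounts)
    return [(value, count, count / n) for value, count in counts.most_common(top)]


# Hypothetical invoice amounts; a spike at first-two digits 25 would lead us to look here.
amounts = [25.00, 25.00, 50.00, 608.50, 25.00, 50.00]
table = number_frequencies(amounts, top=3)
```

The same function works on non-currency fields such as mileage deposits or bank account numbers, because it only counts how often each distinct value occurs.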

The output for the DataInvoices.xlsx table is as follows,

This test has yielded some valuable findings. Data analysts and auditors would usually investigate the following numbers:

1. Numbers associated with large positive spikes on the first-two digits graph. For the test data set this would be numbers starting with 50, 25, 15, 75, or 30. The significance column of the first-two digits table indicates which first-two digits had the largest deviations from Benford’s Law.

2. Large round numbers. Round numbers are usually numbers that have been negotiated. These are usually for professional services or donations, both of which are open to fraud and abuse. In the test data case the $15,000 numbers were for quarterly director’s fees and consequently did not indicate fraud or error. However, many of the $5,000 and $10,000 numbers were for donations.

3. Odd numbers that have occurred unusually often. Examples in the test data set are $608.50 and $46.17.

4. Numbers just below psychological thresholds or just below control amount levels. Although not present in this case, $24.50 or $2,495 would qualify here.

5. Numbers that have occurred far more often than any other number. In this case the first positive number ($25.00) occurred about twice as often as any other number.

This test is not limited to currency units. The auditors of an airline ran this test against the miles deposits to frequent flyer mileage accounts for a calendar year. The test showed that (not surprisingly) 500 miles was the most frequently deposited number, this being the minimum award for a flight, so that for any flight under 500 miles the passenger was awarded 500 miles. The second most frequent number was (say) 817 miles and this turned out to be the distance between the two main hubs of the carrier. This was not surprising because the airline had very frequent flights between its two main hubs.

A company in Tennessee used the Number Duplication test to test for fictitious employees. The auditor used the Number Duplication test on the payroll file and tested whether there was any duplication in the bank account numbers of employees (from the Direct Deposit Details field). More than two employees having their pay amounts deposited to the same bank account number could be an indicator of fraud. They found cases of a count of 2 for a checking account and these turned out to be cases where two employees were married. They also found other duplications where two or three (younger) employees shared an apartment and also a checking account. The explanation was that some of their employees did not qualify for a checking account (perhaps due to prior histories of bouncing checks) and shared an account with an employee who was a friend.

This test has also been run on employee credit cards and the results showed that there was very little number duplication. That is, the most frequently used number was not all that frequently used. These results made sense because there is no real reason that any number for corporate purchases (other than Federal Express charges) should occur abnormally often.

This test has also been used with varying success on inventory counts, temperature readings, health care claims, airline ticket refunds, airline flight liquor sales, electricity meter readings, and election counts.


Second-Order Test

Auditors are required to use analytical procedures to identify the existence of unusual transactions, events, and trends. Benford’s Law gives the expected patterns of the digits in numerical data, and has been advocated as a test for the authenticity and reliability of transaction level accounting data. There is now a new second-order test related to Benford’s Law that could be used by auditors to detect inconsistencies in the internal patterns of data. This new test diagnoses the relationships and patterns found in transactional data and is based on the digits of the differences between amounts that have been sorted from smallest to largest (ordered). These digit patterns are expected to closely approximate the digit frequencies of Benford’s Law.

The second-order test is demonstrated using studies that use (1) accounts payable amounts, (2) journal entry amounts, and (3) annual revenue and cost data. The results showed that the second-order test can detect (a) anomalies occurring in data downloads, (b) rounded data, (c) the use of regression output in place of actual transactional data, (d) the use of statistically generated data in place of actual transactional data, and (e) inaccurate ranking in data that is assumed to be ordered from smallest to largest. These error conditions would not have been easily detectable using the usual set of descriptive statistics. The second-order test gives few, if any, false positives in that if the results are not as expected (close to Benford’s Law), then the data does have some characteristic that is rare and unusual, abnormal, or irregular.

The second-order Benford test is described below:

Let $x_1, \ldots, x_N$ be a data set comprising observations drawn from a continuous distribution, and let $y_1, \ldots, y_N$ be the $x_i$'s in increasing order. Then, for many natural data sets and for large $N$, the digit frequencies of the differences between adjacent observations, $y_{i+1} - y_i$, are close to Benford’s Law. Large deviations from Benford’s Law indicate an anomaly that should be investigated.
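The definition translates into three steps: sort the data, take the differences between adjacent values, and count the first-two digits of those differences. The Python sketch below illustrates these steps under stated assumptions and is not the BenfordsLawSecondOrderTests.xlsm code. In particular, dropping zero differences (which arise from duplicate values and have no leading digits) is an assumption, and the function name and sample values are hypothetical.

```python
from collections import Counter
from decimal import Decimal


def second_order_counts(values):
    """Count the first-two digits (10..99) of the differences between sorted values."""
    ordered = sorted(Decimal(str(v)) for v in values)
    differences = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    counts = Counter()
    for diff in differences:
        if diff > 0:  # assumption: zero differences from duplicates are skipped
            # Move the decimal point so the difference lies in [10, 100), then truncate.
            counts[int(diff.scaleb(1 - diff.adjusted()))] += 1
    return counts


# Hypothetical amounts: the sorted differences are 0.01, 0.02, 2.50 and 0.
values = [12.53, 10.00, 10.03, 10.01, 12.53]
counts = second_order_counts(values)
```

The resulting counts would then be converted to proportions and compared with the first-two digit Benford proportions, in the same way as for the ordinary first-two digits test.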

The results of the second-order test (which analyzes the digits of the differences between data that has been sorted from smallest to largest) should be like the first or second graphs shown below. The second pattern (with spikes at 10, 20, …, 90) could be more or less pronounced, depending on your data. Any pattern other than those shown below indicates some anomalous issue with the data.


The second graph has two different patterns. The first Benford-like pattern applies to the first-two digits of 10, 20, 30, …, 90, and a second Benford-like function applies to the remaining first-two digits. The reason for the spikes at 10, 20, …, 90 is that the numbers in the DataInvoices.xlsx table are tightly packed in the $10.00 to $999.99 range, with almost one-quarter of the differences being 0.01 or 0.02 (giving the spikes at 10 and 20). The mathematical explanation for the systematic spikes is that the table is not made up of numbers from a continuous distribution. Currency amounts can only differ by multiples of $0.01. The second-order results arise because of the high density of the numbers over a short interval and because the numbers are restricted to 100 evenly spaced fractions after the decimal point. These spiked second-order patterns should occur with any discrete data (e.g., population numbers), and the size of the spikes at 10, 20, …, 90 is a function of both N and the range. When many numbers are packed into a small range then many of the differences will be a small value such as 0.01 (for currency) or 1 (for integers).
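The arithmetic behind the spikes can be checked directly. In the illustrative Python sketch below (hypothetical function name and sample data, with amounts held as whole cents to avoid floating-point rounding), a difference of a single unit d has first-two digits d0, so differences of 1 to 9 cents land exactly on 10, 20, …, 90.

```python
def first_two_digits_of_units(units):
    """First-two digits of a positive whole number of units (a single digit d reads as d0)."""
    text = str(units)
    return int((text + "0")[:2])


# Differences of 1 to 9 cents map onto the spike positions 10, 20, ..., 90.
spike_positions = [first_two_digits_of_units(cents) for cents in range(1, 10)]

# Hypothetical tightly packed amounts, in cents: 10.00, 10.01, 10.03, 10.04, 10.09, ...
packed = [1000, 1001, 1003, 1004, 1009, 1010, 1012]
differences = [later - earlier for earlier, later in zip(packed, packed[1:])]
packed_digits = [first_two_digits_of_units(d) for d in differences]
```

Every difference in the packed example is below 10 cents, so all of its first-two digits are multiples of 10. Larger gaps, such as 250 cents, fall on other first-two digits (25 in that case) and produce the second, Benford-like pattern.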

Summary: The second-order test analyzes the digit patterns of the differences between the ordered (ranked) values of a data set. In most cases the digit frequencies of the differences will closely follow Benford’s Law irrespective of the distribution of the underlying data. While the usual Benford’s Law tests are usually only of value on data that is expected to follow Benford’s Law, the second-order test can be performed on any data set. This second-order test could actually return compliant results for data sets with errors or omissions. However, the data issues that the second-order tests did detect in the studies (errors in the download, rounding, and the use of statistically generated numbers) would not have been detectable using the usual descriptive statistics.
