Benford’s Very Strange Law, John D. Barrow


Upload: nerea-atkins

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Benford’s Very Strange Law John D. Barrow

Benford’s Very Strange Law
John D. Barrow

Page 2: Benford’s Very Strange Law John D. Barrow

Simon Newcomb

1835-1909

1888: "We are probably nearing the limit of all we can know about astronomy"

‘Note on the Frequency of Use of the Different Digits in Natural Numbers’, 1881

Page 3: Benford’s Very Strange Law John D. Barrow

Log Tables Yield…

Page 4: Benford’s Very Strange Law John D. Barrow
Page 5: Benford’s Very Strange Law John D. Barrow

Newcomb’s ‘Law’

"That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones.

The first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9."

The law of probability of the occurrence of numbers is such that all mantissae [fractional part] of their logarithms are equally probable.

Page 6: Benford’s Very Strange Law John D. Barrow

Data on first digits are evenly spread on a logarithmic scale, but not on a linear scale, where they become increasingly sparse.

Newcomb said this law was “evident”

$P(d) = \dfrac{\log(d+1) - \log(d)}{\log(10) - \log(1)} = \log_{10}(1 + 1/d)$

Page 7: Benford’s Very Strange Law John D. Barrow

$P(d) = \log_{10}[1 + 1/d]$, d = 1, 2, …, 9

Probability of the First Digit Being Equal to d

Ignore signs and leading zeros and take the first significant digit, e.g. for -3.1526 it is 3

Page 8: Benford’s Very Strange Law John D. Barrow

A Big Surprise

P(1) = 0.30, P(2) = 0.18, P(3) = 0.12, P(4) = 0.10, P(5) = 0.08, P(6) = 0.07, P(7) = 0.06, P(8) = 0.05, P(9) = 0.05

You might have thought P(1) = P(2) = P(3) = … = P(9) = 0.11.. But…
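The listed values follow directly from the first-digit formula. A minimal Python sketch that recomputes them is below; the function name `benford_first_digit` is an illustrative choice, not something from the slides.

```python
import math


def benford_first_digit(d):
    """P(first significant digit = d) = log10(1 + 1/d), for d = 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("d must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Tabulate the nine probabilities to two decimal places.
probs = {d: benford_first_digit(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"P({d}) = {p:.2f}")

# The nine probabilities telescope to log10(10) = 1.
print("total =", round(sum(probs.values()), 12))
```

The uniform guess of 1/9 for every digit would put P(1) near 0.11, well below the 0.30 that the formula gives.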

Page 9: Benford’s Very Strange Law John D. Barrow

Rediscovered by Frank Benford at General Electric in 1938

1883-1948

‘The Law of Anomalous Numbers’ (1938)

$P(d) = \log_{10}[1 + 1/d]$ first-digit distribution

then becomes known as

“Benford’s Law”

Page 10: Benford’s Very Strange Law John D. Barrow

Benford gathered 20,000 pieces of data and studied first-digit frequencies (%)

Data             1      2      3      4      5      6      7      8      9
River areas     31.0   16.4   10.7   11.3    7.2    8.6    5.5    4.2    5.1
Baseball        32.7   17.6   12.6    9.8    7.4    6.4    4.9    5.6    3.0
Magazines       33.4   18.5   12.4    7.5    7.1    6.5    5.5    4.9    4.2
Powers of 2     30     17     13     10      7      7      6      6      5
20 tables       30.6   18.5   12.4    9.4    8.0    6.4    5.1    4.9    4.7
Half-lives      29.6   17.8    1.7   10.5    9.9    4.8    5.2    5.2    5.2
Benford Law     30.1   17.6   12.5    9.7    7.9    6.7    5.8    5.1    4.6
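The "Powers of 2" row is the easiest one to reproduce. The sketch below counts leading digits of 2^n and sets them beside the Benford values; using the first 1000 powers is an assumption made here for illustration, since the slide does not say how many were used.

```python
import math
from collections import Counter


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def power_of_two_frequencies(count):
    """Leading-digit frequencies of 2**0, 2**1, ..., 2**(count - 1)."""
    counts = Counter(first_digit(2 ** n) for n in range(count))
    return {d: counts[d] / count for d in range(1, 10)}


freqs = power_of_two_frequencies(1000)
print("digit  powers of 2  Benford")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d:>5}  {100 * freqs[d]:10.1f}%  {100 * benford:6.1f}%")
```

Python integers are exact at any size, so no rounding enters the leading-digit extraction.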

Page 11: Benford’s Very Strange Law John D. Barrow

Random street addresses

Page 12: Benford’s Very Strange Law John D. Barrow
Page 13: Benford’s Very Strange Law John D. Barrow
Page 14: Benford’s Very Strange Law John D. Barrow

Picking Raffle Tickets

With 2 tickets P(1) = 1/2; with 3 tickets P(1) = 1/3

With 5 tickets P(1) = 1/5; with 9 tickets P(1) = 1/9

P(1) goes up as we go to 19 tickets, then falls
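The raffle picture can be checked by direct counting. The sketch below assumes tickets are numbered 1 to N and one is drawn uniformly at random; the function name is illustrative.

```python
from fractions import Fraction


def prob_first_digit_one(num_tickets):
    """Chance that a ticket drawn uniformly from 1..num_tickets starts with 1."""
    starts_with_one = sum(
        1 for ticket in range(1, num_tickets + 1) if str(ticket)[0] == "1"
    )
    return Fraction(starts_with_one, num_tickets)


# P(1) falls from 1 to 1/9 as N goes to 9, climbs to 11/19 at N = 19,
# then falls back to 1/9 at N = 99 before climbing again.
for n in (2, 3, 5, 9, 19, 99, 199):
    p = prob_first_digit_one(n)
    print(f"N = {n:>3}: P(1) = {p} = {float(p):.3f}")
```

Exact fractions are used so that the values can be compared with the slide without rounding.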

Page 15: Benford’s Very Strange Law John D. Barrow

P(1) depends on the number of tickets

Take an average over all possible numbers of tickets

The average is 30.1%

[Plots: P(1) against number of tickets. Credit: S. Mould]

Page 16: Benford’s Very Strange Law John D. Barrow

Universal distribution P(x) for numbers with units means it must be scale invariant:

$P(kx) = f(k)\,P(x)$

Since $\int P(x)\,dx = 1$ we must have $\int P(kx)\,dx = 1/k$, so $1/k = \int P(kx)\,dx = f(k)\int P(x)\,dx = f(k)$

Means $f(k) = 1/k$

Take d/dk of $P(kx) = f(k)\,P(x)$: $\dfrac{dP(kx)}{d(kx)}\,\dfrac{d(kx)}{dk} = x\,\dfrac{dP(kx)}{d(kx)} = -\dfrac{P(x)}{k^2}$

Put k = 1: $x\,dP/dx = -P(x)$, which means $P(x) \propto 1/x$

In reality we won’t go to zero or infinity so don’t worry about $\int_0^\infty (1/x)\,dx$ being infinite
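A numerical check of the scale-invariance argument is sketched below. It builds a deterministic sample whose density is proportional to 1/x over six decades (an assumed range, chosen here only for illustration), rescales it by a few unit-conversion factors, and compares the leading-digit frequencies with Benford's values.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading digit of a positive number, read off the mantissa of log10(x)."""
    mantissa = math.log10(x) % 1.0
    return min(9, int(10 ** mantissa))


def digit_frequencies(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for x in values:
        counts[first_digit(x)] += 1
    return [c / len(values) for c in counts[1:]]


# Evenly spaced points in log10(x) between 0 and 6 have density ~ 1/x in x.
n = 60000
sample = [10 ** (6 * (i + 0.5) / n) for i in range(n)]

# Changing units multiplies every value by a constant k.
for k in (1.0, 2.54, 0.3048):
    freqs = digit_frequencies([k * x for x in sample])
    worst = max(abs(f - b) for f, b in zip(freqs, BENFORD))
    print(f"k = {k:<6}  P(1) = {freqs[0]:.4f}  max deviation = {worst:.5f}")
```

An evenly spaced grid is used instead of random draws so that the result is reproducible; the small remaining deviation comes from the finite grid spacing.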

Page 17: Benford’s Very Strange Law John D. Barrow

By the same kind of analysis we can determine the probability that the second digit will have a certain value.

It's only necessary to consider a single order of magnitude, since the pattern is repeated on each order.

For example, in base 10, the probability of the second digit being "3" is equal to the sum of the probabilities of the first two digits being "1.3", "2.3", "3.3", ... or "9.3" for numbers in the range from 1 to 10.

This is indicated by the shaded regions in the logarithmic scale:

The fraction in 1.3 to 1.4 is $\log_{10}(1.4) - \log_{10}(1.3) = \log_{10}(14/13) \approx 0.032$

Now just find the fractions in 2.3 to 2.4 etc. and add all the answers together

Other Digits

Page 18: Benford’s Very Strange Law John D. Barrow

Probabilities for Successive Significant Digits

$P(\text{first digit is } d) = \log_{10}[1 + 1/d]$, d = 1, 2, 3, …, 9.

$P(\text{second digit is } d) = \sum_{k=1}^{9} \log_{10}[1 + (10k+d)^{-1}]$, d = 0, 1, 2, …, 9.

The joint distribution of all digits can be found and they are not independent

$P(\text{first} = d_1, \ldots, k\text{th} = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i\,10^{k-i}\right)^{-1}\right]$

E.g. for 0.314: $P(3,1,4) = \log_{10}[1 + (314)^{-1}] = 0.0014..$

Unconditional probability that the second digit is 2 is P(second digit = 2) = 0.109, but the conditional probability that it is 2 given that the first is 1 is 0.115.

Dependence falls off fast as the distance between digits increases.

Distribution of the nth digit approaches a uniform distribution on 0, 1, 2, …, 9 very fast as n → ∞, so P → 1/10 for occurrence of each of 0, 1, 2, …, 9, as log(1 + 1/n) ≈ 1/n
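The joint-digit formula on this slide is enough to compute second-digit and conditional probabilities. The sketch below is one way to do so; the helper names are illustrative and not from the slides.

```python
import math


def joint_leading_digits(digits):
    """P(first k significant digits are d1..dk) = log10(1 + 1/m),
    where m is the integer written with those digits (d1 must be non-zero)."""
    m = int("".join(str(d) for d in digits))
    return math.log10(1 + 1 / m)


def second_digit(d):
    """Unconditional P(second digit = d): sum over first digits k = 1..9."""
    return sum(joint_leading_digits((k, d)) for k in range(1, 10))


def second_given_first(second, first):
    """Conditional P(second digit = second | first digit = first)."""
    return joint_leading_digits((first, second)) / joint_leading_digits((first,))


print("P(3,1,4) =", round(joint_leading_digits((3, 1, 4)), 5))
for d in range(10):
    print(f"second digit {d}: unconditional {second_digit(d):.4f}, "
          f"given first digit 1: {second_given_first(d, 1):.4f}")
```

Comparing the two columns shows the dependence between digits: after a leading 1, low second digits are somewhat more likely than they are unconditionally.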

(Newcomb)

Page 19: Benford’s Very Strange Law John D. Barrow

Invariances Pick Out Benford

Scale invariance – no preferred units

Base invariance wrt base of arithmetic b

$P(d) = \log_b(1 + 1/d)$

But why should there be a distribution like this at all?
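The base-b form of the law on this slide can be tabulated in a few lines. In this sketch the digit d runs from 1 to b - 1, and the choice of bases shown is arbitrary.

```python
import math


def benford_in_base(d, base):
    """P(first digit = d) in a given base: log_base(1 + 1/d), for d = 1..base-1."""
    if not 1 <= d <= base - 1:
        raise ValueError("d must be between 1 and base - 1")
    return math.log(1 + 1 / d, base)


for base in (2, 8, 10, 16):
    probs = [benford_in_base(d, base) for d in range(1, base)]
    print(f"base {base:>2}: P(1) = {probs[0]:.3f}, total = {sum(probs):.6f}")
```

In base 2 every non-zero number starts with 1, and the formula duly gives P(1) = 1.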

Page 20: Benford’s Very Strange Law John D. Barrow

Do All First-Digit Distributions Follow Newcomb-Benford?

Page 21: Benford’s Very Strange Law John D. Barrow

[Figure panels: US tax return data; Random number generator]

Page 22: Benford’s Very Strange Law John D. Barrow

Not Everything Follows Benford

Continued fraction digits are mostly 1’s in general but they are not Benford-Newcomb-like

a = k + x = integer + fractional part

For almost all real numbers:

$P(k) = \ln[1 + 1/(k(k + 2))]/\ln 2$

P(1) = 0.41, P(2) = 0.17, P(3) = 0.09, P(4) = 0.06, P(5) = 0.04

Steeper than Benford: $P(k) \propto 1/k^2$ as $k \to \infty$, since $\ln(1+x) \approx x$
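The continued-fraction digit distribution above can be evaluated directly and set beside Benford's first-digit values. This is a small sketch; the name `gauss_kuzmin` is the usual name for this distribution rather than a label used on the slide.

```python
import math


def gauss_kuzmin(k):
    """P(continued-fraction digit = k) = ln(1 + 1/(k(k+2))) / ln 2, for k >= 1."""
    return math.log2(1 + 1 / (k * (k + 2)))


print("k   continued fraction   Benford first digit")
for k in range(1, 10):
    print(f"{k}   {gauss_kuzmin(k):.4f}               {math.log10(1 + 1 / k):.4f}")

# For large k, ln(1 + x) ~ x gives P(k) ~ 1 / (k^2 ln 2).
k = 1000
print("k^2 ln2 P(k) at k = 1000:", round(k * k * math.log(2) * gauss_kuzmin(k), 4))
```

Unlike first digits, continued-fraction digits are unbounded, so the comparison with Benford only makes sense for k from 1 to 9.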

Page 23: Benford’s Very Strange Law John D. Barrow

First digits are Benford-Newcomb distributed so long as
• Data measure same phenomena (e.g. all prices or areas)
• There are no built-in max or min values
• The numbers are not assigned (like phone nos)
• The underlying distribution is fairly smooth
• More observations of small items than large ones
• Data spans several whole numbers on the log scale:

* The distribution must be broad rather than narrow *

Broad: ratios of areas proportional to widths, e.g. incomes, populations

Narrow: ratios of areas not proportional to widths, e.g. human heights, IQ scores

Red area is relative probability that first digit is 1

Blue area is relative probability that first digit is 8
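The broad-versus-narrow contrast can be illustrated with two made-up distributions. In the sketch below both sets of parameters are assumptions chosen for illustration, not data from the slides: a "broad" quantity whose log10 is normal with a spread of one decade, and a "narrow" one resembling heights in centimetres. Evenly spaced quantiles stand in for random draws so the output is reproducible.

```python
import math
from statistics import NormalDist


def first_digit(x):
    """Leading digit of a positive number, read off the mantissa of log10(x)."""
    return min(9, int(10 ** (math.log10(x) % 1.0)))


def digit_frequencies(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for x in values:
        counts[first_digit(x)] += 1
    return [c / len(values) for c in counts[1:]]


def quantile_sample(dist, n):
    """n evenly spaced quantiles of dist, a deterministic stand-in for draws."""
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


n = 20000
# Hypothetical broad data: log10(x) ~ Normal(4, 1), spanning many decades.
broad = [10 ** u for u in quantile_sample(NormalDist(mu=4.0, sigma=1.0), n)]
# Hypothetical narrow data: heights in cm ~ Normal(170, 10).
narrow = quantile_sample(NormalDist(mu=170.0, sigma=10.0), n)

broad_freqs = digit_frequencies(broad)
narrow_freqs = digit_frequencies(narrow)
print("digit  broad   narrow  Benford")
for d in range(1, 10):
    print(f"{d:>5}  {broad_freqs[d - 1]:.3f}   {narrow_freqs[d - 1]:.3f}   "
          f"{math.log10(1 + 1 / d):.3f}")
```

Under these assumptions the broad sample tracks Benford closely, while almost every height starts with 1 simply because the values sit between 100 and 200.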

Page 24: Benford’s Very Strange Law John D. Barrow

Different Types of Data

[Four example data sets, labelled Benford-like? yes, yes, no, yes]

Page 25: Benford’s Very Strange Law John D. Barrow

Winning Lotteries

The Massachusetts Numbers Game – State Lottery

1. Bet on a 4-digit number

2. A 4-digit number is generated randomly

3. All winners share the jackpot

A Possible Strategy

To avoid sharing the prize, assume entrants pick numbers from their experience (i.e. not at random) and obey Benford’s law. So pick numbers that are least probable by the Benford-Newcomb law, so start with 9’s and 8’s

Evidence (Hill 1988) that numbers ‘randomly’ chosen by people tend to start with low digits

Page 26: Benford’s Very Strange Law John D. Barrow

Generalised Benford’s Laws

A random process with probability distribution $P(x) \propto 1/x$ gives Benford data for first digits:

$P(d) = \log_{10}[1 + 1/d]$

Random processes with $P(x) \propto 1/x^a$ and $a \neq 1$ give

$P(d) = C \int_d^{d+1} x^{-a}\,dx = (10^{1-a} - 1)^{-1}\,[(d+1)^{1-a} - d^{1-a}]$

For a = 2: P(d = 1) = 0.56, P(d = 2) = 0.185, P(d = 3) = 0.09, P(d = 9) = 0.012

For prime numbers from 1 to N

$a(N) = 1/[\log N - c]$

c = 1.10 ± 0.05 for large N (Perone et al)
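The generalised first-digit formula for a power-law density can be evaluated as a check on the a = 2 values quoted above. This is a minimal sketch; the function name is illustrative, and the a = 1 branch simply falls back to the ordinary Benford formula, which is the limit of the general expression.

```python
import math


def generalised_benford(d, a):
    """First-digit probability for a density proportional to x**(-a) on [1, 10)."""
    if a == 1:
        return math.log10(1 + 1 / d)  # limiting case: ordinary Benford law
    e = 1 - a
    return ((d + 1) ** e - d ** e) / (10 ** e - 1)


for a in (0.0, 0.5, 1.0, 2.0):
    row = ", ".join(f"{generalised_benford(d, a):.3f}" for d in range(1, 10))
    print(f"a = {a}: {row}")
```

Setting a = 0 gives the uniform value 1/9 for every digit, while larger a concentrates more of the probability on the digit 1.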

Page 27: Benford’s Very Strange Law John D. Barrow

A Well-defined Approach to Uniformity by the Primes

[Plot labelled a = 1.10. Credit: Christian Perone]

Page 28: Benford’s Very Strange Law John D. Barrow

Detecting Fraud

‘Natural’ distributions and their combinations should follow Benford
Maybe ‘doctored’ or ‘artificial’ constructions do not ???

Mark Nigrini, Univ. Cincinnati PhD thesis (1992)
‘The detection of income evasion through an analysis of digital distributions’

Data from the lines of 169,662 IRS model files follow Benford's law closely.

Fraudulent data taken from a 1995 Kings County, New York, District Attorney's Office study of cash disbursement and payroll in business don’t follow Benford's law.

The fraudulent or concocted data appear to have far fewer numbers starting with 1 and many more starting with 5 or 6 than do true data.
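One simple way to quantify "does not follow Benford" is a chi-square statistic on the first-digit counts. The sketch below is a generic goodness-of-fit check, not Nigrini's program, and both data sets in it are stand-ins invented for illustration: powers of 2 play the role of natural-looking amounts, and the integers 100 to 999 play the role of figures whose leading digits are all equally common.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(amount):
    """Leading digit of a non-zero integer amount, ignoring its sign."""
    return int(str(abs(amount))[0])


def benford_chi_square(amounts):
    """Chi-square statistic of the first-digit counts against Benford's law."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    return sum(
        (counts[d] - n * p) ** 2 / (n * p)
        for d, p in zip(range(1, 10), BENFORD)
    )


natural_like = [2 ** n for n in range(500)]   # stand-in for Benford-like data
invented = list(range(100, 1000))             # every leading digit equally often

# With 9 digit classes there are 8 degrees of freedom; the 5% critical
# value of the chi-square distribution for 8 degrees of freedom is 15.51.
CRITICAL_5_PERCENT = 15.51
for name, data in (("natural-like", natural_like), ("invented", invented)):
    stat = benford_chi_square(data)
    verdict = "consistent with Benford" if stat < CRITICAL_5_PERCENT else "flag for review"
    print(f"{name}: chi-square = {stat:.1f} -> {verdict}")
```

A large statistic only says the digits are unusual; as the earlier slides note, data with built-in limits or assigned numbers can fail the test for innocent reasons.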

Page 29: Benford’s Very Strange Law John D. Barrow
Page 30: Benford’s Very Strange Law John D. Barrow

Robert Burton, the chief financial investigator for the Brooklyn District Attorney, recalled in an interview that he had read an article by Dr. Nigrini that fascinated him.

"He had done his Ph.D. dissertation on the potential use of Benford's Law to detect tax evasion, and I got in touch with him in what turned out to be a mutually beneficial relationship," Mr. Burton said. "Our office had handled seven cases of admitted fraud, and we used them as a test of Dr. Nigrini's computer program. It correctly spotted all seven cases as 'involving probable fraud.'"

Forensic Accounting with Newcomb-Benford

He feels your pain

Page 31: Benford’s Very Strange Law John D. Barrow

President Clinton’s Tax Returns over 13 Years