
Page 1: Scan Statistics on Enron Graphs - Johns Hopkins Applied ...priebe/.FILES/cepssg072105.pdf · Scan Statistics on Enron Graphs Carey E. Priebe cep@jhu.edu Johns Hopkins University Department


Scan Statistics on Enron Graphs

Carey E. Priebe
cep@jhu.edu

Johns Hopkins University

Department of Applied Mathematics & Statistics
Department of Computer Science

Center for Imaging Science

“The wealth of your practical experience with sane and interesting problems
will give to mathematics a new direction and a new impetus.”

— Leopold Kronecker to Hermann von Helmholtz

Scan Statistics on Enron Graphs – p.1/39


Citation

C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park,
“Scan Statistics on Enron Graphs,”
Computational and Mathematical Organization Theory,
to appear.

SIAM International Conference on Data Mining
Workshop on Link Analysis, Counterterrorism and Security
Newport Beach, California
April 23, 2005

http://www.ams.jhu.edu/~priebe/sseg.html


Outline

• Scan Statistics on Graphs, and on Time Series thereof
• Scan Statistics on Enron Graphs


Introduction

Objective: to develop and apply a theory of scan statistics on random graphs to perform change point / anomaly detection in graphs and in time series thereof.

Hypotheses:
H0: homogeneity
HA: local subregion of excessive activity


Scan Statistics

“moving window analysis”: to scan a small “window” (scan region) over data, calculating some locality statistic for each window; e.g.,
• number of events for a point pattern,
• average pixel value for an image,
• ...

scan statistic ≡ maximum of locality statistic:
If the maximum of the observed locality statistics is large, then the inference can be made that there exists a subregion of excessive activity.
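The moving-window idea can be illustrated with a minimal sketch for a one-dimensional point pattern that has already been binned into event counts. The function names `window_counts` and `scan_statistic_1d`, the binning, and the fixed window width are assumptions made for illustration; they do not come from the slides.

```python
def window_counts(counts, width):
    """Locality statistic: number of events in each window of `width` consecutive bins."""
    return [sum(counts[i:i + width]) for i in range(len(counts) - width + 1)]


def scan_statistic_1d(counts, width):
    """Scan statistic: the maximum locality statistic and the window start attaining it."""
    stats = window_counts(counts, width)
    best = max(range(len(stats)), key=stats.__getitem__)
    return stats[best], best


# Hypothetical binned point pattern with a burst of events in the middle.
events_per_bin = [0, 1, 0, 2, 5, 4, 1, 0, 0, 1]
maximum, start = scan_statistic_1d(events_per_bin, width=3)
print(maximum, start)
```

Whether the maximum is "large" still has to be judged against its distribution under H0, which is what the Monte Carlo slides address.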


Scan Statistics on Graphs

directed graph (digraph): $D = (V, A)$

order: $|V(D)|$

size: $|A(D)|$

k-th order neighborhood of v: $N_k[v; D] = \{w \in V(D) : d(v, w) \le k\}$

scan region (example: induced subdigraph): $\Omega(N_k[v; D])$

locality statistic (example: size): $\Psi_k(v) = |A(\Omega(N_k[v; D]))|$

“scale-specific” scan statistic: $M_k(D) = \max_{v \in V(D)} \Psi_k(v)$
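These definitions can be sketched in plain Python, representing a digraph as a collection of (v, w) arcs. The function names are hypothetical. The sketch assumes d(v, w) is the directed distance from v to w, and for k = 0 it uses the outdegree, following the convention stated on the later "Statistics and Time Series" slide.

```python
from collections import deque


def neighborhood(arcs, v, k):
    """N_k[v; D]: all vertices within directed distance k of v (v included)."""
    successors = {}
    for a, b in arcs:
        successors.setdefault(a, set()).add(b)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond distance k
        for w in successors.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def locality_statistic(arcs, v, k):
    """Psi_k(v): number of arcs in the subdigraph induced by N_k[v; D]."""
    arcs = set(arcs)
    if k == 0:
        # Convention from the slides: the scale-0 statistic is the outdegree.
        return sum(1 for a, _ in arcs if a == v)
    nbhd = neighborhood(arcs, v, k)
    return sum(1 for a, b in arcs if a in nbhd and b in nbhd)


def scan_statistic(vertices, arcs, k):
    """M_k(D): maximum locality statistic over all vertices."""
    return max(locality_statistic(arcs, v, k) for v in vertices)


# Small hypothetical digraph.
V = [1, 2, 3, 4, 5]
A = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 5)]
print([scan_statistic(V, A, k) for k in (0, 1, 2)])
```

Recomputing the neighborhood for every vertex is quadratic in the worst case; that is acceptable for a graph of 184 vertices but would need caching at larger scale.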


HA1

[Figure 1: Alternative random graph hypothesis HA1. Labeled arc probabilities: $p_{00}$; $p_{AA} > p_{00}$; $p_{0A} = p_{A0} = p_{00}$.]
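One way to read this alternative is as a random digraph in which arcs inside a small anomalous vertex subset appear with probability p_AA, while every other arc appears with the null probability p_00. The sketch below samples such a graph under that reading. The function name, the choice of the first `n_anom` vertices as the anomalous subset, and the exclusion of self-loops are assumptions for illustration.

```python
import random


def sample_digraph(n, p00, n_anom=0, p_aa=None, seed=None):
    """Sample arcs on vertices 0..n-1.

    Arcs between two vertices of the anomalous subset (assumed here to be
    vertices 0..n_anom-1) appear with probability p_aa; all other arcs,
    including those crossing between the subsets, appear with probability p00.
    With n_anom = 0 this is the homogeneous null H0.
    """
    rng = random.Random(seed)
    anomalous = set(range(n_anom))
    arcs = set()
    for v in range(n):
        for w in range(n):
            if v == w:
                continue  # no self-loops (assumption)
            p = p_aa if (v in anomalous and w in anomalous) else p00
            if rng.random() < p:
                arcs.add((v, w))
    return arcs


# Hypothetical parameters, chosen only for illustration.
null_graph = sample_digraph(50, 0.01, seed=1)
alt_graph = sample_digraph(50, 0.01, n_anom=4, p_aa=0.5, seed=1)
print(len(null_graph), len(alt_graph))
```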


Monte Carlo Simulation

[Figure: Monte Carlo densities of T1, T2, T3 under H0 and under HA, with one value marked on each axis.]

n = 1000, n′ = 13, α = 0.05, MC = 1000

statistic     marked value   unrandomized size   randomized power β
T1 = size     19795          0.05                0.160
T2 = scan0    65             0.045               0.154
T3 = scan1    153            0.046               1.000
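The "unrandomized size" and "randomized power" figures are consistent with a standard randomized test for a discrete statistic, sketched below under that assumption. The slides do not spell out the procedure, so the function name and the exact construction are illustrative. Given Monte Carlo draws of a statistic under H0 and under HA, the sketch picks the smallest critical value c with P0(T > c) ≤ α, then rejects with probability γ when T = c so that the size is exactly α.

```python
def randomized_test(null_samples, alt_samples, alpha):
    """Return (c, unrandomized size, gamma, randomized power) from Monte Carlo draws."""
    m0 = len(null_samples)
    # Smallest observed value c whose strict upper tail under H0 is at most alpha.
    for c in sorted(set(null_samples)):
        tail = sum(t > c for t in null_samples) / m0
        if tail <= alpha:
            break
    at_c = sum(t == c for t in null_samples) / m0
    gamma = (alpha - tail) / at_c  # rejection probability when T == c
    m1 = len(alt_samples)
    power = (sum(t > c for t in alt_samples)
             + gamma * sum(t == c for t in alt_samples)) / m1
    return c, tail, gamma, power


# Hypothetical Monte Carlo draws of an integer-valued statistic.
null_draws = [1] * 50 + [2] * 30 + [3] * 16 + [4] * 4
alt_draws = [3] * 40 + [4] * 60
print(randomized_test(null_draws, alt_draws, alpha=0.05))
```

In the simulations on the slides, the draws would come from evaluating size, scan0, or scan1 on MC = 1000 sampled graphs per hypothesis.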


Gumbel Conjecture

[Figure: Monte Carlo densities of T1 = size, T2 = scan0, T3 = scan1.]

n = 1000, MC = 1000


Example: H0

size = 26, scan0 = 5, scan1 = 5, scan2 = 11.


Example: HA1

size = 32, scan0 = 5, scan1 = 7, scan2 = 11.


Example: Monte Carlo Simulation H0 vs HA1

[Figure: Monte Carlo densities of T1, T2, T3, T4 under H0 and under HA, with one value marked on each axis.]

n = 50, n′ = 4, α = 0.05, MC = 1000

statistic     marked value   unrandomized size   randomized power β
T1 = size     35             0.044               0.273
T2 = scan0    5              0.032               0.350
T3 = scan1    6              0.018               0.976
T4 = scan2    13             0.037               0.451


HA2

consider now a structured alternative . . .


Example: HA2

size = 34, scan0 = 5, scan1 = 5, scan2 = 17.



Example: Monte Carlo Simulation H0 vs HA2

[Figure: Monte Carlo densities of T1, T2, T3, T4 under H0 and under HA, with one value marked on each axis.]

n = 50, n′ = 13, α = 0.05, MC = 1000

statistic     marked value   unrandomized size   randomized power β
T1 = size     35             0.044               0.633
T2 = scan0    5              0.032               0.545
T3 = scan1    6              0.018               0.501
T4 = scan2    13             0.037               0.915


Time Series

[Figure: T1 = size, T2 = scan0, and T3 = scan1 versus time (0 to 100).]

n = 100, n′ = 4, α = 0.05


Time Series

[Figure: T1 = size, T2 = scan0, and T3 = scan1 versus time (0 to 100).]

n = 100, n′ = 6, α = 0.05


Time Series

[Figure: $F_Z(z)$ versus $z$ (0 to 100), one panel for n′ = 4 and one for n′ = 6.]

n = 100, n′ = 4 & 6, α = 0.05


Enron


Enron Data

125,409 distinct messages from 184 unique “From” fields, mostly Enron executives.

189 weeks, from 1998 through 2002.

directed edges (arcs) $A_t = \{(v, w)\}$: vertex v sends at least one email to vertex w during the t-th week (“To”, “CC”, or “BCC”)
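The construction of the weekly arc sets can be sketched as follows. The record layout `(week, sender, recipients)` and the function name are hypothetical, and dropping self-addressed messages is an assumption. Recipients from the “To”, “CC”, and “BCC” fields are assumed to have been pooled into a single list beforehand.

```python
from collections import defaultdict


def weekly_arcs(messages):
    """Map each week t to the arc set A_t.

    (v, w) is in A_t when v sent at least one message to w during week t,
    so repeated messages between the same pair add nothing.
    """
    arcs = defaultdict(set)
    for week, sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed mail (assumption)
                arcs[week].add((sender, recipient))
    return dict(arcs)


# Hypothetical records, not taken from the Enron corpus.
records = [
    (1, "a", ["b", "c"]),
    (1, "a", ["b"]),
    (2, "b", ["a", "b"]),
]
print(weekly_arcs(records))
```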


Statistics and Time Series

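scale-k locality statistics: $\Psi_{k,t}(v) = |A(\Omega(N_k[v; D_t]))|$

k = 0: $\Psi_{0,t}(v) = \mathrm{outdegree}(v; D_t)$.

scan statistic: $M_{k,t} = \max_v \Psi_{k,t}(v)$; k = 0, 1, 2

vertex-dependent standardized locality statistic:
$\widetilde{\Psi}_{k,t}(v) = (\Psi_{k,t}(v) - \mu_{k,t,\tau}(v)) / \max(\sigma_{k,t,\tau}(v), 1)$

$\mu_{k,t,\tau}(v) = \frac{1}{\tau} \sum_{t'=t-\tau}^{t-1} \Psi_{k,t'}(v)$

$\sigma^2_{k,t,\tau}(v) = \frac{1}{\tau-1} \sum_{t'=t-\tau}^{t-1} (\Psi_{k,t'}(v) - \mu_{k,t,\tau}(v))^2$

standardized scan statistic: $\widetilde{M}_{k,t} = \max_v \widetilde{\Psi}_{k,t}(v)$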
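The vertex-dependent standardization can be sketched directly from these formulas. The function names and the dictionary layout (one list of locality statistics per vertex, indexed by week) are assumptions for illustration; `statistics.stdev` uses the τ − 1 divisor that appears in the variance formula.

```python
from statistics import mean, stdev


def standardized(history, current):
    """(Psi_t - mu) / max(sigma, 1), with mu and sigma taken from the previous tau values.

    `history` holds Psi_{k,t'}(v) for t' = t - tau, ..., t - 1 (tau >= 2).
    """
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation, divisor tau - 1
    return (current - mu) / max(sigma, 1.0)


def standardized_scan(psi, t, tau):
    """Maximum standardized locality statistic over vertices at time t."""
    return max(standardized(series[t - tau:t], series[t]) for series in psi.values())


# Hypothetical locality-statistic series for two vertices.
psi = {"a": [2, 4, 4, 6, 10], "b": [3, 3, 3, 3, 5]}
print(standardized_scan(psi, t=4, tau=4))
```

The max(σ, 1) floor keeps a vertex with a constant history from producing an unbounded standardized value, as vertex "b" shows.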


Statistics and Time Series

[Figure: raw scan statistics (size, scan0, scan1, scan2) versus time (mm/yy), 11/98 to 03/02.]

[Figure: standardized scan statistics (scan0, scan1, scan2) versus time (mm/yy), 11/98 to 03/02.]


Anomaly Detection

temporally-normalized scan statistics: $S_{k,t} = (M_{k,t} - \mu_{k,t,\ell}) / \max(\sigma_{k,t,\ell}, 1)$

detection: time t such that $S_{k,t} > 5$

$t^* = 132$ (May, 2001)

[Figure: three panels labeled scan0, scan1, scan2 versus time (mm/yy), 02/01 to 06/01.]
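The detection rule can be sketched as below. The slides do not define μ and σ with the ℓ subscript, so the sketch assumes they are the running mean and sample standard deviation of the same scan-statistic series over the previous ℓ time steps, by analogy with the vertex-level definitions. The function names and the example series are hypothetical.

```python
from statistics import mean, stdev


def temporally_normalized(series, ell):
    """S_t = (M_t - mu_t) / max(sigma_t, 1) for every t with ell earlier values (ell >= 2)."""
    out = {}
    for t in range(ell, len(series)):
        past = series[t - ell:t]
        out[t] = (series[t] - mean(past)) / max(stdev(past), 1.0)
    return out


def detections(series, ell, threshold=5.0):
    """Times t at which the temporally-normalized statistic exceeds the threshold."""
    return [t for t, s in temporally_normalized(series, ell).items() if s > threshold]


# Hypothetical scan-statistic series with one spike at t = 5.
scan_series = [10, 12, 11, 9, 10, 30, 11]
print(detections(scan_series, ell=4))
```

Note that the spike enters the history window for later weeks and inflates σ there, which is one reason a short window can mask back-to-back anomalies.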


Detection Graph D132

$\arg\max_v \Psi_{0,132}(v)$ = john.lavorato

$\arg\max_v \Psi_{1,132}(v)$ = john.lavorato

$\arg\max_v \Psi_{2,132}(v)$ = richard.shapiro

$\arg\max_v \widetilde{\Psi}_{0,132}(v)$ = richard.shapiro

$\arg\max_v \widetilde{\Psi}_{1,132}(v)$ = joannie.williamson

$\arg\max_v \widetilde{\Psi}_{2,132}(v)$ = k..allen

[Figure: detection graph $D_{132}$; vertices are labeled with email user names such as john.lavorato, richard.shapiro, joannie.williamson, and k..allen.]


Detection Graph D132

Details for the ‘detection’ graph $D_{132}$

time $t^*$           132 (week of May 17, 2001)
size($D_{132}$)      267

scale k   $M_{k,132}$   $\widetilde{M}_{k,132}$   $S_{k,132}$
0         66            8.3                       0.32
1         93            7.8                       −0.35
2         172           116.0                     7.30
3         219           174.0                     5.20

number of isolates   50


Anomaly Detection (Aliasing)

$v^* = \arg\max_v \widetilde{\Psi}_{2,132}(v)$ = k..allen

k..allen == phillip.allen?
k..allen had no activity before $t^* = 132$.
At $t^* = 132$, phillip.allen switched to k..allen.

Matched Filter:
For each vertex $v \in V \setminus \{v^*\}$,

$s_{t^*,\kappa}(v; v^*) = \sum_{t'=t^*-\kappa}^{t^*-1} |N_1(v; D_{t'}) \cap N_1(v^*; D_{t^*})|$

Is this a detection we want?
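The matched filter above can be sketched as follows. The sketch reads N_1(v; D) as the set of vertices that v sends to in D, excluding v itself. That reading of the parenthesized notation is an assumption, as are the function names and the toy data. A vertex whose past correspondents overlap heavily with those of v* at t* receives a high score and is a candidate alias.

```python
def out_neighbors(arcs, v):
    """Vertices that v has an arc to (assumed reading of N_1(v; D))."""
    return {w for a, w in arcs if a == v and w != v}


def matched_filter(graphs, v_star, t_star, kappa):
    """Score every other vertex by its neighborhood overlap with v_star.

    `graphs` maps week -> set of (v, w) arcs. The score sums, over the kappa
    weeks before t_star, the overlap between v's neighbors in that week and
    v_star's neighbors at t_star.
    """
    target = out_neighbors(graphs[t_star], v_star)
    vertices = {u for arcs in graphs.values() for arc in arcs for u in arc} - {v_star}
    return {
        v: sum(
            len(out_neighbors(graphs.get(t, set()), v) & target)
            for t in range(t_star - kappa, t_star)
        )
        for v in vertices
    }


# Hypothetical data: "k" appears at week 3 with the correspondents "p" used to have.
graphs = {
    1: {("p", "x"), ("p", "y"), ("q", "x")},
    2: {("p", "x"), ("p", "z")},
    3: {("k", "x"), ("k", "y"), ("k", "z")},
}
scores = matched_filter(graphs, v_star="k", t_star=3, kappa=2)
print(max(scores, key=scores.get))
```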


New York Times (May 22, 2005)



Enron Timeline


Anomaly Detection (another)

Non-zero activity: $\Psi_{k,t}(v) \cdot I\{\mu_{0,t,\tau}(v) > c\}$

For c = 1, $v^*$ = rod.hayslett at $t^* = 152$.

scale k   $\Psi_{k,t^*-5:t^*}(v^*)$
0         [1, 2, 1, 3, 1, 2]
1         [1, 2, 2, 9, 2, 4]
2         [1, 2, 2, 19, 4, 175]
3         [1, 2, 2, 58, 6, 268]


Anomaly Detection (another)

rod.hayslett communicates with sally.beck, who is a k = 0 detection!

scale k   $\Psi_{k,t^*-5:t^*}(v)$
0         [3, 2, 0, 2, 3, 62]
1         [3, 3, 0, 3, 6, 154]
2         [4, 3, 0, 37, 11, 229]
3         [4, 3, 0, 98, 16, 267]


Anomaly Detection (chatter)

Seek a detection in which the excess activity is due to chatter amongst the 2-neighbors!

$\Psi'_t(v) = (\Psi_{2,t}(v) \cdot I_{t,\tau}(v)) / \max(\gamma_t(v), 1)$

$I_{t,\tau}(v) = I_1 \times I_2 \times I_3$

$I_1 = I\{\mu_{0,t,\tau}(v) > c_1\}$,
$I_2 = I\{\Psi_0(v) < \sigma_{0,t,\tau}(v)\, c_2 + \mu_{0,t,\tau}(v)\}$,
$I_3 = I\{\Psi_1(v) < \sigma_{1,t,\tau}(v)\, c_3 + \mu_{1,t,\tau}(v)\}$.
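The three-part indicator can be sketched as a gate that passes a vertex only when it has non-trivial past activity (I1) and its current scale-0 and scale-1 statistics are not themselves unusually large (I2, I3), so that any excess at scale 2 must come from the 2-neighbors. The function name, the constants, and the example histories are hypothetical. The normalizer γ_t(v) is not defined on the slides and is left out of the sketch.

```python
from statistics import mean, stdev


def chatter_indicator(psi0_hist, psi1_hist, psi0_now, psi1_now, c1, c2, c3):
    """Return 1 when I1, I2 and I3 all hold for a vertex, else 0.

    The histories hold the previous tau values (tau >= 2) of the scale-0 and
    scale-1 locality statistics; the *_now arguments are the current values.
    """
    mu0, sd0 = mean(psi0_hist), stdev(psi0_hist)
    mu1, sd1 = mean(psi1_hist), stdev(psi1_hist)
    i1 = mu0 > c1                      # vertex was active in the past
    i2 = psi0_now < sd0 * c2 + mu0     # no scale-0 excess now
    i3 = psi1_now < sd1 * c3 + mu1     # no scale-1 excess now
    return int(i1 and i2 and i3)


# Hypothetical histories and thresholds.
gate = chatter_indicator([3, 5, 4, 5, 4], [11, 13, 10, 10, 11], 5, 12, c1=1, c2=2, c3=2)
print(gate)
```

Multiplying the scale-2 locality statistic by this gate zeroes out vertices such as a newly appearing alias (fails I1) or a vertex whose own outdegree has jumped (fails I2).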


Anomaly Detection (chatter)

[Figure: $\widetilde{M}'_2$ and $S'_2$ versus time (mm/yy), 11/98 to 03/02.]


Anomaly Detection (chatter)

$(v^*, t^*)$ = (steven.kean, 109)

scale k   $\Psi_{k,t^*-5:t^*}(v^*)$
0         [3, 5, 4, 5, 4, 5]
1         [11, 13, 10, 10, 11, 18]
2         [14, 35, 21, 38, 13, 65]


Anomaly Detection (chatter)

dan.hyvl
danny.mccarty
david.delainey
drew.fossum
james.steffes
jane.tholt
jeff.dasovich
jeffrey.hodge
john.lavorato
kevin.presto
lindy.donoho
mark.haedicke
mark.taylor
phillip.allen
richard.sanders
richard.shapiro
robert.badeer
scott.neal
shelley.corman
stanley.horton
steven.kean
susan.scott

$\Omega_{109}$


Anomaly Detection (chatter)

dan.hyvl
danny.mccarty
david.delainey
drew.fossum
james.steffes
jane.tholt
jeff.dasovich
jeffrey.hodge
john.lavorato
kevin.presto
lindy.donoho
mark.haedicke
mark.taylor
phillip.allen
richard.sanders
richard.shapiro
robert.badeer
scott.neal
shelley.corman
stanley.horton
steven.kean
susan.scott
kenneth.lay

$\Omega_{108}$


Discussion

scan statistics offers promise for detecting anomalies in time series of graphs.

extensions:
• weighted graphs (# of messages),
• coloured hypergraphs (“To”, “CC”, or “BCC”),
• ...
• sliding window (online analysis),
• ...
• exponential smoothing, detrending, variance stabilization, ...

“Content and Scan Statistics for Enron” (John Conroy, Thursday)

