CS 589 Information Risk Management, 6 February 2007
CS 589 Information Risk Management
6 February 2007
Today
• More Bayesian ideas – Empirical Bayes
• Your presentations
• Prior Distributions for selected distribution parameters
• Updating priors → posterior distributions → updated parameter estimates
References
• A. R. Solow, “An Empirical Bayes Analysis of Volcanic Eruptions”, Mathematical Geology, Vol. 33, No. 1, 2001.
• J. Geweke, Contemporary Bayesian Economics and Statistics. Wiley, 2005.
• S. L. Scott, “A Bayesian Paradigm for Designing Intrusion Detection Systems”, Computational Statistics and Data Analysis, Vol. 45, No. 1, 2003.
Why are we doing this?
• Model risks
• Model outcomes
• Use the models in a model of the decision situation to help us rank alternatives
• Gain deeper understanding of the problem and the context of the problem
Basic Relation
The posterior is the likelihood times the prior, normalized:

$$ f(\theta \mid x) = \frac{P(x \mid \theta)\, f(\theta)}{\int P(x \mid \theta)\, f(\theta)\, d\theta} $$

The prior distribution in the numerator should be selected with some care. The distribution in the denominator is known as the predictive distribution.
Recall: Why Bayesian Approach?
• Incorporate prior knowledge into the analysis
• From Scott – synthesize probabilistic information from many sources
• Consider the following exercise:
• P(I) = .01; P(D|I) = .9; P(¬D|¬I) = .95.
• An intrusion alarm goes off. What is the probability that it’s really an intrusion?
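The exercise can be checked by applying Bayes’ rule directly; a minimal sketch using only the numbers given above:

```python
# Bayes' rule for the intrusion-alarm exercise:
# P(I|D) = P(D|I) P(I) / [ P(D|I) P(I) + P(D|~I) P(~I) ]
p_i = 0.01            # prior probability of an intrusion, P(I)
p_d_given_i = 0.90    # detection rate, P(D|I)
p_nd_given_ni = 0.95  # correct-rejection rate, P(~D|~I)

p_d_given_ni = 1 - p_nd_given_ni   # false-alarm rate, P(D|~I) = 0.05
numerator = p_d_given_i * p_i
evidence = numerator + p_d_given_ni * (1 - p_i)
p_i_given_d = numerator / evidence
print(round(p_i_given_d, 4))  # 0.1538 -- most alarms are false alarms
```

The low prior P(I) dominates: even with a 90% detection rate, fewer than one alarm in six signals a real intrusion.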
Posterior Probabilities as a Function of Priors and Conditional Probabilities
[Plot: posterior P(I|D) as a function of the prior P(I), shown for P(D|I) = .95, .9, and .7]
Priors
• The conjugate prior for a Poisson rate parameter is a Gamma distribution
$$ f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\, \beta^{\alpha}}, \qquad x > 0,\ \alpha, \beta > 0 $$

$$ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx $$

$$ E(X) = \alpha \beta $$

(shape-scale form: the mean is αβ)
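As a sanity check on the shape-scale form (mean αβ), the density can be evaluated with nothing beyond the standard library; a sketch:

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density in shape (alpha) / scale (beta) form; mean = alpha * beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Mean of Gamma(16, 0.125) is 16 * 0.125 = 2.0 -- crude numeric check
# by Riemann-summing x * f(x) over (0, 20]:
step = 0.001
mean = sum(x * gamma_pdf(x, 16, 0.125) * step
           for x in (i * step for i in range(1, 20000)))
print(round(mean, 3))  # close to 2.0
```

The same parameterization is assumed in the Gamma(16, .125), Gamma(2, 1), and Gamma(64, .03125) examples that follow, all of which have mean 2.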
Gamma(16, .125)
[Density plot of Gamma(16, .125): 90% of the mass between 1.254 and 2.887]
Gamma(2, 1)
[Density plot of Gamma(2, 1): 90% of the mass between 0.355 and 4.744]
Gamma(64, .03125)
[Density plot of Gamma(64, .03125): 90% of the mass between 1.607 and 2.428]
Gamma Parameters
• How do we pick them?
• Expert
• Data
• Expert + Data
Recall Our Data Example
• Go from Data to Gamma Parameters
• We want to pick parameters that reflect the data
• We will have to use our judgment to decide on a final prior parametric estimate
Events in a 24-Hour Period
[Bar chart: number of events observed in each hour of a 24-hour period]
| Hour | Events | Hour | Events |
|------|--------|------|--------|
| 1    | 3      | 13   | 2      |
| 2    | 1      | 14   | 3      |
| 3    | 5      | 15   | 2      |
| 4    | 6      | 16   | 5      |
| 5    | 2      | 17   | 1      |
| 6    | 3      | 18   | 3      |
| 7    | 1      | 19   | 4      |
| 8    | 0      | 20   | 2      |
| 9    | 3      | 21   | 3      |
| 10   | 5      | 22   | 5      |
| 11   | 4      | 23   | 2      |
| 12   | 1      | 24   | 3      |
Parameterization Ideas
• Distribution Mean = Data Mean
• Equate
– the model’s cumulative/frequency distribution to the empirical data
– the sum of the fitted distribution frequencies to 1
– the sum of absolute differences to 0
• Pick the criterion that fits best
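The first idea (distribution mean = data mean) extends naturally to matching two moments. A minimal sketch using the 24 hourly counts from the example; this is a closed-form moment estimate, not the Solver-based fit described below:

```python
counts = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5, 4, 1,
          2, 3, 2, 5, 1, 3, 4, 2, 3, 5, 2, 3]

n = len(counts)
mean = sum(counts) / n                          # 2.875 events per hour
var = sum((c - mean) ** 2 for c in counts) / n  # population variance

# Match Gamma(alpha, beta) moments: mean = alpha*beta, var = alpha*beta^2
beta = var / mean
alpha = mean / beta
print(round(alpha, 2), round(beta, 3))  # 3.5 0.821
```

The moment estimate lands in the same neighborhood as the optimization-based fits shown later, which is a useful cross-check on any Solver result.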
We can formulate and optimize
• Pick the best parameters given what we know
• I used Excel and the Solver add-in
• Any optimization program will work
• Canned probability functions are preferred …
Use All the Data
• Several reasonable possibilities
• This will matter for updating purposes
• Use all data for the parameter estimate
• Use some of the data to estimate the gamma prior – and therefore the Poisson parameter – and the rest to illustrate the idea of updating the prior
Prior Distribution
• The prior should reflect our degree of certainty, or degree of belief, about the parameter we are estimating
• One way to deal with this is to consider distribution fractiles
• Use fractiles to help us develop the distribution that reflects the synthesis of what we know and what we believe
Prior + Information
• As we collect information, we can update our prior distribution and get a – we hope – more informative posterior distribution
• Recall what the distribution is for – in this case, a view of our parameter of interest
• The posterior mean is now the estimate for the Poisson lambda, and can be used in decision-making
Information
• For our Poisson parameter, information might consist of data similar to what we already collected in our example
• We update the Gamma, take the mean, and that’s our new estimate for the average occurrences of the event per unit of measurement.
Gamma(3.74, .769)
[Density plot of Gamma(3.74, .769): 90% of the mass between 0.936 and 5.676. Sum of absolute differences minimized.]
Gamma(11.1, .259)
[Density plot of Gamma(11.1, .259): 90% of the mass between 1.617 and 4.426]
Updating
• It’s pretty intuitive
• Add the total number of observed events (the hourly intrusion counts) to alpha
• Add the number of observation intervals (here, the number of hours) to beta when beta is a rate
• Be careful with beta: it is sometimes written in inverse (scale) form, as in these slides, in which case adding n to the rate corresponds to a new scale of β/(1 + nβ) rather than β + n
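In the shape-scale form used in these slides, the update is two lines of arithmetic. A sketch using the Gamma(14.404, .202) prior shown later and the last two hourly counts (2 and 3) as new data:

```python
def update_gamma(alpha, beta, observations):
    """Conjugate Poisson update for a Gamma(shape, scale) prior.
    In rate form you would instead add len(observations) to the rate 1/beta."""
    n = len(observations)
    alpha_post = alpha + sum(observations)
    beta_post = beta / (1 + n * beta)   # equivalent to rate -> rate + n
    return alpha_post, beta_post

a, b = update_gamma(14.404, 0.202, [2, 3])
print(round(a, 3), round(b, 3))  # 19.404 0.144, the posterior shown below
print(round(a * b, 2))           # posterior mean E(lambda), about 2.79
```

Note how little two extra observations move the scale once the prior already encodes 22 hours of data.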
Back to our Example
• Use the first 22 observations
• Update with the remaining 2
• What happens to
– Our distribution?
– Our Poisson parameter estimate?
• First, let’s get our new Prior
New Prior
• The first results from minimizing the sum of absolute differences between computed and empirical probabilities, subject to the computed probabilities summing to 1
• The second is computed without the latter constraint
Gamma(11.166, .261)
[Density plot of Gamma(11.166, .261): 90% of the mass between 1.643 and 4.481]
Gamma(14.404, .202)
[Density plot of Gamma(14.404, .202): 90% of the mass between 1.773 and 4.275]
Updates
• What can we say about them vis-à-vis
– The original gamma estimate from all 24 points
– The measures we care about (mean, relative accuracy, etc.)
• Which one is “better”?
Gamma(19.404, .144)
[Density plot of Gamma(19.404, .144): 90% of the mass between 1.839 and 3.913]

E(λ) = 2.79
Gamma(22.516, .125)
[Density plot of Gamma(22.516, .125): 90% of the mass between 1.915 and 3.856]

E(λ) = 2.815
Another way to Observe Data
• In this case, we’ll use the next 12 hours
• And we’ll update our prior distributions
• Which one provides more accuracy?
• How would we know in a more realistic situation?
Gamma(46.062, .063)
[Density plot of Gamma(46.062, .063): 90% of the mass between 2.236 and 3.639]

E(λ) = 2.902
So, What’s the Conclusion?
• Do our updated priors make sense – especially in light of the original data-driven distribution?
• What can we say about the way in which observed data can impact our posterior distribution and the associated estimate for the Poisson parameter?
• What else can we conclude?
Another Prior Distribution
• Of interest in information risk applications, and in risk applications generally, is the probability of a binary outcome
– Intrusion / non-intrusion
– Bad item / non-bad item
• In this case, we can model the probability of an event happening – or not
• The number of events of interest in a space of interest could be modeled using a binomial distribution
Example
• Suppose we know how many intrusion attempts (or any other event) happened in the course of normal operation of our system – and we know how many non-intrusion events happened.
• So our data would look something like the following slide
| Hour | Events | Total | Prob     |
|------|--------|-------|----------|
| 1    | 3      | 172   | 0.017442 |
| 2    | 1      | 152   | 0.006579 |
| 3    | 5      | 106   | 0.047170 |
| 4    | 6      | 121   | 0.049587 |
| 5    | 2      | 97    | 0.020619 |
| 6    | 3      | 53    | 0.056604 |
| 7    | 1      | 78    | 0.012821 |
| 8    | 0      | 101   | 0        |
| 9    | 3      | 88    | 0.034091 |
| 10   | 5      | 93    | 0.053763 |
Now …
• We might be interested in the probability that a given input is malicious, bad, etc.
• How could we do this risk model?
• The binomial is a clear choice
• We know n for a given period
• We need p
• p seems to vary – what can we do?
A Model for p
• Develop a prior distribution for p that combines
– the data
– what we know that might not be in the data
• Use the expectation of the distribution for E(p)
• Use E(p) in our preliminary analysis
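One hedged, data-driven starting point (an illustration, not the method prescribed in the slides) is to pool the counts from the table above into an empirical p and scale a Beta prior to that mean; the “prior sample size” of 20 below is an assumption:

```python
events = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5]
totals = [172, 152, 106, 121, 97, 53, 78, 101, 88, 93]

p_pooled = sum(events) / sum(totals)   # 29 / 1061, about 0.027

# Choose a Beta prior with mean p_pooled and an assumed prior
# "sample size" alpha + beta = 20 (judgment call, not from the data)
prior_strength = 20
alpha = p_pooled * prior_strength
beta = (1 - p_pooled) * prior_strength
print(round(p_pooled, 4), round(alpha, 3), round(beta, 3))
```

A larger prior strength expresses more confidence in the pooled estimate; a smaller one lets new data move E(p) faster.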
Another Prior
• The Prior Distribution model for the binomial p is a beta distribution.
• Binomial: $f(y) = \dbinom{n}{y}\, p^{y} (1-p)^{n-y}$
• Beta: $f(x) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$
Beta Prior
$$ E(p) = \frac{\alpha}{\alpha+\beta} $$

The predictive distribution is the Beta-Binomial (you can look it up).

Like the Gamma prior for the Poisson, this is very easy to update after observing data.
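The update mirrors the Gamma-Poisson case: add the successes to α and the failures to β. A sketch with hypothetical numbers (the Beta(1, 36) prior is an illustration, not a value from the slides):

```python
def update_beta(alpha, beta, successes, trials):
    """Conjugate binomial update for a Beta(alpha, beta) prior on p."""
    return alpha + successes, beta + (trials - successes)

# Hypothetical prior Beta(1, 36) (mean about 0.027),
# then observe 3 events in 172 trials, as in hour 1 of the table
a, b = update_beta(1.0, 36.0, 3, 172)
print(a, b, round(a / (a + b), 4))  # posterior mean E(p) = a / (a + b)
```

The posterior mean a / (a + b) is the updated estimate of p to carry into the binomial risk model.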
Other Estimates
• Outcomes
– These can be in the form of costs, both real and opportunity
– Distributions are better than point estimates if we know that we don’t know the future
• Problem: Expected Value criterion can diminish the importance of our probability modeling efforts for events and outcomes
Outcome Distributions
• Unlike our discussion to this point, where the variable of interest has been associated with a discrete distribution, outcome distributions may be continuous in nature
• Normal, Lognormal, Logistic
• Usually estimating more than one parameter
• Possibly a more complex prior → information → posterior structure
Homework
• I’m going to send you sample datasets
• I need team identification – same ones as today?
• Due at the beginning of class next week
• Presentation, not paper
• Also – please be ready to discuss the Scott paper