CS 589 Information Risk Management, 6 February 2007
CS 589 Information Risk Management
6 February 2007
Today
• More Bayesian ideas – Empirical Bayes
• Your presentations
• Prior Distributions for selected distribution parameters
• Updating priors → posterior distributions → updated parameter estimates
References
• A. R. Solow, “An Empirical Bayes Analysis of Volcanic Eruptions”, Mathematical Geology, Vol. 33, No. 1, 2001.
• J. Geweke, Contemporary Bayesian Economics and Statistics. Wiley, 2005.
• S. L. Scott, “A Bayesian Paradigm for Designing Intrusion Detection Systems”, Computational Statistics and Data Analysis, Vol. 45, No. 1, 2003.
Why are we doing this?
• Model risks
• Model outcomes
• Use the models in a model of the decision situation to help us rank alternatives
• Gain deeper understanding of the problem and the context of the problem
Basic Relation
The posterior is the likelihood times the prior, normalized:

$$ f(\theta \mid x) = \frac{P(x \mid \theta)\, f(\theta)}{\int P(x \mid \theta)\, f(\theta)\, d\theta} $$

The prior distribution in the numerator should be selected with some care. The distribution in the denominator is known as the predictive distribution.
Recall: Why Bayesian Approach?
• Incorporate prior knowledge into the analysis
• From Scott – synthesize probabilistic information from many sources
• Consider the following exercise:
• P(I) = .01; P(D|I) = .9; P(¬D|¬I) = .95.
• An intrusion alarm goes off. What is the probability that it’s really an intrusion?
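The exercise can be checked by applying Bayes’ rule directly; a minimal sketch using only the numbers given above:

```python
# Bayes' rule for the intrusion-alarm exercise:
# P(I|D) = P(D|I) P(I) / [ P(D|I) P(I) + P(D|~I) P(~I) ]
p_i = 0.01            # prior probability of an intrusion, P(I)
p_d_given_i = 0.90    # detection rate, P(D|I)
p_nd_given_ni = 0.95  # correct-rejection rate, P(~D|~I)

p_d_given_ni = 1 - p_nd_given_ni   # false-alarm rate, P(D|~I) = 0.05
numerator = p_d_given_i * p_i
evidence = numerator + p_d_given_ni * (1 - p_i)
p_i_given_d = numerator / evidence
print(round(p_i_given_d, 4))  # 0.1538 -- most alarms are false alarms
```

The low prior P(I) dominates: even with a 90% detection rate, fewer than one alarm in six signals a real intrusion.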
Posterior Probabilities as a Function of Priors and Conditional Probabilities
[Plot: posterior P(I|D) as a function of the prior P(I), shown for P(D|I) = .95, .9, and .7]
Priors
• The conjugate prior for a Poisson rate parameter is a Gamma distribution
$$ f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\, \beta^{\alpha}}, \qquad x > 0,\ \alpha, \beta > 0 $$

$$ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx $$

$$ E(X) = \alpha \beta $$

(shape-scale form: the mean is αβ)
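As a sanity check on the shape-scale form (mean αβ), the density can be evaluated with nothing beyond the standard library; a sketch:

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density in shape (alpha) / scale (beta) form; mean = alpha * beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Mean of Gamma(16, 0.125) is 16 * 0.125 = 2.0 -- crude numeric check
# by Riemann-summing x * f(x) over (0, 20]:
step = 0.001
mean = sum(x * gamma_pdf(x, 16, 0.125) * step
           for x in (i * step for i in range(1, 20000)))
print(round(mean, 3))  # close to 2.0
```

The same parameterization is assumed in the Gamma(16, .125), Gamma(2, 1), and Gamma(64, .03125) examples that follow, all of which have mean 2.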
Gamma(16, .125)
[Density plot of Gamma(16, .125): 90% of the mass between 1.254 and 2.887]
Gamma(2, 1)
[Density plot of Gamma(2, 1): 90% of the mass between 0.355 and 4.744]
Gamma(64, .03125)
[Density plot of Gamma(64, .03125): 90% of the mass between 1.607 and 2.428]
Gamma Parameters
• How do we pick them?
• Expert
• Data
• Expert + Data
Recall Our Data Example
• Go from Data to Gamma Parameters
• We want to pick parameters that reflect the data
• We will have to use our judgment to decide on a final prior parametric estimate
Events in a 24-Hour Period
[Bar chart: number of events observed in each hour of a 24-hour period]
| Hour | Events | Hour | Events |
|------|--------|------|--------|
| 1    | 3      | 13   | 2      |
| 2    | 1      | 14   | 3      |
| 3    | 5      | 15   | 2      |
| 4    | 6      | 16   | 5      |
| 5    | 2      | 17   | 1      |
| 6    | 3      | 18   | 3      |
| 7    | 1      | 19   | 4      |
| 8    | 0      | 20   | 2      |
| 9    | 3      | 21   | 3      |
| 10   | 5      | 22   | 5      |
| 11   | 4      | 23   | 2      |
| 12   | 1      | 24   | 3      |
Parameterization Ideas
• Distribution Mean = Data Mean
• Equate
– the model’s cumulative/frequency distribution to the empirical data
– the sum of the fitted distribution frequencies to 1
– the sum of absolute differences to 0
• Pick the criterion that fits best
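The first idea (distribution mean = data mean) extends naturally to matching two moments. A minimal sketch using the 24 hourly counts from the example; this is a closed-form moment estimate, not the Solver-based fit described below:

```python
counts = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5, 4, 1,
          2, 3, 2, 5, 1, 3, 4, 2, 3, 5, 2, 3]

n = len(counts)
mean = sum(counts) / n                          # 2.875 events per hour
var = sum((c - mean) ** 2 for c in counts) / n  # population variance

# Match Gamma(alpha, beta) moments: mean = alpha*beta, var = alpha*beta^2
beta = var / mean
alpha = mean / beta
print(round(alpha, 2), round(beta, 3))  # 3.5 0.821
```

The moment estimate lands in the same neighborhood as the optimization-based fits shown later, which is a useful cross-check on any Solver result.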
We can formulate and optimize
• Pick the best parameters given what we know
• I used Excel and the Solver add-in
• Any optimization program will work
• Canned probability functions are preferred …
Use All the Data
• Several reasonable possibilities
• This will matter for updating purposes
• Use all data for the parameter estimate
• Use some of the data to estimate the gamma prior – and therefore the Poisson parameter – and the rest to illustrate the idea of updating the prior
Prior Distribution
• The prior should reflect our degree of certainty, or degree of belief, about the parameter we are estimating
• One way to deal with this is to consider distribution fractiles
• Use fractiles to help us develop the distribution that reflects the synthesis of what we know and what we believe
Prior + Information
• As we collect information, we can update our prior distribution and get a – we hope – more informative posterior distribution
• Recall what the distribution is for – in this case, a view of our parameter of interest
• The posterior mean is now the estimate for the Poisson lambda, and can be used in decision-making
Information
• For our Poisson parameter, information might consist of data similar to what we already collected in our example
• We update the Gamma, take the mean, and that’s our new estimate for the average occurrences of the event per unit of measurement.
Gamma(3.74, .769)
[Density plot of Gamma(3.74, .769): 90% of the mass between 0.936 and 5.676. Sum of absolute differences minimized.]
Gamma(11.1, .259)
[Density plot of Gamma(11.1, .259): 90% of the mass between 1.617 and 4.426]
Updating
• It’s pretty intuitive
• Add the total number of observed events (the hourly intrusion counts) to alpha
• Add the number of observation intervals (here, the number of hours) to beta when beta is a rate
• Be careful with beta: it is sometimes written in inverse (scale) form, as in these slides, in which case adding n to the rate corresponds to a new scale of β/(1 + nβ) rather than β + n
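In the shape-scale form used in these slides, the update is two lines of arithmetic. A sketch using the Gamma(14.404, .202) prior shown later and the last two hourly counts (2 and 3) as new data:

```python
def update_gamma(alpha, beta, observations):
    """Conjugate Poisson update for a Gamma(shape, scale) prior.
    In rate form you would instead add len(observations) to the rate 1/beta."""
    n = len(observations)
    alpha_post = alpha + sum(observations)
    beta_post = beta / (1 + n * beta)   # equivalent to rate -> rate + n
    return alpha_post, beta_post

a, b = update_gamma(14.404, 0.202, [2, 3])
print(round(a, 3), round(b, 3))  # 19.404 0.144, the posterior shown below
print(round(a * b, 2))           # posterior mean E(lambda), about 2.79
```

Note how little two extra observations move the scale once the prior already encodes 22 hours of data.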
Back to our Example
• Use the first 22 observations
• Update with the remaining 2
• What happens to
– Our distribution?
– Our Poisson parameter estimate?
• First, let’s get our new Prior
New Prior
• The first results from minimizing the sum of absolute differences between computed and empirical probabilities, subject to the computed probabilities summing to 1
• The second is computed without the latter constraint
Gamma(11.166, .261)
[Density plot of Gamma(11.166, .261): 90% of the mass between 1.643 and 4.481]
Gamma(14.404, .202)
[Density plot of Gamma(14.404, .202): 90% of the mass between 1.773 and 4.275]
Updates
• What can we say about them vis-à-vis
– The original gamma estimate from all 24 points
– The measures we care about (mean, relative accuracy, etc.)
• Which one is “better”?
Gamma(19.404, .144)
[Density plot of Gamma(19.404, .144): 90% of the mass between 1.839 and 3.913]

E(λ) = 2.79
Gamma(22.516, .125)
[Density plot of Gamma(22.516, .125): 90% of the mass between 1.915 and 3.856]

E(λ) = 2.815
Another way to Observe Data
• In this case, we’ll use the next 12 hours
• And we’ll update our prior distributions
• Which one provides more accuracy?
• How would we know in a more realistic situation?
Gamma(46.062, .063)
[Density plot of Gamma(46.062, .063): 90% of the mass between 2.236 and 3.639]

E(λ) = 2.902
So, What’s the Conclusion?
• Do our updated priors make sense – especially in light of the original data-driven distribution?
• What can we say about the way in which observed data can impact our posterior distribution and the associated estimate for the Poisson parameter?
• What else can we conclude?
Another Prior Distribution
• Of interest in information risk applications, and in risk applications generally, is the probability of a binary outcome
– Intrusion / non-intrusion
– Bad item / non-bad item
• In this case, we can model the probability of an event happening – or not
• The number of events of interest in a space of interest could be modeled using a binomial distribution
Example
• Suppose we know how many intrusion attempts (or any other event) happened in the course of normal operation of our system – and we know how many non-intrusion events happened.
• So our data would look something like the following slide
| Hour | Events | Total | Prob     |
|------|--------|-------|----------|
| 1    | 3      | 172   | 0.017442 |
| 2    | 1      | 152   | 0.006579 |
| 3    | 5      | 106   | 0.047170 |
| 4    | 6      | 121   | 0.049587 |
| 5    | 2      | 97    | 0.020619 |
| 6    | 3      | 53    | 0.056604 |
| 7    | 1      | 78    | 0.012821 |
| 8    | 0      | 101   | 0        |
| 9    | 3      | 88    | 0.034091 |
| 10   | 5      | 93    | 0.053763 |
Now …
• We might be interested in the probability that a given input is malicious, bad, etc.
• How could we do this risk model?
• The binomial is a clear choice
• We know n for a given period
• We need p
• p seems to vary – what can we do?
A Model for p
• Develop a prior distribution for p that combines
– the data
– what we know that might not be in the data
• Use the expectation of the distribution for E(p)
• Use E(p) in our preliminary analysis
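One hedged, data-driven starting point (an illustration, not the method prescribed in the slides) is to pool the counts from the table above into an empirical p and scale a Beta prior to that mean; the “prior sample size” of 20 below is an assumption:

```python
events = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5]
totals = [172, 152, 106, 121, 97, 53, 78, 101, 88, 93]

p_pooled = sum(events) / sum(totals)   # 29 / 1061, about 0.027

# Choose a Beta prior with mean p_pooled and an assumed prior
# "sample size" alpha + beta = 20 (judgment call, not from the data)
prior_strength = 20
alpha = p_pooled * prior_strength
beta = (1 - p_pooled) * prior_strength
print(round(p_pooled, 4), round(alpha, 3), round(beta, 3))
```

A larger prior strength expresses more confidence in the pooled estimate; a smaller one lets new data move E(p) faster.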
Another Prior
• The Prior Distribution model for the binomial p is a beta distribution.
• Binomial: $f(y) = \dbinom{n}{y}\, p^{y} (1-p)^{n-y}$
• Beta: $f(x) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$
Beta Prior
$$ E(p) = \frac{\alpha}{\alpha+\beta} $$

The predictive distribution is the Beta-Binomial (you can look it up).

Like the Gamma prior for the Poisson, this is very easy to update after observing data.
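The update mirrors the Gamma-Poisson case: add the successes to α and the failures to β. A sketch with hypothetical numbers (the Beta(1, 36) prior is an illustration, not a value from the slides):

```python
def update_beta(alpha, beta, successes, trials):
    """Conjugate binomial update for a Beta(alpha, beta) prior on p."""
    return alpha + successes, beta + (trials - successes)

# Hypothetical prior Beta(1, 36) (mean about 0.027),
# then observe 3 events in 172 trials, as in hour 1 of the table
a, b = update_beta(1.0, 36.0, 3, 172)
print(a, b, round(a / (a + b), 4))  # posterior mean E(p) = a / (a + b)
```

The posterior mean a / (a + b) is the updated estimate of p to carry into the binomial risk model.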
Other Estimates
• Outcomes
– These can be in the form of costs, both real and opportunity
– Distributions are better than point estimates if we know that we don’t know the future
• Problem: Expected Value criterion can diminish the importance of our probability modeling efforts for events and outcomes
Outcome Distributions
• Unlike our discussion to this point, where the variable of interest has been associated with a discrete distribution, outcome distributions may be continuous in nature
• Normal, Lognormal, Logistic
• Usually estimating more than one parameter
• Possibly a more complex prior → information → posterior structure
Homework
• I’m going to send you sample datasets
• I need team identification – same ones as today?
• Due at the beginning of class next week
• Presentation, not paper
• Also – please be ready to discuss the Scott paper