bsideslv 2014 the power law of information security


Upload: michael-roytman

Post on 08-May-2015


DESCRIPTION

We need to stop using averages in information security. Breaches and other infosec events are actually power law distributed, not normally distributed. This means a better analogy is "a 10 year breach" rather than "an average breach". This talk looks at three distinct datasets to illustrate the phenomenon.

TRANSCRIPT

Page 1: BsidesLV 2014 The Power Law of Information Security

NORMAL DISTRIBUTIONS RULE EVERYTHING AROUND ME

Many empirical quantities cluster around a typical value. The dice rolls in these casinos, the number of reporters on the Wall of Sheep every year, the air pressure, the sea level, the temperature on a sunny BlackHat day in Vegas. All of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that it is really fucking hot in Vegas in August, because the temperature never deviates very far from that. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in either direction, and hence the distribution can be well characterized by quoting just its mean and standard deviation. But not everything.

Page 2: BsidesLV 2014 The Power Law of Information Security

ALEX HUTTON DREAMS OF RISK

My name is Alex Hutton and I model risk for a small too-big-to-fail bank. Last year, like every other day, I woke up and built a risk model. Since we're a bank, we track the prices of a lot of things. For one of these widgets, I built a distribution of price movements. This one is a normal distribution, and I assumed that the standard deviation was 3%, which is a typical number for daily price movements in financial markets. My boss used this to make some decisions, and was quite happy. We made millions from the tiny everyday price fluctuations and trades.

Page 3: BsidesLV 2014 The Power Law of Information Security

SHIT GOES WRONG SLIDE

Today, however, we are fucked. Today is Black Monday, October 19, 1987, and the S&P drops by 21%. My boss freaks out, the firm is in financial ruin, my kids starve.

Page 4: BsidesLV 2014 The Power Law of Information Security

How could this happen? Under my model, the probability of a 21% fluctuation is 10^-16, or… nonexistent. So what happened? Well, the distribution of price fluctuations actually has a fat tail. In fact, the mistake I made was using a normal distribution. Take a look at what happens if we use a power law distribution instead.
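The order of magnitude of a normal-model tail probability is easy to check by hand. The sketch below is only an illustration: it assumes a zero-mean normal with the 3% standard deviation from the story, and the function name `normal_tail` is made up for this example. For exactly these inputs a direct computation lands around 10^-12; the exponent is very sensitive to the assumed standard deviation, and either way the model treats the event as effectively impossible.

```python
import math


def normal_tail(move, sigma):
    """P(X > move) for X ~ Normal(0, sigma), via the complementary error function."""
    z = move / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumed inputs from the story: sigma = 3% per day, a 21% move (7 standard deviations).
p_crash = normal_tail(0.21, 0.03)
print(f"one-sided tail probability under the normal model: {p_crash:.2e}")
```

Changing `sigma` by even half a percentage point moves the answer by several orders of magnitude, which is part of why thin-tailed models are so fragile in the tail.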

Page 5: BsidesLV 2014 The Power Law of Information Security

Probability   0.9    0.99   0.999   10
NORMAL        3.8    7.0    9.2     21
POWER         2.8    7.8    38.5    almost 0

SOMEBODY SET UP US THE BOMB!

Now, the chance of a 21% fluctuation is 0.08%, something that my risk model would certainly have included, and that would certainly have changed our behavior on the financial markets. The good news is that most financial firms are aware of this phenomenon and model accordingly (after a few massive failures). In info sec, we're just not there yet.

Page 6: BsidesLV 2014 The Power Law of Information Security

MOTHERF*RS ACT LIKE THEY FORGOT ABOUT SWANS

Often, as Russell Thomas likes to point out, people mistake events that they did not predict for black swan events. However, what makes a "Black Swan event" is not the event itself. Instead, it is how that event fits into the object-observer system.

And in fact, the paradigm shift to using power law distributions to describe many of the variables we use in info sec explains away plenty of "black swans" by making the object-observer system more receptive to rare, high-impact events.

Page 7: BsidesLV 2014 The Power Law of Information Security

THE POWER LAW(S) OF INFORMATION SECURITY

@mroytman

But in fact, nothing is linear. This talk is about the power laws which occur in information security, what they mean, where I've found some, and what to do about them. The research I'll present is far from done, but it's a starting point, and I hope to make you think twice before using a normal distribution in a model again.

Page 8: BsidesLV 2014 The Power Law of Information Security

SLIDE WITH FRACTALS

WHAT ARE POWER LAWS

Power laws are distributions which describe scale-free phenomena. What this means in layman's terms is that the same mechanism is at work across a range of scales and orders of magnitude. In fact, power laws are a necessary and sufficient condition for scale-free phenomena. The importance and ubiquity of scale-free behavior was first pointed out by Mandelbrot, who coined the term "fractals". In fractals, we see the same behavior across different scales of length, time, price or any other relevant variable with a scale attached to it.

Page 9: BsidesLV 2014 The Power Law of Information Security

A quantity is said to follow a power law if it is drawn from a probability distribution that looks like: P(x) ~ C x^-alpha

Here alpha is a constant parameter of the distribution known as the exponent, or scaling parameter. Typical scaling parameters are in the range 2-3, but there are exceptions.
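To make the definition concrete, here is a minimal sketch of a continuous power law density p(x) = C x^-alpha on x >= xmin, together with inverse-transform sampling. The function names and the choices alpha = 2.5 and xmin = 1 are assumptions picked for illustration, not values from the talk. It also shows the scale-free property numerically: doubling x shrinks the density by the same factor at every scale.

```python
import random


def power_law_pdf(x, alpha, xmin):
    """Density of a continuous power law p(x) = C * x**(-alpha) for x >= xmin."""
    if x < xmin:
        return 0.0
    c = (alpha - 1) * xmin ** (alpha - 1)  # normalising constant; requires alpha > 1
    return c * x ** (-alpha)


def sample_power_law(n, alpha, xmin, rng):
    """Inverse-transform sampling: x = xmin * u**(-1 / (alpha - 1)) with u in (0, 1]."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = sample_power_law(100_000, alpha=2.5, xmin=1.0, rng=rng)

# Scale-free check: p(2x) / p(x) is the same whether x is 1, 10 or 100.
ratios = [power_law_pdf(2 * x, 2.5, 1.0) / power_law_pdf(x, 2.5, 1.0) for x in (1, 10, 100)]
print("density ratios p(2x)/p(x):", ratios)
print("largest of 100,000 draws:", max(draws))
```

A normal density has no such property: the ratio p(2x)/p(x) collapses rapidly as x grows, which is what "having a typical scale" means.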

Page 10: BsidesLV 2014 The Power Law of Information Security

Lots of phenomena follow a power law. The oldest (1948) and cleanest statistical regularity in international relations is Richardson's law, which states that the severity of warfare is power law distributed. This behavior is not unique to wars, and occurs in the natural sciences (traffic jams, earthquakes, biodiversity, coastlines, Brownian motion, asteroid impacts, etc.) and the social sciences (language, wealth, firm size, salaries, guild sizes in World of Warcraft, links to blogs). These power laws are considered fingerprints of a "complex" system, although exactly what is meant by "complex" is in flux. These systems generally produce outputs that are patterned, but have no standard (for lack of a better term) size in the Gaussian sense. More often than not, a power law only applies to the values of a distribution greater than some minimum value, xmin. In these cases, we say that the tail follows a power law.

Page 11: BsidesLV 2014 The Power Law of Information Security

FAKE SWANS

Tails are vitally important. A power law is an instance of a fat-tailed distribution. There exist precise proofs that "sufficiently fat tails" == power law distributions. Measuring how fat a tail is turns out to be quite difficult: the question of proving that something is or isn't a power law is often reduced to a question of "just how fat the tail is".

Page 12: BsidesLV 2014 The Power Law of Information Security

You can’t tell the difference here, but when we go further out…

Page 13: BsidesLV 2014 The Power Law of Information Security

You can see how much smaller the tails of the non-power law distributions are.

Page 14: BsidesLV 2014 The Power Law of Information Security

LACK OF PREDICTION

Why does this matter? Because when the tails are small, we can say meaningful things about the "mean" and "variance" of the distributions. With a power law distribution, the mean and variance don't necessarily stay stable over time.

An interesting aspect of power laws is that the exponent alpha has a natural interpretation: it is the cutoff above which moments of the distribution do not exist. Taking alpha as the exponent of the tail probability, P(X > x) ~ x^-alpha, for exponents less than 2 the variance does not exist and the central limit theorem does not apply. In effect, even with an infinite amount of data, we cannot say much about the variance of such distributions. For exponents less than 1, the mean does not exist. For this reason there is no such thing as an "average flood". There is instead a 100 year flood, a 10 year flood.
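A small simulation illustrates what "the mean does not exist" looks like in practice. This is a sketch on synthetic data only: the thin-tailed sample is a normal with mean 100 and standard deviation 10, and the fat-tailed sample is a Pareto with tail exponent 0.7 (so P(X > x) = x^-0.7 for x >= 1); both choices, and the helper name `running_means`, are assumptions for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 100_000

# Thin tail: values clustered around 100.
thin = [rng.gauss(100, 10) for _ in range(n)]
# Fat tail: Pareto with tail exponent 0.7, below the threshold of 1 where the mean stops existing.
fat = [(1.0 - rng.random()) ** (-1.0 / 0.7) for _ in range(n)]


def running_means(xs, checkpoints):
    """Sample mean of the first k observations, for each k in checkpoints."""
    targets = set(checkpoints)
    out, total = [], 0.0
    for i, x in enumerate(xs, start=1):
        total += x
        if i in targets:
            out.append(total / i)
    return out


checkpoints = [100, 1_000, 10_000, 100_000]
thin_means = running_means(thin, checkpoints)
fat_means = running_means(fat, checkpoints)

# Share of the grand total contributed by the single largest observation.
thin_share = max(thin) / sum(thin)
fat_share = max(fat) / sum(fat)

print("thin-tailed running means:", thin_means)
print("fat-tailed running means: ", fat_means)
print("largest observation's share of total:", thin_share, "vs", fat_share)
```

The thin-tailed running means settle near 100 almost immediately, while the fat-tailed ones keep jumping whenever a new extreme value arrives, and a single observation can account for a visible fraction of the entire sum.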

Page 15: BsidesLV 2014 The Power Law of Information Security

Perhaps we ought to start talking about the Target breach as a "10 year breach".

But let's get back to our own industry. Why would information security exhibit power law behavior? And where?

Page 16: BsidesLV 2014 The Power Law of Information Security

First, when two distributions are combined, the fattest tail always wins. This means that as you add power law distributed factors to a distribution, the results stay power law distributed.

Second, information security is the combination of a great many factors: the size of the internet, the size of firms, the power of terrorist groups, and the distribution of wealth are just a few of the ones I can think of that are power law distributed. If each has an exogenous effect on infosec, we would expect our own variables to inherit those power law properties.
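The "fattest tail wins" claim can be checked on synthetic data. The sketch below adds a thin-tailed normal factor to a fat-tailed Pareto factor and estimates the tail exponent of each with the Hill estimator on the largest observations. The Hill estimator is swapped in here as a convenient tool, not something the talk uses, and the tail exponent of 1.5, the sample size and the cutoff `k` are all assumptions.

```python
import math
import random


def hill_tail_exponent(xs, k):
    """Hill estimator of the tail exponent, using the k largest observations."""
    top = sorted(xs, reverse=True)[: k + 1]
    threshold = top[k]  # the (k+1)-th largest value acts as the tail cutoff
    return k / sum(math.log(x / threshold) for x in top[:k])


rng = random.Random(2)  # fixed seed so the run is repeatable
n, k = 200_000, 1_000

noise = [rng.gauss(0, 1) for _ in range(n)]                      # thin-tailed factor
fat = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(n)]   # Pareto, tail exponent 1.5
combined = [a + b for a, b in zip(noise, fat)]                   # the two factors added together

exp_noise = hill_tail_exponent(noise, k)
exp_fat = hill_tail_exponent(fat, k)
exp_combined = hill_tail_exponent(combined, k)

print("normal factor alone: ", round(exp_noise, 2))
print("Pareto factor alone: ", round(exp_fat, 2))
print("sum of the two:      ", round(exp_combined, 2))
```

The estimate for the sum should sit close to the Pareto factor's exponent, while the normal factor on its own produces a much larger number, which is the estimator's way of saying "no fat tail here".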

Page 17: BsidesLV 2014 The Power Law of Information Security

LAW 1


Page 18: BsidesLV 2014 The Power Law of Information Security

BREACH FREQUENCY BY CVE TYPE

P(CVE has breach volume X) = X^-1.5

Kolmogorov–Smirnov D-value: 0.1134174; xmin: 15; alpha: 1.5

The chance that a particular CVE has high breach volume is substantially higher than we previously thought, just as in the Hutton example the chance that the S&P would drop by 21% was underestimated.
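For readers who want to see where numbers like alpha and the Kolmogorov–Smirnov D-value come from, here is a minimal sketch of the continuous maximum-likelihood fit and KS distance described in Clauset et al. It runs on synthetic data drawn with the slide's xmin = 15 and alpha = 1.5, not on the talk's breach data, and it takes xmin as given rather than estimating it. The function names are made up for this example; in practice the poweRlaw R package from the reference list handles all of this, including the choice of xmin.

```python
import math
import random


def fit_alpha(xs, xmin):
    """Continuous MLE for the exponent: alpha = 1 + n / sum(ln(x_i / xmin)) over x_i >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power law CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (-(alpha - 1.0))
        # Compare against the empirical CDF just before and just after the jump at x.
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


rng = random.Random(3)  # fixed seed so the run is repeatable
xmin_true, alpha_true = 15.0, 1.5
data = [xmin_true * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0)) for _ in range(20_000)]

alpha_hat = fit_alpha(data, xmin_true)
d_value = ks_distance(data, xmin_true, alpha_hat)
print(f"estimated alpha: {alpha_hat:.3f}, KS D-value: {d_value:.4f}")
```

A smaller D-value means the fitted power law tracks the data more closely; on real data the usual next step is to compare that distance against fits of alternative distributions.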

Page 19: BsidesLV 2014 The Power Law of Information Security

ONE VULN WILL CAUSE YOUR BREACH (OR A COUPLE)

What does this mean for you? It means there are vulnerabilities which have an extremely high probability of causing a breach. Since this breach data comes from how attackers are behaving, having a handle on threat intelligence globally allows you to identify _which_ vulnerabilities are those most likely to cause the breaches.

It means shifting your strategy away from trying to fix everything, or even trying to fix everything that comes out on Patch Tuesday, and instead focusing on identifying and remediating the few vulnerabilities which are _most_ likely to cause a breach. THIS is non-linear thinking.

Page 20: BsidesLV 2014 The Power Law of Information Security

LAW 2


Page 21: BsidesLV 2014 The Power Law of Information Security

Kevin Thompson's talk tomorrow at 2pm: this talk introduces the VERIS Community Database (VCDB), a research project aimed at gathering news articles about information security incidents, extracting data, and serving as a public repository of breach data suitable for analysis and research.

Page 22: BsidesLV 2014 The Power Law of Information Security

ID THEFT FREQUENCY

P(Theft has X victims) = X^-0.7

beta = 0.7 ± 0.1 (Maillart and Sornette, ETH Zurich, 2009; data from DataLossDB).

Page 23: BsidesLV 2014 The Power Law of Information Security

STABLE ACROSS INDUSTRIES

beta = 0.7 ± 0.1 (Maillart and Sornette, ETH Zurich, 2009; data from DataLossDB).

Page 24: BsidesLV 2014 The Power Law of Information Security

ONE BREACH WILL MATTER MOST (OR A COUPLE)

The takeaway here is that impact is concentrated in the fat tails of the distributions as well, which means we ought to be tailoring our strategies to preventing the one big breach. It also means there's no average breach, and estimates of potential losses need to plan for scenarios like the Black Monday that was missed in the opening example.

Page 25: BsidesLV 2014 The Power Law of Information Security

LAW 3


Page 26: BsidesLV 2014 The Power Law of Information Security

BREACH FREQUENCY BY DAY

P(Day has breach volume X) = X^-1.5

Kolmogorov–Smirnov D-value: 0.1134174; xmin: 15; alpha: 1.5

Page 27: BsidesLV 2014 The Power Law of Information Security

ONE DAY IT'LL HAPPEN TO YOU (OR A COUPLE)

Page 28: BsidesLV 2014 The Power Law of Information Security

SLIDE WITH WHAT DO WE DO ABOUT IT

From Russell: Handling Fat Tails for Decisionmakers

Here's a list of things that analysts and decision makers can do to successfully cope with the unruliness of very fat tailed probability distributions:

1. To the method of frequentist statistical analysis of historical data, add other methods and other data. Simulations, laboratory experiments, and subjective probability estimates by calibrated experts are just three alternative methods that can fill in for the limitations of frequentist methods with limited sample data.
2. Resist using colloquial terms like "average", "typical", "spread", or even "worst case". Using them will only add to confusion, misunderstanding, and mis-set expectations.
3. Communicate and decide using quantiles, not the usual summary statistics (mean, standard deviation, etc.). If any summary statistics are used as decision criteria or in models, use quantiles.
4. Put in some effort to estimate the "fatness" of the tail, either parametrically or non-parametrically. Even a not-very-good fat tail model is much better than one based on thin tails. There are ways to test how good the alternative models are. In my opinion, the best academic paper on this is "Power-law distributions in empirical data".
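Point 3 on that list is straightforward to put into practice. The sketch below reports quantiles next to the mean for a synthetic fat-tailed sample; the data is a hypothetical stand-in for something like records lost per breach, drawn from a Pareto with tail exponent 0.7, and the variable names are illustrative only.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical loss sizes: Pareto with tail exponent 0.7 and minimum value 1.
losses = [(1.0 - rng.random()) ** (-1.0 / 0.7) for _ in range(50_000)]

# statistics.quantiles with n=100 returns the 99 percentile cut points.
cuts = statistics.quantiles(losses, n=100)
median, p90, p99 = cuts[49], cuts[89], cuts[98]
mean = statistics.fmean(losses)

print(f"median: {median:.1f}")
print(f"90th percentile: {p90:.1f}")
print(f"99th percentile: {p99:.1f}")
print(f"mean: {mean:.1f}  (driven by a handful of extreme values)")
```

Rerunning with a different seed leaves the quantiles nearly unchanged while the mean can swing wildly, which is the practical argument for stating "9 out of 10 incidents are below X" rather than quoting an average.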

Page 29: BsidesLV 2014 The Power Law of Information Security

You should model risk differently. You should focus your efforts on identifying things that live in the fat tail or are predictors of it, because there is no average. You should never, ever use metrics like average vulns closed or something like that.

1. Investing to fix 100% of vulns is a poor use of resources.
2. When the Big Loss event happens, only one or a few vulnerabilities will be exploited.
3. Ahead of that (ex ante), you need a systematic method to invest to fix a portfolio of vulns which, with very high confidence, includes ALL of the vulns that could be part of the Big Loss event. These vulns will be strategically positioned in the most likely attack graphs.
4. And here's how you'd do that in practice ...

Page 30: BsidesLV 2014 The Power Law of Information Security

Holler! www.risk.io @mroytman

Page 31: BsidesLV 2014 The Power Law of Information Security

Dan Geer, Power. Law. http://geer.tinho.net/ieee/ieee.sp.geer.1201a.pdf

Clauset et al., Power Law Distributions in Empirical Data http://arxiv.org/abs/0706.1062

Farmer and Geanakoplos, Power Laws in Economics and Elsewhere http://tuvalu.santafe.edu/~jdf/papers/powerlaw3.pdf

Maillart and Sornette, Heavy-Tailed Distribution of Cyber Risks http://arxiv.org/abs/0803.2256

poweRlaw R Package http://cran.r-project.org/web/packages/poweRlaw/vignettes/poweRlaw.pdf

Gabaix, Some Nondescript NYU Stern Lecture on Power Laws http://pages.stern.nyu.edu/~xgabaix/papers/powerLaws.pdf

Russell Thomas for graphs and everything he writes on http://exploringpossibilityspace.blogspot.com/

THANKS!

and Alex Hutton