case-oriented alert correlation -...

Case-Oriented Alert Correlation

JIDONG LONG and DANIEL G. SCHWARTZ

Department of Computer Science

Florida State University

Tallahassee, Florida 32303

U.S.A.

[email protected], [email protected]

Abstract: - Correlating alerts is of importance for identifying complex attacks and discarding false alerts.Most popular alert correlation approaches employ some well-defined knowledge to uncover the connec-tions among alerts. However, acquiring, representing and justifying such knowledge has turned out tobe a nontrivial task. In this paper, we propose a novel method to work around these difficulties by usingcase-based reasoning (CBR). In our application, a case, constructed from training data, serves as anexample of correlated alerts. It consists of a pattern of alerts caused by an attack and the identity of theattack. The runtime alert stream is then compared with each case, to see if any subset of the runtimealerts are similar to the pattern in the case. The process is reduced to a matching problem. Two kindsof matching methods were explored. The latter is much more efficient than the former. Our experimentswith the DARPA Grand Challenge Problem attack simulator have shown that both produce almost thesame results and that case-oriented alert correlation is effective in detecting intrusions.

Key-Words: - Alert Correlation, Case-Based Reasoning, Data Mining, Intrusion Detection

1 Introduction

Intrusion detection has been studied for morethan 20 years. Although many intrusion detec-tion techniques have been explored, all apply es-sentially the same methodology, namely, lookingfor patterns from a single data source. Typicalsuch sources are network traffic, the sequence ofsystem calls on a host, and the sys logs on ahost. Intrusion detection techniques can be clas-sified into misuse detection or anomaly detection,based on the types of patterns for which the de-tector is searching. In misuse detection, one hasa knowledge base of previously observed abnor-mal behavior (attacks) and searches for matchesof current behavior with records currently in theknowledge base. In anomaly detection, one has aknowledge base of records of normal behavior anddetermines whether the current behavior fails tomatch all the records currently in the knowledgebase (a mismatch indicates a possible attack). In-trusion detection techniques can also be classified

as being host-based or network-based dependingon the type of data source. Host-based systemsmonitor events on a host, such as audit data orsys logs. Network-based systems monitor packetsflowing through a network.

Different techniques have both advantagesand drawbacks. Misuse detection cannot detectpreviously unknown attacks, whereas anomalydetection tends to generate large quantities offalse alerts. Moreover, neither host-based nornetwork-based intrusion detection can provide areasonable coverage of all attacks, as some at-tacks, especially distributed ones, can only be dis-covered by looking into multiple data sources atthe same time. Consequently, a large number offalse reports (both positives and negatives) hasbecome a common and critical issue in the intru-sion detection community.

Traditional intrusion detection systems(IDSs) are typically limited by using a single in-trusion detection technique. This inherent imper-

WSEAS TRANSACTIONS on COMPUTERS

Jidong Long and Daniel G. Schwartz

ISSN: 1109-275098

Issue 3, Volume 7, March 2008

fection makes them more likely to generate falsereports. A practical idea that tries to solve thisproblem is to have complementary IDSs workingtogether. This has been proposed in many works.However, mainly due to the different formats ofdata sources and alerts, most IDSs have beendeveloped independently and were not intendedto work collaboratively.

To address this issue, a common intrusiondetection framework (CIDF) has been proposed[1]. This entails a common alert format that canbe used for all intrusion detection environments.Some ideas in CIDF encouraged the creation ofan intrusion detection working group (IDWG) ofthe Internet Engineering Task Force (IETF). Twoimportant proposals, the intrusion detection mes-sage exchange format (IDMEF) and the intru-sion detection exchange protocol (IDXP), havebeen increasingly accepted as standards in the in-trusion detection community. Our work, in par-ticular, has concentrated on correlating IDMEFalerts.

A common alert format makes it much easierto build a system incorporating different IDSs.Such a system is often referred to as multi-sensorIDS or meta IDS. In such a system, differentIDSs complementary to each other are strate-gically deployed at different locations watchingdifferent data sources. In this context, the no-tion of sensor is more general than that of anIDS and includes any security device (in hard-ware of software), such as a firewall or antivirussystem, that can fire alerts. As sensors differ inmany aspects, such as their capabilities, inputdata sources, and working environments, alertsfrom different sensors can be very different. Forexample, alerts from a host sensor may containprocess information that a network sensor oth-erwise cannot provide. The essential idea is tomake the sensors somehow work together. Forthis it is necessary to determine the logical con-nections between alerts, i.e., to determine whichalerts are being caused by the same attack. Ifthis can be achieved then one obtains a globalview of attacks and at the same time can reducethe volumes of false alerts. The general approachis often referred to as alert correlation.

The effectiveness of an alert correlation ap-proach determines the effectiveness of the asso-ciated IDS. For this reason, many approaches

have been explored. Most popular alert correla-tion approaches employ some well-defined knowl-edge such as rules or probabilistic models, to un-cover the deep connections among alerts. How-ever, acquiring and representing such knowledgehas turned out to be a nontrivial task that re-quires high quality training data, and such datais unavailable for most intrusion detection appli-cations. In addition, the correctness of the re-sulting model is also difficult to determine, sincesome attack behaviors cannot be expressed nat-urally in rigid terms.

Case-based reasoning (CBR) is a decision-making technology that can work around thesedifficulties. The expertise of a CBR system isembodied in a library of past cases. A case con-sists of a description of a problem together withits solution. The design of a case-based reasoningsystem includes two important tasks. One is tocharacterize the relevant problems. This amountsto identifying the features that can be used todescribe the problems. The other task is to de-fine a similarity measure that can be used to es-timate the similarity between any two problems.In contrast with other types of intrusion detectionsystems, the CBR methodology makes knowledgeacquisition and presentation much easier. More-over, a case library can often be built on sparsetraining data and still be effective.

A CBR system is assumed to be operating inthe context of some environment that can giverise to problems. Given such a problem, the sys-tem applies its similarity measure to locate thosecases in the case library whose problem compo-nents are most similar to it. Then the solutionparts of those retrieved cases are used to suggesta solution for the problem.

In our application, problems are attacks.Each such problem is accordingly represented as apattern of alerts from the various sensors. Whendealing with the problem of alert correlation, theproblem component of a case can be viewed asa concrete example of a set of correlated alerts.More exactly, our approach can be briefly de-scribed as follows. From training data, we canknow exactly what true alerts the sensors fire forcertain attacks. All alerts for a particular attackare aggregated as an alert pattern. That pat-tern, along with its associated attack type, formsa single case. Then, given a stream of runtime

WSEAS TRANSACTIONS on COMPUTERS Jidong Long and Daniel G. Schwartz

ISSN: 1109-275099


alerts as input, if we can find a case whose prob-lem component closely matches some subset ofthe input stream, it is reasonable to believe thatthe detected subset constitutes a set of correlatedalerts caused by an attack of the same type as thecase.

Given a set of run-time alerts, however, alarge amount of computation may be requiredto determine whether the problem componentsof cases in the library match any subsets of thisset. For example, if the input set has 10 alerts,and the problem component of some case in thelibrary has 3 alerts, then one has 10-choose-3, or120, comparisons to determine whether there areany matchings with that particular case; and thismust be repeated for every case in the library.

Thus arises the issue of how to effectively findthe most likely candidates in the library. Al-though the relation among alerts may be compli-cated, alerts can sometimes be correlated simplythrough session information, namely, source IPaddress and port and destination IP address andport. In fact, session information has been con-sidered as key attributes in most intrusion detec-tion approaches used for building intrusion detec-tion models. Using session information suggestswhat we call explicit alert correlation. Althoughit is straightforward, it does have drawbacks. Theimportant prerequisite to make this approachwork is that the alerts themselves contain the ses-sion information. Unfortunately, this cannot besatisfied in the following circumstances.

• The data sources do not contain session in-formation. If they are used for intrusion de-tection, sensors are not able to associate at-tacks with sessions. For example, antivirussoftware can act as a host-based sensor. Itworks on the binary executables and mayfire alerts when virus patterns are found.But executables cannot be associated withsessions. Viruses embedded in executablescan be evoked in an unpredictable man-ner. Other typical data sources withoutsession information include application logsand system logs that only record high-leveldata such as user commands.

• The sensors do not provide session infor-mation. Different sensors have differentcapabilities. Depending on the environ-

ment, one may install a lightweight sen-sor that provides little information in itsalerts, or a more complex sensor that willhave a greater impact on the running sys-tem but provide more detailed alert infor-mation. Thus, even though the data sourcemay have session information, the sensormay not use it.

• Session information is implicit in the datasources. Even in network traffic, session in-formation sometimes is not apparent. IPpackets must include both IP addresses andports. However, packets in protocols lowerthan the TCP/IP protocol may not havecomplete session information. For example,ICMP packets only have IP addresses. Anattacker may first perform an “IP sweep”(sending a series of ICMP packets) to findlive machines in a particular network andthen follow with attack sessions to breakinto those hosts by exploiting some vulner-abilities. In such cases, the ICMP packetscan only be associated with sessions fromthe attack context.

• Not every intrusion occurs over a network.The system can be compromised by insid-ers, such as attacks involving an abuse ofprivileges or previously installed backdoorsoftware.

The drawbacks of explicit alert correlationdrove us to look for a more general approach.This lead to the idea of the case-oriented alertcorrelation. The basic idea is to turn the alertcorrelation problem into a matching problem.More specifically, for each case in the case library,the task is to find a subset of the set of run-timealerts that closely match the alerts in that case. Ifsuch a subset of alerts shows sufficient similarityto the alerts in the case, it is reasonable to believethat the set of alerts are correlated with respectthe same type of attack as represented by thecase. This amounts to what is known as a max-imal matching problem, which can be perfectlysolved by the well-known Hungarian algorithm[13]. But the downside is that the complexity ofthe algorithm is too high for it to be a practicalsolution. To work around this problem, we intro-duce the concept of order-preserving matching.


ISSN: 1109-2750100


This is based on the observation that alerts areoften fired in a sequential manner. A dynamicprogramming solution for the order-preservingmatching is given, which has linear time complex-ity. Our experiment shows that this is as good asthe Hungarian algorithm in solving the problemof alert correlation. Our case-oriented alert cor-relation has also demonstrated great potential indetecting attacks.

The rest of this paper is organized as follows.Some related work will be discussed in Section2. Then some basic data representations em-ployed in our approach are introduced in Section3. We discuss case-oriented alert correlation andthe Hungarian algorithm in Section 4. Sections5 and 6 detail the dynamic programming solu-tion for order-preserving matching and the asso-ciated algorithm. Section 7 gives our experimen-tal results. Section 8 provides some concludingremarks.

2 Related Work

There have been several proposals that addressthe problem of alert correlation. Approaches suchas probabilistic alert correlation [17] and alertclustering methods [2] and [7] do not use pre-defined knowledge other than alerts themselvesand are based on the similarity between alert at-tributes. Although they are effective in findingsimilar alerts (e.g., alerts with the same sourceand destination IP addresses), approaches of thistype are criticized for not being able to discoverdeep logical connections between alerts.

Some approaches, such as [5] and [11], cor-relate alerts according to the attack scenariosspecified by human users or learned from train-ing data by knowledge discovery approaches (e.g.,data mining and machine learning). A limitationof these methods is that they are restricted toknown attacks or variants of known attacks.

Approaches based on prerequisites and conse-quences of attacks, such as [3], [14], and [16], maydiscover novel attack scenarios. Intuitively, theprerequisite of an attack is the necessary condi-tion for the attack to be successful, while the con-sequence of an attack is the possible outcome ofthe attack. However, in practice, it is impossibleto predefine the complete set of prerequisites andconsequences. In fact, some relationships can-not be expressed naturally with the given set of

terms.Some approaches, such as [12] and [15], ap-

ply additional information sources, e.g., firewallsand system state monitors, to facilitate alert cor-relation. System state information may includesuch items as number of users, amount of mem-ory usage, and amount of CPU usage. In partic-ular, [12] has proposed a formal model, M2D2,for alert correlation using such multiple infor-mation sources. Because of this multiplicity ofsources, their method can potentially lead to bet-ter results than those that look at alerts only.However, it invites more human involvement andmakes the development of the intrusion detectionsystem more time-consuming and error-prone inpractice.

In some relevant fields such as network man-agement and supervision, CBR has been appliedto alert correlation [10]. In addition, CBR canalso be used as a detection approach for a stand-alone system [6]. Our approach significantly dif-fers in that it applies an XML description of prob-lems and a generic XML distance measure.

3 Problems and Cases

Problems and cases have to be clearly defined fora specific domain where CBR is applied. Ourdomain is meta intrusion detection. For this wedefine a problem as a pattern of alerts represent-ing an attack and a case as a previously knownproblem together with information regarding itsidentity and possibly some prescribed response.Normally, an intrusion is a series of attacks. Itseems simpler to let a problem represent the en-tire intrusion. As the same type of attacks canbe employed to launch different types of intru-sions, defining a problem at the attack level al-lows a finer-grained analysis. The conventionalapproach to characterize problems is through aset of predefined features. In other words, a prob-lem is described by a set of features together withtheir values. This is basically the same as theattribute-value representation employed by otherknowledge or learning systems. Under such a rep-resentation, data can be put into a single table.

The information the meta IDS can obtain dur-ing an attack consists of IDMEF alerts from de-ployed sensors. Alert information is inherentlyheterogeneous. Some alerts are defined with verylittle information, such as origin, destination,


ISSN: 1109-2750101


Figure 1: The XML representations of a case, a problem and an alert.

name, and time of the event. Other alerts pro-vide much more information, such as ports or ser-vices, processes, user information, and so on. Inaddition, alert information may contain extendedinformation due to the adoption of new detectionapproaches. Thus, predefining a fixed set of fea-tures that can cover all important information inalerts is almost impossible. Instead of attempt-ing this, we choose a more flexible approach to de-scribe cases in our design. The collection of alertsgenerated from different sensors during the at-tack, which forms a pattern of alerts, constitutesthe description of an attack (or a problem). Sincean alert is an XML object, in order to facilitatethe aggregation of alerts, a problem is also repre-sented in XML. The alerts comprising a problemare organized according to the sensors that pro-duced them, and alerts from the same senor aresorted in chronological order. As mentioned, acase consists of the description of an attack andits identity. Figure 1 shows the representationsand structures of a case, a problem, and an alert.

The XML representation of objects is inher-ently different from the attribute-value represen-tation applied by conventional CBR systems. Inan attribute-value representation, an object is de-scribed by a fixed number of attributes (attributenames and attribute values; attributes are called‘features’ in CBR). Although the attribute-valuerepresentation is popular and used in a number ofknowledge systems, it has a limitation when de-scribing complex objects, such as trees. An XMLdocument is a tree structure. If an XML object istransformed into a set of attributes (the processis called ‘flattening a tree’), some information in

it may be lost. Thus, in our approach, data anal-ysis is performed directly over the alerts, givenas XML objects, rather than working on a set ofattributes (features) extracted from the originalalerts.

4 The Core Methodology

We have called our approach case-oriented alertcorrelation because it makes use of a case library.We assume that there is a case library that con-tains descriptions of known attacks, where, asbefore, cases are identified by their patterns ofalerts represented in XML. The underlying ideais, given some collection of alerts raised by thevarious sensors during some given time frame, tofind those cases in the case library whose alertpatterns are sufficiently similar to some subset ofthe given set to suggest that the attack describedby that case has taken place.

To be more specific, the problem of alert cor-relation has been reduced to a matching problemas follows. The pattern of alerts in a given caseand the collection of runtime alerts form two setsof alerts. An alert in one set can be matchedwith up to one alert in the other set. The dis-tance between these two alerts is associated withthat match. Suppose all the pair-wise distancesbetween alerts in these two sets are available. Amatching problem can be defined as one of look-ing for matches with the overall minimal sum oftheir distances. The resulting matching deter-mines a pattern of alerts that can be found fromthe runtime alerts and is most similar to one inthe given case. Given a set of runtime alerts, amatching problem is formulated for each case in


ISSN: 1109-2750102


Correlated alerts (A new pattern of alerts)

<problem> <sensor id = “s1”>

+ <IDMRF-message> … A11


<sensor id = “s2”> + <IDMRF-message> … A21


</problem>

+ <IDMRF-message> … B11






B11 B12 B13

A11 0.30 0.10 0.20

A12 0.25 0.20 0.15

B21 B22 B23

A21 0.20 0.26 0.35

A22 0.30 0.25 0.31

From sensor 2

From sensor 1

Runtime alertsPattern of alerts in a given case

<problem> <sensor id = “s1”>



<sensor id = “s2”> + <IDMRF-message> … B21


</problem>

Figure 2: An example of case-oriented alert correlation.

the case library.

The manner in which these subsets are identi-fied and applied is illustrated in Figure 2. A frag-ment of some given set of alerts is shown in themiddle column, where the alerts are organized ac-cording to the sensors that produced them. Herethere are 6 alerts, with 3 coming from sensor S1,denoted B11, B12, and B13, and 3 from sensorS2, denoted B21, B22, and B23. On the left isshown the alert pattern from some case in thecase library. For the purposes of the illustration,this pattern also involves the same sensors, S1and S2, although in general this need not be thecase. Some cases might involve other sensors, andmight not involve either S1 or S2. The patternof alerts in the given case consists of 4 alerts, A11

and A12 from S1, A21 and A22 from S2.

We use the pattern in the case to extract asubset of the alerts in the given set and builda new (derived) pattern as shown on the right.This is done as follows. First, for each sensor,we compute the pair-wise distance between thealerts in the case and those in the given set, us-ing the distance measure described in [8]. Theseare then recorded in a distance matrix as shownat the bottom of Figure 2.

To these matrices we next apply the Hungar-ian algorithm, to find the optimal matching, i.e.,the set of alert pairs with the minimum sum ofthe distances. These are shown in the boxes inthe matrices. The matched alerts from the givenset are then extracted to form the derived set.Last we apply the distance measure again to findthe distance between the derived pattern and the

pattern of the case. If this distance is below somespecified threshold, we take this is as meaningwhat was explained above, i.e., that alerts ex-tracted from the given set are correlated in thesame manner as their corresponding alerts in thecase and that the attack represented by the caseis likely to have occurred.

It should be noted that this is done for ev-ery case in the case library. Thus it is possiblethat more than one case will be matched in thismanner, indicating that possibly more than oneattack has taken place.

5 Order-Preserving Correlation

The methodology described in the previous sec-tion does not consider the temporal ordering ofthe sequences of alerts. This, however, can serveas valuable additional information. Intrusions areoften carried out in phases, through a series ofattacks. An earlier attack normally prepares theconditions for following attacks. In the differentphases, different types of alerts are generated. Asa result, alerts are fired in a temporal sequencethat corresponds to the attack phases. Whenmatching two sets of alerts, if the order of alertsin one set is the same as the order of their matchesin the other set, the matching between these twosets is said to be order-preserving. A graphicalillustration of this notion is given in Figure 3.If the elements in the sets are ordered, (a) and(b) show two matchings where the ordering hasbeen broken, while (c) and (d) are two matchingswhere the ordering is preserved.

By imposing an order constraint on the


ISSN: 1109-2750103


Figure 3: Arbitrary matching (a) and (b) and order-preserving matching (c) and (d).

matching, the possible matches for one elementin one set is no longer all the elements in theother set. As a result, we can significantly reducethe solution search space by concentrating on asmaller set of possible matchings.

To illustrate the method, suppose we aregiven two sets X = {x1, x2, x3} and Y ={y1, y2, y3, y4, y5}, with cost matrix C,

2 1 5 4 36 7 8 9 103 6 4 5 2

where the element ci,j is the cost c(xi, yj) for xi

matching yj, which, in our application, is the dis-tance between the two alerts. We wish to find aminimal cost matching of the members of X withthose in Y that preserves their orderings. Thepossible matchings for each member of X, giventhis ordering constraint, are shown in Figure 4.The left diagram shows that x1 can be matchedwith any of y1, y2, or y3. The reason that x1

cannot be matched with either y4 or y5 is thatthen there would be no order-preserving match-ings for x2 and x3. Similar considerations applyto give the possible matchings indicated in thecenter and right diagrams for x2 and x3.

How one determines this minimal cost is thecrux of dynamic programming (DP). In DP, aproblem is divided into many sub-problems thatcan be solved easily. The solutions of those sub-problems are then used to find the overall solutionof the original problem. Our method will first beexplained in terms of the foregoing example, andthen the general formulation will be presented.We consider the process of determining a set of

matchings for the xi as progressing through a se-ries of stages. In the present example, the firststage is to choose a matching for x1, the secondstage is to choose one for x2, and the third is tochoose one for x3. Each stage s has a possiblecost that is determined by the possible matchesfor xs. Thus, in our example, state 1 may havecost 2, 1, or 5, according to which of, y1, y2, or y3

is matched with x1. In the language of dynamicprogramming, if the action at stage s is to matchxs with yk, then that stage is said to be in stateks. The sum of the costs of all states is the cost ofan order-preserving matching. Our goal is to findthe minimal overall cost. Given these notations,then the minimum cost of getting to this stageand state is denoted fs(ks). To illustrate, thesecosts for the stages and states in the foregoingexample are determined as follows.

f1(1) = 2, f1(2) = 1, f1(3) = 5

These values are simply the costs of matching x1

respectively with y1, y2, y3, as given by the costmatrix.

f2(2) = 7 + f1(1) = 7 + 2 = 9

In stage 2, one can match x2 with y2 only if, instage 1, x1 was matched with y1. Otherwise orderwould not be preserved. Hence the total accumu-lated cost at this stage and state is the sum of theearlier choice, given as f1(1), and the cost of thenew matching, as given by the cost matrix.

f2(3) = 8 + min{f1(1), f1(2)}= 8 + min{2, 1} = 9

Here the reasoning is similar, except if we arematching x2 with y3, then there were two possi-ble matchings in stage 1 for x1, namely, with y1


ISSN: 1109-2750104


Figure 4: Possible matchings for the xi.

or y2. So the minimum accumulated cost at thisstate and stage is determined as the minimumof the previous two possibilities plus the cost ofthe new matching as given by the cost matrix.The remaining steps are determined in this samemanner.

f2(4) = 9 + min{f1(1), f1(2), f1(3)}= 9 + min{2, 1, 5} = 10

f3(3) = 4 + f2(2)= 4 + 9 = 13

f3(4) = 5 + min{f2(2), f2(3)}= 5 + min{9, 9} = 14

f3(5) = 2 + min{f2(2), f2(3), f2(4)}= 2 + min{9, 9, 10} = 11

Thus the minimal possible overall cost is deter-mined to be the minimum of the values for f3,namely, 11.

The matchings that provide this solution arenot explicitly evident from the computationalprocess, but can be determined as part of theimplementation of the dynamic programming al-gorithm. This will be discussed in the sectionbelow.

The general case illustrated by this exam-ple can now be described as follows. Supposewe are given two sets X = {x1, x2, ..., xn} andY = {y1, y2, ..., ym}. For purposes of this discus-sion, assume that X is the smaller of the two,i.e., that n ≤ m. Then an element in X can onlymatch with m − n + 1 elements in Y ; the candi-date matches for the ith element in X are the ithto (i + m − n)th elements in Y .

As in the example, we can envision this prob-lem as being decomposed along a tree, where herethe tree has n levels. Moreover, again each pathupwards through the tree represents a series ofstages in constructing a complete matching forthe xi. As above, where s is the stage and yk isthe element matched with xs at this stage, the

stage is said to be in state ks, and the minimumcost of getting to this stage and state is denotedfs(ks) . The recursion formulas for dynamic pro-gramming, which compute these costs are:

f1(k1) = c1,k1(1)

for all possible values of k1, and

fi(ki) = ci,ki+ min

i−1≤j<ki

fi−1(j), (2)

where 1 < i ≤ n, for all possible values of eachki. The constraint i − 1 ≤ j < ki on states overwhich we minimize in (2) ensures the requirementfor ordered matching. The computation producesdifferent costs on different states at the final stagen, namely, fn(n), fn(n+1), . . . , fn(m). The solu-tion for the overall problem is the minimal cost,namely, min

n≤kn≤m{fn(kn)}.

6 The Algorithm for

Order-Preserving Matching

From formulas (1) and (2), it seems natural thatthere should be a recursive solution for our algo-rithm. However, the implementation of our dy-namic algorithm is by iteration instead of recur-sion. This is because a recursive solution maycause many common sub-problems to be com-puted repeatedly. For example, in order to com-pute f3(4) and f3(5), some sub-problems, suchas f2(2) and f2(3) have to be computed morethan once, if recursion is applied. As an opti-mization, however, one can compute such sub-problems once and then store the results to readback later. The iteration solution thus is intro-duced to reduce call overhead. In fact, such iter-ative implementations of dynamic programmingalgorithms are quite common in dynamic pro-gramming research, cf. [9].


ISSN: 1109-2750105


Figure 5: Initialize the cache table

Figure 6: Algorithm of order-preserving matching

The algorithm is presented in Figures 5 and6, and the result of applying this algorithm tothe 3 by 5 matrix in Section 6 is shown in 7. Theroutine ‘initialize’ in 5 is invoked by ‘ordered-min-cost’ in 6, and has the purpose of extracting thedata of interest from the initial matrix and insert-ing this into a cache table. In this example, n = 3and m = 5, and the result of executing the line“C′ = initialize(C)” is to create the cache tableshown on 7(a). The result of the first iterationof the outer for loop, i.e., when i = 2 is shown in7(b). The code is mostly self-explanatory. Theline “Attach idx to C′(i,j)” means to create thesuperscript shown in Figure 7. The result of thesecond iteration, when i = 2, is shown in Figure7(c).

This matrix can then be used to determine theordered matchings that yield the minimal overallcost, together with this cost. These are indicated

by the gray boxes in 7(d). First, the minimalvalue is selected (grayed). This value (i.e., 11) isthe desired minimal overall cost. Then the super-script on the value (i.e., 1) is taken as the indexof the cell to be grayed on the immediately pre-ceding row (i.e., row 2). In turn, the superscripton this value (i.e., 1) is taken as the index of thecell to be grayed on its immediately precedingrow (row 1). The interpretation of the grayedboxes is that the best matching (with cost 11) isobtained by having x1 match its first candidatematch, namely y1 (see 4), having x2 match itsfirst candidate match, namely y2, and having x3

match its third candidate match, namely y5.

7 Experiments

We used the DARPA Cyber Panel Grand Chal-lenge Problem (GCP) program [4], an attack sim-ulator, to evaluate the case-oriented alert correla-


ISSN: 1109-2750106


Figure 7: An example of changes in a cache table

tion methodology. The experiments assume thata meta IDS is set up to protect a corporation onthe large scale. All the cases and problems forthe experiments came from the attack scenariossimulated by the program. The experimental re-sults have shown the enhanced case-oriented alertcorrelation is a promising approach.

7.1 The Grand Challenge Problem

Simulator

The GCP models a fictitious shipping company,called Global, Inc., and allows one to simulatethis company under different kinds of strategiccoordinated attacks. As such, it exhibits the pri-mary characteristics of many large, network cen-tric organizations. The GCP scenario describesthree strategic, coordinated cyber attacks thatare conducted against Global. These attacks ex-emplify typical attack classes: life-cycle, insider,or distributed denial of service of attacks (de-noted as A1, A2, and A3, respectively).

The Global corporate structure has 5 differ-ent types of network sites, namely Headquar-ters (HQ), Air Planning Center (APC), cor-porate transport ships (SHIP), Air TransportHubs (ATH), and Partner Air Transport Hubs(PATH). Each site is defended by typical sensors,as well as postulated high-level sensors. There arein total 10 types of sensors in the networks. Allthe sensors generate the alerts in IDMEF mes-sages. Simulation tools allow researchers to con-

figure a network of arbitrary size and observe thealert streams that would be generated by sensorswhen attacks are perpetrated.

7.2 Experiment Setup

Running an attack scenario with the GCP simu-lator requires providing the following inputs:

• Specification of the number of sites of typeSHIP, ATH, and PATH. This sets the sizeof the corporation being attacked.

• Three arbitrarily chosen integers, evidentlyused as seeds for random number genera-tors applied during the attack simulation.

• Specification of the type(s) of attack. Thereare three choices, A1, A2, and A3. One canselect any combination of these, e.g., onecan simulate having simultaneous attacksof types A1 and A3.

• Specification of whether the attacks should,or should not, be accompanied withrandomly generated background attacks(noise).

We experimented with a number of differentsite configurations, represented here as triples.To illustrate, (1, 1, 1) indicates one site of eachtype, SHIP, ATH, and PATH, (1, 5, 10) indicatesone of type SHIP, 5 of type ATH, and 10 of type


ISSN: 1109-2750107


PATH. For each such configuration, we built asimple case library as follows. For each attacktype, A1, A2, A3, we ran one attack of this typewithout background noise. The alerts generatedfor each attack then became the problem part ofa case, and the identity of the attack type formedthe solution part of the case. Here it should benoted further that the simulator allows attacksof type A3 (distributed denial of service) to begenerated only for configurations having one siteof each type. Thus for configuration (1, 1, 1), thecase library has 3 cases, one for each of A1, A2,A3. For all other configurations, the case libraryhas only two cases, one for each of A1 and A2.Different random number seeds were chosen foreach run.

Attacks of combinations of A1, A2 and A3were then run for each configuration, with back-ground noise, and the alert streams were col-lected. These attacks are referred to as “prob-lems”. The alert streams were then input to ourcase-based reasoner, once using the Hungarian al-gorithm, and once using dynamic programming,as the underlying matching technique.

7.3 Experimental Results

The results of these experiments are shown in Ta-bles 1 through 5. In Table 1, the configuration(1, 1, 1) was used. The column heading C1(A1)indicates library case 1, created using an attackof type 1. Similarly for other columns. The sec-ondary column headings “H” and “D” indicateresults using the Hungarian algorithm and dy-namic programming, respectively. The row labelsindicate the problems. To illustrate, problem P1involved an attack of type A1, and problem P4involved simultaneous attacks of types A1 andA2.

The numbers in the cells are the computeddistances between the given problem and case asdetermined by the indicated matching technique.Thus, the 0’s for both H and D under case 1,for problem 1, represent an exact match, i.e., theproblem represented by the case is reported tohave occurred. The other cells in this table, aswell as those in the other tables, are interpretedsimilarly.

From these tables, we can draw a few gen-eral conclusions. The closeness of the resultsfor the Hungarian algorithm based case-oriented

alert correlation and the dynamic programmingbased cased-oriented alert correlation approachesdemonstrate that they are equally good in cor-relating alerts. However, the dynamic program-ming approach has much less complexity. Thecomplexity of the Hungarian algorithm is givenas O((m+n)3) for a m by n cost matrix. In con-trast, with the same m by n cost matrix (supposen < m), the complexity of the dynamic program-ming approach is O(n × (m − n + 1)) because itonly has to fill a cache table with n× (m−n+1)cells in order to find the solution. Thus, the dy-namic programming method is a more efficientsolution for case-based alert correlation.

Given a problem and a case, one of two sit-uations is possible, either the problem is repre-sented by the case, or it is not. Whether the prob-lem is represented by the case can be determinedby means of a threshold on the distance betweenthat problem and case. If the distance exceedsthe threshold, the problem can be determined asnot matching the case, and if the the distanceis smaller than the threshold, it does match thecase, i.e., the attack represented by the case canbe assumed to have occurred.

One can determine such a threshold for eachcase as follows. We illustrate this for case C1where the matching used the Hungarian algo-rithm. We scan through all the tables for thesmallest distance between a problem and C1,where the problem does not match C1. Given theabove tables, this is 0.7798, the distance betweenproblem P6 and C1 in Table 1. We next scanthrough all the tables for the largest distance be-tween a problem and C1, where the problem doesmatch C1. This is 0.0588, the distance betweenproblem P3 and C1 in Table 2.

Thus any value in the range [0.0588, 0.7798]could be an acceptable threshold. It is desirable,however to have this be as insusceptible as possi-ble to noisy data or variants of known attacks. Astraightforward and practical approach is to usethe median of the interval as the threshold. Thus,for cases representing attack A1, the threshold isgiven as (0.7798 + 0.0588)/2 = 0.4193.

Using this same technique one can computea threshold for cases representing attack A2 as(0.5002+0.3283)/2 = 0.4142, and a threshold forcases representing attack A3 as (0.3078+0.0)/2 =0.1539. Since the table values corresponding to


ISSN: 1109-2750108


Table 1: Experimental results with (1,1,1)

C1(A1) C2(A2) C3(A3)H D H D H D

P1(A1) 0 0 0.5386 0.5484 0.3078 0.3078

P2(A2) 0.7828 0.7847 0.0185 0.0185 0.3078 0.3078

P3(A3) 0.8126 0.8137 0.5697 0.5697 0 0

P4(A1,A2) 0 0 0.0185 0.0185 0.3078 0.3078

P5(A1,A3) 0 0 0.5002 0.51070 0 0

P6(A2,A3) 0.7798 0.7818 0.0185 0.0185 0 0

P7(A1,A2,A3) 0 0 0.0185 0.0185 0 0


C1(A1) C2(A2)H D H D

P1(A1) 0.0577 0.0595 0.5030 0.5047

P2(A2) 0.8943 0.8951 0.3327 0.3327

P3(A1,A2) 0.0588 0.0576 0.3001 0.3018

the dynamic programming matching method areessentially identical to those based on the Hun-garian algorithm, we will restrict our discussionto just those involving the latter.

As can be seen from the tables, these thresh-old values effectively identify the attacks con-tained in all the given problems. Thus these re-sults suggest that the overall technique can beeffective. One could, if one wished, add furthertables based on further network configurations tothe mix, but the present choice of configurationsis sufficiently varied that such further additionswill not significantly change the threshold valuesfrom those computed here.

To further validate our method for computingthresholds, one can apply the well-known leave-one-out analysis technique. To illustrate, let usfirst leave out problem P1 in Table 1. This meansthat we recompute the thresholds using all thevalues in the tables except those in the row P1 ofTable 1. We then see if these values can be used toeffectively identify the attack present in P1 of Ta-ble 1. It will be seen that it does. (Coincidentally,in this case the new computation gives the samethree threshold values as computed above.) Thisis then repeated for every problem (row) in everytable. We found that in all cases the method suc-

ceeds; the newly computed thresholds effectivelyidentify the attacks present in each problem.

Note that a large threshold tends to makefalse positives and a small threshold tends tomake false negatives. Thus, if in some applica-tion it is determined that false positives are lessdesirable than false negatives, one can lower thethreshold values by some appropriate amount.Conversely, if false negatives are less desirablethat false positives, one can raise the thresholds.

8 Concluding Remarks

Alert correlation is a difficult and critical issuefor any multi-sensor intrusion detection environ-ment. This paper has presented an approachusing case-based reasoning for alert correlation.We call this case-oriented alert correlation. Theunderlying idea is to transform alert correlationinto a matching problem, using either a maxi-mal matching or an order-preserving matching.The Hungarian algorithm provides the solutionfor maximal matching, whereas dynamic pro-gramming is used for order-preserving matching.These were determined to be equally effective,while the latter is much more efficient. Exper-iments with the DARPA Grand Challenge Prob-


ISSN: 1109-2750109




P1(A1) 0.0298 0.0298 0.5002 0.5107

P2(A2) 0.9449 0.9454 0.3282 0.3293

P3(A1,A2) 0.0298 0.0302 0.2973 0.3079



P1(A1) 0 0 0.5002 0.5107

P2(A2) 0.8596 0.8607 0.3282 0.3293

P3(A1,A2) 0 0 0.2973 0.3079

lem program have shown that our case-orientedapproach is very effective in finding correlatedalerts and the corresponding attacks.

The CBR methodology has several advan-tages. One is that it does not require exten-sive training data. This is significant inasmuch aslarge amounts of training data often is not avail-able. In CBR, effective performance can often beobtained with only a small case library. A secondadvantage is that the knowledge acquisition pro-cess is generally easier. Obtaining case data andincorporating this into the case library is quitestraightforward, whereas other approaches oftenrequire complex learning techniques. A third ad-vantage is that the methodology can be appliedto intrusions that occur as a result of a series ofattacks, as well as those occurring in a single at-tack. This only requires that some expert be ableto identify the relevant attacks in the sequenceand write these into a case for the library, i.e., sothat a case becomes a description of the entire se-quence of attacks. Since CBR allows for approx-imate matches between problems and cases, thishas the added benefit that the system can detectvariants of the same attack, as long as the vari-ance is not too great. A fourth advantage is thatthis methodology is easier to learn and use thanapproaches which, like the well-known Snort sys-tem, require that one learn a new special purposelanguage. A fifth is that the CBR approach is notthwarted by false/noisy alerts and quite effective

in detecting multiple attacks, since the presenceof such alerts seldom affects the best-matchingprocess.

One downside of this approach is the highcomputation cost when there is a storm of alertsin the network. Our experiment used all the avail-able runtime alerts, i.e., everything in the givendata set. In real applications, however, one realis-tically can only keep the alerts occurring in sometime window and shift the window over time.How big the window should be, and whether itssize should be fixed or adjustable, are questionsfor future study. Another drawback is that theeffectiveness of the intrusion detection can seri-ously degrade if one or more of the relevant sen-sors are removed from the network. This couldcause cases in the library to not be matched, eventhough the attacks specified by those cases areactually occurring. For best performance is itnecessary that all sensors corresponding to theproblem features used in the case library be op-erational.

References

[1] A. Bundy. Artificial Intelligence Techniques:A Comprehensive Catalogue. Springer-VerlagNew York, Inc, 4th edition, 1995.

[2] F. Cuppens. Managing alerts in a multi-intrusion detection environment. In Proceed-ings of the 17th Annual Computer Security


ISSN: 1109-2750110




P1(A1) 0.0563 0.0576 0.5030 0.5047

P2(A2) 0.9304 0.9311 0.0185 0.0185

P3(A1,A2) 0.0822 0.0949 0.2973 0.3079

Application Conference, page 22. IEEE Com-puter Society, 2001.

[3] F. Cuppens and A. Miege. Alert correlation ina cooperative intrusion detection framework.In Proceedings of the 2002 IEEE Symposiumon Security and Privacy, page 202, Oakland,CA, 2002. IEEE Computer Society.

[4] DARPA. Cyber panel grand challenge prob-lem specification version 4.1. Technical re-port, June 2004.

[5] H. Debar and A. Wespi. Aggregation and cor-relation of intrusion-detection alerts. In Lec-ture Notes In Computer Science, Proceedingsof the 4th International Symposium on Re-cent Advance in Intrusion Detection, pages85–103. Springer-Verlag, 2001.

[6] M. Esmaili, B. Balachandran, R. Safavi-Naini, and J. Pieprzyk. Case-based reasoningfor intrusion detection. In ACSAC ’96: Pro-ceedings of the 12th Annual Computer Secu-rity Applications Conference, pages 214–137.IEEE Computer Society, 1996.

[7] K. Julisch. Mining alarm clusters to improvealarm handling efficiency. In Proceedings ofthe 15th Annual Computer Security Applica-tions Conference, pages 12–21. IEEE Com-puter Society, 2001.

[8] J. Long, D. G. Schwartz, and S. Stoecklin. AnXML distance measure. In Proceedings of the2005 International Conference on Data Min-ing, pages 119–125, 2005.

[9] R. Luus. Iterative Dynamic Programming.CRC Press, 2000.

[10] L. Lewis. A case-based reasoning approachto the resolution of faults in communica-tion networks. In Proceedings of the IFIP

TC6/WG6.6 Third International Symposiumon Integrated Network Management with par-ticipation of the IEEE Communications Soci-ety CNOM and with support from the Insti-tute for Educational Services, pages 671–682.North-Holland, 1993.

[11] B. Morin and D. Debar. Correlation of in-trusion symptoms: an application of chroni-cles. In Lecture Notes In Computer Science,Proceedings of the 6th International Sympo-sium on Recent Advance in Intrusion Detec-tion, pages 94–112. Springer-Verlag, 2003.

[12] B. Morin, L. Me, H. Debar, and M. Ducasse.Correlation of intrusion symptoms: an appli-cation of chronicles. In Lecture Notes In Com-puter Science, Proceedings of the 5th Interna-tional Symposium on Recent Advance in In-trusion Detection, pages 115. Springer-Verlag,2002.

[13] J. Munkres. Algorithms for the assignmentand transportation problems. Journal OfSIAM, 5(1):32–38, March 1957.

[14] P. Ning, Y. Cui, and D. S. Reeves. Con-structing attack scenarios through correlationof intrusion alerts. In Proceedings of the 9thACM Conference on Computer and Commu-nications Security, pages 245–254, Washing-ton, D.C., November 2002.

[15] M. W. Fong P. A. Porras and A. Valdes. Amission-impact-based approach to INFOSECalarm correlation. In Lecture Notes in Com-puter Science, Proceedings Recent Advancesin Intrusion Detection, pages 95–114, Zurich,Switzerland, October 2002.


ISSN: 1109-2750111


[16] S. Templeton and K. Levitt. A re-quires/provides model for computer attacks.In New Security Paradigms Workshop, Pro-ceedings of the 2000 workshop on New se-curity paradigms, Ballycotton, County Cork,Ireland, October 2001. ACM Press.

[17] A. Valdes and K. Skinner. Probabilisticalert correlation. In Lecture Notes In Com-puter Science, Proceedings of the 4th Interna-tional Symposium on Recent Advances in In-trusion Detection, number 2212, pages 54–68.Springer-Verlag, 2001.


ISSN: 1109-2750112


case-oriented alert correlation -...

Documents