2013 IEEE Conference on Communications and Network Security (CNS) - National Harbor, MD, USA


Objective Metrics for Firewall Security: A Holistic View

Mohammed Noraden Alsaleh, Saeed Al-Haj, and Ehab Al-Shaer
Department of Software and Information Systems
University of North Carolina Charlotte
Charlotte, NC, USA

{malsaleh, salhaj, ealshaer}@uncc.edu

Abstract—Firewalls are the primary security devices in cyber defense. Yet, the security of firewalls depends on the quality of protection provided by the firewall policy. The lack of metrics and attack incident data makes measuring the security of firewall policies a challenging task. In this paper, we present a new set of quantitative metrics that can be used to measure, as well as compare, the security level of firewall policies in an enterprise network. The proposed metrics measure the risk of attacks on the network that is imposed due to weaknesses in the firewall policy. We also measure the feasibility of mitigating or removing that risk. The presented metrics are proven to be (1) valid as compared with the ground truth, and (2) practically useful, as each one implies actionable security hardening.

I. INTRODUCTION

The cyber world has become the primary focus of research as most of our socioeconomic activities have gone cyber-based and new attacks are emerging over time. Therefore, it is important to keep improving network security for better strategic defense mechanisms. In any network system, firewalls play a crucial role in defining secure boundaries. The quality of protection provided by the deployed firewalls contributes significantly to building a secure network. Assessing the defense level of a network requires rigorous evaluation of the firewall policies in the network. With the growing complexity of managing firewall policies, manifested in policy sizes and rule correlation [1], [2], there is a tremendous need for quantitative metrics that can pinpoint weaknesses and measure the complexity of firewall policies. Assessing and measuring the quality of a firewall policy is not an easy task; it requires careful analysis of rule distribution and rule interaction in single and distributed firewalls in the network.

There are many challenges in designing measurable metrics for firewall security. First, there is no specific standard that can be used to evaluate various metric methodologies. Second, security boundaries are usually not well defined, as there are no constraints to define such boundaries. Well-defined security metrics benefit the security community in the following aspects: first, the ability to compare different firewall policies relative to one another; second, the ability to assess the quality of protection of a firewall over time by comparing the effects of dynamic changes on the policy; third, the ability to test and measure the security level of a firewall policy before deployment. These aspects are important factors in designing networks. The metrics can suggest restructuring the network or adding more devices to reach the desired level of protection.

Designing practical metrics requires taking into consideration some criteria that ensure metric usability [3], [4]. These criteria can be summarized as follows:

• Quantitatively accurate: the metric has to give an accurate quantitative value to be used practically; it should not give a wide range or an approximation that cannot be used to make a proper decision.

• Qualitatively measurable: the metric has to be designed to evaluate what it is intended to measure; otherwise, the metric is meaningless.

• Valid: a well-defined metric is one that can be validated and verified. For example, if a metric is calculated based on different factors, each factor has to participate in calculating the final value. Increasing or decreasing the value of one factor must be reflected in the cumulative metric value accordingly, based on the metric definition.

• Repeatable: a metric value has to be independent of the analyst performing the measurement.

To the best of our knowledge, this is the first work to address and define quantitative security metrics for firewall policies. Previous studies focused on the anomalies in firewall policies and measured complexity based on the number of policy misconfigurations [1], [2]. In this paper, however, we focus on evaluating firewall policies that have no misconfigurations, which makes our task remarkably challenging and distinguishes it from previous works. Our contribution in this paper is the design of a set of firewall security metrics that can measure the risk of attacks on the network due to weaknesses in the firewall policy. We also measure how feasibly this risk can be removed or mitigated. We provide metrics that are measurable, provable, actionable, and repeatable. They can be used as standalone measures and to conduct relative comparisons between instances of firewall policies over different time periods or between different firewall policies. The proposed metrics provide a holistic view because they consider the structure of the firewall policy as well as end-host security. The metrics are defined to include the trust of traffic sources, the vulnerabilities of traffic receivers, as well as the potential risk propagation due to network

978-1-4799-0895-0/13/$31.00 ©2013 IEEE

6th Symposium on Security Analytics and Automation 2013


configuration. Our approach employs the power of formal methods, using Binary Decision Diagrams (BDDs) and SAT tools to model and implement these metrics.

The rest of the paper is organized as follows. Section II formally defines the network configuration model. Section III presents the proposed security metrics to evaluate and compare firewall policies. In Section IV we describe the evaluation plan and report the results. Section V surveys the related work. Finally, conclusions and future remarks are presented in Section VI.

II. NETWORK MODELING

In this work, we use ConfigChecker [22] to model the network and to run the reachability analysis. ConfigChecker models the network as a single monolithic finite state machine. The state space is the cross-product of the packet attributes by its possible locations in the network. The packet attributes include the header information that determines the network response to a specific packet. The locations represent the network devices that transfer the packet to its destination.

A. State representation

A single state is characterized by a set of variables that represent (1) the packet source and destination IP addresses and port numbers, as well as the protocol, and (2) the ID of the device that is currently processing the packet. The following represents the characteristic function:

σ : IPs × ports × IPd × portd × loc → {true, false}

where
IPs    is the 32-bit source IP address
ports  is the 16-bit source port number
IPd    is the 32-bit destination IP address
portd  is the 16-bit destination port number
loc    is the 32-bit IP address of the device currently processing the packet

The function σ encodes the state of the network by evaluating to true whenever the parameters used as input to the function correspond to a packet that is in the network, and false otherwise. Note that because we abstract payload information, we cannot distinguish between two packets that are at the same device if they also have the same IP header information. Each device in the network is modeled by describing how it changes packets. For example, a router might change the location of the packet but leave all the header information unchanged. A device performing network address translation might change the location of the packet as well as some of the IP header information. The behavior of each of these devices can be described by a list of rules. Each rule has a condition and an action. The rule condition is described using a Boolean formula over the parameters of the characteristic function σ. If the packet at the device matches a rule condition, then the appropriate action is taken. Performing actions is represented as transitions in the state machine. Based on each device's configuration, a transition relation is constructed for that device. The final state machine is the disjunction of all devices' transition relations.
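As an illustrative sketch only (ConfigChecker encodes σ symbolically with BDDs rather than enumerating states), the characteristic function can be viewed as membership in a set of packet states; the concrete addresses below are hypothetical:

```python
# Illustrative sketch only: ConfigChecker encodes sigma symbolically with
# BDDs; here we model the same idea explicitly as a set of packet states.
from typing import NamedTuple

class PacketState(NamedTuple):
    ip_s: int    # 32-bit source IP address
    port_s: int  # 16-bit source port number
    ip_d: int    # 32-bit destination IP address
    port_d: int  # 16-bit destination port number
    loc: int     # 32-bit IP address of the device processing the packet

# Hypothetical set of packet states currently "in the network".
network_states = {
    PacketState(0x0A000001, 40000, 0x0A000002, 80, 0x0A0000FE),
}

def sigma(state: PacketState) -> bool:
    """Characteristic function: true iff this packet state is in the network."""
    return state in network_states
```

The symbolic BDD encoding makes this membership test feasible over the full 128-bit-plus state space, which an explicit set, as above, could not scale to.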

ConfigChecker provides a comprehensive network model for different types of network devices, including routers, firewalls, IPSec gateways, NAT, etc. In the following section, we include the modeling of firewall policies. For further information about modeling other devices, please refer to [22].

B. Firewall Modeling

A firewall can perform one of two actions: drop the packet or forward it to another device without changing the packet attributes. Each firewall policy is a sequence of filtering rules, R1, R2, ..., Rn. Each rule can be written as Ri : Ci ; ai, where Ci is the constraint on the packet attributes that must be satisfied in order to trigger the corresponding action ai. The constraint Ci is represented as a Boolean expression over the packet attributes. For example,

Ci = (IPs ⇔ v1)∧(IPd ⇔ v2)∧(ports ⇔ v3)∧(portd ⇔ v4)

where (v1, v2, v3, v4) are arbitrary values. Since first-match semantics are used for rule matching in firewalls, a policy can be formally represented as:

Pa = ∨_{i ∈ index(Accept)} ( ∧_{j=1}^{i−1} ¬Cj ) ∧ Ci

where index(Accept) = {i | Ri = Ci ; Accept}, i.e., the indices of rules whose action is "Accept". Any packet that matches the expression Pa will be forwarded. Given the policy expression, we can find the transition relation. Let P2 represent the policy of a firewall whose address is 2 and which is connected to a device whose address is 3; then the transition relation for this firewall is:

(loc = 2) ∧ P2 ∧ (loc′ = 3) ∧
(IP′s = IPs) ∧ (IP′d = IPd) ∧ (port′s = ports) ∧ (port′d = portd)

The primed variables (IP′s, IP′d, port′s, port′d) represent the next state of the transition. The first line of the transition relation means that all packets that match the policy P2 will be transferred from location 2 to the new location 3. The second line indicates that the packet attributes are not changed and are copied from the current state to the next.
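The first-match semantics behind the expression Pa can be sketched as follows; the rule predicates and the example policy are hypothetical stand-ins for the constraints Ci:

```python
# Sketch of first-match rule evaluation. Each rule is a (predicate, action)
# pair; the predicates play the role of the constraints C_i above.
def accepts(policy, packet):
    """Return True iff the first matching rule's action is 'accept'.
    Equivalent to testing the packet against
    Pa = OR_i ( AND_{j<i} not C_j ) AND C_i over the accept rules."""
    for condition, action in policy:
        if condition(packet):
            return action == "accept"
    return False  # implicit default deny

# Hypothetical example policy: deny port 23, then accept traffic to 10.0.0.2.
policy = [
    (lambda p: p["port_d"] == 23, "deny"),
    (lambda p: p["ip_d"] == "10.0.0.2", "accept"),
]
```

A telnet packet (port 23) to 10.0.0.2 is dropped even though the second rule would accept it, which is exactly the shadowing that the ∧_{j<i} ¬Cj terms in Pa capture.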

III. QUANTIFYING FIREWALL SECURITY

Designing quantitative metrics for a firewall policy is a challenging task. One way to quantify a policy is by studying the rules in the policy and how they interact with each other. In this section, we present a set of metrics that quantify a single firewall policy in terms of risk exposure and distribution.

A. Firewall Policy Risk Metric

Risk assessment is one of the main tasks in the risk management process. Along with risk analysis and mitigation, risk assessment helps to reduce the risk level to an acceptable level by identifying, controlling, and minimizing the impact of uncertain events. Risk management drives the decisions of (i) hosting new services on the network, (ii) updating topology or configuration, and (iii) implementing countermeasures for security hardening. Risk analysis is the identification of factors that can hinder the security of the network. In this work, we identify three risk sources that contribute to the total risk of the network: the risk of a successful attack from unknown or untrusted users, the risk of hosting vulnerable applications, and finally the risk of attack propagation through the network. Risk assessment is the process of computing risk. The risk is calculated as a function of a number of factors that can be weighted based on the organization's policies and interests. In our model, the risk is calculated as a function of trust score, vulnerability, and assets. Risk mitigation is the process of implementing controls and countermeasures to prevent the identified risks. Mitigation is normally triggered based on the scores reported by the risk assessment process. Mitigation techniques are beyond the scope of this paper.

Our target in this work is to quantify the security of firewall policies. We quantify the security of a given firewall policy by computing a single score that measures the potential risk imposed on the network by the policy being analyzed. As the firewall policy allows a set of traffic flows (represented as source-destination pairs) to pass through, each flow contributes to the total risk imposed by the policy. The contribution of each flow is calculated based on the flow attributes (source and destination) in addition to the propagation impact.

Hypothesis 1: The potential risk of a particular flow is proportional to the trust index of the source and the potential impact on the destination network.

The potential risk introduced by each traffic flow is a function of three factors: source trust index, destination vulnerability score, and the impact on the network. These factors are defined as follows:

• Source Trust Index. The trust index of a traffic flow is a value between 0 and 1. The index measures how trustworthy a particular source is. The value 0 means the least trustworthy and 1 means the most trustworthy. The trust index is built based on previous knowledge about the sources, i.e., it can be estimated based on traffic logs, the organization structure, or public IP lists. For example, a trust value of 0 is assigned to blacklisted IP addresses. Each organization can have its own trust index calculation. The following equation represents an example for calculating the trust index:

Ti = { fint(CVSS-Score, Exposure)   i ∈ Internal
     { fu(frequency, familiarity)   i ∈ Unknown
     { 0                            i ∈ Blacklisted

In this equation, the trust value Ti of the source i is set to 0 if i belongs to a public blacklist. If the source is inside the network, then the trust index is calculated by the function fint, which takes two parameters: vulnerability score and exposure. The exposure parameter defines how accessible (i.e., reachable) the source is from outside. For instance, a machine that runs multiple services and can be accessed from the internet should have a lower trust index than a machine that is only accessible from internal machines. Finally, if the source is unknown, the trust value is a function (fu) of two parameters: (1) familiarity, which indicates whether the source is the same as, or similar (from the same domain) to, another one seen before, and (2) frequency, which indicates how often this source has been seen before. The values of fint and fu should be normalized to a range between 0 and 1.

• Destination Vulnerability. The vulnerability score represents the susceptibility of the destination to attacks. As the vulnerability score increases, the destination has more vulnerabilities, which increases the attack surface and the probability of being exploited. This score can be obtained using standard vulnerability scoring tools [25], [26].

• Destination Impact. The impact measures the expected damage to the network if a particular destination is attacked or compromised. The impact score is calculated based on direct and indirect damage. The direct damage is the damage caused directly to the assets of the target destination. The indirect damage includes the expected damage if the attack spreads through the network as a result of infecting a particular host. The direct damage depends completely on the assets kept in the intended destination, while the indirect (attack propagation) damage depends also on the network configuration, which determines the reachability from the target destination to other hosts in the network. Hence, the propagation damage is the aggregated damage of the hosts that are reachable from the compromised destination. We show later how to determine the reachability and calculate the total impact.
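A minimal sketch of the piecewise trust index above; since the paper leaves fint and fu organization-specific, the concrete formulas below are placeholder assumptions of this sketch, not the authors' definitions:

```python
def trust_index(source, blacklist, internal_hosts):
    """Piecewise trust index T_i in [0, 1], following the case structure
    above. The concrete f_int / f_u used here are illustrative placeholders."""
    if source["ip"] in blacklist:
        return 0.0
    if source["ip"] in internal_hosts:
        # Placeholder f_int: less trust for vulnerable, exposed machines.
        cvss = source["cvss_score"] / 10.0       # CVSS scores range 0..10
        return max(0.0, 1.0 - 0.5 * cvss - 0.5 * source["exposure"])
    # Placeholder f_u: trust of an unknown source grows with familiarity
    # and with how frequently it has been seen before (both in [0, 1]).
    return min(1.0, 0.5 * source["familiarity"] + 0.5 * source["frequency"])
```

Both placeholder branches are clamped so the result stays normalized to [0, 1], as the text requires of fint and fu.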

Based on the definitions above, the total potential risk of the firewall policy F is defined as:

R(F) = Σ_{f ∈ F} (1 − T_srcf) ∗ P_destf        (1)

where
F        is the set of flows permitted by the firewall policy.
srcf     is the source of the permitted flow f.
destf    is the destination of the permitted flow f.
T_srcf   is the trust index of the source srcf.
P_destf  is the potential impact of destination destf, considering the direct and indirect damage.

As appears in the formal definition, the value of the potential risk is inversely proportional to the value of trust. Intuitively, when the trust index of the source increases, the threat coming from that source should decrease. This is reflected by a smaller risk score. The opposite holds for the value of the potential impact of the destination. If the expected damage is large, then the risk score should be large as well. Note that the destination vulnerability does not appear explicitly in the risk definition; however, it is implied within the impact definition. We calculate the potential impact P_destf for the target host destf by considering the direct damage on the target host and the indirect damage (through attack propagation) on other hosts. The formalization is discussed next in this section.
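Equation 1 can be sketched directly, assuming the per-host trust and impact scores have already been computed and are supplied as lookup maps:

```python
def policy_risk(flows, trust, impact):
    """Total potential risk R(F) = sum over permitted flows of
    (1 - T_src) * P_dest, following Equation (1).
    `flows` is a list of (src, dest) pairs; `trust` and `impact` are
    precomputed per-host score maps (assumed inputs to this sketch)."""
    return sum((1.0 - trust[src]) * impact[dst] for src, dst in flows)
```

A fully trusted source (trust 1.0) contributes nothing, regardless of the destination's impact, matching the inverse proportionality discussed above.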

In addition to comparing different firewall policies, the firewall policy risk metric provides directions for configuration security hardening. The metric helps administrators to (i) prioritize the flows which should be blocked, (ii) identify the policy rules that exhibit the majority of the risk, and (iii) understand the policy weaknesses.

Computing the Firewall Policy Risk. To compute the potential risk of a complete firewall policy, we first need to identify the set of traffic flows that are permitted to pass through. Finding the set of allowed flows is not as simple as enumerating the rules that have an allow action and discarding those with a deny action. This is due to the correlation between the different rules within a single policy. For example, assume that there is a rule in the policy that allows the set S1 of flows to pass through, but it is preceded by another rule that blocks the set S2 such that S2 ⊆ S1. Due to the first-match semantics used in firewalls, only the set (S1 − S2) will be allowed to pass through. Another type of correlation is redundancy: a set of flows may be included in more than one rule. To calculate an accurate policy risk score, we need to calculate the set of flows permitted by the firewall policy.

To calculate this, we use the BDD expression discussed in Section II. Utilizing the disjunctive normal form (DNF) of the policy representation and the BDD encoding, the set of flows is simply the set of satisfying assignments of the firewall BDD expression. The source and destination values are extracted from each satisfying assignment and mapped to the appropriate trust and impact values.

Computing the Potential Impact. As mentioned earlier, the direct damage for a particular host h depends on its vulnerability and the total assets of the host.

Hypothesis 2: The potential impact on a particular host is proportional to the vulnerability score, the total assets, and the propagation impact on other hosts in the network.

In this work, the direct damage of a host is defined as the multiplication of its vulnerability score by its total assets. In addition to the direct damage, the potential impact also includes the indirect damage due to attack propagation. The indirect damage is the aggregation of the damage on the host's neighbors. Neighbors are all the hosts that are directly reachable from the selected host. However, the impact of the first-level neighbors also includes the indirect damage on the second-level neighbors, and so on. We define the impact of host h as a recursive function P(h, τ) as follows:

P(h, τ) = Vh ∗ Ah                                        if τ ≤ 0
P(h, τ) = Vh ∗ ( Ah + δ Σ_{i ∈ N} P(i, τ − 1) )          if τ > 0        (2)

where
Vh  is the vulnerability score of the host h.
Ah  is the total assets of the host h.
δ   is the weight of propagation damage such that 0 ≤ δ ≤ 1.
N   is the set of unvisited neighbors of h.
τ   is a threshold to determine when the recursion stops.

The weight δ is a fraction less than or equal to 1 that weighs the indirect damage so that the damage at the first level accounts for more impact than the damage at the next level, and so on. The threshold τ determines the number of recursion levels to be considered.
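Equation 2 can be sketched as a recursive function; the per-host score maps and the adjacency structure are assumed inputs, and a visited set keeps N restricted to unvisited neighbors as the definition requires:

```python
def potential_impact(h, tau, V, A, neighbors, delta, visited=None):
    """Recursive potential impact P(h, tau) from Equation (2).
    V, A: per-host vulnerability scores and assets; neighbors: adjacency
    map; delta in [0, 1] discounts each further propagation level;
    visited tracks hosts already counted, so N holds unvisited neighbors."""
    visited = (visited or set()) | {h}
    if tau <= 0:
        return V[h] * A[h]                       # direct damage only
    spread = sum(
        potential_impact(i, tau - 1, V, A, neighbors, delta, visited)
        for i in neighbors[h] if i not in visited
    )
    return V[h] * (A[h] + delta * spread)        # direct + discounted indirect
```

With τ = 0 the function reduces to the direct damage Vh ∗ Ah; each extra recursion level adds one more ring of neighbors, discounted by another factor of δ.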

Reachability Analysis. In order to estimate the propagation damage, we need to enumerate the neighbors of each host (i.e., the hosts that are directly reachable from the selected host). For this purpose, we use ConfigChecker, which was introduced in Section II. ConfigChecker provides the means to verify end-to-end reachability requirements. It provides a comprehensive network model as well as a temporal query language based on Computation Tree Logic (CTL) to express the requirements. To verify whether the host src can reach the destination dest, we can run the basic reachability query:

(loc = src) ∧ (sip = src) ∧ (dip = dest) ∧ EF (loc = dest)

This query means that a packet is initially in the host src and eventually, at some point in the future, it will reach the host dest. The variables (loc, sip, dip) represent the location, the source IP address, and the destination IP address, respectively. The operator EF(φ) is a standard CTL operator which means that there is at least one path in which the logical statement φ holds in the future. We assume that the list of hosts in our network is known, and this query can be executed between each pair of hosts in the network to build a connectivity matrix.
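For illustration only, the connectivity matrix that the pairwise CTL queries produce can be approximated with plain breadth-first reachability over a known adjacency map; this is a stand-in for ConfigChecker's symbolic model checking, which additionally accounts for policies and packet transformations along each path:

```python
from collections import deque

def connectivity_matrix(hosts, links):
    """Approximate the pairwise reachability computed by the CTL query
    (loc = src) & (sip = src) & (dip = dest) & EF(loc = dest),
    using BFS over an adjacency map (an illustrative stand-in for
    ConfigChecker's symbolic model checking)."""
    def reachable(src):
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            for nxt in links.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
    return {s: {d: d in reachable(s) for d in hosts} for s in hosts}
```

Note that links here may be directed (as firewall policies usually are), so the resulting matrix is not necessarily symmetric.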

B. Risk Mitigation Metric

In large networks, the response to mitigate attack effects plays an important role in risk management. One major key in assessing firewall policies is to know how the potential risk of the flows accepted by the firewall policy is distributed among hosts in the network. The distribution of the potential risk can be measured based on specific flow attributes such as source and destination addresses. Skewness in the distribution of the firewall policy risk over the source or destination addresses implies that a few source or destination addresses exhibit the majority of the risk. Therefore, the higher the risk distribution skewness, the better the mitigation and control of this risk. On the other hand, a uniform distribution of firewall policy risk imposes extreme challenges in risk control, such as flow monitoring and attack quarantining, because all hosts exhibit similar risk. In this context, we present the Risk Mitigation metric, an entropy-based metric that captures the skewness of the potential risk distribution of a firewall policy. In this paper, we take two fields into consideration to define the metric: the source IP address and the destination IP address. Other fields, such as the source and destination port numbers or the protocol, can be considered as well, but we will focus on studying the flow distribution for these two fields.

Hypothesis 3: The risk mitigation metric with respect to the source/destination IP address is proportional to the skewness of the potential risk distribution over the source/destination IP addresses.

The metric FWMitig is defined formally with respect to the source and destination IP addresses (src and dest for short) as follows:

FWMitig(α) = 1 + ( Σ_{i ∈ S(α)} t_i^α log(t_i^α) ) / log(D)        (3)

where α is the flow attribute based on which the metric is calculated, and can be either the source or the destination address. The value log(D) is used to normalize the value of the entropy, where D is the size of the full IP space (e.g., D = 2^8 for class C networks). S(α) represents the set of distinct source or destination addresses (based on α) in the firewall policy. The value t_i^α is the portion of risk associated with a particular source or destination address. To calculate t_i^α, we define the risk Rf associated with a particular permitted flow, based on Equation 1, as follows:

Rf = (1 − T_srcf) ∗ P_destf        (4)

where srcf and destf are the source and destination addresses of the flow f. The risk associated with a particular source or destination IP address i is the summation of the risk associated with the flows that have i as their source or destination. Let R_i^α be the risk associated with the source or destination address i; then R_i^α is formally defined as follows:

R_i^α = Σ_{f ∈ S_i(α)} Rf        α ∈ {src, dest}        (5)

where S_i(α) is the set of all flows within the policy whose source or destination address (according to the value of α) is i, and Rf is the risk of flow f as calculated based on Equation 4.

Since the skewness of the firewall policy risk distribution with respect to the source or destination addresses is measured using the entropy, the ratio of R_i^α to the total risk of the firewall policy is used rather than the absolute value. The portion of risk t_i^α associated with the source or destination IP address i, relative to the total risk of the firewall policy, is defined formally as:

t_i^α = R_i^α / R(F)        α ∈ {src, dest}        (6)

where R(F) is the total potential risk of the firewall policy F as defined in Equation 1.

The risk mitigation metric suggests an efficient way to distribute the countermeasures. If the distribution is skewed, the countermeasures may be deployed around the high-risk hosts, and the intrusion detection systems may be configured to monitor the sources that account for the high portion of risk. However, if the distribution is uniform, the countermeasure deployment must be more extensive in order to counter the resulting risk.
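Equation 3 can be sketched as follows, assuming the per-address risk values R_i^α (Equation 5) have already been aggregated into a map:

```python
import math

def fw_mitig(risk_by_addr, D):
    """Risk Mitigation metric (Equation 3): 1 + sum_i t_i log(t_i) / log(D),
    where t_i is each address's share of the total risk (Equation 6) and D
    is the size of the address space. Assumes the total risk is positive.
    Values near 1 mean skewed risk (easier to mitigate); values near 0
    mean risk spread uniformly over the whole address space."""
    total = sum(risk_by_addr.values())
    shares = [r / total for r in risk_by_addr.values() if r > 0]
    return 1.0 + sum(t * math.log(t) for t in shares) / math.log(D)
```

When a single address carries all the risk, the entropy term vanishes and the metric is exactly 1; when the risk is uniform over all D addresses, the entropy reaches log(D) and the metric drops to 0.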

IV. EVALUATION

In this section, we prove the validity of the proposed metrics for measuring firewall security. Starting from the assumption that secure networks are more resistant to worm attacks, we simulated worm attacks on synthesized networks and compared the results with our security metric measures. Our security metric measures should be consistent with the simulation results; that is, a worm should cause more damage to networks with high risk metric values than the damage it causes to networks with low risk metric values. We also study the effect of risk factors such as vulnerability and network connectivity on the measured metrics.

In order to verify our hypothesis, we proceed as follows. First, we randomly generate several networks with varying sizes and configurations. The configuration is fed to ConfigChecker in order to run the reachability analysis and to calculate the impact values for each host. Second, we measure the metrics based on the definitions introduced in Section III. Third, we simulate worm attacks with different scanning strategies and calculate the damage caused by the worm. Finally, we compare the metric measures with the results of the worm simulation to show that they are consistent. In other words, firewall policies with lower metric measures provide more security protection against network attacks such as worm propagation. In the following discussion, we provide more details about network generation and worm simulation.

A. Network Generator

Due to privacy issues, it is hard to obtain full network configurations from administrators. Therefore, we implemented a random network generator which builds random network topologies of different sizes and complexities. The network generator also generates a random configuration for the different instances of network devices. Since this work focuses on measuring firewall security, the generated networks are limited to firewalls, routers, and hosts. The following describes the random generation process.

• Topology. The network generation starts by generating a random graph. The topology generator takes the number of nodes, the number of edges, and the maximum node degree as inputs. The degree of a node is the number of edges connected to it. The links are then randomly distributed between the nodes. The nodes are classified into firewalls, routers, and hosts based on their degree. The leaves are clearly classified as hosts. If a node has a degree greater than 2, it is considered a router. Finally, if a node has a degree of exactly 2, it is randomly classified as a router or a firewall.
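The degree-based classification step above can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's generator; the node names and degree values are hypothetical.

```python
import random

def classify_nodes(degree):
    """Classify nodes by degree as described: leaves (degree 1) are
    hosts, degree > 2 means router, and degree exactly 2 is randomly
    a router or a firewall."""
    roles = {}
    for node, d in degree.items():
        if d == 1:
            roles[node] = "host"          # leaf: always a host
        elif d > 2:
            roles[node] = "router"        # high degree: always a router
        else:
            # degree 2: a random choice between router and firewall
            roles[node] = random.choice(["router", "firewall"])
    return roles

# Degrees of a small hypothetical graph.
degree = {"a": 1, "b": 3, "c": 2, "d": 1}
roles = classify_nodes(degree)
print(roles["a"], roles["b"])  # host router
```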

6th Symposium on Security Analytics and Automation 2013


• Routing. A routing entry is added to each router in the tree. The routing entry mainly consists of two parts: the destination address and the next hop. To find the routing data, we run a depth-first search algorithm to build a spanning tree for each host in the network. Starting from the host as the root of the tree, the algorithm traverses the graph to reach all other hosts. Then, a routing entry is added to each router in the tree, where the destination is the root host and the next hop is the parent of the router in the spanning tree.
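The spanning-tree routing construction might look like the following minimal Python sketch. The adjacency list, node names, and entry format are hypothetical illustrations under our own assumptions, not the paper's implementation.

```python
def routing_entries(adj, host):
    """Build a spanning tree rooted at `host` with a depth-first
    search, then emit a routing entry (destination = root host,
    next hop = parent in the tree) for every other node."""
    parent = {host: None}
    stack = [host]
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node   # record tree parent
                stack.append(nbr)
    # To reach `host`, each node forwards to its parent in the tree.
    return {node: {"dest": host, "next_hop": p}
            for node, p in parent.items() if p is not None}

# A tiny hypothetical topology: h1 - r1 - r2 - h2.
adj = {"h1": ["r1"], "r1": ["h1", "r2"], "r2": ["r1", "h2"], "h2": ["r2"]}
entries = routing_entries(adj, "h1")
print(entries["r2"])  # {'dest': 'h1', 'next_hop': 'r1'}
```

In practice only the entries for nodes classified as routers would be installed; the sketch emits one per non-root node for simplicity.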

• Access Control Lists. For those nodes that are selected as firewalls, we need to generate access control lists. The generator reads the average policy size as an input from the user and generates a random number of access control rules for each firewall. The source and destination of each rule can be an internal address randomly selected from the hosts available in the network or a random external address.

• Host Configuration. In addition to the basic host attributes, such as the IP address and subnet mask, we generate random values for the vulnerability score and the assets of each host. The average values entered by the user are used to generate the random values.
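The rule and host generation steps above can be sketched as follows. This is a hypothetical Python illustration: the external address range, the 50/50 internal/external split, the uniform distributions, and all names are our own simplifying assumptions, not the paper's.

```python
import random

def random_rule(internal_hosts):
    """One random ACL rule whose source/destination is either an
    internal host or a random external address (TEST-NET range
    used here purely for illustration)."""
    def pick():
        if random.random() < 0.5:
            return random.choice(internal_hosts)
        return "192.0.2.%d" % random.randint(1, 254)
    return {"src": pick(), "dst": pick(),
            "action": random.choice(["accept", "deny"])}

def random_host(avg_vuln=0.5, avg_assets=500):
    """Random vulnerability score and asset value centered on the
    user-supplied averages (uniform here for simplicity)."""
    return {"vuln": min(max(random.uniform(0, 2 * avg_vuln), 0.0), 1.0),
            "assets": random.uniform(0, 2 * avg_assets)}

rule = random_rule(["10.0.0.1", "10.0.0.2"])
print(rule["action"] in ("accept", "deny"))  # True
```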

The generated network and configuration are exported in a format compatible with ConfigChecker input, as ConfigChecker is used to run the reachability analysis that generates the risk metrics.

B. Worm Simulator

We implemented a simulator to simulate worm propagation in the synthesized networks and to calculate the damage caused by the worm under the generated configuration. A worm is a malicious program that self-propagates across the network in order to achieve a particular goal, such as leaking information or causing damage. We simulate scan-based worms that propagate by generating IP addresses and compromising vulnerable hosts. Several scanning strategies that vary in their complexity and attack efficiency have been adopted by attackers. We simulate three types of scan-based worms in our evaluation: the uniform scan worm, the divide-and-conquer scan worm, and the cooperative scan worm. In the following discussion, we call them Uniform, D&C, and Coop, respectively, for short.

• Uniform scan worm (Uniform). This is the simplest scanning strategy, in which the worm has no idea where the vulnerable hosts are located in the network. The worm selects random IP addresses to find potential victims. The scanning space is normally the entire IP address space. In our simulation, we can specify an IP address range from which to draw the scanning addresses.

• Divide-and-Conquer worm (D&C). In this type of scanning, the IP address space is divided among the infected hosts, which makes the scanning more efficient. When a worm infects a target, the target is assigned half of its scan space and the original worm continues scanning the other half.

• General Cooperative scan worm (Coop). In this scanning strategy, the infected machines cooperate with each other such that each IP address is scanned only once by one machine. This strategy is clearly more efficient than the previous two, but it is harder to implement in large-scale networks because it requires a well-established communication link between the infected machines.
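The three scanning strategies above can be sketched compactly as follows. This is a minimal Python illustration under our own simplifying assumptions; the toy integer address space and function names are hypothetical, not the simulator's actual code.

```python
import random

def uniform_scan(space, rng=random):
    """Uniform: every probe is an independent random draw from the
    (possibly restricted) scanning space."""
    while True:
        yield rng.choice(space)

def dnc_split(space):
    """Divide-and-conquer: on infecting a target, hand half of the
    scan space to the new worm instance and keep the other half."""
    mid = len(space) // 2
    return space[:mid], space[mid:]

def coop_scan(space):
    """Cooperative: the infected machines coordinate so that each
    address is scanned exactly once overall."""
    order = list(space)
    random.shuffle(order)
    for addr in order:
        yield addr

space = list(range(16))          # toy 16-address scanning space
kept, given = dnc_split(space)
print(len(kept), len(given))     # 8 8
```

The sketch makes the efficiency ordering visible: Coop probes each address once, D&C partitions the space, while Uniform may probe the same address repeatedly.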

Two conditions need to hold for a worm to attack a target machine: (1) the target should be reachable by the worm, and (2) the target should be vulnerable. We run a reachability query using ConfigChecker, as explained earlier, to determine whether the worm can reach the target. If the target is reachable, we compare the vulnerability score of the target against a vulnerability threshold; if the vulnerability score is greater than the threshold, we consider the target infected. The final result returned by the simulator is the sum of the infected machines' assets, which reflects the damage caused by the worm during the simulation period.
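The infection condition and damage computation can be sketched as follows. This is an illustrative Python sketch; the reachability predicate, host records, and threshold value are hypothetical stand-ins (the paper obtains reachability from ConfigChecker).

```python
def try_infect(source, target, reachable, hosts, threshold=0.5):
    """A target is infected only if (1) it is reachable from the
    source under the firewall policies and (2) its vulnerability
    score exceeds the threshold."""
    return (reachable(source, target)
            and hosts[target]["vuln"] > threshold)

def total_damage(infected, hosts):
    """Damage reported by the simulator: the sum of the infected
    machines' assets."""
    return sum(hosts[h]["assets"] for h in infected)

# Two hypothetical hosts; a permissive reachability stand-in.
hosts = {"h1": {"vuln": 0.9, "assets": 300},
         "h2": {"vuln": 0.2, "assets": 700}}
allow_all = lambda s, t: True
infected = [t for t in hosts if try_infect("w", t, allow_all, hosts)]
print(total_damage(infected, hosts))  # 300
```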

C. Experiment Setup

The purpose of this experiment is to show that the risk score is consistent with the damage caused by simulated worms, such that a higher risk score is associated with higher damage. We generated a number of different networks. The network size ranges between 100 and 1000 devices. The assets of the network are randomly distributed among the hosts by assigning a random value between 0 and 1000 to each host. The firewall policies are randomly generated with an average policy size equal to twice the number of hosts in the network. We used a normal distribution to randomly generate the vulnerability scores for the hosts. To study the effect of vulnerability on the risk score, we generated three instances of each of the 10 random networks: the first with an average vulnerability score of 25%, the second with an average of 50%, and the last with an average of 75%.

We simulate the three worm attacks (Uniform, D&C, and Coop) on each of the generated networks. We run the simulation 20 times and report the average damage. For each run, the simulation time is selected randomly, but it is at least long enough to cover the address scanning space.

D. Results

The results are shown in Figures 1, 2, and 3. In these figures, we plot the calculated risk score against the damage reported by the worm simulators. For each vulnerability setting, average of 25% (Figure 1), average of 50% (Figure 2), and average of 75% (Figure 3), we plot a graph for each type of worm: Uniform, D&C, and Coop. Each point on a graph corresponds to a test network; the x-coordinate represents the risk score measured using our metric, and the y-coordinate represents the average damage caused by the simulated worm. For each graph, a trend line of the scattered points is drawn.

The reported results show that all the trend lines are non-decreasing functions. The variance from the line is due to the randomness of the worm simulations. However, we notice that the graphs of the D&C and Coop worms have a large degree


[Fig. 1. Damage (%) vs. risk score for the Uniform, D&C, and Coop worms at an average vulnerability of 25%.]

[Fig. 2. Damage (%) vs. risk score for the Uniform, D&C, and Coop worms at an average vulnerability of 50%.]

[Fig. 3. Damage (%) vs. risk score for the Uniform, D&C, and Coop worms at an average vulnerability of 75%.]

of similarity, due to the similarity of their behavior; the D&C worms are classified as cooperative worms in some classifications [23]. We can see the effect of increasing the average vulnerability on both the score and the worm damage. The points are shifted to the right as the average vulnerability increases, and the worm damage also increases with the average vulnerability. However, we notice that the increase in damage is not significant when the average vulnerability is increased from 25% to 50%. This is due to the simulation settings: the vulnerability threshold that determines whether a node reachable by an infected machine is compromised was set to a value greater than 50%. The increase in damage is obvious in Figure 3.

V. RELATED WORK

In recent years, some research has targeted measuring and quantifying network elements. No existing work has provided complete quantifiable measures for firewall policies. Most of the existing work quantifies security measures in a simple way or evaluates risk for the entire network, but not for firewall policies. Other studies compared the equivalence of firewall policies but did not provide quantifiable metrics for firewall or network comparison.

Acharya et al. [21] presented the dependency metric and the inversion metric for measuring firewall complexity. The dependency metric finds the set of dependent rules preceding a rule. The inversion metric counts the pairs of adjacent rules that have different actions. In our work, firewall management complexity is measured by finding the dependency between rules in each field of the rule.

Krautsevich et al. [6] presented a formal description and analysis of security metrics. In that work, a number of security metrics were formalized and the dependencies among these metrics were investigated. The presented metrics targeted a system's security in terms of attacks; the attack surface, number of attacks, minimal cost of attack, shortest length of attack, maximal probability of attack, etc., were formalized.

Lu et al. [8] compared firewall rule tables in terms of equivalence; they used the set representation to model firewall


rules and to compare policies. That work does not give a quantitative analysis of firewall policies; it only compares their equivalence. Wool [9] presented a simple measure to quantify the complexity of firewall rules. The complexity criterion in [9] is calculated based on the number of rules, objects, and interfaces in a firewall policy. Ahmed et al. [10] proposed a framework for measuring and quantifying network security; the framework identifies and quantifies the risk and vulnerability factors of a network and uses them to predict the probability of a vulnerability existing in a host in the network. Khakpour et al. [11] quantify network reachability metrics. The proposed metrics evaluate the upper- and lower-bound reachability in the network using FDDs.

A generic overall framework for network security evaluation was presented by Atzeni et al. in [16], who also discussed the importance of security metrics in [17]. In [18], Pamula et al. propose a security metric based on the weakest adversary (i.e., the least amount of effort required to make an attack successful). Sahinoglu et al. propose a framework in [19], [20] for calculating existing risk based on present vulnerabilities, where each threat is represented as the probability of exploiting a vulnerability together with the lack of countermeasures.

In [24], we provided a general definition of firewall security and manageability metrics. In this work, we focus on security metrics that will help in security hardening. We introduce new metrics and provide results validating the proposed approach.

VI. CONCLUSION

Measuring the security of firewall policies is the first step toward measuring a system's security. In this paper, we have presented novel metrics to measure firewall security. The metrics quantify security aspects of firewall policies and allow comparing different policies relative to one another. Each metric is defined to give a quantitative value that measures an important aspect of firewall policy security. The evaluation shows that the metric values comply with the results of the worm simulations and reflect the security of firewall policies. Using metrics to quantify firewall security helps decision makers understand the current system and evaluate its security. The proposed metrics are associated with suggested actions to be taken to improve the overall system's security. In future work, we will quantify firewall policies based on other measures, such as performance and manageability.

REFERENCES

[1] H. Hamed, E. Al-Shaer, and W. Marrero, "Modeling and Verification of IPSec and VPN Security Policies," in IEEE ICNP 2005, November 2005.

[2] E. Al-Shaer and H. Hamed, "Discovery of Policy Anomalies in Distributed Firewalls," in Proceedings of IEEE INFOCOM'04, pp. 2605-2626, Hong Kong, China, 2004.

[3] F. T. Sheldon, R. K. Abercrombie, and A. Mili, "Methodology for Evaluating Security Controls Based on Key Performance Indicators and Stakeholder Mission," in HICSS '09, 2009.

[4] DHS Science and Technology, "A Roadmap for Cybersecurity Research," Chapter 2, p. 13, 2009.

[5] M. Huth and M. Ryan, Logic in Computer Science: Modelling and Reasoning about Systems, Cambridge University Press, New York, NY, USA, 2004.

[6] L. Krautsevich, F. Martinelli, and A. Yautsiukhin, "Formal approach to security metrics: what does 'more secure' mean for you?," in Proceedings of the Fourth European Conference on Software Architecture (ECSA '10), NY, USA, pp. 162-169, 2010.

[7] D. Lin, P. Rao, E. Bertino, and J. Lobo, "An approach to evaluate policy similarity," in SACMAT '07, NY, USA, 2007.

[8] L. Lu, R. Safavi-Naini, J. Horton, and W. Susilo, "Comparing and debugging firewall rule tables," IET Information Security, Volume 1, Number 4, pp. 143-151, 2007.

[9] A. Wool, "A Quantitative Study of Firewall Configuration Errors," Computer, Volume 37, Number 6, pp. 62-67, 2004.

[10] M. S. Ahmed, E. Al-Shaer, and L. Khan, "A novel quantitative approach for measuring network security," in Proceedings of IEEE INFOCOM'08, pp. 1957-1965, Phoenix, AZ, 2008.

[11] R. Khakpour and A. Liu, "Quantifying and Querying Network Reachability," in Proceedings of IEEE ICDCS 2010, pp. 817-826, 2010.

[12] M. S. Ahmed, E. Al-Shaer, and L. Khan, "Objective Risk Evaluation for Automated Security Management," Journal of Network and Systems Management, Vol. 19, pp. 343-366, September 2011.

[13] J. Lind-Nielsen, The BuDDy OBDD package. http://sourceforge.net/projects/buddy/.

[14] P. Manadhata and J. Wing, "An attack surface metric," in First Workshop on Security Metrics, Vancouver, BC, August 2006.

[15] M. Howard, J. Pincus, and J. M. Wing, "Measuring relative attack surfaces," in Workshop on Advanced Developments in Software and Systems Security, Taipei, December 2003.

[16] A. Atzeni, A. Lioy, and L. Tamburino, "A generic overall framework for network security evaluation," in Congresso Annuale AICA 2005, pp. 605-615, October 2005.

[17] A. Atzeni and A. Lioy, "Why to adopt a security metric? A little survey," in QoP-2005: Quality of Protection Workshop, September 2005.

[18] J. Pamula, P. Ammann, S. Jajodia, and V. Swarup, "A weakest-adversary security metric for network configuration security analysis," in ACM 2nd Workshop on Quality of Protection, Alexandria, VA, 2006.

[19] M. Sahinoglu, "Security meter: A practical decision-tree model to quantify risk," IEEE Security and Privacy, June 2005.

[20] M. Sahinoglu, "Quantitative risk assessment for dependent vulnerabilities," in Reliability and Maintainability Symposium, 2006.

[21] H. B. Acharya, A. Joshi, and M. G. Gouda, "Firewall modules and modular firewalls," in IEEE ICNP 2010, October 2010.

[22] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. Elbadawi, "Network configuration in a box: Towards end-to-end verification of network reachability and security," in ICNP, pp. 123-132, 2009.

[23] C. C. Zou, D. Towsley, and W. Gong, "On the performance of internet worm scanning strategies," Elsevier Journal of Performance Evaluation, Vol. 63, pp. 700-723, 2003.

[24] S. Al-Haj and E. Al-Shaer, "Measuring Firewall Security," in 4th Symposium on Configuration Analytics and Automation (SafeConfig 2011), IEEE, Arlington, VA, October 2011.

[25] M. S. Ahmed, M. Taibah, E. Al-Shaer, and L. Khan, "Objective Risk Evaluation for Automated Security Management," Journal of Network and Systems Management (JNSM), Volume 19, Number 3, pp. 343-366, December 2011.

[26] P. Mell, K. Scarfone, and S. Romanosky, "Common Vulnerability Scoring System," IEEE Security & Privacy, vol. 4, no. 6, pp. 85-89, Nov.-Dec. 2006.
