Modeling and Detection of Camouflaging Worm 2012

CHAPTER 4

SYSTEM DESIGN

Systems design is the process of defining the architecture, components, modules,

interfaces, and data for a system to satisfy specified requirements. It implies a systematic

and rigorous approach to design—an approach demanded by the scale and complexity of

many systems problems. The purpose of System Design is to create a technical solution that

satisfies the functional requirements for the system. At this point in the project lifecycle

there should be a Functional Specification, written primarily in business terminology,

containing a complete description of the operational needs of the various organizational

entities that will use the new system. The challenge is to translate all of this information into

Technical Specifications that accurately describe the design of the system, and that can be

used as input to System Construction. The Functional Specification produced during System

Requirements Analysis is transformed into a physical architecture.

System components are distributed across the

physical architecture, usable interfaces are designed and prototyped, and Technical

Specifications are created for the Application Developers, enabling them to build and test

the system.

System design comprises logical design and physical design. Logical design describes the structure and characteristics of the system, such as outputs, inputs, files, databases, and procedures. The physical design, which follows the logical design, produces the actual software and a working system. The design is subject to constraints such as hardware, software, cost, time, and interfaces.

Department of Computer Science & Engg, SaIT


System Design involves the analysis, design, and configuration of the necessary hardware and software components to support the solution's architecture. The major components of System Design include the Information Model, Community Model, Security/Permission Model, System Integration, Workflow, and Technical Architecture.

A System Design typically provides the following benefits:

- Improved system performance: individually tailored configuration advice demonstrates where improvement is necessary, and how to improve the system to regain lost performance.
- Customers gain a detailed understanding of how their users use their system. This Usage Profile can be leveraged to develop future architecture changes.
- Potential to learn of future concerns, allowing customers to take proactive measures to avoid problems.
- A baseline performance level is established against which benefits can be compared and changes to the system predicted or foreseen.

System design is the process of working out the overall functionality and approach

that the system will include. It starts at a high level and then drills down into great detail,

and normally ends up with the production of a technical specification.

Design is the process of determining exactly how the specifications are to be implemented. Analysis and design are critical throughout the development cycle: a fault in the design can affect the product and can be very expensive to fix at a later stage of software development.

System Design is the activity of proceeding from an identified set of requirements

for a system to a design that meets those requirements. A distinction is sometimes drawn

between high-level or architectural design, which is concerned with the main components of

the system and their roles and interrelationships, and detailed design, which is concerned

with the internal structure and operation of individual components. The term system design

is sometimes used to cover just the high-level design activity.




System design is broadly classified into two categories: high level design and low level design.

High Level Design

A high-level design provides an overview of a solution, platform, system, product,

service, or process. Such an overview is important in a multi-project development to make

sure that each supporting component design will be compatible with its neighboring designs

and with the big picture. The highest level solution design should briefly describe all

platforms, systems, products, services and processes that it depends upon and include any

important changes that need to be made to them. A high-level design document will usually

include a high-level architecture diagram depicting the components, interfaces and networks

that need to be further specified or developed.

The high-level design defines the project level architecture of the system. This

architecture defines the sub-systems to be built, internal and external interfaces to be

developed, and interface standards identified. The high level design is where the

sub-system requirements are developed. The high-level design also identifies the major

candidate off-the-shelf products that might be used in the system.

High-level design is the transitional step between what the system does [requirements for sub-systems] and how the system will be implemented [architecture and interfaces] to meet the system requirements. This process includes the decomposition of

system requirements into alternative project architectures and then the evaluation of these

project architectures for optimum performance, functionality, cost, and other issues

[technical and non-technical]. Stakeholder involvement is critical for this activity. In this

step, internal and external interfaces are identified along with the needed industry standards.

These interfaces are then managed throughout the development process. The following uses

ramp metering as an example for the two key decomposition activities:

Functional decomposition is breaking a function down into its smallest parts. [E.g.,

ramp metering includes the sub-functions of detection, meter rate control, main line

metering, ramp queuing, time of day, and communications].

Physical decomposition defines the physical elements needed to carry out the

function. [E.g., ramp metering decomposition includes loops, controller clock, fiber or

twisted pair for communications, 2070 controllers, host computers, cabinets, and conduits].



Finally, allocating these sub-functions to the physical elements of the system will

form the complete project architecture. This step also defines the integration and

verification activities needed when the system elements are developed.
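The allocation step described above can be sketched as a simple mapping from sub-functions to physical elements. The sub-function and element names are taken from the ramp metering example in the text; the particular pairing between them is an assumption made only for illustration.

```python
# Sub-functions from the functional decomposition of ramp metering.
SUB_FUNCTIONS = [
    "detection", "meter rate control", "main line metering",
    "ramp queuing", "time of day", "communications",
]

# Hypothetical allocation of each sub-function to physical elements.
# The pairing below is illustrative, not taken from the source.
ALLOCATION = {
    "detection": ["loops"],
    "meter rate control": ["2070 controllers"],
    "main line metering": ["loops", "2070 controllers"],
    "ramp queuing": ["loops"],
    "time of day": ["controller clock"],
    "communications": ["fiber or twisted pair", "host computers"],
}


def unallocated(sub_functions, allocation):
    """Return the sub-functions that have no physical element assigned."""
    return [f for f in sub_functions if not allocation.get(f)]


def elements_used(allocation):
    """Return the sorted set of physical elements referenced by the allocation."""
    return sorted({e for elements in allocation.values() for e in elements})
```

A check such as `unallocated(SUB_FUNCTIONS, ALLOCATION)` returning an empty list is one way to confirm that every sub-function has a home before integration and verification are planned.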

The high-level design of a software system is a collection of module and subroutine

interfaces related to each other by means of USES and IS_COMPONENT_OF

relationships. The High Level Design Document covers the overall design of the solution at a high level. A succinct summary of its contents would be:

Detailed use case scenarios of key process flows of the application

The class model and relationships

The sequence diagrams which outline key use case scenarios

The data/object model with relational table design

User interface style and design

After the requirements definition the high level design is the most important document

and provides the blueprint for the further stages of a project including the detailed design

and implementation stages. By not getting the high level design right, organisations run the

risk of creating problems which could be extremely expensive to remedy at a later stage.

The purpose of this High Level Design (HLD) Document is to add the necessary

detail to the current project description to represent a suitable model for coding. This

document is also intended to help detect contradictions prior to coding, and can be used as a

reference manual for how the modules interact at a high level. The HLD documentation

presents the structure of the system, such as the database architecture, application

architecture (layers), application flow (Navigation), and technology architecture. The HLD

uses non-technical to mildly-technical terms which should be understandable to the

administrators of the system.

The document may also depict or otherwise refer to work flows and/or data flows

between component systems. In addition, there should be brief consideration of all

significant commercial, legal, environmental, security, safety and technical risks, issues and

assumptions. The idea is to mention every work area briefly, clearly delegating the

ownership of more detailed design activity whilst also encouraging effective collaboration

between the various project teams.



Today, most high-level designs require contributions from a number of experts,

representing many distinct professional disciplines. Finally, every type of end-user should

be identified in the high-level design and each contributing design should give due

consideration to customer experience.

The high level design is illustrated below by means of an architecture diagram, a class diagram, and a sequence diagram.

Architecture Diagram

An architecture diagram typically depicts a technological set-up: either various computer components working together, or the steps of a software process working towards a specific end result.

FIG. 4.1 Architecture diagram of camouflaging worm

Fig. 4.1 shows a centralized C-Worm detection system along with its modules. The components include the pure random scan, the worm detection list, and a system scan. The system scan is performed by selecting system volume information.



Class Diagram

In software engineering, a class diagram in the Unified Modeling Language (UML)

is a type of static structure diagram that describes the structure of a system by showing the

system's classes, their attributes, and the relationships between the classes. The class

diagram is the main building block of object oriented modeling. It is used both for general conceptual modeling of the systematics of the application, and for detailed modeling that translates the models into programming code. Class diagrams can also be used for data modeling. The classes in a class diagram represent both the main objects and/or interactions in the application and the objects to be programmed. In the class diagram these classes are represented with boxes that contain three parts:

The upper part holds the name of the class.

The middle part contains the attributes of the class.

The bottom part gives the methods or operations the class can perform.

In the design of a system, a number of classes are identified and grouped together in a class diagram, which helps to determine the static relations between those objects. With detailed modeling, the classes of the conceptual design are often split into a number of subclasses.
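The three compartments of a class box map directly onto a class declaration. The following minimal sketch shows that mapping; the class name, attributes, and methods are hypothetical and are not taken from Fig. 4.2.

```python
class WormDetector:
    """Upper compartment: the class name (hypothetical example)."""

    def __init__(self, threshold):
        # Middle compartment: the attributes of the class.
        self.threshold = threshold
        self.scan_counts = []

    # Bottom compartment: the operations the class can perform.
    def record(self, count):
        """Store the scan count observed in the latest time window."""
        self.scan_counts.append(count)

    def is_suspicious(self):
        """Report whether the latest scan count exceeds the threshold."""
        return bool(self.scan_counts) and self.scan_counts[-1] > self.threshold
```

Translating a detailed class diagram into code is largely a matter of repeating this pattern for each box and turning the drawn relationships into attributes or inheritance.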

FIG. 4.2 Class diagram of camouflaging worm



Sequence Diagram

A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows

object interactions arranged in time sequence. It depicts the objects and classes

involved in the scenario and the sequence of messages exchanged between the

objects needed to carry out the functionality of the scenario. Sequence diagrams

typically are associated with use case realizations in the Logical View of the system

under development.

Sequence diagrams are sometimes called event diagrams, event scenarios,

and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines),

different processes or objects that live simultaneously, and, as horizontal arrows, the

messages exchanged between them, in the order in which they occur. This allows

the specification of simple runtime scenarios in a graphical manner.

FIG. 4.3 Sequence diagram of camouflaging worm



Main Modules

The different modules included in this project are:

1. C-Worm detection Module

The C-Worm has a self-propagating behavior similar to traditional worms, i.e., it

intends to rapidly infect as many vulnerable computers as possible. However, the C-Worm

is quite different from traditional worms in that it camouflages any noticeable trends in

the number of infected computers over time. The camouflage is achieved by manipulating

the scan traffic volume of worm-infected computers. Such a manipulation of the scan traffic

volume prevents exhibition of any exponentially increasing trends or even crossing of

thresholds that are tracked by existing detection schemes.

This worm attempts to remain hidden by sleeping (suspending scans) when it

suspects it is under detection. Worms that adopt such smart attack strategies could exhibit

overall scan traffic patterns different from those of traditional worms. Since the existing

worm detection schemes will not be able to detect such scan traffic patterns, it is very

important to understand such smart-worms and develop new countermeasures to defend

against them.

2. Anomaly Detection Module

Host-based detection systems detect worms by monitoring, collecting, and analyzing worm behaviors on end hosts. Since worms are malicious programs that execute on these computers, analyzing the behavior of worm executables plays an important role in host-based detection systems. Many detection schemes fall under this category. In contrast, network-based detection systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic (messages to identify vulnerable computers) generated by worm attacks. Many detection schemes fall under this category as well. Ideally, security vulnerabilities must be prevented to begin with, a problem which must be addressed by the programming language community. However, while vulnerabilities exist and pose threats of large-scale damage, it is critical to also focus on network-based detection, as this paper does, to detect widely spreading worms.

Anomaly detection, also referred to as outlier detection, refers to detecting patterns in a given data set that do not conform to an established normal behavior.[2] The patterns thus detected are called anomalies and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, changes, deviations, surprises, aberrations, peculiarities, intrusions, etc.



In particular in the context of abuse and network intrusion detection, the interesting

objects are often not rare objects, but unexpected bursts in activity. This pattern does not

adhere to the common statistical definition of an outlier as a rare object, and many outlier

detection methods (in particular unsupervised methods) will fail on such data, unless it has

been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the

micro clusters formed by these patterns.

Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for instances that seem to fit least with the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the learnt model.
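The semi-supervised approach can be illustrated with a minimal sketch. It assumes that normal behavior is adequately summarized by the mean and standard deviation of scan counts per minute; the function names, the sample data, and the cut-off of three standard deviations are all assumptions chosen for illustration rather than part of the detection scheme described in this report.

```python
import statistics


def fit_normal_model(training_counts):
    """Learn a model of normal behavior from normal-only training data."""
    mean = statistics.mean(training_counts)
    stdev = statistics.stdev(training_counts)  # requires at least two samples
    return mean, stdev


def anomaly_score(model, value):
    """Distance of a test value from the normal mean, in standard deviations.

    Assumes the training data was not constant (stdev > 0).
    """
    mean, stdev = model
    return abs(value - mean) / stdev


def is_anomalous(model, value, max_score=3.0):
    """Flag values that the normal model is unlikely to have generated."""
    return anomaly_score(model, value) > max_score


# Hypothetical scan counts per minute observed during normal operation.
normal_scans_per_minute = [10, 12, 11, 9, 13, 10, 11, 12]
model = fit_normal_model(normal_scans_per_minute)
```

With this model, a test value close to the training mean is accepted, while a burst far above it is flagged. A worm that keeps its scan volume inside the normal band would pass such a test, which is the weakness the C-Worm is designed to exploit.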

3. Pure Random Scan (PRS) Module

The C-Worm can be extended to defeat other newly developed detection schemes, such as destination distribution-based detection. Recall that attack target distribution-based schemes analyze the distribution of attack targets (the scanned destination IP addresses) as basic detection data to capture a fundamental feature of worm propagation, i.e., that worms continuously scan different targets.
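One simple way to quantify how widely scan traffic is spread over destinations is the Shannon entropy of the destination addresses seen in a time window. The sketch below uses entropy purely as an illustrative statistic; it is an assumption and not necessarily the measure used by the destination distribution-based schemes referred to above. The addresses are made-up examples.

```python
import math
from collections import Counter


def destination_entropy(destinations):
    """Shannon entropy (in bits) of the destination addresses in one window."""
    counts = Counter(destinations)
    total = len(destinations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical windows: ordinary traffic concentrates on a few servers,
# while scanning traffic touches a different target with every packet.
normal_window = ["10.0.0.5"] * 6 + ["10.0.0.9"] * 2
scan_window = [f"10.0.1.{i}" for i in range(8)]
```

A window in which every packet goes to a different destination has the maximum possible entropy for its size, so a sustained rise in this value is one signal that hosts are scanning many different targets.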

4. Worm propagation Module

In an open-loop control system, the worm scan traffic volume has a much higher probability of showing an increasing trend as worm propagation progresses: as more and more computers get infected, they in turn take part in scanning other computers. Hence, we consider the C-Worm as a worst-case attack scenario that uses closed-loop control to regulate the propagation speed based on feedback about the propagation status.
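The difference between the two control styles can be seen in a small numerical model. The sketch below is an assumption-laden illustration: it uses a discrete-time version of the simple epidemic model, and it represents the closed loop by a scanning probability that is scaled so the overall scan volume stays near a fixed target. The function name, parameters, and values are hypothetical and are not taken from this report.

```python
def simulate(n_vulnerable, beta, steps, target_volume=None, scans_per_host=1.0):
    """Return the modeled overall scan volume at each time step.

    target_volume=None models open-loop control: every infected host scans
    all the time. Otherwise a closed loop scales the scanning probability so
    that the overall volume stays at or below target_volume.
    """
    infected = 1.0
    volumes = []
    for _ in range(steps):
        full_volume = scans_per_host * infected
        if target_volume is None:
            p = 1.0
        else:
            p = min(1.0, target_volume / full_volume)
        volumes.append(p * full_volume)
        # Simple epidemic growth, slowed in proportion to the scan probability.
        infected += beta * p * infected * (n_vulnerable - infected)
        infected = min(infected, n_vulnerable)
    return volumes


open_loop = simulate(1000, 0.001, 30)
closed_loop = simulate(1000, 0.001, 30, target_volume=5.0)
```

In this model the open-loop volume grows with the infected population, which is the trend that threshold-based and trend-based detectors look for, while the closed-loop volume flattens at the target. This is why detection of the C-Worm has to rely on features other than the raw growth of scan traffic volume.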



Low Level Design

Low Level Design (LLD) details the HLD. It defines the actual logic for each and every component of the system. Class diagrams with all the methods and relations between classes come under LLD, as do program specifications. LLD describes each and every module in an elaborate manner so that the programmer can directly code the program based on it. There will be at least one document for each module, and there may be more. The LLD will contain:

- detailed functional logic of the module in pseudocode
- database tables with all elements, including their type and size
- all interface details with complete API references (both requests and responses)
- all dependency issues
- error message listings
- complete inputs and outputs for a module

The low level design document for a project should provide a complete and detailed

specification of the design for the software that will be developed in the project, including

the classes, member and non-member functions, and associations between classes that are

involved. By the end of the Low Level Design stage, the code should be "all but written".

The low level design document should contain a listing of the declarations of all the classes,

non-member-functions, and class member functions that will be defined during the

implementation stage, along with the associations between those classes and any other

details of those classes (such as member variables) that are firmly determined by the low

level design stage. The low level design document should also describe the classes, function

signatures, associations, and any other appropriate details, which will be involved in testing

and evaluating the project according to the evaluation plan defined in the project's

requirements document.

More importantly, each project's low level design document should provide a narrative describing (and comments in the declaration and definition files should point out) how the high level design is mapped into its detailed low-level design, which is just a step away from the implementation itself. This should be an English description of how the technical diagrams (and text descriptions) found in the high level design were converted into appropriate class and function declarations in the low level design.

The narrative should take particular care to explain how the class roles and their methods were combined in the low level design, and any changes made in combining and refining them.

During the detailed design phase, the view of the application developed during the high level design is broken down into modules and programs. Logic design is done for every program and then documented as program specifications. For every program, a unit test plan is created. The entry criterion for this phase is the HLD document, and the exit criteria are the program specification and unit test plan (LLD). The Low Level Design Document gives the design of the actual program code, based on the High Level Design Document. It defines the internal logic of each sub-module; designers prepare individual LLDs and map them to every module. A good Low Level Design Document makes the program easy to develop: if proper analysis is done before the document is prepared, developers can write the code directly from it with minimal debugging and testing effort. The Low Level Design is explained below by a Data Flow Diagram and an Activity Diagram.

Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modeling its process aspects. DFDs are often a preliminary step used to create an overview of the system which can later be elaborated.[2] DFDs can also be used for the visualization of data processing (structured design).

Data flow diagrams are one of the three essential perspectives of the Structured Systems Analysis and Design Method (SSADM). The sponsor of a project and

the end users will need to be briefed and consulted throughout all stages of a system's

evolution. With a data flow diagram, users are able to visualize how the system will

operate, what the system will accomplish, and how the system will be implemented. The

old system's dataflow diagrams can be drawn up and compared with the new system's

data flow diagrams to draw comparisons to implement a more efficient system. Flow

diagrams can be used to provide the end user with a physical idea of where the data they

input ultimately has an effect upon the structure of the whole system from order to

dispatch to report. How any system is developed can be determined through a data flow

diagram.

A DFD shows what kinds of data will be input to and output from the system,

where the data will come from and go to, and where the data will be stored. It does not

show information about the timing of processes, or information about whether processes

will operate in sequence or in parallel (which is shown on a flowchart).
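The information a DFD carries can be captured as a plain list of flows between named nodes. The sketch below is only an illustration of that idea: the node names and data labels are hypothetical and are not taken from Fig. 4.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One arrow of a data flow diagram: data moving from source to target."""
    source: str
    target: str
    data: str


# Hypothetical flows for a detection system; not taken from Fig. 4.4.
FLOWS = [
    Flow("monitor", "detection center", "traffic log"),
    Flow("detection center", "scan statistics store", "scan counts"),
    Flow("scan statistics store", "detection center", "historical counts"),
    Flow("detection center", "administrator", "alert"),
]


def inputs_of(node, flows):
    """Data items that flow into the given node."""
    return [f.data for f in flows if f.target == node]


def outputs_of(node, flows):
    """Data items that flow out of the given node."""
    return [f.data for f in flows if f.source == node]
```

The structure records where data comes from, where it goes, and where it is stored. As with a DFD, it says nothing about timing or about whether processes run in sequence or in parallel.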



FIG. 4.4 Data Flow diagram of camouflaging worm

Activity Diagram



Activity diagrams are graphical representations of workflows of stepwise

activities and actions with support for choice, iteration and concurrency. In the

Unified Modeling Language, activity diagrams can be used to describe the business

and operational step-by-step workflows of components in a system. An activity

diagram shows the overall flow of control.

Activity diagrams are constructed from a limited number of shapes,

connected with arrows. The most important shape types:

Rounded rectangles represent activities.

Diamonds represent decisions.

Bars represent the start (split) or end (join) of concurrent activities.

A black circle represents the start (initial state) of the workflow.

An encircled black circle represents the end (final state).

Arrows run from the start towards the end and represent the order in which

activities happen. Hence they can be regarded as a form of flowchart. Typical

flowchart techniques lack constructs for expressing concurrency. However, the join

and split symbols in activity diagrams only resolve this for simple cases. The

meaning of the model is not clear when they are arbitrarily combined with decisions

or loops.

An activity diagram resembles a flowchart in that it represents the flow from one activity to another. An activity can be described as an operation of the system, so the control flow is drawn from one operation to another. This flow can be sequential, branched, or concurrent. Activity diagrams deal with all types of flow control by using elements such as fork and join.

Activity diagrams are used not only for visualizing the dynamic nature of a system but also for constructing the executable system through forward and reverse engineering techniques. The one thing missing from an activity diagram is the message part: it does not show any message flow from one activity to another. An activity diagram is sometimes considered a flowchart. Although it looks like one, it is not the same thing, because it can show different kinds of flow such as parallel, branched, concurrent, and single.
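The fork, join, and decision elements of an activity diagram have direct counterparts in code. The following minimal sketch uses a thread pool from the Python standard library; the activity names and their return values are hypothetical and are not taken from Fig. 4.5.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_logs():
    """Activity: gather traffic logs (placeholder data)."""
    return ["log-a", "log-b"]


def load_rules():
    """Activity: load detection rules (placeholder data)."""
    return {"max_scans": 10}


def within_limits(logs, rules):
    """Activity: compare the observed log volume with the configured limit."""
    return len(logs) <= rules["max_scans"]


def run_workflow():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork (split bar): both activities start concurrently.
        logs_future = pool.submit(collect_logs)
        rules_future = pool.submit(load_rules)
        # Join (join bar): wait for both results before continuing.
        logs = logs_future.result()
        rules = rules_future.result()
    # Decision (diamond): choose the next activity based on a condition.
    if within_limits(logs, rules):
        return "no alert"
    return "alert"
```

Calling `run_workflow()` walks the diagram from the initial state to a final state: two concurrent activities, a join, and then one branch of the decision.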



FIG. 4.5 Activity diagram of camouflaging worm



Use Case Diagram

In software and systems engineering, a use case is a list of steps, typically

defining interactions between a role (known in UML as an "actor") and a system, to

achieve a goal. The actor can be a human or an external system. In systems

engineering, use cases are used at a higher level than within software engineering,

often representing missions or stakeholder goals. The detailed requirements may

then be captured in SysML or as contractual statements.

A use case defines the interactions between external actors and the system

under consideration to accomplish a goal. Actors must be able to make decisions,

but need not be human: "An actor might be a person, a company or organization, a

computer program, or a computer system — hardware, software, or both." Actors

are always stakeholders, but many stakeholders are not actors, since they "never

interact directly with the system, even though they have the right to care how the

system behaves."

For example, "the owners of the system, the company's board of directors,

and regulatory bodies such as the Internal Revenue Service and the Department of

Insurance" could all be stakeholders but are unlikely to be actors.

Similarly, a person using a system may be represented as different actors because he

is playing different roles. For example, user "Joe" could be playing the role of a

Customer when using an Automated Teller Machine to withdraw cash from his own

account, or playing the role of a Bank Teller when using the system to restock the

cash drawer on behalf of the bank.

Actors are often working on behalf of someone else. A stakeholder may play

both an active and an inactive role: for example, a Consumer is both a "mass-market

purchaser" (not interacting with the system) and a User (an actor, actively

interacting with the purchased product).[13] In turn, a User is both a "normal

operator" (an actor using the system for its intended purpose) and a "functional

beneficiary" (a stakeholder who benefits from the use of the system).[13] For

example, when user "Joe" withdraws cash from his account, he is operating the

Automated Teller Machine and obtaining a result on his own behalf.

Conceptual modelling refers to specifying, visualizing, and documenting models of, for instance, the context of use, a business model, or a software system. The perspective of the terms in this category is rather technical.

Context of use refers to the characteristics of the users, tasks, and the organizational and physical environment. It may also describe the cognitive, motivational, and emotional characteristics of the different users, as well as cooperative behavior and articulation work. It is established from observations of real work and from interviews, including the actors' own reflexive view of their context of use. The analysis examines possible conflicts of interest or need between different types of actors, tries to anticipate the ways in which a new tool or method could affect the content of the observed tasks and activities (including network and collaborative behavior), and covers both norms and practices.

FIG. 4.6 Use Case Diagram of Camouflaging Worm Detection System



Low Level Design Of The Modules

1. C-Worm detection Module

The C-Worm has a self-propagating behavior similar to traditional worms, i.e., it

intends to rapidly infect as many vulnerable computers as possible. However, the C-Worm

is quite different from traditional worms in that it camouflages any noticeable trends in

the number of infected computers over time. The camouflage is achieved by manipulating

the scan traffic volume of worm-infected computers. Such a manipulation of the scan traffic

volume prevents exhibition of any exponentially increasing trends or even crossing of

thresholds that are tracked by existing detection schemes.

Worm detection has been intensively studied in the past and can be generally

classified into two categories: “hostbased” detection and “network-based” detection.

Hostbased detection systems detect worms by monitoring, collecting, and analyzing worm

behaviors on end-hosts. Since worms are malicious programs that execute on these

computers, analyzing the behavior of worm executables plays an important role in host-

based detection systems. Many detection schemes fall under this category [37]. In contrast,

network-based detection systems detect worms primarily by monitoring, collecting, and

analyzing the scan traffic (messages to identify vulnerable computers) generated by worm

attacks.

Many detection schemes fall under this category [19]. Ideally, security

vulnerabilities must be prevented to begin with, a problem that must be addressed by the

programming language community. However, while vulnerabilities exist and pose threats of

large-scale damage, it is critical to also focus on network-based detection, as this paper

does, to detect wide-spreading worms. In order to rapidly and accurately detect Internet-wide

large-scale propagation of active worms, it is imperative to monitor and analyze the traffic

in multiple locations over the Internet to detect suspicious traffic generated by worms.

The widely adopted worm detection framework consists of multiple distributed

monitors and a worm detection center that controls the former [41]. This framework is well

adopted and similar to other existing worm detection systems, such as the Cyber Center for

Disease Controller [11], Internet Motion Sensor [42], SANS ISC [23], Internet Sink [41], and

network telescope [43].

The monitors are distributed across the Internet and can be deployed at end hosts,

routers, or firewalls. Each monitor passively records irregular port-scan traffic, such as



connection attempts to a range of void IP addresses (IP addresses not being used) and

restricted service ports. Periodically, the monitors send traffic logs to the detection center.

The detection center analyzes the traffic logs and determines whether or not there are

suspicious scans to restricted ports or to invalid IP addresses.
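
The monitor and detection-center roles described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the void address block, the restricted ports, the record layout, and the function names are all hypothetical and are not taken from any of the cited systems.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical configuration: both values are illustrative assumptions.
VOID_NETWORKS = [ip_network("198.51.100.0/24")]  # address block with no live hosts
RESTRICTED_PORTS = {135, 445}                    # ports that should receive no traffic


def is_suspicious(dst_ip, dst_port):
    """Return True if a connection attempt targets a void address or a restricted port."""
    addr = ip_address(dst_ip)
    in_void = any(addr in net for net in VOID_NETWORKS)
    return in_void or dst_port in RESTRICTED_PORTS


def build_traffic_log(records):
    """Monitor side: count suspicious connection attempts per time slot.

    records: iterable of (time_slot, src_ip, dst_ip, dst_port) tuples.
    Returns a Counter mapping time_slot -> number of suspicious scans.
    """
    log = Counter()
    for time_slot, _src, dst_ip, dst_port in records:
        if is_suspicious(dst_ip, dst_port):
            log[time_slot] += 1
    return log


def merge_logs(logs):
    """Detection-center side: sum the per-slot counts reported by all monitors."""
    total = Counter()
    for log in logs:
        total.update(log)
    return total
```

The merged per-slot counts form the scan traffic time series that the detection center would then analyze.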

Network-based detection schemes commonly analyze the collected scanning traffic

data by applying certain decision rules for detecting the worm propagation. For example,

Venkataraman et al. [20] and Wu et al. [21] proposed schemes to examine statistics of scan

traffic volume, Zou et al. presented a trend-based detection scheme to examine the

exponential increase pattern of scan traffic [19], and Lakhina et al. [40] proposed schemes to

examine other features of scan traffic, such as the distribution of destination addresses.
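
To make the idea of a decision rule concrete, the sketch below shows three deliberately simplified rules over a series of per-window scan counts. They are crude stand-ins written for illustration only and do not reproduce the schemes of [19], [20], or [21]; the function names and thresholds are assumptions.

```python
from statistics import mean, pvariance


def volume_alarm(counts, threshold):
    """Volume rule: alarm if the mean scan count exceeds a fixed threshold."""
    return mean(counts) > threshold


def variance_alarm(counts, threshold):
    """Variance rule: alarm if the population variance of the scan counts exceeds a threshold."""
    return pvariance(counts) > threshold


def trend_alarm(counts, min_ratio):
    """Trend rule: alarm if every window grows by at least min_ratio over the previous one.

    This is a crude stand-in for detecting an exponentially increasing pattern.
    """
    pairs = zip(counts, counts[1:])
    return len(counts) > 1 and all(
        prev > 0 and cur / prev >= min_ratio for prev, cur in pairs
    )
```

A worm that keeps its scan volume low and free of any growth pattern would pass all three rules, which is exactly the weakness the C-Worm exploits.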

Other works study worms that attempt to take on new patterns to avoid detection

[39]. Besides the above detection schemes, which monitor global scan traffic and

detect anomalous traffic behavior, there are other worm detection and defense

schemes, such as sequential hypothesis testing for detecting worm-infected computers [44]

and payload-based worm signature detection [45]. In addition, Cai et al. [46] presented both

theoretical modeling and experimental results on a collaborative worm signature generation

system that employs distributed fingerprint filtering and aggregation and multiple edge

networks.

Dantu et al. [47] presented a state-space feedback control model that detects and

controls the spread of these viruses or worms by measuring the velocity of the number of

new connections an infected computer makes. Despite the different approaches described

above, we believe that detecting widely scanning anomaly behavior continues to be a useful

weapon against worms, and that, in practice, multifaceted defense has advantages.

2. Anomaly Detection Module

Since worms are malicious programs that execute on end hosts, analyzing the

behavior of worm executables plays an important role in host based detection systems.

Many detection schemes fall under this category. In contrast, network-based detection

systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic

(messages to identify vulnerable computers) generated by worm attacks. Many detection

schemes fall under this category. Ideally, security vulnerabilities must be prevented to begin

with, a problem that must be addressed by the programming language community. However,



while vulnerabilities exist and pose threats of large-scale damage, it is critical to also focus

on network-based detection, as this paper does, to detect wide-spreading worms.

In this section, we develop a novel spectrum-based detection scheme. Recall that the

C-Worm goes undetected by detection schemes that try to determine the worm propagation

only in the time domain. Our detection scheme captures the distinct pattern of the C-Worm

in the frequency domain, and thereby has the potential of effectively detecting the C-Worm

propagation. In order to identify the C-Worm propagation in the frequency domain, we use

the distribution of the Power Spectral Density (PSD) and its corresponding Spectral Flatness Measure (SFM) of the scan traffic.

Particularly, PSD describes how the power of a time series is distributed in the

frequency domain. Mathematically, it is defined as the Fourier transform of the

autocorrelation of a time series. In our case, the time series corresponds to the changes in

the number of worm instances that actively conduct scans over time. The SFM of PSD is

defined as the ratio of geometric mean to arithmetic mean of the coefficients of PSD. The

range of SFM values is [0, 1], and a larger SFM value implies a flatter PSD distribution and

vice versa.
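
The two quantities defined above can be computed as in the following sketch. It estimates the PSD with a plain periodogram (the squared magnitude of the discrete Fourier transform, which corresponds to the Fourier transform of the sample autocorrelation), removes the mean, and skips the zero-frequency bin. These estimation choices and the function names are assumptions made for illustration, not necessarily the estimator used in our implementation.

```python
import cmath
import math


def power_spectrum(series):
    """Periodogram estimate of the PSD for frequency bins 1..N//2 (mean removed, DC bin skipped)."""
    n = len(series)
    avg = sum(series) / n
    centered = [x - avg for x in series]
    psd = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier transform coefficient at frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        psd.append(abs(coeff) ** 2 / n)
    return psd


def spectral_flatness(psd):
    """SFM: ratio of the geometric mean to the arithmetic mean of the PSD coefficients."""
    arithmetic = sum(psd) / len(psd)
    if arithmetic == 0 or min(psd) <= 0:
        return 0.0  # a zero coefficient forces the geometric mean to zero
    geometric = math.exp(sum(math.log(p) for p in psd) / len(psd))
    return geometric / arithmetic
```

A series dominated by a single oscillation concentrates its power in a few bins and yields an SFM close to 0, while noise-like traffic spreads its power across bins and yields a clearly larger SFM.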

Notice that the frequency-domain analysis will require more samples in comparison

with the time-domain analysis, since the frequency-domain analysis technique, such as the

Fourier transform, needs to derive power spectrum amplitude for different frequencies. In

order to generate accurate spectrum amplitudes for relatively high frequencies, a high

granularity of data sampling will be required. In our case, we rely on Internet Threat Monitoring (ITM) systems to collect

traffic traces from monitors (motion sensors) in a timely manner. As a matter of fact, other

existing detection schemes based on the scan traffic rate [20], variance [21], or trend [19]

will also demand a high-sampling frequency for ITM systems in order to accurately detect

worm attacks. Enabling the ITM system with timely data collection will benefit worm

detection in real time.

3. Pure Random Scan (PRS) Module

C-Worm can be extended to defeat other newly developed detection schemes, such

as destination distribution-based detection. Recall that attack target

distribution-based schemes analyze the distribution of attack targets (the scanned destination

IP addresses) as basic detection data to capture a fundamental feature of worm

propagation, i.e., that worms continuously scan different targets.



Pure Random Scan Strategy: The worm propagator can randomly select computers

in cyberspace to identify whether a computer is vulnerable. For example, the pure random

scan (PRS) worm randomly scans the entire IPv4 address space [1, 19]. In this

model, worm-infected hosts do not have any prior vulnerability knowledge or

active/inactive information of other hosts. The worm-infected host randomly selects IP

addresses of victims from the global network IP address space and launches the attack to

those addresses. Once a new host is infected, it in turn attacks the network via the

same method.
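
The PRS model can be explored with a small offline simulation such as the sketch below. It works on an abstract address space of integers and generates no network traffic; the function name, the parameters, and the discrete time steps are assumptions chosen for illustration.

```python
import random


def simulate_prs(space_size, vulnerable, initially_infected, scans_per_step, steps, seed=0):
    """Simulate pure random scanning over an abstract address space 0..space_size-1.

    Returns the number of infected hosts after each step.
    """
    rng = random.Random(seed)
    infected = set(initially_infected)
    susceptible = set(vulnerable) - infected
    history = []
    for _step in range(steps):
        newly_infected = set()
        for _host in infected:
            for _scan in range(scans_per_step):
                # Uniform choice over the whole space: no prior knowledge of other hosts.
                target = rng.randrange(space_size)
                if target in susceptible:
                    newly_infected.add(target)
        susceptible -= newly_infected
        infected |= newly_infected
        history.append(len(infected))
    return history
```

Plotting the returned history for a sparse vulnerable population shows how many scans land on unused addresses before the infection takes off.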

The main shortcoming of this approach is that many IP addresses in the network are

not being used by any valid host. Thus, many scans are wasted on nonexistent

hosts. To address this issue, improvements on random scanning have been proposed that launch

selective scans by using knowledge of network address allocation. For example, some

blocks of IP addresses are used by organizations or enterprises, and thus are more likely to

be well-maintained and less vulnerable.

Some other IP addresses are more likely to be occupied by personal computers, and

thus have a higher probability of being vulnerable [33]. Also, computers in the same subnetwork

are more likely to use similar system settings and may share the same vulnerabilities. Such

network topology-related information can be obtained through routing tables and DNS and

can improve the probability of successful identification by (up to) three times [34].

We describe a generic random scan algorithm by a sequence of iterates {X_k} on

iterations k = 0, 1, ..., which may depend on previous points and algorithmic parameters.

The current iterate X_k may represent a single point or a collection of points, so as to include

population-based algorithms. The iterates are capitalized to denote that they are random

variables, reflecting the probabilistic nature of the random search algorithm.

Generic Random Scan Algorithm

Step 0. Initialize algorithm parameters Θ_0, initial points X_0 ⊂ S, and iteration index k = 0.

Step 1. Generate a collection of candidate points V_{k+1} ⊂ S according to a specific

generator and associated sampling distribution.

Step 2. Update X_{k+1} based on the candidate points V_{k+1}, previous iterates, and algorithmic

parameters. Also update algorithm parameters Θ_{k+1}.

Step 3. If a stopping criterion is met, stop. Otherwise increment k and return to Step 1.
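
The four steps can be illustrated with the simplest instance of the scheme, a pure random search in which candidates are drawn uniformly from a box-shaped set S and the algorithm parameters never change. The objective function, the bounds, and the fixed iteration budget are assumptions introduced only for this sketch, since the generic algorithm leaves them unspecified.

```python
import random


def pure_random_search(objective, bounds, candidates_per_iter=5, max_iter=200, seed=0):
    """Minimize objective over the box given by bounds using uniform random candidates."""
    rng = random.Random(seed)
    # Step 0: choose an initial point X_0 in S; the iteration index starts at k = 0.
    best = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_value = objective(best)
    # Step 3: the stopping criterion here is a fixed iteration budget.
    for _k in range(max_iter):
        # Step 1: generate candidate points V_{k+1} uniformly on S.
        candidates = [
            [rng.uniform(lo, hi) for lo, hi in bounds]
            for _ in range(candidates_per_iter)
        ]
        # Step 2: X_{k+1} is the best point seen so far (parameters stay fixed).
        for point in candidates:
            value = objective(point)
            if value < best_value:
                best, best_value = point, value
    return best, best_value
```

More elaborate members of the family differ only in how Step 1 samples candidates and how Step 2 updates the iterate and the parameters.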



4. Worm propagation Module

Worm scan traffic volume in an open-loop control system has a much

higher probability of showing an increasing trend as worm propagation progresses. As

more and more computers get infected, they, in turn, take part in scanning other computers.

Hence, we consider the C-Worm as a worst-case attack scenario that uses closed-loop

control for regulating the propagation speed based on the feedback propagation status.

To analyze the C-Worm, we adopt the epidemic dynamic model for disease

propagation, which has been extensively used for worm propagation modeling [2]. Based on

existing results [12], this model matches the dynamics of real-worm propagation over the

Internet quite well. For this reason, similar to other publications, we adopt this model in our

paper as well.

Since our investigated C-Worm is a novel attack, we modified the original epidemic

dynamic formula to model the propagation of the C-Worm by introducing P(t), the

attack probability that a worm-infected computer participates in worm propagation at time t.

We note that there is wide scope to improve our modified model in the future to

reflect several characteristics that are relevant in real-world practice.

Particularly, the epidemic dynamic model assumes that any given computer is in one

of the following states: immune, vulnerable, or infected. An immune computer is one that

cannot be infected by a worm; a vulnerable computer is one that has the potential of being

infected by a worm; an infected computer is one that has been infected by a worm.
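
As a rough numerical illustration, the sketch below integrates one common form of the simple epidemic model, dM/dt = β · M(t) · [N − M(t)], scaled by an attack probability P(t). Here M(t) is the number of infected computers, N the number of initially vulnerable computers, and β the pairwise infection rate. The exact equation, the symbol names, and the Euler time stepping are assumptions made for this example and may differ from the formula used in our model.

```python
def simulate_epidemic(n_vulnerable, beta, attack_probability, m0=1.0, dt=1.0, steps=100):
    """Euler integration of dM/dt = beta * P(t) * M(t) * (N - M(t)).

    attack_probability: function t -> P(t) in [0, 1]; P(t) = 1 gives the unmodified model.
    Returns the list of infected counts M, starting with the initial value m0.
    """
    infected = m0
    history = [infected]
    for step in range(steps):
        t = step * dt
        growth = beta * attack_probability(t) * infected * (n_vulnerable - infected)
        # Cap at N so that a coarse time step cannot overshoot the population.
        infected = min(n_vulnerable, infected + growth * dt)
        history.append(infected)
    return history
```

Calling it once with a constant P(t) = 1 and once with a smaller P(t) shows how a lower attack probability slows the growth of the infected population over the same time span.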

Algorithm for detecting worm propagation:

Step 1. Collect traffic in the local network

Step 2. Create a suspicious list from outbound traffic

Step 3. foreach (record in suspicious list) do

Step 4. if (destination addresses have sequential distribution)

Step 5. then ‘worm alert’

Step 6. else if (destination addresses contain unused IP addresses)

Step 7. then ‘worm alert’

Step 8. else if (the number of distinct addresses of inbound traffic with related port

is large)

Step 9. then ‘worm alert’

Step 10. else ‘the record is normal activity’

Step 11. End For.
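
The classification loop of the algorithm can be sketched in Python as follows. The record layout, the run length that counts as a sequential distribution, and the inbound limit are hypothetical choices, and the sketch assumes that each of the three conditions raises a worm alert.

```python
from ipaddress import ip_address


def is_sequential(destinations, min_run=5):
    """True if the distinct destinations contain at least min_run consecutive addresses."""
    values = sorted({int(ip_address(d)) for d in destinations})
    run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur - prev == 1 else 1
        if run >= min_run:
            return True
    return False


def classify(record, unused_addresses, inbound_limit=100):
    """Apply the three tests to one suspicious record; return 'worm alert' or 'normal'."""
    if is_sequential(record["destinations"]):
        return "worm alert"
    if any(d in unused_addresses for d in record["destinations"]):
        return "worm alert"
    if len(set(record["inbound_sources"])) > inbound_limit:
        return "worm alert"
    return "normal"


def detect(suspicious_list, unused_addresses):
    """Classify every record in the suspicious list, keyed by the originating host."""
    return {rec["host"]: classify(rec, unused_addresses) for rec in suspicious_list}
```

Each record is assumed to be a dictionary with the keys host, destinations, and inbound_sources, built from the traffic collected in Steps 1 and 2.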



We believe our algorithm can effectively detect random-scan, sequential-scan, and other

intelligent worms, such as selective-random scan worms. It also identifies infected hosts in the

local network, so that proper actions can be taken against those hosts. In addition, our algorithm can be

applied to a real network that still contains many worms that have not been removed. It detects not only the

appearance of a new worm but also already existing worms.

