
1

The Attack and Defense of Computers

Dr. 許富皓

2

Attacking Program Bugs

3

Attack Types

Buffer overflow attacks:

• Stack smashing attacks
• Return-into-libc attacks
• Heap overflow attacks
• Function pointer attacks
• .dtors overflow attacks
• setjmp/longjmp buffer overflow attacks

Format string attacks

Integer overflow and integer sign attacks

4

Why Are Buffer Overflow Attacks So Dangerous?

Easy to launch: An attacker can carry out a buffer overflow attack just by sending a crafted string to the target.

Plenty of targets: Many programs contain this kind of vulnerability.

Great damage: The usual end result of a buffer overflow attack is that the attacker gains root privilege on the attacked host.

Internet worms proliferate through buffer overflow attacks.

5

Stack Smashing Attacks

6

Principle of Stack Smashing Attacks

Overwrite control transfer structures, such as return addresses or function pointers, to redirect program execution flow to the desired code.

Attack strings carry both code and address(es) of the code entry point.

7

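Explanation of BOAs (1)

G(int a)
{
  H(3);
add_g:
}

H(int b)
{ char c[100];
  int i = 0;
  while((c[i++]=getch())!=EOF)
  {
  }
}

Input String: xyz

[Figure: stack layout while H() runs. G's stack frame lies at the higher addresses. Below it, H's stack frame holds, from high to low addresses: the parameter b, the return address add_g, the saved address of G's frame pointer, and the buffer c[99] ... c[0]. The input characters x, y, and z are stored in c[0], c[1], and c[2], at addresses 0xaba, 0xabb, and 0xabc.]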

8

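Explanation of BOAs (2)

Attack String: xxInjected Codexy0xabc

Length=108 bytes

[Figure: the same G() and H() code and stack layout as in Explanation of BOAs (1), after the attack string has been read. The buffer now holds x, x (at addresses 0xaba and 0xabb) followed by the injected code, which starts at address 0xabc, and the filler characters x and y. The 108-byte string runs past c[99], so it also overwrites the saved address of G's frame pointer and the return address add_g, which now holds 0xabc, the entry point of the injected code.]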

9

Injected Code:

The attacked programs usually have root privilege; therefore, the injected code is executed with root privilege.

The injected code is already in machine instruction form; therefore, a CPU can directly execute it.

However, this also means that the injected code must match the CPU type of the attacked host.

Usually the injected code will fork a shell; hence, after an attack, an attacker could have a root shell.

10

Injected Code of Remote BOAs

To be able to interact with the newly forked root shell, the injected code usually needs to execute the following two steps:

Open a socket.

Redirect standard input and output of the newly forked root shell to the socket.

11

Example of Injected Code for the x86 Architecture: Shell Code

char shellcode[] = "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";

12

Two Factors for a Successful Buffer Overflow-style Attack (1)

A successful buffer overflow-style attack must overflow the right place (e.g. the location that holds a return address) with the correct value (e.g. the address of the injected code’s entry point).

13

Two Factors for a Successful Buffer Overflow-style Attack (2)

[Figure: the buffer where the overflow starts, containing the injected code, with the return address above it. Two quantities are marked: (1) the offset between the beginning of the overflowed buffer and the overflow target, and (2) the address of the injected code’s entry point.]

The offset and the entry point address are not predictable. They cannot be determined just by looking at the source code or at a local copy of the binary.

14

Unpredictable Offset

• For performance reasons, most compilers don’t allocate memory for local variables in the order in which they appear in the source code, and sometimes extra space is inserted between them. (Source code doesn’t help.)
• Different compilers and operating systems use different allocation strategies. (Local binaries don’t help.)
• Address obfuscation inserts a random amount of space between local variables and the return address. (Only very good luck may help.)

15

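Unpredictable Entry Point Address

[fhsu@ecsl]# webserver -a -b security

[Figure: top of the process stack, starting at address 0xbfffffff and continuing toward lower addresses: system data, environment variables, argument strings, env pointers, argv pointers, argc, and then function main()'s stack frame. The command line arguments and environment variables lie above main()'s stack frame, so their sizes shift the addresses of the stack frames below them.]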

16

Strategies Used by Attackers to Increase Their Success Chance

Repeat address patterns.

Insert NOP (0x90) operations before the entry point of injected code.

17

Exploit Code Web Sites

Exploit World: http://www.insecure.org/sploits.html

milw0rm: http://www.milw0rm.com/

Metasploit: http://metasploit.com/opcode_database.html

Securiteam: http://www.securiteam.com/exploits/archive.html

18

An Exploit Code Generation Program

This program uses the following three loops to generate the attack string, which contains the shell code.

for(i=0;i<sizeof(buff);i+=4)

*(ptr++)=jump;

for(i=0;i<sizeof(buff)-200-strlen(evil);i++) buff[i]=0x90;

for(j=0;j<strlen(evil);j++) buff[i++]=evil[j];

http://www.milw0rm.com/exploits/382


19

Return-into-libc Attacks

20

Return-into-libc

• A variant of buffer overflow attacks.
• Uses code that already resides in the attacked program’s address space, such as libc functions.
• Attack strings carry the entry point address(es) of the desired libc function, a new frame pointer address, and the parameters to the function.

21

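How Are Parameters and Local Variables Represented in an Object File?

abc(int aa)
{ int bb;
  bb=aa;
  :
  :
}

abc:
  function prologue
  *(%ebp-4)=*(%ebp+8)
  function epilogue

[Figure: stack frame of abc(), from high to low addresses: aa (at %ebp+8), the return address (at %ebp+4), the previous frame pointer (the word %ebp points to), and bb (at %ebp-4).]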

22

A Way to Change the Parameters and Local Variables of a Function

In an object file, a parameter or a local variable is represented by its offset from the position pointed to by %ebp. Therefore, the value of the %ebp register determines where a function finds its parameters and local variables. In other words, if an attacker can change the %ebp of a function, then she/he can also change the function’s parameters and local variables.
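The frame-relative addressing described here can be imitated in plain C. The following is only a toy model under stated assumptions: the "stack" is an int array, ebp is an index into it (so +8 bytes becomes +2 words and -4 bytes becomes -1 word), and the name copy_param_to_local is hypothetical. It mirrors the statement *(%ebp-4)=*(%ebp+8) and shows that the same code reads a different "parameter" when it is given a different ebp.

```c
#include <stdio.h>

/* Toy model: stack is an array of 32-bit words and ebp is a word index.
   Emulates *(%ebp-4) = *(%ebp+8), i.e. "bb = aa" in abc().
   The caller must make sure ebp-1 and ebp+2 are valid indexes. */
int copy_param_to_local(int *stack, int ebp)
{
    stack[ebp - 1] = stack[ebp + 2];   /* local bb  <-  parameter aa */
    return stack[ebp - 1];
}
```

If two different values are stored at word indexes 7 and 12, calling the function with ebp = 5 and then with ebp = 10 returns the first and then the second value, even though the function body never changes.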

23

Function Prologue and Epilogue

#include <stdio.h>

int add_three_items(int a, int b, int c)
{ int d;
  d=a+b+c;
  return d;
}

add_three_items:
        pushl %ebp              # function prologue
        movl  %esp, %ebp
        subl  $4, %esp
        movl  12(%ebp), %eax
        addl  8(%ebp), %eax
        addl  16(%ebp), %eax
        movl  %eax, -4(%ebp)
        movl  -4(%ebp), %eax
        leave                   # function epilogue
        ret

leave = movl %ebp, %esp
        popl %ebp

24

Function Calls

main()
{ int a, b, c, f;
  extern int add_three_items();
  a=1; b=2; c=3;
  f=add_three_items(a,b,c);
}

main:
        pushl %ebp
        movl  %esp, %ebp
        subl  $24, %esp
        andl  $-16, %esp
        movl  $0, %eax
        subl  %eax, %esp
        movl  $1, -4(%ebp)
        movl  $2, -8(%ebp)
        movl  $3, -12(%ebp)
        subl  $4, %esp
        pushl -12(%ebp)
        pushl -8(%ebp)
        pushl -4(%ebp)
        call  add_three_items
        addl  $16, %esp
        movl  %eax, -16(%ebp)
        leave
        ret

leave = movl %ebp, %esp
        popl %ebp

25

Example code

void function(int a, int b, int c)
{ char buffer1[5];
  char buffer2[10];
}

main(int argc, char *argv[])
{ function(1,2,3);
}

gcc -S test.c

function:
        pushl %ebp
        movl  %esp, %ebp
        subl  $40, %esp
        leave
        ret
main:
        pushl %ebp
        movl  %esp, %ebp
        subl  $8, %esp
        andl  $-16, %esp
        movl  $0, %eax
        addl  $15, %eax
        addl  $15, %eax
        shrl  $4, %eax
        sall  $4, %eax
        subl  %eax, %esp
        pushl $3
        pushl $2
        pushl $1
        call  function
        addl  $12, %esp
        leave
        ret

26

[Figure: memory layout for the example code while function() runs, with the bss and heap sections at low addresses and the stack at high addresses. From low to high addresses the stack holds: function()'s saved %ebp, its return address (EIP), the arguments $1, $2, $3, and, further up, main()'s saved %ebp and return address (EIP). The positions of sp and bp are marked. The assembly listing shown next to the figure is the one from the example code.]

leave = movl %ebp, %esp
        popl %ebp

27

Explanation of Return-into-libc

G(int a)
{
  H(3);
add_g:
}

H(int b)
{ char c[10];
  /* overflow occurs here */
}

abc: pushl %ebp
     movl  %esp,%ebp

[Figure sequence (slides 27-31): H's stack frame after the overflow, with the positions of %ebp and %esp at each step. The overflow starts at c[0] and overwrites, from low to high addresses: c[0] ... c[9] (any value), the saved address of G's frame pointer (any value), the return address add_g (replaced by the address of abc(), e.g. system()), the slot that held b (any value), and the next word (parameter 1, e.g. a pointer to /bin/sh).

• Slide 28: movl %ebp,%esp (an instruction in the function epilogue) moves %esp to the slot that held the saved frame pointer.
• Slide 29: popl %ebp loads the overwritten frame pointer (any value) into %ebp.
• Slide 30: ret pops the overwritten return address, so execution continues at abc().
• Slide 31: after the two instructions in function system()'s function prologue, pushl %ebp and movl %esp,%ebp, are executed, %esp and %ebp both point to the slot that held the overwritten return address. The function therefore finds its own return address (any value) at %ebp+4 and parameter 1 at %ebp+8.]

32

Properties of Return-into-libc Attacks

The exploit strings don’t need to contain executable code.

33

Heap/Data/BSS Overflow Attacks

34

Principle of Heap/Data/BSS Overflow Attacks

As in stack smashing attacks, attackers overflow a sensitive data structure by supplying more data than an adjacent buffer can store, so that the excess data overwrites the sensitive data structure.

The sensitive data structure may contain:

• A function pointer
• A pointer to a string
• … and so on.

Both the buffer and the sensitive data structure may be located in the heap, data, or bss section.

35

Heap and Data/BSS Sections

The heap is an area of memory that the application allocates dynamically by calling a library function such as malloc().

On most systems, the heap grows up (towards higher addresses).

The data section is initialized at compile-time.

The bss section contains uninitialized data, and is allocated at run-time.

Until it is written to, it remains zeroed (at least from the application’s point of view).

36

Heap Overflow Example

#define BUFSIZE 16

int main()

{ int i=0;

char *buf1 = (char *)malloc(BUFSIZE);

char *buf2 = (char *)malloc(BUFSIZE);

:

while((*(buf1+i)=getchar())!=EOF)

i++;

:

}

37

BSS Overflow Example

#define BUFSIZE 16

int main(int argc, char **argv)
{ FILE *tmpfd;
  static char buf[BUFSIZE], *tmpfile;
  :
  tmpfile = "/tmp/vulprog.tmp";
  gets(buf);
  tmpfd = fopen(tmpfile, "w");
  :
}

38

BSS and Function Pointer Overflow Example

int goodfunc(const char *str);

int main(int argc, char **argv)

{ int i=0;

static char buf[BUFSIZE];

static int (*funcptr)(const char *str);

:

while((*(buf+i)=getchar())!=EOF)

i++;

:

}

39

Function Pointer Attacks

40

Principle of Function Pointer Attacks

Use a buffer adjacent to a function pointer variable to overwrite the content of the function pointer variable so that it points to code chosen by the attacker.

A function pointer variable may be located in the stack, the data section, or the bss section.

41

Countermeasures of Buffer Overflow Attacks

42

Countermeasures of Buffer Overflow Attacks (1)

Array bounds checking.

Non-executable stack/heap.

Safe C library.

Compiler solutions, e.g. StackGuard and RAD.

Type safe language, e.g. Java.

Static source code analysis.

43

Countermeasures of Buffer Overflow Attacks (2)

Anomaly Detection, e.g. through system calls.

Dynamic allocation of memory for data that will overwrite adjacent memory area.

Memory Address Obfuscation/ASLR

Randomization of executable Code.

Network-based buffer overflow detection

44

Array Bounds Checking

Fundamental solution for all kinds of buffer overflow attacks.

High run-time overhead (about 1x, i.e. roughly 100%, in some situations).
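A minimal sketch of what a bounds check adds to every array write, assuming a hypothetical helper named checked_store (it is not part of any library or compiler discussed in these slides). The extra comparison on each access is also where the run-time overhead comes from.

```c
#include <stddef.h>

/* Hypothetical helper: store one byte into buf only if idx lies inside the
   buffer. Returns 1 on success and 0 when the write would be out of bounds. */
int checked_store(char *buf, size_t size, size_t idx, char value)
{
    if (idx >= size) {
        return 0;               /* reject the out-of-bounds write */
    }
    buf[idx] = value;
    return 1;
}
```

With a 4-byte buffer, a store at index 3 succeeds, while a store at index 4 is rejected instead of overwriting whatever lies next to the buffer.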

45

Non-executable Stack/Heap

The majority of buffer overflow attacks are stack smashing attacks; therefore, a non-executable stack could block the majority of buffer overflow attacks.

Breaks some legitimate features that place executable code on the stack, e.g. signal handling and nested functions.

46

Safe C Library

• Some string-related C library functions, such as strcpy and strcat, don’t check the boundaries of their destination buffers; hence, modifying these unsafe library functions could secure programs that use them.
• Replace strcpy with strncpy, replace strcat with strncat, and so on.
• Plenty of other C statements can still result in buffer overflow vulnerabilities.

E.g. while ((*(ptr+i)=getchar())!=EOF) i++;
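As an illustration of both points, here is a sketch of a bounded copy and of a bounded version of the read loop above. The names bounded_copy and bounded_read are hypothetical. Note that strncpy does not write a terminating NUL when the source is too long, so the sketch adds one explicitly.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity dst_size), always NUL-terminating.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return 0;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size;
}

/* Bounded version of: while ((*(ptr+i)=getchar())!=EOF) i++;
   Reads at most size-1 characters from in, NUL-terminates the result,
   and returns the number of characters stored. */
size_t bounded_read(FILE *in, char *ptr, size_t size)
{
    size_t i = 0;
    int ch;                     /* int, so EOF is distinguishable from data */

    if (size == 0) {
        return 0;
    }
    while (i + 1 < size && (ch = fgetc(in)) != EOF) {
        ptr[i++] = (char)ch;
    }
    ptr[i] = '\0';
    return i;
}
```

A call such as bounded_read(stdin, buf, sizeof buf) stops storing once the buffer is full, whatever the length of the input.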

47

Compiler Solutions: StackGuard

• Put a canary word before each return address in each stack frame. Usually, when a buffer overflow attack is launched, not only the return address but also the canary word is overwritten; thus, by checking the integrity of the canary word, this mechanism can defend against stack smashing attacks.
• Low performance overhead.
• Changes the layout of a function’s stack frame; hence, this mechanism is not compatible with some programs, e.g. debuggers.
• Only protects return addresses.
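The canary idea can be sketched with a toy frame. This is a simulation under stated assumptions, not StackGuard's implementation: the struct toy_frame and the three helper functions are hypothetical, a struct stands in for the real stack frame, and the canary value is passed in by the caller instead of being chosen by the compiler and runtime.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a protected frame: buffer, then canary, then return address. */
struct toy_frame {
    char     buffer[8];
    uint32_t canary;
    uint32_t return_address;
};

/* "Prologue": set up the frame with a canary and a return address. */
void frame_enter(struct toy_frame *f, uint32_t canary, uint32_t ret)
{
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = canary;
    f->return_address = ret;
}

/* Simulates an unchecked copy that starts at buffer and may run past it. */
void frame_unchecked_write(struct toy_frame *f, const void *data, size_t len)
{
    if (len > sizeof *f) {
        len = sizeof *f;        /* stay inside the toy frame itself */
    }
    memcpy(f, data, len);
}

/* "Epilogue" check: 1 if the canary is intact, 0 if it was clobbered. */
int frame_canary_intact(const struct toy_frame *f, uint32_t canary)
{
    return f->canary == canary;
}
```

Writing 8 bytes leaves the canary intact. Writing 16 bytes reaches the return address, but only by overwriting the canary first, which is what the check detects.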

48

Compiler Solutions: RAD

• Store copies of return addresses in a well-protected area, the RAR (Return Address Repository).
• When a function is called, in addition to the return address saved in its stack frame, another copy of the return address is saved in the RAR. When the function finishes, before returning to its caller, the callee checks the return address in its stack frame to see whether the RAR has a copy of that address. If there is no such address in the RAR, a buffer overflow attack is reported.
• Low performance overhead.
• Only protects return addresses.
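The save-and-compare logic can be sketched as follows. This is a simplified model, not RAD's actual code: struct rar and its functions are hypothetical, the repository is an ordinary array with no memory protection, and the check compares only against the most recent copy instead of searching the whole repository.

```c
#include <stdint.h>

enum { RAR_CAPACITY = 64 };

/* Toy return address repository: a separate array of saved addresses. */
struct rar {
    uint32_t saved[RAR_CAPACITY];
    int      depth;
};

void rar_init(struct rar *r)
{
    r->depth = 0;
}

/* Function prologue: record a copy of the return address.
   Returns 1 on success, 0 if the repository is full. */
int rar_push(struct rar *r, uint32_t return_address)
{
    if (r->depth >= RAR_CAPACITY) {
        return 0;
    }
    r->saved[r->depth++] = return_address;
    return 1;
}

/* Function epilogue: compare the address found in the stack frame with the
   most recent copy. Returns 1 on a match, 0 if an attack should be flagged. */
int rar_check_and_pop(struct rar *r, uint32_t address_in_frame)
{
    if (r->depth == 0) {
        return 0;
    }
    return r->saved[--r->depth] == address_in_frame;
}
```

An unmodified return address passes the check, while an address overwritten by an overflow no longer matches its saved copy.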

49

Type Safe Language, e.g. Java

These kinds of languages automatically perform array bounds checking.

The majority of programs are not written in these kinds of languages, and rewriting all of them in such languages is impractical.

50

Static Source Code Analysis

Analyze source code to find program statements that could result in buffer overflow vulnerabilities. E.g. program statements like

while((*(buf+i)=getchar())!=EOF) i++;

are not safe.

• False positives and false negatives.
• The source code can be difficult to obtain.

51

Anomaly Detection

This mechanism is based on the idea that most malicious code that runs on a target system makes system calls to access system resources, such as files and sockets.

This technique has two main parts:

• Preprocessing
• Monitoring

False positive and false negative.
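One way to read the two parts is as "learn a profile of normal system call sequences" followed by "flag sequences that fall outside the profile". The sketch below follows that reading and is an assumption, not the specific system the slide refers to: it records adjacent pairs of system call numbers from a normal trace and then counts unseen pairs in a monitored trace. All names and the call numbers used with it are illustrative.

```c
#include <stddef.h>

enum { MAX_PAIRS = 256 };

/* Profile of system-call number pairs seen during preprocessing.
   count must be set to 0 before the first call to profile_train(). */
struct profile {
    int    pairs[MAX_PAIRS][2];
    size_t count;
};

static int profile_contains(const struct profile *p, int a, int b)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->pairs[i][0] == a && p->pairs[i][1] == b) {
            return 1;
        }
    }
    return 0;
}

/* Preprocessing: learn every adjacent pair in a trace of normal behavior. */
void profile_train(struct profile *p, const int *trace, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (!profile_contains(p, trace[i], trace[i + 1]) && p->count < MAX_PAIRS) {
            p->pairs[p->count][0] = trace[i];
            p->pairs[p->count][1] = trace[i + 1];
            p->count++;
        }
    }
}

/* Monitoring: count adjacent pairs that never occurred during training. */
size_t profile_anomalies(const struct profile *p, const int *trace, size_t n)
{
    size_t unseen = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (!profile_contains(p, trace[i], trace[i + 1])) {
            unseen++;
        }
    }
    return unseen;
}
```

A pair that is legitimate but was never seen in training is counted as anomalous (a false positive), and an attack that only uses already-seen pairs is missed (a false negative).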

52

Memory Address Obfuscation/ASLR

• This approach randomizes the layout of items in main memory; hence, attackers can only guess the address where their injected code resides and the addresses of their target functions.
• Changes the run-time memory layout specified by the original file format.
• Increases the complexity of debugging a program.

53

Aspects of Address Obfuscation (1)

The first is the randomization of the base addresses of memory regions.

This involves the randomization of the base address of

• the stack

• the heap

• the starting address of dynamically linked libraries

• the locations of functions and static data structures contained in the executable.

The second aspect includes permuting the order of variables and functions.

54

Aspects of Address Obfuscation(2)

The last is the introduction of random length gaps, such as

padding in stack frames

padding between malloc allocations

padding between variables and static data structures

random length gaps in the code segment, with jumps to get over them.

55

Randomization of Executable Code

• This method involves the randomization of the code that is executed in a process. This approach encrypts the instructions of a process and decrypts them when they are about to be executed. Because attackers don’t know the key needed to encrypt their code, their injected code cannot be decrypted correctly. As a result, their code cannot be executed.
• The main assumption of this method is that most attacks that attempt to gain control of a system are code-injection attacks.
• Needs special hardware to reduce the performance overhead.
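The effect can be illustrated with a deliberately simple stand-in for the encryption: XOR with a one-byte key. This is an assumption made for the sketch (the slide does not say which cipher is used), and xor_with_key is a hypothetical name. Legitimate code is encoded at load time and decoded at fetch time, so it comes out unchanged. Injected bytes are only decoded, so they turn into different bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every byte with a per-process key. XOR is its own inverse, so the same
   routine models both "encrypt at load time" and "decrypt at fetch time". */
void xor_with_key(uint8_t *code, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++) {
        code[i] ^= key;
    }
}
```

For example, with key 0x5a the byte 0x55 survives an encode followed by a decode, but a raw injected 0x55 that is only decoded becomes 0x0f.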

56

Botnet [Trend Micro]

http://us.trendmicro.com/imperia/md/content/us/pdf/threats/securitylibrary/botnettaxonomywhitepapernovember2006.pdf


57

Definition of a Botnet

A botnet (zombie army or drone army) refers to a pool of compromised computers that are under the command of a single hacker, or a small group of hackers, known as a botmaster.

58

Definition of a Bot

A bot refers to a compromised end-host, or a computer, which is a member of a botnet.

59

The First-Generation Bot Malware - PrettyPark

The first-generation bot malware, the PrettyPark worm, appeared in 1999.

A critical difference between PrettyPark and previous worms is that it uses IRC to let a botmaster remotely control a large pool of compromised hosts.

Its revolutionary idea of using IRC as a discreet and extensible method for Command and Control (C&C) was soon adopted by the black hat community.

60

How Fast Could Your Computer Be Compromised?

The following observations are based on a honeypot running an unpatched version of Windows 2000 or Windows XP within a dial-in network of a German ISP.

Normally it takes only a couple of minutes before the honeypot is successfully compromised. On average, the expected lifespan of the honeypot is less than ten minutes.

• After this small amount of time, the honeypot is often successfully exploited by automated malware.

The shortest compromise time was only a few seconds:

• Once we plugged the network cable in, an SDBot compromised the machine via an exploit against TCP port 135 and installed itself on the machine.

61

Typical Size of Botnets

Some botnets consist of only a few hundred bots.

In contrast, several large botnets with up to 50,000 hosts have also been observed.

Botnets with several hundred thousand hosts have been reported in the past.

62

A Host May Be Infected by Several Botnets Simultaneously

A home computer infected by 16 different bots has been found.

63

Taxonomy of Botnets

Attacking behavior

C&C models

Rally mechanisms

Communication protocols

Observable botnet activities

Evasion Techniques

64

Attacking Behavior [Paul Bächer et al.]

Distributed Denial-of-Service Attacks

Spamming

Sniffing Traffic

Keylogging

Spreading new malware

Installing Advertisement Addons

Google AdSense abuse

Manipulating online polls/games

Mass identity theft

http://www.honeynet.org/papers/bots/



65

Distributed Denial-of-Service Attacks (1)

Often botnets are used for Distributed Denial-of-Service (DDoS) attacks. A DDoS attack is an attack on a computer system or network that causes

a loss of service to users, typically the loss of network connectivity and services

by consuming the bandwidth of the victim network

or overloading the computational resources of the victim system.

66

Distributed Denial-of-Service Attacks (2)

Further research showed that botnets are even used to run commercial DDoS attacks against competing corporations:

Operation Cyberslam documents the story of Jay R. Echouafni and Joshua Schichtel alias EMP. Echouafni was indicted on August 25, 2004 on multiple charges of conspiracy and causing damage to protected computers. He worked closely together with EMP who ran a botnet to send bulk mail and also carried out DDoS attacks against the spam blacklist servers. In addition, they took Speedera - a global on-demand computing platform - offline when they ran a paid DDoS attack to take a competitor's website down.

67

Spamming

Some bots offer the possibility to open a SOCKS v4/v5 proxy - a generic proxy protocol for TCP/IP-based networking applications (RFC 1928) - on a compromised machine. After the SOCKS proxy has been enabled, this machine can be used for nefarious tasks such as spamming. Some bots also implement a special function to harvest email addresses.

With the help of a botnet and thousands of bots, an attacker is able to send massive amounts of bulk email (spam). Often the spam you receive was sent from, or proxied through, an old Windows computer in someone’s home. In addition, this can of course also be used to send phishing mails, since phishing is a special case of spam.

http://rfc.net/rfc1928.html

68

Sniffing Traffic

Bots can also use a packet sniffer to watch for interesting clear-text data passing by a compromised machine. The sniffers are mostly used to retrieve sensitive information like usernames and passwords. If a machine is compromised more than once and is a member of more than one botnet, packet sniffing makes it possible to gather the key information of the other botnet. Thus it is possible to "steal" another botnet.

69

Keylogging

If the compromised machine uses encrypted communication channels (e.g. HTTPS or POP3S), then just sniffing the network packets on the victim's computer is useless, since the appropriate key to decrypt the packets is missing. With the help of a keylogger it is very easy for an attacker to retrieve sensitive information.

An implemented filtering mechanism (e.g. "I am only interested in key sequences near the keyword 'paypal.com'") further helps in stealing secret data. And if you imagine that this keylogger runs on thousands of compromised machines in parallel you can imagine how quickly PayPal accounts are harvested.

70

Spreading New Malware

In most cases, botnets are used to spread new bots. This is very easy since all bots implement mechanisms to download and execute a file via HTTP or FTP. Spreading an email virus using a botnet is a very nice idea, too.

A botnet with 10,000 hosts which acts as the start base for the mail virus allows very fast spreading and thus causes more harm.

71

Installing Advertisement Addons

Botnets can also be used to gain financial advantages. This works by setting up a fake website with some advertisements:

The operator of this website negotiates a deal with some hosting companies that pay for clicks on ads. With the help of a botnet, these clicks can be "automated" so that instantly a few thousand bots click on the pop-ups.

This process can be further enhanced if the bot hijacks the start-page of a compromised machine so that the "clicks" are executed each time the victim uses the browser.

72

Google AdSense Abuse

A similar abuse is also possible with Google's AdSense program:

AdSense offers companies the possibility to display Google advertisements on their own website and earn money this way. The company earns money due to clicks on these ads, for example per 10,000 clicks in one month. An attacker can abuse this program by leveraging his botnet to click on these advertisements in an automated fashion and thus artificially increments the click counter. This kind of usage for botnets is relatively uncommon, but not a bad idea from an attacker's perspective.

https://www.google.com/adsense/


73

Loss Caused by Click Fraud [Catherine Holahan]

On average, consultants estimate that between 14% and 15% of clicks are fraudulent.

http://www.businessweek.com/technology/content/jul2006/tc20060726_355531.htm?campaign_id=rss_tech


74

Google Search Page

75

Google Search Result Page

76

Source HTML File of the Google Search Result Page

77

Ampersands (&'s) in URLs [Liam Quinn]

Always use &amp; in place of & when writing URLs in HTML:

E.g.: <a href="foo.cgi?chapter=1&amp;section=2&amp;copy=3&amp;lang=en">...</a>

http://htmlhelp.com/tools/validator/problems.html
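The rule above can be applied mechanically. The following is a minimal sketch with a hypothetical function name, escape_ampersands. It assumes the input is a raw URL: it escapes every & it sees, so a string that already contains &amp; would be escaped a second time.

```c
#include <stddef.h>
#include <string.h>

/* Copy url into out, writing "&amp;" for every '&'.
   Returns 1 on success, 0 if out (capacity out_size) is too small. */
int escape_ampersands(const char *url, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return 0;
    }
    for (; *url != '\0'; url++) {
        const char *piece = (*url == '&') ? "&amp;" : NULL;
        size_t need = piece ? strlen(piece) : 1;

        if (used + need + 1 > out_size) {
            return 0;           /* no room for this piece plus the NUL */
        }
        if (piece) {
            memcpy(out + used, piece, need);
        } else {
            out[used] = *url;
        }
        used += need;
    }
    out[used] = '\0';
    return 1;
}
```

Applied to foo.cgi?chapter=1&section=2, the function produces foo.cgi?chapter=1&amp;section=2, which is the form that belongs inside an href attribute.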

78

Click Fraud (1) - Use the Browser’s URL Field

79

Click Fraud (2) – Connect to the Google Server Directly

Attackers could launch the same attacks by

• opening an HTTP connection to a Google server, and
• sending the URL in the previous slide directly to that server.

80

Click Fraud (3) - Use Fake Page (1)

81

Click Fraud (3) - Use Fake Page (2) [Mr. 東]

http://blog.roodo.com/ikaridon/f46df2ec.gif


82

Click Fraud (3) - Use Fake Page (3)

83

Manipulating online Polls/Games

Since every bot has a distinct IP address, every vote will have the same credibility as a vote cast by a real person.

Online games can be manipulated in a similar way. Currently we are aware of bots being used that way, and there is a chance that this will get more important in the future.

84

Mass Identity Theft

Often the combination of the different functions described above can be used for large-scale identity theft, one of the fastest growing crimes on the Internet. Bogus emails ("phishing mails") that pretend to be legitimate (such as fake PayPal or banking emails) ask their intended victims to go online and submit their private information.

These fake emails are generated and sent by bots via their spamming mechanism. These same bots can also host multiple fake websites pretending to be eBay, PayPal, or a bank, and harvest personal information. Just as quickly as one of these fake sites is shut down, another one can pop up.

In addition, keylogging and sniffing of traffic can also be used for identity theft.

85

What Is IRC, and How Does It Work? [David Caraballo et al.]

IRC (Internet Relay Chat) provides a way of communicating in real time with people from all over the world. It consists of various separate networks (or "nets") of IRC servers, machines that allow users to connect to IRC. The largest nets are EFnet (the original IRC net, often having more than 32,000 people at once), Undernet, IRCnet, DALnet, and NewNet.

http://www.irchelp.org/irchelp/new2irc.html


http://www.efnet.org/

http://www.undernet.org/

http://www.funet.fi/~irc/

http://www.dal.net/

http://www.newnet.net/

86

IRC Client

Generally, the user (such as you) runs a program (called a "client") to connect to a server on one of the IRC nets (http://www.irchelp.org/irchelp/networks/). The server relays information to and from other servers on the same net. Recommended clients:

• UNIX/shell: ircII (http://www.irchelp.org/irchelp/ircii/)
• Windows: mIRC (http://www.irchelp.org/irchelp/mirc/)
• Macintosh clients (http://www.irchelp.org/irchelp/mac/)

87

IRC Bot [Wikipedia]

An IRC bot is a set of scripts or an independent program that connects to Internet Relay Chat as a client, and so appears to other IRC users as another user.

It differs from a regular client in that instead of providing interactive access to IRC for a human user, it performs automated functions.

http://en.wikipedia.org/wiki/IRC_bot

88

IRC Channels

• Once connected to an IRC server on an IRC network, you will usually join one or more "channels" and converse with others there.
• On IRC, channels are where people meet and chat.
• You may know them as "chat rooms".
• Channel names usually begin with a #, as in #irchelp.
• Conversations may be public (where everyone in a channel can see what you type) or private (messages between only two people, who may or may not be on the same channel).

89

Scheme of an IRC-Network [wikipedia]

[Figure: scheme of an IRC network showing normal clients (green), bots (blue), and bouncers (orange).]

http://en.wikipedia.org/wiki/Internet_Relay_Chat

90

Command and Control (C&C) System

C&C works as follows. A botmaster sets up a C&C server, typically an IRC server. After a bot virus infects a host, it connects back to the C&C server and waits for the botmaster’s commands.

In a typical IRC botnet, the bot joins a certain IRC channel to listen for messages from its master.

91

Categories of C&C

C&C systems can be roughly categorized into three different models:

• the centralized model
• the peer-to-peer (P2P) model
• the random model

P.S.: We believe these three C&C models are sufficient to cover all the botnets found today. However, given the quickly evolving nature of botnets, future botnets may use new command and control systems that are completely different from any of them.

92

Centralized C&C Model

In the centralized model, a botmaster selects a single high-bandwidth host to be the contact point (C&C server) for all the bots.

The C&C server, usually a compromised computer as well, runs certain network services such as IRC or HTTP. When a new computer is infected by a bot, it joins the botnet by initiating a connection to the C&C server. Once it has joined the appropriate C&C server channel, the bot waits on the C&C server for commands from the botmaster. Botnets may have mechanisms to protect their communications.

• For example, IRC channels may be protected by passwords known only to bots and their masters to prevent eavesdropping.

93

Popularity of the Centralized C&C Model

The centralized model is the predominant C&C model used by existing botnets.

Many well known bots, such as AgoBot, SDBot and RBot, fall into the category of the centralized C&C model.

94

Why the Centralized C&C Model (1)?

• Due to the rich variety of software tools (e.g. IRC bot scripts on IRC servers and IRC bots), the centralized C&C model is rather simple to implement and customize. Notice that a botmaster can easily control thousands of bots using the centralized model.
• Botmasters are profit driven; hence, they are more interested in the centralized C&C model, which allows them to control as many bots as possible and maximize their profit.

95


Why the Centralized C&C Model (2)?

Few countermeasures have been used to fight botnets so far. Therefore, centralized botnets currently have good survivability in the real world.

96


Why the Centralized C&C Model (3)?

Messaging latency in the centralized model is small. Therefore, it is easy for botmasters to coordinate botnets and launch attacks.

97

Drawback of the Centralized C&C Model

The C&C server is the crucial place where most of the conversation happens. Therefore, the C&C server is the weakest link in a botnet.

If we can manage to discover and destroy the C&C server, the entire botnet will be gone.

98

Motivation for a P2P-Based C&C Model

Some botnet authors have started to build alternative botnet communication systems, which are more resilient to failures in the network.

An interesting C&C paradigm that emerged recently exploits the idea of P2P communication.

For instance, certain variants of Phatbot have used P2P communication as a means to control botnets.

The botnets that use P2P based C&C are still very few.

99

Features of the P2P-Based C&C Model

Compared with the centralized C&C model, the P2P based C&C model is much harder to discover and destroy. Since the communication system doesn’t heavily depend on a few selected servers, destroying a single, or even a number of bots, won’t necessarily lead to the destruction of an entire botnet. Because of this, it is possible that the P2P based C&C model will be used increasingly in botnets in the near future.

100

Constraints of the P2P C&C Model (1)

Existing P2P systems only support conversations of small user groups, usually in the range of 10-50 users.

The group size supported by P2P systems is too small compared to the size of centralized C&C botnets, in which a botnet of 1000 compromised hosts is still on the small side.

101

Constraints of the P2P C&C Model (2)

Existing P2P systems don’t guarantee message delivery and propagation latency. Therefore, if using P2P communication, a botnet would be harder to coordinate than those which use centralized C&C models.

102

Trend of the P2P C&C Model

The above two constraints have limited the wider adoption of P2P based communication in botnets.

As the knowledge on implementing P2P based botnets accumulates, new P2P-based botnets, which overcome the above limitations, may appear.

As such, more and more botnets will move to use P2P based communication since it is more robust than centralized C&C communication.

103

Random C&C Model

• In the proposed random C&C model, a bot does not actively contact other bots or the botmaster. Rather, a bot listens for incoming connections from its botmaster.
• To launch attacks, a botmaster scans the Internet to discover its bots. When a bot is found, the botmaster issues commands to the bot.
• While such a C&C model is easy to implement and highly resilient to discovery and destruction, the model intrinsically has a scalability problem and is difficult to use for large-scale, coordinated attacks.
• Although this C&C model has not been used in real-world botnets, it is potentially interesting for future types of botnets that want high survivability.

104

Rallying Mechanisms

Rallying mechanisms are critical for botnets to discover new bots and rally them under their botmasters.

105

Hard-coded IP Address

A common method used to rally new bots works like this:

A bot includes hard-coded C&C server IP addresses in its binary. When the bot initially infects a computer, it connects back to the C&C server using the hard-coded server IP address contained in the binary code.

The problem with using hard-coded IP addresses is that the C&C server can be easily detected and the communication channel easily blocked. If a C&C server is "disconnected" in this fashion, a botnet may be completely deactivated. Because of this, recent variants of bots use hard-coded server IP addresses less often.

106

Dynamic DNS Domain Name

Bots today often include hard-coded domain names assigned by dynamic DNS providers.

The benefit of using dynamic DNS is that, if a C&C server is shut down by authorities, the botmaster can easily resume control by creating a new C&C server somewhere else and updating the IP address in the corresponding dynamic DNS entry.

When connections to the old C&C server fail, the bots will perform DNS queries and be redirected to the new C&C server. This DNS redirection behavior is often known as herding.

Using dynamic DNS names, a botmaster can retain control of the botnet when an existing C&C server fails to function. Sometimes, a botmaster will also update the dynamic DNS entry periodically to shift the location of the command and control server, making detection harder.

107

Distributed DNS Service

Some of the newer botnet breeds run their own distributed DNS service at locations that are out of the reach of law enforcement or other authorities. Bots include the addresses of these DNS servers and contact these servers to resolve the IP addresses of C&C servers.

Many times, these DNS services are chosen to run at high port numbers in order to evade detection by security devices at gateways.

The botnets using a distributed DNS service to rally their bots are the hardest to detect and destroy, compared with the other types of botnets discussed.

108

Communication Protocols

Bots communicate with each other and with their botmasters following certain well-defined network protocols. In most cases, botnets don’t create new network protocols for their communication. Instead, they use existing communication protocols that are implemented by publicly available software tools.

For example, they use the IRC protocol itself together with publicly available software implementations of IRC servers and clients.

109

The Importance of Understanding the Botnet Communication Protocols

First, their communication characteristics provide an understanding of

• the botnets’ origins, and

• the software tools possibly being used.

Second, understanding the communication protocols helps security researchers decode the conversations that take place between bots and their masters.

110

Common Botnet Communication Protocols

IRC Protocol

HTTP Protocol

P2P Protocol

… and so on.

111

Evasion Techniques – for AV and IDS

A variety of techniques are used by botnets to evade AV and signature based IDS systems, e.g.,

sophisticated executable packers

rootkits

protocol evasion techniques, etc

These evasion techniques improve the survivability of botnets and the success rate of compromising new hosts.

112

Evasion Techniques – Communication (1)

Additionally, botnets have added (and continue to add) new mechanisms to hide traces of their communication. Some botnets are moving away from IRC, since IRC traffic is increasingly monitored in an effort to detect botnets. Instead, botnets are starting to use

modified IRC protocols or

other protocols altogether (e.g., HTTP, VoIP)

for their communication channels.

113

Evasion Techniques – Communication (2)

Encryption schemes are also being used to prevent the content from being revealed.

Certain state-of-the-art botnets even use covert channel communications such as TCP and ICMP tunneling, and even IPv6 tunneling.

There have been technical discussions about the possibility of using Skype and IM to support communication.

114

Other Observable Activities

In order to detect the presence of botnets, we need to discover abnormal behaviors exhibited by botnets.

The observable behaviors of botnets can be categorized into three types:

• network-based behavior

• host-based behavior

• global correlated behavior

115

Network-based Behaviors

Botmasters need to communicate with their bots and launch attacks. When performing these functions, botnets will generate certain observable network traffic patterns that we can use to detect individual bots and their C&C servers.

1. Observable Communication

2. Observable Attacking Traffic

116

Observable Communication (1)

Since botnets often use IRC and HTTP to communicate with their bots, observable IRC and HTTP traffic with abnormal patterns can be used to indicate the presence of bots and their C&C servers.

For example,

• inbound/outbound IRC traffic to an interior enterprise network where IRC service is not allowed, and

• IRC conversations that follow certain syntax conventions that humans don’t readily understand.
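The first example above can be sketched as a simple policy check at the network boundary. This is a minimal illustration, not a complete detector: the internal address range, the list of commonly used IRC ports, and the function name `flag_irc_flows` are assumptions made for this sketch, and bots may run IRC on any port.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only.
INTERNAL_NET = ip_network("10.0.0.0/8")
# Ports commonly associated with IRC (an assumption; bots may use others).
IRC_PORTS = set(range(6660, 6670)) | {6697, 7000}


def flag_irc_flows(flows, irc_allowed=False):
    """Return flows that cross the enterprise boundary on an IRC port.

    Each flow is a (src_ip, dst_ip, dst_port) tuple. If the site policy
    allows IRC, nothing is flagged.
    """
    if irc_allowed:
        return []
    flagged = []
    for src, dst, port in flows:
        src_inside = ip_address(src) in INTERNAL_NET
        dst_inside = ip_address(dst) in INTERNAL_NET
        crosses_boundary = src_inside != dst_inside
        if crosses_boundary and port in IRC_PORTS:
            flagged.append((src, dst, port))
    return flagged
```

A flagged flow is only a hint that a host deserves a closer look; it does not by itself prove that the host is a bot.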

117

Observable Communication (2)

Many botnets use dynamic DNS domain names to locate their C&C servers. Thus, abnormal DNS queries may also be used to detect botnets. In some instances, hosts are found to query for improper domain names (e.g., cheese.dns4biz.org, butter.dns4biz.org), which can indicate a high probability that these hosts are compromised.

The next logical step in this methodology would be to attempt to glean the IP addresses of their C&C servers from observable traffic streams. If further detective work reveals that the IP address associated with a particular domain name keeps changing periodically, this provides an even stronger indication of the presence of a botnet.
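The "IP address keeps changing" check can be sketched over a log of observed DNS answers. The record layout `(timestamp, domain, ip)`, the function name, and the threshold of three distinct addresses are assumptions chosen for illustration.

```python
from collections import defaultdict


def domains_with_ip_churn(dns_answers, min_distinct_ips=3):
    """Return domains that were seen resolving to many different IPs.

    dns_answers is an iterable of (timestamp, domain, ip) tuples taken
    from observed DNS responses. The threshold is a hypothetical value.
    """
    ips_by_domain = defaultdict(set)
    for _timestamp, domain, ip in dns_answers:
        # DNS names are case-insensitive, so normalise before grouping.
        ips_by_domain[domain.lower()].add(ip)
    return sorted(
        domain
        for domain, ips in ips_by_domain.items()
        if len(ips) >= min_distinct_ips
    )
```

Legitimate services such as content delivery networks also rotate addresses, so a result from this check should be combined with other evidence, for example the suspicious domain names mentioned above.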

118

Observable Communication (3)

Moreover, botnets may exhibit additional network abnormalities that allow us to discover them.

One example is that bots are usually idle most of the time in a connection, yet respond faster than a human being at the keyboard would. Another example is that some of the communication traffic originated by botnets is more "bursty" than normal traffic.

So, botnets can potentially be discovered by monitoring network traffic flow.
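One possible way to quantify "bursty" traffic is the coefficient of variation of packet inter-arrival times: perfectly regular traffic scores 0, and traffic that arrives in clumps separated by long silences scores well above 1. The slides do not name a specific metric, so this choice and the function name are assumptions of the sketch.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between packet timestamps.

    Returns 0.0 when there are too few packets to measure variation.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return 0.0
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0
    return pstdev(gaps) / average_gap
```

A monitor could compare this value for a suspect flow against the values seen for known-good flows of the same protocol.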

119

Observable Attacking Traffic

The traffic generated by botnets allows us to discover their presence.

For example,

• When launching DDoS TCP SYN flood attacks, botnets can send out a large number of invalid TCP SYN packets with fake source IP addresses. Therefore, if a network monitoring device finds a large number of outbound TCP SYN packets that have invalid source IP addresses (i.e., IP addresses that should not come from the internal network), it indicates that some internal hosts may be compromised and actively participating in a DDoS attack.

• Similarly, if an internal host is found to send out phishing e-mails, this is an indication that the host is infected by bots as well.
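The SYN flood example can be sketched as an egress check: count outbound SYN packets whose source address does not belong to the internal network. The internal range, the packet dictionary layout, the threshold, and the function name are hypothetical values chosen for this sketch.

```python
from ipaddress import ip_address, ip_network

# Hypothetical internal address space for illustration only.
INTERNAL_NETS = [ip_network("192.168.0.0/16")]


def count_spoofed_syns(packets, threshold=100):
    """Count outbound SYN packets with a source outside the internal nets.

    packets is an iterable of dicts with keys 'src', 'syn' and 'ack',
    describing outbound packets seen at the gateway. Returns the count
    and whether it reached the (hypothetical) alert threshold.
    """
    spoofed = 0
    for packet in packets:
        # A connection-opening SYN has the SYN flag set and ACK clear.
        is_initial_syn = packet["syn"] and not packet["ack"]
        source = ip_address(packet["src"])
        from_inside = any(source in net for net in INTERNAL_NETS)
        if is_initial_syn and not from_inside:
            spoofed += 1
    return spoofed, spoofed >= threshold
```

The same test is what egress filtering applies at a border router; here it is used for detection rather than for dropping packets.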

120

Host Based Behavior

Bots compromise computers and hide their presence just like many older computer viruses. Therefore, they exhibit certain observable behaviors at compromised hosts, as viruses do.

When executing, bots will make sequences of system/library calls, e.g.,

• modifying system registries and system files

• creating network connections

• disabling antivirus programs

The sequences of system/library calls made by bots are often different from legitimate programs and applications.
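One common way to compare call sequences is to split each trace into short sliding windows (n-grams) and measure how many windows of a suspect trace never occur in traces of legitimate programs. The slides do not prescribe this method; it is used here only as an illustrative sketch, and the call names in the usage example are made up.

```python
def call_ngrams(calls, n=3):
    """Return the set of length-n windows in a list of call names."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def unseen_ngram_ratio(trace, baseline_traces, n=3):
    """Fraction of n-grams in trace that never appear in the baselines.

    0.0 means every window was seen in legitimate traces; 1.0 means
    none was. Returns 0.0 if the trace is shorter than n calls.
    """
    known = set()
    for baseline in baseline_traces:
        known |= call_ngrams(baseline, n)
    observed = call_ngrams(trace, n)
    if not observed:
        return 0.0
    return len(observed - known) / len(observed)
```

For instance, with a baseline of `["open", "read", "close", "open", "read", "close"]`, a trace such as `["open", "write_registry", "connect", "kill_av"]` yields a ratio of 1.0, because none of its three-call windows occurs in the baseline.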

121

Global Correlated Behaviors

From the viewpoint of detection efficiency, the botnet behavior observed in a global snapshot is perhaps the most interesting. These global behavioral characteristics are often tied to the fundamental structures and mechanisms of botnets. Consequently, they are unlikely to change from botnet to botnet unless the structures and mechanisms of botnets themselves are redesigned and re-implemented. As a result, these globally observable behaviors are the most valuable for detecting families of botnets.

122

Global Correlated Behaviors – DNS Traffic (1)

Many botnets use dynamic DNS entries to track their C&C servers. When a new C&C server is built, the related DNS entry is updated to the IP address of the new C&C server, so that bots can find the location of the new C&C server. Botmasters may herd their botnets to different C&C server locations periodically to prevent detection.

When a botmaster updates the dynamic DNS entry for a C&C server, there is an observable global behavior on the Internet. Specifically, bots are disconnected from the old C&C server, so they query their DNS servers for the new IP address of the domain name, resulting in a global increase in DNS queries for this DNS entry.

123

Global Correlated Behaviors – DNS Traffic (2)

Therefore, if a network monitor discovers that a dynamic DNS entry is updated and that the update is followed by a significant number of DNS queries for this entry, then there is a high probability that this dynamic DNS domain name is being used by botnet C&C servers. Such a feature is unlikely to change whether a botnet uses IRC or HTTP for communication, unless the communication structure is changed.
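The correlation described above can be sketched by comparing the number of queries for a domain just before and just after its DNS entry is updated. The window length, the spike factor, and the function name are hypothetical parameters chosen for illustration; timestamps are plain seconds.

```python
def update_followed_by_spike(update_time, query_times, window=300, factor=5.0):
    """Report whether a DNS update is followed by a surge of queries.

    Counts queries in the `window` seconds before and after update_time
    and returns True if the count afterwards is at least `factor` times
    the count before (treating an empty 'before' window as one query).
    """
    before = sum(
        1 for t in query_times if update_time - window <= t < update_time
    )
    after = sum(
        1 for t in query_times if update_time <= t < update_time + window
    )
    return after >= factor * max(before, 1)
```

In practice the query counts would have to be collected from many vantage points, since the behavior is only visible globally.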
