1
CS 510 MALWARE
GHOST TURNS ZOMBIE: EXPLORING THE LIFE CYCLE
OF WEB-BASED MALWARE
MICHALIS POLYCHRONAKIS
PANAYIOTIS MAVROMMATIS
NIELS PROVOS
2
Introduction
• The underground Internet economy
• Web-based malware
• The system analyzing the post-infection network behavior of web-based malware
• How do malware's behaviors, taken together, provide a compelling perspective on the life cycle of web-based malware?
3
System Architecture
The goal of the system: detect harmful URLs on the web
A brief overview of the overall system used in their prior work: machine learning techniques find suspicious URLs among a large number of web pages, which are then verified in a virtual machine
The new extended system: Responders
System Architecture
4
Overall system architecture
o Virtual machine used
o Observed features:
• Links to known malware distribution sites
• Suspicious HTML elements
• The presence of code obfuscation
o Machine learning system
• Scores each URL; a URL with a high score is passed on for verification
o Verification results are used to retrain the machine learning system
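The observed features listed above can feed a scoring step. The following is a minimal sketch, assuming a hand-weighted linear score in place of the real machine learning system, whose model is not described on these slides. The blocklist entry, the weights, the regular expressions, and the threshold are all hypothetical.

```python
import re

# Hypothetical blocklist; the real list of malware distribution sites is not shown.
KNOWN_DISTRIBUTION_SITES = {"malware-distribution.example"}

# Hypothetical hand-picked weights standing in for a trained model.
WEIGHTS = {"bad_link": 0.5, "hidden_iframe": 0.3, "obfuscation": 0.2}


def extract_features(html: str) -> dict:
    """Return the three observed features as booleans (rough heuristics only)."""
    hosts = re.findall(
        r"""(?:src|href)\s*=\s*["']?https?://([^/"'\s>]+)""", html, re.I
    )
    hidden_iframe = re.search(
        r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0", html, re.I
    )
    obfuscation = re.search(r"\b(?:unescape|eval)\s*\(", html)
    return {
        "bad_link": any(h.lower() in KNOWN_DISTRIBUTION_SITES for h in hosts),
        "hidden_iframe": hidden_iframe is not None,
        "obfuscation": obfuscation is not None,
    }


def score(html: str) -> float:
    """Sum the weights of the features that are present."""
    features = extract_features(html)
    return sum(WEIGHTS[name] for name, present in features.items() if present)


def needs_verification(html: str, threshold: float = 0.5) -> bool:
    """Decide whether a page should go to the virtual machine for verification."""
    return score(html) >= threshold
```

In this sketch, retraining with verification results would amount to adjusting `WEIGHTS`; a real classifier would learn them from labeled pages.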
5
System Architecture
They extended the system, improving the verification component with light-weight responders
The responders provide fabricated responses for protocols such as SMTP, FTP and IRC
An HTTP proxy records all HTTP requests and scans all HTTP responses
A generic responder hands off connections over nonstandard ports and identifies connections that use unknown protocols
Responders
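A generic responder needs some way to tell which protocol a connection on a nonstandard port is speaking. The following is a minimal sketch of one possible approach, assuming the client speaks first and that the first command word is enough to decide. The function name and the matching rules are illustrative assumptions, not the authors' implementation.

```python
HTTP_METHODS = {"GET", "POST", "HEAD", "PUT"}


def identify_protocol(first_bytes: bytes) -> str:
    """Guess the application protocol from the first bytes a client sends."""
    if not first_bytes:
        return "unknown"
    # latin-1 never fails to decode, which suits arbitrary binary payloads.
    line = first_bytes.decode("latin-1").splitlines()[0]
    parts = line.split()
    if not parts:
        return "unknown"
    command = parts[0].upper()

    if command in HTTP_METHODS and parts[-1].upper().startswith("HTTP/"):
        return "http"
    if command in {"HELO", "EHLO"}:
        return "smtp"
    # IRC registration: NICK <name>, or USER with four parameters.
    if command == "NICK" or (command == "USER" and len(parts) >= 5):
        return "irc"
    # FTP login: USER with a single parameter.
    if command == "USER" and len(parts) == 2:
        return "ftp"
    return "unknown"
```

For example, `identify_protocol(b"EHLO localhost\r\n")` returns `"smtp"`, while an unrecognized binary payload returns `"unknown"`. Real SMTP and FTP servers send a banner before the client speaks, so a deployed responder would also have to decide when to send one.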
7
Life cycle of web-based malware
o Malware's interactions with other hosts and responders are organized into 3 categories:
1. Propagation
2. Data exfiltration
3. Remote control
o They analyzed the post-infection activity and the result of these behaviors to find out the life cycle of web-based malware
8
Life cycle of web-based malware Data Set
In 2 months, the virtual machine analyzed URLs from 5,756,000 unique host names; results are reported per unique host name
At least one harmful URL was found on 307,000 host names
49% of these web sites had URLs that resulted in HTTP requests initiated from a process other than the web browser
5% of the sites had URLs that activated a responder session
The total number of responder sessions with transmitted data is more than 448,000
They observed that malware made network connections without transmitting data in many more cases
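The percentages on this slide can be turned into approximate host counts. The short calculation below uses only the figures quoted above; because 49% and 5% are rounded, the derived counts are rough estimates.

```python
total_hosts = 5_756_000   # unique host names analyzed
harmful_hosts = 307_000   # host names with at least one harmful URL

# Share of analyzed host names that served at least one harmful URL.
harmful_share = harmful_hosts / total_hosts

# Integer arithmetic keeps the estimates free of floating-point noise.
non_browser_http_hosts = harmful_hosts * 49 // 100
responder_hosts = harmful_hosts * 5 // 100

print(f"{harmful_share:.1%} of host names had a harmful URL")
print(f"about {non_browser_http_hosts:,} sites triggered non-browser HTTP requests")
print(f"about {responder_hosts:,} sites activated a responder session")
```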
9
Life cycle of web-based malware Network characteristics
The destination ports of all outgoing connections from the virtual machine upon infection
10
Life cycle of web-based malware Network characteristics
They report the number of unique host names for each port
On these hosts, at least one URL installed malware that transmitted data to that port
Connections were made to more than 400 different destination ports
This shows the diverse nature of malware's post-infection network behavior
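The per-port counting described on this slide can be sketched as follows. The record layout `(hostname, destination_port, bytes_sent)` is an assumption made for illustration; the slides do not say how the sessions were stored.

```python
from collections import defaultdict


def hosts_per_port(sessions):
    """Count unique host names per destination port.

    Only sessions in which data was actually transmitted are counted.
    """
    hosts = defaultdict(set)
    for hostname, port, bytes_sent in sessions:
        if bytes_sent > 0:
            hosts[port].add(hostname)
    return {port: len(names) for port, names in hosts.items()}


# Hypothetical records, for illustration only.
example_sessions = [
    ("site-a.example", 80, 512),
    ("site-a.example", 80, 128),   # same host and port: counted once
    ("site-b.example", 80, 64),
    ("site-b.example", 6667, 40),
    ("site-c.example", 25, 0),     # connection without data: ignored
]
print(hosts_per_port(example_sessions))
```

Using a set per port means a host name is counted once for a port no matter how many of its URLs led to traffic there.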
11
The exact distribution of HTTP connections destined to nonstandard ports according to the destination port number
12
Life cycle of web-based malware Discovery and Propagation
Malware usually scans for other vulnerable systems, either on the same LAN or on the Internet, in order to propagate
The figure shows the distribution of network protocols used by malware
13
Life cycle of web-based malware Reporting Home
To observe this activity, SMTP responders are employed to capture emails
Each captured email has a subject and a body
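Once a responder has captured a message, the subject and body can be pulled out with Python's standard `email` package. This is a minimal sketch; the function name and the sample message are illustrative, and the slides do not show how the responders store captured mail.

```python
from email import message_from_string


def summarize_captured_email(raw_message: str) -> dict:
    """Extract the recipient, subject, and plain-text body of a captured email."""
    message = message_from_string(raw_message)
    if message.is_multipart():
        # Join every text/plain part; attachments are ignored in this sketch.
        parts = [
            part.get_payload()
            for part in message.walk()
            if part.get_content_type() == "text/plain"
        ]
        body = "\n".join(parts)
    else:
        body = message.get_payload()
    return {
        "to": message.get("To", ""),
        "subject": message.get("Subject", ""),
        "body": body.strip(),
    }


# Hypothetical captured message, for illustration only.
sample = (
    "From: bot@example.com\n"
    "To: author@example.com\n"
    "Subject: Installation report\n"
    "\n"
    "host=XP\n"
)
print(summarize_captured_email(sample))
```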
14
TABLE 1
Subject                              # Messages
XP Hacked                            390
ProRat [...]                         162
Vip Passw0rds                        98
Log file from ...                    82
Installation report                  76
Perfect Keylogger [...]              47
Installation on XP succeeded         12
E g y S p y KeyLogger [...]          12
INFECTADO                            6
Mais 1: XP                           3
AVSXP                                3
C-h-e-c-k-i-n-g:XP                   2
...:Noticia quentinha de:... XP      2
Table 1 shows the most common email subjects
TABLE 2
SMTP Server      # Messages
yahoo.com        436
google.com       118
tvm.com.tr       98
aol.com          82
hotmail.com      19
outblaze.com     8
globo.com        6
Life cycle of web-based malware Reporting Home
Table 2 above shows the common SMTP servers used by malware to send installation reports
15
Life cycle of web-based malware Reporting Home
The HTTP protocol is also used to report successful installations back to malware authors
The trojan example:
GET /geturl.php?version=1.1.2&fid=7493&mac=00-00-00-00-00-00&lversion=&wversion=&day=0&name=dodolook&recent=0 HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; )
Host: loader.51edm.net:1207
Cache-Control: no-cache
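The query string of the installation report in this example can be taken apart with the standard library. This is a small sketch for analysis purposes; the meaning of the individual parameters is not documented on the slide, so the code only separates names from values.

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied from the captured request on this slide.
REQUEST_TARGET = (
    "/geturl.php?version=1.1.2&fid=7493&mac=00-00-00-00-00-00"
    "&lversion=&wversion=&day=0&name=dodolook&recent=0"
)


def report_fields(request_target: str) -> dict:
    """Return the query parameters of a request target as a flat dict."""
    query = urlsplit(request_target).query
    # keep_blank_values preserves empty parameters such as lversion.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


fields = report_fields(REQUEST_TARGET)
print(fields["name"], fields["version"], fields["mac"])
```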
16
Life cycle of web-based malware Reporting Home
Malware also reported infections using a custom XML-like format
HGZ5.<FT>2008-01-28 12:55:30</FT><IM>80</IM><GR>_&</GR>
<SYS>Windows XP 5.1</SYS>
<NE>XP</NE><pid>488</PID><VER>Ver1.22-0624</VER>
<BZ></BZ><P>1</P><V>0</V><IP>0.0.0.0</IP>
000......<LC></LC><GR>-</GR><IM>25</IM><NA>XP</NA>
<CS>English (United States)</CS><OS>Windows XP</OS>
<MEM>1024MB</MEM><CPU>2200 MHz</CPU>
<NET>LAN</NET><video>0</video><BZ>-</BZ>
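Because the format above is only XML-like (tags repeat, and one closing tag differs in case from its opening tag), a strict XML parser would likely reject it. The following is a minimal sketch that extracts tag and value pairs with a regular expression. The function name is illustrative, and the meaning of the individual tags is not given on the slide.

```python
import re

# Matches <TAG>value</TAG>; the closing tag is compared case-insensitively below.
FIELD = re.compile(r"<([A-Za-z]+)>([^<]*)</([A-Za-z]+)>")


def parse_report(payload: str) -> list:
    """Return (TAG, value) pairs in order of appearance.

    A list is used instead of a dict because tags such as IM and GR repeat.
    """
    fields = []
    for open_tag, value, close_tag in FIELD.findall(payload):
        if open_tag.lower() == close_tag.lower():
            fields.append((open_tag.upper(), value))
    return fields


# First part of the captured report shown on this slide.
sample = (
    "HGZ5.<FT>2008-01-28 12:55:30</FT><IM>80</IM><GR>_&</GR>"
    "<SYS>Windows XP 5.1</SYS>"
    "<NE>XP</NE><pid>488</PID><VER>Ver1.22-0624</VER>"
)
for tag, value in parse_report(sample):
    print(tag, value)
```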
17
Life cycle of web-based malware Data exfiltration
Responder sessions show indications of data exfiltration, such as browser history files and stored passwords
o In their observations, they found some emails that sent back stored passwords from a compromised machine
o HTTP is also used for sending sensitive information back to data collection servers (note the large number of POST requests in the graph on slide #11)
18
Life cycle of web-based malware Data exfiltration
Within 2 days, one server held 4,729 files containing more than 250,000 valid email addresses
They found more sensitive information in extensive logs continuously uploaded by malware
The logs contain the victim's IP address, DNS server, gateway, MAC address, username, URL, and the intercepted form and password fields of HTTP requests
o In 250 MB of logs, 500 usernames and passwords were found for over 250 web sites, including banking sites, google.com, and yahoo.com
19
Life cycle of web-based malware Joining Botnets
Botnets
They encountered 2 types of botnets in their work:
1. IRC Botnets
2. HTTP Botnets
20
Life cycle of web-based malware IRC Botnets
IRC and C&C communication
IRC sessions to 90 servers were observed, using 1,587 different nicknames in 95 channels
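Counts like the ones on this slide can be derived from the commands a bot sends to the IRC responder. The following is a minimal sketch that collects nicknames and channels from the client side of one session, assuming the responder logs one IRC command per line. The function name and the `#example` channel are hypothetical.

```python
def summarize_irc_session(lines):
    """Collect the nicknames and channels a client used in one IRC session."""
    nicknames, channels = set(), set()
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue
        command = parts[0].upper()
        if command == "NICK":
            nicknames.add(parts[1].lstrip(":"))
        elif command == "JOIN":
            # JOIN accepts a comma-separated channel list; keys may follow.
            channels.update(parts[1].split(","))
    return nicknames, channels


# Hypothetical client-side log of one session.
session = [
    "NICK [0]USA|XP[P]152102",
    "USER x 0 * :x",
    "JOIN #example key",
]
print(summarize_irc_session(session))
```

Taking the union of these sets over all sessions gives totals of distinct nicknames and channels.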
21
Life cycle of web-based malware IRC Botnets
Some malware uses regular-looking nicknames and channels, while other malware uses artificially generated nicknames such as
[0]USA|XP[P]152102 or Inject-2l087876
22
Life cycle of web-based malware HTTP Botnets
Used to organize large-scale spam campaigns
To participate in spam campaigns, each bot repeatedly downloaded ZIP archives with instructions using HTTP requests
Each response contains a ZIP archive with instructions on how to participate in spam campaigns
23
Life cycle of web-based malware HTTP Botnets
Some example instructions:
000_data22 - a list of domains and their authoritative name servers used to form the sender's email address
001_ncommall - a list of common first names used as part of the sender's email address
002_otkogo_r - a list of possible "from" names related to the subject of the spam campaign
003_subj_rep - a list of possible email subjects
004_outlook - the template of the spam email
config - a configuration file that instructs the bot how to construct emails from the data files, how many emails to send in total, and how many connections are allowed at a given time
message - the message body of the spam campaign
mlist - a list of email addresses to which to send the spam
mxdata - a binary file containing information about the mail-exchange servers for the email addresses in mlist
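When the HTTP proxy records such a response, the archive can be inspected in memory without writing it to disk. This is a minimal sketch using Python's `zipfile` module; the function name is illustrative, and the internal format of the individual files is not described on the slides, so the code only lists them.

```python
import io
import zipfile


def archive_members(response_body: bytes) -> dict:
    """Map each file name in a ZIP archive to its uncompressed size in bytes."""
    with zipfile.ZipFile(io.BytesIO(response_body)) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def build_sample_archive() -> bytes:
    """Build a tiny stand-in archive; the contents are made up for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("config", "x")
        archive.writestr("mlist", "a@example.com\n")
    return buffer.getvalue()


print(archive_members(build_sample_archive()))
```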
24
Life cycle of web-based malware HTTP Botnets
Top domains out of 700,000 email addresses collected from a spam-sending botnet.
Email Domain             Frequency
yahoo.com                28899
sbcglobal.net            14417
yahoo.co.uk              8939
shaw.ca                  8321
hotmail.com              6985
korea.com                6041
yahoo.co.jp              5215
striker.ottawa.on.ca     4415
web.de                   4276
yahoo.co.in              4200
o The most frequent domains captured in an hour didn’t entirely overlap with the larger data set
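A domain frequency table like the one on this slide can be computed from a collected address list in a few lines. This is a minimal sketch; the function name and the sample addresses are illustrative.

```python
from collections import Counter


def top_domains(addresses, n=10):
    """Return the n most frequent email domains as (domain, count) pairs."""
    counts = Counter()
    for address in addresses:
        local, sep, domain = address.strip().rpartition("@")
        # Skip entries without a local part, an "@", or a domain.
        if sep and local and domain:
            counts[domain.lower()] += 1
    return counts.most_common(n)


# Hypothetical addresses, for illustration only.
sample_addresses = ["a@yahoo.com", "b@Yahoo.com", "c@web.de", "broken", "d@"]
print(top_domains(sample_addresses))
```

Comparing the result for a short capture window against the result for the full list is one way to check how far the two rankings overlap.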