copyright 2008. beyond intrusion prevention and detection – intrusion tolerance arun sood george...
TRANSCRIPT
Copyright 2008.
Beyond Intrusion Prevention and Detection – Intrusion Tolerance
Arun Sood
George Mason UniversityInternational Cyber Center and Department of Computer Science
http://cs.gmu.edu/~asood/scit703.347.4494
Research supported by contracts from Lockheed Martin and Virginia Center for Innovative Technologies (partner: Northrop Grumman)
ICC supported by Lockheed Martin and Booz Allen Hamilton
Cyber Security and Global Affairs Workshop supported by Office of Naval Research
Copyright 2009 slide 2
285 million records compromised in 2008 [Verizon] Average per-incident data breach costs in 2007
were $6.3 million [Ponemon Institute]
Enterprise Server Firewall Hacker (Actual Photo)
Introducing a new paradigm for server security—Intrusion Tolerance
The Problem
Copyright 2009 slide 3
Intrusion Tolerance allows malware and hackers into a server…
Enterprise Server Firewall Hacker (Actual Photo)
…but uses virtualization to restore the OS and application to a pristine state after attack!
SCIT Virtual Server
SC
IT V
irtual P
artitio
n
Every 55 seconds SCIT software cleans and restores the virtual server to its pristine state
The SCIT Solution
Copyright 2009 slide 4
Multi-National Security Breach
http://news.bbc.co.uk/2/hi/technology/7118452.stm “A huge campaign to poison web searches and trick people
into visiting malicious websites has been thwarted.” If a user searched Google for terms such as
• "hospice", "cotton gin and its effect on slavery", "infinity" and many more
• The first result pointed to a website from which malicious software was downloaded and embedded on user system.
Criminals in country A created domains that were mostly bought by companies in country B and hosted in country C. Tens of thousands of domains were used.
These domains tricked the indexing strategy of Google to believe that these web pages were good and reliable source of information.
Targeted and organized attacks.
Copyright 2009 slide 5
Securing Servers
Servers and endpoints have to be protected Verizon Data Breach Investigation shows that 99% of the
compromised records were from servers• A key step in these attacks was the installation of customized
malware, which cannot be detected by current systems
Current protection can take place at the network level and for important asset protection at the host level• Intrusion Prevention Systems including Firewalls• Intrusion Detection Systems: statistical, anomaly and behavior
based• White list and Black lists: IP addresses and software• Intrusion Tolerance – intrusions will happen, focus on minimizing
losseshttp://www.verizonbusiness.com/resources/security/reports/2009_databreach_rp.pdf
Cross Sector Cyber Threats Strategy
5
Copyright 2009 slide 6
Multi layered Approach to Security
IPS depend on inspection of incoming packets IDS depend on inspection of incoming and outgoing
packets• With increasing bandwidth and more matching
requirements, the cycles devoted to packet inspection will keep increasing
Threat independent approaches are needed for protection Other approaches should be included in the mix, including
approaches that do not rely on packet inspection and have potential for threat independent performance:• White list of software• Time dependent intrusion tolerance
Cross Sector Cyber Threats Strategy
6
Copyright 2009 slide 7
Key Intrusion Tolerance Approaches
SITAR MAFTIA SCIT
Detection Based Structure Based Time Dependent
Payload Inspection Yes No No
Voting Algorithm Yes, used to detect faulty replica and survive attacks
Yes, used to detect faulty replica and survive attacks.
No
Deterministic No No Yes
Performance Impact
Impact on response time.
Impact on response time.
Some impact on computing cycles for starting a new server instance.
Execution of ITS algorithm
In Application Data Flow In Application Data Flow Out-of-band
Diversity Required Required Optional, but diversity will make scheme more robust
Recovery Adaptive recovery performed upon detecting intrusion detection.
Performed upon detecting intrusion. Faulty replica recovered according to healthy ones.
Periodic recovery performed by Controller, based on master copy.
Copyright 2008.
Self Cleansing Intrusion Tolerance
Next Generation Server Security Technology
Infrastructure Servers Including those in DMZ
Short Transactions
Reduce Exposure Time
Copyright 2009 slide 9
Intrusion Tolerance
Introducing SCIT, the Intrusion Tolerance System• Optimizes application-specific exposure windows (AEW)
Targets “overexposed” applications (transactions)• Servers are sitting ducks• Focus initially on Websites, DNS, Single Sign On • Ongoing R&D Authentication (LDAP), Firewall • Not targeted at applications with inherently long transaction times (FTP, VPN, etc)
Leverages virtualization technology to reduce intrusion risk and costs• Reduces exposure time to limit intrusion losses• Adds time-based exposure control to intrusion prevention and detection solutions
• SCIT is based on a new paradigm, but is easy to integrate with existing systems
• New level of “Day-Zero” protection
Increases security through real-time server rotation and cleansing:• Enhances security of high availability systems • Enables more flexible patch scheduling
Copyright 2009 slide 10
SCIT Software
SCIT deploys on existing servers - does not require additional physical servers
SCIT is cost effective, uses virtualization technology and increases system security
SCIT does not interfere with existing IPS and IDS solutions
SCIT is an additional layer of defense
Copyright 2009 slide 11
Anatomy of an Hack
Foot print analysisWho is
NSLookupSearch Engines
Enumeration
ScanningMachines
PortsApplications
ExploitationBuffer Overflow
SpoofingPassword
DOS
Damage“Owning” IP Theft, Blackmail, Graffiti,
EspoinageDestruction
Analyze publicly available info. Set scope of attack and identify key targets
Check for vulnerabilities on each target Attack targets using
library of tools and techniquesFoot print analysis
Who isNSLookup
Search EnginesEnumeration
Automated ScanningMachines
PortsApplications Deliver Payload
Custom TrojanRootkit
Damage“Owning” IP Theft, Blackmail, Graffiti,
EspoinageDestruction
Attack targets using installed software
Richard Stiennon, May 2006, http://blogs.zdnet.com/threatchaos/?p=330
Manual A
pproachA
utomated A
pproach
Identify Target Install Malicious Code Hack Other Machines Take over Domain Controller
Copyright 2009 slide 12
How Does SCIT Provide Additional Security? SCIT servers
• Regularly restored to a known state and remove malicious software installed by attackers.
• Provide protection while manufacturer is developing a patch, i.e. SCIT servers are protected in the time period between vulnerability detection and patch distribution.
• Gives data center managers an additional level of freedom in developing a systematic plan for patch management.
SCIT DNS servers • Domain name / IP address mapping is protected from malicious alteration,
thus avoiding improper redirection of the traffic.
SCIT Web servers • Protect the corporate crown jewels, front ends for sensitive information, e.g.
customer or employee data sets, IP, and informational web sites. • Regularly restores the sites to known states, and makes it difficult for
intruders to undertake harmful acts such as deleting files. • Avoid long term defacements.• Reduces the risk of large scale data ex-filtration.
Copyright 2009 slide 13
Comparison of IDS, IPS, IT
Issue Firewall, IDS, IPS Intrusion tolerance
Risk management. Reactive. Proactive.
A priori information required.
Attack models. Software vulnerabilities. Reaction rules.
Exposure time selection. Length of longest transaction.
Protection approach. Prevent all intrusions. Impossible to achieve.
Limit losses.
System Administrator workload.
High. Manage reaction rules. Manage false alarms.
Less. No false alarms generated.
Design metric. Unspecified. Exposure time: Deterministic.
Packet/Data stream monitoring.
Required. Not required.
Higher traffic volume requires.
More computations. Computation volume unchanged.
Applying patches. Must be applied immediately.
Can be planned.
Copyright 2009 slide 14
Server RotationsExample: 5 online and 3 offline servers
Server Rotation
Offlineservers; inself-cleansing
Online servers;potentiallycompromised
Servers-Virtual-Physical
Copyright 2009 slide 15
Server RotationsExample: 5 online and 3 offline servers
Server Rotation
Offlineservers; inself-cleansing
Online servers;potentiallycompromised
Servers-Virtual-Physical
Copyright 2009 slide 16
Server RotationsExample: 5 online and 3 offline servers
Server Rotation
Offlineservers; inself-cleansing
Online servers;potentiallycompromised
Servers-Virtual-Physical
Copyright 2009 slide 17
Server State Transitions
Copyright 2009 slide 18
Intrusion Tolerance
Increase security by reducing exposure window• Exposure window is the time a server is online between rotations
Optimizes application-specific exposure windows to servers Decreasing available time for intrusion, reduces potential
lossesLoss Curve
Intruder Res idence Tim e
Lo
ss
T
T
Co
st
Copyright 2009 slide 19
Target Applications
• E-Commerce payments – long session of multiple short transactions
• Streaming media
• VPN• Complex Database Queries• Back end processing
Tra
nsa
ctio
n L
eng
thLo
ng
S
hort
Low HighValue for Exposure Window Management
• Web servers• DNS services• Single Sign On• Firewalls• Authentication (LDAP)• Transaction Processors
• File Transfer (size dependent)
Copyright 2009 slide 20
Exposure Time Reductions
Application Current Server SCIT Server
Websites – Windows Server
1 day to 3 month 60 seconds
Websites – UNIX Server 1 month to 6 months 60 seconds
DNS services – Linux Server
3 months to 1 year 30 seconds
In the following slides we show that:
Reducing Exposure Time Significantly Reduces Expected Loss
Copyright 2009 slide 21
Security Risk Assessment
Threat Probability
Criticality Factor
Effort Required
Risk Factor(Criticality/Effort)
Threat Level(Threat Probability x Risk Factor)
Vulnerability Factor
Asset Priotiy
Impact (Loss Factor)
Exposure Factor (EF)(Threat Level x Impact)
Single Loss Expectancy(SLE)
Annual Loss Expectancy(ALE)
x Asset Value (AV)
x Annual Rate of Occurrence (ARO)
Follows SecurityFocus.com (Symantec), Microsoft
Copyright 2009 slide 22
SCIT vs Traditional Cumm Single Loss Expectancy
$0
$10,000
$20,000
$30,000
$40,000
$50,000
$60,000
$70,000
$80,000
SCIT Exposure Time
Reducing Exposure Time Significantly Reduces Expected Loss
Multi Tier Architecture
Web serverDNS server
Content ManagerDatabase server
Copyright 2009 slide 23
Avoidance is Better Than Cleaning
You cannot clean a compromised system by • patching it. • removing the back doors. • using some vulnerability remover. • using a virus scanner. • reinstalling the operating system over the existing installation.
You cannot trust • any data copied from a compromised system. • the event logs on a compromised system. • your latest backup.
The only proper way to clean a compromised system is to flatten and rebuild.
CLEANING COMPROMISED SYSTEMS IS DIFFICULT. IT IS BETTER TO AVOID HACKING.
Copyright 2009 slide 24
Sample Requirements Met by SCIT Servers
Web site should not be defaced longer than 1 minute
DNS tables should be restored within 1 minute Security architecture should reduce data ex-
filtration – SCIT server along with IDS will reduce the volume of data that can be maliciously retrieved
To ensure clean servers, remove malware every minute
Use diversity to change the face of the webserver every minute
Copyright 2009 slide 25
Performance & Functionality Stress Tests
Workload: number of user sessions/minute (50,100,125) User session:
• Series of request and response from server
• Select item from drop down list and add it to persistent storage
OpenSTA is used to generate workload• 3 runs per case.
Duration of run = 3 * Exposure time for the run• each VM is tested at least once
Workload consists of N requests every 10 secs. Exposure times of 2,3 and 4 minutes, No Rotation Stand alone web server for Non-SCIT test.
Copyright 2009 slide 26
Performance Test Results
SCIT Server Environment
• Entry Level DELL System•Dual processor – 4 cores each•Memory: 4 GB
• Slackware OS• Apache, Tomcat,
Shopping Cart (Java)
Exp Time (minutes)
User Sessions
Avg. Response Time (secs)
STD Dev
2 m 50 6.16 0.07
2 m 100 6.24 0.01
2 m 125 6.27 0.02
3 m 50 6.10 0.04
3 m 100 6.15 0.02
3 m 125 6.31 0.05
4 m 50 6.08 0.04
4 m 100 6.15 0.02
4 m 125 6.14 0.02
No Rotation 50 6.03 0.01
No Rotation 100 6.03 0.00
No Rotation 125 6.04 0.00
Copyright 2009 slide 27
Response time with Rotation
5.85.9
66.16.26.36.4
2 m
ins
3 m
ins
4 m
ins
NR
Sta
ndal
one
2 m
ins
3 m
ins
4 m
ins
NR
Sta
ndal
one
2 m
ins
3 m
ins
4 m
ins
NR
Sta
ndal
one
50 50 50 50 50 100 100 100 100 100 125 125 125 125 125
User Sessions (Minutes)
Res
pons
e Ti
me
(s)
Run1 Run2 Run3
Response Times for Different Exposure Times
Copyright 2009 slide 28
Preliminary Performance Data
•Each user session includes a series of requests and responses. •Average “think” time = 2 seconds between requests. •Each session involves selecting an item from a drop down list and adding
it into the persistent storage. Repeated 3 times.
DEPEND 2009, June 09
Response Time (secs) vs Users (per min)
5.855.9
5.956
6.056.1
6.156.2
6.256.3
6.35
50 100 125
Users per min
Res
po
nse
Tim
e (s
ecs)
2 mins
3 mins
4 mins
NR
Standalone
Copyright 2009 slide 29
SCIT Parameters
Copyright 2009 slide 29
Active window Wo: server accepts requests from the network
Grace period Wg: server stops accepting new requests and fulfills outstanding requests in its queue.
Exposure window: W = Wo + Wg.
Ntotal : total nodes in the cluster.
Ntotal, W, and the cleansing-time Tcleansing are inter-related.
Copyright 2009 slide 30
SCIT State Transition Diagram
Simple diagram. Pa: probability of successful attack. Pc: probability of cleansing when in A. F: low chance of occurrence, but still possible:
• Virtual machine and/or the host machine no longer responds to the Controller.
• Controller itself fails due to a hardware fault.
Copyright 2009 slide 30
G: GoodV: VulnerableA: Successful AttackF: Failed
G
V
A
F
Cleansing
G V A F
G 0 1 0 0
V 1–Pa 0 Pa 0
A Pc 0 0 1-Pc
F 0 0 0 1
Copyright 2009 slide 31
MTTSF and W
W ↓ → (Pa ≤ 1 - e-λW) ↓
W ↓ → (Pc ≥ e-λW) ↑
Then: W ↓ → MTTSFscit ↑
MTTSFSCIT ≥ F(W), where F(W) is a decreasing function of W:
Significance: engineer instance of SCIT architecture by tuning W in order to increase or decrease the value of MTTSFSCIT.
Copyright 2009 slide 31
)1(
h1(
h h
F(W)W
2W
10
)
e
e
Copyright 2009 slide 32
MTTSF and Grace Period
Grace period used by Controller to issue cleansing mode signal.
Noutstanding : average # of outstanding requests in the queue when the server enters the grace period.
Entire incoming traffic [ Poisson(α). It is known: λ = k.α, with k ≤ 1. Noutstanding ≤ α Wo.
S: service rate in terms of number of serviced requests per unit time:
• Wg = Noutstanding /S ≤ (α Wo ) / S
Since α/S < 1, estimate for grace period: Wg < Wo .
Then: control MTTSFSCIT by online window Wo
Copyright 2009 slide 32
Copyright 2009 slide 33
SCIT Research – Status & Future
On-going for 6 years Supported by Army, NIST/CIPP, SUN,
VA -CIT/NG, Lockheed Demos for SCIT Web, SSO, DNS
• Presentation, persistent, session Testing at corporate labs
• Mostly security driven Basic concept has been validated
• Malware automatically deleted every rotation cycle (about 1 minute)
• Automatic recovery from defacement every minute
• Ecommerce demo ( open source pet store software)
• No packet inspection: increased bandwidth does not impact performance
• Exposure time based - no false alarms generated
• Reduce managed services cost cs.gmu.edu/~asood/scit
• Pubs + 1 patent approved + 3 patents applied
• Video of demo on YouTube, MyDeo
Scalability for enterprise environments• Simulate a large server farm (v0)• Sys admin interface for managing
multiple servers (v0)• Software tools to support deployment
Randomized defense strategies• Add diversity to SCIT
• Memory image, application, OS• Add IP hopping to SCIT
• Distributed redundancy to achieve resilience
DNSSEC requirements
Extend to other apps / functions • DNS cache poisoning, • DNSSEC key protection, • email, LDAP, long duration
transactions Virtualization security: cloud
• Multifunction virtualized platform• Performance degradation
How to reduce exposure time Improve performance of SCIT
• Multiple cores (16 +); large memory Audit logs, monitoring, feedback
2009
p
lan
sF
utu
re
Cu
rren
t S
tatu
s
Copyright 2009 slide 34
Observations and Thoughts
Specifying security without a time framework is very hard
It is easier to assess risk for proactive systems as compared to reactive systems
Threat independent protection is critical• Protect while patches are being developed
We need easy to understand metrics and / or benchmarks• SCIT makes it harder for intruder, but how much harder?• Cost vs hardening assessment
Copyright 2009 slide 35
Conclusion
SCIT significantly reduces risk levels for targeted application using virtualization technology
Augments existing IPS and IDS solutions – another layer of defense – no interference
Completed SCIT web server and SSO server, SCIT DNS server in Q4
Research issues: long duration transactions, randomized defensive strategies, scalability, functionality under load, “penetration” testing, other servers (e.g. email)
Copyright 2009 slide 36
SCIT Publications + Contact Info
SCIT technical publications Links to media reports Links to demo videos
http://cs.gmu.edu/~asood/scit
Questions?
Arun Sood
Copyright 2009 slide 37
Questions
SCIT goal is to make it harder for an intruder to do damage. We need a way to say that by having an exposure time of X the task will become Y times harder. Are there ways of assessing this without the use of red teams?
What is a good enough exposure time? What metrics and benchmarks are more meaningful to decision makers?
Given limited knowledge of future attack methodologies, how does one justify a multi-layered security architecture?
Can SCIT simplify the constraints on IDS and thus reduce false alarms?