CAMP: Content-Agnostic Malware Protection
Niels Provos, Moheeb Abu Rajab, Lucas Ballard, Noe Lutz and Panayiotis Mavrommatis
Google Inc.
20th Annual Network & Distributed System Security Symposium (NDSS 2013)
左昌國, 2013/04/01, Seminar @ ADLab, NCU-CSIE

• X-agnostic
  • Without knowledge of X
• Content-agnostic malware protection
  • The protection operates without knowledge of the malware content

Outline
• Introduction
• Related Work
• System Architecture
• Reputation System
• Evaluation
• Conclusion

Introduction
• Malware distribution through web browsers
  • Drive-by downloads
    • Not addressed in this paper
  • Social engineering
    • Fake anti-virus
• The defense?
  • Blacklists / whitelists
  • Signature-based solutions
• CAMP
  • Reputation system
  • Low false positive rate

Related Work
• Content-based detection
  • Anti-virus software
  • CloudAV
• Blacklist-based protection
  • Google Safe Browsing API
  • McAfee Site Advisor
  • Symantec Safe Web
• Whitelist-based schemes
  • Bit9
  • CoreTrace
• Reputation-based detection
  • SNARE
  • Notos and EXPOSURE
  • Microsoft SmartScreen

System Architecture
[Figure: system architecture diagram with Client and Server sides]

System Architecture – Binary Analysis
• Produces labels (benign or malicious) for training purposes
  • Classifies binaries based on static and dynamic analysis
  • The labels are also used to decide thresholds
  • Goal: low false positive rate

System Architecture – Client
• Performs local checks before asking the server for a decision
  1. In blacklists?
     Google Safe Browsing API
  2. Potentially harmful?
     e.g. DMG files on Mac OS X
  3. In whitelists?
     Trusted domains and trusted signing certificates
• If the local checks yield no result
  • Extract features from the downloaded binary
    • Final download URL / IP address
    • Referrer URL / (corresponding) IP address
    • Size / hash
    • Signature
  • Send the features to the server
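
The local checks on this slide can be sketched as a short decision function. This is a minimal sketch, not Chrome's implementation: the names `Download`, `Verdict` and `local_check`, and all example lists, are hypothetical. It also assumes that a download whose file type is not potentially harmful is treated as benign without contacting the server.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MALICIOUS = "malicious"
    BENIGN = "benign"
    ASK_SERVER = "ask_server"  # no local result: extract features, query server


@dataclass
class Download:
    final_url: str
    host: str
    file_extension: str
    signer: Optional[str] = None


# Illustrative data only (assumptions); real lists are maintained by the browser.
BLACKLISTED_URLS = {"http://bad.example/setup.exe"}
HARMFUL_EXTENSIONS = {".exe", ".msi", ".dmg"}
TRUSTED_DOMAINS = {"trusted.example"}
TRUSTED_SIGNERS = {"Trusted Vendor Inc."}


def local_check(d: Download) -> Verdict:
    # 1. In blacklists?
    if d.final_url in BLACKLISTED_URLS:
        return Verdict.MALICIOUS
    # 2. Potentially harmful? File types that cannot harm the OS are let through.
    if d.file_extension.lower() not in HARMFUL_EXTENSIONS:
        return Verdict.BENIGN
    # 3. In whitelists? Trusted domain or trusted signing certificate.
    if d.host in TRUSTED_DOMAINS or d.signer in TRUSTED_SIGNERS:
        return Verdict.BENIGN
    # No local decision: the client would now send features to the server.
    return Verdict.ASK_SERVER
```

Only the `ASK_SERVER` case leads to feature extraction and a server request, which keeps most downloads off the server path.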

System Architecture – Client
• The returned decision

System Architecture – Client
• ~70% of all downloads are considered benign due to policy or because they match client-side whitelists
• (On the server side) Binaries hosted on the trusted domains or signed by trusted signers are analyzed regularly

System Architecture – Server
• The server receives the client request and renders a reputation verdict
• The server uses the information in the request to update its reputation data
• Uses BigTable and MapReduce

System Architecture – Frontend and Data Storage
• Frontend
  • RPC to the reputation system
• URL as index?
  • Popular URLs → timestamp (of the request to the URL): reverse-ordered hexadecimal string
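
A timestamp-based row key of the kind mentioned on this slide might look like the sketch below. It rests on assumptions: "reverse-ordered" is read here as reversing the digits of the hexadecimal string, so that the fastest-changing digit comes first and consecutive requests land in different key ranges. The function name `timestamp_row_key`, the microsecond unit and the 16-digit width are hypothetical.

```python
def timestamp_row_key(timestamp_micros: int, width: int = 16) -> str:
    """Return a zero-padded hexadecimal timestamp with its digits reversed."""
    if timestamp_micros < 0:
        raise ValueError("timestamp must be non-negative")
    # Zero-pad so every key has the same length (16 hex digits = 64 bits).
    hex_digits = format(timestamp_micros, "x").zfill(width)
    # Reverse the digits: the least significant digit becomes the key prefix.
    return hex_digits[::-1]
```

With this layout, two requests one microsecond apart differ in the first character of the key rather than the last, so writes are spread over the key space instead of piling up on one row.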

System Architecture – Spam Filtering
• Velocity controls on the user IP address
• The spam filter is employed to fetch binaries from the web that have not yet been analyzed by the binary classifier
  • Filter: only binaries that exhibit sufficient diversity of context
  • The analysis may complete long after a reputation decision was made
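
The slide does not say how the velocity controls work. One common way to limit request velocity per IP address is a sliding-window counter, sketched below. The class name `VelocityControl` and the limit and window parameters are assumptions for illustration, not values from the paper.

```python
from collections import defaultdict, deque


class VelocityControl:
    """Allow at most max_requests per IP address within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self._seen[ip]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # too many recent requests from this IP
        timestamps.append(now)
        return True
```

Requests rejected here would simply not contribute to the reputation data, which limits how much a single address can skew the aggregates.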

System Architecture – Aggregator
• Aggregates
  • Form the reputation data
  • 3-dimensional index
    • Source (where the observation came from)
    • Feature
    • Category: reputation / urls / hash
  • Examples
    • client | site:foo.com | reputation → (6, 10)
    • analysis | ip:1.2.3.4/24 | urls → (0, 3)
  • Value
    • (a, b)
    • a: the number of interesting observations
    • b: the total number of observations
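
The three-part index and the (a, b) value can be illustrated with a small in-memory store. This is a sketch only: the slides describe BigTable-backed aggregates, while the class `AggregateStore` and its methods are hypothetical names used for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Index: (source, feature, category), e.g. ("client", "site:foo.com", "reputation")
Key = Tuple[str, str, str]


class AggregateStore:
    def __init__(self) -> None:
        # Value: [a, b] = [interesting observations, total observations]
        self._counts: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])

    def observe(self, source: str, feature: str, category: str,
                interesting: bool) -> None:
        counts = self._counts[(source, feature, category)]
        if interesting:
            counts[0] += 1
        counts[1] += 1

    def value(self, source: str, feature: str, category: str) -> Tuple[int, int]:
        # .get avoids creating an entry for an index that was never observed.
        a, b = self._counts.get((source, feature, category), (0, 0))
        return a, b
```

Recording ten client observations for `site:foo.com`, six of them interesting, reproduces the `(6, 10)` value from the slide's first example.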

Reputation System
• Feature extraction
  • IP address: single address or netblock
  • URL: direct download URL or host / domain / site
  • Signature / hash
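
Deriving several features from one IP address or URL can be sketched with the Python standard library. The `ip:` prefix follows the example on the aggregator slide; the `url:` and `host:` prefixes, the function names and the /24 default are assumptions. Domain and site features are left out because they need a public-suffix list.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def ip_features(ip: str, prefix_len: int = 24) -> List[str]:
    """Return the single-address feature and the enclosing netblock feature."""
    addr = ipaddress.ip_address(ip)
    # strict=False masks the host bits, so 1.2.3.4/24 becomes 1.2.3.0/24.
    block = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return [f"ip:{addr}", f"ip:{block}"]


def url_features(url: str) -> List[str]:
    """Return the direct-download URL feature and the host feature."""
    host = (urlsplit(url).hostname or "").lower()
    return [f"url:{url}", f"host:{host}"]
```

Each feature string can then serve as the middle component of an aggregate index, so one request updates reputation at several granularities.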

Reputation System – Decision
• Threshold
  • Thresholds are chosen according to the precision and recall for each AND gate
  • Precision and recall are determined from a labeled training set
• Training set: built by matching hashes from requests with hashes from binary analysis
  • Binary analysis provides the label (benign or malicious)
  • The request provides the features
  • 4000 benign requests / 1000 malicious requests
• Precision and recall
  • http://en.wikipedia.org/wiki/Precision_and_recall
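
How precision and recall follow from a candidate threshold can be sketched on a labeled set. This is an illustration under assumptions: each request is reduced to one score (for example a ratio a/b from an aggregate), a request is flagged when its score reaches the threshold, and the function name `precision_recall` is hypothetical. The slides do not give the actual selection procedure.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    """labels[i] is True when binary analysis labeled request i malicious."""
    tp = fp = fn = 0
    for score, is_malicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_malicious:
            tp += 1
        elif flagged:
            fp += 1
        elif is_malicious:
            fn += 1
    # Conventions chosen here: precision is 1.0 when nothing is flagged,
    # recall is 0.0 when the set contains no malicious samples.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Sweeping the threshold and keeping the value that gives the highest recall at an acceptable precision is one way to favor a low false positive rate.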

Evaluation
• Google Chrome
  • Targeting Windows executables
• Accuracy of binary analysis
  • Compared against VirusTotal
  • 2200 samples selected
    • 1100 were labeled clean by the binary analysis component
    • 1100 were labeled malicious
  • Samples were submitted to VirusTotal, followed by a 10-day wait
    • 99% of the binaries labeled malicious were flagged by 20% or more of the AV engines on VirusTotal
    • 12% of the binaries labeled clean were flagged by 20% or more of the AV engines on VirusTotal

Evaluation – Accuracy of CAMP
• Feb. 2012 to July 2012
• 200 million users in total
• 8 to 10 million requests each day
  • 200 to 300 thousand labeled as malicious
• 3.2 billion aggregates in total
• [Four metric formulas not preserved in the transcript]
• Overall accuracy
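
Overall accuracy and the related rates are usually computed from confusion-matrix counts. The sketch below uses the standard textbook definitions, which are an assumption here since the transcript does not preserve the slide's own formulas. The function name `rates` is hypothetical.

```python
from typing import Dict


def rates(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard rates from true/false positive/negative counts."""
    positives = tp + fn  # samples that are actually malicious
    negatives = tn + fp  # samples that are actually benign
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")
    return {
        "tpr": tp / positives,
        "fnr": fn / positives,
        "tnr": tn / negatives,
        "fpr": fp / negatives,
        "accuracy": (tp + tn) / (positives + negatives),
    }
```

For example, `rates(tp=90, tn=980, fp=20, fn=10)` uses made-up counts, not figures from the paper, and shows how a small false positive rate and a high accuracy can coexist when benign samples dominate.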

Evaluation – Comparison to other systems
• A random sample of 10,000 binaries labeled as benign
• 8,400 binaries labeled as malicious

Evaluation – Case Study

Conclusion
• This paper presents CAMP, a content-agnostic malware protection system
• The paper performs a large-scale evaluation and shows that the detection approach is both accurate and fast (processing requests in less than 130 ms)