HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES
Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
University of California, Santa Barbara
CCSW 2010
Data Security Concern: Back-End Databases of Web-based Applications
• Form-based query interfaces provide an entrance for both users and attackers.
• Traditional Attacks
  • Submit malicious requests to break into the hidden database through vulnerabilities in the application, e.g. SQL injection [Vale05].
  • Many can be detected by prior work.
10/8/2010 2
Data Security Concern: Back-End Databases of Web-based Applications
• Data Harvesting Attacks
  • Iteratively submit legitimate queries to extract the data inventory or infer sensitive aggregate information.
  • E.g. 1: A competitor of car rental company A harvested A’s inventory of a popular car.
  • E.g. 2: Terrorists inferred that a flight was relatively empty and could be a hijacking target.
Anatomy of Data Harvesting Attacks
• General strategy
  • Iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries with the goal of maximizing information gain within a limited number of queries.
• Two types of harvesting attacks to consider
  • Crawling Attack
    • Performed by deep web crawling [Madh08]
  • Sampling Attack
    • Performed by uniform random sampling on results of size no more than K [Dasg09]
How To Defend Against Data Harvesting Attacks
• Database inference control [Denn83]?
  • Query set restriction is not effective, especially against sampling attacks.
  • Query set restriction and data perturbation [Dasg09] hurt usability.
• Web robot detection [Tan02]?
  • Data harvesters can camouflage themselves by mimicking normal users’ HTTP traffic patterns.
Our Approach
• Detection based on search behaviors within sessions
• Attackers’ search behaviors
  • Diversity
    • Queries are not concentrated or localized, and they reflect very distinct intents.
  • Broadness
    • The results of the queries cover a broad scope of the underlying data.
HengHa: Detecting Data Harvesting Attacks at Single Session Level
• Identify data harvesting attackers by examining whether their search behaviors in a session show relatively significant diversity and broadness.
  • Diversity -> query correlation
  • Broadness -> result coverage
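One way to picture how the two signals might be combined is the sketch below. It is only an illustration: the slides do not say how HengHa combines the two components, so the `SessionStats` type, both threshold values, and the rule "flag only when a session is both diverse and broad" are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical per-session summary (not a structure from the slides)."""
    correlation: float  # query correlation in [0, 1]; low means diverse queries
    coverage: float     # fraction of the data touched; high means broad results


def is_suspicious(stats: SessionStats,
                  max_correlation: float = 0.2,
                  min_coverage: float = 0.5) -> bool:
    """Flag a session that is both diverse and broad.

    Both thresholds are illustrative placeholders, not values from the talk.
    """
    diverse = stats.correlation <= max_correlation
    broad = stats.coverage >= min_coverage
    return diverse and broad


# A scattered session that touches most of the data is flagged;
# a focused session that touches little of it is not.
print(is_suspicious(SessionStats(correlation=0.05, coverage=0.8)))
print(is_suspicious(SessionStats(correlation=0.60, coverage=0.1)))
```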
[Figure: HengHa detector architecture. Labels: Web Application, DB, query, result, suspicious; HengHa DETECTOR, consisting of Heng (query correlation observer) and Ha (result coverage monitor).]
[Figure: Queries in a Session That Plans Trip to Chicago]
Heng: Query Correlation Observer
• Key idea
  • Frequent predicate value sets as indications of correlations among queries
  • Intuitively, if a session has more frequent predicate value sets with higher supports, and those predicate value sets are more similar to the queries, then the queries in this session are more correlated.
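The intuition above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the scoring used by Heng: queries are modeled as sets of (attribute, value) predicates, frequent sets are found by brute-force subset counting, and the score (support times Jaccard similarity of the best-matching frequent set, averaged over queries) is a formula chosen here for illustration. The attribute names and the `min_support` default are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_value_sets(queries, min_support=0.5):
    """Return {predicate value set: support} for sets meeting min_support.

    Each query is a set of (attribute, value) pairs. Brute-force subset
    enumeration is fine here because form queries have few predicates.
    """
    counts = Counter()
    for q in queries:
        items = sorted(q)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    n = len(queries)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def correlation_score(queries, min_support=0.5):
    """Assumed score: mean over queries of max(support * Jaccard similarity)."""
    frequent = frequent_value_sets(queries, min_support)
    if not frequent:
        return 0.0
    total = 0.0
    for q in queries:
        total += max(sup * len(s & q) / len(s | q) for s, sup in frequent.items())
    return total / len(queries)


# Hypothetical sessions: one focused on a single destination, one scattered.
focused = [
    {("dest", "Chicago"), ("type", "flight")},
    {("dest", "Chicago"), ("type", "hotel")},
    {("dest", "Chicago"), ("type", "car")},
]
scattered = [
    {("dest", "Chicago"), ("type", "flight")},
    {("dest", "Boston"), ("type", "hotel")},
    {("dest", "Miami"), ("type", "car")},
]
print(correlation_score(focused))    # 0.5
print(correlation_score(scattered))  # 0.0
```

In the focused session the shared destination predicate appears in every query, so it is a frequent value set with support 1.0; the scattered session has no predicate value set that reaches the support threshold, which matches the "diversity" behavior attributed to attackers.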
Ha: Result Coverage Monitor
• Key idea
  • Sort multi-attribute data D in a total order that preserves locality, e.g. a z-curve.
  • Create a coverage bit vector (CBV), where the bits correspond to the data records in the total order.
  • Accessing a data record sets the corresponding bit.
• Training
  • Cluster CBVs to model different data access patterns
[Figure: z-curve ordering over a two-dimensional grid (axes x and y, ticks 0 to 3), with an example coverage bit vector 1110110001000000]
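The z-curve ordering and the coverage bit vector can be sketched as follows. This is a minimal illustration, assuming two integer attributes on a 4 x 4 grid, one record per grid cell, and a bit-interleaving convention in which x occupies the even bit positions; the slides do not specify these details, and the function names are hypothetical. The example session is made up and is not meant to reproduce the bit vector shown in the figure.

```python
def z_index(x: int, y: int, bits: int = 2) -> int:
    """Interleave the low `bits` bits of x and y into a z-curve position.

    Convention assumed here: bit i of x goes to position 2*i,
    bit i of y goes to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def coverage_bit_vector(accessed, bits: int = 2):
    """Build a CBV with one bit per grid cell, ordered along the z-curve."""
    cbv = [0] * (1 << (2 * bits))
    for x, y in accessed:
        cbv[z_index(x, y, bits)] = 1
    return cbv


def coverage(cbv) -> float:
    """Fraction of records touched by the session."""
    return sum(cbv) / len(cbv)


# A session whose results all fall in one corner of the grid.
session = [(0, 0), (1, 0), (0, 1), (1, 1)]
cbv = coverage_bit_vector(session)
print("".join(map(str, cbv)))  # 1111000000000000
print(coverage(cbv))           # 0.25
```

Because the z-curve keeps nearby cells close together in the ordering, a localized session sets a contiguous run of bits, while a session that sweeps the whole table sets bits throughout the vector.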
Experiment
• Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset
• Synthesized 1000 attack sessions [Madh08, Dasg09]
• Ran on a server with an Intel 2.4GHz CPU, 3GB RAM and FC 8 OS
• Performed four-fold cross-validation
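For readers unfamiliar with the evaluation protocol, four-fold cross-validation can be sketched as below. This is a generic illustration, not the authors' evaluation code: the shuffling, the fixed seed, and the helper name `k_fold_splits` are assumptions, and small integers stand in for sessions.

```python
import random


def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items round-robin into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy usage with 10 placeholder session ids.
for train, test in k_fold_splits(range(10), k=4):
    print(len(train), len(test))
```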
[Figure: Effectiveness of Detection in Four Validations]
[Figure: Efficiency of Detection in Four Validations]
Conclusion & Future Work
• Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e. crawling attacks and sampling attacks.
• Detection is based on identifying attackers’ distinctive search behaviors at the single session level: diversity -> query correlation observer, broadness -> result coverage monitor.
• Detecting cross-session data harvesting attacks will be considered in future work.
References
• [Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.
• [Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.
• [Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.
• [Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.
• [Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.
Thanks for Listening