HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara CCSW 2010



Page 2

Data Security Concern: Back-End Databases of Web-based Applications

• Form-based query interfaces provide an entry point for both users and attackers.

• Traditional attacks
  • Submit malicious requests that break into the hidden database through vulnerabilities in the application, e.g., SQL injection [Vale05].
  • Many of these attacks can be detected by prior work.

10/8/2010 2

Page 3

Data Security Concern: Back-End Databases of Web-based Applications

• Data harvesting attacks
  • Iteratively submit legitimate queries to extract the data inventory or to infer sensitive aggregate information.
  • Example 1: A competitor of car rental company A harvested A’s inventory of a popular car.
  • Example 2: Terrorists inferred that a flight was relatively empty and could therefore be a hijacking target.


Page 4

Anatomy of Data Harvesting Attacks

• General strategy
  • Iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries, with the goal of maximizing the information gained from a limited number of queries.

• Two types of harvesting attacks are considered
  • Crawling attack
    • Performed by deep web crawling [Madh08]
  • Sampling attack
    • Performed by uniform random sampling over query results of size no more than K [Dasg09]
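To make the sampling attack concrete, here is a toy model. It is a minimal sketch under assumptions that are not in the slides: the hidden database has only Boolean attributes and distinct rows, the form interface (the hypothetical `top_k`) returns at most K matches and signals overflow, and the attacker (the hypothetical `sample_row`) drills down over the attributes in a fixed order, with a rejection step so that accepted samples are uniform. It follows the general idea of random drill-down samplers for hidden databases; it is not the specific procedure of [Dasg09] or of the attack sessions synthesized for the experiments.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # a hidden-database row over Boolean attributes 0..m-1


def top_k(db: Sequence[Row], query: Dict[int, int], k: int) -> Tuple[List[Row], bool]:
    """Hypothetical form interface: at most k matching rows, plus an overflow flag."""
    matches = [row for row in db if all(row[a] == v for a, v in query.items())]
    return matches[:k], len(matches) > k


def sample_row(db: Sequence[Row], m: int, k: int, rng: random.Random,
               session_log: List[Dict[int, int]]) -> Row:
    """Draw one uniformly random row via random drill-down plus rejection.

    Assumes k >= 1 and distinct rows, so a fully specified query never overflows.
    """
    while True:
        query: Dict[int, int] = {}
        for depth in range(1, m + 1):
            query[depth - 1] = rng.randint(0, 1)  # fixed attribute order, random value
            result, overflow = top_k(db, query, k)
            session_log.append(dict(query))       # every probe is a legitimate query
            if not result:
                break                             # underflow: restart the walk
            if not overflow:
                # The walk reaches each row here with probability 2**-depth / len(result).
                # Accepting with probability len(result) / (k * 2**(m - depth)) gives
                # every row the same overall probability 2**-m / k per attempt.
                if rng.random() < len(result) / (k * 2 ** (m - depth)):
                    return rng.choice(result)
                break                             # rejected: restart the walk
```

Every probe is appended to `session_log`, so one harvesting session accumulates many well-formed queries that, across restarts, reach many different regions of the data: the kind of diverse, broad session that the detection approach targets.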


Page 5

How To Defend Against Data Harvesting Attacks

• Database inference control [Denn83]?
  • Query set restriction is not effective, especially against sampling attacks.
  • Query set restriction and data perturbation [Dasg09] hurt usability.

• Web robot detection [Tan02]?
  • Data harvesters can mimic normal users’ HTTP traffic patterns.


Page 6

Our Approach

• Detection based on search behaviors within sessions

• Attackers’ search behaviors
  • Diversity
    • Queries are not concentrated or localized; they reflect very distinct intents.
  • Broadness
    • The results of the queries cover a broad scope of the underlying data.


Page 7

HengHa: Detecting Data Harvesting Attacks at Single Session Level

• Identify data harvesting attackers by examining whether their search behaviors in a session show relatively significant diversity and broadness.

• Diversity -> query correlation
• Broadness -> result coverage


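[Figure: HengHa detector architecture. Components: Web Application, DB, and the HengHa DETECTOR, made up of Heng (query correlation observer) and Ha (result coverage monitor); flows labeled query, result, and suspicious.]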
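The detector's position between the web application and the DB can be sketched as a small per-session wrapper. This is a minimal sketch under stated assumptions: sessions are identified by a string id, the class name `HengHaDetector` and its methods are hypothetical, the two scores are supplied as plain functions (a query correlation score for Heng and a coverage anomaly score for Ha), and both the thresholds and the rule that a session is flagged only when it is both diverse (low correlation) and broad (high coverage anomaly) are assumptions, since the slides do not say how the two signals are combined.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

Query = FrozenSet[tuple]  # a form query as a set of (attribute, value) predicates


class HengHaDetector:
    """Hypothetical detector sitting in the query/result path of one application."""

    def __init__(self,
                 correlation: Callable[[List[Query]], float],
                 coverage_anomaly: Callable[[Set[Hashable]], float],
                 corr_threshold: float,
                 cov_threshold: float) -> None:
        self.correlation = correlation            # Heng: query correlation observer
        self.coverage_anomaly = coverage_anomaly  # Ha: result coverage monitor
        self.corr_threshold = corr_threshold
        self.cov_threshold = cov_threshold
        self.queries: Dict[str, List[Query]] = defaultdict(list)
        self.accessed: Dict[str, Set[Hashable]] = defaultdict(set)

    def observe(self, session: str, query: Query, result_ids: Iterable[Hashable]) -> None:
        """Record one query and the ids of the records its result returned."""
        self.queries[session].append(query)
        self.accessed[session].update(result_ids)

    def is_suspicious(self, session: str) -> bool:
        # Assumed combination rule: weakly correlated (diverse) AND anomalous (broad).
        diverse = self.correlation(self.queries[session]) < self.corr_threshold
        broad = self.coverage_anomaly(self.accessed[session]) > self.cov_threshold
        return diverse and broad
```

Keeping the scorers pluggable lets the query-side and result-side signals be developed and trained separately, which mirrors the two-component split in the figure.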

Page 8

Heng: Query Correlation Observer

[Figure: Queries in a session that plans a trip to Chicago]

• Key idea
  • Frequent predicate value sets serve as indications of correlations among queries.
  • Intuitively, if a session has more frequent predicate value sets with higher support, and those predicate value sets are more similar to the queries, then the queries in this session are more correlated.
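The key idea can be illustrated with a small sketch. The slides do not give the exact correlation measure, so everything concrete here is an assumption: a query is a set of (attribute, value) predicates, frequent predicate value sets are found by brute-force subset counting with a hypothetical `min_support`, and the hypothetical `correlation_score` rewards each query for containing a frequent set, weighted by the set's relative support and by the fraction of the query's predicates that the set covers. Brute-force enumeration is only reasonable for short form queries; an Apriori-style miner would be the usual choice at scale.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Predicate = Tuple[str, object]  # (attribute, value), e.g. ("dest", "Chicago")
Query = FrozenSet[Predicate]    # a conjunctive form query


def frequent_predicate_sets(queries: List[Query], min_support: int = 2,
                            max_size: int = 3) -> Dict[FrozenSet[Predicate], int]:
    """Count each predicate subset (up to max_size) once per query that contains it,
    and keep the subsets contained in at least min_support queries."""
    counts: Counter = Counter()
    for q in queries:
        for size in range(1, min(max_size, len(q)) + 1):
            for subset in combinations(q, size):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def correlation_score(queries: List[Query], min_support: int = 2,
                      max_size: int = 3) -> float:
    """Hypothetical score in [0, 1]: for each query, take the best frequent set it
    contains, weighted by relative support and by how much of the query it covers."""
    if not queries:
        return 0.0
    frequent = frequent_predicate_sets(queries, min_support, max_size)
    n = len(queries)
    total = 0.0
    for q in queries:
        best = 0.0
        for s, support in frequent.items():
            if s <= q:  # the frequent set is part of this query
                best = max(best, (support / n) * (len(s) / len(q)))
        total += best
    return total / n
```

For example, hypothetical queries such as dest=Chicago with type=hotel and dest=Chicago with type=car share the frequent predicate dest=Chicago and score well above zero, whereas queries with no predicate value in common produce no frequent sets and score 0.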


Page 9

Ha: Result Coverage Monitor

• Key idea
  • Sort the multi-attribute data D in a total order that preserves locality, e.g., a z-curve.
  • Create a coverage bit vector (CBV) whose bits correspond to the data records in that total order.
  • Accessing a data record sets its bit.

• Training
  • Cluster CBVs to model different data access patterns.

[Figure: data records plotted over attributes x and y (axis ticks 0 to 3) in z-curve order, with an example CBV 1110110001000000]
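A sketch of the coverage bit vector pipeline, under assumptions not stated on the slide: records have two small non-negative integer attributes, the locality-preserving order is a Morton (z-curve) code that interleaves their bits, training uses plain k-means seeded with the first k vectors and squared Euclidean distance (between two 0/1 vectors this equals Hamming distance), and the hypothetical `coverage_anomaly` scores a session by its distance to the nearest learned cluster centroid. The slide says only that CBVs are clustered to model access patterns; the clustering algorithm, seeding, and distance here are illustrative choices.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Tuple[int, int]  # toy record: two small non-negative integer attributes (x, y)


def z_value(x: int, y: int, bits: int) -> int:
    """Morton (z-curve) code: bit i of x goes to position 2i, bit i of y to 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order(records: Sequence[Record], bits: int) -> List[Record]:
    """Sort records along the z-curve; a record's position is its bit index in a CBV."""
    return sorted(records, key=lambda r: z_value(r[0], r[1], bits))


def coverage_bit_vector(ordered: Sequence[Record], accessed: Iterable[Record]) -> List[int]:
    """Set bit i when the session accessed the i-th record in z-order."""
    position: Dict[Record, int] = {r: i for i, r in enumerate(ordered)}
    cbv = [0] * len(ordered)
    for r in accessed:
        cbv[position[r]] = 1
    return cbv


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors: List[List[int]], k: int, iters: int = 20) -> List[List[float]]:
    """Plain k-means on training CBVs (requires len(vectors) >= k)."""
    centroids = [list(map(float, v)) for v in vectors[:k]]
    for _ in range(iters):
        groups: List[List[List[int]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            groups[nearest].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def coverage_anomaly(cbv: Sequence[int], centroids: List[List[float]]) -> float:
    """Distance from a session's CBV to the closest learned access pattern."""
    return min(sq_dist(cbv, c) for c in centroids)
```

On a 4×4 grid, for instance, training on sessions that each touch a few neighboring cells and then scoring a session whose accesses are spread over the whole grid gives the spread-out session a clearly larger distance, which is the broadness signal.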

Page 10

Experiment

• Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset.

• Synthesized 1,000 attack sessions [Madh08, Dasg09].

• Ran on a server with a 2.4 GHz Intel CPU, 3 GB RAM, and the FC 8 OS.

• Performed four-fold cross-validation.


[Figures: Effectiveness of Detection in Four Validations; Efficiency of Detection in Four Validations]

Page 11

Conclusion & Future Work

• Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e., crawling attacks and sampling attacks.

• Detection is based on identifying attackers’ distinctive search behaviors at the single-session level: diversity -> query correlation observer, broadness -> result coverage monitor.

• Detecting cross-session data harvesting attacks is left for future work.


Page 12

References

• [Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.

• [Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.

• [Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.

• [Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.

• [Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.


Page 13

Thanks for Listening
