Distributing Data for Secure Data Services Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia Molina, Rajeev Motwani April 8th, 2011 Stanford, TRDDC, TRUST


Page 1:

Distributing Data for Secure Data Services

Vignesh Ganapathy, Dilys Thomas, Tomas Feder,

Hector Garcia Molina, Rajeev Motwani

April 8th, 2011

Stanford, TRDDC, TRUST

Page 2:

RoadMap

Motivation for Secure Databases

Column level distribution

Encryption, Distribution

Privacy constraints

Set cover initialization

Query Mediation

Cost estimation

Where and Select clause processing

Query decomposition

Experiments

Related Work

Page 3:

Motivation 1: Data Privacy in Enterprises

• Health: Personal medical details, Disease history, Clinical research data
• Banking: Bank statement, Loan details, Transaction history
• Finance: Portfolio information, Credit history, Transaction records, Investment details
• Insurance: Claims records, Accident history, Policy details
• Outsourcing: Customer data for testing, Remote DB administration, BPO & KPO
• Retail Business: Inventory records, Individual credit card details, Audits
• Manufacturing: Process details, Blueprints, Production data
• Govt. Agencies: Census records, Economic surveys, Hospital records

Page 4:

Motivation 2: Government Regulations

Country: Privacy Legislation

• Australia: Privacy Amendment Act of 2000
• European Union: Personal Data Protection Directive 1998
• Hong Kong: Personal Data (Privacy) Ordinance of 1995
• United Kingdom: Data Protection Act of 1998
• United States: Security Breach Information Act (S.B. 1386) of 2002; Gramm-Leach-Bliley Act of 1999; Health Insurance Portability and Accountability Act of 1996

Page 5:

Motivation 3: Personal Information

• Emails
• Searches on Google/Yahoo
• Profiles on Social Networking sites
• Passwords / Credit Card / Personal information at multiple E-commerce sites / Organizations
• Documents on the Computer / Network

Page 6:

Losses due to Lack of Privacy: ID-Theft

• 3% of households in the US affected by ID-Theft

• US $5-50B losses/year

• UK £1.7B losses/year

• AUS $1-4B losses/year

Page 7:

Data Privacy

• Value disclosure: What is the value of attribute salary of person X?
– Perturbation
– Privacy Preserving OLAP
• Identity disclosure: Is an individual present in the database table?
– Randomization, K-Anonymity, etc.
– Data for Outsourcing / Research
• Linkage disclosure: Linking columns from multiple sites

Page 8:

RoadMap

Motivation for Secure Databases

Column level distribution

Encryption, Distribution

Privacy constraints

Set cover initialization

Query Mediation

Cost estimation

Where and Select clause processing

Query decomposition

Experiments

Related Work

Page 9:

Masketeer: A tool for data privacy

Lodha, Patwardhan, Roy, Sundaram et al.

Page 10:

Two Can Keep a Secret: A Distributed Architecture for Secure Database Services

Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi, Motwani, Srivastava, Thomas, Xu

CIDR 2005

How to distribute data across multiple sites for (1) redundancy and (2) privacy, so that a single site being compromised does not lead to data loss

Page 11:

Motivation

• Data outsourcing growing in popularity
– Cheap, reliable data storage and management
   • 1TB $399 < $0.5 per GB
   • $5000 – Oracle 10g / SQL Server
   • $68k/year DBAdmin
• Privacy concerns looming ever larger
– High-profile thefts (often insiders)
   • UCLA lost 900k records
   • Berkeley lost laptop with sensitive information
   • Acxiom, JP Morgan, Choicepoint
   • www.privacyrights.org

Page 12:

Present solutions

Application level: Salesforce.com

On-Demand Customer Relationship Management

$65/User/Month ---- $995 / 5 Users / 1 Year

Amazon Elastic Compute Cloud

1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth

Elastic, Completely controlled, Reliable, Secure

$0.10 per instance hour

$0.20 per GB of data in/out of Amazon

$0.15 per GB-Month of Amazon S3 storage used

Google Apps for your domain

Small businesses, Enterprise, School, Family or Group

Page 13:

Encryption Based Solution

[Figure: the Client encrypts its data and stores it at the DSP. A Client-side Processor turns Query Q into Q’ for the DSP, receives the “Relevant Data”, and computes the Answer.]

Problem: Q’ ≈ “SELECT *”

Page 14:

The Power of Two

[Figure: a Client connected to two data service providers, DSP1 and DSP2.]

Page 15:

The Power of Two

[Figure: a Client-side Processor splits Query Q into Q1 for DSP1 and Q2 for DSP2.]

Key: Ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)

Page 16:

SB1386 Privacy

{Name, SSN},
{Name, LicenceNo},
{Name, CaliforniaID},
{Name, AccountNumber},
{Name, CreditCardNo, SecurityCode}

are all to be kept private.

A set is private if at least one of its elements is “hidden”.

An element in encrypted form is OK.

Page 17:

Techniques

• Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN}: R1 stores Name, R2 stores SSN
– Use tuple IDs for reassembly: R = R1 JOIN R2
• Encoding
– One-time Pad: for each value v, construct a random bit sequence r; R1 stores v XOR r, R2 stores r
– Deterministic Encryption: R1 stores E_K(v), R2 stores K; can detect equality and push selections with equality predicates
– Random addition: R1 stores v + r, R2 stores r; can push aggregate SUM
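The one-time pad and random addition encodings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and doing the random addition modulo 2^64 is an assumption (the slide only says v + r). Deterministic encryption is left out because it needs a real cipher.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 64  # assumption: shares live in the integers modulo 2^64


def otp_split(value: bytes) -> Tuple[bytes, bytes]:
    """One-time pad: R1 stores v XOR r, R2 stores the random pad r."""
    pad = secrets.token_bytes(len(value))
    masked = bytes(v ^ r for v, r in zip(value, pad))
    return masked, pad


def otp_join(masked: bytes, pad: bytes) -> bytes:
    """Client-side reassembly: (v XOR r) XOR r = v."""
    return bytes(m ^ r for m, r in zip(masked, pad))


def additive_split(value: int) -> Tuple[int, int]:
    """Random addition: R1 stores v + r, R2 stores r."""
    r = secrets.randbelow(MODULUS)
    return (value + r) % MODULUS, r


def pushed_sum(r1_shares: List[int], r2_shares: List[int]) -> int:
    """Each server sums its own column; the client subtracts the two totals."""
    return (sum(r1_shares) - sum(r2_shares)) % MODULUS


# Usage with made-up salaries: SUM is computed without either server seeing a salary.
salaries = [50000, 72000, 61000]
shares = [additive_split(s) for s in salaries]
total = pushed_sum([a for a, _ in shares], [b for _, b in shares])
```

Neither share alone reveals v, which is why an attribute encoded this way counts as hidden at both sites.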

Page 18:

Example

An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}

Privacy Constraints

{Telephone}, {Email}

{Name, Salary}, {Name, Position}, {Name, DoB}

{DoB, Gender, ZipCode}

{Position, Salary}, {Salary, DoB}

Will use just Vertical Fragmentation and Encoding.

Decomposed Schema

R1: {TID, Name, Email, Telephone, Gender, Salary}

R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode}

Encrypted Attributes E: {Telephone, Email, Name}
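The decomposition above can be checked mechanically against the privacy constraints. The sketch below assumes the rule from the SB1386 slide: a constraint is violated only if all of its attributes are readable in the clear at a single site. The function name `violated_constraints` is hypothetical.

```python
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Telephone", "Email", "Name"}


def violated_constraints(constraints, fragments, encrypted):
    """Return the constraints whose attributes are all in the clear at one site."""
    violated = []
    for constraint in constraints:
        for fragment in fragments:
            clear = fragment - encrypted  # attributes this site can read
            if constraint <= clear:
                violated.append(constraint)
                break
    return violated


# The slide's decomposition satisfies every constraint.
ok = violated_constraints(CONSTRAINTS, [R1, R2], ENCRYPTED)

# Leaving Name unencrypted exposes {Name, Salary} at R1, among others.
broken = violated_constraints(CONSTRAINTS, [R1, R2], {"Telephone", "Email"})
```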

Page 19:

Partitioning, Execution

• Partitioning Problem
– Partition to minimize communication cost for given workload
– Even simplified version hard to approximate
– Hill Climbing algorithm after starting with weighted set cover
• Query Reformulation and Execution
– Consider only centralized plans
– Algorithm to partition select and where clause predicates between the two partitions

Page 20:

Set Cover + Greedy for Partitioning
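The slides do not spell out how the set cover instance is built. One plausible reading, which is an assumption here, treats each privacy constraint as an element to cover and each attribute as a set covering the constraints it appears in; encrypting a chosen set of attributes that covers every constraint gives a valid, if costly, starting point for hill climbing. The sketch uses the standard greedy heuristic for weighted set cover; the unit weights and the name `greedy_weighted_cover` are hypothetical.

```python
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
# Hypothetical per-attribute encryption cost; a real system would derive it from the workload.
WEIGHT = {attr: 1.0 for constraint in CONSTRAINTS for attr in constraint}


def greedy_weighted_cover(constraints, weight):
    """Pick attributes until every (non-empty) constraint contains a chosen one."""
    uncovered = set(range(len(constraints)))
    chosen = []
    while uncovered:
        def cost_per_gain(attr):
            gain = sum(1 for i in uncovered if attr in constraints[i])
            return weight[attr] / gain

        # Only attributes of still-uncovered constraints are candidates, so gain >= 1.
        candidates = sorted({a for i in uncovered for a in constraints[i]})
        best = min(candidates, key=cost_per_gain)
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in constraints[i]}
    return chosen


initial_encrypted = greedy_weighted_cover(CONSTRAINTS, WEIGHT)
```

A hill-climbing step would then try moving single attributes between "encrypted", "R1 only", and "R2 only", keeping a move when it lowers the estimated workload cost without violating a constraint.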

Page 21:

RoadMap

Motivation for Secure Databases

Column level distribution

Encryption, Distribution

Privacy constraints

Set cover initialization

Query Mediation

Cost estimation

Where and Select clause processing

Query decomposition

Experiments

Related Work

Page 22:

Cost Estimation

Page 23:

State Definitions

• 0: condition clause cannot be pushed to either server
• 1: condition clause can be pushed to Server 1
• 2: condition clause can be pushed to Server 2
• 3: condition clause can be pushed to both servers
• 4: condition clause can be pushed to either server

Page 24:

OR State Evaluation

Page 25:

AND State Evaluation

Page 26:

Query Partitioning

Original Query

SELECT Name, DoB, Salary
FROM R
WHERE (Name = 'Tom' AND Position = 'Staff')
  AND (Zipcode = '94305' OR Salary > 60000)

R1: {TID, Name, Email, Telephone, Gender, Salary}

R2: {TID, Name, Email, Telephone, DoB, Position, Zipcode}

• Query 1:

SELECT TID, Name, Salary
FROM R1
WHERE Name = 'Tom'

• Query 2:

SELECT TID, DoB, Zipcode
FROM R2
WHERE Position = 'Staff'
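The split of the WHERE clause in this example can be reproduced with a small sketch. It is not the authors' algorithm and does not use the five-state encoding: it flattens the top-level AND, pushes a conjunct to a server that stores all of its attributes, and leaves the rest for the client. Two assumptions are made: ties go to the lower-numbered server, and the encoding of an attribute is taken to support the predicate (true for equality on a deterministically encrypted Name). The tuple-based predicate tree is hypothetical.

```python
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "Zipcode"}
FRAGMENTS = {1: R1, 2: R2}

# Leaves are ("pred", attribute, text); inner nodes are ("and" | "or", left, right).
QUERY = ("and",
         ("and", ("pred", "Name", "Name='Tom'"),
                 ("pred", "Position", "Position='Staff'")),
         ("or", ("pred", "Zipcode", "Zipcode='94305'"),
                ("pred", "Salary", "Salary>60000")))


def attributes(node):
    """All attributes mentioned anywhere under this node."""
    if node[0] == "pred":
        return {node[1]}
    return attributes(node[1]) | attributes(node[2])


def conjuncts(node):
    """Flatten nested ANDs into a list of clauses that must all hold."""
    if node[0] == "and":
        return conjuncts(node[1]) + conjuncts(node[2])
    return [node]


def render(node):
    if node[0] == "pred":
        return node[2]
    return "(%s %s %s)" % (render(node[1]), node[0].upper(), render(node[2]))


def partition_where(query, fragments):
    pushed = {server: [] for server in fragments}
    client = []
    for clause in conjuncts(query):
        homes = [s for s in sorted(fragments) if attributes(clause) <= fragments[s]]
        if homes:
            pushed[homes[0]].append(render(clause))  # assumption: lowest server wins ties
        else:
            client.append(render(clause))  # needs columns from both servers
    return pushed, client


pushed, client = partition_where(QUERY, FRAGMENTS)
```

The OR clause stays at the client because Zipcode lives only in R2 and Salary only in R1; that is also why Query 1 and Query 2 return Salary and Zipcode.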

Page 27:

Distributed Query Plan
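One possible plan for the example query from the Query Partitioning slide is sketched below: each server runs its sub-query, and the client joins the results on TID, applies the remaining OR predicate, and projects the requested columns. The rows are made up for illustration and values are shown in the clear for readability; the function names are hypothetical.

```python
# Hypothetical rows; not data from the talk.
R1_ROWS = [
    {"TID": 1, "Name": "Tom", "Salary": 70000},
    {"TID": 2, "Name": "Tom", "Salary": 50000},
    {"TID": 3, "Name": "Ann", "Salary": 90000},
    {"TID": 4, "Name": "Tom", "Salary": 40000},
]
R2_ROWS = [
    {"TID": 1, "DoB": "1980-01-01", "Position": "Staff", "Zipcode": "10001"},
    {"TID": 2, "DoB": "1975-05-05", "Position": "Staff", "Zipcode": "94305"},
    {"TID": 3, "DoB": "1990-09-09", "Position": "Staff", "Zipcode": "94305"},
    {"TID": 4, "DoB": "1985-03-03", "Position": "Staff", "Zipcode": "10001"},
]


def query1(rows):
    """Runs at server 1: SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'."""
    return [{k: r[k] for k in ("TID", "Name", "Salary")}
            for r in rows if r["Name"] == "Tom"]


def query2(rows):
    """Runs at server 2: SELECT TID, DoB, Zipcode FROM R2 WHERE Position = 'Staff'."""
    return [{k: r[k] for k in ("TID", "DoB", "Zipcode")}
            for r in rows if r["Position"] == "Staff"]


def client_plan(r1_rows, r2_rows):
    """Client side: join on TID, apply the leftover OR predicate, project."""
    left_by_tid = {row["TID"]: row for row in query1(r1_rows)}
    answer = []
    for right in query2(r2_rows):
        left = left_by_tid.get(right["TID"])
        if left is None:
            continue  # tuple was filtered out at server 1
        if right["Zipcode"] == "94305" or left["Salary"] > 60000:
            answer.append({"Name": left["Name"], "DoB": right["DoB"],
                           "Salary": left["Salary"]})
    return answer


answer = client_plan(R1_ROWS, R2_ROWS)
```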

Page 28:

RoadMap

Motivation for Secure Databases

Column level distribution

Encryption, Distribution

Privacy constraints

Set cover initialization

Query Mediation

Cost estimation

Where and Select clause processing

Query decomposition

Experiments

Related Work

Page 29:

Performance Gain Experiment

Page 30:

Iterations vs. Privacy Constraints

Page 31:

Acknowledgements: Collaborators

Stanford Privacy Group

TRDDC Privacy Group

PORTIA, TRUST, Google

Page 32:

March 18, 2011

Backup Slides