
Post on 11-Jan-2016


Secure Database System

Introduction

• Database-as-a-Service is gaining popularity
  – Amazon Relational Database Service (RDS)
  – Microsoft SQL Azure

[Figure: the user sends a query to the service provider (SP), which hosts the DB and returns the answer.]

Security concerns

• Security now relies solely on the SP
  – Will the SP inspect the user's sensitive data and use it without authorization?
  – Will the SP protect the user's data as strictly as it protects its own?
• The user can encrypt its own data to protect it, but then
  – How can queries be computed on the encrypted database?

Encrypt-Decrypt-Query (EDQ) model

• Baseline solution (but can handle all queries)

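[Figure: the service provider (SP) stores the encrypted DB; the user retrieves it, decrypts it, and evaluates the query locally to obtain the answer.]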

Weakness of EDQ model

• All query computation is in fact done by the user
  – High processing cost (a large portion of the database must be decrypted)
  – High communication cost
• The SP does no real work; it acts only as remote storage, and its processing power goes unused

Encrypt-Query-Decrypt (EQD) model

• More suitable to cloud environment

[Figure: the user sends a query to the service provider (SP); the SP evaluates it on the encrypted DB and returns the answer to the user.]

Strengths of EQD model

• The answer is supposed to be much smaller than the entire database
  – Lower communication cost
  – Lower processing cost at the user
• Challenge:
  – How to compute a query on an encrypted database?

Single EQD method approach

• A standalone encryption scheme is developed to address one particular query pattern
• Examples:
  – Order-preserving encryption scheme (OPES) supports comparison ($E(x) > E(y)$ iff $x > y$)
  – RSA supports multiplication ($E(x)E(y) = E(xy)$)
• Problem
  – A specific encryption scheme has to be researched and designed for each application!
    • A new encryption scheme is needed to support WHERE X+Y > q
    • Another new encryption scheme is needed to support WHERE XY > q
    • …
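The multiplicative property of RSA quoted above can be checked with a few lines of Python. This is only a sketch using textbook (unpadded) RSA; the toy primes, the exponent `e`, and the helper name `encrypt` are illustrative assumptions, not values from the slides.

```python
# Textbook (unpadded) RSA with toy parameters, for illustration only.
# p, q, e and the helper name `encrypt` are assumptions made for this sketch.
p, q = 61, 53
n = p * q          # public modulus
e = 17             # public exponent

def encrypt(x):
    """Return E(x) = x^e mod n."""
    return pow(x, e, n)

x, y = 7, 12
# Multiplying two ciphertexts gives the ciphertext of the product.
product_of_ciphertexts = (encrypt(x) * encrypt(y)) % n
ciphertext_of_product = encrypt((x * y) % n)
print(product_of_ciphertexts == ciphertext_of_product)
```

The same ciphertexts carry no usable order information, which is why a comparison such as WHERE XY > q still needs a different scheme under this approach.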

Building database system based on single EQD method approach

• Example systems:
  – ODB model (NetDB2 with encryption) [SIGMOD 02, ICDE 02]
  – CryptDB [Commun. ACM 12]
• Limitations
  – Cannot support complex queries
    • A new encryption method has to be developed to support each query pattern

Extensibility of the encryption system

• Each method (e.g., OPES, RSA) has its own encryption mechanism, and values encrypted by different methods are not interoperable
  – The following query cannot be supported:
    • SELECT * WHERE price * quantity > 1000
  – Attempt: first compute price * quantity (can be done by RSA)
    • The output is encrypted by RSA, so it cannot be used by OPES for the comparison (not the same encryption)

How to achieve extensibility?

• Relational algebra
  – A few primitives are enough to build any query
• Observation
  – Data interoperability (aka data interchangeability): the result of one primitive operator can be used as input by other primitive operators
  – Complex functions can be built as compositions of primitive operators

To enforce data interoperability

• There is only one encrypted data format
• All operations operate on this format
• A similar secure mechanism with data interoperability
  – Secure multiparty computation (SMC) with secret sharing (example: ShareMind)
    • Each data item is split into shares that are distributed to multiple parties. The parties execute a distributed algorithm, which gives the result in shared form.

Illustration of SMC + secret sharing

Plain values: x = 10, y = 5
Note: 10 = 3 + 2 + 5 and 5 = 8 + 4 + (-7)

Secret sharing: each party holds one share of x and one share of y

Party   x share   y share
1       3         8
2       2         4
3       5         -7

SMC algorithms: after some communications, each party holds one share of z

Party   z share
1       13
2       6
3       6

Plain values: $z = (x - y)^2 = 25$, and 25 = 13 + 6 + 6
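The share values in this illustration can be checked with a short Python sketch. It only reconstructs the values and shows that subtraction can be done locally on shares; the multiplication protocol that produces the z shares needs communication between the parties and is not modelled here. The helper name `reconstruct` is an assumption for this sketch.

```python
# Additive secret sharing with the share values from the illustration.
def reconstruct(shares):
    """A secret is the sum of the shares held by all parties."""
    return sum(shares)

x_shares = [3, 2, 5]      # held by parties 1, 2, 3
y_shares = [8, 4, -7]
z_shares = [13, 6, 6]     # shares each party holds after the SMC protocol

x = reconstruct(x_shares)
y = reconstruct(y_shares)

# Addition and subtraction need no communication:
# each party subtracts its own two shares.
diff_shares = [xs - ys for xs, ys in zip(x_shares, y_shares)]

print(x, y, reconstruct(diff_shares), reconstruct(z_shares))
```

No single party learns x, y, or z from its own shares; only the sum over all parties reveals a value.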

Generic operations in SMC

• Basic:
  – Addition
  – Multiplication
• Any operation that can be expressed as a circuit can be computed
  – Addition on binary data can be regarded as an XOR gate
  – Multiplication on binary data can be regarded as an AND gate
  – Together with the constant 1, the two gates form a universal set that can express any circuit

Weakness of SMC

• Requires multiple non-colluding service providers
  – Higher cost to the user due to more SPs
  – The assumption of non-colluding parties is hard to realize in practice

Using the idea of SMC + secret sharing on encrypted database?

• Multiple parties vs. client-server
• Same storage size (= original database size) for all parties
  – Secure share generation reduces the storage cost at the user

[Figure: the data owner / user and the cloud server take the roles of the parties.]

Development of new operators

• Why? SMC and a secure database system differ:
  – Parties: in SMC, operations are done between multiple parties; in a secure database system, operations are done between the user and the service provider (SP)
  – Privilege: in SMC, there is no privileged party; in a secure database system, the user is privileged, can observe any plain data, and should always have a low cost in any computation
  – Shares: in SMC, the shares are materialized at each party; in a secure database system, the shares at the user are not materialized but can be generated
• Our goal:
  – To develop (i) a secure share generator with (ii) its corresponding operators

Attack model

• Security is defined w.r.t. an attack model
• Chosen-plaintext attack (CPA)
  – Formally: an attacker can observe the ciphertext of any chosen plaintext, but it is still computationally hard to recover the key
• Some remarks on CPA
  – CPA is also the attack model used for RSA
  – OPES cannot guard against CPA

DESCRIPTION OF ENCRYPTION MECHANISM

Encryption procedure

• Secret sharing
  – Multiplicative secret sharing
  – Given a plain value $v$, the share at the user is $v_k$ and the share at the SP is $v'$
    • $v = v_k v' \bmod n$ ($n$ is a parameter of the share-generating function)
• The share at the user, $v_k$, is called the item key of the value $v$
  – The item key of each cell in the table is different
  – Each item key can be identified by its row and column
• The encrypted value $v'$ is stored at the SP

Encryption illustration

n = 35

Plain data
Row   A    B
1     2    3
2     4    1

Item keys at user
Row   A    B
1     8    9
2     16   11

Encrypted values at SP
Row   A    B
1     9    12
2     9    16

Number of item keys = number of values in the table
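The three tables of this illustration are related by $v = v_k v' \bmod n$. A minimal Python sketch (helper names `encrypt` and `decrypt` are assumptions; `pow(value, -1, n)` needs Python 3.8 or later) recomputes the encrypted values from the plain data and the item keys:

```python
# Multiplicative secret sharing with n = 35 and the values from the illustration.
n = 35
plain = {1: {"A": 2, "B": 3}, 2: {"A": 4, "B": 1}}
item_keys = {1: {"A": 8, "B": 9}, 2: {"A": 16, "B": 11}}

def encrypt(value, key):
    """SP share v' such that value = key * v' mod n (key must be invertible mod n)."""
    return (value * pow(key, -1, n)) % n

def decrypt(encrypted_value, key):
    """Recombine the user's share (item key) with the SP's share."""
    return (key * encrypted_value) % n

encrypted = {
    row: {col: encrypt(plain[row][col], item_keys[row][col]) for col in plain[row]}
    for row in plain
}
decrypted = {
    row: {col: decrypt(encrypted[row][col], item_keys[row][col]) for col in plain[row]}
    for row in plain
}
print(encrypted)
```

Storing one item key per cell would cost the user as much space as the plain table, which motivates generating the keys on demand.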

Secure item key generator

• We extend RSA as our generator
• Each column has a column key <m, x> (private values)
• Each row has a row ID r (public value)
• Item key: $m g^{xr} \bmod n$
  – g: system parameter, chosen by the user, can be public
  – n: the system parameter generated as in RSA; n is a composite number with two big prime factors
    • n is public
  – m, x, r are non-zero random values < n
    • Note: n is at least a 1024-bit value

Secure item key generator

• Item key: $m g^{xr} \bmod n$
• Example: column key <1, 1>, g = 2, n = 35
  – The column key is essentially a single parameter $y = g^x$; it is kept in this form so as to support updates

User's storage: column key <1, 1>

Row ID   Plain value   Item key (generated by user)   Encrypted value (SP's storage)
1        18            2                              9
2        20            4                              5
3        26            8                              12
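The example can be reproduced with a short Python sketch of the generator $m g^{xr} \bmod n$. The function names are assumptions made for illustration, and the toy modulus n = 35 is of course far below the 1024-bit size required in practice.

```python
# Item key generator m * g^(x*r) mod n with the toy parameters of the example.
g, n = 2, 35

def item_key(column_key, row_id):
    """Generate the item key of one cell from its column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n

def encrypt(value, column_key, row_id):
    """Value stored at the SP: plain value times the inverse item key (Python 3.8+)."""
    return (value * pow(item_key(column_key, row_id), -1, n)) % n

column_key = (1, 1)                 # the only thing the user has to store
plain = {1: 18, 2: 20, 3: 26}

keys = {r: item_key(column_key, r) for r in plain}
stored_at_sp = {r: encrypt(v, column_key, r) for r, v in plain.items()}
print(keys, stored_at_sp)
```

The user keeps one column key instead of one item key per row; any item key can be regenerated from the public row ID when needed.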

Security of our item key generator

• Our generating function extends the RSA function
  – Ours: $m g^{xr} \bmod n$ ($r$, $n$ are public; $g^x$ is private)
  – RSA: $y^e \bmod n$ ($e$, $n$ are public; $y$ is private)
• With $m = 1$, $y = g^x$, and $e = r$, the two functions are equivalent
• Conclusion:
  – Our encryption is secure w.r.t. CPA

PRIMITIVE OPERATORS

General Procedure

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP
Row   A    B
1     9    12
2     9    16

• The user interacts with the SP to compute the answer; the cost at the user must be low
• The result is always a new column C: the user holds its column key <m, x>, and the SP holds its encrypted values (y and z for rows 1 and 2)
• Note: the result C is interoperable with the encrypted columns A and B, e.g., B+C can be computed by a further addition

Overview of primitive operators

• Operations between columns
  – Multiplication (SELECT A * B)
  – Addition (SELECT A + B)
  – We will show that these two are enough to support generic function evaluation
    • Note: the above operations assume both inputs are encrypted
  – We are also interested in operations between plain and encrypted columns
    • Non-sensitive columns should not be encrypted
  – Special case: one of the operands is a constant
    • Encrypt-constant operation (SELECT 10 * A)

List of basic primitive operators

• Encrypt-encrypt multiplication
• Encrypt-encrypt addition
• Encrypt-constant multiplication
• Encrypt-constant addition
• Encrypt-plain column multiplication
• Encrypt-plain column addition
• Necessary operations (to support some of the above operators)
  – Power
  – Key shuffling

Illustration 1: Encrypt-encrypt multiplication

• C = AB (SELECT A*B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of $a$)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of $b$)
• $c = ab = (a_k b_k)(a' b') \bmod n$
  – The product $a' b'$ can be computed by the SP
  – Item keys are not materialized at the user; the user operates at the column key level

g = 2, n = 35
Table schema and column keys at user: A<4, 1>, B<1, 3>

Row   Plain A   Plain B   Encrypted A (SP)   Encrypted B (SP)   Encrypted C (SP)
1     2         3         9                  31                 34
2     4         1         9                  29                 16

Encrypt-encrypt multiplication

Table schema and column keys at user: A<4, 1>, B<1, 3>

Item key table
Row   A                    B                           C
…     …                    …                           …
r     $4 g^{r} \bmod 35$   $1 \cdot g^{3r} \bmod 35$   $(4 \cdot 1)(g^{1+3})^{r} \bmod 35$
…     …                    …                           …

Resulting column key: C<4, 4>

Encrypt-encrypt multiplication - Result

g = 2, n = 35
Table schema and column keys at user: A<4, 1>, B<1, 3>, and the result C<4, 4>

Row   Plain A   Plain B   Encrypted A (SP)   Encrypted B (SP)   Encrypted C (SP)   Item key of C (user)   Answer C = AB
1     2         3         9                  31                 34                 29                     6
2     4         1         9                  29                 16                 9                      4

Security: no information about the item keys of A and B is sent to the SP
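The multiplication example can be replayed in Python. This is a sketch under the slide's toy parameters; the function and variable names are assumptions for illustration. The SP multiplies encrypted values row by row, while the user only combines the two column keys.

```python
# Encrypt-encrypt multiplication C = A * B with g = 2, n = 35.
g, n = 2, 35

def item_key(column_key, row_id):
    """Item key m * g^(x*r) mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n

# User side: column keys only.
key_a, key_b = (4, 1), (1, 3)
# SP side: encrypted values per row ID.
enc_a = {1: 9, 2: 9}
enc_b = {1: 31, 2: 29}

# SP: multiply encrypted values; no key material is needed.
enc_c = {r: (enc_a[r] * enc_b[r]) % n for r in enc_a}

# User: the column key of C is <m_a * m_b, x_a + x_b>.
key_c = ((key_a[0] * key_b[0]) % n, key_a[1] + key_b[1])

# Decrypting C with the derived key recovers the plain products.
answer = {r: (item_key(key_c, r) * enc_c[r]) % n for r in enc_c}
print(key_c, enc_c, answer)
```

The user's work is constant per column (one key combination), independent of the number of rows.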

Illustration 2: Encrypt-encrypt addition

• C = A+B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of $a$)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of $b$)
• $c = a + b = (a_k a') + (b_k b') \bmod n$
• We must combine $a_k$ and $a'$ to compute the addition, but $a_k$ is not materialized (it is generated from A's key)
  – Send A's key to the SP in a protected way

Table schema and column keys at user: A<4, 1>, B<1, 3>

Row   Plain A   Plain B   Encrypted A (SP)   Encrypted B (SP)
1     2         3         9                  31
2     4         1         9                  29

Encrypt-encrypt addition

• C = A+B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of $a$)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of $b$)
• $c = a + b = (a_k a') + (b_k b') \bmod n$
• In the end, $c$ should also be encrypted like other values, i.e., $c = c_k c' \bmod n$
  – $c_k c' = (a_k a') + (b_k b') \bmod n$
  – $c' = (c_k^{-1} a_k) a' + (c_k^{-1} b_k) b' \bmod n$
• $c_k$ can be abstracted by C's column key; the user generates C's key randomly
• The remaining problem is to help the SP compute $c'$
  – The user prepares the two parts $c_k^{-1} a_k$ and $c_k^{-1} b_k$. Item keys are not there yet, but they can be abstracted at the column key level
• With C<$m_c$, $x_c$> and A<$m_a$, $x_a$>, at row r:
  – $c_k = m_c g^{x_c r} \bmod n$
  – $c_k^{-1} = m_c^{-1} ((g^{x_c})^{-1})^{r} \bmod n$
  – $a_k = m_a g^{x_a r} \bmod n$
  – $c_k^{-1} a_k = m_c^{-1} m_a ((g^{x_c})^{-1} g^{x_a})^{r} \bmod n$
  – => $[m_c^{-1} m_a,\ x_a - x_c]$

Example

g = 2, n = 35
Table schema and column keys at user: A<4, 1>, B<1, 3>

Row   Plain A   Plain B   Encrypted A (SP)   Encrypted B (SP)
1     2         3         9                  31
2     4         1         9                  29

1. First, generate C's key: C<3, 21>
2. Compute C's inverse: C^{-1} <12, 3>
3. Compute the hints for A and B: hint A [13, 16], hint B [12, 29]
4. The SP materializes the hints for every row

Row   Hint for A   Hint for B
1     33           33
2     3            12

5. The SP obtains the encrypted values of C

Row   Encrypted C
1     25
2     25

We obtain the correct answers if we look at the plain values:

Row   Item key of C (user)   Plain C = A + B
1     31                     5
2     17                     5
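The five steps of the addition example can be traced in Python. This is a sketch, not the authors' implementation: the function names are assumptions, and the hint is represented here as a pair (multiplier, base) so that the SP materializes the row-r hint as multiplier * base^r mod n, which is one way to realize the column-level abstraction described above. `pow(value, -1, n)` needs Python 3.8 or later.

```python
# Encrypt-encrypt addition C = A + B with g = 2, n = 35.
g, n = 2, 35

def inverse(value):
    """Modular inverse mod n."""
    return pow(value, -1, n)

def item_key(column_key, row_id):
    """Item key m * g^(x*r) mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n

def make_hint(source_key, target_key):
    """User side: column-level form of c_k^{-1} * a_k (assumed representation)."""
    m_s, x_s = source_key
    m_t, x_t = target_key
    multiplier = (inverse(m_t) * m_s) % n
    base = (pow(g, x_s, n) * inverse(pow(g, x_t, n))) % n
    return multiplier, base

def materialize(hint, row_id):
    """SP side: expand a hint into the factor for one row."""
    multiplier, base = hint
    return (multiplier * pow(base, row_id, n)) % n

key_a, key_b = (4, 1), (1, 3)
key_c = (3, 21)                      # step 1: chosen by the user
enc_a = {1: 9, 2: 9}
enc_b = {1: 31, 2: 29}

hint_a = make_hint(key_a, key_c)     # steps 2 and 3
hint_b = make_hint(key_b, key_c)
row_hints = {r: (materialize(hint_a, r), materialize(hint_b, r)) for r in enc_a}  # step 4

# Step 5: c' = hint_a(r) * a' + hint_b(r) * b' mod n, computed by the SP.
enc_c = {r: (row_hints[r][0] * enc_a[r] + row_hints[r][1] * enc_b[r]) % n for r in enc_a}

# User check: decrypting with C's item keys gives a + b.
plain_c = {r: (item_key(key_c, r) * enc_c[r]) % n for r in enc_c}
print(row_hints, enc_c, plain_c)
```

Because the result is encrypted under an ordinary column key, C can be fed directly into further multiplications or additions.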

Generic Encrypt-encrypt operations

• With addition and multiplication, we can compute any function that can be expressed as a circuit
• All data is in binary form
• It is sufficient to show that we can build a universal gate (e.g., a NAND gate) on top of binary data

Building NAND gate

• NAND(X, Y) = 1 - XY (multiplication and addition)
  – EE multiplication / EP multiplication (Z = XY)
  – EC addition (1 - Z)
• Any circuit can be expressed

X   Y   Result
0   0   1
0   1   1
1   0   1
1   1   0
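On plain bits, the construction 1 - XY and its universality can be sketched in Python as follows. The gate helpers built from `nand` are standard constructions added here for illustration; in the system itself the multiplication and the addition would be the encrypted operators.

```python
# NAND from one multiplication and one addition with a constant, on plain bits.
def nand(x, y):
    return 1 - x * y

# Other gates expressed with NAND only (standard constructions).
def not_(x):
    return nand(x, x)

def and_(x, y):
    return not_(nand(x, y))

def xor(x, y):
    t = nand(x, y)
    return nand(nand(x, t), nand(y, t))

truth_table = [(x, y, nand(x, y)) for x in (0, 1) for y in (0, 1)]
for row in truth_table:
    print(row)
```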

Extension operators

• We also develop operators to support the following operations efficiently
• Comparison
  – Example: Quantity * Price + One-TimeCost > 1M
• COUNT/SUM
  – Example: SELECT SUM(A+B) WHERE C > 30
• Join
  – Example: SELECT t1.A, t2.B WHERE t1.C = t2.C
• DELETE/UPDATE with predicates
  – Example: UPDATE T1 SET A = A*1.1 + B WHERE C = 3000

Indexing

• Processing each tuple by linear scan is feasible but slow
• Indexing is needed
• Note: an index is itself a compromise of security
  – If certain tuples are filtered out without any processing, the attacker can obtain some information about the data, e.g., the range in which the values lie

Domain partitioning index [VLDB 04]

• The domain is partitioned into regions

Row ID   Values   Partition ID
1        101      1
2        235      2
3        467      4

Query: SELECT … WHERE Values < 450
Rewritten query on the index: SELECT … WHERE Values->PartitionID <= 4
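The query rewriting can be sketched in Python. The slides do not state the partition boundaries; this sketch assumes equal-width partitions of width 100 (`PARTITION_WIDTH`), which happens to reproduce the partition IDs in the table, and all names are hypothetical.

```python
# Domain partitioning index, assuming equal-width partitions (an assumption;
# the actual boundaries are not given in the slides).
PARTITION_WIDTH = 100

def partition_id(value):
    return value // PARTITION_WIDTH

rows = {1: 101, 2: 235, 3: 467}                      # row ID -> plain value
index = {row_id: partition_id(v) for row_id, v in rows.items()}

def candidate_rows(index, upper_bound):
    """Rows whose partition may contain values below upper_bound."""
    max_partition = partition_id(upper_bound)
    return [row_id for row_id, pid in index.items() if pid <= max_partition]

# WHERE Values < 450 becomes WHERE PartitionID <= 4 on the index.
candidates = candidate_rows(index, 450)

# The index only prunes; candidates still need the exact comparison,
# which in the real system is done on encrypted values.
answers = [row_id for row_id in candidates if rows[row_id] < 450]
print(index, candidates, answers)
```

Row 3 (value 467) passes the index filter but fails the exact predicate, showing why the cryptographic comparison is still needed after index processing.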

Integration with existing DBMS

[Figure: architecture. User side: Applications and the SDB Client Layer (query, execution plan, secure operators). SP side: the SDB Server Layer (secure operators) on top of the DBMS. Arrow labels: SQL, Query, Result, Memory.]

• SDB Client Layer: a layer at the user; applications simply use the database service through SQL
• SDB Server Layer: wraps the DBMS layer and provides our operators
• DBMS: kept so that we enjoy its existing features, e.g., failure recovery
• The partition index is stored on the DBMS

Index processing on SDB

• First process the index and filter out all disqualified tuples
  – Index processing is done by the underlying DBMS
• Then use cryptographic operations to compute the actual answer
  – The operators are executed by the SDB layers
• (Encrypted) answers are sent to the user

Note on SQL in applications

• The only difference
  – The application has to state which columns require encryption
    • In CREATE TABLE
• Example:
  – CREATE TABLE Stud(ID, Name, Marks ENC)

Plain database
Row ID   ID   Name   Marks
1        …    …      …
2        …    …      …

Schema in underlying DBMS
Row ID   ID   Name   Marks   Marks_ind
1        …    …      …       …
2        …    …      …       …

Marks_ind holds the partition IDs; we create an index on it in the DBMS

Example

• SELECT C WHERE A * B + D > 20, i.e., A*B + D - 20 > 0
• Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
• Filter by index

Index columns
Row ID   A_ind   B_ind   C_ind   D_ind
1        …       …       …       …
2        …       …       …       …

Encrypted candidate tuples at SP
Row ID   A   B   C   D
105      …   …   …   …
278      …   …   …   …

Example

• SELECT C WHERE A * B + D > 20, i.e., A*B + D - 20 > 0
• Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
• Query execution plan (with corresponding parameters)
  – Column-column multiplication: E = AB, with new column key E<…>
  – Column-column addition: F = E + D - 20, with new column key F<…>
  – Comparison
  – Note: E and F can be thrown away by the user, since they are not needed in the result

Encrypted values at SP
Row ID   A   B   C   D
105      …   …   …   …
278      …   …   …   …

Example

• SELECT C WHERE A * B + D > 20
• Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
• The SP receives the query plan, executes it, and finds the answers

Row ID   Answer?
105      No
278      Yes
337      No
129      No
…        …

• Projection on C only; the encrypted answer is sent back to the user, and the row IDs must be included

Row ID   C
278      3
776      12
…        …

Example

• SELECT C WHERE A * B + D > 20
• Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
• The user computes its own item keys for the returned rows and decrypts

Row ID   Encrypted answer C   Item key (user)   Decrypted C
278      3                    9                 27
776      12                   9                 3
…        …                    …                 …

Experiment performance

• Comparison:
  – EDQ model
    • The SP filters the database by the index and sends all candidate tuples to the user; the user decrypts the tuples and computes the query itself
    • This method has to be used when the query is outside the range supported by existing approaches, e.g., ODB, CryptDB
  – Our system: SDB

Insertion performance

• Data encryption speed
• Table schema
  – (A, B, C)
    • Encrypted: A, B

Throughput (number of tuples per second)
DB size   100K    200K    300K    400K    500K
SDB       166.0   162.6   160.4   155.6   153.3
EDQ       335.5   343.0   328.9   333.7   342.8

Query performance

• SELECT A, B, C FROM test WHERE A + B < q
• Selectivity: 1%

[Figure: two plots of time (s) against database size (100K to 500K), titled "Cost at user" and "Cost at SP". One plot compares SDB and EDQ; the other shows SDB only.]

Query performance

• SELECT SUM(A) FROM test WHERE A + B < q
• Selectivity: 1%

[Figure: two plots of time (s) against database size (100K to 500K), titled "Cost at user" and "Cost at SP". One plot compares SDB and EDQ; the other shows SDB only.]

Query performance

• SELECT * FROM test AS t1, test AS t2 WHERE t1.A = t2.B AND t1.B < q1 AND t2.A < q2

[Figure: two plots of time (s) against database size (100K to 500K), titled "Cost at user" and "Cost at SP". One plot compares SDB and EDQ; the other shows SDB only.]

Conclusion

• We developed a new secure database system
  – Theoretically supports generic operations
  – Supports common database operations
    • Range query, aggregation function, delete/update, join
  – Our system is more efficient than the naïve EDQ approach
• Future work
  – Query plan optimization
  – Development of more operators, especially on text data
  – Index improvement
    • How to prepare/make use of the index for more complex queries?
