Post on 20-May-2020

UT DALLAS Erik Jonsson School of Engineering & Computer Science

FEARLESS engineering

SGX BigMatrix
A Practical Encrypted Data Analytic Framework with Trusted Processors

Fahad Shaon, Murat Kantarcioglu, Zhiqiang Lin, Latifur Khan

The University of Texas at Dallas

FEARLESS engineering 1 / 49

Problem - Secure Data Analytics on Cloud

[Figure: Code & Data are sent to the cloud; the Result is returned]

- We want to use the cloud environment for data analytics
- The service provider can observe the data
- This is problematic for sensitive data (e.g., medical or financial data)

Problem - Secure Data Analytics on Cloud

[Figure: Encrypted Code & Data are sent to the cloud; the Encrypted Result is returned]

- We outsource encrypted sensitive data
- However, encrypted data is difficult to analyze

Problem - Secure Data Analytics - Approaches

Homomorphic Encryption

- Theoretically robust and provides the highest level of security
- High computational cost
- Impractical for large-scale data processing

Trusted Hardware

- Cost effective
- Provides reasonable security
- Intel SGX is available in all new processors
- Needs careful consideration of side-channel attacks

Objective of the work

Create a data analytics platform utilizing trusted processors that is secure, practical, general purpose, and scalable.

State of the Art

ObliVM (Liu et al., 2015)

- Provides a language and converts the logic into circuits
- Difficult to perform analysis on large data sets

Oblivious Multi-party ML (Ohrimenko et al., 2016)

- Implements important machine learning algorithms using SGX
- Specific to a fixed set of algorithms

Opaque (Zheng et al., 2017)

- Oblivious and encrypted distributed analytics platform using Apache Spark and Intel SGX (mainly focused on supporting SQL)

Background - Intel SGX

- SGX stands for Software Guard Extensions
- SGX is a new Intel instruction set extension
- It allows us to create a secure compartment inside the processor, called an enclave
- Privileged software, such as the OS or hypervisor, can't directly observe data and computation inside the enclave

Background - Intel SGX - Attack Surface

- SGX essentially reduces the attack surface to the processor and the enclave code

[Figure: two stack diagrams (Apps, OS, VMM, Hardware) comparing the attack surface of a traditional computation system with the attack surface with SGX]

Background - Intel SGX Application

[Figure: an SGX application split into an untrusted part and a trusted part]

- We only trust the processor and the code inside the enclave (Intel, 2015)

Background - Intel SGX Impact

[Figure: Encrypted Code & Data are sent to an SGX server; the Encrypted Result is returned]

- We can outsource computation securely
- No need to trust the cloud provider (i.e., hypervisor, OS, cloud administrators)

Threat Model

[Figure: a server with memory, disk, and a processor containing the enclave; Code & Data go in, the Result comes out]

- The adversary can control the OS (i.e., memory, disk, networking)
- The adversary cannot tamper with the enclave code
- The adversary cannot observe CPU register contents

Challenges - Obliviousness

Challenge: Access Pattern Leakage

- SGX uses system memory, which is controlled by the adversary
- The adversary can observe memory accesses
- Memory access patterns reveal a lot about the data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

Solution

- To reduce information leakage, we ensure data obliviousness

Data Obliviousness - Example

- A data-oblivious program executes the same path for all inputs of the same size

Example: Non-oblivious swap method of Bitonic sort

    if (dir == (arr[i] > arr[j])) {
        int h = arr[i];
        arr[i] = arr[j];
        arr[j] = h;
    }

Data Obliviousness - Example (Cont.)

Example: Oblivious swap method of Bitonic sort

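    int x = arr[i];
    int y = arr[j];
    _asm {
        ...
        mov eax, x
        mov ebx, y
        mov ecx, dir
        cmp ebx, eax      ; compare y with x
        setg dl           ; dl = 1 if y > x (signed)
        xor edx, ecx      ; sets ZF from edx XOR dir
        mov eax, x        ; mov does not modify the flags
        mov ecx, y
        mov ebx, y
        mov edx, x
        cmovz eax, ecx    ; if ZF is set: eax = y
        cmovz ebx, edx    ; if ZF is set: ebx = x
        mov [x], eax      ; write back unconditionally
        mov [y], ebx
    }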
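The branch-free idea behind the oblivious swap can be sketched in Python. This is only an illustration of the logic, not the enclave code: it follows the condition of the non-oblivious version (swap when dir equals x > y), the name oblivious_swap is hypothetical, and Python itself gives no constant-time guarantees.

```python
def oblivious_swap(x, y, dir_flag):
    """Return (x, y), swapped when dir_flag == (x > y), without an if statement."""
    swap = int(dir_flag == (x > y))  # 1 when a swap is needed, else 0
    mask = -swap                     # 0 (no bits set) or -1 (all bits set)
    diff = (x ^ y) & mask            # x XOR y when swapping, otherwise 0
    # XOR-ing with diff exchanges the two values; XOR-ing with 0 leaves them as-is.
    return x ^ diff, y ^ diff


print(oblivious_swap(3, 1, 1))  # both values are always read and written
```

The same sequence of operations runs for every input, which is the property the cmovz instructions provide in the assembly version.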

Data Obliviousness - Challenges

Challenge

- Building a data-oblivious solution is non-trivial
- It requires a lot of time and effort

Solution

- We provide our own Python-inspired (NumPy, Pandas) language that ensures data obliviousness

Data Oblivious - Vectorization

- We removed if and emphasize vectorization

Example: Compute average income of people with age >= 50 (non-oblivious version)

    sum = 0, count = 0
    for i = 0 to Person.length:
        if Person[i].age >= 50:
            count++
            sum += Person[i].income
    print sum / count

Data Oblivious - Example

Example: Compute average income of people with age >= 50

    S = where(Person, "Person['age'] >= 50")
    print sum(S .* Person['income']) / sum(S)
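For comparison, the same vectorized computation can be written in NumPy. This is a sketch of the idea rather than BigMatrix code; the function name average_income_over_50 and the sample values are made up for illustration.

```python
import numpy as np


def average_income_over_50(age, income):
    """Average income of people aged 50 or more, using a 0/1 selector vector."""
    age = np.asarray(age)
    income = np.asarray(income, dtype=float)
    selected = (age >= 50).astype(float)  # plays the role of S = where(...)
    # Every row is touched regardless of its content; unselected rows contribute 0.
    return float(np.sum(selected * income) / np.sum(selected))


print(average_income_over_50([30, 55, 60, 45], [10.0, 20.0, 40.0, 100.0]))
```

The selector must contain at least one nonzero entry, otherwise the division is by zero.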

Challenge - Memory constraint

Challenge

- The current version of SGX (v1) allows only 90 MB of memory allocation

Solution

- We build a flexible data blocking mechanism with efficient and secure caching
- We build a matrix manipulation library that supports blocking; we call this abstraction BigMatrix
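Blocking can be illustrated with a block-wise matrix multiplication in NumPy. This is a minimal sketch under simplifying assumptions (square blocks, everything in ordinary memory, no encryption or caching); blocked_matmul is a hypothetical name, not part of the BigMatrix library.

```python
import numpy as np


def blocked_matmul(a, b, block):
    """Multiply a and b one block at a time so only small tiles are live at once."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((n, m), dtype=np.result_type(a, b))
    # The loop order depends only on the shapes and the block size, not on the data.
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                c[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return c


a = np.arange(35, dtype=float).reshape(5, 7)
b = np.arange(21, dtype=float).reshape(7, 3)
print(np.allclose(blocked_matmul(a, b, 2), a @ b))
```

In a real enclave setting each tile would be loaded and decrypted on demand, which is where the caching policy matters.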

Security Properties - Summary

- Individual operations in our system are data oblivious
- A combination of oblivious operations is also oblivious
- The compiler warns the user about potential leakage
- We perform optimization based on publicly known information, e.g., data size

System Overview - SGX BigMatrix

[Figure: SGX BigMatrix system overview. Labels: Client, Server, Untrusted, Trusted, Compiler, BMRT Client, Block Size Optimizer, Service Manager, BigMatrix Library, Intel SGX SDK, Execution Engine, Block Cache, OCalls, ECalls]

BigMatrix Library

Operations in BigMatrix Library

- Data access operations: load, publish, get row, etc.
- Matrix operations: inverse, multiply, element-wise, transpose, etc.
- Relational algebra operations: where, sort, join, etc.
- Data generation operations: rand, zeros, etc.
- Statistical operations: norm, var

BigMatrix Library - Security Properties

- All operations are data oblivious
- All operations support blocking
- We proved that a combination of data-oblivious operations is also data oblivious (in Section 4)
- Data-oblivious and blocking-aware implementation details are in Appendix A

BigMatrix Library - Trace

- Each operation has a fixed trace
- The trace is the information disclosed to the adversary during execution
- For example: operation type, input and output data sizes

Example: Trace of Matrix Multiplication C = A * B

- Instruction type (i.e., multiplication)
- Input matrix sizes (i.e., A.rows, A.cols, B.rows, B.cols)
- Output matrix size (i.e., C.rows, C.cols)
- Block size
- Oblivious memory read and write sequences, which do not depend on data content
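One way to picture a trace is as a record built only from public values. The sketch below is an assumption about how such a record could look; the Trace class and multiply_trace function are hypothetical and omit the memory access sequence.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Trace:
    """Public information revealed by one operation (illustrative only)."""
    operation: str
    input_shapes: Tuple[Tuple[int, int], ...]
    output_shape: Tuple[int, int]
    block_size: int


def multiply_trace(a_shape, b_shape, block_size):
    """Build the trace of C = A * B from shapes and block size, never from contents."""
    if a_shape[1] != b_shape[0]:
        raise ValueError("inner dimensions must match")
    return Trace(
        operation="multiply",
        input_shapes=(tuple(a_shape), tuple(b_shape)),
        output_shape=(a_shape[0], b_shape[1]),
        block_size=block_size,
    )


print(multiply_trace((3, 4), (4, 2), 2))
```

Two multiplications over different data but identical shapes and block size produce equal Trace values, which is the sense in which the trace is fixed.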

Exec. Engine & Block Cache

Execution Engine

- Executes BigMatrix library operations
- Parses instructions of the form

    Var ASSIGN Operation(Var, Var, ...)

- Processes sequences of instructions
- Maintains the intermediate state required to execute complex programs, such as variable-to-BigMatrix assignments

Block Cache

- Helps decide when to remove a block from memory, based on the upcoming sequence of instructions
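The instruction format above can be parsed with a few lines of Python. This sketch assumes that ASSIGN is written as "=" (as in the compiler output shown in these slides) and that operands are plain names; parse_instruction is a hypothetical helper, not the engine's parser.

```python
import re

# Optional "target =" prefix, then an operation name and a parenthesized operand list.
INSTRUCTION = re.compile(r"^\s*(?:(\w+)\s*=\s*)?(\w+)\s*\((.*)\)\s*$")


def parse_instruction(line):
    """Split one instruction into (target, operation, operands)."""
    match = INSTRUCTION.match(line)
    if match is None:
        raise ValueError(f"not a valid instruction: {line!r}")
    target, operation, arguments = match.groups()
    operands = [arg.strip() for arg in arguments.split(",") if arg.strip()]
    return target, operation, operands


print(parse_instruction("t1 = multiply(xt, x)"))
print(parse_instruction("unset(x)"))
```

Instructions without an assignment, such as unset or publish, come back with a target of None.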

Exec. Engine & Block Cache - Security Properties

- The Execution Engine and Block Cache are also data oblivious, given that the input program is data oblivious
- The compiler warns about potential data leakage
- The adversary cannot infer anything more about the data, apart from the trace of all the operations

Compiler

- Compiles our Python-inspired language into basic commands
- It ensures data obliviousness by removing support for if
- We emphasize operation vectorization

Input: Linear Regression

    x = load('path/to/X_Matrix')
    y = load('path/to/Y_Matrix')
    xt = transpose(x)
    theta = inverse(xt * x) * xt * y
    publish(theta)
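For readers more familiar with NumPy, the linear regression program corresponds to the normal equation. The sketch below mirrors the slide's explicit inverse for clarity; fit_linear_regression and the sample data are illustrative assumptions.

```python
import numpy as np


def fit_linear_regression(x, y):
    """Solve theta = (X^T X)^{-1} X^T y with the same steps as the slide's program."""
    xt = x.T                              # xt = transpose(x)
    return np.linalg.inv(xt @ x) @ xt @ y  # theta = inverse(xt * x) * xt * y


# Points on the line y = 1 + 2 * t, with a leading column of ones for the intercept.
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(fit_linear_regression(x, y))
```

In general-purpose numerical code, np.linalg.solve or np.linalg.lstsq is usually preferred over forming the inverse explicitly.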

Compiler - Output

Output: Linear Regression

    x = load(X_Matrix_ID)
    y = load(Y_Matrix_ID)
    xt = transpose(x)
    t1 = multiply(xt, x)
    unset(x)
    t2 = inverse(t1)
    unset(t1)
    t3 = multiply(t2, xt)
    unset(xt)
    unset(t2)
    theta = multiply(t3, y)
    unset(y)
    unset(t3)
    publish(theta)
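The unset calls in the compiled output free each variable right after its last use. A simple last-use analysis is one way to place them; the sketch below is an assumption about how this could be done, not the actual compiler pass, and insert_unsets is a hypothetical name. Instructions are (target, operation, operands) tuples, with load modeled as taking no variable operands.

```python
def insert_unsets(instructions):
    """Append unset(v) after the last instruction that reads each variable v."""
    last_use = {}
    for index, (_, _, operands) in enumerate(instructions):
        for name in operands:
            last_use[name] = index

    output = []
    for index, (target, operation, operands) in enumerate(instructions):
        output.append((target, operation, operands))
        if operation == "publish":
            continue  # published results are not released in this sketch
        for name in dict.fromkeys(operands):  # de-duplicate, keep operand order
            if last_use[name] == index:
                output.append((None, "unset", [name]))
    return output


program = [
    ("x", "load", []),
    ("y", "load", []),
    ("xt", "transpose", ["x"]),
    ("t1", "multiply", ["xt", "x"]),
    ("t2", "inverse", ["t1"]),
    ("t3", "multiply", ["t2", "xt"]),
    ("theta", "multiply", ["t3", "y"]),
    (None, "publish", ["theta"]),
]
for instruction in insert_unsets(program):
    print(instruction)
```

The order of unset calls that follow the same instruction may differ from the slide, but each variable is released after the same instruction.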

Compiler - Track data leakage

- We warn about accidental data leakage through the trace
- We check whether any sensitive data is used in the trace of any operation
- In our system, sensitive data means the content of any BigMatrix and the content of intermediate variables

Example

    X = load('path/to/X_Matrix')
    s = count(where(X[1] >= 0))
    Y = zeros(s, 1)
    publish(Y)

We report that the zeros operation reveals the sensitive value s
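This check resembles taint tracking: values derived from loaded data are marked sensitive, and a warning is raised when one of them is used as an argument that ends up in a trace. The sketch below is a simplified illustration; find_leaks and the TRACE_ARGS table are hypothetical, and matrix dimensions (which the slides treat as public) are not modeled.

```python
# Hypothetical table: argument positions that become part of an operation's trace.
TRACE_ARGS = {"zeros": (0, 1), "rand": (0, 1)}


def find_leaks(instructions):
    """Return (operation, variable) pairs where sensitive data reaches a trace."""
    sensitive = set()
    warnings = []
    for target, operation, operands in instructions:
        for position in TRACE_ARGS.get(operation, ()):
            if position < len(operands) and operands[position] in sensitive:
                warnings.append((operation, operands[position]))
        # Loaded matrices and anything computed from sensitive values are sensitive.
        derived = operation == "load" or any(name in sensitive for name in operands)
        if derived and target is not None:
            sensitive.add(target)
    return warnings


program = [
    ("X", "load", []),
    ("t", "where", ["X"]),
    ("s", "count", ["t"]),
    ("Y", "zeros", ["s", "1"]),
    (None, "publish", ["Y"]),
]
print(find_leaks(program))
```

Here s is derived from the contents of X, so passing it to zeros as a size is flagged, matching the example above.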

SQL Support

- We also support basic SQL

Input

    I = sql('SELECT *
             FROM person p
             JOIN person_income pi (1)
             ON p.id = pi.id
             WHERE p.age > 50
             AND pi.income > 100000')

SQL Support (Cont.)

Output

    t1 = where(person, 'C:3;V:50;O:=')
    # person.age is in column 3
    t2 = zeros(person.rows, 2)
    t3 = get_column(person, 0)
    # person.id is in column 0
    set_column(t2, 0, t3)
    set_column(t2, 1, t1)
    t4 = where(person_income, 'C:1;V:100000;O:=')
    t5 = zeros(person_income.rows, 2)
    t6 = get_column(person_income, 0)
    # person_income.id is in column 0
    set_column(t5, 0, t6)
    set_column(t5, 1, t4)
    A = join(t3, t5, 'c:t1.0;c:t2.0;O:=', 1)
    ...

Block Size Optimizer - Intro & Design Decisions

- We observed that the input block size affects the performance of the system
- The adversary doesn't gain any knowledge about the data from the block size
- So we find the optimum block size for each instruction before executing a program
- We explicitly do not want to perform optimization inside the enclave because
    - Optimization libraries are large and complex, which can introduce unintended security flaws
    - Any efficient optimization algorithm will reveal information about the data
- So we perform optimization only on trace data, nothing else

Block Size Optimizer - Overview

- We generate a DAG of the execution graph
    - Internal nodes represent operations
    - Edges represent block conversions
- We know the cost of each operation for different matrix and block sizes
- Given the input matrix sizes, we can find the optimized block sizes
- We can convert one block configuration to another, and we know the cost of the conversion

Block Size Optimizer - Example - Linear Regression

- Execution graph (DAG) of $\Theta = (X^T X)^{-1} X^T Y$ in the linear regression training phase

Block Size Optimizer - Example - LR Cost Function

\[
\begin{aligned}
\mathrm{Cost} ={}& \mathrm{Convert}(X, (br_X, bc_X), (x_0, x_1)) \\
&+ \mathrm{OP\_Cost}(\text{'Transpose'}, X, (x_0, x_1)) \\
&+ \mathrm{Convert}(X^T, (x_1, x_0), (x_2, x_3)) \\
&+ \mathrm{Convert}(X, (br_X, bc_X), (x_4, x_5)) \\
&+ \mathrm{OP\_Cost}(\text{'Multiply'}, [X^T, X], [(x_2, x_3), (x_4, x_5)]) \\
&+ \dots
\end{aligned}
\]

We convert this into an integer program and solve it for all the $x_n$ variables.
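The structure of this optimization can be shown with a toy exhaustive search. Everything here is an assumption made for illustration: the cost formulas are invented, the program is a simple chain rather than a DAG, and brute force stands in for the integer programming solver mentioned above.

```python
from itertools import product


def op_cost(operation, block):
    """Made-up cost: per-block overhead for tiny blocks plus growth for large ones."""
    area = block[0] * block[1]
    return 1000.0 / area + area / 100.0


def convert_cost(source, destination):
    """Made-up cost of re-blocking a matrix; free when the block size is unchanged."""
    return 0.0 if source == destination else 5.0


def best_blocks(initial, operations, candidates):
    """Try every block-size assignment and return (total cost, chosen blocks)."""
    best = None
    for choice in product(candidates, repeat=len(operations)):
        cost, current = 0.0, initial
        for operation, block in zip(operations, choice):
            cost += convert_cost(current, block) + op_cost(operation, block)
            current = block
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best


candidates = [(10, 10), (20, 20), (50, 50)]
print(best_blocks((50, 50), ["transpose", "multiply"], candidates))
```

The search uses only operation names and sizes, which are part of the trace, so it can run outside the enclave.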

Experimental Evaluations

We implemented a prototype using the Intel SGX SDK and measured the performance of different operations

Setup

- Processor: Intel Core i7-6700
- Memory: 64 GB
- OS: Windows 7
- SGX SDK version: 1.0
- Number of machines: 1

Performance Impact - Matrix Size

[Figure panel: Matrix Multiplication (e.g., C = A * B). x-axis: Matrix Elements; y-axis: Matrix Multiplication Time (ms); series: Unencrypted, Encrypted]

[Figure panel: Oblivious Join. x-axis: Matrix Elements; y-axis: Join time (ms); series: Unencrypted, Encrypted]

Performance Impact - Matrix Size - Summary

- We observe similar trends for all matrix operations
- We observe minimal overhead for encrypted computation
- However, the overhead depends on the operation type
- More experimental evaluations are in Section 5

Performance Impact - Block Size

[Figure panel: Scalar Multiplication. Scalar Operation Time (ms) as a function of block size]

[Figure panel: Matrix Multiplication. Matrix Multiplication Time (ms) as a function of block size]

Performance Impact - Block Size - Summary

- We observe that execution time increases with block size
- Also, a very small block size increases execution time, due to blocking overhead
- As a result, we optimize the block size

Comparison with ObliVM

- We compare the performance of SGX-BigMatrix with ObliVM for two-party matrix multiplication
- We observe that SGX-BigMatrix is orders of magnitude faster because we utilize hardware and do not require expensive over-the-network communication

    Matrix Dimension   ObliVM              BigMatrix SGX Enc.   BigMatrix SGX Unenc.
    100                28s 660ms           10ms                 10ms
    250                7m 0s 90ms          93ms                 88ms
    500                53m 48s 910ms       706.66ms             675.66ms
    750                2h 59m 40s 990ms    2s 310ms             2s 260ms
    1,000              6h 34m 17s 900ms    10s 450ms            10s 330ms

Table: Two-party matrix multiplication time in ObliVM vs BigMatrix

Case Studies - Page Rank

- Performed Page Rank on three popular datasets
- Each dataset contains a directed graph

    Data Set        Nodes    BigMatrix Encrypted
    Wiki-Vote       7,115    97s 560ms
    Astro-Physics   18,772   6m 41s 200ms
    Enron Email     36,692   23m 19s 700ms

Table: Page Rank on real datasets
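Page Rank fits the matrix abstraction because it reduces to repeated matrix-vector products. The NumPy sketch below shows the standard power iteration; the damping factor of 0.85 and the fixed 50 iterations are assumptions, since the slides do not state the parameters used, and page_rank is an illustrative name.

```python
import numpy as np


def page_rank(adjacency, damping=0.85, iterations=50):
    """Power-iteration Page Rank; adjacency[i][j] = 1 means an edge from i to j."""
    a = np.asarray(adjacency, dtype=float)
    n = a.shape[0]
    out_degree = a.sum(axis=1)
    dangling = (out_degree == 0).astype(float)  # nodes without outgoing edges
    safe_degree = out_degree + dangling         # avoids division by zero
    # Column-stochastic transition matrix; dangling nodes jump uniformly at random.
    transition = (a / safe_degree[:, None]).T + np.outer(np.ones(n), dangling) / n
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):  # fixed count, independent of the data
        rank = damping * (transition @ rank) + (1.0 - damping) / n
    return rank


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(page_rank(cycle))
```

Running a fixed number of iterations, rather than stopping on convergence, keeps the amount of work independent of the graph's contents.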

Conclusion

- We propose a practical data analytics framework with SGX
- We present the BigMatrix abstraction to handle large matrices in a constrained environment
- We propose a programming abstraction for secure data analytics
- We applied our system to solve real-world problems

Thank You

Questions / Comments

- Fahad Shaon - fahad.shaon@utdallas.edu
- Murat Kantarcioglu - muratk@utdallas.edu
- Zhiqiang Lin - zhiqiang.lin@utdallas.edu
- Latifur Khan - lkhan@utdallas.edu

References I

Intel (2015). Presentation for Intel SGX: ISCA 2015. url: https://software.intel.com/sites/default/files/332680-002.pdf.

Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu (2012). “Access Pattern disclosure on Searchable Encryption: Ramification, Attack and Mitigation”. In: NDSS. Vol. 20, p. 12.

Liu, Chang et al. (2015). “ObliVM: A programming framework for secure computation”. In: Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, pp. 359–376.

Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015). “Inference attacks on property-preserving encrypted databases”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 644–655.

References II

Ohrimenko, Olga et al. (2016). “Oblivious Multi-Party Machine Learning on Trusted Processors”. In: 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, pp. 619–636. isbn: 978-1-931971-32-4. url: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/ohrimenko.

Zheng, Wenting et al. (2017). “Opaque: A Data Analytics Platform with Strong Security”. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). Boston, MA: USENIX Association. url: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zheng.
