Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data

Wen-jie Lu (1), Shohei Kawasaki (1), Jun Sakuma (1,2,3)
1. University of Tsukuba, Japan 2. JST CREST 3. RIKEN Center for AIP


Post on 24-Sep-2019


TRANSCRIPT

Page 1: Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data

Wen-jie Lu (1), Shohei Kawasaki (1), Jun Sakuma (1,2,3)

1. University of Tsukuba, Japan
2. JST CREST
3. RIKEN Center for AIP

Page 2: Statistical Analysis on the Cloud

Cloud computing is useful for statistical analysis:
• Gather distributed data, and reduce hardware cost.
• Minimal interactions between data providers and the cloud.
• The cloud does most of the work for the analyst.

[Figure labels: Multiple data providers; Data collection; Third-party cloud server; Query & Result; Analyst]

Page 3: Cloud Computing with Sensitive Data

• Using outside cloud servers raises privacy concerns.
  o E.g., medical records, federal data.
• We want to calculate statistics on the cloud while keeping the data secret.

[Figure labels: Sensitive data; Third-party cloud server]

Page 4: Secure Multiparty Computation (SMC)

• Off-the-shelf tools for SMC protocols:
  o Yao's garbled circuit (GC).
  o Fully homomorphic encryption (FHE).
• But development cost and efficiency hinder applications of GC and FHE in the cloud.

[Figure: Z = F(x, y), where x and y are private inputs and F is a public function. Only Z is revealed.]

Yao, Andrew. Protocols for secure computations. 1982.
Gentry. Fully homomorphic encryption using ideal lattices. 2009.

Page 5: GC on the Cloud Environment

GC requires a large development cost:
• Multiple servers are needed.
  o Assume no collusion between servers.
• Fast network is necessary for computation.
  o E.g., 10 Gbps bandwidth.

[Figure labels: Secret sharing; GC protocol]

Page 6: FHE on the Cloud Environment

• Less development cost
  o Single server is enough.
  o Rapid network is not necessary.
• But might be inefficient in practice
  o Encrypt bits one by one.
  o 1~10 ms per evaluation.
  o 1~10 megabytes per ciphertext.

[Figure labels: Ciphertexts; FHE protocol]

Gentry et al. Homomorphic Evaluation of the AES Circuit. 2012.

Page 7: Observation

• Purpose of encrypting bits separately
  o To evaluate any Boolean function.
• But to do statistical analysis, we can use
  o matrix arithmetic operations,
  o comparison operations.

Page 8: Our Result

• Two new FHE-based primitives:
  o Matrix operations
  o Batch greater-than
• Secure statistical protocols:
  o histogram (count),
  o order of counts,
  o contingency table (with cell-suppression),
  o percentile,
  o principal component analysis (PCA),
  o linear regression.
• Source code: https://github.com/fionser/CODA

Page 9: Preliminaries: Fully Homomorphic Encryption

• Public-private key scheme.
  o Data providers & cloud share the public key.
  o The analyst holds the private key.
• Allows addition (subtraction) and multiplication on encrypted integers.
  o Analogy: black box with gloves

Brakerski et al. Fully Homomorphic Encryption without Bootstrapping. 2012.

Page 10: Preliminaries: Packing (Batching)

• Enables encrypting and processing vectors at no extra cost.
  o Fewer ciphertexts
  o Faster computation

[Figure: a single homomorphic operation yields multiple results.
  [1, 2, 3, 4] + [8, 7, 6, 5] = [9, 9, 9, 9]
  [1, 2, 3, 4] × [8, 7, 6, 5] = [8, 14, 18, 20]]

N. P. Smart et al. Fully homomorphic SIMD operations. 2011.
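The slot-wise behaviour of packing can be mimicked on plaintext lists. The following is a minimal sketch, assuming a hypothetical `PackedVector` class that stands in for a packed ciphertext and a hypothetical plaintext modulus `P`; nothing is encrypted here and none of these names come from HElib.

```python
P = 65537  # hypothetical plaintext modulus (assumption, not from the slides)

class PackedVector:
    """Plaintext stand-in for a packed ciphertext: one object, many slots."""

    def __init__(self, slots):
        self.slots = [s % P for s in slots]

    def __add__(self, other):
        # One "homomorphic" addition acts on every slot at once.
        return PackedVector([a + b for a, b in zip(self.slots, other.slots)])

    def __mul__(self, other):
        # One "homomorphic" multiplication also acts slot by slot.
        return PackedVector([a * b for a, b in zip(self.slots, other.slots)])

u = PackedVector([1, 2, 3, 4])
v = PackedVector([8, 7, 6, 5])
sum_slots = (u + v).slots    # the four sums from the slide
prod_slots = (u * v).slots   # the four products from the slide
```

Each operator call models a single homomorphic operation, which is why packing reduces both the number of ciphertexts and the number of operations.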

Page 11: Preliminaries: Slot Manipulation

Rotate slots of the encrypted vector.
  [1, 2, 3, 4] >> 2 = [3, 4, 1, 2]

Replicate a specific slot.
  [8, 5, 1, 5] @ 3 = [1, 1, 1, 1]

Halevi et al. Algorithms in HElib. 2014.
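The two slot operations can be sketched on plain lists. This is an illustration only: the function names `rotate` and `replicate` are hypothetical, `>>` is assumed to be a cyclic shift to the right (the slide's examples do not distinguish left from right), and `@` is assumed to use 1-based slot positions, as in the slide's `@3` example.

```python
def rotate(slots, k):
    """Cyclically shift the slots k positions to the right (assumed direction)."""
    k %= len(slots)
    return slots[-k:] + slots[:-k] if k else slots[:]

def replicate(slots, position):
    """Copy the slot at a 1-based position into every slot."""
    return [slots[position - 1]] * len(slots)

rotated = rotate([1, 2, 3, 4], 2)
replicated = replicate([8, 5, 1, 5], 3)
```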

Page 12: Part II Technical Details

• Data preprocessing.
• Efficient matrix multiplication on ciphertexts.
• Comparing two encrypted integers.
• Example of two protocols:
  o Contingency table with cell-suppression
  o Linear regression
  (for other protocols, refer to our paper).

Page 13: Data Preprocessing

• Numerical data: fixed-point representation
  o 3.14159 → ⌈3.14159 × 1000⌋ = 3142
  o Precision (e.g., 1000) determined in advance
• Categorical data: 1-of-k representation
  o Gender (i.e., k = 2). Female → [1, 0] and Male → [0, 1]
• Ordinal data: stair-case encoding
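The first two encodings can be sketched in a few lines. The helper names `encode_fixed_point` and `encode_one_of_k` are hypothetical, and rounding half-way cases upward is an assumption about the ⌈·⌋ (round-to-nearest) notation. The stair-case encoding is left out because the slide does not define it.

```python
import math

def encode_fixed_point(value, scale=1000):
    """Round value * scale to the nearest integer (half-way cases round up)."""
    return math.floor(value * scale + 0.5)

def encode_one_of_k(category, categories):
    """Return the 1-of-k indicator vector of `category`."""
    return [1 if c == category else 0 for c in categories]

pi_encoded = encode_fixed_point(3.14159)
female = encode_one_of_k("Female", ["Female", "Male"])
male = encode_one_of_k("Male", ["Female", "Male"])
```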

Page 14: Proposed Matrix Primitive

• Used for adding & multiplying encrypted matrices.
• Encrypt each row separately by packing.
  o Row-wise encryption.
  o Horizontally partitioned data
• Efficient and layout consistent.
  o $O(N^2)$ homomorphic operations.

Page 15: Matrix Multiplication [1/2]

• Encrypt the matrix row by row with packing.

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \times \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{pmatrix}$$

[Figure: computing the first row of the result. Replicating slots @1 and @2 of the row [1, 2] gives [1, 1] and [2, 2]; multiplying these by the rows [a, b] and [c, d] and adding the products gives [1a + 2c, 1b + 2d].]

Page 16: Matrix Multiplication [1/2]

• Encrypt the matrix row by row with packing.
• $N^2$ replications, multiplications and additions
  o $O(N^2)$ complexity compared to $O(N^3)$ (no packing).
• The resulting matrix is also encrypted row-wise.

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \times \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{pmatrix}$$

[Figure: computing the second row of the result. Replicating slots @1 and @2 of the row [3, 4] gives [3, 3] and [4, 4]; multiplying these by the rows [a, b] and [c, d] and adding the products gives [3a + 4c, 3b + 4d]. The first row, [1a + 2c, 1b + 2d], is obtained the same way.]
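The replicate, multiply and add steps can be traced on plaintext rows. The sketch below assumes square N×N matrices and treats each Python list as one packed row; `replicate`, `multiply`, `add` and `matmul_rowwise` are hypothetical names for illustration, not the API of the authors' code.

```python
def replicate(row, position):
    """Copy the slot at a 1-based position into every slot."""
    return [row[position - 1]] * len(row)

def multiply(u, v):
    """Slot-wise product of two packed rows."""
    return [a * b for a, b in zip(u, v)]

def add(u, v):
    """Slot-wise sum of two packed rows."""
    return [a + b for a, b in zip(u, v)]

def matmul_rowwise(A_rows, B_rows):
    """Multiply two N x N matrices stored as lists of packed rows."""
    n = len(A_rows)
    result = []
    for a_row in A_rows:
        acc = [0] * n
        # N replications, multiplications and additions per output row,
        # hence N^2 of each for the whole product.
        for j in range(1, n + 1):
            acc = add(acc, multiply(replicate(a_row, j), B_rows[j - 1]))
        result.append(acc)  # the output is again one packed row per matrix row
    return result

product = matmul_rowwise([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because the output has the same row-wise layout as the inputs, it can be fed straight into another call without repacking.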

Page 17: Matrix Multiplication [2/2]

• Layout consistency is important for developing efficient statistical protocols.
  o Statistical algorithms need iterative matrix multiplications.

[Figure: decision chart. A method that is efficient for a single multiplication: is it layout consistent? Yes: still efficient for iterative multiplications. No: inefficient for iterative multiplications (heavy layout adjustment).]

Page 18: Experimental Settings of Matrix Primitive

• Implementations:
  o FHE: HElib (C++ based)
  o GC: ObliVM (Java based)
• Evaluated on 32-bit integers
• Networks:
  o LAN (about 88 Mbps)
  o WAN (about 48 Mbps)

HElib. https://github.com/shaih/HElib.
Liu et al. ObliVM: A programming framework for secure computation. 2015.

Page 19: Evaluation of Matrix Primitive

[Figure, panel "Execution Time": elapsed time (s), log scale from 0.1 to 10000, versus matrix dimension (2, 4, 8, 16, 32, 64) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN.]

[Figure, panel "Communication Cost": data transferred (MB), log scale from 0.1 to 100000, versus matrix dimension (2, 4, 8, 16, 32, 64) for GC and FHE. Data labels on the plot: 16640, 132096, 1052672, 8404992, 67174400, 537133056.]

• When doing iterative multiplications, the FHE-based primitive can offer better performance.
  o It saves the communication cost between iterations.

Page 20: Greater-than (GT) Primitive

$\mathrm{GT}(e(x), e(y)) \to e(x \overset{?}{>} y)$ s.t. $0 \le x, y \le D$

• [Golle06], based on the Paillier cryptosystem:
  if $x > y$ then $\exists k \in [1, D]$ such that $x - y - k = 0$
• Combination with packing gives great improvements:
  $e(x, \dots, x) - e(y, \dots, y) - [1, 2, \dots, D] \to e(\boldsymbol{\eta})$
  (x and y are each replicated D times)
  o $0 \in \boldsymbol{\eta} \iff x > y$ (i.e., decryption is needed)
  o Complexity from $D$ to $\lceil D/\ell \rceil$.

Golle. A private stable matching algorithm. 2006.
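The packed comparison can be sketched on plaintext values. Here `num_slots` plays the role of ℓ, each inner list stands for one packed ciphertext, and the function names are hypothetical; encryption is omitted, so this only shows the arithmetic and the batch count.

```python
import math

def packed_greater_than(x, y, D, num_slots):
    """Return the batches eta = x - y - k for k = 1..D, num_slots values per batch."""
    batches = []
    for start in range(1, D + 1, num_slots):
        ks = range(start, min(start + num_slots, D + 1))
        # One packed subtraction handles up to num_slots candidate values of k.
        batches.append([x - y - k for k in ks])
    return batches

def decode_greater_than(batches):
    """After decryption: x > y exactly when some slot equals zero."""
    return any(0 in eta for eta in batches)

batches = packed_greater_than(7, 3, D=16, num_slots=4)
is_greater = decode_greater_than(batches)
```

With D = 16 and four slots per ciphertext, the sketch produces ⌈16/4⌉ = 4 batches instead of 16 separate comparisons.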

Page 21: Experimental Settings for GT Primitive

• Implementations:
  o FHE: HElib (C++ based)
  o GC: ObliVM (Java based)
• Domain $D = 2^4 \sim 2^{24}$
• Number of slots $\ell \approx 1700$.
• Networks:
  o LAN (about 88 Mbps)
  o WAN (about 48 Mbps)

HElib. https://github.com/shaih/HElib.
Liu et al. ObliVM: A programming framework for secure computation. 2015.

Page 22: Evaluation of Greater-than Primitive

[Figure, panel "Execution Time": elapsed time (s), log scale from 0.1 to 1000, versus #bits (4, 8, 12, 16, 20, 24) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN.]

[Figure, panel "Communication Cost": data transferred (MB), log scale from 0.001 to 10000, versus #bits (4, 8, 12, 16, 20, 24) for GC and FHE. Data labels on the plot: 76, 88, 100, 112, 124, 136.]

Works for small domains, which is enough for ordinal statistics.

Page 23: Secure Statistical Protocols

• Contingency table with cell-suppression protocol:
  o Uses the greater-than primitive.
  o One-round protocol between cloud and analyst.
• Linear regression protocol:
  o Uses the matrix primitive.
  o Two-round protocol.
  o Uses a Plaintext Precision Expansion technique (discussed later).

Page 24: Contingency Table

Categorical data:

  Gender   Smoke
  Male     Smoker
  Female   Non-smoker
  Male     Non-smoker

Contingency table:

           Smoker   Non-smoker
  Male     1        1
  Female   0        1

• Indicator encoding ($k_1 = 2$, $k_2 = 2$):
  Male → [1, 0], Female → [0, 1]
  Smoker → [1, 0], Non-smoker → [0, 1]
• Basic idea: multiply & rotate
  o [a1, a2] × [b1, b2] counts Male-Smoker and Female-Non-smoker.
  o [a1, a2] × ([b1, b2] >> 1) = [a1, a2] × [b2, b1] gives the other two counts.
• Improvement with no extra preprocessing
  o $O(\max(k_1, k_2)) \Rightarrow O(\log k_1 k_2)$.
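The basic multiply-and-rotate idea can be traced on the three example records. The sketch below covers only the basic variant with one rotation per offset, not the logarithmic improvement. It assumes both attributes have the same number of categories k and that `>>` is a cyclic right shift. The names `rotate` and `contingency_counts` are hypothetical, and the counts are read out of the slots in the clear, which in the protocol would happen only after decryption.

```python
def rotate(slots, k):
    """Cyclically shift the slots k positions to the right (assumed direction)."""
    k %= len(slots)
    return slots[-k:] + slots[:-k] if k else slots[:]

def contingency_counts(records_a, records_b):
    """Build a k x k table from indicator-encoded attribute pairs."""
    k = len(records_a[0])
    table = [[0] * k for _ in range(k)]
    for r in range(k):
        acc = [0] * k
        for a, b in zip(records_a, records_b):
            rb = rotate(b, r)
            # Slot-wise multiply, then accumulate over all records.
            acc = [s + ai * bi for s, ai, bi in zip(acc, a, rb)]
        # Slot i of acc counts row category i with column category (i - r) mod k.
        for i in range(k):
            table[i][(i - r) % k] = acc[i]
    return table

genders = [[1, 0], [0, 1], [1, 0]]   # Male, Female, Male
smokes = [[1, 0], [0, 1], [0, 1]]    # Smoker, Non-smoker, Non-smoker
table = contingency_counts(genders, smokes)
```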

Page 25: Contingency Table: Cell Suppression

Original table:

           Smoker   Non-smoker
  Male     20       11
  Female   3        12

Suppressed table (if < 10, zero out):

           Smoker   Non-smoker
  Male     20       11
  Female   0        12

• Protect the privacy of rare individuals.
• Given a ciphertext $e(x)$, compute $e(y)$ where
  if $x >$ threshold then $y = x$, else $y =$ some random value.
• $\mathrm{GT}(e(x), \text{threshold}) = e(\boldsymbol{\eta})$; $x >$ threshold iff $0 \in \boldsymbol{\eta}$.
• To compute $\{e(x + \boldsymbol{r}),\ e(\boldsymbol{\eta} + \boldsymbol{r}),\ e(\boldsymbol{\eta} \times \boldsymbol{r}')\}$
  o Non-zero random vectors $\boldsymbol{r}, \boldsymbol{r}'$
  o If $0 \in \boldsymbol{\eta}$, we have $0 \in \boldsymbol{\eta} \times \boldsymbol{r}'$; then we can get $\boldsymbol{r}$ and know $x$.
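One way to read the masking step is sketched below on plaintext values. It assumes a hypothetical prime plaintext modulus `P`, so that a product with a non-zero random value is zero only when the η slot is zero. The split into `cloud_side` and `analyst_side` is an illustration of the slide's bullets, not the authors' implementation, and a suppressed cell is reported as `None` rather than a random value.

```python
import random

P = 8191  # hypothetical prime plaintext modulus (assumption)

def cloud_side(x, threshold, D, rng):
    """Compute the three masked vectors; in the protocol x would be encrypted."""
    ks = range(1, D + 1)
    eta = [(x - threshold - k) % P for k in ks]
    r = [rng.randrange(1, P) for _ in ks]    # non-zero mask r
    r2 = [rng.randrange(1, P) for _ in ks]   # non-zero mask r'
    masked_x = [(x + ri) % P for ri in r]
    masked_eta = [(e + ri) % P for e, ri in zip(eta, r)]
    blinded_eta = [(e * ri) % P for e, ri in zip(eta, r2)]
    return masked_x, masked_eta, blinded_eta

def analyst_side(masked_x, masked_eta, blinded_eta):
    """Recover x only if some slot of eta is zero, i.e. x > threshold."""
    for mx, me, be in zip(masked_x, masked_eta, blinded_eta):
        if be == 0:
            r_j = me              # eta_j = 0, so eta_j + r_j equals r_j
            return (mx - r_j) % P
    return None                   # x stays hidden behind the random mask

rng = random.Random(0)
revealed = analyst_side(*cloud_side(20, 10, 40, rng))
suppressed = analyst_side(*cloud_side(3, 10, 40, rng))
```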

Page 26: Contingency Table Performance Evaluation

[Figure: elapsed time (s) versus table size ($k_1 k_2$), with #records = 4000.]

• Complexity increases logarithmically with the table sizes.
• Most of the work (> 90%) is done by the cloud.

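Page 27: Linear Regression (LR)

• From data $\{(\boldsymbol{x}_i, y_i)\}_i$, compute a model $\boldsymbol{w}$ s.t.
  $$\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$
• This requires the inversion of an encrypted matrix.

Division-free Matrix Inversion($\boldsymbol{Q}$, $\lambda$): set $\boldsymbol{A}^{(1)} = \boldsymbol{Q}$, $\boldsymbol{R}^{(1)} = \boldsymbol{I}$, $a^{(1)} = \lambda$, and iterate

$$\boldsymbol{R}^{(t+1)} = 2a^{(t)}\boldsymbol{R}^{(t)} - \boldsymbol{R}^{(t)}\boldsymbol{A}^{(t)}$$
$$\boldsymbol{A}^{(t+1)} = 2a^{(t)}\boldsymbol{A}^{(t)} - \boldsymbol{A}^{(t)}\boldsymbol{A}^{(t)}$$
$$a^{(t+1)} = a^{(t)}a^{(t)}$$

[Guo06] $\boldsymbol{R}^{(t)}$ gives a good approximation to $a^{(t)}\boldsymbol{Q}^{-1} = \lambda^{2^{t-1}}\boldsymbol{Q}^{-1}$ if $\lambda$ is close to the largest eigenvalue of $\boldsymbol{Q}$ (use PCA to compute $\lambda$).

Layout consistency leads to efficient iterative protocols.

Guo et al. A Schur-Newton method for the matrix pth root and its inverse. 2006.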
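The iteration uses only additions, subtractions and multiplications, which is what makes it suitable for FHE. The sketch below runs it on plaintext Python integers to show the convergence; `division_free_inverse` and the 2×2 test matrix are hypothetical, and the final division by the scalar is assumed to happen on plaintext after decryption.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain integer matrix product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def division_free_inverse(Q, lam, iterations):
    """Return (R, a) such that R / a approximates the inverse of Q."""
    n = len(Q)
    A = [row[:] for row in Q]
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    a = lam
    for _ in range(iterations):
        RA = matmul(R, A)   # both products use the current A
        AA = matmul(A, A)
        R = [[2 * a * R[i][j] - RA[i][j] for j in range(n)] for i in range(n)]
        A = [[2 * a * A[i][j] - AA[i][j] for j in range(n)] for i in range(n)]
        a = a * a           # the scalar grows as a power of lam
    return R, a

Q = [[4, 1], [1, 3]]        # eigenvalues about 4.62 and 2.38, so lam = 5 is close
R, a = division_free_inverse(Q, 5, 6)
approx_inverse = [[float(Fraction(v, a)) for v in row] for row in R]
```

After six iterations the scalar is already 5^64, which illustrates why the integers outgrow a 60-bit plaintext space.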

Page 28: Plaintext Precision Expansion (PPE)

• Division-free algorithms introduce large integers ($\lambda^{2^{t-1}}$).
  o But the current FHE library allows at most 60-bit integers.
• PPE allows division-free algorithms without changing the FHE library.
• Uses K different FHE parameters (each b-bit < 60)
  o Achieves an equivalent Kb-bit parameter.
  o Increases the time by K times, but naturally parallelizable.
• Direct application of the Chinese Remainder Theorem.
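The Chinese Remainder Theorem step can be sketched as follows. The three small primes in `MODULI` and the function names are hypothetical stand-ins for K different FHE plaintext moduli, the result is assumed to be non-negative and smaller than their product, and `evaluate_mod` replaces an actual encrypted evaluation.

```python
from math import prod

MODULI = [251, 241, 239]  # hypothetical pairwise coprime plaintext moduli

def evaluate_mod(m):
    """Stand-in for one FHE evaluation whose plaintext space is integers mod m."""
    return (1234 * 5678 + 9) % m

def crt_reconstruct(residues, moduli):
    """Combine residues into the unique value below the product of the moduli."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M

# The K evaluations are independent, so they could run in parallel.
residues = [evaluate_mod(m) for m in MODULI]
value = crt_reconstruct(residues, MODULI)
```

Each evaluation only ever handles values below its own modulus, yet the reconstructed result is correct modulo the product of all K moduli.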

Page 29: Experiments: Linear Regression

[Figure: elapsed time (min) versus number of dimensions. Data labels on the plot: 16.90, 18.34, 62.685, 67.62, 189.07.]

• Negligible decryption time (less than 2 s).
• 20x faster than the previous FHE solution [Wu et al. 12]
  o 5 dimensions (400+ mins).
• Good scalability (reduced execution time using more cores).

Page 30: Summary

• Secure statistical analysis in the cloud with multiple data providers.
• Two primitives.
  o Matrix operation and greater-than
• Two protocols.
  o Contingency table and linear regression.
• Encoding and packing can improve FHE's balance between generality and efficiency.