Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data
Wen-jie Lu (1), Shohei Kawasaki (1), Jun Sakuma (1, 2, 3)
1. University of Tsukuba, Japan
2. JST CREST
3. RIKEN Center for AIP

Statistical Analysis on the Cloud
Cloud computing is useful for statistical analysis
• Gather distributed data, and reduce hardware cost.
• Minimal interactions between data providers and the cloud.
• The cloud does most of the work for the analyst.
[Figure: multiple data providers send data to a third-party cloud server (data collection); the analyst sends a query and receives the result.]

Cloud Computing with Sensitive Data
• Using outside cloud servers raises privacy concerns.
  o E.g., medical records, federal data.
• We want to calculate statistics on the cloud while keeping the data secret.
[Figure labels: sensitive data; third-party cloud server.]

Secure Multiparty Computation (SMC)
• Off-the-shelf tools for SMC protocols
  o Yao's garbled circuit (GC).
  o Fully homomorphic encryption (FHE).
• But development cost and efficiency hinder applications of GC and FHE in the cloud.
[Figure: two parties hold private inputs x and y; they compute Z = F(x, y) for a public function F, and only Z is revealed.]
Yao. Protocols for secure computations. 1982.
Gentry. Fully homomorphic encryption using ideal lattices. 2009.

GC on the Cloud Environment
GC requires a large development cost
• Multiple servers are needed.
  o Assume no collusion between servers.
• Fast network is necessary for computation.
  o E.g., 10 Gbps bandwidth.
[Figure labels: secret sharing; GC protocol.]

FHE on the Cloud Environment
• Less development cost
  o Single server is enough.
  o Rapid network is not necessary.
• But might be inefficient in practice
  o Encrypt bits one by one.
  o 1~10 ms per evaluation.
  o 1~10 megabytes per ciphertext.
[Figure labels: ciphertexts; FHE protocol.]
Gentry et al. Homomorphic Evaluation of the AES Circuit. 2012.

Observation
• Purpose of encrypting bits separately
  o To evaluate any Boolean function.
• But to do statistical analysis, we can use
  o matrix arithmetic operations.
  o comparison operations.

Our Result
• Two new FHE-based primitives:
  o Matrix Operations
  o Batch Greater-than
• Secure statistical protocols:
  o histogram (count),
  o order of counts,
  o contingency table (with cell-suppression),
  o percentile,
  o principal component analysis (PCA),
  o linear regression.
• Source code: https://github.com/fionser/CODA

Preliminaries: Fully Homomorphic Encryption
• Public-private key scheme.
  o Data providers & cloud share the public key.
  o The analyst holds the private key.
• Allows addition (subtraction) and multiplication on encrypted integers.
  o Analogy: black box with gloves
Brakerski et al. Fully Homomorphic Encryption without Bootstrapping. 2012.

Preliminaries: Packing (Batching)
• Enables encrypting and processing vectors at no extra cost.
  o Fewer ciphertexts
  o Faster computation
• A single homomorphic operation gives multiple results:
  [1 2 3 4] + [8 7 6 5] = [9 9 9 9]
  [1 2 3 4] × [8 7 6 5] = [8 14 18 20]
N. P. Smart et al. Fully homomorphic SIMD operations. 2011.
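
The slot-wise behaviour of packing can be imitated on plaintext vectors. The sketch below is only a stand-in: `PackedVector` is a hypothetical name, nothing is encrypted, and a real FHE library would perform the same slot-wise arithmetic modulo its plaintext modulus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackedVector:
    """Plaintext stand-in for one packed ciphertext (hypothetical type)."""
    slots: tuple

    def __add__(self, other):
        # One "homomorphic" addition acts on every slot at once.
        return PackedVector(tuple(a + b for a, b in zip(self.slots, other.slots)))

    def __mul__(self, other):
        # One "homomorphic" multiplication is also slot-wise, not a dot product.
        return PackedVector(tuple(a * b for a, b in zip(self.slots, other.slots)))


u = PackedVector((1, 2, 3, 4))
v = PackedVector((8, 7, 6, 5))
slot_sum = u + v
slot_product = u * v
```

Both results come from a single operation on one object, which is where the savings in ciphertext count and computation time come from.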

Preliminaries: Slot Manipulation
• Rotate slots of the encrypted vector.
  [1 2 3 4] >> 2 = [3 4 1 2]
• Replicate a specific slot.
  [8 5 1 5] @ 3 = [1 1 1 1]
Halevi et al. Algorithms in HElib. 2014.
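
A plaintext sketch of the two slot operations, reproducing the slide's examples. The function names are illustrative, not the HElib API. The shift direction of `>>` is an assumption (the slide's example is the same in either direction), and slots are 1-indexed for `@` as on the slide.

```python
def rotate(slots, k):
    """Cyclically shift the slots k positions to the right (assumed direction)."""
    k %= len(slots)
    return slots[-k:] + slots[:-k]


def replicate(slots, i):
    """Copy slot i (1-indexed, as on the slide) into every slot."""
    return [slots[i - 1]] * len(slots)


rotated = rotate([1, 2, 3, 4], 2)
replicated = replicate([8, 5, 1, 5], 3)
```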

Part II: Technical Details
• Data preprocessing.
• Efficient matrix multiplication on ciphertexts.
• Comparing two encrypted integers.
• Example of two protocols:
  o Contingency table with cell-suppression
  o Linear regression
  (for other protocols, refer to our paper).

Data Preprocessing
• Numerical data: fixed-point representation
  o 3.14159 → ⌈3.14159 × 1000⌋ = 3142
  o Precision (e.g., 1000) determined in advance
• Categorical data: 1-of-k representation
  o Gender (i.e., k = 2). Female → [1, 0] and Male → [0, 1]
• Ordinal data: stair-case encoding
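
The three encodings can be sketched in a few lines. The fixed-point and 1-of-k functions follow the slide's examples. The stair-case layout (the first `level` slots set to 1) is an assumption, since the slide only names the encoding; all function names are illustrative.

```python
def to_fixed_point(value, precision=1000):
    """Scale by the agreed precision and round to the nearest integer."""
    return round(value * precision)


def one_of_k(category, categories):
    """1-of-k indicator vector: a single 1 at the position of the category."""
    return [1 if c == category else 0 for c in categories]


def staircase(level, num_levels):
    """Assumed stair-case layout: ones in the first `level` slots, zeros after."""
    return [1 if i < level else 0 for i in range(num_levels)]


pi_fixed = to_fixed_point(3.14159)
female = one_of_k("Female", ["Female", "Male"])
grade = staircase(3, 5)
```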

Proposed Matrix Primitive
• Used for adding & multiplying encrypted matrices
• Encrypt each row separately by packing.
  o Row-wise encryption.
  o Horizontally partitioned data
• Efficient and layout consistent.
  o $O(N^2)$ homomorphic operations.

Matrix Multiplication [1/2]
• Encrypt the matrix row by row with packing.
• $N^2$ replications, multiplications and additions
  o $O(N^2)$ complexity compared to $O(N^3)$ (no packing).
• The resulting matrix is also encrypted row-wise.
Example:
  $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix}$
  Row 1: replicate @1 and @2 of [1 2] to get [1 1] and [2 2]; then [1 1] × [a b] + [2 2] × [c d] = [1a+2c 1b+2d].
  Row 2: replicate @1 and @2 of [3 4] to get [3 3] and [4 4]; then [3 3] × [a b] + [4 4] × [c d] = [3a+4c 3b+4d].
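
The replicate, multiply, add pattern can be traced on plaintext rows. This is a sketch under the assumption of square matrices, with each Python list standing in for one packed ciphertext; the helper names are illustrative and slots are 0-indexed here.

```python
def replicate(row, i):
    """Copy slot i (0-indexed here) of a packed row into every slot."""
    return [row[i]] * len(row)


def slot_mul(u, v):
    return [a * b for a, b in zip(u, v)]


def slot_add(u, v):
    return [a + b for a, b in zip(u, v)]


def matmul_rowwise(A_rows, B_rows):
    """Multiply two N x N matrices that are both stored row by row."""
    n = len(B_rows)
    C_rows = []
    for a_row in A_rows:          # one packed vector per row of A
        acc = [0] * n
        for j in range(n):        # N replications per row, N^2 in total
            acc = slot_add(acc, slot_mul(replicate(a_row, j), B_rows[j]))
        C_rows.append(acc)        # each result row is again one packed vector
    return C_rows


C = matmul_rowwise([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because the product comes out row by row, it can be fed straight into the next multiplication without re-encoding.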

Matrix Multiplication [2/2]
• Layout consistency is important for developing efficient statistical protocols.
  o Statistical algorithms need iterative matrix multiplications
[Figure: a primitive that is efficient for a single multiplication is checked for layout consistency. Yes: still efficient for iterative multiplications. No: inefficient for iterative multiplications, because of heavy layout adjustment.]

Experimental Settings of Matrix Primitive
• Implementations:
  o FHE: HElib (C++ based)
  o GC: ObliVM (Java based)
• Evaluated on 32-bit integers
• Networks:
  o LAN (about 88 Mbps)
  o WAN (about 48 Mbps)
HElib. https://github.com/shaih/HElib.
Liu et al. ObliVM: A programming framework for secure computation. 2015.

Evaluation of Matrix Primitive
• For iterative multiplications, the FHE-based primitive can offer better performance.
  o It saves communication cost between iterations.
[Figure, panel "Execution Time": elapsed time (s), log scale from 0.1 to 10000, against matrix dimension (2, 4, 8, 16, 32, 64) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN.]
[Figure, panel "Communication Cost": data transferred (MB), log scale from 0.1 to 100000, against matrix dimension (2, 4, 8, 16, 32, 64) for GC and FHE. Values labeled on the GC series: 16640, 132096, 1052672, 8404992, 67174400, 537133056.]

Greater-than (GT) Primitive
$GT(e(x), e(y)) \to e(x >^? y)$ s.t. $0 \le x, y \le D$
• [Golle06] based on Paillier cryptosystem:
  if $x > y$ then $\exists k \in [1, D]$ such that $x - y - k = 0$
• Combination with packing gives great improvements:
  $e(x, \ldots, x) - e(y, \ldots, y) - [1, 2, \ldots, D] \to e(\boldsymbol{\eta})$, with $x$ and $y$ each replicated $D$ times
  o $0 \in \boldsymbol{\eta} \iff x > y$ (i.e., decryption is needed)
  o Complexity from $D$ to $\lceil D/\ell \rceil$.
Golle. A private stable matching algorithm. 2006.
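
The packed comparison can be checked on plaintext values. The sketch below builds the vector η that the cloud would hold in encrypted form and counts how many ciphertexts of ℓ slots are needed to cover the domain. Function names are illustrative; encryption is omitted.

```python
import math


def gt_eta(x, y, D):
    """Slot-wise x - y - k for k = 1..D, as in the packed GT primitive."""
    return [x - y - k for k in range(1, D + 1)]


def is_greater(x, y, D):
    """For 0 <= x, y <= D, a zero slot exists exactly when x > y."""
    return 0 in gt_eta(x, y, D)


def ciphertexts_needed(D, num_slots):
    """Number of packed ciphertexts that cover all D candidate values of k."""
    return math.ceil(D / num_slots)


eta = gt_eta(5, 3, 8)
```

For x = 5 and y = 3 the zero sits at k = 2, the difference itself. With ℓ ≈ 1700 slots, a 16-bit domain needs ⌈65536 / 1700⌉ packed ciphertexts rather than 65536 separate ones.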

Experimental Settings for GT Primitive
• Implementations:
  o FHE: HElib (C++ based)
  o GC: ObliVM (Java based)
• Domain $D = 2^4 \sim 2^{24}$
• Number of slots $\ell \approx 1700$.
• Networks:
  o LAN (about 88 Mbps)
  o WAN (about 48 Mbps)

Evaluation of Greater-than Primitive
Works for small domains, which is enough for ordinal statistics.
[Figure, panel "Execution Time": elapsed time (s), log scale from 0.1 to 1000, against #bits (4, 8, 12, 16, 20, 24) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN.]
[Figure, panel "Communication Cost": data transferred (MB), log scale from 0.001 to 10000, against #bits (4, 8, 12, 16, 20, 24) for GC and FHE. Values labeled on the GC series: 76, 88, 100, 112, 124, 136.]

Secure Statistical Protocols
• Contingency table with cell-suppression protocol:
  o Uses the greater-than primitive.
  o One-round protocol between cloud and analyst.
• Linear regression protocol:
  o Uses the matrix primitive.
  o Two-round protocol.
  o Uses a Plaintext Precision Expansion technique (discussed later).

Contingency Table
Categorical data:
  Gender    Smoke
  Male      Smoker
  Female    Non-smoker
  Male      Non-smoker
Contingency table:
            Smoker    Non-smoker
  Male      1         1
  Female    0         1
• Indicator encoding ($k_1 = 2$, $k_2 = 2$):
  Male → [1, 0], Female → [0, 1]
  Smoker → [1, 0], Non-smoker → [0, 1]
• Basic idea: multiply & rotate
  [a1, a2] × [b1, b2] counts Male-Smoker and Female-Non-smoker.
  [a1, a2] × ([b1, b2] >> 1) = [a1, a2] × [b2, b1] gives the other two counts.
• Improvement with no extra preprocessing
  o $O(\max(k_1, k_2)) \Rightarrow O(\log k_1 k_2)$.
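
The multiply and rotate idea can be replayed on the three records from the slide. This plaintext sketch covers only the basic 2 × 2 case, not the logarithmic improvement; the helper names and the dictionary output are illustrative.

```python
GENDER = {"Male": [1, 0], "Female": [0, 1]}
SMOKE = {"Smoker": [1, 0], "Non-smoker": [0, 1]}


def rotate(slots, k):
    """Cyclic shift by k slots (for two slots the direction does not matter)."""
    k %= len(slots)
    return slots[-k:] + slots[:-k]


def slot_mul(u, v):
    return [a * b for a, b in zip(u, v)]


def slot_add(u, v):
    return [a + b for a, b in zip(u, v)]


def contingency_2x2(records):
    same = [0, 0]     # accumulates a x b: Male-Smoker, Female-Non-smoker
    cross = [0, 0]    # accumulates a x (b >> 1): Male-Non-smoker, Female-Smoker
    for gender, smoke in records:
        a, b = GENDER[gender], SMOKE[smoke]
        same = slot_add(same, slot_mul(a, b))
        cross = slot_add(cross, slot_mul(a, rotate(b, 1)))
    return {
        ("Male", "Smoker"): same[0],
        ("Female", "Non-smoker"): same[1],
        ("Male", "Non-smoker"): cross[0],
        ("Female", "Smoker"): cross[1],
    }


table = contingency_2x2([
    ("Male", "Smoker"),
    ("Female", "Non-smoker"),
    ("Male", "Non-smoker"),
])
```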
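
Contingency Table: Cell Suppression
Origin table:
            Smoker    Non-smoker
  Male      20        11
  Female    3         12
Suppressed table (if < 10, zero out):
            Smoker    Non-smoker
  Male      20        11
  Female    0         12
• Protect the privacy of rare individuals.
• Given a ciphertext $e(x)$, compute $e(y)$ where
  if $x >$ threshold then $y = x$ else $y =$ some random value
• $GT(e(x), \text{threshold}) = e(\boldsymbol{\eta})$, where $0 \in \boldsymbol{\eta}$ iff $x >$ threshold.
• Compute $\{e(x + \boldsymbol{r}),\ e(\boldsymbol{\eta} + \boldsymbol{r}),\ e(\boldsymbol{\eta} \times \boldsymbol{r}')\}$
  o Non-zero random vectors $\boldsymbol{r}, \boldsymbol{r}'$
  o If $0 \in \boldsymbol{\eta}$, we have $0 \in \boldsymbol{\eta} \times \boldsymbol{r}'$. At that slot $\boldsymbol{\eta} + \boldsymbol{r}$ equals the entry of $\boldsymbol{r}$, so we can get $\boldsymbol{r}$ there, unmask $x + \boldsymbol{r}$ and know $x$.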
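
The masking step can be simulated on plaintext values to see why the analyst learns x only when it exceeds the threshold. This is a sketch under stated assumptions: arithmetic is done modulo a prime `P` chosen here for illustration, encryption is omitted, the function names are hypothetical, and the window `D` is assumed large enough to cover x minus the threshold.

```python
import secrets

P = 2 ** 31 - 1  # illustrative prime plaintext modulus (an assumption)


def cloud_side(x, threshold, D):
    """Build the three masked vectors; in the protocol these stay encrypted."""
    eta = [(x - threshold - k) % P for k in range(1, D + 1)]
    r = [1 + secrets.randbelow(P - 1) for _ in range(D)]    # non-zero masks
    r2 = [1 + secrets.randbelow(P - 1) for _ in range(D)]   # non-zero masks
    masked_x = [(x + rk) % P for rk in r]
    masked_eta = [(e + rk) % P for e, rk in zip(eta, r)]
    flags = [(e * rk) % P for e, rk in zip(eta, r2)]
    return masked_x, masked_eta, flags


def analyst_side(masked_x, masked_eta, flags):
    """After decryption: recover x if some flag is zero, otherwise learn nothing."""
    for mx, me, flag in zip(masked_x, masked_eta, flags):
        if flag == 0:               # eta is 0 at this slot, so me equals r there
            return (mx - me) % P
    return None                     # suppressed cell: all slots look random


kept = analyst_side(*cloud_side(20, 9, 32))
suppressed = analyst_side(*cloud_side(3, 9, 32))
```

With threshold 9 (the slide's "if < 10, zero out"), the cell with count 20 is recovered and the cell with count 3 is suppressed. Since `P` is prime and the masks are non-zero, a flag is zero only where η itself is zero.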

Contingency Table Performance Evaluation
• Complexity increases logarithmically with the table size.
• Most of the work (>90%) is done by the cloud.
[Figure: elapsed time (s) against table size ($k_1 k_2$), #records = 4000.]
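
Linear Regression (LR)
• From data $\{(\boldsymbol{x}_i, y_i)\}_i$, compute a model $\boldsymbol{w}$ s.t.
  $\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}$
• The inversion of an encrypted matrix.
Division-free Matrix Inversion($\boldsymbol{Q}, \lambda$): set $\boldsymbol{A}^{(0)} = \boldsymbol{Q}$, $\boldsymbol{R}^{(0)} = \boldsymbol{I}$, $a^{(0)} = \lambda$, and iterate
  $\boldsymbol{R}^{(t+1)} = 2a^{(t)}\boldsymbol{R}^{(t)} - \boldsymbol{R}^{(t)}\boldsymbol{A}^{(t)}$
  $\boldsymbol{A}^{(t+1)} = 2a^{(t)}\boldsymbol{A}^{(t)} - \boldsymbol{A}^{(t)}\boldsymbol{A}^{(t)}$
  $a^{(t+1)} = a^{(t)}a^{(t)}$
[Guo06] $\boldsymbol{R}^{(t)}$ gives a good approximation to $\lambda^{2^t}\boldsymbol{Q}^{-1}$ if $\lambda$ is close to the largest eigenvalue of $\boldsymbol{Q}$ (use PCA to compute $\lambda$).
Layout consistency leads to efficient iterative protocols.
Guo et al. A Schur-Newton method for the matrix pth root and its inverse. 2006.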
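
The iteration uses only additions and multiplications, so it can be run on exact integers. The sketch below does this on a small plaintext matrix and returns the scalar a alongside R, so that R / a can be compared with the true inverse. The test matrix and λ = 5 are illustrative choices, not values from the talk.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(c, A):
    return [[c * v for v in row] for row in A]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def division_free_inverse(Q, lam, iterations):
    """Return (R, a) such that R / a approximates the inverse of Q."""
    n = len(Q)
    A = [row[:] for row in Q]
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    a = lam
    for _ in range(iterations):
        R_next = sub(scale(2 * a, R), matmul(R, A))   # R <- 2aR - RA
        A_next = sub(scale(2 * a, A), matmul(A, A))   # A <- 2aA - AA
        R, A, a = R_next, A_next, a * a               # a <- a * a
    return R, a


Q = [[4, 1], [1, 3]]          # eigenvalues about 4.62 and 2.38
R, a = division_free_inverse(Q, 5, 5)
approx_inverse = [[v / a for v in row] for row in R]
```

After five iterations a has grown to 5^32, which is more than 60 bits; this growth of the integers is the price of avoiding division.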

Plaintext Precision Expansion (PPE)
• Division-free algorithms introduce large integers ($\lambda^{2^t}$).
  o But the current FHE library allows at most 60-bit integers.
• Allows division-free algorithms without changing the FHE library.
• Uses $K$ different FHE parameters (each $b$-bit, $b < 60$)
  o Achieves an equivalent $Kb$-bit parameter.
  o Increases the time by $K$ times, but naturally parallelizable.
• Direct application of the Chinese Remainder Theorem.
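
The Chinese Remainder Theorem step can be illustrated without any encryption: run the same arithmetic once per small modulus, then recombine the residues. The three Mersenne-prime moduli below are illustrative stand-ins for K plaintext parameters, and the function names are hypothetical. The sketch needs Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

MODULI = [2 ** 13 - 1, 2 ** 17 - 1, 2 ** 19 - 1]  # pairwise coprime (all prime)


def evaluate_mod(x, y, z, m):
    """The same computation x * y + z, carried out modulo one small modulus."""
    return ((x % m) * (y % m) + (z % m)) % m


def crt_combine(residues, moduli):
    """Recover the unique value modulo prod(moduli) from its residues."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the inverse mod m
    return total % M


x, y, z = 123456, 654321, 789
residues = [evaluate_mod(x, y, z, m) for m in MODULI]  # K independent runs
combined = crt_combine(residues, MODULI)
```

The K runs are independent of each other, which matches the slide's remark that the extra cost is naturally parallelizable. The result is exact as long as the true value is smaller than the product of the moduli.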

Experiments: Linear Regression
• Negligible decryption time (less than 2 s).
• 20x faster than previous FHE solution [Wu et al. 12]
  o 5 dimensions (400+ mins).
• Good scalability (reduced execution time using more cores).
[Figure: elapsed time (min) against number of dimensions. Values labeled on the plot: 16.90, 18.34, 62.685, 67.62, 189.07.]

Summary
• Secure statistical analysis in the cloud with multiple data providers.
• Two primitives
  o Matrix operation and greater-than
• Two protocols.
  o Contingency table and linear regression.
• Encoding and packing can improve FHE's balance between generality and efficiency.