privacy preserving data miningfabian/courses/cs600.624/slides/... · overview what is data mining?...
TRANSCRIPT
![Page 1: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/1.jpg)
Privacy PreservingData Mining
Moheeb Rajab
![Page 2: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/2.jpg)
Agenda
Overview and Terminology Motivation Active Research Areas
Secure Multi-party Computation (SMC) Randomization approach
Limitations Summary and Insights
![Page 3: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/3.jpg)
Overview
What is Data Mining? Extracting implicit un-obvious patterns and relationships from a
warehoused of data sets.
This information can be useful to increase the efficiency of theorganization and aids future plans.
Can be done at an organizational level. By Establishing a data Warehouse
Can be done also at a global Scale.
![Page 4: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/4.jpg)
Data Mining System Architecture
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
Entity I
Entity II
Entity n
GlobalAggregato
r
![Page 5: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/5.jpg)
Distributed Data Mining Architecture
Lower scale Mining
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
![Page 6: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/6.jpg)
Challenges
Privacy Concerns Proprietary information disclosure Concerns about Association breaches Misuse of mining These Concerns provide the motivation
for privacy preserving data miningsolutions
![Page 7: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/7.jpg)
Approaches to preserve privacy
Restrict Access to data (Protect Individualrecords)
Protect both the data and its source: Secure Multi-party computation (SMC) Input Data Randomization
There is no such one solution that fits allpurposes
![Page 8: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/8.jpg)
SMC vs Randomization
Pinkas et al
Overhead
Accuracy
Privacy Randomization Schemes
SM
C
![Page 9: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/9.jpg)
Secure Multi-party Computation
Multiple parties sharing the burden of creating the dataaggregate.
Final processing if needed can be delegated to anyparty.
Computation is considered secure if each party onlyknows its input and the result of its computation.
![Page 10: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/10.jpg)
SMC
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
Each Party Knows its input and the result of the operation and nothing else
![Page 11: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/11.jpg)
Key Assumptions
The ONLY information that can be leaked is theinformation that we can get as an overall output from thecomputation (aggregation) process
Users are not Malicious but can honestly curious All users are supposed to abide to the SMC protocol
Otherwise, for the case of having malicious participantsis not easy to model! [Penkas et al, Argawal]
![Page 12: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/12.jpg)
“Tools for Privacy Preserving Distributed DataMining” Clifton et al [SIGKDD]
Secure Sum Given a number of values belonging
to n entities
We need to compute
Such that each entity ONLY knows its input and theresult of the computation (The aggregate sum of thedata)
nxxx ,........,,
21
!=
n
i
ix
1
![Page 13: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/13.jpg)
Examples (Secure Sum)
Problem: Colluding members
Solution Divide values into shares and have each share permute a
disjoint path (no site has the same neighbor twice)
R = 15 10 20
15
45
50
MasterR + 30
R + 10
R +45
R +90
R + 1
40
Sum = R+140 -R
![Page 14: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/14.jpg)
Split path solution
R1 = 15 10 20
15
45
50
R + 15
R + 5
R + 22.5 R +45
R + 7
0
Sum = R1+ 70 – R1 + R2+ 70 –R2= 140
R2 = 12
![Page 15: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/15.jpg)
Secure Set Union
Consider n setsCompute,
Such that each entity ONLY knows U andnothing else.
nSSS ,........,,
21
UUU nSSSSU ,........,
321=
![Page 16: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/16.jpg)
Secure Union Set
Using the properties of CommutativeEncryption
For any permutation i, j the following holds
)...)((...)...)((...11
MEEMEEnjjnii KKKK =
!<== )...))((...)...)((...( 2111
MEEMEEPnjjnii KKKK
![Page 17: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/17.jpg)
Secure Set Union
Global Union Set U. Each site:
Encrypts its items Creates an array M[n] and adds it to U
Upon receiving U an entity should encrypt all items in U that it did notencrypt before.
In the end: all entries are encrypted with all keys Remove the duplicates:
Identical plain text will result the same cipher text regardless of the order of theuse of encryption keys.
Decryption U: Done by all entities in any order.
nKKK ,.....,,
21
![Page 18: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/18.jpg)
Secure Union Set
0 1 0 … . . .. . … )...),((...1
MEEnii
KK
1 2 3 ….. . .. .. . . . n
![Page 19: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/19.jpg)
A
A
C1
3
2 U= {E1(A)}
Problem:Computation Overhead, number of
exchanged messages O(n*m)
U= {E2(E1(A)),E2(C)}
U= {E3(E2(E1(A))),E3(E2(C)), E3(A)}
U= {E3(E2(E1(A))),E1(E3(E2(C))), E1(E3(A))}
U= {E3(E2(E1(A))),E1(E3(E2(C))), E2(E1(E3(A)))}
![Page 20: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/20.jpg)
Problems with SMC
Scalability High Overhead Details of the trust model assumptions
Users are honest and follow the protocol
![Page 21: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/21.jpg)
Randomization Approach
“Privacy Preserving Data Mining”, Argawal et. al [SIKDD]
Applied generally to provide estimates for data distributions ratherthan single point estimates
A user is allowed to alter the value provided to the aggregator
The alteration scheme should known to the aggregator
The aggregator Estimates the overall global distribution of input byremoving the randomization from the aggregate data
![Page 22: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/22.jpg)
Randomization Approach (ctnd.)
Assumptions: Users are willing to divulge some form of their data The aggregator is not malicious but may honestly
curious (they follow the protocol)
Two main data perturbation schemes Value- class membership (Discretization) Value distortion
![Page 23: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/23.jpg)
Randomization Methods
Value Distortion Method
Given a value the client is allowed to report a distorted valuewhere is a random variable drawn from a known
distribution
Uniform Distribution:
Gaussian Distribution:
ix
)( rxi + r
[ ]!!µ +"= ,,0
!µ ,0=
![Page 24: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/24.jpg)
Quantifying the privacy of differentrandomization Schemes
6.8 x σ3.92 x σ1.34 x σGaussian
0.999 x 2α0.95 x 2α0.5 x 2αUniform
0.999 x W0.95 x W0.5 x WDiscretization
Distribution99.9 %95 %50 %Confidence (α)
W
− α + α
Gaussian Distribution provides the best accuracy at higher confidence levels
![Page 25: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/25.jpg)
Problem Statement
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
EntityI
Entity II
Entity m
GlobalAggregator
kk yxyxyx1112121111
,.....,, +++
kk yxyxyx2222222121
,.....,, +++
mkmkmmmm yxyxyx +++ ,.....,,2211
nnyxyxyx +++ ,.....,,
2211 )(afY
Estimator
)(zf X
![Page 26: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/26.jpg)
Reconstruction of the Original Distribution
Reconstruction problem can be viewed in in the generalframework of the “Inverse Problems”
Inverse Problems: describing system internal structurefrom indirect noisy data.
Bayesian Estimation is an Effective tools for suchsettings
![Page 27: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/27.jpg)
Formal problem statement
Given one dimensional array of randomized data
Where ’s are iid random variables each with the same distributionas the random variable X
And ’s are realizations of a globally known random distributionwith CDF
Purpose: Estimate
nnyxyxyx +++ ,.....,,
2211
ix
iy
YF
XF
![Page 28: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/28.jpg)
Background: Bayesian Inference
An Estimation method that involves collecting observational dataand use it a tool to adjust (either support of refute) a prior belief.
The previous knowledge (hypothesis) has an established probabilitycalled (prior probability)
The adjusted hypothesis given the new observational data is called(posterior probability)
![Page 29: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/29.jpg)
Bayesian Inference Let the prior probability, then Bayes’ rule states
that the posterior probability of given anobservation (D) is given by:
Bayes rule is a cyclic application of the general form ofthe joint probability theorem:
)(
)()|()|( 00
0DP
HPHDPDHP =
)()|(),( 00 DPDHPHDP =
)( 0HP
)( 0H
![Page 30: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/30.jpg)
Bayesian Inference ( Classical Example)
Two Boxes: Box-I : 30 Red balls and 10 White Balls Box-II: 20 Red balls and 20 White Balls
A Person draws a Red Ball, what is the probability thatthe Ball is from Box-I
Prior Probability P(Box-I) = 0.5
From the data we know that: P(Red|Box-I) = 30/40 = 0.75 P(Red|Box-II) = 20/40 = 0.5
![Page 31: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/31.jpg)
Example (cntd.)
Now, given the new observation (The Red Ball)we want to know the posterior probability of Box-I(i.e P(Box-I | Red) )
)(
)()|()|(
REDP
IBoxPIBoxREDPREDIBoxP
!!=!
),(),()( IIBoxREDPIBoxREDPREDP !+!=
)()|()()|()( IIBoxPIIBoxREDPIBoxPIBoxREDPREDP !!+!!=
5.05.075.05.0)( !+!=REDP
![Page 32: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/32.jpg)
Example (cntd)
Computing the joint probability:
Substituting,
The posterior probability of Box-I is amplifies by the observation ofthe Red Ball
)()|()()|()( IIBoxPIIBoxREDPIBoxPIBoxREDPREDP !!+!!=
5.05.075.05.0)( !+!=REDP
6.05.05.075.05.0
5.075.0)|( =
!+!
!=" REDIBoxP
![Page 33: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/33.jpg)
Back: Formal problem statement
Given one dimensional array of randomized data
Where ‘s are iid random variables each with the same distributionas the random variable X
And ’s are realizations of a globally known random distributionwith CDF
Purpose: Estimate
nnyxyxyx +++ ,.....,,
2211
ix
iy
YF
XF
![Page 34: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/34.jpg)
Continuous probability distributions
)()()(}{ zFzCDFdkkfzrP X
z
X ===! " #$0}{ == zrP
1)( =!+"
"#dkkfX
![Page 35: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/35.jpg)
CDF and PDF
![Page 36: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/36.jpg)
Estimation of
Bayes Rule:
Posterior Probability
Applying Bayes rule
)(
)()|()|( 00
0DP
HPHDPDHP =
!=
"#==+$
az
zXX dzwYXzfaF )|()( 111
!"# +
+ ==
a
YX
XYX
Xwf
dzzfzXwfaF
)(
)()|()(
1
11
11
111
XF
![Page 37: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/37.jpg)
Estimation of
We want to evaluate
Substituting:
!"
"#
++ == dkkfkXwfwf XYXYX )()|()(11111 111
!!
"#
"
"#
+
+
=
==
a
XYX
XyX
X
dkkfkXwf
dzzfzXwfaF
)()|(
)()|()(
111
111
11
11
XF
)( 111wf YX +
![Page 38: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/38.jpg)
Estimation of
Simplification (independence):
XF
!
!"
"#
"#
#
#
=
dkkfzwf
dzzfzwf
aF
XiY
a
XiY
X
)()(
)()(
)(
![Page 39: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/39.jpg)
Estimation of
For all n observations:
!"
"
=#
#$
#$
$
$
=n
i
XiY
a
XiY
X
dkkfzwf
dzzfzwf
naF
1)()(
)()(1
)(
XF
![Page 40: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/40.jpg)
Estimation of the PDF
Is just the derivative of the CDF
Xf
Xf
!"
=#
#$
$
$=
n
i
XiY
XiYX
dkkfzwf
dzzfzwf
naf
1)()(
)()(1)(
![Page 41: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/41.jpg)
Algorithm
Uniform Distribution
While ( not Stopping Condition):
=:0
Xf
0:=j
!"
=#+
#$
+
$
$=
n
i j
XiY
j
XiYj
X
dzzfzwf
afawf
naf
1
1
)()(
)()(1:)(
1: += jj
![Page 42: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/42.jpg)
Stopping Criteria
The algorithm should terminate if:
For each round a goodness of fit test is performed.
Iteration is stopped when the difference between thetwo estimates is too small (lower that a certainthreshold)
)()(1 afaf j
X
j
X !+
2!
![Page 43: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/43.jpg)
Evaluation
Gaussian Randomizing Function µ=0, σ = 0.25
![Page 44: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/44.jpg)
Evaluation
Uniform Randomizing Function [-0.5,0.5]
![Page 45: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/45.jpg)
How is this different from KalmanEstimator?
Both are estimation techniques
Kalman is stateless
In Kalman filter case we knew the distribution andestimation is used to validate whether the trend of thedata matches that distribution
In Bayesian Inference the observation data is used toadjust the prior hypothesis (probability distribution)
![Page 46: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/46.jpg)
Is the Problem Solved?
Suppose a client randomizes Age records using auniform random variable [-50,50]
If the aggregator receives value 120, with 100%confidence it knows that actual age is
Simply randomization does not guarantee absoluteprivacy
70!
![Page 47: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/47.jpg)
How to achieve better randomizationscheme
“Limiting Privacy Breaches in PrivacyPreserving Data Mining” Evfimievski et al
Define an evaluation metric of how privacypreserving a scheme is.
Based on the developed metric, develop arandomization scheme that abides to this metric
![Page 48: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/48.jpg)
How Privacy preserving is a scheme?
Information Theoretic Approach:
Computes the average information disclose in a randomizedattribute by computing the mutual information between theactual and the randomized attribute
Privacy breach Defines a criteria that should be satisfied for a randomization
scheme to be privacy preserving
![Page 49: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/49.jpg)
What is a privacy breach?
A privacy breach occurs when thedisclosure of a randomized value to theaggregator reveals that a certain property
of the “individual” input holdswith high probability
iy
ix)(xQ
![Page 50: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/50.jpg)
Privacy Breach
Back to Bayes’
Prior Probability where isthe property
Posterior probability:
))(( xQP )(xQ
)|)(( iyxQP
![Page 51: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/51.jpg)
Amplification Is defined in terms of the transitive probability
where y is a fixed randomized outputvalue
Intuitive definition: if there are many ’s that can be mapped to y by
the randomizing scheme then disclosing y have giveslittle information about
We say we amplify the probability that
][ yxP !
ix
][ yxP !
ix
![Page 52: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/52.jpg)
Amplification factor
Let, R a randomization operator
a randomized value of x.Revealing R will not cause privacy breach if :
!>"
"
)1(
)1(
2
1
1
2
p
p
p
p
yVy!
![Page 53: Privacy Preserving Data Miningfabian/courses/CS600.624/slides/... · Overview What is Data Mining? Extracting implicit un-obvious patterns and relationships from a warehoused of data](https://reader034.vdocuments.net/reader034/viewer/2022050200/5f53bcf4618636597671add4/html5/thumbnails/53.jpg)
Summary
No one solution can fit all. Which area looks more promising? Can we create robust randomization
schemes to a wide scale of applicationsand different distributions of data?
How to deal with the case of Maliciousparticipants?