Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases
Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
Association Rule Mining
Privacy Preservation
Horizontally Distributed Datasets
Before we start mining!
Data Mining
• Extracting useful information: trends or patterns in large datasets
• Useful and unexpected insights
• Analyzing and predicting system behavior
• Scalability?
Related fields: Artificial Engineering, Machine Learning, Statistics, Database Systems
Association Rule Learning
Introduced by Rakesh Agrawal, IBM Almaden Research Center
What is an Association Rule?
• 80% of people who buy bread and butter also buy milk
• {Bread, Butter} → {Milk}
• Antecedent: {Bread, Butter}; Consequent: {Milk}
Definitions
• {Bread, Butter} → {Milk}
Antecedent
• The prerequisite for the rule to apply ({Bread, Butter})
Consequent
• The outcome ({Milk})
Support
• Percentage of transactions that contain the itemset
Confidence
• Fraction of the transactions containing the antecedent that also contain the consequent (the 80% in the example)
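Both measures can be checked on a toy basket list. This is a minimal Python sketch: the baskets are hypothetical, support is computed as a fraction of all transactions, and confidence as count(antecedent ∪ consequent) / count(antecedent).

```python
from typing import FrozenSet, List, Set

def count(itemset: Set[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

# Hypothetical baskets: 5 contain bread and butter, 4 of those also contain milk.
transactions = [frozenset(t) for t in (
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
)]

print(support({"bread", "butter", "milk"}, transactions))       # 4 of 6 transactions
print(confidence({"bread", "butter"}, {"milk"}, transactions))  # 0.8
```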
Constraints
• Two forms of constraints are used to generate the required association rules:
• Syntactic constraints: restrict the attributes (items) that may appear in a rule.
• Support constraints: require a rule to be supported by a minimum number of transactions in the set of transactions.
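A sketch of how the two constraint types might filter candidate rules. The concrete syntactic constraint used here (only `milk` may appear in the consequent), the baskets, and the threshold `min_count` are all hypothetical; the slide does not fix a concrete form.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)

def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)

def satisfies_constraints(rule: Rule, transactions: List[FrozenSet[str]],
                          allowed_consequent: FrozenSet[str], min_count: int) -> bool:
    antecedent, consequent = rule
    # Syntactic constraint (assumed form): only these items may appear in the consequent.
    if not consequent <= allowed_consequent:
        return False
    # Support constraint: at least `min_count` transactions contain every item of the rule.
    return support_count(antecedent | consequent, transactions) >= min_count

transactions = [frozenset(t) for t in (
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "butter", "milk"}, {"bread", "butter"}, {"bread", "jam"},
)]
candidates = [
    (frozenset({"bread", "butter"}), frozenset({"milk"})),  # passes both constraints
    (frozenset({"bread"}), frozenset({"jam"})),             # fails the syntactic constraint
    (frozenset({"jam"}), frozenset({"milk"})),              # fails the support constraint
]
kept = [r for r in candidates
        if satisfies_constraints(r, transactions, frozenset({"milk"}), min_count=2)]
print(kept)
```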
Association Rule Learning in Large Datasets
• Finding association rules in large datasets takes two steps:
Generating Large Itemsets
• Find all combinations of items (itemsets) whose support is above a minimum support threshold
Generating Association Rules
• From each large itemset, generate all rules that meet a minimum confidence threshold
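The two steps can be sketched with a level-wise (Apriori-style) search. This is a swapped-in illustration under stated assumptions: the slide does not name a specific algorithm, and the data and thresholds below are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

def large_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, int]:
    """Step 1: every itemset whose support is at least `min_support`, with its count."""
    min_count = min_support * len(transactions)
    candidates = [frozenset({item}) for item in {i for t in transactions for i in t}]
    large: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        large.update(level)
        k += 1
        # Join: unions of two large (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return large

def generate_rules(large: Dict[Itemset, int],
                   min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: split each large itemset into antecedent -> consequent; keep confident rules."""
    rules = []
    for itemset, n in large.items():
        for size in range(1, len(itemset)):
            for subset in combinations(itemset, size):
                antecedent = frozenset(subset)
                conf = n / large[antecedent]  # subsets of large itemsets are large too
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

transactions = [frozenset(t) for t in (
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "butter"}, {"bread", "jam"},
)]
large = large_itemsets(transactions, min_support=0.5)
rules = generate_rules(large, min_confidence=0.8)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```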
Association Rule Learning in Distributed Datasets
And Privacy Preservation
Preview
• Most tools for mining association rules assume that the data to be analyzed can be collected at one central site.
• However, privacy concerns can restrict the collection of data.
• Alternative mining methods therefore have to be devised for distributed datasets, to make the mining process feasible while ensuring privacy.
Privacy Preservation
• Dataset: combined data of Twitter and Facebook
• Rule: What percentage of people log in to a social networking site and post within the next 2 minutes?
Privacy Preservation
• Horizontally Partitioned (Example: Insurance Companies)
• Rule Being Mined: Does a procedure have an unusual rate of complication?
• Implications:
• A company may have a high number of cases of the procedure failing, and it may change its policies to help.
• At the same time, if this rule is exposed, it may be a huge problem for the company.
• The risks outweigh the gains.
Company A, Company B and Company C each hold different records (rows) with the same columns: Patient ID, Disease, Prescription, Effect
Privacy Preservation
• Vertically Partitioned
Credit Card No. Bought tablet
2365987545623526 1
3639871526589414 1
4365845698742563 1
5962845632561200 1
6621563289657412 1
Credit Card No. Bought TCover
2365987545623526 0
7639871526589414 1
4365845698742563 1
9962845632561200 0
6621563289657412 1
Common property (credit card number): not one we can exploit.
Mining of Association Rules
In Horizontally Partitioned Databases
Fundamental Steps
What we want
• Compute association rules without revealing private information, obtaining:
• The global support
• The global confidence
What we have
• Only the following information is available:
• Local support
• Local confidence
• Size of the database
Even this information may not be shared freely between sites. But we’ll get to that.
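Calculating Required Values
$$\text{support}_{AB \Rightarrow C} = \frac{\sum_{i=1}^{\text{sites}} \text{supportcount}_{ABC}(i)}{\sum_{i=1}^{\text{sites}} \text{database size}(i)}$$
$$\text{support}_{AB} = \frac{\sum_{i=1}^{\text{sites}} \text{supportcount}_{AB}(i)}{\sum_{i=1}^{\text{sites}} \text{database size}(i)}$$
$$\text{confidence}_{AB \Rightarrow C} = \frac{\text{support}_{AB \Rightarrow C}}{\text{support}_{AB}}$$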
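A minimal sketch of these formulas, assuming each site simply hands over its raw local counts and database size (the naive approach the following slide criticizes). The ABC counts and sizes reuse the Phase 2 example; the AB counts and the `SiteStats` type are hypothetical. `Fraction` keeps the ratios exact.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List

@dataclass
class SiteStats:
    count_abc: int  # local support count of itemset {A, B, C}
    count_ab: int   # local support count of itemset {A, B} (hypothetical values below)
    db_size: int    # number of transactions stored at the site

def global_support(counts: List[int], sizes: List[int]) -> Fraction:
    """Sum of local support counts divided by the sum of database sizes."""
    return Fraction(sum(counts), sum(sizes))

def global_confidence(sites: List[SiteStats]) -> Fraction:
    """support(AB => C) / support(AB), both computed over all sites."""
    sizes = [s.db_size for s in sites]
    support_ab_c = global_support([s.count_abc for s in sites], sizes)
    support_ab = global_support([s.count_ab for s in sites], sizes)
    return support_ab_c / support_ab

sites = [SiteStats(5, 10, 100), SiteStats(6, 8, 200), SiteStats(20, 25, 300)]
print(global_support([s.count_abc for s in sites], [s.db_size for s in sites]))  # 31/600
print(global_confidence(sites))  # 31/43
```

Note that every local count passes through whoever computes the sums, which is exactly the disclosure problem.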
Problems with the approach
• It protects individual privacy, but each site still has to disclose information.
• It reveals the local support and confidence of a rule at each site.
• If revealed, this information can be harmful to an organization.
Introducing the two Algorithms
• We will explore two algorithms that have been used.
• The first combines encryption with data distortion when data is shared between sites.
• The second uses a particular secure sum protocol, CK Secure Sum, to protect the data.
Algorithm Uno
Some people are honest
Two-Phase Algorithm
• Phase 1: Uses commutative encryption to mine the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of three or more parties)
Phase 1: Commutative Encryption
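One way to get a commutative cipher is exponentiation modulo a prime (Pohlig-Hellman style): E_a(E_b(x)) = x^(ab) mod p = E_b(E_a(x)). The sketch below assumes that cipher; the slide does not say which commutative scheme is used, and the modulus, keys and itemset labels are hypothetical toy values. It shows why commutativity helps: each site's locally large itemsets are encrypted by every site, in any order, so identical itemsets collapse in the union without anyone seeing another site's plaintext list.

```python
import hashlib
import math
import random

P = 2**61 - 1  # Mersenne prime modulus; a toy size, far too small for real security

def keygen(rng: random.Random) -> int:
    """Secret exponent coprime to P - 1, so x -> x^k mod P can be inverted."""
    while True:
        k = rng.randrange(3, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

def encode(item: str) -> int:
    """Map an itemset label to an integer in [1, P - 1]."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(x: int, key: int) -> int:
    return pow(x, key, P)

def decrypt(c: int, key: int) -> int:
    return pow(c, pow(key, -1, P - 1), P)  # modular inverse exponent; needs Python 3.8+

rng = random.Random(7)
keys = {site: keygen(rng) for site in "ABC"}
# Hypothetical locally large itemsets at each site.
local_large = {"A": {"ABC", "AB"}, "B": {"ABC", "BD"}, "C": {"ABC"}}

def encrypt_by_all(item: str, order: str) -> int:
    c = encode(item)
    for site in order:  # each site applies its own key in turn
        c = encrypt(c, keys[site])
    return c

# Each site encrypts first with its own key; the others add theirs afterwards.
orders = {"A": "ABC", "B": "BCA", "C": "CAB"}
union = {encrypt_by_all(item, orders[site])
         for site, items in local_large.items() for item in items}

# Commutativity makes equal itemsets collide, so the union has no duplicates.
# Decrypting with every key (in any order) then recovers the candidate itemsets.
lookup = {encode(i): i for items in local_large.values() for i in items}
decrypted = set()
for c in union:
    x = c
    for site in "CAB":
        x = decrypt(x, keys[site])
    decrypted.add(lookup[x])
print(len(union), sorted(decrypted))
```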
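Phase 2: Data Distortion
Site A: count(ABC) = 5, Size = 100
Site B: count(ABC) = 6, Size = 200
Site C: count(ABC) = 20, Size = 300
Random number R = 17; minimum support = 5%. Each site adds count - 5% × Size to the value it receives and passes the result on:
Site A: R + 5 - 5% × 100 = 17 + 5 - 5 = 17
Site B: 17 + 6 - 5% × 200 = 17 + 6 - 10 = 13
Site C: 13 + 20 - 5% × 300 = 13 + 20 - 15 = 18
18 ≥ R = 17, so ABC is globally large (18 - R = 31 - 5% × 600 = 1 ≥ 0).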
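The ring pass above can be replayed in a few lines. A sketch assuming the ring order A → B → C and exact arithmetic via `Fraction`; a fuller version would typically work modulo a large number so the intermediate values leak nothing, which is omitted here to match the slide.

```python
from fractions import Fraction
from typing import List, Tuple

def is_globally_large(sites: List[Tuple[int, int]], min_support: Fraction,
                      r: int) -> Tuple[bool, Fraction]:
    """Ring pass for one itemset. `sites` lists (local count, database size) in ring order.

    Returns the decision and the final value that arrives back at the first site.
    """
    value = Fraction(r)  # the first site starts from its random number R
    for count, size in sites:
        # Each site adds its local excess, count - min_support * size, and passes it on.
        value += count - min_support * size
    # value - R equals total count - min_support * total size, so compare with R.
    return value >= r, value

sites = [(5, 100), (6, 200), (20, 300)]  # Sites A, B, C from the slide
print(is_globally_large(sites, Fraction(5, 100), r=17))  # (True, Fraction(18, 1))
```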
Problems with the Algorithm
• Does not work for a two-party system
• Assumes honest parties
• Assumes Boolean (present/absent) values when counting support for rules, rather than a subjective or weighted approach.
• As the number of candidate itemsets increases, the encryption overhead increases.
• The encryption overhead is also directly proportional to the number of sites or partitions.
I got ……
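Why the first point holds: with only two sites, the ring has nothing to hide behind. A hypothetical sketch using the excess values (count - 5% × Size) of Sites A and B from the Phase 2 example; if A also knows B's database size, it recovers B's exact local count.

```python
def two_party_leak(r: int, excess_a: int, excess_b: int) -> int:
    """What site A can deduce about site B when only two sites run the Phase 2 ring."""
    to_b = r + excess_a          # A -> B: R plus A's excess count
    back_to_a = to_b + excess_b  # B -> A: B adds its own excess and returns the total
    # A knows R and its own excess, so B's excess falls out exactly.
    return back_to_a - r - excess_a

# Slide numbers at 5% minimum support: excess_a = 5 - 5 = 0, excess_b = 6 - 10 = -4
print(two_party_leak(r=17, excess_a=0, excess_b=-4))  # -4
```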
Algorithm Dua
Don’t trust anyone
CK Secure Sum
• Primarily used to tackle semi-honest sites.
• The data of each site is broken down into segments.
• The two neighbors of a node may collude to recover (hack) the value of the node between them.
• The neighbors are changed in each round, so colluding neighbors can obtain only one segment of a node's data.
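The collusion risk in the third bullet can be seen in a plain single-round ring secure sum. A sketch under assumptions: the modulus `M`, the private values and the ring order P1 → P2 → P3 → P4 are hypothetical.

```python
import random
from typing import List, Tuple

M = 10**9  # modulus for the masked running total (assumed; must exceed the true sum)

def ring_secure_sum(values: List[int], rng: random.Random) -> Tuple[int, List[int]]:
    """Single-round ring secure sum. Returns the total and the message each party sends."""
    mask = rng.randrange(M)  # random number chosen by the initiator (party 0)
    running = mask
    sent = []
    for v in values:  # party i receives sent[i - 1], adds its value, and sends sent[i]
        running = (running + v) % M
        sent.append(running)
    return (running - mask) % M, sent

values = [5, 6, 20, 7]  # hypothetical private values of P1..P4
total, sent = ring_secure_sum(values, random.Random(1))

# P2 and P4 are the neighbors of P3 (index 2): P2 knows what it sent to P3,
# P4 knows what it received from P3, so together they recover P3's value.
p3_value = (sent[2] - sent[1]) % M
print(total, p3_value)  # 38 20
```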
Changing Neighbors
Round 1: P1, P2, P3, P4
Round 2: P1, P2, P4, P3
Round 3: P1, P4, P2, P3
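The segment-and-rotate idea can be sketched as follows. Assumptions: each party splits its value into one segment per round, the three rounds in the figure are read as ring orders, and `M` and the values are hypothetical. This illustrates the bullets above, not the published protocol's exact message flow.

```python
import random
from typing import Dict, FrozenSet, List

M = 10**9  # modulus for masked totals (assumed; must exceed the true sum)

def split(value: int, k: int, rng: random.Random) -> List[int]:
    """Break a private value into k random segments that add up to it modulo M."""
    segments = [rng.randrange(M) for _ in range(k - 1)]
    segments.append((value - sum(segments)) % M)
    return segments

def ck_secure_sum(values: Dict[str, int], rounds: List[List[str]],
                  rng: random.Random) -> int:
    """One segment per party per round; the ring order (hence the neighbors) changes each round."""
    segments = {p: split(v, len(rounds), rng) for p, v in values.items()}
    total = 0
    for r, order in enumerate(rounds):
        mask = rng.randrange(M)  # random number chosen by the round's initiator, order[0]
        running = mask
        for party in order:
            running = (running + segments[party][r]) % M
        total = (total + running - mask) % M  # initiator removes its mask
    return total

def neighbors(order: List[str], party: str) -> FrozenSet[str]:
    """The two parties adjacent to `party` in a ring that follows `order`."""
    i = order.index(party)
    return frozenset({order[i - 1], order[(i + 1) % len(order)]})

rounds = [["P1", "P2", "P3", "P4"], ["P1", "P2", "P4", "P3"], ["P1", "P4", "P2", "P3"]]
values = {"P1": 5, "P2": 6, "P3": 20, "P4": 7}  # hypothetical private values

print(ck_secure_sum(values, rounds, random.Random(3)))  # 38
print([sorted(neighbors(o, "P3")) for o in rounds])  # [['P2', 'P4'], ['P1', 'P4'], ['P1', 'P2']]
```

Because P3 has a different pair of neighbors in every round, any one colluding pair sees at most one of its three segments.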
Conclusion
The moral of the story...
Before you leave
• Association rules play a vital role in data mining.
• Through careful analysis, items that appear unrelated can turn out to have a logical connection.
• This aspect of data mining can be very useful for predicting patterns and foreseeing trends in consumer behavior, choices and preferences.
• Association rules are one of the most effective ways for a business to reap the benefits of data mining.
There are no dumb questions
(No questions please shhhh…)