A Random Decision Tree Framework for Privacy-Preserving Data Mining

Jaideep Vaidya, Senior Member, IEEE, Basit Shafiq, Member, IEEE, Wei Fan, Member, IEEE, Danish Mehmood, and David Lorenzi

Abstract: Distributed data is ubiquitous in modern information-driven applications. With multiple sources of data, the natural challenge is to determine how to collaborate effectively across proprietary organizational boundaries while maximizing the utility of collected information. Since using only local data gives suboptimal utility, techniques for privacy-preserving collaborative knowledge discovery must be developed. Existing cryptography-based work for privacy-preserving data mining is still too slow to be effective for large-scale data sets to face today’s big data challenge. Previous work on random decision trees (RDT) shows that it is possible to generate equivalent and accurate models with much smaller cost. We exploit the fact that RDTs can naturally fit into a parallel and fully distributed architecture, and develop protocols to implement privacy-preserving RDTs that enable general and efficient distributed privacy-preserving knowledge discovery.

Index Terms: Privacy-preserving data mining, classification

1 INTRODUCTION

Building and applying any data mining model generally assumes that the underlying data is freely accessible. Often, this is not realistic. Privacy and security concerns restrict the sharing or centralization of data. Privacy-preserving data mining has emerged as an effective method to solve this problem [1]. Distributed solutions have been proposed that can preserve privacy while still enabling data mining. However, while perturbation-based solutions do not provide stringent privacy, cryptographic solutions are too inefficient and infeasible to enable truly large-scale analytics to face the era of big data.
In this paper, we propose a solution that uses both randomization and cryptographic techniques to provide improved efficiency and security for several decision tree-based learning tasks. Indeed, to the best of our knowledge, the proposed solution provides an order of magnitude improvement in efficiency over existing solutions while providing more security. This is an effective solution to privacy-preserving data mining for the big data challenge.

The proposed approach is based on random decision trees (RDT), developed by Fan et al. [2]. One important property of RDT is that the same code can be used for multiple data mining tasks: classification, regression, ranking and multiple classification [2], [3], [4]. As shown previously, RDT is an efficient implementation of the Bayes optimal classifier (BOC) [2], provides effective non-parametric density estimation [4], and can be explained via high order statistics such as moments [5]. The use of multiple RDTs in various learning tasks offers many benefits over other traditional classification/tree building techniques, because their structure and progression lend themselves to modification for distributed/parallel tasks. RDT is also an excellent candidate for use in privacy-preserving distributed data mining since:

1. Randomness in structure rather than simple perturbation of input/output is more effective: perturbing the input or output from a database to achieve privacy works, but the utility of the information garnered from data mining can be diminished if the perturbations are not carefully controlled, or conversely, information can be leaked if the information is not perturbed enough. Instead, we can exploit the design properties of RDT to generate trees that are random in structure, providing us with a similar end effect as perturbation without the associated pitfalls.
   A random structure provides security against leveraging a priori information to discover the entire classification model or instances.
2. Purely cryptographic approaches are often too slow to be practical and can become computationally expensive as the size of the data set increases and intercommunications between different parties increase. RDT provides a convenient escape from this paradigm thanks to its structural properties, more specifically, the fact that only specific nodes (the leaves) in the classification tree need to be encrypted/decrypted, and secure token passing prevents attackers from utilizing counting techniques to decipher instance classifications, as the branch structure of the tree is hidden from all parties.
3. RDT is a general approach in which the same code works for classification, regression, ranking

J. Vaidya and D. Lorenzi are with the MSIS Department, Rutgers University, 1 Washington Park, Newark, NJ 07102. E-mail: [email protected], dlorenzi@pegasus.rutgers.edu.
B. Shafiq and D. Mehmood are with the CS Department, Lahore University of Management Sciences, D.H.A., Lahore Cantt., Lahore 54792, Pakistan. E-mail: [email protected], [email protected].
W. Fan is with Huawei Noah’s Ark Lab, Core Building 2, Hong Kong Science Park, Shatin, Hong Kong. E-mail: [email protected].

Manuscript received 22 Dec. 2012; revised 6 July 2013; accepted 22 Sept. 2013. Date of publication 30 Sept. 2013; date of current version 17 Sept. 2014.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TDSC.2013.43

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 11, NO. 5, SEPTEMBER/OCTOBER 2014 399
1545-5971 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
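
The idea of trees that are "random in structure" can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary features, the function names (build_random_tree, train, predict_proba) are hypothetical, and it omits every cryptographic step. It only shows that the tree skeleton is chosen without looking at the data, that the leaves are the only nodes holding data-dependent values (class counts), and that a prediction averages the leaf class distributions over several trees.

```python
import random
from collections import Counter

def build_random_tree(features, depth, rng):
    """Build a tree skeleton by picking features at random; no data is consulted."""
    if depth == 0 or not features:
        return {"counts": Counter()}  # leaf: the only node with data-dependent state
    f = rng.choice(features)
    rest = [g for g in features if g != f]  # do not reuse a feature on the same path
    return {"feature": f,
            "children": {0: build_random_tree(rest, depth - 1, rng),
                         1: build_random_tree(rest, depth - 1, rng)}}

def find_leaf(tree, x):
    """Follow the (assumed binary) feature values of x down to a leaf."""
    while "feature" in tree:
        tree = tree["children"][x[tree["feature"]]]
    return tree

def train(trees, data):
    """Training only updates class counts at the leaves; the structure never changes."""
    for x, y in data:
        for t in trees:
            find_leaf(t, x)["counts"][y] += 1

def predict_proba(trees, x, classes):
    """Average the per-leaf class distributions over all trees that saw data there."""
    totals = {c: 0.0 for c in classes}
    used = 0
    for t in trees:
        counts = find_leaf(t, x)["counts"]
        n = sum(counts.values())
        if n == 0:
            continue  # this tree has no training instance in the reached leaf
        used += 1
        for c in classes:
            totals[c] += counts[c] / n
    if used == 0:
        return {c: 1.0 / len(classes) for c in classes}  # no evidence: uniform
    return {c: totals[c] / used for c in classes}

rng = random.Random(0)
trees = [build_random_tree([0, 1, 2], 3, rng) for _ in range(5)]
# Toy data: three binary features, label equals the first feature.
data = [((a, b, c), a) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
train(trees, data)
print(predict_proba(trees, (1, 0, 1), [0, 1]))
```

Because only the leaf counts depend on the training data in this sketch, they are the natural place to apply protection, which is consistent with the observation above that only the leaves need to be encrypted/decrypted.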
