animprovedsanitizationalgorithminprivacy-preserving...

14
Research Article An Improved Sanitization Algorithm in Privacy-Preserving Utility Mining XuanLiu , 1,2 Genlang Chen, 1,2 Shiting Wen, 1,2 and Guanghui Song 1,2 1 Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China 2 Zhejiang University Ningbo Research Institute, Ningbo 315100, China Correspondence should be addressed to Xuan Liu; [email protected] Received 13 July 2019; Revised 2 March 2020; Accepted 9 March 2020; Published 25 April 2020 Academic Editor: Qiuye Sun Copyright © 2020 Xuan Liu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. High-utility pattern mining is an effective technique that extracts significant information from varied types of databases. However, the analysis of data with sensitive private information may cause privacy concerns. To achieve better trade-off between utility maximizing and privacy preserving, privacy-preserving utility mining (PPUM) has become an important research topic in recent years. e MSICF algorithm is a sanitization algorithm for PPUM. It selects the item based on the conflict count and identifies the victim transaction based on the concept of utility. Although MSICF is effective, the heuristic selection strategy can be improved to obtain a lower ratio of side effects. In our paper, we propose an improved sanitization approach named the Improved Maximum Sensitive Itemsets Conflict First Algorithm (IMSICF) to address this issue. It dynamically calculates conflict counts of sensitive items in the sanitization process. In addition, IMSICF chooses the transaction with the minimum number of nonsensitive itemsets and the maximum utility in a sensitive itemset for modification. Extensive experiments have been conducted on various datasets to evaluate the effectiveness of our proposed algorithm. e results show that IMSICF outperforms other state-of-the-art al- gorithms in terms of minimizing side effects on nonsensitive information. Moreover, the influence of correlation among itemsets on various sanitization algorithms’ performance is observed. 1.Introduction Data mining is used to discover the decision-making knowl- edge and information from massive data [1–4]. In a cooperative project, data are shared among different companies for mutual benefits. However, this brings the risk of disclosing sensitive knowledge contained in a database [5]. Sensitive knowledge can be represented as a set of frequent patterns and high-utility patterns with security implication [6, 7]. us, data owner wants to hide sensitive information before a database is re- leased. To solve the problem, privacy-preserving data mining (PPDM) has been proposed and become an important research direction [8]. PPDM methods have been applied in various fields, such as cloud computing, e-health, wireless sensor networks, and location-based services [9]. One way to conceal sensitive knowledge is to sanitize a database by modifying some items in it. Atallah et al. [10] first proved that the optimal sensitive-knowledge-hiding problem is NP-hard and proposed a sanitization algorithm based on heuristic strategy. After that, a lot of works have been completed. However, the existing hiding approaches for protecting high-utility itemsets sanitize a database on the basis of the concept of utility. Meanwhile, the side effects on nonsensitive information are not taken into account. us, the damage to nonsensitive knowledge is serious, and da- tabase quality is low when a database is modified. To address this problem, we propose an improved approach called the Improved Minimum Sensitive Itemsets Conflict First Al- gorithm (IMSICF) for hiding sensitive high-utility itemsets. is algorithm is based on the MSICF algorithm and makes the following improvements: (1) For the victim item selection, the conflict count of each sensitive item is dynamically calculated in the sanitization process, which ensures that the item with the maximum conflict count is chosen to be sanitized. (2) For the victim transaction selection, the transaction supporting the least number of nonsensitive itemsets Hindawi Mathematical Problems in Engineering Volume 2020, Article ID 7489045, 14 pages https://doi.org/10.1155/2020/7489045

Upload: others

Post on 05-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

Research ArticleAn Improved Sanitization Algorithm in Privacy-PreservingUtility Mining

Xuan Liu 12 Genlang Chen12 Shiting Wen12 and Guanghui Song12

1Ningbo Institute of Technology Zhejiang University Ningbo 315100 China2Zhejiang University Ningbo Research Institute Ningbo 315100 China

Correspondence should be addressed to Xuan Liu liuxuan6105126com

Received 13 July 2019 Revised 2 March 2020 Accepted 9 March 2020 Published 25 April 2020

Academic Editor Qiuye Sun

Copyright copy 2020 Xuan Liu et al +is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

High-utility patternmining is an effective technique that extracts significant information from varied types of databases Howeverthe analysis of data with sensitive private information may cause privacy concerns To achieve better trade-off between utilitymaximizing and privacy preserving privacy-preserving utility mining (PPUM) has become an important research topic in recentyears +eMSICF algorithm is a sanitization algorithm for PPUM It selects the item based on the conflict count and identifies thevictim transaction based on the concept of utility AlthoughMSICF is effective the heuristic selection strategy can be improved toobtain a lower ratio of side effects In our paper we propose an improved sanitization approach named the Improved MaximumSensitive Itemsets Conflict First Algorithm (IMSICF) to address this issue It dynamically calculates conflict counts of sensitiveitems in the sanitization process In addition IMSICF chooses the transaction with the minimum number of nonsensitive itemsetsand the maximum utility in a sensitive itemset for modification Extensive experiments have been conducted on various datasetsto evaluate the effectiveness of our proposed algorithm +e results show that IMSICF outperforms other state-of-the-art al-gorithms in terms of minimizing side effects on nonsensitive information Moreover the influence of correlation among itemsetson various sanitization algorithmsrsquo performance is observed

1 Introduction

Data mining is used to discover the decision-making knowl-edge and information frommassive data [1ndash4] In a cooperativeproject data are shared among different companies for mutualbenefits However this brings the risk of disclosing sensitiveknowledge contained in a database [5] Sensitive knowledgecan be represented as a set of frequent patterns and high-utilitypatterns with security implication [6 7] +us data ownerwants to hide sensitive information before a database is re-leased To solve the problem privacy-preserving data mining(PPDM) has been proposed and become an important researchdirection [8] PPDM methods have been applied in variousfields such as cloud computing e-health wireless sensornetworks and location-based services [9]

One way to conceal sensitive knowledge is to sanitize adatabase by modifying some items in it Atallah et al [10]first proved that the optimal sensitive-knowledge-hidingproblem is NP-hard and proposed a sanitization algorithm

based on heuristic strategy After that a lot of works havebeen completed However the existing hiding approachesfor protecting high-utility itemsets sanitize a database on thebasis of the concept of utility Meanwhile the side effects onnonsensitive information are not taken into account +usthe damage to nonsensitive knowledge is serious and da-tabase quality is low when a database is modified To addressthis problem we propose an improved approach called theImproved Minimum Sensitive Itemsets Conflict First Al-gorithm (IMSICF) for hiding sensitive high-utility itemsets+is algorithm is based on the MSICF algorithm and makesthe following improvements

(1) For the victim item selection the conflict count ofeach sensitive item is dynamically calculated in thesanitization process which ensures that the item withthe maximum conflict count is chosen to be sanitized

(2) For the victim transaction selection the transactionsupporting the least number of nonsensitive itemsets

HindawiMathematical Problems in EngineeringVolume 2020 Article ID 7489045 14 pageshttpsdoiorg10115520207489045

and having the maximal utility of a sensitive itemsetis chosen to be modified which effectively reducesundesired side effects produced by the sanitizationprocess

(3) +e conflict degree is defined to reflect the corre-lation among sensitive itemsets Moreover the in-fluence of conflict degree on sanitizationperformance is observed

+e rest of the paper is organized as follows Section 2reviews related works In Section 3 preliminary knowledgeof high-utility itemsets mining is introduced Section 4describes the hidden strategy of the proposed sanitizationapproach Section 5 gives experimental results and analysisFinally conclusions are made in Section 6

2 Related Works

In this section related works on privacy-preserving datamining are reviewed

Most of previous studies focused on hiding sensitiveitemsets from frequent itemset mining approaches Verykioset al [11 12] proposed five approaches for achieving PPDM+e first three algorithms are used to protect sensitive as-sociation rules A sensitive rule is hidden by reducing itssupport or confidence +e last two algorithms are used toconceal sensitive itemsets However all of the five algorithmsonly hide rules that are supported by disjoint frequentitemsets Oliveira and Zaıane [13] presented a one-scanalgorithm that only needs to scan a database once +edisclosure threshold is introduced to balance privacy pro-tection and knowledge disclosure Amiri [14] developedthree algorithms namely aggregate disaggregate and hy-brid for hiding sensitive frequent itemsets Aggregateconceals sensitive itemsets by deleting transactions Disag-gregate sanitizes a database by removing some items +ehybrid algorithm is the combination of the previous twoalgorithms In terms of execution time and side effects thehybrid algorithm is recommended for database sanitization

Gkoulalas-Divanis and Verykios [15] introduced a newapproach for hiding sensitive itemsets by inserting somesynthetic transactions into an original database +e hybriddatabase is generated based on the constraints on the borderitemsets Wu and Huang [16] described two greedy ap-proaches namely greedy approximation algorithm andexhausted algorithm for concealing sensitive associationrules Both algorithms include the sanitization procedureand exposed procedure +e greedy approximation algo-rithm always outperforms the other one because the cost isrecalculated when items are modified However the greedyalgorithm takes a lot of time to expose missing rules

Hong et al [17] used the technique of term frequency-inverse document frequency (TF-IDF) to sanitize a database+e transaction with more sensitive items and less influenceon other transactions is selected for modification Howeverthe scalability of this approach is poor

Le et al [18 19] applied the lattice theory to hide sen-sitive association rules and two approaches called HCSRILand AARHIL were proposed based on the intersection lattice

of frequent itemsets Both algorithms select the victim itemwith the least impacts on the generating set for sanitization+e AARHIL algorithm has better performance thanHCSRIL in terms of missing cost since the victim transactionselection is improved Shah et al [20] adopted the geneticalgorithm to hide sensitive association rules +e fitnessfunction is used to evaluate whether to modify a transactionor not +e transaction with lower fitness will be modifiedwith higher probability because it contains more sensitiveitems and minimal number of data items

Cheng et al [21ndash25] applied a multiobjective optimi-zation algorithm into PPDM such as NSGA-II and Hype+e algorithms in [21ndash24] conceal sensitive association rulesby modifying some items +e sanitization approach in [25]hides sensitive itemsets by removing some items +e keyissue in optimization algorithm is to design the objectivefunctions which are based on the side effects on a databaseBesides Cheng et al [26] also proposed a greedy algorithmfor hiding sensitive rules +e information on nonsensitiveitemsets is considered in the selection of the victim trans-action +us the side effects are effectively reduced Lin et al[27] presented a multiobjective algorithm (NSGA2DT) forhiding sensitive itemsets with transaction deletion A FastSoRting strategy and the prelarge concept are utilized toaccelerate the iterative process

+e above methods focus on protecting sensitiveknowledge in frequent itemset mining However it is notsuitable to modify the quantities of items in a transactionaldatabase Recently various methods are developed forprotecting high-utility itemsets Moreover PPUM has be-come an important research issue Yeh et al [28 29] pre-sented the HHUIF and MSICF algorithms to concealsensitive high-utility itemsets Both algorithms sanitize theoriginal database based on the concept of utility Howeverthe MSICF algorithm takes the conflict count of sensitiveitems into account Rajalaxmi and Natarajan [30] identifythe victim transaction with the maximum number of sen-sitive items +en the item with the maximal utility is se-lected for modification However the impact onnonsensitive itemsets is discarded Lin et al [31] proposed aGA-based algorithm for PPUM by inserting some appro-priate transactions into an original database A function withthree factors is used to determine the transactions for in-sertion However this algorithm produces some spuriousitemsets after the sanitization process

Yun and Kim [32] presented an algorithm called FPUTTto improve the efficiency of the HHUIF algorithm +e treestructure is utilized to accelerate the sanitization processHowever the results of FPUTT in terms of side effects onnonsensitive knowledge are the same as those of HHUIF Linet al [33 34] then developed two approaches MSU_MAUand MSU_MIU for PPUM For each sensitive itemset SHithe transaction with the maximum utility of SHi is selectedfor modification +en the victim item is chosen based onthe maximum utility or minimum utility Lin et al [35]designed a genetic algorithm to hide sensitive HUIs bytransaction deletion +e prelarge concept is adopted toaccelerate the evolution process However some spuriousitemsets are produced by the sanitization process Li et al

2 Mathematical Problems in Engineering

[36] formulate the hiding process as a constraint satisfactionproblem Integer linear programming is adopted in thedesigned algorithm to obtain a lower ratio of side effectsproduced in the hiding process

Rajalaxmi and Natarajan [37] proposed two approachesnamed MSMU and MCRSU to hide the sensitive frequentand utility itemsets Both algorithms conceal the itemsetsuntil their support and utility fall below the given thresholdsrespectively Liu et al [38] presented a novel sanitizationalgorithm called HUFI to conceal sensitive frequent andutility itemsets +e concept of maximum boundary value isintroduced to determine the hidden strategy +us theapproach outperforms the other algorithms in minimizingthe side effects Besides the above works Le et al [39]proposed an efficient algorithm for hiding high-utility se-quential patterns which relies on a novel structure to en-hance the sanitization process

3 Preliminaries

Some preliminary definitions of high-utility itemsets miningare introduced in this section [40 41] In addition thesanitization problem is described [42 43]

31 Definitions Let I I1 I2 Im1113864 1113865 be a set of distinctitems Let D T1 T2 Tn1113864 1113865 be a transaction databasewhere each transaction Ti has a unique identifier TID andTi sube I Each item has an external utility value which reflectsthe importance of an item An itemset is a subset of I and itis called a k-itemset if it contains k items

Definition 1 Each item it is assigned an external utilityvalue which is denoted as eu(it) For example in Table 1eu(b) 3

Definition 2 Each item it in a transaction T is assigned aninternal utility value which is denoted as iu(it T) Forexample in Table 1 iu(b T1) 1

Definition 3 +e utility of item it in a transaction T isdenoted as u(it T) and defined as iu(it T) times eu(it) Forexample in Table 1 u(d T2) 1 times 6 6

Definition 4 +e utility of itemset SHi in a transaction T isdenoted as u(SHi T) and defined as 1113936itisinSHi

u(it T) Forexample in Table 1 u( b c T3) 3 + 4 7

Definition 5 +e utility of itemset SHi is denoted as u(SHi)

and defined as 1113936SHiisinTu(SHi T) For example in Table 1u( b c ) 7 + 16 + 19 42

Definition 6 +e utility of a transaction T is denoted astu(T) and defined as 1113936itisinTu(it T) For example in Table 1u(T2) 2 + 6 8

Definition 7 +e user-specified minimum utility thresholdis denoted as minutil A pattern X is a high-utility itemset ifu(X)geminutil Otherwise it is a low-utility itemset High-

utility itemset mining is to discover the itemsets whoseutility values are beyond minutil

32 Sanitization Problem Description Given a transactiondatabase D the minimum utility threshold minutil and thehigh-utility itemsets mined from D based on minutil SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive high-utility itemsetswhere SHi is the itemset that needs to be hidden A sensitivetransaction refers to the transaction supporting at least onesensitive itemset+e sanitization problem is to transform anoriginal database D to a sanitized database Dprime so that allsensitive itemsets are hidden while at the same time the sideeffects on the database and nonsensitive knowledge areminimized

One way to sanitize a database is to modify some items init +e modified item contained in D is the victim itemwhich is denoted as Ivic +e transaction supporting Ivic isthe victim transaction which is denoted as Tvic

4 The Hiding Approach

In this section a sanitization approach named the ImprovedMaximum Sensitive Itemsets Conflict First Algorithm(IMSICF) is presented in detail +e victim item is selectedbased on the conflict count which is calculated dynamicallyMoreover the victim transaction is selected based on theside effects on nonsensitive knowledge which effectivelyreduces the missing costs To better illustrate how theIMSICF algorithm works an example is given

41 e Sanitization Process of Hiding Sensitive Itemsets

Definition 9 Let SH SH1 SH2 SHt1113864 1113865 be a set ofsensitive high-utility itemsets +e conflict count of a sen-sitive item it in SH is denoted as Icount(it) and defined as| SHi isin SH | it isin SHi1113864 1113865| that is Icount(it) is the number ofsensitive itemsets containing it +e conflict degree of SH isdefined as

1113936itisinSHIcount it( 1113857

ni

(1)

where ni refers to the number of distinct items contained inSH +e conflict degree reflects the correlation amongsensitive itemsets +e higher conflict degree indicates thatsensitive itemsets have more common items

For example given a set of sensitive itemsetsSH b e e f1113864 11138651113864 1113865 the conflict degree of SH is 43 becauseIcount(b) 1 Icount(e) 2 and Icount(f) 1

Table 1 A transaction database (left) External utility value (right)

TID Transaction (item iu) Item euT1 (a 2) (b 1) (e 3) a 5T2 (c 1) (d 6) b 3T3 (b 1) (c 2) (e 1) (f 1) c 2T4 (a 3) (b 4) (c 2) (d 2) (e 5) d 1T5 (b 3) (c 5) e 6T6 (a 2) (e 7) (f 3) f 10

Mathematical Problems in Engineering 3

Definition 10 Let SHi be a sensitive itemset and minutil bethe minimum utility threshold To hide SHi the minimumutility to be reduced is defined as u(SHi) minus minutil + 1 anddenoted as diffu where u(SHi) is the utility of SHi

Let D be a database and SHi a sensitive itemset To hideSHi the utility of SHi should be reduced until it falls below theminimum utility threshold namely diffu of SHi should belower than or equal to zero +e strategy of hiding SHi is tosanitize some items in the selected transactions in D +esanitization process of hiding SHi is shown in Figure 1 Firstthe victim item is identified +en a sensitive itemset sup-porting the victim item is determined Next the victimtransaction is selected to bemodified After the victim item andtransaction are determined an original database is sanitized Inthe following the sanitization process is described in detail

411 e Victim Item Selection Let D be a database SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive itemsets In theMSICF algorithm the items in SH are sorted according tothe descending order of conflict counts +en the victimitem is selected based on the sorted results Because the orderof sorted items is fixed the victim item selection cannot bechanged if an original database is modified +us we im-prove the selection of the victim item In our approach theitem with the maximum conflict count is selected to besanitized that is Ivic argmaxitisinSHIcount(it) Once a sen-sitive itemset is hidden the conflict count of each sensitiveitem is recalculated to prevent other sensitive itemsets frombeing hidden in the sanitization process In this way we canmake sure that a victim item has the maximum conflictcount in the sanitization process

412 e Victim Transaction Selection Let D be a databaseand SHi a sensitive itemset For the MSICF algorithm thetransaction with the maximum utility of a victim item isselected to be a victim transaction However the damage tononsensitive knowledge is not taken into account To reducethe side effects on nonsensitive information the transactionthat causes the minimum impact should be modified withpriority +us we assign a transaction weight to each sen-sitive transaction which is used to determine the victimtransaction +e transaction weight is computed as

tw(T) u SHi T( 1113857

1 + NSHC(T) (2)

where u(SHi T) is the utility of SHi in transaction T andNSHC(T) is the number of nonsensitive itemsets supportedby T +e transaction with the maximum utility of SHi

indicates that the deletion of a victim item will decreasemore utility Moreover the transaction supporting theminimumnumber of nonsensitive itemsets indicates that themodification of it will generate less side effects +us thetransaction having the maximum transaction weight wouldbe sanitized first Because NSHC(T) is zero when a trans-action does not support any nonsensitive itemset we set thedenominator of Formula (2) to NSHC(T) + 1

413 e Original Database Sanitization Let SHi be asensitive itemset Ivic a victim item and Tvic a victimtransaction If diffu of SHi is not greater thanu(Ivic Tvic) minus eu(Ivic) it indicates that the victim item is notremoved from a victim transaction +en iu(Ivic) is reducedto iu(Ivic) minus 1113862diffueu(Ivic)1113863 where eu(Ivic) is the externalutility of Ivic Correspondingly diffu is decreased todiffu minus eu(Ivic)lowast 1113862diffueu(Ivic)1113863 Otherwise if Ivic is re-moved from the victim transaction diffu is updated todiffu minus u(SHi Tvic) rather than diffu minus u(Ivic Tvic) +ereason is that SHi is not supported by Tvic if Ivic is removedfrom Tvic

42e Sketch of the IMSICF Algorithm +e pseudocode ofthe IMSICF algorithm is shown in Algorithm 1 Initially theconflict count of each sensitive item in SH is calculated (Line2) +en the item with the maximum conflict count is se-lected to be victim item Ivic (Line 3) After the sanitized itemis identified the sensitive itemsets containing Ivic are hiddenone by one For a sensitive itemset SHi the minimum utilityto be reduced is computed (Line 5) +e utility of SHi isreduced until diffu is less than or equal to zero To hide SHithe sensitive transactions of SHi are identified +en thetransaction weight of each transaction is calculatedaccording to Formula (2) +e transaction with the maxi-mum weight is selected to be victim transaction Tvic +is isreasonable since the minimum side effects on nonsensitiveinformation are generated by modifying the selected Tvic(Line 7-8) Next the victim item in Tvic is modified and thedatabase and itemsets are updated respectively (Line 9ndash16)If diffu of SHi is not greater than zero the sensitive itemset isremoved from SH (Line 18) +e algorithm is terminateduntil all sensitive itemsets are hidden

43 An Illustrative Example For a given transaction data-base in Table 1 the high-utility itemsets derived at minutil

60 are listed in Table 2 +e user-specified sensitive itemsetsare b e ande f which are identified in boldface in Table 2+e proposed algorithm (IMSICF) is applied to hide sen-sitive itemsets

To hide the sensitive itemsets SH b e e f1113864 11138651113864 1113865 theconflict count of each item contained in SH is calculated+eresults are Icount(b) 1 Icount(e) 2 and Icount(f) 1Item e is selected to be the victim item because it has themaximum conflict count +en the sensitive itemset forhiding is randomly selected from among the onescontaining the victim item Let us assume that the selecteditemset is b e +e minimum utility to be reduced isdiffu 72 minus 60 + 1 13 and the sensitive transactions areST T1 T3 T41113864 1113865+e transaction weight of each transactionis assigned according to Formula (2) Becausetw(T1) 214 tw(T3) 92 and tw(T4) 426 thetransaction T4 is chosen for modification After the victimitem and transaction are identified the item is sanitizedaccording to the original database sanitization method de-scribed in Section 31 Because diffult u(e T4) minus eu(e) theinternal utility of item e is updated toiu(e T4) minus 1113862diffueu(e)1113863 2 +e utility of b e is reduced

4 Mathematical Problems in Engineering

to 54 +us b e is concealed and the nonsensitive itemsetsa b c e and a b c d e are falsely hidden +en e f ishidden by following the above steps After all of the sensitiveitemsets are hidden four nonsensitive itemsets namely ab c e a b c d e e and a e f are hidden

5 Experimental Analysis

To evaluate the performance of the IMSICF algorithm aseries of experiments have been conducted on various realand synthetic datasets in which IMSICF is compared withthe state-of-the-art algorithms Besides the experimentalresults are discussed in this section

51 ExperimentalData +e experiments were conducted ona 28 GHz Intel Xeon E5-2360 processor with 8GB RAM Toevaluate the performance of the proposed algorithm fourstate-of-the-art sanitization algorithms namely HHUIFMSICF MSU-MIU and MSU-MAU were used for com-parison Because the PPUMGAT algorithm hides sensitiveitemsets by transaction deletion we do not compare theproposed algorithm with PPUMGAT All of the algorithmswere implemented in Java language Four datasets [33] wereused to run the programs +e characteristics of thesedatasets are displayed in Table 3 +e density is measured asthe average transaction length divided by the number ofitems For each dataset the external utility values weregenerated with the Gaussian normal distribution and theinternal utility values are the random numbers ranging from1 to 10 +e EFIM algorithm [44] was used to mine high-utility itemsets and the minimum utility thresholds formushroom Foodmart T25I10D10K and T20I6D100K wereset at 866 0045 024 and 017 respectively +esensitive itemsets were randomly selected from the mineditemsets

+e proposed algorithm identifies the victim item basedon the conflict count of each sensitive item+us the conflictdegree of the sensitive itemsets is presented to observe howthe correlation among the itemsets influences the perfor-mance of the sanitization algorithms Besides the sensitive

Victim item selection A sensitive itemset Victim transaction

selectionSanitize the

original database

Figure 1 +e sanitization process for hiding a sensitive itemset

Input a database D a set of sensitive itemsets SH SH1 SH2 SHt1113864 1113865 the given utility threshold minutilOutput the sanitized database Dprime

(1) while SHneempty(2) Calculate Icount(ip) of each sensitive item ip ip isin SHi(3) Ivic argmaxipisinSHIcount(ip)

(4) for each SHi isin SHand Ivic isin SHi

(5) diffu u(SHi) minus minutil + 1(6) while diffugt 0(7) Find the sensitive transactions ST SHi sube ST(8) Tvic argmaxTisinSTtw(T)

(9) if diffugt u(Ivic Tvic) minus eu(Ivic)

(10) Delete Ivic(11) diffu diffu minus u(SHi Tvic)

(12) else(13) iu(Ivic) iu(Ivic) minus 1113862diffueu(Ivic)1113863

(14) diffu diffu minus eu(Ivic)lowast1113862diffueu(Ivic)1113863

(15) end if(16) Update the database and the itemsets(17) end while(18) SH SH minus SHi

(19) end for(20) end while

ALGORITHM 1 +e IMSICF algorithm

Table 2 Derived high-utility itemsets

HID Itemsets Utility1 e 962 a e 1253 b e 724 e f 885 a b e 886 a e f 827 a b c e 618 a b c d e 63

Mathematical Problems in Engineering 5

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 2: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

and having the maximal utility of a sensitive itemsetis chosen to be modified which effectively reducesundesired side effects produced by the sanitizationprocess

(3) +e conflict degree is defined to reflect the corre-lation among sensitive itemsets Moreover the in-fluence of conflict degree on sanitizationperformance is observed

+e rest of the paper is organized as follows Section 2reviews related works In Section 3 preliminary knowledgeof high-utility itemsets mining is introduced Section 4describes the hidden strategy of the proposed sanitizationapproach Section 5 gives experimental results and analysisFinally conclusions are made in Section 6

2 Related Works

In this section related works on privacy-preserving datamining are reviewed

Most of previous studies focused on hiding sensitiveitemsets from frequent itemset mining approaches Verykioset al [11 12] proposed five approaches for achieving PPDM+e first three algorithms are used to protect sensitive as-sociation rules A sensitive rule is hidden by reducing itssupport or confidence +e last two algorithms are used toconceal sensitive itemsets However all of the five algorithmsonly hide rules that are supported by disjoint frequentitemsets Oliveira and Zaıane [13] presented a one-scanalgorithm that only needs to scan a database once +edisclosure threshold is introduced to balance privacy pro-tection and knowledge disclosure Amiri [14] developedthree algorithms namely aggregate disaggregate and hy-brid for hiding sensitive frequent itemsets Aggregateconceals sensitive itemsets by deleting transactions Disag-gregate sanitizes a database by removing some items +ehybrid algorithm is the combination of the previous twoalgorithms In terms of execution time and side effects thehybrid algorithm is recommended for database sanitization

Gkoulalas-Divanis and Verykios [15] introduced a newapproach for hiding sensitive itemsets by inserting somesynthetic transactions into an original database +e hybriddatabase is generated based on the constraints on the borderitemsets Wu and Huang [16] described two greedy ap-proaches namely greedy approximation algorithm andexhausted algorithm for concealing sensitive associationrules Both algorithms include the sanitization procedureand exposed procedure +e greedy approximation algo-rithm always outperforms the other one because the cost isrecalculated when items are modified However the greedyalgorithm takes a lot of time to expose missing rules

Hong et al [17] used the technique of term frequency-inverse document frequency (TF-IDF) to sanitize a database+e transaction with more sensitive items and less influenceon other transactions is selected for modification Howeverthe scalability of this approach is poor

Le et al [18 19] applied the lattice theory to hide sen-sitive association rules and two approaches called HCSRILand AARHIL were proposed based on the intersection lattice

of frequent itemsets Both algorithms select the victim itemwith the least impacts on the generating set for sanitization+e AARHIL algorithm has better performance thanHCSRIL in terms of missing cost since the victim transactionselection is improved Shah et al [20] adopted the geneticalgorithm to hide sensitive association rules +e fitnessfunction is used to evaluate whether to modify a transactionor not +e transaction with lower fitness will be modifiedwith higher probability because it contains more sensitiveitems and minimal number of data items

Cheng et al [21ndash25] applied a multiobjective optimi-zation algorithm into PPDM such as NSGA-II and Hype+e algorithms in [21ndash24] conceal sensitive association rulesby modifying some items +e sanitization approach in [25]hides sensitive itemsets by removing some items +e keyissue in optimization algorithm is to design the objectivefunctions which are based on the side effects on a databaseBesides Cheng et al [26] also proposed a greedy algorithmfor hiding sensitive rules +e information on nonsensitiveitemsets is considered in the selection of the victim trans-action +us the side effects are effectively reduced Lin et al[27] presented a multiobjective algorithm (NSGA2DT) forhiding sensitive itemsets with transaction deletion A FastSoRting strategy and the prelarge concept are utilized toaccelerate the iterative process

+e above methods focus on protecting sensitiveknowledge in frequent itemset mining However it is notsuitable to modify the quantities of items in a transactionaldatabase Recently various methods are developed forprotecting high-utility itemsets Moreover PPUM has be-come an important research issue Yeh et al [28 29] pre-sented the HHUIF and MSICF algorithms to concealsensitive high-utility itemsets Both algorithms sanitize theoriginal database based on the concept of utility Howeverthe MSICF algorithm takes the conflict count of sensitiveitems into account Rajalaxmi and Natarajan [30] identifythe victim transaction with the maximum number of sen-sitive items +en the item with the maximal utility is se-lected for modification However the impact onnonsensitive itemsets is discarded Lin et al [31] proposed aGA-based algorithm for PPUM by inserting some appro-priate transactions into an original database A function withthree factors is used to determine the transactions for in-sertion However this algorithm produces some spuriousitemsets after the sanitization process

Yun and Kim [32] presented an algorithm called FPUTTto improve the efficiency of the HHUIF algorithm +e treestructure is utilized to accelerate the sanitization processHowever the results of FPUTT in terms of side effects onnonsensitive knowledge are the same as those of HHUIF Linet al [33 34] then developed two approaches MSU_MAUand MSU_MIU for PPUM For each sensitive itemset SHithe transaction with the maximum utility of SHi is selectedfor modification +en the victim item is chosen based onthe maximum utility or minimum utility Lin et al [35]designed a genetic algorithm to hide sensitive HUIs bytransaction deletion +e prelarge concept is adopted toaccelerate the evolution process However some spuriousitemsets are produced by the sanitization process Li et al

2 Mathematical Problems in Engineering

[36] formulate the hiding process as a constraint satisfactionproblem Integer linear programming is adopted in thedesigned algorithm to obtain a lower ratio of side effectsproduced in the hiding process

Rajalaxmi and Natarajan [37] proposed two approachesnamed MSMU and MCRSU to hide the sensitive frequentand utility itemsets Both algorithms conceal the itemsetsuntil their support and utility fall below the given thresholdsrespectively Liu et al [38] presented a novel sanitizationalgorithm called HUFI to conceal sensitive frequent andutility itemsets +e concept of maximum boundary value isintroduced to determine the hidden strategy +us theapproach outperforms the other algorithms in minimizingthe side effects Besides the above works Le et al [39]proposed an efficient algorithm for hiding high-utility se-quential patterns which relies on a novel structure to en-hance the sanitization process

3 Preliminaries

Some preliminary definitions of high-utility itemsets miningare introduced in this section [40 41] In addition thesanitization problem is described [42 43]

31 Definitions Let I I1 I2 Im1113864 1113865 be a set of distinctitems Let D T1 T2 Tn1113864 1113865 be a transaction databasewhere each transaction Ti has a unique identifier TID andTi sube I Each item has an external utility value which reflectsthe importance of an item An itemset is a subset of I and itis called a k-itemset if it contains k items

Definition 1 Each item it is assigned an external utilityvalue which is denoted as eu(it) For example in Table 1eu(b) 3

Definition 2 Each item it in a transaction T is assigned aninternal utility value which is denoted as iu(it T) Forexample in Table 1 iu(b T1) 1

Definition 3 +e utility of item it in a transaction T isdenoted as u(it T) and defined as iu(it T) times eu(it) Forexample in Table 1 u(d T2) 1 times 6 6

Definition 4 +e utility of itemset SHi in a transaction T isdenoted as u(SHi T) and defined as 1113936itisinSHi

u(it T) Forexample in Table 1 u( b c T3) 3 + 4 7

Definition 5 +e utility of itemset SHi is denoted as u(SHi)

and defined as 1113936SHiisinTu(SHi T) For example in Table 1u( b c ) 7 + 16 + 19 42

Definition 6 +e utility of a transaction T is denoted astu(T) and defined as 1113936itisinTu(it T) For example in Table 1u(T2) 2 + 6 8

Definition 7 +e user-specified minimum utility thresholdis denoted as minutil A pattern X is a high-utility itemset ifu(X)geminutil Otherwise it is a low-utility itemset High-

utility itemset mining is to discover the itemsets whoseutility values are beyond minutil

32 Sanitization Problem Description Given a transactiondatabase D the minimum utility threshold minutil and thehigh-utility itemsets mined from D based on minutil SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive high-utility itemsetswhere SHi is the itemset that needs to be hidden A sensitivetransaction refers to the transaction supporting at least onesensitive itemset+e sanitization problem is to transform anoriginal database D to a sanitized database Dprime so that allsensitive itemsets are hidden while at the same time the sideeffects on the database and nonsensitive knowledge areminimized

One way to sanitize a database is to modify some items init +e modified item contained in D is the victim itemwhich is denoted as Ivic +e transaction supporting Ivic isthe victim transaction which is denoted as Tvic

4 The Hiding Approach

In this section a sanitization approach named the ImprovedMaximum Sensitive Itemsets Conflict First Algorithm(IMSICF) is presented in detail +e victim item is selectedbased on the conflict count which is calculated dynamicallyMoreover the victim transaction is selected based on theside effects on nonsensitive knowledge which effectivelyreduces the missing costs To better illustrate how theIMSICF algorithm works an example is given

41 e Sanitization Process of Hiding Sensitive Itemsets

Definition 9 Let SH SH1 SH2 SHt1113864 1113865 be a set ofsensitive high-utility itemsets +e conflict count of a sen-sitive item it in SH is denoted as Icount(it) and defined as| SHi isin SH | it isin SHi1113864 1113865| that is Icount(it) is the number ofsensitive itemsets containing it +e conflict degree of SH isdefined as

1113936itisinSHIcount it( 1113857

ni

(1)

where ni refers to the number of distinct items contained inSH +e conflict degree reflects the correlation amongsensitive itemsets +e higher conflict degree indicates thatsensitive itemsets have more common items

For example given a set of sensitive itemsetsSH b e e f1113864 11138651113864 1113865 the conflict degree of SH is 43 becauseIcount(b) 1 Icount(e) 2 and Icount(f) 1

Table 1 A transaction database (left) External utility value (right)

TID Transaction (item iu) Item euT1 (a 2) (b 1) (e 3) a 5T2 (c 1) (d 6) b 3T3 (b 1) (c 2) (e 1) (f 1) c 2T4 (a 3) (b 4) (c 2) (d 2) (e 5) d 1T5 (b 3) (c 5) e 6T6 (a 2) (e 7) (f 3) f 10

Mathematical Problems in Engineering 3

Definition 10 Let SHi be a sensitive itemset and minutil bethe minimum utility threshold To hide SHi the minimumutility to be reduced is defined as u(SHi) minus minutil + 1 anddenoted as diffu where u(SHi) is the utility of SHi

Let D be a database and SHi a sensitive itemset To hideSHi the utility of SHi should be reduced until it falls below theminimum utility threshold namely diffu of SHi should belower than or equal to zero +e strategy of hiding SHi is tosanitize some items in the selected transactions in D +esanitization process of hiding SHi is shown in Figure 1 Firstthe victim item is identified +en a sensitive itemset sup-porting the victim item is determined Next the victimtransaction is selected to bemodified After the victim item andtransaction are determined an original database is sanitized Inthe following the sanitization process is described in detail

411 e Victim Item Selection Let D be a database SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive itemsets In theMSICF algorithm the items in SH are sorted according tothe descending order of conflict counts +en the victimitem is selected based on the sorted results Because the orderof sorted items is fixed the victim item selection cannot bechanged if an original database is modified +us we im-prove the selection of the victim item In our approach theitem with the maximum conflict count is selected to besanitized that is Ivic argmaxitisinSHIcount(it) Once a sen-sitive itemset is hidden the conflict count of each sensitiveitem is recalculated to prevent other sensitive itemsets frombeing hidden in the sanitization process In this way we canmake sure that a victim item has the maximum conflictcount in the sanitization process

412 e Victim Transaction Selection Let D be a databaseand SHi a sensitive itemset For the MSICF algorithm thetransaction with the maximum utility of a victim item isselected to be a victim transaction However the damage tononsensitive knowledge is not taken into account To reducethe side effects on nonsensitive information the transactionthat causes the minimum impact should be modified withpriority +us we assign a transaction weight to each sen-sitive transaction which is used to determine the victimtransaction +e transaction weight is computed as

tw(T) u SHi T( 1113857

1 + NSHC(T) (2)

where u(SHi T) is the utility of SHi in transaction T andNSHC(T) is the number of nonsensitive itemsets supportedby T +e transaction with the maximum utility of SHi

indicates that the deletion of a victim item will decreasemore utility Moreover the transaction supporting theminimumnumber of nonsensitive itemsets indicates that themodification of it will generate less side effects +us thetransaction having the maximum transaction weight wouldbe sanitized first Because NSHC(T) is zero when a trans-action does not support any nonsensitive itemset we set thedenominator of Formula (2) to NSHC(T) + 1

413 e Original Database Sanitization Let SHi be asensitive itemset Ivic a victim item and Tvic a victimtransaction If diffu of SHi is not greater thanu(Ivic Tvic) minus eu(Ivic) it indicates that the victim item is notremoved from a victim transaction +en iu(Ivic) is reducedto iu(Ivic) minus 1113862diffueu(Ivic)1113863 where eu(Ivic) is the externalutility of Ivic Correspondingly diffu is decreased todiffu minus eu(Ivic)lowast 1113862diffueu(Ivic)1113863 Otherwise if Ivic is re-moved from the victim transaction diffu is updated todiffu minus u(SHi Tvic) rather than diffu minus u(Ivic Tvic) +ereason is that SHi is not supported by Tvic if Ivic is removedfrom Tvic

42e Sketch of the IMSICF Algorithm +e pseudocode ofthe IMSICF algorithm is shown in Algorithm 1 Initially theconflict count of each sensitive item in SH is calculated (Line2) +en the item with the maximum conflict count is se-lected to be victim item Ivic (Line 3) After the sanitized itemis identified the sensitive itemsets containing Ivic are hiddenone by one For a sensitive itemset SHi the minimum utilityto be reduced is computed (Line 5) +e utility of SHi isreduced until diffu is less than or equal to zero To hide SHithe sensitive transactions of SHi are identified +en thetransaction weight of each transaction is calculatedaccording to Formula (2) +e transaction with the maxi-mum weight is selected to be victim transaction Tvic +is isreasonable since the minimum side effects on nonsensitiveinformation are generated by modifying the selected Tvic(Line 7-8) Next the victim item in Tvic is modified and thedatabase and itemsets are updated respectively (Line 9ndash16)If diffu of SHi is not greater than zero the sensitive itemset isremoved from SH (Line 18) +e algorithm is terminateduntil all sensitive itemsets are hidden

43 An Illustrative Example For a given transaction data-base in Table 1 the high-utility itemsets derived at minutil

60 are listed in Table 2 +e user-specified sensitive itemsetsare b e ande f which are identified in boldface in Table 2+e proposed algorithm (IMSICF) is applied to hide sen-sitive itemsets

To hide the sensitive itemsets SH b e e f1113864 11138651113864 1113865 theconflict count of each item contained in SH is calculated+eresults are Icount(b) 1 Icount(e) 2 and Icount(f) 1Item e is selected to be the victim item because it has themaximum conflict count +en the sensitive itemset forhiding is randomly selected from among the onescontaining the victim item Let us assume that the selecteditemset is b e +e minimum utility to be reduced isdiffu 72 minus 60 + 1 13 and the sensitive transactions areST T1 T3 T41113864 1113865+e transaction weight of each transactionis assigned according to Formula (2) Becausetw(T1) 214 tw(T3) 92 and tw(T4) 426 thetransaction T4 is chosen for modification After the victimitem and transaction are identified the item is sanitizedaccording to the original database sanitization method de-scribed in Section 31 Because diffult u(e T4) minus eu(e) theinternal utility of item e is updated toiu(e T4) minus 1113862diffueu(e)1113863 2 +e utility of b e is reduced

4 Mathematical Problems in Engineering

to 54 +us b e is concealed and the nonsensitive itemsetsa b c e and a b c d e are falsely hidden +en e f ishidden by following the above steps After all of the sensitiveitemsets are hidden four nonsensitive itemsets namely ab c e a b c d e e and a e f are hidden

5 Experimental Analysis

To evaluate the performance of the IMSICF algorithm aseries of experiments have been conducted on various realand synthetic datasets in which IMSICF is compared withthe state-of-the-art algorithms Besides the experimentalresults are discussed in this section

51 ExperimentalData +e experiments were conducted ona 28 GHz Intel Xeon E5-2360 processor with 8GB RAM Toevaluate the performance of the proposed algorithm fourstate-of-the-art sanitization algorithms namely HHUIFMSICF MSU-MIU and MSU-MAU were used for com-parison Because the PPUMGAT algorithm hides sensitiveitemsets by transaction deletion we do not compare theproposed algorithm with PPUMGAT All of the algorithmswere implemented in Java language Four datasets [33] wereused to run the programs +e characteristics of thesedatasets are displayed in Table 3 +e density is measured asthe average transaction length divided by the number ofitems For each dataset the external utility values weregenerated with the Gaussian normal distribution and theinternal utility values are the random numbers ranging from1 to 10 +e EFIM algorithm [44] was used to mine high-utility itemsets and the minimum utility thresholds formushroom Foodmart T25I10D10K and T20I6D100K wereset at 866 0045 024 and 017 respectively +esensitive itemsets were randomly selected from the mineditemsets

+e proposed algorithm identifies the victim item basedon the conflict count of each sensitive item+us the conflictdegree of the sensitive itemsets is presented to observe howthe correlation among the itemsets influences the perfor-mance of the sanitization algorithms Besides the sensitive

Victim item selection A sensitive itemset Victim transaction

selectionSanitize the

original database

Figure 1 +e sanitization process for hiding a sensitive itemset

Input a database D a set of sensitive itemsets SH SH1 SH2 SHt1113864 1113865 the given utility threshold minutilOutput the sanitized database Dprime

(1) while SHneempty(2) Calculate Icount(ip) of each sensitive item ip ip isin SHi(3) Ivic argmaxipisinSHIcount(ip)

(4) for each SHi isin SHand Ivic isin SHi

(5) diffu u(SHi) minus minutil + 1(6) while diffugt 0(7) Find the sensitive transactions ST SHi sube ST(8) Tvic argmaxTisinSTtw(T)

(9) if diffugt u(Ivic Tvic) minus eu(Ivic)

(10) Delete Ivic(11) diffu diffu minus u(SHi Tvic)

(12) else(13) iu(Ivic) iu(Ivic) minus 1113862diffueu(Ivic)1113863

(14) diffu diffu minus eu(Ivic)lowast1113862diffueu(Ivic)1113863

(15) end if(16) Update the database and the itemsets(17) end while(18) SH SH minus SHi

(19) end for(20) end while

ALGORITHM 1 +e IMSICF algorithm

Table 2 Derived high-utility itemsets

HID Itemsets Utility1 e 962 a e 1253 b e 724 e f 885 a b e 886 a e f 827 a b c e 618 a b c d e 63

Mathematical Problems in Engineering 5

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 3: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

[36] formulate the hiding process as a constraint satisfactionproblem Integer linear programming is adopted in thedesigned algorithm to obtain a lower ratio of side effectsproduced in the hiding process

Rajalaxmi and Natarajan [37] proposed two approachesnamed MSMU and MCRSU to hide the sensitive frequentand utility itemsets Both algorithms conceal the itemsetsuntil their support and utility fall below the given thresholdsrespectively Liu et al [38] presented a novel sanitizationalgorithm called HUFI to conceal sensitive frequent andutility itemsets +e concept of maximum boundary value isintroduced to determine the hidden strategy +us theapproach outperforms the other algorithms in minimizingthe side effects Besides the above works Le et al [39]proposed an efficient algorithm for hiding high-utility se-quential patterns which relies on a novel structure to en-hance the sanitization process

3 Preliminaries

Some preliminary definitions of high-utility itemsets miningare introduced in this section [40 41] In addition thesanitization problem is described [42 43]

31 Definitions Let I I1 I2 Im1113864 1113865 be a set of distinctitems Let D T1 T2 Tn1113864 1113865 be a transaction databasewhere each transaction Ti has a unique identifier TID andTi sube I Each item has an external utility value which reflectsthe importance of an item An itemset is a subset of I and itis called a k-itemset if it contains k items

Definition 1 Each item it is assigned an external utilityvalue which is denoted as eu(it) For example in Table 1eu(b) 3

Definition 2 Each item it in a transaction T is assigned aninternal utility value which is denoted as iu(it T) Forexample in Table 1 iu(b T1) 1

Definition 3 +e utility of item it in a transaction T isdenoted as u(it T) and defined as iu(it T) times eu(it) Forexample in Table 1 u(d T2) 1 times 6 6

Definition 4 +e utility of itemset SHi in a transaction T isdenoted as u(SHi T) and defined as 1113936itisinSHi

u(it T) Forexample in Table 1 u( b c T3) 3 + 4 7

Definition 5 +e utility of itemset SHi is denoted as u(SHi)

and defined as 1113936SHiisinTu(SHi T) For example in Table 1u( b c ) 7 + 16 + 19 42

Definition 6 +e utility of a transaction T is denoted astu(T) and defined as 1113936itisinTu(it T) For example in Table 1u(T2) 2 + 6 8

Definition 7 +e user-specified minimum utility thresholdis denoted as minutil A pattern X is a high-utility itemset ifu(X)geminutil Otherwise it is a low-utility itemset High-

utility itemset mining is to discover the itemsets whoseutility values are beyond minutil

32 Sanitization Problem Description Given a transactiondatabase D the minimum utility threshold minutil and thehigh-utility itemsets mined from D based on minutil SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive high-utility itemsetswhere SHi is the itemset that needs to be hidden A sensitivetransaction refers to the transaction supporting at least onesensitive itemset+e sanitization problem is to transform anoriginal database D to a sanitized database Dprime so that allsensitive itemsets are hidden while at the same time the sideeffects on the database and nonsensitive knowledge areminimized

One way to sanitize a database is to modify some items init +e modified item contained in D is the victim itemwhich is denoted as Ivic +e transaction supporting Ivic isthe victim transaction which is denoted as Tvic

4 The Hiding Approach

In this section a sanitization approach named the ImprovedMaximum Sensitive Itemsets Conflict First Algorithm(IMSICF) is presented in detail +e victim item is selectedbased on the conflict count which is calculated dynamicallyMoreover the victim transaction is selected based on theside effects on nonsensitive knowledge which effectivelyreduces the missing costs To better illustrate how theIMSICF algorithm works an example is given

41 e Sanitization Process of Hiding Sensitive Itemsets

Definition 9 Let SH SH1 SH2 SHt1113864 1113865 be a set ofsensitive high-utility itemsets +e conflict count of a sen-sitive item it in SH is denoted as Icount(it) and defined as| SHi isin SH | it isin SHi1113864 1113865| that is Icount(it) is the number ofsensitive itemsets containing it +e conflict degree of SH isdefined as

1113936itisinSHIcount it( 1113857

ni

(1)

where ni refers to the number of distinct items contained inSH +e conflict degree reflects the correlation amongsensitive itemsets +e higher conflict degree indicates thatsensitive itemsets have more common items

For example given a set of sensitive itemsetsSH b e e f1113864 11138651113864 1113865 the conflict degree of SH is 43 becauseIcount(b) 1 Icount(e) 2 and Icount(f) 1

Table 1 A transaction database (left) External utility value (right)

TID Transaction (item iu) Item euT1 (a 2) (b 1) (e 3) a 5T2 (c 1) (d 6) b 3T3 (b 1) (c 2) (e 1) (f 1) c 2T4 (a 3) (b 4) (c 2) (d 2) (e 5) d 1T5 (b 3) (c 5) e 6T6 (a 2) (e 7) (f 3) f 10

Mathematical Problems in Engineering 3

Definition 10 Let SHi be a sensitive itemset and minutil bethe minimum utility threshold To hide SHi the minimumutility to be reduced is defined as u(SHi) minus minutil + 1 anddenoted as diffu where u(SHi) is the utility of SHi

Let D be a database and SHi a sensitive itemset To hideSHi the utility of SHi should be reduced until it falls below theminimum utility threshold namely diffu of SHi should belower than or equal to zero +e strategy of hiding SHi is tosanitize some items in the selected transactions in D +esanitization process of hiding SHi is shown in Figure 1 Firstthe victim item is identified +en a sensitive itemset sup-porting the victim item is determined Next the victimtransaction is selected to bemodified After the victim item andtransaction are determined an original database is sanitized Inthe following the sanitization process is described in detail

411 e Victim Item Selection Let D be a database SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive itemsets In theMSICF algorithm the items in SH are sorted according tothe descending order of conflict counts +en the victimitem is selected based on the sorted results Because the orderof sorted items is fixed the victim item selection cannot bechanged if an original database is modified +us we im-prove the selection of the victim item In our approach theitem with the maximum conflict count is selected to besanitized that is Ivic argmaxitisinSHIcount(it) Once a sen-sitive itemset is hidden the conflict count of each sensitiveitem is recalculated to prevent other sensitive itemsets frombeing hidden in the sanitization process In this way we canmake sure that a victim item has the maximum conflictcount in the sanitization process

412 e Victim Transaction Selection Let D be a databaseand SHi a sensitive itemset For the MSICF algorithm thetransaction with the maximum utility of a victim item isselected to be a victim transaction However the damage tononsensitive knowledge is not taken into account To reducethe side effects on nonsensitive information the transactionthat causes the minimum impact should be modified withpriority +us we assign a transaction weight to each sen-sitive transaction which is used to determine the victimtransaction +e transaction weight is computed as

tw(T) u SHi T( 1113857

1 + NSHC(T) (2)

where u(SHi T) is the utility of SHi in transaction T andNSHC(T) is the number of nonsensitive itemsets supportedby T +e transaction with the maximum utility of SHi

indicates that the deletion of a victim item will decreasemore utility Moreover the transaction supporting theminimumnumber of nonsensitive itemsets indicates that themodification of it will generate less side effects +us thetransaction having the maximum transaction weight wouldbe sanitized first Because NSHC(T) is zero when a trans-action does not support any nonsensitive itemset we set thedenominator of Formula (2) to NSHC(T) + 1

413 e Original Database Sanitization Let SHi be asensitive itemset Ivic a victim item and Tvic a victimtransaction If diffu of SHi is not greater thanu(Ivic Tvic) minus eu(Ivic) it indicates that the victim item is notremoved from a victim transaction +en iu(Ivic) is reducedto iu(Ivic) minus 1113862diffueu(Ivic)1113863 where eu(Ivic) is the externalutility of Ivic Correspondingly diffu is decreased todiffu minus eu(Ivic)lowast 1113862diffueu(Ivic)1113863 Otherwise if Ivic is re-moved from the victim transaction diffu is updated todiffu minus u(SHi Tvic) rather than diffu minus u(Ivic Tvic) +ereason is that SHi is not supported by Tvic if Ivic is removedfrom Tvic

42e Sketch of the IMSICF Algorithm +e pseudocode ofthe IMSICF algorithm is shown in Algorithm 1 Initially theconflict count of each sensitive item in SH is calculated (Line2) +en the item with the maximum conflict count is se-lected to be victim item Ivic (Line 3) After the sanitized itemis identified the sensitive itemsets containing Ivic are hiddenone by one For a sensitive itemset SHi the minimum utilityto be reduced is computed (Line 5) +e utility of SHi isreduced until diffu is less than or equal to zero To hide SHithe sensitive transactions of SHi are identified +en thetransaction weight of each transaction is calculatedaccording to Formula (2) +e transaction with the maxi-mum weight is selected to be victim transaction Tvic +is isreasonable since the minimum side effects on nonsensitiveinformation are generated by modifying the selected Tvic(Line 7-8) Next the victim item in Tvic is modified and thedatabase and itemsets are updated respectively (Line 9ndash16)If diffu of SHi is not greater than zero the sensitive itemset isremoved from SH (Line 18) +e algorithm is terminateduntil all sensitive itemsets are hidden

43 An Illustrative Example For a given transaction data-base in Table 1 the high-utility itemsets derived at minutil

60 are listed in Table 2 +e user-specified sensitive itemsetsare b e ande f which are identified in boldface in Table 2+e proposed algorithm (IMSICF) is applied to hide sen-sitive itemsets

To hide the sensitive itemsets SH b e e f1113864 11138651113864 1113865 theconflict count of each item contained in SH is calculated+eresults are Icount(b) 1 Icount(e) 2 and Icount(f) 1Item e is selected to be the victim item because it has themaximum conflict count +en the sensitive itemset forhiding is randomly selected from among the onescontaining the victim item Let us assume that the selecteditemset is b e +e minimum utility to be reduced isdiffu 72 minus 60 + 1 13 and the sensitive transactions areST T1 T3 T41113864 1113865+e transaction weight of each transactionis assigned according to Formula (2) Becausetw(T1) 214 tw(T3) 92 and tw(T4) 426 thetransaction T4 is chosen for modification After the victimitem and transaction are identified the item is sanitizedaccording to the original database sanitization method de-scribed in Section 31 Because diffult u(e T4) minus eu(e) theinternal utility of item e is updated toiu(e T4) minus 1113862diffueu(e)1113863 2 +e utility of b e is reduced

4 Mathematical Problems in Engineering

to 54 +us b e is concealed and the nonsensitive itemsetsa b c e and a b c d e are falsely hidden +en e f ishidden by following the above steps After all of the sensitiveitemsets are hidden four nonsensitive itemsets namely ab c e a b c d e e and a e f are hidden

5 Experimental Analysis

To evaluate the performance of the IMSICF algorithm aseries of experiments have been conducted on various realand synthetic datasets in which IMSICF is compared withthe state-of-the-art algorithms Besides the experimentalresults are discussed in this section

51 ExperimentalData +e experiments were conducted ona 28 GHz Intel Xeon E5-2360 processor with 8GB RAM Toevaluate the performance of the proposed algorithm fourstate-of-the-art sanitization algorithms namely HHUIFMSICF MSU-MIU and MSU-MAU were used for com-parison Because the PPUMGAT algorithm hides sensitiveitemsets by transaction deletion we do not compare theproposed algorithm with PPUMGAT All of the algorithmswere implemented in Java language Four datasets [33] wereused to run the programs +e characteristics of thesedatasets are displayed in Table 3 +e density is measured asthe average transaction length divided by the number ofitems For each dataset the external utility values weregenerated with the Gaussian normal distribution and theinternal utility values are the random numbers ranging from1 to 10 +e EFIM algorithm [44] was used to mine high-utility itemsets and the minimum utility thresholds formushroom Foodmart T25I10D10K and T20I6D100K wereset at 866 0045 024 and 017 respectively +esensitive itemsets were randomly selected from the mineditemsets

+e proposed algorithm identifies the victim item basedon the conflict count of each sensitive item+us the conflictdegree of the sensitive itemsets is presented to observe howthe correlation among the itemsets influences the perfor-mance of the sanitization algorithms Besides the sensitive

Victim item selection A sensitive itemset Victim transaction

selectionSanitize the

original database

Figure 1 +e sanitization process for hiding a sensitive itemset

Input a database D a set of sensitive itemsets SH SH1 SH2 SHt1113864 1113865 the given utility threshold minutilOutput the sanitized database Dprime

(1) while SHneempty(2) Calculate Icount(ip) of each sensitive item ip ip isin SHi(3) Ivic argmaxipisinSHIcount(ip)

(4) for each SHi isin SHand Ivic isin SHi

(5) diffu u(SHi) minus minutil + 1(6) while diffugt 0(7) Find the sensitive transactions ST SHi sube ST(8) Tvic argmaxTisinSTtw(T)

(9) if diffugt u(Ivic Tvic) minus eu(Ivic)

(10) Delete Ivic(11) diffu diffu minus u(SHi Tvic)

(12) else(13) iu(Ivic) iu(Ivic) minus 1113862diffueu(Ivic)1113863

(14) diffu diffu minus eu(Ivic)lowast1113862diffueu(Ivic)1113863

(15) end if(16) Update the database and the itemsets(17) end while(18) SH SH minus SHi

(19) end for(20) end while

ALGORITHM 1 +e IMSICF algorithm

Table 2 Derived high-utility itemsets

HID Itemsets Utility1 e 962 a e 1253 b e 724 e f 885 a b e 886 a e f 827 a b c e 618 a b c d e 63

Mathematical Problems in Engineering 5

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 4: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

Definition 10 Let SHi be a sensitive itemset and minutil bethe minimum utility threshold To hide SHi the minimumutility to be reduced is defined as u(SHi) minus minutil + 1 anddenoted as diffu where u(SHi) is the utility of SHi

Let D be a database and SHi a sensitive itemset To hideSHi the utility of SHi should be reduced until it falls below theminimum utility threshold namely diffu of SHi should belower than or equal to zero +e strategy of hiding SHi is tosanitize some items in the selected transactions in D +esanitization process of hiding SHi is shown in Figure 1 Firstthe victim item is identified +en a sensitive itemset sup-porting the victim item is determined Next the victimtransaction is selected to bemodified After the victim item andtransaction are determined an original database is sanitized Inthe following the sanitization process is described in detail

411 e Victim Item Selection Let D be a database SH

SH1 SH2 SHt1113864 1113865 is a set of sensitive itemsets In theMSICF algorithm the items in SH are sorted according tothe descending order of conflict counts +en the victimitem is selected based on the sorted results Because the orderof sorted items is fixed the victim item selection cannot bechanged if an original database is modified +us we im-prove the selection of the victim item In our approach theitem with the maximum conflict count is selected to besanitized that is Ivic argmaxitisinSHIcount(it) Once a sen-sitive itemset is hidden the conflict count of each sensitiveitem is recalculated to prevent other sensitive itemsets frombeing hidden in the sanitization process In this way we canmake sure that a victim item has the maximum conflictcount in the sanitization process

412 e Victim Transaction Selection Let D be a databaseand SHi a sensitive itemset For the MSICF algorithm thetransaction with the maximum utility of a victim item isselected to be a victim transaction However the damage tononsensitive knowledge is not taken into account To reducethe side effects on nonsensitive information the transactionthat causes the minimum impact should be modified withpriority +us we assign a transaction weight to each sen-sitive transaction which is used to determine the victimtransaction +e transaction weight is computed as

tw(T) u SHi T( 1113857

1 + NSHC(T) (2)

where u(SHi T) is the utility of SHi in transaction T andNSHC(T) is the number of nonsensitive itemsets supportedby T +e transaction with the maximum utility of SHi

indicates that the deletion of a victim item will decreasemore utility Moreover the transaction supporting theminimumnumber of nonsensitive itemsets indicates that themodification of it will generate less side effects +us thetransaction having the maximum transaction weight wouldbe sanitized first Because NSHC(T) is zero when a trans-action does not support any nonsensitive itemset we set thedenominator of Formula (2) to NSHC(T) + 1

413 e Original Database Sanitization Let SHi be asensitive itemset Ivic a victim item and Tvic a victimtransaction If diffu of SHi is not greater thanu(Ivic Tvic) minus eu(Ivic) it indicates that the victim item is notremoved from a victim transaction +en iu(Ivic) is reducedto iu(Ivic) minus 1113862diffueu(Ivic)1113863 where eu(Ivic) is the externalutility of Ivic Correspondingly diffu is decreased todiffu minus eu(Ivic)lowast 1113862diffueu(Ivic)1113863 Otherwise if Ivic is re-moved from the victim transaction diffu is updated todiffu minus u(SHi Tvic) rather than diffu minus u(Ivic Tvic) +ereason is that SHi is not supported by Tvic if Ivic is removedfrom Tvic

42e Sketch of the IMSICF Algorithm +e pseudocode ofthe IMSICF algorithm is shown in Algorithm 1 Initially theconflict count of each sensitive item in SH is calculated (Line2) +en the item with the maximum conflict count is se-lected to be victim item Ivic (Line 3) After the sanitized itemis identified the sensitive itemsets containing Ivic are hiddenone by one For a sensitive itemset SHi the minimum utilityto be reduced is computed (Line 5) +e utility of SHi isreduced until diffu is less than or equal to zero To hide SHithe sensitive transactions of SHi are identified +en thetransaction weight of each transaction is calculatedaccording to Formula (2) +e transaction with the maxi-mum weight is selected to be victim transaction Tvic +is isreasonable since the minimum side effects on nonsensitiveinformation are generated by modifying the selected Tvic(Line 7-8) Next the victim item in Tvic is modified and thedatabase and itemsets are updated respectively (Line 9ndash16)If diffu of SHi is not greater than zero the sensitive itemset isremoved from SH (Line 18) +e algorithm is terminateduntil all sensitive itemsets are hidden

43 An Illustrative Example For a given transaction data-base in Table 1 the high-utility itemsets derived at minutil

60 are listed in Table 2 +e user-specified sensitive itemsetsare b e ande f which are identified in boldface in Table 2+e proposed algorithm (IMSICF) is applied to hide sen-sitive itemsets

To hide the sensitive itemsets SH b e e f1113864 11138651113864 1113865 theconflict count of each item contained in SH is calculated+eresults are Icount(b) 1 Icount(e) 2 and Icount(f) 1Item e is selected to be the victim item because it has themaximum conflict count +en the sensitive itemset forhiding is randomly selected from among the onescontaining the victim item Let us assume that the selecteditemset is b e +e minimum utility to be reduced isdiffu 72 minus 60 + 1 13 and the sensitive transactions areST T1 T3 T41113864 1113865+e transaction weight of each transactionis assigned according to Formula (2) Becausetw(T1) 214 tw(T3) 92 and tw(T4) 426 thetransaction T4 is chosen for modification After the victimitem and transaction are identified the item is sanitizedaccording to the original database sanitization method de-scribed in Section 31 Because diffult u(e T4) minus eu(e) theinternal utility of item e is updated toiu(e T4) minus 1113862diffueu(e)1113863 2 +e utility of b e is reduced

4 Mathematical Problems in Engineering

to 54 +us b e is concealed and the nonsensitive itemsetsa b c e and a b c d e are falsely hidden +en e f ishidden by following the above steps After all of the sensitiveitemsets are hidden four nonsensitive itemsets namely ab c e a b c d e e and a e f are hidden

5 Experimental Analysis

To evaluate the performance of the IMSICF algorithm aseries of experiments have been conducted on various realand synthetic datasets in which IMSICF is compared withthe state-of-the-art algorithms Besides the experimentalresults are discussed in this section

51 ExperimentalData +e experiments were conducted ona 28 GHz Intel Xeon E5-2360 processor with 8GB RAM Toevaluate the performance of the proposed algorithm fourstate-of-the-art sanitization algorithms namely HHUIFMSICF MSU-MIU and MSU-MAU were used for com-parison Because the PPUMGAT algorithm hides sensitiveitemsets by transaction deletion we do not compare theproposed algorithm with PPUMGAT All of the algorithmswere implemented in Java language Four datasets [33] wereused to run the programs +e characteristics of thesedatasets are displayed in Table 3 +e density is measured asthe average transaction length divided by the number ofitems For each dataset the external utility values weregenerated with the Gaussian normal distribution and theinternal utility values are the random numbers ranging from1 to 10 +e EFIM algorithm [44] was used to mine high-utility itemsets and the minimum utility thresholds formushroom Foodmart T25I10D10K and T20I6D100K wereset at 866 0045 024 and 017 respectively +esensitive itemsets were randomly selected from the mineditemsets

+e proposed algorithm identifies the victim item basedon the conflict count of each sensitive item+us the conflictdegree of the sensitive itemsets is presented to observe howthe correlation among the itemsets influences the perfor-mance of the sanitization algorithms Besides the sensitive

Victim item selection A sensitive itemset Victim transaction

selectionSanitize the

original database

Figure 1 +e sanitization process for hiding a sensitive itemset

Input a database D a set of sensitive itemsets SH SH1 SH2 SHt1113864 1113865 the given utility threshold minutilOutput the sanitized database Dprime

(1) while SHneempty(2) Calculate Icount(ip) of each sensitive item ip ip isin SHi(3) Ivic argmaxipisinSHIcount(ip)

(4) for each SHi isin SHand Ivic isin SHi

(5) diffu u(SHi) minus minutil + 1(6) while diffugt 0(7) Find the sensitive transactions ST SHi sube ST(8) Tvic argmaxTisinSTtw(T)

(9) if diffugt u(Ivic Tvic) minus eu(Ivic)

(10) Delete Ivic(11) diffu diffu minus u(SHi Tvic)

(12) else(13) iu(Ivic) iu(Ivic) minus 1113862diffueu(Ivic)1113863

(14) diffu diffu minus eu(Ivic)lowast1113862diffueu(Ivic)1113863

(15) end if(16) Update the database and the itemsets(17) end while(18) SH SH minus SHi

(19) end for(20) end while

ALGORITHM 1 +e IMSICF algorithm

Table 2 Derived high-utility itemsets

HID Itemsets Utility1 e 962 a e 1253 b e 724 e f 885 a b e 886 a e f 827 a b c e 618 a b c d e 63

Mathematical Problems in Engineering 5

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 5: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

to 54 +us b e is concealed and the nonsensitive itemsetsa b c e and a b c d e are falsely hidden +en e f ishidden by following the above steps After all of the sensitiveitemsets are hidden four nonsensitive itemsets namely ab c e a b c d e e and a e f are hidden

5 Experimental Analysis

To evaluate the performance of the IMSICF algorithm aseries of experiments have been conducted on various realand synthetic datasets in which IMSICF is compared withthe state-of-the-art algorithms Besides the experimentalresults are discussed in this section

51 ExperimentalData +e experiments were conducted ona 28 GHz Intel Xeon E5-2360 processor with 8GB RAM Toevaluate the performance of the proposed algorithm fourstate-of-the-art sanitization algorithms namely HHUIFMSICF MSU-MIU and MSU-MAU were used for com-parison Because the PPUMGAT algorithm hides sensitiveitemsets by transaction deletion we do not compare theproposed algorithm with PPUMGAT All of the algorithmswere implemented in Java language Four datasets [33] wereused to run the programs +e characteristics of thesedatasets are displayed in Table 3 +e density is measured asthe average transaction length divided by the number ofitems For each dataset the external utility values weregenerated with the Gaussian normal distribution and theinternal utility values are the random numbers ranging from1 to 10 +e EFIM algorithm [44] was used to mine high-utility itemsets and the minimum utility thresholds formushroom Foodmart T25I10D10K and T20I6D100K wereset at 866 0045 024 and 017 respectively +esensitive itemsets were randomly selected from the mineditemsets

+e proposed algorithm identifies the victim item basedon the conflict count of each sensitive item+us the conflictdegree of the sensitive itemsets is presented to observe howthe correlation among the itemsets influences the perfor-mance of the sanitization algorithms Besides the sensitive

Victim item selection A sensitive itemset Victim transaction

selectionSanitize the

original database

Figure 1 +e sanitization process for hiding a sensitive itemset

Input a database D a set of sensitive itemsets SH SH1 SH2 SHt1113864 1113865 the given utility threshold minutilOutput the sanitized database Dprime

(1) while SHneempty(2) Calculate Icount(ip) of each sensitive item ip ip isin SHi(3) Ivic argmaxipisinSHIcount(ip)

(4) for each SHi isin SHand Ivic isin SHi

(5) diffu u(SHi) minus minutil + 1(6) while diffugt 0(7) Find the sensitive transactions ST SHi sube ST(8) Tvic argmaxTisinSTtw(T)

(9) if diffugt u(Ivic Tvic) minus eu(Ivic)

(10) Delete Ivic(11) diffu diffu minus u(SHi Tvic)

(12) else(13) iu(Ivic) iu(Ivic) minus 1113862diffueu(Ivic)1113863

(14) diffu diffu minus eu(Ivic)lowast1113862diffueu(Ivic)1113863

(15) end if(16) Update the database and the itemsets(17) end while(18) SH SH minus SHi

(19) end for(20) end while

ALGORITHM 1 +e IMSICF algorithm

Table 2 Derived high-utility itemsets

HID Itemsets Utility1 e 962 a e 1253 b e 724 e f 885 a b e 886 a e f 827 a b c e 618 a b c d e 63

Mathematical Problems in Engineering 5

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 6: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

percentage is used to evaluate the scalability of the saniti-zation approaches Sensitive percentage is measured as thenumber of sensitive itemsets divided by the number of high-utility itemsets +is value of parameter ranges from 01 to05 Moreover two-way ANOVA is used to evaluate thedifferences between the compared approaches Two-wayANOVA is a comparison of means between groups that havebeen split on two independent variables (called factors) +eP value is important because it indicates whether the dif-ference between the sanitization algorithms is significant IfP value is below 005 it means that there is a significantdifference between the compared approaches Otherwise itindicates that there is no significant difference between thesanitization approaches

52 Performance Measurement To evaluate the efficiencythe execution time of the sanitization process is measuredand the data processing stages are discarded On the otherhand five performance measures are used to evaluate theeffectiveness and summarized as follows

(1) Hiding failure (HF) it is the proportion of thesensitive itemsets that fail to be hidden which iscalculated as

HF SH Dprime( 1113857

SH(D) (3)

where SH(Dprime) and SH(D) are the sensitive high-utility itemsets mined from a sanitized database Dprimeand an original database D respectively

(2) Missing cost (MC) it is the proportion of themissingnonsensitive itemsets that are hidden by accidentafter sanitization which is computed as

MC NSH(D) minus NSH Dprime( 1113857

NSH(D) (4)

where NSH(Dprime) and NSH(D) are the nonsensitiveitemsets discovered from the databases Dprime and Drespectively

(3) Artificial cost (AC) it refers to the proportion of theartificial itemsets which is computed as

AC H Dprime( 1113857 minus H(D)capH Dprime( 1113857

H Dprime( 1113857 (5)

where H(Dprime) and H(D) are the high-utility itemsetsmined from the databases Dprime and D respectively

(4) Itemset utility similarity (IUS) it reveals the utilityloss for the discovered itemsets by the sanitizationprocess which is calculated as

IUS 1113936

XisinHUIsDprimeu(X)

1113936XisinHUIsD u(X) (6)

where 1113936XisinHUIs

Dprimeu(X)and 1113936XisinHUIsD u(X) denote the

utility of the high-utility itemsets discovered fromthe databases Dprime and D respectively

(5) Database utility similarity (DUS) it reveals the utilityloss for an original database by the sanitizationprocess which is calculated as

DUS 1113936TiisinDprimetu Ti( 1113857

1113936TiisinDtu Ti( 1113857 (7)

where 1113936TiisinDprimetu(Ti) and 1113936TiisinDtu(Ti) denote the utilityof the databases Dprime and D respectively

53 Execution Time +e results of the execution timesunder various sensitive percentages are plotted in Figure 2 Itis clear to see that the runtime is increased with the growth ofthe sensitive percentage +is is reasonable because theincreasing number of sensitive itemsets requires moretransactions for modification From Figure 2 we can alsoobserve that the proposed algorithm takes more time thanthe other algorithms +e reason is that HHUIF MSICFMSU_MIU and MSU_MAU sanitize the database based onthe concept of utility However IMSICF needs to calculatethe number of nonsensitive itemsets supported by eachsensitive transaction which costs a lot of time Besides notethat the runtime in mushroom is more than that in the otherdatasets +e reason is that mushroom is a much denserdataset

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for various datasets (P 315lowast 10minus 13 inFigure 2(a) P 241lowast 10minus 08 in Figure 2(b) P 215lowast 10minus 15

in Figure 2(c) and P 641lowast 10minus 18 in Figure 2(d))+e results of the execution times under various conflict

degrees for different databases are plotted in Figure 3 Wecan see that the runtime is decreased with the growth of theconflict degree in most cases +is is reasonable because thehigher the conflict degree is the more sensitive the itemsetswill be concealed at the same time From Figure 3 we alsofind that the proposed algorithm IMSICF costs more timethan the other algorithms in most cases +is is becauseIMSICF takes a lot of time to calculate the number ofnonsensitive itemsets supported by each sensitive

Table 3 +e characteristics of various databases

Dataset No of transactions No of items Avg trans length Density ()Mushroom 8124 119 23 193Foodmart 4141 1559 4 025T25I10D10K 10000 929 2477 266T20I6D100K 100000 893 199 222

6 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 7: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

01 02 03 04 05Sensitive percentage

0003

0012

0048

0192

0768

3072

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

006

024

096

384

1536

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

0001

0004

0016

0064

0256

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

025

2

16

128

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 2 Execution times under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0001

0008

0064

0512

4096

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0005

004

032

256

2048

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

0002

0004

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

001

1

100

Tim

e (s)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 3 Execution times under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 7

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 8: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

transaction However IMSICF performs the best in Figure 3because Foodmart is a very sparse dataset compared to theother datasets Moreover the sparse dataset indicates thatthe number of nonsensitive itemsets supported by eachtransaction is less +us the execution time in Foodmart isless than that in other datasets

Based on two-way ANOVA there is a significant dif-ference between the execution times of the sanitization al-gorithms for T25I10D10K and mushroom datasets(P 00037 in Figure 2(a) and P 0002 in Figure 2(d)) ForT20I6D100K and Foodmart datasets there is no significantdifference between the sanitization algorithms(P 013gt 005 in Figure 2(b) and P 044 in Figure 2(c))

54 Missing Costs +e results of the missing costs undervarious sensitive percentages are shown in Figure 4 It can beobserved that the missing costs are increased with the rise ofthe sensitive percentage due to the increasing number ofmodified transactions From the results of Figure 4 it is alsoclear to see that the proposed algorithm prevents morenonsensitive itemsets from being overhidden An importantreason is that the transaction with the minimum number ofnonsensitive itemsets and the maximum utility of sensitiveitemset is chosen for modification +e second reason is thatthe conflict count of each item contained in sensitiveitemsets is recalculated once a sensitive itemset is hiddenHowever the algorithms MSU_MAU MSU_MIU HHUIFand MSICF select the victim item based on the value ofutility +us the side effects on nonsensitive information arediscarded in the sanitization process Moreover based ontwo-way ANOVA there is a significant difference betweenthe missing costs of the sanitization algorithms for variousdatasets (P 146lowast 10minus 8 in Figure 2(a) P 943lowast 10minus 6 inFigure 2(b) P 424lowast 10minus 7 in Figure 2(c) and P 00006in Figure 2(d))

+e results of the missing costs under various conflictdegrees are shown in Figure 5 From the results of Figure 5 itcan be observed that the missing costs decrease as theconflict degree is increased +e reason is that the higher theconflict degree is the more sensitive the itemsets will behidden by modifying a victim item Correspondingly thenumber of nonsensitive itemsets concealed by mistake isdecreased From Figure 5 we can also find that the proposedalgorithm outperforms other algorithms +is is caused bythe sanitization strategy of IMSICF Besides it is noted thatthe number of missing nonsensitive itemsets inmushroom ismore than that in other datasets +e reason is thatmushroom is much denser compared to the other datasetswhich indicates that the modification of the dataset willcause more side effects on nonsensitive itemsets Moreoverbased on two-way ANOVA there is a significant differencebetween the missing costs of the sanitization algorithms forvarious datasets (P 0004 in Figure 2(a) P 0021 inFigure 2(b) P 204lowast 10minus 7 in Figure 2(c) and P 002 inFigure 2(d))

55 Itemset Utility Similarity +e results of the itemsetutility similarity under various sensitive percentages for

different datasets are plotted in Figure 6We can find that theIUS values are decreased with the growth of the sensitivepercentage +is is reasonable because the increase on thenumber of sensitive itemsets will cause more transactions tobe sanitized+us the damage to the nonsensitive itemsets iscorrespondingly increased From the results of Figure 6 wealso observe that the proposed algorithm outperforms theother four algorithms in most cases +e reason is thatIMSICF takes the side effects on nonsensitive knowledgeinto account when identifying the victim transactionHowever the other algorithms select the victim transactionbased on the value of utility without considering the damageto nonsensitive information by the sanitization processBesides it is interesting to see that the IUS values ofmushroom are lower than those of the other datasets sincemushroom is a very dense dataset

Based on two-way ANOVA there is a significant dif-ference between the IUS values of the sanitization algorithmsfor various datasets (P 236lowast 10minus 8 in Figure 2(a)P 00003 in Figure 2(b) P 844lowast10minus 7 in Figure 2(c) andP 00009 in Figure 2(d))

+e results of the itemset utility similarity under variousconflict degrees for different datasets are plotted in Figure 7It can be observed that the conflict degree has a great impacton the itemset utility similarity With the growth of thecorrelation among the sensitive itemsets more sensitiveitemsets are hidden when a sensitive itemset is sanitized+us less nonsensitive itemsets are concealed in error afterthe database sanitization and the IUS values are corre-spondingly increased From Figure 7 it is also clear to seethat the proposed algorithm outperforms the other algo-rithms under various conflict degrees +e reason is that theimpact on nonsensitive information is considered in theIMSICF algorithm Moreover the conflict count of eachsensitive item is dynamically calculated However the otheralgorithms only take the concept of utility into account inthe sanitization process

Based on two-way ANOVA there is a significant dif-ference between the IUS of the sanitization algorithms forvarious datasets (P 00046 in Figure 2(a) P 0025 inFigure 2(b) P 125lowast 10minus 7 in Figure 2(c) and P 0019 inFigure 2(d))

56 Database Utility Similarity +e results of the databaseutility similarity under various sensitive percentages areshown in Figure 8 It can be seen that the DUS values aredecreased with the growth of the sensitive percentage be-cause more items are modified for hiding more sensitiveitemsets As shown in Figure 8 we also observe that theIMSICF algorithm outperforms the other approaches exceptthe MSU_MIU algorithm +e reason is that MSU_MIUidentifies the victim transaction Tvic with the maximumutility of a sensitive itemset X and the item contained in Xwith the minimum utility is selected to be a victim item IvicIn addition the utility of X is reduced by u(X Tvic) when Ivicis removed from Tvic +us the database utility similarity ofMSU_MIU is higher than that of the other algorithmsHowever the proposed algorithm IMSICF selects the

8 Mathematical Problems in Engineering

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 9: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

01 02 03 04 05Sensitive percentage

30

45

60

75

90

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

15

28

41

54

67

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

49

53

57

61

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

45

58

71

84

97

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 4 MC values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

0

15

30

45

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

0

20

40

60

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

0

04

08

12

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

MC

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 5 MC values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 9

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 10: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

01 02 03 04 05Sensitive percentage

0

20

40

60

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

23

35

47

59

71

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

38

42

46

50

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

0

20

40

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 6 IUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

169 279 500 717 1405Conflict degree

35

48

61

74

87

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

50

60

70

80

90

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

986

989

992

995

998

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

0

50

100

IUS

()

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 7 IUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

10 Mathematical Problems in Engineering

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 11: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

sanitized item based on the side effects on nonsensitiveinformation+us IMSICF performs worse thanMSU_MIUin terms of DUS Moreover HHUIF MSICF and MSU_-MAU algorithms select the victim item with the maximumutility Hence these algorithms perform worse than theprevious two algorithms

Based on two-way ANOVA there is a significant dif-ference between the DUS of the sanitization algorithms forvarious datasets (P 327lowast 10minus 8 in Figure 2(a)P 255lowast10minus 7 in Figure 2(b) P 243lowast10minus 7 in Figure 2(c)and P 111lowast10minus 8 in Figure 2(d))

+e results of the database utility similarity under var-ious conflict degrees for different datasets are shown inFigure 9 +e DUS values are increased as the conflict degreeincreases +is is reasonable because few items are sanitizedwhen the sensitive itemsets have more common items InFigure 9 we also find that the MSU_MIU algorithm per-forms the best in terms of DUS under various conflict de-grees +e reason is that the item with the minimal utility ischosen for modification +e proposed algorithm IMSICFhas better performance than MSU_MAU HHUIF andMSICF because these algorithms identify the victim itemwith the maximum utility Besides note that DUS ofmushroom is lower than that of other datasets +is is be-cause the number of items supported by a transaction inmushroom is much higher compared to the other datasetsMoreover based on two-way ANOVA there is a significantdifference between the DUS of the sanitization algorithms

for various datasets (P 00016 in Figure 2(a) P 0039 inFigure 2(b) P 113lowast 10minus 6 in Figure 2(c) and P 00015in Figure 2(d))

+e above experimental results demonstrate that theproposed algorithm IMSICF outperforms the other state-of-the-art algorithms in terms of MC and IUS +e reason isthat IMSICF selects the victim transaction based on the sideeffects on nonsensitive itemsets Besides we can find that theMSU_MIU algorithm performs better than other algorithmsin DUS +is is reasonable because the victim item with theminimum utility is chosen formodification and the utility ofa sensitive itemset is reduced by the utility of the itemset inthe identified transaction when a victim item is removed Inaddition it can be observed that the density of a datasetaffects the performance of the itemset hiding

6 Conclusions

In this paper an improved sanitization algorithm calledIMSICF is proposed for privacy-preserving utility mining+is algorithm identifies the victim item with the maximumconflict count which is dynamically computed in the san-itization process +en the sensitive itemset containing thevictim item is selected to be hidden+e transaction with themaximum utility of the currently hidden sensitive itemsetand the minimum count of nonsensitive itemsets is chosenfor modification Hence the side effects on nonsensitiveknowledge are effectively reduced In our experiments real

01 02 03 04 05Sensitive percentage

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

01 02 03 04 05Sensitive percentage

994

996

998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

01 02 03 04 05Sensitive percentage

996

997

998

999

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

01 02 03 04 05Sensitive percentage

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 8 DUS values under various sensitive percentages (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

Mathematical Problems in Engineering 11

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 12: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

and synthetic datasets are used to evaluate the performanceof the proposed algorithm +e experimental results showthat IMSICF outperforms the state-of-the-art algorithms inmissing cost and itemset utility similarity at the expense of adegradation on efficiency Besides it is observed that theconflict degree of sensitive itemsets has a great impact on theperformance of the sanitization algorithms

For future work we will focus on preserving other formsof sensitive knowledge such as frequent and utility itemset

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the National Natural ScienceFoundation of China (Grant no 61802344) Zhejiang Pro-vincial Natural Science Foundation of China (Grant noLY16F030012) Ningbo Natural Science Foundation ofChina (Grant no 2017A610118) General Scientific ResearchProjects of Zhejiang Education Department (Grant noY201534788) and Youth Foundation for Humanities and

Social Sciences Research of Ministry of Education of China(Grant no 16YJCZH112)

References

[1] P Fournier-Viger J C W Lin B Vo et al ldquoA survey ofitemset miningrdquoWiley Interdisciplinary Reviews-DataMiningand Knowledge Discovery vol 7 no 4 p e1207 2017

[2] W Gan J C W Lin P Fournier-Viger et al ldquoA survey ofincremental high-utility itemset miningrdquo Wiley Interdisci-plinary Reviews DataMining and Knowledge Discovery vol 8no 2 p e1242 2018

[3] JWang F Liu and C Jin ldquoPHUIMUS a potential high utilityitemsets mining algorithm based on stream data with un-certaintyrdquo Mathematical Problems in Engineering vol 2017Article ID 8576829 13 pages 2017

[4] RWang Q Sun DMa and Z Liu ldquo+e small-signal stabilityanalysis of the droop-controlled converter in electromagnetictimescalerdquo IEEE Transactions on Sustainable Energy vol 10no 3 pp 1459ndash1469 2019

[5] D E OrsquoLeary ldquoKnowledge discovery as a threat to databasesecurityrdquo in Proceedings of the 1st International Conference inKnowledge Discovery and Database pp 507ndash516 Menlo ParkCA USA 1991

[6] U Yun H Ryang G Lee and H Fujita ldquoAn efficient al-gorithm for mining high utility patterns from incrementaldatabases with one database scanrdquo Knowledge-Based Systemsvol 124 pp 188ndash206 2017

[7] J Lee U Yun G Lee and E Yoon ldquoEfficient incrementalhigh utility pattern mining based on pre-large conceptrdquo

169 279 500 717 1405Conflict degree

9989

9993

9997

10001

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(a)

126 315 500 700 1457Conflict degree

9965

9975

9985

9995

10005

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(b)

119 316 568 787 1525Conflict degree

999

9992

9994

9996

9998

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(c)

277 350 568 720 1333Conflict degree

92

94

96

98

100

DU

S (

)

MSICF

HHUIFMSU_MIU MSU_MAU

IMSICF

(d)

Figure 9 DUS values under various conflict degrees (a) T25I10D10K (b) T20I6D100K (c) Foodmart (d) mushroom

12 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 13: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

Engineering Applications of Artificial Intelligence vol 72pp 111ndash123 2018

[8] R Agrawal and R Srikant ldquoPrivacy-preserving data miningrdquoACM SIGMOD Record vol 29 no 2 pp 439ndash450 2000

[9] R Mendes and J P Vilela ldquoPrivacy-preserving data miningmethods metrics and applicationsrdquo IEEE Access vol 5pp 10562ndash10582 2017

[10] M Atallah E Bertino A Elmagarmid et al ldquoDisclosurelimitation of sensitive rulesrdquo in Proceedings of the 1999Workshop on Knowledge and Data Engineering Exchangepp 45ndash52 Chicago IL USA November 1999

[11] E Dasseni V S Verykios A K Elmagarmid and E BertinoldquoHiding association rules by using confidence and supportrdquoInformation Hiding Springer Berlin Germany pp 369ndash3832001

[12] V S Verykios A K Elmagarmid E Bertino Y Saygin andE Dasseni ldquoAssociation rule hidingrdquo IEEE Transactions onKnowledge and Data Engineering vol 16 no 4 pp 434ndash4472004

[13] S R M Oliveira and O R Zaıane ldquoProtecting sensitiveknowledge by data sanitizationrdquo in Proceedings of the 3rdInternational Conference on Data Mining pp 613ndash616Leipzig Germany July 2003

[14] A Amiri ldquoDare to share protecting sensitive knowledge withdata sanitizationrdquo Decision Support Systems vol 43 no 1pp 181ndash191 2007

[15] A Gkoulalas-Divanis and V S Verykios ldquoExact knowledgehiding through database extensionrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 5 pp 699ndash7132009

[16] C-M Wu and Y-F Huang ldquoA cost-efficient and versatilesanitizing algorithm by using a greedy approachrdquo SoftComputing vol 15 no 5 pp 939ndash952 2011

[17] T-P Hong C-W Lin K-T Yang and S-L Wang ldquoUsingTF-IDF to hide sensitive itemsetsrdquo Applied Intelligencevol 38 no 4 pp 502ndash510 2013

[18] H Q Le S Arch-int and N Arch-int ldquoAssociation rulehiding based on intersection latticerdquo Mathematical Problemsin Engineering vol 2013 Article ID 210405 11 pages 2013

[19] H Q Le S Arch-int H X Nguyen and N Arch-int ldquoAs-sociation rule hiding in risk management for retail supplychain collaborationrdquo Computers in Industry vol 64 no 7pp 776ndash784 2013

[20] R A Shah and S Asghar ldquoPrivacy preserving in associationrules using a genetic algorithmrdquo Turkish Journal of ElectricalEngineering amp Computer Sciences vol 22 no 2 pp 434ndash4502014

[21] P Cheng and J-S Pan ldquoUse EMO to protect sensitiveknowledge in association rule mining by adding itemsrdquo inProceedings of the Companion Publication of the 2014 AnnualConference on Genetic and Evolutionary Computationpp 65-66 Vancouver Canada June 2014

[22] P Cheng J-S Pan and C-W Lin ldquoPrivacy preserving as-sociation rule mining using binary encoded NSGA-IIrdquo Lec-ture Notes in Computer Science Springer Berlin Germanypp 87ndash99 2014

[23] P Cheng J-S Pan and C-W L Harbin ldquoUse EMO toprotect sensitive knowledge in association rule mining byremoving itemsrdquo in Proceedings of the 2014 Congress onEvolutionary Computation pp 1108ndash1115 Beijing ChinaJune 2014

[24] P Cheng C W Lin and J S Pan ldquoUse HypE to hide as-sociation rules by adding itemsrdquo PLoS One vol 10 no 6Article ID e0127834 2015

[25] P Cheng C-W Lin J-S Pan and I Lee ldquoManage thetradeoff in data sanitizationrdquo IEICE Transactions on Infor-mation and Systems vol E98D no 10 pp 1856ndash1860 2015

[26] P Cheng J F Roddick S-C Chu and C-W Lin ldquoPrivacypreservation through a greedy distortion-based rule-hidingmethodrdquo Applied Intelligence vol 44 no 2 pp 295ndash3062016

[27] J C-W Lin Y Zhang B Zhang et al ldquoHiding sensitiveitemsets with multiple objective optimizationrdquo Soft Com-puting vol 23 no 23 pp 12779ndash12797 2019

[28] J-S Yeh and P-C Hsu ldquoHHUIF and MSICF novel algo-rithms for privacy preserving utility miningrdquo Expert Systemswith Applications vol 37 no 7 pp 4779ndash4786 2010

[29] J S Yeh P C Hsu and M H Wen ldquoNovel algorithms forprivacy preserving utility miningrdquo in Proceedings of the 8thInternational Conference on Intelligent Systems Design andApplications pp 291ndash296 Kaohsiung Taiwan November2008

[30] R R Rajalaxmi and A M Natarajan ldquoA novel sanitizationapproach for privacy preserving utility itemset miningrdquoComputer and Information Science vol 1 no 3 pp 77ndash822009

[31] C-W Lin T-P Hong H-C Hsu et al ldquoReducing side effectsof hiding sensitive itemsets in privacy preserving data min-ingrdquo e Scientific World Journal vol 2014 Article ID398269 13 pages 2014

[32] U Yun and J Kim ldquoA fast perturbation algorithm using treestructure for privacy preserving utility miningrdquo Expert Sys-tems with Applications vol 42 no 3 pp 1149ndash1165 2015

[33] J C-W Lin T-Y Wu P Fournier-Viger G Lin J Zhan andM Voznak ldquoFast algorithms for hiding sensitive high-utilityitemsets in privacy-preserving utility miningrdquo EngineeringApplications of Artificial Intelligence vol 55 no C pp 269ndash284 2016

[34] J C-W Lin T-Y Wu P Fournier-Viger G Lin T-P Hongand J-S Pan ldquoA sanitization approach of privacy preservingutility miningrdquo Advances in Intelligent Systems and Com-puting Springer Berlin Germany pp 47ndash57 2016

[35] J C-W Lin T-P Hong P Fournier-Viger Q LiuJ-W Wong and J Zhan ldquoEfficient hiding of confidentialhigh-utility itemsets with minimal side effectsrdquo Journal ofExperimental amp eoretical Artificial Intelligence vol 29no 6 pp 1225ndash1245 2017

[36] S Li N Mu J Le and X Liao ldquoA novel algorithm for privacypreserving utility mining based on integer linear program-mingrdquo Engineering Applications of Artificial Intelligencevol 81 pp 300ndash312 2019

[37] R R Rajalaxmi and A M Natarajan ldquoEffective sanitizationapproaches to hide sensitive utility and frequent itemsetsrdquoIntelligent Data Analysis vol 16 no 6 pp 933ndash951 2012

[38] X Liu F Xu and X Lv ldquoA novel approach for hidingsensitive utility and frequent itemsetsrdquo Intelligent DataAnalysis vol 22 no 6 pp 1259ndash1278 2018

[39] B Le D-T Dinh V-N Huynh Q-M Nguyen andP Fournier-Viger ldquoAn efficient algorithm for hiding highutility sequential patternsrdquo International Journal of Approx-imate Reasoning vol 95 pp 77ndash92 2018

[40] L T T Nguyen P Nguyen T D D Nguyen B VoP Fournier-Viger and V S Tseng ldquoMining high-utilityitemsets in dynamic profit databasesrdquo Knowledge-BasedSystems vol 175 pp 130ndash144 2019

[41] S Krishnamoorthy ldquoHMiner efficiently mining high utilityitemsetsrdquo Expert Systems with Applications vol 90pp 168ndash183 2017

Mathematical Problems in Engineering 13

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering

Page 14: AnImprovedSanitizationAlgorithminPrivacy-Preserving ...downloads.hindawi.com/journals/mpe/2020/7489045.pdf · of high-utility itemsets mining is introduced. Section 4 describes the

[42] A Telikani and A Shahbahrami ldquoData sanitization in as-sociation rule mining an analytical reviewrdquo Expert Systemswith Applications vol 96 pp 406ndash426 2018

[43] U Yun and D Kim ldquoAnalysis of privacy preserving ap-proaches in high utility pattern miningrdquo Advances in Com-puter Science and Ubiquitous Computing Springer Singaporepp 883ndash887 2016

[44] S Zida P Fournier-Viger J C-W Lin C-W Wu andV S Tseng ldquoEFIM a highly efficient algorithm for high-utilityitemset miningrdquo Lecture Notes in Computer Science SpringerCham Switzerland pp 530ndash546 2015

14 Mathematical Problems in Engineering