Page 1: SHAPr: An Efficient and Versatile Membership Privacy Risk

SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning

Vasisht Duddu

University of Waterloo

Waterloo, Canada

[email protected]

Sebastian Szyller

Aalto University

Espoo, Finland

[email protected]

N. Asokan

University of Waterloo

Waterloo, Canada

[email protected]

ABSTRACT

Data used to train machine learning (ML) models can be sensitive.

Membership inference attacks (MIAs), attempting to determine

whether a particular data record was used to train an ML model, risk

violating membership privacy. ML model builders need a principled

definition of a metric that enables them to quantify the privacy risk

of (a) individual training data records, (b) independently of specific MIAs, (c) efficiently. None of the prior work on membership privacy

risk metrics simultaneously meets all of these criteria.

We propose such a metric, SHAPr, which uses Shapley values

to quantify a model’s memorization of an individual training data

record by measuring its influence on the model’s utility. This mem-

orization is a measure of the likelihood of a successful MIA.

Using ten benchmark datasets, we show that SHAPr is effective (precision: 0.94±0.06, recall: 0.88±0.06) in estimating susceptibility

of a training data record for MIAs, and is efficient (computable

within minutes for smaller datasets and in ≈90 minutes for the

largest dataset) .

SHAPr is also versatile in that it can be used for other purposes

like assessing fairness or assigning valuation for subsets of a dataset.

For example, we show that SHAPr correctly captures the dispro-

portionate vulnerability of different subgroups to MIAs.

Using SHAPr, we show that the membership privacy risk of a

dataset is not necessarily improved by removing high risk training

data records, thereby confirming an observation from prior work

in a significantly extended setting (in ten datasets, removing up to

50% of data).

KEYWORDS

Membership Inference Attacks, Data Privacy, Deep Learning.

1 INTRODUCTION

Assessing the privacy risks of data is necessary as highlighted by

several official reports from government institutions (NIST [44],

the White House [18], and the UK Information Commissioner’s

Office [24]). Membership inference attacks (MIAs) are a potential

threat to privacy of an individual’s data used for training ML mod-

els [38, 40, 42, 46]. These attacks exploit the difference in a model’s

prediction on training and testing datasets to infer whether a given

data record was used to train that model. For datasets contain-

ing sensitive data, MIAs constitute a privacy threat. For instance,

identifying that a randomly sampled person’s data was used to

train a health-related ML model can allow an adversary to infer

the health status of that person. Hence, measuring the membership privacy risks of training data records is essential for data privacy risk assessment.

Several existing tools, like MLPrivacyMeter [32] and MLDoc-

tor [29] can quantify membership privacy risk. They are based on

measuring the success rate of specific known MIAs [34, 38, 40, 46].

In addition, these attacks use aggregate metrics such as accuracy,

precision and recall over all training data records, and are not de-

signed for quantifying individual, record-level membership privacy

risk. Song and Mittal [42] proposed a record-level probabilistic met-

ric (hereafter referred as SPRS) defined as the likelihood of a data

record being present in the target model’s training dataset. SPRS is

intended to be used by adversaries rather than model builders. It

also relies on specific MIAs.

Ideally, a membership privacy risk metric should capture the root

cause of MIAs, namely the memorization of training data records.

Such a metric will be independent of any specific attack and thus

be applicable to any future MIAs as well. Hence, there is a need

for a principled definition of membership privacy risk metric for

individual training data records.

We present SHAPr, a membership privacy risk metric designed

for model builders. The intuition behind SHAPr is that membership

privacy risk for a model can be estimated by measuring the extent

to which the model memorizes an individual training data record.

This is done by estimating the influence of each training data record

on the model’s utility using the leave-one-out training approach [13,

30]. However, directly using the leave-one-out approach for each

data record is computationally expensive [15, 21–23].

Shapley values is a well-known notion in game theory used

to quantify the contributions of individuals within groups [39]. It

was shown to be capable of capturing the influence of training

data records on a model’s utility by approximating the leave-one-

out training [15, 22]. Crucially, Shapley values can be efficiently

computed in one go for every training data record (see Section 4)

without having to train two models for each training data record (with and without that data record in the training dataset). Shapley

values have been recently used in the context of data valuation in

ML (to estimate economic value of a data record) [14, 15, 21, 22].

They have also been used for estimating attribute influence for

explainability [31]. We propose using Shapley values as the basis to

quantify the membership privacy risks of individual training data

records. We make the following contributions:

(1) SHAPr1, the first efficiently computable metric with a principled

approach (independently of specific MIAs), using Shapley values,

for estimating membership privacy risk for individual training data records. (Section 4)

(2) Showing that

1We plan to open-source our implementation.

arXiv:2112.02230v1 [cs.CR] 4 Dec 2021

Page 2: SHAPr: An Efficient and Versatile Membership Privacy Risk

• SHAPr is effective at assessing susceptibility of a training

data record to state-of-the-art MIAs (0.93 ± 0.06 precision

and 0.88 ± 0.06 recall across ten benchmark datasets). This

is comparable to SPRS (adapted to the model builder’s setting);

(Section 6.1)

• unlike SPRS, SHAPr can correctly estimate how adding noise

to a subset of the training dataset impacts membership pri-

vacy risk, (Section 6.2);

• metrics like SPRS, based on specific MIAs, may not correctly

assess susceptibility to future attacks (Section 6.3); and

• SHAPr scores can be computed significantly more efficiently

than the direct application of the leave-one-out approach.

(Section 6.4)

(3) Demonstrating that SHAPr is versatile in that it

• can be used to estimate fairness of the distribution of mem-

bership privacy risk across subsets of a dataset grouped ac-

cording to different attribute values (Section 7.1), and

• inherits other established applications of Shapley values like

data valuation in data marketplaces [15, 21–23]. (Section 7.2)

(4) Using SHAPr, to show that removing data records with high

membership privacy risk does not necessarily reduce risk for

the remaining data records, confirming the observation by Long

et al. [30], but on a broader scale, using ten large datasets (vs.

one), and exploring the effect of removing up to 50% of training

data records (vs. 2%). (Section 8)

2 BACKGROUND

Consider a training dataset D_tr = {(x_i, y_i)}_{i=1}^n containing input features x_i ∈ X and corresponding classification labels y_i ∈ Y, where X and Y are the spaces of all possible inputs and corresponding labels. A machine learning (ML) classifier is a model f_θ which maps the inputs to the corresponding classification labels, f_θ : X → Y. The model parameters θ are updated by minimizing the loss between the model’s prediction f_θ(x) on input x and the true label y. The loss is minimized using gradient-based algorithms such as stochastic gradient descent.

2.1 Membership Inference Attacks

MIAs differentiate between members and non-members of the train-

ing dataset of a model using the output predictions of that model,

or some function of them. We identify four main types of MIAs

proposed in the literature:

Shadow Models [40]. Shokri et al. proposed the first attack that

uses an ML attack model to distinguish between a member and

non-member based on the predictions of the target model. The

attack assumes that the adversary has auxiliary data including

some training data records used by the target model. This auxiliary

data is used to train multiple shadow models to mimic the utility of

the target model. An attack ML model is then trained to distinguish

between members and non-members using the predictions of the

shadow models. Given a prediction from the target model for an

arbitrary input, the attack model can classify it as a member or a

non-member. This attack has two main drawbacks: first, it assumes

a strong adversary who has partial knowledge about the target

model’s training data, and second it incurs a high computational

overhead due to the need to train multiple (shadow) models.

Prediction Correctness [46]. An alternative approach, that makes

weaker assumptions regarding adversary capabilities, relies on the

fact that models which do not generalize well make correct pre-

dictions on training data records but not on testing data records.

The adversary decides that a data record is a member if it is cor-

rectly predicted by the target model and non-member otherwise.

This attack is particularly applicable in settings where the target

model outputs only a label. However, the attack works for poorly

generalizing models and assumes the adversary knows the ground

truth labels for the data records used to probe the target model.

Prediction Confidence [38, 46]. A third approach uses prediction

confidence reported by the target model across all classes. Given an input data record, the target model outputs a vector describing

the confidence that the record belongs to each class. The maximum

confidence value is likely to be higher for an input data record that

was also part of the training set, than for one that was not [40,

41]. The prediction confidence attack relies on this observation: it

declares an input data record as a member if the associated highest

confidence is higher than an adversary-chosen threshold, and as

a non-member otherwise. Unlike Prediction Correctness attacks,

Prediction Confidence attacks do not require the adversary to have

any knowledge of the target model’s training data or the ground

truth for the input data record. However it assumes that the target

model outputs confidence values for all classes.

Prediction Entropy [38, 40, 42]. Rather than relying on the max-

imum confidence value in the output prediction, an adversary may

resort to a more sophisticated function defined over the set of

confidence values in the prediction. The entropy in a model’s pre-

diction (i.e., information gain for the adversary) is the uncertainty

in predictions [38, 40]. The entropy differs for training and testing

data records which the adversary can use as the basis for deciding

whether an input data record was in the training set. For instance,

the output for a training data record is likely to be close to a one-hot

encoding, resulting in a prediction entropy close to zero. Testing

data records are likely to have higher prediction entropy values. As

with the previous method, the adversary can choose a threshold

for the prediction entropy to decide whether an input data record

is a member or not.

In the rest of this paper, we focus on Prediction Entropy attacks;

we use a modified entropy function as the basis for the state-of-the-

art MIA [42] (described in detail in Section 6.1).

2.2 Memorization in Deep Learning

Membership privacy risk (susceptibility to MIAs) occurs due to the fact that deep learning models, with their inherent large capacity, tend to memorize training data records [12, 34]. Memorization of the i-th data record z_i = (x_i, y_i), with input features x_i and label y_i, in a training dataset D_tr can be estimated as the difference in the prediction of a model on input features x_i when the model is trained with and without z_i in its training set [13]. Formally, for a specific model f_θ drawn from the set of models produced by a training algorithm A, this can be written as:

$$\mathrm{mem}(z_i, D_{tr}, \mathcal{A}) = \Big|\, \Pr_{f_\theta \sim \mathcal{A}(D_{tr})}\big[f_\theta(x_i) = y_i\big] \;-\; \Pr_{f_\theta \sim \mathcal{A}(D_{tr} \setminus \{z_i\})}\big[f_\theta(x_i) = y_i\big] \,\Big| \quad (1)$$
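
To make Equation 1 concrete, the following is a minimal sketch (not from the paper) of how the leave-one-out memorization of a single record could be approximated empirically. Here `train_fn` is a hypothetical training routine returning a fitted model with a scikit-learn-style `predict()`, and the probabilities over A(D) are approximated by averaging over repeated training runs.

```python
import numpy as np

def estimate_memorization(train_fn, X, y, i, n_models=10):
    """Approximate Eq. (1): the gap in the probability that models trained
    with vs. without record i predict its label correctly.
    train_fn(X, y) is assumed to return a fitted model exposing .predict()."""
    mask = np.ones(len(X), dtype=bool)
    mask[i] = False

    def hit_rate(X_tr, y_tr):
        # Average over repeated trainings to approximate Pr_{f ~ A(D)}[f(x_i) = y_i].
        hits = sum(int(train_fn(X_tr, y_tr).predict(X[i:i + 1])[0] == y[i])
                   for _ in range(n_models))
        return hits / n_models

    return abs(hit_rate(X, y) - hit_rate(X[mask], y[mask]))
```

As the text notes, repeating this per record is prohibitively expensive, which motivates the Shapley-value approach below.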


Page 3: SHAPr: An Efficient and Versatile Membership Privacy Risk

If𝑚𝑒𝑚(𝑧𝑖 , 𝐷𝑡𝑟 ,A) is high, the model is likely to have memorized

𝑧𝑖 . The above formulation of memorization is a notion of leave-

one-out stability which captures the extent to which the presence

of a record in the training dataset influences the model’s output

predictions [13].

2.3 Shapley Values

An alternative approach to capture the influence of a training data record is by estimating Shapley values [15, 21–23]. Shapley values are of the form

$$\phi_i = \frac{1}{|D_{tr}|} \sum_{S \subseteq D_{tr} \setminus \{z_i\}} \frac{1}{\binom{|D_{tr}|-1}{|S|}} \big[\, U(S \cup \{z_i\}) - U(S) \,\big] \quad (2)$$

where S is a randomly chosen subset of D_tr \ {z_i} and U(S) (the accuracy of f_θ on a testing dataset D_te when trained on S) is a utility metric. The binomial coefficient counts the number of ways of choosing |S| elements from a set of |D_tr| − 1 elements. Here, the Shapley value of z_i is defined as the average marginal contribution of z_i to U(S) over all training data subsets S ⊆ D_tr \ {z_i}. Evaluating the Shapley function naïvely for all possible subsets with and without z_i is computationally expensive (complexity of O(2^{|D_tr|}) for |D_tr| data records [23]) and not scalable (leading to the same problem as with naïve leave-one-out) [13, 30].

However, Shapley values can be efficiently computed using a K-Nearest Neighbours (K-NN) classifier as a surrogate model [23]. Unlike the naïve approach to computing Shapley values, which requires training two models for each training data record, the K-NN model, once trained, can be used to compute the Shapley values for all training data records. This improves the computational complexity to O(|D_tr| log(|D_tr| · |D_te|)), compared to the exponential complexity of the formulation in Equation 2. We now outline this approach [23]2. For a given z_i, we can first compute the partial contribution φ_i^test of a single test data record z_test to the Shapley value φ_i of z_i, and then add up these partial contributions across the entire D_te.

Step 1. Training the K-NN classifier consists of passing D_tr and a single testing data record z_test = (x_test, y_test) ∈ D_te as input to the target classifier f_θ, which outputs the probability scores across all classes. The outputs f_θ(D_tr) and f_θ(x_test) and their corresponding true labels are used for further computation.

Step 2. During inference, for z_test, the K-NN classifier identifies the top K closest training data records (x_{α_1}, ..., x_{α_K}) with labels (y_{α_1}, ..., y_{α_K}) using the distance between the predictions (f_θ(x_{α_1}), ..., f_θ(x_{α_K})) and f_θ(x_test). We use α_j(S) to denote the index of the training data record, among all data records in S, whose output prediction is the j-th closest to f_θ(x_test). For brevity, α_j(D_tr) is written simply as α_j. Following prior work on data valuation [21, 23], we use K = 5.

Step 3. The K-NN classifier assigns the majority label among the top K training data records as the label for x_test. The probability of the classifier assigning the correct label is P[f_θ(x_test) = y_test] = (1/K) Σ_{i=1}^{K} 1[y_{α_i} = y_test]. Hence, the utility of the classifier with respect to the subset S and the single test data record z_test is computed as

$$U^{test}(S) = \frac{1}{K} \sum_{k=1}^{\min\{K, |S|\}} \mathbb{1}\big[y_{\alpha_k(S)} = y_{test}\big]$$

2Source code: https://github.com/AI-secure/Shapley-Study

Step 4. Consider all data records in D_tr sorted as above: {..., z_{α_{i−1}}, z_{α_i}, z_{α_{i+1}}, ...}. From Equation 2, the difference between the partial contributions of two adjacent data records z_{α_i}, z_{α_{i+1}} ∈ D_tr is given by

$$\phi^{test}_{\alpha_i} - \phi^{test}_{\alpha_{i+1}} = \frac{1}{|D_{tr}| - 1} \sum_{S \subseteq D_{tr} \setminus \{z_{\alpha_i}, z_{\alpha_{i+1}}\}} \frac{U^{test}(S \cup \{z_{\alpha_i}\}) - U^{test}(S \cup \{z_{\alpha_{i+1}}\})}{\binom{|D_{tr}|-2}{|S|}} \quad (3)$$

Using the K-NN utility function, $U^{test}(S \cup \{z_{\alpha_i}\}) - U^{test}(S \cup \{z_{\alpha_{i+1}}\}) = \frac{\mathbb{1}[y_{\alpha_i} = y_{test}] - \mathbb{1}[y_{\alpha_{i+1}} = y_{test}]}{K}$.

Once the label for x_test is assigned, the partial contributions can be computed recursively, starting from the farthest data record:

$$\phi^{test}_{\alpha_{|D_{tr}|}} = \frac{\mathbb{1}[y_{\alpha_{|D_{tr}|}} = y_{test}]}{|D_{tr}|} \quad (4)$$

$$\phi^{test}_{\alpha_i} = \phi^{test}_{\alpha_{i+1}} + \frac{\mathbb{1}[y_{\alpha_i} = y_{test}] - \mathbb{1}[y_{\alpha_{i+1}} = y_{test}]}{K} \cdot \frac{\min\{K, i\}}{i} \quad (5)$$

The fraction min{K, i}/i is obtained by simplifying the binomial coefficient (the full derivation can be found in Theorem 1 of Jia et al. [21]). The intuition behind Equation 5 is that the contribution of z_{α_i} is 0 if the nearest neighbour of z_{α_i} in S is closer to z_test than z_{α_i}, and 1 otherwise. Using the above steps, we get a vector φ^test of size |D_tr| × 1 for each z_test. The recursive formulation in Equation 5 can be extended across all of D_te to obtain a matrix [φ_i^test] of size |D_tr| × |D_te|. The final Shapley values are obtained by aggregating the partial contributions φ_i^test across D_te.
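
The following is a minimal NumPy sketch of the recursion in Equations 4 and 5, assuming the target model’s output vectors for D_tr and D_te are already available. The final aggregation over D_te is taken here as a mean, which is an assumption; the implementation referenced in footnote 2 should be treated as authoritative.

```python
import numpy as np

def knn_shapley(train_preds, y_train, test_preds, y_test, K=5):
    """Sketch of the K-NN surrogate computation (Eqs. 4 and 5).
    train_preds/test_preds: target-model output vectors for D_tr and D_te.
    Returns one score per training record, aggregated over D_te."""
    n, m = len(y_train), len(y_test)
    phi = np.zeros((n, m))
    for t in range(m):
        # Step 2: rank training records by distance between model outputs.
        dist = np.linalg.norm(train_preds - test_preds[t], axis=1)
        order = np.argsort(dist)                      # alpha_1 ... alpha_n
        match = (y_train[order] == y_test[t]).astype(float)
        contrib = np.zeros(n)
        contrib[n - 1] = match[n - 1] / n             # Eq. (4): farthest record
        for i in range(n - 2, -1, -1):                # Eq. (5): walk inwards
            contrib[i] = contrib[i + 1] + \
                (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
        phi[order, t] = contrib
    return phi.mean(axis=1)  # aggregate partial contributions across D_te
```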

2.4 Song and Mittal’s Privacy Risk Scores

Song and Mittal [42] describe a membership privacy risk metric3 (which we refer to as SPRS) that defines the membership privacy risk score of z_i = (x_i, y_i) as the posterior probability that z_i ∈ D_tr given the output prediction of the model, f_θ(x_i). They compute the score as r(z_i) = P(z_i ∈ D_tr | f_θ(x_i)). This probability is computed using Bayes’ theorem as

$$r(z_i) = \frac{P(z_i \in D_{tr})\, P(f_\theta(x_i) \mid z_i \in D_{tr})}{P(z_i \in D_{tr})\, P(f_\theta(x_i) \mid z_i \in D_{tr}) + P(z_i \in D_{te})\, P(f_\theta(x_i) \mid z_i \in D_{te})} \quad (6)$$

They assume that a data record is equally likely to belong to the training or the testing dataset: P(z_i ∈ D_tr) = P(z_i ∈ D_te) = 0.5. The membership privacy risk scores rely on training shadow models on an auxiliary dataset to mimic the functionality of the target model. The conditional probabilities P(f_θ(x_i) | z_i ∈ D_tr) and P(f_θ(x_i) | z_i ∈ D_te) are then computed using the shadow models’ output predictions on the auxiliary training and testing datasets. Further, instead of a prediction entropy MIA with a single fixed threshold, each class has its own threshold for deciding a data record’s membership, computed using the auxiliary dataset. The conditional probabilities are estimated per class: P(f_θ(x_i) | z_i ∈ D_tr) = {P(f_θ(x_i) | z_i ∈ D_tr, y = y_i)} across all class labels y = y_i.

Traditional MIAs require the adversary to sample arbitrary data

records to infer their membership status. SPRS is designed as a tool

for adversaries to identify data samples which are more likely to

be members instead of sampling a large number of data records.

3Source Code: https://github.com/inspire-group/membership-inference-evaluation/

blob/master/privacy_risk_score_utils.py
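
For illustration only, a rough sketch of Equation 6 using histogram estimates of the class-conditional distributions of an attack statistic (e.g., the modified prediction entropy) on shadow-model members and non-members. The variable names and the histogram-based density estimation are assumptions here, not taken from [42]; the authors’ released code (footnote 3) is authoritative.

```python
import numpy as np

def sprs_scores(stat, y, stat_shadow_in, y_shadow_in,
                stat_shadow_out, y_shadow_out, bins=30):
    """Rough sketch of Eq. (6): per-class Bayes posterior that a record is a
    member, with P(member) = P(non-member) = 0.5 and class-conditional
    densities estimated from shadow members (in) and non-members (out)."""
    scores = np.zeros(len(stat))
    for c in np.unique(y):
        s_in = stat_shadow_in[y_shadow_in == c]
        s_out = stat_shadow_out[y_shadow_out == c]
        edges = np.linspace(min(s_in.min(), s_out.min()),
                            max(s_in.max(), s_out.max()) + 1e-9, bins + 1)
        p_in, _ = np.histogram(s_in, edges, density=True)
        p_out, _ = np.histogram(s_out, edges, density=True)
        idx = np.clip(np.digitize(stat[y == c], edges) - 1, 0, bins - 1)
        num = 0.5 * p_in[idx]
        scores[y == c] = num / (num + 0.5 * p_out[idx] + 1e-12)
    return scores
```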


Page 4: SHAPr: An Efficient and Versatile Membership Privacy Risk

3 PROBLEM STATEMENT

Our goal is to design an efficient metric for model builders to assess

membership privacy risk, independent of specific MIAs. We lay

out the system and adversary models (Section 3.1), describe the

desiderata for such a metric (Section 3.2), and outline the challenges

in designing such a metric by discussing limitations of prior work

(Section 3.3).

3.1 System and Adversary Model

System Model. We consider the perspective of a model builder

M who trains a model using a dataset contributed to by multiple

participants.M wants to estimate the susceptibility of individual

data records to MIAs.M has full access to the training (𝐷𝑡𝑟 ) and

testing (𝐷𝑡𝑒 ) datasets and can use them to compute the membership

privacy risk score.

Adversary Model. The ground truth for the membership privacy

risk metric for a given training data record is the degree to which

an actual state-of-the-art MIA [38, 40, 42, 46] succeeds against that

record. We adapt the standard adversary model for MIAs [40, 42].

The adversary A has access to the prediction interface of a model

𝑓\ built using a training dataset 𝐷𝑡𝑟 . A submits data records via

the prediction interface and receives model outputs (this is a widely

adopted setting for cloud-based ML models in the industry). Given

an input data record 𝑥 , A can only observe the final output pre-

diction 𝑓\ (𝑥). The MIAs considered use the full prediction vec-

tor [40, 42] instead of the prediction labels [11, 26]. A does not

know the underlying target model architecture. Furthermore, we

assume that A does not know 𝐷𝑡𝑟 but has access to an auxiliary

dataset 𝐷𝑎𝑢𝑥 sampled from the same distribution as 𝐷𝑡𝑟 .

3.2 Requirements

We identify three requirements for an effective record-level mem-

bership privacy risk metric:

R1 Principled Definition. Membership privacy risk scores re-

sulting from the metric must be independent of specific MIAs,

thereby allowing the scores to assess the risks associated with

any MIA, current or future. (Sections 4 and 6.3)

R2 Correlation with attack success. The estimated risk score

of a particular data record must correlate with the likelihood

of success of the best MIA against that record. (Sections 6.1

and 6.2).

R3 Versatility. The membership privacy risk score, once com-

puted, should be useful to estimate other characteristics of the

dataset such as fairness (defined as susceptibility of subgroups,

as determined by one or more attributes, to MIAs; Section 7.1)

and economic value (as defined in prior work on Shapley scores;

Section 7.2).

3.3 Limitations of Existing Metrics

Privacy assessment libraries such as MLPrivacyMeter [32] and ML-

Doctor [29] quantify the membership privacy risk using existing

MIAs. They use aggregatemetrics such as accuracy, precision and re-

call for MIAs across all training data records, and are not optimized

for estimating the privacy risks of individual data records [42].

Song and Mittal propose SPRS which is a probabilistic mem-

bership privacy risk metric for individual data records [42]. SPRS

estimates use a specific MIA. The more effective the attack against

a particular data record, the higher the score. Hence, it is unclear

whether SPRS reflects vulnerability to future, more effective attacks.

Additionally, SPRS does not satisfy the versatility requirement R3

(c.f. Sections 6.2 and 7.1).

Long et al. [30] propose Differential Training Privacy as a mem-

bership privacy metric based on the direct leave-one-out approach:

computing the difference between model predictions with and with-

out a given training record in the training dataset and hence, the

influence of that record on the model utility. However, as we saw in

Section 2.3, naïve application of the leave-one-out approach cannot

scale to large datasets and models since it requires retraining the

model to estimate the score for each data record.

4 SHAPR: A MEMBERSHIP PRIVACY RISK

METRIC

Shapley values (Section 2.3), originally designed as a game-theoretic

notion to quantify the contributions of individuals within groups [39],

were previously proposed for data valuation [14, 15, 21, 22] and

explainability [31]. We propose, SHAPr: using Shapley values to es-

timate individual record-level membership privacy risk associated

with an ML model. SHAPr scores are computed using a 𝐾-NN clas-

sifier trained on the output predictions of the target ML classifier.

SHAPr scores inherit certain properties from Shapley values

which allows SHAPr to satisfy the requirements introduced in

Section 3. In the context of membership privacy risk scores, these

properties can be formulated as follows:

P1 Interpretable. The SHAPr score φ_i of a data record z_i = (x_i, y_i) is obtained by measuring how z_i’s addition to a training dataset S influences the utility U(·) of the resulting model (Equation 2). Consequently, no influence (i.e., U(S) = U(S ∪ {z_i})) leads to a zero score. Similarly, if two data records z_i and z_j have the same influence (i.e., U(S ∪ {z_i}) = U(S ∪ {z_j})), then they are assigned the same score. We can identify three ranges of SHAPr scores that have associated semantics:

(a) Case 1: U(S ∪ {z_i}) = U(S) → φ = 0: there is no difference in the model’s output regardless of the presence of z_i in the training dataset; z_i has no membership privacy risk.

(b) Case 2: 𝑈 (𝑆 ∪ {𝑧𝑖 }) > 𝑈 (𝑆) → 𝜙 > 0: 𝑧𝑖 contributed to

increasing the model utility. Higher scores indicate higher

likelihood of memorization which increases the suscepti-

bility to MIAs.

(c) Case 3: 𝑈 (𝑆 ∪ {𝑧𝑖 }) < 𝑈 (𝑆) → 𝜙 < 0: 𝑧𝑖 was harmful

to the model’s utility (not learnt well by the model or is

an outlier). It has a higher loss and is indistinguishable

from testing data records which makes it less susceptible

to MIAs.

This clear semantic association allows us to set meaningful

thresholds for SHAPr scores that can be used to decide whether

a data record is susceptible to MIAs.

P2 Additive. φ_i is computed using the testing dataset D_te. Specifically, φ_i(U_k) represents the influence of z_i on the utility U(·) w.r.t. the k-th testing data record. For two testing data records k and l, U_i({k, l}) = U_i(k) + U_i(l). Hence, φ_i is the sum of the member-

ship privacy risk scores of 𝑧𝑖 with respect to each testing data

record. This property further implies group rationality where


Page 5: SHAPr: An Efficient and Versatile Membership Privacy Risk

the entire model utility is fairly and completely distributed

amongst all the training data records.

P3 Heterogeneous. Different data records have different influ-

ence on the model’s utility and hence, have varying suscepti-

bility to MIAs (referred to as “heterogeneity"). SHAPr assigns

scores to training data records based on their individual in-

fluence on the model’s utility. This is referred to as equitable distribution of utility among the training data records in prior

work [22].

We will refer back to these properties while interpreting the

results of our experiments (Sections 6 and 7).

5 EXPERIMENT SETUP

Using several datasets (Section 5.1), we systematically evaluated the effectiveness of SHAPr with respect to the requirements identified in Section 3. In the rest of this section, we describe the model architecture (Section 5.2), the state-of-the-art MIA (Section 5.3), and the metrics (Section 5.4) we used in our experiments.

5.1 Datasets

We used ten datasets for our experiments. We used the same number

of training and testing data records from all the datasets except in

the case of MNIST and FMNIST where we used the entire dataset to

ensure the utility of the resulting model is sufficiently high. Three

datasets, TEXAS, LOCATION and PURCHASE, were also used to

evaluate SPRS [42] – we refer to them as SPRS datasets. To facilitate

comparison with SPRS, we used the same dataset partitions for the

three SPRS datasets as described in [42]. We summarize the dataset

partitions in Table 1.

Table 1: Summary of dataset partitions for our experiments.

Dataset Training Set Size Testing Set Size

SPRS Datasets

LOCATION 1000 1000

PURCHASE 19732 19732

TEXAS 10000 10000

Additional Datasets

MNIST 60000 10000

FMNIST 60000 10000

USPS 3000 3000

FLOWER 1500 1500

MEPS 7500 7500

CREDIT 15000 15000

CENSUS 24000 24000

We briefly describe each of the ten datasets, starting with the

SPRS datasets:

LOCATION contains the location check-in records of individu-

als [4]. We used the pre-processed dataset from [40] which con-

tains 5003 data samples with 446 binary features corresponding to

whether an individual has visited a particular location. The data is

divided into 30 classes representing different location types. The

classification task is to predict the location type given the location

check-in attributes of individuals. As in prior work [20, 42], we

used 1000 training data records and 1000 testing data records.

PURCHASE consists of shopping records of different users [5].

We used a pre-processed dataset from [40] containing 197,324 data

records with 600 binary features corresponding to a specific product.

Each record represents whether an individual has purchased the

product or not. The data has 100 classes each representing the

purchase style for the individual record. The classification task is

to predict the purchase style given the purchase history. We used

19,732 train and test records as in prior work [42].

TEXAS consists of Texas Department of State Health Services’ in-

formation about patients discharged from public hospitals [6]. Each

data record contains information about the injury, diagnosis, the

procedures the patient underwent and some demographic details.

We used the pre-processed version of the dataset from [40] which

contains 100 classes of patient’s procedures consisting 67,330 data

samples with 6,170 binary features. The classification task is to

predict the procedure given patient’s attributes. We used 10,000

train and test records as in prior work [20, 42].

Additionally, we used seven other datasets described below. We

rounded down the number of training data records in each dataset to

the nearest 1000 and split it in half between training and testing

datasets.

MNIST consists of a training dataset of 60,000 images and a test

dataset of 10,000 images that represent handwritten digits (0-9).

Each data record is a 28x28 grayscale image with a corresponding

class label identifying the digit. The classification task is to identify

the handwritten digits. We used the entire training and testing set.

FMNIST consists of a training dataset of 60,000 data records and a

test dataset of 10,000 data records that represent pieces of clothing.

Each data record is a 28x28 grayscale image with a corresponding

class from one of ten labels. The classification task is to identify the

piece of clothing.

USPS consists of 7291 16x16 grayscale images of handwritten dig-

its [7]. There are a total of 10 classes. The classification task is to

identify the handwritten digits. We used 3000 training data records

and 3000 testing data records.

FLOWER consists of 3670 images of flowers categorized into five

classes—chamomile, tulip, rose, sunflower, and dandelion—with

each class having about 800 320x240 images. The dataset was col-

lected from Flickr, Google Images and Yandex Images [3]. The

classification task is to predict the flower category given an image.

We used 1500 train and 1500 testing data records.

CREDIT is an anonymized dataset from the UCI Machine Learning

dataset repository which contains 30000 records with 24 attributes

for each record [2]. It contains information about different credit

card applicants, including a sensitive attribute: the gender of the

applicant. There are two classes indicating whether the application

was approved or not. The classification task is to predict whether

the applicant will default. We used 15000 training data records and

15000 testing data records.

MEPS contains 15830 records of different patients that used medi-

cal services, and captures the frequency of their visits. Each data

record includes the gender of the patient, which is considered a sen-

sitive attribute. The classification task is to predict the utilization

of medical resources as “High” or “Low” based on whether the total number of patient visits is greater than 10. We used 7500 training

data records and 7500 testing data records.


Page 6: SHAPr: An Efficient and Versatile Membership Privacy Risk

CENSUS consists of 48842 data records with 103 attributes about

individuals from the 1994 US Census data obtained from UCI Ma-

chine Learning dataset repository [1]. It includes sensitive attributes

such as gender and race of the participant. Other attributes include

marital status, education, occupation, job hours per week among

others. The classification task is to estimate whether the individ-

ual’s annual income is at least 50,000 USD. We used 24000 training

data records and 24000 testing data records.

5.2 Model Architecture

While the proposed SHAPr scores are compatible with all types of

machine learning models, we focus on deep neural networks in our

evaluation. We used a fully connected model with the following

architecture: [1024, 512, 256, 128, 𝑛] with tanh() activation functions

where 𝑛 is the number of classes. This model architecture has been

used in prior work on MIAs [20, 33, 34, 38, 40, 42].
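
A minimal sketch of this architecture, assuming PyTorch (the paper does not prescribe a framework in this passage) and an input dimension `input_dim`:

```python
import torch.nn as nn

def make_target_model(input_dim: int, n_classes: int) -> nn.Sequential:
    """Fully connected classifier with hidden layers [1024, 512, 256, 128]
    and tanh() activations, as described in Section 5.2."""
    dims = [input_dim, 1024, 512, 256, 128]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.Tanh()]
    layers.append(nn.Linear(128, n_classes))  # output logits for n classes
    return nn.Sequential(*layers)
```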

5.3 Membership Inference Attack

We used a modification of the prediction entropy attack, as proposed by Song and Mittal [42]. The original prediction entropy (described in Section 2.1) outputs zero entropy both for data records classified correctly with high confidence and for those misclassified with high confidence. For a given data record (x, y), the modified entropy function

$$\mathrm{Mentr}(f_\theta(x), y) = -\big(1 - f_\theta(x)_y\big)\log\big(f_\theta(x)_y\big) - \sum_{i \neq y} f_\theta(x)_i \log\big(1 - f_\theta(x)_i\big)$$

accounts for this problem. Here, f_θ(x)_y denotes the model’s predicted probability for the correct label y on record x. A thresholds the modified prediction entropy to determine membership status: I_ment(f_θ(x), y) = 1{Mentr(f_θ(x), y) ≤ τ_y}. Instead of using a fixed threshold of 0.5 over the prediction confidence, as in the original prediction entropy attack (Section 2.1), the thresholds τ_y are adapted for each class using shadow models trained on the auxiliary dataset to improve the attack accuracy. This modified entropy function, along with the adaptive thresholds, gives the best attack accuracy [42]. We refer to this attack as I_ment.
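
A small sketch of Mentr and the per-class thresholding, assuming the model’s softmax outputs are available as a NumPy array and that the class thresholds τ_y have already been derived from shadow models:

```python
import numpy as np

def modified_entropy(probs, labels, eps=1e-12):
    """Mentr(f(x), y) = -(1 - p_y) log(p_y) - sum_{i != y} p_i log(1 - p_i)."""
    probs = np.clip(probs, eps, 1 - eps)
    n = len(labels)
    p_y = probs[np.arange(n), labels]
    ent = -(1 - p_y) * np.log(p_y)
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(n), labels] = False        # exclude the true-label entry
    ent -= np.sum(np.where(mask, probs * np.log(1 - probs), 0.0), axis=1)
    return ent

def i_ment(probs, labels, class_thresholds):
    """Declare membership when Mentr falls below the class-specific threshold
    tau_y (thresholds assumed to come from shadow models)."""
    taus = np.asarray([class_thresholds[y] for y in labels])
    return modified_entropy(probs, labels) <= taus
```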

5.4 Metrics

For all the experiments, we used accuracy of MIAs as the primary

metric along with the average membership privacy risk score. At-

tack accuracy is the percentage of data records with correct mem-

bership predictions by the attack model. Average membership

privacy risk score is the average over the membership privacy

risk scores assigned to training data records by a metric to evaluate

the membership privacy risk across a group of data records.

As in prior work [42], we used two additional metrics to measure

the success of the attack: precision and recall. Precision is the ratio

of true positives to the sum of true positives and false positives. This indicates the fraction of data records inferred as members by A which are indeed members of the training dataset. Recall is the ratio

of true positives to the sum of true positives and false negatives.

This indicates the fraction of the training dataset’s members which

are correctly inferred as members by the A.

6 COMPARATIVE EVALUATION

We experimentally evaluated the effectiveness of SHAPr and SPRS

along two dimensions: their ability to (a) capture susceptibility to

MIAs (Section 6.1), (b) correctly assess the effectiveness of adding

noise to thwart MIAs (Section 6.2). We then tested how well SPRS

would perform against future MIAs by computing SPRS scores us-

ing an older MIA, and examining how that impacts the effectiveness

of SPRS in assessing susceptibility to the state-of-the-art MIA (Sec-

tion 6.3). Finally, we report on the performance cost of computing

SHAPr scores (Section 6.4).

Table 2 presents the baselines: the test accuracy of target models

trained with each dataset, and the attack accuracy of the state-of-the

art MIA (modified entropy attack, I𝑚𝑒𝑛𝑡 ) against each model.

Table 2: Test accuracy of target models for each dataset, and

corresponding attack accuracy using I𝑚𝑒𝑛𝑡 .

Dataset Test Accuracy I𝑚𝑒𝑛𝑡

SPRS Datasets

LOCATION 69.0 79.6

PURCHASE 84.65 65.1

TEXAS 49.92 83.0

Additional Datasets

MNIST 98.1 54.3

FMNIST 89.3 58.0

USPS 95.5 54.9

FLOWER 89.6 61.1

MEPS 84.0 62.4

CREDIT 79.9 58.6

CENSUS 82.2 56.5

6.1 Susceptibility to MIAs

Our goal was to evaluate how well the scores from a given met-

ric correlate with the success rate of MIAs. For SPRS, we used a

threshold of 0.5 that was shown to be optimal (according to the F1-

score [42]); data records with scores > 0.5 were deemed members.

SHAPr scores were thresholded at 0 (Property P1); data records

with scores > 0 were considered members. We used the results

of 𝐼𝑚𝑒𝑛𝑡 attacks as the ground truth for computing precision and

recall in order to assess both metrics.
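
A minimal sketch of this evaluation, assuming boolean membership predictions from each metric (SHAPr scores thresholded at 0, SPRS scores at 0.5) and the per-record success of I_ment as ground truth:

```python
import numpy as np

def precision_recall(pred_member, attack_member):
    """Precision and recall of a risk metric's member predictions against
    the ground truth given by the I_ment attack's per-record success."""
    pred = np.asarray(pred_member, dtype=bool)
    truth = np.asarray(attack_member, dtype=bool)
    tp = np.sum(pred & truth)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return precision, recall

# e.g. precision_recall(shapr_scores > 0.0, iment_success)
#      precision_recall(sprs_scores > 0.5, iment_success)
```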

For each dataset, we repeated the experiment ten times. We re-

port the mean precision, recall and their corresponding standard

deviations (Table 3). The results are color-coded: 1) orange in-

dicates that SPRS and SHAPr are comparable (similar mean and

small standard deviation); 2) red indicates that SPRS outperformed

SHAPr; and 3) green indicates that SHAPr outperformed SPRS.

Additionally, we report the statistical significance of this difference

(the corresponding p-value of a Student’s t-test). Our null hypothesis was that both sets of results came from distributions with the same mean. For p-value < 0.05, there is enough evidence to say that one metric outperforms the other. For p-value < 0.01, the confidence with which we can reject the null hypothesis is even stronger. Oth-

erwise (p-value > 0.05) we do not have enough evidence to say

that one metric consistently outperformed the other.
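
A sketch of this significance test, assuming SciPy’s independent two-sample t-test over the per-run results of the two metrics:

```python
from scipy import stats

def compare_metrics(results_a, results_b, alpha=0.05):
    """Two-sample t-test over repeated runs; the null hypothesis is that both
    metrics' results come from distributions with the same mean."""
    _, p = stats.ttest_ind(results_a, results_b)
    return p, p < alpha  # p < 0.05: evidence that one metric outperforms the other
```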

SHAPr has comparable precision to SPRS for six datasets, and

clearly outperformed SPRS for two (TEXAS and MEPS). On the

other hand, SPRS has better precision for two datasets (CREDIT

and CENSUS). In terms of recall, SHAPr outperformed SPRS scores


Page 7: SHAPr: An Efficient and Versatile Membership Privacy Risk

Table 3: SHAPr correlates with the success rate of MIAs. Neither metric consistently outperforms the other. Orange indicates comparable results, red indicates SPRS outperforms SHAPr, and green indicates SHAPr outperforms SPRS.

Dataset    | Metric | Precision    | p-value | Recall       | p-value

SPRS Datasets
LOCATION   | SPRS   | 0.96 ± 1e-16 | >0.05   | 0.93 ± 1e-16 | <0.01
           | SHAPr  | 0.96 ± 0.000 |         | 0.85 ± 0.000 |
PURCHASE   | SPRS   | 0.95 ± 1e-16 | >0.05   | 0.80 ± 0.000 | <0.01
           | SHAPr  | 0.95 ± 1e-16 |         | 0.81 ± 0.000 |
TEXAS      | SPRS   | 0.92 ± 1e-16 | <0.01   | 0.95 ± 0.000 | <0.01
           | SHAPr  | 0.96 ± 1e-16 |         | 0.74 ± 1e-16 |

Additional Datasets
MNIST      | SPRS   | 0.99 ± 0.002 | <0.01   | 0.57 ± 0.013 | <0.01
           | SHAPr  | 0.99 ± 8e-4  |         | 0.94 ± 0.001 |
FMNIST     | SPRS   | 0.99 ± 0.005 | 0.05    | 0.98 ± 0.026 | <0.01
           | SHAPr  | 0.99 ± 0.005 |         | 0.89 ± 0.026 |
USPS       | SPRS   | 0.79 ± 0.201 | 0.84    | 0.76 ± 0.074 | <0.01
           | SHAPr  | 0.77 ± 0.230 |         | 0.98 ± 0.009 |
FLOWER     | SPRS   | 0.98 ± 0.010 | 0.81    | 0.81 ± 0.040 | <0.01
           | SHAPr  | 0.98 ± 0.010 |         | 0.94 ± 0.008 |
MEPS       | SPRS   | 0.96 ± 1e-16 | <0.01   | 0.99 ± 0.000 | <0.01
           | SHAPr  | 0.97 ± 1e-16 |         | 0.91 ± 1e-16 |
CREDIT     | SPRS   | 0.94 ± 0.006 | <0.01   | 0.81 ± 2e-4  | <0.01
           | SHAPr  | 0.89 ± 0.004 |         | 0.92 ± 0.002 |
CENSUS     | SPRS   | 0.98 ± 0.000 | <0.05   | 1.00 ± 0.000 | <0.05
           | SHAPr  | 0.93 ± 0.000 |         | 0.84 ± 0.000 |

for five datasets, while SPRS outperformed SHAPr for the other

five.

It is worth noting that for a metric used to assess membership pri-

vacy risk, recall is more important than precision because minimiz-

ing false negatives (i.e., failing to correctly identify a training data

record at risk) is undesirable from a privacy perspective, whereas

false positives (i.e., incorrectly flagging a record as risky) constitute erring on the safe side. Both metrics performed comparably

in terms of recall.

Furthermore, we repeated the same experiment under the system

model used in [42] – the datasets used to compute the scores did not

overlap with the training data of the target model – and confirmed

that both metrics still perform comparably (See Appendix B).

6.2 Effectiveness of Adding Noise

A seemingly plausible way to thwart MIAs is to add noise to (“per-

turb") data records before training the model. The rationale is that

the adversary A, who wants to check the presence of a data record

in the training data, is likely to be thwarted because the A cannot

know what perturbation was added to that record.

We divided the original training set (“No Noise”) into two subsets

of equal size: 1) a clean subset without any noise and 2) a noisy sub-

set with perturbed samples. We crafted the noise using FGSM [16] and tested two noise magnitudes, ε ∈ {64/255, 128/255} (under ℓ∞).4

In the experiment, A used an auxiliary dataset D_aux that was identical to the original “No Noise” dataset used to mount the attack. We assumed this complete overlap between M’s and A’s data as it corresponds to the scenario in which M is trying to estimate the privacy risk of their own model.

4Adding Gaussian noise led to similar behavior as with FGSM.

[Figure 1: eight panels — (a) MNIST (SPRS), (b) MNIST (SHAPr), (c) MNIST (SPRS score distribution), (d) MNIST (SHAPr score distribution), (e) USPS (SPRS), (f) USPS (SHAPr), (g) USPS (SPRS score distribution), (h) USPS (SHAPr score distribution) — showing privacy risk scores and attack accuracy for the clean and noisy subsets at each noise level.]

Figure 1: Effect of adding noise. The grey lines for attack accuracy, forming a “sideways-V” shape, act as the ground truth. Ideally, the privacy risk scores (black lines) should exhibit the same shape. This is observed for SHAPr but not for SPRS.
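
A minimal sketch of the FGSM perturbation used to build the noisy subset, assuming a differentiable PyTorch model and inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """FGSM [16]: perturb inputs by eps * sign(grad of the loss w.r.t. x).
    Used here only to create the perturbed ("noisy") half of the training set."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_noisy = x + eps * x.grad.sign()
    return x_noisy.clamp(0.0, 1.0).detach()

# e.g. eps = 64/255 or 128/255, as in Section 6.2
```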

Page 8: SHAPr: An Efficient and Versatile Membership Privacy Risk

When a subset of the data was perturbed, the attack accuracy of

the clean subset increased while that for the noisy subset decreased.

This “sideways-V” shape shown with gray lines in Figure 1 for the

attack accuracy acts as the ground truth.

Due to space constraints, we present the results only for two

datasets: MNIST and USPS. Nevertheless, we observed the same

trend for other datasets (see Appendix D). Adding noise reduces

the average privacy risk score of the noisy subset (Figure 1). These

noisy samples are more difficult to learn and contribute negatively

to the model utility. Hence, their corresponding SHAPr score is

lower. The more noise we add, the lower the SHAPr score, and the

lower the attack accuracy. However, as a result, clean data points

in D𝑎𝑢𝑥 become more vulnerable to MIAs as they become more

influential to the utility of the model.

The scores of a good membership privacy risk metric should

exhibit the same “sideways-V” shape as the ground truth. We note

that unlike SPRS scores, SHAPr satisfies this condition. For SPRS,

the average score is not impacted by the added noise. Also, there is

no consistent correlation between the score and the attack accuracy.

The lack of sensitivity of SPRS to training data noise can be attrib-

uted to clustering of SPRS scores around 0.5 indicating indecisive

membership (Figure 1 (c) and (g)). SHAPr scores are fine-grained

and heterogeneous (Property P3) which make them sensitive to

noise added to training data records.

6.3 Is SPRS future-proof?

Recall that to satisfy requirement R1, we compute SHAPr scores

independently of any particular MIA. SPRS on the other hand uses

the best known attack for computing scores. In the preceding sec-

tions, we used the same state-of-the-art MIA (𝐼𝑚𝑒𝑛𝑡 ) for computing

the ground truth as well as for computing SPRS scores.

To fairly assess whether the reliance on a specific attack will

affect the ability of SPRS to correctly assess susceptibility to future

MIAs, we simulated a “future proofness scenario”. We used 𝐼𝑚𝑒𝑛𝑡 to

compute the ground truth as before (simulating “future”), but used

the original prediction entropy function to generate SPRS scores

(simulating “past”).

Table 4 summarizes the results. The "Simulated" corresponds

to a "past" where SPRS scores were computed using the original

prediction entropy attack, and the baseline is the same as in Table 3, where

A uses the modified prediction entropy attack as the MIA. As

mentioned previously, recall is the most important measure for a

membership privacy metric. We see that in all but two datasets, the

recall for SPRS scores drops sharply in the simulated past setting.

This indicates that SPRS scores may not be effective at assessing

susceptibility to unknown, future MIAs as SPRS relies on specific

MIAs for computing the scores.

6.4 Performance Evaluation of SHAPr

Next, we show that SHAPr scores can be computed in reasonable

time. Table 5 shows the average execution time for computing

SHAPr scores across datasets of different sizes over ten runs. We

ran the evaluation on Intel Core i9-9900K CPU @ 3.60GHz with

65.78GB memory. We use the python metric time() in time library

which returns the time in seconds (UTC) since epoch start.

Table 4: Assessing “future proofness” of SPRS. The base-

line setting is the same as the SPRS scores reported in Ta-

ble 3. The “simulated” indicates the “past” setting to com-

pute SPRS scores using the original prediction entropy func-

tion. Red indicates datasets for which the recall of SPRS scores drops sharply in the simulated “past” setting.

Dataset Metric Precision Recall

SPRS Datasets

LOCATION

Baseline 0.96 ± 1e-16 0.93 ± 1e-16

Simulated 0.95 ± 1e-16 0.97 ± 1e-16

PURCHASE

Baseline 0.95 ± 1e-16 0.80 ± 0.000

Simulated 0.99 ± 1e-16 0.50 ± 1e-16

TEXAS

Baseline 0.92 ± 1e-16 0.95 ± 0.000

Simulated 0.94 ± 6e-4 0.79 ± 0.002

Additional Datasets

MNIST

Baseline 0.99 ± 0.002 0.57 ± 0.013

Simulated 0.99 ± 0.001 0.56 ± 0.028

FMNIST

Baseline 0.99 ± 0.005 0.98 ± 0.026

Simulated 1.0 ± 0.000 0.64 ± 0.035

USPS

Baseline 0.79 ± 0.201 0.76 ± 0.074

Simulated 0.86 ± 0.160 0.64 ± 0.050

FLOWER

Baseline 0.98 ± 0.010 0.81 ± 0.040

Simulated 0.99 ± 0.006 0.66 ± 0.094

MEPS

Baseline 0.96 ± 1e-16 0.99 ± 0.000

Simulated 0.94 ± 0.001 0.67 ± 6e-4

CREDIT

Baseline 0.94 ± 0.006 0.81 ± 2e-4

Simulated 0.79 ± 0.032 0.39 ± 0.038

CENSUS

Baseline 0.98 ± 0.000 1.00 ± 0.000

Simulated 0.99 ± 1e-16 0.28 ± 0.000

Table 5: Performance of SHAPr across different datasets.

Dataset    | # Records | # Features | Execution Time (s)

SPRS Datasets

LOCATION 1000 446 130.77 ± 3.90

PURCHASE 19732 600 3065.58 ± 19.24

TEXAS 10000 6170 5506.79 ± 17.47

Additional Datasets

MNIST 60000 784 2747.41 ± 22.65

FMNIST 60000 784 3425.90 ± 34.03

USPS 3000 256 238.67 ± 1.74

FLOWER 1500 2048 174.27 ± 11.74

MEPS 7500 42 732.43 ± 4.95

CREDIT 15000 24 1852.66 ± 30.92

CENSUS 24000 103 3718.26 ± 18.25

Computation time for SHAPr scores ranges from ≈ 2 mins for

LOCATION dataset to ≈ 91 mins for TEXAS. Since the scores

are computed once and designed for M with access to hardware

resources, these execution times are reasonable.

Long et al.’s naïve leave-one-out scores require training |D_tr| additional models (compared to training a single model for SHAPr) [30].


Page 9: SHAPr: An Efficient and Versatile Membership Privacy Risk

To benchmark, we used a subset of the LOCATION dataset with 100 training data records and found that SHAPr is about 100x faster than a straightforward leave-one-out approach: 3640.21 ± 244.08 s (leave-one-out) vs. 34.65 ± 1.74 s (SHAPr), averaged across five runs.5

5For larger datasets, leave-one-out took unreasonably long.

7 VERSATILITY OF SHAPR

We now explore the versatility (Requirement R3) of SHAPr in

terms of applicability in other settings such as estimating fairness

(Section 7.1) and data valuation (Section 7.2).

7.1 Fairness

Prior work has shown that different subgroups with sensitive at-

tribute (e.g., race or gender) have disparate vulnerability to MIAs [45].

We evaluated whether SPRS and SHAPr can correctly identify this

disparity.

We used three datasets that have sensitive attributes: CENSUS,

CREDIT, and MEPS. CENSUS has two sensitive attributes, gender

and race, while CREDIT and MEPS have gender. For gender, the

majority class is “Male” and the minority class is “Female”. For race,

“White” is the majority class and “Black” the minority class. We

computed the ground truth MIA accuracy, separately for each class,

using 𝐼𝑚𝑒𝑛𝑡 .

Figure 2 shows that subgroups are disparately vulnerable to MIA

(indicated in yellow ). SHAPr can capture this difference (shown

in red ) – the scores are higher for subgroups with higher attack

accuracy. On the other hand, SPRS scores stay the same for both

subgroups, regardless of the attack accuracy (shown in blue ).

SHAPr scores are additive (Property P2) – we can compute

the membership privacy risk over subgroups by averaging the

scores within each subgroup. However, for SPRS scores there is no

semantically meaningful notion of adding or averaging probability

scores. We conjecture that the lack of the additivity property makes SPRS ineffective at this task.
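
A minimal sketch of the subgroup analysis above, assuming per-record SHAPr scores, per-record attack success indicators, and subgroup labels (e.g., gender or race) as NumPy arrays:

```python
import numpy as np

def subgroup_summary(scores, attack_success, group_labels):
    """Average SHAPr score and MIA accuracy per subgroup. Averaging scores
    within a subgroup is meaningful because of additivity (Property P2)."""
    scores = np.asarray(scores)
    attack_success = np.asarray(attack_success, dtype=float)
    group_labels = np.asarray(group_labels)
    summary = {}
    for g in np.unique(group_labels):
        idx = group_labels == g
        summary[g] = {
            "avg_shapr": float(scores[idx].mean()),
            "attack_accuracy": float(attack_success[idx].mean()),
        }
    return summary
```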

7.2 Data Valuation

We briefly discuss the application of SHAPr for data valuation. We

did not carry out separate experiments but refer to the extensive

prior literature on the use of Shapley values for data valuation

[15, 21–23].

Two relevant properties of Shapley values are additivity (Prop-

erty P2) which includes group rationality, where the complete util-

ity is distributed among all training data records, and heterogeneity

(Property P3), which indicates equitable assignment of model util-

ity to training data records based on their influence. These make

Shapley values useful for data valuation [15, 22]. Since SHAPr uses

Shapley values, once computed, SHAPr scores can be used directly

for data valuation of both individual data records as well as groups

of data records.

SPRS was not designed to be additive (Property P2) and hence cannot guarantee group rationality of scores among training data records. SPRS scores are not heterogeneous (Property P3) either, so equitable assignment of privacy risk scores is not guaranteed (as discussed in

Section 1). We discuss the lack of heterogeneity in the Appendix C,

visualizing the distribution of SPRS scores (Figure 5). Given the lack of these properties (heterogeneity, additivity, group rationality, and equitable assignment), we argue that SPRS is unlikely to be applicable for data valuation.

[Figure 2: four panels — (a) CENSUS (Race), (b) CENSUS (Gender), (c) CREDIT (Gender), (d) MEPS (Gender) — showing average SHAPr scores, SPRS scores, and attack accuracy (AttAcc) for the two subgroups in each dataset.]

Figure 2: Different subgroups are vulnerable to MIAs to a different extent. SHAPr scores (red) have the same trend as the ground truth (yellow). However, SPRS scores (blue) are the same for all subgroups. “AttAcc” refers to the MIA accuracy.


8 PITFALLS OF DATA REMOVAL

Having established that SHAPr is an effective metric for measuring

membership privacy risk, we now describe an example application

where such a metric can be useful. In data valuation research, it is well-

known that removing records with high Shapley values will harm

the utility of the model, and removing records with low values will

improve it [22, 23]. This raises the question of whether removal of

records with high SHAPr scores improves the membership privacy

risk of a dataset, by reducing its overall susceptibility to MIAs. To

explore this question, we removed a fraction (up to 50%) of records with the highest SHAPr scores. Also, we randomly removed testing

data records so as to keep the same number of member and non-

member records as in previous experiments.
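The sketch below outlines this removal experiment under stated assumptions: train_model and compute_shapr are hypothetical callables standing in for the training and scoring routines, not the paper's implementation.

```python
import numpy as np

def removal_experiment(X_tr, y_tr, X_te, y_te, train_model, compute_shapr,
                       fractions=(0.1, 0.2, 0.3, 0.4, 0.5), seed=0):
    """Drop the highest-risk training records, retrain, and re-score the rest."""
    rng = np.random.default_rng(seed)
    base_model = train_model(X_tr, y_tr)
    base_scores = compute_shapr(base_model, X_tr, y_tr, X_te, y_te)
    remaining_scores = {}
    for frac in fractions:
        # Keep the (1 - frac) lowest-scoring (lowest-risk) training records.
        keep = np.argsort(base_scores)[: int(len(X_tr) * (1 - frac))]
        # Subsample the test set to keep member / non-member counts equal.
        te_keep = rng.choice(len(X_te), size=min(len(keep), len(X_te)), replace=False)
        model = train_model(X_tr[keep], y_tr[keep])
        remaining_scores[frac] = compute_shapr(model, X_tr[keep], y_tr[keep],
                                               X_te[te_keep], y_te[te_keep])
    return remaining_scores
```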

Figure 3: Removing a fraction of training data records with high SHAPr scores does not reduce the risk for the remaining records. (The plot shows SHAPr scores of the remaining records, on a log scale, against the fraction of data removed, for MNIST, FMNIST, USPS, FLOWER, MEPS, CENSUS, CREDIT, LOCATION, PURCHASE, and TEXAS.)

Figure 3 summarizes the results. Removing an increasing num-

ber of records with high SHAPr scores does not necessarily reduce

the membership privacy risk for the remaining records. No consis-

tent upward (or downward) trend was visible for the scores of the

remaining records. Interestingly, depending on the number of re-

moved samples, the scores fluctuate. A possible explanation is that

once risky data records are removed, and a new model is trained

using the remaining records, their contribution to the utility of the

revised model changes, thereby changing their SHAPr scores.

Long et al. [30] observed a similar result. However, their ex-

periment was limited to a single small dataset (≈ 1000 training

data records), and minimal removal (only up to 2%). Thanks to the superior efficiency of SHAPr, we are able to confirm that this

observation holds broadly across larger datasets (10 vs. 1) and for

more extensive data removal (up to 50% vs. 2%).

9 RELATED WORK

Efficient Computation of Shapley Values. Estimating the influ-

ence of a training data record on model utility is useful in settings

where data providers receive compensation for sharing their data

with model builders. This is referred to as data valuation. Shapley

values for ML were recently explored in the context of data valua-

tion [14, 15, 21, 22, 25]. Naive approaches for computing Shapley

values are computationally expensive because they require retrain-

ing for each training data record. There are two proposed variants

for approximate computation of Shapley values [15]: Monte Carlo

(drawing a random permutation of the training data and averaging

the marginal contribution of training data records over multiple

permutations) and using gradients (computing marginal contribu-

tion using gradients for each data record during training). However,

these approximations are still computationally expensive. In particular, Monte Carlo based estimates still require training multiple models.
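For concreteness, a minimal (and deliberately naive) Monte Carlo sketch is shown below; train_and_score is a hypothetical callable that trains a model on the given record indices and returns its utility, which makes the cost of repeated retraining explicit.

```python
import numpy as np

def monte_carlo_shapley(n_records, train_and_score, n_perms=50, seed=0):
    """Monte Carlo estimate of per-record Shapley values (in the spirit of [15]).

    Each permutation requires retraining the model n_records times, which is
    exactly why these approximations remain expensive.
    """
    rng = np.random.default_rng(seed)
    values = np.zeros(n_records)
    for _ in range(n_perms):
        perm = rng.permutation(n_records)
        prev_utility = train_and_score(np.array([], dtype=int))  # empty-coalition utility
        for i, idx in enumerate(perm):
            utility = train_and_score(perm[: i + 1])
            values[idx] += utility - prev_utility  # marginal contribution of record idx
            prev_utility = utility
    return values / n_perms
```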

Defending Membership Privacy. Several defences against MIAs

have been proposed. Notably, they rely on regularization [33, 38, 40,

47], low precision models [12] or output perturbation with accuracy

guarantees [20]. However, recent attacks indicated that defences

such as adversarial regularization [33] and MemGuard [20] are

ineffective [42]. Hence, an effective defence against MIAs remains

an open problem.

Estimating Membership Privacy Risk. Adversary’s advantage

was proposed as a metric for evaluating differential privacy mecha-

nisms using MIAs [46]. It was later improved by using recall instead

of attack accuracy [19]. However, both metrics compute aggregate

membership privacy risk across all data records and not for indi-

vidual records. Two metrics that are the closest to our work were

described earlier: SPRS [42] (Section 2.4) and Long et al. [30] (Sec-

tions 3.3 and 6.4).

Membership privacy was also studied in the context of generative

models [28], including for individual records. Additionally, it was

shown that Fisher information can be used to compute the influence

of attributes towards the model utility [17]. However, this is limited

to linear models with convex loss which does not apply to the neural

networks we consider. Also, this metric requires inverting a Hessian

which is computationally expensive for large models. Similarly, an

information theoretic metric [37] can be used to compute an upper

bound on the privacy risks of the PATE framework [35]. Even so, it

cannot be used as a standalone metric for record-level membership

privacy risk.

10 DISCUSSION

We now discuss alternatives to SHAPr and potential limitations.

10.1 Comparison with Influence Functions

Influence functions [25, 36] were proposed for explaining model

predictions. Since these are independent of specific membership

attacks, they could potentially be used to design an alternative,

principled (Property P1) metric for measuring membership privacy

risk. We now explore the viability of such designs.

We implemented Koh et al.’s influence function [25], which assigns an influence score to each individual training data record with respect to each prediction class. We adapted it to estimate membership privacy risk by averaging the scores for each training data record across all classes to estimate the overall influence of that training data record.
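A minimal sketch of this averaging step is given below; per_class_influence is a hypothetical array holding the per-class influence scores produced by the influence-function computation.

```python
import numpy as np

def kifs_record_scores(per_class_influence):
    """Collapse per-class influence scores into one score per training record.

    per_class_influence: array of shape [n_train, n_classes].
    Averaging across classes gives each record's overall influence,
    which is then treated as its membership privacy risk score.
    """
    return np.asarray(per_class_influence).mean(axis=1)
```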

We additionally implemented TracIN [36] which estimates the

influence of a training data record by computing the dot product

of the gradient of the loss for training and testing data records

computed from intermediate models saved during training. Given

|𝐷𝑡𝑟 | training data records and |𝐷𝑡𝑒 | testing data records, TracIN


computes a score for each training data record corresponding to

each testing data record resulting in a |𝐷𝑡𝑟 | × |𝐷𝑡𝑒 | matrix. To

estimate the scores of training data records across the entire test

dataset, we averaged the values across all the testing data records

for each training data record.
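The sketch below shows this aggregation under stated assumptions: train_grads and test_grads are hypothetical arrays of flattened per-record loss gradients, one slice per checkpoint saved during training.

```python
import numpy as np

def tracin_scores(train_grads, test_grads):
    """TracIN-style influence, averaged over the test set.

    train_grads: shape [n_models, n_train, d]
    test_grads : shape [n_models, n_test, d]
    """
    n_models, n_train, _ = train_grads.shape
    influence = np.zeros((n_train, test_grads.shape[1]))
    for m in range(n_models):
        # Dot products between every training and testing gradient:
        # O(|D_tr| * |D_te|) work per checkpoint.
        influence += train_grads[m] @ test_grads[m].T
    # Average across testing data records to get one score per training record.
    return influence.mean(axis=1)
```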

Table 6: Effectiveness of blackbox influence functions (KIFS [25]) and TracIN [36] as metrics for membership privacy risk scores. Compared to SHAPr in Table 3, orange indicates comparable results, red indicates poor results, and green indicates better results. Computations which took an unreasonably long time were omitted, indicated by “-”.

Dataset    | KIFS [25] Precision | KIFS [25] Recall | TracIN [36] Precision | TracIN [36] Recall
SPRS Datasets
LOCATION   | 0.92 ± 0.005 | 0.48 ± 0.013 | 0.96 ± 1.1e-16 | 0.20 ± 0.00
PURCHASE   | 0.94 ± 0.002 | 0.51 ± 0.005 | -              | -
TEXAS      | 0.91 ± 0.005 | 0.51 ± 0.034 | -              | -
Additional Datasets
MNIST      | 0.98 ± 0.015 | 0.30 ± 0.180 | -              | -
FMNIST     | 0.81 ± 0.081 | 0.49 ± 0.098 | -              | -
USPS       | 0.82 ± 0.257 | 0.33 ± 0.097 | 0.75 ± 0.226   | 0.42 ± 0.028
FLOWER     | 0.97 ± 0.017 | 0.51 ± 0.066 | 0.95 ± 0.029   | 0.46 ± 0.095
MEPS       | 0.95 ± 0.004 | 0.62 ± 0.053 | 0.96 ± 0.00    | 0.85 ± 0.00
CREDIT     | 0.85 ± 0.037 | 0.79 ± 0.030 | -              | -
CENSUS     | 0.94 ± 0.012 | 0.72 ± 0.115 | -              | -

For evaluation, we compute precision and recall by thresholding

Koh et al.’s influence function scores (referred to as KIFS) and TracIN

scores and comparing with attack success of modified prediction

entropy attack (𝐼𝑚𝑒𝑛𝑡 ) as the ground truth. We compare these results

in Table 6 with the results of SHAPr in Table 3. We use orange

to indicate comparable results, red to indicate poor results and

green to indicate better results compared to SHAPr.
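A minimal sketch of this evaluation is shown below; the threshold of zero and the array names are assumptions for illustration and need not match the exact procedure used in the paper.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def threshold_eval(scores, mia_success, threshold=0.0):
    """Precision/recall of a per-record risk score against ground-truth MIA success.

    scores     : per-record risk scores (KIFS, TracIN, or SHAPr)
    mia_success: 1 if the I_ment attack succeeded on the record, else 0
    """
    predicted_at_risk = (np.asarray(scores) > threshold).astype(int)
    return (precision_score(mia_success, predicted_at_risk),
            recall_score(mia_success, predicted_at_risk))
```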

Overall, both TracIN and KIFS have poor recall across all the

datasets compared to SHAPr scores. Both KIFS and TracIN approx-

imate the influence of training data records [25, 36]. KIFS is well

defined for convex functions but not for large neural networks with

non-convex optimization [9]. Hence, the approximated influence scores are often erroneous. We conjecture that this is a potential reason for the lower recall of KIFS compared to SHAPr.

For TracIN, the overhead of computing per-sample influence was high for large datasets (>7500 training data records); all datasets for which the computation took more than a day were omitted, indicated by “-” in Table 6. TracIN stores the model at different epochs during training. For each of the N_models models saved during training, the dot product of the loss gradients over each training and testing data record is computed, resulting in a complexity of O(N_models · |D_tr| · |D_te|). The computational overhead for KIFS is also high, on the order of O(|D_tr| · |D_te|). The poor recall and high cost of KIFS and TracIN (compared to

SHAPr) imply that they are not good candidates to base member-

ship privacy risk metrics on.

10.2 Impact of Backdoors

A backdoor to a machine learning model is a set of inputs chosen

to manipulate decision boundaries of the model. Backdoors can

be used for malicious purposes such as poisoning (e.g. [10]), or to

embed watermarks that allow model owners to claim ownership of

their model in case it gets stolen [8, 43]. A backdoor is created by

changing the label of several training data records [43], by adding

artifacts to the training data records themselves (e.g. overlay text

or texture to images [48]), or by introducing out-of-distribution

data [8] to the training data. A successfully embedded backdoor is

memorized during training, along with the primary task of the model. During verification, the model builder M queries the model and expects matching backdoor predictions.

Backdoors have a negative influence on model utility as they intro-

duce noise, and make training more difficult. Hence, their SHAPr

scores are low. However, memorization of backdoors is required

for successful verification. In other words, backdoors behave differ-

ently from other data records in the context of SHAPr: they are,

by definition, memorized but unlike other memorized data records,

they are likely to have low SHAPr scores. This is not a concern in

our setting because M is the entity that computes SHAPr scores. If

a backdoor is inserted intentionally by M (e.g., for watermarking),

then M will know what they are. If a backdoor was inserted mali-

ciously (e.g., by a training data provider), there is no need to provide

any guarantees regarding the SHAPr score for those records.

10.3 Comparing Privacy Risk Across Datasets

SHAPr scores can be used to compare the relative privacy risk of

individual data records within a dataset. However, they are not de-

signed to compare membership privacy risk across different datasets. This is because the scores rely on the testing dataset which varies

across different datasets. Similarly, none of the other privacy met-

rics (like SPRS [42] or Long et al. [30]) designed for individual

privacy risk scores can be used to compare the privacy risk of two

different datasets either.

11 CONCLUSIONS

As a membership privacy metric, SHAPr scores can potentially

be used for evaluating the effectiveness of defences. Several prior

works have indicated that specific defences (adversarial regularization and MemGuard) are ineffective [42] and that simple regulariza-

tion can be used to mitigate membership privacy risk [27, 47]. In

Appendix A, we show how SHAPr scores can be used to assess the effectiveness of a defence (L2 regularization, for demonstration) against MIAs. We leave similar evaluations of other defences as

future work.

Differential privacy (DP) is closely associated with the notion

of leave-one-out stability, where the goal is to minimize the difference in model behaviour with and without a training data record. While this has

been used to reduce the membership privacy risk, Jia et al. [23]

show that when a model is trained using DP, the overall Shapley

values across the training data records decrease. Evaluating

whether SHAPr scores can correctly assess the impact of DP on

susceptibility to MIAs is also left as future work.


REFERENCES
[1] Adult income census dataset. https://archive.ics.uci.edu/ml/datasets/adult. Accessed: 2021-11-27.
[2] Credit dataset. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). Accessed: 2021-11-27.
[3] Flowers dataset. https://www.kaggle.com/alxmamaev/flowers-recognition. Accessed: 2021-11-27.
[4] Location dataset. https://sites.google.com/site/yangdingqi/home/foursquare-dataset. Accessed: 2021-11-27.
[5] Purchase dataset. https://www.kaggle.com/c/acquire-valued-shoppers-challenge. Accessed: 2021-11-27.
[6] Texas dataset. https://www.dshs.texas.gov/THCIC/Hospitals/Download.shtm. Accessed: 2021-11-27.
[7] USPS dataset. https://www.kaggle.com/bistaumanga/usps-dataset. Accessed: 2021-11-27.
[8] Adi, Y., Baum, C., Cisse, M., Pinkas, B., and Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (2018), pp. 1615–1631.
[9] Basu, S., Pope, P., and Feizi, S. Influence functions in deep learning are fragile, 2021.
[10] Chen, X., Liu, C., Li, B., Lu, K., and Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017).
[11] Choquette-Choo, C. A., Tramer, F., Carlini, N., and Papernot, N. Label-only membership inference attacks. In Proceedings of the 38th International Conference on Machine Learning (18–24 Jul 2021), M. Meila and T. Zhang, Eds., vol. 139 of Proceedings of Machine Learning Research, PMLR, pp. 1964–1974.
[12] Duddu, V., Boutet, A., and Shejwalkar, V. Gecko: Reconciling privacy, accuracy and efficiency in embedded deep learning. arXiv preprint arXiv:2010.00912 (2021).
[13] Feldman, V. Does learning require memorization? A short tale about a long tail. In Symposium on Theory of Computing (New York, NY, USA, 2020), STOC 2020, Association for Computing Machinery, pp. 954–959.
[14] Ghorbani, A., Kim, M., and Zou, J. A distributional framework for data valuation. In International Conference on Machine Learning (13–18 Jul 2020), H. D. III and A. Singh, Eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 3535–3544.
[15] Ghorbani, A., and Zou, J. Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning (09–15 Jun 2019), K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, pp. 2242–2251.
[16] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2015).
[17] Hannun, A., Guo, C., and van der Maaten, L. Measuring data leakage in machine-learning models with Fisher information. arXiv preprint arXiv:2102.11673 (2021).
[18] House, W. Guidance for regulation of artificial intelligence applications. In Memorandum For The Heads Of Executive Departments And Agencies (2020).
[19] Jayaraman, B., Wang, L., Evans, D. E., and Gu, Q. Revisiting membership inference under realistic assumptions. Proceedings on Privacy Enhancing Technologies 2021 (2021), 348–368.
[20] Jia, J., Salem, A., Backes, M., Zhang, Y., and Gong, N. Z. MemGuard: Defending against black-box membership inference attacks via adversarial examples. In Conference on Computer and Communications Security (New York, NY, USA, 2019), CCS ’19, Association for Computing Machinery, pp. 259–274.
[21] Jia, R., Dao, D., Wang, B., Hubis, F. A., Gurel, N. M., Li, B., Zhang, C., Spanos, C., and Song, D. Efficient task-specific data valuation for nearest neighbor algorithms. Proc. VLDB Endow. 12, 11 (July 2019), 1610–1623.
[22] Jia, R., Dao, D., Wang, B., Hubis, F. A., Hynes, N., Gürel, N. M., Li, B., Zhang, C., Song, D., and Spanos, C. J. Towards efficient data valuation based on the Shapley value. In International Conference on Artificial Intelligence and Statistics (16–18 Apr 2019), K. Chaudhuri and M. Sugiyama, Eds., vol. 89 of Proceedings of Machine Learning Research, PMLR, pp. 1167–1176.
[23] Jia, R., Wu, F., Sun, X., Xu, J., Dao, D., Kailkhura, B., Zhang, C., Li, B., and Song, D. Scalability vs. utility: Do we have to sacrifice one for the other in data importance quantification? In Conference on Computer Vision and Pattern Recognition (2021).
[24] Kazim, E., Denny, D. M. T., and Koshiyama, A. AI auditing and impact assessment: according to the UK Information Commissioner’s Office. AI and Ethics (Feb 2021).
[25] Koh, P. W., and Liang, P. Understanding black-box predictions via influence functions. In International Conference on Machine Learning (06–11 Aug 2017), D. Precup and Y. W. Teh, Eds., vol. 70 of Proceedings of Machine Learning Research, PMLR, pp. 1885–1894.
[26] Li, Z., and Zhang, Y. Membership leakage in label-only exposures, 2021.
[27] Liu, J., Oya, S., and Kerschbaum, F. Generalization techniques empirically outperform differential privacy against membership inference, 2021.
[28] Liu, X., Xu, Y., Tople, S., Mukherjee, S., and Ferres, J. L. MACE: A flexible framework for membership privacy estimation in generative models. arXiv preprint arXiv:2009.05683 (2021).
[29] Liu, Y., Wen, R., He, X., Salem, A., Zhang, Z., Backes, M., Cristofaro, E. D., Fritz, M., and Zhang, Y. ML-Doctor: Holistic risk assessment of inference attacks against machine learning models. arXiv preprint arXiv:2102.02551 (2021).
[30] Long, Y., Bindschaedler, V., and Gunter, C. A. Towards measuring membership privacy. arXiv preprint arXiv:1712.09136 (2017).
[31] Lundberg, S. M., and Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (2017), pp. 4768–4777.
[32] Murakonda, S. K., and Shokri, R. ML Privacy Meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning. In Workshop on Hot Topics in Privacy Enhancing Technologies (HotPETs) (2020).
[33] Nasr, M., Shokri, R., and Houmansadr, A. Machine learning with membership privacy using adversarial regularization. In Conference on Computer and Communications Security (New York, NY, USA, 2018), CCS ’18, Association for Computing Machinery, pp. 634–646.
[34] Nasr, M., Shokri, R., and Houmansadr, A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In IEEE Symposium on Security and Privacy (SP) (2019), pp. 739–753.
[35] Papernot, N., Abadi, M., Úlfar Erlingsson, Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. In Proceedings of the International Conference on Learning Representations (2017).
[36] Pruthi, G., Liu, F., Kale, S., and Sundararajan, M. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Processing Systems (2020), H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., pp. 19920–19930.
[37] Saeidian, S., Cervia, G., Oechtering, T. J., and Skoglund, M. Quantifying membership privacy via information leakage. arXiv preprint arXiv:2010.05965 (2020).
[38] Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M., and Backes, M. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In Network and Distributed Systems Security (2018).
[39] Shapley, L. S. 17. A Value for n-Person Games. Princeton University Press, 2016, pp. 307–318.
[40] Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP) (2017), pp. 3–18.
[41] Song, C., Ristenpart, T., and Shmatikov, V. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2017), CCS ’17, Association for Computing Machinery, pp. 587–601.
[42] Song, L., and Mittal, P. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21) (Aug. 2021), USENIX Association.
[43] Szyller, S., Atli, B. G., Marchal, S., and Asokan, N. DAWN: Dynamic adversarial watermarking of neural networks. CoRR (2019).
[44] Tabassi, E., Burns, K. J., Hadjimichael, M., Molina-Markham, A., and Sexton, J. A taxonomy and terminology of adversarial machine learning. In NIST Interagency/Internal Report (2019).
[45] Yaghini, M., Kulynych, B., Cherubin, G., and Troncoso, C. Disparate vulnerability: On the unfairness of privacy attacks against machine learning. arXiv preprint arXiv:1906.00389 (2019).
[46] Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In IEEE 31st Computer Security Foundations Symposium (CSF) (2018), pp. 268–282.
[47] Ying, Z., Zhang, Y., and Liu, X. Privacy-preserving in defending against membership inference attacks. In Workshop on Privacy-Preserving Machine Learning in Practice (New York, NY, USA, 2020), PPMLP’20, Association for Computing Machinery, pp. 61–63.
[48] Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., and Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In ACM Symposium on Information, Computer and Communications Security (2018), pp. 159–172.

A EVALUATING EFFECTIVENESS OF

DEFENCES

We evaluate whether SHAPr can be used to verify the effective-

ness of defences against MIAs. Specifically, the average SHAPr

scores across all training data records should decrease when a defence is applied, reflecting the decrease in the empirical MIA accuracy.

Prior defences such as adversarial regularization [33] and Mem-

Guard [20] have been shown to be ineffective against MIAs [42].

Hence, we focus on L2 regularization previously shown as a valid


defence against MIAs [47]. As in Song and Mittal [42], we con-

sider the three datasets vulnerable to MIAs to evaluate SHAPr:

LOCATION, PURCHASE and TEXAS.

Figure 4: Effectiveness of L2 regularization. L2 regularization lowers the susceptibility to MIAs, which is captured by SHAPr. The gain comes at a significant cost to model accuracy. “SHAPr” on the x-axis indicates the average SHAPr scores, with values on the left y-axis; “AttAcc” refers to the MIA accuracy and “ModelAcc” to the model utility on the testing dataset, both with values on the right y-axis. Panels: (a) LOCATION, (b) PURCHASE, (c) TEXAS.

We choose the regularization hyperparameter for highest MIA

resistance (attack accuracy close to random guess 50%). In Figure 4,

“AttAcc” refers to the empirical MIA accuracy with a random guess

of 50%, “ModelAcc” refers to the model utility on unseen test data

while “Scores” indicate the average SHAPr scores across all data

records. The evaluation is done with and without regularization

indicated as “Reg” and “NoReg” respectively (Figure 4).

We can see that the model utility degradation is significant when the attack accuracy is pushed towards a random guess, resulting in a poor privacy-utility tradeoff (as shown previously [33]). However, the average SHAPr score across all data records decreases with regularization, which corresponds to the decrease in the empirical MIA accuracy. Hence, SHAPr

can measure the effectiveness of defences against MIAs.
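A minimal sketch of this check is shown below (the per-record score arrays are hypothetical inputs): the average SHAPr score is compared with and without the defence.

```python
import numpy as np

def defence_effect(scores_no_defence, scores_with_defence):
    """Compare average SHAPr scores on the undefended and defended models.

    A lower average after applying the defence indicates reduced memorization,
    and hence reduced susceptibility to MIAs.
    """
    before = float(np.mean(scores_no_defence))
    after = float(np.mean(scores_with_defence))
    return {"avg_no_defence": before, "avg_with_defence": after,
            "reduction": before - after}
```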

B EVALUATING ADVERSARY’S

PERSPECTIVE

We additionally compare SHAPr and SPRS in the adversary model

used by Song and Mittal, i.e., there is no overlap between the target

model’s training data and A’s auxiliary data. This is the practi-

cal setting from the adversary A’s perspective for which SPRS was

specifically designed.

The results are shown in Table 7 where orange indicates com-

parable results (same mean and small standard deviation or p-

value > 0.05), red indicates that SPRS performs significantly better, and green indicates that SHAPr performs significantly better. We use a Student’s t-test over 10 runs and report p-values to indicate the statistical significance of the results.
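The significance test can be sketched as follows (the per-run arrays and the 0.05 threshold are assumptions matching the description above, not the paper’s exact script):

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_metrics(shapr_runs, sprs_runs, alpha=0.05):
    """Student's t-test over per-run precision (or recall) values from 10 runs."""
    _, p_value = ttest_ind(shapr_runs, sprs_runs)
    if p_value > alpha:
        return p_value, "comparable"
    better = "SHAPr" if np.mean(shapr_runs) > np.mean(sprs_runs) else "SPRS"
    return p_value, f"{better} significantly better"
```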

The results for SHAPr are comparable to SPRS across all ten datasets: SPRS and SHAPr each perform better for two datasets in precision and for five datasets in recall (Table 7).

Table 7: SHAPr is comparable to SPRS (evaluated in the adversary model of [42], representing the adversary’s perspective with no overlap between D_aux and the target model’s training data). Orange indicates comparable results, red indicates SPRS outperforms SHAPr, and green indicates SHAPr outperforms SPRS.

Dataset  | Approach | Precision    | p-value | Recall       | p-value
SPRS Datasets
LOCATION | SPRS     | 0.97 ± 1e-16 | >0.05   | 0.95 ± 0.000 | <0.01
         | SHAPr    | 0.97 ± 1e-16 |         | 0.87 ± 1e-16 |
PURCHASE | SPRS     | 0.98 ± 1e-16 | <0.01   | 0.83 ± 1e-16 | <0.01
         | SHAPr    | 0.98 ± 0.000 |         | 0.81 ± 0.000 |
TEXAS    | SPRS     | 0.99 ± 0.000 | >0.05   | 0.93 ± 1e-16 | <0.01
         | SHAPr    | 0.99 ± 0.000 |         | 0.70 ± 1e-16 |
Additional Datasets
MNIST    | SPRS     | 0.95 ± 0.001 | <0.01   | 0.53 ± 0.043 | <0.01
         | SHAPr    | 0.98 ± 0.002 |         | 0.96 ± 0.001 |
FMNIST   | SPRS     | 0.99 ± 0.004 | 0.021   | 0.84 ± 0.061 | 0.015
         | SHAPr    | 0.99 ± 0.008 |         | 0.89 ± 0.007 |
USPS     | SPRS     | 0.99 ± 0.003 | 0.99    | 0.70 ± 0.014 | <0.01
         | SHAPr    | 0.99 ± 0.002 |         | 0.97 ± 0.001 |
FLOWER   | SPRS     | 0.97 ± 0.021 | 0.68    | 0.78 ± 0.108 | <0.01
         | SHAPr    | 0.97 ± 0.022 |         | 0.95 ± 0.008 |
MEPS     | SPRS     | 0.97 ± 1e-16 | <0.01   | 0.97 ± 1e-16 | <0.01
         | SHAPr    | 0.98 ± 1e-16 |         | 0.89 ± 0.000 |
CREDIT   | SPRS     | 0.91 ± 0.037 | <0.01   | 0.87 ± 0.064 | 0.04
         | SHAPr    | 0.84 ± 0.029 |         | 0.92 ± 0.015 |
CENSUS   | SPRS     | 0.90 ± 0.000 | <0.05   | 1.00 ± 0.000 | <0.05
         | SHAPr    | 0.87 ± 0.000 |         | 0.86 ± 0.000 |


C COMPARING DISTRIBUTIONS

We visually compare SHAPr with SPRS by plotting the distributions of SHAPr scores (in green) and SPRS scores (in blue), shown in Figure 5.

For several datasets, we observe that SPRS is centered at 0.5, indicating that the membership likelihood for a large number of training data records is inconclusive. Further, we note that SPRS scores are not evenly distributed: some values correspond to several records while neighboring values correspond to none. We conjecture that this is due to assigning prior probabilities (Section 2.4) and estimating the conditional probabilities using shadow models optimized to give the same output for multiple similar data records. Compared to SPRS, SHAPr follows a more even distribution (due to the heterogeneity property P3).

D EFFECTIVENESS OF ADDING NOISE

To supplement the noise-addition results presented earlier, we now provide the results for the remaining datasets. As before, SPRS

scores do not follow the pattern exhibited by the ground truth

(attack accuracy) plots (Figure 6) while SHAPr scores match their

“sideways-V” shapes.


Figure 5: Distribution of membership privacy risk scores for different datasets (SPRS and SHAPr histograms for CREDIT, FMNIST, MEPS, FLOWER, LOCATION, CENSUS, PURCHASE, and TEXAS). SHAPr (green) equitably assigns privacy risk to data records based on the model’s memorization. SPRS (blue) assigns the same score to large numbers of data records and is centered around 0.5, resulting in inconclusive membership privacy risk estimates for those training data records.


Figure 6: Impact of adding noise to training data records on membership privacy (SPRS and SHAPr plots for FMNIST, CREDIT, MEPS, FLOWER, LOCATION, CENSUS, PURCHASE, and TEXAS). We expect the black lines (privacy risk scores) to follow a “sideways-V” shape to match the gray lines (attack accuracy). We observe this to be true for SHAPr but not SPRS.
