Model Stealing Attacks - people.duke.edu


Page 1:

Model Stealing Attacks
Neil Gong

Page 2:

Machine Learning as a Service

Page 3:

Why do we care about model stealing?

- Avoiding query charges

- Violating training-data privacy

- Stepping stone to evasion

Page 4:

How to evaluate success?

• Stealing exact model parameters

• Stealing functionality (see the sketch after this list)
  • Similar accuracy
  • Prediction agreement

• Number of queries
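Functionality stealing is usually measured with the two criteria above. A minimal sketch of how one might compute them, assuming sklearn-style `victim` and `stolen` models and a held-out test set; all names here are illustrative, not from the slides:

```python
import numpy as np

def similar_accuracy(victim, stolen, X_test, y_test):
    """Compare the stolen model's test accuracy against the victim's."""
    acc_victim = np.mean(victim.predict(X_test) == y_test)
    acc_stolen = np.mean(stolen.predict(X_test) == y_test)
    return acc_victim, acc_stolen

def prediction_agreement(victim, stolen, X_test):
    """Fraction of test inputs on which the two models predict the same label."""
    return np.mean(victim.predict(X_test) == stolen.predict(X_test))
```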

Page 5:

How to steal a model?

• Model is deployed as client software, e.g., an Android app
  • "Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps"
  • Reverse engineering the code

• MLaaS
  • Querying the model

Page 6:

Extraction with Confidence Values: Equation-Solving Attacks

Binary logistic regression

An LR model performs binary classification (c = 2) by estimating the probability of a binary response from a number of independent features.

The model is defined by parameters w ∈ R^d, β ∈ R, and outputs the probability f1(x) = σ(w·x + β), where σ(t) = 1/(1 + e^(−t)).

LR is a linear classifier: it defines a hyperplane in the feature space X (given by w·x + β = 0) that separates the two classes.

Given an oracle sample (x, f(x)), we get one linear equation w·x + β = σ^(−1)(f1(x)). Thus, d + 1 samples are both necessary and sufficient to recover w and β.
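A minimal sketch of this equation-solving attack, assuming the API returns the exact confidence value f1(x) for each query; the oracle `query_confidence` and the dimension d below are placeholders for the target service:

```python
import numpy as np

def steal_binary_lr(query_confidence, d, rng=np.random.default_rng(0)):
    """Recover (w, beta) of a binary LR oracle from d + 1 confidence queries."""
    X = rng.normal(size=(d + 1, d))                  # any d + 1 points in general position
    p = np.array([query_confidence(x) for x in X])   # oracle returns f1(x) = sigma(w.x + beta)
    z = np.log(p / (1.0 - p))                        # invert the sigmoid: z = w.x + beta
    A = np.hstack([X, np.ones((d + 1, 1))])          # linear system A @ [w; beta] = z
    theta = np.linalg.solve(A, z)
    return theta[:-1], theta[-1]                     # w, beta

# Toy victim for illustration: the recovery is exact up to floating-point error,
# since every query contributes one exact linear equation.
d = 5
w_true, b_true = np.arange(1.0, d + 1), -0.5
oracle = lambda x: 1.0 / (1.0 + np.exp(-(w_true @ x + b_true)))
w_hat, b_hat = steal_binary_lr(oracle, d)
```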

Page 7:

Extraction with Confidence Values: Equation-Solving Attacks

Binary logistic regression

Any user who wishes to make more than d + 1 queries to a model can minimize the prediction cost by first running a cross-user model extraction attack and then using the extracted model for personal use.

A user may also abuse the service's resources: train a model over a large data set D (i.e., |D| >> d) and extract it after only d + 1 predictions.

Page 8:

Extraction with Confidence Values: Equation-Solving Attacks

Multiclass LRs and Multilayer Perceptrons

An MLR combines c binary models, each with parameters w_i, β_i, to form a multiclass model.

Two types of MLR models: softmax [fits a joint multinomial distribution] and one-vs-rest (OvR) [trains a separate binary LR for each class and then normalizes the class probabilities].

An MLR model f is defined by parameters w ∈ R^(cd), β ∈ R^c. Each sample (x, f(x)) gives c equations in w and β. The equation system is non-linear and has no analytic solution.

With a regularization term, the loss function is strongly convex and the optimization converges to a global minimum, so the approach is easily adapted for model extraction with equation solving (see the sketch below).
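A hedged sketch of equation solving for a softmax MLR: query confidence vectors at random points and fit surrogate parameters by gradient descent on the regularized cross-entropy between the surrogate's probabilities and the oracle's. The oracle `query_probs`, the counts d and c, and the hyperparameters are assumptions for illustration, not values from the paper.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def steal_softmax_lr(query_probs, d, c, n_queries=2000, lam=1e-6,
                     lr=0.5, steps=5000, rng=np.random.default_rng(0)):
    """Fit surrogate (W, b) so its confidences match the oracle's on the queries."""
    X = rng.normal(size=(n_queries, d))
    P = np.array([query_probs(x) for x in X])   # oracle confidence vectors, shape (n, c)
    W, b = np.zeros((d, c)), np.zeros(c)
    for _ in range(steps):
        Q = softmax(X @ W + b)                  # surrogate predictions
        G = (Q - P) / n_queries                 # gradient of cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + lam * W)           # lam makes the objective strongly convex
        b -= lr * G.sum(axis=0)
    return W, b
```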

An MLP has a similar architecture: first apply a non-linear transform to all inputs (the hidden layer), followed by a softmax regression in the transformed space.

Two major differences: the number of unknowns in the system to solve is much larger, and the loss function for MLPs is not strongly convex.

Page 9:

Extraction with Confidence Values: Equation-Solving Attacks

Training Data Leakage for Kernel LR

Kernel methods are commonly used to efficiently extend support vector machines (SVMs) to nonlinear classifiers, but similar techniques can be applied to logistic regression.

A KLR model is a softmax model in which the linear components w_i·x + β_i are replaced by a mapping ∑_{r=1}^{s} α_{i,r} K(x, x_r) + β_i, where K is a kernel function and the representers x_1, ..., x_s are a chosen subset of the training points.

Each sample (x, f(x)) from a KLR model yields c equations over the parameters α ∈ R^(sc), β ∈ R^c and the representers x_1, ..., x_s.

By querying the model, the attacker A obtains a non-linear equation system, the solution of which leaks training data (see the sketch below).

The assumption is that A knows the exact number of representers sampled from the data.
- Relaxing the assumption: add extra representers with weights α = 0.
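A sketch of the leakage idea under stated assumptions: an RBF kernel, a known representer count s, and an oracle `query_probs` that returns full confidence vectors. The attacker treats the representers themselves as unknowns and solves the non-linear system by least squares; the recovered points approximate training data. All names and the optimizer choice are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(X, R, gamma=1.0):
    """RBF kernel matrix between query points X (n, d) and representers R (s, d)."""
    d2 = ((X[:, None, :] - R[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def klr_probs(X, R, alpha, beta, gamma=1.0):
    """Softmax over the kernelized logits sum_r alpha[r, i] * K(x, x_r) + beta[i]."""
    Z = rbf(X, R, gamma) @ alpha + beta
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def leak_representers(query_probs, d, c, s, n_queries=500, gamma=1.0,
                      rng=np.random.default_rng(0)):
    X = rng.normal(size=(n_queries, d))
    P = np.array([query_probs(x) for x in X])    # oracle outputs, shape (n, c)

    def unpack(theta):
        a = theta[:s * c].reshape(s, c)
        b = theta[s * c:s * c + c]
        R = theta[s * c + c:].reshape(s, d)
        return a, b, R

    def loss(theta):                             # squared residuals of the equation system
        a, b, R = unpack(theta)
        return ((klr_probs(X, R, a, b, gamma) - P) ** 2).sum()

    theta0 = rng.normal(scale=0.1, size=s * c + c + s * d)
    res = minimize(loss, theta0, method="L-BFGS-B")
    return unpack(res.x)                         # recovered alpha, beta, representers
```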

Page 10:

Decision Tree Path-Finding Attacks

● Goal: Find the underlying structure of a decision tree

Page 11:

Decision Tree Overview

● Decision Trees classify data by splitting on features and have labels in the leaf nodes

● Tree Model
  ○ Allows binary and multi-ary splits over categorical features
  ○ Binary splits over numeric features
  ○ Each leaf of the tree is labeled with a class and a confidence score
  ○ Also applies to regression trees
    ■ Leaves are labeled with a real-valued output and a confidence score

Page 12:

API Details

1. The API gives high-precision confidence values for leaf nodes

2. The API allows incomplete inputs by labeling intermediate nodes with a label and a confidence value

Page 13:

Identity Oracles

● Leaf Identity Oracle
  ○ Takes a query as input and returns the identifier of the leaf of the tree that is reached on that query
● Node Identity Oracle
  ○ Takes an incomplete query and returns the identifier of the intermediate node or leaf at which tree computation halts.

Page 14:

Extraction Algorithm 1

● Assume we have a leaf-identity oracle that returns a unique identifier for each leaf
● Start with a random input x and get the leaf id from the oracle
  ○ Search for all constraints on x that have to be satisfied to remain in that leaf
● Create a new query for an unvisited leaf and repeat until all leaves have been found (a simplified sketch follows below)
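A heavily simplified sketch of this idea for binary categorical features only; `leaf_id` stands in for the leaf-identity oracle, and the paper's handling of continuous splits (line search over thresholds) and duplicate leaf identifiers is omitted.

```python
from collections import deque

def extract_tree_leaves(leaf_id, n_features):
    """Discover leaves by negating, one at a time, the predicates a leaf depends on."""
    visited = {}                               # leaf id -> (witness input, constrained features)
    queue = deque([[0] * n_features])          # arbitrary starting input
    while queue:
        x = queue.popleft()
        lid = leaf_id(x)
        if lid in visited:
            continue
        constrained = []
        for i in range(n_features):
            y = list(x)
            y[i] = 1 - y[i]                    # negate one predicate of the current leaf
            if leaf_id(y) != lid:              # feature i is constrained in this leaf
                constrained.append(i)
                queue.append(y)                # y follows a (possibly) unvisited path
        visited[lid] = (x, constrained)
    return visited
```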

Page 15:

Top-Down Approach

● Exploit queries over partial inputs to extract the tree layer by layer, starting at the root
● Start with an empty query to get the id of the root
● Set each feature in turn to find which feature is the first split
● Use this procedure recursively to get all nodes in the tree and eventually all leaves (see the sketch below)
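A minimal sketch of the top-down idea, assuming the node-identity oracle from the Identity Oracles slide, here `node_id`, which accepts partial inputs (a dict of feature to value), and a known list of candidate values per categorical feature; numeric splits and repeated splits on the same feature are out of scope for this sketch.

```python
def extract_top_down(node_id, candidate_values, query=None):
    """Recursively recover the tree using a node-identity oracle over partial inputs."""
    query = dict(query or {})
    current = node_id(query)                   # node where evaluation halts on this partial input
    children = {}
    for feat, values in candidate_values.items():
        if feat in query:
            continue
        for v in values:
            q = dict(query)
            q[feat] = v
            if node_id(q) != current:          # setting feat moved past `current`,
                children[(feat, v)] = q        # so `current` splits on feat
    if not children:                           # no feature changes the halting node: a leaf
        return {"node": current, "leaf": True}
    return {"node": current,
            "children": {fv: extract_top_down(node_id, candidate_values, q)
                         for fv, q in children.items()}}
```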

Page 16:

What if only predicted labels are available?

● Re-training (a minimal sketch follows after this list)

● Active learning

● Adversarial examples
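The simplest of these, re-training, can be sketched in a few lines: query the victim for hard labels on attacker-chosen inputs and fit a surrogate to them. The active-learning and adversarial-example variants differ mainly in how the query points are chosen. The oracle `query_label` and the surrogate model family below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain_surrogate(query_label, d, n_queries=5000, rng=np.random.default_rng(0)):
    """Fit a surrogate model on inputs labeled by the victim's prediction API."""
    X = rng.normal(size=(n_queries, d))
    y = np.array([query_label(x) for x in X])   # hard labels only, no confidences
    return LogisticRegression(max_iter=1000).fit(X, y)
```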

Page 17:

Challenges

● Stealing exact model parameters is still hard for complex models

● Relationship with knowledge distillation
  ○ Black-box knowledge distillation