
Siamese networks for generating adversarial examples

Mandar Kulkarni
Data Scientist, Schlumberger
[email protected]

Aria Abubakar
Data Analytics Program Manager, Schlumberger
[email protected]

Abstract— Machine learning models are vulnerable to adversarial examples. An adversary modifies the input data such that humans still assign the same label but machine learning models misclassify it. Previous approaches in the literature demonstrated that adversarial examples can be generated even for a remotely hosted model. In this paper, we propose a Siamese network based approach to generate adversarial examples for a multiclass target CNN. We assume that the adversary does not possess any knowledge of the target data distribution, and we use an unlabeled mismatched dataset to query the target, e.g., for the ResNet-50 target, we use the Food-101 dataset as the query. Initially, the target model assigns labels to the query dataset, and a Siamese network is trained on the image pairs derived from these multiclass labels. We learn the adversarial perturbations for the Siamese model and show that these perturbations are also adversarial w.r.t. the target model. In experimental results, we demonstrate the effectiveness of our approach on MNIST, CIFAR-10 and ImageNet targets with TinyImageNet/Food-101 query datasets.

I. INTRODUCTION AND RELATED WORKS

Machine learning models are vulnerable to external attacks such as adversarial inputs. Examples from the dataset can be perturbed in a manner such that a human assigns the same label to them but machine learning models misclassify them. Such examples are referred to as adversarial examples. Szegedy et al. [1] proposed a box-constrained optimization to obtain adversarial samples from input images. In addition, it was observed that the same perturbation causes different networks trained on different subsets of the dataset to misclassify the same input. Thus, adversarial examples are transferable between models trained on a similar data distribution. Goodfellow et al. [2] proposed the fast gradient sign (FGS) method for generating adversarial examples. Small input perturbations that locally maximize the loss produce images that are more often misclassified by the network. It was argued that the primary cause of the vulnerability of neural networks to adversarial perturbation is their linear nature. Moosavi-Dezfooli et al. [3] discovered the existence of a universal (image-agnostic) perturbation that causes natural images to be misclassified with high probability. The existence of universal perturbations points towards geometric correlations among the high-dimensional decision boundaries of classifiers. It underlines the existence of single directions in the input space that adversaries can possibly exploit to force a classifier to make mistakes on natural images.

Fig. 1. Example of an adversarial image generated by our approach for the ResNet-50 model.

Adversarial examples pose a potential threat to machine learning models when deployed in the real world. Since these perturbations tend to go undetected by humans, it becomes difficult to rectify them. Kurakin et al. [4] showed that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples. It was demonstrated that a large fraction of adversarial examples obtained from a cell-phone camera get misclassified by the ImageNet Inception model.

It was believed that adversarial samples can only be generated when we have access to the target model. However, Papernot et al. [5] demonstrated that adversarial examples can be generated even for a remotely hosted deep neural network (DNN) with no knowledge of the architecture of the DNN, such as the number, type, and size of layers, nor of the training data used to learn the DNN's parameters. The only access their adversary has to the target model is the (atomic) class label of the submitted query vector. They refer to this setting as the black-box scenario. Starting with a small subset of a training set, a Jacobian-based dataset augmentation technique is used to effectively learn the decision boundary with a limited number of queries made to the target DNN.

In this paper, we propose an approach based on Siamese networks [6][7] to generate adversarial examples for the target classification model. Instead of learning the decision boundary of the target model, we directly learn the input perturbation that changes the label of the images. We use Siamese networks to learn such an adversarial perturbation and show its effectiveness in generating adversarial examples for the target model. We use an unlabeled mismatched image dataset as the query to the target convolutional neural network (CNN). We refer to this mismatched dataset as the "stimulus". Kulkarni et al. [8] showed the effectiveness of a mismatched stimulus for knowledge distillation. We input the stimulus dataset to the target model and obtain the corresponding class labels. Based on the similarity and dissimilarity of the assigned labels, we generate positive and negative image pairs for training the Siamese network. We learn the adversarial perturbations for the Siamese network using the FGS method [2], in which perturbations are applied only to the left image channel of the network to 'switch' the original label. We apply the learned perturbations to the target images and show that they act as adversarial perturbations.

Our experimental setting is similar to [5] with a few differences. Instead of remote DNN APIs, we demonstrate results with pretrained CNN models. Also, we do not assume availability of a subset of the original training set. Through the use of the Siamese network, we utilize the notion that the target CNN assigns different labels to different stimulus images, which enables the network to infer the adversarial "directions" in the input space. In the experimental section, we demonstrate the effectiveness of our approach on MNIST, CIFAR-10 and ImageNet pretrained models as targets. We use a subset of TinyImageNet [9][10] and Food-101 [11] as the stimulus for these targets.

II. PROPOSED METHODOLOGY

This section describes our approach to generate adversarial examples for the target CNN.

A. Training Siamese networks

Siamese networks [6][7] are well-known architectures for learning data representations. The training objective minimizes a loss function such that the features representing images from the same class have higher similarity, whereas features for images of different classes have low similarity. The mapping from the raw images to the feature space is performed by a convolutional network. Siamese networks have proved to be efficient tools for learning feature representations when the amount of labeled data is limited.

Since we do not assume knowledge of the target data distribution, we use an unlabeled mismatched dataset as the stimulus for the target model. During the experiments, for the MNIST target we use TinyImageNet, whereas for the CIFAR-10 and ResNet targets we use Food-101 as the stimulus. Initially, we get a small stimulus dataset labeled by the target CNN. To limit the number of queries, for each of our experiments we only make 5k queries to the target CNN and collect the corresponding class labels. Since with n image queries we can generate (n^2 - n)/2 unique image pairs, we can effectively train the Siamese network with a lower risk of overfitting.

Fig. 2. Siamese architecture for learning the adversarial perturbation.

Let f_T denote the function of the target CNN and let z ∈ {1, ..., k} denote the label returned by the target model for the input x. Since the target is a classification model, z is obtained as follows:

z_x = arg max f_T(x)   (1)
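
The labeling step in Eq. 1 only needs forward passes through the target. The following Python sketch (not the authors' code) labels a hypothetical folder of stimulus images with a pretrained torchvision ResNet-50 standing in for f_T; the folder path, transforms, and batch size are our own assumptions.

```python
# Minimal sketch: label a mismatched stimulus set with a pretrained target CNN (Eq. 1).
import torch
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
target = models.resnet50(pretrained=True).eval().to(device)   # stands in for f_T

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# "food101_subset/" is a hypothetical folder of stimulus images arranged in
# arbitrary subfolders (ImageFolder expects subdirectories; their names are ignored).
stimulus = datasets.ImageFolder("food101_subset/", transform=preprocess)
loader = DataLoader(stimulus, batch_size=64, shuffle=False)

labels = []
with torch.no_grad():
    for x, _ in loader:
        z = target(x.to(device)).argmax(dim=1)   # z_x = arg max f_T(x)
        labels.append(z.cpu())
labels = torch.cat(labels)   # one target-assigned label per stimulus image
```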

It is feasible that the labels assigned to the stimulus may not cover all the possible classes present in the target training set. Since Siamese networks work on similarity/dissimilarity of the labels, this does not create an issue. To avoid possible class imbalance from random sampling, we generate an equal number of positive and negative pairs using the following procedure (sketched in code below):
• Randomly pick a class from the stimulus labels.
• For that class, collect one positive and one negative example.
• Iterate until the desired number of pairs is collected.
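
A minimal Python sketch of this pairing loop follows; `images` and `labels` are assumed to hold the stimulus images and the target-assigned labels from the previous step, and the helper name `make_pairs` is ours.

```python
# Minimal sketch of the balanced positive/negative pair sampling described above.
import random

def make_pairs(images, labels, n_pairs):
    # Group stimulus indices by their target-assigned label.
    by_class = {}
    for i, z in enumerate(labels):
        by_class.setdefault(int(z), []).append(i)
    classes = [c for c, idx in by_class.items() if len(idx) >= 2]

    pairs, pair_labels = [], []
    while len(pairs) < n_pairs:
        c = random.choice(classes)                 # pick a class at random
        a, p = random.sample(by_class[c], 2)       # positive pair (y = 1)
        pairs.append((images[a], images[p])); pair_labels.append(1)
        c_neg = random.choice([k for k in by_class if k != c])
        n = random.choice(by_class[c_neg])         # negative pair (y = 0)
        pairs.append((images[a], images[n])); pair_labels.append(0)
    return pairs, pair_labels
```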

For training the Siamese network, we use a strategy similar to the one-shot learning approach proposed in [6], where the L1 difference of the left and right channel image representations is taken prior to the sigmoid unit. Fig. 2 shows an overview of the architecture of our Siamese network. Let x_L and x_R denote the inputs


to the left and the right subnetwork, respectively. Let y denote the ground-truth label. If x_L and x_R belong to the same class, then y = 1; otherwise y = 0. Let f_S denote the function represented by the Siamese network. f_S takes two images (x_L and x_R) as input and assigns a label to the pair as follows:

y_pred = 1 if f_S(x_L, x_R) >= 0.5, and y_pred = 0 otherwise.

Here, y_pred denotes the label predicted for the given input image pair. The network is trained with the binary cross-entropy loss. Let L_i denote the loss for the i-th sample:

L_i = -y_i log y_pred^i - (1 - y_i) log(1 - y_pred^i)   (2)

We use 10% of the training pairs as the validation set, and training is terminated when the model's performance ceases to improve on the validation set.
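
The following PyTorch sketch illustrates, under our own layer-size assumptions, the kind of Siamese model and training step described above: a shared convolutional subnetwork, an L1 difference of the two embeddings fed to a sigmoid unit, and binary cross entropy as in Eq. 2. It is a sketch of the general setup, not the exact network used in the experiments.

```python
# Minimal Siamese setup: shared conv subnetwork, |e_L - e_R| -> sigmoid, BCE loss.
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, in_ch=1, emb_dim=512):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 5, padding=2),
                                 nn.BatchNorm2d(cout), nn.ReLU(),
                                 nn.MaxPool2d(2))
        self.features = nn.Sequential(
            block(in_ch, 128), block(128, 128), block(128, 128),
            nn.AdaptiveAvgPool2d((3, 3)), nn.Flatten(),
            nn.Linear(128 * 3 * 3, emb_dim), nn.Sigmoid())
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, x_left, x_right):                  # f_S(x_L, x_R)
        d = torch.abs(self.features(x_left) - self.features(x_right))
        return torch.sigmoid(self.head(d)).squeeze(1)

model = SiameseNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()                                       # Eq. 2

def train_step(x_left, x_right, y):
    """One training step on a batch of image pairs with binary labels y."""
    optimizer.zero_grad()
    loss = bce(model(x_left, x_right), y.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```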

B. Adversarial perturbations for the Siamese network

Our aim is to produce a minimally altered example of an image x, say x*, such that the target model misclassifies the example, i.e.,

x* = x + δ   s.t.   arg max f_T(x) ≠ arg max f_T(x*)   (3)

However, we do not have access to f_T, and we now use the trained f_S as a proxy. We learn the adversarial perturbations for f_S and show that they are transferable w.r.t. the target model function f_T.

To generate adversarial examples for the Siamese network, we resort to the FGS method [2]. Since we have two images as input and a binary label as output, our attempt is to flip the label of the given input pair. We take the gradient of the output of the network, i.e., f_S(x_L, x_R), w.r.t. x_L. We iteratively modify x_L so as to flip the initially estimated label. If the initial value of y_pred is 0, we need to perform gradient ascent to flip it to 1. If initially y_pred is 1, we need to perform gradient descent to flip the label. To facilitate the switch between ascent and descent, we use the following update equation:

x_L = x_L + ε_S sign(0.5 - y_pred) ∂f_S/∂x_L   (4)

Here, ε_S denotes the weight assigned to the gradient. Note that the term sign(0.5 - y_pred) automatically chooses ascent or descent appropriately. As opposed to FGS for the multiclass case, which takes the gradient w.r.t. the loss, we directly take the gradient w.r.t. the model output.

The perturbation that results from the label flip is then obtained as

x_g = x_L - x_L^init   (5)

Here, x_L^init denotes the initial value of x_L. The procedure is described in the Algorithm 1 block.

Algorithm 1 Learning adversarial perturbations using the Siamese network

1: INPUTS: x_L, x_R, y, ε_S, max_iter
2: OUTPUT: adversarial perturbation x_g
3: Initialization: x_g = 0, x_L^init = x_L
4: y_pred = I(f_S(x_L, x_R) >= 0.5)
5: if y_pred = y then
6:   for iter = 1 to max_iter do
7:     x_L = x_L + ε_S sign(0.5 - y_pred) ∂f_S/∂x_L
8:     y_pred = I(f_S(x_L, x_R) >= 0.5)
9:     if y_pred ≠ y then
10:      x_g = x_L - x_L^init
11:      break

For the MNIST and CIFAR-10 experiments, we use ε_S = 0.001. For ImageNet, we use ε_S = 1. We use max_iter = 100 for all experiments.
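
A Python sketch of Algorithm 1 under our assumptions is shown below; `model` is a trained Siamese network such as the one sketched in Section II-A, and (`x_left`, `x_right`) is assumed to be a single image pair (batch size 1).

```python
# Minimal sketch of Algorithm 1: nudge x_L until the Siamese prediction flips.
import torch

def learn_perturbation(model, x_left, x_right, y, eps_s=0.001, max_iter=100):
    x_init = x_left.clone()
    x_left = x_left.clone().requires_grad_(True)
    x_g = torch.zeros_like(x_init)

    y_pred = float(model(x_left, x_right) >= 0.5)
    if y_pred != y:                        # pair already mislabeled; return zeros
        return x_g
    for _ in range(max_iter):
        out = model(x_left, x_right)                      # f_S(x_L, x_R)
        grad, = torch.autograd.grad(out.sum(), x_left)
        with torch.no_grad():
            # sign(0.5 - y_pred) switches between gradient ascent and descent (Eq. 4)
            x_left += eps_s * torch.sign(torch.tensor(0.5 - y_pred)) * grad
        y_pred = float(model(x_left, x_right) >= 0.5)
        if y_pred != y:                                   # label flipped
            x_g = (x_left - x_init).detach()              # Eq. 5
            break
    return x_g
```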

C. Generating adversarial examples for the target model

We perturb target samples using x_g as follows:

x_P = x_T + ε_T sign(x_g)   (6)

Here, x_T denotes a sample from the target dataset and x_P denotes the perturbed sample. ε_T denotes the scale factor for the signed perturbation. We define the transfer rate T_r as the fraction of labels changed by the learned perturbation:

T_r = (1/N) Σ_{i=1}^{N} I(z^i_{x_P} ≠ z^i_{x_T})   (7)

Here, N denotes the size of the target set and I denotes the indicator function, whose output is 1 if the condition is true and 0 otherwise. Note that our aim is to change the label of the target image that was previously assigned by the target model; the modified label can be any possible class.
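
As a minimal sketch of Eqs. 6 and 7 (assuming, as in the experiments below, that a local copy of the target model is available for evaluation):

```python
# Apply the signed perturbation to target samples and measure the transfer rate.
import torch

def transfer_rate(target_model, x_targets, x_g, eps_t):
    """Fraction of target samples whose predicted label changes (Eq. 7)."""
    with torch.no_grad():
        z_orig = target_model(x_targets).argmax(dim=1)    # z_{x_T}
        x_pert = x_targets + eps_t * torch.sign(x_g)      # Eq. 6
        z_pert = target_model(x_pert).argmax(dim=1)       # z_{x_P}
    return (z_pert != z_orig).float().mean().item()
```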

To learn x_g more effectively, we generate a small set of test image pairs and obtain the perturbation for each by following the procedure described in Algorithm 1. We demonstrate that the perturbations that cause the maximum change in the loss (of the Siamese network) with the smallest infinity (max) norm have the best transfer rates. We score each perturbation as follows:

s_j = (L^j_{x_L^init + x_g^j} - L^j_{x_L^init}) / ||x_g^j||_∞   (8)

Here, s_j denotes the score value assigned to the j-th perturbation. We rank perturbations in descending order of the score value and use the highest-ranked perturbation to generate the adversarial examples for the target set. In the experimental section, we demonstrate the validity and effectiveness of our scoring function.
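
A small sketch of this ranking step, assuming the candidate perturbations are stored together with the pairs they were learned on (the tuple layout and helper names are ours, and only perturbations from successful label flips, i.e., nonzero x_g, are scored):

```python
# Score each candidate perturbation by loss change per unit of infinity norm (Eq. 8).
import torch
import torch.nn.functional as F

def perturbation_score(model, x_init, x_right, y, x_g):
    with torch.no_grad():
        loss_before = F.binary_cross_entropy(model(x_init, x_right), y.float())
        loss_after = F.binary_cross_entropy(model(x_init + x_g, x_right), y.float())
    return (loss_after - loss_before).item() / x_g.abs().max().item()

def best_perturbation(model, candidates):
    """candidates: list of (x_init, x_right, y, x_g) tuples; returns the top-ranked x_g."""
    return max(candidates, key=lambda c: perturbation_score(model, *c))[3]
```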

Fig. 3. Experiments with the MNIST and CIFAR targets. (a) Adversarial images misclassified by the MNIST target for ε_T = 0.2; (b) comparison of transfer rates for the top, bottom and random perturbations for the MNIST target; (c) adversarial images misclassified by the CIFAR target for ε_T = 0.1; (d) comparison of transfer rates for the top, bottom and random perturbations for the CIFAR target, showing that the top-ranked perturbation always has the higher transfer rate; (e) histogram of stimulus labels assigned by the MNIST target; (f) histogram of stimulus labels assigned by the CIFAR target.

III. EXPERIMENTAL RESULTS

We demonstrate the effectiveness of our approach on pretrained target models trained on the MNIST, CIFAR-10 and ImageNet datasets.

A. MNIST CNN target

The MNIST dataset consists of 28 × 28 gray-scale images of handwritten digits. We use a pretrained network as the MNIST target, which has approximately 184k parameters and a test accuracy of 99.58%. For the MNIST target, we use 5k randomly chosen images from the TinyImageNet dataset [9][10] as the stimulus. TinyImageNet is a natural image dataset and has a different data distribution than the MNIST images. Stimulus images are resized and converted to gray-scale prior to querying. We obtain the multiclass labels on the stimulus using the MNIST target. Fig. 3(e) shows the histogram of labels assigned to the 5k stimulus dataset. Peculiarly, label 8 gets assigned the most.

Fig. 4. Adversarial images for ResNet-50. The center image shows the learned adversarial perturbation. Class probabilities are displayed in green.

We train the Siamese network on 30k image pairs derived from these labels. From the training data, we use 10% as the validation set. The network architecture that performs well on the validation set is chosen. The architecture of the subnetwork of the Siamese is
[Conv(128,5,5)–BatchNorm–MaxPool(2)–Conv(128,5,5)–BatchNorm–MaxPool(2)–Conv(128,5,5)–BatchNorm–MaxPool(2)].
We use three convolution layers, each followed by batch-normalization and max-pooling layers. Each convolution layer has 128 filters of size 5 × 5. We use a dense layer of size 512 prior to the sigmoid unit. Our network has approximately 1.4M parameters. We use batch-normalization layers and dropout to regularize the network. We observe that the results are fairly robust against small changes in the Siamese architecture.

Thereafter, we follow the procedure described in Algorithm 1 and learn the adversarial perturbations for a small set of newly generated image pairs. ε_S is set to 0.001 and max_iter is set to 100. We rank the perturbations according to the scoring function defined in Eq. 8. We use the top-ranked perturbation as x_g and perturb 10k images from the MNIST test set according to Eq. 6. Fig. 3(a) shows the adversarial images misclassified by the MNIST target. Note that we use the MNIST test set only to demonstrate the results for the adversarial examples; no MNIST data is assumed to be available during the training process.

We now evaluate the effectiveness of our scoring strategy. We compare the transfer rates for the highest- (top-) ranked perturbation and the lowest- (bottom-) ranked perturbation. We also perform a comparison with a random perturbation to verify the superiority of the learned perturbations. In the case of the random perturbation, we use a zero-mean, unit-standard-deviation noise image as x_g. We obtain transfer rates with five random perturbations and plot the average transfer rate. Fig. 3(b) shows the plots of the transfer rates for different values of ε_T. It can be seen that the top-ranked perturbation always performs better than the bottom-ranked one. Also, the learned perturbations have higher transfer rates than the random perturbation.

Though we get higher transfer rates with larger values of ε_T, the perturbation becomes increasingly visible. We get a transfer rate of 36.48% for an ε_T value of 0.3. For a similar value of ε_T, Papernot et al. [5] report a transfer rate of 78.72% with a query dataset similar to the target dataset. Although our transfer rate is lower, it is interesting to note that it is obtained with a single perturbation applied to the entire test set and with a smaller number of queries from a mismatched dataset.

B. CIFAR CNN target

The CIFAR-10 dataset consists of color images of size 32 × 32. For this experiment, we use a pretrained target network which has approximately 776k parameters and a test accuracy of 82.16%. For the CIFAR target, to have a mismatched set, we use 5k randomly chosen images from the Food-101 dataset [11] as the stimulus. The Food-101 dataset consists of images of food items corresponding to 101 different categories. We obtain labels on the 5k stimulus set. Fig. 3(f) shows the histogram of stimulus labels assigned by the CIFAR target. We generate 70k image pairs from the labeled stimulus to train the Siamese network. We use the same architecture for the Siamese as in the MNIST experiment. The network has approximately 1.8M parameters. Fig. 3(c) shows the adversarial examples misclassified by the CIFAR target. Fig. 3(d) shows the comparison of transfer rates for the top, bottom and random perturbations. Here as well, for the random perturbation, we report the average transfer rate over five different random perturbations. We observe a similar trend in that the top-ranked perturbation performs the best.

We now study the effect of the size of the training set for the Siamese network on the transfer rate. Table I shows the transfer rates for the MNIST and CIFAR targets for a varied number of input image pairs. We use five different subsets of test image pairs and calculate the mean and standard deviation of the transfer rates. We observe that transfer rates increase with the number of pairs; however, after a certain number of pairs, the transfer rates do not change significantly.

TABLE I
Comparison of transfer rates with varying number of training image pairs. For the MNIST target, ε_T is set to 0.5, whereas for the CIFAR target, ε_T is set to 0.1.

# image pairs | MNIST transfer rate (TinyImageNet stim.) | CIFAR transfer rate (Food-101 stim.)
10k | 0.435 ± 0.074 | 0.376 ± 0.026
20k | 0.511 ± 0.055 | 0.419 ± 0.038
30k | 0.635 ± 0.059 | 0.426 ± 0.06

The experimental results demonstrate that our approach is able to learn adversarial perturbations even under this restricted setting. We observe that a single perturbation can be estimated that maximally perturbs the target set. This underlines the existence of universal perturbations, which we can effectively learn using Siamese networks. Next, we qualitatively demonstrate results of our approach on a pretrained target model trained on the ImageNet dataset.

C. ResNet target

We use a pretrained ResNet-50 model as the target. For this target as well, we use 5k images from the Food-101 dataset as the mismatched stimulus. We generate 1M image pairs from the class labels obtained from the target and train the Siamese network.

The architecture of the Siamese network is
[Conv(128,5,5)–BatchNorm–MaxPool(2)–Conv(128,5,5)–BatchNorm–MaxPool(2)–Conv(128,5,5)–BatchNorm–MaxPool(2)–Conv(128,5,5)–BatchNorm–MaxPool(2)].

Note that we use an architecture similar to those of the MNIST and CIFAR experiments, except that instead of three, we use four convolution/max-pooling blocks. The network has approximately 14M parameters. We set ε_S = 1 and max_iter to 100. Fig. 4 shows the result of adding a learned adversarial perturbation to input images. The image in the center shows the highest-ranked perturbation. Images on the left show the original inputs and images on the right show the adversarial examples. ε_T is calculated separately for each image. Note that the same perturbation is able to switch the label of the inputs. For some images, the effect of the perturbation is visible in the output image, e.g., for the green snake image, ε_T had to be set to 27 to change the label. However, for most of the images, smaller values of ε_T effected the class change. In this case as well, we observed that our perturbation is superior to the random perturbation: with the random perturbation, a much larger value of ε_T was needed to effect the label change. Fig. 1 shows another example of an adversarial image generated by our approach for the image taken from [2].

Fig. 5. Experiment with the Google Vision API. The left and right columns show the output of the Vision API for the original and modified images, respectively.

Since adversarial images are known to be transferable between models, we performed an experiment with the Google Vision API. Fig. 5 shows the class labels returned by the Vision API for the original images and for the perturbed images generated by our approach. The perturbed images are misclassified by the Vision API. This result indicates that the adversarial perturbation directions computed by our approach are effective.

IV. CONCLUSION

In this paper, we proposed an approach based on Siamese networks to generate adversarial examples in a black-box setting. Since Siamese networks are trained on image pairs, with a limited number of mismatched queries to the target model we could effectively learn the adversarial directions in the input space. We observed that the perturbations that maximally alter the loss with the minimum infinity (max) norm are the most effective. Experiments with MNIST, CIFAR-10 and ImageNet targets with TinyImageNet and Food-101 stimuli demonstrated the efficacy of the proposed approach.

REFERENCES

[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.

[2] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.

[3] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," arXiv preprint, 2017.

[4] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," arXiv preprint arXiv:1607.02533, 2016.

[5] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 506-519.

[6] G. Koch, R. Zemel, and R. Salakhutdinov, "Siamese neural networks for one-shot image recognition," in ICML Deep Learning Workshop, vol. 2, 2015.

[7] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 539-546.

[8] M. Kulkarni, K. Patil, and S. Karande, "Knowledge distillation using unlabeled mismatched images," arXiv preprint arXiv:1703.07131, 2017.

[9] "Tiny imagenet 120k: http://cs231n.stanford.edu/tiny-imagenet-100-a.zip, http://cs231n.stanford.edu/tiny-imagenet-100-b.zip."

[10] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958-1970, 2008.

[11] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 – mining discriminative components with random forests," in European Conference on Computer Vision, 2014.