
Adaptive Perturbation for Adversarial Attack

Zheng Yuan 1,2, Jie Zhang 1,2, Shiguang Shan 1,2

1 Institute of Computing Technology, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences

[email protected]; {zhangjie,sgshan}@ict.ac.cn

Abstract

In recent years, the security of deep learning models has received more and more attention with the rapid development of neural networks, which are vulnerable to adversarial examples. Almost all existing gradient-based attack methods use the sign function during generation to meet the perturbation budget requirement under the L∞ norm. However, we find that the sign function may be improper for generating adversarial examples since it modifies the exact gradient direction. We propose to remove the sign function and directly utilize the exact gradient direction with a scaling factor for generating adversarial perturbations, which improves the attack success rates of adversarial examples even with fewer perturbations. Moreover, considering that the best scaling factor varies across different images, we propose an adaptive scaling factor generator to seek an appropriate scaling factor for each image, which avoids the computational cost of manually searching for the scaling factor. Our method can be integrated with almost all existing gradient-based attack methods to further improve the attack success rates. Extensive experiments on the CIFAR10 and ImageNet datasets show that our method exhibits higher transferability and outperforms the state-of-the-art methods.

1. Introduction

With the rapid progress and significant success of neural network applications in recent years, the security of deep learning models has received more and more attention. The robustness of models is quite important, especially in scenarios like face recognition and autonomous driving. Unfortunately, models can be easily misled by adding small, human-imperceptible noises to the input, i.e., they are vulnerable to adversarial examples [42]. In addition, adversarial examples have the intriguing property of transferability, whereby adversarial examples generated on one model can also fool other unknown models.

Several gradient-based attack methods have been proposed to generate adversarial examples. From the one-step method [13] to the iterative methods [8, 9, 21, 25, 29, 49], the transferability of generated adversarial examples has gradually improved.

Figure 1. A two-dimensional toy example illustrating the difference between our proposed APAA and existing sign-based methods, e.g., BIM [21]. The orange and blue paths represent the update paths of BIM and our APAA, respectively, when generating adversarial examples. Due to the limitation of the sign function, the update direction of BIM is restricted and not accurate enough, so it only reaches a sub-optimal end-point. Our method not only obtains a more accurate update direction but also adjusts the step size adaptively, so APAA reaches the global optimum in fewer steps.

In the task of adversarial example generation, the most commonly used setting is to constrain the L∞ norm of the generated adversarial perturbations for fair comparisons. Almost all existing gradient-based attack methods use the sign function to normalize the gradient so that each step takes the maximum allowed perturbation. Due to the influence of the sign function, all values of the gradient are normalized into {0, +1, −1}. Although this approach can amplify small gradient values, so as to make full use of the perturbation budget in the L∞ attack, the resulting update directions are limited, e.g., there are only eight possible update directions in a two-dimensional space. The inaccurate update direction may cause the generated adversarial examples to be sub-optimal (as shown in Fig. 1), i.e., although the generated adversarial examples can succeed in white-box attacks, they may fail under the setting of black-box attacks.

To solve the above problem of existing methods, we propose a method called Adaptive Perturbation for Adversarial Attack (APAA), which directly multiplies the gradient of the loss function by a scaling factor instead of normalizing it with the sign function. The scaling factor can either be elaborately selected manually, or adaptively generated according to different image characteristics. Specifically, to further take the characteristics of different images into consideration, we propose an adaptive scaling factor generator that automatically produces a suitable scaling factor in each attack step during the generation of adversarial examples. From Fig. 1, we can clearly see that, since our method adaptively adjusts the step size in each attack step, a large step size can be used in the first few steps of the iterative attack to make full use of the perturbation budget, and the step size can be reduced adaptively when approaching the global optimum. With more accurate update directions, our APAA reaches the global optimum with fewer update steps and smaller perturbations, which improves the transferability of adversarial examples and thus achieves higher attack success rates in black-box attacks.

The main contributions of our work are as follows:

1. We propose an effective attack method that directly multiplies the gradient of the loss function by a scaling factor instead of employing the sign function, which is used in nearly all existing gradient-based methods to normalize the gradient.

2. We propose an adaptive scaling factor generator that produces a suitable scaling factor in each attack step according to the characteristics of different images, which avoids manual hyperparameter searching.

3. Extensive experiments on the CIFAR10 and ImageNet datasets show the superiority of our proposed method. Our method significantly improves the transferability of generated adversarial examples and achieves higher attack success rates in both white-box and black-box settings than the state-of-the-art methods, with fewer update steps and smaller perturbation budgets.

2. Related Work

The phenomenon of adversarial examples was first reported by Szegedy et al. [42]. Attack and defense methods have promoted the development of each other in recent years; both are briefly reviewed in this section.

2.1. Attack Methods

Attack methods can be divided into four categories according to the amount of information about the target model the adversary can obtain, from more to less: white-box attack, score-based black-box attack, decision-based black-box attack, and transfer-based black-box attack.

White-box attack methods [2, 13, 21, 29, 30] have access to all information of the target model, including the model parameters and gradient information. Score-based [3, 6, 23, 28] and decision-based [1, 5, 10] black-box attacks can obtain the output class probabilities and the final decision class, respectively. The transfer-based black-box attack [8, 9, 25, 49] is the most challenging setting, where no information about the target model can be obtained.

Our work mainly focuses on gradient-based attack methods under the L∞ norm. White-box methods can directly use the gradient information, while transfer-based black-box methods generate adversarial examples on a white-box model and leverage the transferability [42] of adversarial examples to attack the black-box model. Since the methods in these two categories are mainly gradient-based, we provide a brief introduction to these works.

Let x and y be the original image and its corresponding class label, respectively. Let J(x, y) be the cross-entropy loss function and x^{adv} be the generated adversarial example. Let ε and α be the total perturbation budget and the per-step budget of iterative methods, respectively. Π_{x+S} denotes clipping the generated adversarial example into the ε-neighborhood of the original image under the L∞ norm.

Fast Gradient Sign Method. FGSM [13] is a one-step white-box attack that directly utilizes the gradient of the loss function to generate the adversarial example:

    x^{adv} = x + ε · sign(∇_x J(f(x), y)).    (1)

Basic Iterative Method. BIM [21] is an extension of FGSM that uses an iterative procedure to improve the attack success rate of adversarial examples:

    x^{adv}_{t+1} = Π_{x+S}(x^{adv}_t + α · sign(∇_x J(f(x^{adv}_t), y))),    (2)

where x^{adv}_0 = x and the subscript t is the iteration index.

Momentum Iterative Fast Gradient Sign Method. MIFGSM [8] introduces a momentum term that accumulates the gradients of previous steps to obtain more stable update directions, which greatly improves the transferability of generated adversarial examples:

    g_{t+1} = μ · g_t + ∇_x J(f(x^{adv}_t), y) / ‖∇_x J(f(x^{adv}_t), y)‖_1,    (3)

    x^{adv}_{t+1} = Π_{x+S}(x^{adv}_t + α · sign(g_{t+1})),    (4)

where g_t denotes the accumulated gradient (momentum) at the t-th iteration and μ is a decay factor.
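For concreteness, the sign-based update of Eqs. (3)-(4) can be sketched in a few lines of PyTorch. This is an illustrative re-implementation under the usual L∞ setting, not the authors' released code; `model`, the loss choice, and the default hyperparameters are assumptions.

```python
# Illustrative PyTorch sketch of the sign-based MIFGSM update (Eqs. 3-4).
import torch
import torch.nn.functional as F

def mifgsm(model, x, y, eps=8/255, alpha=2/255, mu=1.0, steps=10):
    """Assumed interface: `model` maps image batches (NCHW, in [0, 1]) to logits."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Eq. (3): accumulate the L1-normalized gradient into the momentum term.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        # Eq. (4): sign step, then clip into the eps-ball and the valid pixel range.
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv
```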

Diverse Inputs Method. A randomization operation of random resizing and zero-padding applied to the input image is proposed in DIM [49].

Translation-Invariant Attack Method. TIM [9] proposes a translation-invariant attack method that convolves the gradient with a Gaussian kernel to further improve the transferability of adversarial examples.

SINI Method. Inspired by the Nesterov accelerated gradient [31], SINI [25] amends the accumulation of the gradients to effectively look ahead and improve the transferability of adversarial examples. In addition, SINI also proposes to use several copies of the original image at different scales to generate the adversarial example.

2.2. Defense Methods

Adversarial defense aims to improve the robustness of the target model when adversarial examples are the inputs. Defense methods can mainly be categorized into adversarial training, input transformation, model ensemble, and certified defenses. Adversarial training methods [29, 33, 39, 43, 44] use adversarial examples as extra training data to improve the robustness of the model. Input transformation methods [11, 19, 24, 27, 37] tend to denoise adversarial examples before feeding them into the classifier. Model ensemble methods [26, 32, 50] use multiple models simultaneously to reduce the influence of adversarial examples on any single model and achieve more robust results. Certified defense methods [7, 18, 35, 45] guarantee that the target model correctly classifies adversarial examples within a given distance from the original images.

3. Method

In this section, we first analyze the defect of the sign function used in existing gradient-based attack methods in Sec. 3.1. Then we present our method, Adaptive Perturbation for Adversarial Attack (APAA), in Sec. 3.2, i.e., multiplying the gradient by a scaling factor instead of normalizing it with the sign function. The scaling factor can either be elaborately selected manually (Sec. 3.2.1), or adaptively generated by a generator according to different image characteristics (Sec. 3.2.2).

3.1. Rethinking the Sign Function

The task of adversarial attack is to make a minor, human-imperceptible modification to the original image so as to fool the target model, i.e., to make it misclassify the adversarial example. Gradient-based methods generate adversarial examples by maximizing the cross-entropy loss, which can be formulated as follows:

    arg max_{x^{adv}} J(f(x^{adv}), y),  s.t.  ‖x^{adv} − x‖_p ≤ ε,    (5)

where p could be 0, 1, 2 or ∞. The most commonly used setting is to constrain the maximum perturbation budget with the L∞ norm for fair comparisons. Almost all existing gradient-based attack methods use the sign function to normalize the gradient of the loss function in each step, which fully exploits the perturbation budget.

The sign function normalizes all values of the gradient into {0, +1, −1}. Although this amplifies small gradient values and thus makes full use of the perturbation budget in the L∞ attack, the resulting update directions are limited, e.g., there are only eight possible update directions in a two-dimensional space ((0, 1), (0, -1), (1, 1), (1, -1), (1, 0), (-1, -1), (-1, 0), (-1, 1)). From Fig. 1, we can clearly see that the attack directions of sign-based methods (e.g., BIM [21]) are distorted and no longer accurate, which in turn requires more update steps and a larger perturbation budget to implement a successful attack. The inaccurate update direction may cause the generated adversarial examples to be sub-optimal, which damages their transferability and probably leads to failure in the black-box attack.
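As a quick numerical illustration of this point (not taken from the paper), consider an almost-horizontal two-dimensional gradient: the sign operation snaps it onto a diagonal and rotates the update direction by roughly 42 degrees.

```python
# Toy 2-D example: sign() quantizes the gradient onto one of eight directions.
import numpy as np

g = np.array([1.0, 0.05])            # an almost-horizontal gradient
s = np.sign(g)                       # quantized to (1, 1), i.e., the 45-degree diagonal
cos = g @ s / (np.linalg.norm(g) * np.linalg.norm(s))
print(np.degrees(np.arccos(cos)))    # ~42 degrees of angular deviation
```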

3.2. Adaptive Perturbation for Adversarial Attack

To solve the problem mentioned above, we propose Adaptive Perturbation for Adversarial Attack (APAA). Specifically, we directly multiply the gradient by a scaling factor instead of normalizing it with the sign function. The scaling factor can either be elaborately selected manually, or adaptively obtained from a generator according to different image characteristics. We introduce each of them in the following.

3.2.1 Fixed Scaling Factor

First, we propose to directly multiply the gradient by a fixed scaling factor, which not only maintains the accurate gradient direction but also flexibly utilizes the perturbation budget through the adjustment of the scaling factor:

    x^{adv}_{t+1} = Π_{x+S}(x^{adv}_t + γ · ∇_x J(f(x^{adv}_t), y)),    (6)

where γ is the scaling factor.

Our method is easy to implement and can be combined with all existing gradient-based attack methods. When integrating with all the aforementioned methods, i.e., MIFGSM [8], DIM [49], TIM [9] and SINI [25], the full update formulation is as follows:

    x^{adv}_0 = x,    (7)

    x^{nes}_t = x^{adv}_t + γ · μ · g_t,    (8)

    g = W ∗ (1/m) Σ_{i=0}^{m−1} ∇_x J(f(T(S_i(x^{nes}_t), p)), y),    (9)

    g_{t+1} = μ · g_t + g / ‖g‖_1,    (10)

    x^{adv}_{t+1} = Π_{x+S}(x^{adv}_t + γ · g_{t+1}),    (11)

where μ is the decay factor in MIFGSM, W is the Gaussian kernel with kernel size k used in TIM, ∗ denotes the convolution operator, T(·, p) denotes the random transformation used in DIM, applied with probability p, S_i(x) = x/2^i denotes the scaled image used in SINI, and m denotes the number of scale copies.
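To make the role of γ concrete, the sketch below drops the γ-scaled step of Eqs. (7)-(11) into a momentum-style loop. The DIM transformation and TIM kernel are left as injectable hooks (`transform`, `kernel`) because their exact implementations are method-specific; all names and defaults here are illustrative assumptions rather than the authors' code.

```python
# Sketch of the APAA_f update (Eqs. 7-11) built on a momentum loop.
# `transform` (DIM-style random resize/pad) and `kernel` (TIM Gaussian kernel,
# shape [C, 1, k, k]) are optional hooks; passing None reduces this to MIFGSM
# with the sign step replaced by a gamma-scaled gradient step.
import torch
import torch.nn.functional as F

def apaa_f(model, x, y, gamma, eps, mu=1.0, steps=10, m=1,
           transform=None, kernel=None):
    x_adv = x.clone().detach()
    g_mom = torch.zeros_like(x)
    for _ in range(steps):
        x_nes = (x_adv + gamma * mu * g_mom).detach().requires_grad_(True)  # Eq. (8)
        grad = torch.zeros_like(x)
        for i in range(m):                                                  # SINI scale copies
            xs = x_nes / (2 ** i)
            xs = transform(xs) if transform is not None else xs             # DIM hook
            loss = F.cross_entropy(model(xs), y)
            grad = grad + torch.autograd.grad(loss, x_nes, retain_graph=True)[0]
        grad = grad / m
        if kernel is not None:                                              # TIM smoothing
            grad = F.conv2d(grad, kernel, padding=kernel.shape[-1] // 2,
                            groups=grad.shape[1])
        g_mom = mu * g_mom + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)  # Eq. (10)
        x_adv = x_adv.detach() + gamma * g_mom                              # Eq. (11), no sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv
```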

3.2.2 Adaptive Scaling Factor

Considering that suitable scaling factors vary across different images, we further propose a generator that directly produces an adaptive scaling factor in each step of adversarial example generation, which also avoids manual hyperparameter searching.

The overview of our proposed adaptive scaling factor generator is shown in Fig. 2. We employ an iterative method with T steps to generate the adversarial example. In each step t, the gradient of the adversarial example generated in the previous step is calculated through a randomly selected white-box model C_x. The corresponding scaling factor is produced by the generator G_t, which takes the adversarial example together with the gradient information as input. Then the gradient information and the scaling factor are used to generate the new adversarial perturbation. We use the BIM-style update (x^{adv}_t = x^{adv}_{t−1} + γ_t · ∇_x J(C_x(x^{adv}_{t−1}), y)) as an example in the figure, but in fact any existing gradient-based attack method can be utilized, e.g., MIFGSM [8], DIM [49], TIM [9] and SINI [25]. The adversarial examples generated in all steps are used to compute the cross-entropy loss on another classifier C_y. The parameters of the generator G_t are then updated by maximizing this cross-entropy loss through back-propagation, so that the generated adversarial examples mislead the classifier. The generators used in each step share the same architecture but do not share parameters. The procedure for training the generator is summarized in Algorithm 1.

It should be mentioned that our proposed generator needs at least two white-box models for training. Through experiments, we find that when training the generator with only one classifier model, i.e., n = 1 and C_x = C_y, the generated scaling factors grow as large as possible. Although a higher white-box attack success rate can be achieved in this case, the transferability of the generated adversarial examples drops dramatically. To maintain the transferability of adversarial examples, we use different classifier models for calculating the gradient information and for updating the parameters of the generator G_t, so that the generated scaling factor does not overfit to a specific model.

As shown in Fig. 1, compared to sign-based methods, our proposed APAA can adaptively adjust the step size in each attack step. A large step size can be used in the first few steps of the iterative attack to make full use of the perturbation budget, and the step size can be reduced adaptively when approaching the global optimum. With more accurate update directions, our APAA reaches the global optimum with fewer update steps and smaller perturbations, which further improves the transferability of adversarial examples and thus achieves higher attack success rates in black-box attacks.

Algorithm 1 The training of the adaptive scaling factor generator

Input: the total number of training steps N
Input: the number of attack iterations T
Input: the learning rate β
Input: the dataset (X, Y)
Input: the white-box models C_1, ..., C_n
Output: the scaling factor generators G_1, G_2, ..., G_T with parameters θ

1: for i ∈ {1, ..., N} do
2:     Sample an image and label pair (x, y) from (X, Y)
3:     Randomly choose two classifiers C_x and C_y from C_1, ..., C_n
4:     x^{adv}_0 = x
5:     for t ∈ {1, ..., T} do
6:         grad_t = ∇_x J(C_x(x^{adv}_{t−1}), y)
7:         γ_t = G_t(x^{adv}_{t−1}, grad_t)
8:         x^{adv}_t = Π_{x+S}(γ_t · grad_t + x^{adv}_{t−1})    ▷ using the update in BIM as an example
9:         loss_t = J(C_y(x^{adv}_t), y)
10:        G_t(θ) = G_t(θ) + β · ∇_θ(loss_t)
11:    end for
12: end for
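A compact PyTorch rendering of this training loop is sketched below. It is an illustrative reading of Algorithm 1, not the authors' implementation: the optimizer choice (plain gradient ascent is written as Adam on the negated loss), the data loader, and the generator interface `g_net(x_adv, grad)` are all assumptions.

```python
# Sketch of Algorithm 1: training one scaling-factor generator per attack step.
import random
import torch
import torch.nn.functional as F

def train_generators(generators, classifiers, loader, eps=8/255, lr=1e-4, epochs=1):
    opts = [torch.optim.Adam(g.parameters(), lr=lr) for g in generators]
    for _ in range(epochs):
        for x, y in loader:
            cx, cy = random.sample(classifiers, 2)     # gradient model vs. loss model
            x_adv = x.clone()
            for g_net, opt in zip(generators, opts):   # one generator per attack step
                x_adv = x_adv.detach().requires_grad_(True)
                grad = torch.autograd.grad(
                    F.cross_entropy(cx(x_adv), y), x_adv)[0]
                gamma = g_net(x_adv, grad).view(-1, 1, 1, 1)   # adaptive scaling factor
                x_new = x_adv + gamma * grad                   # BIM-style update as an example
                x_new = torch.clamp(torch.min(torch.max(x_new, x - eps), x + eps), 0, 1)
                loss = F.cross_entropy(cy(x_new), y)
                opt.zero_grad()
                (-loss).backward()                     # ascend the cross-entropy loss of C_y
                opt.step()
                x_adv = x_new
    return generators
```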

Table 1. The models used in the experiments on CIFAR10 and ImageNet.

No. | CIFAR10 | ImageNet
Normally trained models:
1 | ResNet-18 [14] | Inceptionv3 [41]
2 | DenseNet-121 [17] | Inceptionv4 [40]
3 | SENet-18 [16] | InceptionResNetv2 [40]
4 | RegNetX-200mf [34] | ResNetv2-101 [15]
5 | DLA [51] | ResNetv2-152 [15]
6 | DPN-92 [4] | MobileNetv2 1.0 [38]
7 | Shake-ResNet-26 2x64d [12] | MobileNetv2 1.4 [38]
8 | CbamResNeXt [46] | —
Defense models:
D1 | Adv DenseNet-121 | Adv Inceptionv3 [43]
D2 | Adv GoogLeNet | Ens3 Inceptionv3 [43]
D3 | Adv ResNet-18 ll | Ens4 Inceptionv3 [43]
D4 | k-WTA [47] | Ens InceptionResNetv2 [43]
D5 | ADP [32] | R&P [48]
D6 | — | NIPS-r3


4. Experiments

We first introduce the experimental settings in Sec. 4.1. Then we demonstrate the effectiveness of our proposed APAA with a fixed scaling factor in Sec. 4.2. Finally, we conduct several experiments to show the superiority of the adaptive scaling factor generator in Sec. 4.3. We denote APAA with the fixed scaling factor as APAAf and APAA with the adaptive scaling factor generator as APAAa.

Figure 2. The overview of the scaling factor generator. In each step t, the gradient of one randomly selected model C_x is first calculated. Simultaneously, the adversarial example from the previous step, together with the gradient information, is fed into the generator G_t to produce the scaling factor γ_t. Then the scaling factor and the gradient information are used to generate the adversarial perturbation. The adversarial examples generated in all steps are fed into another classifier C_y to optimize the parameters of the generator G_t by maximizing the cross-entropy loss between the output classification probability and the ground-truth label.

4.1. Experiment Setting

Datasets. We use the ImageNet [36] and CIFAR10 [20] datasets to conduct the experiments. CIFAR10 has 50000 training images and 10000 test images. For ImageNet, we use the 1000 images from the NIPS 2017 Adversarial Attack Competition (https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack/) to evaluate the attack success rate of adversarial examples. The maximum perturbation budgets on CIFAR10 and ImageNet are 0.0314 and 0.0628 under the L∞ norm, respectively.

Evaluation Models. We use both normally trained models and defense models to evaluate all attack methods. For CIFAR10, we use a total of 8 normally trained models and 6 defense models for comprehensive evaluations. For ImageNet, we use a total of 7 normally trained models and 7 defense models. The details of these models are shown in Tab. 1. The normally trained models cover almost all mainstream model architectures. The defense models include adversarial training [29, 43], which is the most effective defense method nowadays, and some other defense technologies, including k-WTA [47], GBZ [22], ADP [32], R&P [48], NIPS-r3 (https://github.com/anlthms/nips-2017/tree/master/mmd) and CERTIFY [7]. The D1 and D2 models used for CIFAR10 are adversarially trained with FGSM, and D3 is adversarially trained with Least-Likely [21]. The D1 to D4 models used for ImageNet are adversarially trained with ensemble adversarial training [43].

Baselines. We use MIFGSM [8], DIM [49], TIM [9] and SINI [25] as baselines to compare with our proposed method. Since these methods can be integrated with each other, we combine each of them with the earlier methods in the chronological order they were proposed, i.e., MIFGSM is integrated into DIM; MIFGSM and DIM into TIM; and MIFGSM, DIM and TIM into SINI. The hyperparameters of these methods are provided in Tab. 7 in the appendix. The number of iterations T in the generation of adversarial examples for all methods, including ours, is 10 unless mentioned otherwise. We clip the adversarial examples to the range of a normal image (i.e., 0-1) and constrain the adversarial perturbations within the perturbation budget under the L∞ norm in each step of the iteration.

Details. We train the adaptive scaling factor generator on the training set of CIFAR10. Training is quite fast: it takes about 2-3 hours on a single GTX 1080Ti GPU for 3-4 epochs. The architecture of the generator is shown in Tab. 8 in the appendix.

4.2. Attack with Fixed Scaling Factor

We conduct comprehensive experiments on CIFAR10 and ImageNet to demonstrate the effectiveness of our proposed APAAf, i.e., replacing the sign normalization with a fixed scaling factor. The value of the scaling factor in these experiments is determined by manual hyperparameter search within a certain range.


Table 2. The attack success rates of the adversarial examples against 13 models on CIFAR10 under the untargeted attack setting. The model indices are the same as in Tab. 1. MAD denotes the mean absolute distance between the original and adversarial examples. The scaling factor used is 8. ∗ indicates the white-box models being attacked.

(a) The adversarial examples are generated on model No. 1.

Method | 1∗ 2 3 4 5 6 7 8 | D1 D2 D3 D4 D5 | MAD
MIFGSM | 99.70 90.49 94.02 85.30 92.37 79.92 88.82 87.60 | 16.58 37.43 14.58 81.52 82.00 | 0.0265
DIM | 99.54 93.94 96.12 89.86 95.23 85.73 92.60 92.10 | 16.43 39.56 15.09 86.49 87.81 | 0.0234
TIM | 99.59 93.81 96.13 90.40 94.94 86.01 92.65 92.54 | 16.78 40.14 12.76 86.32 87.62 | 0.0234
TIM w/ APAAf | 99.77 94.87 97.16 91.68 96.49 86.70 94.05 93.65 | 21.52 48.72 20.18 87.35 88.90 | 0.0208
SINI | 99.66 95.74 97.45 92.38 96.57 88.16 94.47 94.29 | 18.85 44.53 16.83 89.25 90.35 | 0.0234
SINI w/ APAAf | 99.84 96.27 97.98 93.12 97.31 89.08 95.24 95.48 | 26.34 56.42 23.26 89.43 91.11 | 0.0199

(b) The adversarial examples are generated on the ensembled models of No. 1 and D2.

Method | 1∗ 2 3 4 5 6 7 8 | D1 D2∗ D3 D4 D5 | MAD
MIFGSM | 97.79 93.81 94.42 91.45 94.29 88.02 91.99 93.50 | 51.07 100 27.57 85.64 88.30 | 0.0213
DIM | 97.52 94.31 94.96 91.98 94.78 89.02 92.34 93.85 | 44.47 99.74 27.54 86.99 89.73 | 0.0222
TIM | 97.51 94.03 94.83 92.20 94.40 89.25 92.39 93.87 | 45.23 99.74 27.50 87.43 89.53 | 0.0222
TIM w/ APAAf | 98.34 95.57 96.11 93.75 95.87 90.79 93.89 95.35 | 48.90 99.91 30.41 88.62 91.10 | 0.0196
SINI | 98.95 97.16 97.30 95.35 97.33 92.79 95.47 97.08 | 49.43 97.46 30.08 91.81 94.10 | 0.0227
SINI w/ APAAf | 99.46 98.18 98.52 96.65 98.39 94.50 96.93 98.18 | 54.67 98.74 33.64 93.02 95.28 | 0.0202

Table 3. The attack success rates of the adversarial examples against 13 models on ImageNet under the untargeted attack setting. The scaling factor used is 0.4, except for 0.9 used in TIM. ∗ indicates the white-box models being attacked.

(a) The adversarial examples are generated on model No. 1.

Method | 1∗ 2 3 4 5 6 7 | D1 D2 D3 D4 D5 D6 | MAD
MIFGSM | 100 52.0 49.4 37.9 39.2 65.3 58.4 | 26.0 21.6 19.9 10.6 9.6 14.2 | 0.0404
MIFGSM w/ APAAf | 100 52.8 51.6 37.7 38.5 66.8 59.7 | 27.6 20.8 20.1 10.9 10.6 15.1 | 0.0346
DIM | 99.8 79.6 76.1 60.9 61.0 82.1 79.0 | 39.0 35.0 33.8 18.4 20.2 28.9 | 0.0405
DIM w/ APAAf | 100 79.8 77.6 61.5 61.6 82.7 81.3 | 49.8 39.7 37.9 21.0 23.8 32.3 | 0.0303
TIM | 99.4 67.0 54.6 63.0 61.9 79.2 76.6 | 56.8 55.0 53.8 40.9 42.8 47.8 | 0.0403
TIM w/ APAAf | 99.9 72.1 59.4 65.8 66.6 82.8 80.0 | 64.1 59.3 60.2 45.1 45.4 52.1 | 0.0406
SINI | 100 85.0 81.6 77.6 76.2 86.7 87.1 | 65.2 64.4 61.0 44.1 47.3 55.3 | 0.0406
SINI w/ APAAf | 100 84.0 80.1 77.1 76.3 89.1 88.0 | 73.8 67.3 64.9 51.2 52.1 56.6 | 0.0304

(b) The adversarial examples are generated on the ensembled models of No. 1, 2, 3 and 5.

Method | 1∗ 2∗ 3∗ 4 5∗ 6 7 | D1 D2 D3 D4 D5 D6 | MAD
MIFGSM | 99.7 99.9 99.7 86.7 98.7 85.4 85.8 | 50.9 53.5 52.2 35.8 36.4 47.8 | 0.0399
MIFGSM w/ APAAf | 99.9 99.9 99.9 87.5 98.7 86.3 87.6 | 62.8 58.8 58.7 40.1 41.8 52.2 | 0.0311
DIM | 99.8 99.5 99.0 93.6 97.9 94.7 95.9 | 74.3 80.9 77.5 60.5 68.7 78.0 | 0.0407
DIM w/ APAAf | 99.6 99.7 99.5 94.7 98.4 95.3 95.8 | 86.4 84.3 81.0 68.1 74.1 81.7 | 0.0309
TIM | 98.7 98.5 96.8 90.2 96.5 89.8 91.4 | 84.6 85.8 85.1 81.1 81.4 83.1 | 0.0408
TIM w/ APAAf | 99.3 99.5 99.1 96.2 98.7 92.3 94.3 | 91.8 92.4 92.1 87.6 88.8 90.3 | 0.0405
SINI | 99.9 99.4 99.5 97.1 99.0 94.6 96.0 | 90.7 90.2 88.6 81.2 82.3 86.5 | 0.0406
SINI w/ APAAf | 99.9 99.6 99.4 97.5 99.5 95.3 96.7 | 92.9 91.5 90.1 83.0 84.0 87.3 | 0.0358

The untargeted attack. The results for adversarial examples generated on CIFAR10 and ImageNet under the untargeted attack setting are shown in Tab. 2 and Tab. 3, respectively. For both single-model and ensemble-model attacks, combining our scaling factor with the state-of-the-art methods improves the attack success rates of the generated adversarial examples. In addition, under the same L∞ perturbation budget constraint, the mean absolute distances between the original images and the adversarial examples generated by our APAAf are smaller than those of the existing methods. Due to the accurate gradient directions used in our proposed method, our attack is more effective, achieving higher attack success rates with fewer perturbations.


Table 4. The attack success rates of the adversarial examples against 13 models on CIFAR10 under the targeted attack setting. The scaling factor used is 8. ∗ indicates the white-box models being attacked.

Attack Model | Method | 1 2 3 4 5 6 7 8 | D1 D2 D3 D4 D5
1 | SINI | 86.83∗ 59.74 64.88 51.63 62.56 45.82 59.87 56.59 | 3.68 14.38 3.02 45.82 47.77
1 | APAAf | 90.14∗ 64.03 70.53 56.22 67.88 49.09 64.74 61.65 | 5.30 19.58 4.62 49.36 51.76
1, 2 | SINI | 83.88∗ 88.24∗ 73.37 65.32 71.88 60.23 67.76 68.70 | 5.63 21.78 4.09 54.29 58.36
1, 2 | APAAf | 88.76∗ 92.09∗ 79.29 71.40 77.53 65.25 73.71 74.89 | 7.31 26.71 5.44 59.01 64.05
1, 2, 3, 4 | SINI | 86.67∗ 88.27∗ 90.28∗ 81.07∗ 80.99 71.22 77.64 78.06 | 8.41 32.88 5.25 65.31 69.35
1, 2, 3, 4 | APAAf | 90.95∗ 92.38∗ 93.69∗ 86.96∗ 86.36 77.51 83.37 84.26 | 10.96 39.56 7.06 71.55 75.67
1, 3, D1, D2 | SINI | 87.32∗ 82.26 91.10∗ 74.86 81.69 70.04 79.47 80.05 | 60.54∗ 84.98∗ 9.52 65.03 69.79
1, 3, D1, D2 | APAAf | 92.01∗ 87.19 94.54∗ 81.01 87.07 75.08 85.27 85.80 | 63.31∗ 90.58∗ 11.15 70.07 75.94

Figure 3. The attack success rate vs. perturbation budget curves on CIFAR10: (a) white-box model 1; (b) white-box models 1 and D2; (c) white-box models 1, 3, D1 and D2. The curves with dotted lines are the results of SINI, and those with solid lines are the results of our method. The two rows show the results against normally trained models and defense models, respectively. The model indices are the same as in Tab. 1. ∗ indicates the white-box models being attacked. The scaling factors used in our method for perturbation budgets of 2, 4 and 8 are 5, 7 and 8, respectively.


The targeted attack. The results for adversarial examples generated on CIFAR10 under the targeted attack setting are shown in Tab. 4. The target label of each image is randomly chosen among the 9 wrong labels. The average black-box attack success rates of the adversarial examples generated by APAAf against normally trained models and defense models are about 5% and 3% higher, respectively, than those of SINI under various settings of the target model. This verifies that our proposed scaling factor method is also effective under the targeted attack setting.

The size of perturbation. We plot the attack success rate vs. perturbation budget curves on CIFAR10 in Fig. 3. Our proposed scaling factor improves the attack success rates across various perturbation budgets, which shows the good generalization of our method.

The number of iteration steps. From Tab. 5, we can see that even when we reduce the number of iterations used in our APAAf from 10 to 8, we still achieve higher attack success rates than the state-of-the-art method with 10 steps. This confirms our view that accurate gradient directions enable a successful attack with fewer attack steps.

4.3. Attack with Adaptive Scaling Factor

We further demonstrate the effectiveness of our proposed adaptive scaling factor generator. The process of generating adversarial examples with APAAa is summarized in Algorithm 2 in the appendix. For fair comparisons, we consider all the models used during the training of the generator (C_1, ..., C_n) as white-box models, and conduct both the SINI attack and our method on the ensemble of all white-box models.


Table 5. The attack success rates of the adversarial examples against 13 models on CIFAR10 under the untargeted attack setting with different numbers of iterations. The scaling factors used for 8 steps and 10 steps are 10 and 8, respectively. ∗ indicates the white-box models being attacked.

Attack Model | Method | 1 2 3 4 5 6 7 8 | D1 D2 D3 D4 D5
1 | SINI 8 steps | 99.14∗ 93.85 95.92 90.15 94.85 86.23 92.36 92.21 | 23.71 42.85 16.00 86.51 87.98
1 | SINI 10 steps | 99.66∗ 95.74 97.45 92.38 96.57 88.16 94.47 94.29 | 18.85 44.53 16.83 89.25 90.35
1 | APAAf 8 steps | 99.81∗ 96.40 98.13 93.18 97.33 88.69 95.18 95.40 | 22.11 50.14 20.74 89.76 91.04
1 | APAAf 10 steps | 99.84∗ 96.27 97.98 93.12 97.31 89.08 95.24 95.48 | 26.34 56.42 23.26 89.43 91.11
1, 2 | SINI 8 steps | 98.41∗ 98.89∗ 96.77 94.04 96.25 91.66 94.21 94.68 | 20.45 53.84 19.88 90.35 92.06
1, 2 | SINI 10 steps | 99.31∗ 99.56∗ 98.27 96.41 97.93 94.43 96.33 96.87 | 23.94 55.91 20.75 92.83 94.44
1, 2 | APAAf 8 steps | 99.53∗ 99.70∗ 98.70 96.73 98.16 94.52 97.03 97.54 | 31.54 64.71 26.78 93.09 94.62
1, 2 | APAAf 10 steps | 99.70∗ 99.87∗ 99.15 97.93 98.83 95.59 97.78 98.22 | 32.54 65.29 27.24 93.73 95.28
1, 2, 3, 4 | SINI 8 steps | 98.60∗ 98.74∗ 99.26∗ 97.57∗ 97.64 95.19 96.25 96.85 | 27.76 64.41 22.18 93.55 94.78
1, 2, 3, 4 | SINI 10 steps | 99.28∗ 99.49∗ 99.64∗ 98.70∗ 98.53 96.95 97.65 98.06 | 29.09 65.85 23.42 95.60 96.53
1, 2, 3, 4 | APAAf 8 steps | 99.54∗ 99.66∗ 99.76∗ 99.10∗ 99.05 97.43 98.26 98.68 | 38.29 75.63 31.05 96.33 97.09
1, 2, 3, 4 | APAAf 10 steps | 99.63∗ 99.77∗ 99.79∗ 99.33∗ 99.28 97.93 98.51 98.88 | 40.46 76.27 32.22 96.95 97.99
1, 3, D1, D2 | SINI 8 steps | 98.16∗ 97.25 99.09∗ 95.36 97.08 93.89 95.85 96.66 | 85.67∗ 94.08∗ 36.93 92.69 94.61
1, 3, D1, D2 | SINI 10 steps | 98.83∗ 98.16 99.50∗ 96.88 98.11 95.53 97.07 97.82 | 89.69∗ 95.78∗ 38.79 94.58 96.07
1, 3, D1, D2 | APAAf 8 steps | 99.19∗ 98.63 99.58∗ 97.38 98.53 96.16 97.54 98.29 | 89.50∗ 97.08∗ 41.60 94.97 96.64
1, 3, D1, D2 | APAAf 10 steps | 99.51∗ 99.03 99.75∗ 98.11 98.97 96.93 98.22 98.84 | 92.52∗ 97.76∗ 41.55 96.13 97.50

Table 6. Comparison of the attack success rates of the adversarial examples generated by SINI, our method with a fixed scaling factor (APAAf), and our method with an adaptive scaling factor (APAAa) against 13 models on CIFAR10 under the untargeted attack setting. ∗ indicates the white-box models being attacked.

(a) The adversarial examples are generated on the ensembled models of No. 1 and D2.

Method | 1∗ 2 3 4 5 6 7 8 | D1 D2∗ D3 D4 D5 | MAD
SINI | 98.95 97.16 97.30 95.35 97.33 92.79 95.47 97.08 | 49.43 97.46 30.08 91.81 94.10 | 0.0227
SINI w/ APAAf | 99.46 98.18 98.52 96.65 98.39 94.50 96.93 98.18 | 54.67 98.74 33.64 93.02 95.28 | 0.0202
SINI w/ APAAa | 99.63 98.37 98.70 96.89 98.49 94.75 97.34 98.28 | 54.68 98.97 33.68 93.40 95.44 | 0.0202

(b) The adversarial examples are generated on the ensembled models of No. 1, 3, D1 and D2.

Method | 1∗ 2 3∗ 4 5 6 7 8 | D1∗ D2∗ D3 D4 D5 | MAD
SINI | 98.83 98.16 99.50 96.88 98.11 95.53 97.07 97.82 | 89.69 95.78 38.79 94.58 96.07 | 0.0231
SINI w/ APAAf | 99.51 99.03 99.75 98.11 98.97 96.93 98.22 98.84 | 92.52 97.76 41.55 96.13 97.50 | 0.0204
SINI w/ APAAa | 99.51 99.07 99.75 98.00 99.08 96.84 98.38 98.86 | 92.52 97.85 45.87 95.82 97.13 | 0.0192

As shown in Tab. 6, using the scaling factors produced by the generator proposed in Sec. 3.2.2 achieves comparable or even higher attack success rates, which saves the trouble of manually searching for the best scaling factor hyperparameter. The mean scaling factors generated in Tab. 6a and Tab. 6b are 8.41 and 6.95, respectively, which are close to the best manually selected value. Moreover, the standard deviations are 1.74 and 2.01, respectively, demonstrating that the generator adaptively produces scaling factors with a certain diversity across steps according to the characteristics of the images.

5. Limitation and Social Impact

Limitation. Extensive experiments show that the exact gradient direction is more promising than the direction normalized by the sign function; however, this is not yet justified by theoretical analysis. Even so, our method is simple yet effective for adversarial attacks, and we will pursue a theoretical analysis in future work.

Possible negative social impact. While adversarial attacks pose a potential threat to existing deep models, the ultimate goal of our research is to promote the robustness of models. For example, it is feasible to incorporate our attack method into the framework of adversarial training, resulting in more reliable models that defend against various unknown attacks, which can thus promote the development of safer deep learning applications.

6. Conclusion

In this work, we propose to use a scaling factor instead of the sign function to normalize the gradient of the input example when conducting adversarial attacks, which yields a more accurate gradient direction and thus improves the transferability of adversarial examples. The scaling factor can either be elaborately selected manually, or adaptively produced by a generator according to different image characteristics. Extensive experiments on CIFAR10 and ImageNet show the superiority of our proposed methods, which improve black-box attack success rates on both normally trained models and defense models with fewer update steps and smaller perturbation budgets.


References

[1] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
[2] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In SP, pages 39–57, 2017.
[3] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In AISec@CCS, pages 15–26, 2017.
[4] Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, and Jiashi Feng. Dual path networks. In NeurIPS, pages 4467–4475, 2017.
[5] Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
[6] Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In NeurIPS, pages 10932–10942, 2019.
[7] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In ICML, pages 1310–1320, 2019.
[8] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, pages 9185–9193, 2018.
[9] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In CVPR, pages 4312–4321, 2019.
[10] Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, and Jun Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In CVPR, pages 7714–7722, 2019.
[11] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M. Roy. A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853, 2016.
[12] Xavier Gastaldi. Shake-shake regularization. arXiv preprint arXiv:1705.07485, 2017.
[13] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, pages 630–645, 2016.
[16] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, pages 7132–7141, 2018.
[17] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, pages 4700–4708, 2017.
[18] Jinyuan Jia, Xiaoyu Cao, Binghui Wang, and Neil Zhenqiang Gong. Certified robustness for top-k predictions against adversarial perturbations via randomized smoothing. In ICLR, 2020.
[19] Xiaojun Jia, Xingxing Wei, Xiaochun Cao, and Hassan Foroosh. ComDefend: An efficient image compression model to defend adversarial examples. In CVPR, pages 6084–6092, 2019.
[20] Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.
[21] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR, 2017.
[22] Yingzhen Li, John Bradshaw, and Yash Sharma. Are generative classifiers more robust to adversarial attacks? In ICML, pages 3804–3814, 2019.
[23] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In ICML, pages 3866–3876, 2019.
[24] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, pages 1778–1787, 2018.
[25] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E. Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In ICLR, 2020.
[26] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In ECCV, pages 369–385, 2018.
[27] Zihao Liu, Qi Liu, Tao Liu, Nuo Xu, Xue Lin, Yanzhi Wang, and Wujie Wen. Feature distillation: DNN-oriented JPEG compression against adversarial examples. In CVPR, pages 860–868, 2019.
[28] Chen Ma, Li Chen, and Jun-Hai Yong. Simulating unknown target models for query-efficient black-box attacks. In CVPR, pages 11835–11844, 2021.
[29] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[30] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, pages 2574–2582, 2016.
[31] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). 1983.
[32] Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In ICML, pages 4970–4979, 2019.
[33] Tianyu Pang, Xiao Yang, Yinpeng Dong, Taufik Xu, Jun Zhu, and Hang Su. Boosting adversarial training with hypersphere embedding. In NeurIPS, 2020.
[34] Ilija Radosavovic, Raj Prateek Kosaraju, Ross B. Girshick, Kaiming He, and Piotr Dollar. Designing network design spaces. In CVPR, pages 10428–10436, 2020.
[35] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In ICLR, 2018.
[36] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
[37] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In ICLR, 2018.
[38] Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, pages 4510–4520, 2018.
[39] Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, and John E. Hopcroft. Robust local features for improving the generalization of adversarial training. In ICLR, 2020.
[40] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, pages 4278–4284, 2017.
[41] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception architecture for computer vision. In CVPR, pages 2818–2826, 2016.
[42] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[43] Florian Tramer, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
[44] Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. In ICLR, 2020.
[45] Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In NeurIPS, pages 8400–8409, 2018.
[46] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. CBAM: Convolutional block attention module. In ECCV, pages 3–19, 2018.
[47] Chang Xiao, Peilin Zhong, and Changxi Zheng. Enhancing adversarial defense by k-winners-take-all. In ICLR, 2020.
[48] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan L. Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.
[49] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. Improving transferability of adversarial examples with input diversity. In CVPR, pages 2730–2739, 2019.
[50] Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, and Hai Li. DVERGE: Diversifying vulnerabilities for enhanced robust generation of ensembles. In NeurIPS, 2020.
[51] Fisher Yu, Dequan Wang, Evan Shelhamer, and Trevor Darrell. Deep layer aggregation. In CVPR, pages 2403–2412, 2018.


Appendix

A. Details in Experiments

A.1. Hyperparameters in Baselines

The hyperparameters used in the baselines are provided in Tab. 7. For ImageNet, we use the hyperparameters reported in the original papers. For CIFAR10, we find that the settings used for ImageNet are ineffective and determine the optimal values by hyperparameter search.

Table 7. The hyperparameters used in the baseline methods on the ImageNet and CIFAR10 datasets.

Method | ImageNet | CIFAR10
MIFGSM | µ = 1.0 | µ = 1.0
DIM | µ = 1.0, p = 0.7 | µ = 1.0, p = 0.7
TIM | µ = 1.0, p = 0.7, k = 15 | µ = 1.0, p = 0.7, k = 3
SINI | µ = 1.0, p = 0.5, k = 7, m = 5 | µ = 1.0, p = 0.7, k = 3, m = 2

A.2. The Architecture of the Generator

The architecture of the scaling factor generator is shown in Tab. 8.

Table 8. The architecture of the adaptive scaling factor generator. The k, s and p in the Conv2d layers denote the kernel size, stride and padding, respectively.

No. | Layer | Input Shape | Output Shape
1 | Conv2d(k=3, s=2, p=1) | [32,32,6] | [16,16,32]
2 | InstanceNorm2d | [16,16,32] | [16,16,32]
3 | Conv2d(k=3, s=2, p=1) | [16,16,32] | [8,8,32]
4 | InstanceNorm2d | [8,8,32] | [8,8,32]
5 | Conv2d(k=3, s=2, p=1) | [8,8,32] | [4,4,32]
6 | InstanceNorm2d | [4,4,32] | [4,4,32]
7 | Flatten | [4,4,32] | [512]
8 | Linear | [512] | [128]
9 | Linear | [128] | [1]
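The layer list in Tab. 8 translates directly into a small PyTorch module; the sketch below only mirrors the listed layers. The table specifies no activation functions or output scaling, so any such additions would be assumptions, and the channel-last shapes in the table correspond to NCHW tensors in the code.

```python
# PyTorch rendering of the generator in Table 8: input is the 3-channel adversarial
# image concatenated with its 3-channel gradient (CIFAR10-sized, 32x32).
import torch
import torch.nn as nn

class ScalingFactorGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),   # [6,32,32] -> [32,16,16]
            nn.InstanceNorm2d(32),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # -> [32,8,8]
            nn.InstanceNorm2d(32),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # -> [32,4,4]
            nn.InstanceNorm2d(32),
        )
        self.head = nn.Sequential(
            nn.Flatten(),        # 32 * 4 * 4 = 512
            nn.Linear(512, 128),
            nn.Linear(128, 1),   # one scaling factor per image
        )

    def forward(self, x_adv, grad):
        return self.head(self.features(torch.cat([x_adv, grad], dim=1)))
```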

B. Algorithm

The process of generating adversarial examples with the adaptive scaling factor generator is summarized in Algorithm 2.

Algorithm 2 Generating adversarial examples with the adaptive scaling factor generator

Input: the original image x and the corresponding label y
Input: the number of attack iterations T
Input: the scaling factor generators G_1, G_2, ..., G_T
Input: the ensembled model C, which contains all white-box classifier models C_1, ..., C_n
Output: the adversarial example x^{adv}_T

1: x^{adv}_0 = x
2: for t ∈ {1, ..., T} do
3:     grad_t = ∇_x J(C(x^{adv}_{t−1}), y)
4:     γ_t = G_t(x^{adv}_{t−1}, grad_t)
5:     x^{adv}_t = Π_{x+S}(γ_t · grad_t + x^{adv}_{t−1})    ▷ using the update in BIM as an example
6: end for
7: return x^{adv}_T
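For reference, Algorithm 2 can be sketched as follows, assuming `ensemble` is a callable that fuses the logits of all white-box classifiers and that the generators follow the `g_net(x_adv, grad)` interface suggested by Tab. 8; these names are illustrative and not the authors' code.

```python
# Sketch of Algorithm 2: inference with the trained scaling-factor generators.
import torch
import torch.nn.functional as F

def apaa_a_attack(generators, ensemble, x, y, eps=8/255):
    x_adv = x.clone()
    for g_net in generators:                                   # one generator per step
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(ensemble(x_adv), y), x_adv)[0]
        gamma = g_net(x_adv, grad).view(-1, 1, 1, 1)           # adaptive scaling factor
        x_adv = x_adv.detach() + gamma * grad                  # BIM-style update as an example
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```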
