This fork tries to apply the untargeted L_inf-norm hard-label RayS attack to the commercial Google Vision API. Below are the results of the experiments done so far.
The Google Vision API doesn't classify images into a fixed set of categories, so we need our own binary decision function that defines what is and isn't a valid adversarial example.
Another problem is that one concept may be represented by multiple labels. For example, the labels "Cat", "Small to medium-sized cats", "Whiskers" and "Felidae" are all close to the category "Cat". Ideally, we would like to eliminate all such similar labels from the classification results.
Because RayS is inherently a binary-search algorithm, the definition of an adversarial example may affect how hard it is to find one.
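To make the link to binary search concrete, here is a minimal sketch of the core step, assuming a hard-label decision function `is_adv` that returns `True` for a valid adversarial example and pixel values in [0, 1]. `boundary_radius` is a hypothetical helper for illustration, not code from this repository, and the actual RayS implementation adds further optimizations on top of this. Every call to `is_adv` costs one API query, and a stricter decision function moves the boundary further out along each direction.

```python
import numpy as np


def boundary_radius(is_adv, x, direction, hi=1.0, tol=1e-3):
    """Binary-search the smallest L_inf radius along `direction` (hypothetical helper).

    is_adv: hard-label decision function; each call is one query.
    direction: sign vector with entries in {-1, +1}, so that
        ||r * direction||_inf == r before clipping to [0, 1].
    Returns (radius, queries); radius is None if even `hi` is not adversarial.
    """
    queries = 1
    if not is_adv(np.clip(x + hi * direction, 0.0, 1.0)):
        return None, queries
    lo = 0.0
    # Invariant: `lo` is assumed non-adversarial and `hi` is adversarial.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        queries += 1
        if is_adv(np.clip(x + mid * direction, 0.0, 1.0)):
            hi = mid
        else:
            lo = mid
    return hi, queries
```

With the toy decision function `lambda z: z.mean() > 0.7`, `x = np.full(4, 0.5)` and an all-ones direction, the search narrows the radius to within `tol` of 0.2 using 11 queries.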
Various decision functions were experimented with:
- strict untargeted attack: any mention of any word from the object's label set in any label returned by the API counts as a failure
- top5 attack: no label corresponding to the original concept may appear in the top-5 results
- top1 attack: only the first label is taken into account
Results so far:
- All labels containing "cat", "felidae" or "whiskers" were forbidden.
  - 3200 queries, L_inf = 0.196
- All labels containing "cat" were forbidden.
  - 3200 queries, L_inf = 0.1757
- Word "cat" forbidden in the top-5 labels.
  - 1600 queries, L_inf = 0.121
- Word "cat" forbidden in the top-1 label.
  - 1600 queries, L_inf = 0.0666
- No mention of "Shark", "Fin", "Water", "Fish", "Carcharhiniformes", "Lamnidae" or "Lamniformes" allowed in the top-5 labels.
  - 1600 queries, L_inf = 0.141
- No mention of "Shark", "Fin", "Water", "Fish", "Carcharhiniformes", "Lamnidae" or "Lamniformes" allowed in the top-1 label.
  - 1600 queries, L_inf = 0.0705
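The three decision functions above can be sketched as a single predicate over the ranked label descriptions. This is a minimal sketch, assuming the descriptions have already been extracted from the API response in ranked order (most confident first); `is_adversarial` and its parameters are hypothetical, and it does not call the Google Vision API.

```python
def is_adversarial(labels, forbidden, top_k=None):
    """Hypothetical hard-label decision for a label-list API.

    labels: label descriptions in ranked order, most confident first.
    forbidden: words describing the original concept, e.g. ["cat"].
    top_k: None checks every label (strict attack), 5 checks the top-5,
        1 checks only the first label.
    Returns True when no considered label contains a forbidden word.
    """
    considered = labels if top_k is None else labels[:top_k]
    lowered = [word.lower() for word in forbidden]
    # Case-insensitive substring match, as in "all labels containing 'cat'".
    return not any(word in label.lower() for label in considered for word in lowered)
```

`top_k=None` corresponds to the strict attack, `top_k=5` to the top5 attack and `top_k=1` to the top1 attack. Plain substring matching is deliberately aggressive: `"cat"` also matches unrelated labels such as "Education", and `"Fin"` matches a label such as "Fine art", which makes the strict variant even harder to satisfy.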
"RayS: A Ray Searching Method for Hard-label Adversarial Attack"
Jinghui Chen, Quanquan Gu
https://arxiv.org/abs/2006.12792
This repository contains our PyTorch implementation of RayS from the paper "RayS: A Ray Searching Method for Hard-label Adversarial Attack" (accepted by KDD 2020).
RayS is a hard-label adversarial attack which only requires the target model's hard-label output (prediction label).
It is gradient-free, hyper-parameter free, and is also independent of adversarial losses such as CrossEntropy or C&W.
Therefore, RayS can be used as a good sanity check for possible "falsely robust" models (models that may overfit to certain types of gradient-based attacks and adversarial losses).
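The hard-label setting can be illustrated with a small wrapper that exposes only the top-1 prediction and counts queries against a budget. `HardLabelOracle` is a hypothetical class for illustration, not part of this repository, and the untargeted criterion (any label other than the true one) is an assumption matching the untargeted attacks discussed here.

```python
import numpy as np


class HardLabelOracle:
    """Hypothetical wrapper: callers only ever see the top-1 label.

    Every prediction counts as one query; a RuntimeError is raised once
    the query budget is exhausted.
    """

    def __init__(self, score_fn, query_limit):
        self.score_fn = score_fn
        self.query_limit = query_limit
        self.queries = 0

    def predict_label(self, x):
        if self.queries >= self.query_limit:
            raise RuntimeError("query limit reached")
        self.queries += 1
        # Only the argmax is returned; scores and gradients stay hidden.
        return int(np.argmax(self.score_fn(x)))

    def is_adversarial(self, x, true_label):
        # Untargeted criterion: any label other than the true one counts.
        return self.predict_label(x) != true_label
```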
RayS also proposed a new model robustness metric, ADBD (average decision boundary distance), which reflects examples' average distance to their closest decision boundary.
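As a minimal sketch, assuming each example already has an L_inf distance to the nearest decision boundary found by the attack (for instance, the norm of its best adversarial perturbation), ADBD is the mean of those distances. The helper names are hypothetical, and how the original code treats examples that are misclassified before any perturbation is not covered here.

```python
import numpy as np


def linf_distances(x, x_adv):
    """Per-example L_inf distance between a clean batch and its adversarial batch."""
    diff = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).reshape(len(x), -1)
    return np.abs(diff).max(axis=1)


def average_decision_boundary_distance(distances):
    """ADBD sketched as the plain mean of per-example boundary distances."""
    return float(np.mean(distances))
```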
We tested the robustness of recently proposed robust models trained on the CIFAR-10 dataset with maximum L_inf-norm perturbation strength epsilon=0.031 (8/255). The robustness is evaluated on the entire CIFAR-10 test set (10000 examples).
Note:
- Ranking is based on the ADBD (average decision boundary distance) metric under the RayS attack with the default query limit of 40000. Reducing the query limit speeds up the evaluation but may lead to inaccurate ADBD values. For a quick check, we recommend evaluating on a subset of the CIFAR-10 test set (e.g., 1000 examples).
- `*` denotes models using extra data for training.
- `Robust Acc (RayS)` is the robust accuracy under the RayS attack for L_inf-norm perturbation strength epsilon=0.031 (8/255). For truly robust models, this value could be larger than the reported value (obtained with white-box attacks) because of the hard-label limitation. For the current best robust accuracy evaluation, please refer to AutoAttack, which uses an ensemble of four white-box/black-box attacks.
- `ADBD` is our proposed Average Decision Boundary Distance metric, which is independent of the perturbation strength epsilon. It reflects overall model robustness through the lens of decision boundary distance and can serve as a complement to the traditional robust accuracy metric. Furthermore, ADBD depends only on the hard-label output and can be adopted in cases where back-propagation or even soft labels are not available.
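Both metrics can be derived from the same per-example boundary distances: robust accuracy thresholds them at epsilon, whereas ADBD averages them. A minimal sketch, assuming misclassified clean examples are assigned distance 0 (a hypothetical convention for this illustration); `robust_accuracy` is not a function from this repository.

```python
import numpy as np


def robust_accuracy(distances, epsilon):
    """Share of examples whose decision boundary distance exceeds epsilon.

    An example is robust only if no adversarial example was found inside
    the L_inf ball of radius epsilon, i.e. distance > epsilon.
    Misclassified clean examples are assumed to carry distance 0.
    """
    d = np.asarray(distances, dtype=float)
    return float(np.mean(d > epsilon))
```

Because a hard-label attack only finds an upper bound on each example's true distance, robust accuracy computed this way can only overestimate the true robust accuracy, which is consistent with the note that `Robust Acc (RayS)` may be larger than the reported value for truly robust models.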
| Method | Natural Acc (%) | Robust Acc (Reported) (%) | Robust Acc (RayS) (%) | ADBD |
|---|---|---|---|---|
| WAR (Wu et al., 2020)* | 85.6 | 59.8 | 63.2 | 0.0480 |
| RST (Carmon et al., 2019)* | 89.7 | 62.5 | 64.6 | 0.0465 |
| HYDRA (Sehwag et al., 2020)* | 89.0 | 57.2 | 62.1 | 0.0450 |
| MART (Wang et al., 2020)* | 87.5 | 65.0 | 62.2 | 0.0439 |
| UAT++ (Alayrac et al., 2019)* | 86.5 | 56.3 | 62.1 | 0.0426 |
| Pretraining (Hendrycks et al., 2019)* | 87.1 | 57.4 | 60.1 | 0.0419 |
| Robust-overfitting (Rice et al., 2020) | 85.3 | 58.0 | 58.6 | 0.0404 |
| TRADES (Zhang et al., 2019b) | 85.4 | 56.4 | 57.3 | 0.0403 |
| Backward Smoothing (Chen et al., 2020) | 85.3 | 54.9 | 55.1 | 0.0403 |
| Adversarial Training (retrained) (Madry et al., 2018) | 87.4 | 50.6 | 54.0 | 0.0377 |
| MMA (Ding et al., 2020) | 84.4 | 47.2 | 47.7 | 0.0345 |
| Adversarial Training (original) (Madry et al., 2018) | 87.1 | 47.0 | 50.7 | 0.0344 |
| Fast Adversarial Training (Wong et al., 2020) | 83.8 | 46.1 | 50.1 | 0.0334 |
| Adv-Interp (Zhang & Xu, 2020) | 91.0 | 68.7 | 46.9 | 0.0305 |
| Feature-Scatter (Zhang & Wang, 2019) | 91.3 | 60.6 | 44.5 | 0.0301 |
| SENSE (Kim & Wang, 2020) | 91.9 | 57.2 | 43.9 | 0.0288 |
Please contact us if you want to add your model to the leaderboard.
Requirements:
- Python
- NumPy
- CUDA
Import the RayS attack with:
```python
from general_torch_model import GeneralTorchModel
from RayS import RayS

# `model` and `args` are defined by the calling script.
torch_model = GeneralTorchModel(model, n_class=10, im_mean=None, im_std=None)
attack = RayS(torch_model, epsilon=args.epsilon)
```
where:
- `torch_model` is the PyTorch model wrapped in `GeneralTorchModel`. For models that expect transformed images (values outside the range [0, 1]), set `im_mean` and `im_std` accordingly, for instance `im_mean=[0.5, 0.5, 0.5]` and `im_std=[0.5, 0.5, 0.5]`.
- `epsilon` is the maximum adversarial perturbation strength.
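The role of `im_mean` and `im_std` can be illustrated with a channel-wise normalization sketch. This assumes, without checking the wrapper's source, that `GeneralTorchModel` applies `(x - im_mean) / im_std` to an NCHW batch before the forward pass, so that the attack itself operates on [0, 1] pixels; `normalize` is a hypothetical helper.

```python
import numpy as np


def normalize(images, im_mean, im_std):
    """Hypothetical channel-wise (x - mean) / std for an NCHW batch in [0, 1]."""
    mean = np.asarray(im_mean, dtype=np.float32).reshape(1, -1, 1, 1)
    std = np.asarray(im_std, dtype=np.float32).reshape(1, -1, 1, 1)
    return (images - mean) / std
```

With `im_std=[0.5, 0.5, 0.5]`, a perturbation of 0.031 in pixel space becomes 0.062 in the normalized input space, so it matters which space `epsilon` refers to.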
To actually run the RayS attack, use:
```python
x_adv, queries, adbd, succ = attack(data, label, query_limit)
```
It returns:
- `x_adv`: the adversarial examples found by RayS,
- `queries`: the number of queries used to find the adversarial examples,
- `adbd`: the average decision boundary distance for each example,
- `succ`: whether each example was successfully attacked.
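A small sketch of how the returned values might be summarized for one batch, assuming they can be converted to NumPy arrays (for PyTorch tensors, e.g. via `.cpu().numpy()`). `summarize` is a hypothetical helper; the repository's own scripts may report different statistics.

```python
import numpy as np


def summarize(queries, adbd, succ):
    """Hypothetical per-batch summary of the values returned by the attack."""
    queries = np.asarray(queries, dtype=float)
    adbd = np.asarray(adbd, dtype=float)
    succ = np.asarray(succ, dtype=bool)
    return {
        "success_rate": float(succ.mean()),
        "mean_adbd": float(adbd.mean()),
        # Average query cost counted over successfully attacked examples only.
        "mean_queries_successful": float(queries[succ].mean()) if succ.any() else float("nan"),
    }
```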
- Sample usage on attacking a robust model:
  - `python3 attack_robust.py --dataset rob_cifar_trades --query 40000 --batch 1000 --epsilon 0.031`
  - You can also use the `--num 1000` argument to limit the number of attacked examples to 1000. The default `num` is 10000 (the whole CIFAR-10 test set).
To evaluate TensorFlow models with the RayS attack:
```python
from general_tf_model import GeneralTFModel
from RayS import RayS

tf_model = GeneralTFModel(model.logits, model.x_input, sess, n_class=10, im_mean=None, im_std=None)
attack = RayS(tf_model, epsilon=args.epsilon)
```
where:
- `model.logits`: the logits tensor returned by the TensorFlow model,
- `model.x_input`: the placeholder for the model input (NHWC format),
- `sess`: the TF session.
The remaining part is the same as evaluating PyTorch models.
- Run attacks on a naturally trained model (Inception):
  - `python3 attack_natural.py --dataset inception --epsilon 0.05`
- Run attacks on a naturally trained model (ResNet):
  - `python3 attack_natural.py --dataset resnet --epsilon 0.05`
- Run attacks on a naturally trained model (CIFAR-10):
  - `python3 attack_natural.py --dataset cifar --epsilon 0.031`
- Run attacks on a naturally trained model (MNIST):
  - `python3 attack_natural.py --dataset mnist --epsilon 0.3`
Please check our paper for technical details and full results.
```bibtex
@inproceedings{chen2020rays,
  title={RayS: A Ray Searching Method for Hard-label Adversarial Attack},
  author={Chen, Jinghui and Gu, Quanquan},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2020}
}
```
If you have any question regarding RayS attack or the ADBD leaderboard above, please contact [email protected], enjoy!