Authors: Pengcheng Li, Jinfeng Yi, Lijun Zhang
ArXiv: 1809.04913
Abstract URL: http://arxiv.org/abs/1809.04913v1
Deep neural networks (DNNs), as popular machine learning models, are found to be
vulnerable to adversarial attacks. Such an attack constructs adversarial examples
by adding small perturbations to the raw input: the perturbed inputs appear
unmodified to human eyes but are misclassified by a well-trained classifier. In this
paper, we focus on the black-box attack setting where attackers have almost no
access to the underlying models. To conduct black-box attack, a popular
approach aims to train a substitute model based on the information queried from
the target DNN. The substitute model can then be attacked using existing
white-box attack approaches, and the generated adversarial examples will be
used to attack the target DNN. Despite its encouraging results, this approach
suffers from poor query efficiency, i.e., attackers usually need to issue a
huge number of queries to collect enough information for training an accurate
substitute model. To this end, we first utilize state-of-the-art white-box
attack methods to generate samples for querying, and then introduce an active
learning strategy to significantly reduce the number of queries needed.
In addition, we propose a diversity criterion to avoid sampling bias. Our
extensive experimental results on MNIST and CIFAR-10 show that the proposed
method can reduce the number of queries by more than $90\%$ while preserving the
attack success rate, and obtains an accurate substitute model whose predictions
agree with the target oracle more than $85\%$ of the time.
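The query-then-train loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: a synthetic nearest-centroid "oracle" stands in for the target DNN, a nearest-centroid model stands in for the substitute, uncertainty is measured by the distance margin between the two closest centroids, and the diversity filter and all thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box "oracle": stands in for the target DNN.
# It returns hard labels only, and each call counts as queries.
CENTERS = np.array([[-2.0, 0.0], [2.0, 0.0]])

def query_oracle(x):
    d = np.linalg.norm(x[:, None, :] - CENTERS[None, :, :], axis=2)
    return d.argmin(axis=1)

class Substitute:
    """Toy nearest-centroid substitute trained on queried (input, label) pairs."""
    def fit(self, X, y):
        self.mu = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self
    def margin(self, X):
        # Distance gap between the two nearest centroids; small = uncertain.
        d = np.sort(np.linalg.norm(X[:, None, :] - self.mu[None, :, :], axis=2), axis=1)
        return d[:, 1] - d[:, 0]
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.mu[None, :, :], axis=2)
        return d.argmin(axis=1)

# Unlabeled candidate pool (in the paper, candidates come from white-box attacks).
pool = rng.normal(scale=1.0, size=(400, 2))
pool[:, 0] += rng.choice([-2.0, 2.0], size=400)

# Seed set: a few random queries to the oracle.
idx = [int(i) for i in rng.choice(len(pool), size=20, replace=False)]
labels = query_oracle(pool[idx])
sub = Substitute().fit(pool[idx], labels)

# Active-learning rounds: query only the most uncertain pool points,
# with a simple diversity filter that skips near-duplicate picks.
for _ in range(5):
    picked = []
    for i in np.argsort(sub.margin(pool)):      # most uncertain first
        i = int(i)
        if i in idx:
            continue
        if all(np.linalg.norm(pool[i] - pool[j]) > 0.5 for j in picked):
            picked.append(i)
        if len(picked) == 5:
            break
    labels = np.concatenate([labels, query_oracle(pool[picked])])
    idx += picked
    sub = sub.fit(pool[idx], labels)

# Measure substitute/oracle agreement on held-out data.
test = rng.normal(scale=1.0, size=(200, 2))
test[:, 0] += rng.choice([-2.0, 2.0], size=200)
agreement = (sub.predict(test) == query_oracle(test)).mean()
print(len(idx), agreement)
```

The point of the sketch is the budget accounting: only `len(idx)` points are ever sent to the oracle, while the uncertainty ranking decides which queries are worth spending and the diversity filter keeps a single uncertain region from absorbing the whole budget.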