Understanding Adversarial Machine Learning Attacks
Adversarial Machine Learning (AML) studies attacks that manipulate a model's input data to deceive it into making incorrect predictions. This presentation covers how adversarial examples are constructed, the attack formulation and distance metrics involved, and concrete attack algorithms, from the optimization-based L-BFGS attack to the Fast Gradient Sign Method, which generates adversarial examples quickly. Understanding these attacks is a prerequisite for improving model robustness and security.
Presentation Transcript
Adversarial Machine Learning (He, Xiaoyi): Attack algorithms
Adversarial Examples: an image correctly classified as "panda", after adding small adversarial noise, is classified as "gibbon" (figure).
Adversarial Examples: a second example in which small adversarial noise changes the predicted class (figure).
Outline: attack formulation; distance metrics; attack algorithms (L-BFGS, Fast Gradient Sign, JSMA, ...).
Attack: given an original input x (e.g., an image classified as "warplane"), find a new input that is similar to x but is classified as a different class t (untargeted or targeted). The attacker is assumed to know the classifier.
How to find adversarial examples: minimize the distance between x and x+δ, subject to x+δ being classified as the target class t and each element of x+δ lying in [0,1] (so that it remains a valid image).
Distance Metrics. Given two images x and x': the L0 distance counts the number of coordinates i such that x_i ≠ x'_i, i.e. the number of pixels that have been changed; the L2 distance is the Euclidean distance; the L∞ distance, max_i |x_i - x'_i|, measures the maximum change to any single coordinate.
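To make the metrics concrete, here is a minimal Python sketch that computes all three norms of a perturbation, assuming the two images are NumPy arrays with values in [0, 1]; the function name perturbation_norms is illustrative.

```python
import numpy as np

def perturbation_norms(x, x_adv):
    """L0, L2 and L-infinity size of the perturbation x_adv - x."""
    delta = (x_adv - x).ravel()
    l0 = int(np.count_nonzero(delta))      # number of coordinates (pixels) changed
    l2 = float(np.linalg.norm(delta))      # Euclidean distance
    linf = float(np.max(np.abs(delta)))    # maximum change to any single coordinate
    return l0, l2, linf
```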
L-BFGS attack. Initial formulation: minimize ||δ|| subject to C(x+δ) = t and x+δ ∈ [0,1]^n. This is relaxed to minimizing c·||δ|| + loss_F(x+δ, t) subject to the box constraint, solved with box-constrained L-BFGS; note that these two are not equivalent optimization problems. SZEGEDY, C., ZAREMBA, W., SUTSKEVER, I., BRUNA, J., ERHAN, D., GOODFELLOW, I., AND FERGUS, R. Intriguing properties of neural networks. ICLR (2014).
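A rough sketch of the relaxed objective, assuming a PyTorch classifier model that returns logits, a batched image x in [0, 1], a target label tensor, a fixed trade-off constant c, and cross-entropy as the surrogate loss; the original attack line-searches c and uses box-constrained L-BFGS, whereas this sketch simply clamps the image and uses PyTorch's built-in LBFGS optimizer.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, target, c=1.0, max_iter=50):
    # delta is the perturbation to optimize; x and target carry a batch dimension
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.LBFGS([delta], max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        x_adv = torch.clamp(x + delta, 0.0, 1.0)   # keep the image valid
        loss = c * delta.norm(p=2) + F.cross_entropy(model(x_adv), target)
        loss.backward()
        return loss

    optimizer.step(closure)
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```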
Fast Gradient Sign: x' = x + ε·sign(∇_x J(θ, x, y)), where ε is chosen to be sufficiently small so that the perturbation is undetectable; the method is fast rather than optimal. GOODFELLOW, I. J., SHLENS, J., AND SZEGEDY, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
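A minimal sketch of an untargeted FGSM step in PyTorch, assuming model returns logits and x, y are a batch of images in [0, 1] and their true labels; eps plays the role of ε.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # take one step in the direction of the sign of the input gradient
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```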
JSMA: Jacobian-based Saliency Map Attack. Repeat: compute the Jacobian ∇F(x) of the model outputs with respect to the input, build a saliency map from it, and modify the most salient component(s) of x; stop when the classification changes to the target class or the maximum distortion is reached. PAPERNOT, N., MCDANIEL, P., JHA, S., FREDRIKSON, M., CELIK, Z. B., AND SWAMI, A. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (2016), IEEE, pp. 372-387.
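A simplified single-pixel sketch of the saliency-map idea in PyTorch; the actual JSMA modifies pairs of pixels using the full Jacobian, so this is only an illustration. model, theta (the per-step change), and max_pixels are assumptions, and the attack targets class t of a single image x of shape (1, C, H, W).

```python
import torch

def jsma_simplified(model, x, t, theta=1.0, max_pixels=100):
    x_adv = x.clone().detach()
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        logits = model(x_adv)                          # shape (1, num_classes)
        if logits.argmax(dim=1).item() == t:
            break                                      # target class reached
        grad_t = torch.autograd.grad(logits[0, t], x_adv, retain_graph=True)[0]
        grad_rest = torch.autograd.grad(logits[0].sum() - logits[0, t], x_adv)[0]
        # saliency: coordinates that raise the target logit and lower the others
        saliency = grad_t * (-grad_rest)
        saliency[(grad_t < 0) | (grad_rest > 0)] = 0.0
        i = saliency.view(-1).argmax()
        x_adv = x_adv.detach()
        flat = x_adv.view(-1)
        flat[i] = torch.clamp(flat[i] + theta, 0.0, 1.0)   # perturb the most salient pixel
    return x_adv.detach()
```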
Attack algorithms: Towards Evaluating the Robustness of Neural Networks (Carlini and Wagner, IEEE S&P 2017).
The initial formulation (minimize D(x, x+δ) subject to C(x+δ) = t and x+δ ∈ [0,1]^n) is difficult to solve directly. It is replaced by minimizing D(x, x+δ) + c·f(x+δ), where f is an objective function such that C(x+δ) = t if and only if f(x+δ) ≤ 0. Note that these two are not equivalent optimization problems.
The paper then searches for a good objective function f; among the candidates evaluated, f6, defined on the logits as max(max_{i≠t} Z(x')_i - Z(x')_t, -κ), works best.
L2 Attack. Change of variables: write x + δ = ½(tanh(w) + 1), so that the box constraint x+δ ∈ [0,1] holds automatically for any w. The resulting unconstrained problem is optimized with gradient descent (see the sketch below).
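A sketch of the L2 attack's inner loop under this change of variables, assuming a PyTorch classifier model returning logits for a single-image batch, a target class index t, a fixed trade-off constant c (the paper binary-searches it), a confidence margin kappa, and Adam as the gradient-based optimizer; it uses the f6 objective mentioned above.

```python
import torch

def cw_l2(model, x, t, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    # w is unconstrained; 0.5*(tanh(w)+1) always lies in [0, 1]
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)[0]                       # logits Z(x_adv) for the single image
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[t] = False
        best_other = logits[mask].max()                # largest non-target logit
        f = torch.clamp(best_other - logits[t], min=-kappa)   # the f6 objective
        loss = (x_adv - x).pow(2).sum() + c * f        # squared L2 distortion + c*f
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```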
L2 Attack: example results (figure).
L0 Attack: try to find pixels that do not have much effect on the classifier output. Initialize the allowed set (pixels that may be changed) to all pixels of the input image x. Repeat: perform the L2 attack, restricted to the allowed set, to find an adversarial example x + δ; compute g = ∇f(x + δ), where f is the objective function from the L2 attack; select i = argmin_i g_i·δ_i and remove i from the allowed set. Continue until the L2 adversary fails to find an adversarial example (see the sketch below).
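A high-level sketch of this loop, where l2_attack(model, x, t, mask) and objective_grad(model, x_adv, t) are hypothetical helpers standing in for an L2 attack restricted to the allowed set and for the gradient of its objective f; both names are illustrative, not from the paper's code.

```python
import torch

def cw_l0(model, x, t, l2_attack, objective_grad):
    mask = torch.ones_like(x)                     # 1 = pixel may still be changed
    best = None
    while True:
        x_adv = l2_attack(model, x, t, mask)      # L2 attack restricted to the allowed set
        if x_adv is None:                         # L2 adversary failed: stop
            return best
        best = x_adv
        delta = (x_adv - x).view(-1)
        g = objective_grad(model, x_adv, t).view(-1)   # gradient of the L2 objective f
        score = g * delta
        score[mask.view(-1) == 0] = float('inf')  # frozen pixels are not candidates
        i = score.argmin()                        # pixel with the least effect
        mask.view(-1)[i] = 0                      # remove it from the allowed set
```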
L0 Attack: example results (figure).
L∞ Attack: an iterative attack. Because the L∞ norm only penalizes the largest entry, each iteration instead solves min c·f(x+δ) + Σ_i max(δ_i - τ, 0); then set τ := τ·0.9 if all δ_i < τ, else terminate the search (see the sketch below).
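A rough sketch of the outer loop, where solve_penalized(model, x, t, tau, c) is a hypothetical inner solver that minimizes c·f(x+δ) + Σ_i max(δ_i - τ, 0) and returns δ; the τ := 0.9·τ update follows the rule above.

```python
import torch

def cw_linf(model, x, t, solve_penalized, c=1.0, tau=1.0):
    delta = torch.zeros_like(x)
    while True:
        # inner problem: minimize c*f(x+delta) + sum_i max(delta_i - tau, 0)
        new_delta = solve_penalized(model, x, t, tau, c)
        if not (new_delta < tau).all():
            return x + delta                      # some entry still exceeds tau: terminate
        delta = new_delta
        tau *= 0.9                                # shrink the bound and iterate
```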
L∞ Attack: example results (figure).
Comparison of the L2, L0, and L∞ attacks (figure).
Evaluation (results tables and figures in the original slides).
Defense against adversarial examples (Chong Xiang)
Defense approaches: training process / adversarial retraining; refining the model: network structure / regularization; adversarial example detection.
Outline: Training Process / Adversarial Retraining (next); Network Structure / Regularization; Adversarial Example Detection.
Adversarial Retraining: add adversarial examples, labeled with their correct classes, to the training set, e.g. by crafting them on the fly for each training batch, as sketched below.
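A minimal sketch of one adversarial-training step in PyTorch, assuming a classifier model, its optimizer, a batch (x, y), and the fgsm helper sketched earlier in this transcript; the equal weighting of the clean and adversarial losses is an assumption.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    x_adv = fgsm(model, x, y, eps)                     # craft adversarial examples for this batch
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))    # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```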
Outline: Training Process / Adversarial Retraining; Network Structure / Regularization (next); Adversarial Example Detection.
A network k_r is said to be more robust, or harder to fool, than a network k if: (the formal condition is given as a formula on the original slide).
Center Loss encourages centrality of the learned features; L2-Softmax Loss encourages both centrality and compactness.
Optimization with penalization (the most interesting part): the regularization term is rewritten into an optimizable form (derivation shown on the original slide).
Outline: Training Process / Adversarial Retraining; Network Structure / Regularization; Adversarial Example Detection (next).
Adversarial Example Detection. The most naive idea is to train a binary classifier to distinguish original from adversarial examples. Problems: 1. The binary classifier is itself a machine learning model and can also be attacked. 2. It incurs a large computational cost.
Preliminary: Autoencoder. An autoencoder is trained to minimize the reconstruction error between its output and its input.
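A minimal PyTorch autoencoder sketch for illustration; the architecture (one hidden layer of size hidden) and the mean-squared reconstruction loss are assumptions, not from the original slides.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Tiny fully connected autoencoder for flattened inputs of size dim."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        out = self.decoder(self.encoder(x))
        return out.view_as(x)

# Training minimizes the reconstruction error, e.g. mean squared error:
# loss = ((ae(x) - x) ** 2).mean()
```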
The defense combines two components: a Detector and a Reformer.
Detector: (1) a detector based on reconstruction error; (2) a detector based on probability divergence.
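A sketch of both detector variants, assuming ae is a trained autoencoder, model is the classifier returning logits, the thresholds are chosen on held-out clean data (e.g., at a fixed false-positive rate), and the temperature T is an assumption. Variant (1) flags inputs the autoencoder reconstructs poorly; variant (2) compares the classifier's output distributions on x and on ae(x).

```python
import torch
import torch.nn.functional as F

def reconstruction_error(ae, x):
    # mean absolute reconstruction error per example
    return (ae(x) - x).abs().flatten(1).mean(dim=1)

def detect_by_reconstruction(ae, x, threshold):
    return reconstruction_error(ae, x) > threshold

def detect_by_divergence(model, ae, x, threshold, T=10.0):
    # Jensen-Shannon divergence between softened predictions on x and on ae(x)
    p = F.softmax(model(x) / T, dim=1)
    q = F.softmax(model(ae(x)) / T, dim=1)
    m = 0.5 * (p + q)
    jsd = 0.5 * ((p * (p / m).log()).sum(dim=1) + (q * (q / m).log()).sum(dim=1))
    return jsd > threshold
```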
Reformer: (1) a noise-based reformer; (2) an autoencoder-based reformer.
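A minimal sketch of the two reformer variants, assuming a trained PyTorch autoencoder ae and a noise scale sigma; both names and the clamping to [0, 1] are assumptions.

```python
import torch

def noise_based_reformer(x, sigma=0.025):
    # perturb the input with small random noise to wash out crafted perturbations
    return torch.clamp(x + sigma * torch.randn_like(x), 0.0, 1.0)

def autoencoder_based_reformer(ae, x):
    # replace the input by its reconstruction, moving it back toward the
    # manifold of normal examples the autoencoder was trained on
    return torch.clamp(ae(x), 0.0, 1.0)
```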