Understanding Adversarial Machine Learning Attacks
Adversarial Machine Learning (AML) studies attacks that manipulate a model's input data to deceive it into making incorrect predictions. This presentation covers how adversarial examples are constructed, the attack formulation and distance metrics involved, and concrete attack algorithms, from the optimization-based L-BFGS attack to the Fast Gradient Sign Method, which generates adversarial examples quickly. Understanding these attacks is a prerequisite for improving model robustness and security.
Presentation Transcript
Adversarial Machine Learning (He, Xiaoyi): Attack algorithms
Adversarial Examples: an image correctly classified as "panda", after adding small adversarial noise, is classified as "gibbon" (figure).
Adversarial Examples: a second example in which small adversarial noise changes the predicted class (figure).
Outline: attack formulation; distance metrics; attack algorithms (L-BFGS, Fast Gradient Sign, JSMA, ...).
Attack: given an original input x (e.g., an image classified as "warplane"), find a new input that is similar to x but is classified as a different class t (untargeted or targeted). The attacker is assumed to know the classifier.
How to find adversarial examples: minimize the distance between x and x+δ, subject to x+δ being classified as the target class t and each element of x+δ lying in [0,1] (so that it remains a valid image).
Distance Metrics. Given two images x and x': the L0 distance counts the number of coordinates i such that x_i ≠ x'_i, i.e. the number of pixels that have been changed; the L2 distance is the Euclidean distance; the L∞ distance, max_i |x_i - x'_i|, measures the maximum change to any single coordinate.
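To make the metrics concrete, here is a minimal Python sketch that computes all three norms of a perturbation, assuming the two images are NumPy arrays with values in [0, 1]; the function name perturbation_norms is illustrative.

```python
import numpy as np

def perturbation_norms(x, x_adv):
    """L0, L2 and L-infinity size of the perturbation x_adv - x."""
    delta = (x_adv - x).ravel()
    l0 = int(np.count_nonzero(delta))      # number of coordinates (pixels) changed
    l2 = float(np.linalg.norm(delta))      # Euclidean distance
    linf = float(np.max(np.abs(delta)))    # maximum change to any single coordinate
    return l0, l2, linf
```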
L-BFGS attack. Initial formulation: minimize ||δ|| subject to C(x+δ) = t and x+δ ∈ [0,1]^n. This is relaxed to minimizing c·||δ|| + loss_F(x+δ, t) subject to the box constraint, solved with box-constrained L-BFGS; note that these two are not equivalent optimization problems. SZEGEDY, C., ZAREMBA, W., SUTSKEVER, I., BRUNA, J., ERHAN, D., GOODFELLOW, I., AND FERGUS, R. Intriguing properties of neural networks. ICLR (2014).
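A rough sketch of the relaxed objective, assuming a PyTorch classifier model that returns logits, a batched image x in [0, 1], a target label tensor, a fixed trade-off constant c, and cross-entropy as the surrogate loss; the original attack line-searches c and uses box-constrained L-BFGS, whereas this sketch simply clamps the image and uses PyTorch's built-in LBFGS optimizer.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, target, c=1.0, max_iter=50):
    # delta is the perturbation to optimize; x and target carry a batch dimension
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.LBFGS([delta], max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        x_adv = torch.clamp(x + delta, 0.0, 1.0)   # keep the image valid
        loss = c * delta.norm(p=2) + F.cross_entropy(model(x_adv), target)
        loss.backward()
        return loss

    optimizer.step(closure)
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```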
Fast Gradient Sign: x' = x + ε·sign(∇_x J(θ, x, y)), where ε is chosen to be sufficiently small so that the perturbation is undetectable; the method is fast rather than optimal. GOODFELLOW, I. J., SHLENS, J., AND SZEGEDY, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
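A minimal sketch of an untargeted FGSM step in PyTorch, assuming model returns logits and x, y are a batch of images in [0, 1] and their true labels; eps plays the role of ε.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # take one step in the direction of the sign of the input gradient
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```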
JSMA: Jacobian-based Saliency Map Attack. Repeat: compute the Jacobian ∇F(x) of the model outputs with respect to the input, build a saliency map from it, and modify the most salient component(s) of x; stop when the classification changes to the target class or the maximum distortion is reached. PAPERNOT, N., MCDANIEL, P., JHA, S., FREDRIKSON, M., CELIK, Z. B., AND SWAMI, A. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (2016), IEEE, pp. 372-387.
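A simplified single-pixel sketch of the saliency-map idea in PyTorch; the actual JSMA modifies pairs of pixels using the full Jacobian, so this is only an illustration. model, theta (the per-step change), and max_pixels are assumptions, and the attack targets class t of a single image x of shape (1, C, H, W).

```python
import torch

def jsma_simplified(model, x, t, theta=1.0, max_pixels=100):
    x_adv = x.clone().detach()
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        logits = model(x_adv)                          # shape (1, num_classes)
        if logits.argmax(dim=1).item() == t:
            break                                      # target class reached
        grad_t = torch.autograd.grad(logits[0, t], x_adv, retain_graph=True)[0]
        grad_rest = torch.autograd.grad(logits[0].sum() - logits[0, t], x_adv)[0]
        # saliency: coordinates that raise the target logit and lower the others
        saliency = grad_t * (-grad_rest)
        saliency[(grad_t < 0) | (grad_rest > 0)] = 0.0
        i = saliency.view(-1).argmax()
        x_adv = x_adv.detach()
        flat = x_adv.view(-1)
        flat[i] = torch.clamp(flat[i] + theta, 0.0, 1.0)   # perturb the most salient pixel
    return x_adv.detach()
```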
Attack algorithms: Towards Evaluating the Robustness of Neural Networks (Carlini and Wagner, IEEE S&P 2017).
The initial formulation (minimize D(x, x+δ) subject to C(x+δ) = t and x+δ ∈ [0,1]^n) is difficult to solve directly. It is replaced by minimizing D(x, x+δ) + c·f(x+δ), where f is an objective function such that C(x+δ) = t if and only if f(x+δ) ≤ 0. Note that these two are not equivalent optimization problems.
The paper then searches for a good objective function f; among the candidates evaluated, f6, defined on the logits as max(max_{i≠t} Z(x')_i - Z(x')_t, -κ), works best.
L2 Attack. Change of variables: write x + δ = ½(tanh(w) + 1), so that the box constraint x+δ ∈ [0,1] holds automatically for any w. The resulting unconstrained problem is optimized with gradient descent (see the sketch below).
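A sketch of the L2 attack's inner loop under this change of variables, assuming a PyTorch classifier model returning logits for a single-image batch, a target class index t, a fixed trade-off constant c (the paper binary-searches it), a confidence margin kappa, and Adam as the gradient-based optimizer; it uses the f6 objective mentioned above.

```python
import torch

def cw_l2(model, x, t, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    # w is unconstrained; 0.5*(tanh(w)+1) always lies in [0, 1]
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)[0]                       # logits Z(x_adv) for the single image
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[t] = False
        best_other = logits[mask].max()                # largest non-target logit
        f = torch.clamp(best_other - logits[t], min=-kappa)   # the f6 objective
        loss = (x_adv - x).pow(2).sum() + c * f        # squared L2 distortion + c*f
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```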
L2 Attack: example results (figure).
L0 Attack: try to find pixels that do not have much effect on the classifier output. Initialize the allowed set (pixels that may be changed) to all pixels of the input image x. Repeat: perform the L2 attack, restricted to the allowed set, to find an adversarial example x + δ; compute g = ∇f(x + δ), where f is the objective function from the L2 attack; select i = argmin_i g_i·δ_i and remove i from the allowed set. Continue until the L2 adversary fails to find an adversarial example (see the sketch below).
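A high-level sketch of this loop, where l2_attack(model, x, t, mask) and objective_grad(model, x_adv, t) are hypothetical helpers standing in for an L2 attack restricted to the allowed set and for the gradient of its objective f; both names are illustrative, not from the paper's code.

```python
import torch

def cw_l0(model, x, t, l2_attack, objective_grad):
    mask = torch.ones_like(x)                     # 1 = pixel may still be changed
    best = None
    while True:
        x_adv = l2_attack(model, x, t, mask)      # L2 attack restricted to the allowed set
        if x_adv is None:                         # L2 adversary failed: stop
            return best
        best = x_adv
        delta = (x_adv - x).view(-1)
        g = objective_grad(model, x_adv, t).view(-1)   # gradient of the L2 objective f
        score = g * delta
        score[mask.view(-1) == 0] = float('inf')  # frozen pixels are not candidates
        i = score.argmin()                        # pixel with the least effect
        mask.view(-1)[i] = 0                      # remove it from the allowed set
```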
L0 Attack: example results (figure).
L∞ Attack: an iterative attack. Because the L∞ norm only penalizes the largest entry, each iteration instead solves min c·f(x+δ) + Σ_i max(δ_i - τ, 0); then set τ := τ·0.9 if all δ_i < τ, else terminate the search (see the sketch below).
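A rough sketch of the outer loop, where solve_penalized(model, x, t, tau, c) is a hypothetical inner solver that minimizes c·f(x+δ) + Σ_i max(δ_i - τ, 0) and returns δ; the τ := 0.9·τ update follows the rule above.

```python
import torch

def cw_linf(model, x, t, solve_penalized, c=1.0, tau=1.0):
    delta = torch.zeros_like(x)
    while True:
        # inner problem: minimize c*f(x+delta) + sum_i max(delta_i - tau, 0)
        new_delta = solve_penalized(model, x, t, tau, c)
        if not (new_delta < tau).all():
            return x + delta                      # some entry still exceeds tau: terminate
        delta = new_delta
        tau *= 0.9                                # shrink the bound and iterate
```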
L∞ Attack: example results (figure).
Comparison of the L2, L0, and L∞ attacks (figure).
Evaluation (results tables and figures in the original slides).
Defense against adversarial examples (Chong Xiang)
Defense approaches: training process / adversarial retraining; refining the model: network structure / regularization; adversarial example detection.
Outline: Training Process / Adversarial Retraining (next); Network Structure / Regularization; Adversarial Example Detection.
Adversarial Retraining: add adversarial examples, labeled with their correct classes, to the training set, e.g. by crafting them on the fly for each training batch, as sketched below.
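A minimal sketch of one adversarial-training step in PyTorch, assuming a classifier model, its optimizer, a batch (x, y), and the fgsm helper sketched earlier in this transcript; the equal weighting of the clean and adversarial losses is an assumption.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    x_adv = fgsm(model, x, y, eps)                     # craft adversarial examples for this batch
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))    # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```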
Outline: Training Process / Adversarial Retraining; Network Structure / Regularization (next); Adversarial Example Detection.
A network k_r is said to be more robust, or harder to fool, than a network k if: (the formal condition is given as a formula on the original slide).
Center Loss encourages centrality of the learned features; L2-Softmax Loss encourages both centrality and compactness.
Optimization with penalization (the most interesting part): the regularization term is rewritten into an optimizable form (derivation shown on the original slide).
Outline: Training Process / Adversarial Retraining; Network Structure / Regularization; Adversarial Example Detection (next).
Adversarial Example Detection. The most naive idea is to train a binary classifier to distinguish original from adversarial examples. Problems: 1. The binary classifier is itself a machine learning model and can also be attacked. 2. It incurs a large computational cost.
Preliminary: Autoencoder. An autoencoder is trained to minimize the reconstruction error between its output and its input.
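A minimal PyTorch autoencoder sketch for illustration; the architecture (one hidden layer of size hidden) and the mean-squared reconstruction loss are assumptions, not from the original slides.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Tiny fully connected autoencoder for flattened inputs of size dim."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        out = self.decoder(self.encoder(x))
        return out.view_as(x)

# Training minimizes the reconstruction error, e.g. mean squared error:
# loss = ((ae(x) - x) ** 2).mean()
```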
The defense combines two components: a Detector and a Reformer.
Detector: (1) a detector based on reconstruction error; (2) a detector based on probability divergence.
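A sketch of both detector variants, assuming ae is a trained autoencoder, model is the classifier returning logits, the thresholds are chosen on held-out clean data (e.g., at a fixed false-positive rate), and the temperature T is an assumption. Variant (1) flags inputs the autoencoder reconstructs poorly; variant (2) compares the classifier's output distributions on x and on ae(x).

```python
import torch
import torch.nn.functional as F

def reconstruction_error(ae, x):
    # mean absolute reconstruction error per example
    return (ae(x) - x).abs().flatten(1).mean(dim=1)

def detect_by_reconstruction(ae, x, threshold):
    return reconstruction_error(ae, x) > threshold

def detect_by_divergence(model, ae, x, threshold, T=10.0):
    # Jensen-Shannon divergence between softened predictions on x and on ae(x)
    p = F.softmax(model(x) / T, dim=1)
    q = F.softmax(model(ae(x)) / T, dim=1)
    m = 0.5 * (p + q)
    jsd = 0.5 * ((p * (p / m).log()).sum(dim=1) + (q * (q / m).log()).sum(dim=1))
    return jsd > threshold
```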
Reformer: (1) a noise-based reformer; (2) an autoencoder-based reformer.
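A minimal sketch of the two reformer variants, assuming a trained PyTorch autoencoder ae and a noise scale sigma; both names and the clamping to [0, 1] are assumptions.

```python
import torch

def noise_based_reformer(x, sigma=0.025):
    # perturb the input with small random noise to wash out crafted perturbations
    return torch.clamp(x + sigma * torch.randn_like(x), 0.0, 1.0)

def autoencoder_based_reformer(ae, x):
    # replace the input by its reconstruction, moving it back toward the
    # manifold of normal examples the autoencoder was trained on
    return torch.clamp(ae(x), 0.0, 1.0)
```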