Understanding Adversarial Machine Learning Attacks


Adversarial Machine Learning (AML) studies attacks that manipulate input data to deceive a machine learning model into making incorrect predictions. Topics covered include crafting adversarial examples, attack algorithms, distance metrics, and optimization-based formulations solved with methods such as L-BFGS. Techniques such as the Fast Gradient Sign Method generate adversarial examples quickly. Understanding these attacks is crucial for improving model robustness and security.





Presentation Transcript


  1. Adversarial Machine Learning (He, Xiaoyi): Attack algorithms

  2. What is AML?

  3. Adversarial Examples. An image correctly classified as "panda", plus a small adversarial noise, is classified as "gibbon". Who cares about a panda?

  4. Adversarial Examples. A second example: a small adversarial noise changes the classification (figure).

  5. Outline: attack formulation; distance metrics; attack algorithms (L-BFGS, Fast Gradient Sign, JSMA, ...).

  6. Attack. Given an original input x (e.g., an image classified as "warplane"), find a new input x' that is similar to x but is classified differently, either as any other class (untargeted) or as a chosen class t (targeted). The attacker knows the classifier.

  7. How to find adversarial examples:
     minimize   the distance between x and x+δ
     such that  x+δ is classified as the target class t
                each element of x+δ lies in [0,1] (so that x+δ is a valid image)

  8. Distance Metrics. For two images x and x':
     L0: the number of coordinates i such that x_i ≠ x'_i; corresponds to the number of pixels that have been changed in the image.
     L2: the Euclidean distance ||x - x'||_2.
     L∞: max(|x_1 - x'_1|, ..., |x_n - x'_n|); measures the maximum change to any single element.
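
A minimal numpy sketch of these three metrics (the helper names are my own; it assumes both images are float arrays of the same shape):

    import numpy as np

    def l0_distance(x, x_adv):
        # number of coordinates (pixel values) that differ
        return int(np.sum(x != x_adv))

    def l2_distance(x, x_adv):
        # Euclidean distance between the flattened images
        return float(np.linalg.norm((x - x_adv).ravel()))

    def linf_distance(x, x_adv):
        # largest absolute change to any single coordinate
        return float(np.max(np.abs(x - x_adv)))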

  9. L-BFGS. Initial formulation: minimize ||δ||_2 subject to C(x+δ) = t and x+δ ∈ [0,1]^n. Since this is hard to solve directly, it is relaxed to: minimize c·||δ||_2 + loss_F(x+δ, t) subject to x+δ ∈ [0,1]^n, solved with box-constrained L-BFGS. Note that these two are not equivalent optimization problems. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. ICLR (2014).
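
A rough PyTorch sketch of this style of attack, assuming a classifier `model`, an input batch `x` in [0,1], and a target-label tensor `target`; the box constraint is approximated here by clamping rather than by a true box-constrained L-BFGS, and the paper's line search over c is omitted:

    import torch
    import torch.nn.functional as F

    def lbfgs_attack(model, x, target, c=0.1, steps=20):
        """Sketch of the Szegedy et al. L-BFGS-style attack: minimize
        c*||delta||_2^2 + CE(model(x+delta), target), with the [0,1] box
        constraint approximated by clamping x+delta."""
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.LBFGS([delta], max_iter=steps)

        def closure():
            opt.zero_grad()
            x_adv = (x + delta).clamp(0, 1)
            loss = c * delta.pow(2).sum() + F.cross_entropy(model(x_adv), target)
            loss.backward()
            return loss

        opt.step(closure)
        return (x + delta.detach()).clamp(0, 1)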

  10. Fast Gradient Sign. x' = x + ε·sign(∇_x J(θ, x, y)). ε is chosen to be sufficiently small so as to be undetectable; the method is fast rather than optimal. Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
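
A minimal PyTorch sketch of the Fast Gradient Sign Method (the function name and default ε are illustrative):

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.03):
        """Fast Gradient Sign Method: x' = x + eps * sign(grad_x J(theta, x, y))."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()
        return x_adv.clamp(0, 1).detach()   # keep the result a valid image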

  11. JSMA: Jacobian-based Saliency Map Attack. Repeat: compute the Jacobian ∇F(x), build a saliency map from it, and modify the most salient component(s) of x; stop when the classification changes or the maximum distortion is reached. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (2016), IEEE, pp. 372-387.
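
A simplified, single-pixel sketch of the saliency-map computation in PyTorch (the paper's attack actually scores pixel pairs and then iteratively perturbs the chosen pixels; the names and the batch-size-1 assumption are mine):

    import torch

    def saliency_map(model, x, target):
        """Simplified JSMA-style saliency for a targeted attack: a component is
        salient if increasing it raises the target logit (dt > 0) while lowering
        the sum of the other logits (do < 0); its score is dt * |do|, else 0.
        Assumes a single image (batch size 1) and an integer class index `target`."""
        x = x.clone().detach().requires_grad_(True)
        logits = model(x)                                        # shape (1, num_classes)
        dt = torch.autograd.grad(logits[0, target], x, retain_graph=True)[0]
        do = torch.autograd.grad(logits.sum() - logits[0, target], x)[0]
        mask = (dt > 0) & (do < 0)
        return torch.where(mask, dt * do.abs(), torch.zeros_like(dt))

The attack then repeatedly bumps the highest-scoring component by a fixed step and recomputes the map until the classification flips or the distortion budget is exhausted.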

  12. Attack algorithms Towards Evaluating the Robustness of Neural Networks

  13. Attack algorithms Towards Evaluating the Robustness of Neural Networks. Initial formulation: minimize D(x, x+δ) such that C(x+δ) = t and x+δ ∈ [0,1]^n; this is difficult to solve directly. It is therefore rewritten as: minimize D(x, x+δ) + c·f(x+δ), where f is an objective function chosen so that C(x+δ) = t if and only if f(x+δ) ≤ 0. Note that these two are not equivalent optimization problems.

  14. Attack algorithms Towards Evaluating the Robustness of Neural Networks. Try to find a good objective function f; among the candidates, f_6 is the best one!
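
For reference, the f_6 objective from the paper, f_6(x') = max(max_{i≠t} Z(x')_i - Z(x')_t, -κ) on the logits Z, sketched in PyTorch (assumes `logits` of shape (N, num_classes) and a LongTensor `target` of class indices; names are illustrative):

    import torch

    def f6(logits, target, kappa=0.0):
        """C&W-style objective: <= 0 exactly when the target class has the
        largest logit (by at least the confidence margin kappa)."""
        onehot = torch.zeros_like(logits).scatter_(1, target.view(-1, 1), 1.0)
        target_logit = (logits * onehot).sum(dim=1)
        other_max = (logits - onehot * 1e9).max(dim=1).values   # mask out the target
        return torch.clamp(other_max - target_logit, min=-kappa)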

  15. Attack algorithms Towards Evaluating the Robustness of Neural Networks. L2 attack. Change of variables: x_i + δ_i = (1/2)(tanh(w_i) + 1), which keeps every element of x+δ in [0,1]; the resulting unconstrained problem is optimized with gradient descent.
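
A compact PyTorch sketch of the L2 attack with this change of variables (it reuses the hypothetical `f6` helper above; the paper's binary search over the constant c is omitted, and all names are illustrative):

    import torch

    def cw_l2_attack(model, x, target, c=1.0, steps=1000, lr=0.01, kappa=0.0):
        """L2 attack sketch: optimize w, with x' = 0.5*(tanh(w)+1) staying in
        [0,1] automatically, minimizing ||x'-x||_2^2 + c*f6(model(x'), target)."""
        # initialize w so that 0.5*(tanh(w)+1) == x (clamp avoids atanh(+-1))
        w = torch.atanh(x.clamp(1e-6, 1 - 1e-6) * 2 - 1).detach().requires_grad_(True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            x_adv = 0.5 * (torch.tanh(w) + 1)
            loss = (x_adv - x).pow(2).sum() + c * f6(model(x_adv), target, kappa).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return (0.5 * (torch.tanh(w) + 1)).detach()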

  16. Attack algorithms Towards Evaluating the Robustness of Neural Networks L2Attack

  17. Attack algorithms Towards Evaluating the Robustness of Neural Networks. L0 attack: try to find pixels that do not have much effect on the classifier output. Initialize the allowed (changeable) set to all points of the input image x. Repeat: perform the L2 attack (restricted to the allowed set) to find an adversarial example x+δ; compute g = ∇f(x+δ), where f is the objective function from the L2 attack; select i = argmin_i g_i·δ_i and remove i from the allowed set. Stop when the L2 adversary fails to find an adversarial example.
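
A rough sketch of this loop (assumes a single image with a batch dimension, reuses the hypothetical `f6` helper, and omits the warm-starting and adaptive choice of c used in the paper):

    import torch

    def cw_l0_attack(model, x, target, c=1.0, steps=500, lr=0.01):
        """Repeatedly run an L2-style attack restricted to an 'allowed' pixel set,
        then freeze the pixel whose change contributes least (smallest g_i*delta_i),
        until the restricted L2 adversary fails."""
        allowed = torch.ones_like(x)
        best = None
        while True:
            delta = torch.zeros_like(x, requires_grad=True)
            opt = torch.optim.Adam([delta], lr=lr)
            for _ in range(steps):
                x_adv = (x + delta * allowed).clamp(0, 1)   # only allowed pixels may move
                loss = (x_adv - x).pow(2).sum() + c * f6(model(x_adv), target).sum()
                opt.zero_grad()
                loss.backward()
                opt.step()
            x_adv = (x + delta.detach() * allowed).clamp(0, 1)
            if model(x_adv).argmax(dim=1).item() != target.item():
                return best                                  # L2 adversary failed: stop
            best = x_adv
            x_req = x_adv.clone().requires_grad_(True)
            g = torch.autograd.grad(f6(model(x_req), target).sum(), x_req)[0]
            score = g * (x_adv - x) + (1.0 - allowed) * 1e9   # ignore frozen pixels
            allowed.view(-1)[score.flatten().argmin()] = 0.0  # freeze the least useful pixel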

  18. Attack algorithms Towards Evaluating the Robustness of Neural Networks L0Attack

  19. Attack algorithms Towards Evaluating the Robustness of Neural Networks. L∞ attack: an iterative attack. Since ||δ||_∞ only penalizes the largest entry, at each iteration solve: minimize c·f(x+δ) + Σ_i max(δ_i - τ, 0); then set τ := τ·0.9 if all δ_i < τ, else terminate the search.
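
A rough sketch of this iterative scheme (again reusing the hypothetical `f6` helper; constants are illustrative):

    import torch

    def cw_linf_attack(model, x, target, c=1.0, tau=1.0, steps=500, lr=0.005):
        """Penalize every entry of delta exceeding tau instead of the single
        largest entry; shrink tau by 0.9 whenever all |delta_i| < tau, otherwise
        stop and return the best adversarial example found so far."""
        delta = torch.zeros_like(x, requires_grad=True)
        best = None
        while True:
            opt = torch.optim.Adam([delta], lr=lr)
            for _ in range(steps):
                x_adv = (x + delta).clamp(0, 1)
                penalty = torch.clamp(delta.abs() - tau, min=0).sum()
                loss = c * f6(model(x_adv), target).sum() + penalty
                opt.zero_grad()
                loss.backward()
                opt.step()
            if bool((delta.detach().abs() < tau).all()):
                best = (x + delta.detach()).clamp(0, 1)
                tau *= 0.9                      # tighten the threshold and keep searching
            else:
                return best                     # could not push every entry below tau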

  20. Attack algorithms Towards Evaluating the Robustness of Neural Networks L∞ attack: an iterative attack

  21. L2 Attack, L0 Attack, L∞ Attack

  22. Attack algorithms Towards Evaluating the Robustness of Neural Networks Evaluation

  23. Attack algorithms Towards Evaluating the Robustness of Neural Networks Evaluation

  24. Defense against adversarial examples (Chong Xiang)

  25. Training Process / Adversarial Retraining; Refining model; Network Structure / Regularization; Adversarial Example Detection

  26. Training Process / Adversarial Retraining; Network Structure / Regularization; Adversarial Example Detection

  27. ICLR2015

  28. Adversarial Retraining: adding adversarial examples to the training set, e.g. mixing the loss on clean examples with the loss on adversarial examples generated during training (see the sketch below).
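
A minimal sketch of one adversarial-training step in this spirit, reusing the `fgsm` helper sketched earlier (the mixing weight `alpha` and all names are illustrative):

    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, opt, x, y, eps=0.03, alpha=0.5):
        """Mix the loss on a clean batch with the loss on FGSM examples
        generated on the fly from the same batch."""
        x_adv = fgsm(model, x, y, eps)                 # adversarial copies of the batch
        loss = alpha * F.cross_entropy(model(x), y) \
             + (1 - alpha) * F.cross_entropy(model(x_adv), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()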

  29. S&P2016

  30. Smooth the model
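
Assuming the S&P 2016 reference here is defensive distillation (Papernot et al.), a minimal sketch of the distillation loss that smooths the model by training a student network on the teacher's softened probabilities at temperature T (names and the default T are illustrative):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=20.0):
        """Cross-entropy between the student's and the teacher's softmax
        outputs, both computed at temperature T."""
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        log_probs = F.log_softmax(student_logits / T, dim=1)
        return -(soft_targets * log_probs).sum(dim=1).mean()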

  31. Training Process / Adversarial Retraining; Network Structure / Regularization; Adversarial Example Detection

  32. A network k_r is more robust (harder to fool) than a network k if:

  33. Center Loss: centrality. L2-Softmax Loss: centrality and compactness.

  34. ICLR2018

  35. Generating perturbation

  36. Optimization with penalization (the most interesting part!): the regularization term is rewritten.

  37. Training Process / Adversarial Retraining; Network Structure / Regularization; Adversarial Example Detection

  38. Adversarial Example Detection. The most naive idea: train a binary classifier to distinguish original examples from adversarial examples. Problems: 1. The binary classifier is itself a machine learning model and can also be attacked. 2. It incurs a large computational cost.

  39. CCS 2017

  40. Preliminary: Autoencoder. Trained to minimize the reconstruction error.
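
A minimal autoencoder sketch in PyTorch (sizes are illustrative, for flattened 28x28 images). Because it is trained to reconstruct clean inputs, the reconstruction error tends to be larger for off-manifold inputs such as adversarial examples:

    import torch
    import torch.nn as nn

    class SimpleAutoencoder(nn.Module):
        """Tiny encoder/decoder pair trained to reproduce its input."""
        def __init__(self, dim=784, hidden=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
            self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # training minimizes the reconstruction error, e.g.
    # loss = torch.nn.functional.mse_loss(autoencoder(x), x)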

  41. Detector Reformer

  42. Detector (1) Detector based on reconstruction error (2) Detector based on probability divergence
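
Sketches of the two detector ideas (thresholds, the temperature, and all names are illustrative; `autoencoder` is the model from the previous sketch):

    import torch
    import torch.nn.functional as F

    def reconstruction_error_detector(autoencoder, x, threshold):
        """Detector (1): flag inputs whose autoencoder reconstruction error
        exceeds a threshold chosen on clean validation data."""
        err = (autoencoder(x) - x).pow(2).flatten(1).mean(dim=1)
        return err > threshold

    def probability_divergence_detector(model, autoencoder, x, threshold, T=10.0):
        """Detector (2): flag inputs whose softened class probabilities change a
        lot after passing through the autoencoder (Jensen-Shannon divergence)."""
        p = F.softmax(model(x) / T, dim=1)
        q = F.softmax(model(autoencoder(x)) / T, dim=1)
        m = 0.5 * (p + q)
        js = 0.5 * (p * (p / m).log()).sum(1) + 0.5 * (q * (q / m).log()).sum(1)
        return js > threshold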

  43. Reformer (1) Noise-based reformer (2) Autoencoder-based reformer
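
A one-line sketch of the autoencoder-based reformer (a noise-based reformer would instead add small random noise to x before classification):

    def autoencoder_reformer(model, autoencoder, x):
        """Reformer (2): classify the autoencoder's reconstruction of x instead
        of x itself, pulling the input back toward the clean-data manifold."""
        return model(autoencoder(x))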
