Understanding Adversarial Threats in Machine Learning
This document explores adversarial threats in machine learning, covering attack nomenclature, the dimensions of adversarial learning, the influence dimension, and causative and exploratory attack approaches. It explains how adversaries manipulate data or models to compromise machine learning systems, and emphasizes the importance of developing strategies to counter such threats.
Presentation Transcript
Introduction to Adversarial Threats of Machine Learning
Shameek Bhattacharjee, Western Michigan University
Adversarial ML Attack Nomenclature
- Attacks are launched either during the training phase or during the testing phase; the classification is based on what stage of learning is compromised and how.
- Causative approach: alters the training process.
- Exploratory approach: steals information about the training data, or discovers information about the learning/decision model.
- Evasion: (1) escape the detector by knowing the undetectable strategy space of a mechanism; (2) escape the classifier by biasing the mechanism's learning capability.
- Main types (training phase): (1) data poisoning attacks and feedback-weaponizing attacks (causative approach); (2) model stealing (exploratory approach).
- Main types (testing phase): (1) mutated inputs; (2) zero-day inputs.
Dimensions in Adversarial Learning
Adversarial learning can be characterized along three dimensions:
(1) Influence: based on the adversary's behavior and the defender's behavior.
(2) Specificity: based on whether the attack impact ranges from random to varying levels of targeted impact.
(3) Security violation: based on whether the attack violates confidentiality, integrity, or availability.
Influence Dimension
- The most relevant and widely researched dimension for adversarial learning.
- Characterizes the adversary based on its behavior.
- The objective is to develop appropriate learner strategies to counter the adversary's behavior.
- The influence dimension covers two types of attacks: causative and exploratory.
Causative Approach
- The adversary acquires data used to train the learner's classifier and modifies it.
- This modified data, called an adversarial example, is then used by the learner during further training of its classifier.
- As a result, the learner learns an incorrect classifier that produces classification errors (false positives and false negatives) during testing or while the classifier is in use.
- This is mainly the data poisoning attack, which has two variations: model skewing and feedback weaponizing (a minimal sketch follows below).
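Since the slide describes poisoning only conceptually, here is a minimal sketch of a label-flipping variant of model skewing. It assumes a scikit-learn logistic-regression detector; the dataset, the 30% poison rate, and the "good"/"bad" class meanings are chosen purely for illustration, not taken from the lecture.

```python
# Minimal label-flipping poisoning sketch (a simple form of model skewing).
# Illustrative only: dataset, poison rate, and class meanings are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Class 1 = "bad"/attack samples, class 0 = "good"/benign (hypothetical labels).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learner trained on clean data, for comparison.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Causative step: the adversary flips 30% of the "bad" training labels to "good",
# skewing the boundary the learner fits during (re)training.
y_poisoned = y_train.copy()
bad_idx = np.flatnonzero(y_train == 1)
flipped = rng.choice(bad_idx, size=int(0.3 * len(bad_idx)), replace=False)
y_poisoned[flipped] = 0

poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# The poisoned learner misses more "bad" samples at test time (false negatives).
bad_test = X_test[y_test == 1]
print("clean model detection rate:   ", clean_model.predict(bad_test).mean())
print("poisoned model detection rate:", poisoned_model.predict(bad_test).mean())
```

The point of the sketch is that the training pipeline itself is never breached; only the data the learner ingests is altered, yet the decision boundary shifts in the attacker's favor.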
Exploratory Approach
- Sometimes referred to as black-box or gray-box attacks.
- Applies to ML/AI-as-a-service applications.
- The adversary sends inputs and observes the outputs generated by the ML/AI application.
- Since ML is functional approximation, the adversary can learn a surrogate model that mimics what the ML/AI model does.
- In essence, the adversary has stolen the model from the company running the ML service.
- The stolen model can then be used to improve the effectiveness of causative and evasion attacks (a sketch follows below).
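A minimal model-extraction sketch under the same illustrative scikit-learn assumption: the victim model, the query_api stand-in, and the query budget are hypothetical, meant only to show the probe-then-fit-a-surrogate idea.

```python
# Minimal model-stealing sketch: fit a surrogate from black-box query responses.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Victim model; in an ML-as-a-service setting it sits behind a prediction API.
X, y = make_classification(n_samples=5000, n_features=8, random_state=1)
victim = RandomForestClassifier(random_state=1).fit(X, y)

def query_api(samples):
    """Hypothetical stand-in for the remote endpoint: returns labels only."""
    return victim.predict(samples)

# Exploratory step: send probe inputs, record the outputs, fit a surrogate.
X_probe = rng.normal(size=(3000, 8))
y_probe = query_api(X_probe)
surrogate = LogisticRegression(max_iter=1000).fit(X_probe, y_probe)

# Agreement on fresh inputs approximates how faithfully the model was "stolen".
X_eval = rng.normal(size=(1000, 8))
agreement = (surrogate.predict(X_eval) == victim.predict(X_eval)).mean()
print(f"surrogate matches victim on {agreement:.0%} of fresh probe inputs")
```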
Variations of Poisoning Attacks
- Specially crafted inputs developed to prevent an attack-detection framework from learning correct parameters.
- (a) Model skewing: pollute the training data so that the boundary between what the classifier categorizes as good data and bad data shifts in the attacker's favor.
- (b) Feedback weaponization: abuse a feedback mechanism to manipulate the system toward misclassifying good content as abusive (e.g., a competitor's content, or as part of revenge), or abusive content as good.
Variations of Evasion Attacks
- Specially crafted attack inputs developed with the aim of being misclassified, in order to evade an attack-detection framework in the testing phase (see the sketch after this list).
- (a) Mutated inputs: variations of a known attack, specifically engineered to avoid the classifier.
- (b) Zero-day inputs: never-seen-before payloads.
Variations of Model Stealing Attacks
- (a) Model reconstruction: steal (i.e., duplicate) the model's specific private information in order to launch evasion attacks.
- (b) Membership leakage: recover training-data membership via black-box probing, to learn whether a given input record was used to train the model and to refine future attacks.
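As a sketch of the mutated-input case, under the same illustrative scikit-learn setup (the perturbation scale and query budget are arbitrary assumptions), an attacker with only query access can hill-climb on the detector's score:

```python
# Minimal mutated-input evasion sketch: randomly perturb a known attack sample,
# keeping mutations that lower the detector's "bad" score, until it is misclassified.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
detector = LogisticRegression(max_iter=1000).fit(X, y)

attack = X[y == 1][0].copy()                     # a known attack payload (class 1)
mutated = attack.copy()
score = detector.predict_proba([mutated])[0, 1]  # detector's confidence it is "bad"

for _ in range(5000):
    candidate = mutated + rng.normal(scale=0.05, size=mutated.shape)
    cand_score = detector.predict_proba([candidate])[0, 1]
    if cand_score < score:                       # keep mutations that look less "bad"
        mutated, score = candidate, cand_score
    if detector.predict([mutated])[0] == 0:      # evaded: now classified as benign
        break

print("original classified as:", detector.predict([attack])[0])
print("mutated classified as: ", detector.predict([mutated])[0], "(0 = benign)")
print("perturbation size:", round(float(np.linalg.norm(mutated - attack)), 3))
```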
Based on the Adversary's Knowledge of the ML Algorithm
Black-Box Approach to Attacks
- Attackers have no knowledge of the specifics (model and its parameters) of the AI method being targeted.
- Typically uses model stealing to incrementally acquire an idea of the targeted ML/AI approach.
- Crafts poisoning or evasion attacks based on that idea.
White-Box Approach to Attacks
- Attackers know the specifics (model and/or parameters) of the AI method being targeted.
- Crafts poisoning or evasion attacks that are usually optimal or near-optimal and can be computed quickly (see the sketch below).
- Analyzes the worst-case attack impacts possible.
Gray-Box Approach to Attacks
- Attackers know some details about the model, parameters, or loss functions.
- Analysis is based on the varying likelihood of what might be known to the attacker.
- Often introduces some rationality assumptions.
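To contrast with the black-box mutation search above, here is a minimal white-box sketch in the style of a single fast-gradient-sign step; it assumes the attacker knows a linear detector's weights exactly, and the weights, sample, and perturbation budget are made up for illustration.

```python
# Minimal white-box sketch: knowing the model's parameters, one gradient-sign step
# pushes a flagged sample toward the benign side, with no trial queries needed.
import numpy as np

rng = np.random.default_rng(0)

# Assume the attacker knows this linear (logistic) detector exactly: weights w, bias b.
w = rng.normal(size=10)
b = 0.1

def bad_probability(x):
    """Probability the detector assigns to the 'bad' class."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = rng.normal(size=10) + 0.5 * w        # a sample the detector scores as likely "bad"
print("score before:", round(float(bad_probability(x)), 3))

# For a linear model the input gradient of the score is proportional to w, so stepping
# against sign(w) lowers the score fastest per unit of L-infinity budget.
epsilon = 1.0                            # attacker's perturbation budget (assumed)
x_adv = x - epsilon * np.sign(w)
print("score after: ", round(float(bad_probability(x_adv)), 3))
```

One closed-form step replaces the thousands of trial queries in the black-box sketch, which is why white-box analysis is used to study the worst-case attack impact.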
Relating Attack Knowledge and Attack Impact
[Figure: attacks placed along two axes: a specificity axis of ML attacks, from least specific to most specific attack impact, and a knowledge axis of ML attacks, from black box (no knowledge) to white box (complete knowledge). Figure from Papernot et al., Euro S&P 2017.]
Based on Type of Security Violation
- Type 1: Integrity of the attack detector is violated, allowing harmful instances to bypass the detector as false negatives (missed detections); similar to evasion.
- Type 2: Availability of the system is violated because of the detector's existence; the attacker creates events where benign observations are incorrectly detected as harmful (false alarms), rendering the system virtually unusable since it generates too many alarms.
- Type 3: Confidentiality of the detection model is violated by using the detector's responses (if available to the attacker) to infer information used in the learning process (a privacy violation); a technique used in the black-box mode of attack.
More will follow when we do the case studies.