Context-Aware Malware Detection Using GANs in Signal Systems

 
Student: David Wood
Student Email: woodd6@udayton.edu
Faculty: Dr. Keigo Hirakawa
Faculty Email: khirakawa1@udayton.edu
AFRL Sponsor: Dr. David Kapp (RYWA)
AFRL Directorate: RY
Problem Statement

Identification of malware within the supply chain of a signal/sensor system.

1) Goal: Design a binary classifier to detect malware from system behavior (context).
2) Challenges:
  • We want to be prepared to detect novel Trojan malware.
  • Signature-based malware detection will fail against it.
3) Approach:
  • Anomaly detection (without malware signatures).
  • Use limited examples of valid system data.
  • Use a GAN to promote clustering of the normal system behavior in feature space.
 
Binary Classifier (Signature-Based)

Train on both normal and compromised system behaviors
  Pros
  • Can cluster the normal system and known malware.
  Cons
  • Does not cluster unknown malware, so false negatives (FN) can be high.

Train on only compromised system behaviors
  Pros
  • Can cluster known malware.
  Cons
  • Does not cluster unknown malware, so false negatives (FN) can be high.
  • Does not cluster normal system behavior, so false positives (FP) arise from misclassifying the normal system.
 
 
Our Goal: Anomaly Detection

Train on only normal system behavior
  Pros
  • Can still detect unknown malware.
  • Clusters the normal system behavior.
  • Low false positives (FP).
  Cons
  • Does not cluster unknown malware.
  • False negatives occur when maliciousness is low.

In signal systems:
  • Small perturbations can be thought of as noise.
  • Large perturbations are attacks with consequences.
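
As a rough illustration of training on normal behavior only, the sketch below fits a single cluster to normal samples and flags anything that falls outside it. It is a deliberately simplified stand-in (a centroid plus a radius), not the GAN discriminator described in these slides; the function names, the two-dimensional toy data, and the choice of radius are all assumptions made for the example.

```python
import math


def fit_normal_cluster(normal_samples):
    """Return (centroid, radius) for a list of equal-length feature vectors.

    The radius is the largest distance from the centroid to any training
    sample, so every training sample lies inside the cluster.
    """
    dim = len(normal_samples[0])
    count = len(normal_samples)
    centroid = [sum(s[i] for s in normal_samples) / count for i in range(dim)]
    radius = max(math.dist(s, centroid) for s in normal_samples)
    return centroid, radius


def is_anomaly(sample, centroid, radius):
    """Flag a sample whose distance from the centroid exceeds the radius."""
    return math.dist(sample, centroid) > radius


# Hypothetical normal-system measurements (no malware examples are used).
normal = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.0, 1.2]]
centroid, radius = fit_normal_cluster(normal)

small_perturbation = [1.05, 1.0]  # stays inside the cluster, treated as noise
large_perturbation = [4.0, 4.0]   # far outside the cluster, flagged

print(is_anomaly(small_perturbation, centroid, radius))
print(is_anomaly(large_perturbation, centroid, radius))
```

The detector never sees a malware sample, which is why it can still flag unknown malware, and also why a low-impact attack that stays inside the cluster goes unnoticed.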
 
Our Solution: Generative Adversarial Network (GAN)

Generator (proxy malware)
  • Produces fakes that alter the system behavior: Yg = gen(X, Ys).
  • Aims to produce fakes that behave like malware by hiding, so it is given the input and output of the real system.

Discriminator
  • Learns to distinguish fake samples from real samples.
  • Forces the generator to keep producing fakes close to the real system; otherwise they are detected.
  • Eventually the fake and real samples become so similar that the fakes have a negligible impact on the real system's performance.
  • At that point, the discriminator is trained as a detector.

Impact: The discriminator has more samples from which to learn the feature space of the normal system, so it can detect compromised systems and anomalies.
 
Our Solution: Generative Adversarial Network (GAN), continued

To promote clustering:
  • The generator is designed to learn the classification boundary.
  • The discriminator learns to classify normal system behavior versus generator behavior.
  • The classification boundary tightens over time.

In signal systems:
  • The Trojan is ineffective outside the classification boundary (it would be detected).
  • The Trojan is ineffective at the center of the cluster (its perturbation is too small to matter).
  • The Trojan is most effective just inside the classification boundary (maximizing the attack while hiding).

The generator mimics the ideal Trojan attack.
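
The slides name a minimax GAN but do not write out its objective. One plausible reading, assuming the standard GAN value function, is given below. The assumptions are that X is the system input, Ys is the real system output, Yg = gen(X, Ys) is the generator's output, and the discriminator D sees the input together with an output and returns the probability that the pair comes from the normal system. The actual loss used in this work may differ.

```latex
\min_{\mathrm{gen}} \; \max_{D} \; V(D, \mathrm{gen})
  = \mathbb{E}_{(X, Y_s)}\bigl[\log D(X, Y_s)\bigr]
  + \mathbb{E}_{(X, Y_s)}\bigl[\log\bigl(1 - D(X, Y_g)\bigr)\bigr],
\qquad Y_g = \mathrm{gen}(X, Y_s)
```

Under this reading, the discriminator maximizes V by scoring real pairs near 1 and generated pairs near 0. The generator minimizes V by moving Yg toward the region the discriminator accepts, which is what tightens the classification boundary over time.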
 
VirusShare Dataset

VirusShare: dynamic malware dataset provided by the UCI Machine Learning Repository (Dynamic Features of VirusShare Executables Data Set).
  • Malicious executables were run in Cuckoo Sandbox for 1 min, recording all system calls.
  • Each sample comes with a maliciousness score from Cuckoo Sandbox [1].
  • 107,888 samples, with maliciousness scores from 0.0 to 1.0.
  • 482 measurements (occurrence counts of system calls) per sample.

The generator trains on these 482 measurements, trying to recreate samples with system-call counts similar to those of the sample group it is trained on.

Figure (axis labels: Counts, Measurements): plot showing the generator successfully learning the normal system features.

[1] Huynh, Ngoc Anh, Wee Keong Ng, and Kanishka Ariyapala. "A new adaptive learning algorithm and its application to online malware detection." International Conference on Discovery Science. Springer, Cham, 2017.
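
To make "measured occurrences of system calls" concrete, here is a minimal sketch of turning a recorded call trace into a fixed-length count vector. The four-entry vocabulary and the trace are hypothetical and only illustrate the idea; the real dataset has 482 measurements per sample, and its exact feature definitions are not given in these slides.

```python
from collections import Counter


def count_features(trace, vocabulary):
    """Count how often each vocabulary entry appears in a call trace.

    Calls that are not in the vocabulary are ignored, so the result always
    has len(vocabulary) entries in vocabulary order.
    """
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]


# Hypothetical vocabulary and trace, for illustration only.
vocabulary = ["NtCreateFile", "NtReadFile", "NtWriteFile", "NtClose"]
trace = ["NtCreateFile", "NtReadFile", "NtReadFile", "NtClose", "RegOpenKeyExW"]

features = count_features(trace, vocabulary)
print(features)
```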
 
Designing the Experiment

Normal System (Proxy)
  • VirusShare lacks a clean dataset of calls made in a 1 min interval without any malware running.
  • Low-maliciousness samples (blue in the histogram), with scores from 0 to 0.25, are treated as a proxy for the normal system.
  • This group is split into a training set and a test set.
  • The training set is the set the generator tries to mimic.

Compromised System
  • For the first test set, only highly malicious samples, with scores from 0.75 to 1.0, are used to test the model.

Figure: histogram showing the distribution of samples from the VirusShare dataset by maliciousness score.
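
The grouping described above can be sketched as follows. The score thresholds (0.25 and 0.75) come from the slides; the inclusive boundaries, the 80/20 train/test split, the fixed seed, and the toy samples are assumptions, since the slides do not state them.

```python
import random


def split_by_maliciousness(samples, normal_max=0.25, malicious_min=0.75,
                           train_fraction=0.8, seed=0):
    """Split (features, score) pairs into train, test, and compromised sets.

    Samples scoring at most normal_max act as the normal-system proxy and
    are shuffled into train and test sets. Samples scoring at least
    malicious_min form the compromised test set. Everything in between
    is left out.
    """
    normal = [s for s in samples if s[1] <= normal_max]
    compromised = [s for s in samples if s[1] >= malicious_min]
    rng = random.Random(seed)
    rng.shuffle(normal)
    cut = int(len(normal) * train_fraction)
    return normal[:cut], normal[cut:], compromised


# Toy data: 21 samples with scores 0.00, 0.05, ..., 1.00 (not real VirusShare data).
toy = [([i, i + 1], i / 20) for i in range(21)]
train, test, compromised = split_by_maliciousness(toy)

print(len(train), len(test), len(compromised))
```

Only the normal-proxy training set is shown to the GAN; the compromised set is held back entirely for evaluation.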
 
Discriminator Performance

Validation used 1000 samples from the normal system group and 1000 samples from the malicious system group, with the discriminator model predicting their labels.

Score vs Maliciousness
  • As predicted, the model did very well at detecting malicious samples, labeling them as outside the group (0).
  • The normal-system test set was almost entirely classified as part of the system set (1), with a few false positives (0 instead of 1).

ROC Curve
  • The area under the ROC curve (AUC) was 0.993, meaning the model separated the two groups almost perfectly.

Figure annotations: True Neg, True Pos, False Neg.
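
For readers unfamiliar with the metric, the sketch below computes ROC AUC directly from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. Following the slide's labeling, 1 is taken to mean normal system and 0 to mean outside the group. The scores are made up for the example and are not the reported results.

```python
def roc_auc(scores, labels):
    """Return ROC AUC for binary labels (1 = positive, 0 = negative).

    Compares every positive score with every negative score; a higher
    positive counts as 1, a tie as 0.5. This is quadratic in the number
    of samples, which is fine for a couple of thousand samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up discriminator scores: the first four samples are normal (1),
# the last four are malicious (0).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(roc_auc(scores, labels))
```

An AUC of 1.0 would mean every normal sample outscores every malicious one; 0.5 would be no better than chance.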
 
Conclusion

Contributions
  • Proposed a minimax GAN.
      • The generator, as a proxy for malware, trains to find the boundary.
      • The discriminator is our anomaly detector.
      • Trains only on normal system behavior.
  • Verified the minimax GAN using real-world malware data.
      • Can detect unknown malware.
      • Corresponds well to the maliciousness score.
  • This is a more resilient method for defending against new and unknown malware attacks.

Future Work
  • Optimize the network architecture.
  • Provide interpretability for the discriminator.
  • Study different sets of malware data alongside benign system signal data.
 

This project focuses on detecting malware within signal/sensor systems using a Generative Adversarial Network (GAN) approach. By training on normal system behavior and generating fake malware-like samples, the system can effectively identify anomalies without relying on signature-based methods. The GAN framework helps in clustering normal behavior, detecting compromised systems, and reducing false positives, ultimately enhancing the security of signal systems against novel trojan malware.

  • Malware Detection
  • GAN
  • Signal Systems
  • Anomaly Detection
  • Cybersecurity

Uploaded on Oct 04, 2024



PA #: AFRL-2022-5644

