Automated Signature Extraction for High Volume Attacks in Cybersecurity

undefined
Automated Signature Extraction for High
Volume Attacks
Yehuda Afek
Anat Bremler-Barr
Shir Landau Feibish
This work is part of the Kabarnit–Cyber Consortium (2012-2014) under Magnet program, funded by the chief
scientist in the Israeli ministry of Industry, Trade and Labor.
This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.
Current DDoS Attack
2
High volume attacks - Current Defense
 
3
Defense
Line1
Defense
Line 2
Defense
Line n
Defense
Line 3
Many different types of attackers:
 
Call for
 HELP!!
Remaining attacks:
 Botnets (millions of computers)
 
Hard to identify behaviorally,
under the radar screen
 
Zero-day – no known signatures
Signature based DDoS Attack Detection
 
Unknown (zero-day) attacks:
Some hope:  Attack tools usually leave some unique footprint
(repeating pattern)
Example in packet:
 
Connection:  
KEEP-ALIVE
 
Today:  Find signatures manually (human eye)
 
Our goal:  Find it automatically
 
Signatures
 used by anti-DDoS devices and firewalls to stop
attack
Mitigation in minutes,  good enough for these types of attacks
4
Signatures also used in
NIDS/IPS (Snort, Bro, etc.)
Worm detection (automated extraction)
Previous work:
Worm behavior (address dispersion, suspicious code, etc.)
Fixed-length signatures
Non-scalable
Notable works:
Kephart et al ‘94
Honeycomb [Kreibich et al ’04]
Earlybird [Singh et al ‘04]
Autograph[Kim et al ’04]
Hancock[Griffin et al ’09]
5
 
System Overview
 
Our Challenge: 
 Automatically find signatures that appear
frequently only during attack
 
 Where:
Input collection:
In mitigation box (DDoS Guard/firewall/anti-DDoS etc.)
In the cloud 
– collect data from several collectors.
 
6
Signature
Extraction
Attack time traffic sample
Peace time traffic sample
Attack signatures
e.g. 
Connection:  
KEEP-ALIVE
 
Signature Extraction - High Level
7
Attack time
traffic sample
Peace time
traffic sample
Attack signatures
e.g. 
Connection:  
KEEP-ALIVE
Signature Extraction
Find frequent
strings in 
attack
time traffic
Find frequent
strings in 
peace
time traffic
Take only
strings
found in
attack 
and
not in
peace
Our Goal
 
Automatically find signatures that appear frequently only
during attack
 
Requirements:
1.
Find minimal set of signatures
Some filtering devices have limited capacity
2.
Allow signatures of varying lengths
3.
Don’t include signatures found in legitimate traffic
Minimum false positives
4.
Minimize space and time usage
Large amounts of data
Quick response
 
8
Finding Frequent Strings in Traffic
 
Input:  Sequence of packets
Output:  Strings that appear frequently in packets
 
 
 
 
 
 
 
Common Stringology solution: use suffix trees/arrays
too much space
Our solution uses heavy hitters
9
Attack time
traffic sample
Peace time
traffic sample
Attack signatures
e.g. 
Connection:  
KEEP-ALIVE
Find frequent
strings in 
attack
time traffic
Find frequent
strings in 
peace
time traffic
Take only
strings
found in
attack 
and
not in
peace
Heavy Hitters (Frequent Items)
Input:  N values, integer v
Output:   v values each appearing at least N/v times
Approximate solution:
Uses O(v) space!
One pass over input!
Known counter based HH Algorithms:
Misra & Gries 1982
Lossy Counting – Monku and Motwani 2002
Space saving - Metwally et al 2005 – currently using
10
10
Space saving Heavy Hitters
[Metwally et al 2005]
Algorithm:
Maintain v values, and their counters.
11
11
Space saving Heavy Hitters
[Metwally et al 2005]
Algorithm:
Maintain v values, and their counters.
If next value x is one of the v, increment its counter.
12
12
Space saving Heavy Hitters
[Metwally et al 2005]
Algorithm:
Maintain v values, and their counters.
If next value x is one of the v, increment its counter.
Else take item with minimal counter c:
Replace value with x
New counter is c+1
Error rate: N/v
13
13
Our Solution
 
Heavy hitters usually done on numbers… how do we use it for
text?
 
k-grams: strings of length exactly k
 
Trivial idea: For each packet:
Take all k-grams (sliding window)
Do Heavy hitters on them
 
Fixed length not good enough
Either too short:  cuts up longer signatures
Substring pollution - Too many heavy hitters for one signature
Or too long : noisy signatures
 
14
14
abcabcadefgfsdghjghnfdghfgsdhfjs
b
1
=abca
b
2
 = bcab
b
3
 = cabc
k-grams
Our Solution: Double Heavy Hitters
Double Heavy Hitters algorithm:  two separate instances of
heavy hitters
Heavy Hitters 1: Find heavy hitters of k-grams
Heavy Hitters 2: Find heavy hitters of varying-length strings created
during run of Heavy Hitters 1
15
15
Heavy Hitters
1
k
k
….
k
k
k
k
string
k
k
Heavy Hitters
2
string
string
string
string
Input to Heavy
Hitters 1: k-grams
Input to Heavy
Hitters 2: strings
Output is
output of
Heavy
Hitters 2
Double Heavy Hitters Algorithm
While processing k-grams in Heavy Hitters1
Find max run of k-grams:
Already in Heavy Hitters 1
Counters of consecutive k-grams maintain predefined ratio
Create string
Insert into Heavy Hitters 2
16
16
abca
cabc
bcab
k-grams:
Is already
in Heavy
Hitters 1?
N
Y
Y
N
N
Y
Y
N
N
N
abca
abcabc
Check ratio
abca
cabc
bcab
abcd
bcda
cdab
dabc
abca
N
Double Heavy Hitters Algorithm
Example:
17
17
abcabcabcd
Input:
Double Heavy Hitters Algorithm
Example:
18
18
String =
abca
abcabcabcd
Input:
Double Heavy Hitters Algorithm
Example:
19
19
String =
abca
b
abcabcabcd
Input:
Double Heavy Hitters Algorithm
Example:
20
20
String =
abcab
c
abcabcabcd
Input:
Double Heavy Hitters Algorithm
Example:
21
21
String =
abcabc
abcabcabcd
Input:
Heavy Hitters on text – improving the
estimation
 
Problem: substrings in heavy hitters
Only longest run is in input to HH2
 
Correct the count:
After run of algorithm
For all strings s in Heavy Hitters 2:
Find other strings which contain s and add their counters to s’s counter
22
22
Double Heavy Hitters Algorithm Analysis
Input:
Input to HH
1
: N k-grams
Input to HH
2
: C consecutive grams
Error bounds:
For HH
1
 with v items: N/v
For HH
2
 with v items: C/v
We Prove:
C ≤ N/(k + 1)
Overall: Error bound of the Double Heavy Hitters algorithm
23
23
 
 
Signature Extraction - High Level
24
24
Formalize with
thresholds
Attack time
traffic sample
Peace time
traffic sample
Attack signatures
e.g. 
Connection:  
keep-ALIVE
Signature Extraction
Find frequent
strings in 
attack
time traffic
Find frequent
strings in 
peace
time traffic
Take only
strings
found in
attack 
and
not in
peace
Chose Signatures
Create signatures that 
never 
appear in legitimate traffic
25
25
 
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Chose Signatures
Create signatures that 
never
 appear in legitimate traffic
26
26
 
Signatures
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
False
positives
Chose Signatures
Create signatures that 
rarely
 appear in legitimate traffic
27
27
 
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Signatures
False
positives
Chose Signatures
Create signatures that may appear in legitimate traffic, but appear in attack
traffic much more
28
28
 
Strings in attack
with frequency
> Attack-High
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Signatures only if attack
frequency at least delta
more than peace frequency
False
positives
Signatures
frequency >
Peace-high
Use peace traffic to create filters
29
29
abcabcadefgfsdghjghnfdghfg......
b
1
=abca
b
2
 = bcab
b
3
 = cabc
Output
values
Peace time traffic packets payload:
White list
Maybe white
list
Not white list
Use our Double Heavy Hitters algorithm on 
peace time 
traffic:
0
%
1
0
0
%
5
0
%
Peace-high
Peace-low
frequency >
Peace-Low
frequency >
Peace-high
Extracting Attack Signatures
30
30
hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas
b
1
=hagd
b
2
 = agdh
b
3
 = gdhd
string
Output
values
Signatures
Attack traffic packets payload:
White list:
discard if
contained in
whitelist string
Maybe white list:
Now use Double Heavy Hitters algorithm on 
attack time 
traffic
with filters
Modified DHH
Evaluations
31
31
Overall eleven tests:
 Ten real attack captures
5 captures of peacetime traffic
5 
synthetic
 peacetime captures
One Synthetic attack in real peace
 
time traffic
Compare to human expert
Sample Signatures
32
32
Extra newline between header fields
Use of upper-case characters, where usually lower
Use of a rarely used HTTP field
Use of rare user agent.
 
Could not
be identified
manually
Results – Accuracy of Double Heavy Hitters
estimation
Graph of frequency of signatures
RED – Actual count (frequency) in attack traffic
BLUE – Algorithm (DHH) estimation of frequency of signatures
33
33
Percent
Signatures
Results - Attack Rate Estimation
34
34
Attack rate
Test Number
Tests with real peace time traffic
Tests with synthetic peace time traffic
Results – Recall and Precision Estimation
35
35
Tests with real
peace time traffic
Tests with synthetic
peace time traffic
Percent
Test Number
Precision: 
relevant packets from all identified
Recall: 
identified packets from
all relevant
Average: 99.96
Worst case: 99.8
Future Work
Identify signatures always found in same packets
Good synthetic peace-time traffic, global white-list
Support regular expression signatures
36
36
37
37
 
 
 
Slide Note
Embed
Share

This research delves into automated signature extraction for high-volume attacks in cybersecurity, specifically focusing on defending against Distributed Denial of Service (DDoS) attacks. The study discusses the challenges posed by sophisticated attackers using botnets and zero-day attacks, emphasizing the importance of rapidly identifying and mitigating attacks through automated signature detection. Various defense strategies and technologies are explored, highlighting the role of signatures in DDoS attack detection and prevention. The system overview provides insights into the collection and analysis of attack signatures in real-time to enhance cybersecurity defense mechanisms.

  • Cybersecurity
  • DDoS Attacks
  • Automated Signature Extraction
  • Defense Strategies
  • Botnets

Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Automated Signature Extraction for High Volume Attacks YehudaAfek Anat Bremler-Barr Shir Landau Feibish This work is part of the Kabarnit Cyber Consortium (2012-2014) under Magnet program, funded by the chief scientist in the Israeli ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

  2. Current DDoSAttack Zombies on innocent computers Infrastructure- level DDoS attacks Server-level DDoS attacks Bandwidth-level DDoS attacks 2

  3. High volume attacks - Current Defense Many different types of attackers: Remaining attacks: Botnets (millions of computers) Hard to identify behaviorally, under the radar screen Zero-day no known signatures Defense Line1 Defense Line 2 Defense Line 3 Defense Line n Call for HELP!! access control list filtering SYN cookies, Challenge- response behavioral analysis 3

  4. Signature based DDoSAttack Detection Unknown (zero-day) attacks: Some hope: Attack tools usually leave some unique footprint (repeating pattern) Example in packet: Connection: KEEP-ALIVE Today: Find signatures manually (human eye) Our goal: Find it automatically Signatures used by anti-DDoS devices and firewalls to stop attack Mitigation in minutes, good enough for these types of attacks 4

  5. Signatures also used in NIDS/IPS (Snort, Bro, etc.) Worm detection (automated extraction) Previous work: Worm behavior (address dispersion, suspicious code, etc.) Fixed-length signatures Non-scalable Notable works: Kephart et al 94 Honeycomb [Kreibich et al 04] Earlybird [Singh et al 04] Autograph[Kim et al 04] Hancock[Griffin et al 09] 5

  6. System Overview Peace time traffic sample Signature Extraction Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Our Challenge: Automatically find signatures that appear frequently only during attack Where: Input collection: In mitigation box (DDoS Guard/firewall/anti-DDoS etc.) In the cloud collect data from several collectors. 6

  7. Signature Extraction - High Level Signature Extraction Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Find frequent strings in attack time traffic 7

  8. Our Goal Automatically find signatures that appear frequently only during attack Requirements: 1. Find minimal set of signatures Some filtering devices have limited capacity 2. Allow signatures of varying lengths 3. Don t include signatures found in legitimate traffic Minimum false positives 4. Minimize space and time usage Large amounts of data Quick response 8

  9. Finding Frequent Strings in Traffic Input: Sequence of packets Output: Strings that appear frequently in packets Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Find frequent strings in attack time traffic Common Stringology solution: use suffix trees/arrays too much space Our solution uses heavy hitters 9

  10. Heavy Hitters (Frequent Items) Input: N values, integer v Output: v values each appearing at least N/v times Approximate solution: Uses O(v) space! One pass over input! Known counter based HH Algorithms: Misra & Gries 1982 Lossy Counting Monku and Motwani 2002 Space saving - Metwally et al 2005 currently using 10

  11. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. Input 10 22 30 10 35 50 value 10 22 30 counter 1 1 1 11

  12. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. If next value x is one of the v, increment its counter. Input 10 22 30 10 35 50 value 10 22 30 counter 2 1 1 12

  13. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. If next value x is one of the v, increment its counter. Else take item with minimal counter c: Replace value with x New counter is c+1 Input 10 22 30 10 35 50 value 10 35 30 counter 2 2 1 Error rate: N/v 13

  14. Our Solution Heavy hitters usually done on numbers how do we use it for text? abcabcadefgfsdghjghnfdghfgsdhfjs k-grams: strings of length exactly k b1=abca k-grams b2= bcab Trivial idea: For each packet: Take all k-grams (sliding window) Do Heavy hitters on them b3= cabc Fixed length not good enough Either too short: cuts up longer signatures Substring pollution -Too many heavy hitters for one signature Or too long : noisy signatures 14

  15. Our Solution: Double Heavy Hitters Double Heavy Hitters algorithm: two separate instances of heavy hitters Heavy Hitters 1: Find heavy hitters of k-grams Heavy Hitters 2: Find heavy hitters of varying-length strings created during run of Heavy Hitters 1 Heavy Hitters 1 Heavy Hitters 2 Input to Heavy Hitters 1: k-grams Input to Heavy Hitters 2: strings Output is output of Heavy Hitters 2 k k . k k k string string string k string k k string 15

  16. Double Heavy Hitters Algorithm While processing k-grams in Heavy Hitters1 Find max run of k-grams: Already in Heavy Hitters 1 Counters of consecutive k-grams maintain predefined ratio Create string Insert into Heavy Hitters 2 k-grams: bcab abca cabc dabc bcab abca cabc abcd cdab abca bcda Is already in Heavy Hitters 1? N N N Y Y Y N N N N Y abca abcabc Check ratio 16

  17. Double Heavy Hitters Algorithm Example: Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 1 1 1 counter 0 0 0 17

  18. Double Heavy Hitters Algorithm Example: String = abca Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 1 1 counter 0 0 0 18

  19. Double Heavy Hitters Algorithm Example: String = abcab Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b7 K-gram abca bcab cabc abca bcab cabc abcd Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 2 1 counter 0 0 0 19

  20. Double Heavy Hitters Algorithm Example: String = abcabc Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 2 2 counter 0 0 0 20

  21. Double Heavy Hitters Algorithm Example: String = abcabc Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abcd bcab cabc Heavy Hitters 2 string abcabc NULL NULL counter 3 2 2 counter 1 0 0 21

  22. Heavy Hitters on text improving the estimation Problem: substrings in heavy hitters Only longest run is in input to HH2 Heavy Hitters 2 string wonder woman wonderwoman counter 200 300 100 Correct the count: After run of algorithm For all strings s in Heavy Hitters 2: Find other strings which contain s and add their counters to s s counter Heavy Hitters 2 string counter Real counter wonder 200 300 woman 300 400 wonderwoman 100 100 22

  23. Double Heavy Hitters Algorithm Analysis Input: Input to HH1: N k-grams Input to HH2: C consecutive grams Error bounds: For HH1with v items: N/v For HH2with v items: C/v We Prove: C N/(k + 1) Overall: Error bound of the Double Heavy Hitters algorithm 23

  24. Signature Extraction - High Level Signature Extraction Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: keep-ALIVE Attack time traffic sample Find frequent strings in attack time traffic Formalize with thresholds 24

  25. Chose Signatures Create signatures that never appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High 25

  26. Chose Signatures Create signatures that never appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High Strings in peace time Signatures False positives 26

  27. Chose Signatures Create signatures that rarely appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High Strings in peace with frequency > Peace-Low Signatures False positives 27

  28. Chose Signatures Create signatures that may appear in legitimate traffic, but appear in attack traffic much more Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High frequency > Peace-high frequency > Peace-Low Signatures Signatures only if attack frequency at least delta more than peace frequency False positives 28

  29. Use peace traffic to create filters Use our Double Heavy Hitters algorithm on peace time traffic: 100% Peace time traffic packets payload: White list frequency > Peace-high abcabcadefgfsdghjghnfdghfg...... b2= bcab b3= cabc b1=abca Peace-high Double Heavy Hitters Algorithm Maybe white list Output values 50% frequency > Peace-high frequency > Peace-Low Peace-low Not white list 0% 29

  30. Extracting Attack Signatures Now use Double Heavy Hitters algorithm on attack time traffic with filters Attack traffic packets payload: hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas b3= gdhd b2= agdh b1=hagd Heavy Hitters 1 Heavy Hitters 2 Output values Signatures string Maybe white list: White list: discard if contained in whitelist string frequency > Attack- High Modified DHH 30

  31. Evaluations Overall eleven tests: Ten real attack captures 5 captures of peacetime traffic 5 synthetic peacetime captures One Synthetic attack in real peace time traffic Compare to human expert 31

  32. Could not be identified manually Sample Signatures Extra newline between header fields Use of upper-case characters, where usually lower Use of a rarely used HTTP field Use of rare user agent. 32

  33. Results Accuracy of Double Heavy Hitters estimation Algorithm (DHH) Actual Count (frequency) 100 90 80 70 Percent 60 50 40 30 20 10 0 1 5 9 13 17 21 25 29 33 37 Signatures Graph of frequency of signatures RED Actual count (frequency) in attack traffic BLUE Algorithm (DHH) estimation of frequency of signatures 33

  34. Results -Attack Rate Estimation Tests with synthetic peace time traffic Tests with real peace time traffic 100 90 80 70 Attack rate 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 Human Expert Our Algorithm Test Number 34

  35. Results Recall and Precision Estimation Precision: relevant packets from all identified Tests with real peace time traffic Tests with synthetic peace time traffic 100 Recall: identified packets from all relevant Average: 99.96 Worst case: 99.8 90 80 70 Percent 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 Peacetime based Attack based Test Number 35

  36. Future Work Identify signatures always found in same packets Good synthetic peace-time traffic, global white-list Support regular expression signatures 36

  37. 37

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#