Automated Signature Extraction for High Volume Attacks in Cybersecurity

Slide Note
Embed
Share

This research delves into automated signature extraction for high-volume attacks in cybersecurity, specifically focusing on defending against Distributed Denial of Service (DDoS) attacks. The study discusses the challenges posed by sophisticated attackers using botnets and zero-day attacks, emphasizing the importance of rapidly identifying and mitigating attacks through automated signature detection. Various defense strategies and technologies are explored, highlighting the role of signatures in DDoS attack detection and prevention. The system overview provides insights into the collection and analysis of attack signatures in real-time to enhance cybersecurity defense mechanisms.


Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Automated Signature Extraction for High Volume Attacks YehudaAfek Anat Bremler-Barr Shir Landau Feibish This work is part of the Kabarnit Cyber Consortium (2012-2014) under Magnet program, funded by the chief scientist in the Israeli ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

  2. Current DDoSAttack Zombies on innocent computers Infrastructure- level DDoS attacks Server-level DDoS attacks Bandwidth-level DDoS attacks 2

  3. High volume attacks - Current Defense Many different types of attackers: Remaining attacks: Botnets (millions of computers) Hard to identify behaviorally, under the radar screen Zero-day no known signatures Defense Line1 Defense Line 2 Defense Line 3 Defense Line n Call for HELP!! access control list filtering SYN cookies, Challenge- response behavioral analysis 3

  4. Signature based DDoSAttack Detection Unknown (zero-day) attacks: Some hope: Attack tools usually leave some unique footprint (repeating pattern) Example in packet: Connection: KEEP-ALIVE Today: Find signatures manually (human eye) Our goal: Find it automatically Signatures used by anti-DDoS devices and firewalls to stop attack Mitigation in minutes, good enough for these types of attacks 4

  5. Signatures also used in NIDS/IPS (Snort, Bro, etc.) Worm detection (automated extraction) Previous work: Worm behavior (address dispersion, suspicious code, etc.) Fixed-length signatures Non-scalable Notable works: Kephart et al 94 Honeycomb [Kreibich et al 04] Earlybird [Singh et al 04] Autograph[Kim et al 04] Hancock[Griffin et al 09] 5

  6. System Overview Peace time traffic sample Signature Extraction Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Our Challenge: Automatically find signatures that appear frequently only during attack Where: Input collection: In mitigation box (DDoS Guard/firewall/anti-DDoS etc.) In the cloud collect data from several collectors. 6

  7. Signature Extraction - High Level Signature Extraction Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Find frequent strings in attack time traffic 7

  8. Our Goal Automatically find signatures that appear frequently only during attack Requirements: 1. Find minimal set of signatures Some filtering devices have limited capacity 2. Allow signatures of varying lengths 3. Don t include signatures found in legitimate traffic Minimum false positives 4. Minimize space and time usage Large amounts of data Quick response 8

  9. Finding Frequent Strings in Traffic Input: Sequence of packets Output: Strings that appear frequently in packets Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: KEEP-ALIVE Attack time traffic sample Find frequent strings in attack time traffic Common Stringology solution: use suffix trees/arrays too much space Our solution uses heavy hitters 9

  10. Heavy Hitters (Frequent Items) Input: N values, integer v Output: v values each appearing at least N/v times Approximate solution: Uses O(v) space! One pass over input! Known counter based HH Algorithms: Misra & Gries 1982 Lossy Counting Monku and Motwani 2002 Space saving - Metwally et al 2005 currently using 10

  11. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. Input 10 22 30 10 35 50 value 10 22 30 counter 1 1 1 11

  12. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. If next value x is one of the v, increment its counter. Input 10 22 30 10 35 50 value 10 22 30 counter 2 1 1 12

  13. Space saving Heavy Hitters [Metwally et al 2005] Algorithm: Maintain v values, and their counters. If next value x is one of the v, increment its counter. Else take item with minimal counter c: Replace value with x New counter is c+1 Input 10 22 30 10 35 50 value 10 35 30 counter 2 2 1 Error rate: N/v 13

  14. Our Solution Heavy hitters usually done on numbers how do we use it for text? abcabcadefgfsdghjghnfdghfgsdhfjs k-grams: strings of length exactly k b1=abca k-grams b2= bcab Trivial idea: For each packet: Take all k-grams (sliding window) Do Heavy hitters on them b3= cabc Fixed length not good enough Either too short: cuts up longer signatures Substring pollution -Too many heavy hitters for one signature Or too long : noisy signatures 14

  15. Our Solution: Double Heavy Hitters Double Heavy Hitters algorithm: two separate instances of heavy hitters Heavy Hitters 1: Find heavy hitters of k-grams Heavy Hitters 2: Find heavy hitters of varying-length strings created during run of Heavy Hitters 1 Heavy Hitters 1 Heavy Hitters 2 Input to Heavy Hitters 1: k-grams Input to Heavy Hitters 2: strings Output is output of Heavy Hitters 2 k k . k k k string string string k string k k string 15

  16. Double Heavy Hitters Algorithm While processing k-grams in Heavy Hitters1 Find max run of k-grams: Already in Heavy Hitters 1 Counters of consecutive k-grams maintain predefined ratio Create string Insert into Heavy Hitters 2 k-grams: bcab abca cabc dabc bcab abca cabc abcd cdab abca bcda Is already in Heavy Hitters 1? N N N Y Y Y N N N N Y abca abcabc Check ratio 16

  17. Double Heavy Hitters Algorithm Example: Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 1 1 1 counter 0 0 0 17

  18. Double Heavy Hitters Algorithm Example: String = abca Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 1 1 counter 0 0 0 18

  19. Double Heavy Hitters Algorithm Example: String = abcab Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b7 K-gram abca bcab cabc abca bcab cabc abcd Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 2 1 counter 0 0 0 19

  20. Double Heavy Hitters Algorithm Example: String = abcabc Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abca bcab cabc Heavy Hitters 2 string NULL NULL NULL counter 2 2 2 counter 0 0 0 20

  21. Double Heavy Hitters Algorithm Example: String = abcabc Input: abcabcabcd bi b1 b2 b3 b4 b5 b6 b6 b7 bi b1 b2 b3 b4 b5 K-gram abca bcab cabc abca bcab cabc cabc abcd K-gram abca bcab cabc abca bcab Heavy Hitters 1 K-gram abcd bcab cabc Heavy Hitters 2 string abcabc NULL NULL counter 3 2 2 counter 1 0 0 21

  22. Heavy Hitters on text improving the estimation Problem: substrings in heavy hitters Only longest run is in input to HH2 Heavy Hitters 2 string wonder woman wonderwoman counter 200 300 100 Correct the count: After run of algorithm For all strings s in Heavy Hitters 2: Find other strings which contain s and add their counters to s s counter Heavy Hitters 2 string counter Real counter wonder 200 300 woman 300 400 wonderwoman 100 100 22

  23. Double Heavy Hitters Algorithm Analysis Input: Input to HH1: N k-grams Input to HH2: C consecutive grams Error bounds: For HH1with v items: N/v For HH2with v items: C/v We Prove: C N/(k + 1) Overall: Error bound of the Double Heavy Hitters algorithm 23

  24. Signature Extraction - High Level Signature Extraction Find frequent strings in peace time traffic Take only strings found in attack and not in peace Peace time traffic sample Attack signatures e.g. Connection: keep-ALIVE Attack time traffic sample Find frequent strings in attack time traffic Formalize with thresholds 24

  25. Chose Signatures Create signatures that never appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High 25

  26. Chose Signatures Create signatures that never appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High Strings in peace time Signatures False positives 26

  27. Chose Signatures Create signatures that rarely appear in legitimate traffic Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High Strings in peace with frequency > Peace-Low Signatures False positives 27

  28. Chose Signatures Create signatures that may appear in legitimate traffic, but appear in attack traffic much more Thresholds: Attack-high Peace-low Peace-high Delta Strings in attack with frequency > Attack-High frequency > Peace-high frequency > Peace-Low Signatures Signatures only if attack frequency at least delta more than peace frequency False positives 28

  29. Use peace traffic to create filters Use our Double Heavy Hitters algorithm on peace time traffic: 100% Peace time traffic packets payload: White list frequency > Peace-high abcabcadefgfsdghjghnfdghfg...... b2= bcab b3= cabc b1=abca Peace-high Double Heavy Hitters Algorithm Maybe white list Output values 50% frequency > Peace-high frequency > Peace-Low Peace-low Not white list 0% 29

  30. Extracting Attack Signatures Now use Double Heavy Hitters algorithm on attack time traffic with filters Attack traffic packets payload: hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas b3= gdhd b2= agdh b1=hagd Heavy Hitters 1 Heavy Hitters 2 Output values Signatures string Maybe white list: White list: discard if contained in whitelist string frequency > Attack- High Modified DHH 30

  31. Evaluations Overall eleven tests: Ten real attack captures 5 captures of peacetime traffic 5 synthetic peacetime captures One Synthetic attack in real peace time traffic Compare to human expert 31

  32. Could not be identified manually Sample Signatures Extra newline between header fields Use of upper-case characters, where usually lower Use of a rarely used HTTP field Use of rare user agent. 32

  33. Results Accuracy of Double Heavy Hitters estimation Algorithm (DHH) Actual Count (frequency) 100 90 80 70 Percent 60 50 40 30 20 10 0 1 5 9 13 17 21 25 29 33 37 Signatures Graph of frequency of signatures RED Actual count (frequency) in attack traffic BLUE Algorithm (DHH) estimation of frequency of signatures 33

  34. Results -Attack Rate Estimation Tests with synthetic peace time traffic Tests with real peace time traffic 100 90 80 70 Attack rate 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 Human Expert Our Algorithm Test Number 34

  35. Results Recall and Precision Estimation Precision: relevant packets from all identified Tests with real peace time traffic Tests with synthetic peace time traffic 100 Recall: identified packets from all relevant Average: 99.96 Worst case: 99.8 90 80 70 Percent 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 Peacetime based Attack based Test Number 35

  36. Future Work Identify signatures always found in same packets Good synthetic peace-time traffic, global white-list Support regular expression signatures 36

  37. 37

Related