LD-Sketch: Distributed Sketching Design for Anomaly Detection in Network Data Streams

1
L
D
-
S
k
e
t
c
h
:
 
A
 
D
i
s
t
r
i
b
u
t
e
d
 
S
k
e
t
c
h
i
n
g
 
D
e
s
i
g
n
f
o
r
 
A
c
c
u
r
a
t
e
 
a
n
d
 
S
c
a
l
a
b
l
e
 
A
n
o
m
a
l
y
D
e
t
e
c
t
i
o
n
 
i
n
 
N
e
t
w
o
r
k
 
D
a
t
a
 
S
t
r
e
a
m
s
Qun Huang
 and Patrick P. C. Lee
The Chinese University of Hong Kong, Hong Kong
INFOCOM’14
Network traffic: a stream of (key, value) tuples
Keys:
 src IPs, five-tuple flows
Value:
 # of packets, payload bytes
Heavy keys
 - classical anomalies in network traffic
Heavy hitters:
 keys with large volume in one period
e.g. SLA violation
Heavy changers:
 keys with large volume change across two
periods
e.g. DoS attacks, component failures
G
o
a
l
:
identify heavy keys in 
real time
M
o
t
i
v
a
t
i
o
n
2
C
h
a
l
l
e
n
g
e
s
3
R
e
l
a
t
e
d
 
W
o
r
k
s
Counter-based techniques
Misra-Gries algorithm
 [Misra & Gries 82]; 
Lossy Counting
 [Manku et al.
02]; 
Space Saving
 [Metwally et al. 05]; 
Probalistic Lossy Count
[Dimitropoulos et al. 08]
Only address for heavy hitter detection in single machine
Sketch-based techniques
Multi-stage filter
 [Estan et al. 03]; 
CGT
 [Cormode et al. 04]; 
Reversible
Sketch 
[Schweller et al. 06]; 
SeqHash
 [Tian et al. 07]; 
Fast Sketch 
[Liu
et al. 12]
Only work in single machine
Distributed detection
[Cormode et al. 2005]
[Manjhi et al. 2005]
[Yi et al. 2009]
Only address heavy hitter detection
4
O
u
r
 
W
o
r
k
 
5
 
A sketch technique for 
local detection
High accuracy
High speed
Low space complexity
A 
distributed detection
 scheme not only achieves
scalability 
but also improves 
accuracy
Experiments on real-world traces
P
r
o
b
l
e
m
 
F
o
r
m
u
l
a
t
i
o
n
6
A
r
c
h
i
t
e
c
t
u
r
e
7
 
Data
source
 
Data
source
 
Data
source
 
Data
source
 
Data
source
 
Local
detection
results
Final detection results
Distributed
detection
L
o
c
a
l
 
D
e
t
e
c
t
i
o
n
8
 
Update phase
 
Examine the buckets and
report heavy keys
 
Detection phase
 
LD-Sketch
I
n
s
i
d
e
 
a
 
B
u
c
k
e
t
9
10
Four cases
D
e
c
r
e
m
e
n
t
 
K
e
y
s
11
 
Step 1
D
e
c
r
e
m
e
n
t
 
K
e
y
s
12
Step 3
 
Step 2
D
y
n
a
m
i
c
 
E
x
p
a
n
s
i
o
n
13
E
s
t
i
m
a
t
e
 
T
r
u
e
 
S
u
m
 
o
r
 
C
h
a
n
g
e
14
I
d
e
n
t
i
f
y
 
H
e
a
v
y
 
K
e
y
15
Key point:
 consider keys tracked by buckets
Enumerate all buckets
A
n
a
l
y
s
i
s
16
D
i
s
t
r
i
b
u
t
e
d
 
D
e
t
e
c
t
i
o
n
17
 
Remote
site
Local detection
results
Final detection results
G
o
a
l
S
c
a
l
a
b
i
l
i
t
y
:
 
r
e
d
u
c
e
c
o
m
p
l
e
x
i
t
y
A
c
c
u
r
a
c
y
:
 
r
e
d
u
c
e
f
a
l
s
e
 
p
o
s
i
t
i
v
e
 
r
a
t
e
R
e
m
o
t
e
 
S
i
t
e
How to partition data
streams
F
i
n
a
l
 
r
e
s
u
l
t
s
How to aggregate
local detection results
R
e
m
o
t
e
 
S
i
t
e
s
18
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
D
e
t
e
c
t
i
o
n
 
a
n
d
 
A
g
g
r
e
g
a
t
i
o
n
19
A
n
a
l
y
s
i
s
20
E
x
p
e
r
i
m
e
n
t
a
l
 
R
e
s
u
l
t
s
21
A
c
c
u
r
a
c
y
 
o
f
 
L
o
c
a
l
 
D
e
t
e
c
t
i
o
n
:
H
e
a
v
y
 
C
h
a
n
g
e
r
22
LD-Sketch achieves 100% recall
LD-Sketch has a little lower precision than CGT and
Seqhash, but we can improve with distributed detection
A
c
c
u
r
a
c
y
 
o
f
 
D
i
s
t
r
i
b
u
t
e
d
D
e
t
e
c
t
i
o
n
:
 
H
e
a
v
y
 
C
h
a
n
g
e
r
23
T
h
r
o
u
g
h
p
u
t
24
LD-Sketch has a little lower throughput than CGT and
Fast Sketch in local detection
LD-Sketch can scale linearly in distributed detection
Local detection
Distributed detection
C
o
n
c
l
u
s
i
o
n
s
25
Propose LD-Sketch, a sketching approach for real-time
heavy key detection in a distributed architecture
Composed of local detection and distributed detection
Propose a sketch structure for local detection
High accuracy
Low complexity in space and time
Seamlessly deployed in distributed architecture
Propose a distributed detection scheme
Reduce complexity
Improve accuracy
Slide Note
Embed
Share

LD-Sketch is a novel distributed sketching design for accurate and scalable anomaly detection in network data streams, addressing challenges such as tracking heavy keys in real-time across a vast key space. By combining high accuracy, speed, and low space complexity, LD-Sketch enables efficient heavy key detection in a distributed architecture. The work presents a solution for detecting heavy hitters and changers, enhancing both scalability and detection accuracy through experiments on real-world traces.

  • Anomaly Detection
  • Network Data Streams
  • Distributed Sketching
  • Heavy Keys
  • Scalability

Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams Qun Huang and Patrick P. C. Lee The Chinese University of Hong Kong, Hong Kong INFOCOM 14 1

  2. Motivation Network traffic: a stream of (key, value) tuples Keys: src IPs, five-tuple flows Value: # of packets, payload bytes Heavy keys - classical anomalies in network traffic Heavy hitters: keys with large volume in one period e.g. SLA violation Heavy changers: keys with large volume change across two periods e.g. DoS attacks, component failures Goal: identify heavy keys in real time 2

  3. Challenges Enormous key space e.g., 5-tuple IPv4 flows are drawn from key domain of size 2104 Per-key tracking is infeasible Line-rate processing Single machine fails to keep pace with line rate Seamless distributed detection Apply single-machine detection in distributed architecture Open issue: How to achieve both scalability and accuracy ? 3

  4. Related Works Counter-based techniques Misra-Gries algorithm [Misra & Gries 82]; Lossy Counting [Manku et al. 02]; Space Saving [Metwally et al. 05]; Probalistic Lossy Count [Dimitropoulos et al. 08] Only address for heavy hitter detection in single machine Sketch-based techniques Multi-stage filter [Estan et al. 03]; CGT [Cormode et al. 04]; Reversible Sketch [Schweller et al. 06]; SeqHash [Tian et al. 07]; Fast Sketch [Liu et al. 12] Only work in single machine Distributed detection [Cormode et al. 2005] [Manjhi et al. 2005] [Yi et al. 2009] Only address heavy hitter detection 4

  5. Our Work LD-Sketch: a new sketching design for heavy key detection in a distributed architecture A sketch technique for local detection High accuracy High speed Low space complexity A distributed detection scheme not only achieves scalability but also improves accuracy Experiments on real-world traces 5

  6. Problem Formulation Perform detection in each time period (epoch) Input data: a stream (key, value) tuple (?,??) True sum ?(?): sum of values of key ? in the time period True change ?(?): absolute value of difference of ?(?) in current and last epochs Heavy hitters: all ? with ? ? ? Heavy changers: all ? with ? ? ? Problem: infeasible to track ? ? and ? ? in real-time with limited memory 6

  7. Architecture Data source Data source Data source Data source Data source Remote site Remote site Remote site Remote site Remote site Worker Worker Worker Distributed detection Local detection Local detection Local detection Local detection results Final detection results 7

  8. Local Detection ? buckets LD-Sketch Structure of ?rows, with ? buckets each 1 Update phase For each data item ?,?? select a bucket for row i by hashing key ? with function h?? update the bucket with the data item (?,??) Detection phase key ? 2 ? rows ? Examine the buckets and report heavy keys 8

  9. Inside a Bucket Basic ideas Track significant keys in a bucket with array ??,? Increment length ??,? based of total sum ??,? and parameter ? Record error ??,? due to dropping insignificant keys Bucket (?,?) Expansion parameter ? Total sum: ??? 1 ????? ??? 2 ????? Array ??,? ??,? length: ??,? Error: ????? ??,? 9

  10. Update Bucket with (?,??) Four cases Case 1: ? ??,? Update ??,? directly: ??,?? ??,?? + ?? Case 2: ? ??,? but ??,? has empty slots Insert key ? into ??,?, and set ??,?= ?? Cases 3 & 4: ? ??,?, ??,? is full ??,? ? Expansion number ? = Based on ? and ??,?: Case 3: decrement keys in ??,? Case 4: expand ??,?dynamically 10

  11. Decrement Keys Case 3: ? + 1 ? + 2 1 ??,? Example y 5 ??,? Bucket ?,? ? = 0 ??,?= 1 ??,?= 2 Step 1 New data item (?,??) ? ? 3,?? ??= 3 5,?? ??= 5 5,?? ??= 8 ? = Procedure Step 1: calculate decrement value ? = min(??, min ? ??,???,?[?]) 11

  12. Decrement Keys Procedure (cont.) Step 2: Update ??,? ??,?+ ? Step 3: Update ??,?: ??,?? ??,?? ?, for all y ??,? Remove all ? with ??,?? = 0 Insert key ? with ??,?? = ?? ? if ??> ? Step 2 5,?? ??= 3 7,?? ??= 5 7,?? ??= 8 ??,?= y 2 After ??= 3 Step 3 y 5 empty Before After ??= 5 x 3 After ??= 8 12

  13. Dynamic Expansion Case 4: ? + 1 ? + 2 1 > ??,? Add ? + 1 ? + 2 1 ??,? new counters to ??,? Set ??,?= ? + 1 ? + 2 1 Insert key ? with ??,?? = ?? Before ??,? ?2 ?1 ?3 ?4 ?5 ??,?= 5 ? = 2 After ??,? ?4 ?3 ?5 ?2 ? ?1 ??,?= 11 13

  14. Estimate True Sum or Change Estimate ?(?) in bucket (?,?): a pair of values ???? = ??,?? , if ? ??,? 0, otherwise ??? = ??,? ??,? ???? + ??,? ??,? Estimate ?(?) in bucket (?,?) Bucket (?,?) at 1st epoch Bucket (?,?) at 2nd epoch ??,2? ??,1? ???,2? and ??,? ???,1? and ??,? ??,? ??,? Estimate change: ??,2? ??,? ??,1? ??,? ???,1? ,??,? ???,2? } ??,?? = max{??,? 14

  15. Identify Heavy Key Key point: consider keys tracked by buckets Enumerate all buckets ??,? ? ? ??,?, check key ? Bucket (?,?) Check key ? for heavy hitters ?? ? for all row ? ??, ?(?) Check key ? for heavy changers ??, ?(?)? for all row ? 15

  16. Analysis Let maximum number of heavy keys = ? On accuracy Zero false negative rate Upper bound of false positive rate On complexity time complexity to update a data item: ?(1) time complexity to identify heavy keys: ?(?) space complexity: ?(?) 16

  17. Distributed Detection Goal Scalability: reduce complexity Accuracy: reduce false positive rate Remote site Worker Remote Site How to partition data streams Local detection Local detection results Final results How to aggregate local detection results Final detection results 17

  18. Remote Sites Two-step partitioning Data item (?,??) Step 1: select ? workers based on ? Worker Worker Worker Worker Worker Step 2: select one from the ? workers uniformly Worker Worker Worker For same ?, the same ? workers are selected in all remote sites 18

  19. Detection and Aggregation Detection in workers For key ?, each selected worker expects to receive 1/? of ?(?) Perform local detection in each worker with threshold ? ? Aggregate results All ? workers report ? in the local detection For key ? Report ? as a heavy key 19

  20. Analysis Let Maximum number of heavy keys = ? Total number of worker = ? On accuracy Reduce false positive rate Introduce a small false negative rate due to unfair partitioning On complexity time complexity to update a data item: ?(1) time complexity to identify heavy keys: ?(??/?) space complexity: ?(??/?) 20

  21. Experimental Results Trace 3G UMTS network in mainland China in December 2010 1.1 billion packets, 600GB traffic Approach Local detection: compare LD-Sketch with CGT, SeqHash, Fast Sketch, all of which are allocated same amount of memory Distributed detection: vary the value of ? Metrics Recall: (# of returned true heavy keys) / (# of true heavy keys) Precision: (# of returned true heavy keys) / (# of return keys) Update throughput 21

  22. Accuracy of Local Detection: Heavy Changer LD-Sketch achieves 100% recall LD-Sketch has a little lower precision than CGT and Seqhash, but we can improve with distributed detection 22

  23. Accuracy of Distributed Detection: Heavy Changer When ? = 1, the precision is similar to local detection When ? > 1, the precision significantly increases while lose a little recall 23

  24. Throughput Local detection Distributed detection LD-Sketch has a little lower throughput than CGT and Fast Sketch in local detection LD-Sketch can scale linearly in distributed detection 24

  25. Conclusions Propose LD-Sketch, a sketching approach for real-time heavy key detection in a distributed architecture Composed of local detection and distributed detection Propose a sketch structure for local detection High accuracy Low complexity in space and time Seamlessly deployed in distributed architecture Propose a distributed detection scheme Reduce complexity Improve accuracy 25

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#