Protecting User Privacy in Web-Based Applications through Traffic Padding

undefined
 
Background Knowledge-Resistant Traffic Padding for
Preserving User Privacy in Web-Based Applications
 
Presenter:
Wen Ming Liu (Concordia University)
 
Joint work with:
  Lingyu Wang (Concordia University)
  Kui Ren (University at Buffalo)
  Mourad Debbabi (Concordia University)
 
Cloudcom 2013
 
CIISE@CU /  CSE@UB-SUNY
 
December  4 , 2013
 
Agenda
 
2
O
v
e
r
v
i
e
w
T
h
e
 
M
o
d
e
l
C
o
n
c
l
u
s
i
o
n
T
h
e
 
A
l
g
o
r
i
t
h
m
s
E
x
p
e
r
i
m
e
n
t
 
 
M
o
t
i
v
a
t
i
n
g
 
E
x
a
m
p
l
e
3
Web-Based Applications
untrusted
Internet
Client
Server
 
 
C
r
y
p
t
o
g
r
a
p
h
y
 
s
o
l
v
e
s
 
a
l
l
 
s
e
c
u
r
i
t
y
 
p
r
o
b
l
e
m
s
!
R
e
a
l
l
y
?
4
Motivating Example
Internet
Website
Username/Password
Patient
Info about disease most recently associated
I
n
d
i
c
a
t
o
r
 
o
f
d
i
s
e
a
s
e
s
F
i
x
e
d
 
p
a
t
t
e
r
n
:
 
i
d
e
n
t
i
f
i
e
d
 
a
p
p
l
i
c
a
t
i
o
n
 
5
 
Side-Channel Attack on Encrypted Traffic
Internet
 
Client
 
Server
 
Encrypted Traffic
 
Network packets’ sizes and directions
between user and a popular search engine
By acting as a normal user and
eavesdropping traffic with sniffer pro 4.7.5.
I
n
d
i
c
a
t
o
r
 
o
f
 
t
h
e
i
n
p
u
t
 
i
t
s
e
l
f
F
i
x
e
d
 
p
a
t
t
e
r
n
:
 
i
d
e
n
t
i
f
i
e
d
 
i
n
p
u
t
 
s
t
r
i
n
g
 
6
 
Trivial problem!
Just let every packet have the same size!
 
 
7
 
Don’t Forget the Cost
 
1
 S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a
challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010.
 
 To make all inputs indistinguishable will result in a 21074%
overhead for a well-known online tax system
 
1
8
Solution: Ceiling Padding
2-indistinguishability
 
 Observation
: 
a patient receives a 360-byte packet after logins
 Cancer? Cervicitis?  
⇒ 50% , 50%
 
 Extra 
knowledge: 
this patient is a male
 Cancer? Cervicitis?  
100%
 , 0%
 
Ceiling Padding: 
pad every packet to the maximum size in the group
One Natural Solution
9
 Differential privacy:
No assumption about  the adversaries’ background
knowledge.
Suitable for statistical aggregates or their variants with
    relatively small sensitivity.
 
 Challenges:
Less suitable to traffic padding:
 Sensitivity is less predictable and much large;
 Packet sizes are directly observable;
 Privacy budget is shared by unbounded number of users.
The selection of privacy parameter 
ε
:
Q
u
a
l
i
t
a
t
i
v
e
 
s
i
g
n
i
f
i
c
a
n
c
e
 
o
f
 
p
a
r
a
m
e
t
e
r
 
ε
 
 Quantitative link between 
ε 
value and degree of privacy
guarantee 
?
10
Solution: Add Randomness
 
Cancerous Person
3
6
0
 
 
 
 
3
6
0
 
 
 
 
 
3
6
0
 Random Ceiling Padding
 Instead of deterministically forming padding groups, the server will randomly
(at uniform, in this example) selects one out of the other three diseases (together
with the real disease) to form a padding group in order to apply ceiling padding.
A
l
w
a
y
s
 
r
e
c
e
i
v
e
 
a
3
6
0
-
b
y
t
e
 
p
a
c
k
e
t
 
Cervicitis Patient
3
6
0
 
 
 
 
 
2
9
0
6
6
.
7
%
:
 
2
9
0
-
b
y
t
e
 
p
a
c
k
e
t
3
3
.
3
%
:
 
3
6
0
-
b
y
t
e
 
p
a
c
k
e
t
2
9
0
 
11
 
Better Privacy Protection
 
 Can tolerate adversaries’ extra knowledge
 
 Suppose an adversary knows a patient is male and he saw s = 360
T
h
e
 
a
d
v
e
r
s
a
r
y
n
o
w
 
c
a
n
 
o
n
l
y
b
e
 
6
0
%
,
 
i
n
s
t
e
a
d
o
f
 
1
0
0
%
,
 
s
u
r
e
t
h
a
t
 
p
a
t
i
e
n
t
 
h
a
s
C
a
n
c
e
r
.
 
 Cost is not necessarily worse
 
 In this example, these two methods actually lead to exactly the same
expected padding and processing costs
 
Agenda
 
12
O
v
e
r
v
i
e
w
T
h
e
 
M
o
d
e
l
C
o
n
c
l
u
s
i
o
n
T
h
e
 
A
l
g
o
r
i
t
h
m
s
E
x
p
e
r
i
m
e
n
t
T
r
a
f
f
i
c
 
P
a
d
d
i
n
g
P
r
i
v
a
c
y
 
P
r
o
p
e
r
t
i
e
s
P
a
d
d
i
n
g
 
M
e
t
h
o
d
s
C
o
s
t
 
M
e
t
r
i
c
s
13
Traffic Padding Issue
Internet
 
 Vector-Action Set 
VA
i
:
 
Pairs of 
i
th  
actions and corresponding 
i
th 
flow-vectors
 
14
 
Privacy Properties
 
Model the privacy requirement of a traffic padding from two perspectives
 
15
 
Privacy Properties 
(cont.)
16
Padding Method
 Ceiling padding [15][16]:
 
Inspired by PPDP: grouping and breaking
 Dominant-vector of a padding group:
 Size of each group is not less than k, and
    every flow-vector in a  group is padded to dominant-vector of that group.
Achieves k-
indistinguishability, but
not sufficient if the
adversary possess
prior knowledge.
 
 Random ceiling padding method:
 
A mechanism 
M
: when responding to an action 
a 
(per each user request),
 It randomly selects k-1 other actions, and
 Pads the flow-vector of action 
a
 to be dominant-vector of transient group
    (those k actions).
 
 Randomness:
 Randomly selects members of transient group from certain candidates
based on certain distributions.
 To reduce the cost, change the probability of an action being selected as
a member of transient group.
 
17
 
Cost Metrics
 
Agenda
 
18
O
v
e
r
v
i
e
w
T
h
e
 
M
o
d
e
l
C
o
n
c
l
u
s
i
o
n
T
h
e
 
A
l
g
o
r
i
t
h
m
s
E
x
p
e
r
i
m
e
n
t
S
c
h
e
m
e
I
n
s
t
a
n
t
i
a
t
i
o
n
s
19
Overview of Scheme
 
Main idea:
 
To response a user input, server randomly selects members to form the group. 
 Different choices of random distribution lead to different algorithms.
 
 
Goal:
 The privacy properties need to be ensured.
 
The costs of achieving such privacy protection should be minimized.
 
 
Two stage scheme:
 Stage 1: derive randomness parameters, one-time, optimization problem;
 
Stage 2: form transient group, real-time.
20
Scheme 
(cont.)
 
Computational complexity: 
O(k)
 Stage 1: pre-calculated only once
 Stage 2: select k-1 random actions without duplicate, 
O(k)
.
 
 
Discussion on costs:
 Deterministically incomparable with those of ceiling padding.
 
21
 
Instantiations of Scheme
 
 Bounded uniform distribution:
 
 c
t
: cardinality of candidate actions
 
 
c
l
: number of larger candidates
 
Scheme can be realized in many different ways.
 
Choose group members from different subsets of candidates and based on
different distributions, in order to reduce costs.
 
 Normal distribution:
 
 
μ
: mean
 
 
σ: standard deviation
 
0
 
n
 
i
action
 
c
l
 
c
t
 
0
 
n
 
i
action
 
largest
 
smallest
 
largest
 
smallest
 
probability
 
[max(0,min(i-c
l
, |VA|-c
t
)), min(max(0,min(i-c
l
, |VA|-c
t
))+c
t
, |VA|)]
 
Agenda
 
22
O
v
e
r
v
i
e
w
T
h
e
 
M
o
d
e
l
C
o
n
c
l
u
s
i
o
n
T
h
e
 
A
l
g
o
r
i
t
h
m
s
E
x
p
e
r
i
m
e
n
t
 
23
 
Experiment Settings
 
 Collect testing vector-action sets from two real-world web applications:
 
 
A popular search engine (where users’ search keyword needs to be protected)
      
Collect flow-vectors for query suggestion widget by crafting requests to simulate the
normal AJAX connection request.
 
 
An authoritative drug information system (user’s possible health information)
    
Collect vector-action set for all the drug information by mouse-selecting following the
application’s tree-hierarchical navigation.
 
 
 The flows of 
drug
 are more diverse, large, and disparate than those of 
engine
.
 
 Compare our solutions (TUNI option, Norm option) with the svmdGreedy
(SVMD) [16] on four-letter combinations in 
Engine
 and last-level data in 
Drug.
 
24
 
Uncertainty and Costs v.s. k
 
 The padding and processing costs of all algorithms increase
with k, while TUNI and NORM have less than those of SVMD.
 
 Our algorithms have much larger uncertainty for 
Drug
 and
slightly larger for 
Engine
.
 
25
 
Randomness Drawn From Bounded Uniform Distribution (c
l
)
 
 Both costs increase slowly with c
l
 for TUNI: more chances to
select larger actions for transient group.
 
 TUNI has less costs yet higher uncertainty than SVMD.
 
26
 
Randomness Drawn From Bounded Uniform Distribution (c
t
)
 
 For engine: same regardless of c
t
 value.
 
 For drug: costs (uncertainty) increase (decreases) slowly.
 
 TUNI has less costs yet higher uncertainty than SVMD.
 
27
 
Randomness Drawn From Normal Distribution (
μ
)
 
 Costs decrease almost linearly with 
μ
 from 0 to 16, and rapidly as 
μ
grows to 32.
 
 Uncertainty slightly changes with 
μ
 from 0 to 16, and decreases rapidly
when 
μ
 grows to 32 .
 
 NORM has less costs yet higher uncertainty than SVMD.
 
28
 
Randomness Drawn From Normal Distribution (
σ
)
 
 All the three measurements decreases with the decrease of 
σ
.
 
 Compared with SVMD, the less the 
σ
, NORM exhibits better.
 
Agenda
 
29
O
v
e
r
v
i
e
w
T
h
e
 
M
o
d
e
l
C
o
n
c
l
u
s
i
o
n
T
h
e
 
A
l
g
o
r
i
t
h
m
s
E
x
p
e
r
i
m
e
n
t
 
Conclusion and Future Work
 
30
 
 Propose a solution to reduce the impact of adversaries’
background knowledge in privacy-preserving traffic padding.
 
 Propose a formal model for quantifying the amount of privacy
protection provided by traffic padding solutions;
 
 Instantiate two algorithms;
 
 
Confirm the performance of our solutions in terms of both privacy and
overhead through experiment with real-world applications.
 
 Future work:
 
 
Apply the proposed approach to privacy-preserving data publishing
.
 
Thank you!
 
31
Slide Note
Embed
Share

Explore a novel approach utilizing knowledge-resistant traffic padding to safeguard user privacy in web-based applications. The study addresses side-channel attacks on encrypted traffic, the challenges of maintaining security in untrusted internet environments, and the potential privacy overhead of implementing such security measures. By analyzing experiment results and considerations regarding padding overhead, the research offers insights into the balance between privacy protection and operational efficiency.


Uploaded on Aug 17, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Concordia Institute for Information Systems Engineering (CIISE) at Concordia University Department of Computer Science and Engineering (CSE) at University at Buffalo Background Knowledge-Resistant Traffic Padding for Preserving User Privacy in Web-Based Applications Presenter: Wen Ming Liu (Concordia University) Joint work with: Lingyu Wang (Concordia University) Kui Ren (University at Buffalo) Mourad Debbabi (Concordia University) Cloudcom 2013 December 4 , 2013 CIISE@CU / CSE@UB-SUNY

  2. Agenda Overview Motivating Example The Model The Algorithms Experiment Conclusion 2

  3. Web-Based Applications web,server,computer Cryptography solves all security problems! Really? untrusted Internet Client Server Encryption 3

  4. Motivating Example ie,browser,internet explorer,microsoft web,server,computer Internet Patient Website Username/Password Info about disease most recently associated Fixed pattern: identified application Diseases Cancer 801 , Observed Directional Packet Sizes 54, 360, 60 Cervicitis 801 , 54, 290, 60 Cold 801 , 54, 290, 60 Cough 801 , 54, 290, 60 b-byte s-byte Indicator of diseases 4

  5. Side-Channel Attack on Encrypted Traffic Network packets sizes and directions between user and a popular search engine By acting as a normal user and eavesdropping traffic with sniffer pro 4.7.5. web,server,computer Internet Client Encrypted Traffic Server Fixed pattern: identified input string User Input Observed Directional Packet Sizes 54, 509, a: 801 , 60 00: 812 , 54, 60 , 505, 54, 507, 813 , 60 b-byte s-byte Indicator of the input itself 5

  6. Trivial problem! Just let every packet have the same size! 6

  7. Dont Forget the Cost To make all inputs indistinguishable will result in a 21074% overhead for a well-known online tax system 1 No guarantee of better privacy at a higher cost privacy overhead Diseases s Value Rounding ( ) 112 144 176 Cancer 360 448 432 528 Cervicitis 290 336 432 352 Cold 290 336 432 352 Cough 290 336 432 352 18.4% 40.5% 28.8% Padding Overhead (%) 1 S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy 10, pages 191 206, 2010. 7

  8. Solution: Ceiling Padding 2-indistinguishability Diseases s Value Rounding ( ) 112 Ceiling Padding 144 176 360 448 432 528 Cancer 360 290 336 432 352 Cervicitis 360 Cold 290 336 432 352 290 Cough 290 336 432 352 290 18.4% 40.5% 28.8% 5.7% Padding Overhead (%) Ceiling Padding: pad every packet to the maximum size in the group Observation: a patient receives a 360-byte packet after logins Cancer? Cervicitis? 50% , 50% Extra knowledge: this patient is a male Cancer? Cervicitis? 100% , 0% 8

  9. One Natural Solution Differential privacy: No assumption about the adversaries background knowledge. Suitable for statistical aggregates or their variants with relatively small sensitivity. Challenges: Less suitable to traffic padding: Sensitivity is less predictable and much large; Packet sizes are directly observable; Privacy budget is shared by unbounded number of users. The selection of privacy parameter : Qualitative significance of parameter Quantitative link between value and degree of privacy guarantee ? 9

  10. Solution: Add Randomness Random Ceiling Padding Instead of deterministically forming padding groups, the server will randomly (at uniform, in this example) selects one out of the other three diseases (together with the real disease) to form a padding group in order to apply ceiling padding. Cancerous Person Cervicitis Patient Diseases s Value Diseases s Value Cancer 360 Cancer 360 360 360 Cervicitis 290 Cervicitis 290 360 360 290 Cold 290 Cold 290 290 Cough 290 Cough 290 Always receive a 360-byte packet 66.7%: 290-byte packet 33.3%: 360-byte packet 10

  11. Better Privacy Protection Can tolerate adversaries extra knowledge Suppose an adversary knows a patient is male and he saw s = 360 Patient Has Cancer Server Selects Cervicitis Diseases s Value The adversary now can only be 60%, instead of 100%, sure that patient has Cancer. Cancer 360 Cancer Cold Cervicitis 290 Cancer Cough Cold 290 Cold Cancer Cough 290 Cough Cancer Cost is not necessarily worse In this example, these two methods actually lead to exactly the same expected padding and processing costs 11

  12. Agenda Overview The Model Traffic Padding Privacy Properties Padding Methods Cost Metrics The Algorithms Experiment Conclusion 12

  13. Traffic Padding Issue Diseases Observed Directional Packet Sizes web,server,computer Cancer 801 , 54, 360, 60 ie,browser,internet explorer,microsoft Internet Cervicitis 801 , 54, 290, 60 Observation: Interaction: flow-vector v: A sequence of flows (directional packet sizes) Triggered an action action a: Atomic user input that triggers traffic A keystroke, a mouse click vector-sequence ? : A sequence of flow-vectors Triggered by an equal-length action-sequence action-sequence ?: A sequence of actions with complete input info Consecutive keystrokes vector-set Vi: Collection of all ith vectors in a set of vector-seq action-set Ai: Collection of all ith actions in a set of action-seq Vector-Action Set VAi: Pairs of ith actions and corresponding ith flow-vectors 13

  14. Privacy Properties Model the privacy requirement of a traffic padding from two perspectives k-Indistinguishability: For any flow-vector, at least k different actions can trigger it. Given vector-action set VA, padding algorithm M, range Range(M,VA) ? ????? ?,?? , ?:Pr(? 1? = ? > 0 ? ? ? Uncertainty: Apply the concept of entropy in information theory to quantify an adversary s uncertainty about the real action performed by a user. Given vector-action sequence ??, padding algorithm M Pr(? 1? = ? log2Pr(? 1? = ? ? ?,??,? = ? ? ? ??,? = ?(?,??,?) Pr(? ? = ?) ? ?????(?,??) ??,? = ?(??,?) ?? ?? 14

  15. Privacy Properties (cont.) -uncertaink-Indistinguishability: An algorithm M give -uncertain k-Indistinguishability for a vector-action sequence ?? if M w.r.t. any ?? ?? satisfies k-indistinguishability, and The uncertainty of M w.r.t. ?? is not less than . 15

  16. Padding Method Achieves k- indistinguishability, but not sufficient if the adversary possess prior knowledge. Ceiling padding [15][16]: Inspired by PPDP: grouping and breaking Dominant-vector of a padding group: Size of each group is not less than k, and every flow-vector in a group is padded to dominant-vector of that group. Random ceiling padding method: A mechanism M: when responding to an action a (per each user request), It randomly selects k-1 other actions, and Pads the flow-vector of action a to be dominant-vector of transient group (those k actions). Randomness: Randomly selects members of transient group from certain candidates based on certain distributions. To reduce the cost, change the probability of an action being selected as a member of transient group. 16

  17. Cost Metrics Expected padding cost: The proportion of packet size increases compared to original flow-vectors Given vector-action sequence ??, padding algorithm M, Pr ? ? = ? ? ? ???? ?,??,? = (?,?): ? ?????(?,??) 1 ???? ??,? = (????(?,??,?)) ??: ?,? ?? ??: ???? ??,? = (????(??,?)) ?? ?? Expected processing cost: How many flow-vectors need to be padded ?? ?? (?,?) ??Pr ?(?) ? ?? ?? ???? ??,? = ?,? : ?,? ?? 17

  18. Agenda Overview The Model The Algorithms Scheme Instantiations Experiment Conclusion 18

  19. Overview of Scheme Main idea: To response a user input, server randomly selects members to form the group. Different choices of random distribution lead to different algorithms. Goal: The privacy properties need to be ensured. The costs of achieving such privacy protection should be minimized. Two stage scheme: Stage 1: derive randomness parameters, one-time, optimization problem; Stage 2: form transient group, real-time. 19

  20. Scheme (cont.) Computational complexity: O(k) Stage 1: pre-calculated only once Stage 2: select k-1 random actions without duplicate, O(k). Discussion on privacy: The adversary cannot collect vector-action set even acting as normal user, 99 19 266 ?? = 100,? = 20,??????? ????????? ?????? (?? ??????) = Approximate the distribution is hard: all users share one random process. Discussion on costs: Deterministically incomparable with those of ceiling padding. 20

  21. Instantiations of Scheme Scheme can be realized in many different ways. Choose group members from different subsets of candidates and based on different distributions, in order to reduce costs. Bounded uniform distribution: Normal distribution: ct: cardinality of candidate actions cl: number of larger candidates : mean : standard deviation probability ct smallest smallest largest largest 0 n 0 n cl i i action action [max(0,min(i-cl, |VA|-ct)), min(max(0,min(i-cl, |VA|-ct))+ct, |VA|)] 21

  22. Agenda Overview The Model The Algorithms Experiment Conclusion 22

  23. Experiment Settings Collect testing vector-action sets from two real-world web applications: A popular search engine (where users search keyword needs to be protected) Collect flow-vectors for query suggestion widget by crafting requests to simulate the normal AJAX connection request. An authoritative drug information system (user s possible health information) Collect vector-action set for all the drug information by mouse-selecting following the application s tree-hierarchical navigation. The flows of drug are more diverse, large, and disparate than those of engine. Compare our solutions (TUNI option, Norm option) with the svmdGreedy (SVMD) [16] on four-letter combinations in Engine and last-level data in Drug. 23

  24. Uncertainty and Costs v.s. k The padding and processing costs of all algorithms increase with k, while TUNI and NORM have less than those of SVMD. Our algorithms have much larger uncertainty for Drug and slightly larger for Engine. 24

  25. Randomness Drawn From Bounded Uniform Distribution (cl) Both costs increase slowly with cl for TUNI: more chances to select larger actions for transient group. TUNI has less costs yet higher uncertainty than SVMD. 25

  26. Randomness Drawn From Bounded Uniform Distribution (ct) For engine: same regardless of ct value. For drug: costs (uncertainty) increase (decreases) slowly. TUNI has less costs yet higher uncertainty than SVMD. 26

  27. Randomness Drawn From Normal Distribution () Costs decrease almost linearly with from 0 to 16, and rapidly as grows to 32. Uncertainty slightly changes with from 0 to 16, and decreases rapidly when grows to 32 . NORM has less costs yet higher uncertainty than SVMD. 27

  28. Randomness Drawn From Normal Distribution () All the three measurements decreases with the decrease of . Compared with SVMD, the less the , NORM exhibits better. 28

  29. Agenda Overview The Model The Algorithms Experiment Conclusion 29

  30. Conclusion and Future Work Propose a solution to reduce the impact of adversaries background knowledge in privacy-preserving traffic padding. Propose a formal model for quantifying the amount of privacy protection provided by traffic padding solutions; Instantiate two algorithms; Confirm the performance of our solutions in terms of both privacy and overhead through experiment with real-world applications. Future work: Apply the proposed approach to privacy-preserving data publishing. 30

  31. Thank you! 31

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#