The Challenges of Protecting Privacy with Differential Privacy

undefined
 
Differential Privacy Under Fire
 
1
 
USENIX Security (August 12, 2011)
 
Andreas Haeberlen
    Benjamin C. Pierce     Arjun Narayan
University of Pennsylvania
Motivation: Protecting privacy
 
Lots of potentially useful data exists
But: Releasing it can violate privacy!
We can try to anonymize/scrub it…
… but this can go horribly wrong (see Netflix, AOL, …)
2
USENIX Security (August 12, 2011)
Data
#1
#2
#3
#4
#5
Promising approach: Differential privacy
 
Idea: Use 
differential privacy 
[Dwork et al.]
Only allow 
queries
[lots of mathematical details omitted]
Result: Strong, provable 
privacy guarantees
Implemented, e.g., by PINQ [McSherry] and Airavat [Roy et al.]
3
USENIX Security (August 12, 2011)
Private data
 
; add a certain amount of 
noise
 to results
?!?
Differential Privacy under Fire
 
What if the adversary uses a 
covert channel
?
Devastating effect on privacy guarantees
Usual defenses are not strong enough (can't leak even one bit!)
We show:
Working attacks
An effective (domain-specific) defense
 
 
 
 
4
USENIX Security (August 12, 2011)
Private data
Does Bob
watch porn?
 
Outline
 
Motivation
Differential Privacy primer
Attacks on PINQ and Airavat
Our defense
The Fuzz system
Evaluation
 
5
 
USENIX Security (August 12, 2011)
?
Background: Queries
 
Queries are 
programs
PINQ is based on C#, Airavat on MapReduce
These programs have a specific structure
Some overall program logic, e.g., aggregation
Some computation on each database row (
microquery
)
6
USENIX Security (August 12, 2011)
noisy sum, foreach r in db, of {
}
Data
 
  if (r.score("Godfather")>4)
    then return 1
    else return 0
 
Microquery
Background: Sensitivity
 
How much noise should we add to results?
Depends on how much the output can change if we add or
remove a single row (the 
sensitivity
 of the query)
7
USENIX Security (August 12, 2011)
noisy sum, 
r in db, of {
  if (r.score("Godfather")>4)
    then return 
1200
    else return 
200
}
noisy sum, 
r in db, of {
  if (r.score("Godfather")>4)
    then return 
1
    else return 
0
}
 
Sensitivity 1
 
Sensitivity 1,000
Background: Privacy budget
 
How many queries should we answer?
Set up a 
privacy 'budget' 
for answering queries
Deduct a 'cost' for each query, depending on 'how private' it is
8
USENIX Security (August 12, 2011)
Data
 
Privacy
budget
 
Query
Covert-channel attacks
 
The above query...
... is differentially private 
(sensitivity zero)
... takes 1 second longer if the database contains Bob's data
Result: Adversary can learn private information with certainty!
Other channels we have exploited:
Privacy budget
Global state
9
USENIX Security (August 12, 2011)
noisy sum, foreach r in db, of {
  if (r.name=="Bob" && r.hasRating("Porn"))
    then { 
      loop(1 second);
    };
  return 0
}
expensive_subquery();
b=1;                 
b
Our attacks work in practice
 
Both PINQ and Airavat are vulnerable
 
What went wrong?
The authors were aware of this attack vector
Both papers discuss some ideas for possible defenses
But: Neither system has a defense that is fully effective
10
USENIX Security (August 12, 2011)
Threat model
 
Too many channels!! Is it hopeless?
Reasonable assumption: Querier is 
remote
This leaves just three channels:
The actual 
answer
 to the query
The 
time
 until the answer arrives
The decision whether the remaining 
budget
 is sufficient
11
USENIX Security (August 12, 2011)
 
Memory
consumption
 
Electromagnetic
radiation
 
Power
 
Cache
usage
 
Sound
 
Light
 
Query completion
time
 
Privacy
budget
 
Short-range channels
Our approach
 
We can close the remaining channels completely
through a combination of systems and PL techniques
 
Language design 
rules out state attacks etc.
Example: Simply don't allow global variables!
 
Program analysis 
closes the budget channel
Idea: Statically determine the 'cost' of a query before running it
Uses a novel type system [Reed and Pierce]
 
Special runtime 
to close the timing channel
12
USENIX Security (August 12, 2011)
 
Details
in the
paper
Plugging the timing channel
 
How to avoid leaking information via query
completion time?
Could treat time as an additional output
But: Unclear how to determine sensitivity
 
Our approach: 
Make timing predictable
If time does not depend on the contents of the database,
it cannot leak information
 
13
USENIX Security (August 12, 2011)
 
Timeouts and default values
 
Querier specifies for each microquery:
a
 timeout
 T, and
a 
default value 
d
 
Each time the microquery processes a row:
If completed in less than T, wait
If not yet complete at T, abort and proceed to next row
 
14
 
USENIX Security (August 12, 2011)
Example: Timeouts and default values
15
USENIX Security (August 12, 2011)
noisy sum, 
r
db, of {
  if r.name=="Bob"
    then loop(1 sec);
  return 0
}
Alice   
 
(Star Wars, 5)
 
(Alien, 4)   
 
Bob 
 
(Godfather, 1)
 
(Porn, 5)
 
Cindy
 
(Die Hard, 4)
 
(Toy Story, 2)
Dave
 
(Avatar, 5)
 
(Gandhi, 4)
 
Eva
 
(Amélie, 4)
 
(Rocky, 1)
0
Time
0
 
, T=20
s, d=1
0
0
0
Bob not in db:
 
Bob in db:
Rob
0
0
0
0
 
Observable
0
 
Time
 
Bob not in db:
 
Bob in db:
0
0
0
0
0
0
0
0
0
 
sum=0
 
sum=0
 
sum=0
 
sum=1
 
1
 
20
s
Default values do not violate privacy
 
Don't default values change the query's answer?
Yes, but 
that's okay
:
Remember that the answer is still noised before it is returned
Noise depends on the sensitivity, which is now 1
It's just as if we had written "If r.name=='Bob', return 1"
Impact on non-adversarial queriers?
Default value is never included if timeout is sufficiently high
16
USENIX Security (August 12, 2011)
noisy sum, 
r
db, of {
  if r.name=="Bob"
    then loop(1 sec);
  return 0
}
, T=20
s, d=1
Bob not in db:
Bob in db:
0
0
0
0
0
0
0
0
0
1
 
Outline
 
Motivation
Differential Privacy primer
Attacks on PINQ and Airavat
Our defense
The Fuzz system
Evaluation
 
17
 
USENIX Security (August 12, 2011)
 
The Fuzz system
 
Fuzz: 
A programming language for writing
differentially private queries
Designed from scratch 
 Easier to secure
Functionality roughly comparable to PINQ/Airavat
Novel type system 
for statically checking sensitivity
 
Runtime supports timeouts + default values
Turns out to be highly nontrivial
Problem: How to make a potentially adversarial computation
take 
exactly
 a given amount of time?
Uses a new primitive called 
predictable transactions
 
18
 
USENIX Security (August 12, 2011)
Predictable transactions
 
Isolation: 
Microquery must not interfere with
the rest of the computation in any way
Examples: Trigger garbage collector, change runtime state, ...
Preemptability: 
Must be able to abort
microqueries at any time
Even in the middle of memory allocation, ...
Bounded deallocation: 
Must be able to free any
allocated resources within bounded time
Example: Microquery allocates lots of memory, acquires locks...
Details are in the paper
19
USENIX Security (August 12, 2011)
 
Outline
 
Motivation
Differential Privacy primer
Attacks on Differential Privacy
Defenses
The Fuzz system
Evaluation
Is Fuzz expressive enough to handle realistic queries?
Is Fuzz fast enough to be practical?
Does Fuzz effectively prevent side-channel attacks?
More experiments are described in the paper
 
20
 
USENIX Security (August 12, 2011)
Experimental setup
 
Implemented three queries from prior work:
K-means clustering (inspired by Blum et al., PODS'05)
Census query (inspired by Chawla et al., TCC'05)
Web server log analysis (inspired by Dwork et al., TCC'06)
Fuzz is expressive enough to run all three queries
 
Also crafted several adversarial queries
Using different variants of our attacks
 
Evaluated on a commodity system
3GHz Core 2 Duo running Linux 2.6.38
Synthetic database with 10,000 rows
 
 
 
21
USENIX Security (August 12, 2011)
Performance: Non-adversarial queries
 
Query completion time increased by 2.5x-6.8x
But: Most expensive query took 'only' 12.7s
Most of the increase was due to time padding
Timeouts were set conservatively
More detailed results are in the paper
22
USENIX Security (August 12, 2011)
 
Original runtime
 
Fuzz (no padding)
 
Fuzz
Query completion time (s)
kmeans
census
weblog
14
12
10
8
6
4
2
0
 
6.8x
 
3.4x
 
2.5x
 
Due to
padding
Performance: Adversarial queries
 
Evaluated five adversarial queries
Unprotected runtime: Attacks cause large timing variation
Protected runtime: Completion times are extremely stable
Timing channel now too narrow to be useful!
Remember: State and budget channels closed by design
23
USENIX Security (August 12, 2011)
 
0.32s
0.32s
0.32s
26.38s
0.90s
 
1.96s
 
1.57s
 
1.62s
 
26.37s
 
2.17s
 
1.6s
 
1.2s
 
1.3s
 
6ms
 
1.3s
 
1.10s
 
1.10s
 
1.10s
 
1.10s
 
2.40s
 
1.10s
1.10s
1.10s
1.10s
2.40s
 
<1
s
 
<1
s
 
<1
s
 
<1
s
 
<1
s
 
Summary
 
Differentially private query processors must
be protected against covert-channel attacks
Leaking even a single bit can destroy the privacy guarantees
 
Vulnerabilities exist in PINQ and Airavat
 
Proposed defense: Fuzz
Uses static analysis and predictable transactions
Specific to differential privacy, but very strong: Closes all
remotely measurable channels completely
 
24
 
USENIX Security (August 12, 2011)
 
More information at: 
http://privacy.cis.upenn.edu/
Slide Note
Embed
Share

The article discusses the motivation behind using differential privacy to protect sensitive data while enabling useful queries. It highlights the promise of differential privacy, challenges faced, attacks on existing systems like PINQ and Airavat, and introduces a defense system called The Fuzz. The importance of maintaining strong privacy guarantees against potential adversary threats is emphasized.

  • Privacy
  • Differential Privacy
  • Data Protection
  • Attacks
  • Defense

Uploaded on Oct 07, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Differential Privacy Under Fire Andreas Haeberlen Benjamin C. Pierce Arjun Narayan University of Pennsylvania 1 A. Haeberlen USENIX Security (August 12, 2011)

  2. Motivation: Protecting privacy Better recom- mendations? #1 #2 #3 #4 #5 Alice (Star Wars, 5) (Alien, 4) Bob (Godfather, 1) (Porn, 5) Cindy (Die Hard, 4) Dave (Avatar, 5) Eva (Am lie, 4) ... Data (Toy Story, 2) (Gandhi, 4) (Rocky, 1) Does Bob watch porn? I know Bob hates 'Godfather' Lots of potentially useful data exists But: Releasing it can violate privacy! We can try to anonymize/scrub it but this can go horribly wrong (see Netflix, AOL, ) 2 A. Haeberlen USENIX Security (August 12, 2011)

  3. Promising approach: Differential privacy Noise Alice (Star Wars, 5) (Alien, 4) Bob (Godfather, 1) (Porn, 5) Cindy (Die Hard, 4) (Toy Story, 2) Dave (Avatar, 5) Eva (Am lie, 4) ... Private data ?!? (Gandhi, 4) (Rocky, 1) Idea: Use differential privacy [Dwork et al.] Only allow queries [lots of mathematical details omitted] Result: Strong, provable privacy guarantees Implemented, e.g., by PINQ [McSherry] and Airavat [Roy et al.] ; add a certain amount of noise to results 3 A. Haeberlen USENIX Security (August 12, 2011)

  4. Differential Privacy under Fire Does Bob watch porn? (special query) Alice (Star Wars, 5) (Alien, 4) Bob (Godfather, 1) (Porn, 5) Cindy (Die Hard, 4) (Toy Story, 2) Dave (Avatar, 5) Eva (Am lie, 4) ... Private data (Gandhi, 4) (Rocky, 1) (noised response) YES What if the adversary uses a covert channel? Devastating effect on privacy guarantees Usual defenses are not strong enough (can't leak even one bit!) We show: Working attacks An effective (domain-specific) defense 4 A. Haeberlen USENIX Security (August 12, 2011)

  5. Outline Motivation Differential Privacy primer Attacks on PINQ and Airavat Our defense The Fuzz system Evaluation NEXT 5 A. Haeberlen USENIX Security (August 12, 2011)

  6. Background: Queries noisy sum, foreach r in db, of { if (r.score("Godfather")>4) then return 1 else return 0 ? } Microquery Data Queries are programs PINQ is based on C#, Airavat on MapReduce These programs have a specific structure Some overall program logic, e.g., aggregation Some computation on each database row (microquery) 6 A. Haeberlen USENIX Security (August 12, 2011)

  7. Background: Sensitivity noisy sum, r in db, of { if (r.score("Godfather")>4) then return 1 else return 0 } noisy sum, r in db, of { if (r.score("Godfather")>4) then return 1200 else return 200 } Sensitivity 1,000 Sensitivity 1 How much noise should we add to results? Depends on how much the output can change if we add or remove a single row (the sensitivity of the query) 7 A. Haeberlen USENIX Security (August 12, 2011)

  8. Background: Privacy budget Query Privacy budget noisy sum, r in db, of { if (r.score("Godfather")>4) then return 1 else return 0 } Answer Data How many queries should we answer? Set up a privacy 'budget' for answering queries Deduct a 'cost' for each query, depending on 'how private' it is 8 A. Haeberlen USENIX Security (August 12, 2011)

  9. Covert-channel attacks noisy sum, foreach r in db, of { if (r.name=="Bob" && r.hasRating("Porn")) then { loop(1 second); }; return 0 } expensive_subquery(); b=1; b The above query... ... is differentially private (sensitivity zero) ... takes 1 second longer if the database contains Bob's data Result: Adversary can learn private information with certainty! Other channels we have exploited: Privacy budget Global state 9 A. Haeberlen USENIX Security (August 12, 2011)

  10. Our attacks work in practice Both PINQ and Airavat are vulnerable What went wrong? The authors were aware of this attack vector Both papers discuss some ideas for possible defenses But: Neither system has a defense that is fully effective 10 A. Haeberlen USENIX Security (August 12, 2011)

  11. Short-range channels Threat model Memory consumption Electromagnetic radiation Light Power Query completion time Cache usage Query Answer Sound Privacy budget Too many channels!! Is it hopeless? Reasonable assumption: Querier is remote This leaves just three channels: The actual answer to the query The time until the answer arrives The decision whether the remaining budget is sufficient 11 A. Haeberlen USENIX Security (August 12, 2011)

  12. Our approach We can close the remaining channels completely through a combination of systems and PL techniques Language design rules out state attacks etc. Example: Simply don't allow global variables! Details in the paper Program analysis closes the budget channel Idea: Statically determine the 'cost' of a query before running it Uses a novel type system [Reed and Pierce] Special runtime to close the timing channel NEXT 12 A. Haeberlen USENIX Security (August 12, 2011)

  13. Plugging the timing channel How to avoid leaking information via query completion time? Could treat time as an additional output But: Unclear how to determine sensitivity Our approach: Make timing predictable If time does not depend on the contents of the database, it cannot leak information 13 A. Haeberlen USENIX Security (August 12, 2011)

  14. Timeouts and default values Querier specifies for each microquery: a timeout T, and a default value d Each time the microquery processes a row: If completed in less than T, wait If not yet complete at T, abort and proceed to next row 14 A. Haeberlen USENIX Security (August 12, 2011)

  15. Example: Timeouts and default values Alice (Star Wars, 5) (Alien, 4) Bob (Godfather, 1) (Porn, 5) Cindy (Die Hard, 4) Dave (Avatar, 5) Eva (Am lie, 4) noisy sum, r db, of { if r.name=="Bob" then loop(1 sec); return 0 } , T=20 s, d=1 Rob (Toy Story, 2) (Gandhi, 4) (Rocky, 1) sum=0 0 0 0 0 0 Bob not in db: Bob in db: sum=0 0 0 0 0 0 Time 20 s Observable 0 0 0 0 0 sum=0 Bob not in db: Bob in db: sum=1 0 1 Time 0 0 0 15 A. Haeberlen USENIX Security (August 12, 2011)

  16. Default values do not violate privacy noisy sum, r db, of { if r.name=="Bob" then loop(1 sec); return 0 }, T=20 s, d=1 0 0 0 0 0 Bob not in db: Bob in db: 0 1 0 0 0 Don't default values change the query's answer? Yes, but that's okay: Remember that the answer is still noised before it is returned Noise depends on the sensitivity, which is now 1 It's just as if we had written "If r.name=='Bob', return 1" Impact on non-adversarial queriers? Default value is never included if timeout is sufficiently high 16 A. Haeberlen USENIX Security (August 12, 2011)

  17. Outline Motivation Differential Privacy primer Attacks on PINQ and Airavat Our defense The Fuzz system Evaluation NEXT 17 A. Haeberlen USENIX Security (August 12, 2011)

  18. The Fuzz system Fuzz: A programming language for writing differentially private queries Designed from scratch Easier to secure Functionality roughly comparable to PINQ/Airavat Novel type system for statically checking sensitivity Runtime supports timeouts + default values Turns out to be highly nontrivial Problem: How to make a potentially adversarial computation take exactly a given amount of time? Uses a new primitive called predictable transactions 18 A. Haeberlen USENIX Security (August 12, 2011)

  19. Predictable transactions Isolation: Microquery must not interfere with the rest of the computation in any way Examples: Trigger garbage collector, change runtime state, ... Preemptability: Must be able to abort microqueries at any time Even in the middle of memory allocation, ... Bounded deallocation: Must be able to free any allocated resources within bounded time Example: Microquery allocates lots of memory, acquires locks... Details are in the paper 19 A. Haeberlen USENIX Security (August 12, 2011)

  20. Outline Motivation Differential Privacy primer Attacks on Differential Privacy Defenses The Fuzz system Evaluation Is Fuzz expressive enough to handle realistic queries? Is Fuzz fast enough to be practical? Does Fuzz effectively prevent side-channel attacks? More experiments are described in the paper NEXT 20 A. Haeberlen USENIX Security (August 12, 2011)

  21. Experimental setup Implemented three queries from prior work: K-means clustering (inspired by Blum et al., PODS'05) Census query (inspired by Chawla et al., TCC'05) Web server log analysis (inspired by Dwork et al., TCC'06) Fuzz is expressive enough to run all three queries Also crafted several adversarial queries Using different variants of our attacks Evaluated on a commodity system 3GHz Core 2 Duo running Linux 2.6.38 Synthetic database with 10,000 rows 21 A. Haeberlen USENIX Security (August 12, 2011)

  22. Performance: Non-adversarial queries 14 6.8x Query completion time (s) Original runtime Fuzz (no padding) 12 10 Fuzz Due to padding 8 6 4 2.5x 3.4x 2 0 kmeans census weblog Query completion time increased by 2.5x-6.8x But: Most expensive query took 'only' 12.7s Most of the increase was due to time padding Timeouts were set conservatively More detailed results are in the paper 22 A. Haeberlen USENIX Security (August 12, 2011)

  23. Performance: Adversarial queries Protection disabled Hit Miss Protected # Attack type Hit Miss <1 s <1 s <1 s <1 s <1 s 0.32s 0.32s 0.32s 26.38s 0.90s 1.10s 1.10s 1.10s 1.10s 2.40s 1.6s 1.2s 1.3s 6ms 1.3s 1.10s 1.10s 1.10s 1.10s 2.40s 1.96s 1.57s 1.62s 26.37s 2.17s 1 Memory allocation 2 Garbage collection 3 Artificial delay 4 Early termination 5 Artificial delay Evaluated five adversarial queries Unprotected runtime: Attacks cause large timing variation Protected runtime: Completion times are extremely stable Timing channel now too narrow to be useful! Remember: State and budget channels closed by design 23 A. Haeberlen USENIX Security (August 12, 2011)

  24. Summary Differentially private query processors must be protected against covert-channel attacks Leaking even a single bit can destroy the privacy guarantees Vulnerabilities exist in PINQ and Airavat Proposed defense: Fuzz Uses static analysis and predictable transactions Specific to differential privacy, but very strong: Closes all remotely measurable channels completely More information at: http://privacy.cis.upenn.edu/ 24 A. Haeberlen USENIX Security (August 12, 2011)

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#