Database Access Control & Privacy: A Common Ground Explored

 
Database Access Control & Privacy:
Is There A Common Ground?
 
Surajit Chaudhuri, Raghav Kaushik and Ravi Ramamurthy
Microsoft Research
 
Data Privacy
 
Databases Have Sensitive Information
Health care database: Patient PII, Disease information
Sales database: Customer PII
Employee database: Employee level, salary
Data analysis carries the risk of privacy breach [FTDB 2009]
Latanya Sweeney’s identification of the governor of MA from medical
records
AOL search logs
Netflix prize dataset
Focus of this paper: What is the implication of data privacy
concerns on the DBMS? Do we need any more than access
control?
 
 
Data Publishing
 
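Patients [FTDB2009]

Name   Age  Gender  Zipcode  Disease
Ann    28   F       13068    Heart disease
Bob    21   M       13068    Flu
Carol  24   F       13068    Viral disease

Patients-Anonymized (K-Anonymity, L-Diversity, T-Closeness)

Age      Gender  Zipcode  Disease
[20-29]  F       1****    Heart disease
[20-29]  M       1****    Flu
[20-29]  F       1****    Viral disease

Queries: Q1 ... Qn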
 
Privacy-Aware Query Answering
 
 
Patients [FTDB2009]
 
Patients-Anonymized
Differential Privacy,
Privacy-Preserving OLAP
 
Data Publishing Vs Query Answering
 
 
Jury is still out
Data Publishing
No impact on DBMS
De-identification algorithms over published data are getting
increasingly sophisticated
Need to take a hard look at the query answering
paradigm
Potential implications for DBMS
“An interactive, query-based approach is generally superior from the privacy perspective to the ‘release-and-forget’ approach” [CACM’10]
Is “Privacy-Aware” = (Fine-Grained) Access
Control (FGA)?
Every user is allowed to view only a subset of the data (authorization view)
Subset defined using a predicate
Queries are (logically) rewritten to go against the subset

Select *
From Patients
Where Patients.Physician = userID()
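
The rewriting above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database; the helper `authorized_patients`, the reduced column set, and the patient-to-physician assignment are hypothetical and not taken from the slides.

```python
import sqlite3

def authorized_patients(conn, user_id):
    """Run 'Select * From Patients' against the caller's authorization view."""
    # The WHERE clause stands in for 'Patients.Physician = userID()'.
    query = (
        "SELECT Name, Disease, Physician FROM Patients "
        "WHERE Physician = ? ORDER BY Name"
    )
    return conn.execute(query, (user_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (Name TEXT, Disease TEXT, Physician TEXT)")
# Hypothetical rows: the patient-to-physician assignment is made up.
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [
        ("Ann", "Heart disease", "Grey"),
        ("Bob", "Flu", "Stevens"),
        ("Carol", "Viral disease", "Grey"),
    ],
)
rows_for_grey = authorized_patients(conn, "Grey")
```

The user never names the predicate; the system appends it, so the same query text returns a different subset for each physician.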
Original query:

Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug) > 3
Group by Drug

Rewritten query:

Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug
       and auth(Side-Effects)) > 3
  and auth(Patients) and auth(Drugs)
Group by Drug
Authorization is “Black and White”
Query: Count the number of cancer patients
Utility: Grant access to cancer patients (return accurate count)
Privacy: Deny access to cancer patients
Beyond “Black and White”: Differential
Privacy [SIGMOD09]
Example: Count the number of cancer patients
Perturb the output of the aggregate computation (requires no change in the execution engine)
Need to set parameters: ε, Budget
Baggage
Non-deterministic
Per-query privacy parameter
Overall privacy budget
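
One common way to perturb a count is the Laplace mechanism. The slides do not say which mechanism is used, so the sketch below is an assumption: it adds Laplace noise with scale 1/ε, which suits a count because adding or removing one record changes it by at most 1. The names `laplace_noise` and `noisy_count` are illustrative.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # Two i.i.d. exponentials with mean `scale`; their difference is Laplace.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Perturb the output of a count; smaller epsilon means more noise."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
answer = noisy_count(42, epsilon=0.5, rng=rng)
```

The noise is applied to the finished aggregate, which is why no change to the execution engine is needed. Repeating the call gives a different answer each time, which is the non-determinism listed as baggage.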
 
Seeking Common Ground
 
Access Control
Supports full generality of SQL
“Black and White”
Differential Privacy Algorithms
A principled way to go beyond “black and white”
Known mechanisms do not support full generality of SQL
Data analysis involves aggregation but also joins, sub-queries
Can we get the best of both worlds?
Differential Privacy = Computation on unauthorized data
What is the implication on privacy guarantees?
 
 
What Does “Best of Both Worlds” Look Like?
 
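FGA Policy:
Each physician can see:
Records of their patients
Each analyst can see:
Records of drugs manufactured by their employer
No patient records

Patients
Name  Disease        Drug     Physician
Ann   Heart disease  Lipitor  Grey

Drugs
Drug     Company
Lipitor  Pfizer

Side-Effects
Drug     Side-Effect
Lipitor  Muscle
Lipitor  Liver

Analysts
Name         Employer
JoeAnalyst   Pfizer
JaneAnalyst  Merck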
FGA
Original query:
Select *
From Patients

Rewritten query:
Select *
From Patients
Where Physician = userID()
 
Grey
Differential Privacy
User = JaneAnalyst

Original query:
Select count(*)
From Patients
Where Disease = 'Cancer'

Rewritten query:
Select count(*) + Noise
From Patients
Where Disease = 'Cancer'
Mix And Match: FGA + Differential Privacy
For each drug with more than 3 side effects, count the number of patients who have been prescribed it.

Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug) > 3
Group by Drug

Tables: Patients, Drugs, Side-Effects, Analysts
 
Architecture That Will Fail To Mix And Match

[Diagram: DBMS containing an Execution Engine, an Authorization Subsystem (Policy) and a Differential Privacy API. Q yields Results. AggQ yields Result(AggQ), which the Differential Privacy API returns as Result(AggQ) + Noise.]

Architecture That Will Fail To Mix And Match

[Diagram: DBMS with an Execution Engine and an Authorization Subsystem (Policy), plus a Wrapper and a Differential Privacy API. Q yields Results. AggQ yields Result(AggQ), returned as Result(AggQ) + Noise.]

 
Authorization-Aware Data Privacy
 
[Diagram: DBMS containing an Execution Engine and an Authorization Aware Privacy Subsystem (Policy). Q yields Results.]
Query Rewriting
Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug) > 3
Group by Drug

Tables: Patients, Drugs, Side-Effects, Analysts

Non-aggregation: Authorization
What about aggregation?

Query Rewriting
Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug
       and auth(Side-Effects)) > 3
  and auth(Patients) and auth(Drugs)
Group by Drug

Tables: Patients, Drugs, Side-Effects, Analysts

Authorized Groups
For each authorized group, find noisy count
 
Query Rewriting
 
Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug
       and auth(Side-Effects)) > 3
  and auth(Patients) and auth(Drugs)
Group by Drug

Tables: Patients, Drugs, Side-Effects, Analysts

Authorized Groups
For each authorized group, find:
(1) Noisy count on unauthorized subset
(2) Accurate count on authorized subset
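
The two-part count per authorized group can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' rewriting: the function name, the `(group, owner)` row shape, the sample prescriptions (including the drug Zocor) and the use of Laplace noise with scale 1/ε are all hypothetical.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def counts_per_authorized_group(rows, authorized_groups, row_is_authorized,
                                epsilon, rng):
    """rows are (group, owner) pairs; returns {group: (exact, noisy)}."""
    exact = Counter()   # rows the user may see: counted accurately
    hidden = Counter()  # rows the user may not see: only a noisy count leaves
    for group, owner in rows:
        if group not in authorized_groups:
            continue  # unauthorized groups never appear in the result
        if row_is_authorized(owner):
            exact[group] += 1
        else:
            hidden[group] += 1
    return {
        g: (exact[g], hidden[g] + laplace_noise(1.0 / epsilon, rng))
        for g in sorted(authorized_groups)
    }

# Hypothetical data: an analyst authorized for Lipitor but for no patient rows.
prescriptions = [("Lipitor", "Grey"), ("Lipitor", "Stevens"), ("Zocor", "Grey")]
result = counts_per_authorized_group(
    prescriptions,
    authorized_groups={"Lipitor"},
    row_is_authorized=lambda physician: False,
    epsilon=0.5,
    rng=random.Random(1),
)
```

The sketch returns the two components separately because the slides do not state how they are combined.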
 
Class of Queries
 
Select Drug, count(*)
From Patients right outer join Drugs on Drug
Where (Select count(*) From Side-Effects
       Where Drug = Drugs.Drug) > 3
Group by Drug

Query features: foreign key join, predicate, grouping, aggregation

Rewriting: go to unauthorized data for the final aggregation
Principled rewriting for arbitrary SQL: open problem
 
Our Privacy Guarantee: Relative Differential
Privacy
 
Differential Privacy Intuition:
A computation is differentially private if its behavior is similar for any two databases D1 and D2 that differ in a single record
Relative Differential Privacy Intuition:
A computation is differentially private relative to an authorization policy if its behavior is similar for any two databases D1 and D2 that differ in a single record and both result in the same authorization views
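
The two intuitions can be written out symbolically. The first inequality is the standard definition of ε-differential privacy; the second is one reading of the relative version, with the notation (M for the randomized computation, S for a set of outputs, V_P(D) for the authorization views of database D under policy P) introduced here as an assumption rather than taken from the slides.

```latex
% Standard epsilon-differential privacy for a randomized computation M:
\[
  \Pr[M(D_1) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D_2) \in S]
\]
% for every output set S and all D_1, D_2 that differ in a single record.

% Relative to an authorization policy P (assumed notation V_P):
\[
  \Pr[M(D_1) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D_2) \in S]
\]
% for every output set S and all D_1, D_2 that differ in a single record
% and satisfy V_P(D_1) = V_P(D_2).
```

The relative guarantee restricts the pairs of databases being compared: records the user is authorized to see are excluded from the comparison, since changing one of them would change the authorization views.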
 
Noisy View
 
Create noisy view DrugCounts(Drug, PatientCnt) as
   (Select Drug, count(*)
   From Patients right outer join Drugs on Drug
   Where (Select count(*) From Side-Effects
          Where Drug = Drugs.Drug) > 3
   Group by Drug)

Named
Non-deterministic
Rewriting is authorization aware
Can be part of grant-revoke statements just like regular views
 
Noisy View Examples
 
Select count(*)
From Patients
Where Disease = 'Cancer'

Select Disease, count(*)
From Patients
Group by Disease

Select Category, count(*)
From Patients join DiseaseCategory on Disease
Group by Category
Noisy View Architecture
[Diagram: DBMS containing an Execution Engine and an Authorization Aware Privacy Subsystem (Policy) over Tables, Views and Noisy Views. Q yields Results.]

Example query Q:
Select DrugCounts.Drug, Side-Effect, PatientCnt
From DrugCounts, Side-Effects
Where DrugCounts.Drug = Side-Effects.Drug

Enforce authorization
Rewrite as we saw before
 
Differential Privacy Parameters [SIGMOD09]
 
Need to set parameters: ε, Budget
Noisy View Architecture: Differential Privacy
Parameters
[Diagram: DBMS containing an Execution Engine and an Authorization Aware Privacy Subsystem over Tables, Views and Noisy Views. Input: (Q, ε), with Auth. Policy and Privacy Budget. Output: Results.]
Fall back to access control after budget exhausted
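
The budget fallback can be sketched as below. This is a minimal illustration under stated assumptions: the class `PrivacyBudget`, the function `answer_count`, Laplace noise with scale 1/ε, charging each query's ε against the overall budget, and summing the accurate and noisy parts are all choices made here, not details from the slides.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class PrivacyBudget:
    """Tracks the overall budget; each noisy answer spends its epsilon."""

    def __init__(self, total):
        self.remaining = total

    def try_spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            return False
        self.remaining -= epsilon
        return True

def answer_count(authorized_count, unauthorized_count, epsilon, budget, rng):
    """Answer (Q, epsilon): noisy while budget lasts, then access control only."""
    if budget.try_spend(epsilon):
        noisy_hidden = unauthorized_count + laplace_noise(1.0 / epsilon, rng)
        return authorized_count + noisy_hidden
    # Budget exhausted: fall back to access control, authorized rows only.
    return authorized_count

budget = PrivacyBudget(total=1.0)
rng = random.Random(0)
answers = [answer_count(3, 7, 0.5, budget, rng) for _ in range(3)]
```

With a total budget of 1.0 and ε = 0.5 per query, the first two answers include a noisy contribution from unauthorized rows and the third reflects only the 3 authorized rows.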
 
Conclusions and Future Work
 

Noisy view based architecture to incorporate privacy-preserving query answering with access control in a DBMS
Based on differential privacy
Needs minimal changes to engine
Guarantee: Differential privacy relative to authorizations
Baggage of differential privacy
Non-deterministic
Per-query privacy parameter
Overall privacy budget
Open Issues
Larger class of noisy views (can we support arbitrary SQL?)
Benchmark the privacy-utility tradeoff for complex data analysis, e.g. TPC-H, TPC-DS
Query Optimization
Integrating Access Control with other privacy models

Exploring the intersection of database access control and data privacy, this paper delves into the implications of data privacy concerns on Database Management Systems (DBMS). It discusses the need for more than just access control mechanisms and highlights the evolving landscape of data publishing, query answering, and fine-grained access control in the context of privacy-aware practices.

  • Database Management
  • Data Privacy
  • Access Control
  • Privacy Awareness
  • Query Answering

Uploaded on Oct 09, 2024 | 0 Views



