Database Access Control & Privacy: A Common Ground Explored


Exploring the intersection of database access control and data privacy, this paper delves into the implications of data privacy concerns on Database Management Systems (DBMS). It discusses the need for more than just access control mechanisms and highlights the evolving landscape of data publishing, query answering, and fine-grained access control in the context of privacy-aware practices.


Uploaded on Oct 09, 2024



Presentation Transcript


  1. Database Access Control & Privacy: Is There A Common Ground?
     Surajit Chaudhuri, Raghav Kaushik, and Ravi Ramamurthy
     Microsoft Research

  2. Data Privacy
     Databases have sensitive information:
     - Health care database: patient PII, disease information
     - Sales database: customer PII
     - Employee database: employee level, salary
     Data analysis carries the risk of privacy breach [FTDB 2009]:
     - Latanya Sweeney's identification of the governor of MA from medical records
     - AOL search logs
     - Netflix prize dataset
     Focus of this paper: What is the implication of data privacy concerns on the DBMS? Do we need any more than access control?

  3. Data Publishing
     Patients [FTDB2009]
       Name   Age  Gender  Zipcode  Disease
       Ann    28   F       13068    Heart disease
       Bob    21   M       13068    Flu
       Carol  24   F       13068    Viral disease
     K-Anonymity, L-Diversity, T-Closeness
     Patients-Anonymized
       Age      Gender  Zipcode  Disease
       [20-29]  F       1****    Heart disease
       [20-29]  M       1****    Flu
       [20-29]  F       1****    Viral disease
     Queries Q1 ... Qn
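
The generalization shown on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `generalize` and `min_group_size` are hypothetical, the bucketing rules (decade age ranges, first zipcode digit) are read off the slide's example, and the choice of quasi-identifier columns is an assumption.

```python
from collections import Counter

# Rows mirroring the Patients table on the slide.
patients = [
    {"name": "Ann", "age": 28, "gender": "F", "zipcode": "13068", "disease": "Heart disease"},
    {"name": "Bob", "age": 21, "gender": "M", "zipcode": "13068", "disease": "Flu"},
    {"name": "Carol", "age": 24, "gender": "F", "zipcode": "13068", "disease": "Viral disease"},
]


def generalize(row):
    """Drop the name, bucket age into a decade, keep only the first zipcode digit."""
    low = (row["age"] // 10) * 10
    return {
        "age": f"[{low}-{low + 9}]",
        "gender": row["gender"],
        "zipcode": row["zipcode"][0] + "*" * (len(row["zipcode"]) - 1),
        "disease": row["disease"],
    }


def min_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    A table is k-anonymous with respect to these columns if this value is >= k.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


anonymized = [generalize(r) for r in patients]
# Assumed quasi-identifier sets, chosen only to show how the result depends on them.
k_with_gender = min_group_size(anonymized, ["age", "gender", "zipcode"])
k_without_gender = min_group_size(anonymized, ["age", "zipcode"])
```

On these three rows, Bob is the only male in the `[20-29]` bucket, so the smallest group depends on whether gender is treated as a quasi-identifier.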

  4. Privacy-Aware Query Answering
     Patients [FTDB2009]
       Name   Age  Gender  Zipcode  Disease
       Ann    28   F       13068    Heart disease
       Bob    21   M       13068    Flu
       Carol  24   F       13068    Viral disease
     Queries Q1 ... Qn
     Differential Privacy, Privacy-Preserving OLAP
     Patients-Anonymized
       Age      Gender  Zipcode  Disease
       [20-29]  F       1****    Heart disease
       [20-29]  M       1****    Flu
       [20-29]  F       1****    Viral disease

  5. Data Publishing vs. Query Answering
     Jury is still out.
     Data publishing:
     - No impact on DBMS
     - De-identification algorithms over published data are getting increasingly sophisticated
     Need to take a hard look at the query answering paradigm:
     - Potential implications for DBMS
     - "An interactive, query-based approach is generally superior from the privacy perspective to the release-and-forget approach" [CACM 10]

  6. Is Privacy-Aware = (Fine-Grained) Access Control (FGA)?
     - Every user is allowed to view only a subset of the data (authorization view)
     - The subset is defined using a predicate
     - Queries are (logically) rewritten to go against the subset
     Example:
       Select * From Patients Where Patients.Physician = userID()
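
A minimal sketch of this kind of predicate-based rewriting, using Python's built-in `sqlite3` module. The table layout follows the slides; the `AUTH_PREDICATES` mapping, the `rewrite` and `select_all` helpers, and every row except Ann's are assumptions made for illustration, and the `:user_id` parameter stands in for the slide's `userID()` function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (Name TEXT, Disease TEXT, Drug TEXT, Physician TEXT)")
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Heart disease", "Lipitor", "Grey"),
        ("Bob", "Flu", "DrugX", "Stevens"),        # hypothetical row
        ("Carol", "Viral disease", "DrugY", "Grey"),  # hypothetical row
    ],
)

# Hypothetical policy store: one authorization predicate per table.
AUTH_PREDICATES = {"Patients": "Patients.Physician = :user_id"}


def rewrite(table):
    """Turn 'Select * From <table>' into a query over the table's authorization view."""
    return f"SELECT * FROM {table} WHERE {AUTH_PREDICATES[table]}"


def select_all(user_id):
    """Run the rewritten query on behalf of the given user."""
    return conn.execute(rewrite("Patients"), {"user_id": user_id}).fetchall()


grey_rows = select_all("Grey")   # only Grey's patients
yang_rows = select_all("Yang")   # Yang has no patients in this toy data
```

A real DBMS would apply the predicate inside the engine for arbitrary queries; here only `Select *` over a single table is handled.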

  7. Is Privacy-Aware = (Fine-Grained) Access Control (FGA)?
     - Every user is allowed to view only a subset of the data (authorization view)
     - The subset is defined using a predicate
     - Queries are (logically) rewritten to go against the subset
     Original query:
       Select Drug, count(*)
       From Patients right outer join Drugs on Drug
       Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
       Group by Drug
     Rewritten query:
       Select Drug, count(*)
       From Patients right outer join Drugs on Drug
       Where (Select count(*) From Side-Effects
              Where Drug = Drugs.Drug and auth(Side-Effects)) > 3
         and auth(Patients) and auth(Drugs)
       Group by Drug

  8. Authorization is Black and White
     Query: Count the number of cancer patients
     - Deny access to cancer patients: privacy
     - Grant access to cancer patients (return accurate count): utility

  9. Beyond Black and White: Differential Privacy [SIGMOD09]
     Query: Count the number of cancer patients
     - Perturb the output of the aggregate computation (requires no change in execution engine)
     - Need to set parameters: ε, budget
     Baggage:
     - Non-deterministic
     - Per-query privacy parameter
     - Overall privacy budget
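
One common way to perturb a count is the Laplace mechanism. The slide does not say which mechanism is meant, so the sketch below is an assumption: `laplace_noise` and `noisy_count` are hypothetical names, the noise scale `1/epsilon` relies on a count changing by at most 1 when a single record is added or removed, and the patient rows are made up.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Return the number of matching rows plus Laplace noise of scale 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in rows if predicate(row))
    # A count has sensitivity 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: two of three patients have cancer.
patients = [{"disease": "Cancer"}, {"disease": "Flu"}, {"disease": "Cancer"}]
estimate = noisy_count(patients, lambda r: r["disease"] == "Cancer", 0.5, random.Random())
```

Smaller values of `epsilon` give more noise. Each call returns a different value, which is the "non-deterministic" baggage listed on the slide.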

  10. Seeking Common Ground
      Access control:
      - Supports full generality of SQL
      - Black and white
      Differential privacy algorithms:
      - A principled way to go beyond black and white
      - Known mechanisms do not support full generality of SQL
      - Data analysis involves aggregation but also joins, sub-queries
      Can we get the best of both worlds?
      - Differential privacy = computation on unauthorized data
      - What is the implication on privacy guarantees?

  11. What Does Best of Both Worlds Look Like?
      Patients
        Name  Disease        Drug     Physician
        Ann   Heart disease  Lipitor  Grey
      Drugs
        Drug     Company
        Lipitor  Pfizer
      Side-Effects
        Drug     Side-Effect
        Lipitor  Muscle
        Lipitor  Liver
      Analysts
        Name         Employer
        JoeAnalyst   Pfizer
        JaneAnalyst  Merck
      FGA policy:
      - Each physician can see: records of their patients
      - Analyst can see: drug records manufactured by their employer; no patient records

  12. FGA
      Patients (Name, Disease, Drug, Physician); Physician column: Grey, Grey, Stevens, Stevens, Yang, Grey
      Query issued:
        Select * From Patients
      Rewritten:
        Select * From Patients Where Physician = userID()

  13. Differential Privacy
      User = JaneAnalyst
      Patients (Name, Disease, Drug, Physician); Disease column: Heart Disease, Flu, Cancer, Cancer, AIDS
      Query issued:
        Select count(*) From Patients Where Disease = Cancer
      Rewritten:
        Select count(*) + Noise From Patients Where Disease = Cancer

  14. Mix And Match: FGA + Differential Privacy
      Tables: Patients(Name, Disease, Drug, Physician); Drugs(Drug, Company): (Lipitor, Pfizer); Side-Effects(Drug, Side-Effect): (Lipitor, Muscle), (Lipitor, Liver); Analysts(Name, Employer): (JoeAnalyst, Pfizer), (JaneAnalyst, Merck)
      For each drug with more than 3 side-effects, count the number of patients who have been prescribed it:
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
        Group by Drug

  15. Architecture That Will Fail To Mix And Match
      [Architecture diagram. Labels: Q, Results; AggQ, Result(AggQ) + Noise; Differential Privacy API; AggQ, Result(AggQ); DBMS containing Authorization Subsystem, Policy, Execution Engine]

  16. Architecture That Will Fail To Mix And Match
      [Architecture diagram. Labels: Q, Result(AggQ) + Noise, Results; Wrapper; Differential Privacy API; AggQ, Result(AggQ); DBMS containing Authorization Subsystem, Policy, Execution Engine]

  17. Authorization-Aware Data Privacy
      [Architecture diagram. Labels: Q, Results; DBMS containing Authorization Aware Privacy Subsystem, Policy, Execution Engine]

  18. Query Rewriting
      Tables: Patients(Name, Disease, Drug, Physician); Drugs(Drug, Company): (Lipitor, Pfizer); Side-Effects(Drug, Side-Effect): (Lipitor, Muscle), (Lipitor, Liver); Analysts(Name, Employer): (JoeAnalyst, Pfizer), (JaneAnalyst, Merck)
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
        Group by Drug
      Non-aggregation: authorization
      What about aggregation?

  19. Query Rewriting
      Tables: Patients(Name, Disease, Drug, Physician); Drugs(Drug, Company): (Lipitor, Pfizer); Side-Effects(Drug, Side-Effect): (Lipitor, Muscle), (Lipitor, Liver); Analysts(Name, Employer): (JoeAnalyst, Pfizer), (JaneAnalyst, Merck)
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
        Group by Drug

  20. Query Rewriting
      For each authorized group, find noisy count
      Tables: Patients(Name, Disease, Drug, Physician); Drugs(Drug, Company): (Lipitor, Pfizer); Side-Effects(Drug, Side-Effect): (Lipitor, Muscle), (Lipitor, Liver); Analysts(Name, Employer): (JoeAnalyst, Pfizer), (JaneAnalyst, Merck)
      Authorized groups:
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects
               Where Drug = Drugs.Drug and auth(Side-Effects)) > 3
          and auth(Patients) and auth(Drugs)
        Group by Drug

  21. Query Rewriting
      For each authorized group, find:
      (1) Noisy count on unauthorized subset
      (2) Accurate count on authorized subset
      Tables: Patients(Name, Disease, Drug, Physician); Drugs(Drug, Company): (Lipitor, Pfizer); Side-Effects(Drug, Side-Effect): (Lipitor, Muscle), (Lipitor, Liver); Analysts(Name, Employer): (JoeAnalyst, Pfizer), (JaneAnalyst, Merck)
      Authorized groups:
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects
               Where Drug = Drugs.Drug and auth(Side-Effects)) > 3
          and auth(Patients) and auth(Drugs)
        Group by Drug
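
The two-part computation on this slide might look roughly like the Python sketch below. It is a simplification under stated assumptions: the side-effect sub-query is omitted, `counts_per_authorized_group` and its parameters are hypothetical names, Laplace noise is an assumed choice of mechanism, and all rows other than Ann's and the Lipitor/Pfizer pair are invented. The slide does not say how the two counts are combined, so they are returned separately.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def counts_per_authorized_group(patients, drugs, auth_drug, auth_patient, epsilon, rng):
    """For each drug the user may see, return (accurate count, noisy count).

    The accurate count covers patient rows the user is authorized to see;
    the noisy count covers the remaining (unauthorized) patient rows.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    result = {}
    for drug in drugs:
        if not auth_drug(drug):
            continue  # groups outside the authorization view are never revealed
        prescribed = [p for p in patients if p["drug"] == drug["drug"]]
        accurate = sum(1 for p in prescribed if auth_patient(p))
        unauthorized = len(prescribed) - accurate
        result[drug["drug"]] = (accurate, unauthorized + laplace_noise(1.0 / epsilon, rng))
    return result


drugs = [
    {"drug": "Lipitor", "company": "Pfizer"},
    {"drug": "DrugM", "company": "Merck"},  # hypothetical drug
]
patients = [
    {"name": "Ann", "drug": "Lipitor", "physician": "Grey"},
    {"name": "Bob", "drug": "Lipitor", "physician": "Stevens"},  # hypothetical row
    {"name": "Carol", "drug": "DrugM", "physician": "Grey"},     # hypothetical row
]

# JoeAnalyst works for Pfizer: Pfizer drugs are authorized, no patient records are.
result = counts_per_authorized_group(
    patients,
    drugs,
    auth_drug=lambda d: d["company"] == "Pfizer",
    auth_patient=lambda p: False,
    epsilon=0.5,
    rng=random.Random(),
)
```

For the analyst, the accurate part is always zero and the whole count comes from the noisy part; for a physician, `auth_patient` would instead select that physician's own patients.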

  22. Class of Queries
        Select Drug, count(*)
        From Patients right outer join Drugs on Drug
        Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
        Group by Drug
      Query features: aggregation, foreign key join, predicate, grouping
      Rewriting: go to unauthorized data for final aggregation
      Principled rewriting for arbitrary SQL: open problem

  23. Our Privacy Guarantee: Relative Differential Privacy
      Differential privacy, intuition: A computation is differentially private if its behavior is similar for any two databases D1 and D2 that differ in a single record.
      Relative differential privacy, intuition: A computation is differentially private relative to an authorization policy if its behavior is similar for any two databases D1 and D2 that differ in a single record and both result in the same authorization views.
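
The first intuition corresponds to the standard ε-differential privacy inequality. The second formula below is only a plausible reading of the slide's wording, not a definition taken from the paper; the symbols M (the randomized computation), P (the authorization policy), and V_P (the authorization views induced by P) are notation assumed here.

```latex
% Standard epsilon-differential privacy for a randomized computation M:
\Pr[M(D_1) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D_2) \in S]
\quad \text{for all output sets } S \text{ and all } D_1, D_2 \text{ differing in a single record.}

% Assumed reading of "relative to an authorization policy P":
\Pr[M(D_1) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D_2) \in S]
\quad \text{for all } S \text{ and all } D_1, D_2 \text{ differing in a single record with } V_P(D_1) = V_P(D_2).
```

Under this reading, the relative guarantee only constrains pairs of databases that the user could not already tell apart through the data they are authorized to see.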

  24. Noisy View
        Create noisy view DrugCounts(Drug, PatientCnt) as
          (Select Drug, count(*)
           From Patients right outer join Drugs on Drug
           Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3
           Group by Drug)
      - Named
      - Non-deterministic
      - Rewriting is authorization aware
      - Can be part of grant-revoke statements just like regular views

  25. Noisy View Examples
        Select count(*) From Patients Where Disease = Cancer

        Select Disease, count(*) From Patients Group by Disease

        Select Category, count(*)
        From Patients join DiseaseCategory on Disease
        Group by Category

  26. Noisy View Architecture
        Select Drug, Side-Effect, Cnt
        From DrugCounts, Side-Effects
        Where DrugCounts.Drug = Side-Effects.Drug
      - Rewrite as we saw before
      - Enforce authorization
      [Architecture diagram. Labels: Q, Results; Tables, Views, Noisy Views; DBMS containing Authorization Aware Privacy Subsystem, Policy, Execution Engine]

  27. Differential Privacy Parameters [SIGMOD09]
      Need to set parameters: ε, budget

  28. Noisy View Architecture: Differential Privacy Parameters
      Fall back to access control after budget exhausted
      [Architecture diagram. Labels: (Q, ε), Results; Tables, Views, Noisy Views; Auth. Policy, Privacy Budget; DBMS containing Authorization Aware Privacy Subsystem, Execution Engine]
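
The fallback behavior can be illustrated with a small budget tracker. Everything here is a hypothetical sketch: `PrivacyBudget`, `try_spend`, and `answer_count` are invented names, budgets are assumed to be consumed additively per query, Laplace noise is an assumed mechanism, and adding the accurate and noisy parts into one number is an assumption about how results are combined.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyBudget:
    """Tracks the overall privacy budget remaining for one user."""

    def __init__(self, total):
        self.remaining = total

    def try_spend(self, epsilon):
        """Deduct epsilon if enough budget is left; report whether it was deducted."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            return False
        self.remaining -= epsilon
        return True


def answer_count(authorized_rows, unauthorized_rows, epsilon, budget, rng):
    """Answer a count query submitted as (Q, epsilon).

    With budget left: accurate count on authorized rows plus a noisy count
    on unauthorized rows. With the budget exhausted: plain access control,
    i.e. the accurate count over authorized rows only.
    """
    accurate = len(authorized_rows)
    if not budget.try_spend(epsilon):
        return accurate
    return accurate + len(unauthorized_rows) + laplace_noise(1.0 / epsilon, rng)


budget = PrivacyBudget(total=1.0)
rng = random.Random()
authorized = ["Ann", "Carol", "Dan"]   # hypothetical authorized patient rows
unauthorized = ["Bob", "Eve"]          # hypothetical unauthorized patient rows
answers = [answer_count(authorized, unauthorized, 0.5, budget, rng) for _ in range(3)]
```

The first two queries each spend 0.5 and return noisy values; the third finds the budget exhausted and returns only the authorized count.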

  29. Conclusions and Future Work
      Noisy view based architecture to incorporate privacy-preserving query answering with access control in a DBMS:
      - Based on differential privacy
      - Needs minimal changes to engine
      - Guarantee: differential privacy relative to authorizations
      Baggage of differential privacy:
      - Non-deterministic
      - Per-query privacy parameter
      - Overall privacy budget
      Open issues:
      - Larger class of noisy views (can we support arbitrary SQL?)
      - Benchmark the privacy-utility tradeoff for complex data analysis, e.g. TPC-H, TPC-DS
      - Query optimization
      - Integrating access control with other privacy models
