Privacy-Preserving Analysis of Graph Structures

Slide Note
Embed
Share

Explore the world of graph structures and differential privacy in data publishing networks, focusing on preserving privacy while releasing structural information about graphs. Differential privacy techniques such as edge privacy and subgraph counts are discussed in detail, highlighting the challenges and ideal algorithms for ensuring privacy guarantees.


Uploaded on Jul 22, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Private Analysis of Graph Structure Grigory Yaroslavtsev http://grigory.us With Vishesh Karwa, Sofya Raskhodnikova and Adam Smith Pennsylvania State University 1

  2. Publishing network data Many data sets can be represented as a graph: Friendship in online social network Financial transactions Romantic relationships Publish information about a graph Preserve privacy of relationships Na ve approach: anonymization American J. Sociology, Bearman, Moody, Stovel 2

  3. Publishing network data Goal: Publish structural information about a graph Users Database Relationships Government, researchers, businesses (or) Malicious adversary queries answers) ( A Anonymization not sufficient [Backstr m, Dwork, Kleinberg 07, Narayanan, Shmatikov 09, Narayanan, Shi, Rubinstein 11] Ideal:Algorithms with rigorous privacy guarantee, no assumptions about attacker s prior information/algorithm 3

  4. Differential privacy [Dwork, McSherry, Nissim, Smith 06] Limits incremental information by hiding presence/absence of an individual relationship Database Users Relationships Government, researchers, businesses (or) Malicious adversary queries answers) ( A Neighbors: Graphs G and G that differ in one edge Answers on neighboring graphs should be similar 4

  5. Differential privacy for relationships ?-differential privacy (edge privacy) For all pairs of neighbors ?,? and all events S: ?? ? ? ? ??Pr ? ? ? Probability is over the randomness of A Definition requires that the distributions are close: A(G) A(G ) 5

  6. Subgraph counts For graphs G and H: # of occurrences of H in G Example: k-star k k-triangle 2-star: Total: 40 Total: 2 Triangle: Total: 1 2-triangle: k 6

  7. Subgraph counts Subgraph counts are used in: Exponential random graph models Descriptive graph statistics, e.g.: # # Clustering coefficient = Our focus: efficientdifferentially private algorithms for releasing subgraph counts 7

  8. Previous work Smooth Sensitivity [Nissim, Raskhodnikova, Smith 07] Differentially private algorithm for triangles Open: private algorithms for other subgraphs? Private queries with joins [Rastogi, Hay, Miklau, Suciu 09] Works for a wide range of subgraphs Weaker privacy guarantee, applies only for specific class of adversaries Private degree sequence [Hay, Li, Miklau, Jensen 09] Guarantees differential privacy Works for k-stars, but not for other subgraphs 8

  9. Laplace Mechanism and Sensitivity [Dwork, McSherry, Nissim, Smith 06] Add noise: mean = 0, standard deviation ~(??/?), where ?? is sensitivity => ?-differential privacy: ? ? = ? ? + ???(??/?) Local sensitivity ([NRS 07], not differentially private!): ? : ???????? ?? ?? ? ? ? ???? = max Previous work (mostly): Global sensitivity ??= ???= max ? ???(?) differentially private! 9

  10. Instance-Specific Noise ?? = set of all graphs on n vertices.d(G,G ) = # edges in which G and G differ. Smooth Sensitivity [Nissim, Raskhodnikova, Smith 07]: ???? ? ?? ?,? (G) ??,? ?? ???(G) (?) = max ? ?? Add Cauchy noise: median = 0, median absolute value ??,? ? ? = ? ? + ???? ?(??,? (G)/? (where ? = ? ?) => ?-differential privacy: /?) Na ve computation requires exponential time [NRS 07]: Compute smooth sensitivity for triangles 10

  11. Our contributions Differentially private algorithms for k-stars and k-triangles Efficiently compute smooth sensitivity for k-stars NP-hardness for k-triangles and k-cycles Different approach for k-triangles Average-case analysis in G(n,p) Theoretical comparison with previous work Experimental evaluation 11

  12. Smooth Sensitivity for k-stars ( ) This paper: near-linear time algorithm for smooth sensitivity Algorithm also reveals structural results, e.g.: Proposition: If(? < 1) and (maximum degree > ????? ?/?) then (smooth sensitivity) = (local sensitivity) Algorithm optimal for large class of graphs Proposition: error > const (local sensitivity) Compared to [HLMJ 09] (private degree sequence): Our error never worse by more than a constant factor For 2-stars, our error can be better by ( n/?) factor 12

  13. Private Approximation to Local Sensitivity: k-triangles ( ) Approximate differential privacy, ?,? -privacy [Dwork, Kenthapadi, McSherry, Mironov, Naor 06]: Pr ? ? ? ??Pr ? ? ? + ? Idea: Private upper bound on local sensitivity ( Release: ?(?) = ( ??, ? ? + ???( ??). ??/?)). If ?? is ?-differentially private and Pr ?? ?? 1 ? Then Then A is (2?,???)-differentially private. 13

  14. Evaluating our algorithms Theoretical evaluation in G(n,p) model All of our algorithms have relative error -> 0 when the average degree = ?? grows Empirical evaluation Synthetic graphs from G(n,p) model Variety of real data sets 14

  15. Experimental results for G(n,p) Comparison with previous work for ? =??? ? ? 10 2-Stars LS Barrier Relative Median Error Our algorithms 1 HLMJ 0.1 RHMS Lower RHMS upper 0.01 Relative Error = 1 0.001 5 % Relative Error 0 500 Nodes 1000

  16. Experimental results for G(n,p) Comparison with previous work for ? =??? ? ? 10000 3-Stars LS Barrier 1000 Relative Median Error Our algorithms 100 HLMJ 10 RHMS Lower 1 RHMS upper 0.1 Relative Error = 1 0.01 20 % Relative Error 0 200 400 600 800 1000 Nodes

  17. Experimental results for G(n,p) Comparison with [RHMS 09] for ? =??? ? ? 2 Triangles Triangles 1E+14 Relative Median Error 100000 1E+09 1000 10000 10 0.1 0.1 0 500 1000 0 500 1000 LS Barrier Our algorithms RHMS Lower RHMS upper Relative Error = 1

  18. Experimental results (SNAP) 2-triangles triangles 10 Relative Median Error 1 0.1 0.01 0.001 0.0001 ca-GrQc ca-HepTh ca-CondMat ca-HepPh Email-Enron ca-AstroPh n=5K n=10K n=23K n=12K n=37K n=19K m=29K m=52K m=187K m=237K m=368K m=396K

  19. Experimental results (SNAP) 1 2-stars HLMJ - 2-stars Relative Median Error 0.1 0.01 0.001 0.0001 ca-GrQc ca-HepTh ca-CondMat ca-HepPh Email-Enron ca-AstroPh n=5K n=10K n=23K n=12K n=37K n=19K m=29K m=52K m=187K m=237K m=368K m=396K

  20. Summary Private algorithms for subgraph counts Rigorous privacy guarantee (differential privacy) Running time close to best algorithms for computing the subgraph counts Improvement in accuracy and (for some graph counts) in privacy over previous work Techniques: Fast computation of smooth sensitivity Differentially private upper bound on local sensitivity 20

Related


More Related Content