Secure Computation Challenges and Solutions in Data Mining
Exploring the intersection of secure computation and data mining, this presentation covers three key challenges: moving beyond improving individual algorithms toward generic solutions, converting programs to run under secure computation, and preserving parallelizability. It highlights the role of cryptography in ensuring data privacy and presents a generic framework for running machine learning algorithms without revealing sensitive data.
Presentation Transcript
GraphSC: Parallel Secure Computation Made Easy. Kartik Nayak, with Xiao Shaun Wang, Stratis Ioannidis, Udi Weinsberg, Nina Taft, and Elaine Shi.
Data Mining on User Data: users hand their data to a data mining engine, which produces a model. Privacy concern!
Companies Computing on Private Data: one graph represents social connections, another represents professional connections; the goal is to compute a user's influence in both circles.
Companies want to run machine learning algorithms. Users and companies do NOT want to reveal their data. Can we enable this in practice?
Cryptography to the rescue: Secure Multiparty Computation ensures that we learn only the outcome of the computation.
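As a toy illustration of the "learn only the outcome" guarantee, the sketch below uses additive secret sharing over a modulus: each party holds only random-looking shares, and the parties jointly open just the sum. This is an assumption-laden illustration of the general idea, not the garbled-circuit protocol GraphSC actually builds on, and the function names are invented for the example.

    import secrets

    MOD = 2 ** 64

    def share(x, n=2):
        # Split secret x into n additive shares modulo MOD.
        parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
        parts.append((x - sum(parts)) % MOD)
        return parts

    def reconstruct(parts):
        return sum(parts) % MOD

    # Alice and Bob split their inputs; a single share reveals nothing about them.
    alice_shares = share(42)
    bob_shares = share(58)
    # Each party adds the shares it holds locally; only the final sum is opened.
    opened = reconstruct([(a + b) % MOD for a, b in zip(alice_shares, bob_shares)])
    assert opened == 42 + 58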
Key Challenge 1: Generic Solutions. A lot of prior work improves individual algorithms one at a time; we want a departure from that one-at-a-time approach.
Key Challenge 2: Convert programs to run on secure computation (the cost of obliviousness).
Key Challenge 3: Parallelizability. There is a lot of data, so we must retain the benefits of parallelism enjoyed in the insecure setting, even though computation under cryptography is expensive.
Key Contribution (addressing Challenge 1, generic solutions): a generic framework for graph-parallel algorithms in the style of Pregel, supporting PageRank, matrix factorization using ALS, matrix factorization using gradient descent, risk minimization using ADMM, and many more.
Key Contribution (addressing Challenge 2, converting programs to run on secure computation): efficiently convert graph-parallel programs to oblivious programs. The total work blowup is O(log |V|), compared with O(|V|) for the naive solution on sparse graphs.
Key Contribution (addressing Challenge 3, parallelizability): maintain parallelizability; the depth of the computation is O(log |V|). For matrix factorization on 4K ratings with 32 threads, prior work [NIWJTB'13] takes 1.4 hours versus under 4 minutes here.
Key Contributions: (1) a generic framework for graph-parallel algorithms, (2) efficient conversion to oblivious programs, and (3) maintained parallelizability.
The cryptographers' favorite model versus the programmer's favorite model. Programmers naturally write data-dependent code, such as a binary search over an array mem:

    function bs(val, s, t)
        mid = (s + t) / 2;
        if (val < mem[mid]) bs(val, s, mid)
        else bs(val, mid+1, t)
Programmers' model: programs. Cryptographers' model: circuits. In between sit oblivious programs: intuitively, programs whose traces do not depend on the input data.
Going from an oblivious program to a circuit is the easy step; the hard step is turning an arbitrary program, like the binary search above whose memory trace depends on the input, into an oblivious one.
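To make the intuition concrete, here is a small illustrative sketch (the names are invented for the example) contrasting a data-dependent array lookup with an oblivious linear scan whose memory trace is the same for every input; in an actual secure computation the comparison and selection would themselves be evaluated under encryption.

    def lookup_leaky(mem, idx):
        # The trace reveals idx: the single address touched depends on the input.
        return mem[idx]

    def lookup_oblivious(mem, idx):
        # The trace is input-independent: every cell is touched exactly once,
        # and the selection is a branch-free multiplex rather than an if.
        result = 0
        for i, value in enumerate(mem):
            is_match = int(i == idx)   # a secure comparison in a real protocol
            result = is_match * value + (1 - is_match) * result
        return result

    mem = [7, 1, 4, 0]
    assert lookup_oblivious(mem, 2) == lookup_leaky(mem, 2) == 4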
Achieving Parallelism: Oblivious Parallel RAM [BCP'14] achieves polylogarithmic blowup but is not practical. Our goal is low-depth circuits; GraphSC achieves an O(log |V|) blowup.
Graph-parallel algorithms, in the style of Pregel and related systems [LGKB'10, GLGBG'12, MABDHLC'10, ZCF'10].
Graph-parallel algorithms operate on the vertices and edges of a graph in three phases. Scatter: send data to edges. Gather: aggregate data from edges. Apply: perform some computation on each vertex.
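The scatter/gather/apply pattern can be sketched in plain, insecure Python roughly as follows; the helper name graph_parallel_round and the dictionary-based graph representation are choices made for this illustration, not GraphSC's API.

    # A minimal, insecure sketch of one scatter/gather/apply round.
    # vertex_data: dict vertex -> state; out_edges: dict vertex -> list of neighbor ids.
    def graph_parallel_round(vertex_data, out_edges, scatter, gather, apply_fn):
        # Scatter: every vertex pushes a message onto each of its outgoing edges.
        inbox = {v: [] for v in vertex_data}
        for u, nbrs in out_edges.items():
            for v in nbrs:
                inbox[v].append(scatter(vertex_data[u]))
        # Gather: every vertex aggregates the messages on its incoming edges, and
        # Apply: it computes a new state from its old state and the aggregate.
        return {v: apply_fn(vertex_data[v], gather(inbox[v])) for v in vertex_data}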
Obliviousness of graph-parallel algorithms requires two things: (1) do not reveal the edge or vertex data, and (2) do not reveal the structure of the graph. The naive solution costs O(|V|^2); our solution costs O(|E| log |V|).
Oblivious Gather, the key trick: obliviously sort the combined list of edge and vertex entries keyed by (v, isVertex), then aggregate in a single pass. Gather in the clear costs O(|E|); the oblivious sort costs O(|E| log |V|) and the single pass costs O(|E|), so oblivious gather costs O(|E| log |V|) in total.
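A rough, in-the-clear sketch of the trick is below; the tuple layout and helper name are illustrative rather than GraphSC's exact representation, and in the real construction the sort would be an oblivious sort (e.g. bitonic) and the branch in the scan would be an oblivious select evaluated inside the secure computation.

    def oblivious_gather(vertex_ids, edges):
        # vertex_ids: iterable of vertex ids; edges: list of (dst, payload) pairs.
        # Returns dst -> sum of the payloads on edges into dst.
        # Tag every element with isVertex and sort by (id, isVertex): each vertex
        # entry then immediately follows all edges addressed to it.
        entries = [(dst, 0, payload) for dst, payload in edges]
        entries += [(v, 1, 0) for v in vertex_ids]
        entries.sort(key=lambda e: (e[0], e[1]))
        # Single linear pass: carry a running aggregate and deposit it at each
        # vertex entry. Every element is touched exactly once, so the access
        # pattern does not depend on the graph structure.
        result, running = {}, 0
        for vid, is_vertex, payload in entries:
            if is_vertex:
                result[vid] = running
                running = 0
            else:
                running += payload
        return result

    assert oblivious_gather(['A', 'B'], [('A', 2), ('A', 3), ('B', 5), ('B', 1)]) == {'A': 5, 'B': 6}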
Complexity of Our Algorithms. Scatter and Gather: naive oblivious total work O(|V|^2); parallel oblivious total work O(|E| log |V|); parallel oblivious parallel time O(log |V|); sequential insecure total work O(|E|). Apply: naive oblivious total work O(|E|); parallel oblivious total work O(|E|); parallel oblivious parallel time O(1); sequential insecure total work O(|V|).
Algorithms on GraphSC: histogram computation, PageRank, matrix factorization using gradient descent, matrix factorization using alternating least squares, Bellman-Ford shortest path, bipartite matching, and parallel empirical risk minimization through the alternating direction method of multipliers (ADMM).
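As one example of how such algorithms fit the model, a PageRank iteration can be phrased in the scatter/gather/apply vocabulary roughly as below, reusing the insecure graph_parallel_round helper sketched earlier; the state layout and damping handling are simplifications for illustration, not GraphSC's implementation.

    DAMPING = 0.85

    def pagerank_iteration(ranks, out_edges):
        # State per vertex: (rank, out_degree). Scatter rank/out_degree along the
        # outgoing edges, gather by summing, apply the damped PageRank update.
        def scatter(state):
            rank, deg = state
            return rank / max(deg, 1)

        def gather(msgs):
            return sum(msgs)

        def apply_fn(state, agg):
            _, deg = state
            return ((1 - DAMPING) + DAMPING * agg, deg)

        return graph_parallel_round(ranks, out_edges, scatter, gather, apply_fn)

    # Tiny example graph: A -> B, B -> A, C -> A.
    out_edges = {'A': ['B'], 'B': ['A'], 'C': ['A']}
    ranks = {v: (1.0, len(out_edges[v])) for v in out_edges}
    ranks = pagerank_iteration(ranks, out_edges)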
Experimental Setup: Cloud 1 runs the garblers and Cloud 2 runs the evaluators. Two scenarios: (1) LAN, and (2) across data centers (WAN).
Key Evaluation Results, parallel time with 32 processors:
Histogram: 1K inputs in 4 sec; 0.5M inputs in 34 min.
PageRank (1 iteration): 4K inputs in 20 sec; 128K inputs in 15.5 min.
Matrix factorization (1 iteration) using gradient descent: 1K ratings in 47 sec; 32K ratings in 34 min.
Matrix factorization (1 iteration) using ALS: 64 ratings in 2 min; 4K ratings in 2.35 hours.
Running at Scale: matrix factorization using gradient descent on 1M ratings (6K users, 4K movies) [KBV'09] takes roughly 13 hours per iteration on our 7-machine cluster (128 processors, 525 GB RAM); those 13 hours can be driven down to a few minutes by using more machines. Prior work [NIWJTB'13] handled at most 16K ratings (64x less data), and on 4K ratings with 32 threads took 1.4 hours versus under 4 minutes here. We used only 7 machines!
Across Data Centers, PageRank: garblers in Oregon, evaluators in N. Virginia, with 2 Gbps of provisioned bandwidth. Time decreases linearly as the number of processors increases.
Conclusion: GraphSC is a parallel secure computation framework for graph-parallel algorithms. www.oblivm.com. Thank You! kartik@cs.umd.edu