Private Information Retrieval in Large-Scale Data Repositories

 
By Stelios Christou: schris10@ucy.ac.cy
 
https://www2.cs.ucy.ac.cy/courses/EPL646
 
1
 
EPL646: Advanced Topics in Databases
EPL646: Advanced Topics in Databases
 
Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta. 2023.
Private Information Retrieval in Large Scale Public Data Repositories. Proc. VLDB
Endow. 16, 12 (August 2023)
 
 
 
What is Private
Information Retrieval (PIR)
 
A protocol that allows clients to privately retrieve data, without anyone knowing the
executed query or the data returned
The server, the network and anyone spying on the network should not know or be
able to figure out this information
This is very useful as data has now moved to cloud
 
What is Private
Information Retrieval (PIR)
 
A protocol that allows clients to privately retrieve data, without anyone knowing the
executed query or the data returned
The server, the network and anyone spying on the network should not know or be
able to figure out this information
This is very useful as data has now moved to cloud
 
What is Private
Information Retrieval (PIR)
 
A protocol that allows clients to privately retrieve data, without anyone knowing the
executed query or the data returned
The server, the network and anyone spying on the network should not know or be
able to figure out this information
This is very useful as data has now moved to cloud
 
Can’t we just encrypt the data on the server?
 
No… Why?
 
We don’t own the server
Security concerns
 
No… Why?
 
We don’t own the server
Security concerns
 
Can’t we just save the database locally?
 
Basic Outline
 
Server has data stored in array 
A
,
size of 
n
 
 
Server returns response 
r
 to client
 
Client wants to retrieve data 
A[i]
 
Client sends query 
q 
to server
 
 
 
Client receives 
r
 
PIR Protocol Outline
 
Server has data stored in array 
A
,
size of 
n
 
 
Server returns 
encrypted
 response 
r
 to
client
 
Client wants to retrieve data 
A[i]
 
Client sends 
encrypted
 query 
q 
to
server
 
 
Client receives 
encrypted
 
r
Client decrypts 
r
 
PIR Protocol Must Guarantee
 
Correctness
Privacy
 
PIR Categories
 
Information Theoretic PIR (IT-PIR)
Array is replicated in multiple servers
Each server sends differ part of 
r
Client puts together the parts to get 
r
Servers must not communicate with
each other!
Hard to achieve
 
PIR Categories
 
Computational PIR (CPIR)
Array is stored in a single server
Server computes 
r
Client receives 
r
All of this is done with encrypted data!
 
Allows for encrypted data computation without the need of decryption
Uses two keys. Secret Key 
sk
, Public Key 
pk
 
Encrypt (v, pk):
  Encrypts message 
v
 using 
pk
. Encrypted message is a ciphertext 
c
 and its size
is bigger than the size of 
v
Decrypt (c, sk): 
Decrypts ciphertext 
c
 using 
sk
. Decrypted message is the original message 
v
Add (c1, c2): 
Adds the two ciphertexts 
(c1+c2) 
and gives the encryption of 
(v1+v2)
Multiply (c1, c2): 
Multiplies the two ciphertexts 
(c1*c2) 
and gives the encryption of 
(v1*v2)
MultiplyPlain (v1, c2): 
Same as Multiply but using one cipher and one plain text
 
CPIR - Homomorphic Encryption
 
CPIR - How Does it Work
(Client Query)
 
Generates 
sk 
and 
pk
Creates query of size 
n 
which includes values of 0 and 1
If digit 
i 
is 1, it means that the client wants to receive the data positioned at 
A[i]
If digit 
i 
is 0, it means that the client doesn’t care for the data at that index
Encry
pts 
q 
using 
Encrypt (q, pk) 
and sends it to server
 
CPIR - How Does it Work
(Server)
 
Receives encrypted 
q
Performs 
MultiplyPlain 
between
 q[i] 
and 
A[i]
Performs 
Add 
for every multiplication
 q[i]*A[i] 
to generate 
r
Returns 
r 
to client
 
CPIR - How Does it Work
(Client Response)
 
Receives 
r
Decrypts 
r
 using 
Decrypt (r, sk)
 
CPIR - Performance
 
Size of 
q
. Bigger means more execution time
Computation time to generate 
r
. Bigger means more execution time
Size of 
r
. Bigger means more execution time
Each affects the other!
Hard to improve performance
 
Multi-dimensional arrays
 
CPIR - Performance
 
Arrange 
q 
and 
A 
to
H
igher dimensions
 
Compute response
for 2 smaller queries
 
Multi-dimensional arrays
Smaller queries, bigger response size
 
CPIR - Performance
 
r
 
gets bigger with
every encryption
 
Retrieving Multiple Elements
 
Batch Code (method similar to HashMap)
Database records are split to groups and each group is assigned a code
Codes are distributed in buckets
Queries are encoded to a ser of codes
Server returns corresponding groups based on query codes
 
How does this work in real life though?
 
Real Life Application
 
Clients don’t usually know index 
i
Multiple element retrieval is applied first
Client picks desired result
Data is retrieved based on client’s choice
 
What if the server doesn’t save data in an
array structure?
 
Key - Value Retrieval
 
Data is saved in pairs of <key, value>
Client probably know the key and not the index!
 
PIR Protocol Extension
 
Keys are arranged in a binary tree
Phase 1: Search tree to find index based on key using Breadth First Search
Phase 2: Retrieve data based on index
 
PIR Protocol Extension
 
Keys are arranged in a binary tree
Phase 1: Search tree to find index based on key using Breadth First Search
Phase 2: Retrieve data based on index
Very time consuming for large amount of data!
What if data changes between the two phases?
 
Single Round Value Retrieval
 
Client sends encrypted key
Server performs equality checks to find the key
If equal return encrypted 1
If not equal, return encrypted 0
Builds an encrypted query 
q
, like before
 
Conclusion
 
Private Information Retrieval is possible
Research required to improve performance
Research required to apply on SQL queries and other types of queries
 
References
 
[Paper] Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta. 2023.
Private Information Retrieval in Large Scale Public Data Repositories. Proc. VLDB Endow.
16, 12 (August 2023), 3868–3871.
[Tutorial] https://sites.cs.ucsb.edu/~ishtiyaque/files/VLDB_2023_Tutorial.pdf
 
 
Thank you for your time!
Slide Note

University of Cyprus

Embed
Share

Private Information Retrieval (PIR) is a protocol that allows clients to retrieve data privately without revealing the query or returned data to the server or anyone spying on the network. Encrypting data on the server is not a solution due to security concerns related to server ownership. This advanced database topic is crucial as more data moves to the cloud, emphasizing the importance of privacy and security in data retrieval.

  • PIR
  • Large-Scale Data
  • Database Security
  • Cloud Data
  • Privacy Protection

Uploaded on Oct 04, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. EPL646: Advanced Topics in Databases Private Information Retrieval in Large Scale Public Data Repositories Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta. 2023. Private Information Retrieval in Large Scale Public Data Repositories. Proc. VLDB Endow. 16, 12 (August 2023) By Stelios Christou: schris10@ucy.ac.cy 1 https://www2.cs.ucy.ac.cy/courses/EPL646

  2. What is Private Information Retrieval (PIR) A protocol that allows clients to privately retrieve data, without anyone knowing the executed query or the data returned The server, the network and anyone spying on the network should not know or be able to figure out this information This is very useful as data has now moved to cloud

  3. What is Private Information Retrieval (PIR) A protocol that allows clients to privately retrieve data, without anyone knowing the executed query or the data returned The server, the network and anyone spying on the network should not know or be able to figure out this information This is very useful as data has now moved to cloud

  4. What is Private Information Retrieval (PIR) A protocol that allows clients to privately retrieve data, without anyone knowing the executed query or the data returned The server, the network and anyone spying on the network should not know or be able to figure out this information This is very useful as data has now moved to cloud

  5. Cant we just encrypt the data on the server?

  6. No Why? We don t own the server Security concerns

  7. No Why? We don t own the server Security concerns

  8. Cant we just save the database locally?

  9. Basic Outline Server has data stored in array A, size of n Client wants to retrieve data A[i] Client sends query q to server Server returns response r to client Client receives r

  10. PIR Protocol Outline Server has data stored in array A, size of n Client wants to retrieve data A[i] Client sends encrypted query q to server Server returns encrypted response r to client Client receives encrypted r Client decrypts r

  11. PIR Protocol Must Guarantee Correctness Privacy

  12. PIR Categories Information Theoretic PIR (IT-PIR) Array is replicated in multiple servers Each server sends differ part of r Client puts together the parts to get r Servers must not communicate with each other! Hard to achieve

  13. PIR Categories Computational PIR (CPIR) Array is stored in a single server Server computes r Client receives r All of this is done with encrypted data!

  14. CPIR - Homomorphic Encryption Allows for encrypted data computation without the need of decryption Uses two keys. Secret Key sk, Public Key pk Encrypt (v, pk): Encrypts message v using pk. Encrypted message is a ciphertext c and its size is bigger than the size of v Decrypt (c, sk): Decrypts ciphertext c using sk. Decrypted message is the original message v Add (c1, c2): Adds the two ciphertexts (c1+c2) and gives the encryption of (v1+v2) Multiply (c1, c2): Multiplies the two ciphertexts (c1*c2) and gives the encryption of (v1*v2) MultiplyPlain (v1, c2): Same as Multiply but using one cipher and one plain text

  15. CPIR - How Does it Work (Client Query) Generates sk and pk Creates query of size n which includes values of 0 and 1 If digit i is 1, it means that the client wants to receive the data positioned at A[i] If digit i is 0, it means that the client doesn t care for the data at that index Encrypts q using Encrypt (q, pk) and sends it to server

  16. CPIR - How Does it Work (Server) Receives encrypted q Performs MultiplyPlain between q[i] and A[i] Performs Add for every multiplication q[i]*A[i] to generate r Returns r to client

  17. CPIR - How Does it Work (Client Response) Receives r Decrypts r using Decrypt (r, sk)

  18. CPIR - Performance Size of q. Bigger means more execution time Computation time to generate r. Bigger means more execution time Size of r. Bigger means more execution time Each affects the other! Hard to improve performance

  19. CPIR - Performance Multi-dimensional arrays Arrange q and A to Higher dimensions Compute response for 2 smaller queries

  20. CPIR - Performance Multi-dimensional arrays Smaller queries, bigger response size r gets bigger with every encryption

  21. Retrieving Multiple Elements Batch Code (method similar to HashMap) Database records are split to groups and each group is assigned a code Codes are distributed in buckets Queries are encoded to a ser of codes Server returns corresponding groups based on query codes

  22. How does this work in real life though?

  23. Real Life Application Clients don t usually know index i Multiple element retrieval is applied first Client picks desired result Data is retrieved based on client s choice

  24. What if the server doesnt save data in an array structure?

  25. Key - Value Retrieval Data is saved in pairs of <key, value> Client probably know the key and not the index!

  26. PIR Protocol Extension Keys are arranged in a binary tree Phase 1: Search tree to find index based on key using Breadth First Search Phase 2: Retrieve data based on index

  27. PIR Protocol Extension Keys are arranged in a binary tree Phase 1: Search tree to find index based on key using Breadth First Search Phase 2: Retrieve data based on index Very time consuming for large amount of data! What if data changes between the two phases?

  28. Single Round Value Retrieval Client sends encrypted key Server performs equality checks to find the key If equal return encrypted 1 If not equal, return encrypted 0 Builds an encrypted query q, like before

  29. Conclusion Private Information Retrieval is possible Research required to improve performance Research required to apply on SQL queries and other types of queries

  30. References [Paper] Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta. 2023. Private Information Retrieval in Large Scale Public Data Repositories. Proc. VLDB Endow. 16, 12 (August 2023), 3868 3871. [Tutorial] https://sites.cs.ucsb.edu/~ishtiyaque/files/VLDB_2023_Tutorial.pdf

  31. Thank you for your time!

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#