Encrypted Deduplication for Secure Cloud Storage
Encrypted deduplication enhances data storage efficiency by eliminating duplicate chunks and encrypting outsourced data to protect against cloud service providers. This approach uses message-locked encryption and metadata management to store unique data and key mappings securely. By leveraging encryption and deduplication techniques, storage space is optimized while maintaining data security.
Presentation Transcript
Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection. Jingwei Li*, Patrick P. C. Lee#, Yanjing Ren*, and Xiaosong Zhang* (*University of Electronic Science and Technology of China; #The Chinese University of Hong Kong). MSST 2019.
Background: The global datasphere is expected to grow from 33 ZB in 2018 to 175 ZB in 2025 [1] (note: 1 ZB = 1 trillion GB). Outsourcing data management to public cloud storage is popular. Various cloud services apply deduplication to stored data in order to save maintenance costs. [1] https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
Deduplication: Deduplication is a form of coarse-grained compression whose unit is the chunk (fixed- or variable-size). Only one copy of chunks with the same content is stored; other chunks refer to that copy by references (pointers). In the slide's example, storage space is reduced by 5/12, or about 42%.
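The idea can be sketched in a few lines of Python. This is a minimal illustration, not the system's code: the 12-chunk workload is made up so that it reproduces the 5/12 saving quoted on the slide, and the names `fingerprint`, `deduplicate`, and `restore` are hypothetical.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by the hash of its content."""
    return hashlib.sha256(chunk).hexdigest()

def deduplicate(chunks):
    """Store one physical copy per distinct chunk; keep references for the rest."""
    store = {}    # fingerprint -> chunk content (the single stored copy)
    recipe = []   # ordered fingerprints, i.e. references into the store
    for chunk in chunks:
        fp = fingerprint(chunk)
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original chunk sequence by following the references."""
    return [store[fp] for fp in recipe]

# Made-up workload: 12 logical chunks, 7 of them distinct.
chunks = [b"A", b"B", b"C", b"A", b"D", b"B", b"E", b"A", b"F", b"C", b"G", b"A"]
store, recipe = deduplicate(chunks)
saving = 1 - len(store) / len(chunks)   # 5 of 12 chunks are never stored
print(f"stored {len(store)} of {len(chunks)} chunks, saving {saving:.0%}")
```

The recipe is what later slides call metadata: it grows with the logical data even when almost nothing new is stored.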
Encrypted Deduplication: Encrypted deduplication augments deduplication with encryption to secure outsourced data against cloud service providers. Message-locked encryption (MLE) [Bellare, EUROCRYPT 13] uses a key derived from the content of a chunk, which enables deduplication of the same data from distinct users. MLE instantiations: historical MLE [Bellare, EUROCRYPT 13] uses the chunk fingerprint as the key; server-aided MLE [Bellare, Security 13] derives the key from the chunk fingerprint and a global secret, which makes it robust against brute-force attacks.
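A rough sketch of the two key-derivation styles follows. Several things here are assumptions made for illustration: `toy_cipher` is a deterministic XOR stream built from SHA-256 and is not a secure cipher, and `server_aided_mle_key` uses a plain HMAC as a stand-in, whereas a real server-aided scheme has a key manager hold the secret and answer through an oblivious protocol so that it never sees the fingerprint.

```python
import hashlib
import hmac

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Deterministic XOR stream cipher from SHA-256 (toy only, NOT secure).
    Applying it twice with the same key decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def historical_mle_key(chunk: bytes) -> bytes:
    """Historical MLE: the key is the chunk fingerprint itself."""
    return hashlib.sha256(chunk).digest()

def server_aided_mle_key(chunk: bytes, global_secret: bytes) -> bytes:
    """Server-aided MLE (simplified): key depends on fingerprint AND a secret."""
    fingerprint = hashlib.sha256(chunk).digest()
    return hmac.new(global_secret, fingerprint, hashlib.sha256).digest()

# Two users holding the same chunk derive the same key, hence the same
# ciphertext, so the server can deduplicate without seeing the plaintext.
chunk = b"the same chunk uploaded by two different users"
alice_ct = toy_cipher(historical_mle_key(chunk), chunk)
bob_ct = toy_cipher(historical_mle_key(chunk), chunk)
print(alice_ct == bob_ct)
```

The deterministic key is what makes deduplication possible, and also what makes predictable chunks guessable under historical MLE.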
Metadata: In addition to storing non-duplicate data, encrypted deduplication storage keeps metadata. Fingerprint index: fingerprints of already stored chunks. Deduplication metadata: fingerprint-to-chunk information mappings (file recipe). Key metadata: fingerprint-to-key mappings (key recipe). Key recipes need to be managed separately from file recipes and protected by users' master keys.
Overall Metadata Storage: [Figure: metadata storage overhead in two backup workloads, FSL and VM. X axis: number of backups. Y axis: cumulative size of data or metadata.] The cumulative size of metadata approximates (FSL) or even exceeds (VM) that of physical data.
Metadata Storage Breakdown: [Figure: X axis: number of backups. Y axis: cumulative size of different metadata.] File recipes and key recipes take more than 99.5% of overall metadata storage.
Challenges: Grouping and re-chunking [Romanski, SYSTOR 11; Kruus, FAST 10] depend on prior knowledge of deduplication results. Compression [Meister, FAST 13] cannot be applied to key recipes. Key management [Li, TPDS 14; Zhou, MSST 15; Li, DSN 16] adds significant overhead to deduplication metadata or degrades the storage efficiency of data. Question: how to enable space-efficient metadata management?
Our Contributions: Metadedup is an encrypted deduplication storage system with space-efficient metadata management. It builds on indirection to apply deduplication to metadata and preserves security guarantees for both data and metadata. Extensive prototype and trace-driven evaluations show that Metadedup suppresses the storage space of metadata while incurring limited performance and indexing penalties.
Architecture: [Figure: clients (file, chunk) send encrypted data and metadata over the Internet to a server, which performs deduplication and keeps the fingerprint index, file recipes, key recipes, and non-duplicate chunks.] Targets backup workloads. Supports deduplication of data from multiple clients.
Design Goals: (1) Low storage overhead of metadata: suppress the storage space of file recipes and key recipes, and add only a small storage overhead to the fingerprint index. (2) Security for data and metadata: prevent an adversarial server from accessing any data or metadata, and prevent adversarial clients from accessing unauthorized data or metadata. (3) Limited performance overhead: add only a small performance overhead for writes.
Main Idea: Build an indirect index made of metadata chunks. Each metadata chunk stores the metadata of multiple consecutive chunks, and file recipes and key recipes store references to metadata chunks. Metadata chunks are highly redundant and therefore deduplicate well: changes to backups are limited to a few regions of data, and unchanged regions contain long sequences of data, and hence of metadata, in identical order.
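The effect of indirection can be illustrated with a small Python sketch. It is a simplification under stated assumptions: metadata entries are grouped into fixed runs of 100 chunks (instead of the variable-size segmentation the system uses), the per-chunk entry layout (fingerprint, size, key) is illustrative, and all names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def make_entry(chunk: bytes) -> bytes:
    """Per-chunk metadata entry: fingerprint + size + key (illustrative layout)."""
    fingerprint = h(b"fp" + chunk)    # stand-in for the ciphertext fingerprint
    key = h(b"key" + chunk)           # stand-in for the chunk's MLE key
    return fingerprint + len(chunk).to_bytes(4, "big") + key

def write_backup(chunks, segment_len, metadata_store):
    """Pack entries of consecutive chunks into metadata chunks, deduplicate
    them, and return a recipe holding only references to metadata chunks."""
    entries = [make_entry(c) for c in chunks]
    recipe = []
    for i in range(0, len(entries), segment_len):
        metadata_chunk = b"".join(entries[i:i + segment_len])
        ref = h(metadata_chunk)
        metadata_store.setdefault(ref, metadata_chunk)  # stored once if unseen
        recipe.append(ref)
    return recipe

# Two backups of 1000 chunks that differ in a single chunk.
backup1 = [f"chunk-{i}".encode() for i in range(1000)]
backup2 = list(backup1)
backup2[500] = b"modified"

store = {}
recipe1 = write_backup(backup1, 100, store)
recipe2 = write_backup(backup2, 100, store)
print(len(recipe1), len(recipe2), len(store))
```

Each recipe shrinks to 10 references, and the second backup adds only the one metadata chunk covering the modified region; the other nine are shared with the first backup.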
Segmentation: Metadedup works after the encryption procedure, when each data chunk has already been encrypted with MLE. It applies a variable-size segmentation scheme [Lillibridge, FAST 09] on the encrypted data chunks, creating a segment boundary whenever a chunk fingerprint matches a specific pattern. This addresses the boundary-shift problem and achieves high effectiveness of metadata deduplication.
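A minimal sketch of fingerprint-driven segmentation is given below. The boundary rule (fingerprint value modulo a divisor equals `divisor - 1`) and the `min_len`, `divisor`, and `max_len` parameters are assumptions chosen for illustration; the slides do not state the actual pattern or sizes.

```python
import hashlib

def segment(fingerprints, min_len=4, divisor=8, max_len=16):
    """Split a fingerprint sequence into variable-size segments.
    A boundary is placed after a chunk whose fingerprint matches the pattern,
    provided the segment has at least min_len chunks; max_len forces a cut."""
    segments, current = [], []
    for fp in fingerprints:
        current.append(fp)
        matches = int.from_bytes(fp[:8], "big") % divisor == divisor - 1
        if len(current) >= max_len or (len(current) >= min_len and matches):
            segments.append(current)
            current = []
    if current:
        segments.append(current)   # trailing partial segment
    return segments

# Insert one new chunk near the start of a 200-chunk stream.
fingerprints = [hashlib.sha256(f"chunk-{i}".encode()).digest() for i in range(200)]
edited = fingerprints[:3] + [hashlib.sha256(b"new").digest()] + fingerprints[3:]

segs_a = segment(fingerprints)
segs_b = segment(edited)
known = {tuple(s) for s in segs_a}
shared = sum(1 for s in segs_b if tuple(s) in known)
print(f"{shared} of {len(segs_b)} segments unchanged after the insertion")
```

Because boundaries depend on content rather than position, the two streams typically fall back into step at the first boundary they share, so segments after the edit are usually identical. With fixed-length grouping, a single insertion would shift every later segment.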
Metadata Management: [Figure: the data chunks of each segment yield a metadata chunk holding fingerprint, size, key, and other metadata; data chunks and metadata chunks go through deduplication and storage, while recipes (file recipe, key recipe) are storage only.] Encrypt metadata chunks with historical MLE. Apply deduplication to both data chunks and metadata chunks.
Operations: Write operation: the client generates and encrypts data chunks, constructs metadata chunks, and creates the file recipe and key recipe; the server stores non-duplicate data and metadata chunks, as well as file recipes and key recipes. Restore operation: the first stage retrieves file recipes, key recipes, and metadata chunks; the second stage retrieves data chunks and assembles the original file.
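The write path and the two-stage restore can be sketched end to end with an in-memory server. Everything concrete here is an assumption for illustration: the toy XOR cipher (not secure), the tiny fixed chunk and segment sizes, the single chunk store shared by data and metadata chunks, the entry layout, and the choice to keep metadata-chunk fingerprints in the file recipe and metadata-chunk keys in the key recipe.

```python
import hashlib

CHUNK, SEGMENT, ENTRY = 4, 3, 32 + 4 + 32   # toy sizes; entry = fp + size + key

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (NOT secure); self-inverse."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += h(key + counter.to_bytes(8, "big"))
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class Server:
    def __init__(self):
        self.chunks = {}        # fingerprint -> encrypted data or metadata chunk
        self.file_recipes = {}  # file name -> metadata chunk fingerprints
        self.key_recipes = {}   # file name -> encrypted metadata chunk keys

def write(server, name, data, master_key):
    entries = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        key = h(chunk)                                 # historical MLE key
        cipher = xor_cipher(key, chunk)
        server.chunks.setdefault(h(cipher), cipher)    # dedup data chunks
        entries.append(h(cipher) + len(chunk).to_bytes(4, "big") + key)
    refs, keys = [], []
    for i in range(0, len(entries), SEGMENT):
        meta = b"".join(entries[i:i + SEGMENT])
        mkey = h(meta)                                 # historical MLE on metadata
        mcipher = xor_cipher(mkey, meta)
        server.chunks.setdefault(h(mcipher), mcipher)  # dedup metadata chunks
        refs.append(h(mcipher))
        keys.append(mkey)
    server.file_recipes[name] = refs
    recipe_key = h(master_key + name.encode())         # per-file key from master key
    server.key_recipes[name] = xor_cipher(recipe_key, b"".join(keys))

def restore(server, name, master_key):
    # Stage 1: fetch recipes and metadata chunks, recover per-chunk entries.
    refs = server.file_recipes[name]
    packed = xor_cipher(h(master_key + name.encode()), server.key_recipes[name])
    keys = [packed[i:i + 32] for i in range(0, len(packed), 32)]
    entries = []
    for ref, mkey in zip(refs, keys):
        meta = xor_cipher(mkey, server.chunks[ref])
        entries += [meta[i:i + ENTRY] for i in range(0, len(meta), ENTRY)]
    # Stage 2: fetch and decrypt data chunks, assemble the file.
    out = b""
    for e in entries:
        fp, size, key = e[:32], int.from_bytes(e[32:36], "big"), e[36:]
        out += xor_cipher(key, server.chunks[fp])[:size]
    return out

server = Server()
data = b"abcd" * 6 + b"XYZ"
write(server, "backup-1", data, b"alice-master-key")
restored = restore(server, "backup-1", b"alice-master-key")
print(restored == data, len(server.chunks))
```

In this example the seven logical chunks collapse to two data chunks and two metadata chunks on the server. The extra round of fetching metadata chunks in stage 1 is the kind of indirection cost that shows up on the restore path.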
Security Analysis: Metadedup retains the same confidentiality for data chunks as encrypted deduplication: if server-aided MLE is applied, all data chunks are secure; if historical MLE is applied, unpredictable data chunks are secure. For metadata chunks there are two cases. Case I: server-aided MLE is applied on data chunks, and metadata chunks are fully protected. Case II: historical MLE is applied on data chunks, and it is infeasible to launch a brute-force attack against metadata chunks.
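The distinction between predictable and unpredictable chunks can be made concrete with a small offline brute-force sketch. It is illustrative only: the cipher is a toy XOR stream (not secure), the HMAC-based key is a simplified stand-in for server-aided MLE, and the candidate set of salary records is invented.

```python
import hashlib
import hmac

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (NOT secure)."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def historical_key(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()

def server_aided_key(chunk: bytes, secret: bytes) -> bytes:
    fingerprint = hashlib.sha256(chunk).digest()
    return hmac.new(secret, fingerprint, hashlib.sha256).digest()

def brute_force(ciphertext, candidates, derive_key):
    """Encrypt each guess under its own derived key and compare ciphertexts."""
    for guess in candidates:
        if xor_cipher(derive_key(guess), guess) == ciphertext:
            return guess
    return None

# A predictable chunk: the attacker can enumerate every plausible value.
target = b"salary=42000"
candidates = [f"salary={n}".encode() for n in range(1000, 100000, 1000)]

ct_historical = xor_cipher(historical_key(target), target)
ct_server_aided = xor_cipher(server_aided_key(target, b"global-secret"), target)

print(brute_force(ct_historical, candidates, historical_key))
print(brute_force(ct_server_aided, candidates, historical_key))
```

Under historical MLE the key is a public function of the content, so a small candidate set is enough to recover the plaintext. Once the key also depends on a secret the attacker does not hold, the same enumeration finds nothing.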
Implementation: Metadedup builds on our prior system CDStore [Li, ATC 15], augmenting it with metadata deduplication (~7.5K LoC in total). Open issues: protecting sensitive filenames via obfuscation; optimizing restore performance.
Experimental Setup: Testbed: Xeon 2.40GHz machines connected via a 1Gb/s switch. Datasets: FSL, file system snapshots (115 full daily backups, 56.20TB in total); VM, virtual machine image snapshots (26 full daily backups, 39.61TB in total). Evaluation goals: What is the performance penalty? What are the storage savings? Can Metadedup be further improved?
Performance: Md-X denotes a Metadedup instance with an average segment size of X. Metadedup adds a small write overhead (~4.05%) and a medium restore overhead. It incurs limited metadata processing overhead (details in our paper).
Overall Storage Efficiency:

Dataset  Components/Metrics  Raw      512KB   1MB     2MB     4MB
FSL      Total (GB)          369.646  12.211  10.767  10.428  10.704
FSL      Storage saving      -        97.07%  97.46%  97.55%  97.47%
FSL      Index overhead      -        1.94%   1.07%   0.60%   0.33%
VM       Total (GB)          615.177  27.438  26.981  37.900  42.204
VM       Storage saving      -        95.74%  95.81%  94.03%  93.33%
VM       Index overhead      -        1.91%   0.96%   0.70%   0.39%

Metadedup reduces the storage space of metadata while incurring limited indexing penalties (see the breakdown in our paper). For example, on FSL with 1MB segments it saves 97.46% of metadata storage at the expense of 1.07% index overhead.
Storage Saving: The storage saving of Metadedup is significantly higher than that of baseline compression approaches [Meister, FAST 13].
Index Overhead: Compared with baseline compression approaches [Meister, FAST 13], Metadedup adds a small index overhead.
Combined with Compression:

Dataset  Components/Metrics  Metadedup + ZC  Metadedup + PC
FSL      Total (GB)          10.679          9.949
FSL      Storage saving      97.48%          97.68%
FSL      Index overhead      1.07%           1.11%
VM       Total (GB)          26.682          26.630
VM       Storage saving      95.86%          95.85%
VM       Index overhead      0.96%           1.00%

Combination: apply Metadedup, followed by compression to suppress the metadata of metadata chunks. Metadedup is only marginally improved. Compression cannot be applied to physical metadata chunks.
Conclusion: Metadedup is an encrypted deduplication storage system with space-efficient metadata management. It builds on indirection to apply metadata deduplication. We implement a prototype and conduct extensive prototype and trace-driven evaluations. Software: http://adslab.cse.cuhk.edu.hk/software/metadedup/