Enhancing Data Integrity Protection in Cloud Storage Using Regenerating Codes

E
n
a
b
l
i
n
g
 
D
a
t
a
 
I
n
t
e
g
r
i
t
y
 
P
r
o
t
e
c
t
i
o
n
 
i
n
R
e
g
e
n
e
r
a
t
i
n
g
-
C
o
d
i
n
g
-
B
a
s
e
d
 
C
l
o
u
d
 
S
t
o
r
a
g
e
H
e
n
r
y
 
C
.
 
H
.
 
C
h
e
n
 
a
n
d
 
P
a
t
r
i
c
k
 
P
.
 
C
.
 
L
e
e
Department of Computer Science and Engineering
The Chinese University of Hong Kong
SRDS’12
1
C
l
o
u
d
 
S
t
o
r
a
g
e
Cloud storage is an emerging service model for
remote backup and data synchronization
Is cloud storage fully reliable?
2
 
P
r
o
b
l
e
m
s
 
i
n
 
C
l
o
u
d
 
S
t
o
r
a
g
e
3
C
l
o
u
d
 
S
t
o
r
a
g
e
 
R
e
q
u
i
r
e
m
e
n
t
s
Data integrity protection
Detect any corrupted data chunks stored on cloud
servers
Fault tolerance
Tolerate any cloud server failures
Efficient recovery
Recover any lost/corrupted data chunks with minimal
overhead
4
R
e
l
a
t
e
d
 
W
o
r
k
Single server
Provable Data Possession (PDP) 
[Ateniese et al. ’07]
 and
Proof of Retrievability (POR) 
[Juels et al. ’07]
Verify data integrity by spot-checking a fraction of large files
via cryptographic means
Multiple servers
MR-PDP 
[Curtmola et al. ’08]
Target replication
HAIL 
[Bowers et al. ’09]
Use erasure codes (e.g., Reed-Solomon codes) to provide
less storage overhead than replication
5
R
e
g
e
n
e
r
a
t
i
n
g
 
C
o
d
e
s
R
e
g
e
n
e
r
a
t
i
n
g
 
c
o
d
e
s
 
[
D
i
m
a
k
i
s
 
e
t
 
a
l
.
,
 
1
0
]
 
m
i
n
i
m
i
z
e
 
r
e
p
a
i
r
 
t
r
a
f
f
i
c
w
h
i
l
e
 
m
a
i
n
t
a
i
n
i
n
g
 
s
a
m
e
 
f
a
u
l
t
 
t
o
l
e
r
a
n
c
e
 
a
s
 
e
r
a
s
u
r
e
 
c
o
d
e
s
Implication: recover faster than erasure codes
Idea:
Do not reconstruct the whole file as in erasure codes
Instead, read only the chunks (smaller than whole file) that are
needed to recover the lost chunks
Build on network coding
N
C
C
l
o
u
d
 
[
H
u
 
e
t
 
a
l
.
,
 
1
2
]
 
p
r
o
v
i
d
e
s
 
a
n
 
i
m
p
l
e
m
e
n
t
a
b
l
e
 
d
e
s
i
g
n
 
o
f
r
e
g
e
n
e
r
a
t
i
n
g
 
c
o
d
e
s
Functional minimum storage regenerating (FMSR) codes
Designed for 
long-term archival storage
6
R
e
e
d
-
S
o
l
o
m
o
n
 
C
o
d
e
s
Conventional repair:
Reconstruct whole file and generate data in new server
A
B
A
+
B
A
+
2
B
B
A
+
B
A
A
A
B
File of 
size M
Server 1
Server 2
Server 3
Server 4
P
r
o
x
y
Reed Solomon codes
R
e
p
a
i
r
 
t
r
a
f
f
i
c
 
=
 
M
n
 
=
 
4
,
 
k
 
=
 
2
(
n
,
k
)
 
M
D
S
 
p
r
o
p
e
r
t
y
:
 
a
n
y
 
k
 
o
u
t
 
o
f
 
n
s
e
r
v
e
r
s
 
c
a
n
 
r
e
b
u
i
l
d
 
o
r
i
g
i
n
a
l
 
f
i
l
e
.
7
F
M
S
R
 
i
n
 
N
C
C
l
o
u
d
Code chunk
 P
i
 = linear combination of original native chunks
Repair in FMSR:
Download one code chunk from each surviving server
Reconstruct new code chunks via 
random linear combination
P
1
P
2
P
3
P
4
P
5
P
6
P
7
P
8
P
3
P
5
P
7
P
1
P
2
A
B
C
D
P
1
P
2
File of 
size M
P
r
o
x
y
n
 
=
 
4
,
 
k
 
=
 
2
FMSR codes
R
e
p
a
i
r
 
t
r
a
f
f
i
c
 
=
 
0
.
7
5
M
8
Server 1
Server 2
Server 3
Server 4
[Hu et al., FAST’12]
F
M
S
R
 
P
r
o
p
e
r
t
y
P
1
P
2
P
3
P
4
P
5
P
6
P
7
P
8
A
B
C
D
k(n-k) chunks
P
r
o
x
y
partition
encode
P
1
P
2
P
3
P
4
P
5
P
6
P
7
P
8
n(n-k) chunks
distribute
F
i
l
e
n=4, k=2
Servers (clouds)
c
1,1
   c
1,2
   c
1,3
   c
1,4
 
.
.
.
c
8,1
   c
8,2
   c
8,3
   c
8,4
 
A
B
C
D
P
1
P
2
P
3
P
4
P
5
P
6
P
7
P
8
Encoding matrix
rank = k(n-k)
Native chunks
C
o
d
e
 
c
h
u
n
k
s
9
F
M
S
R
 
C
o
d
e
s
FMSR codes
Can achieve up to 
50%
 repair traffic saving for
general k = n-2 (i.e., RAID-6 tolerance)
It is 
functional
, i.e., fault tolerance is preserved
It is suitable to 
long-term archival storage
 whose read
frequency is rare
Can we enable integrity protection atop
regenerating codes, while preserving repair
traffic saving?
10
O
u
r
 
C
o
n
t
r
i
b
u
t
i
o
n
s
B
u
i
l
d
 
a
 
F
M
S
R
-
D
I
P
 
c
o
d
e
,
 
w
h
i
c
h
 
e
n
a
b
l
e
s
 
d
a
t
a
i
n
t
e
g
r
i
t
y
 
p
r
o
t
e
c
t
i
o
n
,
 
f
a
u
l
t
 
t
o
l
e
r
a
n
c
e
,
 
a
n
d
 
e
f
f
i
c
i
e
n
t
r
e
c
o
v
e
r
y
Export tunable parameters from FMSR-DIP to
trade performance and security
Implement and evaluate FMSR-DIP atop a real
cloud storage testbed
11
F
M
S
R
-
D
I
P
 
G
o
a
l
s
T
h
r
e
a
t
 
m
o
d
e
l
:
 
B
y
z
a
n
t
i
n
e
,
 
m
o
b
i
l
e
 
a
d
v
e
r
s
a
r
y
[
B
o
w
e
r
s
 
e
t
 
a
l
.
 
0
9
]
exhibits arbitrary behavior
corrupts different subsets of servers over time
Design goals:
Preserve advantages of FMSR codes
Work on 
thin clouds 
(i.e., only basic PUT/GET
operations need to be supported)
Support 
byte sampling 
to minimize cost
12
F
M
S
R
-
D
I
P
:
 
O
v
e
r
v
i
e
w
F
M
S
R
-
D
I
P
U
s
e
r
s
N
C
C
l
o
u
d
F
o
u
r
 
o
p
e
r
a
t
i
o
n
s
:
 
U
p
l
o
a
d
,
 
C
h
e
c
k
,
 
D
o
w
n
l
o
a
d
 
a
n
d
 
R
e
p
a
i
r
13
S
t
o
r
a
g
e
i
n
t
e
r
f
a
c
e
F
M
S
R
c
o
d
e
 
c
h
u
n
k
s
F
M
S
R
-
D
I
P
c
o
d
e
 
c
h
u
n
k
s
S
e
r
v
e
r
s
 
/
 
c
l
o
u
d
s
F
M
S
R
-
D
I
P
:
 
U
p
l
o
a
d
8
 
F
M
S
R
 
c
o
d
e
 
c
h
u
n
k
s
,
 
3
 
b
y
t
e
s
 
e
a
c
h
14
F
M
S
R
-
D
I
P
:
 
U
p
l
o
a
d
A
p
p
l
y
 
e
r
r
o
r
-
c
o
r
r
e
c
t
i
n
g
 
c
o
d
e
 
(
E
C
C
)
t
o
 
e
a
c
h
 
c
h
u
n
k
 
i
n
d
i
v
i
d
u
a
l
l
y
15
F
M
S
R
-
D
I
P
:
 
U
p
l
o
a
d
X
O
R
 
e
a
c
h
 
b
y
t
e
 
w
i
t
h
 
a
p
s
e
u
d
o
r
a
n
d
o
m
 
v
a
l
u
e
16
F
M
S
R
-
D
I
P
:
 
U
p
l
o
a
d
F
o
r
 
e
a
c
h
 
c
h
u
n
k
,
 
c
a
l
c
u
l
a
t
e
t
h
e
 
M
A
C
 
o
f
 
t
h
e
 
f
i
r
s
t
 
3
 
b
y
t
e
s
17
F
M
S
R
-
D
I
P
:
 
U
p
l
o
a
d
18
Upload the chunks to clouds
Encrypt the metadata from NCCloud
(which contains the encoding matrix)
Append all MACs to metadata
Replicate metadata on all servers
F
M
S
R
-
D
I
P
:
 
C
h
e
c
k
P
i
c
k
 
a
 
r
o
w
 
t
o
 
c
h
e
c
k
19
F
M
S
R
-
D
I
P
:
 
C
h
e
c
k
X
O
R
 
w
i
t
h
 
t
h
e
 
p
r
e
v
i
o
u
s
 
p
s
e
u
d
o
r
a
n
d
o
m
v
a
l
u
e
s
,
 
a
n
d
 
c
h
e
c
k
 
t
h
e
i
r
 
c
o
n
s
i
s
t
e
n
c
y
20
F
M
S
R
-
D
I
P
:
 
C
h
e
c
k
R
e
c
a
l
l
 
t
h
a
t
 
f
r
o
m
 
F
M
S
R
 
e
n
c
o
d
i
n
g
:
 
A
 
x
 
=
 
P
A
 
=
 
e
n
c
o
d
i
n
g
 
m
a
t
r
i
x
 
w
i
t
h
 
r
a
n
k
 
=
 
k
(
n
-
k
)
x
 
=
 
r
o
w
 
o
f
 
n
a
t
i
v
e
 
c
h
u
n
k
s
P
 
=
 
r
o
w
 
o
f
 
c
o
d
e
 
c
h
u
n
k
s
P
 
i
s
 
a
 
l
i
n
e
a
r
 
c
o
m
b
i
n
a
t
i
o
n
 
o
f
 
x
Rank checking:
C
h
e
c
k
 
i
f
 
r
a
n
k
(
A
 
|
 
P
)
 
=
 
r
a
n
k
(
A
)
 
=
 
k
(
n
-
k
)
21
F
M
S
R
-
D
I
P
:
 
D
o
w
n
l
o
a
d
D
o
w
n
l
o
a
d
 
c
h
u
n
k
s
 
f
r
o
m
 
a
n
y
 
2
 
s
e
r
v
e
r
s
a
n
d
 
v
e
r
i
f
y
 
w
i
t
h
 
t
h
e
i
r
 
M
A
C
s
22
F
M
S
R
-
D
I
P
:
 
D
o
w
n
l
o
a
d
R
e
m
o
v
e
 
p
s
e
u
d
o
r
a
n
d
o
m
 
v
a
l
u
e
s
a
n
d
 
p
a
s
s
 
t
o
 
N
C
C
l
o
u
d
 
f
o
r
 
d
e
c
o
d
i
n
g
23
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
24
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
D
o
w
n
l
o
a
d
 
1
 
c
h
u
n
k
 
f
r
o
m
 
a
l
l
 
o
t
h
e
r
 
s
e
r
v
e
r
s
a
n
d
 
v
e
r
i
f
y
 
w
i
t
h
 
t
h
e
i
r
 
M
A
C
s
25
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
R
e
m
o
v
e
 
p
s
e
u
d
o
r
a
n
d
o
m
 
v
a
l
u
e
s
a
n
d
 
p
a
s
s
 
t
o
 
N
C
C
l
o
u
d
26
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
N
C
C
l
o
u
d
 
g
e
n
e
r
a
t
e
s
 
n
e
w
 
c
h
u
n
k
s
27
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
P
r
o
c
e
s
s
 
t
h
e
 
n
e
w
l
y
 
g
e
n
e
r
a
t
e
d
c
h
u
n
k
s
 
a
s
 
b
e
f
o
r
e
28
F
M
S
R
-
D
I
P
:
 
R
e
p
a
i
r
U
p
l
o
a
d
 
c
h
u
n
k
s
 
a
n
d
 
u
p
d
a
t
e
 
m
e
t
a
d
a
t
a
o
n
 
a
l
l
 
s
e
r
v
e
r
s
29
F
M
S
R
-
D
I
P
 
O
p
t
i
m
i
z
a
t
i
o
n
s
B
y
 
d
e
f
a
u
l
t
,
 
F
M
S
R
-
D
I
P
 
o
p
e
r
a
t
e
s
 
i
n
 
u
n
i
t
s
 
o
f
 
b
y
t
e
s
F
M
S
R
-
D
I
P
 
c
a
n
 
a
l
s
o
 
o
p
e
r
a
t
e
 
i
n
 
b
l
o
c
k
s
A block is a sequence of bytes
Better checking performance, but less security
We export different 
trade-off parameters 
that
tune between performance and security
We conduct preliminary security analysis on
FMSR-DIP
See details in paper and technical report
30
F
M
S
R
-
D
I
P
:
 
E
x
p
e
r
i
m
e
n
t
s
Testbed environment
Openstack Swift 1.4.2
1 proxy connected to storage servers over LAN
NCCloud and FMSR-DIP deployed on proxy
NCCloud uses RAMDisk as storage
Storage scheme
(4,2)-FMSR
Goal: evaluate FMSR-DIP overhead over FMSR
codes
31
R
u
n
n
i
n
g
 
T
i
m
e
 
v
s
.
 
F
i
l
e
 
S
i
z
e
FMSR-DIP overhead
comparable to network
transfer time in a LAN
environment
U
P
L
O
A
D
D
O
W
N
L
O
A
D
R
E
P
A
I
R
F
i
l
e
 
s
i
z
e
F
i
l
e
 
s
i
z
e
F
i
l
e
 
s
i
z
e
32
T
h
e
 
C
h
e
c
k
 
O
p
e
r
a
t
i
o
n
D
o
w
n
l
o
a
d
b
l
o
c
k
 
s
i
z
e
C
h
e
c
k
i
n
g
p
e
r
c
e
n
t
a
g
e
Bottleneck in
network transfer
1% check
256KB
download
block size
33
C
o
n
c
l
u
s
i
o
n
s
Propose a design for efficient data integrity
protection using FMSR on thin clouds
Implement and evaluate the efficiency of the
design
Source code:
h
t
t
p
:
/
/
a
n
s
r
l
a
b
.
c
s
e
.
c
u
h
k
.
e
d
u
.
h
k
/
s
o
f
t
w
a
r
e
/
f
m
s
r
d
i
p
/
34
B
a
c
k
u
p
35
R
e
c
a
l
l
:
 
F
M
S
R
 
E
n
c
o
d
i
n
g
36
Slide Note
Embed
Share

This paper explores the importance of data integrity protection in cloud storage and presents a solution using regenerating codes to detect corrupted data chunks, provide fault tolerance, and enable efficient recovery. It compares regenerating codes with Reed-Solomon codes and discusses their implications for faster data recovery. The use of network coding in NCCloud for implementing regenerating codes is also highlighted.

  • Cloud Storage
  • Data Integrity Protection
  • Regenerating Codes
  • Fault Tolerance
  • Network Coding

Uploaded on Sep 18, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Enabling Data Integrity Protection in Regenerating-Coding-Based Cloud Storage Henry C. H. Chen and Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong SRDS 12 1

  2. Cloud Storage Cloud storage is an emerging service model for remote backup and data synchronization Is cloud storage fully reliable? 2

  3. Problems in Cloud Storage 3

  4. Cloud Storage Requirements Data integrity protection Detect any corrupted data chunks stored on cloud servers Fault tolerance Tolerate any cloud server failures Efficient recovery Recover any lost/corrupted data chunks with minimal overhead 4

  5. Related Work Single server Provable Data Possession (PDP) [Ateniese et al. 07] and Proof of Retrievability (POR) [Juels et al. 07] Verify data integrity by spot-checking a fraction of large files via cryptographic means Multiple servers MR-PDP [Curtmola et al. 08] Target replication HAIL [Bowers et al. 09] Use erasure codes (e.g., Reed-Solomon codes) to provide less storage overhead than replication 5

  6. Regenerating Codes Regenerating codes [Dimakis et al., 10]minimize repair traffic while maintaining same fault tolerance as erasure codes Implication: recover faster than erasure codes Idea: Do not reconstruct the whole file as in erasure codes Instead, read only the chunks (smaller than whole file) that are needed to recover the lost chunks Build on network coding NCCloud[Hu et al., 12] provides an implementable design of regenerating codes Functional minimum storage regenerating (FMSR) codes Designed for long-term archival storage 6

  7. Reed-Solomon Codes Reed Solomon codes Repair traffic = M File of size M A A Server 1 B Proxy B Server 2 B A+B Server 3 A A A+B A+2B Server 4 n = 4, k = 2 (n,k) MDS property: any k out of n servers can rebuild original file. Conventional repair: Reconstruct whole file and generate data in new server 7

  8. [Hu et al., FAST12] FMSR in NCCloud File of size M A B C D P1 P2 P3 P4 P5 P6 P7 P8 FMSR codes Repair traffic = 0.75M Server 1 Proxy Server 2 P3 P5 P7 Server 3 P1 P2 P1 P2 Server 4 n = 4, k = 2 Code chunk Pi = linear combination of original native chunks Repair in FMSR: Download one code chunk from each surviving server Reconstruct new code chunks via random linear combination 8

  9. FMSR Property Servers (clouds) n(n-k) chunks Proxy P1 P2 P3 P4 P5 P6 P7 P8 P1 P2 P3 P4 P5 P6 P7 P8 k(n-k) chunks A B C D File partition encode distribute n=4, k=2 c1,1 c1,2 c1,3 c1,4 P1 P2 P3 P4 P5 P6 P7 P8 A . . . B C D c8,1 c8,2 c8,3 c8,4 Encoding matrix rank = k(n-k) Native chunks Code chunks 9

  10. FMSR Codes FMSR codes Can achieve up to 50% repair traffic saving for general k = n-2 (i.e., RAID-6 tolerance) It is functional, i.e., fault tolerance is preserved It is suitable to long-term archival storage whose read frequency is rare Can we enable integrity protection atop regenerating codes, while preserving repair traffic saving? 10

  11. Our Contributions Build a FMSR-DIP code, which enables data integrity protection, fault tolerance, and efficient recovery Export tunable parameters from FMSR-DIP to trade performance and security Implement and evaluate FMSR-DIP atop a real cloud storage testbed 11

  12. FMSR-DIP Goals Threat model: Byzantine, mobile adversary [Bowers et al. 09] exhibits arbitrary behavior corrupts different subsets of servers over time Design goals: Preserve advantages of FMSR codes Work on thin clouds (i.e., only basic PUT/GET operations need to be supported) Support byte sampling to minimize cost 12

  13. FMSR-DIP: Overview Servers / clouds FMSR FMSR-DIP code chunks code chunks file upload FMSR- DIP Storage interface NCCloud Users file download Four operations: Upload, Check, Download and Repair 13

  14. FMSR-DIP: Upload 8 FMSR code chunks, 3 bytes each 14

  15. FMSR-DIP: Upload Apply error-correcting code (ECC) to each chunk individually 15

  16. FMSR-DIP: Upload XOR each byte with a pseudorandom value 16

  17. FMSR-DIP: Upload For each chunk, calculate the MAC of the first 3 bytes 17

  18. FMSR-DIP: Upload Upload the chunks to clouds Encrypt the metadata from NCCloud (which contains the encoding matrix) Append all MACs to metadata Replicate metadata on all servers 18

  19. FMSR-DIP: Check Pick a row to check 19

  20. FMSR-DIP: Check XOR with the previous pseudorandom values, and check their consistency20

  21. FMSR-DIP: Check Recall that from FMSR encoding: Ax = P A = encoding matrix with rank = k(n-k) x = row of native chunks P = row of code chunks P is a linear combination of x Rank checking: Check if rank(A | P) = rank(A) = k(n-k) 21

  22. FMSR-DIP: Download Download chunks from any 2 servers and verify with their MACs 22

  23. FMSR-DIP: Download Remove pseudorandom values and pass to NCCloud for decoding23

  24. FMSR-DIP: Repair 24

  25. FMSR-DIP: Repair Download 1 chunk from all other servers and verify with their MACs 25

  26. FMSR-DIP: Repair Remove pseudorandom values and pass to NCCloud 26

  27. FMSR-DIP: Repair NCCloud generates new chunks 27

  28. FMSR-DIP: Repair Process the newly generated chunks as before 28

  29. FMSR-DIP: Repair Upload chunks and update metadata on all servers 29

  30. FMSR-DIP Optimizations By default, FMSR-DIP operates in units of bytes FMSR-DIP can also operate in blocks A block is a sequence of bytes Better checking performance, but less security We export different trade-off parameters that tune between performance and security We conduct preliminary security analysis on FMSR-DIP See details in paper and technical report 30

  31. FMSR-DIP: Experiments Testbed environment Openstack Swift 1.4.2 1 proxy connected to storage servers over LAN NCCloud and FMSR-DIP deployed on proxy NCCloud uses RAMDisk as storage Storage scheme (4,2)-FMSR Goal: evaluate FMSR-DIP overhead over FMSR codes 31

  32. Running Time vs. File Size 25 Time taken (s) 20 Transfer-Up DIP-Encode FMSR UPLOAD 15 10 5 0 FMSR-DIP overhead comparable to network transfer time in a LAN environment 100MB 50MB 20MB 10MB 5MB 1MB File size 8 Time taken(s) 6 DOWNLOAD Transfer-Down DIP-Decode FMSR 4 2 0 100MB 50MB 20MB 10MB 5MB 1MB File size 20 Time taken(s) Transfer-Up Transfer-Down DIP-Encode DIP-Decode FMSR 15 REPAIR 10 5 0 32 100MB 50MB 20MB 10MB 5MB 1MB File size

  33. The Check Operation 80 1% check 70 60 Time taken (s) Misc. Transfer-Down Rank Checking PRF 50 40 30 20 Bottleneck in network transfer 10 0 Download block size 256B 1KB 4KB 7KB 25KB 256KB 30 256KB download block size 25 Time taken (s) 20 Misc. Transfer-Down Rank Checking PRF 15 10 5 0 100% 75% 50% 25% 10% 5% 1% Checking percentage 33

  34. Conclusions Propose a design for efficient data integrity protection using FMSR on thin clouds Implement and evaluate the efficiency of the design Source code: http://ansrlab.cse.cuhk.edu.hk/software/fmsrdip/ 34

  35. Backup 35

  36. Recall: FMSR Encoding P1 c1,1 c1,2 c1,3 c1,4 P2 c2,1 c2,2 c2,3 c2,4 A P3 c3,1 c3,2 c3,3 c3,4 c4,1 c4,2 c4,3 c4,4 B P4 c5,1 c5,2 c5,3 c5,4 C P5 c6,1 c6,2 c6,3 c6,4 D P6 c7,1 c7,2 c7,3 c7,4 P7 c8,1 c8,2 c8,3 c8,4 P8 Encoding matrix rank = k(n-k) Native chunks Code chunks 36

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#