File System Performance and Optimization Techniques

Announcements
P3 Grading: Available
Look at runtests.log for details (10 points for creating test cases)
Contact TA (give login and discussion section) with questions
P4:  Threads (Part a and b) available
Due Wednesday 11/18 at 9pm
Exam 2: Answers with explanations available
Exam 3: Thursday evening at 11/19 7:15-9:15
Fill out form AND send academic schedule if need alternate time by 5pm
Similar in style to previous exams (very few questions about Virtualization; few
about Concurrency)
Read as we go along!
Chapter 41
Persistence:
FAST FILE SYSTEM (FFS)
Questions answered in this lecture:
How to improve performance of complex system?
Why do file systems obtain worse performance over time?
How to choose the right block size? How to avoid internal fragmentation?
How to place related blocks close to one another on disk?
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 537
Introduction to Operating Systems
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
File-System 
File-System 
Case Studies
Case Studies
Local
-
 
F
F
S
:
 
F
a
s
t
 
F
i
l
e
 
S
y
s
t
e
m
-
 
L
F
S
:
 
L
o
g
-
S
t
r
u
c
t
u
r
e
d
 
F
i
l
e
 
S
y
s
t
e
m
Network
-
 
N
F
S
:
 
N
e
t
w
o
r
k
 
F
i
l
e
 
S
y
s
t
e
m
-
 
A
F
S
:
 
A
n
d
r
e
w
 
F
i
l
e
 
S
y
s
t
e
m
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
b
i
t
m
a
p
s
inodes
data blocks
regular data
directories
indirect blocks
Review: Basic 
Review: Basic 
Layout
Layout
What is stored as a data block?
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
REVIEW: 
create /foo/bar
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
create /foo/bar
read
read
read
read
[traverse]
Verify that 
bar does not already exist
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
create /foo/bar
read
read
read
read
read
write
[allocate inode]
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
create /foo/bar
read
read
read
read
read
write
read
write
[populate inode]
Why must 
read
 bar inode?
How to initialize inode?
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
create /foo/bar
read
read
read
read
read
write
write
read
write
write
[add bar to /foo]
Update inode (e.g., size) and data for directory 
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
open /foo/bar
data
bar
read
read
read
read
read
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
write to /foo/bar
 (assume file exists and has been opened)
bar
data
read
read
write
write
write
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
append to /foo/bar
data
bar
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
append to /foo/bar
 (opened already)
bar
data
read
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
append to /foo/bar
bar
data
read
read
write
[allocate block]
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
append to /foo/bar
bar
data
read
read
write
write
[point to block]
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
append to /foo/bar
bar
data
read
read
write
write
write
[write to block]
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
read /foo/bar
 – assume opened
data
bar
read
read
write
undefined
data
inode
root
foo
bar
root
foo
bitmap
bitmap
inode
inode
inode
data
data
close /foo/bar
data
bar
nothing to do on disk!
Review: 
Review: 
Locality Types
Locality Types
time
address
S
p
a
t
i
a
l
 
L
o
c
a
l
i
t
y
time
address
T
e
m
p
o
r
a
l
 
L
o
c
a
l
i
t
y
Which type of locality is most interesting with a disk?
Order Matters
Order Matters
time
address
F
a
s
t
time
address
S
l
o
w
Implication for disk schedulers?
Policy: Choose Inode,
Policy: Choose Inode,
Data Blocks
Data Blocks
0
7
D
D
D
D
D
D
D
D
8
15
D
D
D
D
D
D
D
D
16
23
D
D
D
D
D
D
D
D
24
31
S
i
d
I
I
I
I
I
Assuming all free, which should be chosen?
Bad File Layout
Bad File Layout
0
7
D
D
D
D
D
D
D
D
8
15
D
D
D
D
D
D
D
D
16
23
D
D
D
D
D
D
D
D
24
31
S
i
d
I
I
I
I
I
0
1
2
3
inode
Better File Layout
Better File Layout
0
7
D
D
D
D
D
D
D
D
8
15
D
D
D
D
D
D
D
D
16
23
D
D
D
D
D
D
D
D
24
31
S
i
d
I
I
I
I
I
0
1
2
3
inode
Best File Layout
Best File Layout
0
7
D
D
D
D
D
D
D
D
8
15
D
D
D
D
D
D
D
D
16
23
D
D
D
D
D
D
D
D
24
31
S
i
d
I
I
I
I
I
0
1
2
3
inode
Can’t do this for all files 
undefined
Fast File System
Fast File System
:
:
FFS
FFS
(1980’s)
(1980’s)
System Building
System Building
B
e
g
i
n
n
e
r
s
 
a
p
p
r
o
a
c
h
1. get idea
2. build it!
P
r
o
 
a
p
p
r
o
a
c
h
1. identify 
existing 
state of the art
2. measure it, identify 
and understand 
problems
3. get idea
 (solutions often flow from deeply understanding problem)
4. build it!
measure then build
Measure 
Measure 
Old FS
Old FS
State of the art: original UNIX file system
State of the art: original UNIX file system
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Free lists are embedded in inodes, data blocks
Data blocks are 512 bytes
Measure 
throughput for whole sequential file reads/writes
Compare to theoretical max,
 which is…
O
l
d
 
U
N
I
X
 
f
i
l
e
 
s
y
s
t
e
m
:
 
a
c
h
i
e
v
e
d
 
o
n
l
y
 
2
%
 
o
f
 
p
o
t
e
n
t
i
a
l
.
 
 
W
h
y
?
 
disk bandwidth
Measurement 1
Measurement 1
: Aging?
: Aging?
 
What is performance before/after 
What is performance before/after 
aging
aging
?
?
N
N
e
e
w
w
 
 
F
F
S
S
:
:
 
 
1
1
7
7
.
.
5
5
%
%
 
 
o
o
f
f
 
 
d
d
i
i
s
s
k
k
 
 
b
b
a
a
n
n
d
d
w
w
i
i
d
d
t
t
h
h
F
F
e
e
w
w
 
 
w
w
e
e
e
e
k
k
s
s
 
 
o
o
l
l
d
d
:
:
 
 
3
3
%
%
 
 
o
o
f
f
 
 
d
d
i
i
s
s
k
k
 
 
b
b
a
a
n
n
d
d
w
w
i
i
d
d
t
t
h
h
Problem: 
Problem: 
FS is 
FS is 
becomes 
becomes 
fragmented over time
fragmented over time
Free list makes contiguous chunks hard to find
Free list makes contiguous chunks hard to find
 
Hacky Solutions:
Hacky Solutions:
Occassional defrag of disk
Occassional defrag of disk
Keep freelist sorted
Keep freelist sorted
 
How does 
block size
 affect performance?
Try doubling it!
 
R
e
s
u
l
t
:
 
P
e
r
f
o
r
m
a
n
c
e
 
m
o
r
e
 
t
h
a
n
 
d
o
u
b
l
e
d
 
Why double the performance?
Logically adjacent blocks not physically adjacent
Only half as many seeks+rotations now required
 
Why 
more
 than double the performance?
Smaller blocks 
require more indirect blocks
Measurement 2
Measurement 2
:
:
Block SIZE?
Block SIZE?
Old FS Summary
Old FS Summary
Free list becomes scrambled 
Free list becomes scrambled 
 
 
random allocations
random allocations
Small blocks (512 bytes)
Small blocks (512 bytes)
Blocks laid out poorly
Blocks laid out poorly
 long distance between inodes/data
 long distance between inodes/data
 
 
related 
related 
inodes not close to one another
inodes not close to one another
Which inodes related?
Which inodes related?
R
R
e
e
s
s
u
u
l
l
t
t
:
:
 
 
2
2
%
%
 
 
o
o
f
f
 
 
p
p
o
o
t
t
e
e
n
n
t
t
i
i
a
a
l
l
 
 
p
p
e
e
r
r
f
f
o
o
r
r
m
m
a
a
n
n
c
c
e
e
!
!
 
 
(
(
a
a
n
n
d
d
 
 
w
w
o
o
r
r
s
s
e
e
 
 
o
o
v
v
e
e
r
r
 
 
t
t
i
i
m
m
e
e
)
)
Problem: old FS treats disk like RAM!
 
Inodes in same directory (ls –l)
Solution: a disk-aware
Primary File System Design Questions:
Primary File System Design Questions:
Where 
Where 
to place 
to place 
meta-data and 
meta-data and 
data on disk
data on disk
?
?
How to use big blocks without wasting space?
How to use big blocks without wasting space?
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Placement 
Placement 
Technique 1:
Technique 1:
Bitmaps
Bitmaps
Use bitmaps instead of free list
Provides 
better speed
, with more global view
Faster to find contiguous free blocks 
b
i
t
m
a
p
s
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Placement 
Placement 
Technique 2:
Technique 2:
Groups
Groups
b
i
t
m
a
p
s
before: whole disk
fast
How to keep inode close to data?
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Technique 2: Groups
Technique 2: Groups
b
i
t
m
a
p
s
before: whole disk
slow
How to keep inode close to data?
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Technique 2: Groups
Technique 2: Groups
b
i
t
m
a
p
s
before: whole disk
slower
How to keep inode close to data?
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Technique 2: Groups
Technique 2: Groups
b
i
t
m
a
p
s
before: whole disk
slowest
How to keep inode close to data?
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
N
Technique 2: Groups
Technique 2: Groups
b
i
t
m
a
p
s
before: whole disk
How to keep inode close to data?
Technique 2: Groups
Technique 2: Groups
group 1
0
G
2G
3G
group 2
group 3
How to keep inode close to data?
Answer: Use groups across disks;
Try to place inode and data in same
group
Technique 2: Groups
Technique 2: Groups
strategy: allocate inodes and data blocks in same group.
group 1
0
G
2G
3G
group 2
group 3
fast
fast
fast
Groups
Groups
In FFS, groups were ranges of cylinders
In FFS, groups were ranges of cylinders
 - called 
 - called 
cylinder group
cylinder group
In ext2-4, groups are ranges of blocks
In ext2-4, groups are ranges of blocks
 - called 
 - called 
block group
block group
Placement 
Placement 
Technique 3:
Technique 3:
Super Rotation
Super Rotation
group 1
0
G
2G
3G
group 2
group 3
Is it useful to have multiple super blocks?
 
Yes, if some (but not all) fail.
undefined
Problem
Problem
Old FS: 
All super-block
 
copies are on
 
the top platter.
Correlated failures!  
What if 
 top platter d
ies?
solution: for each group, store super-block at different offset
Technique 4:
Technique 4:
Block Size
Block Size
Observation: 
Observation: 
Doubling the block size for the old FS over
Doubling the block size for the old FS over
doubled performance.
doubled performance.
Strategy: choose block size so 
Strategy: choose block size so 
never 
never 
read more than two
read more than two
indirect blocks
indirect blocks
 (i.e., double indirect)
 (i.e., double indirect)
 to 
 to 
reach 
reach 
data block
data block
.
.
With 4KB block size, how large of a file can they support?
With 4KB block size, how large of a file can they support?
(Blocksize / 4 bytes) * (Blocksize / 4bytes) * Blocksize = 4 GB
Blocksize^3 = 256 MB 
Technique:
Technique:
Large
Large
R
R
 Blocks
 Blocks
Most file are very
Most file are very
small, even today!
small, even today!
Observation: Doubling block size for old FS over doubled performance
Why not make blocks huge?
Large
Large
R
R
 Blocks
 Blocks
Lots of waste 
Lots of waste 
due to internal fragment in most 
due to internal fragment in most 
block
block
s
s
Time vs. Space
Time vs. Space
 t
 t
radeoffs…
radeoffs…
Solution: Fragments
Solution: Fragments
Hybrid
Hybrid
 – combine best of large blocks and best of small
 – combine best of large blocks and best of small
blocks
blocks
Use large block when file is large enough
Use large block when file is large enough
Introduce “fragment” for files that use parts of blocks
Introduce “fragment” for files that use parts of blocks
Only tail of file uses fragments
Only tail of file uses fragments
Fragment Example
Fragment Example
Block size = 4096
Block size = 4096
Fragment size = 1024
Fragment size = 1024
bits: 0000
 
0000
 
1111
 
0010
  
blk1
 
 blk2
 
 blk3
 
 blk4
Whether addr refers to block or fragment is inferred by file offset
What about when files grow?
Must copy fragments to new block if no room to grow
undefined
AAAA
file, size 5KB
file, size 2KB
B
A
B
undefined
AAAA
file, size 6KB
file, size 2KB
B
A
B
A
append A to first file
undefined
AAAA
file, size 6KB
file, size 2KB
B
A
B
A
undefined
AAAA
file, size 7KB
file, size 2KB
B
A
B
A
append A to first file
Not allowed to use fragments across multiple blocks!
What to do instead?
A
undefined
AAAA
AAAA
file, size 8KB
file, size 2KB
B
B
append A to first file, 
copy to fragments to new block
Optimal Write Size
Optimal Write Size
Writing less than a block is inefficient
Writing less than a block is inefficient
Solution: new API exposes optimal write size
Solution: new API exposes optimal write size
Smart Policy
Smart Policy
Where should new inodes and data blocks go?
group 1
0
G
2G
3G
group 2
group 3
Strategy
Strategy
 
Put related pieces of data near each other.
Put related pieces of data near each other.
Rules:
Rules:
1. Put directory entries near directory inodes.
1. Put directory entries near directory inodes.
2. Put inodes near directory entries.
2. Put inodes near directory entries.
3. Put data blocks near inodes.
3. Put data blocks near inodes.
Sound good?
Sound good?
Problem: File system is one big tree
Problem: File system is one big tree
All directories and files have a common root.
All directories and files have a common root.
All data in  same FS is related in some way
All data in  same FS is related in some way
Trying to put everything near everything else doesn’t make any
Trying to put everything near everything else doesn’t make any
choices!
choices!
Revised Strategy
Revised Strategy
Put more-related pieces of data near each other
Put more-related pieces of data near each other
Put less-related pieces of data 
Put less-related pieces of data 
far
far
 from each other
 from each other
FFS developers used their best judgement
FFS developers used their best judgement
inode
dir data
file inode
dir inode
B
1
B
2
B
3
B
1
B
2
Where to cut the tree and start
growing into another group?
pointer
related
many
FFS puts dir inodes
 
in a new group
break
ls
” is fast on directories
 
with many files.
Preferences
Preferences
F
i
l
e
 
i
n
o
d
e
s
:
 
a
l
l
o
c
a
t
e
 
i
n
 
s
a
m
e
 
g
r
o
u
p
 
w
i
t
h
 
d
i
r
D
i
r
 
i
n
o
d
e
s
:
 
a
l
l
o
c
a
t
e
 
i
n
 
n
e
w
 
g
r
o
u
p
 
w
i
t
h
 
f
e
w
e
r
 
u
s
e
d
 
i
n
o
d
e
s
 
t
h
a
n
a
v
e
r
a
g
e
 
g
r
o
u
p
F
i
r
s
t
 
d
a
t
a
 
b
l
o
c
k
:
 
a
l
l
o
c
a
t
e
 
n
e
a
r
 
i
n
o
d
e
O
t
h
e
r
 
d
a
t
a
 
b
l
o
c
k
s
:
 
a
l
l
o
c
a
t
e
 
n
e
a
r
 
p
r
e
v
i
o
u
s
 
b
l
o
c
k
Problem: Large Files
Problem: Large Files
S
ingle large file can 
fill 
nearly all of a group
D
isplaces data for many small files
Better 
to do one seek for large file than one seek for each of
many small files
inode
dir data
file inode
dir inode
B
1
B
2
B
3
B
1
B
2
Large files: wh
ere to cut the tree and start
growing into another group?
pointer
related
many
break
I
n
d
B
3
B
4
Define “large” as requiring an 
indirect
 block
break
Starting at indirect (e.g., after 48 KB)
put blocks in a new block group.
Preferences
Preferences
F
i
l
e
 
i
n
o
d
e
s
:
 
a
l
l
o
c
a
t
e
 
i
n
 
s
a
m
e
 
g
r
o
u
p
 
w
i
t
h
 
d
i
r
D
i
r
 
i
n
o
d
e
s
:
 
a
l
l
o
c
a
t
e
 
i
n
 
n
e
w
 
g
r
o
u
p
 
w
i
t
h
 
f
e
w
e
r
 
u
s
e
d
 
i
n
o
d
e
s
 
t
h
a
n
 
a
v
e
r
a
g
e
g
r
o
u
p
F
i
r
s
t
 
d
a
t
a
 
b
l
o
c
k
:
 
a
l
l
o
c
a
t
e
 
n
e
a
r
 
i
n
o
d
e
O
t
h
e
r
 
d
a
t
a
 
b
l
o
c
k
s
:
 
a
l
l
o
c
a
t
e
 
n
e
a
r
 
p
r
e
v
i
o
u
s
 
b
l
o
c
k
L
a
r
g
e
 
f
i
l
e
 
d
a
t
a
 
b
l
o
c
k
s
:
 
a
f
t
e
r
 
4
8
K
B
,
 
g
o
 
t
o
 
n
e
w
 
g
r
o
u
p
.
 
 
M
o
v
e
 
t
o
 
a
n
o
t
h
e
r
g
r
o
u
p
 
(
w
/
 
f
e
w
e
r
 
t
h
a
n
 
a
v
g
 
b
l
o
c
k
s
)
 
e
v
e
r
y
 
s
u
b
s
e
q
u
e
n
t
 
1
M
B
.
D
a
t
a
 
B
l
o
c
k
s
s
u
p
e
r
b
l
o
c
k
i
n
o
d
e
s
0
G
Group Descriptor 
Group Descriptor 
(aka Summary Block)
(aka Summary Block)
b
i
t
m
a
p
s
s
u
m
-
m
a
r
y
Tracks number of 
free inodes
 and 
data blocks
How does file system know which new group to pick?
Conclusion
Conclusion
First disk-aware file system
First disk-aware file system
Bitmaps
Bitmaps
Locality groups
Locality groups
Rotated superblocks
Rotated superblocks
Large blocks
Large blocks
Fragments
Fragments
Smart allocation policy
Smart allocation policy
FFS inspired modern files systems, including ext2 and ext3
FFS inspired modern files systems, including ext2 and ext3
FFS also introduced several new features:
FFS also introduced several new features:
 long file names
 long file names
 atomic rename
 atomic rename
 symbolic links
 symbolic links
Advice
Advice
All hardware is unique
All hardware is unique
Treat disk like disk!
Treat disk like disk!
Treat flash like flash!
Treat flash like flash!
Treat random-access memory like random-access memory!
Treat random-access memory like random-access memory!
Slide Note
Embed
Share

This content covers important topics related to file systems, such as improving system performance, preventing internal fragmentation, selecting block sizes, and organizing data blocks. Various file system case studies are discussed, including FFS and LFS, along with key concepts like super blocks, inodes, and data block storage. Insights on creating and traversing directories, allocating and populating inodes, and initializing data blocks are provided.

  • File Systems
  • Performance Optimization
  • Data Blocks
  • Inodes
  • File System Case Studies

Uploaded on Jul 20, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Announcements P3 Grading: Available Look at runtests.log for details (10 points for creating test cases) Contact TA (give login and discussion section) with questions P4: Threads (Part a and b) available Due Wednesday 11/18 at 9pm Exam 2: Answers with explanations available Exam 3: Thursday evening at 11/19 7:15-9:15 Fill out form AND send academic schedule if need alternate time by 5pm Similar in style to previous exams (very few questions about Virtualization; few about Concurrency) Read as we go along! Chapter 41

  2. UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 537 Introduction to Operating Systems Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau Persistence: FAST FILE SYSTEM (FFS) Questions answered in this lecture: How to improve performance of complex system? Why do file systems obtain worse performance over time? How to choose the right block size? How to avoid internal fragmentation? How to place related blocks close to one another on disk?

  3. File-System Case Studies Local - FFS: Fast File System - LFS: Log-Structured File System Network - NFS: Network File System - AFS: Andrew File System

  4. Review: Basic Layout super block bit inodes Data Blocks maps 0 N regular data directories indirect blocks inodes data blocks What is stored as a data block?

  5. REVIEW: create /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data

  6. [traverse] create /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data read read read read Verify that bar does not already exist

  7. [allocate inode] create /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data read read read read read write

  8. [populate inode] create /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data read read read read read write read write Why must read bar inode? How to initialize inode?

  9. [add bar to /foo] create /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data read read read read read write read write write write Update inode (e.g., size) and data for directory

  10. open /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read read read read

  11. write to /foo/bar (assume file exists and has been opened) data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read write write write

  12. append to /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data

  13. append to /foo/bar (opened already) data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read

  14. [allocate block] append to /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read write

  15. [point to block] append to /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read write write

  16. [write to block] append to /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read write write write

  17. read /foo/bar assume opened data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data read read write

  18. close /foo/bar data bitmap inode bitmap root inode foo inode bar inode root data foo data bar data nothing to do on disk!

  19. Review: Locality Types Temporal Locality Spatial Locality time time Which type of locality is most interesting with a disk?

  20. Order Matters Slow Fast time time Implication for disk schedulers?

  21. Policy: Choose Inode, Data Blocks S i d I I I I I D D D D D D D D 8 0 7 15 D D D D D D D D 16 D D D D D D D D 24 23 31 Assuming all free, which should be chosen?

  22. Bad File Layout inode 0 S i d I I I I I D D D D D D D D 8 2 0 7 15 1 3 D D D D D D D D 16 D D D D D D D D 24 23 31

  23. Better File Layout inode 0 1 2 3 S i d I I I I I D D D D D D D D 8 0 7 15 D D D D D D D D 16 D D D D D D D D 24 23 31

  24. Best File Layout inode 0 1 2 3 S i d I I I I I D D D D D D D D 8 0 7 15 D D D D D D D D 16 D D D D D D D D 24 23 31 Can t do this for all files

  25. Fast File System: FFS (1980 s)

  26. System Building Beginner s approach 1. get idea 2. build it! measure then build Pro approach 1. identify existing state of the art 2. measure it, identify and understand problems 3. get idea (solutions often flow from deeply understanding problem) 4. build it!

  27. Measure Old FS State of the art: original UNIX file system super block inodes Data Blocks 0 N Free lists are embedded in inodes, data blocks Data blocks are 512 bytes Measure throughput for whole sequential file reads/writes Compare to theoretical max, which is disk bandwidth Old UNIX file system: achieved only 2% of potential. Why?

  28. Measurement 1: Aging? What is performance before/after aging? New FS: 17.5% of disk bandwidth Few weeks old: 3% of disk bandwidth Problem: FS is becomes fragmented over time Free list makes contiguous chunks hard to find Hacky Solutions: Occassional defrag of disk Keep freelist sorted

  29. Measurement 2: Block SIZE? How does block size affect performance? Try doubling it! Result: Performance more than doubled Why double the performance? Logically adjacent blocks not physically adjacent Only half as many seeks+rotations now required Why more than double the performance? Smaller blocks require more indirect blocks

  30. Old FS Summary Free list becomes scrambled random allocations Small blocks (512 bytes) Blocks laid out poorly long distance between inodes/data related inodes not close to one another Which inodes related? Inodes in same directory (ls l) Result: 2% of potential performance! (and worse over time) Problem: old FS treats disk like RAM!

  31. Solution: a disk-aware Primary File System Design Questions: Where to place meta-data and data on disk? How to use big blocks without wasting space?

  32. Placement Technique 1: Bitmaps super block bitmaps inodes Data Blocks 0 N Use bitmaps instead of free list Provides better speed, with more global view Faster to find contiguous free blocks

  33. Placement Technique 2: Groups fast super block bitmaps inodes Data Blocks 0 N before: whole disk How to keep inode close to data?

  34. Technique 2: Groups slow super block bitmaps inodes Data Blocks 0 N before: whole disk How to keep inode close to data?

  35. Technique 2: Groups slower super block bitmaps inodes Data Blocks 0 N before: whole disk How to keep inode close to data?

  36. Technique 2: Groups slowest super block bitmaps inodes Data Blocks 0 N before: whole disk How to keep inode close to data?

  37. Technique 2: Groups super block bitmaps inodes Data Blocks 0 N before: whole disk How to keep inode close to data?

  38. Technique 2: Groups S B I D S B I D S B I D 0 G 2G 3G group 1 group 2 group 3 How to keep inode close to data? Answer: Use groups across disks; Try to place inode and data in same

  39. Technique 2: Groups fast fast fast S B I D S B I D S B I D 0 G 2G 3G group 1 group 2 group 3 strategy: allocate inodes and data blocks in same group.

  40. Groups In FFS, groups were ranges of cylinders - called cylinder group In ext2-4, groups are ranges of blocks - called block group

  41. Placement Technique 3: Super Rotation S B I D S B I D S B I D 0 G 2G 3G group 1 group 2 group 3 Is it useful to have multiple super blocks? Yes, if some (but not all) fail.

  42. Problem Old FS: All super-block copies are on the top platter. Correlated failures! What if top platter dies? solution: for each group, store super-block at different offset

  43. Technique: LargeR Blocks Observation: Doubling block size for old FS over doubled performance Why not make blocks huge? Most file are very small, even today!

  44. LargeR Blocks 50 37.5 Percent 25 12.5 0 512 1024 2048 4096 Block Size Lots of waste due to internal fragment in most blocks Time vs. Space tradeoffs

  45. Solution: Fragments Hybrid combine best of large blocks and best of small blocks Use large block when file is large enough Introduce fragment for files that use parts of blocks Only tail of file uses fragments

  46. Fragment Example Block size = 4096 Fragment size = 1024 bits: 0000 0000 1111 0010 blk2 blk3 blk4 blk1 Whether addr refers to block or fragment is inferred by file offset What about when files grow? Must copy fragments to new block if no room to grow

  47. file, size 5KB file, size 2KB AAAA B A B

  48. file, size 6KB file, size 2KB AAAA B A B A append A to first file

  49. file, size 6KB file, size 2KB AAAA B A B A

  50. file, size 7KB file, size 2KB AAAA B A B A A append A to first file Not allowed to use fragments across multiple blocks! What to do instead?

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#