B-Trees: A Comprehensive Overview

Data Structures, Lecture 5: B-Trees
Haim Kaplan and Uri Zwick
November 2014
Idealized computation model (CPU, RAM)
Each instruction takes one unit of time
Each memory access takes one unit of time
A more realistic model: a memory hierarchy (CPU, cache, RAM, disk)
Each level is much larger but much slower than the one above it
Information is moved between levels in blocks
A simplified I/O model (CPU, RAM, disk)
Each block is of size B
Count both operations and I/O operations
I/O operations are much more expensive
Data structures in the I/O model
Linked lists and binary search trees behave poorly in the I/O model.
Each pointer followed may cause a cache miss.
We need an alternative to binary search trees that is more suited to the I/O model:
B-Trees!
A 4-node: keys 10, 25, 42
Branches: key < 10 | 10 < key < 25 | 25 < key < 42 | 42 < key
3 keys, 4-way branch
An r-node: keys k_0, k_1, k_2, …, k_{r−3}, k_{r−2} and children c_0, c_1, c_2, …, c_{r−2}, c_{r−1}
r−1 keys, r-way branch
B-Trees / (d,2d)-Trees [Bayer-McCreight (1972)]
d is the minimum degree
Each node holds between d−1 and 2d−1 keys
(Each non-leaf node has between d and 2d children)
The root is special: it has between 1 and 2d−1 keys
(and between 2 and 2d children, if it is not a leaf)
All leaves are at the same depth
A (2,4)-tree: a B-Tree with minimum degree d=2
Root: [13]; its children: [4 6 10] and [15 28]
Leaves: [1 3], [5], [7], [11] under [4 6 10]; [14], [16 17], [30 40 50] under [15 28]
Node structure
r – the actual degree
key[0], …, key[r−2] – the keys
item[0], …, item[r−2] – the associated items
child[0], …, child[r−1] – the children
leaf – is the node a leaf?
Possibly a different representation for leaves
Room for 2d−1 keys and 2d child pointers
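To make this layout concrete, here is a minimal Python sketch of such a node; the class name BTreeNode, the parallel lists, and the helper methods are illustrative choices, not code from the lecture.

```python
class BTreeNode:
    """A B-Tree node with room for 2d-1 keys and 2d children (d = minimum degree)."""

    def __init__(self, d, leaf=True):
        self.d = d             # minimum degree of the tree this node belongs to
        self.leaf = leaf       # is the node a leaf?
        self.keys = []         # key[0..r-2]   -- at most 2d-1 keys, kept sorted
        self.items = []        # item[0..r-2]  -- item[i] is associated with key[i]
        self.children = []     # child[0..r-1] -- empty for leaves, at most 2d children

    @property
    def r(self):
        """The actual degree: one more than the number of keys."""
        return len(self.keys) + 1

    def is_full(self):
        """True when the node already holds 2d-1 keys and must be split before an insertion."""
        return len(self.keys) == 2 * self.d - 1
```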
The height of B-Trees
At depth 1 there are at least 2 nodes
At depth 2 there are at least 2d nodes
At depth 3 there are at least 2d^2 nodes
At depth h there are at least 2d^(h−1) nodes
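Summing these per-level minima yields the usual height bound; the derivation below spells out the slide's counting (each of the at least 2d^(i−1) nodes at depth i ≥ 1 holds at least d−1 keys, and the root holds at least one key):

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2\,(d^{h}-1)
  \;=\; 2d^{h}-1
\qquad\Longrightarrow\qquad
h \;\le\; \log_d\!\frac{n+1}{2}.
```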
B-Trees – What are they good for?
Large degree B-trees are used to represent very large dictionaries stored on disks. The minimum degree d is chosen according to the size of a disk block.
Smaller degree B-trees are used for internal-memory dictionaries to reduce the number of cache misses.
B-trees with d=2, i.e., (2,4)-trees, are very similar to Red-Black trees.
WAVL Trees vs. B-Trees
n = 2^30 ≈ 10^9
30 ≤ height of WAVL Tree ≤ 60
Up to 60 pages read from disk
Height of a B-Tree with d = 2^10 = 1024 is only 3
Each B-Tree node resides in a block/page
Only 3 (or 4) pages read from disk
Disk access ≈ 1 millisecond (10^−3 sec)
Memory access ≈ 100 nanoseconds (10^−7 sec)
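As a quick check of these figures (my own arithmetic, not on the slide):

```latex
n = 2^{30} \approx 10^{9}, \quad d = 2^{10} = 1024:
\qquad
\log_d n = \frac{\log_2 n}{\log_2 d} = \frac{30}{10} = 3
\ \text{B-Tree levels, versus a WAVL-tree height between 30 and 60.}
```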
 
Search
Look for k in node x
Look for k in the subtree of node x
Number of I/Os – log_d n
Number of operations – O(d log_d n)
Number of ops with binary search – O(log_2 d · log_d n) = O(log_2 n)
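A minimal Python sketch of the search matching these bounds, assuming the BTreeNode layout sketched earlier; the linear scan inside a node costs O(d), and replacing it with binary search would give the O(log_2 d) per-node cost mentioned above.

```python
def search(x, k):
    """Look for k in the subtree rooted at node x.
    Returns (node, index) if found, or None.  One node is read per level,
    so the number of I/Os is about log_d n."""
    i = 0
    # Look for k in node x: linear scan over at most 2d-1 keys -> O(d) operations.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # Descend into the child whose key range contains k.
    return search(x.children[i], k)
```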
Splitting an overflowing node
(i.e., a node with 2d keys / 2d+1 children)
Insert – an example on a (2,4)-tree with root [13], internal nodes [5 10] and [15 28], and leaves [1 3], [6], [11], [14], [16 17], [30 40 50]:
Insert(T,2): 2 is added to the leaf [1 3], giving [1 2 3].
Insert(T,4): 4 is added to the leaf [1 2 3], giving [1 2 3 4], which overflows (2d = 4 keys).
Split: [1 2 3 4] is split into [1] and [3 4]; the median 2 moves up, and the parent becomes [2 5 10].
Splitting an overflowing node
(i.e., a node with 2d keys / 2d+1 children)
Number of I/Os – O(1)
Number of operations – O(d)
Splitting an overflowing root
The root is split into two nodes and a new root holding the median key is created.
Another insert – Insert(T,7): 7 is added to the leaf [6], giving [6 7].
And another insert – Insert(T,8): 8 is added to the leaf [6 7], giving [6 7 8].
And the last for today – Insert(T,9): 9 is added to the leaf [6 7 8], giving [6 7 8 9], which overflows.
Split: [6 7 8 9] is split into [6] and [8 9]; the median 7 moves up, and the parent becomes [2 5 7 10], which now overflows as well.
Split: [2 5 7 10] is split into [2] and [7 10]; the median 5 moves up, and the root becomes [5 13].
Insert – Bottom-up
Find the insertion point by a downward search
Insert the key in the appropriate place
If the current node is overflowing, split it
If its parent is now overflowing, split it, etc.
Disadvantages:
Need both a downward scan and an upward scan
Nodes are temporarily overflowing
Need to keep parents on a stack
Note: We do not maintain parent pointers. (Why?)
Exercise: (d,2d−1)-Trees
Show that essentially the same bottom-up insertion technique also works for (d,2d−1)-Trees
(d,2d)-Trees are better than (d,2d−1)-Trees for at least two reasons:
They allow top-down insertions and deletions
The amortized number of split/fuse operations per insertion/deletion is O(1)
Insert – Top-down
While conducting the search, split full children on the search path before descending to them!
If the root is full, split it before starting this process
When the appropriate leaf is reached, it is not full, so the new key may be added!
Split-Root(T)
A new root is created whose only child is the old, full root, which is then split; this is the only way the height of the tree grows.
Number of I/Os – O(1)
Number of operations – O(d)
Split-Child(x,i)
The full child x.child[i] is split into two nodes with d−1 keys each; the median key moves up into key[i] of x.
Number of I/Os – O(1)
Number of operations – O(d)
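A hedged Python sketch of Split-Child and Split-Root on top of the BTreeNode layout from above; the slides give only the pictures and the cost bounds, so the exact index bookkeeping below (median at position d−1) is an assumption.

```python
def split_child(x, i):
    """Split-Child(x,i): split the full child y = x.child[i] (2d-1 keys) into two
    nodes with d-1 keys each; the median key moves up into key[i] of x.
    O(1) I/Os (three nodes are touched), O(d) operations."""
    d = x.d
    y = x.children[i]
    z = BTreeNode(d, leaf=y.leaf)
    median_key, median_item = y.keys[d - 1], y.items[d - 1]
    # The right half of y's keys (and children, if any) moves into the new node z.
    z.keys, y.keys = y.keys[d:], y.keys[:d - 1]
    z.items, y.items = y.items[d:], y.items[:d - 1]
    if not y.leaf:
        z.children, y.children = y.children[d:], y.children[:d]
    # The median key/item moves up into x, with z as the child just to its right.
    x.keys.insert(i, median_key)
    x.items.insert(i, median_item)
    x.children.insert(i + 1, z)


def split_root(T):
    """Split-Root(T): create a new empty root whose only child is the old, full root,
    then split that child.  This is how the tree's height grows."""
    old_root = T.root
    T.root = BTreeNode(old_root.d, leaf=False)
    T.root.children.append(old_root)
    split_child(T.root, 0)
```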
Insert – Top-down
While conducting the search, split full children on the search path before descending to them!
Number of I/Os – O(log_d n)
Number of operations – O(d log_d n)
Amortized no. of splits – at most 1/(d−1)
(See bonus material)
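Putting the pieces together, a sketch of top-down insertion; the BTree wrapper class and the leaf-insertion glue are my own additions, not from the slides. For instance, BTree(2) would give a (2,4)-tree like the one in the running example.

```python
class BTree:
    """A (d,2d)-tree with top-down insertion, built on BTreeNode/split_child/split_root."""

    def __init__(self, d):
        self.d = d
        self.root = BTreeNode(d, leaf=True)

    def insert(self, k, item=None):
        """Split full nodes on the way down, so the leaf reached is never full.
        O(log_d n) I/Os and O(d log_d n) operations."""
        if self.root.is_full():
            split_root(self)
        x = self.root
        while not x.leaf:
            i = 0
            while i < len(x.keys) and k > x.keys[i]:
                i += 1
            if x.children[i].is_full():
                split_child(x, i)        # split before descending
                if k > x.keys[i]:        # the median just moved up into position i
                    i += 1
            x = x.children[i]
        # x is a non-full leaf: insert k (and its item) in sorted position.
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        x.keys.insert(i, k)
        x.items.insert(i, item)
```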
Bonus material
Number of splits (insertions only)
If n items are inserted into an initially empty (d,2d)-tree, then the total number of splits is at most n/(d−1)
Amortized number of splits per insert – at most 1/(d−1)
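A sketch of the counting argument behind this bound (the slide states only the result): during insertions nodes are never destroyed, and each split creates at least one new node, so it suffices to bound the total number of nodes.

```latex
\text{Every non-root node holds at least } d-1 \text{ keys and the root at least } 1:
\qquad \#\text{nodes} \;\le\; 1 + \frac{n-1}{d-1}.
\\[4pt]
\#\text{splits} \;\le\; \#\text{nodes created} \;=\; \#\text{nodes} - 1
\;\le\; \frac{n-1}{d-1} \;\le\; \frac{n}{d-1}.
```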
Deletions from B-Trees
As always, similar, but slightly more complicated than insertions
To delete an item in an internal node, replace it by its successor and delete the successor
Deletion is slightly simpler for B+-Trees
B-Trees vs. B+-Trees
In a B-tree each node contains items and keys
In a B+-tree leaves contain items and keys. Internal nodes contain keys to direct the search. Keys in internal nodes are either keys of existing items, or keys of items that were deleted.
Internal nodes may contain more keys. When d is large, the extra space needed is negligible.
We continue with B-trees
Delete – an example; the tree has root [7 15], internal children [3], [10 13], [22 28], and leaves [1 2], [4 6], [8 9], [11 12], [14], [20], [24 26], [30 40 50].
delete(T,26): 26 is in the leaf [24 26]; it is simply removed, leaving [24].
delete(T,13): 13 is in the internal node [10 13]. Replace it with its predecessor 12, then delete 12 from the leaf [11 12]. The internal node becomes [10 12] and the leaf becomes [11].
delete(T,24): 24 is in the leaf [24], which is left empty (underflowing) after the deletion. Borrow from the right sibling [30 40 50]: the separating key 28 moves down from the parent into the underflowing leaf, and 30 moves up into the parent. The parent [22 28] becomes [22 30], the leaf becomes [28], and the sibling becomes [40 50]. (Borrowing works symmetrically from the left or from the right.)
delete(T,20): the leaf [20] is left empty, and its sibling [28] has only the minimum number of keys, so borrowing is impossible. Fuse instead: the empty leaf, the separating key 22 from the parent, and the sibling [28] are merged into a single leaf [22 28]; the parent [22 30] becomes [30].
A few more – delete(T,22): 22 is in the leaf [22 28]; it is simply removed, leaving [28].
delete(T,28): the leaf [28] is left empty. Borrowing again, this time from the right sibling [40 50]: the separating key 30 moves down into the empty leaf and 40 moves up, so the parent [30] becomes [40] and the sibling becomes [50].
Another one – delete(T,30): the leaf [30] is left empty and its only sibling [50] is minimal, so the two are fused with the separating key 40 into the leaf [40 50]. After the fuse, the parent (the right child of the root [7 15]) is itself empty and underflowing. Now we can borrow from its left sibling [10 12]: the root key 15 moves down into the empty node, the sibling's last child [14] moves over with it, and 12 moves up into the root. The result: root [7 12], internal children [3], [10], [15], and leaves [1 2], [4 6], [8 9], [11], [14], [40 50].
Delete – Bottom-up
Delete an item from a leaf
If the current node is underflowing, i.e., has fewer than d−1 keys, either borrow an item from a sibling, or fuse with a sibling
If the item to be deleted is not in a leaf, replace it by its successor and delete the successor
Borrowing fixes the problem
Fusing may make the parent underflowing
Borrow from left, borrow from right, fuse – see the sketch below
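A Python sketch of the two repair operations used here, borrow (from the left or right sibling) and fuse, in the same illustrative setting as the earlier node sketch; the function names and index conventions are assumptions, not the lecture's code.

```python
def borrow_from_left(x, i):
    """x.child[i] is underflowing: the separating key key[i-1] of x moves down into it,
    and the left sibling's last key (and last child, if any) moves up/over."""
    child, left = x.children[i], x.children[i - 1]
    child.keys.insert(0, x.keys[i - 1])
    child.items.insert(0, x.items[i - 1])
    x.keys[i - 1] = left.keys.pop()
    x.items[i - 1] = left.items.pop()
    if not child.leaf:
        child.children.insert(0, left.children.pop())


def borrow_from_right(x, i):
    """Symmetric case: key[i] of x moves down into x.child[i], and the right
    sibling's first key (and first child, if any) moves up/over."""
    child, right = x.children[i], x.children[i + 1]
    child.keys.append(x.keys[i])
    child.items.append(x.items[i])
    x.keys[i] = right.keys.pop(0)
    x.items[i] = right.items.pop(0)
    if not child.leaf:
        child.children.append(right.children.pop(0))


def fuse(x, i):
    """Fuse x.child[i], the separating key key[i] of x, and x.child[i+1] into one node.
    This removes a key and a child from x, which may leave x itself underflowing."""
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys.pop(i))
    left.items.append(x.items.pop(i))
    left.keys.extend(right.keys)
    left.items.extend(right.items)
    left.children.extend(right.children)
    x.children.pop(i + 1)
```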
Delete – Top-down
While conducting the search, make sure that each child descended into contains at least d keys
How? Use Borrow or Fuse
Assume, at first, that the item to be deleted is in a leaf
When the item is located, it resides in a leaf containing at least d keys, so it can be removed
Delete – Top-down
While conducting the search, make sure that each child you descend to contains at least d keys, using Borrow or Fuse
Delete – Top-down
What if the item to be deleted is in an internal node?
Descend as before from the root until the item to be deleted is located
Keep a pointer to the node containing the item
Carry on descending towards the successor, making sure that nodes contain at least d keys
When the successor is found, delete it from its leaf and use it to replace the item to be deleted
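The descent step described above can be sketched as a helper that repairs a child before moving into it, using the borrow/fuse sketches from the previous block; again an illustrative assumption rather than the lecture's own pseudocode.

```python
def ensure_child_has_d_keys(x, i):
    """Before descending from x into x.child[i], guarantee that the child holds at
    least d keys, by borrowing from a sibling or fusing with one.
    Returns the index of the child to actually descend into."""
    d = x.d
    if len(x.children[i].keys) >= d:
        return i
    if i > 0 and len(x.children[i - 1].keys) >= d:
        borrow_from_left(x, i)
    elif i + 1 < len(x.children) and len(x.children[i + 1].keys) >= d:
        borrow_from_right(x, i)
    elif i > 0:
        fuse(x, i - 1)       # merge with the left sibling; the merged node is child[i-1]
        return i - 1
    else:
        fuse(x, i)           # merge with the right sibling; the merged node stays at child[i]
    return i
```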
Bonus material
Number of fuses/splits (with bottom-up Insert/Delete)
The number of split and fuse operations in a sequence of m insert and delete operations on an initially empty (d,2d)-tree is at most O(m)
Amortized no. of splits/fuses per update is O(1)
Potential function: Φ(2d-node) = 2, Φ(d-node) = 1
With top-down insertions and deletions, the amortized number of splits/fuses may be Ω(log_d n)