Understanding B-Trees: Efficient Data Storage and Retrieval

Slide Note

B-Trees are balanced search trees designed for secondary storage devices, commonly used by databases. They can have many children, allowing for efficient data organization. The branching factor of B-Trees keeps their height low, making them ideal for minimizing disk I/O operations. This article explains the properties and characteristics of B-Trees, along with examples and exercises to enhance understanding.

ilyas Follow

Uploaded on Jul 10, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

B-trees Eduardo Laber David Sotelo

What are B-trees? Balanced search trees designed for secondary storage devices Similar to AVL-trees but better at minimizing disk I/O operations Main data structure used by DBMS to store and retrieve information

What are B-trees? Nodes may have many children (from a few to thousands) Branching factor can be quite large Every B-tree of n keys has height O(log n) In practice, its height is smaller than the height of an AVL-Tree

B-trees and Branching Factor

Definition of B-trees B-tree is a rooted tree containing the following five properties: 1. Every node x has the following attributes: a) x.n, the number of keys stored in node x b) The x.n keys: x.key1 x.key2 ... x.keyx.n c) The boolen x.leaf indicating if x is a leaf or an internal node

Definition of B-trees 2. If x is an internal node it contains x.n + 1 pointers x.p1, x.p2, ... , x.p(x.n + 1) to its children 3. The keys x.keyiseparate ranges of trees stored in each subtree (x.pi, x.pi+1 ) 4. All leaves have the same depth == tree s height.

Definition of B-trees 5. Bounds on the number of keys of a node: Let B be a positive integer representing the order of the B-tree. Every node (except the root) must have at least B keys. Every node (except the root) must have at most 2B keys. Root is free to contain between 1 and 2B nodes (why?)

Example of a B-tree

Exercise 1 Enumerate all valid B-trees of order 2 that represent the set {1, 2, ... , 8}

Exercise 1 Solution: 4 5 1 2 3 5 6 7 8 1 2 3 4 6 7 8 3 6 1 2 4 5 7 8

The height of a B-tree Theorem : Let h be the height of a B-tree of n keys and order B > 1. Then: h log B (n+1)/2 Proof: Root contains at least one key. All other nodes contain at least B keys At least one key at depth 0 At least 2B keys at depth 1 At least 2B2 + B keys at depth 2 At least 2Bi + Bi-1 + Bi-2 + ... + B keys at depth i

Proof (continued) h = i + i 1 2 n B 1 + 1 h 1 B + 1 2 n 1 B + 1 n h h 2 1 n B B 2 + 1 n log h B 2

Searching a B-tree Similar to searching a binary search tree. Multiway branching decision according to the number of the node s chidren. Recursive procedure with a time complexity of O(B log B n) for a tree of n keys and order B.

Searching a B-tree B-TREE-SEARCH (x, k) 1 i = 1 2 whilei x.n and k > x.keyido i = i + 1 3 ifi x.n and k == x.keyithen return (x, i) 4 if x.leaf then return NIL 5 else DISK-READ(x.pi) return B-TREE-SEARCH (x.pi , k)

Searching a B-tree Search for the key F J K P S C G D E F H I L M Q R T U A B N O

Searching a B-tree Search for the key F J K P S C G D E F H I L M Q R T U A B N O

Searching a B-tree Search for the key F J K P S C G D E F H I L M Q R T U A B N O

Searching a B-tree Search for the key N J KP S C G D E F H I L M Q R T U A B N O

Searching a B-tree Lemma: The time complexity of procedure B-TREE-SEARCHis O(B log B n) Proof: Number of recursive calls is equal to tree s height. The height of a B-tree is O(log B n) Cost between B and 2B iterations per call. Total of O(B log B n) steps.

Exercise 2 Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. Show that this changes makes the time complexity O(lg n), independently of how B might be chosen as a function of n.

Exercise 2 Solution: By using binary search the number of steps of the algorithm becomes O(lg B log B n) . Observe that log B n = lg n / lg B . Therefore O(lg B log B n) = O(lg n).

Linear or Binary B-tree search ? Lemma: If 1 < B < n then lg n B logB n Proof: ( ) B lg n = = = B B log log lg B n n n B B B lg B B B / n B n B ( ( ) ( ) lg ( ) B B B lg lg / lg / n B n B n n ) ( ) n 1 B B lg / lg n n n

Inserting a key into a B-tree The new key is always inserted into an existing leaf node (why?) Firstly we search for the leaf position at which to insert the new key. If such a node is full we split it. A split operation splits a full node around its median key into two nodes having B keys each. Median key moves up into splitted node s parent (insertion recursive call).

Split operation Inserting key F into a full node (B = 2) J A C E G K M O Q

Split operation Node found but already full J A C E F G K M O Q

Split operation Median key identified J A C E F G K M O Q

Split operation Splitting the node E J A C F G K M O Q

Inserting a key into a B-tree Insertion can be propagated upward (B = 2) E J T X Y Z U W A C F G K M O Q

Inserting a key into a B-tree Insertion can be propagated upward (B = 2) E J T X Y Z U W A C F G K M N O Q

Inserting a key into a B-tree Insertion can be propagated upward (B = 2) E J N T X A C F G K M O Q U W Y Z SPLIT

Inserting a key into a B-tree Insertion can be propagated upward (B = 2) N SPLIT E J T X A C F G K M O Q U W Y Z

Inserting a key into a B-tree B-TREE-INSERT (x, k, y) 1 i = 1 2 whilei x.n and k < x.keyido i = i + 1 3 x.n = x.n + 1 4 x.keyi = k 5 x.pi+1 = y 6 for j = x.n downto i+1 do 7 x.keyj = x.keyj-1 8 x.pj = x.pj-1 9 10 DISK-WRITE(x) end-for

Inserting a key into a B-tree B-TREE-INSERT (x, k) 11 if x.n > 2*B then 12 [m, z] = SPLIT (x) 13 if x.parent != NIL then 14 DISK-READ (x.parent) 15 end-if 16 else 17 x.parent = ALLOCATE-NODE() 18 DISK-WRITE (x) 19 root = x.parent 20 end-else 21 B-TREE-INSERT (x.parent, m, z) 22 end-if

Inserting a key into a B-tree SPLIT (x) 1 z = ALLOCATE-NODE() 2 m = FIND-MEDIAN (x) 3 COPY-GREATER-ELEMENTS(x, m, z) 4 DISK-WRITE (z) 5 COPY-SMALLER-ELEMENTS(x, m, x) 6 DISK-WRITE (x) 7 return [m, z]

Inserting a key into a B-tree Function B-TREE-INSERT has three arguments: The node x at which an element of key k should be inserted The key k to be inserted A pointer y to the left child of k to be used as one of the pointers of x during insertion process. There is a global variable named root which is a pointer to the root of the B-Tree. Observe that the field x.parent was not defined as an original B-tree attribute, but is considered just to simplify the process. The fields x.leaf should also be updated accordingly.

Inserting a key into a B-tree Lemma: The time complexity of B-TREE-INSERTis O(B log B n) Proof: Recall that B-TREE-SEARCH function is called first and costs O(log n) by using binary search. Then, B-TREE-INSERT starts by visiting a node and proceeds upward. At most one node is visited per level/depth and only visited nodes can be splitted. A most one node is created during the insertion process. Cost for splitting is proportional to 2B. Number of visited nodes is equal to tree s height and the height of a B-tree is O(log B n). Cost between B and 2B iterations per visited node. Total of O(B log B n) steps.

Some questions on insertion Which split operation increases the tree s height? The split of the root of the tree. How many DISK-READ operations are executed by the insertion algorithm? Every node was read at least twice. Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).

Drawbacks of our insertion method Once that the key s insertion node is found it may be necessary to read its parent node again (due to splitting). DISK-READ/WRITE operations are expensive and would be executed al least twice for each node in the key s path. It would be necessary to store a nodes s parent or to use the recursion stack to keep its reference. (Mond and Raz, 1985) provide a solution that spends one DISK-READ/WRITE per visited node (See at CLRS)

Exercise 3 Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.

Exercise 3 Solution: (final configuration) E B G I A C D F H J

Exercise 4 Does a B-tree of order 1 is a good choice for a balanced search tree? What about the expression h log B (n+1)/2 when B = 1?

Deleting a key from a B-tree Analogous to insertion but a little more complicated. A key can be deleted from any node (not just a leaf) and can affect its parent and its children (insertion operation just affect parents). One must ensure that a node does not get to small during deletion (less than B keys). As a result deleting a node is the most complex operation on B-trees. It will be considered in 4 particular cases.

Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements. Procedure: Just remove the key from the node.

Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements (B = 2) N E J T X A C D F G K M O Q U W Y Z

Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements (B = 2) N E J T X A D F G K M O Q U W Y Z

Deleting a key from a B-tree Case 2: The join procedure The key k1to be deleted is in a leaf x with exactly B elements. Let y be a node that is an adjacent brother of x. Suppose that y has exactly B elements. Procedure: Remove the key k1. Let k2be the keythat separates nodes x and y in their parent. Join the the nodes x and y and move the key k2 from the parent to the new joined node. If the parent of x becomes with B-1 elements and also has an adjacent brother with B elements, apply the join procedure recursively for the parent of x (seen as x) and its adjacent brother (seen as y).