Linear Hashing: An Extension to Extendible Hashing

linear hashing l.w
1 / 11
Embed
Share

Linear Hashing (LH) is an innovative technique that enhances Extendible Hashing by eliminating the need for a directory through the usage of a family of hash functions. By employing hash functions hi and looking at the last di bits, LH expands the range of directories to improve performance. The process involves bucket splitting in rounds where each round ends when all initial buckets have been split. LH ensures efficient data organization by avoiding overflow chains and maintaining a balanced structure without the need for a directory.

  • Hashing Techniques
  • Database Management
  • Linear Hashing
  • Bucket Splitting
  • Data Organization

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Linear Hashing An extension to Extendible Hashing, in spirit. LH tries to avoid the creation/maintenance of a directory. Idea: Use a family of hash functions h0, h1, h2, ... N = initial # buckets = 2d0 h is some hash function (range is not 0 to N-1) hiconsists of applying h and looking at the last di bits, where di = d0 + i. hi+1 doubles the range of hi (similar to directory doubling) e.g., h = binary representation, d0 = 2, d1 = 3, d2 = 4, ... CSCIX370: Database Management

  2. Overview of LH File Directory avoided in LH by using overflow pages, and choosing bucket to split round-robin. Next pointer to current bucket, i.e., next bucket likely to be split. Note: bucket split need not be bucket where insertion and/or overflow occurred. Splitting proceeds in `rounds . Round ends when all NR initial (for round R) buckets are split. Buckets 0 to Next- 1 have been split; Next to NR-1 yet to be split. Current round number is Level. Level and Round used interchangeably. CSCIX370: Database Management

  3. Overview of LH File (Contd.) In the middle of a round. Buckets split in this round: If ( Level h is in this range, must use h Level+1 to decide if entry is in search key value ) Next Buckets to be split ( search key value ) Buckets that existed at the beginning of this round: this is the range of `split image' bucket. h Level `split image' buckets: created (through splitting of other buckets) in this round Level = R. CSCIX370: Database Management

  4. Example of Linear Hashing Level=0, N=4 starts with 4 buckets all buckets to be split in a round-robin fashion, starting from the first one PRIMARY PAGES h h 0 1 Next=0 44* 36* 32* 000 00 Data entry r with h(r)=5 9* 5* 25* 001 01 14*18*10*30* Primary bucket page 10 010 31*35* 7* 11* 011 11 (This info is for illustration only!) (The actual contents of the linear hashed file) CSCIX370: Database Management

  5. Example Inserting 43* 43 = 101011 h0 (43) = 11 => overflow overflow page exists! splitting occurs, but to the Next bucket Level=0 h OVERFLOW PAGES h PRIMARY PAGES 0 1 32* 000 00 Next=1 9* 5* 25* 001 01 14*18*10*30* 10 010 31*35* 7* 11* 43* 011 11 100 44* 36* 00 CSCIX370: Database Management

  6. Linear Hashing - insertions Insert: Find bucket by applying hLevel/ hLevel+1: If bucket to insert into is full: Add overflow page and insert data entry. (Maybe) Split Next bucket and increment Next. Can choose any criterion to `trigger split. Since buckets are split round-robin, long overflow chains don t develop! CSCIX370: Database Management

  7. Example: End of a Round (Inserting 37*,29*, 22*,66*,34*,50*) Level=1 PRIMARY PAGES OVERFLOW PAGES h1 h 0 Next=0 Level=0 00 000 32* PRIMARY PAGES OVERFLOW PAGES h1 h 0 001 01 9* 25* 32* 000 00 10 010 50* 66* 18* 10*34* 9* 25* 001 01 011 11 35* 11* 43* 66* 10 18* 10* 34* 010 Next=3 100 00 44* 36* 43* 7* 11* 31* 35* 011 11 101 11 5* 29* 37* 44*36* 100 00 14* 30* 22* 110 10 5* 37*29* 101 01 14* 30* 22* 31*7* 11 111 110 10 back to deletion CSCIX370: Database Management

  8. Linear Hashing - Searching Search: To find bucket for data entry r, find hLevel(r): If hLevel(r) in range `Next to NR-1 , r belongs here. Else, r could belong to bucket hLevel(r) or bucket hLevel(r) + NR ; must apply hLevel+1(r) to find out. CSCIX370: Database Management

  9. LH Deletion Inverse of insertion. If last bkt is empty, remove it and decrement Next. More generally, can combine last bkt with its split image even if non- empty. Criterion may be based on bkt occupancy level. CSCIX370: Database Management

  10. LH Deletion (example) After deleting 14*, 22* Level=0 Level=0 PRIMARY PAGES OVERFLOW PAGES h1 h 0 h1 h 0 32* 000 00 32* 000 00 Delete 30* 9* 25* 001 01 Next=2 9* 25* 001 01 66* 10 18* 10* 34* 010 66* 10 18* 10* 34* 010 Next=3 43* 7* 11* 31* 35* 011 11 43* 7* 11* 31* 35* 011 11 44*36* 100 00 44*36* 100 00 5* 37*29* 101 01 5* 37*29* 101 01 30* 110 10 CSCIX370: Database Management

  11. Summary Hash-based indexes: best for equality searches. Static Hashing can lead to long overflow chains. EH avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. Directory to keep track of buckets, doubles periodically. Can get large with skewed data; additional I/O if this does not fit in main memory. LH avoids directory by splitting buckets round-robin, and using overflow pages. Overflow pages not likely to be long. CSCIX370: Database Management

More Related Content