Data Structures and Algorithms: Open Addressing Techniques

undefined
Lecture 12:
Open Addressing
Data Structures and
Algorithms
CSE 373 19 SP - KASEY CHAMPION
1
Administrivia
 
Exercise 2 due tonight.
-
Make sure you’re assigning pages properly please!
 
Exercise 3 out sometime tonight.
 
Midterm in one week!
 
For the midterm, you are allowed one 8.5”x11” sheet of paper (both sides) for notes
-
I strongly recommend you handwrite your note sheet.
-
But you are free to generate it with a computer if you prefer.
 
Idea for note sheet: in the real-world you can often google stuff,
write down what you would lookup. It should also help you study.
 
We will provide you identities, we’ll post the sheet in the exam resources early next week.
CSE 373 19 SP - KASEY CHAMPION
2
Midterm Topics (not exhaustive)
CSE 373 19 SP - KASEY CHAMPION
3
 
BST and AVL Trees
-
Binary Search Property, Balance Property
-
Insertions, Retrievals
-
AVL rotations
 
Hashing
-
Understanding hash functions
-
Insertions and retrievals from a table
-
Collision resolution strategies: chaining, linear
probing, quadratic probing, double hashing
 
Projects
-
ArrayDictionary
-
DoubleLinkedList
Resizing
CSE 373 SU 19 - ROBBIE WEBER
4
pollEV.com/cse373su19
Can we just copy over our
old chains?
Warm Up
CSE 373 SP 18 - KASEY CHAMPION
5
 
Consider a StringDictionary using separate chaining with an internal capacity of 10. Assume
our buckets are implemented using a LinkedList. Use the following hash function:
 
public int hashCode(String input) {
 
   return input.length() % arr.length;
 
}
 
Now, insert the following key-value pairs. What does the dictionary internally look like?
 
(“cat”, 1) (“bat”, 2) (“mat”, 3) (“a”, 4) (“abcd”, 5) (“abcdabcd”, 6) (“five”, 7) (“hello world”, 8)
(“cat”, 1)
(“a”, 4)
(“bat”, 2)
(“five”, 7)
(“mat”, 3)
(“abcd”, 5)
(“hello world”, 8)
(“abcdabcd”, 6)
Review: 
Handling Collisions
 
Solution 1: Chaining
 
Each space holds a “
bucket
” that can store multiple values. Bucket is often implemented
with a LinkedList
CSE 373 SP 18 - KASEY CHAMPION
6
Handling Collisions
 
 
Solution 2: Open Addressing
 
Resolves collisions by choosing a different location to store a value if natural choice is
already full.
 
Type 1: Linear Probing
 
If there is a collision, keep checking the next element until we find an open spot.
 
int findFinalLocation(Key s)
 
     int naturalHash = this.getHash(s);
     int index = natrualHash % TableSize;
 
while (index in use) {
  
i++;
 
     
 
index = (naturalHash + i) % TableSize;
 
}
 
return index;
CSE 373 SP 18 - KASEY CHAMPION
7
Linear Probing
CSE 373 SP 18 - KASEY CHAMPION
8
Insert the following values into the Hash Table using a hashFunction of % table size and
linear probing to resolve collisions
1, 5, 11, 7, 12, 17, 6, 25
 
1
 
5
 
11
 
7
 
12
 
17
 
6
 
25
Linear Probing
CSE 373 SP 18 - KASEY CHAMPION
9
Insert the following values into the Hash Table using a hashFunction of % table size and
linear probing to resolve collisions
38, 19, 8, 109, 10
 
38
 
19
 
8
 
8
 
109
 
10
 
Problem:
Linear probing causes clustering
Clustering causes more looping when probing
 
Primary Clustering
When probing causes long chains of
occupied slots within a hash table
3 Minutes
Runtime
 
 
When is runtime good?
 
When we hit an empty slot
-
(or an empty slot is a very short distance away)
 
 
When is runtime bad?
 
When we hit a “cluster”
 
 
Maximum Load Factor?
 
λ at most 1.0
 
 
When do we resize the array?
 
λ ≈ ½  is a good rule of thumb
CSE 373 SP 18 - KASEY CHAMPION
10
2 Minutes
Can we do better?
CSE 373 SP 18 - KASEY CHAMPION
11
Quadratic Probing
CSE 373 SP 18 - KASEY CHAMPION
12
 
(49 % 10 + 0 * 0) % 10 = 9
(49 % 10 + 1 * 1) % 10 = 0
 
(58 % 10 + 0 * 0) % 10 = 8
(58 % 10 + 1 * 1) % 10 = 9
(58 % 10 + 2 * 2) % 10 = 2
 
89
 
18
 
49
Insert the following values into the Hash Table using a hashFunction of % table size and
quadratic probing to resolve collisions
89, 18, 49, 58, 79, 27
 
58
 
79
 
(79 % 10 + 0 * 0) % 10 = 9
(79 % 10 + 1 * 1) % 10 = 0
(79 % 10 + 2 * 2) % 10 = 3
 
Problems:
If λ≥ ½ we might never find an empty spot
Infinite loop!
Can still get clusters
 
27
 
Now try to insert 9.
 
Uh-oh
 
Quadratic Probing
Secondary Clustering
CSE 373 SP 18 - KASEY CHAMPION
15
Insert the following values into the Hash Table using a hashFunction of % table size and
quadratic probing to resolve collisions
19, 39, 29, 9
 
39
 
29
 
19
 
9
 
Secondary Clustering
When using quadratic probing sometimes need
to probe the same sequence of table cells, not
necessarily next to one another
3 Minutes
Probing
-
h(k) = the natural hash
-
h’(k, i) = resulting hash after probing
-
i = iteration of the probe
-
T = table size
 
Linear Probing:
 
h’(k, i) = (h(k) + i) % T
 
Quadratic Probing
 
h’(k, i) = (h(k) + i
2
) % T
CSE 373 SP 18 - KASEY CHAMPION
16
Double Hashing
 
 
Probing causes us to check the same indices over and over- can we check different ones instead?
 
 
Use a second hash function!
 
h’(k, i) = (h(k) + i * g(k)) % T
 
 
int findFinalLocation(Key s)
 
     int naturalHash = this.getHash(s);
     int index = natrualHash % TableSize;
 
while (index in use) {
  
i++;
 
     
 
index = (naturalHash + 
i*jumpHash(s)
) % TableSize;
 
}
 
return index;
CSE 373 SP 18 - KASEY CHAMPION
17
 
<- Most effective if g(k) returns value relatively prime to table size
Second Hash Function
 
 
Effective if g(k) returns a value that is 
relatively prime
 to table size
-
If T is a power of 2, make g(k) return an odd integer
-
If T is a prime, make g(k) return anything except a multiple of the TableSize
CSE 373 SP 18 - KASEY CHAMPION
18
Resizing: Open Addressing
Running Times
CSE 332 SU 18 
 ROBBIE WEBER
In-Practice
Summary
 
 
1. Pick a hash function to:
-
Avoid collisions
-
Uniformly distribute data
-
Reduce hash computational costs
 
2. Pick a collision strategy
-
Chaining
-
LinkedList
-
AVL Tree
-
Probing
-
Linear
-
Quadratic
-
Double Hashing
CSE 373 SP 18 - KASEY CHAMPION
22
No clustering
Potentially more “compact” (λ can be higher)
Managing clustering can be tricky
Less compact (keep 
λ
 < ½)
Array lookups tend to be a constant factor faster than traversing pointers
Summary
Hash functions with some additional properties
Cryptographic hash functions: A small change in the key completely changes the hash.
-
Commonly used in practice: SHA-1, SHA-265
-
verify file integrity. When you share a large file with someone, how do you know that
the other person got the exact same file?
-
Just compare hash of the file on both ends. Used by file sharing services (Google
Drive, Dropbox)
-
For password verification: Storing passwords in plaintext is insecure. So your
passwords are stored as a hash.
-
Digital signatures
-
Lots of other crypto applications
Other Applications of Hashing
CSE 373 AU 18 
 SHRI MARE
24
Other Applications of Hashing
 
Locality Sensitive Hashing – hash functions that map similar keys to similar hashes.
 
Finding similar records: Records with similar but not identical keys
-
Spelling suggestion/corrector applications
-
Audio/video fingerprinting
-
Clustering
 
- Finding similar substrings in a large collection of strings
-
Genomic databases
-
Detecting plagiarism
 
- Geometric hashing: Widely used in computer graphics and computational
geometry
Extra optimizations
CSE 373 SP 18 - KASEY CHAMPION
26
Wrap Up
Slide Note
Embed
Share

Learn about open addressing and handling collisions in data structures and algorithms. Dive into topics like BST, AVL trees, hashing functions, resizing arrays, and more. Prepare for the midterm and explore efficient ways to manage data storage.

  • Data Structures
  • Algorithms
  • Open Addressing
  • Collisions
  • Resizing

Uploaded on Feb 16, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Lecture 12: Data Structures and Algorithms Open Addressing CSE 373 19 SP - KASEY CHAMPION 1

  2. Administrivia Exercise 2 due tonight. -Make sure you re assigning pages properly please! Exercise 3 out sometime tonight. Midterm in one week! For the midterm, you are allowed one 8.5 x11 sheet of paper (both sides) for notes -I strongly recommend you handwrite your note sheet. -But you are free to generate it with a computer if you prefer. Idea for note sheet: in the real-world you can often google stuff, write down what you would lookup. It should also help you study. We will provide you identities, we ll post the sheet in the exam resources early next week. CSE 373 19 SP - KASEY CHAMPION 2

  3. Midterm Topics (not exhaustive) BST and AVL Trees - Binary Search Property, Balance Property - Insertions, Retrievals - AVL rotations Hashing - Understanding hash functions - Insertions and retrievals from a table - Collision resolution strategies: chaining, linear probing, quadratic probing, double hashing Projects - ArrayDictionary - DoubleLinkedList ADTs and Data structures - Lists, Stacks, Queues, Dictionaries - Array vs Node implementations of each - Design decisions! Asymptotic Analysis - Proving Big O by finding ? and ?0 - Modeling code runtime - Finding closed form of recurrences using tree method and master theorem - Looking at code models and giving simplified tight Big O runtimes - Definitions of Big O, Big Omega, Big Theta CSE 373 19 SP - KASEY CHAMPION 3

  4. Resizing Our running time in practice depends on ?. What do we do when ? is big? Resize the array! -Usually we double, that s not quite the best idea here -Increase array size to next prime number that s roughly double the current size -Prime numbers tend to redistribute keys, because you re now modding by a completely unrelated number. -If % TableSize = ? then %2*TableSize gives either ? or ? +TableSize. -Rule of thumb: Resize sometime around when is somewhere around 1 if you re doing separate chaining. -When you resize, you have to rehash everything! pollEV.com/cse373su19 Can we just copy over our old chains? CSE 373 SU 19 - ROBBIE WEBER 4

  5. Review: Handling Collisions Solution 1: Chaining Solution 1: Chaining Each space holds a bucket that can store multiple values. Bucket is often implemented with a LinkedList Operation Array w/ indices as keys Average Case: Depends on average number of elements per chain best O(1) O(1 + ) average put(key,value) worst O(n) best O(1) Load Factor If n is the total number of key- value pairs Let c be the capacity of array Load Factor = ? O(1 + ) average get(key) worst O(n) best O(1) O(1 + ) average remove(key) ? worst O(n) CSE 373 SP 18 - KASEY CHAMPION 6

  6. Handling Collisions Solution 2: Open Addressing Solution 2: Open Addressing Resolves collisions by choosing a different location to store a value if natural choice is already full. Type 1: Linear Probing If there is a collision, keep checking the next element until we find an open spot. int findFinalLocation(Key s) int naturalHash = this.getHash(s); int index = natrualHash % TableSize; while (index in use) { i++; index = (naturalHash + i) % TableSize; } return index; CSE 373 SP 18 - KASEY CHAMPION 7

  7. Linear Probing Insert the following values into the Hash Table using a hashFunction of % table size and linear probing to resolve collisions 1, 5, 11, 7, 12, 17, 6, 25 0 1 2 3 4 5 6 7 8 9 6 17 7 1 12 25 5 11 CSE 373 SP 18 - KASEY CHAMPION 8

  8. 3 Minutes Linear Probing Insert the following values into the Hash Table using a hashFunction of % table size and linear probing to resolve collisions 38, 19, 8, 109, 10 0 1 2 3 4 5 6 7 8 9 10 8 38 8 19 109 Problem: Primary Clustering When probing causes long chains of occupied slots within a hash table Linear probing causes clustering Clustering causes more looping when probing CSE 373 SP 18 - KASEY CHAMPION 9

  9. 2 Minutes Runtime When is runtime good? When is runtime good? When we hit an empty slot - (or an empty slot is a very short distance away) When is runtime bad? When is runtime bad? When we hit a cluster Maximum Load Factor? Maximum Load Factor? at most 1.0 When do we resize the array? When do we resize the array? is a good rule of thumb CSE 373 SP 18 - KASEY CHAMPION 10

  10. Can we do better? Clusters are caused by picking new space near natural index Solution 2: Open Addressing Solution 2: Open Addressing Type 2: Quadratic Probing Instead of checking ? past the original location, check ?2 from the original location. int findFinalLocation(Key s) int naturalHash = this.getHash(s); int index = natrualHash % TableSize; while (index in use) { i++; index = (naturalHash + i*i) % TableSize; } return index; CSE 373 SP 18 - KASEY CHAMPION 11

  11. Quadratic Probing Insert the following values into the Hash Table using a hashFunction of % table size and quadratic probing to resolve collisions 89, 18, 49, 58, 79, 27 0 1 2 3 4 5 6 7 8 9 58 18 79 27 89 49 (49 % 10 + 0 * 0) % 10 = 9 (49 % 10 + 1 * 1) % 10 = 0 Now try to insert 9. Uh-oh (58 % 10 + 0 * 0) % 10 = 8 (58 % 10 + 1 * 1) % 10 = 9 (58 % 10 + 2 * 2) % 10 = 2 Problems: If we might never find an empty spot Infinite loop! Can still get clusters (79 % 10 + 0 * 0) % 10 = 9 (79 % 10 + 1 * 1) % 10 = 0 (79 % 10 + 2 * 2) % 10 = 3 CSE 373 SP 18 - KASEY CHAMPION 12

  12. Quadratic Probing There were empty spots. What gives? Quadratic probing is not guaranteed to check every possible spot in the hash table. The following is true: If the table size is a prime number ?, then the first ?/2 probes check distinct indices. Notice we have to assume ? is prime to get that guarantee.

  13. 3 Minutes Secondary Clustering Insert the following values into the Hash Table using a hashFunction of % table size and quadratic probing to resolve collisions 19, 39, 29, 9 0 1 2 3 4 5 6 7 8 9 39 29 9 19 Secondary Clustering When using quadratic probing sometimes need to probe the same sequence of table cells, not necessarily next to one another CSE 373 SP 18 - KASEY CHAMPION 15

  14. Probing - h(k) = the natural hash - h (k, i) = resulting hash after probing - i = iteration of the probe - T = table size Linear Probing: Linear Probing: h (k, i) = (h(k) + i) % T Quadratic Probing Quadratic Probing h (k, i) = (h(k) + i2) % T CSE 373 SP 18 - KASEY CHAMPION 16

  15. Double Hashing Probing causes us to check the same indices over and over- can we check different ones instead? Use a second hash function! h (k, i) = (h(k) + i * g(k)) % T <- Most effective if g(k) returns value relatively prime to table size int findFinalLocation(Key s) int naturalHash = this.getHash(s); int index = natrualHash % TableSize; while (index in use) { i++; index = (naturalHash + i*jumpHash(s)) % TableSize; } return index; CSE 373 SP 18 - KASEY CHAMPION 17

  16. Second Hash Function Effective if g(k) returns a value that is relatively prime to table size -If T is a power of 2, make g(k) return an odd integer -If T is a prime, make g(k) return anything except a multiple of the TableSize CSE 373 SP 18 - KASEY CHAMPION 18

  17. Resizing: Open Addressing How do we resize? Same as separate chaining -Remake the table -Evaluate the hash function over again. -Re-insert. When to resize? -Depending on our load factor ? AND our probing strategy. -Hard Maximums: -If ? = 1,put with a new key fails for linear probing. -If ? > 1/2put with a new key might might fail for quadratic probing, even with a prime tableSize -And it might fail earlier with a non-prime size. -If ? = 1put with a new key fails for double hashing -And it might fail earlier if the second hash isn t relatively prime with the tableSize

  18. Running Times What are the running times for: insert Best: ?(1) Worst: ?(?)(we have to make sure the key isn t already in the bucket.) find Best: ?(1) Worst: ?(?) delete Best: ?(1) Worst: ?(?) CSE 332 SU 18 ROBBIE WEBER

  19. In-Practice For open addressing: We ll assume assume you ve set ? appropriately, and that all the operations are 1 . The actual dependence on ? is complicated see the textbook (or ask me in office hours) And the explanations are well-beyond the scope of this course.

  20. Summary 1. Pick a hash function to: - Avoid collisions - Uniformly distribute data - Reduce hash computational costs 2. Pick a collision strategy - Chaining - LinkedList - AVL Tree - Probing - Linear - Quadratic - Double Hashing No clustering Potentially more compact ( can be higher) Managing clustering can be tricky Less compact (keep < ) Array lookups tend to be a constant factor faster than traversing pointers CSE 373 SP 18 - KASEY CHAMPION 22

  21. Summary Separate Chaining -Easy to implement -Running times ?(1 + ?) in practice Open Addressing -Uses less memory (usually). -Various schemes: -Linear Probing easiest, but lots of clusters -Quadratic Probing middle ground, but need to be more careful about ?. -Double Hashing need a whole new hash function, but low chance of clustering. Which you use depends on your application and what you re worried about.

  22. Other Applications of Hashing Hash functions with some additional properties Cryptographic hash functions: A small change in the key completely changes the hash. -Commonly used in practice: SHA-1, SHA-265 -verify file integrity. When you share a large file with someone, how do you know that the other person got the exact same file? -Just compare hash of the file on both ends. Used by file sharing services (Google Drive, Dropbox) -For password verification: Storing passwords in plaintext is insecure. So your passwords are stored as a hash. -Digital signatures -Lots of other crypto applications CSE 373 AU 18 SHRI MARE 24

  23. Other Applications of Hashing Locality Sensitive Hashing hash functions that map similar keys to similar hashes. Finding similar records: Records with similar but not identical keys -Spelling suggestion/corrector applications -Audio/video fingerprinting -Clustering - Finding similar substrings in a large collection of strings -Genomic databases -Detecting plagiarism - Geometric hashing: Widely used in computer graphics and computational geometry

  24. Extra optimizations Idea 1: Idea 1: Take in better keys Take in better keys -Really up to your client, but if you can control them, do! Idea 2: Idea 2: Optimize the bucket Optimize the bucket -Use an AVL tree instead of a Linked List -Java starts off as a linked list then converts to AVL tree when buckets get large Idea 3: Idea 3: Modify the array s internal capacity Modify the array s internal capacity -When load factor gets too high, resize array - Increase array size to next prime number that s roughly double the array size - Let the client fine-tune the ? that causes you to resize CSE 373 SP 18 - KASEY CHAMPION 26

  25. Wrap Up Hash Tables: -Efficient find, insert, delete in practice, under some assumptions in practice, under some assumptions -Items not in sorted order -Tons of real world uses - and really popular in tech interview questions. Need to pick a good hash function. -Have someone else do this if possible. -Balance getting a good distribution and speed of calculation. Resizing: -Always make the table size a prime number. -? determines when to resize, but depends on collision resolution strategy.

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#