Hash Collisions and HashMap Efficiency in C++

undefined
CS106X –
Programming
Abstractions in C++
Cynthia Bailey Lee
 
  
 
CS2 in C++ Peer Instruction
Materials by 
Cynthia Bailey Lee
 is
licensed under a 
Creative Commons
Attribution-NonCommercial-
ShareAlike 4.0 International License
.
Permissions beyond the scope of this
license may be available
at 
http://peerinstruction4cs.org
.
 
Today’s Topics:
Map Interface, continued
 
1.
Hashing: collisions
The birthday problem
2.
Hashing: live coding
3.
Tree traversals (time permiting)
 
 
2
 
Hash collisions
 
We may need to worry about 
hash
collisions
 
 
 
Hash collision:
Two keys a, b, a≠b, have the same hash
code index (i.e. hash(a) = hash(b))
 
Hash key collisions
 
We can NOT overwrite the value the way
we would if it really were the same key
Use a linked list chain extending from
each hash table array entry to allow
multiple distinct keys to share a single
hash table location
 
Map Interface: hash-map.h
 
private:
 struct node {
 Key key;
 Value value;
 node *next;
 };
 node **buckets;
 int numBuckets;
 int count;
 int hash(const Key& key) const;
};
#include "hash-map-impl.h“
 
HashMap
 
 
HashMap
 
TRUE or FALSE:
We don’t actually need
to store the keys in the
nodes—we only really
need the key to find
which
 index, then store
the value there
A.
TRUE
B.
FALSE
 
Hash key collisions & Big-O of
HashMap
 
If there are no collisions, find/add/remove are
all 
O(1)
—just compute the key and go!
Two factors for ruining this magical land of
instantaneous lookup:
1.
Too-small table (worst case = 1)
2.
Hash function doesn’t produce a good
spread
int awfulHashFunction(char* input, int size) {
 
return 4;
}
Find/add/remove all 
O(n) worst case
 
HTTP
://
XKCD
.
COM
/221/
 
One simple, common approach
 
Hash table (array) size, 
numBuckets
, is some prime
number
Keep 
numBuckets
 big enough that 
at most 70% 
of
the slots are filled (resize as needed)
Somehow convert your data to an int and then
int array_index = input % numBuckets;
 
Can we still get super unlucky? YES
Hashmap analysis of O(1) is “average case”
For a nicely designed hash: 99.9% of the time
 
The Birthday Problem
 
Commonly mentioned fact of probability theory:
In a group of n people, what are the odds that 2
people have the same birthday?
(assume birthdays are uniformly distributed across 365
days, which is wrong in a few ways)
For n ≥ 23, it is more likely than not
For n ≥ 57, >99% chance
 
Means that hash tables
almost certainly have
at least one collision
 
http://commons.wikimedia.org/wiki/File:Birthday_Paradox.svg
Creative Commons share-alike license.
 
(Live coding of
hash-map.h)
 
 
Quickly Back to Trees:
Tree Traversals!
 
 
What does this print?
 
void traverse(Node *node) {
  if (node != NULL) {
     cout << node->key << " ";
     traverse(node->left);
     traverse(node->right);
  }
}
 
A.
A B C D E F
B.
A B D E C F
C.
D B E F C A
D.
D E B F C A
E.
Other/none/more
A
B
C
D
E
F
 
What does this print?
 
void traverse(Node *node) {
  if (node != NULL) {
     traverse(node->left);
     cout << node->key << " ";
     traverse(node->right);
  }
}
 
A.
A B C D E F
B.
A B D E C F
C.
D B E F C A
D.
D E B F C A
E.
Other/none/more
A
B
C
D
E
F
 
How can we get this to print our ABCs
in order?
 
void traverse(Node *node) {
  if (node != NULL) {
     cout << node->key << " ";
     traverse(node->left);
     traverse(node->right);
  }
}
 
A.
You can’t—we already tried moving the cout to all
3 possible places.
B.
You can but you use a stack instead of recursion.
C.
You can but you use a queue instead of recursion.
D.
Other/none/more
A
B
C
D
E
F
 
That prints the ABC nodes in order, but notice
that our ABC tree isn’t a BST. 
How do we print BST
nodes in order?
 
void traverse(Node *node) {
  if (node != NULL) {
     traverse(node->left);
     traverse(node->right);
  }
}
 
A.
Use the BFS (queue instead of recursion).
B.
Use the DFS and put cout first.
C.
Use the DFS and put cout between.
D.
Use the DFS and put cout last.
E.
Other/none/more
 
We can play the same game with non-alpha
characters as keys: 
What does this print?
 
void traverse(Node *node) {
  if (node != NULL) {
     traverse(node->left);
     traverse(node->right);
     cout << node->key << " ";
  }
}
 
A.
* + / 3 4 8 2
B.
* + 3 4 / 2
C.
3 + 4 * 8 / 2
D.
3 4 + 8 2 / *
E.
Other/none/more
*
+
/
3
4
2
8
 
Remember our old friend:
 
 
Practical uses of
inorder/preorder/postorder traversals
 
void traverse(Node *node) {
  if (node != NULL) {
     traverse(node->left);
     traverse(node->right);
     cout << node->key << " ";
  }
}
 
 
 
 
To calculate the answer: postorder
To print this in human-friendly format: inorder
*
+
/
3
4
2
8
Slide Note
Embed
Share

Concepts of hash collisions and HashMap efficiency in C++. Dive into the challenges of handling hash collisions and the impact on HashMap operations. Learn key strategies to optimize HashMap performance and avoid pitfalls in implementation.

  • C++
  • Hash Collisions
  • HashMap Efficiency
  • Data Structures
  • Algorithm

Uploaded on Feb 18, 2025 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Creative Commons License CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at http://peerinstruction4cs.org. CS106X Programming Abstractions in C++ Cynthia Bailey Lee

  2. 2 Today s Topics: Map Interface, continued 1. Hashing: collisions The birthday problem 2. Hashing: live coding 3. Tree traversals (time permiting)

  3. Hash collisions We may need to worry about hash collisions Hash collision: Two keys a, b, a b, have the same hash code index (i.e. hash(a) = hash(b))

  4. Hash key collisions We can NOT overwrite the value the way we would if it really were the same key Use a linked list chain extending from each hash table array entry to allow multiple distinct keys to share a single hash table location

  5. Map Interface: hash-map.h private: struct node { Key key; Value value; node *next; }; node **buckets; int numBuckets; int count; int hash(const Key& key) const; }; #include "hash-map-impl.h

  6. HashMap

  7. HashMap TRUE or FALSE: We don t actually need to store the keys in the nodes we only really need the key to find which index, then store the value there A. TRUE B. FALSE

  8. Hash key collisions & Big-O of HashMap If there are no collisions, find/add/remove are all O(1) just compute the key and go! Two factors for ruining this magical land of instantaneous lookup: Too-small table (worst case = 1) Hash function doesn t produce a good spread int awfulHashFunction(char* input, int size) { return 4; } Find/add/remove all O(n) worst case 1. 2. HTTP://XKCD.COM/221/

  9. One simple, common approach Hash table (array) size, numBuckets, is some prime number Keep numBuckets big enough that at most 70% of the slots are filled (resize as needed) Somehow convert your data to an int and then int array_index = input % numBuckets; Can we still get super unlucky? YES Hashmap analysis of O(1) is average case For a nicely designed hash: 99.9% of the time

  10. The Birthday Problem Commonly mentioned fact of probability theory: In a group of n people, what are the odds that 2 people have the same birthday? (assume birthdays are uniformly distributed across 365 days, which is wrong in a few ways) For n 23, it is more likely than not For n 57, >99% chance Means that hash tables almost certainly have at least one collision http://commons.wikimedia.org/wiki/File:Birthday_Paradox.svg Creative Commons share-alike license.

  11. (Live coding of hash-map.h)

  12. Quickly Back to Trees: Tree Traversals!

  13. What does this print? void traverse(Node *node) { if (node != NULL) { cout << node->key << " "; traverse(node->left); traverse(node->right); } } A B C A B C D E F A B D E C F D B E F C A D E B F C A Other/none/more A. B. C. D. E. D E F

  14. What does this print? void traverse(Node *node) { if (node != NULL) { traverse(node->left); cout << node->key << " "; traverse(node->right); } } A B C A B C D E F A B D E C F D B E F C A D E B F C A Other/none/more A. B. C. D. E. D E F

  15. How can we get this to print our ABCs in order? void traverse(Node *node) { if (node != NULL) { cout << node->key << " "; traverse(node->left); traverse(node->right); } } A B C D E F You can t we already tried moving the cout to all 3 possible places. You can but you use a stack instead of recursion. You can but you use a queue instead of recursion. Other/none/more A. B. C. D.

  16. That prints the ABC nodes in order, but notice that our ABC tree isn t a BST. How do we print BST nodes in order? void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); } } Use the BFS (queue instead of recursion). Use the DFS and put cout first. A. B. C. Use the DFS and put cout between. D. Use the DFS and put cout last. E. Other/none/more

  17. We can play the same game with non-alpha characters as keys: What does this print? void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); cout << node->key << " "; } } * + / * + / 3 4 8 2 * + 3 4 / 2 3 + 4 * 8 / 2 3 4 + 8 2 / * Other/none/more A. B. C. D. E. 3 4 8 2

  18. Remember our old friend:

  19. Practical uses of inorder/preorder/postorder traversals void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); cout << node->key << " "; } } * + / 3 4 8 2 To calculate the answer: postorder To print this in human-friendly format: inorder

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#