Understanding Hashing Strings in Computer Science

csce 221 hashing strings n.w
1 / 14
Embed
Share

Explore the concept of hashing with string keys in computer science, focusing on creating effective hash functions and addressing issues with summing hash functions. Delve into examples, ASCII tables, and discussions to grasp the complexities of hashing strings effectively. Learn how to optimize hash codes for English words and prevent collisions in hash tables.

  • Hashing
  • Strings
  • Computer Science
  • Hash Functions
  • ASCII

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-221 Hashing Strings Emil Thomas 04/10/19 Based on Prof. Shawn Lupoli s Notes(TAMU)

  2. Hashing with string keys Idea is to use string-valued keys in hash table Most real world keys would be strings The challenge is to come up with good hash function Done in TWO Steps: Use the characters in the string to compute an integer Take the above integer % TABLESIZE Each Character has an decimal value corresponding to its ASCII code.

  3. Ascii Table '5' has the int value 53 if we write '5'-'0' it evaluates to 53-48, or the int 5 if we write char c = 'B'+32; then c stores 'b' Source : ibm.com, cmu.cs.edu

  4. Hash Function Example1 (BAD) Sum all the characters in the string % TABLESIZE unsigned int hash(const std::string &key, unsigned int TableSIZE) { unsigned int hashval = 0; for(int i = 0; i < key.size(); ++i) { hashval += key[i]; } return hashval % TableSIZE; }

  5. Hash Function Example1 (BAD) Sum all the characters in the string % TABLESIZE N = 13 inserting Lupoli (ASCII sum = 629, hash value = 5 (629%13)) New York (751 % 13) = 10 bucket (638%13) = 1 Ocean Park (916 % 13) = 6

  6. Issues with summing hash function If the table size is LARGE (N), many strings that are smaller will be placed in the first half of the table! Will there be any values at 0? What would the MIN value for a 3 CAPITAL letter word? What would the MAX value for a 3 CAPITAL letter word? Would distribution be uniform?

  7. Discussion in Lab

  8. Bad hash codes on Strings (English Words) some letters come up more than other in our English Dictionary example a few bad hash codes o sum ASCII value of characters most words are short w/ 70000 buckets, there will be many collisions since most will only fit the first 500 or so buckets adding just does not thinking about the order the letters appear pat, apt, tap, all collide o choose first 3 letters of a word with 26^3 buckets biases in English words like pre- but no words with some other letters like xzq will not have an even distribution of key

  9. Hash Function Example2 (Good) unsigned int hash(const std::string &key, unsigned int TableSIZE) { unsigned int hashval = 0; for(int i = 0; i < key.size(); ++i) { hashval = 37 * hashval + key[i]; } return hashval % TableSIZE; }

  10. Hash Function Example3 (Good) unsigned int hash(const std::string &key, unsigned int TableSIZE) { unsigned int hashval = 0; for(int i = 0; i < key.size(); ++i) { hashval = (127 * hashval + key[i]) % 16908799; } return hashval % TableSIZE; }

  11. Designing a Hash Table Determine possible size of data set Does the data size grow? Determine possible N with 20% compression Determine load factor

  12. Why not always use a hash table? Does not support ordered output. (No findMin or findMax) Does not support finding all elements in a given range How to design the hash function is not always clear. Too many duplicate keys (O(N) complexity to find all of its values)

  13. Exercise && Survey http://people.tamu.edu/~yanzp/courses/csce221/notes/Lab12_UIN_First_Last.docx

  14. Questions?

More Related Content