Trie Data Structure for Efficient Word Search

Slide Note

A comprehensive overview of Trie data structure usage in word search and pattern matching. Learn about different types of Tries, implementation techniques, time complexities, and practical applications. Explore the nuances of searching, insertion, and deletion operations, unraveling the power of Trie for efficient word processing. Delve into word matching examples and understand the intricacies of constructing ordered trees for strings.

hcel Follow

Uploaded on Feb 26, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Lecture 19 Tries

Linked List Priority Queue Stack Heap Queue BST Multiway Red-Black AVL Hash Table Algorithm Analysis Tools

Is finding a pattern same as finding a word in the dictionary?

How fast can you search for a word in a set of words?

How to make ordered tree for strings?

Standard Tries Except root, each node is labeled with a character The children of a node are alphabetically ordered The path from root to external nodes yields the string

S = { bear, bell, bid, bull, buy, sell, stock, stop } b s e i u e t a l d l y l o r l l l c p k

Insert Assumption: no word is prefix of other word! What if this is not the case?

Time complexity n total size of the strings in S m size of given string d size of the alphabet s number of words

Search? O(m)

Insert & delete? O(md)

Word Matching with a Trie s 0 e e 1 a 4 b e a r 6 7 ? s e l l s t o c k ! 2 3 5 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 s e e a b u l l ? b u y s t o c k ! 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 b i 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 a r d s t o c k ! b i d s t o c k ! h e 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 t h e b e l l ? s t o p ! 87 88 b h s e i u e e t y e a l l a l o d 36 0, 24 47, 58 r r p c l l l 6 69 84 78 30 12 k 17, 40, 51, 62

How many nodes do we need? Can we reduce the number of nodes?

b s e i u e t a l d l y l o r l l l c p k Compressed Tries b s e id u ell to ar ll ll y ck p

Compressed Tries A compressed trie has internal nodes of degree at least two It is obtained from standard trie by compressing chains of redundant nodes A redundant node is the one that has only one child

How many nodes do I have in a compressed Trie?

Compressed Trie Properties Every internal node of T has at least two children and at most d children T has s leaf nodes The number of internal nodes of T is O(s)

0 1 2 3 4 0 1 2 3 0 1 2 3 s e e b e a r s e l l s t o c k b u l l b u y b i d h e b e l l s t o p a r S[0] = S[4] = S[7] = S[1] = S[5] = S[8] = S[2] = S[6] = S[9] = S[3] = b s hear e e id u ell to ll e ar ll ll y ck p 1,0,0 0,0,0 7,0,3 1,1,1 4,1,1 6,1,2 0,1,1 3,1,2 1,2,3 8,2,3 5,2,2 2,2,3 3,3,4 9,3,3 4,2,3 0,2,2

Compact Representation of Tries Stores at the nodes ranges of indices instead of substrings Uses O(s) space, where s is the number of strings in the array Serves as an auxiliary index structure

Insertion/Deletion in Compressed Trie f g or u one reat n r f g search ends here or u one reat insert green f g n r or u one re insert green n r at en

Suffix Trie m i 0 1 2 3 4 5 6 7 n i m i z e e i mi nimize ze 7 2 6 mize nimize ze nimize ze 4 3 1 5 0

Suffix Trie Not all strings are guaranteed to have corresponding suffix trie. For example: xabxa a xa bxa bxa bxa a xa bxa bxa bxa $ $

main 2011/1/13 9:10 page 585 #607 12.5. Tries 585 main 2011/1/13 9:10 page 585 #607 main 2011/1/13 9:10 page 585 #607 12.5. Tries 585 Pattern Matching Using Suffix Trie 12.5. Tries (a) 585 Compressed Trie (a) (b) (a) Figure 12.13: (a) Suffix trie T for the string X = m i ni m i ze . (b) Compact representation of T, where pair (i, j) denotes X[i..j]. Using a Suffix Trie The suffix trie T for a string X can be used to efficiently perform pattern matching queries on text X. Namely, we can determine whether a pattern P is a substring of X by trying to trace a path associated with P in T. P is a substring of X if and only if such a path can be traced. The search down the trie T assumes that nodes in T store some additional information, with respect to the compact representation of the suffix trie: Auxiliary Trie If node v has label (i, j) andY is the string of length y associated with the path from the root to v (included), then X[j y+ 1..j]= Y. (b) Figure 12.13: (a) Suffix trie T for the string X = m i ni m i ze . (b) Compact representation of T, where pair (i, j) denotes X[i..j]. This property ensures that we can easily compute the start index of the pattern in the text when a match occurs. Using a Suffix Trie (b) Figure 12.13: (a) Suffix trie T for the string X = m i ni m i ze . (b) Compact representation of T, where pair (i, j) denotes X[i..j]. The suffix trie T for a string X can be used to efficiently perform pattern matching queries on text X. Namely, we can determine whether a pattern P is a substring of X by trying to trace a path associated with P in T. P is a substring of X if and only if such a path can be traced. The search down the trie T assumes that nodes in T store some additional information, with respect to the compact representation of the suffix trie: Using a Suffix Trie The suffix trie T for a string X can be used to efficiently perform pattern matching queries on text X. Namely, we can determine whether a pattern P is a substring of X by trying to trace a path associated with P in T. P is a substring of X if and only if such a path can be traced. The search down the trie T assumes that nodes in T store some additional information, with respect to the compact representation of the suffix trie: This property ensures that we can easily compute the start index of the pattern in the text when a match occurs. If node v has label (i, j) and Y is the string of length y associated with the path from the root to v (included), then X[j y+ 1..j] = Y. If node v has label (i, j) and Y is the string of length y associated with the path from the root to v (included), then X[j y+ 1..j] = Y. This property ensures that we can easily compute the start index of the pattern in the text when a match occurs.

How much space do we need to make a suffix trie of n character string?

To Trie or Not to Trie? What would you use to store phonebook?

Constructing a Suffix Trie S[1..n] is the string Start with a single edge for S Enter the edges for the suffix S[i..n] where i goes from 2 to n Starting at the root node find the longest part from the root whose label matches a prefix of S[i..n]. At some point, no further matches are possible If the point is at a node, then denote this node by w If it is in the middle of an edge, then insert a new node called w, at this point Create a new edge running from w to a new leaf labeled S[i..n]