Understanding Compression and LZW Algorithm

cs 1501 recitation 3 n.w
1 / 30
Embed
Share

Explore the concept of compression and the LZW algorithm in the context of data representation and processing efficiency. Discover how data compression can improve data transfer speeds and reduce processing delays in networks, with a focus on lossless compression techniques.

  • Compression
  • LZW Algorithm
  • Data Representation
  • Processing Efficiency
  • Lossless Compression

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CS 1501 RECITATION 3 Gordon Lu

  2. AGENDA FOR TODAY Compression LZW Hints for the Project

  3. COMPRESSION

  4. MOTIVATION Consider the Internet Suppose we have User 1 sending a 100GB file to User 2 Suppose that the packets of 100GBytes must traverse over a link with 10Gbps bandwidth. Also, suppose that we have 8 packets waiting in a queue when a new packet arrives. We re gonna end up waiting a LONG time before the other packets are processed i.e., the queueing delay is REALLY LONG. How long does a packet have to sit in a buffer before it is processed? But what if we could represent our data with a smaller number of bits without losing any data?

  5. ENTER COMPRESSION Compression is defined as The process of encoding information using fewer bits than the original representation

  6. THATS COOL AND ALLBUT WHATS THE POINT? Going back to our network example If we compressed our 100GByte file into a 10GByte file We can process the packet much faster, and effectively process the remaining packets quicker.

  7. LOSSLESS COMPRESSION We re mainly concerned with lossless compression! A lossless compression algorithm allow the data to be perfectly reconstructed from compressed data D EXPAND COMPRESS C D

  8. A SIDENOTE In the Silicon Valley TV series, the main characters develop a compression algorithm that is WAY beyond belief: The idea is an adaptive machine learning model As more data is compressed, the algorithm learns and compresses data more efficiently to the extent that the size of the file may be 50x smaller than the original Which is really not possible (for now, let s not think about quantum)

  9. LZW COMPRESSION

  10. LZW COMPRESSION IDEA In Huffman coding, we maintained a table of variable-length codewords for fixed- length patterns in the input. Instead, we maintain a table fixed-length codewords for variable-length patterns in the input. By doing so, we do not have to encode the table in compression!

  11. LZW COMPRESSION Initialize codebook to all single characters e.g., character maps to its ASCII value While !EOF: Find the longest string s in the symbol table that is a prefix of the unscanned input Output codeword Take this longest prefix, add the next character in the file, and add the result to the dictionary with a new codeword

  12. LZW EXPANSION Initialize codebook to all single characters e.g., ASCII value maps to its character While !EOF: Read next codeword from file Lookup corresponding pattern in the codebook Output that pattern Add the previous pattern + the first character of the current pattern to the codebook

  13. THE EDGE CASE Expansion s codebook will always be a step behind compression s codebook when processing the same pattern So, it is possible that in expansion, we encounter a codeword that has not be encountered yet!! Solution: We take the first character of our previous output, and add it to the end of the previous output For example, if the previous output was BD, then we would add BDB

  14. LZW EXAMPLE I Consider a file containing the following text data: AAABBBAAB Given that the ASCII value for A is 65, perform LZW compression and expansion.

  15. LZW EXAMPLE II Consider a file containing the following text data: D A B D D D B D B D B Given that the ASCII value for A is 65, perform LZW compression and expansion.

  16. LETS TAKE A LOOK AT THE LZW CODE

  17. PROJECT 2 HINTS

  18. WHAT SHOULD I DO FIRST? The project is broken up into 3 phases: 1. DLB 2. Autocompleter 3. UserHistory

  19. HOW DO I DISTINGUISH BETWEEN WORDS IN THE DLB? Before proceeding upon insertion, you can append a sentinel character to the String. By doing so, as you perform operations on the DLB, you know when you are at a word

  20. METHODS YOU WILL NEED TO IMPLEMENT IN DLB AND USERHISTORY public void add(String key); public boolean contains(String key); public boolean containsPrefix(String pre); public int searchByChar(char next); public void resetByChar(); public ArrayList<String> suggest(); public ArrayList<String> traverse(); public int count();

  21. CONTAINS VS CONTAINS PREFIX In contains: You are searching the DLB/UserHistory for a given word This means you are looking if you can reach the corresponding sentinel node In containsPrefix It s pretty much like contains, BUT you don t need to get to a sentinel node

  22. SUGGEST Suggest() is one of the more difficult methods in DLB/UserHistory. In DLB, an ArrayList is filled up with all words based on the current by-character search. This ArrayList will be sorted ASCII-betically In UserHistory, an ArrayList is filled up with all words based on the current by- character search. This ArrayList will be sort by frequency AND ASCII-betically These methods must run in O(wR) (w = length of longest word, R = alphabet size)

  23. WEVE JUST MADE A REVELATION From the previous slide we can say the following: UserHistory and DLB are fundamentally the same, but UserHistory simply requires a modified suggest method!! (They re both DLBs!!)

  24. USER HISTORY How do we sort based on the frequency that a word has been selected? We can do it two ways: Add a frequency field to each node Create a mappingbetween a word and the frequency (How??)

  25. FREQUENCY FIELD APPROACH To encode frequency, once a given word is selected, we would increment the frequency of the corresponding SENTINEL node s field!

  26. AUTOCOMPLETER.JAVA WILL BRING EVERYTHING TOGETHER! public ArrayList<String> nextChar(char next); public void finishWord(String cur); public void saveUserHistory(String fname);

  27. HOW AUTOCOMPLETER.JAVA SHOULD WORK First, load in userHistory from a provided text file. Load in said contents into the UserHistory object Load in the dictionary into a DLB object Here, we prioritize what the user has entered in the past AND then we suggest words from the dictionary! Initialize a String curr = ; Invoke nextChar( somechar ) to get a list of suggestions based on curr + somechar This will return a list of suggestions with the prefix curr + somechar Once you are satisfied and want to add a suggestion, finishWord() is invoked The word is added to the userHistory object AND curr is reset! Once you are satisfied and want to terminate the program: Write out everything in the current userHistory object to the file: fname. HINT: Use the traverse method!

  28. WHY BOTHER WRITING OUT THE USERHISTORY? We want to be able to record what the user has historically selected In other words, the code will be run multiple times, and each time we want to remember the respective user histories!!

  29. POTENTIAL ISSUES Suppose we retrieve userHistory and dictionary suggestions We ll get two different ArrayLists from the respective suggest methods How do we get rid of duplicates? How do we merge the ArrayLists?

  30. LETS SEE IT IN ACTION!

More Related Content