Faster Postings Merges with Skip Pointers in Information Retrieval

Slide Note
Embed
Share

Enhance the speed of postings merges in information retrieval by utilizing skip pointers and skip lists strategically. Learn how to augment postings with skip pointers, optimize query processing with skips, decide where to place skips, and adopt simple heuristics for skip pointer placement.


Uploaded on Aug 14, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Introduction to Information Retrieval Introduction to Information Retrieval Faster postings merges: Skip pointers/Skip lists

  2. Sec. 2.3 Introduction to Information Retrieval Recall basic merge Walk through the two postings simultaneously, in time linear in the total number of postings entries 2 4 8 41 48 64 128 Brutus 2 8 1 2 3 8 11 17 21 31 Caesar If the list lengths are m and n, the merge takes O(m+n) operations. Can we do better? Yes (if the index isn t changing too fast).

  3. Sec. 2.3 Introduction to Information Retrieval Augment postings with skip pointers (at indexing time) 128 41 2 4 8 41 48 64 128 31 11 1 2 3 8 11 17 21 31 Why? To skip postings that will not figure in the search results. How? Where do we place skip pointers?

  4. Sec. 2.3 Introduction to Information Retrieval Query processing with skip pointers 128 41 2 4 8 41 48 64 128 31 11 1 2 3 8 11 17 21 31 Suppose we ve stepped through the lists until we process 8 on each list. We match it and advance. We then have 41 and 11 on the lower. 11 is smaller. But the skip successor of 11 on the lower list is 31, so we can skip ahead past the intervening postings.

  5. Sec. 2.3 Introduction to Information Retrieval Where do we place skips? Tradeoff: More skips shorter skip spans more likely to skip. But lots of comparisons to skip pointers. Fewer skips few pointer comparison, but then long skip spans few successful skips.

  6. Sec. 2.3 Introduction to Information Retrieval Placing skips Simple heuristic: for postings of length L, use L evenly-spaced skip pointers [Moffat and Zobel 1996] This ignores the distribution of query terms. Easy if the index is relatively static; harder if L keeps changing because of updates. This definitely used to help; with modern hardware it may not unless you re memory-based [Bahle et al. 2002] The I/O cost of loading a bigger postings list can outweigh the gains from quicker in memory merging!

  7. Introduction to Information Retrieval Introduction to Information Retrieval Faster postings merges: Skip pointers/Skip lists

Related


More Related Content