Advanced Tools for Text Indexing and Searching in SQL and Lucene


Explore advanced techniques for text indexing and searching using SQL statements like CREATE INDEX and FULLTEXT INDEX, along with insights into popular search engines such as Lucene, Sphinx, and Thinking Sphinx. Dive into the comparison between Lucene and Sphinx, and discover how tools like Sphinx Standalone Server and Thinking Sphinx can enhance your search capabilities. Understand the power of Thinking Sphinx in defining indexes, facets, geolocation features, and more for efficient text searching.


Uploaded on Aug 30, 2024



Presentation Transcript


  1. TOOLS FOR TEXT INDEXING AND SEARCHING
     PeWe 2011
     Dušan Zeleník, FIIT STU
     zelenik@fiit.stuba.sk

  2. Searching using SQL LIKE
     CREATE INDEX names_index ON heroes(name)
     SELECT name FROM heroes WHERE name LIKE 'zelen%'  -- will use names_index, ok
     SELECT name FROM heroes WHERE name LIKE '%ik'  -- won't use names_index (seriously, don't do that)
     CREATE FULLTEXT INDEX names_fullindex ON heroes(name)
     SELECT name FROM heroes WHERE MATCH(name) AGAINST('%ik')  -- will use names_fullindex
     SELECT name FROM heroes WHERE MATCH(name) AGAINST('ze%ik')  -- won't use names_fullindex (seriously, don't do that)
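
Why a leading wildcard defeats an ordinary index can be illustrated with a toy model. The sketch below is not how any particular database implements its indexes; it assumes the index behaves like a sorted list of names, and the sample names and the helpers `prefix_search` and `suffix_search` are hypothetical.

```ruby
# Toy model of an ordered index: a sorted array of names (sample data is made up).
NAMES_INDEX = %w[adamik batman simko zelenak zelenik zelenka zorro].sort.freeze

# LIKE 'zelen%': binary-search to the first entry >= the prefix, then read
# forward while entries still start with it. Only a slice of the index is read.
def prefix_search(index, prefix)
  start = index.bsearch_index { |name| name >= prefix }
  return [] if start.nil?

  index.drop(start).take_while { |name| name.start_with?(prefix) }
end

# LIKE '%ik': sort order says nothing about how a name ends,
# so every entry has to be inspected.
def suffix_search(index, suffix)
  index.select { |name| name.end_with?(suffix) }
end
```

Calling `prefix_search(NAMES_INDEX, "zelen")` jumps straight to the matching run of entries, while `suffix_search(NAMES_INDEX, "ik")` has no choice but to walk the whole array.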

  3. Search Engines for TEXT
     Lucene
       Lucene Core - Java (library)
         Ferret
       Solr - Java (standalone server)
         Sunspot
       ElasticSearch - Lucene Core
         Tire
     Sphinx - C++
       Thinking Sphinx

  4. Lucene vs. Sphinx
     Lucene                  | Sphinx
     live index update       | delta indexes :(
     only wraps ODBC tables  | direct import (ODBC)
     Java                    | C++
     very scalable           | very scalable
     Wikipedia, Digg         | Mininova, Slashdot, DoTankoch
     free, opensource        | free, opensource

  5. Sphinx
     Standalone server (http://sphinxsearch.com/)
     Thinking Sphinx (Rails Gem MVC) http://freelancing-god.github.com/
     works directly with DB and Sphinx server

  6. Thinking Sphinx
     class Hero < ActiveRecord::Base
       define_index do
         indexes description, :sortable => true
         indexes sidekick(:name), :as => :sidekick, :sortable => true
         has sidekick, summoned_at, died_at
       end
     end
     Hero.search 'zelenik'
     Hero.search :conditions => {:sidekick => 'simko'},
                 :match_mode => :any,  # (:all, :any, :phrase, :boolean)
                 :order => :died_at

  7. Thinking Sphinx
     Excerpts
       heroes = Hero.search 'gigant'
       heroes.excerpts.description  -> has abnormally gigant muscles
     Facets
       indexes sidekick.name, :as => :sidekick, :facet => true
     Geolocation
       has "RADIANS(latitude)", :as => :latitude, :type => :float
       has "RADIANS(longitude)", :as => :longitude, :type => :float
       Place.search "zelenik", :geo => [@lat, @lng], :with => {"@geodist" => 0.0..10_000.0}
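
The geolocation attributes above are stored in radians and the filter is a range of distances. As a rough illustration of what such a distance filter computes, here is a minimal sketch using the haversine formula. It assumes distances in meters and a spherical Earth; the constant `EARTH_RADIUS_M`, the helpers `to_radians`, `geodist` and `within`, and the sample coordinates are all hypothetical, and this is not Sphinx's own implementation.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius; an assumption of this sketch

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two points given in radians.
def geodist(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Names of places (name => [lat, lng] in degrees) whose distance from
# origin ([lat, lng] in degrees) falls inside the given range of meters.
def within(places, origin, range)
  lat0, lng0 = origin.map { |deg| to_radians(deg) }
  places.select do |_name, (lat, lng)|
    range.cover?(geodist(lat0, lng0, to_radians(lat), to_radians(lng)))
  end.keys
end

# Made-up sample data: one point about 1 km away, one roughly 55 km away.
PLACES = { "near" => [48.16, 17.11], "far" => [48.21, 16.37] }.freeze
```

With this data, `within(PLACES, [48.15, 17.11], 0.0..10_000.0)` keeps only the nearby place, which mirrors the `0.0..10_000.0` range passed to the search on the slide.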

  8. Solr
     Standalone server (http://lucene.apache.org/solr/)
     Sunspot (Rails Gem) http://outoftime.github.com/sunspot/
     communicates with DB and Solr server

  9. Sunspot
     Hero.search do
       fulltext 'muscles'
       with(:died_at).less_than Time.now
       order_by :summoned_at, :desc
       paginate :page => 2, :per_page => 15
       facet :sidekick
     end
     class Hero < ActiveRecord::Base
       searchable do
         text :description
         string :sidekick do
           sidekick.name
         end
         time :summoned_at
         time :died_at
       end
     end
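
To make the `facet :sidekick` idea concrete: a facet is a count of matching documents per distinct value of a field. The following is a plain-Ruby sketch of that idea over in-memory records, not Sunspot or Solr code; `HeroRecord`, `SAMPLE_HEROES`, `fulltext_hits` and `facet_counts` are hypothetical names, and the sample heroes are made up.

```ruby
HeroRecord = Struct.new(:name, :description, :sidekick)

# Made-up sample data for illustration only.
SAMPLE_HEROES = [
  HeroRecord.new("zelenik", "has abnormally gigant muscles", "simko"),
  HeroRecord.new("batman", "no muscles to speak of, but many gadgets", "robin"),
  HeroRecord.new("hulk", "green muscles", "simko"),
  HeroRecord.new("zorro", "fast sword", "bernardo")
].freeze

# Naive full-text match: keep heroes whose description contains the word.
def fulltext_hits(heroes, term)
  heroes.select do |hero|
    hero.description.downcase.scan(/[a-z]+/).include?(term.downcase)
  end
end

# Facet: number of hits per distinct value of the given field.
def facet_counts(hits, field)
  hits.each_with_object(Hash.new(0)) { |hero, counts| counts[hero[field]] += 1 }
end
```

Running `facet_counts(fulltext_hits(SAMPLE_HEROES, "muscles"), :sidekick)` groups the three matching heroes by sidekick, which is the kind of summary a search UI shows next to the results.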

  10. Sunspot
      DSL
      Solr highlighting
      Class hierarchy
      Facets
      Geographical searches
      WillPaginate support
      Lucene analyzers (tokenizers, filters ...)
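
An analyzer, in the sense of the last item, is a tokenizer followed by a chain of token filters. The sketch below shows that pipeline shape in plain Ruby. It is an assumption-laden toy, not Lucene's analyzers: the stop-word list, the crude plural-stripping filter, and the names `TOKENIZER`, `LOWERCASE`, `STOP_FILTER`, `PLURAL_STEM` and `analyze` are all hypothetical.

```ruby
# Tokenizer: split text into runs of letters.
TOKENIZER = ->(text) { text.scan(/[[:alpha:]]+/) }

# Token filters: each takes an array of tokens and returns a new array.
LOWERCASE = ->(tokens) { tokens.map(&:downcase) }

STOP_WORDS = %w[a an the and of has have].freeze
STOP_FILTER = ->(tokens) { tokens.reject { |token| STOP_WORDS.include?(token) } }

# Very crude stand-in for a stemmer: drop one trailing "s" from longer words.
PLURAL_STEM = ->(tokens) { tokens.map { |t| t.length > 3 ? t.sub(/s\z/, "") : t } }

DEFAULT_FILTERS = [LOWERCASE, STOP_FILTER, PLURAL_STEM].freeze

# Run the tokenizer, then feed the tokens through each filter in order.
def analyze(text, filters = DEFAULT_FILTERS)
  filters.reduce(TOKENIZER.call(text)) { |tokens, filter| filter.call(tokens) }
end
```

The same `analyze` call would be applied to documents at index time and to queries at search time, so that "Muscles" in a query and "muscles" in a description end up as the same term.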

  11. ElasticSearch
      Standalone server based on Lucene (http://www.elasticsearch.org/)
      Tire (Rails Gem), better than nothing https://github.com/karmi/tire
      communicates with DB and ElasticSearch server

  12. Tire
      class Hero < ActiveRecord::Base
        include Tire::Model::Search
        include Tire::Model::Callbacks
        mapping do
          indexes :description, :type => 'string', :analyzer => 'snowball'
          indexes :name,        :type => 'string'
          indexes :died_at,     :type => 'time'
          indexes :summoned_at, :type => 'time'
        end
      end
      Hero.search 'muscles'

  13. ElasticSearch
      ADVANTAGES OF SOLR
      REST
      DISTRIBUTED!!! http://www.youtube.com/watch?v=l4ReamjCxHo
      For instance, Hadoop http://www.elasticsearch.org/guide/reference/modules/gateway/hadoop.html
