Advanced Tools for Text Indexing and Searching in SQL and Lucene

 
TOOLS FOR TEXT INDEXING AND SEARCHING
 
Dušan Zeleník
PeWe 2011
FIIT STU
zelenik@fiit.stuba.sk
 
Searching using SQL LIKE
 
CREATE INDEX names_index ON heroes(name);

SELECT name FROM heroes WHERE name LIKE 'zelen%';
will use names_index, ok

SELECT name FROM heroes WHERE name LIKE '%ik';
won't use names_index (seriously don't do that)
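
Why the position of the wildcard matters can be sketched in a few lines of Ruby. This is only a toy model, assuming the index behaves like a sorted list of names; the sample names and the helper methods prefix_search and suffix_search are made up for illustration and are not part of any database API.

```ruby
# Toy model of an ordinary index on heroes(name): a sorted array of names.
# The names below are made-up sample data.
NAMES_INDEX = %w[bielik simko zelenik zeleny zemko].sort.freeze

# LIKE 'prefix%': binary-search to the first candidate, then read on only
# while the prefix still matches.
def prefix_search(index, prefix)
  start = index.bsearch_index { |name| name >= prefix }
  return [] if start.nil?

  index.drop(start).take_while { |name| name.start_with?(prefix) }
end

# LIKE '%suffix': the sort order says nothing about word endings,
# so every entry has to be inspected (a full scan).
def suffix_search(index, suffix)
  index.select { |name| name.end_with?(suffix) }
end

p prefix_search(NAMES_INDEX, "zelen")
p suffix_search(NAMES_INDEX, "ik")
```

The prefix lookup touches only the matching slice of the sorted list, while the suffix lookup has to walk all of it, which is roughly what a table scan costs.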
 
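CREATE FULLTEXT INDEX names_fullindex ON heroes(name);

SELECT name FROM heroes WHERE MATCH(name) AGAINST('%ik');
will use names_fullindex

SELECT name FROM heroes WHERE MATCH(name) AGAINST('ze%ik');
won't use names_fullindex (seriously don't do that)

Note: this is MySQL syntax, and MySQL full-text search does not treat % as a wildcard. Its only wildcard is a trailing * in boolean mode, e.g. AGAINST('zelen*' IN BOOLEAN MODE).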
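
A full-text index is, at its core, an inverted index from words to rows. The following Ruby sketch shows the idea under simplifying assumptions: the rows are made-up sample data, tokenizing is a plain regular expression, and build_fulltext_index, match_word and match_prefix are hypothetical helpers, not MySQL functions.

```ruby
# Made-up rows standing in for the heroes table: id => name.
HERO_ROWS = {
  1 => "Dusan Zelenik",
  2 => "Jakub Simko",
  3 => "Captain Zelenik"
}.freeze

# Build the inverted index: each lowercased word points at the ids of the
# rows that contain it.
def build_fulltext_index(rows)
  index = Hash.new { |hash, word| hash[word] = [] }
  rows.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |word| index[word] << id }
  end
  index
end

# Whole-word lookup is a single hash access.
def match_word(index, word)
  index.fetch(word.downcase, [])
end

# Prefix lookup scans the word list, never the rows themselves.
def match_prefix(index, prefix)
  hits = index.keys.select { |word| word.start_with?(prefix.downcase) }
  hits.flat_map { |word| index[word] }.uniq.sort
end

index = build_fulltext_index(HERO_ROWS)
p match_word(index, "zelenik")
p match_prefix(index, "ze")
```

Because the index is keyed by whole words, a fragment in the middle or at the end of a word has no entry to look up, which is why such queries do not benefit from it.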
 
Search Engines for TEXT
 
Lucene
  Lucene Core - Java (library)
    Ferret …
  Solr - Java (standalone server)
    Sunspot …
  ElasticSearch - Lucene Core
    Tire …
Sphinx - C++
  Thinking Sphinx
 
Lucene vs. Sphinx
 
Sphinx
 
Standalone server (http://sphinxsearch.com/)
Thinking Sphinx (Rails Gem – MVC)
http://freelancing-god.github.com/
works directly with DB and Sphinx server
 
Thinking Sphinx
 
class Hero < ActiveRecord::Base
  define_index do
    indexes description, :sortable => true
    indexes sidekick(:name), :as => :sidekick, :sortable => true
    has sidekick, summoned_at, died_at
  end
end

Hero.search "zelenik"

Hero.search :conditions => {:sidekick => "simko"},
            :match_mode => :any,    # one of :all, :any, :phrase, :boolean
            :order      => :died_at
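
What the match modes mean can be sketched in plain Ruby. This is an illustration only, assuming simple whitespace-style tokenizing; the helpers words and matches? are hypothetical, they do not call Sphinx, and :boolean is left out because it needs a query parser.

```ruby
# Split text into lowercase words; a stand-in for Sphinx's real tokenizer.
def words(text)
  text.downcase.scan(/\w+/)
end

# Returns true when `document` satisfies `query` under the given mode.
def matches?(document, query, mode)
  doc = words(document)
  terms = words(query)
  return false if terms.empty?

  case mode
  when :all    then terms.all? { |term| doc.include?(term) }   # every word present
  when :any    then terms.any? { |term| doc.include?(term) }   # at least one word present
  when :phrase then doc.each_cons(terms.size).include?(terms)  # words adjacent and in order
  else raise ArgumentError, "unsupported mode: #{mode}"
  end
end

description = "has abnormally gigant muscles"
p matches?(description, "muscles gigant", :all)
p matches?(description, "muscles gigant", :phrase)
p matches?(description, "gigant brain", :any)
```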
 
Thinking Sphinx
 
Excerpts
  heroes = Hero.search "gigant"
  heroes.first.excerpts.description    # excerpts are read from each result
  … has abnormally gigant muscles ….
Facets
  indexes sidekick.name, :as => :sidekick, :facet => true
Geolocation
  has "RADIANS(latitude)",  :as => :latitude,  :type => :float
  has "RADIANS(longitude)", :as => :longitude, :type => :float
  Place.search "zelenik",
    :geo  => [@lat, @lng],                    # in radians
    :with => {"@geodist" => 0.0..10_000.0}    # distance in metres
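
The geolocation setup stores coordinates in radians and filters on a distance in metres. A rough Ruby sketch of that arithmetic follows, assuming the haversine formula and a mean Earth radius of 6,371 km; Sphinx's own distance formula may differ in detail, and the radians and geodist helpers and the sample coordinates are illustrative only.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in metres (assumed value)

# Same conversion that RADIANS(latitude) performs in SQL.
def radians(degrees)
  degrees * Math::PI / 180.0
end

# Haversine great-circle distance in metres; all four arguments in radians.
def geodist(lat1, lng1, lat2, lng2)
  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lng2 - lng1) / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Approximate city-centre coordinates, used only as sample data.
bratislava = [radians(48.1486), radians(17.1077)]
vienna     = [radians(48.2082), radians(16.3738)]

distance = geodist(*bratislava, *vienna)
puts distance.round
puts (0.0..10_000.0).cover?(distance) # would this place pass the 10 km filter?
```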
 
Solr
 
Standalone server (http://lucene.apache.org/solr/)
Sunspot (Rails Gem)
http://outoftime.github.com/sunspot/
communicates with DB and Solr server
 
 
 
 
Sunspot
 
class Hero < ActiveRecord::Base
  searchable do
    text :description
    string :sidekick do
      sidekick.name
    end
    time :summoned_at
    time :died_at
  end
end

Hero.search do
  fulltext 'muscles'
  with(:died_at).less_than Time.now
  order_by :summoned_at, :desc
  paginate :page => 2, :per_page => 15
  facet :sidekick
end
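
What the facet and paginate calls ask the search server to compute can be sketched in plain Ruby. This is a simplified model, not Sunspot or Solr code: the HeroRow struct, the sample rows and the helpers facet_counts and paginate_rows are all hypothetical.

```ruby
# Made-up search results; HeroRow is a stand-in for a matched record.
HeroRow = Struct.new(:name, :sidekick)

RESULTS = [
  HeroRow.new("zelenik", "simko"),
  HeroRow.new("kompan", "simko"),
  HeroRow.new("barla", "bielik")
].freeze

# Facet: count how many results fall under each value of a field,
# most frequent value first.
def facet_counts(records, field)
  counts = records.group_by { |record| record[field] }
                  .map { |value, group| [value, group.size] }
  counts.sort_by { |value, count| [-count, value] }
end

# Pagination: pages are 1-based, each holding per_page results.
def paginate_rows(records, page:, per_page:)
  records.each_slice(per_page).to_a[page - 1] || []
end

p facet_counts(RESULTS, :sidekick)
p paginate_rows(RESULTS, page: 2, per_page: 2).map(&:name)
```

In a real setup the server computes both on its side and returns only the requested page together with the facet counts.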
 
 
Sunspot
 
DSL
Solr highlighting
Class hierarchy
Facets
Geographical searches
WillPaginate support
 
Lucene analyzers (tokenizers, filters …)
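
An analyzer is a tokenizer followed by a chain of token filters. The Ruby sketch below mimics that pipeline under simplifying assumptions; the stopword list, the lambdas and the analyze helper are made up for illustration and are not Lucene classes.

```ruby
# A tiny, made-up stopword list.
STOPWORDS = %w[a an and has the].freeze

# Tokenizer: turns raw text into a list of tokens.
TOKENIZER = ->(text) { text.scan(/[[:alnum:]]+/) }

# Filters: each one takes a token list and returns a new token list.
LOWERCASE_FILTER = ->(tokens) { tokens.map(&:downcase) }
STOPWORD_FILTER  = ->(tokens) { tokens.reject { |token| STOPWORDS.include?(token) } }

# Analyzer: run the tokenizer, then feed the tokens through every filter in order.
def analyze(text, tokenizer: TOKENIZER, filters: [LOWERCASE_FILTER, STOPWORD_FILTER])
  filters.reduce(tokenizer.call(text)) { |tokens, filter| filter.call(tokens) }
end

p analyze("The hero has abnormally gigant muscles")
```

The same chain has to run on documents at index time and on queries at search time, otherwise the terms will not line up.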
 
ElasticSearch
 
Standalone server based on Lucene (http://www.elasticsearch.org/)
Tire (Rails Gem), better than nothing
https://github.com/karmi/tire
communicates with DB and ElasticSearch server
 
Tire
 
class Hero < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  mapping do
    indexes :description, :type => 'string', :analyzer => 'snowball'
    indexes :name,        :type => 'string'
    indexes :died_at,     :type => 'date'    # ElasticSearch has no 'time' type
    indexes :summoned_at, :type => 'date'
  end
end

Hero.search 'muscles'
 
 
 
 
 
ElasticSearch
 
ADVANTAGES OF SOLR
REST
DISTRIBUTED!!!
http://www.youtube.com/watch?v=l4ReamjCxHo
 
For instance, Hadoop …
http://www.elasticsearch.org/guide/reference/modules/gateway/hadoop.html
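
Because the interface is REST, a search is just an HTTP request with a JSON body. The sketch below builds such a request with Ruby's standard library only. The host, port and index name heroes are assumptions for illustration, build_search_request is a hypothetical helper, and the request is built but not sent.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed for illustration: a local node on port 9200 and an index named "heroes".
SEARCH_URI = URI("http://localhost:9200/heroes/_search")

# Build (but do not send) a search request for a free-text query.
def build_search_request(text, uri: SEARCH_URI)
  request = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
  request.body = JSON.generate(query: { query_string: { query: text } })
  request
end

request = build_search_request("muscles")
puts "#{request.method} #{request.path}"
puts request.body

# Sending it would look like this, given a running server:
#   response = Net::HTTP.start(SEARCH_URI.host, SEARCH_URI.port) { |http| http.request(request) }
#   hits = JSON.parse(response.body)
```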
 

Explore advanced techniques for text indexing and searching using SQL statements like CREATE INDEX and FULLTEXT INDEX, along with insights into popular search engines such as Lucene, Sphinx, and Thinking Sphinx. Dive into the comparison between Lucene and Sphinx, and discover how tools like Sphinx Standalone Server and Thinking Sphinx can enhance your search capabilities. Understand the power of Thinking Sphinx in defining indexes, facets, geolocation features, and more for efficient text searching.


Uploaded on Aug 30, 2024




Lucene vs. Sphinx: comparison table

Lucene                    Sphinx
------------------------  ------------------------------
live index update         delta indexes :(
only wraps ODBC tables    direct import (ODBC)
Java                      C++
very scalable             very scalable
Wikipedia, Digg           Mininova, Slashdot, DoTankoch
free, opensource          free, opensource
