Dynamic Rank Frameworks in Lucene/Solr

A Static Rank Framework for Lucene/Solr
Mike Schultz
mike.schultz@gmail.com

Static Rank for Solr/Lucene
  • Dynamic Rank
  • Why Static Rank
  • Combining Scores
  • Static Rank Components

Multiple Fields / Multiple Types
  • PubDate: Continuous (Date, Int, Float, …)
  • IsNews: Boolean (True, False)
  • MediaType: Enum (Book, CD, DVD, Cassette)
  • TextBody: Text (Natural Language)

Dynamic Rank
  • Query Dependent = F(Q,D)
  • Huge dynamic range (0.001-1502.3)
  • Not comparable across queries
  • Not easily normalized
[Diagram labels: PubDate, IsNews, MediaType, TextBody, Query, TF * IDF, Dynamic Score]

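To make the "not comparable across queries" point concrete, here is a minimal Python sketch that scores a toy corpus with a textbook sum of tf * idf. It is an illustration under stated assumptions, not Lucene's scoring: Lucene's similarity adds normalization factors that are omitted here, and the corpus, `tfidf_score`, and `best_score` are hypothetical names invented for this example.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); each document is a list of tokens.
DOCS = [
    "solr ranking with static rank".split(),
    "lucene scoring and ranking".split(),
    "rare zebra zebra zebra zebra facts".split(),
    "ranking ranking ranking tutorial".split(),
]


def tfidf_score(query, doc, docs):
    """Sum of tf * idf over the query terms (textbook form, not Lucene's formula)."""
    counts = Counter(doc)
    score = 0.0
    for term in query.split():
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(len(docs) / df) + 1.0
        score += counts[term] * idf
    return score


def best_score(query, docs=DOCS):
    """Highest score any document reaches for this query."""
    return max(tfidf_score(query, d, docs) for d in docs)


common = best_score("ranking")  # frequent term, low idf
rare = best_score("zebra")      # rare term, high idf
print(f"best 'ranking' score: {common:.3f}")
print(f"best 'zebra' score:   {rare:.3f}")
```

Even on four documents the top score for the rare term is more than twice the top score for the common term, so a fixed bonus would carry a different weight for each query.
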
Why Static Rank?
All (dynamic) things equal, I want:
  • Newer over older
  • CD over cassette
  • Arbitrary feature A over arbitrary feature B
[Diagram labels: PubDate, IsNews, MediaType, TextBody, Query, Static Rank System, Static Score]

Static Rank
  • Query Independent = F(D), i.e. static across queries
  • More easily bounded
[Diagram labels: PubDate, IsNews, MediaType, TextBody, Query, Static Rank System, Static Score]

Combined Rank
[Diagram labels: PubDate, IsNews, MediaType, TextBody, Query, TF * IDF, Static Rank System, Custom Query, Combined Score]

Framework - Requirements
  • Intuitive, hand-tunable, debuggable
  • Query-time only, no re-indexing
  • Minimal parameters
  • Static Rank should boost / demote, but not too much! Docs should stay in their own dynamic rank “neighborhood”.
[Diagram labels: Custom Query, Combined Score]

Combining Scores - Approaches
Addition?
  • Dynamic(0.0001) + Static(0.3) = 0.3001
  • Dynamic(1542.1) + Static(0.3) = 1542.4
  • Difficult to get right across queries
Multiplication?
  • Dynamic(50.0) * Static(0.3) = 15.0
  • Dynamic(10.0) * Static(2.0) = 20.0
  • Could work, but awkward

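The two rejected approaches can be compared by looking at how much the static score changes the dynamic score in relative terms. The sketch below reuses the numbers from the slides; the helper names `additive` and `relative_change` are made up for this illustration.

```python
def additive(dynamic, static):
    """Plain addition of the two scores."""
    return dynamic + static


def relative_change(dynamic, combined):
    """Fractional change of the combined score relative to the dynamic score."""
    return (combined - dynamic) / dynamic


# Addition: the same static score swamps one query and vanishes in another.
for dynamic in (0.0001, 1542.1):
    combined = additive(dynamic, 0.3)
    change = relative_change(dynamic, combined)
    print(f"add: dynamic={dynamic:<8} combined={combined:.4f} change={change:+.4%}")

# Multiplication: the relative change is (static - 1), whatever the dynamic score.
for dynamic, static in ((50.0, 0.3), (10.0, 2.0)):
    combined = dynamic * static
    change = relative_change(dynamic, combined)
    print(f"mul: dynamic={dynamic:<8} combined={combined:.4f} change={change:+.0%}")
```

With addition, a static score of 0.3 multiplies the first dynamic score several thousand times over and moves the second by about 0.02%. With multiplication the effect is stable across queries, but the static factor has to be expressed around 1.0 rather than around 0, which may be part of what the slide calls awkward.
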
Combining Scores - Approaches
Linear Query
  1. Bound StaticScore: -1.0 to 1.0
  2. CScore = DScore * (100 + S% * SScore)
At most, staticRank will boost/demote dynamicScore by S%:
  • CScore = 0.014 * (100 + 30 * 0.5)
  • CScore = 145.3 * (100 + 30 * -0.5)
[Diagram labels: Linear Query, Combined Score]
LinearQuery

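The LinearQuery implementation itself is not reproduced in this text. The arithmetic of the combination formula can still be sketched in a few lines of Python. The function names are hypothetical, and clamping the static score inside the function is an assumption about how the "bound StaticScore" step might be enforced.

```python
def combined_score(dynamic, static, max_pct):
    """CScore = DScore * (100 + S% * SScore), with SScore clamped to [-1, 1]."""
    static = max(-1.0, min(1.0, static))  # assumed way of bounding the static score
    return dynamic * (100.0 + max_pct * static)


def boost_fraction(dynamic, static, max_pct):
    """Change relative to the same document with a neutral static score of 0."""
    neutral = combined_score(dynamic, 0.0, max_pct)
    return combined_score(dynamic, static, max_pct) / neutral - 1.0


# The two worked examples from the slide, with S% = 30.
for dynamic, static in ((0.014, 0.5), (145.3, -0.5)):
    c = combined_score(dynamic, static, 30)
    b = boost_fraction(dynamic, static, 30)
    print(f"D={dynamic} S={static:+} -> C={c:.3f} ({b:+.0%})")
```

Both examples move by 15% (half of S% = 30) even though their dynamic scores differ by four orders of magnitude. The constant factor of 100 scales every score for a query equally, so it does not affect the ordering.
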
Static Rank
  • Extend solr.ValueSource/Parser
  • Uses field cache for inputs
  • Extremely fast
[Diagram labels: PubDate, IsNews, MediaType, TextBody, Query, Static Rank System, Static Score]

Static Rank
[Diagram, built up over three slides. Labels: PubDate, IsNews, MediaType; AgoValueSource (years ago); MuxValueSource (T, F, 0; years ago)]
MuxValueSource Config

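The MuxValueSource configuration itself is not reproduced in this text. As a rough illustration of what the first two stages of the diagram appear to compute, the Python sketch below converts a publication date to "years ago" and then selects between that age and 0 depending on a Boolean field. The reading that T selects the age and F selects 0 is an assumption drawn from the diagram labels, and `years_ago`, `mux`, and `DAYS_PER_YEAR` are hypothetical names, not Solr classes.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed average year length


def years_ago(pub_date, today):
    """Age of a document in fractional years."""
    return (today - pub_date).days / DAYS_PER_YEAR


def mux(selector, if_true, if_false):
    """Pass through one of two inputs depending on a Boolean field."""
    return if_true if selector else if_false


today = date(2012, 1, 1)  # hypothetical "now"
age = years_ago(date(2010, 1, 1), today)

news_age = mux(True, age, 0.0)       # IsNews = T: the age counts
non_news_age = mux(False, age, 0.0)  # IsNews = F: treated as 0 years ago

print(f"age={age:.3f} news={news_age:.3f} non-news={non_news_age:.3f}")
```

Under this reading, recency only penalizes documents flagged as news, while everything else is treated as if it were published today.
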
Static Rank
[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource (years ago); MuxValueSource (T, F, 0; years ago); EnumValueSource]
EnumValueSource Config
  • Maps Fixed-Vocabulary to YEARS AGO
  • A hierarchy and 3 values: MIN,0,MAX
  • All things equal (dynamically), DVD = +3.3 years

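The idea of mapping a fixed vocabulary onto the "years ago" scale can be sketched as a lookup table. Only the DVD figure of 3.3 years comes from the slide. Every other value below is invented, the sign convention (a larger number makes a document look older) is an assumption because the slide does not say which direction +3.3 points, and the hierarchy with MIN, 0, MAX values mentioned above is not modelled.

```python
# Years-ago equivalents per media type.
# Only DVD = 3.3 is from the slide; the other entries are hypothetical.
MEDIA_YEARS = {
    "DVD": 3.3,
    "Book": 0.0,       # assumed
    "CD": 5.0,         # assumed
    "Cassette": 15.0,  # assumed
}


def enum_years(media_type, table=MEDIA_YEARS, default=0.0):
    """Translate an enum value into an equivalent number of years ago."""
    return table.get(media_type, default)


for media in ("Book", "DVD", "CD", "Cassette", "Vinyl"):
    print(f"{media:<9} -> {enum_years(media):.1f} years ago")
```

Because the output is in years, it can be added directly to the age derived from PubDate, which is what lets a single tuning scale cover both features.
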
Static Rank
[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource, MuxValueSource (T, F, 0), EnumValueSource, SumValueSource (years ago); a "?" stage with output range -1 to 1]

Mapping YearsAgo to -1.0 – 1.0
Step Function: if > 10 years-ago = -1, else = +1
  • 1 parameter
  • Too abrupt
Linear
  • No parameters (fixed)
  • Too gradual over 2000+ years
Sigmoid
  • 2 parameters
  • Smooth over entire range
  • Easy to calculate

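The three candidate mappings can be compared side by side. This is a sketch under assumptions: the slides do not give the exact sigmoid, so an algebraic form z / sqrt(1 + z^2) is used here because it is cheap to evaluate, the 2000-year span of the linear map is taken from the "2000+ years" remark, and the default slope of 1.0 is arbitrary. The x-intercept of 1.5 years is the value shown on the sigmoid plot.

```python
import math


def step(years, threshold=10.0):
    """One parameter: +1 up to the threshold, -1 beyond it."""
    return -1.0 if years > threshold else 1.0


def linear(years, span=2000.0):
    """Straight line from +1 at 0 years to -1 at `span` years (span is assumed)."""
    clipped = max(0.0, min(span, years))
    return 1.0 - 2.0 * clipped / span


def sigmoid(years, slope=1.0, x0=1.5):
    """Two parameters, slope and x-intercept x0; output stays inside (-1, 1)."""
    z = slope * (years - x0)
    return -z / math.sqrt(1.0 + z * z)


for y in (0.0, 1.0, 5.0, 9.9, 10.1, 100.0):
    print(f"{y:6.1f}  step={step(y):+.2f}  "
          f"linear={linear(y):+.3f}  sigmoid={sigmoid(y):+.3f}")
```

The step function flips from +1 to -1 between 9.9 and 10.1 years, the linear map has barely moved from +1 at either age, and the sigmoid crosses zero at x0 and flattens smoothly toward -1 for old documents.
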
Sigmoid
  • Slope
  • x-intercept (year)
[Plot: sigmoid over Years-ago, ranging from 1.0 down to -1.0, with x0 = 1.5 years ago]

Static Rank
[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource, MuxValueSource (T, F, 0), EnumValueSource, SumValueSource (years ago), SigmoidValueSource (-1 to 1)]
SigmoidValueSource Config
Static Rank Config

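The actual configuration is not reproduced in this text. To show how the stages in the diagram might compose, here is a self-contained Python sketch of the whole static rank tree. Nothing in it is the Solr API: `Source` and its subclasses are hypothetical stand-ins for value sources, plain lists play the role of the field cache, the sigmoid form and slope are assumptions, and the sample documents are invented (only DVD = 3.3 and x0 = 1.5 come from the slides).

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


class Source:
    """Hypothetical stand-in for a value source: one float per document id."""

    def value(self, doc: int) -> float:
        raise NotImplementedError


@dataclass
class Column(Source):
    values: Sequence[float]  # per-document array, like a field cache

    def value(self, doc):
        return float(self.values[doc])


@dataclass
class Mux(Source):
    selector: Source
    if_true: Source
    if_false: Source

    def value(self, doc):
        chosen = self.if_true if self.selector.value(doc) else self.if_false
        return chosen.value(doc)


@dataclass
class Enum(Source):
    labels: Sequence[str]
    years: Dict[str, float]

    def value(self, doc):
        return self.years.get(self.labels[doc], 0.0)


@dataclass
class Sum(Source):
    parts: Sequence[Source]

    def value(self, doc):
        return sum(p.value(doc) for p in self.parts)


@dataclass
class Sigmoid(Source):
    inner: Source
    slope: float
    x0: float

    def value(self, doc):
        z = self.slope * (self.inner.value(doc) - self.x0)
        return -z / math.sqrt(1.0 + z * z)  # assumed sigmoid form


# Three invented documents; every column is indexed by document id.
years_ago = Column([0.5, 0.5, 12.0])  # assumed output of the "ago" stage
is_news = Column([1.0, 0.0, 1.0])
zero = Column([0.0, 0.0, 0.0])
media = Enum(["Book", "DVD", "Book"], {"DVD": 3.3})

static_rank = Sigmoid(Sum([Mux(is_news, years_ago, zero), media]),
                      slope=1.0, x0=1.5)
scores = [static_rank.value(d) for d in range(3)]
print([round(s, 3) for s in scores])
```

Each stage only reads per-document values and does a little arithmetic, which is consistent with the slides' claim that this style of computation is fast at query time. The resulting score stays between -1 and 1, ready to be plugged into the combined score formula.
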
Conclusion
  • solr.ValueSource/Parser - fast and flexible
  • CScore = DScore * (100 + S% * SScore), with -1.0 < SScore < 1.0
  • “Time” as a common currency for static features

This presentation by Mike Schultz describes a static rank framework for Lucene/Solr. It contrasts query-dependent dynamic (TF * IDF) scores with query-independent static scores, shows how to combine the two so that static rank boosts or demotes a document by at most a fixed percentage, and builds the static score from ValueSource components that map field values (dates, Booleans, enums) to "years ago" and then, through a sigmoid, to a value between -1.0 and 1.0.

  • Lucene
  • Solr
  • Dynamic Ranking
  • Search Frameworks
  • Information Retrieval

Uploaded on Sep 12, 2024 | 0 Views



