k-Ary Search on Modern Processors

undefined
 
k
-Ary Search on Modern
Processors
 
Fakultät Informatik
, Institut Systemarchitektur, Professur Datenbanken
 
Benjami
n Schlegel,
Rainer Gemulla,
Wolfgang Lehner
 
Sigmod/DaMon 2009
 
Searching
Is a fundamental operation in
computer science
Many application areas
(e.g. databases, …)
Demand for high performance
 
Different Types of Searching
Point queries
Nearest-neighbor key queries
Range queries
6/28/2009
Motivation
k-Ary Search on Modern Processors
Slide 2
 
Linear search
Linear complexity
Useful for small datasets
 
Hash-based search
Constant (best) complexity
Additional space requirements
Bad performance for range/nearest-key queries
 
Tree-based search
Logarithmic complexity
Good update properties
 
Sort-based search
Logarithmic complexity
No additional space requirements
Cache-conscious range scans
6/28/2009
k-Ary Search on Modern Processors
Slide 3
Motivation - Search algorithms
 
1.
Motivation
 
2.
Prerequisites
Binary search
SIMDized Binary Search
 
3.
k-Ary Search
Sorted Array
Linearized 
k
-ary Search Tree
 
4.
Experiments
 
5.
Conclusion
 
6/28/2009
 
k-Ary Search on Modern Processors
 
Slide 4
 
Agenda
 
Idea
Divide the search space in each iteration into two partitions
Reduce the search space to one of the partitions
Example
Successful search for key 4
 
 
Analysis
Worst-case iterations = 
log
2
(N+1)
 
 
 
Problems
No usage of modern CPU instruction sets
 
 
 
 
6/28/2009
k-Ary Search on Modern Processors
Slide 5
Binary search on a sorted array
Hit
 
Provide SIMD instructions
Current: SSE, SSE2, SSE3, SSSE3, SSE4A, SSE4.2, Altivec, …
Future: AVX, SSE5, …
(k-1) parallel operation executions for one instruction
Execution time similar to scalar operations
For free with current processors
 
SIMD instructions applicable for searching
Loading
Element-wise (e.g., compare two vectors)
Horizontal (e.g., horizontal sum of a vector)
6/28/2009
k-Ary Search on Modern Processors
Slide 6
Modern Processors
 
Idea
Same like plain binary search
But: compares (k-1) subsequent keys per iteration
Example
Successful search for key 4
 
 
Analysis
Worst-case iterations = 
log
2
(N/(k-1)+1)
 
log
2
(N+1)
 - 
log
2
(k-1)
 
 
 
 
Advantage
Constant reduction of log
2
(k-1) iterations
6/28/2009
k-Ary Search on Modern Processors
Slide 7
SIMDized binary search on a sorted array
 
Hit
 
1.
Motivation
 
2.
Prerequisites
Binary search
SIMDized Binary Search
 
3.
k-Ary Search
Sorted Array
Linearized 
k
-ary Search Tree
 
4.
Experiments
 
5.
Conclusion
 
6/28/2009
 
k-Ary Search on Modern Processors
 
Slide 8
 
Agenda
 
Idea
Divide the search space into 
k
 partitions using 
k-1
 separators
Choose a partition depending of the comparison of the separators
Example
Successful search for key 4
 
 
Analysis
Worst-case iterations = 
 
log
k
(N+1)
 
 Speedup = log
2
(k)
6/28/2009
k-Ary Search on Modern Processors
Slide 9
k
-ary search on a sorted array
 
Hit
 
Comparison step – choosing the next partition
Should require constant time independently of 
k
Use horizontal vector sum of element-wise comparison result
Example: search for key 77 (k=5):
 
 
 
 
 
 
 
Loading step
k-1 accesses to discontinuous memory locations in each iteration
Solution 1: Scatter&Gather instructions (Larrabee, GPU, …)
Solution 2: Searching on an alternative data layout
 
6/28/2009
k-Ary Search on Modern Processors
Slide 10
k
-ary search on a sorted array (cont.)
 
Idea
Construct conceptually 
k
-ary search tree
Store node of 
k
-ary search tree inorder in continuous array
Example
Mapping a sorted array on a linearized 
k
-ary search tree
 
 
 
 
 
Successful search for key 4
 
 
 
Disadvantage: Range scans become expensive
 
 
6/28/2009
k-Ary Search on Modern Processors
Slide 11
k
-ary search on a alternative data layout
 
Hit
 
1.
Motivation
 
2.
Prerequisites
Binary search
SIMDized Binary Search
 
3.
k-Ary Search
Sorted Array
Linearized 
k
-ary Search Tree
 
4.
Experiments
 
5.
Conclusion
 
6/28/2009
 
k-Ary Search on Modern Processors
 
Slide 12
 
Agenda
Setup
Multiple runs with uniform distributed keys between 2
7
 and 2
25
Each run: search for 4096 uniform distributed keys
Different key sizes: 4-byte integer and double precision floating
point
Algorithms
Binary search (Bin)
SIMDized binary search (Bin4)
k
-ary search on a sorted array (
k
-ary)
k
-ary search on a linearized 
k
-ary search tree (
k
-ary-lt)
Different platforms
IBM Cell PPE (PowerPC-970 compatible)
IBM Cell SPE (local store only / main memory)
Intel Nehalem 920 – 2.66 GHz
AMD Phenom 920 – 2.8 GHz
6/28/2009
k-Ary Search on Modern Processors
Slide 13
Experimental setup
 
Setup
Multiple runs with 
uniform distributed keys between 2
7
 and 2
25
Each
 run: 
search for 4096 uniform distributed keys
Different key sizes: 4-byte
 
integer and double precision floating
point
Algorithms
Binary search (Bin)
SIMDized binary search (Bin4)
k
-ary search on a sorted array (
k
-ary)
k
-ary search on a linearized 
k
-ary search tree (
k
-ary-lt)
Different platforms
IBM Cell PPE (PowerPC-970 compatible)
IBM Cell SPE 
(local store only / 
main memory
)
Intel Nehalem 920 – 2.66 GHz
AMD Phenom 920 – 2.8 GHz
Outcomes
Binary search performs worst
k
-ary search on a linearized 
k
-ary search tree performs best
Occurring spikes result from non-aligned DMA-accesses
6/28/2009
k-Ary Search on Modern Processors
Slide 14
Results: Cell BE SPE
 
1.5x
 
2.5x
 
1.3x
 
1.6x
32-bit Integer
64-bit Floating Point
Outcomes
Binary search and SIMDized binary search perform similar
Both 
k
-ary search algorithms scale well for a huge number of keys
6/28/2009
k-Ary Search on Modern Processors
Slide 15
Results: Intel Core i7
 
1.7x
 
4x
 
1.5x
 
2.5x
32-bit Integer
64-bit Floating Point
 
1.
Motivation
 
2.
Prerequisites
Binary search
SIMDized Binary Search
 
3.
k-Ary Search
Sorted Array
Linearized 
k
-ary Search Tree
 
4.
Experiments
 
5.
Conclusion
 
6/28/2009
 
k-Ary Search on Modern Processors
 
Slide 16
 
Agenda
Current Situation
Sort-based techniques are important
Modern CPUs provide SIMD instructions
Our Contribution
k
-ary search on Modern Processors
Expected Future Scaling
6/28/2009
k-Ary Search on Modern Processors
Slide 17
Conclusion
undefined
 
k
-Ary Search on Modern
Processors
 
Fakultät Informatik
, Institut Systemarchitektur, Professur Datenbanken
 
Benjami
n Schlegel,
Rainer Gemulla,
Wolfgang Lehner
 
Sigmod/DaMon 2009
Slide Note
Embed
Share

The presentation discusses the importance of searching operations in computer science, focusing on different types of searches such as point queries, nearest-neighbor key queries, and range queries. It explores search algorithms including linear search, hash-based search, tree-based search, and sort-based search, highlighting their complexities and performance characteristics. The agenda covers motivations, prerequisites, k-ary search strategies, experiments, and conclusions related to improving search operations on modern processors.

  • Search Operations
  • Computer Science
  • Algorithms
  • Modern Processors

Uploaded on Sep 12, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Fakultt Informatik, Institut Systemarchitektur, Professur Datenbanken k-Ary Search on Modern Processors Sigmod/DaMon 2009 Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner

  2. Motivation Searching Is a fundamental operation in computer science Many application areas (e.g. databases, ) Demand for high performance Different Types of Searching Point queries Nearest-neighbor key queries Range queries 6/28/2009 k-Ary Search on Modern Processors Slide 2

  3. Motivation - Search algorithms Linear search Linear complexity Useful for small datasets Hash-based search Constant (best) complexity Additional space requirements Bad performance for range/nearest-key queries h(x) h(x) Tree-based search Logarithmic complexity Good update properties Sort-based search Logarithmic complexity No additional space requirements Cache-conscious range scans 6/28/2009 k-Ary Search on Modern Processors Slide 3

  4. Agenda 1. Motivation 2. Prerequisites Binary search SIMDized Binary Search 3. k-Ary Search Sorted Array Linearized k-ary Search Tree 4. Experiments 5. Conclusion 6/28/2009 k-Ary Search on Modern Processors Slide 4

  5. Binary search on a sorted array Idea Divide the search space in each iteration into two partitions Reduce the search space to one of the partitions Example Successful search for key 4 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 7 7 7 7 7 7 7 7 7 7 7 7 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 15 15 15 15 15 15 15 15 15 15 15 15 18 18 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 24 24 24 24 24 24 24 24 24 24 24 24 29 29 29 29 29 29 29 29 29 29 29 29 35 35 35 35 35 35 35 35 35 35 35 35 46 46 46 46 46 46 46 46 46 46 46 46 48 48 48 48 48 48 48 48 48 48 48 48 55 55 55 55 55 55 55 55 55 55 55 55 59 59 59 59 59 59 59 59 59 59 59 59 60 60 60 60 60 60 60 60 60 60 60 60 67 67 67 67 67 67 67 67 67 67 67 67 73 73 73 73 73 73 73 73 73 73 73 73 75 75 75 75 75 75 75 75 75 75 75 75 77 77 77 77 77 77 77 77 77 77 77 77 83 83 83 83 83 83 83 83 83 83 83 83 88 88 88 88 88 88 88 88 88 88 88 88 92 92 92 92 92 92 92 92 92 92 92 92 93 93 93 93 93 93 93 93 93 93 93 93 97 97 97 97 97 97 97 97 97 97 97 97 99 99 99 99 99 99 99 99 99 99 99 99 <1 <1 Hit Hit <7 <7 >1 >1 >7 >7 <15 <15 >15 >15 <48 <48 >48 >48 Analysis Worst-case iterations = log2(N+1) Number of keys Binary search 210 11 215 16 220 21 Problems No usage of modern CPU instruction sets 6/28/2009 k-Ary Search on Modern Processors Slide 5

  6. Modern Processors Provide SIMD instructions Current: SSE, SSE2, SSE3, SSSE3, SSE4A, SSE4.2, Altivec, Future: AVX, SSE5, (k-1) parallel operation executions for one instruction Execution time similar to scalar operations For free with current processors SIMD instructions applicable for searching Loading Element-wise (e.g., compare two vectors) Horizontal (e.g., horizontal sum of a vector) 6/28/2009 k-Ary Search on Modern Processors Slide 6

  7. SIMDized binary search on a sorted array Idea Same like plain binary search But: compares (k-1) subsequent keys per iteration Example Successful search for key 4 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 7 7 7 7 7 7 7 7 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 15 15 15 15 15 15 15 15 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 24 24 24 24 24 24 24 24 29 29 29 29 29 29 29 29 35 35 35 35 35 35 35 35 46 46 46 46 46 46 46 46 48 48 48 48 48 48 48 48 55 55 55 55 55 55 55 55 59 59 59 59 59 59 59 59 60 60 60 60 60 60 60 60 67 67 67 67 67 67 67 67 73 73 73 73 73 73 73 73 75 75 75 75 75 75 75 75 77 77 77 77 77 77 77 77 83 83 83 83 83 83 83 83 88 88 88 88 88 88 88 88 92 92 92 92 92 92 92 92 93 93 93 93 93 93 93 93 97 97 97 97 97 97 97 97 99 99 99 99 99 99 99 99 Hit Hit <15 <15 >18 >18 <48 <48 >55 >55 Analysis Worst-case iterations = log2(N/(k-1)+1) log2(N+1) - log2(k-1) Number of keys Binary search SIMDized binary search (k=5) 210 11 9 215 16 14 220 21 19 Advantage Constant reduction of log2(k-1) iterations 6/28/2009 k-Ary Search on Modern Processors Slide 7

  8. Agenda 1. Motivation 2. Prerequisites Binary search SIMDized Binary Search 3. k-Ary Search Sorted Array Linearized k-ary Search Tree 4. Experiments 5. Conclusion 6/28/2009 k-Ary Search on Modern Processors Slide 8

  9. k-ary search on a sorted array Idea Divide the search space into k partitions using k-1 separators Choose a partition depending of the comparison of the separators Example Successful search for key 4 7 7 7 7 7 7 7 7 35 35 35 35 35 35 35 35 88 88 88 88 88 88 88 88 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 15 15 15 15 15 15 15 15 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 24 24 24 24 24 24 24 24 29 29 29 29 29 29 29 29 46 46 46 46 46 46 46 46 48 48 48 48 48 48 48 48 55 55 55 55 55 55 55 55 59 59 59 59 59 59 59 59 60 60 60 60 60 60 60 60 67 67 67 67 67 67 67 67 73 73 73 73 73 73 73 73 75 75 75 75 75 75 75 75 77 77 77 77 77 77 77 77 83 83 83 83 83 83 83 83 92 92 92 92 92 92 92 92 93 93 93 93 93 93 93 93 97 97 97 97 97 97 97 97 99 99 99 99 99 99 99 99 Hit Hit <7 <7 >7 >7 <15 <15 >15 >15 <24 <24 >24 >24 <73 <73 >73 >73 Analysis Worst-case iterations = logk(N+1) Speedup = log2(k) Number of keys Binary search SIMDized binary search (k=5) k-ary search (k=5) 210 11 9 5 215 16 14 7 220 21 19 9 6/28/2009 k-Ary Search on Modern Processors Slide 9

  10. k-ary search on a sorted array (cont.) Comparison step choosing the next partition Should require constant time independently of k Use horizontal vector sum of element-wise comparison result Example: search for key 77 (k=5): Partition 0 Partition 0 Partition 1 Partition 1 Partition 2 Partition 2 Partition 3 Partition 3 Partition 4 Partition 4 1 1 4 4 7 7 10 10 11 11 15 15 18 18 19 19 24 24 29 29 35 35 46 46 48 48 55 55 59 59 60 60 67 67 73 73 75 75 77 77 83 83 88 88 92 92 93 93 97 97 99 99 Separators Separators: : 15 15 35 35 60 60 83 83 Replicated search key: Replicated search key: 77 77 77 77 77 77 77 77 Lower Lower- -than result: result: than comparison comparison - -1 1 - -1 1 - -1 1 0 0 - -3 3 Vector sum: Vector sum: Loading step k-1 accesses to discontinuous memory locations in each iteration Solution 1: Scatter&Gather instructions (Larrabee, GPU, ) Solution 2: Searching on an alternative data layout 6/28/2009 k-Ary Search on Modern Processors Slide 10

  11. k-ary search on a alternative data layout Idea Construct conceptually k-ary search tree Store node of k-ary search tree inorder in continuous array Example Mapping a sorted array on a linearized k-ary search tree 7 7 7 7 7 7 7 7 35 35 35 35 35 35 35 35 88 88 88 88 88 88 88 88 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 15 15 15 15 15 15 15 15 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 24 24 24 24 24 24 24 24 29 29 29 29 29 29 29 29 46 46 46 46 46 46 46 46 48 48 48 48 48 48 48 48 55 55 55 55 55 55 55 55 59 59 59 59 59 59 59 59 60 60 60 60 60 60 60 60 67 67 67 67 67 67 67 67 73 73 73 73 73 73 73 73 75 75 75 75 75 75 75 75 77 77 77 77 77 77 77 77 83 83 83 83 83 83 83 83 92 92 92 92 92 92 92 92 93 93 93 93 93 93 93 93 97 97 97 97 97 97 97 97 99 99 99 99 99 99 99 99 7 7 7 7 15 15 15 15 46 46 46 46 59 59 59 59 83 83 83 83 93 93 93 93 24 24 24 24 24 24 73 73 73 73 73 73 1 1 4 4 10 10 11 11 18 18 19 19 29 29 35 35 48 48 55 55 60 60 67 67 75 75 77 77 88 88 92 92 97 97 99 99 Successful search for key 4 24 24 24 24 24 24 24 24 73 73 73 73 73 73 73 73 7 7 7 7 7 7 7 7 15 15 15 15 15 15 15 15 46 46 46 46 46 46 46 46 59 59 59 59 59 59 59 59 83 83 83 83 83 83 83 83 93 93 93 93 93 93 93 93 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 29 29 29 29 29 29 29 29 35 35 35 35 35 35 35 35 48 48 48 48 48 48 48 48 55 55 55 55 55 55 55 55 60 60 60 60 60 60 60 60 67 67 67 67 67 67 67 67 75 75 75 75 75 75 75 75 77 77 77 77 77 77 77 77 88 88 88 88 88 88 88 88 92 92 92 92 92 92 92 92 97 97 97 97 97 97 97 97 99 99 99 99 99 99 99 99 Hit Hit <24 <24 >24 >24 <73 <73 >73 >73 <7 <7 >7 >7 <15 <15 >15 >15 Disadvantage: Range scans become expensive 6/28/2009 k-Ary Search on Modern Processors Slide 11

  12. Agenda 1. Motivation 2. Prerequisites Binary search SIMDized Binary Search 3. k-Ary Search Sorted Array Linearized k-ary Search Tree 4. Experiments 5. Conclusion 6/28/2009 k-Ary Search on Modern Processors Slide 12

  13. Experimental setup Setup Multiple runs with uniform distributed keys between 27and 225 Each run: search for 4096 uniform distributed keys Different key sizes: 4-byte integer and double precision floating point Algorithms Binary search (Bin) SIMDized binary search (Bin4) k-ary search on a sorted array (k-ary) k-ary search on a linearized k-ary search tree (k-ary-lt) Different platforms IBM Cell PPE (PowerPC-970 compatible) IBM Cell SPE (local store only / main memory) Intel Nehalem 920 2.66 GHz AMD Phenom 920 2.8 GHz AMD Phenom 920 2.8 GHz Setup Multiple runs with uniform distributed keys between 27and 225 Each run: search for 4096 uniform distributed keys Different key sizes: 4-byte integer and double precision floating point Algorithms Binary search (Bin) SIMDized binary search (Bin4) k-ary search on a sorted array (k-ary) k-ary search on a linearized k-ary search tree (k-ary-lt) Different platforms IBM Cell PPE (PowerPC-970 compatible) IBM Cell SPE (local store only / main memory) Intel Nehalem 920 2.66 GHz 6/28/2009 k-Ary Search on Modern Processors Slide 13

  14. Results: Cell BE SPE 64 64- -bit Floating Point bit Floating Point 32 32- -bit Integer bit Integer 1.6x 1.6x 1.3x 1.3x 2.5x 2.5x 1.5x 1.5x Outcomes Binary search performs worst k-ary search on a linearized k-ary search tree performs best Occurring spikes result from non-aligned DMA-accesses 6/28/2009 k-Ary Search on Modern Processors Slide 14

  15. Results: Intel Core i7 64 64- -bit Floating Point bit Floating Point 32 32- -bit Integer bit Integer 1.5x 1.5x 1.7x 1.7x 2.5x 2.5x 4x 4x Outcomes Binary search and SIMDized binary search perform similar Both k-ary search algorithms scale well for a huge number of keys 6/28/2009 k-Ary Search on Modern Processors Slide 15

  16. Agenda 1. Motivation 2. Prerequisites Binary search SIMDized Binary Search 3. k-Ary Search Sorted Array Linearized k-ary Search Tree 4. Experiments 5. Conclusion 6/28/2009 k-Ary Search on Modern Processors Slide 16

  17. Conclusion Current Situation Sort-based techniques are important Modern CPUs provide SIMD instructions Our Contribution k-ary search on Modern Processors Expected Future Scaling Number of keys Binary search SIMDized binary search (k=5) k-ary search (k=5) - SSE k-ary search (k=9) - AVX k-ary search (k=17) - Larrabee 210 11 9 5 4 3 215 16 14 7 5 4 220 21 19 9 7 5 6/28/2009 k-Ary Search on Modern Processors Slide 17

  18. Fakultt Informatik, Institut Systemarchitektur, Professur Datenbanken k-Ary Search on Modern Processors Sigmod/DaMon 2009 Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#