Enhancing Web Search Latency with DDS Prediction

 
Slide 1: Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search
Saehoon Kim§, Yuxiong He*, Seung-won Hwang§, Sameh Elnikety*, Seungjin Choi§
Slide 2: Web Search Engine Requirement
Queries require both high quality and low latency.
This talk focuses on how to achieve low latency without compromising quality.
Slide 3: Low Latency for All Users
Reduce tail latency (high-percentile response time); reducing average latency is not sufficient.
Commercial search engines reduce the 99th-percentile latency.
Slide 4: Reducing End-to-End Latency
An aggregator fans each query out to 40 Index Server Nodes (ISNs) and waits for all of them.
Targets: 99th-percentile response time at the aggregator < 120 ms, and 99.99th-percentile response time at each ISN < 120 ms.
A single long(-running) query at any ISN can delay the end-to-end response.
Slide 5: Reducing Tail Latency by Parallelization
Opportunity for parallelization:
1. Available idle cores
2. CPU-intensive workloads
Resource latency breakdown: Network 4.26 ms, Queueing 0.15 ms, I/O 4.70 ms, CPU 194.95 ms (CPU dominates).
Slide 6: Challenges of Exploiting Parallelism
Parallelizing all queries: inefficient under medium to high load.
Parallelizing short queries: no speed-up.
Parallelizing long queries: good speed-up.
Conclusion: parallelize only the long(-running) queries.
Slide 7: Prior Work - PREDictive Parallelization
Predict the query execution time before execution: feature extraction feeds a regression function (the prediction model), which labels the query long or short.
Parallelize only the predicted-long queries; execute the predicted-short queries sequentially.
[Predictive Parallelization: Taming Tail Latencies in Web Search, M. Jeon et al., SIGIR '14]
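To make the PRED decision concrete, here is a minimal Python sketch. The 100 ms cut-off and all names are illustrative assumptions; only the 4-way parallelism degree comes from the talk's simulation setup later on.

```python
from dataclasses import dataclass

@dataclass
class Query:
    terms: list           # query terms
    predicted_ms: float   # output of the execution-time regressor over static features

def pred_schedule(query: Query, long_threshold_ms: float = 100.0) -> str:
    """PRED-style decision: made once, before execution, from static features only.
    The 100 ms cut-off is a placeholder; 4-way is the parallelism degree quoted
    later in the talk's simulation setup."""
    if query.predicted_ms >= long_threshold_ms:
        return "parallel (4-way)"   # predicted long -> parallelize
    return "sequential"             # predicted short -> run on one core

print(pred_schedule(Query(["wsdm", "2015"], predicted_ms=180.0)))  # parallel (4-way)
print(pred_schedule(Query(["weather"], predicted_ms=2.5)))         # sequential
```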
Slide 8: Requirements
Keep the 99th-percentile tail latency at the aggregator <= 120 ms, which in turn requires reducing the 99.99th-percentile tail latency at each ISN to <= 120 ms.
Predictor requirements and reasons:
Recall >= 98.9% (miss at most 1.1% of long queries), to optimize the 99.99th-percentile tail.
Precision should be high, so that fewer queries are parallelized.
PRED cannot effectively reduce the 99.99th-percentile tail latency.
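A rough back-of-the-envelope check (my own arithmetic, not a figure from the slides) of why the aggregator's 99th percentile pushes the per-ISN requirement out to roughly the 99.99th percentile:

```python
# The aggregator waits for all 40 ISNs. If ISN latencies were independent,
# P(every ISN replies within 120 ms) = p ** 40, so meeting the 99th percentile
# end to end needs p = 0.99 ** (1 / 40) at each ISN.
# (Independence is an assumption; real ISNs are correlated, but the direction
# of the argument, "the per-ISN tail must be far stricter", is the same.)
num_isns = 40
p_per_isn = 0.99 ** (1 / num_isns)
print(f"required per-ISN percentile ~ {100 * p_per_isn:.3f}th")  # ~99.975th
```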
 
Slide 9: Contributions
1. Proposes DDS (Delayed-Dynamic-Selective) prediction to achieve very high recall and good precision.
2. Uses DDS prediction to effectively reduce extreme tail latency.
Slide 10: Overview of DDS
A query first runs under delayed prediction: queries that finish within 10 ms are done (short).
Queries still running after 10 ms go through dynamic prediction: a predictor for execution time, built on features collected at runtime, labels them long or short.
Selective prediction then applies a predictor for the confidence level: predictions that are not confident are treated as long.
Long and not-confident queries are parallelized; confidently short queries continue sequentially.
Slide 11: Delayed Prediction
1) Complete many short queries sequentially during the initial 10 ms.
2) Collect dynamic features from the queries that are still running.
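A minimal sketch of how that delayed step could be wired, assuming a hypothetical per-document processing callback; the 10 ms budget is the value quoted in the talk, everything else is illustrative:

```python
import time

def run_with_delay(process_next_doc, budget_s: float = 0.010):
    """Run the query sequentially for up to `budget_s` seconds.
    Returns ('finished', stats) if the query completes within the budget
    (a short query, done with no prediction at all), otherwise
    ('needs_prediction', stats) where `stats` holds the partial-execution
    counters that become the dynamic features."""
    start = time.monotonic()
    docs_scored = 0
    while time.monotonic() - start < budget_s:
        if not process_next_doc():                    # no more docs: query finished
            return "finished", {"docs_scored": docs_scored}
        docs_scored += 1
    return "needs_prediction", {"docs_scored": docs_scored}

# Example: a query whose document stream ends immediately finishes within the delay.
print(run_with_delay(lambda: False))  # ('finished', {'docs_scored': 0})
```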
 
Slide 12: Dynamic Features
What are dynamic features? Features that can only be collected at runtime.
Two categories:
NumEstMatchDocs: estimates the total number of matched documents.
DynScores: used to predict early termination.
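One plausible way to form these two feature groups from the first 10 ms of execution (my own illustration; the talk does not give the exact feature definitions):

```python
def dynamic_features(matched_so_far, docs_scanned, index_size, dyn_scores_so_far):
    """Illustrative dynamic features gathered during the 10 ms delay.
    NumEstMatchDocs: extrapolate total matches from the match rate in the
    portion of the index scanned so far.
    DynScores: summary statistics of the dynamic (query-dependent) scores
    seen so far, which hint at whether early termination will trigger."""
    est_match_docs = matched_so_far * index_size / max(docs_scanned, 1)
    top = sorted(dyn_scores_so_far, reverse=True)[:10]
    return {
        "NumEstMatchDocs": est_match_docs,
        "DynScoreMax": top[0] if top else float("-inf"),
        "DynScoreTop10Min": top[-1] if top else float("-inf"),
    }

# 3,000 matches in the first 100,000 of 10M documents -> ~300,000 estimated matches.
print(dynamic_features(3_000, 100_000, 10_000_000, [-4.23, -4.11, -4.01])["NumEstMatchDocs"])
```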
Slide 13: Primary Factors for Execution Time
1. The number of total matched documents.
[Diagram: web documents sorted by static score from highest to lowest; the inverted indexes for "WSDM" and "2015" are processed against them.]
Slide 14: Primary Factors for Execution Time (continued)
2. Early termination.
[Same diagram: the tail of the static-score ordering is marked "not evaluated"; when processing terminates early, the low-static-score documents are never scored.]
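To see why factor 1 drives cost, here is a toy illustration of the underlying work: intersecting the query terms' posting lists (the document IDs below are made up):

```python
# Toy posting lists: sorted document IDs per term. Real ISNs hold millions of docs,
# so the size of this intersection (the matched-document count) dominates CPU time.
postings = {
    "wsdm": [1, 4, 7, 9, 12, 15],
    "2015": [2, 4, 5, 9, 11, 15],
}

def matched_docs(terms, postings):
    """Intersect the posting lists of all query terms."""
    result = set(postings[terms[0]])
    for term in terms[1:]:
        result &= set(postings[term])
    return sorted(result)

print(matched_docs(["wsdm", "2015"], postings))  # [4, 9, 15]
```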
Slide 15: Early Termination
Documents are scored in static-score order while a running top-3 result set (with each candidate's dynamic score) is maintained.
If the minimum dynamic score among the current top results exceeds a threshold, processing stops.
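A small sketch of that stopping rule (the scores, threshold, and k are invented for illustration; in a real engine the threshold bounds what the remaining, lower-static-score documents could still contribute):

```python
import heapq

def topk_with_early_termination(doc_scores, k=3, stop_threshold=-4.05):
    """Scan (doc_id, dynamic_score) pairs in static-score order, keeping the k
    best dynamic scores in a min-heap. Stop as soon as the worst kept score
    (the heap minimum) exceeds `stop_threshold`, i.e. the top-k is unlikely
    to change any further."""
    heap = []  # min-heap over (dynamic_score, doc_id)
    for doc_id, dyn_score in doc_scores:
        if len(heap) < k:
            heapq.heappush(heap, (dyn_score, doc_id))
        elif dyn_score > heap[0][0]:
            heapq.heapreplace(heap, (dyn_score, doc_id))
        if len(heap) == k and heap[0][0] > stop_threshold:
            break  # early termination
    return sorted(heap, reverse=True)

docs = [(5, -4.23), (1, -4.11), (8, -4.10), (3, -4.01), (7, -4.00), (2, -3.90)]
print(topk_with_early_termination(docs))
```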
 
Slide 16: Importance of Dynamic Features
Top-10 feature importance, measured with a boosted regression tree.
NumEstMatchDocs helps predict the total number of matched documents.
DynScores helps predict early termination.
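As a hedged illustration of how such an importance ranking is typically produced, here is a scikit-learn gradient-boosted-tree sketch on synthetic data (the talk does not name its implementation; the feature names and the data-generating process below are invented):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2_000
num_est_match_docs = rng.lognormal(mean=8.0, sigma=1.5, size=n)   # dynamic feature
dyn_score_top_min = rng.normal(loc=-4.0, scale=0.2, size=n)       # dynamic feature
idf = rng.normal(loc=5.0, scale=1.0, size=n)                      # static feature, for contrast
X = np.column_stack([num_est_match_docs, dyn_score_top_min, idf])
# Fake execution time, driven mostly by the matched-document estimate.
y = 5e-4 * num_est_match_docs + 5.0 * np.abs(dyn_score_top_min + 4.0) + rng.normal(0, 1, n)

model = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)
for name, imp in zip(["NumEstMatchDocs", "DynScoreTopMin", "IDF"], model.feature_importances_):
    print(f"{name:16s} importance = {imp:.2f}")
```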
 
Slide 17: Selective Prediction
Goal: find almost all long queries (very high recall) while keeping good precision.
The hard cases are the outliers: long queries whose predicted execution time makes them look short.
Slide 18: Selective Prediction (continued)
[Figure: distribution of predicted execution time for long vs. short queries; prediction error places some long queries below the long/short cut-off, and these outliers are what selective prediction must catch.]
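The slides do not spell out how the confidence predictor is built. One plausible realization, purely as a sketch, is a second model trained to flag queries whose execution-time prediction is likely an underestimate, so they are parallelized even though the first model calls them short. The 100 ms "long" cut-off here is my placeholder, and X and exec_ms are assumed to be NumPy arrays:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def train_dds_predictors(X, exec_ms, long_ms=100.0):
    """Hypothetical two-stage scheme. Stage 1 regresses execution time; stage 2
    classifies 'not confident' cases (actually long but predicted short).
    In practice the stage-1 predictions used as stage-2 labels should be
    out-of-fold; that is omitted here for brevity."""
    reg = GradientBoostingRegressor().fit(X, exec_ms)
    outlier = (exec_ms >= long_ms) & (reg.predict(X) < long_ms)
    clf = GradientBoostingClassifier().fit(X, outlier.astype(int))
    return reg, clf

def dds_decision(reg, clf, x, long_ms=100.0):
    """Parallelize if predicted long OR the prediction is not confident."""
    predicted_long = reg.predict([x])[0] >= long_ms
    not_confident = clf.predict([x])[0] == 1
    return "parallel (4-way)" if predicted_long or not_confident else "sequential"
```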
 
Slide 19: Overview of DDS (revisited)
[Same pipeline diagram as Slide 10: delayed prediction, then dynamic prediction of execution time, then selective prediction of confidence; long and not-confident queries are parallelized.]
Slide 20: Evaluation of Predictor Accuracy (1/3)
Baseline (PRED): static features only, with no delayed prediction (IDF, static scores such as PageRank, etc.).
Proposed method (DDS): dynamic (plus static) features, with delayed and selective prediction.
Slide 21: Evaluation of Predictor Accuracy (2/3)
69,010 Bing queries from a production workload: 14,565 queries take >= 10 ms and 635 take >= 100 ms.
Boosted regression tree, evaluated with 10-fold cross-validation.
PRED is trained on all 69,010 queries; DDS is trained on the 14,565 queries that run longer than 10 ms.
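A sketch of how long-query precision and recall would be computed under such a cross-validation setup (illustrative only; the 100 ms cut-off for "long" is my assumption, chosen because the counts above single out queries >= 100 ms):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import KFold

def long_query_precision_recall(X, exec_ms, long_ms=100.0, folds=10):
    """Regress execution time with 10-fold CV, then score the induced
    long/short classification. X is an (n_queries, n_features) NumPy array
    and exec_ms holds the measured execution times."""
    y_true, y_pred = [], []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True, random_state=0).split(X):
        model = GradientBoostingRegressor().fit(X[train_idx], exec_ms[train_idx])
        y_true.extend(exec_ms[test_idx] >= long_ms)
        y_pred.extend(model.predict(X[test_idx]) >= long_ms)
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)
```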
 
Slides 22-24: Evaluation of Predictor Accuracy (3/3)
[Chart, built up over three slides: predictor accuracy of DDS vs. PRED, annotated with the separate contributions of delayed prediction, dynamic features, and selective prediction.]
Overall: 957% improvement over PRED.
Slide 25: Simulation Results on Tail Latency Reduction
Baseline (PRED): predict the query execution time before running it; parallelize predicted-long queries with 4-way parallelism.
Proposed method (DDS): run every query sequentially for 10 ms, then parallelize the long or unpredictable (not-confident) queries with 4-way parallelism.
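To make the tail-latency metrics on the following slides concrete, this is how the reported percentiles are typically computed from per-query response times (the numbers below are made up and are not the talk's simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up per-query response times: many short queries plus a heavy tail.
latencies_ms = np.concatenate([
    rng.exponential(scale=5.0, size=99_000),    # short queries
    rng.exponential(scale=150.0, size=1_000),   # long-running tail
])

for q in (50, 99, 99.99):
    print(f"{q}th percentile: {np.percentile(latencies_ms, q):.1f} ms")
# The mean can look healthy while the 99th/99.99th percentiles blow past 120 ms;
# it is the latter that the aggregator/ISN targets constrain.
```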
 
Slides 26-28: ISN Response Time
[Charts: ISN tail response time under the simulated workload, PRED vs. DDS.]
Result: 70% throughput increase.
Slide 29: Aggregator Response Time
DDS can optimize the 99th-percentile tail latency at the aggregator under high QPS.
Slide 30: Conclusion
Proposes a novel prediction framework: delayed prediction, dynamic features, and selective prediction.
Achieves high precision and recall compared to PRED.
Reduces the 99th-percentile aggregator response time to <= 120 ms under high load.
