Enhancing Web Search Latency with DDS Prediction

 
Slide 1: Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search
Saehoon Kim§, Yuxiong He*, Seung-won Hwang§, Sameh Elnikety*, Seungjin Choi§
Slide 2: Web Search Engine Requirement
Queries require both high quality and low latency.
This talk focuses on how to achieve low latency without compromising quality.
Slide 3: Low Latency for All Users
Reduce tail latency (high-percentile response time); reducing average latency is not sufficient.
Commercial search engines reduce the 99th-percentile latency.
Slide 4: Reducing End-to-End Latency
An aggregator fans each query out to 40 Index Server Nodes (ISNs) and waits for all of them.
Targets: 99th-percentile response time at the aggregator < 120 ms, and 99.99th-percentile response time at each ISN < 120 ms.
A single long(-running) query at any ISN can delay the end-to-end response.
Slide 5: Reducing Tail Latency by Parallelization
Opportunity for parallelization:
1. Available idle cores
2. CPU-intensive workloads
Resource latency breakdown: Network 4.26 ms, Queueing 0.15 ms, I/O 4.70 ms, CPU 194.95 ms (CPU dominates).
Slide 6: Challenges of Exploiting Parallelism
Parallelizing all queries: inefficient under medium to high load.
Parallelizing short queries: no speed-up.
Parallelizing long queries: good speed-up.
Conclusion: parallelize only the long(-running) queries.
Slide 7: Prior Work - PREDictive Parallelization
Predict the query execution time before execution: feature extraction feeds a regression function (the prediction model), which labels the query long or short.
Parallelize only the predicted-long queries; execute the predicted-short queries sequentially.
[Predictive Parallelization: Taming Tail Latencies in Web Search, M. Jeon et al., SIGIR '14]
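To make the PRED decision concrete, here is a minimal Python sketch. The 100 ms cut-off and all names are illustrative assumptions; only the 4-way parallelism degree comes from the talk's simulation setup later on.

```python
from dataclasses import dataclass

@dataclass
class Query:
    terms: list           # query terms
    predicted_ms: float   # output of the execution-time regressor over static features

def pred_schedule(query: Query, long_threshold_ms: float = 100.0) -> str:
    """PRED-style decision: made once, before execution, from static features only.
    The 100 ms cut-off is a placeholder; 4-way is the parallelism degree quoted
    later in the talk's simulation setup."""
    if query.predicted_ms >= long_threshold_ms:
        return "parallel (4-way)"   # predicted long -> parallelize
    return "sequential"             # predicted short -> run on one core

print(pred_schedule(Query(["wsdm", "2015"], predicted_ms=180.0)))  # parallel (4-way)
print(pred_schedule(Query(["weather"], predicted_ms=2.5)))         # sequential
```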
Slide 8: Requirements
Keep the 99th-percentile tail latency at the aggregator <= 120 ms, which in turn requires reducing the 99.99th-percentile tail latency at each ISN to <= 120 ms.
Predictor requirements and reasons:
Recall >= 98.9% (miss at most 1.1% of long queries), to optimize the 99.99th-percentile tail.
Precision should be high, so that fewer queries are parallelized.
PRED cannot effectively reduce the 99.99th-percentile tail latency.
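A rough back-of-the-envelope check (my own arithmetic, not a figure from the slides) of why the aggregator's 99th percentile pushes the per-ISN requirement out to roughly the 99.99th percentile:

```python
# The aggregator waits for all 40 ISNs. If ISN latencies were independent,
# P(every ISN replies within 120 ms) = p ** 40, so meeting the 99th percentile
# end to end needs p = 0.99 ** (1 / 40) at each ISN.
# (Independence is an assumption; real ISNs are correlated, but the direction
# of the argument, "the per-ISN tail must be far stricter", is the same.)
num_isns = 40
p_per_isn = 0.99 ** (1 / num_isns)
print(f"required per-ISN percentile ~ {100 * p_per_isn:.3f}th")  # ~99.975th
```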
 
Slide 9: Contributions
1. Proposes DDS (Delayed-Dynamic-Selective) prediction to achieve very high recall and good precision.
2. Uses DDS prediction to effectively reduce extreme tail latency.
Slide 10: Overview of DDS
A query first runs under delayed prediction: queries that finish within 10 ms are done (short).
Queries still running after 10 ms go through dynamic prediction: a predictor for execution time, built on features collected at runtime, labels them long or short.
Selective prediction then applies a predictor for the confidence level: predictions that are not confident are treated as long.
Long and not-confident queries are parallelized; confidently short queries continue sequentially.
Slide 11: Delayed Prediction
1) Complete many short queries sequentially during the initial 10 ms.
2) Collect dynamic features from the queries that are still running.
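A minimal sketch of how that delayed step could be wired, assuming a hypothetical per-document processing callback; the 10 ms budget is the value quoted in the talk, everything else is illustrative:

```python
import time

def run_with_delay(process_next_doc, budget_s: float = 0.010):
    """Run the query sequentially for up to `budget_s` seconds.
    Returns ('finished', stats) if the query completes within the budget
    (a short query, done with no prediction at all), otherwise
    ('needs_prediction', stats) where `stats` holds the partial-execution
    counters that become the dynamic features."""
    start = time.monotonic()
    docs_scored = 0
    while time.monotonic() - start < budget_s:
        if not process_next_doc():                    # no more docs: query finished
            return "finished", {"docs_scored": docs_scored}
        docs_scored += 1
    return "needs_prediction", {"docs_scored": docs_scored}

# Example: a query whose document stream ends immediately finishes within the delay.
print(run_with_delay(lambda: False))  # ('finished', {'docs_scored': 0})
```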
 
Slide 12: Dynamic Features
What are dynamic features? Features that can only be collected at runtime.
Two categories:
NumEstMatchDocs: estimates the total number of matched documents.
DynScores: used to predict early termination.
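One plausible way to form these two feature groups from the first 10 ms of execution (my own illustration; the talk does not give the exact feature definitions):

```python
def dynamic_features(matched_so_far, docs_scanned, index_size, dyn_scores_so_far):
    """Illustrative dynamic features gathered during the 10 ms delay.
    NumEstMatchDocs: extrapolate total matches from the match rate in the
    portion of the index scanned so far.
    DynScores: summary statistics of the dynamic (query-dependent) scores
    seen so far, which hint at whether early termination will trigger."""
    est_match_docs = matched_so_far * index_size / max(docs_scanned, 1)
    top = sorted(dyn_scores_so_far, reverse=True)[:10]
    return {
        "NumEstMatchDocs": est_match_docs,
        "DynScoreMax": top[0] if top else float("-inf"),
        "DynScoreTop10Min": top[-1] if top else float("-inf"),
    }

# 3,000 matches in the first 100,000 of 10M documents -> ~300,000 estimated matches.
print(dynamic_features(3_000, 100_000, 10_000_000, [-4.23, -4.11, -4.01])["NumEstMatchDocs"])
```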
Slide 13: Primary Factors for Execution Time
1. The number of total matched documents.
[Diagram: web documents sorted by static score from highest to lowest; the inverted indexes for "WSDM" and "2015" are processed against them.]
Slide 14: Primary Factors for Execution Time (continued)
2. Early termination.
[Same diagram: the tail of the static-score ordering is marked "not evaluated"; when processing terminates early, the low-static-score documents are never scored.]
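To see why factor 1 drives cost, here is a toy illustration of the underlying work: intersecting the query terms' posting lists (the document IDs below are made up):

```python
# Toy posting lists: sorted document IDs per term. Real ISNs hold millions of docs,
# so the size of this intersection (the matched-document count) dominates CPU time.
postings = {
    "wsdm": [1, 4, 7, 9, 12, 15],
    "2015": [2, 4, 5, 9, 11, 15],
}

def matched_docs(terms, postings):
    """Intersect the posting lists of all query terms."""
    result = set(postings[terms[0]])
    for term in terms[1:]:
        result &= set(postings[term])
    return sorted(result)

print(matched_docs(["wsdm", "2015"], postings))  # [4, 9, 15]
```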
Slide 15: Early Termination
Documents are scored in static-score order while a running top-3 result set (with each candidate's dynamic score) is maintained.
If the minimum dynamic score among the current top results exceeds a threshold, processing stops.
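A small sketch of that stopping rule (the scores, threshold, and k are invented for illustration; in a real engine the threshold bounds what the remaining, lower-static-score documents could still contribute):

```python
import heapq

def topk_with_early_termination(doc_scores, k=3, stop_threshold=-4.05):
    """Scan (doc_id, dynamic_score) pairs in static-score order, keeping the k
    best dynamic scores in a min-heap. Stop as soon as the worst kept score
    (the heap minimum) exceeds `stop_threshold`, i.e. the top-k is unlikely
    to change any further."""
    heap = []  # min-heap over (dynamic_score, doc_id)
    for doc_id, dyn_score in doc_scores:
        if len(heap) < k:
            heapq.heappush(heap, (dyn_score, doc_id))
        elif dyn_score > heap[0][0]:
            heapq.heapreplace(heap, (dyn_score, doc_id))
        if len(heap) == k and heap[0][0] > stop_threshold:
            break  # early termination
    return sorted(heap, reverse=True)

docs = [(5, -4.23), (1, -4.11), (8, -4.10), (3, -4.01), (7, -4.00), (2, -3.90)]
print(topk_with_early_termination(docs))
```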
 
Slide 16: Importance of Dynamic Features
Top-10 feature importance, measured with a boosted regression tree.
NumEstMatchDocs helps predict the total number of matched documents.
DynScores helps predict early termination.
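As a hedged illustration of how such an importance ranking is typically produced, here is a scikit-learn gradient-boosted-tree sketch on synthetic data (the talk does not name its implementation; the feature names and the data-generating process below are invented):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2_000
num_est_match_docs = rng.lognormal(mean=8.0, sigma=1.5, size=n)   # dynamic feature
dyn_score_top_min = rng.normal(loc=-4.0, scale=0.2, size=n)       # dynamic feature
idf = rng.normal(loc=5.0, scale=1.0, size=n)                      # static feature, for contrast
X = np.column_stack([num_est_match_docs, dyn_score_top_min, idf])
# Fake execution time, driven mostly by the matched-document estimate.
y = 5e-4 * num_est_match_docs + 5.0 * np.abs(dyn_score_top_min + 4.0) + rng.normal(0, 1, n)

model = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)
for name, imp in zip(["NumEstMatchDocs", "DynScoreTopMin", "IDF"], model.feature_importances_):
    print(f"{name:16s} importance = {imp:.2f}")
```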
 
Slide 17: Selective Prediction
Goal: find almost all long queries (very high recall) while keeping good precision.
The hard cases are the outliers: long queries whose predicted execution time makes them look short.
Slide 18: Selective Prediction (continued)
[Figure: distribution of predicted execution time for long vs. short queries; prediction error places some long queries below the long/short cut-off, and these outliers are what selective prediction must catch.]
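The slides do not spell out how the confidence predictor is built. One plausible realization, purely as a sketch, is a second model trained to flag queries whose execution-time prediction is likely an underestimate, so they are parallelized even though the first model calls them short. The 100 ms "long" cut-off here is my placeholder, and X and exec_ms are assumed to be NumPy arrays:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def train_dds_predictors(X, exec_ms, long_ms=100.0):
    """Hypothetical two-stage scheme. Stage 1 regresses execution time; stage 2
    classifies 'not confident' cases (actually long but predicted short).
    In practice the stage-1 predictions used as stage-2 labels should be
    out-of-fold; that is omitted here for brevity."""
    reg = GradientBoostingRegressor().fit(X, exec_ms)
    outlier = (exec_ms >= long_ms) & (reg.predict(X) < long_ms)
    clf = GradientBoostingClassifier().fit(X, outlier.astype(int))
    return reg, clf

def dds_decision(reg, clf, x, long_ms=100.0):
    """Parallelize if predicted long OR the prediction is not confident."""
    predicted_long = reg.predict([x])[0] >= long_ms
    not_confident = clf.predict([x])[0] == 1
    return "parallel (4-way)" if predicted_long or not_confident else "sequential"
```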
 
Slide 19: Overview of DDS (revisited)
[Same pipeline diagram as Slide 10: delayed prediction, then dynamic prediction of execution time, then selective prediction of confidence; long and not-confident queries are parallelized.]
Slide 20: Evaluation of Predictor Accuracy (1/3)
Baseline (PRED): static features only, with no delayed prediction (IDF, static scores such as PageRank, etc.).
Proposed method (DDS): dynamic (plus static) features, with delayed and selective prediction.
Slide 21: Evaluation of Predictor Accuracy (2/3)
69,010 Bing queries from a production workload: 14,565 queries take >= 10 ms and 635 take >= 100 ms.
Boosted regression tree, evaluated with 10-fold cross-validation.
PRED is trained on all 69,010 queries; DDS is trained on the 14,565 queries that run longer than 10 ms.
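A sketch of how long-query precision and recall would be computed under such a cross-validation setup (illustrative only; the 100 ms cut-off for "long" is my assumption, chosen because the counts above single out queries >= 100 ms):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import KFold

def long_query_precision_recall(X, exec_ms, long_ms=100.0, folds=10):
    """Regress execution time with 10-fold CV, then score the induced
    long/short classification. X is an (n_queries, n_features) NumPy array
    and exec_ms holds the measured execution times."""
    y_true, y_pred = [], []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True, random_state=0).split(X):
        model = GradientBoostingRegressor().fit(X[train_idx], exec_ms[train_idx])
        y_true.extend(exec_ms[test_idx] >= long_ms)
        y_pred.extend(model.predict(X[test_idx]) >= long_ms)
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)
```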
 
Slides 22-24: Evaluation of Predictor Accuracy (3/3)
[Chart, built up over three slides: predictor accuracy of DDS vs. PRED, annotated with the separate contributions of delayed prediction, dynamic features, and selective prediction.]
Overall: 957% improvement over PRED.
Slide 25: Simulation Results on Tail Latency Reduction
Baseline (PRED): predict the query execution time before running it; parallelize predicted-long queries with 4-way parallelism.
Proposed method (DDS): run every query sequentially for 10 ms, then parallelize the long or unpredictable (not-confident) queries with 4-way parallelism.
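To make the tail-latency metrics on the following slides concrete, this is how the reported percentiles are typically computed from per-query response times (the numbers below are made up and are not the talk's simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up per-query response times: many short queries plus a heavy tail.
latencies_ms = np.concatenate([
    rng.exponential(scale=5.0, size=99_000),    # short queries
    rng.exponential(scale=150.0, size=1_000),   # long-running tail
])

for q in (50, 99, 99.99):
    print(f"{q}th percentile: {np.percentile(latencies_ms, q):.1f} ms")
# The mean can look healthy while the 99th/99.99th percentiles blow past 120 ms;
# it is the latter that the aggregator/ISN targets constrain.
```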
 
Slides 26-28: ISN Response Time
[Charts: ISN tail response time under the simulated workload, PRED vs. DDS.]
Result: 70% throughput increase.
Slide 29: Aggregator Response Time
DDS can optimize the 99th-percentile tail latency at the aggregator under high QPS.
Slide 30: Conclusion
Proposes a novel prediction framework: delayed prediction, dynamic features, and selective prediction.
Achieves high precision and recall compared to PRED.
Reduces the 99th-percentile aggregator response time to <= 120 ms under high load.
