Multiserver Stochastic Scheduling Analysis
This presentation delves into the analysis and optimality of multiserver stochastic scheduling, focusing on the theory of large-scale computing systems, queueing theory, and prior work on single-server and multiserver scheduling. It explores optimizing response time and resource efficiency in modern computing systems with multiple servers.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Multiserver Stochastic Scheduling: Analysis and Optimality Isaac Grosof CMU Northwestern University, Jan. 31, 2023 1 Northwestern University - Isaac Grosof
Theory of Large-Scale Computing Systems Datacenters, Supercomputers 1% of world s electricity 205 TWh, 140 Mt CO2, $320B. Need systems to be: Fast: low response time Efficient: minimal resources Scheduling is a key tool 2 Northwestern University - Isaac Grosof
Abstract Model: Queueing Theory Arrivals Response time: ? Varying job sizes (inherent work of a job) Arrivals are a stochastic process Scheduling policy preemptively determines which job to serve Modern computing systems: many servers. 3 Northwestern University - Isaac Grosof
Prior work: Single-server & Multiserver Scheduling 1 server Multiserver Well understood! [Pollaczek 30], [Schrage & Miller 66], [Scully 18]. Analysis only for First-Come First-Served (FCFS) [K llerstr m 74 & 79]. Nothing for complex scheduling policies. Analysis: Characterize ?[?] (mean response time) This talk: Multiserver results, building on single- server progress. Optimization: Minimize ?[?] over all scheduling policies. Entirely open. Negative worst-case results [Leonardi & Raz 97]. Shortest Remaining Processing Time (SRPT) is optimal! [Schrage 68]. 4 Northwestern University - Isaac Grosof
First Scheduling Analysis & Optimality 1 server/job (M/G/k) 3 Multiple servers/job (MSJ) 2 4 1 3 3 1 First analysis & optimality results for each system. 5 Northwestern University - Isaac Grosof
One server/job (M/G/k) model Load ? = ?? ? < 1 Size: ?, i.i.d. SRPT-k k servers Poisson(?) 6 Northwestern University - Isaac Grosof
One server/job (M/G/k) model Load ? = ?? ? < 1 Size: ?, i.i.d. SRPT-k k servers Poisson(?) SRPT for Multiserver Systems . Grosof, Scully, Harchol-Balter. IFIP Performance 18. Best Student Paper Award. Better scheduling: 37% higher completion rate, same resources, same response time. 7 Northwestern University - Isaac Grosof
Background: Relevant work x SRPT-k Work Relevant work Work ?: total remaining size of jobs Relevant job: job with remaining size ? Relevant work ??: total remaining size of relevant jobs Normalize: Work completes at rate 1 if all servers busy. 8 Northwestern University - Isaac Grosof
Analysis and Optimality of SRPT-k Speed 1 Speed 1/? SRPT-k SRPT-1 Speed 1/? Speed 1/? Idea: Bound SRPT-k relative to resource-pooled SRPT-1 Lemma: Bound SRPT-k relevant work relative to SRPT-1 relevant work Goal: Bound SRPT-k response time relative to SRPT-1 response time 9 Northwestern University - Isaac Grosof
Step 1: Bound relevant work SRPT-k SRPT-1 ??? ??1 Couple SRPT-k and SRPT-1: Same arrival sequence Consider ?(?) = ???(?) ??1(?). When # relevant jobs in SRPT-k system ?: ?(?) nonincreasing. 10 Northwestern University - Isaac Grosof
Step 1: Bound relevant work SRPT-k SRPT-1 ??? ??1 Couple SRPT-k and SRPT-1: Same arrival sequence Consider ?(?) = ???(?) ??1(?). When # relevant jobs in SRPT-k system ?: ?(?) nonincreasing. When # relevant jobs in SRPT-k system < ?: ???(?) ??, so ?(?) ?? For all t, ?(?) ??, so ? ??? ? ??1+ ?? 11 Northwestern University - Isaac Grosof
Step 2: Bound response time x SRPT-k Key idea: Bound on x s response time implied by bound on work completed while x in system. Work while x in queue Work while x in service 12 Northwestern University - Isaac Grosof
Step 2: Bound response time SRPT-k x Key idea: Bound on x s response time implied by bound on work completed while x in system. Relevant work seen on arrival Work while x in queue Relevant work Work while x in service 13 Northwestern University - Isaac Grosof
Step 2: Bound response time SRPT-k x Key idea: Bound on x s response time implied by bound on work completed while x in system. Relevant work seen on arrival Lemma: ? ??? ?[??1] + ?? Work while x in queue Relevant work Bound: Relevant work arriving after x Same as single server 2?? 1 ?? ? ??? ? ??1+ Work while x in service ?? 14 Northwestern University - Isaac Grosof
SRPT-k SRPT-k Analysis x 2?? 1 ?? Bound: ? ?????? ? ? ?????? 1+ Theorem: ? ????? ? ? ????? 1+ 2?? ? 1 ?ln 1 ? First analysis of SRPT-k! First analysis of any nontrivial multiserver scheduling policy! 15 Northwestern University - Isaac Grosof
SRPT-k SRPT-k Optimality x Theorem: ? ????? ? ? ????? 1+ 2?? ? 1 ?ln 1 ? ?(? ????? 1) as ? 1 [Lin, Wierman, & Zwart 11] ? ????? ? ? ????? 1= 1 ? ????? ? ? ???? ?= 1 Theorem: lim ? 1 lim ? 1 SRPT-k yields asymptotically optimal mean response time! First optimality result in the M/G/k! 16 Northwestern University - Isaac Grosof
Empirical Impact S: Hyperexponential, ?2 10. ? = 4 servers. Target: ? ? 10. FCFS-k (First-Come First-Served) SRPT-k: ? = 0.7 ? = 0.96 37% higher completion rate, same resources, same response time. Potential to save 10s TWh, 10s Mt CO2, $100Bs. 30 FCFS-k Prio-k PLCFS-k PSJF-k SRPT-k 25 20 ?[?] 15 10 5 0 0.5 0.7 0.8 0.6 0.9 1.0 Load ? 17 Northwestern University - Isaac Grosof
Prior Work: Single-server & Multiserver Scheduling Multiserver 1 server/job 1 server Well understood! [Pollaczek 30], [Schrage & Miller 66], [Scully 18]. SRPT-k s ?[?] bounded! Analysis: Characterize ?[?] (mean response time) SRPT is optimal! [Schrage 68]. SRPT-k is optimal! Optimization: Minimize ?[?] over all scheduling policies. 18 Northwestern University - Isaac Grosof
Prior Work: Single-server & Multiserver Scheduling Multiserver many servers/job Multiserver 1 server/job 1 server 3 3 2 2 4 2 1 Well understood! [Pollaczek 30], [Schrage & Miller 66], [Scully 18]. No analysis for any scheduling policy. SRPT-k s ?[?] bounded! Analysis: Characterize ?[?] (mean response time) SRPT is optimal! [Schrage 68]. Completely open SRPT-k is optimal! Optimization: Minimize ?[?] over all scheduling policies. 19 Northwestern University - Isaac Grosof
Multiserver-job (MSJ) Model 3 Size (height) = duration server need 4 2 2 3 3 1 Motivated by Google Borg scheduler [Tirmazi et al. 20]. Trace: Factor of 105 variation in server need. Prior scheduling: Heuristics and/or poor response time 20 Northwestern University - Isaac Grosof
New Multiserver-job (MSJ) Scheduling Policy Want to design SRPT-like scheduling policy. First try: 3 Greedy-SRPT 3 2 4 2 3 1 Wasted Server Want new policy: Avoid waste, SRPT-like. Goal: First ?[?] analysis & optimality. Setting for talk: ? is power of 2, all server needs are powers of 2. 21 Northwestern University - Isaac Grosof
New Multiserver-job (MSJ) Scheduling Policy 8 1 2 4 2 Setting for talk: ? is power of 2, all server needs are powers of 2. Optimal Scheduling in the Multiserver-job Model under Heavy Traffic , Grosof, Scully, Harchol-Balter, Scheller-Wolf. SIGMETRICS, 23. Also WCFS, Queueing Systems, 22. Better scheduling: 73% higher completion rate, same resources, same response time. 22 Northwestern University - Isaac Grosof
Define ServerFilling-SRPT M Least remaining size order k=8 2 4 1 2 8 1 1 4 4 1 2 1 1 1. Find minimal subset M of jobs of least rem. size with total server need ? servers 2. Among M, serve largest server need first. (tiebreak by remaining size) 23 Northwestern University - Isaac Grosof
Define ServerFilling-SRPT M Least remaining size order 4 k=8 2 4 1 2 8 1 1 4 2 1 2 1 1 1 1 1. Find minimal subset M of jobs of least rem. size with total server need ? servers 2. Among M, serve largest server need first. (tiebreak by remaining size) 3. When a job completes, M changes, rearrange preemptively. 24 Northwestern University - Isaac Grosof
Define ServerFilling-SRPT M Least remaining size order 4 k=8 2 4 1 2 8 1 1 4 2 1 1 1 1. Find minimal subset M of jobs of least rem. size with total server need ? servers 2. Among M, serve largest server need first. (tiebreak by remaining size) 3. When a job completes, M changes, rearrange preemptively. Efficiency theorem: If total server need of all jobs ?, ServerFilling-SRPT fills all servers Next: Analyze ? ? Server need is power of 2 25 Northwestern University - Isaac Grosof
Understand ?[?] for ServerFilling-SRPT 1/k 1/k 1/k 1/k 1/k 1/k 1/k 1/k ServerFilling-SRPT 4 SRPT-1 1 1 2 4 1 2 2 Proof plan: Compare against resource-pooled SRPT-1 Bound relevant work ? ?? Bound response time ? ? But: Obstacles at each step! 26 Northwestern University - Isaac Grosof
ServerFilling-SRPT: Analysis & Optimality Compare against resource-pooled SRPT-1 Bound relevant work ? ?? Bound response time ? ? ? ! ? Old ? ?? bound too weak New technique: MIAOW Bound intervals of ?, not individual ?. Old ? ? bound fails New formula: WINE ? ? =1 ? ? ! What size distribution? Size = duration server need ! ?[??] ?2 ?? 0 ?+1 ? 1 ? 1 ?+? 1 First analysis: ? ??? ???? ? ????? 1+ ? ??? ???? ? ???? ???= 1 ln ? First optimality: lim ? 1 27 Northwestern University - Isaac Grosof
Empirical Impact Setting: ? = 8, server need uniform [1,2,4,8], size Hyperexp, ?2 10, independent of server need. Target ?[?] = 1.5 Greedy-SRPT ServerFilling-SRPT: ? = 0.56 ? = 0.97 73% higher completion rate, same resources, same response time. Even more TWh, Mt CO2, $B. 5 4 3 ?[?] 2 1 0 0.2 1.0 0.6 0 0.4 0.8 Load ? 28 Northwestern University - Isaac Grosof
Theory Practical Power Management Huge Waste of Capacity & Energy Response Time Guarantees + Problem: No Preemption Current solution: Heuristics [Autopilot 20] 3 1 Wasted Servers My approach: Learning Workload Distributions Advanced Scheduling Mathematical Guarantees + 29 Northwestern University - Isaac Grosof
??? Future directions: Response time tails Response time ? Problem: Practitioners care about tail of response time. Analysis only for mean ? ? . Analyze tail? Schedule to optimize tail? Current theory: Optimal worst-case max. response time: FCFS. Asymptotic tail analysis. Conjecture: FCFS optimal for light-tailed sizes. My work and future approach: 1. Overturned conjecture with Nudge [GYSH, SIGMETRICS 21]. 2. Intermediate tails? 3. Multiserver tails? 30 Northwestern University - Isaac Grosof
Future directions: Scheduling with Estimates Problem: Don t know sizes Current solution: Non-size-based scheduling (FCFS). Schedule based on age & size distribution: Gittins Index [Gittins & Nash 77]. My approach: 1. Bring Gittins to multiserver world: Gittins-k [SGH, 21].ServerFilling-Gittins. 2. Distribution-robust scheduling: SRPT-B [SGM, 22]. 3. Confidence bounds from learning theory 4. New theory integrating scheduling + learning. 31 Northwestern University - Isaac Grosof
Broader Scope: My Papers SRPT-k, Performance 18, GSH Guardrails, SIGMETRICS 19, GSH M-Gittins-k, Performance 20, SGH Gittins-k, SIGMETRICS 21, SGH Nudge, SIGMETRICS 21, GYSH SRPT-B, ITCS 22, SGM ServerFilling-FCFS, Queueing Systems 22, GHS ServerFilling-SRPT, SIGMETRICS 23, GSHS +4 non-queueing-theory papers 32 Northwestern University - Isaac Grosof
Conclusion: Multiserver Analysis & Optimality One server/job: First optimality result: SRPT-k Multiple servers/job: First analysis & first optimality: ServerFilling-SRPT 2 SRPT-k ServerFilling-SRPT 4 2 4 2 1 1 33 Northwestern University - Isaac Grosof
34 Northwestern University - Isaac Grosof
Key tool: Size & Work 2 4 4 2 2 1 1 Size: Duration * server need Ex: 4hrs * 2CPUs = 8 CPU hours Work ?: Total remaining size of all jobs Normalize: Work completes at rate 1 if all servers busy. 35 Northwestern University - Isaac Grosof
Bound mean work ?[???] Key idea: In steady state, expected rate of increases matches expected rate of decreases. Arrivals and rate-1 work same as FCFS. Only lower-rate work differs. ? ???= E WFCFS 1+ k jobs work For reasonable S (e.g. phase type), ? ???= ? ????? 1+ ??(1) ? ???2= 0 Work at rate 1 Job arrives Work ??? Too few jobs: lower rate Time Formalize using ? 36 Northwestern University - Isaac Grosof
ServerFilling Analysis Lemma: ? ???= ? ????? 1+ ??(1) Theorem: ? ???= ? ????? 1+ ??(1) First analysis for any multiserver-job scheduling policy! Work at rate 1 Job arrives Work ??? Too few jobs: lower rate Time 37 Northwestern University - Isaac Grosof
Multiserver-Job Optimality ServerFilling: based on FCFS. For optimality: ServerFilling-SRPT. M Least remaining size order k=8 4 2 1 2 1 1 1 1 1 2 2 1 Optimal Scheduling in the Multiserver-job Model under Heavy Traffic . Grosof, Scully, Harchol-Balter, Scheller-Wolf. SIGMETRICS 23. 38 Northwestern University - Isaac Grosof