New Scheduling Policy: ServerFilling-SRPT in Multiserver Job Model

Slide Note
Embed
Share

Discusses a new scheduling policy, ServerFilling-SRPT, designed to favor small jobs in the multiserver job model. The policy aims to minimize mean response time by prioritizing jobs with the least remaining size order and arranging them based on server needs. The approach involves finding minimal subsets of jobs and preemptively rearranging them. The policy is compared against resource-pooled SRPT-1 to establish a lower bound on optimal scheduling.


Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Optimal Scheduling in the Multiserver-job Model under Heavy Traffic Isaac Grosof (CMU GA Tech UIUC Northwestern) Ziv Scully (Cornell) Mor Harchol-Balter (CMU) Alan Scheller-Wolf (CMU) ACM SIGMETRICS 2023, June 21st 1

  2. Modern Computing Systems Also: Many servers per job Modern Computing: Many servers CERN Datacenter Frontier Supercomputer Google Borg cluster 2

  3. Multiserver Job Model Job: (duration, server need) 3 Sampled i.i.d. from joint distribution Fraction of capacity = 1 Size ?= duration fraction of capacity 10 hours 1 2 system = 5 system-hours ?server need ? Poisson(?) 2 4 2 3 1 1 Size Server need 2 Response time ? Load ? = ?? ? < 1 Goal: Minimize mean response time ? ? 3

  4. Prior work on MSJ Scheduling Analyzing ?[?] Optimizing ?[?] Throughput optimality MaxWeight [MSY 12] Randomized Timers [PG 18] Only: ServerFilling [GHS 22] Unknown ?[?] Very recent Completely open! 4

  5. New scheduling policy: ServerFilling-SRPT Concept: Scheduling to favor small jobs M Least remaining size order k=8 2 4 1 2 8 1 1 4 4 1 2 1 1 Setting: ? is power of 2, all server needs are powers of 2. ServerFilling-SRPT: 1. Find minimal subset M of jobs of least remaing size with total server need ? servers 2. Among M, serve largest server need first. (tiebreak by remaining size) 5

  6. New scheduling policy: ServerFilling-SRPT Concept: Scheduling to favor small jobs M Least remaining size order 4 k=8 2 4 1 2 8 1 1 4 2 1 2 1 1 1 Setting: ? is power of 2, all server needs are powers of 2. ServerFilling-SRPT: 1. Find minimal subset M of jobs of least remaing size with total server need ? servers 2. Among M, serve largest server need first. (tiebreak by remaining size) 3. Whenever M changes, rearrange preemptively. Theorem: If ? jobs present, fills all servers 1 6

  7. Understand ?[?] for ServerFilling-SRPT ServerFilling-SRPT 4 SRPT-1 1 2 4 1 2 Resource-pooled super-server Size = duration Lower bound on optimal MSJ scheduling 2 Proof plan: Compare against resource-pooled SRPT-1 Bound relevant work ? ?? Bound response time ? ? 7

  8. Background for Proof: Relevant work SRPT-1 ServerFilling-SRPT 4 ??? ??1 1 2 4 1 2 2 ? 1 Relevant job: job with remaining size ? Relevant work ??: total remaining size of relevant jobs 8

  9. History of multiserver ? ? bounds Bound response time ? ? Bound relevant work ? ?? M/G/k/SRPT [GSH 18]: (one-server-per-job) Multiserver tagged job ?? ?? Many-jobs few-jobs ??? ??1+ ?? Fails in MSJ: Single Straggler 9

  10. Single Straggler Problem M Least remaining size order k=8 2 4 2 2 8 4 2 4 4 2 2 2 1 10

  11. Single Straggler Problem M Least remaining size order 4 k=8 2 4 2 2 8 4 2 4 2 2 2 2 1 2 Problem: 1-server job won t run anytime soon. Response time of jobs of size ?, ??, could be very large. ??bounds don t imply ?? bounds. 11

  12. New Proof Ingredient: MIAOW Bound response time ? ? Ideas from M/G/k/Gittins [SGH 21]: Bound relevant work ? ?? ??2 method: Bound ?[??] by bounding waste: ? ??1 ?? /(1 ??) WINE: ? ? =1 ? ?? ?2?? ? ?=0 New idea: Multiplicative Interval Analysis of Waste. ? ?? ? ??1 ?? ?2(1 ??) ?[?] impact of ? ????,? ?? : ???? ?? 1 Divide all ? into ln 1 ? multiplicatively-spaced intervals, bound each integral. ? + 1 ? 1 1 ?+? 1 Result: ? ??? ???? ? ????? 1 ln ? ? 12

  13. Results ? + 1 ? 1 ? 1 ?+? 1 Bound: ? ??? ???? ? ????? 1 ln ? Optimality: For any ? s.t. ?[?2log ?+] < , ? ??? ???? ? ????? 1 ? ??? ???? ? ???? ???= 1 lim ? 1 = lim ? 1 ServerFilling-SRPT has asymptotically optimal mean response time! 13

  14. Empirical Validation Setting: ? = 8, server need uniform [1,2,4,8], size Hyperexp, ?2 10, independent of server need. 5 4 3 ?[?] Great ?[?] at all loads! 2 1 0 0.2 1.0 0.6 0 0.4 0.8 Load ? 14

  15. Generalizations Unknown or estimated sizes: ServerFilling-Gittins. Heavy traffic optimality! Server needs are divisors of ?: DivisorFilling-SRPT & DivisorFilling-Gittins. Heavy traffic optimality! 15

  16. Conclusion ServerFilling-SRPT 4 1 2 4 1 2 2 Proved tight bounds and asymptotic optimality using MIAOW: ? ??? ???? ? ???? ???= 1 lim ? 1 16

Related