EEVDF: The New Linux Scheduler Overview
This presentation delves into the EEVDF Linux scheduler, highlighting its ideal scheduling concepts, task priorities, time-based scheduling, and vRuntime explanation. It covers the efficient handling of dynamic task additions and deletions, as well as the Core Fair Scheduler (CFS) mechanisms based on vRuntime. The content provides insights into task management in Linux environments, shedding light on key functionalities and optimizations.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
EEVDF The New Linux Scheduler Balakumaran Kannan Balakumaran.Kannan@microsoft.com Senior Software Engineer Azure Linux, Microsoft
Agenda EEVDF with code snippet Highlights Ideal Scheduler - concept overview vRuntime Explained CFS with code snippet
Highlights T = T(Running) - T(Runnable) Peter Zijlstra
Ideal Scheduler Task Priority A 1
Ideal Scheduler Task Priority A 1 B 1
Ideal Scheduler Task Priority A 1 B 1 Not Practical! C 2
Time Based Scheduler Task Priority A 1 Time Quanta:100 ms
Time Based Scheduler Task Priority A 1 B 1 Time Quanta:100 ms
Time Based Scheduler Task Priority A 1 B 1 C 2 Time Quanta:100 ms
vRuntime Explained Instead of slicing the capacity, let s slice the time Problem of not accounting dynamic nature update_curr: for task-x 750 ms Task-1 (500 ms) Task-2 (500 ms) 0 ms Task-1 (0, 1000 ms) vRunTime(x) = delta_exec(x) * weight(nice_0) / weight(x) delta_exec(x) = now() - start_time(x) vRuntime = time x (weight of all tasks) - virtual entity to handle dynamic task addition and deletion vRunTime 1250 ms 0 ms vRunTime (0, 1000ms) Task-1 (0, 1000 ms) 750 ms vRunTime (750, 2000ms) Task-1 (750, 1000ms) Task-2 (0, 1000ms)
CFS - vRuntime Based All tasks are maintained in a single Red-Black Tree Key for the RB-Tree is vRuntime Left most node of the tree is selected __schedule(unsigned int sched_mode) pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr) struct sched_entity *left = __pick_first_entity(cfs_rq);
CFS - vRuntime Based static void update_curr(struct cfs_rq *cfs_rq) { delta_exec = now - curr->exec_start; if (unlikely((s64)delta_exec <= 0)) return; curr->sum_exec_runtime += delta_exec; curr->vruntime += calc_delta_fair(delta_exec, curr); update_min_vruntime(cfs_rq); }
Express Latency Needs! Created with CoPilot
EEVDF Earliest Eligible Virtual Deadline First EEVDF paper published in 1995 2 Parameters Eligibility Deadline Select the Earliest Deadline task among the Eligible Tasks
Eligibility - vRuntime Based Lag = Time a task is supposed to receive - Time the task has received actually Lag > 0 ? eligible : not-eligible
Deadline - vRuntime Based For task-x, with weight-1. With system time-quanta of 100ms, If no other tasks is running, Total weight of all tasks = 1 Time allotted to task-x = 100 * ( 1 / 1) = 100 ms If 2 other tasks are running with weights 2 and 3 respectively, Total weight of all tasks = 1 + 2 + 3 = 6 Time allotted to task-x = 100 * ( 1 / 6) = 16.67 ms
Deadline - vRuntime Based For task-x, with weight-1. Requires 20 ms runtime Deadline = System Time-Quanta required for allocate 20ms to task-x 1. If no other tasks is running, 20 * (1 / 1) = 20 ms If 2 other tasks are running with weights 2 and 3 respectively, 20 * (6 / 1) = 120 ms Deadline = required_time * (W / w(x))
Deadline - vRuntime Based For task-x, r = time required to service the request n tasks are there in the system and cumulative weight of all the n tasks is W In a given time t task-x will receive t * (w(x) / W) To receive r, what should be the T? (T is the time-quantum, which is necessary and sufficient for task-x to receive r service time) T * (w(x) / W) = r T = r * (W / w(x)) Deadline = eligible_time + r * (W / w(x))
Deadline - vRuntime Based /* * XXX: strictly: vd_i += N*r_i/w_i such that: vd_i > ve_i * this is probably good enough. */ static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se) { if ((s64)(se->vruntime - se->deadline) < 0) return; /* * For EEVDF the virtual time slope is determined by w_i (iow. * nice) while the request time r_i is determined by * sysctl_sched_base_slice. */ se->slice = sysctl_sched_base_slice; /* * EEVDF: vd_i = ve_i + r_i / w_i */ se->deadline = se->vruntime + calc_delta_fair(se->slice, se); /* * The task has consumed its request, reschedule. */ if (cfs_rq->nr_running > 1) { resched_curr(rq_of(cfs_rq)); clear_buddies(cfs_rq, se); } } If now is less than deadline, ignore. Because task is not eligible yet Place of improvement! Special logic to determine r will be prudent Separate logic to determine deadline based on latency_nice Deadline from now. if vruntime is low, deadline will be earlier
EEVDF RB-tree is arranged based on deadline New member min_vruntime introduced for each schedule_entity (not cfs_rq->min_vruntime which is assigned for new tasks) se->min_vruntime = min(se->vruntime, se->{left,right}->min_vruntime) min_vruntime is the minimum vruntime in that subtree
EEVDF - Earliest among Eligible static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq) { struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node; struct sched_entity *se = __pick_first_entity(cfs_rq); struct sched_entity *curr = cfs_rq->curr; struct sched_entity *best = NULL; ... /* Pick the leftmost entity if it's eligible */ if (se && entity_eligible(cfs_rq, se)) { best = se; goto found; } while (node) { struct rb_node *left = node->rb_left; if (left && vruntime_eligible(cfs_rq, __node_2_se(left)->min_vruntime)) { node = left; continue; } se = __node_2_se(node); if (entity_eligible(cfs_rq, se)) { best = se; break; } node = node->rb_right; } found: if (!best || (curr && entity_before(curr, best))) best = curr; rb-tree is arranged based on deadline rb-tree is arranged based on deadline Pick leftmost if it s eligible (fastpath) Some node on the left subtree is eligible. Go for it. Place of improvement! Slow path logic. Try the right subtree return best; }
Q&A References: An EEVDF CPU scheduler for Linux [LWN.net] document (psu.edu) Linux source code