P3L1 - Scheduling

What algorithms are used in Linux OS? [Lesson Preview]

Completely fair scheduler and O(1) scheduler

How is a toy shop manager like an OS scheduler? [Visual Metaphor]

Assign tasks immediately (FIFO)
Assign simple tasks first (SJF - shortest job first)
Assign complex tasks first

Why would you want to dispatch orders immediately? [Visual Metaphor]

Low scheduling overhead

Why would you want to dispatch simple orders as soon as they arrive? [Visual Metaphor]

Maximize the total number of orders that get processed. To do this, the manager has to spend more time checking what kind of order each one is, i.e., must be able to assess the complexity of the orders

Why would you want to schedule complex orders as soon as they arrive? [Visual Metaphor]

To utilize all the workbenches or resources available

What does the CPU scheduler do? [Scheduling Overview]

Decides how and when processes and their threads access shared CPUs
Schedules tasks running user-level processes/threads as well as kernel-level threads

When does the CPU scheduler run? [Scheduling Overview]

When the CPU becomes idle
When a new task becomes ready
When a timeslice expires (timeout)

Which task is selected? [Scheduling Overview]

There is a scheduling policy/algorithm
How it is done depends on the runqueue data structure

What is the responsibility of the scheduler? [Scheduling Overview]

Choose one of the tasks in ready queue, schedule onto CPU

When do threads go on the ready queue? [Scheduling Overview]

After an I/O operation they were waiting on completes
After they are woken up from a wait (e.g., by an interrupt)
When they are created, i.e., someone forks a new thread
When they are interrupted on the CPU because another thread is now running

When do we run the scheduler? [Scheduling Overview]

Whenever the CPU is idle, so we don't keep it idle for too long
Whenever a new task becomes ready, to check whether any task of higher importance should preempt the running task

What are some ways scheduler allocates CPU to tasks? [Scheduling Overview]

Give each thread a time slice that expires

What is really happening when scheduler picks a thread to be scheduled onto CPU? [Scheduling Overview]

Perform a context switch to enter the new thread's context
Enter user mode
Set the program counter to point to the next instruction

How does scheduler know which task to choose? [Scheduling Overview]

Scheduling policy or algorithm

How does the scheduler implement a given algorithm? [Scheduling Overview]

Details of scheduler implementation depend on run queue

How does this work? [Run To Completion Scheduling]

This assumes that as soon as a task is assigned to a CPU, it will run until it finishes

What are initial assumptions? [Run To Completion Scheduling]

We have a group of tasks to schedule
We know exactly how much time these threads need to execute
No interrupts once a task starts running
Single CPU

What are some useful metrics for comparing schedulers? [Run To Completion Scheduling]

Throughput
Average time it took for tasks to complete
Average time tasks spent waiting before they were scheduled
Overall CPU utilization

What is first come first serve algorithm? [Run To Completion Scheduling]

Tasks are scheduled first come, first served
A FIFO queue structure is useful because selection order matches arrival order

What are the calculations involved with first come first serve? [Run To Completion Scheduling]

T1 = 1s, T2 = 10s, T3 = 1s
Throughput: 3/12 = 0.25 tasks/second
Average completion time: (1+11+12)/3 = 8 seconds
Average wait time: (0+1+11)/3 = 4 seconds
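
The FCFS numbers above can be checked with a short sketch (Python is used here only for illustration; the task times come from the example):

```python
def fcfs_metrics(durations):
    """Throughput, average completion time, and average wait time
    for tasks run back-to-back in arrival order on a single CPU."""
    clock, completion, wait = 0, [], []
    for d in durations:
        wait.append(clock)        # a task waits until all earlier tasks finish
        clock += d
        completion.append(clock)  # completion measured from time 0
    n = len(durations)
    return n / clock, sum(completion) / n, sum(wait) / n

throughput, avg_completion, avg_wait = fcfs_metrics([1, 10, 1])  # T1, T2, T3
# throughput = 0.25 tasks/s, avg_completion = 8 s, avg_wait = 4 s
```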

How do the metrics for first come first serve look? [Run To Completion Scheduling]

This is simple, but wait time is poor when even one long task sits in front of shorter tasks

What is Shortest Job First goal? [Run To Completion Scheduling]

Schedule the tasks in the order of their execution time

What would be the order we execute these tasks (T1 = 1s, T2 = 10s, T3 = 1s) for Shortest Job First? [Run To Completion Scheduling]

T1, T3, T2

For shortest job first what data structure could we use? [Run To Completion Scheduling]

A queue, but not FIFO: when selecting the next task, we have to traverse the entire queue to find the one with the shortest execution time
Or a tree that rebalances itself so the scheduler always evaluates the leftmost node
Or we can maintain the run queue as an ordered queue: selection stays as fast as before, but insertion becomes harder
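
The ordered-queue option can be sketched like this (a toy, assuming we know each task's execution time up front; the class name is invented):

```python
import bisect

class OrderedRunqueue:
    """Toy SJF run queue kept sorted by expected execution time."""
    def __init__(self):
        self._q = []  # list of (exec_time, name), always sorted

    def insert(self, exec_time, name):
        # finding the slot is a binary search; the list shift is what
        # makes insertion the expensive operation, as the notes say
        bisect.insort(self._q, (exec_time, name))

    def pop_shortest(self):
        # selection stays cheap: the shortest job is always at the front
        return self._q.pop(0)

rq = OrderedRunqueue()
rq.insert(10, "T2"); rq.insert(1, "T1"); rq.insert(1, "T3")
# pop_shortest() now returns the 1s jobs before the 10s job
```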

What are the calculations for the tasks T1 = 1s, T2 = 10s, T3 = 1s? [SJF Performance Quiz]

Order: T1 (1s), T3 (1s), T2 (10s)
Throughput: 3/12 = 0.25 tasks/second
Average completion time: (1+2+12)/3 = 5 seconds
Average wait time: (0+1+2)/3 = 1 second
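
The same metric arithmetic, applied to the shortest-first order, reproduces the quiz numbers (a sketch):

```python
def sjf_metrics(durations):
    """Run-to-completion metrics when jobs execute shortest-first."""
    clock, completion, wait = 0, [], []
    for d in sorted(durations):   # SJF = execute in order of execution time
        wait.append(clock)
        clock += d
        completion.append(clock)
    n = len(durations)
    return n / clock, sum(completion) / n, sum(wait) / n

throughput, avg_completion, avg_wait = sjf_metrics([1, 10, 1])
# 0.25 tasks/s, 5 s average completion, 1 s average wait
```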

What is preemptive scheduling? [Preemptive Scheduling SJF Preempt]

Scheduling in which tasks don't just get the CPU and hog it until completion
Tasks don't have to arrive at the same time
In this example, T2 arrives first, so it starts executing
T1 and T3 show up and are the shortest jobs, so shortest job first preempts T2
Once those two have completed, T2 runs for its remaining time

What needs to happen to get this to work? [Preemptive Scheduling SJF Preempt]

When tasks enter run queue, scheduler needs to be invoked to examine execution times and decide whether to preempt

What does execution time of task depend on? [Preemptive Scheduling SJF Preempt]

Inputs of the task
Whether data is in the cache
Which other tasks are running

How do we use heuristics to guess what execution time of a task is? [Preemptive Scheduling SJF Preempt]

Use past execution time to predict the future
Or take an average over a window of recent runs - a windowed average
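
A windowed average might be sketched like this (the window size of 3 is an arbitrary choice for illustration):

```python
from collections import deque

def make_predictor(window=3):
    """Predict the next execution time as the average of the last
    `window` observed run times (returns 0 with no history)."""
    history = deque(maxlen=window)  # old observations fall off automatically

    def observe(run_time):
        history.append(run_time)

    def predict():
        return sum(history) / len(history) if history else 0

    return observe, predict

observe, predict = make_predictor(window=3)
for t in [4, 8, 6, 10]:
    observe(t)
# only the last 3 runs (8, 6, 10) count: predicted time = 8.0
```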

What is the idea for this algorithm? [Preemptive Scheduling Priority]

Run highest priority task next

What data structure could you use for this? [Preemptive Scheduling Priority]

Multiple run queues, one per priority level
Or a priority-ordered tree structure

What is a danger with priority based scheduling? [Preemptive Scheduling Priority]

Starvation: a low priority task could be stuck indefinitely in a run queue because there are always higher priority tasks
We can protect against this with priority aging: a task's priority isn't fixed, it's a function of its actual priority and the time it has spent waiting
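
Priority aging can be sketched as a toy function (the boost interval and formula are invented for illustration; real schedulers choose their own):

```python
def effective_priority(base_priority, wait_time, boost_interval=10):
    """Toy priority aging: lower number = higher priority, and a waiting
    task gains one level per `boost_interval` time units spent waiting."""
    boost = wait_time // boost_interval
    return max(0, base_priority - boost)  # clamp at the highest level

# a priority-20 task that has waited 50 units is treated like priority 15,
# so a long-waiting task eventually outranks newly arrived work
```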

[Preemptive Scheduling Quiz]

T1 finishes at
T2 finishes at
T3 finishes at

What is priority inversion? [Priority Inversion]

When we introduce locks into scheduling, the priority order could be inverted

How do you prevent that? [Priority Inversion]

Temporarily boost the priority of the mutex owner
For example, boost the priority of T3 to be the same as T1
T1 can't proceed because T3 holds the mutex T1 needs
Instead of scheduling T2, we schedule T3 since it now has higher priority
Then we can run T1
When releasing the mutex, lower the priority of T3 back
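
The boost-and-restore sequence can be sketched with a toy mutex (here higher number = higher priority; the Task and Mutex classes are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int    # higher number = higher priority in this sketch
    saved: int = 0

class Mutex:
    """Toy mutex that boosts its owner to a blocked waiter's priority."""
    def __init__(self):
        self.owner = None

    def lock(self, task):
        if self.owner is None:
            self.owner = task
            task.saved = task.priority
            return True
        # waiter blocks: owner inherits the waiter's priority if higher
        self.owner.priority = max(self.owner.priority, task.priority)
        return False

    def unlock(self):
        self.owner.priority = self.owner.saved  # restore on release
        self.owner = None

t3, t1 = Task("T3", 1), Task("T1", 3)
m = Mutex()
m.lock(t3)             # low-priority T3 grabs the mutex
m.lock(t1)             # T1 blocks; T3 is boosted to priority 3
boosted = t3.priority  # 3: a priority-2 T2 can no longer get in the way
m.unlock()             # T3 drops back to priority 1; T1 can now run
```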

What is this? [Round Robin Scheduling]

Run tasks round robin style
We can include priority: a higher priority task will preempt a lower priority task when it arrives. If T2 and T3 have the same priority, the scheduler will round robin between them
We can also use time-slicing and give each task a slice of time

What is a time slice? [Timesharing and Time-slices]

Max amount of uninterrupted time given to a task => time quantum

Why would a task run less than timeslice time? [Timesharing and Time-slices]

Wait on I/O or synchronization => will be placed on a wait queue
A higher priority task becomes runnable

For non I/O bound tasks, how do you get time sharing? [Timesharing and Time-slices]

Timeslices

Why might a task run for less than its timeslice? [Timesharing and Time-slices]

Wait on IO operation, synchronize with other tasks on a mutex, interrupted by higher priority task

For IO bound tasks does timesharing matter? [Timesharing and Time-slices]

Not as much since it is interrupted by IO

What is this example? [Timesharing and Time-slices]

The average wait for round robin is 1 second

What is the average completion time? [Timesharing and Time-slices]

It is (1+3+12)/3 ≈ 5.33 seconds, which is good
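
With the same T1=1s, T2=10s, T3=1s workload and a 1-second timeslice, a small simulation reproduces both round robin averages (a sketch, assuming all tasks arrive at time 0):

```python
from collections import deque

def round_robin(tasks, timeslice=1):
    """Simulate round robin; tasks are (name, remaining) pairs, all
    arriving at time 0. Returns completion and first-run times."""
    queue = deque(tasks)
    clock, completion, first_run = 0, {}, {}
    while queue:
        name, remaining = queue.popleft()
        first_run.setdefault(name, clock)  # wait time, since arrival is 0
        run = min(timeslice, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
        else:
            completion[name] = clock
    return completion, first_run

completion, first_run = round_robin([("T1", 1), ("T2", 10), ("T3", 1)])
# completion: T1=1, T3=3, T2=12 -> average (1+3+12)/3 ≈ 5.33
# first_run:  T1=0, T2=1, T3=2  -> average wait = 1
```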

What are benefits of timesharing? [Timesharing and Time-slices]

Simple
Good wait time and completion time
Shorter tasks finish sooner, the schedule is more responsive, and I/O operations can start as soon as possible

What are disadvantages of timesharing? [Timesharing and Time-slices]

Overhead: we have to interrupt the running task, run the scheduler to pick which task runs next, and do a context switch

What effect does the overhead of timesharing have? [Timesharing and Time-slices]

Total execution time of the timeline increases, so throughput decreases
Tasks start later, so average completion time increases

How do you avoid these impacts? [Timesharing and Time-slices]

Keep timeslice values significantly larger than the context switch time
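
That rule of thumb can be put in numbers (a sketch; the 0.1 ms switch cost is an invented figure):

```python
def cpu_utilization(timeslice, switch_cost):
    """Fraction of time spent on useful work if every timeslice
    ends with a full context switch."""
    return timeslice / (timeslice + switch_cost)

short = cpu_utilization(1, 0.1)   # ~0.91: 1 ms slice, 0.1 ms switch
long = cpu_utilization(10, 0.1)   # ~0.99: same switch cost, 10x the slice
# the larger the timeslice relative to switch cost, the less overhead matters
```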

What should you consider for timeslice length? [Timesharing and Time-slices]

Nature of the tasks and overheads in system

What are the two scenarios to look at? [How Long Should a Timeslice Be]

IO bound and CPU bound

What timeslices do CPU bound tasks prefer? [CPU Bound Timeslice Length]

Longer
[FILL IN CALCULATIONS]

RETURN

...

What timeslices do IO bound tasks prefer? [IO Bound Timeslice Length]

Smaller timeslice
[FILL IN CALCULATIONS]

[Timeslice Quiz]

RETURN

How would you give IO and CPU tasks different timeslice values? [Runqueue Data Structure]

Option 1: make it easy for the scheduler to figure out what type of task is being scheduled, but maintain a single run queue structure
Option 2: separate tasks into two different data structures/run queues

How does this work? [Runqueue Data Structure]

Place IO intensive tasks in the first queue
Medium IO tasks in the second queue
CPU intensive tasks go in a third queue with an infinite timeslice, like an FCFS policy

What are the benefits? [Runqueue Data Structure]

IO intensive tasks have the highest priority
CPU bound tasks don't have timeslicing overheads

How do we know if a task is CPU or IO intensive? What about tasks that have dynamic changes in behavior? [Runqueue Data Structure]

One single multi queue data structure

How would one single multi queue data structure work? (multi-level feedback queue) [Runqueue Data Structure]

A task starts off in the top level, but gets pushed down a level if it uses its entire timeslice
It can also get pushed back up if it yields before its timeslice expires
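
A toy version of that push-down/push-up rule (3 levels; class and method names invented):

```python
from collections import deque

class MLFQ:
    """Toy multi-level feedback queue: level 0 has the shortest
    timeslice; CPU-hungry tasks sink, interactive tasks rise."""
    LEVELS = 3

    def __init__(self):
        self.queues = [deque() for _ in range(self.LEVELS)]

    def add(self, task, level=0):
        self.queues[level].append(task)  # new tasks start at the top

    def pick(self):
        for level, q in enumerate(self.queues):
            if q:
                return level, q.popleft()
        return None

    def report(self, task, level, used_full_slice):
        # used the whole slice -> likely CPU bound -> demote;
        # yielded early -> likely interactive -> promote
        if used_full_slice:
            level = min(level + 1, self.LEVELS - 1)
        else:
            level = max(level - 1, 0)
        self.queues[level].append(task)

mlfq = MLFQ()
mlfq.add("A")
level, task = mlfq.pick()                       # "A" runs at level 0
mlfq.report(task, level, used_full_slice=True)  # burned its slice -> level 1
```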

How is multi-level feedback queue different from a priority queue? [Runqueue Data Structure]

Different scheduling policies are associated with each level
A feedback mechanism adjusts, over time, which level a task is placed in, as the scheduler figures out the best timesharing schedule for the tasks in the system

How does this scheduler work? [Linux O1 Scheduler]

Preemptive and priority based, total of 140 priority levels, 0 is highest, 139 is lowest

How are the priority levels grouped? [Linux O1 Scheduler]

Priority levels are grouped into two classes
0 to 99: real time tasks
100 to 139: timesharing class

Where do user processes go in terms of priority level? [Linux O1 Scheduler]

They go to one of the timesharing priority levels. The default priority is 120. It can be adjusted with a nice value, which ranges from -20 to 19

What does this scheduler borrow from the multi-level feedback queue scheduler? [Linux O1 Scheduler]

Associates different timeslice values with different priority levels
Uses feedback on how a task behaved in the past to adjust its priority level

How does this differ from the multi-level feedback queue scheduler? [Linux O1 Scheduler]

How it assigns timeslice values to priorities: it assigns small timeslice values to low priority, CPU bound tasks and large timeslice values to higher priority, more interactive/IO bound tasks
How it uses feedback: based on the time a task spent sleeping
Longer sleep -> task is interactive -> priority should increase (subtract 5 from the priority level to boost it)
Little sleep -> CPU bound task -> lower the priority (increment the priority level by 5)

What does Linux O(1) data structure look like? [Linux O1 Scheduler]

The run queue is organized as 2 arrays of task queues
Each array element points to the first runnable task at that priority level
The two arrays are called active and expired

How much time does it take to add a task? [Linux O1 Scheduler]

Constant time because you only need to index into this array based on priority level of task and follow pointer to end of task list to enqueue task there

How much time does it take to select a task? [Linux O1 Scheduler]

Constant time
The scheduler relies on certain hardware instructions that return the first priority level that has tasks on it
It then takes constant time to index into the array and get the first task
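
Those "certain instructions" are find-first-set style bit operations over a bitmap of nonempty priority levels; a sketch of the idea (the class name is invented):

```python
class O1Runqueue:
    """Sketch of O(1) selection: one queue per priority level plus a
    bitmap whose bit p is set iff queue p is nonempty (0 = highest)."""
    LEVELS = 140

    def __init__(self):
        self.queues = [[] for _ in range(self.LEVELS)]
        self.bitmap = 0

    def enqueue(self, prio, task):
        self.queues[prio].append(task)  # constant-time append at the tail
        self.bitmap |= 1 << prio

    def dequeue(self):
        # lowest set bit = highest-priority nonempty queue
        # (hardware does this with a find-first-set instruction)
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].pop(0)
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return prio, task

rq = O1Runqueue()
rq.enqueue(120, "editor"); rq.enqueue(0, "rt_task")
# dequeue() returns the real-time task at priority 0 first
```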

What happens if tasks yield cpu to wait or are preempted by higher priority tasks? [Linux O1 Scheduler]

Time spent waiting = total time since the task became runnable - time the task spent on the CPU
If the time the task has spent on the CPU is less than its timeslice, it is placed back on a queue in the active array

How is the expired array used? [Linux O1 Scheduler]

The expired array contains tasks that are not currently active: the scheduler will not select tasks from the expired array as long as there are still tasks in any of the queues in the active array
When there are no more tasks in the active array, the pointers of the two arrays are swapped: the expired array becomes the new active one, and the old active array becomes the expired one, collecting tasks as they exhaust their timeslices
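
The swap can be sketched with one list per array (a toy, single priority level):

```python
class TwoArrays:
    """Toy active/expired pair: expired tasks only run again after the
    active array drains and the two pointers are swapped."""
    def __init__(self, tasks):
        self.active = list(tasks)
        self.expired = []

    def pick(self):
        if not self.active:  # active drained: swap the two pointers
            self.active, self.expired = self.expired, self.active
        return self.active.pop(0) if self.active else None

    def timeslice_expired(self, task):
        self.expired.append(task)  # won't run again until the swap

s = TwoArrays(["hi", "lo"])
for _ in range(2):                 # both tasks burn their timeslices
    s.timeslice_expired(s.pick())
# active is now empty; the next pick() swaps and returns "hi" again
```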

Why does the O(1) scheduler give low priority tasks small timeslices and high priority tasks large timeslices? [Linux O1 Scheduler]

Having 2 arrays is an aging mechanism
High priority tasks will consume their timeslice and be placed on the expired array
Low priority tasks then get a chance to run for their small time amount

Why was the O(1) scheduler replaced with the completely fair scheduler? [Linux O1 Scheduler]

Applications became more time sensitive and jitter of O(1) was not acceptable

What were the problems with O(1)? [Linux CFS Scheduler]

Performance of interactive tasks: once tasks are placed on the expired list, they won't be scheduled until all tasks on the active list have had a chance to execute, which causes jitter
Fairness: it doesn't make fairness guarantees
Definition of fairness: in a given time interval, all tasks should be able to run for an amount of time proportional to their priority

What is the main idea? [Linux CFS Scheduler]

Uses a red-black tree as the run queue structure
A dynamic tree that self-balances so all paths are roughly the same length
Tasks are ordered by the amount of time they have spent running on the CPU, their virtual runtime (vruntime)
The leftmost task has spent the least time on the CPU and will be scheduled next

How does CFS decide to switch tasks? [Linux CFS Scheduler]

Periodically increment the vruntime of the currently executing task
Compare the vruntime of the current task to that of the leftmost task in the tree
Whichever is smaller gets to execute
If the current task is preempted, it is inserted back into the tree at the appropriate position
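
That pick/update loop can be sketched with a min-heap standing in for the red-black tree (both give cheap access to the smallest vruntime):

```python
import heapq

def cfs_timeline(names, ticks):
    """Toy CFS: run whichever task has the smallest vruntime, one
    tick at a time, updating its vruntime after each tick."""
    tree = [(0.0, name) for name in names]  # (vruntime, task)
    heapq.heapify(tree)
    timeline = []
    for _ in range(ticks):
        vruntime, name = heapq.heappop(tree)          # "leftmost" task
        timeline.append(name)
        heapq.heappush(tree, (vruntime + 1.0, name))  # ran one tick
    return timeline

# two equal-priority tasks simply alternate, sharing the CPU fairly
```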

How does CFS account for differences in task priority or niceness value? [Linux CFS Scheduler]

It changes the effective rate at which a task's virtual runtime progresses
For lower priority tasks, vruntime advances faster than actual time, so they lose the CPU more quickly
For higher priority tasks, vruntime advances more slowly, so they get to run longer
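
A sketch of the weighting idea (the ~1.25x-per-nice-level factor only approximates the kernel's precomputed weight table; exact values differ):

```python
def vruntime_advance(actual_ms, nice):
    """How much a task's vruntime grows for `actual_ms` of real CPU time.
    nice 0 -> advances at wall-clock rate; positive nice (lower
    priority) -> advances faster, so the task yields the CPU sooner."""
    weight = 1024 / (1.25 ** nice)  # nice 0 maps to weight 1024
    return actual_ms * 1024 / weight

# 10 ms of CPU at nice 0 adds 10 ms of vruntime; the same 10 ms at
# nice 5 adds ~30 ms, so the low-priority task falls behind faster
```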

What data structure does CFS use? [Linux CFS Scheduler]

One runqueue structure for all priority levels

What is performance of CFS? [Linux CFS Scheduler]

Selecting a task: O(1), just take the leftmost node
Adding a task: O(log n)
CFS might eventually be replaced because of that O(log n) insertion cost

What was the main reason Linux O(1) scheduler was replaced by CFS?

Interactive tasks could wait unpredictable amounts of time

What is cache affinity? [Scheduling on Multiprocessors]

Try to schedule a thread back on the same CPU where it executed before, because it is more likely that its cache will be hot

How do we achieve cache affinity? [Scheduling on Multiprocessors]

maintain a hierarchical scheduler architecture where at the top level, a load balancing component divides the task amongst the CPUs and then a per CPU scheduler with a per CPU run queue repeatedly schedules those tasks on a given CPU as much as possible

What about multiple memory nodes? [Scheduling on Multiprocessors]

CPUs and memory nodes are connected via some interconnect
A memory node can be connected to a subset of the CPUs, for instance to a socket with multiple processors
Access from a processor directly connected to a memory node is faster
On non-uniform memory access (NUMA) platforms, any processor can still access any memory node, just more slowly for remote nodes

How do NUMA platforms relate to scheduling? [Scheduling on Multiprocessors]

The scheduler should divide tasks so that each task is bound to the CPUs closer to the memory node holding that task's state (NUMA-aware scheduling)

What is hyperthreading? [Hyperthreading]

CPUs have multiple sets of registers, e.g., for the stack pointer and program counter
Multiple hardware-supported execution contexts
Still one CPU

How many CPU's are there? [Hyperthreading]

Only one

How many threads can execute on the CPU at a time? [Hyperthreading]

Only one
But context switches between hardware contexts are really fast

What does the CPU need to do in a context switch? [Hyperthreading]

Just switch from one set of registers to another
Nothing has to be saved or restored

What is another name for hyperthreading? [Hyperthreading]

Simultaneous multithreading (SMT)

How does this help with latency? [Hyperthreading]

Because context switch is so fast, it makes sense to do it more often

What assumptions are made to understand what's required for scheduler in a SMT system? [Scheduling for Hyperthreading Platforms]

A thread can issue an instruction on every single cycle
A memory access takes 4 cycles
Context switching among the hardware threads is instantaneous

What happens with coscheduling compute bound threads? [Scheduling for Hyperthreading Platforms]

The two threads interfere with each other: they contend for the CPU pipeline resources, and in the best case each spends every other cycle idling while the other thread issues its instruction
The memory component is idle, since nothing that's scheduled does any memory access

What happens with coscheduling memory bound threads? [Scheduling for Hyperthreading Platforms]

End up with idle cycles because both threads end up issuing memory operation and waiting

What happens when coscheduling compute bound and memory bound threads? [Scheduling for Hyperthreading Platforms]

Avoids/limits contention for the processor pipeline
Both CPU and memory are well utilized
Some level of degradation due to interference between the two threads

How do we know if a thread is CPU or memory bound? [CPU Bound or Memory Bound]

We need hardware-level information
Previously, when we used historic data, we looked at sleep time. That won't work here: a thread waiting on a memory reference is not really sleeping, it's active, just stalled
Also, keeping track of sleep time used software methods, which are too slow to use on the timescale of these hardware context switches

What hardware do we use to know if thread is cpu or memory bound? [CPU Bound or Memory Bound]

Hardware counters that get updated as processor is executing and keep info about various aspects of execution

What kind of info do hardware counters have? [CPU Bound or Memory Bound]

Cache usage
Last level cache misses
Number of instructions retired
Energy usage of the CPU

how can hardware counters help a scheduler make any kinds of scheduling decisions? [CPU Bound or Memory Bound]

understand what threads need for resources, CPU or memory

How could you use cycles per instruction? [Scheduling with Hardware Counters]

A memory bound thread takes many cycles per completed instruction, so it has a high CPI
A CPU bound thread completes an instruction (almost) every cycle, so it has a low CPI
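
A toy version of using CPI to build a mixed coschedule, pairing the most CPU bound with the most memory bound thread (all names and CPI values are invented):

```python
def pair_by_cpi(threads):
    """Pair lowest-CPI (CPU bound) with highest-CPI (memory bound)
    threads so pipeline and memory stay busy on each hardware pair."""
    ranked = sorted(threads, key=lambda t: t[1])  # (name, cpi), ascending
    return [(ranked[i][0], ranked[-1 - i][0])
            for i in range(len(ranked) // 2)]

pairs = pair_by_cpi([("a", 1.0), ("b", 6.0), ("c", 2.0), ("d", 5.0)])
# -> [("a", "b"), ("c", "d")]: each pair mixes a CPU and a memory bound thread
```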

How did she explore this question? [Scheduling with Hardware Counters]

Simulated a system with 4 cores
Each core is 4-way multithreaded, for a total of 16 hardware contexts
Varied which threads get assigned to the hardware contexts based on CPI
The lowest CPI threads are the most CPU intensive; the highest CPI the most memory intensive
She evaluates the overall performance when a specific mix of threads is assigned to these contexts

What metric does she use to measure performance? [Scheduling with Hardware Counters]

Instructions per cycle (IPC)
The max is 4 instructions per cycle, because the system has 4 cores in total

[Scheduling with Hardware Counters]

RETURN

What does the experiment results show? [CPI Experiment Quiz]

Mixed-CPI assignments showed good results
But do real workloads look like this? In practice, real workloads don't exhibit significant differences in CPI values, so CPI won't be a useful metric for scheduling

What are some takeaways from this paper? [CPI Experiment Results]
