= Client scheduling policies (version 4) =

This document describes two interrelated scheduling policies in version 4.x of the BOINC client:

    * '''CPU scheduling policy''': which result to run when.
    * '''Work fetch policy''': when to contact scheduling servers, which projects to contact, and how much work to ask for.

== CPU scheduling ==

The CPU scheduling policy aims to achieve the following goals (in decreasing priority):

   1. '''Maximize CPU utilization.'''
   2. '''Enforce resource shares.''' The ratio of CPU time allocated to projects that have work, over a typical period of a day or two, should be approximately the same as the ratio of the user-specified resource shares. If a project has no work for some period, it does not accumulate a 'debt' of work.
   3. '''Satisfy result deadlines if possible.'''
   4. '''Reschedule CPUs periodically.''' This goal stems from the large differences in duration of results from different projects. Participants in multiple projects will expect to see their computers do work on each of these projects within a reasonable time period.
   5. '''Minimize mean time to completion.''' For example, it's better to have one result from a project complete in time T than to have two results simultaneously complete in time 2T.

A result is 'active' if there is a slot directory for it. There can be more active results than CPUs.

== Debt ==

The notion of 'debt' is used to respect the resource share allocation for each project. The debt to a project represents the amount of work (in CPU time) we owe it. Debt is decreased when CPU time is devoted to a project. We increase the debt to each project by the total amount of work done in a time period, scaled by the project's resource share.

For example, consider a system participating in two projects, A and B, with resource shares 75% and 25%, respectively. Suppose in some time period the system devotes 25 minutes of CPU time to project A and 15 minutes of CPU time to project B. We decrease the debt to A by 25 minutes and increase it by 30 minutes (75% of (25 + 15)). So the debt to A increases overall. This makes sense because project A was entitled to a larger share of the CPU time than it actually received.

The choice of projects for which to start result computations can simply follow the debt ordering of the projects. The algorithm computes the 'anticipated debt' to a project (the debt we expect to owe after the time period expires) as it chooses result computations to run.

If a project has no runnable results, its resource share should not be considered when determining the debts for other projects. Furthermore, such a project should not be allowed to build up debt while it has no work. Thus, its debt should be reset to zero.
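
To make the update concrete, here is a minimal sketch of the per-period debt adjustment, using the numbers from the example above. This is illustrative code only, not the actual client implementation; all names are made up for the example.

{{{
// Minimal illustration of the per-period debt update (not the actual
// client code; names and structure are made up for this example).
#include <cstdio>

int main() {
    double share[2] = {0.75, 0.25};   // resource shares of A and B
    double debt[2]  = {0.0, 0.0};     // debt owed to each project

    // CPU minutes devoted in the last period: 25 to A, 15 to B.
    double work_done[2] = {25.0, 15.0};
    double total_work = work_done[0] + work_done[1];   // 40 minutes

    for (int i = 0; i < 2; i++) {
        // Credit each project with its share of the total work done,
        // and debit it for the CPU time it actually received.
        debt[i] += share[i] * total_work - work_done[i];
        printf("debt[%d] = %+.1f minutes\n", i, debt[i]);
    }
    // Prints: debt[0] = +5.0 minutes (A is owed time)
    //         debt[1] = -5.0 minutes (B got more than its share)
    return 0;
}
}}}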

== A sketch of the CPU scheduling algorithm ==

This algorithm is run:

    * Whenever a CPU is free
    * Whenever a new result arrives (via scheduler RPC)
    * Whenever it hasn't run for MV seconds, for some scheduling period MV

We will attempt to minimize the number of active result computations for a project by dynamically choosing results to compute from a global pool. When we allocate CPU time to a project, we will choose already running tasks first, then preempted tasks, and start a new result computation only as a last resort. This does not guarantee that the number of active computations is minimized, but we expect it to come close.

   1. If a project has no runnable results:
         1. Reset its debt to 0, and do not consider its resource share to determine relative resource shares.
   2. Else:
         1. Decrease debts to projects according to the amount of work done for the projects in the last period.
         2. Increase debts to projects according to the projects' relative resource shares.
   3. Let the anticipated debt for each project be initialized to its current debt.
   4. Repeat until we decide on a result to compute for each processor:
         1. Choose the project that has the largest anticipated debt and a ready-to-compute result.
         2. Decrease the anticipated debt of the project by the expected amount of CPU time.
   5. Preempt current result computations, and start new ones.

== Pseudocode ==

{{{
data structures:
ACTIVE_TASK:
    double cpu_time_at_last_sched
    double current_cpu_time
    scheduler_state:
        PREEMPTED
        RUNNING
    next_scheduler_state    // temp
PROJECT:
    double work_done_this_period    // temp
    double debt
    double anticipated_debt // temp
    RESULT next_runnable_result

schedule_cpus():

foreach project P
    P.work_done_this_period = 0

total_work_done_this_period = 0
foreach task T that is RUNNING:
    x = T.current_cpu_time - T.cpu_time_at_last_sched
    T.project.work_done_this_period += x
    total_work_done_this_period += x

adjusted_total_resource_share = 0
foreach P in projects:
    if P has a runnable result:
        adjusted_total_resource_share += P.resource_share

foreach P in projects:
    if P has no runnable result:
        P.debt = 0
    else:
        P.debt += (P.resource_share / adjusted_total_resource_share)
                * total_work_done_this_period
                - P.work_done_this_period

expected_pay_off = total_work_done_this_period / num_cpus

foreach P in projects:
    P.anticipated_debt = P.debt

foreach task T
    T.next_scheduler_state = PREEMPTED

do num_cpus times:
    // choose the project with the largest anticipated debt
    P = argmax { P.anticipated_debt } over all P in projects with runnable result
    if none:
        break
    if (some T (not already scheduled to run) for P is RUNNING):
        T.next_scheduler_state = RUNNING
        P.anticipated_debt -= expected_pay_off
        continue
    if (some T (not already scheduled to run) for P is PREEMPTED):
        T.next_scheduler_state = RUNNING
        P.anticipated_debt -= expected_pay_off
        continue
    if (some R in results is for P, not active, and ready to run):
        Choose R with the earliest deadline
        T = new ACTIVE_TASK for R
        T.next_scheduler_state = RUNNING
        P.anticipated_debt -= expected_pay_off

foreach task T
    if scheduler_state == PREEMPTED and next_scheduler_state == RUNNING
        unsuspend or run
    if scheduler_state == RUNNING and next_scheduler_state == PREEMPTED
        suspend (or kill)

foreach task T
    T.cpu_time_at_last_sched = T.current_cpu_time
}}}

== Work fetch policy ==

The CPU scheduler is at its best when projects always have runnable results. If the CPU scheduler would allocate a CPU to a project that has no runnable result, we say the CPU scheduler is 'starved'.

The work fetch policy has the following goals:

    * Maintain sufficient work so that the CPU scheduler is never starved.
    * More specifically: given a 'connection period' parameter T (days), always maintain sufficient work so that the CPU scheduler will not be starved for T days, given the average work processing rate. The client should contact scheduling servers only about every T days.
    * Don't fetch more work than necessary, given the above goals. Thus, try to maintain enough work that starvation will occur between T and 2T days from now.

=== When to get work ===

At a given time, the CPU scheduler may need as many as

    min_results(P) = ceil(ncpus * P.resource_share / total_resource_share)

results for project P to avoid starvation.
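
For example (with illustrative numbers): on a host with 4 CPUs and two projects A and B whose resource shares are 75% and 25%,

    min_results(A) = ceil(4 * 0.75) = 3
    min_results(B) = ceil(4 * 0.25) = 1

so A needs at least 3 runnable results, and B at least 1, for the CPU scheduler to honor the resource shares without starving.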

Let

    ETTRC(RS, k)
    [__e__stimated __t__ime __t__o __r__esult __c__ount]

be the amount of time that will elapse until the number of results in the set RS reaches k, given CPU speed, # CPUs, resource share, and active fraction. (The 'active fraction' is the fraction of time in which the core client is running and enabled to work. This statistic is continually updated.)

Let

    ETTPRC(P, k) = ETTRC(P.runnable_results, k)
    [__e__stimated __t__ime __t__o __p__roject __r__esult __c__ount]

Then the amount of time that will elapse until starvation is estimated as

    min { ETTPRC(P, min_results(P)-1) } over all P.

It is time to get work when, for any project P,

    ETTPRC(P, min_results(P)-1) < T

where T is the connection period defined above.

=== Computing ETTPRC(P, k) ===

First, define estimated_cpu_time(S) to be the total FLOP estimate for the results in S divided by the FLOPS of a single CPU.

Let the set of runnable results RS(P) = { R_1, R_2, ..., R_N } for project P be ordered by increasing deadline. The CPU scheduler will schedule these results in the same order. ETTPRC(P, k) can be approximated by

    estimated_cpu_time(R_1, R_2, ..., R_(N-k)) / avg_proc_rate(P)

where avg_proc_rate(P) is the average number of CPU seconds completed by the client for project P in a second of (wall-clock) time:

    avg_proc_rate(P) = P.resource_share / total_resource_share * ncpus * 'active fraction'.
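
As an illustration (assumed numbers): suppose project P has a 50% resource share on a 2-CPU host with an active fraction of 0.8, so avg_proc_rate(P) = 0.5 * 2 * 0.8 = 0.8 CPU seconds per wall-clock second. If min_results(P) = 2 and P has two runnable results with estimated CPU times of 3600 and 7200 seconds (in increasing deadline order), then

    ETTPRC(P, 1) = 3600 / 0.8 = 4500 wall-clock seconds.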

=== How much work to get ===

To contact scheduling servers only about every T days, the client should request enough work so that ''for each project P'':

    ETTPRC(P, min_results(P)-1) >= 2T.

More specifically, we want to get a set of results REQUEST such that

    ETTRC(REQUEST, 0) = 2T - ETTPRC(P, min_results(P)-1).

Since requests are in terms of CPU seconds, we make a request of

    estimated_cpu_time(REQUEST) = avg_proc_rate(P) * (2T - ETTPRC(P, min_results(P)-1)).
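
Continuing the illustration above (assumed numbers): with T = 1 day = 86400 seconds and ETTPRC(P, min_results(P)-1) = 4500 seconds, the request would be

    estimated_cpu_time(REQUEST) = 0.8 * (2 * 86400 - 4500) = 134640 CPU seconds.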

=== A sketch of the work fetch algorithm ===

The algorithm sets P.work_request for each project P and returns an 'urgency level':

{{{
NEED_WORK_IMMEDIATELY
    CPU scheduler is currently starved
NEED_WORK
    Will starve within T days
DONT_NEED_WORK
    otherwise
}}}

It can be called whenever the client can make a scheduler RPC.

   1. urgency = DONT_NEED_WORK
   2. For each project P
         1. Let RS(P) be P's runnable results ordered by increasing deadline.
         2. Let S = ETTRC(RS(P), min_results(P)-1).
         3. If S < T
               1. If S == 0: urgency = NEED_WORK_IMMEDIATELY
               2. else: urgency = max(urgency, NEED_WORK)
         4. P.work_request = max(0, (2T - S) * avg_proc_rate(P))
   3. If urgency == DONT_NEED_WORK
         1. For each project P: P.work_request = 0
   4. Return urgency

The mechanism for getting work periodically (once a second) calls compute_work_request(), checks whether any project has a non-zero work request, and if so makes a scheduler RPC call to request the work.
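
A minimal sketch of that polling mechanism is shown below. It is illustrative only; PROJECT, compute_work_request(), and scheduler_rpc() are placeholders for this example, not the actual client interfaces.

{{{
// Illustrative sketch of the once-per-second work-fetch poll
// (names and structure are assumptions, not the actual client code).
#include <vector>

enum Urgency { DONT_NEED_WORK, NEED_WORK, NEED_WORK_IMMEDIATELY };

struct PROJECT {
    double work_request;   // CPU seconds, set by compute_work_request()
};

// Assumed to exist, per the algorithm above.
Urgency compute_work_request(std::vector<PROJECT>& projects);
void scheduler_rpc(PROJECT& p, double cpu_seconds);

void poll_work_fetch(std::vector<PROJECT>& projects) {
    if (compute_work_request(projects) == DONT_NEED_WORK) return;
    for (PROJECT& p : projects) {
        if (p.work_request > 0) {
            // Ask this project's scheduling server for the computed
            // number of CPU seconds of work.
            scheduler_rpc(p, p.work_request);
        }
    }
}
}}}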

=== Pseudocode ===

{{{
data structures:
PROJECT:
    double work_request

total_resource_share = sum of all projects' resource_share

avg_proc_rate(P):
    return P.resource_share / total_resource_share
           * ncpus * time_stats.active_frac

ettprc(P, k):
    results_to_skip = k
    est = 0
    foreach result R for P in order of DECREASING deadline:
        if results_to_skip > 0:
            results_to_skip--
            continue
        est += estimated_cpu_time(R) / avg_proc_rate(P)
    return est

compute_work_request():
    urgency = DONT_NEED_WORK
    foreach project P:
        P.work_request = 0
        est_time_to_starvation = ettprc(P, min_results(P)-1)

        if est_time_to_starvation < T:
            if est_time_to_starvation == 0:
                urgency = NEED_WORK_IMMEDIATELY
            else:
                urgency = max(NEED_WORK, urgency)
        P.work_request =
            max(0, (2*T - est_time_to_starvation)*avg_proc_rate(P))

    if urgency == DONT_NEED_WORK:
        foreach project P:
            P.work_request = 0

    return urgency
}}}