= Client scheduling changes =

Design document for changes to the client work fetch and job scheduling policies, started Oct 2010.

This supersedes the following design docs:

 * GpuWorkFetch
 * GpuSched

== Problems with current system ==

The current policies, described [wiki:GpuWorkFetch here], maintain long- and short-term debts (LTD and STD) for each (project, processor type) pair.

Job scheduling for a given processor type is based on STD. Projects with greater STD for the type are given priority.

Work fetch is based on a weighted sum of LTDs. Work is typically fetched from the project for which this sum is greatest, and typically work is requested for all processor types.

=== Resource shares not enforced ===

These policies may fail to enforce resource shares. Here are two scenarios that illustrate the underlying problems:

==== Example 1 ====

A host has a fast GPU and a slow CPU. Project A has apps for both GPU and CPU. Project B has apps only for the CPU. The projects have equal resource shares.

In the current system each project will get 50% of the CPU. The target behavior, which matches resource shares better, is that project B gets 100% of the CPU and project A gets 100% of the GPU.

==== Example 2 ====

Same host. An additional project C has only CPU apps.

In this case A's CPU LTD will stay around zero, while the CPU LTD for B and C goes unboundedly negative and gets clamped at the cutoff. All information about the relative debt of B and C is lost.

=== Too many scheduler requests ===

The work fetch mechanism has two buffer sizes: min and max. The current work fetch policy requests work whenever the buffer goes below max. This can cause the client to issue frequent small work requests.

The original intention of min and max is that they are hysteresis limits: when the buffer goes below min, the client tries to fetch enough work to increase the buffer to max, and then issues no further requests until the buffer falls below min again. In this way, the interval between scheduler RPCs to a given project is at least (max-min).
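
A rough sketch of the intended hysteresis behavior (the function and variable names below are illustrative, not the client's actual ones):

{{{
// Sketch of the hysteresis rule for one processor type.
// buffered: seconds of queued work; min_buf/max_buf: the two buffer sizes.
// 'filling' is persistent state: true while we are still trying to reach max_buf.
bool should_request_work(double buffered, double min_buf, double max_buf, bool& filling) {
    if (!filling && buffered < min_buf) filling = true;   // dropped below min: start fetching
    if (filling && buffered >= max_buf) filling = false;  // reached max: stop until below min again
    return filling;
}
}}}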

== Proposal: credit-driven scheduling ==

The idea is to make resource share apply to overall credit, not to individual resource types. If two projects have the same resource share, they should have the same RAC (recent average credit). Scheduling decisions should give preference to projects whose share of RAC is less than their resource share.

There are problems with using project-granted credit as a basis for this approach:

 * There may be a long and variable delay between completing a job and getting credit for it.
 * Jobs may fail to get credit, e.g. because they don't validate.

Hence we will use a surrogate called '''estimated credit''', maintained by the client. If projects grant credit fairly, and if all jobs validate, then estimated credit is roughly equal to granted credit over the long term.

Note: there is a potential advantage to using granted credit: doing so penalizes projects that grant inflated credit, since the more credit a project grants, the less work a given host will do for it, assuming the host is attached to multiple projects. (The converse is potentially also true: a project would get more work done by granting less credit. This effect could be minimized by combining estimated credit with granted credit.)

=== Estimated credit ===

BOINC server software grants credit on the basis of peak FLOPS, with a scaling factor applied to GPUs to normalize them relative to CPUs. The normalized peak FLOPS of a GPU can be estimated.

The estimated credit for a T-second segment of job execution is given by

{{{
T * ninstances(P) * peak_flops(P)
}}}
summed over the processor types P used by the job.
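
For concreteness, a minimal sketch of this accumulation in code, using the symbols from the formula above (the struct and function are illustrative, not actual client data structures):

{{{
#include <vector>

// One entry per processor type used by the job.
struct ResourceUsage {
    double ninstances;   // e.g. 1 CPU, or 0.5 of a GPU
    double peak_flops;   // normalized peak FLOPS of one instance of that type
};

// Estimated credit for a T-second segment of job execution.
double estimated_credit(double T, const std::vector<ResourceUsage>& usage) {
    double ec = 0;
    for (const ResourceUsage& u : usage) {
        ec += T * u.ninstances * u.peak_flops;   // summed over processor types
    }
    return ec;
}
}}}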

The '''recent estimated credit''' REC(P) of a project P is maintained by the client, with an averaging half-life of, say, a month.
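
An average with a given half-life can be maintained by exponential decay; a rough sketch, with an assumed one-month half-life (the constant and function name are illustrative):

{{{
#include <cmath>

const double REC_HALF_LIFE = 30 * 86400.0;   // assumed: one month, in seconds

// Decay a project's REC over the dt seconds since its last update,
// then add the estimated credit ec earned during that interval.
double update_rec(double rec, double dt, double ec) {
    double decay = std::exp(-dt * std::log(2.0) / REC_HALF_LIFE);
    return rec * decay + ec;
}
}}}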

The '''scheduling priority''' of a project P is then

{{{
SP(P) = share(P) - REC(P)
}}}
where REC is normalized so that it sums to 1.
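
A minimal sketch of this computation, assuming a per-project record holding a resource share fraction and the REC value above (the names are illustrative):

{{{
#include <vector>

struct Project {
    double resource_share;   // fraction of total share, so shares sum to 1
    double rec;              // recent estimated credit
    double sp;               // scheduling priority
};

// SP(P) = share(P) - REC(P), with REC normalized to sum to 1 over projects.
void compute_priorities(std::vector<Project>& projects) {
    double rec_sum = 0;
    for (const Project& p : projects) rec_sum += p.rec;
    for (Project& p : projects) {
        double rec_frac = (rec_sum > 0) ? p.rec / rec_sum : 0;
        p.sp = p.resource_share - rec_frac;
    }
}
}}}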

In Example 1 above, SP(A) will always be negative and SP(B) will always be positive.

== Job scheduling ==

As the job scheduling policy picks jobs to run (e.g. on a multiprocessor) it needs to take into account the jobs already scheduled, so that it doesn't always schedule multiple jobs from the same project. To accomplish this, as each job is scheduled we increment a copy of REC(P) as if the job had run for one scheduling period.
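
A rough, self-contained sketch of this mechanism, with illustrative numbers and names (one CPU job per slot, a one-hour scheduling period; none of this is the actual client code):

{{{
#include <cstdio>
#include <vector>

// Working copy of each project's state, used only during scheduling.
struct Proj { const char* name; double share; double rec; };

// Index of the project with the greatest SP = share - normalized REC.
int best_project(const std::vector<Proj>& v) {
    double rec_sum = 0;
    for (const Proj& p : v) rec_sum += p.rec;
    int best = 0;
    double best_sp = -1e30;
    for (size_t i = 0; i < v.size(); i++) {
        double sp = v[i].share - (rec_sum > 0 ? v[i].rec / rec_sum : 0);
        if (sp > best_sp) { best_sp = sp; best = (int)i; }
    }
    return best;
}

int main() {
    const double period = 3600;      // one scheduling period, seconds
    const double cpu_flops = 3e9;    // normalized peak FLOPS of one CPU
    std::vector<Proj> projects = {{"A", 0.5, 2e13}, {"B", 0.5, 1e13}};
    for (int slot = 0; slot < 4; slot++) {
        int i = best_project(projects);
        std::printf("slot %d: run a job from project %s\n", slot, projects[i].name);
        // charge the copy of REC as if the job ran for one period on one CPU
        projects[i].rec += period * 1 * cpu_flops;
    }
}
}}}

Because the copy of REC is charged immediately, the slots alternate between projects rather than all going to the project that started with the highest priority.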

== Work fetch ==

The proposed work fetch policy (a minimal sketch in code follows the list):

 * Work fetch for a given processor type is initiated whenever the saturated period is less than min.
 * Ask the fetchable project with greatest SP(P) for "shortfall" seconds of work.
 * Option: if not at max after this RPC, try the project with the 2nd-highest SP(), etc.
 * Whenever a scheduler RPC to project P is done (e.g. to report results) and SP(P) is greatest among fetchable projects for a given processor type, request "shortfall" seconds of that type.
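
A minimal sketch of the per-type selection step, assuming per-project SP values and per-type buffer statistics are already available (all names and the interface are illustrative, not the client's actual ones):

{{{
#include <vector>

struct ProjectInfo {
    double sp;        // scheduling priority SP(P)
    bool fetchable;   // project may be asked for work of this processor type
};

// For one processor type: if the saturated period is below min, pick the
// fetchable project with the greatest SP and request 'shortfall' seconds.
// Returns the index of the chosen project, or -1 if no request should be made.
int choose_project_for_fetch(const std::vector<ProjectInfo>& projects,
                             double saturated, double min_buf,
                             double shortfall, double& request_secs) {
    request_secs = 0;
    if (saturated >= min_buf) return -1;       // buffer still above min: no fetch
    int best = -1;
    for (size_t i = 0; i < projects.size(); i++) {
        if (!projects[i].fetchable) continue;
        if (best < 0 || projects[i].sp > projects[best].sp) best = (int)i;
    }
    if (best >= 0) request_secs = shortfall;   // ask this project for the shortfall
    return best;
}
}}}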

Notes:

 * This will tend to get large (max-min) clumps of work for a single project, and variety will be lower than with the current policy.
 * What about the "overworked" scenario: project A has a very long job that runs high-priority for weeks. When it finishes, if other projects are temporarily down, we don't want to immediately fetch more work from A.