Changes between Version 4 and Version 5 of ClientSchedOctTen


Timestamp:
Oct 27, 2010, 1:53:18 PM (14 years ago)
Author:
davea
Comment:

--

  • ClientSchedOctTen

    v4 v5  
    5555
    5656The idea is to make resource share apply to credit.
    57 If two projects have the same resource share,
    58 they should have the same RAC.
    59 This suggests the following principle,
    60 which can apply to both work fetch and job scheduling:
     57If two projects have the same resource share, they should have the same RAC.
     58Scheduling decisions should give preference to projects
     59whose share of RAC is less than their resource share.
    6160
    62  * Normalize RAC and resource share so that each one sums to 1 across projects.
    63  * For a project P, let G(P) = share(P) - RAC(P).
    64  * Give priority to projects for which G(P) is highest,
    65    i.e. that aren't getting as much credit as they should.
    66 
    67 This does 2 things:
    68 
    69  * It's the correct semantics for resource share:
    70    they now control something that volunteers can actually see,
    71    namely credit.
    72  * It penalizes projects that grant inflated credit:
    73    the more credit a project grants, the less work a given host
    74    will do for it, assuming the host is attached to multiple projects.
    75    (The converse is potentially true - a project would get more work done
    76    by granting less credit.  This is minimized by a mechanism described below.)
    77 
    78 Note: I've glossed over the issue of the time scale over which RAC is averaged.
    79 The RAC reported by servers has a half-life of a week.
    80 For purposes of scheduling a different (probably longer) period would be better.
    81 The client could potentially compute its own RAC
    82 based on changes in total credit.
    83 However, it's probably OK to just use the server-reported RAC.
    84 
    85 === Recent average FLOPS ===
    86 
    87 There are some problems with credit-driven scheduling:
     61There are problems with using project-granted credit
     62as a basis for this approach:
    8863
    8964 * There may be a long and variable delay between completing a job
     
    9166 * Jobs may fail to get credit, e.g. because they don't validate.
    9267
    93 To deal with these issues,
    94 I propose using not just RAC by itself,
    95 but the combination of RAC and '''recent average FLOPS''' (RAF) per project.
    96 This is intended to address the above 2 issues,
    97 and the issue of projects that grant too little credit.
     68Hence we will use a surrogate called '''estimated credit'''
     69that is maintained by the client.
     70If projects grant credit fairly, and if all jobs validate,
     71then estimated credit is roughly equal to granted credit over the long term.
     72
     73Note: there is a potential advantage to using granted credit too.
     74Doing so penalizes projects that grant inflated credit:
     75the more credit a project grants, the less work a given host
     76will do for it, assuming the host is attached to multiple projects.
     77(The converse is potentially also true - a project would get more work done
     78by granting less credit.  This effect could be minimized by
     79combining estimated credit with granted credit.)
     80
     81=== Estimated credit ===
     82
     83BOINC server software grants credit on the basis of peak FLOPS,
     84with a scaling factor applied to GPUs to normalize them relative to CPUs.
     85The normalized peak FLOPS of a GPU can be estimated.
     86
     87The estimated credit for a T-second segment of job execution is given by
     88{{{
     89T * ninstances(P) * peak_flops(P)
     90}}}
     91summed over the processor types used by the job.
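
The formula above can be sketched as follows. This is only an illustration: the `ResourceUsage` type and the `estimated_credit` function are hypothetical names, not part of the BOINC client, and the result is in raw normalized FLOPs (any scaling to credit units is not shown here).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResourceUsage:
    """One processor type used by a job (hypothetical structure)."""
    ninstances: float  # instances of this processor type in use; may be fractional
    peak_flops: float  # normalized peak FLOPS of a single instance


def estimated_credit(t_seconds: float, usages: Iterable[ResourceUsage]) -> float:
    """Return T * ninstances(P) * peak_flops(P), summed over processor types P."""
    return sum(t_seconds * u.ninstances * u.peak_flops for u in usages)


# Example: a 10-second segment using 2 CPU instances and 1 GPU instance.
segment = estimated_credit(10.0, [ResourceUsage(2, 1e9), ResourceUsage(1, 5e9)])
```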
     92
     93The '''recent estimated credit''' REC(P) of a project P
     94is maintained by the client,
     95with an averaging half-life of, say, a month.
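
One way the client might maintain REC(P) is an exponentially decayed running total, sketched below. The text does not specify the averaging method, so the decay scheme, the function name `update_rec`, and the 30-day constant are all assumptions made for illustration.

```python
SECONDS_PER_DAY = 86400.0
# "say, a month": 30 days is an assumed value, not one fixed by the text.
HALF_LIFE = 30 * SECONDS_PER_DAY


def update_rec(rec: float, new_credit: float, dt: float,
               half_life: float = HALF_LIFE) -> float:
    """Decay the old total over dt seconds, then add credit estimated for that interval.

    Assumed scheme: after one half-life with no new work, REC halves.
    """
    decay = 0.5 ** (dt / half_life)
    return rec * decay + new_credit
```

Because the priorities use REC only after normalizing across projects, a decayed total and a decayed average would rank projects the same way under this scheme.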
    9896
    9997=== Work fetch ===
    10098
    101 In addition to G(P), the work fetch policy also needs to take into account
     99The work fetch policy also needs to take into account
    102100the amount of work currently queued for a project,
    103101so that it doesn't keep getting work from the same project.
     
    107105We then define the work-fetch priority of a project as
    108106{{{
    109 WFP(P) = share(P) - (RAC(P) + A*RAF(P) + B*Q(P))/(1+A+B)
     107WFP(P) = share(P) - (RAF(P) + A*Q(P))/(1+A)
    110108}}}
    111109
    112 where A and B are parameters, probably around 1.
     110where A is a parameter, probably around 1.
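
A minimal sketch of the v5 work-fetch priority follows. It assumes that share, RAF and Q are each normalized to sum to 1 across projects before being combined; the helper names `normalize` and `work_fetch_priority` are hypothetical, and the formula's RAF(P) term is taken as written.

```python
from typing import Dict


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values so they sum to 1 across projects (all zeros if the total is 0)."""
    total = sum(values.values())
    if total == 0:
        return {name: 0.0 for name in values}
    return {name: v / total for name, v in values.items()}


def work_fetch_priority(share: Dict[str, float], raf: Dict[str, float],
                        queued: Dict[str, float], a: float = 1.0) -> Dict[str, float]:
    """WFP(P) = share(P) - (RAF(P) + A*Q(P)) / (1 + A), on normalized inputs."""
    s, r, q = normalize(share), normalize(raf), normalize(queued)
    return {p: s[p] - (r[p] + a * q[p]) / (1 + a) for p in share}


# Two projects with equal share; "x" has done three times as much recent work.
wfp = work_fetch_priority({"x": 1, "y": 1}, {"x": 3, "y": 1}, {"x": 0, "y": 0})
```

In this example the project with less recent work ("y") gets the higher priority, so work would be fetched from it first.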
    113111
    114112=== Job scheduling ===
     
    121119The job-scheduling priority is then
    122120{{{
    123 JSP(P) = share(P) - (RAC(P) + C*RAF(P))
     121JSP(P) = share(P) - B*RAF(P)
    124122}}}
    125 where C is a parameter, probably around 1.
     123where B is a parameter, probably around 1.
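
The job-scheduling priority can be sketched the same way. As before, this assumes share and RAF are normalized to sum to 1 across projects; `normalize_shares` and `job_sched_priority` are hypothetical names.

```python
from typing import Dict


def normalize_shares(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values so they sum to 1 across projects (all zeros if the total is 0)."""
    total = sum(values.values())
    if total == 0:
        return {name: 0.0 for name in values}
    return {name: v / total for name, v in values.items()}


def job_sched_priority(share: Dict[str, float], raf: Dict[str, float],
                       b: float = 1.0) -> Dict[str, float]:
    """JSP(P) = share(P) - B*RAF(P), on normalized inputs."""
    s, r = normalize_shares(share), normalize_shares(raf)
    return {p: s[p] - b * r[p] for p in share}


# Equal shares; project "x" has done three times as much recent work as "y".
jsp = job_sched_priority({"x": 1, "y": 1}, {"x": 3, "y": 1})
```

With B = 1, a project whose normalized RAF equals its normalized share has priority 0, and projects that are behind their share have positive priority.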