Changes between Version 1 and Version 2 of ClientSchedOctTen


Timestamp:
Oct 26, 2010, 1:56:02 PM (14 years ago)
Author:
davea
Comment:

--

  • ClientSchedOctTen

and gets clamped at the cutoff.
All information about the relative debt of B and C is lost.

=== The bottom line ===

The current approach, extending the STD/LTD model to multiple resource types,
seemed good at the time but turns out to be the wrong one.

== Proposal: credit-driven scheduling ==

The idea is to make resource share apply to credit.
If two projects have the same resource share,
they should have the same RAC (recent average credit).
This suggests the following principle,
which can apply to both work fetch and job scheduling:

 * Normalize RAC and resource share so that each one sums to 1 across projects.
 * For a project P, let G(P) = share(P) - RAC(P).
 * Give priority to projects for which G(P) is highest,
   i.e. those that aren't getting as much credit as they should.

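The principle above can be sketched in a few lines of Python.
This is only an illustration: the function names (`normalize`, `credit_gap`, `by_priority`)
and the sample numbers are hypothetical and not part of the BOINC client.
In the example, two projects have equal resource share but B has twice the RAC of A,
so G(A) is about +0.167, G(B) is about -0.167, and A gets priority.

```python
def normalize(values):
    """Scale a {project: value} mapping so that the values sum to 1."""
    total = sum(values.values())
    if total <= 0:
        # Assumption: with no data at all, every project counts as 0.
        return {p: 0.0 for p in values}
    return {p: v / total for p, v in values.items()}


def credit_gap(share, rac):
    """G(P) = share(P) - RAC(P), computed on normalized inputs."""
    s = normalize(share)
    r = normalize(rac)
    return {p: s[p] - r.get(p, 0.0) for p in s}


def by_priority(share, rac):
    """Projects ordered from highest G(P) to lowest."""
    g = credit_gap(share, rac)
    return sorted(g, key=lambda p: g[p], reverse=True)


# Hypothetical host attached to two projects with equal resource share.
share = {"A": 100, "B": 100}
rac = {"A": 500.0, "B": 1000.0}  # B grants (or has earned) twice as much credit

gap = credit_gap(share, rac)
order = by_priority(share, rac)
```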
This does two things:

 * It gives resource share the correct semantics:
   it now controls something that volunteers can actually see,
   namely credit.
 * It penalizes projects that grant inflated credit:
   the more credit a project grants, the less work a given host
   will do for it, assuming the host is attached to multiple projects.
   (The converse is potentially true: a project would get more work done
   by granting less credit.  This effect is limited by a mechanism described below.)

Note: I've glossed over the issue of the time scale over which RAC is averaged.
The RAC reported by servers has a half-life of a week.
For purposes of scheduling, a different (probably longer) period would be better.
The client could potentially compute its own RAC
based on changes in total credit.
However, it's probably OK to just use the server-reported RAC.

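If the client did compute its own RAC, one way to do it is sketched below.
This is a generic exponentially weighted average with a configurable half-life,
not necessarily the formula the servers use;
the class name `LocalRAC` and the 30-day default are assumptions made for the example.
At a constant credit rate, the estimate reaches half of that rate after one half-life.

```python
SECONDS_PER_DAY = 86400.0


class LocalRAC:
    """Client-side recent average credit (credit/day), derived from total credit."""

    def __init__(self, half_life_days=30.0):
        # Assumed default; the text only says "probably longer" than a week.
        self.half_life = half_life_days * SECONDS_PER_DAY
        self.rac = 0.0
        self.last_total = None
        self.last_time = None

    def update(self, total_credit, now):
        """Fold in a new (total credit, timestamp in seconds) observation."""
        if self.last_time is None:
            # First observation: nothing to difference against yet.
            self.last_total, self.last_time = total_credit, now
            return self.rac
        dt = now - self.last_time
        if dt <= 0:
            return self.rac
        # Credit per day earned over the interval since the last observation.
        rate = (total_credit - self.last_total) / (dt / SECONDS_PER_DAY)
        # Old estimate decays by half every half-life.
        decay = 0.5 ** (dt / self.half_life)
        self.rac = self.rac * decay + rate * (1.0 - decay)
        self.last_total, self.last_time = total_credit, now
        return self.rac


# Hypothetical project granting a steady 100 credits/day, polled once a day.
tracker = LocalRAC(half_life_days=30.0)
tracker.update(0.0, 0.0)
for day in range(1, 31):
    tracker.update(100.0 * day, day * SECONDS_PER_DAY)
```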
=== Recent average FLOPS ===

There are some problems with credit-driven scheduling:

 * There may be a long and variable delay between completing a job
   and getting credit for it.
 * Jobs may fail to get credit, e.g. because they don't validate.

To deal with these issues,
I propose using not just RAC by itself,
but the combination of RAC and '''recent average FLOPS''' (RAF) per project.
This is intended to address the above two issues,
and the issue of projects that grant too little credit.

=== Work fetch ===

In addition to G(P), the work fetch policy also needs to take into account
the amount of work currently queued for a project,
so that it doesn't keep getting work from the same project.
To accomplish this, we define Q(P) as the number of FLOPS currently queued for P,
normalized so that the sum over projects is 1.

We then define the work-fetch priority of a project as
{{{
WFP(P) = share(P) - (RAC(P) + A*RAF(P) + B*Q(P))/(1+A+B)
}}}

where A and B are parameters, probably around 1.

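A minimal Python sketch of the WFP formula follows.
It assumes that RAF is normalized across projects in the same way as RAC and Q
(the text does not say so explicitly), and the function names and sample values are hypothetical.
In the example the two projects are identical except that all queued work belongs to A,
so WFP(A) is about -0.167, WFP(B) is about +0.167, and the client would fetch from B.

```python
def normalize(values):
    """Scale a {project: value} mapping so that the values sum to 1."""
    total = sum(values.values())
    if total <= 0:
        # Assumption: with no data at all, every project counts as 0.
        return {p: 0.0 for p in values}
    return {p: v / total for p, v in values.items()}


def work_fetch_priority(share, rac, raf, queued, a=1.0, b=1.0):
    """WFP(P) = share(P) - (RAC(P) + A*RAF(P) + B*Q(P)) / (1 + A + B)."""
    s = normalize(share)
    r = normalize(rac)
    f = normalize(raf)      # assumption: RAF is normalized like RAC
    q = normalize(queued)
    return {
        p: s[p] - (r.get(p, 0.0) + a * f.get(p, 0.0) + b * q.get(p, 0.0)) / (1.0 + a + b)
        for p in s
    }


# Hypothetical host: equal share, RAC and RAF, but only A has work queued.
share = {"A": 1.0, "B": 1.0}
rac = {"A": 200.0, "B": 200.0}
raf = {"A": 1e9, "B": 1e9}
queued = {"A": 5e13, "B": 0.0}

wfp = work_fetch_priority(share, rac, raf, queued)
fetch_from = max(wfp, key=lambda p: wfp[p])
```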
=== Job scheduling ===

As the job scheduling policy picks jobs to run (e.g. on a multiprocessor)
it needs to take into account the jobs already scheduled,
so that it doesn't always schedule multiple jobs from the same project.
To accomplish this, as each job is scheduled we update
RAF(P) as if the job had run for one scheduling period.
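
The loop described above might look roughly like the following Python sketch.
Several things here are assumptions rather than part of the proposal:
the scheduling priority share(P) - (RAC(P) + A*RAF(P))/(1+A),
the way RAF is bumped (job speed times the scheduling period, spread over an averaging window),
and all names and numbers.
With two otherwise identical projects and four processors,
the update makes the picks alternate A, B, A, B instead of taking every job from A.

```python
def normalize(values):
    """Scale a {project: value} mapping so that the values sum to 1."""
    total = sum(values.values())
    if total <= 0:
        return {p: 0.0 for p in values}
    return {p: v / total for p, v in values.items()}


def schedule(jobs, n_slots, share, rac, raf, a=1.0,
             period=3600.0, window=7 * 86400.0):
    """Greedily pick up to n_slots jobs.

    jobs: list of (name, project, flops_per_sec) tuples.
    raf:  {project: recent average FLOPS}, not yet normalized.
    period/window: hypothetical scheduling period and RAF averaging window (s).
    """
    raf = dict(raf)  # work on a copy; the caller's RAF is left untouched
    s = normalize(share)
    r = normalize(rac)
    pending = list(jobs)
    chosen = []
    while pending and len(chosen) < n_slots:
        f = normalize(raf)
        # Assumed priority: the credit gap with RAF mixed in.
        prio = {p: s[p] - (r.get(p, 0.0) + a * f.get(p, 0.0)) / (1.0 + a) for p in s}
        job = max(pending, key=lambda j: prio[j[1]])  # ties go to the earliest job
        pending.remove(job)
        chosen.append(job)
        _, project, speed = job
        # Update RAF(P) as if this job had already run for one scheduling period.
        raf[project] = raf.get(project, 0.0) + speed * period / window
    return chosen


# Hypothetical 4-CPU host with three runnable jobs from each of two projects.
share = {"A": 1.0, "B": 1.0}
rac = {"A": 200.0, "B": 200.0}
raf = {"A": 1e9, "B": 1e9}
jobs = [("a1", "A", 1e9), ("a2", "A", 1e9), ("a3", "A", 1e9),
        ("b1", "B", 1e9), ("b2", "B", 1e9), ("b3", "B", 1e9)]

picked = schedule(jobs, 4, share, rac, raf)
picked_projects = [project for _, project, _ in picked]
```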