Changes between Version 4 and Version 5 of ClientSchedOctTen

Timestamp: Oct 27, 2010, 1:53:18 PM
Author: davea
Comment: --

ClientSchedOctTen

-                      v4
+                      v5
 The idea is to make resource share apply to credit.
+If two projects have the same resource share,
+they should have the same RAC.
+This suggests the following principle,
+which can apply to both work fetch and job scheduling:
+If two projects have the same resource share, they should have the same RAC.
+Scheduling decisions should give preference to projects
+whose share of RAC is less than their resource share.
+ * Normalize RAC and resource share so that each one sums to 1 across projects.
+ * For a project P, let G(P) = share(P) - RAC(P).
+ * Give priority to projects for which G(P) is highest,
+   i.e. that aren't getting as much credit as they should.
+This does 2 things:
+ * It's the correct semantics for resource share:
+   they now control something that volunteers can actually see,
+   namely credit.
+ * It penalizes projects that grant inflated credit:
+   the more credit a project grants, the less work a given host
+   will do for it, assuming the host is attached to multiple projects.
+   (The converse is potentially true - a project would get more work done
+   by granting less credit.  This is minimized by a mechanism described below.)
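The normalization and G(P) ranking described above can be sketched as follows. This is a minimal illustration, not the client's actual code; the function names `normalize` and `credit_gap` and the sample numbers are hypothetical.

```python
def normalize(values):
    """Scale a {project: value} mapping so the values sum to 1."""
    total = sum(values.values())
    if total <= 0:
        # No data yet: treat every project as having zero weight.
        return {name: 0.0 for name in values}
    return {name: v / total for name, v in values.items()}


def credit_gap(shares, rac):
    """Return G(P) = share(P) - RAC(P) on normalized inputs."""
    s = normalize(shares)
    r = normalize(rac)
    return {p: s[p] - r.get(p, 0.0) for p in s}


# Hypothetical host attached to two projects with equal resource share.
shares = {"A": 100, "B": 100}
rac = {"A": 300.0, "B": 100.0}

gap = credit_gap(shares, rac)
# Projects with the highest G(P) are served first.
ranked = sorted(gap, key=gap.get, reverse=True)
```

Here project B has received less than its share of credit, so it ranks ahead of project A.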
+Note: I've glossed over the issue of the time scale over which RAC is averaged.
+The RAC reported by servers has a half-life of a week.
+For purposes of scheduling, a different (probably longer) period would be better.
+The client could potentially compute its own RAC
+based on changes in total credit.
+However, it's probably OK to just use the server-reported RAC.
+=== Recent average FLOPS ===
+There are some problems with credit-driven scheduling:
+There are problems with using project-granted credit
+as a basis for this approach:
  * There may be a long and variable delay between completing a job
 …
  * Jobs may fail to get credit, e.g. because they don't validate.
+To deal with these issues,
+I propose using not just RAC by itself,
+but the combination of RAC and '''recent average FLOPS''' (RAF) per project.
+This is intended to address the above 2 issues,
+and the issue of projects that grant too little credit.
+Hence we will use a surrogate called '''estimated credit'''
+that is maintained by the client.
+If projects grant credit fairly, and if all jobs validate,
+then estimated credit is roughly equal to granted credit over the long term.
+Note: there is a potential advantage to using granted credit too.
+Doing so penalizes projects that grant inflated credit:
+the more credit a project grants, the less work a given host
+will do for it, assuming the host is attached to multiple projects.
+(The converse is potentially also true - a project would get more work done
+by granting less credit.  This effect could be minimized by
+combining estimated credit with granted credit.)
+=== Estimated credit ===
+BOINC server software grants credit on the basis of peak FLOPS,
+with a scaling factor applied to GPUs to normalize them relative to CPUs.
+The normalized peak FLOPS of a GPU can be estimated.
+The estimated credit for a T-second segment of job execution is given by
+{{{
+T * ninstances(P) * peak_flops(P)
+}}}
+summed over the processor types used by the job.
+The '''recent estimated credit''' REC(P) of a project P
+is maintained by the client,
+with an averaging half-life of, say, a month.
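One way the client might accumulate estimated credit and decay it over time is sketched below. The per-segment formula follows the text above; the decay rule (a simple exponentially decayed sum), the 30-day half-life constant, and the names `segment_estimated_credit` and `update_rec` are assumptions made for illustration, and any scaling from FLOPs to credit units is omitted.

```python
HALF_LIFE_SECONDS = 30 * 86400  # assumed: "say, a month"


def segment_estimated_credit(t_seconds, usage):
    """Estimated credit for a T-second segment of job execution.

    usage is a list of (ninstances, peak_flops) pairs,
    one per processor type used by the job.
    """
    return sum(t_seconds * n * flops for n, flops in usage)


def update_rec(rec, new_credit, dt_seconds, half_life=HALF_LIFE_SECONDS):
    """Decay the running total by the elapsed time, then add new credit.

    Assumed update rule: after one half-life with no new work,
    the old total counts for half as much.
    """
    decay = 0.5 ** (dt_seconds / half_life)
    return rec * decay + new_credit


# Hypothetical job: 10 seconds on 2 CPU instances and 1 GPU instance.
credit = segment_estimated_credit(10, [(2, 1e9), (1, 5e9)])
rec = update_rec(0.0, credit, dt_seconds=10)
```

A decayed sum is only one possible averaging scheme; a rate-based average like the one servers use for RAC would serve the same purpose.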
 === Work fetch ===
 In addition to G(P), the work fetch policy also needs to take into account
+The work fetch policy also needs to take into account
 the amount of work currently queued for a project,
 so that it doesn't keep getting work from the same project.
 …
 We then define the work-fetch priority of a project as
 {{{
 WFP(P) = share(P) - (RAC(P) + A*RAF(P) + B*Q(P))/(1+A+B)
+WFP(P) = share(P) - (RAF(P) + A*Q(P))/(1+A)
 }}}
 where A and B are parameters, probably around 1.
+where A is a parameter, probably around 1.
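As a minimal sketch, the v5 work-fetch priority can be computed as below. The function name `work_fetch_priority` is hypothetical, and the inputs are assumed to be already normalized across projects in the same way as resource share.

```python
def work_fetch_priority(share, raf, queued, a=1.0):
    """WFP(P) = share(P) - (RAF(P) + A*Q(P)) / (1 + A)."""
    return share - (raf + a * queued) / (1.0 + a)


# Hypothetical project: half the resource share, a quarter of recent
# FLOPS, and a quarter of the queued work.
wfp = work_fetch_priority(share=0.5, raf=0.25, queued=0.25)
```

With A = 1 the two terms are weighted equally, so this project's priority is 0.25; a larger A makes queued work count more heavily against fetching more from the same project.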
 === Job scheduling ===
 …
 The job-scheduling priority is then
 {{{
 JSP(P) = share(P) - (RAC(P) + C*RAF(P))
+JSP(P) = share(P) - B*RAF(P)
 }}}
 where C is a parameter, probably around 1.
+where B is a parameter, probably around 1.
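The v5 job-scheduling priority could be applied as in the sketch below. The names `job_scheduling_priority` and `pick_project` and the sample values are hypothetical, and the inputs are assumed to be normalized across projects.

```python
def job_scheduling_priority(share, raf, b=1.0):
    """JSP(P) = share(P) - B*RAF(P)."""
    return share - b * raf


def pick_project(shares, raf, b=1.0):
    """Return the project with the highest job-scheduling priority."""
    return max(
        shares,
        key=lambda p: job_scheduling_priority(shares[p], raf[p], b),
    )


# Hypothetical host: equal shares, but project A has done more recent work.
shares = {"A": 0.5, "B": 0.5}
raf = {"A": 0.7, "B": 0.3}
chosen = pick_project(shares, raf)
```

Project B has done less than its share of recent work, so its jobs are chosen first.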