Context Navigation

Changes between Version 1 and Version 2 of ClientSchedOctTen

Timestamp:: Oct 26, 2010, 1:56:02 PM (15 years ago)
Author:: davea
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

ClientSchedOctTen

-                      v1
+                      v2
 and gets clamped at the cutoff.
 All information about the relative debt of B and C is lost.
+=== The bottom line ===
+The current approach - extending the STD/LTD model to multiple resource types -
+seemed good at the time but turns out to be the wrong way.
+== Proposal: credit-driven scheduling ==
+The idea is to make resource share apply to credit.
+If two projects have the same resource share,
+they should have the same RAC.
+This suggests the following principle,
+which can apply to both work fetch and job scheduling:
+ * Normalize RAC and resource share so that each one sums to 1 across projects.
+ * For a project P, let G(P) = share(P) - RAC(P).
+ * Give priority to projects for which G(P) is highest,
+   i.e. that aren't getting as much credit as they should.
+This does 2 things:
+ * It's the correct semantics for resource share:
+   they now control something that volunteers can actually see,
+   namely credit.
+ * It penalizes projects that grant inflated credit:
+   the more credit a project grants, the less work a given host
+   will do for it, assuming the host is attached to multiple projects.
+   (The converse is potentially true - a project would get more work done
+   by granting less credit.  This is minimized by a mechanism described below.)
+Note: I've glossed over the issue of the time scale over which RAC is averaged.
+The RAC reported by servers has a half-life of a week.
+For purposes of scheduling a different (probably longer) period would be better.
+The client could potentially compute its own RAC
+based on changes in total credit.
+However, it's probably OK to just use the server-reported RAC.
+=== Recent average FLOPS ===
+There are some problems with credit-driven scheduling:
+ * There may be a long and variable delay between completing a job
+   and getting credit for it.
+ * Jobs may fail to get credit, e.g. because they don't validate.
+To deal with these issues,
+I propose using not just RAC by itself,
+but the combination of RAC and '''recent average FLOPS''' (RAF) per project.
+This is intended to address the above 2 issues,
+and the issue of projects that grant too little credit.
+=== Work fetch ===
+In addition to G(P), the work fetch policy also needs to take into account
+the amount of work currently queued for a project,
+so that it doesn't keep getting work from the same project.
+To accomplish this, we define Q(P) as the number of FLOPS currently queued for P,
+normalized so that the sum over projects is 1.
+We then define the work-fetch priority of a project as
+{{{
+WFP(P) = share(P) - (RAC(P) + A*RAF(P) + B*Q(P))/(1+A+B)
+}}}
+Where A and B are parameters, probably around 1.
+=== Job scheduling ===
+As the job scheduling policy picks jobs to run (e.g. on a multiprocessor)
+it needs to take into account the jobs already scheduled,
+so that it doesn't always schedule multiple jobs from the same project.
+To accomplish this, as each job is scheduled we update
+RAF(P) as if the job had run for one scheduling period.