= Client scheduling changes =

Design document for changes to the client work fetch and job scheduling policies,
started Oct 2010.

This supersedes the following design docs:
* GpuWorkFetch
* GpuSched
* ClientSched

== Problems with current system ==

The current policies, described [GpuWorkFetch here],
maintain a long-term debt (LTD) and a short-term debt (STD) for each
(project, resource type) pair.

Job scheduling for a given resource type is based on STD:
projects with greater STD for that resource are given priority.
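
A rough sketch of this rule is shown below; the struct and field names are invented for illustration and are not the actual client code:
{{{
#!cpp
// Illustrative sketch only: pick the runnable project with the greatest
// short-term debt (STD) for a given resource type.
#include <vector>

struct PROJECT {
    double std_per_rsc[2];   // STD for each resource type (0 = CPU, 1 = GPU)
    bool runnable[2];        // does the project have a runnable job for the resource?
};

// Return the project to schedule next on the given resource,
// or nullptr if no project has a runnable job for it.
PROJECT* choose_project(std::vector<PROJECT*>& projects, int rsc_type) {
    PROJECT* best = nullptr;
    for (size_t i = 0; i < projects.size(); i++) {
        PROJECT* p = projects[i];
        if (!p->runnable[rsc_type]) continue;
        if (!best || p->std_per_rsc[rsc_type] > best->std_per_rsc[rsc_type]) {
            best = p;
        }
    }
    return best;
}
}}}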

Work fetch is based on a weighted sum of LTDs.
Work is typically fetched from the project for which this sum is greatest,
and is typically requested for all resource types.
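
Again as an illustrative sketch, with invented field names and weights rather than the real client data structures, the selection might look like:
{{{
#!cpp
// Illustrative sketch only: choose the project to ask for work by
// comparing a weighted sum of long-term debts (LTDs) across resource types.
#include <vector>

struct PROJECT {
    double ltd_per_rsc[2];   // LTD for each resource type (0 = CPU, 1 = GPU)
    bool contactable;        // not suspended, not backed off, etc.
};

// rsc_weight[i]: relative importance of resource i (e.g. its peak speed).
PROJECT* choose_fetch_project(
    std::vector<PROJECT*>& projects, const double rsc_weight[2]
) {
    PROJECT* best = nullptr;
    double best_sum = 0;
    for (size_t i = 0; i < projects.size(); i++) {
        PROJECT* p = projects[i];
        if (!p->contactable) continue;
        double sum = 0;
        for (int r = 0; r < 2; r++) {
            sum += rsc_weight[r] * p->ltd_per_rsc[r];
        }
        if (!best || sum > best_sum) {
            best = p;
            best_sum = sum;
        }
    }
    // Work would then typically be requested from 'best' for all resource types.
    return best;
}
}}}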

These policies fail to meet their goals in many cases.
Here are two scenarios that illustrate the underlying problems:

=== Example 1 ===

A host has a fast GPU and a slow CPU.
Project A has apps for both GPU and CPU.
Project B has apps only for the CPU.
The two projects have equal resource shares.

In the current system, each project will get 50% of the CPU.
The target behavior, which matches resource shares better,
is that project B gets 100% of the CPU
and project A gets 100% of the GPU.
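
To make the mismatch concrete, here is a small worked comparison; the device speeds (100 GFLOPS GPU, 10 GFLOPS CPU) are invented purely for illustration:
{{{
#!cpp
// Illustrative arithmetic only; the device speeds are made up.
#include <cstdio>

int main() {
    double gpu = 100, cpu = 10;   // hypothetical GFLOPS

    // Current policy: A gets the GPU plus half the CPU; B gets half the CPU.
    double a_cur = gpu + 0.5 * cpu;   // 105
    double b_cur = 0.5 * cpu;         // 5

    // Target policy: A gets the GPU; B gets the whole CPU.
    double a_tgt = gpu;               // 100
    double b_tgt = cpu;               // 10

    printf("current: B gets %.1f%% of total throughput\n",
        100 * b_cur / (a_cur + b_cur));   // ~4.5%
    printf("target:  B gets %.1f%% of total throughput\n",
        100 * b_tgt / (a_tgt + b_tgt));   // ~9.1%
    return 0;
}
}}}
With these made-up numbers, neither allocation gives B its nominal 50% of total throughput, but the target allocation roughly doubles B's share while costing A nothing on the GPU.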

=== Example 2 ===

Same host as in Example 1,
with an additional project C that has only CPU apps.

In this case A's CPU LTD stays around zero,
while the CPU LTDs of B and C go unboundedly negative
until they are clamped at the cutoff.
All information about the relative debt of B and C is lost.
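
The effect of the cutoff can be seen in a tiny sketch; the cutoff and debt values below are invented for illustration:
{{{
#!cpp
// Illustrative sketch only: once two debts are both clamped at the cutoff,
// their relative ordering is gone.
#include <algorithm>
#include <cstdio>

const double LTD_CUTOFF = -86400.0 * 10;   // hypothetical cutoff

int main() {
    double ltd_b = -86400.0 * 12;   // B's CPU LTD, already below the cutoff
    double ltd_c = -86400.0 * 25;   // C's CPU LTD, much further below it

    // Both are clamped to the same value...
    ltd_b = std::max(ltd_b, LTD_CUTOFF);
    ltd_c = std::max(ltd_c, LTD_CUTOFF);

    // ...so the scheduler can no longer tell B and C apart.
    printf("B: %f  C: %f\n", ltd_b, ltd_c);   // identical
    return 0;
}
}}}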