== Problems with the current work fetch policy ==

The current work-fetch policy is essentially:
* Do a weighted round-robin simulation, computing overall CPU shortfall
* If there's a shortfall, request work from the project with highest LTD

The scheduler request has a single number "work_req_seconds"
indicating the total duration of jobs being requested.
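
To make this concrete, here is a minimal C++ sketch of the current policy. All names (rr_simulation, request_work, the PROJECT fields) are illustrative stand-ins, not the actual client code.

{{{
#include <vector>

// Illustrative stand-in; the real client structure differs.
struct PROJECT {
    double long_term_debt;   // LTD
};

std::vector<PROJECT*> projects;

// Weighted round-robin simulation of the job queue (details omitted);
// returns the overall CPU shortfall, in seconds.
double rr_simulation() { return 0; }

// Send a scheduler request with work_req_seconds = req_seconds.
void request_work(PROJECT*, double req_seconds) {}

void work_fetch() {
    double shortfall = rr_simulation();
    if (shortfall <= 0) return;     // no CPU shortfall: fetch nothing

    // Otherwise ask the project with the highest LTD for work.
    PROJECT* best = 0;
    for (unsigned int i = 0; i < projects.size(); i++) {
        if (!best || projects[i]->long_term_debt > best->long_term_debt) {
            best = projects[i];
        }
    }
    if (best) request_work(best, shortfall);
}
}}}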

This policy has various problems.

* There's no way for the client to say "I have N idle CPUs, so send me enough jobs to use them all".

And many problems related to GPUs:

* There may be no CPU shortfall, but GPUs are idle; no work will be fetched.

* If a GPU is idle, we should get work from a project that potentially has jobs for it.

* If a project has both CPU and GPU jobs, we may need to tell it to send only GPU (or only CPU) jobs.

* LTD is computed solely on the basis of CPU time used, so it doesn't provide a meaningful comparison between projects that use only GPUs, or between a GPU project and a CPU project.

This document proposes a work-fetch system that solves these problems.

For simplicity, the design assumes that there is only one GPU type (CUDA).
It is straightforward to extend the design to handle additional GPU types.

== Terminology ==

A job sent to a client is associated with an app version,
which uses some number (possibly fractional) of CPUs and CUDA devices.

* A '''CPU job''' is one that uses only CPU.
* A '''CUDA job''' is one that uses CUDA (and may use CPU as well).
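
To illustrate these definitions, here is a hypothetical C++ fragment; the field names (avg_ncpus, ncudas) are assumptions for this sketch, not necessarily the client's actual structures.

{{{
// Per-app-version resource usage (illustrative field names).
struct APP_VERSION {
    double avg_ncpus;   // CPUs used; may be fractional, e.g. 0.5
    double ncudas;      // CUDA devices used; 0 for a CPU job
};

// The definitions above, expressed as predicates:
bool is_cpu_job(APP_VERSION& av)  { return av.ncudas == 0; }
bool is_cuda_job(APP_VERSION& av) { return av.ncudas > 0; }   // may use CPU too
}}}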

== Scheduler request ==

New fields in the scheduler request message:

'''double cpu_req_seconds''': number of CPU seconds requested

'''double cuda_req_seconds''': number of CUDA seconds requested

'''double ninstances_cpu''': send enough jobs to occupy this many CPUs

'''double ninstances_cuda''': send enough jobs to occupy this many CUDA devices

For compatibility with old servers, the message still has '''work_req_seconds''';
this is the maximum of cpu_req_seconds and cuda_req_seconds.
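
A sketch of these fields as a C++ struct, including the backward-compatibility rule (the struct and method names are illustrative, not the actual request code):

{{{
#include <algorithm>

struct WORK_FETCH_REQUEST {
    double cpu_req_seconds;    // CPU seconds requested
    double cuda_req_seconds;   // CUDA seconds requested
    double ninstances_cpu;     // occupy this many CPUs
    double ninstances_cuda;    // occupy this many CUDA devices

    // Old servers look only at work_req_seconds.
    double work_req_seconds() {
        return std::max(cpu_req_seconds, cuda_req_seconds);
    }
};
}}}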

== Client ==

New abstraction: '''processing resource''' or PRSC.
There are two processing resource types: CPU and CUDA.
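
One possible shape for this abstraction, as an illustrative C++ fragment (the names and fields are assumptions for this sketch; the per-PRSC state is not shown):

{{{
// Illustrative PRSC abstraction; two resource types.
enum PRSC_TYPE { PRSC_CPU, PRSC_CUDA };

struct PRSC {
    PRSC_TYPE type;
    double ninstances;   // CPUs or CUDA devices on this host
};

// e.g. a host with 4 CPUs and one CUDA device:
PRSC prscs[2] = {
    { PRSC_CPU,  4 },
    { PRSC_CUDA, 1 },
};
}}}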
| 58 | |
| 59 | Each PRSC has its own |
| 60 | |