Changes between Version 26 and Version 27 of GpuWorkFetch


Timestamp: Jan 27, 2009, 11:51:36 AM
Author: davea

 * LTD is computed solely on the basis of CPU time used, so it doesn't provide a meaningful comparison between projects that use only GPUs, or between GPU and CPU projects.

== Examples ==

In the following, A and B are projects.

=== Example 1 ===

Suppose that:
 * A has only GPU jobs and B has both GPU and CPU jobs.
 * The host is attached to A and B with equal resource shares.
 * The host's GPU is twice as fast as its CPU.

The target behavior is:
 * the CPU is used 100% by B
 * the GPU is used 75% by A and 25% by B

This provides equal total processing to A and B.

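The arithmetic behind Example 1 can be checked directly. This is a Python sketch, not BOINC code; the work rates (1 unit/sec for the CPU, 2 for the twice-as-fast GPU) are just a convenient normalization.

```python
# Example 1 check: GPU is twice as fast as the CPU.
CPU_SPEED = 1.0   # work units per second (normalized)
GPU_SPEED = 2.0

def total_processing(cpu_frac, gpu_frac):
    # Work per second a project gets from its fractional share of each device.
    return cpu_frac * CPU_SPEED + gpu_frac * GPU_SPEED

work_a = total_processing(cpu_frac=0.0, gpu_frac=0.75)  # A: 75% of the GPU
work_b = total_processing(cpu_frac=1.0, gpu_frac=0.25)  # B: all CPU + 25% GPU

assert work_a == work_b == 1.5  # equal total processing
```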
=== Example 2 ===

A has a 1-year CPU job with no slack, so it runs in high-priority mode.
B has jobs available.

Goal: after A's job finishes, B gets the CPU for a year.

Variation: a new project C is attached when A's job finishes.
It should immediately share the CPU with B.

=== Example 3 ===

A has GPU jobs but B doesn't.
After a year, B gets a GPU app.

Goal: A and B immediately share the GPU.

== Resource types ==

New abstraction: '''processing resource type''', or just "resource type".
Examples of resource types:
 * CPU
 * A coprocessor type (a kind of GPU, or the SPE processors in a Cell)

A job sent to a client is associated with an app version,

and some number of instances of a particular coprocessor type.

== Scheduler request and reply message ==

 '''double req_instances''':: send enough jobs to occupy this many instances

The semantics: a scheduler should send jobs for a resource type
only if the request for that type is nonzero.

For compatibility with old servers, the message still has '''work_req_seconds''',
which is the max of the req_seconds.

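A sketch of these request semantics in Python. The dictionary layout and the helper names are hypothetical (the real message is XML built by the C++ client); only `req_instances` and `work_req_seconds` come from the text.

```python
# Hypothetical in-memory form of the scheduler request; the real client
# serializes this as XML.  Only req_instances and work_req_seconds are
# named in the design text.
def build_request(per_resource):
    """per_resource maps resource type -> (req_seconds, req_instances)."""
    req = {"resources": per_resource}
    # Backward compatibility: old servers look only at work_req_seconds,
    # so set it to the max of the per-resource req_seconds.
    req["work_req_seconds"] = max(
        (secs for secs, _insts in per_resource.values()), default=0.0
    )
    return req

def should_send_jobs(req, rsc_type):
    # A scheduler sends jobs for a resource type only if its request is nonzero.
    secs, insts = req["resources"].get(rsc_type, (0.0, 0.0))
    return secs > 0 or insts > 0

req = build_request({"cpu": (3600.0, 2.0), "cuda": (0.0, 0.0)})
assert req["work_req_seconds"] == 3600.0
assert should_send_jobs(req, "cpu") and not should_send_jobs(req, "cuda")
```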
== Per-resource-type backoff ==

We need to handle the situation where e.g. there's a GPU shortfall

we may ask it for resource B as well, even if it's backed off for B.

== Long-term debt ==

We continue to use the idea of '''long-term debt''' (LTD),

 * There is a separate LTD for each resource type
 * The "overall LTD", used in the work-fetch decision, is the sum of the resource LTDs, weighted by the speed of the resource (FLOPs per instance-second).

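The weighted sum can be written out as a short sketch. The speed values below are illustrative assumptions, not numbers from the document.

```python
# Overall LTD = sum of per-resource LTDs, weighted by resource speed
# (FLOPs per instance-second).  Speeds here are illustrative.
def overall_ltd(debts, speeds):
    return sum(debts[rsc] * speeds[rsc] for rsc in debts)

speeds = {"cpu": 2e9, "cuda": 5e10}     # assumed peak FLOPs/instance-sec
debts = {"cpu": 100.0, "cuda": -10.0}   # per-resource LTD, in seconds

# A fast resource dominates: 10 seconds of GPU debt outweighs
# 100 seconds of CPU debt here.
assert overall_ltd(debts, speeds) == 100.0 * 2e9 - 10.0 * 5e10
```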
Per-resource LTD is maintained as follows:

== Client data structures ==

=== RSC_WORK_FETCH ===

Work-fetch state for a particular resource type.
Data members:

 '''ninstances''':: number of instances of this resource type

 '''double nidle''':: number of currently idle instances

Member functions:

 '''rr_init()''':: called at the start of RR simulation.  Compute project shares for this resource type, and clear overall and per-project shortfalls.
 '''set_nidle()''':: called by RR sim after initial job assignment.
Set nidle to # of idle instances.
 '''accumulate_shortfall()''':: called by RR sim for each time interval during work buf period.
{{{
shortfall += dt*(ninstances - instances in use)

}}}

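A runnable sketch of RSC_WORK_FETCH's shortfall accounting, following the formula above (Python stand-in for the C++ class; the simulator is assumed to pass in the instances currently in use):

```python
class RscWorkFetch:
    """Sketch of RSC_WORK_FETCH's shortfall accounting."""
    def __init__(self, ninstances):
        self.ninstances = ninstances
        self.shortfall = 0.0
        self.nidle = 0.0

    def rr_init(self):
        # Called at the start of RR simulation.
        self.shortfall = 0.0

    def set_nidle(self, instances_in_use):
        # Called after the initial job assignment.
        self.nidle = max(0.0, self.ninstances - instances_in_use)

    def accumulate_shortfall(self, dt, instances_in_use):
        # Idle instance-seconds over this interval add to the shortfall.
        self.shortfall += dt * (self.ninstances - instances_in_use)

gpu = RscWorkFetch(ninstances=2)
gpu.rr_init()
gpu.accumulate_shortfall(dt=10.0, instances_in_use=1.5)  # half a GPU idle
assert gpu.shortfall == 5.0
```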
=== RSC_PROJECT_WORK_FETCH ===

State for a (resource type, project) pair.
It has the following "persistent" members (i.e., saved in the state file):

 '''backoff_interval''':: how long to wait before asking this project for work specifically for this resource type;
doubled any time we ask for work for this resource and get none (maximum 24 hours); cleared when we ask for work for this resource and get a job.
 '''backoff_time''':: back off until this time
 '''debt''':: long-term debt

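The backoff policy just described might look like this (a Python sketch; the initial interval is an assumption, since the text gives only the doubling rule and the 24-hour cap):

```python
MAX_BACKOFF = 24 * 3600.0   # 24-hour cap, per the text
INIT_BACKOFF = 60.0         # assumed starting interval (not specified)

class RscProjectBackoff:
    """Sketch of the per-(project, resource) backoff members."""
    def __init__(self):
        self.backoff_interval = 0.0
        self.backoff_time = 0.0

    def request_failed(self, now):
        # Asked for work for this resource and got none: double the interval.
        self.backoff_interval = min(
            max(2 * self.backoff_interval, INIT_BACKOFF), MAX_BACKOFF
        )
        self.backoff_time = now + self.backoff_interval

    def request_succeeded(self):
        # Got a job for this resource: clear the backoff.
        self.backoff_interval = 0.0
        self.backoff_time = 0.0

b = RscProjectBackoff()
b.request_failed(now=0.0)
b.request_failed(now=60.0)
assert b.backoff_interval == 120.0 and b.backoff_time == 180.0
b.request_succeeded()
assert b.backoff_time == 0.0
```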
And the following transient members (used by rr_simulation()):

 '''double runnable_share''':: # of instances this project should get based on resource share
relative to the set of projects not backed off for this resource type.
 '''instances_used''':: # of instances currently being used

=== PROJECT_WORK_FETCH ===

Per-project work fetch state.
Members:
 '''overall_debt''':: weighted sum of per-resource debts

=== WORK_FETCH ===

Overall work-fetch state.

=== Pseudo-code ===

The top-level function is:
{{{
WORK_FETCH::choose_project()
rr_simulation()

}}}

{{{
for each resource type R
   for each project P
      if P is not backed off for R
         P.R.LTD += share
   for each running job J, project P
      for each resource R used by J
         P.R.LTD -= share*dt
}}}

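The debt-accounting loop above, as a Python sketch. It assumes the accrual term, like the usage term, is applied per time slice dt (the pseudocode writes the accrual without dt); the data layout is invented for illustration.

```python
def update_debts(projects, running_jobs, dt):
    """
    projects: {name: {rsc: {"debt": float, "share": float, "backed_off": bool}}}
    running_jobs: list of (project_name, [resource types the job uses])
    """
    for rscs in projects.values():
        for state in rscs.values():
            if not state["backed_off"]:
                # Accrue debt at the project's share of this resource
                # (assumed scaled by dt; the pseudocode omits the factor).
                state["debt"] += state["share"] * dt
    for pname, used in running_jobs:
        for rsc in used:
            # Running jobs pay down the debt for each resource they use.
            projects[pname][rsc]["debt"] -= projects[pname][rsc]["share"] * dt

projects = {
    "A": {"cpu": {"debt": 0.0, "share": 0.5, "backed_off": False}},
    "B": {"cpu": {"debt": 0.0, "share": 0.5, "backed_off": False}},
}
update_debts(projects, [("A", ["cpu"])], dt=10.0)
assert projects["A"]["cpu"]["debt"] == 0.0  # accrued 5, paid down 5
assert projects["B"]["cpu"]["debt"] == 5.0
```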
=== RR simulation ===

{{{
cpu_work_fetch.rr_init()
cuda_work_fetch.rr_init()

compute initial assignment of jobs
cpu_work_fetch.set_nidle();
cuda_work_fetch.set_nidle();

do simulation as current
on completion of an interval dt
   cpu_work_fetch.accumulate_shortfall(dt)
   cuda_work_fetch.accumulate_shortfall(dt)
}}}

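The simulation skeleton above can be exercised with stub resource objects (a Python sketch; the instance counts and job assignment are invented numbers, and the real simulator derives instances-in-use from the job mix):

```python
class StubRscWorkFetch:
    """Minimal stand-in for cpu_work_fetch / cuda_work_fetch."""
    def __init__(self, ninstances):
        self.ninstances = ninstances
        self.shortfall = 0.0
        self.nidle = 0.0

    def rr_init(self):
        self.shortfall = 0.0

    def set_nidle(self, in_use):
        self.nidle = max(0.0, self.ninstances - in_use)

    def accumulate_shortfall(self, dt, in_use):
        self.shortfall += dt * (self.ninstances - in_use)

cpu_work_fetch = StubRscWorkFetch(4)
cuda_work_fetch = StubRscWorkFetch(1)
for rsc in (cpu_work_fetch, cuda_work_fetch):
    rsc.rr_init()

# Invented initial assignment: 3 CPU jobs running, 1 GPU job running.
cpu_work_fetch.set_nidle(in_use=3)
cuda_work_fetch.set_nidle(in_use=1)

# Two simulated intervals within the work-buffer period.
for dt in (60.0, 60.0):
    cpu_work_fetch.accumulate_shortfall(dt, in_use=3)
    cuda_work_fetch.accumulate_shortfall(dt, in_use=1)

assert cpu_work_fetch.nidle == 1.0
assert cpu_work_fetch.shortfall == 120.0   # one idle CPU for 120 s
assert cuda_work_fetch.shortfall == 0.0    # GPU fully busy
```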
=== Work fetch ===

=== Handling scheduler reply ===

The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.

This design does not accommodate:

 * jobs that use more than one coprocessor type
 * jobs that change their resource usage dynamically (e.g. coprocessor jobs that decide to use the CPU instead).