Changes between Version 11 and Version 12 of GpuWorkFetch

Timestamp: Dec 26, 2008, 12:29:13 PM
Author: davea
Comment: --

GpuWorkFetch

-                      v11
+                      v12
 indicating the total duration of jobs being requested.
-This policy has various problems.  First:
+This policy has some problems.  First:
  * There's no way for the client to say "I have N idle CPUs; send me enough jobs to use them all".
-Problems related to GPUs:
+And various problems related to GPUs:
  * If there is no CPU shortfall, no work will be fetched even if GPUs are idle.
 …
 == Client ==
 New abstraction: '''processing resource''' or PRSC.
 There are two processing resource types: CPU and CUDA.
-=== Per-resource-type backoff
+=== Per-resource-type backoff ===
 We need to handle the situation where there's a GPU shortfall
 …
 it's cleared whenever we get a job of that type.
-==- Work-fetch state ==
+=== Work-fetch state ===
 Each PRSC has its own set of data related to work fetch.
 This is stored in an object of class PRSC_WORK_FETCH.
-Data members of PRSC_WORK_FETCH:
+Data members of PRSC_WORK_FETCH (set by rr_simulation()):
 '''double shortfall''': shortfall for this resource
+'''double max_nidle''': number of idle instances
+'''double nidle''': number of currently idle instances
 Member functions of PRSC_WORK_FETCH:
+'''clear()''': called at the start of RR simulation
+'''rr_init()''': called at the start of RR simulation.
+Compute share of each project for this PRSC,
+and clear shortfall.
+------------
 '''prepare()''': called before exists_fetchable_project().
 Checks whether there is a project to request work from for this resource, and caches it.
 …
 Each PRSC also needs to have some per-project data.
 This is stored in an object of class PRSC_PROJECT_DATA.
+Its members include (* means save in state file):
+It has the following "persistent" members (i.e., saved in state file):
+'''double long_term_debt*'''
+'''backoff timer'''*:  how long to wait until ask project for work specifically for this PRSC;
+double this any time we ask for work for this rsc and get none
+(maximum 24 hours).
+Clear it when we ask for work for this PRSC and get some job.
+And the following transient members (used by rr_simulation()):
 '''double shortfall'''
-'''int last_job'''*: last time we had a job from this proj using this rsc
-if the time is within last N days (30?)
-we assume that the project may possibly have jobs of that type
 '''bool runnable'''
 '''max deficit'''
-'''backoff timer'''*:  how long to wait until ask project for work only for this rsc
-double this any time we ask only for work for this rsc and get none
-(maximum 24 hours).
-Clear it when we have a job that uses the PRSC.
 '''double share''': # of instances this project should get based on RS
+'''double long_term_debt*'''
+'''instances_used''': # of instances currently being used
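
The per-project backoff timer described above (double on each empty reply, cap at 24 hours, clear when a job arrives) can be sketched roughly as follows. This is only an illustration, not the actual client code: the struct name `BackoffTimer`, its member names, and the 60-second starting interval are assumptions; only the doubling rule, the 24-hour maximum, and the clear-on-success rule come from the text.

```cpp
#include <algorithm>

// Assumed starting interval; the page does not specify one.
constexpr double kInitialBackoffSecs = 60.0;
// Maximum of 24 hours, as stated in the member description.
constexpr double kMaxBackoffSecs = 24.0 * 3600.0;

// Hypothetical per-project, per-PRSC backoff state.
struct BackoffTimer {
    double interval = 0.0;      // current backoff length in seconds; 0 means no backoff
    double next_allowed = 0.0;  // earliest time at which we may ask again

    // Called when we asked for work for this PRSC and got none.
    void on_no_work(double now) {
        interval = (interval > 0.0)
            ? std::min(interval * 2.0, kMaxBackoffSecs)
            : kInitialBackoffSecs;
        next_allowed = now + interval;
    }

    // Called when we asked for work for this PRSC and got some job.
    void on_got_work() {
        interval = 0.0;
        next_allowed = 0.0;
    }

    // True if the backoff has expired and a request may be made.
    bool may_request(double now) const {
        return now >= next_allowed;
    }
};
```

Keeping one such object per project and per PRSC would let a project that has CPU jobs but no CUDA jobs be backed off for CUDA requests only, which appears to be the situation the per-resource-type backoff is meant to handle.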
 === debt accounting ===
 …
 {{{
+cpu_work_fetch.rr_init()
+cuda_work_fetch.rr_init()
 do simulation as current
 on completion of an interval dt