Changes between Version 26 and Version 27 of GpuWorkFetch


Timestamp: Jan 27, 2009, 11:51:36 AM
Author: davea

 * LTD is computed solely on the basis of CPU time used, so it doesn't provide a meaningful comparison between projects that use only GPUs, or between GPU and CPU projects.

== Examples ==

In the following, A and B are projects.

=== Example 1 ===

Suppose that:
 * A has only GPU jobs and B has both GPU and CPU jobs.
 * The host is attached to A and B with equal resource shares.
 * The host's GPU is twice as fast as its CPU.

The target behavior is:
 * the CPU is used 100% by B
 * the GPU is used 75% by A and 25% by B

This provides equal total processing to A and B.

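The arithmetic behind Example 1 can be checked directly. This is a Python sketch, not BOINC code; the work rates (1 unit/sec for the CPU, 2 for the twice-as-fast GPU) are just a convenient normalization.

```python
# Example 1 check: GPU is twice as fast as the CPU.
CPU_SPEED = 1.0   # work units per second (normalized)
GPU_SPEED = 2.0

def total_processing(cpu_frac, gpu_frac):
    # Work per second a project gets from its fractional share of each device.
    return cpu_frac * CPU_SPEED + gpu_frac * GPU_SPEED

work_a = total_processing(cpu_frac=0.0, gpu_frac=0.75)  # A: 75% of the GPU
work_b = total_processing(cpu_frac=1.0, gpu_frac=0.25)  # B: all CPU + 25% GPU

assert work_a == work_b == 1.5  # equal total processing
```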
=== Example 2 ===

A has a 1-year CPU job with no slack, so it runs in high-priority mode.
B has jobs available.

Goal: after A's job finishes, B gets the CPU for a year.

Variation: a new project C is attached when A's job finishes.
It should immediately share the CPU with B.

=== Example 3 ===

A has GPU jobs but B doesn't.
After a year, B gets a GPU app.

Goal: A and B immediately share the GPU.

== Resource types ==

New abstraction: '''processing resource type''', or just "resource type".
Examples of resource types:
 * CPU
 * A coprocessor type (a kind of GPU, or the SPE processors in a Cell)

A job sent to a client is associated with an app version,

and some number of instances of a particular coprocessor type.

== Scheduler request and reply message ==

 '''double req_instances''':: send enough jobs to occupy this many instances

The semantics: a scheduler should send jobs for a resource type
only if the request for that type is nonzero.

For compatibility with old servers, the message still has '''work_req_seconds''',
which is the max of the req_seconds.

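A sketch of these request semantics in Python. The dictionary layout and the helper names are hypothetical (the real message is XML built by the C++ client); only `req_instances` and `work_req_seconds` come from the text.

```python
# Hypothetical in-memory form of the scheduler request; the real client
# serializes this as XML.  Only req_instances and work_req_seconds are
# named in the design text.
def build_request(per_resource):
    """per_resource maps resource type -> (req_seconds, req_instances)."""
    req = {"resources": per_resource}
    # Backward compatibility: old servers look only at work_req_seconds,
    # so set it to the max of the per-resource req_seconds.
    req["work_req_seconds"] = max(
        (secs for secs, _insts in per_resource.values()), default=0.0
    )
    return req

def should_send_jobs(req, rsc_type):
    # A scheduler sends jobs for a resource type only if its request is nonzero.
    secs, insts = req["resources"].get(rsc_type, (0.0, 0.0))
    return secs > 0 or insts > 0

req = build_request({"cpu": (3600.0, 2.0), "cuda": (0.0, 0.0)})
assert req["work_req_seconds"] == 3600.0
assert should_send_jobs(req, "cpu") and not should_send_jobs(req, "cuda")
```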
== Per-resource-type backoff ==

We need to handle the situation where e.g. there's a GPU shortfall

we may ask it for resource B as well, even if it's backed off for B.

== Long-term debt ==

We continue to use the idea of '''long-term debt''' (LTD),

 * There is a separate LTD for each resource type
 * The "overall LTD", used in the work-fetch decision, is the sum of the resource LTDs, weighted by the speed of the resource (FLOPs per instance-second).

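The weighted sum can be written out as a short sketch. The speed values below are illustrative assumptions, not numbers from the document.

```python
# Overall LTD = sum of per-resource LTDs, weighted by resource speed
# (FLOPs per instance-second).  Speeds here are illustrative.
def overall_ltd(debts, speeds):
    return sum(debts[rsc] * speeds[rsc] for rsc in debts)

speeds = {"cpu": 2e9, "cuda": 5e10}     # assumed peak FLOPs/instance-sec
debts = {"cpu": 100.0, "cuda": -10.0}   # per-resource LTD, in seconds

# A fast resource dominates: 10 seconds of GPU debt outweighs
# 100 seconds of CPU debt here.
assert overall_ltd(debts, speeds) == 100.0 * 2e9 - 10.0 * 5e10
```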
Per-resource LTD is maintained as follows:

== Client data structures ==

=== RSC_WORK_FETCH ===

Work-fetch state for a particular resource type.
Data members:

 '''ninstances''':: number of instances of this resource type

 '''double nidle''':: number of currently idle instances

Member functions:

 '''rr_init()''':: called at the start of RR simulation.  Compute project shares for this resource type, and clear overall and per-project shortfalls.
 '''set_nidle()''':: called by RR sim after initial job assignment.
Set nidle to # of idle instances.
 '''accumulate_shortfall()''':: called by RR sim for each time interval during work buf period.
{{{
shortfall += dt*(ninstances - instances in use)

}}}

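A runnable sketch of RSC_WORK_FETCH's shortfall accounting, following the formula above (Python stand-in for the C++ class; the simulator is assumed to pass in the instances currently in use):

```python
class RscWorkFetch:
    """Sketch of RSC_WORK_FETCH's shortfall accounting."""
    def __init__(self, ninstances):
        self.ninstances = ninstances
        self.shortfall = 0.0
        self.nidle = 0.0

    def rr_init(self):
        # Called at the start of RR simulation.
        self.shortfall = 0.0

    def set_nidle(self, instances_in_use):
        # Called after the initial job assignment.
        self.nidle = max(0.0, self.ninstances - instances_in_use)

    def accumulate_shortfall(self, dt, instances_in_use):
        # Idle instance-seconds over this interval add to the shortfall.
        self.shortfall += dt * (self.ninstances - instances_in_use)

gpu = RscWorkFetch(ninstances=2)
gpu.rr_init()
gpu.accumulate_shortfall(dt=10.0, instances_in_use=1.5)  # half a GPU idle
assert gpu.shortfall == 5.0
```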
=== RSC_PROJECT_WORK_FETCH ===

State for a (resource type, project) pair.
It has the following "persistent" members (i.e., saved in the state file):

 '''backoff_interval''':: how long to wait before asking this project for work specifically for this resource type;
doubled any time we ask for work for this resource and get none (maximum 24 hours); cleared when we ask for work for this resource and get a job.
 '''backoff_time''':: back off until this time
 '''debt''':: long-term debt

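The backoff policy just described might look like this (a Python sketch; the initial interval is an assumption, since the text gives only the doubling rule and the 24-hour cap):

```python
MAX_BACKOFF = 24 * 3600.0   # 24-hour cap, per the text
INIT_BACKOFF = 60.0         # assumed starting interval (not specified)

class RscProjectBackoff:
    """Sketch of the per-(project, resource) backoff members."""
    def __init__(self):
        self.backoff_interval = 0.0
        self.backoff_time = 0.0

    def request_failed(self, now):
        # Asked for work for this resource and got none: double the interval.
        self.backoff_interval = min(
            max(2 * self.backoff_interval, INIT_BACKOFF), MAX_BACKOFF
        )
        self.backoff_time = now + self.backoff_interval

    def request_succeeded(self):
        # Got a job for this resource: clear the backoff.
        self.backoff_interval = 0.0
        self.backoff_time = 0.0

b = RscProjectBackoff()
b.request_failed(now=0.0)
b.request_failed(now=60.0)
assert b.backoff_interval == 120.0 and b.backoff_time == 180.0
b.request_succeeded()
assert b.backoff_time == 0.0
```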
And the following transient members (used by rr_simulation()):

 '''double runnable_share''':: # of instances this project should get based on resource share
relative to the set of projects not backed off for this resource type.
 '''instances_used''':: # of instances currently being used

=== PROJECT_WORK_FETCH ===

Per-project work fetch state.
Members:
 '''overall_debt''':: weighted sum of per-resource debts

=== WORK_FETCH ===

Overall work-fetch state.

=== Pseudo-code ===

The top-level function is:
{{{
WORK_FETCH::choose_project()
rr_simulation()

}}}

{{{
for each resource type R
   for each project P
      if P is not backed off for R
         P.R.LTD += share
   for each running job J, project P
      for each resource R used by J
         P.R.LTD -= share*dt
}}}

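The debt-accounting loop above, as a Python sketch. It assumes the accrual term, like the usage term, is applied per time slice dt (the pseudocode writes the accrual without dt); the data layout is invented for illustration.

```python
def update_debts(projects, running_jobs, dt):
    """
    projects: {name: {rsc: {"debt": float, "share": float, "backed_off": bool}}}
    running_jobs: list of (project_name, [resource types the job uses])
    """
    for rscs in projects.values():
        for state in rscs.values():
            if not state["backed_off"]:
                # Accrue debt at the project's share of this resource
                # (assumed scaled by dt; the pseudocode omits the factor).
                state["debt"] += state["share"] * dt
    for pname, used in running_jobs:
        for rsc in used:
            # Running jobs pay down the debt for each resource they use.
            projects[pname][rsc]["debt"] -= projects[pname][rsc]["share"] * dt

projects = {
    "A": {"cpu": {"debt": 0.0, "share": 0.5, "backed_off": False}},
    "B": {"cpu": {"debt": 0.0, "share": 0.5, "backed_off": False}},
}
update_debts(projects, [("A", ["cpu"])], dt=10.0)
assert projects["A"]["cpu"]["debt"] == 0.0  # accrued 5, paid down 5
assert projects["B"]["cpu"]["debt"] == 5.0
```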
=== RR simulation ===

{{{
cpu_work_fetch.rr_init()
cuda_work_fetch.rr_init()

compute initial assignment of jobs
cpu_work_fetch.set_nidle();
cuda_work_fetch.set_nidle();

do simulation as current
on completion of an interval dt
   cpu_work_fetch.accumulate_shortfall(dt)
   cuda_work_fetch.accumulate_shortfall(dt)
}}}

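The simulation skeleton above can be exercised with stub resource objects (a Python sketch; the instance counts and job assignment are invented numbers, and the real simulator derives instances-in-use from the job mix):

```python
class StubRscWorkFetch:
    """Minimal stand-in for cpu_work_fetch / cuda_work_fetch."""
    def __init__(self, ninstances):
        self.ninstances = ninstances
        self.shortfall = 0.0
        self.nidle = 0.0

    def rr_init(self):
        self.shortfall = 0.0

    def set_nidle(self, in_use):
        self.nidle = max(0.0, self.ninstances - in_use)

    def accumulate_shortfall(self, dt, in_use):
        self.shortfall += dt * (self.ninstances - in_use)

cpu_work_fetch = StubRscWorkFetch(4)
cuda_work_fetch = StubRscWorkFetch(1)
for rsc in (cpu_work_fetch, cuda_work_fetch):
    rsc.rr_init()

# Invented initial assignment: 3 CPU jobs running, 1 GPU job running.
cpu_work_fetch.set_nidle(in_use=3)
cuda_work_fetch.set_nidle(in_use=1)

# Two simulated intervals within the work-buffer period.
for dt in (60.0, 60.0):
    cpu_work_fetch.accumulate_shortfall(dt, in_use=3)
    cuda_work_fetch.accumulate_shortfall(dt, in_use=1)

assert cpu_work_fetch.nidle == 1.0
assert cpu_work_fetch.shortfall == 120.0   # one idle CPU for 120 s
assert cuda_work_fetch.shortfall == 0.0    # GPU fully busy
```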
=== Work fetch ===

=== Handling scheduler reply ===

The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.

This design does not accommodate:

 * jobs that use more than one coprocessor type
 * jobs that change their resource usage dynamically (e.g. coprocessor jobs that decide to use the CPU instead).