Changes between Version 25 and Version 26 of GpuWorkFetch


Timestamp:
Jan 27, 2009, 10:54:31 AM (15 years ago)
Author:
davea
Comment:

--

  • GpuWorkFetch

    v25 v26  
    9292and so on.
    9393
    94 *** Question:  If we need to contact a project for tasks of two different types, and one of the backoffs is satisfied, do we ask for both types?
     94Note: if we decide to ask a project for work for resource A,
     95we may ask it for resource B as well, even if it's backed off for B.
    9596
    9697=== Long-term debt ===
    9798
    98 We'll continue to use the idea of '''long-term debt''' (LTD),
    99 which represents how much work (measured in device instance-seconds) is "owed" to each project P.
     99We continue to use the idea of '''long-term debt''' (LTD),
     100representing how much work (measured in device instance-seconds) is "owed" to each project P.
    100101This increases over time in proportion to P's resource share,
    101102and decreases as P uses resources.
    102 Simplified summary of the new policy: when we need work for a resource,
    103 we ask the project that may have that type of job and whose LTD is greatest.
    104 
    105 The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.
     103Simplified summary of the new policy: when we need work for a resource R,
     104we ask the project that is not backed off for R and whose LTD is greatest.
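The simplified policy above can be sketched as follows. This is only an illustration, not the client's actual code: the dictionary layout and the names `choose_project`, `ltd`, and `backed_off` are assumptions made for this example.

```python
def choose_project(projects, resource):
    """Pick the project to ask for work for `resource`.

    `projects` is a list of dicts with hypothetical keys:
      "name"       - project name
      "ltd"        - overall long-term debt
      "backed_off" - set of resource names the project is backed off for
    Returns the chosen project dict, or None if every project is backed off.
    """
    # Only projects that are not backed off for this resource are candidates.
    candidates = [p for p in projects if resource not in p["backed_off"]]
    if not candidates:
        return None
    # Among the candidates, ask the one that is owed the most work.
    return max(candidates, key=lambda p: p["ltd"])


projects = [
    {"name": "A", "ltd": 500.0, "backed_off": {"gpu"}},
    {"name": "B", "ltd": 200.0, "backed_off": set()},
    {"name": "C", "ltd": 100.0, "backed_off": set()},
]
chosen = choose_project(projects, "gpu")
```

Here project A has the greatest LTD but is backed off for the GPU, so B is chosen for GPU work.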
    106105
    107106The notion of LTD needs to span resources;
     
    121120 * The "overall LTD", which is used in the work-fetch decision, is the sum of the resource LTDs, weighted by the speed of the resource (FLOPs per instance-second).
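As a rough sketch of the weighted sum described in the last item, assuming per-resource LTDs and resource speeds are kept in dictionaries keyed by resource name (the function and parameter names are hypothetical):

```python
def overall_ltd(resource_ltd, flops_per_instance_sec):
    """Combine per-resource LTDs into the overall LTD.

    resource_ltd           - resource name -> LTD in instance-seconds
    flops_per_instance_sec - resource name -> speed of one instance (FLOPs/sec)
    Weighting by speed converts each term to a common unit (FLOPs).
    """
    return sum(
        resource_ltd[r] * flops_per_instance_sec[r] for r in resource_ltd
    )


# Illustrative numbers only: a GPU instance assumed 10x as fast as a CPU instance.
speeds = {"cpu": 1e9, "gpu": 1e10}
total = overall_ltd({"cpu": 100.0, "gpu": -10.0}, speeds)
```

With these made-up speeds, a debt of 100 CPU instance-seconds is cancelled by a deficit of 10 GPU instance-seconds.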
    122121
    123 Next we need to specify exactly how LTD is maintained.
    124 It's clear how it decreases; the question is, how is it increased?
    125 We need to avoid situations where LTD increases without bound.
    126 
    127 The design is as follows.
    128 A project P accumulates debt for a resource when:
    129  * P is not backed off for that resource, and the backoff interval is not at the max.
     122Per-resource LTD is maintained as follows:
     123
     124A project P is "debt eligible" for a resource R if:
     125
     126 * P is not backed off for R, and the backoff interval is not at the max.
    130127 * P is not suspended via GUI, and "no more tasks" is not set
    131128
    132 The rate at which P accumulates debt is its resource share relative
    133 to all the projects satisfying the above.
    134 
    135 When an application has used N instances of a resource for a time T,
    136 its debt decreases by an amount proportional to N*T.
     129Debt is adjusted as follows:
     130 * For each debt-eligible project P, the debt is increased by the amount it's owed (delta T times its resource share relative to other debt-eligible projects) minus the amount it got (the number of instance-seconds).
     131 * An offset is added to the debts of all debt-eligible projects so that their net change is zero.  This prevents debt-eligible projects from drifting away from other projects.
     132 * An offset is added to the debts of all projects so that the maximum debt across all projects is zero (this ensures that when a new project is attached, it starts out debt-free).
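The three adjustment steps can be sketched for a single resource as below. This is a minimal illustration under stated assumptions, not the client's implementation: the function name, the dictionary arguments, and the choice to spread the first offset equally over the debt-eligible projects are all assumptions of this example.

```python
def adjust_debts(debts, shares, used, eligible, dt):
    """Return updated debts for one resource after an interval of length dt.

    debts    - project name -> current debt (instance-seconds)
    shares   - project name -> resource share
    used     - project name -> instance-seconds used during the interval
    eligible - set of debt-eligible project names
    """
    new = dict(debts)
    elig = [p for p in debts if p in eligible]
    total_share = sum(shares[p] for p in elig)
    if elig and total_share > 0:
        # Step 1: amount owed (dt times relative share) minus amount received.
        deltas = {
            p: dt * shares[p] / total_share - used.get(p, 0.0) for p in elig
        }
        # Step 2: offset so the net change over eligible projects is zero
        # (assumed here to be split equally among them).
        mean = sum(deltas.values()) / len(elig)
        for p in elig:
            new[p] += deltas[p] - mean
    if new:
        # Step 3: shift all projects so that the maximum debt is zero.
        top = max(new.values())
        for p in new:
            new[p] -= top
    return new


result = adjust_debts(
    debts={"A": 0.0, "B": 0.0, "C": -50.0},
    shares={"A": 1.0, "B": 1.0, "C": 1.0},
    used={"A": 10.0, "B": 0.0},
    eligible={"A", "B"},
    dt=10.0,
)
```

In this example A used the whole interval, so its debt falls relative to B, and project C, which is not debt-eligible, is moved only by the final normalization.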
     133
    137134
    138135=== Work-fetch state ===
     
    298295      seconds_to_fill
    299296}}}
     297
     298== Notes ==
     299
     300The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.