Changes between Version 25 and Version 26 of GpuWorkFetch
- Timestamp: Jan 27, 2009, 10:54:31 AM (16 years ago)
GpuWorkFetch
v25 v26
 92  92    and so on.
 93  93
 94      - *** Question: If we need to contact a project for a tasks of two different types, and one of the backoffs is satisfied, do we ask for both types?
     94  + Note: if we decide to ask a project for work for resource A,
     95  + we may ask it for resource B as well, even if it's backed off for B.
 95  96
 96  97    === Long-term debt ===
 97  98
 98      - We'll continue to use the idea of '''long-term debt''' (LTD),
 99      - which represents how much work (measured in device instance-seconds) is "owed" to each project P.
     99  + We continue to use the idea of '''long-term debt''' (LTD),
    100  + representing how much work (measured in device instance-seconds) is "owed" to each project P.
100 101    This increases over time in proportion to P's resource share,
101 102    and decreases as P uses resources.
102      - Simplified summary of the new policy: when we need work for a resource,
103      - we ask the project that may have that type of job and whose LTD is greatest.
104      -
105      - The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.
    103  + Simplified summary of the new policy: when we need work for a resource R,
    104  + we ask the project that is not backed off for R and whose LTD is greatest.
106 105
107 106    The notion of LTD needs to span resources;
  …   …
121 120     * The "overall LTD", which is used in the work-fetch decision, is the sum of the resource LTDs, weighted by the speed of the resource (FLOPs per instance-second).
122 121
123      - Next we need to specify exactly how LTD is maintained.
124      - It's clear how it decreases; the question is, how is it increased?
125      - We need to avoid situations where LTD increases without bound.
126      -
127      - The design is as follows.
128      - A project P accumulates debt for a resource when:
129      -  * P is not backed off for that resource, and the backoff interval is not at the max.
    122  + Per-resource LTD is maintained as follows:
    123  +
    124  + A project is "debt eligible" for a resource R if:
    125  +
    126  +  * P is not backed off for R, and the backoff interval is not at the max.
130 127     * P is not suspended via GUI, and "no more tasks" is not set
131 128
132      - The rate at which P accumulates debt is its resource share relative
133      - to all the projects satisfying the above.
134      -
135      - When an application has used N instances of a resource for a time T,
136      - its debt decreases by an amount proportional to N*T.
    129  + Debt is adjusted as follows:
    130  +  * For each debt-eligible project P, the debt is increased by the amount it's owed (delta T times its resource share relative to other debt-eligible projects) minus the amount it got (the number of instance-seconds).
    131  +  * An offset is added to debt-eligible projects so that the net change is zero. This prevents debt-eligible projects from drifting away from other projects.
    132  +  * An offset is added so that the maximum debt across all projects is zero (this ensures that when a new project is attached, it starts out debt-free).
    133  +
137 134
138 135    === Work-fetch state ===
  …   …
298 295    seconds_to_fill
299 296    }}}
    297  +
    298  + == Notes ==
    299  +
    300  + The idea of using RAC as a surrogate for LTD was discussed and set aside for various reasons.
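
The v26 rules for maintaining per-resource debt and choosing a project can be sketched in Python as follows. This is only one illustrative reading of the wiki text, not BOINC client code: the `Project` class, its field names, and the functions `adjust_debts` and `pick_project` are all hypothetical. The sketch assumes a single resource R, applies one equal offset to every debt-eligible project to make the net change zero, and takes the amount owed literally as delta T times the share fraction (the text does not say whether this is scaled by the number of instances).

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical per-project state for one resource R."""
    name: str
    resource_share: float
    debt: float = 0.0             # per-resource LTD, in instance-seconds
    backed_off: bool = False      # currently backed off for R
    backoff_at_max: bool = False  # backoff interval has reached its maximum
    suspended: bool = False       # suspended via GUI
    no_more_tasks: bool = False   # "no more tasks" is set
    used: float = 0.0             # instance-seconds of R used during the interval

    def debt_eligible(self) -> bool:
        # The eligibility conditions listed in the v26 text.
        return not (self.backed_off or self.backoff_at_max
                    or self.suspended or self.no_more_tasks)


def adjust_debts(projects, dt):
    """Apply the three adjustment steps for an interval of dt seconds."""
    eligible = [p for p in projects if p.debt_eligible()]
    total_share = sum(p.resource_share for p in eligible)
    if eligible and total_share > 0:
        # Step 1: amount owed minus amount received.
        changes = [dt * p.resource_share / total_share - p.used
                   for p in eligible]
        # Step 2: equal offset (an assumption) so the net change is zero.
        offset = -sum(changes) / len(changes)
        for p, change in zip(eligible, changes):
            p.debt += change + offset
    if projects:
        # Step 3: shift all projects so that the maximum debt is zero.
        top = max(p.debt for p in projects)
        for p in projects:
            p.debt -= top


def pick_project(projects):
    """Simplified policy: among projects not backed off for R, pick the
    one with the greatest LTD (None if every project is backed off)."""
    candidates = [p for p in projects if not p.backed_off]
    return max(candidates, key=lambda p: p.debt, default=None)
```

With a single resource, ranking by this per-resource debt gives the same order as ranking by the speed-weighted "overall LTD"; with several resources, the weighted sum would have to be computed before calling something like `pick_project`.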