Changes between Version 11 and Version 12 of GpuWorkFetch
Timestamp: Dec 26, 2008, 12:29:13 PM (16 years ago)
GpuWorkFetch
v11 v12
 10  10   indicating the total duration of jobs being requested.
 11  11
 12     - This policy has various problems. First:
     12 + This policy has some problems. First:
 13  13
 14  14   * There's no way for the client to say "I have N idle CPUs; send me enough jobs to use them all".
 15  15
 16     - Problems related to GPUs:
     16 + And various problems related to GPUs:
 17  17
 18  18   * If there is no CPU shortfall, no work will be fetched even if GPUs are idle.
  …   …
 54  54   == Client ==
 55  55
 56     -
 57  56   New abstraction: '''processing resource''' or PRSC.
 58  57   There are two processing resource types: CPU and CUDA.
 59  58
 60     - === Per-resource-type backoff
     59 + === Per-resource-type backoff ===
 61  60
 62  61   We need to handle the situation where there's a GPU shortfall
  …   …
 68  67   it's cleared whenever we get a job of that type.
 69  68
 70     - == - Work-fetch state==
     69 + === Work-fetch state ===
 71  70
 72  71   Each PRSC has its own set of data related to work fetch.
 73  72   This is stored in an object of class PRSC_WORK_FETCH.
 74  73
 75     - Data members of PRSC_WORK_FETCH:
     74 + Data members of PRSC_WORK_FETCH (set by rr_simulation()):
 76  75
 77  76   '''double shortfall''': shortfall for this resource
 78     - '''double max_nidle''': number of idle instances
     77 +
     78 + '''double nidle''': number of currently idle instances
 79  79
 80  80   Member functions of PRSC_WORK_FETCH:
 81  81
 82     - '''clear()''': called at the start of RR simulation
 83     -
     82 + '''rr_init()''': called at the start of RR simulation.
     83 + Compute share of each project for this PRSC,
     84 + and clear shortfall.
     85 +
     86 + ------------
 84  87   '''prepare()''': called before exists_fetchable_project().
 85  88   sees if there's project to req from for this resource, and caches it
  …   …
135 138   Each PRSC also needs to have some per-project data.
136 139   This is stored in an object of class PRSC_PROJECT_DATA.
137     - Its members include (* means save in state file):
    140 + It has the following "persistent" members (i.e., saved in state file):
    141 +
    142 + '''double long_term_debt*'''
    143 +
    144 + '''backoff timer'''*: how long to wait until ask project for work specifically for this PRSC;
    145 + double this any time we ask for work for this rsc and get none
    146 + (maximum 24 hours).
    147 + Clear it when we ask for work for this PRSC and get some job.
    148 +
    149 + And the following transient members (used by rr_simulation()):
138 150
139 151   '''double shortfall'''
140 152
141     - '''int last_job'''*: last time we had a job from this proj using this rsc
142     - if the time is within last N days (30?)
143     - we assume that the project may possibly have jobs of that type
144     -
145 153   '''bool runnable'''
146 154
147 155   '''max deficit'''
148 156
149     - '''backoff timer'''*: how long to wait until ask project for work only for this rsc
150     - double this any time we ask only for work for this rsc and get none
151     - (maximum 24 hours).
152     - Clear it when we have a job that uses the PRSC.
153     -
154 157   '''double share''': # of instances this project should get based on RS
155 158
156     - '''double long_term_debt*'''
157     -
    159 + '''instances_used''': # of instances currently being used
158 160
159 161   === debt accounting ===
  …   …
166 168
167 169   {{{
    170 + cpu_work_fetch.rr_init()
    171 + cuda_work_fetch.rr_init()
168 172   do simulation as current
169 173   on completion of an interval dt