Changes between Version 15 and Version 16 of GpuWorkFetch


Timestamp: Dec 26, 2008, 1:50:45 PM
Author: davea
Comment: --

  • GpuWorkFetch

    v15 v16
     37  37   * A '''CUDA job''' is one that uses CUDA (and may use CPU as well).
     38  38
     39      == Scheduler request ==
         39  == Scheduler request and reply message ==
     40  40
     41  41  New fields in the scheduler request message:
     
     52  52  this is the max of (cpu,cuda)_req_seconds.
     53  53
         54  New fields in the scheduler reply message (these are not currently used):
         55
         56  '''double have_cpu_jobs''': this project sometimes has CPU jobs for this platform (although this reply may not include any).
         57
         58  '''double have_cuda_jobs''': same, for CUDA jobs.
         59
     54  60  == Client ==
     55  61
     
     59  65  === Per-resource-type backoff ===
     60  66
     61      We need to handle the situation where there's a GPU shortfall
         67  We need to handle the situation where e.g. there's a GPU shortfall
     62  68  but no projects are supplying GPU work
     63  69  (for either permanent or transient reasons).
     64  70  We don't want an overall work-fetch backoff from those projects.
         71
     65  72  Instead, we maintain a separate backoff timer per (project, PRSC).
     66      This is doubled whenever we ask for only work of that type and don't get any;
         73  This is doubled whenever we ask for only work of that type and don't get any work;
     67  74  it's cleared whenever we get a job of that type.
     68  75
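As a rough illustration of the per-(project, PRSC) timer described in this hunk, the Python sketch below doubles an interval each time a request for that type of work comes back empty, and clears it when a job of that type arrives. The class name, the `MIN_INTERVAL` and `MAX_INTERVAL` bounds, and the use of plain strings as keys are assumptions made for the example; the page does not specify them.

```python
# Hypothetical sketch; names and bounds are not taken from the BOINC client.
MIN_INTERVAL = 60.0      # assumed first backoff, in seconds
MAX_INTERVAL = 86400.0   # assumed cap, in seconds


class ResourceBackoff:
    """One backoff timer per (project, resource type) pair."""

    def __init__(self):
        self.interval = {}   # (project, rsc) -> current backoff interval
        self.until = {}      # (project, rsc) -> time before which we don't ask

    def request_failed(self, project, rsc, now):
        """We asked only for `rsc` work and got none: double the interval."""
        key = (project, rsc)
        prev = self.interval.get(key, 0.0)
        new = MIN_INTERVAL if prev == 0.0 else min(prev * 2, MAX_INTERVAL)
        self.interval[key] = new
        self.until[key] = now + new

    def got_job(self, project, rsc):
        """A job of this type arrived: clear the timer."""
        self.interval.pop((project, rsc), None)
        self.until.pop((project, rsc), None)

    def backed_off(self, project, rsc, now):
        """True while requests for `rsc` work from `project` are suppressed."""
        return now < self.until.get((project, rsc), 0.0)
```

Because the state is keyed by the pair, a project that never supplies GPU work ends up backed off only for GPU requests, and CPU requests to the same project are unaffected.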
     
     85  92
     86  93  '''rr_init()''': called at the start of RR simulation.
     87      Compute share of each project for this PRSC,
     88      and clear shortfall.
         94  Compute project shares for this PRSC, and clear overall and per-project shortfalls.
     89  95
     90  96  '''set_nidle()''': called by RR sim after initial job assignment.
     
     92  98
     93  99  '''accumulate_shortfall(dt)''': called by RR sim for each time interval during work buf period.
     94      {{{
     95      nidle_now = ninstances - instances in use
     96      shortfall += dt*(nidle_now)
        100  {{{
        101  shortfall += dt*(ninstances - instances in use)
     97 102  for each project p not backed off for this PRSC
     98 103      p->PRSC_PROJECT_DATA.accumulate_shortfall(dt)
     
    102 107  select the best project to request this type of work from.
    103 108  It's the project not backed off for this PRSC,
    104      and for which LTD + this->shortfall is largest
        109  and for which LTD + p->shortfall is largest
    105 110
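The selection rule in this hunk (skip projects backed off for this PRSC, then take the one with the largest LTD plus per-project shortfall) could be sketched as follows. `ProjectState` and its field names are hypothetical stand-ins for the per-project data; only the rule itself comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    # Hypothetical per-project record for one resource type.
    name: str
    long_term_debt: float   # "LTD" in the text
    shortfall: float        # per-project shortfall from the RR simulation
    backed_off: bool        # True if this project is backed off for this PRSC


def choose_project(projects):
    """Return the eligible project with the largest LTD + shortfall, or None."""
    eligible = [p for p in projects if not p.backed_off]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p.long_term_debt + p.shortfall)
```

The v16 wording change from `this->shortfall` to `p->shortfall` matters here: the sum uses each candidate project's own shortfall, not the overall shortfall for the resource type.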
    106 111  '''accumulate_debt(dt)''':
     
    162 167
    163 168  {{{
        169  rr_simulation()
        170
    164 171  if cuda_work_fetch.nidle
    165 172     cpu_work_fetch.shortfall = 0
     
    195 202
    196 203  === Handling scheduler reply ===
    197
        204  {{{
    198 205  if no jobs returned
    199 206     double backoff for each requested PRSC
    200
        207  else
        208     clear backoff for the PRSC of each returned job
        209  }}}
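Read literally, the v16 pseudocode for reply handling has two branches, which the following Python sketch spells out for a single project. The function name, the dictionary representation, and the `first` interval are assumptions for illustration only.

```python
def handle_reply(intervals, requested, returned_jobs, first=60.0):
    """Update per-resource backoff intervals for one project, in place.

    intervals     -- dict: resource type -> current backoff interval (seconds)
    requested     -- resource types we asked this project for
    returned_jobs -- resource type of each job in the reply
    first         -- assumed initial interval; not specified in the text
    """
    if not returned_jobs:
        # No jobs at all: double the backoff for every requested type.
        for rsc in requested:
            prev = intervals.get(rsc, 0.0)
            intervals[rsc] = prev * 2 if prev else first
    else:
        # Clear the backoff only for types that actually produced a job.
        for rsc in set(returned_jobs):
            intervals.pop(rsc, None)
    return intervals
```

Under this reading, a reply that contains only CPU jobs leaves an existing CUDA backoff unchanged: it is neither doubled nor cleared.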
    201 210  == Scheduler changes ==
    202 211  {{{
     
    204 213     have_cpu_app_versions
    205 214     have_cuda_app_versions
    206      per-req vars
        215  per-request vars
    207 216     bool coproc_request
    208 217     ncpu_jobs_sending