Changes between Version 15 and Version 16 of GpuWorkFetch

Timestamp: Dec 26, 2008, 1:50:45 PM (17 years ago)
Author: davea
Comment: --

GpuWorkFetch

-                      v15
+                      v16
  * A '''CUDA job''' is one that uses CUDA (and may use CPU as well).
-== Scheduler request ==
+== Scheduler request and reply message ==
 New fields in the scheduler request message:
 …
 this is the max of (cpu,cuda)_req_seconds.
+New fields in the scheduler reply message (these are not currently used):
+'''double have_cpu_jobs''': this project sometimes has CPU jobs for this platform (although this reply may not include any).
+'''double have_cuda_jobs''': same, for CUDA jobs.
 == Client ==
 …
 === Per-resource-type backoff ===
-We need to handle the situation where there's a GPU shortfall
+We need to handle the situation where e.g. there's a GPU shortfall
 but no projects are supplying GPU work
 (for either permanent or transient reasons).
 We don't want an overall work-fetch backoff from those projects.
 Instead, we maintain a separate backoff timer per (project, PRSC).
-This is doubled whenever we ask for only work of that type and don't get any;
+This is doubled whenever we ask for only work of that type and don't get any work;
 it's cleared whenever we get a job of that type.
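
The per-(project, PRSC) backoff rule above can be sketched as follows. This is only an illustration, not the client's actual code: the struct name `ResourceBackoff`, its members, and the minimum and maximum intervals are assumptions introduced here, since the page does not give concrete values.

```cpp
#include <algorithm>
#include <cassert>

// Assumed bounds; the design page does not state concrete values.
constexpr double kMinBackoff = 60.0;     // seconds
constexpr double kMaxBackoff = 86400.0;  // seconds

// Hypothetical backoff state kept once per (project, resource type) pair.
struct ResourceBackoff {
    double interval = 0.0;  // current backoff length; 0 means not backed off
    double until = 0.0;     // time before which this type is not requested

    // A request asked only for this resource type and returned no work:
    // start at the minimum interval, then double on each further failure.
    void request_failed(double now) {
        assert(interval >= 0.0);
        interval = (interval == 0.0) ? kMinBackoff
                                     : std::min(interval * 2.0, kMaxBackoff);
        until = now + interval;
    }

    // A job of this resource type arrived: clear the backoff.
    void got_job() {
        interval = 0.0;
        until = 0.0;
    }

    bool backed_off(double now) const { return now < until; }
};
```

Keeping one such object per project and per resource type means a project that never supplies GPU work can still be asked for CPU work without delay.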
 …
 '''rr_init()''': called at the start of RR simulation.
-Compute share of each project for this PRSC,
-and clear shortfall.
+Compute project shares for this PRSC, and clear overall and per-project shortfalls.
 '''set_nidle()''': called by RR sim after initial job assignment.
 …
 '''accumulate_shortfall(dt)''': called by RR sim for each time interval during work buf period.
+{{{
+nidle_now = ninstances - instances in use
+shortfall += dt*(nidle_now)
+{{{
+shortfall += dt*(ninstances - instances in use)
 for each project p not backed off for this PRSC
     p->PRSC_PROJECT_DATA.accumulate_shortfall(dt)
 …
 select the best project to request this type of work from.
 It's the project not backed off for this PRSC,
-and for which LTD + this->shortfall is largest
+and for which LTD + p->shortfall is largest
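
A minimal sketch of this selection rule, assuming a flat list of per-project records for one resource type. The names `ProjectState` and `choose_project` and the field layout are hypothetical and chosen for illustration; only the rule itself (skip backed-off projects, maximize LTD + p->shortfall) comes from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-project record for a single resource type (PRSC).
struct ProjectState {
    std::string name;
    double long_term_debt;  // LTD for this resource type
    double shortfall;       // per-project shortfall from the RR simulation
    bool backed_off;        // true if backed off for this resource type
};

// Returns the project that is not backed off and has the largest
// LTD + shortfall, or nullptr if every project is backed off.
const ProjectState* choose_project(const std::vector<ProjectState>& projects) {
    const ProjectState* best = nullptr;
    for (const ProjectState& p : projects) {
        if (p.backed_off) continue;
        const double score = p.long_term_debt + p.shortfall;
        if (best == nullptr ||
            score > best->long_term_debt + best->shortfall) {
            best = &p;
        }
    }
    return best;
}
```

Ties go to the project that appears first in the list; the page does not say how ties should be broken, so that choice is arbitrary.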
 '''accumulate_debt(dt)''':
 …
 {{{
+rr_simulation()
 if cuda_work_fetch.nidle
    cpu_work_fetch.shortfall = 0
 …
 === Handling scheduler reply ===
+{{{
 if no jobs returned
    double backoff for each requested PRSC
+else
+   clear backoff for the PRSC of each returned job
+}}}
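
The reply-handling pseudocode above could look roughly like this in C++. It is a sketch under stated assumptions: the `ResourceType` enum, the `Intervals` alias, the function name `handle_scheduler_reply`, and the 60-second starting interval are all introduced here for illustration and are not taken from the client source.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical resource types; the page covers CPU and CUDA only.
enum ResourceType { RSC_CPU = 0, RSC_CUDA = 1, RSC_COUNT = 2 };

constexpr double kInitialBackoff = 60.0;  // seconds, assumed value

// Backoff interval per resource type for one project; 0 means no backoff.
using Intervals = std::array<double, RSC_COUNT>;

void handle_scheduler_reply(Intervals& backoff,
                            const std::array<bool, RSC_COUNT>& requested,
                            const std::vector<ResourceType>& returned_jobs) {
    if (returned_jobs.empty()) {
        // No jobs returned: double the backoff for each requested type.
        for (int t = 0; t < RSC_COUNT; ++t) {
            if (requested[t]) {
                backoff[t] = backoff[t] > 0.0 ? backoff[t] * 2.0
                                              : kInitialBackoff;
            }
        }
        return;
    }
    // Otherwise clear the backoff for the type of each returned job.
    for (ResourceType t : returned_jobs) {
        backoff[t] = 0.0;
    }
}
```

As in the pseudocode, a reply that contains some jobs leaves the backoff of a requested but unserved resource type unchanged.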
 == Scheduler changes ==
 {{{
 …
    have_cpu_app_versions
    have_cuda_app_versions
-per-req vars
+per-request vars
    bool coproc_request
    ncpu_jobs_sending