= Client CPU/GPU scheduling =

Prior to version 6.3, the BOINC client assumed that a running application
uses 1 CPU.
Starting with version 6.3, this is generalized:
* Apps may use coprocessors (such as GPUs)
* The number of CPUs used by an app may be more or less than one, and it need not be an integer.

For example, an app might use 2 CUDA GPUs and 0.5 CPUs.
This information is visible in the BOINC Manager.
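
As a minimal sketch (in C++, with hypothetical names; the client's actual data structures differ), a job's resource usage might be described like this:

{{{
// Hypothetical sketch only; names are illustrative, not the client's code.
// A job's resource usage: a possibly fractional CPU count and a GPU count.
struct RESOURCE_USAGE {
    double avg_ncpus;   // average number of CPUs used; need not be an integer
    int ncudas;         // number of CUDA GPUs used
};

// The example above: 0.5 CPUs and 2 CUDA GPUs.
RESOURCE_USAGE example_usage = {0.5, 2};
}}}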

The client's scheduler (i.e., the logic that decides which apps to run)
has been modified to accommodate this diversity of apps.

== The way things used to work ==

The old scheduling policy:

* Make a list of runnable jobs, ordered by "importance" (as determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
* Run jobs in order of decreasing importance. Skip those that would exceed RAM limits. Keep going until we're running NCPUS jobs.

There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed -
but that's the basic idea.
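
A hedged sketch of that old policy in C++ (the names and types here are illustrative, not the actual client code):

{{{
#include <algorithm>
#include <vector>

// Hypothetical job description; "importance" combines deadline danger
// and long-term project debt (details omitted here).
struct JOB {
    double importance;
    double ram_usage;
};

// Old policy sketch: run jobs in order of decreasing importance,
// skipping those that would exceed the RAM limit, until NCPUS jobs
// are running (each job is assumed to use exactly 1 CPU).
std::vector<JOB*> choose_jobs_old(std::vector<JOB*> runnable,
                                  int ncpus, double ram_limit) {
    std::sort(runnable.begin(), runnable.end(),
        [](const JOB* a, const JOB* b) {
            return a->importance > b->importance;
        });
    std::vector<JOB*> to_run;
    double ram_used = 0;
    for (JOB* j : runnable) {
        if ((int)to_run.size() >= ncpus) break;
        if (ram_used + j->ram_usage > ram_limit) continue;  // skip: RAM limit
        ram_used += j->ram_usage;
        to_run.push_back(j);
    }
    return to_run;
}
}}}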

== How things work in 6.3 ==

Suppose we're on a machine with 1 CPU and 1 GPU,
and that we have the following runnable jobs (in order of decreasing importance):
{{{
1) 1 CPU, 0 GPU
2) 1 CPU, 0 GPU
3) .5 CPU, 1 GPU
}}}

What should we run?
If we use the old policy we'll just run 1), and the GPU will be idle.
This is bad - the GPU is typically 50X faster than the CPU,
so we should use it if at all possible.
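
Purely to illustrate that motivation (this is a hypothetical sketch, not the policy the client actually adopted), a selection rule that considers coprocessor jobs first would keep the GPU busy in the example above:

{{{
#include <vector>

// Illustrative sketch only, not the actual client policy.
// Jobs are assumed to be listed in order of decreasing importance.
struct JOB {
    double ncpus;   // CPUs used (may be fractional)
    int ngpus;      // GPUs used
};

std::vector<const JOB*> choose_jobs_gpu_aware(
    const std::vector<JOB>& runnable, int ncpus, int ngpus
) {
    std::vector<const JOB*> to_run;
    double cpu_used = 0;
    int gpu_used = 0;

    // Pass 1: GPU jobs, while free GPU instances remain.
    for (const JOB& j : runnable) {
        if (j.ngpus > 0 && gpu_used + j.ngpus <= ngpus) {
            to_run.push_back(&j);
            gpu_used += j.ngpus;
            cpu_used += j.ncpus;
        }
    }
    // Pass 2: CPU-only jobs, until the CPUs are accounted for.
    // (This may slightly over-commit the CPUs; that trade-off is a
    // choice of this sketch, not a statement about the client.)
    for (const JOB& j : runnable) {
        if (j.ngpus == 0 && cpu_used < ncpus) {
            to_run.push_back(&j);
            cpu_used += j.ncpus;
        }
    }
    return to_run;
}
}}}

In the example above this sketch selects jobs 3 and 1 (1.5 CPUs, a slight over-commitment) instead of job 1 alone, so the GPU stays busy.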

This leads to the following policy:

== Unresolved issues ==

Apps that use GPUs use the CPU as well.
The CPU part is typically a polling loop:
it starts a "kernel" on the GPU,
waits for it to finish (checking once every 0.01 seconds, say),
then starts another kernel.

If there's a delay between when the kernel finishes
and when the CPU starts another one,
the GPU sits idle and the entire program runs slowly.
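
As a sketch of that loop (the kernel-launch and completion-check calls below are hypothetical placeholders, not a real coprocessor API):

{{{
#include <chrono>
#include <thread>

// Hypothetical placeholders for the coprocessor API; a real app
// would call e.g. the CUDA runtime here.
void start_kernel(int which);
bool kernel_finished();

// The CPU-side polling loop described above: start a kernel, poll
// for completion roughly every 0.01 seconds, then start the next.
// The CPU does little work, but any delay in getting back to
// start_kernel() leaves the GPU idle.
void gpu_driver_loop(int nkernels) {
    for (int i = 0; i < nkernels; i++) {
        start_kernel(i);
        while (!kernel_finished()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
}
}}}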