= Client CPU/GPU scheduling =

Prior to version 6.3, the BOINC client assumed that a running application
uses 1 CPU.
Starting with version 6.3, this is generalized:
* Apps may use coprocessors (such as GPUs).
* The number of CPUs used by an app may be more or less than one, and it need not be an integer.

For example, an app might use 2 CUDA GPUs and 0.5 CPUs.
This information is visible in the BOINC Manager.

The client's scheduler (i.e., the logic that decides which apps to run)
has been modified to accommodate this diversity of apps.

== The way things used to work ==

The old scheduling policy:

* Make a list of runnable jobs, ordered by "importance" (as determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
* Run jobs in order of decreasing importance. Skip those that would exceed RAM limits. Keep going until we're running NCPUS jobs.

There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed yet -
but that's the basic idea.

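The old policy can be sketched as follows. This is a minimal illustration, not the client's actual code; the `Job` struct, `old_schedule`, and its fields are hypothetical names:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical job record; "importance" stands in for the combination
// of deadline urgency and long-term project debt described above.
struct Job {
    std::string name;
    double importance;
    double ram;    // working-set size
};

// Old policy: sort by decreasing importance, run jobs until NCPUS
// are busy, skipping any job that would exceed available RAM.
std::vector<Job> old_schedule(std::vector<Job> jobs, int ncpus, double ram_free) {
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) { return a.importance > b.importance; });
    std::vector<Job> to_run;
    for (const auto& j : jobs) {
        if ((int)to_run.size() >= ncpus) break;
        if (j.ram > ram_free) continue;    // would exceed RAM limits: skip
        ram_free -= j.ram;
        to_run.push_back(j);
    }
    return to_run;
}
```

Note that every scheduled job implicitly occupies exactly one CPU; GPUs and fractional CPU usage don't enter into it at all.
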
== How things work in 6.3 ==

Suppose we're on a machine with 1 CPU and 1 GPU,
and that we have the following runnable jobs (in order of decreasing importance):
{{{
1) 1 CPU, 0 GPU
2) 1 CPU, 0 GPU
3) .5 CPU, 1 GPU
}}}

What should we run?
If we use the old policy we'll run only 1), and the GPU will sit idle.
This is bad: the GPU is typically ~50X faster than the CPU,
so we should keep it busy whenever possible.
In this example we'd like to run 1) and 3) together;
even though that overcommits the CPU, it keeps the GPU busy.

This leads to the following policy:

* Make a list of runnable jobs, ordered by decreasing importance.
* Scan down the list. If a job uses a GPU, run it whenever enough GPU instances are free, even if this overcommits the CPU.
* Otherwise (a CPU-only job), run it only if the CPUs committed so far total less than NCPUS.

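The new policy can be sketched as follows. Again this is an illustration of the idea, not the client's actual implementation; `Job` and `new_schedule` are hypothetical names:

```cpp
#include <string>
#include <vector>

// Hypothetical job record: CPU usage may be fractional (e.g. 0.5).
struct Job {
    std::string name;
    double ncpus;
    int ngpus;
};

// Sketch of the 6.3 policy: walk the importance-ordered list,
// hand out GPU instances (overcommitting the CPU if necessary),
// and fill the remaining CPUs with CPU-only jobs.
std::vector<Job> new_schedule(const std::vector<Job>& jobs_by_importance,
                              int ncpus, int ngpus) {
    std::vector<Job> to_run;
    double cpus_used = 0;
    int gpus_used = 0;
    for (const auto& j : jobs_by_importance) {
        if (j.ngpus > 0) {
            // GPU job: run it if enough GPU instances are free,
            // even if this overcommits the CPU.
            if (gpus_used + j.ngpus <= ngpus) {
                gpus_used += j.ngpus;
                cpus_used += j.ncpus;
                to_run.push_back(j);
            }
        } else if (cpus_used + j.ncpus <= ncpus) {
            // CPU-only job: run it only if CPUs remain.
            cpus_used += j.ncpus;
            to_run.push_back(j);
        }
    }
    return to_run;
}
```

On the example above (a 1-CPU, 1-GPU machine), this selects 1) and 3): job 2) is skipped because the CPU is already committed, while the GPU job runs despite the 1.5-CPU overcommit.
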
== Unresolved issues ==

Apps that use GPUs use the CPU as well.
The CPU part is typically a polling loop:
it starts a "kernel" on the GPU,
waits for it to finish (checking once per 0.01 sec, say),
then starts another kernel.

If there's a delay between when one kernel finishes
and when the CPU starts the next one,
the GPU sits idle and the entire program runs slowly.

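The polling pattern described above can be sketched like this, with `std::async` standing in for an asynchronous GPU kernel launch (a real CUDA app would launch kernels and poll for completion instead; all names here are illustrative):

```cpp
#include <chrono>
#include <future>
#include <thread>

// Stand-in for a GPU kernel: 30 ms of asynchronous "work".
// In a real CUDA app this would be a kernel launch.
std::future<void> launch_kernel() {
    return std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
    });
}

// CPU side: start a kernel, poll for completion every 10 ms
// (0.01 sec), then start the next one. The CPU does almost no
// real work; but if this loop is delayed - e.g., because the
// process is preempted - the GPU sits idle between kernels.
int run_kernels(int nkernels) {
    int polls = 0;
    for (int i = 0; i < nkernels; i++) {
        auto kernel = launch_kernel();
        while (kernel.wait_for(std::chrono::milliseconds(10))
               != std::future_status::ready) {
            polls++;    // kernel not done yet; poll again
        }
    }
    return polls;
}
```

This is why the CPU fraction charged to a GPU app is small (the loop mostly sleeps), yet the app still needs the CPU scheduled promptly.
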