= Client CPU/GPU scheduling =

Prior to version 6.3, the BOINC client assumed that a running application
uses 1 CPU.
Starting with version 6.3, this is generalized:
* Apps may use coprocessors (such as GPUs)
* The number of CPUs used by an app may be more or less than one, and it need not be an integer.

For example, an app might use 2 CUDA GPUs and 0.5 CPUs.
This information is visible in the BOINC Manager.
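
As a minimal sketch (in C++, with hypothetical names; the client's actual data structures differ), a job's resource usage might be described like this:

{{{
// Hypothetical sketch only; names are illustrative, not the client's code.
// A job's resource usage: a possibly fractional CPU count and a GPU count.
struct RESOURCE_USAGE {
    double avg_ncpus;   // average number of CPUs used; need not be an integer
    int ncudas;         // number of CUDA GPUs used
};

// The example above: 0.5 CPUs and 2 CUDA GPUs.
RESOURCE_USAGE example_usage = {0.5, 2};
}}}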

The client's scheduler (i.e., the logic that decides which apps to run)
has been modified to accommodate this diversity of apps.

== The way things used to work ==

The old scheduling policy:

* Make a list of runnable jobs, ordered by "importance" (as determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
* Run jobs in order of decreasing importance. Skip those that would exceed RAM limits. Keep going until we're running NCPUS jobs.

There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed -
but that's the basic idea.
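
A hedged sketch of that old policy in C++ (the names and types here are illustrative, not the actual client code):

{{{
#include <algorithm>
#include <vector>

// Hypothetical job description; "importance" combines deadline danger
// and long-term project debt (details omitted here).
struct JOB {
    double importance;
    double ram_usage;
};

// Old policy sketch: run jobs in order of decreasing importance,
// skipping those that would exceed the RAM limit, until NCPUS jobs
// are running (each job is assumed to use exactly 1 CPU).
std::vector<JOB*> choose_jobs_old(std::vector<JOB*> runnable,
                                  int ncpus, double ram_limit) {
    std::sort(runnable.begin(), runnable.end(),
        [](const JOB* a, const JOB* b) {
            return a->importance > b->importance;
        });
    std::vector<JOB*> to_run;
    double ram_used = 0;
    for (JOB* j : runnable) {
        if ((int)to_run.size() >= ncpus) break;
        if (ram_used + j->ram_usage > ram_limit) continue;  // skip: RAM limit
        ram_used += j->ram_usage;
        to_run.push_back(j);
    }
    return to_run;
}
}}}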

== How things work in 6.3 ==

Suppose we're on a machine with 1 CPU and 1 GPU,
and that we have the following runnable jobs (in order of decreasing importance):
{{{
1) 1 CPU, 0 GPU
2) 1 CPU, 0 GPU
3) .5 CPU, 1 GPU
}}}

What should we run?
If we use the old policy we'll just run 1), and the GPU will be idle.
This is bad - the GPU is typically 50X faster than the CPU,
so we should use it if at all possible.
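
Purely to illustrate that motivation (this is a hypothetical sketch, not the policy the client actually adopted), a selection rule that considers coprocessor jobs first would keep the GPU busy in the example above:

{{{
#include <vector>

// Illustrative sketch only, not the actual client policy.
// Jobs are assumed to be listed in order of decreasing importance.
struct JOB {
    double ncpus;   // CPUs used (may be fractional)
    int ngpus;      // GPUs used
};

std::vector<const JOB*> choose_jobs_gpu_aware(
    const std::vector<JOB>& runnable, int ncpus, int ngpus
) {
    std::vector<const JOB*> to_run;
    double cpu_used = 0;
    int gpu_used = 0;

    // Pass 1: GPU jobs, while free GPU instances remain.
    for (const JOB& j : runnable) {
        if (j.ngpus > 0 && gpu_used + j.ngpus <= ngpus) {
            to_run.push_back(&j);
            gpu_used += j.ngpus;
            cpu_used += j.ncpus;
        }
    }
    // Pass 2: CPU-only jobs, until the CPUs are accounted for.
    // (This may slightly over-commit the CPUs; that trade-off is a
    // choice of this sketch, not a statement about the client.)
    for (const JOB& j : runnable) {
        if (j.ngpus == 0 && cpu_used < ncpus) {
            to_run.push_back(&j);
            cpu_used += j.ncpus;
        }
    }
    return to_run;
}
}}}

In the example above this sketch selects jobs 3 and 1 (1.5 CPUs, a slight over-commitment) instead of job 1 alone, so the GPU stays busy.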

This leads to the following policy:

== Unresolved issues ==

Apps that use GPUs use the CPU as well.
The CPU part is typically a polling loop:
it starts a "kernel" on the GPU,
waits for it to finish (checking once every 0.01 seconds, say),
then starts another kernel.

If there's a delay between when the kernel finishes
and when the CPU starts another one,
the GPU sits idle and the entire program runs slowly.
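
As a sketch of that loop (the kernel-launch and completion-check calls below are hypothetical placeholders, not a real coprocessor API):

{{{
#include <chrono>
#include <thread>

// Hypothetical placeholders for the coprocessor API; a real app
// would call e.g. the CUDA runtime here.
void start_kernel(int which);
bool kernel_finished();

// The CPU-side polling loop described above: start a kernel, poll
// for completion roughly every 0.01 seconds, then start the next.
// The CPU does little work, but any delay in getting back to
// start_kernel() leaves the GPU idle.
void gpu_driver_loop(int nkernels) {
    for (int i = 0; i < nkernels; i++) {
        start_kernel(i);
        while (!kernel_finished()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
}
}}}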