== Problems with the current work fetch policy ==

The current work-fetch policy is essentially:
* Do a weighted round-robin simulation, computing overall CPU shortfall
* If there's a shortfall, request work from the project with highest LTD

The scheduler request has a single number "work_req_seconds"
indicating the total duration of jobs being requested.
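
To make this concrete, here is a minimal C++ sketch of the current policy. All names (rr_simulation, request_work, the PROJECT fields) are illustrative stand-ins, not the actual client code.

{{{
#include <vector>

// Illustrative stand-in; the real client structure differs.
struct PROJECT {
    double long_term_debt;   // LTD
};

std::vector<PROJECT*> projects;

// Weighted round-robin simulation of the job queue (details omitted);
// returns the overall CPU shortfall, in seconds.
double rr_simulation() { return 0; }

// Send a scheduler request with work_req_seconds = req_seconds.
void request_work(PROJECT*, double req_seconds) {}

void work_fetch() {
    double shortfall = rr_simulation();
    if (shortfall <= 0) return;     // no CPU shortfall: fetch nothing

    // Otherwise ask the project with the highest LTD for work.
    PROJECT* best = 0;
    for (unsigned int i = 0; i < projects.size(); i++) {
        if (!best || projects[i]->long_term_debt > best->long_term_debt) {
            best = projects[i];
        }
    }
    if (best) request_work(best, shortfall);
}
}}}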

This policy has various problems.

* There's no way for the client to say "I have N idle CPUs, so send me enough jobs to use them all".

And many problems related to GPUs:

* There may be no CPU shortfall, but GPUs are idle; no work will be fetched.

* If a GPU is idle, we should get work from a project that potentially has jobs for it.

* If a project has both CPU and GPU jobs, we may need to tell it to send only GPU (or only CPU) jobs.

* LTD is computed solely on the basis of CPU time used, so it doesn't provide a meaningful comparison between projects that use only GPUs, or between a GPU project and a CPU project.

This document proposes a work-fetch system that solves these problems.

For simplicity, the design assumes that there is only one GPU type (CUDA).
It is straightforward to extend the design to handle additional GPU types.

== Terminology ==

A job sent to a client is associated with an app version,
which uses some number (possibly fractional) of CPUs and CUDA devices.

* A '''CPU job''' is one that uses only CPU.
* A '''CUDA job''' is one that uses CUDA (and may use CPU as well).
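
To illustrate these definitions, here is a hypothetical C++ fragment; the field names (avg_ncpus, ncudas) are assumptions for this sketch, not necessarily the client's actual structures.

{{{
// Per-app-version resource usage (illustrative field names).
struct APP_VERSION {
    double avg_ncpus;   // CPUs used; may be fractional, e.g. 0.5
    double ncudas;      // CUDA devices used; 0 for a CPU job
};

// The definitions above, expressed as predicates:
bool is_cpu_job(APP_VERSION& av)  { return av.ncudas == 0; }
bool is_cuda_job(APP_VERSION& av) { return av.ncudas > 0; }   // may use CPU too
}}}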

== Scheduler request ==

New fields in the scheduler request message:

'''double cpu_req_seconds''': number of CPU seconds requested

'''double cuda_req_seconds''': number of CUDA seconds requested

'''double ninstances_cpu''': send enough jobs to occupy this many CPUs

'''double ninstances_cuda''': send enough jobs to occupy this many CUDA devices

For compatibility with old servers, the message still has '''work_req_seconds''';
this is the maximum of cpu_req_seconds and cuda_req_seconds.
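
A sketch of these fields as a C++ struct, including the backward-compatibility rule (the struct and method names are illustrative, not the actual request code):

{{{
#include <algorithm>

struct WORK_FETCH_REQUEST {
    double cpu_req_seconds;    // CPU seconds requested
    double cuda_req_seconds;   // CUDA seconds requested
    double ninstances_cpu;     // occupy this many CPUs
    double ninstances_cuda;    // occupy this many CUDA devices

    // Old servers look only at work_req_seconds.
    double work_req_seconds() {
        return std::max(cpu_req_seconds, cuda_req_seconds);
    }
};
}}}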

== Client ==

New abstraction: '''processing resource''' or PRSC.
There are two processing resource types: CPU and CUDA.
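
One possible shape for this abstraction, as an illustrative C++ fragment (the names and fields are assumptions for this sketch; the per-PRSC state is not shown):

{{{
// Illustrative PRSC abstraction; two resource types.
enum PRSC_TYPE { PRSC_CPU, PRSC_CUDA };

struct PRSC {
    PRSC_TYPE type;
    double ninstances;   // CPUs or CUDA devices on this host
};

// e.g. a host with 4 CPUs and one CUDA device:
PRSC prscs[2] = {
    { PRSC_CPU,  4 },
    { PRSC_CUDA, 1 },
};
}}}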
| 58 | |
| 59 | Each PRSC has its own |
| 60 | |