Changes between Version 1 and Version 2 of GpuSched
- Timestamp: Oct 13, 2008, 10:33:23 AM
GpuSched
--- v1
+++ v2
@@ -1,8 +1,7 @@
 = Client CPU/GPU scheduling =
 
-Prior to version 6.3, the BOINC client assumed that a running application
-uses 1 CPU.
+Prior to version 6.3, the BOINC client assumed that each running application uses 1 CPU.
 Starting with version 6.3, this is generalized.
- * Apps may use coprocessors (such as GPUs)
+ * Apps may use coprocessors (such as GPUs).
  * The number of CPUs used by an app may be more or less than one, and it need not be an integer.
 
…
@@ -15,15 +14,25 @@
 == The way things used to work ==
 
-The old scheduling policy:
+The old scheduling policy is:
 
- * Make a list of runnable jobs, ordered by "importance" (as determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
+ * Order runnable jobs by "importance" (determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
  * Run jobs in order of decreasing importance. Skip those that would exceed RAM limits. Keep going until we're running NCPUS jobs.
 
-There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpoint -
+There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed recently -
 but that's the basic idea.
 
 == How things work in 6.3 ==
 
-Suppose we're on a machine with 1 CPU and 1 GPU,
+The main design goal of the new scheduler is to use all resources.
+In particular, we try to always use the GPU even if that means
+overcommitting the CPU.
+"Overcommitting" means running a set of apps whose demand for for CPUs exceeds
+the actual number of CPUs.
+
+The new policy is:
+ * Scan the set of runnable jobs in decreasing order of importance.
+ * If a job uses a resource that's not already fully utilized, and fits in RAM, run it.
+
+Example: suppose we're on a machine with 1 CPU and 1 GPU,
 and that we have the following runnable jobs (in order of decreasing importance):
 {{{
…
@@ -38,6 +47,12 @@
 and it seems like we should use it if at all possible.
 
-This leads to the following policy:
+The new policy will do the following:
 
+ * Run job 1.
+ * Skip job 2 because the CPU is already fully utilized.
+ * Run job 3 because the GPU is not fully utilized.
+
+So we end up running jobs whose CPU demand is 1.5.
+That's OK - they just run slower than if running alone.
 
 == Unresolved issues ==
…
@@ -53,2 +68,11 @@
 the GPU sits idle and the entire program runs slowly.
 
+The CPU scheduler on Windows doesn't work well,
+and when the CPU is overcommitted the CPU part of GPU applications
+doesn't run as often as it needs to in order to keep the GPU "fed".
+As a result the GPU is underutilized and the program runs slowly.
+(This seems to happen even if the GPU app is run at high priority
+while other apps run at low priority).
+
+If we can't resolve this we'll have to change the scheduling policy
+to avoid overcommitting the CPU in the presence of GPU apps.
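
The 6.3 policy added in version 2 of the page can be sketched in a few lines of Python. This is only an illustration, not the BOINC client's actual code. The `Job` fields, the `schedule` function, the RAM figures, and the three example jobs are all assumptions: the page's job table is elided from this diff, so the values below were chosen only so that the outcome matches the text (jobs 1 and 3 run, job 2 is skipped, total CPU demand is 1.5).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: float    # CPU demand; may be fractional
    gpus: float    # GPU demand; 0 for CPU-only jobs
    ram_mb: float  # hypothetical RAM estimate


def schedule(jobs, ncpus, ngpus, ram_limit_mb):
    """Sketch of the 6.3 policy; `jobs` must be in decreasing order of importance."""
    capacity = {"cpu": ncpus, "gpu": ngpus}
    used = {"cpu": 0.0, "gpu": 0.0}
    ram_used = 0.0
    running = []
    for job in jobs:
        demand = {"cpu": job.cpus, "gpu": job.gpus}
        # Run the job if at least one resource it uses is not yet fully utilized...
        uses_free_resource = any(
            demand[r] > 0 and used[r] < capacity[r] for r in demand
        )
        # ...and it fits in the remaining RAM.
        fits_in_ram = ram_used + job.ram_mb <= ram_limit_mb
        if uses_free_resource and fits_in_ram:
            running.append(job)
            for r in demand:
                used[r] += demand[r]  # usage may exceed capacity (overcommit)
            ram_used += job.ram_mb
    return running, used


# Assumed example jobs for a machine with 1 CPU and 1 GPU.
jobs = [
    Job("job 1", cpus=1.0, gpus=0.0, ram_mb=100),
    Job("job 2", cpus=1.0, gpus=0.0, ram_mb=100),
    Job("job 3", cpus=0.5, gpus=1.0, ram_mb=100),
]
running, used = schedule(jobs, ncpus=1, ngpus=1, ram_limit_mb=1000)
print([j.name for j in running], used["cpu"])
```

Under these assumptions the CPU ends up overcommitted (demand 1.5 on 1 CPU) so that the GPU stays busy. That is the trade-off the "Unresolved issues" section questions for Windows.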