=== Minimizing makespan ===

BOINC's scheduler should be modified to reduce makespan.
The scheduler has control over two things:

* which hosts to send each job's instances to (and when to create new instances)
* what deadline to assign to each job

A variety of techniques are possible:

* Use slower hosts for the first jobs of a batch,
with the expectation that the host will finish one job
in the duration of the batch
(and by extension, very slow hosts should possibly
not get any jobs for some batches).
* Use faster, more reliable hosts for the last jobs of a batch.
* Use increased replication for the last jobs of a batch
to reduce the expected minimum turnaround.

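As a concrete illustration, the host-selection ideas above could be sketched as follows. This is a minimal sketch, not BOINC code: the host and job representations, the tail fraction, and the replication factor are all illustrative assumptions.

```python
# Hypothetical policy sketch: slower hosts get the early jobs of a batch,
# the fastest hosts get the tail, and tail jobs are replicated to cut
# the expected minimum turnaround. All names here are illustrative.

def assign_batch(jobs, hosts, tail_fraction=0.2, tail_replication=2):
    """Return a list of (job, host) dispatch pairs for one batch."""
    hosts_by_speed = sorted(hosts, key=lambda h: h["flops"])  # slowest first
    n_tail = max(1, int(len(jobs) * tail_fraction))
    head, tail = jobs[:-n_tail], jobs[-n_tail:]

    dispatches = []
    # Early jobs: cycle through the slower half of the population.
    slow = hosts_by_speed[: max(1, len(hosts_by_speed) // 2)]
    for i, job in enumerate(head):
        dispatches.append((job, slow[i % len(slow)]))
    # Tail jobs: replicate on the fastest hosts.
    fast = list(reversed(hosts_by_speed))[:tail_replication]
    for job in tail:
        for host in fast:
            dispatches.append((job, host))
    return dispatches
```

A real policy would also exclude hosts too slow to finish even one job within the expected batch duration, per the first bullet above.
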
These techniques can be combined into policies.
To study the relative performance of these policies,
we can develop a simulator that takes as input a description
of a real volunteer host population (e.g., SETI@home)
and the parameters of a random batch arrival process,
and computes the statistics of makespan,
and perhaps the difference between predicted and actual makespan (see below).

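A first cut at such a simulator might look like this. It is a toy Monte Carlo sketch under stated assumptions: greedy list scheduling stands in for the real dispatch policy, and uniform availability noise stands in for a trace-driven host model.

```python
# Toy Monte Carlo makespan simulator. Host speeds and the availability
# model are stand-ins for a real volunteer-population trace.
import random

def simulate_makespan(host_flops, batch_size, job_flops, trials=200, seed=1):
    """Estimate (median, worst-case) makespan for one batch."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Greedy list scheduling: each job goes to the host that frees first.
        free_at = [0.0] * len(host_flops)
        for _ in range(batch_size):
            i = min(range(len(free_at)), key=lambda k: free_at[k])
            # Availability noise: hosts are volunteers, not dedicated.
            slowdown = 1.0 + rng.random()
            free_at[i] += slowdown * job_flops / host_flops[i]
        samples.append(max(free_at))
    samples.sort()
    return samples[len(samples) // 2], samples[-1]
```

Feeding it a real host census and a batch arrival process would yield the makespan statistics described above.
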
=== Estimating makespan ===

Given the description of a possible new batch,
the system should be able to provide a reasonably accurate estimate
of its makespan (perhaps with error bars).
The computation of this estimate must take into account:

* The computing resource (i.e., the distributions of speed and availability
of the volunteer hosts, and their number)
* The existing workload (batches currently in progress, and their state)
* The parameters of the batch
* The computing share and debt of the requesting user
* The batch scheduling policy
In addition, the system should provide users with a web-based interface
for viewing batches in progress,
showing their fraction done,
an updated estimate of their completion time,
information about errors,
and a button for aborting the batch.

=== Job DAGs ===

Batches are a special case of Directed Acyclic Graphs (DAGs) of jobs.
The nodes in such a graph represent jobs;
there is an edge from job A to job B if A must be completed
before B can be started.

A batch can be viewed as a graph consisting of
a "start node" fanning out to N job nodes,
then fanning back in to a "finish node".

Science portals may offer the ability to submit collections of work
that can be described as DAGs but not as batches:
for example, a collection of units, each of which consists of
running program A and using its output as input to program B.

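In code, the dependency bookkeeping is straightforward. The sketch below is illustrative: `ready_jobs` and the job names are hypothetical, not part of BOINC.

```python
# Jobs-as-DAG sketch: each job maps to the list of jobs that must finish
# before it can start. A batch is the fan-out/fan-in special case; the
# portal example is a two-node A -> B chain per unit.

def ready_jobs(deps, done):
    """Jobs whose predecessors are all complete and which aren't done yet."""
    return [j for j, preds in deps.items()
            if j not in done and all(p in done for p in preds)]

# A batch of 3 jobs as a fan-out/fan-in DAG:
batch = {
    "start": [],
    "job1": ["start"], "job2": ["start"], "job3": ["start"],
    "finish": ["job1", "job2", "job3"],
}

# One "run A, feed its output to B" unit from the portal example:
unit = {"A": [], "B": ["A"]}
```

The scheduler would dispatch whatever `ready_jobs` returns, marking jobs done as results come back.
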
(Note: BOINC's wrapper mechanism, which allows multiple applications
to be run in sequence as part of a BOINC job,
supports some simple cases, but it is not general because it
requires dependent jobs to run on a single host.)

As an extension of the work described above,
we could support DAGs rather than just batches.

== Combining multiple scheduling types ==

BOINC potentially supports the following types of work,
each with its own scheduling mechanism:

* Job-stream: the input is a stream of jobs;
the goal is to maximize throughput and eventually finish all jobs.
* Batch: described above.
* Locality: a variant of job-stream in which
jobs are assigned according to the files resident on the client.
* Co-scheduling: like batch, except all jobs in each batch
must run simultaneously (e.g., MPI applications).

A single project (such as a portal) may have work of some or all types.

How should we handle multiple scheduling types -
in particular, how can we maintain resource shares in their presence?

| 158 | | |
| 159 | | One simple approach is to maintain, for each scheduling type T, |
| 160 | | the maximum debt D(T) of any user with an unsent job of type T. |
| 161 | | Then use this top-level policy: |
| 162 | | {{{ |
| 163 | | for each scheduling type T in order of descending D(T) |
| 164 | | send jobs of type T |
| 165 | | }}} |
How to maintain D(T):

* Job-stream: maintain in the feeder (consider only jobs in the cache)
* Locality: constraint: only one user can use locality scheduling
* Batch: maintain in the batch module
* Co-scheduling: maintain in the co-scheduling module

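The top-level loop could be sketched as follows. The module interface shown here (a pair of `max_debt`/`send_jobs` callables per type) is a hypothetical stand-in for the real feeder, batch, and co-scheduling modules.

```python
# Sketch of the top-level policy: each scheduling type reports the maximum
# debt D(T) among users with unsent jobs of that type, and types are
# served in descending order of that debt. Modules are stubbed out.

def dispatch(request, modules):
    """modules: dict mapping type name -> (max_debt_fn, send_jobs_fn)."""
    order = sorted(modules.items(),
                   key=lambda kv: kv[1][0](),  # D(T) for this type
                   reverse=True)
    sent = []
    for type_name, (_, send_jobs) in order:
        sent.extend(send_jobs(request))
    return sent
```
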
== Maintaining debts ==

A user's debt is adjusted each time a job is dispatched,
based on the estimated FLOPS of the job.
This quantity is recorded in the job record.

When a job finishes or times out
(at which point the actual computing done is known),
the user's debt is adjusted accordingly.

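The two adjustments could be sketched as follows. This is illustrative: `UserDebt`, the job dict, and the sign convention (dispatching work decreases debt) are assumptions, not actual BOINC structures.

```python
# Debt maintenance sketch: debt is decremented by the estimated FLOPS at
# dispatch (the estimate is stored with the job, as described above), then
# corrected once the actual work done is known at completion or timeout.

class UserDebt:
    def __init__(self):
        self.debt = 0.0

    def on_dispatch(self, job):
        # Record the charge in the job record so it can be undone later.
        job["est_flops_charged"] = job["est_flops"]
        self.debt -= job["est_flops"]

    def on_finish(self, job, actual_flops):
        # Undo the estimated charge, then charge what was actually computed.
        self.debt += job["est_flops_charged"] - actual_flops
```
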
== An idea for prioritizing batches ==
The scheme doesn't use the idea of
accumulated credit proposed above.
This is replaced by LST(U).