=== Minimizing makespan ===

BOINC's scheduler should be modified to reduce makespan.
The scheduler has control over two things:

* what hosts to send which job instances to (and when to create new instances)
* what deadline to assign to each job

A variety of techniques are possible:

* Use slower hosts for the first jobs of a batch,
with the expectation that each such host will finish one job
within the duration of the batch
(and, by extension, very slow hosts should possibly
not get any jobs for some batches).
* Use faster, more reliable hosts for the last jobs of a batch.
* Use increased replication for the last jobs of a batch
to reduce the expected minimum turnaround.

These techniques can be combined into policies.
To study the relative performance of these policies,
we can develop a simulator that takes as input a description
of a real volunteer host population (e.g., SETI@home)
and the parameters of a random batch arrival process,
and computes statistics of makespan,
and perhaps of the difference between predicted and actual makespan (see below).

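To make the first technique concrete, here is a minimal simulation sketch in Python. All names and the dispatch model are illustrative, not BOINC code: each host runs its assigned jobs sequentially, and we compare a speed-blind even split of the batch against a speed-proportional split.

```python
def makespan(assignment, speeds, job_flops):
    # assignment[i] = number of jobs sent to host i; each host runs
    # its jobs sequentially, so its finish time is count * flops / speed
    return max(n * job_flops / s for n, s in zip(assignment, speeds))

def equal_split(n_jobs, n_hosts):
    # ignore host speed: spread jobs as evenly as possible
    base, extra = divmod(n_jobs, n_hosts)
    return [base + (1 if i < extra else 0) for i in range(n_hosts)]

def speed_proportional(n_jobs, speeds):
    # give each host a job count proportional to its speed;
    # leftover jobs go to the fastest hosts
    total = sum(speeds)
    alloc = [int(n_jobs * s // total) for s in speeds]
    leftover = n_jobs - sum(alloc)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:leftover]:
        alloc[i] += 1
    return alloc
```

With hosts of speed 1, 2, and 4 and seven 4-FLOP jobs, the even split yields a makespan of 12 time units while the proportional split yields 4 — which is why very slow hosts should get few (or no) jobs of a batch.
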
=== Estimating makespan ===

Given the description of a possible new batch,
the system should be able to provide a reasonably accurate estimate
of its makespan (perhaps with error bars).
The computation of this estimate must take into account:

* the computing resource (i.e., the distributions of speed and availability
of the volunteer hosts, and their number)
* the existing workload (batches currently in progress, and their state)
* the parameters of the batch
* the computing share and debt of the requesting user
* the batch scheduling policy

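One way to produce an estimate with error bars is Monte Carlo sampling over the host population. The sketch below makes strong simplifying assumptions (each host is either up for the whole batch or absent, and work divides fluidly by speed); all names are hypothetical:

```python
import random
import statistics

def estimate_makespan(speeds, availabilities, n_jobs, job_flops, trials=1000):
    # speeds[i]: host i's speed in FLOPS; availabilities[i]: probability
    # that host i is up for the whole batch (a crude availability model)
    samples = []
    for _ in range(trials):
        up = [s for s, a in zip(speeds, availabilities) if random.random() < a]
        if not up:
            continue
        # fluid lower bound: total work over total available speed,
        # but no batch finishes faster than one job on the fastest host
        samples.append(max(n_jobs * job_flops / sum(up), job_flops / max(up)))
    return statistics.mean(samples), statistics.stdev(samples)
```

The returned standard deviation serves as the error bar; a richer model would also sample the existing workload and the batch scheduling policy listed above.
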
In addition, the system should provide users with a web-based interface
for viewing batches in progress,
showing their fraction done,
an updated estimate of their completion time,
information about errors,
and a button for aborting the batch.

=== Job DAGs ===

Batches are a special case of Directed Acyclic Graphs (DAGs) of jobs.
The nodes in such a graph represent jobs;
there is an edge from job A to job B if A must be completed
before B can be started.

A batch can be viewed as a graph consisting of
a "start node" fanning out to N job nodes,
then fanning back in to a "finish node".

Science portals may offer the ability to submit collections of work
that can be described as DAGs but not as batches:
for example, a collection of units, each of which consists of
running program A and using its output as input to program B.

(Note: BOINC's wrapper mechanism, which allows multiple applications
to be run in sequence as part of a BOINC job,
supports some simple cases, but it is not general because it
requires dependent jobs to run on a single host.)

As an extension of the work described above,
we could support DAGs rather than just batches.

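To illustrate the batch-as-DAG view, here is a sketch (a hypothetical representation, not BOINC's) of the fan-out/fan-in graph, plus a function that computes which jobs are dispatchable given the set of completed jobs:

```python
from collections import defaultdict

def batch_dag(n):
    # a batch of n jobs: "start" fans out to the job nodes,
    # which fan back in to "finish"
    edges = [("start", f"job{i}") for i in range(n)]
    edges += [(f"job{i}", "finish") for i in range(n)]
    return edges

def ready_jobs(edges, completed):
    # a job is ready when all of its predecessors have completed
    # and it has not itself completed yet
    preds = defaultdict(set)
    nodes = set()
    for a, b in edges:
        preds[b].add(a)
        nodes.update((a, b))
    return {n for n in nodes if n not in completed and preds[n] <= completed}
```

Dispatching repeatedly from ready_jobs until it is empty handles any DAG, not just batches; the run-A-then-B unit from the portal example is simply the two-node chain [("A", "B")].
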
== Combining multiple scheduling types ==

BOINC potentially supports the following types of work,
each with its own scheduling mechanism:

* Job-stream: the input is a stream of jobs;
the goal is to maximize throughput and eventually finish all jobs.
* Batch: described above.
* Locality: a variant of job-stream in which
jobs are assigned according to the files resident on the client.
* Co-scheduling: like batch, except all jobs in each batch
must run simultaneously (e.g., MPI applications).

A single project (such as a portal) may have work of some or all types.

How should we handle multiple scheduling types?
In particular, how can we maintain resource shares in their presence?

One simple approach is to maintain, for each scheduling type T,
the maximum debt D(T) of any user with an unsent job of type T.
Then use this top-level policy:
{{{
for each scheduling type T in order of descending D(T)
    send jobs of type T
}}}
How to maintain D(T):

* Job-stream: maintain in the feeder (consider only jobs in the cache)
* Locality: constraint: only one user can use locality scheduling
* Batch: maintain in the batch module
* Co-scheduling: maintain in the co-scheduling module

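The top-level policy above can be sketched directly (the data structures are hypothetical; send_jobs_of_type stands in for each type's own dispatch mechanism):

```python
def dispatch(max_debt, send_jobs_of_type):
    # max_debt[T] = D(T): the maximum debt of any user with an
    # unsent job of scheduling type T; visit types in descending D(T)
    for sched_type in sorted(max_debt, key=max_debt.get, reverse=True):
        send_jobs_of_type(sched_type)
```

For example, with D(batch) = 7, D(job-stream) = 3, and D(locality) = 1, batch jobs are sent first, so the most under-served user's work type is always favored.
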
== Maintaining debts ==

A user's debt is adjusted each time a job is dispatched,
based on the estimated FLOPS of the job.
This quantity is recorded in the job record.

When a job finishes or times out
(at which point the actual computing done is known),
the user's debt is adjusted accordingly.

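A minimal sketch of this accounting (field and function names are illustrative, and the sign convention — dispatching work consumes a user's debt — is an assumption): charge the estimate at dispatch, then correct the charge once the actual FLOPS are known.

```python
def on_dispatch(debts, user, job, est_flops):
    # charge the user's debt by the job's estimated FLOPS
    # and record the estimate in the job record
    debts[user] -= est_flops
    job["est_flops"] = est_flops

def on_finish(debts, user, job, actual_flops):
    # job finished or timed out: the actual computing done is known,
    # so refund the recorded estimate and charge the actual amount
    debts[user] += job["est_flops"] - actual_flops
```

Because the estimate is stored in the job record, the correction works even if the estimation formula changes between dispatch and completion.
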
== An idea for prioritizing batches ==

The scheme doesn't use the idea of
accumulated credit proposed above.
This is replaced by LST(U).