133 | | The credit for a job J is proportional to PFC(J), |
134 | | but is normalized in the following ways: |
135 | | |
136 | | == ''A priori'' job size estimates == |
137 | | |
138 | | If we have an ''a priori'' estimate of job size, |
139 | | we can normalize by this to reduce the variance |
140 | | of various distributions (see below). |
141 | | This makes estimates of the means converge more quickly. |
142 | | |
143 | | We'll use workunit.rsc_fpops_est as this a priori estimate, |
144 | | and we'll denote it E(J). |
145 | | |
146 | | (''A posteriori'' estimates of job size may exist also, |
147 | | e.g., an iteration count reported by the app, |
148 | | but aren't cheat-proof; we don't use them.) |
| 151 | By default, the credit for a job J is proportional to PFC(J), |
| 152 | but is limited and normalized in the following ways: |
| 153 | |
| 154 | == Sanity check == |
| 155 | |
| 156 | If PFC(J) is infinite or is > wu.rsc_fpops_bound, |
| 157 | J is assigned a "default PFC" and other processing is skipped. |
| 158 | Default PFC is determined as follows: |
| 159 | |
| 160 | * If min_avg_pfc(A) is defined (see below) then |
| 161 | |
| 162 | D = min_avg_pfc(A) * E(J) |
| 163 | |
| 164 | * Otherwise |
| 165 | |
| 166 | D = wu.rsc_fpops_est |
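The sanity check can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name `sanity_checked_pfc` and its argument names are hypothetical, and E(J) is taken to be wu.rsc_fpops_est as defined earlier.

```python
import math
from typing import Optional, Tuple


def sanity_checked_pfc(
    pfc: float,
    rsc_fpops_est: float,
    rsc_fpops_bound: float,
    min_avg_pfc: Optional[float] = None,
) -> Tuple[float, bool]:
    """Return (PFC to use, whether the default PFC was substituted).

    Hypothetical helper: names are illustrative, not BOINC identifiers.
    """
    if math.isinf(pfc) or pfc > rsc_fpops_bound:
        if min_avg_pfc is not None:
            # D = min_avg_pfc(A) * E(J), with E(J) = wu.rsc_fpops_est
            return min_avg_pfc * rsc_fpops_est, True
        # D = wu.rsc_fpops_est
        return rsc_fpops_est, True
    # PFC passed the check; further normalization would apply to it.
    return pfc, False
```

When the second element of the result is true, the job gets the default PFC and the remaining processing is skipped.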
178 | | |
179 | | == Cross-project normalization == |
| 214 | * Cheating or erroneous hosts can influence PFC^mean^(V) to |
| 215 | some extent. |
| 216 | This is limited by the Sanity Check mechanism, |
| 217 | and by the fact that only validated jobs are used. |
| 218 | The effect on credit will be negated by host normalization |
| 219 | (see below). |
| 220 | There may be an effect on cross-version normalization. |
| 221 | This could be eliminated by computing PFC^mean^(V) |
| 222 | as the sample-median value of PFC^mean^(H, V) (see below). |
| 223 | |
| 224 | == Host normalization == |
| 225 | |
| 226 | The second normalization is across hosts. |
| 227 | Assume jobs for a given app are distributed uniformly among hosts. |
| 228 | Then the average credit per job should be the same for all hosts. |
| 229 | To ensure this, for each app version V and host H |
| 230 | we maintain PFC^mean^(H, V), |
| 231 | the average of PFC(J)/E(J) for jobs completed by H using V. |
| 232 | |
| 233 | This yields the host scaling factor |
| 234 | |
| 235 | Scale(H) = (PFC^mean^(V)/PFC^mean^(H, V)) |
| 236 | |
| 237 | There are some cases where hosts are not sent jobs uniformly: |
| 238 | |
| 239 | * job-size matching (smaller jobs sent to slower hosts) |
| 240 | * GPUGrid.net's scheme for sending some (presumably larger) |
| 241 | jobs to GPUs with more processors. |
| 242 | |
| 243 | The normalization by E(J) handles this |
| 244 | (assuming that wu.rsc_fpops_est is set appropriately). |
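As a rough illustration of the host scaling factor, the sketch below averages PFC(J)/E(J) over a host's jobs and divides the app version's mean by it. The function names and the sample numbers are hypothetical; a plain arithmetic mean is used here for simplicity, whereas the server uses the averaging scheme described under "Computing averages".

```python
from typing import Iterable, Tuple


def normalized_mean(jobs: Iterable[Tuple[float, float]]) -> float:
    """Mean of PFC(J)/E(J) over (pfc, fpops_est) pairs (simple average)."""
    ratios = [pfc / fpops_est for pfc, fpops_est in jobs]
    return sum(ratios) / len(ratios)


def host_scale(pfc_mean_version: float, pfc_mean_host: float) -> float:
    """Scale(H): the app version's mean divided by this host's mean."""
    return pfc_mean_version / pfc_mean_host


# Hypothetical host whose jobs differ in size; dividing by E(J)
# makes the per-job samples comparable.
host_jobs = [(2.0e12, 1.0e12), (8.0e12, 2.0e12)]  # ratios 2.0 and 4.0
host_mean = normalized_mean(host_jobs)            # 3.0
scale = host_scale(1.5, host_mean)                # 0.5
```

A host whose normalized PFC is twice the version average gets a scale of 0.5, so its claimed credit is halved.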
| 245 | |
| 246 | Notes: |
| 247 | * For some apps, the host normalization mechanism is prone to |
| 248 | a type of cheating called "cherry picking". |
| 249 | A mechanism for defeating this is described below. |
| 250 | * The host normalization mechanism reduces the claimed credit of hosts |
| 251 | that are less efficient than average, |
| 252 | and increases the claimed credit of hosts that are more efficient |
| 253 | than average. |
| 254 | |
| 255 | == Computing averages == |
| 256 | |
| 257 | Computation of averages needs to take into account: |
| 258 | |
| 259 | * The quantities being averaged may gradually change over time |
| 260 | (e.g. average job size may change) |
| 261 | and we need to track this. |
| 262 | This is done as follows: for the first N samples |
| 263 | (N = ~100 for app versions, ~10 for hosts) |
| 264 | we take the straight average. |
| 265 | After that we use an exponential average |
| 266 | (with appropriate alpha for app version and host). |
| 267 | |
| 268 | * A given sample may be wildly off, |
| 269 | and we can't let this mess up the average. |
| 270 | Non-first samples are capped at 10 times the current average. |
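The two rules above can be sketched as a small running-average class. This is an assumption-laden illustration: the class name, field names, and the alpha value in the usage example are hypothetical, and the real implementation may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class RunningAverage:
    """Straight average for the first N samples, exponential afterwards."""

    n_threshold: int          # N (e.g. ~100 for app versions, ~10 for hosts)
    alpha: float              # exponential weight; value is an assumption
    cap_factor: float = 10.0  # non-first samples capped at 10x the average
    n: int = 0
    avg: float = 0.0

    def update(self, sample: float) -> float:
        # Cap outliers, but never the very first sample (no average yet).
        if self.n > 0 and sample > self.cap_factor * self.avg:
            sample = self.cap_factor * self.avg
        self.n += 1
        if self.n <= self.n_threshold:
            # Incremental form of the straight (arithmetic) average.
            self.avg += (sample - self.avg) / self.n
        else:
            # Exponential average tracks gradual change over time.
            self.avg += self.alpha * (sample - self.avg)
        return self.avg
```

For example, with `n_threshold=2` and `alpha=0.5`, the samples 1.0, 3.0, 100.0 give averages 1.0, 2.0, and then 11.0, because the third sample is capped at 20.0 before the exponential update.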
| 271 | |
| 272 | == Anonymous platform == |
| 273 | |
| 274 | For anonymous platform apps, |
| 275 | since we don't reliably know anything about the devices involved, |
| 276 | we don't try to estimate PFC. |
| 277 | |
| 278 | For each app, we maintain min_avg_pfc(A), |
| 279 | the average PFC for the most efficient version of A. |
| 280 | |
| 281 | The claimed credit for anonymous platform jobs is |
| 282 | |
| 283 | claimed_credit^mean^(A)*E(J) |
| 284 | |
| 285 | The server maintains host_app_version records for anonymous platform, |
| 286 | and it keeps track of elapsed time statistics there. |
| 287 | These have app_version_id = -2 for CPU, -3 for NVIDIA GPU, -4 for ATI. |
| 288 | |
| 289 | == Claimed and granted credit == |
| 290 | |
| 291 | The '''claimed FLOPS''' for a given job J is |
| 292 | |
| 293 | F = PFC(J) * S(V) * S(H) |
| 294 | |
| 295 | and the claimed credit (in Cobblestones) is |
| 296 | |
| 297 | C = F*100/86400e9 |
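The two formulas above translate directly into code. The function names below are hypothetical, and the scale factors are passed in as plain numbers.

```python
def claimed_flops(pfc: float, scale_version: float, scale_host: float) -> float:
    """F = PFC(J) * S(V) * S(H)."""
    return pfc * scale_version * scale_host


def claimed_credit(flops: float) -> float:
    """C = F*100/86400e9, in Cobblestones."""
    return flops * 100 / 86400e9
```

With both scale factors equal to 1, a job with PFC(J) = 86400e9 (one day at 1 GFLOPS) claims 100 Cobblestones under this formula.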
| 298 | |
| 299 | When replication is used, |
| 300 | we take the set of hosts that |
| 301 | are not anonymous platform and not on scale probation (see below). |
| 302 | If this set is nonempty, we grant the average of their claimed credit. |
| 303 | Otherwise we grant |
| 304 | |
| 305 | claimed_credit^mean^(A)*E(J) |
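The granting rule for replicated jobs can be sketched as below. The `Claim` record and the function name are hypothetical; `fallback` stands for claimed_credit^mean^(A)*E(J), computed elsewhere.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Claim:
    """One replica's claim (hypothetical record, not a BOINC struct)."""

    credit: float
    anon_platform: bool = False
    on_scale_probation: bool = False


def granted_credit(claims: Sequence[Claim], fallback: float) -> float:
    """Average the claims of eligible hosts, else use the fallback."""
    eligible = [
        c.credit
        for c in claims
        if not c.anon_platform and not c.on_scale_probation
    ]
    if eligible:
        return sum(eligible) / len(eligible)
    # No eligible host: grant claimed_credit^mean^(A) * E(J).
    return fallback
```

Claims from anonymous platform hosts and hosts on scale probation are ignored unless no other claims exist.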
| 306 | |
| 307 | == Cross-project version normalization == |
222 | | |
223 | | == Host normalization == |
224 | | |
225 | | The second normalization is across hosts. |
226 | | Assume jobs for a given app are distributed uniformly among hosts. |
227 | | Then the average credit per job should be the same for all hosts. |
228 | | To ensure this, for each app version V and host H |
229 | | we maintain PFC^mean^(H, V), |
230 | | the average of PFC(J)/E(J) for jobs completed by H using V. |
231 | | |
232 | | This yields the host scaling factor |
233 | | |
234 | | Scale(H) = (PFC^mean^(V)/PFC^mean^(H, V)) |
235 | | |
236 | | There are some cases where hosts are not sent jobs uniformly: |
237 | | |
238 | | * job-size matching (smaller jobs sent to slower hosts) |
239 | | * GPUGrid.net's scheme for sending some (presumably larger) |
240 | | jobs to GPUs with more processors. |
241 | | |
242 | | The normalization by E(J) handles this |
243 | | (assuming that wu.rsc_fpops_est is set appropriately). |
244 | | |
245 | | Notes: |
246 | | * The host normalization mechanism reduces the claimed credit of hosts |
247 | | that are less efficient than average, |
248 | | and increases the claimed credit of hosts that are more efficient |
249 | | than average. |
250 | | |
251 | | == Claimed credit == |
252 | | |
253 | | The '''claimed FLOPS''' for a given job J is then |
254 | | |
255 | | F = PFC(J) * S(V) * S(H) |
256 | | |
257 | | and the claimed credit (in Cobblestones) is |
258 | | |
259 | | C = F*100/86400e9 |
260 | | |
261 | | == Computing averages == |
262 | | |
263 | | We need to compute averages carefully because |
264 | | |
265 | | * The quantities being averaged may gradually change over time |
266 | | (e.g. average job size may change) |
267 | | and we need to track this. |
268 | | * A given sample may be wildly off, |
269 | | and we can't let this mess up the average. |
270 | | |
271 | | The code that does this is |
272 | | [http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here]. |
273 | | |
274 | | == Anonymous platform == |
275 | | |
276 | | For anonymous platform apps, |
277 | | since we don't reliably know anything about the devices involved, |
278 | | we don't try to estimate PFC. |
279 | | |
280 | | For each app, we maintain min_avg_pfc(A), |
281 | | the average PFC for the most efficient version of A. |
282 | | |
283 | | The claimed credit for anonymous platform jobs is |
284 | | |
285 | | claimed_credit^mean^(A)*E(J) |
286 | | |
287 | | The server maintains host_app_version records for anonymous platform, |
288 | | and it keeps track of elapsed time statistics there. |
289 | | These have app_version_id = -2 for CPU, -3 for NVIDIA GPU, -4 for ATI. |
290 | | |
291 | | == Replication == |
292 | | |
293 | | We take the set of hosts that |
294 | | are not anon platform and not on scale probation (see below). |
295 | | If this set is nonempty, we grant the average of their claimed credit. |
296 | | Otherwise we grant |
297 | | |
298 | | claimed_credit^mean^(A)*E(J) |