If two projects have the same resource share,
they should have the same RAC.
This suggests the following principle,
which can apply to both work fetch and job scheduling:
scheduling decisions should give preference to projects
whose share of RAC is less than their resource share.
Concretely:

 * Normalize RAC and resource share so that each sums to 1 across projects.
 * For a project P, let G(P) = share(P) - RAC(P).
 * Give priority to projects for which G(P) is highest,
   i.e. those that aren't getting as much credit as they should.

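The three steps above can be sketched as follows. This is a minimal illustration in Python, not BOINC client code: `priority_order` and its dict-based inputs are hypothetical, and the shares and RAC values in the example are made up.

```python
def priority_order(share, rac):
    """Rank projects by G(P) = share(P) - RAC(P), highest first.

    share, rac: dicts mapping project name -> raw resource share / raw RAC.
    Both are normalized to sum to 1 across projects before G(P) is computed.
    Returns (ordered project names, dict of G values).
    """
    total_share = sum(share.values())
    total_rac = sum(rac.get(p, 0.0) for p in share)
    g = {}
    for p in share:
        norm_share = share[p] / total_share
        # If no project has any RAC yet, treat every RAC fraction as 0.
        norm_rac = rac.get(p, 0.0) / total_rac if total_rac > 0 else 0.0
        g[p] = norm_share - norm_rac
    # Highest G(P) first: these projects are furthest below their share of credit.
    order = sorted(share, key=lambda p: g[p], reverse=True)
    return order, g


order, g = priority_order(
    share={"A": 100, "B": 100, "C": 200},
    rac={"A": 500, "B": 1500, "C": 2000},
)
print(order)  # ['A', 'C', 'B']
```

Because both quantities are normalized, the G(P) values sum to zero once any project has nonzero RAC: projects above their share of credit are balanced by projects below it.
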
This does two things:

 * It gives resource share the correct semantics:
   resource shares now control something that volunteers can actually see,
   namely credit.
 * It penalizes projects that grant inflated credit:
   the more credit a project grants, the less work a given host
   will do for it, assuming the host is attached to multiple projects.
   (The converse is also potentially true: a project would get more work done
   by granting less credit. This effect is minimized by a mechanism described below.)

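To see why this penalizes inflated credit, consider a toy simulation (a hypothetical illustration, not a model of the real client): a host hands out time slices to two projects with equal resource share, always choosing the project with the highest G(P), and project B grants twice as much credit per slice as project A. As a simplifying assumption, cumulative credit stands in for RAC, so there is no time decay; `simulate` is a made-up name.

```python
def simulate(credit_per_slice, share, slices):
    """Greedy scheduler: each slice goes to the project with the highest G(P).

    credit_per_slice: credit each project grants for one slice of work.
    Cumulative credit is used in place of RAC (no averaging or decay).
    Returns the number of slices each project received.
    """
    credit = {p: 0.0 for p in share}
    work = {p: 0 for p in share}
    total_share = sum(share.values())

    def g(p):
        # G(P) = normalized share - normalized credit (0 if nobody has credit yet).
        total_credit = sum(credit.values())
        frac = credit[p] / total_credit if total_credit > 0 else 0.0
        return share[p] / total_share - frac

    for _ in range(slices):
        chosen = max(share, key=g)  # ties go to the first project listed
        work[chosen] += 1
        credit[chosen] += credit_per_slice[chosen]
    return work


work = simulate(credit_per_slice={"A": 1.0, "B": 2.0},
                share={"A": 1, "B": 1}, slices=300)
print(work)  # {'A': 200, 'B': 100}
```

Both projects end up with equal credit (200 each), so B, which grants twice as much credit per slice, receives only half as many slices as A.
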
Note: I've glossed over the issue of the time scale over which RAC is averaged.
The RAC reported by servers has a half-life of a week.
For scheduling purposes, a different (probably longer) period would be better.
The client could potentially compute its own RAC
based on changes in total credit.
However, it's probably OK to just use the server-reported RAC.

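One way the client could compute its own RAC from changes in total credit is an exponentially decaying average with a configurable half-life. The sketch below is an assumption, not necessarily the formula BOINC servers use; `RecentAverage` and its methods are hypothetical. Each call supplies a project's current total credit, and the increase since the previous call is treated as credit earned uniformly over the elapsed interval.

```python
import math

SECONDS_PER_DAY = 86400.0


class RecentAverage:
    """Exponentially decaying average of credit per day, fed with total credit."""

    def __init__(self, half_life_days):
        self.half_life = half_life_days * SECONDS_PER_DAY  # in seconds
        self.avg = 0.0          # recent average credit per day
        self.last_time = None   # time of the previous sample (seconds)
        self.last_total = 0.0   # total credit at the previous sample

    def update(self, now, total_credit):
        """Feed the current total credit at time `now` (seconds); return the average."""
        if self.last_time is None:
            # First sample: only record the baseline.
            self.last_time, self.last_total = now, total_credit
            return self.avg
        dt = now - self.last_time
        if dt <= 0:
            return self.avg  # ignore samples that don't advance time
        # Credit earned since the last sample, as a rate in credit per day.
        rate = (total_credit - self.last_total) / (dt / SECONDS_PER_DAY)
        # The old average's weight halves every half_life seconds.
        weight = math.exp(-dt * math.log(2) / self.half_life)
        self.avg = self.avg * weight + (1.0 - weight) * rate
        self.last_time, self.last_total = now, total_credit
        return self.avg


# Hypothetical usage: a project granting 100 credits/day, sampled daily,
# averaged with a 30-day half-life instead of the servers' one week.
rac = RecentAverage(half_life_days=30)
for day in range(301):
    rac.update(day * SECONDS_PER_DAY, total_credit=100.0 * day)
print(round(rac.avg, 1))  # 99.9 (ten half-lives: 100 * (1 - 2**-10))
```

A longer half-life makes the average slower to react but less sensitive to bursty credit granting, which is the trade-off the note above alludes to.
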
=== Recent average FLOPS ===

There are some problems with using project-granted credit
as the basis for credit-driven scheduling:

To deal with these issues,
I propose using not just RAC by itself,
but the combination of RAC and '''recent average FLOPS''' (RAF) per project.
This is intended to address the two issues above,
as well as the issue of projects that grant too little credit.

Hence we will use a surrogate called '''estimated credit'''
that is maintained by the client.
If projects grant credit fairly, and if all jobs validate,
then estimated credit is roughly equal to granted credit over the long term.

Note: there is a potential advantage to using granted credit too.
As noted above, doing so penalizes projects that grant inflated credit:
a host attached to multiple projects does less work for a project
the more credit that project grants.
(The converse is also potentially true: a project would get more work done
by granting less credit. This effect could be minimized by
combining estimated credit with granted credit.)

=== Estimated credit ===

BOINC server software grants credit on the basis of peak FLOPS,
with a scaling factor applied to GPUs to normalize them relative to CPUs.
The normalized peak FLOPS of a GPU can be estimated.

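The estimated credit for a T-second segment of job execution is given by
{{{
T * ninstances(P) * peak_flops(P)
}}}
summed over the processor types P used by the job,
where ninstances(P) is the number of instances of type P that the job uses
and peak_flops(P) is the normalized peak FLOPS of one such instance.
(Here P denotes a processor type, not a project.)
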
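The formula can be sketched as a small function. `segment_estimated_credit` is a hypothetical name, and the processor-type keys and FLOPS figures in the example are made-up values. The result is in floating-point operations rather than credit units; since scheduling compares only shares normalized across projects, a constant conversion factor would cancel out.

```python
def segment_estimated_credit(t_seconds, usage):
    """Estimated credit for a T-second segment of one job.

    usage maps processor type -> (ninstances, normalized peak FLOPS per instance).
    Returns the sum over processor types of T * ninstances(P) * peak_flops(P).
    """
    return sum(t_seconds * n * peak for n, peak in usage.values())


# A job using 1 CPU core and 1 GPU for an hour (illustrative numbers;
# the GPU figure is assumed to already include the GPU scaling factor).
credit = segment_estimated_credit(3600, {
    "cpu": (1, 3.0e9),
    "gpu": (1, 1.0e11),
})
print(credit)
```

Here the GPU term dominates (3.6e14 of the 3.708e14 total), which is why the GPU scaling factor matters so much for this estimate.
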
The '''recent estimated credit''' REC(P) of a project P
is maintained by the client,
with an averaging half-life of, say, a month.
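
A client could maintain REC(P) by decaying every project's value over time and adding each segment's estimated credit to the project that ran it. This is a minimal sketch under stated assumptions: it keeps an exponentially decayed sum of estimated credit rather than a per-day rate (REC is only used after normalizing across projects, so the overall scale does not matter), time starts at 0, and `RecTracker` and its methods are hypothetical names.

```python
class RecTracker:
    """Per-project recent estimated credit (REC) with exponential decay."""

    def __init__(self, half_life_seconds):
        self.half_life = half_life_seconds
        self.rec = {}          # project -> decayed sum of estimated credit
        self.last_time = 0.0   # time up to which self.rec has been decayed

    def _decay_to(self, now):
        if now <= self.last_time:
            return
        # Every project's REC halves for each half-life that elapses.
        factor = 0.5 ** ((now - self.last_time) / self.half_life)
        for p in self.rec:
            self.rec[p] *= factor
        self.last_time = now

    def add_segment(self, now, project, estimated_credit):
        """Credit `project` with the estimated credit of a segment ending at `now`."""
        self._decay_to(now)
        self.rec[project] = self.rec.get(project, 0.0) + estimated_credit

    def fractions(self, now):
        """REC normalized to sum to 1 across projects, e.g. for computing G(P)."""
        self._decay_to(now)
        total = sum(self.rec.values())
        return {p: (v / total if total > 0 else 0.0) for p, v in self.rec.items()}


MONTH = 30 * 86400.0  # "say, a month", in seconds
tracker = RecTracker(MONTH)
tracker.add_segment(0.0, "A", 100.0)
tracker.add_segment(MONTH, "B", 100.0)  # A's 100 has decayed to 50 by now
print(tracker.fractions(MONTH))  # A about 1/3, B about 2/3
```

Equal estimated credit earned a month apart thus counts 1:2 in favor of the more recent work.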