* Eliminate the need for projects to supply FLOPs estimates for jobs
* Eliminate the need for projects to supply FLOPS estimates for app versions (in app_plan())
* Improve job completion time estimates by using different duration correction factors for different app versions
* Simplify the credit-granting policy

== App units ==

'''App units''' are a project-defined measure of how much computation a completed job did.
They are typically a count of iterations of the app's main loop.
They should be approximately proportional to FLOPs performed,
but it doesn't matter what the constant of proportionality is
(i.e., you don't have to count the FLOPs in your main loop).

Apps use a new API to report how many app units they've performed
(call it just before boinc_exit()):
{{{
void boinc_app_units(double x);
}}}
This call is optional (the default is one app unit per job).
However, if an app's jobs have highly variable size and don't report app units,
BOINC will tend to give too little credit for them.

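A minimal sketch of how an app might count and report app units, assuming one app unit per main-loop iteration; run_main_loop() and the commented-out work function are hypothetical, and only boinc_app_units() and boinc_exit() are the API calls described above:
{{{
// Hypothetical app main loop: counts one app unit per iteration.
double run_main_loop(int n_iterations) {
    double app_units = 0;
    for (int i = 0; i < n_iterations; i++) {
        // do_one_iteration();   // the app's real work (hypothetical)
        app_units += 1;          // one app unit per main-loop iteration
    }
    return app_units;
}

// In the app's main(), just before exiting:
//     boinc_app_units(run_main_loop(n_iterations));
//     boinc_exit(0);
}}}
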
The following API functions are deprecated:
{{{
boinc_ops_per_cpu_second()
boinc_ops_cumulative()
}}}

Job submission has a new optional argument '''predicted_app_units''':
the predicted number of app units the job will take.
The default is one.
The '''rsc_fpops_est''' and '''rsc_fpops_bound''' arguments
to job creation are deprecated.

The predictions don't have to be exact;
it's OK if they're systematically too high or low,
as long as they're linearly correlated with the actual app units.
If they're not, bad completion time estimates will result.

The server maintains the mean and variance of both actual and predicted app units,
and scales predictions by the ratio of the means.
It also maintains '''au_variance''', the variance of (actual/predicted),
which is used for completion time estimates and bounds.

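One way the server might maintain these statistics incrementally is sketched below, using Welford's online algorithm; the struct and member names are assumptions, not actual BOINC code:
{{{
// Running mean/variance via Welford's online algorithm.
struct RunningStats {
    long long n = 0;
    double mean = 0, m2 = 0;
    void add(double x) {
        n++;
        double d = x - mean;
        mean += d / n;
        m2 += d * (x - mean);
    }
    double variance() const { return n > 1 ? m2 / (n - 1) : 0; }
};

// Per-app statistics (hypothetical layout).
struct AppStats {
    RunningStats actual, predicted, ratio;   // ratio = actual/predicted

    void on_job_reported(double actual_aus, double predicted_aus) {
        actual.add(actual_aus);
        predicted.add(predicted_aus);
        ratio.add(actual_aus / predicted_aus);
    }
    // Scale a prediction by the ratio of the means.
    double scaled_prediction(double predicted_aus) const {
        if (predicted.mean == 0) return predicted_aus;
        return predicted_aus * (actual.mean / predicted.mean);
    }
    double au_variance() const { return ratio.variance(); }
};
}}}
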
== Job completion time estimates and bounds ==

The BOINC client maintains a per-app-version estimate '''seconds_per_app_unit'''.
The completion time estimate of a job J is
{{{
seconds_per_app_unit
* J.predicted_aus
* (app.mean_actual_aus/app.mean_predicted_aus + X*au_stdev)
}}}
where X is, say, 2 (meaning that for about 95% of jobs, the actual completion time
will be less than the estimate).

The completion time bound for a job is the same formula
with X=4 or so (meaning that 99.99% of jobs will complete without
being aborted due to the time limit).

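The estimate and the bound can be transcribed into one function, parameterized by X; all names below are illustrative:
{{{
#include <cmath>

// Completion time estimate for a job, per the formula above.
// Pass x = 2 for the estimate, x = 4 or so for the bound.
double completion_time_estimate(
    double seconds_per_app_unit,   // per-app-version, maintained by the client
    double predicted_aus,          // J.predicted_aus
    double mean_actual_aus,        // app.mean_actual_aus
    double mean_predicted_aus,     // app.mean_predicted_aus
    double au_variance,            // variance of (actual/predicted)
    double x)
{
    double au_stdev = std::sqrt(au_variance);
    return seconds_per_app_unit * predicted_aus
        * (mean_actual_aus / mean_predicted_aus + x * au_stdev);
}
}}}
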
== Estimating FLOPS per app unit ==

For credit-granting purposes
we want to estimate the number of FLOPs per app unit.

When the scheduler dispatches a job,
it knows the number of CPUs and GPUs used,
as well as the peak hardware FLOPS of each device
(for CPUs, this is the Whetstone benchmark result;
for GPUs, it's derived from hardware parameters).
We define '''peak_flops_per_sec(J)'''
as the sum, over these devices, of device count times peak hardware FLOPS.

The server sends the client peak_flops_per_sec(J).
When the client returns a job, it includes a value
'''raw_flops_per_sec(J)'''.
This is usually the same as peak_flops_per_sec(J),
but it may be less (see note 2 below).

Suppose a job J is executed on a given host using app version V,
and that it reports A app units and uses elapsed time T.
We then define '''raw_flops(J)''' as T * raw_flops_per_sec(J),
and '''raw_flops_per_au(J)''' as raw_flops(J)/A.

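These definitions transcribe directly into code; the function names are illustrative:
{{{
// raw_flops(J) = T * raw_flops_per_sec(J)
double raw_flops(double elapsed_time, double raw_flops_per_sec) {
    return elapsed_time * raw_flops_per_sec;
}

// raw_flops_per_au(J) = raw_flops(J) / A
double raw_flops_per_au(double elapsed_time, double raw_flops_per_sec,
                        double app_units) {
    return raw_flops(elapsed_time, raw_flops_per_sec) / app_units;
}
}}}
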
If we run jobs on lots of different hosts,
the distribution of raw_flops_per_au(J) will have variance,
because applications run with different efficiency on different hardware.
Suppose an app is memory-intensive.
Running on machine A with a fast CPU (say 10 GFLOPS Whetstone) and a slow memory system,
it might do 1 GFLOPS (10% efficiency),
while on machine B with a slow CPU (say 1 GFLOPS Whetstone) and a fast memory system,
it might also do 1 GFLOPS (100% efficiency).
A and B would then report raw_flops_per_au(J) values in the ratio 10:1.

In the absence of FLOPS counting, we can't estimate the true FLOPs per app unit.
The best we can do is use raw_flops_per_au(J) from the most efficient hosts.
So we use an order statistic,
say the 5th-percentile value, of the distribution of raw_flops_per_au(J).
We'll call this '''est_flops_per_au(app)'''.

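A sketch of the order-statistic step, assuming the samples are held in a simple array; the function name and the exact percentile indexing convention are illustrative:
{{{
#include <algorithm>
#include <vector>

// 5th-percentile value of raw_flops_per_au samples.
// Takes the vector by value, since nth_element reorders it.
double est_flops_per_au(std::vector<double> samples) {
    size_t k = samples.size() / 20;   // index of the 5th-percentile value
    std::nth_element(samples.begin(), samples.begin() + k, samples.end());
    return samples[k];                // k-th smallest sample
}
}}}
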
Notes:

1) Typically GPUs are much less efficient than CPUs.
If a large fraction of jobs (say, >95%) are being done by GPUs,
then the 5th-percentile value won't give us a good estimate.
To deal with this we could maintain separate arrays
for GPU and CPU app versions, and take the min of the two 5th-percentile values.

2) If an application has only very inefficient app versions,
we'll get an inflated estimate of FLOPs per app unit.
Assuming that GPU apps tend to be less efficient than CPU apps,
this will happen with projects that have only a GPU app (e.g., GPUGRID.net).
Here's a scheme to address this problem:

a) Have the client report raw_flops_per_sec instead of using the server value.

b) Pass the client a flag saying that an app has both CPU and GPU versions.

c) If a client has an app version V for an app with only GPU versions,
and it also has app versions V1...Vn for apps that have
both CPU and GPU versions,
then use the average of the est_flops_per_au for V1...Vn
in the raw_flops_per_au(J) that it reports for V.

The net effect is to propagate efficiency estimates between projects.

3) If an app's jobs have variance that is not reflected in app units,
est_flops_per_au may be far too low
(suppose, e.g., that 5% of jobs end immediately).
A solution to this problem is to define a "reference job"
that's executed on all machines,
and to use only that job to compute est_flops_per_au.

== Credit ==

Grant each validated job credit proportional to
{{{
app.mean_app_units * app.est_flops_per_au
}}}

All jobs for a given app get the same credit.
A user doesn't get more credit for a longer-than-average job,
but it averages out in the end.

This removes the incentive for hackers to report exaggerated
job statistics (say, app units).

However, it introduces a new form of credit cheating: "cherry picking",
i.e. completing only short jobs
(identified from input files, or by trial execution)
and aborting or discarding the others.

This can be discouraged by mechanisms
that reduce a host's jobs/day quota when a job is aborted or times out.

== Anonymous platform notes ==

== Implementation ==

=== Protocol ===

Request message:
* for each reported job, add <app_units>
* for each app version, add <seconds_per_app_unit>

Reply message:
* for each app, <est_flops_per_au>
* for each app version, <

=== Database ===

app.est_flops_per_au

=== Transitioner ===

The transitioner maintains, for each app,
an array of the N most recent values of raw_flops_per_au(J) for that app.
When this array is full, the transitioner computes its 5th-percentile value
and exponentially averages it with app.est_flops_per_au.
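
The update described above might be sketched as follows; the array size N, the smoothing weight alpha, and all names are illustrative assumptions, not the actual transitioner code:
{{{
#include <algorithm>
#include <vector>

// Per-app estimator: collect N raw_flops_per_au samples, then fold
// their 5th-percentile value into est_flops_per_au by exponential averaging.
struct AppFlopsEstimator {
    double est_flops_per_au;      // app.est_flops_per_au
    std::vector<double> recent;   // the N most recent raw_flops_per_au values
    size_t n;                     // array size N (assumed)
    double alpha;                 // exponential-averaging weight (assumed)

    AppFlopsEstimator(double init, size_t n_, double alpha_)
        : est_flops_per_au(init), n(n_), alpha(alpha_) {}

    void add_sample(double raw_flops_per_au) {
        recent.push_back(raw_flops_per_au);
        if (recent.size() < n) return;
        // Array is full: compute its 5th-percentile value...
        std::vector<double> sorted = recent;
        size_t k = sorted.size() / 20;
        std::nth_element(sorted.begin(), sorted.begin() + k, sorted.end());
        // ...and exponentially average it with the current estimate.
        est_flops_per_au = alpha * sorted[k] + (1 - alpha) * est_flops_per_au;
        recent.clear();
    }
};
}}}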

=== Scheduler ===