Changes between Version 4 and Version 5 of AutoFlops
Timestamp: Sep 21, 2009, 12:22:05 PM
AutoFlops
--- v4
+++ v5
@@ -51,4 +51,6 @@
 The BOINC client maintains a estimate '''seconds_per_app_unit(V)'''
 of elapsed time per app unit for the app version V.
+This is computed as a weighted average of seconds per app unit
+for recently completed jobs
 It reports this to the scheduler.
 
@@ -58,5 +60,5 @@
 seconds_per_app_unit(V)
 * J.predicted_aus
-* (app.mean_actual_aus/app.mean_predicted_aus + X*au_stdev)
+* (app.actual_aus_mean/app.predicted_aus_mean + X*app.actual_aus_stdev)
 }}}
 where X is, say, 2 (meaning that for 95% of jobs, the actual completion time
@@ -107,5 +109,5 @@
 So what we will do is to use an order statistic,
 say the 5th percentile value, from the distribution of raw_flops_per_au(J).
-We'll call this est_flops_per_au(app).
+We'll call this '''est_flops_per_au(app)'''.
 
 Notes:
@@ -123,13 +125,14 @@
 Here's a scheme to address this problem:
 
-a) have the client report raw_flops_per_sec instead of using the server value.
-
-b) pass the client a flag saying that an app has both CPU and GPU versions.
-
-c) If a client has an app version V for an app with only GPU versions,
+a) the server sends the client est_flops_per_au,
+and a flag indicating whether the app has both GPU and CPU versions
+
+b) If a client has an app version V for an app with only GPU versions,
 and it also has app versions V1...Vn for apps that have
-both CPU and GPU versions,
-then use the average of the est_flops_per_au for V1...Vn
-in the raw_flops_per_au(J) that it reports for V.
+both CPU and GPU versions, then let
+{{{
+raw_flops_per_au(J) = average of the est_flops_per_au for V1...Vn
+}}}
+otherwise raw_flops_per_au(J) = peak_flops_per_au(J)
 
 The net effect is to propagate efficiency estimates between projects.
@@ -142,4 +145,10 @@
 and use only that job to compute est_flops_per_au.
 
+4) A job's elapsed time is influenced by factors such as
+non-BOINC jobs and overcommitment by BOINC.
+These can potentially lead to overestimates of FLOPS per app unit.
+However, this effect will be minimal if a reasonable fraction (> 5%)
+of hosts don't run non-BOINC jobs and don't have overcommitment.
+
 == Credit ==
 
@@ -166,17 +175,23 @@
 * resend jobs.
 
-== Anonymous platform notes ==
-
-== Implementation ==
+== Implementation notes ==
 
 === Protocol ===
 
 Request message:
-* for each reported job, add <app_units>
-* for each app version, add <seconds_per_app_unit>
+* for each reported job
+  * <app_units>
+  * <raw_flops_per_au>
+* for each app version
+  * <seconds_per_app_unit>
+  * <raw_flops_per_sec>
 
 Reply message:
-* for each app, <est_flops_per_au>
-* for each app version, <
+
+* for each app
+  * <est_flops_per_au>
+* for each app version
+* for each job
+  * <peak_flops_per_sec>
 
 === Database ===
@@ -191,21 +206,4 @@
 It then exponentially averages this value with app.est_flops_per_au.
 
-=== Scheduler ===
-
-Job completion time estimate:
-{{{
-estimated_time =
-J.predicted_app_units
-* A.predicted_app_units_scale
-/ app_version.seconds_per_app_unit // as reported by host
-}}}
-or
-{{{
-J.predicted_app_units
-* A.predicted_app_units_scale
-* A.est_raw_flops_per_au
-/ J.flops_estimate // as estimated by scheduler
-}}}
-
 === client ===
 
@@ -216,2 +214,5 @@
 When a job completes, update this as an exponential average.
 This replaces "duration correction factor".
+
+== Anonymous platform notes ==
+
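
Reviewer's sketch (not part of the wiki text): the v5 completion-time formula, the 5th-percentile order statistic, and the exponential averaging mentioned in the hunks above might look roughly like this in Python. The function names, argument names, the nearest-rank percentile method, and the averaging weight are all assumptions made for illustration; none of them come from the BOINC code base.

```python
import math


def estimated_time(seconds_per_app_unit, predicted_aus,
                   actual_aus_mean, predicted_aus_mean,
                   actual_aus_stdev, x=2.0):
    """Completion-time estimate following the v5 formula:
    seconds_per_app_unit(V) * J.predicted_aus
      * (app.actual_aus_mean/app.predicted_aus_mean + X*app.actual_aus_stdev)
    """
    scale = actual_aus_mean / predicted_aus_mean + x * actual_aus_stdev
    return seconds_per_app_unit * predicted_aus * scale


def est_flops_per_au(raw_samples, percentile=5.0):
    """Order statistic over raw_flops_per_au(J) samples.

    Uses the nearest-rank percentile (an assumed choice; the wiki text
    only says "say the 5th percentile value").
    """
    if not raw_samples:
        raise ValueError("need at least one sample")
    ordered = sorted(raw_samples)
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1]


def exp_average(old_value, new_sample, weight=0.1):
    """Exponential average of a running value with a new sample.

    The weight 0.1 is a hypothetical default, not a documented value.
    """
    return (1.0 - weight) * old_value + weight * new_sample
```

With X = 2, a job predicted at 3 app units on a version running at 10 seconds per app unit, with a mean ratio of 1.2 and a standard deviation of 0.1, gets an estimate of 10 * 3 * (1.2 + 0.2) seconds.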