Changes between Version 4 and Version 5 of AutoFlops


Timestamp:
Sep 21, 2009, 12:22:05 PM
Author:
davea
Comment:

--

  • AutoFlops

    v4 v5  
    5151The BOINC client maintains an estimate '''seconds_per_app_unit(V)'''
    5252of elapsed time per app unit for the app version V.
     53This is computed as a weighted average of seconds per app unit
     54for recently completed jobs.
    5355It reports this to the scheduler.
    5456
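A minimal sketch of how such a weighted average could be maintained, assuming an exponential average in which each completed job contributes one sample. The function name and the smoothing factor `alpha` are hypothetical and do not come from the BOINC source.

```python
def update_seconds_per_app_unit(current, elapsed_seconds, app_units, alpha=0.1):
    """Fold one completed job into the running seconds-per-app-unit estimate.

    current         -- previous estimate, or None if no job has completed yet
    elapsed_seconds -- elapsed time of the job that just completed
    app_units       -- app units the job reported
    alpha           -- weight of the newest sample (hypothetical value)
    """
    if app_units <= 0:
        raise ValueError("app_units must be positive")
    sample = elapsed_seconds / app_units
    if current is None:
        # The first completed job seeds the estimate.
        return sample
    # Recent jobs weigh more; older jobs decay geometrically.
    return (1.0 - alpha) * current + alpha * sample


estimate = update_seconds_per_app_unit(None, 100.0, 10.0)
estimate = update_seconds_per_app_unit(estimate, 200.0, 10.0, alpha=0.5)
```

A larger `alpha` tracks changes in host speed faster, at the cost of a noisier estimate.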
     
    5860seconds_per_app_unit(V)
    5961* J.predicted_aus
    60 * (app.mean_actual_aus/app.mean_predicted_aus + X*au_stdev)
     62* (app.actual_aus_mean/app.predicted_aus_mean + X*app.actual_aus_stdev)
    6163}}}
    6264where X is, say, 2 (meaning that for 95% of jobs, the actual completion time
     
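The v5 form of the completion-time estimate can be sketched as a single function. This is an illustration only: the parameter names mirror the fields in the formula, the function name is hypothetical, and the default `x=2.0` follows the "say, 2" suggestion in the text.

```python
def estimate_completion_seconds(seconds_per_app_unit, predicted_aus,
                                actual_aus_mean, predicted_aus_mean,
                                actual_aus_stdev, x=2.0):
    """Conservative elapsed-time estimate for a job J on app version V."""
    if predicted_aus_mean <= 0:
        raise ValueError("predicted_aus_mean must be positive")
    # Scale the job's prediction by how far predictions have been off
    # on average, padded by x times app.actual_aus_stdev.
    scale = actual_aus_mean / predicted_aus_mean + x * actual_aus_stdev
    return seconds_per_app_unit * predicted_aus * scale


# 2 s per app unit, 100 predicted app units, scale = 50/40 + 2*0.25 = 1.75
seconds = estimate_completion_seconds(2.0, 100.0, 50.0, 40.0, 0.25)
```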
    107109So what we will do is to use an order statistic,
    108110say the 5th percentile value, from the distribution of raw_flops_per_au(J).
    109 We'll call this est_flops_per_au(app).
     111We'll call this '''est_flops_per_au(app)'''.
    110112
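As an illustration, the order statistic could be computed with the nearest-rank method. This is a sketch under assumptions: the text does not say which percentile definition is intended, and the function signature is hypothetical.

```python
import math


def est_flops_per_au(raw_samples, percentile=5.0):
    """Return the given percentile (nearest-rank) of raw_flops_per_au(J) samples."""
    if not raw_samples:
        raise ValueError("need at least one sample")
    ordered = sorted(raw_samples)
    # Nearest-rank: the smallest value with at least `percentile` percent
    # of the samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1]


samples = [float(v) for v in range(1, 101)]   # 1.0 .. 100.0
low_estimate = est_flops_per_au(samples)      # 5th smallest sample
```

Picking a low percentile rather than the mean keeps a few inflated samples from dominating the estimate.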
    111113Notes:
     
    123125Here's a scheme to address this problem:
    124126
    125 a) have the client report raw_flops_per_sec instead of using the server value.
    126 
    127 b) pass the client a flag saying that an app has both CPU and GPU versions.
    128 
    129 c) If a client has an app version V for an app with only GPU versions,
     127a) The server sends the client est_flops_per_au
     128and a flag indicating whether the app has both GPU and CPU versions.
     129
     130b) If a client has an app version V for an app with only GPU versions,
    130131and it also has app versions V1...Vn for apps that have
    131 both CPU and GPU versions,
    132 then use the average of the est_flops_per_au for V1...Vn
    133 in the raw_flops_per_au(J) that it reports for V.
     132both CPU and GPU versions, then let
     133{{{
     134raw_flops_per_au(J) = average of the est_flops_per_au for V1...Vn
     135}}}
     136otherwise raw_flops_per_au(J) = peak_flops_per_au(J)
    134137
    135138The net effect is to propagate efficiency estimates between projects.
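The client-side rule in (b) can be sketched as follows. The function and parameter names are hypothetical; the only inputs assumed are the per-app flag and the est_flops_per_au values that the server sends in (a).

```python
from statistics import fmean


def raw_flops_per_au(peak_flops_per_au, app_is_gpu_only, mixed_app_estimates):
    """Choose the raw_flops_per_au(J) value the client reports for a job.

    peak_flops_per_au   -- peak_flops_per_au(J) for the job
    app_is_gpu_only     -- True if the job's app has only GPU versions
    mixed_app_estimates -- est_flops_per_au for the versions V1...Vn on this
                           client whose apps have both CPU and GPU versions
    """
    if app_is_gpu_only and mixed_app_estimates:
        # Borrow efficiency estimates from apps that have both kinds of version.
        return fmean(mixed_app_estimates)
    # Otherwise fall back to the peak value.
    return peak_flops_per_au


borrowed = raw_flops_per_au(1e12, True, [2e9, 4e9])   # average of the estimates
fallback = raw_flops_per_au(1e12, True, [])           # nothing to borrow from
```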
     
    142145and use only that job to compute est_flops_per_au.
    143146
     1474) A job's elapsed time is influenced by factors such as
     148non-BOINC jobs and overcommitment by BOINC.
     149These can potentially lead to overestimates of FLOPS per app unit.
     150However, this effect will be minimal if a reasonable fraction (> 5%)
     151of hosts don't run non-BOINC jobs and don't have overcommitment.
     152
    144153== Credit ==
    145154
     
    166175 * resend jobs.
    167176
    168 == Anonymous platform notes ==
    169 
    170 == Implementation ==
     177== Implementation notes ==
    171178
    172179=== Protocol ===
    173180
    174181Request message:
    175  * for each reported job, add <app_units>
    176  * for each app version, add <seconds_per_app_unit>
     182 * for each reported job
     183  * <app_units>
     184  * <raw_flops_per_au>
     185 * for each app version
     186  * <seconds_per_app_unit>
     187  * <raw_flops_per_sec>
    177188
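A sketch of how the added request fields might be serialized. The four leaf tag names come from the list above; the wrapper elements `scheduler_request`, `reported_job` and `app_version`, the function name, and the nesting itself are assumptions made for illustration and may not match the real protocol.

```python
import xml.etree.ElementTree as ET


def build_request_fragment(jobs, app_versions):
    """Serialize the new per-job and per-app-version request fields as XML."""
    root = ET.Element("scheduler_request")        # hypothetical wrapper element
    for job in jobs:
        j = ET.SubElement(root, "reported_job")   # hypothetical wrapper element
        ET.SubElement(j, "app_units").text = str(job["app_units"])
        ET.SubElement(j, "raw_flops_per_au").text = str(job["raw_flops_per_au"])
    for version in app_versions:
        v = ET.SubElement(root, "app_version")    # hypothetical wrapper element
        ET.SubElement(v, "seconds_per_app_unit").text = str(
            version["seconds_per_app_unit"])
        ET.SubElement(v, "raw_flops_per_sec").text = str(
            version["raw_flops_per_sec"])
    return ET.tostring(root, encoding="unicode")


xml_text = build_request_fragment(
    jobs=[{"app_units": 12.5, "raw_flops_per_au": 3e9}],
    app_versions=[{"seconds_per_app_unit": 40.0, "raw_flops_per_sec": 2e9}],
)
```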
    178189Reply message:
    179  * for each app, <est_flops_per_au>
    180  * for each app version, <
     190
     191 * for each app
     192  * <est_flops_per_au>
     193 * for each app version
     194 * for each job
     195  * <peak_flops_per_sec>
    181196
    182197=== Database ===
     
    191206It then exponentially averages this value with app.est_flops_per_au.
    192207
    193 === Scheduler ===
    194 
    195 Job completion time estimate:
    196 {{{
    197 estimated_time =
    198         J.predicted_app_units
    199         * A.predicted_app_units_scale
    200         / app_version.seconds_per_app_unit   // as reported by host
    201 }}}
    202 or
    203 {{{
    204         J.predicted_app_units
    205         * A.predicted_app_units_scale
    206         * A.est_raw_flops_per_au
    207         / J.flops_estimate                                      // as estimated by scheduler
    208 }}}
    209 
    210208=== Client ===
    211209
     
    216214When a job completes, update this as an exponential average.
    217215This replaces "duration correction factor".
     216
     217== Anonymous platform notes ==
     218