Changes between Version 4 and Version 5 of AutoFlops

Timestamp: Sep 21, 2009, 12:22:05 PM
Author: davea
Comment: --

AutoFlops

-                      v4
+                      v5
 The BOINC client maintains an estimate '''seconds_per_app_unit(V)'''
 of elapsed time per app unit for the app version V.
+This is computed as a weighted average of seconds per app unit
+for recently completed jobs.
 It reports this to the scheduler.
 …
 seconds_per_app_unit(V)
 * J.predicted_aus
 * (app.mean_actual_aus/app.mean_predicted_aus + X*au_stdev)
+* (app.actual_aus_mean/app.predicted_aus_mean + X*app.actual_aus_stdev)
 }}}
 where X is, say, 2 (meaning that for 95% of jobs, the actual completion time
 …
 So what we will do is to use an order statistic,
 say the 5th percentile value, from the distribution of raw_flops_per_au(J).
 We'll call this est_flops_per_au(app).
+We'll call this '''est_flops_per_au(app)'''.
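As a rough illustration of the order-statistic idea above, the following Python sketch picks a low percentile from a list of raw_flops_per_au(J) samples. The function name, the nearest-rank rule, and the default of 5 are assumptions made for illustration; the page does not say how the percentile is computed.

```python
import math


def est_flops_per_au(raw_samples, percentile=5.0):
    """Return a low percentile of raw_flops_per_au samples (nearest-rank rule).

    Hypothetical helper: the nearest-rank rule is an assumption, chosen
    because it always returns one of the observed samples.
    """
    if not raw_samples:
        raise ValueError("need at least one sample")
    ordered = sorted(raw_samples)
    # Nearest rank: the smallest sample such that at least `percentile`
    # percent of all samples are less than or equal to it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1]
```

Using a low percentile rather than the mean keeps the estimate close to what the most efficient hosts report, so a few slow or overloaded hosts do not inflate it.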
 Notes:
 …
 Here's a scheme to address this problem:
+a) have the client report raw_flops_per_sec instead of using the server value.
+b) pass the client a flag saying that an app has both CPU and GPU versions.
+c) If a client has an app version V for an app with only GPU versions,
+a) the server sends the client est_flops_per_au,
+and a flag indicating whether the app has both GPU and CPU versions
+b) If a client has an app version V for an app with only GPU versions,
 and it also has app versions V1...Vn for apps that have
+both CPU and GPU versions,
+then use the average of the est_flops_per_au for V1...Vn
+in the raw_flops_per_au(J) that it reports for V.
+both CPU and GPU versions, then let
+{{{
+raw_flops_per_au(J) = average of the est_flops_per_au for V1...Vn
+}}}
+otherwise raw_flops_per_au(J) = peak_flops_per_au(J)
 The net effect is to propagate efficiency estimates between projects.
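The client-side rule in the scheme above can be sketched in Python as follows. The function and parameter names are hypothetical; only the rule itself (average the estimates from V1...Vn when they exist, otherwise fall back to the peak value) comes from the text.

```python
def raw_flops_per_au(peak_flops_per_au, est_from_mixed_apps):
    """Choose raw_flops_per_au(J) for a version V of a GPU-only app.

    est_from_mixed_apps holds the est_flops_per_au values for versions
    V1...Vn whose apps have both CPU and GPU versions. It may be empty.
    """
    if est_from_mixed_apps:
        # Borrow efficiency information from apps that have both kinds of version.
        return sum(est_from_mixed_apps) / len(est_from_mixed_apps)
    # No such apps on this host: fall back to the peak-based value.
    return peak_flops_per_au
```

For example, `raw_flops_per_au(100.0, [10.0, 30.0])` averages the two borrowed estimates, while `raw_flops_per_au(100.0, [])` returns the peak value unchanged.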
 …
 and use only that job to compute est_flops_per_au.
+) A job's elapsed time is influenced by factors such as
+non-BOINC jobs and overcommitment by BOINC.
+These can potentially lead to overestimates of FLOPS per app unit.
+However, this effect will be minimal if a reasonable fraction (> 5%)
+of hosts don't run non-BOINC jobs and don't have overcommitment.
 == Credit ==
 …
  * resend jobs.
+== Anonymous platform notes ==
+== Implementation ==
+== Implementation notes ==
 === Protocol ===
 Request message:
+ * for each reported job, add <app_units>
+ * for each app version, add <seconds_per_app_unit>
+ * for each reported job
+  * <app_units>
+  * <raw_flops_per_au>
+ * for each app version
+  * <seconds_per_app_unit>
+  * <raw_flops_per_sec>
 Reply message:
+ * for each app, <est_flops_per_au>
+ * for each app version, <
+ * for each app
+  * <est_flops_per_au>
+ * for each app version
+ * for each job
+  * <peak_flops_per_sec>
 === Database ===
 …
 It then exponentially averages this value with app.est_flops_per_au.
-=== Scheduler ===
-Job completion time estimate:
-{{{
-estimated_time =
-        J.predicted_app_units
-        * A.predicted_app_units_scale
-        / app_version.seconds_per_app_unit   // as reported by host
-}}}
-or
-{{{
-        J.predicted_app_units
-        * A.predicted_app_units_scale
-        * A.est_raw_flops_per_au
-        / J.flops_estimate                                      // as estimated by scheduler
-}}}
 === Client ===
 …
 When a job completes, update this as an exponential average.
 This replaces "duration correction factor".
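A minimal Python sketch of the exponential-average update described above. The smoothing weight `alpha`, its default of 0.1, and the function names are assumptions; the page does not specify them.

```python
def seconds_per_app_unit_sample(elapsed_seconds, app_units):
    """Per-job sample: elapsed time divided by the app units the job reported."""
    if app_units <= 0:
        raise ValueError("app_units must be positive")
    return elapsed_seconds / app_units


def update_exp_avg(current, sample, alpha=0.1):
    """Fold one new sample into an exponential average.

    `current` is None until the first job completes, in which case the
    first sample is used directly.
    """
    if current is None:
        return sample
    return (1.0 - alpha) * current + alpha * sample
```

A larger `alpha` makes the estimate follow recent jobs more closely, and a smaller one smooths out job-to-job variation.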
+== Anonymous platform notes ==