Changes between Version 3 and Version 4 of AutoFlops

Timestamp: Sep 21, 2009, 11:12:00 AM
Author: davea
Comment: --

AutoFlops

-                      v3
+                      v4
 == App units ==
 '''App units''' are a project-defined measure of how much computation a completed job did.
+'''App units''' are a project-defined, application-specific measure of computation.
 They are typically a count of iterations of the app's main loop.
 They should be approximately proportional to FLOPs performed,
+They should be roughly proportional to FLOPs performed,
 but it doesn't matter what the proportion is
 (i.e., you don't have to count the FLOPs in your main loop).
 …
 The predictions don't have to be exact.
+In fact, it's OK if they're systematically too high or low,
+as long as there's a linear correlation.
+In fact, it's OK if they're systematically too high or low.
 However, if predicted app units are not linearly correlated with
 actual app units, bad completion time estimates will result.
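
The linear-correlation requirement above can be checked on a project's own data. The following is a minimal sketch, assuming the project has recorded predicted and actual app units for a sample of completed jobs. The function name `pearson_r` and the sample numbers are hypothetical, and the Pearson coefficient is only one possible way to measure linear correlation; the page itself does not prescribe a method.

```python
import math


def pearson_r(predicted, actual):
    """Return the Pearson correlation coefficient of two equal-length samples.

    Hypothetical helper: the wiki page does not say how to measure
    correlation; Pearson's r is assumed here for illustration.
    """
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sums of products of deviations from the means.
    s_pa = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    if s_pp == 0 or s_aa == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return s_pa / math.sqrt(s_pp * s_aa)


# Made-up sample: predictions are systematically about half the actual
# values, yet the relationship is still close to linear.
predicted_aus = [10, 20, 30, 40]
actual_aus = [21, 39, 62, 80]
r = pearson_r(predicted_aus, actual_aus)
```

A value of `r` close to 1 suggests the predictions are usable even when they are systematically too high or too low; a value far from 1 suggests completion time estimates will be poor.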
 …
 == Job completion time estimates and bounds ==
+The BOINC client maintains a per-app-version estimate seconds_per_app_unit;
+The completion time estimate of a job J is
+{{{
+seconds_per_app_unit
+The BOINC client maintains an estimate '''seconds_per_app_unit(V)'''
+of elapsed time per app unit for the app version V.
+It reports this to the scheduler.
+The scheduler's completion time estimate for a job J
+using app version V on a given host is
+{{{
+seconds_per_app_unit(V)
 * J.predicted_aus
 * (app.mean_actual_aus/app.mean_predicted_aus + X*au_stdev)
 …
 == Estimating FLOPS per app unit ==
+For credit-granting purposes
+we want to estimate the number of FLOPs per app unit.
+For credit-granting purposes we want to estimate the number of FLOPs per app unit.
 When the scheduler dispatches a job,
 …
 The server sends the client peak_flops_per_sec(J).
 When the client returns a job, it includes a value
 raw_flops_per_sec(J).
+'''raw_flops_per_sec(J)'''.
 This is usually the same as peak_flops_per_sec(J)
 but it may be less (see note 2 below).
 …
 Suppose a job J is executed on a given host using app version V,
 and that it reports A app units and uses elapsed time T.
+We then define raw_flops(J) as T * raw_flops_per_sec(J).
+We define raw_flops_per_au(J) as raw_flops(J)/A.
+We then define
+{{{
+raw_flops(J) = T * raw_flops_per_sec(J)
+raw_flops_per_au(J) = raw_flops(J)/A
+}}}
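
The two definitions above can be sketched in Python as follows. This is an illustrative sketch only, not BOINC code: the `JobReport` class and its field names are hypothetical, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class JobReport:
    """Hypothetical container for the values a returned job J carries."""
    app_units: float          # A: app units reported by the job
    elapsed_seconds: float    # T: elapsed time of the job
    raw_flops_per_sec: float  # raw_flops_per_sec(J) reported by the client


def raw_flops(job: JobReport) -> float:
    # raw_flops(J) = T * raw_flops_per_sec(J)
    return job.elapsed_seconds * job.raw_flops_per_sec


def raw_flops_per_au(job: JobReport) -> float:
    # raw_flops_per_au(J) = raw_flops(J) / A
    if job.app_units <= 0:
        raise ValueError("job must report a positive number of app units")
    return raw_flops(job) / job.app_units


# Made-up example: 500 app units in 1000 s on a host reporting 2 GFLOPS.
job = JobReport(app_units=500.0, elapsed_seconds=1000.0, raw_flops_per_sec=2e9)
total = raw_flops(job)
per_au = raw_flops_per_au(job)
```

The guard against non-positive app units is an assumption of this sketch; the page does not say how such reports are handled.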
 If we run jobs on lots of different hosts,
 …
 and aborting or discarding others.
+This can be discouraged either by mechanisms
+that reduce the number of jobs/day when a job is aborted or times out.
+This can be discouraged by server mechanisms:
+ * reducing the number of jobs/day when a job is aborted or times out.
+ * resending jobs.
 == Anonymous platform notes ==