Changes between Version 30 and Version 31 of CreditNew

Timestamp:: Mar 26, 2010, 1:36:22 PM (15 years ago)
Author:: davea
Comment:: --

CreditNew

-                      v30
+                      v31
 == ''A priori'' job size estimates and bounds ==
-Projects supply estimates of the FLOPs used by a job
-(wu.rsc_fpops_est)
-and a limit on FLOPS, after which the job will be aborted
-(wu.rsc_fpops_bound).
-Previously, inaccuracy of rsc_fpops_est caused problems.
-The new system still uses rsc_fpops_est,
-but its primary purpose is now to indicate the relative size of jobs.
-Averages of job sizes are normalized by rsc_fpops_est,
-and if rsc_fpops_est is correlated with actual size,
+For each job, the project supplies
+ * an estimate of the FLOPs used by a job (wu.fpops_est)
+ * a limit on FLOPs, after which the job will be aborted
+  (wu.fpops_bound).
+Previously, inaccuracy of fpops_est caused problems.
+The new system still uses fpops_est,
+but its primary purpose is now to indicate the relative sizes of jobs.
+Averages of FLOP count and elapsed time
+are normalized by fpops_est (see below),
+and if fpops_est is correlated with actual size,
 these averages will converge more quickly.
-We'll denote workunit.rsc_fpops_est as E(J).
 Notes:
 …
 based on the resources used by the job and their peak speeds.
-If the job is finished in elapsed time T,
+When the job is finished in elapsed time T,
 we define peak_flop_count(J), or PFC(J) as
 …
 Notes:
- * PFC(J) is not cheat-proof; e.g. cheaters can falsify elapsed time.
+ * PFC(J) is not cheat-proof;
+   cheaters can falsify elapsed time or device attributes.
  * We use elapsed time instead of actual device time (e.g., CPU time).
    If a job uses a resource inefficiently
 …
 but is limited and normalized in the following ways:
+== Computing averages ==
+The policies described below involve computing averages
+of various quantities.
+This computation must take into account:
+ * The quantities being averaged may gradually change over time
+   (e.g. average job size may change)
+   and we need to track this.
+   This is done as follows: for the first N samples
+   (N = ~100 for app versions, ~10 for hosts)
+   we take the straight average.
+   After that we use an exponentially-weighted average
+   (with an appropriate parameter for app versions and hosts).
+ * A given sample may be wildly off,
+   and we can't let this mess up the average.
+   Samples after the first are capped at 10 times the current average.
+ * We keep track of the number of samples,
+   and use an average only if its number of samples
+   is above a '''sample threshold'''.
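A minimal C++ sketch of the averaging rules above follows. It is illustrative only: RunningAverage is a hypothetical name, and using 1/N as the exponential weight after the first N samples is an assumption (the text only says "an appropriate parameter"); that choice lets the update continue smoothly from the straight average.

```cpp
#include <cstddef>

// Illustrative sketch of the averaging rules above (not BOINC's own code).
// Assumption: after the first N samples the exponential weight is 1/N,
// which continues smoothly from the straight average.
struct RunningAverage {
    std::size_t straight_n;  // N: ~100 for app versions, ~10 for hosts
    std::size_t n = 0;       // number of samples taken so far
    double avg = 0;          // current average

    explicit RunningAverage(std::size_t straight_samples)
        : straight_n(straight_samples) {}

    void update(double sample) {
        // Samples after the first are capped at 10 times the current average.
        if (n > 0 && sample > 10 * avg) sample = 10 * avg;
        n++;
        if (n <= straight_n) {
            avg += (sample - avg) / n;           // straight (arithmetic) mean
        } else {
            avg += (sample - avg) / straight_n;  // exponentially-weighted, alpha = 1/N
        }
    }

    // An average is used only if its sample count is above the threshold.
    bool above_threshold(std::size_t sample_threshold) const {
        return n > sample_threshold;
    }
};
```

For example, an app version's average could be a RunningAverage(100) and a host's a RunningAverage(10), each fed PFC(J)/wu.fpops_est for every completed job.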
+== Data ==
+We maintain the following estimates:
+ app.min_avg_pfc:: an estimate of the average actual FLOP count of an app's jobs
+   (normalized by wu.fpops_est)
+ app_version.pfc_avg:: the average of PFC(J)/wu.fpops_est for an app version.
+ host_app_version.pfc_avg:: for each app version V and host H,
+   the average of PFC(J)/wu.fpops_est for jobs completed by H using V.
 == Sanity check ==
-If PFC(J) is infinite or is > wu.rsc_fpops_bound,
+If PFC(J) is infinite or is > wu.fpops_bound,
 J is assigned a "default PFC" and other processing is skipped.
 Default PFC is determined as follows:
- * If min_avg_pfc(A) is defined (see below) then
- D = min_avg_pfc(A) * E(J)
+ * If app.min_avg_pfc is defined then
+ D = app.min_avg_pfc * wu.fpops_est
  * Otherwise
- D = wu.rsc_fpops_est
+ D = wu.fpops_est
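The default-PFC rule can be sketched as a small C++ function. The function name sanitized_pfc is hypothetical, and std::optional stands in for "app.min_avg_pfc is defined"; this is a sketch under those assumptions, not BOINC's implementation.

```cpp
#include <cmath>
#include <optional>

// Hypothetical sketch of the sanity check: returns the PFC to use for job J
// and sets is_default when the default PFC D was substituted.
double sanitized_pfc(double pfc, double fpops_est, double fpops_bound,
                     std::optional<double> app_min_avg_pfc, bool& is_default) {
    is_default = std::isinf(pfc) || pfc > fpops_bound;
    if (!is_default) return pfc;
    if (app_min_avg_pfc) {
        return *app_min_avg_pfc * fpops_est;  // D = app.min_avg_pfc * wu.fpops_est
    }
    return fpops_est;                         // D = wu.fpops_est
}
```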
 == Cross-version normalization ==
 …
 so that the average is the same for each version.
-We maintain the average PFC^mean^(V) of PFC(J)/E(J) for each app version V.
-We periodically compute PFC^mean^(CPU) and PFC^mean^(GPU),
-and compute X as follows:
+For each app, we periodically compute cpu_pfc
+(the weighted average of app_version.pfc_avg over CPU app versions)
+and similarly gpu_pfc.
+We then compute X as follows:
  * If there are only CPU or only GPU versions,
-   and at least 2 versions are above a sample threshold,
-   X is the average.
- * If there are both, and at least 1 of each is above a sample
+   and at least 2 versions are above sample threshold,
+   X is their average (weighted by # samples).
+ * If there are both, and at least 1 of each is above sample
    threshold, let X be the min of the averages.
-If X is defined, then for each version V we set
- Scale(V) = (X/PFC^mean^(V))
-An app version V's jobs are scaled by this factor.
-For each app, we maintain min_avg_pfc(A),
-the average PFC for the most efficient version of A.
-This is an estimate of the app's average actual FLOPS.
+If X is defined, then for each app version
+ app_version.pfc_scale = (X/app_version.pfc_avg)
+The PFC of the app version's jobs is scaled by this factor.
 If X is defined, then we set
- min_avg_pfc(A) = X
-Otherwise, if a version V is above sample threshold, we set
- min_avg_pfc(A) = PFC^mean^(V)
-Notes:
+ app.min_avg_pfc = X
+Otherwise, if an app version is above sample threshold, we set
+ app.min_avg_pfc = app_version.pfc_avg
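A hedged C++ sketch of the cross-version computation above. VersionStats, weighted_pfc and normalize_versions are hypothetical names. Two assumptions go beyond the text: only versions above the sample threshold contribute to cpu_pfc and gpu_pfc, and pfc_scale is left at 1 for versions with no samples yet (to avoid dividing by zero). The rules are read literally, so an app with both CPU and GPU versions gets no X until at least one of each type is above threshold.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical per-version record (illustration only, not BOINC's schema).
struct VersionStats {
    bool is_gpu;
    double pfc_avg;          // app_version.pfc_avg: mean of PFC(J)/wu.fpops_est
    long n;                  // number of samples behind pfc_avg
    double pfc_scale = 1.0;  // app_version.pfc_scale; 1.0 means "not scaled"
};

// Sample-weighted mean of pfc_avg over versions of one resource type that
// are above the sample threshold; `count` receives how many qualified.
static std::optional<double> weighted_pfc(const std::vector<VersionStats>& vs,
                                          bool gpu, long threshold, int& count) {
    double sum = 0, weight = 0;
    count = 0;
    for (const auto& v : vs) {
        if (v.is_gpu != gpu || v.n <= threshold) continue;
        sum += v.pfc_avg * v.n;
        weight += v.n;
        count++;
    }
    if (count == 0) return std::nullopt;
    return sum / weight;
}

// Computes X, sets each version's pfc_scale, and returns app.min_avg_pfc
// (std::nullopt while no version is above the sample threshold).
std::optional<double> normalize_versions(std::vector<VersionStats>& vs,
                                         long threshold) {
    int ncpu = 0, ngpu = 0;
    std::optional<double> cpu_pfc = weighted_pfc(vs, false, threshold, ncpu);
    std::optional<double> gpu_pfc = weighted_pfc(vs, true, threshold, ngpu);
    bool has_cpu = std::any_of(vs.begin(), vs.end(),
                               [](const VersionStats& v) { return !v.is_gpu; });
    bool has_gpu = std::any_of(vs.begin(), vs.end(),
                               [](const VersionStats& v) { return v.is_gpu; });

    std::optional<double> x;
    if (has_cpu && has_gpu) {
        // Both types: need at least one of each above threshold; take the min.
        if (cpu_pfc && gpu_pfc) x = std::min(*cpu_pfc, *gpu_pfc);
    } else if (ncpu + ngpu >= 2) {
        // One type only: need at least two versions above threshold.
        x = has_cpu ? cpu_pfc : gpu_pfc;
    }

    if (x) {
        for (auto& v : vs) {
            // Assumption: skip versions with no data yet (avoids dividing by zero).
            if (v.pfc_avg > 0) v.pfc_scale = *x / v.pfc_avg;
        }
        return x;
    }
    // No X: fall back to any single version that is above the sample threshold.
    for (const auto& v : vs) {
        if (v.n > threshold) return v.pfc_avg;
    }
    return std::nullopt;
}
```

With two CPU versions averaging 2.0 and 4.0 over equal sample counts, X is 3.0 and the versions get pfc_scale 1.5 and 0.75, so both end up claiming the same average.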
+Notes:
+ * Doesn't host normalization (see below) subsume version normalization?
+   Not if there are both CPU and GPU versions, because of the "min".
  * Version normalization is only applied if at least two
    versions are above sample threshold.
 …
 Assume jobs for a given app are distributed uniformly among hosts.
 Then the average credit per job should be the same for all hosts.
-To ensure this, for each app version V and host H
-we maintain PFC^mean^(H, A),
-the average of PFC(J)/E(J) for jobs completed by H using A.
-This yields the host scaling factor
- Scale(H) = (PFC^mean^(V)/PFC^mean^(H, A))
+We scale PFC by the factor
+ app_version.pfc_avg / host_app_version.pfc_avg
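The version and host factors can be combined as in the sketch below. The excerpt does not show exactly where the host factor is applied, so this assumes both factors multiply the raw PFC and that the host factor is used only once host_app_version.pfc_avg is above sample threshold; normalized_pfc is a hypothetical name. Under that assumption the combined factor reduces to X / host_app_version.pfc_avg, because (X / app_version.pfc_avg) * (app_version.pfc_avg / host_app_version.pfc_avg) = X / host_app_version.pfc_avg, the same form as the anonymous-platform factor app.min_avg_pfc / host_app_version.pfc_avg.

```cpp
// Hypothetical sketch: applies the cross-version factor (app_version.pfc_scale)
// and the host factor (app_version.pfc_avg / host_app_version.pfc_avg) to a raw PFC.
double normalized_pfc(double pfc, double version_pfc_scale,
                      double version_pfc_avg, double host_pfc_avg,
                      bool host_above_threshold) {
    double f = pfc * version_pfc_scale;
    // Assumption: the host factor is skipped until the host's average is usable.
    if (host_above_threshold && host_pfc_avg > 0) {
        f *= version_pfc_avg / host_pfc_avg;
    }
    return f;
}
```

For example, a host whose average PFC per unit of wu.fpops_est is twice its app version's average has its claims halved: normalized_pfc(100.0, 0.5, 2.0, 4.0, true) returns 25.0 rather than 50.0.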
 There are some cases where hosts are not sent jobs uniformly:
 …
    jobs to GPUs with more processors.
-The normalization by E(J) handles this
-(assuming that wu.fpops_est is set appropriately).
-Notes:
- * For some apps, the host normalization mechanism is prone to
+The normalization by wu.fpops_est handles this.
+Notes:
+ * For apps with large variance of job sizes,
+   the host normalization mechanism is prone to
    a type of cheating called "cherry picking".
    A mechanism for defeating this is described below.
 …
    and increases the claimed credit of hosts that are more efficient
    than average.
-== Computing averages ==
-Computation of averages needs to take into account:
- * The quantities being averaged may gradually change over time
-   (e.g. average job size may change)
-   and we need to track this.
-   This done as follows: for the first N samples
-   (N = ~100 for app versions, ~10 for hosts)
-   we take the straight average.
-   After that we use an exponential average
-   (with appropriate alpha for app version and host)
- * A given sample may be wildly off,
-   and we can't let this mess up the average.
-   Non-first samples are capped at 10 times the current average.
 == Anonymous platform ==
 …
 (-2 for CPU, -3 for NVIDIA GPU, -4 for ATI).
-If min_avg_pfc(A) is defined and
-PFC^mean^(H, V) is above a sample threshold,
+If app.min_avg_pfc is defined and
+host_app_version.pfc_avg is above sample threshold,
 we normalize PFC by the factor
- min_avg_pfc(A)/PFC^mean^(H, V)
+ app.min_avg_pfc/host_app_version.pfc_avg
 Otherwise the claimed PFC is
- min_avg_pfc(A)*E(J)
-If min_avg_pfc(A) is not defined, the claimed PFC is
- wu.rsc_fpops_est
+ app.min_avg_pfc * wu.fpops_est
+If app.min_avg_pfc is not defined, the claimed PFC is
+ wu.fpops_est
+Notes:
+ * We don't assume that anonymous platform apps on
+   different hosts but with the same platform and resource type
+   are comparable.
 == Summary ==
 …
  * the "claimed PFC" F
- * a flag "approx" that is true if F
-   is an approximation and may not be comparable
-   with other instances of the job
+ * a flag "approx" that is true if F is an approximation
+   and may not be comparable with other instances of the job
 The algorithm:
 …
  pfc = peak FLOP count(J)
  approx = true;
- if pfc > wu.rsc_fpops_bound
-   if min_avg_pfc(A) is defined
-     F = min_avg_pfc(A) * E(J)
+ if pfc > wu.fpops_bound
+   if app.min_avg_pfc is defined
+     F = app.min_avg_pfc * wu.fpops_est
    else
-     F = wu.rsc_fpops_est
+     F = wu.fpops_est
  else
    if job is anonymous platform
-     hav = host_app_version record
-     if min_avg_pfc(A) is defined
-       if hav.pfc.n > threshold
+     if app.min_avg_pfc is defined
+       if host_app_version.pfc_avg is above sample threshold
          approx = false
-         F = min_avg_pfc(A) /hav.pfc.avg
+         F = pfc * app.min_avg_pfc / host_app_version.pfc_avg
        else
-         F = min_avg_pfc(A) * E(J)
+         F = app.min_avg_pfc * wu.fpops_est
      else
-       F = wu.rsc_fpops_est
+       F = wu.fpops_est
    else
      F = pfc;
 …
 The claimed credit of a job (in Cobblestones) is
- C = F* 200/86400e9
+ C = F * 200/86400e9
 If replication is not used, this is the granted credit.
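The claimed-PFC pseudocode and the Cobblestone conversion can be transcribed into C++ roughly as follows. JobInputs, Claim, claimed_pfc and claimed_credit are hypothetical names, and the anonymous-platform branch multiplies pfc by app.min_avg_pfc / host_app_version.pfc_avg, following the Anonymous platform section's description of that factor. The non-anonymous branch stops at F = pfc, as the visible pseudocode does; any further steps elided from this excerpt are not modeled.

```cpp
#include <optional>

// Hypothetical inputs for one completed job (names mirror the wiki's fields).
struct JobInputs {
    double pfc;                             // PFC(J), peak FLOP count
    double fpops_est;                       // wu.fpops_est
    double fpops_bound;                     // wu.fpops_bound
    bool anonymous_platform;
    std::optional<double> app_min_avg_pfc;  // app.min_avg_pfc, if defined
    double host_pfc_avg;                    // host_app_version.pfc_avg
    bool host_above_threshold;              // host_app_version above sample threshold
};

struct Claim {
    double F;     // claimed PFC
    bool approx;  // true if F may not be comparable with other instances
};

// Transcription of the visible pseudocode only.
Claim claimed_pfc(const JobInputs& j) {
    Claim c{0.0, true};
    if (j.pfc > j.fpops_bound) {
        c.F = j.app_min_avg_pfc ? *j.app_min_avg_pfc * j.fpops_est : j.fpops_est;
    } else if (j.anonymous_platform) {
        if (j.app_min_avg_pfc) {
            if (j.host_above_threshold) {
                c.approx = false;
                c.F = j.pfc * *j.app_min_avg_pfc / j.host_pfc_avg;
            } else {
                c.F = *j.app_min_avg_pfc * j.fpops_est;
            }
        } else {
            c.F = j.fpops_est;
        }
    } else {
        c.F = j.pfc;
    }
    return c;
}

// Cobblestones: 200 credits per 86400e9 FLOPs (one day at 1 GFLOPS).
double claimed_credit(double F) { return F * 200 / 86400e9; }
```

As a check on the constant: a job with F = 86400e9 FLOPs (one day at 1 GFLOPS) is claimed at 200 Cobblestones.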
 …
 Otherwise:
- if min_avg_pfc(A) is defined
-   C = min_avg_pfc(A)*E(J)
+ if app.min_avg_pfc is defined
+   C = app.min_avg_pfc * wu.fpops_est * 200/86400e9
  else
-   C = wu.rsc_fpops_est * 200/86400e9
+   C = wu.fpops_est * 200/86400e9
 == Cross-project version normalization ==
 …
 Unrelated to the credit proposal, but in a similar spirit.
 The server will maintain ET^mean^(H, V), the statistics of
-job runtimes (normalized by wu.rsc_fpops_est) per
+job runtimes (normalized by wu.fpops_est) per
 host and application version.
 The server's estimate of a job's runtime is then
- R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)
+ R(J, H) = wu.fpops_est * ET^mean^(H, V)
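A minimal sketch of the runtime estimate, assuming a plain running mean for ET^mean^(H, V). ElapsedTimeStats is a hypothetical name; the AVERAGE_VAR field in the record below also tracks variance, which this sketch omits.

```cpp
// Hypothetical sketch of ET^mean^(H, V): a running mean of
// elapsed_time / wu.fpops_est for one host and app version.
struct ElapsedTimeStats {
    double mean = 0;  // seconds per estimated FLOP
    long n = 0;       // samples so far

    void add(double elapsed_seconds, double fpops_est) {
        double x = elapsed_seconds / fpops_est;
        n++;
        mean += (x - mean) / n;  // incremental arithmetic mean
    }

    // R(J, H) = wu.fpops_est * ET^mean^(H, V)
    double estimate_runtime(double fpops_est) const { return fpops_est * mean; }
};
```

For example, after jobs with (elapsed, wu.fpops_est) of (100 s, 1e12) and (300 s, 3e12), the mean is 1e-10 s per estimated FLOP, so a job with wu.fpops_est = 2e12 is predicted to take about 200 s.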
 …
 int    app_version_id;          // generalized for anon platform
 AVERAGE pfc;
-AVERAGE_VAR et;                         // elapsed time / wu.rsc_fpops_est
+AVERAGE_VAR et;                         // elapsed time / wu.fpops_est
 double host_scale_time;
 bool scale_probation;