Changes between Version 32 and Version 33 of CreditNew

Timestamp:: Mar 26, 2010, 3:23:57 PM
Author:: davea
Comment:: --

CreditNew

-                      v32
+                      v33
 Notes:
  * For our purposes, the peak FLOPS of a device
    uses single or double precision, whichever is higher.
+   is based on single or double precision, whichever is higher.
 == Credit system goals ==
 …
   They aren't cheat-proof, and we don't use them.
 == Peak FLOP Count (PFC) ==
+== Peak FLOP Count ==
 This system uses the Peak-FLOPS-based approach,
 …
  PFC(J) = T * peak_flops(J)
+The credit for a job J is typically proportional to PFC(J),
+but is limited and normalized in various ways.
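Below is a minimal C++ sketch of this definition. Following the note at the top of the page, it takes a device's peak FLOPS as the larger of its single- and double-precision peaks. The `Device` struct, and the assumption that peak_flops(J) is the usage-weighted sum over the devices a job uses, are illustrative and are not taken from BOINC's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one processing resource used by a job.
struct Device {
    double peak_flops_sp;  // peak single-precision FLOPS
    double peak_flops_dp;  // peak double-precision FLOPS
    double usage;          // instances used by the job (e.g. 1 CPU, 0.5 GPU)
};

// Peak FLOPS of a device: single or double precision, whichever is higher.
double device_peak_flops(const Device& d) {
    return std::max(d.peak_flops_sp, d.peak_flops_dp);
}

// peak_flops(J): ASSUMED here to be the usage-weighted sum over the job's devices.
double job_peak_flops(const std::vector<Device>& devices) {
    double total = 0;
    for (const Device& d : devices) {
        assert(d.usage >= 0);
        total += d.usage * device_peak_flops(d);
    }
    return total;
}

// PFC(J) = T * peak_flops(J), where T is the job's elapsed time in seconds.
double pfc(double elapsed_seconds, const std::vector<Device>& devices) {
    return elapsed_seconds * job_peak_flops(devices);
}
```

For example, a job that runs for 1000 seconds on one CPU with peaks of 4e9 (single) and 2e9 (double) FLOPS gets PFC(J) = 1000 * 4e9 = 4e12.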
 Notes:
  * PFC(J) is not cheat-proof;
+ * PFC(J) is not reliable;
    cheaters can falsify elapsed time or device attributes.
  * We use elapsed time instead of actual device time (e.g., CPU time).
 …
    in the trickle message.
-By default, the credit for a job J is proportional to PFC(J),
-but is limited and normalized in the following ways:
 == Computing averages ==
 …
    and we need to track this.
    This is done as follows: for the first N samples
-   (N = ~100 for app versions, ~10 for hosts)
    we take the straight average.
+   After that we use an exponentially-weighted average
+   (with appropriate parameter for app version and host)
+ * A given sample may be wildly off,
+   and we can't let this mess up the average.
+   Samples after the first are capped at 10 times the current average.
+   After that we use an exponentially-weighted average with parameter A.
+   The choice of N and A depends on the entity involved;
+   for app versions (which typically get thousands of jobs per day)
+   we might use N=100 and A=.001.
+   For hosts (which typically get a few jobs per day)
+   we might use N=10 and A=.01.
+ * To reduce the effect of erroneously huge samples,
+   samples after the first are capped at X times the current average.
+   X depends on the entity:
+   maybe 10 for hosts, 100 for app versions.
  * We keep track of the number of samples,
    and use an average only if its number of samples
 …
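The averaging rules above can be sketched in C++ as a small accumulator. N, A and X are passed to each update, so the same type can serve app versions (e.g. N=100, A=0.001, X=100) and hosts (e.g. N=10, A=0.01, X=10). The `Average` name and the `min_samples` argument are hypothetical, since the actual sample threshold isn't given here.

```cpp
#include <algorithm>
#include <cassert>

// Running average: straight average for the first N samples, then an
// exponentially weighted average with parameter A. Samples after the first
// are capped at X times the current average.
struct Average {
    int n = 0;       // number of samples seen so far
    double avg = 0;  // current average

    void update(double sample, int N, double A, double X) {
        assert(N > 0 && A > 0 && A <= 1 && X > 0);
        if (n > 0) sample = std::min(sample, X * avg);  // limit huge samples
        if (n < N) {
            avg += (sample - avg) / (n + 1);   // straight (running) average
        } else {
            avg = (1 - A) * avg + A * sample;  // exponentially weighted average
        }
        ++n;
    }

    // Use the average only once it has enough samples (threshold is hypothetical).
    bool above_threshold(int min_samples) const { return n >= min_samples; }
};
```

With N=10 and X=10, samples 1, 2 and 3 give an average of 2; a following sample of 1000 is capped at 20, so the average moves to 6.5 rather than to 251.5.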
 We maintain the following estimates:
  app.min_avg_pfc:: an estimate of the average actual FLOPS for an app
+ app.min_avg_pfc:: an estimate of the average actual FLOPS for the app
    (normalized by wu.fpops_est)
  app_version.pfc_avg:: the average of PFC(J)/wu.fpops_est for an app version.
+ app_version.pfc_scale:: a PFC scale factor for the app version
  host_app_version.pfc_avg:: for each app version V and host H,
    the average of PFC(J)/wu.fpops_est for jobs completed by H using V.
+ host_app_version.scale_probation::
+   if set, the host is suspected of cherry-picking (see below)
+   and we don't use host normalization
 == Sanity check ==
 If PFC(J) is infinite or is > wu.fpops_bound,
 J is assigned a "default PFC" and other processing is skipped.
 Default PFC is determined as follows:
+J is assigned a "default PFC" D and other processing is skipped.
+D is determined as follows:
  * If app.min_avg_pfc is defined then
 …
    D = wu.fpops_est
+We also set host_app_version.scale_probation to true
+(ensuring that the host scale factor isn't used for a while)
+and host_app_version.error_rate to an initial value
+(ensuring that jobs sent to this host are replicated for a while).
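Here is a sketch of the sanity check in C++, using C++17's `std::optional` to represent whether app.min_avg_pfc is defined. The `HostAppVersion` struct and the `initial_error_rate` parameter are hypothetical stand-ins for the host_app_version fields and the "initial value" mentioned above. Treating a NaN PFC like an infinite one is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical subset of the host_app_version record touched by the check.
struct HostAppVersion {
    bool scale_probation = false;
    double error_rate = 0;
};

// If PFC is infinite (or NaN) or exceeds wu.fpops_bound, put the host on
// probation and return the default PFC D; otherwise return std::nullopt
// so that normal processing continues.
std::optional<double> sanity_check(double pfc, double fpops_est, double fpops_bound,
                                   std::optional<double> app_min_avg_pfc,
                                   HostAppVersion& hav, double initial_error_rate) {
    assert(fpops_est > 0);
    if (std::isfinite(pfc) && pfc <= fpops_bound) return std::nullopt;
    hav.scale_probation = true;           // don't use host scaling for a while
    hav.error_rate = initial_error_rate;  // replicate this host's jobs for a while
    if (app_min_avg_pfc) return *app_min_avg_pfc * fpops_est;
    return fpops_est;
}
```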
 == Cross-version normalization ==
 …
 (e.g., CPU, multi-thread, and GPU versions).
 If jobs are distributed uniformly to versions,
+all versions should get the same average credit.
+We adjust the credit per job
+so that the average is the same for each version.
+all versions should get the same average granted credit.
+To make this so, we scale PFC as follows.
 For each app, we periodically compute cpu_pfc
 …
  app.min_avg_pfc = app_version.pfc_avg
+ app_version.pfc_scale = 1
 Notes:
 …
    then this mechanism doesn't work as intended.
    One solution is to create separate apps for separate types of jobs.
  * Cheating or erroneous hosts can influence PFC^mean^(V) to some extent.
+ * Cheating or erroneous hosts can influence app_version.pfc_avg to some extent.
    This is limited by the Sanity Check mechanism,
    and by the fact that only validated jobs are used.
    The effect on credit will be negated by host normalization
    (see below).
    There may be an effect on cross-version normalization.
    This could be eliminated by computing PFC^mean^(V)
    as the sample-median value of PFC^mean^(H, V) (see below).
+   There may be an adverse effect on cross-version normalization.
+   This could be eliminated by computing app_version.pfc_avg
+   as the sample-median value of host_app_version.pfc_avg
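The scaling idea can be sketched as giving each version a factor that makes `pfc_avg * pfc_scale` equal across versions. The rule for choosing the common target is elided above, so this sketch assumes it is the smallest pfc_avg (consistent with the name app.min_avg_pfc); the `AppVersion` struct is hypothetical. With a single version it reproduces the case shown above: app.min_avg_pfc = app_version.pfc_avg and app_version.pfc_scale = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical app-version record; pfc_avg is the average of PFC(J)/wu.fpops_est.
struct AppVersion {
    double pfc_avg;
    double pfc_scale = 1;
};

// Set each version's pfc_scale so that pfc_avg * pfc_scale is the same for all.
// ASSUMPTION: the common target is the smallest pfc_avg. The returned target
// plays the role of app.min_avg_pfc.
double normalize_versions(std::vector<AppVersion>& versions) {
    assert(!versions.empty());
    double target = versions.front().pfc_avg;
    for (const AppVersion& v : versions) target = std::min(target, v.pfc_avg);
    for (AppVersion& v : versions) {
        assert(v.pfc_avg > 0);
        v.pfc_scale = target / v.pfc_avg;
    }
    return target;
}
```

Under this assumption, for a CPU version with pfc_avg 2 and a GPU version with pfc_avg 8, the GPU version's PFC would be scaled by 0.25.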
 == Host normalization ==
 …
 Then the average credit per job should be the same for all hosts.
 We scale PFC by the factor
+To achieve this, we scale PFC by the factor
  app_version.pfc_avg / host_app_version.pfc_avg
 …
    jobs to GPUs with more processors.
+The normalization by wu.fpops_est handles this.
+The normalization by wu.fpops_est handles this
+(assuming that it's set correctly).
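Here is a sketch of the host scale factor in C++. The sample-count check, the scale-probation check and the cap of 10 come from the summary pseudocode on this page. `min_samples` stands in for the unspecified sample threshold, and returning 0 to mean "don't use the factor" mirrors `host_scale = 0` there.

```cpp
#include <algorithm>
#include <cassert>

// Host scale factor app_version.pfc_avg / host_app_version.pfc_avg, capped
// at 10. Returns 0 if the host average has too few samples or the host is
// on scale probation.
double host_scale(double av_pfc_avg, double hav_pfc_avg, int hav_samples,
                  int min_samples, bool scale_probation) {
    if (hav_samples < min_samples || scale_probation) return 0;
    assert(hav_pfc_avg > 0);
    return std::min(10.0, av_pfc_avg / hav_pfc_avg);
}
```

For example, a host whose pfc_avg is twice the version's average (e.g., because its benchmarks overstate its speed) gets a factor of 0.5, which scales its inflated PFC back down.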
 Notes:
  * For apps with large variance of job sizes,
    the host normalization mechanism is prone to
+   the host normalization mechanism is vulnerable to
    a type of cheating called "cherry picking".
    A mechanism for defeating this is described below.
 …
 and it keeps track of PFC and elapsed time statistics there.
 There are separate records per resource type.
 The app_version_id encodes the app ID and the resource type
+The record's app_version_id encodes the app ID and the resource type
 (-2 for CPU, -3 for NVIDIA GPU, -4 for ATI).
 If app.min_avg_pfc is defined and
+If app.min_avg_pfc is defined,
 host_app_version.pfc_avg is above sample threshold,
+and host_app_version.scale_probation is not set,
 we normalize PFC by the factor
 …
 Notes:
+ * We don't assume that anonymous platform apps on
+   different hosts but with the same platform and resource type
+   are comparable.
+ * In the current design, anonymous platform jobs don't
+   contribute to app.min_avg_pfc,
+   but it may be used to determine their credit.
+   This may cause problems:
+   e.g., suppose a project offers an inefficient version
+   and volunteers make a much more efficient version
+   and run it using the anonymous platform mechanism.
+   They'd get an unfair amount of credit.
+   This could be fixed by creating app_version records
+   representing all anonymous platform apps of a given
+   platform and resource type.
 == Summary ==
 …
  approx = true;
  if pfc > wu.fpops_bound
+   host_app_version.scale_probation = true
+   host_app_version.error_rate = initial value  // replicate for a while
    if app.min_avg_pfc is defined
      F = app.min_avg_pfc * wu.fpops_est
 …
  else
    if job is anonymous platform
          if app.min_avg_pfc is defined
+     if app.min_avg_pfc is defined
        if host_app_version.pfc_avg is above sample threshold
+             approx = false
+             F = app.min_avg_pfc / host_app_version.pfc_avg
+           else
+             F = app.min_avg_pfc * wu.fpops_est
+            and not host_app_version.scale_probation
+         F = app.min_avg_pfc / host_app_version.pfc_avg
+         approx = false
+       else
+         F = app.min_avg_pfc * wu.fpops_est
      else
            F = wu.fpops_est
+       F = wu.fpops_est
    else
      F = pfc;
+     if Scale(V) is defined
+           F *= Scale(V)
+         if Scale(H, V) is defined and (H,V) is not on scale probation
+       F *= Scale(H, V)
+         host_scale = 0
+     if host_app_version.pfc_avg is above sample threshold
+          and not host_app_version.scale_probation
+           host_scale = min(10, app_version.pfc_avg / host_app_version.pfc_avg)
+     if app_version.pfc_scale is defined
+       F *= app_version.pfc_scale
+           if host_scale
+         F *= host_scale
+         approx = false
+     else
+           if host_scale
+         F *= host_scale
+         app_version.pfc_avg.update(F)
+         host_app_version.pfc_avg.update(F)
 }}}
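The computation of F in the summary can be expressed as one C++ function. This is a sketch under several assumptions. The `JobContext` and `Result` types are hypothetical, and side effects (setting scale probation, resetting error_rate, updating the averages) are omitted. For anonymous-platform jobs, the factor `app.min_avg_pfc / host_app_version.pfc_avg` is applied to the raw PFC, following "we normalize PFC by the factor" in the anonymous platform section; the pseudocode writes the factor alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical per-job inputs; field names mirror the fields on this page.
struct JobContext {
    double pfc;                             // raw PFC(J)
    double fpops_est;                       // wu.fpops_est
    double fpops_bound;                     // wu.fpops_bound
    bool anonymous_platform;                // job used an anonymous platform app
    std::optional<double> app_min_avg_pfc;  // app.min_avg_pfc, if defined
    std::optional<double> av_pfc_scale;     // app_version.pfc_scale, if defined
    double av_pfc_avg;                      // app_version.pfc_avg
    double hav_pfc_avg;                     // host_app_version.pfc_avg
    bool hav_above_threshold;               // host average has enough samples
    bool scale_probation;                   // host_app_version.scale_probation
};

struct Result {
    double F;     // normalized PFC
    bool approx;  // mirrors the pseudocode's approx flag
};

Result compute_F(const JobContext& j) {
    assert(j.fpops_est > 0);
    // Sanity check: an infinite or excessive PFC gets the default value.
    if (!std::isfinite(j.pfc) || j.pfc > j.fpops_bound) {
        double D = j.app_min_avg_pfc ? *j.app_min_avg_pfc * j.fpops_est : j.fpops_est;
        return {D, true};
    }
    bool host_ok = j.hav_above_threshold && !j.scale_probation && j.hav_pfc_avg > 0;

    if (j.anonymous_platform) {
        if (!j.app_min_avg_pfc) return {j.fpops_est, true};
        if (host_ok) return {j.pfc * (*j.app_min_avg_pfc / j.hav_pfc_avg), false};
        return {*j.app_min_avg_pfc * j.fpops_est, true};
    }

    // Host scale factor, capped at 10; 0 means "don't use it".
    double host_scale = host_ok ? std::min(10.0, j.av_pfc_avg / j.hav_pfc_avg) : 0.0;
    double F = j.pfc;
    bool approx = true;
    if (j.av_pfc_scale) {
        F *= *j.av_pfc_scale;  // cross-version normalization
        if (host_scale > 0) {
            F *= host_scale;   // host normalization
            approx = false;
        }
    } else if (host_scale > 0) {
        F *= host_scale;
    }
    return {F, approx};
}
```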
 …
 The claimed credit of a job (in Cobblestones) is
+ C = F * 200/86400e9
+ C = F * cobblestone_scale
+where cobblestone_scale is 200/86400e9.
 If replication is not used, this is the granted credit.
 …
 {{{
  if app.min_avg_pfc is defined
    C = app.min_avg_pfc*wu.fpops_est
+   C = app.min_avg_pfc*wu.fpops_est*cobblestone_scale
  else
    C = wu.fpops_est * 200/86400e9
+   C = wu.fpops_est * cobblestone_scale
 }}}
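Here is a C++ sketch of the Cobblestone conversion and the estimate-based formula in the block above. The scale follows from the Cobblestone definition: 200 credits per day of computing at 1 GFLOPS, i.e. 200 / (86400 s * 1e9 FLOPS). The function names are hypothetical.

```cpp
#include <cassert>
#include <optional>

// 200 Cobblestones per day (86400 s) at 1 GFLOPS (1e9 FLOPS).
constexpr double cobblestone_scale = 200 / 86400e9;

// Claimed credit, in Cobblestones, for a normalized PFC value F.
double claimed_credit(double F) { return F * cobblestone_scale; }

// Credit based only on the workunit's estimate, as in the block above.
double default_credit(double fpops_est, std::optional<double> app_min_avg_pfc) {
    assert(fpops_est >= 0);
    if (app_min_avg_pfc) return *app_min_avg_pfc * fpops_est * cobblestone_scale;
    return fpops_est * cobblestone_scale;
}
```

A job with F = 86400e9 (one day at 1 GFLOPS) is worth about 200 Cobblestones.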
 …
 by claiming excessive credit
 (i.e., by falsifying benchmark scores or elapsed time).
 An exaggerated claim will increase PFC^mean^(H,A),
+An exaggerated claim will increase host_app_version.pfc_avg,
 causing subsequent credit to be scaled down proportionately.
 …
 For example, claiming a PFC of 1e304.
+If PFC(J) exceeds some multiple (say, 20) of PFC^mean^(V),
+the host's error rate is set to the initial value,
+so it won't do single replication for a while,
+and scale_probation (see below) is set to true.
+== Cherry picking ==
+This is handled by the sanity check mechanism,
+which grants a default amount of credit
+and treats the host with suspicion for a while.
+=== Cherry picking ===
 Suppose an application has a mix of long and short jobs.
 …
    and now < host_scale_time, don't use the host scale factor
 The idea is to apply the host scaling factor
+The idea is to use the host scaling factor
 only if there's solid evidence that the host is NOT cherry picking.
 …
 {{{
 int    host_id;
 int    app_version_id;          // generalized for anon platform
+int    app_version_id;        // generalized for anon platform
 AVERAGE pfc;
 AVERAGE_VAR et;                         // elapsed time / wu.fpops_est
+AVERAGE_VAR et;                // elapsed time / wu.fpops_est
 double host_scale_time;
 bool scale_probation;
 …
 {{{
 double min_avg_pfc;
 bool host_scale_check;          // whether to do scale probation
+bool host_scale_check;        // whether to do scale probation
 int max_jobs_in_progress;
 int max_gpu_jobs_in_progress;