= New credit system design =

== Peak FLOPS and efficiency ==

BOINC estimates the peak FLOPS of each processor.
For CPUs, this is the Whetstone benchmark score.
For GPUs, it's given by a manufacturer-supplied formula.

However, other factors affect application performance.
For example, applications access memory,
and the speed of a host's memory system is not reflected
in its Whetstone score.
So a given job might take the same amount of CPU time
on a 1 GFLOPS host as on a 10 GFLOPS host.
The "efficiency" of an application running on a given host
is the ratio of actual FLOPS to peak FLOPS.

GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
However, application efficiency is typically lower
(very roughly, 10% for GPUs, 50% for CPUs).

Notes:

* The peak FLOPS of a device is single or double precision,
whichever is higher.
Differentiating between single and double would unnecessarily
complicate things, and the distinction will disappear soon anyway.


== Credit system goals ==

Some possible goals in designing a credit system:

* Device neutrality: similar jobs should get similar credit
regardless of what processor or GPU they run on.

* Project neutrality: different projects should grant
about the same amount of credit per day for a given processor.

It's easy to show that these two goals can't both be satisfied simultaneously.

== The first credit system ==

In the first iteration of BOINC's credit system,
"claimed credit" was defined as
{{{
C1 = H.whetstone * J.cpu_time
}}}
There were then various schemes for taking the
average or min claimed credit of the replicas of a job,
and using that as the "granted credit".

We call this system "Peak-FLOPS-based" because
it's based on the CPU's peak performance.
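
For concreteness, here's a minimal sketch of this system
(hypothetical names and types, not the actual BOINC code),
using the min-across-replicas granting scheme mentioned above:
{{{
#include <algorithm>
#include <vector>

struct Replica {
    double whetstone;   // host's Whetstone benchmark score
    double cpu_time;    // CPU time used by the job, in seconds
};

// claimed credit: C1 = H.whetstone * J.cpu_time
double claimed_credit(const Replica& r) {
    return r.whetstone * r.cpu_time;
}

// granted credit: the min claimed credit over a job's replicas
// (assumes at least one valid replica)
double granted_credit(const std::vector<Replica>& replicas) {
    double g = claimed_credit(replicas[0]);
    for (const auto& r : replicas) {
        g = std::min(g, claimed_credit(r));
    }
    return g;
}
}}}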

The problem with this system is that, for a given app version,
efficiency can vary widely between hosts.
In the above example,
the 10 GFLOPS host would claim 10X as much credit,
and its owner would be upset when it was granted only a tenth of that.

Furthermore, the credit granted to a given host for a
series of identical jobs could vary widely,
depending on the hosts it was paired with by replication.
This seemed arbitrary and unfair to users.

== The second credit system ==

We then switched to the philosophy that
credit should be proportional to the number of FLOPs actually performed
by the application.
We added API calls to let applications report this.
We call this approach "Actual-FLOPs-based".

SETI@home's application allowed counting of FLOPs,
and they adopted this system,
adding a scaling factor so that average credit per job
was the same as in the first credit system.

Not all projects could count FLOPs, however.
So SETI@home published their average credit per CPU second,
and other projects continued to use benchmark-based credit,
but multiplied it by a scaling factor to match SETI@home's average.

This system had several problems:

* It didn't address GPUs.
* Projects that couldn't count FLOPs still had device neutrality problems.
* It didn't prevent credit cheating when single replication was used.


== Goals of the new (third) credit system ==

* Completely automated - projects don't have to
change code, settings, etc.

* Device neutrality

* Limited project neutrality: different projects should grant
about the same amount of credit per CPU hour, averaged over hosts.
Projects with GPU apps should grant credit in proportion
to the efficiency of the apps.
(This means that projects with efficient GPU apps will
grant more credit on average. That's OK.)

== Peak FLOP Count (PFC) ==

This system uses the Peak-FLOPS-based approach,
but addresses its problems in a new way.

When a job is issued to a host, the scheduler specifies usage(J,D),
J's usage of processing resource D:
how many CPUs and how many GPUs (possibly fractional).

If the job is finished in elapsed time T,
we define peak_flop_count(J), or PFC(J), as
{{{
PFC(J) = T * sum over devices D (usage(J, D) * peak_flop_rate(D))
}}}

Notes:

* We use elapsed time instead of actual device time (e.g., CPU time).
If a job uses a resource inefficiently
(e.g., a CPU job that does lots of disk I/O),
PFC() won't reflect this. That's OK.
The key thing is that BOINC reserved the device for the job,
whether or not the job used it efficiently.
* usage(J,D) may not be accurate; e.g., a GPU job may take
more or less CPU time than the scheduler thinks it will.
Eventually we may switch to a scheme where the client
dynamically determines the CPU usage.
For now, though, we'll just use the scheduler's estimate.
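
A minimal sketch of this computation (illustrative names only):
{{{
#include <vector>

struct DeviceUsage {
    double usage;            // usage(J, D); possibly fractional
    double peak_flop_rate;   // peak_flop_rate(D)
};

// PFC(J) = T * sum over devices D (usage(J, D) * peak_flop_rate(D))
double peak_flop_count(double elapsed_time,
                       const std::vector<DeviceUsage>& devices) {
    double peak_rate = 0;
    for (const auto& d : devices) {
        peak_rate += d.usage * d.peak_flop_rate;
    }
    return elapsed_time * peak_rate;
}
}}}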

The granted credit for a job J is proportional to PFC(J),
but is normalized in the following ways:

== Cross-version normalization ==

If a given application has multiple versions (e.g., CPU and GPU versions),
the granted credit per job is adjusted
so that the average is the same for each version.
The adjustment is always downwards:
we maintain the average PFC^mean^(V) of PFC() for each app version V,
and let X be the minimum of these averages.
An app version V's jobs are then scaled by the factor

S(V) = (X/PFC^mean^(V))

The result for a given job J
is called "Version-Normalized Peak FLOP Count", or VNPFC(J):

VNPFC(J) = PFC(J) * (X/PFC^mean^(V))

Notes:
* This addresses the common situation
where an app's GPU version is much less efficient than the CPU version
(i.e. the ratio of actual FLOPs to peak FLOPs is much less).
To a certain extent, this mechanism shifts the system
towards the "Actual FLOPs" philosophy,
since credit is granted based on the most efficient app version.
It's not exactly "Actual FLOPs", since the most efficient
version may not be 100% efficient.
* There are two sources of variance in PFC(V):
the variation in host efficiency,
and possibly the variation in job size.
If we have an ''a priori'' estimate of job size
(e.g., workunit.rsc_fpops_est)
we can normalize by this to reduce the variance,
and make PFC^mean^(V) converge more quickly.
* ''a posteriori'' estimates of job size may also exist
(e.g., an iteration count reported by the app),
but using these for anything introduces a new cheating risk,
so it's probably better not to.
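
A minimal sketch of the version normalization, assuming the per-version
means PFC^mean^(V) are already maintained (illustrative names only):
{{{
#include <algorithm>
#include <vector>

// returns S(V) = X / PFC_mean(V) for each version, in input order,
// where X is the minimum of the per-version means
// (assumes at least one version)
std::vector<double> version_scales(const std::vector<double>& pfc_mean) {
    double x = *std::min_element(pfc_mean.begin(), pfc_mean.end());
    std::vector<double> scales;
    for (double m : pfc_mean) {
        scales.push_back(x / m);   // always <= 1: adjustment is downwards
    }
    return scales;
}

// VNPFC(J) = PFC(J) * S(V)
double vnpfc(double pfc_j, double s_v) {
    return pfc_j * s_v;
}
}}}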

== Cross-project normalization ==

If an application has both CPU and GPU versions,
then the version normalization mechanism uses the CPU
version as a "sanity check" to limit the credit granted to GPU jobs.

Suppose a project has an app with only a GPU version,
so there's no CPU version to act as a sanity check.
If we grant credit based only on GPU peak speed,
the project will grant much more credit per GPU hour than other projects,
violating limited project neutrality.

A solution to this: if an app has only GPU versions,
then for each version V we let
S(V) be the average scaling factor
for that GPU type among projects that do have both CPU and GPU versions.
This factor is obtained from a central BOINC server.
V's jobs are then scaled by S(V) as above.

Notes:

* Projects will run a periodic script to update the scaling factors.
* Rather than GPU type, we'll probably use plan class,
since e.g. the average efficiency of CUDA 2.3 apps may differ
from that of CUDA 2.1 apps.
* Initially we'll obtain scaling factors from large projects
that have both GPU and CPU apps (e.g., SETI@home).
Eventually we'll use an average (weighted by work done) over multiple projects
(see below).
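
As a sketch, the work-weighted average of published per-project scaling
factors might look like this (hypothetical names; see "Cross-project
scaling factors" below for how the data would be collected):
{{{
#include <vector>

struct ProjectSample {
    double scale;       // a project's S(V) for this plan class
    double work_done;   // weight: work done under that app version
};

// work-weighted average scaling factor across projects
double cross_project_scale(const std::vector<ProjectSample>& samples) {
    double num = 0, denom = 0;
    for (const auto& s : samples) {
        num += s.scale * s.work_done;
        denom += s.work_done;
    }
    return denom > 0 ? num / denom : 1;   // 1 = no scaling if no data
}
}}}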

== Host normalization ==

If hosts are sent jobs for a given app uniformly,
then for that app
hosts should get the same average granted credit per job.
To ensure this, for each application A we maintain the average VNPFC^mean^(A),
and for each host H we maintain VNPFC^mean^(H, A).
The '''claimed FLOPS''' for a given job J is then

F = VNPFC(J) * (VNPFC^mean^(A)/VNPFC^mean^(H, A))

and the claimed credit (in Cobblestones) is

C = F*100/86400e9

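A minimal sketch of this step (illustrative names only):
{{{
// claimed FLOPs: F = VNPFC(J) * (VNPFC_mean(A) / VNPFC_mean(H, A)).
// A host that is less efficient than average has a larger
// VNPFC_mean(H, A), so its claims are scaled down, and vice versa.
double claimed_flops(double vnpfc_j, double app_mean, double host_app_mean) {
    return vnpfc_j * (app_mean / host_app_mean);
}

// claimed credit in Cobblestones: 100 units per day at 1 GFLOPS
double claimed_credit(double flops) {
    return flops * 100 / 86400e9;
}
}}}
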
There are some cases where hosts are not sent jobs uniformly:
* job-size matching (smaller jobs sent to slower hosts)
* GPUGrid.net's scheme for sending some (presumably larger)
jobs to GPUs with more processors.
In these cases average credit per job must differ between hosts,
according to the types of jobs that are sent to them.

This can be handled by dividing
each sample in the computation of VNPFC^mean^ by wu.rsc_fpops_est
(in fact, there's no reason not to always do this).

Notes:
* The host normalization mechanism reduces the claimed credit of hosts
that are less efficient than average,
and increases the claimed credit of hosts that are more efficient
than average.
* VNPFC^mean^ is averaged over jobs, not hosts.

== Computing averages ==

We need to compute averages carefully because

* The quantities being averaged may gradually change over time
(e.g. average job size may change,
app version efficiency may change as new versions are deployed)
and we need to track this.
* A given sample may be wildly off,
and we can't let this mess up the average.
* Averages should be weighted by job size.

In addition, we may as well maintain the variance of the quantities,
although the current system doesn't use it.

The code that does all this is
[http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here].
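
In this spirit, here's a minimal sketch of such an average (illustrative
only; the real implementation is in the average.h file linked above).
It tracks gradual change with an exponential moving average and clamps
wildly-off samples; job-size weighting and variance are omitted for brevity:
{{{
#include <algorithm>

struct RunningAverage {
    double avg = 0;
    int n = 0;

    void update(double sample, double alpha = 0.01) {
        if (n > 0 && avg > 0) {
            // don't let one wild sample mess up the average
            sample = std::min(sample, 10 * avg);
        }
        n++;
        if (n == 1) {
            avg = sample;   // first sample initializes the average
        } else {
            avg += alpha * (sample - avg);   // exponential moving average
        }
    }
};
}}}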

== Cross-project scaling factors ==

We'll have a script that publishes a project's
accounting data (see Implementation).
The BOINC web site will collect these from a few big projects
and publish the averages.

== Replication and cheating ==

Host normalization mostly eliminates the incentive to cheat
by claiming excessive credit
(i.e., by falsifying benchmark scores or elapsed time).
An exaggerated claim will increase VNPFC^mean^(H, A),
causing subsequent claimed credit to be scaled down proportionately.
This means that no special cheat-prevention scheme
is needed for single replication;
granted credit = claimed credit.

For jobs that are replicated, granted credit should be
set to the min of the valid results
(min is used instead of average to remove the incentive
for cherry-picking; see below).

However, there are still some possible forms of cheating.

* One-time cheats (like claiming 1e304) can be prevented by
capping VNPFC(J) at some multiple (say, 10) of VNPFC^mean^(A).
* Cherry-picking: suppose an application has two types of jobs,
which run for 1 second and 1 hour respectively.
Clients can figure out which is which, e.g. by running a job for 2 seconds
and seeing if it has exited.
Suppose a client systematically refuses the 1-hour jobs
(e.g., by reporting a crash or never reporting them).
Its VNPFC^mean^(H, A) will quickly decrease,
and soon it will be getting several thousand times more credit
per unit of actual work than other hosts!
Countermeasure:
whenever a job errors out, times out, or fails to validate,
set the host's error rate back to the initial default,
and set its VNPFC^mean^(H, A) to VNPFC^mean^(A) for all apps A,
as sketched after this list.
This puts the host in a state where several dozen of its
subsequent jobs will be replicated.
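
A minimal sketch of these two countermeasures (illustrative names only):
{{{
#include <algorithm>

struct HostAppVersion {
    double error_rate;
    double vnpfc_mean;   // VNPFC_mean(H, A)
};

// cap one-time cheats: limit VNPFC(J) to a multiple of VNPFC_mean(A)
double capped_vnpfc(double vnpfc_j, double app_mean, double cap = 10) {
    return std::min(vnpfc_j, cap * app_mean);
}

// on error, timeout, or validation failure: reset the host's stats so
// that several dozen of its subsequent jobs will be replicated
void punish_host(HostAppVersion& hav, double app_mean,
                 double initial_error_rate) {
    hav.error_rate = initial_error_rate;
    hav.vnpfc_mean = app_mean;
}
}}}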

== Trickle credit ==

== Job runtime estimates ==

Unrelated to the credit proposal, but in a similar spirit.
The server will maintain ET^mean^(H, V), the statistics of
job runtimes (normalized by wu.rsc_fpops_est) per
host and application version.

The server's estimate of a job's runtime is then

R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)

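Since ET^mean^ is kept per unit of wu.rsc_fpops_est, the estimate is just
a product; a one-line sketch (illustrative names only):
{{{
// et_mean: mean elapsed seconds per unit of wu.rsc_fpops_est
double estimated_runtime(double rsc_fpops_est, double et_mean) {
    return rsc_fpops_est * et_mean;   // R(J, H), in seconds
}
}}}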

== Error rate, host punishment, and turnaround time estimation ==

Unrelated to the credit proposal, but in a similar spirit.

Due to hardware problems (e.g. a malfunctioning GPU)
a host may have a 100% error rate for one app version
and a 0% error rate for another.
The same goes for turnaround time.

The host punishment mechanism is designed to
deal with malfunctioning hosts.
For each host the server maintains '''max_results_day'''.
This is initialized to a project-specified value (e.g. 200)
and scaled by the number of CPUs and/or GPUs.
It's decremented if the client reports a crash
(but not if the job was aborted).
It's doubled when a successful (but not necessarily valid)
result is received.
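
A minimal sketch of these updates (illustrative names; capping the doubling
at the scaled project default is an assumption, since unbounded doubling
wouldn't be useful):
{{{
struct HostAppVersion {
    double max_results_day;
};

// client reported a crash (not an abort)
void on_crash(HostAppVersion& hav) {
    if (hav.max_results_day > 1) {
        hav.max_results_day -= 1;
    }
}

// a successful (not necessarily valid) result was received
void on_success(HostAppVersion& hav, double scaled_default) {
    hav.max_results_day *= 2;
    if (hav.max_results_day > scaled_default) {
        hav.max_results_day = scaled_default;   // assumed cap
    }
}
}}}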

So we'll move the "error_rate", "turnaround_time",
and "max_results_day" fields from the host table to host_app_version.

== Cherry picking ==

Suppose an application has a mix of long and short jobs.
If a client intentionally discards
(or aborts, or reports errors from) the long jobs,
but completes the short jobs,
its host scaling factor will become large,
and it will get excessive credit for the short jobs.
This is called "cherry picking".

Note: the host punishment mechanism
doesn't deal effectively with cherry picking.

We propose the following mechanism to deal with cherry picking:

* For each (host, app version), maintain "host_scale_time".
This is the earliest time at which host scaling will be applied.
This is set initially, and each time a job times out
or errors out, to now+X, where X is the app's delay bound.

The idea is to apply the host scaling factor
only if there's solid evidence that the host is NOT cherry picking.
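
A minimal sketch of this mechanism (illustrative names only):
{{{
#include <ctime>

struct HostAppVersion {
    std::time_t host_scale_time;   // earliest time at which scaling applies
};

// called initially, and whenever one of the host's jobs
// times out or errors out
void defer_host_scaling(HostAppVersion& hav, double delay_bound) {
    hav.host_scale_time = std::time(nullptr) + (std::time_t)delay_bound;
}

// apply host scaling only if the host has gone a full delay bound
// without a timeout or error; i.e., no evidence of cherry picking
bool use_host_scaling(const HostAppVersion& hav) {
    return std::time(nullptr) >= hav.host_scale_time;
}
}}}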

== Implementation ==

=== Database changes ===

New table '''host_app''':
{{{
int host_id;
int app_id;
int vnpfc_n;
double vnpfc_sum;
double vnpfc_exp_avg;
}}}

New table '''host_app_version''':
{{{
int host_id;
int app_version_id;
int et_n;
double et_sum;
double et_exp_avg;
// some variable for recent error rate,
// replacing host.error_rate and host.max_results_day;
// make sure that a host with 1 good and 1 bad GPU gets few GPU jobs
}}}

New fields in '''app_version''':
{{{