= New credit system design =

== Peak FLOPS and efficiency ==

BOINC estimates the peak FLOPS of each processor.
For CPUs, this is the Whetstone benchmark score.
For GPUs, it's given by a manufacturer-supplied formula.

However, other factors affect application performance.
For example, applications access memory,
and the speed of a host's memory system is not reflected
in its Whetstone score.
So a given job might take the same amount of CPU time
on a 1 GFLOPS host as on a 10 GFLOPS host.
The "efficiency" of an application running on a given host
is the ratio of actual FLOPS to peak FLOPS.

GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
However, application efficiency is typically lower
(very roughly, 10% for GPUs, 50% for CPUs).

Notes:

* The peak FLOPS of a device is its single- or double-precision rate,
whichever is higher.
Differentiating between single and double precision would unnecessarily
complicate things, and the distinction will disappear soon anyway.

== Credit system goals ==

Some possible goals in designing a credit system:

* Device neutrality: similar jobs should get similar credit
regardless of what processor or GPU they run on.

* Project neutrality: different projects should grant
about the same amount of credit per day for a given processor.

It's easy to show that these two goals can't both be satisfied simultaneously.

== The first credit system ==

In the first iteration of BOINC's credit system,
"claimed credit" was defined as
{{{
C1 = H.whetstone * J.cpu_time
}}}
There were then various schemes for taking the
average or minimum claimed credit of the replicas of a job,
and using that as the "granted credit".
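
As an illustrative sketch (function names are assumptions, and min-over-replicas is just one of the several granting schemes the text mentions), the first system can be expressed as:

```python
# Sketch of the first (Peak-FLOPS-based) credit system.
# Names and the min-over-replicas granting scheme are illustrative.

def claimed_credit_v1(whetstone_gflops: float, cpu_time_secs: float) -> float:
    """Claimed credit: C1 = H.whetstone * J.cpu_time."""
    return whetstone_gflops * cpu_time_secs

def granted_credit_v1(replica_claims: list[float]) -> float:
    """One possible scheme: grant the minimum claim among a job's replicas."""
    return min(replica_claims)

# A 10 GFLOPS host and a 1 GFLOPS host finishing the same job in the
# same CPU time claim amounts differing by 10x:
fast = claimed_credit_v1(10.0, 3600)       # 36000.0
slow = claimed_credit_v1(1.0, 3600)        # 3600.0
grant = granted_credit_v1([fast, slow])    # both hosts are granted 3600.0
```

This makes the unfairness described below concrete: the fast host's claim is ten times what it is eventually granted.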

We call this system "Peak-FLOPS-based" because
it's based on the CPU's peak performance.

The problem with this system is that, for a given app version,
efficiency can vary widely between hosts.
In the above example,
the 10 GFLOPS host would claim 10X as much credit,
and its owner would be upset when it was granted only a tenth of that.

Furthermore, the credit granted to a given host for a
series of identical jobs could vary widely,
depending on the hosts it was paired with by replication.
This seemed arbitrary and unfair to users.

== The second credit system ==

We then switched to the philosophy that
credit should be proportional to the number of FLOPs actually performed
by the application.
We added API calls to let applications report this.
We call this approach "Actual-FLOPs-based".

SETI@home's application allowed counting of FLOPs,
and they adopted this system,
adding a scaling factor so that average credit per job
was the same as under the first credit system.

Not all projects could count FLOPs, however.
So SETI@home published their average credit per CPU second,
and other projects continued to use benchmark-based credit,
but multiplied it by a scaling factor to match SETI@home's average.

This system had several problems:

* It didn't address GPUs.
* Projects that couldn't count FLOPs still had device neutrality problems.
* It didn't prevent credit cheating when single replication was used.


== Goals of the new (third) credit system ==

* Completely automated - projects don't have to
change code, settings, etc.

* Device neutrality

* Limited project neutrality: different projects should grant
about the same amount of credit per CPU hour, averaged over hosts.
Projects with GPU apps should grant credit in proportion
to the efficiency of the apps.
(This means that projects with efficient GPU apps will
grant more credit on average. That's OK.)

== Peak FLOP Count (PFC) ==

This system uses the Peak-FLOPS-based approach,
but addresses its problems in a new way.

When a job is issued to a host, the scheduler specifies usage(J,D),
J's usage of processing resource D:
how many CPUs and how many GPUs (possibly fractional).

If the job finishes in elapsed time T,
we define peak_flop_count(J), or PFC(J), as
{{{
PFC(J) = T * (sum over devices D of usage(J, D) * peak_flop_rate(D))
}}}

Notes:

* We use elapsed time instead of actual device time (e.g., CPU time).
If a job uses a resource inefficiently
(e.g., a CPU job that does lots of disk I/O),
PFC() won't reflect this. That's OK.
The key thing is that BOINC reserved the device for the job,
whether or not the job used it efficiently.
* usage(J,D) may not be accurate; e.g., a GPU job may take
more or less CPU than the scheduler thinks it will.
Eventually we may switch to a scheme where the client
dynamically determines the CPU usage.
For now, though, we'll just use the scheduler's estimate.
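
A minimal sketch of this definition (the dictionary-based signature is illustrative, not BOINC's actual C++ API):

```python
# Sketch of peak_flop_count(J). Field names are illustrative.

def peak_flop_count(elapsed_secs: float,
                    usage: dict[str, float],
                    peak_flop_rate: dict[str, float]) -> float:
    """PFC(J) = T * sum over devices D of usage(J, D) * peak_flop_rate(D).

    Elapsed (wall-clock) time is used, not device time: BOINC reserved
    the devices for the job whether or not it used them efficiently.
    """
    return elapsed_secs * sum(usage[d] * peak_flop_rate[d] for d in usage)

# A job using 0.5 CPU plus 1 GPU for 1000 seconds:
pfc = peak_flop_count(
    1000.0,
    {"cpu": 0.5, "gpu": 1.0},
    {"cpu": 4e9, "gpu": 200e9},   # peak FLOPS per device instance
)
# pfc = 1000 * (0.5 * 4e9 + 1.0 * 200e9) = 2.02e14
```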

The granted credit for a job J is proportional to PFC(J),
but is normalized in the following ways:

== Cross-version normalization ==

If a given application has multiple versions (e.g., CPU and GPU versions),
the granted credit per job is adjusted
so that the average is the same for each version.
The adjustment is always downwards:
we maintain the average PFC^mean^(V) of PFC() for each app version V,
and find the minimum X across versions.
An app version V's jobs are then scaled by the factor

 S(V) = (X/PFC^mean^(V))

The result for a given job J
is called "Version-Normalized Peak FLOP Count", or VNPFC(J):

 VNPFC(J) = PFC(J) * (X/PFC^mean^(V))

Notes:
* This addresses the common situation
where an app's GPU version is much less efficient than the CPU version
(i.e., the ratio of actual FLOPs to peak FLOPs is much lower).
To a certain extent, this mechanism shifts the system
towards the "Actual FLOPs" philosophy,
since credit is granted based on the most efficient app version.
It's not exactly "Actual FLOPs", since the most efficient
version may not be 100% efficient.
* There are two sources of variance in PFC(V):
the variation in host efficiency,
and possibly the variation in job size.
If we have an ''a priori'' estimate of job size
(e.g., workunit.rsc_fpops_est),
we can normalize by this to reduce the variance,
and make PFC^mean^(V) converge more quickly.
* ''A posteriori'' estimates of job size may exist also
(e.g., an iteration count reported by the app),
but using these for anything introduces a new cheating risk,
so it's probably better not to.
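
The scaling rule above can be sketched as follows (names are illustrative):

```python
# Sketch of cross-version normalization. Names are illustrative.

def version_scales(pfc_mean: dict[str, float]) -> dict[str, float]:
    """S(V) = X / PFC_mean(V), where X is the minimum mean across versions.

    The adjustment is always downwards: the most efficient version
    gets S(V) = 1, and all others get S(V) < 1.
    """
    x = min(pfc_mean.values())
    return {v: x / m for v, m in pfc_mean.items()}

def vnpfc(pfc_j: float, scale_v: float) -> float:
    """Version-Normalized Peak FLOP Count: VNPFC(J) = PFC(J) * S(V)."""
    return pfc_j * scale_v

# A GPU version whose mean PFC is 8x the CPU version's (i.e., it is
# much less efficient relative to peak) has its jobs scaled down:
scales = version_scales({"cpu": 1e13, "cuda": 8e13})
# scales == {"cpu": 1.0, "cuda": 0.125}
```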
== Cross-project normalization ==

If an application has both CPU and GPU versions,
the version normalization mechanism uses the CPU
version as a "sanity check" to limit the credit granted to GPU jobs.

Suppose a project has an app with only a GPU version,
so there's no CPU version to act as a sanity check.
If we grant credit based only on GPU peak speed,
the project will grant much more credit per GPU hour than other projects,
violating limited project neutrality.

A solution to this: if an app has only GPU versions,
then for each version V we let
S(V) be the average scaling factor
for that GPU type among projects that have both CPU and GPU versions.
This factor is obtained from a central BOINC server.
V's jobs are then scaled by S(V) as above.

Notes:

* Projects will run a periodic script to update the scaling factors.
* Rather than GPU type, we'll probably use plan class,
since e.g. the average efficiency of CUDA 2.3 apps may differ
from that of CUDA 2.1 apps.
* Initially we'll obtain scaling factors from large projects
that have both GPU and CPU apps (e.g., SETI@home).
Eventually we'll use an average (weighted by work done) over multiple projects
(see below).

== Host normalization ==

Assuming that hosts are sent jobs for a given app uniformly,
then, for that app,
hosts should get the same average granted credit per job.
To ensure this, for each application A we maintain the average VNPFC^mean^(A),
and for each host H we maintain VNPFC^mean^(H, A).
The '''claimed FLOPS''' for a given job J is then

 F = VNPFC(J) * (VNPFC^mean^(A)/VNPFC^mean^(H, A))

and the claimed credit (in Cobblestones) is

 C = F*100/86400e9

There are some cases where hosts are not sent jobs uniformly:
* job-size matching (smaller jobs are sent to slower hosts)
* GPUGrid.net's scheme of sending some (presumably larger)
jobs to GPUs with more processors.
In these cases average credit per job must differ between hosts,
according to the types of jobs that are sent to them.

This can be done by dividing
each sample in the computation of VNPFC^mean^ by WU.rsc_fpops_est
(in fact, there's no reason not to always do this).

Notes:
* The host normalization mechanism reduces the claimed credit of hosts
that are less efficient than average,
and increases the claimed credit of hosts that are more efficient
than average.
* VNPFC^mean^ is averaged over jobs, not hosts.
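
The two formulas above can be sketched as follows (names are illustrative; the constant 100/86400e9 converts FLOPs to Cobblestones at 100 credits per day of sustained 1 GFLOPS):

```python
# Sketch of host normalization and the Cobblestone conversion.
# Names are illustrative.

COBBLESTONES_PER_GFLOP_DAY = 100.0
SECS_PER_DAY = 86400.0

def claimed_flops(vnpfc_j: float, app_mean: float,
                  host_app_mean: float) -> float:
    """F = VNPFC(J) * (VNPFC_mean(A) / VNPFC_mean(H, A))."""
    return vnpfc_j * (app_mean / host_app_mean)

def claimed_credit(flops: float) -> float:
    """C = F * 100 / 86400e9."""
    return flops * COBBLESTONES_PER_GFLOP_DAY / (SECS_PER_DAY * 1e9)

# A host whose mean VNPFC is twice the app average (i.e., half as
# efficient as the average host) has its claim halved:
f = claimed_flops(2e13, app_mean=1e13, host_app_mean=2e13)   # 1e13
c = claimed_credit(f)
```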

== Computing averages ==

We need to compute averages carefully because:

* The quantities being averaged may gradually change over time
(e.g., average job size may change, and
app version efficiency may change as new versions are deployed),
and we need to track this.
* A given sample may be wildly off,
and we can't let this mess up the average.
* Averages should be weighted by job size.

In addition, we may as well maintain the variance of these quantities,
although the current system doesn't use it.

The code that does all this is
[http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here].
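
A toy sketch with these properties (the real code in lib/average.h differs in detail; the alpha and cap-factor values here are arbitrary assumptions):

```python
# Sketch of a running average with the properties listed above:
# exponential weighting to track gradual change, a cap on wildly-off
# samples, and per-sample weights (e.g., by job size).

class RunningAverage:
    def __init__(self, alpha: float = 0.01, cap_factor: float = 10.0):
        self.alpha = alpha            # responsiveness to new samples
        self.cap_factor = cap_factor  # max allowed sample/average ratio
        self.n = 0
        self.avg = 0.0

    def update(self, sample: float, weight: float = 1.0) -> None:
        """Fold in one sample, optionally weighted by job size."""
        if self.n > 0 and self.avg > 0:
            # Clamp outliers so one bad sample can't wreck the average.
            sample = min(sample, self.cap_factor * self.avg)
        self.n += 1
        if self.n == 1:
            self.avg = sample
        else:
            a = min(1.0, self.alpha * weight)
            self.avg += a * (sample - self.avg)

ra = RunningAverage()
ra.update(100.0)
ra.update(1e9)   # clamped to 10 * 100 = 1000 before averaging
# ra.avg is 100 + 0.01 * (1000 - 100) = 109.0, not a billion
```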

== Cross-project scaling factors ==

We'll have a script that publishes a project's
accounting data (see Implementation).
The BOINC web site will collect these data from a few big projects
and publish the averages.

== Replication and cheating ==

Host normalization mostly eliminates the incentive to cheat
by claiming excessive credit
(i.e., by falsifying benchmark scores or elapsed time).
An exaggerated claim will increase VNPFC^mean^(H, A),
causing subsequent claimed credit to be scaled down proportionately.
This means that no special cheat-prevention scheme
is needed for single replication;
granted credit = claimed credit.

For jobs that are replicated, granted credit should be
set to the minimum of the valid results
(the minimum is used instead of the average to remove the incentive
for cherry-picking; see below).

However, there are still some possible forms of cheating:

* One-time cheats (like claiming 1e304) can be prevented by
capping VNPFC(J) at some multiple (say, 10) of VNPFC^mean^(A).
* Cherry-picking: suppose an application has two types of jobs,
which run for 1 second and 1 hour respectively.
Clients can figure out which is which, e.g. by running a job for 2 seconds
and seeing if it has exited.
Suppose a client systematically refuses the 1-hour jobs
(e.g., by reporting a crash or never reporting them).
Its VNPFC^mean^(H, A) will quickly decrease,
and soon it will be getting several thousand times more credit
per unit of actual work than other hosts!
Countermeasure:
whenever a job errors out, times out, or fails to validate,
set the host's error rate back to the initial default,
and set its VNPFC^mean^(H, A) to VNPFC^mean^(A) for all apps A.
This puts the host in a state where several dozen of its
subsequent jobs will be replicated.

== Trickle credit ==

CPDN breaks jobs into segments,
sends a trickle-up message for each segment,
and grants credit for each completed segment.
In this case,
the trickle message handlers should not grant a fixed amount of credit.
Instead, the trickle-up messages should contain
an "incremental elapsed time" field.

== Job runtime estimates ==

This is unrelated to the credit proposal, but in a similar spirit.
The server will maintain ET^mean^(H, V), statistics of
job runtimes (normalized by wu.rsc_fpops_est) per
host and application version.

The server's estimate of a job's runtime is then

 R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)
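
A sketch of this estimate (assuming ET^mean^(H, V) is kept in seconds per estimated FLOP, so the product comes out in seconds; names are illustrative):

```python
# Sketch of the per-(host, app version) runtime estimate.
# Names and units are illustrative assumptions.

def estimated_runtime(rsc_fpops_est: float, et_mean_h_v: float) -> float:
    """R(J, H) = wu.rsc_fpops_est * ET_mean(H, V).

    et_mean_h_v: mean elapsed seconds per estimated FLOP for host H
    running app version V.
    """
    return rsc_fpops_est * et_mean_h_v

# A 1e12-FLOP job on a host averaging 2e-10 s per estimated FLOP:
r = estimated_runtime(1e12, 2e-10)   # 200.0 seconds
```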

== Error rate, host punishment, and turnaround time estimation ==

This is unrelated to the credit proposal, but in a similar spirit.

Due to hardware problems (e.g., a malfunctioning GPU),
a host may have a 100% error rate for one app version
and a 0% error rate for another.
The same holds for turnaround time.

So we'll move the "error_rate" and "turnaround_time"
fields from the host table to host_app_version.

The host punishment mechanism is designed to deal with malfunctioning hosts.
For each host the server maintains '''max_results_day'''.
This is initialized to a project-specified value (e.g., 200)
and scaled by the number of CPUs and/or GPUs.
It's decremented if the client reports a crash
(but not if the job was aborted).
It's doubled when a successful (but not necessarily valid)
result is received.

This should also be per-app-version,
so we'll move "max_results_day" from the host table to host_app_version.
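
A sketch of this quota logic (names are illustrative, and capping the doubling at the initial default is an assumption; the text doesn't state an upper bound):

```python
# Sketch of the per-(host, app version) max_results_day mechanism.
# The cap at the initial default on doubling is an assumption.

class HostAppVersionQuota:
    def __init__(self, project_default: int = 200, ndevices: int = 1):
        # Initialized to a project-specified value, scaled by device count.
        self.default = project_default * ndevices
        self.max_results_day = self.default

    def on_crash(self) -> None:
        """Decrement on a reported crash (but not on a user abort)."""
        self.max_results_day = max(1, self.max_results_day - 1)

    def on_success(self) -> None:
        """Double on a successful (not necessarily valid) result."""
        self.max_results_day = min(self.default, self.max_results_day * 2)
```

A malfunctioning device thus quickly throttles its own daily quota, while a single success restores most of it.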

== Cherry picking ==

Suppose an application has a mix of long and short jobs.
If a client intentionally discards
(or aborts, or reports errors from) the long jobs,
but completes the short jobs,
its host scaling factor will become large,
and it will get excessive credit for the short jobs.
This is called "cherry picking".

The host punishment mechanism described above
doesn't deal effectively with cherry picking.

We propose the following mechanism to deal with cherry picking:

* For each (host, app version), maintain "host_scale_time".
This is the earliest time at which host scaling will be applied.
* For each (host, app version), maintain "scale_probation"
(initially true).
* When sending a job to a host,
if scale_probation is true,
set host_scale_time to now+X, where X is the app's delay bound.
* When a job is successfully validated
and now > host_scale_time,
set scale_probation to false.
* If a job times out or errors out,
set scale_probation to true,
max the scale factor with 1,
and set host_scale_time to now+X.
* When computing claimed credit for a job,
if now < host_scale_time, don't use the host scale factor.

The idea is to apply the host scaling factor
only if there's solid evidence that the host is NOT cherry picking.
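
The rules above can be sketched as a small state machine (names are illustrative, and the "max the scale factor with 1" step follows the text literally):

```python
# Sketch of the cherry-picking probation rules. Names are illustrative.

class HostScaleState:
    def __init__(self, delay_bound_secs: float):
        self.delay_bound = delay_bound_secs
        self.scale_probation = True   # initially true
        self.host_scale_time = 0.0    # earliest time scaling applies
        self.scale_factor = 1.0       # host scale factor

    def on_send_job(self, now: float) -> None:
        # When sending a job, extend the window if on probation.
        if self.scale_probation:
            self.host_scale_time = now + self.delay_bound

    def on_validated(self, now: float) -> None:
        # A validated job after the window ends probation.
        if now > self.host_scale_time:
            self.scale_probation = False

    def on_failure(self, now: float) -> None:
        # A timeout or error restarts probation;
        # "max the scale factor with 1" per the text.
        self.scale_probation = True
        self.scale_factor = max(self.scale_factor, 1.0)
        self.host_scale_time = now + self.delay_bound

    def effective_scale(self, now: float) -> float:
        # Don't use the host scale factor inside the probation window.
        return self.scale_factor if now >= self.host_scale_time else 1.0
```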

Because this mechanism is punitive to hosts
that experience actual failures,
we'll make it selectable on a per-application basis (default off).

In addition, to limit the extent of cheating
(in case the above mechanism is somehow defeated),
the host scaling factor will be capped at a
project-wide config parameter (default, say, 3).

== Implementation ==

=== Database changes ===

New table '''host_app''':
{{{
int host_id;
int app_id;
int vnpfc_n;
double vnpfc_sum;
double vnpfc_exp_avg;
}}}

New table '''host_app_version''':
{{{
int host_id;
int app_version_id;
int et_n;
double et_sum;
double et_exp_avg;
// some variable for recent error rate,
// replacing host.error_rate and host.max_results_day
// make sure that a host w/ 1 good and 1 bad GPU gets few GPU jobs
}}}

New fields in '''app_version''':
{{{