Changes between Initial Version and Version 1 of CreditNew


Timestamp: Oct 30, 2009, 2:35:19 PM
Author: davea

= New credit system design =

== Introduction ==

We can estimate the peak FLOPS of a given processor.
For CPUs, this is the Whetstone benchmark score.
For GPUs, it's given by a manufacturer-supplied formula.

Applications access memory,
and the speed of a host's memory system is not reflected
in its Whetstone score.
So a given job might take the same amount of CPU time
on a 1 GFLOPS host as on a 10 GFLOPS host.
The "efficiency" of an application running on a given host
is the ratio of actual FLOPS to peak FLOPS.

GPUs typically have a much higher (50-100X) peak speed than CPUs.
However, application efficiency is typically lower
(very roughly, 10% for GPUs, 50% for CPUs).
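
To make these figures concrete, here is a tiny illustrative calculation (the numbers are the rough ones quoted above, not measurements):
{{{
# Efficiency = actual FLOPS / peak FLOPS.  Illustrative numbers only.

cpu_peak_gflops = 5.0      # Whetstone-style peak estimate for a CPU
gpu_peak_gflops = 250.0    # roughly 50X the CPU peak

cpu_actual = 0.5 * cpu_peak_gflops   # ~50% efficiency -> ~2.5 GFLOPS delivered
gpu_actual = 0.1 * gpu_peak_gflops   # ~10% efficiency -> ~25 GFLOPS delivered
}}}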

== The first credit system ==

In the first iteration of the credit system, "claimed credit" was defined as
{{{
C1 = H.whetstone * J.cpu_time
}}}
There were then various schemes for taking the
average or min of the claimed credit of the
replicas of a job, and using that as the "granted credit".
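
In sketch form (not the actual server code; names are illustrative, and the constant that converts this product into credit units is omitted):
{{{
# Sketch of the first ("Peak-FLOPS-based") credit system; names are illustrative.

def claimed_credit_v1(host_whetstone_flops, cpu_time_secs):
    # C1 = H.whetstone * J.cpu_time (scaling to credit units omitted)
    return host_whetstone_flops * cpu_time_secs

def granted_credit_v1(replica_claims):
    # One of the schemes mentioned above: take the min (or the average)
    # of the claims made by a job's replicas.
    return min(replica_claims)
}}}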

We call this system "Peak-FLOPS-based" because
it's based on the CPU's peak performance.

The problem with this system is that, for a given app version,
efficiency can vary widely.
In the above example,
the 10 GFLOPS host would claim 10X as much credit,
and its owner would be upset when it was granted
only a tenth of that.

Furthermore, the credits granted to a given host for a
series of identical jobs could vary widely,
depending on the host it was paired with by replication.

So host neutrality was achieved,
but in a way that seemed arbitrary and unfair to users.

== The second credit system ==

To address the problems with host neutrality,
we switched to the philosophy that
credit should be proportional to the number of FLOPs actually performed
by the application.
We added API calls to let applications report this.
We call this approach "Actual-FLOPs-based".

SETI@home had an application that allowed counting of FLOPs,
and they adopted this system.
They added a scaling factor so that the average credit
was about the same as in the first credit system.

Not all projects could count FLOPs, however.
So SETI@home published their average credit per CPU second,
and other projects continued to use benchmark-based credit,
but multiplied it by a scaling factor to match SETI@home's average.

This system had several problems:

 * It didn't address GPUs.
 * Projects that couldn't count FLOPs still had the host-neutrality problem.
 * It didn't address single replication.

== Goals of the new (third) credit system ==

 * Completely automate credit - projects don't have to
   change code, settings, etc.

 * Device neutrality: similar jobs should get similar credit
   regardless of what processor or GPU they run on.

 * Limited project neutrality: different projects should grant
   about the same amount of credit per CPU hour,
   averaged over hosts.
   Projects with GPU apps should grant credit in proportion
   to the efficiency of the apps.
   (This means that projects with efficient GPU apps will
   grant more credit on average.  That's OK.)

== Peak FLOP Count (PFC) ==

This system uses the Peak-FLOPS-based approach,
but addresses its problems in a new way.

When a job is issued to a host, the scheduler specifies usage(J,D),
J's usage of processing resource D:
how many CPUs, and how many GPUs (possibly fractional).

If the job is finished in elapsed time T,
we define peak_flop_count(J), or PFC(J), as
{{{
PFC(J) = T * sum over devices D (usage(J, D) * peak_flop_rate(D))
}}}

Notes:

 * We use elapsed time instead of actual device time (e.g., CPU time).
   If a job uses a resource inefficiently
   (e.g., a CPU job that does lots of disk I/O),
   PFC() won't reflect this.  That's OK.
 * usage(J,D) may not be accurate; e.g., a GPU job may take
   more or less CPU than the scheduler thinks it will.
   Eventually we may switch to a scheme where the client
   dynamically determines the CPU usage.
   For now, though, we'll just use the scheduler's estimate.

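As a concrete illustration, here is a minimal sketch of this computation; the function and field names are illustrative, not actual BOINC scheduler code, and the numbers in the example are made up.
{{{
# Sketch of PFC(J); names are illustrative, not actual BOINC code.

def peak_flop_count(elapsed_time, usage, peak_flop_rate):
    # elapsed_time: T, the job's elapsed (wall-clock) time in seconds
    # usage: dict mapping device D -> usage(J, D), e.g. {"cpu": 1.0, "gpu": 0.5}
    # peak_flop_rate: dict mapping device D -> peak FLOPS of device D
    return elapsed_time * sum(
        usage[d] * peak_flop_rate[d] for d in usage
    )

# Example: a job that ran for 1000 seconds using 1 CPU and 1 GPU
# (5 GFLOPS and 250 GFLOPS peak respectively; made-up numbers).
pfc = peak_flop_count(
    1000.0,
    {"cpu": 1.0, "gpu": 1.0},
    {"cpu": 5e9, "gpu": 250e9},
)
}}}
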
The idea of the system is that granted credit for a job J
is proportional to PFC(J),
but is normalized in the following ways:

== Version normalization ==

If a given application has multiple versions (e.g., CPU and GPU versions),
the average granted credit should be the same for each version.
The adjustment is always downwards:
we maintain the average PFC*(V) of PFC() for each app version,
find the minimum X,
then scale each app version's jobs by (X/PFC*(V)).
The result is called NPFC(J).
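
In sketch form (assuming we already have a recent average PFC*(V) for each app version; names are illustrative, not actual BOINC code):
{{{
# Sketch of version normalization; names are illustrative.

def version_normalized_pfc(pfc_j, version, avg_pfc_by_version):
    # pfc_j: PFC(J) for the job
    # version: the app version V that processed the job
    # avg_pfc_by_version: dict mapping app version -> PFC*(V),
    #     a recent (moving) average of PFC() over that version's jobs
    x = min(avg_pfc_by_version.values())      # the most efficient version
    scale = x / avg_pfc_by_version[version]   # <= 1: the adjustment is downwards
    return pfc_j * scale                      # NPFC(J)
}}}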

Notes:
 * This mechanism provides device neutrality.
 * This addresses the common situation
   where an app's GPU version is much less efficient than the CPU version
   (i.e. the ratio of actual FLOPs to peak FLOPs is much less).
   To a certain extent, this mechanism shifts the system
   towards the "Actual FLOPs" philosophy,
   since credit is granted based on the most efficient app version.
   It's not exactly "Actual FLOPs", since the most efficient
   version may not be 100% efficient.
 * Averages are computed as a moving average,
   so that the system will respond quickly as job sizes change
   or new app versions are deployed.

== Project normalization ==

If an application has both CPU and GPU versions,
then the version normalization mechanism uses the CPU
version as a "sanity check" to limit the credit granted for GPU jobs.

Suppose a project has an app with only a GPU version,
so there's no CPU version to act as a sanity check.
If we grant credit based only on GPU peak speed,
the project will grant much more credit per GPU hour than
other projects, violating limited project neutrality.

The solution to this is: if an app has only GPU versions,
then we scale its granted credit by a factor,
obtained from a central BOINC server,
which is based on the average scaling factor
for that GPU type among projects that
do have both CPU and GPU versions.
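
In sketch form (the central-server lookup here is purely hypothetical; how the factors are published is an implementation detail):
{{{
# Sketch of project normalization for a GPU-only app.
# get_published_scale_factor() is a hypothetical stand-in for however
# the central BOINC server ends up publishing the per-plan-class factors.

def npfc_for_gpu_only_app(pfc_j, plan_class, get_published_scale_factor):
    # The published factor is the average version-normalization scaling
    # observed for this plan class at projects that have both CPU and GPU apps.
    scale = get_published_scale_factor(plan_class)
    return pfc_j * scale
}}}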

Notes:

 * Projects will run a periodic script to update the scaling factors.
 * Rather than GPU type, we'll actually use plan class,
   since e.g. the average efficiency of CUDA 2.3 apps may be different
   from that of CUDA 2.1 apps.
 * Initially we'll obtain scaling factors from large projects
   that have both GPU and CPU apps (e.g., SETI@home).
   Eventually we'll use an average (weighted by work done) over multiple projects.

== Host normalization ==

For a given application, all hosts should get the same average granted credit per job.
To ensure this, for each application A we maintain the average NPFC*(A),
and for each host H we maintain NPFC*(H, A).
The "claimed credit" for a given job J is then
{{{
NPFC(J) * (NPFC*(A)/NPFC*(H, A))
}}}
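
Or, as a minimal sketch (names are illustrative, not actual BOINC code):
{{{
# Sketch of host normalization; names are illustrative.

def claimed_credit(npfc_j, avg_npfc_app, avg_npfc_host_app):
    # npfc_j: NPFC(J) for the job
    # avg_npfc_app: NPFC*(A), a recent average over all jobs of application A
    # avg_npfc_host_app: NPFC*(H, A), a recent average for this host and application
    return npfc_j * (avg_npfc_app / avg_npfc_host_app)
}}}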

Notes:
 * NPFC* is averaged over jobs, not hosts.
 * Both averages are recent averages, so that they respond to
   changes in job sizes and app version characteristics.
 * This assumes that all hosts are sent the same distribution of jobs.
   There are two situations where this is not the case:
   a) job-size matching, and b) GPUGrid.net's scheme for sending
   some (presumably larger) jobs to GPUs with more processors.
   To deal with this, we'll weight the average by workunit.rsc_flops_est.

== Replication and cheating ==

Host normalization mostly eliminates the incentive to cheat
by claiming excessive credit
(i.e., by falsifying benchmark scores or elapsed time).
An exaggerated claim will increase NPFC*(H,A),
causing subsequent claimed credit to be scaled down proportionately.
This means that no special cheat-prevention scheme
is needed for single replication;
granted credit = claimed credit.

For jobs that are replicated, granted credit is
set to the min of the valid results
(min is used instead of average to remove the incentive
for cherry-picking; see below).
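
In sketch form (names are illustrative, not actual validator code):
{{{
# Sketch of credit granting; names are illustrative.

def granted_credit(valid_claims):
    # valid_claims: the claimed credit of each valid result of the job.
    # With single replication this is a one-element list, so
    # granted credit == claimed credit.
    return min(valid_claims)
}}}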

However, there are still some possible forms of cheating.

 * One-time cheats (like claiming 1e304) can be prevented by
   capping NPFC(J) at some multiple (say, 10) of NPFC*(A).
 * Cherry-picking: suppose an application has two types of jobs,
   which run for 1 second and 1 hour respectively.
   Clients can figure out which is which, e.g. by running a job for 2 seconds
   and seeing if it's exited.
   Suppose a client systematically refuses the 1-hour jobs
   (e.g., by reporting a crash or never reporting them).
   Its NPFC*(H, A) will quickly decrease,
   and soon it will be getting several thousand times more credit
   per unit of actual work than other hosts!
   Countermeasure:
   whenever a job errors out, times out, or fails to validate,
   set the host's error rate back to the initial default,
   and set its NPFC*(H, A) to NPFC*(A) for all apps A.
   This puts the host into a state where several dozen of its
   subsequent jobs will be replicated.

== Implementation ==