Changes between Initial Version and Version 1 of CreditOptions


Timestamp:
May 1, 2018, 1:04:38 PM (6 years ago)
Author:
davea
Comment:

--

     1= Credit options (proposal)
     2
     3"Credit" is a number associated with completed jobs,
     4reflecting how much floating-point computation was (or could have been) done.
     5For CPU applications the basic formula is:
     6
     71 unit of credit = 1/200 day of runtime on a CPU whose Whetstone benchmark is 1 GFLOPS.
     8
     9Whetstone measures peak performance; applications that do a lot of memory or disk access achieve lower actual FLOPS.
     10So credit measures peak, not actual, FLOPs.
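
The definition above implies a fixed conversion between credit and peak floating-point operations. A minimal sketch of that arithmetic; the function and constant names are illustrative and not part of BOINC:

```python
SECONDS_PER_DAY = 86_400
CREDITS_PER_GFLOPS_DAY = 200  # from the basic formula: 1 credit = 1/200 day at 1 GFLOPS


def credit_to_peak_flops(credit: float) -> float:
    """Peak floating-point operations represented by `credit` units."""
    seconds_at_1_gflops = credit * SECONDS_PER_DAY / CREDITS_PER_GFLOPS_DAY
    return seconds_at_1_gflops * 1e9


def peak_flops_to_credit(flops: float) -> float:
    """Inverse conversion: credit corresponding to a peak FLOP count."""
    return flops / 1e9 / SECONDS_PER_DAY * CREDITS_PER_GFLOPS_DAY
```

One unit of credit therefore corresponds to 432 seconds at 1 GFLOPS, i.e. 4.32e11 peak floating-point operations.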
     11
     12Credit is used for two purposes:
     13
     141) For users, to see their rate of progress,
     15to compete with other users or teams,
     16and to compare the performance of hosts.
     17
     182) To get an estimate of the peak performance available to a particular project,
     19or of the volunteer host pool as a whole.
     20
     21For 2) we care only about averages.
     22For 1) we also care about parity between similar jobs;
     23users get upset if someone else gets a lot more credit for a similar job.
     24
     25BOINC provides 4 ways of determining credit.
     26The choice (per app) depends on the properties of the app:
     27
     28 * If you can estimate a job's FLOPs in advance, use '''pre-assigned''' credit.
     29
     30 * Else if you can estimate a job's FLOPs after it completes, use '''post-assigned''' credit.
     31
     32 * Else if the app has only CPU versions, use '''runtime-based''' credit.
     33
     34 * Else use '''adaptive credit'''.
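
The decision list above can be written as a small function. This is only a restatement of the four rules; the function and parameter names are hypothetical:

```python
def choose_credit_method(flops_known_in_advance: bool,
                         flops_known_after_completion: bool,
                         cpu_only: bool) -> str:
    """Return the credit method suggested by the rules above, checked in order."""
    if flops_known_in_advance:
        return "pre-assigned"
    if flops_known_after_completion:
        return "post-assigned"
    if cpu_only:
        return "runtime-based"
    return "adaptive"
```

The order matters: an app with GPU versions whose FLOPs can be estimated in advance still gets pre-assigned credit.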
     35
     36== Pre-assigned credit ==
     37
     38You can use this if each job does the same computation.
     39Measure the runtime on a machine with known Whetstone benchmarks.
     40Pick a machine with enough RAM that you're not paging.
     41The credit is then
     42
     43(runtime in days)*benchmark*ncpus*200
     44
     45ncpus is the number of CPUs used by the app version; use a sequential version if possible.
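
A minimal sketch of the formula above, assuming the benchmark is expressed in GFLOPS and the runtime is measured in seconds (the function name is illustrative):

```python
def preassigned_credit(runtime_seconds: float,
                       whetstone_gflops: float,
                       ncpus: int = 1) -> float:
    """Credit = (runtime in days) * benchmark * ncpus * 200."""
    runtime_days = runtime_seconds / 86_400
    return runtime_days * whetstone_gflops * ncpus * 200


# Example: a sequential job that ran 2 hours on a 3 GFLOPS reference machine
# is worth about 50 credits.
reference_credit = preassigned_credit(7200, 3.0)
```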
     46
     47You can also use it if the runtime is a linear function of
     48some job attribute (e.g. input file size) that's known in advance.
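
For the linear case, one possible approach is to fit runtime against the attribute on the reference machine and convert the predicted runtime to credit. The sketch below uses an ordinary least-squares fit; the input-size unit (MB) and all names are assumptions made for illustration:

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def credit_for_input(size_mb, slope, intercept, whetstone_gflops, ncpus=1):
    """Predict runtime (seconds) from input size, then apply the credit formula."""
    runtime_seconds = slope * size_mb + intercept
    return runtime_seconds / 86_400 * whetstone_gflops * ncpus * 200


# Reference runs on one machine: input size (MB) -> measured runtime (s).
slope, intercept = fit_line([10, 20, 40], [1000, 1900, 3700])
credit_30mb = credit_for_input(30, slope, intercept, whetstone_gflops=2.0)
```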
     49
     50Currently this is specified via the --additional_xml option of create_work (command line) or the corresponding argument of the API.
     51This is ugly.
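
A sketch of what a submission script might assemble today. It assumes the credit is passed as a `<credit>` element inside the additional XML and that create_work is invoked as `bin/create_work` with `--appname` and `--wu_name`; check these against your server's create_work documentation before relying on them. The helper name is hypothetical:

```python
import shlex


def create_work_command(appname, wu_name, input_files, credit):
    """Build a create_work argument list carrying a pre-assigned credit."""
    # Assumption: the credit travels as a <credit> element in the additional XML.
    additional_xml = f"<credit>{credit:g}</credit>"
    return [
        "bin/create_work",
        "--appname", appname,
        "--wu_name", wu_name,
        "--additional_xml", additional_xml,
        *input_files,
    ]


cmd = create_work_command("myapp", "job_0001", ["input_0001.dat"], 12.4)
command_line = shlex.join(cmd)  # shell-quoted string, suitable for logging
```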
     52
     53Proposal: make it an official argument to both local and remote job-submission APIs.
     54
     55== Post-assigned credit ==
     56
     57Use this if you can estimate the FLOPs done by a completed job,
     58based on the contents of its output files or stderr.
     59For example, if your app has an outer loop,
     60and you can measure (as above) the credit C due for each iteration,
     61the job credit is C times the number of iterations performed.
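
The arithmetic for the loop example can be sketched as follows. A real BOINC validator is written in C++; this Python version only illustrates the calculation, and the `iterations_completed=N` line in stderr is a hypothetical format that your app would have to emit:

```python
import re

# Hypothetical stderr line written by the app when it finishes.
ITERATION_RE = re.compile(r"^iterations_completed=(\d+)$", re.MULTILINE)


def post_assigned_credit(stderr_text, credit_per_iteration):
    """Return C * iterations, or None if the iteration count is missing."""
    match = ITERATION_RE.search(stderr_text)
    if match is None:
        return None
    return credit_per_iteration * int(match.group(1))
```

Returning `None` for a missing count leaves it to the caller to decide whether such a result should be treated as invalid.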
     62
     63To use this: in your validator, have the init_result() function return the credit for the job.
     64
     65== Runtime-based credit ==
     66
     67Use this if the app has only CPU app versions.
     68The "claimed credit" for a job instance is proportional to runtime*ncpus*benchmark, following the basic formula above.
     69
     70To use this: pass the --credit_from_runtime option to the app's validator.
     71
     72The app's efficiency (the ratio of actual FLOPS to peak FLOPS)
     73can vary somewhat between hosts (e.g. because of different memory speeds,
     74or because small RAM causes paging).
     75Claimed credit for identical jobs will therefore vary between hosts,
     76but generally only by a factor of 2 or so.
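
To see where the variation comes from, the sketch below models a job with a fixed FLOP count on hosts of differing efficiency, taking efficiency as actual FLOPS divided by peak FLOPS. It assumes claimed credit uses the same scaling as the basic formula (runtime in days, factor 200); the names and numbers are illustrative:

```python
def claimed_credit_for_job(job_gflop, whetstone_gflops, efficiency, ncpus=1):
    """Claimed credit for a job needing `job_gflop` billion actual operations."""
    actual_gflops = whetstone_gflops * efficiency * ncpus
    runtime_days = job_gflop / actual_gflops / 86_400
    # Assumed scaling: runtime * ncpus * benchmark * 200, as in the basic formula.
    return runtime_days * ncpus * whetstone_gflops * 200


# Same job on two hosts: one runs at 50% of peak, the other at 25%.
fast_memory = claimed_credit_for_job(1e6, whetstone_gflops=3.0, efficiency=0.50)
slow_memory = claimed_credit_for_job(1e6, whetstone_gflops=5.0, efficiency=0.25)
ratio = slow_memory / fast_memory  # about 2: the less efficient host claims more
```

Under this model the benchmark cancels out, so the ratio of claims for identical jobs is the inverse ratio of the hosts' efficiencies.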
     77
     78Runtime-based credit can't be used if the app has GPU versions
     79because efficiency can vary by orders of magnitude between CPU and GPU versions.
     80
     81Currently this assigns the claimed credit of the canonical result to all results.
     82TODO: average over valid results.
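
The difference between the current behavior and the TODO can be sketched as two granting policies over the claimed credits of the valid results. The result IDs and function names are hypothetical:

```python
from statistics import mean


def grant_canonical(claims, canonical_id):
    """Current behavior: every result gets the canonical result's claim."""
    return {result_id: claims[canonical_id] for result_id in claims}


def grant_average(claims):
    """TODO behavior: every result gets the mean claim of the valid results."""
    average = mean(claims.values())
    return {result_id: average for result_id in claims}


valid_claims = {"r1": 10.0, "r2": 14.0, "r3": 12.0}
by_canonical = grant_canonical(valid_claims, "r2")  # each result gets r2's claim
by_average = grant_average(valid_claims)            # each result gets the mean
```

Averaging makes the granted credit independent of which valid result happens to be chosen as canonical.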
     83
     84== Adaptive credit ==
     85
     86Use this if you have GPU apps, and are unable to estimate FLOPs even after job completion.
     87This method maintains performance statistics on a (host, app version) level,
     88and uses these to normalize credit between CPU and GPU versions.
     89See [CreditNew].
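
The normalization idea can be illustrated with a deliberately simplified sketch. It is not a transcription of the CreditNew algorithm: it tracks statistics per app version only (not per host), and the class and function names are hypothetical. Each version's peak FLOP count is scaled so that, on average, all versions claim what the most efficient version would claim:

```python
FLOPS_PER_CREDIT = 4.32e11  # 1/200 day at 1 GFLOPS, from the basic formula


class VersionStats:
    """Running mean of (peak FLOP count / fp_ops_est) per app version."""

    def __init__(self):
        self._totals = {}  # version -> (sum of ratios, number of jobs)

    def record(self, version, runtime_s, peak_gflops, fp_ops_est):
        ratio = runtime_s * peak_gflops * 1e9 / fp_ops_est
        total, count = self._totals.get(version, (0.0, 0))
        self._totals[version] = (total + ratio, count + 1)

    def mean_ratio(self, version):
        total, count = self._totals[version]  # KeyError if never recorded
        return total / count

    def scale(self, version):
        """Factor that maps this version onto the most efficient version."""
        best = min(self.mean_ratio(v) for v in self._totals)
        return best / self.mean_ratio(version)


def normalized_credit(stats, version, runtime_s, peak_gflops):
    peak_flop_count = runtime_s * peak_gflops * 1e9
    return peak_flop_count * stats.scale(version) / FLOPS_PER_CREDIT
```

With these statistics, a GPU version that runs at 10% of its peak and a CPU version that runs near its peak end up with similar credit for similar jobs.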
     90
     91To use this: do nothing; it is the default.
     92
     93If you use this, the adaptation will happen faster if you provide
     94values for workunit fp_ops_est that are correlated with the actual FLOPs.
     95Use a constant value if you're not sure.