| 1 | = Credit options (proposal) |
| 2 | |
| 3 | "Credit" is a number associated with completed jobs, |
| 4 | reflecting how much floating-point computation was (or could have been) done. |
| 5 | For CPU applications the basic formula is: |
| 6 | |
| 7 | 1 unit of credit = 1/200 day of runtime on a CPU whose Whetstone benchmark is 1 GFLOPS. |
| 8 | |
| 9 | Whetstone measures peak performance, and applications that do a lot of memory or disk access get lower FLOPS. |
| 10 | So credit measures peak, not actual, FLOPs. |
| 11 | |
| 12 | Credit is used for two purposes: |
| 13 | |
| 14 | 1) For users, to see their rate of progress, |
| 15 | to compete with other users or teams, |
| 16 | and to compare the performance of hosts. |
| 17 | |
| 18 | 2) To get an estimate of the peak performance available to a particular project, |
| 19 | or of the volunteer host pool as a whole. |
| 20 | |
| 21 | For 2) we care only about averages. |
| 22 | For 1) we also care about parity between similar jobs; |
| 23 | users get upset if someone else gets a lot more credit for a similar job. |
| 24 | |
| 25 | BOINC provides 4 ways of determining credit. |
| 26 | The choice (per app) depends on the properties of the app: |
| 27 | |
| 28 | * If you can estimate a job's FLOPs in advance, use '''pre-assigned''' credit. |
| 29 | |
| 30 | * Else if you can estimate a job's FLOPs after it completes, use '''post-assigned''' credit. |
| 31 | |
| 32 | * Else if the app has only CPU versions, use '''runtime-based''' credit. |
| 33 | |
| 34 | * Else use '''adaptive credit'''. |
| 35 | |
| 36 | == Pre-assigned credit == |
| 37 | |
| 38 | You can use this if each job does the same computation. |
| 39 | Measure the runtime on a machine with known Whetstone benchmarks. |
| 40 | Pick a machine with enough RAM that you're not paging. |
| 41 | The credit is then |
| 42 | |
| 43 | (runtime in days)*benchmark*ncpus*200 |
| 44 | |
| 45 | benchmark is the machine's Whetstone result in GFLOPS, and ncpus is the number of CPUs used by the app version; use a sequential version if possible. |
| 46 | |
| 47 | You can also use it if the runtime is a linear function of |
| 48 | some job attribute (e.g. input file size) that's known in advance. |
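As a rough illustration of the formula above, the following Python sketch computes pre-assigned credit from a measured runtime, and from a linear runtime model in some job attribute. The function and parameter names are hypothetical and are not part of BOINC; the benchmark is assumed to be in GFLOPS and the runtime is taken in seconds and converted to days.

```python
SECONDS_PER_DAY = 86400.0
# From the definition: 1 credit = 1/200 day on a 1 GFLOPS (Whetstone) CPU.
CREDIT_PER_GFLOPS_DAY = 200.0


def preassigned_credit(runtime_seconds, whetstone_gflops, ncpus=1):
    """Credit = (runtime in days) * benchmark * ncpus * 200."""
    runtime_days = runtime_seconds / SECONDS_PER_DAY
    return runtime_days * whetstone_gflops * ncpus * CREDIT_PER_GFLOPS_DAY


def credit_from_job_attribute(attribute, seconds_per_unit, fixed_seconds,
                              whetstone_gflops, ncpus=1):
    """Hypothetical helper: runtime is assumed linear in a job attribute
    (e.g. input file size), with coefficients measured on the reference host."""
    runtime_seconds = fixed_seconds + seconds_per_unit * attribute
    return preassigned_credit(runtime_seconds, whetstone_gflops, ncpus)
```

For instance, a job that runs for a full day on one CPU of a 1 GFLOPS reference machine would be assigned 200 credits under this sketch.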
| 49 | |
| 50 | Currently this is specified via the --additional_xml flag (command line) or argument (API) of create_work. |
| 51 | This is ugly. |
| 52 | |
| 53 | Proposal: make it an official argument to both local and remote job-submission APIs. |
| 54 | |
| 55 | == Post-assigned credit == |
| 56 | |
| 57 | Use this if you can estimate the FLOPs done by a completed job, |
| 58 | based on the contents of its output files or stderr. |
| 59 | For example, if your app has an outer loop, |
| 60 | and you can measure (as above) the credit C due for each iteration, |
| 61 | the job credit is C times the number of iterations performed. |
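The arithmetic of the outer-loop example can be sketched in Python as follows. This only illustrates the calculation; it is not the validator API, which is not shown here. The stderr line format `iterations: N` and the function name are assumptions made for the example.

```python
import re


def credit_from_stderr(stderr_text, credit_per_iteration):
    """Return C * (iterations performed), or None if no count is found.

    Assumes (hypothetically) that the app writes a line such as
    'iterations: 1200' to stderr when it finishes.
    """
    match = re.search(r"iterations:\s*(\d+)", stderr_text)
    if match is None:
        return None  # caller decides how to handle a missing count
    return int(match.group(1)) * credit_per_iteration
```

Returning None when the count is missing leaves the choice to the caller, for example treating the result as invalid rather than granting zero credit.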
| 62 | |
| 63 | To use this: in your validator, have the init_result() function return the credit for the job. |
| 64 | |
| 65 | == Runtime-based credit == |
| 66 | |
| 67 | Use this if the app has only CPU app versions. |
| 68 | The "claimed credit" for a job instance is runtime*ncpus*benchmark. |
| 69 | |
| 70 | To use this: pass the --credit_from_runtime option to the app's validator. |
| 71 | |
| 72 | The app's efficiency (the ratio of actual FLOPS to peak FLOPS) |
| 73 | can vary somewhat between hosts (e.g. because of different memory speeds, |
| 74 | or because small RAM causes paging). |
| 75 | Claimed credit for identical jobs will therefore vary between hosts, |
| 76 | but generally by a factor of 2 or so. |
| 77 | |
| 78 | Runtime-based credit can't be used if the app has GPU versions |
| 79 | because efficiency can vary by orders of magnitude between CPU and GPU versions. |
| 80 | |
| 81 | Currently this assigns the claimed credit of the canonical result to all results. |
| 82 | TODO: average over valid results. |
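To make the difference between the current behavior and the TODO concrete, here is a minimal Python sketch. It assumes the claimed credits of a job's valid results are available as a list; the function names are hypothetical and do not correspond to BOINC code.

```python
def granted_from_canonical(claimed_credits, canonical_index):
    """Current behavior as described above: every result is granted
    the claimed credit of the canonical result."""
    return claimed_credits[canonical_index]


def granted_from_average(claimed_credits):
    """One possible reading of the TODO: grant the mean claimed credit
    over all valid results."""
    if not claimed_credits:
        raise ValueError("no valid results to average")
    return sum(claimed_credits) / len(claimed_credits)
```

With claimed credits of 10 and 20 for two valid results, the first approach grants 10 or 20 depending on which result is canonical, while the second grants 15 to both.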
| 83 | |
| 84 | == Adaptive credit == |
| 85 | |
| 86 | Use this if you have GPU apps, and are unable to estimate FLOPs even after job completion. |
| 87 | This method maintains performance statistics on a (host, app version) level, |
| 88 | and uses these to normalize credit between CPU and GPU versions. |
| 89 | See [CreditNew]. |
| 90 | |
| 91 | To use this: no action is needed; it is the default. |
| 92 | |
| 93 | If you use this, the adaptation will happen faster if you provide |
| 94 | values for workunit fp_ops_est that are correlated with the actual FLOPs. |
| 95 | Use a constant value if you're not sure. |