| 1 | = Multi-size apps = |
| 2 | |
| 3 | The difference in throughput between a slow processor |
| 4 | (e.g. an Android device that runs infrequently) |
| 5 | and a fast processor (e.g. a GPU that's always on) |
| 6 | can be a factor of 1,000 or more. |
| 7 | Having a single job size can therefore present problems: |
| 8 | |
| 9 | * If the size is small, hosts with GPUs get huge numbers of jobs. |
| 10 | This causes performance problems on the client |
| 11 | and a high DB load on the server. |
| 12 | * If the size is large, slow hosts can't get jobs, |
| 13 | or they get jobs that take weeks to finish. |
| 14 | |
| 15 | To address this, BOINC provides a mechanism |
| 16 | that tries to match large jobs to fast devices. |
| 17 | |
| 18 | == How it works == |
| 19 | |
| 20 | A '''multi-size application''' has a set of N '''size classes''', 0 ... N-1. |
| 21 | Each job belongs to a size class. |
| 22 | Jobs of size class i are smaller than those of size class i+1. |
| 23 | You decide how many size classes to have, |
| 24 | and how large the jobs of a given size class are. |
| 25 | |
| 26 | The BOINC scheduler maintains statistics about the "effective speed" of devices for each multi-size app, |
| 27 | where effective speed is the device speed times host availability. |
| 28 | In particular, it computes and maintains the quantile boundaries that divide devices into N groups by effective speed. |
| 29 | |
| 30 | When a host requests work for a particular device, |
| 31 | the scheduler computes the device's quantile for each multi-size application. |
| 32 | It preferentially sends the host jobs of the corresponding size class. |
| 33 | If it must send jobs of a different size class, it prefers smaller classes. |
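
The matching described above can be sketched as follows. This is an illustrative model, not the scheduler's actual code: the function names, the equal-count quantile rule, and the exact fallback order (nearest smaller class first, larger classes last) are assumptions made for the example.

```python
from bisect import bisect_right


def effective_speed(device_flops, availability):
    """Effective speed = device speed times the fraction of time the host is available."""
    return device_flops * availability


def quantile_boundaries(speeds, n_classes):
    """Return the n_classes - 1 cut points that split the sorted speeds into equal-count groups."""
    ordered = sorted(speeds)
    return [ordered[(len(ordered) * k) // n_classes] for k in range(1, n_classes)]


def size_class_for(speed, boundaries):
    """Map an effective speed to a size class in 0 .. len(boundaries)."""
    return bisect_right(boundaries, speed)


def preference_order(target, n_classes):
    """Assumed fallback order: matching class, then smaller classes, then larger ones."""
    smaller = list(range(target - 1, -1, -1))
    larger = list(range(target + 1, n_classes))
    return [target] + smaller + larger
```

With three size classes, a device whose effective speed falls in the middle quantile would be offered class 1 jobs first, then class 0, and class 2 only as a last resort.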
| 34 | |
| 35 | == Set up the application == |
| 36 | |
| 37 | To make an app multi-size, set the '''n_size_classes''' field of its database entry. |
| 38 | Currently this must be done manually, e.g. |
| 39 | {{{ |
| 40 | update app set n_size_classes=3 where id=14; |
| 41 | }}} |
| 42 | |
| 43 | == Job creation == |
| 44 | |
| 45 | Set the size class of jobs as you create them. |
| 46 | From C++: |
| 47 | {{{ |
| 48 | ... |
| 49 | wu.size_class = 2; |
| 50 | ret = create_work(wu, ...); |
| 51 | }}} |
| 52 | From scripts or command line: |
| 53 | {{{ |
| 54 | create_work ... --size_class 2 |
| 55 | }}} |
| 56 | |
| 57 | Don't forget to set wu.rsc_fpops_est and wu.rsc_fpops_bound appropriately as well. |
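
A submission script might keep the size class and the FLOPs estimates together, so that they cannot drift apart. The sketch below only builds the argument list for create_work. The per-class estimates in FPOPS_EST_BY_CLASS, the bound factor of 10, and the helper name are hypothetical, and it assumes the usual --appname, --wu_name, --rsc_fpops_est and --rsc_fpops_bound options of create_work; input file names would still need to be appended.

```python
# Hypothetical per-class estimates of floating-point operations per job.
FPOPS_EST_BY_CLASS = {0: 1e12, 1: 1e13, 2: 1e14}


def create_work_args(app_name, wu_name, size_class, bound_factor=10.0):
    """Build a create_work command line whose FLOPs estimates match the size class."""
    fpops_est = FPOPS_EST_BY_CLASS[size_class]
    fpops_bound = fpops_est * bound_factor  # assumed safety margin, not a BOINC default
    return [
        "bin/create_work",
        "--appname", app_name,
        "--wu_name", wu_name,
        "--size_class", str(size_class),
        "--rsc_fpops_est", f"{fpops_est:.6e}",
        "--rsc_fpops_bound", f"{fpops_bound:.6e}",
    ]
```

The resulting list can be passed to subprocess.run from the project's root directory.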
| 58 | |
| 59 | == Daemon configuration == |
| 60 | |
| 61 | Arrange to periodically run '''size_census''', |
| 62 | which computes effective speed statistics: |
| 63 | {{{ |
| 64 | <task> |
| 65 | <cmd>run_in_ops size_census</cmd> |
| 66 | <output>size_census.out</output> |
| 67 | <period>24 hour</period> |
| 68 | </task> |
| 69 | }}} |
| 70 | |
| 71 | For each multi-size app, you must run a daemon '''size_regulator''' |
| 72 | that regulates the flow of jobs into the shared-memory job cache, |
| 73 | making sure that the cache doesn't get clogged with jobs of a single size: |
| 74 | {{{ |
| 75 | <daemon> |
| 76 | <cmd>size_regulator --app_name uppercase --lo 10 --hi 30 --sleep_time 10</cmd> |
| 77 | <output>size_regulator_uppercase.out</output> |
| 78 | <pid_file>size_regulator_uppercase.pid</pid_file> |
| 79 |     <disabled>0</disabled> |
| 80 | </daemon> |
| 81 | }}} |
| 82 | |
| 83 | The command-line options of size_regulator are: |
| 84 | |
| 85 | --app_name :: name of the application |
| 86 | --lo :: keep at least this many jobs of each size class in cache |
| 87 | --hi :: keep at most this many jobs of each size class in cache |
| 88 | --sleep_time :: sleep this long if nothing to do |
| 89 | |
| 90 | The following options correspond to those of '''feeder'''; use the same ones that you pass to the feeder: |
| 91 | |
| 92 | --random_order :: |
| 93 | --priority_asc :: |
| 94 | --priority_order :: |
| 95 | --priority_order_create_time :: |
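
One plausible reading of --lo and --hi is a low-water/high-water rule: when the number of available jobs in a size class drops below --lo, release enough held-back jobs to bring it up to --hi. The sketch below illustrates that rule only; it is an assumption, not taken from the size_regulator source, and the function name is hypothetical.

```python
def jobs_to_release(available_by_class, lo, hi):
    """For each size class, return how many held-back jobs to make available.

    Assumed rule: top a class up to `hi` once it falls below `lo`;
    otherwise leave it alone.
    """
    release = {}
    for size_class, available in available_by_class.items():
        if available < lo:
            release[size_class] = hi - available
        else:
            release[size_class] = 0
    return release
```

With --lo 10 --hi 30, a class holding 5 available jobs would be topped up by 25, while classes holding 10 or more would be left untouched until they drain.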