Multi-size apps
The difference in throughput between a slow processor (e.g. an Android device that runs infrequently) and a fast processor (e.g. a GPU that's always on) can be a factor of 1,000 or more. Having a single job size can therefore present problems:
- If the size is small, hosts with GPUs get huge numbers of jobs. This causes performance problems on the client and a high DB load on the server.
- If the size is large, slow hosts can't get jobs, or they get jobs that take weeks to finish.
To address this, BOINC provides a mechanism that tries to match large jobs to fast devices.
How it works
A multi-size application has a set of N size classes, 0 ... N-1. Each job belongs to a size class. Jobs of size class i are smaller than those of size class i+1. You decide how many size classes to have, and how large the jobs of a given size class are.
The BOINC scheduler maintains statistics about the "effective speed" of devices for each multi-size app, where effective speed is the device speed times host availability. In particular, it computes and maintains the boundaries of the N quantiles.
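As a rough illustration of the statistics described above, the following sketch computes effective speeds and the interior boundaries between N quantiles. The `DeviceSample` struct, its field names, and the nearest-rank boundary rule are assumptions made for this example; they are not BOINC's actual schema or algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record; field names are illustrative, not BOINC's schema.
struct DeviceSample {
    double flops;         // measured device speed
    double availability;  // fraction of time the host is available, 0..1
};

// Effective speed: device speed times host availability.
double effective_speed(const DeviceSample& d) {
    return d.flops * d.availability;
}

// Return the N-1 interior boundaries that split the samples into N
// roughly equal-sized quantiles (nearest-rank rule, an assumption).
std::vector<double> quantile_boundaries(const std::vector<DeviceSample>& samples,
                                        int n_classes) {
    assert(n_classes >= 1 && !samples.empty());
    std::vector<double> speeds;
    speeds.reserve(samples.size());
    for (const auto& s : samples) speeds.push_back(effective_speed(s));
    std::sort(speeds.begin(), speeds.end());

    std::vector<double> bounds;
    for (int i = 1; i < n_classes; ++i) {
        // Index of the first sample belonging to quantile i.
        std::size_t idx = speeds.size() * static_cast<std::size_t>(i)
                          / static_cast<std::size_t>(n_classes);
        bounds.push_back(speeds[idx]);
    }
    return bounds;
}
```

With three size classes, this yields two boundaries; devices below the first fall in quantile 0, and devices at or above the second fall in quantile 2.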
When a host requests work for a particular device, the scheduler computes the device's quantile for each multi-size application. It preferentially sends the device jobs of the corresponding size class. If it must send jobs of a different size class, it prefers smaller classes.
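The matching rule can be sketched as follows. This is a minimal illustration, not the scheduler's code: the function names are hypothetical, quantile boundaries are assumed to be stored as N-1 ascending values, and the exact fallback order (nearest smaller class first, larger classes only as a last resort) is an assumption consistent with the stated preference for smaller classes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Map an effective speed to a quantile index in 0..N-1, given the N-1
// ascending interior boundaries (a hypothetical representation).
int quantile_of(double effective_speed, const std::vector<double>& bounds) {
    // The number of boundaries <= speed is the quantile index.
    return static_cast<int>(
        std::upper_bound(bounds.begin(), bounds.end(), effective_speed)
        - bounds.begin());
}

// Order in which size classes are tried for a device in quantile q:
// the matching class first, then smaller classes (nearest first),
// then larger classes as a last resort (assumed ordering).
std::vector<int> size_class_preference(int q, int n_classes) {
    assert(q >= 0 && q < n_classes);
    std::vector<int> order;
    order.push_back(q);
    for (int c = q - 1; c >= 0; --c) order.push_back(c);
    for (int c = q + 1; c < n_classes; ++c) order.push_back(c);
    return order;
}
```

For example, with three size classes a device in quantile 1 would be offered class 1 first, then class 0, and only then class 2.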
Set up the application
To make an app multi-size, set the n_size_classes field of its database entry. Currently this must be done manually, e.g.
update app set n_size_classes=3 where id=14;
Job creation
Set the size class of jobs as you create them. From C++:
    ...
    wu.size_class = 2;
    ret = create_work(wu, ...);
From scripts or command line:
create_work ... --size_class 2
Don't forget to set wu.rsc_fpops_est and wu.rsc_fpops_bound appropriately as well.
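One way to keep the FLOPs estimates in step with the size class is to derive both from a single sizing rule. The sketch below is purely illustrative: the `JobEstimate` struct, the `estimate_for_class` helper, and the geometric growth rule are hypothetical, and the right values depend entirely on your application.

```cpp
#include <cassert>

// Hypothetical stand-in for the two workunit fields named above.
struct JobEstimate {
    double rsc_fpops_est;
    double rsc_fpops_bound;
};

// Assumed sizing rule: class 0 costs base_fpops, and each class is
// `growth` times larger than the previous one. The bound is a fixed
// multiple of the estimate.
JobEstimate estimate_for_class(int size_class, double base_fpops,
                               double growth, double bound_factor) {
    assert(size_class >= 0 && growth > 0 && bound_factor >= 1);
    double est = base_fpops;
    for (int i = 0; i < size_class; ++i) est *= growth;
    return {est, est * bound_factor};
}
```

The two returned values would then be copied into wu.rsc_fpops_est and wu.rsc_fpops_bound before the job is created.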
Daemon configuration
Arrange to periodically run size_census, which computes effective speed statistics:
<task>
    <cmd>run_in_ops size_census</cmd>
    <output>size_census.out</output>
    <period>24 hour</period>
</task>
For each multi-size app, you must run a size_regulator daemon, which regulates the flow of jobs into the shared-memory job cache so that the cache doesn't get clogged with jobs of a single size:
<daemon>
    <cmd>size_regulator --app_name uppercase --lo 10 --hi 30 --sleep_time 10</cmd>
    <output>size_regulator_uppercase.out</output>
    <pid_file>size_regulator_uppercase.pid</pid_file>
    <disabled>1</disabled>
</daemon>
The command-line options of size_regulator are:
- --app_name: name of the application
- --lo: keep at least this many jobs of each size class in cache
- --hi: keep at most this many jobs of each size class in cache
- --sleep_time: sleep this long if nothing to do
The following options correspond to those of the feeder; use the same ones that you pass to the feeder:
- --random_order
- --priority_asc
- --priority_order
- --priority_order_create_time
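To make the roles of --lo and --hi concrete, here is a minimal sketch of one plausible top-up policy. It is an assumption for illustration, not the actual size_regulator source: when a size class has fewer than `lo` jobs cached, enough jobs are released to bring it up to `hi`; otherwise nothing is released. The function name `jobs_to_release` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each size class, decide how many more jobs to release into the
// cache. Assumed policy: if a class has fewer than `lo` jobs cached,
// top it up to `hi`; otherwise release none.
std::vector<int> jobs_to_release(const std::vector<int>& cached_per_class,
                                 int lo, int hi) {
    assert(lo >= 0 && lo <= hi);
    std::vector<int> release(cached_per_class.size(), 0);
    for (std::size_t c = 0; c < cached_per_class.size(); ++c) {
        if (cached_per_class[c] < lo) {
            release[c] = hi - cached_per_class[c];
        }
    }
    return release;
}
```

With --lo 10 and --hi 30, a class holding 5 cached jobs would receive 25 more, while classes already holding 10 or more would be left alone until they drop below the low-water mark.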
Scheduler config
To use this feature, you must include the following in your config.xml:
<matchmaker>1</matchmaker>