Using coprocessors
This document describes BOINC's support for applications that use coprocessors such as
- GPUs
- SPEs in a Cell processor
We'll assume that these resources are "allocated" rather than "scheduled": an application using a coprocessor has it locked while the app is in memory, even if the app is suspended by BOINC or descheduled by the OS.
Proposed design
- The BOINC client will probe for coprocessors, and report them in scheduler requests. The XML looks like:
  <coprocs>
    <coproc_cuda>
      <count>1</count>
      <name>GeForce 8800 GT (1)</name>
      <totalGlobalMem>...</totalGlobalMem>
      ...
    </coproc_cuda>
  </coprocs>
- An app_version record (in the server DB) will have a new field coproc_req, which is a character string encoding its coprocessor requirements.
- The scheduler is linked with a project-specific function
bool coprocessor_compatible(COPROCS&, char* coproc_req, double& flops);
This function:
- returns true if the coprocessor resources (COPROCS&) are sufficient for the app version
- fills in the num_used fields of the elements of COPROCS, indicating how many instances of each coprocessor will be used
- returns (in flops) the estimated FLOPS (used to estimate job completion time)
- The scheduler will be modified so that, when sending a job to a host, it finds the compatible app_version for which flops is greatest.
- The scheduler reply will include, for each app version, the list of coprocessors that it will use, and the estimated FLOPS.
- The client will be modified so that it keeps track of coprocessor allocation, i.e. how many instances of each are free. It only runs an app if enough instances are available, and it decrements the counts accordingly.
- The client will be modified to use app_version.flops in estimating job completion times.
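The design above can be sketched roughly as follows. This is only an illustration under stated assumptions, not BOINC's actual code: the layouts of COPROC, COPROCS and APP_VERSION, the "type:count" encoding of coproc_req, the FLOPS constants, and the helpers best_app_version, try_reserve and estimated_runtime are all hypothetical. The sketch also takes coproc_req as const char* so that string literals can be passed.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for BOINC's types; field names are assumptions.
struct COPROC { std::string type; int count; int num_used; };
struct COPROCS {
    std::vector<COPROC> coprocs;
    COPROC* lookup(const std::string& type) {
        for (COPROC& c : coprocs) if (c.type == type) return &c;
        return nullptr;
    }
};
struct APP_VERSION { int id; std::string coproc_req; };

// Project-specific check. Assumed encoding of coproc_req: "<type>:<count>",
// e.g. "cuda:1"; an empty string means the version needs no coprocessor.
bool coprocessor_compatible(COPROCS& cps, const char* coproc_req, double& flops) {
    const double cpu_flops = 1e9;       // made-up estimate for a CPU-only version
    const double per_coproc = 50e9;     // made-up estimate per coprocessor instance
    std::string req(coproc_req);
    if (req.empty()) { flops = cpu_flops; return true; }
    std::string::size_type colon = req.find(':');
    if (colon == std::string::npos) return false;
    int n = std::atoi(req.c_str() + colon + 1);
    COPROC* cp = cps.lookup(req.substr(0, colon));
    if (!cp || n <= 0 || cp->count < n) return false;
    cp->num_used = n;                   // record how many instances will be used
    flops = n * per_coproc;
    return true;
}

// Scheduler side: pick the compatible app version with the greatest flops.
struct CHOICE { const APP_VERSION* av = nullptr; double flops = 0; COPROCS usage; };

CHOICE best_app_version(const COPROCS& host, const std::vector<APP_VERSION>& avs) {
    CHOICE best;
    for (const APP_VERSION& av : avs) {
        COPROCS usage = host;           // copy, so num_used is tracked per version
        double flops = 0;
        if (!coprocessor_compatible(usage, av.coproc_req.c_str(), flops)) continue;
        if (!best.av || flops > best.flops) {
            best.av = &av;
            best.flops = flops;
            best.usage = usage;
        }
    }
    return best;
}

// Client side: run a job only if enough instances are free, then decrement.
bool try_reserve(std::map<std::string, int>& free_count, const COPROCS& usage) {
    for (const COPROC& u : usage.coprocs) {
        if (u.num_used == 0) continue;
        auto it = free_count.find(u.type);
        if (it == free_count.end() || it->second < u.num_used) return false;
    }
    for (const COPROC& u : usage.coprocs) {
        if (u.num_used > 0) free_count[u.type] -= u.num_used;
    }
    return true;
}

// Client side: completion-time estimate from the version's flops.
double estimated_runtime(double fpops_est, double flops) {
    assert(flops > 0);
    return fpops_est / flops;
}
```

In this sketch the scheduler copies the host's COPROCS for each candidate version, so the num_used values returned with the chosen version describe only that version. The client checks all requirements before decrementing any count, so a job that cannot run leaves the free counts unchanged.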
Questions
- How does BOINC know if non-BOINC applications are using resources?
Possible future additions
- Allow app_versions to specify min and max requirements (and have a corresponding allocation scheme in the client).
- Let projects define their own resources, unknown to BOINC, and have "probe" programs (using the assigned-job mechanism) that survey the resources on each host.
- Store the resource descriptions in the DB (or maybe flat files), so that you can study your host population.
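One possible shape for the min/max allocation scheme in the first item is sketched below. It is an assumption for illustration only: COPROC_RANGE and allocate_in_range are hypothetical names, and the policy (give a job as many free instances as it can use, up to its max, provided its min is met) is just one of several that could be chosen. It assumes min_instances is at least 1.

```cpp
#include <algorithm>

// Hypothetical requirement range for one coprocessor type.
struct COPROC_RANGE { int min_instances; int max_instances; };

// Returns the number of instances allocated, or 0 if the minimum
// cannot be met right now. Decrements free_instances on success.
int allocate_in_range(int& free_instances, const COPROC_RANGE& r) {
    if (free_instances < r.min_instances) return 0;
    int n = std::min(free_instances, r.max_instances);
    free_instances -= n;
    return n;
}
```

A greedy policy like this is simple but can starve later jobs; a real scheme might instead allocate only the minimum when other jobs are waiting.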