Managing allocated resources
BOINC has a hardwired model of various resources:
- CPUs
- physical memory and swap space
- disk space
Some hosts have other processing resources:
- GPU(s)
- SPEs in a Cell processor
How can we extend BOINC to take these resources into account, i.e. to send the best available app version to hosts, and to execute the best combination of apps on a given host?
We'll assume that these resources are "allocated" rather than "scheduled": an application using a resource has it locked while the app is in memory, even if the app is suspended by BOINC or descheduled by the OS.
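To make the allocated-versus-scheduled distinction concrete, here is a minimal Python sketch. It is not BOINC code; the class `ResourcePool` and its method names are hypothetical and exist only for illustration. The point it shows is that suspending an app does not return its resources to the pool; only removing the app from memory does.

```python
class ResourcePool:
    """Tracks free instances of allocated (not scheduled) resources.

    Hypothetical illustration; none of these names come from BOINC.
    """

    def __init__(self, total):
        # total: mapping of resource type -> number of instances on the host
        self.free = dict(total)
        self.held = {}  # app name -> resources that app currently holds

    def load(self, app, needs):
        """Try to bring an app into memory; return True on success."""
        if any(self.free.get(rtype, 0) < n for rtype, n in needs.items()):
            return False  # not enough free instances; do not run the app
        for rtype, n in needs.items():
            self.free[rtype] -= n
        self.held[app] = dict(needs)
        return True

    def suspend(self, app):
        """Suspending is a no-op here: the app is still in memory,
        so it keeps its resources locked."""
        return None

    def unload(self, app):
        """Resources are released only when the app leaves memory."""
        for rtype, n in self.held.pop(app).items():
            self.free[rtype] += n


pool = ResourcePool({"GPU": 1})
first_loaded = pool.load("app_a", {"GPU": 1})
pool.suspend("app_a")
second_loaded = pool.load("app_b", {"GPU": 1})  # GPU is still locked
pool.unload("app_a")
third_loaded = pool.load("app_b", {"GPU": 1})   # now the GPU is free
```

Under a "scheduled" model, `suspend` would instead return the instances to `free`, which is exactly what this design assumes cannot be done for GPUs and SPEs.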
Proposed design
- We define an XML notation for resources. This might look like:
    <resource>
      <type>Cell SPE</type>
      <number>6</number>
    </resource>
    <resource>
      <type>NVIDIA 8800 GPU with 1.7 driver</type>
      <number>1</number>
    </resource>
- The BOINC client will discover resources, and will pass the resource description in scheduler request messages.
- An app_version record (in the server DB) will have a new field resource_requirements of the form:
    <resource>
      <type>REGEXP</type>
      <number>n</number>
    </resource>
    ...
  A host is "compatible" with an app_version if, for each required resource, the host has at least n instances of a resource whose name matches REGEXP.
- In addition, app_version will have an acceleration field, representing the (approximate) speedup relative to CPU-only execution.
- The scheduler will be modified so that, when sending a job to a host, it finds the compatible app_version for which acceleration is greatest.
- The scheduler reply will include app_version.resource_requirements and app_version.acceleration.
- The client will be modified so that it keeps track of resource allocation, i.e. how many instances of each resource are free. It only runs an app if enough instances are available, and it decrements the counts accordingly.
- The client will be modified to use app_version.acceleration in estimating job completion times.
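The scheduler-side steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the BOINC implementation: the function names, the dictionary layout of an app_version, and the use of an unanchored `re.search` for the REGEXP match are all hypothetical choices. The runtime estimate (CPU-only time divided by acceleration) is likewise only one plausible reading of "use acceleration in estimating job completion times".

```python
import re
import xml.etree.ElementTree as ET


def parse_resources(xml_text):
    """Parse sibling <resource> elements into (type, number) pairs."""
    # Wrap in a dummy root because the notation is a list of siblings.
    root = ET.fromstring("<resources>" + xml_text + "</resources>")
    return [(r.findtext("type").strip(), int(r.findtext("number")))
            for r in root.findall("resource")]


def is_compatible(host_resources, requirements):
    """True if, for each requirement, the host has at least n instances
    of a resource whose name matches the requirement's regexp."""
    for pattern, needed in requirements:
        # Assumption: unanchored match; the design does not specify this.
        if not any(re.search(pattern, name) and count >= needed
                   for name, count in host_resources):
            return False
    return True


def best_app_version(host_resources, app_versions):
    """Return the compatible app_version with the greatest acceleration,
    or None if no version is compatible."""
    compatible = [
        v for v in app_versions
        if is_compatible(host_resources,
                         parse_resources(v["resource_requirements"]))
    ]
    return max(compatible, key=lambda v: v["acceleration"], default=None)


def estimated_runtime(cpu_only_seconds, acceleration):
    """Assumed estimate: CPU-only time scaled down by the speedup."""
    return cpu_only_seconds / acceleration


# Host description, as the client might send it in a scheduler request.
host = parse_resources(
    "<resource><type>Cell SPE</type><number>6</number></resource>"
    "<resource><type>NVIDIA 8800 GPU with 1.7 driver</type>"
    "<number>1</number></resource>"
)

# Hypothetical app_version records; a CPU-only version requires nothing.
versions = [
    {"name": "cpu", "resource_requirements": "", "acceleration": 1.0},
    {"name": "gpu", "acceleration": 10.0, "resource_requirements":
        "<resource><type>NVIDIA .* GPU</type><number>1</number></resource>"},
    {"name": "spe8", "acceleration": 20.0, "resource_requirements":
        "<resource><type>Cell SPE</type><number>8</number></resource>"},
]

best = best_app_version(host, versions)
runtime = estimated_runtime(3600, best["acceleration"])
```

Here the `spe8` version has the highest acceleration but needs 8 SPEs while the host reports 6, so the scheduler would fall back to the `gpu` version. A CPU-only version with empty requirements is compatible with every host, which guarantees a fallback whenever one exists.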
Possible future additions
- Allow app_versions to specify min and max requirements (and have a corresponding allocation scheme in the client).
- Let projects define their own resources, unknown to BOINC, and have "probe" programs (using the assigned-job mechanism) that survey the resources on each host.
- Store the resource descriptions in the DB (or maybe flat files), so that you can study your host population.