Version 3 (modified by davea, 17 years ago)

Application planning

Application planning is a mechanism that lets the scheduler decide, using project-supplied logic, whether an application is able to run on a particular host, and if so what resources it will use and how fast it will run. It works as follows.

An app_version record (in the server DB) has a character string field plan_class. This identifies the range of processing resources that the application requires and is able to use. You can define these however you like, e.g. "cuda_1.1" apps require a CUDA-enabled GPU, "mt32" is a multithreaded app able to use 32 CPUs, etc.

The scheduler is linked with a project-supplied function

bool app_plan(HOST&, char* plan_class, HOST_USAGE&);

The HOST argument describes the host's CPU(s), and includes a field 'coprocs' listing its coprocessors.

When called with a particular HOST and plan class, the function returns true if the host's resources are sufficient for apps of that class. In that case it also populates the HOST_USAGE structure:

struct HOST_USAGE {
   COPROCS coprocs;   // coprocessors used by the app (name and count)
   double ncpus;      // #CPUs used by app (may be fractional)
   double flops;      // estimated FLOPS
   char opaque[256];  // passed to the app in init_data.xml
};
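
As an illustration of how a project might write this function, here is a minimal sketch handling the two example plan classes mentioned above. The HOST, COPROC and COPROCS definitions are simplified stand-ins, the field names p_ncpus and p_fpops are assumptions, and the speed factors and the contents of opaque are invented for the example; a real project would substitute its own measurements and logic.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the scheduler's types.
struct COPROC {
    std::string name;  // e.g. "CUDA"
    int count;         // number of instances
};

struct COPROCS {
    std::vector<COPROC> coprocs;

    // Return the entry with the given name, or nullptr if the host has none.
    const COPROC* lookup(const std::string& name) const {
        for (const COPROC& c : coprocs) {
            if (c.name == name) return &c;
        }
        return nullptr;
    }
};

struct HOST {
    int p_ncpus;     // number of CPUs (assumed field name)
    double p_fpops;  // FLOPS of one CPU (assumed field name)
    COPROCS coprocs; // coprocessors present on the host
};

struct HOST_USAGE {
    COPROCS coprocs;
    double ncpus;
    double flops;
    char opaque[256];
};

bool app_plan(HOST& host, char* plan_class, HOST_USAGE& hu) {
    if (!std::strcmp(plan_class, "cuda_1.1")) {
        // Needs at least one CUDA device.
        const COPROC* gpu = host.coprocs.lookup("CUDA");
        if (!gpu || gpu->count < 1) return false;
        hu.coprocs.coprocs = { COPROC{"CUDA", 1} };
        hu.ncpus = 0.1;                // assumed: small CPU share to feed the GPU
        hu.flops = 10 * host.p_fpops;  // assumed: GPU is 10x one CPU
        hu.opaque[0] = '\0';
        return true;
    }
    if (!std::strcmp(plan_class, "mt32")) {
        // Assumed policy: only worthwhile on multiprocessors; use up to 32 CPUs.
        if (host.p_ncpus < 2) return false;
        int n = host.p_ncpus < 32 ? host.p_ncpus : 32;
        hu.coprocs.coprocs.clear();
        hu.ncpus = n;
        hu.flops = n * host.p_fpops;   // assumed: linear speedup
        // Hypothetical opaque payload telling the app how many threads to start.
        std::snprintf(hu.opaque, sizeof(hu.opaque), "<nthreads>%d</nthreads>", n);
        return true;
    }
    return false;  // unknown plan class: do not send this version
}
```

Returning false for an unrecognized plan class is a conservative choice made here; it keeps the scheduler from sending a version whose requirements it cannot evaluate.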

When deciding whether to send a job to a host, the scheduler examines all latest-version app_versions for the platform and calls app_plan() for each. Among the versions for which app_plan() returns true, it selects the one whose flops value is greatest.
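
The selection step can be sketched as a simple maximum search. The types below are trimmed-down and hypothetical (APP_VERSION here carries only an id and a plan class), and the PLANNER callback stands in for app_plan() with the host already bound; this is not the scheduler's actual code.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down types for illustration only.
struct HOST_USAGE {
    double ncpus = 0;
    double flops = 0;
};

struct APP_VERSION {
    int id;
    std::string plan_class;
};

struct BEST_VERSION {
    const APP_VERSION* avp = nullptr;  // stays null if no version can run
    HOST_USAGE usage;                  // usage reported for the chosen version
};

// Stand-in for app_plan() applied to one particular host.
using PLANNER = std::function<bool(const std::string& plan_class, HOST_USAGE&)>;

// Keep the usable version with the highest estimated FLOPS.
BEST_VERSION choose_version(const std::vector<APP_VERSION>& candidates,
                            const PLANNER& plan) {
    BEST_VERSION best;
    for (const APP_VERSION& av : candidates) {
        HOST_USAGE hu;
        if (!plan(av.plan_class, hu)) continue;  // host cannot run this version
        if (!best.avp || hu.flops > best.usage.flops) {
            best.avp = &av;
            best.usage = hu;
        }
    }
    return best;
}
```

If no version is usable, avp remains null and the caller would send no job for that application. Ties keep the first candidate seen, which is an arbitrary choice made in this sketch.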

The scheduler reply includes, for each app version, an XML encoding of HOST_USAGE.

The client keeps track of coprocessor allocation, i.e. how many instances of each are free. It only runs an app if enough instances are available.
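
One way to picture this bookkeeping is a per-type counter of free instances with all-or-nothing reservation. COPROC_POOL and its methods are hypothetical names invented for this sketch; the client's real data structures may differ.

```cpp
#include <map>
#include <string>

// Hypothetical bookkeeping class: free instance count per coprocessor type.
class COPROC_POOL {
public:
    // Register `count` instances of a coprocessor type found on the host.
    void add(const std::string& name, int count) { free_[name] += count; }

    // Reserve everything a job needs, or nothing if any type is short.
    bool reserve(const std::map<std::string, int>& needed) {
        for (const auto& kv : needed) {
            auto it = free_.find(kv.first);
            if (it == free_.end() || it->second < kv.second) return false;
        }
        for (const auto& kv : needed) free_[kv.first] -= kv.second;
        return true;
    }

    // Return a finished job's instances to the pool.
    void release(const std::map<std::string, int>& held) {
        for (const auto& kv : held) free_[kv.first] += kv.second;
    }

    // Number of currently free instances of the given type.
    int available(const std::string& name) const {
        auto it = free_.find(name);
        return it == free_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, int> free_;
};
```

Checking every requirement before decrementing any counter means a job that needs two coprocessor types never holds one of them while waiting for the other.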

The client uses app_version.usage.flops to estimate job completion times.
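
A minimal sketch of such an estimate, assuming the client also has an estimate of the job's total floating-point operations (called fpops_est here, a hypothetical name), is a single division. The actual client may apply further corrections that are not described on this page.

```cpp
// fpops_est: assumed estimate of the job's total floating-point operations.
// flops:     the value of app_version.usage.flops for the version in use.
// Returns the estimated run time in seconds, or -1 if no speed is known.
double estimated_duration(double fpops_est, double flops) {
    if (flops <= 0) return -1;  // no usable speed estimate
    return fpops_est / flops;
}
```

For example, a job estimated at 3.6e12 operations on a version reporting 1e9 FLOPS would be expected to take 3600 seconds.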