= Applications that use coprocessors =

BOINC supports applications that use coprocessors.
The supported coprocessor types (as of [18892]) are NVIDIA and ATI GPUs.

The BOINC client probes for coprocessors and reports them in scheduler requests.
It runs an app only if enough coprocessor instances are available.

You can develop your application using any programming system, e.g.
CUDA (for NVIDIA), Brook+ (for ATI), or OpenCL.

== Command-line arguments ==

Some hosts have multiple GPUs.
When your application is run by BOINC, it will be passed a command-line argument
{{{
--device N
}}}
where N is the device number of the GPU that is to be used.
If your application uses multiple GPUs,
it will be passed multiple --device arguments, e.g.
{{{
--device 0 --device 3
}}}
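
Your application must read these arguments and select the corresponding device before doing any GPU work. The following is a minimal sketch of such parsing (the argument handling here is illustrative, not a BOINC API); a CUDA app would then pass the chosen device number to cudaSetDevice() before making any other CUDA calls:
{{{
#include <cstdlib>
#include <cstring>
#include <vector>

int main(int argc, char** argv) {
    std::vector<int> devices;
    for (int i = 1; i < argc; i++) {
        // collect each "--device N" pair passed by the client
        if (!strcmp(argv[i], "--device") && i + 1 < argc) {
            devices.push_back(atoi(argv[++i]));
        }
    }
    if (devices.empty()) devices.push_back(0);  // no argument: assume device 0
    // select devices[0] here, e.g. with cudaSetDevice(devices[0]),
    // then do the actual computation
    return 0;
}
}}}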

== Deploying a coprocessor app ==

When you deploy a coprocessor app you must specify:

 * its hardware and software requirements
 * an estimate of what fraction of a CPU it will use
 * an estimate of its performance on individual hosts

This information is specified in an
[AppPlan application planning function] that you link into your scheduler.
Specifically, you must:

 * Choose a "plan class" name for your program, say "cuda" (see below).
 * Create an [UpdateVersions app version], specifying its plan class as "cuda".
 * Edit the function '''app_plan()''' in '''sched/sched_customize.cpp''' so that it contains a clause for your plan class.

The default '''app_plan()''' contains a clause for plan class '''cuda'''.
We will explain its logic; you may need to modify it for your CUDA app.

First, we check whether the host has an NVIDIA GPU:
{{{
int app_plan(SCHEDULER_REQUEST& sreq, char* plan_class, HOST_USAGE& hu) {
    ...
    if (!strcmp(plan_class, "cuda")) {
        COPROC_CUDA* cp = (COPROC_CUDA*)sreq.coprocs.lookup("CUDA");
        if (!cp) {
            ...
            return PLAN_REJECT_CUDA_NO_DEVICE;
        }
}}}

Check the compute capability (1.0 or better):
{{{
        int v = (cp->prop.major)*100 + cp->prop.minor;
        if (v < 100) {
            ...
            return PLAN_REJECT_CUDA_VERSION;
        }
}}}

Check the CUDA runtime version.
As of client version 6.10, all clients report the CUDA runtime version
(cp->cuda_version); use that if it's present.
In 6.8 and earlier, the CUDA runtime version isn't reported.
Windows clients report the driver version,
from which the CUDA version can be inferred;
Linux clients don't report the driver version,
so we don't know what the CUDA version is.
{{{
        // for CUDA 2.3, we need to check the CUDA runtime version.
        // Old BOINC clients report the display driver version;
        // newer ones report the CUDA runtime version.
        //
        if (!strcmp(plan_class, "cuda23")) {
            if (cp->cuda_version) {
                if (cp->cuda_version < 2030) {
                    return PLAN_REJECT_CUDA_VERSION;
                }
            } else if (cp->display_driver_version) {
                if (cp->display_driver_version < PLAN_CUDA23_MIN_DRIVER_VERSION) {
                    return PLAN_REJECT_CUDA_VERSION;
                }
            } else {
                return PLAN_REJECT_CUDA_VERSION;
            }
        }
}}}

Check for the amount of video RAM:
{{{
        if (cp->prop.dtotalGlobalMem < PLAN_CUDA_MIN_RAM) {
            if (config.debug_version_select) {
                ...
            }
            return PLAN_REJECT_CUDA_MEM;
        }
}}}

Estimate the FLOPS:
{{{
        hu.flops = cp->flops_estimate();
}}}
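
'''flops_estimate()''' is supplied by BOINC's COPROC_CUDA class, which computes a peak-FLOPS figure from the device properties. For intuition only, an estimate of this general shape can be built from the CUDA device properties; the constants below (cores per multiprocessor, FLOPs per core per clock) are illustrative assumptions, not BOINC's exact values:
{{{
#include <cuda_runtime.h>

// Illustrative sketch only, not BOINC's implementation:
// rough peak FLOPS from clock rate and multiprocessor count.
double cuda_peak_flops_sketch(const cudaDeviceProp& prop) {
    return prop.clockRate * 1e3        // clockRate is in kHz; convert to Hz
        * prop.multiProcessorCount     // number of multiprocessors
        * 8                            // assumed cores per multiprocessor
        * 2;                           // assumed FLOPs per core per clock
}
}}}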

Estimate its CPU usage:
{{{
        // assume we'll need 0.5% as many CPU FLOPS as GPU FLOPS
        // to keep the GPU fed.
        //
        double x = (hu.flops*0.005)/sreq.host.p_fpops;
        hu.avg_ncpus = x;
        hu.max_ncpus = x;
}}}

Indicate the number of GPUs used.
Typically this will be 1.
If your application uses only a fraction X<1 of the GPU's processors
and a fraction Y<1 of its video RAM,
report the number of GPUs as max(X, Y).
In this case BOINC will attempt to run multiple jobs per GPU if possible.
{{{
        hu.ncudas = 1;
}}}
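
For example, if each job is expected to use at most half of the GPU's processors and half of its video RAM (illustrative figures), you could report max(0.5, 0.5) = 0.5 GPUs, and BOINC may then run two such jobs on one device:
{{{
        hu.ncudas = 0.5;    // each job uses at most half a GPU
}}}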

Return 0 to indicate that the application can be run on the host:
{{{
        return 0;
}}}